vak.prep.split.split.train_test_dur_split_inds

vak.prep.split.split.train_test_dur_split_inds(durs, labels, labelset, train_dur, test_dur, val_dur=None, algo='brute_force')

Return indices to split a dataset into training, test, and validation sets of specified durations.

Given the durations of a set of vocalizations, and labels from the annotations for those vocalizations, this function returns arrays of indices for splitting up the set into training, test, and validation sets.

Using those indices will produce datasets that each contain instances of all labels in the set of labels.
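The label-coverage property described above can be illustrated with a short, self-contained check. This is only a sketch: `covers_labelset` is a hypothetical helper written for this example, not a function in vak, and plain lists stand in for the numpy arrays of labels.

```python
def covers_labelset(inds, labels, labelset):
    """Return True if the files selected by `inds` together contain
    every label in `labelset` (hypothetical helper, not part of vak)."""
    seen = set()
    for i in inds:
        # collect every label that occurs in the i-th file
        seen.update(labels[i])
    return set(labelset) <= seen


# Toy data: one list of segment labels per audio file.
labels = [["a", "b"], ["b", "c"], ["a", "c"], ["a", "b", "c"]]
labelset = {"a", "b", "c"}

full_coverage = covers_labelset([0, 1], labels, labelset)     # files 0 and 1 together contain a, b, c
partial_coverage = covers_labelset([0], labels, labelset)     # file 0 alone lacks "c"
```

A split is only acceptable in the sense described above when this kind of check passes for every subset (training, test, and, if requested, validation).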

Parameters:
  • durs (iterable of float) – Durations of audio files, in seconds.

  • labels (iterable of numpy arrays of str or int) – Labels for segments (syllables, phonemes, etc.) in audio files.

  • labelset (set, list) – Set of unique labels for segments in files. Used to verify that each returned array of indices will produce a set that contains instances of all labels found in the original set.

  • train_dur (float) – Target duration for training set, in seconds.

  • test_dur (float) – Target duration for test set, in seconds.

  • val_dur (float) – Target duration for validation set, in seconds. Default is None. If None, no indices are returned for validation set.

  • algo (str) – Algorithm to use. One of {‘brute_force’, ‘inc_freq’}. Default is ‘brute_force’. For more information on the algorithms, see their docstrings, e.g., vak.io.algorithms.brute_force.

Returns:

train_inds, test_inds, val_inds – Indices that can be used with an array-like object to produce sets of the specified durations.

Return type:

numpy.ndarray
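To make the idea of a duration-based split concrete, the sketch below shuffles file indices until two subsets reach their target durations and both contain every label. It is an illustration under stated assumptions, not vak's implementation: `brute_force_split`, `max_iter`, and `seed` are hypothetical names introduced here, the validation set is omitted for brevity, and plain Python lists are used instead of numpy arrays.

```python
import random


def brute_force_split(durs, labels, labelset, train_dur, test_dur,
                      max_iter=1000, seed=0):
    """Hypothetical sketch (not vak's code): reshuffle file indices until
    both subsets meet their target durations and cover `labelset`."""
    rng = random.Random(seed)
    labelset = set(labelset)
    inds = list(range(len(durs)))
    for _ in range(max_iter):
        rng.shuffle(inds)
        train_inds, test_inds = [], []
        train_total = test_total = 0.0
        for i in inds:
            # fill the training set first, then the test set;
            # files left over once both targets are met are not used
            if train_total < train_dur:
                train_inds.append(i)
                train_total += durs[i]
            elif test_total < test_dur:
                test_inds.append(i)
                test_total += durs[i]
        durations_met = train_total >= train_dur and test_total >= test_dur
        labels_met = all(
            labelset <= {lbl for i in subset for lbl in labels[i]}
            for subset in (train_inds, test_inds)
        )
        if durations_met and labels_met:
            return train_inds, test_inds
    raise ValueError("could not find a split that satisfies the constraints")


# Toy data: durations in seconds and one list of labels per file.
durs = [2.0, 3.0, 2.5, 1.5, 3.0, 2.0]
labels = [["a", "b"], ["b", "c"], ["a", "c"],
          ["a", "b", "c"], ["c", "a"], ["b", "a", "c"]]

train_inds, test_inds = brute_force_split(
    durs, labels, {"a", "b", "c"}, train_dur=5.0, test_dur=3.0
)

# The returned indices select entries from any sequence aligned with `durs`.
train_durs = [durs[i] for i in train_inds]
test_durs = [durs[i] for i in test_inds]
```

Because whole files are assigned, each subset's total duration can overshoot its target by up to one file's duration in this sketch.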