vak.prep.frame_classification.frame_classification.prep_frame_classification_dataset
- vak.prep.frame_classification.frame_classification.prep_frame_classification_dataset(data_dir: str | Path, input_type: str, purpose: str, output_dir: str | Path | None = None, audio_format: str | None = None, spect_format: str | None = None, spect_params: dict | None = None, annot_format: str | None = None, annot_file: str | Path | None = None, labelset: set | None = None, audio_dask_bag_kwargs: dict | None = None, train_dur: float | None = None, val_dur: float | None = None, test_dur: float | None = None, train_set_durs: list[float] | None = None, num_replicates: int | None = None, spect_key: str = 's', timebins_key: str = 't', freqbins_key: str = 'f')
Prepare datasets for neural network models that perform the frame classification task.
For general information on dataset preparation, see the docstring for vak.prep.prep().
- Parameters:
data_dir (str, Path) – Path to directory with files from which to make dataset.
input_type (str) – The type of input to the neural network model. One of {‘audio’, ‘spect’}.
purpose (str) – Purpose of the dataset. One of {‘train’, ‘eval’, ‘predict’, ‘learncurve’}. These correspond to commands of the vak command-line interface.
output_dir (str) – Path to location where data sets should be saved. Default is None, in which case it defaults to data_dir.
audio_format (str) – Format of audio files. One of {‘wav’, ‘cbin’}. Default is None, but either audio_format or spect_format must be specified.
spect_format (str) – Format of files containing spectrograms as 2-d matrices. One of {‘mat’, ‘npz’}. Default is None, but either audio_format or spect_format must be specified.
spect_params (dict, vak.config.SpectParams) – Parameters for creating spectrograms. Default is None.
annot_format (str) – Format of annotations. Any format that can be used with the crowsetta library is valid. Default is None.
annot_file (str) – Path to a single annotation file. Default is None. Used when a single file annotates multiple audio or spectrogram files.
labelset (str, list, set) – Set of unique labels for vocalizations. Strings or integers. Default is None. If not None, then files will be skipped where the associated annotation contains labels not found in labelset. labelset is converted to a Python set using vak.converters.labelset_to_set(). See help for that function for details on how to specify labelset.
audio_dask_bag_kwargs (dict) – Keyword arguments used when calling dask.bag.from_sequence() inside vak.io.audio(), where it is used to parallelize the conversion of audio files into spectrograms. Option should be specified in config.toml file as an inline table, e.g., audio_dask_bag_kwargs = { npartitions = 20 }. Allows for finer-grained control when needed to process files of different sizes.
train_dur (float) – Total duration of training set, in seconds. When creating a learning curve, training subsets of shorter duration will be drawn from this set. Default is None.
val_dur (float) – Total duration of validation set, in seconds. Default is None.
test_dur (float) – Total duration of test set, in seconds. Default is None.
train_set_durs (list) – List of float. Durations, in seconds, of subsets taken from training data to create a learning curve, e.g. [5, 10, 15, 20]. Default is None.
num_replicates (int) – Number of times to replicate training for each training set duration, to better estimate metrics for a training set of that size. Each replicate uses a different randomly drawn subset of the training data (but of the same duration). Default is None.
spect_key (str) – Key for accessing spectrogram in files. Default is ‘s’.
timebins_key (str) – Key for accessing vector of time bins in files. Default is ‘t’.
freqbins_key (str) – Key for accessing vector of frequency bins in files. Default is ‘f’.
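The interaction between train_set_durs and num_replicates can be pictured with a small sketch. This is not vak's implementation: the helper names (draw_subset, learncurve_subsets), the file names, and the simple shuffle-and-accumulate strategy are all assumptions made for illustration. It only shows the shape of the result, one randomly drawn subset per (duration, replicate) pair.

```python
import random


def draw_subset(file_durs, target_dur, rng):
    """Pick shuffled files until their total duration reaches target_dur.

    Hypothetical helper; vak's own splitting logic may work differently.
    """
    names = list(file_durs)
    rng.shuffle(names)
    subset, total = [], 0.0
    for name in names:
        if total >= target_dur:
            break
        subset.append(name)
        total += file_durs[name]
    if total < target_dur:
        raise ValueError(f"not enough data for a subset of {target_dur} s")
    return subset


def learncurve_subsets(file_durs, train_set_durs, num_replicates, seed=0):
    """Return one random subset for every (duration, replicate) pair."""
    rng = random.Random(seed)
    return {
        (dur, rep): draw_subset(file_durs, dur, rng)
        for dur in train_set_durs
        for rep in range(1, num_replicates + 1)
    }


# Hypothetical training set: 20 files of 2.5 s each (50 s in total).
file_durs = {f"song{i}.wav": 2.5 for i in range(20)}
subsets = learncurve_subsets(
    file_durs, train_set_durs=[5, 10, 15, 20], num_replicates=3
)
```

With four durations and three replicates, the sketch yields twelve subsets, each covering at least the requested duration.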
- Returns:
dataset_df (pandas.DataFrame) – DataFrame that represents the dataset.
dataset_path (pathlib.Path) – Path to the csv file saved from dataset_df.
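As a rough usage sketch, the arguments for a call might be assembled as below. The check_prep_args helper is hypothetical and only mirrors the constraints stated in the parameter descriptions above; the function's own validation may differ. The directory path, label set, and durations are made-up values, and the annotation format is assumed to be one that crowsetta recognizes. The call itself is left commented out so that the sketch runs without vak installed.

```python
from pathlib import Path

VALID_INPUT_TYPES = {"audio", "spect"}
VALID_PURPOSES = {"train", "eval", "predict", "learncurve"}


def check_prep_args(input_type, purpose, audio_format=None, spect_format=None):
    """Hypothetical pre-check that mirrors the documented constraints."""
    if input_type not in VALID_INPUT_TYPES:
        raise ValueError(f"input_type must be one of {VALID_INPUT_TYPES}")
    if purpose not in VALID_PURPOSES:
        raise ValueError(f"purpose must be one of {VALID_PURPOSES}")
    if audio_format is None and spect_format is None:
        raise ValueError("either audio_format or spect_format must be specified")


prep_kwargs = dict(
    data_dir=Path("data/bird1"),  # hypothetical directory
    input_type="spect",
    purpose="train",
    spect_format="npz",
    annot_format="notmat",  # assumed to be a format crowsetta supports
    labelset={"a", "b", "c"},  # hypothetical labels
    train_dur=50.0,  # seconds
    val_dur=15.0,
    test_dur=30.0,
)

check_prep_args(
    prep_kwargs["input_type"],
    prep_kwargs["purpose"],
    spect_format=prep_kwargs["spect_format"],
)

# With vak installed, the call would then look like this:
# from vak.prep.frame_classification.frame_classification import (
#     prep_frame_classification_dataset,
# )
# dataset_df, dataset_path = prep_frame_classification_dataset(**prep_kwargs)
```

The two return values are unpacked in the order listed under Returns: the DataFrame first, then the path to the saved csv.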