vak.prep.spectrogram_dataset.prep.prep_spectrogram_dataset

Make a dataset of spectrograms, optionally paired with annotations.

Prepares a dataset of vocalizations from a directory of audio or spectrogram files and, optionally, annotations for those files. The dataset is returned as a pandas DataFrame.

Datasets are used to train neural networks, to predict annotations for the dataset itself with a trained neural network, and so on.

If the dataset is created from audio files, then array files containing spectrograms will be generated from the audio files and saved in spect_output_dir with the extension .spect.npz. The spect_output_dir defaults to data_dir if it is not specified.
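The fallback described above can be sketched as follows. This is an illustration only: resolve_spect_path is a hypothetical helper, not part of vak, and the sketch assumes the .spect.npz extension is appended to the full audio file name, which the text above does not state.

```python
from pathlib import Path


def resolve_spect_path(audio_path, data_dir, spect_output_dir=None):
    """Hypothetical helper (not part of vak): where a spectrogram file would be saved."""
    # Fall back to data_dir when no output directory is given.
    out_dir = Path(spect_output_dir) if spect_output_dir is not None else Path(data_dir)
    # Assumption: the extension is appended to the audio file's full name.
    return out_dir / (Path(audio_path).name + ".spect.npz")


# Without spect_output_dir, the file lands next to the audio in data_dir.
print(resolve_spect_path("data/bird1.wav", data_dir="data"))
# With spect_output_dir, the file lands in that directory instead.
print(resolve_spect_path("data/bird1.wav", data_dir="data", spect_output_dir="spects"))
```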

Parameters:

data_dir (str) – path to directory with audio or spectrogram files from which to make dataset
annot_format (str) – format of annotations. Any format that can be used with the crowsetta library is valid. Default is None.
labelset (str, list, set) – set of unique labels for vocalizations, given as str or int. Default is None. If not None, then files will be skipped where the associated annotation contains labels not found in labelset. labelset is converted to a Python set using vak.converters.labelset_to_set. See help for that function for details on how to specify labelset.
load_spects (bool) – if True, load spectrograms. If False, return a FramesDataset without spectrograms loaded. Default is True. Set to False when you want to create a FramesDataset for use later, but don’t want to load all the spectrograms into memory yet.
audio_format (str) – format of audio files. One of {‘wav’, ‘cbin’}.
spect_format (str) – format of array files containing spectrograms as 2-d matrices. One of {‘mat’, ‘npz’}.
annot_file (str) – Path to a single annotation file. Default is None. Used when a single file contains annotations for multiple audio files.
spect_params (dict, vak.config.spect.SpectParamsConfig) – Parameters for creating spectrograms. Default is None (implying that spectrograms are already made).
spect_output_dir (str) – Path to location where spectrogram files should be saved. Default is None, in which case it defaults to data_dir.
audio_dask_bag_kwargs (dict) – Keyword arguments used when calling dask.bag.from_sequence inside vak.io.audio, where it is used to parallelize the conversion of audio files into spectrograms. This option should be specified in the config.toml file as an inline table, e.g., audio_dask_bag_kwargs = { npartitions = 20 }. It allows finer-grained control when files of different sizes need to be processed.
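The skipping behavior described for labelset can be pictured as a subset check on each file's labels. The sketch below is not vak's implementation: files_within_labelset and the annotated_files mapping are hypothetical names introduced for illustration, and labelset is passed as a plain list rather than going through vak.converters.labelset_to_set.

```python
def files_within_labelset(annotated_files, labelset):
    """Hypothetical helper (not vak's implementation).

    annotated_files maps a file name to the labels found in its annotation.
    Returns the names of files whose labels all appear in labelset.
    """
    labelset = set(labelset)
    kept = []
    for name, labels in annotated_files.items():
        # A file is kept only if every one of its labels is in labelset.
        if set(labels) <= labelset:
            kept.append(name)
    return kept


annotated_files = {
    "bird1.wav": ["a", "b", "a"],
    "bird2.wav": ["a", "x"],  # "x" is not in labelset, so this file is skipped
    "bird3.wav": ["b"],
}
kept = files_within_labelset(annotated_files, labelset=["a", "b"])
print(kept)  # ['bird1.wav', 'bird3.wav']
```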

Returns:

source_files_df – A set of source files that will be used to prepare a dataset for use with neural network models, represented as a pandas.DataFrame. It will contain paths to spectrogram files, possibly paired with annotation files, as well as the original audio files if the spectrograms were generated from audio by vak.prep.audio_helper.make_spectrogram_files_from_audio_files(). The columns of the dataframe are specified by vak.prep.spectrogram_dataset.spect_helper.DF_COLUMNS.

Return type:

pandas.DataFrame