vak.prep.frame_classification.source_files.get_or_make_source_files

Get source files for a dataset, or make them.

Gets either audio or spectrogram files from data_dir, possibly paired with annotation files.

If input_type is 'audio', then this function will look for files with the extension for audio_format in data_dir. If input_type is 'spect' and spect_format is specified, then this function will look for files with the extension for that format in data_dir. If input_type is 'spect' and audio_format is specified, this function will look for audio files with that extension and then generate spectrograms from them using spect_params. If an annot_format is specified, this function will additionally look for annotation files for the audio or spectrogram files. If all annotations are in a single file, that file can be specified with the annot_file parameter, and it will be used instead of looking for other annotation files.
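
The branching described above can be summarized with a small sketch. This is not vak's implementation: plan_file_search is a hypothetical helper that only mirrors the prose, and the choice to let spect_format take precedence when both formats are given is an assumption made for illustration.

```python
from typing import Optional


def plan_file_search(
    input_type: str,
    audio_format: Optional[str] = None,
    spect_format: Optional[str] = None,
    annot_format: Optional[str] = None,
    annot_file: Optional[str] = None,
) -> dict:
    """Hypothetical helper: summarize which files would be looked for.

    Mirrors the prose description only; it is not part of vak.
    """
    if input_type not in {"audio", "spect"}:
        raise ValueError(f"input_type must be 'audio' or 'spect', got {input_type!r}")
    if audio_format is None and spect_format is None:
        raise ValueError("either audio_format or spect_format must be specified")

    if input_type == "audio":
        if audio_format is None:
            raise ValueError("input_type 'audio' requires audio_format")
        # Audio files are the source files; no spectrograms are generated.
        plan = {"search_ext": audio_format, "make_spectrograms": False}
    elif spect_format is not None:
        # Assumption: existing spectrogram files take precedence.
        plan = {"search_ext": spect_format, "make_spectrograms": False}
    else:
        # Find audio files, then generate spectrograms using spect_params.
        plan = {"search_ext": audio_format, "make_spectrograms": True}

    if annot_file is not None:
        # A single annotation file replaces the search for per-file annotations.
        plan["annotations"] = "single file"
    elif annot_format is not None:
        plan["annotations"] = "search data_dir"
    else:
        plan["annotations"] = None
    return plan


plan = plan_file_search("spect", audio_format="wav", annot_format="notmat")
print(plan)
```

In this sketch, passing input_type='spect' with only audio_format set yields a plan that searches for audio files and flags that spectrograms must be generated, which corresponds to the third case in the paragraph above.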

Parameters:

data_dir (str, Path) – Path to directory with files from which to make dataset.
input_type (str) – The type of input to the neural network model. One of {‘audio’, ‘spect’}.
audio_format (str) – Format of audio files. One of {‘wav’, ‘cbin’}. Default is None, but either audio_format or spect_format must be specified.
spect_format (str) – Format of files containing spectrograms as 2-d matrices. One of {‘mat’, ‘npz’}. Default is None, but either audio_format or spect_format must be specified.
spect_params (dict, vak.config.SpectParams) – Parameters for creating spectrograms. Default is None.
spect_output_dir (str) – Path to location where spectrogram files should be saved. Default is None. If input_type is 'spect', then spect_output_dir defaults to data_dir.
annot_format (str) – Format of annotations. Any format that can be used with the crowsetta library is valid. Default is None.
annot_file (str) – Path to a single annotation file. Default is None. Used when a single file contains annotations for multiple audio or spectrogram files.
audio_dask_bag_kwargs (dict) – Keyword arguments used when calling dask.bag.from_sequence() inside vak.io.audio(), where it is used to parallelize the conversion of audio files into spectrograms. This option should be specified in the config.toml file as an inline table, e.g., audio_dask_bag_kwargs = { npartitions = 20 }. It allows finer-grained control when processing files of different sizes.
labelset (str, list, set) – Set of unique labels for vocalizations. Strings or integers. Default is None. If not None, then files will be skipped where the associated annotation contains labels not found in labelset. labelset is converted to a Python set using vak.converters.labelset_to_set(). See help for that function for details on how to specify labelset.
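
The labelset rule can be illustrated with a minimal sketch. It assumes labelset is already a Python set and that annotations are available as a plain mapping from file name to labels; filter_by_labelset and the file names are hypothetical, and the sketch does not reproduce vak.converters.labelset_to_set() or vak's own filtering code.

```python
def filter_by_labelset(annotated_files, labelset):
    """Hypothetical helper: keep files whose labels all appear in labelset.

    annotated_files maps a file name to the sequence of labels in its
    annotation. If labelset is None, no files are skipped.
    """
    if labelset is None:
        return list(annotated_files)
    labelset = set(labelset)
    kept = []
    for name, labels in annotated_files.items():
        # Skip any file with at least one label not found in labelset.
        if set(labels) <= labelset:
            kept.append(name)
    return kept


# Made-up example data for illustration only.
annotated_files = {
    "bird1.wav": ["a", "b", "a"],
    "bird2.wav": ["a", "x"],  # "x" is not in labelset, so this file is skipped
    "bird3.wav": ["b"],
}
kept = filter_by_labelset(annotated_files, {"a", "b"})
print(kept)
```

A file is kept only when the set of its labels is a subset of labelset, so a single unexpected label is enough to exclude the file.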

Returns:

source_files_df – Source files that will become the dataset, represented as a pandas.DataFrame. Each row corresponds to one sample in the dataset, either an audio file or spectrogram file, possibly paired with annotations.

Return type:

pandas.DataFrame