vak.prep.spectrogram_dataset.audio_helper.make_spectrogram_files_from_audio_files
- vak.prep.spectrogram_dataset.audio_helper.make_spectrogram_files_from_audio_files(audio_format: str, spect_params: dict | SpectParamsConfig, output_dir: str, audio_dir: list | None = None, audio_files: list | None = None, annot_list: list | None = None, audio_annot_map: dict | None = None, annot_format: str | None = None, labelset: str | list | None = None, dask_bag_kwargs: dict | None = None)
Make spectrograms from audio files and save them in npz array files.
- Parameters:
audio_format (str) – A string representing the format of audio files. One of vak.common.constants.VALID_AUDIO_FORMATS.
spect_params (dict or config.spect_params.SpectParamsConfig) – Parameters for computing spectrograms, from the .toml file. To see all related parameters, run:
>>> help(vak.config.spect_params.SpectParamsConfig)
To get a default configuration, create a SpectParamsConfig with no arguments and pass it to this function:
>>> default_spect_params = vak.config.spect_params.SpectParamsConfig()
>>> make_spectrogram_files_from_audio_files(audio_format='wav', spect_params=default_spect_params, output_dir='.')
audio_dir (str) – Path to directory containing audio files from which to make spectrograms.
audio_files (list) – of str, full paths to audio files from which to make spectrograms
annot_list (list) – of annotations for array files. Default is None.
audio_annot_map (dict) – Where keys are paths to array files and value corresponding to each key is the annotation for that array file. Default is None.
output_dir (str) – directory in which to save .spect.npz file generated for each audio file.
labelset (str, list) – of str or int, set of unique labels for vocalizations. Default is None. If not None, skip files where the associated annotations contain labels not in labelset. labelset is converted to a Python set using vak.converters.labelset_to_set. See help for that function for details on how to specify labelset.
dask_bag_kwargs (dict) – Keyword arguments used when calling dask.bag.from_sequence. E.g., {'npartitions': 20}. Allows for finer-grained control when needed to process files of different sizes.
- Returns:
spect_files – of str, full paths to .spect.npz files
- Return type:
list
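The labelset filtering described under Parameters can be pictured with a small sketch. This is not vak's implementation: `keep_file` and the example file names and label lists are hypothetical, and the sketch assumes labelset has already been converted to a Python set.

```python
def keep_file(annot_labels, labelset):
    """Return True if every label in annot_labels is in labelset.

    Hypothetical helper for illustration only; not part of vak.
    """
    if labelset is None:
        # No labelset given, so no file is skipped.
        return True
    return set(annot_labels).issubset(labelset)


labelset = {"a", "b", "c"}

# Hypothetical mapping from audio file name to the labels in its annotation.
labels_by_file = {
    "bird1.wav": ["a", "b", "a"],
    "bird2.wav": ["a", "x"],  # contains "x", which is not in labelset
}

# Keep only files whose annotations use labels from labelset.
kept = [
    name
    for name, labels in labels_by_file.items()
    if keep_file(labels, labelset)
]
print(kept)
```

Under these assumptions only bird1.wav is kept, because the annotation for bird2.wav contains a label outside labelset.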
Notes
For each audio file, a corresponding ‘.spect.npz’ file will be created. Each ‘.spect.npz’ file contains the following arrays:
- s : numpy.ndarray
spectrogram, a 2-d array
- f : numpy.ndarray
vector of centers of frequency bins from spectrogram
- t : numpy.ndarray
vector of centers of time bins from spectrogram
- audio_path : numpy.ndarray
path to source audio file used to create spectrogram
The names of the arrays are defaults, and will change if different values are specified in spect_params for ‘spect_key’, ‘freqbins_key’, ‘timebins_key’, or ‘audio_path_key’.
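As a rough illustration of the file layout described in these notes, the sketch below writes a stand-in ‘.spect.npz’ file with numpy and reads it back using the default array names. The file name, array sizes, and values are made up for the example, and the sketch assumes that rows of s correspond to frequency bins and columns to time bins; files written by vak may differ in detail.

```python
import os
import tempfile

import numpy as np

# Write a stand-in file using the default array names from the notes above.
# The file name and the array contents are hypothetical.
tmp_dir = tempfile.mkdtemp()
spect_path = os.path.join(tmp_dir, "example.wav.spect.npz")
np.savez(
    spect_path,
    s=np.random.rand(5, 8),          # spectrogram: 5 frequency bins x 8 time bins
    f=np.linspace(0.0, 8000.0, 5),   # centers of frequency bins
    t=np.arange(8) * 0.002,          # centers of time bins
    audio_path=np.array("example.wav"),
)

# Read the arrays back by key.
with np.load(spect_path) as spect_dict:
    s = spect_dict["s"]
    f = spect_dict["f"]
    t = spect_dict["t"]
    audio_path = str(spect_dict["audio_path"])

# Assumed layout: one row of s per frequency bin, one column per time bin.
assert s.shape == (f.shape[0], t.shape[0])
```

If non-default values were given in spect_params for ‘spect_key’, ‘freqbins_key’, ‘timebins_key’, or ‘audio_path_key’, the keys used when indexing the loaded file would need to change to match.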