vak.prep.spectrogram_dataset.audio_helper.make_spectrogram_files_from_audio_files

vak.prep.spectrogram_dataset.audio_helper.make_spectrogram_files_from_audio_files(audio_format: str, spect_params: dict | SpectParamsConfig, output_dir: str, audio_dir: list | None = None, audio_files: list | None = None, annot_list: list | None = None, audio_annot_map: dict | None = None, annot_format: str | None = None, labelset: str | list | None = None, dask_bag_kwargs: dict | None = None)

Make spectrograms from audio files and save them in npz array files.

Parameters:
  • audio_format (str) – A string representing the format of audio files. One of the formats listed in vak.common.constants.VALID_AUDIO_FORMATS.

  • spect_params (dict or config.spect_params.SpectParamsConfig) – Parameters for computing spectrograms, from the .toml file. To see all related parameters, run:
    >>> help(vak.config.spect_params.SpectParamsConfig)
    To get a default configuration, create a SpectParamsConfig with no arguments and then pass it to this function:
    >>> default_spect_params = vak.config.spect_params.SpectParamsConfig()
    >>> make_spectrogram_files_from_audio_files(audio_format='wav', spect_params=default_spect_params, output_dir='.')

  • audio_dir (str) – Path to directory containing audio files from which to make spectrograms.

  • audio_files (list) – List of str, full paths to audio files from which to make spectrograms.

  • annot_list (list) – List of annotations for array files. Default is None.

  • audio_annot_map (dict) – Dictionary whose keys are paths to array files and whose values are the annotations for those files. Default is None.

  • output_dir (str) – directory in which to save .spect.npz file generated for each audio file.

  • labelset (str, list) – of str or int, set of unique labels for vocalizations. Default is None. If not None, skip files where the associated annotations contain labels not in labelset. labelset is converted to a Python set using vak.converters.labelset_to_set. See help for that function for details on how to specify labelset.

  • dask_bag_kwargs (dict) – Keyword arguments used when calling dask.bag.from_sequence. E.g., {"npartitions": 20}. Allows finer-grained control when processing files of different sizes.
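The labelset rule described above (skip a file when its annotation contains any label outside labelset) can be sketched in plain Python. This is a minimal illustration of the rule, not vak's implementation: the helper labels_within_labelset, the file paths, and the labels are all hypothetical, and a plain set stands in for the result of vak.converters.labelset_to_set.

```python
def labels_within_labelset(labels, labelset):
    """Return True if every label in ``labels`` is a member of ``labelset``."""
    return set(labels).issubset(set(labelset))


# Hypothetical data: audio file paths mapped to the labels in their annotations.
labels_by_file = {
    "bird0/song1.wav": ["a", "b", "c"],
    "bird0/song2.wav": ["a", "b", "x"],  # "x" is not in labelset, so this file is skipped
    "bird0/song3.wav": ["b", "b", "a"],
}

# Stand-in for the set that labelset would be converted to (assumption).
labelset = set("abc")

# Keep only files whose annotations use labels from labelset.
kept = [
    path
    for path, labels in labels_by_file.items()
    if labels_within_labelset(labels, labelset)
]
print(kept)
```

A file is kept even when it uses only some of the labels in labelset; only labels outside the set cause it to be skipped.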

Returns:

spect_files – of str, full paths to .spect.npz files

Return type:

list

Notes

For each audio file, a corresponding ‘.spect.npz’ file will be created. Each ‘.spect.npz’ file contains the following arrays:

  • s (numpy.ndarray) – spectrogram, a 2-d array

  • f (numpy.ndarray) – vector of centers of frequency bins from spectrogram

  • t (numpy.ndarray) – vector of centers of time bins from spectrogram

  • audio_path (numpy.ndarray) – path to source audio file used to create spectrogram

The names of the arrays are defaults, and will change if different values are specified in spect_params for ‘spect_key’, ‘freqbins_key’, ‘timebins_key’, or ‘audio_path_key’.
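As a rough sketch of how such a file could be read back, the example below writes a small stand-in .npz file with NumPy and loads its arrays by key. It assumes the default key names; the file name, array shapes, and values are made up for illustration, and the layout of the spectrogram as (frequency bins, time bins) is an assumption rather than something stated here.

```python
import os
import tempfile

import numpy as np

# Default key names; these change if spect_params sets 'spect_key',
# 'freqbins_key', 'timebins_key', or 'audio_path_key'.
SPECT_KEY = "s"
FREQBINS_KEY = "f"
TIMEBINS_KEY = "t"
AUDIO_PATH_KEY = "audio_path"

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write a stand-in file with made-up contents (not produced by vak).
    spect_path = os.path.join(tmp_dir, "example.wav.spect.npz")
    np.savez(
        spect_path,
        **{
            SPECT_KEY: np.zeros((4, 6)),            # assumed (freq bins, time bins)
            FREQBINS_KEY: np.linspace(0, 8000, 4),  # frequency bin centers
            TIMEBINS_KEY: np.arange(6) * 0.002,     # time bin centers
            AUDIO_PATH_KEY: np.array("example.wav"),
        },
    )

    # np.load returns a dict-like NpzFile; index it by array name.
    with np.load(spect_path) as spect_dict:
        s = spect_dict[SPECT_KEY]
        f = spect_dict[FREQBINS_KEY]
        t = spect_dict[TIMEBINS_KEY]
        # The path is stored as a 0-d array; .item() recovers the Python str.
        audio_path = spect_dict[AUDIO_PATH_KEY].item()

print(s.shape, f.shape, t.shape, audio_path)
```

Keeping the key names in variables makes it easy to adapt the loading code when non-default keys are configured.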