
vak.prep.spectrogram_dataset.audio_helper.make_spectrogram_files_from_audio_files(audio_format: str, spect_params: dict | SpectParamsConfig, output_dir: str, audio_dir: list | None = None, audio_files: list | None = None, annot_list: list | None = None, audio_annot_map: dict | None = None, annot_format: str | None = None, labelset: str | list | None = None, dask_bag_kwargs: dict | None = None)

Make spectrograms from audio files and save them in npz array files.

  • audio_format (str) – A string representing the format of audio files. One of the values in vak.common.constants.VALID_AUDIO_FORMATS.

  • spect_params (dict or config.spect_params.SpectParamsConfig) – Parameters for computing spectrograms, from the .toml file. To see all related parameters, run:
    >>> help(vak.config.spect_params.SpectParamsConfig)
    To get a default configuration, create a SpectParamsConfig with no arguments and then pass that to to_spect:
    >>> default_spect_params = vak.config.spect_params.SpectParamsConfig()
    >>> to_spect(audio_format='wav', spect_params=default_spect_params, output_dir='.')

  • audio_dir (str) – Path to directory containing audio files from which to make spectrograms.

  • audio_files (list) – List of str, full paths to audio files from which to make spectrograms.

  • annot_list (list) – List of annotations for array files. Default is None.

  • audio_annot_map (dict) – Keys are paths to array files; the value for each key is the annotation for that array file. Default is None.

  • output_dir (str) – directory in which to save .spect.npz file generated for each audio file.

  • labelset (str, list) – Set of unique labels for vocalizations, given as a str or as a list of str or int. Default is None. If not None, files whose associated annotations contain labels not in labelset are skipped. labelset is converted to a Python set using vak.converters.labelset_to_set. See the help for that function for details on how to specify labelset.

  • dask_bag_kwargs (dict) – Keyword arguments used when calling dask.bag.from_sequence. E.g., {'npartitions': 20}. Allows for finer-grained control when needed to process files of different sizes.
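The labelset filtering described above can be sketched in plain Python. This is only an illustration of the idea, not vak's implementation: the helper name filter_by_labelset is hypothetical, and annotations are simplified here to a list of label strings per file, whereas the real function works with annotation objects and converts labelset with vak.converters.labelset_to_set.

```python
def filter_by_labelset(audio_annot_map, labelset=None):
    """Keep only files whose labels all belong to labelset.

    audio_annot_map is simplified for this sketch: it maps an audio path
    to a list of label strings rather than to an annotation object.
    """
    if labelset is None:
        # No labelset given: nothing is skipped.
        return dict(audio_annot_map)
    allowed = set(labelset)
    return {
        path: labels
        for path, labels in audio_annot_map.items()
        if set(labels) <= allowed  # every label must be in the labelset
    }


# Hypothetical file names and labels, for illustration only.
example_map = {
    "bird1_song1.wav": ["a", "b", "c"],
    "bird1_song2.wav": ["a", "b", "x"],  # "x" is not in the labelset
}
kept = filter_by_labelset(example_map, labelset=["a", "b", "c"])
print(sorted(kept))
```

In this sketch the second file is dropped because one of its labels falls outside the labelset, which mirrors the "skip files" behavior described for the labelset parameter.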


spect_files – list of str, full paths to .spect.npz files

Return type:

list

For each audio file, a corresponding ‘.spect.npz’ file will be created. Each ‘.spect.npz’ file contains the following arrays:

  • spectrogram, a 2-d array (named by ‘spect_key’)

  • vector of centers of frequency bins from spectrogram (named by ‘freqbins_key’)

  • vector of centers of time bins from spectrogram (named by ‘timebins_key’)

  • path to source audio file used to create spectrogram (named by ‘audio_path_key’)

The names of the arrays are defaults, and will change if different values are specified in spect_params for ‘spect_key’, ‘freqbins_key’, ‘timebins_key’, or ‘audio_path_key’.
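As a minimal sketch of reading one of these files back, the arrays can be loaded with numpy.load and looked up by key. The key names used below ("s", "f", "t", "audio_path") are assumptions made for illustration; the actual names are whatever spect_params specifies for ‘spect_key’, ‘freqbins_key’, ‘timebins_key’, and ‘audio_path_key’. The file name and array contents are also stand-ins, written by the sketch itself so that it runs without any audio.

```python
import tempfile
from pathlib import Path

import numpy as np

# Assumed key names; substitute the values from your spect_params.
SPECT_KEY = "s"
FREQBINS_KEY = "f"
TIMEBINS_KEY = "t"
AUDIO_PATH_KEY = "audio_path"
ALL_KEYS = (SPECT_KEY, FREQBINS_KEY, TIMEBINS_KEY, AUDIO_PATH_KEY)


def load_spect(spect_path):
    """Load the four arrays from a .spect.npz file into a dict."""
    with np.load(spect_path) as npz:
        return {key: npz[key] for key in ALL_KEYS}


# Write a small stand-in file, then read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    spect_path = Path(tmp_dir) / "song1.spect.npz"  # hypothetical name
    np.savez(
        spect_path,
        **{
            SPECT_KEY: np.zeros((5, 8)),            # 5 freq bins x 8 time bins
            FREQBINS_KEY: np.linspace(0, 1000, 5),  # one center per freq bin
            TIMEBINS_KEY: np.arange(8) * 0.002,     # one center per time bin
            AUDIO_PATH_KEY: np.array("song1.wav"),  # source audio path
        },
    )
    spect = load_spect(spect_path)

print(spect[SPECT_KEY].shape)
```

Keeping the key names in constants makes it straightforward to follow a configuration that overrides the defaults.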