Spectrogram file format#

File type#

vak uses pre-computed files containing spectrograms.

For these files, it accepts two types: .npz or .mat. The .npz format is the NumPy library's format for a file that can contain multiple arrays. The .mat format is the Matlab data file format; many labs have existing codebases that generate spectrograms using Matlab. To work with one of these formats, you will specify either npz or mat in the [PREP] section of your .toml configuration file.

Note

vak loads .mat files with the function scipy.io.loadmat. That function can only load v4 (Level 1.0), v6, and v7 to 7.2 matfiles, as stated in the SciPy documentation (https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.loadmat.html). Version 7.3 of the matfile format uses an HDF5-based format, which is not supported by scipy or vak. (For more details, see this page in the Matlab documentation.) If you are working with Matlab, please either save your .mat files in a format that can be read by scipy.io.loadmat, or convert your data to .npz files as described in How do I use my own spectrograms?.
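As a quick check before handing files to vak, the sketch below looks at the text header of a .mat file to guess whether it was saved as version 7.3. It is not part of vak; the function name looks_like_v73_matfile is hypothetical, and the check assumes the file begins with the descriptive header that Matlab normally writes ("MATLAB 7.3 MAT-file, ..."), so treat the result as a hint rather than a guarantee.

```python
from pathlib import Path

# Assumption: HDF5-based files written by Matlab start with this text.
V73_HEADER_PREFIX = b"MATLAB 7.3 MAT-file"


def looks_like_v73_matfile(path):
    """Return True if the file's header text says it is a v7.3 .mat file."""
    with Path(path).open("rb") as fp:
        # The descriptive text sits at the very start of the file.
        header = fp.read(len(V73_HEADER_PREFIX))
    return header == V73_HEADER_PREFIX
```

If the function returns True for your files, re-save them from Matlab in an older format or convert them to .npz before running vak.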

Conventions#

Regardless of whether they are .npz files or .mat files, vak expects any spectrogram files to obey the following conventions.

Content#

A spectrogram array file should contain (at least) three items.

  1. The spectrogram, an m x n matrix

  2. A vector of m frequency bins, where the value of each element is the frequency at the center of the bin

  3. A vector of n time bins, where the value of each element is the time at the center of the bin

A fourth item is not required, but is suggested.

  1. A string path to the audio file from which the spectrogram was generated.

Other arrays can be in the file, but they will be ignored.
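The shape requirements listed above can be checked with a few lines of NumPy. This is a minimal sketch, not a vak function: the name check_spect_arrays is hypothetical, and it compares element counts rather than exact shapes so that Matlab-style (m, 1) or (1, n) vectors also pass.

```python
import numpy as np


def check_spect_arrays(s, f, t):
    """Check the m x n / m / n layout; return (m, n) or raise ValueError."""
    s, f, t = np.asarray(s), np.asarray(f), np.asarray(t)
    if s.ndim != 2:
        raise ValueError(f"spectrogram should be 2-D, got {s.ndim} dimensions")
    m, n = s.shape
    # One frequency bin center per row of the spectrogram.
    if f.size != m:
        raise ValueError(f"expected {m} frequency bins, got {f.size}")
    # One time bin center per column of the spectrogram.
    if t.size != n:
        raise ValueError(f"expected {n} time bins, got {t.size}")
    return m, n
```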

Array naming#

By convention each item should be associated with a string key. The defaults built into vak are: ‘s’, ‘f’, ‘t’, and ‘audio_path’. These defaults can be changed when preparing a dataset by changing the corresponding options in the [SPECT_PARAMS] section of a .toml configuration file. If you are using Matlab to generate the spectrogram files, then you will need to either save your workspace variables with the default names, or tell vak what names you used by changing the [SPECT_PARAMS] options. As noted above, the audio_path is not required, but it is added by vak.prep when generating a dataset of spectrogram files from audio.
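To illustrate the default keys, here is a minimal sketch that saves a spectrogram file with numpy.savez and reads it back. The array values are synthetic stand-ins, and the file and audio names (bird1.wav.npz, bird1.wav) are only examples written to a temporary directory; in practice s, f, and t would come from your own spectrogram code.

```python
import tempfile
from pathlib import Path

import numpy as np

# Synthetic stand-in data: m = 4 frequency bins, n = 6 time bins.
s = np.random.default_rng(0).random((4, 6))
f = np.linspace(500.0, 8000.0, num=4)   # bin centers, in Hz
t = np.arange(6) * 0.002 + 0.001        # bin centers, in seconds

# Save under the default key names: 's', 'f', 't', 'audio_path'.
spect_path = Path(tempfile.mkdtemp()) / "bird1.wav.npz"
np.savez(spect_path, s=s, f=f, t=t, audio_path="bird1.wav")

# Read the file back to confirm which keys it contains.
with np.load(spect_path) as spect_file:
    keys = sorted(spect_file.files)
    loaded_shape = spect_file["s"].shape
    loaded_audio_path = str(spect_file["audio_path"])
```

If you save under different names, the corresponding [SPECT_PARAMS] options need to match them.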

Spectrogram file naming#

There are two valid ways to name spectrogram files. The first is to give each spectrogram file the full name of the audio file it was created from, with the array file extension appended. E.g., if your audio file is bird1.wav, then the spectrogram file should be bird1.wav.npz. The second way is to name the spectrogram file by replacing the audio file extension with the array file extension, e.g., the spectrogram from bird1.wav would be saved in bird1.npz. The second way may be more intuitive, while the first allows for other .npz files with the same stem in the same directory, e.g. day1/bird1.wav.npz and day1/bird1.ftr.npz can be found side by side. For more detail, please see the page File naming conventions.
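Both naming schemes can be derived from an audio path with pathlib. The helper names below are hypothetical and only illustrate the two conventions; they are not vak functions.

```python
from pathlib import Path


def spect_path_appended(audio_path, ext=".npz"):
    """First convention: keep the audio file name and append the extension."""
    audio_path = Path(audio_path)
    return audio_path.with_name(audio_path.name + ext)


def spect_path_replaced(audio_path, ext=".npz"):
    """Second convention: swap the audio extension for the array extension."""
    return Path(audio_path).with_suffix(ext)


appended = spect_path_appended("day1/bird1.wav")   # day1/bird1.wav.npz
replaced = spect_path_replaced("day1/bird1.wav")   # day1/bird1.npz
```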

Example array files that meet this spectrogram file format specification#

Please click on this link to download a .tar.gz archive containing spectrogram files generated by a run of vak prep on audio data: https://osf.io/9cedz/download

You can inspect the contents of the .npz array files by loading them with numpy.load.
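A minimal sketch of such an inspection follows. The helper name describe_npz is hypothetical, and the path you pass in would be one of the files extracted from the archive; the function simply lists each stored array with its shape and dtype.

```python
import numpy as np


def describe_npz(path):
    """Map each array name in a .npz file to its (shape, dtype) pair."""
    with np.load(path) as npz_file:
        summary = {}
        for key in npz_file.files:
            array = npz_file[key]
            summary[key] = (array.shape, str(array.dtype))
    return summary
```

For a file that follows this specification, the result should include entries for the spectrogram, frequency bin, and time bin keys, with the spectrogram shape matching the lengths of the two vectors.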

These files are provided to demonstrate the specification described here. You may find them helpful as examples if you prefer to generate your own spectrograms, and you need to write a script to create array files containing your spectrograms so vak can work with them, as described in How do I use my own spectrograms?.