How do I use my own spectrograms?
vak has built-in functionality to generate spectrograms from audio files. It will do this when you build a dataset with the vak prep command and specify an audio_format in your configuration file.
But you can also use spectrograms that you generate with your own code, taking advantage of existing libraries in the scientific Python ecosystem, such as librosa.
There are two formats you can use for your own spectrograms, either .npz or .mat files. .npz is a NumPy library format, for a file that can contain multiple arrays. The .mat extension denotes the equivalent Matlab data file format; many labs have existing codebases that generate spectrograms using Matlab.
You will specify either npz or mat as the spect_format in the [PREP] section of your .toml configuration file.
Step-by-step
This recipe describes how to use spectrograms generated with Python code. For Matlab code, the only difference is that you would save each spectrogram in a .mat file, using the built-in save function.
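If your spectrograms come from Matlab, it can help to check from Python what a saved .mat file actually contains. The snippet below is a minimal sketch, not part of vak: the file name and the variable names s, f and t are assumptions made for illustration, and scipy.io.savemat stands in for Matlab's save so that the example is self-contained. Note that scipy.io.loadmat does not read files saved in Matlab's v7.3 format.

```python
import tempfile
from pathlib import Path

import numpy as np
from scipy.io import loadmat, savemat

# Hypothetical file name, used only for this illustration.
mat_path = str(Path(tempfile.mkdtemp()) / "bird0-song1.wav.mat")

# Stand-in for a file written by Matlab's save function.
# The variable names "s", "f" and "t" are assumptions, not confirmed by this page.
savemat(
    mat_path,
    {
        "s": np.random.rand(257, 124),       # spectrogram, m x n
        "f": np.linspace(0, 16000, 257),     # m frequency bins
        "t": np.linspace(0.008, 0.992, 124), # n time bins
    },
)

# squeeze_me=True turns Matlab's 1 x n vectors back into 1-D arrays.
contents = loadmat(mat_path, squeeze_me=True)

# Skip the metadata entries that loadmat adds (their names start with "__").
variables = {
    name: value.shape
    for name, value in contents.items()
    if not name.startswith("__")
}
print(variables)
```

The printed shapes make it easy to confirm that the spectrogram has one row per frequency bin and one column per time bin.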
Write a script that generates a spectrogram for each of your audio files, e.g. using librosa.
In that script, save each spectrogram in an .npz file, along with the vectors of times and frequencies that are typically returned by a function that computes spectrograms. The naming and the contents of the file should match the specification in Spectrogram file format.
A spectrogram array file should contain (at least) three items:
The spectrogram, an m x n matrix
A vector of m frequency bins, where the value of each element is the frequency at the center of the bin
A vector of n time bins, where the value of each element is the time at the center of the bin
A fourth item is not required, but is suggested.
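The file contents described above can be sketched as follows. This is a minimal example under stated assumptions, not the definitive format: it uses scipy.signal.spectrogram on a synthetic tone instead of librosa on real audio, and the key names s, f, t and audio_path, as well as the audio file name, are assumptions that you should check against the spectrogram file format specification.

```python
import tempfile
from pathlib import Path

import numpy as np
from scipy.signal import spectrogram

# Synthetic one-second "recording" (a 2 kHz tone) standing in for real audio.
samplerate = 32000
sample_times = np.arange(samplerate) / samplerate
audio = np.sin(2 * np.pi * 2000 * sample_times)

# scipy returns the frequency bins, the time bins, and the spectrogram itself.
f, t, s = spectrogram(audio, fs=samplerate, nperseg=512, noverlap=256)

# The spectrogram is m x n: m frequency bins by n time bins.
assert s.shape == (f.size, t.size)

# Hypothetical audio file name; the spectrogram file is named by
# appending ".npz" to it, so the audio name can be recovered later.
out_dir = Path(tempfile.mkdtemp())
audio_name = "bird0-song1.wav"
spect_path = out_dir / (audio_name + ".npz")

# The key names here are assumptions; check them against the format spec.
np.savez(spect_path, s=s, f=f, t=t, audio_path=audio_name)

# Reload the file to confirm what was written.
with np.load(spect_path) as spect_file:
    keys = sorted(spect_file.files)
    loaded_shape = spect_file["s"].shape

print(keys, loaded_shape)
```

Reloading the file in the same script is a cheap way to catch a transposed spectrogram or a missing vector before you run vak prep.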
Please consult the page specifying the spectrogram file format for more details. There is also a link on that page to download an example set of .npz files generated by running vak prep, which you can inspect to better understand what the contents of the files saved by your script should be.
In the [PREP] section of your vak configuration file, specify: spect_format = 'npz'
For the data_dir option, put the path to the directory that contains all the .npz files saved by your script.
Run vak prep with that configuration file. vak will look for .npz files in the data_dir directory, and link each one to the correct annotation by removing the .npz extension from the file name to recover the name of the audio file, and then finding an annotation for that audio file.
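Before running vak prep, you may want to check that every file in data_dir follows this naming scheme and holds consistent arrays. The sketch below is one way to do that; it is not vak's own implementation. The helper names audio_name_from_spect and check_spect_file are hypothetical, the key names s, f and t are assumptions, and the two small files are made up for the demonstration.

```python
import tempfile
from pathlib import Path

import numpy as np


def audio_name_from_spect(spect_path):
    """Recover the audio file name by dropping the trailing .npz extension."""
    name = Path(spect_path).name
    if not name.endswith(".npz"):
        raise ValueError(f"not an .npz file: {name}")
    return name[: -len(".npz")]


def check_spect_file(spect_path):
    """Return a list of problems found in one spectrogram file (empty if none)."""
    with np.load(spect_path) as spect_file:
        # Key names "s", "f" and "t" are assumptions; adjust to the format spec.
        missing = [key for key in ("s", "f", "t") if key not in spect_file.files]
        if missing:
            return [f"missing keys: {missing}"]
        s, f, t = spect_file["s"], spect_file["f"], spect_file["t"]
    problems = []
    if s.ndim != 2:
        problems.append(f"spectrogram has {s.ndim} dimensions, expected 2")
    elif s.shape != (f.size, t.size):
        problems.append(
            f"spectrogram shape {s.shape} does not match ({f.size}, {t.size})"
        )
    return problems


# Made-up data_dir with one consistent file and one with too few frequency bins.
data_dir = Path(tempfile.mkdtemp())
np.savez(data_dir / "song1.wav.npz", s=np.zeros((5, 8)), f=np.arange(5), t=np.arange(8))
np.savez(data_dir / "song2.wav.npz", s=np.zeros((5, 8)), f=np.arange(4), t=np.arange(8))

# Map each recovered audio file name to the problems found in its spectrogram file.
report = {
    audio_name_from_spect(path): check_spect_file(path)
    for path in sorted(data_dir.glob("*.npz"))
}
for audio_name, problems in report.items():
    print(audio_name, problems or "ok")
```

Each recovered audio name should correspond to an audio file that has an annotation; a name that does not is a sign that the spectrogram file was named differently from what vak expects.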