How do I use my own vocal annotation format?

To load annotation formats, vak uses another Python package called crowsetta. It has built-in support for common formats such as the .TextGrid files generated by Praat. Even if your data is not annotated with one of these common formats, you can still use crowsetta to convert your annotations into a format that vak can read.

There are two main ways to do this. The first is to convert the annotations to a simple .csv file format that crowsetta calls 'simple-seq'. You can easily create files in this format with the pandas library, as we show with an example script below. The second approach is to convert your annotations to a more generic format built into crowsetta, called 'generic-seq', which is designed to represent a large set of annotations as a single .csv file. In the sections below, we walk through both methods.

See also

For more detail on how vak relates annotation files to the files that they annotate, please see the section “How does vak know which annotations go with which annotated files?” in the how-to “How do I prepare datasets of annotated vocalizations for use with vak?”.

Method 1: converting your annotations to the 'simple-seq' format

The first method is to convert your annotations to a format named 'simple-seq'. This method will work for a wide array of annotation formats that can all be mapped to a sequence of segments, with each segment having an onset time, offset time, and label. The one assumption the 'simple-seq' format makes is that you have one annotation file per annotated file, that is, one annotation file per audio file or per array file containing a spectrogram. This is likely to be the case if you are using apps like Praat or Audacity. An example of such a format is the Audacity standard label track format, exported to .txt files, that you would get if you annotated with region labels.

Below we provide an example of how you would write a very small Python script to convert your annotations to the 'simple-seq' format using the pandas library. First we explain what your dataset should look like.

Explanation of when you can use the 'simple-seq' format

Again, this first approach assumes that you have a separate annotation file for each file you have with a vocalization in it, either an audio file or an array file containing a spectrogram. In other words, a directory of your data looks something like this:

... # more files here

Notice that each .wav audio file has a corresponding .txt file with annotations. Each of the .txt files has columns that could be imported into a GUI application, e.g. Audacity.


Those files are taken from this dataset:
You can download them to work through the example yourself.

Here we use the cat command in the terminal to print the contents of the first .txt file:

$ cat BB_SGP16-1___20160521_214723.txt
8.358329	15.019360	Common Pip
194.710924	199.112019	Barbastelle - good

We can see there are two rows, each with an onset time, an offset time, and a text label. The evenly aligned columns tell us that they are separated by tabs (which you can also notice if you open the file in a text editor and move the cursor around). Lastly, we see that there is no header, that is, no first row with column names such as “start time”, “stop time”, and “name”.
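To check how pandas will read this layout, here is a minimal sketch that parses the two rows printed above from an in-memory string instead of the file on disk. It assumes, as described, that the columns are tab-separated and that there is no header: `header=None` keeps pandas from treating the first annotation as column names, and `sep='\t'` splits only on tabs, so multi-word labels such as “Common Pip” stay in a single column.

```python
import io

import pandas as pd

# the two rows printed by `cat` above: tab-separated, no header row
TXT_CONTENT = (
    "8.358329\t15.019360\tCommon Pip\n"
    "194.710924\t199.112019\tBarbastelle - good\n"
)

# header=None: treat the first line as data, not as column names
# sep='\t': split only on tabs, so labels containing spaces stay intact
df = pd.read_csv(io.StringIO(TXT_CONTENT), sep='\t', header=None)

print(df.shape)        # (2, 3)
print(df[2].tolist())  # ['Common Pip', 'Barbastelle - good']
```

Without a header, pandas names the columns 0, 1, and 2, which is why the conversion script below assigns meaningful column names right after loading.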

What we want is to convert each .txt file to a comma-separated file (a .csv) in the 'simple-seq' format, with a header that has the specific column names that crowsetta recognizes. We can easily create such files with pandas by writing a short script. After running the script, we will have a .csv file for each .txt file in our directory, as shown:

... # more files here

Notice also how the script names the new annotation files. For each audio file, it creates an annotation file with the same name, including the audio extension, followed by the annotation extension. For example, the script creates an annotation file named “DB_1-WWS16-2___20160822_203501.wav.csv” for the audio file named “DB_1-WWS16-2___20160822_203501.wav”. We could instead name the files by simply replacing the extension .wav with .csv. One drawback of that approach is that we could not have any other .csv file with the same name in the directory. That matters if we want, for example, an analysis file for each audio file: “DB_1-WWS16-2___20160822_203501.csv” could contain features or measurements we extract from “DB_1-WWS16-2___20160822_203501.wav”.
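The two naming options can be compared in a few lines of pathlib. This is only an illustration; the helper names `annot_name_appended` and `annot_name_replaced` are hypothetical and are not part of vak or crowsetta.

```python
import pathlib


def annot_name_appended(audio_path: pathlib.Path) -> pathlib.Path:
    """Keep the audio extension and append .csv (the convention the script follows)."""
    return audio_path.parent / (audio_path.name + '.csv')


def annot_name_replaced(audio_path: pathlib.Path) -> pathlib.Path:
    """Replace the audio extension with .csv (the alternative described above)."""
    return audio_path.with_suffix('.csv')


audio = pathlib.Path('DB_1-WWS16-2___20160822_203501.wav')
print(annot_name_appended(audio))  # DB_1-WWS16-2___20160822_203501.wav.csv
print(annot_name_replaced(audio))  # DB_1-WWS16-2___20160822_203501.csv
```

The second name is exactly the one an analysis file for the same recording would want, which is the collision the appended convention avoids.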

More on naming annotation files

As stated above, more detail on how vak relates annotation files to the files that they annotate can be found in the section “How does vak know which annotations go with which annotated files?” on the how-to page “How do I prepare datasets of annotated vocalizations for use with vak?”. The reference section also provides a page on File naming conventions.

Example script for converting .txt files to the 'simple-seq' format

Below is a script that loads the text files using pandas and adds the column names needed before saving a new .csv file with the same values.

import pathlib

import pandas as pd

# column names that crowsetta expects in the 'simple-seq' format
COLUMNS = ['onset_s', 'offset_s', 'label']

def main():
    txt_files = sorted(pathlib.Path('./path/to/data').glob('*.txt'))
    for txt_file in txt_files:
        txt_df = pd.read_csv(txt_file, sep='\t', header=None)  # sep='\t' because tab-separated
        txt_df.columns = COLUMNS
        # in next line, use `.stem` to get the file name without the .txt extension,
        # then add the audio extension (.wav) and the .csv extension, to follow the naming convention
        csv_name = txt_file.parent / (txt_file.stem + '.wav.csv')
        # index=False so the DataFrame index is not written as an extra column
        txt_df.to_csv(csv_name, index=False)

if __name__ == '__main__':
    main()
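If you want to try the conversion before pointing it at your real data, one option is to wrap the same pandas steps in a function that takes the directory as an argument, and run it on a temporary directory containing the two example rows shown earlier. This is a sketch: `convert_txt_dir` and its `audio_ext` parameter are hypothetical names, not part of vak or crowsetta, and it assumes your audio files are .wav files named like the .txt files.

```python
import pathlib
import tempfile

import pandas as pd

COLUMNS = ['onset_s', 'offset_s', 'label']


def convert_txt_dir(data_dir, audio_ext='.wav'):
    """Hypothetical helper: convert each tab-separated .txt in data_dir to a 'simple-seq' .csv.

    Returns the list of .csv paths that were written.
    """
    csv_paths = []
    for txt_file in sorted(pathlib.Path(data_dir).glob('*.txt')):
        txt_df = pd.read_csv(txt_file, sep='\t', header=None)
        txt_df.columns = COLUMNS
        # name the .csv after the audio file: <stem><audio_ext>.csv
        csv_path = txt_file.parent / (txt_file.stem + audio_ext + '.csv')
        txt_df.to_csv(csv_path, index=False)
        csv_paths.append(csv_path)
    return csv_paths


with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = pathlib.Path(tmp)
    # one example .txt file holding the two rows shown by `cat` above
    (tmp_dir / 'BB_SGP16-1___20160521_214723.txt').write_text(
        '8.358329\t15.019360\tCommon Pip\n'
        '194.710924\t199.112019\tBarbastelle - good\n'
    )
    written = convert_txt_dir(tmp_dir)
    written_names = [p.name for p in written]
    # read the result back to confirm the header crowsetta looks for
    converted = pd.read_csv(written[0])
    written_columns = converted.columns.tolist()
    written_labels = converted['label'].tolist()

print(written_names)    # ['BB_SGP16-1___20160521_214723.wav.csv']
print(written_columns)  # ['onset_s', 'offset_s', 'label']
```

Taking the directory as a parameter instead of hard-coding './path/to/data' lets the same function run on the test directory and, unchanged, on your real dataset.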

Using the 'simple-seq' format with vak

Once you have annotations in the 'simple-seq' format, you will set up the [PREP] section of your configuration file like this:

[PREP]
data_dir = "~/Documents/data/vocal/BFSongRepo-test-csv-format/gy6or6/032212"
output_dir = "./data/prep/train"
audio_format = "cbin"
annot_format = "simple-seq"
labelset = "iabcdefghjk"
train_dur = 50
val_dur = 15

vak will look for a .csv file in the 'simple-seq' format for each audio file (or spectrogram file, if you are supplying your own spectrogram files).
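Because vak expects one annotation file per audio file here, it can help to check for missing annotations before running prep. Below is a sketch of such a check, assuming the “<audio file name>.csv” naming convention used by the script above. `find_unannotated` is a hypothetical helper, and the file names in the test are placeholders; vak's own matching rules are described in the how-to on preparing datasets mentioned at the top of this page.

```python
import pathlib
import tempfile


def find_unannotated(data_dir, audio_ext='.wav'):
    """Hypothetical pre-flight check: list audio files with no '<audio name>.csv' next to them."""
    missing = []
    for audio_path in sorted(pathlib.Path(data_dir).glob('*' + audio_ext)):
        annot_path = audio_path.parent / (audio_path.name + '.csv')
        if not annot_path.exists():
            missing.append(audio_path.name)
    return missing


with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = pathlib.Path(tmp)
    # two empty placeholder audio files, only one of which has an annotation file
    (tmp_dir / 'a.wav').touch()
    (tmp_dir / 'a.wav.csv').touch()
    (tmp_dir / 'b.wav').touch()
    missing = find_unannotated(tmp_dir)

print(missing)  # ['b.wav']
```

For the .cbin configuration shown above, you would call it with `audio_ext='.cbin'`.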

Method 2: converting your annotations to the generic format

An alternative to the first method is to use the 'generic-seq' format. This method may make sense if you do not have a separate annotation file for each audio file, for example, if all your annotations are in a single file saved by an application. There are two steps to converting your format to 'generic-seq', described below.


(Previously, the 'generic-seq' format was called 'csv'; this name will be removed in the next version of crowsetta.)


  1. Write a Python script that loads the onsets, offsets, and labels from your format, and then uses that data to create the Annotations and Sequences that crowsetta uses to convert between formats.


    For examples, please see any of the modules for built-in formats in the crowsetta library.

    E.g., the notmat module:

    That module parses annotations from this dataset:

  2. Then save your Annotations—converted to the generic crowsetta format—in a .csv file, using the crowsetta.csv functions. There is a convenience function crowsetta.csv.annot2csv that you can use if you have already written a function that returns crowsetta.Annotations. Again, see examples in the built-in format modules.
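As a sketch of the data-wrangling part of step 1, suppose (hypothetically) that your application exports every segment into one table with a column naming the annotated audio file. The snippet below groups such a table into per-file arrays of onsets, offsets, and labels, which are the kinds of values passed to `crowsetta.Sequence.from_keyword` in the example script for this method. The column names and file names are invented for illustration; adapt them to your own export.

```python
import io

import numpy as np
import pandas as pd

# hypothetical single-file export: one row per segment, plus a column naming the audio file
EXPORT = io.StringIO(
    "audio_file,onset_s,offset_s,label\n"
    "song1.wav,0.10,0.25,a\n"
    "song1.wav,0.30,0.42,b\n"
    "song2.wav,0.05,0.20,a\n"
)
export_df = pd.read_csv(EXPORT)

# collect onsets, offsets, and labels separately for each annotated file
per_file = {}
for audio_file, rows in export_df.groupby('audio_file'):
    rows = rows.sort_values('onset_s')  # in case the export is not in time order
    per_file[audio_file] = {
        'onsets_s': rows['onset_s'].to_numpy(),
        'offsets_s': rows['offset_s'].to_numpy(),
        'labels': rows['label'].astype(str).to_numpy(),
    }

print(sorted(per_file))                          # ['song1.wav', 'song2.wav']
print(per_file['song1.wav']['labels'].tolist())  # ['a', 'b']
```

Each entry of `per_file` then corresponds to one `Annotation` with one `Sequence`, the same one-to-one structure the example script builds from .not.mat files.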

Example script for converting .not.mat files to the 'generic-seq' format

Here is a script that carries out steps one and two. This script can be run on the example data used for training a model in the tutorial Automated Annotation.

import pathlib

import numpy as np
import scipy.io

import crowsetta

# the training data from the tutorial; ``expanduser`` expands '~'
data_dir = pathlib.Path('~/Documents/data/gy6or6/032212').expanduser()
annot_paths = sorted(data_dir.glob('*.not.mat'))

# the name of the .csv with our 'generic-seq' format annotations
csv_filename = 'data/annot/gy6or6.032212.annot.csv'

# ---- step 1. convert to ``Annotation``s with ``Sequence``s
annot = []
for a_notmat in annot_paths:
    notmat_dict = scipy.io.loadmat(a_notmat, squeeze_me=True)
    # in .not.mat files saved by evsonganaly,
    # onsets and offsets are in units of ms, have to convert to s
    onsets_s = notmat_dict['onsets'] / 1000
    offsets_s = notmat_dict['offsets'] / 1000

    # the audio file has the same name as the annotation file, minus '.not.mat'
    audio_pathname = str(a_notmat).replace('.not.mat', '')

    notmat_seq = crowsetta.Sequence.from_keyword(labels=np.asarray(list(notmat_dict['labels'])),
                                                 onsets_s=onsets_s,
                                                 offsets_s=offsets_s)
    annot.append(
        # see `warning` below for explanation of why `annot_path=csv_filename`
        crowsetta.Annotation(annot_path=csv_filename, audio_path=audio_pathname, seq=notmat_seq)
    )

# ---- step 2. save as a .csv
# make sure the output directory exists before writing
pathlib.Path(csv_filename).parent.mkdir(parents=True, exist_ok=True)
crowsetta.csv.annot2csv(annot, csv_filename=csv_filename)


Warning

In the script above, when creating Annotations, notice that we specified the annot_path as the path to the .csv file itself, instead of the path to the original .not.mat annotation files. You should do the same. For example, if you are saving your annotations in a .csv file named bat1_converted.csv, then the value for every cell in the annot_path column of your .csv file should also be bat1_converted.csv. This workaround prevents vak from trying to open the original annotation files as if they were .csv files, which can cause an error.
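Since a wrong annot_path value is easy to miss, here is a sketch of a sanity check you could run on the saved .csv. It assumes the file has an annot_path column, as described above, and flags any value that does not refer to a file with the same name as the .csv itself. `bad_annot_paths` is a hypothetical helper, and the small table it is tested on is a two-column stand-in, not a complete 'generic-seq' file.

```python
import pathlib
import tempfile

import pandas as pd


def bad_annot_paths(csv_path):
    """Hypothetical check: return annot_path values that do not name the .csv file itself."""
    csv_path = pathlib.Path(csv_path)
    df = pd.read_csv(csv_path)
    values = df['annot_path'].astype(str).unique()
    # compare file names only, so relative and absolute paths to the same file both pass
    return sorted(v for v in values if pathlib.Path(v).name != csv_path.name)


with tempfile.TemporaryDirectory() as tmp:
    csv_path = pathlib.Path(tmp) / 'bat1_converted.csv'
    # minimal stand-in: one row still points at a (hypothetical) original annotation file
    pd.DataFrame({
        'annot_path': ['bat1_converted.csv', 'bat1_converted.csv', 'bat1.txt'],
        'label': ['a', 'b', 'a'],
    }).to_csv(csv_path, index=False)
    bad = bad_annot_paths(csv_path)

print(bad)  # ['bat1.txt']
```

An empty list means every row points back at the .csv, which is what the workaround above requires.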

Using the 'generic-seq' format with vak

If you have written a script that saves all your annotations in a single .csv file as described above, then you need to tell vak to use that file. To do so, you add the annot_file option in the [PREP] section of your .toml configuration file, as shown below:

[PREP]
data_dir = "~/Documents/data/vocal/BFSongRepo-test-csv-format/gy6or6/032212"
output_dir = "./data/prep/train"
audio_format = "cbin"
annot_format = "generic-seq"
annot_file = "./data/annot/gy6or6.032212.annot.csv"
labelset = "iabcdefghjk"
train_dur = 50
val_dur = 15