(autoannotate)=

# Automated Annotation

`vak` lets you automate annotation of vocalizations with neural networks.
This tutorial walks you through how you would do that, using an example dataset.
When we say annotation, we mean the kind produced by a software tool
that researchers studying speech and animal vocalizations use,
like [Praat](http://www.fon.hum.uva.nl/praat/manual/Intro_7__Annotation.html)
or [Audacity](https://manual.audacityteam.org/man/creating_and_selecting_labels.html).
Typically the annotation consists of a file that specifies segments defined by their onsets, offsets, and labels.
Below is an example of some annotated Bengalese finch song, which is what we'll use for the tutorial.

```{image} ../images/annotation_example_for_tutorial.png
:align: center
:alt: spectrogram of Bengalese finch song with amplitude plotted underneath, divided
:  into segments labeled with letters
:scale: 50 %
```

:::{hint}
`vak` has built-in support for widely-used annotation formats.
Even if your data is not annotated with one of these formats,
you can use `vak` by converting your annotations to a simple `.csv` format
that is easy to create with Python libraries like `pandas`.
For more information, please see:
{ref}`howto-user-annot`
:::
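
As an optional illustration for readers comfortable with Python, the sketch below writes one audio file's segments to a `.csv`. It uses the built-in `csv` module, though `pandas` works just as well. The column names `onset_s`, `offset_s`, and `label`, the filename, and the helper name are assumptions chosen for illustration. Check the how-to page linked in the hint for the exact format `vak` expects.

```python
import csv
from pathlib import Path

# Hypothetical segments for one audio file: (onset in s, offset in s, label)
segments = [
    (0.125, 0.250, "i"),
    (0.300, 0.410, "a"),
    (0.455, 0.530, "b"),
]


def write_segments_csv(segments, csv_path):
    """Hypothetical helper: one row per segment (column names are assumed)."""
    csv_path = Path(csv_path)
    with csv_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["onset_s", "offset_s", "label"])
        for onset, offset, label in segments:
            # a segment must end after it starts
            if offset <= onset:
                raise ValueError(f"offset {offset} is not after onset {onset}")
            writer.writerow([onset, offset, label])
    return csv_path


# hypothetical annotation filename for an audio file "bird1_song1.wav"
write_segments_csv(segments, "bird1_song1.wav.csv")
```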

The tutorial is aimed at beginners: you don't need to know how to code.
To work with `vak` you will use simple configuration files that you run from the command line.
If you're not sure what is meant by "configuration file" or "command line",
don't worry, it will all be explained in the following sections.

## Set-up

Before going through this tutorial, you'll need to:

1. Have `vak` installed (following these {ref}`instructions <installation>`).
2. Have a text editor to change a few options in the configuration files
   such as [sublime](https://www.sublimetext.com/), [gedit](https://wiki.gnome.org/Apps/Gedit),
   or [notepad++](https://notepad-plus-plus.org/)
3. Download example data from this dataset: <https://figshare.com/articles/Bengalese_Finch_song_repository/4805749>

   - one day of birdsong, for training data (click to download)
     {download}`https://figshare.com/ndownloader/files/41668980`
   - another day, to use to predict annotations (click to download)
     {download}`https://figshare.com/ndownloader/files/41668983`
   - Be sure to extract the files from these archives!
     Please use the program "tar" to extract the archives,
     on either macOS/Linux or Windows.
     Using other programs like WinZip on Windows
     can corrupt the files when extracting them,
     causing confusing errors.
     Tar should be available on newer Windows systems
     (as described
     [here](https://learn.microsoft.com/en-us/virtualization/community/team-blog/2017/20171219-tar-and-curl-come-to-windows)).
   - Alternatively you can copy the following command and then
     paste it into a terminal to run a Python script
     that will download and extract the files for you.

     :::{eval-rst}

     .. tabs::

        .. code-tab:: shell macOS / Linux

           curl -sSL https://raw.githubusercontent.com/vocalpy/vak/main/src/scripts/download_autoannotate_data.py | python3 -

        .. code-tab:: shell Windows

           (Invoke-WebRequest -Uri https://raw.githubusercontent.com/vocalpy/vak/main/src/scripts/download_autoannotate_data.py -UseBasicParsing).Content | py -
     :::

4. Download the corresponding configuration files (click to download):
   {download}`gy6or6_train.toml <../toml/gy6or6_train.toml>`,
   {download}`gy6or6_eval.toml <../toml/gy6or6_eval.toml>`,
   and {download}`gy6or6_predict.toml <../toml/gy6or6_predict.toml>`
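
If you would rather extract the archives from step 3 in Python than with `tar`, the standard library's `tarfile` module is one option. This is a minimal, optional sketch. The helper name is ours. The archive name in the usage line is a placeholder, because the real names depend on what you downloaded.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Hypothetical helper: extract a tar archive, return its top-level names."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:*") as tar:  # "r:*" detects gzip/bz2/xz
        if hasattr(tarfile, "data_filter"):
            # newer Pythons: refuse unsafe members such as absolute paths
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
        return sorted({Path(member.name).parts[0] for member in tar.getmembers()})


# usage (placeholder archive name):
# extract_archive("PLACEHOLDER.tar.gz", "vak_tutorial_data")
```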

:::{hint}
The config files in this tutorial use options that make the tutorial run faster,
so you can quickly get acquainted with the steps to using `vak`;
these options will not necessarily give you the best-performing models.
Click the following link to download a config file for training models
that modifies and adds options to improve performance.

{download}`gy6or6_train_better.toml <../toml/gy6or6_train_better.toml>`

The main change is an increase in the window size.
For more detail on increasing window size,
see this project:
<https://github.com/vocalpy/Nicholson-Cohen-SfN-2023-poster>.
For more information on other options that are added or changed,
please see the comments in the file.
:::

## Overview

There are five steps to using `vak` to automate annotating vocalizations:

1. {ref}`prepare a training dataset <prepare-training-dataset>`, from
   a small annotated dataset of vocalizations
2. {ref}`train a neural network <train-neural-network>` with that dataset
3. {ref}`evaluate the trained model <evaluate-trained-model>` with a held-out dataset
4. {ref}`prepare a prediction dataset <prepare-prediction-dataset>` of unannotated data
5. {ref}`use the trained network <use-trained-network>` to predict annotations for the prediction dataset

Before doing that, you'll need to know a little bit about working with the shell,
since that's the main way to work with `vak` without writing any code.
You will enter commands into the shell to run `vak`; this is called the
"command line interface". The next section introduces the command line.

## 0. Use of `vak` from the command line

To use the command-line interface to `vak` you will open a program on your computer
that has a name like "terminal", where you can run programs using the shell.
It will look something like this:

```{image} /images/terminalizer/vak-help.gif
```

Basically any time you run `vak`, what you type at the prompt
will have the following form:

```shell
vak command config.toml
```

where `command` will be an actual command, like `prep`, and `config.toml`
will be the name of an actual configuration file that lets you configure
how a command will run.

To see a list of available commands when you are at the command line,
you can run:

```shell
vak --help
```

The `.toml` files are set up so that each table (the TOML name for a section) corresponds to one
of the commands. For example, there is a table called `[vak.prep]` where you
configure how `vak prep` will run.
Each table consists of option-value pairs, i.e., names of options set to the values you assign them.
For example, here is the `[vak.prep]` table from the configuration file
downloaded for training.

```{literalinclude} ../toml/gy6or6_train.toml
:language: toml
:lines: 1-9
```

(The files are in `.toml` format;
for this tutorial we will explain
anything specific about that format
you might need to know.)

:::{topic} Why command line?
A strength of the shell is that it lets you write scripts, so that whatever
you do with data is (more) reproducible. That includes the things you'll do
with your data when you're telling `vak` how to use it to train a neural
network. In a machine learning context, you need to reproduce the same steps
when preparing the data you want to apply the trained network to, so you can
predict its annotation.

If you don't have experience with the shell, we
suggest working through this beginner-friendly tutorial from the Carpentries:

<https://swcarpentry.github.io/shell-novice/>

Although it might seem a bit daunting at first, you can actually work quite
efficiently in the shell once you get familiar with the cryptic commands.
There's only a handful you need on a regular basis.
:::

Now that you know how to call `vak` from the command line, we'll walk through the first example
of modifying a configuration file and then using it to `prep` a dataset.

(prepare-training-dataset)=

## 1. Preparing a training dataset

To train a neural network to predict annotations,
you'll need to tell `vak` where your dataset is.
Do this by opening the `gy6or6_train.toml` file and changing the
value of the `data_dir` option in the `[vak.prep]` table to the
path to wherever you downloaded the training data on your computer.

The options you need to change in the configuration files
have a dummy value in capital letters
to help you pick them out, like so:

```{literalinclude} ../toml/gy6or6_train.toml
:language: toml
:lines: 1-10
```

Change the part of the path in capital letters to the actual location
on your computer:

```toml
[vak.prep]
dataset_type = "frame classification"
input_type = "spect"
# we change the next line
data_dir = "/home/users/You/Data/vak_tutorial_data/032212"
```

:::{note}
Notice that paths are enclosed in quotes; this is required
for paths or any other string (text) in a `toml` file. If you
get an error message about the `toml` file, check that
you have put quotes around the paths.
:::

:::{note}
Note also that you can write paths with just forward slashes,
even on Windows platforms! If you are on Windows,
you might be used to writing paths in Python with two
backwards slashes, like so: `'C:\\Users\\Me\\Data'`,
or placing an `r` in front of text strings representing paths, like
`r'C:\Users\Me\Data'`.
To make paths easier to type and read, we work with them
using the `pathlib` library:
<https://realpython.com/python-pathlib/>.
:::
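
For example, `pathlib` treats forward slashes and escaped backslashes as the same Windows path. This optional sketch uses `PureWindowsPath`, so it behaves the same way on any operating system.

```python
from pathlib import PureWindowsPath

forward = PureWindowsPath("C:/Users/Me/Data")
escaped = PureWindowsPath("C:\\Users\\Me\\Data")
raw = PureWindowsPath(r"C:\Users\Me\Data")

# All three spellings describe the same path
print(forward == escaped == raw)  # True
# Joining parts with "/" also works on Windows paths
print(forward / "032212")  # C:\Users\Me\Data\032212
```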

There is one other option you need to change, `output_dir`,
which tells `vak` where to save the dataset it creates.

```toml
output_dir = "/home/users/You/Data/vak_tutorial_data/vak/prep/train"
```

Make sure that this is a directory that already exists on your computer,
or create the directory using the File Explorer or the `mkdir` command from the command line.
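
If you prefer Python, this optional sketch uses `pathlib` to create the directory along with any missing parent directories, and does nothing if the directory already exists. The relative path is a placeholder; use the same path you put in `output_dir`. The helper name is hypothetical.

```python
from pathlib import Path


def ensure_dir(path):
    """Hypothetical helper: create `path` (and missing parents) if needed."""
    path = Path(path).expanduser()
    path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return path


output_dir = ensure_dir("vak_tutorial_data/vak/prep/train")  # placeholder path
print(output_dir.is_dir())  # True
```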

After you have changed these two options (we'll ignore the others for now),
you can run the command in the terminal that prepares datasets:

```shell
vak prep gy6or6_train.toml
```

Notice that the command has the structure we described above, `vak command config.toml`.

When you run `prep`, `vak` converts the data from `data_dir` into a dataset, saves it in `output_dir`,
and then automatically adds the path to that dataset to the configuration file, as the option
`path` in the `[vak.train.dataset]` table.

You have now prepared a dataset for training a model!
You'll probably have more questions about
how to do this later,
when you start to work with your own data.
When that time comes, please see the how-to page:
{ref}`howto-prep-annotate`.
For now, let's move on to training a neural network with this dataset.

(train-neural-network)=
## 2. Training a neural network model

Now that you've prepared the dataset, you can train a neural network with it.
In this example we will train `TweetyNet`,
a neural network architecture that annotates vocalizations
(see: <https://github.com/yardencsGitHub/tweetynet>).
As of version 1.0, TweetyNet is built into vak.

Before we start training, there is one option you have to change in the `[vak.train]` table
of `gy6or6_train.toml`: `root_results_dir`,
which tells `vak` where to save the files it creates during training.
It's important that you specify this option, so you know
where to find those files when you need them below.

```toml
root_results_dir = "/home/users/You/Data/vak_tutorial_data/vak/train/results"
```

Here it's fine to use the same directory you created before, or make a new one if you prefer to keep the
training data and the files from training the neural network separate.
`vak` will make a new directory inside of `root_results_dir` to save the files related to training
every time that you run the `train` command.

:::{note}
If you are not using a computer with a specialized GPU for training neural networks,
you'll need to change one more option in the .toml configuration file.
Please change the value for the option `device` in the `[TRAIN]` section from
`cuda` to `cpu`, to avoid getting an error about "CUDA not available".
Using a GPU can speed up training, but in practice we find it is quite possible
to train models for annotation on a CPU,
with training times ranging from a couple of hours to overnight.
:::

To train a neural network, you run this command in the shell:

```shell
vak train gy6or6_train.toml
```

You will see output to the console as the network trains. The options in the `[vak.train]` table of
`gy6or6_train.toml` tell `vak` to train until the error (measured on a separate "validation" set)
has not improved for four epochs (an epoch is one pass through the entire training data).
If you let `vak` train until then, it will go through roughly ten epochs (~2 hours on an Ubuntu machine with
an NVIDIA 1080 Ti GPU).
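
This stopping rule is commonly called "early stopping" with a "patience" of four epochs. Below is an optional, generic sketch of the rule. It is not `vak`'s own code, which may differ in details such as whether it tracks error or accuracy. The function name is hypothetical.

```python
def epochs_until_early_stop(val_errors, patience=4):
    """Generic sketch (not vak's implementation) of patience-based early stopping.

    Returns how many epochs run before the validation error has failed
    to improve for `patience` consecutive epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, err in enumerate(val_errors, start=1):
        if err < best:
            best = err
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_errors)  # never triggered: ran every epoch given


# made-up validation errors, one per epoch
print(epochs_until_early_stop([0.50, 0.40, 0.35, 0.36, 0.37, 0.35, 0.38]))  # 7
```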

You can also stop after one epoch if you just want to go through the rest of the tutorial. The `[vak.train]` table
options also specify that `vak` should save a "checkpoint" every epoch; we will load our trained network
from that checkpoint later, when we evaluate it and when we predict annotations for new data.

(evaluate-trained-model)=

## 3. Evaluating a trained model

(prepare-prediction-dataset)=

An important step when using neural network models is to evaluate the model's performance
on a held-out dataset that has never been used during training, often called the "test" set.

Here we show you how to evaluate the model we just trained.

This part requires you to find paths to files saved by `vak`.

There are four paths you need. Three of them will be in the `results` directory
created by `vak` when you ran `train`. If you replaced the dummy path in
capital letters in the config file, but kept the rest of the path,
then this will be a location with a name like
`/PATH/TO/DATA/vak/train/results/results_{timestamp}`,
where `PATH/TO/DATA/` will be replaced with a path on your machine,
and where `{timestamp}` is an actual time in the format `yymmdd_HHMMSS`
(year-month-day hour-minute-second).
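
Because the timestamp is zero-padded and runs from the largest unit to the smallest, sorting the directory names also sorts them by time. This optional sketch finds the most recent `results_{timestamp}` directory, assuming the naming pattern described above. The helper name is hypothetical.

```python
from pathlib import Path


def latest_results_dir(root_results_dir):
    """Hypothetical helper: newest results_{timestamp} dir in root_results_dir."""
    candidates = sorted(
        p for p in Path(root_results_dir).glob("results_*") if p.is_dir()
    )
    if not candidates:
        raise FileNotFoundError(f"no results_* directories in {root_results_dir}")
    # yymmdd_HHMMSS sorts lexicographically in time order
    return candidates[-1]


# usage (placeholder path; use your own root_results_dir):
# print(latest_results_dir("/PATH/TO/DATA/vak/train/results"))
```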

The first path you need is the `checkpoint_path`. This is the full
path, including filename, to the file that contains the weights (also known as parameters)
of the trained neural network, saved by `vak`.
There will be a directory inside the `results_{timestamp}` directory
with the name of the trained model, `TweetyNet`,
and inside that sits a `checkpoints` directory that has the actual file you want.
Typically there will be two checkpoint files, one named just `checkpoint.pt` that is
saved intermittently as a backup,
and another that is saved only when accuracy on the
validation set improves, named `max-val-acc-checkpoint.pt`.
If you were to use the `max-val-acc-checkpoint.pt` then the path would end
with `TweetyNet/checkpoints/max-val-acc-checkpoint.pt`.


```toml
checkpoint_path = "/home/users/You/Data/vak_tutorial_data/vak/train/results/results_{timestamp}/TweetyNet/checkpoints/max-val-acc-checkpoint.pt"
```

In some cases, a `max-val-acc-checkpoint.pt` may not get saved;
this depends on the options for training and non-deterministic factors like
the randomly initialized weights of the network.
For the purposes of completing this tutorial, using either checkpoint is fine.
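
Since either checkpoint is fine here, the optional sketch below prefers `max-val-acc-checkpoint.pt` and falls back to `checkpoint.pt`. It assumes the directory layout described above. The function name is ours, not part of `vak`.

```python
from pathlib import Path


def find_checkpoint(results_dir, model_name="TweetyNet"):
    """Hypothetical helper: prefer the best-validation checkpoint if present."""
    ckpt_dir = Path(results_dir) / model_name / "checkpoints"
    for name in ("max-val-acc-checkpoint.pt", "checkpoint.pt"):
        candidate = ckpt_dir / name
        if candidate.exists():
            return candidate
    raise FileNotFoundError(f"no checkpoint found in {ckpt_dir}")
```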

The second path you want is the one to the file containing the `labelmap`.
The `labelmap` is a Python
dictionary that maps the labels from your annotation to a set of consecutive integers, which
are the outputs the neural network learns to predict during training. It is saved in a `.json`
file in the root `results_{timestamp}` directory.

```toml
labelmap_path = "/home/users/You/Data/vak_tutorial_data/vak/train/results/results_{timestamp}/labelmap.json"
```
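
Here is an optional sketch with a made-up labelmap for three syllable labels. Inverting the dictionary turns the integers the network outputs back into your labels. The labels and integers are hypothetical; open your own `labelmap.json` (for example with `json.load`) to see the real mapping.

```python
import json

# Hypothetical contents of labelmap.json
labelmap = json.loads('{"a": 0, "b": 1, "c": 2}')

# Invert: integer class -> label
int_to_label = {integer: label for label, integer in labelmap.items()}

predicted = [0, 0, 1, 2, 2]  # made-up network outputs, one per time bin
print([int_to_label[i] for i in predicted])  # ['a', 'a', 'b', 'c', 'c']
```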

The third path you need is the path to the file containing a saved `spect_scaler`.
The `SpectScaler` represents a transform
applied to the data that helps when training the neural network.
You need to apply the same transform to any new
data you evaluate or predict labels for; otherwise the accuracy will be impaired.
Note that the file does not have an extension. (In case you are curious,
it's a pickled Python object saved by the `joblib` library.)
This file will also be found in the root `results_{timestamp}` directory.

```toml
spect_scaler = "/home/users/You/Data/vak_tutorial_data/vak/train/results/results_{timestamp}/SpectScaler"
```
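
Conceptually, a scaler like this learns statistics from the training data and then reuses them unchanged on new data. The optional sketch below assumes the transform standardizes each frequency bin, subtracting that bin's mean and dividing by its standard deviation. It is a hypothetical stand-in written with the standard library, not `vak`'s `SpectScaler` class.

```python
from statistics import fmean, pstdev


class FreqBinStandardizer:
    """Hypothetical stand-in: z-score each frequency bin of a spectrogram.

    A spectrogram here is a list of rows, one row per frequency bin,
    one value per time bin.
    """

    def fit(self, spect):
        # statistics come from the *training* data only
        self.means = [fmean(row) for row in spect]
        # fall back to 1.0 to avoid dividing by zero for constant bins
        self.stds = [pstdev(row) or 1.0 for row in spect]
        return self

    def transform(self, spect):
        return [
            [(value - mean) / std for value in row]
            for row, mean, std in zip(spect, self.means, self.stds)
        ]


train_spect = [[1.0, 2.0, 3.0], [10.0, 10.0, 10.0]]
scaler = FreqBinStandardizer().fit(train_spect)

new_spect = [[2.0, 2.0, 2.0], [10.0, 11.0, 12.0]]
print(scaler.transform(new_spect))  # [[0.0, 0.0, 0.0], [0.0, 1.0, 2.0]]
```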

The last path you need is actually in the TOML file that we used
to train the neural network: the dataset `path`.
You should copy that `path` option exactly as it is
and then paste it at the bottom of the `[vak.eval.dataset]` table
in the configuration file for evaluation.

```toml
[vak.eval.dataset]
# copy the dataset path from the train config file here;
# we will use the "test" split from that dataset, that we already prepared
path = "/home/users/You/Data/vak_tutorial_data/vak/prep/train/dataset_prepared_20240811"
```

We do this instead of preparing another dataset,
because we already created a test split when we ran
`vak prep` with the training configuration.
This is a good practice, because it helps ensure
that we do not mix the training data with the test data;
`vak` makes sure that the data from the `data_dir` option
is placed in separate splits: train, validation, and test.

Once you have prepared the configuration file as described,
you can run the following in the terminal:

```shell
vak eval gy6or6_eval.toml
```

You will see output to the console as the network is evaluated.
Notice that for this model we evaluate it *with* and *without*
post-processing transforms that clean up the predictions
of the model.
The parameters of the post-processing transform are specified
with the `post_tfm_kwargs` option in the configuration file.
You may find this helpful to understand factors affecting
the performance of your own model.
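
To give a feel for what such clean-up does, here is an optional, generic sketch of two common steps. The first drops segments shorter than a minimum duration. The second gives each remaining segment the label found in most of its frames (a "majority vote"). The sketch assumes a segment is a run of frames not labeled as background. The function, the background label `"-"`, and the parameter names are illustrative, and `vak`'s own transforms may differ in details.

```python
from collections import Counter
from itertools import groupby


def clean_frame_labels(frame_labels, frame_dur, min_segment_dur, background="-"):
    """Hypothetical sketch: drop short segments, then majority-vote each segment."""
    cleaned = list(frame_labels)
    start = 0
    for is_background, run in groupby(frame_labels, key=lambda lbl: lbl == background):
        run = list(run)
        stop = start + len(run)
        if not is_background:
            if len(run) * frame_dur < min_segment_dur:
                # too short: treat the whole segment as background
                cleaned[start:stop] = [background] * len(run)
            else:
                # most frequent label in the segment wins
                majority = Counter(run).most_common(1)[0][0]
                cleaned[start:stop] = [majority] * len(run)
        start = stop
    return cleaned


frames = ["-", "a", "a", "b", "a", "-", "c", "-"]
print(clean_frame_labels(frames, frame_dur=0.01, min_segment_dur=0.02))
# ['-', 'a', 'a', 'a', 'a', '-', '-', '-']
```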

## 4. Preparing a prediction dataset

Next you'll prepare a dataset for prediction. The dataset we downloaded has annotation files with it,
but for the sake of this tutorial we'll pretend that it's not annotated, and that we instead want to
predict the annotation using our trained network.
Here we'll use the other configuration file you downloaded above, `gy6or6_predict.toml`.
We use a separate file with a `[vak.predict]` table in it instead of a `[vak.train]` table, so that
`vak` knows the dataset it's going to prepare will be for prediction; it figures this out
from the name of the table present in the file.

Just like before, you're going to modify the `data_dir` option of the
`[vak.prep]` table, this time in `gy6or6_predict.toml`.
Change it to the path to the directory with the other day of data you downloaded.

```toml
[vak.prep]
data_dir = "/home/users/You/Data/vak_tutorial_data/032312"
```

And again, you'll need to change the `output_dir` option
to tell `vak` where to save the dataset it creates.

```toml
output_dir = "/home/users/You/Data/vak_tutorial_data/vak/prep/predict"
```

This part is the same as before too: after you change these options,
you'll run the `prep` command to prepare the dataset for prediction:

```shell
vak prep gy6or6_predict.toml
```

As you might guess from last time, `vak` will prepare the dataset and then add its path
as the option `path` in the `[vak.predict.dataset]` table of `gy6or6_predict.toml`.

(use-trained-network)=

## 5. Using a trained network to predict annotations

Finally you will use the trained network to predict annotations.
This is the part that requires you to find paths to files saved by `vak`.

There are three paths you need. These are the exact same paths we used above
in the configuration file for evaluation, so you can copy them from that file.
We explain them again here for completeness.
All three paths will be in the `results` directory
created by `vak` when you ran `train`. If you replaced the dummy path in
capital letters in the config file, but kept the rest of the path,
then this will be a location with a name like
`/PATH/TO/DATA/vak/train/results/results_{timestamp}`,
where `PATH/TO/DATA/` will be replaced with a path on your machine,
and where `{timestamp}` is an actual time in the format `yymmdd_HHMMSS`
(year-month-day hour-minute-second).

The first path you need is the `checkpoint_path`. This is the full
path, including filename, to the file that contains the weights (also known as parameters)
of the trained neural network, saved by `vak`.
There will be a directory inside the `results_{timestamp}` directory
with the name of the trained model, `TweetyNet`,
and inside that sits a `checkpoints` directory that has the actual file you want.
Typically there will be two checkpoint files, one named just `checkpoint.pt` that is
saved intermittently as a backup,
and another that is saved only when accuracy on the
validation set improves, named `max-val-acc-checkpoint.pt`.
If you were to use the `max-val-acc-checkpoint.pt` then the path would end
with `TweetyNet/checkpoints/max-val-acc-checkpoint.pt`.

```toml
checkpoint_path = "/home/users/You/Data/vak_tutorial_data/vak/train/results/results_{timestamp}/TweetyNet/checkpoints/max-val-acc-checkpoint.pt"
```

In some cases, a `max-val-acc-checkpoint.pt` may not get saved;
this depends on the options for training and non-deterministic factors like
the randomly initialized weights of the network.
For the purposes of completing this tutorial, using either checkpoint is fine.

The second path you want is the one to the file containing the `labelmap`.
The `labelmap` is a Python
dictionary that maps the labels from your annotation to a set of consecutive integers, which
are the outputs the neural network learns to predict during training. It is saved in a `.json`
file in the root `results_{timestamp}` directory.

```toml
labelmap_path = "/home/users/You/Data/vak_tutorial_data/vak/train/results/results_{timestamp}/labelmap.json"
```

The third and last path you need is the path to the file containing a saved `spect_scaler`.
The `SpectScaler` represents a transform
applied to the data that helps when training the neural network.
You need to apply the same transform to the new
data for which you are predicting labels; otherwise the accuracy will be impaired.
Note that the file does not have an extension. (In case you are curious,
it's a pickled Python object saved by the `joblib` library.)
This file will also be found in the root `results_{timestamp}` directory.

```toml
spect_scaler = "/home/users/You/Data/vak_tutorial_data/vak/train/results/results_{timestamp}/SpectScaler"
```

After adding the paths to these files generated during training,
you can specify an `output_dir` where the predicted annotations are saved.
The annotations are saved as a .csv file, written with `crowsetta`,
a separate software tool for working with annotations. You can also specify the name
of this .csv file. For this tutorial, you can set both so that the file
is saved in the place where `prep` put the dataset it created for
predictions, just to have everything in one place.

```toml
output_dir = "/home/users/You/Data/vak_tutorial_data/vak/prep/predict"
annot_csv_filename = "gy6or6.032312.annot.csv"
```
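
Once `vak predict` has run, a quick sanity check is to count how many segments were predicted for each label. This optional sketch assumes only that the .csv has a header row with a `label` column. Check the header of your own file, since the exact columns come from `crowsetta`. The helper name is hypothetical.

```python
import csv
from collections import Counter


def count_labels(annot_csv_path):
    """Hypothetical helper: count predicted segments per label in a .csv."""
    with open(annot_csv_path, newline="") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames is None or "label" not in reader.fieldnames:
            raise ValueError(f"no 'label' column in {annot_csv_path}")
        return Counter(row["label"] for row in reader)


# usage (run from output_dir; the name is whatever you set in annot_csv_filename):
# print(count_labels("gy6or6.032312.annot.csv").most_common())
```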

:::{note}
Here, just as above for training, if you're not using a computer with a GPU,
you'll want to change the option `device` in the `[PREDICT]` section
of the .toml configuration file from `cuda` to `cpu`.
:::

Finally, after adding these paths,
you can run the `predict` command to generate annotation files from the labels
predicted by the trained neural network.

```shell
vak predict gy6or6_predict.toml
```
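
Roughly speaking, turning per-frame predictions into an annotation means merging runs of identical labels into segments with onsets and offsets. The optional sketch below shows that idea under simple assumptions: a fixed duration per frame, and a hypothetical `"unlabeled"` background class that is not written out as a segment. It is an illustration, not the code `vak predict` runs.

```python
from itertools import groupby


def frames_to_segments(frame_labels, frame_dur, background="unlabeled"):
    """Hypothetical sketch: merge runs of frame labels into (onset_s, offset_s, label)."""
    segments = []
    start = 0  # index of the first frame in the current run
    for label, run in groupby(frame_labels):
        n_frames = sum(1 for _ in run)
        if label != background:
            onset = round(start * frame_dur, 6)
            offset = round((start + n_frames) * frame_dur, 6)
            segments.append((onset, offset, label))
        start += n_frames
    return segments


frames = ["unlabeled", "a", "a", "unlabeled", "b", "b", "b"]
print(frames_to_segments(frames, frame_dur=0.01))
# [(0.01, 0.03, 'a'), (0.04, 0.07, 'b')]
```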

That's it! With those five simple steps you can train neural networks,
evaluate the trained models, and then use the
trained networks to predict annotations for vocalizations.