Configuration files

This document specifies the .toml configuration files used when running vak commands through the command-line interface, as described in the documentation for the vak command-line interface.

A .toml configuration file is split up into sections. The sections and their valid options are represented in the vak code by classes. To ensure that the code and this documentation do not go out of sync, the options are presented below exactly as documented in the code for each class.

Valid section names

Following is the set of valid section names: {eval, learncurve, predict, prep, train}. In the code, these names correspond to attributes of the main Config class, as shown below.

The only other valid section name is the name of a class representing a neural network. For such a section to be recognized as valid, the model must be installed via the vak.models entry point, so that the function vak.config.validators.is_valid_model_name can recognize its name.
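
For orientation, here is a sketch of the overall layout of a config.toml file using these table names. Which tables appear depends on the command you plan to run; typically a file contains the [vak.prep] table plus the table for one command. Option values are omitted here; see the sections below for the valid options in each table.

    # hypothetical config.toml layout; table names are the valid section names listed above
    [vak.prep]
    # options for preparing a dataset

    [vak.train]
    # options for training a model

    [vak.eval]
    # options for evaluating a trained model

    [vak.predict]
    # options for generating predictions with a trained model

    [vak.learncurve]
    # options for generating a learning curve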

class vak.config.config.Config(prep=None, train=None, eval=None, predict=None, learncurve=None)[source]

Class that represents the TOML configuration file used with the vak command-line interface.

prep

Represents [vak.prep] table of config.toml file

Type:

vak.config.prep.PrepConfig

train

Represents [vak.train] table of config.toml file

Type:

vak.config.train.TrainConfig

eval

Represents [vak.eval] table of config.toml file

Type:

vak.config.eval.EvalConfig

predict

Represents [vak.predict] table of config.toml file.

Type:

vak.config.predict.PredictConfig

learncurve

Represents [vak.learncurve] table of config.toml file

Type:

vak.config.learncurve.LearncurveConfig

Valid Options by Section

Each section of the .toml config has a set of option names that are considered valid. Valid options for each section are presented below.

[vak.prep] section

class vak.config.prep.PrepConfig(data_dir, output_dir, dataset_type, input_type, audio_format=None, spect_format=None, spect_params=None, annot_file=None, annot_format=None, labelset=None, audio_dask_bag_kwargs=None, train_dur=None, val_dur=None, test_dur=None, train_set_durs=None, num_replicates=None)[source]

Class that represents [vak.prep] table of configuration file.

data_dir

path to directory with files from which to make dataset

Type:

str

output_dir

Path to location where data sets should be saved. Default is None, in which case data sets are saved in the current working directory.

Type:

str

dataset_type

String name of the type of dataset, e.g., ‘frame_classification’. Dataset types are defined by machine learning tasks, e.g., a ‘frame_classification’ dataset would be used with a vak.models.FrameClassificationModel model. Valid dataset types are listed in vak.prep.prep.DATASET_TYPES.

Type:

str

audio_format

format of audio files. One of {‘wav’, ‘cbin’}.

Type:

str

spect_format

format of files containing spectrograms as 2-d matrices. One of {‘mat’, ‘npy’}.

Type:

str

spect_params

Parameters for Short-Time Fourier Transform and post-processing of spectrograms. Instance of vak.config.SpectParamsConfig class. Optional, default is None.

Type:

vak.config.SpectParamsConfig, optional

annot_format

format of annotations. Any format that can be used with the crowsetta library is valid.

Type:

str

annot_file

Path to a single annotation file. Default is None. Used when a single file contains annotations for multiple audio files.

Type:

str

labelset

Set of str or int: the set of labels that correspond to annotated segments that a network should learn to segment and classify. Note that if there are segments that are not annotated, e.g. silent gaps between songbird syllables, then vak will assign a dummy label to those segments; you don’t have to give them a label here. The value for labelset is converted to a Python set using vak.config.converters.labelset_from_toml_value. See the help for that function for details on how to specify labelset.

Type:

set

audio_dask_bag_kwargs

Keyword arguments used when calling dask.bag.from_sequence inside vak.io.audio, where it is used to parallelize the conversion of audio files into spectrograms. Option should be specified in config.toml file as an inline table, e.g., audio_dask_bag_kwargs = { npartitions = 20 }. Allows for finer-grained control when needed to process files of different sizes.

Type:

dict

train_dur

total duration of training set, in seconds. When creating a learning curve, training subsets of shorter duration (specified by the ‘train_set_durs’ option in this table) will be drawn from this set.

Type:

float

val_dur

total duration of validation set, in seconds.

Type:

float

test_dur

total duration of test set, in seconds.

Type:

float

train_set_durs

Durations of datasets to use for a learning curve. Float values, durations in seconds of subsets taken from training data to create a learning curve, e.g. [5., 10., 15., 20.]. Default is None. Required if config file has a learncurve section.

Type:

list, optional

num_replicates

Number of replicates to train for each training set duration in a learning curve. Each replicate uses a different randomly drawn subset of the training data (but of the same duration). Default is None. Required if config file has a learncurve section.

Type:

int, optional
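
As a concrete illustration, a [vak.prep] table might look like the sketch below. The option names are the ones documented above; the paths, formats, labels, and durations are hypothetical values, and the input_type value shown is an assumption (the parameter appears in the class signature but is not documented on this page).

    [vak.prep]
    data_dir = "~/data/bird1/audio"           # hypothetical directory of audio + annotation files
    output_dir = "~/data/bird1/prepared"      # prepared dataset is saved here
    dataset_type = "frame_classification"     # see vak.prep.prep.DATASET_TYPES
    input_type = "spect"                      # assumption: value not documented on this page
    audio_format = "wav"
    annot_format = "simple-seq"               # any format the crowsetta library supports
    labelset = "abcde"                        # converted to a set by vak.config.converters.labelset_from_toml_value
    train_dur = 50.0                          # seconds
    val_dur = 15.0
    test_dur = 30.0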

[vak.prep.spect_params] section

class vak.config.spect_params.SpectParamsConfig(fft_size=512, step_size=64, freq_cutoffs=None, thresh=None, transform_type=None, spect_key='s', freqbins_key='f', timebins_key='t', audio_path_key='audio_path')[source]

represents parameters for making spectrograms from audio and saving in files

fft_size

size of window for Fast Fourier transform, number of time bins. Default is 512.

Type:

int

step_size

step size for Fast Fourier transform. Default is 64.

Type:

int

freq_cutoffs

Tuple of two elements: the lower and higher frequencies used to bandpass filter audio (with a Butterworth filter) before generating the spectrogram. Default is None, in which case no bandpass filtering is applied.

Type:

tuple

transform_type

one of {‘log_spect’, ‘log_spect_plus_one’}. ‘log_spect’ transforms the spectrogram to log(spectrogram), and ‘log_spect_plus_one’ does the same thing but adds one to each element. Default is None. If None, no transform is applied.

Type:

str

thresh

threshold minimum power for log spectrogram.

Type:

int

spect_key

key for accessing spectrogram in files. Default is ‘s’.

Type:

str

freqbins_key

key for accessing vector of frequency bins in files. Default is ‘f’.

Type:

str

timebins_key

key for accessing vector of time bins in files. Default is ‘t’.

Type:

str

audio_path_key

key for accessing path to source audio file for spectrogram in files. Default is ‘audio_path’.

Type:

str
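
A sketch of the [vak.prep.spect_params] table, shown with the documented defaults where they exist and illustrative values otherwise:

    [vak.prep.spect_params]
    fft_size = 512                    # documented default
    step_size = 64                    # documented default
    freq_cutoffs = [500, 10000]       # illustrative bandpass range, in Hz
    transform_type = "log_spect"
    # spect_key, freqbins_key, timebins_key, and audio_path_key default to
    # "s", "f", "t", and "audio_path", so they usually do not need to be set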

[vak.train] section

class vak.config.train.TrainConfig(model, num_epochs, batch_size, root_results_dir, dataset: DatasetConfig, trainer: TrainerConfig, results_dirname=None, standardize_frames=False, num_workers=2, shuffle=True, val_step=None, ckpt_step=None, patience=None, checkpoint_path=None, frames_standardizer_path=None)[source]

Class that represents [vak.train] table of configuration file.

model

The model to use: its name, and the parameters to configure it. Must be an instance of vak.config.ModelConfig

Type:

vak.config.ModelConfig

num_epochs

number of training epochs. One epoch = one iteration through the entire training set.

Type:

int

batch_size

number of samples per batch presented to models during training.

Type:

int

root_results_dir

directory in which results will be created. The vak.cli.train function will create a subdirectory in this directory each time it runs.

Type:

str

dataset

The dataset to use: the path to it; optionally, a path to a file representing splits; and the name, if it is a built-in dataset. Must be an instance of vak.config.DatasetConfig.

Type:

vak.config.DatasetConfig

trainer

Configuration for lightning.pytorch.Trainer. Must be an instance of vak.config.TrainerConfig.

Type:

vak.config.TrainerConfig

num_workers

Number of processes to use for parallel loading of data. Argument to torch.DataLoader.

Type:

int

shuffle

if True, shuffle training data before each epoch. Default is True.

Type:

bool

standardize_frames

if True, use vak.transforms.FramesStandardizer to standardize the frames. Standardization is done by subtracting the mean of each frequency bin (row) of the training set and then dividing by the standard deviation of that bin. The same transform is then applied to the validation and test data.

Type:

bool

val_step

Step on which to estimate accuracy using the validation set. If val_step is n, then validation is carried out every time the global step is divisible by n, i.e., when the global step modulo val_step is 0. Default is None, in which case no validation is done.

Type:

int

ckpt_step

Step on which to save to checkpoint file. If ckpt_step is n, then a checkpoint is saved every time the global step is divisible by n, i.e., when the global step modulo ckpt_step is 0. Default is None, in which case a checkpoint is only saved at the last epoch.

Type:

int

patience

number of validation steps to wait without performance on the validation set improving before stopping the training. Default is None, in which case training only stops after the specified number of epochs.

Type:

int

checkpoint_path

path to directory with checkpoint files saved by Torch, to reload model. Default is None, in which case a new model is initialized.

Type:

str

frames_standardizer_path

path to a saved vak.transforms.FramesStandardizer object used to standardize (normalize) frames. If spectrograms were normalized and this is not provided, will give incorrect results. Default is None.

Type:

str
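
A sketch of a [vak.train] table follows. The scalar options are the ones documented above, with illustrative values; the sub-table layout shown for dataset and model (a dataset table with a path key, and a model table named after the model class, here TweetyNet) is an assumption made for illustration, since this page only documents those options as vak.config.DatasetConfig and vak.config.ModelConfig instances.

    [vak.train]
    root_results_dir = "~/results/train"      # a results subdirectory is created here on each run
    batch_size = 8
    num_epochs = 50
    standardize_frames = true
    num_workers = 4
    val_step = 400                            # validate whenever the global step is divisible by 400
    ckpt_step = 200                           # checkpoint whenever the global step is divisible by 200
    patience = 4                              # stop after 4 validation steps without improvement

    # assumed sub-table layout for the dataset and model options
    [vak.train.dataset]
    path = "~/data/bird1/prepared/dataset"    # hypothetical path to a prepared dataset

    [vak.train.model.TweetyNet]
    # model-specific parameters, if any, would go here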

[vak.eval] section

class vak.config.eval.EvalConfig(checkpoint_path, output_dir, model, batch_size, dataset: DatasetConfig, trainer: TrainerConfig, labelmap_path=None, frames_standardizer_path=None, post_tfm_kwargs: dict | None = None, num_workers=2)[source]

Class that represents [vak.eval] table in configuration file.

checkpoint_path

path to directory with checkpoint files saved by Torch, to reload model

Type:

str

output_dir

Path to location where .csv files with evaluation metrics should be saved.

Type:

str

model

The model to use: its name, and the parameters to configure it. Must be an instance of vak.config.ModelConfig

Type:

vak.config.ModelConfig

batch_size

number of samples per batch presented to models during evaluation.

Type:

int

dataset

The dataset to use: the path to it; optionally, a path to a file representing splits; and the name, if it is a built-in dataset. Must be an instance of vak.config.DatasetConfig.

Type:

vak.config.DatasetConfig

trainer

Configuration for lightning.pytorch.Trainer. Must be an instance of vak.config.TrainerConfig.

Type:

vak.config.TrainerConfig

num_workers

Number of processes to use for parallel loading of data. Argument to torch.DataLoader. Default is 2.

Type:

int

labelmap_path

path to ‘labelmap.json’ file.

Type:

str

frames_standardizer_path

path to a saved vak.transforms.FramesStandardizer object used to standardize (normalize) frames. If spectrograms were normalized and this is not provided, will give incorrect results.

Type:

str

post_tfm_kwargs

Keyword arguments to the post-processing transform. If None, then no additional clean-up is applied when transforming labeled timebins to segments, the default behavior. The transform used is vak.transforms.frame_labels.PostProcess. Valid keyword argument names are ‘majority_vote’ and ‘min_segment_dur’, with values of the appropriate type: a Boolean for majority_vote and a float for min_segment_dur. See the docstring of the transform for more details on these arguments and how they work.

Type:

dict
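
A sketch of a [vak.eval] table, with hypothetical paths. post_tfm_kwargs is written as an inline table by analogy with the audio_dask_bag_kwargs option documented above; that form is an assumption rather than something stated on this page. The dataset and model options would be configured as in the [vak.train] sketch.

    [vak.eval]
    checkpoint_path = "~/results/train/run1/checkpoints"                   # hypothetical path to saved checkpoints
    labelmap_path = "~/results/train/run1/labelmap.json"
    frames_standardizer_path = "~/results/train/run1/FramesStandardizer"   # hypothetical filename
    output_dir = "~/results/eval"
    batch_size = 4
    post_tfm_kwargs = { majority_vote = true, min_segment_dur = 0.02 }     # optional post-processing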

[vak.predict] section

class vak.config.predict.PredictConfig(checkpoint_path, labelmap_path, model, batch_size, dataset: DatasetConfig, trainer: TrainerConfig, frames_standardizer_path=None, num_workers=2, annot_csv_filename=None, output_dir=PosixPath('.'), min_segment_dur=None, majority_vote=True, save_net_outputs=False)[source]

Class that represents [vak.predict] table of configuration file.

checkpoint_path

path to directory with checkpoint files saved by Torch, to reload model

Type:

str

labelmap_path

path to ‘labelmap.json’ file.

Type:

str

model

The model to use: its name, and the parameters to configure it. Must be an instance of vak.config.ModelConfig

Type:

vak.config.ModelConfig

batch_size

number of samples per batch presented to models when generating predictions.

Type:

int

dataset

The dataset to use: the path to it; optionally, a path to a file representing splits; and the name, if it is a built-in dataset. Must be an instance of vak.config.DatasetConfig.

Type:

vak.config.DatasetConfig

trainer

Configuration for lightning.pytorch.Trainer. Must be an instance of vak.config.TrainerConfig.

Type:

vak.config.TrainerConfig

num_workers

Number of processes to use for parallel loading of data. Argument to torch.DataLoader. Default is 2.

Type:

int

frames_standardizer_path

path to a saved vak.transforms.FramesStandardizer object used to standardize (normalize) frames. If spectrograms were normalized and this is not provided, will give incorrect results.

Type:

str

annot_csv_filename

name of .csv file containing predicted annotations. Default is None, in which case the name of the dataset .csv is used, with ‘.annot.csv’ appended to it.

Type:

str

output_dir

path to location where .csv containing predicted annotation should be saved. Defaults to current working directory.

Type:

str

min_segment_dur

minimum duration of segment, in seconds. If specified, then any segment with a duration less than min_segment_dur is removed from lbl_tb. Default is None, in which case no segments are removed.

Type:

float

majority_vote

if True, transform segments containing multiple labels into segments with a single label by taking a “majority vote”, i.e., assign all time bins in the segment the most frequently occurring label in the segment. This transform can only be applied if the labelmap contains an ‘unlabeled’ label, because unlabeled segments make it possible to identify the labeled segments. Default is True.

Type:

bool

save_net_outputs

If True, save ‘raw’ outputs of neural networks before they are converted to annotations. Default is False. Typically the output will be “logits” to which a softmax transform might be applied. For each item in the dataset (each row in the dataset_path .csv), the output will be saved in a separate file in output_dir, with the extension {MODEL_NAME}.output.npz. E.g., if the input is a spectrogram with spect_path filename gy6or6_032312_081416.npz, and the network is TweetyNet, then the net output file will be gy6or6_032312_081416.tweetynet.output.npz.

Type:

bool
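
A sketch of a [vak.predict] table, again with hypothetical paths and filenames; the dataset and model options would be configured as in the [vak.train] sketch.

    [vak.predict]
    checkpoint_path = "~/results/train/run1/checkpoints"     # hypothetical path to saved checkpoints
    labelmap_path = "~/results/train/run1/labelmap.json"
    output_dir = "~/results/predict"
    annot_csv_filename = "bird1.annot.csv"                   # hypothetical output filename
    batch_size = 4
    majority_vote = true
    min_segment_dur = 0.02
    save_net_outputs = false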

[vak.learncurve] section

class vak.config.learncurve.LearncurveConfig(model, num_epochs, batch_size, root_results_dir, dataset: DatasetConfig, trainer: TrainerConfig, results_dirname=None, standardize_frames=False, num_workers=2, shuffle=True, val_step=None, ckpt_step=None, patience=None, checkpoint_path=None, frames_standardizer_path=None, post_tfm_kwargs: dict | None = None)[source]

Class that represents [vak.learncurve] table in configuration file.

model

The model to use: its name, and the parameters to configure it. Must be an instance of vak.config.ModelConfig

Type:

vak.config.ModelConfig

num_epochs

number of training epochs. One epoch = one iteration through the entire training set.

Type:

int

batch_size

number of samples per batch presented to models during training.

Type:

int

root_results_dir

directory in which results will be created. The vak.cli.train function will create a subdirectory in this directory each time it runs.

Type:

str

dataset

The dataset to use: the path to it; optionally, a path to a file representing splits; and the name, if it is a built-in dataset. Must be an instance of vak.config.DatasetConfig.

Type:

vak.config.DatasetConfig

trainer

Configuration for lightning.pytorch.Trainer. Must be an instance of vak.config.TrainerConfig.

Type:

vak.config.TrainerConfig

num_workers

Number of processes to use for parallel loading of data. Argument to torch.DataLoader.

Type:

int

shuffle

if True, shuffle training data before each epoch. Default is True.

Type:

bool

standardize_frames

if True, use vak.transforms.FramesStandardizer to standardize the frames. Standardization is done by subtracting the mean of each frequency bin (row) of the training set and then dividing by the standard deviation of that bin. The same transform is then applied to the validation and test data.

Type:

bool

val_step

Step on which to estimate accuracy using the validation set. If val_step is n, then validation is carried out every time the global step is divisible by n, i.e., when the global step modulo val_step is 0. Default is None, in which case no validation is done.

Type:

int

ckpt_step

step/epoch at which to save to checkpoint file. Default is None, in which case checkpoint is only saved at the last epoch.

Type:

int

patience

number of epochs to wait without the error dropping before stopping the training. Default is None, in which case training continues for num_epochs.

Type:

int

post_tfm_kwargs

Keyword arguments to the post-processing transform. If None, then no additional clean-up is applied when transforming labeled timebins to segments, the default behavior. The transform used is vak.transforms.frame_labels.ToSegmentsWithPostProcessing. Valid keyword argument names are ‘majority_vote’ and ‘min_segment_dur’, with values of the appropriate type: a Boolean for majority_vote and a float for min_segment_dur. See the docstring of the transform for more details on these arguments and how they work.

Type:

dict
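
Finally, a sketch of a [vak.learncurve] table. The options shown are the ones documented above, with illustrative values; the dataset and model options would be configured as in the [vak.train] sketch, and the train_set_durs and num_replicates that define the learning curve are set in the [vak.prep] table. As in the [vak.eval] sketch, writing post_tfm_kwargs as an inline table is an assumption.

    [vak.learncurve]
    root_results_dir = "~/results/learncurve"
    batch_size = 8
    num_epochs = 50
    standardize_frames = true
    val_step = 400
    ckpt_step = 200
    patience = 4
    post_tfm_kwargs = { majority_vote = true, min_segment_dur = 0.02 }   # optional post-processing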