Configuration files#
This document contains the specification for the .toml configuration files used when running vak commands through the command-line interface, as described in vak command-line interface.

A .toml configuration file is split up into sections. The sections and their valid options are represented in the vak code by classes. To ensure that the code and this documentation do not go out of sync, the options are presented below exactly as they are documented in the code for each class.
Valid section names#
Following is the set of valid section names: {PREP, SPECT_PARAMS, DATALOADER, TRAIN, EVAL, PREDICT, LEARNCURVE}. In the code, these names correspond to attributes of the main Config class, as shown below.

The only other valid section name is the name of a class representing a neural network. For such a section to be recognized as valid, the model must be installed via the vak.models entry point, so that it can be recognized by the function vak.config.validators.is_valid_model_name.
- class vak.config.config.Config(spect_params=SpectParamsConfig(fft_size=512, step_size=64, freq_cutoffs=None, thresh=None, transform_type=None, spect_key='s', freqbins_key='f', timebins_key='t', audio_path_key='audio_path'), dataloader=DataLoaderConfig(window_size=88), prep=None, train=None, eval=None, predict=None, learncurve=None)[source]#
class to represent config.toml file
- prep#
represents [PREP] section of config.toml file
- spect_params#
represents [SPECT_PARAMS] section of config.toml file
- dataloader#
represents [DATALOADER] section of config.toml file
- train#
represents [TRAIN] section of config.toml file
- eval#
represents [EVAL] section of config.toml file
- predict#
represents [PREDICT] section of config.toml file
- learncurve#
represents [LEARNCURVE] section of config.toml file
Valid Options by Section#
Each section of the .toml config file has a set of option names that are considered valid. Valid options for each section are presented below.
[PREP] section#
- class vak.config.prep.PrepConfig(data_dir, output_dir, audio_format=None, spect_format=None, spect_output_dir=None, annot_file=None, annot_format=None, labelset=None, audio_dask_bag_kwargs=None, train_dur=None, val_dur=None, test_dur=None)[source]#
class to represent [PREP] section of config.toml file
- output_dir#
Path to location where data sets should be saved. Default is None, in which case data sets are saved in the current working directory.
- spect_format#
format of files containing spectrograms as 2-d matrices. One of {‘mat’, ‘npy’}.
- spect_output_dir#
path to directory where array files containing spectrograms should be saved, when generated from audio files. Default is None, in which case the spectrogram files are saved in data_dir by vak.io.dataframe.from_files.
- annot_format#
format of annotations. Any format that can be used with the crowsetta library is valid.
- annot_file#
Path to a single annotation file. Default is None. Used when a single file contains annotations for multiple audio files.
- labelset#
set of str or int: the set of labels that correspond to annotated segments that a network should learn to segment and classify. Note that if there are segments that are not annotated, e.g. silent gaps between songbird syllables, then vak will assign a dummy label to those segments; you don’t have to give them a label here. The value for labelset is converted to a Python set using vak.config.converters.labelset_from_toml_value. See the help for that function for details on how to specify labelset.
- audio_dask_bag_kwargs#
Keyword arguments used when calling dask.bag.from_sequence inside vak.io.audio, where it is used to parallelize the conversion of audio files into spectrograms. The option should be specified in the config.toml file as an inline table, e.g., audio_dask_bag_kwargs = { npartitions = 20 }. Allows for finer-grained control when needed to process files of different sizes.
- train_dur#
total duration of training set, in seconds. When creating a learning curve, training subsets of shorter duration (specified by the ‘train_set_durs’ option in the LEARNCURVE section of a config.toml file) will be drawn from this set.
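A hypothetical [PREP] section using the options above might look like the following sketch; the paths, durations, and labelset are illustrative placeholders, not values from a real dataset.

```toml
[PREP]
data_dir = "./data/bird1"        # hypothetical path to audio + annotation files
output_dir = "./prep_output"     # where the prepared dataset is saved
audio_format = "wav"
annot_format = "notmat"          # any format the crowsetta library accepts
labelset = "iabcdefghjk"
train_dur = 50                   # seconds
val_dur = 15
test_dur = 30
audio_dask_bag_kwargs = { npartitions = 20 }   # inline table, as described above
```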
[SPECT_PARAMS] section#
- class vak.config.spect_params.SpectParamsConfig(fft_size=512, step_size=64, freq_cutoffs=None, thresh=None, transform_type=None, spect_key='s', freqbins_key='f', timebins_key='t', audio_path_key='audio_path')[source]#
represents parameters for making spectrograms from audio and saving in files
- freq_cutoffs#
two elements, the lower and higher frequencies, used to bandpass filter audio (using a Butter filter) before generating the spectrogram. Default is None, in which case no bandpass filtering is applied.
- transform_type#
one of {‘log_spect’, ‘log_spect_plus_one’}. ‘log_spect’ transforms the spectrogram to log(spectrogram), and ‘log_spect_plus_one’ does the same thing but adds one to each element. Default is None. If None, no transform is applied.
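A sketch of the corresponding section; the cutoff frequencies are placeholder values:

```toml
[SPECT_PARAMS]
fft_size = 512
step_size = 64
freq_cutoffs = [ 500, 10000 ]   # bandpass range, lower and higher frequency (placeholders)
transform_type = "log_spect"
```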
[DATALOADER] section#
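The valid option for this section comes from the DataLoaderConfig class shown in the Config signature above, which has a single window_size option. A sketch, using the default value from that signature:

```toml
[DATALOADER]
window_size = 88   # default value from DataLoaderConfig
```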
[TRAIN] section#
- class vak.config.train.TrainConfig(models, num_epochs, batch_size, root_results_dir, csv_path=None, results_dirname=None, normalize_spectrograms=False, num_workers=2, device='cpu', shuffle=True, val_step=None, ckpt_step=None, patience=None, checkpoint_path=None, spect_scaler_path=None, labelmap_path=None)[source]#
class that represents [TRAIN] section of config.toml file
- num_epochs#
number of training epochs. One epoch = one iteration through the entire training set.
- root_results_dir#
directory in which results will be created. The vak.cli.train function will create a subdirectory in this directory each time it runs.
- num_workers#
Number of processes to use for parallel loading of data. Argument to torch.DataLoader.
- device#
Device on which to work with model + data. Defaults to ‘cuda’ if torch.cuda.is_available is True.
- normalize_spectrograms#
if True, use spect.utils.data.SpectScaler to normalize the spectrograms. Normalization is done by subtracting off the mean for each frequency bin of the training set and then dividing by the std for that frequency bin. This same normalization is then applied to validation + test data.
- val_step#
Step on which to estimate accuracy using validation set. If val_step is n, then validation is carried out every time the global step / n is a whole number, i.e., when val_step modulo the global step is 0. Default is None, in which case no validation is done.
- ckpt_step#
Step on which to save to checkpoint file. If ckpt_step is n, then a checkpoint is saved every time the global step / n is a whole number, i.e., when ckpt_step modulo the global step is 0. Default is None, in which case checkpoint is only saved at the last epoch.
- patience#
number of validation steps to wait without performance on the validation set improving before stopping the training. Default is None, in which case training only stops after the specified number of epochs.
- checkpoint_path#
path to directory with checkpoint files saved by Torch, to reload model. Default is None, in which case a new model is initialized.
- spect_scaler_path#
path to a saved SpectScaler object used to normalize spectrograms. If spectrograms were normalized and this is not provided, will give incorrect results. Default is None.
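Putting the options above together, a hypothetical [TRAIN] section might look like this; the model name, path, and step counts are placeholders:

```toml
[TRAIN]
models = "TweetyNet"             # hypothetical model installed via the vak.models entry point
num_epochs = 10
batch_size = 8
root_results_dir = "./results"   # a subdirectory is created here each run
normalize_spectrograms = true
val_step = 400                   # validate every 400 global steps
ckpt_step = 200                  # checkpoint every 200 global steps
patience = 4                     # stop after 4 validation steps without improvement
device = "cuda"
```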
[EVAL] section#
- class vak.config.eval.EvalConfig(checkpoint_path, labelmap_path, output_dir, models, batch_size, csv_path=None, spect_scaler_path=None, post_tfm_kwargs: dict | None = None, num_workers=2, device='cpu')[source]#
class that represents [EVAL] section of config.toml file
- num_workers#
Number of processes to use for parallel loading of data. Argument to torch.DataLoader. Default is 2.
- device#
Device on which to work with model + data. Defaults to ‘cuda’ if torch.cuda.is_available is True.
- spect_scaler_path#
path to a saved SpectScaler object used to normalize spectrograms. If spectrograms were normalized and this is not provided, will give incorrect results.
- post_tfm_kwargs#
Keyword arguments to the post-processing transform. If None, then no additional clean-up is applied when transforming labeled timebins to segments, the default behavior. The transform used is vak.transforms.labeled_timebins.ToSegmentsWithPostProcessing. Valid keyword argument names are ‘majority_vote’ and ‘min_segment_dur’, and the values should be appropriate for those arguments: a Boolean for majority_vote, a float value for min_segment_dur. See the docstring of the transform for more details on these arguments and how they work.
[PREDICT] section#
- class vak.config.predict.PredictConfig(checkpoint_path, labelmap_path, models, batch_size, csv_path=None, spect_scaler_path=None, num_workers=2, device='cpu', annot_csv_filename=None, output_dir=PosixPath(os.getcwd()), min_segment_dur=None, majority_vote=True, save_net_outputs=False)[source]#
class that represents [PREDICT] section of config.toml file
- csv_path : str
path to where dataset was saved as a csv.
- checkpoint_path : str
path to directory with checkpoint files saved by Torch, to reload model.
- labelmap_path : str
path to ‘labelmap.json’ file.
- models : list
of model names, e.g., ‘models = TweetyNet, GRUNet, ConvNet’.
- batch_size : int
number of samples per batch presented to models during training.
- num_workers : int
Number of processes to use for parallel loading of data. Argument to torch.DataLoader. Default is 2.
- device : str
Device on which to work with model + data. Defaults to ‘cuda’ if torch.cuda.is_available is True.
- spect_scaler_path : str
path to a saved SpectScaler object used to normalize spectrograms. If spectrograms were normalized and this is not provided, will give incorrect results.
- annot_csv_filename : str
name of .csv file containing predicted annotations. Default is None, in which case the name of the dataset .csv is used, with ‘.annot.csv’ appended to it.
- output_dir : str
path to location where the .csv containing predicted annotations should be saved. Defaults to current working directory.
- min_segment_dur : float
minimum duration of segment, in seconds. If specified, then any segment with a duration less than min_segment_dur is removed from lbl_tb. Default is None, in which case no segments are removed.
- majority_vote : bool
if True, transform segments containing multiple labels into segments with a single label by taking a “majority vote”, i.e. assigning all time bins in the segment the most frequently occurring label in the segment. This transform can only be applied if the labelmap contains an ‘unlabeled’ label, because unlabeled segments make it possible to identify the labeled segments. Default is True.
- save_net_outputs : bool
if True, save ‘raw’ outputs of neural networks before they are converted to annotations. Default is False. Typically the output will be “logits” to which a softmax transform might be applied. For each item in the dataset (each row in the csv_path .csv) the output will be saved in a separate file in output_dir, with the extension {MODEL_NAME}.output.npz. E.g., if the input is a spectrogram with spect_path filename gy6or6_032312_081416.npz, and the network is TweetyNet, then the net output file will be gy6or6_032312_081416.tweetynet.output.npz.
[LEARNCURVE] section#
- class vak.config.learncurve.LearncurveConfig(models, num_epochs, batch_size, root_results_dir, csv_path=None, results_dirname=None, normalize_spectrograms=False, num_workers=2, device='cpu', shuffle=True, val_step=None, ckpt_step=None, patience=None, checkpoint_path=None, spect_scaler_path=None, labelmap_path=None, previous_run_path=None, post_tfm_kwargs: dict | None = None, *, train_set_durs, num_replicates)[source]#
class that represents [LEARNCURVE] section of config.toml file
- num_epochs#
number of training epochs. One epoch = one iteration through the entire training set.
- normalize_spectrograms#
if True, use spect.utils.data.SpectScaler to normalize the spectrograms. Normalization is done by subtracting off the mean for each frequency bin of the training set and then dividing by the std for that frequency bin. This same normalization is then applied to validation + test data.
- ckpt_step#
step/epoch at which to save to checkpoint file. Default is None, in which case checkpoint is only saved at the last epoch.
- patience#
number of epochs to wait without the error dropping before stopping the training. Default is None, in which case training continues for num_epochs.
- train_set_durs#
list of int: durations, in seconds, of subsets taken from training data to create a learning curve, e.g. [5, 10, 15, 20]. Default is None (when training a single model on all available training data).
- num_replicates#
number of times to replicate training for each training set duration to better estimate mean accuracy for a training set of that size. Each replicate uses a different randomly drawn subset of the training data (but of the same duration).
- save_only_single_checkpoint_file#
if True, save only one checkpoint file instead of separate files every time we save. Default is True.
- use_train_subsets_from_previous_run#
if True, use training subsets saved in a previous run. Default is False. Requires setting previous_run_path option in config.toml file.
- previous_run_path#
path to results directory from a previous run. Used for training if use_train_subsets_from_previous_run is True.
- post_tfm_kwargs#
Keyword arguments to the post-processing transform. If None, then no additional clean-up is applied when transforming labeled timebins to segments, the default behavior. The transform used is vak.transforms.labeled_timebins.ToSegmentsWithPostProcessing. Valid keyword argument names are ‘majority_vote’ and ‘min_segment_dur’, and the values should be appropriate for those arguments: a Boolean for majority_vote, a float value for min_segment_dur. See the docstring of the transform for more details on these arguments and how they work.