Configuration files

This document contains the specification for the .toml configuration files used when running vak commands through the command-line interface, as described in vak command-line interface.

A .toml configuration file is split up into sections. The sections and their valid options are represented in the vak code by classes. To ensure that the code and this documentation do not go out of sync, the options are presented below exactly as documented in the code for each class.

Valid section names

The set of valid section names is: {PREP, SPECT_PARAMS, DATALOADER, TRAIN, EVAL, PREDICT, LEARNCURVE}. In the code, these names correspond to attributes of the main Config class, as shown below.

The only other valid section name is the name of a class representing a neural network. For such sections to be recognized as valid, the model must be installed via the vak.models entry point, so that it can be recognized by the function vak.config.validators.is_valid_model_name.
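As a rough illustration of the rule above, the sketch below checks the top-level names of a parsed config against the fixed section names plus a set of installed model names. The function `find_invalid_sections` and its `installed_models` argument are hypothetical stand-ins and not part of vak; the real check goes through vak.config.validators.is_valid_model_name. EVAL is included because this page documents an [EVAL] section.

```python
# Hypothetical helper for illustration only; not part of vak's API.
VALID_SECTIONS = {
    "PREP", "SPECT_PARAMS", "DATALOADER",
    "TRAIN", "EVAL", "PREDICT", "LEARNCURVE",
}


def find_invalid_sections(config, installed_models):
    """Return the sorted top-level names that are neither a fixed section nor a model."""
    allowed = VALID_SECTIONS | set(installed_models)
    return sorted(name for name in config if name not in allowed)


# A parsed config is assumed to be a dict keyed by section name.
config = {"PREP": {}, "TRAIN": {}, "TweetyNet": {}, "TRIAN": {}}
print(find_invalid_sections(config, installed_models={"TweetyNet"}))
```

Here the misspelled `TRIAN` section is the only name reported, since `TweetyNet` is treated as an installed model.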

class vak.config.config.Config(spect_params=SpectParamsConfig(fft_size=512, step_size=64, freq_cutoffs=None, thresh=None, transform_type=None, spect_key='s', freqbins_key='f', timebins_key='t', audio_path_key='audio_path'), prep=None, train=None, eval=None, predict=None, learncurve=None)

class to represent config.toml file

prep

represents [PREP] section of config.toml file

Type:

vak.config.prep.PrepConfig

spect_params

represents [SPECT_PARAMS] section of config.toml file

Type:

vak.config.spect_params.SpectParamsConfig

train

represents [TRAIN] section of config.toml file

Type:

vak.config.train.TrainConfig

eval

represents [EVAL] section of config.toml file

Type:

vak.config.eval.EvalConfig

predict

represents [PREDICT] section of config.toml file.

Type:

vak.config.predict.PredictConfig

learncurve

represents [LEARNCURVE] section of config.toml file

Type:

vak.config.learncurve.LearncurveConfig

Valid Options by Section

Each section of the .toml config has a set of option names that are considered valid. Valid options for each section are presented below.

[PREP] section

class vak.config.prep.PrepConfig(data_dir, output_dir, dataset_type, input_type, audio_format=None, spect_format=None, annot_file=None, annot_format=None, labelset=None, audio_dask_bag_kwargs=None, train_dur=None, val_dur=None, test_dur=None, train_set_durs=None, num_replicates=None)

class to represent [PREP] section of config.toml file

data_dir

path to directory with files from which to make dataset

Type:

str

output_dir

Path to location where data sets should be saved. Default is None, in which case data sets are saved in the current working directory.

Type:

str

dataset_type

String name of the type of dataset, e.g., ‘frame_classification’. Dataset types are defined by machine learning tasks, e.g., a ‘frame_classification’ dataset would be used with a vak.models.FrameClassificationModel model. Valid dataset types are defined as vak.prep.prep.DATASET_TYPES.

Type:

str

audio_format

format of audio files. One of {‘wav’, ‘cbin’}.

Type:

str

spect_format

format of files containing spectrograms as 2-d matrices. One of {‘mat’, ‘npy’}.

Type:

str

annot_format

format of annotations. Any format that can be used with the crowsetta library is valid.

Type:

str

annot_file

Path to a single annotation file. Default is None. Used when a single file contains annotations for multiple audio files.

Type:

str

labelset

Set of str or int: the labels that correspond to annotated segments that a network should learn to segment and classify. Note that if there are segments that are not annotated, e.g. silent gaps between songbird syllables, then vak will assign a dummy label to those segments; you don’t have to give them a label here. The value for labelset is converted to a Python set using vak.config.converters.labelset_from_toml_value. See the help for that function for details on how to specify labelset.

Type:

set

audio_dask_bag_kwargs

Keyword arguments used when calling dask.bag.from_sequence inside vak.io.audio, where it is used to parallelize the conversion of audio files into spectrograms. Option should be specified in config.toml file as an inline table, e.g., audio_dask_bag_kwargs = { npartitions = 20 }. Allows for finer-grained control when needed to process files of different sizes.

Type:

dict

train_dur

total duration of training set, in seconds. When creating a learning curve, training subsets of shorter duration (specified by the ‘train_set_durs’ option, listed in this section) will be drawn from this set.

Type:

float

val_dur

total duration of validation set, in seconds.

Type:

float

test_dur

total duration of test set, in seconds.

Type:

float

train_set_durs

Durations of datasets to use for a learning curve. Float values, durations in seconds of subsets taken from training data to create a learning curve, e.g. [5., 10., 15., 20.]. Default is None. Required if config file has a learncurve section.

Type:

list, optional

num_replicates

Number of replicates to train for each training set duration in a learning curve. Each replicate uses a different randomly drawn subset of the training data (but of the same duration). Default is None. Required if config file has a learncurve section.

Type:

int, optional
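To make the relationship between train_set_durs and num_replicates concrete, the sketch below enumerates the training runs that a learning curve would involve: one run per (duration, replicate) pair. The function name `learncurve_runs` is hypothetical, and numbering replicates from 1 is an assumption.

```python
from itertools import product


def learncurve_runs(train_set_durs, num_replicates):
    """List (duration_in_seconds, replicate_number) pairs, one per training run."""
    return list(product(train_set_durs, range(1, num_replicates + 1)))


runs = learncurve_runs(train_set_durs=[5.0, 10.0], num_replicates=3)
print(len(runs))  # 2 durations x 3 replicates = 6 runs
```

Each pair would correspond to one model trained on a different randomly drawn subset of the stated duration.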

[SPECT_PARAMS] section

class vak.config.spect_params.SpectParamsConfig(fft_size=512, step_size=64, freq_cutoffs=None, thresh=None, transform_type=None, spect_key='s', freqbins_key='f', timebins_key='t', audio_path_key='audio_path')

represents parameters for making spectrograms from audio and saving in files

fft_size

size of window for Fast Fourier transform, number of time bins. Default is 512.

Type:

int

step_size

step size for Fast Fourier transform. Default is 64.

Type:

int
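For a sense of how fft_size and step_size determine the shape of a spectrogram, the sketch below counts the windows that fit in an audio signal. It assumes both options are measured in audio samples and that no padding is applied; vak's actual spectrogram routine may differ. The function `n_time_bins` and the 32 kHz sampling rate are hypothetical.

```python
def n_time_bins(n_samples, fft_size=512, step_size=64):
    """Number of full windows of length fft_size, stepping by step_size samples."""
    if n_samples < fft_size:
        return 0
    return 1 + (n_samples - fft_size) // step_size


samplerate = 32000  # Hz, hypothetical
n_samples = 2 * samplerate  # two seconds of audio
print(n_time_bins(n_samples))  # 1 + (64000 - 512) // 64 = 993
print(64 / samplerate)  # 0.002 seconds between successive time bins
```

Under these assumptions, a smaller step_size gives more time bins for the same audio, while a larger fft_size gives finer frequency resolution at the cost of coarser time resolution.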

freq_cutoffs

Tuple of two elements, the lower and upper cutoff frequencies. Used to bandpass filter audio (using a Butterworth filter) before generating spectrogram. Default is None, in which case no bandpass filtering is applied.

Type:

tuple

transform_type

one of {‘log_spect’, ‘log_spect_plus_one’}. ‘log_spect’ transforms the spectrogram to log(spectrogram), and ‘log_spect_plus_one’ does the same thing but adds one to each element. Default is None. If None, no transform is applied.

Type:

str
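A minimal sketch of the two transform types, applied to a plain list of positive power values. The base-10 logarithm, and adding one before taking the log rather than after, are both assumptions; the description above does not pin down either detail. `apply_transform` is a hypothetical helper, not a vak function.

```python
import math


def apply_transform(power, transform_type=None):
    """Apply the named transform to a list of positive power values."""
    if transform_type is None:
        return list(power)  # no transform applied
    if transform_type == "log_spect":
        return [math.log10(p) for p in power]
    if transform_type == "log_spect_plus_one":
        # Assumption: one is added to each element before the log.
        return [math.log10(p + 1) for p in power]
    raise ValueError(f"unknown transform_type: {transform_type!r}")


power = [1.0, 9.0, 99.0]
print(apply_transform(power, "log_spect_plus_one"))
```

Adding one before the log keeps the result non-negative for non-negative power values, which is one common reason for this variant.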

thresh

threshold minimum power for log spectrogram.

Type:

int

spect_key

key for accessing spectrogram in files. Default is ‘s’.

Type:

str

freqbins_key

key for accessing vector of frequency bins in files. Default is ‘f’.

Type:

str

timebins_key

key for accessing vector of time bins in files. Default is ‘t’.

Type:

str

audio_path_key

key for accessing path to source audio file for spectrogram in files. Default is ‘audio_path’.

Type:

str

[DATALOADER] section

[TRAIN] section

class vak.config.train.TrainConfig(model, num_epochs, batch_size, root_results_dir, dataset_path=None, results_dirname=None, normalize_spectrograms=False, num_workers=2, device='cpu', shuffle=True, val_step=None, ckpt_step=None, patience=None, checkpoint_path=None, spect_scaler_path=None, train_transform_params=None, train_dataset_params=None, val_transform_params=None, val_dataset_params=None)

class that represents [TRAIN] section of config.toml file

model

Model name, e.g., model = "TweetyNet"

Type:

str

dataset_path

Path to dataset, e.g., a csv file generated by running vak prep.

Type:

str

num_epochs

number of training epochs. One epoch = one iteration through the entire training set.

Type:

int

batch_size

number of samples per batch presented to models during training.

Type:

int

root_results_dir

directory in which results will be created. The vak.cli.train function will create a subdirectory in this directory each time it runs.

Type:

str

num_workers

Number of processes to use for parallel loading of data. Argument to torch.utils.data.DataLoader. Default is 2.

Type:

int

device

Device on which to work with model + data. Defaults to ‘cuda’ if torch.cuda.is_available is True.

Type:

str

shuffle

if True, shuffle training data before each epoch. Default is True.

Type:

bool

normalize_spectrograms

if True, use spect.utils.data.SpectScaler to normalize the spectrograms. Normalization is done by subtracting off the mean for each frequency bin of the training set and then dividing by the std for that frequency bin. This same normalization is then applied to validation + test data.

Type:

bool
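The normalization described above can be sketched with plain Python lists, where a spectrogram is a list of rows, one row of time bins per frequency bin. The class name `SimpleScaler` is hypothetical and is not the scaler vak uses; using the population standard deviation is also an assumption.

```python
from statistics import fmean, pstdev


class SimpleScaler:
    """Standardize each frequency bin using statistics from the training set."""

    def fit(self, train_spect):
        # train_spect: list of rows, one row of time bins per frequency bin
        self.means = [fmean(row) for row in train_spect]
        self.stds = [pstdev(row) for row in train_spect]
        return self

    def transform(self, spect):
        # Reuse the training-set statistics for any data, including val/test.
        return [
            [(x - m) / s for x in row]
            for row, m, s in zip(spect, self.means, self.stds)
        ]


train = [[1.0, 3.0], [10.0, 30.0]]
scaler = SimpleScaler().fit(train)
print(scaler.transform(train))  # each row becomes [-1.0, 1.0]
```

The key point is that `fit` is called once on the training set, and the same means and standard deviations are then applied to validation and test data. A real implementation would also need to guard against a standard deviation of zero.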

val_step

Step on which to estimate accuracy using validation set. If val_step is n, then validation is carried out every time the global step / n is a whole number, i.e., when the global step modulo val_step is 0. Default is None, in which case no validation is done.

Type:

int

ckpt_step

Step on which to save to checkpoint file. If ckpt_step is n, then a checkpoint is saved every time the global step / n is a whole number, i.e., when the global step modulo ckpt_step is 0. Default is None, in which case checkpoint is only saved at the last epoch.

Type:

int
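The step rule shared by val_step and ckpt_step, read as "the global step is a whole-number multiple of n", can be sketched as follows. The helper `is_on_step` and the step values are hypothetical, and the loop stands in for a training loop.

```python
def is_on_step(global_step, step):
    """True when global_step is a whole-number multiple of step (None disables it)."""
    return step is not None and global_step % step == 0


val_step, ckpt_step = 50, 200
for global_step in range(1, 401):
    if is_on_step(global_step, val_step):
        pass  # validation would run here
    if is_on_step(global_step, ckpt_step):
        print(f"checkpoint at step {global_step}")
```

With these values, validation would run at steps 50, 100, 150, and so on, and checkpoints would be saved at steps 200 and 400.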

patience

number of validation steps to wait without performance on the validation set improving before stopping the training. Default is None, in which case training only stops after the specified number of epochs.

Type:

int
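A minimal sketch of how patience could drive early stopping, assuming that "improving" means a strictly higher validation accuracy than the best seen so far. The function `train_with_patience` is hypothetical, and the list of accuracies stands in for the values measured at successive validation steps.

```python
def train_with_patience(val_accuracies, patience=None):
    """Return the number of validation steps run before training stops."""
    best = float("-inf")
    steps_without_improvement = 0
    for n_steps, acc in enumerate(val_accuracies, start=1):
        if acc > best:
            best = acc
            steps_without_improvement = 0
        else:
            steps_without_improvement += 1
        if patience is not None and steps_without_improvement >= patience:
            return n_steps  # stopped early
    return len(val_accuracies)  # ran to completion


accs = [0.70, 0.80, 0.79, 0.80, 0.78, 0.90]
print(train_with_patience(accs, patience=3))  # stops after the 5th validation step
print(train_with_patience(accs, patience=None))  # runs all 6 validation steps
```

Note that with patience set to 3 the run stops before reaching the later, higher accuracy of 0.90, which is the trade-off a small patience value makes.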

checkpoint_path

path to directory with checkpoint files saved by Torch, to reload model. Default is None, in which case a new model is initialized.

Type:

str

spect_scaler_path

path to a saved SpectScaler object used to normalize spectrograms. If spectrograms were normalized and this is not provided, will give incorrect results. Default is None.

Type:

str

[EVAL] section

class vak.config.eval.EvalConfig(checkpoint_path, output_dir, model, batch_size, dataset_path=None, labelmap_path=None, spect_scaler_path=None, post_tfm_kwargs: dict | None = None, num_workers=2, device='cpu', transform_params=None, dataset_params=None)

class that represents [EVAL] section of config.toml file

dataset_path

Path to dataset, e.g., a csv file generated by running vak prep.

Type:

str

checkpoint_path

path to directory with checkpoint files saved by Torch, to reload model

Type:

str

output_dir

Path to location where .csv files with evaluation metrics should be saved.

Type:

str

labelmap_path

path to ‘labelmap.json’ file.

Type:

str

model

Model name, e.g., model = "TweetyNet"

Type:

str

batch_size

number of samples per batch presented to models during evaluation.

Type:

int

num_workers

Number of processes to use for parallel loading of data. Argument to torch.utils.data.DataLoader. Default is 2.

Type:

int

device

Device on which to work with model + data. Defaults to ‘cuda’ if torch.cuda.is_available is True.

Type:

str

spect_scaler_path

path to a saved SpectScaler object used to normalize spectrograms. If spectrograms were normalized and this is not provided, will give incorrect results.

Type:

str

post_tfm_kwargs

Keyword arguments to post-processing transform. If None, then no additional clean-up is applied when transforming labeled timebins to segments, the default behavior. The transform used is vak.transforms.frame_labels.PostProcess. Valid keyword argument names are 'majority_vote' and 'min_segment_dur', and their values should be appropriate for those arguments: a Boolean for majority_vote, a float for min_segment_dur. See the docstring of the transform for more details on these arguments and how they work.

Type:

dict
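The constraints on post_tfm_kwargs stated above can be sketched as a small check. The function `check_post_tfm_kwargs` is hypothetical and is not how vak validates this option; it only mirrors the two valid names and their expected types. The numeric value 0.02 is an arbitrary example.

```python
def check_post_tfm_kwargs(post_tfm_kwargs):
    """Raise if post_tfm_kwargs has unknown keys or values of the wrong type."""
    if post_tfm_kwargs is None:
        return None  # default: no additional clean-up
    expected = {"majority_vote": bool, "min_segment_dur": float}
    for key, value in post_tfm_kwargs.items():
        if key not in expected:
            raise ValueError(f"invalid keyword argument: {key!r}")
        if not isinstance(value, expected[key]):
            raise TypeError(f"{key} should be {expected[key].__name__}")
    return post_tfm_kwargs


# Equivalent to this inline table in the config file:
# post_tfm_kwargs = { majority_vote = true, min_segment_dur = 0.02 }
print(check_post_tfm_kwargs({"majority_vote": True, "min_segment_dur": 0.02}))
```

In the .toml file the option is written as an inline table, as in the comment, which a TOML parser turns into a dict like the one passed here.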

transform_params

Parameters for data transform. Passed as keyword arguments. Optional, default is None.

Type:

dict, optional

dataset_params

Parameters for dataset. Passed as keyword arguments. Optional, default is None.

Type:

dict, optional

[PREDICT] section

class vak.config.predict.PredictConfig(checkpoint_path, labelmap_path, model, batch_size, dataset_path=None, spect_scaler_path=None, num_workers=2, device='cpu', annot_csv_filename=None, output_dir=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/vak/checkouts/latest/doc'), min_segment_dur=None, majority_vote=True, save_net_outputs=False, transform_params=None, dataset_params=None)

class that represents [PREDICT] section of config.toml file

dataset_path

Path to dataset, e.g., a csv file generated by running vak prep.

Type:

str

checkpoint_path

path to directory with checkpoint files saved by Torch, to reload model

Type:

str

labelmap_path

path to ‘labelmap.json’ file.

Type:

str

model

Model name, e.g., model = "TweetyNet"

Type:

str

batch_size

number of samples per batch presented to models during prediction.

Type:

int

num_workers

Number of processes to use for parallel loading of data. Argument to torch.utils.data.DataLoader. Default is 2.

Type:

int

device

Device on which to work with model + data. Defaults to ‘cuda’ if torch.cuda.is_available is True.

Type:

str

spect_scaler_path

path to a saved SpectScaler object used to normalize spectrograms. If spectrograms were normalized and this is not provided, will give incorrect results.

Type:

str

annot_csv_filename

name of .csv file containing predicted annotations. Default is None, in which case the name of the dataset .csv is used, with ‘.annot.csv’ appended to it.

Type:

str

output_dir

path to location where .csv containing predicted annotation should be saved. Defaults to current working directory.

Type:

str

min_segment_dur

minimum duration of segment, in seconds. If specified, then any segment with a duration less than min_segment_dur is removed from lbl_tb. Default is None, in which case no segments are removed.

Type:

float

majority_vote

if True, transform segments containing multiple labels into segments with a single label by taking a “majority vote”, i.e. assign all time bins in the segment the most frequently occurring label in the segment. This transform can only be applied if the labelmap contains an ‘unlabeled’ label, because unlabeled segments make it possible to identify the labeled segments. Default is True.

Type:

bool
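The majority vote can be sketched on a plain list of per-time-bin labels, with runs of unlabeled bins acting as the boundaries between segments. The function below is an illustration written for this page, not vak's implementation, and the `UNLABELED` constant is a stand-in for whatever the labelmap uses.

```python
from collections import Counter
from itertools import groupby

UNLABELED = "unlabeled"  # stand-in for the labelmap's unlabeled class


def majority_vote(frame_labels):
    """Give every run of labeled time bins its most frequent label."""
    out = []
    # Unlabeled bins delimit segments, so group bins by labeled vs. unlabeled.
    for is_unlabeled, run in groupby(frame_labels, key=lambda lbl: lbl == UNLABELED):
        run = list(run)
        if is_unlabeled:
            out.extend(run)
        else:
            winner = Counter(run).most_common(1)[0][0]
            out.extend([winner] * len(run))
    return out


frames = ["unlabeled", "a", "a", "b", "a", "unlabeled", "b", "b"]
print(majority_vote(frames))
```

In this example the stray `b` inside the first segment is overwritten by `a`, while the second segment is left unchanged. Without unlabeled bins there would be nothing to separate one segment from the next, which is why the transform needs that label.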

save_net_outputs

if True, save ‘raw’ outputs of neural networks before they are converted to annotations. Default is False. Typically the output will be “logits” to which a softmax transform might be applied. For each item in the dataset (each row in the dataset_path .csv), the output will be saved in a separate file in output_dir, with the extension {MODEL_NAME}.output.npz. E.g., if the input is a spectrogram with spect_path filename gy6or6_032312_081416.npz, and the network is TweetyNet, then the net output file will be gy6or6_032312_081416.tweetynet.output.npz.

Type:

bool
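The naming scheme in the example above can be reproduced with a few lines of pathlib. The function `net_output_path` and the `results` directory are hypothetical, and lowercasing the model name is inferred from the example (TweetyNet becomes tweetynet) rather than stated as a rule.

```python
from pathlib import Path


def net_output_path(spect_path, model_name, output_dir):
    """Build the path of the saved network output for one spectrogram file."""
    stem = Path(spect_path).stem  # file name without its final extension
    # Assumption: the model name is lowercased, as in the example above.
    return Path(output_dir) / f"{stem}.{model_name.lower()}.output.npz"


path = net_output_path("gy6or6_032312_081416.npz", "TweetyNet", "results")
print(path.name)  # gy6or6_032312_081416.tweetynet.output.npz
```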

transform_params

Parameters for data transform. Passed as keyword arguments. Optional, default is None.

Type:

dict, optional

dataset_params

Parameters for dataset. Passed as keyword arguments. Optional, default is None.

Type:

dict, optional

[LEARNCURVE] section

class vak.config.learncurve.LearncurveConfig(model, num_epochs, batch_size, root_results_dir, dataset_path=None, results_dirname=None, normalize_spectrograms=False, num_workers=2, device='cpu', shuffle=True, val_step=None, ckpt_step=None, patience=None, checkpoint_path=None, spect_scaler_path=None, train_transform_params=None, train_dataset_params=None, val_transform_params=None, val_dataset_params=None, post_tfm_kwargs: dict | None = None)

class that represents [LEARNCURVE] section of config.toml file

model

Model name, e.g., model = "TweetyNet"

Type:

str

dataset_path

Path to dataset, e.g., a csv file generated by running vak prep.

Type:

str

num_epochs

number of training epochs. One epoch = one iteration through the entire training set.

Type:

int

normalize_spectrograms

if True, use spect.utils.data.SpectScaler to normalize the spectrograms. Normalization is done by subtracting off the mean for each frequency bin of the training set and then dividing by the std for that frequency bin. This same normalization is then applied to validation + test data.

Type:

bool

ckpt_step

step/epoch at which to save to checkpoint file. Default is None, in which case checkpoint is only saved at the last epoch.

Type:

int

patience

number of epochs to wait without the error dropping before stopping the training. Default is None, in which case training continues for num_epochs.

Type:

int

save_only_single_checkpoint_file

if True, save only one checkpoint file instead of separate files every time we save. Default is True.

Type:

bool

use_train_subsets_from_previous_run

if True, use training subsets saved in a previous run. Default is False. Requires setting previous_run_path option in config.toml file.

Type:

bool

post_tfm_kwargs

Keyword arguments to post-processing transform. If None, then no additional clean-up is applied when transforming labeled timebins to segments, the default behavior. The transform used is vak.transforms.frame_labels.ToSegmentsWithPostProcessing. Valid keyword argument names are 'majority_vote' and 'min_segment_dur', and their values should be appropriate for those arguments: a Boolean for majority_vote, a float for min_segment_dur. See the docstring of the transform for more details on these arguments and how they work.

Type:

dict