Configuration files#
This document contains the specification for the .toml configuration files used when running vak commands through the command-line interface, as described in vak command-line interface.

A .toml configuration file is split up into sections. The sections and their valid options are represented in the vak code by classes. To ensure that the code and this documentation do not go out of sync, the options are presented below exactly as they are documented in the code for each class.
Valid section names#
Following is the set of valid section names: {PREP, SPECT_PARAMS, DATALOADER, TRAIN, EVAL, PREDICT, LEARNCURVE}. In the code, these names correspond to attributes of the main Config class, as shown below.

The only other valid section name is the name of a class representing a neural network. For such a section to be recognized as valid, the model must be installed via the vak.models entry point, so that it can be recognized by the function vak.config.validators.is_valid_model_name.
- class vak.config.config.Config(spect_params=SpectParamsConfig(fft_size=512, step_size=64, freq_cutoffs=None, thresh=None, transform_type=None, spect_key='s', freqbins_key='f', timebins_key='t', audio_path_key='audio_path'), dataloader=DataLoaderConfig(window_size=88), prep=None, train=None, eval=None, predict=None, learncurve=None)[source]#
class to represent config.toml file
- prep#
represents [PREP] section of config.toml file
- spect_params#
represents [SPECT_PARAMS] section of config.toml file
- dataloader#
represents [DATALOADER] section of config.toml file
- train#
represents [TRAIN] section of config.toml file
- eval#
represents [EVAL] section of config.toml file
- predict#
represents [PREDICT] section of config.toml file
- learncurve#
represents [LEARNCURVE] section of config.toml file
Valid Options by Section#
Each section of the .toml config file has a set of option names that are considered valid. Valid options for each section are presented below.
[PREP] section#
- class vak.config.prep.PrepConfig(data_dir, output_dir, audio_format=None, spect_format=None, spect_output_dir=None, annot_file=None, annot_format=None, labelset=None, audio_dask_bag_kwargs=None, train_dur=None, val_dur=None, test_dur=None)[source]#
class to represent [PREP] section of config.toml file
- output_dir#
Path to location where data sets should be saved. Default is None, in which case data sets are saved in the current working directory.
- spect_format#
format of files containing spectrograms as 2-d matrices. One of {‘mat’, ‘npy’}.
- spect_output_dir#
path to directory where array files containing spectrograms should be saved, when generated from audio files. Default is None, in which case the spectrogram files are saved in data_dir by vak.io.dataframe.from_files.
- annot_format#
format of annotations. Any format that can be used with the crowsetta library is valid.
- annot_file#
Path to a single annotation file. Default is None. Used when a single file contains annotations for multiple audio files.
- labelset#
set of str or int: the set of labels that correspond to annotated segments that a network should learn to segment and classify. Note that if there are segments that are not annotated, e.g. silent gaps between songbird syllables, then vak will assign a dummy label to those segments; you don’t have to give them a label here. The value for labelset is converted to a Python set using vak.config.converters.labelset_from_toml_value. See the help for that function for details on how to specify labelset.
- audio_dask_bag_kwargs#
Keyword arguments used when calling dask.bag.from_sequence inside vak.io.audio, where it is used to parallelize the conversion of audio files into spectrograms. The option should be specified in the config.toml file as an inline table, e.g., audio_dask_bag_kwargs = { npartitions = 20 }. Allows for finer-grained control when needed to process files of different sizes.
- train_dur#
total duration of training set, in seconds. When creating a learning curve, training subsets of shorter duration (specified by the ‘train_set_durs’ option in the LEARNCURVE section of a config.toml file) will be drawn from this set.
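A hypothetical [PREP] section using the options above might look like the following sketch; the paths, durations, and labelset are illustrative placeholders, not values from a real dataset.

```toml
[PREP]
data_dir = "./data/bird1"        # hypothetical path to audio + annotation files
output_dir = "./prep_output"     # where the prepared dataset is saved
audio_format = "wav"
annot_format = "notmat"          # any format the crowsetta library accepts
labelset = "iabcdefghjk"
train_dur = 50                   # seconds
val_dur = 15
test_dur = 30
audio_dask_bag_kwargs = { npartitions = 20 }   # inline table, as described above
```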
[SPECT_PARAMS] section#
- class vak.config.spect_params.SpectParamsConfig(fft_size=512, step_size=64, freq_cutoffs=None, thresh=None, transform_type=None, spect_key='s', freqbins_key='f', timebins_key='t', audio_path_key='audio_path')[source]#
represents parameters for making spectrograms from audio and saving in files
- freq_cutoffs#
two elements, the lower and higher frequencies, used to bandpass filter audio (using a Butter filter) before generating the spectrogram. Default is None, in which case no bandpass filtering is applied.
- transform_type#
one of {‘log_spect’, ‘log_spect_plus_one’}. ‘log_spect’ transforms the spectrogram to log(spectrogram), and ‘log_spect_plus_one’ does the same thing but adds one to each element. Default is None. If None, no transform is applied.
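A sketch of the corresponding section; the cutoff frequencies are placeholder values:

```toml
[SPECT_PARAMS]
fft_size = 512
step_size = 64
freq_cutoffs = [ 500, 10000 ]   # bandpass range, lower and higher frequency (placeholders)
transform_type = "log_spect"
```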
[DATALOADER] section#
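The valid option for this section comes from the DataLoaderConfig class shown in the Config signature above, which has a single window_size option. A sketch, using the default value from that signature:

```toml
[DATALOADER]
window_size = 88   # default value from DataLoaderConfig
```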
[TRAIN] section#
- class vak.config.train.TrainConfig(models, num_epochs, batch_size, root_results_dir, csv_path=None, results_dirname=None, normalize_spectrograms=False, num_workers=2, device='cpu', shuffle=True, val_step=None, ckpt_step=None, patience=None, checkpoint_path=None, spect_scaler_path=None, labelmap_path=None)[source]#
class that represents [TRAIN] section of config.toml file
- num_epochs#
number of training epochs. One epoch = one iteration through the entire training set.
- root_results_dir#
directory in which results will be created. The vak.cli.train function will create a subdirectory in this directory each time it runs.
- num_workers#
Number of processes to use for parallel loading of data. Argument to torch.DataLoader.
- device#
Device on which to work with model + data. Defaults to ‘cuda’ if torch.cuda.is_available is True.
- normalize_spectrograms#
if True, use spect.utils.data.SpectScaler to normalize the spectrograms. Normalization is done by subtracting off the mean for each frequency bin of the training set and then dividing by the std for that frequency bin. This same normalization is then applied to validation + test data.
- val_step#
Step on which to estimate accuracy using validation set. If val_step is n, then validation is carried out every time the global step / n is a whole number, i.e., when val_step modulo the global step is 0. Default is None, in which case no validation is done.
- ckpt_step#
Step on which to save to checkpoint file. If ckpt_step is n, then a checkpoint is saved every time the global step / n is a whole number, i.e., when ckpt_step modulo the global step is 0. Default is None, in which case checkpoint is only saved at the last epoch.
- patience#
number of validation steps to wait without performance on the validation set improving before stopping the training. Default is None, in which case training only stops after the specified number of epochs.
- checkpoint_path#
path to directory with checkpoint files saved by Torch, to reload model. Default is None, in which case a new model is initialized.
- spect_scaler_path#
path to a saved SpectScaler object used to normalize spectrograms. If spectrograms were normalized and this is not provided, will give incorrect results. Default is None.
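Putting the options above together, a hypothetical [TRAIN] section might look like this; the model name, path, and step counts are placeholders:

```toml
[TRAIN]
models = "TweetyNet"             # hypothetical model installed via the vak.models entry point
num_epochs = 10
batch_size = 8
root_results_dir = "./results"   # a subdirectory is created here each run
normalize_spectrograms = true
val_step = 400                   # validate every 400 global steps
ckpt_step = 200                  # checkpoint every 200 global steps
patience = 4                     # stop after 4 validation steps without improvement
device = "cuda"
```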
[EVAL] section#
- class vak.config.eval.EvalConfig(checkpoint_path, labelmap_path, output_dir, models, batch_size, csv_path=None, spect_scaler_path=None, post_tfm_kwargs: dict | None = None, num_workers=2, device='cpu')[source]#
class that represents [EVAL] section of config.toml file
- num_workers#
Number of processes to use for parallel loading of data. Argument to torch.DataLoader. Default is 2.
- device#
Device on which to work with model + data. Defaults to ‘cuda’ if torch.cuda.is_available is True.
- spect_scaler_path#
path to a saved SpectScaler object used to normalize spectrograms. If spectrograms were normalized and this is not provided, will give incorrect results.
- post_tfm_kwargs#
Keyword arguments to the post-processing transform. If None, then no additional clean-up is applied when transforming labeled timebins to segments, the default behavior. The transform used is vak.transforms.labeled_timebins.ToSegmentsWithPostProcessing. Valid keyword argument names are ‘majority_vote’ and ‘min_segment_dur’, and the values should be appropriate for those arguments: a Boolean for majority_vote, a float value for min_segment_dur. See the docstring of the transform for more details on these arguments and how they work.
[PREDICT] section#
- class vak.config.predict.PredictConfig(checkpoint_path, labelmap_path, models, batch_size, csv_path=None, spect_scaler_path=None, num_workers=2, device='cpu', annot_csv_filename=None, output_dir=PosixPath(os.getcwd()), min_segment_dur=None, majority_vote=True, save_net_outputs=False)[source]#
class that represents [PREDICT] section of config.toml file
- csv_path : str
path to where dataset was saved as a csv.
- checkpoint_path : str
path to directory with checkpoint files saved by Torch, to reload model.
- labelmap_path : str
path to ‘labelmap.json’ file.
- models : list
of model names, e.g., ‘models = TweetyNet, GRUNet, ConvNet’.
- batch_size : int
number of samples per batch presented to models during training.
- num_workers : int
Number of processes to use for parallel loading of data. Argument to torch.DataLoader. Default is 2.
- device : str
Device on which to work with model + data. Defaults to ‘cuda’ if torch.cuda.is_available is True.
- spect_scaler_path : str
path to a saved SpectScaler object used to normalize spectrograms. If spectrograms were normalized and this is not provided, will give incorrect results.
- annot_csv_filename : str
name of .csv file containing predicted annotations. Default is None, in which case the name of the dataset .csv is used, with ‘.annot.csv’ appended to it.
- output_dir : str
path to location where the .csv containing predicted annotations should be saved. Defaults to current working directory.
- min_segment_dur : float
minimum duration of segment, in seconds. If specified, then any segment with a duration less than min_segment_dur is removed from lbl_tb. Default is None, in which case no segments are removed.
- majority_vote : bool
if True, transform segments containing multiple labels into segments with a single label by taking a “majority vote”, i.e. assigning all time bins in the segment the most frequently occurring label in the segment. This transform can only be applied if the labelmap contains an ‘unlabeled’ label, because unlabeled segments make it possible to identify the labeled segments. Default is True.
- save_net_outputs : bool
if True, save ‘raw’ outputs of neural networks before they are converted to annotations. Default is False. Typically the output will be “logits” to which a softmax transform might be applied. For each item in the dataset (each row in the csv_path .csv) the output will be saved in a separate file in output_dir, with the extension {MODEL_NAME}.output.npz. E.g., if the input is a spectrogram with spect_path filename gy6or6_032312_081416.npz, and the network is TweetyNet, then the net output file will be gy6or6_032312_081416.tweetynet.output.npz.
[LEARNCURVE] section#
- class vak.config.learncurve.LearncurveConfig(models, num_epochs, batch_size, root_results_dir, csv_path=None, results_dirname=None, normalize_spectrograms=False, num_workers=2, device='cpu', shuffle=True, val_step=None, ckpt_step=None, patience=None, checkpoint_path=None, spect_scaler_path=None, labelmap_path=None, previous_run_path=None, post_tfm_kwargs: dict | None = None, *, train_set_durs, num_replicates)[source]#
class that represents [LEARNCURVE] section of config.toml file
- num_epochs#
number of training epochs. One epoch = one iteration through the entire training set.
- normalize_spectrograms#
if True, use spect.utils.data.SpectScaler to normalize the spectrograms. Normalization is done by subtracting off the mean for each frequency bin of the training set and then dividing by the std for that frequency bin. This same normalization is then applied to validation + test data.
- ckpt_step#
step/epoch at which to save to checkpoint file. Default is None, in which case checkpoint is only saved at the last epoch.
- patience#
number of epochs to wait without the error dropping before stopping the training. Default is None, in which case training continues for num_epochs.
- train_set_durs#
list of int: durations, in seconds, of subsets taken from training data to create a learning curve, e.g. [5, 10, 15, 20]. Default is None (when training a single model on all available training data).
- num_replicates#
number of times to replicate training for each training set duration to better estimate mean accuracy for a training set of that size. Each replicate uses a different randomly drawn subset of the training data (but of the same duration).
- save_only_single_checkpoint_file#
if True, save only one checkpoint file instead of separate files every time we save. Default is True.
- use_train_subsets_from_previous_run#
if True, use training subsets saved in a previous run. Default is False. Requires setting previous_run_path option in config.toml file.
- previous_run_path#
path to results directory from a previous run. Used for training if use_train_subsets_from_previous_run is True.
- post_tfm_kwargs#
Keyword arguments to the post-processing transform. If None, then no additional clean-up is applied when transforming labeled timebins to segments, the default behavior. The transform used is vak.transforms.labeled_timebins.ToSegmentsWithPostProcessing. Valid keyword argument names are ‘majority_vote’ and ‘min_segment_dur’, and the values should be appropriate for those arguments: a Boolean for majority_vote, a float value for min_segment_dur. See the docstring of the transform for more details on these arguments and how they work.