vak.config.prep.PrepConfig

class vak.config.prep.PrepConfig(data_dir, output_dir, dataset_type, input_type, audio_format=None, spect_format=None, spect_params=None, annot_file=None, annot_format=None, labelset=None, audio_dask_bag_kwargs=None, train_dur=None, val_dur=None, test_dur=None, train_set_durs=None, num_replicates=None)

Bases: object

Class that represents the [vak.prep] table of a configuration file.

data_dir

Path to the directory containing the files from which to make a dataset.

Type:

str

output_dir

Path to the location where datasets should be saved. If None, datasets are saved in the current working directory.

Type:

str
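A minimal sketch of the fallback described above, using a hypothetical helper resolve_output_dir (not part of vak) that maps None to the current working directory:

```python
from pathlib import Path


def resolve_output_dir(output_dir=None):
    """Hypothetical helper, not part of vak: where should datasets be saved?

    Mirrors the documented behavior that None means
    "save in the current working directory".
    """
    if output_dir is None:
        return Path.cwd()
    return Path(output_dir)


print(resolve_output_dir())        # the current working directory
print(resolve_output_dir("prep"))  # a relative path, returned as given
```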

dataset_type

String name of the type of dataset, e.g., ‘frame_classification’. Dataset types are defined by machine learning tasks; e.g., a ‘frame_classification’ dataset would be used with a vak.models.FrameClassificationModel model. Valid dataset types are defined in vak.prep.prep.DATASET_TYPES.

Type:

str

audio_format

Format of audio files. One of {‘wav’, ‘cbin’}.

Type:

str

spect_format

Format of files containing spectrograms as 2-D matrices. One of {‘mat’, ‘npy’}.

Type:

str
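To illustrate how data_dir, audio_format, and spect_format relate, here is a sketch with a hypothetical helper find_data_files (not vak's API). It assumes input_type is either "audio" or "spect", and that each file's extension matches the format name (e.g. "wav" matches "*.wav"):

```python
import tempfile
from pathlib import Path


def find_data_files(data_dir, input_type, audio_format=None, spect_format=None):
    """Hypothetical helper: list files in data_dir matching the configured format.

    Assumes input_type is "audio" or "spect" and that the file
    extension equals the format name.
    """
    if input_type == "audio":
        fmt = audio_format
    elif input_type == "spect":
        fmt = spect_format
    else:
        raise ValueError(f"unknown input_type: {input_type!r}")
    if fmt is None:
        raise ValueError(f"a format must be specified for input_type {input_type!r}")
    # Only files directly inside data_dir are matched (no recursion).
    return sorted(Path(data_dir).glob(f"*.{fmt}"))


# Demo in a throwaway directory: only the .wav files are returned.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.wav", "b.wav", "notes.txt"):
        (Path(tmp) / name).touch()
    wav_files = find_data_files(tmp, "audio", audio_format="wav")
    wav_names = [p.name for p in wav_files]
    print(wav_names)  # ['a.wav', 'b.wav']
```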

spect_params

Parameters for Short-Time Fourier Transform and post-processing of spectrograms. Instance of vak.config.SpectParamsConfig class. Optional, default is None.

Type:

vak.config.SpectParamsConfig, optional

annot_format

Format of annotations. Any format that can be used with the crowsetta library is valid.

Type:

str

annot_file

Path to a single annotation file. Default is None. Used when a single file contains annotations for multiple audio files.

Type:

str

labelset

Set of str or int: the labels that correspond to annotated segments that a network should learn to segment and classify. Note that if there are segments that are not annotated, e.g., silent gaps between songbird syllables, then vak will assign a dummy label to those segments; you don’t have to give them a label here. The value for labelset is converted to a Python set using vak.config.converters.labelset_from_toml_value. See the help for that function for details on how to specify labelset.

Type:

set
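For illustration only, the sketch below shows the kind of conversion involved. labelset_to_set_sketch is a hypothetical, simplified stand-in; it is not the implementation of vak.config.converters.labelset_from_toml_value, which may accept additional forms (see its help). Here a string is treated as a sequence of single-character labels, and a list of str or int becomes a set of strings:

```python
def labelset_to_set_sketch(labelset):
    """Hypothetical, simplified labelset conversion (not vak's function).

    - A string is split into single-character labels.
    - A list/tuple/set of str or int becomes a set of strings.
    """
    if isinstance(labelset, str):
        return set(labelset)
    if isinstance(labelset, (list, tuple, set)):
        return {str(label) for label in labelset}
    raise TypeError(f"unsupported labelset type: {type(labelset).__name__}")


print(sorted(labelset_to_set_sketch("iabcdef")))  # ['a', 'b', 'c', 'd', 'e', 'f', 'i']
print(sorted(labelset_to_set_sketch([1, 2, 10])))  # ['1', '10', '2']
```

Converting every label to a string in this sketch keeps integer and string labels comparable; whether vak does the same should be checked against its converter.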

audio_dask_bag_kwargs

Keyword arguments used when calling dask.bag.from_sequence inside vak.io.audio, where it is used to parallelize the conversion of audio files into spectrograms. This option should be specified in the config.toml file as an inline table, e.g., audio_dask_bag_kwargs = { npartitions = 20 }. It allows finer-grained control when processing files of different sizes.

Type:

dict

train_dur

Total duration of the training set, in seconds. When creating a learning curve, training subsets of shorter duration (specified by the ‘train_set_durs’ option) will be drawn from this set.

Type:

float

val_dur

Total duration of the validation set, in seconds.

Type:

float

test_dur

Total duration of the test set, in seconds.

Type:

float
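To make train_dur, val_dur, and test_dur concrete, here is a hypothetical, simplified split: files are shuffled, then the train, val, and test splits are filled in that order until each reaches its target duration. This is not vak's splitting algorithm, which may apply additional constraints; split_by_duration is an illustrative name only:

```python
import random


def split_by_duration(durations, train_dur, val_dur, test_dur, seed=0):
    """Hypothetical, simplified split of files into train/val/test sets.

    durations maps file name -> duration in seconds. Files are shuffled,
    then the train, val, and test splits are filled in that order until
    each reaches its target duration. Leftover files are not used.
    """
    files = list(durations)
    random.Random(seed).shuffle(files)
    targets = [("train", train_dur), ("val", val_dur), ("test", test_dur)]
    splits = {name: [] for name, _ in targets}
    totals = {name: 0.0 for name, _ in targets}
    idx = 0
    for f in files:
        # Move on to the next split once the current one has reached its target.
        while idx < len(targets) and totals[targets[idx][0]] >= targets[idx][1]:
            idx += 1
        if idx == len(targets):
            break  # every split is full
        name = targets[idx][0]
        splits[name].append(f)
        totals[name] += durations[f]
    return splits, totals


# Ten 10-second files; request 40 s train, 20 s val, 20 s test.
durations = {f"song{i}.wav": 10.0 for i in range(10)}
splits, totals = split_by_duration(durations, train_dur=40, val_dur=20, test_dur=20)
print(totals)  # {'train': 40.0, 'val': 20.0, 'test': 20.0}
```

Because whole files are assigned, a split can overshoot its target when file durations do not add up exactly.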

train_set_durs

Durations of the training subsets to use for a learning curve: float values giving the duration, in seconds, of each subset taken from the training data, e.g., [5., 10., 15., 20.]. Default is None. Required if the config file has a learncurve section.

Type:

list, optional

num_replicates

Number of replicates to train for each training set duration in a learning curve. Each replicate uses a different randomly drawn subset of the training data (but of the same duration). Default is None. Required if the config file has a learncurve section.

Type:

int, optional
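The interaction between train_set_durs and num_replicates can be sketched as follows. learncurve_subsets is hypothetical and is not how vak draws its subsets. For each target duration it draws num_replicates random subsets of the training files, adding files until the subset's total duration reaches the target:

```python
import random


def learncurve_subsets(train_durations, train_set_durs, num_replicates, seed=0):
    """Hypothetical sketch: draw random training subsets for a learning curve.

    train_durations maps file name -> duration in seconds (the training set).
    Returns a dict mapping (target_dur, replicate_num) -> list of file names.
    """
    rng = random.Random(seed)
    total = sum(train_durations.values())
    subsets = {}
    for target in train_set_durs:
        if target > total:
            raise ValueError(f"train_set_dur {target} exceeds training duration {total}")
        for replicate in range(1, num_replicates + 1):
            files = list(train_durations)
            rng.shuffle(files)  # a different random draw for each replicate
            subset, dur = [], 0.0
            for f in files:
                if dur >= target:
                    break
                subset.append(f)
                dur += train_durations[f]
            subsets[(target, replicate)] = subset
    return subsets


train = {f"song{i}.wav": 5.0 for i in range(8)}  # 40 s of training data
subsets = learncurve_subsets(train, train_set_durs=[5., 10., 20.], num_replicates=2)
for (target, replicate), files in sorted(subsets.items()):
    print(target, replicate, len(files))
```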

__init__(data_dir, output_dir, dataset_type, input_type, audio_format=None, spect_format=None, spect_params=None, annot_file=None, annot_format=None, labelset=None, audio_dask_bag_kwargs=None, train_dur=None, val_dur=None, test_dur=None, train_set_durs=None, num_replicates=None) → None

Method generated by attrs for class PrepConfig.

Methods

__init__(data_dir, output_dir, dataset_type, ...)

Method generated by attrs for class PrepConfig.

from_config_dict(config_dict)

Return PrepConfig instance from a dict.

is_valid_dataset_type(attribute, value)

is_valid_input_type(attribute, value)
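PrepConfig is generated by attrs, and the names is_valid_dataset_type and is_valid_input_type suggest that they act as validators for dataset_type and input_type. The sketch below shows the same idea with a standard-library dataclass and __post_init__. PrepConfigSketch and both sets of allowed values are hypothetical; vak defines its valid dataset types in vak.prep.prep.DATASET_TYPES:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical allowed values, for illustration only.
HYPOTHETICAL_DATASET_TYPES = {"frame_classification"}
HYPOTHETICAL_INPUT_TYPES = {"audio", "spect"}


@dataclass
class PrepConfigSketch:
    """Simplified stand-in for PrepConfig that validates two fields."""
    data_dir: str
    output_dir: Optional[str]
    dataset_type: str
    input_type: str

    def __post_init__(self):
        # Plays the role of is_valid_dataset_type / is_valid_input_type.
        if self.dataset_type not in HYPOTHETICAL_DATASET_TYPES:
            raise ValueError(f"invalid dataset_type: {self.dataset_type!r}")
        if self.input_type not in HYPOTHETICAL_INPUT_TYPES:
            raise ValueError(f"invalid input_type: {self.input_type!r}")


cfg = PrepConfigSketch("data/bird1", None, "frame_classification", "spect")
try:
    PrepConfigSketch("data/bird1", None, "frame_classification", "video")
except ValueError as err:
    msg = str(err)
    print(msg)  # invalid input_type: 'video'
```

In attrs, the equivalent is a method decorated with the field's validator decorator, which receives the attribute and the value being set, matching the (attribute, value) signatures listed above.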


classmethod from_config_dict(config_dict: dict) → PrepConfig

Return PrepConfig instance from a dict.

The dict passed in should be the one obtained by loading a valid configuration TOML file with vak.config.parse.from_toml_path() and then accessing the key prep, i.e., PrepConfig.from_config_dict(config_dict['prep']).