vak.train.parametric_umap.train_parametric_umap_model
vak.train.parametric_umap.train_parametric_umap_model(model_name: str, model_config: dict, dataset_path: str | Path, batch_size: int, num_epochs: int, num_workers: int, train_transform_params: dict | None = None, train_dataset_params: dict | None = None, val_transform_params: dict | None = None, val_dataset_params: dict | None = None, checkpoint_path: str | Path | None = None, root_results_dir: str | Path | None = None, results_path: str | Path | None = None, shuffle: bool = True, val_step: int | None = None, ckpt_step: int | None = None, device: str | None = None, subset: str | None = None) -> None
Train a model from the parametric UMAP family and save results.
Saves checkpoint files for model, label map, and spectrogram scaler. These are saved either in results_path, if specified, or in a new directory made inside root_results_dir.
Parameters:
model_name (str) – Model name, must be one of vak.models.registry.MODEL_NAMES.
model_config (dict) – Model configuration in a dict, as loaded from a .toml file, and used by the model method from_config.
dataset_path (str, pathlib.Path) – Path to dataset, a directory generated by running vak prep.
batch_size (int) – Number of samples per batch presented to models during training.
num_epochs (int) – Number of training epochs. One epoch is one iteration through the entire training set.
num_workers (int) – Number of processes to use for parallel loading of data. Argument to torch.utils.data.DataLoader.
train_dataset_params (dict, optional) – Parameters for training dataset. Passed as keyword arguments to vak.datasets.parametric_umap.ParametricUMAP. Optional, default is None.
val_dataset_params (dict, optional) – Parameters for validation dataset. Passed as keyword arguments to vak.datasets.parametric_umap.ParametricUMAP. Optional, default is None.
checkpoint_path (str, pathlib.Path, optional) – Path to a checkpoint file, e.g., one generated by a previous run of vak.core.train. If specified, this checkpoint will be loaded into the model. Used when continuing training. Default is None, in which case a new model is initialized.
root_results_dir (str, pathlib.Path, optional) – Root directory in which a new directory will be created where results will be saved.
results_path (str, pathlib.Path, optional) – Directory where results will be saved. If specified, this parameter overrides root_results_dir.
val_step (int, optional) – Computes the loss using the validation set every val_step epochs. Default is None, in which case no validation is done.
ckpt_step (int, optional) – Step on which to save to checkpoint file. If ckpt_step is n, then a checkpoint is saved every time the global step / n is a whole number, i.e., when the global step modulo ckpt_step is 0. Default is None, in which case a checkpoint is only saved at the last epoch.
device (str, optional) – Device on which to work with model + data. Default is None. If None, then a device will be selected with vak.split.get_default. That function defaults to 'cuda' if torch.cuda.is_available() returns True.
shuffle (bool) – If True, shuffle training data before each epoch. Default is True.
subset (str, optional) – Name of the subset of the training data, from the dataset found at dataset_path, to use when training the model. Default is None. This parameter is used by vak.learncurve.learncurve to specify specific subsets of the training set to use when training models for a learning curve.
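
The ckpt_step rule described above is a modulo test on the global step counter. The following is a minimal sketch of that rule, not vak's actual implementation: the helper checkpoint_steps is hypothetical, it assumes the global step is counted from 1, and it treats "the last epoch" as the final training step.

```python
from __future__ import annotations


def checkpoint_steps(total_steps: int, ckpt_step: int | None) -> list[int]:
    """Return the global steps at which a checkpoint would be saved.

    Hypothetical helper, not part of vak. It only illustrates the rule
    "save when global_step % ckpt_step == 0", with the fallback of saving
    once at the end of training when ckpt_step is None.
    """
    if ckpt_step is None:
        # No periodic checkpoints: save a single checkpoint at the end.
        return [total_steps]
    if ckpt_step < 1:
        raise ValueError("ckpt_step must be a positive integer")
    # Steps are assumed to be 1-indexed for this illustration.
    return [step for step in range(1, total_steps + 1) if step % ckpt_step == 0]
```

Under these assumptions, checkpoint_steps(10, 4) gives [4, 8], and checkpoint_steps(10, None) gives [10].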
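
A call to this function might be assembled as in the sketch below, which follows the signature shown at the top of this page. The helper names build_train_kwargs and run_training are hypothetical, the model name "ConvEncoderUMAP" is an assumption about what vak.models.registry.MODEL_NAMES contains, and the numeric values are placeholders rather than recommended settings.

```python
from __future__ import annotations

from pathlib import Path


def build_train_kwargs(
    model_config: dict,
    dataset_path: str | Path,
    results_path: str | Path | None = None,
    root_results_dir: str | Path | None = None,
) -> dict:
    """Collect keyword arguments for train_parametric_umap_model.

    Hypothetical helper: every literal value below is a placeholder
    chosen for illustration only.
    """
    return dict(
        model_name="ConvEncoderUMAP",  # assumption: a name in MODEL_NAMES
        model_config=model_config,  # dict as loaded from a .toml file
        dataset_path=Path(dataset_path),  # directory generated by `vak prep`
        batch_size=64,
        num_epochs=50,
        num_workers=4,
        results_path=results_path,  # overrides root_results_dir if given
        root_results_dir=root_results_dir,
        shuffle=True,
        val_step=5,
        ckpt_step=1000,
        device=None,  # None lets vak choose a default device
    )


def run_training(kwargs: dict) -> None:
    """Call the vak training function with the collected arguments."""
    # Imported lazily so the sketch can be read and tested without vak installed.
    from vak.train.parametric_umap import train_parametric_umap_model

    train_parametric_umap_model(**kwargs)
```

With vak installed and a model_config dict loaded from a configuration file, run_training(build_train_kwargs(model_config, "path/to/dataset", root_results_dir="results")) would start training. Both paths here are placeholders.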