vak.train.parametric_umap.train_parametric_umap_model

vak.train.parametric_umap.train_parametric_umap_model(model_name: str, model_config: dict, dataset_path: str | Path, batch_size: int, num_epochs: int, num_workers: int, train_transform_params: dict | None = None, train_dataset_params: dict | None = None, val_transform_params: dict | None = None, val_dataset_params: dict | None = None, checkpoint_path: str | Path | None = None, root_results_dir: str | Path | None = None, results_path: str | Path | None = None, shuffle: bool = True, val_step: int | None = None, ckpt_step: int | None = None, device: str | None = None, subset: str | None = None) -> None

Train a model from the parametric UMAP family and save results.

Saves checkpoint files for model, label map, and spectrogram scaler. These are saved either in results_path if specified, or a new directory made inside root_results_dir.
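The choice between the two directories can be sketched as follows. This is a minimal illustration of the rule stated above, not vak's implementation: the helper name `resolve_results_path`, the timestamped directory name, and the fallback to the current working directory are all assumptions made for the example.

```python
from datetime import datetime
from pathlib import Path


def resolve_results_path(results_path=None, root_results_dir=None):
    """Hypothetical helper: pick the directory where results are saved."""
    if results_path is not None:
        # results_path overrides root_results_dir when both are given
        return Path(results_path)
    # Assumption: fall back to the current directory if no root is given
    root = Path(root_results_dir) if root_results_dir is not None else Path.cwd()
    # Assumption: the new directory is named with a timestamp
    new_dir = root / f"results_{datetime.now():%y%m%d_%H%M%S}"
    new_dir.mkdir(parents=True, exist_ok=True)
    return new_dir
```

With only `root_results_dir` given, the helper creates and returns a new subdirectory; with `results_path` given, that path is returned unchanged and `root_results_dir` is ignored.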

Parameters:
  • model_name (str) – Model name, must be one of vak.models.registry.MODEL_NAMES.

  • model_config (dict) – Model configuration in a dict, as loaded from a .toml file, and used by the model method from_config.

  • dataset_path (str, pathlib.Path) – Path to dataset, a directory generated by running vak prep.

  • batch_size (int) – Number of samples per batch presented to models during training.

  • num_epochs (int) – Number of training epochs. One epoch = one iteration through the entire training set.

  • num_workers (int) – Number of processes to use for parallel loading of data. Argument to torch.utils.data.DataLoader.

  • train_dataset_params (dict, optional) – Parameters for training dataset. Passed as keyword arguments to vak.datasets.parametric_umap.ParametricUMAP. Optional, default is None.

  • val_dataset_params (dict, optional) – Parameters for validation dataset. Passed as keyword arguments to vak.datasets.parametric_umap.ParametricUMAP. Optional, default is None.

  • checkpoint_path (str, pathlib.Path, optional) – Path to a checkpoint file, e.g., one generated by a previous run of vak.core.train. If specified, this checkpoint will be loaded into the model. Used when continuing training. Default is None, in which case a new model is initialized.

  • root_results_dir (str, pathlib.Path, optional) – Root directory in which a new directory will be created where results will be saved.

  • results_path (str, pathlib.Path, optional) – Directory where results will be saved. If specified, this parameter overrides root_results_dir.

  • val_step (int) – Compute the loss on the validation set every val_step epochs. Default is None, in which case no validation is done.

  • ckpt_step (int) – Step on which to save to checkpoint file. If ckpt_step is n, then a checkpoint is saved every time the global step / n is a whole number, i.e., when the global step modulo ckpt_step is 0. Default is None, in which case a checkpoint is only saved at the last epoch.

  • device (str) – Device on which to work with model + data. Default is None. If None, then a device will be selected with vak.split.get_default. That function defaults to 'cuda' if torch.cuda.is_available is True.

  • shuffle (bool) – If True, shuffle training data before each epoch. Default is True.

  • split (str) – Name of split from dataset found at dataset_path to use when training model. Default is 'train'. This parameter is used by vak.learncurve.learncurve to specify specific subsets of the training set to use when training models for a learning curve.
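To make the ckpt_step rule concrete, here is a minimal sketch of the modulo test it describes. It is an illustration rather than vak's own code: the function name `steps_that_fire` is hypothetical, and the sketch assumes the global step counts up from 1.

```python
def steps_that_fire(total_steps, step_interval):
    """Hypothetical helper: list the global steps at which an action runs.

    Mirrors the rule described for ckpt_step: the action runs whenever
    the global step is a whole multiple of the interval.
    """
    if step_interval is None:
        # None disables the periodic action in this sketch
        return []
    return [
        step
        for step in range(1, total_steps + 1)  # assumption: steps start at 1
        if step % step_interval == 0
    ]


ckpt_steps = steps_that_fire(total_steps=1000, step_interval=250)
print(ckpt_steps)  # [250, 500, 750, 1000]
```

Note that the test is `step % step_interval == 0`, with the global step on the left of the modulo operator. With `step_interval=None` the helper returns an empty list, which corresponds to no periodic checkpoints; per the description above, a checkpoint is then saved only at the last epoch.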