vak.eval.eval_.eval

vak.eval.eval_.eval(model_config: dict, dataset_config: dict, trainer_config: dict, checkpoint_path: str | Path, output_dir: str | Path, num_workers: int, labelmap_path: str | Path | None = None, batch_size: int | None = None, frames_standardizer_path: str | Path | None = None, post_tfm_kwargs: dict | None = None, device: str | None = None) -> None

Evaluate a trained model.

Parameters:
  • model_config (dict) – Model configuration in a dict. Can be obtained by calling vak.config.ModelConfig.asdict().

  • dataset_config (dict) – Dataset configuration in a dict. Can be obtained by calling vak.config.DatasetConfig.asdict().

  • trainer_config (dict) – Configuration for lightning.pytorch.Trainer. Can be obtained by calling vak.config.TrainerConfig.asdict().

  • checkpoint_path (str, pathlib.Path) – Path to the checkpoint file saved by Torch during training, used to reload the model.

  • output_dir (str, pathlib.Path) – Path to location where .csv files with evaluation metrics should be saved.

  • num_workers (int) – Number of processes to use for parallel loading of data. Passed to torch.utils.data.DataLoader.

  • labelmap_path (str, pathlib.Path, optional) – Path to ‘labelmap.json’ file. Optional, default is None.

  • batch_size (int, optional) – Number of samples per batch fed into the model. Optional, default is None.

  • split (str) – split of dataset on which model should be evaluated. One of {‘train’, ‘val’, ‘test’}. Default is ‘test’.

  • frames_standardizer_path (str, pathlib.Path, optional) – Path to a saved FramesStandardizer object used to standardize frames. If frames were standardized during training and this is not provided, evaluation will give incorrect results. Default is None.

  • post_tfm_kwargs (dict, optional) – Keyword arguments to the post-processing transform, vak.transforms.frame_labels.PostProcess. If None (the default), no additional clean-up is applied when converting labeled time bins to segments. Valid keyword argument names are 'majority_vote' and 'min_segment_dur'; their values should be a Boolean for majority_vote and a float for min_segment_dur. See the docstring of the transform for details on these arguments and how they work.

  • device (str, optional) – Device on which to place the model and data. Defaults to ‘cuda’ if torch.cuda.is_available() returns True, otherwise ‘cpu’.
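The frames_standardizer_path warning comes down to reusing statistics fitted on the training data. The sketch below assumes per-frequency-bin z-scoring and uses a hypothetical ToyFramesStandardizer class built on NumPy; it is not vak's FramesStandardizer. It shows that standardized evaluation frames land on the scale the model was trained on, while frames passed in without the standardizer stay on the raw scale.

```python
import numpy as np


class ToyFramesStandardizer:
    """Hypothetical stand-in for a saved standardizer, not vak's FramesStandardizer."""

    def fit(self, frames: np.ndarray) -> "ToyFramesStandardizer":
        # frames has shape (n_freq_bins, n_time_bins); one mean/std per frequency bin.
        self.mean_ = frames.mean(axis=1, keepdims=True)
        self.std_ = frames.std(axis=1, keepdims=True)
        return self

    def transform(self, frames: np.ndarray) -> np.ndarray:
        # Apply the statistics fitted on training frames to any new frames.
        return (frames - self.mean_) / self.std_


rng = np.random.default_rng(0)
train_frames = rng.normal(loc=5.0, scale=2.0, size=(4, 100))
eval_frames = rng.normal(loc=5.0, scale=2.0, size=(4, 20))

standardizer = ToyFramesStandardizer().fit(train_frames)
standardized = standardizer.transform(eval_frames)  # scale the model was trained on
raw = eval_frames  # what the model receives if the standardizer is omitted

print(round(float(standardized.mean()), 2), round(float(raw.mean()), 2))
```

Fitting only on training frames and reusing those statistics at evaluation time is the design choice that matters here: refitting on the evaluation split would hide a mismatch rather than fix it.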
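A valid post_tfm_kwargs dict follows the rules listed for that parameter: only the keys 'majority_vote' (a Boolean) and 'min_segment_dur' (a float), or None for no clean-up. The helper check_post_tfm_kwargs below is hypothetical and not part of vak, and the value 0.02 for min_segment_dur is purely illustrative.

```python
from __future__ import annotations

VALID_KEYS = {"majority_vote", "min_segment_dur"}


def check_post_tfm_kwargs(post_tfm_kwargs: dict | None) -> dict | None:
    """Hypothetical validator for post_tfm_kwargs; not part of vak."""
    if post_tfm_kwargs is None:
        # None means no extra clean-up when converting frame labels to segments.
        return None
    unknown = set(post_tfm_kwargs) - VALID_KEYS
    if unknown:
        raise ValueError(f"invalid post_tfm_kwargs keys: {sorted(unknown)}")
    if "majority_vote" in post_tfm_kwargs and not isinstance(
        post_tfm_kwargs["majority_vote"], bool
    ):
        raise TypeError("majority_vote must be a bool")
    if "min_segment_dur" in post_tfm_kwargs:
        dur = post_tfm_kwargs["min_segment_dur"]
        # bool is a subclass of int, so reject it explicitly before the number check.
        if isinstance(dur, bool) or not isinstance(dur, (int, float)):
            raise TypeError("min_segment_dur must be a float")
    return post_tfm_kwargs


post_tfm_kwargs = check_post_tfm_kwargs({"majority_vote": True, "min_segment_dur": 0.02})
```

The checked dict would then be passed unchanged as the post_tfm_kwargs argument to eval.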

Notes

Unlike core.predict, this function may modify the labelmap so that metrics such as edit distance are computed correctly: any string label in the labelmap with multiple characters is converted to a (mock) single-character label, using vak.labels.multi_char_labels_to_single_char.
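To see why multi-character labels distort edit distance, assume the distance is computed character by character over strings of concatenated labels. The sketch below is a hypothetical re-implementation of the idea, not the code of vak.labels.multi_char_labels_to_single_char: the functions levenshtein and single_char_mapping are illustrative, and the mapping assigns each multi-character label an unused ASCII letter, so one mislabeled segment counts as one edit instead of two.

```python
import string


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings, via the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def single_char_mapping(labels: list[str]) -> dict[str, str]:
    """Map each label to itself if it is one character, else to an unused letter."""
    used = {lab for lab in labels if len(lab) == 1}
    spare = (c for c in string.ascii_letters if c not in used)
    return {lab: lab if len(lab) == 1 else next(spare) for lab in labels}


labels = ["a", "b", "10"]
true_segments = ["a", "10", "b"]
pred_segments = ["a", "b", "b"]  # one segment mislabeled

# Joining raw labels: "a10b" vs "abb" counts the single error as two edits.
naive = levenshtein("".join(true_segments), "".join(pred_segments))

mapping = single_char_mapping(labels)  # "10" -> "c"
mapped = levenshtein(
    "".join(mapping[lab] for lab in true_segments),
    "".join(mapping[lab] for lab in pred_segments),
)
print(naive, mapped)  # prints: 2 1
```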