vak.datasets.parametric_umap.parametric_umap.ParametricUMAPDataset

class vak.datasets.parametric_umap.parametric_umap.ParametricUMAPDataset(dataset_path: str | Path, dataset_df: DataFrame, split: str, subset: str | None = None, n_epochs: int = 200, n_neighbors: int = 10, metric: str = 'euclidean', random_state: int | None = None, transform: Callable | None = None)

Bases: Dataset

A dataset class used to train Parametric UMAP models.

__init__(dataset_path: str | Path, dataset_df: DataFrame, split: str, subset: str | None = None, n_epochs: int = 200, n_neighbors: int = 10, metric: str = 'euclidean', random_state: int | None = None, transform: Callable | None = None)

Initialize a ParametricUMAPDataset instance.

Parameters:
  • dataset_path (str or pathlib.Path) – Path to the directory that represents a parametric UMAP dataset, as created by vak.prep.prep_parametric_umap_dataset().

  • dataset_df (pandas.DataFrame) – A parametric UMAP dataset, represented as a pandas.DataFrame.

  • split (str) – The name of a split from the dataset, one of {‘train’, ‘val’, ‘test’}.

  • subset (str, optional) – Name of subset to use. If specified, this takes precedence over split. Subsets are typically taken from the training data for use when generating a learning curve.

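  • n_epochs (int) – Number of epochs for which the model will be trained. Default is 200.

  • n_neighbors (int) – Number of nearest neighbors to use when computing approximate nearest neighbors. Parameter passed to pynndescent.NNDescent and umap._umap.fuzzy_simplicial_set(). Default is 10.

  • metric (str) – Distance metric. Default is “euclidean”. Parameter passed to pynndescent.NNDescent and umap._umap.fuzzy_simplicial_set().

  • random_state (int or numpy.random.RandomState, optional) – Either a numpy.random.RandomState instance, or None; the signature annotates this parameter as int | None. Default is None.

  • transform (callable, optional) – The transform applied to the input to the neural network \(x\). Default is None.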
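The rule that subset takes precedence over split can be illustrated with a small stand-alone sketch. This is not vak's implementation: the helper select_rows is hypothetical, the rows are plain dicts standing in for rows of dataset_df, and the column names "split" and "subset" as well as the file and subset names are assumptions made for illustration.

```python
def select_rows(rows, split, subset=None):
    """Return the rows a dataset would use, with ``subset`` overriding ``split``.

    ``rows`` is a list of dicts standing in for the rows of ``dataset_df``.
    The keys "split" and "subset" are assumed column names.
    """
    if subset is not None:
        # A subset, when given, takes precedence over the split name.
        return [row for row in rows if row.get("subset") == subset]
    return [row for row in rows if row["split"] == split]


# Toy stand-in for dataset_df; file names and subset names are made up.
rows = [
    {"name": "a.npy", "split": "train", "subset": "subset-1"},
    {"name": "b.npy", "split": "train", "subset": None},
    {"name": "c.npy", "split": "val", "subset": None},
]

train_rows = select_rows(rows, split="train")
subset_rows = select_rows(rows, split="train", subset="subset-1")
```

Here train_rows holds both training rows, while subset_rows holds only the row tagged with the subset, regardless of the split argument.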

Methods

__init__(dataset_path, dataset_df, split[, ...])

Initialize a ParametricUMAPDataset instance.

from_dataset_path(dataset_path, split[, ...])

Make a ParametricUMAPDataset instance, given the path to a parametric UMAP dataset.

Attributes

duration

shape

classmethod from_dataset_path(dataset_path: str | Path, split: str, subset: str | None = None, n_neighbors: int = 10, metric: str = 'euclidean', random_state: int | None = None, n_epochs: int = 200, transform: Callable | None = None)[source]#

Make a ParametricUMAPDataset instance, given the path to a parametric UMAP dataset.

Parameters:
  • dataset_path (str or pathlib.Path) – Path to the directory that represents a parametric UMAP dataset, as created by vak.prep.prep_parametric_umap_dataset().

  • split (str) – The name of a split from the dataset, one of {‘train’, ‘val’, ‘test’}.

  • subset (str, optional) – Name of subset to use. If specified, this takes precedence over split. Subsets are typically taken from the training data for use when generating a learning curve.

  • n_neighbors (int) – Number of nearest neighbors to use when computing approximate nearest neighbors. Parameter passed to pynndescent.NNDescent and umap._umap.fuzzy_simplicial_set().

  • metric (str) – Distance metric. Default is “euclidean”. Parameter passed to pynndescent.NNDescent and umap._umap.fuzzy_simplicial_set().

  • random_state (int or numpy.random.RandomState, optional) – Either a numpy.random.RandomState instance, or None; the signature annotates this parameter as int | None. Default is None.

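  • n_epochs (int) – Number of epochs for which the model will be trained. Default is 200.

  • transform (callable, optional) – The transform applied to the input to the neural network \(x\). Default is None.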

Returns:

dataset

Return type:

vak.datasets.parametric_umap.ParametricUMAPDataset
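As a usage illustration, the sketch below builds one dataset per split by calling from_dataset_path with the parameters documented above. The helper name load_umap_datasets and the example directory are hypothetical; the import path is taken from the page title, and the import is done lazily so the helper can be defined without vak installed.

```python
from pathlib import Path


def load_umap_datasets(dataset_path, splits=("train", "val"), **kwargs):
    """Build one ParametricUMAPDataset per split (hypothetical helper).

    Extra keyword arguments, such as ``n_neighbors`` or ``metric``,
    are forwarded unchanged to ``from_dataset_path``.
    """
    # Imported lazily so this helper can be defined without vak installed.
    from vak.datasets.parametric_umap.parametric_umap import (
        ParametricUMAPDataset,
    )

    dataset_path = Path(dataset_path)
    return {
        split: ParametricUMAPDataset.from_dataset_path(
            dataset_path, split=split, **kwargs
        )
        for split in splits
    }
```

A call might look like load_umap_datasets("path/to/prepared/dataset", n_neighbors=10, metric="euclidean"), where the directory is assumed to have been created by vak.prep.prep_parametric_umap_dataset().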