vak.datapipes.parametric_umap.parametric_umap.Datapipe

class vak.datapipes.parametric_umap.parametric_umap.Datapipe(dataset_path: str | Path, dataset_df: DataFrame, split: str, subset: str | None = None, n_epochs: int = 200, n_neighbors: int = 10, metric: str = 'euclidean', random_state: int | None = None)

Bases: Dataset

A datapipe used with Parametric UMAP models.

__init__(dataset_path: str | Path, dataset_df: DataFrame, split: str, subset: str | None = None, n_epochs: int = 200, n_neighbors: int = 10, metric: str = 'euclidean', random_state: int | None = None)

Initialize a ParametricUMAPDataset instance.

Parameters:

dataset_path (str or pathlib.Path) – Path to the directory that represents a parametric UMAP dataset, as created by vak.prep.prep_parametric_umap_dataset().
dataset_df (pandas.DataFrame) – A parametric UMAP dataset, represented as a pandas.DataFrame.
split (str) – The name of a split from the dataset, one of {‘train’, ‘val’, ‘test’}.
subset (str, optional) – Name of subset to use. If specified, this takes precedence over split. Subsets are typically taken from the training data for use when generating a learning curve.
n_epochs (int) – Number of epochs the model will be trained for. Default is 200.
transform (callable, optional)
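The rule that subset takes precedence over split can be illustrated with a minimal, self-contained sketch. This is not vak's implementation: the helper select_rows, the column names "split" and "subset", and the subset name "example-subset" are all hypothetical, and plain dictionaries stand in for the rows of dataset_df.

```python
def select_rows(rows, split, subset=None):
    """Return rows for `subset` if it is given, otherwise rows for `split`.

    Hypothetical helper: a stand-in for filtering a pandas.DataFrame,
    assuming the dataset has "split" and "subset" columns.
    """
    if subset is not None:
        # A subset, when specified, takes precedence over the split.
        return [row for row in rows if row.get("subset") == subset]
    return [row for row in rows if row["split"] == split]


# Toy rows; the file names and the subset name are made up for illustration.
rows = [
    {"path": "a.npy", "split": "train", "subset": "example-subset"},
    {"path": "b.npy", "split": "train", "subset": None},
    {"path": "c.npy", "split": "val", "subset": None},
]

train_rows = select_rows(rows, split="train")
subset_rows = select_rows(rows, split="train", subset="example-subset")
```

Here train_rows holds both training rows, while subset_rows holds only the row tagged with the subset, even though split="train" was also passed.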

Methods

`__init__`(dataset_path, dataset_df, split[, ...])	Initialize a `ParametricUMAPDataset` instance.
`from_dataset_path`(dataset_path, split[, ...])	Make a `ParametricUMAPDataset` instance, given the path to a parametric UMAP dataset.

Attributes

`duration`
`shape`

classmethod from_dataset_path(dataset_path: str | Path, split: str, subset: str | None = None, n_neighbors: int = 10, metric: str = 'euclidean', random_state: int | None = None, n_epochs: int = 200)

Make a ParametricUMAPDataset instance, given the path to a parametric UMAP dataset.

Parameters:

dataset_path (str or pathlib.Path) – Path to the directory that represents a parametric UMAP dataset, as created by vak.prep.prep_parametric_umap_dataset().
split (str) – The name of a split from the dataset, one of {‘train’, ‘val’, ‘test’}.
subset (str, optional) – Name of subset to use. If specified, this takes precedence over split. Subsets are typically taken from the training data for use when generating a learning curve.
n_neighbors (int) – Number of nearest neighbors to use when computing approximate nearest neighbors. Parameter passed to pynndescent.NNDescent and umap._umap.fuzzy_simplicial_set().
metric (str) – Distance metric. Default is “euclidean”. Parameter passed to pynndescent.NNDescent and umap._umap.fuzzy_simplicial_set().
random_state (numpy.random.RandomState) – Either a numpy.random.RandomState instance, or None.
n_epochs (int) – Number of epochs the model will be trained for. Default is 200.

Returns:

dataset

Return type:

vak.datasets.parametric_umap.TrainDatapipe
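As a usage sketch based only on the signature documented above, the classmethod might be called as follows. The wrapper make_train_datapipe, the DATAPIPE_KWARGS dictionary, and the seed value 42 are illustrative assumptions, not part of vak. The import is placed inside the function so the snippet can be loaded without vak installed; calling the function does require vak and a dataset directory prepared by vak.prep.prep_parametric_umap_dataset().

```python
from pathlib import Path

# Keyword arguments mirroring the documented signature of from_dataset_path.
# All values except random_state are the documented defaults; the seed 42
# is an arbitrary choice for this illustration.
DATAPIPE_KWARGS = {
    "split": "train",
    "subset": None,
    "n_neighbors": 10,
    "metric": "euclidean",
    "random_state": 42,
    "n_epochs": 200,
}


def make_train_datapipe(dataset_path):
    """Hypothetical wrapper: build a Datapipe for the training split."""
    # Imported lazily so that defining this function does not require vak.
    from vak.datapipes.parametric_umap.parametric_umap import Datapipe

    return Datapipe.from_dataset_path(
        dataset_path=Path(dataset_path),
        **DATAPIPE_KWARGS,
    )
```

Because Datapipe derives from Dataset, the returned object would typically be handed to a data loader for training.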