vak.datapipes.parametric_umap.parametric_umap.Datapipe
- class vak.datapipes.parametric_umap.parametric_umap.Datapipe(dataset_path: str | Path, dataset_df: DataFrame, split: str, subset: str | None = None, n_epochs: int = 200, n_neighbors: int = 10, metric: str = 'euclidean', random_state: int | None = None)
Bases: Dataset
A datapipe used with Parametric UMAP models.
- __init__(dataset_path: str | Path, dataset_df: DataFrame, split: str, subset: str | None = None, n_epochs: int = 200, n_neighbors: int = 10, metric: str = 'euclidean', random_state: int | None = None)
Initialize a ParametricUMAPDataset instance.
- Parameters:
dataset_path (str or pathlib.Path) – Path to directory that represents a parametric UMAP dataset, as created by vak.prep.prep_parametric_umap_dataset().
dataset_df (pandas.DataFrame) – A parametric UMAP dataset, represented as a pandas.DataFrame.
split (str) – The name of a split from the dataset, one of {‘train’, ‘val’, ‘test’}.
subset (str, optional) – Name of subset to use. If specified, this takes precedence over split. Subsets are typically taken from the training data for use when generating a learning curve.
n_epochs (int) – Number of epochs model will be trained. Default is 200.
transform (callable, optional)
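The precedence of subset over split described in the parameters above can be illustrated with a minimal pure-Python sketch. The row layout, the column names `split` and `subset`, the subset name, and the helper `select_rows` are all assumptions made for illustration; they are not taken from the vak source.

```python
from typing import Optional

# Hypothetical rows standing in for dataset_df. The "split" and "subset"
# column names and the subset name are assumptions for illustration only.
ROWS = [
    {"path": "a.npy", "split": "train", "subset": "train-dur-4.0-rep-1"},
    {"path": "b.npy", "split": "train", "subset": None},
    {"path": "c.npy", "split": "val", "subset": None},
]


def select_rows(rows: list, split: str, subset: Optional[str] = None) -> list:
    """Return the rows for ``subset`` if given, otherwise the rows for ``split``."""
    if subset is not None:
        # A subset, when specified, takes precedence over the split.
        return [row for row in rows if row["subset"] == subset]
    return [row for row in rows if row["split"] == split]


# Without a subset, every row of the requested split is selected.
train_rows = select_rows(ROWS, split="train")

# With a subset, only that subset's rows are selected, even though
# split="train" is also passed.
subset_rows = select_rows(ROWS, split="train", subset="train-dur-4.0-rep-1")
```

Here the subset is drawn from the training rows, which matches the learning-curve use case mentioned above.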
Methods
__init__(dataset_path, dataset_df, split[, ...]) – Initialize a ParametricUMAPDataset instance.
from_dataset_path(dataset_path, split[, ...]) – Make a ParametricUMAPDataset instance, given the path to a parametric UMAP dataset.
Attributes
duration
shape
- classmethod from_dataset_path(dataset_path: str | Path, split: str, subset: str | None = None, n_neighbors: int = 10, metric: str = 'euclidean', random_state: int | None = None, n_epochs: int = 200)
Make a ParametricUMAPDataset instance, given the path to a parametric UMAP dataset.
- Parameters:
dataset_path (str or pathlib.Path) – Path to directory that represents a parametric UMAP dataset, as created by vak.prep.prep_parametric_umap_dataset().
split (str) – The name of a split from the dataset, one of {‘train’, ‘val’, ‘test’}.
subset (str, optional) – Name of subset to use. If specified, this takes precedence over split. Subsets are typically taken from the training data for use when generating a learning curve.
n_neighbors (int) – Number of nearest neighbors to use when computing approximate nearest neighbors. Default is 10. Parameter passed to pynndescent.NNDescent and umap._umap.fuzzy_simplicial_set().
metric (str) – Distance metric. Default is “euclidean”. Parameter passed to pynndescent.NNDescent and umap._umap.fuzzy_simplicial_set().
random_state (int or numpy.random.RandomState, optional) – Either an int seed, a numpy.random.RandomState instance, or None. Default is None.
n_epochs (int) – Number of epochs model will be trained. Default is 200.
- Returns:
dataset
- Return type:
vak.datapipes.parametric_umap.parametric_umap.Datapipe
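To give a rough sense of what n_neighbors and metric control, here is a minimal sketch of a nearest-neighbor search. It is an exact brute-force search in pure Python, not the approximate search performed by pynndescent.NNDescent, and the helpers `euclidean` and `knn_indices` are hypothetical names introduced only for this illustration. As a further simplification, each point is excluded from its own neighbor list.

```python
import math
from typing import Callable, List, Sequence

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_indices(
    points: List[Point],
    n_neighbors: int,
    metric: Callable[[Point, Point], float] = euclidean,
) -> List[List[int]]:
    """For each point, return the indices of its nearest other points.

    This is an exact O(n^2) search, used here only to illustrate the
    roles of ``n_neighbors`` and ``metric``.
    """
    neighbors = []
    for i, p in enumerate(points):
        # Distance from point i to every other point (self excluded).
        dists = [(metric(p, q), j) for j, q in enumerate(points) if j != i]
        dists.sort()
        neighbors.append([j for _, j in dists[:n_neighbors]])
    return neighbors


# Two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
graph_k1 = knn_indices(points, n_neighbors=1)
graph_k2 = knn_indices(points, n_neighbors=2)
```

With n_neighbors=1 each point is linked only to its partner in the same pair; raising n_neighbors to 2 also links it to the closest point in the other pair, so larger values capture more global structure at the cost of local detail.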