vak.datasets.parametric_umap.parametric_umap.ParametricUMAPDataset
- class vak.datasets.parametric_umap.parametric_umap.ParametricUMAPDataset(dataset_path: str | Path, dataset_df: DataFrame, split: str, subset: str | None = None, n_epochs: int = 200, n_neighbors: int = 10, metric: str = 'euclidean', random_state: int | None = None, transform: Callable | None = None)
Bases: Dataset
A dataset class used to train Parametric UMAP models.
- __init__(dataset_path: str | Path, dataset_df: DataFrame, split: str, subset: str | None = None, n_epochs: int = 200, n_neighbors: int = 10, metric: str = 'euclidean', random_state: int | None = None, transform: Callable | None = None)
Initialize a ParametricUMAPDataset instance.
- Parameters:
dataset_path (str or pathlib.Path): Path to the directory that represents a parametric UMAP dataset, as created by vak.prep.prep_parametric_umap_dataset().
dataset_df (pandas.DataFrame): A parametric UMAP dataset, represented as a pandas.DataFrame.
split (str): The name of a split from the dataset, one of {'train', 'val', 'test'}.
subset (str, optional): Name of subset to use. If specified, this takes precedence over split. Subsets are typically taken from the training data for use when generating a learning curve.
n_epochs (int): Number of epochs for which the model will be trained. Default is 200.
transform (callable, optional): The transform applied to the input to the neural network \(x\).
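The rule that subset takes precedence over split can be sketched as follows. This is a minimal illustration that uses plain dictionaries rather than a pandas.DataFrame. The helper select_rows, the key names "split" and "subset", and the subset name "subset-a" are assumptions made for the example, not confirmed details of how vak selects rows.

```python
from typing import Optional


def select_rows(rows: list, split: str, subset: Optional[str] = None) -> list:
    """Return the rows for ``subset`` if one is given, otherwise the rows for ``split``.

    Hypothetical helper: the "split" and "subset" keys are assumed names.
    """
    if subset is not None:
        # A subset, when specified, takes precedence over the split name.
        return [row for row in rows if row.get("subset") == subset]
    return [row for row in rows if row.get("split") == split]


# Toy stand-in for the rows of a dataset table (all values are made up).
rows = [
    {"id": 0, "split": "train", "subset": "subset-a"},
    {"id": 1, "split": "train", "subset": None},
    {"id": 2, "split": "val", "subset": None},
]

train_rows = select_rows(rows, split="train")
subset_rows = select_rows(rows, split="train", subset="subset-a")
```

Here train_rows holds both training rows, while subset_rows holds only the row tagged "subset-a", even though split="train" was also passed.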
Methods
__init__(dataset_path, dataset_df, split[, ...]): Initialize a ParametricUMAPDataset instance.
from_dataset_path(dataset_path, split[, ...]): Make a ParametricUMAPDataset instance, given the path to a parametric UMAP dataset.
Attributes
duration
shape
- classmethod from_dataset_path(dataset_path: str | Path, split: str, subset: str | None = None, n_neighbors: int = 10, metric: str = 'euclidean', random_state: int | None = None, n_epochs: int = 200, transform: Callable | None = None)
Make a ParametricUMAPDataset instance, given the path to a parametric UMAP dataset.
- Parameters:
dataset_path (str or pathlib.Path): Path to the directory that represents a parametric UMAP dataset, as created by vak.prep.prep_parametric_umap_dataset().
split (str): The name of a split from the dataset, one of {'train', 'val', 'test'}.
subset (str, optional): Name of subset to use. If specified, this takes precedence over split. Subsets are typically taken from the training data for use when generating a learning curve.
n_neighbors (int): Number of nearest neighbors to use when computing approximate nearest neighbors. Default is 10. Parameter passed to pynndescent.NNDescent and umap.umap_.fuzzy_simplicial_set().
metric (str): Distance metric. Default is 'euclidean'. Parameter passed to pynndescent.NNDescent and umap.umap_.fuzzy_simplicial_set().
random_state (int or numpy.random.RandomState, optional): An integer seed, a numpy.random.RandomState instance, or None. Default is None.
n_epochs (int): Number of epochs for which the model will be trained. Default is 200.
transform (callable, optional): The transform applied to the input to the neural network \(x\).
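To make the roles of n_neighbors and metric concrete, the sketch below finds nearest neighbors by brute force in pure Python. It is not what pynndescent.NNDescent does internally (that library computes approximate neighbors); the functions euclidean and nearest_neighbors are hypothetical names used only for illustration, and the sketch excludes each point from its own neighbor list as a simplification.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(points, n_neighbors, metric=euclidean):
    """For each point, return the indices of its n_neighbors closest other points."""
    neighbors = []
    for i, p in enumerate(points):
        # Distance from point i to every other point, paired with that point's index.
        dists = [(metric(p, q), j) for j, q in enumerate(points) if j != i]
        dists.sort()
        neighbors.append([j for _, j in dists[:n_neighbors]])
    return neighbors


# Four made-up 2-D points; the last one is far from the others.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
knn = nearest_neighbors(points, n_neighbors=2)
```

A larger n_neighbors connects each point to more of the dataset, and a different metric changes which points count as close.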
- Returns:
dataset
- Return type:
vak.datasets.parametric_umap.ParametricUMAPDataset
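A call to from_dataset_path might look like the sketch below. It assumes that vak is installed and that the class can be imported from vak.datasets.parametric_umap, as the return type above suggests. The wrapper name load_train_dataset is hypothetical, and the keyword values simply repeat the defaults shown in the signature.

```python
from pathlib import Path
from typing import Union


def load_train_dataset(dataset_path: Union[str, Path]):
    """Build a ParametricUMAPDataset for the 'train' split (hypothetical wrapper)."""
    # Imported inside the function so that defining it does not require vak;
    # calling it does.
    from vak.datasets.parametric_umap import ParametricUMAPDataset

    return ParametricUMAPDataset.from_dataset_path(
        dataset_path=Path(dataset_path),
        split="train",
        n_neighbors=10,      # default from the signature
        metric="euclidean",  # default from the signature
        n_epochs=200,        # default from the signature
    )
```

The wrapper expects dataset_path to point to a directory created by vak.prep.prep_parametric_umap_dataset(), as described in the parameter list.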