vak.datapipes.parametric_umap.parametric_umap.get_umap_graph
- vak.datapipes.parametric_umap.parametric_umap.get_umap_graph(X: ndarray[Any, dtype[_ScalarType_co]], n_neighbors: int = 10, metric: str = 'euclidean', random_state: RandomState | None = None, max_candidates: int = 60, verbose: bool = True) → coo_matrix
Get graph used by UMAP, the fuzzy topological representation.
- Parameters:
X (numpy.ndarray) – Data from which to build the graph.
n_neighbors (int) – Number of nearest neighbors to use when computing approximate nearest neighbors. Parameter passed to pynndescent.NNDescent and umap._umap.fuzzy_simplicial_set().
metric (str) – Distance metric. Default is “euclidean”. Parameter passed to pynndescent.NNDescent and umap._umap.fuzzy_simplicial_set().
random_state (numpy.random.RandomState) – Either a numpy.random.RandomState instance, or None.
max_candidates (int) – Default is 60. Parameter passed to pynndescent.NNDescent.
verbose (bool) – Whether pynndescent.NNDescent should log finding the approximate nearest neighbors. Default is True.
- Returns:
graph
- Return type:
scipy.sparse.csr_matrix
Notes
Adapted from https://github.com/timsainb/ParametricUMAP_paper
The graph returned is a graph of the probabilities that an edge exists between points.
Local, one-directional probabilities (\(P^{UMAP}_{i|j}\)) are computed between a point and its neighbors to determine the probability with which an edge (or simplex) exists, based on the assumption that the data is uniformly distributed across a manifold in a warped dataspace. Under this assumption, a local notion of distance is set by the distance to the \(k^{th}\) nearest neighbor, and the local probability is scaled by that local notion of distance.
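The scaling described above is usually written as follows in the UMAP and Parametric UMAP papers. The equation is reproduced here as an assumption about what this note refers to, since it does not appear in the text itself; the subscript order \(j|i\) follows those papers, and \(d\) denotes the chosen distance metric.

\[
p^{UMAP}_{j|i} = \exp\left(-\frac{d(x_i, x_j) - \rho_{i}}{\sigma_{i}}\right)
\]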
Here, \(\rho_{i}\) is a local connectivity parameter set to the distance from \(x_i\) to its nearest neighbor, and \(\sigma_{i}\) is a local connectivity parameter set to match the local distance around \(x_i\) based on its \(k\) nearest neighbors (where \(k\) is a hyperparameter). In the UMAP package, these are calculated using umap._umap.smooth_knn_dist().
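The role of \(\rho_{i}\) and \(\sigma_{i}\) can be illustrated with a minimal pure-Python sketch for a single point. This is a simplified illustration of the idea, not the implementation in umap._umap.smooth_knn_dist(); the function names `local_scale` and `edge_probabilities` are hypothetical. It assumes that \(\sigma_{i}\) is found by binary search so that the local probabilities sum to \(\log_2(k)\), as described in the UMAP paper, and that distances below \(\rho_{i}\) are clamped to zero.

```python
import math


def local_scale(knn_dists, n_iter=64, tol=1e-5):
    """Return (rho, sigma) for one point, given the distances to its k
    nearest neighbors (the point itself excluded).

    Hypothetical helper: a simplified sketch, not umap-learn's code.
    """
    k = len(knn_dists)
    target = math.log2(k)  # assumed target sum, as in the UMAP paper

    # rho: distance to the nearest neighbor (smallest positive distance).
    positive = [d for d in knn_dists if d > 0.0]
    rho = min(positive) if positive else 0.0

    # Binary search for sigma; the sum below grows as sigma grows.
    lo, hi, sigma = 0.0, math.inf, 1.0
    for _ in range(n_iter):
        total = sum(math.exp(-max(d - rho, 0.0) / sigma) for d in knn_dists)
        if abs(total - target) < tol:
            break
        if total > target:
            hi = sigma
            sigma = (lo + hi) / 2.0
        else:
            lo = sigma
            # Double sigma until an upper bound is found, then bisect.
            sigma = sigma * 2.0 if math.isinf(hi) else (lo + hi) / 2.0
    return rho, sigma


def edge_probabilities(knn_dists):
    """One-directional edge probabilities from a point to its neighbors."""
    rho, sigma = local_scale(knn_dists)
    return [math.exp(-max(d - rho, 0.0) / sigma) for d in knn_dists]


# Made-up distances from one point to its k = 4 nearest neighbors.
knn_dists = [0.5, 0.8, 1.1, 1.7]
rho, sigma = local_scale(knn_dists)
probs = edge_probabilities(knn_dists)
```

In this sketch the nearest neighbor always receives probability 1, and the probabilities decay with distance at a rate set by \(\sigma_{i}\), which is the sense in which the local probability is scaled by a local notion of distance.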