vak.datasets.parametric_umap.parametric_umap.get_umap_graph

vak.datasets.parametric_umap.parametric_umap.get_umap_graph(X: ndarray[Any, dtype[_ScalarType_co]], n_neighbors: int = 10, metric: str = 'euclidean', random_state: RandomState | None = None, max_candidates: int = 60, verbose: bool = True) → coo_matrix

Get graph used by UMAP, the fuzzy topological representation.

Parameters:
  • X (numpy.ndarray) – Data from which to build the graph.

  • n_neighbors (int) – Number of nearest neighbors to use when computing approximate nearest neighbors. Default is 10. Parameter passed to pynndescent.NNDescent and umap._umap.fuzzy_simplicial_set().

  • metric (str) – Distance metric. Default is "euclidean". Parameter passed to pynndescent.NNDescent and umap._umap.fuzzy_simplicial_set().

  • random_state (numpy.random.RandomState or None) – Either a numpy.random.RandomState instance, or None. Default is None.

  • max_candidates (int) – Default is 60. Parameter passed to pynndescent.NNDescent.

  • verbose (bool) – Whether pynndescent.NNDescent should log finding the approximate nearest neighbors. Default is True.

Returns:

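graph – Sparse graph of edge probabilities between points, the fuzzy topological representation used by UMAP.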

Return type:

scipy.sparse.csr_matrix
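
As a minimal usage sketch, the function can be called on a random data matrix. This assumes vak and its parametric UMAP dependencies are installed; the helper name example_graph and its n_samples, n_features, and seed arguments are hypothetical and exist only for illustration.

```python
def example_graph(n_samples=200, n_features=16, n_neighbors=10, seed=42):
    """Build a UMAP graph from random data (illustrative helper, not part of vak)."""
    # Imported inside the function so this module still loads
    # in environments where vak is not installed.
    import numpy as np
    from vak.datasets.parametric_umap.parametric_umap import get_umap_graph

    # One RandomState is reused for the data and for the graph construction.
    rng = np.random.RandomState(seed)
    X = rng.rand(n_samples, n_features)  # rows are samples, columns are features

    return get_umap_graph(
        X,
        n_neighbors=n_neighbors,
        metric="euclidean",
        random_state=rng,
        max_candidates=60,
        verbose=False,  # silence pynndescent.NNDescent logging
    )
```

Calling example_graph() should return a square sparse matrix with one row and one column per sample, whose nonzero entries are the edge probabilities described in the Notes.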

Notes

Adapted from https://github.com/timsainb/ParametricUMAP_paper

The graph returned holds, for each pair of points, the probability that an edge exists between them.

Local, one-directional probabilities (\(P^{UMAP}_{i|j}\)) are computed between a point and its neighbors to determine the probability with which an edge (or simplex) exists, based on the assumption that the data is uniformly distributed across a manifold in a warped data space. Under this assumption, a local notion of distance is set by the distance to the \(k^{th}\) nearest neighbor, and the local probability is scaled by that local notion of distance.
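
As a sketch of the standard UMAP form of this local probability, written in the notation of this section: \(d(x_i, x_j)\) is an assumed symbol for the distance between \(x_i\) and \(x_j\) under the chosen metric, and the clamp at zero reflects that the nearest neighbor receives probability 1.

\[
P^{UMAP}_{i|j} = \exp\left( - \frac{\max\left(0,\; d(x_i, x_j) - \rho_{i}\right)}{\sigma_{i}} \right)
\]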

In this probability, \(\rho_{i}\) is a local connectivity parameter set to the distance from \(x_i\) to its nearest neighbor, and \(\sigma_{i}\) is a local connectivity parameter set to match the local distance around \(x_i\), based on its \(k\) nearest neighbors (where \(k\) is a hyperparameter). In the UMAP package, these are calculated using umap._umap.smooth_knn_dist().
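
The two parameters can be illustrated for a single point with a short pure-Python sketch. This is a simplified illustration, not the implementation in umap._umap.smooth_knn_dist(): it assumes the input is the list of distances from \(x_i\) to its \(k\) nearest neighbors (excluding \(x_i\) itself, with \(k \geq 2\)), and it picks \(\sigma_{i}\) by binary search so that the local probabilities sum to \(\log_2 k\), the calibration target used by UMAP. The function names smooth_knn_params and local_probabilities are hypothetical.

```python
import math


def smooth_knn_params(knn_dists, n_iter=64, tol=1e-5):
    """Return (rho, sigma) for one point from its k neighbor distances."""
    target = math.log2(len(knn_dists))
    # rho: distance to the nearest neighbor (smallest positive distance).
    positive = [d for d in knn_dists if d > 0.0]
    rho = min(positive) if positive else 0.0

    # Binary search for sigma: the summed probability grows with sigma.
    lo, hi, sigma = 0.0, math.inf, 1.0
    for _ in range(n_iter):
        total = sum(math.exp(-max(0.0, d - rho) / sigma) for d in knn_dists)
        if abs(total - target) < tol:
            break
        if total > target:
            # sigma is too large: shrink toward the lower bound.
            hi = sigma
            sigma = (lo + hi) / 2.0
        else:
            # sigma is too small: double until an upper bound is found,
            # then bisect.
            lo = sigma
            sigma = sigma * 2.0 if hi == math.inf else (lo + hi) / 2.0
    return rho, sigma


def local_probabilities(knn_dists, rho, sigma):
    """One-directional edge probabilities from a point to its neighbors."""
    return [math.exp(-max(0.0, d - rho) / sigma) for d in knn_dists]


dists = [0.5, 1.0, 1.5, 2.0]  # made-up distances to k = 4 neighbors
rho, sigma = smooth_knn_params(dists)
probs = local_probabilities(dists, rho, sigma)
```

With these made-up distances, rho is the nearest-neighbor distance, so the nearest neighbor gets probability 1 and farther neighbors get smaller values, scaled by sigma.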