Notes on: McInnes, L., & Healy, J. (2018): Umap: uniform manifold approximation and projection for dimension reduction

1 Notation

• $$X = \left\{ X_1, \dots, X_N \right\}$$ is the input data
• $$\mathcal{M}$$ is the manifold which data is assumed to lie in
• $$p \in \mathcal{M}$$ is a point in manifold
• $$g_p$$ is an inner product on the tangent space $$T_p \mathcal{M}$$

2 Overview

• Dimension reduction algorithms usually in two categories:
• Preserve distrance structure within the data
• E.g. PCA, MDS, and Sammon mapping
• Preservation of local distances over global distances
• E.g. t-SNE, Isomap, LargeVis, Laplacian eigenmaps, diffusion maps, NeRV, and JSE

3 UMAP Algorithm

1. Local manifold approximations and patches together their local fuzzy simplicial set representations
2. This constructs topological representation of high-dimensional data
3. Given low-dimensional representation of data, use same process to construct equivalent topological representation
4. Optimize layout of the data representation in the low dimensional space, minimizing the cross-entropy between the two topological representations

Construction of fuzzy topological representations consists of two problems:

1. approximating a manifold on which data is assumed to lie
2. constructing a fuzzy simplicial set representation of the approximated manifold

3.1 Estimating manifold

• Theoretical reasons => useful to assume data is distributed uniformly on the manifold
• Not reality; but can equip manifold with Riemannian metric such that data is uniformly distributed with respect to this constructed Riemannin metric!
• Does so by adapting the metric locally to each $$X_i$$
• Pro: can satisfy the uniform assumption
• Con: different types of data might want to use different notions of distance
• E.g. categorical using Hamming distance
• Can be solved by converting metric spaces into fuzzy simplicial sets

3.2 Fuzzy topological representations

• Will be used as means to merge the incompatible local views of the data

The category $$\Delta$$ has as objects the finite order sets $$[n] = \left\{ 1, \dots, n \right\}$$ with morphisms given by (non-strictly) order-preserving maps.

A simplical set is a function from $$\Delta^{\text{op}}$$ to Sets, the category of sets.