Notes on: McInnes, L., & Healy, J. (2018): Umap: uniform manifold approximation and projection for dimension reduction

Table of Contents

1 Notation

  • \(X = \left\{ X_1, \dots, X_N \right\}\) is the input data
  • \(\mathcal{M}\) is the manifold which data is assumed to lie in
  • \(p \in \mathcal{M}\) is a point in manifold
  • \(g_p\) is an inner product on the tangent space \(T_p \mathcal{M}\)

2 Overview

  • Dimension reduction algorithms usually in two categories:
    • Preserve distrance structure within the data
      • E.g. PCA, MDS, and Sammon mapping
    • Preservation of local distances over global distances
      • E.g. t-SNE, Isomap, LargeVis, Laplacian eigenmaps, diffusion maps, NeRV, and JSE

3 UMAP Algorithm

  1. Local manifold approximations and patches together their local fuzzy simplicial set representations
  2. This constructs topological representation of high-dimensional data
  3. Given low-dimensional representation of data, use same process to construct equivalent topological representation
  4. Optimize layout of the data representation in the low dimensional space, minimizing the cross-entropy between the two topological representations

Construction of fuzzy topological representations consists of two problems:

  1. approximating a manifold on which data is assumed to lie
  2. constructing a fuzzy simplicial set representation of the approximated manifold

3.1 Estimating manifold

  • Theoretical reasons => useful to assume data is distributed uniformly on the manifold
    • Not reality; but can equip manifold with Riemannian metric such that data is uniformly distributed with respect to this constructed Riemannin metric!
    • Does so by adapting the metric locally to each \(X_i\)
    • Pro: can satisfy the uniform assumption
    • Con: different types of data might want to use different notions of distance
      • E.g. categorical using Hamming distance
      • Can be solved by converting metric spaces into fuzzy simplicial sets

3.2 Fuzzy topological representations

  • Will be used as means to merge the incompatible local views of the data

The category \(\Delta\) has as objects the finite order sets \([n] = \left\{ 1, \dots, n \right\}\) with morphisms given by (non-strictly) order-preserving maps.

A simplical set is a function from \(\Delta^{\text{op}}\) to Sets, the category of sets.