# Notes on: McInnes, L., & Healy, J. (2018): Umap: uniform manifold approximation and projection for dimension reduction

## Table of Contents

## 1 Notation

- \(X = \left\{ X_1, \dots, X_N \right\}\) is the input data
- \(\mathcal{M}\) is the manifold which data is assumed to lie in
- \(p \in \mathcal{M}\) is a point in manifold
- \(g_p\) is an inner product on the tangent space \(T_p \mathcal{M}\)

## 2 Overview

- Dimension reduction algorithms usually in two categories:
- Preserve distrance structure within the data
- E.g. PCA, MDS, and Sammon mapping

- Preservation of local distances over global distances
- E.g. t-SNE, Isomap, LargeVis, Laplacian eigenmaps, diffusion maps, NeRV, and JSE

- Preserve distrance structure within the data

## 3 UMAP Algorithm

- Local manifold approximations and patches together their local fuzzy simplicial set representations
- This constructs topological representation of high-dimensional data
- Given low-dimensional representation of data, use same process to construct equivalent topological representation
- Optimize layout of the data representation in the low dimensional space, minimizing the cross-entropy between the two topological representations

Construction of **fuzzy topological representations** consists of two problems:

- approximating a manifold on which data is assumed to lie
- constructing a
*fuzzy simplicial set representation*of the approximated manifold

### 3.1 Estimating manifold

- Theoretical reasons => useful to assume data is distributed
*uniformly*on the manifold- Not reality; but can equip manifold with Riemannian metric such that data is uniformly distributed
*with respect to this constructed Riemannin metric*! - Does so by adapting the metric locally to each \(X_i\)
- Pro: can satisfy the uniform assumption
- Con: different types of data might want to use different notions of distance
- E.g. categorical using Hamming distance
- Can be solved by
*converting metric spaces into fuzzy simplicial sets*

- Not reality; but can equip manifold with Riemannian metric such that data is uniformly distributed

### 3.2 Fuzzy topological representations

- Will be used as means to merge the incompatible local views of the data

The category \(\Delta\) has as objects the finite order sets \([n] = \left\{ 1, \dots, n \right\}\) with morphisms given by (non-strictly) order-preserving maps.

A **simplical set** is a function from \(\Delta^{\text{op}}\) to *Sets*, the category of sets.