Manifolds are multi-dimensional surfaces that look flat when you zoom in very close but are curved when viewed from far away. The classic example of a manifold is a torus, or "donut": if a coffee mug were made of soft clay, you could reshape it into a donut without tearing it, preserving the essential feature - the handle or "hole."
The idea behind manifold embedding algorithms is to preserve as much of the manifold's high-dimensional structure as possible while plotting the data in two dimensions.
The goal of these algorithms is simple to state: we want to convert each point in high dimensions to a point in two dimensions.
Concretely, you can think of converting sample $i$'s $N$-dimensional gene expression vector $x_i$ into a length-2 vector that gives its position in the plot.
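In symbols, one way to write this mapping is shown below. The names $y_i$ (the 2D point for sample $i$) and $n$ (the number of samples) are notation introduced here for illustration.

$$
x_i \in \mathbb{R}^N \;\longmapsto\; y_i \in \mathbb{R}^2, \qquad i = 1, \dots, n
$$

Each algorithm differs in how it chooses the $y_i$, that is, in which aspects of the high-dimensional structure it tries hardest to keep.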
We'll compare multidimensional scaling (MDS) and t-distributed stochastic neighbor embedding (t-SNE) side by side after a brief introduction to each.
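Multidimensional scaling (MDS) is an algorithm that tries to preserve all pairwise distances between the points in the dataset as faithfully as possible. In general, the distances cannot all be preserved exactly in two dimensions, so MDS looks for the 2D layout that distorts them the least.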
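As a minimal sketch of the distance-preserving idea, here is classical (Torgerson) MDS written with NumPy. This is one variant of MDS and is not necessarily the one used by `explore_manifold`; the helper names `pairwise_distances` and `classical_mds` and the toy data are made up for illustration.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance between every pair of rows in X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(X, n_components=2):
    """Embed the rows of X in n_components dimensions with classical MDS."""
    D = pairwise_distances(X)
    n = D.shape[0]
    # Centering matrix: subtracts row and column means
    J = np.eye(n) - np.ones((n, n)) / n
    # Double-centered squared distances behave like a matrix of inner products
    B = -0.5 * J @ (D ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest ones
    scale = np.sqrt(np.clip(eigvals[order], 0, None))
    return eigvecs[:, order] * scale


# Toy data (an assumption for this sketch): 30 samples that lie on a flat
# 2D plane sitting inside a 10-dimensional space
rng = np.random.default_rng(0)
Z = rng.normal(size=(30, 2))
A = rng.normal(size=(2, 10))
X = Z @ A

Y = classical_mds(X)                 # shape (30, 2)
D_high = pairwise_distances(X)       # distances in 10 dimensions
D_low = pairwise_distances(Y)        # distances in the 2D embedding
max_error = np.abs(D_high - D_low).max()
```

Because this toy data really does lie on a flat 2D plane, the 2D distances match the original ones up to floating-point error. On real gene expression data some distortion is unavoidable, and `max_error` would not be close to zero.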
t-SNE has a related goal but a different emphasis. Instead of trying to preserve every pairwise distance, it prioritizes local neighborhoods: things that are close together in the high-dimensional data should stay close together in 2D. Things that were far apart in high dimensions tend to stay separated as well, but the sizes of the gaps between distant groups are not reliably preserved.
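To see why close neighbors matter so much in t-SNE, here is a rough sketch of the first step of the algorithm: turning high-dimensional distances into "neighbor probabilities" with a Gaussian kernel. It is not a full t-SNE implementation. The function name `neighbor_probabilities`, the toy points, and the single fixed `sigma` are assumptions for illustration; real t-SNE tunes a separate bandwidth for each point from the perplexity setting.

```python
import numpy as np


def neighbor_probabilities(X, sigma=1.0):
    """Row i holds p(j | i): how strongly point i picks point j as a neighbor."""
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    # Gaussian kernel on squared distances (fixed sigma is a simplification)
    logits = -sq_dist / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)  # a point is never its own neighbor
    logits -= logits.max(axis=1, keepdims=True)  # for numerical stability
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)  # each row sums to 1


# Toy data: two tight pairs of points that are far from each other
X = np.array([
    [0.0, 0.0],
    [0.5, 0.0],
    [10.0, 0.0],
    [10.5, 0.0],
])
P = neighbor_probabilities(X)
```

In `P`, almost all of each point's probability goes to its nearby partner, and the far-away pair gets a weight that is practically zero. The 2D layout is then chosen to reproduce these probabilities, which is why t-SNE keeps neighbors together but says little about exactly how far apart distant groups should sit.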
In [ ]:
import random
In [ ]:
random.random()
In [ ]:
%load_ext autoreload
%autoreload 2
In [ ]:
%matplotlib inline
from decompositionplots import explore_manifold
explore_manifold()
While you're playing with the sliders above, discuss the questions below.