t-SNE is a nonlinear dimensionality reduction algorithm developed by Laurens van der Maaten and Geoffrey Hinton.
The name of the algorithm comes from its incorporation of the "t" distribution and stochastic neighbour embedding.
t-SNE does not provide a transformation model: it optimizes the output coordinates directly to minimize the cost function, so only the data the algorithm was just fitted to can be embedded; new points cannot be mapped afterwards.
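For example, scikit-learn's TSNE (used later in this notebook) exposes fit_transform but no transform method, so there is no way to embed points that were not part of the fit. A quick check, assuming a reasonably recent scikit-learn version:

from sklearn.manifold import TSNE

tsne = TSNE(n_components = 2)
print(hasattr(tsne, 'fit_transform'))  # True: can embed the fitted data
print(hasattr(tsne, 'transform'))      # False: no mapping for new points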
t-SNE tries to preserve the pairwise similarities between input vectors, i.e. the local neighbourhood structure.
1) Create a probability distribution p(i, j) where sigma is a hyperparameter:
\begin{equation*} p_{ij} = \frac{\exp\left(-\left\| x_i - x_j\right\|^2 \big/ 2\sigma^2\right)}{\displaystyle\sum_{k \neq l} \exp\left(-\left\| x_k - x_l\right\|^2 \big/ 2\sigma^2\right)} \end{equation*}
2) Randomly initialize a low-dimensional mapping Y (an N by k matrix, where k << D) and define q(i, j) as:
\begin{equation*} q_{ij} = \frac{\exp\left(-\left\| y_i - y_j\right\|^2 \right)}{\displaystyle\sum_{k \neq l} \exp\left(-\left\| y_k - y_l\right\|^2 \right)} \end{equation*}
With symmetric SNE, it does not matter how far a point is from itself, thus: \begin{equation*} p_{ii} = q_{ii} = 0 \end{equation*}
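Both distributions can be written down directly in NumPy. The sketch below is only illustrative: the helper names and the single global sigma are simplifications for a toy example, not part of any library.

import numpy as np

def pairwise_affinities(X, sigma = 1.0):
    # p_ij: Gaussian similarity of every pair of high-dimensional points,
    # normalized over all pairs, with the diagonal forced to zero
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis = -1)
    P = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(P, 0.0)
    return P / P.sum()

def low_dim_affinities(Y):
    # q_ij: the same construction on the low-dimensional map (no sigma)
    sq_dists = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis = -1)
    Q = np.exp(-sq_dists)
    np.fill_diagonal(Q, 0.0)
    return Q / Q.sum()

X_toy = np.random.randn(5, 10)         # N = 5 points in D = 10 dimensions
Y_toy = np.random.randn(5, 2) * 1e-4   # random low-dimensional initialization
P = pairwise_affinities(X_toy)
Q = low_dim_affinities(Y_toy)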
The goal is to make q(i, j) as close to p(i, j) as possible.
In order to compare two probability distributions, we can use the Kullback-Leibler (KL) divergence: \begin{equation*} C = KL(P \| Q) = \sum_{i} \sum_{j} p_{ij} \log \frac{p_{ij}}{q_{ij}} \end{equation*}
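Continuing the toy sketch above, the cost is a single sum over all pairs; the small epsilon only guards against log(0) on the zeroed diagonal:

eps = 1e-12
kl_divergence = np.sum(P * np.log((P + eps) / (Q + eps)))
print(kl_divergence)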
An approximate solution is obtained by computing the gradient of the cost function C and applying gradient descent: \begin{equation*} \frac{\partial C}{\partial y_{i}} = 4 \sum_{j} (p_{ij}-q_{ij})(y_{i} - y_{j}) \end{equation*}
This model has no weights, so the update affects the mapping directly.
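A plain gradient-descent loop on the toy example, following the gradient formula above; the learning rate and number of steps are arbitrary choices, and real t-SNE implementations add tricks such as momentum and early exaggeration that are omitted here:

learning_rate = 10.0
for step in range(200):
    Q = low_dim_affinities(Y_toy)
    diff = Y_toy[:, None, :] - Y_toy[None, :, :]             # y_i - y_j, shape (N, N, 2)
    grad = 4.0 * np.sum((P - Q)[:, :, None] * diff, axis = 1)
    Y_toy -= learning_rate * grad                             # update the mapping directly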
Symmetric SNE suffers from the crowding problem: points that are moderately far apart in the high-dimensional space get squeezed together in the low-dimensional map, because there is not enough room in a few dimensions to keep all those distances faithful.
t-SNE addresses this by using a Student t-distribution (with one degree of freedom) for the q mapping:
\begin{equation*} q_{ij} = \frac{\left(1+\left\| y_i - y_j\right\|^2 \right)^{-1}}{\displaystyle\sum_{k \neq l} \left(1+\left\| y_k - y_l\right\|^2 \right)^{-1}} \end{equation*}And a different, symmetrized p, built from conditional probabilities with a per-point bandwidth \sigma_i (set through the perplexity): \begin{equation*} p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n} \end{equation*} \begin{equation*} p_{j|i} = \frac{\exp\left(-\left\| x_i - x_j\right\|^2 \big/ 2\sigma_i^2\right)}{\displaystyle\sum_{k \neq i} \exp\left(-\left\| x_i - x_k\right\|^2 \big/ 2\sigma_i^2\right)} \end{equation*}
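In the toy sketch this only changes the low-dimensional kernel: the Gaussian is swapped for a Student-t kernel with one degree of freedom, whose heavier tails give moderately distant points more room in the map (this is what counters the crowding problem). A hypothetical drop-in replacement for the earlier helper:

def low_dim_affinities_tsne(Y):
    # q_ij with a Student-t kernel (one degree of freedom) instead of a Gaussian
    sq_dists = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis = -1)
    Q = 1.0 / (1.0 + sq_dists)
    np.fill_diagonal(Q, 0.0)
    return Q / Q.sum()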
In [1]:
import numpy as np
In [2]:
# Define the centers of the clouds: the eight corners of a cube centered at the origin
centers = np.array([[ 1, 1, 1],
                    [ 1, 1, -1],
                    [ 1, -1, 1],
                    [ 1, -1, -1],
                    [-1, 1, 1],
                    [-1, 1, -1],
                    [-1, -1, 1],
                    [-1, -1, -1]]) * 3
In [3]:
# Create the clouds: Gaussian samples centered at each of the points in centers
data = []
points_per_cloud = 200
for c in centers:
    cloud = np.random.randn(points_per_cloud, 3) + c
    data.append(cloud)
data = np.concatenate(data)
In [4]:
# Generate colors for better visualization
colors = np.array([[i] * points_per_cloud for i in range(len(centers))]).flatten()
In [5]:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure(figsize = (14, 14))
ax = fig.add_subplot(111, projection='3d')
x = data[:, 0]
y = data[:, 1]
z = data[:, 2]
ax.scatter(x, y, z, c = colors)
plt.show()
In [6]:
## T-SNE
from sklearn.manifold import TSNE
tsne = TSNE(n_components = 2,
            verbose = 1,
            method = 'barnes_hut')
transformed = tsne.fit_transform(data)
In [7]:
new_x = transformed[:, 0]
new_y = transformed[:, 1]
In [8]:
plt.figure(figsize = (14, 14))
plt.scatter(new_x, new_y, c = colors)
plt.show()
In [9]:
# Generate the donut data
N = 600
R_inner = 10
R_outer = 20
# distance from origin is radius + random normal
# angle theta is uniformly distributed between (0, 2pi)
R1 = np.random.randn(N // 2) + R_inner
theta = 2*np.pi*np.random.random(N // 2)
X_inner = np.concatenate([[R1 * np.cos(theta)], [R1 * np.sin(theta)]]).T
R2 = np.random.randn(N // 2) + R_outer
theta = 2*np.pi*np.random.random(N//2)
X_outer = np.concatenate([[R2 * np.cos(theta)], [R2 * np.sin(theta)]]).T
X = np.concatenate([X_inner, X_outer])
Y = np.array([0]*(N // 2) + [1]*(N // 2))
In [10]:
plt.figure(figsize = (12, 12))
plt.scatter(X[:,0], X[:,1], s = 100, c = Y, alpha = 0.5)
plt.show()
In [11]:
# Perform T-SNE manifold on the data
# From http://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html :
# "The perplexity is related to the number of nearest neighbors that is used in other manifold learning algorithms.
# Larger datasets usually require a larger perplexity. Consider selecting a value between 5 and 50.
# The choice is not extremely critical since t-SNE is quite insensitive to this parameter."
tsne = TSNE(perplexity = 40)
Z = tsne.fit_transform(X)
In [12]:
plt.figure(figsize = (12, 12))
plt.scatter(Z[:,0],
            Z[:,1],
            s = 100,
            c = Y,
            alpha = 0.5)
plt.show()
In [13]:
# Generate XOR-like data: four unit squares, one per quadrant, with opposite quadrants sharing a label
X1 = np.random.random((100, 2))
X2 = np.random.random((100, 2)) - np.array([1, 1])
X3 = np.random.random((100, 2)) - np.array([1, 0])
X4 = np.random.random((100, 2)) - np.array([0, 1])
X = np.vstack((X1, X2, X3, X4))
Y = np.array([0] * 200 + [1] * 200)
In [14]:
# Visualize the data
plt.figure(figsize = (12, 12))
plt.scatter(X[:,0],
            X[:,1],
            s = 100,
            c = Y,
            alpha = 0.5)
plt.show()
In [15]:
tsne = TSNE(perplexity = 40)
Z = tsne.fit_transform(X)
In [16]:
plt.figure(figsize = (12, 12))
plt.scatter(Z[:,0],
            Z[:,1],
            s = 100,
            c = Y,
            alpha = 0.5)
plt.show()