Image t-SNE

This notebook will take you through the process of generating a t-SNE of a set of images, using a feature vector for each image derived from the activations of the last fully-connected layer in a pre-trained convolutional neural network (convnet).

Before starting, you must have gone through the notebook that generates feature vectors for a folder of images (8a_image_search.ipynb).

If you'd like to follow the last section, which converts the t-SNE points to a grid assignment, you'll need bmcfee's fork of Mario Klingemann's RasterFairy, which can be installed with pip using the following command.

pip install -U git+https://github.com/bmcfee/RasterFairy/ --user

Now we can begin. Run the following import commands and make sure all the libraries are correctly installed and import without errors.


In [1]:
%matplotlib inline
import os
import random
import numpy as np
import json
import matplotlib.pyplot
import pickle
from matplotlib.pyplot import imshow
from PIL import Image
from sklearn.manifold import TSNE

First, we will load our image paths and feature vectors from the previous notebook (image-search.ipynb) into memory. We can print their contents to get an idea of what they look like. If you have not run the previous notebook, go through it first to generate the saved feature vectors file features_caltech101.p.


In [9]:
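# open the pickle in a with-block so the file handle is closed after loading
with open('../data/features_caltech101.p', 'rb') as pickle_file:
    images, pca_features, pca = pickle.load(pickle_file)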

for img, f in list(zip(images, pca_features))[0:5]:
    print("image: %s, features: %0.2f,%0.2f,%0.2f,%0.2f... "%(img, f[0], f[1], f[2], f[3]))


image: ../data/101_ObjectCategories/dragonfly/image_0012.jpg, features: -1.04,1.21,-5.53,26.57... 
image: ../data/101_ObjectCategories/dragonfly/image_0014.jpg, features: 0.77,-14.03,-10.91,18.38... 
image: ../data/101_ObjectCategories/dragonfly/image_0060.jpg, features: 1.39,-10.45,1.29,9.03... 
image: ../data/101_ObjectCategories/dragonfly/image_0061.jpg, features: 0.99,-12.58,-12.63,26.07... 
image: ../data/101_ObjectCategories/dragonfly/image_0036.jpg, features: -7.05,-9.46,-4.66,7.73... 

The dataset we loaded contains 9144 images. Although t-SNE works in principle with any number of images, it is difficult to place that many tiles in a single image. So we will take a random subset of 1000 images and plot only those. This step is optional; you can skip it or try a different value of num_images_to_plot.


In [3]:
num_images_to_plot = 1000

if len(images) > num_images_to_plot:
    sort_order = sorted(random.sample(range(len(images)), num_images_to_plot))
    images = [images[i] for i in sort_order]
    pca_features = [pca_features[i] for i in sort_order]
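
Because the subset is drawn at random, each run of the cell above picks different images. If you want the same subset every time, one option is to draw the indices from a seeded generator. The sketch below is only an illustration: the helper name sample_subset, the seed value, and the toy file names are assumptions, not part of the original notebook.

```python
import random

def sample_subset(items, features, k, seed=0):
    """Return k items and their matching features, keeping the original order."""
    if len(items) != len(features):
        raise ValueError("items and features must have the same length")
    if len(items) <= k:
        return list(items), list(features)
    rng = random.Random(seed)  # private generator, leaves the global random state alone
    keep = sorted(rng.sample(range(len(items)), k))
    return [items[i] for i in keep], [features[i] for i in keep]

# Toy stand-ins for the image paths and PCA features (hypothetical values).
paths = ["img_%03d.jpg" % i for i in range(10)]
feats = [[float(i), float(i) * 2.0] for i in range(10)]

sub_paths, sub_feats = sample_subset(paths, feats, k=4, seed=42)
print(sub_paths)
```

Calling the helper again with the same seed returns the same subset, which makes it easier to compare t-SNE runs with different parameters.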

It is usually a good idea to first run the vectors through a faster dimensionality reduction technique such as principal component analysis (PCA), projecting the data into an intermediate lower-dimensional space before using t-SNE. This tends to suppress some noise and cuts down on runtime, since PCA is much cheaper to compute than t-SNE. Since we already projected our data down with PCA in the previous notebook, we can proceed straight to running t-SNE on the feature vectors. Run the command in the following cell, taking note of the arguments:

  • n_components is the number of dimensions to project down to. In principle it can be anything, but in practice t-SNE is almost always used to project to 2 or 3 dimensions for visualization purposes.
  • learning_rate is the step size for iterations. You usually won't need to adjust this much, but your results may vary slightly.
  • perplexity is, loosely, the effective number of nearest neighbors each point takes into account, which balances attention between local and global structure. The embedding is relatively robust to changes in this value, and values between 5 and 50 are commonly used.
  • angle controls the speed vs accuracy tradeoff. Lower angle means better accuracy but slower, although in practice, there is usually little improvement below a certain threshold.
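
To make the perplexity argument more concrete: perplexity is defined as 2 raised to the entropy (in bits) of a probability distribution, and t-SNE applies it to each point's distribution over its neighbors. The sketch below is a plain-Python illustration of that definition, not scikit-learn's implementation; the function names and the toy distances are assumptions made for the example.

```python
import math

def perplexity(probs):
    """Perplexity 2**H(p) of a discrete distribution whose entries sum to 1."""
    entropy_bits = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2.0 ** entropy_bits

def neighbor_distribution(sq_dists, sigma):
    """Gaussian-kernel probabilities over one point's neighbors."""
    weights = [math.exp(-d / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]

# A uniform distribution over 30 neighbors has a perplexity of about 30.
uniform_30 = [1.0 / 30] * 30
print(round(perplexity(uniform_30), 6))

# Hypothetical squared distances from one point to 200 neighbors.
sq_dists = [1.0 + 0.5 * i for i in range(200)]
narrow = perplexity(neighbor_distribution(sq_dists, sigma=1.0))
wide = perplexity(neighbor_distribution(sq_dists, sigma=10.0))
print(narrow, wide)
```

A wider kernel spreads probability over more neighbors and so gives a higher perplexity. Setting perplexity=30 asks t-SNE to pick, for each point, a kernel width whose neighbor distribution has a perplexity of roughly 30.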

In [4]:
X = np.array(pca_features)
tsne = TSNE(n_components=2, learning_rate=150, perplexity=30, angle=0.2, verbose=2).fit_transform(X)


[t-SNE] Computing 91 nearest neighbors...
[t-SNE] Indexed 1000 samples in 0.019s...
[t-SNE] Computed neighbors for 1000 samples in 1.191s...
[t-SNE] Computed conditional probabilities for sample 1000 / 1000
[t-SNE] Mean sigma: 16.884794
[t-SNE] Computed conditional probabilities in 0.090s
[t-SNE] Iteration 50: error = 70.5699615, gradient norm = 0.2825119 (50 iterations in 1.522s)
[t-SNE] Iteration 100: error = 70.3222961, gradient norm = 0.2756970 (50 iterations in 1.336s)
[t-SNE] Iteration 150: error = 71.1407242, gradient norm = 0.2418590 (50 iterations in 1.406s)
[t-SNE] Iteration 200: error = 71.1724319, gradient norm = 0.2523526 (50 iterations in 1.334s)
[t-SNE] Iteration 250: error = 71.6720657, gradient norm = 0.2296845 (50 iterations in 1.384s)
[t-SNE] KL divergence after 250 iterations with early exaggeration: 71.672066
[t-SNE] Iteration 300: error = 1.1803343, gradient norm = 0.0025672 (50 iterations in 1.337s)
[t-SNE] Iteration 350: error = 1.0414437, gradient norm = 0.0006574 (50 iterations in 1.160s)
[t-SNE] Iteration 400: error = 1.0122292, gradient norm = 0.0002667 (50 iterations in 1.083s)
[t-SNE] Iteration 450: error = 0.9984583, gradient norm = 0.0001823 (50 iterations in 1.079s)
[t-SNE] Iteration 500: error = 0.9907856, gradient norm = 0.0001434 (50 iterations in 1.081s)
[t-SNE] Iteration 550: error = 0.9862587, gradient norm = 0.0001036 (50 iterations in 1.092s)
[t-SNE] Iteration 600: error = 0.9836794, gradient norm = 0.0000795 (50 iterations in 1.099s)
[t-SNE] Iteration 650: error = 0.9820110, gradient norm = 0.0000699 (50 iterations in 1.106s)
[t-SNE] Iteration 700: error = 0.9808490, gradient norm = 0.0000608 (50 iterations in 1.104s)
[t-SNE] Iteration 750: error = 0.9800025, gradient norm = 0.0001064 (50 iterations in 1.107s)
[t-SNE] Iteration 800: error = 0.9793502, gradient norm = 0.0000455 (50 iterations in 1.105s)
[t-SNE] Iteration 850: error = 0.9788142, gradient norm = 0.0000364 (50 iterations in 1.108s)
[t-SNE] Iteration 900: error = 0.9783841, gradient norm = 0.0000379 (50 iterations in 1.106s)
[t-SNE] Iteration 950: error = 0.9780187, gradient norm = 0.0000293 (50 iterations in 1.108s)
[t-SNE] Iteration 1000: error = 0.9777163, gradient norm = 0.0000262 (50 iterations in 1.110s)
[t-SNE] Error after 1000 iterations: 0.977716

Internally, t-SNE uses an iterative approach, making small (or sometimes large) adjustments to the points at each step. By default, it runs for a maximum of 1000 iterations, and it can stop earlier if the embedding stops improving, meaning it has settled into a locally optimal (good enough) solution. In the run above it used all 1000 iterations.

The variable tsne contains an array of unnormalized 2D points corresponding to the embedding. In the next cell, we normalize the embedding so that it lies entirely in the range [0, 1].


In [5]:
tx, ty = tsne[:,0], tsne[:,1]
tx = (tx-np.min(tx)) / (np.max(tx) - np.min(tx))
ty = (ty-np.min(ty)) / (np.max(ty) - np.min(ty))
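
The same min-max scaling can be wrapped in a small helper that handles both columns at once and guards against a column with zero range, which would otherwise cause a division by zero. This is a sketch on made-up points; the name minmax_normalize is an assumption and not part of the original notebook.

```python
import numpy as np

def minmax_normalize(points):
    """Rescale each column of an (n, d) array to the range [0, 1]."""
    points = np.asarray(points, dtype=float)
    lo = points.min(axis=0)
    span = points.max(axis=0) - lo
    span[span == 0] = 1.0  # a constant column maps to 0 instead of dividing by zero
    return (points - lo) / span

# Made-up embedding: the second column is constant on purpose.
embedding = np.array([[-12.0, 3.0], [4.0, 3.0], [20.0, 3.0]])
normalized = minmax_normalize(embedding)
print(normalized)
```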

Finally, we will compose a new image in which the images are drawn at the positions given by the t-SNE results. Adjust width and height to set the size in pixels of the full image, and set max_dim to the maximum size in pixels of each tile's longest side; larger images are scaled down to fit.
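
The scaling and placement arithmetic described here can be checked in isolation, without opening any images. The two helpers below are a sketch with assumed names (fit_within and paste_offset); they mirror the computation for a single tile using plain numbers.

```python
def fit_within(width, height, max_dim):
    """Shrink (width, height) so the longer side is at most max_dim; never enlarge."""
    scale = max(1.0, width / max_dim, height / max_dim)
    return int(width / scale), int(height / scale)

def paste_offset(x, y, canvas_width, canvas_height, max_dim):
    """Top-left pixel for a tile whose normalized position is (x, y) in [0, 1]."""
    return int((canvas_width - max_dim) * x), int((canvas_height - max_dim) * y)

print(fit_within(300, 200, 100))                # (100, 66)
print(fit_within(80, 60, 100))                  # (80, 60), small images are left alone
print(paste_offset(1.0, 1.0, 4000, 3000, 100))  # (3900, 2900)
```

Subtracting max_dim from the canvas size before multiplying keeps a tile placed at position 1.0 fully inside the image.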


In [14]:
width = 4000
height = 3000
max_dim = 100

full_image = Image.new('RGBA', (width, height))
for img, x, y in zip(images, tx, ty):
    tile = Image.open(img)
    rs = max(1, tile.width/max_dim, tile.height/max_dim)
    tile = tile.resize((int(tile.width/rs), int(tile.height/rs)), Image.LANCZOS)  # Image.ANTIALIAS was removed in Pillow 10
    full_image.paste(tile, (int((width-max_dim)*x), int((height-max_dim)*y)), mask=tile.convert('RGBA'))

matplotlib.pyplot.figure(figsize = (16,12))
imshow(full_image)


Out[14]:
<matplotlib.image.AxesImage at 0x7fc5add40be0>

You can save the image to disk:


In [15]:
full_image.save("example-tSNE-caltech101.png")

Now that we have generated our t-SNE, one more optional step is to take the 2D embedding and assign it to a grid using RasterFairy. We can choose a grid size of nx columns and ny rows, whose product should equal the number of images. If nx * ny is less than the number of images, simply trim the tsne and images lists to nx * ny elements.

If you omit the target=(nx, ny) argument, RasterFairy will automatically choose a grid size that is as square-shaped as possible. RasterFairy also has options for arranging points in grids with irregular borders (see the GitHub page for more details).

Before assigning the grid, you can also save the t-SNE points and their associated image paths for further processing in another environment.


In [17]:
tsne_path = "example-tSNE-points-caltech101.json"

data = [{"path":os.path.abspath(img), "point":[float(x), float(y)]} for img, x, y in zip(images, tx, ty)]
with open(tsne_path, 'w') as outfile:
    json.dump(data, outfile)

print("saved t-SNE result to %s" % tsne_path)


saved t-SNE result to example-tSNE-points-caltech101.json
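
Reading the points back in another environment only needs the json module. The sketch below round-trips a couple of made-up records through a temporary file so that it runs on its own; the paths and coordinates are placeholders, and in practice you would open the file saved above instead.

```python
import json
import os
import tempfile

# Made-up records in the same shape as the saved file: a path plus a 2D point.
records = [
    {"path": "/tmp/a.jpg", "point": [0.0, 1.0]},
    {"path": "/tmp/b.jpg", "point": [0.25, 0.5]},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = os.path.join(tmp_dir, "points.json")  # hypothetical file name
    with open(json_path, "w") as outfile:
        json.dump(records, outfile)
    with open(json_path) as infile:
        loaded = json.load(infile)

# Split the records back into parallel lists, like images, tx and ty.
paths = [entry["path"] for entry in loaded]
xs = [entry["point"][0] for entry in loaded]
ys = [entry["point"][1] for entry in loaded]
print(paths, xs, ys)
```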

In [6]:
import rasterfairy

# nx * ny = 1000, the number of images
nx = 40
ny = 25

# assign to grid
grid_assignment = rasterfairy.transformPointCloud2D(tsne, target=(nx, ny))
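
The grid dimensions above were chosen by hand so that nx * ny equals 1000. If you change the number of images, a small helper can find the factor pair closest to a square. This is a sketch only: squarest_grid is a hypothetical name, and it is not how RasterFairy chooses its own default grid.

```python
import math

def squarest_grid(n):
    """Return (nx, ny) with nx * ny == n, nx >= ny, and the two as close as possible."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    ny = math.isqrt(n)  # start at the square root and walk down to a divisor
    while n % ny != 0:
        ny -= 1
    return n // ny, ny

print(squarest_grid(1000))  # (40, 25)
print(squarest_grid(36))    # (6, 6)
print(squarest_grid(13))    # (13, 1), a prime count only fits a single row
```

When the result is too elongated, as with a prime count, it may be better to drop a few images and trim the lists to a rounder number.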

Finally, we can make a new image of our grid. Set the tile_width and tile_height variables according to how big you want the individual tiles to be. The resolution of the output image is (tile_width * nx) by (tile_height * ny) pixels. The script center-crops every tile to match the aspect ratio tile_width / tile_height.
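
The center-crop step can be expressed as a function that returns the crop box for a given image size and target aspect ratio. The helper below is a sketch with an assumed name (center_crop_box); it returns the box in the (left, top, right, bottom) order that PIL's crop method expects.

```python
def center_crop_box(width, height, target_ratio):
    """Largest centered (left, top, right, bottom) box with width/height == target_ratio."""
    if width / height > target_ratio:
        # Image is too wide: keep the full height and trim the sides equally.
        new_width = target_ratio * height
        margin = 0.5 * (width - new_width)
        return (margin, 0, margin + new_width, height)
    # Image is too tall (or already matches): keep the full width, trim top and bottom.
    new_height = width / target_ratio
    margin = 0.5 * (height - new_height)
    return (0, margin, width, margin + new_height)

ratio = 72 / 56
wide_box = center_crop_box(300, 100, ratio)
tall_box = center_crop_box(100, 300, ratio)
print(wide_box)
print(tall_box)
```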


In [10]:
tile_width = 72
tile_height = 56

full_width = tile_width * nx
full_height = tile_height * ny
aspect_ratio = float(tile_width) / tile_height

grid_image = Image.new('RGB', (full_width, full_height))

for img, grid_pos in zip(images, grid_assignment[0]):
    idx_x, idx_y = grid_pos
    x, y = tile_width * idx_x, tile_height * idx_y
    tile = Image.open(img)
    tile_ar = float(tile.width) / tile.height  # center-crop the tile to match aspect_ratio
    if (tile_ar > aspect_ratio):
        margin = 0.5 * (tile.width - aspect_ratio * tile.height)
        tile = tile.crop((margin, 0, margin + aspect_ratio * tile.height, tile.height))
    else:
        margin = 0.5 * (tile.height - float(tile.width) / aspect_ratio)
        tile = tile.crop((0, margin, tile.width, margin + float(tile.width) / aspect_ratio))
    tile = tile.resize((tile_width, tile_height), Image.LANCZOS)  # Image.ANTIALIAS was removed in Pillow 10
    grid_image.paste(tile, (int(x), int(y)))

matplotlib.pyplot.figure(figsize = (16,12))
imshow(grid_image)


Out[10]:
<matplotlib.image.AxesImage at 0x7f21550485f8>

Finally, we can save the gridded t-SNE to disk as well.


In [11]:
grid_image.save("example-tSNE-grid-caltech101.jpg")