This notebook demonstrates how, given a query of two images, to find a sequence of images between them whose visual changes are as gradual as possible.
The methodology is inspired by and (roughly) reverse-engineered from X Degrees of Separation, a work by Mario Klingemann and Simon Doury for Google Arts & Culture Experiments. X Degrees of Separation finds smooth paths through images taken from museum collections. Mario gave me some hints to his strategy :) and I've made an approximate implementation here.
Some screenshots from their original work:
The approach to the problem is the following:
1) Analyze all images in your dataset, extracting a feature vector for each one from the pre-classification fully-connected layer of a trained convolutional neural network. This part is already done in the image-search notebook.
2) For each image, find its k nearest neighbors, i.e. the k neighboring images which have the smallest cosine distance from it.
3) Build a graph whose vertices are the images and whose edges are the connections between nearest-neighbors found in step 2.
4) At run-time, given a query of two images, find the shortest path through the graph between them, using the cosine distance as the distance metric. Because the graph is not fully connected (each image has edges only to its nearest neighbors), the search does not simply return the trivial direct path between the two endpoints.
The upside to this strategy is that it's relatively straightforward, easy to implement, and fast at run-time. The downside is that it doesn't guarantee that there is a path between any two images, since it is possible for some regions of the image space to become isolated from the larger grid if they are too self-similar. Additionally, for image sets which are unevenly distributed, the simple kNN threshold may produce very densely connected clusters. There are more sophisticated approaches which try to ensure that individual edges are not too long or too short, but for the sake of simplicity, we take the easier kNN approach.
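To make the distance metric concrete: throughout this notebook, the "distance" between two images means the cosine distance between their feature vectors. Below is a tiny illustration with made-up vectors (not the notebook's actual features), just to show what scipy computes for us.
In [ ]:
import numpy as np
from scipy.spatial import distance

# two made-up feature vectors, standing in for the PCA features used later
a = np.array([0.9, 0.1, 0.3])
b = np.array([0.8, 0.2, 0.4])

# cosine distance = 1 - cosine similarity
manual = 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(manual)
print(distance.cosine(a, b))  # scipy gives the same value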
This notebook assumes you have already extracted features for a directory of images and saved them to disk. If you haven't done this yet, see the previous notebook (image_search.ipynb), which shows you how to do this.
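If you only want a rough idea of what that feature-extraction step looks like, here is a minimal, hypothetical sketch: it assumes Keras with a pretrained VGG16 (using its fc2 layer as the "pre-classification fully-connected layer"), scikit-learn for the PCA, and an example image directory path. The actual image_search.ipynb notebook is the reference and its details may differ.
In [ ]:
# Hypothetical sketch only -- see image_search.ipynb for the real pipeline.
import glob
import pickle
import numpy as np
from keras.applications.vgg16 import VGG16, preprocess_input
from keras.preprocessing import image
from keras.models import Model
from sklearn.decomposition import PCA

# example path -- point this at your own image directory
image_paths = glob.glob('../data/101_ObjectCategories/**/*.jpg', recursive=True)

base_model = VGG16(weights='imagenet', include_top=True)
# activations of the last fully-connected layer before the classifier
feat_extractor = Model(inputs=base_model.input, outputs=base_model.get_layer('fc2').output)

def get_feature_vector(img_path):
    img = image.load_img(img_path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return feat_extractor.predict(x)[0]

features = np.array([get_feature_vector(p) for p in image_paths])
pca = PCA(n_components=300)   # the number of components is an arbitrary choice here
pca_features = pca.fit_transform(features)
pickle.dump([image_paths, pca_features, pca], open('../data/features_caltech101.p', 'wb'))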
Now we can begin. First, make sure the following imports all run correctly.
In [1]:
%matplotlib inline
import os
import random
import numpy as np
import pickle
import matplotlib.pyplot
from matplotlib.pyplot import imshow
from PIL import Image
from scipy.spatial import distance
from igraph import *
from tqdm import tqdm
Next, open your saved feature vectors with pickle, and ensure the images are in the correct paths.
In [2]:
images, pca_features, pca = pickle.load(open('../data/features_caltech101.p', 'rb'))
for i, f in list(zip(images, pca_features))[0:5]:
    print("image: %s, features: %0.2f,%0.2f,%0.2f,%0.2f... "%(i, f[0], f[1], f[2], f[3]))
The following cell is optional. If you wish to restrict your graph to a smaller set of images (perhaps for testing purposes), this cell will take a random subsample of your image set of size num_images. Set this number however you wish, or skip the cell if you intend to use all the images.
In [22]:
num_images = 10000
if len(images) > num_images:
    sort_order = sorted(random.sample(range(len(images)), num_images))
    images = [images[i] for i in sort_order]
    pca_features = [pca_features[i] for i in sort_order]
Next, we are going to build our graph. The graph will contain one vertex for every image. The edges of the graph are found by taking the k nearest neighbor images to each image in the set and adding an edge between them, weighted by their cosine distance.
Here we set kNN = 30, which means that for each image we keep edges to its 30 nearest neighbors. There are tradeoffs between lower and higher values of kNN. If kNN is very low, it helps to ensure that consecutive images in a path are as similar as possible, at the expense of making paths longer (more nodes traversed), since a low kNN restricts edges to a very small number of neighbors. It also increases the risk that some regions are not connected to the rest of the graph and are therefore unreachable in a query. If kNN is high, more nodes are connected to each other, so paths will be shorter and more likely to exist, but there may be more visually dissimilar jumps between consecutive images.
In [53]:
kNN = 30
graph = Graph(directed=True)
graph.add_vertices(len(images))
for i in tqdm(range(len(images))):
    # cosine distance from image i to every other image
    distances = [ distance.cosine(pca_features[i], feat) for feat in pca_features ]
    # indices of the kNN closest images (skipping position 0, which is image i itself)
    idx_kNN = sorted(range(len(distances)), key=lambda k: distances[k])[1:kNN+1]
    for j in idx_kNN:
        graph.add_edge(i, j, weight=distances[j])
summary(graph)
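The loop above computes all pairwise distances one scipy call at a time, which can get slow for larger image sets. As an optional aside (not part of the original notebook), here is a sketch of a vectorized, drop-in alternative using scipy's cdist and igraph's batched add_edges; it assumes pca_features fits in memory as a single numpy array.
In [ ]:
import numpy as np
from scipy.spatial.distance import cdist

feats = np.array(pca_features)
dist_matrix = cdist(feats, feats, metric='cosine')   # all pairwise cosine distances at once

graph = Graph(directed=True)
graph.add_vertices(len(images))
edges, weights = [], []
for i in range(len(images)):
    # skip position 0 of the argsort, which is image i itself (distance 0)
    for j in np.argsort(dist_matrix[i])[1:kNN+1]:
        edges.append((i, int(j)))
        weights.append(dist_matrix[i, j])
graph.add_edges(edges)          # batched edge insertion is much faster in igraph
graph.es['weight'] = weights
summary(graph)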
Once the graph has been built, we can save it along with the image paths to disk, so we can load and use them later.
In [54]:
pickle.dump([images, graph], open('../data/graph_caltech101_30knn.p', 'wb'))
Later, we can retrieve them in the following way (uncomment the following line).
In [3]:
#images, graph = pickle.load(open('../data/graph_caltech101_30knn.p', 'rb'))
For the sake of convenience, we define a helper function which will concatenate a sequence of images into a single image so we can display the sequences in this notebook. The helper function takes a thumb_height argument and resizes all the images so that they have that height.
In [149]:
def get_concatenated_images(indexes, thumb_height):
    thumbs = []
    for idx in indexes:
        img = Image.open(images[idx])
        img = img.convert('RGB')
        # resize to thumb_height, preserving aspect ratio
        img = img.resize((int(img.width * thumb_height / img.height), thumb_height))
        thumbs.append(img)
    # stitch the thumbnails together horizontally
    concat_image = np.concatenate([np.asarray(t) for t in thumbs], axis=1)
    return concat_image
Now we can do a query. We'll just pick two random indices (idx1 and idx2) and run igraph's get_shortest_paths method, using the cosine distances (the 'weight' edge attribute) as the weights.
In [24]:
# pick two random indices
idx1 = int(len(images) * random.random())
idx2 = int(len(images) * random.random())
# run get_shortest_paths
path = graph.get_shortest_paths(idx1, to=idx2, mode=OUT, output='vpath', weights='weight')[0]
# retrieve the images, concatenate into one, and display them
results_image = get_concatenated_images(path, 200)
matplotlib.pyplot.figure(figsize = (16,12))
imshow(results_image)
Out[24]:
Not too bad! Let's try again...
In [86]:
# pick two random indices
idx1 = int(len(images) * random.random())
idx2 = int(len(images) * random.random())
# run get_shortest_paths
path = graph.get_shortest_paths(idx1, to=idx2, mode=OUT, output='vpath', weights='weight')[0]
# retrieve the images, concatenate into one, and display them
results_image = get_concatenated_images(path, 200)
matplotlib.pyplot.figure(figsize = (16,12))
imshow(results_image)
Out[86]:
Depending on how low you set kNN, you may occasionally find that no path exists between two isolated nodes. But with kNN >= 30 this appears to be a rare event.
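If you want to check for this up front rather than discovering it at query time, one option (a sketch, not part of the original notebook) is to look at the weakly connected components of the graph and to guard against an empty result from get_shortest_paths:
In [ ]:
# assumes 'graph', 'images', idx1, idx2 and get_concatenated_images() from the cells above
components = graph.clusters(mode=WEAK)   # weakly connected components
print("components: %d, largest contains %d images" % (len(components), max(components.sizes())))

path = graph.get_shortest_paths(idx1, to=idx2, mode=OUT, output='vpath', weights='weight')[0]
if not path:
    print("no path found between %d and %d -- try rebuilding the graph with a higher kNN" % (idx1, idx2))
else:
    imshow(get_concatenated_images(path, 200))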
Not all paths are going to be very interesting. This depends greatly on the character of your image set. Some image sets (like CalTech-101 or CalTech-256) are very highly clustered around object categories, which means there may not always be good candidates between two image classes. Datasets which have more variety tend to work better for this reason.
Here are some examples of paths found in CalTech-101: