In [1]:
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt

First let's make sure that we have access to a subset of image files from the PASCAL VOC dataset:


In [2]:
import os.path as op
from zipfile import ZipFile

if not op.exists("images_resize"):
    print('Extracting image files...')
    with ZipFile('images_pascalVOC.zip') as zf:
        zf.extractall('.')


Extracting image files...

Using a pretrained model

Objectives:

  • Load a pre-trained ResNet50 model from the Keras model zoo
  • Build a headless model and compute representations of images
  • Explore the quality of representation with t-SNE
  • Retrain the last layer on a cats vs. dogs dataset

In [3]:
from keras.applications.resnet50 import ResNet50
from keras.models import Model
from keras.preprocessing import image

model = ResNet50(include_top=True, weights='imagenet')


Using TensorFlow backend.

In [4]:
# uncomment to display the full architecture
# model.summary()
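
A lighter alternative (a minimal sketch; `model` is the ResNet50 loaded above) is to list only the last few layers and their output shapes:

for layer in model.layers[-4:]:
    # the network ends with pooling/flatten and a 1000-way classifier
    print(layer.name, layer.output_shape)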

Classification of an image

Exercise

  • Open an image, preprocess it and build a batch of 1 image
  • Use the model to classify this image
  • Decode the predictions using decode_predictions from Keras

Notes:

  • You may use preprocess_input for preprocessing the image.
  • Test your code with "images_resize/000007.jpg"

In [5]:
from scipy.misc import imread, imresize
from keras.applications.imagenet_utils import preprocess_input
from keras.applications.imagenet_utils import decode_predictions

path = "images_resize/000007.jpg"

# TODO

In [6]:
# %load solutions/predict_image.py

from scipy.misc import imread, imresize
from keras.applications.imagenet_utils import preprocess_input
from keras.applications.imagenet_utils import decode_predictions

path = "images_resize/000007.jpg"

img = imread(path)
plt.imshow(img)

img = imresize(img, (224,224)).astype("float32")
# add a dimension for a "batch" of 1 image
img_batch = preprocess_input(img[np.newaxis]) 

predictions = model.predict(img_batch)
decoded_predictions = decode_predictions(predictions)

# each decoded prediction is a (synset_id, class_name, score) triple
for _, name, score in decoded_predictions[0]:
    print(name, score)


convertible 0.8979
sports_car 0.0324119
beach_wagon 0.0223601
grille 0.0137278
car_wheel 0.00822862

Computing the representation of an image


In [7]:
# drop the final classification layer to keep the 2048-d pooled features
input_tensor = model.layers[0].input
output_tensor = model.layers[-2].output
base_model = Model(input_tensor, output_tensor)

In [8]:
representation = base_model.predict(img_batch)
print("shape of representation:", representation.shape)
print("proportion of zero valued axis: %0.3f"
      % np.mean(representation[0]==0))


shape of representation: (1, 2048)
proportion of zero-valued dimensions: 0.120

Computing representations of all images can be time consuming. This is usually done in large batches on a GPU for massive performance gains.

For the remainder of this notebook, we will use pre-computed representations saved in HDF5 format.

For those interested, this is done using the process_images.py script; a rough sketch follows.
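
A minimal sketch of such a batched computation, assuming the list `paths` of image files built in the next cell and the `base_model` defined above (the batch size and resizing are illustrative choices, not necessarily what process_images.py actually uses):

from scipy.misc import imread, imresize
from keras.applications.imagenet_utils import preprocess_input

batch_size = 32
chunks = []
for start in range(0, len(paths), batch_size):
    # read and resize one batch of images
    imgs = [imresize(imread(p, mode='RGB'), (224, 224)).astype("float32")
            for p in paths[start:start + batch_size]]
    batch = preprocess_input(np.array(imgs))
    # one forward pass per batch amortizes the per-call overhead
    chunks.append(base_model.predict(batch))
representations = np.concatenate(chunks)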


In [9]:
import os
paths = ["images_resize/" + path
         for path in sorted(os.listdir("images_resize/"))]

In [10]:
import h5py

# Load pre-calculated representations
with h5py.File('img_emb.h5', 'r') as h5f:
    out_tensors = h5f['img_emb'][:]

The representations are dense.

Exercise

  • What proportion of a representation is 0?
  • Why are there zero values?

In [11]:
# %load solutions/representations.py
# Proportion of zeros in a representation
print("proportion of zeros", np.mean(out_tensors[0]==0.0))

# For all representations:
plt.hist(np.mean(out_tensors==0.0, axis=1));

# These 0 values come from the ReLU units.
# They propagate through the layers, and there can be many.
# If a network has too many of them, a lot of computation
# / memory is wasted.


proportion of zeros 0.15380859375

In [12]:
from sklearn.manifold import TSNE

img_emb_tsne = TSNE(perplexity=30).fit_transform(out_tensors)

In [13]:
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 10))
plt.scatter(img_emb_tsne[:, 0], img_emb_tsne[:, 1]);
plt.xticks(()); plt.yticks(());
plt.show()


Let's add thumbnails of the original images at their t-SNE locations:


In [14]:
import matplotlib
import matplotlib.pyplot as plt
from matplotlib.offsetbox import OffsetImage, AnnotationBbox
from scipy.misc import imread, imresize

def imscatter(x, y, paths, ax=None, zoom=1, linewidth=0):
    if ax is None:
        ax = plt.gca()
    x, y = np.atleast_1d(x, y)
    artists = []
    for x0, y0, p in zip(x, y, paths):
        try:
            im = imread(p)
        except IOError:
            # skip files that cannot be read as images
            print("could not read:", p)
            continue
        im = imresize(im, (224, 224))
        im = OffsetImage(im, zoom=zoom)
        ab = AnnotationBbox(im, (x0, y0), xycoords='data',
                            frameon=True, pad=0.1, 
                            bboxprops=dict(edgecolor='red',
                                           linewidth=linewidth))
        artists.append(ax.add_artist(ab))
    ax.update_datalim(np.column_stack([x, y]))
    ax.autoscale()
    return artists

In [15]:
fig, ax = plt.subplots(figsize=(50, 50))
imscatter(img_emb_tsne[:, 0], img_emb_tsne[:, 1], paths, zoom=0.5, ax=ax)
plt.savefig('tsne.png')


Visual Search: finding similar images


In [16]:
def display(img):
    plt.figure()
    img = imread(img)
    plt.imshow(img)

In [17]:
idx = 57

def most_similar(idx, top_n=5):
    dists = np.linalg.norm(out_tensors - out_tensors[idx], axis=1)
    # the query image itself comes first, with distance 0
    sorted_dists = np.argsort(dists)
    return sorted_dists[:top_n]

sim = most_similar(idx)
[display(paths[s]) for s in sim];


Classification from Nearest Neighbors?

Using these representations, it may be possible to build a nearest neighbor classifier. However, the representations were learned on ImageNet, whose images are centered on a single object, whereas the PascalVOC images we feed in are more plausible inputs for a real-world system.

The next section explores this possibility by computing the histogram of similarities between one image and the others.


In [18]:
out_norms = np.linalg.norm(out_tensors, axis=1, keepdims=True)
normed_out_tensors = out_tensors / out_norms

In [19]:
item_idx = 208
dists_to_item = np.linalg.norm(out_tensors - out_tensors[item_idx],
                               axis=1)
cos_to_item = np.dot(normed_out_tensors, normed_out_tensors[item_idx]) 
plt.hist(cos_to_item)
display(paths[item_idx])


Unfortunately, no clear class boundary is visible in the histogram of similarities alone. We need some supervision to be able to classify images.

With a labeled dataset, even with very few labels per class, one would be able to:

  • build a k-Nearest Neighbor model,
  • build a classification model such as an SVM.

These approximate classifiers are useful in practice, as sketched below. See the cat vs. dog home assignment with GPU for another example of this approach.
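
A hedged sketch: assuming we had an integer array `labels` with one class id per image (hypothetical here, since this image subset ships without labels), a k-NN classifier could be fit directly on the normalized representations:

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# `labels` is hypothetical: one class id per row of normed_out_tensors
X_train, X_test, y_train, y_test = train_test_split(
    normed_out_tensors, labels, test_size=0.2, random_state=0)

# on L2-normalized vectors, Euclidean distance is monotonic in cosine
# distance, so the default metric behaves like cosine similarity
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print("test accuracy:", knn.score(X_test, y_test))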


In [21]:
# keep images whose cosine similarity to the query exceeds a
# hand-picked threshold
items = np.where(cos_to_item > 0.44)
print(items)
[display(paths[s]) for s, _ in zip(items[0], range(10))];


(array([  4,   9,  12,  60,  82, 151, 170, 186, 187, 205, 208, 225, 252,
       265, 274, 280, 310, 330, 331, 355, 385, 391, 397, 400, 420, 435,
       474, 486, 490]),)

In [ ]: