Re-Purposing a Pretrained Network

Since a large CNN is very time-consuming to train (even on a GPU) and requires huge amounts of data, is there any way to reuse a pretrained one instead of training the whole thing from scratch?

This notebook shows how this can be done. And it works surprisingly well.

How do we classify images into classes the network was never trained on?

This notebook extracts a vector representation of a set of images using the GoogLeNet CNN pretrained on ImageNet. It then builds a 'simple SVM classifier' on those vectors, so that new images can be classified directly. No retraining of the original CNN is required.


In [ ]:
import theano
import theano.tensor as T

import lasagne

import numpy as np
import scipy

import matplotlib.pyplot as plt
%matplotlib inline

import pickle
import time

CLASS_DIR='./images/cars'

Functions for building the GoogLeNet model with Lasagne and preprocessing the images are defined in models.imagenet_theano.googlenet.

Build the model and select layers we need - the features are taken from the final network layer, before the softmax nonlinearity.
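To make 'before the softmax nonlinearity' concrete, here is a minimal sketch in plain Python (not the notebook's Theano code; the three scores are made up for illustration). The softmax turns raw class scores into probabilities that sum to 1, and its output does not change if a constant is added to every score, so some information in the raw scores is lost. That is one plausible reason to prefer the 'loss3/classifier' outputs over 'prob' as features.

```python
import math

def softmax(scores):
    """Turn raw class scores (logits) into probabilities that sum to 1."""
    m = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores, standing in for the 1000 values of 'loss3/classifier'
logits = [2.0, 1.0, -1.0]
shifted = [s + 10.0 for s in logits]         # same scores, shifted by a constant

probs = softmax(logits)
probs_shifted = softmax(shifted)
print(probs)
print(probs_shifted)   # matches probs up to rounding: the shift is lost
```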


In [ ]:
from models.imagenet_theano import googlenet

cnn_layers = googlenet.build_model()
cnn_input_var = cnn_layers['input'].input_var
cnn_feature_layer = cnn_layers['loss3/classifier']
cnn_output_layer = cnn_layers['prob']

get_cnn_features = theano.function([cnn_input_var], lasagne.layers.get_output(cnn_feature_layer))

print("GoogLeNet Model defined")

Load the pretrained weights into the network :


In [ ]:
import os
import urllib.request 

imagenet_theano = './data/imagenet_theano' 
googlenet_pkl = imagenet_theano+'/blvc_googlenet.pkl'

if not os.path.isfile(googlenet_pkl):
    if not os.path.exists(imagenet_theano):
        os.makedirs(imagenet_theano)
    print("Downloading GoogLeNet parameter file")
    urllib.request.urlretrieve(
        'https://s3.amazonaws.com/lasagne/recipes/pretrained/imagenet/blvc_googlenet.pkl', 
        googlenet_pkl)

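# Close the file promptly; the explicit encoding is what lets Python 3
# decode 8-bit strings stored in a pickle written by Python 2
with open(googlenet_pkl, 'rb') as pkl_file:
    params = pickle.load(pkl_file, encoding='iso-8859-1')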

model_param_values = params['param values']
imagenet_classes = params['synset words']

lasagne.layers.set_all_param_values(cnn_output_layer, model_param_values)

print("Loaded GoogLeNet params")
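
The encoding='iso-8859-1' argument matters because pickle.load otherwise decodes 8-bit strings from Python 2 pickles as ASCII. A minimal sketch of the effect, using a hand-written protocol-0 pickle in the style Python 2 produces (the bytes are an illustrative stand-in, not taken from the GoogLeNet parameter file, and load_py2_pickle is a hypothetical helper):

```python
import pickle

# Protocol-0 pickle of the Python 2 byte string 'caf\xe9'
# (illustrative stand-in, not part of the GoogLeNet parameter file)
PY2_PICKLE = b"S'caf\\xe9'\n."

def load_py2_pickle(data):
    """Load a Python 2 pickle, mapping each byte to the same code point."""
    return pickle.loads(data, encoding='iso-8859-1')

try:
    pickle.loads(PY2_PICKLE)          # default encoding is ASCII
    default_ok = True
except UnicodeDecodeError:
    default_ok = False                # byte 0xe9 is not valid ASCII

value = load_py2_pickle(PY2_PICKLE)
print(default_ok, repr(value))
```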

Use the Network to create 'features' for the training images

Now go through the input images and feature-ize them according to the pretrained network.

NB: The pretraining was done on ImageNet - there wasn't anything specific to the recognition task we're doing here.


In [ ]:
import os
classes = sorted( [ d for d in os.listdir(CLASS_DIR) if os.path.isdir("%s/%s" % (CLASS_DIR, d)) ] )
classes # Sorted for consistency

In [ ]:
train = dict(f=[], features=[], target=[])

t0 = time.time()
for class_i,d in enumerate(classes):
    for f in os.listdir("%s/%s" % (CLASS_DIR, d,)):
        filepath = '%s/%s/%s' % (CLASS_DIR,d,f,)
        if os.path.isdir(filepath): continue
        im = plt.imread(filepath)
        rawim, cnn_im = googlenet.prep_image(im)

        prob = get_cnn_features(cnn_im)

        train['f'].append(filepath)
        train['features'].append(prob[0])
        train['target'].append( class_i )

        plt.figure()
        plt.imshow(rawim.astype('uint8'))
        plt.axis('off')

        plt.text(320, 50, '{}'.format(f), fontsize=14)
        plt.text(320, 80, 'Train as class "{}"'.format(d), fontsize=12)
    
# len(train) would count the dict's three keys, so divide by the number of images
print("DONE : %6.2f seconds each" %(float(time.time() - t0)/len(train['f']),))

Build an SVM model over the features


In [ ]:
#train['features'][0]

In [ ]:
from sklearn import svm
classifier = svm.LinearSVC()
classifier.fit(train['features'], train['target']) # learn from the data
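
To show what a fitted LinearSVC returns, here is a small sketch on synthetic stand-in features (fake_features is a hypothetical helper and the clusters are made up; the real inputs would be train['features'] and train['target']). With two classes, decision_function gives one signed distance per sample; with three or more it gives one score per class, so code that formats the result as a single number needs to reduce it first, for example with np.max.

```python
import numpy as np
from sklearn import svm

rng = np.random.RandomState(0)

def fake_features(n_classes, per_class=10, dim=16):
    """Well-separated synthetic 'feature vectors', one cluster per class."""
    X, y = [], []
    for c in range(n_classes):
        centre = np.zeros(dim)
        centre[c] = 10.0                       # each class sits along its own axis
        X.append(centre + rng.normal(scale=0.5, size=(per_class, dim)))
        y += [c] * per_class
    return np.vstack(X), np.array(y)

shapes = {}
for n_classes in (2, 3):
    X, y = fake_features(n_classes)
    clf = svm.LinearSVC().fit(X, y)
    decision = clf.decision_function(X[:1])    # scores for a single sample
    shapes[n_classes] = decision.shape
    # np.max yields a plain scalar whether decision[0] is a number or a row
    print(n_classes, decision.shape, float(np.max(decision[0])))
```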

Use the SVM model to classify the test set


In [ ]:
test_image_files = [f for f in os.listdir(CLASS_DIR) if not os.path.isdir("%s/%s" % (CLASS_DIR, f))]

t0 = time.time()
for f in sorted(test_image_files):
    im = plt.imread('%s/%s' % (CLASS_DIR,f,))
    rawim, cnn_im = googlenet.prep_image(im)
        
    prob = get_cnn_features(cnn_im)

    prediction_i = classifier.predict([ prob[0] ])
    decision     = classifier.decision_function([ prob[0] ])
                       
    plt.figure()
    plt.imshow(rawim.astype('uint8'))
    plt.axis('off')
                
    prediction = classes[ prediction_i[0] ]
                       
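    # With 3+ classes decision[0] is an array of per-class scores; np.max gives a scalar either way
    plt.text(350, 50, '{} : Distance from boundary = {:5.2f}'.format(prediction, np.max(decision[0])), fontsize=20)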
    plt.text(350, 75, '{}'.format(f), fontsize=14)
    
print("DONE : %6.2f seconds each" %(float(time.time() - t0)/len(test_image_files),))

Did it work?
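
One way to answer this with a number rather than by eye is cross-validation on the training features. The sketch below uses synthetic stand-in features (the real ones would be train['features'] and train['target']); cross_val_score is scikit-learn's own helper. With only a handful of images per class, the estimate will be noisy, so treat it as a rough check.

```python
import numpy as np
from sklearn import svm
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)

# Synthetic stand-ins: 2 classes, 12 'images' each, 16-dimensional features
features = np.vstack([
    rng.normal(loc=-5.0, size=(12, 16)),
    rng.normal(loc=+5.0, size=(12, 16)),
])
target = np.array([0] * 12 + [1] * 12)

# 3-fold cross-validation: fit on two thirds, score on the held-out third
scores = cross_val_score(svm.LinearSVC(), features, target, cv=3)
print("accuracy per fold:", scores, "mean: %.2f" % scores.mean())
```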

Exercise : Try your own ideas

The whole training regime here is based on the way the image directories are structured. So building your own example shouldn't be very difficult.

Suppose you wanted to classify pianos into Upright and Grand :

  • Create a pianos directory and point the CLASS_DIR variable at it
  • Within the pianos directory, create subdirectories for each of the classes (i.e. Upright and Grand). The directory names will be used as the class labels
  • Inside the class directories, put a 'bunch' of positive examples of the respective classes - these can be images in any reasonable format, of any size (above 224x224).
    • The images will be automatically resized so that their smallest dimension is 224, and then a square 'crop' area taken from their centers (since ImageNet networks are typically tuned to answering on 224x224 images)
  • Test images should be put in the pianos directory itself (which is logical, since we don't know their classes yet)
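
The resize-and-crop arithmetic described in the list can be sketched as follows. This is an illustration in plain Python, not the code of googlenet.prep_image; the function name and the rounding choice are assumptions.

```python
def resize_and_crop_box(height, width, target=224):
    """Return the resized (height, width) and the (top, left) of the centre crop."""
    scale = target / min(height, width)        # smallest side becomes `target`
    new_h = max(target, round(height * scale))
    new_w = max(target, round(width * scale))
    top = (new_h - target) // 2                # centre the square crop
    left = (new_w - target) // 2
    return (new_h, new_w), (top, left)

# A 480x640 landscape photo: resized to 224x299, crop starts 37 pixels from the left
size, offset = resize_and_crop_box(480, 640)
print(size, offset)
```

The crop then covers rows top..top+224 and columns left..left+224 of the resized image.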

Finally, re-run everything - checking that the training images are read in correctly, that there are no errors along the way, and that (finally) the class predictions on the test set come out as expected.
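
Before re-running, it can help to confirm that the directory layout is what the notebook expects. A minimal sketch, assuming the layout described above; summarise_class_dir is a hypothetical helper (not part of the notebook), demonstrated here on a throwaway temporary directory with empty placeholder files:

```python
import os
import tempfile

def summarise_class_dir(class_dir):
    """Return ({class name: number of files}, [test files at the top level])."""
    per_class, test_files = {}, []
    for name in sorted(os.listdir(class_dir)):
        path = os.path.join(class_dir, name)
        if os.path.isdir(path):
            # Subdirectory: its name is the class label, its files are examples
            per_class[name] = sum(
                os.path.isfile(os.path.join(path, f)) for f in os.listdir(path))
        else:
            # Loose file: treated as a test image
            test_files.append(name)
    return per_class, test_files

# Demonstration on a temporary 'pianos' directory
with tempfile.TemporaryDirectory() as tmp:
    pianos = os.path.join(tmp, 'pianos')
    for rel in ('Upright/a.jpg', 'Upright/b.jpg', 'Grand/c.jpg', 'mystery.jpg'):
        full = os.path.join(pianos, *rel.split('/'))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, 'w').close()
    per_class, test_files = summarise_class_dir(pianos)

print(per_class, test_files)
```

On a real dataset, calling summarise_class_dir(CLASS_DIR) should list every intended class with a non-zero count, and every test image by name.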

If/when it works - please let everyone know : We can add that as an example for next time...

