Classifying ImageNet: the instant Caffe way

Caffe provides a general Python interface for models with caffe.Net in python/caffe/pycaffe.py, but to make off-the-shelf classification easy we provide a caffe.Classifier class and a classify.py script. Both Python and MATLAB wrappers are provided; the Python wrapper has more features, so we describe it here. For MATLAB, refer to matlab/caffe/matcaffe_demo.m.

Before we begin, you must compile Caffe and set your PYTHONPATH so the Python wrapper can be imported. If you haven't done so yet, please refer to the installation instructions. This example uses our pre-trained ImageNet model, an ILSVRC12 image classifier. You can download it (232.57MB) by running examples/imagenet/get_caffe_reference_imagenet_model.sh. Note that this pre-trained model is licensed for academic research / non-commercial use only.

Ready? Let's start.


In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

import caffe

caffe_root = '../'  # this file is expected to be in {caffe_root}/examples

# Set the right path to your model definition file, pretrained model weights,
# and the image you would like to classify.
MODEL_FILE = 'imagenet/imagenet_deploy.prototxt'
PRETRAINED = 'imagenet/caffe_reference_imagenet_model'
IMAGE_FILE = 'images/cat.jpg'

Loading a network is easy: caffe.Classifier takes care of everything. Note the arguments that configure input preprocessing: passing a mean file switches on mean subtraction, channel swapping maps RGB inputs into the reference ImageNet model's BGR channel order, and input scaling rescales inputs from the [0, 1] range produced by caffe.io.load_image to the [0, 255] range the reference model expects.


In [2]:
net = caffe.Classifier(MODEL_FILE, PRETRAINED,
                       mean_file=caffe_root + 'python/caffe/imagenet/ilsvrc_2012_mean.npy',
                       channel_swap=(2,1,0),
                       input_scale=255,
                       image_dims=(256, 256))

We will set the phase to test since we are doing classification rather than training, and we will start with the CPU for computation.


In [3]:
net.set_phase_test()
net.set_mode_cpu()

Let's take a look at our example image with Caffe's image loading helper.


In [4]:
input_image = caffe.io.load_image(IMAGE_FILE)
plt.imshow(input_image)


Out[4]:
<matplotlib.image.AxesImage at 0xa7ebfd0>

Time to classify. By default, predict makes 10 predictions, cropping the center and corners of the image as well as their mirrored versions, and averages over them:


In [5]:
prediction = net.predict([input_image])  # predict takes any number of images, and formats them for the Caffe net automatically
print 'prediction shape:', prediction[0].shape
plt.plot(prediction[0])


prediction shape: (1000,)
Out[5]:
[<matplotlib.lines.Line2D at 0xc8b1bd0>]

Now let's classify using the center crop alone by turning off oversampling. Note that this produces a single input, although if you inspect the model definition prototxt you'll see the network has a batch size of 10. The Python wrapper handles batching and padding for you!


In [6]:
prediction = net.predict([input_image], oversample=False)
print 'prediction shape:', prediction[0].shape
plt.plot(prediction[0])


prediction shape: (1000,)
Out[6]:
[<matplotlib.lines.Line2D at 0xc8e7090>]

You can see that the prediction is 1000-dimensional, and is pretty sparse.

Our pretrained model uses the synset ID ordering of the classes, as listed in ../data/ilsvrc12/synset_words.txt (fetch the auxiliary ImageNet data by running ../data/ilsvrc12/get_ilsvrc_aux.sh). If you look at the top indices that maximize the prediction score, they are foxes, cats, and other cute mammals, as the lookup sketched below shows. Not unreasonable predictions, right?
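For example, here is a minimal sketch for printing the top-5 predicted labels, assuming you have fetched the auxiliary data above so that the synset_words.txt file exists:

# Look up the top-5 predicted synset labels (a minimal sketch; assumes the
# auxiliary ImageNet data has been downloaded with get_ilsvrc_aux.sh).
labels = [line.strip() for line in open(caffe_root + 'data/ilsvrc12/synset_words.txt')]
top_k = prediction[0].argsort()[::-1][:5]  # indices of the 5 highest scores
for i in top_k:
    print prediction[0][i], labels[i]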

Now, why don't we see how long it takes to perform the classification end to end? These timings were obtained on an Intel i5 CPU, so you may observe different numbers on your machine.


In [7]:
%timeit net.predict([input_image])


1 loops, best of 3: 492 ms per loop

It may look a little slow, but note that time is also spent on cropping, Python interfacing, and running all 10 crops. If you really need fast predictions, you can code in C++ and pipeline operations better; for experimenting and prototyping, the current speed is fine.

Let's time classifying a single image with input preprocessed:


In [8]:
# Resize the image to the standard (256, 256) and oversample net input sized crops.
input_oversampled = caffe.io.oversample([caffe.io.resize_image(input_image, net.image_dims)], net.crop_dims)
# 'data' is the input blob name in the model definition, so we preprocess for that input.
caffe_input = np.asarray([net.preprocess('data', in_) for in_ in input_oversampled])
# forward() takes keyword args for the input blobs with preprocessed input arrays.
%timeit net.forward(data=caffe_input)


1 loops, best of 3: 327 ms per loop

OK, so how about the GPU? It is actually pretty easy:


In [9]:
net.set_mode_gpu()

Voila! Now we are in GPU mode. Let's see if the code gives the same result:


In [10]:
prediction = net.predict([input_image])
print 'prediction shape:', prediction[0].shape
plt.plot(prediction[0])


prediction shape: (1000,)
Out[10]:
[<matplotlib.lines.Line2D at 0xcbba250>]

Good, everything is the same. And how about timing? The following benchmark was obtained on the same machine with a K20 GPU:


In [11]:
# Full pipeline timing.
%timeit net.predict([input_image])


10 loops, best of 3: 192 ms per loop

In [12]:
# Forward pass timing.
%timeit net.forward(data=caffe_input)


10 loops, best of 3: 25.2 ms per loop

Pretty fast, right? Not as fast as you expected? Indeed, in this Python demo the end-to-end speedup is only about 2.5x. But remember: the GPU code itself is very fast, and data loading, transformation, and Python interfacing now take more time than the convnet computation itself!

To fully utilize the power of GPUs, you really want to:

  • Use larger batches, and minimize Python call and data transfer overheads (a simple batching sketch follows this list).
  • Pipeline data load operations, for example with a subprocess.
  • Code in C++. A little inconvenient, but maybe worth it if your dataset is really, really large.
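As a rough illustration of the first point, here is a minimal sketch that classifies several images with a single predict call, so that preprocessing and batching are amortized over the whole set; the file names are hypothetical stand-ins for your own data:

# Batch several images into one predict call (a minimal sketch; the image
# paths below are hypothetical stand-ins for your own images).
image_files = ['images/cat.jpg', 'images/fish-bike.jpg']
inputs = [caffe.io.load_image(f) for f in image_files]
predictions = net.predict(inputs)  # one call preprocesses and batches all inputs
print 'number of predictions:', len(predictions)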

Parting Words

So this is Python! We hope the interface is easy enough to use. The Python wrapper is interfaced with boost::python, and the source code can be found in python/caffe, with the main interface in pycaffe.py and the classification wrapper in classifier.py. If you have customizations to make, start there, and do let us know if you make improvements by sending a pull request!