In this notebook, we will walk you through the end-to-end process of reproducing the state-of-the-art result of the ImageNet Challenge. At the end of the notebook, you will have a "web_image_classifier" object that takes an image URL and predicts the category of the image.
The notebook has three parts: (1) preparing the ImageNet training data, (2) training the deep convolutional neural network, and (3) building the final classifier that predicts categories from an image URL.
Each part of the notebook can be run independently, assuming all preceding parts have completed. You will need a GPU-enabled build of graphlab-create version 1.0 or later, and a powerful GPU (at least 1,000 cores and 2 GB of memory).
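Before starting, you can confirm your installation with a quick, optional check (gl.version is the graphlab module's version string):
In [ ]:
import graphlab as gl
print gl.version   # should print '1.0' or later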
In [1]:
# Assume '/data/imagenet' is your working directory
WORKING_DIR = '/data/imagenet'
In [ ]:
import graphlab as gl
import os
import tarfile
In [ ]:
# Assume you have downloaded the compressed images into 'raw/'
os.listdir(WORKING_DIR + '/raw/')
In [ ]:
os.mkdir(WORKING_DIR + '/ILSVRC2012_image_train_tar')
tarfile.TarFile(WORKING_DIR + '/raw/ILSVRC2012_img_train.tar').extractall(path=WORKING_DIR + '/ILSVRC2012_image_train_tar')
os.listdir(WORKING_DIR + '/ILSVRC2012_image_train_tar')
In [ ]:
train_dir = WORKING_DIR + '/ILSVRC2012_img_train'
os.mkdir(train_dir)
for tar in [path for path in os.listdir(WORKING_DIR + '/ILSVRC2012_image_train_tar') if path.endswith('.tar')]:
    dirname = train_dir + '/' + tar.split('.')[0]
    os.mkdir(dirname)
    filename = WORKING_DIR + '/ILSVRC2012_image_train_tar/' + tar
    print 'extracting %s to %s' % (filename, dirname)
    f = tarfile.TarFile(filename)
    f.extractall(path=dirname)
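The training archive expands into one tar file per synset, so after the loop you should have one directory per class. As a quick sanity check (ILSVRC2012 has 1,000 training classes):
In [ ]:
# Expect 1,000 synset directories, one per ILSVRC2012 class
print len(os.listdir(train_dir))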
In [ ]:
train_sf = gl.image_analysis.load_images(train_dir, random_order=True)
train_sf['image'] = gl.image_analysis.resize(train_sf['image'], 256, 256)
# The path looks like '/data/imagenet/ILSVRC2012_img_train/n01440764/n01440764_10026.JPEG'
# The lambda function extracts the wordnet id 'n01440764' from the path and
# drops the leading 'n' so the id can be stored as an integer
train_sf['wnid'] = train_sf['path'].apply(lambda x: int(x.split('/')[-2][1:]))
train_sf.save(WORKING_DIR + '/sframe/train_shuffle')
In [2]:
import graphlab as gl
gl.canvas.set_target('ipynb')
In [3]:
train = gl.SFrame(WORKING_DIR + '/sframe/train_shuffle')
train.head()
Out[3]:
In [ ]:
unique_labels = train['wnid'].unique().sort()
class_map = {}
for i in range(len(unique_labels)):
    class_map[unique_labels[i]] = i
train['label'] = train['wnid'].apply(lambda x: class_map[x])
# Save the mapping so that we can use it later when building the customized classifier
import pickle
pickle.dump(class_map, open(WORKING_DIR + '/wnid_to_label.pkl', 'w'))
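As a sanity check, the mapping should round-trip through pickle and contain one entry per class:
In [ ]:
reloaded = pickle.load(open(WORKING_DIR + '/wnid_to_label.pkl'))
assert len(reloaded) == 1000  # one label per ILSVRC2012 class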
Subtracting the mean image from each input image ensures that every feature variable (each pixel, in this case) has zero mean. This is a common preprocessing step in supervised learning. You can find a more detailed discussion of preprocessing here.
In [ ]:
mean_image = train['image'].mean()
## Tip: You can save mean_image to an SArray and load it later to avoid recomputing it every time.
# gl.SArray([mean_image]).save(WORKING_DIR + '/sframe/mean_image')
# mean_image = gl.SArray(WORKING_DIR + '/sframe/mean_image')[0]
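To see concretely what zero-centering does, here is a minimal sketch on a toy batch of 2x2 grayscale "images" (this uses numpy purely for illustration; it is not part of the pipeline above):
In [ ]:
import numpy as np
# A toy batch of two 2x2 grayscale "images"
batch = np.array([[[1., 3.], [5., 7.]],
                  [[3., 5.], [7., 9.]]])
toy_mean = batch.mean(axis=0)   # per-pixel mean across the batch
centered = batch - toy_mean     # subtract the mean image
print centered.mean(axis=0)     # every pixel now averages to zero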
Here, we obtain a NeuralNet object from the built-in networks in the deeplearning toolkit. A NeuralNet object contains the layer specifications and the hyperparameters for the learning algorithm.
We will use this NeuralNet object and the training data to train a neuralnet_classifier.
The "imagenet" NeuralNet is derived from: Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems. 2012.
For more details on the deeplearning toolkit, please see our API docs.
In [4]:
net = gl.deeplearning.get_builtin_neuralnet('imagenet')
net
Out[4]:
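If you want to adjust the architecture or solver hyperparameters before training, the NeuralNet object is mutable. A minimal sketch, assuming the layers list and params dictionary attributes described in the API docs (the attribute and parameter names here are assumptions, so verify them against your version):
In [ ]:
# Inspect the layer specifications (attribute name assumed from the API docs)
print net.layers
# Tweak a solver hyperparameter (key name assumed from the API docs)
net.params['learning_rate'] = 0.01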
Training a deep convolutional neural network on 1.2 million images can be time-consuming. Setting model_checkpoint_interval=5 saves the model every 5 iterations to the location specified by model_checkpoint_path. While waiting for the final model, you can start working with a checkpointed model for early evaluation.
In [ ]:
m = gl.neuralnet_classifier.create(train[['image', 'label']],
                                   target='label',
                                   network=net,
                                   mean_image=mean_image,
                                   metric=['accuracy', 'recall@5'],
                                   max_iterations=35,
                                   model_checkpoint_path=WORKING_DIR + '/result/model_checkpoint',
                                   model_checkpoint_interval=5,
                                   batch_size=150)
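For example, while training is still running you can load the latest checkpoint and evaluate it. A sketch, assuming you have prepared a validation SFrame named val with 'image' and 'label' columns the same way as train (val is hypothetical; it is not built above):
In [ ]:
checkpoint = gl.load_model(WORKING_DIR + '/result/model_checkpoint')
# 'val' is a hypothetical held-out SFrame prepared like 'train'
print checkpoint.evaluate(val)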
In [5]:
import graphlab as gl
import pickle
wnid_to_text_sf = gl.SFrame.read_csv('http://image-net.org/archive/words.txt', delimiter='\t', header=False)
wnid_to_text_sf
Out[5]:
In [6]:
wnid_to_text = dict((int(row['X1'][1:]), row['X2']) for row in wnid_to_text_sf)
wnid_to_label = pickle.load(open(WORKING_DIR + '/wnid_to_label.pkl'))
label_to_text = dict((label, wnid_to_text[wnid]) for (wnid, label) in wnid_to_label.iteritems())
label_to_text
Out[6]:
In [7]:
model = gl.load_model(WORKING_DIR + '/result/model_checkpoint')
def predict_image_url(url):
    image_sf = gl.SFrame({'image': [gl.Image(url)]})
    image_sf['image'] = gl.image_analysis.resize(image_sf['image'], 256, 256, 3)
    top_labels = model.predict_topk(image_sf, k=5)
    top_labels['class_text'] = top_labels['class'].apply(lambda x: label_to_text[x])
    return top_labels
In [8]:
predict_image_url('http://cdn3.getnetworth.com/wp-content/uploads/2013/05/world-top-exotic-car-wall-2d4fb.jpg')
Out[8]: