Similar Image Search Using Deep Learning on Caltech-101

In this notebook, we will walk you through adapting a neural network trained on the ImageNet Challenge to find similar images within the Caltech-101 dataset. By the end of the notebook, we will be able to algorithmically identify images in the Caltech-101 dataset that are visually similar to each other.

The notebook has several parts:

  • Part I focuses on loading the data.

  • Part II focuses on using a pre-trained neural net to extract visual features.

  • Part III focuses on using the extracted visual features to train a nearest neighbors model.

Part I: The Data

In this notebook, we use the Caltech-101 dataset. Caltech-101 contains photos of objects belonging to 101 categories. It was collected by Fei-Fei Li, Marco Andreetto, and Marc'Aurelio Ranzato in September 2003. Note: this is a large dataset, so it may take a while to download.


In [1]:
import graphlab 
images = graphlab.SFrame('https://static.turi.com/datasets/caltech_101/caltech_101_images')


[INFO] 1447192055 : INFO:     (initialize_globals_from_environment:282): Setting configuration variable GRAPHLAB_FILEIO_ALTERNATIVE_SSL_CERT_FILE to /Users/zach/anaconda/lib/python2.7/site-packages/certifi/cacert.pem
1447192055 : INFO:     (initialize_globals_from_environment:282): Setting configuration variable GRAPHLAB_FILEIO_ALTERNATIVE_SSL_CERT_DIR to 
This commercial license of GraphLab Create is assigned to engr@turi.com.

[INFO] Start server at: ipc:///tmp/graphlab_server-3587 - Server binary: /Users/zach/anaconda/lib/python2.7/site-packages/graphlab/unity_server - Server log: /tmp/graphlab_server_1447192055.log
[INFO] GraphLab Server Version: 1.6.914
PROGRESS: Downloading https://static.turi.com/datasets/caltech_101/caltech_101_images/dir_archive.ini to /var/tmp/graphlab-zach/3587/da1e95a5-df9a-4345-8c7e-d299ca222f30.ini
PROGRESS: Downloading https://static.turi.com/datasets/caltech_101/caltech_101_images/objects.bin to /var/tmp/graphlab-zach/3587/1dc55e67-3e98-43a6-ab5b-4b09830b9b27.bin
PROGRESS: Downloading https://static.turi.com/datasets/caltech_101/caltech_101_images/m_96330e5f81c95d09.frame_idx to /var/tmp/graphlab-zach/3587/3e7ef2ec-592c-4af3-90e8-78bba042ad43.frame_idx
PROGRESS: Downloading https://static.turi.com/datasets/caltech_101/caltech_101_images/m_96330e5f81c95d09.sidx to /var/tmp/graphlab-zach/3587/05bae9df-44d7-47de-99f2-fa8bd2ec01d0.sidx

Part II: Extracting Features

We use a neural network trained on the 1.2 million images of the ImageNet Challenge. For each image in the Caltech-101 dataset, we take the activations of the layer just before the classification layer and treat them as the feature vector for the image. This is a kind of internal representation of what the network knows about the image: if the feature vectors of two images are similar, then the images themselves should be visually similar. This concept is covered more thoroughly in our blog post on the subject. Note that feature extraction is not feasible without a GPU and the GPU installation of GraphLab Create. If you do not have one, you should instead download the SArray that contains the result of this step.
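To make "similar feature vectors" concrete, here is a minimal sketch (plain Python, not part of the original notebook) of the Euclidean distance computation that underlies the nearest neighbors search in Part III:

import math

def euclidean_distance(a, b):
    # Straight-line distance between two feature vectors of equal length.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Feature vectors that are close in this distance should correspond to
# visually similar images; large distances suggest dissimilar images.
print(euclidean_distance([0.0, 1.2, 0.8], [0.1, 1.1, 0.9]))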


In [2]:
# Only do this if you have a GPU
#pretrained_model = graphlab.load_model('https://static.turi.com/models/imagenet_model_iter45')
#images['extracted_features'] = pretrained_model.extract_features(images)

# If you do not have a GPU, do this instead. 
images['extracted_features'] = graphlab.SArray('https://static.turi.com/models/pre_extracted_features.gl')


PROGRESS: Downloading https://static.turi.com/models/pre_extracted_features.gl/dir_archive.ini to /var/tmp/graphlab-zach/3587/3c85f561-f308-4283-98a2-8bb0a3accdaa.ini
PROGRESS: Downloading https://static.turi.com/models/pre_extracted_features.gl/objects.bin to /var/tmp/graphlab-zach/3587/9be9dc1e-9920-4013-8ec9-bd11f12abd1d.bin
PROGRESS: Downloading https://static.turi.com/models/pre_extracted_features.gl/m_3e736cfeecc9a07f.sidx to /var/tmp/graphlab-zach/3587/738c39da-802b-4893-be36-c5cf300b5935.sidx

Now, let's inspect the images SFrame. The 'extracted_features' column contains vector representations of the images, as we expected.


In [3]:
images


PROGRESS: Downloading https://static.turi.com/datasets/caltech_101/caltech_101_images/m_96330e5f81c95d09.0000 to /var/tmp/graphlab-zach/3587/da131909-64e5-4036-a79a-c7fb676a527b.0000
PROGRESS: Downloading https://static.turi.com/datasets/caltech_101/caltech_101_images/m_96330e5f81c95d09.0001 to /var/tmp/graphlab-zach/3587/5364f7ab-1c6c-47de-be7b-363893ded4c8.0001
PROGRESS: Downloading https://static.turi.com/datasets/caltech_101/caltech_101_images/m_96330e5f81c95d09.0002 to /var/tmp/graphlab-zach/3587/6d9972c1-b96f-413a-861d-435c25e4c3ae.0002
PROGRESS: Downloading https://static.turi.com/datasets/caltech_101/caltech_101_images/m_96330e5f81c95d09.0003 to /var/tmp/graphlab-zach/3587/00f6f17c-a4eb-4ded-bfea-05afb6e0e9ba.0003
PROGRESS: Downloading https://static.turi.com/models/pre_extracted_features.gl/m_3e736cfeecc9a07f.0000 to /var/tmp/graphlab-zach/3587/01c9c9c1-71e9-406d-be63-28a8fdd95ec2.0000
PROGRESS: Downloading https://static.turi.com/models/pre_extracted_features.gl/m_3e736cfeecc9a07f.0001 to /var/tmp/graphlab-zach/3587/c76dd2f7-b5fb-4383-adef-380cbc7b2bc2.0001
PROGRESS: Downloading https://static.turi.com/models/pre_extracted_features.gl/m_3e736cfeecc9a07f.0002 to /var/tmp/graphlab-zach/3587/62d3e36e-a352-4be6-ac8b-1e8d1f895790.0002
PROGRESS: Downloading https://static.turi.com/models/pre_extracted_features.gl/m_3e736cfeecc9a07f.0003 to /var/tmp/graphlab-zach/3587/0d6a1fe9-33ff-43fb-b33d-a92ca16e35cb.0003
Out[3]:
image                    extracted_features
Height: 256 Width: 256   [0.0, 1.23551917076, 0.780339300632, ...
Height: 256 Width: 256   [0.0, 0.0, 0.0, 0.0, 2.2158780098, ...
Height: 256 Width: 256   [0.700446844101, 0.0, 0.0, 2.748742342, 0.0, ...
Height: 256 Width: 256   [0.0, 2.67075538635, 1.84025931358, ...
Height: 256 Width: 256   [1.50202608109, 0.421677231789, ...
Height: 256 Width: 256   [0.0, 0.0, 0.0, 0.0, 0.103199839592, ...
Height: 256 Width: 256   [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.04135799408, ...
Height: 256 Width: 256   [0.0, 0.0, 0.0, 0.938414514065, 0.0, ...
Height: 256 Width: 256   [2.32718586922, 1.89031362534, ...
Height: 256 Width: 256   [0.0, 0.0, 0.0, 0.0, 0.0, 5.37764167786, ...
[9144 rows x 2 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.
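As a quick sanity check (an aside, not part of the original notebook), we can look at the dimensionality of a single feature vector; its exact length depends on the penultimate layer of the pre-trained network:

# Inspect the length of one extracted feature vector.
print(len(images['extracted_features'][0]))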

Part III: Finding Similar Images via Nearest Neighbors on Extracted Features

Knowing that similar extracted features should mean visually similar images, we can perform a similar image search simply by finding an image's nearest neighbors in the feature space. We demonstrate this below.

First, we construct a nearest neighbors model on the extracted features. This will allow us to find each image's closest neighbors.


In [4]:
nearest_neighbor_model = graphlab.nearest_neighbors.create(images, features=['extracted_features'])


PROGRESS: Starting brute force nearest neighbors model training.
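
This builds a brute-force model, which computes the distance between every pair of images at query time. As a hedged aside (the distance parameter value is assumed from GraphLab Create's nearest neighbors API and is not shown in the original notebook), cosine distance is another common choice for comparing deep features:

# Hypothetical variant: cosine distance often works well for deep features.
# nearest_neighbor_model = graphlab.nearest_neighbors.create(
#     images, features=['extracted_features'], distance='cosine')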

In [5]:
similar_images = nearest_neighbor_model.query(images, k = 2)


PROGRESS: Starting blockwise querying.
PROGRESS: max rows per data block: 7668
PROGRESS: number of reference data blocks: 4
PROGRESS: number of query data blocks: 2
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | Query points | # Pairs | % Complete. | Elapsed Time |
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | 4572         | 1e+07   | 12.5        | 25.51s       |
PROGRESS: | 9144         | 5.2e+07 | 62.5        | 52.44s       |
PROGRESS: | Done         | 8.4e+07 | 100         | 52.64s       |
PROGRESS: +--------------+---------+-------------+--------------+

similar_images is an SFrame that contains each query label and its neighbor, the reference label. We query with k = 2 because the query set is identical to the reference set: each image's first neighbor is itself, so the second neighbor is the closest distinct image.


In [6]:
similar_images


Out[6]:
query_label  reference_label  distance          rank
0            0                0.0               1
0            1535             30.6212798551     2
1            1                2.6973983047e-06  1
1            8990             54.1374904338     2
2            2                0.0               1
2            8339             36.5163977404     2
3            3                0.0               1
3            3633             28.8060467874     2
4            4                0.0               1
4            140              30.5404123745     2
[18288 rows x 4 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.

We do some cleaning to remove the rows where the query equals the reference. As noted above, these self-matches appear because the query set was identical to the reference set.


In [7]:
similar_images = similar_images[similar_images['query_label'] != similar_images['reference_label']]
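
Equivalently (a hedged alternative, not from the original notebook), since every self-match has rank 1 and we queried with k = 2, we could keep only the rank-2 rows:

# Keep only the closest distinct neighbor for each query image.
similar_images = similar_images[similar_images['rank'] == 2]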

In [8]:
similar_images


Out[8]:
query_label  reference_label  distance       rank
0            1535             30.6212798551  2
1            8990             54.1374904338  2
2            8339             36.5163977404  2
3            3633             28.8060467874  2
4            140              30.5404123745  2
5            6206             46.2503776144  2
6            2331             22.8088822589  2
7            6466             47.3209057413  2
8            5760             22.8112886214  2
9            1710             61.5027370061  2
[? rows x 4 columns]
Note: Only the head of the SFrame is printed. This SFrame is lazily evaluated.
You can use len(sf) to force materialization.

Now we can explore similar images. For instance, the closest image to image 9 is image 1710. Viewing both, we can see that they are starfish.


In [9]:
graphlab.canvas.set_target('ipynb')
graphlab.SArray([images['image'][9]]).show()



In [10]:
graphlab.SArray([images['image'][1710]]).show()


Similarly, images 0 and 1535 are two similar photos of the same person.


In [11]:
graphlab.SArray([images['image'][0]]).show()



In [12]:
graphlab.SArray([images['image'][1535]]).show()
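
To inspect other matches without repeating these cells, here is a small helper (a hypothetical convenience function, not part of the original notebook) that displays a query image next to its closest distinct neighbor:

def show_query_and_neighbor(idx):
    # Look up the closest distinct neighbor of image `idx` in similar_images,
    # then display the query image and its match together.
    match = similar_images[similar_images['query_label'] == idx][0]
    neighbor_idx = match['reference_label']
    graphlab.SArray([images['image'][idx],
                     images['image'][neighbor_idx]]).show()

# For example, this reproduces the starfish pair above:
show_query_and_neighbor(9)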