In this notebook, we will walk you through adapting a neural network trained on the ImageNet Challenge to find similar images within the Caltech-101 dataset. By the end of the notebook, we will be able to algorithmically identify images in Caltech-101 that are visually similar to one another.
The notebook has several parts:
Part I focuses on loading the data.
Part II focuses on using a pre-trained neural net to extract visual features.
Part III focuses on using the extracted visual features to train a nearest neighbors model.
In [1]:
import graphlab

# Load the Caltech-101 images into an SFrame.
images = graphlab.SFrame('https://static.turi.com/datasets/caltech_101/caltech_101_images')
We use a neural network trained on the 1.2 million images of the ImageNet Challenge. For each image in the Caltech-101 dataset, we take the activations of the layer just before the classification layer and use them as the image's feature vector. This is an internal representation of what the network knows about the image: if the feature vectors of two images are similar, the images themselves should be similar as well. This concept is covered more thoroughly in our blog post on the subject. Note that feature extraction is not feasible without a GPU and the GPU installation; in that case, you should instead download the SArray that contains the result of this step, as shown below.
In [2]:
# Only do this if you have a GPU
#pretrained_model = graphlab.load_model('https://static.turi.com/models/imagenet_model_iter45')
#images['extracted_features'] = pretrained_model.extract_features(images)
# If you do not have a GPU, do this instead.
images['extracted_features'] = graphlab.SArray('https://static.turi.com/models/pre_extracted_features.gl')
Now, let's inspect the images SFrame. The 'extracted_features' column contains vector representations of the images, as expected.
In [3]:
images
Out[3]:
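Each entry in 'extracted_features' is a dense, fixed-length vector. As a quick sanity check, we can look at the dimensionality of one of them; for an AlexNet-style ImageNet model such as this one, the penultimate layer typically has 4096 units (the exact length depends on the model architecture):
In [ ]:
# Inspect the length of one feature vector (typically 4096 for an AlexNet-style model).
len(images['extracted_features'][0])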
First, we construct a nearest neighbors model on the extracted features. This will let us find each image's closest neighbor.
In [4]:
nearest_neighbor_model = graphlab.nearest_neighbors.create(images, features=['extracted_features'])
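By default, the toolkit chooses a distance function based on the feature types. You can also pass the distance explicitly; cosine distance, for example, is a common choice for comparing deep features. A minimal sketch (the cosine_model name is ours, not part of the original notebook):
In [ ]:
# Optional: build the model with an explicit distance metric.
cosine_model = graphlab.nearest_neighbors.create(
    images, features=['extracted_features'], distance='cosine')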
In [5]:
# k=2 because each image's nearest neighbor is the image itself,
# so the second match is the closest distinct image.
similar_images = nearest_neighbor_model.query(images, k=2)
similar_images is an SFrame in which each row pairs a query label with one of its neighbors, the reference label.
In [6]:
similar_images
Out[6]:
We do some cleaning to remove the rows where the query equals the reference. These self-matches occur because the query set is identical to the reference set.
In [7]:
similar_images = similar_images[similar_images['query_label'] != similar_images['reference_label']]
In [8]:
similar_images
Out[8]:
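With the self-matches removed, the nearest neighbor of any particular image can be looked up by filtering on its query label:
In [ ]:
# Look up the nearest neighbor of image 9.
similar_images[similar_images['query_label'] == 9]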
Now we can explore similar images. For instance, the closest image to image 9 is image 1710. Displaying both images, we can see that they are indeed starfish.
In [9]:
graphlab.canvas.set_target('ipynb')
graphlab.SArray([images['image'][9]]).show()
In [10]:
graphlab.SArray([images['image'][1710]]).show()
Similarly, images 0 and 1535 are two similar photos of the same person.
In [11]:
graphlab.SArray([images['image'][0]]).show()
In [12]:
graphlab.SArray([images['image'][1535]]).show()
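To make further exploration easier, we can wrap the query-and-display steps in a small helper. This is a convenience sketch of ours (the show_similar name is not part of the original notebook), built only from the calls used above:
In [ ]:
# Display an image side by side with its closest distinct neighbor.
def show_similar(index):
    neighbors = nearest_neighbor_model.query(images[index:index + 1], k=2)
    # rank 1 is the image itself; rank 2 is the closest distinct image
    neighbor = neighbors[neighbors['rank'] == 2]['reference_label'][0]
    graphlab.SArray([images['image'][index], images['image'][neighbor]]).show()

show_similar(9)  # should display the two starfish images from above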