In [1]:
import graphlab

Download a subset of the CIFAR-10 data


In [2]:
image_train_url = 'https://d396qusza40orc.cloudfront.net/phoenixassets/image_train_data.csv'
image_test_url = 'https://d396qusza40orc.cloudfront.net/phoenixassets/image_test_data.csv'
image_train_data = graphlab.SFrame(image_train_url)
image_test_data = graphlab.SFrame(image_test_url)


[INFO] graphlab.cython.cy_server: GraphLab Create v2.1 started. Logging: /tmp/graphlab_server_1492124461.log
Downloading https://d396qusza40orc.cloudfront.net/phoenixassets/image_train_data.csv to /var/tmp/graphlab-williamgray1/1741/6d140f8e-30e2-4fdd-b9ec-23e9c94c944b.csv
Finished parsing file https://d396qusza40orc.cloudfront.net/phoenixassets/image_train_data.csv
Parsing completed. Parsed 100 lines in 1.53976 secs.
------------------------------------------------------
Inferred types from first 100 line(s) of file as 
column_type_hints=[int,str,str,array,array]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------
Read 1943 lines. Lines per second: 672.08
Finished parsing file https://d396qusza40orc.cloudfront.net/phoenixassets/image_train_data.csv
Parsing completed. Parsed 2005 lines in 2.94329 secs.
Downloading https://d396qusza40orc.cloudfront.net/phoenixassets/image_test_data.csv to /var/tmp/graphlab-williamgray1/1741/69c39574-cfd7-4708-9ddb-6c91d6d3c26d.csv
Finished parsing file https://d396qusza40orc.cloudfront.net/phoenixassets/image_test_data.csv
Parsing completed. Parsed 100 lines in 1.67298 secs.
------------------------------------------------------
Inferred types from first 100 line(s) of file as 
column_type_hints=[int,str,str,array,array]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------
Read 1940 lines. Lines per second: 604.414
Finished parsing file https://d396qusza40orc.cloudfront.net/phoenixassets/image_test_data.csv
Parsing completed. Parsed 4000 lines in 5.27403 secs.

In [3]:
image_train_data.head()


Out[3]:
+-------+----------------------+------------+------------------------------+------------------------------+
| id    | image                | label      | deep_features                | image_array                  |
+-------+----------------------+------------+------------------------------+------------------------------+
| 24    | Height: 32 Width: 32 | bird       | [0.242872, 1.09545, 0.0,     | [73.0, 77.0, 58.0, 71.0,     |
|       |                      |            | 0.39363, 0.0, 0.0, ...       | 68.0, 50.0, 77.0, 69.0, ...  |
| 33    | Height: 32 Width: 32 | cat        | [0.525088, 0.0, 0.0, 0.0,    | [7.0, 5.0, 8.0, 7.0, 5.0,    |
|       |                      |            | 0.0, 0.0, 9.94829, 0.0, ...  | 8.0, 5.0, 4.0, 6.0, 7.0, ... |
| 36    | Height: 32 Width: 32 | cat        | [0.566016, 0.0, 0.0, 0.0,    | [169.0, 122.0, 65.0,         |
|       |                      |            | 0.0, 0.0, 9.9972, 0.0, ...   | 131.0, 108.0, 75.0, ...      |
| 70    | Height: 32 Width: 32 | dog        | [1.1298, 0.0, 0.0,           | [154.0, 179.0, 152.0,        |
|       |                      |            | 0.778194, 0.0, 0.758051, ... | 159.0, 183.0, 157.0, ...     |
| 90    | Height: 32 Width: 32 | bird       | [1.71787, 0.0, 0.0, 0.0,     | [216.0, 195.0, 180.0,        |
|       |                      |            | 0.0, 0.0, 9.33936, 0.0, ...  | 201.0, 178.0, 160.0, ...     |
| 97    | Height: 32 Width: 32 | automobile | [1.57819, 0.0, 0.0, 0.0,     | [33.0, 44.0, 27.0, 29.0,     |
|       |                      |            | 0.0, 0.0, 9.00632, 0.0, ...  | 44.0, 31.0, 32.0, 45.0, ...  |
| 107   | Height: 32 Width: 32 | dog        | [0.0, 0.0, 0.220678, 0.0,    | [97.0, 51.0, 31.0, 104.0,    |
|       |                      |            | 0.0, 0.0, 8.58053, ...       | 58.0, 38.0, 107.0, 61.0, ... |
| 121   | Height: 32 Width: 32 | bird       | [0.0, 0.237535, 0.0, 0.0,    | [93.0, 96.0, 88.0, 102.0,    |
|       |                      |            | 0.0, 0.0, 9.9908, 0.0, ...   | 106.0, 97.0, 117.0, ...      |
| 136   | Height: 32 Width: 32 | automobile | [0.0, 0.0, 0.0, 0.0, 0.0,    | [35.0, 59.0, 53.0, 36.0,     |
|       |                      |            | 0.0, 7.57379, 0.0, 0.0, ...  | 56.0, 56.0, 42.0, 62.0, ...  |
| 138   | Height: 32 Width: 32 | bird       | [0.658936, 0.0, 0.0, 0.0,    | [205.0, 193.0, 195.0,        |
|       |                      |            | 0.0, 0.0, 9.93748, 0.0, ...  | 200.0, 187.0, 193.0, ...     |
+-------+----------------------+------------+------------------------------+------------------------------+
[10 rows x 5 columns]
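Each `image_array` appears to hold the raw pixels of a 32x32 colour image flattened into 3,072 values. The following is a minimal pure-Python sketch of regrouping one flat list back into rows of RGB pixels. It assumes the values are stored row by row with the three channels of each pixel interleaved (R, G, B, R, G, B, ...). That layout is inferred from the similar-looking triples in the output above and is not confirmed by the data source. `unflatten_image` is a hypothetical helper, not part of GraphLab.

```python
def unflatten_image(flat, height=32, width=32, channels=3):
    """Regroup a flat pixel list into a height x width x channels nested list.

    Assumes row-major order with interleaved channels (R, G, B, R, G, B, ...).
    """
    expected = height * width * channels
    if len(flat) != expected:
        raise ValueError("expected %d values, got %d" % (expected, len(flat)))
    image = []
    for r in range(height):
        row = []
        for c in range(width):
            # index of the first channel of pixel (r, c) in the flat list
            start = (r * width + c) * channels
            row.append(list(flat[start:start + channels]))
        image.append(row)
    return image


# Usage with a synthetic flat array (not real CIFAR-10 pixels)
flat = [float(i % 256) for i in range(32 * 32 * 3)]
img = unflatten_image(flat)
print((len(img), len(img[0]), len(img[0][0])))  # (32, 32, 3)
print(img[0][1])  # [3.0, 4.0, 5.0]
```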


In [4]:
# create a k-nearest-neighbors model on the deep features; label='id' makes query results report each neighbor's image id
knn_model = graphlab.nearest_neighbors.create(image_train_data, 
                                              features=['deep_features'],
                                              label='id')


Starting brute force nearest neighbors model training.
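The log reports a brute-force model, which means each query is compared against every reference row. Below is a minimal pure-Python sketch of that idea. It assumes Euclidean distance on the `deep_features` vectors; the notebook does not state which distance GraphLab chose. `brute_force_knn` is a hypothetical helper whose output tuples mimic the `query_label`, `reference_label`, `distance`, and `rank` columns returned by `knn_model.query`.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_knn(reference, queries, k=5):
    """Return (query_label, reference_label, distance, rank) tuples.

    reference: dict mapping an id (like the 'id' label column) to a feature vector.
    queries: list of feature vectors; each one's position serves as query_label.
    """
    results = []
    for q_index, q in enumerate(queries):
        # score every reference point (brute force), then keep the k closest
        scored = sorted((euclidean(q, vec), ref_id) for ref_id, vec in reference.items())
        for rank, (dist, ref_id) in enumerate(scored[:k], start=1):
            results.append((q_index, ref_id, dist, rank))
    return results


# Usage with toy 2-D vectors standing in for deep features
reference = {24: [0.0, 0.0], 33: [3.0, 4.0], 36: [1.0, 0.0], 70: [6.0, 8.0]}
for row in brute_force_knn(reference, [[0.0, 0.0]], k=3):
    print(row)
# (0, 24, 0.0, 1)
# (0, 36, 1.0, 2)
# (0, 33, 5.0, 3)
```

Brute force costs one distance computation per reference row per query. That is cheap for the roughly 2,000 training images here, but it grows linearly with the reference set.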

In [10]:
# select a single image (row 18, a cat) as the query; slicing keeps it a one-row SFrame
cat = image_train_data[18:19]
cat


Out[10]:
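+-------+----------------------+------------+------------------------------+------------------------------+
| id    | image                | label      | deep_features                | image_array                  |
+-------+----------------------+------------+------------------------------+------------------------------+
| 384   | Height: 32 Width: 32 | cat        | [1.04404, 0.0, 0.0, 0.0,     | [46.0, 45.0, 50.0, 47.0,     |
|       |                      |            | 0.0, 0.0, 9.49541, 0.0, ...  | 45.0, 51.0, 45.0, 44.0, ...  |
+-------+----------------------+------------+------------------------------+------------------------------+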
[1 rows x 5 columns]


In [11]:
# query the model to find nearest neighbors (most similar images) to the above image 
knn_model.query(cat)


Starting pairwise querying.
+--------------+---------+-------------+--------------+
| Query points | # Pairs | % Complete. | Elapsed Time |
+--------------+---------+-------------+--------------+
| 0            | 1       | 0.0498753   | 82.727ms     |
| Done         |         | 100         | 365.057ms    |
+--------------+---------+-------------+--------------+
Out[11]:
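+-------------+-----------------+---------------+------+
| query_label | reference_label | distance      | rank |
+-------------+-----------------+---------------+------+
| 0           | 384             | 0.0           | 1    |
| 0           | 6910            | 36.940312418  | 2    |
| 0           | 39777           | 38.4634892031 | 3    |
| 0           | 36870           | 39.7559665724 | 4    |
| 0           | 41734           | 39.7865972971 | 5    |
+-------------+-----------------+---------------+------+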
[5 rows x 4 columns]
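The query image (id 384) is itself in the training set, so it comes back as its own nearest neighbor at distance 0.0 and rank 1. Below is a small sketch that drops that self-match and re-ranks the remaining rows. It works on plain tuples shaped like the rows above. `drop_self_match` is a hypothetical helper. To still end up with k true neighbors, one would request one extra neighbor from the model before filtering.

```python
def drop_self_match(rows, query_id):
    """Remove rows whose reference_label equals query_id and renumber ranks from 1.

    rows: list of (query_label, reference_label, distance, rank) tuples sorted by rank.
    """
    kept = [r for r in rows if r[1] != query_id]
    # reassign consecutive ranks to the surviving rows, keeping their order
    return [(q, ref, dist, new_rank)
            for new_rank, (q, ref, dist, _old_rank) in enumerate(kept, start=1)]


# The five rows printed above for the cat image with id 384
rows = [(0, 384, 0.0, 1),
        (0, 6910, 36.940312418, 2),
        (0, 39777, 38.4634892031, 3),
        (0, 36870, 39.7559665724, 4),
        (0, 41734, 39.7865972971, 5)]
for row in drop_self_match(rows, 384):
    print(row)
```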


In [14]:
# helper that looks up the full training rows (image, label, features) for the ids returned by a query
def get_image_from_id(query_result):
    return image_train_data.filter_by(query_result['reference_label'], 'id')
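`filter_by` selects the matching rows, but it does not necessarily keep them in the order of the query ranks. In this notebook, ids 39777 and 36870 come back swapped relative to their ranks. Below is a pure-Python sketch of an order-preserving lookup. It assumes the table is a list of dicts keyed by column name. `rows_in_rank_order` is a hypothetical helper, not a GraphLab function.

```python
def rows_in_rank_order(table, ranked_ids, key="id"):
    """Return the rows of `table` whose `key` is in ranked_ids, ordered like ranked_ids.

    table: list of dicts, one per row. Ids missing from the table are skipped.
    """
    # index rows by id once, then read them back in rank order
    by_id = dict((row[key], row) for row in table)
    return [by_id[i] for i in ranked_ids if i in by_id]


# Toy table: ids from the cat query, in the order filter_by returned them
table = [{"id": 384, "label": "cat"}, {"id": 6910, "label": "cat"},
         {"id": 36870, "label": "cat"}, {"id": 39777, "label": "cat"},
         {"id": 41734, "label": "cat"}]
ranked = [384, 6910, 39777, 36870, 41734]
print([row["id"] for row in rows_in_rank_order(table, ranked)])
# [384, 6910, 39777, 36870, 41734]
```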

In [18]:
# returns the full rows for the five nearest neighbors (including the query image itself at distance 0); filter_by does not keep them in rank order
get_image_from_id(knn_model.query(cat))


Starting pairwise querying.
+--------------+---------+-------------+--------------+
| Query points | # Pairs | % Complete. | Elapsed Time |
+--------------+---------+-------------+--------------+
| 0            | 1       | 0.0498753   | 41.468ms     |
| Done         |         | 100         | 327.697ms    |
+--------------+---------+-------------+--------------+
Out[18]:
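+-------+----------------------+------------+------------------------------+------------------------------+
| id    | image                | label      | deep_features                | image_array                  |
+-------+----------------------+------------+------------------------------+------------------------------+
| 384   | Height: 32 Width: 32 | cat        | [1.04404, 0.0, 0.0, 0.0,     | [46.0, 45.0, 50.0, 47.0,     |
|       |                      |            | 0.0, 0.0, 9.49541, 0.0, ...  | 45.0, 51.0, 45.0, 44.0, ...  |
| 6910  | Height: 32 Width: 32 | cat        | [1.55475, 0.0, 0.0, 0.0,     | [154.0, 133.0, 92.0,         |
|       |                      |            | 0.0, 0.0, 10.1923, 0.0, ...  | 134.0, 112.0, 75.0, ...      |
| 36870 | Height: 32 Width: 32 | cat        | [0.240483, 0.0, 0.0, 0.0,    | [16.0, 20.0, 19.0, 14.0,     |
|       |                      |            | 0.0, 0.0, 9.52754, 0.0, ...  | 19.0, 17.0, 11.0, 15.0, ...  |
| 39777 | Height: 32 Width: 32 | cat        | [0.0, 0.0, 0.0, 0.0, 0.0,    | [145.0, 166.0, 165.0,        |
|       |                      |            | 0.0, 9.42072, 0.0, 0.0, ...  | 164.0, 185.0, 184.0, ...     |
| 41734 | Height: 32 Width: 32 | cat        | [0.0, 0.0, 0.0, 0.0, 0.0,    | [122.0, 27.0, 34.0,          |
|       |                      |            | 0.0, 11.6715, 0.0, 0.0, ...  | 120.0, 24.0, 31.0, 11 ...    |
+-------+----------------------+------------+------------------------------+------------------------------+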
[5 rows x 5 columns]

A more general function for finding the nearest neighbors of any training image


In [19]:
def show_neighbors(row):
    return get_image_from_id(knn_model.query(image_train_data[row:row+1]))

In [20]:
# pass a row index to list the images closest to that row's image
show_neighbors(8)


Starting pairwise querying.
+--------------+---------+-------------+--------------+
| Query points | # Pairs | % Complete. | Elapsed Time |
+--------------+---------+-------------+--------------+
| 0            | 1       | 0.0498753   | 48.924ms     |
| Done         |         | 100         | 319.208ms    |
+--------------+---------+-------------+--------------+
Out[20]:
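+-------+----------------------+------------+------------------------------+------------------------------+
| id    | image                | label      | deep_features                | image_array                  |
+-------+----------------------+------------+------------------------------+------------------------------+
| 136   | Height: 32 Width: 32 | automobile | [0.0, 0.0, 0.0, 0.0, 0.0,    | [35.0, 59.0, 53.0, 36.0,     |
|       |                      |            | 0.0, 7.57379, 0.0, 0.0, ...  | 56.0, 56.0, 42.0, 62.0, ...  |
| 8977  | Height: 32 Width: 32 | automobile | [0.0, 0.0, 0.0, 0.136156,    | [186.0, 195.0, 199.0,        |
|       |                      |            | 0.0, 0.0, 6.81498, 0.0, ...  | 182.0, 192.0, 198.0, ...     |
| 24146 | Height: 32 Width: 32 | automobile | [0.0, 0.0, 0.0, 0.0, 0.0,    | [229.0, 231.0, 227.0,        |
|       |                      |            | 0.0, 9.21664, 0.0, 0.0, ...  | 232.0, 235.0, 231.0, ...     |
| 33261 | Height: 32 Width: 32 | automobile | [0.0, 0.0, 0.0, 0.0, 0.0,    | [110.0, 118.0, 104.0,        |
|       |                      |            | 0.157148, 6.90395, 0.0, ...  | 98.0, 104.0, 80.0, 92.0, ... |
| 44395 | Height: 32 Width: 32 | automobile | [0.0, 0.0, 0.0, 1.34758,     | [89.0, 95.0, 50.0, 83.0,     |
|       |                      |            | 0.0, 0.0, 7.38394, 0.0, ...  | 84.0, 43.0, 69.0, 70.0, ...  |
+-------+----------------------+------------+------------------------------+------------------------------+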
[5 rows x 5 columns]


In [21]:
show_neighbors(1000)


Starting pairwise querying.
+--------------+---------+-------------+--------------+
| Query points | # Pairs | % Complete. | Elapsed Time |
+--------------+---------+-------------+--------------+
| 0            | 1       | 0.0498753   | 40.291ms     |
| Done         |         | 100         | 349.625ms    |
+--------------+---------+-------------+--------------+
Out[21]:
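+-------+----------------------+------------+------------------------------+------------------------------+
| id    | image                | label      | deep_features                | image_array                  |
+-------+----------------------+------------+------------------------------+------------------------------+
| 4932  | Height: 32 Width: 32 | automobile | [0.0, 0.0, 0.0, 0.0, 0.0,    | [0.0, 2.0, 1.0, 0.0, 2.0,    |
|       |                      |            | 0.033414, 11.0754, 0.0, ...  | 1.0, 1.0, 2.0, 1.0, 1.0, ... |
| 23714 | Height: 32 Width: 32 | automobile | [0.0, 0.0, 0.0, 0.689825,    | [108.0, 115.0, 123.0,        |
|       |                      |            | 0.0, 0.0, 10.2394, 0.0, ...  | 106.0, 113.0, 121.0, ...     |
| 23936 | Height: 32 Width: 32 | automobile | [0.0, 0.0, 0.0, 0.0, 0.0,    | [0.0, 0.0, 0.0, 0.0, 0.0,    |
|       |                      |            | 0.0, 8.65919, 0.0, 0.0, ...  | 0.0, 0.0, 0.0, 0.0, 0.0, ... |
| 25048 | Height: 32 Width: 32 | automobile | [0.0, 0.0, 0.0, 0.0, 0.0,    | [53.0, 98.0, 198.0, 60.0,    |
|       |                      |            | 0.0, 8.01601, 0.0, 0.0, ...  | 103.0, 201.0, 58.0, 9 ...    |
| 45011 | Height: 32 Width: 32 | automobile | [0.0, 0.0, 0.0, 0.0, 0.0,    | [122.0, 33.0, 25.0,          |
|       |                      |            | 0.0, 9.43663, 0.0, 0.0, ...  | 109.0, 28.0, 22.0, 10 ...    |
+-------+----------------------+------------+------------------------------+------------------------------+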
[5 rows x 5 columns]
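In each of the three queries above, every returned image carries the same class label. This suggests the deep features group images by object class. One simple way to quantify that is the fraction of neighbors (excluding the query image itself) whose label matches the query's label. Below is a minimal sketch. `label_agreement` is a hypothetical helper, and the cat label list is copied from the Out[18] result.

```python
def label_agreement(query_label, neighbor_labels):
    """Fraction of neighbor labels equal to query_label (0.0 if there are none)."""
    if not neighbor_labels:
        return 0.0
    matches = sum(1 for label in neighbor_labels if label == query_label)
    # float() keeps true division under Python 2 as well
    return float(matches) / len(neighbor_labels)


# Neighbors of the cat image (id 384) after dropping the self-match, from Out[18]
print(label_agreement("cat", ["cat", "cat", "cat", "cat"]))  # 1.0
# A hypothetical mixed result, for contrast
print(label_agreement("dog", ["dog", "cat", "dog", "bird"]))  # 0.5
```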

