Deep Learning for Image Analysis

In this tutorial, we'll walk through how to apply deep learning to two practical tasks: Image Classification and Object Detection.


In [1]:
import graphlab 
import graphlab.mxnet
graphlab.canvas.set_target('ipynb')


[INFO] graphlab.mxnet.base: CUDA support is currently not available on this platform. GPU support is disabled.
[INFO] graphlab.cython.cy_server: GraphLab Create v2.0 started. Logging: /tmp/graphlab_server_1468121894.log

Applying Pretrained Networks: Product Classification

Suppose we need to classify products as Backpacks or Mountain Bikes. Deep learning models are state-of-the-art at image classification, so let's load a model that has been trained on ImageNet and apply it after loading in our dataset.


In [2]:
products_train = graphlab.SFrame('products_train.sf/')
products_test = graphlab.SFrame('products_test.sf/')

In [3]:
products_test['image'].show()



In [4]:
pretrained_model = graphlab.mxnet.pretrained_model.load_path('mxnet_models/imagenet1k_inception_bn/')

In [5]:
predictions = pretrained_model.predict_topk(products_test.head(10), k=1)

In [6]:
predictions['label']


Out[6]:
dtype: dict
Rows: 10
[{'wnid': 'n02815834', 'text': 'beaker'}, {'wnid': 'n02769748', 'text': 'backpack, back pack, knapsack, packsack, rucksack, haversack'}, {'wnid': 'n04336792', 'text': 'stretcher'}, {'wnid': 'n02769748', 'text': 'backpack, back pack, knapsack, packsack, rucksack, haversack'}, {'wnid': 'n02769748', 'text': 'backpack, back pack, knapsack, packsack, rucksack, haversack'}, {'wnid': 'n02769748', 'text': 'backpack, back pack, knapsack, packsack, rucksack, haversack'}, {'wnid': 'n03623198', 'text': 'knee pad'}, {'wnid': 'n02769748', 'text': 'backpack, back pack, knapsack, packsack, rucksack, haversack'}, {'wnid': 'n03792782', 'text': 'mountain bike, all-terrain bike, off-roader'}, {'wnid': 'n02916936', 'text': 'bulletproof vest'}]

As you can see above, the ImageNet label set is much larger than we need, and the predicted labels don't match our actual labels of 'Backpack' and 'Mountain Bike'.
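
One quick way to check how well the pretrained predictions line up with our catalog is to map the relevant ImageNet synset IDs (wnids) back to our two labels. A minimal sketch, assuming the wnid values shown in the output above and the label names ('backpacks', 'mountain-bikes') used later in this notebook:

wnid_to_label = {
    'n02769748': 'backpacks',       # backpack, knapsack, rucksack, ...
    'n03792782': 'mountain-bikes',  # mountain bike, all-terrain bike, ...
}
# Any prediction outside our two classes maps to 'other'.
mapped = predictions['label'].apply(lambda p: wnid_to_label.get(p['wnid'], 'other'))
print(mapped)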

Product Classification: Transfer Learning via Extracting Features

Transfer learning is a method for adapting an existing model to a new task. With neural networks, the idea is to use the network to turn each image into a feature vector, a process called extracting features, and then train a simple classifier on top of those vectors. This works well in practice.


In [7]:
# These cells are left commented out because the loaded SFrames already
# include a precomputed 'extracted_features' column (see the outputs below).
#products_train['extracted_features'] = pretrained_model.extract_feature(products_train)

In [8]:
#products_test['extracted_features'] = pretrained_model.extract_feature(products_test)

In [9]:
transfer_model = graphlab.logistic_classifier.create(products_train, features=['extracted_features'], target='label', validation_set=products_test)


WARNING: The number of feature dimensions in this problem is very large in comparison with the number of examples. Unless an appropriate regularization value is set, this model may not provide accurate predictions for a validation/test set.
Logistic regression:
--------------------------------------------------------
Number of examples          : 524
Number of classes           : 2
Number of feature columns   : 1
Number of unpacked features : 1024
Number of coefficients    : 1025
Starting L-BFGS
--------------------------------------------------------
+-----------+----------+-----------+--------------+-------------------+---------------------+
| Iteration | Passes   | Step size | Elapsed Time | Training-accuracy | Validation-accuracy |
+-----------+----------+-----------+--------------+-------------------+---------------------+
| 1         | 6        | 0.000152  | 1.243296     | 0.965649          | 0.945736            |
| 2         | 11       | 13.000000 | 1.499172     | 0.996183          | 1.000000            |
| 3         | 12       | 13.000000 | 1.581619     | 0.965649          | 0.945736            |
| 4         | 18       | 1.259945  | 1.866942     | 1.000000          | 0.992248            |
| 5         | 19       | 1.259945  | 1.952945     | 1.000000          | 0.992248            |
| 6         | 20       | 1.259945  | 2.076350     | 1.000000          | 0.992248            |
+-----------+----------+-----------+--------------+-------------------+---------------------+
SUCCESS: Optimal solution found.

Validation accuracy appears to be quite good.
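
Beyond the accuracy reported during training, the fitted classifier can be evaluated directly on the test set. A minimal sketch using the standard GraphLab model API:

results = transfer_model.evaluate(products_test)
print(results['accuracy'])
print(results['confusion_matrix'])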

Product Similarity: Nearest Neighbors and Extracted Features

Let us try searching for visually similar products.


In [10]:
products_all = products_test.append(products_train)
products_all = products_all.add_row_number()

In [11]:
nearest_neighbors_model = graphlab.nearest_neighbors.create(products_all, features=['extracted_features'])


Starting brute force nearest neighbors model training.

In [12]:
query = products_all[0:1]
query['image'].show()



In [13]:
query_results = nearest_neighbors_model.query(query)


Starting pairwise querying.
+--------------+---------+-------------+--------------+
| Query points | # Pairs | % Complete. | Elapsed Time |
+--------------+---------+-------------+--------------+
| 0            | 1       | 0.153139    | 9.882ms      |
| Done         |         | 100         | 38.313ms     |
+--------------+---------+-------------+--------------+

In [14]:
query_results


Out[14]:
+-------------+-----------------+---------------+------+
| query_label | reference_label | distance      | rank |
+-------------+-----------------+---------------+------+
| 0           | 0               | 0.0           | 1    |
| 0           | 421             | 10.1641556948 | 2    |
| 0           | 377             | 15.2248245533 | 3    |
| 0           | 308             | 17.0622185119 | 4    |
| 0           | 451             | 19.8245874976 | 5    |
+-------------+-----------------+---------------+------+
[5 rows x 4 columns]

In [15]:
filtered_results = products_all.filter_by(query_results['reference_label'], 'id')

In [16]:
filtered_results['image'].show()
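
Note that filter_by does not preserve the ranking returned by the query. If you want the neighbors displayed in order of similarity, one option (a sketch, assuming the default 'id' column added by add_row_number above) is to join the query results onto the products and sort by rank:

ordered = products_all.join(query_results, on={'id': 'reference_label'}).sort('rank')
ordered['image'].show()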


Back to Classification: Using Pre-Defined Architectures

If you want a network that is more tailored to your task, you can train a pre-defined network architecture (one that has worked well on other tasks) on your own data. Let's try it here.


In [17]:
network = graphlab.deeplearning.create(products_train, target='label')

In [18]:
network


Out[18]:
### network layers ###
layer[0]: ConvolutionLayer
  init_random = gaussian
  padding = 0
  stride = 2
  num_channels = 10
  num_groups = 1
  kernel_size = 3
layer[1]: MaxPoolingLayer
  padding = 0
  stride = 2
  kernel_size = 3
layer[2]: FlattenLayer
layer[3]: FullConnectionLayer
  init_sigma = 0.01
  init_random = gaussian
  init_bias = 0
  num_hidden_units = 100
layer[4]: RectifiedLinearLayer
layer[5]: DropoutLayer
  threshold = 0.5
layer[6]: FullConnectionLayer
  init_sigma = 0.01
  init_random = gaussian
  init_bias = 0
  num_hidden_units = 2
layer[7]: SoftmaxLayer
### end network layers ###

### network parameters ###
learning_rate = 0.001
momentum = 0.9
### end network parameters ###

In [19]:
products_test['image'] = graphlab.image_analysis.resize(products_test['image'], 224, 224, 3)

In [20]:
neural_net_model = graphlab.neuralnet_classifier.create(products_train, network=network, features=['image'], target='label', validation_set=products_test, max_iterations=3)


Using network:

### network layers ###
layer[0]: ConvolutionLayer
  init_random = gaussian
  padding = 0
  stride = 2
  num_channels = 10
  num_groups = 1
  kernel_size = 3
layer[1]: MaxPoolingLayer
  padding = 0
  stride = 2
  kernel_size = 3
layer[2]: FlattenLayer
layer[3]: FullConnectionLayer
  init_sigma = 0.01
  init_random = gaussian
  init_bias = 0
  num_hidden_units = 100
layer[4]: RectifiedLinearLayer
layer[5]: DropoutLayer
  threshold = 0.5
layer[6]: FullConnectionLayer
  init_sigma = 0.01
  init_random = gaussian
  init_bias = 0
  num_hidden_units = 2
layer[7]: SoftmaxLayer
### end network layers ###

### network parameters ###
learning_rate = 0.001
momentum = 0.9
### end network parameters ###

Computing mean image...
Done computing mean image.
Creating neuralnet using cpu
Training with batch size = 100
+-----------+----------+--------------+-------------------+---------------------+-----------------+
| Iteration | Examples | Elapsed Time | Training-accuracy | Validation-accuracy | Examples/second |
+-----------+----------+--------------+-------------------+---------------------+-----------------+
| 1         | 300      | 11.160981    | 0.616667          |                     | 26.879393       |
| 1         | 600      | 22.161945    | 0.791667          |                     | 27.270309       |
| 1         | 600      | 25.768270    | 0.791667          | 0.945736            | 27.270309       |
| 2         | 300      | 38.445966    | 0.960000          |                     | 23.663729       |
| 2         | 600      | 49.142711    | 0.965000          |                     | 28.045956       |
| 2         | 600      | 53.064513    | 0.965000          | 0.945736            | 28.045956       |
| 3         | 300      | 64.622525    | 0.963333          |                     | 25.956163       |
| 3         | 600      | 76.274524    | 0.968333          |                     | 25.746655       |
| 3         | 600      | 80.304676    | 0.968333          | 0.945736            | 25.746655       |
+-----------+----------+--------------+-------------------+---------------------+-----------------+

Building a Custom Network for Classification

You may want to customize the architecture as well; for instance, you may want to give a layer more capacity. Typically, one modifies an existing network architecture instead of building one completely from scratch.


In [21]:
network


Out[21]:
### network layers ###
layer[0]: ConvolutionLayer
  init_random = gaussian
  padding = 0
  stride = 2
  num_channels = 10
  num_groups = 1
  kernel_size = 3
layer[1]: MaxPoolingLayer
  padding = 0
  stride = 2
  kernel_size = 3
layer[2]: FlattenLayer
layer[3]: FullConnectionLayer
  init_sigma = 0.01
  init_random = gaussian
  init_bias = 0
  num_hidden_units = 100
layer[4]: RectifiedLinearLayer
layer[5]: DropoutLayer
  threshold = 0.5
layer[6]: FullConnectionLayer
  init_sigma = 0.01
  init_random = gaussian
  init_bias = 0
  num_hidden_units = 2
layer[7]: SoftmaxLayer
### end network layers ###

### network parameters ###
learning_rate = 0.001
momentum = 0.9
### end network parameters ###

Let's add some hidden units to the network.


In [22]:
network.layers[3] = graphlab.deeplearning.layers.FullConnectionLayer(200)

In [23]:
network


Out[23]:
### network layers ###
layer[0]: ConvolutionLayer
  init_random = gaussian
  padding = 0
  stride = 2
  num_channels = 10
  num_groups = 1
  kernel_size = 3
layer[1]: MaxPoolingLayer
  padding = 0
  stride = 2
  kernel_size = 3
layer[2]: FlattenLayer
layer[3]: FullConnectionLayer
  init_sigma = 0.01
  init_random = gaussian
  init_bias = 0
  num_hidden_units = 200
layer[4]: RectifiedLinearLayer
layer[5]: DropoutLayer
  threshold = 0.5
layer[6]: FullConnectionLayer
  init_sigma = 0.01
  init_random = gaussian
  init_bias = 0
  num_hidden_units = 2
layer[7]: SoftmaxLayer
### end network layers ###

### network parameters ###
learning_rate = 0.001
momentum = 0.9
### end network parameters ###

In [24]:
neural_net_model = graphlab.neuralnet_classifier.create(products_train, network=network, features=['image'], target='label', validation_set=products_test, max_iterations=3)


Using network:

### network layers ###
layer[0]: ConvolutionLayer
  init_random = gaussian
  padding = 0
  stride = 2
  num_channels = 10
  num_groups = 1
  kernel_size = 3
layer[1]: MaxPoolingLayer
  padding = 0
  stride = 2
  kernel_size = 3
layer[2]: FlattenLayer
layer[3]: FullConnectionLayer
  init_sigma = 0.01
  init_random = gaussian
  init_bias = 0
  num_hidden_units = 200
layer[4]: RectifiedLinearLayer
layer[5]: DropoutLayer
  threshold = 0.5
layer[6]: FullConnectionLayer
  init_sigma = 0.01
  init_random = gaussian
  init_bias = 0
  num_hidden_units = 2
layer[7]: SoftmaxLayer
### end network layers ###

### network parameters ###
learning_rate = 0.001
momentum = 0.9
### end network parameters ###

Computing mean image...
Done computing mean image.
Creating neuralnet using cpu
Training with batch size = 100
+-----------+----------+--------------+-------------------+---------------------+-----------------+
| Iteration | Examples | Elapsed Time | Training-accuracy | Validation-accuracy | Examples/second |
+-----------+----------+--------------+-------------------+---------------------+-----------------+
| 1         | 300      | 13.709992    | 0.866667          |                     | 21.881855       |
| 1         | 600      | 24.964975    | 0.915000          |                     | 26.654860       |
| 1         | 600      | 28.991313    | 0.915000          | 0.945736            | 26.654860       |
| 2         | 300      | 40.583383    | 0.960000          |                     | 25.879988       |
| 2         | 600      | 53.551415    | 0.975000          |                     | 23.133812       |
| 2         | 600      | 58.411255    | 0.975000          | 0.984496            | 23.133812       |
| 3         | 300      | 72.222624    | 0.986667          |                     | 21.721338       |
| 3         | 600      | 84.115119    | 0.990000          |                     | 25.225996       |
| 3         | 600      | 88.236170    | 0.990000          | 1.000000            | 25.225996       |
+-----------+----------+--------------+-------------------+---------------------+-----------------+
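
The wider network reaches 1.0 validation accuracy on this small test set. As before, the trained model can be sanity-checked directly; a minimal sketch using the standard model API:

results = neural_net_model.evaluate(products_test)
print(results['accuracy'])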

Object Detection

Sometimes an image contains many objects, and it may be important to identify each one separately, or to locate a particular object within the image. This task is called Object Detection.


In [25]:
detector_query = graphlab.image_analysis.load_images('detection_query')


Unsupported image format. Supported formats are JPG and PNG	 file: /Users/piotrteterwak/Work/dss_tutorial/detection_query/.DS_Store

In [26]:
detector_query['image'][0].show()

In [27]:
detector = graphlab.mxnet.pretrained_model.load_path('mxnet_models/coco_vgg_16/')

In [28]:
detections = detector.detect(detector_query['image'][0])

In [29]:
detections


Out[29]:
+------------------------------------+----------+----------------+----+
| box                                | class    | score          | id |
+------------------------------------+----------+----------------+----+
| [214.670715332, 229.062835693, ... | person   | 0.999236226082 | 0  |
| [889.339233398, 525.786682129, ... | cow      | 0.50258243084  | 0  |
| [207.77230835, 257.155212402, ...  | backpack | 0.985775470734 | 0  |
| [218.479766846, 237.885543823, ... | backpack | 0.626349627972 | 0  |
+------------------------------------+----------+----------------+----+
[4 rows x 4 columns]
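
The detector returns every object it finds. Before filtering by class, you can also drop low-confidence detections with standard SFrame logical filtering; a small sketch with an assumed 0.9 threshold:

confident_detections = detections[detections['score'] > 0.9]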


In [30]:
backpack_detections = detections.filter_by(['backpack'], 'class')

In [31]:
backpack_detections


Out[31]:
+------------------------------------+----------+----------------+----+
| box                                | class    | score          | id |
+------------------------------------+----------+----------------+----+
| [207.77230835, 257.155212402, ...  | backpack | 0.985775470734 | 0  |
| [218.479766846, 237.885543823, ... | backpack | 0.626349627972 | 0  |
+------------------------------------+----------+----------------+----+
[2 rows x 4 columns]


In [32]:
visualize = detector.visualize_detection(detector_query['image'][0], backpack_detections)

In [33]:
visualize.show()

Object Detection: Matching Bounding Box Detections to a Catalog

Now, let's take the identified backpack in the image, crop it out, and find the most similar images in our products SFrame.


In [34]:
def crop(gl_img, box):
    """Crop a graphlab.Image to the given [x1, y1, x2, y2] bounding box."""
    _format = {'JPG': 0, 'PNG': 1, 'RAW': 2, 'UNDEFINED': 3}
    # Convert to PIL, crop, then rebuild a graphlab.Image from the raw pixels.
    pil_img = gl_img._to_pil_image()
    cropped = pil_img.crop([int(c) for c in box])

    height = cropped.size[1]
    width = cropped.size[0]
    if cropped.mode == 'L':        # grayscale: one byte per pixel
        image_data = bytearray([z for z in cropped.getdata()])
        channels = 1
    elif cropped.mode == 'RGB':    # three channels per pixel
        image_data = bytearray([z for l in cropped.getdata() for z in l])
        channels = 3
    else:                          # assume RGBA: four channels per pixel
        image_data = bytearray([z for l in cropped.getdata() for z in l])
        channels = 4
    format_enum = _format['RAW']
    image_data_size = len(image_data)

    img = graphlab.Image(_image_data=image_data, _width=width, _height=height,
                         _channels=channels, _format_enum=format_enum,
                         _image_data_size=image_data_size)
    return img

In [35]:
cropped = crop(detector_query['image'][0], backpack_detections['box'][0])

In [36]:
cropped.show()

In [37]:
query_sf = graphlab.SFrame({'image' : [cropped]})

In [38]:
query_sf['image'].show()



In [39]:
query_sf['extracted_features'] = pretrained_model.extract_feature(query_sf)


[INFO] graphlab.mxnet.pretrained_model: Detect image shape mismatches network input shape. Perform resize to shape (224, 224, 3)

In [40]:
query_sf


Out[40]:
+-----------------------+--------------------------------------+
| image                 | extracted_features                   |
+-----------------------+--------------------------------------+
| Height: 147 Width: 91 | [0.451328188181, 0.225506410003, ... |
+-----------------------+--------------------------------------+
[1 rows x 2 columns]


In [41]:
products_all


Out[41]:
+----+-----------------------------------------------------+------------------------+----------------+------------------------------------------+
| id | path                                                | image                  | label          | extracted_features                       |
+----+-----------------------------------------------------+------------------------+----------------+------------------------------------------+
| 0  | /Users/piotrteterwak/Work/dss_tutorial/product ...  | Height: 224 Width: 224 | backpacks      | [0.507035017014, 0.325989335775, ...     |
| 1  | /Users/piotrteterwak/Work/dss_tutorial/product ...  | Height: 224 Width: 224 | backpacks      | [0.240931421518, 0.0417900905013, ...    |
| 2  | /Users/piotrteterwak/Work/dss_tutorial/product ...  | Height: 224 Width: 224 | backpacks      | [0.140835434198, 0.221383109689, ...     |
| 3  | /Users/piotrteterwak/Work/dss_tutorial/product ...  | Height: 224 Width: 224 | backpacks      | [0.18444852531, 0.0116491876543, ...     |
| 4  | /Users/piotrteterwak/Work/dss_tutorial/product ...  | Height: 224 Width: 224 | backpacks      | [0.360294371843, 0.138934314251, ...     |
| 5  | /Users/piotrteterwak/Work/dss_tutorial/product ...  | Height: 224 Width: 224 | backpacks      | [0.441695541143, 0.122945547104, ...     |
| 6  | /Users/piotrteterwak/Work/dss_tutorial/product ...  | Height: 224 Width: 224 | backpacks      | [0.340819478035, 0.0191696789116, ...    |
| 7  | /Users/piotrteterwak/Work/dss_tutorial/product ...  | Height: 224 Width: 224 | backpacks      | [0.18196696043, 0.00862449593842, ...    |
| 8  | /Users/piotrteterwak/Work/dss_tutorial/products ... | Height: 224 Width: 224 | mountain-bikes | [0.0402251295745, 0.497428715229, ...    |
| 9  | /Users/piotrteterwak/Work/dss_tutorial/product ...  | Height: 224 Width: 224 | backpacks      | [0.104144528508, 0.00625701062381, ...   |
+----+-----------------------------------------------------+------------------------+----------------+------------------------------------------+
[653 rows x 5 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.


In [42]:
similar_backpacks = nearest_neighbors_model.query(query_sf)


Starting pairwise querying.
+--------------+---------+-------------+--------------+
| Query points | # Pairs | % Complete. | Elapsed Time |
+--------------+---------+-------------+--------------+
| 0            | 1       | 0.153139    | 18.707ms     |
| Done         |         | 100         | 53.085ms     |
+--------------+---------+-------------+--------------+

In [43]:
similar_backpacks


Out[43]:
+-------------+-----------------+---------------+------+
| query_label | reference_label | distance      | rank |
+-------------+-----------------+---------------+------+
| 0           | 244             | 21.7819045741 | 1    |
| 0           | 17              | 21.9630621329 | 2    |
| 0           | 54              | 22.0989317322 | 3    |
| 0           | 631             | 22.5290545697 | 4    |
| 0           | 342             | 22.6549546738 | 5    |
+-------------+-----------------+---------------+------+
[5 rows x 4 columns]


In [44]:
filtered_similar_backpacks = products_all.filter_by(similar_backpacks['reference_label'], 'id')

In [45]:
filtered_similar_backpacks['image'].show()
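
Putting it all together, the whole detect, crop, and match flow can be wrapped into one helper. This is a sketch built only from the calls used above; find_similar_products is a hypothetical name:

def find_similar_products(image, target_class, k=5):
    # Hypothetical helper combining the steps above: detect objects of
    # target_class, crop the highest-scoring box, extract features, and
    # look up the k most similar catalog products.
    detections = detector.detect(image)
    matches = detections.filter_by([target_class], 'class')
    best_box = matches.sort('score', ascending=False)['box'][0]
    query = graphlab.SFrame({'image': [crop(image, best_box)]})
    query['extracted_features'] = pretrained_model.extract_feature(query)
    neighbors = nearest_neighbors_model.query(query, k=k)
    return products_all.filter_by(neighbors['reference_label'], 'id')

similar = find_similar_products(detector_query['image'][0], 'backpack')
similar['image'].show()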


