Deep Learning for Image Analysis

In this tutorial, we'll walk through how to apply deep learning to two practical tasks: Image Classification and Object Detection.


In [1]:
import graphlab 
import graphlab.mxnet
graphlab.canvas.set_target('ipynb')


[INFO] graphlab.mxnet.base: CUDA support is currently not available on this platform. GPU support is disabled.
[INFO] graphlab.cython.cy_server: GraphLab Create v2.0 started. Logging: /tmp/graphlab_server_1468121894.log

Applying Pretrained Networks: Product Classification

Suppose we need to classify products as Backpacks or Mountain Bikes. Deep learning models are state-of-the-art at image classification, so let's load a model that has been trained on ImageNet and apply it after loading in our dataset.


In [2]:
products_train = graphlab.SFrame('products_train.sf/')
products_test = graphlab.SFrame('products_test.sf/')

In [3]:
products_test['image'].show()



In [4]:
pretrained_model = graphlab.mxnet.pretrained_model.load_path('mxnet_models/imagenet1k_inception_bn/')

In [5]:
predictions = pretrained_model.predict_topk(products_test.head(10), k=1)

In [6]:
predictions['label']


Out[6]:
dtype: dict
Rows: 10
[{'wnid': 'n02815834', 'text': 'beaker'}, {'wnid': 'n02769748', 'text': 'backpack, back pack, knapsack, packsack, rucksack, haversack'}, {'wnid': 'n04336792', 'text': 'stretcher'}, {'wnid': 'n02769748', 'text': 'backpack, back pack, knapsack, packsack, rucksack, haversack'}, {'wnid': 'n02769748', 'text': 'backpack, back pack, knapsack, packsack, rucksack, haversack'}, {'wnid': 'n02769748', 'text': 'backpack, back pack, knapsack, packsack, rucksack, haversack'}, {'wnid': 'n03623198', 'text': 'knee pad'}, {'wnid': 'n02769748', 'text': 'backpack, back pack, knapsack, packsack, rucksack, haversack'}, {'wnid': 'n03792782', 'text': 'mountain bike, all-terrain bike, off-roader'}, {'wnid': 'n02916936', 'text': 'bulletproof vest'}]

As you can see above, the ImageNet label set is much larger than we need, and the predicted labels don't match our actual labels of 'Backpack' and 'Mountain Bike'.
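
One quick way to check how well the pretrained predictions line up with our catalog is to map the relevant ImageNet synset IDs (wnids) back to our two labels. A minimal sketch, assuming the wnid values shown in the output above and the label names ('backpacks', 'mountain-bikes') used later in this notebook:

wnid_to_label = {
    'n02769748': 'backpacks',       # backpack, knapsack, rucksack, ...
    'n03792782': 'mountain-bikes',  # mountain bike, all-terrain bike, ...
}
# Any prediction outside our two classes maps to 'other'.
mapped = predictions['label'].apply(lambda p: wnid_to_label.get(p['wnid'], 'other'))
print(mapped)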

Product Classification: Transfer Learning via Extracting Features

Transfer learning is a method for adapting an existing model to a new task. With neural networks, the idea is to use the network to turn each image into a feature vector, a process called extracting features, and then train a simple classifier on top of those vectors. This works well in practice.


In [7]:
# These cells are left commented out because the loaded SFrames already
# include a precomputed 'extracted_features' column (see the outputs below).
#products_train['extracted_features'] = pretrained_model.extract_feature(products_train)

In [8]:
#products_test['extracted_features'] = pretrained_model.extract_feature(products_test)

In [9]:
transfer_model = graphlab.logistic_classifier.create(products_train, features=['extracted_features'], target='label', validation_set=products_test)


WARNING: The number of feature dimensions in this problem is very large in comparison with the number of examples. Unless an appropriate regularization value is set, this model may not provide accurate predictions for a validation/test set.
Logistic regression:
--------------------------------------------------------
Number of examples          : 524
Number of classes           : 2
Number of feature columns   : 1
Number of unpacked features : 1024
Number of coefficients    : 1025
Starting L-BFGS
--------------------------------------------------------
+-----------+----------+-----------+--------------+-------------------+---------------------+
| Iteration | Passes   | Step size | Elapsed Time | Training-accuracy | Validation-accuracy |
+-----------+----------+-----------+--------------+-------------------+---------------------+
| 1         | 6        | 0.000152  | 1.243296     | 0.965649          | 0.945736            |
| 2         | 11       | 13.000000 | 1.499172     | 0.996183          | 1.000000            |
| 3         | 12       | 13.000000 | 1.581619     | 0.965649          | 0.945736            |
| 4         | 18       | 1.259945  | 1.866942     | 1.000000          | 0.992248            |
| 5         | 19       | 1.259945  | 1.952945     | 1.000000          | 0.992248            |
| 6         | 20       | 1.259945  | 2.076350     | 1.000000          | 0.992248            |
+-----------+----------+-----------+--------------+-------------------+---------------------+
SUCCESS: Optimal solution found.

Validation accuracy appears to be quite good.
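
Beyond the accuracy reported during training, the fitted classifier can be evaluated directly on the test set. A minimal sketch using the standard GraphLab model API:

results = transfer_model.evaluate(products_test)
print(results['accuracy'])
print(results['confusion_matrix'])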

Product Similarity: Nearest Neighbors and Extracted Features

Let us try searching for visually similar products.


In [10]:
products_all = products_test.append(products_train)
products_all = products_all.add_row_number()

In [11]:
nearest_neighbors_model = graphlab.nearest_neighbors.create(products_all, features=['extracted_features'])


Starting brute force nearest neighbors model training.

In [12]:
query = products_all[0:1]
query['image'].show()



In [13]:
query_results = nearest_neighbors_model.query(query)


Starting pairwise querying.
+--------------+---------+-------------+--------------+
| Query points | # Pairs | % Complete. | Elapsed Time |
+--------------+---------+-------------+--------------+
| 0            | 1       | 0.153139    | 9.882ms      |
| Done         |         | 100         | 38.313ms     |
+--------------+---------+-------------+--------------+

In [14]:
query_results


Out[14]:
+-------------+-----------------+---------------+------+
| query_label | reference_label | distance      | rank |
+-------------+-----------------+---------------+------+
| 0           | 0               | 0.0           | 1    |
| 0           | 421             | 10.1641556948 | 2    |
| 0           | 377             | 15.2248245533 | 3    |
| 0           | 308             | 17.0622185119 | 4    |
| 0           | 451             | 19.8245874976 | 5    |
+-------------+-----------------+---------------+------+
[5 rows x 4 columns]

In [15]:
filtered_results = products_all.filter_by(query_results['reference_label'], 'id')

In [16]:
filtered_results['image'].show()
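
Note that filter_by does not preserve the ranking returned by the query. If you want the neighbors displayed in order of similarity, one option (a sketch, assuming the default 'id' column added by add_row_number above) is to join the query results onto the products and sort by rank:

ordered = products_all.join(query_results, on={'id': 'reference_label'}).sort('rank')
ordered['image'].show()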


Back to Classification: Using Pre-Defined Architectures

If you want a network that is more tailored to your task, you can train a pre-defined network architecture (one that has worked well on other tasks) on your own data. Let's try it here.


In [17]:
network = graphlab.deeplearning.create(products_train, target='label')

In [18]:
network


Out[18]:
### network layers ###
layer[0]: ConvolutionLayer
  init_random = gaussian
  padding = 0
  stride = 2
  num_channels = 10
  num_groups = 1
  kernel_size = 3
layer[1]: MaxPoolingLayer
  padding = 0
  stride = 2
  kernel_size = 3
layer[2]: FlattenLayer
layer[3]: FullConnectionLayer
  init_sigma = 0.01
  init_random = gaussian
  init_bias = 0
  num_hidden_units = 100
layer[4]: RectifiedLinearLayer
layer[5]: DropoutLayer
  threshold = 0.5
layer[6]: FullConnectionLayer
  init_sigma = 0.01
  init_random = gaussian
  init_bias = 0
  num_hidden_units = 2
layer[7]: SoftmaxLayer
### end network layers ###

### network parameters ###
learning_rate = 0.001
momentum = 0.9
### end network parameters ###

In [19]:
products_test['image'] = graphlab.image_analysis.resize(products_test['image'], 224, 224, 3)

In [20]:
neural_net_model = graphlab.neuralnet_classifier.create(products_train, network=network, features=['image'], target='label', validation_set=products_test, max_iterations=3)


Using network:

### network layers ###
layer[0]: ConvolutionLayer
  init_random = gaussian
  padding = 0
  stride = 2
  num_channels = 10
  num_groups = 1
  kernel_size = 3
layer[1]: MaxPoolingLayer
  padding = 0
  stride = 2
  kernel_size = 3
layer[2]: FlattenLayer
layer[3]: FullConnectionLayer
  init_sigma = 0.01
  init_random = gaussian
  init_bias = 0
  num_hidden_units = 100
layer[4]: RectifiedLinearLayer
layer[5]: DropoutLayer
  threshold = 0.5
layer[6]: FullConnectionLayer
  init_sigma = 0.01
  init_random = gaussian
  init_bias = 0
  num_hidden_units = 2
layer[7]: SoftmaxLayer
### end network layers ###

### network parameters ###
learning_rate = 0.001
momentum = 0.9
### end network parameters ###

Computing mean image...
Done computing mean image.
Creating neuralnet using cpu
Training with batch size = 100
+-----------+----------+--------------+-------------------+---------------------+-----------------+
| Iteration | Examples | Elapsed Time | Training-accuracy | Validation-accuracy | Examples/second |
+-----------+----------+--------------+-------------------+---------------------+-----------------+
| 1         | 300      | 11.160981    | 0.616667          |                     | 26.879393       |
| 1         | 600      | 22.161945    | 0.791667          |                     | 27.270309       |
| 1         | 600      | 25.768270    | 0.791667          | 0.945736            | 27.270309       |
| 2         | 300      | 38.445966    | 0.960000          |                     | 23.663729       |
| 2         | 600      | 49.142711    | 0.965000          |                     | 28.045956       |
| 2         | 600      | 53.064513    | 0.965000          | 0.945736            | 28.045956       |
| 3         | 300      | 64.622525    | 0.963333          |                     | 25.956163       |
| 3         | 600      | 76.274524    | 0.968333          |                     | 25.746655       |
| 3         | 600      | 80.304676    | 0.968333          | 0.945736            | 25.746655       |
+-----------+----------+--------------+-------------------+---------------------+-----------------+

Building a Custom Network for Classification

You may want to customize the architecture as well; for instance, you may want to give a layer more capacity. Typically, one modifies an existing network architecture instead of building one completely from scratch.


In [21]:
network


Out[21]:
### network layers ###
layer[0]: ConvolutionLayer
  init_random = gaussian
  padding = 0
  stride = 2
  num_channels = 10
  num_groups = 1
  kernel_size = 3
layer[1]: MaxPoolingLayer
  padding = 0
  stride = 2
  kernel_size = 3
layer[2]: FlattenLayer
layer[3]: FullConnectionLayer
  init_sigma = 0.01
  init_random = gaussian
  init_bias = 0
  num_hidden_units = 100
layer[4]: RectifiedLinearLayer
layer[5]: DropoutLayer
  threshold = 0.5
layer[6]: FullConnectionLayer
  init_sigma = 0.01
  init_random = gaussian
  init_bias = 0
  num_hidden_units = 2
layer[7]: SoftmaxLayer
### end network layers ###

### network parameters ###
learning_rate = 0.001
momentum = 0.9
### end network parameters ###

Let's add some hidden units to the network.


In [22]:
network.layers[3] = graphlab.deeplearning.layers.FullConnectionLayer(200)

In [23]:
network


Out[23]:
### network layers ###
layer[0]: ConvolutionLayer
  init_random = gaussian
  padding = 0
  stride = 2
  num_channels = 10
  num_groups = 1
  kernel_size = 3
layer[1]: MaxPoolingLayer
  padding = 0
  stride = 2
  kernel_size = 3
layer[2]: FlattenLayer
layer[3]: FullConnectionLayer
  init_sigma = 0.01
  init_random = gaussian
  init_bias = 0
  num_hidden_units = 200
layer[4]: RectifiedLinearLayer
layer[5]: DropoutLayer
  threshold = 0.5
layer[6]: FullConnectionLayer
  init_sigma = 0.01
  init_random = gaussian
  init_bias = 0
  num_hidden_units = 2
layer[7]: SoftmaxLayer
### end network layers ###

### network parameters ###
learning_rate = 0.001
momentum = 0.9
### end network parameters ###

In [24]:
neural_net_model = graphlab.neuralnet_classifier.create(products_train, network=network, features=['image'], target='label', validation_set=products_test, max_iterations=3)


Using network:

### network layers ###
layer[0]: ConvolutionLayer
  init_random = gaussian
  padding = 0
  stride = 2
  num_channels = 10
  num_groups = 1
  kernel_size = 3
layer[1]: MaxPoolingLayer
  padding = 0
  stride = 2
  kernel_size = 3
layer[2]: FlattenLayer
layer[3]: FullConnectionLayer
  init_sigma = 0.01
  init_random = gaussian
  init_bias = 0
  num_hidden_units = 200
layer[4]: RectifiedLinearLayer
layer[5]: DropoutLayer
  threshold = 0.5
layer[6]: FullConnectionLayer
  init_sigma = 0.01
  init_random = gaussian
  init_bias = 0
  num_hidden_units = 2
layer[7]: SoftmaxLayer
### end network layers ###

### network parameters ###
learning_rate = 0.001
momentum = 0.9
### end network parameters ###

Computing mean image...
Done computing mean image.
Creating neuralnet using cpu
Training with batch size = 100
+-----------+----------+--------------+-------------------+---------------------+-----------------+
| Iteration | Examples | Elapsed Time | Training-accuracy | Validation-accuracy | Examples/second |
+-----------+----------+--------------+-------------------+---------------------+-----------------+
| 1         | 300      | 13.709992    | 0.866667          |                     | 21.881855       |
| 1         | 600      | 24.964975    | 0.915000          |                     | 26.654860       |
| 1         | 600      | 28.991313    | 0.915000          | 0.945736            | 26.654860       |
| 2         | 300      | 40.583383    | 0.960000          |                     | 25.879988       |
| 2         | 600      | 53.551415    | 0.975000          |                     | 23.133812       |
| 2         | 600      | 58.411255    | 0.975000          | 0.984496            | 23.133812       |
| 3         | 300      | 72.222624    | 0.986667          |                     | 21.721338       |
| 3         | 600      | 84.115119    | 0.990000          |                     | 25.225996       |
| 3         | 600      | 88.236170    | 0.990000          | 1.000000            | 25.225996       |
+-----------+----------+--------------+-------------------+---------------------+-----------------+
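
The wider network reaches 1.0 validation accuracy on this small test set. As before, the trained model can be sanity-checked directly; a minimal sketch using the standard model API:

results = neural_net_model.evaluate(products_test)
print(results['accuracy'])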

Object Detection

Sometimes an image contains many objects, and it may be important to identify each one separately, or to locate a particular object within the image. This task is called Object Detection.


In [25]:
detector_query = graphlab.image_analysis.load_images('detection_query')


Unsupported image format. Supported formats are JPG and PNG	 file: /Users/piotrteterwak/Work/dss_tutorial/detection_query/.DS_Store

In [26]:
detector_query['image'][0].show()

In [27]:
detector = graphlab.mxnet.pretrained_model.load_path('mxnet_models/coco_vgg_16/')

In [28]:
detections = detector.detect(detector_query['image'][0])

In [29]:
detections


Out[29]:
+------------------------------------+----------+----------------+----+
| box                                | class    | score          | id |
+------------------------------------+----------+----------------+----+
| [214.670715332, 229.062835693, ... | person   | 0.999236226082 | 0  |
| [889.339233398, 525.786682129, ... | cow      | 0.50258243084  | 0  |
| [207.77230835, 257.155212402, ...  | backpack | 0.985775470734 | 0  |
| [218.479766846, 237.885543823, ... | backpack | 0.626349627972 | 0  |
+------------------------------------+----------+----------------+----+
[4 rows x 4 columns]
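
The detector returns every object it finds. Before filtering by class, you can also drop low-confidence detections with standard SFrame logical filtering; a small sketch with an assumed 0.9 threshold:

confident_detections = detections[detections['score'] > 0.9]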


In [30]:
backpack_detections = detections.filter_by(['backpack'], 'class')

In [31]:
backpack_detections


Out[31]:
+------------------------------------+----------+----------------+----+
| box                                | class    | score          | id |
+------------------------------------+----------+----------------+----+
| [207.77230835, 257.155212402, ...  | backpack | 0.985775470734 | 0  |
| [218.479766846, 237.885543823, ... | backpack | 0.626349627972 | 0  |
+------------------------------------+----------+----------------+----+
[2 rows x 4 columns]


In [32]:
visualize = detector.visualize_detection(detector_query['image'][0], backpack_detections)

In [33]:
visualize.show()

Object Detection: Matching Bounding Box Detections to a Catalog

Now, let's take the identified backpack in the image, crop it out, and find the most similar images in our products SFrame.


In [34]:
def crop(gl_img, box):
    """Crop a graphlab.Image to the given [x1, y1, x2, y2] bounding box."""
    _format = {'JPG': 0, 'PNG': 1, 'RAW': 2, 'UNDEFINED': 3}
    # Convert to PIL, crop, then rebuild a graphlab.Image from the raw pixels.
    pil_img = gl_img._to_pil_image()
    cropped = pil_img.crop([int(c) for c in box])

    height = cropped.size[1]
    width = cropped.size[0]
    if cropped.mode == 'L':        # grayscale: one byte per pixel
        image_data = bytearray([z for z in cropped.getdata()])
        channels = 1
    elif cropped.mode == 'RGB':    # three channels per pixel
        image_data = bytearray([z for l in cropped.getdata() for z in l])
        channels = 3
    else:                          # assume RGBA: four channels per pixel
        image_data = bytearray([z for l in cropped.getdata() for z in l])
        channels = 4
    format_enum = _format['RAW']
    image_data_size = len(image_data)

    img = graphlab.Image(_image_data=image_data, _width=width, _height=height,
                         _channels=channels, _format_enum=format_enum,
                         _image_data_size=image_data_size)
    return img

In [35]:
cropped = crop(detector_query['image'][0], backpack_detections['box'][0])

In [36]:
cropped.show()

In [37]:
query_sf = graphlab.SFrame({'image' : [cropped]})

In [38]:
query_sf['image'].show()



In [39]:
query_sf['extracted_features'] = pretrained_model.extract_feature(query_sf)


[INFO] graphlab.mxnet.pretrained_model: Detect image shape mismatches network input shape. Perform resize to shape (224, 224, 3)

In [40]:
query_sf


Out[40]:
+-----------------------+--------------------------------------+
| image                 | extracted_features                   |
+-----------------------+--------------------------------------+
| Height: 147 Width: 91 | [0.451328188181, 0.225506410003, ... |
+-----------------------+--------------------------------------+
[1 rows x 2 columns]


In [41]:
products_all


Out[41]:
+----+-----------------------------------------------------+------------------------+----------------+------------------------------------------+
| id | path                                                | image                  | label          | extracted_features                       |
+----+-----------------------------------------------------+------------------------+----------------+------------------------------------------+
| 0  | /Users/piotrteterwak/Work/dss_tutorial/product ...  | Height: 224 Width: 224 | backpacks      | [0.507035017014, 0.325989335775, ...     |
| 1  | /Users/piotrteterwak/Work/dss_tutorial/product ...  | Height: 224 Width: 224 | backpacks      | [0.240931421518, 0.0417900905013, ...    |
| 2  | /Users/piotrteterwak/Work/dss_tutorial/product ...  | Height: 224 Width: 224 | backpacks      | [0.140835434198, 0.221383109689, ...     |
| 3  | /Users/piotrteterwak/Work/dss_tutorial/product ...  | Height: 224 Width: 224 | backpacks      | [0.18444852531, 0.0116491876543, ...     |
| 4  | /Users/piotrteterwak/Work/dss_tutorial/product ...  | Height: 224 Width: 224 | backpacks      | [0.360294371843, 0.138934314251, ...     |
| 5  | /Users/piotrteterwak/Work/dss_tutorial/product ...  | Height: 224 Width: 224 | backpacks      | [0.441695541143, 0.122945547104, ...     |
| 6  | /Users/piotrteterwak/Work/dss_tutorial/product ...  | Height: 224 Width: 224 | backpacks      | [0.340819478035, 0.0191696789116, ...    |
| 7  | /Users/piotrteterwak/Work/dss_tutorial/product ...  | Height: 224 Width: 224 | backpacks      | [0.18196696043, 0.00862449593842, ...    |
| 8  | /Users/piotrteterwak/Work/dss_tutorial/products ... | Height: 224 Width: 224 | mountain-bikes | [0.0402251295745, 0.497428715229, ...    |
| 9  | /Users/piotrteterwak/Work/dss_tutorial/product ...  | Height: 224 Width: 224 | backpacks      | [0.104144528508, 0.00625701062381, ...   |
+----+-----------------------------------------------------+------------------------+----------------+------------------------------------------+
[653 rows x 5 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.


In [42]:
similar_backpacks = nearest_neighbors_model.query(query_sf)


Starting pairwise querying.
+--------------+---------+-------------+--------------+
| Query points | # Pairs | % Complete. | Elapsed Time |
+--------------+---------+-------------+--------------+
| 0            | 1       | 0.153139    | 18.707ms     |
| Done         |         | 100         | 53.085ms     |
+--------------+---------+-------------+--------------+

In [43]:
similar_backpacks


Out[43]:
+-------------+-----------------+---------------+------+
| query_label | reference_label | distance      | rank |
+-------------+-----------------+---------------+------+
| 0           | 244             | 21.7819045741 | 1    |
| 0           | 17              | 21.9630621329 | 2    |
| 0           | 54              | 22.0989317322 | 3    |
| 0           | 631             | 22.5290545697 | 4    |
| 0           | 342             | 22.6549546738 | 5    |
+-------------+-----------------+---------------+------+
[5 rows x 4 columns]


In [44]:
filtered_similar_backpacks = products_all.filter_by(similar_backpacks['reference_label'], 'id')

In [45]:
filtered_similar_backpacks['image'].show()
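
Putting it all together, the whole detect, crop, and match flow can be wrapped into one helper. This is a sketch built only from the calls used above; find_similar_products is a hypothetical name:

def find_similar_products(image, target_class, k=5):
    # Hypothetical helper combining the steps above: detect objects of
    # target_class, crop the highest-scoring box, extract features, and
    # look up the k most similar catalog products.
    detections = detector.detect(image)
    matches = detections.filter_by([target_class], 'class')
    best_box = matches.sort('score', ascending=False)['box'][0]
    query = graphlab.SFrame({'image': [crop(image, best_box)]})
    query['extracted_features'] = pretrained_model.extract_feature(query)
    neighbors = nearest_neighbors_model.query(query, k=k)
    return products_all.filter_by(neighbors['reference_label'], 'id')

similar = find_similar_products(detector_query['image'][0], 'backpack')
similar['image'].show()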


