In [38]:
from IPython.display import Image

CNTK 201B: Hands On Labs Image Recognition

This hands-on lab shows how to implement image recognition task using convolution network with CNTK v2 Python API. You will start with a basic feedforward CNN architecture to classify CIFAR dataset, then you will keep adding advanced features to your network. Finally, you will implement a VGG net and residual net like the one that won ImageNet competition but smaller in size.

Introduction

In this hands-on, you will practice the following:

  • Understanding subset of CNTK python API needed for image classification task.
  • Write a custom convolution network to classify CIFAR dataset.
  • Modifying the network structure by adding:
    • Dropout layer.
    • Batchnormalization layer.
  • Implement a VGG style network.
  • Introduction to Residual Nets (RESNET).
  • Implement and train RESNET network.

Prerequisites

CNTK 201A hands-on lab, in which you will download and prepare CIFAR dataset is a prerequisites for this lab. This tutorial depends on CNTK v2, so before starting this lab you will need to install CNTK v2. Furthermore, all the tutorials in this lab are done in python, therefore, you will need a basic knowledge of Python.

CNTK 102 lab is recommended but not a prerequisite for this tutorial. However, a basic understanding of Deep Learning is needed. Familiarity with basic convolution operations is highly desirable (Refer to CNTK tutorial 103D).

Dataset

You will use CIFAR 10 dataset, from https://www.cs.toronto.edu/~kriz/cifar.html, during this tutorial. The dataset contains 50000 training images and 10000 test images, all images are 32 x 32 x 3. Each image is classified as one of 10 classes as shown below:


In [2]:
# Figure 1
Image(url="https://cntk.ai/jup/201/cifar-10.png", width=500, height=500)


Out[2]:

The above image is from: https://www.cs.toronto.edu/~kriz/cifar.html

Convolution Neural Network (CNN)

We recommend completing CNTK 103D tutorial before proceeding. Here is a brief recap of Convolution Neural Network (CNN). CNN is a feedforward network comprise of a bunch of layers in such a way that the output of one layer is fed to the next layer (There are more complex architecture that skip layers, we will discuss one of those at the end of this lab). Usually, CNN start with alternating between convolution layer and pooling layer (downsample), then end up with fully connected layer for the classification part.

Convolution layer

Convolution layer consist of multiple 2D convolution kernels applied on the input image or the previous layer, each convolution kernel outputs a feature map.


In [3]:
# Figure 2
Image(url="https://cntk.ai/jup/201/Conv2D.png")


Out[3]:

The stack of feature maps output are the input to the next layer.


In [4]:
# Figure 3
Image(url="https://cntk.ai/jup/201/Conv2DFeatures.png")


Out[4]:

Gradient-Based Learning Applied to Document Recognition, Proceedings of the IEEE, 86(11):2278-2324, November 1998 Y. LeCun, L. Bottou, Y. Bengio and P. Haffner

In CNTK:

Here the convolution layer in Python:

def Convolution(filter_shape,        # e.g. (3,3)
                num_filters,         # e.g. 64
                activation,          # relu or None...etc.
                init,                # Random initialization
                pad,                 # True or False
                strides)             # strides e.g. (1,1)

Pooling layer

In most CNN vision architecture, each convolution layer is succeeded by a pooling layer, so they keep alternating until the fully connected layer.

The purpose of the pooling layer is as follow:

  • Reduce the dimensionality of the previous layer, which speed up the network.
  • Provide a limited translation invariant.

Here an example of max pooling with a stride of 2:


In [5]:
# Figure 4
Image(url="https://cntk.ai/jup/201/MaxPooling.png", width=400, height=400)


Out[5]:

In CNTK:

Here the pooling layer in Python:

# Max pooling
def MaxPooling(filter_shape,  # e.g. (3,3)
               strides,       # (2,2)
               pad)           # True or False

# Average pooling
def AveragePooling(filter_shape,  # e.g. (3,3)
                   strides,       # (2,2)
                   pad)           # True or False

Dropout layer

Dropout layer takes a probability value as an input, the value is called the dropout rate. Let us say the dropout rate is 0.5, what this layer does it pick at random 50% of the nodes from the previous layer and drop them out of the network. This behavior help regularize the network.

Dropout: A Simple Way to Prevent Neural Networks from Overfitting Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov

In CNTK:

Dropout layer in Python:

# Dropout
def Dropout(prob)    # dropout rate e.g. 0.5

Batch normalization (BN)

Batch normalization is a way to make the input to each layer has zero mean and unit variance. BN help the network converge faster and keep the input of each layer around zero. BN has two learnable parameters called gamma and beta, the purpose of those parameters is for the network to decide for itself if the normalized input is what is best or the raw input.

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift Sergey Ioffe, Christian Szegedy

In CNTK:

Batch normalization layer in Python:

# Batch normalization
def BatchNormalization(map_rank)  # For image map_rank=1

Microsoft Cognitive Network Toolkit (CNTK)

CNTK is a highly flexible computation graphs, each node take inputs as tensors and produce tensors as the result of the computation. Each node is exposed in Python API, which give you the flexibility of creating any custom graphs, you can also define your own node in Python or C++ using CPU, GPU or both.

For Deep learning, you can use the low level API directly or you can use CNTK layered API. We will start with the low level API, then switch to the layered API in this lab.

So let's first import the needed modules for this lab.


In [7]:
from __future__ import print_function # Use a function definition from future version (say 3.x from 2.7 interpreter)

import matplotlib.pyplot as plt
import math
import numpy as np
import os
import PIL
import sys
try: 
    from urllib.request import urlopen 
except ImportError: 
    from urllib import urlopen

import cntk as C

In the block below, we check if we are running this notebook in the CNTK internal test machines by looking for environment variables defined there. We then select the right target device (GPU vs CPU) to test this notebook. In other cases, we use CNTK's default policy to use the best available device (GPU, if available, else CPU).


In [8]:
if 'TEST_DEVICE' in os.environ:
    if os.environ['TEST_DEVICE'] == 'cpu':
        C.device.try_set_default_device(C.device.cpu())
    else:
        C.device.try_set_default_device(C.device.gpu(0))

In [9]:
# Figure 5
Image(url="https://cntk.ai/jup/201/CNN.png")


Out[9]:

Now that we imported the needed modules, let's implement our first CNN, as shown in Figure 5 above.

Let's implement the above network using CNTK layer API:


In [10]:
def create_basic_model(input, out_dims):
    with C.layers.default_options(init=C.glorot_uniform(), activation=C.relu):
        net = C.layers.Convolution((5,5), 32, pad=True)(input)
        net = C.layers.MaxPooling((3,3), strides=(2,2))(net)

        net = C.layers.Convolution((5,5), 32, pad=True)(net)
        net = C.layers.MaxPooling((3,3), strides=(2,2))(net)

        net = C.layers.Convolution((5,5), 64, pad=True)(net)
        net = C.layers.MaxPooling((3,3), strides=(2,2))(net)
    
        net = C.layers.Dense(64)(net)
        net = C.layers.Dense(out_dims, activation=None)(net)
    
    return net

To train the above model we need two things:

  • Read the training images and their corresponding labels.
  • Define a cost function, compute the cost for each mini-batch and update the model weights according to the cost value.

To read the data in CNTK, we will use CNTK readers which handle data augmentation and can fetch data in parallel.

Example of a map text file:

S:\data\CIFAR-10\train\00001.png    9
S:\data\CIFAR-10\train\00002.png    9
S:\data\CIFAR-10\train\00003.png    4
S:\data\CIFAR-10\train\00004.png    1
S:\data\CIFAR-10\train\00005.png    1

In [16]:
# Determine the data path for testing
# Check for an environment variable defined in CNTK's test infrastructure
#envvar = 'CNTK_EXTERNAL_TESTDATA_SOURCE_DIRECTORY'
#def is_test(): return envvar in os.environ

#if is_test():
#    data_path = os.path.join(os.environ[envvar],'Image','CIFAR','v0','tutorial201')
#    data_path = os.path.normpath(data_path)
#else:
#    data_path = os.path.join('data', 'CIFAR-10')

data_path = 'C:\\Users\\hojohnl\\Source\\Repos\\CNTK\\Examples\\Image\\DataSets\\CIFAR-10'
# model dimensions
image_height = 32
image_width  = 32
num_channels = 3
num_classes  = 10

import cntk.io.transforms as xforms 
#
# Define the reader for both training and evaluation action.
#
def create_reader(map_file, mean_file, train):
    print("Reading map file:", map_file)
    print("Reading mean file:", mean_file)
    
    if not os.path.exists(map_file) or not os.path.exists(mean_file):
        raise RuntimeError("This tutorials depends 201A tutorials, please run 201A first.")

    # transformation pipeline for the features has jitter/crop only when training
    transforms = []
    # train uses data augmentation (translation only)
    if train:
        transforms += [
            xforms.crop(crop_type='randomside', side_ratio=0.8) 
        ]
    transforms += [
        xforms.scale(width=image_width, height=image_height, channels=num_channels, interpolations='linear'),
        xforms.mean(mean_file)
    ]
    # deserializer
    return C.io.MinibatchSource(C.io.ImageDeserializer(map_file, C.io.StreamDefs(
        features = C.io.StreamDef(field='image', transforms=transforms), # first column in map file is referred to as 'image'
        labels   = C.io.StreamDef(field='label', shape=num_classes)      # and second as 'label'
    )))

In [17]:
# Create the train and test readers
print(data_path)
reader_train = create_reader(os.path.join(data_path, 'train_map.txt'), 
                             os.path.join(data_path, 'CIFAR-10_mean.xml'), True)
reader_test  = create_reader(os.path.join(data_path, 'test_map.txt'), 
                             os.path.join(data_path, 'CIFAR-10_mean.xml'), False)


C:\Users\hojohnl\Source\Repos\CNTK\Examples\Image\DataSets\CIFAR-10
Reading map file: C:\Users\hojohnl\Source\Repos\CNTK\Examples\Image\DataSets\CIFAR-10\train_map.txt
Reading mean file: C:\Users\hojohnl\Source\Repos\CNTK\Examples\Image\DataSets\CIFAR-10\CIFAR-10_mean.xml
Reading map file: C:\Users\hojohnl\Source\Repos\CNTK\Examples\Image\DataSets\CIFAR-10\test_map.txt
Reading mean file: C:\Users\hojohnl\Source\Repos\CNTK\Examples\Image\DataSets\CIFAR-10\CIFAR-10_mean.xml

Now let us write the the training and validation loop.


In [63]:
#
# Train and evaluate the network.
#
def train_and_evaluate(reader_train, reader_test, max_epochs, model_func):
    # Input variables denoting the features and label data
    input_var = C.input_variable((num_channels, image_height, image_width))
    label_var = C.input_variable((num_classes))

    # Normalize the input
    feature_scale = 1.0 / 256.0
    input_var_norm = C.element_times(feature_scale, input_var)
    
    # apply model to input
    z = model_func(input_var_norm, out_dims=10)

    #
    # Training action
    #

    # loss and metric
    ce = C.cross_entropy_with_softmax(z, label_var)
    pe = C.classification_error(z, label_var)

    # training config
    epoch_size     = 50000
    minibatch_size = 64

    # Set training parameters
    lr_per_minibatch       = C.learning_rate_schedule([0.01]*10 + [0.003]*10 + [0.001], 
                                                      C.UnitType.minibatch, epoch_size)
    momentum_time_constant = C.momentum_as_time_constant_schedule(-minibatch_size/np.log(0.9))
    l2_reg_weight          = 0.001
    
    # trainer object
    learner = C.momentum_sgd(z.parameters, 
                             lr = lr_per_minibatch, 
                             momentum = momentum_time_constant, 
                             l2_regularization_weight=l2_reg_weight)
    progress_printer = C.logging.ProgressPrinter(tag='Training', num_epochs=max_epochs)
    trainer = C.Trainer(z, (ce, pe), [learner], [progress_printer])

    # define mapping from reader streams to network inputs
    input_map = {
        input_var: reader_train.streams.features,
        label_var: reader_train.streams.labels
    }

    C.logging.log_number_of_parameters(z) ; print()

    # perform model training
    batch_index = 0
    plot_data = {'batchindex':[], 'loss':[], 'error':[]}
    for epoch in range(max_epochs):       # loop over epochs
        sample_count = 0
        while sample_count < epoch_size:  # loop over minibatches in the epoch
            data = reader_train.next_minibatch(min(minibatch_size, epoch_size - sample_count), 
                                               input_map=input_map) # fetch minibatch.
            trainer.train_minibatch(data)                                   # update model with it

            sample_count += data[label_var].num_samples                     # count samples processed so far
            
            # For visualization...            
            plot_data['batchindex'].append(batch_index)
            plot_data['loss'].append(trainer.previous_minibatch_loss_average)
            plot_data['error'].append(trainer.previous_minibatch_evaluation_average)
            
            batch_index += 1
        trainer.summarize_training_progress()
        
    #
    # Evaluation action
    #
    epoch_size     = 10000
    minibatch_size = 16

    # process minibatches and evaluate the model
    metric_numer    = 0
    metric_denom    = 0
    sample_count    = 0
    minibatch_index = 0

    while sample_count < epoch_size:
        current_minibatch = min(minibatch_size, epoch_size - sample_count)

        # Fetch next test min batch.
        data = reader_test.next_minibatch(current_minibatch, input_map=input_map)

        # minibatch data to be trained with
        metric_numer += trainer.test_minibatch(data) * current_minibatch
        metric_denom += current_minibatch

        # Keep track of the number of samples processed so far.
        sample_count += data[label_var].num_samples
        minibatch_index += 1

    print("")
    print("Final Results: Minibatch[1-{}]: errs = {:0.1f}% * {}".format(minibatch_index+1, (metric_numer*100.0)/metric_denom, metric_denom))
    print("")
    
    # Visualize training result:
    window_width            = 32
    loss_cumsum             = np.cumsum(np.insert(plot_data['loss'], 0, 0)) 
    error_cumsum            = np.cumsum(np.insert(plot_data['error'], 0, 0)) 

    # Moving average.
    plot_data['batchindex'] = np.insert(plot_data['batchindex'], 0, 0)[window_width:]
    plot_data['avg_loss']   = (loss_cumsum[window_width:] - loss_cumsum[:-window_width]) / window_width
    plot_data['avg_error']  = (error_cumsum[window_width:] - error_cumsum[:-window_width]) / window_width
    
    plt.figure(1)
    plt.subplot(211)
    plt.plot(plot_data["batchindex"], plot_data["avg_loss"], 'b--')
    plt.xlabel('Minibatch number')
    plt.ylabel('Loss')
    plt.title('Minibatch run vs. Training loss ')

    plt.show()

    plt.subplot(212)
    plt.plot(plot_data["batchindex"], plot_data["avg_error"], 'r--')
    plt.xlabel('Minibatch number')
    plt.ylabel('Label Prediction Error')
    plt.title('Minibatch run vs. Label Prediction Error ')
    plt.show()
    
    return C.softmax(z)

In [64]:
pred = train_and_evaluate(reader_train, 
                          reader_test, 
                          max_epochs=5, 
                          model_func=create_basic_model)


Training 116906 parameters in 10 parameter tensors.

Learning rate per minibatch: 0.01
Momentum per sample: 0.9983550962823424
Finished Epoch[1 of 5]: [Training] loss = 2.070221 * 50000, metric = 76.25% * 50000 13.182s (3793.1 samples/s);
Finished Epoch[2 of 5]: [Training] loss = 1.705455 * 50000, metric = 63.05% * 50000 5.572s (8973.4 samples/s);
Finished Epoch[3 of 5]: [Training] loss = 1.537387 * 50000, metric = 56.39% * 50000 5.705s (8764.2 samples/s);
Finished Epoch[4 of 5]: [Training] loss = 1.445027 * 50000, metric = 52.08% * 50000 5.590s (8944.5 samples/s);
Finished Epoch[5 of 5]: [Training] loss = 1.380155 * 50000, metric = 49.56% * 50000 5.621s (8895.2 samples/s);

Final Results: Minibatch[1-626]: errs = 44.3% * 10000

Although, this model is very simple, it still has too much code, we can do better. Here the same model in more terse format:


In [20]:
def create_basic_model_terse(input, out_dims):

    with C.layers.default_options(init=C.glorot_uniform(), activation=C.relu):
        model = C.layers.Sequential([
            C.layers.For(range(3), lambda i: [
                C.layers.Convolution((5,5), [32,32,64][i], pad=True),
                C.layers.MaxPooling((3,3), strides=(2,2))
                ]),
            C.layers.Dense(64),
            C.layers.Dense(out_dims, activation=None)
        ])

    return model(input)

In [65]:
pred_basic_model = train_and_evaluate(reader_train, 
                                      reader_test, 
                                      max_epochs=10, 
                                      model_func=create_basic_model_terse)


Training 116906 parameters in 10 parameter tensors.

Learning rate per minibatch: 0.01
Momentum per sample: 0.9983550962823424
Finished Epoch[1 of 10]: [Training] loss = 2.125503 * 50000, metric = 78.39% * 50000 6.056s (8256.3 samples/s);
Finished Epoch[2 of 10]: [Training] loss = 1.772582 * 50000, metric = 65.36% * 50000 5.661s (8832.4 samples/s);
Finished Epoch[3 of 10]: [Training] loss = 1.600455 * 50000, metric = 58.59% * 50000 5.568s (8979.9 samples/s);
Finished Epoch[4 of 10]: [Training] loss = 1.492868 * 50000, metric = 54.29% * 50000 5.586s (8950.9 samples/s);
Finished Epoch[5 of 10]: [Training] loss = 1.410663 * 50000, metric = 51.15% * 50000 5.489s (9109.1 samples/s);
Finished Epoch[6 of 10]: [Training] loss = 1.333853 * 50000, metric = 47.83% * 50000 5.529s (9043.2 samples/s);
Finished Epoch[7 of 10]: [Training] loss = 1.274483 * 50000, metric = 45.52% * 50000 5.545s (9017.1 samples/s);
Finished Epoch[8 of 10]: [Training] loss = 1.224182 * 50000, metric = 43.72% * 50000 5.661s (8832.4 samples/s);
Finished Epoch[9 of 10]: [Training] loss = 1.170496 * 50000, metric = 41.33% * 50000 5.612s (8909.5 samples/s);
Finished Epoch[10 of 10]: [Training] loss = 1.134483 * 50000, metric = 40.20% * 50000 5.494s (9100.8 samples/s);

Final Results: Minibatch[1-626]: errs = 35.6% * 10000

Now that we have a trained model, let us classify the following image of a truck. We use PIL to read the image.


In [66]:
# Figure 6
Image(url="https://cntk.ai/jup/201/00014.png", width=64, height=64)


Out[66]:

In [67]:
import PIL

In [68]:
# Download a sample image 
# (this is 00014.png from test dataset)
# Any image of size 32,32 can be evaluated

url = "https://cntk.ai/jup/201/00014.png"
myimg = np.array(PIL.Image.open(urlopen(url)), dtype=np.float32)

During training we have subtracted the mean from the input images. Here we take an approximate value of the mean and subtract it from the image.


In [69]:
def eval(pred_op, image_data):
    label_lookup = ["airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck"]
    image_mean = 133.0
    image_data -= image_mean
    image_data = np.ascontiguousarray(np.transpose(image_data, (2, 0, 1)))
    
    result = np.squeeze(pred_op.eval({pred_op.arguments[0]:[image_data]}))
    
    # Return top 3 results:
    top_count = 3
    result_indices = (-np.array(result)).argsort()[:top_count]

    print("Top 3 predictions:")
    for i in range(top_count):
        print("\tLabel: {:10s}, confidence: {:.2f}%".format(label_lookup[result_indices[i]], result[result_indices[i]] * 100))

In [70]:
# Run the evaluation on the downloaded image
eval(pred_basic_model, myimg)


Top 3 predictions:
	Label: truck     , confidence: 99.10%
	Label: ship      , confidence: 0.42%
	Label: cat       , confidence: 0.21%

Adding dropout layer, with drop rate of 0.25, before the last dense layer:


In [74]:
from cntk.logging.graph import get_node_outputs
node_outputs = get_node_outputs(pred_basic_model)

In [77]:
node_string_output = str(node_outputs)
print(node_string_output[0:1024])
len(node_outputs)


[Output('Softmax22515_Output_0', [#], [10]), Output('Block22351_Output_0', [#], [10]), Output('Block22336_Output_0', [#], [64]), Output('Block22320_Output_0', [#], [64 x 3 x 3]), Output('Block22308_Output_0', [#], [64 x 7 x 7]), Output('Block22292_Output_0', [#], [32 x 7 x 7]), Output('Block22280_Output_0', [#], [32 x 15 x 15]), Output('Block22264_Output_0', [#], [32 x 15 x 15]), Output('Block22252_Output_0', [#], [32 x 32 x 32]), Output('ElementTimes21540_Output_0', [#], [3 x 32 x 32])]
Out[77]:
10

In [31]:
def create_basic_model_with_dropout(input, out_dims):

    with C.layers.default_options(activation=C.relu, init=C.glorot_uniform()):
        model = C.layers.Sequential([
            C.layers.For(range(3), lambda i: [
                C.layers.Convolution((5,5), [32,32,64][i], pad=True),
                C.layers.MaxPooling((3,3), strides=(2,2))
            ]),
            C.layers.Dense(64),
            C.layers.Dropout(0.25),
            C.layers.Dense(out_dims, activation=None)
        ])

    return model(input)

In [32]:
pred_basic_model_dropout = train_and_evaluate(reader_train, 
                                              reader_test, 
                                              max_epochs=5, 
                                              model_func=create_basic_model_with_dropout)


Training 116906 parameters in 10 parameter tensors.

Learning rate per minibatch: 0.01
Momentum per sample: 0.9983550962823424
Finished Epoch[1 of 5]: [Training] loss = 2.107428 * 50000, metric = 78.99% * 50000 6.717s (7443.8 samples/s);
Finished Epoch[2 of 5]: [Training] loss = 1.797806 * 50000, metric = 67.32% * 50000 6.130s (8156.6 samples/s);
Finished Epoch[3 of 5]: [Training] loss = 1.656954 * 50000, metric = 61.50% * 50000 6.099s (8198.1 samples/s);
Finished Epoch[4 of 5]: [Training] loss = 1.566658 * 50000, metric = 57.80% * 50000 6.089s (8211.5 samples/s);
Finished Epoch[5 of 5]: [Training] loss = 1.501140 * 50000, metric = 55.05% * 50000 6.070s (8237.2 samples/s);

Final Results: Minibatch[1-626]: errs = 48.3% * 10000

Add batch normalization after each convolution and before the last dense layer:


In [33]:
def create_basic_model_with_batch_normalization(input, out_dims):

    with C.layers.default_options(activation=C.relu, init=C.glorot_uniform()):
        model = C.layers.Sequential([
            C.layers.For(range(3), lambda i: [
                C.layers.Convolution((5,5), [32,32,64][i], pad=True),
                C.layers.BatchNormalization(map_rank=1),
                C.layers.MaxPooling((3,3), strides=(2,2))
            ]),
            C.layers.Dense(64),
            C.layers.BatchNormalization(map_rank=1),
            C.layers.Dense(out_dims, activation=None)
        ])

    return model(input)

In [34]:
pred_basic_model_bn = train_and_evaluate(reader_train, 
                                         reader_test, 
                                         max_epochs=5, 
                                         model_func=create_basic_model_with_batch_normalization)


Training 117290 parameters in 18 parameter tensors.

Learning rate per minibatch: 0.01
Momentum per sample: 0.9983550962823424
Finished Epoch[1 of 5]: [Training] loss = 1.567152 * 50000, metric = 55.53% * 50000 7.973s (6271.2 samples/s);
Finished Epoch[2 of 5]: [Training] loss = 1.229288 * 50000, metric = 43.80% * 50000 7.322s (6828.7 samples/s);
Finished Epoch[3 of 5]: [Training] loss = 1.101452 * 50000, metric = 38.72% * 50000 7.455s (6706.9 samples/s);
Finished Epoch[4 of 5]: [Training] loss = 1.027490 * 50000, metric = 35.96% * 50000 7.350s (6802.7 samples/s);
Finished Epoch[5 of 5]: [Training] loss = 0.968720 * 50000, metric = 33.73% * 50000 7.500s (6666.7 samples/s);

Final Results: Minibatch[1-626]: errs = 31.1% * 10000


In [43]:
eval(pred_basic_model_bn, myimg)


Top 3 predictions:
	Label: automobile, confidence: 42.31%
	Label: truck     , confidence: 37.58%
	Label: cat       , confidence: 16.02%

Let's implement an inspired VGG style network, using layer API, here the architecture:

VGG9
conv3-64
conv3-64
max3
conv3-96
conv3-96
max3
conv3-128
conv3-128
max3
FC-1024
FC-1024
FC-10

In [35]:
def create_vgg9_model(input, out_dims):
    with C.layers.default_options(activation=C.relu, init=C.glorot_uniform()):
        model = C.layers.Sequential([
            C.layers.For(range(3), lambda i: [
                C.layers.Convolution((3,3), [64,96,128][i], pad=True),
                C.layers.Convolution((3,3), [64,96,128][i], pad=True),
                C.layers.MaxPooling((3,3), strides=(2,2))
            ]),
            C.layers.For(range(2), lambda : [
                C.layers.Dense(1024)
            ]),
            C.layers.Dense(out_dims, activation=None)
        ])
        
    return model(input)

In [36]:
pred_vgg = train_and_evaluate(reader_train, 
                              reader_test, 
                              max_epochs=5, 
                              model_func=create_vgg9_model)


Training 2675978 parameters in 18 parameter tensors.

Learning rate per minibatch: 0.01
Momentum per sample: 0.9983550962823424
Finished Epoch[1 of 5]: [Training] loss = 2.268885 * 50000, metric = 84.66% * 50000 11.938s (4188.3 samples/s);
Finished Epoch[2 of 5]: [Training] loss = 1.878581 * 50000, metric = 70.13% * 50000 10.311s (4849.2 samples/s);
Finished Epoch[3 of 5]: [Training] loss = 1.693114 * 50000, metric = 63.13% * 50000 10.346s (4832.8 samples/s);
Finished Epoch[4 of 5]: [Training] loss = 1.561276 * 50000, metric = 57.63% * 50000 10.404s (4805.8 samples/s);
Finished Epoch[5 of 5]: [Training] loss = 1.469125 * 50000, metric = 53.66% * 50000 10.373s (4820.2 samples/s);

Final Results: Minibatch[1-626]: errs = 49.6% * 10000


In [44]:
eval(pred_vgg, myimg)


Top 3 predictions:
	Label: truck     , confidence: 48.35%
	Label: automobile, confidence: 44.02%
	Label: cat       , confidence: 3.37%

Residual Network (ResNet)

One of the main problem of a Deep Neural Network is how to propagate the error all the way to the first layer. For a deep network, the gradients keep getting smaller until it has no effect on the network weights. ResNet was designed to overcome such problem, by defining a block with identity path, as shown below:


In [39]:
# Figure 7
Image(url="https://cntk.ai/jup/201/ResNetBlock2.png")


Out[39]:

The idea of the above block is 2 folds:

  • During back propagation the gradients have a path that does not affect its magnitude.
  • The network need to learn residual mapping (delta to x).

So let's implements ResNet blocks using CNTK:

        ResNetNode                   ResNetNodeInc
            |                              |
     +------+------+             +---------+----------+
     |             |             |                    |
     V             |             V                    V
+----------+       |      +--------------+   +----------------+
| Conv, BN |       |      | Conv x 2, BN |   | SubSample, BN  |
+----------+       |      +--------------+   +----------------+
     |             |             |                    |
     V             |             V                    |
 +-------+         |         +-------+                |
 | ReLU  |         |         | ReLU  |                |
 +-------+         |         +-------+                |
     |             |             |                    |
     V             |             V                    |
+----------+       |        +----------+              |
| Conv, BN |       |        | Conv, BN |              |
+----------+       |        +----------+              |
     |             |             |                    |
     |    +---+    |             |       +---+        |
     +--->| + |<---+             +------>+ + +<-------+
          +---+                          +---+
            |                              |
            V                              V
        +-------+                      +-------+
        | ReLU  |                      | ReLU  |
        +-------+                      +-------+
            |                              |
            V                              V

In [40]:
def convolution_bn(input, filter_size, num_filters, strides=(1,1), init=C.he_normal(), activation=C.relu):
    if activation is None:
        activation = lambda x: x
        
    r = C.layers.Convolution(filter_size, 
                             num_filters, 
                             strides=strides, 
                             init=init, 
                             activation=None, 
                             pad=True, bias=False)(input)
    r = C.layers.BatchNormalization(map_rank=1)(r)
    r = activation(r)
    
    return r

def resnet_basic(input, num_filters):
    c1 = convolution_bn(input, (3,3), num_filters)
    c2 = convolution_bn(c1, (3,3), num_filters, activation=None)
    p  = c2 + input
    return C.relu(p)

def resnet_basic_inc(input, num_filters):
    c1 = convolution_bn(input, (3,3), num_filters, strides=(2,2))
    c2 = convolution_bn(c1, (3,3), num_filters, activation=None)

    s = convolution_bn(input, (1,1), num_filters, strides=(2,2), activation=None)
    
    p = c2 + s
    return C.relu(p)

def resnet_basic_stack(input, num_filters, num_stack):
    assert (num_stack > 0)
    
    r = input
    for _ in range(num_stack):
        r = resnet_basic(r, num_filters)
    return r

Let's write the full model:


In [41]:
def create_resnet_model(input, out_dims):
    conv = convolution_bn(input, (3,3), 16)
    r1_1 = resnet_basic_stack(conv, 16, 3)

    r2_1 = resnet_basic_inc(r1_1, 32)
    r2_2 = resnet_basic_stack(r2_1, 32, 2)

    r3_1 = resnet_basic_inc(r2_2, 64)
    r3_2 = resnet_basic_stack(r3_1, 64, 2)

    # Global average pooling
    pool = C.layers.AveragePooling(filter_shape=(8,8), strides=(1,1))(r3_2)    
    net = C.layers.Dense(out_dims, init=C.he_normal(), activation=None)(pool)
    
    return net

In [46]:
pred_resnet = train_and_evaluate(reader_train, reader_test, max_epochs=10, model_func=create_resnet_model)


Training 272474 parameters in 65 parameter tensors.

Learning rate per minibatch: 0.01
Momentum per sample: 0.9983550962823424
Finished Epoch[1 of 10]: [Training] loss = 1.890762 * 50000, metric = 70.05% * 50000 22.479s (2224.3 samples/s);
Finished Epoch[2 of 10]: [Training] loss = 1.582508 * 50000, metric = 58.54% * 50000 21.328s (2344.3 samples/s);
Finished Epoch[3 of 10]: [Training] loss = 1.462718 * 50000, metric = 53.66% * 50000 21.617s (2313.0 samples/s);
Finished Epoch[4 of 10]: [Training] loss = 1.379337 * 50000, metric = 49.96% * 50000 21.079s (2372.0 samples/s);
Finished Epoch[5 of 10]: [Training] loss = 1.309584 * 50000, metric = 47.57% * 50000 21.065s (2373.6 samples/s);
Finished Epoch[6 of 10]: [Training] loss = 1.247120 * 50000, metric = 45.25% * 50000 21.203s (2358.2 samples/s);
Finished Epoch[7 of 10]: [Training] loss = 1.195248 * 50000, metric = 43.28% * 50000 21.422s (2334.0 samples/s);
Finished Epoch[8 of 10]: [Training] loss = 1.144757 * 50000, metric = 41.40% * 50000 21.546s (2320.6 samples/s);
Finished Epoch[9 of 10]: [Training] loss = 1.103275 * 50000, metric = 39.54% * 50000 22.225s (2249.7 samples/s);
Finished Epoch[10 of 10]: [Training] loss = 1.066128 * 50000, metric = 38.16% * 50000 22.141s (2258.3 samples/s);

Final Results: Minibatch[1-626]: errs = 36.8% * 10000


In [45]:
eval(pred_resnet, myimg)


Top 3 predictions:
	Label: automobile, confidence: 87.49%
	Label: bird      , confidence: 6.12%
	Label: frog      , confidence: 2.61%

In [47]:
eval(pred_resnet, myimg)


Top 3 predictions:
	Label: automobile, confidence: 99.60%
	Label: bird      , confidence: 0.19%
	Label: airplane  , confidence: 0.18%

In [48]:
save('cifar10-resnet.model')


'' was not found in history, as a file, url, nor in the user namespace.

In [49]:
pred_resnet.save('cifar10-resnet.model')

In [59]:
m = C.load('cifar10-resnet.model')


---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-59-46b4ded984af> in <module>()
----> 1 m = C.load('cifar10-resnet.model')

AttributeError: module 'cntk' has no attribute 'load'

In [53]:
eval(myimg)


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-53-ce4082a9b538> in <module>()
----> 1 eval(myimg)

TypeError: eval() missing 1 required positional argument: 'image_data'

In [54]:
eval(image_data=myimg)


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-54-eb55e5123f4d> in <module>()
----> 1 eval(image_data=myimg)

TypeError: eval() missing 1 required positional argument: 'pred_op'

In [55]:
r = block_root


---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-55-f47d04f5d730> in <module>()
----> 1 r = block_root

NameError: name 'block_root' is not defined

In [60]:
from __future__ import print_function
import numpy as np
import cntk as C
from cntk.learners import sgd, learning_rate_schedule, UnitType
from cntk.logging import ProgressPrinter
from cntk.layers import Dense, Sequential

def generate_random_data(sample_size, feature_dim, num_classes):
     # Create synthetic data using NumPy.
     Y = np.random.randint(size=(sample_size, 1), low=0, high=num_classes)

     # Make sure that the data is separable
     X = (np.random.randn(sample_size, feature_dim) + 3) * (Y + 1)
     X = X.astype(np.float32)
     # converting class 0 into the vector "1 0 0",
     # class 1 into vector "0 1 0", ...
     class_ind = [Y == class_number for class_number in range(num_classes)]
     Y = np.asarray(np.hstack(class_ind), dtype=np.float32)
     return X, Y

def ffnet():
    inputs = 2
    outputs = 2
    layers = 2
    hidden_dimension = 50

    # input variables denoting the features and label data
    features = C.input_variable((inputs), np.float32)
    label = C.input_variable((outputs), np.float32)

    # Instantiate the feedforward classification model
    my_model = Sequential ([
                    Dense(hidden_dimension, activation=C.sigmoid),
                    Dense(outputs)])
    z = my_model(features)

    ce = C.cross_entropy_with_softmax(z, label)
    pe = C.classification_error(z, label)

    # Instantiate the trainer object to drive the model training
    lr_per_minibatch = learning_rate_schedule(0.125, UnitType.minibatch)
    progress_printer = ProgressPrinter(0)
    trainer = C.Trainer(z, (ce, pe), [sgd(z.parameters, lr=lr_per_minibatch)], [progress_printer])

    # Get minibatches of training data and perform model training
    minibatch_size = 25
    num_minibatches_to_train = 1024

    aggregate_loss = 0.0
    for i in range(num_minibatches_to_train):
        train_features, labels = generate_random_data(minibatch_size, inputs, outputs)
        # Specify the mapping of input variables in the model to actual minibatch data to be trained with
        trainer.train_minibatch({features : train_features, label : labels})
        sample_count = trainer.previous_minibatch_sample_count
        aggregate_loss += trainer.previous_minibatch_loss_average * sample_count

    last_avg_error = aggregate_loss / trainer.total_number_of_samples_seen

    test_features, test_labels = generate_random_data(minibatch_size, inputs, outputs)
    avg_error = trainer.test_minibatch({features : test_features, label : test_labels})
    print(' error rate on an unseen minibatch: {}'.format(avg_error))
    return last_avg_error, avg_error

np.random.seed(98052)
ffnet()


 average      since    average      since      examples
    loss       last     metric       last              
 ------------------------------------------------------
Learning rate per minibatch: 0.125
    0.773      0.773       0.56       0.56            25
    0.714      0.685       0.52        0.5            75
    0.698      0.686      0.463       0.42           175
    0.626      0.564      0.389      0.325           375
    0.569      0.515      0.289      0.195           775
      0.5      0.433      0.209      0.131          1575
     0.42      0.341      0.157      0.106          3175
    0.347      0.275      0.125     0.0925          6375
     0.29      0.232      0.102     0.0788         12775
    0.253      0.217     0.0907     0.0798         25575
 error rate on an unseen minibatch: 0.0
Out[60]:
(0.2533962619700469, 0.0)

In [ ]: