Neural network models

This notebook picks up where the simple_models notebook left off. Having tried a range of classical classification algorithms there, we'll now try out some of the neural network models from [1]: fully-connected models of varying layer sizes, and then convolutional models, including the famous LeNet-5.

Along the way, we'll be using Keras, a library that sits on top of Theano or TensorFlow and makes it easy to construct, train, and evaluate neural nets. Before we get started, here's a recap of the simple_models notebook results.

[1] LeCun et al., "Gradient-Based Learning Applied to Document Recognition", Proceedings of the IEEE, Nov. 1998


In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import pickle

plt.style.use('fivethirtyeight')
# plt.rcParams['font.family'] = 'serif'
plt.rcParams['font.serif'] = 'Helvetica'
plt.rcParams['font.monospace'] = 'Consolas'
plt.rcParams['font.size'] = 16
plt.rcParams['axes.labelsize'] = 16
plt.rcParams['axes.labelweight'] = 'bold'
plt.rcParams['xtick.labelsize'] = 14
plt.rcParams['ytick.labelsize'] = 14
plt.rcParams['legend.fontsize'] = 16
plt.rcParams['axes.titlesize'] = 20
plt.rcParams['lines.linewidth'] = 2

%matplotlib inline

# for auto-reloading external modules
%load_ext autoreload
%autoreload 2

Load pickle files

The original data files are processed using the convert_data.py script, and written out to pickle files. We can load these in as numpy arrays.


In [4]:
# Set up the file directory and names
DIR = '../input/'
X_TRAIN = DIR + 'train-images-idx3-ubyte.pkl'
Y_TRAIN = DIR + 'train-labels-idx1-ubyte.pkl'
X_TEST = DIR + 't10k-images-idx3-ubyte.pkl'
Y_TEST = DIR + 't10k-labels-idx1-ubyte.pkl'

def load_data():
    '''Loads pickled ubyte files with MNIST data
    INPUT: X_train_file, y_train_file - strings with training filenames
           X_test_file, y_test_File - strings with test filenames
    RETURNS: Tuple with (X_train, y_train, X_test, y_test)
    '''
    print('Loading pickle files')
    try:
        with open(X_TRAIN, 'rb') as f:
            X_train = pickle.load(f)
        with open(Y_TRAIN, 'rb') as f:
            y_train = pickle.load(f)
        with open(X_TEST, 'rb') as f:
            X_test = pickle.load(f)
        with open(Y_TEST, 'rb') as f:
            y_test = pickle.load(f)
    except (OSError, pickle.UnpicklingError) as err:
        print('Error loading pickle file: {}'.format(err))
        return None
    
    return (X_train, y_train, X_test, y_test)

X_train, y_train, X_test,  y_test = load_data()


Loading pickle files

Helper functions

Before evaluating some models on the images, let's create some helper functions we can re-use later on. These deal with converting images to and from 1d and 2d versions, plotting images, resizing them, etc.


In [5]:
def flatten_images(X):
    ''' Converts images to 1-d vectors
    INPUT: X - Input array of shape [n, w, h]
    RETURNS: Numpy array of shape [n, w*h]
    '''
    n, w, h = X.shape
    X_flat = X.reshape((n, w * h))
    return X_flat

def square_images(X, w=None, h=None):
    '''Converts single-vector images into square images 
    INPUT: X - numpy array of images in single-vector form
           w - width of images to convert to
           h - height of images to convert to
    RETURNS: Numpy array of shape [n, w, h]
    '''
    
    assert X.shape[1] == w * h, "Error - Can't square array of shape {} to {}".format(X.shape, (w, h))
    n = X.shape[0]
    X_square = X.reshape((n, w, h))
    return X_square


N_TRAIN, W, H = X_train.shape
N_TEST, w_test, h_test = X_test.shape

# Flatten the images
X_train = flatten_images(X_train)
X_test = flatten_images(X_test)

# Do some checks on the data
assert N_TRAIN == 60000, 'Error - expected 60000 training images, got {}'.format(N_TRAIN)
assert N_TEST == 10000, 'Error - expected 10000 test images, got {}'.format(N_TEST)
assert W == w_test, 'Error - width mismatch. Train {}, Test {}'.format(W, w_test)
assert H == h_test, 'Error - height mismatch. Train {}, Test {}'.format(H, h_test)

assert np.array_equal(X_train, flatten_images(square_images(X_train, W, H)))
assert X_train.shape[0] == y_train.shape[0]
assert X_test.shape[0] == y_test.shape[0]

print('Loaded train images shape {}, labels shape {}'.format(X_train.shape, y_train.shape))
print('Loaded test images shape {}, labels shape {}'.format(X_test.shape, y_test.shape))


Loaded train images shape (60000, 784), labels shape (60000, 1)
Loaded test images shape (10000, 784), labels shape (10000, 1)

Data preparation

This section sets up global constants used in all models (to ensure a fair comparison). It also prepares the data by converting y values to one-hot, and normalizing X inputs.


In [6]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer, StandardScaler

# Keras Common configuration
SEED = 1234 # Fix the seed for repeatability
N_JOBS=-2 # Leave 1 core free for UI updates
VERBOSE=2 # 3 is the most verbose level
EPOCHS = 20 # todo ! Check how many epochs in the paper
BATCH = 256 # todo ! Check this in the paper too

In [7]:
# Useful helper functions
def stratified_subsample(X, y, num_rows, verbose=False):
    '''Creates a stratified subsample of X and y
    INPUT: X and y, numpy arrays
    RETURNS: subset of X and y, maintaining class balances
    '''
    # Create a stratified, shuffled subset of the training data if needed
    N = X.shape[0]
    
    new_X, new_y = X, y
    if num_rows < N:
        if verbose:
            print('Reducing size from {} to {} examples'.format(N, num_rows))
        # Undersample by dropping the "test" portion of the split.
        # Stratify on the class labels (works for plain or one-hot y) to keep the class balance.
        labels = y.argmax(axis=1) if (y.ndim > 1 and y.shape[1] > 1) else y.ravel()
        new_X, _, new_y, _ = train_test_split(X, y, train_size=num_rows,
                                              stratify=labels, random_state=SEED)
    return new_X, new_y
        
def onehot_encode_y(y_train, y_test):
    '''Convert y_train and y_test to a one-hot encoding version
    INPUT: y_train - np.array of size (n_train,)
           y_test - np.array of size (n_test,)
    RETURNS: y_train - np.array of size (n_train, n_classes)
             y_test - np.array of size (n_test, n_classes)
    '''    
    print('Converting y variables to one-hot encoding..')
    lbe = LabelBinarizer()
    lbe.fit(y_train)
    y_train = lbe.transform(y_train)
    y_test = lbe.transform(y_test)
    return y_train, y_test

def z_norm_X(X_train, X_test):
    '''Z-normalizes X_train and X_test with 0 mean and 1 std. dev.
    INPUT: X_train - training set
           X_test - test set
    RETURNS: X_train - normalized version of same size
             X_test - normalized version (using X_train parameters)
    '''
    print('Z-normalizing X data..')    
    std = StandardScaler()
    X_train = X_train.astype(np.float32)
    X_test = X_test.astype(np.float32)
    std.fit(X_train)
    X_train = std.transform(X_train)
    X_test = std.transform(X_test)
    return X_train, X_test

In [8]:
y_train, y_test = onehot_encode_y(y_train, y_test)
X_train, X_test = z_norm_X(X_train, X_test)
scores = dict()

print('Train images shape {}, labels shape {}'.format(X_train.shape, y_train.shape))
print('Test images shape {}, labels shape {}'.format(X_test.shape, y_test.shape))


Converting y variables to one-hot encoding..
Z-normalizing X data..
Train images shape (60000, 784), labels shape (60000, 10)
Test images shape (10000, 784), labels shape (10000, 10)

[1] C.5 - Baseline fully-connected models (original dataset)

We'll first compare the performance of fully-connected networks with varying numbers of layers and layer sizes. These are all trained on the original 28x28 dataset.

Helper class


In [9]:
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.optimizers import SGD

# Create a dictionary to store model training and test info
models = dict()

class KerasFCModel(object):
    
    def __init__(self, model_name, model_type, input_dim, layers, 
                 activation, output_activation, verbose=2):
        '''Initializes a new keras model'''
        self.model_name = model_name
        self.verbose = verbose
        
        model = model_type
        for idx, size in enumerate(layers):
            
            # First layer has to take input from image files
            if idx == 0:
                if self.verbose == 2:
                    print('Adding input dense layer, input dim {}, dim {}'.format(input_dim, size))
                model.add(Dense(size, input_dim=input_dim))
                model.add(Activation(activation))
                
            # Last layer has to include the output activation
            elif idx == len(layers) - 1:
                if self.verbose == 2:
                    print('Adding output layer {}, size {}, activation {}'.format(idx, size, output_activation))
                model.add(Dense(size))
                model.add(Activation(output_activation))
                
            # Layers other than first and last have standard activation
            else: 
                if self.verbose == 2:
                    print('Adding dense layer {}, size {}, activation {}'.format(idx, size, activation))
                model.add(Dense(size))
                model.add(Activation(activation))
                
        if self.verbose > 0:
            print('Model summary:\n')
            model.summary()
        
        self.model = model
        
    def compile_model(self, loss, optimizer, metrics):
        '''Compile the model'''
        self.metrics = metrics
        self.loss = loss
        self.optimizer = optimizer
        # Keras reports accuracy ('acc'); we track error, so map 'error' to 'acc' here
        metrics = ['acc' if metric == 'error' else metric for metric in metrics]
        self.model.compile(loss=loss, optimizer=optimizer, metrics=metrics)
          
    def fit(self, X, y, epochs, batch_size):
        '''Fit model to training data'''
        self.history = self.model.fit(X, y, epochs=epochs, batch_size=batch_size, verbose=self.verbose)

    def evaluate(self, X, y, batch_size):
        '''Evaluates the model on test data'''
        output = self.model.evaluate(X, y, batch_size=batch_size)
        results = dict()
        for idx, metric in enumerate(self.model.metrics_names):
            if metric == 'acc':
                results['error'] = 1.0 - output[idx]
            else:
                results[metric] = output[idx]                
        self.results = results
    
    def report(self):
        '''Prints a recap of the model, how it was trained, and performance'''
        report = dict()
        if self.verbose > 0:
            report['model_info'] = self.model.summary()
            report['loss'] = self.loss
            report['optimizer'] = self.optimizer.get_config()
            report['metrics'] = self.metrics
            report['history'] = self.history
        report['results'] = self.results
        return report


Using Theano backend.
WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10).  Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
 https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

Using gpu device 0: GeForce GTX 1070 (CNMeM is disabled, cuDNN 5105)

In [10]:
# Helper function to evaluate fully-connected models
def evaluate_fc_model(name, layers, activation, optimizer,
                            X_tr, y_tr, X_te, y_te,
                            epochs, batch_size,
                            verbose=2):
    """Creates, trains, and evaluates neural network on provided data"""
    
    print('Creating Keras model {}'.format(name))
    model = KerasFCModel(model_name=name, model_type=Sequential(), 
                           input_dim=784, layers=layers, 
                           activation=activation, output_activation='softmax',
                           verbose=verbose)

    print('Compiling model')
    model.compile_model(loss='categorical_crossentropy',
                        optimizer=optimizer,
                        metrics=['error'])

    print('Training model')
    model.fit(X_tr, y_tr, epochs=epochs, batch_size=batch_size)

    print('Evaluating model')
    model.evaluate(X_te, y_te, batch_size=batch_size)

    print('\nTest results: {:.2f}% error'.format(100.0 * model.report()['results']['error']))
    return model

Fully connected single-hidden layer networks


In [11]:
%%time

fc_results = dict()
fc_results['fc-300-10'] = evaluate_fc_model('fc-300-10', layers=(300,10), activation='tanh', 
                                            optimizer=SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True),
                                            X_tr=X_train, y_tr=y_train, X_te=X_test, y_te=y_test,
                                            epochs=EPOCHS, batch_size=BATCH,
                                            verbose=0)


Creating Keras model fc-300-10
Compiling model
Training model
Evaluating model
  256/10000 [..............................] - ETA: 0s
Test results: 2.92% error
CPU times: user 11.1 s, sys: 2.52 s, total: 13.6 s
Wall time: 13.8 s

In [12]:
%%time

# FC 1000-10
fc_results['fc-1000-10'] = evaluate_fc_model('fc-1000-10', layers=(1000,10), activation='tanh', 
                                            optimizer=SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True),
                                            X_tr=X_train, y_tr=y_train, X_te=X_test, y_te=y_test,
                                            epochs=EPOCHS, batch_size=BATCH,
                                            verbose=0)


Creating Keras model fc-1000-10
Compiling model
Training model
Evaluating model
 9984/10000 [============================>.] - ETA: 0s
Test results: 2.58% error
CPU times: user 12.7 s, sys: 2.62 s, total: 15.3 s
Wall time: 15.3 s

Two hidden layer fully connected networks


In [13]:
%%time

# FC 300-100-10
fc_results['fc-300-100-10'] = evaluate_fc_model('fc-300-100-10', layers=(300,100,10), activation='tanh', 
                                            optimizer=SGD(lr=0.1, decay=1e-5, momentum=0.9, nesterov=True),
                                            X_tr=X_train, y_tr=y_train, X_te=X_test, y_te=y_test,
                                            epochs=EPOCHS, batch_size=BATCH,
                                            verbose=0)


Creating Keras model fc-300-100-10
Compiling model
Training model
Evaluating model
 7680/10000 [======================>.......] - ETA: 0s
Test results: 2.93% error
CPU times: user 12.3 s, sys: 4.46 s, total: 16.8 s
Wall time: 16.8 s

In [14]:
%%time

# FC 500-150-10
fc_results['fc-500-150-10'] = evaluate_fc_model('fc-500-150-10', layers=(500,150,10), activation='tanh', 
                                            optimizer=SGD(lr=0.1, decay=1e-5, momentum=0.9, nesterov=True),
                                            X_tr=X_train, y_tr=y_train, X_te=X_test, y_te=y_test,
                                            epochs=EPOCHS, batch_size=BATCH,
                                            verbose=0)


Creating Keras model fc-500-150-10
Compiling model
Training model
Evaluating model
 7680/10000 [======================>.......] - ETA: 0s
Test results: 2.66% error
CPU times: user 13.1 s, sys: 4.64 s, total: 17.7 s
Wall time: 17.7 s

In [15]:
# Compile the FC results so far into a dataframe for easy plotting

fc_scores = {result: value.results['error'] for result, value in fc_results.items()}
fc_scores_df = pd.DataFrame.from_dict(fc_scores, orient='index')

fc_scores_df.columns = ['error']
fc_scores_df['error'] *= 100.0
fc_scores_df = fc_scores_df.sort_values('error', ascending=True)
fc_scores_df.to_pickle('fc_scores.pkl')
fc_scores_df


Out[15]:
error
fc-1000-10 2.58
fc-500-150-10 2.66
fc-300-10 2.92
fc-300-100-10 2.93

In [16]:
fig, ax = plt.subplots(1,1, figsize=(6,6))
fc_scores_df.plot.barh(width=0.4, ax=ax, legend=None)
ax.set(title="Fully-connected neural net test-set accuracy", ylabel="Network", xlabel="%age error");
plt.savefig('fc_scores.png', bbox_inches='tight', dpi=150)


Convolutional neural networks

Now let's see how much further we can improve performance with convolutional neural networks. Convolutional networks take 2-d images as input rather than the flattened 1-d vectors the fully-connected networks used, and we also need to pad each image out to 32x32, as in [1].


In [17]:
# Load image pickle files
X_train, y_train, X_test,  y_test = load_data()
X_train.shape


Loading pickle files
Out[17]:
(60000, 28, 28)

In [18]:
# Plot a few random digits to sanity-check their size and that they look correct
N = 3
indexes = np.random.choice(X_train.shape[0], N)
fig, ax = plt.subplots(1, N)

for num, idx in enumerate(indexes):
    ax[num].imshow(X_train[idx])
    ax[num].set(title="Label={}\n{}x{}".format(y_train[idx], *X_train[idx].shape))


Padding images to 32x32 while centering image

In [1], the images are padded to 32x32 so that distinctive features of each digit can appear in the center of the receptive fields of the highest-level feature detectors.


In [19]:
def image_border(image, size, fill):
    """
    Adds a border of the given size and fill value around the numpy image array
    """
    im_h, im_w = image.shape  # numpy images are (rows, cols) = (height, width)
    im_dtype = image.dtype
    
    new_image = np.full((im_h + (2 * size), im_w + (2 * size)),
                        fill_value=fill, dtype=im_dtype)
    new_image[size:im_h + size, size:im_w + size] = image
    
    assert new_image.dtype == image.dtype
    assert new_image.shape[0] == image.shape[0] + (2 * size)
    assert new_image.shape[1] == image.shape[1] + (2 * size)
    assert np.array_equal(image, new_image[size:size+im_h, size:size+im_w])
    return new_image
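
As an aside, numpy's np.pad can do the same padding in one line. A minimal sketch (not used in this notebook, with `image` standing in for any single 28x28 digit array) that should agree with image_border for a constant zero fill:

    padded = np.pad(image, pad_width=2, mode='constant', constant_values=0)
    assert np.array_equal(padded, image_border(image, 2, 0))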

In [20]:
N = 3
indexes = np.random.choice(X_train.shape[0], N)
fig, ax = plt.subplots(2, N, figsize=(10,6))

for num, idx in enumerate(indexes):
    ax[0, num].imshow(X_train[idx])
    ax[0, num].set(title="Label={}\n{}x{}".format(y_train[idx], *X_train[idx].shape))
    
    X_resize = image_border(X_train[idx], 2, 0)
    ax[1, num].imshow(X_resize)
    ax[1, num].set(title="Label={}\n{}x{}".format(y_train[idx], *X_resize.shape))

plt.tight_layout()



In [21]:
from tqdm import tqdm

# resize all the training and test images
n_train = X_train.shape[0]
n_test = X_test.shape[0]


def resize_images(images, description):
    """
    Iterates over the first (image index) axis and pads each image out to 32x32
    """
    new_images = np.zeros((images.shape[0], 32, 32))

    for index in tqdm(range(images.shape[0]), desc=description):
        new_images[index] = image_border(images[index], 2, 0)
        
    return new_images

X_resize_train = resize_images(X_train, "Resizing train images")
X_resize_test = resize_images(X_test, "Resizing test images")
X_train, X_test = X_resize_train, X_resize_test

print('New X_train shape: {}, new x_test shape: {}'.format(X_train.shape, X_test.shape))
print('y_train shape: {}, y_test shape: {}'.format(y_train.shape, y_test.shape))


Resizing train images: 100%|██████████| 60000/60000 [00:02<00:00, 29615.00it/s]
Resizing test images: 100%|██████████| 10000/10000 [00:00<00:00, 31094.83it/s]
New X_train shape: (60000, 32, 32), new x_test shape: (10000, 32, 32)
y_train shape: (60000, 1), y_test shape: (10000, 1)

Z-Normalizing images, and converting labels to one-hot


In [22]:
from keras import backend as K

# Input images need to be Z-normalized, and need to be flattened to 1-d vector and re-squared afterwards
X_train, X_test = z_norm_X(flatten_images(X_train), flatten_images(X_test))
X_train, X_test = square_images(X_train, 32, 32), square_images(X_test, 32, 32)

# y values need to be converted to one-hot
y_train, y_test = onehot_encode_y(y_train, y_test)

# Need to add explicit shape of 1 as we have 1 channel for B&W images
X_train, X_test = X_train[:,:,:, np.newaxis], X_test[:,:,:, np.newaxis] # Need explicit single channel 

print('New X_train shape: {}, new x_test shape: {}'.format(X_train.shape, X_test.shape))
print('y_train shape: {}, y_test shape: {}'.format(y_train.shape, y_test.shape))


Z-normalizing X data..
Converting y variables to one-hot encoding..
New X_train shape: (60000, 32, 32, 1), new x_test shape: (10000, 32, 32, 1)
y_train shape: (60000, 10), y_test shape: (10000, 10)

In [23]:
# Our channel axis is last in the numpy array: (32, 32, 1).
# Make sure the current backend uses 'channels_last' ordering, and doesn't expect (1, 32, 32).
assert K.image_data_format() == 'channels_last'
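
If this assertion ever fails because the backend is configured for 'channels_first', one workaround (a sketch, assuming the standard Keras backend API; not needed for this run) is to switch the setting in code, or alternatively to move the channel axis of the arrays:

    from keras import backend as K

    K.set_image_data_format('channels_last')   # tell Keras to expect (height, width, channels)
    # or, instead, reshape the data to suit a 'channels_first' backend:
    # X_train = np.moveaxis(X_train, -1, 1)    # (n, 32, 32, 1) -> (n, 1, 32, 32)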

LeNet-5

This is the best performing network, found on page 7 of [1].


In [24]:
from keras.models import Sequential
from keras.layers import Conv2D, Dense, Activation, AveragePooling2D, Flatten

def lenet5_model(verbose=False):
    """
    Creates and returns a lenet5 model
    """

    # Create the model
    model = Sequential()

    model.add(Conv2D(filters=6, kernel_size=(5, 5), strides=(1, 1), input_shape=(32, 32, 1))) # C1
    model.add(AveragePooling2D(pool_size=(2, 2))) # S2
    model.add(Activation('tanh'))

    model.add(Conv2D(filters=16, kernel_size=(5, 5), strides=(1, 1))) # C3
    model.add(AveragePooling2D(pool_size=(2, 2))) # S4
    model.add(Activation('tanh'))

    model.add(Conv2D(filters=120, kernel_size=(5, 5), strides=(1, 1))) # C5
    model.add(Activation('tanh'))

    model.add(Flatten())
    model.add(Dense(120)) # F6 (note: [1] uses 84 units for this layer)
    model.add(Activation('tanh'))

    model.add(Dense(10))
    model.add(Activation('softmax'))

    if verbose:
        print(model.summary())
    return model
    
lenet5 = lenet5_model(verbose=True)


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 28, 28, 6)         156       
_________________________________________________________________
average_pooling2d_1 (Average (None, 14, 14, 6)         0         
_________________________________________________________________
activation_11 (Activation)   (None, 14, 14, 6)         0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 10, 10, 16)        2416      
_________________________________________________________________
average_pooling2d_2 (Average (None, 5, 5, 16)          0         
_________________________________________________________________
activation_12 (Activation)   (None, 5, 5, 16)          0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 1, 1, 120)         48120     
_________________________________________________________________
activation_13 (Activation)   (None, 1, 1, 120)         0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 120)               0         
_________________________________________________________________
dense_11 (Dense)             (None, 120)               14520     
_________________________________________________________________
activation_14 (Activation)   (None, 120)               0         
_________________________________________________________________
dense_12 (Dense)             (None, 10)                1210      
_________________________________________________________________
activation_15 (Activation)   (None, 10)                0         
=================================================================
Total params: 66,422.0
Trainable params: 66,422.0
Non-trainable params: 0.0
_________________________________________________________________
None

In [33]:
# Create a new model every time

def evaluate_model(model, optimizer, cv_split=None, verbose=False):
    """
    Wrapper method to create, train and optionally CV, and check performance on test set
    """

    if verbose:
        print('\nCompiling model')
        model.summary()
        
    model.compile(optimizer=optimizer,
                   loss='categorical_crossentropy', 
                   metrics=['accuracy'])

    if verbose:
        print('\nTraining model')
    history = model.fit(X_train, y_train, validation_split=cv_split, 
                        epochs=20, batch_size=256, verbose=1 if verbose else 0)

    if verbose:
        print('\nEvaluating model')
    score = model.evaluate(X_test, y_test, batch_size=256)

    if verbose:
        print('\nTest results: Loss = {:.4f}, Error = {:.4f}'.format(score[0], 1.0 - score[1]))
    
    results = {'model': model, 'history': history.history, 'loss': score[0], 'acc': score[1], 'err': 1.0 - score[1]}
    return results

results = evaluate_model(model=lenet5_model(),
                          optimizer=SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True),
                          cv_split=0.2, verbose=True)


Compiling model
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_13 (Conv2D)           (None, 28, 28, 6)         156       
_________________________________________________________________
average_pooling2d_7 (Average (None, 14, 14, 6)         0         
_________________________________________________________________
activation_31 (Activation)   (None, 14, 14, 6)         0         
_________________________________________________________________
conv2d_14 (Conv2D)           (None, 10, 10, 16)        2416      
_________________________________________________________________
average_pooling2d_8 (Average (None, 5, 5, 16)          0         
_________________________________________________________________
activation_32 (Activation)   (None, 5, 5, 16)          0         
_________________________________________________________________
conv2d_15 (Conv2D)           (None, 1, 1, 120)         48120     
_________________________________________________________________
activation_33 (Activation)   (None, 1, 1, 120)         0         
_________________________________________________________________
flatten_5 (Flatten)          (None, 120)               0         
_________________________________________________________________
dense_19 (Dense)             (None, 120)               14520     
_________________________________________________________________
activation_34 (Activation)   (None, 120)               0         
_________________________________________________________________
dense_20 (Dense)             (None, 10)                1210      
_________________________________________________________________
activation_35 (Activation)   (None, 10)                0         
=================================================================
Total params: 66,422.0
Trainable params: 66,422.0
Non-trainable params: 0.0
_________________________________________________________________

Training model
Train on 48000 samples, validate on 12000 samples
Epoch 1/20
48000/48000 [==============================] - 4s - loss: 0.2699 - acc: 0.9194 - val_loss: 0.1327 - val_acc: 0.9587
Epoch 2/20
48000/48000 [==============================] - 4s - loss: 0.0921 - acc: 0.9728 - val_loss: 0.0781 - val_acc: 0.9758
Epoch 3/20
48000/48000 [==============================] - 4s - loss: 0.0625 - acc: 0.9808 - val_loss: 0.0721 - val_acc: 0.9793
Epoch 4/20
48000/48000 [==============================] - 4s - loss: 0.0469 - acc: 0.9852 - val_loss: 0.0706 - val_acc: 0.9785
Epoch 5/20
48000/48000 [==============================] - 4s - loss: 0.0354 - acc: 0.9884 - val_loss: 0.0620 - val_acc: 0.9812
Epoch 6/20
48000/48000 [==============================] - 4s - loss: 0.0260 - acc: 0.9918 - val_loss: 0.0709 - val_acc: 0.9797
Epoch 7/20
48000/48000 [==============================] - 4s - loss: 0.0206 - acc: 0.9937 - val_loss: 0.0634 - val_acc: 0.9832
Epoch 8/20
48000/48000 [==============================] - 4s - loss: 0.0148 - acc: 0.9958 - val_loss: 0.0604 - val_acc: 0.9820
Epoch 9/20
48000/48000 [==============================] - 4s - loss: 0.0115 - acc: 0.9967 - val_loss: 0.0571 - val_acc: 0.9838
Epoch 10/20
48000/48000 [==============================] - 4s - loss: 0.0073 - acc: 0.9983 - val_loss: 0.0618 - val_acc: 0.9835
Epoch 11/20
48000/48000 [==============================] - 4s - loss: 0.0053 - acc: 0.9988 - val_loss: 0.0572 - val_acc: 0.9846
Epoch 12/20
48000/48000 [==============================] - 4s - loss: 0.0036 - acc: 0.9994 - val_loss: 0.0607 - val_acc: 0.9833
Epoch 13/20
48000/48000 [==============================] - 4s - loss: 0.0026 - acc: 0.9997 - val_loss: 0.0566 - val_acc: 0.9847
Epoch 14/20
48000/48000 [==============================] - 4s - loss: 0.0020 - acc: 0.9998 - val_loss: 0.0575 - val_acc: 0.9850
Epoch 15/20
48000/48000 [==============================] - 4s - loss: 0.0013 - acc: 0.9999 - val_loss: 0.0573 - val_acc: 0.9852
Epoch 16/20
48000/48000 [==============================] - 4s - loss: 0.0012 - acc: 0.9999 - val_loss: 0.0591 - val_acc: 0.9851
Epoch 17/20
48000/48000 [==============================] - 4s - loss: 9.1285e-04 - acc: 1.0000 - val_loss: 0.0583 - val_acc: 0.9852
Epoch 18/20
48000/48000 [==============================] - 4s - loss: 7.3716e-04 - acc: 1.0000 - val_loss: 0.0575 - val_acc: 0.9860
Epoch 19/20
48000/48000 [==============================] - 4s - loss: 6.5757e-04 - acc: 1.0000 - val_loss: 0.0585 - val_acc: 0.9853
Epoch 20/20
48000/48000 [==============================] - 4s - loss: 5.7490e-04 - acc: 1.0000 - val_loss: 0.0587 - val_acc: 0.9856

Evaluating model
 7936/10000 [======================>.......] - ETA: 0s
Test results: Loss = 0.0574, Error = 0.0155

Visualizing the results of LeNet-5


In [34]:
def plot_history(hist):
    """
    Plots the history object returned by the .fit() call
    """
    for metric in ('acc', 'loss', 'val_acc', 'val_loss'):
        assert metric in hist.keys()
    
    hist_df = pd.DataFrame(hist)
    fig, axes = plt.subplots(1,2,figsize=(14, 6))

    hist_df['err'] = 1 - hist_df['acc']
    hist_df['val_err'] = 1 - hist_df['val_acc']
    
    hist_df[['val_err', 'err']].plot.line(ax=axes[0])
    hist_df[['val_loss', 'loss']].plot.line(ax=axes[1])
    axes[0].set(title="Error during training", ylabel="Error")
    axes[0].legend(labels=["Validation", "Training"])
    axes[1].set(title="Loss during training", ylabel="Loss")
    axes[1].legend(labels=["Validation", "Training"])
 
    for ax in axes:
        ax.set_xticks(range(hist_df.shape[0]))
        ax.set(xlabel="epoch")
        
#     return fig, axes

plot_history(results['history'])



In [36]:
all_scores_df = pd.DataFrame.from_dict({'lenet-5': results['err'] * 100}, orient='index')
all_scores_df.columns = ['error']
all_scores_df = fc_scores_df.append(all_scores_df)
all_scores_df = all_scores_df.sort_values('error')
# all_scores_df.sort_values('error').plot.barh()
# results['err']


fig, ax = plt.subplots(1,1, figsize=(6,6))
all_scores_df.plot.barh(width=0.4, ax=ax, legend=None)
ax.set(title="Neural networks test set accuracy", ylabel="Network", xlabel="%age error");
plt.savefig('lenet_scores.png', bbox_inches='tight', dpi=150)

all_scores_df


Out[36]:
error
lenet-5 1.55
fc-1000-10 2.58
fc-500-150-10 2.66
fc-300-10 2.92
fc-300-100-10 2.93

Cross validation of optimizer

So far we've used a hand-tuned SGD optimizer configuration. Can we randomize its hyperparameters and come up with a better one?

It looks like it would take a lot of random guesses to match the performance of the SGD settings we've used so far, but we could still try a small number of random-search iterations to see how close it gets.


In [37]:
def random_sgd(verbose=False):
    """
    Generates an SGD optimizer with random values
    """
    lr = 10 ** np.random.randint(-6, -3)
    momentum = 0.1 * np.random.randint(8, 10)
    decay = 10 ** np.random.randint(-5, -3)
    nesterov = np.random.uniform() < 0.5
    
    sgd = SGD(lr=lr, momentum=momentum, decay=decay, nesterov=nesterov)
    if verbose:
        print('sgd: lr={}, momentum={}, decay={}, nesterov={}'.format(lr, momentum, decay, nesterov))
        
    return sgd
    
# random_sgd(verbose=True)

In [38]:
# Randomize optimizer and run for 100 samples
def best_model(N=100):
    """
    Returns the best model after random search for N SGD values
    """
    best_result = None
    best_acc = 0

    for n in range(N):
        print('\nIteration {}'.format(n))
        sgd_opt = random_sgd()
        result = evaluate_model(lenet5_model(), sgd_opt)
        current_acc = result['acc']
        
        if current_acc > best_acc:
            print('\n-> Updating best model. Current acc: {}, old acc: {}'.format(current_acc, best_acc))
            best_result = result
            best_acc = current_acc
    
    return best_result

# best_lenet5 = best_model(N=5)

Modernizing the network - max pooling, ReLU, and dropout

Since [1] was published, new layer types and activations have been introduced that make networks easier to train and less prone to overfitting. Let's retrofit the original architecture with these improvements and see how the performance changes.

When inserting Dropout layers between the convolutional and activation layers, it's not obvious how many to use or what the dropout rate should be, so we'll pass these as arguments to the model-creation function and try different combinations later.


In [39]:
from keras.models import Sequential
from keras.layers import MaxPooling2D, Dropout

def lenet5_modern_model(dropout_cnt=0, dropout_val=0.5, bias_init=None, verbose=False):
    """
    Creates and returns a lenet5 model with retrofitted modern layers:
    - ReLU activations
    - Max pooling
    - Dropout
    """

    # Create the model
    model = Sequential()

    if bias_init:
        bias = bias_init
    else:
        bias='zeros'
        
    model.add(Conv2D(filters=6, kernel_size=(5, 5), strides=(1, 1), 
                     input_shape=(32, 32, 1), bias_initializer=bias)) # C1
    model.add(MaxPooling2D(pool_size=(2, 2))) # S2
    model.add(Activation('relu'))
    if dropout_cnt >= 1:
        model.add(Dropout(dropout_val))
    
    model.add(Conv2D(filters=16, kernel_size=(5, 5), strides=(1, 1),
                    bias_initializer=bias)) # C3
    model.add(MaxPooling2D(pool_size=(2, 2))) # S4
    model.add(Activation('relu'))
    if dropout_cnt >= 2:
        model.add(Dropout(dropout_val))

    model.add(Conv2D(filters=120, kernel_size=(5, 5), strides=(1, 1),
                    bias_initializer=bias)) # C5
    model.add(Activation('relu'))
    if dropout_cnt >= 3:
        model.add(Dropout(dropout_val))

    model.add(Flatten())
    model.add(Dense(120)) # F6
    model.add(Activation('relu'))
    if dropout_cnt >= 4:
        model.add(Dropout(dropout_val))
    
    model.add(Dense(10))
    model.add(Activation('softmax'))

    if verbose:
        print(model.summary())
    return model
    
lenet5 = lenet5_modern_model(verbose=True)


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_16 (Conv2D)           (None, 28, 28, 6)         156       
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 14, 14, 6)         0         
_________________________________________________________________
activation_36 (Activation)   (None, 14, 14, 6)         0         
_________________________________________________________________
conv2d_17 (Conv2D)           (None, 10, 10, 16)        2416      
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 5, 5, 16)          0         
_________________________________________________________________
activation_37 (Activation)   (None, 5, 5, 16)          0         
_________________________________________________________________
conv2d_18 (Conv2D)           (None, 1, 1, 120)         48120     
_________________________________________________________________
activation_38 (Activation)   (None, 1, 1, 120)         0         
_________________________________________________________________
flatten_6 (Flatten)          (None, 120)               0         
_________________________________________________________________
dense_21 (Dense)             (None, 120)               14520     
_________________________________________________________________
activation_39 (Activation)   (None, 120)               0         
_________________________________________________________________
dense_22 (Dense)             (None, 10)                1210      
_________________________________________________________________
activation_40 (Activation)   (None, 10)                0         
=================================================================
Total params: 66,422.0
Trainable params: 66,422.0
Non-trainable params: 0.0
_________________________________________________________________
None

Run a grid search over dropout layer count and percentage

Let's do a grid search to find out how many dropout layers give the best performance, and what their rate should be. We restrict the search space by adding dropout layers from the first layer onwards and using the same rate for each layer.

This is going to take a long time to run!!


In [41]:
%%time

results = dict()
best_dropout = None
min_error = 1.00
RUNS = 1

# Exhaustive grid search of dropout configs
for dropout_cnt in range(1, 4):
    for dropout_val in (0.1, 0.2, 0.3, 0.4, 0.5):
        
        print('\nTesting {} runs with {} layer(s) of dropout, {} value .. '.format(RUNS, dropout_cnt, dropout_val))
        errors = np.zeros((RUNS,))
        for index in range(RUNS): # Run each combination multiple times
            
            model = lenet5_modern_model(dropout_cnt=dropout_cnt, dropout_val=dropout_val)
            result = evaluate_model(model,
                                    optimizer=SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True),
                                    cv_split=None, verbose=False)

            errors[index] = 1.0 - result['acc']
        
        result_key = (dropout_cnt, dropout_val)
        mean, std = errors.mean(), errors.std()
        
        # Update the best settings based on the mean error across runs
        if mean < min_error:
            print("\nUpdating best settings:")
            best_dropout = result_key
            min_error = mean
            
        results[result_key] = {'mean': mean, 'std': std}
        
        print('\nDropout: {} @ {}, error: {:.4f} ({:.4f} std dev)'.format(*result_key, mean, std))

print(results)


Testing 1 runs with 1 layer(s) of dropout, 0.1 value .. 
 7168/10000 [====================>.........] - ETA: 0s
Updating best settings:

Dropout: 1 @ 0.1, error: 0.0115 (0.0000 std dev)

Testing 1 runs with 1 layer(s) of dropout, 0.2 value .. 
 8448/10000 [========================>.....] - ETA: 0s
Dropout: 1 @ 0.2, error: 0.0124 (0.0000 std dev)

Testing 1 runs with 1 layer(s) of dropout, 0.3 value .. 
 8192/10000 [=======================>......] - ETA: 0s
Updating best settings:

Dropout: 1 @ 0.3, error: 0.0100 (0.0000 std dev)

Testing 1 runs with 1 layer(s) of dropout, 0.4 value .. 
 8448/10000 [========================>.....] - ETA: 0s
Dropout: 1 @ 0.4, error: 0.0102 (0.0000 std dev)

Testing 1 runs with 1 layer(s) of dropout, 0.5 value .. 
 7424/10000 [=====================>........] - ETA: 0s
Dropout: 1 @ 0.5, error: 0.0103 (0.0000 std dev)

Testing 1 runs with 2 layer(s) of dropout, 0.1 value .. 
 7936/10000 [======================>.......] - ETA: 0s
Updating best settings:

Dropout: 2 @ 0.1, error: 0.0090 (0.0000 std dev)

Testing 1 runs with 2 layer(s) of dropout, 0.2 value .. 
 7936/10000 [======================>.......] - ETA: 0s
Dropout: 2 @ 0.2, error: 0.0090 (0.0000 std dev)

Testing 1 runs with 2 layer(s) of dropout, 0.3 value .. 
 7936/10000 [======================>.......] - ETA: 0s
Dropout: 2 @ 0.3, error: 0.0098 (0.0000 std dev)

Testing 1 runs with 2 layer(s) of dropout, 0.4 value .. 
 7936/10000 [======================>.......] - ETA: 0s
Dropout: 2 @ 0.4, error: 0.0140 (0.0000 std dev)

Testing 1 runs with 2 layer(s) of dropout, 0.5 value .. 
 7936/10000 [======================>.......] - ETA: 0s
Dropout: 2 @ 0.5, error: 0.0117 (0.0000 std dev)

Testing 1 runs with 3 layer(s) of dropout, 0.1 value .. 
 8960/10000 [=========================>....] - ETA: 0s
Updating best settings:

Dropout: 3 @ 0.1, error: 0.0083 (0.0000 std dev)

Testing 1 runs with 3 layer(s) of dropout, 0.2 value .. 
 7424/10000 [=====================>........] - ETA: 0s
Dropout: 3 @ 0.2, error: 0.0093 (0.0000 std dev)

Testing 1 runs with 3 layer(s) of dropout, 0.3 value .. 
 7680/10000 [======================>.......] - ETA: 0s
Dropout: 3 @ 0.3, error: 0.0108 (0.0000 std dev)

Testing 1 runs with 3 layer(s) of dropout, 0.4 value .. 
 7424/10000 [=====================>........] - ETA: 0s
Dropout: 3 @ 0.4, error: 0.0118 (0.0000 std dev)

Testing 1 runs with 3 layer(s) of dropout, 0.5 value .. 
 8704/10000 [=========================>....] - ETA: 0s
Dropout: 3 @ 0.5, error: 0.0182 (0.0000 std dev)
{(2, 0.5): {'std': 0.0, 'mean': 0.011700000000000044}, (3, 0.1): {'std': 0.0, 'mean': 0.0082999999999999741}, (1, 0.1): {'std': 0.0, 'mean': 0.011499999999999955}, (3, 0.3): {'std': 0.0, 'mean': 0.010800000000000032}, (3, 0.5): {'std': 0.0, 'mean': 0.018199999999999994}, (1, 0.4): {'std': 0.0, 'mean': 0.010199999999999987}, (2, 0.1): {'std': 0.0, 'mean': 0.009000000000000008}, (3, 0.2): {'std': 0.0, 'mean': 0.009299999999999975}, (3, 0.4): {'std': 0.0, 'mean': 0.011800000000000033}, (2, 0.3): {'std': 0.0, 'mean': 0.0098000000000000309}, (1, 0.2): {'std': 0.0, 'mean': 0.012399999999999967}, (2, 0.4): {'std': 0.0, 'mean': 0.014000000000000012}, (1, 0.3): {'std': 0.0, 'mean': 0.010000000000000009}, (1, 0.5): {'std': 0.0, 'mean': 0.010299999999999976}, (2, 0.2): {'std': 0.0, 'mean': 0.009000000000000008}}
CPU times: user 23min, sys: 9min 43s, total: 32min 44s
Wall time: 32min 47s
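
The raw results dict printed above is hard to scan. A minimal sketch (an assumption, not part of the original run) that reshapes it into a dropout-count by dropout-rate table of mean errors:

    # Rows: number of dropout layers, columns: dropout rate, values: mean test error
    grid_df = pd.DataFrame([(cnt, val, res['mean']) for (cnt, val), res in results.items()],
                           columns=['dropout_cnt', 'dropout_val', 'mean_error'])
    print(grid_df.pivot(index='dropout_cnt', columns='dropout_val', values='mean_error'))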

In [42]:
print('Best dropout settings: {}, giving error of: {}'.format(best_dropout, min_error))


Best dropout settings: (3, 0.1), giving error of: 0.008299999999999974

In [43]:
new_score_df = pd.DataFrame.from_dict({'lenet-5-dropout-relu': min_error * 100}, orient='index')
new_score_df.columns = ['error']
all_scores_df = all_scores_df.append(new_score_df)
all_scores_df = all_scores_df.sort_values('error')
# all_scores_df.sort_values('error').plot.barh()
# results['err']

In [44]:
fig, ax = plt.subplots(1,1, figsize=(6,6))
all_scores_df.plot.barh(width=0.4, ax=ax, legend=None)
ax.set(title="Neural networks test set accuracy", ylabel="Network", xlabel="%age error");
plt.savefig('lenet_modern_scores.png', bbox_inches='tight', dpi=150)

all_scores_df


Out[44]:
error
lenet-5-dropout-relu 0.83
lenet-5 1.55
fc-1000-10 2.58
fc-500-150-10 2.66
fc-300-10 2.92
fc-300-100-10 2.93

How much training data is enough?!

Now, with a good dropout setting, let's train on increasingly large subsets of the training data and see how test-set performance varies. This section uses the best_dropout variable, hard-coded in the next cell to avoid re-running the long grid search above.


In [45]:
# Shortcut the grid search above, as it takes ages, by hard-coding a dropout setting
best_dropout = 3, 0.2

In [50]:
# Create a new evaluation method where the data is passed in

def evaluate_model(model, X_tr, y_tr, X_te, y_te, optimizer, epochs=20, batch=256, cv_split=None, verbose=False):
    """
    Wrapper method to create, train and optionally CV, and check performance on test set
    """

    if verbose:
        print('\nCompiling model')
        model.summary()
        
    model.compile(optimizer=optimizer,
                   loss='categorical_crossentropy', 
                   metrics=['accuracy'])

    if verbose:
        print('\nTraining model')
    history = model.fit(X_tr, y_tr, validation_split=cv_split, 
                        epochs=epochs, batch_size=batch, verbose=1 if verbose else 0)

    if verbose:
        print('\nEvaluating model')
        
    train_score = model.evaluate(X_tr, y_tr, batch_size=batch)
    test_score = model.evaluate(X_te, y_te, batch_size=batch)

    if verbose:
        print('\nTest results: Loss = {:.4f}, Error = {:.2f}%'.format(test_score[0], 100.0 * (1.0 - test_score[1])))
    
    results = {'model': model, 'history': history.history, 
               'train_loss': train_score[0], 'train_acc': train_score[1], 'train_err': 1.0 - train_score[1],
               'test_loss': test_score[0], 'test_acc': test_score[1], 'test_err': 1.0 - test_score[1],
              }
    return results

In [51]:
%%time
n_rows = range(5000, 65000, 5000)

results = dict()

for n in n_rows:
    print('\nValidating train and test set performance with {} training examples'.format(n))
    X_train_sub, y_train_sub = stratified_subsample(X_train, y_train, n)
    
    model = lenet5_modern_model(dropout_cnt=best_dropout[0], dropout_val=best_dropout[1])
    result = evaluate_model(model,
                            X_train_sub, y_train_sub, X_test, y_test,
                            optimizer=SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True),
                            cv_split=None, verbose=False)

    results[n] = result

train_sub_df = pd.DataFrame.from_dict(results, orient='index')
train_sub_df.index.name="N"
train_sub_df = train_sub_df[['train_err', 'test_err']]
train_sub_df


Validating train and test set performance with 5000 training examples
 8704/10000 [=========================>....] - ETA: 0s
Validating train and test set performance with 10000 training examples
 8960/10000 [=========================>....] - ETA: 0s
Validating train and test set performance with 15000 training examples
15000/15000 [==============================] - 0s     
 7936/10000 [======================>.......] - ETA: 0s
Validating train and test set performance with 20000 training examples
 7936/10000 [======================>.......] - ETA: 0s
Validating train and test set performance with 25000 training examples
 7424/10000 [=====================>........] - ETA: 0s
Validating train and test set performance with 30000 training examples
 7424/10000 [=====================>........] - ETA: 0s
Validating train and test set performance with 35000 training examples
 8960/10000 [=========================>....] - ETA: 0s
Validating train and test set performance with 40000 training examples
 7424/10000 [=====================>........] - ETA: 0s
Validating train and test set performance with 45000 training examples
45000/45000 [==============================] - 0s     
 7936/10000 [======================>.......] - ETA: 0s
Validating train and test set performance with 50000 training examples
 8960/10000 [=========================>....] - ETA: 0s
Validating train and test set performance with 55000 training examples
 7424/10000 [=====================>........] - ETA: 0s
Validating train and test set performance with 60000 training examples
 7424/10000 [=====================>........] - ETA: 0sCPU times: user 10min 39s, sys: 4min 8s, total: 14min 47s
Wall time: 14min 48s

In [52]:
train_sub_df.plot.line()


Out[52]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fe3499bd9b0>

Keras' Image augmentation

Keras provides a number of built-in preprocessing utilities. Let's use them to:

  • Convert labels to a one-hot array of binary values
  • Z-standardize the images
  • Apply image augmentations to generate new training examples.

In [53]:
from keras.utils import to_categorical

NUM_CLASSES = 10 # Number of digits (0 to 9)

# Load the data
X_train, y_train, X_test,  y_test = load_data()

# Convert single value label into 10-dim array of bools
y_train = to_categorical(y_train, NUM_CLASSES)
y_test = to_categorical(y_test, NUM_CLASSES)

# Resize the images to be centered in 32x32 (instead of 28x28)
X_train = resize_images(X_train, "Resizing train images")
X_test = resize_images(X_test, "Resizing test images")

# Need to add explicit shape of 1 as we have 1 channel for B&W images. This is "channels-last" ordering
X_train, X_test = X_train[:,:,:, np.newaxis], X_test[:,:,:, np.newaxis] # Need explicit single channel 

print('Shapes: X_train: {}, y_train: {}, X_test: {}, y_test: {}'.format(X_train.shape, y_train.shape,
                                                                       X_test.shape, y_test.shape))


Resizing train images:   4%|▍         | 2613/60000 [00:00<00:02, 26129.71it/s]
Loading pickle files
Resizing train images: 100%|██████████| 60000/60000 [00:01<00:00, 31311.99it/s]
Resizing test images: 100%|██████████| 10000/10000 [00:00<00:00, 20028.45it/s]
Shapes: X_train: (60000, 32, 32, 1), y_train: (60000, 10), X_test: (10000, 32, 32, 1), y_test: (10000, 10)


In [54]:
from keras.preprocessing.image import ImageDataGenerator


def augmented_plot(images, labels):
    """
    Plots the original images on the top row, and augmented versions underneath
    """
    n = images.shape[0]
    fig, ax = plt.subplots(2, n, figsize=(10, 6))

    # Generator used to produce the augmented versions
    datagen = ImageDataGenerator(
        rotation_range=20,
        width_shift_range=0.2,
        height_shift_range=0.2,
        shear_range=np.pi/8.0
    )

    for idx in range(n):
        ax[0, idx].imshow(images[idx].squeeze())

    for X_batch, y_batch in datagen.flow(images, labels, batch_size=n, shuffle=False):
        for idx in range(n):
            ax[1, idx].imshow(X_batch[idx].squeeze())
        break

In [55]:
def implot(images):
    """
    Plots the images on rows of 3
    """
    fig, ax = plt.subplots(1, 3, figsize=(10,6))
    
    for idx, axis in enumerate(ax):
        ax[idx].imshow(images[idx].squeeze())
        
implot(X_train[:3])
plt.savefig('noaug_digits.png', bbox_inches='tight', dpi=150)



In [57]:
# Plot a few augmented values
datagen = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True,
    rotation_range=45,
#     width_shift_range=0.1,
#     height_shift_range=0.1,
    shear_range=np.pi/8.0
)


for X_batch, y_batch in datagen.flow(X_train[:3], y_train[:3], batch_size=3):
    implot(X_batch)
    break

plt.savefig('aug_digits.png', bbox_inches='tight', dpi=150)


/home/tim/anaconda3/envs/tensorflow/lib/python3.5/site-packages/keras/preprocessing/image.py:500: UserWarning: This ImageDataGenerator specifies `featurewise_center`, but it hasn'tbeen fit on any training data. Fit it first by calling `.fit(numpy_data)`.
  warnings.warn('This ImageDataGenerator specifies '
/home/tim/anaconda3/envs/tensorflow/lib/python3.5/site-packages/keras/preprocessing/image.py:508: UserWarning: This ImageDataGenerator specifies `featurewise_std_normalization`, but it hasn'tbeen fit on any training data. Fit it first by calling `.fit(numpy_data)`.
  warnings.warn('This ImageDataGenerator specifies '

Comparing original dataset with augmented images


In [58]:
# Train model with augmented data

def evaluate_model(model, datagen, 
                   X_tr, y_tr, X_te, y_te, 
                   optimizer, batch=256, epochs=20, verbose=False):
    """
    Wrapper method to create, train and optionally CV, and check performance on test set
    """

    if verbose:
        print('\nCompiling model')
        model.summary()
        
    model.compile(optimizer=optimizer,
                   loss='categorical_crossentropy', 
                   metrics=['accuracy'])
        
    if verbose:
        print('\nTraining model (with the supplied data generator)')

    # Only difference is now the generator provides flow of images for minibatches
    history = model.fit_generator(datagen.flow(X_tr, y_tr, batch_size=batch),
                                  steps_per_epoch=int(X_tr.shape[0] / batch), epochs=epochs,  
                                  verbose=1 if verbose else 0)
    
    if verbose:
        print('\nEvaluating model')
        
    train_score = model.evaluate(X_tr, y_tr, batch_size=batch)
    test_score = model.evaluate(X_te, y_te, batch_size=batch)

    if verbose:
        print('\nTest results: Loss = {:.4f}, Error = {:.2f}%'.format(test_score[0], 100.0 * (1.0 - test_score[1])))
    
    results = {'model': model, 'history': history.history, 
               'train_loss': train_score[0], 'train_acc': train_score[1], 'train_err': 1.0 - train_score[1],
               'test_loss': test_score[0], 'test_acc': test_score[1], 'test_err': 1.0 - test_score[1],
              }
    return results

In [62]:
# Do a run with no augmentation for a baseline

N = 1

datagen_std = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True
)

datagen_std.fit(X_train)

results = np.zeros((N,))

for n in range(N): 
    print('\nEvaluating model {} of {}'.format(n+1, N))

    result = evaluate_model(lenet5_modern_model(dropout_cnt=best_dropout[0], 
                                                dropout_val=best_dropout[1]),
                            datagen_std,
                            X_train, y_train, X_test, y_test,
                            optimizer=SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True),
                            batch=256, epochs=60,
                            verbose=False)

    results[n] = result['test_err']

error = results.mean()
std = results.std()
print('\n{} runs, mean error: {:.6f}, std dev: {:.6f}'.format(N, error, std))


Evaluating model 1 of 1
60000/60000 [==============================] - 0s     
 6912/10000 [===================>..........] - ETA: 0s
1 runs, mean error: 0.007100, std dev: 0.000000

In [63]:
plot_history(result['history'])


---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-63-a4d46e4cfae0> in <module>()
----> 1 plot_history(result['history'])

<ipython-input-34-83eaac4a3169> in plot_history(hist)
      4     """
      5     for metric in ('acc', 'loss', 'val_acc', 'val_loss'):
----> 6         assert metric in hist.keys()
      7 
      8     hist_df = pd.DataFrame(hist)

AssertionError: 

In [65]:
result['history']


Out[65]:
{'acc': [0.82929353632478631,
  0.95291577396893412,
  0.96302557579003745,
  0.96774571508280915,
  0.9712607124047159,
  0.97388859135490335,
  0.97511047137632811,
  0.9761817086234601,
  0.97763792183160403,
  0.97951258706995425,
  0.9809185859667916,
  0.98157137118371718,
  0.98312801288677276,
  0.98297737011247988,
  0.98339582220653698,
  0.98366363149437597,
  0.98404860742346256,
  0.98532070162849239,
  0.98577262985559477,
  0.98537091587594827,
  0.98530396357793248,
  0.98604043920728446,
  0.98545460628837456,
  0.98707820028944582,
  0.98722884309566394,
  0.9869275576109291,
  0.98737948577418067,
  0.98709493837193107,
  0.98721210494932798,
  0.98798205674365047,
  0.9881326994860179,
  0.98895286552779615,
  0.98803227099110624,
  0.98778119975382717,
  0.98816617565098841,
  0.98865158010691201,
  0.98880222281735408,
  0.98995715054076305,
  0.98932110337439738,
  0.99000736472436812,
  0.98983998393144079,
  0.98965586505602821,
  0.98928762717750152,
  0.99005757900374936,
  0.99092795929298338,
  0.99019148363170617,
  0.98960565077664697,
  0.9910618639847909,
  0.98997388859132296,
  0.99067688805570431,
  0.99107860203535081,
  0.99054298339582214,
  0.99042581685035069,
  0.99081079274751183,
  0.99159748256046887,
  0.99030865020910308,
  0.99101164973733513,
  0.99109534008591071,
  0.99168117300482062,
  0.99164769683984999],
 'loss': [0.5205908866965363,
  0.15279298603343147,
  0.11825202239440863,
  0.10266396484722348,
  0.090437776219033914,
  0.082191532652328272,
  0.077691070635687809,
  0.074000203050174171,
  0.067594074586782474,
  0.062971893705958404,
  0.06002358181564707,
  0.058952815843677224,
  0.052695668130378771,
  0.053974909232016484,
  0.051939649496429034,
  0.049861281124019947,
  0.0502904100781561,
  0.046766089702480987,
  0.045870797941087081,
  0.045977323556974349,
  0.044615036272705716,
  0.044159459777228259,
  0.043979945596074065,
  0.040269792025053826,
  0.039631351192063291,
  0.039925198874326079,
  0.038798325770421339,
  0.039021875029883732,
  0.038265684838035108,
  0.038050586228015978,
  0.03658474829440999,
  0.03549829059458539,
  0.034956184072678399,
  0.035278183562221156,
  0.0351642170810227,
  0.034073651388654688,
  0.033548396866270765,
  0.031364701548616196,
  0.031818404382700324,
  0.029988649592050973,
  0.031321446295956325,
  0.030956439140148513,
  0.031778935113563597,
  0.030012622251897155,
  0.028354855729930451,
  0.029035437037049926,
  0.031588781748743189,
  0.027969340644637148,
  0.030504839538472359,
  0.027835489277282445,
  0.026408969067500024,
  0.02683232204432898,
  0.028266643361055284,
  0.02765754020578828,
  0.026567793573962696,
  0.029024865400459145,
  0.026262892730410367,
  0.027821747453441324,
  0.024754127274600597,
  0.025923991671426148]}
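
Since this run was trained without a validation split, the history only contains 'acc' and 'loss', which is why plot_history's assertion failed above. A hedged sketch of a more tolerant variant (an assumption, not part of the original notebook) that only plots whichever metrics are present:

    def plot_history_available(hist):
        """Plots whichever of acc/loss/val_acc/val_loss exist in the history dict"""
        hist_df = pd.DataFrame(hist)
        fig, axes = plt.subplots(1, 2, figsize=(14, 6))

        acc_cols = [c for c in ('acc', 'val_acc') if c in hist_df]
        loss_cols = [c for c in ('loss', 'val_loss') if c in hist_df]

        err_df = (1 - hist_df[acc_cols]).rename(columns={'acc': 'err', 'val_acc': 'val_err'})
        err_df.plot.line(ax=axes[0], title="Error during training")
        hist_df[loss_cols].plot.line(ax=axes[1], title="Loss during training")
        for ax in axes:
            ax.set(xlabel="epoch")

    plot_history_available(result['history'])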

In [ ]:
result['train_err'], result['test_err']

In [67]:
# Do a run with image augmentation for comparison with the baseline above

N = 1

print('Standardizing images')
datagen_aug = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True,
    rotation_range=45,
#     width_shift_range=0.1,
#     height_shift_range=0.1,
    shear_range=np.pi/8.0
)

datagen_aug.fit(X_train)

print('Training and evaluating model')
result = evaluate_model(lenet5_modern_model(dropout_cnt=best_dropout[0], 
                                            dropout_val=best_dropout[1]),
                        datagen_aug,
                        X_train, y_train, X_test, y_test,
                        optimizer=SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True),
                        batch=256, epochs=20,
                        verbose=False)

print('\nTrain error: {}, test error: {}'.format(result['train_err'], result['test_err']))


Standardizing images
Training and evaluating model
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-67-f6e7b930a75d> in <module>()
     24                         optimizer=SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True),
     25                         batch=256, epochs=20,
---> 26                         verbose=False)
     27 
     28 print('\nTrain error: {}, test error: {}'.format(result['train_err'], result['test_err']))

<ipython-input-58-a6778c4f4084> in evaluate_model(model, datagen, X_tr, y_tr, X_te, y_te, optimizer, batch, epochs, verbose)
     22     history = model.fit_generator(datagen.flow(X_tr, y_tr, batch_size=batch),
     23                                   steps_per_epoch=int(X_tr.shape[0] / batch), epochs=epochs,
---> 24                                   verbose=1 if verbose else 0)
     25 
     26     if verbose:

~/anaconda3/envs/tensorflow/lib/python3.5/site-packages/keras/legacy/interfaces.py in wrapper(*args, **kwargs)
     86                 warnings.warn('Update your `' + object_name +
     87                               '` call to the Keras 2 API: ' + signature, stacklevel=2)
---> 88             return func(*args, **kwargs)
     89         wrapper._legacy_support_signature = inspect.getargspec(func)
     90         return wrapper

~/anaconda3/envs/tensorflow/lib/python3.5/site-packages/keras/models.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_q_size, workers, pickle_safe, initial_epoch)
   1095                                         workers=workers,
   1096                                         pickle_safe=pickle_safe,
-> 1097                                         initial_epoch=initial_epoch)
   1098 
   1099     @interfaces.legacy_generator_methods_support

~/anaconda3/envs/tensorflow/lib/python3.5/site-packages/keras/legacy/interfaces.py in wrapper(*args, **kwargs)
     86                 warnings.warn('Update your `' + object_name +
     87                               '` call to the Keras 2 API: ' + signature, stacklevel=2)
---> 88             return func(*args, **kwargs)
     89         wrapper._legacy_support_signature = inspect.getargspec(func)
     90         return wrapper

~/anaconda3/envs/tensorflow/lib/python3.5/site-packages/keras/engine/training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_q_size, workers, pickle_safe, initial_epoch)
   1843                             break
   1844                         else:
-> 1845                             time.sleep(wait_time)
   1846 
   1847                     if not hasattr(generator_output, '__len__'):

KeyboardInterrupt: 

In [ ]:


In [ ]:


In [ ]:
%%time

N = 1

all_results = dict()

image_gen_desc = 'no_aug'
# Baseline: the standardization-only generator (datagen_std, fitted earlier); then an augmenting one
for image_gen in (datagen_std, ImageDataGenerator(rotation_range=20,
                                   width_shift_range=0.2,
                                   height_shift_range=0.2,
                                   shear_range=np.pi/4.0)):
#     print('Using Image generator: {}'.format(image_gen_desc))
    
    for epoch in range(20, 120, 20):
        print('Training for {} epochs'.format(epoch))
        
        results = np.zeros((N,))
        for n in range(N): 
            print('\nEvaluating model {} of {}'.format(n+1, N))

            result = evaluate_model(lenet5_modern_model(dropout_cnt=best_dropout[0], 
                                                        dropout_val=best_dropout[1]),
                                    image_gen,
                                    X_train, y_train, X_test, y_test,
                                    optimizer=SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True),
                                    batch=256, epochs=epoch,
                                    verbose=True)

            results[n] = result['test_err']
            
        error = results.mean()
        std = results.std()
        print('\n{} runs, {} epochs, mean error: {:.4f}, std dev: {:.4f}'.format(N, epoch, error, std))

        all_results[(image_gen_desc, epoch)] = (error, std)

    image_gen_desc = 'aug'


        
print(all_results)

In [ ]:
print(all_results)

Confusion matrix

Mis-classified examples


In [ ]:
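# A minimal sketch for these two sections (an assumption, not part of the original run).
# It reuses the trained model from the last evaluate_model call above (result['model'])
# and the 32x32 X_test / one-hot y_test arrays currently in memory.
from sklearn.metrics import confusion_matrix

model = result['model']
y_pred = model.predict(X_test, batch_size=256).argmax(axis=1)
y_true = y_test.argmax(axis=1)

# Confusion matrix of true vs predicted digits
cm = confusion_matrix(y_true, y_pred)
fig, ax = plt.subplots(figsize=(8, 8))
sns.heatmap(cm, annot=True, fmt='d', cbar=False, ax=ax)
ax.set(xlabel="Predicted digit", ylabel="True digit", title="Test-set confusion matrix")

# Show a handful of misclassified test digits
wrong = np.where(y_pred != y_true)[0][:6]
fig, axes = plt.subplots(1, len(wrong), figsize=(12, 3))
for axis, idx in zip(axes, wrong):
    axis.imshow(X_test[idx].squeeze())
    axis.set(title="True {} / Pred {}".format(y_true[idx], y_pred[idx]))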

Ensembling models

Train multiple models, keep the ones whose errors are least correlated, and take a majority vote of their predictions; a minimal sketch of the voting step follows below.
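
A minimal sketch of the majority-vote step (an assumption, not part of the original run): it reuses lenet5_modern_model, best_dropout, the fitted datagen_std and the arrays currently in memory, trains several independent copies, and takes the most common prediction per test image. Selecting members by error correlation is left as a further step.

    N_MODELS = 5
    member_preds = []

    for n in range(N_MODELS):
        print('Training ensemble member {} of {}'.format(n + 1, N_MODELS))
        member = lenet5_modern_model(dropout_cnt=best_dropout[0], dropout_val=best_dropout[1])
        member.compile(optimizer=SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True),
                       loss='categorical_crossentropy', metrics=['accuracy'])
        member.fit_generator(datagen_std.flow(X_train, y_train, batch_size=256),
                             steps_per_epoch=int(X_train.shape[0] / 256), epochs=20, verbose=0)
        # Predict on the raw test images, mirroring how evaluate_model above evaluates
        member_preds.append(member.predict(X_test, batch_size=256).argmax(axis=1))

    # Majority vote: the most common predicted digit across members, per test image
    votes = np.stack(member_preds)  # shape (N_MODELS, n_test)
    majority = np.apply_along_axis(lambda col: np.bincount(col, minlength=10).argmax(), 0, votes)
    ensemble_err = np.mean(majority != y_test.argmax(axis=1))
    print('Ensemble test error: {:.4f}'.format(ensemble_err))
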


In [ ]:
# Read back the weights of the best performing network and visualize them
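# A minimal sketch (an assumption, not part of the original run): rather than reading
# weights back from disk, it takes the trained model from the last run above
# (result['model']) and plots the six 5x5 kernels of the first convolutional layer.
conv1_kernels = result['model'].layers[0].get_weights()[0]  # shape (5, 5, 1, 6)
fig, axes = plt.subplots(1, conv1_kernels.shape[-1], figsize=(12, 3))
for idx, axis in enumerate(axes):
    axis.imshow(conv1_kernels[:, :, 0, idx], cmap='gray')
    axis.set(title='C1 filter {}'.format(idx))
    axis.axis('off')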

In [ ]: