I. Imports


In [1]:
import keras
import numpy as np

from keras.datasets import mnist
from keras.optimizers import Adam
from keras.models import Sequential
from keras.preprocessing import image
from keras.layers.core import Dense
from keras.layers.core import Lambda
from keras.layers.core import Flatten
from keras.layers.core import Dropout
from keras.layers.pooling import MaxPooling2D
from keras.layers.convolutional import Convolution2D
from keras.layers.normalization import BatchNormalization
from keras.utils.np_utils import to_categorical


Using Theano backend.
/home/wnixalo/miniconda3/envs/FAI/lib/python2.7/site-packages/theano/gpuarray/dnn.py:135: UserWarning: Your cuDNN version is more recent than Theano. If you encounter problems, try updating Theano or downgrading cuDNN to version 5.1.
  warnings.warn("Your cuDNN version is more recent than "
Using cuDNN version 6021 on context None
Mapped name None to device cuda: GeForce GTX 870M (0000:01:00.0)

I want to import Vgg16 as well because I'll want it's low-level features


In [ ]:
# import os, sys
# sys.path.insert(1, os.path.join('../utils/'))

Actually, looks like Vgg's ImageNet weights won't be needed.


In [ ]:
# from vgg16 import Vgg16
# vgg = Vgg16()

II. Load Data


In [2]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()

III. Preprocessing

Keras Convolutional layers expect color channels, so expand an empty dimension in the input data, to account for no colors.


In [3]:
x_train = np.expand_dims(x_train, 1) # can also enter <axis=1> for <1>
x_test = np.expand_dims(x_test, 1)
x_train.shape


Out[3]:
(60000, 1, 28, 28)

One-Hot Encoding the outputs:


In [4]:
y_train, y_test = to_categorical(y_train), to_categorical(y_test)

Since this notebook's models are all mimicking Vgg16, the input data should be preprocessed in the same way: in this case normalized by subtracting the mean and dividing by the standard deviation. It turns out this is a good idea generally.


In [5]:
x_mean = x_train.mean().astype(np.float32)
x_stdv = x_train.std().astype(np.float32)
def norm_input(x): return (x - x_mean) / x_stdv

Create Data Batch Generator

ImageDataGenerator with no arguments will return a generator. Later, when data is augmented, it'll be told how to do so. I don't know what batch-size should be set to: in Lecture it was 64.


In [6]:
gen = image.ImageDataGenerator()
trn_batches = gen.flow(x_train, y_train, batch_size=64)
tst_batches = gen.flow(x_test, y_test, batch_size=64)

General workflow, going forward:

  • Define the model's architecture.
  • Run 1 Epoch at default learning rate (0.01 ~ 0.001 depending on optimizer) to get it started.
  • Jack up the learning to 0.1 (as high as you'll ever want to go) and run 1 Epoch, possibly more if you can get away with it.
  • Lower the learning rate by a factor of 10 and run for a number of Epochs -- repeat until model begins to overfit (acc > valacc)

Points on internal architecture:

  • Each model will have a data-preprocessing Lambda layer, which normalizes the input and assigns a shape of (1 color-channel x 28 pixels x 28 pixels)
  • Weights are flattened before entering FC layers
  • Convolutional Layers will come in 2 pairs (because this is similar to the Vgg model).
  • Convol layer-pairs will start with 32 3x3 filters and double to 64 3x3 layers
  • A MaxPooling Layer comes after each Convol-pair.
  • When Batch-Normalization is applied, it is done after every layer but last (excluding MaxPooling).
  • Final layer is always an FC softmax layer with 10 outputs for our 10 digits.
  • Dropout, when applied, should increase toward later layers.
  • Optimizer used in Lecture was Adam(), all layers but last use a ReLU activation, loss function is categorical cross-entropy.

1. Linear Model

aka 'Dense', 'Fully-Connected'


In [24]:
def LinModel():
    model = Sequential([
        Lambda(norm_input, input_shape=(1, 28, 28)),
        Flatten(),
        Dense(10, activation='softmax')
    ])
    model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
    return model

In [25]:
Linear_model = LinModel()
Linear_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1,
                          validation_data=tst_batches, nb_val_samples=trn_batches.n)


/Users/WayNoxchi/Miniconda3/envs/FAI/lib/python2.7/site-packages/keras/layers/core.py:622: UserWarning: `output_shape` argument not specified for layer lambda_5 and cannot be automatically inferred with the Theano backend. Defaulting to output shape `(None, 1, 28, 28)` (same as input shape). If the expected output shape is different, specify it via the `output_shape` argument.
  .format(self.name, input_shape))
Epoch 1/1
60000/60000 [==============================] - 21s - loss: 0.4237 - acc: 0.8734 - val_loss: 0.3017 - val_acc: 0.9143
Out[25]:
<keras.callbacks.History at 0x111b44690>

In [27]:
Linear_model.optimizer.lr=0.1
Linear_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1,
                          validation_data=tst_batches, nb_val_samples=tst_batches.n)


Epoch 1/1
60000/60000 [==============================] - 12s - loss: 0.2972 - acc: 0.9162 - val_loss: 0.2886 - val_acc: 0.9182
Out[27]:
<keras.callbacks.History at 0x112838790>

In [28]:
Linear_model.optimizer.lr=0.01
Linear_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=4,
                          validation_data=tst_batches, nb_val_samples=tst_batches.n)


Epoch 1/4
60000/60000 [==============================] - 16s - loss: 0.2843 - acc: 0.9199 - val_loss: 0.2712 - val_acc: 0.9250
Epoch 2/4
60000/60000 [==============================] - 11s - loss: 0.2769 - acc: 0.9212 - val_loss: 0.2854 - val_acc: 0.9246
Epoch 3/4
60000/60000 [==============================] - 11s - loss: 0.2698 - acc: 0.9240 - val_loss: 0.2893 - val_acc: 0.9211
Epoch 4/4
60000/60000 [==============================] - 11s - loss: 0.2708 - acc: 0.9243 - val_loss: 0.2820 - val_acc: 0.9197
Out[28]:
<keras.callbacks.History at 0x1128388d0>

In [29]:
Linear_model.optimizer.lr=0.001
Linear_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=8,
                          validation_data=tst_batches, nb_val_samples=tst_batches.n)


Epoch 1/8
60000/60000 [==============================] - 16s - loss: 0.2648 - acc: 0.9255 - val_loss: 0.2776 - val_acc: 0.9217
Epoch 2/8
60000/60000 [==============================] - 13s - loss: 0.2612 - acc: 0.9265 - val_loss: 0.2699 - val_acc: 0.9249
Epoch 3/8
60000/60000 [==============================] - 11s - loss: 0.2649 - acc: 0.9265 - val_loss: 0.2766 - val_acc: 0.9237
Epoch 4/8
60000/60000 [==============================] - 13s - loss: 0.2563 - acc: 0.9292 - val_loss: 0.2867 - val_acc: 0.9227
Epoch 5/8
60000/60000 [==============================] - 11s - loss: 0.2586 - acc: 0.9283 - val_loss: 0.2894 - val_acc: 0.9208
Epoch 6/8
60000/60000 [==============================] - 13s - loss: 0.2561 - acc: 0.9286 - val_loss: 0.2790 - val_acc: 0.9237
Epoch 7/8
60000/60000 [==============================] - 18s - loss: 0.2564 - acc: 0.9289 - val_loss: 0.2861 - val_acc: 0.9233
Epoch 8/8
60000/60000 [==============================] - 17s - loss: 0.2548 - acc: 0.9292 - val_loss: 0.2733 - val_acc: 0.9269
Out[29]:
<keras.callbacks.History at 0x1088a2ed0>

2. Single Dense Layer

This is what people in the 80s & 90s thought of as a 'Neural Network': a single Fully-Connected hidden layer. I don't yet know why the hidden layer is ouputting 512 units. For natural-image recognition it's 4096. I'll see whether a ReLU or Softmax hidden layer works better.

By the way, the training and hyper-parameter tuning process should be automated. I want to use a NN to figure out how to do that for me.


In [30]:
def FCModel():
    model = Sequential([
        Lambda(norm_input, input_shape=(1, 28, 28)),
        Dense(512, activation='relu'),
        Flatten(),
        Dense(10, activation='softmax')
    ])
    model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
    return model

In [32]:
FC_model = FCModel()
FC_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1,
                      validation_data=tst_batches, nb_val_samples=tst_batches.n)


/Users/WayNoxchi/Miniconda3/envs/FAI/lib/python2.7/site-packages/keras/layers/core.py:622: UserWarning: `output_shape` argument not specified for layer lambda_7 and cannot be automatically inferred with the Theano backend. Defaulting to output shape `(None, 1, 28, 28)` (same as input shape). If the expected output shape is different, specify it via the `output_shape` argument.
  .format(self.name, input_shape))
Epoch 1/1
60000/60000 [==============================] - 34s - loss: 0.2062 - acc: 0.9385 - val_loss: 0.1386 - val_acc: 0.9594
Out[32]:
<keras.callbacks.History at 0x113c7ce10>

In [34]:
FC_model.optimizer=0.1
FC_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1,
                      validation_data=tst_batches, nb_val_samples=tst_batches.n)


Epoch 1/1
60000/60000 [==============================] - 36s - loss: 0.1188 - acc: 0.9644 - val_loss: 0.1266 - val_acc: 0.9637
Out[34]:
<keras.callbacks.History at 0x113e6bbd0>

In [35]:
FC_model.optimizer=0.01
FC_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=4,
                      validation_data=tst_batches, nb_val_samples=tst_batches.n)


Epoch 1/4
60000/60000 [==============================] - 44s - loss: 0.0971 - acc: 0.9707 - val_loss: 0.1309 - val_acc: 0.9630
Epoch 2/4
60000/60000 [==============================] - 34s - loss: 0.0819 - acc: 0.9748 - val_loss: 0.1187 - val_acc: 0.9651
Epoch 3/4
60000/60000 [==============================] - 33s - loss: 0.0679 - acc: 0.9783 - val_loss: 0.1309 - val_acc: 0.9661
Epoch 4/4
60000/60000 [==============================] - 29s - loss: 0.0568 - acc: 0.9823 - val_loss: 0.1212 - val_acc: 0.9664
Out[35]:
<keras.callbacks.History at 0x113e7cd90>

With an accuracy of 0.9823 and validation accuracy of 0.9664, the model's starting to overfit significantly and hit its limits, so it's time to go on to the next technique.

3. Basic 'VGG' style Convolutional Neural Network

I'm specifying an output shape equal to the input shape, to suppress the warnings keras was giving me; and it stated it was defaulting to that anyway. Or maybe I should've written output_shape=input_shape

Aha: yes it's as I thought. See this thread -- output_shape warnings were added to Keras, and neither vgg16.py (nor I until now) were specifying output_shape. It's fine.

The first time I ran this, I forgot to have 2 pairs of Conv layers. At the third λr=0.01 epoch I had acc/val of 0.9964, 0.9878

Also noticing: in lecture JH was using a GPU which I think was an NVidia Titan X. I'm using an Intel Core i5 CPU on a MacBook Pro. His epochs took on average 6 seconds, mine are taking 180~190. Convolutions are also the most computationally-intensive part of the NN being built here.

Interestingly, the model with 2 Conv-layer pairs is taking avg 160s. Best Acc/Val: 0.9968/0.9944

Final: 0.9975/0.9918 - massive overfitting


In [46]:
def ConvModel():
    model = Sequential([
        Lambda(norm_input, input_shape=(1, 28, 28), output_shape=(1, 28, 28)),
        Convolution2D(32, 3, 3, activation='relu'),
        Convolution2D(32, 3, 3, activation='relu'),
        MaxPooling2D(),
        Convolution2D(64, 3, 3, activation='relu'),
        Convolution2D(64, 3, 3, activation='relu'),
        MaxPooling2D(),
        Flatten(),
        Dense(512, activation='relu'),
        Dense(10, activation='softmax')
    ])
    model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
    return model

In [47]:
CNN_model = ConvModel()
CNN_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1,
                       validation_data=tst_batches, nb_val_samples=tst_batches.n)


Epoch 1/1
60000/60000 [==============================] - 168s - loss: 0.1039 - acc: 0.9682 - val_loss: 0.0385 - val_acc: 0.9866
Out[47]:
<keras.callbacks.History at 0x123ed46d0>

In [48]:
CNN_model.optimizer=0.1
CNN_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1, verbose=1,
                       validation_data=tst_batches, nb_val_samples=tst_batches.n)


Epoch 1/1
60000/60000 [==============================] - 174s - loss: 0.0352 - acc: 0.9896 - val_loss: 0.0449 - val_acc: 0.9851
Out[48]:
<keras.callbacks.History at 0x124513250>

In [49]:
CNN_model.optimizer=0.01
CNN_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=4, verbose=1,
                       validation_data=tst_batches, nb_val_samples=tst_batches.n)


Epoch 1/4
60000/60000 [==============================] - 157s - loss: 0.0246 - acc: 0.9926 - val_loss: 0.0267 - val_acc: 0.9925
Epoch 2/4
60000/60000 [==============================] - 159s - loss: 0.0185 - acc: 0.9940 - val_loss: 0.0276 - val_acc: 0.9895
Epoch 3/4
60000/60000 [==============================] - 158s - loss: 0.0160 - acc: 0.9949 - val_loss: 0.0312 - val_acc: 0.9903
Epoch 4/4
60000/60000 [==============================] - 157s - loss: 0.0133 - acc: 0.9958 - val_loss: 0.0228 - val_acc: 0.9926
Out[49]:
<keras.callbacks.History at 0x124513150>

In [50]:
# Running again until validation accuracy stops increasing
CNN_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=4, verbose=1,
                       validation_data=tst_batches, nb_val_samples=tst_batches.n)


Epoch 1/4
60000/60000 [==============================] - 166s - loss: 0.0104 - acc: 0.9968 - val_loss: 0.0209 - val_acc: 0.9944
Epoch 2/4
60000/60000 [==============================] - 153s - loss: 0.0095 - acc: 0.9970 - val_loss: 0.0299 - val_acc: 0.9931
Epoch 3/4
60000/60000 [==============================] - 156s - loss: 0.0081 - acc: 0.9977 - val_loss: 0.0384 - val_acc: 0.9907
Epoch 4/4
60000/60000 [==============================] - 158s - loss: 0.0075 - acc: 0.9975 - val_loss: 0.0316 - val_acc: 0.9918
Out[50]:
<keras.callbacks.History at 0x124513210>

4. Data Augmentation


In [6]:
gen = image.ImageDataGenerator(rotation_range=8, width_shift_range=0.08, shear_range=0.3,
                           height_shift_range=0.08, zoom_range=0.08)
trn_batches = gen.flow(x_train, y_train, batch_size=64)
tst_batches = gen.flow(x_test, y_test, batch_size=64)

In [55]:
CNN_Aug_model = ConvModel()
CNN_Aug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1, verbose=1,
                  validation_data=tst_batches, nb_val_samples=tst_batches.n)
# upping LR
print("Learning Rate, η = 0.1")
CNN_Aug_model.optimizer.lr=0.1
CNN_Aug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1, verbose=1,
                  validation_data=tst_batches, nb_val_samples=tst_batches.n)
# brining LR back down for more epochs
print("Learning Rate, η = 0.01")
CNN_Aug_model.optimizer.lr=0.01
CNN_Aug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=4, verbose=1,
                  validation_data=tst_batches, nb_val_samples=tst_batches.n)


Epoch 1/1
60000/60000 [==============================] - 171s - loss: 0.2000 - acc: 0.9364 - val_loss: 0.0700 - val_acc: 0.9773
Learning Rate, η = 0.1
Epoch 1/1
60000/60000 [==============================] - 168s - loss: 0.0693 - acc: 0.9781 - val_loss: 0.0491 - val_acc: 0.9848
Learning Rate, η = 0.01
Epoch 1/4
60000/60000 [==============================] - 164s - loss: 0.0550 - acc: 0.9829 - val_loss: 0.0407 - val_acc: 0.9872
Epoch 2/4
60000/60000 [==============================] - 172s - loss: 0.0460 - acc: 0.9858 - val_loss: 0.0433 - val_acc: 0.9858
Epoch 3/4
60000/60000 [==============================] - 165s - loss: 0.0424 - acc: 0.9872 - val_loss: 0.0395 - val_acc: 0.9874
Epoch 4/4
60000/60000 [==============================] - 169s - loss: 0.0397 - acc: 0.9878 - val_loss: 0.0339 - val_acc: 0.9890
Out[55]:
<keras.callbacks.History at 0x122656110>

In [56]:
# 4 more epochs at η=0.01
CNN_Aug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=4, verbose=1,
                  validation_data=tst_batches, nb_val_samples=tst_batches.n)


Epoch 1/4
60000/60000 [==============================] - 167s - loss: 0.0378 - acc: 0.9885 - val_loss: 0.0336 - val_acc: 0.9899
Epoch 2/4
60000/60000 [==============================] - 170s - loss: 0.0346 - acc: 0.9891 - val_loss: 0.0393 - val_acc: 0.9884
Epoch 3/4
60000/60000 [==============================] - 169s - loss: 0.0317 - acc: 0.9902 - val_loss: 0.0295 - val_acc: 0.9913
Epoch 4/4
60000/60000 [==============================] - 169s - loss: 0.0308 - acc: 0.9910 - val_loss: 0.0326 - val_acc: 0.9903
Out[56]:
<keras.callbacks.History at 0x1245137d0>

5. Batch Normalization + Data Augmentation

See this thread for info on BatchNorm axis.


In [8]:
def ConvModelBN():
    model = Sequential([
        Lambda(norm_input, input_shape=(1, 28, 28), output_shape=(1, 28, 28)),
        Convolution2D(32, 3, 3, activation='relu'),
        BatchNormalization(axis=1),
        Convolution2D(32, 3, 3, activation='relu'),
        MaxPooling2D(),
        BatchNormalization(axis=1),
        Convolution2D(64, 3, 3, activation='relu'),
        BatchNormalization(axis=1),
        Convolution2D(64, 3, 3, activation='relu'),
        MaxPooling2D(),
        Flatten(),
        BatchNormalization(),
        Dense(512, activation='relu'),
        BatchNormalization(),
        Dense(10, activation='softmax')
    ])
    model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
    return model

In [9]:
CNN_BNAug_model = ConvModelBN()
CNN_BNAug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1, verbose=1,
                              validation_data=tst_batches, nb_val_samples=tst_batches.n)
print("Learning Rate, η = 0.1")
CNN_BNAug_model.optimizer=0.1
CNN_BNAug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=2, verbose=1,
                              validation_data=tst_batches, nb_val_samples=tst_batches.n)
print("Learning Rate, η = 0.01")
CNN_BNAug_model.optimizer=0.01
CNN_BNAug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=6, verbose=1,
                              validation_data=tst_batches, nb_val_samples=tst_batches.n)


Epoch 1/1
60000/60000 [==============================] - 769s - loss: 0.1630 - acc: 0.9487 - val_loss: 0.0842 - val_acc: 0.9739
Learning Rate, η = 0.1
Epoch 1/2
60000/60000 [==============================] - 2210s - loss: 0.0724 - acc: 0.9770 - val_loss: 0.0504 - val_acc: 0.9845
Epoch 2/2
60000/60000 [==============================] - 323s - loss: 0.0569 - acc: 0.9823 - val_loss: 0.0435 - val_acc: 0.9866
Learning Rate, η = 0.01
Epoch 1/6
60000/60000 [==============================] - 323s - loss: 0.0536 - acc: 0.9830 - val_loss: 0.0396 - val_acc: 0.9881
Epoch 2/6
60000/60000 [==============================] - 322s - loss: 0.0470 - acc: 0.9852 - val_loss: 0.0373 - val_acc: 0.9878
Epoch 3/6
60000/60000 [==============================] - 316s - loss: 0.0453 - acc: 0.9851 - val_loss: 0.0431 - val_acc: 0.9856
Epoch 4/6
60000/60000 [==============================] - 326s - loss: 0.0417 - acc: 0.9871 - val_loss: 0.0383 - val_acc: 0.9879
Epoch 5/6
60000/60000 [==============================] - 325s - loss: 0.0377 - acc: 0.9885 - val_loss: 0.0355 - val_acc: 0.9888
Epoch 6/6
60000/60000 [==============================] - 314s - loss: 0.0355 - acc: 0.9888 - val_loss: 0.0374 - val_acc: 0.9889
Out[9]:
<keras.callbacks.History at 0x11440ff90>

In [10]:
# some more training at 0.1 and 0.01:
print("Learning Rate, η = 0.1")
CNN_BNAug_model.optimizer=0.1
CNN_BNAug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1, verbose=1,
                              validation_data=tst_batches, nb_val_samples=tst_batches.n)
print("Learning Rate, η = 0.01")
CNN_BNAug_model.optimizer=0.01
CNN_BNAug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=6, verbose=1,
                              validation_data=tst_batches, nb_val_samples=tst_batches.n)


Learning Rate, η = 0.1
Epoch 1/1
60000/60000 [==============================] - 316s - loss: 0.0355 - acc: 0.9884 - val_loss: 0.0362 - val_acc: 0.9887
Learning Rate, η = 0.01
Epoch 1/6
60000/60000 [==============================] - 320s - loss: 0.0309 - acc: 0.9896 - val_loss: 0.0314 - val_acc: 0.9898
Epoch 2/6
60000/60000 [==============================] - 335s - loss: 0.0314 - acc: 0.9898 - val_loss: 0.0320 - val_acc: 0.9901
Epoch 3/6
60000/60000 [==============================] - 318s - loss: 0.0298 - acc: 0.9906 - val_loss: 0.0270 - val_acc: 0.9917
Epoch 4/6
60000/60000 [==============================] - 327s - loss: 0.0302 - acc: 0.9902 - val_loss: 0.0321 - val_acc: 0.9906
Epoch 5/6
60000/60000 [==============================] - 337s - loss: 0.0284 - acc: 0.9909 - val_loss: 0.0261 - val_acc: 0.9920
Epoch 6/6
60000/60000 [==============================] - 350s - loss: 0.0255 - acc: 0.9918 - val_loss: 0.0282 - val_acc: 0.9912
Out[10]:
<keras.callbacks.History at 0x10c08eb90>

6. Dropout + Batch Normalization + Data Augmentation


In [7]:
def ConvModelBNDo():
    model = Sequential([
        Lambda(norm_input, input_shape=(1, 28, 28), output_shape=(1, 28, 28)),
        Convolution2D(32, 3, 3, activation='relu'),
        BatchNormalization(axis=1),
        Convolution2D(32, 3, 3, activation='relu'),
        MaxPooling2D(),
        BatchNormalization(axis=1),
        Convolution2D(64, 3, 3, activation='relu'),
        BatchNormalization(axis=1),
        Convolution2D(64, 3, 3, activation='relu'),
        MaxPooling2D(),
        Flatten(),
        BatchNormalization(),
        Dense(512, activation='relu'),
        BatchNormalization(),
        Dropout(0.5),
        Dense(10, activation='softmax')
    ])
    model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
    return model

In [12]:
CNN_BNDoAug_model = ConvModelBNDo()
CNN_BNDoAug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1, verbose=1,
                                validation_data=tst_batches, nb_val_samples=tst_batches.n)
print("Learning Rate, η = 0.1")
CNN_BNDoAug_model.optimizer.lr=0.1
CNN_BNDoAug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=4, verbose=1,
                                validation_data=tst_batches, nb_val_samples=tst_batches.n)
print("Learning Rate, η = 0.01")
CNN_BNDoAug_model.optimizer.lr=0.01
CNN_BNDoAug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=6, verbose=1,
                                validation_data=tst_batches, nb_val_samples=tst_batches.n)


Epoch 1/1
60000/60000 [==============================] - 322s - loss: 0.2264 - acc: 0.9320 - val_loss: 0.0660 - val_acc: 0.9783
Learning Rate, η = 0.1
Epoch 1/4
60000/60000 [==============================] - 364s - loss: 0.0929 - acc: 0.9711 - val_loss: 0.0707 - val_acc: 0.9775
Epoch 2/4
60000/60000 [==============================] - 343s - loss: 0.0738 - acc: 0.9771 - val_loss: 0.0459 - val_acc: 0.9856
Epoch 3/4
60000/60000 [==============================] - 327s - loss: 0.0638 - acc: 0.9804 - val_loss: 0.0439 - val_acc: 0.9870
Epoch 4/4
60000/60000 [==============================] - 410s - loss: 0.0641 - acc: 0.9806 - val_loss: 0.0600 - val_acc: 0.9816
Learning Rate, η = 0.01
Epoch 1/6
60000/60000 [==============================] - 327s - loss: 0.0570 - acc: 0.9823 - val_loss: 0.0396 - val_acc: 0.9863
Epoch 2/6
60000/60000 [==============================] - 312s - loss: 0.0543 - acc: 0.9835 - val_loss: 0.0436 - val_acc: 0.9854
Epoch 3/6
60000/60000 [==============================] - 311s - loss: 0.0498 - acc: 0.9849 - val_loss: 0.0314 - val_acc: 0.9890
Epoch 4/6
60000/60000 [==============================] - 317s - loss: 0.0468 - acc: 0.9855 - val_loss: 0.0341 - val_acc: 0.9894
Epoch 5/6
60000/60000 [==============================] - 330s - loss: 0.0461 - acc: 0.9858 - val_loss: 0.0340 - val_acc: 0.9888
Epoch 6/6
60000/60000 [==============================] - 327s - loss: 0.0429 - acc: 0.9862 - val_loss: 0.0284 - val_acc: 0.9907
Out[12]:
<keras.callbacks.History at 0x1146df550>

In [13]:
# 6 more epochs at 0.01
CNN_BNDoAug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=6, verbose=1,
                                validation_data=tst_batches, nb_val_samples=tst_batches.n)


Epoch 1/6
60000/60000 [==============================] - 326s - loss: 0.0420 - acc: 0.9873 - val_loss: 0.0322 - val_acc: 0.9910
Epoch 2/6
60000/60000 [==============================] - 394s - loss: 0.0381 - acc: 0.9883 - val_loss: 0.0284 - val_acc: 0.9904
Epoch 3/6
60000/60000 [==============================] - 355s - loss: 0.0402 - acc: 0.9871 - val_loss: 0.0325 - val_acc: 0.9905
Epoch 4/6
60000/60000 [==============================] - 477s - loss: 0.0371 - acc: 0.9883 - val_loss: 0.0230 - val_acc: 0.9926
Epoch 5/6
60000/60000 [==============================] - 383s - loss: 0.0360 - acc: 0.9891 - val_loss: 0.0325 - val_acc: 0.9897
Epoch 6/6
60000/60000 [==============================] - 441s - loss: 0.0334 - acc: 0.9891 - val_loss: 0.0251 - val_acc: 0.9924
Out[13]:
<keras.callbacks.History at 0x11b359210>

In [14]:
print("Learning Rate η = 0.001")
CNN_BNDoAug_model.optimizer.lr=0.001
CNN_BNDoAug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=12, verbose=1,
                                validation_data=tst_batches, nb_val_samples=tst_batches.n)


Learning Rate η = 0.001
Epoch 1/12
60000/60000 [==============================] - 314s - loss: 0.0346 - acc: 0.9894 - val_loss: 0.0326 - val_acc: 0.9906
Epoch 2/12
60000/60000 [==============================] - 312s - loss: 0.0324 - acc: 0.9902 - val_loss: 0.0234 - val_acc: 0.9936
Epoch 3/12
60000/60000 [==============================] - 310s - loss: 0.0335 - acc: 0.9895 - val_loss: 0.0218 - val_acc: 0.9930
Epoch 4/12
60000/60000 [==============================] - 309s - loss: 0.0329 - acc: 0.9896 - val_loss: 0.0337 - val_acc: 0.9908
Epoch 5/12
60000/60000 [==============================] - 310s - loss: 0.0314 - acc: 0.9902 - val_loss: 0.0211 - val_acc: 0.9928
Epoch 6/12
60000/60000 [==============================] - 309s - loss: 0.0298 - acc: 0.9910 - val_loss: 0.0287 - val_acc: 0.9925
Epoch 7/12
60000/60000 [==============================] - 312s - loss: 0.0275 - acc: 0.9921 - val_loss: 0.0186 - val_acc: 0.9942
Epoch 8/12
60000/60000 [==============================] - 318s - loss: 0.0272 - acc: 0.9914 - val_loss: 0.0234 - val_acc: 0.9926
Epoch 9/12
60000/60000 [==============================] - 318s - loss: 0.0279 - acc: 0.9917 - val_loss: 0.0231 - val_acc: 0.9934
Epoch 10/12
60000/60000 [==============================] - 318s - loss: 0.0272 - acc: 0.9921 - val_loss: 0.0219 - val_acc: 0.9929
Epoch 11/12
60000/60000 [==============================] - 318s - loss: 0.0272 - acc: 0.9916 - val_loss: 0.0219 - val_acc: 0.9927
Epoch 12/12
60000/60000 [==============================] - 318s - loss: 0.0255 - acc: 0.9918 - val_loss: 0.0192 - val_acc: 0.9936
Out[14]:
<keras.callbacks.History at 0x11b0d5c90>

7. Ensembling

Define a function to automatically train a model:


In [8]:
# I'll set it to display progress at the start of each LR-change
def train_model():
    model = ConvModelBNDo()
    model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1, verbose=1,
                                    validation_data=tst_batches, nb_val_samples=tst_batches.n)
    
    model.optimizer.lr=0.1
    model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1, verbose=1,
                                    validation_data=tst_batches, nb_val_samples=tst_batches.n)
    model.fit_generator(trn_batches, trn_batches.n, nb_epoch=3, verbose=0,
                                    validation_data=tst_batches, nb_val_samples=tst_batches.n)
    
    model.optimizer.lr=0.01
    model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1, verbose=1,
                                    validation_data=tst_batches, nb_val_samples=tst_batches.n)
    model.fit_generator(trn_batches, trn_batches.n, nb_epoch=11, verbose=0,
                                    validation_data=tst_batches, nb_val_samples=tst_batches.n)
    
    model.optimizer.lr=0.001
    model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1, verbose=1,
                                    validation_data=tst_batches, nb_val_samples=tst_batches.n)
    model.fit_generator(trn_batches, trn_batches.n, nb_epoch=11, verbose=0,
                                    validation_data=tst_batches, nb_val_samples=tst_batches.n)
    return model

In [9]:
# Running a little test on the GPU now
testmodel = ConvModelBNDo()
testmodel.fit_generator(trn_batches, trn_batches.n, nb_epoch=1, verbose=1,
                        validation_data=tst_batches, nb_val_samples=tst_batches.n)


Epoch 1/1
60000/60000 [==============================] - 17s - loss: 0.2290 - acc: 0.9318 - val_loss: 0.0768 - val_acc: 0.9752
Out[9]:
<keras.callbacks.History at 0x7fa41578f190>

I finally got my GPU running on my workstation. Decided to leave the ghost of Bill Gates alone and put Ubuntu Linux on the second harddrive. This nvidia GTX 870M takes 17 seconds to get through the 60,000 images. The Core i5 on my Mac took an average of 340. A 20x speed up. This also means, at those numbers, a 6-strong ensemble running the regime in train_model() will take about 49 minutes and 18 seconds, instead of 16 hours and 26 minutes. You can see what the motivation was, for me to spend ~9 hours today and get the GPU working. It's a warm feeling, knowing your computer isn't just good for playing DOOM, but'll be doing its share of work real soon.

So, onward:

Create an array of models


In [12]:
# this'll take some time
models = [train_model() for m in xrange(6)]


Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.2289 - acc: 0.9310 - val_loss: 0.0722 - val_acc: 0.9792
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.0966 - acc: 0.9700 - val_loss: 0.0529 - val_acc: 0.9835
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.0584 - acc: 0.9821 - val_loss: 0.0392 - val_acc: 0.9872
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.0314 - acc: 0.9904 - val_loss: 0.0271 - val_acc: 0.9904
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.2352 - acc: 0.9289 - val_loss: 0.0715 - val_acc: 0.9770
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.0950 - acc: 0.9706 - val_loss: 0.0708 - val_acc: 0.9777
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.0587 - acc: 0.9819 - val_loss: 0.0365 - val_acc: 0.9884
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.0354 - acc: 0.9892 - val_loss: 0.0293 - val_acc: 0.9900
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.2202 - acc: 0.9340 - val_loss: 0.0673 - val_acc: 0.9793
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.0912 - acc: 0.9723 - val_loss: 0.0453 - val_acc: 0.9855
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.0569 - acc: 0.9829 - val_loss: 0.0438 - val_acc: 0.9858
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.0331 - acc: 0.9897 - val_loss: 0.0247 - val_acc: 0.9923
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.2289 - acc: 0.9319 - val_loss: 0.0627 - val_acc: 0.9802
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.0931 - acc: 0.9710 - val_loss: 0.0497 - val_acc: 0.9838
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.0577 - acc: 0.9822 - val_loss: 0.0375 - val_acc: 0.9884
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.0328 - acc: 0.9899 - val_loss: 0.0216 - val_acc: 0.9931
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.2161 - acc: 0.9357 - val_loss: 0.1182 - val_acc: 0.9649
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.0955 - acc: 0.9708 - val_loss: 0.0447 - val_acc: 0.9861
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.0560 - acc: 0.9827 - val_loss: 0.0342 - val_acc: 0.9894
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.0354 - acc: 0.9896 - val_loss: 0.0314 - val_acc: 0.9902
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.2269 - acc: 0.9312 - val_loss: 0.0669 - val_acc: 0.9784
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.0964 - acc: 0.9696 - val_loss: 0.0493 - val_acc: 0.9841
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.0563 - acc: 0.9826 - val_loss: 0.0475 - val_acc: 0.9849
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.0347 - acc: 0.9891 - val_loss: 0.0224 - val_acc: 0.9933

Save the models' weights -- bc this wasn't computationally cheap


In [14]:
from os import getcwd
path = getcwd() + 'data/mnist/'
model_path = path + 'models/'
for i,m in enumerate(models):
    m.save_weights(model_path + 'MNIST_CNN' + str(i) + '.pkl')


---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-14-df3862c740f7> in <module>()
      3 model_path = path + 'models/'
      4 for i,m in enumerate(models):
----> 5     m.save_weights(model_path + 'MNIST_CNN' + str(i) + '.pkl')

/home/wnixalo/miniconda3/envs/FAI/lib/python2.7/site-packages/keras/engine/topology.pyc in save_weights(self, filepath, overwrite)
   2643                     storing the weight value, named after the weight tensor.
   2644         """
-> 2645         import h5py
   2646         # If file exists and should not be overwritten:
   2647         if not overwrite and os.path.isfile(filepath):

ImportError: No module named h5py

Create an array of predictions from the models on the test-set. I'm using a batch size of 256 because that's what was done in lecture, and prediction is such an easier task that I think the large size just helps things go faster.


In [17]:
ensemble_preds = np.stack([m.predict(x_test, batch_size=256) for m in models])

Finally, take the average of the predictions:


In [18]:
avg_preds = ensemble_preds.mean(axis=0)

In [20]:
keras.metrics.categorical_accuracy(y_test, avg_preds).eval()


Out[20]:
array(0.996999979019165, dtype=float32)

Boom. 0.99699.. ~ 99.7% accuracy. Same as achieved in lecture; took roughly 50 minutes to train. Unfortunately I didn't have the h5py module installed when I ran this, so the weight's can't be saved easily -- simple fix of rerunning after install.

Trying the above again, this time having h5py installed.


In [9]:
# this'll take some time
models = [train_model() for m in xrange(6)]


Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.2259 - acc: 0.9318 - val_loss: 0.1109 - val_acc: 0.9646
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.0922 - acc: 0.9715 - val_loss: 0.0647 - val_acc: 0.9788
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.0595 - acc: 0.9818 - val_loss: 0.0338 - val_acc: 0.9887
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.0334 - acc: 0.9900 - val_loss: 0.0374 - val_acc: 0.9882
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.2242 - acc: 0.9315 - val_loss: 0.0697 - val_acc: 0.9773
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.0945 - acc: 0.9711 - val_loss: 0.0560 - val_acc: 0.9828
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.0582 - acc: 0.9825 - val_loss: 0.0391 - val_acc: 0.9875
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.0342 - acc: 0.9894 - val_loss: 0.0233 - val_acc: 0.9931
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.2253 - acc: 0.9323 - val_loss: 0.0667 - val_acc: 0.9796
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.0928 - acc: 0.9712 - val_loss: 0.0492 - val_acc: 0.9833
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.0585 - acc: 0.9819 - val_loss: 0.0395 - val_acc: 0.9869
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.0324 - acc: 0.9896 - val_loss: 0.0218 - val_acc: 0.9933
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.2199 - acc: 0.9336 - val_loss: 0.0652 - val_acc: 0.9784
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.0918 - acc: 0.9715 - val_loss: 0.0481 - val_acc: 0.9846
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.0562 - acc: 0.9826 - val_loss: 0.0373 - val_acc: 0.9876
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.0335 - acc: 0.9896 - val_loss: 0.0285 - val_acc: 0.9907
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.2272 - acc: 0.9316 - val_loss: 0.0839 - val_acc: 0.9744
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.0898 - acc: 0.9724 - val_loss: 0.0794 - val_acc: 0.9738
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.0589 - acc: 0.9823 - val_loss: 0.0527 - val_acc: 0.9838
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.0348 - acc: 0.9891 - val_loss: 0.0241 - val_acc: 0.9928
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.2237 - acc: 0.9324 - val_loss: 0.0680 - val_acc: 0.9767
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.0912 - acc: 0.9721 - val_loss: 0.0549 - val_acc: 0.9838
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.0568 - acc: 0.9825 - val_loss: 0.0385 - val_acc: 0.9876
Epoch 1/1
60000/60000 [==============================] - 18s - loss: 0.0337 - acc: 0.9895 - val_loss: 0.0308 - val_acc: 0.9912

In [18]:
from os import getcwd
import os
path = getcwd() + '/data/mnist/'
model_path = path + 'models/'

if not os.path.exists(path):
    os.mkdir('data')
    os.mkdir('data/mnist')
if not os.path.exists(model_path): os.mkdir(model_path)

for i,m in enumerate(models):
    m.save_weights(model_path + 'MNIST_CNN' + str(i) + '.pkl')

In [19]:
ensemble_preds = np.stack([m.predict(x_test, batch_size=256) for m in models])
avg_preds = ensemble_preds.mean(axis=0)

In [21]:
keras.metrics.categorical_accuracy(y_test, avg_preds).eval()


Out[21]:
array(0.9970999956130981, dtype=float32)

And that's it. 99.71% -- 19 May 2017 - Wayne H Nixalo