Deep Learning: MNIST Analysis


In [1]:
%matplotlib inline
import math
import numpy as np
import utils; reload(utils)
from utils import *

from sympy import Symbol
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Lambda, Dense
from matplotlib import pyplot as plt


Using gpu device 0: Tesla K80 (CNMeM is disabled, cuDNN 5103)
/home/ubuntu/anaconda2/lib/python2.7/site-packages/theano/sandbox/cuda/__init__.py:600: UserWarning: Your cuDNN version is more recent than the one Theano officially supports. If you see any problems, try updating Theano or downgrading cuDNN to version 5.
  warnings.warn(warn)
Using Theano backend.

Workflow for each analysis type (e.g. basic linear model, single dense layer, ...):

  1. Create the model.
  2. Train it for just 1 epoch with the optimizer's default learning rate (0.001 for Keras' Adam) so we can see how quickly the accuracy improves.
  3. Increase the learning rate to 0.1 and train the model for between 4 and 12 epochs (see the note on changing the learning rate right after this list).
  4. Decrease the learning rate to 0.01 and train the model for 4 epochs.
  5. Decrease the learning rate to 0.001 and train the model for 2 epochs.
  6. Decrease the learning rate to 0.0001 and train the model for 1 epoch.
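
A quick note on the learning-rate changes below: the cells in this notebook assign a plain Python float to model.optimizer.lr, exactly as in the original run. In this Keras 1.x / Theano setup the optimizer's lr is a backend variable, so once the model has been compiled that assignment may not reach the already-compiled training function; the sketch below (an editorial aside, not part of the original run) shows a more robust way to update it in place.

In [ ]:
from keras import backend as K

# Update the optimizer's learning-rate variable in place so that a model
# that has already been compiled (and trained) picks up the new value.
K.set_value(model.optimizer.lr, 0.1)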

In [2]:
# We set the random seed so the results are a bit more reproducible.
np.random.seed(1)

In [3]:
# Let's load the data. MNIST can be loaded really easily with Keras!
(X_train, y_train), (X_test, y_test) = mnist.load_data()


Downloading data from https://s3.amazonaws.com/img-datasets/mnist.pkl.gz
15171584/15296311 [============================>.] - ETA: 0s

In [4]:
(X_train.shape, y_train.shape, X_test.shape, y_test.shape)


Out[4]:
((60000, 28, 28), (60000,), (10000, 28, 28), (10000,))

In [5]:
# Keras expects a channel axis; MNIST images are grayscale, so we add a single channel
# at axis 1 (Theano's channels-first ordering), giving shapes (n, 1, 28, 28).
X_test = np.expand_dims(X_test,1)
X_train = np.expand_dims(X_train,1)

In [6]:
# We want each label as a one-hot vector (e.g. 3 -> [0, 0, 0, 1, 0, ...]), so we
# transform the labels with onehot().
y_train = onehot(y_train)
y_test = onehot(y_test)
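
For reference, onehot comes from the course's utils module; assuming it simply wraps Keras' to_categorical (an assumption, since utils.py is not shown here), a minimal equivalent sketch looks like this:

In [ ]:
from keras.utils.np_utils import to_categorical

# Each integer label becomes a length-10 vector with a single 1,
# e.g. 3 -> [0, 0, 0, 1, 0, 0, 0, 0, 0, 0].
def onehot_sketch(labels): return to_categorical(labels, 10)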

In [7]:
# Compute the training set's pixel mean and standard deviation for normalization.
mean_px = X_train.mean().astype(np.float32)
std_px = X_train.std().astype(np.float32)

In [8]:
# We normalize the inputs so the training is more stable.
def norm_input(x): return (x-mean_px)/std_px

Linear Model


In [26]:
# Let's start by implementing a really basic linear model: flatten the pixels and
# feed them to a single softmax layer.
model = Sequential([
    Lambda(norm_input, input_shape=(1,28,28)),
    Flatten(),
    Dense(10, activation='softmax')
])
model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])

In [27]:
# This class creates batches from in-memory image arrays. It is also quite powerful,
# as it lets us do data augmentation (we use that later on).
gen = image.ImageDataGenerator()
batches = gen.flow(X_train, y_train, batch_size=64)
test_batches = gen.flow(X_test, y_test, batch_size=64)

In [28]:
# We train the model with the batches.
model.fit_generator(batches, batches.N, nb_epoch=1, 
                    validation_data=test_batches, nb_val_samples=test_batches.N)


Epoch 1/1
60000/60000 [==============================] - 11s - loss: 0.4208 - acc: 0.8753 - val_loss: 0.2939 - val_acc: 0.9157
Out[28]:
<keras.callbacks.History at 0x7fcfda318190>

In [29]:
# We increase the learning rate to speed up training, and keep going until we start to overfit.
model.optimizer.lr=0.1

In [30]:
model.fit_generator(batches, batches.N, nb_epoch=1, 
                    validation_data=test_batches, nb_val_samples=test_batches.N)


Epoch 1/1
60000/60000 [==============================] - 11s - loss: 0.3002 - acc: 0.9144 - val_loss: 0.2811 - val_acc: 0.9237
Out[30]:
<keras.callbacks.History at 0x7fcfd92fc510>

In [31]:
# We decrease the learning rate so training takes smaller steps, since accuracy barely
# improved in the last epoch.
model.optimizer.lr=0.01

In [32]:
# We train the model for 4 more epochs to see whether we start overfitting.
model.fit_generator(batches, batches.N, nb_epoch=4, 
                    validation_data=test_batches, nb_val_samples=test_batches.N)


Epoch 1/4
60000/60000 [==============================] - 11s - loss: 0.2840 - acc: 0.9200 - val_loss: 0.2714 - val_acc: 0.9240
Epoch 2/4
60000/60000 [==============================] - 11s - loss: 0.2791 - acc: 0.9222 - val_loss: 0.2870 - val_acc: 0.9201
Epoch 3/4
60000/60000 [==============================] - 11s - loss: 0.2725 - acc: 0.9237 - val_loss: 0.2734 - val_acc: 0.9210
Epoch 4/4
60000/60000 [==============================] - 12s - loss: 0.2693 - acc: 0.9255 - val_loss: 0.2770 - val_acc: 0.9259
Out[32]:
<keras.callbacks.History at 0x7fcfd92fc890>

In [33]:
# We are still underfitting! Our model is clearly not complex enough.

Single Dense Layer


In [34]:
# We add a hidden dense layer and follow the same process as before.
# Note: the hidden layer keeps the softmax activation used in the original run;
# 'relu' is the more usual choice for a hidden layer and generally trains better.
model = Sequential([
    Lambda(norm_input, input_shape=(1,28,28)),
    Flatten(),
    Dense(512, activation='softmax'),
    Dense(10, activation='softmax')
    ])
model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])

In [36]:
model.fit_generator(batches, batches.N, nb_epoch=1, 
                    validation_data=test_batches, nb_val_samples=test_batches.N)


Epoch 1/1
60000/60000 [==============================] - 12s - loss: 1.5467 - acc: 0.8874 - val_loss: 1.0143 - val_acc: 0.9259
Out[36]:
<keras.callbacks.History at 0x7fcfd90ea510>

In [37]:
model.optimizer.lr=0.1

In [38]:
model.fit_generator(batches, batches.N, nb_epoch=4, 
                    validation_data=test_batches, nb_val_samples=test_batches.N)


Epoch 1/4
60000/60000 [==============================] - 12s - loss: 0.7422 - acc: 0.9279 - val_loss: 0.5531 - val_acc: 0.9295
Epoch 2/4
60000/60000 [==============================] - 12s - loss: 0.4504 - acc: 0.9349 - val_loss: 0.3907 - val_acc: 0.9318
Epoch 3/4
60000/60000 [==============================] - 12s - loss: 0.3460 - acc: 0.9371 - val_loss: 0.3189 - val_acc: 0.9356
Epoch 4/4
60000/60000 [==============================] - 12s - loss: 0.2984 - acc: 0.9395 - val_loss: 0.3073 - val_acc: 0.9337
Out[38]:
<keras.callbacks.History at 0x7fcfd8862d50>

In [39]:
model.optimizer.lr=0.01

In [40]:
model.fit_generator(batches, batches.N, nb_epoch=4, 
                    validation_data=test_batches, nb_val_samples=test_batches.N)


Epoch 1/4
60000/60000 [==============================] - 12s - loss: 0.2705 - acc: 0.9424 - val_loss: 0.2842 - val_acc: 0.9371
Epoch 2/4
60000/60000 [==============================] - 12s - loss: 0.2512 - acc: 0.9453 - val_loss: 0.2761 - val_acc: 0.9386
Epoch 3/4
60000/60000 [==============================] - 12s - loss: 0.2373 - acc: 0.9470 - val_loss: 0.2635 - val_acc: 0.9385
Epoch 4/4
60000/60000 [==============================] - 12s - loss: 0.2288 - acc: 0.9476 - val_loss: 0.2782 - val_acc: 0.9340
Out[40]:
<keras.callbacks.History at 0x7fcfd8862e50>

In [41]:
# We are overfitting this time!
# That is, the accuracy on the training data is now noticeably higher than the
# accuracy on the validation set.

VGG-Style CNN


In [42]:
# Now we try out a VGG-style model, with several Convolution2D layers and MaxPooling2D.
model = Sequential([
    Lambda(norm_input, input_shape=(1,28,28)),
    Convolution2D(32,3,3, activation='relu'),
    Convolution2D(32,3,3, activation='relu'),
    MaxPooling2D(),
    Convolution2D(64,3,3, activation='relu'),
    Convolution2D(64,3,3, activation='relu'),
    MaxPooling2D(),
    Flatten(),
    Dense(512, activation='relu'),
    Dense(10, activation='softmax')
    ])
model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
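
If you want to inspect the layer output shapes and parameter counts of this VGG-style model, Keras' built-in summary is handy (not run in the original notebook):

In [ ]:
model.summary()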

In [43]:
model.fit_generator(batches, batches.N, nb_epoch=1, 
                    validation_data=test_batches, nb_val_samples=test_batches.N)


Epoch 1/1
60000/60000 [==============================] - 20s - loss: 0.1093 - acc: 0.9658 - val_loss: 0.0321 - val_acc: 0.9899
Out[43]:
<keras.callbacks.History at 0x7fcfd41db690>

In [44]:
model.optimizer.lr=0.1

In [45]:
model.fit_generator(batches, batches.N, nb_epoch=1, 
                    validation_data=test_batches, nb_val_samples=test_batches.N)


Epoch 1/1
60000/60000 [==============================] - 20s - loss: 0.0335 - acc: 0.9902 - val_loss: 0.0288 - val_acc: 0.9910
Out[45]:
<keras.callbacks.History at 0x7fcfc9cb22d0>

In [46]:
model.optimizer.lr=0.01

In [47]:
model.fit_generator(batches, batches.N, nb_epoch=8, 
                    validation_data=test_batches, nb_val_samples=test_batches.N)


Epoch 1/8
60000/60000 [==============================] - 20s - loss: 0.0259 - acc: 0.9920 - val_loss: 0.0267 - val_acc: 0.9906
Epoch 2/8
60000/60000 [==============================] - 20s - loss: 0.0196 - acc: 0.9939 - val_loss: 0.0230 - val_acc: 0.9931
Epoch 3/8
60000/60000 [==============================] - 20s - loss: 0.0150 - acc: 0.9952 - val_loss: 0.0247 - val_acc: 0.9941
Epoch 4/8
60000/60000 [==============================] - 20s - loss: 0.0131 - acc: 0.9955 - val_loss: 0.0204 - val_acc: 0.9939
Epoch 5/8
60000/60000 [==============================] - 20s - loss: 0.0109 - acc: 0.9964 - val_loss: 0.0232 - val_acc: 0.9945
Epoch 6/8
60000/60000 [==============================] - 20s - loss: 0.0091 - acc: 0.9974 - val_loss: 0.0251 - val_acc: 0.9937
Epoch 7/8
60000/60000 [==============================] - 20s - loss: 0.0080 - acc: 0.9975 - val_loss: 0.0277 - val_acc: 0.9933
Epoch 8/8
60000/60000 [==============================] - 20s - loss: 0.0092 - acc: 0.9970 - val_loss: 0.0301 - val_acc: 0.9915
Out[47]:
<keras.callbacks.History at 0x7fcfc9cb2450>

In [ ]:
# This result is impressive! But we are overfitting; let's introduce data augmentation
# to deal with that.

Data Augmentation


In [48]:
model = Sequential([
    Lambda(norm_input, input_shape=(1,28,28)),
    Convolution2D(32,3,3, activation='relu'),
    Convolution2D(32,3,3, activation='relu'),
    MaxPooling2D(),
    Convolution2D(64,3,3, activation='relu'),
    Convolution2D(64,3,3, activation='relu'),
    MaxPooling2D(),
    Flatten(),
    Dense(512, activation='relu'),
    Dense(10, activation='softmax')
    ])
model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])

In [49]:
# This generator randomly transforms the images (rotation, shift, shear, zoom), so each
# epoch effectively sees slightly different training examples.
gen = image.ImageDataGenerator(rotation_range=8, width_shift_range=0.08, shear_range=0.3,
                               height_shift_range=0.08, zoom_range=0.08)
batches = gen.flow(X_train, y_train, batch_size=64)
test_batches = gen.flow(X_test, y_test, batch_size=64)
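
One caveat (an editorial aside, not how the original was run): the cell above feeds the augmenting generator the test images as well, so the validation metrics below are computed on randomly transformed digits. Augmentation is normally reserved for the training data; a plain generator is the more common choice for validation, for example:

In [ ]:
# Keep the validation batches un-augmented so val_loss / val_acc are
# measured on the unmodified test images.
test_batches = image.ImageDataGenerator().flow(X_test, y_test, batch_size=64)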

In [50]:
model.fit_generator(batches, batches.N, nb_epoch=1, 
                    validation_data=test_batches, nb_val_samples=test_batches.N)


Epoch 1/1
60000/60000 [==============================] - 19s - loss: 0.1981 - acc: 0.9376 - val_loss: 0.0779 - val_acc: 0.9747
Out[50]:
<keras.callbacks.History at 0x7fcfc883a990>

In [51]:
model.optimizer.lr=0.1

In [52]:
model.fit_generator(batches, batches.N, nb_epoch=4, 
                    validation_data=test_batches, nb_val_samples=test_batches.N)


Epoch 1/4
60000/60000 [==============================] - 20s - loss: 0.0720 - acc: 0.9780 - val_loss: 0.0497 - val_acc: 0.9831
Epoch 2/4
60000/60000 [==============================] - 19s - loss: 0.0563 - acc: 0.9821 - val_loss: 0.0484 - val_acc: 0.9836
Epoch 3/4
60000/60000 [==============================] - 20s - loss: 0.0455 - acc: 0.9859 - val_loss: 0.0401 - val_acc: 0.9876
Epoch 4/4
60000/60000 [==============================] - 20s - loss: 0.0438 - acc: 0.9864 - val_loss: 0.0387 - val_acc: 0.9872
Out[52]:
<keras.callbacks.History at 0x7fcfc7eba190>

In [53]:
model.optimizer.lr=0.01

In [54]:
model.fit_generator(batches, batches.N, nb_epoch=8, 
                    validation_data=test_batches, nb_val_samples=test_batches.N)


Epoch 1/8
60000/60000 [==============================] - 19s - loss: 0.0373 - acc: 0.9886 - val_loss: 0.0320 - val_acc: 0.9881
Epoch 2/8
60000/60000 [==============================] - 19s - loss: 0.0380 - acc: 0.9882 - val_loss: 0.0342 - val_acc: 0.9898
Epoch 3/8
60000/60000 [==============================] - 19s - loss: 0.0343 - acc: 0.9891 - val_loss: 0.0408 - val_acc: 0.9869
Epoch 4/8
60000/60000 [==============================] - 19s - loss: 0.0325 - acc: 0.9899 - val_loss: 0.0279 - val_acc: 0.9919
Epoch 5/8
60000/60000 [==============================] - 19s - loss: 0.0306 - acc: 0.9905 - val_loss: 0.0273 - val_acc: 0.9917
Epoch 6/8
60000/60000 [==============================] - 20s - loss: 0.0296 - acc: 0.9907 - val_loss: 0.0304 - val_acc: 0.9904
Epoch 7/8
60000/60000 [==============================] - 19s - loss: 0.0282 - acc: 0.9914 - val_loss: 0.0315 - val_acc: 0.9918
Epoch 8/8
60000/60000 [==============================] - 19s - loss: 0.0259 - acc: 0.9920 - val_loss: 0.0310 - val_acc: 0.9912
Out[54]:
<keras.callbacks.History at 0x7fcfc7eba4d0>

In [55]:
model.optimizer.lr=0.001

In [56]:
model.fit_generator(batches, batches.N, nb_epoch=4, 
                    validation_data=test_batches, nb_val_samples=test_batches.N)


Epoch 1/4
60000/60000 [==============================] - 20s - loss: 0.0260 - acc: 0.9921 - val_loss: 0.0414 - val_acc: 0.9880
Epoch 2/4
60000/60000 [==============================] - 20s - loss: 0.0253 - acc: 0.9917 - val_loss: 0.0310 - val_acc: 0.9906
Epoch 3/4
60000/60000 [==============================] - 20s - loss: 0.0228 - acc: 0.9927 - val_loss: 0.0340 - val_acc: 0.9908
Epoch 4/4
60000/60000 [==============================] - 20s - loss: 0.0234 - acc: 0.9927 - val_loss: 0.0263 - val_acc: 0.9916
Out[56]:
<keras.callbacks.History at 0x7fcfc7eba090>

In [ ]:
# Not bad! We are still overfitting, but much less. Let's look at other techniques that
# might be useful in your analyses.

Batch Normalization + Data Augmentation


In [57]:
# Now let's apply batch normalization, which normalizes the activations flowing through
# the network (axis=1 is the channel axis for our channels-first data).
model = Sequential([
    Lambda(norm_input, input_shape=(1,28,28)),
    Convolution2D(32,3,3, activation='relu'),
    BatchNormalization(axis=1),
    Convolution2D(32,3,3, activation='relu'),
    MaxPooling2D(),
    BatchNormalization(axis=1),
    Convolution2D(64,3,3, activation='relu'),
    BatchNormalization(axis=1),
    Convolution2D(64,3,3, activation='relu'),
    MaxPooling2D(),
    Flatten(),
    BatchNormalization(),
    Dense(512, activation='relu'),
    BatchNormalization(),
    Dense(10, activation='softmax')
    ])
model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])

In [58]:
model.fit_generator(batches, batches.N, nb_epoch=1, 
                    validation_data=test_batches, nb_val_samples=test_batches.N)


Epoch 1/1
60000/60000 [==============================] - 47s - loss: 0.1600 - acc: 0.9507 - val_loss: 0.0681 - val_acc: 0.9795
Out[58]:
<keras.callbacks.History at 0x7fcfc3bb4490>

In [59]:
model.optimizer.lr=0.1

In [60]:
model.fit_generator(batches, batches.N, nb_epoch=4, 
                    validation_data=test_batches, nb_val_samples=test_batches.N)


Epoch 1/4
60000/60000 [==============================] - 46s - loss: 0.0709 - acc: 0.9776 - val_loss: 0.0438 - val_acc: 0.9849
Epoch 2/4
60000/60000 [==============================] - 46s - loss: 0.0598 - acc: 0.9811 - val_loss: 0.0419 - val_acc: 0.9867
Epoch 3/4
60000/60000 [==============================] - 46s - loss: 0.0517 - acc: 0.9834 - val_loss: 0.0522 - val_acc: 0.9843
Epoch 4/4
60000/60000 [==============================] - 46s - loss: 0.0488 - acc: 0.9852 - val_loss: 0.0374 - val_acc: 0.9884
Out[60]:
<keras.callbacks.History at 0x7fcfc3bb4590>

In [61]:
model.optimizer.lr=0.01

In [62]:
model.fit_generator(batches, batches.N, nb_epoch=4, 
                    validation_data=test_batches, nb_val_samples=test_batches.N)


Epoch 1/4
60000/60000 [==============================] - 46s - loss: 0.0423 - acc: 0.9866 - val_loss: 0.0416 - val_acc: 0.9869
Epoch 2/4
60000/60000 [==============================] - 46s - loss: 0.0432 - acc: 0.9866 - val_loss: 0.0373 - val_acc: 0.9877
Epoch 3/4
60000/60000 [==============================] - 46s - loss: 0.0387 - acc: 0.9879 - val_loss: 0.0297 - val_acc: 0.9907
Epoch 4/4
60000/60000 [==============================] - 46s - loss: 0.0359 - acc: 0.9888 - val_loss: 0.0358 - val_acc: 0.9890
Out[62]:
<keras.callbacks.History at 0x7fcfc7768a90>

In [63]:
model.optimizer.lr=0.001

In [64]:
model.fit_generator(batches, batches.N, nb_epoch=1, 
                    validation_data=test_batches, nb_val_samples=test_batches.N)


Epoch 1/1
60000/60000 [==============================] - 46s - loss: 0.0336 - acc: 0.9891 - val_loss: 0.0402 - val_acc: 0.9874
Out[64]:
<keras.callbacks.History at 0x7fcfc7ebaad0>

Batch Normalization + Data Augmentation + Dropout


In [65]:
# We are overfitting again; let's add a Dropout layer, which randomly zeroes half of the
# dense layer's activations during training and so reduces overfitting.
def get_model_bn_do():
    model = Sequential([
        Lambda(norm_input, input_shape=(1,28,28)),
        Convolution2D(32,3,3, activation='relu'),
        BatchNormalization(axis=1),
        Convolution2D(32,3,3, activation='relu'),
        MaxPooling2D(),
        BatchNormalization(axis=1),
        Convolution2D(64,3,3, activation='relu'),
        BatchNormalization(axis=1),
        Convolution2D(64,3,3, activation='relu'),
        MaxPooling2D(),
        Flatten(),
        BatchNormalization(),
        Dense(512, activation='relu'),
        BatchNormalization(),
        Dropout(0.5),
        Dense(10, activation='softmax')
        ])
    model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
    return model
model = get_model_bn_do()

In [66]:
model.fit_generator(batches, batches.N, nb_epoch=1, 
                    validation_data=test_batches, nb_val_samples=test_batches.N)


Epoch 1/1
60000/60000 [==============================] - 45s - loss: 0.2195 - acc: 0.9344 - val_loss: 0.0652 - val_acc: 0.9796
Out[66]:
<keras.callbacks.History at 0x7fcfab895110>

In [67]:
model.optimizer.lr=0.1

In [68]:
model.fit_generator(batches, batches.N, nb_epoch=4, 
                    validation_data=test_batches, nb_val_samples=test_batches.N)


Epoch 1/4
60000/60000 [==============================] - 45s - loss: 0.0921 - acc: 0.9710 - val_loss: 0.0461 - val_acc: 0.9840
Epoch 2/4
60000/60000 [==============================] - 45s - loss: 0.0752 - acc: 0.9766 - val_loss: 0.0406 - val_acc: 0.9871
Epoch 3/4
60000/60000 [==============================] - 45s - loss: 0.0673 - acc: 0.9792 - val_loss: 0.0348 - val_acc: 0.9884
Epoch 4/4
60000/60000 [==============================] - 45s - loss: 0.0632 - acc: 0.9804 - val_loss: 0.0398 - val_acc: 0.9867
Out[68]:
<keras.callbacks.History at 0x7fcfab8959d0>

In [69]:
model.optimizer.lr=0.01

In [70]:
model.fit_generator(batches, batches.N, nb_epoch=12, 
                    validation_data=test_batches, nb_val_samples=test_batches.N)


Epoch 1/12
60000/60000 [==============================] - 45s - loss: 0.0570 - acc: 0.9828 - val_loss: 0.0400 - val_acc: 0.9877
Epoch 2/12
60000/60000 [==============================] - 45s - loss: 0.0525 - acc: 0.9840 - val_loss: 0.0430 - val_acc: 0.9871
Epoch 3/12
60000/60000 [==============================] - 45s - loss: 0.0483 - acc: 0.9841 - val_loss: 0.0308 - val_acc: 0.9911
Epoch 4/12
60000/60000 [==============================] - 45s - loss: 0.0480 - acc: 0.9855 - val_loss: 0.0295 - val_acc: 0.9916
Epoch 5/12
60000/60000 [==============================] - 45s - loss: 0.0459 - acc: 0.9861 - val_loss: 0.0295 - val_acc: 0.9897
Epoch 6/12
60000/60000 [==============================] - 45s - loss: 0.0437 - acc: 0.9869 - val_loss: 0.0294 - val_acc: 0.9915
Epoch 7/12
60000/60000 [==============================] - 45s - loss: 0.0414 - acc: 0.9872 - val_loss: 0.0242 - val_acc: 0.9914
Epoch 8/12
60000/60000 [==============================] - 45s - loss: 0.0417 - acc: 0.9873 - val_loss: 0.0338 - val_acc: 0.9909
Epoch 9/12
60000/60000 [==============================] - 45s - loss: 0.0380 - acc: 0.9884 - val_loss: 0.0352 - val_acc: 0.9895
Epoch 10/12
60000/60000 [==============================] - 45s - loss: 0.0393 - acc: 0.9881 - val_loss: 0.0264 - val_acc: 0.9918
Epoch 11/12
60000/60000 [==============================] - 45s - loss: 0.0359 - acc: 0.9892 - val_loss: 0.0272 - val_acc: 0.9920
Epoch 12/12
60000/60000 [==============================] - 45s - loss: 0.0365 - acc: 0.9891 - val_loss: 0.0240 - val_acc: 0.9924
Out[70]:
<keras.callbacks.History at 0x7fcfb202fdd0>

In [71]:
model.optimizer.lr=0.001

In [72]:
model.fit_generator(batches, batches.N, nb_epoch=1, 
                    validation_data=test_batches, nb_val_samples=test_batches.N)


Epoch 1/1
60000/60000 [==============================] - 45s - loss: 0.0336 - acc: 0.9895 - val_loss: 0.0271 - val_acc: 0.9920
Out[72]:
<keras.callbacks.History at 0x7fcfb202f0d0>

Ensembling


In [73]:
# Finally, let's try ensembling: train several copies of the same model and average
# their predictions.
def fit_model():
    model = get_model_bn_do()
    model.fit_generator(batches, batches.N, nb_epoch=1, verbose=0,
                        validation_data=test_batches, nb_val_samples=test_batches.N)
    model.optimizer.lr=0.1
    model.fit_generator(batches, batches.N, nb_epoch=4, verbose=0,
                        validation_data=test_batches, nb_val_samples=test_batches.N)
    model.optimizer.lr=0.01
    model.fit_generator(batches, batches.N, nb_epoch=12, verbose=0,
                        validation_data=test_batches, nb_val_samples=test_batches.N)
    model.optimizer.lr=0.001
    model.fit_generator(batches, batches.N, nb_epoch=18, verbose=0,
                        validation_data=test_batches, nb_val_samples=test_batches.N)
    return model

In [ ]:
models = [fit_model() for i in range(6)]

In [ ]:
path = "data/mnist/"
model_path = path + 'models/'

In [ ]:
for i,m in enumerate(models):
    m.save_weights(model_path+'cnn-mnist23-'+str(i)+'.pkl')

In [ ]:
evals = np.array([m.evaluate(X_test, y_test, batch_size=256) for m in models])

In [ ]:
evals.mean(axis=0)

In [ ]:
all_preds = np.stack([m.predict(X_test, batch_size=256) for m in models])

In [ ]:
all_preds.shape

In [ ]:
avg_preds = all_preds.mean(axis=0)

In [ ]:
keras.metrics.categorical_accuracy(y_test, avg_preds).eval()
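
As a sanity check, the same ensemble accuracy can be computed directly in NumPy (a small editorial sketch):

In [ ]:
# Accuracy of the averaged (ensembled) predictions: compare the argmax of the
# averaged softmax outputs with the argmax of the one-hot test labels.
(avg_preds.argmax(axis=1) == y_test.argmax(axis=1)).mean()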