In [1]:
import keras
import numpy as np
from keras.datasets import mnist
from keras.optimizers import Adam
from keras.models import Sequential
from keras.preprocessing import image
from keras.layers.core import Dense
from keras.layers.core import Lambda
from keras.layers.core import Flatten
from keras.layers.core import Dropout
from keras.layers.pooling import MaxPooling2D
from keras.layers.convolutional import Convolution2D
from keras.layers.normalization import BatchNormalization
from keras.utils.np_utils import to_categorical
I want to import Vgg16 as well because I'll want it's low-level features
In [ ]:
# import os, sys
# sys.path.insert(1, os.path.join('../utils/'))
Actually, looks like Vgg's ImageNet weights won't be needed.
In [ ]:
# from vgg16 import Vgg16
# vgg = Vgg16()
In [2]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()
In [3]:
x_train = np.expand_dims(x_train, 1) # can also enter <axis=1> for <1>
x_test = np.expand_dims(x_test, 1)
x_train.shape
Out[3]:
One-Hot Encoding the outputs:
In [4]:
y_train, y_test = to_categorical(y_train), to_categorical(y_test)
Since this notebook's models are all mimicking Vgg16, the input data should be preprocessed in the same way: in this case normalized by subtracting the mean and dividing by the standard deviation. It turns out this is a good idea generally.
In [5]:
x_mean = x_train.mean().astype(np.float32)
x_stdv = x_train.std().astype(np.float32)
def norm_input(x): return (x - x_mean) / x_stdv
In [6]:
gen = image.ImageDataGenerator()
trn_batches = gen.flow(x_train, y_train, batch_size=64)
tst_batches = gen.flow(x_test, y_test, batch_size=64)
General workflow, going forward:
Points on internal architecture:
Lambda layer, which normalizes the input and assigns a shape of (1 color-channel x 28 pixels x 28 pixels)aka 'Dense', 'Fully-Connected'
In [24]:
def LinModel():
model = Sequential([
Lambda(norm_input, input_shape=(1, 28, 28)),
Flatten(),
Dense(10, activation='softmax')
])
model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
return model
In [25]:
Linear_model = LinModel()
Linear_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1,
validation_data=tst_batches, nb_val_samples=trn_batches.n)
Out[25]:
In [27]:
Linear_model.optimizer.lr=0.1
Linear_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1,
validation_data=tst_batches, nb_val_samples=tst_batches.n)
Out[27]:
In [28]:
Linear_model.optimizer.lr=0.01
Linear_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=4,
validation_data=tst_batches, nb_val_samples=tst_batches.n)
Out[28]:
In [29]:
Linear_model.optimizer.lr=0.001
Linear_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=8,
validation_data=tst_batches, nb_val_samples=tst_batches.n)
Out[29]:
This is what people in the 80s & 90s thought of as a 'Neural Network': a single Fully-Connected hidden layer. I don't yet know why the hidden layer is ouputting 512 units. For natural-image recognition it's 4096. I'll see whether a ReLU or Softmax hidden layer works better.
By the way, the training and hyper-parameter tuning process should be automated. I want to use a NN to figure out how to do that for me.
In [30]:
def FCModel():
model = Sequential([
Lambda(norm_input, input_shape=(1, 28, 28)),
Dense(512, activation='relu'),
Flatten(),
Dense(10, activation='softmax')
])
model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
return model
In [32]:
FC_model = FCModel()
FC_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1,
validation_data=tst_batches, nb_val_samples=tst_batches.n)
Out[32]:
In [34]:
FC_model.optimizer=0.1
FC_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1,
validation_data=tst_batches, nb_val_samples=tst_batches.n)
Out[34]:
In [35]:
FC_model.optimizer=0.01
FC_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=4,
validation_data=tst_batches, nb_val_samples=tst_batches.n)
Out[35]:
With an accuracy of 0.9823 and validation accuracy of 0.9664, the model's starting to overfit significantly and hit its limits, so it's time to go on to the next technique.
I'm specifying an output shape equal to the input shape, to suppress the warnings keras was giving me; and it stated it was defaulting to that anyway. Or maybe I should've written output_shape=input_shape
Aha: yes it's as I thought. See this thread -- output_shape warnings were added to Keras, and neither vgg16.py (nor I until now) were specifying output_shape. It's fine.
The first time I ran this, I forgot to have 2 pairs of Conv layers. At the third λr=0.01 epoch I had acc/val of 0.9964, 0.9878
Also noticing: in lecture JH was using a GPU which I think was an NVidia Titan X. I'm using an Intel Core i5 CPU on a MacBook Pro. His epochs took on average 6 seconds, mine are taking 180~190. Convolutions are also the most computationally-intensive part of the NN being built here.
Interestingly, the model with 2 Conv-layer pairs is taking avg 160s. Best Acc/Val: 0.9968/0.9944
Final: 0.9975/0.9918 - massive overfitting
In [46]:
def ConvModel():
model = Sequential([
Lambda(norm_input, input_shape=(1, 28, 28), output_shape=(1, 28, 28)),
Convolution2D(32, 3, 3, activation='relu'),
Convolution2D(32, 3, 3, activation='relu'),
MaxPooling2D(),
Convolution2D(64, 3, 3, activation='relu'),
Convolution2D(64, 3, 3, activation='relu'),
MaxPooling2D(),
Flatten(),
Dense(512, activation='relu'),
Dense(10, activation='softmax')
])
model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
return model
In [47]:
CNN_model = ConvModel()
CNN_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1,
validation_data=tst_batches, nb_val_samples=tst_batches.n)
Out[47]:
In [48]:
CNN_model.optimizer=0.1
CNN_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1, verbose=1,
validation_data=tst_batches, nb_val_samples=tst_batches.n)
Out[48]:
In [49]:
CNN_model.optimizer=0.01
CNN_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=4, verbose=1,
validation_data=tst_batches, nb_val_samples=tst_batches.n)
Out[49]:
In [50]:
# Running again until validation accuracy stops increasing
CNN_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=4, verbose=1,
validation_data=tst_batches, nb_val_samples=tst_batches.n)
Out[50]:
In [6]:
gen = image.ImageDataGenerator(rotation_range=8, width_shift_range=0.08, shear_range=0.3,
height_shift_range=0.08, zoom_range=0.08)
trn_batches = gen.flow(x_train, y_train, batch_size=64)
tst_batches = gen.flow(x_test, y_test, batch_size=64)
In [55]:
CNN_Aug_model = ConvModel()
CNN_Aug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1, verbose=1,
validation_data=tst_batches, nb_val_samples=tst_batches.n)
# upping LR
print("Learning Rate, η = 0.1")
CNN_Aug_model.optimizer.lr=0.1
CNN_Aug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1, verbose=1,
validation_data=tst_batches, nb_val_samples=tst_batches.n)
# brining LR back down for more epochs
print("Learning Rate, η = 0.01")
CNN_Aug_model.optimizer.lr=0.01
CNN_Aug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=4, verbose=1,
validation_data=tst_batches, nb_val_samples=tst_batches.n)
Out[55]:
In [56]:
# 4 more epochs at η=0.01
CNN_Aug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=4, verbose=1,
validation_data=tst_batches, nb_val_samples=tst_batches.n)
Out[56]:
See this thread for info on BatchNorm axis.
In [8]:
def ConvModelBN():
model = Sequential([
Lambda(norm_input, input_shape=(1, 28, 28), output_shape=(1, 28, 28)),
Convolution2D(32, 3, 3, activation='relu'),
BatchNormalization(axis=1),
Convolution2D(32, 3, 3, activation='relu'),
MaxPooling2D(),
BatchNormalization(axis=1),
Convolution2D(64, 3, 3, activation='relu'),
BatchNormalization(axis=1),
Convolution2D(64, 3, 3, activation='relu'),
MaxPooling2D(),
Flatten(),
BatchNormalization(),
Dense(512, activation='relu'),
BatchNormalization(),
Dense(10, activation='softmax')
])
model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
return model
In [9]:
CNN_BNAug_model = ConvModelBN()
CNN_BNAug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1, verbose=1,
validation_data=tst_batches, nb_val_samples=tst_batches.n)
print("Learning Rate, η = 0.1")
CNN_BNAug_model.optimizer=0.1
CNN_BNAug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=2, verbose=1,
validation_data=tst_batches, nb_val_samples=tst_batches.n)
print("Learning Rate, η = 0.01")
CNN_BNAug_model.optimizer=0.01
CNN_BNAug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=6, verbose=1,
validation_data=tst_batches, nb_val_samples=tst_batches.n)
Out[9]:
In [10]:
# some more training at 0.1 and 0.01:
print("Learning Rate, η = 0.1")
CNN_BNAug_model.optimizer=0.1
CNN_BNAug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1, verbose=1,
validation_data=tst_batches, nb_val_samples=tst_batches.n)
print("Learning Rate, η = 0.01")
CNN_BNAug_model.optimizer=0.01
CNN_BNAug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=6, verbose=1,
validation_data=tst_batches, nb_val_samples=tst_batches.n)
Out[10]:
In [7]:
def ConvModelBNDo():
model = Sequential([
Lambda(norm_input, input_shape=(1, 28, 28), output_shape=(1, 28, 28)),
Convolution2D(32, 3, 3, activation='relu'),
BatchNormalization(axis=1),
Convolution2D(32, 3, 3, activation='relu'),
MaxPooling2D(),
BatchNormalization(axis=1),
Convolution2D(64, 3, 3, activation='relu'),
BatchNormalization(axis=1),
Convolution2D(64, 3, 3, activation='relu'),
MaxPooling2D(),
Flatten(),
BatchNormalization(),
Dense(512, activation='relu'),
BatchNormalization(),
Dropout(0.5),
Dense(10, activation='softmax')
])
model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
return model
In [12]:
CNN_BNDoAug_model = ConvModelBNDo()
CNN_BNDoAug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1, verbose=1,
validation_data=tst_batches, nb_val_samples=tst_batches.n)
print("Learning Rate, η = 0.1")
CNN_BNDoAug_model.optimizer.lr=0.1
CNN_BNDoAug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=4, verbose=1,
validation_data=tst_batches, nb_val_samples=tst_batches.n)
print("Learning Rate, η = 0.01")
CNN_BNDoAug_model.optimizer.lr=0.01
CNN_BNDoAug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=6, verbose=1,
validation_data=tst_batches, nb_val_samples=tst_batches.n)
Out[12]:
In [13]:
# 6 more epochs at 0.01
CNN_BNDoAug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=6, verbose=1,
validation_data=tst_batches, nb_val_samples=tst_batches.n)
Out[13]:
In [14]:
print("Learning Rate η = 0.001")
CNN_BNDoAug_model.optimizer.lr=0.001
CNN_BNDoAug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=12, verbose=1,
validation_data=tst_batches, nb_val_samples=tst_batches.n)
Out[14]:
Define a function to automatically train a model:
In [8]:
# I'll set it to display progress at the start of each LR-change
def train_model():
model = ConvModelBNDo()
model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1, verbose=1,
validation_data=tst_batches, nb_val_samples=tst_batches.n)
model.optimizer.lr=0.1
model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1, verbose=1,
validation_data=tst_batches, nb_val_samples=tst_batches.n)
model.fit_generator(trn_batches, trn_batches.n, nb_epoch=3, verbose=0,
validation_data=tst_batches, nb_val_samples=tst_batches.n)
model.optimizer.lr=0.01
model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1, verbose=1,
validation_data=tst_batches, nb_val_samples=tst_batches.n)
model.fit_generator(trn_batches, trn_batches.n, nb_epoch=11, verbose=0,
validation_data=tst_batches, nb_val_samples=tst_batches.n)
model.optimizer.lr=0.001
model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1, verbose=1,
validation_data=tst_batches, nb_val_samples=tst_batches.n)
model.fit_generator(trn_batches, trn_batches.n, nb_epoch=11, verbose=0,
validation_data=tst_batches, nb_val_samples=tst_batches.n)
return model
In [9]:
# Running a little test on the GPU now
testmodel = ConvModelBNDo()
testmodel.fit_generator(trn_batches, trn_batches.n, nb_epoch=1, verbose=1,
validation_data=tst_batches, nb_val_samples=tst_batches.n)
Out[9]:
I finally got my GPU running on my workstation. Decided to leave the ghost of Bill Gates alone and put Ubuntu Linux on the second harddrive. This nvidia GTX 870M takes 17 seconds to get through the 60,000 images. The Core i5 on my Mac took an average of 340. A 20x speed up. This also means, at those numbers, a 6-strong ensemble running the regime in train_model() will take about 49 minutes and 18 seconds, instead of 16 hours and 26 minutes. You can see what the motivation was, for me to spend ~9 hours today and get the GPU working. It's a warm feeling, knowing your computer isn't just good for playing DOOM, but'll be doing its share of work real soon.
So, onward:
Create an array of models
In [12]:
# this'll take some time
models = [train_model() for m in xrange(6)]
Save the models' weights -- bc this wasn't computationally cheap
In [14]:
from os import getcwd
path = getcwd() + 'data/mnist/'
model_path = path + 'models/'
for i,m in enumerate(models):
m.save_weights(model_path + 'MNIST_CNN' + str(i) + '.pkl')
Create an array of predictions from the models on the test-set. I'm using a batch size of 256 because that's what was done in lecture, and prediction is such an easier task that I think the large size just helps things go faster.
In [17]:
ensemble_preds = np.stack([m.predict(x_test, batch_size=256) for m in models])
Finally, take the average of the predictions:
In [18]:
avg_preds = ensemble_preds.mean(axis=0)
In [20]:
keras.metrics.categorical_accuracy(y_test, avg_preds).eval()
Out[20]:
Boom. 0.99699.. ~ 99.7% accuracy. Same as achieved in lecture; took roughly 50 minutes to train. Unfortunately I didn't have the h5py module installed when I ran this, so the weight's can't be saved easily -- simple fix of rerunning after install.
Trying the above again, this time having h5py installed.
In [9]:
# this'll take some time
models = [train_model() for m in xrange(6)]
In [18]:
from os import getcwd
import os
path = getcwd() + '/data/mnist/'
model_path = path + 'models/'
if not os.path.exists(path):
os.mkdir('data')
os.mkdir('data/mnist')
if not os.path.exists(model_path): os.mkdir(model_path)
for i,m in enumerate(models):
m.save_weights(model_path + 'MNIST_CNN' + str(i) + '.pkl')
In [19]:
ensemble_preds = np.stack([m.predict(x_test, batch_size=256) for m in models])
avg_preds = ensemble_preds.mean(axis=0)
In [21]:
keras.metrics.categorical_accuracy(y_test, avg_preds).eval()
Out[21]:
And that's it. 99.71% -- 19 May 2017 - Wayne H Nixalo