Convolutional Neural Networks


In this notebook, we train a multilayer perceptron (MLP) to classify images of handwritten digits from the MNIST database.

1. Load the MNIST Database


In [1]:
from keras.datasets import mnist

# use Keras to import pre-shuffled MNIST database
(X_train, y_train), (X_test, y_test) = mnist.load_data()

print("The MNIST database has a training set of %d examples." % len(X_train))
print("The MNIST database has a test set of %d examples." % len(X_test))


Using TensorFlow backend.
The MNIST database has a training set of 60000 examples.
The MNIST database has a test set of 10000 examples.
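
Each example is a 28×28 grayscale image, stored as an array of integer pixel intensities between 0 and 255, and each label is the corresponding digit. A quick way to confirm this (a small check, not part of the original notebook) is to print the array shapes:

print('X_train shape:', X_train.shape)   # (60000, 28, 28)
print('y_train shape:', y_train.shape)   # (60000,)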

2. Visualize the First Six Training Images


In [2]:
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib.cm as cm
import numpy as np

# plot first six training images
fig = plt.figure(figsize=(20,20))
for i in range(6):
    ax = fig.add_subplot(1, 6, i+1, xticks=[], yticks=[])
    ax.imshow(X_train[i], cmap='gray')
    ax.set_title(str(y_train[i]))


3. View an Image in More Detail


In [3]:
def visualize_input(img, ax):
    # show the image, then overlay the numeric value of every pixel,
    # using white text on dark pixels and black text on bright ones
    ax.imshow(img, cmap='gray')
    height, width = img.shape
    thresh = img.max()/2.5
    for x in range(width):
        for y in range(height):
            ax.annotate(str(round(img[y][x], 2)), xy=(x, y),
                        horizontalalignment='center',
                        verticalalignment='center',
                        color='white' if img[y][x] < thresh else 'black')

fig = plt.figure(figsize = (12,12)) 
ax = fig.add_subplot(111)
visualize_input(X_train[0], ax)


4. Rescale the Images by Dividing Every Pixel in Every Image by 255


In [4]:
# rescale [0,255] --> [0,1]
X_train = X_train.astype('float32')/255
X_test = X_test.astype('float32')/255
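
Dividing by 255 maps the 8-bit pixel intensities from [0, 255] onto [0, 1], which keeps the network's inputs on a small, uniform scale that gradient-based training handles well. As a quick sanity check (not in the original notebook), you could confirm the new range:

# confirm the rescaled pixel values lie in [0, 1]
print('min pixel value:', X_train.min())
print('max pixel value:', X_train.max())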

5. Encode Categorical Integer Labels Using a One-Hot Scheme


In [5]:
from keras.utils import np_utils

# print first ten (integer-valued) training labels
print('Integer-valued labels:')
print(y_train[:10])

# one-hot encode the labels
y_train = np_utils.to_categorical(y_train, 10)
y_test = np_utils.to_categorical(y_test, 10)

# print first ten (one-hot) training labels
print('One-hot labels:')
print(y_train[:10])


Integer-valued labels:
[5 0 4 1 9 2 1 3 1 4]
One-hot labels:
[[ 0.  0.  0.  0.  0.  1.  0.  0.  0.  0.]
 [ 1.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  1.  0.  0.  0.  0.  0.]
 [ 0.  1.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]
 [ 0.  0.  1.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  1.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  1.  0.  0.  0.  0.  0.  0.]
 [ 0.  1.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  1.  0.  0.  0.  0.  0.]]
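
Each one-hot vector has a 1 at the index given by the original integer label and 0 everywhere else, which is the target format expected by the categorical_crossentropy loss used below. As an equivalent illustration (not what the notebook itself uses), the same encoding can be produced by indexing a 10×10 identity matrix with the labels:

import numpy as np

# illustrative equivalent of np_utils.to_categorical for integer labels 0-9
labels = np.array([5, 0, 4, 1, 9])
print(np.eye(10)[labels])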

6. Define the Model Architecture


In [6]:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten

# define the model
model = Sequential()
model.add(Flatten(input_shape=X_train.shape[1:]))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='softmax'))

# summarize the model
model.summary()


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten_1 (Flatten)          (None, 784)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 512)               401920    
_________________________________________________________________
dropout_1 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 512)               262656    
_________________________________________________________________
dropout_2 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 10)                5130      
=================================================================
Total params: 669,706.0
Trainable params: 669,706.0
Non-trainable params: 0.0
_________________________________________________________________
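
The parameter counts in the summary follow directly from the layer sizes: a Dense layer has (number of inputs × number of units) weights plus one bias per unit. So dense_1 has 784 × 512 + 512 = 401,920 parameters, dense_2 has 512 × 512 + 512 = 262,656, and dense_3 has 512 × 10 + 10 = 5,130, for a total of 669,706 trainable parameters; the Flatten and Dropout layers add none.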

7. Compile the Model


In [7]:
# compile the model
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', 
              metrics=['accuracy'])
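
categorical_crossentropy is the standard loss for one-hot, multi-class targets: for each example it equals $-\sum_k y_k \log(p_k)$, the negative log-probability the softmax output assigns to the correct digit. RMSprop is the optimizer used to minimize that loss, and accuracy is reported alongside it as a more interpretable metric.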

8. Calculate the Classification Accuracy on the Test Set (Before Training)


In [8]:
# evaluate test accuracy
score = model.evaluate(X_test, y_test, verbose=0)
accuracy = 100*score[1]

# print test accuracy
print('Test accuracy: %.4f%%' % accuracy)


Test accuracy: 6.2500%
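
With ten roughly balanced classes, random guessing would score about 10%, so the untrained network's accuracy of roughly 6% simply reflects its randomly initialized weights; the exact value will differ from run to run.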

9. Train the Model


In [9]:
from keras.callbacks import ModelCheckpoint   

# save the model weights whenever the validation loss improves
checkpointer = ModelCheckpoint(filepath='mnist.model.best.hdf5',
                               verbose=1, save_best_only=True)

# train the model, holding out the last 20% of the training data for validation
hist = model.fit(X_train, y_train, batch_size=128, epochs=10,
                 validation_split=0.2, callbacks=[checkpointer],
                 verbose=1, shuffle=True)


Train on 48000 samples, validate on 12000 samples
Epoch 1/10
47104/48000 [============================>.] - ETA: 0s - loss: 0.2756 - acc: 0.9150 Epoch 00000: val_loss improved from inf to 0.11112, saving model to mnist.model.best.hdf5
48000/48000 [==============================] - 3s - loss: 0.2735 - acc: 0.9155 - val_loss: 0.1111 - val_acc: 0.9663
Epoch 2/10
47616/48000 [============================>.] - ETA: 0s - loss: 0.1105 - acc: 0.9664Epoch 00001: val_loss improved from 0.11112 to 0.09212, saving model to mnist.model.best.hdf5
48000/48000 [==============================] - 2s - loss: 0.1104 - acc: 0.9664 - val_loss: 0.0921 - val_acc: 0.9739
Epoch 3/10
47488/48000 [============================>.] - ETA: 0s - loss: 0.0777 - acc: 0.9764Epoch 00002: val_loss improved from 0.09212 to 0.09021, saving model to mnist.model.best.hdf5
48000/48000 [==============================] - 2s - loss: 0.0779 - acc: 0.9763 - val_loss: 0.0902 - val_acc: 0.9741
Epoch 4/10
47232/48000 [============================>.] - ETA: 0s - loss: 0.0629 - acc: 0.9815Epoch 00003: val_loss did not improve
48000/48000 [==============================] - 2s - loss: 0.0630 - acc: 0.9814 - val_loss: 0.1114 - val_acc: 0.9710
Epoch 5/10
47616/48000 [============================>.] - ETA: 0s - loss: 0.0529 - acc: 0.9840Epoch 00004: val_loss did not improve
48000/48000 [==============================] - 2s - loss: 0.0534 - acc: 0.9839 - val_loss: 0.0987 - val_acc: 0.9752
Epoch 6/10
47616/48000 [============================>.] - ETA: 0s - loss: 0.0442 - acc: 0.9868Epoch 00005: val_loss improved from 0.09021 to 0.08665, saving model to mnist.model.best.hdf5
48000/48000 [==============================] - 3s - loss: 0.0443 - acc: 0.9867 - val_loss: 0.0866 - val_acc: 0.9798
Epoch 7/10
47616/48000 [============================>.] - ETA: 0s - loss: 0.0384 - acc: 0.9886Epoch 00006: val_loss did not improve
48000/48000 [==============================] - 3s - loss: 0.0386 - acc: 0.9886 - val_loss: 0.0926 - val_acc: 0.9800
Epoch 8/10
47616/48000 [============================>.] - ETA: 0s - loss: 0.0324 - acc: 0.9900    Epoch 00007: val_loss did not improve
48000/48000 [==============================] - 3s - loss: 0.0323 - acc: 0.9900 - val_loss: 0.0919 - val_acc: 0.9800
Epoch 9/10
47360/48000 [============================>.] - ETA: 0s - loss: 0.0310 - acc: 0.9907Epoch 00008: val_loss did not improve
48000/48000 [==============================] - 3s - loss: 0.0310 - acc: 0.9907 - val_loss: 0.0996 - val_acc: 0.9805
Epoch 10/10
47744/48000 [============================>.] - ETA: 0s - loss: 0.0272 - acc: 0.9923Epoch 00009: val_loss did not improve
48000/48000 [==============================] - 3s - loss: 0.0272 - acc: 0.9923 - val_loss: 0.1024 - val_acc: 0.9802
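
The best validation loss (0.0866) was reached at epoch 6, and that is the last time the checkpoint file mnist.model.best.hdf5 was updated. In the remaining epochs the training loss keeps falling while the validation loss drifts upward, a mild sign of overfitting, which is exactly why we reload the checkpointed weights in the next step.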

10. Load the Model with the Best Validation Loss


In [10]:
# load the weights that yielded the best validation loss
model.load_weights('mnist.model.best.hdf5')
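
Because ModelCheckpoint monitors the validation loss by default, this restores the weights from the epoch with the lowest validation loss. If you were resuming in a fresh session instead, one option (a sketch, assuming the checkpoint was written with the default save_weights_only=False, as in this notebook) is to load the entire saved model rather than only its weights:

from keras.models import load_model

# restore architecture, weights, and optimizer state from the checkpoint file
model = load_model('mnist.model.best.hdf5')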

11. Calculate the Classification Accuracy on the Test Set


In [11]:
# evaluate test accuracy
score = model.evaluate(X_test, y_test, verbose=0)
accuracy = 100*score[1]

# print test accuracy
print('Test accuracy: %.4f%%' % accuracy)


Test accuracy: 98.2100%
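
As a final illustration (not part of the original notebook), the trained model can be used to classify individual images: model.predict returns a probability distribution over the ten digit classes, and argmax selects the most likely one.

import numpy as np

# predict the first five test images and compare with the true labels
predictions = model.predict(X_test[:5])
print('predicted digits:', np.argmax(predictions, axis=1))
print('true digits:     ', np.argmax(y_test[:5], axis=1))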