Introduction to Keras with MNIST

Import various modules that we need for this notebook.


In [1]:
%pylab inline

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import SGD, RMSprop
from keras.utils import np_utils
from keras.regularizers import l2


Using Theano backend.
/Users/taylor/anaconda3/lib/python3.5/site-packages/theano/tensor/signal/downsample.py:5: UserWarning: downsample module has been moved to the pool module.
  warnings.warn("downsample module has been moved to the pool module.")
Populating the interactive namespace from numpy and matplotlib

Load the MNIST dataset, flatten the images, convert the class labels to one-hot vectors, and scale the pixel values to [0, 1].


In [2]:
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(60000, 28**2).astype('float32') / 255
X_test = X_test.reshape(10000, 28**2).astype('float32') / 255
Y_train = np_utils.to_categorical(y_train, 10)
Y_test = np_utils.to_categorical(y_test, 10)
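
The flattened data matrices now have one 784-long row per image and the labels are one-hot vectors; a quick check of the resulting shapes:


In [ ]:
print(X_train.shape, X_test.shape)    # (60000, 784) (10000, 784)
print(Y_train.shape, Y_test.shape)    # (60000, 10) (10000, 10)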

I. Basic example

Build and compile a basic model.


In [3]:
model = Sequential()
model.add(Dense(512, input_shape=(28 * 28,)))
model.add(Activation("sigmoid"))
model.add(Dense(10))
          
sgd = SGD(lr=0.01, momentum=0.9, nesterov=True)
model.compile(loss='mse', optimizer=sgd)

Fit the model over 10 epochs.


In [4]:
model.fit(X_train, Y_train, batch_size=32, nb_epoch=10,
          verbose=1, show_accuracy=True, validation_split=0.1)


Train on 54000 samples, validate on 6000 samples
Epoch 1/10
54000/54000 [==============================] - 10s - loss: 0.0497 - acc: 0.7867 - val_loss: 0.0419 - val_acc: 0.8592
Epoch 2/10
54000/54000 [==============================] - 11s - loss: 0.0427 - acc: 0.8439 - val_loss: 0.0388 - val_acc: 0.8773
Epoch 3/10
54000/54000 [==============================] - 11s - loss: 0.0415 - acc: 0.8475 - val_loss: 0.0392 - val_acc: 0.8683
Epoch 4/10
54000/54000 [==============================] - 11s - loss: 0.0408 - acc: 0.8499 - val_loss: 0.0370 - val_acc: 0.8857
Epoch 5/10
54000/54000 [==============================] - 13s - loss: 0.0402 - acc: 0.8514 - val_loss: 0.0368 - val_acc: 0.8848
Epoch 6/10
54000/54000 [==============================] - 13s - loss: 0.0395 - acc: 0.8549 - val_loss: 0.0378 - val_acc: 0.8710
Epoch 7/10
54000/54000 [==============================] - 14s - loss: 0.0388 - acc: 0.8582 - val_loss: 0.0356 - val_acc: 0.8848
Epoch 8/10
54000/54000 [==============================] - 14s - loss: 0.0379 - acc: 0.8607 - val_loss: 0.0341 - val_acc: 0.8918
Epoch 9/10
54000/54000 [==============================] - 13s - loss: 0.0368 - acc: 0.8656 - val_loss: 0.0329 - val_acc: 0.8930
Epoch 10/10
54000/54000 [==============================] - 14s - loss: 0.0355 - acc: 0.8705 - val_loss: 0.0332 - val_acc: 0.8932
Out[4]:
<keras.callbacks.History at 0x1041ffbe0>

Evaluate the model on the test set.


In [5]:
print("Test classification rate %0.05f" % model.evaluate(X_test, Y_test, show_accuracy=True)[1])


10000/10000 [==============================] - 0s     
Test classification rate 0.87960

Predict classes on the test set.


In [6]:
y_hat = model.predict_classes(X_test)
pd.crosstab(y_hat, y_test)


10000/10000 [==============================] - 0s     
Out[6]:
col_0 0 1 2 3 4 5 6 7 8 9
row_0
0 947 0 18 3 0 17 15 5 16 18
1 0 1116 32 6 25 16 7 45 39 14
2 1 2 848 16 4 5 6 16 10 1
3 2 2 30 923 1 77 0 8 45 17
4 0 0 8 0 828 6 7 10 7 27
5 7 1 0 10 3 690 21 1 35 2
6 16 5 36 9 22 22 901 3 18 2
7 1 0 16 13 1 11 0 880 5 23
8 4 9 36 16 6 25 1 0 760 2
9 2 0 8 14 92 23 0 60 39 903
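
Rows of this crosstab are the predicted classes and columns are the true labels, so the diagonal counts the correctly classified digits. Summing it and dividing by the total number of test images should recover the classification rate printed above; a quick check:


In [ ]:
tab = pd.crosstab(y_hat, y_test)
print(np.diag(tab).sum() / tab.values.sum())   # should agree with the 0.87960 above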

II. Deeper model with dropout and cross entropy

Let's now build a deeper model, with four hidden dense layers and dropout after each of them. I'll use rectified linear units, as they tend to perform better in deep models. I also initialize the weights using "glorot_normal", which draws Gaussian noise scaled by the sum of the number of inputs and outputs of the node. Notice that we do not need to give an input shape to any layer other than the first.
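
To make the "glorot_normal" scaling concrete, here is a minimal numpy sketch of the idea (an illustration of the scheme, not Keras's own code): weights are drawn from a zero-mean Gaussian whose standard deviation is sqrt(2 / (fan_in + fan_out)), so wider layers start with smaller weights.


In [ ]:
fan_in, fan_out = 28 * 28, 512                 # matches the first Dense layer below
stddev = np.sqrt(2.0 / (fan_in + fan_out))
W0 = np.random.normal(0.0, stddev, size=(fan_in, fan_out))
print(stddev, W0.std())                        # the empirical spread should be close to the target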


In [7]:
model = Sequential()

model.add(Dense(512, input_shape=(28 * 28,), init="glorot_normal"))
model.add(Activation("relu"))
model.add(Dropout(0.5))

model.add(Dense(512, init="glorot_normal"))
model.add(Activation("relu"))
model.add(Dropout(0.5))

model.add(Dense(512, init="glorot_normal"))
model.add(Activation("relu"))
model.add(Dropout(0.5))

model.add(Dense(512, init="glorot_normal"))
model.add(Activation("relu"))
model.add(Dropout(0.5))

model.add(Dense(10))
model.add(Activation('softmax'))

In [8]:
sgd = SGD(lr=0.01, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd)

In [9]:
model.fit(X_train, Y_train, batch_size=32, nb_epoch=10,
          verbose=1, show_accuracy=True, validation_split=0.1)


Train on 54000 samples, validate on 6000 samples
Epoch 1/10
54000/54000 [==============================] - 27s - loss: 0.5660 - acc: 0.8167 - val_loss: 0.1335 - val_acc: 0.9612
Epoch 2/10
54000/54000 [==============================] - 27s - loss: 0.2677 - acc: 0.9242 - val_loss: 0.0948 - val_acc: 0.9727
Epoch 3/10
54000/54000 [==============================] - 28s - loss: 0.2115 - acc: 0.9408 - val_loss: 0.0894 - val_acc: 0.9747
Epoch 4/10
54000/54000 [==============================] - 29s - loss: 0.1840 - acc: 0.9478 - val_loss: 0.0771 - val_acc: 0.9780
Epoch 5/10
54000/54000 [==============================] - 29s - loss: 0.1617 - acc: 0.9538 - val_loss: 0.0739 - val_acc: 0.9782
Epoch 6/10
54000/54000 [==============================] - 29s - loss: 0.1419 - acc: 0.9602 - val_loss: 0.0729 - val_acc: 0.9805
Epoch 7/10
54000/54000 [==============================] - 31s - loss: 0.1373 - acc: 0.9611 - val_loss: 0.0766 - val_acc: 0.9780
Epoch 8/10
54000/54000 [==============================] - 33s - loss: 0.1268 - acc: 0.9637 - val_loss: 0.0632 - val_acc: 0.9818
Epoch 9/10
54000/54000 [==============================] - 32s - loss: 0.1148 - acc: 0.9674 - val_loss: 0.0693 - val_acc: 0.9808
Epoch 10/10
54000/54000 [==============================] - 30s - loss: 0.1084 - acc: 0.9695 - val_loss: 0.0706 - val_acc: 0.9800
Out[9]:
<keras.callbacks.History at 0x123192cc0>

In [ ]:
print("Test classification rate %0.05f" % model.evaluate(X_test, Y_test, show_accuracy=True)[1])
y_hat = model.predict_classes(X_test)
pd.crosstab(y_hat, y_test)

In [ ]:
test_wrong = [im for im in zip(X_test, y_hat, y_test) if im[1] != im[2]]  # misclassified test images

plt.figure(figsize=(15, 15))
for ind, val in enumerate(test_wrong[:100]):
    plt.subplot(10, 10, ind + 1)
    im = 1 - val[0].reshape((28, 28))   # invert so digits are dark on white
    plt.axis("off")
    plt.imshow(im, cmap='gray')

III. Small model: Visualizing weights

Now, I want to make a model that has only a small number of hidden nodes in each layer. We may then have a chance of actually visualizing the weights.


In [ ]:
model = Sequential()

model.add(Dense(16, input_shape=(28 * 28,), init="glorot_normal"))
model.add(Activation("relu"))
model.add(Dropout(0.5))

model.add(Dense(16, init="glorot_normal"))
model.add(Activation("relu"))
model.add(Dropout(0.5))

model.add(Dense(10))
model.add(Activation('softmax'))

rms = RMSprop()
model.compile(loss='categorical_crossentropy', optimizer=rms)

model.fit(X_train, Y_train, batch_size=32, nb_epoch=10,
          verbose=1, show_accuracy=True, validation_split=0.1)

The classification rate on the validation set is not nearly as high as with the larger models, but it is still not too bad given how few hidden nodes there are. A model object contains a list of its layers, and the weights are easy to pull out.


In [ ]:
print(model.layers) # list of the layers
print(model.layers[0].get_weights()[0].shape) # the weights

The first set of weights has one column per hidden node, each the same size as the input space, so every column can be reshaped and displayed as a 28x28 image.


In [ ]:
W1 = model.layers[0].get_weights()[0]

for ind, val in enumerate(W1.T):
    plt.figure(figsize=(3, 3), frameon=False)
    im = val.reshape((28,28))
    plt.axis("off")
    plt.imshow(im, cmap='seismic')

The second set of weights connects the two 16-node hidden layers, so it is a single 16x16 matrix that can be displayed directly as an image.


In [ ]:
W2 = model.layers[3].get_weights()[0]

plt.figure(figsize=(3, 3))
im = W2.reshape((16,16))
plt.axis("off")
plt.imshow(im, cmap='seismic')

IV. Further tweaks: weight regularization and alternative optimizers

Just to show off a few more tweaks, we'll run one final model. Here we use l2 weight regularization and RMSprop, an alternative to vanilla stochastic gradient descent.
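
Roughly, l2(0.1) adds a penalty of 0.1 times the sum of the squared weights of that layer to the training loss, nudging the weights toward zero. A minimal sketch of the idea (the helper below is just illustrative, not Keras internals):


In [ ]:
def l2_penalty(W, lam=0.1):
    # penalty added to the loss for one layer's weight matrix
    return lam * np.sum(W ** 2)

W_example = np.random.normal(0, 0.05, size=(128, 512))
print(l2_penalty(W_example))   # larger weights inflate the loss, so they are discouraged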


In [ ]:
model = Sequential()

model.add(Dense(128, input_shape=(28 * 28,), init="glorot_normal"))
model.add(Activation("relu"))
model.add(Dropout(0.5))

model.add(Dense(512, init="glorot_normal", W_regularizer=l2(0.1)))
model.add(Activation("relu"))
model.add(Dropout(0.2))

model.add(Dense(512, init="glorot_normal", W_regularizer=l2(0.1)))
model.add(Activation("relu"))
model.add(Dropout(0.2))

model.add(Dense(10))
model.add(Activation('softmax'))

In [ ]:
rms = RMSprop()
model.compile(loss='categorical_crossentropy', optimizer=rms)

model.fit(X_train, Y_train, batch_size=32, nb_epoch=5,
          verbose=1, show_accuracy=True, validation_split=0.1)
