Introduction to Keras with MNIST

Import various modules that we need for this notebook.


In [1]:
%pylab inline

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import SGD, RMSprop
from keras.utils import np_utils
from keras.regularizers import l2


Using Theano backend.
/Users/taylor/anaconda3/lib/python3.5/site-packages/theano/tensor/signal/downsample.py:5: UserWarning: downsample module has been moved to the pool module.
  warnings.warn("downsample module has been moved to the pool module.")
Populating the interactive namespace from numpy and matplotlib

Load the MNIST dataset, flatten the images, convert the class labels, and scale the data.


In [2]:
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(60000, 28**2).astype('float32') / 255
X_test = X_test.reshape(10000, 28**2).astype('float32') / 255
Y_train = np_utils.to_categorical(y_train, 10)
Y_test = np_utils.to_categorical(y_test, 10)
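
As a quick sanity check (this cell is illustrative and was not part of the original run), the arrays should now hold 784 flattened pixel values per image and 10 one-hot encoded class indicators.


In [ ]:
# Illustrative check of the preprocessed shapes; expected values shown in the comments.
print(X_train.shape, Y_train.shape)  # (60000, 784) (60000, 10)
print(X_test.shape, Y_test.shape)    # (10000, 784) (10000, 10)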

I. Basic example

Build and compile a basic model.


In [3]:
model = Sequential()
model.add(Dense(512, input_shape=(28 * 28,)))
model.add(Activation("sigmoid"))
model.add(Dense(10))
          
sgd = SGD(lr = 0.01, momentum = 0.9, nesterov = True)
model.compile(loss='mse', optimizer=sgd)
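
As a rough check (not from the original notebook), the number of trainable parameters can be counted by hand: one weight per input-output pair in each dense layer, plus one bias per output unit.


In [ ]:
# Illustrative hand count of the basic model's parameters.
n_hidden = 28 * 28 * 512 + 512   # first Dense layer: weights plus biases
n_output = 512 * 10 + 10         # output Dense layer: weights plus biases
print(n_hidden + n_output)       # 407050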

Fit the model over 10 epochs.


In [4]:
model.fit(X_train, Y_train, batch_size=32, nb_epoch=10,
          verbose=1, show_accuracy=True, validation_split=0.1)


Train on 54000 samples, validate on 6000 samples
Epoch 1/10
54000/54000 [==============================] - 9s - loss: 0.0496 - acc: 0.7842 - val_loss: 0.0403 - val_acc: 0.8690
Epoch 2/10
54000/54000 [==============================] - 10s - loss: 0.0427 - acc: 0.8421 - val_loss: 0.0390 - val_acc: 0.8727
Epoch 3/10
54000/54000 [==============================] - 10s - loss: 0.0415 - acc: 0.8462 - val_loss: 0.0384 - val_acc: 0.8810
Epoch 4/10
54000/54000 [==============================] - 10s - loss: 0.0407 - acc: 0.8494 - val_loss: 0.0380 - val_acc: 0.8725
Epoch 5/10
54000/54000 [==============================] - 10s - loss: 0.0401 - acc: 0.8517 - val_loss: 0.0380 - val_acc: 0.8827
Epoch 6/10
54000/54000 [==============================] - 11s - loss: 0.0394 - acc: 0.8548 - val_loss: 0.0362 - val_acc: 0.8788
Epoch 7/10
54000/54000 [==============================] - 11s - loss: 0.0386 - acc: 0.8562 - val_loss: 0.0351 - val_acc: 0.8820
Epoch 8/10
54000/54000 [==============================] - 13s - loss: 0.0376 - acc: 0.8611 - val_loss: 0.0346 - val_acc: 0.8863
Epoch 9/10
54000/54000 [==============================] - 13s - loss: 0.0364 - acc: 0.8651 - val_loss: 0.0328 - val_acc: 0.8892
Epoch 10/10
54000/54000 [==============================] - 12s - loss: 0.0351 - acc: 0.8706 - val_loss: 0.0320 - val_acc: 0.8928
Out[4]:
<keras.callbacks.History at 0x104402e48>

Evaluate the model on the test set.


In [6]:
print("Test classification rate %0.05f" % model.evaluate(X_test, Y_test, show_accuracy=True)[1])


10000/10000 [==============================] - 0s     
Test classification rate 0.87740

Predict classes on the test set.


In [ ]:
y_hat = model.predict_classes(X_test)
pd.crosstab(y_hat, y_test)
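
The diagonal of the cross-tabulation holds the correctly classified digits. As an illustrative follow-up (not in the original notebook), the test classification rate can also be recovered directly from the predicted classes.


In [ ]:
# Illustrative: classification rate computed directly from the predictions.
print(np.mean(y_hat == y_test))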

II. Deeper model with dropout and cross entropy

Let's now build a deeper model, with four hidden dense layers and dropout layers. I'll use rectified linear units, as they tend to perform better in deep models. I also initialize the weights using "glorot_normal", which draws them from a zero-mean Gaussian whose scale is set by the number of inputs plus outputs of the node. Notice that we do not need to give an input shape to any layer other than the first.

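To make the initialization concrete, here is a minimal sketch (assuming the usual Glorot scaling of sqrt(2 / (fan_in + fan_out)), which is not spelled out in the original notebook) of the scale used for the first 512-unit layer.


In [ ]:
# Illustrative: approximate standard deviation used by glorot_normal for the first layer.
fan_in, fan_out = 28 * 28, 512
print(np.sqrt(2.0 / (fan_in + fan_out)))  # roughly 0.039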

In [ ]:
model = Sequential()

model.add(Dense(512, input_shape=(28 * 28,), init="glorot_normal"))
model.add(Activation("relu"))
model.add(Dropout(0.5))

model.add(Dense(512, init="glorot_normal"))
model.add(Activation("relu"))
model.add(Dropout(0.5))

model.add(Dense(512, init="glorot_normal"))
model.add(Activation("relu"))
model.add(Dropout(0.5))

model.add(Dense(512, init="glorot_normal"))
model.add(Activation("relu"))
model.add(Dropout(0.5))

model.add(Dense(10))
model.add(Activation('softmax'))

In [ ]:
sgd = SGD(lr = 0.01, momentum = 0.9, nesterov = True)
model.compile(loss='categorical_crossentropy', optimizer=sgd)

model.fit(X_train, Y_train, batch_size=32, nb_epoch=10,
          verbose=1, show_accuracy=True, validation_split=0.1)

In [ ]:
print("Test classification rate %0.05f" % model.evaluate(X_test, Y_test, show_accuracy=True)[1])
y_hat = model.predict_classes(X_test)
pd.crosstab(y_hat, y_test)

In [ ]:
test_wrong = [im for im in zip(X_test,y_hat,y_test) if im[1] != im[2]]

plt.figure(figsize=(15, 15))
for ind, val in enumerate(test_wrong[:100]):
    plt.subplot(10, 10, ind + 1)
    im = 1 - val[0].reshape((28,28))
    axis("off")
    plt.imshow(im, cmap='gray')

III. Small model: Visualizing weights

Now, I want to make a model that has only a small number of hidden nodes in each layer. We may then have a chance of actually visualizing the weights.


In [ ]:
model = Sequential()

model.add(Dense(16, input_shape=(28 * 28,), init="glorot_normal"))
model.add(Activation("relu"))
model.add(Dropout(0.5))

model.add(Dense(16, init="glorot_normal"))
model.add(Activation("relu"))
model.add(Dropout(0.5))

model.add(Dense(10))
model.add(Activation('softmax'))

rms = RMSprop()
model.compile(loss='categorical_crossentropy', optimizer=rms)

model.fit(X_train, Y_train, batch_size=32, nb_epoch=10,
          verbose=1, show_accuracy=True, validation_split=0.1)

The classification rate on the validation set is not nearly as good as with the larger model, but it is still not too bad overall. A model object contains a list of its layers, and the weights are easy to pull out.


In [ ]:
print(model.layers) # list of the layers
print(model.layers[0].get_weights()[0].shape) # shape of the first layer's weight matrix

The first set of weights is given as a matrix whose columns have the same length as the input space, so each column can be reshaped into a 28x28 image and plotted.


In [ ]:
W1 = model.layers[0].get_weights()[0]

for ind, val in enumerate(W1.T):
    plt.figure(figsize=(3, 3), frameon=False)
    im = val.reshape((28,28))
    plt.axis("off")
    plt.imshow(im, cmap='seismic')

The second set of weights is a single 16x16 matrix, which we can plot directly.


In [ ]:
W2 = model.layers[3].get_weights()[0]

plt.figure(figsize=(3, 3))
im = W2.reshape((16,16))
plt.axis("off")
plt.imshow(im, cmap='seismic')

IV. Further tweaks: weight regularization and alternative optimizers

Just to show off a few more tweaks, we'll run one final model. Here we use weight regularization (an l2 penalty on two of the hidden layers) and RMSprop, an alternative to vanilla stochastic gradient descent.


In [ ]:
model = Sequential()

model.add(Dense(128, input_shape=(28 * 28,), init="glorot_normal"))
model.add(Activation("relu"))
model.add(Dropout(0.5))

model.add(Dense(512, init="glorot_normal",W_regularizer=l2(0.1)))
model.add(Activation("relu"))
model.add(Dropout(0.2))

model.add(Dense(512, init="glorot_normal",W_regularizer=l2(0.1)))
model.add(Activation("relu"))
model.add(Dropout(0.2))

model.add(Dense(10))
model.add(Activation('softmax'))

In [ ]:
rms = RMSprop()
model.compile(loss='categorical_crossentropy', optimizer=rms)

model.fit(X_train, Y_train, batch_size=32, nb_epoch=5,
          verbose=1, show_accuracy=True, validation_split=0.1)
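
The l2(0.1) regularizer adds 0.1 times the sum of squared weights of the penalized layer to the loss. As an illustrative check (not part of the original notebook), that penalty can be computed by hand from the fitted weights of the first regularized layer.


In [ ]:
# Illustrative: the l2 penalty contributed by the first regularized Dense layer.
W = model.layers[3].get_weights()[0]  # weight matrix of the 512-unit layer with l2(0.1)
print(0.1 * np.sum(W ** 2))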
