Although we achieved a formidable score with the MLP above, our result does not hold up to the current state of the art. The best result to date reaches close to 99.8% accuracy, better than human performance! This is why, nowadays, the task of classifying handwritten digits is largely regarded as solved.
To get closer to the state-of-the-art results, we need to use state-of-the-art techniques. Thus, we return to Keras.
In [1]:
import numpy as np
np.random.seed(1337) # for reproducibility
Keras provides a loading function similar to train_test_split from scikit-learn's model_selection module. Its syntax might look strangely familiar to you:
In [2]:
from keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()
The neural nets in Keras act on the feature matrix slightly differently than the standard OpenCV and scikit-learn estimators. Whereas the rows of a feature matrix in Keras still correspond to the number of samples (X_train.shape[0] in the code below), we can preserve the two-dimensional nature of the input images by adding more dimensions to the feature matrix:
In [3]:
img_rows, img_cols = 28, 28
X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
input_shape = (img_rows, img_cols, 1)
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0
Here we have reshaped the feature matrix into a four-dimensional array with dimensions n_samples x 28 x 28 x 1. We also made sure we operate on 32-bit floating point numbers in the range [0, 1], rather than unsigned integers in [0, 255].
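A quick, optional sanity check confirms that the arrays now have the expected shape and data type (MNIST ships with 60,000 training and 10,000 test images):
print(X_train.shape, X_train.dtype)    # expected: (60000, 28, 28, 1) float32
print(X_test.shape, X_test.dtype)      # expected: (10000, 28, 28, 1) float32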
Then, we can one-hot encode the training labels like we did before. This will make sure each category of target labels can be assigned to a neuron in the output layer. We could do this with scikit-learn's preprocessing module, but in this case it is easier to use Keras' own utility function:
In [4]:
from keras.utils import np_utils
n_classes = 10
Y_train = np_utils.to_categorical(y_train, n_classes)
Y_test = np_utils.to_categorical(y_test, n_classes)
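For comparison, the scikit-learn route mentioned above would look something like the following minimal sketch, using LabelBinarizer to produce the same kind of one-hot matrix (the Y_train_sk and Y_test_sk names are just illustrative):
from sklearn.preprocessing import LabelBinarizer
lb = LabelBinarizer()
Y_train_sk = lb.fit_transform(y_train)   # shape (60000, 10), one column per digit class
Y_test_sk = lb.transform(y_test)         # shape (10000, 10)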
In [5]:
from keras.models import Sequential
model = Sequential()
However, this time, we will be smarter about the individual layers. We will design our neural network around a convolutional layer whose kernel is a 3 x 3 pixel two-dimensional filter.
A two-dimensional convolutional layer operates akin to image filtering in OpenCV, where each image in the input data is convolved with a small two-dimensional kernel. In Keras, we can specify the kernel size, the stride, and the padding mode:
In [6]:
from keras.layers import Conv2D
n_filters = 32
kernel_size = (3, 3)
model.add(Conv2D(n_filters, (kernel_size[0], kernel_size[1]),
                 padding='valid',
                 input_shape=input_shape))
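As an aside, the analogy to image filtering in OpenCV can be made concrete: convolving an image with a fixed 3 x 3 kernel via cv2.filter2D is essentially what a single Conv2D filter does, except that the network learns the kernel weights during training. A rough sketch, not part of the network itself, using an arbitrary hand-picked sharpening kernel:
import cv2
kernel = np.array([[0, -1, 0],
                   [-1, 5, -1],
                   [0, -1, 0]], dtype=np.float32)  # hand-picked weights, purely for illustration
filtered = cv2.filter2D(X_train[0, :, :, 0], -1, kernel)  # convolve the first training image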
After that, we will use a rectified linear unit (ReLU) as the activation function:
In [7]:
from keras.layers import Activation
model.add(Activation('relu'))
In a deep convolutional neural net, we can have as many layers as we want. A popular version of this structure applied to MNIST involves performing the convolution and rectification twice:
In [8]:
model.add(Conv2D(n_filters, (kernel_size[0], kernel_size[1])))
model.add(Activation('relu'))
Finally, we will pool the activations and add a Dropout layer:
In [9]:
from keras.layers import MaxPooling2D, Dropout
pool_size = (2, 2)
model.add(MaxPooling2D(pool_size=pool_size))
model.add(Dropout(0.25))
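At this point, it can be instructive to check the shape of the data flowing out of the network so far: two 'valid' 3 x 3 convolutions shrink the 28 x 28 input to 24 x 24, and 2 x 2 max-pooling halves that to 12 x 12. A quick check, assuming the model was built exactly as above:
print(model.output_shape)    # expected: (None, 12, 12, 32)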
Then we will flatten the model and finally pass it through a softmax function to arrive at the output layer:
In [10]:
from keras.layers import Flatten, Dense
model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(n_classes))
model.add(Activation('softmax'))
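With the full stack of layers in place, it can be helpful to print a summary of the architecture to verify the layer output shapes and the number of trainable parameters:
model.summary()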
Here, we will use the categorical cross-entropy loss and the Adadelta optimization algorithm:
In [11]:
model.compile(loss='categorical_crossentropy',
              optimizer='adadelta',
              metrics=['accuracy'])
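If you want finer control over the optimizer, for example its learning rate, you can pass an optimizer object instead of the string. A minimal sketch (1.0 is Adadelta's default rate; the keyword is lr in older Keras releases and learning_rate in newer ones):
from keras.optimizers import Adadelta
model.compile(loss='categorical_crossentropy',
              optimizer=Adadelta(lr=1.0),   # same default, but now the rate is explicit
              metrics=['accuracy'])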
In [12]:
model.fit(X_train, Y_train, batch_size=128, epochs=12,
          verbose=1, validation_data=(X_test, Y_test))
After training completes, we can evaluate the classifier:
In [13]:
model.evaluate(X_test, Y_test, verbose=1)
And we achieve 99.25% accuracy! Worlds apart from the MLP classifier we implemented before. And this is just one way to do things. As you can see, neural networks provide a plethora of tuning parameters, and it is not at all clear which ones will lead to the best performance.
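If you want to go beyond a single accuracy number, you can also inspect individual predictions once training has finished. A minimal sketch for the first ten test images, where np.argmax recovers the predicted digit from each softmax output:
probs = model.predict(X_test[:10])       # class probabilities, shape (10, 10)
predicted = np.argmax(probs, axis=1)     # predicted digit for each image
print(predicted)
print(y_test[:10])                       # ground-truth labels, for comparison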