MNIST digit recognition with a plain feedforward neural network

All credit to: https://github.com/fchollet/keras/blob/master/examples/mnist_mlp.py. The following code is a modified version of that example. The error rate is about 1.9% after 10 epochs of training.


In [1]:
import numpy as np
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import np_utils


Using Theano backend.
WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10).  Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
 https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

Using gpu device 0: GeForce GTX 760 (CNMeM is enabled with initial size: 40.0% of memory, cuDNN 5110)

In [2]:
# Fix random seed for reproducibility
seed = 7
np.random.seed(seed)

# Load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()

In [3]:
# Flatten 28*28 images to a 784 vector for each image
num_pixels = X_train.shape[1] * X_train.shape[2]
X_train = X_train.reshape(X_train.shape[0], num_pixels).astype('float32')
X_test = X_test.reshape(X_test.shape[0], num_pixels).astype('float32')

num_pixels is equal to 784 (28 * 28)

X_train will have the shape (60000, 784)

X_test will have the shape (10000, 784)
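
A quick sanity check of the reshaped arrays (a minimal sketch, not part of the original notebook; the expected values follow from the 28x28 MNIST images):

print(num_pixels)       # 784
print(X_train.shape)    # (60000, 784)
print(X_test.shape)     # (10000, 784)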


In [4]:
# Normalize inputs from 0-255 to 0-1
X_train = X_train / 255
X_test = X_test / 255

In [5]:
# One-hot encode outputs (e.g. 2 --> [0,0,1,0,0,0,0,0,0,0])
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)

num_classes = y_test.shape[1]

One-hot encoding is used because the network has one output neuron per digit (ten neurons for the digits 0-9).

To read off the network's prediction, take the index of the most active output neuron (argmax); this converts the one-hot vector back into a digit, as sketched below.
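
A minimal sketch of that back-conversion with NumPy (np is imported above; the example vector is made up):

one_hot = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])   # one-hot encoding of the digit 2
digit = np.argmax(one_hot)                            # index of the most active entry
print(digit)                                          # 2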


In [6]:
def baseline_model():
    # Create model: one hidden layer with 784 ReLU units, softmax output over the 10 digit classes
    model = Sequential()
    model.add(Dense(num_pixels, input_dim=num_pixels, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

'softmax' generalizes the sigmoid to multiple classes: it turns the output layer's activations into a probability distribution over the 10 digits

'categorical_crossentropy' is the loss (error) function used for multi-class classification

'adam' is the optimizer, i.e. the variant of stochastic gradient descent used to update the weights
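
To illustrate what softmax and categorical cross-entropy compute, here is a minimal NumPy sketch (independent of Keras; the numbers are made up for illustration):

import numpy as np

logits = np.array([1.0, 2.0, 0.5])               # raw outputs of the last layer (example values)
probs = np.exp(logits) / np.sum(np.exp(logits))  # softmax: probabilities that sum to 1

target = np.array([0, 1, 0])                     # one-hot encoded true class
loss = -np.sum(target * np.log(probs))           # categorical cross-entropy for this sample
print(probs, loss)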

Run the model


In [8]:
# Build the model
model = baseline_model()
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200, verbose=2)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Error: %.2f%%" % (100-scores[1]*100))


Train on 60000 samples, validate on 10000 samples
Epoch 1/10
0s - loss: 0.2772 - acc: 0.9218 - val_loss: 0.1385 - val_acc: 0.9591
Epoch 2/10
0s - loss: 0.1129 - acc: 0.9670 - val_loss: 0.1058 - val_acc: 0.9669
Epoch 3/10
0s - loss: 0.0737 - acc: 0.9786 - val_loss: 0.0744 - val_acc: 0.9773
Epoch 4/10
0s - loss: 0.0508 - acc: 0.9850 - val_loss: 0.0742 - val_acc: 0.9764
Epoch 5/10
0s - loss: 0.0375 - acc: 0.9890 - val_loss: 0.0649 - val_acc: 0.9805
Epoch 6/10
0s - loss: 0.0276 - acc: 0.9928 - val_loss: 0.0682 - val_acc: 0.9770
Epoch 7/10
0s - loss: 0.0205 - acc: 0.9946 - val_loss: 0.0607 - val_acc: 0.9809
Epoch 8/10
0s - loss: 0.0148 - acc: 0.9967 - val_loss: 0.0621 - val_acc: 0.9811
Epoch 9/10
0s - loss: 0.0108 - acc: 0.9978 - val_loss: 0.0685 - val_acc: 0.9785
Epoch 10/10
0s - loss: 0.0108 - acc: 0.9975 - val_loss: 0.0619 - val_acc: 0.9809
Error: 1.91%
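
To inspect individual predictions after training, one could call model.predict on a few test images and convert the softmax outputs back to digits with argmax (a sketch, not part of the original run):

predictions = model.predict(X_test[:5])          # softmax outputs, shape (5, 10)
predicted_digits = np.argmax(predictions, axis=1)
true_digits = np.argmax(y_test[:5], axis=1)      # y_test was one-hot encoded above
print(predicted_digits, true_digits)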