MNIST classification

In this session we will learn to design and train simple neural networks and evaluate their performance on MNIST, a dataset of handwritten digits. To do this, we will use the Keras API with TensorFlow, which lets us train and evaluate models with just a few lines of code.


In [ ]:
import matplotlib.pyplot as plt  
%matplotlib inline
from utils import plot_samples, plot_curves
import time

In [ ]:
import numpy as np
# fix NumPy's random seed; TensorFlow keeps its own RNG (tf.random.set_seed), so results may still vary slightly between runs
SEED = 4242
np.random.seed(SEED)

Dataset

Let's begin by loading the MNIST dataset, which we will use during the whole session.


In [ ]:
from keras.datasets import mnist
from keras.utils import np_utils

# Load pre-shuffled MNIST data into train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# Display some of the samples
plot_samples(X_train)
X_train.shape

Multiclass softmax

We will design a network with a single layer that has as many neurons as there are categories in our dataset (10 digits). Each output is a function of all the inputs (pixels), and a softmax activation turns the 10 outputs into class probabilities.


In [ ]:
from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential()
# in the first layer we need to specify the input shape
model.add(Dense(10, input_shape=(784,)))
model.add(Activation('softmax'))

model.summary()
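To see what this model computes, here is a minimal NumPy sketch of the forward pass of a single Dense layer followed by softmax: each of the 10 outputs is a weighted sum of all 784 pixels plus a bias. The weights are random placeholders, not trained values, and the helper names (`softmax`, `dense_softmax_forward`) are hypothetical, not part of Keras.

```python
import numpy as np

def softmax(z):
    # subtract the row-wise max for numerical stability; the result is unchanged
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def dense_softmax_forward(X, W, b):
    # X: (batch, 784), W: (784, 10), b: (10,) -> class probabilities (batch, 10)
    return softmax(X @ W + b)

rng = np.random.default_rng(0)
X = rng.random((5, 784))             # 5 fake flattened images with values in [0, 1)
W = rng.normal(0, 0.01, (784, 10))   # random placeholder weights, NOT trained values
b = np.zeros(10)

probs = dense_softmax_forward(X, W, b)
print(probs.shape)          # (5, 10)
print(probs.sum(axis=1))    # each row sums to 1
```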

Exercise: model.summary() gave us the total number of trainable parameters of our model. How is this number obtained?
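One way to check your answer is to count parameters layer by layer: a Dense layer with n_in inputs and n_out neurons has one weight per input-neuron pair plus one bias per neuron. The helpers below (`dense_params` and `mlp_params`, both hypothetical names) assume a plain stack of Dense layers; Activation and Dropout layers add no trainable parameters.

```python
def dense_params(n_in, n_out):
    # one weight per (input, neuron) pair plus one bias per neuron
    return n_in * n_out + n_out

def mlp_params(layer_sizes):
    # layer_sizes = [input_dim, units_1, ..., units_k] for a stack of Dense layers
    return sum(dense_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))

print(dense_params(784, 10))   # 7850 for the single-layer softmax model
print(mlp_params([784, 10]))   # the same model, described as a list of layer sizes
```

The same `mlp_params` call can be reused for later models by adding the hidden layer sizes to the list.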

We flatten each 28x28 image into a 784-dimensional vector to match the input shape the network expects, and scale the pixel values from [0, 255] to [0, 1]:


In [ ]:
X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

Categories need to be converted to one-hot vectors for training:


In [ ]:
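# newer Keras versions expose to_categorical directly in keras.utils (np_utils was removed)
from keras.utils import to_categorical

nb_classes = 10
Y_train = to_categorical(y_train, nb_classes)
Y_test = to_categorical(y_test, nb_classes)
y_train, Y_train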
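`to_categorical` turns each integer label into a row containing a single 1. A minimal NumPy sketch of the same conversion, assuming integer labels in the range [0, num_classes), is shown below; `one_hot` is a hypothetical helper, not a Keras function.

```python
import numpy as np

def one_hot(labels, num_classes):
    # row i gets a 1 in column labels[i] and 0 everywhere else
    labels = np.asarray(labels, dtype=int)
    out = np.zeros((labels.shape[0], num_classes), dtype="float32")
    out[np.arange(labels.shape[0]), labels] = 1.0
    return out

encoded = one_hot([5, 0, 4], 10)
print(encoded)   # row 0 has its 1 in column 5, row 1 in column 0, row 2 in column 4
```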

We are now ready to train. Let's define the optimizer: stochastic gradient descent (SGD) with a learning rate of 0.01 and Nesterov momentum of 0.9:


In [ ]:
from keras.optimizers import SGD
lr = 0.01
# For now we will not decrease the learning rate (no decay schedule)

# 'learning_rate' replaces the deprecated 'lr' argument in newer Keras versions
optim = SGD(learning_rate=lr, momentum=0.9, nesterov=True)
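To illustrate what `momentum=0.9, nesterov=True` does, here is a minimal NumPy sketch of one Nesterov momentum step, following the update rule given in the Keras SGD documentation as we understand it. The toy quadratic objective and the helper name `sgd_nesterov_step` are assumptions for illustration only.

```python
import numpy as np

def sgd_nesterov_step(w, velocity, grad, lr=0.01, momentum=0.9):
    # velocity accumulates an exponentially decaying sum of past gradient steps
    velocity = momentum * velocity - lr * grad
    # Nesterov variant: move along the updated velocity plus the current gradient step
    w = w + momentum * velocity - lr * grad
    return w, velocity

# hypothetical toy problem: minimize 0.5 * ||w - target||^2
target = np.array([3.0, -2.0])
w = np.zeros(2)
v = np.zeros(2)
for step in range(500):
    grad = w - target   # gradient of 0.5 * ||w - target||^2
    w, v = sgd_nesterov_step(w, v, grad)

print(w)   # close to target
```

In Keras the gradient comes from backpropagating the loss over a mini-batch rather than from a closed-form expression.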

In Keras, we compile the model to specify the loss and the optimizer we want to use. Since this is a multiclass classification problem, we use the categorical cross-entropy loss, which Keras provides as 'categorical_crossentropy'. We also request accuracy as an additional metric, which Keras reports at the end of each epoch:


In [ ]:
model.compile(loss='categorical_crossentropy',
              optimizer=optim,
              metrics=['accuracy'])
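For one-hot targets y and predicted probabilities p, categorical cross-entropy averages -sum_k y_k * log(p_k) over the samples, so only the log-probability of the true class contributes. The sketch below mirrors that definition in NumPy; the clipping constant `eps` guards against log(0) and its value is our assumption. It should agree closely with Keras when the predicted rows already sum to 1.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # clip predictions so that log(0) never occurs
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    # per-sample loss: -sum_k y_k * log(p_k); for one-hot y only the true-class term survives
    per_sample = -np.sum(y_true * np.log(y_pred), axis=1)
    # Keras reports the mean over the batch
    return per_sample.mean()

y_true = np.array([[0, 1, 0], [1, 0, 0]], dtype=float)
y_pred = np.array([[0.1, 0.8, 0.1], [0.5, 0.25, 0.25]])
loss = categorical_crossentropy(y_true, y_pred)
print(loss)   # equals -(log(0.8) + log(0.5)) / 2
```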

Now let's train the model. model.fit() runs the training loop for us. We just need to pass the training data X_train and labels Y_train, and specify the batch_size and the number of epochs (nb_epoch). We also pass the test set (X_test, Y_test) as validation data, which lets us see how the model performs on the test data as training progresses. Let's run it:


In [ ]:
batch_size = 32
nb_epoch = 20
verbose = 2  # print one line per epoch

t = time.time()
# 'epochs' replaces the old Keras 1 argument name 'nb_epoch'
history = model.fit(X_train, Y_train,
                    batch_size=batch_size, epochs=nb_epoch,
                    verbose=verbose, validation_data=(X_test, Y_test))

print(time.time() - t, "seconds.")
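To make batch_size and the number of epochs concrete: each epoch splits the 60,000 training images into mini-batches and performs one weight update per batch. The arithmetic below uses the same values as the cell above; `n_train` is a name introduced here for illustration.

```python
import math

n_train = 60000     # number of MNIST training images
batch_size = 32     # same value as in the training cell
nb_epoch = 20       # same value as in the training cell

steps_per_epoch = math.ceil(n_train / batch_size)
print(steps_per_epoch)              # 1875 mini-batches per epoch
print(steps_per_epoch * nb_epoch)   # 37500 weight updates in total
```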

We can plot the loss and accuracy curves using the history object returned by model.fit(), whose history attribute stores the per-epoch loss and metric values. The function plot_curves, which is defined in utils.py, will do this for us.


In [ ]:
plot_curves(history,nb_epoch)

The curve trend indicates that the model may be able to improve if we train it for longer, but for now let's leave it here.

Let's now evaluate our model. model.evaluate() takes all the test samples, forwards them through the network, and returns the average loss plus any additional metrics we specified (in our case, the accuracy).


In [ ]:
score = model.evaluate(X_test, Y_test, verbose=0)
print ("Loss: %f"%(score[0]))
print ("Accuracy: %f"%(score[1]))
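Accuracy here is the fraction of samples whose most probable predicted class matches the true class. A minimal NumPy sketch with made-up probabilities (not model outputs) follows; `accuracy_from_probs` is a hypothetical helper.

```python
import numpy as np

def accuracy_from_probs(Y_true, probs):
    # predicted class = column with the highest probability;
    # true class = position of the 1 in each one-hot row
    return np.mean(np.argmax(probs, axis=1) == np.argmax(Y_true, axis=1))

Y_true = np.eye(3)[[0, 2, 1, 2]]   # one-hot labels for classes 0, 2, 1, 2
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.5, 0.4, 0.1],   # wrong: predicts 0, true class is 1
                  [0.2, 0.2, 0.6]])
print(accuracy_from_probs(Y_true, probs))   # 0.75
```

With the trained model, `accuracy_from_probs(Y_test, model.predict(X_test))` should agree with score[1] up to rounding.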

Adding a hidden layer

Let's try to train a model with a hidden layer between the input and the classifier.

Exercise: Modify the previous architecture to include a hidden layer with 128 neurons, and train it. Remember that input_shape must be passed to the first layer of the network, which is now the hidden layer.


In [ ]:
import numpy as np
np.random.seed(SEED)

# MODEL DEFINITION
model = Sequential()

# ...

model.summary()

Exercise: Compute the number of parameters and check whether it matches the number reported by model.summary().

Exercise: Write the code to train the model. Check the code we used for the previous model; it should be very similar.


In [ ]:
# COMPILE & TRAIN

Dropout

Exercise: Add a dropout layer to the model and observe its effect on the training curves and the accuracy. See the documentation for the Dropout layer in Keras.
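As described in the Keras documentation, Dropout randomly sets a fraction `rate` of its inputs to 0 during training and scales the remaining ones by 1/(1 - rate), so the expected activation is unchanged; at inference it does nothing. A minimal NumPy sketch of this "inverted dropout" scheme follows; the function name and the all-ones input are illustrative assumptions.

```python
import numpy as np

def dropout(x, rate, training, rng):
    # at inference time dropout is the identity
    if not training or rate == 0.0:
        return x
    # keep each unit independently with probability 1 - rate
    keep = rng.random(x.shape) >= rate
    # rescale survivors so the expected activation matches the no-dropout case
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
h = np.ones((1000, 128))                       # fake hidden activations (H_DIM = 128)
out = dropout(h, 0.2, training=True, rng=rng)

print((out == 0).mean())   # roughly 0.2 of the units are dropped
print(out.mean())          # roughly 1.0, thanks to the rescaling
print(np.array_equal(dropout(h, 0.2, training=False, rng=rng), h))   # True
```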


In [ ]:
import numpy as np
np.random.seed(SEED)

from keras.layers import Dropout

dratio = 0.2
H_DIM = 128

model = Sequential()
# ...
model.summary()

In [ ]:
# use a fresh optimizer for this new model: sharing one optimizer instance
# (and its momentum state) across different models is error-prone in newer Keras versions
model.compile(loss='categorical_crossentropy',
              optimizer=SGD(learning_rate=lr, momentum=0.9, nesterov=True),
              metrics=['accuracy'])
t = time.time()
history = model.fit(X_train, Y_train,
                    batch_size=batch_size, epochs=nb_epoch,
                    verbose=verbose, validation_data=(X_test, Y_test))
print(time.time() - t, "seconds.")

score = model.evaluate(X_test, Y_test, verbose=0)
print("-" * 10)
print("Loss: %f" % score[0])
print("Accuracy: %f" % score[1])
plot_curves(history, nb_epoch)

Did this improve the curves? What about the accuracy?