In [ ]:

```
import matplotlib.pyplot as plt
%matplotlib inline
from utils import plot_samples, plot_curves
import time

```
In [ ]:

```
import numpy as np
# fix the random seed so that results are reproducible
SEED = 4242
np.random.seed(SEED)

```

Let's begin by loading the MNIST dataset, which we will use throughout the session.

In [ ]:

```
from keras.datasets import mnist
from keras.utils import np_utils
# Load pre-shuffled MNIST data into train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# Display some of the samples
plot_samples(X_train)
X_train.shape

```
In [ ]:

```
from keras.models import Sequential
from keras.layers import Dense, Activation
model = Sequential()
# in the first layer we need to specify the input shape
model.add(Dense(10, input_shape=(784,)))
model.add(Activation('softmax'))
model.summary()
```

**Exercise**: `model.summary()` gave us the total number of trainable parameters of our model. How is this number obtained?
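As a hint, here is the general rule for a fully-connected layer, with a quick check in plain Python (the `dense_params` helper is just for illustration, not a Keras function):

```
# A Dense layer learns one weight per (input, output) pair plus one bias
# per output unit, so: n_params = n_in * n_out + n_out.
def dense_params(n_in, n_out):
    return n_in * n_out + n_out

print(dense_params(784, 10))  # 7850, which should match model.summary()
```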

We flatten and normalize the images to match the input that the network expects:

In [ ]:

```
X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

```

Categories need to be converted to one-hot vectors for training:

In [ ]:

```
nb_classes = 10
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
y_train, Y_train
```
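To make the encoding concrete, here is what the conversion does to a single label, sketched in plain NumPy (this mirrors the output of `np_utils.to_categorical`, not its internals):

```
import numpy as np

# The integer label 3 becomes a length-10 vector with a 1 at index 3.
label = 3
one_hot = np.zeros(10)
one_hot[label] = 1.0
print(one_hot)  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
```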
We are now ready to train. Let's define the optimizer:

In [ ]:

```
from keras.optimizers import SGD
lr = 0.01
# For now we will not decrease the learning rate
decay = 0
optim = SGD(lr=lr, decay=decay, momentum=0.9, nesterov=True)

```
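A quick note on `decay`: in this generation of Keras, SGD applies a time-based schedule of roughly `lr * 1 / (1 + decay * iterations)`, so `decay = 0` keeps the learning rate constant. A small sketch of that formula for intuition (an approximation, not the optimizer's exact internals):

```
# Approximate effective learning rate after a number of update steps
# under Keras-style time-based decay.
def effective_lr(lr, decay, iterations):
    return lr / (1.0 + decay * iterations)

print(effective_lr(0.01, 0.0, 1000))   # 0.01  -> constant when decay = 0
print(effective_lr(0.01, 1e-3, 1000))  # 0.005 -> halved after 1000 updates
```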
In [ ]:

```
model.compile(loss='categorical_crossentropy',
              optimizer=optim,
              metrics=['accuracy'])

```

`model.fit()` will run the training loop for us. We just need to pass the training data `X_train` and the labels `Y_train` as input, and specify the `batch_size` and the number of epochs `nb_epoch` we want to train for. We also pass the test set `(X_test, Y_test)` as validation data, which will allow us to see how the model performs on the test data as training progresses. Let's run it:

In [ ]:

```
batch_size = 32
nb_epoch = 20
verbose = 2
t = time.time()
history = model.fit(X_train, Y_train,
                    batch_size=batch_size, nb_epoch=nb_epoch,
                    verbose=verbose, validation_data=(X_test, Y_test))
print(time.time() - t, "seconds.")

```

The `history` object returned by `model.fit()` stores the loss and metric values recorded at each epoch, which we can plot as training curves. The function `plot_curves`, which is defined in `utils.py`, will do this for us.

In [ ]:

```
plot_curves(history, nb_epoch)
```
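We don't show `utils.py` here, but a minimal sketch of what a helper like `plot_curves` might do, assuming the classic Keras history keys `'loss'`, `'val_loss'`, `'acc'` and `'val_acc'` (key names vary across Keras versions):

```
import matplotlib.pyplot as plt

def plot_curves_sketch(history, nb_epoch):
    # Plot training/validation loss and accuracy, one point per epoch.
    epochs = range(1, nb_epoch + 1)
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.plot(epochs, history.history['loss'], label='train')
    ax1.plot(epochs, history.history['val_loss'], label='validation')
    ax1.set_xlabel('epoch')
    ax1.set_ylabel('loss')
    ax1.legend()
    ax2.plot(epochs, history.history['acc'], label='train')
    ax2.plot(epochs, history.history['val_acc'], label='validation')
    ax2.set_xlabel('epoch')
    ax2.set_ylabel('accuracy')
    ax2.legend()
    plt.show()
```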

The trend of the curves suggests that the model could still improve with longer training, but let's leave it here for now.

Let's now evaluate our model. `model.evaluate()` will take all the test samples, forward them through the network, and return the average loss along with any additional metrics we specified (in our case, the accuracy).

In [ ]:

```
score = model.evaluate(X_test, Y_test, verbose=0)
print ("Loss: %f"%(score[0]))
print ("Accuracy: %f"%(score[1]))

```

Let's try to train a model with a hidden layer between the input and the classifier.

**Exercise**: Modify the previous architecture to include this layer with 128 neurons and train it. Take into account that the `input_shape` must be passed to the first layer of the network.

In [ ]:

```
import numpy as np
np.random.seed(SEED)
# MODEL DEFINITION
model = Sequential()
# ...
model.summary()

```

**Exercise**: Compute the number of parameters by hand and check that they match the ones given by `model.summary()`.

**Exercise**: Write the code to train the model. Check the code we used for the previous model; it should be very similar...

In [ ]:

```
# COMPILE & TRAIN
```

**Exercise**: Add a dropout layer to the model and see its effect on the training curves and accuracy. See the documentation for the `Dropout` layer in Keras.
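As a hint, `Dropout` randomly sets a fraction of its input units to zero at each training update (it is inactive at test time), which helps combat overfitting. One possible placement, sketched here as a suggestion rather than the required solution (the ReLU hidden activation is an assumption):

```
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout

# Sketch: dropout goes right after the activations it should regularize.
dratio = 0.2  # fraction of hidden activations zeroed during training
model = Sequential()
model.add(Dense(128, input_shape=(784,)))
model.add(Activation('relu'))
model.add(Dropout(dratio))
model.add(Dense(10))
model.add(Activation('softmax'))
```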

In [ ]:

```
import numpy as np
np.random.seed(SEED)
from keras.layers import Dropout
dratio = 0.2
H_DIM = 128
model = Sequential()
# ...
model.summary()

```
In [ ]:

```
model.compile(loss='categorical_crossentropy',
              optimizer=optim,
              metrics=['accuracy'])
t = time.time()
history = model.fit(X_train, Y_train,
                    batch_size=batch_size, nb_epoch=nb_epoch,
                    verbose=verbose, validation_data=(X_test, Y_test))
print(time.time() - t, "seconds.")
score = model.evaluate(X_test, Y_test, verbose=0)
print("-" * 10)
print("Loss: %f" % score[0])
print("Accuracy: %f" % score[1])
plot_curves(history, nb_epoch)
```

Did this improve the curves? What about the accuracy?