So far we have been treating images as flattened arrays of data. One might argue that this representation is not ideal, since flattening 2D images discards all spatial information. Let's now use a different network that exploits this spatial structure through convolutional layers.
In [ ]:
import matplotlib.pyplot as plt
%matplotlib inline
from utils import plot_samples, plot_curves
import time
In [ ]:
import numpy as np
# force random seed for results to be reproducible
SEED = 4242
np.random.seed(SEED)
In the first part of this session we will again use the MNIST data to train the network. With fully connected layers we achieved an accuracy of nearly 0.98 on the test set; let's see if we can improve on this with convolutional layers.
In [ ]:
from keras.datasets import mnist
from keras.utils import np_utils
# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
img_rows = X_train.shape[1]
img_cols = X_train.shape[2]
X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1) # add a channel axis of depth 1
X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
input_shape = (img_rows, img_cols, 1)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
nb_classes = 10
# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
Exercise: Design a model with convolutional layers for MNIST classification.
You may also need to use other layer types:
keras.layers.pooling.MaxPooling2D(pool_size=(2, 2), border_mode='valid')
keras.layers.core.Flatten()
keras.layers.core.Activation()
keras.layers.core.Dropout()
Keep things simple: remember that training these models is computationally expensive. Limit the number of layers and neurons to keep the parameter count down; more than 100K parameters is not recommended when training on a CPU.
In [ ]:
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
from keras.layers import Convolution2D, MaxPooling2D, Flatten
# number of convolutional filters to use
nb_filters = 8
# size of pooling area for max pooling
pool_size = (2, 2)
# convolution kernel size
kernel_size = (3, 3)
# dimension of the hidden fully connected layer
h_dim = 28
# dropout ratio used before the classifier
dr_ratio = 0.2
# something of the form:
# conv - relu - conv - relu - maxpool - dense - relu - dropout - classifier - softmax
model = Sequential()
# ...
model.summary()
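If you get stuck, here is one possible stack that follows the comment above. Treat it as a sketch rather than the official solution: it simply reuses the hyperparameters defined in the previous cell and stays well under the 100K parameter budget.
In [ ]:
model = Sequential()
# conv - relu - conv - relu
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
                        border_mode='valid', input_shape=input_shape))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
# maxpool
model.add(MaxPooling2D(pool_size=pool_size))
# dense - relu - dropout
model.add(Flatten())
model.add(Dense(h_dim))
model.add(Activation('relu'))
model.add(Dropout(dr_ratio))
# classifier - softmax
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
model.summary()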
Exercise: The convolutional layer in Keras has a parameter border_mode, which can take the values 'valid' (no padding) or 'same' (zero padding so the output keeps the input's spatial size). What is the impact on the number of parameters when setting it to 'valid' or 'same', respectively? Why does the number of parameters change?
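One way to investigate this is to build two minimal models that differ only in border_mode and compare their summaries. The single conv + dense stack below is just a quick experiment sketch (its layer sizes are arbitrary); look at which layer's parameter count changes between the two summaries.
In [ ]:
# compare parameter counts for 'valid' vs 'same' padding
for mode in ['valid', 'same']:
    m = Sequential()
    m.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
                        border_mode=mode, input_shape=input_shape))
    m.add(Flatten())
    m.add(Dense(nb_classes, activation='softmax'))
    print('border_mode =', mode)
    m.summary()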
Let's train the model. Notice that this procedure is going to take a lot longer than the ones in the previous session. Convolutions are computationally expensive, and even the simplest model takes a long time to train without a GPU.
In [ ]:
from keras.optimizers import SGD
lr = 0.01
# For now we will not decrease the learning rate
decay = 0
optim = SGD(lr=lr, decay=decay, momentum=0.9, nesterov=True)
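As a side note on the decay argument: when it is non-zero, Keras's SGD applies a time-based schedule of roughly lr / (1 + decay * iterations). The small helper below only illustrates that formula (effective_lr is our own hypothetical function, not part of Keras); we will pass a small non-zero decay later for CIFAR-10.
In [ ]:
# illustrative only: effective learning rate under time-based decay,
# assuming a schedule of the form lr / (1 + decay * iterations)
def effective_lr(lr, decay, iterations):
    return lr / (1.0 + decay * iterations)

for it in [0, 10000, 100000]:
    print(it, effective_lr(0.01, 1e-6, it))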
In [ ]:
batch_size = 32
nb_epoch = 10
model.compile(loss='categorical_crossentropy',
              optimizer=optim,
              metrics=['accuracy'])
t = time.time()
# GeForce GTX 980 - 161 seconds, 30 epochs, batch size 128
# GeForce GTX Titan Black - 200 seconds, 30 epochs, batch size 128
history = model.fit(X_train, Y_train,
                    batch_size=batch_size, nb_epoch=nb_epoch,
                    verbose=2, validation_data=(X_test, Y_test))
print(time.time() - t, "seconds.")
score = model.evaluate(X_test, Y_test, verbose=0)
print("-" * 10)
print("Loss: %f" % score[0])
print("Accuracy: %f" % score[1])
plot_curves(history, nb_epoch)
In [ ]:
model.save('../models/mnist_conv.h5')
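As a quick reminder for later sessions, a model saved this way can be restored with keras.models.load_model. A minimal sketch, assuming the file was written to the path above:
In [ ]:
from keras.models import load_model
# reload the trained MNIST convnet from disk
restored = load_model('../models/mnist_conv.h5')
restored.summary()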
We could probably train this model for a lot longer and results would still improve. Since our time is limited, let's move on for now.
At this point we already know how to train a convnet for classification. Let's now switch to a more challenging dataset of colour images: CIFAR-10.
In [ ]:
from keras.datasets import cifar10
import numpy as np
np.random.seed(4242)
Let's load the dataset and display some samples:
In [ ]:
# The data, shuffled and split between train and test sets:
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
plot_samples(X_train,5)
We format the data before training:
In [ ]:
nb_classes = 10
# Convert class vectors to binary class matrices.
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
img_rows = X_train.shape[1]
img_cols = X_train.shape[2]
# CIFAR-10 images are RGB, so the input now has 3 channels
input_shape = (img_rows, img_cols, 3)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print('input_shape:', input_shape)
Exercise: Design and train a convnet to classify CIFAR-10 images. Hint: the images now have 3 channels instead of 1!
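If you need a starting point, here is one possible design, again only a sketch: the filter counts and hidden layer size below are arbitrary choices meant to keep the model small enough to train on a CPU.
In [ ]:
nb_filters = 16
model = Sequential()
# conv - relu - maxpool (32x32 -> 16x16)
model.add(Convolution2D(nb_filters, 3, 3, border_mode='same',
                        input_shape=input_shape))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# conv - relu - maxpool (16x16 -> 8x8)
model.add(Convolution2D(2 * nb_filters, 3, 3, border_mode='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# dense - relu - dropout - classifier - softmax
model.add(Flatten())
model.add(Dense(32))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
model.summary()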
In [ ]:
nb_epoch = 10
lr = 0.01
decay = 1e-6
optim = SGD(lr=lr, decay=decay, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
              optimizer=optim,
              metrics=['accuracy'])
t = time.time()
history = model.fit(X_train, Y_train,
                    batch_size=batch_size, nb_epoch=nb_epoch,
                    verbose=2, validation_data=(X_test, Y_test))
print(time.time() - t, 'seconds.')
In [ ]:
score = model.evaluate(X_test, Y_test, verbose=0)
print("Loss: %f" % score[0])
print("Accuracy: %f" % score[1])
plot_curves(history, nb_epoch)
In [ ]:
model.save('../models/cifar10.h5')
Final notes: there probably won't be enough time to properly train the CIFAR-10 model to convergence, but you can try to finish training at home. Track the training and validation curves to know when to stop training. A fully trained CIFAR-10 model will be provided for the next practical sessions.
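If you do continue training at home, Keras callbacks can automate the "when to stop" decision. A sketch, assuming you re-run fit with a larger number of epochs (the checkpoint path below is just an example):
In [ ]:
from keras.callbacks import EarlyStopping, ModelCheckpoint
# stop when validation loss stops improving, and keep the best weights on disk
callbacks = [
    EarlyStopping(monitor='val_loss', patience=3),
    ModelCheckpoint('../models/cifar10_best.h5', monitor='val_loss',
                    save_best_only=True)
]
history = model.fit(X_train, Y_train,
                    batch_size=batch_size, nb_epoch=50,
                    verbose=2, validation_data=(X_test, Y_test),
                    callbacks=callbacks)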