This is an annotated version of Keras's example CNN code, showing how to implement convolutional networks for image classification.
The CIFAR-10 classification task is a classic machine learning benchmark. The dataset consists of 60,000 small color images belonging to 10 classes (50,000 for training and 10,000 for testing), and the task is to identify which class each image belongs to. Along with MNIST, CIFAR-10 classification is something of a "hello world" for computer vision and convolutional networks, so a solution can be implemented quickly with an off-the-shelf machine learning library.
Since convolutional neural networks have so far proven to be the best at computer vision tasks, we'll use the Keras library to implement a convolutional network as our solution. Keras provides a well-designed and readable API on top of the TensorFlow backend, so we'll be done in a surprisingly small number of steps!
Note: if you have been running these notebooks on a regular laptop without a GPU until now, it's going to become more and more difficult to do so. The neural networks we will be training, starting with convolutional networks, will become increasingly memory- and processing-intensive, and may be slow on laptops without a capable GPU.
In [37]:
import os
import matplotlib.pyplot as plt
import numpy as np
import random
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.layers import Conv2D, MaxPooling2D, Flatten
from keras.layers import Activation
Recall that a basic neural network in Keras can be set up like this:
In [38]:
model = Sequential()
model.add(Dense(100, activation='sigmoid', input_dim=3072))
model.add(Dense(100, activation='sigmoid'))
model.add(Dense(10, activation='softmax'))
model.summary()
We load the CIFAR-10 dataset, reshape each image into a flat 3072-dimensional vector, and normalize the pixel values to the range 0-1.
In [39]:
from keras.datasets import cifar10
# load CIFAR
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
num_classes = 10
# reshape CIFAR
x_train = x_train.reshape(50000, 32*32*3)
x_test = x_test.reshape(10000, 32*32*3)
# make float32
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
# normalize to (0-1)
x_train /= 255
x_test /= 255
# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
print('%d train samples, %d test samples'%(x_train.shape[0], x_test.shape[0]))
print("training data shape: ", x_train.shape, y_train.shape)
print("test data shape: ", x_test.shape, y_test.shape)
Let's see some of our samples.
In [41]:
# assemble a 6x16 grid of randomly-chosen training images for display
samples = np.concatenate([
    np.concatenate([x_train[random.randrange(len(x_train))].reshape((32,32,3))
                    for _ in range(16)], axis=1)
    for _ in range(6)], axis=0)
plt.figure(figsize=(16,6))
plt.imshow(samples)
Out[41]:
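For reference, the ten CIFAR-10 labels 0-9 correspond to airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. A small helper list makes the labels easier to read later (the name class_names is just our own convenience variable, not something provided by the dataset):
# CIFAR-10 class names, indexed by label 0-9
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']
# e.g. look up the class of the first training sample (labels are one-hot at this point)
print(class_names[int(np.argmax(y_train[0]))])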
We can compile the model with categorical cross-entropy loss and train it for 30 epochs.
In [4]:
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=128,
          epochs=30,
          validation_data=(x_test, y_test))
Out[4]:
Then we can evaluate the model.
In [5]:
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
Not very good! With more training time, perhaps 100 epochs, we might get 40% accuracy.
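As a rough sketch, getting there is just a matter of calling model.fit again with more epochs -- Keras resumes from the model's current weights -- though the exact accuracy you reach will vary from run to run:
# continue training the same model from its current weights
model.fit(x_train, y_train,
          batch_size=128,
          epochs=70,   # e.g. 70 additional epochs, for roughly 100 total
          validation_data=(x_test, y_test))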
Now onto convolutional networks. The general architecture of a convolutional neural network is a stack of alternating convolutional and pooling layers, followed by one or more fully-connected layers and a softmax output.
We'll follow this same basic structure and interweave some other components, such as dropout, to improve performance.
To begin, we start with our convolution layers. We first need to specify some architectural hyperparameters: the number of filters in each convolutional layer, the size of the filters, and the size of the pooling windows.
We start by designing a neural network with two alternating convolutional and max-pooling layers, followed by a 100-neuron fully-connected layer and a 10-neuron output. We'll have 64 and 32 filters in the two convolutional layers, and make the input shape a full-sized image (32x32x3) instead of an unrolled vector (3072x1). We also now use ReLU activation units instead of sigmoids, to avoid vanishing gradients.
In [6]:
model = Sequential()
model.add(Conv2D(64, (3, 3), padding='same', input_shape=(32,32,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(32, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(100))
model.add(Activation('relu'))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
model.summary()
We need to reload the CIFAR-10 dataset, and this time we do not reshape the images into unrolled vectors -- we keep them as 32x32x3 images, though we still normalize them.
In [42]:
from keras.datasets import cifar10
# load CIFAR
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
num_classes = 10
# do not reshape CIFAR if you have a convolutional input!
# make float32
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
# normalize to (0-1)
x_train /= 255
x_test /= 255
# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
print('%d train samples, %d test samples'%(x_train.shape[0], x_test.shape[0]))
print("training data shape: ", x_train.shape, y_train.shape)
print("test data shape: ", x_test.shape, y_test.shape)
Let's compile the model and train it again.
In [8]:
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=128,
          epochs=30,
          validation_data=(x_test, y_test))
Out[8]:
Let's evaluate the model again.
In [17]:
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
63% accuracy is a big improvement on 40%! All of that is accomplished in just 30 epochs using convolutional layers and ReLUs.
Let's try to make the network bigger.
In [43]:
model = Sequential()
model.add(Conv2D(128, (3, 3), padding='same', input_shape=(32,32,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dense(100))
model.add(Activation('relu'))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
model.summary()
Compile and train again.
In [9]:
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=128,
          epochs=50,
          validation_data=(x_test, y_test))
Out[9]:
Evaluate test accuracy.
In [10]:
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
One problem you might notice is that the accuracy of the model is much better on the training set than on the test set. You can see that by monitoring the progress at the end of each epoch above or by evaluating it directly.
In [11]:
score = model.evaluate(x_train, y_train, verbose=0)
print('Training loss:', score[0])
print('Training accuracy:', score[1])
77% accuracy on the training set but only 68% on the test set. Looking at the monitored training, the validation accuracy and training accuracy began to diverge around epoch 10.
Something is wrong! This is a symptom of "overfitting": our model has fit the training set a little too closely and does not generalize well to unseen data. This is a very common problem.
It's normal for training accuracy to be somewhat better than test accuracy, since it's hard to keep a network from doing better on the data it has already seen. But a 9% gap is too much.
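You can see this divergence directly by keeping the History object that model.fit returns and plotting the accuracy curves. A minimal sketch, assuming the return value of the fit call above was stored in a variable called history (older Keras versions use the keys 'acc'/'val_acc', newer ones 'accuracy'/'val_accuracy'):
# assumption: history = model.fit(x_train, y_train, batch_size=128, epochs=50,
#                                 validation_data=(x_test, y_test))
plt.plot(history.history['acc'], label='train')
plt.plot(history.history['val_acc'], label='validation')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()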
One way to help with this is regularization. We can add dropout to our model after a few of the layers.
In [22]:
model = Sequential()
model.add(Conv2D(128, (3, 3), padding='same', input_shape=(32,32,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dense(100))
model.add(Activation('relu'))
model.add(Dropout(0.25))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
model.summary()
We compile and train again.
In [23]:
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=128,
          epochs=50,
          validation_data=(x_test, y_test))
Out[23]:
We check our test loss and training loss again.
In [25]:
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
score = model.evaluate(x_train, y_train, verbose=0)
print('Training loss:', score[0])
print('Training accuracy:', score[1])
Now our training accuracy is lower (72%) but our test accuracy is higher (69%). This is more like what we expect.
Another way of improving performance is to experiment with optimizers beyond plain SGD. Let's instantiate the same network, but train it with the Adam optimizer instead of SGD.
In [47]:
model = Sequential()
model.add(Conv2D(128, (3, 3), padding='same', input_shape=(32,32,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dense(100))
model.add(Activation('relu'))
model.add(Dropout(0.25))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
model.summary()
In [48]:
model.compile(
    loss='categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)
model.fit(x_train, y_train,
          batch_size=128,
          epochs=50,
          validation_data=(x_test, y_test))
Out[48]:
In [49]:
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
score = model.evaluate(x_train, y_train, verbose=0)
print('Training loss:', score[0])
print('Training accuracy:', score[1])
78% accuracy! Our best yet. It looks heavily overfit, though (99% accuracy on the training set)... maybe it needs more dropout?
Still a long way to go to beat the record (96%). We could make a lot more progress by making the network (much) bigger, training for (much) longer, and using a number of additional tricks (like data augmentation), but that is beyond the scope of this lesson for now.
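As a taste of one such trick, here is a minimal sketch of data augmentation using Keras's ImageDataGenerator; the specific shift and flip settings are illustrative choices, not tuned values:
from keras.preprocessing.image import ImageDataGenerator
# randomly shift and horizontally flip training images on the fly
datagen = ImageDataGenerator(width_shift_range=0.1,
                             height_shift_range=0.1,
                             horizontal_flip=True)
# train on augmented batches instead of the raw training set
model.fit_generator(datagen.flow(x_train, y_train, batch_size=128),
                    steps_per_epoch=len(x_train) // 128,
                    epochs=50,
                    validation_data=(x_test, y_test))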
Let's also recall how to predict a single sample and look at its class probabilities.
In [50]:
# predict the class probabilities for a single test image
x_sample = x_test[0].reshape(1,32,32,3)
y_prob = model.predict(x_sample)[0]
y_pred = y_prob.argmax()
y_actual = y_test[0].argmax()
print("predicted = %d, actual = %d" % (y_pred, y_actual))
plt.bar(range(10), y_prob)
Out[50]:
Let's also review how to save and load trained Keras models. It's easy! From the Keras documentation:
In [51]:
from keras.models import load_model
model.save('my_model.h5')  # creates an HDF5 file 'my_model.h5'
del model # deletes the existing model
# returns a compiled model
# identical to the previous one
model = load_model('my_model.h5')
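If you want to check that the reloaded model really is identical, one quick sanity test is to evaluate it on the test set again and compare the numbers to the ones printed earlier:
# the reloaded model should reproduce the same test metrics as before saving
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])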