Convolutional networks

This is an annotated adaptation of Keras's example MNIST CNN code, showing how to implement convolutional networks (here applied to CIFAR-10).

The CIFAR-10 classification task is a classic machine learning benchmark. The dataset contains 60,000 32x32 color images belonging to 10 classes (50,000 for training and 10,000 for testing), and the task is to identify which class each image belongs to. Along with MNIST, CIFAR-10 classification is a sort of "hello world" for computer vision and convolutional networks, so a solution can be implemented quickly with an off-the-shelf machine learning library.

Since convolutional neural networks have thus far proven to be the best at computer vision tasks, we'll use the Keras library to implement a convolutional network as our solution. Keras provides a well-designed, readable API on top of the TensorFlow backend, so we'll be done in a surprisingly small number of steps!

Note: if you have been running these notebooks on a regular laptop without a GPU until now, it's going to become more and more difficult to do so. The neural networks we will be training, starting with convolutional networks, become increasingly memory- and processing-intensive, and may be slow on laptops without a capable GPU.
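
If you're not sure whether your machine has a usable GPU, a quick check is the following (assuming the TensorFlow backend; the exact module path can vary between TensorFlow versions):

from tensorflow.python.client import device_lib

# lists the CPU and any GPU devices TensorFlow can see
print(device_lib.list_local_devices())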


In [37]:
import os
import matplotlib.pyplot as plt
import numpy as np
import random

import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.layers import Conv2D, MaxPooling2D, Flatten

from keras.layers import Activation

Recall that a basic neural network in Keras can be set up like this:


In [38]:
model = Sequential()
model.add(Dense(100, activation='sigmoid', input_dim=3072))
model.add(Dense(100, activation='sigmoid'))
model.add(Dense(10, activation='softmax'))
model.summary()


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_35 (Dense)             (None, 100)               307300    
_________________________________________________________________
dense_36 (Dense)             (None, 100)               10100     
_________________________________________________________________
dense_37 (Dense)             (None, 10)                1010      
=================================================================
Total params: 318,410
Trainable params: 318,410
Non-trainable params: 0
_________________________________________________________________

We load the CIFAR-10 dataset and reshape the images into unrolled vectors (one 3072-dimensional vector per 32x32x3 image).


In [39]:
from keras.datasets import cifar10

# load CIFAR
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
num_classes = 10

# reshape CIFAR
x_train = x_train.reshape(50000, 32*32*3)
x_test = x_test.reshape(10000, 32*32*3)

# make float32
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

# normalize to (0-1)
x_train /= 255
x_test /= 255

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

print('%d train samples, %d test samples'%(x_train.shape[0], x_test.shape[0]))
print("training data shape: ", x_train.shape, y_train.shape)
print("test data shape: ", x_test.shape, y_test.shape)


50000 train samples, 10000 test samples
training data shape:  (50000, 3072) (50000, 10)
test data shape:  (10000, 3072) (10000, 10)

Let's see some of our samples.


In [41]:
# tile a 6x16 grid of randomly chosen training images
rows = [np.concatenate([x_train[random.randrange(len(x_train))].reshape((32, 32, 3))
                        for _ in range(16)], axis=1) for _ in range(6)]
samples = np.concatenate(rows, axis=0)
plt.figure(figsize=(16, 6))
plt.imshow(samples)  # these are color images, so no grayscale colormap is needed


Out[41]:
<matplotlib.image.AxesImage at 0x7f282998b6d8>

We compile the model using categorical cross-entropy loss, and train it for 30 epochs.
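
For a one-hot label y and predicted class probabilities p, the categorical cross-entropy loss is L = -Σ_i y_i log(p_i), which reduces to -log(p_c) for the true class c: the loss is zero when the model assigns probability 1 to the correct class, and grows without bound as that probability shrinks.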


In [4]:
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=128,
          epochs=30,
          validation_data=(x_test, y_test))


Train on 50000 samples, validate on 10000 samples
Epoch 1/30
50000/50000 [==============================] - 4s 85us/step - loss: 2.2977 - acc: 0.1434 - val_loss: 2.2823 - val_acc: 0.1897
Epoch 2/30
50000/50000 [==============================] - 1s 30us/step - loss: 2.2731 - acc: 0.2025 - val_loss: 2.2625 - val_acc: 0.2174
Epoch 3/30
50000/50000 [==============================] - 1s 29us/step - loss: 2.2520 - acc: 0.2323 - val_loss: 2.2396 - val_acc: 0.2420
Epoch 4/30
50000/50000 [==============================] - 1s 29us/step - loss: 2.2259 - acc: 0.2490 - val_loss: 2.2111 - val_acc: 0.2650
Epoch 5/30
50000/50000 [==============================] - 1s 29us/step - loss: 2.1940 - acc: 0.2626 - val_loss: 2.1765 - val_acc: 0.2698
Epoch 6/30
50000/50000 [==============================] - 1s 28us/step - loss: 2.1575 - acc: 0.2688 - val_loss: 2.1393 - val_acc: 0.2762
Epoch 7/30
50000/50000 [==============================] - 1s 28us/step - loss: 2.1207 - acc: 0.2751 - val_loss: 2.1037 - val_acc: 0.2716
Epoch 8/30
50000/50000 [==============================] - 1s 28us/step - loss: 2.0869 - acc: 0.2803 - val_loss: 2.0723 - val_acc: 0.2858
Epoch 9/30
50000/50000 [==============================] - 1s 29us/step - loss: 2.0577 - acc: 0.2879 - val_loss: 2.0454 - val_acc: 0.2865
Epoch 10/30
50000/50000 [==============================] - 1s 28us/step - loss: 2.0327 - acc: 0.2904 - val_loss: 2.0231 - val_acc: 0.2861
Epoch 11/30
50000/50000 [==============================] - 1s 28us/step - loss: 2.0114 - acc: 0.2952 - val_loss: 2.0026 - val_acc: 0.3020
Epoch 12/30
50000/50000 [==============================] - 1s 28us/step - loss: 1.9929 - acc: 0.2994 - val_loss: 1.9854 - val_acc: 0.3008
Epoch 13/30
50000/50000 [==============================] - 1s 28us/step - loss: 1.9765 - acc: 0.3036 - val_loss: 1.9702 - val_acc: 0.3028
Epoch 14/30
50000/50000 [==============================] - 1s 28us/step - loss: 1.9621 - acc: 0.3069 - val_loss: 1.9568 - val_acc: 0.3047
Epoch 15/30
50000/50000 [==============================] - 1s 28us/step - loss: 1.9491 - acc: 0.3100 - val_loss: 1.9444 - val_acc: 0.3057
Epoch 16/30
50000/50000 [==============================] - 1s 28us/step - loss: 1.9375 - acc: 0.3147 - val_loss: 1.9331 - val_acc: 0.3147
Epoch 17/30
50000/50000 [==============================] - 1s 29us/step - loss: 1.9271 - acc: 0.3171 - val_loss: 1.9245 - val_acc: 0.3184
Epoch 18/30
50000/50000 [==============================] - 1s 29us/step - loss: 1.9178 - acc: 0.3201 - val_loss: 1.9155 - val_acc: 0.3193
Epoch 19/30
50000/50000 [==============================] - 1s 29us/step - loss: 1.9095 - acc: 0.3242 - val_loss: 1.9077 - val_acc: 0.3172
Epoch 20/30
50000/50000 [==============================] - 1s 30us/step - loss: 1.9019 - acc: 0.3250 - val_loss: 1.8996 - val_acc: 0.3274
Epoch 21/30
50000/50000 [==============================] - 1s 28us/step - loss: 1.8947 - acc: 0.3292 - val_loss: 1.8924 - val_acc: 0.3288
Epoch 22/30
50000/50000 [==============================] - 1s 28us/step - loss: 1.8879 - acc: 0.3321 - val_loss: 1.8861 - val_acc: 0.3298
Epoch 23/30
50000/50000 [==============================] - 1s 28us/step - loss: 1.8816 - acc: 0.3341 - val_loss: 1.8813 - val_acc: 0.3282
Epoch 24/30
50000/50000 [==============================] - 1s 29us/step - loss: 1.8756 - acc: 0.3367 - val_loss: 1.8754 - val_acc: 0.3364
Epoch 25/30
50000/50000 [==============================] - 1s 28us/step - loss: 1.8698 - acc: 0.3385 - val_loss: 1.8689 - val_acc: 0.3371
Epoch 26/30
50000/50000 [==============================] - 1s 29us/step - loss: 1.8639 - acc: 0.3420 - val_loss: 1.8633 - val_acc: 0.3353
Epoch 27/30
50000/50000 [==============================] - 1s 29us/step - loss: 1.8581 - acc: 0.3435 - val_loss: 1.8568 - val_acc: 0.3443
Epoch 28/30
50000/50000 [==============================] - 1s 28us/step - loss: 1.8528 - acc: 0.3453 - val_loss: 1.8517 - val_acc: 0.3444
Epoch 29/30
50000/50000 [==============================] - 1s 29us/step - loss: 1.8472 - acc: 0.3473 - val_loss: 1.8461 - val_acc: 0.3490
Epoch 30/30
50000/50000 [==============================] - 1s 29us/step - loss: 1.8419 - acc: 0.3484 - val_loss: 1.8402 - val_acc: 0.3497
Out[4]:
<keras.callbacks.History at 0x7f2af7e04860>

Then we can evaluate the model.


In [5]:
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])


Test loss: 1.8402026081085205
Test accuracy: 0.3497

Not very good! With more training time, perhaps 100 epochs, we might get 40% accuracy.

Now onto convolutional networks... The general architecture of a convolutional neural network is:

  • convolution layers, followed by pooling layers
  • fully-connected layers
  • a final fully-connected softmax layer

We'll follow this same basic structure and interweave some other components, such as dropout, to improve performance.

To begin, we start with our convolution layers. We first need to specify some architectural hyperparameters:

  • How many filters do we want for our convolution layers? Like most hyperparameters, this is chosen through a mix of intuition and tuning. A rough rule of thumb is: the more complex the task, the more filters. (Note that we don't need the same number of filters in every convolution layer, but we use the same number here for convenience.)
  • What size should our convolution filters be? We don't want filters to be too large, or the resulting feature maps might not be very meaningful. For instance, a useless filter size for this task would be 32x32, since such a filter covers the entire image. We also don't want filters to be too small for a similar reason, e.g. a 1x1 filter just returns each pixel.
  • What size should our pooling window be? Again, we don't want pooling windows to be too large or we'll throw away too much information. For larger images, though, a larger pooling window might be appropriate (the same goes for convolution filters). A sketch of where each of these choices appears in Keras follows this list.
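
Here is a minimal sketch of where each of these hyperparameters appears in the Keras layer constructors (the specific values are illustrative only, not necessarily the ones we settle on below):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D

# 64 filters of size 3x3 in the convolution layer, and a 2x2 pooling window
demo = Sequential()
demo.add(Conv2D(64, (3, 3), padding='same', input_shape=(32, 32, 3)))
demo.add(MaxPooling2D(pool_size=(2, 2)))
demo.summary()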

We start by designing a neural network with two convolutional layers, each followed by a max-pooling layer, then a 100-neuron fully-connected layer and a 10-neuron softmax output. We'll use 64 and 32 filters in the two convolutional layers, and make the input shape a full-sized image (32x32x3) instead of an unrolled vector (3072x1). We also switch from sigmoid to ReLU activations, which helps avoid vanishing gradients.


In [6]:
model = Sequential()
model.add(Conv2D(64, (3, 3), padding='same', input_shape=(32,32,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(32, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(100))
model.add(Activation('relu'))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
model.summary()


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 32, 32, 64)        1792      
_________________________________________________________________
activation_1 (Activation)    (None, 32, 32, 64)        0         
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 16, 16, 64)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 16, 16, 32)        18464     
_________________________________________________________________
activation_2 (Activation)    (None, 16, 16, 32)        0         
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 8, 8, 32)          0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 2048)              0         
_________________________________________________________________
dense_4 (Dense)              (None, 100)               204900    
_________________________________________________________________
activation_3 (Activation)    (None, 100)               0         
_________________________________________________________________
dense_5 (Dense)              (None, 10)                1010      
_________________________________________________________________
activation_4 (Activation)    (None, 10)                0         
=================================================================
Total params: 226,166
Trainable params: 226,166
Non-trainable params: 0
_________________________________________________________________

We reload the CIFAR-10 dataset, and this time we do not reshape the images into unrolled input vectors -- they stay as 32x32x3 images, though we still normalize them and one-hot encode the labels.


In [42]:
from keras.datasets import cifar10

# load CIFAR
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
num_classes = 10

# do not reshape CIFAR if you have a convolutional input!

# make float32
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

# normalize to (0-1)
x_train /= 255
x_test /= 255

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

print('%d train samples, %d test samples'%(x_train.shape[0], x_test.shape[0]))
print("training data shape: ", x_train.shape, y_train.shape)
print("test data shape: ", x_test.shape, y_test.shape)


50000 train samples, 10000 test samples
training data shape:  (50000, 32, 32, 3) (50000, 10)
test data shape:  (10000, 32, 32, 3) (10000, 10)

Let's compile the model and train it again.


In [8]:
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=128,
          epochs=30,
          validation_data=(x_test, y_test))


Train on 50000 samples, validate on 10000 samples
Epoch 1/30
50000/50000 [==============================] - 6s 121us/step - loss: 2.1950 - acc: 0.1982 - val_loss: 2.0243 - val_acc: 0.2768
Epoch 2/30
50000/50000 [==============================] - 4s 76us/step - loss: 1.9498 - acc: 0.3072 - val_loss: 1.9031 - val_acc: 0.3381
Epoch 3/30
50000/50000 [==============================] - 4s 76us/step - loss: 1.8160 - acc: 0.3621 - val_loss: 1.7428 - val_acc: 0.3857
Epoch 4/30
50000/50000 [==============================] - 4s 75us/step - loss: 1.6909 - acc: 0.4049 - val_loss: 1.6034 - val_acc: 0.4345
Epoch 5/30
50000/50000 [==============================] - 4s 75us/step - loss: 1.6039 - acc: 0.4325 - val_loss: 1.5351 - val_acc: 0.4469
Epoch 6/30
50000/50000 [==============================] - 4s 76us/step - loss: 1.5284 - acc: 0.4569 - val_loss: 1.4748 - val_acc: 0.4711
Epoch 7/30
50000/50000 [==============================] - 4s 76us/step - loss: 1.4794 - acc: 0.4739 - val_loss: 1.4417 - val_acc: 0.4878
Epoch 8/30
50000/50000 [==============================] - 4s 76us/step - loss: 1.4300 - acc: 0.4891 - val_loss: 1.4253 - val_acc: 0.4888
Epoch 9/30
50000/50000 [==============================] - 4s 75us/step - loss: 1.3947 - acc: 0.5048 - val_loss: 1.4039 - val_acc: 0.4968
Epoch 10/30
50000/50000 [==============================] - 4s 76us/step - loss: 1.3616 - acc: 0.5176 - val_loss: 1.3843 - val_acc: 0.5038
Epoch 11/30
50000/50000 [==============================] - 4s 75us/step - loss: 1.3283 - acc: 0.5292 - val_loss: 1.3235 - val_acc: 0.5276
Epoch 12/30
50000/50000 [==============================] - 4s 75us/step - loss: 1.3000 - acc: 0.5399 - val_loss: 1.2925 - val_acc: 0.5408
Epoch 13/30
50000/50000 [==============================] - 4s 75us/step - loss: 1.2742 - acc: 0.5503 - val_loss: 1.2821 - val_acc: 0.5406
Epoch 14/30
50000/50000 [==============================] - 4s 75us/step - loss: 1.2468 - acc: 0.5611 - val_loss: 1.2522 - val_acc: 0.5507
Epoch 15/30
50000/50000 [==============================] - 4s 75us/step - loss: 1.2225 - acc: 0.5691 - val_loss: 1.2457 - val_acc: 0.5509
Epoch 16/30
50000/50000 [==============================] - 4s 76us/step - loss: 1.1988 - acc: 0.5774 - val_loss: 1.2455 - val_acc: 0.5593
Epoch 17/30
50000/50000 [==============================] - 4s 76us/step - loss: 1.1782 - acc: 0.5873 - val_loss: 1.1993 - val_acc: 0.5719
Epoch 18/30
50000/50000 [==============================] - 4s 76us/step - loss: 1.1599 - acc: 0.5933 - val_loss: 1.2084 - val_acc: 0.5674
Epoch 19/30
50000/50000 [==============================] - 4s 77us/step - loss: 1.1394 - acc: 0.6006 - val_loss: 1.2101 - val_acc: 0.5722
Epoch 20/30
50000/50000 [==============================] - 4s 76us/step - loss: 1.1241 - acc: 0.6070 - val_loss: 1.2216 - val_acc: 0.5784
Epoch 21/30
50000/50000 [==============================] - 4s 76us/step - loss: 1.1044 - acc: 0.6133 - val_loss: 1.1923 - val_acc: 0.5782
Epoch 22/30
50000/50000 [==============================] - 4s 76us/step - loss: 1.0889 - acc: 0.6185 - val_loss: 1.1599 - val_acc: 0.5865
Epoch 23/30
50000/50000 [==============================] - 4s 75us/step - loss: 1.0701 - acc: 0.6245 - val_loss: 1.1081 - val_acc: 0.6020
Epoch 24/30
50000/50000 [==============================] - 4s 76us/step - loss: 1.0533 - acc: 0.6325 - val_loss: 1.0949 - val_acc: 0.6094
Epoch 25/30
50000/50000 [==============================] - 4s 76us/step - loss: 1.0379 - acc: 0.6382 - val_loss: 1.0887 - val_acc: 0.6147
Epoch 26/30
50000/50000 [==============================] - 4s 76us/step - loss: 1.0258 - acc: 0.6417 - val_loss: 1.1304 - val_acc: 0.5994
Epoch 27/30
50000/50000 [==============================] - 4s 76us/step - loss: 1.0096 - acc: 0.6471 - val_loss: 1.0664 - val_acc: 0.6275
Epoch 28/30
50000/50000 [==============================] - 4s 76us/step - loss: 0.9945 - acc: 0.6532 - val_loss: 1.0837 - val_acc: 0.6152
Epoch 29/30
50000/50000 [==============================] - 4s 76us/step - loss: 0.9819 - acc: 0.6577 - val_loss: 1.0721 - val_acc: 0.6184
Epoch 30/30
50000/50000 [==============================] - 4s 76us/step - loss: 0.9696 - acc: 0.6640 - val_loss: 1.0799 - val_acc: 0.6180
Out[8]:
<keras.callbacks.History at 0x7f2af7d21518>

Let's evaluate the model again.


In [17]:
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])


Test loss: 1.0451206232070922
Test accuracy: 0.6323

63% accuracy is a big improvement over the roughly 35% our fully-connected network managed! All of that is accomplished in just 30 epochs using convolutional layers and ReLUs.

Let's try to make the network bigger.


In [43]:
model = Sequential()
model.add(Conv2D(128, (3, 3), padding='same', input_shape=(32,32,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dense(100))
model.add(Activation('relu'))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
model.summary()


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_40 (Conv2D)           (None, 32, 32, 128)       3584      
_________________________________________________________________
activation_71 (Activation)   (None, 32, 32, 128)       0         
_________________________________________________________________
max_pooling2d_32 (MaxPooling (None, 16, 16, 128)       0         
_________________________________________________________________
conv2d_41 (Conv2D)           (None, 16, 16, 64)        73792     
_________________________________________________________________
activation_72 (Activation)   (None, 16, 16, 64)        0         
_________________________________________________________________
max_pooling2d_33 (MaxPooling (None, 8, 8, 64)          0         
_________________________________________________________________
conv2d_42 (Conv2D)           (None, 8, 8, 64)          36928     
_________________________________________________________________
activation_73 (Activation)   (None, 8, 8, 64)          0         
_________________________________________________________________
max_pooling2d_34 (MaxPooling (None, 4, 4, 64)          0         
_________________________________________________________________
flatten_13 (Flatten)         (None, 1024)              0         
_________________________________________________________________
dense_38 (Dense)             (None, 256)               262400    
_________________________________________________________________
activation_74 (Activation)   (None, 256)               0         
_________________________________________________________________
dense_39 (Dense)             (None, 100)               25700     
_________________________________________________________________
activation_75 (Activation)   (None, 100)               0         
_________________________________________________________________
dense_40 (Dense)             (None, 10)                1010      
_________________________________________________________________
activation_76 (Activation)   (None, 10)                0         
=================================================================
Total params: 403,414
Trainable params: 403,414
Non-trainable params: 0
_________________________________________________________________

Compile and train again.


In [9]:
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=128,
          epochs=50,
          validation_data=(x_test, y_test))


Train on 50000 samples, validate on 10000 samples
Epoch 1/50
50000/50000 [==============================] - 6s 123us/step - loss: 2.2554 - acc: 0.1903 - val_loss: 2.1562 - val_acc: 0.1933
Epoch 2/50
50000/50000 [==============================] - 6s 116us/step - loss: 2.0300 - acc: 0.2712 - val_loss: 1.9791 - val_acc: 0.2843
Epoch 3/50
50000/50000 [==============================] - 6s 115us/step - loss: 1.9109 - acc: 0.3177 - val_loss: 1.8798 - val_acc: 0.3334
Epoch 4/50
50000/50000 [==============================] - 6s 117us/step - loss: 1.8111 - acc: 0.3543 - val_loss: 1.7440 - val_acc: 0.3758
Epoch 5/50
50000/50000 [==============================] - 6s 116us/step - loss: 1.7230 - acc: 0.3867 - val_loss: 1.6357 - val_acc: 0.4203
Epoch 6/50
50000/50000 [==============================] - 6s 115us/step - loss: 1.6345 - acc: 0.4163 - val_loss: 1.5607 - val_acc: 0.4485
Epoch 7/50
50000/50000 [==============================] - 6s 115us/step - loss: 1.5600 - acc: 0.4441 - val_loss: 1.5302 - val_acc: 0.4474
Epoch 8/50
50000/50000 [==============================] - 6s 116us/step - loss: 1.5005 - acc: 0.4640 - val_loss: 1.4898 - val_acc: 0.4716
Epoch 9/50
50000/50000 [==============================] - 6s 116us/step - loss: 1.4564 - acc: 0.4795 - val_loss: 1.4031 - val_acc: 0.4952
Epoch 10/50
50000/50000 [==============================] - 6s 115us/step - loss: 1.4113 - acc: 0.4965 - val_loss: 1.4292 - val_acc: 0.4817
Epoch 11/50
50000/50000 [==============================] - 6s 115us/step - loss: 1.3784 - acc: 0.5100 - val_loss: 1.3621 - val_acc: 0.5163
Epoch 12/50
50000/50000 [==============================] - 6s 115us/step - loss: 1.3484 - acc: 0.5213 - val_loss: 1.3342 - val_acc: 0.5182
Epoch 13/50
50000/50000 [==============================] - 6s 116us/step - loss: 1.3126 - acc: 0.5329 - val_loss: 1.2919 - val_acc: 0.5342
Epoch 14/50
50000/50000 [==============================] - 6s 115us/step - loss: 1.2877 - acc: 0.5430 - val_loss: 1.2874 - val_acc: 0.5312
Epoch 15/50
50000/50000 [==============================] - 6s 115us/step - loss: 1.2602 - acc: 0.5536 - val_loss: 1.4420 - val_acc: 0.4882
Epoch 16/50
50000/50000 [==============================] - 6s 115us/step - loss: 1.2314 - acc: 0.5638 - val_loss: 1.2218 - val_acc: 0.5669
Epoch 17/50
50000/50000 [==============================] - 6s 115us/step - loss: 1.2049 - acc: 0.5741 - val_loss: 1.2472 - val_acc: 0.5532
Epoch 18/50
50000/50000 [==============================] - 6s 115us/step - loss: 1.1755 - acc: 0.5865 - val_loss: 1.2906 - val_acc: 0.5367
Epoch 19/50
50000/50000 [==============================] - 6s 115us/step - loss: 1.1512 - acc: 0.5935 - val_loss: 1.1664 - val_acc: 0.5809
Epoch 20/50
50000/50000 [==============================] - 6s 115us/step - loss: 1.1242 - acc: 0.6059 - val_loss: 1.1497 - val_acc: 0.5885
Epoch 21/50
50000/50000 [==============================] - 6s 116us/step - loss: 1.1013 - acc: 0.6122 - val_loss: 1.1715 - val_acc: 0.5828
Epoch 22/50
50000/50000 [==============================] - 6s 115us/step - loss: 1.0763 - acc: 0.6217 - val_loss: 1.1545 - val_acc: 0.5913
Epoch 23/50
50000/50000 [==============================] - 6s 115us/step - loss: 1.0535 - acc: 0.6298 - val_loss: 1.0809 - val_acc: 0.6199
Epoch 24/50
50000/50000 [==============================] - 6s 116us/step - loss: 1.0325 - acc: 0.6393 - val_loss: 1.0858 - val_acc: 0.6134
Epoch 25/50
50000/50000 [==============================] - 6s 115us/step - loss: 1.0088 - acc: 0.6466 - val_loss: 1.0516 - val_acc: 0.6306
Epoch 26/50
50000/50000 [==============================] - 6s 115us/step - loss: 0.9883 - acc: 0.6542 - val_loss: 1.1746 - val_acc: 0.5982
Epoch 27/50
50000/50000 [==============================] - 6s 116us/step - loss: 0.9696 - acc: 0.6604 - val_loss: 1.0660 - val_acc: 0.6232
Epoch 28/50
50000/50000 [==============================] - 6s 115us/step - loss: 0.9477 - acc: 0.6709 - val_loss: 1.0299 - val_acc: 0.6393
Epoch 29/50
50000/50000 [==============================] - 6s 115us/step - loss: 0.9311 - acc: 0.6763 - val_loss: 1.0138 - val_acc: 0.6466
Epoch 30/50
50000/50000 [==============================] - 6s 115us/step - loss: 0.9144 - acc: 0.6808 - val_loss: 1.0106 - val_acc: 0.6470
Epoch 31/50
50000/50000 [==============================] - 6s 116us/step - loss: 0.8915 - acc: 0.6909 - val_loss: 1.0042 - val_acc: 0.6468
Epoch 32/50
50000/50000 [==============================] - 6s 115us/step - loss: 0.8750 - acc: 0.6943 - val_loss: 0.9766 - val_acc: 0.6586
Epoch 33/50
50000/50000 [==============================] - 6s 115us/step - loss: 0.8590 - acc: 0.7004 - val_loss: 0.9977 - val_acc: 0.6551
Epoch 34/50
50000/50000 [==============================] - 6s 115us/step - loss: 0.8440 - acc: 0.7083 - val_loss: 1.0536 - val_acc: 0.6438
Epoch 35/50
50000/50000 [==============================] - 6s 116us/step - loss: 0.8274 - acc: 0.7140 - val_loss: 1.1110 - val_acc: 0.6119
Epoch 36/50
50000/50000 [==============================] - 6s 116us/step - loss: 0.8134 - acc: 0.7173 - val_loss: 1.0435 - val_acc: 0.6477
Epoch 37/50
50000/50000 [==============================] - 6s 115us/step - loss: 0.7967 - acc: 0.7250 - val_loss: 1.0294 - val_acc: 0.6473
Epoch 38/50
50000/50000 [==============================] - 6s 115us/step - loss: 0.7831 - acc: 0.7286 - val_loss: 0.9457 - val_acc: 0.6736
Epoch 39/50
50000/50000 [==============================] - 6s 115us/step - loss: 0.7660 - acc: 0.7358 - val_loss: 0.9650 - val_acc: 0.6648
Epoch 40/50
50000/50000 [==============================] - 6s 116us/step - loss: 0.7551 - acc: 0.7392 - val_loss: 0.9824 - val_acc: 0.6653
Epoch 41/50
50000/50000 [==============================] - 6s 115us/step - loss: 0.7395 - acc: 0.7446 - val_loss: 0.9714 - val_acc: 0.6706
Epoch 42/50
50000/50000 [==============================] - 6s 115us/step - loss: 0.7264 - acc: 0.7486 - val_loss: 0.9794 - val_acc: 0.6655
Epoch 43/50
50000/50000 [==============================] - 6s 116us/step - loss: 0.7106 - acc: 0.7546 - val_loss: 1.0379 - val_acc: 0.6566
Epoch 44/50
50000/50000 [==============================] - 6s 116us/step - loss: 0.6938 - acc: 0.7597 - val_loss: 0.9049 - val_acc: 0.6903
Epoch 45/50
50000/50000 [==============================] - 6s 116us/step - loss: 0.6851 - acc: 0.7643 - val_loss: 0.9483 - val_acc: 0.6785
Epoch 46/50
50000/50000 [==============================] - 6s 116us/step - loss: 0.6696 - acc: 0.7692 - val_loss: 0.9494 - val_acc: 0.6856
Epoch 47/50
50000/50000 [==============================] - 6s 116us/step - loss: 0.6566 - acc: 0.7735 - val_loss: 0.9291 - val_acc: 0.6855
Epoch 48/50
50000/50000 [==============================] - 6s 115us/step - loss: 0.6446 - acc: 0.7762 - val_loss: 0.9010 - val_acc: 0.6947
Epoch 49/50
50000/50000 [==============================] - 6s 116us/step - loss: 0.6300 - acc: 0.7811 - val_loss: 0.9371 - val_acc: 0.6854
Epoch 50/50
50000/50000 [==============================] - 6s 115us/step - loss: 0.6197 - acc: 0.7854 - val_loss: 0.9821 - val_acc: 0.6781
Out[9]:
<keras.callbacks.History at 0x7f2ae1f23160>

Evaluate test accuracy.


In [10]:
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])


Test loss: 0.9820661101341247
Test accuracy: 0.6781

One problem you might notice is that the model's accuracy is much better on the training set than on the test set. You can see this by monitoring the progress at the end of each epoch above, or by evaluating it directly.


In [11]:
score = model.evaluate(x_train, y_train, verbose=0)
print('Training loss:', score[0])
print('Training accuracy:', score[1])


Training loss: 0.647886616191864
Training accuracy: 0.7712

77% accuracy on the training set but only 68% on the test set. Looking at the training logs above, the validation accuracy and training accuracy began to diverge around epoch 10.
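
A quick plot makes this divergence easy to see, assuming you capture the History object that model.fit returns (e.g. history = model.fit(...); a hypothetical variable, since we didn't keep it above). The 'acc'/'val_acc' keys match the metric names shown in the logs for this version of Keras:

# plot training vs. validation accuracy per epoch
plt.plot(history.history['acc'], label='training accuracy')
plt.plot(history.history['val_acc'], label='validation accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()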

Something must be wrong! This is a symptom of "overfitting": our model has bent itself a little too closely around the training set and does not generalize well to unseen data. This is a very common problem.

It's normal for the training accuracy to be somewhat better than the testing accuracy, because it's hard to keep a network from being better at predicting the data it has already seen. But a 9-point difference is too much.

One way of addressing this is regularization. We can add dropout to our model after a few layers.


In [22]:
model = Sequential()
model.add(Conv2D(128, (3, 3), padding='same', input_shape=(32,32,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dense(100))
model.add(Activation('relu'))
model.add(Dropout(0.25))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
model.summary()


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_21 (Conv2D)           (None, 32, 32, 128)       3584      
_________________________________________________________________
activation_41 (Activation)   (None, 32, 32, 128)       0         
_________________________________________________________________
max_pooling2d_21 (MaxPooling (None, 16, 16, 128)       0         
_________________________________________________________________
conv2d_22 (Conv2D)           (None, 16, 16, 64)        73792     
_________________________________________________________________
activation_42 (Activation)   (None, 16, 16, 64)        0         
_________________________________________________________________
max_pooling2d_22 (MaxPooling (None, 8, 8, 64)          0         
_________________________________________________________________
conv2d_23 (Conv2D)           (None, 8, 8, 64)          36928     
_________________________________________________________________
activation_43 (Activation)   (None, 8, 8, 64)          0         
_________________________________________________________________
max_pooling2d_23 (MaxPooling (None, 4, 4, 64)          0         
_________________________________________________________________
dropout_13 (Dropout)         (None, 4, 4, 64)          0         
_________________________________________________________________
flatten_8 (Flatten)          (None, 1024)              0         
_________________________________________________________________
dense_24 (Dense)             (None, 256)               262400    
_________________________________________________________________
activation_44 (Activation)   (None, 256)               0         
_________________________________________________________________
dense_25 (Dense)             (None, 100)               25700     
_________________________________________________________________
activation_45 (Activation)   (None, 100)               0         
_________________________________________________________________
dropout_14 (Dropout)         (None, 100)               0         
_________________________________________________________________
dense_26 (Dense)             (None, 10)                1010      
_________________________________________________________________
activation_46 (Activation)   (None, 10)                0         
=================================================================
Total params: 403,414
Trainable params: 403,414
Non-trainable params: 0
_________________________________________________________________

We compile and train again.


In [23]:
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=128,
          epochs=50,
          validation_data=(x_test, y_test))


Train on 50000 samples, validate on 10000 samples
Epoch 1/50
50000/50000 [==============================] - 7s 132us/step - loss: 2.2915 - acc: 0.1272 - val_loss: 2.2643 - val_acc: 0.1780
Epoch 2/50
50000/50000 [==============================] - 6s 122us/step - loss: 2.1875 - acc: 0.1797 - val_loss: 2.0443 - val_acc: 0.2662
Epoch 3/50
50000/50000 [==============================] - 6s 121us/step - loss: 2.0504 - acc: 0.2390 - val_loss: 1.9592 - val_acc: 0.2933
Epoch 4/50
50000/50000 [==============================] - 6s 122us/step - loss: 1.9522 - acc: 0.2873 - val_loss: 1.8107 - val_acc: 0.3590
Epoch 5/50
50000/50000 [==============================] - 6s 122us/step - loss: 1.8516 - acc: 0.3258 - val_loss: 1.7262 - val_acc: 0.3914
Epoch 6/50
50000/50000 [==============================] - 6s 122us/step - loss: 1.7655 - acc: 0.3619 - val_loss: 1.6722 - val_acc: 0.4035
Epoch 7/50
50000/50000 [==============================] - 6s 122us/step - loss: 1.6940 - acc: 0.3835 - val_loss: 1.5555 - val_acc: 0.4370
Epoch 8/50
50000/50000 [==============================] - 6s 122us/step - loss: 1.6410 - acc: 0.4016 - val_loss: 1.5345 - val_acc: 0.4442
Epoch 9/50
50000/50000 [==============================] - 6s 122us/step - loss: 1.6012 - acc: 0.4150 - val_loss: 1.5140 - val_acc: 0.4560
Epoch 10/50
50000/50000 [==============================] - 6s 122us/step - loss: 1.5629 - acc: 0.4290 - val_loss: 1.4396 - val_acc: 0.4790
Epoch 11/50
50000/50000 [==============================] - 6s 123us/step - loss: 1.5281 - acc: 0.4436 - val_loss: 1.4515 - val_acc: 0.4725
Epoch 12/50
50000/50000 [==============================] - 6s 122us/step - loss: 1.5052 - acc: 0.4528 - val_loss: 1.3930 - val_acc: 0.4940
Epoch 13/50
50000/50000 [==============================] - 6s 122us/step - loss: 1.4687 - acc: 0.4650 - val_loss: 1.3571 - val_acc: 0.5054
Epoch 14/50
50000/50000 [==============================] - 6s 122us/step - loss: 1.4461 - acc: 0.4735 - val_loss: 1.3391 - val_acc: 0.5132
Epoch 15/50
50000/50000 [==============================] - 6s 123us/step - loss: 1.4252 - acc: 0.4852 - val_loss: 1.3287 - val_acc: 0.5178
Epoch 16/50
50000/50000 [==============================] - 6s 122us/step - loss: 1.4015 - acc: 0.4925 - val_loss: 1.3226 - val_acc: 0.5173
Epoch 17/50
50000/50000 [==============================] - 6s 123us/step - loss: 1.3816 - acc: 0.5021 - val_loss: 1.3070 - val_acc: 0.5310
Epoch 18/50
50000/50000 [==============================] - 6s 122us/step - loss: 1.3587 - acc: 0.5107 - val_loss: 1.3062 - val_acc: 0.5306
Epoch 19/50
50000/50000 [==============================] - 6s 122us/step - loss: 1.3396 - acc: 0.5159 - val_loss: 1.2485 - val_acc: 0.5544
Epoch 20/50
50000/50000 [==============================] - 6s 122us/step - loss: 1.3177 - acc: 0.5257 - val_loss: 1.2497 - val_acc: 0.5558
Epoch 21/50
50000/50000 [==============================] - 6s 121us/step - loss: 1.2973 - acc: 0.5338 - val_loss: 1.2119 - val_acc: 0.5656
Epoch 22/50
50000/50000 [==============================] - 6s 122us/step - loss: 1.2770 - acc: 0.5419 - val_loss: 1.1726 - val_acc: 0.5828
Epoch 23/50
50000/50000 [==============================] - 6s 121us/step - loss: 1.2585 - acc: 0.5517 - val_loss: 1.1617 - val_acc: 0.5834
Epoch 24/50
50000/50000 [==============================] - 6s 122us/step - loss: 1.2417 - acc: 0.5545 - val_loss: 1.1736 - val_acc: 0.5815
Epoch 25/50
50000/50000 [==============================] - 6s 122us/step - loss: 1.2255 - acc: 0.5621 - val_loss: 1.1333 - val_acc: 0.5968
Epoch 26/50
50000/50000 [==============================] - 6s 122us/step - loss: 1.2077 - acc: 0.5681 - val_loss: 1.1415 - val_acc: 0.5894
Epoch 27/50
50000/50000 [==============================] - 6s 122us/step - loss: 1.1894 - acc: 0.5766 - val_loss: 1.1133 - val_acc: 0.6033
Epoch 28/50
50000/50000 [==============================] - 6s 122us/step - loss: 1.1788 - acc: 0.5799 - val_loss: 1.0746 - val_acc: 0.6180
Epoch 29/50
50000/50000 [==============================] - 6s 122us/step - loss: 1.1583 - acc: 0.5873 - val_loss: 1.0697 - val_acc: 0.6180
Epoch 30/50
50000/50000 [==============================] - 6s 123us/step - loss: 1.1419 - acc: 0.5944 - val_loss: 1.0474 - val_acc: 0.6314
Epoch 31/50
50000/50000 [==============================] - 6s 122us/step - loss: 1.1284 - acc: 0.5990 - val_loss: 1.0345 - val_acc: 0.6350
Epoch 32/50
50000/50000 [==============================] - 6s 122us/step - loss: 1.1087 - acc: 0.6061 - val_loss: 1.0353 - val_acc: 0.6329
Epoch 33/50
50000/50000 [==============================] - 6s 123us/step - loss: 1.1019 - acc: 0.6089 - val_loss: 1.0311 - val_acc: 0.6331
Epoch 34/50
50000/50000 [==============================] - 6s 122us/step - loss: 1.0843 - acc: 0.6150 - val_loss: 1.0180 - val_acc: 0.6389
Epoch 35/50
50000/50000 [==============================] - 6s 122us/step - loss: 1.0710 - acc: 0.6232 - val_loss: 0.9908 - val_acc: 0.6507
Epoch 36/50
50000/50000 [==============================] - 6s 122us/step - loss: 1.0588 - acc: 0.6285 - val_loss: 1.0137 - val_acc: 0.6447
Epoch 37/50
50000/50000 [==============================] - 6s 123us/step - loss: 1.0508 - acc: 0.6278 - val_loss: 0.9988 - val_acc: 0.6464
Epoch 38/50
50000/50000 [==============================] - 6s 122us/step - loss: 1.0370 - acc: 0.6327 - val_loss: 0.9680 - val_acc: 0.6547
Epoch 39/50
50000/50000 [==============================] - 6s 122us/step - loss: 1.0229 - acc: 0.6401 - val_loss: 0.9509 - val_acc: 0.6677
Epoch 40/50
50000/50000 [==============================] - 6s 122us/step - loss: 1.0151 - acc: 0.6409 - val_loss: 0.9826 - val_acc: 0.6538
Epoch 41/50
50000/50000 [==============================] - 6s 122us/step - loss: 1.0019 - acc: 0.6465 - val_loss: 0.9400 - val_acc: 0.6728
Epoch 42/50
50000/50000 [==============================] - 6s 122us/step - loss: 0.9904 - acc: 0.6495 - val_loss: 0.9605 - val_acc: 0.6589
Epoch 43/50
50000/50000 [==============================] - 6s 122us/step - loss: 0.9832 - acc: 0.6535 - val_loss: 0.9385 - val_acc: 0.6720
Epoch 44/50
50000/50000 [==============================] - 6s 122us/step - loss: 0.9728 - acc: 0.6566 - val_loss: 0.9163 - val_acc: 0.6833
Epoch 45/50
50000/50000 [==============================] - 6s 123us/step - loss: 0.9583 - acc: 0.6620 - val_loss: 0.8996 - val_acc: 0.6861
Epoch 46/50
50000/50000 [==============================] - 6s 122us/step - loss: 0.9492 - acc: 0.6668 - val_loss: 0.9058 - val_acc: 0.6861
Epoch 47/50
50000/50000 [==============================] - 6s 122us/step - loss: 0.9393 - acc: 0.6710 - val_loss: 0.8956 - val_acc: 0.6874
Epoch 48/50
50000/50000 [==============================] - 6s 122us/step - loss: 0.9304 - acc: 0.6721 - val_loss: 0.8769 - val_acc: 0.6929
Epoch 49/50
50000/50000 [==============================] - 6s 122us/step - loss: 0.9194 - acc: 0.6757 - val_loss: 0.8853 - val_acc: 0.6887
Epoch 50/50
50000/50000 [==============================] - 6s 122us/step - loss: 0.9110 - acc: 0.6796 - val_loss: 0.8791 - val_acc: 0.6929
Out[23]:
<keras.callbacks.History at 0x7f285c865fd0>

We check our test loss and training loss again.


In [25]:
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

score = model.evaluate(x_train, y_train, verbose=0)
print('Training loss:', score[0])
print('Training accuracy:', score[1])


Test loss: 0.8791208669662476
Test accuracy: 0.6929
Training loss: 0.792264692363739
Training accuracy: 0.72292

Now our training accuracy is lower (72%) but our test accuracy is higher (69%), and the gap between them is much smaller. This is more like what we expect.


Another way of improving performance is to experiment with optimizers beyond plain SGD. Let's instantiate the same network, but train it with Adam instead of SGD.
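
Keras accepts either a string name or an optimizer instance; an instance lets you tune the optimizer's hyperparameters, such as the learning rate. A minimal sketch using the Keras 2 optimizers API (lr=0.001 is Adam's default, shown here just to make the knob visible):

from keras.optimizers import Adam

# equivalent to optimizer='adam', but with an explicit learning rate
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=0.001),
              metrics=['accuracy'])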


In [47]:
model = Sequential()
model.add(Conv2D(128, (3, 3), padding='same', input_shape=(32,32,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dense(100))
model.add(Activation('relu'))
model.add(Dropout(0.25))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
model.summary()


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_46 (Conv2D)           (None, 32, 32, 128)       3584      
_________________________________________________________________
activation_83 (Activation)   (None, 32, 32, 128)       0         
_________________________________________________________________
max_pooling2d_38 (MaxPooling (None, 16, 16, 128)       0         
_________________________________________________________________
conv2d_47 (Conv2D)           (None, 16, 16, 64)        73792     
_________________________________________________________________
activation_84 (Activation)   (None, 16, 16, 64)        0         
_________________________________________________________________
max_pooling2d_39 (MaxPooling (None, 8, 8, 64)          0         
_________________________________________________________________
conv2d_48 (Conv2D)           (None, 8, 8, 64)          36928     
_________________________________________________________________
activation_85 (Activation)   (None, 8, 8, 64)          0         
_________________________________________________________________
max_pooling2d_40 (MaxPooling (None, 4, 4, 64)          0         
_________________________________________________________________
dropout_29 (Dropout)         (None, 4, 4, 64)          0         
_________________________________________________________________
flatten_15 (Flatten)         (None, 1024)              0         
_________________________________________________________________
dense_44 (Dense)             (None, 256)               262400    
_________________________________________________________________
activation_86 (Activation)   (None, 256)               0         
_________________________________________________________________
dense_45 (Dense)             (None, 100)               25700     
_________________________________________________________________
activation_87 (Activation)   (None, 100)               0         
_________________________________________________________________
dropout_30 (Dropout)         (None, 100)               0         
_________________________________________________________________
dense_46 (Dense)             (None, 10)                1010      
_________________________________________________________________
activation_88 (Activation)   (None, 10)                0         
=================================================================
Total params: 403,414
Trainable params: 403,414
Non-trainable params: 0
_________________________________________________________________

In [48]:
model.compile(
    loss='categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)

model.fit(x_train, y_train,
          batch_size=128,
          epochs=50,
          validation_data=(x_test, y_test))


Train on 50000 samples, validate on 10000 samples
Epoch 1/50
50000/50000 [==============================] - 7s 146us/step - loss: 1.6382 - acc: 0.4002 - val_loss: 1.2972 - val_acc: 0.5378
Epoch 2/50
50000/50000 [==============================] - 6s 129us/step - loss: 1.2317 - acc: 0.5640 - val_loss: 1.0376 - val_acc: 0.6280
Epoch 3/50
50000/50000 [==============================] - 6s 130us/step - loss: 1.0625 - acc: 0.6239 - val_loss: 0.9647 - val_acc: 0.6581
Epoch 4/50
50000/50000 [==============================] - 6s 129us/step - loss: 0.9416 - acc: 0.6708 - val_loss: 0.8799 - val_acc: 0.6979
Epoch 5/50
50000/50000 [==============================] - 6s 129us/step - loss: 0.8508 - acc: 0.7044 - val_loss: 0.8396 - val_acc: 0.7045
Epoch 6/50
50000/50000 [==============================] - 6s 129us/step - loss: 0.7853 - acc: 0.7277 - val_loss: 0.7940 - val_acc: 0.7231
Epoch 7/50
50000/50000 [==============================] - 6s 129us/step - loss: 0.7307 - acc: 0.7441 - val_loss: 0.7589 - val_acc: 0.7363
Epoch 8/50
50000/50000 [==============================] - 6s 128us/step - loss: 0.6834 - acc: 0.7608 - val_loss: 0.7449 - val_acc: 0.7410
Epoch 9/50
50000/50000 [==============================] - 6s 128us/step - loss: 0.6334 - acc: 0.7797 - val_loss: 0.7348 - val_acc: 0.7457
Epoch 10/50
50000/50000 [==============================] - 6s 128us/step - loss: 0.6021 - acc: 0.7906 - val_loss: 0.7106 - val_acc: 0.7592
Epoch 11/50
50000/50000 [==============================] - 6s 128us/step - loss: 0.5622 - acc: 0.8017 - val_loss: 0.6812 - val_acc: 0.7647
Epoch 12/50
50000/50000 [==============================] - 6s 129us/step - loss: 0.5279 - acc: 0.8141 - val_loss: 0.6972 - val_acc: 0.7663
Epoch 13/50
50000/50000 [==============================] - 6s 129us/step - loss: 0.4999 - acc: 0.8249 - val_loss: 0.6749 - val_acc: 0.7737
Epoch 14/50
50000/50000 [==============================] - 6s 128us/step - loss: 0.4752 - acc: 0.8325 - val_loss: 0.6941 - val_acc: 0.7669
Epoch 15/50
50000/50000 [==============================] - 6s 129us/step - loss: 0.4453 - acc: 0.8430 - val_loss: 0.7053 - val_acc: 0.7661
Epoch 16/50
50000/50000 [==============================] - 6s 128us/step - loss: 0.4271 - acc: 0.8489 - val_loss: 0.6851 - val_acc: 0.7784
Epoch 17/50
50000/50000 [==============================] - 6s 128us/step - loss: 0.4029 - acc: 0.8571 - val_loss: 0.6793 - val_acc: 0.7789
Epoch 18/50
50000/50000 [==============================] - 6s 129us/step - loss: 0.3794 - acc: 0.8664 - val_loss: 0.7009 - val_acc: 0.7728
Epoch 19/50
50000/50000 [==============================] - 6s 128us/step - loss: 0.3659 - acc: 0.8702 - val_loss: 0.7238 - val_acc: 0.7756
Epoch 20/50
50000/50000 [==============================] - 6s 129us/step - loss: 0.3431 - acc: 0.8768 - val_loss: 0.7185 - val_acc: 0.7758
Epoch 21/50
50000/50000 [==============================] - 6s 129us/step - loss: 0.3313 - acc: 0.8819 - val_loss: 0.7220 - val_acc: 0.7721
Epoch 22/50
50000/50000 [==============================] - 6s 129us/step - loss: 0.3134 - acc: 0.8892 - val_loss: 0.7411 - val_acc: 0.7751
Epoch 23/50
50000/50000 [==============================] - 6s 129us/step - loss: 0.3061 - acc: 0.8903 - val_loss: 0.7551 - val_acc: 0.7767
Epoch 24/50
50000/50000 [==============================] - 6s 128us/step - loss: 0.2812 - acc: 0.9010 - val_loss: 0.7759 - val_acc: 0.7695
Epoch 25/50
50000/50000 [==============================] - 6s 129us/step - loss: 0.2827 - acc: 0.8980 - val_loss: 0.7740 - val_acc: 0.7760
Epoch 26/50
50000/50000 [==============================] - 6s 129us/step - loss: 0.2768 - acc: 0.9020 - val_loss: 0.8171 - val_acc: 0.7692
Epoch 27/50
50000/50000 [==============================] - 6s 128us/step - loss: 0.2647 - acc: 0.9054 - val_loss: 0.7987 - val_acc: 0.7765
Epoch 28/50
50000/50000 [==============================] - 6s 129us/step - loss: 0.2533 - acc: 0.9103 - val_loss: 0.8112 - val_acc: 0.7726
Epoch 29/50
50000/50000 [==============================] - 6s 128us/step - loss: 0.2518 - acc: 0.9121 - val_loss: 0.8097 - val_acc: 0.7742
Epoch 30/50
50000/50000 [==============================] - 6s 129us/step - loss: 0.2426 - acc: 0.9147 - val_loss: 0.8517 - val_acc: 0.7734
Epoch 31/50
50000/50000 [==============================] - 6s 129us/step - loss: 0.2285 - acc: 0.9201 - val_loss: 0.8203 - val_acc: 0.7794
Epoch 32/50
50000/50000 [==============================] - 6s 128us/step - loss: 0.2251 - acc: 0.9214 - val_loss: 0.8824 - val_acc: 0.7651
Epoch 33/50
50000/50000 [==============================] - 6s 128us/step - loss: 0.2173 - acc: 0.9230 - val_loss: 0.8846 - val_acc: 0.7679
Epoch 34/50
50000/50000 [==============================] - 6s 128us/step - loss: 0.2139 - acc: 0.9251 - val_loss: 0.8547 - val_acc: 0.7750
Epoch 35/50
50000/50000 [==============================] - 6s 130us/step - loss: 0.2063 - acc: 0.9271 - val_loss: 0.8930 - val_acc: 0.7744
Epoch 36/50
50000/50000 [==============================] - 6s 128us/step - loss: 0.2109 - acc: 0.9251 - val_loss: 0.8623 - val_acc: 0.7755
Epoch 37/50
50000/50000 [==============================] - 6s 129us/step - loss: 0.2030 - acc: 0.9287 - val_loss: 0.9001 - val_acc: 0.7746
Epoch 38/50
50000/50000 [==============================] - 6s 128us/step - loss: 0.1962 - acc: 0.9316 - val_loss: 0.8962 - val_acc: 0.7719
Epoch 39/50
50000/50000 [==============================] - 6s 129us/step - loss: 0.1901 - acc: 0.9339 - val_loss: 0.9481 - val_acc: 0.7655
Epoch 40/50
50000/50000 [==============================] - 6s 128us/step - loss: 0.1851 - acc: 0.9347 - val_loss: 0.9045 - val_acc: 0.7719
Epoch 41/50
50000/50000 [==============================] - 6s 128us/step - loss: 0.1904 - acc: 0.9328 - val_loss: 0.9049 - val_acc: 0.7790
Epoch 42/50
50000/50000 [==============================] - 6s 128us/step - loss: 0.1760 - acc: 0.9387 - val_loss: 0.9005 - val_acc: 0.7784
Epoch 43/50
50000/50000 [==============================] - 6s 129us/step - loss: 0.1777 - acc: 0.9388 - val_loss: 0.9391 - val_acc: 0.7739
Epoch 44/50
50000/50000 [==============================] - 6s 129us/step - loss: 0.1772 - acc: 0.9376 - val_loss: 0.9207 - val_acc: 0.7781
Epoch 45/50
50000/50000 [==============================] - 6s 128us/step - loss: 0.1693 - acc: 0.9417 - val_loss: 0.9473 - val_acc: 0.7765
Epoch 46/50
50000/50000 [==============================] - 6s 128us/step - loss: 0.1616 - acc: 0.9437 - val_loss: 0.9907 - val_acc: 0.7734
Epoch 47/50
50000/50000 [==============================] - 6s 128us/step - loss: 0.1718 - acc: 0.9399 - val_loss: 0.9832 - val_acc: 0.7712
Epoch 48/50
50000/50000 [==============================] - 6s 128us/step - loss: 0.1582 - acc: 0.9462 - val_loss: 0.9661 - val_acc: 0.7747
Epoch 49/50
50000/50000 [==============================] - 6s 128us/step - loss: 0.1637 - acc: 0.9425 - val_loss: 0.9532 - val_acc: 0.7779
Epoch 50/50
50000/50000 [==============================] - 6s 129us/step - loss: 0.1530 - acc: 0.9471 - val_loss: 0.9661 - val_acc: 0.7790
Out[48]:
<keras.callbacks.History at 0x7f2829a72a58>

In [49]:
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

score = model.evaluate(x_train, y_train, verbose=0)
print('Training loss:', score[0])
print('Training accuracy:', score[1])


Test loss: 0.9660877988576889
Test accuracy: 0.779
Training loss: 0.024635728995651005
Training accuracy: 0.99504

78% accuracy, our best yet! It looks heavily overfit, though (99% accuracy on the training set), so maybe it needs more dropout?
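
One common way to keep a run from drifting deep into overfitting territory is early stopping, which halts training once the validation loss stops improving. A minimal sketch using Keras's built-in callback (patience=5 is an arbitrary illustrative choice):

from keras.callbacks import EarlyStopping

# stop training if val_loss hasn't improved for 5 consecutive epochs
early_stop = EarlyStopping(monitor='val_loss', patience=5)
model.fit(x_train, y_train,
          batch_size=128,
          epochs=50,
          validation_data=(x_test, y_test),
          callbacks=[early_stop])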

Still a long way to go to beat the record (around 96%). We could make a lot more progress by making the network (much) bigger, training for (much) longer, and using a number of little tricks (like data augmentation), but that is beyond the scope of this lesson for now.
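
As a taste of data augmentation, Keras ships an ImageDataGenerator that applies random transformations to the training images on the fly, so the network never sees exactly the same image twice. A sketch under illustrative settings (the shift and flip parameters are just reasonable starting points, not tuned values):

from keras.preprocessing.image import ImageDataGenerator

# randomly shift images by up to 10% and flip them horizontally
datagen = ImageDataGenerator(width_shift_range=0.1,
                             height_shift_range=0.1,
                             horizontal_flip=True)

model.fit_generator(datagen.flow(x_train, y_train, batch_size=128),
                    steps_per_epoch=len(x_train) // 128,
                    epochs=50,
                    validation_data=(x_test, y_test))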

Let's also recall how to predict a single sample and look at its class probabilities.


In [50]:
x_sample = x_test[0].reshape(1,32,32,3)
y_prob = model.predict(x_sample)[0]
y_pred = y_prob.argmax()
y_actual = y_test[0].argmax()

print("predicted = %d, actual = %d" % (y_pred, y_actual))
plt.bar(range(10), y_prob)


predicted = 3, actual = 3
Out[50]:
<BarContainer object of 10 artists>
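
The class indices map to names; in CIFAR-10's standard ordering, index 3 is "cat". A small sketch to print human-readable labels instead:

labels = ['airplane', 'automobile', 'bird', 'cat', 'deer',
          'dog', 'frog', 'horse', 'ship', 'truck']
print("predicted = %s, actual = %s" % (labels[y_pred], labels[y_actual]))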

Finally, let's review how to save and load trained Keras models. It's easy! The following is adapted from the Keras documentation.


In [51]:
from keras.models import load_model

model.save('my_model.h5')  # creates an HDF5 file 'my_model.h5'
del model  # deletes the existing model

# returns a compiled model
# identical to the previous one
model = load_model('my_model.h5')
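
If you only need the weights, Keras can also save those separately with model.save_weights; loading them requires first rebuilding the same architecture in code. A quick sketch (the filename is arbitrary):

# save and restore the weights only (the architecture is not stored)
model.save_weights('my_model_weights.h5')
model.load_weights('my_model_weights.h5')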