Fine-Tuning Example

Transfer learning example:

1- Train a simple convnet on the first five MNIST digits [0..4].

2- Freeze the convolutional layers and fine-tune the dense layers to classify the remaining digits [5..9].

In the run below, the first classifier reaches about 99.6% test accuracy after 5 epochs, and the transferred + fine-tuned classifier reaches about 98.8% on the last five digits.


In [1]:
from __future__ import print_function

import datetime
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.preprocessing import image
from keras.applications.imagenet_utils import preprocess_input
from keras import backend as K
import numpy as np


Using Theano backend.

In [2]:
now = datetime.datetime.now

batch_size = 128
num_classes = 5
epochs = 5

# input image dimensions
img_rows, img_cols = 28, 28
# number of convolutional filters to use
filters = 32
# size of pooling area for max pooling
pool_size = 2
# convolution kernel size
kernel_size = 3

Keras Configs

~/.keras/keras.json

This file specifies which backend to use (Theano or TensorFlow), numeric options such as the default float type and epsilon, the image data format (channels_first vs. channels_last), and more.
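
For example, a keras.json consistent with this notebook's run (Theano backend, channels_last image ordering) might look like the following; the epsilon and floatx values shown are the usual defaults:

{
    "image_data_format": "channels_last",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "theano"
}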


In [3]:
if K.image_data_format() == 'channels_first':
    input_shape = (1, img_rows, img_cols)
else:
    input_shape = (img_rows, img_cols, 1)

In [4]:
def train_model(model, train, test, num_classes):
    x_train = train[0].reshape((train[0].shape[0],) + input_shape)
    x_test = test[0].reshape((test[0].shape[0],) + input_shape)
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    x_train /= 255
    x_test /= 255
    print('x_train shape:', x_train.shape)
    print(x_train.shape[0], 'train samples')
    print(x_test.shape[0], 'test samples')

    # convert class vectors to binary class matrices
    y_train = keras.utils.to_categorical(train[1], num_classes)
    y_test = keras.utils.to_categorical(test[1], num_classes)

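    # compiling here also means any changes made to layer.trainable
    # flags before this call take effect for the upcoming fit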
    model.compile(loss='categorical_crossentropy',
                  optimizer='adadelta',
                  metrics=['accuracy'])

    t = now()
    model.fit(x_train, y_train,
              batch_size=batch_size,
              epochs=epochs,
              verbose=1,
              validation_data=(x_test, y_test))
    print('Training time: %s' % (now() - t))
    score = model.evaluate(x_test, y_test, verbose=0)
    print('Test score:', score[0])
    print('Test accuracy:', score[1])

The data, shuffled and split between train and test sets


In [5]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# create two datasets one with digits below 5 and one with 5 and above
x_train_lt5 = x_train[y_train < 5]
y_train_lt5 = y_train[y_train < 5]
x_test_lt5 = x_test[y_test < 5]
y_test_lt5 = y_test[y_test < 5]

x_train_gte5 = x_train[y_train >= 5]
y_train_gte5 = y_train[y_train >= 5] - 5
x_test_gte5 = x_test[y_test >= 5]
y_test_gte5 = y_test[y_test >= 5] - 5

Define two groups of layers: feature extraction (convolutions) and classification (dense)


In [6]:
feature_layers = [
    Conv2D(filters, kernel_size,
           padding='valid',
           input_shape=input_shape),
    Activation('relu'),
    Conv2D(filters, kernel_size),
    Activation('relu'),
    MaxPooling2D(pool_size=pool_size),
    Dropout(0.25),
    Flatten(),
]

classification_layers = [
    Dense(128),
    Activation('relu'),
    Dropout(0.5),
    Dense(num_classes),
    Activation('softmax')
]

In [7]:
# create complete model
model = Sequential(feature_layers + classification_layers)

# train the model on the first task: classifying digits [0..4]
train_model(model,
            (x_train_lt5, y_train_lt5),
            (x_test_lt5, y_test_lt5), num_classes)

# freeze the feature layers; only the dense layers will be updated
for l in feature_layers:
    l.trainable = False

# transfer: train dense layers for new classification task [5..9]
train_model(model,
            (x_train_gte5, y_train_gte5),
            (x_test_gte5, y_test_gte5), num_classes)


x_train shape: (30596, 28, 28, 1)
30596 train samples
5139 test samples
Train on 30596 samples, validate on 5139 samples
Epoch 1/5
30596/30596 [==============================] - 50s - loss: 0.2054 - acc: 0.9383 - val_loss: 0.0481 - val_acc: 0.9848
Epoch 2/5
30596/30596 [==============================] - 50s - loss: 0.0732 - acc: 0.9776 - val_loss: 0.0264 - val_acc: 0.9924
Epoch 3/5
30596/30596 [==============================] - 48s - loss: 0.0484 - acc: 0.9852 - val_loss: 0.0171 - val_acc: 0.9947
Epoch 4/5
30596/30596 [==============================] - 50s - loss: 0.0375 - acc: 0.9893 - val_loss: 0.0133 - val_acc: 0.9963
Epoch 5/5
30596/30596 [==============================] - 50s - loss: 0.0297 - acc: 0.9908 - val_loss: 0.0107 - val_acc: 0.9961
Training time: 0:04:13.897418
Test score: 0.0107047033795
Test accuracy: 0.996108192255
x_train shape: (29404, 28, 28, 1)
29404 train samples
4861 test samples
Train on 29404 samples, validate on 4861 samples
Epoch 1/5
29404/29404 [==============================] - 20s - loss: 0.3382 - acc: 0.8960 - val_loss: 0.0807 - val_acc: 0.9770
Epoch 2/5
29404/29404 [==============================] - 20s - loss: 0.1109 - acc: 0.9661 - val_loss: 0.0505 - val_acc: 0.9852
Epoch 3/5
29404/29404 [==============================] - 20s - loss: 0.0803 - acc: 0.9754 - val_loss: 0.0456 - val_acc: 0.9854
Epoch 4/5
29404/29404 [==============================] - 20s - loss: 0.0674 - acc: 0.9785 - val_loss: 0.0377 - val_acc: 0.9875
Epoch 5/5
29404/29404 [==============================] - 20s - loss: 0.0592 - acc: 0.9822 - val_loss: 0.0341 - val_acc: 0.9885
Training time: 0:01:43.500362
Test score: 0.034055509847
Test accuracy: 0.988479736569
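
Note that the second run trains in well under half the time of the first (roughly 1:44 versus 4:14), since gradients are neither computed nor applied for the frozen convolutional layers.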

Test it out on your own handwriting!
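
A minimal sketch of what that cell could do, assuming an image of a digit from [5..9] saved as my_digit.png (a placeholder name), drawn dark on a light background: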


In [14]:
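img = image.load_img('my_digit.png', grayscale=True,
                     target_size=(img_rows, img_cols))
x = image.img_to_array(img)         # (28, 28, 1) with channels_last
x = 255 - x                         # invert: MNIST digits are white on black
x /= 255                            # scale to [0, 1] as in training
x = x.reshape((1,) + input_shape)   # add the batch dimension
probs = model.predict(x)
print('predicted digit:', np.argmax(probs) + 5)  # classes 0..4 map to digits 5..9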