Fine Tuning a pre-trained Deep CNN on a GPU machine

This session is inspired by a blog post by François Chollet, the creator of the Keras library.

WARNING: executing this notebook requires a GPU, e.g. with at least 6 GB of GPU RAM.

For this session we are going to use the Dogs vs. Cats dataset from Kaggle.

It is recommended to run this notebook from the Kaggle Kernels hosted interface, which provides GPU hours for free:

  • log in at Kaggle Kernels;
  • click the "New Notebook" button;
  • upload this notebook file from the "File" menu;
  • in the "File" menu, use "Add or upload data" and choose to add the Dogs vs. Cats dataset;
  • the data should then be available in the /kaggle/input/dogs-vs-cats folder of your Kaggle kernel session;
  • enable "Internet" and "GPU" in the "Settings" panel of this kernel.

Alternatively, to download the data yourself, create a password-based account on Kaggle, then, while logged in to Kaggle in your browser, click the download link of one of the data files to reach the form that makes you accept the terms and conditions of that challenge.

Then, in a shell session (possibly on a remote server), do the following:

pip3 install kaggle
# You need to download an API key from https://www.kaggle.com/{my_name}/account
# and save it as `~/.kaggle/kaggle.json`.
mkdir -p ~/data/dogs-vs-cats
cd ~/data/dogs-vs-cats
kaggle competitions download -c dogs-vs-cats

If you want to use colab, follow the instructions at https://www.kaggle.com/general/74235 to upload the kaggle.json file to your colab session.

This should download 3 files, among which train.zip and test1.zip (plus a CSV submission template file that we won't need).

Once this is done, we can extract the archive for the train set:


In [ ]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

import os
import os.path as op
import shutil
from zipfile import ZipFile

In [ ]:
# When working from manually downloaded files:

# data_folder = op.expanduser('~/data/dogs-vs-cats')
# working_folder = data_folder
# train_zip = op.join(data_folder, 'train.zip')
# # If the kaggle CLI gave you a single dogs-vs-cats.zip archive instead,
# # extract it first so that train.zip exists at the expected location:
# # ZipFile(op.join(data_folder, 'dogs-vs-cats.zip')).extractall(data_folder)

In [ ]:
# or when running on Kaggle:
data_folder = '/kaggle/input/dogs-vs-cats'
working_folder = "/kaggle/working"
train_zip = '/kaggle/input/dogs-vs-cats/train.zip'

In [ ]:
train_folder = op.join(working_folder, 'train')

if not op.exists(train_folder):
    print('Extracting %s...' % train_zip)
    ZipFile(train_zip).extractall(working_folder)

The Keras image data helpers want images for different classes ('cat' and 'dog') to live in distinct subfolders. Let's rearrange the image files to follow that convention:


In [ ]:
def rearrange_folders(folder):
    image_filenames = [op.join(folder, fn) for fn in os.listdir(folder)
                       if fn.endswith('.jpg')]
    if len(image_filenames) == 0:
        return
    print("Rearranging %d images in %s into one subfolder per class..."
          % (len(image_filenames), folder))
    for image_filename in image_filenames:
        # Filenames look like 'cat.249.jpg': the class name is the
        # part before the first dot.
        class_name = op.basename(image_filename).split('.', 1)[0]
        subfolder = op.join(folder, class_name)
        if not op.exists(subfolder):
            os.mkdir(subfolder)
        shutil.move(image_filename, subfolder)

rearrange_folders(train_folder)

Let's build a validation dataset by taking 500 images of cats and 500 images of dogs out of the training set:


In [ ]:
n_validation = 500

validation_folder = op.join(working_folder, 'validation')
if not op.exists(validation_folder):
    os.mkdir(validation_folder)
    for class_name in ['dog', 'cat']:
        train_subfolder = op.join(train_folder, class_name)
        validation_subfolder = op.join(validation_folder, class_name)
        print("Populating %s..." % validation_subfolder)
        os.mkdir(validation_subfolder)
        images_filenames = sorted(os.listdir(train_subfolder))
        for image_filename in images_filenames[-n_validation:]:
            shutil.move(op.join(train_subfolder, image_filename),
                        validation_subfolder)
        print("Moved %d images" % len(os.listdir(validation_subfolder)))

Data Loading and Data Augmentation

Let's use Keras utilities to manually load an image file from the cat folder. If Keras complains about a missing "PIL" library, make sure to install it with one of the following commands:

conda install pillow

# or

pip install pillow

You might need to restart the kernel of this notebook to get Keras to work.


In [ ]:
from tensorflow.keras.preprocessing.image import array_to_img, img_to_array, load_img

img = load_img(op.join(train_folder, 'cat', 'cat.249.jpg'))
x = img_to_array(img)

print(x.shape)

In [ ]:
plt.imshow(x.astype(np.uint8))
plt.axis('off');

Keras provides tools to generate many variations from a single image. This is useful to augment the dataset with variants that should not affect the image label: a rotated image of a cat is still an image of a cat.

Doing data augmentation at train time teaches the network to ignore such label-preserving transformations and therefore helps reduce overfitting.


In [ ]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenting_datagen = ImageDataGenerator(
    rescale=1. / 255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    channel_shift_range=9,
    fill_mode='nearest'
)

In [ ]:
plt.figure(figsize=(11, 5))
flow = augmenting_datagen.flow(x[np.newaxis, :, :, :])
for i, x_augmented in zip(range(15), flow):
    plt.subplot(3, 5, i + 1)
    plt.imshow(x_augmented[0])
    plt.axis('off')

The ImageDataGenerator object can then be pointed at the dataset folder to load the images and augment them on the fly, while also resizing / cropping them to fit the input dimensions of the classification neural network.


In [ ]:
flow = augmenting_datagen.flow_from_directory(
    train_folder, batch_size=1, target_size=(224, 224))

plt.figure(figsize=(11, 5))
for i, (X, y) in zip(range(15), flow):
    plt.subplot(3, 5, i + 1)
    plt.imshow(X[0])
    plt.axis('off')

Loading a pre-trained computer vision model

Let us load a state of the art model with a good tradeoff between prediction speed, model size and predictive accuracy, namely a Residual Network with 54 parameterized layers (53 convolutional + 1 fully connected for the softmax):


In [ ]:
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input

full_imagenet_model = ResNet50(weights='imagenet')

In [ ]:
full_imagenet_model.summary()

If you have the graphviz system package and the pydot_ng Python package installed, you can uncomment the following cell to display the structure of the network.


In [ ]:
# from IPython.display import SVG
# from tensorflow.keras.utils import model_to_dot

# model_viz = model_to_dot(full_imagenet_model,
#                          show_layer_names=False,
#                          show_shapes=True)
# SVG(model_viz.create(prog='dot', format='svg'))

Transfer learning

Let's remove the last dense classification layer that is specific to the ImageNet classes and use the previous layer (after flattening) as a feature extractor:


In [ ]:
from tensorflow.keras.models import Model

output = full_imagenet_model.layers[-2].output
base_model = Model(full_imagenet_model.input, output)
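
The output of this truncated model should be a flat feature vector for each image (2048 dimensions for ResNet50), which we can quickly verify:


In [ ]:
# The feature extractor maps each input image to a flat feature vector:
print(base_model.output_shape)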

When using this model we need to be careful to apply the same image preprocessing as was used during training, otherwise the marginal distribution of the input pixels might not be on the right scale:


In [ ]:
def preprocess_function(x):
    if x.ndim == 3:
        x = x[np.newaxis, :, :, :]
    return preprocess_input(x)
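
We can quickly check what this preprocessing does to our sample cat image. Note that the call is made on a copy, as preprocess_input may modify float arrays in place:


In [ ]:
# ResNet50 uses "caffe"-style preprocessing: channels are reordered to BGR
# and the per-channel ImageNet means are subtracted, hence negative values.
x_preprocessed = preprocess_function(x.copy())
print(x_preprocessed.shape, x_preprocessed.min(), x_preprocessed.max())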

In [ ]:
batch_size = 50

datagen = ImageDataGenerator(preprocessing_function=preprocess_function)

train_flow = datagen.flow_from_directory(
    train_folder,
    target_size=(224, 224),
    batch_size=batch_size,
    class_mode='binary',
    shuffle=True,
)

X, y = next(train_flow)
print(X.shape, y.shape)

Exercise: write a function that iterates over 5000 images of the training set (batch after batch), extracts the activations of the last layer of base_model (by calling its predict method) and collects the results in big numpy arrays with dimensions (5000, 2048) for the features and (5000,) for the matching image labels.


In [ ]:
# %load solutions/dogs_vs_cats_extract_features.py
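
For comparison, here is a minimal sketch of one possible solution (the official solution in solutions/dogs_vs_cats_extract_features.py may differ):


In [ ]:
# A minimal sketch of one possible solution: accumulate the activations
# of base_model batch after batch, then stack them into big arrays.
n_images = 5000
n_batches = n_images // batch_size

features, labels = [], []
for _, (X, y) in zip(range(n_batches), train_flow):
    features.append(base_model.predict(X))
    labels.append(y)

features_train = np.vstack(features)      # shape: (5000, 2048)
labels_train = np.concatenate(labels)     # shape: (5000,)

np.save('features_train.npy', features_train)
np.save('labels_train.npy', labels_train)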

Let's load the precomputed features, assuming the previous exercise saved them to disk:


In [ ]:
print("Loading precomputed features")
labels_train = np.load('labels_train.npy')
features_train = np.load('features_train.npy')

Let's train a simple linear model on those features. First let's check that the resulting small dataset has balanced classes:


In [ ]:
print(labels_train.shape)

In [ ]:
np.mean(labels_train)

In [ ]:
n_samples, n_features = features_train.shape
print(n_features, "features extracted")

Let's define the classification model:


In [ ]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam


top_model = Sequential()
top_model.add(Dense(1, input_dim=n_features, activation='sigmoid'))
top_model.compile(optimizer=Adam(learning_rate=1e-4),
                  loss='binary_crossentropy', metrics=['accuracy'])

top_model.fit(features_train, labels_train,
              validation_split=0.1, verbose=2, epochs=15)

Alright, transfer learning alone already reaches ~0.98 / 0.99 accuracy. This is not too surprising as the cat and dog classes are already part of the ImageNet label set.

Note that this is as good as, or slightly better than, the winner of the original Kaggle competition three years earlier. At that time pretrained ResNet models were not yet available.

Our validation set has 1000 images, so an accuracy of 0.990 means only 10 classification errors.

Let's plug this classifier on top of the base model so that we can make predictions on our held-out validation image folder:


In [ ]:
model = Model(base_model.input, top_model(base_model.output))

In [ ]:
flow = ImageDataGenerator().flow_from_directory(
        validation_folder, batch_size=1, target_size=(224, 224))

plt.figure(figsize=(12, 8))
for i, (X, y) in zip(range(15), flow):
    plt.subplot(3, 5, i + 1)
    plt.imshow(X[0] / 255)
    # Predict on a copy: preprocess_input may modify its input in place.
    prediction = model.predict(preprocess_input(X.copy()))
    # With the default class_mode='categorical', column 1 is the 'dog' class.
    label = "dog" if y[0, 1] > 0.5 else "cat"
    plt.title("dog prob=%0.4f\ntrue label: %s"
              % (prediction[0, 0], label))
    plt.axis('off')

Let's compute the validation score on the full validation set:


In [ ]:
valgen = ImageDataGenerator(preprocessing_function=preprocess_function)
val_flow = valgen.flow_from_directory(
    validation_folder, batch_size=batch_size, target_size=(224, 224),
    shuffle=False, class_mode='binary')

all_correct = []
for i, (X, y) in zip(range(val_flow.n // batch_size), val_flow):
    predictions = model.predict(X).ravel()
    correct = list((predictions > 0.5) == y)
    all_correct.extend(correct)
    print("Processed %d images" % len(all_correct))
    
print("Validation accuracy: %0.4f" % np.mean(all_correct))

Exercise: display the examples where the model makes its most confident mistakes.

To display images in jupyter notebook you can use:

from IPython.display import Image, display
import os.path as op

display(Image(op.join(validation_folder, image_name)))

The filenames of items sampled by a flow (without random shuffling) can be accessed via: val_flow.filenames.


In [ ]:


In [ ]:
# %load solutions/dogs_vs_cats_worst_predictions.py
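
For comparison, here is one possible sketch (the official solution may differ). It reuses the non-shuffled val_flow, so the i-th prediction stays aligned with val_flow.filenames[i]:


In [ ]:
from IPython.display import Image, display

# Collect all predictions on the non-shuffled validation flow:
val_flow.reset()  # restart the iterator from the first image
all_predictions, all_labels = [], []
for _, (X, y) in zip(range(val_flow.n // batch_size), val_flow):
    all_predictions.extend(model.predict(X).ravel())
    all_labels.extend(y)
all_predictions = np.asarray(all_predictions)
all_labels = np.asarray(all_labels)

# The most confident mistakes maximize |predicted prob - true label|:
worst = np.argsort(np.abs(all_predictions - all_labels))[::-1][:5]
for idx in worst:
    print("predicted dog prob=%0.4f, true label=%d, file=%s"
          % (all_predictions[idx], all_labels[idx], val_flow.filenames[idx]))
    display(Image(op.join(validation_folder, val_flow.filenames[idx])))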

Fine tuning

Let's identify the locations of the residual blocks (the layers that merge by addition in a residual architecture):


In [ ]:
from tensorflow.keras.layers import Add

[(i, l.output_shape)
 for (i, l) in enumerate(model.layers)
 if isinstance(l, Add)]

Let's freeze the weights of the low-level layers and fine-tune only the top-level layers:


In [ ]:
for i, layer in enumerate(model.layers):
    # Freeze layers 0-150; only the layers above (roughly the last
    # residual blocks listed above plus the classification head)
    # will be updated during fine tuning.
    layer.trainable = i >= 151
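
A quick optional check that the freezing did what we expect:


In [ ]:
# Optional check: how many layers will be updated during fine tuning?
n_trainable = sum(layer.trainable for layer in model.layers)
print("%d trainable layers out of %d" % (n_trainable, len(model.layers)))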

Let's fine-tune the top-level layers a bit to see if we can further improve the accuracy. Use the nvidia-smi command in a bash terminal on the server to monitor GPU usage while the model is training.


In [ ]:
from tensorflow.keras import optimizers

augmenting_datagen = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest',
    preprocessing_function=preprocess_function,
)
train_flow = augmenting_datagen.flow_from_directory(
    train_folder, target_size=(224, 224), batch_size=batch_size,
    class_mode='binary', shuffle=True, seed=0)

opt = optimizers.SGD(learning_rate=1e-4, momentum=0.9)
model.compile(optimizer=opt, loss='binary_crossentropy',
              metrics=['accuracy'])


# compute the validation metrics every 5000 training samples
# (steps are counted in batches, hence the divisions by batch_size)
history = model.fit(train_flow,
                    steps_per_epoch=5000 // batch_size,
                    epochs=30,
                    validation_data=val_flow,
                    validation_steps=val_flow.n // batch_size)

# Note: the pretrained model was already very good. Fine tuning
# does not really seem to help. It might be more interesting to
# introspect the quality of the labeling in the training set to
# check for images that are too ambiguous and should be removed
# from the training set.

Bonus exercise: train your own architecture from scratch using the Adam optimizer and data augmentation. Start with a small architecture first (e.g. 4 convolution layers interleaved with 2 max pooling layers, followed by a Flatten layer and two fully connected layers). A possible starting point is sketched in the cell below.


In [ ]:
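# A possible starting point for the bonus exercise (a sketch only).
# Assumption made here: images are resized to 64x64 (e.g. with
# target_size=(64, 64) in the data flows) to keep the Flatten layer
# and the parameter count small.
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten

scratch_model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),
    Conv2D(32, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid'),
])
scratch_model.compile(optimizer=Adam(learning_rate=1e-4),
                      loss='binary_crossentropy', metrics=['accuracy'])
scratch_model.summary()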