Transfer Learning Lab with VGG, Inception and ResNet

In this lab, you will continue exploring transfer learning. You've already explored feature extraction with AlexNet and TensorFlow. Next, you will use Keras to explore feature extraction with the VGG, Inception and ResNet architectures. The models you will use were trained for days or weeks on the ImageNet dataset. Thus, the weights encapsulate higher-level features learned from training on thousands of classes.
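For reference, Keras ships pretrained ImageNet weights for each of these architectures through keras.applications. Loading them yourself is not required for this lab (the bottleneck features below are precomputed), but a minimal sketch looks like this:

from keras.applications.vgg16 import VGG16
from keras.applications.inception_v3 import InceptionV3
from keras.applications.resnet50 import ResNet50

# include_top=False drops the ImageNet classifier head, keeping only the
# convolutional base whose outputs serve as the extracted features.
vgg = VGG16(weights='imagenet', include_top=False)
inception = InceptionV3(weights='imagenet', include_top=False)
resnet = ResNet50(weights='imagenet', include_top=False)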

We'll use two datasets in this lab:

  1. German Traffic Sign Dataset
  2. Cifar10

Unless you have a powerful GPU, running feature extraction on these models will take a significant amount of time. To speed things up, we precomputed bottleneck features for each (network, dataset) pair; this allows you to experiment with feature extraction even on a modest CPU. You can think of bottleneck features as feature extraction with caching: because the base network's weights are frozen during feature extraction, the output for a given image is always the same. Thus, once an image has been passed through the network once, we can cache and reuse the output.
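To give a sense of how such a cache might be built, here is a rough sketch (the provided pickle files were generated ahead of time; X_train and y_train are assumed to be images and labels already resized to the base network's expected input size):

# Illustrative sketch only -- the pickle files in this lab were generated for you.
import pickle
import numpy as np
from keras.applications.resnet50 import ResNet50, preprocess_input

base = ResNet50(weights='imagenet', include_top=False)

# The base weights stay fixed, so each image's output is deterministic and
# can be computed once, then cached to disk.
features = base.predict(preprocess_input(X_train.astype(np.float32)), batch_size=32)

with open('resnet_traffic_bottleneck_features_train.p', 'wb') as f:
    pickle.dump({'features': features, 'labels': y_train}, f)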

The files are encoded as such:

  • {network}_{dataset}_bottleneck_features_train.p
  • {network}_{dataset}_bottleneck_features_validation.p

network can be one of 'vgg', 'inception', or 'resnet'

dataset can be one of 'cifar10' or 'traffic'

How will the pretrained models perform on the new datasets?


In [ ]:
from keras.layers import Dense, Flatten, Input, Dropout
from keras.models import Sequential
import pickle

In [ ]:
def load_bottleneck_data(network, dataset):
    """
    Arguments:
        network - String, one of 'resnet', 'vgg', 'inception'
        dataset - String, one of 'cifar10', 'traffic'
    """
    train_file = '{}_{}_bottleneck_features_train.p'.format(network, dataset)
    validation_file = '{}_{}_bottleneck_features_validation.p'.format(network, dataset)
        
    with open(train_file, 'rb') as f:
        train_data = pickle.load(f)
    with open(validation_file, 'rb') as f:
        validation_data = pickle.load(f)
        
    X_train = train_data['features']
    y_train = train_data['labels']
    X_val = validation_data['features']
    y_val = validation_data['labels']
    
    return X_train, y_train, X_val, y_val

Feature Extraction

Before you try feature extraction on pretrained models, it's a good idea to take a moment and run the classifier you used in the Traffic Sign project on the Cifar10 dataset. Cifar10 images are also (32, 32, 3), so the only thing you'll need to change is the number of classes: 10 instead of 43.

You can easily download and load the Cifar10 dataset like this:

from keras.datasets import cifar10
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
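If your Traffic Sign classifier isn't handy, a small baseline along these lines would do. The architecture here is just a hypothetical example, not the one from the project, and it assumes the cifar10 load above has been run:

from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Flatten, Dense, Dropout

# Scale pixel values to [0, 1]; cifar10.load_data() returns uint8 images.
X_train_norm = X_train.astype('float32') / 255.0
X_test_norm = X_test.astype('float32') / 255.0

baseline = Sequential()
baseline.add(Convolution2D(32, 3, 3, activation='relu', input_shape=(32, 32, 3)))
baseline.add(MaxPooling2D(pool_size=(2, 2)))
baseline.add(Flatten())
baseline.add(Dense(128, activation='relu'))
baseline.add(Dropout(0.5))
baseline.add(Dense(10, activation='softmax'))  # 10 Cifar10 classes instead of 43

baseline.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
baseline.fit(X_train_norm, y_train, nb_epoch=10, batch_size=128,
             validation_data=(X_test_norm, y_test))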

Cool, now you have something to compare the Cifar10 feature extraction results with!

Keep in mind the following as you experiment:

Does feature extraction outperform the Traffic Signs classifier on the Cifar10 dataset? Why?

Does feature extraction outperform the Traffic Signs classifier on the Traffic Signs dataset? Why?


In [ ]:
# load bottleneck data
X_train, y_train, X_val, y_val = load_bottleneck_data('resnet', 'traffic')

In [ ]:
nb_epoch = 50
batch_size = 32
nb_classes = 43 # NOTE: 43 for the traffic sign data loaded above; change to 10 for cifar10

print('Feature shape', X_train.shape[1:])

model = Sequential()
model.add(Flatten(input_shape=X_train.shape[1:]))
# TODO: Define the rest of your network here

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
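One possible way to fill in the TODO above, with the surrounding lines repeated so it reads as a complete cell. Your own head can differ; this is just a simple fully connected classifier on top of the flattened bottleneck features:

# A simple head: one hidden layer with dropout, then a softmax over nb_classes.
model = Sequential()
model.add(Flatten(input_shape=X_train.shape[1:]))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes, activation='softmax'))

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])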

In [ ]:
model.fit(
    X_train, 
    y_train, 
    nb_epoch=nb_epoch, 
    batch_size=batch_size, 
    validation_data=(X_val, y_val), 
    shuffle=True
)
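After training, you can also score the validation set directly. fit already reports validation metrics each epoch, so this is just a convenience:

# Returns [loss, accuracy], given the metrics passed to compile()
val_loss, val_acc = model.evaluate(X_val, y_val, batch_size=batch_size)
print('Validation loss: {:.4f}, accuracy: {:.4f}'.format(val_loss, val_acc))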

Summary

By now you should have a good feel for feature extraction and when it might be a good choice. To end this lab, let's summarize when to consider each of the following:

  1. Feature extraction (train only the top-level of the network, the rest of the network remains fixed)
  2. Finetuning (train the entire network end-to-end, start with pretrained weights)
  3. Training from scratch (train the entire network end-to-end, start from random weights)

Consider feature extraction when ...

If the dataset is small and similar to the original dataset. The higher-level features learned from the original dataset should be relevant to the new dataset.

Consider finetuning when ...

If the dataset is large and similar to the original dataset. In this case we should be much more confident we won't overfit so it should be safe to alter the original weights.

If the dataset is small and very different from the original dataset. You could also make the case for training from scratch. If we choose to finetune, it might be a good idea to only use features found earlier in the network; features found later might be too dataset-specific.
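In Keras terms, the practical difference between feature extraction and finetuning mostly comes down to whether the base layers are trainable. A rough sketch, assuming a keras.applications model and input images already resized to 224x224:

from keras.applications.vgg16 import VGG16
from keras.layers import Input, Flatten, Dense
from keras.models import Model

inp = Input(shape=(224, 224, 3))
base = VGG16(weights='imagenet', include_top=False, input_tensor=inp)

# Feature extraction: freeze every base layer so only the new head trains.
# For finetuning, leave some (or all) base layers trainable instead,
# e.g. unfreeze only the last block and keep the earlier, more generic features fixed.
for layer in base.layers:
    layer.trainable = False

x = Flatten()(base.output)
x = Dense(43, activation='softmax')(x)  # 43 traffic sign classes, as an example

model = Model(inp, x)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])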

Consider training from scratch when ...

If the dataset is large and very different from the original dataset. In this case we have enough data to confidently train from scratch. However, even in this case it might be more beneficial to finetune the entire network from pretrained weights.


Most importantly, keep in mind for a lot of problems you won't need an architecture as complicated and powerful as VGG, Inception, or ResNet. These architectures were made for the task of classifying thousands of complex classes. A much smaller network might be a much better fit for your problem, especially if you can comfortably train it on moderate hardware.

