Modify the Model

Retrain the last layer's linear model

The original VGG16 network's last layer is Dense (i.e. a linear model), so it is a little odd and wasteful that we added another linear model on top of it in lesson 2.
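
To see why, recall that the composition of two linear (affine) layers is itself just a single affine layer, so the second one adds parameters without adding any expressive power. A minimal numpy sketch, with made-up shapes and ignoring the softmax in between:

In [ ]:
import numpy as np

# two stacked linear (affine) layers...
W1, b1 = np.random.randn(4, 3), np.random.randn(4)
W2, b2 = np.random.randn(2, 4), np.random.randn(2)
x = np.random.randn(3)
two_layers = W2.dot(W1.dot(x) + b1) + b2

# ...are equivalent to a single linear layer with composed weights
W, b = W2.dot(W1), W2.dot(b1) + b2
one_layer = W.dot(x) + b

assert np.allclose(two_layers, one_layer)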

Also, you may have noticed that the last layer has a softmax activation, which is an odd choice for what becomes an intermediate layer once another linear layer is stacked on top of it.

So we start by removing the last layer, and telling Keras to freeze the weights in all the other layers.


In [1]:
%matplotlib inline
from importlib import reload
import utils; reload(utils)
from utils import *
import keras
from keras import backend as K
from keras.utils.data_utils import get_file
from keras.models import Sequential
from keras.layers import Input
from keras.layers.core import Flatten, Dense, Dropout, Lambda
from keras.layers.convolutional import Convolution2D, MaxPooling2D, ZeroPadding2D
from keras.optimizers import SGD, RMSprop
from keras.preprocessing import image


WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10).  Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
 https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

Using gpu device 0: Tesla K80 (CNMeM is disabled, cuDNN 5103)
Using Theano backend.

In [2]:
from vgg16 import Vgg16
vgg = Vgg16()
model = vgg.model

# vgg.model.summary()

In [3]:
model.pop()  # remove the last (softmax) layer
for layer in model.layers:
    layer.trainable = False  # freeze everything that remains

WARNING: Now that we have modified the definition of model, be careful not to rerun any code in the previous sections.


In [4]:
model.add(Dense(2, activation='softmax'))  # new trainable 2-class output layer
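
It's worth a quick sanity check that only the new layer is trainable; a minimal sketch, listing each layer's trainable flag:

In [ ]:
for i, layer in enumerate(model.layers):
    print(i, type(layer).__name__, layer.trainable)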

Now, compile our updated model, and set up our batches to use the preprocessed images.


In [5]:
path = "data/dogscats/"
model_path = path + 'models/'

trn_data = load_array(model_path+'train_data.bc')
val_data = load_array(model_path+'valid_data.bc')

# These batches are only used to grab the class labels (the image data was
# precomputed above), so batch size 1 is fine; shuffle=False keeps the labels
# in the same order as the saved arrays
val_batches = get_batches(path+'valid', shuffle=False, batch_size=1)
batches = get_batches(path+'train', shuffle=False, batch_size=1)

val_classes = val_batches.classes
trn_classes = batches.classes
val_labels = onehot(val_classes)
trn_labels = onehot(trn_classes)

batch_size = 32
gen = image.ImageDataGenerator()
batches = gen.flow(trn_data, trn_labels, batch_size=batch_size, shuffle=True)
val_batches = gen.flow(val_data, val_labels, batch_size=batch_size, shuffle=False)


Found 2000 images belonging to 2 classes.
Found 23000 images belonging to 2 classes.
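
Here onehot comes from the course's utils.py. A minimal equivalent, assuming the labels are integer class indices starting at 0, is Keras's built-in to_categorical:

In [ ]:
from keras.utils.np_utils import to_categorical

# e.g. [0, 1, 1] -> [[1., 0.], [0., 1.], [0., 1.]]
def onehot_sketch(classes):
    return to_categorical(classes)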

We define a simple function for fitting models, just to save some typing.


In [6]:
def fit_model(model, batches, val_batches, nb_epoch=1):
    # note: samples_per_epoch / nb_epoch / nb_val_samples are the Keras 1 names
    model.fit_generator(batches, samples_per_epoch=batches.N, nb_epoch=nb_epoch,
                        validation_data=val_batches, nb_val_samples=val_batches.N)

...and now we can use it to train the last layer of our model! Be warned: it will run quite slowly, because it still has to compute all the previous layers in order to know what input to pass to the new final layer.

You could always precalculate the output of the penultimate layer, as we did earlier; but since we're only likely to want 1 or 2 iterations, let's just run it.
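
If you did want to precalculate, one possible approach is to wrap everything except the new Dense layer in its own Sequential model and save its predictions. A sketch only, assuming the penultimate activation is VGG16's usual 4096-dimensional fully connected output:

In [ ]:
# compute the penultimate layer's output once...
feat_model = Sequential(model.layers[:-1])
trn_features = feat_model.predict(trn_data, batch_size=batch_size)
val_features = feat_model.predict(val_data, batch_size=batch_size)

# ...then train just the final layer on those features
top_model = Sequential([Dense(2, activation='softmax', input_shape=(4096,))])
top_model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
top_model.fit(trn_features, trn_labels, validation_data=(val_features, val_labels))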


In [7]:
opt = RMSprop(lr=0.1)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

In [8]:
fit_model(model, batches, val_batches, nb_epoch=2)


Epoch 1/2
23000/23000 [==============================] - 619s - loss: 0.9897 - acc: 0.9370 - val_loss: 0.4793 - val_acc: 0.9690
Epoch 2/2
23000/23000 [==============================] - 624s - loss: 0.5502 - acc: 0.9653 - val_loss: 0.3480 - val_acc: 0.9780

In [9]:
model.save_weights(model_path+'finetune1.h5')
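
Since we save the weights after each stage, we can pick up from this point in a later session without retraining:

In [ ]:
model.load_weights(model_path+'finetune1.h5')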

How many layers to retrain?

Well, for the dogs vs. cats problem, the classes are similar to the ImageNet classes the model was originally trained on, so there is no need to retrain more layers. But for the State Farm competition, we may consider retraining more of the Dense layers.

However, even for State Farm there is no need to retrain the convolutional layers, because the spatial relationships in the pictures are very likely to be the same: figuring out whether someone is using a mobile phone is not going to require different spatial features.


In [9]:
layers = model.layers
# Get the index of the first dense layer...
first_dense_idx = [index for index,layer in enumerate(layers) if type(layer) is Dense][0]
# ...and set this and all subsequent layers to trainable
for layer in layers[first_dense_idx:]: layer.trainable=True
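
One caveat: depending on your Keras version, changes to layer.trainable may only take effect once the model is recompiled, so if the dense layers don't appear to be learning, recompile before fitting:

In [ ]:
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])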

In [11]:
K.set_value(opt.lr, 0.01)
fit_model(model, batches, val_batches, 3)


Epoch 1/3
23000/23000 [==============================] - 625s - loss: 0.4156 - acc: 0.9737 - val_loss: 0.2933 - val_acc: 0.9815
Epoch 2/3
23000/23000 [==============================] - 627s - loss: 0.4255 - acc: 0.9729 - val_loss: 0.2746 - val_acc: 0.9825
Epoch 3/3
23000/23000 [==============================] - 627s - loss: 0.4104 - acc: 0.9741 - val_loss: 0.3240 - val_acc: 0.9795

In [12]:
model.save_weights(model_path+'finetune2.h5')

In [10]:
for layer in layers[12:]: layer.trainable=True  # this time unfreeze from layer 12 onwards, which includes some convolutional layers
K.set_value(opt.lr, 0.001)
fit_model(model, batches, val_batches, 4)


Epoch 1/4
23000/23000 [==============================] - 636s - loss: 0.1293 - acc: 0.9700 - val_loss: 0.1078 - val_acc: 0.9785
Epoch 2/4
23000/23000 [==============================] - 641s - loss: 0.1167 - acc: 0.9771 - val_loss: 0.0929 - val_acc: 0.9825
Epoch 3/4
23000/23000 [==============================] - 640s - loss: 0.1230 - acc: 0.9789 - val_loss: 0.1299 - val_acc: 0.9800
Epoch 4/4
23000/23000 [==============================] - 640s - loss: 0.1199 - acc: 0.9787 - val_loss: 0.0994 - val_acc: 0.9840

In [11]:
model.save_weights(model_path+'finetune3.h5')
