In this notebook, I walk through code examples for training AlexNet using Keras with a Theano backend.
Software requirements: Keras (1.x API) with a Theano backend, NumPy, and matplotlib.
We will achieve three objectives:
1. Train AlexNet from scratch on a two-class (dogs vs. cats) dataset.
2. Fine-tune a pre-trained AlexNet layerwise.
3. Use a pre-trained AlexNet as a fixed feature extractor and train a small ANN on the extracted features.
In [1]:
%matplotlib inline
from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import SGD
from alexnet_base import *
from utils import *
Define some variables
In [2]:
batch_size = 16
input_size = (3,227,227)
nb_classes = 2
mean_flag = True # if False, then the mean subtraction layer is not prepended
In the cell below, we define generators, which provide a convenient way to perform real-time data augmentation. We use only basic augmentations: shear, zoom, and horizontal flipping.
In [3]:
#code ported from https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
# this is the augmentation configuration we will use for training
train_datagen = ImageDataGenerator(
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True)
test_datagen = ImageDataGenerator()
train_generator = train_datagen.flow_from_directory(
'../Data/Train',
batch_size=batch_size,
shuffle=True,
target_size=input_size[1:],
class_mode='categorical')
validation_generator = test_datagen.flow_from_directory(
'../Data/Test',
batch_size=batch_size,
target_size=input_size[1:],
shuffle=True,
class_mode='categorical')
Obtain the AlexNet model object. The get_alexnet function abstracts away two things:
1. It implements real-time mean subtraction by prepending a custom layer (sketched below).
2. It sets the weight initialization of each layer to he_normal.
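The sketch below illustrates how such a mean-subtraction layer could be prepended to a model. It is only a rough illustration under assumptions: the helper name prepend_mean_subtraction, the mean values, and the channel ordering are not taken from alexnet_base, whose actual implementation may differ.
# Illustrative sketch only; the real logic lives inside get_alexnet in alexnet_base.
import numpy as np
from keras.layers import Input, Lambda
from keras.models import Model

def prepend_mean_subtraction(base_model, input_size=(3, 227, 227)):
    # Per-channel means (commonly quoted ImageNet values, assumed here),
    # reshaped so they broadcast over height and width (Theano channel-first ordering).
    mean = np.array([123.68, 116.779, 103.939], dtype='float32').reshape(3, 1, 1)
    inp = Input(shape=input_size)
    x = Lambda(lambda img: img - mean, output_shape=input_size)(inp)  # real-time mean subtraction
    return Model(input=inp, output=base_model(x))
The he_normal initialization, on the other hand, is simply passed to each layer as it is built, e.g. Convolution2D(96, 11, 11, init='he_normal', ...).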
In [4]:
alexnet = get_alexnet(input_size,nb_classes,mean_flag)
alexnet.summary()
We train the AlexNet CNN with stochastic gradient descent (SGD) for 80 epochs.
In [5]:
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
alexnet.compile(loss='mse',
optimizer=sgd,
metrics=['accuracy'])
history = alexnet.fit_generator(train_generator,
samples_per_epoch=2000,
validation_data=validation_generator,
nb_val_samples=800,
nb_epoch=80,
verbose=1)
In [6]:
plot_performance(history)
Discussion
The standard AlexNet, trained from scratch, obtains ~84% accuracy. The accuracy and loss plots shown above suggest overfitting. In the next section, we will try to overcome this problem through fine-tuning.
We will follow the strategy suggested in this paper: http://ieeexplore.ieee.org/abstract/document/7426826/?reload=true
The basic idea is to perform the training layerwise. Consider a 5-layer CNN with layers {L1, L2, L3, L4, L5}. In the first round of training, we freeze layers L1-L4 and fine-tune layer L5 for some epochs. In the next round, we train layers L4 and L5; in the round after that, L3, L4 and L5; and so on. Essentially, the training percolates down to the shallower layers.
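The unfreeze_layer_onwards helper imported from utils drives this schedule. As a rough illustration of what such a helper might look like (an assumption, not the actual implementation), it could be written along these lines:
# Illustrative sketch of what unfreeze_layer_onwards (from utils) might do;
# the actual helper may differ.
def unfreeze_layer_onwards(model, layer_name):
    # Freeze every layer before layer_name; unfreeze layer_name and everything after it.
    trainable = False
    for layer in model.layers:
        if layer.name == layer_name:
            trainable = True
        layer.trainable = trainable
    return model
Note that the trainable flags only take effect after the model is recompiled, which is exactly what the training loop below does after each call.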
In [4]:
alexnet = get_alexnet(input_size,nb_classes,mean_flag)
alexnet.load_weights('../convnets-keras/weights/alexnet_weights.h5', by_name=True)
alexnet.summary()
In [5]:
layers = ['dense_3_new','dense_2','dense_1','conv_5_1','conv_4_1','conv_3','conv_2_1','conv_1']
epochs = [10,10,10,10,10,10,10,10]
lr = [1e-2,1e-3,1e-3,1e-3,1e-3,1e-3,1e-3,1e-3]
history_finetune = []
for i, layer in enumerate(layers):
    alexnet = unfreeze_layer_onwards(alexnet, layer)
    sgd = SGD(lr=lr[i], decay=1e-6, momentum=0.9, nesterov=True)
    alexnet.compile(loss='mse',
                    optimizer=sgd,
                    metrics=['accuracy'])
    for epoch in range(epochs[i]):
        h = alexnet.fit_generator(train_generator,
                                  samples_per_epoch=2000,
                                  validation_data=validation_generator,
                                  nb_val_samples=800,
                                  nb_epoch=1,
                                  verbose=1)
        history_finetune = append_history(history_finetune, h)
In [6]:
plot_performance(history_finetune)
Discussion
We get a test accuracy of ~89%. This is almost a 5% jump over training from scratch, which is a huge improvement in test accuracy! One should not usually expect such a big jump; we should bear in mind that the ImageNet dataset, on which AlexNet was pre-trained, already contained dogs and cats among its classes. Nevertheless, many papers report benefits of fine-tuning over training from scratch.
We treat the activation maps from the output of the last convolutional layer as dense feature maps. These are flattened and fed into a single-layer ANN consisting of 256 neurons.
In the cell below, we build a sliced CNN model (alexnet_convolutional_only) which lets us extract features from any layer specified by its name ('convpool_5' in our case).
In [7]:
alexnet = get_alexnet(input_size,nb_classes,mean_flag)
alexnet.load_weights('../convnets-keras/weights/alexnet_weights.h5', by_name=True)
from keras.models import Model
alexnet_convolutional_only = Model(input=alexnet.input, output=alexnet.get_layer('convpool_5').output)
Get the output of the sliced CNN model
In [9]:
#code ported from https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
import numpy as np
generator = train_datagen.flow_from_directory(
'../Data/Train',
target_size=(227, 227),
batch_size=batch_size,
class_mode=None, # this means our generator will only yield batches of data, no labels
shuffle=False) # our data will be in order, so all first 1000 images will be cats, then 1000 dogs
# the predict_generator method returns the output of a model, given
# a generator that yields batches of numpy data
train_data = alexnet_convolutional_only.predict_generator(generator, 2000)
train_labels = np.array([[1, 0]] * 1000 + [[0, 1]] * 1000)
generator = test_datagen.flow_from_directory(
'../Data/Test',
target_size=(227, 227),
batch_size=batch_size,
class_mode=None,
shuffle=False)
validation_data = alexnet_convolutional_only.predict_generator(generator, 800)
validation_labels = np.array([[1,0]] * 400 + [[0,1]] * 400)
We define and train a single-layer ANN model with 256 hidden neurons.
In [10]:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
model = Sequential()
model.add(Flatten(input_shape=train_data.shape[1:]))
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))
sgd = SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(optimizer=sgd,
loss='mse',
metrics=['accuracy'])
history_convpool_5 = model.fit(train_data, train_labels,
nb_epoch=80,
batch_size=batch_size,
validation_data=(validation_data, validation_labels),
verbose=2)
In [11]:
plot_performance(history_convpool_5)
Discussion
We obtain a classification accuracy of ~84%, which is on par with the accuracy obtained when training from scratch. The model converges in very few epochs.