If you want a tutorial, follow along with the lesson video from 48:50.
In [1]:
from theano.sandbox import cuda
cuda.use('gpu1')
In [2]:
%matplotlib inline
from __future__ import division, print_function
from importlib import reload
import utils; reload(utils)
from utils import *
from IPython.display import FileLink
path = 'data/state/sample/'
batch_size = 64
In [3]:
import os
First, let's get the data ready.
Move to the data folder and create a new directory for statefarm:
cd ~/fastai-notes/deeplearning1/nbs/data/
mkdir state
cd state
Download the data from Kaggle. Note: don't forget to accept the competition rules first.
kg download -u yingchi.pei@gmail.com -p FL199473/kag -c state-farm-distracted-driver-detection
Unzip the downloaded data:
unzip -q imgs.zip
unzip -q driver_imgs_list.csv.zip
Count the number of files in the current directory: find . -type f | wc -l
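If you'd rather do the same check from Python, here's a minimal sketch (assuming the zip files have been extracted into train/c0 ... train/c9 under the current state/ directory):
from glob import glob
# Count the training images across the ten class folders c0..c9
print(len(glob('train/c?/*.jpg')))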
In each category's folder inside /train there are around 2.2K images, so let's move about 200 from each category (2,000 in total) into the validation set.
In [13]:
DATA_HOME_DIR='/home/ubuntu/fastai-notes/deeplearning1/nbs/data/state'
%cd $DATA_HOME_DIR
%mkdir valid
%mkdir results
%cd train
In [11]:
for d in glob('c?'): os.mkdir('../valid/'+d)
In [14]:
g = glob('c?/*.jpg')
shuf = np.random.permutation(g)
for i in range(2000): os.rename(shuf[i], DATA_HOME_DIR+'/valid/'+shuf[i])
In [16]:
%cd $DATA_HOME_DIR
In [18]:
%mkdir -p sample/train
%mkdir -p sample/valid
In [19]:
%cd train
In [20]:
from shutil import copyfile
for d in glob('c?'):
    os.mkdir('../sample/train/'+d)
    os.mkdir('../sample/valid/'+d)
In [21]:
g = glob('c?/*.jpg')
shuf = np.random.permutation(g)
for i in range(1500): copyfile(shuf[i], '../sample/train/'+shuf[i])
In [22]:
%cd ../valid
In [23]:
g = glob('c?/*.jpg')
shuf = np.random.permutation(g)
for i in range(1000): copyfile(shuf[i], '../sample/valid/'+shuf[i])
In [24]:
%cd ../../..
%mkdir data/state/sample/test
In [4]:
batches = get_batches(path+'train', batch_size=batch_size)
val_batches = get_batches(path+'valid', batch_size=batch_size)
For reference, get_batches (defined in utils.py) is just a thin wrapper around Keras's flow_from_directory:
def get_batches(dirname, gen=image.ImageDataGenerator(), shuffle=True,
                batch_size=4, class_mode='categorical', target_size=(224,224)):
    return gen.flow_from_directory(dirname, target_size=target_size,
            class_mode=class_mode, shuffle=shuffle, batch_size=batch_size)
From the Keras documentation, fit_generator has the following signature:
fit_generator(self, generator, steps_per_epoch, epochs=1, verbose=1, callbacks=None, validation_data=None, validation_steps=None, class_weight=None, max_queue_size=10, workers=1, use_multiprocessing=False, initial_epoch=0)
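Note that the signature above counts steps_per_epoch in batches, whereas the Keras version used in this notebook takes a samples-per-epoch count as its second positional argument (which is why the calls below pass batches.nb_sample). A rough sketch of the correspondence, assuming the generator exposes its sample count as nb_sample as it does here:
import math
samples_per_epoch = batches.nb_sample                        # what the calls below pass (samples-based API)
steps_per_epoch = math.ceil(batches.nb_sample / batch_size)  # equivalent batch count for the signature above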
In [26]:
??get_classes()
In [5]:
(val_classes, trn_classes, val_labels, trn_labels, val_filenames, filenames,
    test_filename) = get_classes(path)
First, we try the simplest model, using default parameters.
Note the trick of making the first layer a batchnorm layer - so that we don't have to worry about normalizing the input.
In [28]:
model = Sequential([
    BatchNormalization(axis=1, input_shape=(3, 224, 224)),
    Flatten(),
    Dense(10, activation='softmax')
])
In [30]:
model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(batches, batches.nb_sample, nb_epoch=2,
        validation_data=val_batches, nb_val_samples=val_batches.nb_sample)
Out[30]:
Yeah, this model's training is going nowhere... Let's check the number of parameters in our model.
In [31]:
model.summary()
Over 1.5 million parameters - that should be enough. Incidentally, it's worth checking you understand why this is the number of parameters in this layer:
In [32]:
10*3*224*224
Out[32]:
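That product counts only the Dense layer's weights; a quick sanity check that also includes the biases (a sketch, using the layer shapes defined above - the BatchNormalization layer only adds a handful of per-channel parameters on top of this):
flat_inputs = 3 * 224 * 224            # 150,528 values coming out of Flatten()
dense_params = flat_inputs * 10 + 10   # one weight per (input, class) pair, plus one bias per class
print(dense_params)                    # 1,505,290 - the bulk of the ~1.5M parameters reported by summary()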
Since we have a simple model with no regularization and plenty of parameters, it seems most likely that our learning rate is too high. Perhaps it is jumping to a solution where it predicts one or two classes with high confidence, so that it can give a 0 prediction to as many classes as possible.
In [34]:
np.round(model.predict_generator(batches, batches.N)[:20], 2)
Out[34]:
Our hypothesis was correct. It's nearly always predicting class 1 or 9.
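Rather than eyeballing rounded probabilities, a quick way to see the same thing is to count how often each class comes out on top (a sketch, re-running the predictions into preds):
preds = model.predict_generator(batches, batches.N)
# How many images fall into each of the ten predicted classes (c0..c9)
print(np.bincount(preds.argmax(axis=1), minlength=10))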
From the Keras documentation, the default parameters for Adam() are:
Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
So let's try a lower learning rate:
In [38]:
model = Sequential([
    BatchNormalization(axis=1, input_shape=(3, 224, 224)),
    Flatten(),
    Dense(10, activation='softmax')
])
model.compile(Adam(lr=0.0001), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(batches, batches.nb_sample, nb_epoch=4, validation_data=val_batches,
        nb_val_samples=val_batches.nb_sample)
Out[38]:
We're stabilizing at a validation accuracy of 0.7 - much better than a random guess!
Before moving on, let's check that our validation set on the sample is large enough that it gives consistent results:
In [39]:
rnd_batches = get_batches(path+'valid', batch_size=batch_size*2, shuffle=True)
In [43]:
# Evaluate the (shuffled) validation batches 10 times and look at the spread in val_acc
val_res = [model.evaluate_generator(rnd_batches, rnd_batches.nb_sample) for i in range(10)]
np.round(val_res, 2)
Out[43]:
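To summarize the spread rather than reading the raw numbers, we can look at the mean and standard deviation of the accuracies (evaluate_generator returns a [loss, accuracy] pair per run, given the accuracy metric above):
accs = np.array(val_res)[:, 1]         # second column is the accuracy from each run
print(accs.mean(), accs.std())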
Up to now, our models have been over-fitting. Note that we can't use dropout since we only have one simple linear layer. Let's try to decrease overfitting by adding L2 regularization.
In [12]:
model = Sequential([
    BatchNormalization(axis=1, input_shape=(3, 224, 224)),
    Flatten(),
    Dense(10, activation='softmax', W_regularizer=l2(0.01))
])
model.compile(Adam(lr=0.0001), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(batches, batches.nb_sample, nb_epoch=4, validation_data=val_batches,
        nb_val_samples=val_batches.nb_sample)
Out[12]:
This will be a good benchmark for our future models - if we can't beat 80%, then we're not even beating a linear model trained on a sample, so we'll know that's not a good approach.
In [13]:
model = Sequential([
    BatchNormalization(axis=1, input_shape=(3, 224, 224)),
    Flatten(),
    Dense(100, activation='relu'),
    BatchNormalization(),
    Dense(10, activation='softmax')
])
model.compile(Adam(lr=1e-5), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(batches, batches.nb_sample, nb_epoch=2, validation_data=val_batches,
        nb_val_samples=val_batches.nb_sample)
model.optimizer.lr = 0.01
model.fit_generator(batches, batches.nb_sample, nb_epoch=5, validation_data=val_batches,
        nb_val_samples=val_batches.nb_sample)
Out[13]:
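One caveat on the model.optimizer.lr = 0.01 line above: in this version of Keras the learning rate lives in a backend variable, and once the training function has been compiled, rebinding the attribute to a plain float may not take effect. If the learning rate doesn't seem to change, setting the variable in place is a safer bet (a sketch, not from the lesson):
from keras import backend as K
# Set the optimizer's learning-rate variable in place instead of rebinding the attribute
K.set_value(model.optimizer.lr, 0.01)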
Two conv layers with max pooling, followed by a simple dense network, is a good simple CNN to start with:
In [15]:
def conv1(batches):
    model = Sequential([
        BatchNormalization(axis=1, input_shape=(3,224,224)),
        Convolution2D(32,3,3, activation='relu'),
        BatchNormalization(axis=1),
        MaxPooling2D((3,3)),
        Convolution2D(64,3,3, activation='relu'),
        BatchNormalization(axis=1),
        MaxPooling2D((3,3)),
        Flatten(),
        Dense(200, activation='relu'),
        BatchNormalization(),
        Dense(10, activation='softmax')
    ])
    model.compile(Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])
    model.fit_generator(batches, batches.nb_sample, nb_epoch=2, validation_data=val_batches,
            nb_val_samples=val_batches.nb_sample)
    model.optimizer.lr = 0.01
    model.fit_generator(batches, batches.nb_sample, nb_epoch=4, validation_data=val_batches,
            nb_val_samples=val_batches.nb_sample)
    return model
In [16]:
conv1(batches)
Out[16]:
The training set here is very rapidly reaching a very high accuracy. So if we could regularize this, perhaps we could get a reasonable result.
So, what kind of regularization should we try first? As we discussed in lesson 3, we should start with data augmentation.
Often, to find the best data augmentation parameters, we need to try each type one at a time.
For each type, we can try four quite different levels of augmentation and see which one works best - e.g. with a quick loop like the sketch below. In the steps that follow, we've only kept the single best result we found.
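As a sketch of what that search looks like in practice (these particular levels are illustrative, not the exact ones tried in the lesson):
# Try a few width-shift strengths and train the same small CNN on each
for shift in (0.05, 0.1, 0.2, 0.5):   # illustrative levels
    gen_t = image.ImageDataGenerator(width_shift_range=shift)
    aug_batches = get_batches(path+'train', gen_t, batch_size=batch_size)
    model = conv1(aug_batches)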
Width shift: move the image left and right -
In [17]:
gen_t = image.ImageDataGenerator(width_shift_range=0.1)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)
In [18]:
model = conv1(batches)
In [20]:
# Try a different level:
gen_t = image.ImageDataGenerator(width_shift_range=0.5)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)
model = conv1(batches)
There are many other types that we can try.
Here, let's save some time and put them together :)
In [21]:
gen_t = image.ImageDataGenerator(rotation_range=15, height_shift_range=0.05,
        shear_range=0.1, channel_shift_range=20, width_shift_range=0.1)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)
model = conv1(batches)
At first glance this isn't looking encouraging, since validation accuracy is poor and getting worse. But training accuracy is improving and still has a long way to go - so we should try annealing our learning rate and running more epochs before we make a decision.
In [22]:
model.optimizer.lr = 0.0001
model.fit_generator(batches, batches.nb_sample, nb_epoch=5, validation_data=val_batches,
        nb_val_samples=val_batches.nb_sample)
Out[22]:
Lucky we tried that - we're starting to make progress! Let's keep going.
In [23]:
model.fit_generator(batches, batches.nb_sample, nb_epoch=25, validation_data=val_batches,
        nb_val_samples=val_batches.nb_sample)
Out[23]:
Amazingly, using nothing but a small sample, a simple (not pre-trained) model with no dropout, and data augmentation, we're getting results that would get us into the top 50% of the competition! This looks like a great foundation for our further experiments. To go further, we'll need to use the whole dataset, since dropout and data volume are closely related, so we can't tune dropout without using all the data.
In [ ]: