Wayne Nixalo - 4 Jun 2017
Codealong of the Practical Deep Learning I Lesson 4 statefarm Jupyter notebook. My comments are in italics.
6 Jun 2017 NOTE: notebook incomplete. Unable to generate convolutional-model features on test data: "MemoryError".
In [1]:
import theano
In [2]:
import os, sys
sys.path.insert(1, os.path.join('utils'))
In [3]:
%matplotlib inline
from __future__ import print_function, division
path = "data/statefarm/"
import utils; reload(utils)
from utils import *
from IPython.display import FileLink
In [4]:
# batch_size=32
batch_size=16
In [5]:
batches = get_batches(path + 'train', batch_size=batch_size)
val_batches = get_batches(path + 'valid', batch_size=batch_size*2, shuffle=False)
# test_batches = get_batches(path + 'test', batch_size=batch_size, shuffle=False)
In [6]:
(val_classes, trn_classes, val_labels, trn_labels,
val_filenames, trn_filenames, test_filenames) = get_classes(path)
Rather than using batches, we could just import all the data into an array to save some processing time. (In most of the examples below I'm using the batches, however - just because that's how I happened to start out.)
In [ ]:
# trn = get_data(path + 'train')
# val = get_data(path + 'valid')
In [ ]:
# save_array(path + 'results/val.dat', val)
# save_array(path + 'results/trn.dat', trn)
In [ ]:
# val = load_array(path + 'results/val.dat')
# trn = load_array(path + 'results/trn.dat')
We should find that everything that worked on the sample (see statefarm-sample.ipynb) works on the full dataset too. Only better, because now we have more data! So let's see how they do - the models in this section are exact copies of the sample notebook's models.
In [8]:
def conv1(batches):
model = Sequential([
BatchNormalization(axis=1, input_shape=(3,224,224)),
Convolution2D(32, 3, 3, activation='relu'),
BatchNormalization(axis=1),
MaxPooling2D((3,3)),
Convolution2D(64, 3, 3, activation='relu'),
BatchNormalization(axis=1),
MaxPooling2D((3,3)),
Flatten(),
Dense(200, activation='relu'),
BatchNormalization(),
Dense(10, activation='softmax')
])
model.compile(Adam(lr=1e-4), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(batches, batches.nb_sample, nb_epoch=2, validation_data=val_batches,
nb_val_samples=val_batches.nb_sample)
model.optimizer.lr = 1e-3
model.fit_generator(batches, batches.nb_sample, nb_epoch=4, validation_data=val_batches,
nb_val_samples=val_batches.nb_sample)
return model
In [9]:
model = conv1(batches)
In [10]:
gen_t = image.ImageDataGenerator(rotation_range=15, height_shift_range=0.05,
shear_range=0.1, channel_shift_range=20, width_shift_range=0.1)
batches = get_batches(path + 'train', gen_t, batch_size=batch_size)
In [11]:
model = conv1(batches)
In [12]:
model.optimizer.lr = 1e-4
model.fit_generator(batches, batches.nb_sample, nb_epoch=15, validation_data=val_batches,
nb_val_samples=val_batches.nb_sample)
Out[12]:
I'm shocked by how good these results are! We're regularly seeing 75-80% accuracy on the validation set, which puts us into the top third or better of the competition. With such a simple model and no dropout or semi-supervised learning, this really speaks to the power of this approach to data augmentation. Noted - I'm seeing the same numbers here.
Unfortunately, the results are still very unstable - the validation accuracy jumps from epoch to epoch. Perhaps a deeper model with some dropout would help.
In [ ]:
gen_t = image.ImageDataGenerator(rotation_range=15, height_shift_range=0.05,
shear_range=0.1, channel_shift_range=20, width_shift_range=0.1)
batches = get_batches(path + 'train', gen_t, batch_size=batch_size)
In [14]:
model = Sequential([
BatchNormalization(axis=1, input_shape=(3, 224, 224)),
Convolution2D(32, 3, 3, activation='relu'),
BatchNormalization(axis=1),
MaxPooling2D(),
Convolution2D(64, 3, 3, activation='relu'),
BatchNormalization(axis=1),
MaxPooling2D(),
Convolution2D(128, 3, 3, activation='relu'),
BatchNormalization(axis=1),
MaxPooling2D(),
Flatten(),
Dense(200, activation='relu'),
BatchNormalization(),
Dropout(0.5),
Dense(200, activation='relu'),
BatchNormalization(),
Dropout(0.5),
Dense(10, activation='softmax')
])
In [15]:
model.compile(Adam(lr=1e-5), loss='categorical_crossentropy', metrics=['accuracy'])
In [16]:
model.fit_generator(batches, batches.nb_sample, nb_epoch=2, validation_data=val_batches,
nb_val_samples=val_batches.nb_sample)
Out[16]:
In [17]:
model.optimizer.lr=1e-3
In [18]:
model.fit_generator(batches, batches.nb_sample, nb_epoch=10, validation_data=val_batches,
nb_val_samples=val_batches.nb_sample)
Out[18]:
In [19]:
model.optimizer.lr=1e-5
In [20]:
model.fit_generator(batches, batches.nb_sample, nb_epoch=10, validation_data=val_batches,
nb_val_samples=val_batches.nb_sample)
Out[20]:
In [23]:
# os.mkdir(path + 'models')
model.save_weights(path + 'models/conv8_prelim.h5')
This is looking quite a bit better - the accuracy is similar, but the stability is higher. There's still some way to go however...
Since we have so little data, and it is similar to ImageNet images (full-color photos), using pre-trained VGG weights is likely to be helpful - in fact it seems likely that we won't need to fine-tune the convolutional layer weights much, if at all. So we can pre-compute the output of the last convolutional layer, as we did in lesson 3 when we experimented with dropout. (However this means that we can't use full data augmentation, since we can't pre-compute something that changes every image.)
NOTE: there is a workaround to this, discussed in lecture: add augmented versions of the data to the dataset first (this is done further below).
In [8]:
vgg = Vgg16()
model = vgg.model
last_conv_idx = [i for i, l in enumerate(model.layers) if type(l) is Convolution2D][-1]
conv_layers = model.layers[:last_conv_idx + 1]
In [9]:
conv_model = Sequential(conv_layers)
In [8]:
# NOTE: shuffle must be set to False when pre-computing features, so the feature order matches the label/filename order returned by get_classes
batches = get_batches(path + 'train', batch_size=batch_size, shuffle=False)
In [10]:
(val_classes, trn_classes, val_labels, trn_labels,
val_filenames, filenames, test_filenames) = get_classes(path)
In [11]:
conv_feat = conv_model.predict_generator(batches, batches.nb_sample)
conv_val_feat = conv_model.predict_generator(val_batches, val_batches.nb_sample)
# conv_test_feat = conv_model.predict_generator(test_batches, test_batches.nb_sample)
In [12]:
save_array(path + 'results/conv_feat.dat', conv_feat)
save_array(path + 'results/conv_val_feat.dat', conv_val_feat)
# save_array(path + 'results/conv_test_feat.dat', conv_test_feat)
In [10]:
conv_feat = load_array(path + 'results/conv_feat.dat')
conv_val_feat = load_array(path + 'results/conv_val_feat.dat')
# conv_test_feat = load_array(path + 'results/conv_test_feat.dat')
conv_val_feat.shape
Out[10]:
Working on getting conv_test_feat. For some reason I get a nameless "MemoryError" every time I run conv_test_feat = conv_model.predict_generator(test_batches, test_batches.nb_sample).
Update: this doesn't throw an error on the Mac using the CPU; however, the Linux machine is unable to generate the test convolutional features - it throws "MemoryError". I'll see whether I'm able to generate predictions on the test data through the full model instead.
Thought: loading the convolutional training and validation features raises memory usage from ~2.3 GB to ~10.5 GB, and that's on ~20k images; the test data is ~80k images. Could the MemoryError be from overloading RAM? But then why does it work just fine on the Mac? Is it an issue with the version of Theano? It's 0.9.0 on both machines...
Maybe I should find a way to save the generated convolutional test features straight to disk as they're created, in batches - see the sketch below.
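Below is a minimal sketch of one such work-around (my own, not from the lesson): predict the conv features one batch at a time and append each chunk to an on-disk bcolz carray (save_array/load_array in utils are thin wrappers around bcolz), so the full 80k-image feature array never has to sit in RAM at once. The helper name precompute_features_to_disk and the chunking details are assumptions.
import bcolz
import numpy as np

def precompute_features_to_disk(model, batches, rootdir):
    # Assumes batches was created with shuffle=False and class_mode=None,
    # so next(batches) yields just an array of images (no labels).
    n_batches = int(np.ceil(batches.nb_sample / float(batches.batch_size)))
    carr = None
    for i in range(n_batches):
        imgs = next(batches)
        feats = model.predict_on_batch(imgs)
        if carr is None:
            # create the on-disk array from the first chunk
            carr = bcolz.carray(feats, rootdir=rootdir, mode='w')
        else:
            # append subsequent chunks along the first axis
            carr.append(feats)
    carr.flush()
    return carr

# Hypothetical usage:
# test_batches = get_batches(path + 'test', batch_size=batch_size, shuffle=False, class_mode=None)
# precompute_features_to_disk(conv_model, test_batches, path + 'results/conv_test_feat.dat')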
In [11]:
test_batches = get_batches(path + 'test', batch_size=1, shuffle=False, class_mode=None)
In [12]:
save_array(path + 'results/conv_test_feat.dat', conv_model.predict_generator(test_batches, test_batches.nb_sample))
In [ ]:
# save_array(path + 'results/conv_test_feat.dat', conv_test_feat)  # (skipped: the features were saved directly to disk in the cell above)
In [14]:
def get_bn_layers(p):
return [
MaxPooling2D(input_shape=conv_layers[-1].output_shape[1:]),
Flatten(),
Dropout(p/2),
Dense(128, activation='relu'),
BatchNormalization(),
Dropout(p/2),
Dense(128, activation='relu'),
BatchNormalization(),
Dropout(p),
Dense(10, activation='softmax')
]
In [15]:
p = 0.8
In [16]:
bn_model = Sequential(get_bn_layers(p))
bn_model.compile(Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])
In [17]:
bn_model.fit(conv_feat, trn_labels, batch_size=batch_size, nb_epoch=1,
validation_data=(conv_val_feat, val_labels))
Out[17]:
In [18]:
bn_model.optimizer.lr=0.01
In [19]:
bn_model.fit(conv_feat, trn_labels, batch_size=batch_size, nb_epoch=2,
validation_data=(conv_val_feat, val_labels))
Out[19]:
In [20]:
bn_model.save_weights(path + 'models/conv8.h5')
In [ ]:
# bn_model.load_weights(path + 'models/conv8.h5')
I'm going to leave off the following sections (concatenating data-augmented versions with the training-data features, and pseudo-labeling) for time. As for the massive memory overhead of concatenating data-augmented files/features: use bcolz to save them and work on them in batches (as in the sketch above). I'm sure I'll get experience with that soon. I may train the model w/ dropout below.
In [21]:
bn_model.optimizer.lr=0.001
bn_model.fit(conv_feat, trn_labels, batch_size=batch_size, nb_epoch=4,
validation_data=(conv_val_feat, val_labels))
Out[21]:
In [22]:
bn_model.optimizer.lr=0.0001
bn_model.fit(conv_feat, trn_labels, batch_size=batch_size, nb_epoch=4,
validation_data=(conv_val_feat, val_labels))
Out[22]:
In [23]:
bn_model.optimizer.lr=0.00001
bn_model.fit(conv_feat, trn_labels, batch_size=batch_size, nb_epoch=8,
validation_data=(conv_val_feat, val_labels))
Out[23]:
In [ ]:
gen_t = image.ImageDataGenerator(rotation_range=15, height_shift_range=0.05,
shear_range=0.1, channel_shift_range=20, width_shift_range=0.1)
da_batches = get_batches(path + 'train', gen_t, batch_size=batch_size, shuffle=False)
We'll use those to create a dataset of convolutional features 5x bigger than the training set.
In [ ]:
da_conv_feat = conv_model.predict_generator(da_batches, da_batches.nb_sample*5)
In [ ]:
save_array(path + 'results/da_conv_feat.dat', da_conv_feat)
In [ ]:
da_conv_feat = load_array(path + 'results/da_conv_feat.dat')
Let's include the real training data as well, in its non-augmented form.
In [ ]:
da_conv_feat = np.concatenate([da_conv_feat, conv_feat])
Since we've now got a dataset 6x bigger than before, we'll need to copy our labels 6 times too.
In [ ]:
da_trn_labels = np.concatenate([trn_labels]*6)
Based on some experiments, the previous model works well with bigger dense layers.
In [24]:
def get_bn_da_layers(p):
return [
MaxPooling2D(input_shape = conv_layers[-1].output_shape[1:]),
Flatten(),
Dropout(p),
Dense(256, activation='relu'),
BatchNormalization(),
Dropout(p),
Dense(256, activation='relu'),
BatchNormalization(),
Dropout(p),
Dense(10, activation='softmax')
]
In [25]:
p=0.8
In [26]:
bn_model = Sequential(get_bn_da_layers(p))
bn_model.compile(Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])
Now we can train the model as usual, with pre-computed augmented data.
In [ ]:
bn_model.fit(da_conv_feat, da_trn_labels, batch_size=batch_size, nb_epoch=1,
validation_data=(conv_val_feat, val_labels))
In [ ]:
bn_model.optimizer.lr=0.01
In [ ]:
bn_model.fit(da_conv_feat, da_trn_labels, batch_size=batch_size, nb_epoch=4,
validation_data=(conv_val_feat, val_labels))
In [ ]:
bn_model.optimizer.lr=1e-4
In [ ]:
bn_model.fit(da_conv_feat, da_trn_labels, batch_size=batch_size, nb_epoch=4,
validation_data=(conv_val_feat, val_labels))
Looks good - let's save those weights.
In [ ]:
bn_model.save_weights(path + 'models/da_conv8_1.h5')
In [27]:
bn_model.fit(conv_feat, trn_labels, batch_size=batch_size, nb_epoch=1,
validation_data=(conv_val_feat, val_labels))
bn_model.optimizer.lr=0.01
bn_model.fit(conv_feat, trn_labels, batch_size=batch_size, nb_epoch=4,
validation_data=(conv_val_feat, val_labels))
bn_model.optimizer.lr=1e-4
bn_model.fit(conv_feat, trn_labels, batch_size=batch_size, nb_epoch=4,
validation_data=(conv_val_feat, val_labels))
Out[27]:
In [30]:
bn_model.save_weights(path + 'models/conv8_bn_1.h5')
We're going to try using a combination of pseudo-labeling and knowledge distillation to allow us to use unlabeled data (i.e. to do semi-supervised learning). For our initial experiment we'll use the validation set as the unlabeled data, so that we can see that it is working without using the test set. At a later date we'll try using the test set.
To do this, we can simply calculate the predictions of our model...
In [28]:
val_pseudo = bn_model.predict(conv_val_feat, batch_size=batch_size)
...concatenate them with our training labels...
In [29]:
comb_pseudo = np.concatenate([trn_labels, val_pseudo])
comb_feat = np.concatenate([conv_feat, conv_val_feat])
In [ ]:
comb_pseudo = np.concatenate([da_trn_labels, val_pseudo])
In [ ]:
comb_feat = np.concatenate([da_conv_feat, conv_val_feat])
...and fine-tune our model using that data.
In [ ]:
bn_model.load_weights(path + 'models/da_conv8_1.h5')
In [ ]:
bn_model.fit(comb_feat, comb_pseudo, batch_size=batch_size, nb_epoch=1,
validation_data=(conv_val_feat, val_labels))
In [ ]:
bn_model.fit(comb_feat, comb_pseudo, batch_size=batch_size, nb_epoch=4,
validation_data=(conv_val_feat, val_labels))
In [ ]:
bn_model.optimizer.lr=1e-5
In [ ]:
bn_model.fit(comb_feat, comb_pseudo, batch_size=batch_size, nb_epoch=4,
validation_data=(conv_val_feat, val_labels))
That's a distinct improvement - even though the validation set isn't very big. This looks encouraging for when we try this on the test set.
In [ ]:
bn_model.save_weights(path + 'models/bn-ps8.h5')
In [31]:
def do_clip(arr, mx): return np.clip(arr, (1 - mx)/9, mx)  # cap predictions at mx and floor them at (1-mx)/9, spreading the remainder over the other 9 classes
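A rough numerical illustration (my own, not from the lesson) of why this clipping helps: the Kaggle log-loss metric punishes a confidently wrong prediction far more harshly than a merely wrong one, so bounding the probabilities caps the worst-case penalty per image.
import numpy as np

def logloss_for_true_class(p_true, eps=1e-15):
    # log-loss contribution of a single example, given the probability
    # the model assigned to its true class
    return -np.log(np.clip(p_true, eps, 1 - eps))

print(logloss_for_true_class(1e-7))            # unclipped, confidently wrong: ~16.1
print(logloss_for_true_class((1 - 0.93) / 9))  # clipped lower bound for 10 classes: ~4.9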
In [33]:
val_preds = bn_model.predict(conv_val_feat, batch_size=batch_size)
In [34]:
keras.metrics.categorical_crossentropy(val_labels, do_clip(val_preds, 0.93)).eval()
Out[34]:
In [35]:
conv_test_feat = conv_model.predict_generator(test_batches, test_batches.nb_sample)
In [ ]:
conv_test_feat = load_array(path + 'results/conv_test_feat.dat')
In [ ]:
preds = bn_model.predict(conv_test_feat, batch_size=batch_size*2)
In [ ]:
subm = do_clip(preds, 0.93)
In [ ]:
subm_name = path + 'results/subm.gz'
In [ ]:
classes = sorted(batches.class_indices, key=batches.class_indices.get)
In [ ]:
submission = pd.DataFrame(subm, columns=classes)
submission.insert(0, 'img', [a[4:] for a in test_filenames]) # the slice strips the subdirectory prefix the generator prepends; the offset depends on that directory's name
# submission.insert(0, 'img', [f[8:] for f in test_filenames]) # e.g. [8:] would strip an 'unknown/' prefix
submission.head()
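A more robust alternative (my suggestion, not from the lesson) is to take the basename rather than hard-coding a slice offset, so it works regardless of the test subdirectory's name:
# submission.insert(0, 'img', [os.path.basename(f) for f in test_filenames])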
In [ ]:
submission.to_csv(subm_name, index=False, compression='gzip')
In [ ]:
FileLink(subm_name)
This gets 0.534 on the leaderboard.