Wayne Nixalo - 4 Jun 2017

Codealong of the Practical Deep Learning I, Lesson 4 statefarm Jupyter notebook. My comments are in italics.

6 Jun 2017 NOTE: notebook incomplete. Unable to generate convolutional-model features on the test data; a "MemoryError" is raised.

Enter State Farm


In [1]:
import theano


/home/wnixalo/miniconda3/envs/FAI/lib/python2.7/site-packages/theano/gpuarray/dnn.py:135: UserWarning: Your cuDNN version is more recent than Theano. If you encounter problems, try updating Theano or downgrading cuDNN to version 5.1.
  warnings.warn("Your cuDNN version is more recent than "
Using cuDNN version 6021 on context None
Mapped name None to device cuda: GeForce GTX 870M (0000:01:00.0)

In [2]:
import os, sys
sys.path.insert(1, os.path.join('utils'))

In [3]:
%matplotlib inline
from __future__ import print_function, division
path = "data/statefarm/"
import utils; reload(utils)
from utils import *
from IPython.display import FileLink


Using Theano backend.

In [4]:
# batch_size=32
batch_size=16

Setup Batches


In [5]:
batches = get_batches(path + 'train', batch_size=batch_size)
val_batches = get_batches(path + 'valid', batch_size=batch_size*2, shuffle=False)
# test_batches = get_batches(path + 'test', batch_size=batch_size, shuffle=False)


Found 19463 images belonging to 10 classes.
Found 2961 images belonging to 10 classes.

In [6]:
(val_classes, trn_classes, val_labels, trn_labels, 
    val_filenames, trn_filenames, test_filenames) = get_classes(path)


Found 19463 images belonging to 10 classes.
Found 2961 images belonging to 10 classes.
Found 79726 images belonging to 1 classes.

Rather than using batches, we could just import all the data into an array to save some processing time. (In most examples, I'm using the batches, however - just because that's how I happened to start out.)


In [ ]:
# trn = get_data(path + 'train')
# val = get_data(path + 'valid')

In [ ]:
# save_array(path + 'results/val.dat', val)
# save_array(path + 'results/trn.dat', trn)

In [ ]:
# val = load_array(path + 'results/val.dat')
# trn = load_array(path + 'results/trn.dat')

Re-run sample experiments on full dataset

We should find that everything that worked on the sample (see statefarm-sample.ipynb) works on the full dataset too. Only better! Because now we have more data. So let's see how they go - the models in this section are exact copies of the sample notebook models.

Single Conv Layer


In [8]:
def conv1(batches):
    model = Sequential([
                BatchNormalization(axis=1, input_shape=(3,224,224)),
                Convolution2D(32, 3, 3, activation='relu'),
                BatchNormalization(axis=1),
                MaxPooling2D((3,3)),
                Convolution2D(64, 3, 3, activation='relu'),
                BatchNormalization(axis=1),
                MaxPooling2D((3,3)),
                Flatten(),
                Dense(200, activation='relu'),
                BatchNormalization(),
                Dense(10, activation='softmax')
            ])
    model.compile(Adam(lr=1e-4), loss='categorical_crossentropy', metrics=['accuracy'])
    model.fit_generator(batches, batches.nb_sample, nb_epoch=2, validation_data=val_batches,
                        nb_val_samples=val_batches.nb_sample)
    
    model.optimizer.lr = 1e-3
    model.fit_generator(batches, batches.nb_sample, nb_epoch=4, validation_data=val_batches,
                        nb_val_samples=val_batches.nb_sample)
    return model

In [9]:
model = conv1(batches)


Epoch 1/2
19463/19463 [==============================] - 246s - loss: 0.2081 - acc: 0.9451 - val_loss: 1.6457 - val_acc: 0.4799
Epoch 2/2
19463/19463 [==============================] - 230s - loss: 0.0190 - acc: 0.9968 - val_loss: 2.0091 - val_acc: 0.3600
Epoch 1/4
19463/19463 [==============================] - 234s - loss: 0.0065 - acc: 0.9994 - val_loss: 1.8342 - val_acc: 0.4401
Epoch 2/4
19463/19463 [==============================] - 231s - loss: 0.0033 - acc: 0.9998 - val_loss: 1.8580 - val_acc: 0.4076
Epoch 3/4
19463/19463 [==============================] - 231s - loss: 0.0056 - acc: 0.9992 - val_loss: 1.1711 - val_acc: 0.6494
Epoch 4/4
19463/19463 [==============================] - 230s - loss: 0.0096 - acc: 0.9987 - val_loss: 2.0138 - val_acc: 0.4076

Interestingly, with no regularization or augmentation, we're getting some reasonable results from our simple convolutional model. So with augmentation, we hopefully will see some very good results.

Data Augmentation


In [10]:
gen_t = image.ImageDataGenerator(rotation_range=15, height_shift_range=0.05,
                shear_range=0.1, channel_shift_range=20, width_shift_range=0.1)
batches = get_batches(path + 'train', gen_t, batch_size=batch_size)


Found 19463 images belonging to 10 classes.

In [11]:
model = conv1(batches)


Epoch 1/2
19463/19463 [==============================] - 240s - loss: 1.2663 - acc: 0.5893 - val_loss: 1.1617 - val_acc: 0.6278
Epoch 2/2
19463/19463 [==============================] - 236s - loss: 0.6408 - acc: 0.7995 - val_loss: 0.9994 - val_acc: 0.6964
Epoch 1/4
19463/19463 [==============================] - 238s - loss: 0.4542 - acc: 0.8631 - val_loss: 0.8759 - val_acc: 0.7221
Epoch 2/4
19463/19463 [==============================] - 235s - loss: 0.3503 - acc: 0.9010 - val_loss: 0.8460 - val_acc: 0.7291
Epoch 3/4
19463/19463 [==============================] - 235s - loss: 0.2827 - acc: 0.9194 - val_loss: 0.9318 - val_acc: 0.7717
Epoch 4/4
19463/19463 [==============================] - 235s - loss: 0.2340 - acc: 0.9347 - val_loss: 0.7082 - val_acc: 0.7943

In [12]:
model.optimizer.lr = 1e-4
model.fit_generator(batches, batches.nb_sample, nb_epoch=15, validation_data=val_batches,
                    nb_val_samples=val_batches.nb_sample)


Epoch 1/15
19463/19463 [==============================] - 238s - loss: 0.2069 - acc: 0.9403 - val_loss: 0.6503 - val_acc: 0.8122
Epoch 2/15
19463/19463 [==============================] - 235s - loss: 0.1960 - acc: 0.9449 - val_loss: 0.7898 - val_acc: 0.7683
Epoch 3/15
19463/19463 [==============================] - 235s - loss: 0.1721 - acc: 0.9506 - val_loss: 0.6337 - val_acc: 0.8220
Epoch 4/15
19463/19463 [==============================] - 235s - loss: 0.1537 - acc: 0.9560 - val_loss: 0.7949 - val_acc: 0.8051
Epoch 5/15
19463/19463 [==============================] - 235s - loss: 0.1443 - acc: 0.9582 - val_loss: 0.7589 - val_acc: 0.7882
Epoch 6/15
19463/19463 [==============================] - 235s - loss: 0.1336 - acc: 0.9637 - val_loss: 0.7472 - val_acc: 0.7825
Epoch 7/15
19463/19463 [==============================] - 235s - loss: 0.1236 - acc: 0.9654 - val_loss: 0.8541 - val_acc: 0.7717
Epoch 8/15
19463/19463 [==============================] - 236s - loss: 0.1118 - acc: 0.9687 - val_loss: 0.6983 - val_acc: 0.8197
Epoch 9/15
19463/19463 [==============================] - 234s - loss: 0.1133 - acc: 0.9689 - val_loss: 0.6898 - val_acc: 0.8126
Epoch 10/15
19463/19463 [==============================] - 236s - loss: 0.0975 - acc: 0.9733 - val_loss: 0.5965 - val_acc: 0.8183
Epoch 11/15
19463/19463 [==============================] - 235s - loss: 0.0930 - acc: 0.9739 - val_loss: 0.7155 - val_acc: 0.8153
Epoch 12/15
19463/19463 [==============================] - 235s - loss: 0.0961 - acc: 0.9722 - val_loss: 0.7055 - val_acc: 0.8028
Epoch 13/15
19463/19463 [==============================] - 235s - loss: 0.0858 - acc: 0.9756 - val_loss: 0.6565 - val_acc: 0.8200
Epoch 14/15
19463/19463 [==============================] - 235s - loss: 0.0769 - acc: 0.9785 - val_loss: 0.5436 - val_acc: 0.8457
Epoch 15/15
19463/19463 [==============================] - 236s - loss: 0.0749 - acc: 0.9788 - val_loss: 0.6557 - val_acc: 0.8230
Out[12]:
<keras.callbacks.History at 0x7fec7a019410>

I'm shocked by how good these results are! We're regularly seeing 75-80% accuracy on the validation set, which puts us into the top third or better of the competition. With such a simple model and no dropout or semi-supervised learning, this really speaks to the power of this approach to data augmentation. Noted: I'm seeing the same numbers.
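
Side note on the learning-rate changes in this notebook: once fit_generator has been called, the training function is already compiled, and assigning a plain float via model.optimizer.lr = 1e-4 replaces the attribute rather than updating the Theano shared variable that the compiled function reads, so the learning rate may not actually change. A minimal sketch of the more reliable way (assuming Keras 1.x, with the backend imported as K):

from keras import backend as K
K.set_value(model.optimizer.lr, 1e-4)     # update the shared variable in place
print(K.get_value(model.optimizer.lr))    # confirm the new learning rate

The same applies to the bn_model.optimizer.lr assignments further down.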

Four Conv/Pooling pairs + Dropout

Unfortunately, the results are still very unstable - the validation accuracy jumps from epoch to epoch. Perhaps a deeper model with some dropout would help.


In [ ]:
gen_t = image.ImageDataGenerator(rotation_range=15, height_shift_range=0.05,
                shear_range=0.1, channel_shift_range=20, width_shift_range=0.1)
batches = get_batches(path + 'train', gen_t, batch_size=batch_size)

In [14]:
model = Sequential([
            BatchNormalization(axis=1, input_shape=(3, 224, 224)),
            Convolution2D(32, 3, 3, activation='relu'),
            BatchNormalization(axis=1),
            MaxPooling2D(),
            Convolution2D(64, 3, 3, activation='relu'),
            BatchNormalization(axis=1),
            MaxPooling2D(),
            Convolution2D(128, 3, 3, activation='relu'),
            BatchNormalization(axis=1),
            MaxPooling2D(),
            Flatten(),
            Dense(200, activation='relu'),
            BatchNormalization(),
            Dropout(0.5),
            Dense(200, activation='relu'),
            BatchNormalization(),
            Dropout(0.5),
            Dense(10, activation='softmax')
        ])

In [15]:
model.compile(Adam(lr=1e-5), loss='categorical_crossentropy', metrics=['accuracy'])

In [16]:
model.fit_generator(batches, batches.nb_sample, nb_epoch=2, validation_data=val_batches,
                    nb_val_samples=val_batches.nb_sample)


Epoch 1/2
19463/19463 [==============================] - 295s - loss: 3.0645 - acc: 0.1751 - val_loss: 1.6314 - val_acc: 0.4877
Epoch 2/2
19463/19463 [==============================] - 295s - loss: 2.3852 - acc: 0.2982 - val_loss: 1.3944 - val_acc: 0.5326
Out[16]:
<keras.callbacks.History at 0x7fec90efced0>

In [17]:
model.optimizer.lr=1e-3

In [18]:
model.fit_generator(batches, batches.nb_sample, nb_epoch=10, validation_data=val_batches,
                    nb_val_samples=val_batches.nb_sample)


Epoch 1/10
19463/19463 [==============================] - 295s - loss: 2.0652 - acc: 0.3742 - val_loss: 1.2693 - val_acc: 0.6153
Epoch 2/10
19463/19463 [==============================] - 295s - loss: 1.8344 - acc: 0.4340 - val_loss: 1.2215 - val_acc: 0.6217
Epoch 3/10
19463/19463 [==============================] - 295s - loss: 1.6298 - acc: 0.4791 - val_loss: 1.1868 - val_acc: 0.6535
Epoch 4/10
19463/19463 [==============================] - 295s - loss: 1.4825 - acc: 0.5243 - val_loss: 1.1558 - val_acc: 0.6677
Epoch 5/10
19463/19463 [==============================] - 295s - loss: 1.3684 - acc: 0.5582 - val_loss: 1.1495 - val_acc: 0.6846
Epoch 6/10
19463/19463 [==============================] - 295s - loss: 1.2752 - acc: 0.5852 - val_loss: 1.1122 - val_acc: 0.6684
Epoch 7/10
19463/19463 [==============================] - 295s - loss: 1.2023 - acc: 0.6105 - val_loss: 1.0681 - val_acc: 0.6836
Epoch 8/10
19463/19463 [==============================] - 295s - loss: 1.1268 - acc: 0.6298 - val_loss: 1.1334 - val_acc: 0.6802
Epoch 9/10
19463/19463 [==============================] - 294s - loss: 1.0686 - acc: 0.6486 - val_loss: 1.0859 - val_acc: 0.6842
Epoch 10/10
19463/19463 [==============================] - 295s - loss: 1.0056 - acc: 0.6700 - val_loss: 0.9440 - val_acc: 0.7264
Out[18]:
<keras.callbacks.History at 0x7fec86035d50>

In [19]:
model.optimizer.lr=1e-5

In [20]:
model.fit_generator(batches, batches.nb_sample, nb_epoch=10, validation_data=val_batches,
                    nb_val_samples=val_batches.nb_sample)


Epoch 1/10
19463/19463 [==============================] - 295s - loss: 0.9563 - acc: 0.6856 - val_loss: 1.0102 - val_acc: 0.7119
Epoch 2/10
19463/19463 [==============================] - 295s - loss: 0.9028 - acc: 0.7030 - val_loss: 0.9853 - val_acc: 0.7126
Epoch 3/10
19463/19463 [==============================] - 295s - loss: 0.8503 - acc: 0.7177 - val_loss: 1.0061 - val_acc: 0.7221
Epoch 4/10
19463/19463 [==============================] - 295s - loss: 0.8053 - acc: 0.7386 - val_loss: 0.9059 - val_acc: 0.7393
Epoch 5/10
19463/19463 [==============================] - 295s - loss: 0.7798 - acc: 0.7440 - val_loss: 0.8938 - val_acc: 0.7622
Epoch 6/10
19463/19463 [==============================] - 295s - loss: 0.7333 - acc: 0.7591 - val_loss: 0.9187 - val_acc: 0.7518
Epoch 7/10
19463/19463 [==============================] - 295s - loss: 0.6912 - acc: 0.7713 - val_loss: 0.8440 - val_acc: 0.7653
Epoch 8/10
19463/19463 [==============================] - 295s - loss: 0.6783 - acc: 0.7787 - val_loss: 0.8569 - val_acc: 0.7724
Epoch 9/10
19463/19463 [==============================] - 294s - loss: 0.6495 - acc: 0.7853 - val_loss: 0.8414 - val_acc: 0.7859
Epoch 10/10
19463/19463 [==============================] - 294s - loss: 0.6258 - acc: 0.7978 - val_loss: 0.8342 - val_acc: 0.7818
Out[20]:
<keras.callbacks.History at 0x7fec86035e50>

In [23]:
# os.mkdir(path + 'models')
model.save_weights(path + 'models/conv8_prelim.h5')

This is looking quite a bit better - the accuracy is similar, but the stability is higher. There's still some way to go however...

Imagenet Conv Features

Since we have so little data, and it is similar to ImageNet images (full-color photos), using pre-trained VGG weights is likely to be helpful - in fact it seems likely that we won't need to fine-tune the convolutional layer weights much, if at all. So we can pre-compute the output of the last convolutional layer, as we did in lesson 3 when we experimented with dropout. (However this means that we can't use full data augmentation, since we can't pre-compute something that changes every image.)

NOTE: there is a work-around to this, discussed in lecture: add augmented versions of the data to the dataset first (this is what the "Pre-computed DataAugmentation + Dropout" section below does).


In [8]:
vgg = Vgg16()
model = vgg.model
last_conv_idx = [i for i, l in enumerate(model.layers) if type(l) is Convolution2D][-1]
conv_layers = model.layers[:last_conv_idx + 1]

In [9]:
conv_model = Sequential(conv_layers)

In [8]:
# ¡ batches shuffle must be set to False when pre-computing features !
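# (otherwise the precomputed features would be in shuffled order and wouldn't line up with trn_labels / val_labels from get_classes)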
batches = get_batches(path + 'train', batch_size=batch_size, shuffle=False)


Found 19463 images belonging to 10 classes.

In [10]:
(val_classes, trn_classes, val_labels, trn_labels,
    val_filenames, filenames, test_filenames) = get_classes(path)


Found 19463 images belonging to 10 classes.
Found 2961 images belonging to 10 classes.
Found 79726 images belonging to 1 classes.

In [11]:
conv_feat = conv_model.predict_generator(batches, batches.nb_sample)
conv_val_feat = conv_model.predict_generator(val_batches, val_batches.nb_sample)
# conv_test_feat = conv_model.predict_generator(test_batches, test_batches.nb_sample)

In [12]:
save_array(path + 'results/conv_feat.dat', conv_feat)
save_array(path + 'results/conv_val_feat.dat', conv_val_feat)
# save_array(path + 'results/conv_test_feat.dat', conv_test_feat)

In [10]:
conv_feat = load_array(path + 'results/conv_feat.dat')
conv_val_feat = load_array(path + 'results/conv_val_feat.dat')
# conv_test_feat = load_array(path + 'results/conv_test_feat.dat')
conv_val_feat.shape


Out[10]:
(2961, 512, 14, 14)

(Working on getting conv_test_feat. For some reason I'm getting a nameless "MemoryError:" every time I run conv_test_feat = conv_model.predict_generator(test_batches, test_batches.nb_sample).)

Update: this doesn't throw an error on the Mac using the CPU; however, the Linux machine is unable to generate the test convolutional features, throwing a "MemoryError". I'll see whether I'm able to generate predictions on the test data through the full model.

Thought: loading the convolutional training and validation features raises memory load from ~2.3 GB to ~10.5 GB, and that's on ~20k images; the test set is ~80k. Could the MemoryError be from overloading RAM? But then why does it work just fine on the Mac? Is it an issue with the version of Theano? It's 0.9.0 on both machines...

Maybe I should find a way to save the generated convolutional test features straight to disk, in batches, as they're created.
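
A minimal sketch of that idea, assuming bcolz is installed (it's what save_array / load_array wrap) and that one batch of predictions fits comfortably in memory. The batch loop and the .bc filename are mine, not something run in this notebook: predict the test features one batch at a time with predict_on_batch and append each chunk to an on-disk carray, so the full ~80k-image feature array never has to sit in RAM.

import numpy as np
import bcolz

test_batches = get_batches(path + 'test', batch_size=batch_size, shuffle=False, class_mode=None)
fname = path + 'results/conv_test_feat.bc'   # hypothetical output path
carr = None
for i in xrange(int(np.ceil(test_batches.nb_sample / float(batch_size)))):
    chunk = conv_model.predict_on_batch(next(test_batches))   # one batch of conv features
    if carr is None:
        carr = bcolz.carray(chunk, rootdir=fname, mode='w')   # create the on-disk array from the first chunk
    else:
        carr.append(chunk)
carr.flush()

The saved features could then be re-opened lazily with bcolz.open(fname) and fed to the dense model in slices, instead of loading the whole array back into memory.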


In [11]:
test_batches = get_batches(path + 'test', batch_size=1, shuffle=False, class_mode=None)


Found 79726 images belonging to 1 classes.

In [12]:
save_array(path + '/results/conv_test_feat.dat', conv_model.predict_generator(test_batches, test_batches.nb_sample))


---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-12-023d498bb144> in <module>()
----> 1 save_array(path + '/results/conv_test_feat.dat', conv_model.predict_generator(test_batches, test_batches.nb_sample))

/home/wnixalo/miniconda3/envs/FAI/lib/python2.7/site-packages/keras/models.pyc in predict_generator(self, generator, val_samples, max_q_size, nb_worker, pickle_safe)
   1010                                             max_q_size=max_q_size,
   1011                                             nb_worker=nb_worker,
-> 1012                                             pickle_safe=pickle_safe)
   1013 
   1014     def get_config(self):

/home/wnixalo/miniconda3/envs/FAI/lib/python2.7/site-packages/keras/engine/training.pyc in predict_generator(self, generator, val_samples, max_q_size, nb_worker, pickle_safe)
   1776                     for out in outs:
   1777                         shape = (val_samples,) + out.shape[1:]
-> 1778                         all_outs.append(np.zeros(shape, dtype=K.floatx()))
   1779 
   1780                 for i, out in enumerate(outs):

MemoryError: 

In [ ]:
save_array(path + 'results/conv_test_feat.dat', conv_test_feat)

BatchNorm Dense layers on pretrained Conv layers

Since we've pre-computed the output of the last convolutional layer, we need to create a network that takes that as input, and predicts our 10 classes. Let's try using a simplified version of VGG's dense layers.


In [14]:
def get_bn_layers(p):
    return [
        MaxPooling2D(input_shape=conv_layers[-1].output_shape[1:]),
        Flatten(),
        Dropout(p/2),
        Dense(128, activation='relu'),
        BatchNormalization(),
        Dropout(p/2),
        Dense(128, activation='relu'),
        BatchNormalization(),
        Dropout(p),
        Dense(10, activation='softmax')
        ]

In [15]:
p = 0.8

In [16]:
bn_model = Sequential(get_bn_layers(p))
bn_model.compile(Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

In [17]:
bn_model.fit(conv_feat, trn_labels, batch_size=batch_size, nb_epoch=1,
             validation_data=(conv_val_feat, val_labels))


Train on 19463 samples, validate on 2961 samples
Epoch 1/1
19463/19463 [==============================] - 5s - loss: 1.5614 - acc: 0.5749 - val_loss: 0.6704 - val_acc: 0.7497
Out[17]:
<keras.callbacks.History at 0x7f899e2094d0>

In [18]:
bn_model.optimizer.lr=0.01

In [19]:
bn_model.fit(conv_feat, trn_labels, batch_size=batch_size, nb_epoch=2,
             validation_data=(conv_val_feat, val_labels))


Train on 19463 samples, validate on 2961 samples
Epoch 1/2
19463/19463 [==============================] - 5s - loss: 0.2692 - acc: 0.9172 - val_loss: 0.5943 - val_acc: 0.7980
Epoch 2/2
19463/19463 [==============================] - 5s - loss: 0.1412 - acc: 0.9605 - val_loss: 0.6282 - val_acc: 0.7663
Out[19]:
<keras.callbacks.History at 0x7f8bc1c734d0>

In [20]:
bn_model.save_weights(path + 'models/conv8.h5')

In [ ]:
# bn_model.load_weights(path + 'models/conv8.h5')

NOTE:

I'm going to leave off the following sections - concatenating data-augmented versions with the training-data features, and pseudo-labeling - for time. For the massive memory overhead of concatenating data-augmented files/features: use bcolz to save them and work on them in batches (along the lines of the chunked-save sketch above). I'm sure I'll get experience with that soon. I may train the model with dropout below.


In [21]:
bn_model.optimizer.lr=0.001
bn_model.fit(conv_feat, trn_labels, batch_size=batch_size, nb_epoch=4,
             validation_data=(conv_val_feat, val_labels))


Train on 19463 samples, validate on 2961 samples
Epoch 1/4
19463/19463 [==============================] - 5s - loss: 0.1100 - acc: 0.9687 - val_loss: 0.6599 - val_acc: 0.7818
Epoch 2/4
19463/19463 [==============================] - 5s - loss: 0.0746 - acc: 0.9787 - val_loss: 0.6599 - val_acc: 0.7849
Epoch 3/4
19463/19463 [==============================] - 5s - loss: 0.0638 - acc: 0.9811 - val_loss: 0.8356 - val_acc: 0.7329
Epoch 4/4
19463/19463 [==============================] - 5s - loss: 0.0600 - acc: 0.9812 - val_loss: 0.9786 - val_acc: 0.7244
Out[21]:
<keras.callbacks.History at 0x7f899df730d0>

In [22]:
bn_model.optimizer.lr=0.0001
bn_model.fit(conv_feat, trn_labels, batch_size=batch_size, nb_epoch=4,
             validation_data=(conv_val_feat, val_labels))


Train on 19463 samples, validate on 2961 samples
Epoch 1/4
19463/19463 [==============================] - 5s - loss: 0.0606 - acc: 0.9818 - val_loss: 0.9510 - val_acc: 0.7072
Epoch 2/4
19463/19463 [==============================] - 5s - loss: 0.0499 - acc: 0.9841 - val_loss: 0.7346 - val_acc: 0.7923
Epoch 3/4
19463/19463 [==============================] - 5s - loss: 0.0481 - acc: 0.9853 - val_loss: 1.3455 - val_acc: 0.6454
Epoch 4/4
19463/19463 [==============================] - 5s - loss: 0.0494 - acc: 0.9848 - val_loss: 0.7110 - val_acc: 0.7859
Out[22]:
<keras.callbacks.History at 0x7f8bc1bfff10>

In [23]:
bn_model.optimizer.lr=0.00001
bn_model.fit(conv_feat, trn_labels, batch_size=batch_size, nb_epoch=8,
             validation_data=(conv_val_feat, val_labels))


Train on 19463 samples, validate on 2961 samples
Epoch 1/8
19463/19463 [==============================] - 5s - loss: 0.0387 - acc: 0.9888 - val_loss: 1.3321 - val_acc: 0.6707
Epoch 2/8
19463/19463 [==============================] - 5s - loss: 0.0355 - acc: 0.9892 - val_loss: 1.1233 - val_acc: 0.6981
Epoch 3/8
19463/19463 [==============================] - 5s - loss: 0.0328 - acc: 0.9905 - val_loss: 0.7673 - val_acc: 0.7923
Epoch 4/8
19463/19463 [==============================] - 5s - loss: 0.0343 - acc: 0.9888 - val_loss: 0.9397 - val_acc: 0.7862
Epoch 5/8
19463/19463 [==============================] - 5s - loss: 0.0367 - acc: 0.9889 - val_loss: 1.2474 - val_acc: 0.7322
Epoch 6/8
19463/19463 [==============================] - 5s - loss: 0.0291 - acc: 0.9909 - val_loss: 0.8400 - val_acc: 0.7869
Epoch 7/8
19463/19463 [==============================] - 5s - loss: 0.0235 - acc: 0.9921 - val_loss: 1.1227 - val_acc: 0.7177
Epoch 8/8
19463/19463 [==============================] - 5s - loss: 0.0273 - acc: 0.9912 - val_loss: 1.0805 - val_acc: 0.7423
Out[23]:
<keras.callbacks.History at 0x7f899e241690>

Looking good! Let's try pre-computing 5 epochs worth of augmented data, so we can experiment with combining dropout and augmentation on the pre-trained model.

Pre-computed DataAugmentation + Dropout

We'll use our usual data augmentation parameters:


In [ ]:
gen_t = image.ImageDataGenerator(rotation_range=15, height_shift_range=0.05,
                shear_range=0.1, channel_shift_range=20, width_shift_range=0.1)
da_batches = get_batches(path + 'train', gen_t, batch_size=batch_size, shuffle=False)

We'll use those to create a dataset of convolutional features 5x bigger than the training set.


In [ ]:
da_conv_feat = conv_model.predict_generator(da_batches, da_batches.nb_sample*5)

In [ ]:
save_array(path + 'results/da_conv_feat.dat', da_conv_feat)

In [ ]:
da_conv_feat = load_array(path + 'results/da_conv_feat.dat')

Let's include the real training data as well, in its non-augmented form.


In [ ]:
da_conv_feat = np.concatenate([da_conv_feat, conv_feat])

Since we've now got a dataset 6x bigger than before, we'll need to copy our labels 6 times too.


In [ ]:
da_trn_labels = np.concatenate([trn_labels]*6)

Based on some experiments, the previous model works well with bigger dense layers.


In [24]:
def get_bn_da_layers(p):
    return [
        MaxPooling2D(input_shape = conv_layers[-1].output_shape[1:]),
        Flatten(),
        Dropout(p),
        Dense(256, activation='relu'),
        BatchNormalization(),
        Dropout(p),
        Dense(256, activation='relu'),
        BatchNormalization(),
        Dropout(p),
        Dense(10, activation='softmax')
        ]

In [25]:
p=0.8

In [26]:
bn_model = Sequential(get_bn_da_layers(p))
bn_model.compile(Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

Now we can train the model as usual, with pre-computed augmented data.


In [ ]:
bn_model.fit(da_conv_feat, da_trn_labels, batch_size=batch_size, nb_epoch=1,
             validation_data=(conv_val_feat, val_labels))

In [ ]:
bn_model.optimizer.lr=0.01

In [ ]:
bn_model.fit(da_conv_feat, da_trn_labels, batch_size=batch_size, nb_epoch=4,
             validation_data=(conv_val_feat, val_labels))

In [ ]:
bn_model.optimizer.lr=1e-4

In [ ]:
bn_model.fit(da_conv_feat, da_trn_labels, batch_size=batch_size, nb_epoch=4,
             validation_data=(conv_val_feat, val_labels))

Looks good - let's save those weights.


In [ ]:
bn_model.save_weights(path + 'models/da_conv8_1.h5')

In [27]:
bn_model.fit(conv_feat, trn_labels, batch_size=batch_size, nb_epoch=1,
             validation_data=(conv_val_feat, val_labels))
bn_model.optimizer.lr=0.01
bn_model.fit(conv_feat, trn_labels, batch_size=batch_size, nb_epoch=4,
             validation_data=(conv_val_feat, val_labels))
bn_model.optimizer.lr=1e-4
bn_model.fit(conv_feat, trn_labels, batch_size=batch_size, nb_epoch=4,
             validation_data=(conv_val_feat, val_labels))


Train on 19463 samples, validate on 2961 samples
Epoch 1/1
19463/19463 [==============================] - 7s - loss: 2.7437 - acc: 0.2704 - val_loss: 1.1370 - val_acc: 0.7589
Train on 19463 samples, validate on 2961 samples
Epoch 1/4
19463/19463 [==============================] - 7s - loss: 1.0967 - acc: 0.6119 - val_loss: 0.7872 - val_acc: 0.7889
Epoch 2/4
19463/19463 [==============================] - 7s - loss: 0.6871 - acc: 0.7640 - val_loss: 0.6816 - val_acc: 0.7501
Epoch 3/4
19463/19463 [==============================] - 7s - loss: 0.5364 - acc: 0.8208 - val_loss: 0.5816 - val_acc: 0.7852
Epoch 4/4
19463/19463 [==============================] - 7s - loss: 0.4298 - acc: 0.8581 - val_loss: 0.6536 - val_acc: 0.7687
Train on 19463 samples, validate on 2961 samples
Epoch 1/4
19463/19463 [==============================] - 7s - loss: 0.3780 - acc: 0.8780 - val_loss: 0.6361 - val_acc: 0.7801
Epoch 2/4
19463/19463 [==============================] - 7s - loss: 0.3459 - acc: 0.8899 - val_loss: 0.7410 - val_acc: 0.7433
Epoch 3/4
19463/19463 [==============================] - 7s - loss: 0.3107 - acc: 0.9000 - val_loss: 0.6891 - val_acc: 0.7612
Epoch 4/4
19463/19463 [==============================] - 7s - loss: 0.2818 - acc: 0.9099 - val_loss: 0.7168 - val_acc: 0.7589
Out[27]:
<keras.callbacks.History at 0x7f8995af09d0>

In [30]:
bn_model.save_weights(path + 'models/conv8_bn_1.h5')

Pseudo-Labeling

We're going to try using a combination of pseudo-labeling and knowledge distillation to allow us to use unlabeled data (i.e. do semi-supervised learning). For our initial experiment we'll use the validation set as the unlabeled data, so that we can see that it's working without using the test set. At a later date we'll try using the test set.

To do this, we can simply calculate the predictions of our model...


In [28]:
val_pseudo = bn_model.predict(conv_val_feat, batch_size=batch_size)

...concatenate them with our training labels...


In [29]:
comb_pseudo = np.concatenate([trn_labels, val_pseudo])
comb_feat = np.concatenate([trn_labels, conv_val_feat])


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-29-0fdb3db5e74f> in <module>()
      1 comb_pseudo = np.concatenate([trn_labels, val_pseudo])
----> 2 comb_feat = np.concatenate([trn_labels, conv_val_feat])

ValueError: all the input arrays must have same number of dimensions
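
The ValueError comes from the second line: trn_labels has shape (19463, 10) while conv_val_feat is 4-D, so they can't be concatenated. Labels should be concatenated with labels and features with features. A sketch of the intended pairing using the non-augmented conv_feat (the cells below show the lesson's version built on the skipped da_conv_feat):

comb_pseudo = np.concatenate([trn_labels, val_pseudo])     # (19463 + 2961, 10) labels
comb_feat   = np.concatenate([conv_feat, conv_val_feat])   # (19463 + 2961, 512, 14, 14) features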

In [ ]:
comb_pseudo = np.concatenate([da_trn_labels, val_pseudo])

In [ ]:
comb_feat = np.concatenate([da_conv_feat, conv_val_feat])

...and fine-tune our model using that data.


In [ ]:
bn_model.load_weights(path + 'models/da_conv8_1.h5')

In [ ]:
bn_model.fit(comb_feat, comb_pseudo, batch_size=batch_size, nb_epoch=1,
             validation_data=(conv_val_feat, val_labels))

In [ ]:
bn_model.fit(comb_feat, comb_pseudo, batch_size=batch_size, nb_epoch=4,
             validation_data=(conv_val_feat, val_labels))

In [ ]:
bn_model.optimizer.lr=1e-5

In [ ]:
bn_model.fit(comb_feat, comb_pseudo, batch_size=batch_size, nb_epoch=4,
             validation_data=(conv_val_feat, val_labels))

That's a distinct improvement - even though the validation set isn't very big. This looks encouraging for when we try this on the test set.


In [ ]:
bn_model.save_weights(path + 'models/bn-ps8.h5')

Submit

We'll find a good clipping amount using the validation set prior to submitting: clipping over-confident predictions caps the log-loss penalty when a confident prediction turns out to be wrong.


In [31]:
def do_clip(arr, mx): return np.clip(arr, (1 - mx)/9, mx)
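
As a quick sanity check on do_clip: with 10 classes and mx = 0.93, a fully confident one-hot row gets its maximum clipped down to 0.93 and every other entry floored at (1 - 0.93)/9 ≈ 0.0078, so the row still sums to 1:

row = np.eye(10)[0]      # a maximally confident prediction for class 0
do_clip(row, 0.93)       # -> approximately [0.93, 0.0078, ..., 0.0078]; 0.93 + 9*(0.07/9) = 1.0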

In [33]:
val_preds = bn_model.predict(conv_val_feat, batch_size=batch_size)

In [34]:
keras.metrics.categorical_crossentropy(val_labels, do_clip(val_preds, 0.93)).eval()


Out[34]:
array(0.7353486965074808)

In [35]:
conb_test_feat = conv_model.predict_generator(test_batches, test_batches.n)


---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-35-86f55d1296aa> in <module>()
----> 1 conb_test_feat = conv_model.predict_generator(test_batches, test_batches.n)

NameError: name 'conv_model' is not defined

In [ ]:
conv_test_feat = load_array(path + 'results/conv_test_feat.dat')

In [ ]:
preds = bn_model.predict(conv_test_feat, batch_size=batch_size*2)

In [ ]:
subm = do_clip(preds, 0.93)

In [ ]:
subm_name = path + 'results/subm.gz'

In [ ]:
classes = sorted(batches.class_indices, key=batches.class_indices.get)

In [ ]:
submission = pd.DataFrame(subm, columns=classes)
submission.insert(0, 'img', [a[4:] for a in test_filenames]) # <-- why a[4:]?
# submission.insert(0, 'img', [f[8:] for f in test_filenames])
submission.head()

In [ ]:
submission.to_csv(subm_name, index=False, compression='gzip')

In [ ]:
FileLink(subm_name)

This gets 0.534 on the leaderboard.