Enter State Farm



In [1]:

    
from __future__ import division, print_function
%matplotlib inline
#path = "data/state/"
path = "data/state/sample/"
from importlib import reload  # Python 3
import utils; reload(utils)
from utils import *
from IPython.display import FileLink









    



Using cuDNN version 6021 on context None
Mapped name None to device cuda0: GeForce GTX TITAN X (0000:04:00.0)
Using Theano backend.



In [2]:

    
batch_size=64

Setup batches



In [3]:

    
batches = get_batches(path+'train', batch_size=batch_size)
val_batches = get_batches(path+'valid', batch_size=batch_size*2, shuffle=False)
steps_per_epoch = int(np.ceil(batches.samples/batch_size))
validation_steps = int(np.ceil(val_batches.samples/(batch_size*2)))









    



Found 1500 images belonging to 10 classes.
Found 1000 images belonging to 10 classes.



In [4]:

    
(val_classes, trn_classes, val_labels, trn_labels, 
    val_filenames, filenames, test_filenames) = get_classes(path)









    



Found 1500 images belonging to 10 classes.
Found 1000 images belonging to 10 classes.
Found 1000 images belonging to 1 classes.

Rather than using batches, we could just import all the data into an array to save some processing time. (In most examples I'm using the batches, however - just because that's how I happened to start out.)



In [5]:

    
trn = get_data(path+'train')
val = get_data(path+'valid')









    



Found 1500 images belonging to 10 classes.
Found 1000 images belonging to 10 classes.



In [6]:

    
save_array(path+'results/val.dat', val)
save_array(path+'results/trn.dat', trn)



In [7]:

    
val = load_array(path+'results/val.dat')
trn = load_array(path+'results/trn.dat')

Re-run sample experiments on full dataset

We should find that everything that worked on the sample (see statefarm-sample.ipynb), works on the full dataset too. Only better! Because now we have more data. So let's see how they go - the models in this section are exact copies of the sample notebook models.

Single conv layer



In [8]:

    
def conv1(batches):
    model = Sequential([
            BatchNormalization(axis=1, input_shape=(3,224,224)),
            Conv2D(32,(3,3), activation='relu'),
            BatchNormalization(axis=1),
            MaxPooling2D((3,3)),
            Conv2D(64,(3,3), activation='relu'),
            BatchNormalization(axis=1),
            MaxPooling2D((3,3)),
            Flatten(),
            Dense(200, activation='relu'),
            BatchNormalization(),
            Dense(10, activation='softmax')
        ])

    model.compile(Adam(lr=1e-4), loss='categorical_crossentropy', metrics=['accuracy'])
    model.fit_generator(batches, steps_per_epoch, epochs=2, validation_data=val_batches, 
                     validation_steps=validation_steps)
    model.optimizer.lr = 0.001
    model.fit_generator(batches, steps_per_epoch, epochs=4, validation_data=val_batches, 
                     validation_steps=validation_steps)
    return model



In [9]:

    
model = conv1(batches)









    



Epoch 1/2
24/24 [==============================] - 10s 433ms/step - loss: 1.5693 - acc: 0.5293 - val_loss: 2.2747 - val_acc: 0.2550
Epoch 2/2
24/24 [==============================] - 9s 367ms/step - loss: 0.3224 - acc: 0.9451 - val_loss: 1.8041 - val_acc: 0.2860
Epoch 1/4
24/24 [==============================] - 10s 413ms/step - loss: 0.1051 - acc: 0.9901 - val_loss: 1.9799 - val_acc: 0.3120
Epoch 2/4
24/24 [==============================] - 9s 367ms/step - loss: 0.0404 - acc: 0.9987 - val_loss: 2.1482 - val_acc: 0.3250
Epoch 3/4
24/24 [==============================] - 9s 367ms/step - loss: 0.0224 - acc: 0.9993 - val_loss: 2.2305 - val_acc: 0.3580
Epoch 4/4
24/24 [==============================] - 9s 369ms/step - loss: 0.0146 - acc: 1.0000 - val_loss: 2.2415 - val_acc: 0.3740

Interestingly, with no regularization or augmentation we're getting some reasonable results from our simple convolutional model. So with augmentation, we hopefully will see some very good results.

Data augmentation



In [10]:

    
gen_t = image.ImageDataGenerator(rotation_range=15, height_shift_range=0.05, 
                shear_range=0.1, channel_shift_range=20, width_shift_range=0.1)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)









    



Found 1500 images belonging to 10 classes.



In [11]:

    
model = conv1(batches)









    



Epoch 1/2
24/24 [==============================] - 19s 779ms/step - loss: 2.5529 - acc: 0.2210 - val_loss: 2.2162 - val_acc: 0.2770
Epoch 2/2
24/24 [==============================] - 15s 606ms/step - loss: 1.8697 - acc: 0.3692 - val_loss: 2.0072 - val_acc: 0.3230
Epoch 1/4
24/24 [==============================] - 19s 779ms/step - loss: 1.5607 - acc: 0.4863 - val_loss: 1.9515 - val_acc: 0.2840
Epoch 2/4
24/24 [==============================] - 15s 608ms/step - loss: 1.4616 - acc: 0.5177 - val_loss: 1.9731 - val_acc: 0.2970
Epoch 3/4
24/24 [==============================] - 15s 612ms/step - loss: 1.3418 - acc: 0.5453 - val_loss: 1.9866 - val_acc: 0.2150
Epoch 4/4
24/24 [==============================] - 15s 611ms/step - loss: 1.2361 - acc: 0.6133 - val_loss: 2.0294 - val_acc: 0.2290



In [12]:

    
model.optimizer.lr = 0.0001
model.fit_generator(batches, steps_per_epoch, epochs=15, validation_data=val_batches, 
                 validation_steps=validation_steps)









    



Epoch 1/15
24/24 [==============================] - 19s 789ms/step - loss: 1.1746 - acc: 0.6082 - val_loss: 1.9254 - val_acc: 0.2940
Epoch 2/15
24/24 [==============================] - 14s 603ms/step - loss: 1.1016 - acc: 0.6433 - val_loss: 1.8997 - val_acc: 0.3220
Epoch 3/15
24/24 [==============================] - 14s 601ms/step - loss: 1.0579 - acc: 0.6516 - val_loss: 1.8398 - val_acc: 0.3060
Epoch 4/15
24/24 [==============================] - 15s 605ms/step - loss: 0.9796 - acc: 0.6855 - val_loss: 1.6889 - val_acc: 0.3800
Epoch 5/15
24/24 [==============================] - 15s 613ms/step - loss: 0.8753 - acc: 0.7147 - val_loss: 1.6203 - val_acc: 0.4120
Epoch 6/15
24/24 [==============================] - 15s 605ms/step - loss: 0.8660 - acc: 0.7215 - val_loss: 1.5431 - val_acc: 0.4470
Epoch 7/15
24/24 [==============================] - 15s 612ms/step - loss: 0.8319 - acc: 0.7426 - val_loss: 1.3342 - val_acc: 0.4820
Epoch 8/15
24/24 [==============================] - 15s 629ms/step - loss: 0.7438 - acc: 0.7661 - val_loss: 1.2683 - val_acc: 0.5270
Epoch 9/15
24/24 [==============================] - 15s 623ms/step - loss: 0.7320 - acc: 0.7742 - val_loss: 1.2446 - val_acc: 0.5110
Epoch 10/15
24/24 [==============================] - 15s 609ms/step - loss: 0.7029 - acc: 0.7821 - val_loss: 1.0673 - val_acc: 0.5990
Epoch 11/15
24/24 [==============================] - 15s 612ms/step - loss: 0.6689 - acc: 0.8022 - val_loss: 0.8389 - val_acc: 0.7270
Epoch 12/15
24/24 [==============================] - 15s 607ms/step - loss: 0.6544 - acc: 0.8124 - val_loss: 0.7727 - val_acc: 0.7460
Epoch 13/15
24/24 [==============================] - 14s 604ms/step - loss: 0.6200 - acc: 0.8234 - val_loss: 0.8189 - val_acc: 0.7270
Epoch 14/15
24/24 [==============================] - 15s 619ms/step - loss: 0.5998 - acc: 0.8015 - val_loss: 0.5809 - val_acc: 0.8090
Epoch 15/15
24/24 [==============================] - 15s 616ms/step - loss: 0.5655 - acc: 0.8213 - val_loss: 0.5515 - val_acc: 0.8300






    Out[12]:





<keras.callbacks.History at 0x7f3c5003b240>

I'm shocked by how good these results are! We're regularly seeing 75-80% accuracy on the validation set, which puts us into the top third or better of the competition. With such a simple model and no dropout or semi-supervised learning, this really speaks to the power of this approach to data augmentation.

Four conv/pooling pairs + dropout

Unfortunately, the results are still very unstable - the validation accuracy jumps from epoch to epoch. Perhaps a deeper model with some dropout would help.



In [13]:

    
gen_t = image.ImageDataGenerator(rotation_range=15, height_shift_range=0.05, 
                shear_range=0.1, channel_shift_range=20, width_shift_range=0.1)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)









    



Found 1500 images belonging to 10 classes.



In [14]:

    
model = Sequential([
        BatchNormalization(axis=1, input_shape=(3,224,224)),
        Conv2D(32,(3,3), activation='relu'),
        BatchNormalization(axis=1),
        MaxPooling2D(),
        Conv2D(64,(3,3), activation='relu'),
        BatchNormalization(axis=1),
        MaxPooling2D(),
        Conv2D(128,(3,3), activation='relu'),
        BatchNormalization(axis=1),
        MaxPooling2D(),
        Flatten(),
        Dense(200, activation='relu'),
        BatchNormalization(),
        Dropout(0.5),
        Dense(200, activation='relu'),
        BatchNormalization(),
        Dropout(0.5),
        Dense(10, activation='softmax')
    ])



In [15]:

    
model.compile(Adam(lr=10e-5), loss='categorical_crossentropy', metrics=['accuracy'])



In [16]:

    
model.fit_generator(batches, steps_per_epoch, epochs=2, validation_data=val_batches, 
                 validation_steps=validation_steps)









    



Epoch 1/2
24/24 [==============================] - 19s 782ms/step - loss: 3.5253 - acc: 0.1230 - val_loss: 2.2149 - val_acc: 0.2100
Epoch 2/2
24/24 [==============================] - 15s 616ms/step - loss: 3.1404 - acc: 0.1552 - val_loss: 2.2839 - val_acc: 0.2300






    Out[16]:





<keras.callbacks.History at 0x7f3c7c2195f8>



In [17]:

    
model.optimizer.lr=0.001



In [18]:

    
model.fit_generator(batches, steps_per_epoch, epochs=10, validation_data=val_batches, 
                 validation_steps=validation_steps)









    



Epoch 1/10
24/24 [==============================] - 19s 783ms/step - loss: 2.9247 - acc: 0.1787 - val_loss: 2.3075 - val_acc: 0.1550
Epoch 2/10
24/24 [==============================] - 19s 774ms/step - loss: 2.8073 - acc: 0.2099 - val_loss: 2.4846 - val_acc: 0.1530
Epoch 3/10
24/24 [==============================] - 12s 516ms/step - loss: 2.6097 - acc: 0.2414 - val_loss: 2.7364 - val_acc: 0.1500
Epoch 4/10
24/24 [==============================] - 15s 611ms/step - loss: 2.5821 - acc: 0.2543 - val_loss: 2.9833 - val_acc: 0.1280
Epoch 5/10
24/24 [==============================] - 15s 611ms/step - loss: 2.4576 - acc: 0.2951 - val_loss: 3.0379 - val_acc: 0.1200
Epoch 6/10
24/24 [==============================] - 15s 620ms/step - loss: 2.4471 - acc: 0.2884 - val_loss: 3.1420 - val_acc: 0.1150
Epoch 7/10
24/24 [==============================] - 15s 610ms/step - loss: 2.2162 - acc: 0.3383 - val_loss: 3.2096 - val_acc: 0.1180
Epoch 8/10
24/24 [==============================] - 15s 608ms/step - loss: 2.2098 - acc: 0.3363 - val_loss: 3.2340 - val_acc: 0.1370
Epoch 9/10
24/24 [==============================] - 15s 609ms/step - loss: 2.1079 - acc: 0.3555 - val_loss: 2.8120 - val_acc: 0.1690
Epoch 10/10
24/24 [==============================] - 15s 615ms/step - loss: 2.0698 - acc: 0.3661 - val_loss: 2.5486 - val_acc: 0.1870






    Out[18]:





<keras.callbacks.History at 0x7f3c530d7da0>



In [19]:

    
model.optimizer.lr=0.00001



In [20]:

    
model.fit_generator(batches, steps_per_epoch, epochs=10, validation_data=val_batches, 
                 validation_steps=validation_steps)









    



Epoch 1/10
24/24 [==============================] - 23s 950ms/step - loss: 1.9946 - acc: 0.3802 - val_loss: 2.2454 - val_acc: 0.2220
Epoch 2/10
24/24 [==============================] - 16s 677ms/step - loss: 1.9861 - acc: 0.3889 - val_loss: 1.9335 - val_acc: 0.2960
Epoch 3/10
24/24 [==============================] - 12s 507ms/step - loss: 1.8180 - acc: 0.4163 - val_loss: 1.7115 - val_acc: 0.3890
Epoch 4/10
24/24 [==============================] - 19s 776ms/step - loss: 1.8090 - acc: 0.4290 - val_loss: 1.5234 - val_acc: 0.4540
Epoch 5/10
24/24 [==============================] - 12s 495ms/step - loss: 1.7559 - acc: 0.4508 - val_loss: 1.4253 - val_acc: 0.4990
Epoch 6/10
24/24 [==============================] - 15s 628ms/step - loss: 1.7149 - acc: 0.4569 - val_loss: 1.2912 - val_acc: 0.5500
Epoch 7/10
24/24 [==============================] - 15s 609ms/step - loss: 1.6628 - acc: 0.4809 - val_loss: 1.1859 - val_acc: 0.6020
Epoch 8/10
24/24 [==============================] - 19s 779ms/step - loss: 1.6352 - acc: 0.4798 - val_loss: 1.0579 - val_acc: 0.6440
Epoch 9/10
24/24 [==============================] - 16s 672ms/step - loss: 1.6067 - acc: 0.5000 - val_loss: 0.9203 - val_acc: 0.6820
Epoch 10/10
24/24 [==============================] - 12s 507ms/step - loss: 1.5851 - acc: 0.4990 - val_loss: 0.8714 - val_acc: 0.7030






    Out[20]:





<keras.callbacks.History at 0x7f3c530e6208>

This is looking quite a bit better - the accuracy is similar, but the stability is higher. There's still some way to go however...

Imagenet conv features

Since we have so little data, and it is similar to imagenet images (full color photos), using pre-trained VGG weights is likely to be helpful - in fact it seems likely that we won't need to fine-tune the convolutional layer weights much, if at all. So we can pre-compute the output of the last convolutional layer, as we did in lesson 3 when we experimented with dropout. (However this means that we can't use full data augmentation, since we can't pre-compute something that changes every image.)



In [21]:

    
vgg = Vgg16()
model=vgg.model
last_conv_idx = [i for i,l in enumerate(model.layers) if type(l) is Convolution2D][-1]
conv_layers = model.layers[:last_conv_idx+1]



In [22]:

    
conv_model = Sequential(conv_layers)



In [23]:

    
(val_classes, trn_classes, val_labels, trn_labels, 
    val_filenames, filenames, test_filenames) = get_classes(path)









    



Found 1500 images belonging to 10 classes.
Found 1000 images belonging to 10 classes.
Found 1000 images belonging to 1 classes.



In [24]:

    
test_batches = get_batches(path+'test', batch_size=batch_size*2, shuffle=False)









    



Found 1000 images belonging to 1 classes.



In [25]:

    
conv_feat = conv_model.predict_generator(batches, int(np.ceil(batches.samples/batch_size)))
conv_val_feat = conv_model.predict_generator(val_batches, int(np.ceil(val_batches.samples/(batch_size*2))))
conv_test_feat = conv_model.predict_generator(test_batches, int(np.ceil(test_batches.samples/(batch_size*2))))



In [26]:

    
save_array(path+'results/conv_val_feat.dat', conv_val_feat)
save_array(path+'results/conv_test_feat.dat', conv_test_feat)
save_array(path+'results/conv_feat.dat', conv_feat)



In [27]:

    
conv_feat = load_array(path+'results/conv_feat.dat')
conv_val_feat = load_array(path+'results/conv_val_feat.dat')
conv_val_feat.shape









    Out[27]:





(1000, 512, 14, 14)

Batchnorm dense layers on pretrained conv layers

Since we've pre-computed the output of the last convolutional layer, we need to create a network that takes that as input, and predicts our 10 classes. Let's try using a simplified version of VGG's dense layers.



In [28]:

    
def get_bn_layers(p):
    return [
        MaxPooling2D(input_shape=conv_layers[-1].output_shape[1:]),
        Flatten(),
        Dropout(p/2),
        Dense(128, activation='relu'),
        BatchNormalization(),
        Dropout(p/2),
        Dense(128, activation='relu'),
        BatchNormalization(),
        Dropout(p),
        Dense(10, activation='softmax')
        ]



In [29]:

    
p=0.8



In [30]:

    
bn_model = Sequential(get_bn_layers(p))
bn_model.compile(Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])



In [31]:

    
bn_model.fit(conv_feat, trn_labels, batch_size=batch_size, epochs=1, 
             validation_data=(conv_val_feat, val_labels))









    



Train on 1500 samples, validate on 1000 samples
Epoch 1/1
1500/1500 [==============================] - 0s 211us/step - loss: 4.8353 - acc: 0.1073 - val_loss: 5.7340 - val_acc: 0.1140






    Out[31]:





<keras.callbacks.History at 0x7f3b85cf8748>



In [32]:

    
bn_model.optimizer.lr=0.01



In [33]:

    
bn_model.fit(conv_feat, trn_labels, batch_size=batch_size, epochs=2, 
             validation_data=(conv_val_feat, val_labels))









    



Train on 1500 samples, validate on 1000 samples
Epoch 1/2
1500/1500 [==============================] - 0s 214us/step - loss: 4.0202 - acc: 0.1100 - val_loss: 3.5989 - val_acc: 0.0920
Epoch 2/2
1500/1500 [==============================] - 0s 209us/step - loss: 3.7136 - acc: 0.1373 - val_loss: 3.1136 - val_acc: 0.0730






    Out[33]:





<keras.callbacks.History at 0x7f3b802b7128>



In [34]:

    
bn_model.save_weights(path+'models/conv8.h5')

Looking good! Let's try pre-computing 5 epochs worth of augmented data, so we can experiment with combining dropout and augmentation on the pre-trained model.

Pre-computed data augmentation + dropout

We'll use our usual data augmentation parameters:



In [35]:

    
gen_t = image.ImageDataGenerator(rotation_range=15, height_shift_range=0.05, 
                shear_range=0.1, channel_shift_range=20, width_shift_range=0.1)
da_batches = get_batches(path+'train', gen_t, batch_size=batch_size, shuffle=False)









    



Found 1500 images belonging to 10 classes.

We use those to create a dataset of convolutional features 5x bigger than the training set.



In [36]:

    
da_conv_feat = conv_model.predict_generator(da_batches,  5*int(np.ceil((da_batches.samples)/(batch_size))), workers=3)



In [37]:

    
save_array(path+'results/da_conv_feat2.dat', da_conv_feat)



In [38]:

    
da_conv_feat = load_array(path+'results/da_conv_feat2.dat')

Let's include the real training data as well in its non-augmented form.



In [39]:

    
da_conv_feat = np.concatenate([da_conv_feat, conv_feat])

Since we've now got a dataset 6x bigger than before, we'll need to copy our labels 6 times too.



In [40]:

    
da_trn_labels = np.concatenate([trn_labels]*6)

Based on some experiments the previous model works well, with bigger dense layers.



In [41]:

    
def get_bn_da_layers(p):
    return [
        MaxPooling2D(input_shape=conv_layers[-1].output_shape[1:]),
        Flatten(),
        Dropout(p),
        Dense(256, activation='relu'),
        BatchNormalization(),
        Dropout(p),
        Dense(256, activation='relu'),
        BatchNormalization(),
        Dropout(p),
        Dense(10, activation='softmax')
        ]



In [42]:

    
p=0.8



In [43]:

    
bn_model = Sequential(get_bn_da_layers(p))
bn_model.compile(Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

Now we can train the model as usual, with pre-computed augmented data.



In [44]:

    
bn_model.fit(da_conv_feat, da_trn_labels, batch_size=batch_size, epochs=1, 
             validation_data=(conv_val_feat, val_labels))









    



Train on 9000 samples, validate on 1000 samples
Epoch 1/1
9000/9000 [==============================] - 1s 157us/step - loss: 4.1739 - acc: 0.1190 - val_loss: 1.7323 - val_acc: 0.4320






    Out[44]:





<keras.callbacks.History at 0x7f3b2b2bc978>



In [45]:

    
bn_model.optimizer.lr=0.01



In [46]:

    
bn_model.fit(da_conv_feat, da_trn_labels, batch_size=batch_size, epochs=4, 
             validation_data=(conv_val_feat, val_labels))









    



Train on 9000 samples, validate on 1000 samples
Epoch 1/4
9000/9000 [==============================] - 1s 149us/step - loss: 3.0138 - acc: 0.1702 - val_loss: 1.5579 - val_acc: 0.6080
Epoch 2/4
9000/9000 [==============================] - 1s 148us/step - loss: 2.4110 - acc: 0.2206 - val_loss: 1.4365 - val_acc: 0.7230
Epoch 3/4
9000/9000 [==============================] - 1s 148us/step - loss: 2.0996 - acc: 0.2818 - val_loss: 1.3154 - val_acc: 0.7560
Epoch 4/4
9000/9000 [==============================] - 1s 148us/step - loss: 1.9277 - acc: 0.3307 - val_loss: 1.1721 - val_acc: 0.7630






    Out[46]:





<keras.callbacks.History at 0x7f3c7c1a09b0>



In [47]:

    
bn_model.optimizer.lr=0.0001



In [48]:

    
bn_model.fit(da_conv_feat, da_trn_labels, batch_size=batch_size, epochs=4, 
             validation_data=(conv_val_feat, val_labels))









    



Train on 9000 samples, validate on 1000 samples
Epoch 1/4
9000/9000 [==============================] - 1s 149us/step - loss: 1.8020 - acc: 0.3896 - val_loss: 1.0272 - val_acc: 0.8130
Epoch 2/4
9000/9000 [==============================] - 1s 148us/step - loss: 1.7003 - acc: 0.4277 - val_loss: 0.9038 - val_acc: 0.8470
Epoch 3/4
9000/9000 [==============================] - 1s 148us/step - loss: 1.6278 - acc: 0.4706 - val_loss: 0.7905 - val_acc: 0.8730
Epoch 4/4
9000/9000 [==============================] - 1s 152us/step - loss: 1.5601 - acc: 0.4964 - val_loss: 0.7329 - val_acc: 0.8860






    Out[48]:





<keras.callbacks.History at 0x7f3c7c1a0ac8>

Looks good - let's save those weights.



In [49]:

    
bn_model.save_weights(path+'models/da_conv8_1.h5')

Pseudo labeling

We're going to try using a combination of pseudo labeling and knowledge distillation to allow us to use unlabeled data (i.e. do semi-supervised learning). For our initial experiment we'll use the validation set as the unlabeled data, so that we can see that it is working without using the test set. At a later date we'll try using the test set.

To do this, we simply calculate the predictions of our model...



In [50]:

    
val_pseudo = bn_model.predict(conv_val_feat, batch_size=batch_size)

...concatenate them with our training labels...



In [51]:

    
comb_pseudo = np.concatenate([da_trn_labels, val_pseudo])



In [52]:

    
comb_feat = np.concatenate([da_conv_feat, conv_val_feat])

...and fine-tune our model using that data.



In [53]:

    
bn_model.load_weights(path+'models/da_conv8_1.h5')



In [54]:

    
bn_model.fit(comb_feat, comb_pseudo, batch_size=batch_size, epochs=1, 
             validation_data=(conv_val_feat, val_labels))









    



Train on 10000 samples, validate on 1000 samples
Epoch 1/1
10000/10000 [==============================] - 2s 152us/step - loss: 1.5338 - acc: 0.5402 - val_loss: 0.6742 - val_acc: 0.8880






    Out[54]:





<keras.callbacks.History at 0x7f3b2b11cf98>



In [55]:

    
bn_model.fit(comb_feat, comb_pseudo, batch_size=batch_size, epochs=4, 
             validation_data=(conv_val_feat, val_labels))









    



Train on 10000 samples, validate on 1000 samples
Epoch 1/4
10000/10000 [==============================] - 2s 152us/step - loss: 1.5131 - acc: 0.5492 - val_loss: 0.6520 - val_acc: 0.9070
Epoch 2/4
10000/10000 [==============================] - 1s 147us/step - loss: 1.4655 - acc: 0.5743 - val_loss: 0.6067 - val_acc: 0.9110
Epoch 3/4
10000/10000 [==============================] - 2s 154us/step - loss: 1.4394 - acc: 0.5821 - val_loss: 0.5799 - val_acc: 0.9040
Epoch 4/4
10000/10000 [==============================] - 1s 147us/step - loss: 1.4063 - acc: 0.6026 - val_loss: 0.5382 - val_acc: 0.9360






    Out[55]:





<keras.callbacks.History at 0x7f3b802b74a8>



In [56]:

    
bn_model.optimizer.lr=0.00001



In [57]:

    
bn_model.fit(comb_feat, comb_pseudo, batch_size=batch_size, epochs=4, 
             validation_data=(conv_val_feat, val_labels))









    



Train on 10000 samples, validate on 1000 samples
Epoch 1/4
10000/10000 [==============================] - 1s 150us/step - loss: 1.3719 - acc: 0.6176 - val_loss: 0.5019 - val_acc: 0.9450
Epoch 2/4
10000/10000 [==============================] - 1s 147us/step - loss: 1.3488 - acc: 0.6237 - val_loss: 0.4827 - val_acc: 0.9350
Epoch 3/4
10000/10000 [==============================] - 1s 149us/step - loss: 1.3219 - acc: 0.6430 - val_loss: 0.4658 - val_acc: 0.9440
Epoch 4/4
10000/10000 [==============================] - 1s 147us/step - loss: 1.3121 - acc: 0.6466 - val_loss: 0.4548 - val_acc: 0.9350






    Out[57]:





<keras.callbacks.History at 0x7f3b2b121160>

That's a distinct improvement - even although the validation set isn't very big. This looks encouraging for when we try this on the test set.



In [58]:

    
bn_model.save_weights(path+'models/bn-ps8.h5')

Submit

We'll find a good clipping amount using the validation set, prior to submitting.



In [59]:

    
def do_clip(arr, mx): return np.clip(arr, (1-mx)/9, mx)



In [60]:

    
val_preds = bn_model.predict(conv_val_feat, batch_size=batch_size*2)



In [61]:

    
np.mean(keras.metrics.categorical_crossentropy(val_labels, do_clip(val_preds, 0.93)).eval())









    Out[61]:





0.46317427399009464



In [62]:

    
conv_test_feat = load_array(path+'results/conv_test_feat.dat')



In [63]:

    
preds = bn_model.predict(conv_test_feat, batch_size=batch_size*2)



In [64]:

    
subm = do_clip(preds,0.93)



In [65]:

    
subm_name = path+'results/subm.gz'



In [66]:

    
classes = sorted(batches.class_indices, key=batches.class_indices.get)



In [67]:

    
submission = pd.DataFrame(subm, columns=classes)
submission.insert(0, 'img', [a[4:] for a in test_filenames])
submission.head()









    Out[67]:







  
    
      
      img
      c0
      c1
      c2
      c3
      c4
      c5
      c6
      c7
      c8
      c9
    
  
  
    
      0
      own/img_10001.jpg
      0.044445
      0.027838
      0.012445
      0.007778
      0.021270
      0.794290
      0.036183
      0.007778
      0.021229
      0.029527
    
    
      1
      own/img_100228.jpg
      0.026350
      0.148008
      0.152710
      0.016703
      0.025873
      0.035956
      0.350845
      0.057857
      0.165161
      0.020536
    
    
      2
      own/img_100259.jpg
      0.037174
      0.216431
      0.124052
      0.057873
      0.045128
      0.042009
      0.074831
      0.173426
      0.187613
      0.041462
    
    
      3
      own/img_100263.jpg
      0.042274
      0.040189
      0.014808
      0.635093
      0.185962
      0.019687
      0.011732
      0.010670
      0.017345
      0.022241
    
    
      4
      own/img_100596.jpg
      0.158790
      0.217054
      0.019222
      0.040138
      0.026702
      0.047620
      0.068982
      0.037552
      0.074853
      0.309085



In [68]:

    
submission.to_csv(subm_name, index=False, compression='gzip')



In [69]:

    
FileLink(subm_name)









    Out[69]:




data/state/sample/results/subm.gz

This gets 0.534 on the leaderboard.

The "things that didn't really work" section

You can safely ignore everything from here on, because they didn't really help.

Finetune some conv layers too



In [70]:

    
#for l in get_bn_layers(p): conv_model.add(l)  #  this choice would give a weight shape error
for l in get_bn_da_layers(p): conv_model.add(l)  # ... so probably this is the right one



In [71]:

    
for l1,l2 in zip(bn_model.layers, conv_model.layers[last_conv_idx+1:]):
    l2.set_weights(l1.get_weights())



In [72]:

    
for l in conv_model.layers: l.trainable =False



In [73]:

    
for l in conv_model.layers[last_conv_idx+1:]: l.trainable =True



In [74]:

    
comb = np.concatenate([trn, val])



In [75]:

    
# not knowing what the experiment was about, added this to avoid a shape match error with comb using gen_t.flow
comb_pseudo = np.concatenate([trn_labels, val_pseudo])



In [76]:

    
gen_t = image.ImageDataGenerator(rotation_range=8, height_shift_range=0.04, 
                shear_range=0.03, channel_shift_range=10, width_shift_range=0.08)



In [77]:

    
batches = gen_t.flow(comb, comb_pseudo, batch_size=batch_size)



In [78]:

    
val_batches = get_batches(path+'valid', batch_size=batch_size*2, shuffle=False)









    



Found 1000 images belonging to 10 classes.



In [79]:

    
conv_model.compile(Adam(lr=0.00001), loss='categorical_crossentropy', metrics=['accuracy'])



In [80]:

    
conv_model.fit_generator(batches, steps_per_epoch, epochs=1, validation_data=val_batches, 
                 validation_steps=validation_steps)









    



Epoch 1/1
24/24 [==============================] - 23s 941ms/step - loss: 1.2991 - acc: 0.6966 - val_loss: 0.4551 - val_acc: 0.9360






    Out[80]:





<keras.callbacks.History at 0x7f3b2a8fc9b0>



In [81]:

    
conv_model.optimizer.lr = 0.0001



In [82]:

    
conv_model.fit_generator(batches, steps_per_epoch, epochs=3, validation_data=val_batches, 
                 validation_steps=validation_steps)









    



Epoch 1/3
24/24 [==============================] - 23s 943ms/step - loss: 1.3306 - acc: 0.6947 - val_loss: 0.4558 - val_acc: 0.9370
Epoch 2/3
24/24 [==============================] - 23s 943ms/step - loss: 1.3471 - acc: 0.6934 - val_loss: 0.4539 - val_acc: 0.9350
Epoch 3/3
24/24 [==============================] - 22s 925ms/step - loss: 1.2838 - acc: 0.7064 - val_loss: 0.4546 - val_acc: 0.9350






    Out[82]:





<keras.callbacks.History at 0x7f3b280bea58>



In [83]:

    
for l in conv_model.layers[16:]: l.trainable =True



In [84]:

    
#- added compile instruction in order to avoid Keras 2.1 warning message
conv_model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])



In [85]:

    
conv_model.optimizer.lr = 0.00001



In [86]:

    
conv_model.fit_generator(batches, steps_per_epoch, epochs=8, validation_data=val_batches, 
                 validation_steps=validation_steps)









    



Epoch 1/8
24/24 [==============================] - 36s 2s/step - loss: 1.2655 - acc: 0.7324 - val_loss: 0.2987 - val_acc: 0.9340
Epoch 2/8
24/24 [==============================] - 37s 2s/step - loss: 1.1985 - acc: 0.7615 - val_loss: 0.3260 - val_acc: 0.9320
Epoch 3/8
24/24 [==============================] - 36s 2s/step - loss: 1.2366 - acc: 0.7602 - val_loss: 0.3567 - val_acc: 0.9250
Epoch 4/8
24/24 [==============================] - 37s 2s/step - loss: 1.2386 - acc: 0.7669 - val_loss: 0.3153 - val_acc: 0.9390
Epoch 5/8
24/24 [==============================] - 36s 2s/step - loss: 1.1640 - acc: 0.7671 - val_loss: 0.3056 - val_acc: 0.9420
Epoch 6/8
24/24 [==============================] - 37s 2s/step - loss: 1.1472 - acc: 0.7910 - val_loss: 0.3079 - val_acc: 0.9420
Epoch 7/8
24/24 [==============================] - 35s 1s/step - loss: 1.1709 - acc: 0.8094 - val_loss: 0.3302 - val_acc: 0.9420
Epoch 8/8
24/24 [==============================] - 37s 2s/step - loss: 1.1433 - acc: 0.7949 - val_loss: 0.3212 - val_acc: 0.9430






    Out[86]:





<keras.callbacks.History at 0x7f3b2a0d0828>



In [87]:

    
conv_model.save_weights(path+'models/conv8_ps.h5')



In [88]:

    
#conv_model.load_weights(path+'models/conv8_da.h5')  # conv8_da.h5 was not saved in this notebook



In [89]:

    
val_pseudo = conv_model.predict(val, batch_size=batch_size*2)



In [90]:

    
save_array(path+'models/pseudo8_da.dat', val_pseudo)

Ensembling



In [91]:

    
drivers_ds = pd.read_csv(path+'driver_imgs_list.csv')
drivers_ds.head()









    Out[91]:







  
    
      
      subject
      classname
      img
    
  
  
    
      0
      p002
      c0
      img_44733.jpg
    
    
      1
      p002
      c0
      img_72999.jpg
    
    
      2
      p002
      c0
      img_25094.jpg
    
    
      3
      p002
      c0
      img_69092.jpg
    
    
      4
      p002
      c0
      img_92629.jpg



In [92]:

    
img2driver = drivers_ds.set_index('img')['subject'].to_dict()



In [93]:

    
driver2imgs = {k: g["img"].tolist() 
               for k,g in drivers_ds[['subject', 'img']].groupby("subject")}



In [94]:

    
# It seems this function is not used in this notebook
def get_idx(driver_list):
    return [i for i,f in enumerate(filenames) if img2driver[f[3:]] in driver_list]



In [95]:

    
# drivers = driver2imgs.keys()  # Python 2
drivers = list(driver2imgs)  # Python 3



In [96]:

    
rnd_drivers = np.random.permutation(drivers)



In [97]:

    
ds1 = rnd_drivers[:len(rnd_drivers)//2]
ds2 = rnd_drivers[len(rnd_drivers)//2:]



In [ ]:

    
# The following cells seem to require some preparation code not included in this notebook
models=[fit_conv([d]) for d in drivers]
models=[m for m in models if m is not None]



In [ ]:

    
all_preds = np.stack([m.predict(conv_test_feat, batch_size=128) for m in models])
avg_preds = all_preds.mean(axis=0)
avg_preds = avg_preds/np.expand_dims(avg_preds.sum(axis=1), 1)



In [ ]:

    
keras.metrics.categorical_crossentropy(val_labels, np.clip(avg_val_preds,0.01,0.99)).eval()



In [ ]:

    
keras.metrics.categorical_accuracy(val_labels, np.clip(avg_val_preds,0.01,0.99)).eval()



In [ ]:

	img	c0	c1	c2	c3	c4	c5	c6	c7	c8	c9
0	own/img_10001.jpg	0.044445	0.027838	0.012445	0.007778	0.021270	0.794290	0.036183	0.007778	0.021229	0.029527
1	own/img_100228.jpg	0.026350	0.148008	0.152710	0.016703	0.025873	0.035956	0.350845	0.057857	0.165161	0.020536
2	own/img_100259.jpg	0.037174	0.216431	0.124052	0.057873	0.045128	0.042009	0.074831	0.173426	0.187613	0.041462
3	own/img_100263.jpg	0.042274	0.040189	0.014808	0.635093	0.185962	0.019687	0.011732	0.010670	0.017345	0.022241
4	own/img_100596.jpg	0.158790	0.217054	0.019222	0.040138	0.026702	0.047620	0.068982	0.037552	0.074853	0.309085

	subject	classname	img
0	p002	c0	img_44733.jpg
1	p002	c0	img_72999.jpg
2	p002	c0	img_25094.jpg
3	p002	c0	img_69092.jpg
4	p002	c0	img_92629.jpg