Enter State Farm


In [1]:
from theano.sandbox import cuda
cuda.use('gpu1')


Using gpu device 1: GeForce GTX TITAN X (CNMeM is enabled with initial size: 90.0% of memory, cuDNN 4007)

In [2]:
%matplotlib inline
from __future__ import print_function, division
#path = "data/state/"
path = "data/state/sample/"
import utils; reload(utils)
from utils import *
from IPython.display import FileLink


Using Theano backend.

In [3]:
batch_size=64

Create sample

The following assumes you've already created your validation set - remember that the training and validation set should contain different drivers, as mentioned on the Kaggle competition page.
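
If you haven't created it yet, here's a minimal sketch of a driver-based split, run from the notebook's starting directory. It assumes the competition's driver_imgs_list.csv (columns subject, classname, img) sits in data/state/; the file location and the choice of held-out drivers are assumptions, so adjust as needed:

import os, shutil
import pandas as pd

drivers = pd.read_csv('data/state/driver_imgs_list.csv')
# hold out a few drivers entirely for validation (which subjects is arbitrary)
val_subjects = set(drivers.subject.unique()[:3])

for _, row in drivers[drivers.subject.isin(val_subjects)].iterrows():
    src = os.path.join('data/state/train', row.classname, row.img)
    dst_dir = os.path.join('data/state/valid', row.classname)
    if not os.path.exists(dst_dir): os.makedirs(dst_dir)
    shutil.move(src, os.path.join(dst_dir, row.img))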


In [ ]:
%cd data/state

In [ ]:
%cd train

In [ ]:
%mkdir ../sample
%mkdir ../sample/train
%mkdir ../sample/valid

In [ ]:
for d in glob('c?'):
    os.mkdir('../sample/train/'+d)
    os.mkdir('../sample/valid/'+d)

In [ ]:
from shutil import copyfile

In [ ]:
g = glob('c?/*.jpg')
shuf = np.random.permutation(g)
for i in range(1500): copyfile(shuf[i], '../sample/train/' + shuf[i])

In [ ]:
%cd ../valid

In [ ]:
g = glob('c?/*.jpg')
shuf = np.random.permutation(g)
for i in range(1000): copyfile(shuf[i], '../sample/valid/' + shuf[i])

In [ ]:
%cd ../../..

In [ ]:
%mkdir data/state/results

In [8]:
%mkdir data/state/sample/test

Create batches


In [56]:
batches = get_batches(path+'train', batch_size=batch_size)
val_batches = get_batches(path+'valid', batch_size=batch_size*2, shuffle=False)


Found 1568 images belonging to 10 classes.
Found 1002 images belonging to 10 classes.

In [5]:
(val_classes, trn_classes, val_labels, trn_labels, val_filenames, filenames,
    test_filename) = get_classes(path)


Found 1568 images belonging to 10 classes.
Found 1002 images belonging to 10 classes.
Found 0 images belonging to 0 classes.

Basic models

Linear model

First, we try the simplest model and use default parameters. Note the trick of making the first layer a batchnorm layer - that way we don't have to worry about normalizing the input ourselves.


In [6]:
model = Sequential([
        BatchNormalization(axis=1, input_shape=(3,224,224)),
        Flatten(),
        Dense(10, activation='softmax')
    ])

As you can see below, this training is going nowhere...


In [7]:
model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(batches, batches.nb_sample, nb_epoch=2, validation_data=val_batches, 
                 nb_val_samples=val_batches.nb_sample)


Epoch 1/2
1568/1568 [==============================] - 20s - loss: 13.8189 - acc: 0.1040 - val_loss: 13.5792 - val_acc: 0.1517
Epoch 2/2
1568/1568 [==============================] - 5s - loss: 14.4052 - acc: 0.1052 - val_loss: 13.8349 - val_acc: 0.1397
Out[7]:
<keras.callbacks.History at 0x7fbf34d441d0>

Let's first check the number of parameters, to see that there are enough to find some useful relationships:


In [66]:
model.summary()


____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
batchnormalization_65 (BatchNorma(None, 3, 224, 224)   6           batchnormalization_input_23[0][0]
____________________________________________________________________________________________________
flatten_23 (Flatten)             (None, 150528)        0           batchnormalization_65[0][0]      
____________________________________________________________________________________________________
dense_39 (Dense)                 (None, 10)            1505290     flatten_23[0][0]                 
====================================================================================================
Total params: 1505296
____________________________________________________________________________________________________

Over 1.5 million parameters - that should be enough. Incidentally, it's worth checking you understand why this is the number of parameters in this layer:


In [67]:
10*3*224*224


Out[67]:
1505280
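
To spell out where those numbers come from (this just restates the summary above):

# Dense layer: 150528 flattened inputs to 10 outputs, plus 10 biases
3*224*224*10 + 10    # = 1505290, matching dense_39 above
# plus the 6 BatchNormalization parameters gives the 1505296 total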

Since we have a simple model with no regularization and plenty of parameters, it seems most likely that our learning rate is too high. Perhaps it is jumping to a solution where it predicts one or two classes with high confidence, so that it can give a zero prediction to as many classes as possible - that's the best approach for a model that is no better than random, and that is likely where we would end up with a high learning rate. So let's check:


In [10]:
np.round(model.predict_generator(batches, batches.N)[:10],2)


Out[10]:
array([[ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.]], dtype=float32)

Our hypothesis was correct. It's nearly always predicting class 0 or 6, with very high confidence. So let's try a lower learning rate:


In [14]:
model = Sequential([
        BatchNormalization(axis=1, input_shape=(3,224,224)),
        Flatten(),
        Dense(10, activation='softmax')
    ])
model.compile(Adam(lr=1e-5), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(batches, batches.nb_sample, nb_epoch=2, validation_data=val_batches, 
                 nb_val_samples=val_batches.nb_sample)


Epoch 1/2
1568/1568 [==============================] - 7s - loss: 2.4180 - acc: 0.1575 - val_loss: 5.2975 - val_acc: 0.1477
Epoch 2/2
1568/1568 [==============================] - 5s - loss: 1.7690 - acc: 0.4196 - val_loss: 4.0165 - val_acc: 0.1926
Out[14]:
<keras.callbacks.History at 0x7fbf22eda150>

Great - we found our way out of that hole... Now we can increase the learning rate and see where we can get to.


In [15]:
model.optimizer.lr=0.001

In [16]:
model.fit_generator(batches, batches.nb_sample, nb_epoch=4, validation_data=val_batches, 
                 nb_val_samples=val_batches.nb_sample)


Epoch 1/4
1568/1568 [==============================] - 7s - loss: 1.3763 - acc: 0.5816 - val_loss: 2.5994 - val_acc: 0.2884
Epoch 2/4
1568/1568 [==============================] - 5s - loss: 1.0961 - acc: 0.7136 - val_loss: 1.9945 - val_acc: 0.3902
Epoch 3/4
1568/1568 [==============================] - 5s - loss: 0.9395 - acc: 0.7730 - val_loss: 1.9828 - val_acc: 0.3822
Epoch 4/4
1568/1568 [==============================] - 5s - loss: 0.7894 - acc: 0.8323 - val_loss: 1.8041 - val_acc: 0.3962
Out[16]:
<keras.callbacks.History at 0x7fbf2294e210>

We're stabilizing at a validation accuracy of around 0.39. Not great, but a lot better than random. Before moving on, let's check that the validation set in our sample is large enough to give consistent results:


In [6]:
rnd_batches = get_batches(path+'valid', batch_size=batch_size*2, shuffle=True)


Found 1002 images belonging to 10 classes.

In [11]:
val_res = [model.evaluate_generator(rnd_batches, rnd_batches.nb_sample) for i in range(10)]
np.round(val_res, 2)


Out[11]:
array([[ 4.4 ,  0.49],
       [ 4.57,  0.49],
       [ 4.48,  0.48],
       [ 4.28,  0.51],
       [ 4.66,  0.48],
       [ 4.5 ,  0.49],
       [ 4.46,  0.49],
       [ 4.51,  0.47],
       [ 4.45,  0.51],
       [ 4.47,  0.49]])

Yup, pretty consistent - so based on the above samples, if we see improvements of 3% or more, it's probably not random.
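
To quantify that spread rather than eyeballing it (assuming val_res from the cell above is still in memory):

import numpy as np
val_accs = np.array(val_res)[:,1]
print('mean %.3f, std %.3f' % (val_accs.mean(), val_accs.std()))  # std is roughly 0.01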

L2 regularization

The previous model is over-fitting a lot, but we can't use dropout since we only have one layer. We can try to decrease overfitting by adding L2 regularization (i.e. adding the sum of squares of the weights to our loss function):
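
To make that penalty concrete, here's a tiny numpy illustration (not Keras internals, just a sketch of the extra loss term; the shape matches our Dense layer's weight matrix, and 0.01 is the same coefficient we pass to l2 below):

import numpy as np

def l2_penalty(W, lam=0.01):
    # the extra term added to the loss for a weight matrix W
    return lam * np.sum(W**2)

W = np.random.randn(150528, 10) * 0.01   # same shape as our Dense layer's weights
print(l2_penalty(W))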


In [20]:
model = Sequential([
        BatchNormalization(axis=1, input_shape=(3,224,224)),
        Flatten(),
        Dense(10, activation='softmax', W_regularizer=l2(0.01))
    ])
model.compile(Adam(lr=10e-5), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(batches, batches.nb_sample, nb_epoch=2, validation_data=val_batches, 
                 nb_val_samples=val_batches.nb_sample)


Epoch 1/2
1568/1568 [==============================] - 7s - loss: 5.7173 - acc: 0.2583 - val_loss: 14.5162 - val_acc: 0.0988
Epoch 2/2
1568/1568 [==============================] - 5s - loss: 2.5953 - acc: 0.6148 - val_loss: 4.8340 - val_acc: 0.3952
Out[20]:
<keras.callbacks.History at 0x7fbf05ab7190>

In [21]:
model.optimizer.lr=0.001

In [22]:
model.fit_generator(batches, batches.nb_sample, nb_epoch=4, validation_data=val_batches, 
                 nb_val_samples=val_batches.nb_sample)


Epoch 1/4
1568/1568 [==============================] - 7s - loss: 1.5759 - acc: 0.8355 - val_loss: 4.3326 - val_acc: 0.3902
Epoch 2/4
1568/1568 [==============================] - 5s - loss: 0.9414 - acc: 0.8552 - val_loss: 3.5898 - val_acc: 0.3872
Epoch 3/4
1568/1568 [==============================] - 5s - loss: 0.4152 - acc: 0.9401 - val_loss: 2.3976 - val_acc: 0.4780
Epoch 4/4
1568/1568 [==============================] - 5s - loss: 0.3282 - acc: 0.9726 - val_loss: 2.3441 - val_acc: 0.5100
Out[22]:
<keras.callbacks.History at 0x7fbf1a862dd0>

Looks like we can get a bit over 50% accuracy this way. This will be a good benchmark for our future models - if we can't beat 50%, then we're not even beating a linear model trained on a sample, so we'll know that's not a good approach.

Single hidden layer

The next simplest model is to add a single hidden layer.


In [34]:
model = Sequential([
        BatchNormalization(axis=1, input_shape=(3,224,224)),
        Flatten(),
        Dense(100, activation='relu'),
        BatchNormalization(),
        Dense(10, activation='softmax')
    ])
model.compile(Adam(lr=1e-5), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(batches, batches.nb_sample, nb_epoch=2, validation_data=val_batches, 
                 nb_val_samples=val_batches.nb_sample)

model.optimizer.lr = 0.01
model.fit_generator(batches, batches.nb_sample, nb_epoch=5, validation_data=val_batches, 
                 nb_val_samples=val_batches.nb_sample)


Epoch 1/2
1568/1568 [==============================] - 7s - loss: 2.0182 - acc: 0.3412 - val_loss: 3.4769 - val_acc: 0.2435
Epoch 2/2
1568/1568 [==============================] - 5s - loss: 1.0104 - acc: 0.7379 - val_loss: 2.2270 - val_acc: 0.4361
Epoch 1/5
1568/1568 [==============================] - 7s - loss: 0.5350 - acc: 0.9043 - val_loss: 1.8474 - val_acc: 0.4621
Epoch 2/5
1568/1568 [==============================] - 5s - loss: 0.3459 - acc: 0.9458 - val_loss: 1.9591 - val_acc: 0.4222
Epoch 3/5
1568/1568 [==============================] - 5s - loss: 0.2296 - acc: 0.9802 - val_loss: 1.7887 - val_acc: 0.4441
Epoch 4/5
1568/1568 [==============================] - 5s - loss: 0.1591 - acc: 0.9936 - val_loss: 1.6847 - val_acc: 0.4830
Epoch 5/5
1568/1568 [==============================] - 5s - loss: 0.1204 - acc: 0.9943 - val_loss: 1.6344 - val_acc: 0.4910
Out[34]:
<keras.callbacks.History at 0x7fbf0da4f990>

Not looking very encouraging... which isn't surprising since we know that CNNs are a much better choice for computer vision problems. So we'll try one.

Single conv layer

Two conv layers with max pooling, followed by a simple dense network, make a good simple CNN to start with:


In [61]:
def conv1(batches):
    model = Sequential([
            BatchNormalization(axis=1, input_shape=(3,224,224)),
            Convolution2D(32,3,3, activation='relu'),
            BatchNormalization(axis=1),
            MaxPooling2D((3,3)),
            Convolution2D(64,3,3, activation='relu'),
            BatchNormalization(axis=1),
            MaxPooling2D((3,3)),
            Flatten(),
            Dense(200, activation='relu'),
            BatchNormalization(),
            Dense(10, activation='softmax')
        ])

    model.compile(Adam(lr=1e-4), loss='categorical_crossentropy', metrics=['accuracy'])
    model.fit_generator(batches, batches.nb_sample, nb_epoch=2, validation_data=val_batches, 
                     nb_val_samples=val_batches.nb_sample)
    model.optimizer.lr = 0.001
    model.fit_generator(batches, batches.nb_sample, nb_epoch=4, validation_data=val_batches, 
                     nb_val_samples=val_batches.nb_sample)
    return model

In [62]:
conv1(batches)


Epoch 1/2
1568/1568 [==============================] - 11s - loss: 1.3664 - acc: 0.6020 - val_loss: 1.8697 - val_acc: 0.3932
Epoch 2/2
1568/1568 [==============================] - 11s - loss: 0.3201 - acc: 0.9388 - val_loss: 2.2294 - val_acc: 0.1537
Epoch 1/4
1568/1568 [==============================] - 11s - loss: 0.0862 - acc: 0.9911 - val_loss: 2.5230 - val_acc: 0.1517
Epoch 2/4
1568/1568 [==============================] - 11s - loss: 0.0350 - acc: 0.9994 - val_loss: 2.8057 - val_acc: 0.1497
Epoch 3/4
1568/1568 [==============================] - 11s - loss: 0.0201 - acc: 1.0000 - val_loss: 2.9036 - val_acc: 0.1607
Epoch 4/4
1568/1568 [==============================] - 11s - loss: 0.0124 - acc: 1.0000 - val_loss: 2.9390 - val_acc: 0.1647
Out[62]:
<keras.models.Sequential at 0x7fbe53519190>

Training accuracy here climbs very rapidly to near 100%. So if we could regularize this, perhaps we could get a reasonable result.

So, what kind of regularization should we try first? As we discussed in lesson 3, we should start with data augmentation.

Data augmentation

To find the best data augmentation parameters, we can try each type of data augmentation, one at a time. For each type, we can try four very different levels of augmentation, and see which is the best. In the steps below we've only kept the single best result we found. We're using the CNN we defined above, since we have already observed it can model the data quickly and accurately.
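
As an illustration of that search, here's a sketch for one augmentation type (the four width-shift levels below are just example values, not the settings we ultimately kept):

for shift in [0.025, 0.05, 0.1, 0.2]:
    print('width_shift_range =', shift)
    gen_t = image.ImageDataGenerator(width_shift_range=shift)
    batches = get_batches(path+'train', gen_t, batch_size=batch_size)
    model = conv1(batches)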

Width shift: move the image left and right -


In [63]:
gen_t = image.ImageDataGenerator(width_shift_range=0.1)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)


Found 1568 images belonging to 10 classes.

In [64]:
model = conv1(batches)


Epoch 1/2
1568/1568 [==============================] - 11s - loss: 2.1802 - acc: 0.3316 - val_loss: 2.9037 - val_acc: 0.1038
Epoch 2/2
1568/1568 [==============================] - 11s - loss: 1.0996 - acc: 0.6862 - val_loss: 2.1270 - val_acc: 0.2495
Epoch 1/4
1568/1568 [==============================] - 11s - loss: 0.6856 - acc: 0.8106 - val_loss: 2.1610 - val_acc: 0.1487
Epoch 2/4
1568/1568 [==============================] - 11s - loss: 0.4989 - acc: 0.8693 - val_loss: 2.0959 - val_acc: 0.2525
Epoch 3/4
1568/1568 [==============================] - 11s - loss: 0.3715 - acc: 0.9120 - val_loss: 2.1168 - val_acc: 0.2385
Epoch 4/4
1568/1568 [==============================] - 11s - loss: 0.2916 - acc: 0.9254 - val_loss: 2.1028 - val_acc: 0.3044

Height shift: move the image up and down -


In [65]:
gen_t = image.ImageDataGenerator(height_shift_range=0.05)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)


Found 1568 images belonging to 10 classes.

In [66]:
model = conv1(batches)


Epoch 1/2
1568/1568 [==============================] - 11s - loss: 1.7843 - acc: 0.4458 - val_loss: 2.1259 - val_acc: 0.2375
Epoch 2/2
1568/1568 [==============================] - 11s - loss: 0.7028 - acc: 0.7825 - val_loss: 2.0232 - val_acc: 0.3164
Epoch 1/4
1568/1568 [==============================] - 11s - loss: 0.3586 - acc: 0.9152 - val_loss: 2.1772 - val_acc: 0.1806
Epoch 2/4
1568/1568 [==============================] - 11s - loss: 0.2335 - acc: 0.9490 - val_loss: 2.1935 - val_acc: 0.1727
Epoch 3/4
1568/1568 [==============================] - 11s - loss: 0.1626 - acc: 0.9656 - val_loss: 2.1944 - val_acc: 0.2106
Epoch 4/4
1568/1568 [==============================] - 11s - loss: 0.1214 - acc: 0.9758 - val_loss: 2.3481 - val_acc: 0.1766

Random shear angles (max in radians) -


In [67]:
gen_t = image.ImageDataGenerator(shear_range=0.1)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)


Found 1568 images belonging to 10 classes.

In [68]:
model = conv1(batches)


Epoch 1/2
1568/1568 [==============================] - 11s - loss: 1.6148 - acc: 0.5223 - val_loss: 2.2513 - val_acc: 0.2475
Epoch 2/2
1568/1568 [==============================] - 11s - loss: 0.3915 - acc: 0.9203 - val_loss: 2.0757 - val_acc: 0.2725
Epoch 1/4
1568/1568 [==============================] - 11s - loss: 0.1478 - acc: 0.9821 - val_loss: 2.1869 - val_acc: 0.3084
Epoch 2/4
1568/1568 [==============================] - 11s - loss: 0.0831 - acc: 0.9904 - val_loss: 2.2449 - val_acc: 0.3164
Epoch 3/4
1568/1568 [==============================] - 11s - loss: 0.0530 - acc: 0.9955 - val_loss: 2.2426 - val_acc: 0.3154
Epoch 4/4
1568/1568 [==============================] - 11s - loss: 0.0343 - acc: 0.9994 - val_loss: 2.2609 - val_acc: 0.3234

Rotation: max in degrees -


In [69]:
gen_t = image.ImageDataGenerator(rotation_range=15)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)


Found 1568 images belonging to 10 classes.

In [70]:
model = conv1(batches)


Epoch 1/2
1568/1568 [==============================] - 11s - loss: 1.9734 - acc: 0.3865 - val_loss: 2.1849 - val_acc: 0.3064
Epoch 2/2
1568/1568 [==============================] - 11s - loss: 0.8523 - acc: 0.7411 - val_loss: 2.0310 - val_acc: 0.2655
Epoch 1/4
1568/1568 [==============================] - 11s - loss: 0.4652 - acc: 0.8833 - val_loss: 2.0401 - val_acc: 0.2036
Epoch 2/4
1568/1568 [==============================] - 11s - loss: 0.3448 - acc: 0.9101 - val_loss: 2.2149 - val_acc: 0.1317
Epoch 3/4
1568/1568 [==============================] - 11s - loss: 0.2411 - acc: 0.9420 - val_loss: 2.2614 - val_acc: 0.1287
Epoch 4/4
1568/1568 [==============================] - 11s - loss: 0.1722 - acc: 0.9636 - val_loss: 2.1208 - val_acc: 0.2106

Channel shift: randomly changing the R,G,B colors -


In [76]:
gen_t = image.ImageDataGenerator(channel_shift_range=20)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)


Found 1568 images belonging to 10 classes.

In [77]:
model = conv1(batches)


Epoch 1/2
1568/1568 [==============================] - 11s - loss: 1.6381 - acc: 0.5191 - val_loss: 2.2146 - val_acc: 0.3483
Epoch 2/2
1568/1568 [==============================] - 11s - loss: 0.3530 - acc: 0.9305 - val_loss: 2.0966 - val_acc: 0.2665
Epoch 1/4
1568/1568 [==============================] - 11s - loss: 0.1036 - acc: 0.9923 - val_loss: 2.4195 - val_acc: 0.1766
Epoch 2/4
1568/1568 [==============================] - 11s - loss: 0.0450 - acc: 1.0000 - val_loss: 2.6192 - val_acc: 0.1667
Epoch 3/4
1568/1568 [==============================] - 11s - loss: 0.0259 - acc: 0.9994 - val_loss: 2.7227 - val_acc: 0.1816
Epoch 4/4
1568/1568 [==============================] - 11s - loss: 0.0180 - acc: 0.9994 - val_loss: 2.7049 - val_acc: 0.2206

And finally, putting it all together!


In [75]:
gen_t = image.ImageDataGenerator(rotation_range=15, height_shift_range=0.05, 
                shear_range=0.1, channel_shift_range=20, width_shift_range=0.1)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)


Found 1568 images belonging to 10 classes.

In [59]:
model = conv1(batches)


Epoch 1/2
1568/1568 [==============================] - 12s - loss: 2.4533 - acc: 0.2258 - val_loss: 2.1042 - val_acc: 0.2265
Epoch 2/2
1568/1568 [==============================] - 11s - loss: 1.7107 - acc: 0.4305 - val_loss: 2.1321 - val_acc: 0.2295
Epoch 1/2
1568/1568 [==============================] - 11s - loss: 1.4329 - acc: 0.5478 - val_loss: 2.3451 - val_acc: 0.1427
Epoch 2/2
1568/1568 [==============================] - 11s - loss: 1.2623 - acc: 0.5918 - val_loss: 2.4122 - val_acc: 0.1088

At first glance, this isn't looking encouraging, since validation accuracy is poor and getting worse. But training accuracy is improving, and still has a long way to go - so we should try annealing our learning rate and running more epochs before we make a decision.


In [60]:
model.optimizer.lr = 0.0001
model.fit_generator(batches, batches.nb_sample, nb_epoch=5, validation_data=val_batches, 
                 nb_val_samples=val_batches.nb_sample)


Epoch 1/5
1568/1568 [==============================] - 11s - loss: 1.1570 - acc: 0.6282 - val_loss: 2.4787 - val_acc: 0.1048
Epoch 2/5
1568/1568 [==============================] - 11s - loss: 1.0278 - acc: 0.6582 - val_loss: 2.4211 - val_acc: 0.1267
Epoch 3/5
1568/1568 [==============================] - 11s - loss: 0.9459 - acc: 0.6939 - val_loss: 2.5656 - val_acc: 0.1477
Epoch 4/5
1568/1568 [==============================] - 11s - loss: 0.9045 - acc: 0.6996 - val_loss: 2.2994 - val_acc: 0.2365
Epoch 5/5
1568/1568 [==============================] - 11s - loss: 0.8346 - acc: 0.7360 - val_loss: 2.1203 - val_acc: 0.2705
Out[60]:
<keras.callbacks.History at 0x7fe52f0bef90>

Lucky we tried that - we're starting to make progress! Let's keep going.


In [61]:
model.fit_generator(batches, batches.nb_sample, nb_epoch=25, validation_data=val_batches, 
                 nb_val_samples=val_batches.nb_sample)


Epoch 1/25
1568/1568 [==============================] - 11s - loss: 0.8055 - acc: 0.7423 - val_loss: 2.0895 - val_acc: 0.2984
Epoch 2/25
1568/1568 [==============================] - 11s - loss: 0.7538 - acc: 0.7621 - val_loss: 1.8985 - val_acc: 0.4212
Epoch 3/25
1568/1568 [==============================] - 11s - loss: 0.7037 - acc: 0.7774 - val_loss: 1.7200 - val_acc: 0.4411
Epoch 4/25
1568/1568 [==============================] - 11s - loss: 0.6865 - acc: 0.7966 - val_loss: 1.5225 - val_acc: 0.5180
Epoch 5/25
1568/1568 [==============================] - 11s - loss: 0.6404 - acc: 0.8036 - val_loss: 1.3924 - val_acc: 0.5319
Epoch 6/25
1568/1568 [==============================] - 11s - loss: 0.6116 - acc: 0.8144 - val_loss: 1.4472 - val_acc: 0.5259
Epoch 7/25
1568/1568 [==============================] - 11s - loss: 0.5671 - acc: 0.8361 - val_loss: 1.4703 - val_acc: 0.5549
Epoch 8/25
1568/1568 [==============================] - 11s - loss: 0.5559 - acc: 0.8265 - val_loss: 1.2402 - val_acc: 0.6337
Epoch 9/25
1568/1568 [==============================] - 11s - loss: 0.5434 - acc: 0.8406 - val_loss: 1.2765 - val_acc: 0.6297
Epoch 10/25
1568/1568 [==============================] - 11s - loss: 0.4877 - acc: 0.8533 - val_loss: 1.2366 - val_acc: 0.6267
Epoch 11/25
1568/1568 [==============================] - 11s - loss: 0.4944 - acc: 0.8406 - val_loss: 1.3992 - val_acc: 0.5349
Epoch 12/25
1568/1568 [==============================] - 11s - loss: 0.4694 - acc: 0.8597 - val_loss: 1.1821 - val_acc: 0.6277
Epoch 13/25
1568/1568 [==============================] - 11s - loss: 0.4251 - acc: 0.8858 - val_loss: 1.1803 - val_acc: 0.6427
Epoch 14/25
1568/1568 [==============================] - 11s - loss: 0.4501 - acc: 0.8680 - val_loss: 1.2752 - val_acc: 0.5908
Epoch 15/25
1568/1568 [==============================] - 11s - loss: 0.3922 - acc: 0.8846 - val_loss: 1.1758 - val_acc: 0.6457
Epoch 16/25
1568/1568 [==============================] - 11s - loss: 0.4406 - acc: 0.8629 - val_loss: 1.3147 - val_acc: 0.5808
Epoch 17/25
1568/1568 [==============================] - 11s - loss: 0.4075 - acc: 0.8788 - val_loss: 1.2941 - val_acc: 0.6148
Epoch 18/25
1568/1568 [==============================] - 11s - loss: 0.3890 - acc: 0.8948 - val_loss: 1.1871 - val_acc: 0.6567
Epoch 19/25
1568/1568 [==============================] - 11s - loss: 0.3708 - acc: 0.8890 - val_loss: 1.1560 - val_acc: 0.6756
Epoch 20/25
1568/1568 [==============================] - 11s - loss: 0.3539 - acc: 0.8973 - val_loss: 1.2621 - val_acc: 0.6537
Epoch 21/25
1568/1568 [==============================] - 11s - loss: 0.3582 - acc: 0.8909 - val_loss: 1.1357 - val_acc: 0.6677
Epoch 22/25
1568/1568 [==============================] - 11s - loss: 0.3232 - acc: 0.9056 - val_loss: 1.2114 - val_acc: 0.6287
Epoch 23/25
1568/1568 [==============================] - 11s - loss: 0.3286 - acc: 0.9011 - val_loss: 1.2917 - val_acc: 0.6377
Epoch 24/25
1568/1568 [==============================] - 11s - loss: 0.3080 - acc: 0.9139 - val_loss: 1.2519 - val_acc: 0.6248
Epoch 25/25
1568/1568 [==============================] - 11s - loss: 0.2999 - acc: 0.9152 - val_loss: 1.1980 - val_acc: 0.6647
Out[61]:
<keras.callbacks.History at 0x7fe52f0bea50>

Amazingly, using nothing but a small sample, a simple (not pre-trained) model with no dropout, and data augmentation, we're getting results that would get us into the top 50% of the competition! This looks like a great foundation for our further experiments.

To go further, we'll need to use the whole dataset: dropout and data volume are closely related, so we can't tune dropout properly without using all the data.
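
In practice that just means pointing path back at the full dataset (the commented-out line in the setup cell at the top) and recreating the batches, e.g.:

path = "data/state/"
batches = get_batches(path+'train', batch_size=batch_size)
val_batches = get_batches(path+'valid', batch_size=batch_size*2, shuffle=False)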