Enter State Farm


In [1]:
from theano.sandbox import cuda
cuda.use('gpu1')


WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10).  Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
 https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

Using gpu device 0: Tesla K80 (CNMeM is disabled, cuDNN 5103)
WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10).  Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
 https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

WARNING (theano.sandbox.cuda): Ignoring call to use(1), GPU number 0 is already in use.

In [2]:
%matplotlib inline
from __future__ import print_function, division
#path = "data/state/"
path = "data/state/sample/"
import utils; reload(utils)
from utils import *
from IPython.display import FileLink


Using Theano backend.

In [3]:
batch_size=64

Create sample

The following assumes you've already created your validation set - remember that the training and validation set should contain different drivers, as mentioned on the Kaggle competition page.
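
If you haven't done that yet, one reasonable way (a sketch, not necessarily how the original split was made) is to hold out a few drivers entirely, using the competition's driver_imgs_list.csv:

# Sketch: driver-held-out validation split. Assumes driver_imgs_list.csv from
# the competition sits in data/state/ (columns: subject, classname, img).
import os
import pandas as pd
from shutil import move

drivers = pd.read_csv('data/state/driver_imgs_list.csv')
val_drivers = ['p012', 'p014', 'p022']   # hypothetical choice of hold-out subjects
val_rows = drivers[drivers.subject.isin(val_drivers)]

if not os.path.exists('data/state/valid'): os.mkdir('data/state/valid')
for c in sorted(set(drivers.classname)):
    d = 'data/state/valid/' + c
    if not os.path.exists(d): os.mkdir(d)

# Move (not copy) the held-out drivers' images, so train and valid never overlap.
for _, row in val_rows.iterrows():
    src = 'data/state/train/{}/{}'.format(row['classname'], row['img'])
    if os.path.exists(src):
        move(src, 'data/state/valid/{}/{}'.format(row['classname'], row['img']))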


In [ ]:
%cd data/state

In [ ]:
%cd train

In [ ]:
%mkdir ../sample
%mkdir ../sample/train
%mkdir ../sample/valid

In [ ]:
for d in glob('c?'):
    os.mkdir('../sample/train/'+d)
    os.mkdir('../sample/valid/'+d)

In [ ]:
from shutil import copyfile

In [ ]:
g = glob('c?/*.jpg')
shuf = np.random.permutation(g)
for i in range(1500): copyfile(shuf[i], '../sample/train/' + shuf[i])

In [ ]:
%cd ../valid

In [ ]:
g = glob('c?/*.jpg')
shuf = np.random.permutation(g)
for i in range(1000): copyfile(shuf[i], '../sample/valid/' + shuf[i])

In [ ]:
%cd ../../..

In [ ]:
%mkdir data/state/results

In [8]:
%mkdir data/state/sample/test

Create batches


In [4]:
batches = get_batches(path+'train', batch_size=batch_size)
val_batches = get_batches(path+'valid', batch_size=batch_size*2, shuffle=False)


Found 1500 images belonging to 10 classes.
Found 1000 images belonging to 10 classes.

In [5]:
(val_classes, trn_classes, val_labels, trn_labels, val_filenames, filenames,
    test_filename) = get_classes(path)


Found 1500 images belonging to 10 classes.
Found 1000 images belonging to 10 classes.
Found 0 images belonging to 0 classes.

Basic models

Linear model

First, we try the simplest model and use default parameters. Note the trick of making the first layer a batchnorm layer - that way we don't have to worry about normalizing the input ourselves.
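
For reference, doing that normalization by hand would look roughly like the sketch below (the per-channel statistics are placeholders you'd compute from the training data); the batchnorm layer learns an equivalent shift and scale for us.

# Sketch only: manual input normalization that the leading batchnorm layer
# saves us from writing. mean_px/std_px are placeholder per-channel statistics.
import numpy as np
from keras.models import Sequential
from keras.layers.core import Lambda, Flatten, Dense

mean_px = np.array([101., 106., 110.], dtype=np.float32).reshape((3,1,1))
std_px  = np.array([ 60.,  61.,  62.], dtype=np.float32).reshape((3,1,1))

def norm_input(x): return (x - mean_px) / std_px

model_manual = Sequential([
        Lambda(norm_input, input_shape=(3,224,224), output_shape=(3,224,224)),
        Flatten(),
        Dense(10, activation='softmax')
    ])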


In [6]:
model = Sequential([
        BatchNormalization(axis=1, input_shape=(3,224,224)),
        Flatten(),
        Dense(10, activation='softmax')
    ])

As you can see below, this training is going nowhere...


In [7]:
model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(batches, batches.nb_sample, nb_epoch=2, validation_data=val_batches, 
                 nb_val_samples=val_batches.nb_sample, verbose=2)


Epoch 1/2
39s - loss: 13.6830 - acc: 0.1120 - val_loss: 14.3058 - val_acc: 0.1090
Epoch 2/2
27s - loss: 14.2099 - acc: 0.1160 - val_loss: 14.3290 - val_acc: 0.1110
Out[7]:
<keras.callbacks.History at 0x7fe6107e5810>

Let's first check the number of parameters, to make sure there are enough to find some useful relationships:


In [8]:
model.summary()


____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
batchnormalization_1 (BatchNorma (None, 3, 224, 224)   12          batchnormalization_input_1[0][0] 
____________________________________________________________________________________________________
flatten_1 (Flatten)              (None, 150528)        0           batchnormalization_1[0][0]       
____________________________________________________________________________________________________
dense_1 (Dense)                  (None, 10)            1505290     flatten_1[0][0]                  
====================================================================================================
Total params: 1,505,302
Trainable params: 1,505,296
Non-trainable params: 6
____________________________________________________________________________________________________

Over 1.5 million parameters - that should be enough. Incidentally, it's worth checking that you understand why this is the number of parameters: the dense layer has 150,528 inputs and 10 outputs, giving 1,505,280 weights plus 10 biases, and the batchnorm layer contributes the remaining 12 (of which 6 are trainable). The weight count is easy to verify:


In [9]:
10*3*224*224


Out[9]:
1505280

Since we have a simple model with no regularization and plenty of parameters, it seems most likely that our learning rate is too high. Perhaps it is jumping to a solution where it predicts one or two classes with high confidence, so that it can give a zero prediction to as many classes as possible - that's the best strategy for a model that is no better than random, and that is likely where a high learning rate would land us. So let's check:


In [11]:
np.round(model.predict_generator(batches, batches.n)[:10],2)


Out[11]:
array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.]], dtype=float32)

Our hypothesis was correct. It's nearly always predicting the same one or two classes, with very high confidence. So let's try a lower learning rate:


In [12]:
model = Sequential([
        BatchNormalization(axis=1, input_shape=(3,224,224)),
        Flatten(),
        Dense(10, activation='softmax')
    ])
model.compile(Adam(lr=1e-5), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(batches, batches.nb_sample, nb_epoch=2, validation_data=val_batches, 
                 nb_val_samples=val_batches.nb_sample, verbose=2)


Epoch 1/2
34s - loss: 2.3421 - acc: 0.2000 - val_loss: 5.3378 - val_acc: 0.1100
Epoch 2/2
25s - loss: 1.7499 - acc: 0.4227 - val_loss: 4.8640 - val_acc: 0.1500
Out[12]:
<keras.callbacks.History at 0x7fe605a37b10>

Great - we found our way out of that hole... Now we can increase the learning rate and see where we can get to.


In [15]:
model.optimizer.lr=0.001
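
One caveat worth hedging on: in Keras 1 the optimizer's learning rate is a backend variable, and rebinding the attribute with a plain float (as above) may not affect a training function that has already been compiled. Setting the variable's value directly is the more robust option:

# Sketch: change the learning rate by updating the optimizer's lr variable in place.
from keras import backend as K
K.set_value(model.optimizer.lr, 0.001)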

In [16]:
model.fit_generator(batches, batches.nb_sample, nb_epoch=4, validation_data=val_batches, 
                 nb_val_samples=val_batches.nb_sample)


Epoch 1/4
1568/1568 [==============================] - 7s - loss: 1.3763 - acc: 0.5816 - val_loss: 2.5994 - val_acc: 0.2884
Epoch 2/4
1568/1568 [==============================] - 5s - loss: 1.0961 - acc: 0.7136 - val_loss: 1.9945 - val_acc: 0.3902
Epoch 3/4
1568/1568 [==============================] - 5s - loss: 0.9395 - acc: 0.7730 - val_loss: 1.9828 - val_acc: 0.3822
Epoch 4/4
1568/1568 [==============================] - 5s - loss: 0.7894 - acc: 0.8323 - val_loss: 1.8041 - val_acc: 0.3962
Out[16]:
<keras.callbacks.History at 0x7fbf2294e210>

We're stabilizing at validation accuracy of 0.39. Not great, but a lot better than random. Before moving on, let's check that our validation set on the sample is large enough that it gives consistent results:


In [13]:
rnd_batches = get_batches(path+'valid', batch_size=batch_size*2, shuffle=True)


Found 1000 images belonging to 10 classes.

In [14]:
val_res = [model.evaluate_generator(rnd_batches, rnd_batches.nb_sample) for i in range(10)]
np.round(val_res, 2)


Out[14]:
array([[ 4.86,  0.15],
       [ 4.93,  0.15],
       [ 4.79,  0.16],
       [ 4.83,  0.15],
       [ 4.88,  0.15],
       [ 4.91,  0.16],
       [ 4.85,  0.14],
       [ 4.77,  0.16],
       [ 4.86,  0.15],
       [ 4.88,  0.15]])

Yup, pretty consistent - if we see improvements of 3% or more, it's probably not random, based on the above samples.
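
One quick way to put a number on that spread, using the evaluations above:

# Accuracy is the second column of val_res; its standard deviation tells us how
# much run-to-run noise to expect from this 1000-image validation set.
accs = np.array(val_res)[:, 1]
print(accs.mean(), accs.std())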

L2 regularization

The previous model is overfitting a lot, but we can't use dropout since we only have one layer. Instead, we can try to reduce overfitting by adding L2 regularization (i.e. adding the sum of squares of the weights to our loss function):
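
Concretely, with W_regularizer=l2(0.01) the quantity being minimized becomes

    loss = categorical_crossentropy + 0.01 * sum(W**2)

where W is the Dense layer's weight matrix (the biases are not penalized here).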


In [16]:
model = Sequential([
        BatchNormalization(axis=1, input_shape=(3,224,224)),
        Flatten(),
        Dense(10, activation='softmax', W_regularizer=l2(0.01))
    ])
model.compile(Adam(lr=10e-5), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(batches, batches.nb_sample, nb_epoch=2, validation_data=val_batches, 
                 nb_val_samples=val_batches.nb_sample, verbose=2)


Epoch 1/2
34s - loss: 4.3294 - acc: 0.3040 - val_loss: 12.1583 - val_acc: 0.1390
Epoch 2/2
26s - loss: 2.5007 - acc: 0.6613 - val_loss: 6.8278 - val_acc: 0.2170
Out[16]:
<keras.callbacks.History at 0x7fe5fb8073d0>

In [17]:
model.optimizer.lr=0.001

In [18]:
model.fit_generator(batches, batches.nb_sample, nb_epoch=4, validation_data=val_batches, 
                 nb_val_samples=val_batches.nb_sample, verbose=2)


Epoch 1/4
34s - loss: 2.1336 - acc: 0.7907 - val_loss: 4.5187 - val_acc: 0.3520
Epoch 2/4
25s - loss: 1.7667 - acc: 0.8773 - val_loss: 4.2702 - val_acc: 0.3370
Epoch 3/4
26s - loss: 1.9875 - acc: 0.8707 - val_loss: 4.4408 - val_acc: 0.3350
Epoch 4/4
25s - loss: 1.7687 - acc: 0.8973 - val_loss: 3.9340 - val_acc: 0.3860
Out[18]:
<keras.callbacks.History at 0x7fe5fb5d8610>

On this run we get to just under 40% validation accuracy - roughly on par with the unregularized linear model. Either way, a linear model trained on a sample gives us a useful benchmark for future models: if we can't beat it, we'll know the approach isn't worth pursuing.

Single hidden layer

The next simplest model is to add a single hidden layer.


In [19]:
model = Sequential([
        BatchNormalization(axis=1, input_shape=(3,224,224)),
        Flatten(),
        Dense(100, activation='relu'),
        BatchNormalization(),
        Dense(10, activation='softmax')
    ])
model.compile(Adam(lr=1e-5), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(batches, batches.nb_sample, nb_epoch=2, validation_data=val_batches, 
                 nb_val_samples=val_batches.nb_sample,verbose=2)

model.optimizer.lr = 0.01
model.fit_generator(batches, batches.nb_sample, nb_epoch=5, validation_data=val_batches, 
                 nb_val_samples=val_batches.nb_sample,verbose=2)


Epoch 1/2
34s - loss: 1.9573 - acc: 0.3440 - val_loss: 11.2260 - val_acc: 0.1150
Epoch 2/2
27s - loss: 1.0177 - acc: 0.7473 - val_loss: 6.5736 - val_acc: 0.1760
Epoch 1/5
35s - loss: 0.5473 - acc: 0.8987 - val_loss: 3.3755 - val_acc: 0.2180
Epoch 2/5
27s - loss: 0.3321 - acc: 0.9533 - val_loss: 2.6395 - val_acc: 0.2980
Epoch 3/5
27s - loss: 0.2289 - acc: 0.9773 - val_loss: 2.2630 - val_acc: 0.3370
Epoch 4/5
27s - loss: 0.1486 - acc: 0.9927 - val_loss: 2.0141 - val_acc: 0.3780
Epoch 5/5
28s - loss: 0.1116 - acc: 0.9980 - val_loss: 1.9483 - val_acc: 0.3930
Out[19]:
<keras.callbacks.History at 0x7fe5f9213e90>

Not looking very encouraging... which isn't surprising since we know that CNNs are a much better choice for computer vision problems. So we'll try one.

Simple conv model

Two conv layers, each followed by max pooling, and then a small dense network make a good simple CNN to start with:


In [20]:
def conv1(batches):
    model = Sequential([
            BatchNormalization(axis=1, input_shape=(3,224,224)),
            Convolution2D(32,3,3, activation='relu'),
            BatchNormalization(axis=1),
            MaxPooling2D((3,3)),
            Convolution2D(64,3,3, activation='relu'),
            BatchNormalization(axis=1),
            MaxPooling2D((3,3)),
            Flatten(),
            Dense(200, activation='relu'),
            BatchNormalization(),
            Dense(10, activation='softmax')
        ])

    model.compile(Adam(lr=1e-4), loss='categorical_crossentropy', metrics=['accuracy'])
    model.fit_generator(batches, batches.nb_sample, nb_epoch=2, validation_data=val_batches, 
                     nb_val_samples=val_batches.nb_sample,verbose=2)
    model.optimizer.lr = 0.001
    model.fit_generator(batches, batches.nb_sample, nb_epoch=4, validation_data=val_batches, 
                     nb_val_samples=val_batches.nb_sample, verbose=2)
    return model

In [21]:
conv1(batches)


Epoch 1/2
36s - loss: 1.3849 - acc: 0.5933 - val_loss: 2.7263 - val_acc: 0.1680
Epoch 2/2
30s - loss: 0.3194 - acc: 0.9407 - val_loss: 2.1638 - val_acc: 0.2250
Epoch 1/4
35s - loss: 0.0939 - acc: 0.9913 - val_loss: 2.2107 - val_acc: 0.2560
Epoch 2/4
28s - loss: 0.0374 - acc: 0.9980 - val_loss: 2.4310 - val_acc: 0.2360
Epoch 3/4
27s - loss: 0.0193 - acc: 1.0000 - val_loss: 2.6299 - val_acc: 0.2060
Epoch 4/4
30s - loss: 0.0124 - acc: 1.0000 - val_loss: 2.6834 - val_acc: 0.2040
Out[21]:
<keras.models.Sequential at 0x7fe5f346a250>

Training accuracy here very rapidly reaches nearly 100%, while validation accuracy stalls - classic overfitting. So if we can regularize this model, perhaps we can get a reasonable result.

So, what kind of regularization should we try first? As we discussed in lesson 3, we should start with data augmentation.

Data augmentation

To find the best data augmentation parameters, we can try each type of augmentation one at a time. For each type, we can try four quite different levels and see which works best. In the steps below we've kept only the single best setting we found for each type. We're using the CNN defined above, since we've already seen that it can fit this data quickly and accurately.
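
A sketch of that search loop for one augmentation type (the four candidate values here are illustrative, not the exact ones tried in the course):

# Sketch: try several strengths of horizontal shift and compare validation curves.
from keras.preprocessing import image  # already imported via utils; shown for completeness
for w in [0.05, 0.1, 0.15, 0.2]:
    print('width_shift_range =', w)
    gen_t = image.ImageDataGenerator(width_shift_range=w)
    aug_batches = get_batches(path+'train', gen_t, batch_size=batch_size)
    conv1(aug_batches)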

Width shift: move the image left and right -


In [22]:
gen_t = image.ImageDataGenerator(width_shift_range=0.1)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)


Found 1500 images belonging to 10 classes.

In [23]:
model = conv1(batches)


Epoch 1/2
34s - loss: 2.1510 - acc: 0.3267 - val_loss: 3.2290 - val_acc: 0.1190
Epoch 2/2
29s - loss: 1.0804 - acc: 0.6587 - val_loss: 2.5324 - val_acc: 0.1560
Epoch 1/4
35s - loss: 0.6985 - acc: 0.8053 - val_loss: 2.4333 - val_acc: 0.2180
Epoch 2/4
27s - loss: 0.5371 - acc: 0.8507 - val_loss: 2.6678 - val_acc: 0.1730
Epoch 3/4
30s - loss: 0.3771 - acc: 0.8980 - val_loss: 2.7519 - val_acc: 0.1220
Epoch 4/4
27s - loss: 0.3353 - acc: 0.9120 - val_loss: 2.9800 - val_acc: 0.1350

Height shift: move the image up and down -


In [24]:
gen_t = image.ImageDataGenerator(height_shift_range=0.05)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)


Found 1500 images belonging to 10 classes.

In [25]:
model = conv1(batches)


Epoch 1/2
37s - loss: 1.7949 - acc: 0.4587 - val_loss: 2.2227 - val_acc: 0.2310
Epoch 2/2
30s - loss: 0.7039 - acc: 0.8080 - val_loss: 2.0922 - val_acc: 0.2880
Epoch 1/4
36s - loss: 0.3266 - acc: 0.9313 - val_loss: 2.2060 - val_acc: 0.2270
Epoch 2/4
27s - loss: 0.2357 - acc: 0.9473 - val_loss: 2.4529 - val_acc: 0.1500
Epoch 3/4
28s - loss: 0.1579 - acc: 0.9713 - val_loss: 2.7069 - val_acc: 0.1360
Epoch 4/4
28s - loss: 0.1005 - acc: 0.9860 - val_loss: 2.9130 - val_acc: 0.1210

Random shear angles (max in radians) -


In [26]:
gen_t = image.ImageDataGenerator(shear_range=0.1)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)


Found 1500 images belonging to 10 classes.

In [27]:
model = conv1(batches)


Epoch 1/2
35s - loss: 1.6927 - acc: 0.4767 - val_loss: 2.5305 - val_acc: 0.2040
Epoch 2/2
27s - loss: 0.4436 - acc: 0.9033 - val_loss: 2.0879 - val_acc: 0.2800
Epoch 1/4
35s - loss: 0.1688 - acc: 0.9800 - val_loss: 2.0308 - val_acc: 0.3380
Epoch 2/4
28s - loss: 0.0928 - acc: 0.9867 - val_loss: 2.0769 - val_acc: 0.3080
Epoch 3/4
28s - loss: 0.0541 - acc: 0.9947 - val_loss: 2.0909 - val_acc: 0.2780
Epoch 4/4
27s - loss: 0.0433 - acc: 0.9947 - val_loss: 2.0701 - val_acc: 0.2920

Rotation: max in degrees -


In [28]:
gen_t = image.ImageDataGenerator(rotation_range=15)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)


Found 1500 images belonging to 10 classes.

In [29]:
model = conv1(batches)


Epoch 1/2
35s - loss: 2.0479 - acc: 0.3473 - val_loss: 2.3764 - val_acc: 0.2180
Epoch 2/2
27s - loss: 0.9034 - acc: 0.7300 - val_loss: 2.3233 - val_acc: 0.2220
Epoch 1/4
35s - loss: 0.5280 - acc: 0.8520 - val_loss: 2.8386 - val_acc: 0.1870
Epoch 2/4
28s - loss: 0.3505 - acc: 0.9207 - val_loss: 3.2220 - val_acc: 0.2010
Epoch 3/4
31s - loss: 0.2355 - acc: 0.9447 - val_loss: 3.5971 - val_acc: 0.1860
Epoch 4/4
27s - loss: 0.1986 - acc: 0.9567 - val_loss: 3.8511 - val_acc: 0.2030

Channel shift: randomly changing the R,G,B colors -


In [30]:
gen_t = image.ImageDataGenerator(channel_shift_range=20)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)


Found 1500 images belonging to 10 classes.

In [31]:
model = conv1(batches)


Epoch 1/2
41s - loss: 1.6864 - acc: 0.4947 - val_loss: 2.6202 - val_acc: 0.2260
Epoch 2/2
28s - loss: 0.3524 - acc: 0.9347 - val_loss: 2.0925 - val_acc: 0.2300
Epoch 1/4
36s - loss: 0.1040 - acc: 0.9900 - val_loss: 2.1689 - val_acc: 0.2370
Epoch 2/4
27s - loss: 0.0508 - acc: 0.9987 - val_loss: 2.1892 - val_acc: 0.3440
Epoch 3/4
27s - loss: 0.0288 - acc: 1.0000 - val_loss: 2.2397 - val_acc: 0.3390
Epoch 4/4
28s - loss: 0.0182 - acc: 1.0000 - val_loss: 2.2936 - val_acc: 0.3180

And finally, putting it all together!


In [32]:
gen_t = image.ImageDataGenerator(rotation_range=15, height_shift_range=0.05, 
                shear_range=0.1, channel_shift_range=20, width_shift_range=0.1)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)


Found 1500 images belonging to 10 classes.

In [33]:
model = conv1(batches)


Epoch 1/2
38s - loss: 2.5631 - acc: 0.2293 - val_loss: 3.0057 - val_acc: 0.1550
Epoch 2/2
32s - loss: 1.7291 - acc: 0.4287 - val_loss: 2.2073 - val_acc: 0.2120
Epoch 1/4
36s - loss: 1.3820 - acc: 0.5447 - val_loss: 2.2495 - val_acc: 0.2630
Epoch 2/4
27s - loss: 1.2662 - acc: 0.5813 - val_loss: 2.3502 - val_acc: 0.2640
Epoch 3/4
31s - loss: 1.1277 - acc: 0.6413 - val_loss: 2.5250 - val_acc: 0.2360
Epoch 4/4
29s - loss: 1.0181 - acc: 0.6753 - val_loss: 2.5151 - val_acc: 0.2470

At first glance this doesn't look encouraging, since validation accuracy is poor and getting worse. But training accuracy is improving and still has a long way to go - so we should try annealing our learning rate and running more epochs before making a decision.


In [34]:
model.optimizer.lr = 0.0001
model.fit_generator(batches, batches.nb_sample, nb_epoch=5, validation_data=val_batches, 
                 nb_val_samples=val_batches.nb_sample, verbose=2)


Epoch 1/5
37s - loss: 0.9286 - acc: 0.7040 - val_loss: 2.7108 - val_acc: 0.2080
Epoch 2/5
27s - loss: 0.8599 - acc: 0.7207 - val_loss: 2.7431 - val_acc: 0.2200
Epoch 3/5
27s - loss: 0.8876 - acc: 0.7160 - val_loss: 2.6309 - val_acc: 0.2320
Epoch 4/5
28s - loss: 0.8009 - acc: 0.7387 - val_loss: 2.8189 - val_acc: 0.2250
Epoch 5/5
27s - loss: 0.7203 - acc: 0.7807 - val_loss: 2.7457 - val_acc: 0.2740
Out[34]:
<keras.callbacks.History at 0x7fe5e6623b90>

Lucky we tried that - we're starting to make progress! Let's keep going.


In [35]:
model.fit_generator(batches, batches.nb_sample, nb_epoch=25, validation_data=val_batches, 
                 nb_val_samples=val_batches.nb_sample, verbose=2)


Epoch 1/25
37s - loss: 0.7046 - acc: 0.7820 - val_loss: 2.7219 - val_acc: 0.3040
Epoch 2/25
32s - loss: 0.6566 - acc: 0.7907 - val_loss: 2.4826 - val_acc: 0.3460
Epoch 3/25
26s - loss: 0.6422 - acc: 0.7980 - val_loss: 2.3274 - val_acc: 0.3710
Epoch 4/25
29s - loss: 0.5939 - acc: 0.8180 - val_loss: 2.4956 - val_acc: 0.3900
Epoch 5/25
29s - loss: 0.5548 - acc: 0.8273 - val_loss: 1.9931 - val_acc: 0.4520
Epoch 6/25
27s - loss: 0.5461 - acc: 0.8240 - val_loss: 2.2871 - val_acc: 0.4270
Epoch 7/25
28s - loss: 0.5183 - acc: 0.8433 - val_loss: 2.0971 - val_acc: 0.4540
Epoch 8/25
27s - loss: 0.5106 - acc: 0.8453 - val_loss: 2.0001 - val_acc: 0.4830
Epoch 9/25
27s - loss: 0.4619 - acc: 0.8533 - val_loss: 1.9045 - val_acc: 0.5010
Epoch 10/25
31s - loss: 0.4641 - acc: 0.8593 - val_loss: 1.9041 - val_acc: 0.4800
Epoch 11/25
32s - loss: 0.4220 - acc: 0.8793 - val_loss: 1.7995 - val_acc: 0.4740
Epoch 12/25
32s - loss: 0.4320 - acc: 0.8580 - val_loss: 1.7505 - val_acc: 0.4780
Epoch 13/25
28s - loss: 0.4021 - acc: 0.8807 - val_loss: 1.8403 - val_acc: 0.5030
Epoch 14/25
26s - loss: 0.3838 - acc: 0.8847 - val_loss: 1.6980 - val_acc: 0.5310
Epoch 15/25
27s - loss: 0.3489 - acc: 0.8953 - val_loss: 1.9515 - val_acc: 0.4960
Epoch 16/25
27s - loss: 0.3780 - acc: 0.8927 - val_loss: 1.7491 - val_acc: 0.5390
Epoch 17/25
27s - loss: 0.3561 - acc: 0.8967 - val_loss: 1.7220 - val_acc: 0.5100
Epoch 18/25
29s - loss: 0.3676 - acc: 0.8887 - val_loss: 1.9092 - val_acc: 0.4640
Epoch 19/25
29s - loss: 0.3519 - acc: 0.9047 - val_loss: 1.9436 - val_acc: 0.5170
Epoch 20/25
32s - loss: 0.3465 - acc: 0.9000 - val_loss: 2.1267 - val_acc: 0.4570
Epoch 21/25
28s - loss: 0.3331 - acc: 0.9007 - val_loss: 1.8302 - val_acc: 0.5220
Epoch 22/25
29s - loss: 0.3093 - acc: 0.9147 - val_loss: 1.8320 - val_acc: 0.5050
Epoch 23/25
30s - loss: 0.2842 - acc: 0.9160 - val_loss: 1.7722 - val_acc: 0.5200
Epoch 24/25
29s - loss: 0.3165 - acc: 0.9033 - val_loss: 1.7929 - val_acc: 0.5010
Epoch 25/25
27s - loss: 0.2917 - acc: 0.9153 - val_loss: 1.8177 - val_acc: 0.4950
Out[35]:
<keras.callbacks.History at 0x7fe5e6627110>

Amazingly, using nothing but a small sample, a simple (not pre-trained) model with no dropout, and data augmentation, we're getting results that would put us in the top 50% of the competition! This looks like a great foundation for our further experiments.

To go further, we'll need to use the whole dataset: the right amount of dropout depends heavily on how much data we have, so we can't tune it properly on a sample.
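
Before doing that, it may be worth saving the current weights so this augmented conv model can be reloaded as a starting point later - a minimal sketch, using the results directory created earlier (the filename is arbitrary):

# Save the trained weights for later reuse.
model.save_weights('data/state/results/conv_da.h5')
# ...and to restore them in a future session:
# model.load_weights('data/state/results/conv_da.h5')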