Enter State Farm


In [1]:
from __future__ import division, print_function
%matplotlib inline
# path = "data/state/"
path = "data/state/sample/"
from importlib import reload  # Python 3
import utils; reload(utils)
from utils import *
from IPython.display import FileLink


Using cuDNN version 5105 on context None
Mapped name None to device cuda0: GeForce GTX TITAN X (0000:04:00.0)
Using Theano backend.

In [2]:
batch_size=64
#batch_size=1

Create sample

The following assumes you've already created your validation set - remember that the training and validation set should contain different drivers, as mentioned on the Kaggle competition page.
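
If you still need to build that driver-held-out validation set, here is a minimal sketch. It assumes the competition's driver_imgs_list.csv (columns subject, classname, img) has been downloaded to data/state/, and the driver ids listed are arbitrary placeholders - pick your own.

import os, shutil
import pandas as pd

drivers = pd.read_csv('data/state/driver_imgs_list.csv')   # one row per image: subject, classname, img
val_drivers = ['p012', 'p014', 'p015']                      # placeholder driver ids - choose your own

for row in drivers.itertuples():
    if row.subject in val_drivers:
        src = os.path.join('data/state/train', row.classname, row.img)
        dst_dir = os.path.join('data/state/valid', row.classname)
        os.makedirs(dst_dir, exist_ok=True)
        shutil.move(src, os.path.join(dst_dir, row.img))    # move, so train and valid never share a driver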


In [ ]:
%cd data/state

In [ ]:
%cd train

In [ ]:
%mkdir ../sample
%mkdir ../sample/train
%mkdir ../sample/valid

In [ ]:
for d in glob('c?'):
    os.mkdir('../sample/train/'+d)
    os.mkdir('../sample/valid/'+d)

In [ ]:
from shutil import copyfile

In [ ]:
g = glob('c?/*.jpg')
shuf = np.random.permutation(g)
# each path looks like 'c0/img_123.jpg', so the class subdirectory is preserved in the copy
for i in range(1500): copyfile(shuf[i], '../sample/train/' + shuf[i])

In [ ]:
%cd ../valid

In [ ]:
g = glob('c?/*.jpg')
shuf = np.random.permutation(g)
for i in range(1000): copyfile(shuf[i], '../sample/valid/' + shuf[i])

In [ ]:
%cd ../../../..

In [ ]:
%mkdir data/state/results

In [ ]:
%mkdir data/state/sample/test

Create batches


In [3]:
batches = get_batches(path+'train', batch_size=batch_size)
val_batches = get_batches(path+'valid', batch_size=batch_size*2, shuffle=False)


Found 1500 images belonging to 10 classes.
Found 1000 images belonging to 10 classes.

In [4]:
(val_classes, trn_classes, val_labels, trn_labels, val_filenames, filenames,
    test_filename) = get_classes(path)


Found 1500 images belonging to 10 classes.
Found 1000 images belonging to 10 classes.
Found 1000 images belonging to 1 classes.

In [5]:
steps_per_epoch = int(np.ceil(batches.samples/batch_size))           # ceil(1500/64) = 24 training batches per epoch
validation_steps = int(np.ceil(val_batches.samples/(batch_size*2)))  # ceil(1000/128) = 8 validation batches

Basic models

Linear model

First, we try the simplest model and use default parameters. Note the trick of making the first layer a batchnorm layer - that way we don't have to worry about normalizing the input ourselves.


In [6]:
model = Sequential([
        BatchNormalization(axis=1, input_shape=(3,224,224)),  # axis=1: normalize per colour channel (channels-first layout)
        Flatten(),
        Dense(10, activation='softmax')
    ])

As you can see below, this training is going nowhere...


In [7]:
model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(batches, steps_per_epoch, epochs=2, validation_data=val_batches, 
                 validation_steps=validation_steps)


Epoch 1/2
24/24 [==============================] - 10s - loss: 13.4302 - acc: 0.1274 - val_loss: 13.7206 - val_acc: 0.1470
Epoch 2/2
24/24 [==============================] - 8s - loss: 13.4243 - acc: 0.1644 - val_loss: 14.0802 - val_acc: 0.1240
Out[7]:
<keras.callbacks.History at 0x7f04ad6690b8>

Let's first check the number of parameters to see that there are enough of them to find some useful relationships:


In [8]:
model.summary()


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
batch_normalization_1 (Batch (None, 3, 224, 224)       12        
_________________________________________________________________
flatten_1 (Flatten)          (None, 150528)            0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                1505290   
=================================================================
Total params: 1,505,302
Trainable params: 1,505,296
Non-trainable params: 6
_________________________________________________________________

Over 1.5 million parameters - that should be enough. Incidentally, it's worth checking that you understand why this is the number of weights in the Dense layer:


In [9]:
10*3*224*224


Out[9]:
1505280
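
Adding the 10 bias terms gives the 1,505,290 shown for dense_1 above; the remaining 12 parameters are the BatchNorm layer's gamma, beta, and running mean/variance for each of the 3 input channels (only gamma and beta are trainable). A quick arithmetic check against the summary:

dense_weights = 3*224*224*10    # 1,505,280 weights into the 10-way softmax
dense_biases = 10               # one bias per class
bn_params = 4*3                 # gamma, beta, moving mean, moving variance per channel
print(dense_weights + dense_biases + bn_params)   # 1,505,302 total
print(dense_weights + dense_biases + 2*3)         # 1,505,296 trainable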

Since we have a simple model with no regularization and plenty of parameters, it seems most likely that our learning rate is too high. Perhaps it is jumping to a solution where it predicts one or two classes with high confidence, so that it can give a zero prediction to as many classes as possible - that's the best a model that is no better than random can do, and it's likely where we would end up with a learning rate that's too high. So let's check:


In [10]:
np.round(model.predict_generator(batches, int(np.ceil(batches.samples/batch_size)))[:10],2)


Out[10]:
array([[ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.]], dtype=float32)

Our hypothesis was correct. It's nearly always predicting class c5 (and occasionally c2), with very high confidence. So let's try a lower learning rate:


In [11]:
model = Sequential([
        BatchNormalization(axis=1, input_shape=(3,224,224)),
        Flatten(),
        Dense(10, activation='softmax')
    ])
model.compile(Adam(lr=1e-5), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(batches, steps_per_epoch, epochs=2, validation_data=val_batches, 
                 validation_steps=validation_steps)


Epoch 1/2
24/24 [==============================] - 10s - loss: 2.2572 - acc: 0.2308 - val_loss: 3.3144 - val_acc: 0.2690
Epoch 2/2
24/24 [==============================] - 8s - loss: 1.7193 - acc: 0.4352 - val_loss: 2.3427 - val_acc: 0.3240
Out[11]:
<keras.callbacks.History at 0x7f04a7516358>

Great - we found our way out of that hole... Now we can increase the learning rate and see where we can get to.


In [12]:
model.optimizer.lr=0.001
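
One caveat: once the model has been compiled and trained, assigning a plain Python float to model.optimizer.lr may not feed through to the update rule Keras has already built. A more reliable alternative (a minimal sketch, assuming the Keras 2 API used throughout this notebook) is to set the backend variable directly instead of the assignment above:

import keras.backend as K
K.set_value(model.optimizer.lr, 0.001)   # update the optimizer's learning-rate variable in place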

In [13]:
model.fit_generator(batches, steps_per_epoch, epochs=4, validation_data=val_batches, 
                 validation_steps=validation_steps)


Epoch 1/4
24/24 [==============================] - 10s - loss: 1.3814 - acc: 0.5872 - val_loss: 1.6773 - val_acc: 0.4020
Epoch 2/4
24/24 [==============================] - 8s - loss: 1.1442 - acc: 0.7046 - val_loss: 1.4093 - val_acc: 0.5190
Epoch 3/4
24/24 [==============================] - 8s - loss: 0.9823 - acc: 0.7488 - val_loss: 1.1107 - val_acc: 0.6370
Epoch 4/4
24/24 [==============================] - 8s - loss: 0.8483 - acc: 0.8155 - val_loss: 1.0131 - val_acc: 0.6850
Out[13]:
<keras.callbacks.History at 0x7f04ac07a588>

We've reached a validation accuracy of around 0.68. Not great, but a lot better than random. Before moving on, let's check that our validation set on the sample is large enough to give consistent results:


In [14]:
rnd_batches = get_batches(path+'valid', batch_size=batch_size*2, shuffle=True)


Found 1000 images belonging to 10 classes.

In [15]:
val_res = [model.evaluate_generator(rnd_batches, int(np.ceil(rnd_batches.samples/(batch_size*2)))) for i in range(10)]
np.round(val_res, 2)


Out[15]:
array([[ 1.01,  0.68],
       [ 1.02,  0.68],
       [ 1.01,  0.68],
       [ 1.01,  0.69],
       [ 0.97,  0.7 ],
       [ 1.02,  0.69],
       [ 1.01,  0.68],
       [ 1.03,  0.68],
       [ 1.01,  0.69],
       [ 1.03,  0.68]])

Yup, pretty consistent - if we see improvements of 3% or more, it's probably not random, based on the above samples.
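
One way to put a number on "pretty consistent", using the val_res list we just computed:

accs = np.array(val_res)[:,1]
print(accs.mean(), accs.std())   # the spread is well under 1%, so a 3% jump is unlikely to be noise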

L2 regularization

The previous model is over-fitting a lot, but we can't use dropout since we only have one layer. We can try to decrease overfitting in our model by adding l2 regularization (i.e. add the sum of squares of the weights to our loss function):
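
Concretely, kernel_regularizer=l2(0.01) in the next cell adds 0.01 times the sum of squared weights of that Dense layer to the training loss. Once that model has been fitted, you could inspect the size of the penalty with a sketch like this (a hypothetical check, not part of the original notebook):

W = model.layers[-1].get_weights()[0]   # kernel of the final Dense layer
print(0.01 * (W**2).sum())              # the extra term l2(0.01) contributes to the loss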


In [16]:
model = Sequential([
        BatchNormalization(axis=1, input_shape=(3,224,224)),
        Flatten(),
        Dense(10, activation='softmax', kernel_regularizer=l2(0.01))
    ])
model.compile(Adam(lr=10e-5), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(batches, steps_per_epoch, epochs=2, validation_data=val_batches, 
                 validation_steps=validation_steps)


Epoch 1/2
24/24 [==============================] - 10s - loss: 6.2899 - acc: 0.2615 - val_loss: 7.8386 - val_acc: 0.2900
Epoch 2/2
24/24 [==============================] - 8s - loss: 3.1825 - acc: 0.5653 - val_loss: 3.3825 - val_acc: 0.5130
Out[16]:
<keras.callbacks.History at 0x7f04a68de0f0>

In [17]:
model.optimizer.lr=0.001

In [18]:
model.fit_generator(batches, steps_per_epoch, epochs=4, validation_data=val_batches, 
                 validation_steps=validation_steps)


Epoch 1/4
24/24 [==============================] - 10s - loss: 2.1131 - acc: 0.7638 - val_loss: 2.5903 - val_acc: 0.6770
Epoch 2/4
24/24 [==============================] - 8s - loss: 1.4320 - acc: 0.8279 - val_loss: 1.8212 - val_acc: 0.7040
Epoch 3/4
24/24 [==============================] - 8s - loss: 0.9213 - acc: 0.8920 - val_loss: 1.2848 - val_acc: 0.7750
Epoch 4/4
24/24 [==============================] - 8s - loss: 0.6609 - acc: 0.9450 - val_loss: 1.0115 - val_acc: 0.8590
Out[18]:
<keras.callbacks.History at 0x7f04ac07a898>

Looks like we can get to around 85% validation accuracy this way. This will be a good benchmark for our future models - if we can't beat it, then we're not even beating a linear model trained on a sample, so we'll know that's not a good approach.

Single hidden layer

The next simplest model is to add a single hidden layer.


In [19]:
model = Sequential([
        BatchNormalization(axis=1, input_shape=(3,224,224)),
        Flatten(),
        Dense(100, activation='relu'),
        BatchNormalization(),
        Dense(10, activation='softmax')
    ])
model.compile(Adam(lr=1e-5), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(batches, steps_per_epoch, epochs=2, validation_data=val_batches, 
                 validation_steps=validation_steps)

model.optimizer.lr = 0.01
model.fit_generator(batches, steps_per_epoch, epochs=5, validation_data=val_batches, 
                 validation_steps=validation_steps)


Epoch 1/2
24/24 [==============================] - 10s - loss: 2.0228 - acc: 0.3469 - val_loss: 6.0921 - val_acc: 0.2160
Epoch 2/2
24/24 [==============================] - 7s - loss: 1.1177 - acc: 0.6703 - val_loss: 2.7110 - val_acc: 0.3480
Epoch 1/5
24/24 [==============================] - 10s - loss: 0.6942 - acc: 0.8556 - val_loss: 1.0824 - val_acc: 0.6570
Epoch 2/5
24/24 [==============================] - 8s - loss: 0.4111 - acc: 0.9459 - val_loss: 0.8392 - val_acc: 0.7540
Epoch 3/5
24/24 [==============================] - 8s - loss: 0.3006 - acc: 0.9665 - val_loss: 0.6148 - val_acc: 0.8480
Epoch 4/5
24/24 [==============================] - 8s - loss: 0.1949 - acc: 0.9901 - val_loss: 0.5960 - val_acc: 0.8730
Epoch 5/5
24/24 [==============================] - 7s - loss: 0.1479 - acc: 0.9927 - val_loss: 0.5734 - val_acc: 0.8510
Out[19]:
<keras.callbacks.History at 0x7f04a4cd6320>

That's barely an improvement over the regularized linear model - which isn't surprising, since we know that CNNs are a much better choice for computer vision problems. So we'll try one.

Simple conv model

Two conv layers with max pooling, followed by a simple dense network, make a good basic CNN to start with:


In [20]:
def conv1(batches):
    model = Sequential([
            BatchNormalization(axis=1, input_shape=(3,224,224)),
            Conv2D(32,(3,3), activation='relu'),
            BatchNormalization(axis=1),
            MaxPooling2D((3,3)),
            Conv2D(64,(3,3), activation='relu'),
            BatchNormalization(axis=1),
            MaxPooling2D((3,3)),
            Flatten(),
            Dense(200, activation='relu'),
            BatchNormalization(),
            Dense(10, activation='softmax')
        ])

    model.compile(Adam(lr=1e-4), loss='categorical_crossentropy', metrics=['accuracy'])
    model.fit_generator(batches, steps_per_epoch, epochs=2, validation_data=val_batches, 
                     validation_steps=validation_steps)
    model.optimizer.lr = 0.001
    model.fit_generator(batches, steps_per_epoch, epochs=4, validation_data=val_batches, 
                     validation_steps=validation_steps)
    return model

In [21]:
conv1(batches)


Epoch 1/2
24/24 [==============================] - 11s - loss: 1.6939 - acc: 0.4836 - val_loss: 2.0490 - val_acc: 0.2950
Epoch 2/2
24/24 [==============================] - 8s - loss: 0.4044 - acc: 0.9246 - val_loss: 1.7653 - val_acc: 0.3430
Epoch 1/4
24/24 [==============================] - 10s - loss: 0.1155 - acc: 0.9907 - val_loss: 1.6874 - val_acc: 0.4810
Epoch 2/4
24/24 [==============================] - 8s - loss: 0.0542 - acc: 0.9987 - val_loss: 1.6647 - val_acc: 0.4430
Epoch 3/4
24/24 [==============================] - 8s - loss: 0.0287 - acc: 1.0000 - val_loss: 1.6646 - val_acc: 0.3840
Epoch 4/4
24/24 [==============================] - 8s - loss: 0.0181 - acc: 1.0000 - val_loss: 1.6312 - val_acc: 0.3990
Out[21]:
<keras.models.Sequential at 0x7f04a68de668>

The training accuracy here very rapidly reaches a very high level while validation accuracy stalls, so the model is badly over-fitting. If we could regularize this, perhaps we could get a reasonable result.

So, what kind of regularization should we try first? As we discussed in lesson 3, we should start with data augmentation.

Data augmentation

To find the best data augmentation parameters, we can try each type of data augmentation, one at a time. For each type, we can try four very different levels of augmentation, and see which is the best. In the steps below we've only kept the single best result we found. We're using the CNN we defined above, since we have already observed it can model the data quickly and accurately.
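
As a sketch of what such a sweep might look like for one augmentation type (the shift values here are illustrative, not the ones actually tried):

for w in [0.05, 0.1, 0.2, 0.4]:                              # illustrative levels only
    gen_t = image.ImageDataGenerator(width_shift_range=w)
    batches = get_batches(path+'train', gen_t, batch_size=batch_size)
    conv1(batches)                                           # keep the level with the best val accuracy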

Width shift: move the image left and right -


In [22]:
gen_t = image.ImageDataGenerator(width_shift_range=0.1)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)


Found 1500 images belonging to 10 classes.

In [23]:
model = conv1(batches)


Epoch 1/2
24/24 [==============================] - 18s - loss: 2.1641 - acc: 0.3462 - val_loss: 3.6318 - val_acc: 0.1350
Epoch 2/2
24/24 [==============================] - 13s - loss: 1.2426 - acc: 0.6026 - val_loss: 2.1089 - val_acc: 0.1990
Epoch 1/4
24/24 [==============================] - 18s - loss: 0.8824 - acc: 0.7240 - val_loss: 1.9272 - val_acc: 0.2850
Epoch 2/4
24/24 [==============================] - 13s - loss: 0.6863 - acc: 0.8165 - val_loss: 2.0318 - val_acc: 0.2830
Epoch 3/4
24/24 [==============================] - 13s - loss: 0.4905 - acc: 0.8665 - val_loss: 2.1731 - val_acc: 0.2640
Epoch 4/4
24/24 [==============================] - 13s - loss: 0.3563 - acc: 0.9155 - val_loss: 2.4367 - val_acc: 0.2240

Height shift: move the image up and down -


In [24]:
gen_t = image.ImageDataGenerator(height_shift_range=0.05)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)


Found 1500 images belonging to 10 classes.

In [25]:
model = conv1(batches)


Epoch 1/2
24/24 [==============================] - 18s - loss: 1.8739 - acc: 0.4134 - val_loss: 1.7773 - val_acc: 0.3910
Epoch 2/2
24/24 [==============================] - 13s - loss: 0.7271 - acc: 0.7914 - val_loss: 1.8788 - val_acc: 0.3680
Epoch 1/4
24/24 [==============================] - 18s - loss: 0.3920 - acc: 0.9034 - val_loss: 2.0485 - val_acc: 0.3590
Epoch 2/4
24/24 [==============================] - 13s - loss: 0.2723 - acc: 0.9324 - val_loss: 2.1801 - val_acc: 0.4220
Epoch 3/4
24/24 [==============================] - 13s - loss: 0.1691 - acc: 0.9621 - val_loss: 2.2416 - val_acc: 0.4360
Epoch 4/4
24/24 [==============================] - 13s - loss: 0.1161 - acc: 0.9805 - val_loss: 2.2153 - val_acc: 0.4580

Random shear angles (max in radians) -


In [26]:
gen_t = image.ImageDataGenerator(shear_range=0.1)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)


Found 1500 images belonging to 10 classes.

In [27]:
model = conv1(batches)


Epoch 1/2
24/24 [==============================] - 18s - loss: 1.6858 - acc: 0.4727 - val_loss: 2.1458 - val_acc: 0.2650
Epoch 2/2
24/24 [==============================] - 14s - loss: 0.4867 - acc: 0.8826 - val_loss: 1.9226 - val_acc: 0.2640
Epoch 1/4
24/24 [==============================] - 18s - loss: 0.1993 - acc: 0.9644 - val_loss: 1.9421 - val_acc: 0.2510
Epoch 2/4
24/24 [==============================] - 13s - loss: 0.1044 - acc: 0.9876 - val_loss: 1.8168 - val_acc: 0.3070
Epoch 3/4
24/24 [==============================] - 13s - loss: 0.0658 - acc: 0.9948 - val_loss: 1.7465 - val_acc: 0.3590
Epoch 4/4
24/24 [==============================] - 13s - loss: 0.0401 - acc: 0.9974 - val_loss: 1.6869 - val_acc: 0.4240

Rotation: max in degrees -


In [28]:
gen_t = image.ImageDataGenerator(rotation_range=15)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)


Found 1500 images belonging to 10 classes.

In [29]:
model = conv1(batches)


Epoch 1/2
24/24 [==============================] - 18s - loss: 2.0859 - acc: 0.3415 - val_loss: 2.3138 - val_acc: 0.2160
Epoch 2/2
24/24 [==============================] - 13s - loss: 0.9637 - acc: 0.7060 - val_loss: 1.8826 - val_acc: 0.3500
Epoch 1/4
24/24 [==============================] - 18s - loss: 0.5810 - acc: 0.8524 - val_loss: 2.0551 - val_acc: 0.3640
Epoch 2/4
24/24 [==============================] - 13s - loss: 0.4360 - acc: 0.8870 - val_loss: 2.4008 - val_acc: 0.2470
Epoch 3/4
24/24 [==============================] - 13s - loss: 0.3467 - acc: 0.9061 - val_loss: 2.7716 - val_acc: 0.1740
Epoch 4/4
24/24 [==============================] - 13s - loss: 0.2459 - acc: 0.9401 - val_loss: 3.2880 - val_acc: 0.1260

Channel shift: randomly changing the R,G,B colors -


In [30]:
gen_t = image.ImageDataGenerator(channel_shift_range=20)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)


Found 1500 images belonging to 10 classes.

In [31]:
model = conv1(batches)


Epoch 1/2
24/24 [==============================] - 11s - loss: 1.6874 - acc: 0.4942 - val_loss: 1.7698 - val_acc: 0.3940
Epoch 2/2
24/24 [==============================] - 8s - loss: 0.4305 - acc: 0.9164 - val_loss: 1.6881 - val_acc: 0.4630
Epoch 1/4
24/24 [==============================] - 11s - loss: 0.1200 - acc: 0.9901 - val_loss: 1.8431 - val_acc: 0.4110
Epoch 2/4
24/24 [==============================] - 8s - loss: 0.0602 - acc: 0.9961 - val_loss: 2.0336 - val_acc: 0.3320
Epoch 3/4
24/24 [==============================] - 8s - loss: 0.0324 - acc: 1.0000 - val_loss: 2.2320 - val_acc: 0.2700
Epoch 4/4
24/24 [==============================] - 8s - loss: 0.0186 - acc: 1.0000 - val_loss: 2.2715 - val_acc: 0.2910

And finally, putting it all together!


In [32]:
gen_t = image.ImageDataGenerator(rotation_range=15, height_shift_range=0.05, 
                shear_range=0.1, channel_shift_range=20, width_shift_range=0.1)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)


Found 1500 images belonging to 10 classes.

In [33]:
model = conv1(batches)


Epoch 1/2
24/24 [==============================] - 19s - loss: 2.5759 - acc: 0.2156 - val_loss: 2.4766 - val_acc: 0.1540
Epoch 2/2
24/24 [==============================] - 14s - loss: 1.8040 - acc: 0.4054 - val_loss: 2.1131 - val_acc: 0.2040
Epoch 1/4
24/24 [==============================] - 19s - loss: 1.5062 - acc: 0.4963 - val_loss: 2.1864 - val_acc: 0.1840
Epoch 2/4
24/24 [==============================] - 14s - loss: 1.3706 - acc: 0.5554 - val_loss: 2.3007 - val_acc: 0.1550
Epoch 3/4
24/24 [==============================] - 14s - loss: 1.2591 - acc: 0.5813 - val_loss: 2.4391 - val_acc: 0.1760
Epoch 4/4
24/24 [==============================] - 14s - loss: 1.1561 - acc: 0.6277 - val_loss: 2.3749 - val_acc: 0.2060

At first glance this isn't looking encouraging, since validation accuracy is poor and getting worse. But training accuracy is improving and still has a long way to go - so we should try annealing our learning rate and running more epochs before we make any decisions.


In [34]:
model.optimizer.lr = 0.0001
model.fit_generator(batches, steps_per_epoch, epochs=5, validation_data=val_batches, 
                 validation_steps=validation_steps)


Epoch 1/5
24/24 [==============================] - 20s - loss: 1.0816 - acc: 0.6562 - val_loss: 2.4125 - val_acc: 0.2090
Epoch 2/5
24/24 [==============================] - 14s - loss: 1.0067 - acc: 0.6800 - val_loss: 2.4819 - val_acc: 0.2110
Epoch 3/5
24/24 [==============================] - 14s - loss: 0.9060 - acc: 0.7033 - val_loss: 2.3600 - val_acc: 0.2510
Epoch 4/5
24/24 [==============================] - 14s - loss: 0.8923 - acc: 0.7254 - val_loss: 2.2154 - val_acc: 0.2690
Epoch 5/5
24/24 [==============================] - 14s - loss: 0.8140 - acc: 0.7274 - val_loss: 2.1111 - val_acc: 0.2910
Out[34]:
<keras.callbacks.History at 0x7f049751b240>

Lucky we tried that - we're starting to make progress! Let's keep going.


In [35]:
model.fit_generator(batches, steps_per_epoch, epochs=25, validation_data=val_batches, 
                 validation_steps=validation_steps)


Epoch 1/25
24/24 [==============================] - 19s - loss: 0.7860 - acc: 0.7533 - val_loss: 1.8747 - val_acc: 0.3250
Epoch 2/25
24/24 [==============================] - 14s - loss: 0.7967 - acc: 0.7391 - val_loss: 1.6263 - val_acc: 0.4020
Epoch 3/25
24/24 [==============================] - 14s - loss: 0.7144 - acc: 0.7788 - val_loss: 1.3234 - val_acc: 0.4820
Epoch 4/25
24/24 [==============================] - 14s - loss: 0.6868 - acc: 0.7835 - val_loss: 1.1697 - val_acc: 0.5590
Epoch 5/25
24/24 [==============================] - 14s - loss: 0.6759 - acc: 0.7959 - val_loss: 0.9414 - val_acc: 0.6800
Epoch 6/25
24/24 [==============================] - 14s - loss: 0.6455 - acc: 0.8054 - val_loss: 0.8549 - val_acc: 0.7200
Epoch 7/25
24/24 [==============================] - 14s - loss: 0.6204 - acc: 0.8165 - val_loss: 0.8666 - val_acc: 0.6840
Epoch 8/25
24/24 [==============================] - 14s - loss: 0.5875 - acc: 0.8274 - val_loss: 0.6926 - val_acc: 0.7610
Epoch 9/25
24/24 [==============================] - 14s - loss: 0.5502 - acc: 0.8370 - val_loss: 0.5564 - val_acc: 0.8390
Epoch 10/25
24/24 [==============================] - 14s - loss: 0.5226 - acc: 0.8431 - val_loss: 0.5468 - val_acc: 0.8310
Epoch 11/25
24/24 [==============================] - 14s - loss: 0.5155 - acc: 0.8448 - val_loss: 0.4691 - val_acc: 0.8610
Epoch 12/25
24/24 [==============================] - 14s - loss: 0.4826 - acc: 0.8684 - val_loss: 0.5031 - val_acc: 0.8460
Epoch 13/25
24/24 [==============================] - 14s - loss: 0.5066 - acc: 0.8446 - val_loss: 0.4253 - val_acc: 0.8930
Epoch 14/25
24/24 [==============================] - 14s - loss: 0.4762 - acc: 0.8701 - val_loss: 0.3762 - val_acc: 0.8910
Epoch 15/25
24/24 [==============================] - 14s - loss: 0.5109 - acc: 0.8491 - val_loss: 0.3611 - val_acc: 0.9040
Epoch 16/25
24/24 [==============================] - 14s - loss: 0.4498 - acc: 0.8640 - val_loss: 0.3275 - val_acc: 0.9080
Epoch 17/25
24/24 [==============================] - 14s - loss: 0.4212 - acc: 0.8746 - val_loss: 0.2903 - val_acc: 0.9230
Epoch 18/25
24/24 [==============================] - 14s - loss: 0.4007 - acc: 0.8870 - val_loss: 0.3077 - val_acc: 0.9080
Epoch 19/25
24/24 [==============================] - 14s - loss: 0.3985 - acc: 0.8855 - val_loss: 0.2599 - val_acc: 0.9290
Epoch 20/25
24/24 [==============================] - 14s - loss: 0.3672 - acc: 0.8964 - val_loss: 0.2799 - val_acc: 0.9160
Epoch 21/25
24/24 [==============================] - 14s - loss: 0.3909 - acc: 0.8902 - val_loss: 0.2556 - val_acc: 0.9320
Epoch 22/25
24/24 [==============================] - 14s - loss: 0.3793 - acc: 0.8945 - val_loss: 0.2340 - val_acc: 0.9340
Epoch 23/25
24/24 [==============================] - 14s - loss: 0.3739 - acc: 0.9006 - val_loss: 0.2314 - val_acc: 0.9360
Epoch 24/25
24/24 [==============================] - 14s - loss: 0.3905 - acc: 0.8929 - val_loss: 0.2551 - val_acc: 0.9280
Epoch 25/25
24/24 [==============================] - 14s - loss: 0.3701 - acc: 0.8953 - val_loss: 0.2585 - val_acc: 0.9250
Out[35]:
<keras.callbacks.History at 0x7f049751b0b8>

Amazingly, using nothing but a small sample, a simple (not pre-trained) model with no dropout, and data augmentation, we're getting results that would get us into the top 50% of the competition! This looks like a great foundation for our further experiments.

To go further, we'll need to use the whole dataset: dropout and data volume are closely related, so we can't tune dropout properly without using all the data.


In [ ]: