Enter State Farm

If you want a tutorial, follow the lesson video from 48:50.

Import necessary libraries


In [1]:
from theano.sandbox import cuda
cuda.use('gpu1')


WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10).  Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
 https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

Using gpu device 0: Tesla K80 (CNMeM is disabled, cuDNN 5103)
WARNING (theano.sandbox.cuda): Ignoring call to use(1), GPU number 0 is already in use.

In [2]:
%matplotlib inline
from __future__ import division, print_function
from importlib import reload
import utils; reload(utils)
from utils import *
from IPython.display import FileLink

path = 'data/state/sample/'
batch_size = 64


Using Theano backend.

In [3]:
import os

First, let's get the data ready.

Move to the data folder and create a new directory for State Farm:

cd ~/fastai-notes/deeplearning1/nbs/data/
mkdir state
cd state

Download the data from Kaggle. Note: don't forget to accept the competition rules first.

kg download -u yingchi.pei@gmail.com -p FL199473/kag -c state-farm-distracted-driver-detection

Unzip the downloaded data:

unzip -q imgs.zip
unzip -q driver_imgs_list.csv.zip

Get the data ready

Create the validation set

Count the number of files in the current directory: find . -type f | wc -l

In each category's folder inside train/, there are around 2.2K images, so let's move roughly 200 from each category (about 2,000 images in total) to the validation set.
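
The cells below do this by moving 2,000 randomly chosen images across all categories at once. A per-category variant (just a sketch - it assumes the working directory is the train/ folder and that valid/c0 ... valid/c9 already exist) would look like this:

import os
from glob import glob
import numpy as np

# Hypothetical per-category split (not the code actually run below): move exactly
# 200 images from each class folder into ../valid/.
for d in glob('c?'):
    imgs = np.random.permutation(glob(d + '/*.jpg'))
    for f in imgs[:200]:
        os.rename(f, '../valid/' + f)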


In [13]:
DATA_HOME_DIR='/home/ubuntu/fastai-notes/deeplearning1/nbs/data/state'
%cd $DATA_HOME_DIR
%mkdir valid
%mkdir results
%cd train


/home/ubuntu/fastai-notes/deeplearning1/nbs/data/state
mkdir: cannot create directory ‘valid’: File exists
mkdir: cannot create directory ‘results’: File exists
/home/ubuntu/fastai-notes/deeplearning1/nbs/data/state/train

In [11]:
for d in glob('c?'): os.mkdir('../valid/'+d)

In [14]:
g = glob('c?/*.jpg')
shuf = np.random.permutation(g)
for i in range(2000): os.rename(shuf[i], DATA_HOME_DIR+'/valid/'+shuf[i])

Create the sample set


In [16]:
%cd $DATA_HOME_DIR


/home/ubuntu/fastai-notes/deeplearning1/nbs/data/state

In [18]:
%mkdir -p sample/train
%mkdir -p sample/valid

In [19]:
%cd train


/home/ubuntu/fastai-notes/deeplearning1/nbs/data/state/train

In [20]:
from shutil import copyfile

for d in glob('c?'): 
    os.mkdir('../sample/train/'+d)
    os.mkdir('../sample/valid/'+d)

In [21]:
g = glob('c?/*.jpg')
shuf = np.random.permutation(g)
for i in range(1500): copyfile(shuf[i], '../sample/train/'+shuf[i])

In [22]:
%cd ../valid


/home/ubuntu/fastai-notes/deeplearning1/nbs/data/state/valid

In [23]:
g = glob('c?/*.jpg')
shuf = np.random.permutation(g)
for i in range(1000): copyfile(shuf[i], '../sample/valid/'+shuf[i])

In [24]:
%cd ../../..
%mkdir data/state/sample/test


/home/ubuntu/fastai-notes/deeplearning1/nbs

Create batches


In [4]:
batches = get_batches(path+'train', batch_size=batch_size)
val_batches = get_batches(path+'valid', batch_size=batch_size)


Found 1500 images belonging to 10 classes.
Found 1000 images belonging to 10 classes.

For reference, the get_batches helper imported from utils:

def get_batches(dirname, gen=image.ImageDataGenerator(), shuffle=True, 
                batch_size=4, class_mode='categorical', target_size=(224,224)):
    return gen.flow_from_directory(dirname, target_size=target_size, 
           class_mode=class_mode, shuffle=shuffle, batch_size=batch_size)

From Keras (note that this is a newer Keras signature, shown for reference; the calls in this notebook use the older argument names such as nb_epoch and nb_val_samples):

fit_generator(self, generator, steps_per_epoch, epochs=1, verbose=1, callbacks=None, validation_data=None, validation_steps=None, class_weight=None, max_queue_size=10, workers=1, use_multiprocessing=False, initial_epoch=0)

In [26]:
??get_classes()

In [5]:
(val_classes, trn_classes, val_labels, trn_labels, val_filenames, filenames,
    test_filename) = get_classes(path)


Found 1500 images belonging to 10 classes.
Found 1000 images belonging to 10 classes.
Found 0 images belonging to 0 classes.

Basic models

Linear model

First, we try the simplest model, using default parameters.

Note the trick of making the first layer a batchnorm layer - so that we don't have to worry about normalizing the input.


In [28]:
model = Sequential([
        BatchNormalization(axis=1, input_shape=(3, 224, 224)),
        Flatten(),
        Dense(10, activation='softmax')
    ])

In [30]:
model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(batches, batches.nb_sample, nb_epoch=2,
                   validation_data=val_batches, nb_val_samples=val_batches.nb_sample)


Epoch 1/2
1500/1500 [==============================] - 32s - loss: 13.6292 - acc: 0.1500 - val_loss: 13.3279 - val_acc: 0.1700
Epoch 2/2
1500/1500 [==============================] - 24s - loss: 14.3084 - acc: 0.1107 - val_loss: 13.7446 - val_acc: 0.1460
Out[30]:
<keras.callbacks.History at 0x7fc0780cada0>

Ya, this model's training is going nowhere... Let's check the number of parameters in our model:


In [31]:
model.summary()


____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
batchnormalization_1 (BatchNorma (None, 3, 224, 224)   12          batchnormalization_input_1[0][0] 
____________________________________________________________________________________________________
flatten_1 (Flatten)              (None, 150528)        0           batchnormalization_1[0][0]       
____________________________________________________________________________________________________
dense_1 (Dense)                  (None, 10)            1505290     flatten_1[0][0]                  
====================================================================================================
Total params: 1,505,302
Trainable params: 1,505,296
Non-trainable params: 6
____________________________________________________________________________________________________

Over 1.5 million parameters - that should be enough. Incidentally, it's worth checking that you understand why the dense layer has this many weights:


In [32]:
10*3*224*224


Out[32]:
1505280
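
To extend that check to the full total reported by model.summary() (my own arithmetic, not part of the original notebook): the dense layer has 150,528 × 10 weights plus 10 biases, and the batchnorm layer adds 4 parameters per input channel, of which only gamma and beta are trainable.

# Parameter-count check (my own arithmetic): should reproduce the summary above.
dense_weights = 3 * 224 * 224 * 10   # 1,505,280 weights into the 10-way softmax
dense_biases  = 10
bn_params     = 4 * 3                # gamma, beta, moving mean, moving variance per channel
print(dense_weights + dense_biases + bn_params)   # 1,505,302; the 6 moving statistics are non-trainable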

Since we have a simple model with no regularization and plenty of parameters, it seems most likely that our learning rate is too high. Perhaps it is jumping to a solution where it predicts one or two classes with high confidence, so that it can give a 0 prediction to as many classes as possible.


In [34]:
np.round(model.predict_generator(batches, batches.N)[:20], 2)


Out[34]:
array([[ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]], dtype=float32)

Our hypothesis was correct. It's nearly always predicting the same class (c0 here), with only the occasional exception.

From the Keras documentation, the default parameters for Adam() are: Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)

So let's try a lower learning rate:


In [38]:
model = Sequential([
        BatchNormalization(axis=1, input_shape=(3, 224, 224)),
        Flatten(),
        Dense(10, activation='softmax')
    ])
model.compile(Adam(lr=0.0001), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(batches, batches.nb_sample, nb_epoch=4, validation_data=val_batches, 
                 nb_val_samples=val_batches.nb_sample)


Epoch 1/4
1500/1500 [==============================] - 32s - loss: 4.6020 - acc: 0.2793 - val_loss: 8.0088 - val_acc: 0.2700
Epoch 2/4
1500/1500 [==============================] - 24s - loss: 2.3559 - acc: 0.6407 - val_loss: 3.1900 - val_acc: 0.5210
Epoch 3/4
1500/1500 [==============================] - 24s - loss: 1.8422 - acc: 0.7927 - val_loss: 2.2388 - val_acc: 0.7370
Epoch 4/4
1500/1500 [==============================] - 24s - loss: 1.6826 - acc: 0.8467 - val_loss: 2.4954 - val_acc: 0.6940
Out[38]:
<keras.callbacks.History at 0x7fc070d6c668>

We're stabilizing at a validation accuracy of around 0.7 - much better than a random guess!

Before moving on, let's check that our validation set on the sample is large enough that it gives consistent results:


In [39]:
rnd_batches = get_batches(path+'valid', batch_size=batch_size*2, shuffle=True)


Found 1000 images belonging to 10 classes.

In [43]:
# Evaluate on randomly shuffled validation batches 10 times, and compare the val_acc values
val_res = [model.evaluate_generator(rnd_batches, rnd_batches.nb_sample) for i in range(10)]
np.round(val_res, 2)


Out[43]:
array([[ 2.58,  0.68],
       [ 2.38,  0.69],
       [ 2.71,  0.67],
       [ 2.46,  0.71],
       [ 2.53,  0.68],
       [ 2.56,  0.68],
       [ 2.49,  0.69],
       [ 2.64,  0.67],
       [ 2.44,  0.71],
       [ 2.63,  0.67]])
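
To summarize the spread (a hypothetical follow-up, not part of the original run):

# Hypothetical follow-up: quantify how much val_acc varies across the ten evaluations.
val_res = np.array(val_res)
print('val_acc: mean %.3f, std %.3f' % (val_res[:, 1].mean(), val_res[:, 1].std()))

The accuracies only vary by a couple of points (roughly 0.67-0.71), so the 1,000-image validation set looks adequate for comparing models at this stage.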

L2 regularization

So far, our models have been over-fitting. Note that we can't use dropout, since we only have a single linear layer. Let's try to decrease overfitting by adding L2 regularization.


In [12]:
model = Sequential([
        BatchNormalization(axis=1, input_shape=(3, 224, 224)),
        Flatten(),
        Dense(10, activation='softmax', W_regularizer=l2(0.01))
    ])
model.compile(Adam(lr=0.0001), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(batches, batches.nb_sample, nb_epoch=4, validation_data=val_batches, 
                 nb_val_samples=val_batches.nb_sample)


Epoch 1/4
1500/1500 [==============================] - 32s - loss: 5.2144 - acc: 0.2973 - val_loss: 9.9192 - val_acc: 0.2370
Epoch 2/4
1500/1500 [==============================] - 24s - loss: 2.6806 - acc: 0.5960 - val_loss: 4.2026 - val_acc: 0.4470
Epoch 3/4
1500/1500 [==============================] - 24s - loss: 1.3007 - acc: 0.7213 - val_loss: 1.6341 - val_acc: 0.6700
Epoch 4/4
1500/1500 [==============================] - 24s - loss: 0.7057 - acc: 0.8607 - val_loss: 0.9395 - val_acc: 0.8110
Out[12]:
<keras.callbacks.History at 0x7f5df4dcf898>

This will be a good benchmark for our future models - if we can't beat 80%, then we're not even beating a linear model trained on a sample, so we'll know that's not a good approach.
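
An optional housekeeping step (my addition, not in the original run): save the benchmark's weights into the results/ directory created earlier, so later models can be compared against it.

# Hypothetical: persist the L2 benchmark weights (the filename is my own choice).
model.save_weights('data/state/results/linear_l2_benchmark.h5')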

Single hidden layer


In [13]:
model = Sequential([
        BatchNormalization(axis=1, input_shape=(3, 224, 224)),
        Flatten(),
        Dense(100, activation='relu'),
        BatchNormalization(),
        Dense(10, activation='softmax')
    ])
model.compile(Adam(lr=1e-5), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(batches, batches.nb_sample, nb_epoch=2, validation_data=val_batches, 
                 nb_val_samples=val_batches.nb_sample)

model.optimizer.lr = 0.01
model.fit_generator(batches, batches.nb_sample, nb_epoch=5, validation_data=val_batches, 
                 nb_val_samples=val_batches.nb_sample)


Epoch 1/2
1500/1500 [==============================] - 32s - loss: 2.0024 - acc: 0.3640 - val_loss: 7.2128 - val_acc: 0.1330
Epoch 2/2
1500/1500 [==============================] - 24s - loss: 1.1389 - acc: 0.6913 - val_loss: 3.0983 - val_acc: 0.2890
Epoch 1/5
1500/1500 [==============================] - 32s - loss: 0.6864 - acc: 0.8487 - val_loss: 1.3658 - val_acc: 0.5700
Epoch 2/5
1500/1500 [==============================] - 24s - loss: 0.4720 - acc: 0.9273 - val_loss: 0.8339 - val_acc: 0.7500
Epoch 3/5
1500/1500 [==============================] - 24s - loss: 0.3135 - acc: 0.9600 - val_loss: 0.6967 - val_acc: 0.8200
Epoch 4/5
1500/1500 [==============================] - 24s - loss: 0.2304 - acc: 0.9840 - val_loss: 0.5857 - val_acc: 0.8610
Epoch 5/5
1500/1500 [==============================] - 24s - loss: 0.1706 - acc: 0.9933 - val_loss: 0.5138 - val_acc: 0.9070
Out[13]:
<keras.callbacks.History at 0x7f5decffecc0>

Single conv layer

Two conv layers with max pooling, followed by a simple dense network, make a good simple CNN to start with:


In [15]:
def conv1(batches):
    model = Sequential([
            BatchNormalization(axis=1, input_shape=(3,224,224)),
            Convolution2D(32,3,3, activation='relu'),
            BatchNormalization(axis=1),
            MaxPooling2D((3,3)),
            Convolution2D(64,3,3, activation='relu'),
            BatchNormalization(axis=1),
            MaxPooling2D((3,3)),
            Flatten(),
            Dense(200, activation='relu'),
            BatchNormalization(),
            Dense(10, activation='softmax')
        ])
    
    model.compile(Adam(lr=0.001),  loss='categorical_crossentropy', metrics=['accuracy'])
    model.fit_generator(batches, batches.nb_sample, nb_epoch=2, validation_data=val_batches, 
                     nb_val_samples=val_batches.nb_sample)
    model.optimizer.lr = 0.01
    model.fit_generator(batches, batches.nb_sample, nb_epoch=4, validation_data=val_batches, 
                     nb_val_samples=val_batches.nb_sample)
    return model

In [16]:
conv1(batches)


Epoch 1/2
1500/1500 [==============================] - 33s - loss: 1.6943 - acc: 0.5407 - val_loss: 2.2134 - val_acc: 0.3720
Epoch 2/2
1500/1500 [==============================] - 26s - loss: 0.5735 - acc: 0.8733 - val_loss: 2.1744 - val_acc: 0.2820
Epoch 1/4
1500/1500 [==============================] - 32s - loss: 0.1649 - acc: 0.9780 - val_loss: 2.3287 - val_acc: 0.2760
Epoch 2/4
1500/1500 [==============================] - 24s - loss: 0.0618 - acc: 0.9940 - val_loss: 2.4012 - val_acc: 0.2550
Epoch 3/4
1500/1500 [==============================] - 26s - loss: 0.0257 - acc: 0.9987 - val_loss: 2.6948 - val_acc: 0.2340
Epoch 4/4
1500/1500 [==============================] - 24s - loss: 0.0122 - acc: 1.0000 - val_loss: 2.4649 - val_acc: 0.2940
Out[16]:
<keras.models.Sequential at 0x7f5dec982128>

The training accuracy here very rapidly reaches a very high level. So if we could regularize this model, perhaps we could get a reasonable result.

So, what kind of regularization should we try first? As we discussed in lesson 3, we should start with data augmentation.

Data augmentation

Often, to find the best data augmentation parameters, we need to try each type, one at a time.

And for each type, we can try 4 very different levels of augmentation, and see which one is the best. In the steps below we've only kept the single best result we found.
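
As a sketch of such a sweep (the values below are illustrative, not the ones actually run; conv1 and get_batches are the helpers defined above):

# Illustrative sweep over one augmentation type at several strengths; in practice
# each run's val_acc would be compared before picking a level.
for w in (0.05, 0.1, 0.2, 0.5):
    gen_t = image.ImageDataGenerator(width_shift_range=w)
    aug_batches = get_batches(path + 'train', gen_t, batch_size=batch_size)
    print('width_shift_range =', w)
    conv1(aug_batches)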

Width shift: move the image left and right -


In [17]:
gen_t = image.ImageDataGenerator(width_shift_range=0.1)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)


Found 1500 images belonging to 10 classes.

In [18]:
model = conv1(batches)


Epoch 1/2
1500/1500 [==============================] - 32s - loss: 2.1572 - acc: 0.3293 - val_loss: 2.7141 - val_acc: 0.1950
Epoch 2/2
1500/1500 [==============================] - 25s - loss: 1.2215 - acc: 0.6333 - val_loss: 1.9040 - val_acc: 0.3660
Epoch 1/4
1500/1500 [==============================] - 36s - loss: 0.8126 - acc: 0.7580 - val_loss: 1.9323 - val_acc: 0.3180
Epoch 2/4
1500/1500 [==============================] - 30s - loss: 0.6064 - acc: 0.8280 - val_loss: 4.1675 - val_acc: 0.2040
Epoch 3/4
1500/1500 [==============================] - 24s - loss: 0.4107 - acc: 0.8893 - val_loss: 7.3811 - val_acc: 0.1540
Epoch 4/4
1500/1500 [==============================] - 25s - loss: 0.3534 - acc: 0.9073 - val_loss: 10.0822 - val_acc: 0.1550

In [20]:
# Try a different level:
gen_t = image.ImageDataGenerator(width_shift_range=0.5)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)
model = conv1(batches)


Found 1500 images belonging to 10 classes.
Epoch 1/2
1500/1500 [==============================] - 33s - loss: 2.7401 - acc: 0.1673 - val_loss: 6.5864 - val_acc: 0.1330
Epoch 2/2
1500/1500 [==============================] - 24s - loss: 2.1378 - acc: 0.2727 - val_loss: 4.5555 - val_acc: 0.1810
Epoch 1/4
1500/1500 [==============================] - 33s - loss: 1.9101 - acc: 0.3660 - val_loss: 2.8872 - val_acc: 0.1570
Epoch 2/4
1500/1500 [==============================] - 25s - loss: 1.7085 - acc: 0.4460 - val_loss: 2.7262 - val_acc: 0.2110
Epoch 3/4
1500/1500 [==============================] - 26s - loss: 1.5522 - acc: 0.4833 - val_loss: 3.6229 - val_acc: 0.1410
Epoch 4/4
1500/1500 [==============================] - 26s - loss: 1.4203 - acc: 0.5327 - val_loss: 10.0113 - val_acc: 0.1100

There are many other types that we can try.

Here, let's save some time and put them together :)


In [21]:
gen_t = image.ImageDataGenerator(rotation_range=15, height_shift_range=0.05, 
                shear_range=0.1, channel_shift_range=20, width_shift_range=0.1)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)
model = conv1(batches)


Found 1500 images belonging to 10 classes.
Epoch 1/2
1500/1500 [==============================] - 33s - loss: 2.3887 - acc: 0.2400 - val_loss: 4.5639 - val_acc: 0.1980
Epoch 2/2
1500/1500 [==============================] - 25s - loss: 1.6648 - acc: 0.4513 - val_loss: 2.7483 - val_acc: 0.2190
Epoch 1/4
1500/1500 [==============================] - 35s - loss: 1.4291 - acc: 0.5160 - val_loss: 2.2258 - val_acc: 0.2660
Epoch 2/4
1500/1500 [==============================] - 26s - loss: 1.2220 - acc: 0.5980 - val_loss: 2.0314 - val_acc: 0.2810
Epoch 3/4
1500/1500 [==============================] - 26s - loss: 1.1074 - acc: 0.6433 - val_loss: 2.7005 - val_acc: 0.2600
Epoch 4/4
1500/1500 [==============================] - 25s - loss: 0.9478 - acc: 0.7060 - val_loss: 2.4790 - val_acc: 0.2210

At first glance this isn't looking encouraging, since validation accuracy is poor and getting worse. But training accuracy is improving and still has a long way to go - so we should try annealing our learning rate and running more epochs before we make a decision.
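
One caveat worth noting (my observation, not from the original): assigning a plain float to model.optimizer.lr, as conv1 does and as the next cell does, replaces the optimizer's learning-rate variable with a number that the already-compiled training function never reads, so the effective learning rate may not actually change. A hedged sketch of the more reliable approach:

from keras import backend as K

# Hedged alternative (my addition): update the learning-rate variable in place so
# the compiled training function sees it. The guard is there because conv1 has
# already replaced the attribute with a plain float on this particular model.
if not isinstance(model.optimizer.lr, float):
    K.set_value(model.optimizer.lr, 1e-4)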


In [22]:
model.optimizer.lr = 0.0001
model.fit_generator(batches, batches.nb_sample, nb_epoch=5, validation_data=val_batches, 
                 nb_val_samples=val_batches.nb_sample)


Epoch 1/5
1500/1500 [==============================] - 33s - loss: 0.8260 - acc: 0.7413 - val_loss: 3.3064 - val_acc: 0.2110
Epoch 2/5
1500/1500 [==============================] - 25s - loss: 0.8478 - acc: 0.7260 - val_loss: 3.3521 - val_acc: 0.1630
Epoch 3/5
1500/1500 [==============================] - 27s - loss: 0.7473 - acc: 0.7560 - val_loss: 4.1624 - val_acc: 0.1980
Epoch 4/5
1500/1500 [==============================] - 26s - loss: 0.6712 - acc: 0.7873 - val_loss: 2.6913 - val_acc: 0.2350
Epoch 5/5
1500/1500 [==============================] - 26s - loss: 0.6504 - acc: 0.7993 - val_loss: 2.3551 - val_acc: 0.3160
Out[22]:
<keras.callbacks.History at 0x7f5ddbe72278>

Lucky we tried that - we're starting to make progress! Let's keep going.


In [23]:
model.fit_generator(batches, batches.nb_sample, nb_epoch=25, validation_data=val_batches, 
                 nb_val_samples=val_batches.nb_sample)


Epoch 1/25
1500/1500 [==============================] - 33s - loss: 0.5882 - acc: 0.8173 - val_loss: 2.2266 - val_acc: 0.3240
Epoch 2/25
1500/1500 [==============================] - 25s - loss: 0.5156 - acc: 0.8413 - val_loss: 1.8544 - val_acc: 0.3570
Epoch 3/25
1500/1500 [==============================] - 25s - loss: 0.4701 - acc: 0.8493 - val_loss: 1.6511 - val_acc: 0.3860
Epoch 4/25
1500/1500 [==============================] - 25s - loss: 0.4363 - acc: 0.8620 - val_loss: 1.4309 - val_acc: 0.4650
Epoch 5/25
1500/1500 [==============================] - 28s - loss: 0.4512 - acc: 0.8673 - val_loss: 1.2533 - val_acc: 0.5680
Epoch 6/25
1500/1500 [==============================] - 25s - loss: 0.4127 - acc: 0.8760 - val_loss: 0.8589 - val_acc: 0.6700
Epoch 7/25
1500/1500 [==============================] - 25s - loss: 0.3548 - acc: 0.8920 - val_loss: 0.5642 - val_acc: 0.8100
Epoch 8/25
1500/1500 [==============================] - 25s - loss: 0.3340 - acc: 0.9013 - val_loss: 0.5808 - val_acc: 0.7970
Epoch 9/25
1500/1500 [==============================] - 26s - loss: 0.3178 - acc: 0.9107 - val_loss: 0.5591 - val_acc: 0.8200
Epoch 10/25
1500/1500 [==============================] - 29s - loss: 0.3170 - acc: 0.9047 - val_loss: 0.7229 - val_acc: 0.7410
Epoch 11/25
1500/1500 [==============================] - 25s - loss: 0.3012 - acc: 0.9073 - val_loss: 0.3774 - val_acc: 0.8930
Epoch 12/25
1500/1500 [==============================] - 25s - loss: 0.2888 - acc: 0.9173 - val_loss: 0.4326 - val_acc: 0.8670
Epoch 13/25
1500/1500 [==============================] - 26s - loss: 0.2789 - acc: 0.9147 - val_loss: 0.2632 - val_acc: 0.9270
Epoch 14/25
1500/1500 [==============================] - 26s - loss: 0.2831 - acc: 0.9113 - val_loss: 0.4195 - val_acc: 0.8590
Epoch 15/25
1500/1500 [==============================] - 26s - loss: 0.2488 - acc: 0.9260 - val_loss: 0.3485 - val_acc: 0.8900
Epoch 16/25
1500/1500 [==============================] - 26s - loss: 0.2426 - acc: 0.9327 - val_loss: 0.6070 - val_acc: 0.7960
Epoch 17/25
1500/1500 [==============================] - 25s - loss: 0.2546 - acc: 0.9240 - val_loss: 0.5311 - val_acc: 0.8190
Epoch 18/25
1500/1500 [==============================] - 26s - loss: 0.2622 - acc: 0.9207 - val_loss: 0.3521 - val_acc: 0.8910
Epoch 19/25
1500/1500 [==============================] - 27s - loss: 0.2587 - acc: 0.9260 - val_loss: 0.2459 - val_acc: 0.9180
Epoch 20/25
1500/1500 [==============================] - 30s - loss: 0.2309 - acc: 0.9267 - val_loss: 0.2587 - val_acc: 0.9190
Epoch 21/25
1500/1500 [==============================] - 26s - loss: 0.2095 - acc: 0.9327 - val_loss: 0.2613 - val_acc: 0.9180
Epoch 22/25
1500/1500 [==============================] - 25s - loss: 0.2029 - acc: 0.9307 - val_loss: 0.2308 - val_acc: 0.9310
Epoch 23/25
1500/1500 [==============================] - 27s - loss: 0.1949 - acc: 0.9387 - val_loss: 0.2980 - val_acc: 0.8930
Epoch 24/25
1500/1500 [==============================] - 25s - loss: 0.2042 - acc: 0.9400 - val_loss: 0.2478 - val_acc: 0.9210
Epoch 25/25
1500/1500 [==============================] - 26s - loss: 0.1566 - acc: 0.9573 - val_loss: 0.2473 - val_acc: 0.9330
Out[23]:
<keras.callbacks.History at 0x7f5ddbe72a58>

Amazingly, using nothing but a small sample, a simple (not pre-trained) model with no dropout, and data augmentation, we're getting results that would get us into the top 50% of the competition! This looks like a great foundation for our further experiments. To go further, we'll need to use the whole dataset: dropout and data volume are closely related, so we can't tune dropout without using all the data.
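
A sketch of that next step (paths as set up earlier in the notebook; gen_t is the augmented generator defined above):

# Hypothetical next step: rebuild the generators over the full dataset instead of
# the sample before experimenting with dropout and larger models.
path = 'data/state/'
batches = get_batches(path + 'train', gen_t, batch_size=batch_size)
val_batches = get_batches(path + 'valid', batch_size=batch_size)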


In [ ]: