Enter State Farm


In [1]:
import theano


/home/wnixalo/miniconda3/envs/FAI/lib/python2.7/site-packages/theano/gpuarray/dnn.py:135: UserWarning: Your cuDNN version is more recent than Theano. If you encounter problems, try updating Theano or downgrading cuDNN to version 5.1.
  warnings.warn("Your cuDNN version is more recent than "
Using cuDNN version 6021 on context None
Mapped name None to device cuda: GeForce GTX 870M (0000:01:00.0)

In [2]:
import os, sys
sys.path.insert(1, os.path.join(os.getcwd(), 'utils'))

In [3]:
%matplotlib inline
from __future__ import print_function, division
# path = "data/sample/"
path = "data/statefarm/sample/"
import utils; reload(utils)
from utils import *
from IPython.display import FileLink


Using Theano backend.

In [4]:
# batch_size = 64
batch_size = 32

Create Sample

The following assumes you've already created your validation set - remember that the training and validation set should contain different drivers, as mentioned on the Kaggle competition page.


In [5]:
%cd data/statefarm
%cd train


/home/wnixalo/Kaukasos/FAI/data/statefarm
/home/wnixalo/Kaukasos/FAI/data/statefarm/train

In [6]:
%mkdir ../sample
%mkdir ../sample/train
%mkdir ../sample/valid

In [7]:
for d in glob('c?'):
    os.mkdir('../sample/train/' + d)
    os.mkdir('../sample/valid/' + d)

In [5]:
from shutil import copyfile

In [23]:
g = glob('c?/*.jpg')
shuf = np.random.permutation(g)
for i in range(1500): copyfile(shuf[i], '../sample/train/' + shuf[i])

In [20]:
# # removing copied sample training images
# help(os)
# for f in glob('c?/*.jpg'):
#     os.remove(f)

In [23]:
% cd ../../..
%mkdir data/statefarm/results
%mkdir data/statefarm/sample/test


/Users/WayNoxchi/Deshar/Kaukasos/FAI

Validation Set (Sample)

How I'll do it: create a full val set in the full valid folder, then copy the same proportion over to the sample/valid folder as was done for train.

Actually: wouldn't it be better to use the full validation set for more accurate results? Then again, for processing on my MacBook, the first method may be good enough.

1/3: function definitions for moving files & making dirs:

In [10]:
# run once, make sure you're in datadir first
# path = os.getcwd()
# os.mkdir(path + '/valid')
# for i in xrange(10): os.mkdir(path + '/valid' + '/c' + str(i))

def reset_valid(verbose=1, valid_path='', TRAIN_DIR=''):
    """Moves all images in validation set back to 
    their respective classes in the training set."""
    counter = 0
    if not valid_path: valid_path = os.getcwd() + '/valid/'
    if not TRAIN_DIR:  TRAIN_DIR  = os.getcwd() + '/train'
    %cd $valid_path
    for i in xrange(10):
        %cd c"$i"
        g = glob('*.jpg')
        for n in xrange(len(g)):
            os.rename(g[n], TRAIN_DIR + '/c' + str(i) + '/' + g[n])
            counter += 1
        % cd ..
    if verbose: print("Moved {} files.".format(counter))
#         %mv $VALID_DIR/c"$i"/$*.jpg $TRAIN_DIR/c"$i"/$*.jpg

# modified from: http://forums.fast.ai/t/statefarm-kaggle-comp/183/20
def set_valid(number=1, verbose=1, data_path=''):
    """Moves <number> of subjects from training to validation 
    directories. Verbosity: 0: Silent; 1: print no. files moved; 
    2: print each move operation"""
    if not data_path: data_path = os.getcwd() + '/'
    counter = 0
    if number < 0: number = 0
    for n in xrange(number):
        # read CSV file into Pandas DataFrame
        dil = pd.read_csv(data_path + 'driver_imgs_list.csv')
        # group frame by subject in image
        grouped_subjects = dil.groupby('subject')
        # pick <number> subjects at random
        subject = grouped_subjects.groups.keys()[np.random.randint(0, high=len(grouped_subjects.groups))] # <-- groups?
        # get the group assoc w/ subject
        group = grouped_subjects.get_group(subject)
        # loop over group & move imgs to validation dir
        for (subject, clssnm, img) in group.values:
            source = '{}train/{}/{}'.format(data_path, clssnm, img)
            target = source.replace('train', 'valid')
            if verbose > 1: print('mv {} {}'.format(source, target))
            os.rename(source, target)
            counter += 1
    if verbose: print ("Files moved: {}".format(counter))

2/3: Making sure we're in the right dir, & moving stuff

In [11]:
%pwd


Out[11]:
u'/home/wnixalo/Kaukasos/FAI/data/statefarm/train'

In [6]:
# %cd ~/Deshar/Kaukasos/FAI
%cd ~/Kaukasos/FAI


/home/wnixalo/Kaukasos/FAI

In [13]:
%cd data/statefarm/
reset_valid()
%cd ..
set_valid(number=3)


/home/wnixalo/Kaukasos/FAI/data/statefarm
/home/wnixalo/Kaukasos/FAI/data/statefarm/valid
/home/wnixalo/Kaukasos/FAI/data/statefarm/valid/c0
/home/wnixalo/Kaukasos/FAI/data/statefarm/valid
/home/wnixalo/Kaukasos/FAI/data/statefarm/valid/c1
/home/wnixalo/Kaukasos/FAI/data/statefarm/valid
/home/wnixalo/Kaukasos/FAI/data/statefarm/valid/c2
/home/wnixalo/Kaukasos/FAI/data/statefarm/valid
/home/wnixalo/Kaukasos/FAI/data/statefarm/valid/c3
/home/wnixalo/Kaukasos/FAI/data/statefarm/valid
/home/wnixalo/Kaukasos/FAI/data/statefarm/valid/c4
/home/wnixalo/Kaukasos/FAI/data/statefarm/valid
/home/wnixalo/Kaukasos/FAI/data/statefarm/valid/c5
/home/wnixalo/Kaukasos/FAI/data/statefarm/valid
/home/wnixalo/Kaukasos/FAI/data/statefarm/valid/c6
/home/wnixalo/Kaukasos/FAI/data/statefarm/valid
/home/wnixalo/Kaukasos/FAI/data/statefarm/valid/c7
/home/wnixalo/Kaukasos/FAI/data/statefarm/valid
/home/wnixalo/Kaukasos/FAI/data/statefarm/valid/c8
/home/wnixalo/Kaukasos/FAI/data/statefarm/valid
/home/wnixalo/Kaukasos/FAI/data/statefarm/valid/c9
/home/wnixalo/Kaukasos/FAI/data/statefarm/valid
Moved 2577 files.
/home/wnixalo/Kaukasos/FAI/data/statefarm
Files moved: 2961

I understand now why I was getting weird validation-accuracy results: I was moving a separate val set out of training in the full data directory, not in the sample dir. But then why was my model even able to train if there wasn't anything in the sample validation folders? Because I was only copying 1000 random images from sample/train to sample/valid. Ooof..

Never mind, ignore (some of) that: the 1000 sample val imgs are taken from the valid set that was moved out of training in the full directory. The real problem affecting accuracy is that the valid set is separated from training after the sample training set has been copied, so some of the val imgs will have drivers that also appear in the sample training set. This explains why accuracy was off, but not as off as one would expect. I'll reconfigure this.
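
As a quick sanity check for that driver overlap (my own addition, not part of the original notebook), something like the following can report any subjects appearing in both sample/train and sample/valid. It assumes the working directory is data/statefarm and uses the subject/classname/img columns of driver_imgs_list.csv, the same ones set_valid() reads above.

# Hypothetical check: drivers present in both sample/train and sample/valid.
# Run from the data/statefarm directory.
import os
from glob import glob
import pandas as pd

dil = pd.read_csv('driver_imgs_list.csv')    # columns: subject, classname, img
img2subj = dict(zip(dil.img, dil.subject))   # image filename -> driver id

def subjects_in(folder):
    """Return the set of driver ids whose images sit under <folder>/c0..c9."""
    fnames = [os.path.basename(f) for f in glob(folder + '/c?/*.jpg')]
    return set(img2subj[f] for f in fnames if f in img2subj)

overlap = subjects_in('sample/train') & subjects_in('sample/valid')
print('Drivers in both sample train & valid:', sorted(overlap) or 'none')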

This notebook is being rerun on my Asus Linux machine. Upgrading from an Intel Core i5 CPU to an NVidia GTX 870M GPU should yield a good speedup.

CPU times:

  • Single Linear Model: 60~48 seconds
  • Single (100 Node) Hidden Layer: 67~52 seconds
  • Single block of 2 Convolutional layers (+ LM): 453~410 seconds

3/3: copying the val set from the full valid folder to sample/valid

J. Howard uses a permutation of 1,000 val imgs, so I'll just do that here.


In [26]:
%pwd


/home/wnixalo/Kaukasos/FAI/data/statefarm
Out[26]:
u'/home/wnixalo/Kaukasos/FAI/data/statefarm'

In [27]:
%cd valid
# g = glob('valid/c?/*.jpg') # <-- this doesnt work: why?
g = glob('c?/*.jpg')
shuf = np.random.permutation(g)
# for i in range(1000): copyfile(shuf[i], '/sample/' + shuf[i])
for i in range(1000): copyfile(shuf[i], '../sample/valid/' + shuf[i])


/home/wnixalo/Kaukasos/FAI/data/statefarm/valid

Create Batches


In [7]:
batches = get_batches(path + 'train', batch_size=batch_size)
val_batches = get_batches(path + 'valid', batch_size=batch_size*2, shuffle=False)


Found 1500 images belonging to 10 classes.
Found 1000 images belonging to 10 classes.

In [35]:
%pwd
os.mkdir(path + 'test')

In [8]:
(val_classes, trn_classes, val_labels, trn_labels, val_filenames, filenames, 
    test_filename) = get_classes(path)


Found 1500 images belonging to 10 classes.
Found 1000 images belonging to 10 classes.
Found 0 images belonging to 0 classes.

Basic Models

Linear Model

First, we try the simplest model and use default parameters. Note the trick of making the first layer a batchnorm layer - that way we don't have to worry about normalizing the input ourselves.


In [39]:
model = Sequential([
            BatchNormalization(axis=1, input_shape=(3, 224, 224)),
            Flatten(),
            Dense(10, activation='softmax')
        ])

As you can see below, this training is going nowhere...


In [40]:
model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(batches, batches.nb_sample, nb_epoch=2, validation_data=val_batches,
                   nb_val_samples=val_batches.nb_sample)


Epoch 1/2
1500/1500 [==============================] - 29s - loss: 13.2244 - acc: 0.1360 - val_loss: 14.0211 - val_acc: 0.1270
Epoch 2/2
1500/1500 [==============================] - 22s - loss: 13.3749 - acc: 0.1627 - val_loss: 13.8814 - val_acc: 0.1300
Out[40]:
<keras.callbacks.History at 0x7efbfd7bdcd0>

Let's first check the number of parameters to see that there's enough parameters to find some useful relationships:


In [41]:
model.summary()


____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
batchnormalization_1 (BatchNorma (None, 3, 224, 224)   12          batchnormalization_input_1[0][0] 
____________________________________________________________________________________________________
flatten_1 (Flatten)              (None, 150528)        0           batchnormalization_1[0][0]       
____________________________________________________________________________________________________
dense_1 (Dense)                  (None, 10)            1505290     flatten_1[0][0]                  
====================================================================================================
Total params: 1,505,302
Trainable params: 1,505,296
Non-trainable params: 6
____________________________________________________________________________________________________

In [42]:
10*3*224*224


Out[42]:
1505280

Since we have a simple model with no regularization and plenty of parameters, it seems most likely that our learning rate is too high. Perhaps it is jumping to a solution where it predicts one or two classes with high confidence, so that it can give a zero prediction to as many classes as possible - that's the best approach for a model that is no better than random, and that's likely where we would end up with a high learning rate. So let's check:


In [43]:
np.round(model.predict_generator(batches, batches.n)[:10],2)


Out[43]:
array([[ 0.15,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.85,  0.  ],
       [ 0.  ,  0.  ,  1.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ],
       [ 0.  ,  0.  ,  1.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ],
       [ 0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  1.  ,  0.  ],
       [ 0.  ,  0.  ,  1.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ],
       [ 0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  1.  ,  0.  ],
       [ 0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  1.  ,  0.  ],
       [ 0.  ,  0.  ,  1.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ],
       [ 0.  ,  0.  ,  1.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ],
       [ 0.  ,  0.  ,  1.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ]], dtype=float32)

In [16]:
# temp = model.predict_generator(batches, batches.n)

(Not quite so in this case, only partly, but the model was indeed predicting 1 or 6 back on the Mac.)

Our hypothesis was correct. It's nearly always predicting class 1 or 6, with very high confidence. So let's try a lower learning rate:


In [44]:
# here's a way to take a look at the learning rate
import keras.backend as K
LR = K.eval(model.optimizer.lr)
print(LR)


0.0010000000475

In [45]:
model = Sequential([
            BatchNormalization(axis=1, input_shape=(3,224,224)),
            Flatten(),
            Dense(10, activation='softmax')
        ])
model.compile(Adam(lr=1e-5), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(batches, batches.nb_sample, nb_epoch=2, validation_data=val_batches, 
                    nb_val_samples=val_batches.nb_sample)


Epoch 1/2
1500/1500 [==============================] - 29s - loss: 2.3244 - acc: 0.2060 - val_loss: 4.7891 - val_acc: 0.1390
Epoch 2/2
1500/1500 [==============================] - 22s - loss: 1.7141 - acc: 0.4367 - val_loss: 3.4757 - val_acc: 0.1630
Out[45]:
<keras.callbacks.History at 0x7efbf9d8b250>

Great - we found our way out of that hole ... Now we can increase the learning rate and see where we can get to.
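
One caveat I'll note here (my own observation, not from the original notebook): in Keras 1, model.optimizer.lr is a backend variable, and simply assigning a Python float to the attribute, as in the next cell, may not change the rate actually used by a training function that has already been compiled. If the rate genuinely needs to change, writing to the variable in place should be safer, e.g.:

# Hedged alternative to the attribute assignment below: update the
# underlying learning-rate variable (the inverse of the K.eval() check above).
import keras.backend as K
K.set_value(model.optimizer.lr, 0.001)
print(K.eval(model.optimizer.lr))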


In [46]:
model.optimizer.lr=0.001

In [47]:
model.fit_generator(batches, batches.nb_sample, nb_epoch=4, validation_data=val_batches, 
                    nb_val_samples=val_batches.nb_sample)


Epoch 1/4
1500/1500 [==============================] - 28s - loss: 1.3299 - acc: 0.6260 - val_loss: 2.5152 - val_acc: 0.2180
Epoch 2/4
1500/1500 [==============================] - 23s - loss: 1.1029 - acc: 0.7153 - val_loss: 2.2670 - val_acc: 0.2630
Epoch 3/4
1500/1500 [==============================] - 23s - loss: 0.9380 - acc: 0.7900 - val_loss: 1.9485 - val_acc: 0.3060
Epoch 4/4
1500/1500 [==============================] - 22s - loss: 0.7986 - acc: 0.8380 - val_loss: 1.9150 - val_acc: 0.3500
Out[47]:
<keras.callbacks.History at 0x7efbf9d8ba90>

We're stabilizing at a validation accuracy of 0.39 (~0.35 in my notebook). Not great, but a lot better than random. Before moving on, let's check that our validation set on the sample is large enough that it gives consistent results:


In [48]:
rnd_batches = get_batches(path+'valid', batch_size=batch_size*2, shuffle=True)


Found 1000 images belonging to 10 classes.

In [49]:
val_res = [model.evaluate_generator(rnd_batches, rnd_batches.nb_sample) for i in range(10)]

In [50]:
np.round(val_res,2)


Out[50]:
array([[ 1.92,  0.35],
       [ 1.89,  0.36],
       [ 1.91,  0.35],
       [ 1.95,  0.34],
       [ 1.93,  0.36],
       [ 1.87,  0.35],
       [ 1.94,  0.34],
       [ 1.93,  0.35],
       [ 1.92,  0.35],
       [ 1.92,  0.35]])

Yup, pretty consistent - if we see improvements of 3% or more, it's probably not random, based on the above samples.
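
To put a rough number on that consistency (my addition), the spread of the ten evaluated accuracies can be checked directly; their standard deviation is small compared to 3%, which supports the rule of thumb. It reuses val_res from the cell above.

# Column 1 of val_res is accuracy (column 0 is loss).
accs = np.array(val_res)[:, 1]
print('mean acc: {:.3f}, std: {:.3f}'.format(accs.mean(), accs.std()))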

L2 Regularization

The previous model is over-fitting a lot, but we can't use dropout since we only have one layer. We can try to decrease overfitting by adding L2 regularization (i.e., adding the sum of squares of the weights to our loss function):
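
Concretely (my gloss, using the notation of the cell below), with W_regularizer=l2(0.01) on the Dense layer the quantity being minimized becomes

    loss = categorical_crossentropy(y, y_hat) + 0.01 * sum(W**2)

where W are that layer's weights, so large weights are penalized directly.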


In [51]:
model = Sequential([
            BatchNormalization(axis=1, input_shape=(3,224,224)),
            Flatten(),
            Dense(10, activation='softmax', W_regularizer=l2(0.01))
        ])
model.compile(Adam(lr=1e-5), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(batches, batches.nb_sample, nb_epoch=2, validation_data=val_batches, 
                    nb_val_samples=val_batches.nb_sample)


Epoch 1/2
1500/1500 [==============================] - 29s - loss: 2.5984 - acc: 0.1967 - val_loss: 4.0778 - val_acc: 0.1300
Epoch 2/2
1500/1500 [==============================] - 22s - loss: 1.9251 - acc: 0.4273 - val_loss: 3.3006 - val_acc: 0.1690
Out[51]:
<keras.callbacks.History at 0x7efbf8e91050>

In [52]:
model.optimizer.lr=0.001
model.fit_generator(batches, batches.nb_sample, nb_epoch=4, validation_data=val_batches, 
                    nb_val_samples=val_batches.nb_sample)


Epoch 1/4
1500/1500 [==============================] - 29s - loss: 1.5375 - acc: 0.6327 - val_loss: 2.6286 - val_acc: 0.3050
Epoch 2/4
1500/1500 [==============================] - 23s - loss: 1.3301 - acc: 0.7207 - val_loss: 2.2907 - val_acc: 0.3770
Epoch 3/4
1500/1500 [==============================] - 23s - loss: 1.1374 - acc: 0.7927 - val_loss: 2.1786 - val_acc: 0.4260
Epoch 4/4
1500/1500 [==============================] - 23s - loss: 1.0011 - acc: 0.8413 - val_loss: 2.0969 - val_acc: 0.4280
Out[52]:
<keras.callbacks.History at 0x7efbf0d1c550>

Looks like we can get a bit over 50% accuracy this way (here: almost, at 42.8%). This'll be a good benchmark for our future models - if we can't beat 50%, then we're not even beating a linear model trained on a sample, so we'll know that's not a good approach.

Single hidden layer

The next simplest model is to add a single hidden layer.


In [53]:
model = Sequential([
            BatchNormalization(axis=1, input_shape=(3, 224, 224)),
            Flatten(),
            Dense(100, activation='relu'), # would L2 regularization be useful here?
            BatchNormalization(),
            Dense(10, activation='softmax')
        ])
model.compile(Adam(lr=1e-5), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(batches, batches.nb_sample, nb_epoch=2, validation_data=val_batches, 
                    nb_val_samples=val_batches.nb_sample)

model.optimizer.lr = 0.01
model.fit_generator(batches, batches.nb_sample, nb_epoch=5, validation_data=val_batches, 
                    nb_val_samples=val_batches.nb_sample)


Epoch 1/2
1500/1500 [==============================] - 29s - loss: 1.9530 - acc: 0.3720 - val_loss: 6.8269 - val_acc: 0.2310
Epoch 2/2
1500/1500 [==============================] - 22s - loss: 0.9760 - acc: 0.7380 - val_loss: 3.5732 - val_acc: 0.3020
Epoch 1/5
1500/1500 [==============================] - 29s - loss: 0.5404 - acc: 0.8920 - val_loss: 2.8955 - val_acc: 0.3460
Epoch 2/5
1500/1500 [==============================] - 24s - loss: 0.3456 - acc: 0.9513 - val_loss: 2.3355 - val_acc: 0.3780
Epoch 3/5
1500/1500 [==============================] - 22s - loss: 0.2140 - acc: 0.9807 - val_loss: 1.8686 - val_acc: 0.4330
Epoch 4/5
1500/1500 [==============================] - 22s - loss: 0.1530 - acc: 0.9940 - val_loss: 1.9645 - val_acc: 0.4020
Epoch 5/5
1500/1500 [==============================] - 22s - loss: 0.1067 - acc: 0.9973 - val_loss: 1.9403 - val_acc: 0.4000
Out[53]:
<keras.callbacks.History at 0x7efbece61d10>

(Odd, I may not have a good validation set if I'm getting such high val-acc numbers... though not anymore, now that I'm using a proper val set. Of course, just as with JH's notebook: val accuracy has decreased a bit.)

Not looking very encouraging... which isn't surprising since we know that CNNs are a much better choice for computer vision problems. So we'll try one.

Single Conv Layer

2 conv layers with max pooling followed by a simple dense network is a good simple CNN to start with:


In [24]:
def conv1(batches):
    model = Sequential([
                BatchNormalization(axis=1, input_shape=(3,224,224)),
                Convolution2D(32, 3, 3, activation='relu'),
                BatchNormalization(axis=1),
                MaxPooling2D((3, 3)),
                Convolution2D(64, 3, 3, activation='relu'),
                BatchNormalization(axis=1),
                MaxPooling2D((3,3)),
                Flatten(),
                Dense(200, activation='relu'),
                BatchNormalization(),
                Dense(10, activation='softmax')
            ])
    model.compile(Adam(1e-3), loss='categorical_crossentropy', metrics=['accuracy'])
    model.fit_generator(batches, batches.nb_sample, nb_epoch=2, validation_data=val_batches,
                        nb_val_samples=val_batches.nb_sample)
    
    model.optimizer.lr = 0.001
    model.fit_generator(batches, batches.nb_sample, nb_epoch=4, validation_data=val_batches,
                        nb_val_samples=val_batches.nb_sample)
    
    return model

The GPU was running out of memory (2692/3017 MiB) at this point, so I'm restarting with a smaller batch size (32).


In [10]:
conv1(batches)


Epoch 1/2
1500/1500 [==============================] - 27s - loss: 1.5698 - acc: 0.5500 - val_loss: 2.3221 - val_acc: 0.1720
Epoch 2/2
1500/1500 [==============================] - 24s - loss: 0.3194 - acc: 0.9200 - val_loss: 4.1715 - val_acc: 0.1150
Epoch 1/4
1500/1500 [==============================] - 28s - loss: 0.0753 - acc: 0.9860 - val_loss: 10.1540 - val_acc: 0.0950
Epoch 2/4
1500/1500 [==============================] - 24s - loss: 0.0312 - acc: 0.9967 - val_loss: 7.7758 - val_acc: 0.1620
Epoch 3/4
1500/1500 [==============================] - 25s - loss: 0.0130 - acc: 1.0000 - val_loss: 5.6227 - val_acc: 0.2060
Epoch 4/4
1500/1500 [==============================] - 24s - loss: 0.0060 - acc: 1.0000 - val_loss: 3.5450 - val_acc: 0.2780

The training set here is very rapidly reaching a very high accuracy. So if we could regularize this, perhaps we could get reasonable results.

So, what kind of regularization should we try first? As we discussed in lesson 3, we should start with data augmentation.

Data Augmentation

To find the best data augmentation parameters, we can try each type of data augmentation, one at a time. For each type, we can try four very different levels of augmentation, and see which is the best. In the steps below we've only kept the single best results we found. We're using the CNN we defined above, since we have already observed it can model the data quickly and accurately.
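
As a rough illustration of that search (my own sketch; the candidate values are placeholders, not the levels actually tried), one could sweep a single augmentation parameter and compare the final val_acc of each run, reusing conv1() and get_batches() from the cells above:

# Hypothetical sweep over one augmentation type (width shift).
for shift in [0.025, 0.05, 0.1, 0.2]:
    print('width_shift_range =', shift)
    gen_t = image.ImageDataGenerator(width_shift_range=shift)
    batches = get_batches(path + 'train', gen_t, batch_size=batch_size)
    conv1(batches)   # compare the last epoch's val_acc across settings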

Width shift: move the image left and right -


In [11]:
gen_t = image.ImageDataGenerator(width_shift_range=0.1)
batches = get_batches(path + 'train', gen_t, batch_size=batch_size)


Found 1500 images belonging to 10 classes.

In [12]:
model = conv1(batches)


Epoch 1/2
1500/1500 [==============================] - 27s - loss: 2.1444 - acc: 0.3353 - val_loss: 2.0697 - val_acc: 0.3110
Epoch 2/2
1500/1500 [==============================] - 26s - loss: 1.1475 - acc: 0.6627 - val_loss: 2.1255 - val_acc: 0.2670
Epoch 1/4
1500/1500 [==============================] - 28s - loss: 0.7160 - acc: 0.7867 - val_loss: 2.6059 - val_acc: 0.2640
Epoch 2/4
1500/1500 [==============================] - 24s - loss: 0.5159 - acc: 0.8447 - val_loss: 2.7986 - val_acc: 0.2920
Epoch 3/4
1500/1500 [==============================] - 24s - loss: 0.4063 - acc: 0.8807 - val_loss: 2.1449 - val_acc: 0.3750
Epoch 4/4
1500/1500 [==============================] - 24s - loss: 0.2804 - acc: 0.9220 - val_loss: 1.7042 - val_acc: 0.3800

Height shift: move the image up and down -


In [13]:
gen_t = image.ImageDataGenerator(height_shift_range=0.05)
batches = get_batches(path + 'train', gen_t, batch_size=batch_size)


Found 1500 images belonging to 10 classes.

In [14]:
model = conv1(batches)


Epoch 1/2
1500/1500 [==============================] - 27s - loss: 1.9705 - acc: 0.4240 - val_loss: 2.1577 - val_acc: 0.2370
Epoch 2/2
1500/1500 [==============================] - 24s - loss: 0.7798 - acc: 0.7780 - val_loss: 2.5230 - val_acc: 0.2800
Epoch 1/4
1500/1500 [==============================] - 28s - loss: 0.3468 - acc: 0.9153 - val_loss: 3.1418 - val_acc: 0.2650
Epoch 2/4
1500/1500 [==============================] - 24s - loss: 0.2442 - acc: 0.9333 - val_loss: 4.4669 - val_acc: 0.1850
Epoch 3/4
1500/1500 [==============================] - 24s - loss: 0.1695 - acc: 0.9553 - val_loss: 5.1272 - val_acc: 0.2070
Epoch 4/4
1500/1500 [==============================] - 24s - loss: 0.1205 - acc: 0.9767 - val_loss: 3.2405 - val_acc: 0.2420

Random shear angles (max in radians) -


In [15]:
gen_t = image.ImageDataGenerator(shear_range=0.1)
batches = get_batches(path + 'train', gen_t, batch_size=batch_size)


Found 1500 images belonging to 10 classes.

In [16]:
model = conv1(batches)


Epoch 1/2
1500/1500 [==============================] - 28s - loss: 1.6760 - acc: 0.5007 - val_loss: 2.1360 - val_acc: 0.2010
Epoch 2/2
1500/1500 [==============================] - 24s - loss: 0.4158 - acc: 0.8887 - val_loss: 2.7266 - val_acc: 0.1360
Epoch 1/4
1500/1500 [==============================] - 28s - loss: 0.1252 - acc: 0.9767 - val_loss: 3.0101 - val_acc: 0.2570
Epoch 2/4
1500/1500 [==============================] - 24s - loss: 0.0758 - acc: 0.9873 - val_loss: 3.4695 - val_acc: 0.1940
Epoch 3/4
1500/1500 [==============================] - 25s - loss: 0.0400 - acc: 0.9960 - val_loss: 3.1227 - val_acc: 0.1900
Epoch 4/4
1500/1500 [==============================] - 24s - loss: 0.0209 - acc: 0.9987 - val_loss: 2.8056 - val_acc: 0.2300

Rotation: max in degrees -


In [17]:
gen_t = image.ImageDataGenerator(rotation_range=15)
batches = get_batches(path + 'train', gen_t, batch_size=batch_size)


Found 1500 images belonging to 10 classes.

In [18]:
model = conv1(batches)


Epoch 1/2
1500/1500 [==============================] - 27s - loss: 1.9311 - acc: 0.3973 - val_loss: 2.1683 - val_acc: 0.1690
Epoch 2/2
1500/1500 [==============================] - 24s - loss: 0.8799 - acc: 0.7247 - val_loss: 2.1014 - val_acc: 0.2760
Epoch 1/4
1500/1500 [==============================] - 27s - loss: 0.4911 - acc: 0.8547 - val_loss: 2.4849 - val_acc: 0.2890
Epoch 2/4
1500/1500 [==============================] - 24s - loss: 0.3981 - acc: 0.8893 - val_loss: 2.6046 - val_acc: 0.2490
Epoch 3/4
1500/1500 [==============================] - 25s - loss: 0.2532 - acc: 0.9267 - val_loss: 2.7225 - val_acc: 0.3160
Epoch 4/4
1500/1500 [==============================] - 25s - loss: 0.2075 - acc: 0.9460 - val_loss: 2.7361 - val_acc: 0.3170

Channel shift: randomly changing the R,B,G colors -


In [19]:
gen_t = image.ImageDataGenerator(channel_shift_range=20)
batches = get_batches(path + 'train', gen_t, batch_size=batch_size)


Found 1500 images belonging to 10 classes.

In [20]:
model = conv1(batches)


Epoch 1/2
1500/1500 [==============================] - 28s - loss: 1.8106 - acc: 0.4927 - val_loss: 2.3554 - val_acc: 0.1720
Epoch 2/2
1500/1500 [==============================] - 24s - loss: 0.4388 - acc: 0.8960 - val_loss: 3.5913 - val_acc: 0.1240
Epoch 1/4
1500/1500 [==============================] - 28s - loss: 0.1195 - acc: 0.9820 - val_loss: 5.1230 - val_acc: 0.1970
Epoch 2/4
1500/1500 [==============================] - 24s - loss: 0.0474 - acc: 0.9940 - val_loss: 3.5882 - val_acc: 0.2090
Epoch 3/4
1500/1500 [==============================] - 25s - loss: 0.0146 - acc: 1.0000 - val_loss: 3.4423 - val_acc: 0.1480
Epoch 4/4
1500/1500 [==============================] - 25s - loss: 0.0076 - acc: 1.0000 - val_loss: 3.2381 - val_acc: 0.1520

And finally, putting it all together!


In [21]:
gen_t = image.ImageDataGenerator(rotation_range=15, height_shift_range=0.05, 
                shear_range=0.1, channel_shift_range=20, width_shift_range=0.1)
batches = get_batches(path + 'train', gen_t, batch_size=batch_size)


Found 1500 images belonging to 10 classes.

In [25]:
model = conv1(batches)


Epoch 1/2
1500/1500 [==============================] - 28s - loss: 2.2840 - acc: 0.2820 - val_loss: 2.9133 - val_acc: 0.1980
Epoch 2/2
1500/1500 [==============================] - 24s - loss: 1.6720 - acc: 0.4600 - val_loss: 2.2237 - val_acc: 0.1370
Epoch 1/4
1500/1500 [==============================] - 28s - loss: 1.3354 - acc: 0.5707 - val_loss: 4.3484 - val_acc: 0.1790
Epoch 2/4
1500/1500 [==============================] - 24s - loss: 1.1996 - acc: 0.6173 - val_loss: 5.3284 - val_acc: 0.2020
Epoch 3/4
1500/1500 [==============================] - 25s - loss: 1.0949 - acc: 0.6333 - val_loss: 4.0895 - val_acc: 0.2800
Epoch 4/4
1500/1500 [==============================] - 24s - loss: 0.9598 - acc: 0.6987 - val_loss: 2.6515 - val_acc: 0.3640

At first glance, this isn't looking encouraging, since validation performance is poor and getting worse. But the training set is getting better, and still has a long way to go in accuracy - so we should try annealing our learning rate and running more epochs before we make a decision.


In [26]:
model.optimizer.lr = 0.0001
model.fit_generator(batches, batches.nb_sample, nb_epoch=5, validation_data=val_batches,
                    nb_val_samples=val_batches.nb_sample)


Epoch 1/5
1500/1500 [==============================] - 28s - loss: 0.8259 - acc: 0.7513 - val_loss: 1.4187 - val_acc: 0.5020
Epoch 2/5
1500/1500 [==============================] - 25s - loss: 0.7557 - acc: 0.7540 - val_loss: 1.4003 - val_acc: 0.4590
Epoch 3/5
1500/1500 [==============================] - 24s - loss: 0.6917 - acc: 0.7840 - val_loss: 1.1953 - val_acc: 0.5540
Epoch 4/5
1500/1500 [==============================] - 24s - loss: 0.6404 - acc: 0.7920 - val_loss: 0.9105 - val_acc: 0.7270
Epoch 5/5
1500/1500 [==============================] - 25s - loss: 0.5648 - acc: 0.8133 - val_loss: 0.8700 - val_acc: 0.7090
Out[26]:
<keras.callbacks.History at 0x7f37bd40d550>

Lucky we tried that - we're starting to make progress! Let's keep going.


In [27]:
model.fit_generator(batches, batches.nb_sample, nb_epoch=25, validation_data=val_batches,
                    nb_val_samples=val_batches.nb_sample)


Epoch 1/25
1500/1500 [==============================] - 28s - loss: 0.5665 - acc: 0.8213 - val_loss: 0.9156 - val_acc: 0.6950
Epoch 2/25
1500/1500 [==============================] - 24s - loss: 0.5001 - acc: 0.8453 - val_loss: 1.1941 - val_acc: 0.6120
Epoch 3/25
1500/1500 [==============================] - 24s - loss: 0.4538 - acc: 0.8640 - val_loss: 0.8370 - val_acc: 0.6890
Epoch 4/25
1500/1500 [==============================] - 25s - loss: 0.4005 - acc: 0.8800 - val_loss: 0.9979 - val_acc: 0.6710
Epoch 5/25
1500/1500 [==============================] - 25s - loss: 0.3976 - acc: 0.8760 - val_loss: 0.8736 - val_acc: 0.7280
Epoch 6/25
1500/1500 [==============================] - 24s - loss: 0.3930 - acc: 0.8767 - val_loss: 1.0180 - val_acc: 0.7020
Epoch 7/25
1500/1500 [==============================] - 25s - loss: 0.3792 - acc: 0.8833 - val_loss: 1.0186 - val_acc: 0.6510
Epoch 8/25
1500/1500 [==============================] - 25s - loss: 0.3408 - acc: 0.8873 - val_loss: 0.7202 - val_acc: 0.7560
Epoch 9/25
1500/1500 [==============================] - 25s - loss: 0.3141 - acc: 0.9027 - val_loss: 0.6616 - val_acc: 0.8240
Epoch 10/25
1500/1500 [==============================] - 24s - loss: 0.3139 - acc: 0.8960 - val_loss: 0.8935 - val_acc: 0.7520
Epoch 11/25
1500/1500 [==============================] - 24s - loss: 0.3201 - acc: 0.8953 - val_loss: 0.9206 - val_acc: 0.7150
Epoch 12/25
1500/1500 [==============================] - 24s - loss: 0.3155 - acc: 0.8953 - val_loss: 0.7933 - val_acc: 0.7590
Epoch 13/25
1500/1500 [==============================] - 25s - loss: 0.2670 - acc: 0.9113 - val_loss: 0.7633 - val_acc: 0.7530
Epoch 14/25
1500/1500 [==============================] - 25s - loss: 0.2459 - acc: 0.9287 - val_loss: 0.8990 - val_acc: 0.7460
Epoch 15/25
1500/1500 [==============================] - 25s - loss: 0.2187 - acc: 0.9327 - val_loss: 0.7794 - val_acc: 0.7720
Epoch 16/25
1500/1500 [==============================] - 25s - loss: 0.2375 - acc: 0.9287 - val_loss: 1.0600 - val_acc: 0.7370
Epoch 17/25
1500/1500 [==============================] - 25s - loss: 0.2250 - acc: 0.9307 - val_loss: 0.7686 - val_acc: 0.7590
Epoch 18/25
1500/1500 [==============================] - 25s - loss: 0.2285 - acc: 0.9313 - val_loss: 0.9665 - val_acc: 0.7500
Epoch 19/25
1500/1500 [==============================] - 25s - loss: 0.2145 - acc: 0.9273 - val_loss: 0.7705 - val_acc: 0.7590
Epoch 20/25
1500/1500 [==============================] - 24s - loss: 0.2196 - acc: 0.9273 - val_loss: 1.2420 - val_acc: 0.6880
Epoch 21/25
1500/1500 [==============================] - 24s - loss: 0.2156 - acc: 0.9320 - val_loss: 1.0883 - val_acc: 0.7420
Epoch 22/25
1500/1500 [==============================] - 25s - loss: 0.2368 - acc: 0.9207 - val_loss: 0.8508 - val_acc: 0.7470
Epoch 23/25
1500/1500 [==============================] - 25s - loss: 0.2331 - acc: 0.9200 - val_loss: 1.0747 - val_acc: 0.7370
Epoch 24/25
1500/1500 [==============================] - 25s - loss: 0.1729 - acc: 0.9440 - val_loss: 1.1955 - val_acc: 0.6960
Epoch 25/25
1500/1500 [==============================] - 25s - loss: 0.1553 - acc: 0.9493 - val_loss: 1.0082 - val_acc: 0.7680
Out[27]:
<keras.callbacks.History at 0x7f37bd40d9d0>

Amazingly, using nothing but a small sample, a simple (not pre-trained) model with no dropout, and data augmentation, we're getting results that would get us into the top 50% of the competition! This looks like a great foundation for our further experiments.

To go further, we'll need to use the whole dataset, since dropout and data volumes are very related, so we can't tweak dropout without using all the data.

(I can confirm: my best first-attempt score was a loss of 1.51002, with an indicated val/trn loss of 1.1896/1.0269 and a val accuracy of 61.9%. Here the model is obviously overfitting the training data badly, but the indicated val loss/acc is 1.0082/76.8%. Looks good.)

Btw, a loss of ~1.51 gives a ranking of 658/1440: top 45.69%. So this would be much better.