Fisheries competition

In this notebook we investigate a range of different architectures for the Kaggle fisheries competition. The video states that vgg.py and vgg_ft() from utils.py have been updated to include VGG with batch normalization, but this is not the case. Instead there is a new file, vgg16bn.py, and an additional function, vgg_ft_bn() (already in utils.py), which we use in this notebook.


In [1]:
from theano.sandbox import cuda


WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10).  Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
 https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

Using gpu device 0: Tesla K80 (CNMeM is disabled, cuDNN 5103)

In [2]:
%matplotlib inline
import utils; reload(utils)
from utils import *
from __future__ import division, print_function


Using Theano backend.

In [3]:
#path = "data/fish/sample/"
path = "data/fish/"
batch_size=64

Setup dirs

We create the validation and sample sets in the usual way.


In [4]:
%cd data/fish
%cd train
%mkdir ../valid


/home/ubuntu/extvol/fastai-courses/deeplearning1/nbs-custom-mine/data/fish
/home/ubuntu/extvol/fastai-courses/deeplearning1/nbs-custom-mine/data/fish/train

In [5]:
g = glob('*')
for d in g: os.mkdir('../valid/'+d)

g = glob('*/*.jpg')
shuf = np.random.permutation(g)
for i in range(500): os.rename(shuf[i], '../valid/' + shuf[i])

In [6]:
%mkdir ../sample
%mkdir ../sample/train
%mkdir ../sample/valid

In [7]:
from shutil import copyfile

g = glob('*')
for d in g: 
    os.mkdir('../sample/train/'+d)
    os.mkdir('../sample/valid/'+d)

In [8]:
g = glob('*/*.jpg')
shuf = np.random.permutation(g)
for i in range(400): copyfile(shuf[i], '../sample/train/' + shuf[i])

%cd ../valid

g = glob('*/*.jpg')
shuf = np.random.permutation(g)
for i in range(200): copyfile(shuf[i], '../sample/valid/' + shuf[i])

%cd ..


/home/ubuntu/extvol/fastai-courses/deeplearning1/nbs-custom-mine/data/fish/valid
/home/ubuntu/extvol/fastai-courses/deeplearning1/nbs-custom-mine/data/fish

In [9]:
%mkdir results
%mkdir models
%mkdir ../sample/results
%mkdir ../sample/models
%cd ../..


/home/ubuntu/extvol/fastai-courses/deeplearning1/nbs-custom-mine

Basic VGG

We start with our usual VGG approach, this time using VGG with batch normalization. We explained how to add batch normalization to VGG in the imagenet_batchnorm notebook. VGG with batch normalization is implemented in vgg16bn.py, and there is a batch-norm version of vgg_ft (our fine-tuning function) called vgg_ft_bn in utils.py.
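For reference, vgg_ft_bn is only a few lines: it builds the batch-norm VGG and fine-tunes its head to the requested number of classes. A sketch of what it does, assuming it mirrors the original vgg_ft (check utils.py for the exact definition):

def vgg_ft_bn(out_dim):
    # Build the batch-norm VGG, then replace its final layer with a new
    # Dense softmax of size out_dim, freezing the remaining layers (Vgg16BN.ft)
    vgg = Vgg16BN()
    vgg.ft(out_dim)
    return vgg.model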

Initial model


In [4]:
batches = get_batches(path+'train', batch_size=batch_size)
val_batches = get_batches(path+'valid', batch_size=batch_size*2, shuffle=False)

(val_classes, trn_classes, val_labels, trn_labels, 
    val_filenames, filenames, test_filenames) = get_classes(path)


Found 3277 images belonging to 8 classes.
Found 500 images belonging to 8 classes.
Found 3277 images belonging to 8 classes.
Found 500 images belonging to 8 classes.
Found 1000 images belonging to 1 classes.
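get_batches and get_classes are helpers from utils.py. Roughly, get_batches is a thin wrapper around Keras' flow_from_directory, and get_classes builds its own unshuffled batches for the train, valid and test directories (which is why "Found 3277..." and "Found 500..." each appear twice above) and returns their classes, one-hot labels and filenames. A sketch, assuming they match the course's utils.py (consult that file for the authoritative code):

def get_batches(dirname, gen=image.ImageDataGenerator(), shuffle=True,
                batch_size=4, class_mode='categorical', target_size=(224,224)):
    # Thin wrapper around Keras' directory iterator
    return gen.flow_from_directory(dirname, target_size=target_size,
            class_mode=class_mode, shuffle=shuffle, batch_size=batch_size)

def get_classes(path):
    # Unshuffled batches so labels and filenames line up with the data arrays
    batches = get_batches(path+'train', shuffle=False, batch_size=1)
    val_batches = get_batches(path+'valid', shuffle=False, batch_size=1)
    test_batches = get_batches(path+'test', shuffle=False, batch_size=1)
    return (val_batches.classes, batches.classes,
            onehot(val_batches.classes), onehot(batches.classes),  # onehot() is utils' one-hot encoder
            val_batches.filenames, batches.filenames, test_batches.filenames)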

Sometimes it's helpful to have just the filenames, without the path.


In [5]:
raw_filenames = [f.split('/')[-1] for f in filenames]
raw_test_filenames = [f.split('/')[-1] for f in test_filenames]
raw_val_filenames = [f.split('/')[-1] for f in val_filenames]

First we create a simple fine-tuned VGG model to be our starting point.


In [6]:
from vgg16bn import Vgg16BN
model = vgg_ft_bn(8)

In [7]:
trn = get_data(path+'train')
val = get_data(path+'valid')
test = get_data(path+'test')


Found 3277 images belonging to 8 classes.
Found 500 images belonging to 8 classes.
Found 1000 images belonging to 1 classes.
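get_data (also from utils.py) loads every image in a directory into a single numpy array, in the same unshuffled order as the filenames above; a rough sketch (assuming it matches the course's utils.py):

def get_data(path, target_size=(224,224)):
    # Load all images under `path` (unshuffled, no labels) into one big array
    batches = get_batches(path, shuffle=False, batch_size=1,
                          class_mode=None, target_size=target_size)
    return np.concatenate([batches.next() for i in range(batches.nb_sample)])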

In [8]:
save_array(path+'results/trn.dat', trn)
save_array(path+'results/val.dat', val)
save_array(path+'results/test.dat', test)
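save_array and load_array are thin wrappers around bcolz, which stores each array as a compressed on-disk directory; roughly (a sketch, see utils.py for the exact code):

import bcolz

def save_array(fname, arr):
    # Write the array to a compressed bcolz carray on disk (fname becomes a directory)
    c = bcolz.carray(arr, rootdir=fname, mode='w')
    c.flush()

def load_array(fname):
    # Read the whole carray back into memory as a numpy array
    return bcolz.open(fname)[:]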

In [9]:
# trn = load_array(path+'results/trn.dat')
# val = load_array(path+'results/val.dat')
# test = load_array(path+'results/test.dat')

In [10]:
gen = image.ImageDataGenerator()

In [11]:
model.compile(optimizer=Adam(1e-3),loss='categorical_crossentropy', metrics=['accuracy'])

In [12]:
model.fit(trn, trn_labels, batch_size=batch_size, nb_epoch=3, validation_data=(val, val_labels), verbose=2)


Train on 3277 samples, validate on 500 samples
Epoch 1/3
92s - loss: 2.8399 - acc: 0.4706 - val_loss: 1.1267 - val_acc: 0.7120
Epoch 2/3
91s - loss: 1.5604 - acc: 0.6619 - val_loss: 0.8838 - val_acc: 0.7980
Epoch 3/3
93s - loss: 1.2242 - acc: 0.7229 - val_loss: 0.6084 - val_acc: 0.8540
Out[12]:
<keras.callbacks.History at 0x7f0d5e46f1d0>

In [13]:
model.save_weights(path+'results/ft1.h5')

Precompute convolutional output

We pre-compute the output of the last convolution layer of VGG, since we're unlikely to need to fine-tune those layers. (All following analysis will be done on just the pre-computed convolutional features.)


In [14]:
model.load_weights(path+'results/ft1.h5')

In [15]:
conv_layers,fc_layers = split_at(model, Convolution2D)
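split_at is another utils.py helper: it splits the model's layer list at the last layer of the given type, so conv_layers here is everything up to and including the final Convolution2D, and fc_layers is the rest. A rough sketch (assuming it matches utils.py):

def split_at(model, layer_type):
    # Index of the last layer of the requested type
    layer_idx = [i for i, l in enumerate(model.layers) if type(l) is layer_type][-1]
    return model.layers[:layer_idx+1], model.layers[layer_idx+1:]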

In [16]:
conv_model = Sequential(conv_layers)

In [17]:
conv_feat = conv_model.predict(trn)
conv_val_feat = conv_model.predict(val)
conv_test_feat = conv_model.predict(test)

In [18]:
save_array(path+'results/conv_val_feat.dat', conv_val_feat)
save_array(path+'results/conv_feat.dat', conv_feat)
save_array(path+'results/conv_test_feat.dat', conv_test_feat)

In [156]:
conv_feat = load_array(path+'results/conv_feat.dat')
conv_val_feat = load_array(path+'results/conv_val_feat.dat')
conv_test_feat = load_array(path+'results/conv_test_feat.dat')

In [157]:
print(conv_feat.shape)
print(conv_val_feat.shape)
print(conv_test_feat.shape)


(3277, 512, 14, 14)
(500, 512, 14, 14)
(1000, 512, 14, 14)

Train model

We can now create our first baseline model - a simple 3-layer FC net.


In [21]:
def get_bn_layers(p):
    return [
        MaxPooling2D(input_shape=conv_layers[-1].output_shape[1:]),
        BatchNormalization(axis=1),
        Dropout(p/4),
        Flatten(),
        Dense(512, activation='relu'),
        BatchNormalization(),
        Dropout(p),
        Dense(512, activation='relu'),
        BatchNormalization(),
        Dropout(p/2),
        Dense(8, activation='softmax')
    ]

In [22]:
p=0.6

In [23]:
bn_model = Sequential(get_bn_layers(p))
bn_model.compile(Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

In [24]:
bn_model.fit(conv_feat, trn_labels, batch_size=batch_size, nb_epoch=3, verbose=2,
             validation_data=(conv_val_feat, val_labels))


Train on 3277 samples, validate on 500 samples
Epoch 1/3
2s - loss: 1.0854 - acc: 0.6802 - val_loss: 0.9516 - val_acc: 0.8340
Epoch 2/3
2s - loss: 0.3139 - acc: 0.9115 - val_loss: 0.2875 - val_acc: 0.9380
Epoch 3/3
2s - loss: 0.1580 - acc: 0.9536 - val_loss: 0.2447 - val_acc: 0.9520
Out[24]:
<keras.callbacks.History at 0x7f0cde438610>

In [25]:
bn_model.optimizer.lr = 1e-4

In [26]:
bn_model.fit(conv_feat, trn_labels, batch_size=batch_size, nb_epoch=7, verbose=2,
             validation_data=(conv_val_feat, val_labels))


Train on 3277 samples, validate on 500 samples
Epoch 1/7
2s - loss: 0.1012 - acc: 0.9744 - val_loss: 0.1874 - val_acc: 0.9580
Epoch 2/7
2s - loss: 0.1032 - acc: 0.9728 - val_loss: 0.2227 - val_acc: 0.9600
Epoch 3/7
2s - loss: 0.0621 - acc: 0.9835 - val_loss: 0.1613 - val_acc: 0.9640
Epoch 4/7
2s - loss: 0.0411 - acc: 0.9893 - val_loss: 0.1394 - val_acc: 0.9640
Epoch 5/7
2s - loss: 0.0322 - acc: 0.9908 - val_loss: 0.1776 - val_acc: 0.9660
Epoch 6/7
2s - loss: 0.0247 - acc: 0.9924 - val_loss: 0.1556 - val_acc: 0.9700
Epoch 7/7
2s - loss: 0.0407 - acc: 0.9887 - val_loss: 0.2213 - val_acc: 0.9540
Out[26]:
<keras.callbacks.History at 0x7f0cde438d50>

In [27]:
bn_model.save_weights(path+'models/conv_512_6.h5')

In [28]:
bn_model.evaluate(conv_val_feat, val_labels)


500/500 [==============================] - 0s     
Out[28]:
[0.22125084658712149, 0.95399999904632571]

In [29]:
# bn_model.load_weights(path+'models/conv_512_6.h5')

Multi-input

The images come in a handful of different sizes, which likely correspond to the boat they were taken from (since different boats use different cameras). Perhaps this creates some data leakage that we can take advantage of to get a better Kaggle leaderboard position? To find out, we first create an array of the image dimensions for each training image:


In [38]:
sizes = [PIL.Image.open(path+'train/'+f).size for f in filenames]

In [39]:
sizes


Out[39]:
[(1280, 720),
 (1192, 670),
 (1280, 750),
 (1280, 750),
 (1280, 720),
 (1280, 720),
 (1280, 750),
 (1280, 720),
 (1280, 720),
 (1280, 720),
 (1280, 750),
 (1280, 720),
 ...]

In [40]:
id2size = list(set(sizes))
size2id = {o:i for i,o in enumerate(id2size)}

In [41]:
id2size


Out[41]:
[(1280, 974),
 (1244, 700),
 (1732, 974),
 (1334, 750),
 (1192, 670),
 (1280, 720),
 (1276, 718),
 (1280, 750),
 (1518, 854),
 (1280, 924)]

In [42]:
size2id


Out[42]:
{(1192, 670): 4,
 (1244, 700): 1,
 (1276, 718): 6,
 (1280, 720): 5,
 (1280, 750): 7,
 (1280, 924): 9,
 (1280, 974): 0,
 (1334, 750): 3,
 (1518, 854): 8,
 (1732, 974): 2}

In [43]:
import collections
collections.Counter(sizes)


Out[43]:
Counter({(1192, 670): 159,
         (1244, 700): 23,
         (1276, 718): 182,
         (1280, 720): 1905,
         (1280, 750): 518,
         (1280, 924): 51,
         (1280, 974): 346,
         (1334, 750): 27,
         (1518, 854): 36,
         (1732, 974): 30})

Then we one-hot encode them (since we want to treat them as categorical) and normalize the data.


In [44]:
trn_sizes_orig = to_categorical([size2id[o] for o in sizes], len(id2size))

In [45]:
raw_val_sizes = [PIL.Image.open(path+'valid/'+f).size for f in val_filenames]
val_sizes = to_categorical([size2id[o] for o in raw_val_sizes], len(id2size))

In [46]:
val_sizes


Out[46]:
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       ..., 
       [ 0.,  0.,  0., ...,  1.,  0.,  0.],
       [ 1.,  0.,  0., ...,  0.,  0.,  0.],
       [ 1.,  0.,  0., ...,  0.,  0.,  0.]])

In [49]:
raw_test_sizes = [PIL.Image.open(path+'test/'+f).size for f in test_filenames]
test_sizes = to_categorical([size2id[o] for o in raw_test_sizes], len(id2size))

In [50]:
trn_sizes = (trn_sizes_orig - trn_sizes_orig.mean(axis=0)) / trn_sizes_orig.std(axis=0)
val_sizes = (val_sizes - trn_sizes_orig.mean(axis=0)) / trn_sizes_orig.std(axis=0)
test_sizes = (test_sizes - trn_sizes_orig.mean(axis=0)) / trn_sizes_orig.std(axis=0)

To use this additional "meta-data", we create a model with multiple input layers - sz_inp will be our input for the size information.


In [51]:
p=0.6

In [52]:
inp = Input(conv_layers[-1].output_shape[1:])
sz_inp = Input((len(id2size),))
bn_inp = BatchNormalization()(sz_inp)

x = MaxPooling2D()(inp)
x = BatchNormalization(axis=1)(x)
x = Dropout(p/4)(x)
x = Flatten()(x)
x = Dense(512, activation='relu')(x)
x = BatchNormalization()(x)
x = Dropout(p)(x)
x = Dense(512, activation='relu')(x)
x = BatchNormalization()(x)
x = Dropout(p/2)(x)
x = merge([x,bn_inp], 'concat')
x = Dense(8, activation='softmax')(x)

When we compile the model, we have to specify all the input layers in an array.


In [53]:
model = Model([inp, sz_inp], x)
model.compile(Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

And when we train the model, we have to provide all the input layers' data in an array.


In [54]:
model.fit([conv_feat, trn_sizes], trn_labels, batch_size=batch_size, nb_epoch=3, verbose=2,
             validation_data=([conv_val_feat, val_sizes], val_labels))


Train on 3277 samples, validate on 500 samples
Epoch 1/3
2s - loss: 1.1250 - acc: 0.6701 - val_loss: 1.0511 - val_acc: 0.8400
Epoch 2/3
2s - loss: 0.3089 - acc: 0.9127 - val_loss: 0.3697 - val_acc: 0.9300
Epoch 3/3
2s - loss: 0.1575 - acc: 0.9576 - val_loss: 0.2183 - val_acc: 0.9500
Out[54]:
<keras.callbacks.History at 0x7f0cdcbc2710>

In [55]:
bn_model.optimizer.lr = 1e-4

In [56]:
bn_model.fit(conv_feat, trn_labels, batch_size=batch_size, nb_epoch=8, verbose=2,
             validation_data=(conv_val_feat, val_labels))


Train on 3277 samples, validate on 500 samples
Epoch 1/8
2s - loss: 0.0467 - acc: 0.9896 - val_loss: 0.1644 - val_acc: 0.9640
Epoch 2/8
2s - loss: 0.0273 - acc: 0.9930 - val_loss: 0.1592 - val_acc: 0.9600
Epoch 3/8
2s - loss: 0.0363 - acc: 0.9915 - val_loss: 0.1703 - val_acc: 0.9620
Epoch 4/8
2s - loss: 0.0339 - acc: 0.9905 - val_loss: 0.1881 - val_acc: 0.9640
Epoch 5/8
2s - loss: 0.0416 - acc: 0.9878 - val_loss: 0.1990 - val_acc: 0.9580
Epoch 6/8
2s - loss: 0.0330 - acc: 0.9908 - val_loss: 0.2090 - val_acc: 0.9540
Epoch 7/8
2s - loss: 0.0417 - acc: 0.9887 - val_loss: 0.1889 - val_acc: 0.9580
Epoch 8/8
2s - loss: 0.0324 - acc: 0.9912 - val_loss: 0.2261 - val_acc: 0.9640
Out[56]:
<keras.callbacks.History at 0x7f0cda9aedd0>

Using the leakage did not improve the model, other than in the early epochs. This is most likely because which boat a picture came from can readily be identified from the image itself, so the size meta-data turned out not to add any additional information.

Bounding boxes & multi output

Import / view bounding boxes

A Kaggle user has created bounding box annotations for each fish in each training set image. You can download them from here. We will see if we can utilize this additional information. First, we'll load in the data, and keep just the largest bounding box for each image.


In [57]:
import ujson as json

In [58]:
anno_classes = ['alb', 'bet', 'dol', 'lag', 'other', 'shark', 'yft']

In [59]:
bb_json = {}
for c in anno_classes:
    j = json.load(open('{}annos/{}_labels.json'.format(path, c), 'r'))
    for l in j:
        if 'annotations' in l.keys() and len(l['annotations'])>0:
            bb_json[l['filename'].split('/')[-1]] = sorted(
                l['annotations'], key=lambda x: x['height']*x['width'])[-1]

In [60]:
bb_json['img_04908.jpg']


Out[60]:
{u'class': u'rect',
 u'height': 246.75000000000074,
 u'width': 432.8700000000013,
 u'x': 465.3000000000014,
 u'y': 496.32000000000147}

In [61]:
file2idx = {o:i for i,o in enumerate(raw_filenames)}
val_file2idx = {o:i for i,o in enumerate(raw_val_filenames)}

For any images that have no annotations, we'll create an empty bounding box.


In [62]:
empty_bbox = {'height': 0., 'width': 0., 'x': 0., 'y': 0.}

In [63]:
for f in raw_filenames:
    if not f in bb_json.keys(): bb_json[f] = empty_bbox
for f in raw_val_filenames:
    if not f in bb_json.keys(): bb_json[f] = empty_bbox

Finally, we convert the dictionary into an array, and convert the coordinates to our resized 224x224 images.


In [64]:
bb_params = ['height', 'width', 'x', 'y']
def convert_bb(bb, size):
    bb = [bb[p] for p in bb_params]
    conv_x = (224. / size[0])
    conv_y = (224. / size[1])
    bb[0] = bb[0]*conv_y
    bb[1] = bb[1]*conv_x
    bb[2] = max(bb[2]*conv_x, 0)
    bb[3] = max(bb[3]*conv_y, 0)
    return bb

In [65]:
trn_bbox = np.stack([convert_bb(bb_json[f], s) for f,s in zip(raw_filenames, sizes)], 
                   ).astype(np.float32)
val_bbox = np.stack([convert_bb(bb_json[f], s) 
                   for f,s in zip(raw_val_filenames, raw_val_sizes)]).astype(np.float32)

Now we can check our work by drawing one of the annotations.


In [66]:
def create_rect(bb, color='red'):
    return plt.Rectangle((bb[2], bb[3]), bb[1], bb[0], color=color, fill=False, lw=3)

def show_bb(i):
    bb = val_bbox[i]
    plot(val[i])
    plt.gca().add_patch(create_rect(bb))

In [67]:
show_bb(0)


Create & train model

Since we're not allowed (by the Kaggle rules) to manually annotate the test set, we'll need to create a model that predicts the location of the bounding box in each image. To do so, we create a model with multiple outputs: it will predict both the type of fish (the 'class') and the 4 bounding box coordinates. We prefer this approach over predicting only the bounding box coordinates, since we hope that giving the model more context about what it's looking for will help it with both tasks.


In [68]:
p=0.6

In [69]:
inp = Input(conv_layers[-1].output_shape[1:])
x = MaxPooling2D()(inp)
x = BatchNormalization(axis=1)(x)
x = Dropout(p/4)(x)
x = Flatten()(x)
x = Dense(512, activation='relu')(x)
x = BatchNormalization()(x)
x = Dropout(p)(x)
x = Dense(512, activation='relu')(x)
x = BatchNormalization()(x)
x = Dropout(p/2)(x)
x_bb = Dense(4, name='bb')(x)
x_class = Dense(8, activation='softmax', name='class')(x)

Since we have multiple outputs, we need to provide them to the model constructor in an array, and we also need to say what loss function to use for each. We also weight the bounding box loss down by 1000x, since the scales of the cross-entropy loss and the MSE are very different.


In [70]:
model = Model([inp], [x_bb, x_class])
model.compile(Adam(lr=0.001), loss=['mse', 'categorical_crossentropy'], metrics=['accuracy'],
             loss_weights=[.001, 1.])

In [71]:
model.fit(conv_feat, [trn_bbox, trn_labels], batch_size=batch_size, nb_epoch=3, verbose=2,
             validation_data=(conv_val_feat, [val_bbox, val_labels]))


Train on 3277 samples, validate on 500 samples
Epoch 1/3
2s - loss: 6.2004 - bb_loss: 5027.3073 - class_loss: 1.1731 - bb_acc: 0.3949 - class_acc: 0.6570 - val_loss: 5.1280 - val_bb_loss: 4262.7622 - val_class_loss: 0.8652 - val_bb_acc: 0.5620 - val_class_acc: 0.8460
Epoch 2/3
2s - loss: 5.0543 - bb_loss: 4755.4110 - class_loss: 0.2989 - bb_acc: 0.4611 - class_acc: 0.9139 - val_loss: 4.3070 - val_bb_loss: 4025.0525 - val_class_loss: 0.2819 - val_bb_acc: 0.5280 - val_class_acc: 0.9320
Epoch 3/3
2s - loss: 4.4736 - bb_loss: 4297.4391 - class_loss: 0.1761 - bb_acc: 0.4983 - class_acc: 0.9512 - val_loss: 3.8345 - val_bb_loss: 3629.4188 - val_class_loss: 0.2051 - val_bb_acc: 0.5460 - val_class_acc: 0.9600
Out[71]:
<keras.callbacks.History at 0x7f0cd113e550>
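As a quick sanity check on the loss weighting (my own arithmetic, not part of the original output): the reported total loss is the weighted sum of the two per-output losses, e.g. for epoch 1 above:

# total loss = 0.001 * bb_loss + 1.0 * class_loss
print(0.001 * 5027.3073 + 1.0 * 1.1731)   # ~6.2004, matching the printed 'loss'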

In [72]:
model.optimizer.lr = 1e-5

In [73]:
model.fit(conv_feat, [trn_bbox, trn_labels], batch_size=batch_size, nb_epoch=10, verbose=2,
             validation_data=(conv_val_feat, [val_bbox, val_labels]))


Train on 3277 samples, validate on 500 samples
Epoch 1/10
2s - loss: 3.7778 - bb_loss: 3660.0914 - class_loss: 0.1177 - bb_acc: 0.5490 - class_acc: 0.9713 - val_loss: 3.3886 - val_bb_loss: 3154.2368 - val_class_loss: 0.2343 - val_bb_acc: 0.5700 - val_class_acc: 0.9500
Epoch 2/10
2s - loss: 2.9983 - bb_loss: 2931.0076 - class_loss: 0.0673 - bb_acc: 0.5957 - class_acc: 0.9820 - val_loss: 2.6216 - val_bb_loss: 2460.3486 - val_class_loss: 0.1612 - val_bb_acc: 0.6420 - val_class_acc: 0.9600
Epoch 3/10
2s - loss: 2.1767 - bb_loss: 2120.2324 - class_loss: 0.0565 - bb_acc: 0.6430 - class_acc: 0.9881 - val_loss: 1.9147 - val_bb_loss: 1764.5918 - val_class_loss: 0.1501 - val_bb_acc: 0.7080 - val_class_acc: 0.9640
Epoch 4/10
2s - loss: 1.3838 - bb_loss: 1343.7766 - class_loss: 0.0401 - bb_acc: 0.6930 - class_acc: 0.9884 - val_loss: 1.2519 - val_bb_loss: 1024.5410 - val_class_loss: 0.2274 - val_bb_acc: 0.7460 - val_class_acc: 0.9580
Epoch 5/10
2s - loss: 0.8441 - bb_loss: 790.5873 - class_loss: 0.0535 - bb_acc: 0.7394 - class_acc: 0.9835 - val_loss: 0.9005 - val_bb_loss: 653.3226 - val_class_loss: 0.2472 - val_bb_acc: 0.7980 - val_class_acc: 0.9560
Epoch 6/10
2s - loss: 0.5046 - bb_loss: 465.8909 - class_loss: 0.0387 - bb_acc: 0.7772 - class_acc: 0.9902 - val_loss: 0.6213 - val_bb_loss: 423.3583 - val_class_loss: 0.1980 - val_bb_acc: 0.8480 - val_class_acc: 0.9700
Epoch 7/10
2s - loss: 0.3455 - bb_loss: 316.7136 - class_loss: 0.0288 - bb_acc: 0.8062 - class_acc: 0.9924 - val_loss: 0.5411 - val_bb_loss: 346.5716 - val_class_loss: 0.1946 - val_bb_acc: 0.8560 - val_class_acc: 0.9680
Epoch 8/10
2s - loss: 0.2873 - bb_loss: 265.7755 - class_loss: 0.0216 - bb_acc: 0.8154 - class_acc: 0.9930 - val_loss: 0.5199 - val_bb_loss: 328.7005 - val_class_loss: 0.1912 - val_bb_acc: 0.8560 - val_class_acc: 0.9660
Epoch 9/10
2s - loss: 0.2693 - bb_loss: 251.4356 - class_loss: 0.0178 - bb_acc: 0.8068 - class_acc: 0.9948 - val_loss: 0.5444 - val_bb_loss: 311.7663 - val_class_loss: 0.2326 - val_bb_acc: 0.8500 - val_class_acc: 0.9620
Epoch 10/10
2s - loss: 0.2571 - bb_loss: 232.1723 - class_loss: 0.0249 - bb_acc: 0.7980 - class_acc: 0.9945 - val_loss: 0.5016 - val_bb_loss: 305.0378 - val_class_loss: 0.1965 - val_bb_acc: 0.8440 - val_class_acc: 0.9640
Out[73]:
<keras.callbacks.History at 0x7f0cd1ac49d0>

Excitingly, it turns out that the classification model is much improved by giving it this additional task. Let's see how well the bounding box predictions did by taking a look at the model's output.


In [74]:
pred = model.predict(conv_val_feat[0:10])

In [75]:
def show_bb_pred(i):
    bb = val_bbox[i]
    bb_pred = pred[0][i]
    plt.figure(figsize=(6,6))
    plot(val[i])
    ax=plt.gca()
    ax.add_patch(create_rect(bb_pred, 'yellow'))
    ax.add_patch(create_rect(bb))

The image shows that it can find fish that are tricky for us to see!


In [76]:
show_bb_pred(6)



In [77]:
model.evaluate(conv_val_feat, [val_bbox, val_labels])


500/500 [==============================] - 0s     
Out[77]:
[0.50158701372146608,
 305.03782080078128,
 0.19654918460361659,
 0.84399999952316285,
 0.96399999952316284]

In [78]:
model.save_weights(path+'models/bn_anno.h5')

In [79]:
model.load_weights(path+'models/bn_anno.h5')

Larger size

Set up data

Let's see if we get better results with larger images. We'll use 640x360, since it has the same aspect ratio as the most common size we saw earlier (1280x720), being exactly half of it in each dimension, without being too big.


In [80]:
trn = get_data(path+'train', (360,640))
val = get_data(path+'valid', (360,640))


Found 3277 images belonging to 8 classes.
Found 500 images belonging to 8 classes.

The image shows that things are much clearer at this size.


In [81]:
plot(trn[0])



In [82]:
test = get_data(path+'test', (360,640))


Found 1000 images belonging to 1 classes.

In [83]:
save_array(path+'results/trn_640.dat', trn)
save_array(path+'results/val_640.dat', val)

In [84]:
save_array(path+'results/test_640.dat', test)

In [85]:
# trn = load_array(path+'results/trn_640.dat')
# val = load_array(path+'results/val_640.dat')

We can now create our VGG model - we'll need to tell it we're not using the normal 224x224 images, which also means it won't include the fully connected layers (since they don't make sense for non-default sizes). We will also remove the last max pooling layer, since we don't want to throw away information yet.


In [86]:
vgg640 = Vgg16BN((360, 640)).model
vgg640.pop()
vgg640.input_shape, vgg640.output_shape
vgg640.compile(Adam(), 'categorical_crossentropy', metrics=['accuracy'])

We can now pre-compute the output of the convolutional part of VGG.


In [87]:
conv_val_feat = vgg640.predict(val, batch_size=32, verbose=2)
conv_trn_feat = vgg640.predict(trn, batch_size=32, verbose=2)

In [88]:
save_array(path+'results/conv_val_640.dat', conv_val_feat)
save_array(path+'results/conv_trn_640.dat', conv_trn_feat)

In [89]:
conv_test_feat = vgg640.predict(test, batch_size=32, verbose=2)

In [90]:
save_array(path+'results/conv_test_640.dat', conv_test_feat)

In [91]:
# conv_val_feat = load_array(path+'results/conv_val_640.dat')
# conv_trn_feat = load_array(path+'results/conv_trn_640.dat')

In [92]:
# conv_test_feat = load_array(path+'results/conv_test_640.dat')

Fully convolutional net (FCN)

Since we're using a larger input, the output of the final convolutional layer is also larger. So we probably don't want to put a dense layer there - that would be a lot of parameters! Instead, let's use a fully convolutional net (FCN); these have the added benefit of tending to generalize well, and they also seem like a good fit for our problem (since the fish occupy only a small part of the image).


In [93]:
conv_layers,_ = split_at(vgg640, Convolution2D)

I'm not using any dropout, since I found I got better results without it.


In [94]:
nf=128; p=0.

In [95]:
def get_lrg_layers():
    return [
        BatchNormalization(axis=1, input_shape=conv_layers[-1].output_shape[1:]),
        Convolution2D(nf,3,3, activation='relu', border_mode='same'),
        BatchNormalization(axis=1),
        MaxPooling2D(),
        Convolution2D(nf,3,3, activation='relu', border_mode='same'),
        BatchNormalization(axis=1),
        MaxPooling2D(),
        Convolution2D(nf,3,3, activation='relu', border_mode='same'),
        BatchNormalization(axis=1),
        MaxPooling2D((1,2)),
        Convolution2D(8,3,3, border_mode='same'),
        Dropout(p),
        GlobalAveragePooling2D(),
        Activation('softmax')
    ]

In [96]:
lrg_model = Sequential(get_lrg_layers())

In [97]:
lrg_model.summary()


____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
batchnormalization_13 (BatchNorm (None, 512, 22, 40)   2048        batchnormalization_input_1[0][0] 
____________________________________________________________________________________________________
convolution2d_27 (Convolution2D) (None, 128, 22, 40)   589952      batchnormalization_13[0][0]      
____________________________________________________________________________________________________
batchnormalization_14 (BatchNorm (None, 128, 22, 40)   512         convolution2d_27[0][0]           
____________________________________________________________________________________________________
maxpooling2d_14 (MaxPooling2D)   (None, 128, 11, 20)   0           batchnormalization_14[0][0]      
____________________________________________________________________________________________________
convolution2d_28 (Convolution2D) (None, 128, 11, 20)   147584      maxpooling2d_14[0][0]            
____________________________________________________________________________________________________
batchnormalization_15 (BatchNorm (None, 128, 11, 20)   512         convolution2d_28[0][0]           
____________________________________________________________________________________________________
maxpooling2d_15 (MaxPooling2D)   (None, 128, 5, 10)    0           batchnormalization_15[0][0]      
____________________________________________________________________________________________________
convolution2d_29 (Convolution2D) (None, 128, 5, 10)    147584      maxpooling2d_15[0][0]            
____________________________________________________________________________________________________
batchnormalization_16 (BatchNorm (None, 128, 5, 10)    512         convolution2d_29[0][0]           
____________________________________________________________________________________________________
maxpooling2d_16 (MaxPooling2D)   (None, 128, 5, 5)     0           batchnormalization_16[0][0]      
____________________________________________________________________________________________________
convolution2d_30 (Convolution2D) (None, 8, 5, 5)       9224        maxpooling2d_16[0][0]            
____________________________________________________________________________________________________
dropout_12 (Dropout)             (None, 8, 5, 5)       0           convolution2d_30[0][0]           
____________________________________________________________________________________________________
globalaveragepooling2d_1 (Global (None, 8)             0           dropout_12[0][0]                 
____________________________________________________________________________________________________
activation_1 (Activation)        (None, 8)             0           globalaveragepooling2d_1[0][0]   
====================================================================================================
Total params: 897,928
Trainable params: 896,136
Non-trainable params: 1,792
____________________________________________________________________________________________________

In [98]:
lrg_model.compile(Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

In [99]:
lrg_model.fit(conv_trn_feat, trn_labels, batch_size=batch_size, nb_epoch=2, verbose = 2,
             validation_data=(conv_val_feat, val_labels))


Train on 3277 samples, validate on 500 samples
Epoch 1/2
12s - loss: 0.6384 - acc: 0.7931 - val_loss: 1.8815 - val_acc: 0.6420
Epoch 2/2
12s - loss: 0.1432 - acc: 0.9619 - val_loss: 0.9283 - val_acc: 0.7560
Out[99]:
<keras.callbacks.History at 0x7f0e21e5da50>

In [100]:
lrg_model.optimizer.lr=1e-5

In [101]:
lrg_model.fit(conv_trn_feat, trn_labels, batch_size=batch_size, nb_epoch=6, verbose=2,
             validation_data=(conv_val_feat, val_labels))


Train on 3277 samples, validate on 500 samples
Epoch 1/6
12s - loss: 0.0353 - acc: 0.9896 - val_loss: 0.3336 - val_acc: 0.9000
Epoch 2/6
12s - loss: 0.0236 - acc: 0.9948 - val_loss: 0.2499 - val_acc: 0.9460
Epoch 3/6
12s - loss: 0.0238 - acc: 0.9936 - val_loss: 0.3148 - val_acc: 0.9120
Epoch 4/6
12s - loss: 0.0231 - acc: 0.9942 - val_loss: 0.2819 - val_acc: 0.9260
Epoch 5/6
12s - loss: 0.0140 - acc: 0.9969 - val_loss: 0.2966 - val_acc: 0.9320
Epoch 6/6
12s - loss: 0.0119 - acc: 0.9979 - val_loss: 0.2419 - val_acc: 0.9400
Out[101]:
<keras.callbacks.History at 0x7f0e26387950>

When I submitted the results of this model to Kaggle, I got the best single-model results of any shown here (ranked 22nd on the leaderboard as of Dec-6-2016).


In [102]:
lrg_model.save_weights(path+'models/lrg_nmp.h5')

In [103]:
# lrg_model.load_weights(path+'models/lrg_nmp.h5')

In [104]:
lrg_model.evaluate(conv_val_feat, val_labels)


480/500 [===========================>..] - ETA: 0s
Out[104]:
[0.24193315172195434, 0.93999999952316282]

Another benefit of this kind of model is that the last convolutional layer has to learn to classify each part of the image (since there's only an average pooling layer after). Let's create a function that grabs the output of this layer (which is the 4th-last layer of our model).


In [105]:
l = lrg_model.layers
conv_fn = K.function([l[0].input, K.learning_phase()], l[-4].output)

In [106]:
def get_cm(inp, label):
    conv = conv_fn([inp,0])[0, label]
    return scipy.misc.imresize(conv, (360,640), interp='nearest')

We have to add an extra dimension to our input since the CNN expects a 'batch' (even if it's just a batch of one).


In [107]:
inp = np.expand_dims(conv_val_feat[0], 0)
np.round(lrg_model.predict(inp)[0],2)


Out[107]:
array([ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.], dtype=float32)

In [108]:
plt.imshow(to_plot(val[0]))


Out[108]:
<matplotlib.image.AxesImage at 0x7f0e1e092f10>

In [109]:
cm = get_cm(inp, 0)

The heatmap shows that (at very low resolution) the model is finding the fish!


In [110]:
plt.imshow(cm, cmap="cool")


Out[110]:
<matplotlib.image.AxesImage at 0x7f0e1e014dd0>

All convolutional net heatmap

To create a higher resolution heatmap, we'll remove all the max pooling layers, and repeat the previous steps.


In [111]:
def get_lrg_layers():
    return [
        BatchNormalization(axis=1, input_shape=conv_layers[-1].output_shape[1:]),
        Convolution2D(nf,3,3, activation='relu', border_mode='same'),
        BatchNormalization(axis=1),
        Convolution2D(nf,3,3, activation='relu', border_mode='same'),
        BatchNormalization(axis=1),
        Convolution2D(nf,3,3, activation='relu', border_mode='same'),
        BatchNormalization(axis=1),
        Convolution2D(8,3,3, border_mode='same'),
        GlobalAveragePooling2D(),
        Activation('softmax')
    ]

In [112]:
lrg_model = Sequential(get_lrg_layers())

In [113]:
lrg_model.summary()


____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
batchnormalization_17 (BatchNorm (None, 512, 22, 40)   2048        batchnormalization_input_2[0][0] 
____________________________________________________________________________________________________
convolution2d_31 (Convolution2D) (None, 128, 22, 40)   589952      batchnormalization_17[0][0]      
____________________________________________________________________________________________________
batchnormalization_18 (BatchNorm (None, 128, 22, 40)   512         convolution2d_31[0][0]           
____________________________________________________________________________________________________
convolution2d_32 (Convolution2D) (None, 128, 22, 40)   147584      batchnormalization_18[0][0]      
____________________________________________________________________________________________________
batchnormalization_19 (BatchNorm (None, 128, 22, 40)   512         convolution2d_32[0][0]           
____________________________________________________________________________________________________
convolution2d_33 (Convolution2D) (None, 128, 22, 40)   147584      batchnormalization_19[0][0]      
____________________________________________________________________________________________________
batchnormalization_20 (BatchNorm (None, 128, 22, 40)   512         convolution2d_33[0][0]           
____________________________________________________________________________________________________
convolution2d_34 (Convolution2D) (None, 8, 22, 40)     9224        batchnormalization_20[0][0]      
____________________________________________________________________________________________________
globalaveragepooling2d_2 (Global (None, 8)             0           convolution2d_34[0][0]           
____________________________________________________________________________________________________
activation_2 (Activation)        (None, 8)             0           globalaveragepooling2d_2[0][0]   
====================================================================================================
Total params: 897,928
Trainable params: 896,136
Non-trainable params: 1,792
____________________________________________________________________________________________________

In [114]:
lrg_model.compile(Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

In [115]:
lrg_model.fit(conv_trn_feat, trn_labels, batch_size=batch_size, nb_epoch=2, verbose=2,
             validation_data=(conv_val_feat, val_labels))


Train on 3277 samples, validate on 500 samples
Epoch 1/2
16s - loss: 0.9477 - acc: 0.7104 - val_loss: 2.1907 - val_acc: 0.6260
Epoch 2/2
16s - loss: 0.2583 - acc: 0.9310 - val_loss: 1.0893 - val_acc: 0.7980
Out[115]:
<keras.callbacks.History at 0x7f0e1b790c50>

In [116]:
lrg_model.optimizer.lr=1e-5

In [117]:
lrg_model.fit(conv_trn_feat, trn_labels, batch_size=batch_size, nb_epoch=6, verbose=2,
             validation_data=(conv_val_feat, val_labels))


Train on 3277 samples, validate on 500 samples
Epoch 1/6
16s - loss: 0.1140 - acc: 0.9701 - val_loss: 0.3595 - val_acc: 0.9160
Epoch 2/6
16s - loss: 0.0910 - acc: 0.9713 - val_loss: 0.3457 - val_acc: 0.9340
Epoch 3/6
16s - loss: 0.0388 - acc: 0.9912 - val_loss: 0.2329 - val_acc: 0.9560
Epoch 4/6
16s - loss: 0.0369 - acc: 0.9896 - val_loss: 0.2382 - val_acc: 0.9500
Epoch 5/6
16s - loss: 0.0119 - acc: 0.9982 - val_loss: 0.2295 - val_acc: 0.9540
Epoch 6/6
16s - loss: 0.0094 - acc: 0.9985 - val_loss: 0.2460 - val_acc: 0.9480
Out[117]:
<keras.callbacks.History at 0x7f0e1b78e0d0>

In [118]:
lrg_model.save_weights(path+'models/lrg_0mp.h5')

In [119]:
# lrg_model.load_weights(path+'models/lrg_0mp.h5')

Create heatmap


In [120]:
l = lrg_model.layers
conv_fn = K.function([l[0].input, K.learning_phase()], l[-3].output)

In [121]:
def get_cm2(inp, label):
    conv = conv_fn([inp,0])[0, label]
    return scipy.misc.imresize(conv, (360,640))

In [122]:
inp = np.expand_dims(conv_val_feat[0], 0)

In [123]:
plt.imshow(to_plot(val[0]))


Out[123]:
<matplotlib.image.AxesImage at 0x7f0e1ab0e210>

In [124]:
cm = get_cm2(inp, 0)

In [125]:
cm = get_cm2(inp, 4)

In [126]:
plt.imshow(cm, cmap="cool")


Out[126]:
<matplotlib.image.AxesImage at 0x7f0e1aa70a90>

In [127]:
plt.figure(figsize=(10,10))
plot(val[0])
plt.imshow(cm, cmap="cool", alpha=0.5)


Out[127]:
<matplotlib.image.AxesImage at 0x7f0e1ab0e6d0>

Inception mini-net

Here's an example of how to create and use "inception blocks" - as you see, they use multiple different convolution filter sizes and concatenate the results together. We'll talk more about these next year.


In [128]:
def conv2d_bn(x, nb_filter, nb_row, nb_col, subsample=(1, 1)):
    x = Convolution2D(nb_filter, nb_row, nb_col,
                      subsample=subsample, activation='relu', border_mode='same')(x)
    return BatchNormalization(axis=1)(x)

In [129]:
def incep_block(x):
    branch1x1 = conv2d_bn(x, 32, 1, 1, subsample=(2, 2))
    branch5x5 = conv2d_bn(x, 24, 1, 1)
    branch5x5 = conv2d_bn(branch5x5, 32, 5, 5, subsample=(2, 2))

    branch3x3dbl = conv2d_bn(x, 32, 1, 1)
    branch3x3dbl = conv2d_bn(branch3x3dbl, 48, 3, 3)
    branch3x3dbl = conv2d_bn(branch3x3dbl, 48, 3, 3, subsample=(2, 2))

    branch_pool = AveragePooling2D(
        (3, 3), strides=(2, 2), border_mode='same')(x)
    branch_pool = conv2d_bn(branch_pool, 16, 1, 1)
    return merge([branch1x1, branch5x5, branch3x3dbl, branch_pool],
              mode='concat', concat_axis=1)

In [130]:
inp = Input(vgg640.layers[-1].output_shape[1:]) 
x = BatchNormalization(axis=1)(inp)
x = incep_block(x)
x = incep_block(x)
x = incep_block(x)
x = Dropout(0.75)(x)
x = Convolution2D(8,3,3, border_mode='same')(x)
x = GlobalAveragePooling2D()(x)
outp = Activation('softmax')(x)

In [131]:
lrg_model = Model([inp], outp)

In [132]:
lrg_model.compile(Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

In [133]:
lrg_model.fit(conv_trn_feat, trn_labels, batch_size=batch_size, nb_epoch=2, verbose=2,
             validation_data=(conv_val_feat, val_labels))


Train on 3277 samples, validate on 500 samples
Epoch 1/2
12s - loss: 1.3259 - acc: 0.5478 - val_loss: 1.5660 - val_acc: 0.6220
Epoch 2/2
12s - loss: 0.4475 - acc: 0.8712 - val_loss: 1.0315 - val_acc: 0.8200
Out[133]:
<keras.callbacks.History at 0x7f0e158d7990>

In [134]:
lrg_model.optimizer.lr=1e-5

In [135]:
lrg_model.fit(conv_trn_feat, trn_labels, batch_size=batch_size, nb_epoch=6, verbose=2,
             validation_data=(conv_val_feat, val_labels))


Train on 3277 samples, validate on 500 samples
Epoch 1/6
12s - loss: 0.1625 - acc: 0.9600 - val_loss: 0.5040 - val_acc: 0.8960
Epoch 2/6
12s - loss: 0.0683 - acc: 0.9844 - val_loss: 0.3150 - val_acc: 0.9220
Epoch 3/6
12s - loss: 0.0438 - acc: 0.9899 - val_loss: 0.2614 - val_acc: 0.9340
Epoch 4/6
12s - loss: 0.0248 - acc: 0.9948 - val_loss: 0.2517 - val_acc: 0.9400
Epoch 5/6
12s - loss: 0.0147 - acc: 0.9982 - val_loss: 0.2623 - val_acc: 0.9440
Epoch 6/6
12s - loss: 0.0064 - acc: 1.0000 - val_loss: 0.2409 - val_acc: 0.9420
Out[135]:
<keras.callbacks.History at 0x7f0e1573d2d0>

In [136]:
lrg_model.fit(conv_trn_feat, trn_labels, batch_size=batch_size, nb_epoch=10, verbose=2,
             validation_data=(conv_val_feat, val_labels))


Train on 3277 samples, validate on 500 samples
Epoch 1/10
12s - loss: 0.0041 - acc: 0.9997 - val_loss: 0.3027 - val_acc: 0.9380
Epoch 2/10
12s - loss: 0.0060 - acc: 0.9991 - val_loss: 0.2148 - val_acc: 0.9500
Epoch 3/10
12s - loss: 0.0027 - acc: 1.0000 - val_loss: 0.2181 - val_acc: 0.9500
Epoch 4/10
12s - loss: 0.0021 - acc: 1.0000 - val_loss: 0.2199 - val_acc: 0.9540
Epoch 5/10
12s - loss: 0.0016 - acc: 1.0000 - val_loss: 0.2265 - val_acc: 0.9540
Epoch 6/10
12s - loss: 0.0032 - acc: 0.9994 - val_loss: 0.2566 - val_acc: 0.9460
Epoch 7/10
12s - loss: 0.0022 - acc: 1.0000 - val_loss: 0.2514 - val_acc: 0.9500
Epoch 8/10
12s - loss: 0.0018 - acc: 1.0000 - val_loss: 0.2313 - val_acc: 0.9560
Epoch 9/10
12s - loss: 0.0247 - acc: 0.9948 - val_loss: 0.4098 - val_acc: 0.9200
Epoch 10/10
12s - loss: 0.1216 - acc: 0.9600 - val_loss: 0.5418 - val_acc: 0.8840
Out[136]:
<keras.callbacks.History at 0x7f0e1a626910>

In [137]:
lrg_model.save_weights(path+'models/lrg_nmp.h5')

In [138]:
# lrg_model.load_weights(path+'models/lrg_nmp.h5')

Pseudo-labeling


In [158]:
preds = model.predict(conv_test_feat)

In [159]:
gen = image.ImageDataGenerator()

In [160]:
test_batches = gen.flow(conv_test_feat, preds, batch_size=16)


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-160-ba8f45991d06> in <module>()
----> 1 test_batches = gen.flow(conv_test_feat, preds, batch_size=16)

/home/ubuntu/anaconda2/lib/python2.7/site-packages/keras/preprocessing/image.pyc in flow(self, X, y, batch_size, shuffle, seed, save_to_dir, save_prefix, save_format)
    425             save_to_dir=save_to_dir,
    426             save_prefix=save_prefix,
--> 427             save_format=save_format)
    428 
    429     def flow_from_directory(self, directory,

/home/ubuntu/anaconda2/lib/python2.7/site-packages/keras/preprocessing/image.pyc in __init__(self, x, y, image_data_generator, batch_size, shuffle, seed, dim_ordering, save_to_dir, save_prefix, save_format)
    673                              'should have the same length. '
    674                              'Found: X.shape = %s, y.shape = %s' %
--> 675                              (np.asarray(x).shape, np.asarray(y).shape))
    676         if dim_ordering == 'default':
    677             dim_ordering = K.image_dim_ordering()

/home/ubuntu/anaconda2/lib/python2.7/site-packages/numpy/core/numeric.pyc in asarray(a, dtype, order)
    480 
    481     """
--> 482     return array(a, dtype, copy=False, order=order)
    483 
    484 def asanyarray(a, dtype=None, order=None):

ValueError: could not broadcast input array from shape (1000,4) into shape (1000)

In [161]:
val_batches = gen.flow(conv_val_feat, val_labels, batch_size=4)


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-161-f53994576015> in <module>()
----> 1 val_batches = gen.flow(conv_val_feat, val_labels, batch_size=4)

/home/ubuntu/anaconda2/lib/python2.7/site-packages/keras/preprocessing/image.pyc in flow(self, X, y, batch_size, shuffle, seed, save_to_dir, save_prefix, save_format)
    425             save_to_dir=save_to_dir,
    426             save_prefix=save_prefix,
--> 427             save_format=save_format)
    428 
    429     def flow_from_directory(self, directory,

/home/ubuntu/anaconda2/lib/python2.7/site-packages/keras/preprocessing/image.pyc in __init__(self, x, y, image_data_generator, batch_size, shuffle, seed, dim_ordering, save_to_dir, save_prefix, save_format)
    688                              'either 1, 3 or 4 channels on axis ' + str(channels_axis) + '. '
    689                              'However, it was passed an array with shape ' + str(self.x.shape) +
--> 690                              ' (' + str(self.x.shape[channels_axis]) + ' channels).')
    691         if y is not None:
    692             self.y = np.asarray(y)

ValueError: NumpyArrayIterator is set to use the dimension ordering convention "th" (channels on axis 1), i.e. expected either 1, 3 or 4 channels on axis 1. However, it was passed an array with shape (500, 512, 14, 14) (512 channels).

In [ ]:
batches = gen.flow(conv_feat, trn_labels, batch_size=44)

In [ ]:
mi = MixIterator([batches, test_batches, val_batches])

In [ ]:
bn_model.fit_generator(mi, mi.N, nb_epoch=8, validation_data=(conv_val_feat, val_labels), verbose=2)
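The two errors above occur because ImageDataGenerator.flow expects image-shaped arrays (1, 3 or 4 channels on the channel axis), not 512-channel convolutional features, and because model.predict on the multi-output model returns a [bbox, class] pair rather than a single label array. In the original plan, MixIterator (from utils.py) would interleave batches from the three iterators. A minimal sketch of the same pseudo-labeling idea without the generators (my own workaround, not the notebook's code):

# Append the soft test-set predictions to the training set and fit on the
# precomputed conv features directly.
pseudo_lbls = model.predict(conv_test_feat)[1]   # [1] = class probabilities from the multi-output model
comb_feat = np.concatenate([conv_feat, conv_test_feat])
comb_lbls = np.concatenate([trn_labels, pseudo_lbls])
bn_model.fit(comb_feat, comb_lbls, batch_size=batch_size, nb_epoch=8, verbose=2,
             validation_data=(conv_val_feat, val_labels))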

Submit


In [821]:
def do_clip(arr, mx): return np.clip(arr, (1-mx)/7, mx)
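As a quick check on the clipping bounds (my own arithmetic): with mx=0.82, the floor that do_clip applies is (1-0.82)/7:

print((1 - 0.82) / 7)   # 0.025714..., the value that fills the submission preview below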

In [829]:
lrg_model.evaluate(conv_val_feat, val_labels, batch_size*2)


500/500 [==============================] - 0s     
Out[829]:
[0.11417267167568207, 0.97199999332427978]

In [851]:
preds = model.predict(conv_test_feat, batch_size=batch_size)

In [852]:
preds = preds[1]

In [25]:
test = load_array(path+'results/test_640.dat')

In [5]:
test = load_array(path+'results/test.dat')

In [26]:
preds = conv_model.predict(test, batch_size=32)

In [853]:
subm = do_clip(preds,0.82)

In [854]:
subm_name = path+'results/subm_bb.gz'

In [855]:
# classes = sorted(batches.class_indices, key=batches.class_indices.get)
classes = ['ALB', 'BET', 'DOL', 'LAG', 'NoF', 'OTHER', 'SHARK', 'YFT']

In [856]:
submission = pd.DataFrame(subm, columns=classes)
submission.insert(0, 'image', raw_test_filenames)
submission.head()


Out[856]:
image ALB BET DOL LAG NoF OTHER SHARK YFT
0 img_00005.jpg 0.025714 0.025714 0.025714 0.025714 0.820000 0.025714 0.025714 0.025714
1 img_00007.jpg 0.820000 0.025714 0.025714 0.025714 0.025714 0.025714 0.025714 0.025714
2 img_00009.jpg 0.820000 0.025714 0.025714 0.025714 0.025714 0.025714 0.025714 0.025714
3 img_00018.jpg 0.457916 0.025714 0.025714 0.025714 0.025714 0.539635 0.025714 0.025714
4 img_00027.jpg 0.820000 0.025714 0.025714 0.025714 0.025714 0.025714 0.025714 0.102664

In [857]:
submission.to_csv(subm_name, index=False, compression='gzip')

In [858]:
FileLink(subm_name)





In [ ]: