Welcome to the first week of the first deep learning certificate! We're going to use convolutional neural networks (CNNs) to allow our computer to see - something that is only possible thanks to deep learning.
We're going to try to create a model to enter the Dogs vs Cats competition at Kaggle. There are 25,000 labelled dog and cat photos available for training, and 12,500 in the test set that we have to try to label for this competition. According to the Kaggle web-site, when this competition was launched (end of 2013): "State of the art: The current literature suggests machine classifiers can score above 80% accuracy on this task". So if we can beat 80%, then we will be at the cutting edge as of 2013!
There isn't too much to do to get started - just a few simple configuration steps.
This shows plots in the web page itself - we always want to use this when working in a Jupyter notebook:
In [1]:
%matplotlib inline
Define path to data: (It's a good idea to put it in a subdirectory of your notebooks folder, and then exclude that directory from git control by adding it to .gitignore.)
In [2]:
path = "/input/sample/"
#path = "data/dogscats/sample/"
A few basic libraries that we'll need for the initial exercises:
In [3]:
from __future__ import division,print_function
import os, json
from glob import glob
import numpy as np
np.set_printoptions(precision=4, linewidth=100)
from matplotlib import pyplot as plt
We have created a file most imaginatively called 'utils.py' to store any little convenience functions we'll want to use. We will discuss these as we use them.
In [4]:
import utils; reload(utils)
from utils import plots, save_array
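If you're curious what these helpers look like, plots is just a thin convenience wrapper around matplotlib for showing a row (or grid) of images. The sketch below is only an illustration of the idea - the real version in utils.py has a few more options:
In [ ]:
# Rough sketch of a plots()-style helper - illustrative only; use the one in utils.py
from matplotlib import pyplot as plt

def plots_sketch(ims, figsize=(12, 6), rows=1, titles=None):
    fig = plt.figure(figsize=figsize)
    cols = (len(ims) + rows - 1) // rows   # enough columns to fit every image
    for i, im in enumerate(ims):
        ax = fig.add_subplot(rows, cols, i + 1)
        ax.axis('off')
        if titles is not None:
            ax.set_title(titles[i], fontsize=16)
        ax.imshow(im)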
Our first step is simply to use a model that has been fully created for us, which can recognise a wide variety (1,000 categories) of images. We will use 'VGG', which won the 2014 Imagenet competition, and is a very simple model to create and understand. The VGG Imagenet team created both a larger, slower, slightly more accurate model (VGG 19) and a smaller, faster model (VGG 16). We will be using VGG 16, since VGG 19's much slower performance is generally not worth its very minor improvement in accuracy.
We have created a python class, Vgg16, which makes using the VGG 16 model very straightforward.
In [5]:
# As large as you can, but no larger than 64 is recommended.
# If you have an older or cheaper GPU, you'll run out of memory, so will have to decrease this.
batch_size=64
num_epochs=2
In [6]:
# Import our class, and instantiate
import vgg16; reload(vgg16)
from vgg16 import Vgg16
In [7]:
vgg = Vgg16()
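Behind the scenes, Vgg16 builds the 16-layer VGG network in Keras and loads its pre-trained Imagenet weights (which is why instantiating it can take a little while the first time). The real code lives in vgg16.py; the sketch below is only meant to show the shape of the architecture, using the Keras 1 API and the Theano-style channels-first image ordering this notebook assumes:
In [ ]:
# Illustrative sketch of the VGG 16 architecture - the real model (plus image
# preprocessing and weight loading) is built inside the Vgg16 class in vgg16.py.
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, ZeroPadding2D, Flatten, Dense, Dropout

def conv_block(model, layers, filters):
    # Each VGG block is a stack of 3x3 convolutions followed by max pooling
    for _ in range(layers):
        model.add(ZeroPadding2D((1, 1)))
        model.add(Convolution2D(filters, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))

def vgg16_sketch():
    model = Sequential()
    # 224x224 RGB input, channels first (Theano dim ordering)
    model.add(ZeroPadding2D((1, 1), input_shape=(3, 224, 224)))
    model.add(Convolution2D(64, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(64, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))
    conv_block(model, 2, 128)
    conv_block(model, 3, 256)
    conv_block(model, 3, 512)
    conv_block(model, 3, 512)
    model.add(Flatten())
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1000, activation='softmax'))   # one output per Imagenet category
    return model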
In [12]:
# Grab a few images at a time for training and validation.
# NB: They must be in subdirectories named based on their category
batches = vgg.get_batches(path+'train', batch_size=batch_size)
print('batches.nb_sample=%d' % (batches.nb_sample,))
val_batches = vgg.get_batches(path+'valid', batch_size=batch_size*2)

# Finetune once, up front: this swaps the final 1,000-way Imagenet layer for a
# new one matching our categories. (Calling finetune inside the loop would reset
# that new layer every epoch, throwing away the previous epoch's training.)
vgg.finetune(batches)

weights_file = None
for epoch in range(num_epochs):
    print("Running epoch: %d" % epoch)
    vgg.fit(batches, val_batches, nb_epoch=1)
    # Save the weights after every epoch so the latest ones can be reloaded later
    weights_file = '/output/ft%d.h5' % epoch
    vgg.model.save_weights(weights_file)
print("Completed %d fit operations" % num_epochs)
The code above will work for any image recognition task, with any number of categories! All you have to do is put your images into one folder per category and run it.
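As a quick sanity check before pointing the code at your own data, you can list the category subdirectories that will become the labels; for the dogs-vs-cats sample these should just be cats and dogs:
In [ ]:
# The subdirectory names under train/ and valid/ are used as the category labels
print(sorted(os.listdir(path+'train')))
print(sorted(os.listdir(path+'valid')))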
Let's take a look at how this works, step by step...
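First, a rough idea of what the two key calls above are doing. get_batches is essentially a thin wrapper around Keras's ImageDataGenerator.flow_from_directory, which yields batches of images labelled by their subdirectory name; finetune removes the final 1,000-way Imagenet layer, freezes the remaining layers, and adds a fresh softmax layer with one output per category. The sketch below is simplified - the real implementations are in vgg16.py:
In [ ]:
# Simplified sketches of get_batches() and finetune() - see vgg16.py for the real code
from keras.preprocessing import image
from keras.layers import Dense

def get_batches_sketch(dirname, batch_size=64, shuffle=True):
    # Yield batches of 224x224 images, labelled by their subdirectory name
    gen = image.ImageDataGenerator()
    return gen.flow_from_directory(dirname, target_size=(224, 224),
                                   class_mode='categorical',
                                   shuffle=shuffle, batch_size=batch_size)

def finetune_sketch(model, batches):
    # Drop the 1,000-way Imagenet classifier, freeze everything that remains,
    # and add a new softmax layer with one output per category in our data
    model.pop()
    for layer in model.layers:
        layer.trainable = False
    model.add(Dense(batches.nb_class, activation='softmax'))
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])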
In [72]:
batches, preds = vgg.test(path+'test', batch_size = batch_size*2)
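vgg.test returns the test-set batches along with an array of predicted probabilities, one row per image and one column per category; for this data column 0 is the cat probability and column 1 is the dog probability, since flow_from_directory orders the training classes alphabetically. A quick look at what came back:
In [ ]:
# Sanity check: one row per test image, one column per category (cat, dog)
print(preds.shape)
print(preds[:3])
print(batches.filenames[:3])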
In [ ]:
#Save our test results arrays so we can use them again later
filenames = batches.filenames
save_array('/output/test_preds.dat', preds)
save_array('/output/filenames.dat', filenames)
In [73]:
#Grab the dog prediction column
isdog = preds[:,1]
print "Raw Predictions: " + str(isdog[:5])
print "Mid Predictions: " + str(isdog[(isdog < .6) & (isdog > .4)])
print "Edge Predictions: " + str(isdog[(isdog == 1) | (isdog == 0)])
Log loss doesn't cope with probability values of exactly 0 or 1 (and we have many of them): it takes the log of the predicted probability, so a confident prediction that turns out to be wrong gives log(0), which blows up. Fortunately, Kaggle helps us by offsetting our 0s and 1s by a very small value. So if we upload our submission now we will have lots of .99999999 and .000000001 values. This seems good, right?
Not so. There is an additional twist due to how log loss is calculated: log loss rewards predictions that are confident and correct (p=.9999, label=1), but it punishes predictions that are confident and wrong (p=.0001, label=1) far more heavily. See visualization below.
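To make that concrete, here is a tiny worked example for a single image whose true label is 1 (dog), using a small clipping offset like the one Kaggle applies. Notice how slowly the loss grows for confident, correct predictions, and how quickly it explodes for confident, wrong ones:
In [ ]:
# Per-image log loss, -log(p), for a true label of 1 at various predicted probabilities
eps = 1e-15
for p in [0.9999, 0.95, 0.5, 0.05, 0.0001]:
    p_clipped = np.clip(p, eps, 1 - eps)
    print('p=%.4f  log loss=%.4f' % (p, -np.log(p_clipped)))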
In [ ]:
#So to play it safe, we use a sneaky trick to pull our edge predictions away from 0 and 1
#Clip every prediction into the range [0.05, 0.95]
isdog = isdog.clip(min=0.05, max=0.95)
In [ ]:
#Extract imageIds from the filenames in our test/unknown directory
filenames = batches.filenames
#Filenames look like 'unknown/1234.jpg', so skip the 8-character 'unknown/' prefix
ids = np.array([int(f[8:f.find('.')]) for f in filenames])
Here we join the two columns into an array of [imageId, isDog]:
In [ ]:
subm = np.stack([ids,isdog], axis=1)
subm[:5]
In [ ]:
np.savetxt('/output/kaggle_submission.csv', subm, fmt='%d,%.5f', header='id,label', comments='')
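Before submitting, it's worth reading back the first few lines of the file we just wrote to confirm the header and formatting look right:
In [ ]:
# Peek at the head of the submission file we just saved
with open('/output/kaggle_submission.csv') as f:
    for line in f.readlines()[:5]:
        print(line.strip())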
Now, submit to Kaggle.