```
In [15]:
```# Rather than importing everything manually, we'll make things easy
# and load them all in utils.py, and just import them from there.
%matplotlib inline
import utils; reload(utils)
from utils import *

We need to find a way to convert the imagenet predictions to a probability of being a cat or a dog, since that is what the Kaggle competition requires us to submit. We could use the imagenet hierarchy to download a list of all the imagenet categories in each of the dog and cat groups, and could then solve our problem in various ways, such as:

- Finding the largest probability that's either a cat or a dog, and using that label
- Averaging the probability of all the cat categories and comparing it to the average of all the dog categories.
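
As a rough illustration of the two manual approaches above, here is a minimal sketch on made-up data. The index lists `cat_idx` and `dog_idx` are hypothetical placeholders, not the real imagenet category indices, and the random `preds` array merely stands in for the model's 1,000-way output.

```python
import numpy as np

rng = np.random.RandomState(0)

# Stand-in for imagenet predictions: 5 images x 1,000 category probabilities.
preds = rng.dirichlet(np.ones(1000), size=5)

# Hypothetical index lists; the real ones would come from the imagenet hierarchy.
cat_idx = np.arange(10, 20)
dog_idx = np.arange(100, 220)

# Approach 1: take the label of the single most probable cat-or-dog category.
is_dog_max = preds[:, dog_idx].max(axis=1) > preds[:, cat_idx].max(axis=1)

# Approach 2: compare the average cat probability with the average dog probability.
is_dog_mean = preds[:, dog_idx].mean(axis=1) > preds[:, cat_idx].mean(axis=1)

print(is_dog_max, is_dog_mean)
```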

But these approaches have some downsides:

- They require manual coding for something that we should be able to learn from the data
- They ignore information available in the predictions; for instance, if the model predicts that there is a bone in the image, it's more likely to be a dog than a cat.

A very simple solution to both of these problems is to learn a linear model that is trained using the 1,000 predictions from the imagenet model for each image as input, and the dog/cat label as target.

```
In [2]:
```%matplotlib inline
from __future__ import division,print_function
import os, json
from glob import glob
import numpy as np
import scipy
from sklearn.preprocessing import OneHotEncoder
from sklearn.metrics import confusion_matrix
np.set_printoptions(precision=4, linewidth=100)
from matplotlib import pyplot as plt
import utils; reload(utils)
from utils import plots, get_batches, plot_confusion_matrix, get_data

```
In [3]:
```from numpy.random import random, permutation
from scipy import misc, ndimage
from scipy.ndimage.interpolation import zoom
import keras
from keras import backend as K
from keras.utils.data_utils import get_file
from keras.models import Sequential
from keras.layers import Input
from keras.layers.core import Flatten, Dense, Dropout, Lambda
from keras.layers.convolutional import Convolution2D, MaxPooling2D, ZeroPadding2D
from keras.optimizers import SGD, RMSprop
from keras.preprocessing import image

It turns out that each of the Dense() layers is just a *linear model*, followed by a simple *activation function*. We'll learn about the activation function later - first, let's review how linear models work.

A linear model is (as I'm sure you know) simply a model where the output for each row is calculated as *sum(row * weights)*, usually plus a constant, where *weights* needs to be learnt from the data and is the same for every row. For example, let's create some data that we know is linearly related:

```
In [3]:
```x = random((30,2))
y = np.dot(x, [2., 3.]) + 1.
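
To make the *sum(row * weights)* description concrete, this small sketch (using its own made-up data rather than the arrays in the cell above) checks that the matrix product gives the same result as summing each row times the weights. The added constant `1.` plays the role of the bias.

```python
import numpy as np

rng = np.random.RandomState(42)
x_demo = rng.random_sample((30, 2))   # 30 rows, 2 inputs per row
weights = np.array([2., 3.])
bias = 1.

# Matrix form, as in the cell above.
y_dot = np.dot(x_demo, weights) + bias

# Explicit per-row form: sum(row * weights) + bias.
y_rows = np.array([np.sum(row * weights) + bias for row in x_demo])

print(np.allclose(y_dot, y_rows))
```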

```
In [4]:
```x[:5]

```
Out[4]:
```

```
In [5]:
```y[:5]

```
Out[5]:
```

We can fit such a model by creating a single linear layer (*Dense()* - with no activation - in Keras) and optimizing it using SGD to minimize mean squared error (*mse*):

```
In [6]:
```lm = Sequential([ Dense(1, input_shape=(2,)) ])
lm.compile(optimizer=SGD(lr=0.1), loss='mse')

(See the *Optim Tutorial* notebook and associated Excel spreadsheet to learn all about SGD and related optimization algorithms.)

The lm model now holds a set of internal weights (randomly initialized, since we haven't fitted it yet), which we can use to evaluate the loss function (MSE) before and after training.

```
In [8]:
```lm.evaluate(x, y, verbose=0)

```
Out[8]:
```

```
In [10]:
```lm.fit(x, y, nb_epoch=5, batch_size=1)

```
Out[10]:
```

```
In [11]:
```lm.evaluate(x, y, verbose=0)

```
Out[11]:
```

```
In [12]:
```lm.get_weights()

```
Out[12]:
```
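
For intuition about what `fit()` is doing here, the following is a minimal NumPy sketch of stochastic gradient descent on mean squared error for the same kind of data. It is not Keras's implementation; the names `w`, `b`, and `lr`, the zero initialization, and the epoch count are illustrative assumptions.

```python
import numpy as np

rng = np.random.RandomState(0)
x_demo = rng.random_sample((30, 2))
y_demo = np.dot(x_demo, [2., 3.]) + 1.

w = np.zeros(2)   # weights to learn (the data was built with [2, 3])
b = 0.            # bias to learn (the data was built with 1)
lr = 0.1          # learning rate

def mse(w, b):
    return np.mean((np.dot(x_demo, w) + b - y_demo) ** 2)

loss_before = mse(w, b)
for epoch in range(200):
    for i in rng.permutation(len(x_demo)):   # batch size of 1
        err = np.dot(x_demo[i], w) + b - y_demo[i]
        # Gradient of err**2 with respect to w and b.
        w -= lr * 2 * err * x_demo[i]
        b -= lr * 2 * err
loss_after = mse(w, b)

print(loss_before, loss_after, w, b)
```

Because the data here is noise-free, the learned `w` and `b` should end up close to the values used to generate `y_demo`.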

Keeping a small sample of the data (such as the commented-out `sample/` path below) is *always* a good idea in *all* machine learning, since we should do all of our initial testing using a dataset small enough that we never have to wait for it.
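
One way to build such a small dataset is to copy a random handful of files per class into a parallel directory. The sketch below is an assumption about how that could be done, not the notebook's own code: `make_sample` is a hypothetical helper, and it runs here against a throwaway temporary directory rather than the real `data/dogscats/` tree.

```python
import os, random, shutil, tempfile

def make_sample(src_dir, dst_dir, n, seed=0):
    """Copy n randomly chosen files from each class folder of src_dir into dst_dir."""
    rng = random.Random(seed)
    for cls in sorted(os.listdir(src_dir)):
        files = sorted(os.listdir(os.path.join(src_dir, cls)))
        os.makedirs(os.path.join(dst_dir, cls))
        for fname in rng.sample(files, min(n, len(files))):
            shutil.copyfile(os.path.join(src_dir, cls, fname),
                            os.path.join(dst_dir, cls, fname))

# Demo on a temporary directory filled with fake, empty "images".
root = tempfile.mkdtemp()
for cls in ('cats', 'dogs'):
    os.makedirs(os.path.join(root, 'train', cls))
    for i in range(20):
        open(os.path.join(root, 'train', cls, '%s.%d.jpg' % (cls, i)), 'w').close()

make_sample(os.path.join(root, 'train'), os.path.join(root, 'sample', 'train'), n=5)
sample_counts = {cls: len(os.listdir(os.path.join(root, 'sample', 'train', cls)))
                 for cls in ('cats', 'dogs')}
print(sample_counts)
shutil.rmtree(root)
```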

```
In [16]:
```#path = "data/dogscats/sample/"
path = "data/dogscats/"
model_path = path + 'models/'
if not os.path.exists(model_path): os.mkdir(model_path)

```
In [17]:
```batch_size=100
#batch_size=4

We need to start with our VGG 16 model, since we'll be using its predictions and features.

```
In [18]:
```from vgg16 import Vgg16
vgg = Vgg16()
model = vgg.model

Our overall approach here will be:

- Get the true labels for every image
- Get the 1,000 imagenet category predictions for every image
- Feed these predictions as input to a simple linear model.

Let's start by grabbing training and validation batches.

```
In [22]:
```# Use batch size of 1 since we're just doing preprocessing on the CPU
val_batches = get_batches(path+'valid', shuffle=False, batch_size=1)
batches = get_batches(path+'train', shuffle=False, batch_size=1)

```
In [8]:
```import bcolz
def save_array(fname, arr): c=bcolz.carray(arr, rootdir=fname, mode='w'); c.flush()
def load_array(fname): return bcolz.open(fname)[:]

```
In [ ]:
```val_data = get_data(val_batches)

```
In [231]:
```trn_data = get_data(batches)

```
In [155]:
```trn_data.shape

```
Out[155]:
```

```
In [153]:
```save_array(model_path+ 'train_data.bc', trn_data)
save_array(model_path + 'valid_data.bc', val_data)

We can load our training and validation data later without recalculating them:

```
In [19]:
```trn_data = load_array(model_path+'train_data.bc')
val_data = load_array(model_path+'valid_data.bc')
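
The save-then-load pattern can be wrapped so that the expensive step only runs when no cached copy exists. This is a sketch using plain NumPy `.npy` files instead of bcolz; `cached` and `slow_compute` are hypothetical names introduced for illustration.

```python
import os, tempfile
import numpy as np

def cached(fname, compute):
    """Load the array stored at fname if it exists; otherwise compute and save it."""
    if os.path.exists(fname):
        return np.load(fname)
    arr = compute()
    np.save(fname, arr)
    return arr

calls = []
def slow_compute():
    calls.append(1)                      # record that the expensive step ran
    return np.arange(12.).reshape(3, 4)  # stand-in for get_data(batches)

fname = os.path.join(tempfile.mkdtemp(), 'train_data.npy')
first = cached(fname, slow_compute)    # computes and saves
second = cached(fname, slow_compute)   # loads from disk
print(len(calls), np.array_equal(first, second))
```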

```
In [23]:
```val_data.shape

```
Out[23]:
```

Keras returns *classes* as a single column, so we convert to one-hot encoding:

```
In [20]:
```def onehot(x): return np.array(OneHotEncoder().fit_transform(x.reshape(-1,1)).todense())

```
In [23]:
```val_classes = val_batches.classes
trn_classes = batches.classes
val_labels = onehot(val_classes)
trn_labels = onehot(trn_classes)
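
For integer class labels like these, the same encoding can also be written directly in NumPy by indexing an identity matrix. This is an alternative sketch, not what the `onehot` helper above does internally; `onehot_np` and the sample labels are made up for illustration.

```python
import numpy as np

def onehot_np(classes, n_classes=None):
    """One-hot encode a 1-D array of integer class labels."""
    classes = np.asarray(classes)
    if n_classes is None:
        n_classes = classes.max() + 1
    return np.eye(n_classes)[classes]

demo_classes = np.array([0, 0, 1, 1, 0])
demo_labels = onehot_np(demo_classes)
print(demo_labels)
```

Taking `argmax(axis=1)` of the result recovers the original class column.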

```
In [27]:
```trn_labels.shape

```
Out[27]:
```

```
In [24]:
```trn_classes[:4]

```
Out[24]:
```

```
In [28]:
```trn_labels[:4]

```
Out[28]:
```

The 1,000 imagenet category predictions for each image will be the *features* for our linear model:

```
In [144]:
```trn_features = model.predict(trn_data, batch_size=batch_size)
val_features = model.predict(val_data, batch_size=batch_size)

```
In [26]:
```trn_features.shape

```
Out[26]:
```

```
In [149]:
```save_array(model_path+ 'train_lastlayer_features.bc', trn_features)
save_array(model_path + 'valid_lastlayer_features.bc', val_features)

We can load our training and validation features later without recalculating them:

```
In [25]:
```trn_features = load_array(model_path+'train_lastlayer_features.bc')
val_features = load_array(model_path+'valid_lastlayer_features.bc')

Now we can define our linear model, just like we did earlier:

```
In [28]:
```# 1000 inputs, since those are the saved features, and 2 outputs, for dog and cat
lm = Sequential([ Dense(2, activation='softmax', input_shape=(1000,)) ])
lm.compile(optimizer=RMSprop(lr=0.1), loss='categorical_crossentropy', metrics=['accuracy'])
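
To see what this layer computes, here is a NumPy sketch of the forward pass: a linear model followed by softmax, scored with categorical cross-entropy. The weights and inputs are random placeholders, not the trained model or the real features.

```python
import numpy as np

rng = np.random.RandomState(0)
n_in, n_out = 1000, 2
feats = rng.dirichlet(np.ones(n_in), size=4)    # 4 fake rows of 1,000 probabilities
labels = np.array([[1., 0.], [0., 1.], [1., 0.], [0., 1.]])  # one-hot targets

W = rng.normal(scale=0.01, size=(n_in, n_out))  # placeholder weights
b = np.zeros(n_out)                             # placeholder biases

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)        # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

probs_demo = softmax(np.dot(feats, W) + b)      # each row sums to 1

# Categorical cross-entropy: mean of -log(probability assigned to the true class).
loss = -np.mean(np.sum(labels * np.log(probs_demo), axis=1))
print(probs_demo, loss)
```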

We're ready to fit the model!

```
In [29]:
```batch_size=64

```
In [12]:
```batch_size=4

```
In [32]:
```lm.fit(trn_features, trn_labels, nb_epoch=3, batch_size=batch_size,
       validation_data=(val_features, val_labels))

```
Out[32]:
```

```
In [31]:
```lm.summary()

Keras' *fit()* function conveniently shows us the value of the loss function, and the accuracy, after every epoch ("*epoch*" refers to one full run through all training examples). The most important metrics for us to look at are for the validation set, since we want to check for over-fitting.

**Tip**: with our first model we should try to overfit before we start worrying about how to handle it. There's no point even thinking about regularization, data augmentation, etc. if you're still under-fitting! (We'll be looking at these techniques shortly.)

As well as looking at the overall metrics, it's also a good idea to look at examples of each of:

- A few correct labels at random
- A few incorrect labels at random
- The most correct labels of each class (i.e. those with highest probability that are correct)
- The most incorrect labels of each class (i.e. those with highest probability that are incorrect)
- The most uncertain labels (i.e. those with probability closest to 0.5).

Let's see what, if anything, we can learn from these. (In general, these are particularly useful for debugging problems in the model; since this model is so simple, there may not be too much to learn at this stage.)
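
The selections in the list above boil down to a few NumPy indexing idioms. The sketch below applies them to a tiny made-up set of predictions; `p_cat`, `pred`, and `actual` are illustrative stand-ins for the real validation outputs.

```python
import numpy as np

p_cat = np.array([0.95, 0.10, 0.55, 0.80, 0.48, 0.02])  # predicted P(cat)
pred = (p_cat < 0.5).astype(int)                        # 0 = cat, 1 = dog
actual = np.array([0, 1, 1, 0, 0, 1])                   # true classes

correct = np.where(pred == actual)[0]
incorrect = np.where(pred != actual)[0]

# Most confident correct cats: correct cat predictions, highest P(cat) first.
correct_cats = np.where((pred == 0) & (pred == actual))[0]
most_correct_cats = correct_cats[np.argsort(p_cat[correct_cats])[::-1]]

# Most uncertain: P(cat) closest to 0.5.
most_uncertain = np.argsort(np.abs(p_cat - 0.5))

print(correct, incorrect, most_correct_cats, most_uncertain[:2])
```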

Calculate predictions on validation set, so we can find correct and incorrect examples:

```
In [37]:
```# We want both the classes...
preds = lm.predict_classes(val_features, batch_size=batch_size)
# ...and the probabilities of being a cat
probs = lm.predict_proba(val_features, batch_size=batch_size)[:,0]
probs[:8]

```
Out[37]:
```

```
In [38]:
```preds[:8]

```
Out[38]:
```
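
For a softmax output, the predicted class is simply the index of the largest probability in each row, so the class and probability arrays are consistent with each other. A small sketch with made-up probabilities (column 0 is cat and column 1 is dog, matching the cell above):

```python
import numpy as np

# Made-up two-class probabilities for four images; each row sums to 1.
proba = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.6, 0.4],
                  [0.3, 0.7]])

classes = proba.argmax(axis=1)   # predicted class per row
p_cat = proba[:, 0]              # probability of being a cat

print(classes, p_cat)
```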

Get the filenames for the validation set, so we can view images:

```
In [39]:
```filenames = val_batches.filenames

```
In [40]:
```# Number of images to view for each visualization task
n_view = 4

Helper function to plot images by index in the validation set:

```
In [41]:
```def plots_idx(idx, titles=None):
    plots([image.load_img(path + 'valid/' + filenames[i]) for i in idx], titles=titles)

```
In [42]:
```#1. A few correct labels at random
correct = np.where(preds==val_labels[:,1])[0]
idx = permutation(correct)[:n_view]
plots_idx(idx, probs[idx])

```
In [43]:
```#2. A few incorrect labels at random
incorrect = np.where(preds!=val_labels[:,1])[0]
idx = permutation(incorrect)[:n_view]
plots_idx(idx, probs[idx])

```
In [44]:
```#3. The images we were most confident were cats, and are actually cats
correct_cats = np.where((preds==0) & (preds==val_labels[:,1]))[0]
most_correct_cats = np.argsort(probs[correct_cats])[::-1][:n_view]
plots_idx(correct_cats[most_correct_cats], probs[correct_cats][most_correct_cats])

```
In [45]:
```# as above, but dogs
correct_dogs = np.where((preds==1) & (preds==val_labels[:,1]))[0]
most_correct_dogs = np.argsort(probs[correct_dogs])[:n_view]
plots_idx(correct_dogs[most_correct_dogs], 1-probs[correct_dogs][most_correct_dogs])
