Let us start by loading the necessary Python libraries:
In [ ]:
import os
import numpy as np
import matplotlib.pyplot as plt
from scipy import misc
%matplotlib inline
plt.rcParams['figure.figsize'] = (10, 10)
plt.rcParams['image.cmap'] = 'gray'
In [ ]:
# Load the grace_hopper.jpg image from the data folder
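# A possible solution (a sketch; the exact filename in the data folder is assumed):
img = misc.imread('data/grace_hopper.jpg', flatten=True)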
The flatten flag means you imported the image as a grey-scale image: each pixel is represented by a single value between 0 (black) and 255 (white). You can view the image using the imshow function from matplotlib:
In [ ]:
# View the image
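# One way to do it, using the img loaded above:
plt.imshow(img)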
or view a part of the image by selecting a region:
In [ ]:
# View a region of the image
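# For example (the region boundaries are arbitrary):
plt.imshow(img[100:300, 100:300])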
Alternatively, you can view the values of the pixels directly, for example to view the values in the top left corner of the image:
In [ ]:
# Print the pixel values of a region in the top left corner
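# For example, the 5x5 top left corner:
print(img[:5, :5])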
Now let's define a convolution function. First you must define a function which traverses the image, applies the convolution at every point, and returns the result in a filtered image. Calculating the size of the filtered image along each dimension can be a little tricky; the formula is:
Size of the filtered image = input image size - filter size + 1
Let us start by implementing the convolve function. It takes as input an image and a filter, and returns the output of applying the filter at each position in the image through a function multiply_sum, which we define next:
In [ ]:
# Convolution function
def convolve(image, filter):
    filter_height = filter.shape[0]
    filter_width = filter.shape[1]
    # The filtered image shrinks by (filter size - 1) along each dimension
    filtered_image = np.ndarray(shape=(image.shape[0] - filter_height + 1,
                                       image.shape[1] - filter_width + 1))
    for x in range(0, filtered_image.shape[0]):
        for y in range(0, filtered_image.shape[1]):
            # We select a local patch of the image
            patch = image[x: x + filter_height,
                          y: y + filter_width]
            # Then apply the convolution operation to it
            filtered_image[x, y] = multiply_sum(patch, filter)
    return filtered_image
Now let's implement the multiply_sum function. It takes as input two numpy arrays of the same shape, multiplies them elementwise, and returns the sum:
In [ ]:
# Multiply_sum function
def multiply_sum(patch, filter):
    # Let's make sure our two inputs have the same shape
    assert(patch.shape == filter.shape)
    return np.sum(patch * filter)
And there you have it, a convolution operator! You can apply a filter onto an image and see the result:
In [ ]:
# First define the 3x3 filter
# Then apply it onto the image
# Show the result
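# A sketch: the original filter is not specified here, so as an example we
# use a diagonal-edge detector (note its weights sum to 0):
filter = np.array([[ 2, -1, -1],
                   [-1,  2, -1],
                   [-1, -1,  2]])
filtered_img = convolve(img, filter)
plt.imshow(filtered_img)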
Quiz: What did our filter do?
1) By looking at the image, can you tell what kind of pattern the filter detected?
2) How would you design a filter which detects vertical edges?
3) What would the following filter do:
filter = np.array([[ 1, 1, 1], [ 0, 0, 0], [-1, -1, -1]])
In [ ]:
# Define the filter
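# The filter from question 3 above:
filter = np.array([[ 1,  1,  1],
                   [ 0,  0,  0],
                   [-1, -1, -1]])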
In [ ]:
# Convolve using this filter
# Show the result
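# For example:
filtered_img = convolve(img, filter)
plt.imshow(filtered_img)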
Transpose the filter:
In [ ]:
# Transpose
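# numpy's .T swaps the two axes of a 2-D array:
filter = filter.T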
In [ ]:
# Convolve using this filter
# Show the result
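# Same as before, with the transposed filter:
filtered_img = convolve(img, filter)
plt.imshow(filtered_img)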
Very good! But what if we had a colour image? How would we use that extra information to detect useful patterns? The idea is simple: on top of having a set weight for each pixel, we have a set weight for each colour channel within that pixel. The following kernel detects regions of the image which are mostly brown.
In [ ]:
# Create a brown filter
brown_filter = np.array(
    [[[ 0.13871045,  0.17157242,  0.12934428],
      [ 0.16168842,  0.20229845,  0.14835016],
      [ 0.135694  ,  0.16206263,  0.11727387]],

     [[ 0.04231958,  0.05471011,  0.03167877],
      [ 0.0462575 ,  0.06581022,  0.03104937],
      [ 0.04185439,  0.04734124,  0.02087744]],

     [[-0.15704881, -0.16666673, -0.16600266],
      [-0.17439997, -0.17757156, -0.18760149],
      [-0.15435153, -0.17037505, -0.17269668]]])
print(brown_filter.shape)
The first dimension represents colours. You can see the filter responds strongly to red, a little to green, and negatively to blue. Let's load a colourful version of the Grace Hopper portrait:
In [ ]:
# Read and show the Grace Hopper portrait
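# A possible solution (a sketch; the filename is assumed):
colour_img = misc.imread('data/grace_hopper.jpg')
plt.imshow(colour_img)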
Quiz
Can you design a filter which will detect the edge from the background (blue) to Grace Hopper’s left shoulder (black)?
Hint: make sure the weights in your filter sum to 0.
In [ ]:
# Answer
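# One possible answer (a sketch): act only on the blue channel with a
# vertical-edge pattern whose weights sum to 0. Depending on which side of
# the edge the background sits, the signs may need flipping.
blue_edge_filter = np.zeros((3, 3, 3))
blue_edge_filter[2] = np.array([[ 1, 0, -1],
                                [ 1, 0, -1],
                                [ 1, 0, -1]])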
We could almost apply our convolution operator already, but the filter we have is currently in the wrong format. The colour channel dimension should be the last one of the filter, same as in our image. This can be fixed by a simple transpose:
In [ ]:
# Transpose the brown filter
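# Move the colour dimension from first to last:
brown_filter = brown_filter.transpose(1, 2, 0)
print(brown_filter.shape)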
Now let's load an already trained network into our environment. This network (VGG-16) has been trained on the Imagenet dataset, where the goal is to classify pictures into one of one thousand categories. When it came out in 2014, it achieved one of the top results in the annual ImageNet Recognition Challenge, correctly classifying 93% of the images in the test set. For comparison, humans can achieve around 95% accuracy. It's also very simple: it only uses 3x3 convolutions! It is very deep though, and training it takes 2-3 weeks on 4 GPUs.
To load the model, you must first define its architecture. You're going to do this step by step as you learn the components of convolutional neural networks. But first, let's load the necessary libraries. We are again going to use the Keras library.
In [ ]:
import theano
import cv2
from keras.models import Sequential
from keras.layers.core import Flatten, Dense, Dropout
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.layers.convolutional import ZeroPadding2D
from keras.optimizers import SGD
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.utils import np_utils
In [ ]:
# Implement a convolutional layer
# Create the model
# On the very first layer, you must specify the input shape
# Your first convolutional layer will have 64 3x3 filters, and will use a relu activation function
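# A sketch of the first block, following the standard VGG-16 definition in
# Keras 1 with Theano's channels-first image ordering (which matches the
# preprocessing used further down; the variable name vgg_model matches its
# later use):
vgg_model = Sequential()
vgg_model.add(ZeroPadding2D((1, 1), input_shape=(3, 224, 224)))
vgg_model.add(Convolution2D(64, 3, 3, activation='relu'))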
In [ ]:
# Stacking layers
# Once again you must add padding
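# A second identical convolutional layer, padded so the output keeps its size:
vgg_model.add(ZeroPadding2D((1, 1)))
vgg_model.add(Convolution2D(64, 3, 3, activation='relu'))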
In [ ]:
# Add a pooling layer with window size 2x2
# The stride indicates the distance between each pooled window
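# A 2x2 window moved 2 pixels at a time halves the height and width:
vgg_model.add(MaxPooling2D((2, 2), strides=(2, 2)))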
In [ ]:
# Lots more Convolutional and Pooling layers
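# The remaining VGG-16 blocks, written compactly as a sketch (listing each
# layer explicitly would produce the same architecture):
for depth, repetitions in [(128, 2), (256, 3), (512, 3), (512, 3)]:
    for _ in range(repetitions):
        vgg_model.add(ZeroPadding2D((1, 1)))
        vgg_model.add(Convolution2D(depth, 3, 3, activation='relu'))
    vgg_model.add(MaxPooling2D((2, 2), strides=(2, 2)))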
As you can see, the depth of the layers gets progressively larger, up to 512 for the last layers. This means that as we go along, each layer detects a greater number of features. On the other hand, each max-pooling layer halves the height and width of the layer outputs. Starting from images of dimensions 224x224, the final outputs are only of size 7x7.
Now you're about to add some fully connected layers, which can learn the more abstract features of the image. But first you must change the layout of the input so it looks like a 1-D tensor.
In [ ]:
# Flatten the input
# Add a fully connected layer with 4096 neurons
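# A sketch:
vgg_model.add(Flatten())
vgg_model.add(Dense(4096, activation='relu'))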
The Flatten function removes the spatial dimensions of the layer output: it is now a simple 1-D row of numbers. This means we can no longer apply convolutions, but we can apply fully connected layers like the ones of the perceptron from the previous module.
Dense layers are fully connected layers. You used them in the previous module.
Dropout is a method used at train time to prevent overfitting. As a layer, it randomly modifies its input so that the neural network learns to be robust to these changes. Although you won’t actually use it now, you must define it to correctly load the weights, as it was part of the original network.
In [ ]:
# Add a dropout layer
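# Drop half of the inputs at train time:
vgg_model.add(Dropout(0.5))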
The number 0.5 is the dropout rate: the fraction of the input that is randomly zeroed out at each training update. 0.0 means no change, and 1.0 means everything is dropped.
Add one more fully connected layer:
In [ ]:
# Add another fully connected layer with 4096 neurons and a Dropout at the output
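# A sketch:
vgg_model.add(Dense(4096, activation='relu'))
vgg_model.add(Dropout(0.5))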
Finally, add a softmax layer to predict the categories. There are 1000 categories and hence 1000 neurons.
In [ ]:
# Add softmax layer
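# One neuron per Imagenet category:
vgg_model.add(Dense(1000, activation='softmax'))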
In [ ]:
# Load the weights
# Compile the network (no need to worry about this for now)
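# A sketch; the location of the pretrained weight file is an assumption:
vgg_model.load_weights('data/vgg16_weights.h5')
vgg_model.compile(optimizer=SGD(), loss='categorical_crossentropy')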
In [ ]:
# Load the image
img = cv2.resize(cv2.imread('data/cat.jpg'), (224, 224))

# Transform it to the right format
def transform_image(image):
    image_t = np.copy(image).astype(np.float32)  # Avoids modifying the original
    # OpenCV loads images in BGR order, so channel 0 is Blue
    image_t[:, :, 0] -= 103.939                  # Subtracts mean Blue
    image_t[:, :, 1] -= 116.779                  # Subtracts mean Green
    image_t[:, :, 2] -= 123.68                   # Subtracts mean Red
    image_t = image_t.transpose((2, 0, 1))       # The colour dimension comes first
    image_t = np.expand_dims(image_t, axis=0)    # The network takes batches of images as input
    return image_t

img_t = transform_image(img)

# What does the image look like?
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
In [ ]:
# Push the image through
# The network takes batches of images; we only want the result for one image
# The output is an array with 1000 values, one for each category. What does it look like?
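# A sketch: predict returns one row per image in the batch
prediction = vgg_model.predict(img_t)[0]
plt.plot(prediction)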
The network seems pretty confident! Let's look at its top 5 guesses:
In [ ]:
# Load labels
# Sort top k predictions from softmax output
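# A sketch; the labels file (one Imagenet category per line) is an assumption:
with open('data/synset_words.txt') as f:
    labels = [line.strip() for line in f]
top5 = np.argsort(prediction)[::-1][:5]
for i in top5:
    print(labels[i], prediction[i])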
Hurray! Our network knows what it's talking about. Let's have a closer look at what goes on inside.
In a convolutional neural network, there's an easy way to visualise the filters learned at the very first layer. We can print each filter to show which colours it responds to.
In [ ]:
# This is a helper function to let you visualise what goes on inside the network
def vis_square(weights, padsize=1, padval=0):
    # Avoids modifying the network weights
    data = np.copy(weights)
    # Normalize the inputs
    data -= data.min()
    data /= data.max()

    # Let's tile the inputs
    # How many inputs per row
    n = int(np.ceil(np.sqrt(data.shape[0])))
    # Add padding between inputs
    padding = ((0, n ** 2 - data.shape[0]), (0, padsize), (0, padsize)) + ((0, 0),) * (data.ndim - 3)
    data = np.pad(data, padding, mode='constant', constant_values=(padval, padval))

    # Place the filters on an n by n grid
    data = data.reshape((n, n) + data.shape[1:])
    # Merge the filters' contents onto a single image
    data = data.transpose((0, 2, 1, 3) + tuple(range(4, data.ndim)))
    data = data.reshape((n * data.shape[1], n * data.shape[3]) + data.shape[4:])
    # Show the filters
    plt.imshow(data)
# Get the weights of the first convolutional layer
first_layer_weights = vgg_model.layers[1].get_weights()
# first_layer_weights[0] stores the connection weights
# first_layer_weights[1] stores the bias weights
# For now we're interested in the connections
filters = first_layer_weights[0]
# Visualise the filters
vis_square(filters.transpose(0, 2, 3, 1))
You can see how each filter detects a different property of the input image. Some are designed to respond to certain colours, while others -- the greyscale-looking ones -- detect changes in brightness such as edges. You may notice the brown filter in the top left corner; if we print the values of its weights:
In [ ]:
print(filters[1])
It is the brown filter we applied to the Grace Hopper portrait above!
Another way of visualising the network is to see which neurons get activated as the image traverses the network. A neuron outputting a high value means the pattern it has learnt to detect has been observed. Let's apply this to our kitten image.
In [ ]:
# This function fetches the intermediary output from a layer
def get_layer_output(model, image, layer):
    # This theano function lets us look at the activations throughout the network
    theano_function = theano.function([model.layers[0].input],
                                      model.layers[layer].get_output(train=False))
    return theano_function(image)[0]
layer_output = get_layer_output(vgg_model, img_t, 1)
vis_square(layer_output, padsize=5, padval=1)
It's worth spending a moment to understand what is going on here. Each pixel in this image is a different neuron in the neural network. Neurons on the same image sample share the same weights and therefore detect the same feature. You can compare the visualised filters above with their corresponding image sample. For example, find the bright green filter in the original visualisation and look at its corresponding image response.
Using this method, it is possible to visualise the deeper parts of the neural network, although they become harder to interpret. You can visualise the output of the second convolutional layer:
In [ ]:
# Visualise the output of the second convolutional layer
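# A sketch: with the architecture above, the second convolutional layer
# sits at index 3 (each convolution is preceded by a ZeroPadding2D layer):
layer_output = get_layer_output(vgg_model, img_t, 3)
vis_square(layer_output, padsize=5, padval=1)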
And the eighth layer:
In [ ]:
# Visualise the output of the eighth convolutional layer
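# A sketch: counting the ZeroPadding2D and MaxPooling2D layers, the eighth
# convolutional layer sits at index 18 in the model defined above:
layer_output = get_layer_output(vgg_model, img_t, 18)
vis_square(layer_output, padsize=5, padval=1)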
As we get further down the network, the representations become smaller in their spatial features thanks to the pooling layers. The final convolutional layers only have dimensions 14 by 14.
In [ ]:
# Visualise the output of the final convolutional layers
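# A sketch: the last convolutional layer sits at index 29 in the model above
layer_output = get_layer_output(vgg_model, img_t, 29)
vis_square(layer_output, padsize=5, padval=1)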
Load the CIFAR10 dataset:
In [ ]:
# Load the data
# Turn our images into floating point numbers
# Put our input data in the range 0-1
# convert class vectors to binary class matrices
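# A sketch using the standard Keras helpers:
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255
Y_train = np_utils.to_categorical(y_train, 10)
Y_test = np_utils.to_categorical(y_test, 10)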
In [ ]:
# Define the model: Our model has six layers, four convolutional, and two fully connected. The first two layers have a
# depth of 32, meaning they each detect 32 types of filters. They use 3x3 sized filters.
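# A sketch modelled on the classic Keras CIFAR10 example; the depth of the
# last two convolutional layers (64) is an assumption:
model = Sequential()
model.add(Convolution2D(32, 3, 3, border_mode='same', activation='relu',
                        input_shape=X_train.shape[1:]))
model.add(Convolution2D(32, 3, 3, activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Convolution2D(64, 3, 3, border_mode='same', activation='relu'))
model.add(Convolution2D(64, 3, 3, activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))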
Quiz: How many weights are there in the network?
In [ ]:
# Using Stochastic gradient descent with an initial learning rate of 0.01
# With Nesterov momentum, and a learning rate decay of 1e-6 per iteration
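# A sketch; the momentum value (0.9) is an assumption:
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])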
In [ ]:
# Preprocessing, does both normalization and augmentation
datagen = ImageDataGenerator(
    featurewise_center=True,             # set input mean to 0 over the dataset
    samplewise_center=False,             # set each sample mean to 0
    featurewise_std_normalization=True,  # divide inputs by std of the dataset
    samplewise_std_normalization=False,  # divide each input by its std
    rotation_range=0,                    # randomly rotate images in the range (degrees, 0 to 180)
    width_shift_range=0.1,               # randomly shift images horizontally (fraction of total width)
    height_shift_range=0.1,              # randomly shift images vertically (fraction of total height)
    horizontal_flip=True,                # randomly flip images horizontally
    vertical_flip=False)                 # do not flip images vertically

# Compute quantities required for featurewise normalization (std, mean)
datagen.fit(X_train)
And you're set! You can start training and see the accuracy improve!
In [ ]:
# Train, set, go!!!
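# A sketch; batch size and number of epochs are assumptions:
model.fit_generator(datagen.flow(X_train, Y_train, batch_size=32),
                    samples_per_epoch=X_train.shape[0],
                    nb_epoch=10,
                    validation_data=(X_test, Y_test))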