In [ ]:
"""This area sets up the Jupyter environment.
Please do not modify anything in this cell.
"""
import os
import sys
import time
# Add project to PYTHONPATH for future use
sys.path.insert(1, os.path.join(sys.path[0], '..'))
# Import miscellaneous modules
from IPython.core.display import display, HTML
# Set CSS styling
with open('../admin/custom.css', 'r') as f:
style = """<style>\n{}\n</style>""".format(f.read())
display(HTML(style))
For a computer, an image is a matrix of data, where each pixel is represented by one or more values:
We had a brief look at this dataset in the previous notebook, and here we will go through it again in much more detail. As before, the MNIST database (Modified National Institute of Standards and Technology database) is a multiclass classification problem where we are tasked with classifying a digit ($0-9$) based on a $28\times 28$ greyscale image:
The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.
It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal efforts on preprocessing and formatting.
In the following example we will load data from MNIST.
In [ ]:
# Display plots within the notebook
%matplotlib notebook
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
# NumPy is a package for manipulating N-dimensional array objects
import numpy as np
# Pandas is a data analysis package
import pandas as pd
# Library to test/verify some tasks
import problem_unittests as tests  # Used to test our answers
# MNIST wrapper
from keras.datasets import mnist
# Code to load the data
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Print data shape
print('Shape of x_train {}'.format(x_train.shape))
print('Shape of y_train {}'.format(y_train.shape))
print('Shape of x_test {}'.format(x_test.shape))
print('Shape of y_test {}'.format(y_test.shape))
# Code to plot the training sample at index 5.
fig,ax1 = plt.subplots(1,1, figsize=(7, 7))
ax1.imshow(x_train[5], cmap='gray')
title = 'Target = {}'.format(y_train[5])
ax1.set_title(title)
ax1.grid(which='major')
ax1.xaxis.set_major_locator(MaxNLocator(28))
ax1.yaxis.set_major_locator(MaxNLocator(28))
fig.canvas.draw()
time.sleep(0.1)
Before we start classifying digits we need to pre-process the data.
Your first task is to create a function that normalises 8-bit images from [0,255] to [0,1]:
In [ ]:
def normalise_images(images):
"""Normalise input images.
"""
# Normalise image here
return images
### Do *not* modify the following lines ###
tests.test_normalize_images(normalise_images)
# Normalize the data for future use
x_train = normalise_images(x_train)
x_test = normalise_images(x_test)
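If you get stuck, here is one possible approach as a minimal sketch, assuming the inputs are 8-bit unsigned integer NumPy arrays; the name normalise_images_sketch is only illustrative and separate from the exercise above:
import numpy as np

def normalise_images_sketch(images):
    """Scale 8-bit pixel values from [0, 255] to floats in [0.0, 1.0]."""
    # Cast before dividing so the result is not truncated by integer arithmetic
    return np.asarray(images).astype('float32') / 255.0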
When we loaded the MNIST dataset each digit was represented by a matrix of size $(28, 28)$. However, the artificial neural network we will be building uses the concept of colour channels and feature maps even for greyscale images. This means that we have to transform $(28, 28)$ to $(28, 28, 1)$.
In [ ]:
# Write your code here
x_train = None
x_test = None
### Do *not* modify the following lines ###
print('Shape of x_train {}'.format(x_train.shape))
print('Shape of y_train {}'.format(y_train.shape))
print('Shape of x_test {}'.format(x_test.shape))
print('Shape of y_test {}'.format(y_test.shape))
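If you are unsure how to approach the reshaping, one possible sketch (assuming the arrays are NumPy arrays of shape (N, 28, 28); the `_example` names are just illustrative) uses np.expand_dims or reshape:
import numpy as np

# Append a channel axis: (N, 28, 28) -> (N, 28, 28, 1)
x_train_example = np.expand_dims(x_train, axis=-1)
# An equivalent alternative using reshape
x_test_example = x_test.reshape(x_test.shape[0], 28, 28, 1)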
To classify our digits we need to use one-hot encoding to represent the target outputs. One-hot encoding is a robust yet simple way to represent multi-categorical targets.
This encoding is an ideal representation for training a model with a gradient descent algorithm and the softmax function we discussed in a previous notebook.
Here is an example of what a one-hot encoding scheme looks like:
The core idea is that you transform a multi-categorical target into a set of binary indicators, one per class. For each example we can then see whether it belongs to a given class: 1 indicates that it does and 0 that it does not.
If you get stuck, you can use keras.utils.to_categorical(vector, number_classes) like we did in the previous notebook and come back to this task later.
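As a concrete illustration (not part of the exercise), the targets [2, 0, 1] encoded with 3 classes become the rows [0, 0, 1], [1, 0, 0] and [0, 1, 0]:
from keras.utils import to_categorical

# Each row has a single 1 in the column of its class
print(to_categorical([2, 0, 1], 3))  # rows: [0, 0, 1], [1, 0, 0], [0, 1, 0]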
In [ ]:
def one_hot(vector, number_classes):
"""Return a one-hot encoded matrix given the argument vector.
"""
# Where we will store our one-hots
one_hot = []
# One-hot encode `vector` here
# Transform list to numpy array and return it
return np.array(one_hot)
### Do *not* modify the following line ###
tests.test_one_hot(one_hot)
# One-hot encode the MNIST target values
y_train = one_hot(y_train, 10)
y_test = one_hot(y_test, 10)
Now that we have both added an extra dimension to the input data as well as one-hot encoded the target values, let's take a look at the shapes of the data matrices.
In [ ]:
print('Shape of x_train {}'.format(x_train.shape))
print('Shape of y_train {}'.format(y_train.shape))
print('Shape of x_test {}'.format(x_test.shape))
print('Shape of y_test {}'.format(y_test.shape))
If you have the task of recognising cats in images, you might want to recognise/classify the animal regardless of its position. To do that we rely on a statistical fact: natural images are (approximately) stationary, meaning their statistics are similar across different locations.
So, if a statistic is useful to compute at one location in the input image, it is likely also useful at other locations. One can exploit this property to define small networks that learn features which can be applied to different parts of an image.
Convolutional neural networks employ these ideas to create very efficient neural networks.
If you want to watch a short video (with captions) walking through the concepts behind convolutional networks, take a look at the following YouTube link:
A sliding window defines a small region of interest in an image.
The region of interest is used to scan the whole image, as shown in the following animation:
If we use the sliding window to define the input seen by a small neural network, we have a so-called convolution.
Assume we have a colour image and a small neural network with $k$ outputs: for every possible position of the sliding window we get $k$ outputs.
After the sliding window has scanned the whole image, we have a three-dimensional matrix that can be investigated further.
Let's assume that we have an image of $5 \times 5=25$ pixels:
$$ \begin{equation*} \begin{array}{|c|c|c|c|c|} \hline 1 & 1 & 1 & 0 & 0 \\ \hline 0 & 1 & 1 & 1 & 0\\ \hline 0 & 0 & 1 & 1 & 1\\ \hline 0 & 0 & 1 & 1 & 0\\ \hline 0 & 1 & 1 & 0 & 0\\ \hline \end{array} \end{equation*} $$
Assume that we define a small neural network that has $3 \times 3$ weights and a single output. The weight matrix is
$$ \begin{equation*} \begin{array}{|c|c|c|} \hline 1 & 0 & 1 \\ \hline 0 & 1 & 0\\ \hline 1 & 0 & 1\\ \hline \end{array} \end{equation*} $$
By feed-forwarding the network using a $3 \times 3$ sliding window we get the following convolved features (also known as a feature map):
$$ \begin{equation*} \begin{array}{|c|c|c|} \hline 4 & 3 & 4 \\ \hline 2 & 4 & 3\\ \hline 2 & 3 & 4\\ \hline \end{array} \end{equation*} $$
The number of feature maps produced depends on the number of outputs of the neural network. In this case we have just one feature map.
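To make the arithmetic above concrete, here is a small NumPy sketch (illustrative only, not part of any exercise) that slides the $3 \times 3$ weight matrix over the $5 \times 5$ image and reproduces the feature map:
import numpy as np

image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])
# Slide the 3x3 window over every valid position and sum the element-wise products
feature_map = np.array([[np.sum(image[i:i + 3, j:j + 3] * kernel)
                         for j in range(3)]
                        for i in range(3)])
print(feature_map)
# [[4 3 4]
#  [2 4 3]
#  [2 3 4]]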
We can use padding and strides to achieve different shapes of the feature maps. Below are four animations that showcase the convolution operation on an input matrix using different paddings and strides:
Valid padding, stride = 1 | Valid padding, stride = 2 | Same padding, stride = 1 | Padding = 1, stride = 2
To compute the size of the feature map resulting from a convolution we need to know the input size, the size of the kernel (filter), the stride, and the padding:
$$ \begin{equation*} output = \frac{1}{stride} (input - kernel + 2 * padding) + 1 \end{equation*} $$
The height can be calculated like this:
$$ \begin{equation*} height_{new} = \frac{1}{stride} (height_{input} - height_{kernel} + 2 * padding) + 1 \end{equation*} $$
The width can be calculated like this:
$$ \begin{equation*} width_{new} = \frac{1}{stride} (width_{input} - width_{kernel} + 2 * padding) + 1 \end{equation*} $$
Let us assume that we have an image that is $5 \times 5$. If we pad this image with a single pixel and then convolve it with a $3 \times 3$ kernel using a stride of $2$, we get the following feature map:
We can compute the output size using the equations above:
$$ \begin{equation*} \begin{aligned} height_{new} &= \frac{1}{stride} (height_{input} - height_{kernel} + 2 * padding) + 1 \\ &= \frac{1}{2} (5 - 3 + 2 * 1) + 1 \\ &= \frac{1}{2}(4) + 1 \\ &= 3 \end{aligned} \end{equation*} $$
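As a quick sanity check of these formulas, here is a tiny helper (an illustrative sketch; conv_output_size is not a library function) that computes the output size along one dimension, using integer division to round down:
def conv_output_size(input_size, kernel_size, stride, padding):
    """Feature-map size along one dimension of a convolution."""
    return (input_size - kernel_size + 2 * padding) // stride + 1

print(conv_output_size(5, 3, stride=2, padding=1))   # 3, as in the example above
print(conv_output_size(28, 3, stride=1, padding=0))  # 26, a 'valid' 3x3 convolution on MNIST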
It has become common practice to use a pooling layer between convolutional layers. Successful convolutional neural networks such as AlexNet, VGG16, and VGG19 employ this technique.
Pooling layers also rely on sliding windows, but instead of feeding the window to a neuron, the data inside the window is reduced with a max, mean, or some other operator.
Max-pooling has several advantages: it reduces the amount of data (and therefore computation) passed to subsequent layers, and it makes the learned representation more robust to small translations of the input.
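To see the operation itself, here is a small NumPy sketch (illustrative only) of $2 \times 2$ max-pooling with stride 2 applied to a $4 \times 4$ input:
import numpy as np

x = np.array([[1, 3, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])
# Take the maximum over each non-overlapping 2x2 window
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[6 8]
#  [3 4]]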
We will implement the following convolutional network:
The components of this network are listed below, followed by a sketch of how they can be chained together:
1. Define an input, where sample_shape is an input parameter of the function:
   input_x = Input(shape=sample_shape)
2. Generate 32 feature maps using a convolutional layer:
   output_layer = Conv2D(PARAMETERS)(input_layer)
3. Generate 64 feature maps using a convolutional layer:
   output_layer = Conv2D(PARAMETERS)(input_layer)
4. Reduce the feature maps using max-pooling:
   output_layer = MaxPooling2D(PARAMETERS)(input_layer)
5. Flatten the feature maps:
   output_layer = Flatten()(input_layer)
6. Add a fully-connected (Dense) layer with 128 dimensions:
   output_layer = Dense(PARAMETERS)(input_layer)
7. Add a fully-connected (Dense) layer with as many dimensions as there are classes ($K$, a function argument):
   output_layer = Dense(PARAMETERS)(input_layer)
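To make the list above concrete, here is a sketch of how such layers can be chained with the Keras functional API. The kernel sizes, pool size, and activations below are assumptions for illustration, not the required answer to the exercise:
from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense

inputs = Input(shape=(28, 28, 1))
x = Conv2D(32, (3, 3), activation='relu')(inputs)  # 32 feature maps
x = Conv2D(64, (3, 3), activation='relu')(x)       # 64 feature maps
x = MaxPooling2D(pool_size=(2, 2))(x)              # downsample the feature maps
x = Flatten()(x)                                   # flatten to a vector
x = Dense(128, activation='relu')(x)               # fully-connected, 128 units
outputs = Dense(10, activation='softmax')(x)       # one probability per class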
It is time to implement our first convolutional neural network.
In [ ]:
# Import Keras library
import keras
from keras.models import Model
from keras.layers import *
def net_1(sample_shape, nb_classes):
# Define the network input to have `sample_shape` shape
input_x = None
# Create network internals here
x = None
# Dense `nb_classes`
probabilities = Dense(nb_classes, activation='softmax')(x)
# Define the output
model = Model(inputs=input_x, outputs=probabilities)
return model
In [ ]:
# Shape of sample
sample_shape = x_train[0].shape
# Construct net
model = net_1(sample_shape, 10)
model.summary()
We need to define hyperparameters so our network can learn.
Keep in mind that training these kinds of networks will take longer than the ones we have looked at so far.
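If you are unsure where to start, a common starting point (just a suggestion, not a required answer) is a batch size of 128 and around 10 epochs:
batch_size = 128
epochs = 10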
In [ ]:
# Define hyperparameters
batch_size = None
epochs = None
### Do *not* modify the following lines ###
# There is no learning rate here because we are using the recommended
# values for the Adadelta optimiser; more information is available at:
# https://keras.io/optimizers/
# We need to compile our model
model.compile(loss='categorical_crossentropy',
optimizer='Adadelta',
metrics=['accuracy'])
# Train
logs = model.fit(x_train, y_train,
batch_size=batch_size,
epochs=epochs,
verbose=2,
validation_split=0.1)
# Plot our losses and accuracy
fig, ax = plt.subplots(1,1)
pd.DataFrame(logs.history).plot(ax=ax)
ax.grid(linestyle='dotted')
ax.legend()
plt.show()
# Assess performance
print('='*80)
print('Assessing the test dataset...')
print('='*80)
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
There is an ongoing discussion about whether max-pooling is a good solution for reducing the amount of data between layers of a network. Some recent approaches show that similar, and sometimes better, performance can be achieved by using convolutions with strides larger than 1.
Getting rid of pooling. Many people dislike the pooling operation and think that we can get away without it. For example, Striving for Simplicity: The All Convolutional Net proposes to discard the pooling layer in favor of architecture that only consists of repeated CONV layers. To reduce the size of the representation they suggest using larger stride in CONV layer once in a while. Discarding pooling layers has also been found to be important in training good generative models, such as variational autoencoders (VAEs) or generative adversarial networks (GANs). It seems likely that future architectures will feature very few to no pooling layers.
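As a sketch of the idea (the filter count and kernel size here are assumptions), a convolution with stride 2 can take over the downsampling role of a pooling layer:
from keras.layers import Input, Conv2D

inputs = Input(shape=(28, 28, 1))
# A stride of 2 halves the spatial resolution, much like 2x2 max-pooling would
x = Conv2D(32, (3, 3), strides=(2, 2), padding='same', activation='relu')(inputs)
# x now has spatial shape 14x14 with 32 feature maps, without any pooling layer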
Implement a convolutional neural network without pooling layers:
In [ ]:
def net_2(sample_shape, nb_classes):
# Define the network input to have `sample_shape` shape
input_x = None
# Create network internals here
x = None
# Dense number_classes
probabilities = Dense(nb_classes, activation='softmax')(x)
# Define the output
model = Model(inputs=input_x, outputs=probabilities)
return model
In [ ]:
# Shape of sample
sample_shape = x_train[0].shape
# Construct net
model = net_2(sample_shape, 10)
model.summary()
As before, we need to define some hyperparameters and train the network. Feel free to reuse the hyperparameters you found before.
In [ ]:
# Define hyperparameters
batch_size = None
epochs = None
### Do *not* modify the following lines ###
# As always we need to compile our model
model.compile(loss='categorical_crossentropy',
optimizer='Adadelta',
metrics=['accuracy'])
# Train
logs = model.fit(x_train, y_train,
batch_size=batch_size,
epochs=epochs,
verbose=2,
validation_split=0.1)
# Plot our losses and accuracy
fig, ax = plt.subplots(1,1)
pd.DataFrame(logs.history).plot(ax=ax)
ax.grid(linestyle='dotted')
ax.legend()
fig.canvas.draw()
# Assess performance
print('='*80)
print('Assessing the test dataset...')
print('='*80)
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
The following explanation of CIFAR-10 comes from the official CIFAR page:
The CIFAR-10 and CIFAR-100 are labeled subsets of the 80 million tiny images dataset. They were collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton.
The CIFAR-10 dataset consists of 60000 $32 \times 32$ colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.
Here are the classes in the dataset (the original page also shows 10 random images from each class): airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.
In the following example we will load data from CIFAR10.
In [ ]:
from keras.datasets import cifar10
# The data, shuffled and split between train and test sets:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
target_2_class = {0:'airplane',
1:'automobile',
2:'bird',
3:'cat',
4:'deer',
5:'dog',
6:'frog',
7:'horse',
8:'ship',
9:'truck'}
# Code to plot the training sample at index 5.
fig,ax1 = plt.subplots(1,1, figsize=(7,7))
ax1.imshow(x_train[5])
target = y_train[5][0]
title = 'Target is {} - Class {}'.format(target_2_class[target], target)
ax1.set_title(title)
ax1.grid(which='major')
ax1.xaxis.set_major_locator(MaxNLocator(32))
ax1.yaxis.set_major_locator(MaxNLocator(32))
fig.canvas.draw()
time.sleep(0.1)
print('Shape of x_train {}'.format(x_train.shape))
print('Shape of y_train {}'.format(y_train.shape))
print('Shape of x_test {}'.format(x_test.shape))
print('Shape of y_test {}'.format(y_test.shape))
In [ ]:
y_train = None
y_test = None
### Do *not* modify the following line ###
# Print data sizes
print('Shape of x_train {}'.format(x_train.shape))
print('Shape of y_train {}'.format(y_train.shape))
print('Shape of x_test {}'.format(x_test.shape))
print('Shape of y_test {}'.format(y_test.shape))
In [ ]:
x_train = None
x_test = None
In [ ]:
# Shape of samples
sample_shape = x_train[0].shape
# Construct net
model = None
model.summary()
# We need to compile our model:
model.compile(loss='categorical_crossentropy',
optimizer='Adam',
metrics=['accuracy'])
In [ ]:
# Build the code within this cell
Remember to use the Flatten layer.
In [ ]: