MNIST Demo


Load package

Remember to start your Julia session with multiple threads to speed up training. Here I'm using 4.
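
Note that the thread count is fixed when Julia starts, so it needs to be set before launching the session, for example (from a Unix-like shell) via the JULIA_NUM_THREADS environment variable that Base.Threads reads at startup:

JULIA_NUM_THREADS=4 julia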


In [1]:
using Alice

In [2]:
Base.Threads.nthreads()


Out[2]:
4

Load the data

The MNIST database is a collection of handwritten digits and labels drawn from a larger collection originally created by NIST (National Institute of Standards and Technology). It contains digits written by American Census Bureau employees and high school students.

The standard set now used for ML benchmarking was compiled by Yann LeCun (http://yann.lecun.com/exdb/mnist/), and there is a Julia package, MNIST.jl, that makes it easy to access the data.


In [3]:
using MNIST
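
As a quick first look, individual examples can be pulled out with the package's accessors (a small sketch, assuming the trainlabel accessor alongside the trainfeatures function used in the plotting helper below):

x = trainfeatures(1)    # 784-element vector of raw pixel intensities (0 to 255)
y = trainlabel(1)       # the corresponding digit label
size(x), y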

Visualising the data


In [4]:
function display_mnist(; flipink = false)
    rows, cols = 16, 32
    num_images = rows * cols
    selection = rand(1:60000, num_images)            # Random selection of image indices
    images = Array(Array{Float64, 2}, 0)             # Initialise blank outer array to contain image arrays
    img = ones(30, 30)                               # Create blank canvas with some padding
    for i in selection
        img[2:29, 2:29] = reshape(trainfeatures(i), 28, 28) ./ 255    # Reshape each image to a 28x28 matrix and place it in the canvas
        push!(images, copy(img))                     # Push single image into images vector
    end
    images = hvcat(cols, images...)                  # Concatenate individual image matrices into a large matrix
    
    # Display image with either white or black ink with opposite background
    flipink ? Gray.(1.0 - images) : Gray.(images)
end;

In [5]:
display_mnist()


Out[5]:
[Image: a 16 x 32 grid of randomly selected MNIST digits]

In [6]:
display_mnist(flipink = true)


Out[6]:
[Image: the same grid with ink and background colours inverted]

Training, validation and test data sets

The original data comes in two separate sets - 60,000 training and 10,000 test observations. Here we'll train on the first 50,000 images in the original training set and hold out the other 10,000 as a validation set. The test set can stay as it is.

The only pre-processing will be to scale the pixel inputs to the range [0, 1], reshape each image into a 28x28 array (in the provided set each image is a single 784-element column), and relabel the digit 0 as class 10 so that the class labels run from 1 to 10.


In [7]:
# Download training images and labels from MNIST package
train_images, train_labels = traindata()
test_images, test_labels = testdata()

# Rescale and reshape
train_images = train_images ./ 255
train_images = reshape(train_images, 28, 28, 60000)
test_images = test_images ./ 255
test_images = reshape(test_images, 28, 28, 10000)

# Convert target to integer (from float) type and swap label 0 for 10
train_labels = Int.(train_labels)
train_labels[train_labels .== 0] = 10
test_labels = Int.(test_labels)
test_labels[test_labels .== 0] = 10

# Split training set into 50,000 training and 10,000 validation images

# Images
val_images = train_images[:, :, 50001:60000]
train_images = train_images[:, :, 1:50000]

# Labels
val_labels = train_labels[50001:60000]
train_labels = train_labels[1:50000];
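
As an optional sanity check that the arrays have the expected shapes, the pixels lie in [0, 1] and the labels now run from 1 to 10:

@show size(train_images)            # expect (28, 28, 50000)
@show size(val_images)              # expect (28, 28, 10000)
@show extrema(train_images)         # pixel values should lie in [0.0, 1.0]
@show sort(unique(train_labels));   # class labels should be 1, 2, ..., 10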

Build and train models

Net 1 - Shallow and narrow feedforward neural network

Let's get a benchmark result with a feedforward neural network (fully connected hidden layers only) with two hidden layers of 30 neurons each.
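
For a sense of scale, the weights and biases of this 784 → 30 → 30 → 10 architecture add up to just under 25,000 parameters:

# weights + biases for input→fc1, fc1→fc2 and fc2→output
n_params = (784*30 + 30) + (30*30 + 30) + (30*10 + 10)    # 24,790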


In [8]:
# Set seed to be able to replicate
srand(123)

# Data Box and Input Layer
databox = Data(train_images, train_labels, val_images, val_labels)
batch_size = 128
input = InputLayer(databox, batch_size)

# Fully connected hidden layers
dim = 30
fc1 = FullyConnectedLayer(size(input), dim, activation = :tanh)
fc2 = FullyConnectedLayer(size(fc1), dim, activation = :tanh)

# Softmax Output Layer
num_classes = 10
output = SoftmaxOutputLayer(databox, size(fc2), num_classes)

# Model
λ = 1e-3    # Regularisation
net = NeuralNet(databox, [input, fc1, fc2, output], λ, regularisation=:L2)


Out[8]:
Neural Network
Training Data Dimensions - (28,28,50000)
Layers:
Layer 1 - InputLayer{Float64}, Dimensions - (28,28,128)
Layer 2 - FullyConnectedLayer{Float64}, Activation - tanh, Dimensions - (30,128)
Layer 3 - FullyConnectedLayer{Float64}, Activation - tanh, Dimensions - (30,128)
Layer 4 - SoftmaxOutputLayer{Float64,Int64}, Dimensions - (10,128)

In [9]:
# Training parameters
num_epochs = 40    # number of epochs
α = 1e-2           # learning rate
μ = 0.9            # momentum param / viscosity
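# With nesterov=true the update applied to each parameter θ is roughly (a sketch,
# not necessarily Alice's exact implementation):
#   v ← μ*v - α*∇J(θ + μ*v)    # velocity, with the gradient taken at the look-ahead point
#   θ ← θ + v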

# Train
train(net, num_epochs, α, μ, nesterov=true, shuffle=true, last_train_every=2, full_train_every=10, val_every=10)


22:39:39 : Epoch 2, last batch training error (with regⁿ) - 0.386
22:39:43 : Epoch 4, last batch training error (with regⁿ) - 0.267
22:39:47 : Epoch 6, last batch training error (with regⁿ) - 0.289
22:39:51 : Epoch 8, last batch training error (with regⁿ) - 0.319
22:39:57 : Epoch 10, last batch training error (with regⁿ) - 0.219

Coffee break:
Training error (with regⁿ) - 0.213  |  Training accuracy - 96.6
Validation error (without regⁿ) - 0.135  |  Validation accuracy - 96.5

22:40:03 : Epoch 12, last batch training error (with regⁿ) - 0.338
22:40:08 : Epoch 14, last batch training error (with regⁿ) - 0.214
22:40:14 : Epoch 16, last batch training error (with regⁿ) - 0.261
22:40:19 : Epoch 18, last batch training error (with regⁿ) - 0.160
22:40:24 : Epoch 20, last batch training error (with regⁿ) - 0.226

Coffee break:
Training error (with regⁿ) - 0.181  |  Training accuracy - 97.7
Validation error (without regⁿ) - 0.116  |  Validation accuracy - 96.8

22:40:30 : Epoch 22, last batch training error (with regⁿ) - 0.233
22:40:35 : Epoch 24, last batch training error (with regⁿ) - 0.152
22:40:40 : Epoch 26, last batch training error (with regⁿ) - 0.134
22:40:45 : Epoch 28, last batch training error (with regⁿ) - 0.147
22:40:51 : Epoch 30, last batch training error (with regⁿ) - 0.173

Coffee break:
Training error (with regⁿ) - 0.170  |  Training accuracy - 98.0
Validation error (without regⁿ) - 0.109  |  Validation accuracy - 96.9

22:40:57 : Epoch 32, last batch training error (with regⁿ) - 0.147
22:41:02 : Epoch 34, last batch training error (with regⁿ) - 0.184
22:41:07 : Epoch 36, last batch training error (with regⁿ) - 0.172
22:41:12 : Epoch 38, last batch training error (with regⁿ) - 0.191
22:41:17 : Epoch 40, last batch training error (with regⁿ) - 0.145

Completed Training:
Training error (with regⁿ) - 0.163  |  Training accuracy - 98.4
Validation error (without regⁿ) - 0.102  |  Validation accuracy - 97.0

In [10]:
# Plot loss curves
Gadfly.set_default_plot_size(24cm, 12cm)
plot_loss_history(net, 2, 10, 10)


Out[10]:
[Plot: loss history against epoch, showing last batch training loss, full training loss and validation loss]

Net 2 - Convolutional neural network

Next let's train a convolutional neural network with configuration:
input -> convolution layer -> max pool layer -> fully connected layer -> softmax output layer
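
With 5x5 patches, valid convolutions (no padding) and 2x2 max pooling, the activation shapes can be worked out with a little arithmetic (they should match the dimensions in the model summary below):

conv1_dim = 28 - 5 + 1     # 24  →  conv1 activations (24, 24, 20)
pool1_dim = conv1_dim ÷ 2  # 12  →  pool1 activations (12, 12, 20)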


In [11]:
# Set seed to be able to replicate
srand(123)

# Data Box and Input Layer
databox = Data(train_images, train_labels, val_images, val_labels, test_images, test_labels)
batch_size = 128
input = InputLayer(databox, batch_size)

# Convolution Layer
patch_dim = 5
num_patches = 20
patch_dims = (patch_dim, patch_dim, num_patches)
conv1 = ConvolutionLayer(size(input), patch_dims, activation = :relu)

# Pooling Layer
stride = 2
pool1 = MaxPoolLayer(size(conv1), stride)

# Fully Connected Layer
dim = 100
fc1 = FullyConnectedLayer(size(pool1), dim, activation = :relu)

# Softmax Output Layer
num_classes = 10
output = SoftmaxOutputLayer(databox, size(fc1), num_classes)

# Model
λ = 1e-3    # Regularisation
net = NeuralNet(databox, [input, conv1, pool1, fc1, output], λ, regularisation=:L2)


Out[11]:
Neural Network
Training Data Dimensions - (28,28,50000)
Layers:
Layer 1 - InputLayer{Float64}, Dimensions - (28,28,128)
Layer 2 - ConvolutionLayer{Float64}, Activation - relu, Dimensions - (24,24,20,128)
Layer 3 - MaxPoolLayer{Float64,Int64}, Dimensions - (12,12,20,128)
Layer 4 - FullyConnectedLayer{Float64}, Activation - relu, Dimensions - (100,128)
Layer 5 - SoftmaxOutputLayer{Float64,Int64}, Dimensions - (10,128)

In [12]:
# Training parameters
num_epochs = 40    # number of epochs
α = 1e-2           # learning rate
μ = 0.9            # momentum param / viscosity

# Training
train(net, num_epochs, α, μ, nesterov=true, shuffle=true, last_train_every=2, full_train_every=10, val_every=10)


22:43:30 : Epoch 2, last batch training error (with regⁿ) - 0.252
22:44:30 : Epoch 4, last batch training error (with regⁿ) - 0.181
22:45:30 : Epoch 6, last batch training error (with regⁿ) - 0.204
22:46:29 : Epoch 8, last batch training error (with regⁿ) - 0.146
22:47:27 : Epoch 10, last batch training error (with regⁿ) - 0.170

Coffee break:
Training error (with regⁿ) - 0.124  |  Training accuracy - 99.1
Validation error (without regⁿ) - 0.052  |  Validation accuracy - 98.5

22:48:41 : Epoch 12, last batch training error (with regⁿ) - 0.092
22:49:40 : Epoch 14, last batch training error (with regⁿ) - 0.145
22:50:39 : Epoch 16, last batch training error (with regⁿ) - 0.115
22:51:37 : Epoch 18, last batch training error (with regⁿ) - 0.100
22:52:36 : Epoch 20, last batch training error (with regⁿ) - 0.088

Coffee break:
Training error (with regⁿ) - 0.088  |  Training accuracy - 99.6
Validation error (without regⁿ) - 0.045  |  Validation accuracy - 98.7

22:53:49 : Epoch 22, last batch training error (with regⁿ) - 0.093
22:54:49 : Epoch 24, last batch training error (with regⁿ) - 0.084
22:55:48 : Epoch 26, last batch training error (with regⁿ) - 0.088
22:56:47 : Epoch 28, last batch training error (with regⁿ) - 0.075
22:57:46 : Epoch 30, last batch training error (with regⁿ) - 0.062

Coffee break:
Training error (with regⁿ) - 0.072  |  Training accuracy - 99.7
Validation error (without regⁿ) - 0.044  |  Validation accuracy - 98.8

22:59:00 : Epoch 32, last batch training error (with regⁿ) - 0.066
23:00:00 : Epoch 34, last batch training error (with regⁿ) - 0.085
23:00:59 : Epoch 36, last batch training error (with regⁿ) - 0.077
23:01:59 : Epoch 38, last batch training error (with regⁿ) - 0.059
23:02:57 : Epoch 40, last batch training error (with regⁿ) - 0.064

Completed Training:
Training error (with regⁿ) - 0.066  |  Training accuracy - 99.8
Validation error (without regⁿ) - 0.043  |  Validation accuracy - 98.7

In [13]:
plot_loss_history(net, 2, 10, 10)


Out[13]:
[Plot: loss history against epoch, showing last batch training loss, full training loss and validation loss]

Net 3 - Deeper convolutional neural network

Next let's train a deeper convolutional neural network with configuration:
input -> convolution layer -> max pool layer -> convolution layer -> max pool layer -> fully connected layer -> softmax output layer
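
The same shape arithmetic, carried through the second convolution and pooling pair (again matching the model summary below):

conv1_dim = 28 - 5 + 1           # 24  →  (24, 24, 20)
pool1_dim = conv1_dim ÷ 2        # 12  →  (12, 12, 20)
conv2_dim = pool1_dim - 5 + 1    # 8   →  (8, 8, 40)
pool2_dim = conv2_dim ÷ 2        # 4   →  (4, 4, 40)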


In [14]:
# Set seed to be able to replicate
srand(123)

# Data Box and Input Layer
databox = Data(train_images, train_labels, val_images, val_labels, test_images, test_labels)
batch_size = 128
input = InputLayer(databox, batch_size)

# Convolution Layer 1
patch_dim = 5
num_patches1 = 20
patch_dims = (patch_dim, patch_dim, num_patches1)
conv1 = ConvolutionLayer(size(input), patch_dims, activation = :relu)

# Pooling Layer 1
stride = 2
pool1 = MaxPoolLayer(size(conv1), stride)

# Convolution Layer 2
patch_dim = 5
num_patches2 = 40
patch_dims = (patch_dim, patch_dim, num_patches1, num_patches2)
conv2 = ConvolutionLayer(size(pool1), patch_dims, activation = :relu)

# Pooling Layer 2
stride = 2
pool2 = MaxPoolLayer(size(conv2), stride)

# Fully Connected Layer
dim = 100
fc1 = FullyConnectedLayer(size(pool2), dim, activation = :relu)

# Softmax Output Layer
num_classes = 10
output = SoftmaxOutputLayer(databox, size(fc1), num_classes)

# Model
λ = 1e-3    # Regularisation
net = NeuralNet(databox, [input, conv1, pool1, conv2, pool2, fc1, output], λ, regularisation=:L2)


Out[14]:
Neural Network
Training Data Dimensions - (28,28,50000)
Layers:
Layer 1 - InputLayer{Float64}, Dimensions - (28,28,128)
Layer 2 - ConvolutionLayer{Float64}, Activation - relu, Dimensions - (24,24,20,128)
Layer 3 - MaxPoolLayer{Float64,Int64}, Dimensions - (12,12,20,128)
Layer 4 - ConvolutionLayer{Float64}, Activation - relu, Dimensions - (8,8,40,128)
Layer 5 - MaxPoolLayer{Float64,Int64}, Dimensions - (4,4,40,128)
Layer 6 - FullyConnectedLayer{Float64}, Activation - relu, Dimensions - (100,128)
Layer 7 - SoftmaxOutputLayer{Float64,Int64}, Dimensions - (10,128)

In [15]:
# Training parameters
num_epochs = 20    # number of epochs
α = 1e-2           # learning rate
μ = 0.9            # momentum param / viscosity

# Training
train(net, num_epochs, α, μ, nesterov=true, shuffle=true, last_train_every=2, full_train_every=10, val_every=10)


23:08:56 : Epoch 2, last batch training error (with regⁿ) - 0.241
23:13:53 : Epoch 4, last batch training error (with regⁿ) - 0.170
23:18:47 : Epoch 6, last batch training error (with regⁿ) - 0.155
23:23:41 : Epoch 8, last batch training error (with regⁿ) - 0.139
23:28:36 : Epoch 10, last batch training error (with regⁿ) - 0.179

Coffee break:
Training error (with regⁿ) - 0.142  |  Training accuracy - 99.0
Validation error (without regⁿ) - 0.049  |  Validation accuracy - 98.5

23:34:18 : Epoch 12, last batch training error (with regⁿ) - 0.121
23:39:12 : Epoch 14, last batch training error (with regⁿ) - 0.118
23:44:06 : Epoch 16, last batch training error (with regⁿ) - 0.104
23:49:00 : Epoch 18, last batch training error (with regⁿ) - 0.109
23:53:55 : Epoch 20, last batch training error (with regⁿ) - 0.118

Completed Training:
Training error (with regⁿ) - 0.104  |  Training accuracy - 99.3
Validation error (without regⁿ) - 0.043  |  Validation accuracy - 98.9

In [16]:
Gadfly.set_default_plot_size(24cm, 12cm)
plot_loss_history(net, 2, 10, 10)


Out[16]:
[Plot: loss history against epoch, showing last batch training loss, full training loss and validation loss]

Best test set accuracy

It should be possible to reach around 99.2% test accuracy with a neural network that has two convolution and pooling layers. This net isn't quite there, so it would be worth fine-tuning to push it over 99%.


In [17]:
accuracy(net, test_images, test_labels)


Out[17]:
98.99839743589743
