Title: Artificial Neural Network in Keras Slug: ann-mnist Summary: A Feed Forward Artificial Neural Network in Keras using the MNIST Digits Dataset Date: 2017-01-03 19:23 Category: Neural Networks Tags: Basics Authors: Thomas Pinder

Introduction

The Keras library is a very popular library for implementing a range of different types of neural networks. In this guide we'll focus on a fully connected neural network using the popular MNIST dataset. We'll also be validating our results using k-fold cross validation: the gold standard for assessing a model's performance. By the end of this guide you should be comfortable doing the following:

  • Implementing a fully connected neural network
  • Applying different activation functions
  • Adding and removing layers from your network
  • Interpreting your cross validated results

If you're not familiar with activation functions already you may wish to brush up on these here.

Preliminaries

Before we can do any modeling we must first load in the necessary libraries and the MNIST data. Fortunately, the MNIST data is provided as part of the Keras library and is around 15 MB. Installation details for Keras can be found here; in most cases the command pip install keras should work, provided you do not need any of Keras' optional dependencies.


In [1]:
%matplotlib inline
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical
import numpy as np
import matplotlib.pyplot as plt

# Load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
print(X_train.shape)
print(y_train.shape)


Using TensorFlow backend.
(60000, 28, 28)
(60000,)

For those of you unfamiliar with the MNIST dataset, it is a set of 70,000 28x28 pixel images depicting handwritten digits from 0-9. It is a commonly used dataset within image classification tutorials due to its clean nature and uniform structure, meaning that very little time must be spent wrangling the data.

Preprocessing

To get a feel for the data, we can first plot an image to physically see what we're modeling:


In [2]:
plt.imshow(X_train[123], cmap=plt.get_cmap('gray'))


Out[2]:
<matplotlib.image.AxesImage at 0x7f703f774240>

This may actually be a harder observation for our network to classify: it is most likely a 7, but it could in fact be a 2 or 3 with the lower segment cropped. In any case, this gives us a feel for the type of images that we are modeling.
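If you'd like to settle the guess, a quick sanity check (output not shown here) is to print the corresponding label, which at this point is still a plain digit:


In [ ]:
print(y_train[123])  # the true digit for the image plotted above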

As can be seen from the dimension of the X_train dataset, we currently have a 3-dimensional array. This array needs to be flattened prior to passing the data through our neural network. This can be achieved through the NumPy function reshape() which does what it says on the tin - takes an array and reshapes it whilst preserving the original data.


In [3]:
def array_reshape(X, output_size):
    # Flatten each 28x28 image into a single row of 784 pixel values
    return X.reshape(output_size[0], output_size[1])

X_train = array_reshape(X_train, [60000, 28**2])
X_test = array_reshape(X_test, [10000, 28**2])

With the data reshaped, the only remaining steps are to standardise the data to have mean 0 and standard deviation 1, and to encode the output variable. Whilst not strictly necessary, as each column holds values within the same range (0 to 255), it will nonetheless make backpropagation more efficient and avoid saturation. If you're interested in reading more on this then I recommend these two articles here and here. Whilst sklearn's preprocessing module provides functions for this, we'll implement our own here just to demonstrate the point more verbosely.


In [4]:
def standardise_array(X, multiplier=1):
    X = (X-np.mean(X))/(multiplier*np.std(X))
    return X

X_train = standardise_array(X_train)
X_test = standardise_array(X_test)

The argument for a multiplier has been included here as you may wish to divide through by twice the standard deviation, as is the recommendation here. As for encoding the labels, we're going to apply one-hot encoding to them. What this means is that a k-level variable is transformed into k individual boolean columns; in our case k=10, so 10 new columns are created, the first indicating whether the image is a zero or not, the second whether it is a one, and so forth. You may be wondering why we must do this additional step, and the reason is simply that if we do not, the model will assume the outputs are ordered, so 1 is greater than 0 and so forth, often leading to poor predictions from the model. This one-hot encoding can be achieved through the Keras function to_categorical().


In [5]:
y_train_enc = to_categorical(y_train)
y_test_enc = to_categorical(y_test)
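As a quick sanity check (output not shown in the original notebook), you can compare a raw label against its encoded counterpart; the encoded version is a length-10 vector with a single 1 in the position of the digit:


In [ ]:
print(y_train[0])      # original digit label
print(y_train_enc[0])  # one-hot encoded vector of length 10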

Building The Model

With the data now correctly preprocessed, we can begin building and training our neural network. In its simplest form, Keras works by initially defining your network type and from there building layers into the network. With this defined, the model can be compiled with the addition of a loss and an optimisation function. In our case we're going to build a simple fully-connected model with two hidden layers. The term fully-connected means that every pair of nodes in adjacent layers is connected. The network type being used here is a Sequential network, meaning that the entire network is a linear stack of layers. The two hidden layers utilise the ReLU activation function (discussed in greater depth [here](runningthenumbers.co.uk/neural-networks/activation-functions.html)), along with the softmax function in the final layer. With all that said, let's get into building our network.


In [6]:
model = Sequential()
model.add(Dense(units = 32, activation = "relu", input_dim = 28**2, kernel_initializer = "normal"))
model.add(Dense(units = 12, activation = "relu", kernel_initializer='normal'))
model.add(Dense(units = 10, activation = "softmax", kernel_initializer='normal'))
model.compile(loss = "categorical_crossentropy",
             optimizer = "adam",
             metrics = ["accuracy"])

The kernel_initializer argument initialises the sets of weights between each layer to be random draws from the normal distribution. When you build your own model you should not be too prescriptive in your choices of units and activation functions in the hidden layers; instead, try out different sizes and functions, as in the sketch below. The input_dim argument in the first layer and the units argument in the final layer should be kept constant, however, as these are dictated by the shape of the data.
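As an illustration of the kind of experimentation I mean (a sketch only, not the model used in this guide), here is a deeper variant with an extra hidden layer and a tanh activation. Note that the input_dim of the first layer and the 10 output units stay fixed, as they are determined by the data:


In [ ]:
# Hypothetical variant for experimentation: an extra hidden layer and a tanh activation.
# Layer sizes here are arbitrary choices; only input_dim and the final 10 units are fixed.
alt_model = Sequential()
alt_model.add(Dense(units=64, activation="relu", input_dim=28**2, kernel_initializer="normal"))
alt_model.add(Dense(units=32, activation="tanh", kernel_initializer="normal"))
alt_model.add(Dense(units=16, activation="relu", kernel_initializer="normal"))
alt_model.add(Dense(units=10, activation="softmax", kernel_initializer="normal"))
alt_model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])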

Training

With a model compiled, it can now be trained. In Keras this is a very intuitive process, with the user only needing to define the training data, the number of epochs (complete forward and backward propagation passes over the training data) and the batch size (the number of samples to pass through the network before the weights are updated). Generally the more epochs the better, although there will be diminishing returns at some point. I generally start with 50 epochs and increase the number if necessary. As for batch size, 32 is generally considered a good starting point; a larger batch size will result in faster training but slower convergence, with the converse being true for smaller batch sizes. It is important to train the model using batches, as failing to do so can result in excessive memory usage.


In [7]:
np.random.seed(123)
model.fit(X_train, y_train_enc, epochs=50, batch_size=32, verbose=0)


Out[7]:
<keras.callbacks.History at 0x7f703bed1828>

You'll notice in the above code snippet that verbose=0. I have done this to keep the tutorial clean, however you may want to set verbose=1 when running yours, as you'll get useful output regarding the model's accuracy. Should you re-run this exact guide, you'll notice that the model's accuracy on the training data begins to plateau at around 40 epochs, which you can visualise with the sketch below.
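If you want to see that plateau for yourself, one option (a sketch, not part of the original run) is to keep the History object returned by fit() and plot the training accuracy per epoch. Depending on your Keras version the metric is stored under the key "acc" or "accuracy", so the snippet below checks for both:


In [ ]:
# Sketch: keep the History object and plot training accuracy per epoch.
# Note this continues training the already-fitted model; build a fresh model
# via Sequential() if you want the curve to start from scratch.
history = model.fit(X_train, y_train_enc, epochs=50, batch_size=32, verbose=0)
acc = history.history.get("acc", history.history.get("accuracy"))
plt.plot(range(1, len(acc) + 1), acc)
plt.xlabel("Epoch")
plt.ylabel("Training accuracy")
plt.show()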

Testing

With a model trained, the final step is to make some predictions on our testing data and assess the true accuracy of the model by running it on unseen data.


In [8]:
accuracy = model.evaluate(X_test, y_test_enc, verbose=0)
print("The model has {}% accuracy on unseen testing data".format(np.round(accuracy[1]*100, 1)))


The model has 96.6% accuracy on unseen testing data

An accuracy as high as this is by all standards very good; however, some tuning of our network could increase it further, although that is beyond the scope of this guide.
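Beyond the aggregate accuracy, it can be informative to look at individual predictions. As a small sketch (output not shown), model.predict() returns the softmax probabilities for each class, and taking the argmax of each row gives the predicted digit, which can then be compared against the raw y_test labels:


In [ ]:
# Predicted class probabilities for the first five test images
probabilities = model.predict(X_test[:5])
predicted_digits = np.argmax(probabilities, axis=1)
print(predicted_digits)  # predicted labels
print(y_test[:5])        # true labels (raw, un-encoded)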

K-Fold Cross-Validation

As discussed in the introduction to this guide, the true accuracy of this network will be assessed through cross-validation. To implement this, we can make use of the KFold function from the sklearn library. K-fold cross-validation works by dividing the data up into k partitions, using k-1 of the partitions as training data and the final partition as testing data. This process is repeated k times, with each of the partitions being used once as the testing data. It has been found that k=10 gives the best balance of variance and bias in cross-validation, so it will be used here. To do this in Keras we must wrap our model in a function. We will also have to stack the training and testing data, as cross-validation is run on the entire dataset. Cross-validation can be quite time consuming, as a new model must be defined, compiled and trained for each data partition; however, this should give you a chunk of code with everything needed to run your own neural network later.


In [9]:
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from keras.wrappers.scikit_learn import KerasClassifier

# Set the seed again for reproducibility
np.random.seed(123)

# Create our entire arrays
X = np.vstack((X_train, X_test))
y = np.vstack((y_train_enc, y_test_enc))

# Wrap our model inside a function
def mnist_nn():
    model = Sequential()
    model.add(Dense(units = 32, activation = "relu", input_dim = 28**2, kernel_initializer = "normal"))
    model.add(Dense(units = 12, activation = "relu", kernel_initializer='normal'))
    model.add(Dense(units = 10, activation = "softmax", kernel_initializer='normal'))
    model.compile(loss = "categorical_crossentropy",
                 optimizer = "adam",
                 metrics = ["accuracy"])
    return model

# Place a scikit-learn wrapper around our Keras network
clf = KerasClassifier(build_fn=mnist_nn, epochs=50, batch_size=32, verbose=0)

# Define the number of folds; shuffle so that random_state actually takes effect
folds = KFold(n_splits=10, shuffle=True, random_state=123)

# Run Cross-validation
accuracies = cross_val_score(clf, X, y, cv=folds)

# Obtain our final model's metrics
final_accuracy = np.mean(accuracies)
s_error = np.std(accuracies*100)/np.sqrt(10)
print("Final Accuracy: {}, with standard error: {}".format(np.round(final_accuracy*100,1), np.round(s_error, 1)))


Final Accuracy: 96.5, with standard error: 0.1

Prior to researching this article, I was unaware of a way to run cross-validation with sklearn on a Keras neural network and had always been forced to implement it manually. Fortunately, during my research I stumbled across a post by Dr. Jason Brownlee at Machine Learning Mastery which you can find here. I highly recommend checking out his blog as he provides really great, in-depth articles and discussions around current machine learning techniques.

Conclusion

So there you have it: in this guide we've successfully applied a neural network to the MNIST dataset and assessed the network's performance using both a held-out test set and cross-validation. Additionally, you've seen how to standardise features, one-hot encode labels and flatten NumPy arrays.

If you're interested in seeing more real-world applications of neural networks, then I suggest reading Krizhevsky et al. here, and for an even deeper understanding of neural networks and deep learning, the book Deep Learning by Ian Goodfellow, Yoshua Bengio and Aaron Courville, which is freely available in HTML here.

The source code for this post is available here.
