Featuring TensorFlow (TFlow).
We'll be classifying MNIST data, which is a set of ~70,000 images of handwritten digits. Bear in mind, this is a solved problem, so we're not doing anything novel.
What you should leave with: You should leave here with a practical understanding of how to implement an Artificial Neural Network (ANN) from nothing. The concepts don't change when you move to different domains, only the way in which you apply them. Your understanding of the central concept of ANNs, backpropagation (backprop), should be well founded, and given some more practice, you could explain it to a friend.
You should also leave here with a minimal understanding of TensorFlow and how using such a library can speed up your model development, as well as an understanding of some of its drawbacks.
In [1]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("data/", one_hot=True)
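Just to sanity-check the download, you can peek at the shapes: the loader splits the ~70,000 images into training, validation, and test sets, flattens each 28x28 image into 784 values, and (because we passed one_hot=True) encodes each label as a 10-element one-hot vector. A quick, optional inspection cell:
In [ ]:
print(mnist.train.images.shape)  # (55000, 784): flattened 28x28 pixel images
print(mnist.train.labels.shape)  # (55000, 10): one-hot labels
print(mnist.test.images.shape)   # (10000, 784)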
We're going to build an Artificial Neural Network (ANN) from the ground up, using raw Python, NumPy, and SciPy.
We'll build the ANN from the ground up to give you an intuition for how one goes about creating an ANN. A network built this way can run faster than one built with TensorFlow, Torch, or another library, but those libraries make building models much simpler (which you'll see later).
In [2]:
import numpy as np
from scipy.special import expit
We're going to build an ANN class, called NeuralNetwork, which will contain two functions and an initializer.
The functions are: train(...) and query(...). The ... is because we don't yet know exactly what we should be passing to these functions.
NOTE: We're going to build the functions that go into the class one step at a time, so that this commentary can sit in between them. Once we've built the functions, we'll copy-paste them into the class definition and run with it from there.
The __init__(...) is almost like a constructor. Essentially, we use it to set up some instance variables so that we don't have to pass the ANN's configuration to every function we call.
This function should take a few values and store them on the class: the number of input nodes, hidden nodes, and output nodes, and the learning rate.
In [3]:
def __init__():
pass
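If you'd like to peek ahead, here's one possible sketch of __init__ (written with self, since it will end up as a method of NeuralNetwork). It assumes we store the node counts and the learning rate, draw the two weight matrices from small random values, and use the sigmoid (expit, imported above) as the activation function; that's one reasonable design, not the only one.
In [ ]:
def __init__(self, n_inodes, n_hnodes, n_onodes, learn_rt):
    # remember the network's shape and learning rate
    self.n_inodes = n_inodes
    self.n_hnodes = n_hnodes
    self.n_onodes = n_onodes
    self.learn_rt = learn_rt

    # weight matrices, input->hidden and hidden->output, drawn from a normal
    # distribution scaled by 1/sqrt(number of incoming links)
    self.w_ih = np.random.normal(0.0, n_inodes ** -0.5, (n_hnodes, n_inodes))
    self.w_ho = np.random.normal(0.0, n_hnodes ** -0.5, (n_onodes, n_hnodes))

    # sigmoid activation function (scipy.special.expit)
    self.activation = expit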
The query(...) function should enable us to talk to the ANN and ask it to classify some images we hand it.
We write this function before train(...) because it's less complex: it amounts to a single forward pass through the network. Writing it first should help ground the ideas we'll need when we implement train(...).
In [4]:
def query():
pass
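Again, as a possible sketch (assuming the __init__ above), query(...) just takes an image's pixel values and runs the forward pass: multiply by the input-to-hidden weights, squash with the activation function, then repeat for the hidden-to-output weights.
In [ ]:
def query(self, inputs_list):
    # turn the flat input into a column vector of shape (n_inodes, 1)
    inputs = np.array(inputs_list, ndmin=2).T

    # forward pass: input -> hidden -> output
    hidden_outputs = self.activation(np.dot(self.w_ih, inputs))
    final_outputs = self.activation(np.dot(self.w_ho, hidden_outputs))
    return final_outputs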
The train(...) function is how the ANN learns. We'll hand it our data along with the labels, and it will teach itself by completing forward passes and then updating the weights through backprop.
We'll need to hand this function an input image (flattened into an array of pixel values) and its one-hot label.
Now that we've given our network data to train on, we need to implement the forward pass, followed by the backward pass. Recall that the backward pass involves a few stages. First, we need to calculate the output error, then distribute that error backwards across the network. This will update our weights, but the update will be moderated by the learning rate, which we specified earlier in __init__().
In [5]:
def train():
pass
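Here's a sketch of what train(...) could look like, given the __init__ and query sketches above. It does a forward pass, computes the output error, propagates that error back to the hidden layer, and applies the weight updates moderated by the learning rate. Treat it as one workable implementation, not the definitive one.
In [ ]:
def train(self, inputs_list, targets_list):
    # column vectors for the inputs and the one-hot targets
    inputs = np.array(inputs_list, ndmin=2).T
    targets = np.array(targets_list, ndmin=2).T

    # forward pass (same as query)
    hidden_outputs = self.activation(np.dot(self.w_ih, inputs))
    final_outputs = self.activation(np.dot(self.w_ho, hidden_outputs))

    # backward pass: output error, then the share of it owed to the hidden layer
    output_errors = targets - final_outputs
    hidden_errors = np.dot(self.w_ho.T, output_errors)

    # weight updates, moderated by the learning rate
    self.w_ho += self.learn_rt * np.dot(
        output_errors * final_outputs * (1.0 - final_outputs), hidden_outputs.T)
    self.w_ih += self.learn_rt * np.dot(
        hidden_errors * hidden_outputs * (1.0 - hidden_outputs), inputs.T)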
In [6]:
class NeuralNetwork():
    # Paste the finished __init__(...), query(...), and train(...) functions in
    # here as methods; the sketches above show one possible implementation.
    pass
Now we'll move on to training the network on MNIST, but to do so, we need to specify some parameters.
Recall that these images are 28x28 pixels, which gives a total of 784 inputs. We ultimately need to classify each image into one of 10 classes, since we're recognizing the digits 0-9. The size of the hidden layer is rather arbitrary, so we can use just about any number of hidden nodes we want.
In [7]:
n_inodes = 784  # one input per pixel (28x28)
n_hnodes = 200  # arbitrary choice; feel free to experiment with this
n_onodes = 10   # one output per digit class
learn_rt = 0.1  # also somewhat arbitrary; smaller is slower but steadier
nn = NeuralNetwork(n_inodes, n_hnodes, n_onodes, learn_rt)
We've initialized the ANN, so now we need to actually train it. We'll train over a number of epochs; an epoch is simply one full pass over the training data, and each additional pass gives the network another chance to refine its weights.
In [8]:
epochs = 5
for e in range(epochs):
    for record, label in zip(mnist.train.images, mnist.train.labels):
        # bump exact zeros up to 0.01: zero-valued pixels produce zero weight
        # updates, and a target of 0 can never quite be reached by the sigmoid
        record[record == 0.0] = 0.01
        label[label == 0.0] = 0.01
        nn.train(record, label)
In [9]:
score = []
for record, label in zip(mnist.test.images, mnist.test.labels):
    correct_label = np.argmax(label)
    # rescale pixel values from [0, 1] into [0.01, 1.0] so no input is exactly zero
    inputs = record * 0.99 + 0.01
    outputs = nn.query(inputs)
    predicted_label = np.argmax(outputs)
    score.append(1 if predicted_label == correct_label else 0)
In [10]:
print("Performance = {0:.3f}%".format(np.array(score).mean() * 100))
Now we've built our first ANN. This is a pretty small one compared to some that exist in the depths of the interwebs, but ultimately it's a start.
This network's accuracy is about 97%, which is well short of the state of the art on MNIST, but for your first network, that's pretty awesome.
Now let's build the same sort of ANN, but this time in TensorFlow. You can always build your own networks and models from scratch, but one of the benefits of using a platform like TensorFlow is that it enables you to use others' models, as well as allowing others to use your own.
In [11]:
import tensorflow as tf
In TensorFlow, a placeholder is a promise that we'll provide a value later; it's akin to declaring a variable, but not initializing it.
In [12]:
x = tf.placeholder(tf.float32, [None, 784])
Variables are what you're used to from programming; in TensorFlow, these are the "trainable parameters" that get added to your Graph. They're akin to declaring and initializing a variable: each one has a dtype and an initial value.
In [13]:
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
Softmax is one of many different activation functions. Softmax is defined as:
$$\text{softmax}(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}$$
We use softmax because our ultimate goal is to assign to each input a probability of belonging to each of the 10 digit classes. It gives us a list of values in [0, 1] that also sum to 1.
In other words, softmax lets us spread probabilities between 0 and 1 across the outputs we care about. So, if we had 4 outputs to choose from, softmax could produce something like [0.1, 0.2, 0.4, 0.3], and our classifier would choose whatever class index 2 represents (the 0.4 entry, the largest).
Softmax is composed of two steps: first exponentiate the inputs (so everything is positive), then normalize those exponentials so they sum to 1.
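To make the two steps concrete, here's a tiny NumPy version of softmax (illustrative only; the next cell uses TensorFlow's built-in tf.nn.softmax instead):
In [ ]:
def softmax(z):
    # step 1: exponentiate (shifting by the max for numerical stability)
    exps = np.exp(z - np.max(z))
    # step 2: normalize so the outputs sum to 1
    return exps / exps.sum()

softmax(np.array([1.0, 2.0, 3.0, 2.5]))  # a length-4 probability vector summing to 1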
In [14]:
y = tf.nn.softmax(tf.matmul(x, W) + b)
In [15]:
y_ = tf.placeholder(tf.float32, [None, 10])
As always, we need a loss function, like SSE (sum of squared errors) in the hand-built ANN. Cross-entropy arises from thinking about information-compressing codes in information theory, but it winds up being an important idea in lots of areas, from gambling to machine learning.
In some rough sense, the cross-entropy is measuring how inefficient our predictions are for describing the truth.
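Concretely, for the true one-hot distribution $y'$ and our predicted distribution $y$, the cross-entropy is
$$H_{y'}(y) = -\sum_i y'_i \log(y_i)$$
which is exactly what the next cell computes, averaged over the examples in the batch.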
In [16]:
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
We minimize the cross-entropy with plain gradient descent at a learning rate of 0.1; each run of train_step nudges W and b a small step in the direction that reduces the loss.
In [17]:
train_step = tf.train.GradientDescentOptimizer(0.1).minimize(cross_entropy)
This is how we actually interact with the graph we've built: we launch a Session and then initialize the Variables we declared.
In [18]:
sess = tf.InteractiveSession()
In [19]:
tf.global_variables_initializer().run()
Each iteration below grabs a batch of 100 training examples and runs one gradient-descent step on it; training on small batches like this is known as stochastic gradient descent.
In [20]:
for _ in range(10000):
batch_xs, batch_ys = mnist.train.next_batch(100)
sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
To evaluate the model, we check whether the predicted class (the argmax of y) matches the true class (the argmax of y_); the accuracy is then just the mean of those matches cast to floats.
In [21]:
correct_pred = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
In [22]:
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
In [23]:
print("Performance = {0:.3f}%".format(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}) * 100))
We've now rebuilt a network in TensorFlow, though a simpler one: a single softmax layer with no hidden layer. Its accuracy is about 91%, which is pretty bad for MNIST, but for such a simple model, it's still pretty awesome.
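For reference, here's the whole TensorFlow pipeline collected into a single cell (it assumes the mnist dataset has already been loaded as in the first cell):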
In [ ]:
import tensorflow as tf
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.1).minimize(cross_entropy)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
for _ in range(10000):
batch_xs, batch_ys = mnist.train.next_batch(100)
sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
correct_pred = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
print("Performance = {0:.3f}%".format(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}) * 100))