TensorFlow Basics

Note: If running this notebook directly, make sure you are running your Jupyter kernel in an environment with TensorFlow installed. Some useful packages to install/import first, if you didn't in notebook 00A:


In [1]:
# Beginning a line with "!" allows you to execute a command in your terminal
# You'll need to install these packages for certain lines of this notebook if you don't already have them.
!conda install -y matplotlib
!pip install -U tqdm


Fetching package metadata .......
Solving package specifications: ..........

# All requested packages already installed.
# packages in environment at C:\Users\kevin_000\Anaconda3\envs\tensorflow:
#
matplotlib                2.0.2               np113py36_0  
Requirement already up-to-date: tqdm in c:\users\kevin_000\anaconda3\envs\tensorflow\lib\site-packages

In [2]:
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt
from tqdm import trange

Basic TensorFlow test from the installation instructions


In [3]:
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))


b'Hello, TensorFlow!'

Define a simple TensorFlow graph

The example from the TensorFlow Introduction slides:


In [4]:
a = tf.constant(3.0, dtype=tf.float32)
b = tf.constant(4.0, dtype=tf.float32)
sum_a_b = tf.add(a, b)

# Using the Python "with" as a context manager
with tf.Session() as sess:
    print(sess.run(sum_a_b)) # Prints "7.0" to the screen


7.0

The previous graph only produces a constant output, which isn't particularly interesting. To create a similar graph that can accept variable inputs:


In [5]:
a = tf.placeholder(tf.float32)
b = tf.placeholder(tf.float32)
sum_a_b = tf.add(a, b)

with tf.Session() as sess:
    feed_dict = {a: 3.0, b: 4.0}
    print('1st Result: {0}'.format(sess.run(sum_a_b, feed_dict=feed_dict)))
    
    feed_dict = {a: 2015, b: 2020}
    print('2nd Result: {0}'.format(sess.run(sum_a_b, feed_dict=feed_dict)))


1st Result: 7.0
2nd Result: 4035.0
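
Placeholders can also be given an explicit shape, which lets TensorFlow catch shape mismatches when data is fed in; a None dimension leaves that axis flexible (for example, the batch size). A small sketch (the names c and d here are just illustrative, not from the original notebook):

# A placeholder with a partially specified shape: any number of rows, 2 columns each
c = tf.placeholder(tf.float32, shape=[None, 2])
d = tf.reduce_sum(c, axis=1)   # row-wise sums

with tf.Session() as sess:
    print(sess.run(d, feed_dict={c: [[1.0, 2.0], [3.0, 4.0]]}))  # [3. 7.]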

MNIST Example

The MNIST dataset is a very popular machine learning dataset, consisting of 70,000 grayscale images of handwritten digits, each 28x28 pixels. We'll be using it as our example for this section of the tutorial, with the goal of predicting which digit is in each image.

Since it's such a common (and small) dataset, TensorFlow has commands for downloading and formatting the dataset conveniently baked in already:


In [6]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)


Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz

Let's take a look at how the data is organized:


In [7]:
# Dataset statistics
print('Training image data: {0}'.format(mnist.train.images.shape))
print('Validation image data: {0}'.format(mnist.validation.images.shape))
print('Testing image data: {0}'.format(mnist.test.images.shape))
print('28 x 28 = {0}'.format(28*28))

print('\nTest Labels: {0}'.format(mnist.test.labels.shape))
labels = np.arange(10)
num_labels = np.sum(mnist.test.labels, axis=0, dtype=np.int)
print('Label distribution:{0}'.format(list(zip(labels, num_labels))))

# Example image
print('\nTrain image 1 is labelled one-hot as {0}'.format(mnist.train.labels[1,:]))
image = np.reshape(mnist.train.images[1,:],[28,28])
plt.imshow(image, cmap='gray')


Training image data: (55000, 784)
Validation image data: (5000, 784)
Testing image data: (10000, 784)
28 x 28 = 784

Test Labels: (10000, 10)
Label distribution:[(0, 980), (1, 1135), (2, 1032), (3, 1010), (4, 982), (5, 892), (6, 958), (7, 1028), (8, 974), (9, 1009)]

Train image 1 is labelled one-hot as [ 0.  0.  0.  1.  0.  0.  0.  0.  0.  0.]
Out[7]:
<matplotlib.image.AxesImage at 0x1e8803f1eb8>
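
Since the labels are one-hot encoded, recovering the digit itself is just an argmax (a small aside, not in the original notebook):

# The position of the 1 in the one-hot vector is the digit class
print(np.argmax(mnist.train.labels[1,:]))  # 3, matching the image above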

Logistic Regression Model

Define the graph input: this is where we feed our training images into the model. Since MNIST digits are pretty small and the model we're using is very simple, we'll feed them in as flat vectors of length 784. The None in the placeholder's shape lets us feed in batches of any size:


In [8]:
# Define input placeholder
x = tf.placeholder(tf.float32, [None, 784])

To get our predicted probabilities for each digit, let's start with the probability of an image being a 3, like the image above. In our simple model, we apply a linear transformation: we multiply each value of the input vector by a weight, sum them all together, and then add a bias. In equation form:

\begin{align} y_3 = \sum_i w_{i,3} x_i + b_3 \end{align}

We'll take the magnitude of this result $y_3$ as being correlated with how likely we think the input digit is a 3: the higher the value of $y_3$, the more likely we think the input image $x$ was a 3 (i.e., we'd hope to get a relatively large value of $y_3$ for the image above). Remember though, our original goal was to identify all 10 digits, so we also have:

\begin{align*} y_0 =& \sum_i w_{i,0} x_i + b_0 \\ &\vdots \\ y_9 =& \sum_i w_{i,9} x_i + b_9 \end{align*}

We can express this in matrix form as:

\begin{align} y = W x + b \end{align}

To put this into our graph in TensorFlow, we need to define some Variables to hold the weights and biases. (Note that in the code below, x holds a batch of row vectors and W has shape [784, 10], so we compute xW + b; this is the same computation as above, just written with the batch dimension first.)


In [9]:
# Define linear transformation
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.matmul(x, W) + b
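
A quick sanity check on the shapes (a hedged aside, not in the original notebook): x is [None, 784] and W is [784, 10], so y is [None, 10], i.e., one score per class for each image in the batch:

# Inspect the static shapes of the tensors we just defined
print(x.get_shape())  # (?, 784)
print(W.get_shape())  # (784, 10)
print(y.get_shape())  # (?, 10)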

We can interpret these values (aka logits) $y$ as probabilities if we normalize them to be positive and add up to 1. In logistic regression, we do this with a softmax:

\begin{align} p(y_i) = \text{softmax}(y_i) = \frac{\text{exp}(y_i)}{\sum_j\text{exp}(y_j)} \end{align}

Notice that because the exponential function always produces positive values, and since we normalize by the sum, the softmax achieves the desired properties of producing values between 0 and 1 that sum to 1. If we look at the case with only 2 classes, we see that the softmax is the multi-class extension of the binary sigmoid function: <img src="Figures/Logistic-curve.png" width="500">
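
To see these properties concretely, here is a quick NumPy check on some arbitrary made-up logits (a hedged sketch, not part of the original notebook):

# Softmax by hand: exponentiate, then normalize by the sum
logits = np.array([2.0, 1.0, 0.1])
probs = np.exp(logits) / np.sum(np.exp(logits))

print(probs)        # approximately [0.659  0.242  0.099]
print(probs.sum())  # 1.0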

Computing a softmax in TensorFlow is pretty easy, sort of*:

*More on this later


In [10]:
# Softmax to probabilities
py = tf.nn.softmax(y)

That defines the forward pass of our model! We now have a graph that, given an input image, returns the probabilities the model assigns to each of the 10 classes. Are we done?

Not quite. We don't know the values of $W$ and $b$ yet. We're going to learn those by defining a loss and using gradient descent to do backpropagation. Essentially, we'll be taking the derivative with respect to each of the elements in $W$ and $b$ and wiggling them in a direction that reduces our loss.

The loss we commonly use in classification is cross-entropy. Cross-entropy is a concept from information theory:

\begin{align} H_{y'}(y)=-\sum_i y'_i \text{log}(y_i) \end{align}

Cross-entropy not only captures how correct the model's answers are (the maximum probability corresponds to the right answer), it also accounts for how confident they are (high probability placed on correct answers). This encourages the model to produce very high probabilities for correct answers while driving down the probabilities for the wrong answers, instead of merely being satisfied that the correct answer is the argmax.
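
As a quick illustration with made-up numbers (not from the notebook): suppose the true digit is a 3. A confident correct prediction is penalized far less than a hesitant one, even though both get the argmax right:

# Cross-entropy for a confident vs. a hesitant (but still correct) prediction
y_true = np.zeros(10); y_true[3] = 1.0               # one-hot label for digit 3
confident = np.full(10, 0.1/9); confident[3] = 0.9   # p(3) = 0.9
hesitant  = np.full(10, 0.6/9); hesitant[3]  = 0.4   # p(3) = 0.4

print(-np.sum(y_true * np.log(confident)))  # ~0.105
print(-np.sum(y_true * np.log(hesitant)))   # ~0.916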

In supervised models, we need labels to learn, so we create a placeholder for the labels in our training data:


In [11]:
# Define labels placeholder
y_ = tf.placeholder(tf.float32, [None, 10])

The cross-entropy loss is pretty easy to implement:


In [12]:
# Loss
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(py), reduction_indices=[1]))

In the old days, we would have to go through and derive all the gradients ourselves, then code them into our program. Nowadays, we have libraries to compute all the gradients automatically. Not only that, but TensorFlow comes with a whole suite of optimizers implementing various optimization algorithms. I'm not going to go into the details of why you should appreciate that right now, because I know that Prof David Carlson has an entire day's worth of material on optimization.


In [13]:
# Optimizer
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)
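
TensorFlow also ships with more sophisticated optimizers under tf.train; for example, swapping in Adam is a one-line change. The line below is commented out so the results in this notebook still correspond to plain gradient descent, and the learning rate is just an illustrative value, not tuned for this problem:

# Alternative optimizer (not used for the results below):
# train_step = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(cross_entropy)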

To train, we simply call the optimizer op we defined above. First though, we need to start a session and initialize our variables:


In [14]:
# Create a session object and initialize all graph variables
sess = tf.Session()
sess.run(tf.global_variables_initializer())

There are much cleverer ways to design a training regimen that stop training once the model has converged and before it starts overfitting (see the early-stopping sketch a little further below), but for this demo, we'll keep it simple:


In [15]:
# Train the model
# trange is a tqdm function. It's the same as range, but adds a pretty progress bar
for _ in trange(1000): 
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})


100%|█████████████████████████████████████| 1000/1000 [00:01<00:00, 571.94it/s]

Notice that, because of the way the dependencies are connected in our graph, running the optimizer requires feeding both the training image placeholder x and the training label placeholder y_ (as it should). The values of the variables (W and b) are updated in place automatically by the optimizer.
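
Returning to the "cleverer training regimen" point above: a common approach is early stopping, where we periodically evaluate accuracy on the held-out validation set and halt once it stops improving. A minimal sketch (not run here; the patience and evaluation interval are arbitrary illustrative choices):

# Early-stopping sketch: monitor validation accuracy, stop when it plateaus
correct = tf.equal(tf.argmax(py, 1), tf.argmax(y_, 1))
val_accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

best_acc = 0.0
patience, bad_checks = 5, 0
for step in range(10000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
    if step % 100 == 0:
        acc = sess.run(val_accuracy, feed_dict={x: mnist.validation.images,
                                                y_: mnist.validation.labels})
        if acc > best_acc:
            best_acc, bad_checks = acc, 0
        else:
            bad_checks += 1
            if bad_checks >= patience:
                break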

Now let's see how we did! For every image in our test set, we run the data through the model, and take the digit in which we have the highest confidence as our answer. We then compute an accuracy by seeing how many we got correct:


In [16]:
# Test trained model
correct_prediction = tf.equal(tf.argmax(py, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print('Test accuracy: {0}'.format(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})))


Test accuracy: 0.902899980545044
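
As a quick spot check (a hedged aside, not in the original notebook), we can also look at the model's prediction for a single test image:

# Compare the predicted class for the first test image against its true label
prediction_op = tf.argmax(py, 1)
pred = sess.run(prediction_op, feed_dict={x: mnist.test.images[:1]})
print('Predicted: {0}, Actual: {1}'.format(pred[0], np.argmax(mnist.test.labels[0])))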

Not bad for a simple model and a few lines of code. Before we close the session, there's one more interesting thing we can do. Normally, it can be difficult to inspect exactly what the filters in a model are doing, but since this model is so simple, and the weights transform the data directly into the logits, we can actually visualize what the model is learning by simply plotting the weights. The results look pretty reasonable:


In [17]:
# Get weights
weights = sess.run(W)

fig, ax = plt.subplots(1, 10, figsize=(20, 2))

for digit in range(10):
    ax[digit].imshow(weights[:,digit].reshape(28,28), cmap='gray')

# Close session to finish
sess.close()


The entire model, with the complete model definition, training, and evaluation (but minus the weights visualization), is below. Note the slight difference in how the softmax cross-entropy is calculated: tf.nn.softmax_cross_entropy_with_logits works directly on the unnormalized logits y rather than on the softmax output; this is done for numerical stability.
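
To see why that matters (a hedged NumPy illustration, not part of the original notebook): exponentiating large logits directly overflows, while subtracting the maximum logit first gives the same probabilities but stays finite, which is the kind of trick a numerically stable softmax cross-entropy relies on:

# Naive softmax overflows on large logits; shifting by the max does not
big_logits = np.array([1000.0, 1001.0, 1002.0])

naive = np.exp(big_logits) / np.sum(np.exp(big_logits))   # overflows to nan
shifted = np.exp(big_logits - big_logits.max())
stable = shifted / shifted.sum()                          # well-behaved

print(naive)   # [nan nan nan]
print(stable)  # approximately [0.090  0.245  0.665]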


In [18]:
import tensorflow as tf
from tqdm import trange
from tensorflow.examples.tutorials.mnist import input_data

# Import data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Create the model
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.matmul(x, W) + b

# Define loss and optimizer
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

# Create a Session object, initialize all variables
sess = tf.Session()
sess.run(tf.global_variables_initializer())

# Train
for _ in trange(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

# Test trained model
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print('Test accuracy: {0}'.format(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})))

sess.close()


Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
100%|█████████████████████████████████████| 1000/1000 [00:01<00:00, 572.50it/s]
Test accuracy: 0.9190000295639038

Note: The accuracy from the full version directly above might differ slightly from the step-by-step version we first went through. This is partly because the full version uses a larger learning rate (0.5 instead of 0.05), and partly because mnist.train.next_batch shuffles the order of the training data, so the model sees the examples in a different order.

Acknowledgment: Material adapted from the TensorFlow tutorial: https://www.tensorflow.org/get_started/