In [1]:
!pip install tensorflow
import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
import math
!pip install -U okpy
from client.api.notebook import Notebook
ok = Notebook('lab12.ok')
In today's lab, we're going to use logistic regression to classify handwritten digits. You'll learn about logistic / softmax regression and TensorFlow, a popular machine learning library developed by Google.
TensorFlow is a library typically used to train deep neural networks (DNNs). DNN learning is just like linear regression or classification, except that we search over a more complicated class of functions, not just linear ones. DNNs have been popularized by their success in many fields, such as spam detection, speech recognition, and even art (for example, Neural Style). They are a building block in many successful applications of machine learning in recent years.
Protip: This lab is taken straight from the TensorFlow tutorials so if you get stuck, go ahead and reference that page.
The MNIST dataset consists of 70,000 images of handwritten digits from 0-9 (10 total types): 60,000 for training and 10,000 for testing. The data are greyscale pixels from scans of handwriting.
Let's load in and take a peek at the data. The next cell will download and load the data into a variable called mnist.
In [2]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
Here are the dimensions of the data. You'll see that TensorFlow has already split the dataset into training, validation, and test sets.
In [3]:
mnist.train.images.shape, mnist.validation.images.shape, mnist.test.images.shape
Each training example is originally a 28x28 image:
To make it easier for machine learning, the images are flattened out into length-784 vectors.
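As a small illustration of what flattening means, the sketch below uses a made-up 28x28 array (not real MNIST data) and NumPy's reshape to go from image to vector and back. The variable names are chosen for this example only.

```python
import numpy as np

# A made-up 28x28 "image" (not real MNIST data): pixel k holds the value k.
image = np.arange(28 * 28, dtype=np.float32).reshape((28, 28))

# Flatten row by row into a length-784 vector.
flat = image.reshape(784)

# Reshaping back to (28, 28) recovers the original layout exactly.
restored = flat.reshape((28, 28))

print(flat.shape)                       # (784,)
print(np.array_equal(image, restored))  # True
```

Because the flattening goes row by row, entry 28 of the vector is the first pixel of the second row of the image.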
Here's a function to reshape the vector back into a 28x28 image and a function to display one / multiple images.
In [4]:
def example_to_image(example):
    '''Takes in a length-784 training example and returns a (28, 28) image.'''
    return example.reshape((28, 28))

def show_images(images, ncols=2, figsize=(10, 7), **kwargs):
    """
    Shows one or more images.

    images: Image or list of images.
    """
    def show_image(image, axis=plt):
        # Draw on the axis that was passed in (pyplot itself by default)
        axis.imshow(image, **kwargs)

    if not (isinstance(images, list) or isinstance(images, tuple)):
        images = [images]

    nrows = math.ceil(len(images) / ncols)
    ncols = min(len(images), ncols)

    plt.figure(figsize=figsize)
    for i, image in enumerate(images):
        axis = plt.subplot2grid(
            (nrows, ncols),
            (i // ncols, i % ncols),
        )
        # Hide ticks and tick labels (booleans rather than the older 'off' strings)
        axis.tick_params(bottom=False, left=False, top=False, right=False,
                         labelleft=False, labelbottom=False)
        axis.grid(False)
        show_image(image, axis)
Question 1: Use the provided example_to_image and show_images functions to visualize the training examples given below.
In [5]:
# These indices are the examples you should show from mnist.train.images
examples_to_show = np.array([0, 5100, 10200, 15300, 20400, 25500, 30600, 35700, 40800, 45900])
# Get the examples from the training data
examples = ...
# Convert each example into an image
images = ...
# Call show_images using ncols=5
...
# We'll print the labels for each of these examples
mnist.train.labels[examples_to_show]
Notice that there are more than 2 labels (0 through 9), and the label data are represented in a one-hot encoding. So, the labels have dimension n x 10. This is different from what we've done before, but it is a typical strategy for multiclass classification. We will see how our softmax loss function incorporates 10-dimensional labels.
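To make the one-hot encoding concrete, here is a minimal NumPy sketch. The integer labels are made up for illustration; they are not taken from mnist.

```python
import numpy as np

# Hypothetical integer labels for four examples.
digits = np.array([3, 0, 9, 3])

# Row d of the 10x10 identity matrix is the one-hot vector for digit d.
one_hot = np.eye(10)[digits]

# Taking the argmax of each row recovers the original digit.
recovered = one_hot.argmax(axis=1)

print(one_hot.shape)  # (4, 10)
print(recovered)      # [3 0 9 3]
```

Each row contains a single 1 in the position of the true digit and 0 everywhere else, which is why n labels take up an n x 10 array.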
We've discussed logistic regression at length during lecture. The basic idea is that instead of taking the standard regression equation:
$$ f_\theta(x) = \theta_1x_1 + ... + \theta_dx_d + b = \theta^\top x + b $$
we fit the sigmoid function instead:
$$ f_\theta(x) = s(\theta_1x_1 + ... + \theta_dx_d + b) = s(\theta^\top x + b) $$
where
$$ s(x) = \frac{1}{1 + e^{-x}} $$
The output of $s$ is always a number between 0 and 1, so we can roughly say, "This example has a 70% chance of being in class 1 and 30% chance of being in class 2, so we'll label it class 1."
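The following sketch evaluates the sigmoid on a few made-up scores to show how a score turns into the two class probabilities. The scores stand in for $\theta^\top x + b$ and are not computed from real data.

```python
import numpy as np

def sigmoid(z):
    """The logistic function s(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Made-up scores standing in for theta^T x + b on three examples.
scores = np.array([-2.0, 0.0, 2.0])

p_class1 = sigmoid(scores)   # probability of class 1
p_class2 = 1.0 - p_class1    # the remaining probability goes to class 2

print(p_class1)
print(p_class2)
```

A score of 0 gives exactly 0.5 for each class, and larger positive scores push the class 1 probability toward 1.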
When we have more than one class (say $J$ classes), we instead use the softmax function:
$$ \text{softmax}(x)_i = \frac{e ^ {x_i}}{\sum_{j=1}^{J} e^{x_j}} $$
which basically means: "For an example $x$, give each possible class a score, then make sure all the scores add to 1 so we can say this example has a 50% chance of being a 0, 10% of being a 1, 15% of being a 2, etc."
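Here is a minimal NumPy sketch of the softmax formula, applied to three made-up scores. Subtracting the maximum score before exponentiating is a common implementation choice (not part of the formula above) that avoids overflow without changing the result.

```python
import numpy as np

def softmax(x):
    """Softmax over the last axis of x."""
    # Shifting by the max leaves the output unchanged but keeps exp() from overflowing.
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

# Made-up scores for a 3-class problem.
scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)

print(probs)        # three positive numbers
print(probs.sum())  # they add up to 1 (up to floating-point rounding)
```

The class with the largest score receives the largest probability, and equal scores would give each class the same probability.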
Then our regression function becomes:
$$ f_\theta(x) = \text{softmax}(\theta^\top x + b) $$
It's important to notice that the output of $f_\theta$ and the input to $\text{softmax}$ are 10-dimensional. Since we learn a different score for each class, we need a whole row of parameters for each class. Think about what that says about the dimensions of $\theta$ and $b$.
Let's code this up in TensorFlow. It's easy to implement this after you learn the syntax.
Once you learn the basic syntax, you can create much more complicated models in a similar way. TensorFlow also allows you to use your computer's GPU (graphical processing unit) to train your model, significantly decreasing training time.
We're not going to do very complicated things in TensorFlow today. However, we'll point out where it gives us flexibility that scikit-learn doesn't.
TensorFlow operates on variables and relationships between them. Defining, training, and using a model has a few steps:
We use tf.placeholder to specify an input variable. In our case, we want our training data to be an input to the classifier (e.g. training points in -> prediction out).
The syntax is: tf.placeholder( type , shape ) where shape is the shape of the input, like a NumPy array's shape.
For example, tf.placeholder(tf.int32, [50, 3]) says: "This input takes in an integer array with 50 examples, 3 dimensions each." Generally we don't hard-code the first dimension, the number of training examples, ourselves. Instead, we write tf.placeholder(tf.int32, [None, 3]), which says: "This input takes in an integer array with any number of examples, 3 dimensions each."
Question 2: Create a placeholder called x that takes in a tf.float32 array with any number of examples from the mnist dataset.
Then, create a placeholder called y_ that takes in a tf.float32 array with any number of corresponding labels from the mnist dataset.
In [6]:
x = ...
y_ = ...
Question 3: Weight and bias vectors are not determined by external input, but are updated repeatedly while the gradient descent training process runs. The syntax to create such variables (initializing them to 0) is:
tf.Variable(tf.zeros( shape ))
where shape is the shape of the variable, again in NumPy style.
Create variables theta and b corresponding to the weights and bias of our classifier.
Remember that our prediction is a length-10 vector, not a single value as we have done before. This means that the dimensions of theta are not (784, 1) as usual. Think carefully about the dimensions of x, theta, b, and our prediction.
In [7]:
theta = ...
b = ...
Question 4: Now, we can implement our classifier.
The tf.nn.softmax(...) function provides a softmax implementation for us. Instead of using the typical X @ theta, we use tf.matmul(...). Addition via + works as normal.
Set y to the output of the softmax regression function.
In [8]:
y = ...
y is a variable now. Its value will be determined by the inputs x and parameters theta and b.
We can implement all sorts of classifiers just by changing parts of the equation above. You just have to know the functional form of the classification function.
In order to train our classifier, we need to implement the correct loss function. In class, we saw that the loss function for logistic regression was the negative log probability assigned by the model to the true labels. This translates directly to softmax regression. When there are multiple classes, it is called the cross-entropy loss:
$$ L_{y}(\hat{y}) = - \sum_{j=1}^{J} y_j \log \hat{y}_j $$
where $ y $ is the one-hot vector of the label and $ \hat{y} $ is the vector of predicted softmax values.
Verify that if we assign probability 1 to the correct label and 0 to the others, then the loss is 0. It's also useful to verify that if the prediction is incorrect, the loss is greater than 0.
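One way to check these claims numerically is the NumPy sketch below, which evaluates the cross-entropy formula on three made-up predictions for a 3-class problem. The clipping constant eps is an implementation choice for this sketch so that log(0) is never evaluated; it is not part of the formula.

```python
import numpy as np

def cross_entropy(y, y_hat, eps=1e-12):
    """Cross-entropy between a one-hot label y and predicted probabilities y_hat."""
    # Clip so that a predicted probability of exactly 0 does not produce log(0).
    return np.sum(-y * np.log(np.clip(y_hat, eps, 1.0)))

y = np.array([0.0, 1.0, 0.0])        # the true class is index 1

perfect = np.array([0.0, 1.0, 0.0])  # all probability on the true class
unsure = np.array([0.2, 0.6, 0.2])   # most, but not all, on the true class
wrong = np.array([0.7, 0.2, 0.1])    # most probability on a wrong class

for name, y_hat in [("perfect", perfect), ("unsure", unsure), ("wrong", wrong)]:
    print(name, cross_entropy(y, y_hat))
```

Only the term for the true class survives the sum, so the loss is the negative log of the probability assigned to the true label: zero for a perfect prediction and larger the less probability the true label receives.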
Here's the cross entropy loss in TensorFlow:
In [9]:
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
We'll call cross_entropy the loss function, but as a Python object it's just another TensorFlow variable. Its value is a scalar, the number we'd like to minimize by choosing theta and b.
Note: Ordinarily you wouldn't have to write out the last two steps; TensorFlow provides a single function that produces the cross-entropy loss given just $\theta^\top x + b$.
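In TensorFlow 1.x that function is tf.nn.softmax_cross_entropy_with_logits. One reason to work from the raw scores is numerical stability. The NumPy sketch below illustrates the idea with the log-sum-exp trick; it is an illustration under that assumption, not TensorFlow's actual implementation, and the function name and scores are made up.

```python
import numpy as np

def cross_entropy_from_logits(y, logits):
    """Cross-entropy computed directly from raw scores (logits)."""
    # log softmax(z)_i = (z_i - max z) - log(sum_j exp(z_j - max z))
    shifted = logits - np.max(logits)
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -np.sum(y * log_probs)

y = np.array([0.0, 1.0, 0.0])

# On mild scores this agrees with applying softmax and then log.
mild = np.array([2.0, 1.0, 0.1])
probs = np.exp(mild) / np.sum(np.exp(mild))
two_step = -np.sum(y * np.log(probs))
print(two_step, cross_entropy_from_logits(y, mild))

# On extreme scores the two-step version would overflow in exp();
# the version that works on logits stays finite.
extreme = np.array([1000.0, 0.0, -1000.0])
print(cross_entropy_from_logits(y, extreme))  # 1000.0
```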
Question 5: Now that we have written down our classification pipeline and loss, we need to tell TensorFlow how to run gradient descent.
The syntax for this is:
tf.train.GradientDescentOptimizer( learning_rate ).minimize( loss_fn )
Here learning_rate is the size of the steps we take at each iteration of gradient descent, and loss_fn is the variable defining the loss we'd like to minimize.
Set train_step to the gradient descent rule using 0.5 as the learning rate and the cross entropy loss function.
In [10]:
train_step = ...
Our variables were containers or placeholders for data, with no numbers yet. Similarly, train_step is just a recipe for optimizing, embodied in a Python object. We haven't actually done any optimization yet. But we're ready now.
The next cell tells TensorFlow to repeatedly compute train_step, filling in batches of 100 images at a time for x and y_. This will update theta and b using stochastic gradient descent for 1000 iterations, using 100 examples per iteration.
In [18]:
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
for _ in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
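To see what a loop like this does under the hood, here is a rough NumPy sketch of minibatch stochastic gradient descent for softmax regression. Everything in it is an assumption made for illustration: the data are synthetic 2-dimensional points in 3 classes (not MNIST), the batch size is 50, and the learning rate is 0.1 because it suits this toy data. It uses the standard result that the gradient of the mean cross-entropy with respect to the scores is the predicted probabilities minus the one-hot labels, divided by the batch size.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic toy data (not MNIST): 300 points in 2 dimensions,
# drawn around 3 made-up class centers.
centers = np.array([[0.0, 4.0], [4.0, 0.0], [-4.0, -4.0]])
labels = rng.integers(0, 3, size=300)
X = centers[labels] + rng.normal(size=(300, 2))
Y = np.eye(3)[labels]  # one-hot labels

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss(theta, b):
    """Mean cross-entropy over the whole toy dataset."""
    probs = softmax(X @ theta + b)
    return -np.mean(np.sum(Y * np.log(probs + 1e-12), axis=1))

theta = np.zeros((2, 3))  # one column of weights per class
b = np.zeros(3)
learning_rate = 0.1
initial_loss = loss(theta, b)

for _ in range(200):
    # Pick a random batch of 50 examples.
    idx = rng.choice(300, size=50, replace=False)
    xb, yb = X[idx], Y[idx]
    # Gradient of the batch's mean cross-entropy with respect to the scores.
    error = (softmax(xb @ theta + b) - yb) / len(idx)
    # Step theta and b against their gradients.
    theta -= learning_rate * (xb.T @ error)
    b -= learning_rate * error.sum(axis=0)

final_loss = loss(theta, b)
print(initial_loss, final_loss)
```

With all parameters at zero every class gets probability 1/3, so the initial loss is log 3; the loop should leave the loss well below that. TensorFlow derives the gradient automatically from the definition of the loss, which is why the cell above never writes it out.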
In [19]:
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print("Accuracy:")
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
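The accuracy computation above compares the most likely predicted class with the true class for each example and averages the matches. The NumPy sketch below mirrors that logic on four made-up predictions for a 3-class problem.

```python
import numpy as np

# Made-up predicted probabilities for four examples and three classes.
predicted = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1],
                      [0.3, 0.3, 0.4],
                      [0.6, 0.3, 0.1]])

# One-hot true labels for classes 0, 1, 2, 1.
actual = np.eye(3)[[0, 1, 2, 1]]

# An example is correct when the most likely class matches the true class.
correct = predicted.argmax(axis=1) == actual.argmax(axis=1)

# Casting booleans to floats and averaging gives the fraction correct.
accuracy = correct.astype(np.float32).mean()

print(correct)   # [ True  True  True False]
print(accuracy)  # 0.75
```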
Not bad! Let's see some examples of your predictions:
In [20]:
EXAMPLES_TO_SHOW = 10
corrects = sess.run(correct_prediction, feed_dict={x: mnist.test.images, y_: mnist.test.labels})
correct_i = np.where(corrects)[0][:EXAMPLES_TO_SHOW]
print("Correct predictions:")
correct_ex = mnist.test.images[correct_i]
correct_images = [example_to_image(example) for example in correct_ex]
show_images(correct_images, 5)
In [21]:
incorrect_i = np.where(~corrects)[0][:EXAMPLES_TO_SHOW]
print("Incorrect predictions:")
incorrect_ex = mnist.test.images[incorrect_i]
incorrect_images = [example_to_image(example) for example in incorrect_ex]
show_images(incorrect_images, 5)
print("You predicted:")
print(sess.run(tf.argmax(y,1), feed_dict={x: mnist.test.images, y_: mnist.test.labels})[incorrect_i])
Chances are some of your incorrect predictions are hard for you to guess, too!
We have only scratched the surface of TensorFlow. If you'd like to continue, you can start at the online tutorials.
In [ ]:
i_finished_the_lab = False
In [108]:
_ = ok.grade('qcompleted')
_ = ok.backup()
In [109]:
_ = ok.submit()
Now, run this code in your terminal to make a git commit that saves a snapshot of your changes in git. The last line runs git push, which will send your work to your personal GitHub repo.
# Tell git to commit your changes to this notebook
git add -A
# Tell git to make the commit
git commit -m "lab12 finished"
# Send your updates to your personal private repo
git push origin master