Lab 12: Logistic Regression Using TensorFlow



In [1]:

    
!pip install tensorflow

import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
import math

!pip install -U okpy
from client.api.notebook import Notebook
ok = Notebook('lab12.ok')

In today's lab, we're going to use logistic regression to classify handwritten digits. You'll learn about logistic / softmax regression and TensorFlow, a popular machine learning library developed by Google.

TensorFlow is a library typically used to train deep neural networks (DNNs). DNN learning is just like linear regression or classification, except that we search over a more complicated class of functions, not just linear ones. DNNs have been popularized by their success in many fields, such as spam detection, speech recognition, and even art (for example, Neural Style). They are a building block in many successful applications of machine learning in recent years.

Protip: This lab is taken straight from the TensorFlow tutorials so if you get stuck, go ahead and reference that page.

Digitize it

The MNIST dataset comprises 70,000 images of handwritten digits from 0 to 9 (10 classes in total): 60,000 for training and 10,000 for testing. The data are greyscale pixels from scans of handwriting.

Let's load in and take a peek at the data. The next cell will download and load the data into a variable called mnist.



In [2]:

    
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

Here are the dimensions of the data. You'll see that TensorFlow has already split the dataset into training, validation, and test sets.



In [3]:

    
mnist.train.images.shape, mnist.validation.images.shape, mnist.test.images.shape

Each training example is originally a 28x28 image.

To make it easier for machine learning, the images are flattened out into length-784 vectors.
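
As a small sketch of what flattening means, the NumPy snippet below flattens a made-up 28x28 array into a length-784 vector and reshapes it back. The array fake_image is an illustrative stand-in, not MNIST data, and the row-by-row order is simply NumPy's default.

```python
import numpy as np

# A made-up 28x28 "image" whose pixel values are just 0, 1, ..., 783.
fake_image = np.arange(28 * 28, dtype=np.float32).reshape((28, 28))

# Flatten row by row (NumPy's default order) into a length-784 vector.
flat = fake_image.reshape(784)

# Reshaping back recovers the original grid exactly.
restored = flat.reshape((28, 28))

# With row-by-row flattening, pixel (r, c) lands at index 28 * r + c.
print(flat.shape, restored.shape)
print(flat[28 * 1 + 0] == fake_image[1, 0])
```

Nothing is lost by flattening; the classifier just no longer sees which pixels were neighbors.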

Here's a function to reshape the vector back into a 28x28 image and a function to display one / multiple images.



In [4]:

    
def example_to_image(example):
    '''Takes in a length-784 training example and returns a (28, 28) image.'''
    return example.reshape((28, 28))

def show_images(images, ncols=2, figsize=(10, 7), **kwargs):
    """
    Shows one or more images.
    
    images: Image or list of images.
    """
    def show_image(image, axis=plt):
        # Draw on the axis that was passed in (pyplot itself by default)
        axis.imshow(image, **kwargs)
        
    if not (isinstance(images, list) or isinstance(images, tuple)):
        images = [images]
    
    nrows = math.ceil(len(images) / ncols)
    ncols = min(len(images), ncols)
    
    plt.figure(figsize=figsize)
    for i, image in enumerate(images):
        axis = plt.subplot2grid(
            (nrows, ncols),
            (i // ncols,  i % ncols),
        )
        axis.tick_params(bottom=False, left=False, top=False, right=False,
                         labelleft=False, labelbottom=False)
        axis.grid(False)
        show_image(image, axis)

Question 1: Use the provided example_to_image and show_images functions to visualize the training examples given below.



In [5]:

    
# These indices are the examples you should show from mnist.train.images
examples_to_show = np.array([0,  5100, 10200, 15300, 20400, 25500, 30600, 35700, 40800, 45900])

# Get the examples from the training data
examples = ...

# Convert each example into an image
images = ...

# Call show_images using ncols=5
...

# We'll print the labels for each of these examples
mnist.train.labels[examples_to_show]

Notice that there are more than two classes (the digits 0 through 9), and the label data are represented in a one-hot encoding. So, the labels have dimension n x 10. This is different from what we've done before, but it is a typical strategy for multiclass classification. We will see how our softmax loss function incorporates 10-dimensional labels.
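
To make the one-hot encoding concrete, here is a minimal NumPy sketch that converts integer labels to one-hot rows and back. The digits array is made up for illustration; it is not taken from mnist.train.labels.

```python
import numpy as np

# Hypothetical integer labels for four examples.
digits = np.array([7, 3, 4, 6])

# Row i of the 10x10 identity matrix is the one-hot vector for digit i.
one_hot = np.eye(10)[digits]

# argmax along each row recovers the integer label.
recovered = one_hot.argmax(axis=1)

print(one_hot.shape)   # (4, 10)
print(recovered)       # [7 3 4 6]
```

The same argmax trick is handy for reading the labels printed in the cell above.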

Softmax Regression

We've discussed logistic regression at length during lecture. The basic idea is that instead of taking the standard regression equation:

$$ f_\theta(x) = \theta_1x_1 + ... + \theta_dx_d + b = \theta^\top x + b $$

We fit the sigmoid function instead:

$$ f_\theta(x) = s(\theta_1x_1 + ... + \theta_dx_d + b) = s(\theta^\top x + b) $$

Where $$ s(x) = \frac{1}{1 + e^{-x}} $$

The output of $s$ is always a number between 0 and 1, so we can roughly say, "This example has a 70% chance of being in class 1 and 30% chance of being in class 2, so we'll label it class 1."
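
A quick way to get a feel for $s$ is to evaluate it at a few points. The sketch below uses only Python's math module; the input values are arbitrary.

```python
import math

def sigmoid(z):
    """The logistic function s(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# A score of 0 sits exactly on the decision boundary.
print(sigmoid(0.0))    # 0.5

# Large positive scores approach 1, large negative scores approach 0.
print(sigmoid(4.0), sigmoid(-4.0))

# The two class probabilities add up to 1, since s(-z) = 1 - s(z).
print(sigmoid(0.85) + sigmoid(-0.85))
```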

When we have more than one class (say $J$ classes), we instead use the softmax function:

$$ \text{softmax}(x)_i = \frac{e ^ {x_i}}{\sum_{j=1}^{J} e^{x_j}} $$

Which basically means: "For an example $x$, give each possible class a score, then make sure all the scores add to 1 so we can say this example has a 50% chance of being a 0, 10% of being a 1, 15% of being a 2, etc."
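
Here is a minimal NumPy sketch of the softmax formula above, applied to three made-up scores. Subtracting the largest score before exponentiating is an added numerical safeguard, not part of the formula; it leaves the result unchanged.

```python
import numpy as np

def softmax(scores):
    """Softmax of a 1-D array of scores."""
    # Subtracting the max leaves the result unchanged but avoids overflow in exp.
    shifted = scores - np.max(scores)
    exps = np.exp(shifted)
    return exps / exps.sum()

scores = np.array([2.0, 1.0, 0.1])   # made-up scores for 3 classes
probs = softmax(scores)

print(probs, probs.sum())

# Adding a constant to every score does not change the probabilities.
print(np.allclose(softmax(scores + 100.0), probs))   # True
```

The class with the largest score always gets the largest probability, so picking the most likely class only requires an argmax.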

Then our regression function becomes:

$$ f_\theta(x) = \text{softmax}(\theta^\top x + b) $$

It's important to notice that the output of $f_\theta$ and the input to $\text{softmax}$ are 10-dimensional. Since we learn a different score for each class, we need a whole row of parameters for each class. Think about what that says about the dimensions of $\theta$ and $b$.
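
One way to reason about the dimensions is to try a deliberately tiny version of the problem. The sketch below uses made-up sizes (5 examples, 4 features, 3 classes) rather than the MNIST ones, and random numbers in place of learned parameters.

```python
import numpy as np

n, d, J = 5, 4, 3                 # made-up sizes: examples, features, classes
rng = np.random.default_rng(0)

X = rng.normal(size=(n, d))       # one example per row
theta = rng.normal(size=(d, J))   # column j (row j of theta^T) holds class j's weights
b = np.zeros(J)                   # one bias per class

# Single example: theta^T x + b gives one score per class.
single_scores = theta.T @ X[0] + b

# Whole batch at once: b is broadcast across the rows.
batch_scores = X @ theta + b

print(single_scores.shape, batch_scores.shape)   # (3,) (5, 3)
```

The batch form X @ theta + b is the one that carries over to the TensorFlow code, where each row of the input is one example.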

TensorFlow

Let's code this up in TensorFlow. It's easy to implement this after you learn the syntax.

Once you learn the basic syntax, you can create much more complicated models in a similar way. TensorFlow also allows you to use your computer's GPU (graphics processing unit) to train your model, which can significantly decrease training time.

We're not going to do anything very complicated in TensorFlow today. However, we'll point out where it gives us flexibility that scikit-learn doesn't.

TensorFlow operates on variables and relationships between them. Defining, training, and using a model has a few steps:

1. We define variables for every quantity involved in the modeling process. Some examples: the input to a model, the parameters of the model, any intermediate calculations done by the model, the outputs of the model, and the true labels we want to match.
2. We describe the relationships between those variables; for example, multiplying the parameters by the inputs will produce our scores.
3. We fill in the inputs and true labels, and we tell TensorFlow to use gradient descent to compute the best parameters.
4. We can then fill in new inputs and observe the outputs of the trained model.
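
The steps above follow a "describe first, compute later" pattern. The sketch below is a plain-Python analogy of that pattern, not TensorFlow code: the params dictionary, the scores and loss functions, the feed dictionary, and the one-example dataset are all invented for illustration.

```python
# Steps 1 and 2: name the quantities and describe how they relate.
# Nothing is computed yet; `scores` and `loss` are only recipes.
params = {"weight": 2.0, "bias": 1.0}

def scores(feed):
    return params["weight"] * feed["x"] + params["bias"]

def loss(feed):
    return (scores(feed) - feed["y_true"]) ** 2

# Step 3: fill in the input and true label, then run gradient descent
# on the squared error to adjust the parameters.
feed = {"x": 3.0, "y_true": 10.0}
learning_rate = 0.02
for _ in range(50):
    error = scores(feed) - feed["y_true"]
    params["weight"] -= learning_rate * 2 * error * feed["x"]
    params["bias"] -= learning_rate * 2 * error

# Step 4: fill in an input and observe the output of the trained model.
print(scores({"x": 3.0}), loss(feed))
```

In TensorFlow the recipe is stored as a graph of variables instead of Python functions, and the gradient computation is done for us.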

Inputs

We use tf.placeholder to specify an input variable. In our case, we want our training data to be an input to the classifier (e.g., training points in -> prediction out).

The syntax is: tf.placeholder( type , shape ) where shape is the shape of the input, like a NumPy array's shape.

For example, tf.placeholder(tf.int32, [50, 3]) says: "This input takes in an integer array with 50 examples, 3 dimensions each." Generally we don't hard-code the first dimension, the number of training examples, ourselves. Instead, we write tf.placeholder(tf.int32, [None, 3]), which says: "This input takes in an integer array with any number of examples, 3 dimensions each."
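
To illustrate what a None dimension accepts, here is a small NumPy sketch. The helper fits_shape is hypothetical and written only for this example; it mimics the idea of a shape check and is not how TensorFlow is implemented.

```python
import numpy as np

def fits_shape(array, shape):
    """Return True if `array` matches `shape`, where None matches any size.

    Hypothetical helper for illustration only; not a TensorFlow function.
    """
    if array.ndim != len(shape):
        return False
    return all(want is None or want == got
               for want, got in zip(shape, array.shape))

print(fits_shape(np.zeros((50, 3), dtype=np.int32), [None, 3]))   # True
print(fits_shape(np.zeros((7, 3), dtype=np.int32), [None, 3]))    # True
print(fits_shape(np.zeros((7, 3), dtype=np.int32), [50, 3]))      # False
print(fits_shape(np.zeros((50, 4), dtype=np.int32), [None, 3]))   # False
```

Leaving the first dimension as None is what lets the same placeholder take a batch of 100 training examples and, later, the whole test set.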

Question 2: Create a placeholder called x that takes in a tf.float32 array with any number of examples from the mnist dataset.

Then, create a placeholder called y_ that takes in a tf.float32 array with any number of corresponding labels from the mnist dataset.



In [6]:

    
x = ...
y_ = ...

Question 3: Weight and bias vectors are not determined by external input, but will be constantly updated while the gradient descent training process runs. The syntax to create such variables (initializing them to 0) is:

tf.Variable(tf.zeros( shape )) where shape is the shape of the variable, again in NumPy style.

Create variables theta and b corresponding to the weights and bias of our classifier.

Remember that our prediction is a length 10 vector, not a single value as we have done before. This means that the dimensions of theta are not (784, 1) as usual. Think carefully about the dimensions of x, theta, b, and our prediction.



In [7]:

    
theta = ...
b = ...

Question 4: Now, we can implement our classifier.

The tf.nn.softmax(...) function provides a softmax implementation for us. Instead of using the typical X @ theta, we use tf.matmul(...). Addition via + works as normal.

Set y to the output of the softmax regression function.



In [8]:

    
y = ...

y is a variable now. Its value will be determined by the inputs x and parameters theta and b.

We can implement all sorts of classifiers just by changing parts of the equation above. You just have to know the functional form of the classification function.

In order to train our classifier, we need to implement the correct loss function. In class, we saw that the loss function for logistic regression was the negative log probability assigned by the model to the true labels. This translates directly to softmax regression. When there are multiple classes, it is called the cross-entropy loss:

$$ L_{y}(\hat{y}) = - \sum_{j=1}^{J} y_j \log \hat{y}_j $$

where $ y $ is the one-hot vector of the label and $ \hat{y} $ is the vector of predicted softmax values.

Verify that if we assign probability 1 to the correct label and 0 to the others, then the loss is 0. It's also useful to verify that if the prediction is incorrect, the loss is greater than 0.
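
One way to do this check numerically is sketched below in NumPy, with three classes instead of ten. The prediction vectors are made up, and the clipping step is an added safeguard so that log(0) is never evaluated.

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    # Clipping is an added safeguard so that log(0) is never evaluated.
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.sum(y_true * np.log(y_pred))

y_true = np.array([0.0, 1.0, 0.0])      # the true class is class 1

perfect = np.array([0.0, 1.0, 0.0])     # all probability on the true class
unsure = np.array([0.2, 0.6, 0.2])      # right answer, but not certain
wrong = np.array([0.7, 0.2, 0.1])       # most probability on a wrong class

for name, y_pred in [("perfect", perfect), ("unsure", unsure), ("wrong", wrong)]:
    print(name, cross_entropy(y_true, y_pred))
```

Because the label is one-hot, only the term for the true class survives the sum, so the loss is just the negative log of the probability assigned to the true class.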

Here's the cross entropy loss in TensorFlow:



In [9]:

    
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))

We'll call cross_entropy the loss function, but as a Python object it's just another TensorFlow variable. Its value is a scalar, the number we'd like to minimize by choosing theta and b.

Note: Ordinarily you wouldn't have to write out the last two steps (the softmax and the cross-entropy); TensorFlow provides a single function, tf.nn.softmax_cross_entropy_with_logits, that produces the cross-entropy loss given just $\theta^\top x + b$.
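
A likely reason for combining the two steps is numerical stability. The NumPy sketch below is an illustration of that idea under our own assumptions, not TensorFlow's implementation: cross_entropy_from_logits is a name we made up, and the scores and labels are invented.

```python
import numpy as np

def cross_entropy_from_logits(logits, y_true):
    """Mean cross-entropy computed directly from unnormalized scores."""
    # log softmax(z)_j = z_j - log(sum_k exp(z_k)); shift by the row max for stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return np.mean(-np.sum(y_true * log_probs, axis=1))

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])     # made-up scores for two examples
y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])     # one-hot labels

# Two-step version: softmax first, then the log.
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
two_step = np.mean(-np.sum(y_true * np.log(probs), axis=1))

print(two_step, cross_entropy_from_logits(logits, y_true))

# With an extreme score, exp(1000) would overflow in the two-step version,
# while the one-step version stays finite.
big = np.array([[1000.0, 0.0, 0.0]])
print(cross_entropy_from_logits(big, np.array([[1.0, 0.0, 0.0]])))
```

On ordinary inputs the two versions agree; they only differ when the scores are large enough to overflow or underflow.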

Question 5: Now that we have written down our classification pipeline and loss, we need to tell TensorFlow how to run gradient descent.

The syntax for this is:

tf.train.GradientDescentOptimizer( learning_rate ).minimize( loss_fn )

Here learning_rate is the size of the steps we take at each iteration of gradient descent, and loss_fn is the variable defining the loss we'd like to minimize.
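
To see the role of the learning rate in isolation, here is a minimal plain-Python sketch of gradient descent on a made-up one-parameter loss. The loss, its minimum at 3, the starting point, and the learning rate of 0.1 are all illustrative assumptions.

```python
def loss(w):
    """A made-up one-parameter loss with its minimum at w = 3."""
    return (w - 3.0) ** 2

def gradient(w):
    """Derivative of the loss with respect to w."""
    return 2.0 * (w - 3.0)

learning_rate = 0.1
w = 0.0
for step in range(100):
    # Move against the gradient; learning_rate scales the size of the step.
    w = w - learning_rate * gradient(w)

print(w, loss(w))
```

A smaller learning rate takes more iterations to get close to the minimum, while one that is too large can overshoot it on every step.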

Set train_step to the gradient descent rule using 0.5 as the learning rate and the cross entropy loss function.



In [10]:

    
train_step = ...

Train it!

Our variables were containers or placeholders for data, with no numbers yet. Similarly, train_step is just a recipe for optimizing, embodied in a Python object. We didn't actually do any optimization yet. But we're ready now.

The next cell tells TensorFlow to repeatedly compute train_step, filling in batches of 100 images at a time for x and y_. This will update theta and b using stochastic gradient descent for 1000 iterations, using 100 examples per iteration.



In [18]:

    
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

for _ in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
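
To see roughly what the loop above is doing under the hood, here is a NumPy sketch of mini-batch gradient descent for softmax regression. It is not how TensorFlow implements train_step; the synthetic data, the sizes n, d, and J, the learning rate of 0.1, and the random batch sampling are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for MNIST: 3 well-separated classes in 2 dimensions.
n, d, J = 600, 2, 3
centers = np.array([[0.0, 4.0], [4.0, -2.0], [-4.0, -2.0]])
labels = rng.integers(0, J, size=n)
X = centers[labels] + rng.normal(size=(n, d))
Y = np.eye(J)[labels]                       # one-hot labels

theta = np.zeros((d, J))
b = np.zeros(J)
learning_rate, batch_size = 0.1, 100

def predict(inputs):
    """Softmax probabilities for each row of `inputs`."""
    scores = inputs @ theta + b
    scores = scores - scores.max(axis=1, keepdims=True)
    exps = np.exp(scores)
    return exps / exps.sum(axis=1, keepdims=True)

for _ in range(200):
    # Random mini-batch (assumed here; mnist.train.next_batch may pick batches differently).
    idx = rng.choice(n, size=batch_size, replace=False)
    xb, yb = X[idx], Y[idx]
    # Gradient of the mean cross-entropy with respect to the scores is (p - y) / batch_size.
    diff = (predict(xb) - yb) / batch_size
    theta -= learning_rate * (xb.T @ diff)
    b -= learning_rate * diff.sum(axis=0)

accuracy = np.mean(predict(X).argmax(axis=1) == labels)
print(accuracy)
```

Each iteration only looks at 100 examples, so the gradient is a noisy estimate of the full one, which is what makes the method "stochastic."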

How did we do?

Run the next cell to see how your classifier did on the test set.



In [19]:

    
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print("Accuracy:")
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

Not bad! Let's see some examples of your predictions:



In [20]:

    
EXAMPLES_TO_SHOW = 10

corrects = sess.run(correct_prediction, feed_dict={x: mnist.test.images, y_: mnist.test.labels})
correct_i = np.where(corrects)[0][:EXAMPLES_TO_SHOW]

print("Correct predictions:")
correct_ex = mnist.test.images[correct_i]
correct_images = [example_to_image(example) for example in correct_ex]
show_images(correct_images, 5)



In [21]:

    
incorrect_i = np.where(~corrects)[0][:EXAMPLES_TO_SHOW]
print("Incorrect predictions:")
incorrect_ex = mnist.test.images[incorrect_i]
incorrect_images = [example_to_image(example) for example in incorrect_ex]
show_images(incorrect_images, 5)

print("You predicted:")
print(sess.run(tf.argmax(y,1), feed_dict={x: mnist.test.images, y_: mnist.test.labels})[incorrect_i])

Chances are some of your incorrect predictions are hard for you to guess, too!

We have only scratched the surface of TensorFlow. If you'd like to continue, you can start at the online tutorials.

Submitting your assignment

If you made a good-faith effort to complete the lab, change i_finished_the_lab to True in the cell below. In any case, run the cells below to submit the lab.



In [ ]:

    
i_finished_the_lab = False



In [108]:

    
_ = ok.grade('qcompleted')
_ = ok.backup()



In [109]:

    
_ = ok.submit()

Now, run this code in your terminal to make a git commit that saves a snapshot of your changes in git. The last line runs git push, which will send your work to your personal GitHub repo.

# Tell git to stage your changes to this notebook
git add -A

# Tell git to make the commit
git commit -m "lab12 finished"

# Send your updates to your personal private repo
git push origin master