TensorFlow Tutorial 01

Simple Linear Model

Introduction

This tutorial demonstrates the basic workflow of TensorFlow with a simple linear model.


In [10]:
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import confusion_matrix

This was developed using Python 3.6.4 (Anaconda) and the TensorFlow 1.x API.

Load Data

The MNIST data set is about 12 MB and will be downloaded automatically if it is not located in the given path.


In [11]:
from tensorflow.examples.tutorials.mnist import input_data
data = input_data.read_data_sets("data/MNIST/", one_hot=True)


Extracting data/MNIST/train-images-idx3-ubyte.gz
Extracting data/MNIST/train-labels-idx1-ubyte.gz
Extracting data/MNIST/t10k-images-idx3-ubyte.gz
Extracting data/MNIST/t10k-labels-idx1-ubyte.gz

The MNIST data set has now been loaded and it consists of 70,000 images and associated labels.


In [12]:
print("Size of:")
print("- Training-set:\t\t{}".format(len(data.train.labels)))
print("- Test-set:\t\t{}".format(len(data.test.labels)))
print("- Validatin-set:\t{}".format(len(data.validation.labels)))


Size of:
- Training-set:		55000
- Test-set:		10000
- Validation-set:	5000

One-Hot Encoding

The data set has been loaded using so-called One-Hot encoding. This means the labels have been converted from a single number to a vector whose length equals the number of possible classes. All elements of the vector are zero except for the i'th element, which is one and means the class is i. For example, the One-Hot encoded labels for the first 5 images in the test-set are:


In [13]:
data.test.labels[0:5,:]


Out[13]:
array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.]])

We also need the classes as single numbers for various comparisons and performance measures, so we convert the One-Hot encoded vectors to a single number by taking the index of the highest element. Note that 'class' is a reserved keyword in Python, so we use the name 'cls' instead.


In [14]:
data.test.cls = np.array([label.argmax() for label in data.test.labels])

We can now see the classes for the first five images in the test-set. Compare these to the One-Hot encoded vectors above. For example, the class for the first image is 7, which corresponds to a One-Hot encoded vector where all elements are zero except for the element with index 7.


In [15]:
data.test.cls[0:5]


Out[15]:
array([7, 2, 1, 0, 4])

Data Dimensions

The data dimensions are used in several places in the source-code below. It is best to define them once as variables and constants rather than hard-coding the specific numbers every time they are used.


In [16]:
# MNIST images are 28 pixels in each dimension.
img_size = 28

# Images are stored in one-dimensional arrays of this length.
img_size_flat = img_size * img_size

# Tuple with height and width of images used to reshape arrays.
img_shape = (img_size, img_size)

# Number of classes, one class for each of 10 digits.
num_classes = 10

Helper-function for plotting images

Function used to plot 9 images in a 3x3 grid, writing the true and predicted classes below each image.


In [17]:
def plot_images(images, cls_true, cls_pred=None):
    assert len(images) == len(cls_true) == 9

    # Create figure with 3x3 sub-plots.
    fig, axes = plt.subplots(3, 3)
    fig.subplots_adjust(hspace=0.3, wspace=0.3)

    for i, ax in enumerate(axes.flat):
        # Plot image.
        ax.imshow(images[i].reshape(img_shape), cmap='binary')

        # Show true and predicted classes.
        if cls_pred is None:
            xlabel = "True: {0}".format(cls_true[i])
        else:
            xlabel = "True: {0}, Pred: {1}".format(cls_true[i], cls_pred[i])

        ax.set_xlabel(xlabel)

        # Remove ticks from the plot.
        ax.set_xticks([])
        ax.set_yticks([])

    # Ensure the plot is shown when not running with inline plotting.
    plt.show()

Plot a few images to see if the data is correct.


In [18]:
# Get the first images from the test-set.
images = data.test.images[0:9]

# Get the true classes for those images.
cls_true = data.test.cls[0:9]

# Plot the images and labels using our helper-function above.
plot_images(images=images, cls_true=cls_true)

TensorFlow Graph

The entire purpose of TensorFlow is to have a so-called computational graph that can be executed much more efficiently than if the same calculations were to be performed directly in Python. TensorFlow can be more efficient than NumPy because TensorFlow knows the entire computation graph that must be executed, while NumPy only knows the computation of a single mathematical operation at a time.

TensorFlow can also automatically calculate the gradients that are needed to optimize the variables of the graph so as to make the model perform better. This is because the graph is a combination of simple mathematical expressions, so the gradient of the entire graph can be calculated using the chain-rule for derivatives.

TensorFlow can also take advantage of multi-core CPUs as well as GPUs - and Google has even built special chips just for TensorFlow which are called TPUs (Tensor Processing Units) and are even faster than GPUs.

A TensorFlow graph consists of the following parts which will be detailed below:

  • Placeholder variables used to feed input into the graph.
  • Model variables that are going to be optimized so as to make the model perform better.
  • The model, which is essentially just a mathematical function that calculates some output given the input in the placeholder variables and the model variables.
  • A cost measure that can be used to guide the optimization of the variables.
  • An optimization method which updates the variables of the model.

In addition, the TensorFlow graph may also contain various debugging statements, e.g. for logging data to be displayed using TensorBoard, which is not covered in this tutorial.

Placeholder variables

Placeholder variables serve as the input to the graph that we may change each time we execute the graph. We call this feeding the placeholder variables and it is demonstrated further below.

First we define the placeholder variable for the input images. This allows us to change the images that are input to the TensorFlow graph. This is a so-called tensor, which just means that it is a multi-dimensional vector or matrix. The data-type is set to float32 and the shape is set to [None, img_size_flat], where None means that the tensor may hold an arbitrary number of images, with each image being a vector of length img_size_flat.


In [19]:
x = tf.placeholder(tf.float32, [None, img_size_flat])

Next we have the placeholder variable for the true labels associated with the images that were input in the placeholder variable x. The shape of this variable is [None, num_classes], which means it may hold an arbitrary number of labels, and each label is a vector of length num_classes, which is 10 in this case.


In [20]:
y_true = tf.placeholder(tf.float32, [None, num_classes])

Finally we have the placeholder variable for the true classes of each image in the placeholder variable x. These are integers, and the dimensionality of this placeholder variable is set to [None], which means it is a one-dimensional vector of arbitrary length.


In [21]:
y_true_cls = tf.placeholder(tf.int64, [None])

Model

This simple mathematical model multiplies the images in the placeholder variable x with the weights and then adds the biases.

The result is a matrix of shape [num_images, num_classes], because x has shape [num_images, img_size_flat] and weights has shape [img_size_flat, num_classes], so the multiplication of those two matrices is a matrix of shape [num_images, num_classes], and then the biases vector is added to each row of that matrix.
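
The model variables for the weights and biases must be defined before they are used below. That cell is not shown above, so here is a minimal sketch, assuming both are initialized to zeros, which is sufficient for this simple linear model:


In [22]:
# Weights to be optimized: one row per input pixel, one column per class.
# Zero initialization is an assumption; small random values would also work.
weights = tf.Variable(tf.zeros([img_size_flat, num_classes]))

# Biases to be optimized: one bias per class.
biases = tf.Variable(tf.zeros([num_classes]))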

Note that the name logits is typical TensorFlow terminology, but other people may call the variable something else.


In [23]:
logits = tf.matmul(x, weights) + biases

Now logits is a matrix with num_images rows and num_classes columns, where the element in the i'th row and j'th column is an estimate of how likely the i'th input image is to be of the j'th class.

However, these estimates are a bit rough and difficult to interpret because the numbers may be very small or large, so we want to normalize them so that each row of the logits matrix sums to one and each element is limited between zero and one. This is calculated using the so-called softmax function and the result is stored in y_pred.
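
For reference (this formula is standard and not from the original text), the softmax of the logits for a single image is:

$$\text{softmax}(\text{logits})_j = \frac{e^{\text{logits}_j}}{\sum_{k=1}^{\text{num\_classes}} e^{\text{logits}_k}}$$

Each output is positive and the outputs sum to one, so each row of y_pred can be read as a probability distribution over the 10 classes.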


In [24]:
y_pred = tf.nn.softmax(logits)

The predicted class can be calculated from the y_pred matrix by taking the index of the largest element in each row.


In [ ]:
y_pred_cls = tf.argmax(y_pred, axis=1)

Cost-function to be optimized

To make the model better at classifying the input images, we must somehow change the variables for weights and biases. To do this we first need to know how well the model currently performs by comparing the predicted output of the model y_pred to the desired output y_true.

The cross-entropy is a performance measure used in classification. The cross-entropy is a continuous function that is always positive, and if the predicted output of the model exactly matches the desired output then the cross-entropy equals zero. The goal of optimization is therefore to minimize the cross-entropy so it gets as close to zero as possible by changing the weights and biases of the model.
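
For One-Hot encoded labels, the cross-entropy for a single image reduces to the standard formula (not from the original text):

$$H(y_{\text{true}}, y_{\text{pred}}) = -\sum_{i=1}^{\text{num\_classes}} y_{\text{true},i} \, \log(y_{\text{pred},i})$$

Because only one element of y_true is non-zero, this is just the negative log-probability that the model assigns to the true class.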

TensorFlow has a built-in function for calculating the cross-entropy. Note that it uses the values of the logits because it also calculates the softmax internally.


In [ ]:
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=logits,
                                                        labels=y_true)

We have now calculated the cross-entropy for each of the image classifications, so we have a measure of how well the model performs on each image individually. But in order to use the cross-entropy to guide the optimization of the model's variables we need a single scalar value, so we simply take the average of the cross-entropy for all the image classifications.


In [ ]:
cost = tf.reduce_mean(cross_entropy)

Optimization method

Now that we have a cost measure that must be minimized, we can create an optimizer. In this case it is the basic form of Gradient Descent, where the step-size is set to 0.5.

Note that optimization is not performed at this point. In fact, nothing is calculated at all; we just add the optimizer-object to the TensorFlow graph for later execution.


In [ ]:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.5).minimize(cost)

Performance measures

We need a few more performance measures to display the progress to the user.

This is a vector of booleans indicating whether the predicted class equals the true class of each image.


In [ ]:
correct_prediction = tf.equal(y_pred_cls, y_true_cls)

This calculates the classification accuracy by first type-casting the vector of booleans to floats, so that False becomes 0 and True becomes 1, and then calculating the average of these numbers.


In [ ]:
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

TensorFlow Run

Create TensorFlow session

Once the TensorFlow graph has been created, we have to create a TensorFlow session which is used to execute the graph.


In [ ]:
session = tf.Session()

Initialize variables

The variables for weights and biases must be initialized before we start optimizing them.


In [ ]:
session.run(tf.global_variables_initializer())

Helper-function to perform optimization iterations

There are 55,000 images in the training-set, so it takes a long time to calculate the gradient of the model using all these images. We therefore use Stochastic Gradient Descent, which only uses a small batch of images in each iteration of the optimizer.


In [ ]:
batch_size = 100

Function for performing a number of optimization iterations so as to gradually improve the weights and biases of the model. In each iteration, a new batch of data is selected from the training-set and then TensorFlow executes the optimizer using those training samples.


In [ ]:
def optimize(num_iterations):
    for i in range(num_iterations):
        # Get a batch of training examples: x_batch holds a batch of
        # images and y_true_batch holds the true labels for those images.
        x_batch, y_true_batch = data.train.next_batch(batch_size)

        # Put the batch into a dict with the proper names for the
        # placeholder variables in the TensorFlow graph.
        feed_dict_train = {x: x_batch,
                           y_true: y_true_batch}

        # Run the optimizer using this batch of training data.
        session.run(optimizer, feed_dict=feed_dict_train)
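
As a usage sketch (the iteration count and print-format below are illustrative assumptions, not from the original notebook), the optimizer can now be run and the accuracy measured on the test-set using the accuracy operation defined above:


In [ ]:
# Feed-dict with the test-set, using the placeholder names defined above.
feed_dict_test = {x: data.test.images,
                  y_true: data.test.labels,
                  y_true_cls: data.test.cls}

# Run some optimization iterations (1000 is an illustrative number).
optimize(num_iterations=1000)

# Evaluate and print the classification accuracy on the test-set.
acc = session.run(accuracy, feed_dict=feed_dict_test)
print("Accuracy on test-set: {0:.1%}".format(acc))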
