This is a basic TensorFlow tutorial on image classification


In [ ]:
import tensorflow as tf
tf.set_random_seed(1337)

The MNIST dataset

It contains $28\times 28$ grayscale images of handwritten digits (0, 1, 2, ..., 9), split into training, validation, and test sets.

Download the dataset using TF's built-in method.


In [ ]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

Every MNIST sample has two parts: an image (vectorized, raster-scanned) of a handwritten digit and a corresponding label.


In [ ]:
import matplotlib.pyplot as plt

def show_sample(index):
    image = mnist.train.images[index].reshape(28, 28) # 784 -> 28x28
    label = mnist.train.labels[index]

    plt.imshow(image, cmap='Greys')
    plt.show()
    plt.close()
    print('label[%d]: %s' % (index, str(label)))

show_sample(10)
print('------------------------------------------------------------')
show_sample(24)
print('------------------------------------------------------------')
show_sample(12)
print('------------------------------------------------------------')
show_sample(11)
print('------------------------------------------------------------')
show_sample(18)
print('------------------------------------------------------------')

Inputs

Here we specify placeholders for the TF computational graph. Basically, these determine how the images, $x$, and the labels, $y$, are fed into the learning process.


In [ ]:
def build_inputs():
    #
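
One possible completion of this stub (a sketch, assuming the vectorized $784$-dimensional images and one-hot $10$-dimensional labels loaded above):


In [ ]:
def build_inputs():
    # placeholders for a batch of vectorized 28x28 images and the
    # corresponding one-hot labels; the first dimension is None so
    # the batch size can vary
    x = tf.placeholder(tf.float32, shape=[None, 784], name='x')
    y = tf.placeholder(tf.float32, shape=[None, 10], name='y')
    return x, y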

Our first classification model: softmax regression

We're going to train a model to look at images and predict what digits they are.

A function $M: \mathbb{R}^{28\times 28}\rightarrow \mathbb{R}^{10}$ outputs a classification score for each input digit. In other words, $M(\text{image})=\text{a vector of per-class scores}$. We want a higher score for class $c$ to translate into higher confidence that $c$ is the correct class.

For example, if $M$ outputs $$ (0.05, 0.03, 0.82, 0.02, 0.01, 0.02, 0.01, 0.02, 0.01, 0.01) $$ for an input image, it classifies that image as a $2$ (the score at index $2$ is the largest).

Let us choose a very simple classification model first: $$ M(\mathbf{x})= \mathbf{x}\cdot\mathbf{W} + \mathbf{b} , $$ where $\mathbf{x}\in\mathbb{R}^{784}$ is a vectorized input image, and $\mathbf{W}\in\mathbb{R}^{784\times 10}$ and $\mathbf{b}\in\mathbb{R}^{10}$ are the model parameters. The elements of $M(\mathbf{x})$ are sometimes called logits.


In [ ]:
def build_affine(x):
    # x*W + b
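
One way to fill in this stub (a sketch): create the parameter tensors and wire up the affine projection. Here $\mathbf{W}$ is initialized with small random values and $\mathbf{b}$ with zeros, in line with the remark below.


In [ ]:
def build_affine(x):
    # x*W + b: an affine projection from 784 pixel values to 10 logits;
    # W starts with small random values, b with zeros
    W = tf.Variable(tf.truncated_normal([784, 10], stddev=0.1))
    b = tf.Variable(tf.zeros([10]))
    return tf.matmul(x, W) + b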

Learning the model parameters from data

Initially, $\mathbf{W}$ and $\mathbf{b}$ contain random values that will not produce correct classification results.

We have to tune these tensors by minimizing an appropriate loss function that will "measure" the quality of classification.

We will use the cross entropy criterion: $$ L(\mathbf{x}, c)= -\log p_c(\mathbf{x}) , $$ where $p_c(\mathbf{x})$ is the probability assigned by the model that $\mathbf{x}$ belongs to class $c$, $$ p_c(\mathbf{x})= \frac{e^{l_c}}{\sum_{j=0}^{9} e^{l_j}} , $$ and $(l_0, l_1, \ldots, l_9)=M(\mathbf{x})$ are the logits output by the model.

The derivatives can now be computed by TensorFlow and the model can be tuned with stochastic gradient descent ($k=0, 1, 2, \ldots$): $$ \mathbf{W}_{k+1}= \mathbf{W}_k - \eta\frac{\partial L}{\partial\mathbf{W}_k} $$ $$ \mathbf{b}_{k+1}= \mathbf{b}_k - \eta\frac{\partial L}{\partial\mathbf{b}_k} $$


In [ ]:
def build_loss(logits, y):
    #
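
A possible implementation (a sketch): TensorFlow fuses the softmax and the cross entropy into a single, numerically stable op, and averaging over the first dimension handles batches of any size.


In [ ]:
def build_loss(logits, y):
    # softmax + cross entropy in one numerically stable op,
    # averaged over the batch
    xent = tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits)
    return tf.reduce_mean(xent)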

The loss $L$ is usually approximated on a batch of images. The code above can handle this case as well. We set the batch size to $100$ in our experiments.

Measuring the test set accuracy

We measure the quality of the model on a separate testing dataset by counting the number of images that it has correctly classified: $$ \text{accuracy}= \frac{\text{number of correctly classified samples}}{\text{total number of samples}} $$


In [ ]:
def build_ncorrect(logits, y):
    #

def get_accuracy(ncorrect):
    #
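
Possible implementations (a sketch; get_accuracy assumes that the TF session (sess) and the input placeholders (x, y) are available as globals, as they are in the training cells below):


In [ ]:
def build_ncorrect(logits, y):
    # a sample is classified correctly when the largest logit sits at
    # the same index as the 1 in the one-hot label
    correct = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
    return tf.reduce_sum(tf.cast(correct, tf.int32))

def get_accuracy(ncorrect):
    # evaluate ncorrect on the full test set and normalize
    n = sess.run(ncorrect, feed_dict={x: mnist.test.images,
                                      y: mnist.test.labels})
    return float(n) / mnist.test.num_examples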

Training/testing loop

The code for training and validating the model follows.


In [ ]:
def run_training_loop(step, ncorrect, batchsize, niters):
    #
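
One possible loop (a sketch; like get_accuracy above, it relies on the globals sess, x, y, and mnist). Note that passing $5001$ or $10001$ iterations makes the final accuracy report land exactly on iteration $5000$ or $10000$.


In [ ]:
def run_training_loop(step, ncorrect, batchsize, niters):
    # repeatedly draw a training batch, apply one optimization step,
    # and report the test-set accuracy every 1000 iterations
    for i in range(niters):
        batch_x, batch_y = mnist.train.next_batch(batchsize)
        sess.run(step, feed_dict={x: batch_x, y: batch_y})
        if i % 1000 == 0:
            print('iteration %d: test accuracy: %.4f' %
                  (i, get_accuracy(ncorrect)))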

Training #1


In [ ]:
# prepare computation graphs
# (insert code below)

# final preparations for learning
sess = tf.Session()
tf.global_variables_initializer().run(session=sess)

# start the learning process: batch size=100, number of iterations=10000
# (insert code below)

# clear the current session (so we can start another one later)
sess.close()
tf.reset_default_graph()
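
One way to fill in the two gaps above (a sketch; it mirrors the Training #2 cell below, with the affine model in place of the convnet and a plain gradient-descent step whose learning rate of $0.5$ is an assumption, not prescribed by the text):


In [ ]:
# prepare computation graphs (first gap above)
x, y = build_inputs()
logits = build_affine(x)
loss = build_loss(logits, y)
ncorrect = build_ncorrect(logits, y)
step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

# start the learning process (second gap above); 10001 iterations so
# that the final accuracy report lands on iteration 10000
run_training_loop(step, ncorrect, 100, 10001)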

The obtained classification accuracy on the test set should be between 91 and 93 percent. This is a pretty bad result for MNIST. Can we do better than that?

Defining a convolutional model for digit classification

Convolutional networks are significantly better for image classification than traditional machine-learning methods.

Basic components:

  • convolutional layers;
  • rectified linear units (ReLUs);
  • dense layers (affine projection).

In [ ]:
def build_convnet(x):
    #
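
One possible architecture (a sketch; the filter counts and layer sizes are assumptions). Strided convolutions do the downsampling here, so only the three components listed above appear.


In [ ]:
def build_convnet(x):
    # reshape each 784-vector back into a 28x28 single-channel image
    img = tf.reshape(x, [-1, 28, 28, 1])

    # two convolutional layers with ReLUs; stride 2 halves the
    # resolution at each stage (28x28 -> 14x14 -> 7x7)
    c1 = tf.layers.conv2d(img, 32, 5, strides=2, padding='same',
                          activation=tf.nn.relu)
    c2 = tf.layers.conv2d(c1, 64, 5, strides=2, padding='same',
                          activation=tf.nn.relu)

    # dense layers: a hidden ReLU layer followed by the affine
    # projection to the 10 class logits
    flat = tf.reshape(c2, [-1, 7 * 7 * 64])
    fc = tf.layers.dense(flat, 1024, activation=tf.nn.relu)
    return tf.layers.dense(fc, 10)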

Training #2


In [ ]:
# inputs and model outputs
x, y = build_inputs()
logits = build_convnet(x)

# loss-computation graph
loss = build_loss(logits, y)

# testing-accuracy graph
ncorrect = build_ncorrect(logits, y)

# we use RMSProp to gradually tune the model parameters (similar to SGD, but better in most cases)
step = tf.train.RMSPropOptimizer(1e-3).minimize(loss)

# final preparations for learning
sess = tf.Session()
tf.global_variables_initializer().run(session=sess)

# start the learning process: batch size=100, number of iterations=5000
run_training_loop(step, ncorrect, 100, 5001) 

# clear the current session (so we can start another one later)
sess.close()
tf.reset_default_graph()

The obtained classification accuracy should be well over 99%.