In [ ]:
import tensorflow as tf
tf.set_random_seed(1337)
In [ ]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
Every MNIST sample has two parts: an image of a handwritten digit, stored as a vector in raster-scan order, and the corresponding label.
In [ ]:
import matplotlib.pyplot as plt
def show_sample(index):
    image = mnist.train.images[index].reshape(28, 28)  # 784 -> 28x28
    label = mnist.train.labels[index]
    plt.imshow(image, cmap='Greys')
    plt.show()
    plt.clf()
    plt.cla()
    plt.close()
    print('label[%d]: %s' % (index, str(label)))
show_sample(10)
print('------------------------------------------------------------')
show_sample(24)
print('------------------------------------------------------------')
show_sample(12)
print('------------------------------------------------------------')
show_sample(11)
print('------------------------------------------------------------')
show_sample(18)
print('------------------------------------------------------------')
In [ ]:
def build_inputs():
    # placeholders for a batch of vectorized images (x) and their one-hot labels (y)
    pass
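One possible implementation is sketched below: the placeholder shapes follow from the data (784-dimensional image vectors, 10-class one-hot labels), while the placeholder names themselves are an assumption of this sketch.
In [ ]:
def build_inputs():
    # a batch of vectorized 28x28 images and the corresponding one-hot labels
    x = tf.placeholder(tf.float32, [None, 784])
    y = tf.placeholder(tf.float32, [None, 10])
    return x, y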
We're going to train a model to look at images and predict what digits they are.
A function $M: \mathbb{R}^{28\times 28}\rightarrow \mathbb{R}^{10}$ outputs a classification score for each of the ten digit classes. In other words, $M(\text{image})=\text{a vector of per-class scores}$. We want a higher score for class $c$ to translate into higher confidence that $c$ is the correct class.
For example, if $M$ outputs $$ (0.05, 0.03, 0.82, 0.02, 0.01, 0.02, 0.01, 0.02, 0.01, 0.01) $$ for an input image, it classifies that image as a $2$.
Let us choose a very simple classification model first: $$ M(\mathbf{x})= \mathbf{x}\cdot\mathbf{W} + \mathbf{b} , $$ where $\mathbf{x}\in\mathbb{R}^{784}$ is a vectorized input image, and $\mathbf{W}\in\mathbb{R}^{784\times 10}$ and $\mathbf{b}\in\mathbb{R}^{10}$ are the model parameters. The elements of $M(\mathbf{x})$ are sometimes called logits.
In [ ]:
def build_affine(x):
    # x*W + b
    pass
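A possible sketch of the affine model follows. The initialization with small truncated-normal values is an assumption of this sketch; the text below only says the initial values are random.
In [ ]:
def build_affine(x):
    # x*W + b: one score (logit) per class
    W = tf.Variable(tf.truncated_normal([784, 10], stddev=0.1))
    b = tf.Variable(tf.truncated_normal([10], stddev=0.1))
    return tf.matmul(x, W) + b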
Initially, $\mathbf{W}$ and $\mathbf{b}$ contain random values that will not produce correct classification results.
We have to tune these parameters by minimizing an appropriate loss function that "measures" the quality of the classification.
We will use the cross-entropy criterion: $$ L(\mathbf{x}, c)= -\log p_c(\mathbf{x}) , $$ where $p_c(\mathbf{x})$ is the probability assigned by the model that $\mathbf{x}$ belongs to class $c$, $$ p_c(\mathbf{x})= \frac{e^{l_c}}{\sum_{j=0}^{9} e^{l_j}} , $$ and $(l_0, l_1, \ldots, l_9)=M(\mathbf{x})$ are the logits output by the model.
The derivatives can now be computed by TensorFlow and the model can be tuned with stochastic gradient descent ($k=0, 1, 2, \ldots$): $$ \mathbf{W}_{k+1}= \mathbf{W}_k - \eta\frac{\partial L}{\partial\mathbf{W}_k} $$ $$ \mathbf{b}_{k+1}= \mathbf{b}_k - \eta\frac{\partial L}{\partial\mathbf{b}_k} $$
In [ ]:
def build_loss(logits, y):
    # cross-entropy loss computed from the logits and the one-hot labels
    pass
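A sketch of the loss graph: it assumes one-hot labels y and uses TensorFlow's fused softmax/cross-entropy op, which computes $-\log p_c(\mathbf{x})$ directly from the logits in a numerically stable way. Taking the mean over the first dimension means the same code also handles the batched case discussed below.
In [ ]:
def build_loss(logits, y):
    # per-sample cross-entropy, then the mean over the batch
    xent = tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits)
    return tf.reduce_mean(xent)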
The loss $L$ is usually approximated on a batch of images. The code above can handle this case as well. We set the batch size to $100$ in our experiment.
In [ ]:
def build_ncorrect(logits, y):
    # number of correctly classified samples in a batch
    pass

def get_accuracy(ncorrect):
    # classification accuracy on the test set
    pass
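Possible sketches of these two helpers: they assume one-hot labels and that the global sess, x, y and mnist objects defined elsewhere in this notebook are visible (the notebook does not pass them in explicitly).
In [ ]:
def build_ncorrect(logits, y):
    # number of samples in the batch whose highest-scoring class matches the label
    correct = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
    return tf.reduce_sum(tf.cast(correct, tf.float32))

def get_accuracy(ncorrect):
    # fraction of correctly classified test images (uses the global sess, x, y, mnist)
    n = sess.run(ncorrect, feed_dict={x: mnist.test.images, y: mnist.test.labels})
    return n / mnist.test.num_examples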
In [ ]:
def run_training_loop(step, accuracy, batchsize, niters):
    # repeatedly run the optimization step on random training batches and report accuracy
    pass
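A sketch of the training loop. It assumes the second argument is the node built by build_ncorrect (as in the calls later in this notebook), that the global sess, x, y and mnist objects are visible, and that reporting the test accuracy every 1000 iterations is acceptable; these are choices of this sketch, not requirements of the notebook.
In [ ]:
def run_training_loop(step, ncorrect, batchsize, niters):
    for i in range(niters):
        # draw a random training batch and run one optimization step on it
        images, labels = mnist.train.next_batch(batchsize)
        sess.run(step, feed_dict={x: images, y: labels})
        # periodically report the test-set accuracy
        if i % 1000 == 0:
            print('iter %05d: test accuracy = %.4f' % (i, get_accuracy(ncorrect)))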
In [ ]:
# prepare computation graphs
# (insert code below)
# final preparations for learning
sess = tf.Session()
tf.global_variables_initializer().run(session=sess)
# start the learning process: batch size=100, number of iterations=10000
# (insert code below)
# clear the current session (so we can start another one later)
sess.close()
tf.reset_default_graph()
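One way to fill in the two gaps above is sketched below (the whole cell is repeated for completeness). tf.train.GradientDescentOptimizer builds exactly the update rule derived earlier; the learning rate of 0.5 and the use of 10001 iterations (so that the last accuracy report falls on iteration 10000) are assumptions of this sketch.
In [ ]:
# prepare computation graphs
x, y = build_inputs()
logits = build_affine(x)
loss = build_loss(logits, y)
ncorrect = build_ncorrect(logits, y)
step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
# final preparations for learning
sess = tf.Session()
tf.global_variables_initializer().run(session=sess)
# start the learning process: batch size=100, number of iterations=10000
run_training_loop(step, ncorrect, 100, 10001)
# clear the current session (so we can start another one later)
sess.close()
tf.reset_default_graph()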
The obtained classification accuracy on the test set should be between 91 and 93 percent. This is a pretty bad result. Can we do better than that?
In [ ]:
def build_convnet(x):
    # a small convolutional network mapping a batch of images to 10 per-class logits
    pass
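The notebook leaves the architecture unspecified; the following is one plausible sketch (two convolution + max-pooling stages followed by a fully connected layer, written with the TF 1.x tf.layers API), not necessarily the architecture originally intended.
In [ ]:
def build_convnet(x):
    # reshape the flat 784-vector back into a 28x28 single-channel image
    h = tf.reshape(x, [-1, 28, 28, 1])
    # two convolution + max-pooling stages
    h = tf.layers.conv2d(h, filters=32, kernel_size=5, padding='same', activation=tf.nn.relu)
    h = tf.layers.max_pooling2d(h, pool_size=2, strides=2)
    h = tf.layers.conv2d(h, filters=64, kernel_size=5, padding='same', activation=tf.nn.relu)
    h = tf.layers.max_pooling2d(h, pool_size=2, strides=2)
    # fully connected part producing the 10 per-class logits
    h = tf.layers.flatten(h)
    h = tf.layers.dense(h, 1024, activation=tf.nn.relu)
    return tf.layers.dense(h, 10)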
In [ ]:
# inputs and model outputs
x, y = build_inputs()
logits = build_convnet(x)
# loss-computation graph
loss = build_loss(logits, y)
# testing-accuracy graph
ncorrect = build_ncorrect(logits, y)
# we use RMSProp to gradually tune the model parameters (an SGD variant with adaptive per-parameter step sizes that usually works better in practice)
step = tf.train.RMSPropOptimizer(1e-3).minimize(loss)
# final preparations for learning
sess = tf.Session()
tf.global_variables_initializer().run(session=sess)
# start the learning process: batch size=100, number of iterations=5000
run_training_loop(step, ncorrect, 100, 5001)
# clear the current session (so we can start another one later)
sess.close()
tf.reset_default_graph()
The obtained classification accuracy should be well over 99%.