MNIST ML Pros tutorial

This notebook is based on the tutorial found here.

This tutorial is very similar to the beginner's tutorial, except for some incremental improvements added at the end to improve the accuracy.

Get the MNIST dataset


In [2]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Download (if necessary) and load MNIST with one-hot encoded labels
mnist = input_data.read_data_sets("/data/MNIST/", one_hot=True)

# An InteractiveSession lets us call .eval() and .run() directly on tensors and ops
sess = tf.InteractiveSession()


Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting /data/MNIST/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting /data/MNIST/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting /data/MNIST/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting /data/MNIST/t10k-labels-idx1-ubyte.gz
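
Before building the model, it can help to peek at what read_data_sets returned. This is a small sketch (not part of the original notebook); the shapes assume the standard MNIST split of 55,000 training and 10,000 test images, each flattened to a 784-element vector.


In [ ]:
# Sketch: inspect the loaded dataset
print(mnist.train.images.shape)  # (55000, 784) flattened 28x28 grayscale images
print(mnist.train.labels.shape)  # (55000, 10)  one-hot labels
print(mnist.test.images.shape)   # (10000, 784)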

Helper functions

Here we define our helper functions for creating weight and bias variables, as well as for our vanilla 2D convolution and max pooling operations.


In [3]:
def weight_variable(shape):
    # Initialize weights with a small amount of noise to break symmetry
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    # Slightly positive bias to help avoid "dead" ReLU neurons
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W):
    # Stride of 1 and 'SAME' padding, so the output keeps the input's spatial size
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    # 2x2 max pooling with stride 2 halves the spatial dimensions
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
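
As a quick sanity check (a sketch, not part of the original tutorial), we can run a dummy tensor through these helpers: with stride 1 and 'SAME' padding, conv2d preserves the spatial size, while max_pool_2x2 halves it.


In [ ]:
# Sketch: verify the spatial-size behaviour of the helpers on a fake image
dummy = tf.zeros([1, 28, 28, 1])             # one fake 28x28 grayscale image
dummy_conv = conv2d(dummy, tf.zeros([5, 5, 1, 32]))
print(dummy_conv.get_shape())                # (1, 28, 28, 32) - size preserved
print(max_pool_2x2(dummy_conv).get_shape())  # (1, 14, 14, 32) - size halved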

In [4]:
# Set up our input placeholder: flattened 28x28 images
x = tf.placeholder(tf.float32, [None, 784])

# Placeholder for the correct (one-hot) labels
y_ = tf.placeholder(tf.float32, [None, 10])

First Convolution Layer

Our first layer consists of a convolution layer followed by a max pooling layer. It will compute 32 features for each 5x5 patch.

The weight tensor has the shape

[patch_height, patch_width, num_input_channels, num_output_channels]

We reshape our input to a 4D tensor, with the second and third dimensions corresponding to image height and width and the final dimension to the number of color channels (1 for grayscale). The -1 indicates a dimension that is inferred automatically so that the total number of elements in the new tensor matches the original.


In [5]:
W_conv1 = weight_variable([5,5,1,32])
b_conv1 = bias_variable([32])

x_image = tf.reshape(x,[-1,28,28,1])

h_conv1 = tf.nn.relu(conv2d(x_image,W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
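
As an optional check, the static shapes confirm the reshape and the effect of the first layer: the convolution keeps the 28x28 size and adds 32 feature maps, and the pooling halves the spatial dimensions.


In [ ]:
# Optional shape check for the first layer
print(x_image.get_shape())  # (?, 28, 28, 1)
print(h_conv1.get_shape())  # (?, 28, 28, 32)
print(h_pool1.get_shape())  # (?, 14, 14, 32)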

Second Convolution Layer

We create a similar structure, except now we have 32 input channels and compute 64 features for each 5x5 patch.


In [6]:
W_conv2 = weight_variable([5,5,32,64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1,W_conv2)+b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
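
Another optional check: after the second convolution and pooling layer we are down to 7x7 with 64 feature maps, which is what the next section relies on.


In [ ]:
# Optional shape check for the second layer
print(h_pool2.get_shape())  # (?, 7, 7, 64)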

Densely Connected Layer

We have now applied two rounds of convolution and 2x2 max pooling, which have reduced our image size to 7x7, since each 2x2 pooling step halves the spatial dimensions (28 -> 14 -> 7).

But for each 7x7 image we now have 64 feature maps, so we add a fully connected layer with 1024 neurons to allow processing of the entire image.


In [7]:
W_fc1 = weight_variable([7*7*64,1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1,7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat,W_fc1)+b_fc1)
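
Optionally, the shapes confirm the flattening: 7*7*64 = 3136 inputs mapped to 1024 features.


In [ ]:
# Optional shape check for the fully connected layer
print(h_pool2_flat.get_shape())  # (?, 3136)
print(h_fc1.get_shape())         # (?, 1024)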

Dropout Layer

The dropout layer helps to reduce overfitting by randomly dropping units (and their connections) in the densely connected layer during training. This paper has a nice discussion on the matter.


In [8]:
# keep_prob lets us enable dropout during training and disable it (1.0) during evaluation
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
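
A small illustration (not from the original tutorial) of what tf.nn.dropout does: it zeroes each element with probability 1 - keep_prob and scales the survivors by 1/keep_prob, so the expected value of the activations is unchanged.


In [ ]:
# Sketch: dropout on a row of ones with keep_prob=0.5;
# surviving entries are scaled up to 2.0, the rest become 0.0
print(tf.nn.dropout(tf.ones([1, 10]), keep_prob=0.5).eval())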

Readout Layer

We add a final layer that maps the output of our fully connected layer to one logit per class. The softmax itself is applied later, inside the loss function (tf.nn.softmax_cross_entropy_with_logits).


In [9]:
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
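
Note that y_conv holds raw logits. If we want per-class probabilities or a concrete prediction, we can apply softmax or argmax explicitly, as in this sketch.


In [ ]:
# Sketch: turning logits into probabilities / predicted digits
probs = tf.nn.softmax(y_conv)           # shape [batch, 10], rows sum to 1
predicted_class = tf.argmax(y_conv, 1)  # index of the largest logit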

Training

It should be noted that, depending on the CPU available, this could take some time to complete.


In [ ]:
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
sess.run(tf.global_variables_initializer())
for i in range(10000):
  batch = mnist.train.next_batch(50)
  if i % 100 == 0:
    # Report accuracy on the current batch with dropout disabled
    train_accuracy = accuracy.eval(feed_dict={
        x: batch[0], y_: batch[1], keep_prob: 1.0})
    print("step %d, training accuracy %g" % (i, train_accuracy))
  # Train with dropout, keeping half of the fully connected activations
  train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

print("test accuracy %g" % accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))


step 0, training accuracy 0.2
step 100, training accuracy 0.86
step 200, training accuracy 0.92
step 300, training accuracy 0.9
step 400, training accuracy 1
step 500, training accuracy 0.88
step 600, training accuracy 1
step 700, training accuracy 0.96
step 800, training accuracy 0.88
step 900, training accuracy 1
step 1000, training accuracy 0.96
step 1100, training accuracy 1
step 1200, training accuracy 0.94
step 1300, training accuracy 0.98
step 1400, training accuracy 0.98
step 1500, training accuracy 0.98
step 1600, training accuracy 1
step 1700, training accuracy 0.98
step 1800, training accuracy 0.98
step 1900, training accuracy 0.98
step 2000, training accuracy 1
step 2100, training accuracy 1
step 2200, training accuracy 0.98
step 2300, training accuracy 0.98
step 2400, training accuracy 1
step 2500, training accuracy 0.98
step 2600, training accuracy 1
step 2700, training accuracy 0.94
step 2800, training accuracy 1
step 2900, training accuracy 1
step 3000, training accuracy 1
step 3100, training accuracy 1
step 3200, training accuracy 0.94
step 3300, training accuracy 0.98
step 3400, training accuracy 1
step 3500, training accuracy 1
step 3600, training accuracy 0.96
step 3700, training accuracy 0.96
step 3800, training accuracy 0.98
step 3900, training accuracy 0.98
step 4000, training accuracy 0.96
step 4100, training accuracy 1
step 4200, training accuracy 1
step 4300, training accuracy 0.96
step 4400, training accuracy 1
step 4500, training accuracy 0.98
step 4600, training accuracy 1
step 4700, training accuracy 0.98
step 4800, training accuracy 1
step 4900, training accuracy 1
step 5000, training accuracy 1
step 5100, training accuracy 0.98
step 5200, training accuracy 1
step 5300, training accuracy 1
step 5400, training accuracy 0.98
step 5500, training accuracy 0.98
step 5600, training accuracy 1
step 5700, training accuracy 1
step 5800, training accuracy 1
step 5900, training accuracy 1
step 6000, training accuracy 0.98
step 6100, training accuracy 1
step 6200, training accuracy 1
step 6300, training accuracy 1
step 6400, training accuracy 0.98
step 6500, training accuracy 0.98
step 6600, training accuracy 1
step 6700, training accuracy 1
step 6800, training accuracy 1
step 6900, training accuracy 0.98
step 7000, training accuracy 1
step 7100, training accuracy 1
step 7200, training accuracy 1
step 7300, training accuracy 1
step 7400, training accuracy 0.98
step 7500, training accuracy 1
step 7600, training accuracy 1
step 7700, training accuracy 1
step 7800, training accuracy 1
step 7900, training accuracy 0.98
step 8000, training accuracy 1
step 8100, training accuracy 1
step 8200, training accuracy 1
step 8300, training accuracy 1
step 8400, training accuracy 1
step 8500, training accuracy 1
step 8600, training accuracy 1
step 8700, training accuracy 1
step 8800, training accuracy 1
step 8900, training accuracy 0.98
step 9000, training accuracy 1
step 9100, training accuracy 1
step 9200, training accuracy 0.96
step 9300, training accuracy 1
step 9400, training accuracy 1
step 9500, training accuracy 1
step 9600, training accuracy 1
step 9700, training accuracy 0.98
step 9800, training accuracy 1
step 9900, training accuracy 1
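
Feeding all 10,000 test images in a single feed_dict (as above) can be memory-hungry on some machines. A possible alternative (a sketch, assuming mnist.test.next_batch behaves like mnist.train.next_batch) is to average the accuracy over smaller test batches.


In [ ]:
# Sketch: evaluate test accuracy in batches to reduce memory usage
num_batches = 100
acc_total = 0.0
for _ in range(num_batches):
    test_batch = mnist.test.next_batch(100)
    acc_total += accuracy.eval(feed_dict={
        x: test_batch[0], y_: test_batch[1], keep_prob: 1.0})
print("test accuracy (batched) %g" % (acc_total / num_batches))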