Deep Learning

Assignment 3

Previously in 2_fullyconnected.ipynb, you trained a logistic regression and a neural network model.

The goal of this assignment is to explore regularization techniques.


In [1]:
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle

In [2]:
# Some personal imports
import matplotlib.pyplot as plt
%matplotlib inline

First reload the data we generated in notmnist.ipynb.


In [3]:
pickle_file = 'notMNIST.pickle'

with open(pickle_file, 'rb') as f:
  save = pickle.load(f)
  train_dataset = save['train_dataset']
  train_labels = save['train_labels']
  valid_dataset = save['valid_dataset']
  valid_labels = save['valid_labels']
  test_dataset = save['test_dataset']
  test_labels = save['test_labels']
  del save  # hint to help gc free up memory
  print('Training set', train_dataset.shape, train_labels.shape)
  print('Validation set', valid_dataset.shape, valid_labels.shape)
  print('Test set', test_dataset.shape, test_labels.shape)


Training set (200000, 28, 28) (200000,)
Validation set (10000, 28, 28) (10000,)
Test set (10000, 28, 28) (10000,)

Reformat into a shape that's more adapted to the models we're going to train:

  • data as a flat matrix,
  • labels as float 1-hot encodings.

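As a tiny illustration of the one-hot broadcasting trick used in the cell below (toy values, only 3 labels for brevity):

labels = np.array([2, 0])
np.arange(3) == labels[:, None]    # -> [[False, False, True], [True, False, False]]
# ...and .astype(np.float32) turns that into [[0., 0., 1.], [1., 0., 0.]]
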
In [4]:
image_size = 28
num_labels = 10

def reformat(dataset, labels):
  dataset = dataset.reshape((-1, image_size * image_size)).astype(np.float32)
  # Map label 1 to [0.0, 1.0, 0.0, ...], label 2 to [0.0, 0.0, 1.0, ...], etc.
  labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
  return dataset, labels
train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)


Training set (200000, 784) (200000, 10)
Validation set (10000, 784) (10000, 10)
Test set (10000, 784) (10000, 10)

In [5]:
def accuracy(predictions, labels):
  return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
          / predictions.shape[0])

Problem 1

Introduce and tune L2 regularization for both logistic and neural network models. Remember that L2 amounts to adding a penalty on the norm of the weights to the loss. In TensorFlow, you can compute the L2 loss for a tensor t using nn.l2_loss(t). The right amount of regularization should improve your validation / test accuracy.

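As a reminder of what this penalty looks like (a minimal sketch; cross_entropy is a stand-in for the reduce_mean of the softmax cross entropy, and beta_regul / weights are the nodes defined in the cells below), tf.nn.l2_loss(t) returns half the sum of the squared entries of t, so the regularized loss is simply:

# sketch: how the L2 penalty enters the loss
# tf.nn.l2_loss(w) == sum(w ** 2) / 2
loss = cross_entropy + beta_regul * tf.nn.l2_loss(weights)
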

Let's start with the logistic model:


In [6]:
batch_size = 128

graph = tf.Graph()
with graph.as_default():

  # Input data. For the training data, we use a placeholder that will be fed
  # at run time with a training minibatch.
  tf_train_dataset = tf.placeholder(tf.float32,
                                    shape=(batch_size, image_size * image_size))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)
  beta_regul = tf.placeholder(tf.float32)
  
  # Variables.
  weights = tf.Variable(
    tf.truncated_normal([image_size * image_size, num_labels]))
  biases = tf.Variable(tf.zeros([num_labels]))
  
  # Training computation.
  logits = tf.matmul(tf_train_dataset, weights) + biases
  loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels)) + beta_regul * tf.nn.l2_loss(weights)
  
  # Optimizer.
  optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
  
  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(logits)
  valid_prediction = tf.nn.softmax(
    tf.matmul(tf_valid_dataset, weights) + biases)
  test_prediction = tf.nn.softmax(tf.matmul(tf_test_dataset, weights) + biases)

In [7]:
num_steps = 3001

with tf.Session(graph=graph) as session:
  tf.initialize_all_variables().run()
  print("Initialized")
  for step in range(num_steps):
    # Pick an offset within the training data, which has been randomized.
    # Note: we could use better randomization across epochs.
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    # Generate a minibatch.
    batch_data = train_dataset[offset:(offset + batch_size), :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    # Prepare a dictionary telling the session where to feed the minibatch.
    # The key of the dictionary is the placeholder node of the graph to be fed,
    # and the value is the numpy array to feed to it.
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels, beta_regul : 1e-3}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 500 == 0):
      print("Minibatch loss at step %d: %f" % (step, l))
      print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
      print("Validation accuracy: %.1f%%" % accuracy(
        valid_prediction.eval(), valid_labels))
  print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))


Initialized
Minibatch loss at step 0: 19.309465
Minibatch accuracy: 6.2%
Validation accuracy: 12.7%
Minibatch loss at step 500: 2.463279
Minibatch accuracy: 82.8%
Validation accuracy: 76.4%
Minibatch loss at step 1000: 1.775184
Minibatch accuracy: 78.9%
Validation accuracy: 78.3%
Minibatch loss at step 1500: 0.983707
Minibatch accuracy: 85.2%
Validation accuracy: 79.8%
Minibatch loss at step 2000: 0.856673
Minibatch accuracy: 86.7%
Validation accuracy: 80.7%
Minibatch loss at step 2500: 0.862013
Minibatch accuracy: 79.7%
Validation accuracy: 81.2%
Minibatch loss at step 3000: 0.778380
Minibatch accuracy: 82.0%
Validation accuracy: 81.9%
Test accuracy: 88.9%

The L2 regularization introduces a new hyperparameter (beta_regul) that needs to be tuned. Since I have no idea what the right value is, I will plot the accuracy against this parameter (on a logarithmic scale).


In [8]:
num_steps = 3001
regul_val = [pow(10, i) for i in np.arange(-4, -2, 0.1)]
accuracy_val = []

for regul in regul_val:
  with tf.Session(graph=graph) as session:
    tf.initialize_all_variables().run()
    for step in range(num_steps):
      # Pick an offset within the training data, which has been randomized.
      # Note: we could use better randomization across epochs.
      offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
      # Generate a minibatch.
      batch_data = train_dataset[offset:(offset + batch_size), :]
      batch_labels = train_labels[offset:(offset + batch_size), :]
      # Prepare a dictionary telling the session where to feed the minibatch.
      # The key of the dictionary is the placeholder node of the graph to be fed,
      # and the value is the numpy array to feed to it.
      feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels, beta_regul : regul}
      _, l, predictions = session.run(
        [optimizer, loss, train_prediction], feed_dict=feed_dict)
    accuracy_val.append(accuracy(test_prediction.eval(), test_labels))

In [9]:
plt.semilogx(regul_val, accuracy_val)
plt.grid(True)
plt.title('Test accuracy by regularization (logistic)')
plt.show()


Let's see if the same technique improves the predictions of the one-hidden-layer neural network:


In [10]:
batch_size = 128
num_hidden_nodes = 1024

graph = tf.Graph()
with graph.as_default():

  # Input data. For the training data, we use a placeholder that will be fed
  # at run time with a training minibatch.
  tf_train_dataset = tf.placeholder(tf.float32,
                                    shape=(batch_size, image_size * image_size))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)
  beta_regul = tf.placeholder(tf.float32)
  
  # Variables.
  weights1 = tf.Variable(
    tf.truncated_normal([image_size * image_size, num_hidden_nodes]))
  biases1 = tf.Variable(tf.zeros([num_hidden_nodes]))
  weights2 = tf.Variable(
    tf.truncated_normal([num_hidden_nodes, num_labels]))
  biases2 = tf.Variable(tf.zeros([num_labels]))
  
  # Training computation.
  lay1_train = tf.nn.relu(tf.matmul(tf_train_dataset, weights1) + biases1)
  logits = tf.matmul(lay1_train, weights2) + biases2
  loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels)) + \
      beta_regul * (tf.nn.l2_loss(weights1) + tf.nn.l2_loss(weights2))
  
  # Optimizer.
  optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
  
  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(logits)
  lay1_valid = tf.nn.relu(tf.matmul(tf_valid_dataset, weights1) + biases1)
  valid_prediction = tf.nn.softmax(tf.matmul(lay1_valid, weights2) + biases2)
  lay1_test = tf.nn.relu(tf.matmul(tf_test_dataset, weights1) + biases1)
  test_prediction = tf.nn.softmax(tf.matmul(lay1_test, weights2) + biases2)

In [11]:
num_steps = 3001

with tf.Session(graph=graph) as session:
  tf.initialize_all_variables().run()
  print("Initialized")
  for step in range(num_steps):
    # Pick an offset within the training data, which has been randomized.
    # Note: we could use better randomization across epochs.
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    # Generate a minibatch.
    batch_data = train_dataset[offset:(offset + batch_size), :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    # Prepare a dictionary telling the session where to feed the minibatch.
    # The key of the dictionary is the placeholder node of the graph to be fed,
    # and the value is the numpy array to feed to it.
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels, beta_regul : 1e-3}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 500 == 0):
      print("Minibatch loss at step %d: %f" % (step, l))
      print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
      print("Validation accuracy: %.1f%%" % accuracy(
        valid_prediction.eval(), valid_labels))
  print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))


Initialized
Minibatch loss at step 0: 590.374023
Minibatch accuracy: 12.5%
Validation accuracy: 25.4%
Minibatch loss at step 500: 199.734955
Minibatch accuracy: 80.5%
Validation accuracy: 78.8%
Minibatch loss at step 1000: 116.521393
Minibatch accuracy: 78.9%
Validation accuracy: 81.0%
Minibatch loss at step 1500: 68.802231
Minibatch accuracy: 90.6%
Validation accuracy: 82.9%
Minibatch loss at step 2000: 41.379978
Minibatch accuracy: 89.8%
Validation accuracy: 84.4%
Minibatch loss at step 2500: 25.250950
Minibatch accuracy: 86.7%
Validation accuracy: 85.2%
Minibatch loss at step 3000: 15.515349
Minibatch accuracy: 86.7%
Validation accuracy: 86.5%
Test accuracy: 93.3%

Finally something above 90%! I will also plot the final accuracy against the L2 parameter to find the best value.


In [12]:
num_steps = 3001
regul_val = [pow(10, i) for i in np.arange(-4, -2, 0.1)]
accuracy_val = []

for regul in regul_val:    
  with tf.Session(graph=graph) as session:
    tf.initialize_all_variables().run()
    for step in range(num_steps):
      # Pick an offset within the training data, which has been randomized.
      # Note: we could use better randomization across epochs.
      offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
      # Generate a minibatch.
      batch_data = train_dataset[offset:(offset + batch_size), :]
      batch_labels = train_labels[offset:(offset + batch_size), :]
      # Prepare a dictionary telling the session where to feed the minibatch.
      # The key of the dictionary is the placeholder node of the graph to be fed,
      # and the value is the numpy array to feed to it.
      feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels, beta_regul : regul}
      _, l, predictions = session.run(
        [optimizer, loss, train_prediction], feed_dict=feed_dict)
    accuracy_val.append(accuracy(test_prediction.eval(), test_labels))

In [13]:
plt.semilogx(regul_val, accuracy_val)
plt.grid(True)
plt.title('Test accuracy by regularization (1-layer net)')
plt.show()



Problem 2

Let's demonstrate an extreme case of overfitting. Restrict your training data to just a few batches. What happens?


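One simple way to do this (and the approach taken in the training loop below) is to cycle the batch offset over a handful of fixed positions instead of sweeping through the whole training set:

# restrict training to num_batches distinct minibatches
num_batches = 3
offset = ((step % num_batches) * batch_size) % (train_labels.shape[0] - batch_size)
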

In [14]:
batch_size = 128
num_hidden_nodes = 1024

graph = tf.Graph()
with graph.as_default():

  # Input data. For the training data, we use a placeholder that will be fed
  # at run time with a training minibatch.
  tf_train_dataset = tf.placeholder(tf.float32,
                                    shape=(batch_size, image_size * image_size))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)
  beta_regul = tf.placeholder(tf.float32)
  
  # Variables.
  weights1 = tf.Variable(
    tf.truncated_normal([image_size * image_size, num_hidden_nodes]))
  biases1 = tf.Variable(tf.zeros([num_hidden_nodes]))
  weights2 = tf.Variable(
    tf.truncated_normal([num_hidden_nodes, num_labels]))
  biases2 = tf.Variable(tf.zeros([num_labels]))
  
  # Training computation.
  lay1_train = tf.nn.relu(tf.matmul(tf_train_dataset, weights1) + biases1)
  logits = tf.matmul(lay1_train, weights2) + biases2
  loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels))
  
  # Optimizer.
  optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
  
  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(logits)
  lay1_valid = tf.nn.relu(tf.matmul(tf_valid_dataset, weights1) + biases1)
  valid_prediction = tf.nn.softmax(tf.matmul(lay1_valid, weights2) + biases2)
  lay1_test = tf.nn.relu(tf.matmul(tf_test_dataset, weights1) + biases1)
  test_prediction = tf.nn.softmax(tf.matmul(lay1_test, weights2) + biases2)

In [15]:
num_steps = 101
num_batches = 3

with tf.Session(graph=graph) as session:
  tf.initialize_all_variables().run()
  print("Initialized")
  for step in range(num_steps):
    # Pick an offset within the training data, which has been randomized.
    # Note: we could use better randomization across epochs.
    #offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    offset = ((step % num_batches) * batch_size) % (train_labels.shape[0] - batch_size)
    # Generate a minibatch.
    batch_data = train_dataset[offset:(offset + batch_size), :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    # Prepare a dictionary telling the session where to feed the minibatch.
    # The key of the dictionary is the placeholder node of the graph to be fed,
    # and the value is the numpy array to feed to it.
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels, beta_regul : 1e-3}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 2 == 0):
      print("Minibatch loss at step %d: %f" % (step, l))
      print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
      print("Validation accuracy: %.1f%%" % accuracy(
        valid_prediction.eval(), valid_labels))
  print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))


Initialized
Minibatch loss at step 0: 370.972351
Minibatch accuracy: 8.6%
Validation accuracy: 25.9%
Minibatch loss at step 2: 952.214539
Minibatch accuracy: 36.7%
Validation accuracy: 40.9%
Minibatch loss at step 4: 344.697205
Minibatch accuracy: 62.5%
Validation accuracy: 53.6%
Minibatch loss at step 6: 10.702987
Minibatch accuracy: 95.3%
Validation accuracy: 58.9%
Minibatch loss at step 8: 5.507030
Minibatch accuracy: 96.9%
Validation accuracy: 58.8%
Minibatch loss at step 10: 6.428697
Minibatch accuracy: 99.2%
Validation accuracy: 59.7%
Minibatch loss at step 12: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 14: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 16: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 18: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 20: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 22: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 24: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 26: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 28: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 30: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 32: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 34: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 36: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 38: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 40: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 42: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 44: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 46: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 48: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 50: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 52: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 54: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 56: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 58: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 60: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 62: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 64: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 66: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 68: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 70: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 72: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 74: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 76: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 78: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 80: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 82: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 84: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 86: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 88: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 90: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 92: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 94: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 96: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 98: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Minibatch loss at step 100: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 59.7%
Test accuracy: 66.8%

Since there are far too many parameters and no regularization, the accuracy on the training batches reaches 100%. The generalization capability is poor, as shown by the validation and test accuracies.


Problem 3

Introduce Dropout on the hidden layer of the neural network. Remember: Dropout should only be introduced during training, not evaluation, otherwise your evaluation results would be stochastic as well. TensorFlow provides nn.dropout() for that, but you have to make sure it's only inserted during training.

What happens to our extreme overfitting case?

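One common way to make sure dropout is only active during training (a sketch with assumed names; the graph below instead applies nn.dropout only on the training branch, which achieves the same thing since validation and test use separate prediction nodes) is to feed the keep probability through a placeholder:

keep_prob_ph = tf.placeholder(tf.float32)   # hypothetical placeholder name
lay1_train = tf.nn.relu(tf.matmul(tf_train_dataset, weights1) + biases1)
drop1 = tf.nn.dropout(lay1_train, keep_prob_ph)
# feed {keep_prob_ph: 0.5} while training and {keep_prob_ph: 1.0} when evaluating
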


In [16]:
batch_size = 128
num_hidden_nodes = 1024

graph = tf.Graph()
with graph.as_default():

  # Input data. For the training data, we use a placeholder that will be fed
  # at run time with a training minibatch.
  tf_train_dataset = tf.placeholder(tf.float32,
                                    shape=(batch_size, image_size * image_size))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)
  
  # Variables.
  weights1 = tf.Variable(
    tf.truncated_normal([image_size * image_size, num_hidden_nodes]))
  biases1 = tf.Variable(tf.zeros([num_hidden_nodes]))
  weights2 = tf.Variable(
    tf.truncated_normal([num_hidden_nodes, num_labels]))
  biases2 = tf.Variable(tf.zeros([num_labels]))
  
  # Training computation.
  lay1_train = tf.nn.relu(tf.matmul(tf_train_dataset, weights1) + biases1)
  drop1 = tf.nn.dropout(lay1_train, 0.5)
  logits = tf.matmul(drop1, weights2) + biases2
  loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels))
    
  # Optimizer.
  optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
  
  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(logits)
  lay1_valid = tf.nn.relu(tf.matmul(tf_valid_dataset, weights1) + biases1)
  valid_prediction = tf.nn.softmax(tf.matmul(lay1_valid, weights2) + biases2)
  lay1_test = tf.nn.relu(tf.matmul(tf_test_dataset, weights1) + biases1)
  test_prediction = tf.nn.softmax(tf.matmul(lay1_test, weights2) + biases2)

In [17]:
num_steps = 101
num_batches = 3

with tf.Session(graph=graph) as session:
  tf.initialize_all_variables().run()
  print("Initialized")
  for step in range(num_steps):
    # Pick an offset within the training data, which has been randomized.
    # Note: we could use better randomization across epochs.
    #offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    offset = step % num_batches  # note: offsets 0, 1, 2 give three heavily overlapping minibatches (~130 distinct examples)
    # Generate a minibatch.
    batch_data = train_dataset[offset:(offset + batch_size), :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    # Prepare a dictionary telling the session where to feed the minibatch.
    # The key of the dictionary is the placeholder node of the graph to be fed,
    # and the value is the numpy array to feed to it.
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 2 == 0):
      print("Minibatch loss at step %d: %f" % (step, l))
      print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
      print("Validation accuracy: %.1f%%" % accuracy(
        valid_prediction.eval(), valid_labels))
  print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))


Initialized
Minibatch loss at step 0: 522.686890
Minibatch accuracy: 10.9%
Validation accuracy: 29.1%
Minibatch loss at step 2: 814.265259
Minibatch accuracy: 43.8%
Validation accuracy: 28.4%
Minibatch loss at step 4: 300.448486
Minibatch accuracy: 59.4%
Validation accuracy: 54.8%
Minibatch loss at step 6: 24.725126
Minibatch accuracy: 91.4%
Validation accuracy: 65.4%
Minibatch loss at step 8: 2.053320
Minibatch accuracy: 95.3%
Validation accuracy: 66.7%
Minibatch loss at step 10: 26.185101
Minibatch accuracy: 90.6%
Validation accuracy: 62.2%
Minibatch loss at step 12: 74.086914
Minibatch accuracy: 89.8%
Validation accuracy: 66.1%
Minibatch loss at step 14: 16.961481
Minibatch accuracy: 93.8%
Validation accuracy: 67.2%
Minibatch loss at step 16: 0.000043
Minibatch accuracy: 100.0%
Validation accuracy: 68.5%
Minibatch loss at step 18: 1.931412
Minibatch accuracy: 98.4%
Validation accuracy: 68.1%
Minibatch loss at step 20: 3.458273
Minibatch accuracy: 96.9%
Validation accuracy: 67.9%
Minibatch loss at step 22: 0.269873
Minibatch accuracy: 99.2%
Validation accuracy: 67.5%
Minibatch loss at step 24: 6.727062
Minibatch accuracy: 98.4%
Validation accuracy: 67.4%
Minibatch loss at step 26: 1.342917
Minibatch accuracy: 99.2%
Validation accuracy: 67.0%
Minibatch loss at step 28: 3.533568
Minibatch accuracy: 98.4%
Validation accuracy: 66.1%
Minibatch loss at step 30: 2.286844
Minibatch accuracy: 98.4%
Validation accuracy: 66.2%
Minibatch loss at step 32: 0.303651
Minibatch accuracy: 99.2%
Validation accuracy: 67.0%
Minibatch loss at step 34: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 67.0%
Minibatch loss at step 36: 0.939636
Minibatch accuracy: 99.2%
Validation accuracy: 66.4%
Minibatch loss at step 38: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 66.4%
Minibatch loss at step 40: 0.764459
Minibatch accuracy: 99.2%
Validation accuracy: 65.7%
Minibatch loss at step 42: 0.769536
Minibatch accuracy: 99.2%
Validation accuracy: 67.0%
Minibatch loss at step 44: 2.977767
Minibatch accuracy: 98.4%
Validation accuracy: 68.4%
Minibatch loss at step 46: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 69.5%
Minibatch loss at step 48: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 69.6%
Minibatch loss at step 50: 3.058991
Minibatch accuracy: 98.4%
Validation accuracy: 68.9%
Minibatch loss at step 52: 0.909829
Minibatch accuracy: 98.4%
Validation accuracy: 68.7%
Minibatch loss at step 54: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 68.7%
Minibatch loss at step 56: 1.273322
Minibatch accuracy: 98.4%
Validation accuracy: 68.3%
Minibatch loss at step 58: 2.320458
Minibatch accuracy: 99.2%
Validation accuracy: 68.5%
Minibatch loss at step 60: 0.501477
Minibatch accuracy: 98.4%
Validation accuracy: 68.6%
Minibatch loss at step 62: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 68.7%
Minibatch loss at step 64: 0.263712
Minibatch accuracy: 99.2%
Validation accuracy: 68.9%
Minibatch loss at step 66: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 68.9%
Minibatch loss at step 68: 1.202995
Minibatch accuracy: 99.2%
Validation accuracy: 69.0%
Minibatch loss at step 70: 1.206359
Minibatch accuracy: 99.2%
Validation accuracy: 69.0%
Minibatch loss at step 72: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 69.0%
Minibatch loss at step 74: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 69.0%
Minibatch loss at step 76: 1.197501
Minibatch accuracy: 99.2%
Validation accuracy: 67.9%
Minibatch loss at step 78: 1.862640
Minibatch accuracy: 99.2%
Validation accuracy: 67.9%
Minibatch loss at step 80: 0.206966
Minibatch accuracy: 99.2%
Validation accuracy: 67.4%
Minibatch loss at step 82: 0.051613
Minibatch accuracy: 99.2%
Validation accuracy: 67.3%
Minibatch loss at step 84: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 67.4%
Minibatch loss at step 86: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 67.5%
Minibatch loss at step 88: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 66.7%
Minibatch loss at step 90: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 66.7%
Minibatch loss at step 92: 0.589971
Minibatch accuracy: 99.2%
Validation accuracy: 67.1%
Minibatch loss at step 94: 1.581323
Minibatch accuracy: 98.4%
Validation accuracy: 66.6%
Minibatch loss at step 96: 2.395708
Minibatch accuracy: 98.4%
Validation accuracy: 67.9%
Minibatch loss at step 98: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 67.9%
Minibatch loss at step 100: 0.000000
Minibatch accuracy: 100.0%
Validation accuracy: 66.7%
Test accuracy: 73.7%

The first conclusion is that 100% accuracy on the minibatches is much harder to reach and to keep. As a result, the test accuracy improves by about 7 percentage points (66.8% to 73.7%): the final net generalizes better.


Problem 4

Try to get the best performance you can using a multi-layer model! The best reported test accuracy using a deep network is 97.1%.

One avenue you can explore is to add multiple layers.

Another one is to use learning rate decay:

global_step = tf.Variable(0)  # count the number of steps taken.
learning_rate = tf.train.exponential_decay(0.5, global_step, ...)
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)

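For reference, exponential_decay computes initial_rate * decay_rate ** (global_step / decay_steps); with staircase=True the exponent is floored, so the rate drops in discrete jumps. With the values used in the next cell (0.5 decayed by 0.65 every 1000 steps), the learning rate is 0.5 for the first 1000 steps, then 0.325, then about 0.211, and so on.
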

Let's do a first try with two hidden layers. Note how the parameters are initialized, compared to the previous cases.

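Specifically, each weight matrix is now drawn from a truncated normal with stddev = sqrt(2 / n_in), where n_in is the layer's fan-in (the He initialization commonly used with ReLU units), rather than the default stddev of 1.0 of the earlier cells. This keeps the initial activations and losses small: compare the minibatch loss at step 0 below (about 3) with the hundreds seen in the previous runs.
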

In [18]:
batch_size = 128
num_hidden_nodes1 = 1024
num_hidden_nodes2 = 100
beta_regul = 1e-3

graph = tf.Graph()
with graph.as_default():

  # Input data. For the training data, we use a placeholder that will be fed
  # at run time with a training minibatch.
  tf_train_dataset = tf.placeholder(tf.float32,
                                    shape=(batch_size, image_size * image_size))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)
  global_step = tf.Variable(0)

  # Variables.
  weights1 = tf.Variable(
    tf.truncated_normal(
        [image_size * image_size, num_hidden_nodes1],
        stddev=np.sqrt(2.0 / (image_size * image_size)))
    )
  biases1 = tf.Variable(tf.zeros([num_hidden_nodes1]))
  weights2 = tf.Variable(
    tf.truncated_normal([num_hidden_nodes1, num_hidden_nodes2], stddev=np.sqrt(2.0 / num_hidden_nodes1)))
  biases2 = tf.Variable(tf.zeros([num_hidden_nodes2]))
  weights3 = tf.Variable(
    tf.truncated_normal([num_hidden_nodes2, num_labels], stddev=np.sqrt(2.0 / num_hidden_nodes2)))
  biases3 = tf.Variable(tf.zeros([num_labels]))
  
  # Training computation.
  lay1_train = tf.nn.relu(tf.matmul(tf_train_dataset, weights1) + biases1)
  lay2_train = tf.nn.relu(tf.matmul(lay1_train, weights2) + biases2)
  logits = tf.matmul(lay2_train, weights3) + biases3
  loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels)) + \
      beta_regul * (tf.nn.l2_loss(weights1) + tf.nn.l2_loss(weights2) + tf.nn.l2_loss(weights3))
  
  # Optimizer.
  learning_rate = tf.train.exponential_decay(0.5, global_step, 1000, 0.65, staircase=True)
  optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
  
  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(logits)
  lay1_valid = tf.nn.relu(tf.matmul(tf_valid_dataset, weights1) + biases1)
  lay2_valid = tf.nn.relu(tf.matmul(lay1_valid, weights2) + biases2)
  valid_prediction = tf.nn.softmax(tf.matmul(lay2_valid, weights3) + biases3)
  lay1_test = tf.nn.relu(tf.matmul(tf_test_dataset, weights1) + biases1)
  lay2_test = tf.nn.relu(tf.matmul(lay1_test, weights2) + biases2)
  test_prediction = tf.nn.softmax(tf.matmul(lay2_test, weights3) + biases3)

In [19]:
num_steps = 9001

with tf.Session(graph=graph) as session:
  tf.initialize_all_variables().run()
  print("Initialized")
  for step in range(num_steps):
    # Pick an offset within the training data, which has been randomized.
    # Note: we could use better randomization across epochs.
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    # Generate a minibatch.
    batch_data = train_dataset[offset:(offset + batch_size), :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    # Prepare a dictionary telling the session where to feed the minibatch.
    # The key of the dictionary is the placeholder node of the graph to be fed,
    # and the value is the numpy array to feed to it.
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 500 == 0):
      print("Minibatch loss at step %d: %f" % (step, l))
      print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
      print("Validation accuracy: %.1f%%" % accuracy(
        valid_prediction.eval(), valid_labels))
  print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))


Initialized
Minibatch loss at step 0: 3.272147
Minibatch accuracy: 10.9%
Validation accuracy: 34.4%
Minibatch loss at step 500: 0.930104
Minibatch accuracy: 90.6%
Validation accuracy: 85.6%
Minibatch loss at step 1000: 0.904542
Minibatch accuracy: 84.4%
Validation accuracy: 86.6%
Minibatch loss at step 1500: 0.575127
Minibatch accuracy: 93.0%
Validation accuracy: 88.0%
Minibatch loss at step 2000: 0.520965
Minibatch accuracy: 94.5%
Validation accuracy: 88.2%
Minibatch loss at step 2500: 0.531228
Minibatch accuracy: 90.6%
Validation accuracy: 88.6%
Minibatch loss at step 3000: 0.565390
Minibatch accuracy: 88.3%
Validation accuracy: 88.7%
Minibatch loss at step 3500: 0.573201
Minibatch accuracy: 89.1%
Validation accuracy: 89.2%
Minibatch loss at step 4000: 0.445847
Minibatch accuracy: 92.2%
Validation accuracy: 89.0%
Minibatch loss at step 4500: 0.444020
Minibatch accuracy: 90.6%
Validation accuracy: 89.4%
Minibatch loss at step 5000: 0.498980
Minibatch accuracy: 90.6%
Validation accuracy: 89.5%
Minibatch loss at step 5500: 0.493428
Minibatch accuracy: 89.8%
Validation accuracy: 89.6%
Minibatch loss at step 6000: 0.563357
Minibatch accuracy: 88.3%
Validation accuracy: 89.8%
Minibatch loss at step 6500: 0.390322
Minibatch accuracy: 93.0%
Validation accuracy: 89.8%
Minibatch loss at step 7000: 0.506404
Minibatch accuracy: 87.5%
Validation accuracy: 90.0%
Minibatch loss at step 7500: 0.472213
Minibatch accuracy: 89.1%
Validation accuracy: 90.0%
Minibatch loss at step 8000: 0.571431
Minibatch accuracy: 85.9%
Validation accuracy: 90.0%
Minibatch loss at step 8500: 0.409382
Minibatch accuracy: 92.2%
Validation accuracy: 90.1%
Minibatch loss at step 9000: 0.470708
Minibatch accuracy: 89.8%
Validation accuracy: 90.0%
Test accuracy: 95.8%

This is getting really good. Let's try one layer deeper. (Although keep_prob is defined in the next cell, dropout is not actually applied in that graph; it comes back in the final experiment.)


In [20]:
batch_size = 128
num_hidden_nodes1 = 1024
num_hidden_nodes2 = 256
num_hidden_nodes3 = 128
keep_prob = 0.5  # defined here but not used in this graph

graph = tf.Graph()
with graph.as_default():

  # Input data. For the training data, we use a placeholder that will be fed
  # at run time with a training minibatch.
  tf_train_dataset = tf.placeholder(tf.float32,
                                    shape=(batch_size, image_size * image_size))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)
  global_step = tf.Variable(0)

  # Variables.
  weights1 = tf.Variable(
    tf.truncated_normal(
        [image_size * image_size, num_hidden_nodes1],
        stddev=np.sqrt(2.0 / (image_size * image_size)))
    )
  biases1 = tf.Variable(tf.zeros([num_hidden_nodes1]))
  weights2 = tf.Variable(
    tf.truncated_normal([num_hidden_nodes1, num_hidden_nodes2], stddev=np.sqrt(2.0 / num_hidden_nodes1)))
  biases2 = tf.Variable(tf.zeros([num_hidden_nodes2]))
  weights3 = tf.Variable(
    tf.truncated_normal([num_hidden_nodes2, num_hidden_nodes3], stddev=np.sqrt(2.0 / num_hidden_nodes2)))
  biases3 = tf.Variable(tf.zeros([num_hidden_nodes3]))
  weights4 = tf.Variable(
    tf.truncated_normal([num_hidden_nodes3, num_labels], stddev=np.sqrt(2.0 / num_hidden_nodes3)))
  biases4 = tf.Variable(tf.zeros([num_labels]))
  
  # Training computation.
  lay1_train = tf.nn.relu(tf.matmul(tf_train_dataset, weights1) + biases1)
  lay2_train = tf.nn.relu(tf.matmul(lay1_train, weights2) + biases2)
  lay3_train = tf.nn.relu(tf.matmul(lay2_train, weights3) + biases3)
  logits = tf.matmul(lay3_train, weights4) + biases4
  loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels))
  
  # Optimizer.
  learning_rate = tf.train.exponential_decay(0.5, global_step, 4000, 0.65, staircase=True)
  optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
  
  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(logits)
  lay1_valid = tf.nn.relu(tf.matmul(tf_valid_dataset, weights1) + biases1)
  lay2_valid = tf.nn.relu(tf.matmul(lay1_valid, weights2) + biases2)
  lay3_valid = tf.nn.relu(tf.matmul(lay2_valid, weights3) + biases3)
  valid_prediction = tf.nn.softmax(tf.matmul(lay3_valid, weights4) + biases4)
  lay1_test = tf.nn.relu(tf.matmul(tf_test_dataset, weights1) + biases1)
  lay2_test = tf.nn.relu(tf.matmul(lay1_test, weights2) + biases2)
  lay3_test = tf.nn.relu(tf.matmul(lay2_test, weights3) + biases3)
  test_prediction = tf.nn.softmax(tf.matmul(lay3_test, weights4) + biases4)

In [21]:
num_steps = 18001

with tf.Session(graph=graph) as session:
  tf.initialize_all_variables().run()
  print("Initialized")
  for step in range(num_steps):
    # Pick an offset within the training data, which has been randomized.
    # Note: we could use better randomization across epochs.
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    # Generate a minibatch.
    batch_data = train_dataset[offset:(offset + batch_size), :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    # Prepare a dictionary telling the session where to feed the minibatch.
    # The key of the dictionary is the placeholder node of the graph to be fed,
    # and the value is the numpy array to feed to it.
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 500 == 0):
      print("Minibatch loss at step %d: %f" % (step, l))
      print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
      print("Validation accuracy: %.1f%%" % accuracy(
        valid_prediction.eval(), valid_labels))
  print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))


Initialized
Minibatch loss at step 0: 2.427155
Minibatch accuracy: 7.0%
Validation accuracy: 27.4%
Minibatch loss at step 500: 0.363047
Minibatch accuracy: 89.8%
Validation accuracy: 85.8%
Minibatch loss at step 1000: 0.466222
Minibatch accuracy: 85.9%
Validation accuracy: 86.2%
Minibatch loss at step 1500: 0.249981
Minibatch accuracy: 92.2%
Validation accuracy: 87.8%
Minibatch loss at step 2000: 0.246187
Minibatch accuracy: 94.5%
Validation accuracy: 88.7%
Minibatch loss at step 2500: 0.279155
Minibatch accuracy: 91.4%
Validation accuracy: 88.4%
Minibatch loss at step 3000: 0.340918
Minibatch accuracy: 86.7%
Validation accuracy: 89.0%
Minibatch loss at step 3500: 0.344907
Minibatch accuracy: 88.3%
Validation accuracy: 89.1%
Minibatch loss at step 4000: 0.252765
Minibatch accuracy: 93.0%
Validation accuracy: 89.3%
Minibatch loss at step 4500: 0.248396
Minibatch accuracy: 91.4%
Validation accuracy: 89.6%
Minibatch loss at step 5000: 0.309714
Minibatch accuracy: 92.2%
Validation accuracy: 89.9%
Minibatch loss at step 5500: 0.205931
Minibatch accuracy: 93.0%
Validation accuracy: 89.8%
Minibatch loss at step 6000: 0.344032
Minibatch accuracy: 90.6%
Validation accuracy: 90.0%
Minibatch loss at step 6500: 0.167668
Minibatch accuracy: 94.5%
Validation accuracy: 90.0%
Minibatch loss at step 7000: 0.291468
Minibatch accuracy: 90.6%
Validation accuracy: 90.2%
Minibatch loss at step 7500: 0.183530
Minibatch accuracy: 93.0%
Validation accuracy: 90.3%
Minibatch loss at step 8000: 0.275425
Minibatch accuracy: 93.8%
Validation accuracy: 90.4%
Minibatch loss at step 8500: 0.143154
Minibatch accuracy: 95.3%
Validation accuracy: 90.5%
Minibatch loss at step 9000: 0.174426
Minibatch accuracy: 95.3%
Validation accuracy: 90.2%
Minibatch loss at step 9500: 0.191256
Minibatch accuracy: 95.3%
Validation accuracy: 90.6%
Minibatch loss at step 10000: 0.177660
Minibatch accuracy: 93.0%
Validation accuracy: 90.5%
Minibatch loss at step 10500: 0.156403
Minibatch accuracy: 95.3%
Validation accuracy: 90.5%
Minibatch loss at step 11000: 0.076319
Minibatch accuracy: 97.7%
Validation accuracy: 90.5%
Minibatch loss at step 11500: 0.141267
Minibatch accuracy: 96.1%
Validation accuracy: 90.6%
Minibatch loss at step 12000: 0.126884
Minibatch accuracy: 96.1%
Validation accuracy: 90.7%
Minibatch loss at step 12500: 0.100883
Minibatch accuracy: 98.4%
Validation accuracy: 90.5%
Minibatch loss at step 13000: 0.167142
Minibatch accuracy: 94.5%
Validation accuracy: 90.7%
Minibatch loss at step 13500: 0.077268
Minibatch accuracy: 97.7%
Validation accuracy: 90.8%
Minibatch loss at step 14000: 0.110395
Minibatch accuracy: 96.9%
Validation accuracy: 90.8%
Minibatch loss at step 14500: 0.094239
Minibatch accuracy: 95.3%
Validation accuracy: 90.9%
Minibatch loss at step 15000: 0.073499
Minibatch accuracy: 98.4%
Validation accuracy: 90.7%
Minibatch loss at step 15500: 0.083872
Minibatch accuracy: 98.4%
Validation accuracy: 90.9%
Minibatch loss at step 16000: 0.030450
Minibatch accuracy: 100.0%
Validation accuracy: 90.8%
Minibatch loss at step 16500: 0.058453
Minibatch accuracy: 98.4%
Validation accuracy: 90.9%
Minibatch loss at step 17000: 0.026212
Minibatch accuracy: 99.2%
Validation accuracy: 90.9%
Minibatch loss at step 17500: 0.013765
Minibatch accuracy: 100.0%
Validation accuracy: 90.9%
Minibatch loss at step 18000: 0.056984
Minibatch accuracy: 98.4%
Validation accuracy: 90.9%
Test accuracy: 96.3%

Huge! That's my best score on this dataset. I have also tried more parameters (and dropout between the hidden layers), but it does not help:


In [22]:
batch_size = 128
num_hidden_nodes1 = 1024
num_hidden_nodes2 = 512
num_hidden_nodes3 = 256
keep_prob = 0.5

graph = tf.Graph()
with graph.as_default():

  # Input data. For the training data, we use a placeholder that will be fed
  # at run time with a training minibatch.
  tf_train_dataset = tf.placeholder(tf.float32,
                                    shape=(batch_size, image_size * image_size))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)
  global_step = tf.Variable(0)

  # Variables.
  weights1 = tf.Variable(
    tf.truncated_normal(
        [image_size * image_size, num_hidden_nodes1],
        stddev=np.sqrt(2.0 / (image_size * image_size)))
    )
  biases1 = tf.Variable(tf.zeros([num_hidden_nodes1]))
  weights2 = tf.Variable(
    tf.truncated_normal([num_hidden_nodes1, num_hidden_nodes2], stddev=np.sqrt(2.0 / num_hidden_nodes1)))
  biases2 = tf.Variable(tf.zeros([num_hidden_nodes2]))
  weights3 = tf.Variable(
    tf.truncated_normal([num_hidden_nodes2, num_hidden_nodes3], stddev=np.sqrt(2.0 / num_hidden_nodes2)))
  biases3 = tf.Variable(tf.zeros([num_hidden_nodes3]))
  weights4 = tf.Variable(
    tf.truncated_normal([num_hidden_nodes3, num_labels], stddev=np.sqrt(2.0 / num_hidden_nodes3)))
  biases4 = tf.Variable(tf.zeros([num_labels]))
  
  # Training computation.
  lay1_train = tf.nn.relu(tf.matmul(tf_train_dataset, weights1) + biases1)
  drop1 = tf.nn.dropout(lay1_train, keep_prob)
  lay2_train = tf.nn.relu(tf.matmul(drop1, weights2) + biases2)
  drop2 = tf.nn.dropout(lay2_train, keep_prob)
  lay3_train = tf.nn.relu(tf.matmul(drop2, weights3) + biases3)
  drop3 = tf.nn.dropout(lay3_train, keep_prob)
  logits = tf.matmul(drop3, weights4) + biases4
  loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels))
    
  # Optimizer.
  learning_rate = tf.train.exponential_decay(0.5, global_step, 5000, 0.80, staircase=True)
  optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
  
  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(logits)
  lay1_valid = tf.nn.relu(tf.matmul(tf_valid_dataset, weights1) + biases1)
  lay2_valid = tf.nn.relu(tf.matmul(lay1_valid, weights2) + biases2)
  lay3_valid = tf.nn.relu(tf.matmul(lay2_valid, weights3) + biases3)
  valid_prediction = tf.nn.softmax(tf.matmul(lay3_valid, weights4) + biases4)
  lay1_test = tf.nn.relu(tf.matmul(tf_test_dataset, weights1) + biases1)
  lay2_test = tf.nn.relu(tf.matmul(lay1_test, weights2) + biases2)
  lay3_test = tf.nn.relu(tf.matmul(lay2_test, weights3) + biases3)
  test_prediction = tf.nn.softmax(tf.matmul(lay3_test, weights4) + biases4)

In [23]:
num_steps = 20001

with tf.Session(graph=graph) as session:
  tf.initialize_all_variables().run()
  print("Initialized")
  for step in range(num_steps):
    # Pick an offset within the training data, which has been randomized.
    # Note: we could use better randomization across epochs.
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    # Generate a minibatch.
    batch_data = train_dataset[offset:(offset + batch_size), :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    # Prepare a dictionary telling the session where to feed the minibatch.
    # The key of the dictionary is the placeholder node of the graph to be fed,
    # and the value is the numpy array to feed to it.
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 500 == 0):
      print("Minibatch loss at step %d: %f" % (step, l))
      print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
      print("Validation accuracy: %.1f%%" % accuracy(
        valid_prediction.eval(), valid_labels))
  print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))


Initialized
Minibatch loss at step 0: 2.644048
Minibatch accuracy: 10.2%
Validation accuracy: 22.9%
Minibatch loss at step 500: 0.505960
Minibatch accuracy: 85.2%
Validation accuracy: 84.5%
Minibatch loss at step 1000: 0.571871
Minibatch accuracy: 80.5%
Validation accuracy: 85.0%
Minibatch loss at step 1500: 0.519562
Minibatch accuracy: 85.2%
Validation accuracy: 85.4%
Minibatch loss at step 2000: 0.388242
Minibatch accuracy: 89.8%
Validation accuracy: 86.4%
Minibatch loss at step 2500: 0.469020
Minibatch accuracy: 82.8%
Validation accuracy: 86.6%
Minibatch loss at step 3000: 0.533019
Minibatch accuracy: 83.6%
Validation accuracy: 86.6%
Minibatch loss at step 3500: 0.550292
Minibatch accuracy: 84.4%
Validation accuracy: 87.2%
Minibatch loss at step 4000: 0.479638
Minibatch accuracy: 86.7%
Validation accuracy: 87.4%
Minibatch loss at step 4500: 0.430816
Minibatch accuracy: 88.3%
Validation accuracy: 87.2%
Minibatch loss at step 5000: 0.413097
Minibatch accuracy: 88.3%
Validation accuracy: 87.7%
Minibatch loss at step 5500: 0.483560
Minibatch accuracy: 82.8%
Validation accuracy: 87.8%
Minibatch loss at step 6000: 0.562747
Minibatch accuracy: 82.0%
Validation accuracy: 88.3%
Minibatch loss at step 6500: 0.346888
Minibatch accuracy: 90.6%
Validation accuracy: 88.3%
Minibatch loss at step 7000: 0.523011
Minibatch accuracy: 81.2%
Validation accuracy: 88.3%
Minibatch loss at step 7500: 0.518974
Minibatch accuracy: 84.4%
Validation accuracy: 88.6%
Minibatch loss at step 8000: 0.692198
Minibatch accuracy: 80.5%
Validation accuracy: 88.8%
Minibatch loss at step 8500: 0.438252
Minibatch accuracy: 89.8%
Validation accuracy: 88.6%
Minibatch loss at step 9000: 0.436238
Minibatch accuracy: 86.7%
Validation accuracy: 88.8%
Minibatch loss at step 9500: 0.430096
Minibatch accuracy: 85.9%
Validation accuracy: 88.9%
Minibatch loss at step 10000: 0.506851
Minibatch accuracy: 85.9%
Validation accuracy: 88.8%
Minibatch loss at step 10500: 0.352449
Minibatch accuracy: 91.4%
Validation accuracy: 89.2%
Minibatch loss at step 11000: 0.386867
Minibatch accuracy: 91.4%
Validation accuracy: 89.3%
Minibatch loss at step 11500: 0.369807
Minibatch accuracy: 86.7%
Validation accuracy: 89.3%
Minibatch loss at step 12000: 0.622503
Minibatch accuracy: 85.2%
Validation accuracy: 89.4%
Minibatch loss at step 12500: 0.330038
Minibatch accuracy: 90.6%
Validation accuracy: 89.4%
Minibatch loss at step 13000: 0.437459
Minibatch accuracy: 87.5%
Validation accuracy: 89.4%
Minibatch loss at step 13500: 0.383894
Minibatch accuracy: 89.1%
Validation accuracy: 89.4%
Minibatch loss at step 14000: 0.422878
Minibatch accuracy: 84.4%
Validation accuracy: 89.6%
Minibatch loss at step 14500: 0.470360
Minibatch accuracy: 85.9%
Validation accuracy: 89.7%
Minibatch loss at step 15000: 0.400381
Minibatch accuracy: 89.1%
Validation accuracy: 89.6%
Minibatch loss at step 15500: 0.422781
Minibatch accuracy: 86.7%
Validation accuracy: 89.6%
Minibatch loss at step 16000: 0.276475
Minibatch accuracy: 91.4%
Validation accuracy: 89.8%
Minibatch loss at step 16500: 0.233879
Minibatch accuracy: 93.8%
Validation accuracy: 89.9%
Minibatch loss at step 17000: 0.289002
Minibatch accuracy: 92.2%
Validation accuracy: 89.7%
Minibatch loss at step 17500: 0.200542
Minibatch accuracy: 93.8%
Validation accuracy: 90.0%
Minibatch loss at step 18000: 0.277440
Minibatch accuracy: 93.0%
Validation accuracy: 89.9%
Minibatch loss at step 18500: 0.352895
Minibatch accuracy: 86.7%
Validation accuracy: 89.9%
Minibatch loss at step 19000: 0.293568
Minibatch accuracy: 90.6%
Validation accuracy: 90.1%
Minibatch loss at step 19500: 0.369922
Minibatch accuracy: 89.8%
Validation accuracy: 89.9%
Minibatch loss at step 20000: 0.426287
Minibatch accuracy: 85.2%
Validation accuracy: 90.2%
Test accuracy: 95.6%