Deep Learning

Assignment 3

Previously in 2_fullyconnected.ipynb, you trained a logistic regression and a neural network model.

The goal of this assignment is to explore regularization techniques.


In [1]:
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle

First reload the data we generated in 1_notmnist.ipynb.


In [2]:
pickle_file = 'notMNIST.pickle'

with open(pickle_file, 'rb') as f:
  save = pickle.load(f)
  train_dataset = save['train_dataset']
  train_labels = save['train_labels']
  valid_dataset = save['valid_dataset']
  valid_labels = save['valid_labels']
  test_dataset = save['test_dataset']
  test_labels = save['test_labels']
  del save  # hint to help gc free up memory
  print('Training set', train_dataset.shape, train_labels.shape)
  print('Validation set', valid_dataset.shape, valid_labels.shape)
  print('Test set', test_dataset.shape, test_labels.shape)


Training set (200000, 28, 28) (200000,)
Validation set (10000, 28, 28) (10000,)
Test set (10000, 28, 28) (10000,)

Reformat into a shape that's more adapted to the models we're going to train:

  • data as a flat matrix,
  • labels as float 1-hot encodings.

In [3]:
image_size = 28
num_labels = 10

def reformat(dataset, labels):
  dataset = dataset.reshape((-1, image_size * image_size)).astype(np.float32)
  # Map 1 to [0.0, 1.0, 0.0 ...], 2 to [0.0, 0.0, 1.0 ...]
  labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
  return dataset, labels
train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)


Training set (200000, 784) (200000, 10)
Validation set (10000, 784) (10000, 10)
Test set (10000, 784) (10000, 10)

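Note:

np.arange(num_labels) == labels[:,None]

This comparison relies on NumPy broadcasting. labels[:,None] has shape (N, 1) and np.arange(num_labels) has shape (num_labels,), so the result is an (N, num_labels) boolean matrix whose row i is True only in column labels[i]. Casting it to float32 gives the 1-hot encoding.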
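
The comparison above can be tried on a tiny example. The sketch below uses three made-up labels and a hypothetical `num_classes` of 4, both chosen only to keep the output readable:

```python
import numpy as np

num_classes = 4                   # hypothetical, small for readability
labels = np.array([2, 0, 3])      # made-up integer class labels, shape (3,)

# labels[:, None] has shape (3, 1); np.arange(num_classes) has shape (4,).
# Broadcasting compares every label with every class index -> shape (3, 4).
mask = np.arange(num_classes) == labels[:, None]
one_hot = mask.astype(np.float32)

print(one_hot)
```

Taking `one_hot.argmax(axis=1)` recovers the original integer labels, which is what the accuracy function relies on.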


In [6]:
def accuracy(predictions, labels):
  return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
          / predictions.shape[0])

Problem 1

Introduce and tune L2 regularization for both logistic and neural network models. Remember that L2 amounts to adding a penalty on the norm of the weights to the loss. In TensorFlow, you can compute the L2 loss for a tensor t using nn.l2_loss(t). The right amount of regularization should improve your validation / test accuracy.
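
To make the penalty concrete, here is a minimal NumPy sketch of the arithmetic. It assumes the definition of nn.l2_loss (half the sum of squared entries); the helper names `l2_loss` and `regularized_loss`, the weights, and the value of `beta` are made up for illustration:

```python
import numpy as np

def l2_loss(t):
    """Half the sum of squared entries, the same quantity tf.nn.l2_loss computes."""
    return 0.5 * np.sum(np.square(t))

def regularized_loss(data_loss, weight_matrices, beta):
    """Add beta times the summed L2 penalties to an already computed data loss."""
    penalty = sum(l2_loss(w) for w in weight_matrices)
    return data_loss + beta * penalty

# Made-up weights and data loss, only to show the arithmetic.
w1 = np.array([[1.0, -2.0], [0.5, 0.0]])
w2 = np.array([[3.0], [-1.0]])
total = regularized_loss(data_loss=0.7, weight_matrices=[w1, w2], beta=1e-3)
print(total)
```

Because the penalty grows with the number of weights, the useful range of `beta` depends on the layer sizes and has to be tuned against validation accuracy.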



In [34]:
batch_size = 3000
num_hiddens = 50
alpha = 0.1

graph = tf.Graph()
with graph.as_default():
    #input
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size,image_size*image_size))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size,num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
    
    tf_valid_labels = tf.constant(valid_labels)  # not used in the graph below
    tf_test_labels = tf.constant(test_labels)    # not used in the graph below

    #variables
    weights1 = tf.Variable(tf.truncated_normal([image_size*image_size,num_hiddens]))
    biases1 = tf.Variable(tf.zeros([num_hiddens]))
    weights2 = tf.Variable(tf.truncated_normal([num_hiddens, num_labels]))
    biases2 = tf.Variable(tf.zeros([num_labels]))
    
    #training computation
    hiddens1_input = tf.matmul(tf_train_dataset,weights1)+biases1
    hiddens1_output = tf.nn.relu(hiddens1_input)
    logits = tf.matmul(hiddens1_output,weights2)+biases2
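    # Mean cross-entropy plus the L2 penalty on both weight matrices.
    # The penalty is a scalar, so adding it outside reduce_mean gives the same value.
    loss = (tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels))
            + alpha * (tf.nn.l2_loss(weights1) + tf.nn.l2_loss(weights2)))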
    
    #optimizer
    optimizer = tf.train.GradientDescentOptimizer(0.2).minimize(loss)
    
    #predictions
    tf_train_prediction = tf.nn.softmax(logits)
    tf_valid_prediction = tf.nn.softmax(tf.matmul(tf.nn.relu(tf.matmul(tf_valid_dataset,weights1)+biases1),weights2)+biases2)
    tf_test_prediction = tf.nn.softmax(tf.matmul(tf.nn.relu(tf.matmul(tf_test_dataset,weights1)+biases1),weights2)+biases2)
    
# training
num_steps = 6000
with tf.Session(graph=graph) as sess:
    # initialize variables
    init_graph = tf.initialize_all_variables()
    sess.run(init_graph)
    print("Initialized!")
    
    #training iterations
    for step in range(num_steps):
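        # Note: with the offset line commented out, every step reuses the first
        # batch_size training examples instead of stepping through the data.
        #offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[0:batch_size, :]
        batch_labels = train_labels[0:batch_size, :]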
        
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        _, l, predictions = sess.run([optimizer, loss, tf_train_prediction], feed_dict=feed_dict)
        
        if (step % 500 == 0):
            print("Minibatch loss at step %d: %f" % (step, l))
            print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
            print("Validation accuracy: %.1f%%" % accuracy(tf_valid_prediction.eval(), valid_labels))
            print("Test accuracy: %.1f%%" % accuracy(tf_test_prediction.eval(), test_labels))
            print("----------------------------------------")


Initialized!
Minibatch loss at step 0: 1633.720459
Minibatch accuracy: 9.8%
Validation accuracy: 16.5%
Test accuracy: 17.1%
----------------------------------------
Minibatch loss at step 500: 1.332976
Minibatch accuracy: 82.0%
Validation accuracy: 80.0%
Test accuracy: 86.3%
----------------------------------------
Minibatch loss at step 1000: 1.318922
Minibatch accuracy: 81.9%
Validation accuracy: 79.9%
Test accuracy: 86.3%
----------------------------------------
Minibatch loss at step 1500: 1.313562
Minibatch accuracy: 82.2%
Validation accuracy: 80.0%
Test accuracy: 86.4%
----------------------------------------
Minibatch loss at step 2000: 1.310341
Minibatch accuracy: 82.2%
Validation accuracy: 80.0%
Test accuracy: 86.3%
----------------------------------------
Minibatch loss at step 2500: 1.308096
Minibatch accuracy: 82.2%
Validation accuracy: 80.0%
Test accuracy: 86.3%
----------------------------------------
Minibatch loss at step 3000: 1.306499
Minibatch accuracy: 82.2%
Validation accuracy: 79.9%
Test accuracy: 86.3%
----------------------------------------
Minibatch loss at step 3500: 1.305253
Minibatch accuracy: 82.2%
Validation accuracy: 79.9%
Test accuracy: 86.3%
----------------------------------------
KeyboardInterrupt: (run stopped manually during sess.run; traceback omitted)

Problem 2

Let's demonstrate an extreme case of overfitting. Restrict your training data to just a few batches. What happens?
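
One way to restrict training to a few batches is to wrap the step counter so that only the first few slices of the training set are ever visited. A minimal sketch, where `batch_offset` and `num_batches` are hypothetical names introduced for illustration:

```python
def batch_offset(step, batch_size, num_batches):
    """Start index of the minibatch for `step`, cycling over only `num_batches` batches."""
    return (step % num_batches) * batch_size

batch_size = 128      # hypothetical values for illustration
num_batches = 3
offsets = [batch_offset(s, batch_size, num_batches) for s in range(7)]
print(offsets)  # [0, 128, 256, 0, 128, 256, 0]
```

With `num_batches = 1` the offset is always 0, which is the single fixed batch used in the cell below.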



In [31]:
batch_size = 3000
num_hiddens = 50
alpha = 0.1  # unused here: this experiment has no L2 penalty

graph = tf.Graph()
with graph.as_default():
    #input
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size,image_size*image_size))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size,num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
    
    tf_valid_labels = tf.constant(valid_labels)  # not used in the graph below
    tf_test_labels = tf.constant(test_labels)    # not used in the graph below

    #variables
    weights1 = tf.Variable(tf.truncated_normal([image_size*image_size,num_hiddens]))
    biases1 = tf.Variable(tf.zeros([num_hiddens]))
    weights2 = tf.Variable(tf.truncated_normal([num_hiddens, num_labels]))
    biases2 = tf.Variable(tf.zeros([num_labels]))
    
    #training computation
    hiddens1_input = tf.matmul(tf_train_dataset,weights1)+biases1
    hiddens1_output = tf.nn.relu(hiddens1_input)
    logits = tf.matmul(hiddens1_output,weights2)+biases2
    loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels))
    
    #optimizer
    optimizer = tf.train.GradientDescentOptimizer(0.2).minimize(loss)
    
    #predictions
    tf_train_prediction = tf.nn.softmax(logits)
    tf_valid_prediction = tf.nn.softmax(tf.matmul(tf.nn.relu(tf.matmul(tf_valid_dataset,weights1)+biases1),weights2)+biases2)
    tf_test_prediction = tf.nn.softmax(tf.matmul(tf.nn.relu(tf.matmul(tf_test_dataset,weights1)+biases1),weights2)+biases2)
    
# training
num_steps = 6000
with tf.Session(graph=graph) as sess:
    # initialize variables
    init_graph = tf.initialize_all_variables()
    sess.run(init_graph)
    print("Initialized!")
    
    #training iterations
    for step in range(num_steps):
        # Deliberately reuse one fixed batch so the model can memorize it.
        #offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[0:batch_size, :]
        batch_labels = train_labels[0:batch_size, :]
        
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        _, l, predictions = sess.run([optimizer, loss, tf_train_prediction], feed_dict=feed_dict)
        
        if (step % 500 == 0):
            print("Minibatch loss at step %d: %f" % (step, l))
            print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
            print("Validation accuracy: %.1f%%" % accuracy(tf_valid_prediction.eval(), valid_labels))
            print("Test accuracy: %.1f%%" % accuracy(tf_test_prediction.eval(), test_labels))
            print("----------------------------------------")


Initialized!
Minibatch loss at step 0: 83.285378
Minibatch accuracy: 9.9%
Validation accuracy: 16.8%
Test accuracy: 17.2%
----------------------------------------
Minibatch loss at step 500: 0.552913
Minibatch accuracy: 85.7%
Validation accuracy: 71.4%
Test accuracy: 79.1%
----------------------------------------
Minibatch loss at step 1000: 0.378556
Minibatch accuracy: 88.9%
Validation accuracy: 72.2%
Test accuracy: 80.1%
----------------------------------------
Minibatch loss at step 1500: 0.197338
Minibatch accuracy: 95.3%
Validation accuracy: 72.7%
Test accuracy: 80.8%
----------------------------------------
Minibatch loss at step 2000: 0.125837
Minibatch accuracy: 97.2%
Validation accuracy: 72.5%
Test accuracy: 80.6%
----------------------------------------
Minibatch loss at step 2500: 0.080391
Minibatch accuracy: 98.6%
Validation accuracy: 72.7%
Test accuracy: 80.8%
----------------------------------------
Minibatch loss at step 3000: 0.051489
Minibatch accuracy: 99.2%
Validation accuracy: 72.9%
Test accuracy: 81.0%
----------------------------------------
Minibatch loss at step 3500: 0.033196
Minibatch accuracy: 99.6%
Validation accuracy: 73.0%
Test accuracy: 81.0%
----------------------------------------
Minibatch loss at step 4000: 0.022618
Minibatch accuracy: 99.8%
Validation accuracy: 73.1%
Test accuracy: 81.1%
----------------------------------------
Minibatch loss at step 4500: 0.016339
Minibatch accuracy: 99.8%
Validation accuracy: 73.2%
Test accuracy: 81.3%
----------------------------------------
Minibatch loss at step 5000: 0.012354
Minibatch accuracy: 99.9%
Validation accuracy: 73.2%
Test accuracy: 81.2%
----------------------------------------
Minibatch loss at step 5500: 0.009699
Minibatch accuracy: 100.0%
Validation accuracy: 73.2%
Test accuracy: 81.2%
----------------------------------------
Minibatch loss at step 6000: 0.007862
Minibatch accuracy: 100.0%
Validation accuracy: 73.2%
Test accuracy: 81.3%
----------------------------------------
KeyboardInterrupt: (run stopped manually during sess.run; traceback omitted)

Problem 3

Introduce Dropout on the hidden layer of the neural network. Remember: Dropout should only be introduced during training, not evaluation, otherwise your evaluation results would be stochastic as well. TensorFlow provides nn.dropout() for that, but you have to make sure it's only inserted during training.

What happens to our extreme overfitting case?
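
The train-versus-evaluation distinction can be sketched in NumPy. This is an illustration rather than TensorFlow's implementation: it assumes the documented behaviour of nn.dropout (each element is kept with probability keep_prob and scaled by 1 / keep_prob, otherwise set to 0), and the helper names `dropout` and `hidden_layer` and the toy inputs are made up:

```python
import numpy as np

def dropout(activations, keep_prob, rng):
    """Inverted dropout: zero each unit with probability 1 - keep_prob and
    scale the survivors by 1 / keep_prob so the expected value is unchanged."""
    mask = rng.random_sample(activations.shape) < keep_prob
    return activations * mask / keep_prob

def hidden_layer(x, w, b, keep_prob=None, rng=None):
    """ReLU hidden layer; dropout is applied only when keep_prob is given."""
    h = np.maximum(np.dot(x, w) + b, 0.0)
    if keep_prob is not None:
        h = dropout(h, keep_prob, rng)
    return h

rng = np.random.RandomState(0)   # fixed seed so the sketch is repeatable
x = np.ones((4, 3))              # toy inputs
w = np.full((3, 5), 0.5)         # toy weights
b = np.zeros(5)

train_h = hidden_layer(x, w, b, keep_prob=0.5, rng=rng)  # stochastic: training path
eval_h = hidden_layer(x, w, b)                           # deterministic: evaluation path
```

Here every entry of `eval_h` is 1.5, while each entry of `train_h` is either 0 or 3.0, so the training path is noisy but matches the evaluation path in expectation.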



In [35]:
batch_size = 3000
num_hiddens = 50
keep_prob = 0.5

graph = tf.Graph()
with graph.as_default():
    #input
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size,image_size*image_size))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size,num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
    
    tf_valid_labels = tf.constant(valid_labels)  # not used in the graph below
    tf_test_labels = tf.constant(test_labels)    # not used in the graph below

    #variables
    weights1 = tf.Variable(tf.truncated_normal([image_size*image_size,num_hiddens]))
    biases1 = tf.Variable(tf.zeros([num_hiddens]))
    weights2 = tf.Variable(tf.truncated_normal([num_hiddens, num_labels]))
    biases2 = tf.Variable(tf.zeros([num_labels]))
    
    #training computation
    hiddens1_input = tf.matmul(tf_train_dataset,weights1)+biases1
    hiddens1_output = tf.nn.dropout(tf.nn.relu(hiddens1_input),keep_prob)
    logits = tf.matmul(hiddens1_output,weights2)+biases2
    loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels))
    
    #optimizer
    optimizer = tf.train.GradientDescentOptimizer(0.2).minimize(loss)
    
    #predictions
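    # Note: the training prediction reuses the dropout logits, so the reported
    # minibatch accuracy is measured with dropout active. The validation and
    # test predictions below use the full network without dropout.
    tf_train_prediction = tf.nn.softmax(logits)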
    tf_valid_prediction = tf.nn.softmax(tf.matmul(tf.nn.relu(tf.matmul(tf_valid_dataset,weights1)+biases1),weights2)+biases2)
    tf_test_prediction = tf.nn.softmax(tf.matmul(tf.nn.relu(tf.matmul(tf_test_dataset,weights1)+biases1),weights2)+biases2)
    
# training
num_steps = 6000
with tf.Session(graph=graph) as sess:
    # initialize variables
    init_graph = tf.initialize_all_variables()
    sess.run(init_graph)
    print("Initialized!")
    
    #training iterations
    for step in range(num_steps):
        # Same single fixed batch as in Problem 2 (the extreme overfitting case).
        #offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[0:batch_size, :]
        batch_labels = train_labels[0:batch_size, :]
        
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        _, l, predictions = sess.run([optimizer, loss, tf_train_prediction], feed_dict=feed_dict)
        
        if (step % 500 == 0):
            print("Minibatch loss at step %d: %f" % (step, l))
            print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
            print("Validation accuracy: %.1f%%" % accuracy(tf_valid_prediction.eval(), valid_labels))
            print("Test accuracy: %.1f%%" % accuracy(tf_test_prediction.eval(), test_labels))
            print("----------------------------------------")


Initialized!
Minibatch loss at step 0: 97.932686
Minibatch accuracy: 10.9%
Validation accuracy: 20.0%
Test accuracy: 21.1%
----------------------------------------
Minibatch loss at step 500: 1.542884
Minibatch accuracy: 56.3%
Validation accuracy: 66.8%
Test accuracy: 73.5%
----------------------------------------
Minibatch loss at step 1000: 1.260966
Minibatch accuracy: 63.6%
Validation accuracy: 70.8%
Test accuracy: 77.5%
----------------------------------------
Minibatch loss at step 1500: 1.168070
Minibatch accuracy: 64.6%
Validation accuracy: 73.4%
Test accuracy: 79.7%
----------------------------------------
Minibatch loss at step 2000: 1.070042
Minibatch accuracy: 67.6%
Validation accuracy: 72.6%
Test accuracy: 79.0%
----------------------------------------
Minibatch loss at step 2500: 1.030476
Minibatch accuracy: 69.1%
Validation accuracy: 75.9%
Test accuracy: 83.1%
----------------------------------------
Minibatch loss at step 3000: 0.965172
Minibatch accuracy: 70.2%
Validation accuracy: 75.1%
Test accuracy: 81.6%
----------------------------------------
Minibatch loss at step 3500: 0.903344
Minibatch accuracy: 72.5%
Validation accuracy: 76.1%
Test accuracy: 82.9%
----------------------------------------
Minibatch loss at step 4000: 0.845268
Minibatch accuracy: 73.4%
Validation accuracy: 76.9%
Test accuracy: 83.6%
----------------------------------------
Minibatch loss at step 4500: 0.835708
Minibatch accuracy: 74.8%
Validation accuracy: 77.2%
Test accuracy: 84.4%
----------------------------------------
Minibatch loss at step 5000: 0.821380
Minibatch accuracy: 74.3%
Validation accuracy: 76.1%
Test accuracy: 82.6%
----------------------------------------
Minibatch loss at step 5500: 0.729420
Minibatch accuracy: 76.4%
Validation accuracy: 77.2%
Test accuracy: 84.0%
----------------------------------------

Problem 4

Try to get the best performance you can using a multi-layer model! The best reported test accuracy using a deep network is 97.1%.

One avenue you can explore is to add multiple layers.
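
As a rough picture of what adding layers means, here is a NumPy forward pass through a stack of ReLU layers. It is only a sketch: the hidden widths, the helper names `init_layer` and `forward`, and the weight scale sqrt(2 / fan_in) are assumptions made for illustration, not part of the assignment:

```python
import numpy as np

def init_layer(rng, fan_in, fan_out):
    """Weights with stddev sqrt(2 / fan_in) (an assumed, commonly used scale for ReLU layers)."""
    w = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
    b = np.zeros(fan_out)
    return w, b

def forward(x, layers):
    """Apply ReLU after every layer except the last, which returns raw logits."""
    h = x
    for w, b in layers[:-1]:
        h = np.maximum(np.dot(h, w) + b, 0.0)
    w, b = layers[-1]
    return np.dot(h, w) + b

rng = np.random.RandomState(0)
sizes = [784, 256, 64, 10]   # input, two hypothetical hidden widths, 10 classes
layers = [init_layer(rng, m, n) for m, n in zip(sizes[:-1], sizes[1:])]
logits = forward(rng.normal(size=(5, 784)), layers)
print(logits.shape)  # (5, 10)
```

Each extra entry in `sizes` adds one more weight matrix and bias vector, which is the same pattern as adding weights3 and biases3 to the TensorFlow graph.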

Another one is to use learning rate decay:

global_step = tf.Variable(0)  # count the number of steps taken.
learning_rate = tf.train.exponential_decay(0.5, global_step, ...)
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
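
The schedule itself is simple to reproduce by hand. The sketch below assumes the documented formula for tf.train.exponential_decay, learning_rate * decay_rate ^ (global_step / decay_steps), with staircase=True flooring the exponent. The function is a plain-Python stand-in, and the arguments 500 and 0.9 are example values (the ones used in the cell further down):

```python
def exponential_decay(base_rate, step, decay_steps, decay_rate, staircase=False):
    """base_rate * decay_rate ** (step / decay_steps); staircase floors the exponent."""
    if staircase:
        exponent = step // decay_steps
    else:
        exponent = float(step) / decay_steps
    return base_rate * decay_rate ** exponent

# Learning rate at a few steps, decaying by 10% every 500 steps.
for step in (0, 499, 500, 1000):
    print(step, exponential_decay(0.5, step, 500, 0.9, staircase=True))
```

With staircase=True the rate stays constant within each block of 500 steps and drops at the block boundary; without it the rate decays smoothly at every step.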



In [51]:
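# Example: arithmetic on a tf.Variable.
# a + 2 builds a new tensor and rebinds the Python name `a`; the variable
# itself still holds 0 (use tf.assign_add to update a variable in place).
a = tf.Variable(0)
a = a+2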
with tf.Session() as sess:
    init_graph = tf.initialize_all_variables()
    sess.run(init_graph)
    result = sess.run(a)
print(result)


2

In [ ]:
batch_size = 128
num_hiddens1 = 2024
keep_prob = 0.5

graph = tf.Graph()
with graph.as_default():
    #input
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size,image_size*image_size))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size,num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
    
    tf_valid_labels = tf.constant(valid_labels)  # not used in the graph below
    tf_test_labels = tf.constant(test_labels)    # not used in the graph below

    #variables
    weights1 = tf.Variable(tf.truncated_normal([image_size*image_size,num_hiddens1]))
    biases1 = tf.Variable(tf.zeros([num_hiddens1]))
    weights2 = tf.Variable(tf.truncated_normal([num_hiddens1, num_labels]))
    biases2 = tf.Variable(tf.zeros([num_labels]))
    #weights3 = tf.Variable(tf.truncated_normal([num_hiddens2, num_labels]))
    #biases3 = tf.Variable(tf.zeros([num_labels]))
    #weights4 = tf.Variable(tf.truncated_normal([num_hiddens3, num_labels]))
    #biases4 = tf.Variable(tf.zeros([num_labels]))
    
    #training computation
    hiddens1_input = tf.matmul(tf_train_dataset,weights1)+biases1
    hiddens1_output = tf.nn.dropout(tf.nn.relu(hiddens1_input),keep_prob)
    
    hiddens2_input = tf.matmul(hiddens1_output,weights2)+biases2
    #hiddens2_output = tf.nn.relu(hiddens2_input)
    
    #hiddens3_input = tf.matmul(hiddens2_output,weights3)+biases3
    #hiddens3_output = tf.nn.relu(hiddens3_input)
    
    #hiddens4_input = tf.matmul(hiddens3_output,weights4)+biases4
    logits = hiddens2_input
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels))
    
    #optimizer
    global_step = tf.Variable(0)  # count the number of steps taken.
    learning_rate = tf.train.exponential_decay(0.5, global_step,500,0.90,staircase=True)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
    
    #predictions
    tf_train_prediction = tf.nn.softmax(logits)
    
    valid_h1_in = tf.matmul(tf_valid_dataset,weights1)+biases1
    valid_h1_out = tf.nn.relu(valid_h1_in)
    valid_h2_in = tf.matmul(valid_h1_out,weights2)+biases2
    #valid_h2_out = tf.nn.relu(valid_h2_in)
    #valid_h3_in = tf.matmul(valid_h2_out,weights3)+biases3
    #valid_h3_out = tf.nn.relu(valid_h3_in)
    #valid_h4_in = tf.matmul(valid_h3_out,weights4)+biases4
    valid_logits = valid_h2_in
    tf_valid_prediction = tf.nn.softmax(valid_logits)
    
    test_h1_in = tf.matmul(tf_test_dataset,weights1)+biases1
    test_h1_out = tf.nn.relu(test_h1_in)
    test_h2_in = tf.matmul(test_h1_out,weights2)+biases2
    #test_h2_out = tf.nn.relu(test_h2_in)
    #test_h3_in = tf.matmul(test_h2_out,weights3)+biases3
    #test_h3_out = tf.nn.relu(test_h3_in)
    #test_h4_in = tf.matmul(test_h3_out,weights4)+biases4
    test_logits = test_h2_in
    tf_test_prediction = tf.nn.softmax(test_logits)
    
# training
num_steps = 12000
with tf.Session(graph=graph) as sess:
    # initialize variables
    init_graph = tf.initialize_all_variables()
    sess.run(init_graph)
    print("Initialized!")
    
    #training iterations
    for step in range(num_steps):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        lr, _, l, predictions = sess.run([learning_rate, optimizer, loss, tf_train_prediction], feed_dict=feed_dict)
        
        # No manual increment: minimize(..., global_step=global_step) already
        # adds 1 to global_step on every training step.
        
        if (step % 500 == 0):
            print("Learning rate: %0.3f" % lr)
            print("Minibatch loss at step %d: %f" % (step, l))
            print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
            print("Validation accuracy: %.1f%%" % accuracy(tf_valid_prediction.eval(), valid_labels))
            print("Test accuracy: %.1f%%" % accuracy(tf_test_prediction.eval(), test_labels))
            print("----------------------------------------")


Initialized!
Learning rate: 0.500
Minibatch loss at step 0: 689.082703
Minibatch accuracy: 7.8%
Validation accuracy: 33.7%
Test accuracy: 35.9%
----------------------------------------
Learning rate: 0.450
Minibatch loss at step 500: 47.778175
Minibatch accuracy: 77.3%
Validation accuracy: 81.0%
Test accuracy: 87.5%
----------------------------------------
Learning rate: 0.405
Minibatch loss at step 1000: 15.570626
Minibatch accuracy: 80.5%
Validation accuracy: 82.3%
Test accuracy: 88.8%
----------------------------------------
Learning rate: 0.364
Minibatch loss at step 1500: 25.814987
Minibatch accuracy: 80.5%
Validation accuracy: 82.8%
Test accuracy: 89.1%
----------------------------------------
Learning rate: 0.328
Minibatch loss at step 2000: 21.954035
Minibatch accuracy: 78.9%
Validation accuracy: 83.2%
Test accuracy: 89.8%
----------------------------------------
Learning rate: 0.295
Minibatch loss at step 2500: 34.246334
Minibatch accuracy: 81.2%
Validation accuracy: 83.1%
Test accuracy: 89.5%
----------------------------------------
Learning rate: 0.266
Minibatch loss at step 3000: 16.740276
Minibatch accuracy: 81.2%
Validation accuracy: 82.8%
Test accuracy: 89.1%
----------------------------------------
Learning rate: 0.239
Minibatch loss at step 3500: 10.099864
Minibatch accuracy: 82.8%
Validation accuracy: 83.5%
Test accuracy: 89.6%
----------------------------------------
Learning rate: 0.215
Minibatch loss at step 4000: 7.153698
Minibatch accuracy: 81.2%
Validation accuracy: 83.5%
Test accuracy: 89.7%
----------------------------------------
Learning rate: 0.194
Minibatch loss at step 4500: 12.003256
Minibatch accuracy: 74.2%
Validation accuracy: 83.5%
Test accuracy: 89.8%
----------------------------------------
Learning rate: 0.174
Minibatch loss at step 5000: 7.057142
Minibatch accuracy: 78.1%
Validation accuracy: 84.0%
Test accuracy: 89.6%
----------------------------------------
Learning rate: 0.157
Minibatch loss at step 5500: 2.096121
Minibatch accuracy: 78.9%
Validation accuracy: 83.8%
Test accuracy: 90.2%
----------------------------------------
Learning rate: 0.141
Minibatch loss at step 6000: 3.546118
Minibatch accuracy: 81.2%
Validation accuracy: 84.3%
Test accuracy: 90.3%
----------------------------------------
Learning rate: 0.127
Minibatch loss at step 6500: 4.163855
Minibatch accuracy: 81.2%
Validation accuracy: 84.5%
Test accuracy: 90.7%
----------------------------------------
Learning rate: 0.114
Minibatch loss at step 7000: 6.390179
Minibatch accuracy: 75.0%
Validation accuracy: 84.7%
Test accuracy: 90.7%
----------------------------------------
Learning rate: 0.103
Minibatch loss at step 7500: 9.002898
Minibatch accuracy: 71.9%
Validation accuracy: 85.0%
Test accuracy: 91.0%
----------------------------------------
Learning rate: 0.093
Minibatch loss at step 8000: 1.808643
Minibatch accuracy: 85.2%
Validation accuracy: 85.0%
Test accuracy: 91.2%
----------------------------------------
Learning rate: 0.083
Minibatch loss at step 8500: 2.679923
Minibatch accuracy: 85.9%
Validation accuracy: 85.2%
Test accuracy: 91.3%
----------------------------------------
Learning rate: 0.075
Minibatch loss at step 9000: 2.403125
Minibatch accuracy: 82.0%
Validation accuracy: 84.9%
Test accuracy: 91.3%
----------------------------------------
Learning rate: 0.068
Minibatch loss at step 9500: 1.435917
Minibatch accuracy: 85.2%
Validation accuracy: 85.2%
Test accuracy: 91.4%
----------------------------------------
Learning rate: 0.061
Minibatch loss at step 10000: 3.545970
Minibatch accuracy: 78.9%
Validation accuracy: 85.4%
Test accuracy: 91.6%
----------------------------------------
Learning rate: 0.055
Minibatch loss at step 10500: 2.524397
Minibatch accuracy: 78.9%
Validation accuracy: 85.5%
Test accuracy: 91.6%
----------------------------------------