Deep Learning

Assignment 4

Previously in 2_fullyconnected.ipynb and 3_regularization.ipynb, we trained fully connected networks to classify notMNIST characters.

The goal of this assignment is to make the neural network convolutional.


In [1]:
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle
from six.moves import range

In [2]:
pickle_file = 'notMNIST.pickle'

with open(pickle_file, 'rb') as f:
    save = pickle.load(f)
    train_dataset = save['train_dataset']
    train_labels = save['train_labels']
    valid_dataset = save['valid_dataset']
    valid_labels = save['valid_labels']
    test_dataset = save['test_dataset']
    test_labels = save['test_labels']
    del save  # hint to help gc free up memory
    print('Training set', train_dataset.shape, train_labels.shape)
    print('Validation set', valid_dataset.shape, valid_labels.shape)
    print('Test set', test_dataset.shape, test_labels.shape)


Training set (200000, 28, 28) (200000,)
Validation set (10000, 28, 28) (10000,)
Test set (10000, 28, 28) (10000,)

Reformat into a TensorFlow-friendly shape:

  • convolutions need the image data formatted as a cube (width by height by #channels)
  • labels as float 1-hot encodings (illustrated briefly below).
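
As a quick aside, the 1-hot encoding used in reformat below relies on numpy broadcasting: comparing a row vector of class indices against a column of labels gives a boolean matrix with exactly one True per row. A standalone illustration (added here for clarity; not part of the original notebook):

import numpy as np

n_classes = 4
labels = np.array([0, 2, 3])          # class indices for three examples
# Broadcasting a (4,) row against a (3, 1) column gives a (3, 4) boolean matrix.
one_hot = (np.arange(n_classes) == labels[:, None]).astype(np.float32)
print(one_hot)
# Row i has a 1.0 in column labels[i] and 0.0 elsewhere:
# [[1, 0, 0, 0],
#  [0, 0, 1, 0],
#  [0, 0, 0, 1]]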

In [3]:
image_size = 28
num_labels = 10
num_channels = 1 # grayscale

import numpy as np

def reformat(dataset, labels):
    dataset = dataset.reshape(
      (-1, image_size, image_size, num_channels)).astype(np.float32)
    labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
    return dataset, labels
train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)


Training set (200000, 28, 28, 1) (200000, 10)
Validation set (10000, 28, 28, 1) (10000, 10)
Test set (10000, 28, 28, 1) (10000, 10)

In [4]:
def accuracy(predictions, labels):
  return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
          / predictions.shape[0])

Let's build a small network with two convolutional layers, followed by one fully connected layer. Convolutional networks are more computationally expensive, so we'll limit their depth and the number of fully connected nodes.
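
A note on sizing the fully connected layer: with 'SAME' padding, a stride-2 convolution halves the spatial dimensions (rounding up), so two of them take 28x28 down to 7x7, which is where the image_size // 4 * image_size // 4 * depth term in layer3_weights comes from. A small sanity check (added for clarity, not part of the original notebook):

# With 'SAME' padding, the output spatial size is ceil(input_size / stride).
def same_padding_out(size, stride):
    return (size + stride - 1) // stride

size = 28
for layer in (1, 2):
    size = same_padding_out(size, 2)   # stride-2 conv (or 2x2 max pool with stride 2)
    print('after layer %d: %dx%d' % (layer, size, size))
# after layer 1: 14x14
# after layer 2: 7x7
print('flattened features per example:', 7 * 7 * 16)   # 784, i.e. 7 x 7 x depth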


In [5]:
batch_size = 16
patch_size = 5
depth = 16
num_hidden = 64

graph = tf.Graph()

with graph.as_default():

    # Input data.
    tf_train_dataset = tf.placeholder(
      tf.float32, shape=(batch_size, image_size, image_size, num_channels))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
  
    # Variables.
    layer1_weights = tf.Variable(tf.truncated_normal(
      [patch_size, patch_size, num_channels, depth], stddev=0.1))
    layer1_biases = tf.Variable(tf.zeros([depth]))
    layer2_weights = tf.Variable(tf.truncated_normal(
      [patch_size, patch_size, depth, depth], stddev=0.1))
    layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth]))
    layer3_weights = tf.Variable(tf.truncated_normal(
      [image_size // 4 * image_size // 4 * depth, num_hidden], stddev=0.1))
    layer3_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))
    layer4_weights = tf.Variable(tf.truncated_normal(
      [num_hidden, num_labels], stddev=0.1))
    layer4_biases = tf.Variable(tf.constant(1.0, shape=[num_labels]))
  
    # Model.
    def model(data):
        conv = tf.nn.conv2d(data, layer1_weights, [1, 2, 2, 1], padding='SAME')
        hidden = tf.nn.relu(conv + layer1_biases)
        conv = tf.nn.conv2d(hidden, layer2_weights, [1, 2, 2, 1], padding='SAME')
        hidden = tf.nn.relu(conv + layer2_biases)
        shape = hidden.get_shape().as_list()
        reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
        hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
        return tf.matmul(hidden, layer4_weights) + layer4_biases

    # Training computation.
    logits = model(tf_train_dataset)
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels))
    
    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)
  
    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
    test_prediction = tf.nn.softmax(model(tf_test_dataset))

In [7]:
num_steps = 1001

with tf.Session(graph=graph) as session:
    tf.initialize_all_variables().run()
    print('Initialized')
    for step in range(num_steps):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        _, l, predictions = session.run(
          [optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step % 50 == 0):
            print('Minibatch loss at step %d: %f' % (step, l))
            print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
            print('Validation accuracy: %.1f%%' % accuracy(valid_prediction.eval(), valid_labels))
    print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))


Initialized
Minibatch loss at step 0: 3.459153
Minibatch accuracy: 6.2%
Validation accuracy: 10.0%
Minibatch loss at step 50: 1.542578
Minibatch accuracy: 31.2%
Validation accuracy: 56.3%
Minibatch loss at step 100: 1.173745
Minibatch accuracy: 75.0%
Validation accuracy: 69.4%
Minibatch loss at step 150: 1.045482
Minibatch accuracy: 62.5%
Validation accuracy: 67.2%
Minibatch loss at step 200: 0.882007
Minibatch accuracy: 68.8%
Validation accuracy: 76.5%
Minibatch loss at step 250: 0.706315
Minibatch accuracy: 81.2%
Validation accuracy: 79.0%
Minibatch loss at step 300: 0.333619
Minibatch accuracy: 93.8%
Validation accuracy: 79.3%
Minibatch loss at step 350: 0.644926
Minibatch accuracy: 75.0%
Validation accuracy: 80.4%
Minibatch loss at step 400: 0.765317
Minibatch accuracy: 81.2%
Validation accuracy: 80.4%
Minibatch loss at step 450: 0.871855
Minibatch accuracy: 75.0%
Validation accuracy: 79.9%
Minibatch loss at step 500: 0.304989
Minibatch accuracy: 87.5%
Validation accuracy: 81.3%
Minibatch loss at step 550: 0.650514
Minibatch accuracy: 81.2%
Validation accuracy: 80.1%
Minibatch loss at step 600: 1.222596
Minibatch accuracy: 81.2%
Validation accuracy: 79.5%
Minibatch loss at step 650: 1.170069
Minibatch accuracy: 62.5%
Validation accuracy: 81.6%
Minibatch loss at step 700: 0.315514
Minibatch accuracy: 87.5%
Validation accuracy: 81.0%
Minibatch loss at step 750: 0.810973
Minibatch accuracy: 75.0%
Validation accuracy: 82.5%
Minibatch loss at step 800: 0.585656
Minibatch accuracy: 75.0%
Validation accuracy: 81.3%
Minibatch loss at step 850: 0.683406
Minibatch accuracy: 68.8%
Validation accuracy: 81.8%
Minibatch loss at step 900: 0.400925
Minibatch accuracy: 93.8%
Validation accuracy: 83.2%
Minibatch loss at step 950: 0.962259
Minibatch accuracy: 68.8%
Validation accuracy: 82.4%
Minibatch loss at step 1000: 0.877937
Minibatch accuracy: 75.0%
Validation accuracy: 83.1%
Test accuracy: 89.8%

Problem 1

The convolutional model above uses convolutions with stride 2 to reduce the dimensionality. Replace the strides by a max pooling operation (nn.max_pool()) of stride 2 and kernel size 2.
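
In outline, the change is to convolve at stride 1 and let a 2x2 max pool with stride 2 do the downsampling instead. A minimal standalone sketch of that pattern (using the same TensorFlow 0.x API as the rest of this notebook; the full solution cell follows):

import tensorflow as tf

g = tf.Graph()
with g.as_default():
    data = tf.random_normal([16, 28, 28, 1])                               # dummy minibatch
    weights = tf.Variable(tf.truncated_normal([5, 5, 1, 16], stddev=0.1))
    biases = tf.Variable(tf.zeros([16]))
    # Stride-1 convolution keeps the 28x28 spatial size ...
    conv = tf.nn.conv2d(data, weights, [1, 1, 1, 1], padding='SAME')
    # ... and the 2x2 max pool with stride 2 halves it to 14x14.
    pooled = tf.nn.max_pool(conv, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    hidden = tf.nn.relu(pooled + biases)
    print(hidden.get_shape().as_list())   # [16, 14, 14, 16]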



In [6]:
batch_size = 16
patch_size = 5
depth = 16
num_hidden = 64

graph = tf.Graph()

print("Problem1")
with graph.as_default():

    # Input data.
    tf_train_dataset = tf.placeholder(
      tf.float32, shape=(batch_size, image_size, image_size, num_channels))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
  
    # Variables.
    # Dimensions for conv weights are: 
    # patch_height x patch_width x #channels x depth
    layer1_weights = tf.Variable(tf.truncated_normal(
      [patch_size, patch_size, num_channels, depth], stddev=0.1))
    layer1_biases = tf.Variable(tf.zeros([depth]))
    layer2_weights = tf.Variable(tf.truncated_normal(
      [patch_size, patch_size, depth, depth], stddev=0.1))
    layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth]))
    # Divide by 4 here, as images have been halved twice during max pooling steps
    layer3_weights = tf.Variable(tf.truncated_normal(
      [image_size // 4 * image_size // 4 * depth, num_hidden], stddev=0.1))
    layer3_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))
    layer4_weights = tf.Variable(tf.truncated_normal(
      [num_hidden, num_labels], stddev=0.1))
    layer4_biases = tf.Variable(tf.constant(1.0, shape=[num_labels]))
  
    # Model.
    def model(data):
        # Hidden layer 1
        #conv = tf.nn.conv2d(data, layer1_weights, [1, 2, 2, 1], padding='SAME')
        #hidden = tf.nn.relu(conv + layer1_biases)
        # The strides argument gives the step size along each input dimension,
        # ordered [batch, height, width, channels];
        # e.g. [1, 2, 2, 1] moves 2 pixels at a time in height and width.
        conv = tf.nn.conv2d(data, layer1_weights, [1, 1, 1, 1], padding='SAME')
        print(data.get_shape().as_list())
        # Do max-pooling with patch size of 2x2 and stride of 2x2, include all batches and channels
        maxpool = tf.nn.max_pool(conv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
        hidden = tf.nn.relu(maxpool + layer1_biases)
        print(hidden.get_shape().as_list())
        #
        # Hidden layer 2
        #conv = tf.nn.conv2d(hidden, layer2_weights, [1, 2, 2, 1], padding='SAME')
        #hidden = tf.nn.relu(conv + layer2_biases)
        conv = tf.nn.conv2d(hidden, layer2_weights, [1, 1, 1, 1], padding='SAME')
        maxpool = tf.nn.max_pool(conv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
        hidden = tf.nn.relu(maxpool + layer2_biases)
        shape = hidden.get_shape().as_list()
        print(shape)
        #
        # Fully connected layers: flatten each example from
        # height x width x depth into a single feature vector (batch_size x total_features).
        reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
        hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
        return tf.matmul(hidden, layer4_weights) + layer4_biases

    # Training computation.
    logits = model(tf_train_dataset)
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels))
    
    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)
  
    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
    test_prediction = tf.nn.softmax(model(tf_test_dataset))


Problem1
[16, 28, 28, 1]
[16, 14, 14, 16]
[16, 7, 7, 16]
[10000, 28, 28, 1]
[10000, 14, 14, 16]
[10000, 7, 7, 16]
[10000, 28, 28, 1]
[10000, 14, 14, 16]
[10000, 7, 7, 16]

In [7]:
num_steps = 1001

with tf.Session(graph=graph) as session:
    tf.initialize_all_variables().run()
    print('Initialized')
    for step in range(num_steps):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        _, l, predictions = session.run(
          [optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step % 50 == 0):
            print('Minibatch loss at step %d: %f' % (step, l))
            print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
            print('Validation accuracy: %.1f%%' % accuracy(valid_prediction.eval(), valid_labels))
    print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))


Initialized
Minibatch loss at step 0: 2.814492
Minibatch accuracy: 6.2%
Validation accuracy: 10.0%
Minibatch loss at step 50: 1.825976
Minibatch accuracy: 31.2%
Validation accuracy: 39.3%
Minibatch loss at step 100: 1.208937
Minibatch accuracy: 68.8%
Validation accuracy: 56.3%
Minibatch loss at step 150: 0.901899
Minibatch accuracy: 68.8%
Validation accuracy: 68.3%
Minibatch loss at step 200: 0.881065
Minibatch accuracy: 68.8%
Validation accuracy: 78.7%
Minibatch loss at step 250: 0.900500
Minibatch accuracy: 68.8%
Validation accuracy: 80.1%
Minibatch loss at step 300: 0.320497
Minibatch accuracy: 93.8%
Validation accuracy: 80.7%
Minibatch loss at step 350: 0.574538
Minibatch accuracy: 75.0%
Validation accuracy: 80.0%
Minibatch loss at step 400: 0.815669
Minibatch accuracy: 75.0%
Validation accuracy: 80.8%
Minibatch loss at step 450: 0.951818
Minibatch accuracy: 68.8%
Validation accuracy: 79.9%
Minibatch loss at step 500: 0.213410
Minibatch accuracy: 100.0%
Validation accuracy: 83.2%
Minibatch loss at step 550: 0.486456
Minibatch accuracy: 75.0%
Validation accuracy: 82.2%
Minibatch loss at step 600: 1.400446
Minibatch accuracy: 75.0%
Validation accuracy: 81.3%
Minibatch loss at step 650: 1.138508
Minibatch accuracy: 62.5%
Validation accuracy: 83.0%
Minibatch loss at step 700: 0.264180
Minibatch accuracy: 87.5%
Validation accuracy: 82.7%
Minibatch loss at step 750: 0.651341
Minibatch accuracy: 81.2%
Validation accuracy: 83.9%
Minibatch loss at step 800: 0.639289
Minibatch accuracy: 75.0%
Validation accuracy: 83.2%
Minibatch loss at step 850: 0.591433
Minibatch accuracy: 81.2%
Validation accuracy: 82.6%
Minibatch loss at step 900: 0.325814
Minibatch accuracy: 87.5%
Validation accuracy: 84.0%
Minibatch loss at step 950: 0.967972
Minibatch accuracy: 68.8%
Validation accuracy: 83.7%
Minibatch loss at step 1000: 1.029310
Minibatch accuracy: 68.8%
Validation accuracy: 83.9%
Test accuracy: 90.1%

Unable to run this example on my Ubuntu 14.04 VM with 2GB RAM: it either results in a "ResourceExhaustedError" or the IPython notebook kernel dies partway through. I had to up the RAM to 8GB and use 2 processors to get the code to run. It's interesting to check the output from:

$ cat /proc/meminfo | grep Mem

while running this code. Even with 8GB, the steps involving the accuracy calculations see available/free memory drop from ~1.8GB to as low as 130MB.

Here's the version of Linux I'm using:

malm@malm-VirtualBox ~ $ cat /proc/version
Linux version 3.16.0-38-generic (buildd@allspice) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #52~14.04.1-Ubuntu SMP Fri May 8 09:43:57 UTC 2015
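
Most of that memory pressure comes from valid_prediction and test_prediction, which push all 10,000 validation/test images through the network in a single run call. One way to reduce it (a sketch only, not what was run in this notebook; tf_eval_dataset, eval_prediction and eval_in_batches are hypothetical names) is to feed the evaluation sets through a placeholder in fixed-size chunks:

# Inside the `with graph.as_default():` block, instead of the tf.constant nodes:
#   eval_batch = 1000   # 10,000 divides evenly into chunks of this size
#   tf_eval_dataset = tf.placeholder(
#       tf.float32, shape=(eval_batch, image_size, image_size, num_channels))
#   eval_prediction = tf.nn.softmax(model(tf_eval_dataset))

# Inside the session, evaluate chunk by chunk and stitch the predictions together:
def eval_in_batches(dataset, session, eval_batch=1000):
    chunks = []
    for start in range(0, dataset.shape[0], eval_batch):
        chunks.append(session.run(
            eval_prediction,
            feed_dict={tf_eval_dataset: dataset[start:start + eval_batch]}))
    return np.concatenate(chunks, 0)

# e.g. accuracy(eval_in_batches(valid_dataset, session), valid_labels)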


Problem 2

Try to get the best performance you can using a convolutional net. Look for example at the classic LeNet5 architecture, adding Dropout, and/or adding learning rate decay.
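
On the learning-rate-decay part: tf.train.exponential_decay computes learning_rate * decay_rate ** (global_step / decay_steps) (with the default staircase=False). A quick check of the schedules used in the cells below, added here as a sanity check rather than taken from the original run, shows that with decay_steps set to 10,000 and only 1,001 training steps the rate barely decays at all:

# decayed = base * decay_rate ** (global_step / decay_steps)
def decayed(base, decay_rate, global_step, decay_steps):
    return base * decay_rate ** (float(global_step) / decay_steps)

print(decayed(0.2, 0.96, 1000, 10000))   # ~0.199  (first attempt below)
print(decayed(0.1, 0.90, 1000, 10000))   # ~0.099  (second attempt below)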



In [8]:
batch_size = 16
patch_size = 5
depth = 16
num_hidden = 64
dropout_prob = 0.5 # note: tf.nn.dropout's second argument is the probability of *keeping* a unit

graph = tf.Graph()

print("Problem2")
with graph.as_default():

    # Input data.
    tf_train_dataset = tf.placeholder(
      tf.float32, shape=(batch_size, image_size, image_size, num_channels))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
  
    # Variables.
    # Dimensions for conv weights are: 
    # patch_height x patch_width x #channels x depth
    layer1_weights = tf.Variable(tf.truncated_normal(
      [patch_size, patch_size, num_channels, depth], stddev=0.1))
    layer1_biases = tf.Variable(tf.zeros([depth]))
    layer2_weights = tf.Variable(tf.truncated_normal(
      [patch_size, patch_size, depth, depth], stddev=0.1))
    layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth]))
    # Divide by 4 here, as images have been halved twice during max pooling steps
    layer3_weights = tf.Variable(tf.truncated_normal(
      [image_size // 4 * image_size // 4 * depth, num_hidden], stddev=0.1))
    layer3_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))
    layer4_weights = tf.Variable(tf.truncated_normal(
      [num_hidden, num_labels], stddev=0.1))
    layer4_biases = tf.Variable(tf.constant(1.0, shape=[num_labels]))
  
    # Model.
    def model(data):
        # Hidden layer 1
        #conv = tf.nn.conv2d(data, layer1_weights, [1, 2, 2, 1], padding='SAME')
        #hidden = tf.nn.relu(conv + layer1_biases)
        # The strides argument gives the step size along each input dimension,
        # ordered [batch, height, width, channels];
        # e.g. [1, 2, 2, 1] moves 2 pixels at a time in height and width.
        conv = tf.nn.conv2d(data, layer1_weights, [1, 1, 1, 1], padding='SAME')
        print(data.get_shape().as_list())
        # Do max-pooling with patch size of 2x2 and stride of 2x2, include all batches and channels
        maxpool = tf.nn.max_pool(conv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
        hidden = tf.nn.relu(maxpool + layer1_biases)
        # Using dropout - NEW
        hidden = tf.nn.dropout(hidden, dropout_prob)
        print(hidden.get_shape().as_list())
        #
        # Hidden layer 2
        #conv = tf.nn.conv2d(hidden, layer2_weights, [1, 2, 2, 1], padding='SAME')
        #hidden = tf.nn.relu(conv + layer2_biases)
        conv = tf.nn.conv2d(hidden, layer2_weights, [1, 1, 1, 1], padding='SAME')
        maxpool = tf.nn.max_pool(conv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
        hidden = tf.nn.relu(maxpool + layer2_biases)
        # Using dropout - NEW
        hidden = tf.nn.dropout(hidden, dropout_prob)
        shape = hidden.get_shape().as_list()
        print(shape)
        #
        # Fully connected layers: flatten each example from
        # height x width x depth into a single feature vector (batch_size x total_features).
        reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
        hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
        # Using dropout - NEW
        hidden = tf.nn.dropout(hidden, dropout_prob)
        return tf.matmul(hidden, layer4_weights) + layer4_biases

    # Training computation.
    logits = model(tf_train_dataset)
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels))
    
    # Use learning rate decay - NEW
    global_step = tf.Variable(0)
    learning_rate = tf.train.exponential_decay(0.2, global_step, 10000, 0.96)
    
    # Optimizer - NEW
    #optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
  
    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
    test_prediction = tf.nn.softmax(model(tf_test_dataset))


Problem2
[16, 28, 28, 1]
[16, 14, 14, 16]
[16, 7, 7, 16]
[10000, 28, 28, 1]
[10000, 14, 14, 16]
[10000, 7, 7, 16]
[10000, 28, 28, 1]
[10000, 14, 14, 16]
[10000, 7, 7, 16]

In [9]:
num_steps = 1001

with tf.Session(graph=graph) as session:
    tf.initialize_all_variables().run()
    print('Initialized')
    for step in range(num_steps):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        _, l, predictions = session.run(
          [optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step % 50 == 0):
            print('Minibatch loss at step %d: %f' % (step, l))
            print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
            print('Validation accuracy: %.1f%%' % accuracy(valid_prediction.eval(), valid_labels))
    print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))


Initialized
Minibatch loss at step 0: 5.649518
Minibatch accuracy: 6.2%
Validation accuracy: 10.4%
Minibatch loss at step 50: 2.324049
Minibatch accuracy: 12.5%
Validation accuracy: 12.8%
Minibatch loss at step 100: 2.121173
Minibatch accuracy: 25.0%
Validation accuracy: 19.3%
Minibatch loss at step 150: 2.366560
Minibatch accuracy: 25.0%
Validation accuracy: 17.2%
Minibatch loss at step 200: 2.118935
Minibatch accuracy: 25.0%
Validation accuracy: 22.5%
Minibatch loss at step 250: 2.489249
Minibatch accuracy: 6.2%
Validation accuracy: 23.9%
Minibatch loss at step 300: 1.774330
Minibatch accuracy: 31.2%
Validation accuracy: 23.9%
Minibatch loss at step 350: 2.743913
Minibatch accuracy: 37.5%
Validation accuracy: 29.3%
Minibatch loss at step 400: 1.796861
Minibatch accuracy: 37.5%
Validation accuracy: 34.1%
Minibatch loss at step 450: 2.024087
Minibatch accuracy: 31.2%
Validation accuracy: 31.4%
Minibatch loss at step 500: 2.045249
Minibatch accuracy: 6.2%
Validation accuracy: 36.0%
Minibatch loss at step 550: 1.824360
Minibatch accuracy: 31.2%
Validation accuracy: 33.0%
Minibatch loss at step 600: 2.493344
Minibatch accuracy: 25.0%
Validation accuracy: 36.3%
Minibatch loss at step 650: 2.603277
Minibatch accuracy: 18.8%
Validation accuracy: 36.4%
Minibatch loss at step 700: 1.832464
Minibatch accuracy: 31.2%
Validation accuracy: 39.6%
Minibatch loss at step 750: 1.692683
Minibatch accuracy: 43.8%
Validation accuracy: 40.8%
Minibatch loss at step 800: 1.896492
Minibatch accuracy: 43.8%
Validation accuracy: 39.4%
Minibatch loss at step 850: 1.809458
Minibatch accuracy: 25.0%
Validation accuracy: 32.9%
Minibatch loss at step 900: 2.363876
Minibatch accuracy: 43.8%
Validation accuracy: 41.8%
Minibatch loss at step 950: 1.493961
Minibatch accuracy: 50.0%
Validation accuracy: 48.6%
Minibatch loss at step 1000: 1.789812
Minibatch accuracy: 31.2%
Validation accuracy: 45.1%
Test accuracy: 48.5%

More "kernel died unexpectedly" IPython notebook woe with batch_size 16 on 14.04 with 2GB RAM. Tried max_pool and avg_pool and it made no difference. Need to try again with 8GB RAM and then it works. Too many dropouts above though causing accuracy to drop to 48.5%. Let's try with less dropouts.


In [5]:
batch_size = 16
patch_size = 5
depth = 16
num_hidden = 64
dropout_prob = 0.33 # note: tf.nn.dropout's second argument is the probability of *keeping* a unit

graph = tf.Graph()

print("Problem2")
with graph.as_default():

    # Input data.
    tf_train_dataset = tf.placeholder(
      tf.float32, shape=(batch_size, image_size, image_size, num_channels))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
  
    # Variables.
    # Dimensions for conv weights are: 
    # patch_height x patch_width x #channels x depth
    layer1_weights = tf.Variable(tf.truncated_normal(
      [patch_size, patch_size, num_channels, depth], stddev=0.1))
    layer1_biases = tf.Variable(tf.zeros([depth]))
    layer2_weights = tf.Variable(tf.truncated_normal(
      [patch_size, patch_size, depth, depth], stddev=0.1))
    layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth]))
    # Divide by 4 here, as images have been halved twice during max pooling steps
    layer3_weights = tf.Variable(tf.truncated_normal(
      [image_size // 4 * image_size // 4 * depth, num_hidden], stddev=0.1))
    layer3_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))
    layer4_weights = tf.Variable(tf.truncated_normal(
      [num_hidden, num_labels], stddev=0.1))
    layer4_biases = tf.Variable(tf.constant(1.0, shape=[num_labels]))
  
    # Model.
    def model(data):
        # Hidden layer 1
        #conv = tf.nn.conv2d(data, layer1_weights, [1, 2, 2, 1], padding='SAME')
        #hidden = tf.nn.relu(conv + layer1_biases)
        # The strides argument gives the step size along each input dimension,
        # ordered [batch, height, width, channels];
        # e.g. [1, 2, 2, 1] moves 2 pixels at a time in height and width.
        conv = tf.nn.conv2d(data, layer1_weights, [1, 1, 1, 1], padding='SAME')
        print(data.get_shape().as_list())
        # Do max-pooling with patch size of 2x2 and stride of 2x2, include all batches and channels
        maxpool = tf.nn.max_pool(conv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
        hidden = tf.nn.relu(maxpool + layer1_biases)
        # Using dropout - NEW
        #hidden = tf.nn.dropout(hidden, dropout_prob)
        print(hidden.get_shape().as_list())
        #
        # Hidden layer 2
        #conv = tf.nn.conv2d(hidden, layer2_weights, [1, 2, 2, 1], padding='SAME')
        #hidden = tf.nn.relu(conv + layer2_biases)
        conv = tf.nn.conv2d(hidden, layer2_weights, [1, 1, 1, 1], padding='SAME')
        maxpool = tf.nn.max_pool(conv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
        hidden = tf.nn.relu(maxpool + layer2_biases)
        # Using dropout - NEW
        #hidden = tf.nn.dropout(hidden, dropout_prob)
        shape = hidden.get_shape().as_list()
        print(shape)
        #
        # Fully connected layers: flatten each example from
        # height x width x depth into a single feature vector (batch_size x total_features).
        reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
        hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
        # Using dropout - NEW
        hidden = tf.nn.dropout(hidden, dropout_prob)
        return tf.matmul(hidden, layer4_weights) + layer4_biases

    # Training computation.
    logits = model(tf_train_dataset)
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels))
    
    # Use learning rate decay - NEW
    # Returns the decayed learning rate = learning_rate * decay_rate ^(global_step/decay_steps)
    # tf.train.exponential_decay(learning_rate,global_step,decay_steps,decay_rate)
    starter_learning_rate = 0.1
    global_step = tf.Variable(0)
    decay_rate = 0.90
    learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step, 10000, decay_rate)
    
    # Optimizer - NEW
    #optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
  
    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
    test_prediction = tf.nn.softmax(model(tf_test_dataset))


Problem2
[16, 28, 28, 1]
[16, 14, 14, 16]
[16, 7, 7, 16]
[10000, 28, 28, 1]
[10000, 14, 14, 16]
[10000, 7, 7, 16]
[10000, 28, 28, 1]
[10000, 14, 14, 16]
[10000, 7, 7, 16]

In [6]:
num_steps = 1001

with tf.Session(graph=graph) as session:
    tf.initialize_all_variables().run()
    print('Initialized')
    for step in range(num_steps):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        _, l, predictions = session.run(
          [optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step % 50 == 0):
            print('Minibatch loss at step %d: %f' % (step, l))
            print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
            print('Validation accuracy: %.1f%%' % accuracy(valid_prediction.eval(), valid_labels))
    print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))


Initialized
Minibatch loss at step 0: 6.041399
Minibatch accuracy: 31.2%
Validation accuracy: 10.5%
Minibatch loss at step 50: 2.313320
Minibatch accuracy: 12.5%
Validation accuracy: 16.3%
Minibatch loss at step 100: 1.877945
Minibatch accuracy: 25.0%
Validation accuracy: 26.6%
Minibatch loss at step 150: 2.586305
Minibatch accuracy: 31.2%
Validation accuracy: 31.6%
Minibatch loss at step 200: 2.224568
Minibatch accuracy: 43.8%
Validation accuracy: 38.2%
Minibatch loss at step 250: 1.665021
Minibatch accuracy: 37.5%
Validation accuracy: 48.0%
Minibatch loss at step 300: 0.971591
Minibatch accuracy: 62.5%
Validation accuracy: 53.9%
Minibatch loss at step 350: 1.527162
Minibatch accuracy: 31.2%
Validation accuracy: 55.0%
Minibatch loss at step 400: 1.533419
Minibatch accuracy: 43.8%
Validation accuracy: 56.6%
Minibatch loss at step 450: 1.882622
Minibatch accuracy: 56.2%
Validation accuracy: 57.2%
Minibatch loss at step 500: 0.756343
Minibatch accuracy: 81.2%
Validation accuracy: 62.3%
Minibatch loss at step 550: 1.097952
Minibatch accuracy: 62.5%
Validation accuracy: 66.5%
Minibatch loss at step 600: 1.277898
Minibatch accuracy: 56.2%
Validation accuracy: 64.5%
Minibatch loss at step 650: 1.713371
Minibatch accuracy: 56.2%
Validation accuracy: 64.0%
Minibatch loss at step 700: 0.681067
Minibatch accuracy: 81.2%
Validation accuracy: 68.6%
Minibatch loss at step 750: 0.581509
Minibatch accuracy: 87.5%
Validation accuracy: 68.0%
Minibatch loss at step 800: 0.869155
Minibatch accuracy: 62.5%
Validation accuracy: 69.9%
Minibatch loss at step 850: 0.830149
Minibatch accuracy: 62.5%
Validation accuracy: 65.3%
Minibatch loss at step 900: 0.743215
Minibatch accuracy: 62.5%
Validation accuracy: 70.3%
Minibatch loss at step 950: 1.233025
Minibatch accuracy: 62.5%
Validation accuracy: 72.2%
Minibatch loss at step 1000: 0.956788
Minibatch accuracy: 68.8%
Validation accuracy: 70.8%
Test accuracy: 77.0%

In [ ]: