Fully Connected Neural Networks + Convolutions

Previously we trained fully connected networks to classify notMNIST characters. Let's make the neural network convolutional.

Suggested Reading

A guide to convolution arithmetic for deep learning (Vincent Dumoulin and Francesco Visin, arXiv:1603.07285)

(The guide's animations illustrate convolutions with no, arbitrary, half and full zero padding at unit stride, their transposed counterparts, strided and padded strided convolutions and their transposed versions, and dilated convolutions.)

Preliminaries


In [3]:
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle
from six.moves import range

In [4]:
pickle_file = 'notMNIST.pickle'

with open(pickle_file, 'rb') as f:
  save = pickle.load(f)
  train_dataset = save['train_dataset']
  train_labels = save['train_labels']
  valid_dataset = save['valid_dataset']
  valid_labels = save['valid_labels']
  test_dataset = save['test_dataset']
  test_labels = save['test_labels']
  del save  # hint to help gc free up memory
  print('Training set', train_dataset.shape, train_labels.shape)
  print('Validation set', valid_dataset.shape, valid_labels.shape)
  print('Test set', test_dataset.shape, test_labels.shape)


Training set (200000, 28, 28) (200000,)
Validation set (10000, 28, 28) (10000,)
Test set (10000, 28, 28) (10000,)

Reformat into a TensorFlow-friendly shape:

  • convolutions need the image data formatted as a cube (width by height by #channels)
  • labels as float 1-hot encodings.

In [5]:
image_size = 28
num_labels = 10
num_channels = 1 # grayscale

import numpy as np

def reformat(dataset, labels):
  dataset = dataset.reshape(
    (-1, image_size, image_size, num_channels)).astype(np.float32)
  labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
  return dataset, labels
train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)


Training set (200000, 28, 28, 1) (200000, 10)
Validation set (10000, 28, 28, 1) (10000, 10)
Test set (10000, 28, 28, 1) (10000, 10)

In [6]:
def accuracy(predictions, labels):
  return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
          / predictions.shape[0])

Relationship between input size, zero padding, strides and kernel size

Let's consider

  • 2-D discrete convolutions ($N = 2$),
  • square inputs ($i_1 = i_2 = i$),
  • square kernels ($k_1 = k_2 = k$),
  • the same stride along both axes ($s_1 = s_2 = s$),
  • the same zero padding along both axes ($p_1 = p_2 = p$).

Convolution arithmetic: for any $i$, $k$, $p$ and $s$,

$$o =\lfloor{\frac{i+2p-k}{s}}\rfloor+1$$

Pooling arithmetic: for any $i$, $k$ and $s$ (pooling uses no zero padding),

$$o =\lfloor{\frac{i-k}{s}}\rfloor+1$$

See Suggested Reading for further details.


In [11]:
import math 

def out_conv(i, p, k, s):
    # output size of a convolution: input size i, zero padding p, kernel size k, stride s
    assert s > 0
    return math.floor((i + 2*p - k)/s) + 1

def out_pool(i, k, s):
    # pooling is a convolution with no zero padding
    return out_conv(i, 0, k, s)

In [13]:
### VALID padding - unit stride  
out_conv(28,0,3,1)


Out[13]:
26

In [14]:
### SAME padding - unit stride 
out_conv(28,1,3,1)


Out[14]:
28

In [15]:
### VALID padding - double stride 
out_conv(28,0,3,2)


Out[15]:
13
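
For comparison, TensorFlow's 'SAME' and 'VALID' padding modes can be expressed directly in terms of the input size and stride. The sketch below (plain Python, following the padding rules documented for tf.nn.conv2d) shows why $p = 2$ in the general formula reproduces 'SAME' behaviour for the 5x5, stride-2 convolutions used in the next section.

def out_same(i, s):
    # TensorFlow 'SAME' padding: output size depends only on input size and stride
    return math.ceil(i / s)

def out_valid(i, k, s):
    # TensorFlow 'VALID' padding: no zero padding at all
    return math.ceil((i - k + 1) / s)

print(out_valid(28, 3, 1), out_same(28, 1), out_valid(28, 3, 2))  # matches the three results above
print(out_same(28, 2))  # 14, the same as out_conv(28, 2, 5, 2)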

Network with two convolutional layers followed by one fully connected layer

Let's build a small network with two convolutional layers, followed by one fully connected layer. Convolutional networks are computationally more expensive, so we'll limit the depth and the number of fully connected nodes.


In [59]:
#batch_size = 16
batch_size = 128
patch_size = 5
depth = 16
num_hidden = 128

In [60]:
### First convolutional layer: SAME padding - stride 2 
out_conv(28,2,patch_size,2)


Out[60]:
14

In [61]:
### Second convolutional layer: SAME padding - stride 2 
out_conv(14,2,patch_size,2)


Out[61]:
7

In [62]:
### Reshape - as in code 
image_size // 4 * image_size // 4 * depth


Out[62]:
784

In [63]:
### Reshape - as per previous considerations 
7*7*depth


Out[63]:
784
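
Before defining the graph, here is a quick sanity check on model size: a sketch that reuses the hyperparameters above and mirrors the weight/bias shapes declared in the next cell. Almost all of the trainable parameters sit in the fully connected layer rather than in the convolutions.

conv1_params = patch_size * patch_size * num_channels * depth + depth   # 5x5x1 filters -> 16 feature maps
conv2_params = patch_size * patch_size * depth * depth + depth          # 5x5x16 filters -> 16 feature maps
fc_params    = (image_size // 4) * (image_size // 4) * depth * num_hidden + num_hidden
out_params   = num_hidden * num_labels + num_labels
print(conv1_params, conv2_params, fc_params, out_params,
      conv1_params + conv2_params + fc_params + out_params)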

In [64]:
graph = tf.Graph()

with graph.as_default():

  # Input data.
  tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size, image_size, num_channels))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))

  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)
  
  # Variables.
  layer1_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, num_channels, depth], stddev=0.1))
  layer1_biases = tf.Variable(tf.zeros([depth]))

  layer2_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth, depth], stddev=0.1))
  layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth]))

  layer3_weights = tf.Variable(tf.truncated_normal([image_size // 4 * image_size // 4 * depth, num_hidden], stddev=0.1))
  layer3_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))

  layer4_weights = tf.Variable(tf.truncated_normal([num_hidden, num_labels], stddev=0.1))
  layer4_biases = tf.Variable(tf.constant(1.0, shape=[num_labels]))
  
  # Model.
  def model(data):
    print("\n>>> data:"+str(data.get_shape().as_list()))
    
    conv = tf.nn.conv2d(data, layer1_weights, [1, 2, 2, 1], padding='SAME')
    print("conv1:"+str(conv.get_shape().as_list()))
    hidden = tf.nn.relu(conv + layer1_biases)
    print("hidden1:"+str(hidden.get_shape().as_list()))
    
    conv = tf.nn.conv2d(hidden, layer2_weights, [1, 2, 2, 1], padding='SAME')
    print("conv2:"+str(conv.get_shape().as_list()))
    hidden = tf.nn.relu(conv + layer2_biases)
    print("hidden2:"+str(hidden.get_shape().as_list()))
    
    shape = hidden.get_shape().as_list()
    reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
    hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
    print("hidden3:"+str(hidden.get_shape().as_list()))
    
    out = tf.matmul(hidden, layer4_weights) + layer4_biases
    print("out:"+str(out.get_shape().as_list()))
    return out
  
  # Training computation.
  logits = model(tf_train_dataset)
  loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))
    
  # Optimizer.
  #optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)
  global_step = tf.Variable(0)  # count the number of steps taken
  # with decay_steps = 100000 and staircase=True, the rate stays at 0.05 for the ~3000 steps run below
  learning_rate = tf.train.exponential_decay(0.05, global_step, 100000, 0.96, staircase=True)
  optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
  
  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(logits)
  valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
  test_prediction = tf.nn.softmax(model(tf_test_dataset))


>>> data:[128, 28, 28, 1]
conv1:[128, 14, 14, 16]
hidden1:[128, 14, 14, 16]
conv2:[128, 7, 7, 16]
hidden2:[128, 7, 7, 16]
hidden3:[128, 128]
out:[128, 10]

>>> data:[10000, 28, 28, 1]
conv1:[10000, 14, 14, 16]
hidden1:[10000, 14, 14, 16]
conv2:[10000, 7, 7, 16]
hidden2:[10000, 7, 7, 16]
hidden3:[10000, 128]
out:[10000, 10]

>>> data:[10000, 28, 28, 1]
conv1:[10000, 14, 14, 16]
hidden1:[10000, 14, 14, 16]
conv2:[10000, 7, 7, 16]
hidden2:[10000, 7, 7, 16]
hidden3:[10000, 128]
out:[10000, 10]

In [65]:
num_steps = 3001

with tf.Session(graph=graph) as session:
  tf.global_variables_initializer().run()
  print('Initialized')
  for step in range(num_steps):
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    
    batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
    _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
    
    if (step % 500 == 0):
      print('\nMinibatch loss at step %d: %f' % (step, l))
      print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
      print('Validation accuracy: %.1f%%' % accuracy(valid_prediction.eval(), valid_labels))
      print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))


Initialized

Minibatch loss at step 0: 4.356413
Minibatch accuracy: 10.9%
Validation accuracy: 10.0%
Test accuracy: 10.0%

Minibatch loss at step 500: 0.454277
Minibatch accuracy: 86.7%
Validation accuracy: 83.0%
Test accuracy: 89.8%

Minibatch loss at step 1000: 0.595456
Minibatch accuracy: 80.5%
Validation accuracy: 84.9%
Test accuracy: 91.4%

Minibatch loss at step 1500: 0.328296
Minibatch accuracy: 90.6%
Validation accuracy: 85.7%
Test accuracy: 92.2%

Minibatch loss at step 2000: 0.327264
Minibatch accuracy: 90.6%
Validation accuracy: 86.7%
Test accuracy: 93.0%

Minibatch loss at step 2500: 0.394724
Minibatch accuracy: 86.7%
Validation accuracy: 87.3%
Test accuracy: 93.4%

Minibatch loss at step 3000: 0.401665
Minibatch accuracy: 87.5%
Validation accuracy: 87.2%
Test accuracy: 93.3%

Classical Convolutional Architecture

Let's implement this:

  1. Image
  2. Convolution
  3. Max Pooling
  4. Convolution
  5. Max Pooling
  6. Fully Connected
  7. Classifier
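
The pooling arithmetic above lets us check the spatial sizes in advance: the 'SAME' unit-stride convolutions preserve width and height, and each 2x2 max pooling with stride 2 halves them, so the flattened size is again 7 x 7 x depth.

print(out_conv(28, 2, patch_size, 1))          # 'SAME'-style 5x5 convolution, stride 1: 28 stays 28
print(out_pool(28, 2, 2), out_pool(14, 2, 2))  # 2x2 max pooling, stride 2: 28 -> 14 -> 7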

In [78]:
def maxpool2d(x, k=2,padding='SAME'):
    # MaxPool2D wrapper
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1],padding=padding)

def conv2d(x, W, b, strides=1,padding='SAME'):
    # Conv2D wrapper, with bias and relu activation
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding=padding)
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)


graph = tf.Graph()

with graph.as_default():

  # Input data.
  tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size, image_size, num_channels))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))

  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)
  
  # Convolution 1 
  layer1_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, num_channels, depth], stddev=0.1))
  layer1_biases = tf.Variable(tf.zeros([depth]))

  layer2_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth, depth], stddev=0.1))
  layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth]))

  layer3_weights = tf.Variable(tf.truncated_normal([image_size // 4 * image_size // 4 * depth, num_hidden], stddev=0.1))
  layer3_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))

  layer4_weights = tf.Variable(tf.truncated_normal([num_hidden, num_labels], stddev=0.1))
  layer4_biases = tf.Variable(tf.constant(1.0, shape=[num_labels]))
  
  # Model.
  def model(data):
    print("\n>>> data:"+str(data.get_shape().as_list()))
    
    conv = conv2d(data, layer1_weights , layer1_biases, strides=1,padding='SAME')
    print("conv1:"+str(conv.get_shape().as_list()))
    mp = maxpool2d(conv, k=2)
    print("max-pooling1:"+str(mp.get_shape().as_list()))
    
    conv = conv2d(mp, layer2_weights , layer2_biases, strides=1,padding='SAME')
    print("conv2:"+str(conv.get_shape().as_list()))
    mp = maxpool2d(conv, k=2)
    print("max-pooling1:"+str(mp.get_shape().as_list()))
    
    shape = mp.get_shape().as_list()
    reshape = tf.reshape(mp, [shape[0], shape[1] * shape[2] * shape[3]])
    hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
    print("hidden:"+str(hidden.get_shape().as_list()))
    
    out = tf.matmul(hidden, layer4_weights) + layer4_biases
    print("out:"+str(out.get_shape().as_list()))
    return out
  
  # Training computation.
  logits = model(tf_train_dataset)
  loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))
    
  # Optimizer.
  #optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)
  global_step = tf.Variable(0)  # count the number of steps taken.
  learning_rate = tf.train.exponential_decay(0.05, global_step, 100000, 0.96, staircase=True)
  optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
  
  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(logits)
  valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
  test_prediction = tf.nn.softmax(model(tf_test_dataset))


>>> data:[128, 28, 28, 1]
conv1:[128, 28, 28, 16]
max-pooling1:[128, 14, 14, 16]
conv2:[128, 14, 14, 16]
max-pooling2:[128, 7, 7, 16]
hidden:[128, 128]
out:[128, 10]

>>> data:[10000, 28, 28, 1]
conv1:[10000, 28, 28, 16]
max-pooling1:[10000, 14, 14, 16]
conv2:[10000, 14, 14, 16]
max-pooling2:[10000, 7, 7, 16]
hidden:[10000, 128]
out:[10000, 10]

>>> data:[10000, 28, 28, 1]
conv1:[10000, 28, 28, 16]
max-pooling1:[10000, 14, 14, 16]
conv2:[10000, 14, 14, 16]
max-pooling2:[10000, 7, 7, 16]
hidden:[10000, 128]
out:[10000, 10]

In [75]:
num_steps = 3001

with tf.Session(graph=graph) as session:
  tf.global_variables_initializer().run()
  print('Initialized')
  for step in range(num_steps):
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    
    batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
    _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
    
    if (step % 500 == 0):
      print('\nMinibatch loss at step %d: %f' % (step, l))
      print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
      print('Validation accuracy: %.1f%%' % accuracy(valid_prediction.eval(), valid_labels))
      print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))


Initialized

Minibatch loss at step 0: 4.662801
Minibatch accuracy: 5.5%
Validation accuracy: 15.2%
Test accuracy: 16.0%

Minibatch loss at step 500: 0.384480
Minibatch accuracy: 88.3%
Validation accuracy: 84.3%
Test accuracy: 91.1%

Minibatch loss at step 1000: 0.581485
Minibatch accuracy: 84.4%
Validation accuracy: 85.9%
Test accuracy: 92.5%

Minibatch loss at step 1500: 0.284368
Minibatch accuracy: 92.2%
Validation accuracy: 87.0%
Test accuracy: 93.6%

Minibatch loss at step 2000: 0.312073
Minibatch accuracy: 90.6%
Validation accuracy: 87.7%
Test accuracy: 94.2%

Minibatch loss at step 2500: 0.333975
Minibatch accuracy: 89.8%
Validation accuracy: 88.1%
Test accuracy: 94.3%

Minibatch loss at step 3000: 0.381347
Minibatch accuracy: 88.3%
Validation accuracy: 88.3%
Test accuracy: 94.6%

Let's add one more fully connected layer at the end

Let's implement this:

  1. Image
  2. Convolution
  3. Max Pooling
  4. Convolution
  5. Max Pooling
  6. Fully Connected
  7. Fully Connected
  8. Classifier

In [76]:
graph = tf.Graph()

with graph.as_default():

  # Input data.
  tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size, image_size, num_channels))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))

  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)
  
  # Convolution 1 
  layer1_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, num_channels, depth], stddev=0.1))
  layer1_biases = tf.Variable(tf.zeros([depth]))

  layer2_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth, depth], stddev=0.1))
  layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth]))

  layer3_weights = tf.Variable(tf.truncated_normal([image_size // 4 * image_size // 4 * depth, num_hidden], stddev=0.1))
  layer3_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))

  layer4_weights = tf.Variable(tf.truncated_normal([num_hidden, num_hidden], stddev=0.1))
  layer4_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))

  layer5_weights = tf.Variable(tf.truncated_normal([num_hidden, num_labels], stddev=0.1))
  layer5_biases = tf.Variable(tf.constant(1.0, shape=[num_labels]))
  
  # Model.
  def model(data):
    print("\n>>> data:"+str(data.get_shape().as_list()))
    
    conv = conv2d(data, layer1_weights , layer1_biases, strides=1,padding='SAME')
    print("conv1:"+str(conv.get_shape().as_list()))
    mp = maxpool2d(conv, k=2)
    print("max-pooling1:"+str(mp.get_shape().as_list()))
    
    conv = conv2d(mp, layer2_weights , layer2_biases, strides=1,padding='SAME')
    print("conv2:"+str(conv.get_shape().as_list()))
    mp = maxpool2d(conv, k=2)
    print("max-pooling1:"+str(mp.get_shape().as_list()))
    
    shape = mp.get_shape().as_list()
    reshape = tf.reshape(mp, [shape[0], shape[1] * shape[2] * shape[3]])
    hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
    print("hidden:"+str(hidden.get_shape().as_list()))
    
    hidden = tf.nn.relu(tf.matmul(hidden, layer4_weights) + layer4_biases)
    print("hidden:"+str(hidden.get_shape().as_list()))
    
    out = tf.matmul(hidden, layer5_weights) + layer5_biases
    print("out:"+str(out.get_shape().as_list()))
    return out
  
  # Training computation.
  logits = model(tf_train_dataset)
  loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))
    
  # Optimizer.
  #optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)
  global_step = tf.Variable(0)  # count the number of steps taken.
  learning_rate = tf.train.exponential_decay(0.05, global_step, 100000, 0.96, staircase=True)
  optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
  
  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(logits)
  valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
  test_prediction = tf.nn.softmax(model(tf_test_dataset))


>>> data:[128, 28, 28, 1]
conv1:[128, 28, 28, 16]
max-pooling1:[128, 14, 14, 16]
conv2:[128, 14, 14, 16]
max-pooling2:[128, 7, 7, 16]
hidden:[128, 128]
hidden:[128, 128]
out:[128, 10]

>>> data:[10000, 28, 28, 1]
conv1:[10000, 28, 28, 16]
max-pooling1:[10000, 14, 14, 16]
conv2:[10000, 14, 14, 16]
max-pooling2:[10000, 7, 7, 16]
hidden:[10000, 128]
hidden:[10000, 128]
out:[10000, 10]

>>> data:[10000, 28, 28, 1]
conv1:[10000, 28, 28, 16]
max-pooling1:[10000, 14, 14, 16]
conv2:[10000, 14, 14, 16]
max-pooling2:[10000, 7, 7, 16]
hidden:[10000, 128]
hidden:[10000, 128]
out:[10000, 10]

In [77]:
num_steps = 3001

with tf.Session(graph=graph) as session:
  tf.global_variables_initializer().run()
  print('Initialized')
  for step in range(num_steps):
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    
    batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
    _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
    
    if (step % 500 == 0):
      print('\nMinibatch loss at step %d: %f' % (step, l))
      print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
      print('Validation accuracy: %.1f%%' % accuracy(valid_prediction.eval(), valid_labels))
      print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))


Initialized

Minibatch loss at step 0: 4.579202
Minibatch accuracy: 11.7%
Validation accuracy: 10.0%
Test accuracy: 10.0%

Minibatch loss at step 500: 0.372589
Minibatch accuracy: 89.8%
Validation accuracy: 84.6%
Test accuracy: 91.4%

Minibatch loss at step 1000: 0.547572
Minibatch accuracy: 82.0%
Validation accuracy: 86.1%
Test accuracy: 92.6%

Minibatch loss at step 1500: 0.315100
Minibatch accuracy: 90.6%
Validation accuracy: 87.0%
Test accuracy: 93.5%

Minibatch loss at step 2000: 0.283378
Minibatch accuracy: 93.8%
Validation accuracy: 88.0%
Test accuracy: 93.9%

Minibatch loss at step 2500: 0.363731
Minibatch accuracy: 88.3%
Validation accuracy: 88.0%
Test accuracy: 94.2%

Minibatch loss at step 3000: 0.385576
Minibatch accuracy: 88.3%
Validation accuracy: 88.2%
Test accuracy: 94.5%

Let's add yet another fully connected layer at the end

Let's implement this:

  1. Image
  2. Convolution
  3. Max Pooling
  4. Convolution
  5. Max Pooling
  6. Fully Connected
  7. Fully Connected
  8. Fully Connected
  9. Classifier

In [79]:
graph = tf.Graph()

with graph.as_default():

  # Input data.
  tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size, image_size, num_channels))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))

  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)
  
  # Convolution 1 
  layer1_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, num_channels, depth], stddev=0.1))
  layer1_biases = tf.Variable(tf.zeros([depth]))

  layer2_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth, depth], stddev=0.1))
  layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth]))

  layer3_weights = tf.Variable(tf.truncated_normal([image_size // 4 * image_size // 4 * depth, num_hidden], stddev=0.1))
  layer3_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))

  layer4_weights = tf.Variable(tf.truncated_normal([num_hidden, num_hidden], stddev=0.1))
  layer4_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))

  layer5_weights = tf.Variable(tf.truncated_normal([num_hidden, num_hidden], stddev=0.1))
  layer5_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))

  layer6_weights = tf.Variable(tf.truncated_normal([num_hidden, num_labels], stddev=0.1))
  layer6_biases = tf.Variable(tf.constant(1.0, shape=[num_labels]))
  
  # Model.
  def model(data):
    print("\n>>> data:"+str(data.get_shape().as_list()))
    
    conv = conv2d(data, layer1_weights , layer1_biases, strides=1,padding='SAME')
    print("conv1:"+str(conv.get_shape().as_list()))
    mp = maxpool2d(conv, k=2)
    print("max-pooling1:"+str(mp.get_shape().as_list()))
    
    conv = conv2d(mp, layer2_weights , layer2_biases, strides=1,padding='SAME')
    print("conv2:"+str(conv.get_shape().as_list()))
    mp = maxpool2d(conv, k=2)
    print("max-pooling1:"+str(mp.get_shape().as_list()))
    
    shape = mp.get_shape().as_list()
    reshape = tf.reshape(mp, [shape[0], shape[1] * shape[2] * shape[3]])
    
    hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
    print("hidden:"+str(hidden.get_shape().as_list()))
    
    hidden = tf.nn.relu(tf.matmul(hidden, layer4_weights) + layer4_biases)
    print("hidden:"+str(hidden.get_shape().as_list()))
    
    hidden = tf.nn.relu(tf.matmul(hidden, layer5_weights) + layer5_biases)
    print("hidden:"+str(hidden.get_shape().as_list()))
    
    out = tf.matmul(hidden, layer6_weights) + layer6_biases
    print("out:"+str(out.get_shape().as_list()))
    return out
  
  # Training computation.
  logits = model(tf_train_dataset)
  loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))
    
  # Optimizer.
  #optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)
  global_step = tf.Variable(0)  # count the number of steps taken.
  learning_rate = tf.train.exponential_decay(0.05, global_step, 100000, 0.96, staircase=True)
  optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
  
  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(logits)
  valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
  test_prediction = tf.nn.softmax(model(tf_test_dataset))


>>> data:[128, 28, 28, 1]
conv1:[128, 28, 28, 16]
max-pooling1:[128, 14, 14, 16]
conv2:[128, 14, 14, 16]
max-pooling2:[128, 7, 7, 16]
hidden:[128, 128]
hidden:[128, 128]
hidden:[128, 128]
out:[128, 10]

>>> data:[10000, 28, 28, 1]
conv1:[10000, 28, 28, 16]
max-pooling1:[10000, 14, 14, 16]
conv2:[10000, 14, 14, 16]
max-pooling2:[10000, 7, 7, 16]
hidden:[10000, 128]
hidden:[10000, 128]
hidden:[10000, 128]
out:[10000, 10]

>>> data:[10000, 28, 28, 1]
conv1:[10000, 28, 28, 16]
max-pooling1:[10000, 14, 14, 16]
conv2:[10000, 14, 14, 16]
max-pooling2:[10000, 7, 7, 16]
hidden:[10000, 128]
hidden:[10000, 128]
hidden:[10000, 128]
out:[10000, 10]

In [80]:
num_steps = 3001

with tf.Session(graph=graph) as session:
  tf.global_variables_initializer().run()
  print('Initialized')
  for step in range(num_steps):
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    
    batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
    _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
    
    if (step % 500 == 0):
      print('\nMinibatch loss at step %d: %f' % (step, l))
      print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
      print('Validation accuracy: %.1f%%' % accuracy(valid_prediction.eval(), valid_labels))
      print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))


Initialized

Minibatch loss at step 0: 4.244905
Minibatch accuracy: 10.9%
Validation accuracy: 8.3%
Test accuracy: 8.0%

Minibatch loss at step 500: 0.420338
Minibatch accuracy: 86.7%
Validation accuracy: 84.2%
Test accuracy: 91.0%

Minibatch loss at step 1000: 0.599001
Minibatch accuracy: 80.5%
Validation accuracy: 86.0%
Test accuracy: 92.4%

Minibatch loss at step 1500: 0.285010
Minibatch accuracy: 91.4%
Validation accuracy: 87.1%
Test accuracy: 93.5%

Minibatch loss at step 2000: 0.278693
Minibatch accuracy: 93.0%
Validation accuracy: 87.8%
Test accuracy: 94.4%

Minibatch loss at step 2500: 0.317523
Minibatch accuracy: 90.6%
Validation accuracy: 88.3%
Test accuracy: 94.5%

Minibatch loss at step 3000: 0.342140
Minibatch accuracy: 85.9%
Validation accuracy: 88.7%
Test accuracy: 94.5%

Dropout

Adding further fully connected layers doesn't seem to help. Let's instead add dropout after the fully connected layer of the classical convolutional architecture.
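
For reference, tf.nn.dropout takes a keep probability, so passing 0.5 keeps half of the activations, and the evaluation paths below pass 0 to skip dropout entirely. A common alternative (not used here) is a single keep_prob placeholder that is fed 0.5 during training and 1.0 at evaluation time; a minimal sketch, assuming it is wired into the same graph:

keep_prob = tf.placeholder(tf.float32)  # hypothetical extra input

def fc_with_dropout(x, weights, biases, keep_prob):
    # fully connected layer followed by dropout; keep_prob = 1.0 disables it
    hidden = tf.nn.relu(tf.matmul(x, weights) + biases)
    return tf.nn.dropout(hidden, keep_prob)

# training step:   feed_dict = {..., keep_prob: 0.5}
# evaluation step: feed_dict = {..., keep_prob: 1.0}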


In [103]:
graph = tf.Graph()

with graph.as_default():

  # Input data.
  tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size, image_size, num_channels))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))

  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)
  
  # Convolution 1 
  layer1_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, num_channels, depth], stddev=0.1))
  layer1_biases = tf.Variable(tf.zeros([depth]))

  layer2_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth, depth], stddev=0.1))
  layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth]))

  layer3_weights = tf.Variable(tf.truncated_normal([image_size // 4 * image_size // 4 * depth, num_hidden], stddev=0.1))
  layer3_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))

  layer4_weights = tf.Variable(tf.truncated_normal([num_hidden, num_labels], stddev=0.1))
  layer4_biases = tf.Variable(tf.constant(1.0, shape=[num_labels]))
  
  # Model.
  def model(data,dropout):
    print("\n>>> data:"+str(data.get_shape().as_list()))
    
    conv = conv2d(data, layer1_weights , layer1_biases, strides=1,padding='SAME')
    print("conv1:"+str(conv.get_shape().as_list()))
    mp = maxpool2d(conv, k=2)
    print("max-pooling1:"+str(mp.get_shape().as_list()))
    
    conv = conv2d(mp, layer2_weights , layer2_biases, strides=1,padding='SAME')
    print("conv2:"+str(conv.get_shape().as_list()))
    mp = maxpool2d(conv, k=2)
    print("max-pooling1:"+str(mp.get_shape().as_list()))
    
    shape = mp.get_shape().as_list()
    reshape = tf.reshape(mp, [shape[0], shape[1] * shape[2] * shape[3]])
    hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
    print("hidden:"+str(hidden.get_shape().as_list()))
    
    if dropout > 0:
      # tf.nn.dropout takes a *keep* probability, so 0.5 keeps half of the activations;
      # the evaluation calls pass 0, so this branch (decided at graph-construction time) is skipped
      hidden = tf.nn.dropout(hidden, dropout)
    
    out = tf.matmul(hidden, layer4_weights) + layer4_biases
    print("out:"+str(out.get_shape().as_list()))
    return out
  
  # Training computation.
  logits = model(tf_train_dataset,0.5)
  loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))
    
  # Optimizer.
  #optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)
  global_step = tf.Variable(0)  # count the number of steps taken.
  learning_rate = tf.train.exponential_decay(0.05, global_step, 100000, 0.96, staircase=True)
  optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
  
  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(logits)
  valid_prediction = tf.nn.softmax(model(tf_valid_dataset,0))
  test_prediction = tf.nn.softmax(model(tf_test_dataset,0))


>>> data:[128, 28, 28, 1]
conv1:[128, 28, 28, 16]
max-pooling1:[128, 14, 14, 16]
conv2:[128, 14, 14, 16]
max-pooling2:[128, 7, 7, 16]
hidden:[128, 128]
out:[128, 10]

>>> data:[10000, 28, 28, 1]
conv1:[10000, 28, 28, 16]
max-pooling1:[10000, 14, 14, 16]
conv2:[10000, 14, 14, 16]
max-pooling2:[10000, 7, 7, 16]
hidden:[10000, 128]
out:[10000, 10]

>>> data:[10000, 28, 28, 1]
conv1:[10000, 28, 28, 16]
max-pooling1:[10000, 14, 14, 16]
conv2:[10000, 14, 14, 16]
max-pooling2:[10000, 7, 7, 16]
hidden:[10000, 128]
out:[10000, 10]

In [104]:
num_steps = 3001

with tf.Session(graph=graph) as session:
  tf.global_variables_initializer().run()
  print('Initialized')
  for step in range(num_steps):
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    
    batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
    _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
    
    if (step % 500 == 0):
      print('\nMinibatch loss at step %d: %f' % (step, l))
      print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
      print('Validation accuracy: %.1f%%' % accuracy(valid_prediction.eval(), valid_labels))
      print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))


Initialized

Minibatch loss at step 0: 5.376145
Minibatch accuracy: 4.7%
Validation accuracy: 10.0%
Test accuracy: 10.1%

Minibatch loss at step 500: 0.503859
Minibatch accuracy: 84.4%
Validation accuracy: 82.1%
Test accuracy: 89.4%

Minibatch loss at step 1000: 0.673074
Minibatch accuracy: 78.9%
Validation accuracy: 84.5%
Test accuracy: 91.4%

Minibatch loss at step 1500: 0.360929
Minibatch accuracy: 85.2%
Validation accuracy: 85.8%
Test accuracy: 92.5%

Minibatch loss at step 2000: 0.379078
Minibatch accuracy: 89.1%
Validation accuracy: 86.6%
Test accuracy: 93.2%

Minibatch loss at step 2500: 0.450356
Minibatch accuracy: 88.3%
Validation accuracy: 86.7%
Test accuracy: 93.3%

Minibatch loss at step 3000: 0.541036
Minibatch accuracy: 82.8%
Validation accuracy: 87.2%
Test accuracy: 93.7%

Dropout did not help here: test accuracy dropped to 93.7%, versus 94.6% without it. With a network this small there is little overfitting to counteract, so dropout mainly slows learning; a larger architecture (and/or more training steps) would probably be needed before it pays off.

Conclusions

  • Two strided convolutional layers followed by one fully connected layer: test accuracy 93.3%
  • Classical convolutional architecture (convolution + max pooling, twice, then one fully connected layer): test accuracy 94.6%
  • Classical architecture with one extra fully connected layer: test accuracy 94.5%
  • Classical architecture with two extra fully connected layers: test accuracy 94.5%
  • Classical architecture with dropout: test accuracy 93.7%