Batch normalization is most useful when building deep neural networks. To demonstrate this, we'll create a convolutional neural network with 20 convolutional layers, followed by a fully connected layer. We'll use it to classify handwritten digits in the MNIST dataset, which should be familiar to you by now.
This is not a good network for classfying MNIST digits. You could create a much simpler network and get better results. However, to give you hands-on experience with batch normalization, we had to make an example that was:
This notebook includes two versions of the network that you can edit. The first uses higher level functions from the tf.layers
package. The second is the same network, but uses only lower level functions in the tf.nn
package.
The following cell loads TensorFlow, downloads the MNIST dataset if necessary, and loads it into an object named mnist
. You'll need to run this cell before running anything else in the notebook.
In [1]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True, reshape=False)
tf.layers.batch_normalization
This version of the network uses tf.layers
for almost everything, and expects you to implement batch normalization using tf.layers.batch_normalization
We'll use the following function to create fully connected layers in our network. We'll create them with the specified number of neurons and a ReLU activation function.
This version of the function does not include batch normalization.
In [2]:
def fully_connected(prev_layer, num_units):
"""
Create a fully connectd layer with the given layer as input and the given number of neurons.
:param prev_layer: Tensor
The Tensor that acts as input into this layer
:param num_units: int
The size of the layer. That is, the number of units, nodes, or neurons.
:returns Tensor
A new fully connected layer
"""
layer = tf.layers.dense(prev_layer, num_units, activation=tf.nn.relu)
return layer
We'll use the following function to create convolutional layers in our network. They are very basic: we're always using a 3x3 kernel, ReLU activation functions, strides of 1x1 on layers with odd depths, and strides of 2x2 on layers with even depths. We aren't bothering with pooling layers at all in this network.
This version of the function does not include batch normalization.
In [3]:
def conv_layer(prev_layer, layer_depth):
"""
Create a convolutional layer with the given layer as input.
:param prev_layer: Tensor
The Tensor that acts as input into this layer
:param layer_depth: int
We'll set the strides and number of feature maps based on the layer's depth in the network.
This is *not* a good way to make a CNN, but it helps us create this example with very little code.
:returns Tensor
A new convolutional layer
"""
strides = 2 if layer_depth % 3 == 0 else 1
conv_layer = tf.layers.conv2d(prev_layer, layer_depth*4, 3, strides, 'same', activation=tf.nn.relu)
return conv_layer
This cell builds the network without batch normalization, then trains it on the MNIST dataset. It displays loss and accuracy data periodically while training.
In [4]:
def train(num_batches, batch_size, learning_rate):
# Build placeholders for the input samples and labels
inputs = tf.placeholder(tf.float32, [None, 28, 28, 1])
labels = tf.placeholder(tf.float32, [None, 10])
# Feed the inputs into a series of 20 convolutional layers
layer = inputs
for layer_i in range(1, 20):
layer = conv_layer(layer, layer_i)
# Flatten the output from the convolutional layers
orig_shape = layer.get_shape().as_list()
layer = tf.reshape(layer, shape=[-1, orig_shape[1] * orig_shape[2] * orig_shape[3]])
# Add one fully connected layer
layer = fully_connected(layer, 100)
# Create the output layer with 1 node for each
logits = tf.layers.dense(layer, 10)
# Define loss and training operations
model_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=labels))
train_opt = tf.train.AdamOptimizer(learning_rate).minimize(model_loss)
# Create operations to test accuracy
correct_prediction = tf.equal(tf.argmax(logits,1), tf.argmax(labels,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# Train and test the network
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
for batch_i in range(num_batches):
batch_xs, batch_ys = mnist.train.next_batch(batch_size)
# train this batch
sess.run(train_opt, {inputs: batch_xs, labels: batch_ys})
# Periodically check the validation or training loss and accuracy
if batch_i % 100 == 0:
loss, acc = sess.run([model_loss, accuracy], {inputs: mnist.validation.images,
labels: mnist.validation.labels})
print('Batch: {:>2}: Validation loss: {:>3.5f}, Validation accuracy: {:>3.5f}'.format(batch_i, loss, acc))
elif batch_i % 25 == 0:
loss, acc = sess.run([model_loss, accuracy], {inputs: batch_xs, labels: batch_ys})
print('Batch: {:>2}: Training loss: {:>3.5f}, Training accuracy: {:>3.5f}'.format(batch_i, loss, acc))
# At the end, score the final accuracy for both the validation and test sets
acc = sess.run(accuracy, {inputs: mnist.validation.images,
labels: mnist.validation.labels})
print('Final validation accuracy: {:>3.5f}'.format(acc))
acc = sess.run(accuracy, {inputs: mnist.test.images,
labels: mnist.test.labels})
print('Final test accuracy: {:>3.5f}'.format(acc))
# Score the first 100 test images individually. This won't work if batch normalization isn't implemented correctly.
correct = 0
for i in range(100):
correct += sess.run(accuracy,feed_dict={inputs: [mnist.test.images[i]],
labels: [mnist.test.labels[i]]})
print("Accuracy on 100 samples:", correct/100)
num_batches = 800
batch_size = 64
learning_rate = 0.002
tf.reset_default_graph()
with tf.Graph().as_default():
train(num_batches, batch_size, learning_rate)
With this many layers, it's going to take a lot of iterations for this network to learn. By the time you're done training these 800 batches, your final test and validation accuracies probably won't be much better than 10%. (It will be different each time, but will most likely be less than 15%.)
Using batch normalization, you'll be able to train this same network to over 90% in that same number of batches.
We've copied the previous three cells to get you started. Edit these cells to add batch normalization to the network. For this exercise, you should use tf.layers.batch_normalization
to handle most of the math, but you'll need to make a few other changes to your network to integrate batch normalization.
In [5]:
def fully_connected(prev_layer, num_units, is_training):
"""
Create a fully connectd layer with the given layer as input and the given number of neurons.
:param prev_layer: Tensor
The Tensor that acts as input into this layer
:param num_units: int
The size of the layer. That is, the number of units, nodes, or neurons.
:returns Tensor
A new fully connected layer
"""
logits = tf.layers.dense(prev_layer, num_units, use_bias=False)
batch_normalized_output = tf.layers.batch_normalization(logits, training=is_training)
return tf.nn.relu(batch_normalized_output)
In [6]:
def conv_layer(prev_layer, layer_depth, is_training):
"""
Create a convolutional layer with the given layer as input.
:param prev_layer: Tensor
The Tensor that acts as input into this layer
:param layer_depth: int
We'll set the strides and number of feature maps based on the layer's depth in the network.
This is *not* a good way to make a CNN, but it helps us create this example with very little code.
:returns Tensor
A new convolutional layer
"""
strides = 2 if layer_depth % 3 == 0 else 1
conv_logits = tf.layers.conv2d(prev_layer, layer_depth*4, 3, strides, 'same', use_bias=False)
batch_normalized_output = tf.layers.batch_normalization(conv_logits, training=is_training)
return tf.nn.relu(batch_normalized_output)
In [7]:
def train(num_batches, batch_size, learning_rate):
# Build placeholders for the input samples and labels
inputs = tf.placeholder(tf.float32, [None, 28, 28, 1])
labels = tf.placeholder(tf.float32, [None, 10])
is_training = tf.placeholder(tf.bool, name="is_training")
# Feed the inputs into a series of 20 convolutional layers
layer = inputs
for layer_i in range(1, 20):
layer = conv_layer(layer, layer_i, is_training)
# Flatten the output from the convolutional layers
orig_shape = layer.get_shape().as_list()
layer = tf.reshape(layer, shape=[-1, orig_shape[1] * orig_shape[2] * orig_shape[3]])
# Add one fully connected layer
layer = fully_connected(layer, 100, is_training)
# Create the output layer with 1 node for each
logits = tf.layers.dense(layer, 10)
# Define loss and training operations
model_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=labels))
train_opt = tf.train.AdamOptimizer(learning_rate).minimize(model_loss)
# Tell TensorFlow to update the population statistics while training
with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)):
train_opt = tf.train.AdamOptimizer(learning_rate).minimize(model_loss)
# Create operations to test accuracy
correct_prediction = tf.equal(tf.argmax(logits,1), tf.argmax(labels,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# Train and test the network
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
for batch_i in range(num_batches):
batch_xs, batch_ys = mnist.train.next_batch(batch_size)
# train this batch
sess.run(train_opt, {inputs: batch_xs, labels: batch_ys, is_training: True})
# Periodically check the validation or training loss and accuracy
if batch_i % 100 == 0:
loss, acc = sess.run([model_loss, accuracy], {inputs: mnist.validation.images,
labels: mnist.validation.labels,
is_training: False})
print('Batch: {:>2}: Validation loss: {:>3.5f}, Validation accuracy: {:>3.5f}'.format(batch_i, loss, acc))
elif batch_i % 25 == 0:
loss, acc = sess.run([model_loss, accuracy], {inputs: batch_xs,
labels: batch_ys,
is_training: False})
print('Batch: {:>2}: Training loss: {:>3.5f}, Training accuracy: {:>3.5f}'.format(batch_i, loss, acc))
# At the end, score the final accuracy for both the validation and test sets
acc = sess.run(accuracy, {inputs: mnist.validation.images,
labels: mnist.validation.labels,
is_training: False})
print('Final validation accuracy: {:>3.5f}'.format(acc))
acc = sess.run(accuracy, {inputs: mnist.test.images,
labels: mnist.test.labels,
is_training: False})
print('Final test accuracy: {:>3.5f}'.format(acc))
# Score the first 100 test images individually. This won't work if batch normalization isn't implemented correctly.
correct = 0
for i in range(100):
correct += sess.run(accuracy,feed_dict={inputs: [mnist.test.images[i]],
labels: [mnist.test.labels[i]],
is_training: False})
print("Accuracy on 100 samples:", correct/100)
num_batches = 800
batch_size = 64
learning_rate = 0.002
tf.reset_default_graph()
with tf.Graph().as_default():
train(num_batches, batch_size, learning_rate)
With batch normalization, you should now get an accuracy over 90%. Notice also the last line of the output: Accuracy on 100 samples
. If this value is low while everything else looks good, that means you did not implement batch normalization correctly. Specifically, it means you either did not calculate the population mean and variance while training, or you are not using those values during inference.
tf.nn.batch_normalization
Most of the time you will be able to use higher level functions exclusively, but sometimes you may want to work at a lower level. For example, if you ever want to implement a new feature – something new enough that TensorFlow does not already include a high-level implementation of it, like batch normalization in an LSTM – then you may need to know these sorts of things.
This version of the network uses tf.nn
for almost everything, and expects you to implement batch normalization using tf.nn.batch_normalization
.
In [8]:
def fully_connected(prev_layer, num_units, is_training):
"""
Create a fully connectd layer with the given layer as input and the given number of neurons.
:param prev_layer: Tensor
The Tensor that acts as input into this layer
:param num_units: int
The size of the layer. That is, the number of units, nodes, or neurons.
:returns Tensor
A new fully connected layer
"""
logits = tf.layers.dense(prev_layer, num_units, use_bias=False)
beta_offset = tf.Variable(tf.zeros([num_units]))
gamma_scaling = tf.Variable(tf.ones([num_units]))
pop_mean = tf.Variable(tf.zeros([num_units]), trainable=False)
pop_var = tf.Variable(tf.ones([num_units]), trainable=False)
def batch_norm_training():
batch_mean, batch_var = tf.nn.moments(logits, [0])
decay = 0.99
train_mean = tf.assign(pop_mean,
pop_mean * decay + batch_mean * (1 - decay))
train_var = tf.assign(pop_var,
pop_var * decay + batch_var * (1 - decay))
with tf.control_dependencies([train_mean, train_var]):
return tf.nn.batch_normalization(logits, batch_mean, batch_var,
beta_offset, gamma_scaling, 1e-3)
def batch_norm_inference():
return tf.nn.batch_normalization(logits, pop_mean, pop_var,
beta_offset, gamma_scaling, 1e-3)
batch_normalized_output = tf.cond(is_training, batch_norm_training,
batch_norm_inference)
return tf.nn.relu(batch_normalized_output)
In [9]:
def conv_layer(prev_layer, layer_depth, is_training):
"""
Create a convolutional layer with the given layer as input.
:param prev_layer: Tensor
The Tensor that acts as input into this layer
:param layer_depth: int
We'll set the strides and number of feature maps based on the layer's depth in the network.
This is *not* a good way to make a CNN, but it helps us create this example with very little code.
:returns Tensor
A new convolutional layer
"""
strides = 2 if layer_depth % 3 == 0 else 1
in_channels = prev_layer.get_shape().as_list()[3]
wid = prev_layer.get_shape().as_list()[1]
hei = prev_layer.get_shape().as_list()[2]
out_channels = layer_depth * 4
weights = tf.Variable(
tf.truncated_normal([3, 3, in_channels, out_channels], stddev=0.05))
bias = tf.Variable(tf.zeros(out_channels))
conv_layer = tf.nn.conv2d(
prev_layer, weights, strides=[1, strides, strides, 1], padding='SAME')
beta_offset = tf.Variable(tf.zeros(conv_layer.get_shape().as_list()[1:]))
gamma_scaling = tf.Variable(tf.ones(conv_layer.get_shape().as_list()[1:]))
pop_mean = tf.Variable(
tf.zeros(conv_layer.get_shape().as_list()[1:]), trainable=False)
pop_var = tf.Variable(
tf.ones(conv_layer.get_shape().as_list()[1:]), trainable=False)
def batch_norm_training():
batch_mean, batch_var = tf.nn.moments(conv_layer, [0])
decay = 0.99
train_mean = tf.assign(pop_mean,
pop_mean * decay + batch_mean * (1 - decay))
train_var = tf.assign(pop_var,
pop_var * decay + batch_var * (1 - decay))
with tf.control_dependencies([train_mean, train_var]):
return tf.nn.batch_normalization(conv_layer, batch_mean, batch_var,
beta_offset, gamma_scaling, 1e-3)
def batch_norm_inference():
return tf.nn.batch_normalization(conv_layer, pop_mean, pop_var,
beta_offset, gamma_scaling, 1e-3)
batch_normalized_output = tf.cond(is_training, batch_norm_training,
batch_norm_inference)
return tf.nn.relu(batch_normalized_output)
In [10]:
def train(num_batches, batch_size, learning_rate):
# Build placeholders for the input samples and labels
inputs = tf.placeholder(tf.float32, [None, 28, 28, 1])
labels = tf.placeholder(tf.float32, [None, 10])
is_training = tf.placeholder(tf.bool)
# Feed the inputs into a series of 20 convolutional layers
layer = inputs
for layer_i in range(1, 20):
layer = conv_layer(layer, layer_i, is_training)
# Flatten the output from the convolutional layers
orig_shape = layer.get_shape().as_list()
layer = tf.reshape(layer, shape=[-1, orig_shape[1] * orig_shape[2] * orig_shape[3]])
# Add one fully connected layer
layer = fully_connected(layer, 100, is_training)
# Create the output layer with 1 node for each
logits = tf.layers.dense(layer, 10)
# Define loss and training operations
model_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=labels))
train_opt = tf.train.AdamOptimizer(learning_rate).minimize(model_loss)
# Create operations to test accuracy
correct_prediction = tf.equal(tf.argmax(logits,1), tf.argmax(labels,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# Train and test the network
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
for batch_i in range(num_batches):
batch_xs, batch_ys = mnist.train.next_batch(batch_size)
# train this batch
sess.run(train_opt, {inputs: batch_xs,
labels: batch_ys,
is_training: True})
# Periodically check the validation or training loss and accuracy
if batch_i % 100 == 0:
loss, acc = sess.run([model_loss, accuracy], {inputs: mnist.validation.images,
labels: mnist.validation.labels,
is_training: False})
print('Batch: {:>2}: Validation loss: {:>3.5f}, Validation accuracy: {:>3.5f}'.format(batch_i, loss, acc))
elif batch_i % 25 == 0:
loss, acc = sess.run([model_loss, accuracy], {inputs: batch_xs,
labels: batch_ys,
is_training: False})
print('Batch: {:>2}: Training loss: {:>3.5f}, Training accuracy: {:>3.5f}'.format(batch_i, loss, acc))
# At the end, score the final accuracy for both the validation and test sets
acc = sess.run(accuracy, {inputs: mnist.validation.images,
labels: mnist.validation.labels,
is_training: False})
print('Final validation accuracy: {:>3.5f}'.format(acc))
acc = sess.run(accuracy, {inputs: mnist.test.images,
labels: mnist.test.labels,
is_training: False})
print('Final test accuracy: {:>3.5f}'.format(acc))
# Score the first 100 test images individually. This won't work if batch normalization isn't implemented correctly.
correct = 0
for i in range(100):
correct += sess.run(accuracy,feed_dict={inputs: [mnist.test.images[i]],
labels: [mnist.test.labels[i]],
is_training: False})
print("Accuracy on 100 samples:", correct/100)
num_batches = 800
batch_size = 64
learning_rate = 0.002
tf.reset_default_graph()
with tf.Graph().as_default():
train(num_batches, batch_size, learning_rate)