In [71]:
import tensorflow as tf
import math
import numpy as np
How to construct a simple convolutional network:
In [3]:
input = tf.placeholder(tf.float32, (None, 32, 32, 3))
filter_weights = tf.Variable(tf.truncated_normal((8, 8, 3, 20))) # (height, width, input_depth, output_depth)
filter_bias = tf.Variable(tf.zeros(20))
strides = [1, 2, 2, 1] # (batch, height, width, depth)
padding = 'VALID'
conv = tf.nn.conv2d(input, filter_weights, strides, padding) + filter_bias
Calculate the number of parameters of a convolutional layer:
H = height, W = width, D = depth
The output layer shape can be calculated using (P = padding, S = stride):
new_height = (input_height - filter_height + 2 * P)/S + 1
new_width = (input_width - filter_width + 2 * P)/S + 1
In [8]:
# convolutional layer output shape:
# new_height = (input_height - filter_height + 2 * P)/S + 1
# new_width = (input_width - filter_width + 2 * P)/S + 1
((32 - 8 + 2*1) / 2 + 1), ((32 - 8 + 2*1) / 2 + 1), 20
Out[8]:
The output layer shape is: 14x14x20 (HxWxD). The new depth is equal to the number of filters, which is 20.
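As a sanity check, here is a small helper (a sketch added for this write-up, not part of the lesson code; the names are arbitrary) that wraps the same formula:
In [ ]:
def conv_output_shape(in_height, in_width, filter_height, filter_width, padding, stride, num_filters):
    # new_dim = (in_dim - filter_dim + 2*P) / S + 1; the output depth equals the number of filters
    new_height = (in_height - filter_height + 2 * padding) // stride + 1
    new_width = (in_width - filter_width + 2 * padding) // stride + 1
    return new_height, new_width, num_filters
conv_output_shape(32, 32, 8, 8, padding=1, stride=2, num_filters=20)  # (14, 14, 20)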
In [6]:
# parameters in a convolutional layer (no parameter sharing)
(8*8*3 +1) * (14*14*20)
Out[6]:
There are 756,560 total parameters. That's a HUGE amount! Here's how we calculate it: 8 * 8 * 3 is the number of weights per filter, plus 1 for the bias. Without parameter sharing, each of the 14 * 14 * 20 output neurons gets its own copy of these weights and bias, so we multiply the two numbers together to get the final answer.
Calculate the number of parameters in the convolutional layer, if every neuron in the output layer shares its parameters with every other neuron in the same channel.
This is the number of parameters actually used by a convolutional layer built with tf.nn.conv2d().
In [24]:
(8*8*3 + 1) * 20  # 8*8*3 shared weights plus 1 bias per filter, times 20 filters
Out[24]:
3860
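With parameter sharing, the same count can be read off the graph's trainable variables. A minimal sketch (assuming the graph currently holds just filter_weights and filter_bias from In [3]):
In [ ]:
total = 0
for v in tf.trainable_variables():
    # multiply out each variable's shape: 8*8*3*20 = 3840 weights, plus 20 biases
    total += int(np.prod(v.get_shape().as_list()))
total  # 3860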
TensorFlow provides the tf.nn.conv2d() and tf.nn.bias_add() functions to create your own convolutional layers.
In [26]:
# Output depth
k_output = 64
# Image Properties
image_width = 10
image_height = 10
color_channels = 3
# Convolution filter
filter_size_width = 5
filter_size_height = 5
# Input/Image
input = tf.placeholder(
    tf.float32,
    shape=[None, image_height, image_width, color_channels])
# Weight and bias
weight = tf.Variable(tf.truncated_normal(
    [filter_size_height, filter_size_width, color_channels, k_output]))
bias = tf.Variable(tf.zeros(k_output))
# Apply Convolution
conv_layer = tf.nn.conv2d(input, weight, strides=[1, 2, 2, 1], padding='SAME')
# Add bias
conv_layer = tf.nn.bias_add(conv_layer, bias)
# Apply activation function
conv_layer = tf.nn.relu(conv_layer)
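With 'SAME' padding the spatial output size depends only on the input size and the stride: out_dim = ceil(in_dim / stride). For the 10x10 input and a stride of 2, conv_layer therefore comes out 5x5x64 (a worked check added here, not from the original notebook):
In [ ]:
math.ceil(10 / 2.), math.ceil(10 / 2.), k_output  # 5 x 5 x 64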
Max pooling slides a filter, say 2x2, over the input and passes on only the max value in each window. This often leads to more accurate models, but it is more computationally expensive, since the convolution stride is typically kept at 1 when pooling is used.
Pooling layers decrease the size of the output and prevent overfitting.
The tf.nn.max_pool() function performs max pooling with the ksize parameter as the size of the filter and the strides parameter as the length of the stride. 2x2 filters with a stride of 2x2 are common in practice.
The ksize and strides parameters are structured as 4-element lists, with each element corresponding to a dimension of the input tensor ([batch, height, width, channels]). For both ksize and strides, the batch and channel dimensions are typically set to 1.
In [27]:
# Apply Max Pooling
conv_layer = tf.nn.max_pool(
    conv_layer,
    ksize=[1, 2, 2, 1],
    strides=[1, 2, 2, 1],
    padding='SAME')
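Applied to the 5x5x64 conv_layer above, this 2x2 pooling with a stride of 2 and 'SAME' padding gives ceil(5 / 2) = 3, so the pooled output is 3x3x64. A quick shape check (added here, not in the original notebook):
In [ ]:
conv_layer.get_shape()  # expect (?, 3, 3, 64)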
Recently, pooling layers have fallen out of favor. Some reasons are:
- Recent datasets are so big and complex we're more concerned about underfitting.
- Dropout is a much better regularizer.
- Pooling results in a loss of information. Think about the max pooling operation as an example. We only keep the largest of n numbers, thereby disregarding n-1 numbers completely.
A pooling layer example:
In [28]:
input = tf.placeholder(tf.float32, (None, 4, 4, 5))
filter_shape = [1, 2, 2, 1]
strides = [1, 2, 2, 1]
padding = 'VALID'
pool = tf.nn.max_pool(input, filter_shape, strides, padding)
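For this 4x4x5 input, a 2x2 filter with a stride of 2 and 'VALID' padding halves the spatial dimensions, so the pooled output is 2x2x5; the depth is untouched because pooling works on each channel independently. A quick shape check (added here, not in the original notebook):
In [ ]:
pool.get_shape()  # expect (?, 2, 2, 5)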
Quiz
In [48]:
"""
Setup the strides, padding and filter weight/bias such that
the output shape is (1, 2, 2, 3).
"""
import tensorflow as tf
import numpy as np
# `tf.nn.conv2d` requires the input be 4D (batch_size, height, width, depth)
# (1, 4, 4, 1)
x = np.array([
    [0, 1, 0.5, 10],
    [2, 2.5, 1, -8],
    [4, 0, 5, 6],
    [15, 1, 2, 3]], dtype=np.float32).reshape((1, 4, 4, 1))
X = tf.constant(x)
x.shape
Out[48]:
In [64]:
def conv2d(input):
    # Filter (weights and bias)
    # The shape of the filter weight is (height, width, input_depth, output_depth)
    # The shape of the filter bias is (output_depth,)
    # TODO: Define the filter weights `F_W` and filter bias `F_b`.
    # NOTE: Remember to wrap them in `tf.Variable`, they are trainable parameters after all.
    F_W = tf.Variable(tf.truncated_normal([2, 2, 1, 3]))
    F_b = tf.Variable(tf.zeros(3))
    # TODO: Set the stride for each dimension (batch_size, height, width, depth)
    strides = [1, 2, 2, 1]
    # TODO: set the padding, either 'VALID' or 'SAME'.
    padding = 'SAME'
    # https://www.tensorflow.org/versions/r0.11/api_docs/python/nn.html#conv2d
    # `tf.nn.conv2d` does not include the bias computation so we have to add it ourselves after.
    return tf.nn.conv2d(input, F_W, strides, padding) + F_b
out = conv2d(X)
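A quick way to confirm the quiz output shape (a check added here, not part of the quiz): F_W and F_b are tf.Variable objects, so they must be initialized before the graph is evaluated.
In [ ]:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(out).shape)  # expect (1, 2, 2, 3)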
In [62]:
# udacity's solution
def conv2d(input):
    # Filter (weights and bias)
    F_W = tf.Variable(tf.truncated_normal((2, 2, 1, 3)))  # (height, width, input_depth, output_depth)
    F_b = tf.Variable(tf.zeros(3))  # (output_depth,)
    strides = [1, 2, 2, 1]
    padding = 'VALID'
    return tf.nn.conv2d(input, F_W, strides, padding) + F_b
Calculate the output height and width for 'VALID' padding using the formula:
out_height = ceil(float(in_height - filter_height + 1) / float(strides[1]))
out_width = ceil(float(in_width - filter_width + 1) / float(strides[2]))
In [70]:
out_height = math.ceil(float(4 - 2 + 1) / float(2))
out_width = math.ceil(float(4 - 2 + 1) / float(2))
out_height, out_width
Out[70]:
In [72]:
"""
Set the values to `strides` and `ksize` such that
the output shape after pooling is (1, 2, 2, 1).
"""
# `tf.nn.max_pool` requires the input be 4D (batch_size, height, width, depth)
# (1, 4, 4, 1)
x = np.array([
    [0, 1, 0.5, 10],
    [2, 2.5, 1, -8],
    [4, 0, 5, 6],
    [15, 1, 2, 3]], dtype=np.float32).reshape((1, 4, 4, 1))
X = tf.constant(x)
def maxpool(input):
    # TODO: Set the ksize (filter size) for each dimension (batch_size, height, width, depth)
    ksize = [1, 2, 2, 1]
    # TODO: Set the stride for each dimension (batch_size, height, width, depth)
    strides = [1, 2, 2, 1]
    # TODO: set the padding, either 'VALID' or 'SAME'.
    padding = 'VALID'
    # https://www.tensorflow.org/versions/r0.11/api_docs/python/nn.html#max_pool
    return tf.nn.max_pool(input, ksize, strides, padding)
out = maxpool(X)
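As with the convolution quiz, the result can be checked in a session (a check added here, not part of the quiz). Max pooling has no trainable variables, so no initialization is needed; each output value is the max of one 2x2 block of the input:
In [ ]:
with tf.Session() as sess:
    print(sess.run(out).reshape((2, 2)))  # expect [[2.5, 10.], [15., 6.]]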
In [73]:
# udacity solution
def maxpool(input):
    ksize = [1, 2, 2, 1]
    strides = [1, 2, 2, 1]
    padding = 'VALID'
    return tf.nn.max_pool(input, ksize, strides, padding)
In [ ]: