In [ ]:
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
1. Familiar with Python
2. Completed Chapter II: Convolutional Neural Networks
1. Create a basic CNN.
2. Create a VGG-class CNN.
3. Create a CNN with an identity link (Residual CNN).
Let's create a basic CNN. We will make it as two convolutional layers, each followed by a max pooling layer.
We will use these approaches:
1. We will double the number of filters with each subsequent layer.
2. We will reduce the size of the feature maps by using a stride > 1.
You fill in the blanks (replace the ??), make sure the code passes the Python interpreter, and then verify its correctness with the summary output.
You will need to:
1. Set the number of channels on the input vector (i.e., input shape).
2. Set the number of filters and stride on the convolutional layers.
3. Set the max pooling window size and stride.
In [ ]:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense
# Let's start with a Sequential model
model = Sequential()
# Let's assume we are building a model for CIFAR-10, which consists of 32x32 RGB images
# HINT: how many channels are in an RGB image?
input_shape=(32, 32, ??)
# Let's add a first convolution layer with 16 filters of size 3x3 and stride of 2
# HINT: first parameter is the number of filters and the second is the filter (kernel) size
model.add(Conv2D(??, ??, strides=2, activation='relu', input_shape=input_shape))
# Let's reduce the feature maps by 75%
# HINT: 2x2 window and move 2 pixels at a time
model.add(MaxPooling2D(??, strides=??))
# Let's add a second convolution layer with 3x3 filter and strides=2 and double the filters
# HINT: double the number of filters you specified in the first Conv2D
model.add(Conv2D(??, ??, strides=2, activation='relu'))
# Let's reduce the feature maps by 75%
model.add(MaxPooling2D(??, strides=??))
model.add(Dense(10, activation='softmax'))
It should look like below:
Layer (type) Output Shape Param #
=================================================================
conv2d_4 (Conv2D) (None, 15, 15, 16) 448
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 7, 7, 16) 0
_________________________________________________________________
conv2d_5 (Conv2D) (None, 3, 3, 32) 4640
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 1, 1, 32) 0
_________________________________________________________________
dense_3 (Dense) (None, 1, 1, 10) 330
=================================================================
Total params: 5,418
Trainable params: 5,418
Non-trainable params: 0
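The shapes and parameter counts in the summary above can be checked by hand. The sketch below is plain Python arithmetic, not Keras code. It assumes the Keras default `padding='valid'` (no padding), where an input of size n, a window of size k, and a stride s give floor((n - k) / s) + 1 outputs. The helper names `out_size`, `conv_params`, and `dense_params` are made up for this illustration.

```python
def out_size(n, k, s):
    """Output height/width for a window of size k and stride s, no padding."""
    return (n - k) // s + 1

def conv_params(k, c_in, c_out):
    """k*k*c_in weights per filter, plus one bias per filter."""
    return k * k * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """One weight per input/output pair, plus one bias per output."""
    return n_in * n_out + n_out

# Trace the spatial size through conv, pool, conv, pool as (window, stride) pairs
size = 32
sizes = []
for k, s in [(3, 2), (2, 2), (3, 2), (2, 2)]:
    size = out_size(size, k, s)
    sizes.append(size)

# Two convolutions (3 -> 16 -> 32 channels) and the final 10-node dense layer
params = [conv_params(3, 3, 16), conv_params(3, 16, 32), dense_params(32, 10)]

print(sizes)                # [15, 7, 3, 1]
print(params, sum(params))  # [448, 4640, 330] 5418
```

A 2x2 window with a stride of 2 roughly halves both the height and the width, which is where the reduction of about 75% in feature map size comes from.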
In [ ]:
model.summary()
Next, we will create a VGG convolutional network. VGG networks are sequential, but they add the concept of convolutional groups. The basic elements of a VGG are:
1. Each convolutional group consists of two or more convolutional layers.
2. Max pooling is deferred to the end of the convolutional group.
3. Each convolutional group has either the same number of filters as the previous group or double that number.
4. Multiple dense layers are used for the classifier.
You will need to:
1. Set the number of filters, filter size and padding on the stem convolutional group.
2. Set the number of filters for the convolutional blocks.
3. Add the flattening layer between the feature learning and classifier groups.
4. Set the number of nodes in the dense layers of the classifier.
In [ ]:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
def conv_block(n_layers, n_filters):
    """
        n_layers : number of convolutional layers
        n_filters: number of filters
    """
    for n in range(n_layers):
        model.add(Conv2D(n_filters, (3, 3), strides=(1, 1), padding="same",
                         activation="relu"))
    model.add(MaxPooling2D(2, strides=2))
# Create a Sequential Model
model = Sequential()
# Add Convolutional Frontend with 64 3x3 filters of stride 1
# Set the padding so the image is zero-padded at the edges and the output feature maps keep the same
# height and width as the input.
model.add(Conv2D(??, ??, strides=(1, 1), padding=??, activation="relu",
                 input_shape=(224, 224, 3)))
# These are the convolutional groups - double the number of filters on each progressive group
conv_block(1, 64)
conv_block(2, ??)
conv_block(3, ??)
# The last two groups in a VGG16 have double the number of filters of the previous group, and both groups have the same number of filters.
# HINT: the number should be the same for both
conv_block(3, ??)
conv_block(3, ??)
# Add layer to transition from final 2D feature maps (bottleneck layer) to 1D vector for DNN.
# HINT: think of what you need to do to the 2D feature maps from the convolutional layers before passing to dense layers.
model.add(??)
# Add DNN Backend with two layers of 4096 nodes
# HINT: both layers use the number of nodes stated in the comment above.
model.add(Dense(??, activation='relu'))
model.add(Dense(??, activation='relu'))
# Output layer for classification (1000 classes)
model.add(Dense(1000, activation=??))
It should look like below:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_14 (Conv2D) (None, 224, 224, 64) 1792
_________________________________________________________________
conv2d_15 (Conv2D) (None, 224, 224, 64) 36928
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 112, 112, 64) 0
_________________________________________________________________
conv2d_16 (Conv2D) (None, 112, 112, 128) 73856
_________________________________________________________________
conv2d_17 (Conv2D) (None, 112, 112, 128) 147584
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 56, 56, 128) 0
_________________________________________________________________
conv2d_18 (Conv2D) (None, 56, 56, 256) 295168
_________________________________________________________________
conv2d_19 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
conv2d_20 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
max_pooling2d_8 (MaxPooling2 (None, 28, 28, 256) 0
_________________________________________________________________
conv2d_21 (Conv2D) (None, 28, 28, 512) 1180160
_________________________________________________________________
conv2d_22 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
conv2d_23 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
max_pooling2d_9 (MaxPooling2 (None, 14, 14, 512) 0
_________________________________________________________________
conv2d_24 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
conv2d_25 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
conv2d_26 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
max_pooling2d_10 (MaxPooling (None, 7, 7, 512) 0
_________________________________________________________________
flatten_2 (Flatten) (None, 25088) 0
_________________________________________________________________
dense_4 (Dense) (None, 4096) 102764544
_________________________________________________________________
dense_5 (Dense) (None, 4096) 16781312
_________________________________________________________________
dense_6 (Dense) (None, 1000) 4097000
=================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
__________________________
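The total in the summary above can be reproduced with a few lines of arithmetic. This is a sketch in plain Python rather than Keras; the helper names are invented for the illustration, and the group layout is read off the summary (the first group counts the stem convolution together with `conv_block(1, 64)`).

```python
def conv_params(c_in, c_out, k=3):
    """k*k*c_in weights per filter, plus one bias per filter."""
    return k * k * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """One weight per input/output pair, plus one bias per output."""
    return n_in * n_out + n_out

# (number of conv layers, filters) per group, as listed in the summary
groups = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

total, channels, size = 0, 3, 224
for n_layers, n_filters in groups:
    for _ in range(n_layers):
        total += conv_params(channels, n_filters)
        channels = n_filters
    size //= 2  # 2x2 max pooling with stride 2 halves height and width

# Flattened feature maps feed the three dense layers
n_in = size * size * channels
for n_out in (4096, 4096, 1000):
    total += dense_params(n_in, n_out)
    n_in = n_out

print(size, total)  # 7 138357544
```

Most of the total comes from the first dense layer (25088 x 4096 weights), not from the convolutional groups.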
In [ ]:
model.summary()
Finally, we will create a residual convolutional network (ResNet). The basic elements of a ResNet are:
1. A stem convolutional group of 7x7 filter size.
2. A sequence of residual block groups, where each group doubles the number of filters.
   A. Each residual block consists of two 3x3 convolutional layers, without max pooling.
   B. The input to the residual block is added to the output.
3. Between residual block groups is a convolutional block that doubles the number of filters from the previous group, so the number of filters coming in and going out are the same for the identity link matrix add operation.
   A. Each convolutional block consists of two 3x3 convolutional layers, but uses stride=2 to downsample the size of the feature maps.
You will need to:
1. Save the input to the residual block for the identity link.
2. Complete the matrix add of the identity link to the output of the residual block.
3. Set (double) the number of filters for the convolutional block between residual block groups, so the number of filters matches for the matrix add operations.
4. Add the global averaging layer between the feature learning groups and the classifier.
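To see why the filter counts have to match for the identity link, it may help to track shapes only. The sketch below does no convolution at all; `residual_block_shape` is a hypothetical helper that assumes two 3x3, stride-1, 'same'-padded convolutions (so height and width are unchanged) and then checks whether the element-wise add is possible.

```python
def residual_block_shape(n_filters, in_shape):
    """Shape bookkeeping for a residual block, with no real computation.

    The convolutions keep height and width and output n_filters channels;
    the element-wise add needs that shape to equal the input shape.
    """
    h, w, _ = in_shape
    out_shape = (h, w, n_filters)
    if out_shape != in_shape:
        raise ValueError(f"cannot add input {in_shape} to output {out_shape}")
    return out_shape

# Same number of filters in and out: the add is well defined
print(residual_block_shape(64, (56, 56, 64)))   # (56, 56, 64)

# Doubling the filters inside a residual block breaks the add
try:
    residual_block_shape(128, (56, 56, 64))
except ValueError as err:
    print("mismatch:", err)
```

This is why the doubling happens in the convolutional block between groups, which has no identity link, rather than inside a residual block.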
In [ ]:
from tensorflow.keras import Model
import tensorflow.keras.layers as layers
def residual_block(n_filters, x):
    """ Create a Residual Block of Convolutions
        n_filters: number of filters
        x        : input into the block
    """
    # Save the input as the shortcut for the identity link
    # Hint: read the comment on the params to the function.
    shortcut = ??
    x = layers.Conv2D(n_filters, (3, 3), strides=(1, 1), padding="same",
                      activation="relu")(x)
    x = layers.Conv2D(n_filters, (3, 3), strides=(1, 1), padding="same",
                      activation="relu")(x)
    # Add the saved input (identity link) to the output.
    # HINT: the name of the variable you used above to save the input.
    x = layers.add([??, x])
    return x
def conv_block(n_filters, x):
    """ Create Block of Convolutions without Pooling
        n_filters: number of filters
        x        : input into the block
    """
    x = layers.Conv2D(n_filters, (3, 3), strides=(2, 2), padding="same",
                      activation="relu")(x)
    x = layers.Conv2D(n_filters, (3, 3), strides=(2, 2), padding="same",
                      activation="relu")(x)
    return x
# The input tensor
inputs = layers.Input(shape=(224, 224, 3))
# First Convolutional layer, where pooled feature maps will be reduced by 75%
x = layers.Conv2D(64, kernel_size=(7, 7), strides=(2, 2), padding='same', activation='relu')(inputs)
x = layers.MaxPool2D(pool_size=(3, 3), strides=(2, 2), padding='same')(x)
# First Residual Block Group of 64 filters
for _ in range(3):
    x = residual_block(64, x)
# Double the number of filters and downsample the feature maps (strides=2, 2) to fit the next Residual Group
# HINT: number should be twice as big as the number of filters in prior residual_blocks.
x = conv_block(??, x)
# Second Residual Block Group of 128 filters
for _ in range(3):
    x = residual_block(128, x)
# Double the number of filters and downsample the feature maps (strides=2, 2) to fit the next Residual Group
x = conv_block(??, x)
# Third Residual Block Group of 256 filters
for _ in range(5):
    x = residual_block(256, x)
# Double the number of filters and downsample the feature maps (strides=2, 2) to fit the next Residual Group
x = conv_block(??, x)
# Fourth Residual Block Group of 512 filters
for _ in range(2):
    x = residual_block(??, x)
# Add a Global Average Pooling layer (in place of a Flatten) at the end of all the convolutional residual blocks
x = layers.??()(x)
# Final Dense Outputting Layer for 1000 outputs
outputs = layers.Dense(1000, activation='softmax')(x)
model = Model(inputs, outputs)
It should look like below:
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, 224, 224, 3) 0
__________________________________________________________________________________________________
conv2d_1 (Conv2D) (None, 112, 112, 64) 9472 input_1[0][0]
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D) (None, 56, 56, 64) 0 conv2d_1[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D) (None, 56, 56, 64) 36928 max_pooling2d_1[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D) (None, 56, 56, 64) 36928 conv2d_2[0][0]
__________________________________________________________________________________________________
add_1 (Add) (None, 56, 56, 64) 0 max_pooling2d_1[0][0]
conv2d_3[0][0]
__________________________________________________________________________________________________
conv2d_4 (Conv2D) (None, 56, 56, 64) 36928 add_1[0][0]
__________________________________________________________________________________________________
conv2d_5 (Conv2D) (None, 56, 56, 64) 36928 conv2d_4[0][0]
__________________________________________________________________________________________________
add_2 (Add) (None, 56, 56, 64) 0 add_1[0][0]
conv2d_5[0][0]
__________________________________________________________________________________________________
conv2d_6 (Conv2D) (None, 56, 56, 64) 36928 add_2[0][0]
__________________________________________________________________________________________________
conv2d_7 (Conv2D) (None, 56, 56, 64) 36928 conv2d_6[0][0]
__________________________________________________________________________________________________
add_3 (Add) (None, 56, 56, 64) 0 add_2[0][0]
conv2d_7[0][0]
__________________________________________________________________________________________________
conv2d_8 (Conv2D) (None, 28, 28, 128) 73856 add_3[0][0]
__________________________________________________________________________________________________
conv2d_9 (Conv2D) (None, 14, 14, 128) 147584 conv2d_8[0][0]
__________________________________________________________________________________________________
conv2d_10 (Conv2D) (None, 14, 14, 128) 147584 conv2d_9[0][0]
__________________________________________________________________________________________________
conv2d_11 (Conv2D) (None, 14, 14, 128) 147584 conv2d_10[0][0]
__________________________________________________________________________________________________
add_4 (Add) (None, 14, 14, 128) 0 conv2d_9[0][0]
conv2d_11[0][0]
__________________________________________________________________________________________________
conv2d_12 (Conv2D) (None, 14, 14, 128) 147584 add_4[0][0]
__________________________________________________________________________________________________
conv2d_13 (Conv2D) (None, 14, 14, 128) 147584 conv2d_12[0][0]
__________________________________________________________________________________________________
add_5 (Add) (None, 14, 14, 128) 0 add_4[0][0]
conv2d_13[0][0]
__________________________________________________________________________________________________
conv2d_14 (Conv2D) (None, 14, 14, 128) 147584 add_5[0][0]
__________________________________________________________________________________________________
conv2d_15 (Conv2D) (None, 14, 14, 128) 147584 conv2d_14[0][0]
__________________________________________________________________________________________________
add_6 (Add) (None, 14, 14, 128) 0 add_5[0][0]
conv2d_15[0][0]
__________________________________________________________________________________________________
conv2d_16 (Conv2D) (None, 7, 7, 256) 295168 add_6[0][0]
__________________________________________________________________________________________________
conv2d_17 (Conv2D) (None, 4, 4, 256) 590080 conv2d_16[0][0]
__________________________________________________________________________________________________
conv2d_18 (Conv2D) (None, 4, 4, 256) 590080 conv2d_17[0][0]
__________________________________________________________________________________________________
conv2d_19 (Conv2D) (None, 4, 4, 256) 590080 conv2d_18[0][0]
__________________________________________________________________________________________________
add_7 (Add) (None, 4, 4, 256) 0 conv2d_17[0][0]
conv2d_19[0][0]
__________________________________________________________________________________________________
conv2d_20 (Conv2D) (None, 4, 4, 256) 590080 add_7[0][0]
__________________________________________________________________________________________________
conv2d_21 (Conv2D) (None, 4, 4, 256) 590080 conv2d_20[0][0]
__________________________________________________________________________________________________
add_8 (Add) (None, 4, 4, 256) 0 add_7[0][0]
conv2d_21[0][0]
__________________________________________________________________________________________________
conv2d_22 (Conv2D) (None, 4, 4, 256) 590080 add_8[0][0]
__________________________________________________________________________________________________
conv2d_23 (Conv2D) (None, 4, 4, 256) 590080 conv2d_22[0][0]
__________________________________________________________________________________________________
add_9 (Add) (None, 4, 4, 256) 0 add_8[0][0]
conv2d_23[0][0]
__________________________________________________________________________________________________
conv2d_24 (Conv2D) (None, 4, 4, 256) 590080 add_9[0][0]
__________________________________________________________________________________________________
conv2d_25 (Conv2D) (None, 4, 4, 256) 590080 conv2d_24[0][0]
__________________________________________________________________________________________________
add_10 (Add) (None, 4, 4, 256) 0 add_9[0][0]
conv2d_25[0][0]
__________________________________________________________________________________________________
conv2d_26 (Conv2D) (None, 4, 4, 256) 590080 add_10[0][0]
__________________________________________________________________________________________________
conv2d_27 (Conv2D) (None, 4, 4, 256) 590080 conv2d_26[0][0]
__________________________________________________________________________________________________
add_11 (Add) (None, 4, 4, 256) 0 add_10[0][0]
conv2d_27[0][0]
__________________________________________________________________________________________________
conv2d_28 (Conv2D) (None, 2, 2, 512) 1180160 add_11[0][0]
__________________________________________________________________________________________________
conv2d_29 (Conv2D) (None, 1, 1, 512) 2359808 conv2d_28[0][0]
__________________________________________________________________________________________________
conv2d_30 (Conv2D) (None, 1, 1, 512) 2359808 conv2d_29[0][0]
__________________________________________________________________________________________________
conv2d_31 (Conv2D) (None, 1, 1, 512) 2359808 conv2d_30[0][0]
__________________________________________________________________________________________________
add_12 (Add) (None, 1, 1, 512) 0 conv2d_29[0][0]
conv2d_31[0][0]
__________________________________________________________________________________________________
conv2d_32 (Conv2D) (None, 1, 1, 512) 2359808 add_12[0][0]
__________________________________________________________________________________________________
conv2d_33 (Conv2D) (None, 1, 1, 512) 2359808 conv2d_32[0][0]
__________________________________________________________________________________________________
add_13 (Add) (None, 1, 1, 512) 0 add_12[0][0]
conv2d_33[0][0]
__________________________________________________________________________________________________
global_average_pooling2d_1 (Glo (None, 512) 0 add_13[0][0]
__________________________________________________________________________________________________
dense_1 (Dense) (None, 1000) 513000 global_average_pooling2d_1[0][0]
==================================================================================================
Total params: 21,616,232
Trainable params: 21,616,232
Non-trainable params: 0
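The spatial sizes in the summary above follow from `padding='same'`, where the output height and width are ceil(n / stride). A short sketch in plain Python (the helper name `same_out` is an assumption made for this illustration) traces the size through every layer that uses a stride of 2:

```python
import math

def same_out(n, stride):
    """Output height/width of a 'same'-padded layer with the given stride."""
    return math.ceil(n / stride)

size = 224
trace = [size]
# Stem convolution and stem max pooling (stride 2 each), then three
# convolutional blocks of two stride-2 convolutions each. Residual blocks
# use stride 1, so they leave the size unchanged and are skipped here.
for stride in [2, 2] + [2, 2] * 3:
    size = same_out(size, stride)
    trace.append(size)

print(trace)  # [224, 112, 56, 28, 14, 7, 4, 2, 1]
```

Because each convolutional block applies a stride of 2 twice, height and width shrink by a factor of four per block, which is why the last residual group works on 1x1 feature maps.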
In [ ]:
model.summary()
In [ ]:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
def makeVGG6():
    def conv_block(n_layers, n_filters):
        """
            n_layers : number of convolutional layers
            n_filters: number of filters
        """
        for n in range(n_layers):
            model.add(Conv2D(n_filters, (3, 3), strides=(1, 1), padding="same",
                             activation="relu"))
        model.add(MaxPooling2D(2, strides=2))
    model = Sequential()
    model.add(Conv2D(64, (3, 3), strides=(1, 1), padding='same', activation="relu",
                     input_shape=(32, 32, 3)))
    # These are the convolutional groups
    conv_block(1, 64)
    conv_block(2, 128)
    model.add(Flatten())
    model.add(Dense(4096, activation='relu'))
    model.add(Dense(10, activation='softmax'))
    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['acc'])
    return model
vgg6 = makeVGG6()
Let's now check the summary(). You should see 34 million parameters.
In [ ]:
vgg6.summary()
In [ ]:
def makeVGG10():
    def conv_block(n_layers, n_filters):
        """
            n_layers : number of convolutional layers
            n_filters: number of filters
        """
        for n in range(n_layers):
            model.add(Conv2D(n_filters, (3, 3), strides=(1, 1), padding="same",
                             activation="relu"))
        model.add(MaxPooling2D(2, strides=2))
    model = Sequential()
    model.add(Conv2D(64, (3, 3), strides=(1, 1), padding='same', activation="relu",
                     input_shape=(32, 32, 3)))
    # These are the convolutional groups
    conv_block(1, 64)
    conv_block(2, 128)
    conv_block(3, 256)
    model.add(Flatten())
    model.add(Dense(4096, activation='relu'))
    model.add(Dense(4096, activation='relu'))
    model.add(Dense(10, activation='softmax'))
    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['acc'])
    return model
vgg10 = makeVGG10()
Let's now check the summary(). You should see 35 million parameters. Note how the two models have nearly the same number of parameters, but the 10-layer VGG is deeper.
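A rough way to see why the two totals are so close is to split each count into convolutional and dense parameters. The sketch below is plain Python arithmetic under the layer layout defined in `makeVGG6` and `makeVGG10`; the helper `vgg_params` and the `*_counts` names are made up for this illustration.

```python
def conv_params(c_in, c_out, k=3):
    """k*k*c_in weights per filter, plus one bias per filter."""
    return k * k * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """One weight per input/output pair, plus one bias per output."""
    return n_in * n_out + n_out

def vgg_params(groups, dense_sizes, size=32, channels=3):
    """Return (conv parameters, dense parameters).

    groups: (conv layers, filters) per group; each group ends in a 2x2,
    stride-2 max pooling that halves height and width.
    """
    conv_total = dense_total = 0
    for n_layers, n_filters in groups:
        for _ in range(n_layers):
            conv_total += conv_params(channels, n_filters)
            channels = n_filters
        size //= 2
    n_in = size * size * channels
    for n_out in dense_sizes:
        dense_total += dense_params(n_in, n_out)
        n_in = n_out
    return conv_total, dense_total

# The stem convolution is counted as part of the first group
vgg6_counts = vgg_params([(2, 64), (2, 128)], [4096, 10])
vgg10_counts = vgg_params([(2, 64), (2, 128), (3, 256)], [4096, 4096, 10])

print(vgg6_counts, sum(vgg6_counts))    # (260160, 33599498) 33859658
print(vgg10_counts, sum(vgg10_counts))  # (1735488, 33603594) 35339082
```

In both models almost all parameters sit in the dense layers. The extra pooling in the deeper model halves the flattened size again, so its two 4096-node layers cost about as much as the single one in the shallower model.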
In [ ]:
vgg10.summary()
In [ ]:
from tensorflow.keras.datasets import cifar10
import numpy as np
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = (x_train / 255.0).astype(np.float32)
In [ ]:
vgg6.fit(x_train, y_train, epochs=3, batch_size=32, validation_split=0.1, verbose=1)
In [ ]:
vgg10.fit(x_train, y_train, epochs=3, batch_size=32, validation_split=0.1, verbose=1)
Notice how the shallower VGG (6 layers) increases in accuracy across all three epochs (??), but the deeper VGG (10 layers) does not; in fact, it learns nothing (10% accuracy is the same as random guessing across 10 classes).
While the loss does not become NaN, so training has not failed numerically, it does show how early CNN architectures became less reliable to converge as they were made deeper, a topic we have not covered yet.
If we use a larger image size, like 224x224, we can use more layers because we have more pixel data, but eventually we hit the same problem again.