In this notebook we will design our own ConvNet and see some existing applications.
We will also see the three different ways to define a Keras model: the Sequential API, the Functional API, and model subclassing (the object-oriented API).
The goal of this notebook is not to compare model performance, as we are limited in compute power, but to compare model architectures.
In [ ]:
from tensorflow.keras import datasets
from sklearn.model_selection import train_test_split

# Load MNIST and add a channel dimension: (h, w) -> (h, w, 1)
(x_train, y_train), (x_test, y_test) = datasets.mnist.load_data()
h, w = x_train.shape[1:]
x_train = x_train.reshape(x_train.shape[0], h, w, 1)
x_test = x_test.reshape(x_test.shape[0], h, w, 1)
input_shape = (h, w, 1)

# Scale pixel values to [0, 1]
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

# Hold out 10,000 training images for validation
x_train, x_val, y_train, y_val = train_test_split(
    x_train, y_train, test_size=10000, random_state=42)

(x_train.shape, y_train.shape), (x_val.shape, y_val.shape), (x_test.shape, y_test.shape)
In [ ]:
import matplotlib.pyplot as plt
plt.imshow(x_train[0].squeeze(-1))
plt.title(y_train[0]);
In [ ]:
import numpy as np
print("{} unique labels.".format(np.unique(y_train)))
Let's define a (slightly modified) version of the LeNet-5 model introduced by Yann LeCun in 1998 (paper url). The model is very simple and can be defined with the Sequential API.
In [ ]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, AvgPool2D, Dense, Flatten
from tensorflow.keras import optimizers
lenet = Sequential(name="LeNet-5")
lenet.add(Conv2D(6, kernel_size=(5, 5), activation="tanh", padding="same", input_shape=input_shape, name="C1"))
lenet.add(MaxPool2D(pool_size=(2, 2), name="S2"))
lenet.add(Conv2D(16, kernel_size=(5, 5), activation='tanh', name="C3"))
lenet.add(AvgPool2D(pool_size=(2, 2), name="S4"))
lenet.add(Conv2D(120, kernel_size=(5, 5), activation='tanh', name="C5"))
lenet.add(Flatten())
lenet.add(Dense(84, activation='tanh', name="F6"))
lenet.add(Dense(10, activation='softmax'))
lenet.summary()
In [ ]:
n_epochs = 5
batch_size = 256
lenet.compile(
    optimizer=optimizers.SGD(learning_rate=0.1),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"]
)
lenet.fit(
    x_train, y_train,
    epochs=n_epochs,
    batch_size=batch_size,
    validation_data=(x_val, y_val)
)
In [ ]:
lenet.evaluate(x_test, y_test, verbose=0)
Note that while LeNet was originally defined with tanh or sigmoid activations, these are now rarely used: as seen in Lab02, both saturate for very small and very large values, making their gradient almost null.
Most networks now use ReLU as the hidden activation function, or one of its variants (https://keras.io/layers/advanced-activations/).
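For instance, a quick sketch of both options (LeakyReLU is just one of the variants listed on that page; it is used as a layer rather than an activation string):
from tensorflow.keras.layers import Conv2D, LeakyReLU
relu_conv = Conv2D(16, kernel_size=(3, 3), activation="relu")  # plain ReLU, passed as a string
linear_conv = Conv2D(16, kernel_size=(3, 3))                   # linear convolution...
leaky_act = LeakyReLU()                                        # ...followed by a LeakyReLU layer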
Inception models were introduced in 2014 by Szegedy et al. (paper url).
Convolutions have an effective receptive field: the bigger the kernels and the deeper the model, the more image pixels each feature pixel "sees" (for example, two stacked 3x3 convolutions have an effective 5x5 receptive field). Read this for a good explanation: medium blog.
In Inception, convolution kernels of different sizes are combined. Small kernels see small clusters of features (think of a detail such as an eye) while big kernels see big clusters of features (think of a whole face), as illustrated by the sketch below.
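As a generic illustration of the idea (not the full exercise solution), two branches with different kernel sizes can be concatenated along the channel axis:
from tensorflow.keras.layers import Concatenate, Conv2D, Input
inp = Input(shape=(28, 28, 16))
small = Conv2D(8, kernel_size=(1, 1), padding="same", activation="relu")(inp)  # sees fine details
large = Conv2D(8, kernel_size=(5, 5), padding="same", activation="relu")(inp)  # sees larger patterns
merged = Concatenate(axis=-1)([small, large])                                  # stack the feature maps channel-wise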
This time, use the Functional API to define a single Inception layer like in the image above. Example usage:
a = Input(shape=(32,))
b = Dense(32)(a)
model = Model(inputs=a, outputs=b)
The layer is first instantiated (first pair of parentheses) and then called on a tensor (second pair of parentheses).
In [ ]:
from tensorflow.keras.layers import Concatenate, Input
from tensorflow.keras.models import Model
def inception_layer(tensor, n_filters):
    # TODO: define the 4 branches
    branch1x1 = None
    branch5x5 = None
    branch3x3 = None
    branch_pool = None
    # TODO: merge the branches with a Concatenate layer (run Concatenate? for more info)
    output = None
    return output
input_tensor = Input(shape=input_shape)
x = Conv2D(16, kernel_size=(5, 5), padding="same")(input_tensor)
x = inception_layer(x, 32)
x = Flatten()(x)
output_tensor = Dense(10, activation="softmax")(x)
mini_inception = Model(inputs=input_tensor, outputs=output_tensor)
mini_inception.summary()
In [ ]:
# %load solutions/mini_inception.py
ResNet (Residual Network) models were introduced by He et al. in 2015 (paper url). They found that adding more layers improved performance, but unfortunately it became hard to backpropagate the gradients all the way back to the first layers.
A trick to let the gradients "flow" easily is to use a shortcut connection that leaves the forward tensor untouched (aka a residual):
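As a rough Functional API illustration of the pattern (not the exercise solution below), assuming a small 16-filter branch:
from tensorflow.keras.layers import Add, Conv2D, Input
inp = Input(shape=(28, 28, 16))
shortcut = inp                                                               # identity path, left untouched
branch = Conv2D(16, kernel_size=(3, 3), padding="same", activation="relu")(inp)
branch = Conv2D(16, kernel_size=(3, 3), padding="same")(branch)              # residual branch
out = Add()([shortcut, branch])                                              # gradients flow through both paths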
This time, code a ResNet layer using the object-oriented API (model subclassing).
Example usage:
class MyModel(Model):
    def __init__(self):
        super().__init__()
        self.classifier = Dense(10, activation="softmax")

    def call(self, inputs):
        return self.classifier(inputs)
In [ ]:
from tensorflow.keras.layers import Layer, Add
class ResidualBlock(Layer):
    def __init__(self, n_filters):
        super().__init__(name="ResidualBlock")
        # TODO: define needed layers, use Add to combine the shortcut with the convs' output.

    def call(self, inputs):
        # TODO
        return 42


class MiniResNet(Model):
    def __init__(self, n_filters):
        super().__init__()
        self.conv = Conv2D(n_filters, kernel_size=(5, 5), padding="same")
        self.block = ResidualBlock(n_filters)
        self.flatten = Flatten()
        self.classifier = Dense(10, activation="softmax")

    def call(self, inputs):
        # TODO
        return 1337
mini_resnet = MiniResNet(32)
mini_resnet.build((None, *input_shape))
mini_resnet.summary()
In [ ]:
# %load solutions/mini_resnet.py
Batch Normalization is not an architecture but a layer, introduced by Ioffe et al. in 2015 (paper url). Here is an extract from their abstract:
Training Deep Neural Networks is complicated by the fact that the distribution of each layer’s inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs.
The result is that ConvNets trained with Batch Normalization converge faster and reach better results. Nowadays almost all networks use it or one of its variants. See this article on normalization for more info.
A classic block is:
In [ ]:
from tensorflow.keras.layers import Activation, BatchNormalization

class ConvBlock(Layer):
    def __init__(self, n_filters, kernel_size):
        super().__init__()
        self.conv = Conv2D(n_filters, kernel_size=kernel_size, use_bias=False)
        self.bn = BatchNormalization(axis=3)
        self.activation = Activation("relu")

    def call(self, inputs):
        return self.activation(
            self.bn(self.conv(inputs))
        )
You can place this block multiple times in your network, like Lego bricks, as sketched below.
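For example, a minimal sketch of stacking two such blocks (MiniConvNet is only a hypothetical name for this illustration):
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model

class MiniConvNet(Model):
    def __init__(self):
        super().__init__()
        self.block1 = ConvBlock(16, (3, 3))   # Conv -> BatchNorm -> ReLU
        self.block2 = ConvBlock(32, (3, 3))
        self.flatten = Flatten()
        self.classifier = Dense(10, activation="softmax")

    def call(self, inputs):
        x = self.block1(inputs)
        x = self.block2(x)
        return self.classifier(self.flatten(x))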
ConvNets usually have a lot of parameters because of their depth. A trick to trim the number of parameters with minimal performance loss is to use separable convolutions.
A standard convolution has quite a lot of parameters (but still far fewer than a fully connected layer, as the check below shows):
In [ ]:
conv_model = Sequential(name="Conv Model")
conv_model.add(Conv2D(8, kernel_size=(3, 3), use_bias=False))
In [ ]:
# conv_model.build((None, *input_shape))
# conv_model.summary()
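As a back-of-the-envelope check of that claim (not part of the lab itself), assuming the 28x28x1 MNIST input:
# weights of a 3x3 conv with 8 filters vs. a dense layer producing the same output size
k, c_in, c_out, h, w = 3, 1, 8, 28, 28
conv_params = k * k * c_in * c_out                             # 72 weights
dense_params = (h * w * c_in) * ((h - 2) * (w - 2) * c_out)    # ~4.2 million weights
print(conv_params, dense_params)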
Separable convolutions are made of two convolutions: a depthwise convolution (one spatial filter per input channel) followed by a pointwise 1x1 convolution that mixes the channels:
In [ ]:
from tensorflow.keras.layers import DepthwiseConv2D
separable_model = Sequential(name="Separable Model")
separable_model.add(DepthwiseConv2D(kernel_size=(3, 3), use_bias=False))
separable_model.add(Conv2D(8, kernel_size=(1, 1), use_bias=False))
In [ ]:
# separable_model.build((None, *input_shape))
# separable_model.summary()
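To see the saving, a quick count under the same assumptions (3x3 kernel, 1 input channel, 8 output filters, no bias):
standard = 3 * 3 * 1 * 8           # 72 weights for the plain convolution
depthwise = 3 * 3 * 1              # 9 weights: one 3x3 filter per input channel
pointwise = 1 * 1 * 1 * 8          # 8 weights for the 1x1 convolution
print(standard, depthwise + pointwise)  # the gap widens quickly as the number of channels grows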
In [ ]: