In this notebook we will design our own ConvNet and see some existing applications.
We will also see the three different ways to define a Keras model: the Sequential API, the Functional API, and model subclassing (the object-oriented API).
The goal of this notebook is not to compare model performance, as we are limited in compute power, but to compare model architectures.
In [ ]:
from tensorflow.keras import datasets
from sklearn.model_selection import train_test_split

# Load MNIST and add a channel dimension: (h, w) -> (h, w, 1)
(x_train, y_train), (x_test, y_test) = datasets.mnist.load_data()
h, w = x_train.shape[1:]
x_train = x_train.reshape(x_train.shape[0], h, w, 1)
x_test = x_test.reshape(x_test.shape[0], h, w, 1)
input_shape = (h, w, 1)

# Scale pixel values to [0, 1]
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

# Hold out 10,000 training images for validation
x_train, x_val, y_train, y_val = train_test_split(
    x_train, y_train, test_size=10000, random_state=42)

(x_train.shape, y_train.shape), (x_val.shape, y_val.shape), (x_test.shape, y_test.shape)
In [ ]:
import matplotlib.pyplot as plt
plt.imshow(x_train[0].squeeze(-1))
plt.title(y_train[0]);
In [ ]:
import numpy as np
print("{} unique labels.".format(np.unique(y_train)))
Let's define a (slightly modified) version of the LeNet-5 model introduced by Yann LeCun in 1998 (paper url). The model is very simple and can be defined with the Sequential API.
In [ ]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, AvgPool2D, Dense, Flatten
from tensorflow.keras import optimizers
lenet = Sequential(name="LeNet-5")
lenet.add(Conv2D(6, kernel_size=(5, 5), activation="tanh", padding="same", input_shape=input_shape, name="C1"))
lenet.add(MaxPool2D(pool_size=(2, 2), name="S2"))
lenet.add(Conv2D(16, kernel_size=(5, 5), activation='tanh', name="C3"))
lenet.add(AvgPool2D(pool_size=(2, 2), name="S4"))
lenet.add(Conv2D(120, kernel_size=(5, 5), activation='tanh', name="C5"))
lenet.add(Flatten())
lenet.add(Dense(84, activation='tanh', name="F6"))
lenet.add(Dense(10, activation='softmax'))
lenet.summary()
In [ ]:
n_epochs = 5
batch_size = 256
lenet.compile(
    optimizer=optimizers.SGD(learning_rate=0.1),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"]
)
lenet.fit(
    x_train, y_train,
    epochs=n_epochs,
    batch_size=batch_size,
    validation_data=(x_val, y_val)
)
In [ ]:
lenet.evaluate(x_test, y_test, verbose=0)
Note that while LeNet was originally defined with tanh or sigmoid activations, these are now rarely used: as seen in Lab02, both saturate for very small and very large values, making their gradient almost null.
Most networks now use ReLU as the hidden activation function, or one of its variants (https://keras.io/layers/advanced-activations/).
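For instance, a quick sketch of both options (LeakyReLU is just one of the variants listed on that page; it is used as a layer rather than an activation string):
from tensorflow.keras.layers import Conv2D, LeakyReLU
relu_conv = Conv2D(16, kernel_size=(3, 3), activation="relu")  # plain ReLU, passed as a string
linear_conv = Conv2D(16, kernel_size=(3, 3))                   # linear convolution...
leaky_act = LeakyReLU()                                        # ...followed by a LeakyReLU layer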
Inception models were introduced in 2014 by Szegedy et al. (paper url).
Convolutions have an effective receptive field: the bigger the kernels and the deeper the model, the more image pixels each feature pixel "sees" (for example, two stacked 3x3 convolutions have an effective 5x5 receptive field). Read this for a good explanation: medium blog.
In Inception, convolution kernels of different sizes are combined. Small kernels see small clusters of features (think of a detail such as an eye) while big kernels see big clusters of features (think of a whole face), as illustrated by the sketch below.
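As a generic illustration of the idea (not the full exercise solution), two branches with different kernel sizes can be concatenated along the channel axis:
from tensorflow.keras.layers import Concatenate, Conv2D, Input
inp = Input(shape=(28, 28, 16))
small = Conv2D(8, kernel_size=(1, 1), padding="same", activation="relu")(inp)  # sees fine details
large = Conv2D(8, kernel_size=(5, 5), padding="same", activation="relu")(inp)  # sees larger patterns
merged = Concatenate(axis=-1)([small, large])                                  # stack the feature maps channel-wise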
This time, use the Functional API to define a single Inception layer like in the image above. Example usage:
a = Input(shape=(32,))
b = Dense(32)(a)
model = Model(inputs=a, outputs=b)
The layer is first instantiated (first pair of parentheses) and then called on a tensor (second pair of parentheses).
In [ ]:
from tensorflow.keras.layers import Concatenate, Input
from tensorflow.keras.models import Model
def inception_layer(tensor, n_filters):
    # TODO: define the 4 branches
    branch1x1 = None
    branch5x5 = None
    branch3x3 = None
    branch_pool = None
    # TODO: merge the branches with a Concatenate layer (run Concatenate? for more info)
    output = None
    return output
input_tensor = Input(shape=input_shape)
x = Conv2D(16, kernel_size=(5, 5), padding="same")(input_tensor)
x = inception_layer(x, 32)
x = Flatten()(x)
output_tensor = Dense(10, activation="softmax")(x)
mini_inception = Model(inputs=input_tensor, outputs=output_tensor)
mini_inception.summary()
In [ ]:
# %load solutions/mini_inception.py
ResNet (Residual Network) models were introduced by He et al. in 2015 (paper url). They found that adding more layers improved performance, but unfortunately it became hard to backpropagate the gradients all the way back to the first layers.
A trick to let the gradients "flow" easily is to use a shortcut connection that leaves the forward tensor untouched (aka a residual):
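As a rough Functional API illustration of the pattern (not the exercise solution below), assuming a small 16-filter branch:
from tensorflow.keras.layers import Add, Conv2D, Input
inp = Input(shape=(28, 28, 16))
shortcut = inp                                                               # identity path, left untouched
branch = Conv2D(16, kernel_size=(3, 3), padding="same", activation="relu")(inp)
branch = Conv2D(16, kernel_size=(3, 3), padding="same")(branch)              # residual branch
out = Add()([shortcut, branch])                                              # gradients flow through both paths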
This time, code a ResNet layer using the object-oriented API (model subclassing).
Example usage:
class MyModel(Model):
    def __init__(self):
        super().__init__()
        self.classifier = Dense(10, activation="softmax")

    def call(self, inputs):
        return self.classifier(inputs)
In [ ]:
from tensorflow.keras.layers import Layer, Add
class ResidualBlock(Layer):
    def __init__(self, n_filters):
        super().__init__(name="ResidualBlock")
        # TODO: define needed layers, use Add to combine the shortcut with the convs' output.

    def call(self, inputs):
        # TODO
        return 42


class MiniResNet(Model):
    def __init__(self, n_filters):
        super().__init__()
        self.conv = Conv2D(n_filters, kernel_size=(5, 5), padding="same")
        self.block = ResidualBlock(n_filters)
        self.flatten = Flatten()
        self.classifier = Dense(10, activation="softmax")

    def call(self, inputs):
        # TODO
        return 1337
mini_resnet = MiniResNet(32)
mini_resnet.build((None, *input_shape))
mini_resnet.summary()
In [ ]:
# %load solutions/mini_resnet.py
Batch Normalization is not an architecture but a layer, introduced by Ioffe et al. in 2015 (paper url). Here is an extract from their abstract:
Training Deep Neural Networks is complicated by the fact that the distribution of each layer’s inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs.
The result is that ConvNets trained with Batch Normalization converge faster and reach better results. Nowadays almost all networks use it or one of its variants. See this article on normalization for more info.
A classic block is:
In [ ]:
from tensorflow.keras.layers import Activation, BatchNormalization

class ConvBlock(Layer):
    def __init__(self, n_filters, kernel_size):
        super().__init__()
        self.conv = Conv2D(n_filters, kernel_size=kernel_size, use_bias=False)
        self.bn = BatchNormalization(axis=3)
        self.activation = Activation("relu")

    def call(self, inputs):
        return self.activation(
            self.bn(self.conv(inputs))
        )
You can place this block multiple times in your network, like Lego bricks, as sketched below.
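For example, a minimal sketch of stacking two such blocks (MiniConvNet is only a hypothetical name for this illustration):
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model

class MiniConvNet(Model):
    def __init__(self):
        super().__init__()
        self.block1 = ConvBlock(16, (3, 3))   # Conv -> BatchNorm -> ReLU
        self.block2 = ConvBlock(32, (3, 3))
        self.flatten = Flatten()
        self.classifier = Dense(10, activation="softmax")

    def call(self, inputs):
        x = self.block1(inputs)
        x = self.block2(x)
        return self.classifier(self.flatten(x))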
ConvNets usually have a lot of parameters because of their depth. A trick to trim the number of parameters with minimal performance loss is to use separable convolutions.
A standard convolution has quite a lot of parameters (but still far fewer than a fully connected layer, as the check below shows):
In [ ]:
conv_model = Sequential(name="Conv Model")
conv_model.add(Conv2D(8, kernel_size=(3, 3), use_bias=False))
In [ ]:
# conv_model.build((None, *input_shape))
# conv_model.summary()
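As a back-of-the-envelope check of that claim (not part of the lab itself), assuming the 28x28x1 MNIST input:
# weights of a 3x3 conv with 8 filters vs. a dense layer producing the same output size
k, c_in, c_out, h, w = 3, 1, 8, 28, 28
conv_params = k * k * c_in * c_out                             # 72 weights
dense_params = (h * w * c_in) * ((h - 2) * (w - 2) * c_out)    # ~4.2 million weights
print(conv_params, dense_params)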
Separable convolutions are made of two convolutions: a depthwise convolution (one spatial filter per input channel) followed by a pointwise 1x1 convolution that mixes the channels:
In [ ]:
from tensorflow.keras.layers import DepthwiseConv2D
separable_model = Sequential(name="Separable Model")
separable_model.add(DepthwiseConv2D(kernel_size=(3, 3), use_bias=False))
separable_model.add(Conv2D(8, kernel_size=(1, 1), use_bias=False))
In [ ]:
# separable_model.build((None, *input_shape))
# separable_model.summary()
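To see the saving, a quick count under the same assumptions (3x3 kernel, 1 input channel, 8 output filters, no bias):
standard = 3 * 3 * 1 * 8           # 72 weights for the plain convolution
depthwise = 3 * 3 * 1              # 9 weights: one 3x3 filter per input channel
pointwise = 1 * 1 * 1 * 8          # 8 weights for the 1x1 convolution
print(standard, depthwise + pointwise)  # the gap widens quickly as the number of channels grows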
In [ ]: