In [1]:
    
%matplotlib inline
import importlib
import utils2; importlib.reload(utils2)
from utils2 import *
    
    
In [2]:
    
limit_mem()
    
In [3]:
    
from keras.datasets.cifar10 import load_batch
    
This notebook contains a Keras implementation of Huang et al.'s DenseNet.
Our motivation for studying DenseNet is how well it works with limited data.
DenseNet beats state-of-the-art results on CIFAR-10/CIFAR-100 both with and without data augmentation, but the performance increase is most pronounced without data augmentation.
Compare this to FractalNet, the previous state-of-the-art on both datasets.
That increase is motivation enough.
So what is a DenseNet?
Put simply, DenseNet is a Resnet where we replace addition with concatenation.
Recall that in broad terms, a Resnet is a Convnet that uses residual block structures.
These "blocks" work as follows: take the block's input Lt, pass it through one or more conv layers to compute f(Lt), and make the block's output Lt+1 = f(Lt) + Lt.
As mentioned, the difference with DenseNet is that instead of adding Lt to f(Lt), the two are concatenated.
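Here's a minimal sketch of that difference, using Keras 2's functional add and concatenate merge layers on a hypothetical 16-channel input (this is purely illustrative, not part of the network we build below):
In [ ]:
    
# Minimal sketch (hypothetical shapes): a residual block adds the input back in,
# while a dense block concatenates it, so the channel dimension grows.
from keras.layers import Input, Conv2D, add, concatenate
t = Input(shape=(32, 32, 16))
f = Conv2D(16, (3, 3), padding='same')(t)   # stands in for the block's conv layers
res   = add([t, f])          # Resnet:   (None, 32, 32, 16) -- channels unchanged
dense = concatenate([t, f])  # DenseNet: (None, 32, 32, 32) -- channels stack up
    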
As with Resnet, DenseNet consists of multiple such blocks, so there is a recursive relationship across blocks: each block takes as input the previous block's output concatenated with the previous block's input, which in turn contains the output and input of the block before that, and so on.
Because the input space keeps growing from layer to layer, the number of filters added at each layer needs to be kept in check.
Huang et al. call this number of added filters the growth rate, and appropriately denote it with the letter k.
Let's load data.
In [4]:
    
def load_data():
    path = 'data/cifar-10-batches-py'
    num_train_samples = 50000
    x_train = np.zeros((num_train_samples, 3, 32, 32), dtype='uint8')
    y_train = np.zeros((num_train_samples,), dtype='uint8')
    for i in range(1, 6):
        data, labels = load_batch(os.path.join(path, 'data_batch_' + str(i)))
        x_train[(i - 1) * 10000: i * 10000, :, :, :] = data
        y_train[(i - 1) * 10000: i * 10000] = labels
    x_test, y_test = load_batch(os.path.join(path, 'test_batch'))
    y_train = np.reshape(y_train, (len(y_train), 1))
    y_test = np.reshape(y_test, (len(y_test), 1))
    # convert from channels-first (N, C, H, W) to channels-last (N, H, W, C)
    x_train = x_train.transpose(0, 2, 3, 1)
    x_test = x_test.transpose(0, 2, 3, 1)
    return (x_train, y_train), (x_test, y_test)
    
In [5]:
    
(x_train, y_train), (x_test, y_test) = load_data()
    
Here's an example image from CIFAR-10:
In [6]:
    
plt.imshow(x_train[1])
    
    Out[6]:
    
We want to normalize the pixel values (0-255) to the unit interval.
In [7]:
    
x_train = x_train/255.
x_test = x_test/255.
    
Let's make some helper functions for piecing together our network using Keras' Functional API.
These components should all be familiar to you:
In [8]:
    
def relu(x): return Activation('relu')(x)
def dropout(x, p): return Dropout(p)(x) if p else x
def bn(x): return BatchNormalization()(x)  # Keras 2 (axis=-1 is the default for TensorFlow image dim ordering)
def relu_bn(x): return relu(bn(x))
    
Convolutional layer:
In [9]:
    
def conv(x, nf, sz, wd, p):
    x = Conv2D(nf, (sz, sz), kernel_initializer='he_uniform', padding='same',  # Keras 2
                          kernel_regularizer=l2(wd))(x)
    return dropout(x,p)
    
Define the ConvBlock as a sequence: BatchNorm -> ReLU -> Conv (with dropout, if requested).
The authors also use something called a bottleneck layer to reduce the dimensionality of inputs.
Recall that the filter space dimensionality grows with each layer of a dense block. The input dimensionality determines the size of the convolution weight matrices, i.e. the number of parameters.
At sizes 3x3 or larger, convolutions become costly, and the number of parameters grows quickly as a function of the input feature (filter) space. A smart approach is therefore to reduce the filter dimensionality with a 1x1 convolution, using a smaller number of filters, before the larger convolution.
The bottleneck consists of a 1x1 conv with nf * 4 filters (followed by BN -> ReLU), inserted before the 3x3 conv.
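To get a rough feel for the savings, compare the weight counts with and without the bottleneck for a wide input (the numbers here are illustrative, not anything computed elsewhere in this notebook):
In [ ]:
    
# Rough weight count (illustrative; ignores biases and BN parameters):
# a direct 3x3 conv from 256 channels down to k=12 filters, versus a 1x1
# bottleneck to 4*k channels followed by the same 3x3 conv.
c_in, k = 256, 12
direct     = 3 * 3 * c_in * k                               # 27,648 weights
bottleneck = 1 * 1 * c_in * (4 * k) + 3 * 3 * (4 * k) * k   # 12,288 + 5,184 = 17,472
print(direct, bottleneck)
    
The gap widens as the input channel count grows, which is exactly what happens deep inside a dense block.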
In [10]:
    
def conv_block(x, nf, bottleneck=False, p=None, wd=0):
    x = relu_bn(x)
    if bottleneck: x = relu_bn(conv(x, nf * 4, 1, wd, p))
    return conv(x, nf, 3, wd, p)
    
Now we can define the dense block:
- take the input x
- pass it through a conv_block to get an output b
- concatenate x and b, and use the result as the input x to the next conv_block
- repeat nb_layers times
In [11]:
    
def dense_block(x, nb_layers, growth_rate, bottleneck=False, p=None, wd=0):
    if bottleneck: nb_layers //= 2
    for i in range(nb_layers):
        b = conv_block(x, growth_rate, bottleneck=bottleneck, p=p, wd=wd)
        x = concatenate([x,b])  # Keras 2
    return x
    
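As a quick sanity check of the growth-rate behaviour (the numbers here are illustrative, not the ones used for training below): starting from 16 channels, four conv_blocks with growth_rate=12 should leave us with 16 + 4 * 12 = 64 channels.
In [ ]:
    
# Illustrative shape check: each conv_block adds growth_rate channels to the input.
from keras.layers import Input
from keras import backend as K
t = Input(shape=(32, 32, 16))
print(K.int_shape(dense_block(t, 4, 12)))   # expect (None, 32, 32, 64)
    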
As is typical for CV architectures, we'll do some pooling after the computation.
We'll define this unit as the transition block, and we'll put one between each pair of dense blocks.
Aside from BN -> ReLU and average pooling, this block also has an option for filter compression. This is simply feature reduction via a 1x1 conv, as discussed before, where the new number of filters is a fraction of the incoming number of filters.
Together with the bottleneck, compression has been shown to improve the performance and computational efficiency of DenseNet architectures (the authors call this variant DenseNet-BC).
In [12]:
    
def transition_block(x, compression=1.0, p=None, wd=0):
    nf = int(x.get_shape().as_list()[-1] * compression)
    x = relu_bn(x)
    x = conv(x, nf, 1, wd, p)
    return AveragePooling2D((2, 2), strides=(2, 2))(x)
    
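Again as an illustrative check (same assumptions and imports as the check above): with compression=0.5, a 64-channel input comes out of the transition block with 32 channels, and the 2x2 average pooling halves the spatial dimensions.
In [ ]:
    
# Illustrative shape check: compression halves the channels, pooling halves the
# spatial dimensions.
t = Input(shape=(32, 32, 64))
print(K.int_shape(transition_block(t, compression=0.5)))   # expect (None, 16, 16, 32)
    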
We've now defined all the building blocks (literally) to put together a DenseNet.
create_dense_net returns a Keras tensor for the network's output. From start to finish, it generates:
- an initial conv layer
- a dense block followed by a transition block, repeated nb_block times, omitting the transition block after the last dense block; each dense block contains (depth - 4) / nb_block layers
- BN -> ReLU and global average pooling
- a dense output layer with the desired activation
In [13]:
    
def create_dense_net(nb_classes, img_input, depth=40, nb_block=3, 
     growth_rate=12, nb_filter=16, bottleneck=False, compression=1.0, p=None, wd=0, activation='softmax'):
    
    assert activation == 'softmax' or activation == 'sigmoid'
    assert (depth - 4) % nb_block == 0
    nb_layers_per_block = int((depth - 4) / nb_block)
    nb_layers = [nb_layers_per_block] * nb_block
    x = conv(img_input, nb_filter, 3, wd, 0)
    for i,block in enumerate(nb_layers):
        x = dense_block(x, block, growth_rate, bottleneck=bottleneck, p=p, wd=wd)
        if i != len(nb_layers)-1:
            x = transition_block(x, compression=compression, p=p, wd=wd)
    x = relu_bn(x)
    x = GlobalAveragePooling2D()(x)
    return Dense(nb_classes, activation=activation, kernel_regularizer=l2(wd))(x)  # Keras 2
    
Now we can test it out on CIFAR-10.
In [14]:
    
input_shape = (32,32,3)
    
In [15]:
    
img_input = Input(shape=input_shape)
    
In [16]:
    
x = create_dense_net(10, img_input, depth=100, nb_filter=16, compression=0.5, 
                     bottleneck=True, p=0.2, wd=1e-4)
    
In [17]:
    
model = Model(img_input, x)
    
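As a rough sanity check: the paper reports roughly 0.8M parameters for DenseNet-BC with depth 100 and growth rate 12, so we should be in that ballpark.
In [ ]:
    
# Should be on the order of 0.8M parameters for DenseNet-BC (L=100, k=12).
model.count_params()
    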
In [18]:
    
model.compile(loss='sparse_categorical_crossentropy', 
      optimizer=keras.optimizers.SGD(0.1, 0.9, nesterov=True), metrics=["accuracy"])
    
In [19]:
    
parms = {'verbose': 2, 'callbacks': [TQDMNotebookCallback()]}
    
In [20]:
    
K.set_value(model.optimizer.lr, 0.1)
    
This will likely need to run overnight, with some learning rate annealing along the way...
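Below, the learning rate is annealed by hand with K.set_value between fits. An alternative is a LearningRateScheduler callback; the sketch here uses illustrative epoch boundaries, not the exact schedule followed below:
In [ ]:
    
# Sketch of an alternative to resetting the learning rate manually between fits:
# drop it by 10x at (illustrative) epoch boundaries via a Keras callback.
def lr_schedule(epoch):
    if epoch < 60: return 0.1
    if epoch < 90: return 0.01
    return 0.001

# model.fit(x_train, y_train, 64, 120, validation_data=(x_test, y_test),
#           callbacks=[keras.callbacks.LearningRateScheduler(lr_schedule)], verbose=2)
    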
In [21]:
    
model.fit(x_train, y_train, 64, 20, validation_data=(x_test, y_test), **parms)
    
In [22]:
    
K.set_value(model.optimizer.lr, 0.01)
    
    
In [23]:
    
model.fit(x_train, y_train, 64, 4, validation_data=(x_test, y_test), **parms)
    
In [ ]:
    
K.set_value(model.optimizer.lr, 0.1)
    
In [ ]:
    
model.fit(x_train, y_train, 64, 20, validation_data=(x_test, y_test), **parms)
    
In [ ]:
    
K.set_value(model.optimizer.lr, 0.01)
    
In [ ]:
    
model.fit(x_train, y_train, 64, 40, validation_data=(x_test, y_test), **parms)
    
In [ ]:
    
K.set_value(model.optimizer.lr, 0.001)
    
In [ ]:
    
model.fit(x_train, y_train, 64, 20, validation_data=(x_test, y_test), **parms)
    
In [ ]:
    
K.set_value(model.optimizer.lr, 0.01)
    
In [ ]:
    
model.fit(x_train, y_train, 64, 10, validation_data=(x_test, y_test), **parms)
    
In [ ]:
    
K.set_value(model.optimizer.lr, 0.001)
    
In [ ]:
    
model.fit(x_train, y_train, 64, 20, validation_data=(x_test, y_test), **parms)
    
And we're able to replicate their state-of-the-art results!
In [ ]:
    
%time model.save_weights('models/93.h5')
    
In [ ]: