Static vs Dynamic Neural Networks in NNabla

NNabla allows you to define static and dynamic neural networks. Static neural networks have a fixed layer architecture, i.e., a static computation graph. In contrast, dynamic neural networks use a dynamic computation graph, e.g., randomly dropping layers for each minibatch.

This tutorial compares both computation graphs.


In [ ]:
# If you run this notebook on Google Colab, uncomment and run the following to set up dependencies.
# !pip install nnabla-ext-cuda100
# !git clone https://github.com/sony/nnabla.git
# %cd nnabla/tutorial

In [ ]:
# python2/3 compatibility
from __future__ import print_function
from __future__ import absolute_import
from __future__ import division

In [ ]:
%matplotlib inline
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import nnabla.solvers as S

import numpy as np
np.random.seed(0)

GPU = 0  # ID of GPU that we will use
batch_size = 64  # Reduce to fit your device memory
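
To make the distinction between the two modes concrete, here is a minimal toy sketch (it is not part of the training pipeline below; the variable names are chosen only for illustration). In the static case we first define the graph and then execute it with forward(); inside nn.auto_forward(), each function is executed immediately when it is called.

In [ ]:
# Toy example: static vs. dynamic execution of the same computation.
x_demo = nn.Variable((2, 3))
x_demo.d = np.random.randn(2, 3)

# Static: building the graph does not compute anything yet ...
y_static = F.relu(x_demo)
y_static.forward()  # ... the computation is executed here
print(y_static.d)

# Dynamic: inside auto_forward, each function is executed as soon as it is
# called, so no explicit forward() is needed.
with nn.auto_forward():
    y_dynamic = F.relu(x_demo)
print(y_dynamic.d)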

Dataset loading

We will first set up the digits dataset from scikit-learn:


In [ ]:
from tiny_digits import *

digits = load_digits()
data = data_iterator_tiny_digits(digits, batch_size=batch_size, shuffle=True)

Each sample in this dataset is a grayscale image of size 8x8 and belongs to one of the ten classes 0, 1, ..., 9.


In [ ]:
img, label = data.next()
print(img.shape, label.shape)
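
To get a feeling for the data, we can optionally visualize the first sample of the minibatch (a small matplotlib sketch; the image array has shape (batch_size, 1, 8, 8)):

In [ ]:
import matplotlib.pyplot as plt

# Show the first image of the minibatch together with its label.
plt.imshow(img[0, 0], cmap='gray', interpolation='nearest')
plt.title("label: {}".format(int(label[0, 0])))
plt.axis('off')
plt.show()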

Network definition

As an example, we define an unnecessarily deep CNN:


In [ ]:
def cnn(x):
    """Unnecessarily Deep CNN.
    
    Args:
        x : Variable, shape (B, 1, 8, 8)
        
    Returns:
        y : Variable, shape (B, 10)
    """
    with nn.parameter_scope("cnn"):  # Parameter scope can be nested
        with nn.parameter_scope("conv1"):
            h = F.tanh(PF.batch_normalization(
                PF.convolution(x, 64, (3, 3), pad=(1, 1))))
        for i in range(10):  # unnecessarily deep
            with nn.parameter_scope("conv{}".format(i + 2)):
                h = F.tanh(PF.batch_normalization(
                    PF.convolution(h, 128, (3, 3), pad=(1, 1))))
        with nn.parameter_scope("conv_last"):
            h = F.tanh(PF.batch_normalization(
                PF.convolution(h, 512, (3, 3), pad=(1, 1))))
            h = F.average_pooling(h, (2, 2))
        with nn.parameter_scope("fc"):
            h = F.tanh(PF.affine(h, 1024))
        with nn.parameter_scope("classifier"):
            y = PF.affine(h, 10)
    return y

Static computation graph

First, we will look at the case of a static computation graph where the neural network does not change during training.


In [ ]:
from nnabla.ext_utils import get_extension_context

# set up the CUDA extension
ctx_cuda = get_extension_context('cudnn', device_id=GPU)  # replace 'cudnn' with 'cpu' if you want to run the example on the CPU
nn.set_default_context(ctx_cuda)

# create variables for network input and label
x = nn.Variable(img.shape)
t = nn.Variable(label.shape)

# create network
static_y = cnn(x)
static_y.persistent = True  # keep this intermediate buffer so it is not cleared during forward/backward with buffer clearing

# define loss function for training
static_l = F.mean(F.softmax_cross_entropy(static_y, t))

Set up solver for training


In [ ]:
solver = S.Adam(alpha=1e-3)
solver.set_parameters(nn.get_parameters())

Create data iterator and a callback that prints the mean loss at the end of each epoch


In [ ]:
loss = []
def epoch_end_callback(epoch):
    global loss
    print("[", epoch, np.mean(loss), itr, "]", end='')
    loss = []

data = data_iterator_tiny_digits(digits, batch_size=batch_size, shuffle=True)
data.register_epoch_end_callback(epoch_end_callback)

Perform training iterations and output training loss:


In [ ]:
%%time
for epoch in range(30):
    itr = 0
    while data.epoch == epoch:
        x.d, t.d = data.next()
        static_l.forward(clear_no_need_grad=True)
        solver.zero_grad()
        static_l.backward(clear_buffer=True)
        solver.update()
        loss.append(static_l.d.copy())
        itr += 1
print('')

Dynamic computation graph

Now, we will use a dynamic computation graph, where the neural network is set up each time we want to do a forward/backward pass through it. This allows us, e.g., to randomly drop layers or to use network architectures that depend on the input data. In this example, for simplicity, we keep the same neural network structure and only create it dynamically. For example, adding an if np.random.rand() > dropout_probability: check inside cnn() allows layers to be dropped, as sketched below.
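
As a hypothetical illustration (dynamic_cnn and drop_prob are names introduced only here, and the function is not used in the training below), such a randomly-dropping variant of cnn() could look like the following sketch. Every skippable block maps 128 channels to 128 channels, so the parameter shapes stay consistent no matter which blocks are dropped in a particular call.

In [ ]:
def dynamic_cnn(x, drop_prob=0.5):
    """Sketch of a CNN whose depth changes randomly at every call."""
    with nn.parameter_scope("dynamic_cnn"):
        with nn.parameter_scope("conv1"):
            # 128 output channels so that the skippable blocks below always
            # receive a 128-channel input, whichever blocks are kept.
            h = F.tanh(PF.batch_normalization(
                PF.convolution(x, 128, (3, 3), pad=(1, 1))))
        for i in range(10):
            # Keep this 128 -> 128 block only with probability 1 - drop_prob;
            # the graph (and hence the depth) changes for every call.
            if np.random.rand() > drop_prob:
                with nn.parameter_scope("conv{}".format(i + 2)):
                    h = F.tanh(PF.batch_normalization(
                        PF.convolution(h, 128, (3, 3), pad=(1, 1))))
        with nn.parameter_scope("conv_last"):
            h = F.tanh(PF.batch_normalization(
                PF.convolution(h, 512, (3, 3), pad=(1, 1))))
            h = F.average_pooling(h, (2, 2))
        with nn.parameter_scope("fc"):
            h = F.tanh(PF.affine(h, 1024))
        with nn.parameter_scope("classifier"):
            y = PF.affine(h, 10)
    return y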

First, we set up the solver and the data iterator for training:


In [ ]:
nn.clear_parameters()
solver = S.Adam(alpha=1e-3)
solver.set_parameters(nn.get_parameters())

loss = []
def epoch_end_callback(epoch):
    global loss
    print("[", epoch, np.mean(loss), itr, "]", end='')
    loss = []
data = data_iterator_tiny_digits(digits, batch_size=batch_size, shuffle=True)
data.register_epoch_end_callback(epoch_end_callback)

In [ ]:
%%time
for epoch in range(30):
    itr = 0
    while data.epoch == epoch:
        x.d, t.d = data.next()
        with nn.auto_forward():
            dynamic_y = cnn(x)
            dynamic_l = F.mean(F.softmax_cross_entropy(dynamic_y, t))
        solver.set_parameters(nn.get_parameters(), reset=False, retain_state=True)  # this can be done dynamically; new parameters may appear as the graph is rebuilt
        solver.zero_grad()
        dynamic_l.backward(clear_buffer=True)
        solver.update()
        loss.append(dynamic_l.d.copy())
        itr += 1
print('')

Comparing the two processing times, we can observe that both schemes ("static" and "dynamic") take the same execution time, i.e., although we created the computation graph dynamically, we did not lose performance.