NNabla allows you to define static and dynamic neural networks. Static neural networks have a fixed layer architecture, i.e., a static computation graph. In contrast, dynamic neural networks use a dynamic computation graph, e.g., randomly dropping layers for each minibatch.
This tutorial compares both computation graphs.
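As a minimal sketch of the difference (not part of the original notebook, and assuming nnabla is already installed): in static mode we first build the graph and then call forward(), whereas in dynamic mode the computation runs while the graph is being built, via nn.auto_forward().
In [ ]:
# Minimal sketch: static vs. dynamic execution with the same primitives.
import numpy as np
import nnabla as nn
import nnabla.functions as F

x = nn.Variable((2, 3))
x.d = np.random.randn(2, 3)

# Static: define the graph first, execute it afterwards.
y_static = F.relu(x)
y_static.forward()

# Dynamic: the computation is executed immediately while the graph is defined.
with nn.auto_forward():
    y_dynamic = F.relu(x)

print(y_static.d)
print(y_dynamic.d)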
In [ ]:
# If you run this notebook on Google Colab, uncomment and run the following to set up dependencies.
# !pip install nnabla-ext-cuda100
# !git clone https://github.com/sony/nnabla.git
# %cd nnabla/tutorial
In [ ]:
# python2/3 compatibility
from __future__ import print_function
from __future__ import absolute_import
from __future__ import division
In [ ]:
%matplotlib inline
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import nnabla.solvers as S
import numpy as np
np.random.seed(0)
GPU = 0 # ID of GPU that we will use
batch_size = 64 # Reduce to fit your device memory
In [ ]:
from tiny_digits import *
digits = load_digits()
data = data_iterator_tiny_digits(digits, batch_size=batch_size, shuffle=True)
Each sample in this dataset is a grayscale image of size 8x8 and belongs to one of the ten classes 0, 1, ..., 9.
In [ ]:
img, label = data.next()
print(img.shape, label.shape)
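To double-check this, we can plot a few samples of the minibatch together with their labels (a quick sketch using matplotlib, which was already enabled via %matplotlib inline; this cell is not part of the original notebook):
In [ ]:
import matplotlib.pyplot as plt
# Show the first 8 images of the minibatch with their class labels.
fig, axes = plt.subplots(1, 8, figsize=(12, 2))
for ax, im, lb in zip(axes, img, label):
    ax.imshow(im[0], cmap='gray')  # drop the channel axis: (1, 8, 8) -> (8, 8)
    ax.set_title(int(lb[0]))       # label entries have shape (1,)
    ax.axis('off')
plt.show()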
In [ ]:
def cnn(x):
"""Unnecessarily Deep CNN.
Args:
x : Variable, shape (B, 1, 8, 8)
Returns:
y : Variable, shape (B, 10)
"""
with nn.parameter_scope("cnn"): # Parameter scope can be nested
with nn.parameter_scope("conv1"):
h = F.tanh(PF.batch_normalization(
PF.convolution(x, 64, (3, 3), pad=(1, 1))))
for i in range(10): # unnecessarily deep
with nn.parameter_scope("conv{}".format(i + 2)):
h = F.tanh(PF.batch_normalization(
PF.convolution(h, 128, (3, 3), pad=(1, 1))))
with nn.parameter_scope("conv_last"):
h = F.tanh(PF.batch_normalization(
PF.convolution(h, 512, (3, 3), pad=(1, 1))))
h = F.average_pooling(h, (2, 2))
with nn.parameter_scope("fc"):
h = F.tanh(PF.affine(h, 1024))
with nn.parameter_scope("classifier"):
y = PF.affine(h, 10)
return y
In [ ]:
from nnabla.ext_utils import get_extension_context
# setup cuda extension
ctx_cuda = get_extension_context('cudnn', device_id=GPU) # replace 'cudnn' by 'cpu' if you want to run the example on the CPU
nn.set_default_context(ctx_cuda)
# create variables for network input and label
x = nn.Variable(img.shape)
t = nn.Variable(label.shape)
# create network
static_y = cnn(x)
static_y.persistent = True
# define loss function for training
static_l = F.mean(F.softmax_cross_entropy(static_y, t))
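At this point the graph has been created and all parameters exist under the nested parameter scopes. As a quick sanity check (not part of the original notebook), we can list a few of them; the names encode the scope hierarchy, e.g. something like cnn/conv1/conv/W:
In [ ]:
# Print a few parameter names and shapes created by cnn().
for name, param in list(nn.get_parameters().items())[:6]:
    print(name, param.shape)
print("total parameters:", sum(np.prod(p.shape) for p in nn.get_parameters().values()))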
Set up the solver for training:
In [ ]:
solver = S.Adam(alpha=1e-3)
solver.set_parameters(nn.get_parameters())
Create the data iterator and register an epoch-end callback that prints the mean training loss:
In [ ]:
loss = []
def epoch_end_callback(epoch):
    global loss
    print("[", epoch, np.mean(loss), itr, "]", end='')
    loss = []
data = data_iterator_tiny_digits(digits, batch_size=batch_size, shuffle=True)
data.register_epoch_end_callback(epoch_end_callback)
Perform training iterations and output training loss:
In [ ]:
%%time
for epoch in range(30):
    itr = 0
    while data.epoch == epoch:
        x.d, t.d = data.next()
        static_l.forward(clear_no_need_grad=True)
        solver.zero_grad()
        static_l.backward(clear_buffer=True)
        solver.update()
        loss.append(static_l.d.copy())
        itr += 1
print('')
Now, we will use a dynamic computation graph, where the neural network is set up each time we want to do a forward/backward pass through it. This allows us to, e.g., randomly drop layers or to use network architectures that depend on the input data. For simplicity, this example uses the same neural network structure as before and only creates it dynamically. For instance, adding an if np.random.rand() > dropout_probability: around the layers inside cnn() would allow us to randomly drop them, as sketched below.
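As a rough sketch of what such a dynamic architecture could look like (cnn_random_depth and dropout_probability are illustrative names, not part of this tutorial; the first convolution is widened to 128 channels so that skipping any subset of the middle blocks leaves all parameter shapes unchanged):
In [ ]:
def cnn_random_depth(x, dropout_probability=0.5):
    """Variant of cnn() whose depth changes on every call."""
    with nn.parameter_scope("cnn_random"):
        with nn.parameter_scope("conv1"):
            h = F.tanh(PF.batch_normalization(
                PF.convolution(x, 128, (3, 3), pad=(1, 1))))
        for i in range(10):
            if np.random.rand() > dropout_probability:  # randomly skip this block
                with nn.parameter_scope("conv{}".format(i + 2)):
                    h = F.tanh(PF.batch_normalization(
                        PF.convolution(h, 128, (3, 3), pad=(1, 1))))
        with nn.parameter_scope("conv_last"):
            h = F.tanh(PF.batch_normalization(
                PF.convolution(h, 512, (3, 3), pad=(1, 1))))
            h = F.average_pooling(h, (2, 2))
        with nn.parameter_scope("fc"):
            h = F.tanh(PF.affine(h, 1024))
        with nn.parameter_scope("classifier"):
            y = PF.affine(h, 10)
    return y
Parameters of a skipped block are only created the first time the block is actually executed, which is also why the training loop below re-registers parameters with the solver (solver.set_parameters(nn.get_parameters(), reset=False, retain_state=True)) in every iteration: newly created parameters are picked up on the fly.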
First, we set up the solver and the data iterator for the training:
In [ ]:
nn.clear_parameters()
solver = S.Adam(alpha=1e-3)
solver.set_parameters(nn.get_parameters())
loss = []
def epoch_end_callback(epoch):
    global loss
    print("[", epoch, np.mean(loss), itr, "]", end='')
    loss = []
data = data_iterator_tiny_digits(digits, batch_size=batch_size, shuffle=True)
data.register_epoch_end_callback(epoch_end_callback)
In [ ]:
%%time
for epoch in range(30):
    itr = 0
    while data.epoch == epoch:
        x.d, t.d = data.next()
        with nn.auto_forward():
            dynamic_y = cnn(x)
            dynamic_l = F.mean(F.softmax_cross_entropy(dynamic_y, t))
        solver.set_parameters(nn.get_parameters(), reset=False, retain_state=True)  # this can be done dynamically
        solver.zero_grad()
        dynamic_l.backward(clear_buffer=True)
        solver.update()
        loss.append(dynamic_l.d.copy())
        itr += 1
print('')
Comparing the two processing times, we can observe that both schemes ("static" and "dynamic") take roughly the same execution time, i.e., although we created the computation graph dynamically, we did not lose performance.