Benchmarking Thinc layers with a custom benchmark layer

This notebook shows how to write a benchmark layer that can wrap any layer(s) in your network and that logs the execution times of the initialization, forward pass and backward pass. The benchmark layer can also be mapped to an operator like @ to make it easy to add debugging to your network.
In [ ]:
!pip install "thinc>=8.0.0a0" ml_datasets
To log the results, we first set up a custom logger using Python's logging module. You could also just print the stats instead, but using logging is cleaner: it lets other users modify the logger's behavior more easily, it separates the logs from other output, and it makes it easy to write them to a file (e.g. if you're benchmarking several layers during training). The following logging config will output the date and time, the name of the logger and the logged results.
In [1]:
import logging

logger = logging.getLogger("thinc:benchmark")
if not logger.hasHandlers():  # prevent Jupyter from adding multiple loggers
    formatter = logging.Formatter('%(asctime)s %(name)s %(message)s', datefmt="%Y-%m-%d %H:%M:%S")
    handler = logging.StreamHandler()
    handler.setFormatter(formatter)
    logger.addHandler(handler)
logger.setLevel(logging.DEBUG)
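If you'd rather collect the benchmark results in a file than print them to the notebook output, you can attach a FileHandler from the standard logging module to the same logger. This is a minimal sketch; the file name benchmark.log is just an example:

file_formatter = logging.Formatter('%(asctime)s %(name)s %(message)s', datefmt="%Y-%m-%d %H:%M:%S")
file_handler = logging.FileHandler("benchmark.log")  # example file name
file_handler.setFormatter(file_formatter)
logger.addHandler(file_handler)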
Here's a minimalistic time logger that can be initialized with the name of a given layer, and can track several events (e.g. "forward" and "backward"). When the TimeLogger.end method is called, the output is formatted nicely and the elapsed time is logged with the logger name and a colored label.
In [2]:
from timeit import default_timer
from wasabi import color

class TimeLogger:
    def __init__(self, name):
        self.colors = {"forward": "green", "backward": "blue"}
        self.name = name
        self.timers = {}

    def start(self, name):
        self.timers[name] = default_timer()

    def end(self, name):
        result = default_timer() - self.timers[name]
        label = f"{name.upper():<8}"
        label = color(label, self.colors.get(name), bold=True)
        logger.debug(f"{self.name:<12} | {label} | {result:.6f}")
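As a quick sanity check (not part of the original example), you can exercise the TimeLogger on its own before wiring it into a layer; the event name "forward" is one of the colored labels defined above:

t = TimeLogger("demo")                  # "demo" is just an example name
t.start("forward")
_ = sum(i * i for i in range(100_000))  # some dummy work to time
t.end("forward")                        # logs the elapsed time for the "forward" event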
The benchmark layer now has to wrap the forward pass, backward pass and initialization of the layer it wraps and log the execution times. It then returns a Thinc model instance with the custom forward function and a custom init function. We'll also allow setting a custom name to make it easier to tell multiple wrapped benchmark layers apart.
In [3]:
from thinc.api import Model

def benchmark(layer, name=None):
    name = name if name is not None else layer.name
    t = TimeLogger(name)

    def init(model, X, Y):
        t.start("init")
        result = layer.initialize(X, Y)
        t.end("init")
        return result

    def forward(model, X, is_train):
        t.start("forward")
        layer_Y, layer_callback = layer(X, is_train=is_train)
        t.end("forward")

        def backprop(dY):
            t.start("backward")
            result = layer_callback(dY)
            t.end("backward")
            return result

        return layer_Y, backprop

    return Model(f"benchmark:{layer.name}", forward, init=init)
In [4]:
import numpy
from thinc.api import chain, Linear
X = numpy.zeros((1, 2), dtype="f")
model = benchmark(chain(benchmark(Linear(1)), Linear(1)), name="outer")
model.initialize(X=X)
Y, backprop = model(X, is_train=False)
dX = backprop(Y)
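Because the timings are emitted at DEBUG level, you can temporarily silence them without removing the benchmark wrappers by raising the logger's level. A minimal sketch using the standard logging module:

logger.setLevel(logging.WARNING)        # suppress the DEBUG-level timing output
Y, backprop = model(X, is_train=False)  # runs without logging timings
logger.setLevel(logging.DEBUG)          # turn benchmarking output back on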
Using the benchmark layer as an operator

Alternatively, we can also use Model.define_operators to map benchmark to an operator like @. The left argument of the operator is the first argument passed into the function (the layer) and the right argument is the second argument (the name). The following example wraps the whole network (two chained Linear layers) in a benchmark layer named "outer", and the first Linear layer in a benchmark layer named "first".
In [5]:
from thinc.api import Model

with Model.define_operators({">>": chain, "@": benchmark}):
    model = (Linear(1) @ "first" >> Linear(1)) @ "outer"

model.initialize(X=X)
Y, backprop = model(X, is_train=True)
dX = backprop(Y)
In [7]:
from thinc.api import Model, chain, Relu, Softmax, Adam
import ml_datasets

n_hidden = 32
dropout = 0.2

with Model.define_operators({">>": chain, "@": benchmark}):
    model = (
        Relu(nO=n_hidden, dropout=dropout) @ "relu1"
        >> Relu(nO=n_hidden, dropout=dropout) @ "relu2"
        >> Softmax()
    )

# Dummy data just to exercise the benchmark wrappers; X and Y need the same number of rows.
train_X = numpy.zeros((5, 784), dtype="f")
train_Y = numpy.zeros((5, 10), dtype="f")

model.initialize(X=train_X[:5], Y=train_Y[:5])
optimizer = Adam(0.001)
for i in range(10):
    for X, Y in model.ops.multibatch(8, train_X, train_Y, shuffle=True):
        Yh, backprop = model.begin_update(X)
        backprop(Yh - Y)
        model.finish_update(optimizer)
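The dummy arrays above keep the run fast. If you want to benchmark the wrapped layers on real data instead, ml_datasets (installed at the top of the notebook) can provide MNIST. A minimal sketch, assuming the ml_datasets.mnist() loader returns the train and dev splits as (X, Y) tuples with one-hot labels:

(train_X, train_Y), (dev_X, dev_Y) = ml_datasets.mnist()  # assumed loader from ml_datasets
model.initialize(X=train_X[:5], Y=train_Y[:5])
optimizer = Adam(0.001)
for i in range(10):
    for X, Y in model.ops.multibatch(128, train_X, train_Y, shuffle=True):
        Yh, backprop = model.begin_update(X)
        backprop(Yh - Y)
        model.finish_update(optimizer)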