Intro to Thinc's Model class, model definition and methods

Thinc follows a functional-programming approach to model definition. This approach is especially effective for complicated network architectures, and for use cases where different data types need to be passed through the network to reach specific subcomponents. This notebook shows how to compose Thinc models and how to use the Model class and its methods.


In [ ]:
!pip install "thinc>=8.0.0a0"

Thinc provides a variety of layers: functions that create Model instances. Thinc tries to avoid inheritance, preferring function composition. The Linear function gives you a model that computes Y = X @ W.T + b (the function is defined in thinc.layers.linear.forward).


In [ ]:
import numpy
from thinc.api import Linear, zero_init

n_in = numpy.zeros((128, 16), dtype="f")   # sample input data: a batch of 128 rows, 16 columns
n_out = numpy.zeros((128, 10), dtype="f")  # sample output data: a batch of 128 rows, 10 columns

model = Linear(nI=n_in.shape[1], nO=n_out.shape[1], init_W=zero_init)
nI = model.get_dim("nI")
nO = model.get_dim("nO")
print(f"Initialized model with input dimension nI={nI} and output dimension nO={nO}.")

Models support dimension inference from data. You can defer some or all of the dimensions.


In [ ]:
model = Linear(init_W=zero_init)
print(f"Initialized model with no input/ouput dimensions.")

In [ ]:
X = numpy.zeros((128, 16), dtype="f")
Y = numpy.zeros((128, 10), dtype="f")
model.initialize(X=X, Y=Y)
nI = model.get_dim("nI")
nO = model.get_dim("nO")
print(f"Initialized model with input dimension nI={nI} and output dimension nO={nO}.")

The chain function wires two model instances together, with a feed-forward relationship. Dimension inference is especially helpful here.


In [ ]:
from thinc.api import chain, glorot_uniform_init

n_hidden = 128
X = numpy.zeros((128, 16), dtype="f")
Y = numpy.zeros((128, 10), dtype="f")

model = chain(Linear(n_hidden, init_W=glorot_uniform_init), Linear(init_W=zero_init))
model.initialize(X=X, Y=Y)
nI = model.get_dim("nI")
nO = model.get_dim("nO")
nO_hidden = model.layers[0].get_dim("nO")
print(f"Initialized model with input dimension nI={nI} and output dimension nO={nO}.")
print(f"The size of the hidden layer is {nO_hidden}.")

We call functions like chain combinators. Combinators take one or more models as arguments, and return another model instance, without introducing any new weight parameters. Another useful combinator is concatenate:


In [ ]:
from thinc.api import concatenate

model = concatenate(Linear(n_hidden), Linear(n_hidden))
model.initialize(X=X)
nO = model.get_dim("nO")
print(f"Initialized model with output dimension nO={nO}.")

The concatenate function produces a layer that runs the child layers separately, and then concatenates their outputs together. This is often useful for combining features from different sources. For instance, we use this all the time to build spaCy's embedding layers.
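As a quick check of that behavior, the concatenated model from the previous cell produces outputs whose width is the sum of the child layers' output dimensions:


In [ ]:
Y_concat = model.predict(X)
print(f"Output shape: {Y_concat.shape}.")  # the width is n_hidden + n_hidden = 256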

Some combinators take a layer and a numeric argument. For instance, the clone combinator creates a number of copies of a layer, and chains them together into a deep feed-forward network. The shape inference is especially handy here: we want the first and last layers to have different shapes, so we can avoid providing any dimensions to the layer we clone. We then just have to specify the first layer's output size, and let the rest of the dimensions be inferred from the data.


In [ ]:
from thinc.api import clone

model = clone(Linear(), 5)
model.layers[0].set_dim("nO", n_hidden)
model.initialize(X=X, Y=Y)
nI = model.get_dim("nI")
nO = model.get_dim("nO")
print(f"Initialized model with input dimension nI={nI} and output dimension nO={nO}.")

We can apply clone to model instances that have child layers, making it easy to define more complex architectures. For instance, we often want to attach an activation function and dropout to a linear layer, and then repeat that substructure a number of times. Of course, you can make whatever intermediate functions you find helpful.


In [ ]:
from thinc.api import Relu, Dropout

def Hidden(dropout=0.2):
    return chain(Linear(), Relu(), Dropout(dropout))

model = clone(Hidden(0.2), 5)

Some combinators are unary functions: they take only one model. These are usually input and output transformations. For instance, the with_array combinator produces a model that flattens lists of arrays into a single array, calls the child layer on the flattened data, and then reverses the transformation on the output.


In [ ]:
from thinc.api import with_array

model = with_array(Linear(4, 2))
Xs = [model.ops.alloc2f(10, 2, dtype="f")]  # a list containing one (10, 2) array of zeros
model.initialize(X=Xs)
Ys = model.predict(Xs)
print(f"Prediction shape: {Ys[0].shape}.")

The combinator system makes it easy to wire together complex models very concisely. A concise notation is a huge advantage, because it lets you read and review your model with less clutter, making it easier to spot mistakes and to make changes. For the ultimate in concise notation, you can also take advantage of Thinc's operator overloading, which lets you use an infix notation. Operator overloading can lead to unexpected results, so you have to enable it explicitly within a context manager. This also lets you control how the operators are bound, making it easy to use the feature with your own combinators. For instance, here is a definition for a text classification network:


In [ ]:
from thinc.api import add, chain, concatenate, clone
from thinc.api import with_array, reduce_max, reduce_mean, residual
from thinc.api import Model, Embed, Maxout, Softmax

nH = 5

with Model.define_operators({">>": chain, "|": concatenate, "+": add, "**": clone}):
    model = (
        with_array(
            (Embed(128, column=0) + Embed(64, column=1))
            >> Maxout(nH, normalize=True, dropout=0.2)
        )
        >> (reduce_max() | reduce_mean())
        >> residual(Relu() >> Dropout(0.2)) ** 2
        >> Softmax()
    )

The network above will expect a list of arrays as input, where each array should have two columns with different numeric identifier features. The two features will be embedded using separate embedding tables, and the two vectors are added and passed through a Maxout layer with layer normalization and dropout. The sequences then pass through two pooling functions, and the concatenated results are passed through two Relu layers with dropout and residual connections. Finally, the sequence vectors are passed through an output layer, which has a Softmax activation.


Using a model

Define the model:


In [ ]:
from thinc.api import Linear, Adam
import numpy

X = numpy.zeros((128, 10), dtype="f")
dY = numpy.zeros((128, 10), dtype="f")

model = Linear(10, 10)

Initialize the model with a sample of the data:


In [ ]:
model.initialize(X=X, Y=dY)

Run the model over some data:


In [ ]:
Y = model.predict(X)
Y

Get a callback to backpropagate:


In [ ]:
Y, backprop = model.begin_update(X)
Y, backprop

Run the callback to calculate the gradient with respect to the inputs. If the model has trainable parameters, gradients for the parameters are accumulated internally as a side effect.


In [ ]:
dX = backprop(dY)
dX

The backprop() callback only increments the parameter gradients; it doesn't actually change the weights. To update the weights, call model.finish_update(), passing it an optimizer:


In [ ]:
optimizer = Adam()
model.finish_update(optimizer)
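Putting those steps together, one complete training step looks roughly like this. This is only a sketch: Y_target is a hypothetical target array, and the output gradient is that of a simple squared-error loss.


In [ ]:
Y_target = numpy.ones((128, 10), dtype="f")  # hypothetical target values
Y, backprop = model.begin_update(X)
d_loss_dY = Y - Y_target  # gradient of a squared-error loss with respect to the output
dX = backprop(d_loss_dY)  # accumulates the parameter gradients as a side effect
model.finish_update(optimizer)  # applies the accumulated gradients via the optimizer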

You can get and set dimensions, parameters and attributes by name:


In [ ]:
dim = model.get_dim("nO")
W = model.get_param("W")
model.attrs["hello"] = "world"
model.attrs.get("foo", "bar")

You can also retrieve parameter gradients, and increment them explicitly:


In [ ]:
dW = model.get_grad("W")
model.inc_grad("W", dW * 0.1)

Finally, you can serialize models using the model.to_bytes and model.to_disk methods, and load them back with from_bytes and from_disk.


In [ ]:
model_bytes = model.to_bytes()
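To load the weights back, from_bytes expects a model with the same structure, so this sketch first creates a fresh Linear with matching dimensions (model_loaded is just an illustrative name):


In [ ]:
model_loaded = Linear(10, 10)
model_loaded.from_bytes(model_bytes)
print(model_loaded.get_param("W").shape)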