In [1]:
# we assume that you have the pycnn module in your path.
# we also assume that LD_LIBRARY_PATH includes a pointer to where libcnn_shared.so is.
from pycnn import *

Working with the pyCNN package

The pyCNN package is intended for neural-network processing on the CPU, and is particularly suited for NLP applications. It is a Python wrapper for the CNN package written by Chris Dyer.

There are two modes of operation:

  • Static networks, in which a network is built once and then fed different inputs and outputs. Most NN packages work this way.
  • Dynamic networks, in which a new network is built for each training example (sharing parameters with the networks of other training examples). This approach is what makes pyCNN unique, and is where most of its power comes from.

We will describe both of these modes.

Package Fundamentals

The main piece of pyCNN is the ComputationGraph, which essentially defines a neural network. The ComputationGraph is composed of expressions, which relate to the inputs and outputs of the network, as well as to the Parameters of the network. The parameters are the parts of the network that are optimized over time, and all of the parameters sit inside a Model. There are trainers (for example SimpleSGDTrainer) that are in charge of setting the parameter values.

We will not be using the ComputationGraph directly, but it is there in the background, as a singleton object. When pycnn is imported, a new ComputationGraph is created. We can then reset the computation graph to a new state by calling renew_cg().
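
To make this concrete, here is a minimal, hedged sketch of the graph life-cycle, using only the calls introduced in this tutorial (the model m0, the parameter name "p" and its size are arbitrary). The point to note is that expressions belong to the current graph: after renew_cg(), expressions created earlier are stale and must be re-created.

# a minimal sketch of the graph life-cycle (illustrative; m0, "p" and its size are arbitrary)
m0 = Model()
m0.add_parameters("p", (4))

renew_cg()               # start a fresh graph
p = parameter(m0["p"])   # an Expression tied to the *current* graph
p.value()                # ok

renew_cg()               # resets the graph; the old expression p is now stale
p = parameter(m0["p"])   # re-create the expression in the new graph
p.value()                # ok again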

Static Networks

The life-cycle of a pyCNN program is:

  1. Create a Model, and populate it with Parameters.
  2. Renew the computation graph, and create the Expressions representing the network. (The network will include the Expressions for the Parameters defined in the model.)
  3. Optimize the model for the objective of the network.

As an example, consider a model for solving the "xor" problem. The network has two inputs, which can be 0 or 1, and a single output which should be the xor of the two inputs. We will model this as a multi-layer perceptron with a single hidden layer.

Let $x = x_1, x_2$ be our input. We will have a hidden layer of 8 nodes, and an output layer of a single node. The activation on the hidden layer will be a $\tanh$. Our network will then be:

$\sigma(V(\tanh(Wx+b)))$

where $W$ is an $8 \times 2$ matrix, $V$ is a $1 \times 8$ matrix, and $b$ is an 8-dimensional vector.

We want the output to be either 0 or 1, so we take the output layer to be the logistic (sigmoid) function, $\sigma(x)$, which takes values in $(-\infty, +\infty)$ and returns numbers in $[0,1]$.
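
To make the dimensions concrete, the following is a hedged numpy sketch of the same forward computation; numpy is used purely for illustration, and the actual network below is built out of pyCNN expressions.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W = np.random.randn(8, 2)   # hidden-layer weights
b = np.random.randn(8)      # hidden-layer bias
V = np.random.randn(1, 8)   # output-layer weights

x = np.array([0, 1])        # one xor input
h = np.tanh(W.dot(x) + b)   # 8-dimensional hidden layer
y = sigmoid(V.dot(h))       # a single number in [0, 1]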

We will begin by defining the model and the computation graph.


In [2]:
# create a model and add the parameters.
m = Model()
m.add_parameters("W", (8,2))
m.add_parameters("V", (1,8))
m.add_parameters("b", (8))

renew_cg() # new computation graph. not strictly needed here, but good practice.

# associate the parameters with cg Expressions
W = parameter(m["W"])
V = parameter(m["V"])
b = parameter(m["b"])

In [3]:
b.value()


Out[3]:
[-0.391936719417572,
 0.4916459918022156,
 -0.471852570772171,
 0.8333062529563904,
 -0.6168352961540222,
 -0.2860015630722046,
 0.13444989919662476,
 -0.7587275505065918]

The first block creates a model and populates it with parameters. The second block creates a computation graph and adds the parameters to it, transforming them into Expressions. The need to distinguish model parameters from "expressions" will become clearer later.

We now make use of the W, V and b expressions in order to create the complete expression for the network.


In [4]:
x = vecInput(2) # an input vector of size 2. Also an expression.
output = logistic(V*(tanh((W*x)+b)))

In [5]:
# we can now query our network
x.set([0,0])
output.value()


Out[5]:
0.46759992837905884

In [6]:
# we want to be able to define a loss, so we need an input expression to work against.
y = scalarInput(0) # this will hold the correct answer
loss = binary_log_loss(output, y)

In [7]:
x.set([1,0])
y.set(0)
print loss.value()

y.set(1)
print loss.value()


-0.510900914669
0.916177868843

Training

We now want to set the parameter weights such that the loss is minimized.

For this, we will use a trainer object. A trainer is constructed with respect to the parameters of a given model.


In [8]:
trainer = SimpleSGDTrainer(m)

To use the trainer, we need to:

  • call the forward_scalar method of ComputationGraph. This will run a forward pass through the network, calculating all the intermediate values up to the last one (the loss, in our case), and then convert the value to a scalar. (The final output of our network must be a single scalar value. However, if we do not care about the value, we can use cg.forward() instead of cg.forward_scalar().)
  • call the backward method of ComputationGraph. This will run a backward pass from the last node, calculating the gradients with respect to minimizing the last expression (in our case we want to minimize the loss). The gradients are stored in the model, and we can now let the trainer take care of the optimization step.
  • call trainer.update() to update the parameters with respect to the latest gradients.

In practice, as the cells below show, calling value() on the final (loss) expression runs the forward pass for us, and calling backward() on it runs the backward pass.

In [18]:
x.set([1,0])
y.set(1)
loss_value = loss.value() # this performs a forward through the network.
print "the loss before step is:",loss_value

# now do an optimization step
loss.backward()  # compute the gradients
trainer.update()

# see how it affected the loss:
loss_value = loss.value(recalculate=True) # recalculate=True means "don't use precomputed value"
print "the loss after step is:",loss_value


the loss before step is: 0.237541377544
the loss after step is: 0.215936839581

The optimization step indeed made the loss decrease. We now need to run this in a loop. To this end, we will create a training set, and iterate over it.

For the xor problem, the training instances are easy to create.


In [19]:
def create_xor_instances(num_rounds=2000):
    questions = []
    answers = []
    for round in xrange(num_rounds):
        for x1 in 0,1:
            for x2 in 0,1:
                answer = 0 if x1==x2 else 1
                questions.append((x1,x2))
                answers.append(answer)
    return questions, answers 

questions, answers = create_xor_instances()

We now feed each question / answer pair to the network, and try to minimize the loss.


In [20]:
total_loss = 0
seen_instances = 0
for question, answer in zip(questions, answers):
    x.set(question)
    y.set(answer)
    seen_instances += 1
    total_loss += loss.value()
    loss.backward()
    trainer.update()
    if (seen_instances > 1 and seen_instances % 100 == 0):
        print "average loss is:",total_loss / seen_instances


average loss is: -0.0736760540307
average loss is: -0.0431773296744
average loss is: -0.0329923908412
average loss is: -0.0265793037787
average loss is: -0.0219315118343
average loss is: -0.0185768032136
average loss is: -0.0160888710139
average loss is: -0.0141828030394
average loss is: -0.0126798275403
average loss is: -0.0114656574577
average loss is: -0.0104649186862
average loss is: -0.00962608709466
average loss is: -0.0089128920701
average loss is: -0.00829910863324
average loss is: -0.00776528354424
average loss is: -0.00729674054834
average loss is: -0.00688219008419
average loss is: -0.00651278639471
average loss is: -0.00618152223086
average loss is: -0.00588276661444
average loss is: -0.00561194644775
average loss is: -0.00536530932623
average loss is: -0.00513974289653
average loss is: -0.00493264887171
average loss is: -0.00474184197951
average loss is: -0.00456546585523
average loss is: -0.00440193738926
average loss is: -0.00424990062352
average loss is: -0.00410817924357
average loss is: -0.00397575346412
average loss is: -0.00385173368884
average loss is: -0.0037353409997
average loss is: -0.00362589035656
average loss is: -0.00352277747492
average loss is: -0.0034254647359
average loss is: -0.00333347332166
average loss is: -0.0032463763395
average loss is: -0.00316379235218
average loss is: -0.00308537749233
average loss is: -0.00301082218514
average loss is: -0.00293984767561
average loss is: -0.00287220093921
average loss is: -0.0028076520279
average loss is: -0.00274599160401
average loss is: -0.00268702900767
average loss is: -0.00263059020554
average loss is: -0.00257651594453
average loss is: -0.00252466075627
average loss is: -0.00247489001897
average loss is: -0.00242707972354
average loss is: -0.00238111569624
average loss is: -0.00233689236173
average loss is: -0.00229431213741
average loss is: -0.00225328474719
average loss is: -0.00221372625837
average loss is: -0.00217555890941
average loss is: -0.00213871019562
average loss is: -0.00210311289732
average loss is: -0.00206870423375
average loss is: -0.00203542506477
average loss is: -0.00200322053556
average loss is: -0.00197203894936
average loss is: -0.00194183212189
average loss is: -0.00191255454974
average loss is: -0.00188416398736
average loss is: -0.00185662054628
average loss is: -0.00182988676211
average loss is: -0.00180392725441
average loss is: -0.00177870851088
average loss is: -0.00175419908922
average loss is: -0.00173036946333
average loss is: -0.00170719160213
average loss is: -0.00168463913631
average loss is: -0.00166268690267
average loss is: -0.0016413110688
average loss is: -0.00162048917527
average loss is: -0.00160019980025
average loss is: -0.00158042266556
average loss is: -0.0015611385186
average loss is: -0.00154232902762

Our network is now trained. Let's verify that it indeed learned the xor function:


In [21]:
x.set([0,1])
print "0,1",output.value()

x.set([1,0])
print "1,0",output.value()

x.set([0,0])
print "0,0",output.value()

x.set([1,1])
print "1,1",output.value()


0,1 0.998457551003
1,0 0.998303294182
0,0 0.00132494198624
1,1 0.00213180552237

In case we are curious about the parameter values, we can query them:


In [22]:
W.value()


Out[22]:
array([[ 1.90704894,  1.75941706],
       [-0.51026875, -0.73472238],
       [ 1.00825202,  0.86155057],
       [-1.68297076, -1.80956674],
       [-1.2174753 , -1.15852094],
       [-3.23514462,  2.84460068],
       [ 1.63482118,  1.50156498],
       [ 2.60078287, -3.01065731]])

In [23]:
V.value()


Out[23]:
array([[ 2.06817722,  0.85734618,  0.69402838,  3.06676149, -1.10298848,
         5.04940414,  1.77656221,  4.74531031]])

In [24]:
b.value()


Out[24]:
[-0.5166335701942444,
 0.8676984906196594,
 0.008914745412766933,
 2.637610912322998,
 0.019709745422005653,
 -1.4869117736816406,
 -0.33558133244514465,
 -1.3341320753097534]

To summarize

Here is a complete program:


In [25]:
# define the parameters
m = Model()
m.add_parameters("W", (8,2))
m.add_parameters("V", (1,8))
m.add_parameters("b", (8))

# renew the computation graph
renew_cg()

# add the parameters to the graph
W = parameter(m["W"])
V = parameter(m["V"])
b = parameter(m["b"])

# create the network
x = vecInput(2) # an input vector of size 2.
output = logistic(V*(tanh((W*x)+b)))
# define the loss with respect to an output y.
y = scalarInput(0) # this will hold the correct answer
loss = binary_log_loss(output, y)

# create training instances
def create_xor_instances(num_rounds=2000):
    questions = []
    answers = []
    for round in xrange(num_rounds):
        for x1 in 0,1:
            for x2 in 0,1:
                answer = 0 if x1==x2 else 1
                questions.append((x1,x2))
                answers.append(answer)
    return questions, answers 

questions, answers = create_xor_instances()

# train the network
trainer = SimpleSGDTrainer(m)

total_loss = 0
seen_instances = 0
for question, answer in zip(questions, answers):
    x.set(question)
    y.set(answer)
    seen_instances += 1
    total_loss += loss.value()
    loss.backward()
    trainer.update()
    if (seen_instances > 1 and seen_instances % 100 == 0):
        print "average loss is:",total_loss / seen_instances


average loss is: 0.000710777640343
average loss is: 0.00538051903248
average loss is: 0.008753751417
average loss is: 0.00962603349239
average loss is: 0.00916970175505
average loss is: 0.00848576014241
average loss is: 0.00780005061201
average loss is: 0.0071754240524
average loss is: 0.0066243140565
average loss is: 0.00614239533804
average loss is: 0.00572097523307
average loss is: 0.00535109124146
average loss is: 0.00502474307441
average loss is: 0.00473515995978
average loss is: 0.00447674103267
average loss is: 0.00424488715769
average loss is: 0.00403580133858
average loss is: 0.00384634935659
average loss is: 0.00367392660486
average loss is: 0.00351635961281
average loss is: 0.00337182009371
average loss is: 0.00323876434577
average loss is: 0.00311587982183
average loss is: 0.00300204546501
average loss is: 0.0028963002421
average loss is: 0.00279780962182
average loss is: 0.00270585036854
average loss is: 0.00261979493978
average loss is: 0.00253909190298
average loss is: 0.00246325349792
average loss is: 0.00239185206555
average loss is: 0.0023245080647
average loss is: 0.00226088441619
average loss is: 0.00220067988526
average loss is: 0.00214362350718
average loss is: 0.00208947365366
average loss is: 0.0020380129623
average loss is: 0.00198904496207
average loss is: 0.00194239226152
average loss is: 0.00189789384627
average loss is: 0.00185540357037
average loss is: 0.00181478771642
average loss is: 0.00177592387499
average loss is: 0.00173870052693
average loss is: 0.0017030151133
average loss is: 0.00166877407409
average loss is: 0.00163589137076
average loss is: 0.00160428755732
average loss is: 0.00157388916344
average loss is: 0.00154462760519
average loss is: 0.00151644010562
average loss is: 0.00148926836801
average loss is: 0.00146305795107
average loss is: 0.00143775891996
average loss is: 0.00141332406345
average loss is: 0.00138970975105
average loss is: 0.00136687505267
average loss is: 0.00134478173209
average loss is: 0.00132339411352
average loss is: 0.00130267874258
average loss is: 0.00128260426256
average loss is: 0.00126314118082
average loss is: 0.0012442619527
average loss is: 0.00122594058595
average loss is: 0.00120815254805
average loss is: 0.00119087479342
average loss is: 0.00117408560255
average loss is: 0.00115776436539
average loss is: 0.00114189174765
average loss is: 0.00112644939362
average loss is: 0.00111142013773
average loss is: 0.00109678746174
average loss is: 0.00108253585497
average loss is: 0.0010686503392
average loss is: 0.00105511702904
average loss is: 0.00104192272314
average loss is: 0.00102905462018
average loss is: 0.00101650077129
average loss is: 0.0010042497123
average loss is: 0.000992290693015

Dynamic Networks

Dynamic networks are very similar to static ones, but instead of creating the network once and then calling "set" for each training example to change the inputs, we simply create a new network for each training example.

We present an example below. While the value of this may not be clear in the xor example, the dynamic approach is very convenient for networks whose structure is not fixed, such as recurrent or recursive networks; the sketch after the example gives a taste of this.


In [26]:
# create training instances, this is as before
def create_xor_instances(num_rounds=2000):
    questions = []
    answers = []
    for round in xrange(num_rounds):
        for x1 in 0,1:
            for x2 in 0,1:
                answer = 0 if x1==x2 else 1
                questions.append((x1,x2))
                answers.append(answer)
    return questions, answers 

questions, answers = create_xor_instances()

# create a network for the xor problem given input and output
def create_xor_network(model, inputs, expected_answer):
    renew_cg()
    W = parameter(model["W"])
    V = parameter(model["V"])
    b = parameter(model["b"])
    x = vecInput(len(inputs))
    x.set(inputs)
    y = scalarInput(expected_answer)
    output = logistic(V*(tanh((W*x)+b)))
    loss =  binary_log_loss(output, y)
    return loss

m = Model()
m.add_parameters("W", (8,2))
m.add_parameters("V", (1,8))
m.add_parameters("b", (8))
trainer = SimpleSGDTrainer(m)

seen_instances = 0
total_loss = 0
for question, answer in zip(questions, answers):
    loss = create_xor_network(m, question, answer)
    seen_instances += 1
    total_loss += loss.value()
    loss.backward()
    trainer.update()
    if (seen_instances > 1 and seen_instances % 100 == 0):
        print "average loss is:",total_loss / seen_instances


average loss is: -0.0434117043018
average loss is: -0.030382682085
average loss is: -0.0260350414117
average loss is: -0.0245157124847
average loss is: -0.0215704288483
average loss is: -0.0178998744239
average loss is: -0.0148817687695
average loss is: -0.0126448741369
average loss is: -0.0109926707587
average loss is: -0.00973503550515
average loss is: -0.00874631876634
average loss is: -0.00794731451975
average loss is: -0.00728702781435
average loss is: -0.00673133983875
average loss is: -0.00625665952327
average loss is: -0.00584609933663
average loss is: -0.00548723993439
average loss is: -0.00517074972184
average loss is: -0.00488942034788
average loss is: -0.00463762690453
average loss is: -0.00441089029069
average loss is: -0.00420562038346
average loss is: -0.00401886963711
average loss is: -0.00384821915689
average loss is: -0.00369165667389
average loss is: -0.00354749959141
average loss is: -0.00341432033779
average loss is: -0.00329090173223
average loss is: -0.00317620157505
average loss is: -0.00306932546338
average loss is: -0.00296949436062
average loss is: -0.00287602832846
average loss is: -0.00278833741689
average loss is: -0.00270589900306
average loss is: -0.00262825351767
average loss is: -0.00255499271922
average loss is: -0.00248575425367
average loss is: -0.00242021449964
average loss is: -0.00235808361865
average loss is: -0.00229910133278
average loss is: -0.00224303434427
average loss is: -0.00218967100073
average loss is: -0.00213881912322
average loss is: -0.00209030493051
average loss is: -0.00204397013002
average loss is: -0.00199967051132
average loss is: -0.00195727446959
average loss is: -0.00191666179603
average loss is: -0.00187772191776
average loss is: -0.0018403530397
average loss is: -0.00180446159583
average loss is: -0.0017699615956
average loss is: -0.00173677365427
average loss is: -0.00170482409213
average loss is: -0.00167404436261
average loss is: -0.00164437123308
average loss is: -0.00161574586736
average loss is: -0.00158811347559
average loss is: -0.0015614229679
average loss is: -0.00153562722582
average loss is: -0.00151068181977
average loss is: -0.00148654521539
average loss is: -0.00146317849714
average loss is: -0.00144054521572
average loss is: -0.00141861138174
average loss is: -0.0013973448718
average loss is: -0.00137671559425
average loss is: -0.00135669496189
average loss is: -0.00133725664821
average loss is: -0.00131837574296
average loss is: -0.00130002835688
average loss is: -0.00128219219668
average loss is: -0.00126484589098
average loss is: -0.00124796962008
average loss is: -0.00123154441345
average loss is: -0.00121555239431
average loss is: -0.00119997651854
average loss is: -0.00118480070451
average loss is: -0.00117000979887
average loss is: -0.00115558931208
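
The benefit of rebuilding the graph per example becomes clearer when the structure really does change from example to example. The following is a hedged, illustrative sketch (not part of the xor task above): a chain of tanh layers whose depth equals the length of the input sequence, with the same parameters reused at every step. The parameter names (Wx, Wh, bh, V) and the helper make_vec are hypothetical.

# illustrative sketch only: a per-example network whose depth depends on the input
def make_vec(values):
    e = vecInput(len(values))
    e.set(values)
    return e

m2 = Model()
m2.add_parameters("Wx", (8,2))   # input -> hidden
m2.add_parameters("Wh", (8,8))   # hidden -> hidden, reused at every step
m2.add_parameters("bh", (8))
m2.add_parameters("V",  (1,8))

def build_sequence_network(model, sequence, expected_answer):
    renew_cg()                        # a brand-new graph for this example
    Wx = parameter(model["Wx"])
    Wh = parameter(model["Wh"])
    bh = parameter(model["bh"])
    V  = parameter(model["V"])
    h = tanh(Wx*make_vec(sequence[0]) + bh)
    for v in sequence[1:]:            # as many layers as the sequence is long
        h = tanh(Wx*make_vec(v) + Wh*h + bh)
    y = scalarInput(expected_answer)
    return binary_log_loss(logistic(V*h), y)

# usage, with the same training loop as above:
# loss = build_sequence_network(m2, [(0,1), (1,1), (1,0)], 1)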
