Working with the Python DyNet package

The DyNet package is intended for training and using neural networks, and is particularly suited for applications with dynamically changing network structures. It is a Python wrapper for the DyNet C++ package.

In neural network packages there are generally two modes of operation:

Static networks, in which a network is built once and then fed different inputs/outputs. Most NN packages work this way.
Dynamic networks, in which a new network is built for each training example (sharing parameters with the networks of other training examples). This approach is what makes DyNet unique, and where most of its power comes from.

We will describe both of these modes.

Package Fundamentals

The main piece of DyNet is the ComputationGraph, which is what essentially defines a neural network. The ComputationGraph is composed of expressions, which relate to the inputs and outputs of the network, as well as the Parameters of the network. The parameters are the things in the network that are optimized over time, and all of the parameters sit inside a ParameterCollection. There are trainers (for example SimpleSGDTrainer) that are in charge of setting the parameter values.

We will not be using the ComputationGraph directly, but it is there in the background, as a singleton object. When dynet is imported, a new ComputationGraph is created. We can then reset the computation graph to a new state by calling renew_cg().

Static Networks

The life-cycle of a DyNet program is:

Create a ParameterCollection, and populate it with Parameters.
Renew the computation graph, and create Expressions representing the network (the network will include the Expressions for the Parameters defined in the parameter collection).
Optimize the model for the objective of the network.

As an example, consider a model for solving the "xor" problem. The network has two inputs, which can be 0 or 1, and a single output which should be the xor of the two inputs. We will model this as a multi-layer perceptron with a single hidden layer.

Let $x = x_1, x_2$ be our input. We will have a hidden layer of 8 nodes, and an output layer of a single node. The activation on the hidden layer will be a $\tanh$. Our network will then be:

$\sigma(V(\tanh(Wx+b)))$

Where $W$ is an $8 \times 2$ matrix, $V$ is a $1 \times 8$ matrix, and $b$ is an 8-dim vector.

We want the output to be either 0 or 1, so we take the output layer to be the logistic-sigmoid function, $\sigma(x)$, which maps any real input between $-\infty$ and $+\infty$ to a number in $(0,1)$.
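As a rough check of the shapes involved, the formula can be traced in plain Python without DyNet. This is only an illustrative sketch: the helper names (matvec, sigmoid) and the random parameter values are assumptions made here and are not part of the DyNet API.

```python
import math
import random

random.seed(0)  # arbitrary seed, only so the sketch is reproducible

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def sigmoid(z):
    """Logistic sigmoid: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical random parameters: W is 8 x 2, b has 8 entries,
# and V is 1 x 8 so that V * hidden is a single number.
W = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(8)]
b = [random.uniform(-1, 1) for _ in range(8)]
V = [[random.uniform(-1, 1) for _ in range(8)]]

x = [1.0, 0.0]
hidden = [math.tanh(a + b_i) for a, b_i in zip(matvec(W, x), b)]  # 8 values
output = sigmoid(matvec(V, hidden)[0])                            # 1 value

print(len(hidden), output)
```

Whatever the random values are, the hidden layer has 8 entries and the output lies strictly between 0 and 1.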

We will begin by defining the model and the computation graph.



In [1]:

    
# we assume that the dynet module is on your Python path.
# OUTDATED: we also assume that LD_LIBRARY_PATH includes a pointer to where libcnn_shared.so is.
import dynet as dy



In [2]:

    
# create a parameter collection and add the parameters.
m = dy.ParameterCollection()
pW = m.add_parameters((8,2))
pV = m.add_parameters((1,8))
pb = m.add_parameters((8))

dy.renew_cg() # new computation graph. not strictly needed here, but good practice.

# associate the parameters with cg Expressions
W = dy.parameter(pW)
V = dy.parameter(pV)
b = dy.parameter(pb)



In [3]:

    
#b[1:-1].value()
b.value()









    Out[3]:





[-0.5920619964599609,
 -0.4818088114261627,
 -0.011437613517045975,
 -0.7547096610069275,
 0.2887613773345947,
 -0.39806437492370605,
 -0.8494511246681213,
 0.295582115650177]

The first block creates a parameter collection and populates it with parameters. The second block creates a computation graph and adds the parameters to it, transforming them into Expressions. The need to distinguish model parameters from "expressions" will become clearer later.

We now make use of the W and V expressions, in order to create the complete expression for the network.



In [4]:

    
x = dy.vecInput(2) # an input vector of size 2. Also an expression.
output = dy.logistic(V*(dy.tanh((W*x)+b)))



In [5]:

    
# we can now query our network
x.set([0,0])
output.value()









    Out[5]:





0.706532895565033



In [6]:

    
# we want to be able to define a loss, so we need an input expression to work against.
y = dy.scalarInput(0) # this will hold the correct answer
loss = dy.binary_log_loss(output, y)



In [7]:

    
x.set([1,0])
y.set(0)
print(loss.value())

y.set(1)
print(loss.value())









    



1.25551486015
0.335373580456
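The two printed losses can be related to each other, assuming binary_log_loss computes the usual binary cross-entropy $-(y \log p + (1-y)\log(1-p))$ for a predicted probability $p$. The plain-Python sketch below (the function name binary_cross_entropy is made up for illustration) recovers $p$ from the loss printed for $y=1$ and checks that it approximately reproduces the loss printed for $y=0$.

```python
import math

def binary_cross_entropy(p, y):
    """Binary cross-entropy for a predicted probability p and a 0/1 target y."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

loss_y1 = 0.335373580456  # value printed above for y = 1
loss_y0 = 1.25551486015   # value printed above for y = 0

# With y = 1 the loss reduces to -log(p), so p can be recovered directly.
p = math.exp(-loss_y1)
recomputed_y0 = binary_cross_entropy(p, 0)

print(p, recomputed_y0)
```

Under that assumption, the network's output for the input (1,0) was roughly 0.715 before training, which is why the loss is smaller when the target is 1 than when it is 0.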

Training

We now want to set the parameter weights such that the loss is minimized.

For this, we will use a trainer object. A trainer is constructed with respect to the parameters of a given model.



In [8]:

    
trainer = dy.SimpleSGDTrainer(m)

To use the trainer, we need to:

Call the forward_scalar method of ComputationGraph. This runs a forward pass through the network, calculating all the intermediate values up to the last one (loss, in our case), and then converts the value to a scalar. The final output of our network must be a single scalar value. However, if we do not care about the value, we can just use cg.forward() instead of cg.forward_scalar(). In the code below, calling loss.value() performs this forward pass.
Call the backward method of ComputationGraph. This runs a backward pass from the last node, calculating the gradients with respect to minimizing the last expression (in our case we want to minimize the loss). The gradients are stored in the parameter collection, and we can now let the trainer take care of the optimization step. In the code below, this is done by loss.backward().
Call trainer.update() to optimize the values with respect to the latest gradients.
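The three steps can be illustrated without DyNet on a one-parameter model. The sketch below assumes plain stochastic gradient descent (w -= learning_rate * gradient) with an arbitrary learning rate of 0.1 and a hand-derived gradient; it is not DyNet's implementation, and all names in it are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, x, y):
    """Forward pass: prediction p = sigmoid(w * x), then binary log loss."""
    p = sigmoid(w * x)
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return p, loss

w, x, y = 0.0, 1.0, 1.0
learning_rate = 0.1  # arbitrary value chosen for this sketch

p, loss_before = forward(w, x, y)  # 1. forward pass
grad_w = (p - y) * x               # 2. backward pass: d(loss)/dw for this model
w -= learning_rate * grad_w        # 3. update step
_, loss_after = forward(w, x, y)

print(loss_before, loss_after)
```

A single step moves the parameter in the direction that lowers the loss on this example, which mirrors what one forward, backward and update cycle does for the network parameters.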



In [9]:

    
x.set([1,0])
y.set(1)
loss_value = loss.value() # this performs a forward pass through the network.
print("the loss before step is:", loss_value)

# now do an optimization step
loss.backward()  # compute the gradients
trainer.update()

# see how it affected the loss:
loss_value = loss.value(recalculate=True) # recalculate=True means "don't use precomputed value"
print("the loss after step is:", loss_value)









    



the loss before step is: 0.335373580456
the loss after step is: 0.296859383583

The optimization step indeed made the loss decrease. We now need to run this in a loop. To this end, we will create a training set, and iterate over it.

For the xor problem, the training instances are easy to create.



In [10]:

    
def create_xor_instances(num_rounds=2000):
    questions = []
    answers = []
    for round in range(num_rounds):
        for x1 in 0,1:
            for x2 in 0,1:
                answer = 0 if x1==x2 else 1
                questions.append((x1,x2))
                answers.append(answer)
    return questions, answers 

questions, answers = create_xor_instances()
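To see what the training set looks like, the function can be run on its own. The version below restates the same logic using range, assuming a Python 3 interpreter, and prints the size of the data and the first four instances.

```python
def create_xor_instances(num_rounds=2000):
    """Return num_rounds repetitions of the four xor input pairs and their answers."""
    questions = []
    answers = []
    for _ in range(num_rounds):
        for x1 in (0, 1):
            for x2 in (0, 1):
                questions.append((x1, x2))
                answers.append(0 if x1 == x2 else 1)
    return questions, answers

questions, answers = create_xor_instances()

# 2000 rounds times 4 input pairs gives 8000 instances.
print(len(questions))
print(list(zip(questions[:4], answers[:4])))
```

With 8000 instances and a report every 100 instances, the training loop prints 80 running averages.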

We now feed each question / answer pair to the network, and try to minimize the loss.



In [11]:

    
total_loss = 0
seen_instances = 0
for question, answer in zip(questions, answers):
    x.set(question)
    y.set(answer)
    seen_instances += 1
    total_loss += loss.value()
    loss.backward()
    trainer.update()
    if (seen_instances > 1 and seen_instances % 100 == 0):
        print("average loss is:", total_loss / seen_instances)









    



average loss is: 0.730996069312
average loss is: 0.686455376148
average loss is: 0.614968097508
average loss is: 0.529396591447
average loss is: 0.454356552631
average loss is: 0.39492503399
average loss is: 0.348310606687
average loss is: 0.311234809482
average loss is: 0.281200638587
average loss is: 0.256437818106
average loss is: 0.235696636033
average loss is: 0.218082525641
average loss is: 0.202943060785
average loss is: 0.189793206944
average loss is: 0.178265773896
average loss is: 0.168078109015
average loss is: 0.15900931143
average loss is: 0.150884356805
average loss is: 0.143562835396
average loss is: 0.136930837112
average loss is: 0.130894997159
average loss is: 0.125378077089
average loss is: 0.120315633187
average loss is: 0.115653475622
average loss is: 0.111345707807
average loss is: 0.107353201057
average loss is: 0.103642390902
average loss is: 0.100184321725
average loss is: 0.0969538828368
average loss is: 0.0939291894056
average loss is: 0.0910910811149
average loss is: 0.0884227104994
average loss is: 0.0859092032744
average loss is: 0.0835373785728
average loss is: 0.0812955136038
average loss is: 0.0791731475857
average loss is: 0.0771609158713
average loss is: 0.0752504101568
average loss is: 0.0734340592178
average loss is: 0.0717050271845
average loss is: 0.0700571256665
average loss is: 0.0684847396141
average loss is: 0.0669827620572
average loss is: 0.0655465372522
average loss is: 0.0641718128339
average loss is: 0.0628546962203
average loss is: 0.0615916178524
average loss is: 0.0603792975615
average loss is: 0.0592147165184
average loss is: 0.0580950913344
average loss is: 0.0570178513814
average loss is: 0.0559806190546
average loss is: 0.0549811920022
average loss is: 0.0540175269391
average loss is: 0.0530877257938
average loss is: 0.0521900229302
average loss is: 0.0513227736969
average loss is: 0.0504844442235
average loss is: 0.0496736022536
average loss is: 0.0488889090025
average loss is: 0.0481291114653
average loss is: 0.0473930355647
average loss is: 0.0466795804093
average loss is: 0.0459877123818
average loss is: 0.0453164599289
average loss is: 0.0446649091876
average loss is: 0.0440321997496
average loss is: 0.0434175205679
average loss is: 0.0428201068594
average loss is: 0.042239236579
average loss is: 0.041674227424
average loss is: 0.0411244342562
average loss is: 0.0405892467939
average loss is: 0.0400680867989
average loss is: 0.0395604063634
average loss is: 0.0390656857708
average loss is: 0.0385834318376
average loss is: 0.0381131761705
average loss is: 0.037654473684
average loss is: 0.0372069010154

Our network is now trained. Let's verify that it indeed learned the xor function:



In [12]:

    
x.set([0,1])
print("0,1", output.value())

x.set([1,0])
print("1,0", output.value())

x.set([0,0])
print("0,0", output.value())

x.set([1,1])
print("1,1", output.value())









    



0,1 0.998090803623
1,0 0.998076915741
0,0 0.00135990511626
1,1 0.00213058013469

In case we are curious about the parameter values, we can query them:



In [13]:

    
W.value()









    Out[13]:





array([[ 1.26847982,  1.25287616],
       [ 0.91610891,  0.80253637],
       [ 3.18741179, -2.58643913],
       [-0.82472938, -0.68830448],
       [-2.74162889,  3.30151606],
       [ 0.2677069 ,  0.46926948],
       [-2.60197234, -2.61786079],
       [ 0.89582258, -0.44721049]])



In [14]:

    
V.value()









    Out[14]:





array([[-2.33788562, -1.54022419, -4.58266163, -0.91096258, -4.88002253,
        -0.70912606, -4.09791088, -0.61150461]])



In [15]:

    
b.value()









    Out[15]:





[-1.9798537492752075,
 -1.3854612112045288,
 1.2350027561187744,
 -0.8094932436943054,
 1.3227168321609497,
 -0.5688062906265259,
 0.9074684381484985,
 0.21831640601158142]
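As a sanity check, the printed parameter values can be plugged back into the formula $\sigma(V(\tanh(Wx+b)))$ by hand. The sketch below uses only the Python standard library and the numbers copied from the outputs above; the helper name predict is hypothetical. Because the printed values are rounded, the results should agree with the network outputs shown earlier only approximately.

```python
import math

# Parameter values copied from the printed outputs above.
W = [[ 1.26847982,  1.25287616],
     [ 0.91610891,  0.80253637],
     [ 3.18741179, -2.58643913],
     [-0.82472938, -0.68830448],
     [-2.74162889,  3.30151606],
     [ 0.2677069 ,  0.46926948],
     [-2.60197234, -2.61786079],
     [ 0.89582258, -0.44721049]]
V = [-2.33788562, -1.54022419, -4.58266163, -0.91096258,
     -4.88002253, -0.70912606, -4.09791088, -0.61150461]
b = [-1.9798537492752075, -1.3854612112045288, 1.2350027561187744,
     -0.8094932436943054, 1.3227168321609497, -0.5688062906265259,
     0.9074684381484985, 0.21831640601158142]

def predict(x1, x2):
    """Compute sigmoid(V . tanh(W x + b)) for the input (x1, x2)."""
    hidden = [math.tanh(w1 * x1 + w2 * x2 + b_i) for (w1, w2), b_i in zip(W, b)]
    z = sum(v_i * h_i for v_i, h_i in zip(V, hidden))
    return 1.0 / (1.0 + math.exp(-z))

for x1, x2 in [(0, 1), (1, 0), (0, 0), (1, 1)]:
    print(x1, x2, predict(x1, x2))
```

Rounding each prediction to the nearest integer gives the xor of the two inputs.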

To summarize

Here is a complete program:



In [16]:

    
import dynet as dy

# define the parameters
m = dy.ParameterCollection()
pW = m.add_parameters((8,2))
pV = m.add_parameters((1,8))
pb = m.add_parameters((8))

# renew the computation graph
dy.renew_cg()

# add the parameters to the graph
W = dy.parameter(pW)
V = dy.parameter(pV)
b = dy.parameter(pb)

# create the network
x = dy.vecInput(2) # an input vector of size 2.
output = dy.logistic(V*(dy.tanh((W*x)+b)))
# define the loss with respect to an output y.
y = dy.scalarInput(0) # this will hold the correct answer
loss = dy.binary_log_loss(output, y)

# create training instances
def create_xor_instances(num_rounds=2000):
    questions = []
    answers = []
    for round in range(num_rounds):
        for x1 in 0,1:
            for x2 in 0,1:
                answer = 0 if x1==x2 else 1
                questions.append((x1,x2))
                answers.append(answer)
    return questions, answers 

questions, answers = create_xor_instances()

# train the network
trainer = dy.SimpleSGDTrainer(m)

total_loss = 0
seen_instances = 0
for question, answer in zip(questions, answers):
    x.set(question)
    y.set(answer)
    seen_instances += 1
    total_loss += loss.value()
    loss.backward()
    trainer.update()
    if (seen_instances > 1 and seen_instances % 100 == 0):
        print("average loss is:", total_loss / seen_instances)









    



average loss is: 0.725458401442
average loss is: 0.656036808193
average loss is: 0.563800293456
average loss is: 0.473188629244
average loss is: 0.401578919515
average loss is: 0.347210133697
average loss is: 0.30537398648
average loss is: 0.27243115149
average loss is: 0.245902155418
average loss is: 0.22411154042
average loss is: 0.205906257995
average loss is: 0.190473453378
average loss is: 0.177226172269
average loss is: 0.165731058566
average loss is: 0.155661680364
average loss is: 0.146767699362
average loss is: 0.138854031509
average loss is: 0.131766459678
average loss is: 0.125381493949
average loss is: 0.119599098227
average loss is: 0.114337381247
average loss is: 0.109528665657
average loss is: 0.105116533384
average loss is: 0.101053577985
average loss is: 0.0972996741069
average loss is: 0.093820632044
average loss is: 0.0905871372991
average loss is: 0.0875739114509
average loss is: 0.0847590394488
average loss is: 0.0821234288742
average loss is: 0.079650368163
average loss is: 0.0773251660003
average loss is: 0.0751348558335
average loss is: 0.0730679483965
average loss is: 0.0711142273374
average loss is: 0.0692645774255
average loss is: 0.0675108397355
average loss is: 0.0658456894337
average loss is: 0.0642625315812
average loss is: 0.0627554119665
average loss is: 0.0613189413034
average loss is: 0.059948229676
average loss is: 0.0586388300699
average loss is: 0.05738668844
average loss is: 0.0561881021362
average loss is: 0.0550396820511
average loss is: 0.0539383201534
average loss is: 0.0528811609025
average loss is: 0.0518655761557
average loss is: 0.0508891425877
average loss is: 0.0499496224367
average loss is: 0.0490449456893
average loss is: 0.0481731953563
average loss is: 0.0473325925335
average loss is: 0.0465214848134
average loss is: 0.0457383351514
average loss is: 0.0449817118815
average loss is: 0.0442502796927
average loss is: 0.0435427918518
average loss is: 0.0428580828441
average loss is: 0.0421950617608
average loss is: 0.0415527067172
average loss is: 0.0409300591527
average loss is: 0.0403262192239
average loss is: 0.0397403411381
average loss is: 0.0391716292271
average loss is: 0.0386193343495
average loss is: 0.0380827505725
average loss is: 0.0375612118193
average loss is: 0.0370540894219
average loss is: 0.0365607894682
average loss is: 0.0360807502221
average loss is: 0.0356134402267
average loss is: 0.0351583559568
average loss is: 0.0347150203697
average loss is: 0.0342829808685
average loss is: 0.0338618080745
average loss is: 0.0334510939502
average loss is: 0.0330504509121
average loss is: 0.0326595103741

Dynamic Networks

Dynamic networks are very similar to static ones, but instead of creating the network once and then calling "set" for each training example to change the inputs, we just create a new network for each training example.

We present an example below. While the value of this may not be clear in the xor example, the dynamic approach is very convenient for networks for which the structure is not fixed, such as recurrent or recursive networks.



In [17]:

    
import dynet as dy
# create training instances, as before
def create_xor_instances(num_rounds=2000):
    questions = []
    answers = []
    for round in range(num_rounds):
        for x1 in 0,1:
            for x2 in 0,1:
                answer = 0 if x1==x2 else 1
                questions.append((x1,x2))
                answers.append(answer)
    return questions, answers 

questions, answers = create_xor_instances()

# create a network for the xor problem given input and output
def create_xor_network(pW, pV, pb, inputs, expected_answer):
    dy.renew_cg() # new computation graph
    W = dy.parameter(pW) # add parameters to graph as expressions
    V = dy.parameter(pV)
    b = dy.parameter(pb)
    x = dy.vecInput(len(inputs))
    x.set(inputs)
    y = dy.scalarInput(expected_answer)
    output = dy.logistic(V*(dy.tanh((W*x)+b)))
    loss =  dy.binary_log_loss(output, y)
    return loss

m2 = dy.ParameterCollection()
pW = m2.add_parameters((8,2))
pV = m2.add_parameters((1,8))
pb = m2.add_parameters((8))
trainer = dy.SimpleSGDTrainer(m2)

seen_instances = 0
total_loss = 0
for question, answer in zip(questions, answers):
    loss = create_xor_network(pW, pV, pb, question, answer)
    seen_instances += 1
    total_loss += loss.value()
    loss.backward()
    trainer.update()
    if (seen_instances > 1 and seen_instances % 100 == 0):
        print("average loss is:", total_loss / seen_instances)









    



average loss is: 0.736730417013
average loss is: 0.725369692743
average loss is: 0.715208243926
average loss is: 0.698906037733
average loss is: 0.667973376453
average loss is: 0.620016210104
average loss is: 0.564173455558
average loss is: 0.511108190748
average loss is: 0.464656613212
average loss is: 0.424903827408
average loss is: 0.390944672838
average loss is: 0.361782596097
average loss is: 0.336552875967
average loss is: 0.314552738269
average loss is: 0.295221981726
average loss is: 0.27811523865
average loss is: 0.262876965393
average loss is: 0.249221329002
average loss is: 0.236916671552
average loss is: 0.225773662324
average loss is: 0.215636288271
average loss is: 0.206374970573
average loss is: 0.197881278039
average loss is: 0.190063834667
average loss is: 0.182845127269
average loss is: 0.176158992879
average loss is: 0.16994863152
average loss is: 0.164165015582
average loss is: 0.158765610311
average loss is: 0.153713339384
average loss is: 0.148975738776
average loss is: 0.14452426397
average loss is: 0.140333718062
average loss is: 0.13638177571
average loss is: 0.132648585576
average loss is: 0.129116437846
average loss is: 0.125769484215
average loss is: 0.122593499324
average loss is: 0.119575678358
average loss is: 0.116704463887
average loss is: 0.113969398874
average loss is: 0.111360997359
average loss is: 0.108870635643
average loss is: 0.106490455879
average loss is: 0.104213282756
average loss is: 0.102032551605
average loss is: 0.0999422444205
average loss is: 0.0979368338955
average loss is: 0.0960112348951
average loss is: 0.094160760665
average loss is: 0.0923810851444
average loss is: 0.0906682085468
average loss is: 0.0890184267577
average loss is: 0.0874283051604
average loss is: 0.0858946543594
average loss is: 0.0844145084265
average loss is: 0.0829851059784
average loss is: 0.0816038727351
average loss is: 0.0802684055211
average loss is: 0.0789764590814
average loss is: 0.0777259325812
average loss is: 0.0765148587798
average loss is: 0.0753413928689
average loss is: 0.0742038039022
average loss is: 0.073100465403
average loss is: 0.072029847966
average loss is: 0.0709905121502
average loss is: 0.0699811016467
average loss is: 0.0690003377412
average loss is: 0.0680470136383
average loss is: 0.0671199895066
average loss is: 0.0662181878878
average loss is: 0.0653405894968
average loss is: 0.0644862291951
average loss is: 0.0636541927901
average loss is: 0.0628436133573
average loss is: 0.062053668331
average loss is: 0.0612835769022
average loss is: 0.0605325971122
average loss is: 0.0598000235481
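To illustrate why building a fresh network per example helps when the structure is not fixed, consider a toy recurrent computation whose number of steps depends on the length of the input. The plain-Python sketch below is not DyNet code; the function run_sequence and the parameter values are hypothetical. The same two parameters are shared across all examples, but each input yields a computation of a different size.

```python
import math

# Shared parameters (hypothetical values), reused for every example.
w_in, w_rec = 0.5, -0.3

def run_sequence(inputs):
    """Apply one tanh step per input item; return the final state and step count."""
    state, steps = 0.0, 0
    for value in inputs:
        state = math.tanh(w_in * value + w_rec * state)
        steps += 1
    return state, steps

# Three examples of different lengths give three differently sized computations.
for example in ([1.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0, 1.0]):
    final_state, steps = run_sequence(example)
    print(len(example), steps, final_state)
```

In a static setting the number of steps would have to be fixed in advance, whereas in the dynamic approach a function like create_xor_network can simply build as many steps as the current example requires.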