Working with the Python DyNet package

The DyNet package is intended for training and using neural networks, and is particularly suited for applications with dynamically changing network structures. It is a Python wrapper for the DyNet C++ package.

In neural network packages there are generally two modes of operation:

  • Static networks, in which a network is built once and then fed different inputs and outputs. Most NN packages work this way.
  • Dynamic networks, in which a new network is built for each training example (sharing parameters with the networks of other training examples). This approach is what makes DyNet unique, and where most of its power comes from.

We will describe both of these modes.

Package Fundamentals

The main piece of DyNet is the ComputationGraph, which essentially defines a neural network. The ComputationGraph is composed of Expressions, which relate to the inputs and outputs of the network as well as to its Parameters. The Parameters are the values in the network that are optimized during training, and they all live inside a ParameterCollection. Trainers (for example SimpleSGDTrainer) are in charge of updating the parameter values.

We will not be using the ComputationGraph directly, but it is there in the background, as a singleton object. When dynet is imported, a new ComputationGraph is created. We can then reset the computation graph to a new state by calling renew_cg().
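As a minimal illustration of this behavior (assuming dynet imports as dy, as below), expressions created before renew_cg() belong to the old graph and should be rebuilt afterwards:

    import dynet as dy      # importing dynet also creates the initial ComputationGraph
    e = dy.scalarInput(3)   # this expression is registered in the current graph
    dy.renew_cg()           # start a fresh graph; e now refers to the old, discarded graph
    e = dy.scalarInput(3)   # rebuild the expressions you need against the new graph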

Static Networks

The life-cycle of a DyNet program is:

  1. Create a ParameterCollection, and populate it with Parameters.
  2. Renew the computation graph, and create the Expressions representing the network (these include Expressions for the Parameters defined in the parameter collection).
  3. Optimize the model for the objective of the network.
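
To make this life-cycle concrete, here is a minimal skeleton (shapes chosen arbitrarily; the xor example below fills in these steps with a real objective and training loop):

    m = dy.ParameterCollection()        # 1. parameters live in a ParameterCollection
    W = m.add_parameters((4, 2))

    dy.renew_cg()                       # 2. fresh graph, then build expressions
    x = dy.vecInput(2)
    h = dy.tanh(W * x)                  # parameters act as expressions in the graph

    trainer = dy.SimpleSGDTrainer(m)    # 3. a trainer will optimize the parameters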

As an example, consider a model for solving the "xor" problem. The network has two inputs, which can be 0 or 1, and a single output which should be the xor of the two inputs. We will model this as a multi-layer perceptron with a single hidden layer.

Let $x = x_1, x_2$ be our input. We will have a hidden layer of 8 nodes, and an output layer of a single node. The activation on the hidden layer will be a $\tanh$. Our network will then be:

$\sigma(V(\tanh(Wx+b)))$

where $W$ is an $8 \times 2$ matrix, $V$ is a $1 \times 8$ matrix, and $b$ is an 8-dimensional vector.

We want the output to be either 0 or 1, so we take the output activation to be the logistic-sigmoid function, $\sigma(x)$, which maps any value in $(-\infty, +\infty)$ to a number in $(0,1)$.
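
As a quick sanity check on the dimensions, the following sketch traces the same computation in plain NumPy (this is not DyNet code, and assumes NumPy is installed):

    import numpy as np
    W = np.random.randn(8, 2)              # hidden-layer weights
    b = np.random.randn(8)                 # hidden-layer bias
    V = np.random.randn(1, 8)              # output-layer weights
    x = np.array([1.0, 0.0])               # one input pair
    h = np.tanh(W @ x + b)                 # hidden activations, shape (8,)
    y = 1.0 / (1.0 + np.exp(-(V @ h)))     # logistic sigmoid of the output, shape (1,)
    print(y.shape)                         # (1,)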

We will begin by defining the model and the computation graph.


In [1]:
# we assume that the dynet module is in your path.
import dynet as dy

In [2]:
# create a parameter collection and add the parameters.
m = dy.ParameterCollection()
W = m.add_parameters((8,2))
V = m.add_parameters((1,8))
b = m.add_parameters((8))

dy.renew_cg() # new computation graph. not strictly needed here, but good practice.


Out[2]:
<_dynet.ComputationGraph at 0x7f79486546c0>

In [3]:
#b[1:-1].value()
b.value()


Out[3]:
[0.31104135513305664,
 -0.36465519666671753,
 -0.43395277857780457,
 0.5421143770217896,
 -0.3137839138507843,
 0.16922643780708313,
 0.3162959814071655,
 -0.08413488417863846]

The first block creates a parameter collection and populates it with parameters. The second block creates a computation graph.

The model parameters can be used as expressions in the computation graph. We now make use of V, W, and b in order to create the complete expression for the network.


In [4]:
x = dy.vecInput(2) # an input vector of size 2. Also an expression.
output = dy.logistic(V*(dy.tanh((W*x)+b)))

In [5]:
# we can now query our network
x.set([0,0])
output.value()


Out[5]:
0.41870421171188354

In [6]:
# we want to be able to define a loss, so we need an input expression to work against.
y = dy.scalarInput(0) # this will hold the correct answer
loss = dy.binary_log_loss(output, y)
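
For reference, binary_log_loss is the standard binary cross-entropy: for a prediction $\hat{y}$ and a gold answer $y \in \{0,1\}$ it computes $-\big(y \log \hat{y} + (1-y)\log(1-\hat{y})\big)$, which is minimized when $\hat{y}$ matches $y$.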

In [7]:
x.set([1,0])
y.set(0)
print(loss.value())

y.set(1)
print(loss.value())


0.5666957497596741
0.837935209274292

Training

We now want to set the parameter weights such that the loss is minimized.

For this, we will use a trainer object. A trainer is constructed with respect to the parameters of a given model.


In [8]:
trainer = dy.SimpleSGDTrainer(m)

To use the trainer, we need to:

  • call the forward_scalar method of ComputationGraph. This runs a forward pass through the network, calculating all the intermediate values up to the last one (the loss, in our case), and converts the final value to a scalar. The final output of our network must be a single scalar value. However, if we do not care about the value, we can use cg.forward() instead of cg.forward_scalar().
  • call the backward method of ComputationGraph. This runs a backward pass from the last node, computing the gradients of the last expression (in our case, the loss we want to minimize) with respect to the parameters. The gradients are stored in the parameter collection, and we can then let the trainer take care of the optimization step.
  • call trainer.update() to update the parameter values using the latest gradients (the update rule is sketched after this list).
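
For intuition, SimpleSGDTrainer implements plain stochastic gradient descent: conceptually, each call to trainer.update() moves every parameter $\theta$ a small step against its gradient, $\theta \leftarrow \theta - \eta \,\frac{\partial \mathrm{loss}}{\partial \theta}$, where $\eta$ is the learning rate.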

In [9]:
x.set([1,0])
y.set(1)
loss_value = loss.value() # this performs a forward through the network.
print("the loss before step is:",loss_value)

# now do an optimization step
loss.backward()  # compute the gradients
trainer.update()

# see how it affected the loss:
loss_value = loss.value(recalculate=True) # recalculate=True means "don't use precomputed value"
print("the loss after step is:",loss_value)


the loss before step is: 0.837935209274292
the loss after step is: 0.6856433749198914

The optimization step indeed made the loss decrease. We now need to run this in a loop. To this end, we will create a training set, and iterate over it.

For the xor problem, the training instances are easy to create.


In [10]:
def create_xor_instances(num_rounds=2000):
    questions = []
    answers = []
    for round in range(num_rounds):
        for x1 in 0,1:
            for x2 in 0,1:
                answer = 0 if x1==x2 else 1
                questions.append((x1,x2))
                answers.append(answer)
    return questions, answers 

questions, answers = create_xor_instances()

We now feed each question / answer pair to the network, and try to minimize the loss.


In [11]:
total_loss = 0
seen_instances = 0
for question, answer in zip(questions, answers):
    x.set(question)
    y.set(answer)
    seen_instances += 1
    total_loss += loss.value()
    loss.backward()
    trainer.update()
    if (seen_instances > 1 and seen_instances % 100 == 0):
        print("average loss is:",total_loss / seen_instances)


average loss is: 0.7301900261640548
average loss is: 0.7117043241858483
average loss is: 0.6865043716629347
average loss is: 0.6486480497568846
average loss is: 0.5961489198803902
average loss is: 0.5363389208291968
average loss is: 0.4800921744853258
average loss is: 0.4316547167254612
average loss is: 0.39094374931520887
average loss is: 0.35675006832927464
average loss is: 0.3278251909870993
average loss is: 0.30312755185179413
average loss is: 0.2818366446174108
average loss is: 0.2633153002815587
average loss is: 0.2470681765327851
average loss is: 0.23270742912660353
average loss is: 0.2199265043853837
average loss is: 0.20848063852793228
average loss is: 0.19817242979072033
average loss is: 0.18884113303828054
average loss is: 0.18035465603760842
average loss is: 0.17260352481601082
average loss is: 0.1654962877224645
average loss is: 0.1589559833255286
average loss is: 0.15291739877592772
average loss is: 0.14732492341812198
average loss is: 0.14213085135462245
average loss is: 0.13729403161759754
average loss is: 0.13277878321490474
average loss is: 0.12855401825974697
average loss is: 0.12459252774257273
average loss is: 0.12087039561367419
average loss is: 0.11736651676206031
average loss is: 0.11406219601699644
average loss is: 0.11094081379626212
average loss is: 0.10798754755103598
average loss is: 0.10518913371828259
average loss is: 0.10253366940838628
average loss is: 0.10001044190178315
average loss is: 0.09760978312254884
average loss is: 0.09532294639377718
average loss is: 0.09314199849195401
average loss is: 0.0910597273951862
average loss is: 0.08906956217108845
average loss is: 0.08716550341245925
average loss is: 0.08534206202271415
average loss is: 0.08359420659250896
average loss is: 0.08191731647852672
average loss is: 0.08030714023444915
average loss is: 0.07875976019569207
average loss is: 0.0772715599823734
average loss is: 0.07583919591308445
average loss is: 0.07445957204553229
average loss is: 0.07312981744738796
average loss is: 0.07184726620641198
average loss is: 0.07060943942273817
average loss is: 0.06941402890125142
average loss is: 0.06825888305458899
average loss is: 0.06714199365698732
average loss is: 0.06606148393452167
average loss is: 0.06501559805323477
average loss is: 0.0640026917649741
average loss is: 0.06302122345515748
average loss is: 0.06206974616015941
average loss is: 0.06114690068011316
average loss is: 0.060251408738302856
average loss is: 0.05938206722341311
average loss is: 0.05853774277446278
average loss is: 0.057717366610177914
average loss is: 0.05691992998316917
average loss is: 0.056144480362147565
average loss is: 0.05539011717681812
average loss is: 0.05465598844808259
average loss is: 0.053941287759839356
average loss is: 0.053245250818133354
average loss is: 0.052567153106946006
average loss is: 0.051906307245069484
average loss is: 0.05126206048185943
average loss is: 0.05063379273556108
average loss is: 0.05002091444953112

Our network is now trained. Let's verify that it indeed learned the xor function:


In [12]:
x.set([0,1])
print("0,1",output.value())

x.set([1,0])
print("1,0",output.value())

x.set([0,0])
print("0,0",output.value())

x.set([1,1])
print("1,1",output.value())


0,1 0.998213529586792
1,0 0.9983397722244263
0,0 0.0007906468817964196
1,1 0.0021107089705765247

In case we are curious about the parameter values, we can query them:


In [13]:
W.value()


Out[13]:
array([[ 2.85107112,  2.83952975],
       [-3.29001093,  2.51486993],
       [-1.92002058, -1.90759397],
       [-1.4002918 , -1.43046546],
       [ 0.10682328, -0.89503163],
       [-1.78532696,  2.70406151],
       [ 1.20831835, -0.47131985],
       [ 0.92750639, -2.0729847 ]])

In [14]:
V.value()


Out[14]:
array([[ 4.12795115,  4.78487778, -2.21212292,  2.85010242,  1.01012611,
        -3.31246257, -1.09919119,  2.19970202]])

In [15]:
b.value()


Out[15]:
[-0.937517523765564,
 -1.1073024272918701,
 0.33602145314216614,
 2.16909122467041,
 0.17579713463783264,
 0.7122746706008911,
 0.0978747308254242,
 -0.16976511478424072]

To summarize

Here is a complete program:


In [16]:
# define the parameters
m = dy.ParameterCollection()
W = m.add_parameters((8,2))
V = m.add_parameters((1,8))
b = m.add_parameters((8))

# renew the computation graph
dy.renew_cg()

# create the network
x = dy.vecInput(2) # an input vector of size 2.
output = dy.logistic(V*(dy.tanh((W*x)+b)))
# define the loss with respect to an output y.
y = dy.scalarInput(0) # this will hold the correct answer
loss = dy.binary_log_loss(output, y)

# create training instances
def create_xor_instances(num_rounds=2000):
    questions = []
    answers = []
    for round in range(num_rounds):
        for x1 in 0,1:
            for x2 in 0,1:
                answer = 0 if x1==x2 else 1
                questions.append((x1,x2))
                answers.append(answer)
    return questions, answers 

questions, answers = create_xor_instances()

# train the network
trainer = dy.SimpleSGDTrainer(m)

total_loss = 0
seen_instances = 0
for question, answer in zip(questions, answers):
    x.set(question)
    y.set(answer)
    seen_instances += 1
    total_loss += loss.value()
    loss.backward()
    trainer.update()
    if (seen_instances > 1 and seen_instances % 100 == 0):
        print("average loss is:",total_loss / seen_instances)


average loss is: 0.7205019667744637
average loss is: 0.6994020892679691
average loss is: 0.6667007360855738
average loss is: 0.6162097529321909
average loss is: 0.5529091520011425
average loss is: 0.4909045885503292
average loss is: 0.43754800063158783
average loss is: 0.39328361499123277
average loss is: 0.3566404625069764
average loss is: 0.3260498847439885
average loss is: 0.3002264795401557
average loss is: 0.27818110284395514
average loss is: 0.2591624129871623
average loss is: 0.242597631290555
average loss is: 0.22804588572122156
average loss is: 0.21516385969764087
average loss is: 0.20368098794153947
average loss is: 0.1933815449432263
average loss is: 0.18409163802911185
average loss is: 0.17566965313232505
average loss is: 0.16799916441020157
average loss is: 0.16098361631537872
average loss is: 0.1545422787496658
average loss is: 0.14860714952888276
average loss is: 0.1431205491213128
average loss is: 0.13803323951316998
average loss is: 0.13330293813159827
average loss is: 0.12889313311440803
average loss is: 0.1247721328136736
average loss is: 0.12091229626559652
average loss is: 0.11728940626944326
average loss is: 0.11388215588252933
average loss is: 0.11067172380397096
average loss is: 0.10764142127125524
average loss is: 0.10477639951383962
average loss is: 0.10206340225982584
average loss is: 0.09949055765566693
average loss is: 0.09704720211006995
average loss is: 0.09472372988238931
average loss is: 0.09251146528020035
average loss is: 0.09040255263388135
average loss is: 0.08838986149038343
average loss is: 0.08646690461080694
average loss is: 0.08462776689538838
average loss is: 0.08286704372017023
average loss is: 0.08117978683943637
average loss is: 0.07956145741833136
average loss is: 0.07800788381944584
average loss is: 0.07651522612707613
average loss is: 0.07507994277025573
average loss is: 0.07369876249170607
average loss is: 0.07236865903304603
average loss is: 0.07108682857070751
average loss is: 0.06985066948984577
average loss is: 0.06865776468049312
average loss is: 0.06750586506561376
average loss is: 0.06639287554648737
average loss is: 0.06531684178260862
average loss is: 0.06427593874778614
average loss is: 0.06326845997859103
average loss is: 0.062292808323533684
average loss is: 0.061347487150436086
average loss is: 0.06043109254734147
average loss is: 0.059542306310031566
average loss is: 0.05867988926305686
average loss is: 0.057842675803783064
average loss is: 0.057029568297517444
average loss is: 0.056239532018431314
average loss is: 0.05547159101134942
average loss is: 0.05472482365616763
average loss is: 0.05399835920601617
average loss is: 0.05329137419280111
average loss is: 0.052603089143243006
average loss is: 0.05193276596526974
average loss is: 0.051279704820535454
average loss is: 0.05064324217525274
average loss is: 0.050022748128961556
average loss is: 0.04941762432470941
average loss is: 0.048827302071201034
average loss is: 0.048251240481033165

Dynamic Networks

Dynamic networks are very similar to static ones, but instead of creating the network once and then calling "set" on each training example to change the inputs, we simply create a new network for each training example.

We present an example below. While the value of this may not be clear in the xor example, the dynamic approach is very convenient for networks whose structure is not fixed, such as recurrent or recursive networks; see the sketch below.
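
As a small sketch of that point (separate from the xor code that follows, and with no trainable parameters, just to show the idea): the graph built below has one node per input element, so its structure genuinely depends on each example.

    def sequence_score(values):
        # one tanh node per element: the graph's shape depends on len(values)
        dy.renew_cg()
        terms = [dy.tanh(dy.scalarInput(v)) for v in values]
        return dy.esum(terms)   # sum over however many terms there are

    # e.g. sequence_score([1.0, 2.0]) and sequence_score([1.0, 2.0, 3.0, 4.0])
    # build graphs of different sizes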


In [17]:
import dynet as dy
# create training instances, as before
def create_xor_instances(num_rounds=2000):
    questions = []
    answers = []
    for round in range(num_rounds):
        for x1 in 0,1:
            for x2 in 0,1:
                answer = 0 if x1==x2 else 1
                questions.append((x1,x2))
                answers.append(answer)
    return questions, answers 

questions, answers = create_xor_instances()

# create a network for the xor problem given input and output
def create_xor_network(W, V, b, inputs, expected_answer):
    dy.renew_cg() # new computation graph
    x = dy.vecInput(len(inputs))
    x.set(inputs)
    y = dy.scalarInput(expected_answer)
    output = dy.logistic(V*(dy.tanh((W*x)+b)))
    loss =  dy.binary_log_loss(output, y)
    return loss

m2 = dy.ParameterCollection()
W = m2.add_parameters((8,2))
V = m2.add_parameters((1,8))
b = m2.add_parameters((8))
trainer = dy.SimpleSGDTrainer(m2)

seen_instances = 0
total_loss = 0
for question, answer in zip(questions, answers):
    loss = create_xor_network(W, V, b, question, answer)
    seen_instances += 1
    total_loss += loss.value()
    loss.backward()
    trainer.update()
    if (seen_instances > 1 and seen_instances % 100 == 0):
        print("average loss is:",total_loss / seen_instances)


average loss is: 0.7242387273907661
average loss is: 0.6815587529540061
average loss is: 0.6179609213272731
average loss is: 0.5381991682946682
average loss is: 0.46485630394518374
average loss is: 0.4053867908070485
average loss is: 0.3582047475874424
average loss is: 0.32044148206710815
average loss is: 0.2897336838249531
average loss is: 0.2643519359687343
average loss is: 0.24305510523898358
average loss is: 0.22494598247110845
average loss is: 0.20936599294225183
average loss is: 0.195823337073837
average loss is: 0.18394447944127024
average loss is: 0.17344117014989024
average loss is: 0.16408769504578016
average loss is: 0.15570493978655173
average loss is: 0.14814903329935317
average loss is: 0.1413031330066733
average loss is: 0.1350713880714916
average loss is: 0.12937444295332004
average loss is: 0.12414604435290169
average loss is: 0.11933044975992137
average loss is: 0.11488042589090765
average loss is: 0.11075568636440529
average loss is: 0.10692166014971143
average loss is: 0.10334851395771173
average loss is: 0.10001036719587664
average loss is: 0.09688465872919187
average loss is: 0.0939516312917394
average loss is: 0.09119390803352871
average loss is: 0.08859614507605632
average loss is: 0.08614474194569459
average loss is: 0.08382760104763189
average loss is: 0.0816339253942715
average loss is: 0.07955404841146003
average loss is: 0.07757928909470425
average loss is: 0.07570182968754895
average loss is: 0.07391461094305851
average loss is: 0.07221124223728732
average loss is: 0.070585923835613
average loss is: 0.06903338058765025
average loss is: 0.06754880329556975
average loss is: 0.06612779856276595
average loss is: 0.06476634448308133
average loss is: 0.06346075249892819
average loss is: 0.062207633629247236
average loss is: 0.06100386795333625
average loss is: 0.059846579102938995
average loss is: 0.058733110335135064
average loss is: 0.05766100448007749
average loss is: 0.05662798507044636
average loss is: 0.05563194011197926
average loss is: 0.054670907723167066
average loss is: 0.05374306264720092
average loss is: 0.05284670477963804
average loss is: 0.051980248590979466
average loss is: 0.05114221337111272
average loss is: 0.050331215119435606
average loss is: 0.04954595835507298
average loss is: 0.04878522939514369
average loss is: 0.04804788986208021
average loss is: 0.04733287107374053
average loss is: 0.04663916844668655
average loss is: 0.04596583708531618
average loss is: 0.045311987426708826
average loss is: 0.04467678093440447
average loss is: 0.04405942677290759
average loss is: 0.04345917822181114
average loss is: 0.04287532994617105
average loss is: 0.04230721487954724
average loss is: 0.04175420160314438
average loss is: 0.041215692378306835
average loss is: 0.04069112060347883
average loss is: 0.040179948867359934
average loss is: 0.03968166718186883
average loss is: 0.0391957911709622
average loss is: 0.03872186044427048
average loss is: 0.03825943737498892