Dependency Parsing


In [ ]:
%load_ext autoreload
%autoreload 2

Eisner's algorithm step-by-step

Inputs

  • arc scores $s_\theta(h,m)$, for $h\in \{0,...,N\}$ and $m\in \{1,...,N\}$, $h\neq m$
  • sentence She enjoys the Summer School.

Notice that

  • the length of the sequence is $N = 5$
  • the terminal symbols that comprise the sentence are $w_1=$ She, $w_2=$ enjoys, $w_3=$ the, $w_4=$ Summer, $w_5=$ School
  • the root symbol $w_0=\ast$ is defined for convenience; the whole sentence can be thought of as being $\ast$ She enjoys the Summer School.

Variables to fill in:

  • $\mathrm{incomplete}$, shape $(N+1)\times (N+1) \times 2$: incomplete span scores
  • $\mathrm{complete}$, shape $(N+1)\times (N+1) \times 2$: complete span scores

Initialization

Initialization corresponds to setting the scores of all one-word 'spans' to zero. As we will see in the induction stage, these are the initial building blocks for computing longer span scores.

The figure below illustrates all the initialized span scores.
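A minimal numpy sketch of this initialization (the table names mirror the variables above; filling the not-yet-computed entries with -inf is merely a convention of this sketch):

import numpy as np

N = 5  # length of the example sentence, excluding the root symbol w_0

# third dimension: 0 = left-facing (<-), 1 = right-facing (->)
incomplete = np.full((N + 1, N + 1, 2), -np.inf)
complete = np.full((N + 1, N + 1, 2), -np.inf)

# every 1-word span is a complete span with score zero, in both directions
for s in range(N + 1):
    complete[s, s, :] = 0.0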

Induction

We now proceed to do some iterations on the Induction stage's double for loop:

Spans with length $k=1$

$k=1$ corresponds to spans over pairs of adjacent words. The inner loop variable $s$ ranges over the words and determines the leftmost word of the span. The other variable, $t:=s+k$, corresponds to the end of the span (the rightmost word).

Incomplete spans

Since $s\leq r<t$ and $t=s+k=s+1$, one concludes that $r=s$ for all spans with $k=1$. There is therefore nothing to maximize over: the highest score corresponds to the only possible value of $r$:

$$ \mathrm{incomplete}[s,t,\leftarrow]\overset{(r=s)}{=}\mathrm{complete}[s,s,\rightarrow]+\mathrm{complete}[t,t,\leftarrow]+s_\theta(t,s) $$

Notice that the complete spans on the right-hand side do not meet on top of a word. That is why these spans are called incomplete.

The right-facing incomplete spans are computed in exactly the same way, except that we use the score of the arc going right: $s_\theta(s,t)$ instead of $s_\theta(t,s)$.

$$ \mathrm{incomplete}[s,t,\rightarrow]\overset{(r=s)}{=}\mathrm{complete}[s,s,\rightarrow]+\mathrm{complete}[t,t,\leftarrow]+s_\theta(s,t) $$
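Continuing the sketch above, and assuming an arc-score matrix scores with scores[h, m] $= s_\theta(h,m)$, the $k=1$ updates are direct assignments with nothing to maximize over (the complete spans for $k=1$, not shown step-by-step above, follow the same recurrences used below and are filled in the same pass):

# k = 1: t = s + 1, so the only split point is r = s
k = 1
for s in range(N + 1 - k):
    t = s + k
    # incomplete spans: two 1-word complete spans plus the score of the new arc
    incomplete[s, t, 0] = complete[s, s, 1] + complete[t, t, 0] + scores[t, s]  # arc t -> s
    incomplete[s, t, 1] = complete[s, s, 1] + complete[t, t, 0] + scores[s, t]  # arc s -> t
    # complete spans over two words: only one possible split point as well
    complete[s, t, 0] = complete[s, s, 0] + incomplete[s, t, 0]
    complete[s, t, 1] = incomplete[s, t, 1] + complete[t, t, 1]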

Spans with length $k=2$

The next step is to compute scores for spans over three words. An immediate consequence is that $r$ can now take two different values, $s$ and $s+1$, so there is an actual need to maximize the score over the possible values of $r$.

Incomplete spans

The different values of $r$ correspond to using different sets of complete span scores.

$$ \mathrm{incomplete}[s,t,\leftarrow]=\underset{r}{\max}\left\{\begin{matrix}(r=s)& \mathrm{complete}[s,s,\rightarrow]+\mathrm{complete}[s+1,t,\leftarrow]+s_\theta(t,s)\\ (r=s+1)& \mathrm{complete}[s,s+1,\rightarrow]+\mathrm{complete}[t,t,\leftarrow]+s_\theta(t,s)\end{matrix}\right. $$

The procedure to compute right-facing incomplete spans is similar to the one above. All that changes is the arc score that is used.

$$ \mathrm{incomplete}[s,t,\rightarrow]=\underset{r}{\max}\left\{\begin{matrix}(r=s)& \mathrm{complete}[s,s,\rightarrow]+\mathrm{complete}[s+1,t,\leftarrow]+s_\theta(s,t)\\ (r=s+1)& \mathrm{complete}[s,s+1,\rightarrow]+\mathrm{complete}[t,t,\leftarrow]+s_\theta(s,t)\end{matrix}\right. $$
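In code, the $k=2$ incomplete-span updates become a max over the two candidate split points (continuing the same sketch; since the arc score does not depend on $r$, it can be added after the maximization):

# k = 2: t = s + 2, and r can be either s or s + 1
k = 2
for s in range(N + 1 - k):
    t = s + k
    halves = [complete[s, r, 1] + complete[r + 1, t, 0] for r in (s, s + 1)]
    incomplete[s, t, 0] = max(halves) + scores[t, s]  # left-facing: arc t -> s
    incomplete[s, t, 1] = max(halves) + scores[s, t]  # right-facing: arc s -> t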

Complete spans

We now proceed to compute complete span scores. Once again, the incomplete span scores required for this step were conveniently computed before.

$$ \mathrm{complete}[s,t,\leftarrow]=\underset{r}{\max}\left\{\begin{matrix}(r=s)& \mathrm{complete}[s,s,\leftarrow]+\mathrm{incomplete}[s,t,\leftarrow]\\ (r=s+1)& \mathrm{complete}[s,s+1,\leftarrow]+\mathrm{incomplete}[s+1,t,\leftarrow]\end{matrix}\right. $$

The last step in this demo is to compute right-facing complete span scores over three words:

$$ \mathrm{complete}[s,t,\rightarrow]=\underset{r}{\max}\left\{\begin{matrix}(r=s+1)& \mathrm{incomplete}[s,s+1,\rightarrow]+\mathrm{complete}[s+1,t,\rightarrow]\\ (r=t)& \mathrm{incomplete}[s,t,\rightarrow]+\mathrm{complete}[t,t,\rightarrow]\end{matrix}\right. $$
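The same pattern generalizes to spans of any length: for each pair $(s,t)$ one maximizes over all admissible split points $r$. A hedched sketch of the full induction loop, using the same table layout as before (only the recurrences are outlined here; the backpointers needed to recover the tree are left out):

for k in range(1, N + 1):
    for s in range(N + 1 - k):
        t = s + k
        # incomplete spans: the two complete halves meet between r and r + 1
        halves = [complete[s, r, 1] + complete[r + 1, t, 0] for r in range(s, t)]
        incomplete[s, t, 0] = max(halves) + scores[t, s]  # arc t -> s
        incomplete[s, t, 1] = max(halves) + scores[s, t]  # arc s -> t
        # complete spans: one smaller complete span plus one incomplete span
        complete[s, t, 0] = max(complete[s, r, 0] + incomplete[r, t, 0]
                                for r in range(s, t))          # s <= r < t
        complete[s, t, 1] = max(incomplete[s, r, 1] + complete[r, t, 1]
                                for r in range(s + 1, t + 1))  # s < r <= t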

These steps continue until the complete span covering positions $0$ through $N$ is computed, i.e. a span over $N+1$ words that covers the whole sentence, root symbol included. After that, we backtrack through the highest-scoring split points to build the parse tree.
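With the tables filled, the score of the best projective parse of the example sentence sits in the right-facing complete span that stretches from the root to the last word; recovering the tree itself additionally requires the argmax split points (backpointers) recorded during the induction:

# right-facing complete span over positions 0..N: score of the best projective tree
best_score = complete[0, N, 1]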

Implement Eisner's algorithm

Implement Eisner’s algorithm for projective dependency parsing. The pseudo-code is shown as Algorithm 13; implement it as the function:

def parse_proj(self, scores):

in the file dependency_decoder.py. The input is a matrix of arc scores of dimension $(N+1)\times(N+1)$, whose $(h, m)$ entry contains the score $s_\theta(h, m)$.
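For instance, for the example sentence the score matrix would have the following shape and indexing (illustrative values only; the variable names here are just for this sketch):

import numpy as np

N = 5                                   # number of words in the example sentence
scores = np.random.randn(N + 1, N + 1)  # scores[h, m] = s_theta(h, m)

# e.g. scores[2, 1] holds s_theta(2, 1), the score of the arc enjoys -> She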

In particular, the first row contains the scores for the arcs that depart from the root, and the first column’s values, along with the main diagonal, are to be ignored (since no arcs point to the root, and there are no self-pointing arcs). To make your job easier, we provide an implementation of the backtracking part:

def backtrack_eisner(self, incomplete_backtrack, complete_backtrack, s, t, direction, complete, heads):

so you just need to build complete/incomplete spans and their backtrack pointers and then call

heads = -np.ones(N + 1, dtype=int)
self.backtrack_eisner(incomplete_backtrack, complete_backtrack, 0, N, 1, 1, heads)
return heads

to obtain the final parse. To test the algorithm, retrain the parser on the English data (where the trees are actually all projective) by setting the flag dp.projective to True:


In [ ]:
import lxmls.parsing.dependency_parser as depp

dp = depp.DependencyParser()
dp.features.use_lexical = True
dp.features.use_distance = True
dp.features.use_contextual = True
dp.read_data("english")
dp.projective = True
dp.train_perceptron(10)
dp.test()

You should get the following results:

    Number of sentences: 8044
    Number of tokens: 80504
    Number of words: 12202
    Number of pos: 48
    Number of features: 338014
    Epoch 1
    Training accuracy: 0.835637168541
    Epoch 2
    Training accuracy: 0.922426254687
    Epoch 3
    Training accuracy: 0.947621628947
    Epoch 4
    Training accuracy: 0.960326602521
    Epoch 5
    Training accuracy: 0.967689840538
    Epoch 6
    Training accuracy: 0.97263631025
    Epoch 7
    Training accuracy: 0.97619370285
    Epoch 8
    Training accuracy: 0.979209016579
    Epoch 9
    Training accuracy: 0.98127569228
    Epoch 10
    Training accuracy: 0.981320865519
    Test accuracy (509 test instances): 0.886732599366