In [ ]:
%load_ext autoreload
%autoreload 2
We now walk through a few iterations of the Induction stage's double for loop:
$k=1$ corresponds to spans over pairs of words. The inner loop variable $s$ loops over the words, and determines the leftmost word in the span. The other variable $t:=s+k$ corresponds to the end of the span (the rightmost word).
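To make the traversal concrete, here is a minimal sketch of the loop order (the chart itself comes later); the choice of $N=3$ is just an example:
In [ ]:
# Minimal sketch of the Induction stage's double for loop (no scores yet):
# k is the span length, s the left endpoint, and t = s + k the right endpoint.
N = 3  # example: a three-word sentence plus the root at position 0
for k in range(1, N + 1):
    for s in range(N - k + 1):
        t = s + k
        print("k=%d: span (%d, %d)" % (k, s, t))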
Since $s\leq r<t$, and $t=s+k=s+1$, one concludes that $r=s$ for all spans with $k=1$. For that reason, the highest score corresponds to the only value of $r$:
$$ \mathrm{incomplete}[s,t,\leftarrow]\overset{(r=s)}{=}\mathrm{complete}[s,s,\rightarrow]+\mathrm{complete}[t,t,\leftarrow]+s_\theta(t,s) $$
Notice that the complete spans on the right-hand side do not meet on top of a word; that is why these spans are called incomplete.
The right-facing incomplete spans are computed in exactly the same way, except that the score of the arc going right is used: $s_\theta(s,t)$ instead of $s_\theta(t,s)$.
$$ \mathrm{incomplete}[s,t,\rightarrow]\overset{(r=s)}{=}\mathrm{complete}[s,s,\rightarrow]+\mathrm{complete}[t,t,\leftarrow]+s_\theta(s,t) $$
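As an illustration, the $k=1$ updates can be coded directly against NumPy charts. This is a hedged sketch, not the code you will write in dependency_decoder.py; the array layout (last axis 0 for $\leftarrow$, 1 for $\rightarrow$) is an assumption of this demo:
In [ ]:
import numpy as np

N = 3
scores = np.random.rand(N + 1, N + 1)  # scores[h, m] = s_theta(h, m)

# Initialization: complete spans over a single word have score 0.
incomplete = np.zeros((N + 1, N + 1, 2))
complete = np.zeros((N + 1, N + 1, 2))

# k = 1: r = s is the only split point, so no maximization is needed.
k = 1
for s in range(N - k + 1):
    t = s + k
    incomplete[s, t, 0] = complete[s, s, 1] + complete[t, t, 0] + scores[t, s]  # left arc
    incomplete[s, t, 1] = complete[s, s, 1] + complete[t, t, 0] + scores[s, t]  # right arc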
The next step is to compute scores for spans over three words. (Complete spans over pairs of words are obtained along the way, again with $r=s$ as the only choice.) An immediate consequence is that $r$ can now take two different values, $s$ and $s+1$, so there is an actual need to maximize the score over the possible values of $r$. The different values of $r$ correspond to using different sets of complete span scores.
$$ \mathrm{incomplete}[s,t,\leftarrow]=\underset{r}{\max}\left\{\begin{matrix}(r=s)& \mathrm{complete}[s,s,\rightarrow]+\mathrm{complete}[s+1,t,\leftarrow]+s_\theta(t,s)\\ (r=s+1)& \mathrm{complete}[s,s+1,\rightarrow]+\mathrm{complete}[t,t,\leftarrow]+s_\theta(t,s)\end{matrix}\right. $$
The procedure for right-facing incomplete spans is similar; all that changes is the arc score that is used.
$$ \mathrm{incomplete}[s,t,\rightarrow]=\underset{r}{\max}\left\{\begin{matrix}(r=s)& \mathrm{complete}[s,s,\rightarrow]+\mathrm{complete}[s+1,t,\leftarrow]+s_\theta(s,t)\\ (r=s+1)& \mathrm{complete}[s,s+1,\rightarrow]+\mathrm{complete}[t,t,\leftarrow]+s_\theta(s,t)\end{matrix}\right. $$
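Continuing the sketch, the maximization amounts to taking the max of a vector with one candidate score per value of $r$. The cell below illustrates this for a single three-word span; keep in mind that in the full algorithm the two-word complete spans would already have been updated at $k=1$, whereas in this demo chart they still hold their initial value:
In [ ]:
# Illustration of the max over r for one three-word span, say s = 0, t = 2.
s, t = 0, 2
# candidates[r - s] = complete[s, r, ->] + complete[r + 1, t, <-]
candidates = complete[s, s:t, 1] + complete[s + 1:t + 1, t, 0]
r_best = s + int(np.argmax(candidates))  # kept as a backtrack pointer later
incomplete[s, t, 0] = candidates.max() + scores[t, s]  # left-facing
incomplete[s, t, 1] = candidates.max() + scores[s, t]  # right-facing
print("best split point: r =", r_best)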
We now compute the complete span scores; once again, the incomplete span scores required for this step were conveniently computed before.
$$ \mathrm{complete}[s,t,\leftarrow]=\underset{r}{\max}\left\{\begin{matrix}(r=s)& \mathrm{complete}[s,s,\leftarrow]+\mathrm{incomplete}[s,t,\leftarrow]\\ (r=s+1)& \mathrm{complete}[s,s+1,\leftarrow]+\mathrm{incomplete}[s+1,t,\leftarrow]\end{matrix}\right. $$
The last step in this demo is to compute the right-facing complete span scores over three words:
$$ \mathrm{complete}[s,t,\rightarrow]=\underset{r}{\max}\left\{\begin{matrix}(r=s+1)& \mathrm{incomplete}[s,s+1,\rightarrow]+\mathrm{complete}[s+1,t,\rightarrow]\\ (r=s+2)& \mathrm{incomplete}[s,t,\rightarrow]+\mathrm{complete}[t,t,\rightarrow]\end{matrix}\right. $$
(Note that for right-facing complete spans the split point satisfies $s<r\leq t$, hence the shifted values of $r$.) These steps continue until a complete span of size $N+1$ is computed, which corresponds to spanning the whole sentence. After that, we backtrack along the highest-scoring choices to build the parse tree.
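Putting the walkthrough together, here is a hedged sketch of the whole Induction stage with backtrack pointers. It is one possible realization of the recursions above under the same assumed array layout, not the reference implementation, and its backtrack-pointer conventions may differ from what backtrack_eisner expects:
In [ ]:
import numpy as np

def eisner_chart(scores):
    """Fill the Eisner charts for an (N+1)-by-(N+1) matrix of arc scores.
    Last axis: 0 stands for left (<-), 1 for right (->)."""
    N = scores.shape[0] - 1
    incomplete = np.zeros((N + 1, N + 1, 2))
    complete = np.zeros((N + 1, N + 1, 2))
    incomplete_backtrack = -np.ones((N + 1, N + 1, 2), dtype=int)
    complete_backtrack = -np.ones((N + 1, N + 1, 2), dtype=int)
    for k in range(1, N + 1):
        for s in range(N - k + 1):
            t = s + k
            # Incomplete spans: split point r in [s, t), plus the arc score.
            cand = complete[s, s:t, 1] + complete[s + 1:t + 1, t, 0]
            r = s + int(np.argmax(cand))
            incomplete[s, t, 0] = cand.max() + scores[t, s]
            incomplete[s, t, 1] = cand.max() + scores[s, t]
            incomplete_backtrack[s, t, :] = r
            # Left-facing complete spans: split point r in [s, t).
            cand = complete[s, s:t, 0] + incomplete[s:t, t, 0]
            r = s + int(np.argmax(cand))
            complete[s, t, 0] = cand.max()
            complete_backtrack[s, t, 0] = r
            # Right-facing complete spans: split point r in (s, t].
            cand = incomplete[s, s + 1:t + 1, 1] + complete[s + 1:t + 1, t, 1]
            r = s + 1 + int(np.argmax(cand))
            complete[s, t, 1] = cand.max()
            complete_backtrack[s, t, 1] = r
    return incomplete, complete, incomplete_backtrack, complete_backtrack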
Implement Eisner's algorithm for projective dependency parsing in the file dependency_decoder.py. The input is a matrix of arc scores, whose dimension is $(N+1)$-by-$(N+1)$, and whose $(h, m)$ entry contains the score $s_\theta(h, m)$.
In particular, the first row contains the scores for the arcs that depart from the root, and the first column's values, along with the main diagonal, are to be ignored (since no arcs point to the root, and there are no self-pointing arcs).
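For example, a well-formed score matrix for a three-word sentence could be built as follows; masking the ignored entries with $-\infty$ is just a convenience assumed in this sketch, not a requirement of the decoder:
In [ ]:
import numpy as np

N = 3
scores = np.random.rand(N + 1, N + 1)  # scores[h, m] = s_theta(h, m)
scores[:, 0] = -np.inf                 # first column: no arcs point to the root
np.fill_diagonal(scores, -np.inf)      # main diagonal: no self-pointing arcs
print(scores)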
To make your job easier, we provide an implementation of the backtracking part:
def backtrack_eisner(self, incomplete_backtrack, complete_backtrack, s, t, direction, complete, heads):
so you just need to build complete/incomplete spans and their backtrack pointers and then call
heads = -np.ones(N + 1, dtype=int)
# s = 0, t = N, direction = 1 (right), complete = 1: start from the
# right-facing complete span that covers the whole sentence from the root.
self.backtrack_eisner(incomplete_backtrack, complete_backtrack, 0, N, 1, 1, heads)
return heads
to obtain the final parse. To test the algorithm, retrain the parser on the English data (where the trees are actually all projective) by setting the flag dp.projective to True:
In [ ]:
dp = depp.DependencyParser()
dp.features.use_lexical = True
dp.features.use_distance = True
dp.features.use_contextual = True
dp.read_data("english")
dp.projective = True
dp.train_perceptron(10)
dp.test()
You should get the following results:
Number of sentences: 8044
Number of tokens: 80504
Number of words: 12202
Number of pos: 48
Number of features: 338014
Epoch 1
Training accuracy: 0.835637168541
Epoch 2
Training accuracy: 0.922426254687
Epoch 3
Training accuracy: 0.947621628947
Epoch 4
Training accuracy: 0.960326602521
Epoch 5
Training accuracy: 0.967689840538
Epoch 6
Training accuracy: 0.97263631025
Epoch 7
Training accuracy: 0.97619370285
Epoch 8
Training accuracy: 0.979209016579
Epoch 9
Training accuracy: 0.98127569228
Epoch 10
Training accuracy: 0.981320865519
Test accuracy (509 test instances): 0.886732599366