chunker: default program



In [1]:

    
from default import *
import os

Run the default solution on dev



In [4]:

    
chunker = LSTMTagger(os.path.join('data', 'train.txt.gz'), os.path.join('data', 'chunker'), '.tar')
decoder_output = chunker.decode('data/input/dev.txt')









    



100%|██████████| 1027/1027 [00:02<00:00, 459.66it/s]

Evaluate the default output



In [5]:

    
flat_output = [ output for sent in decoder_output for output in sent ]
import conlleval
true_seqs = []
with open(os.path.join('data','reference','dev.out')) as r:
    for sent in conlleval.read_file(r):
        true_seqs += sent.split()
conlleval.evaluate(true_seqs, flat_output)









    



processed 23663 tokens with 11896 phrases; found: 11672 phrases; correct: 8568.
accuracy:  84.35%; (non-O)
accuracy:  85.65%; precision:  73.41%; recall:  72.02%; FB1:  72.71
             ADJP: precision:  36.49%; recall:  11.95%; FB1:  18.00  74
             ADVP: precision:  71.36%; recall:  39.45%; FB1:  50.81  220
            CONJP: precision:   0.00%; recall:   0.00%; FB1:   0.00  0
             INTJ: precision:   0.00%; recall:   0.00%; FB1:   0.00  0
               NP: precision:  70.33%; recall:  76.80%; FB1:  73.42  6811
               PP: precision:  92.40%; recall:  87.14%; FB1:  89.69  2302
              PRT: precision:  65.00%; recall:  57.78%; FB1:  61.18  40
             SBAR: precision:  84.62%; recall:  41.77%; FB1:  55.93  117
               VP: precision:  63.66%; recall:  58.25%; FB1:  60.83  2108






    Out[5]:





(73.40644276901988, 72.02420981842637, 72.70875763747455)

Documentation

Write some beautiful documentation of your program here.

Analysis

Do some analysis of the results. What ideas did you try? What worked and what did not?