Named Entity Recognition using sklearn-crfsuite

In this notebook we train a basic CRF model for Named Entity Recognition on CoNLL2002 data (following https://github.com/TeamHG-Memex/sklearn-crfsuite/blob/master/docs/CoNLL2002.ipynb) and check its weights to see what it learned.

To follow this tutorial you need the NLTK (3.x) and sklearn-crfsuite Python packages. The tutorial uses Python 3.
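The notebook itself does not install anything; a typical setup (a sketch — package names are the standard PyPI ones, and the corpus comes through NLTK's downloader) is:

```shell
pip install nltk sklearn-crfsuite eli5
python -c "import nltk; nltk.download('conll2002')"
```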


In [1]:
import nltk
import sklearn_crfsuite
import eli5

1. Training data

The CoNLL 2002 dataset contains a list of Spanish sentences with Named Entities annotated using IOB2 encoding. It also provides POS tags.
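In IOB2 encoding, B-X marks the first token of an entity of type X, I-X marks its continuation, and O marks tokens outside any entity. As a minimal illustration (not part of the original tutorial), entity spans can be recovered from a tag sequence like this:

```python
def iob2_spans(tags):
    """Extract (start, end, entity_type) spans from an IOB2 tag sequence."""
    spans = []
    for i, tag in enumerate(tags):
        if tag.startswith('B-'):
            spans.append([i, i + 1, tag[2:]])       # start a new entity
        elif (tag.startswith('I-') and spans
              and spans[-1][1] == i and spans[-1][2] == tag[2:]):
            spans[-1][1] = i + 1                     # extend the current entity
    return [tuple(s) for s in spans]

# the tag sequence of the first training sentence printed below
print(iob2_spans(['B-LOC', 'O', 'B-LOC', 'O', 'O', 'O', 'O', 'O', 'B-ORG', 'O', 'O']))
# [(0, 1, 'LOC'), (2, 3, 'LOC'), (8, 9, 'ORG')]
```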


In [2]:
train_sents = list(nltk.corpus.conll2002.iob_sents('esp.train'))
test_sents = list(nltk.corpus.conll2002.iob_sents('esp.testb'))
train_sents[0]


Out[2]:
[('Melbourne', 'NP', 'B-LOC'),
 ('(', 'Fpa', 'O'),
 ('Australia', 'NP', 'B-LOC'),
 (')', 'Fpt', 'O'),
 (',', 'Fc', 'O'),
 ('25', 'Z', 'O'),
 ('may', 'NC', 'O'),
 ('(', 'Fpa', 'O'),
 ('EFE', 'NC', 'B-ORG'),
 (')', 'Fpt', 'O'),
 ('.', 'Fp', 'O')]

2. Feature extraction

POS tags can be seen as pre-extracted features. Let's extract more features (word parts, simplified POS tags, lower/title/upper flags, features of nearby words) and convert them to the sklearn-crfsuite format: each sentence should be converted to a list of dicts. This is a very simple baseline; you can certainly do better.


In [3]:
def word2features(sent, i):
    word = sent[i][0]
    postag = sent[i][1]
    
    features = {
        'bias': 1.0,
        'word.lower()': word.lower(),
        'word[-3:]': word[-3:],
        'word.isupper()': word.isupper(),
        'word.istitle()': word.istitle(),
        'word.isdigit()': word.isdigit(),
        'postag': postag,
        'postag[:2]': postag[:2],        
    }
    if i > 0:
        word1 = sent[i-1][0]
        postag1 = sent[i-1][1]
        features.update({
            '-1:word.lower()': word1.lower(),
            '-1:word.istitle()': word1.istitle(),
            '-1:word.isupper()': word1.isupper(),
            '-1:postag': postag1,
            '-1:postag[:2]': postag1[:2],
        })
    else:
        features['BOS'] = True
        
    if i < len(sent)-1:
        word1 = sent[i+1][0]
        postag1 = sent[i+1][1]
        features.update({
            '+1:word.lower()': word1.lower(),
            '+1:word.istitle()': word1.istitle(),
            '+1:word.isupper()': word1.isupper(),
            '+1:postag': postag1,
            '+1:postag[:2]': postag1[:2],
        })
    else:
        features['EOS'] = True
                
    return features


def sent2features(sent):
    return [word2features(sent, i) for i in range(len(sent))]

def sent2labels(sent):
    return [label for token, postag, label in sent]

def sent2tokens(sent):
    return [token for token, postag, label in sent]

X_train = [sent2features(s) for s in train_sents]
y_train = [sent2labels(s) for s in train_sents]

X_test = [sent2features(s) for s in test_sents]
y_test = [sent2labels(s) for s in test_sents]

This is what the features extracted from a single token look like:


In [4]:
X_train[0][1]


Out[4]:
{'+1:postag': 'NP',
 '+1:postag[:2]': 'NP',
 '+1:word.istitle()': True,
 '+1:word.isupper()': False,
 '+1:word.lower()': 'australia',
 '-1:postag': 'NP',
 '-1:postag[:2]': 'NP',
 '-1:word.istitle()': True,
 '-1:word.isupper()': False,
 '-1:word.lower()': 'melbourne',
 'bias': 1.0,
 'postag': 'Fpa',
 'postag[:2]': 'Fp',
 'word.isdigit()': False,
 'word.istitle()': False,
 'word.isupper()': False,
 'word.lower()': '(',
 'word[-3:]': '('}

3. Train a CRF model

Once we have the features in the right format, we can train a linear-chain CRF (Conditional Random Field) model using sklearn_crfsuite.CRF:


In [5]:
crf = sklearn_crfsuite.CRF(
    algorithm='lbfgs',
    c1=0.1, 
    c2=0.1, 
    max_iterations=20,
    all_possible_transitions=False,
)
crf.fit(X_train, y_train);

4. Inspect model weights

CRFsuite CRF models use two kinds of features: state features and transition features. Let's check their weights using eli5.explain_weights:


In [6]:
eli5.show_weights(crf, top=30)


Out[6]:
From \ To O B-LOC I-LOC B-MISC I-MISC B-ORG I-ORG B-PER I-PER
O 3.281 2.204 0.0 2.101 0.0 3.468 0.0 2.325 0.0
B-LOC -0.259 -0.098 4.058 0.0 0.0 0.0 0.0 -0.212 0.0
I-LOC -0.173 -0.609 3.436 0.0 0.0 0.0 0.0 0.0 0.0
B-MISC -0.673 -0.341 0.0 0.0 4.069 -0.308 0.0 -0.331 0.0
I-MISC -0.803 -0.998 0.0 -0.519 4.977 -0.817 0.0 -0.611 0.0
B-ORG -0.096 -0.242 0.0 -0.57 0.0 -1.012 4.739 -0.306 0.0
I-ORG -0.339 -1.758 0.0 -0.841 0.0 -1.382 5.062 -0.472 0.0
B-PER -0.4 -0.851 0.0 0.0 0.0 -1.013 0.0 -0.937 4.329
I-PER -0.676 -0.47 0.0 0.0 0.0 0.0 0.0 -0.659 3.754
y=O top features y=B-LOC top features y=I-LOC top features y=B-MISC top features y=I-MISC top features y=B-ORG top features y=I-ORG top features y=B-PER top features y=I-PER top features
Weight? Feature
+4.416 postag[:2]:Fp
+3.116 BOS
+2.401 bias
+2.297 postag[:2]:Fc
+2.297 word.lower():,
+2.297 postag:Fc
+2.297 word[-3:]:,
+2.124 postag[:2]:CC
+2.124 postag:CC
+1.984 EOS
+1.859 word.lower():y
+1.684 postag:RG
+1.684 postag[:2]:RG
+1.610 word.lower():-
+1.610 postag[:2]:Fg
+1.610 word[-3:]:-
+1.610 postag:Fg
+1.582 postag:Fp
+1.582 word[-3:]:.
+1.582 word.lower():.
+1.372 word[-3:]:y
+1.187 postag:CS
+1.187 postag[:2]:CS
+1.150 word[-3:]:(
+1.150 postag:Fpa
+1.150 word.lower():(
… 16444 more positive …
… 3771 more negative …
-2.106 postag:NP
-2.106 postag[:2]:NP
-3.723 word.isupper()
-6.166 word.istitle()
Weight? Feature
+2.530 word.istitle()
+2.224 -1:word.lower():en
+0.906 word[-3:]:rid
+0.905 word.lower():madrid
+0.646 word.lower():españa
+0.640 word[-3:]:ona
+0.595 word[-3:]:aña
+0.595 +1:postag[:2]:Fp
+0.515 word.lower():parís
+0.514 word[-3:]:rís
+0.424 word.lower():barcelona
+0.420 -1:postag:Fg
+0.420 -1:word.lower():-
+0.420 -1:postag[:2]:Fg
+0.413 -1:word.isupper()
+0.390 -1:postag[:2]:Fp
+0.389 -1:postag:Fpa
+0.389 -1:word.lower():(
+0.388 word.lower():san
+0.385 postag:NC
… 2282 more positive …
… 413 more negative …
-0.389 -1:word.lower():"
-0.389 -1:postag:Fe
-0.389 -1:postag[:2]:Fe
-0.406 -1:postag[:2]:VM
-0.646 word[-3:]:ión
-0.759 -1:word.lower():del
-0.818 bias
-0.986 postag:SP
-0.986 postag[:2]:SP
-1.354 -1:word.istitle()
Weight? Feature
+0.886 -1:word.istitle()
+0.664 -1:word.lower():de
+0.582 word[-3:]:de
+0.578 word.lower():de
+0.529 -1:word.lower():san
+0.444 +1:word.istitle()
+0.441 word.istitle()
+0.335 -1:word.lower():la
+0.262 postag:SP
+0.262 postag[:2]:SP
+0.235 word[-3:]:la
+0.228 word[-3:]:iro
+0.226 word[-3:]:oja
+0.218 word[-3:]:del
+0.215 word.lower():del
+0.213 -1:postag:NC
+0.213 -1:postag[:2]:NC
+0.205 -1:word.lower():nueva
… 1665 more positive …
… 258 more negative …
-0.206 -1:postag[:2]:Z
-0.206 -1:postag:Z
-0.213 -1:postag[:2]:CC
-0.213 -1:postag:CC
-0.219 -1:word.lower():en
-0.222 +1:word.isupper()
-0.235 +1:postag:VMI
-0.342 word.isupper()
-0.366 +1:postag[:2]:AQ
-0.366 +1:postag:AQ
-0.392 +1:postag[:2]:VM
-1.690 BOS
Weight? Feature
+1.770 word.isupper()
+0.693 word.istitle()
+0.606 word.lower():"
+0.606 word[-3:]:"
+0.606 postag:Fe
+0.606 postag[:2]:Fe
+0.538 +1:word.istitle()
+0.508 -1:word.lower():"
+0.508 -1:postag:Fe
+0.508 -1:postag[:2]:Fe
+0.484 -1:postag[:2]:DA
+0.484 -1:postag:DA
+0.479 +1:word.isupper()
+0.457 postag[:2]:NC
+0.457 postag:NC
+0.400 word.lower():liga
+0.399 word[-3:]:iga
+0.367 -1:word.lower():la
+0.354 postag:Z
+0.354 postag[:2]:Z
+0.332 -1:word.lower():del
+0.286 +1:postag[:2]:Z
+0.286 +1:postag:Z
+0.284 +1:postag:NC
+0.284 +1:postag[:2]:NC
… 2284 more positive …
… 314 more negative …
-0.308 BOS
-0.377 -1:postag[:2]:VM
-0.908 postag[:2]:SP
-0.908 postag:SP
-1.094 -1:word.istitle()
Weight? Feature
+1.364 -1:word.istitle()
+0.675 -1:word.lower():de
+0.597 +1:postag:Fe
+0.597 +1:word.lower():"
+0.597 +1:postag[:2]:Fe
+0.369 -1:postag:NC
+0.369 -1:postag[:2]:NC
+0.324 -1:word.lower():liga
+0.318 word[-3:]:de
+0.304 word.lower():de
+0.303 word.isdigit()
+0.261 -1:postag[:2]:SP
+0.261 -1:postag:SP
+0.258 -1:word.lower():copa
+0.240 word.lower():campeones
+0.235 word[-3:]:000
+0.234 +1:postag:Z
+0.234 +1:postag[:2]:Z
+0.229 word.lower():2000
… 3675 more positive …
… 573 more negative …
-0.235 EOS
-0.264 -1:word.lower():y
-0.265 word.lower():y
-0.265 +1:postag:VMI
-0.274 postag[:2]:VM
-0.306 -1:postag:CC
-0.306 -1:postag[:2]:CC
-0.320 postag:CC
-0.320 postag[:2]:CC
-0.370 +1:postag[:2]:VM
-0.641 bias
Weight? Feature
+2.695 word.lower():efe
+2.519 word.isupper()
+2.084 word[-3:]:EFE
+1.174 word.lower():gobierno
+1.142 word.istitle()
+1.018 -1:word.lower():del
+0.958 word[-3:]:rno
+0.671 word[-3:]:PP
+0.671 word.lower():pp
+0.667 -1:word.lower():al
+0.555 -1:word.lower():el
+0.499 word[-3:]:eal
+0.413 word.lower():real
+0.393 word.lower():ayuntamiento
+0.391 postag:AQ
+0.391 postag[:2]:AQ
… 3518 more positive …
… 619 more negative …
-0.430 -1:postag[:2]:AQ
-0.430 -1:postag:AQ
-0.450 +1:word.lower():de
-0.455 postag[:2]:Z
-0.455 postag:Z
-0.500 -1:word.istitle()
-0.642 -1:word.lower():los
-0.664 -1:word.lower():de
-0.707 -1:word.isupper()
-0.746 -1:word.lower():en
-0.747 -1:postag[:2]:VM
-1.100 bias
-1.289 postag[:2]:SP
-1.289 postag:SP
Weight? Feature
+1.499 -1:word.istitle()
+1.200 -1:word.lower():de
+0.539 -1:word.lower():real
+0.511 word[-3:]:rid
+0.446 word[-3:]:de
+0.433 word.lower():de
+0.428 -1:postag:SP
+0.428 -1:postag[:2]:SP
+0.399 word.lower():madrid
+0.368 word[-3:]:la
+0.365 -1:word.lower():consejo
+0.363 word.istitle()
+0.352 -1:word.lower():comisión
+0.336 postag[:2]:AQ
+0.336 postag:AQ
+0.332 +1:postag:Fpa
+0.332 +1:word.lower():(
+0.311 -1:word.lower():estados
+0.306 word.lower():unidos
… 3473 more positive …
… 703 more negative …
-0.304 postag[:2]:NP
-0.304 postag:NP
-0.306 -1:word.lower():a
-0.384 +1:postag[:2]:NC
-0.384 +1:postag:NC
-0.391 -1:word.isupper()
-0.507 +1:postag:AQ
-0.507 +1:postag[:2]:AQ
-0.535 postag[:2]:VM
-0.540 postag:VMI
-1.195 bias
Weight? Feature
+1.698 word.istitle()
+0.683 -1:postag:VMI
+0.601 +1:postag[:2]:VM
+0.589 postag:NP
+0.589 postag[:2]:NP
+0.589 +1:postag:VMI
+0.565 -1:word.lower():a
+0.520 word[-3:]:osé
+0.503 word.lower():josé
+0.476 -1:postag[:2]:VM
+0.472 postag:NC
+0.472 postag[:2]:NC
+0.452 -1:postag[:2]:Fc
+0.452 -1:word.lower():,
+0.452 -1:postag:Fc
… 4117 more positive …
… 351 more negative …
-0.472 -1:word.lower():en
-0.475 -1:postag[:2]:Fe
-0.475 -1:word.lower():"
-0.475 -1:postag:Fe
-0.543 word.lower():la
-0.572 -1:word.lower():de
-0.693 -1:word.istitle()
-0.712 postag[:2]:SP
-0.712 postag:SP
-0.778 -1:word.lower():del
-0.818 -1:postag[:2]:DA
-0.818 -1:postag:DA
-0.923 -1:word.lower():la
-1.319 postag:DA
-1.319 postag[:2]:DA
Weight? Feature
+2.742 -1:word.istitle()
+0.736 word.istitle()
+0.660 -1:word.lower():josé
+0.598 -1:postag[:2]:AQ
+0.598 -1:postag:AQ
+0.510 -1:postag[:2]:VM
+0.487 -1:word.lower():juan
+0.419 -1:word.lower():maría
+0.413 -1:postag:VMI
+0.345 -1:word.lower():luis
+0.319 -1:word.lower():manuel
+0.315 postag[:2]:NC
+0.315 postag:NC
+0.309 -1:word.lower():carlos
… 3903 more positive …
… 365 more negative …
-0.301 postag[:2]:NP
-0.301 postag:NP
-0.301 word[-3:]:ión
-0.305 postag[:2]:Fe
-0.305 word.lower():"
-0.305 postag:Fe
-0.305 word[-3:]:"
-0.305 +1:word.lower():que
-0.324 -1:word.lower():el
-0.377 +1:postag[:2]:Z
-0.377 +1:postag:Z
-0.396 postag:VMI
-0.433 +1:postag:SP
-0.433 +1:postag[:2]:SP
-0.485 postag[:2]:VM
-1.431 bias

Transition features make sense: at least the model learned that I-ENTITY must follow B-ENTITY. It also learned that some transitions are unlikely, e.g. it is not common in this dataset to have a location right after an organization name (I-ORG -> B-LOC has a large negative weight).

The features don't use gazetteers, so the model had to memorize some geographic names from the training data, e.g. that España is a location.

If we regularize the CRF more, we can expect that only generic features will remain, while memorized tokens will go away. With L1 regularization (the c1 parameter) the coefficients of most features should be driven to zero. Let's check what effect regularization has on CRF weights:
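The sparsifying effect of L1 can be illustrated on a small toy problem; this sketch uses scikit-learn's LogisticRegression rather than a CRF, but the idea carries over:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.randn(200, 10)
y = (X[:, 0] > 0).astype(int)   # only feature 0 is informative

l2 = LogisticRegression(penalty='l2', C=1.0).fit(X, y)
l1 = LogisticRegression(penalty='l1', solver='liblinear', C=0.1).fit(X, y)

print((np.abs(l2.coef_) > 1e-6).sum())  # L2 keeps all 10 weights non-zero
print((np.abs(l1.coef_) > 1e-6).sum())  # L1 drives most of them to exactly zero
```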


In [7]:
crf = sklearn_crfsuite.CRF(
    algorithm='lbfgs',
    c1=200,
    c2=0.1,
    max_iterations=20,
    all_possible_transitions=False,
)
crf.fit(X_train, y_train)
eli5.show_weights(crf, top=30)


Out[7]:
From \ To O B-LOC I-LOC B-MISC I-MISC B-ORG I-ORG B-PER I-PER
O 3.232 1.76 0.0 2.026 0.0 2.603 0.0 1.593 0.0
B-LOC 0.035 0.0 2.773 0.0 0.0 0.0 0.0 0.0 0.0
I-LOC -0.02 0.0 3.099 0.0 0.0 0.0 0.0 0.0 0.0
B-MISC -0.382 0.0 0.0 0.0 4.758 0.0 0.0 0.0 0.0
I-MISC -0.256 0.0 0.0 0.0 4.155 0.0 0.0 0.0 0.0
B-ORG 0.161 0.0 0.0 0.0 0.0 0.0 3.344 0.0 0.0
I-ORG -0.126 -0.081 0.0 0.0 0.0 0.0 4.048 0.0 0.0
B-PER 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 3.449
I-PER -0.085 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.254
y=O top features y=B-LOC top features y=I-LOC top features y=B-MISC top features y=I-MISC top features y=B-ORG top features y=I-ORG top features y=B-PER top features y=I-PER top features
Weight? Feature
+3.363 BOS
+2.842 bias
+2.478 postag[:2]:Fp
+0.665 -1:word.isupper()
+0.439 +1:postag[:2]:AQ
+0.439 +1:postag:AQ
+0.400 postag[:2]:Fc
+0.400 word.lower():,
+0.400 word[-3:]:,
+0.400 postag:Fc
+0.391 postag:CC
+0.391 postag[:2]:CC
+0.365 EOS
+0.363 +1:postag:NC
+0.363 +1:postag[:2]:NC
+0.315 postag:SP
+0.315 postag[:2]:SP
+0.302 +1:word.isupper()
… 15 more positive …
… 14 more negative …
-0.216 postag:AQ
-0.216 postag[:2]:AQ
-0.334 -1:postag:SP
-0.334 -1:postag[:2]:SP
-0.417 postag[:2]:NP
-0.417 postag:NP
-0.547 postag[:2]:NC
-0.547 postag:NC
-0.547 word.lower():de
-0.600 word[-3:]:de
-3.552 word.isupper()
-5.446 word.istitle()
Weight? Feature
+1.417 -1:word.lower():en
+1.183 word.istitle()
+0.498 +1:postag[:2]:Fp
+0.150 +1:word.lower():,
+0.150 +1:postag:Fc
+0.150 +1:postag[:2]:Fc
+0.098 -1:postag[:2]:Fp
+0.081 -1:postag:Fpa
+0.081 -1:word.lower():(
+0.080 postag[:2]:NP
+0.080 postag:NP
+0.056 -1:postag:SP
+0.056 -1:postag[:2]:SP
+0.022 postag:NC
+0.022 postag[:2]:NC
+0.019 BOS
-0.008 +1:word.istitle()
-0.028 -1:word.lower():del
-0.572 -1:word.istitle()
Weight? Feature
+0.788 -1:word.istitle()
+0.248 word[-3:]:de
+0.237 word.lower():de
+0.199 -1:word.lower():de
+0.190 postag[:2]:SP
+0.190 postag:SP
+0.060 -1:postag:SP
+0.060 -1:postag[:2]:SP
+0.040 +1:word.istitle()
Weight? Feature
+0.349 word.isupper()
+0.053 -1:postag[:2]:DA
+0.053 -1:postag:DA
+0.030 word.istitle()
-0.009 -1:postag:SP
-0.009 -1:postag[:2]:SP
-0.060 bias
-0.172 -1:word.istitle()
Weight? Feature
+0.432 -1:word.istitle()
+0.158 -1:postag[:2]:NC
+0.158 -1:postag:NC
+0.146 +1:postag[:2]:Fe
+0.146 +1:word.lower():"
+0.146 +1:postag:Fe
+0.030 postag[:2]:SP
+0.030 postag:SP
-0.087 word.istitle()
-0.094 bias
-0.119 word.isupper()
-0.120 -1:word.isupper()
-0.121 +1:word.isupper()
-0.211 +1:word.istitle()
Weight? Feature
+1.681 word.isupper()
+0.507 -1:word.lower():del
+0.350 -1:postag:DA
+0.350 -1:postag[:2]:DA
+0.282 word.lower():efe
+0.234 word[-3:]:EFE
+0.195 -1:word.lower():(
+0.195 -1:postag:Fpa
+0.192 word.istitle()
+0.178 +1:postag:Fpt
+0.178 +1:word.lower():)
+0.173 -1:postag[:2]:Fp
+0.136 -1:word.lower():el
+0.110 postag[:2]:NC
+0.110 postag:NC
-0.004 +1:word.istitle()
-0.023 +1:postag[:2]:Fp
-0.041 +1:postag:NC
-0.041 +1:postag[:2]:NC
-0.210 -1:word.lower():de
-0.515 bias
Weight? Feature
+1.318 -1:word.istitle()
+0.762 -1:word.lower():de
+0.185 -1:postag:SP
+0.185 -1:postag[:2]:SP
+0.185 word[-3:]:de
+0.058 word.lower():de
-0.043 -1:word.isupper()
-0.267 +1:word.istitle()
-0.536 bias
Weight? Feature
+0.800 word.istitle()
+0.463 -1:word.lower():,
+0.463 -1:postag[:2]:Fc
+0.463 -1:postag:Fc
+0.148 +1:postag:VMI
+0.125 +1:word.istitle()
+0.095 +1:postag[:2]:VM
+0.007 +1:postag:AQ
+0.007 +1:postag[:2]:AQ
-0.039 -1:word.istitle()
-0.058 postag:DA
-0.058 postag[:2]:DA
-0.063 bias
-0.067 -1:word.lower():de
-0.159 -1:postag:SP
-0.159 -1:postag[:2]:SP
-0.263 -1:postag:DA
-0.263 -1:postag[:2]:DA
Weight? Feature
+2.127 -1:word.istitle()
+0.331 word.istitle()
+0.016 +1:postag[:2]:Fc
+0.016 +1:word.lower():,
+0.016 +1:postag:Fc
-0.089 +1:postag:SP
-0.089 +1:postag[:2]:SP
-0.648 bias

As you can see, memorized tokens are mostly gone and the model now relies on word shapes and POS tags. Only a few non-zero features remain. In our example the change probably made the quality worse, but that's a separate question.

Let's focus on the transition weights. We would expect O -> I-ENTITY transitions to have large negative weights because they are impossible. But these transitions have zero weights, not negative weights, both in the heavily regularized model and in our initial model. Something is going on here.

The reason they are zero is that crfsuite hasn't seen these transitions in the training data and assumed there is no need to learn weights for them, to save computation time. This is the default behavior, but it can be turned off using the sklearn_crfsuite.CRF all_possible_transitions option. Let's check how it affects the result:
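Which transitions crfsuite could have observed can be checked by enumerating label bigrams in the training sequences. A self-contained sketch on toy IOB2 sequences (in the real data, pairs like ('O', 'I-LOC') likewise never occur):

```python
from itertools import product

def observed_transitions(label_seqs):
    """Return the set of (from_label, to_label) pairs seen in the sequences."""
    seen = set()
    for seq in label_seqs:
        seen.update(zip(seq, seq[1:]))
    return seen

toy_y = [['O', 'B-LOC', 'I-LOC', 'O'], ['B-ORG', 'I-ORG', 'O']]
labels = sorted({lab for seq in toy_y for lab in seq})
seen = observed_transitions(toy_y)
unseen = set(product(labels, labels)) - seen
print(('O', 'I-LOC') in unseen)  # True: its weight stays at zero by default
```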


In [8]:
crf = sklearn_crfsuite.CRF(
    algorithm='lbfgs',
    c1=0.1, 
    c2=0.1, 
    max_iterations=20, 
    all_possible_transitions=True,
)
crf.fit(X_train, y_train);

In [9]:
eli5.show_weights(crf, top=5, show=['transition_features'])


Out[9]:
From \ To O B-LOC I-LOC B-MISC I-MISC B-ORG I-ORG B-PER I-PER
O 2.732 1.217 -4.675 1.515 -5.785 1.36 -6.19 0.968 -6.236
B-LOC -0.226 -0.091 3.378 -0.433 -1.065 -0.861 -1.783 -0.295 -1.57
I-LOC -0.184 -0.585 2.404 -0.276 -0.485 -0.582 -0.749 -0.442 -0.647
B-MISC -0.714 -0.353 -0.539 -0.278 3.512 -0.412 -1.047 -0.336 -0.895
I-MISC -0.697 -0.846 -0.587 -0.297 4.252 -0.84 -1.206 -0.523 -1.001
B-ORG 0.419 -0.187 -1.074 -0.567 -1.607 -1.13 5.392 -0.223 -2.122
I-ORG -0.117 -1.715 -0.863 -0.631 -1.221 -1.442 5.141 -0.397 -1.908
B-PER -0.127 -0.806 -0.834 -0.52 -1.228 -1.089 -2.076 -1.01 4.04
I-PER -0.766 -0.242 -0.67 -0.418 -0.856 -0.903 -1.472 -0.692 2.909

With all_possible_transitions=True the CRF learned large negative weights for impossible transitions like O -> I-ORG.

5. Customization

The table above is large and hard to inspect; eli5 provides several options to look at only a part of the features. You can check only a subset of labels:


In [10]:
eli5.show_weights(crf, top=10, targets=['O', 'B-ORG', 'I-ORG'])


Out[10]:
From \ To O B-ORG I-ORG
O 2.732 1.36 -6.19
B-ORG 0.419 -1.13 5.392
I-ORG -0.117 -1.442 5.141
y=O top features y=B-ORG top features y=I-ORG top features
Weight? Feature
+4.931 BOS
+3.754 postag[:2]:Fp
+3.539 bias
+2.328 word[-3:]:,
+2.328 word.lower():,
+2.328 postag[:2]:Fc
+2.328 postag:Fc
… 15039 more positive …
… 3905 more negative …
-2.187 postag[:2]:NP
-3.685 word.isupper()
-7.025 word.istitle()
Weight? Feature
+3.041 word.isupper()
+2.952 word.lower():efe
+1.851 word[-3:]:EFE
+1.278 word.lower():gobierno
+1.033 word[-3:]:rno
+1.005 word.istitle()
+0.864 -1:word.lower():del
… 3524 more positive …
… 621 more negative …
-0.842 -1:word.lower():en
-1.416 postag[:2]:SP
-1.416 postag:SP
Weight? Feature
+1.159 -1:word.lower():de
+0.993 -1:word.istitle()
+0.637 -1:postag[:2]:SP
+0.637 -1:postag:SP
+0.570 -1:word.lower():real
+0.547 word.istitle()
… 3517 more positive …
… 676 more negative …
-0.480 postag:VMI
-0.508 postag[:2]:VM
-0.533 -1:word.isupper()
-1.290 bias

Another option is to check only some of the features; this helps to verify that a feature function works as intended. For example, let's check how word-shape features are used by the model, using the feature_re argument, and hide the transition table:


In [11]:
eli5.show_weights(crf, top=10, feature_re=r'^word\.is',
                  horizontal_layout=False, show=['targets'])


Out[11]:

y=O top features

Weight? Feature
-3.685 word.isupper()
-7.025 word.istitle()

y=B-LOC top features

Weight? Feature
+2.397 word.istitle()
+0.099 word.isupper()
-0.152 word.isdigit()

y=I-LOC top features

Weight? Feature
+0.460 word.istitle()
-0.018 word.isdigit()
-0.345 word.isupper()

y=B-MISC top features

Weight? Feature
+2.017 word.isupper()
+0.603 word.istitle()
-0.012 word.isdigit()

y=I-MISC top features

Weight? Feature
+0.271 word.isdigit()
-0.072 word.isupper()
-0.106 word.istitle()

y=B-ORG top features

Weight? Feature
+3.041 word.isupper()
+1.005 word.istitle()
-0.044 word.isdigit()

y=I-ORG top features

Weight? Feature
+0.547 word.istitle()
+0.014 word.isdigit()
-0.012 word.isupper()

y=B-PER top features

Weight? Feature
+1.757 word.istitle()
+0.050 word.isupper()
-0.123 word.isdigit()

y=I-PER top features

Weight? Feature
+0.976 word.istitle()
+0.193 word.isupper()
-0.106 word.isdigit()

Looks fine: UPPERCASE and Titlecase words are likely to be entities of some kind.

6. Formatting in console

It is also possible to format the result as text (which can be useful in a console):


In [12]:
expl = eli5.explain_weights(crf, top=5, targets=['O', 'B-LOC', 'I-LOC'])
print(eli5.format_as_text(expl))


Explained as: CRF

Transition features:
            O    B-LOC    I-LOC
-----  ------  -------  -------
O       2.732    1.217   -4.675
B-LOC  -0.226   -0.091    3.378
I-LOC  -0.184   -0.585    2.404

y='O' top features
Weight  Feature       
------  --------------
+4.931  BOS           
+3.754  postag[:2]:Fp 
+3.539  bias          
… 15043 more positive …
… 3906 more negative …
-3.685  word.isupper()
-7.025  word.istitle()

y='B-LOC' top features
Weight  Feature           
------  ------------------
+2.397  word.istitle()    
+2.147  -1:word.lower():en
  … 2284 more positive …  
  … 433 more negative …   
-1.080  postag[:2]:SP     
-1.080  postag:SP         
-1.273  -1:word.istitle() 

y='I-LOC' top features
Weight  Feature           
------  ------------------
+0.882  -1:word.lower():de
+0.780  -1:word.istitle() 
+0.718  word[-3:]:de      
+0.711  word.lower():de   
  … 1684 more positive …  
  … 268 more negative …   
-1.965  BOS               

