LXMLS 2017 - Day 3

Learning Structed Predictors

In this class, we will continue to focus on sequence classification, but instead of following a generative approach (like in the previous chapter) we move towards discriminative approaches. Recall that the difference between these approaches is that generative approaches attempt to model the probability distribution of the data, P(X,Y), whereas discriminative ones only model the conditional probability of the sequence, given the observed data, P(Y|X).

Today’s Assignment The assignment of today’s class is to implement the structured version of the perceptron algorithm.

3.1 Classification vs Sequential Classification Table 3.1 shows how the models for classification have counterparts in sequential classification. In fact, in the last chapter we discussed the Hidden Markov model, which can be seen as a generalization of the Na¨ıve Bayes model for sequences. In this chapter, we will see a generalization of the Perceptron algorithm for sequence problems (yielding the Structured Perceptron, Collins 2002) and a generalization of Maximum Entropy model for sequences (yielding Conditional Random Fields, Lafferty et al. 2001). Note that both these generalizations are not specific for sequences and can be applied to a wide range of models (we will see in tomorrow’s lecture how these methods can be applied to parsing). Although we will not cover all the methods described in Chapter 1, bear in mind that all of those have a structured counterpart. It should be intuitive after this lecture how those methods could be extended to structured problems, given the perceptron example.

See Pag 66

Exercise 3.1 In this exercise you will train a CRF using different feature sets for Part-of-Speech Tagging. Start with the code below, which uses the ID feature set from table 3.2.


In [2]:
import sys
sys.path.append('../../../')

import lxmls.sequences.crf_online as crfo
import lxmls.sequences.structured_perceptron as spc
import lxmls.readers.pos_corpus as pcc
import lxmls.sequences.id_feature as idfc
import lxmls.sequences.extended_feature as exfc

print "CRF Exercise"

corpus = pcc.PostagCorpus( )
train_seq = corpus.read_sequence_list_conll("../../../data/train-02-21.conll", max_sent_len=10, max_nr_sent=1000)
test_seq = corpus.read_sequence_list_conll("../../../data/test-23.conll", max_sent_len=10, max_nr_sent=1000)
dev_seq = corpus.read_sequence_list_conll("../../../data/dev-22.conll", max_sent_len=10, max_nr_sent=1000)

feature_mapper = idfc.IDFeatures(train_seq)
feature_mapper.build_features()


CRF Exercise

In [4]:
crf_online = crfo.CRFOnline(corpus.word_dict, corpus.tag_dict, feature_mapper)
crf_online.num_epochs = 20
crf_online.train_supervised(train_seq)

pred_train = crf_online.viterbi_decode_corpus(train_seq)
pred_dev = crf_online.viterbi_decode_corpus(dev_seq)
pred_test = crf_online.viterbi_decode_corpus(test_seq)
eval_train = crf_online.evaluate_corpus(train_seq, pred_train)
eval_dev = crf_online.evaluate_corpus(dev_seq, pred_dev)
eval_test = crf_online.evaluate_corpus(test_seq, pred_test)

print "CRF - ID Features Accuracy Train: %.3f Dev: %.3f Test: %.3f" % (eval_train, eval_dev, eval_test)


Epoch: 0 Objective value: -5.779018
Epoch: 1 Objective value: -3.192724
Epoch: 2 Objective value: -2.717537
Epoch: 3 Objective value: -2.436614
Epoch: 4 Objective value: -2.240491
Epoch: 5 Objective value: -2.091833
Epoch: 6 Objective value: -1.973353
Epoch: 7 Objective value: -1.875643
Epoch: 8 Objective value: -1.793034
Epoch: 9 Objective value: -1.721857
Epoch: 10 Objective value: -1.659605
Epoch: 11 Objective value: -1.604499
Epoch: 12 Objective value: -1.555229
Epoch: 13 Objective value: -1.510806
Epoch: 14 Objective value: -1.470468
Epoch: 15 Objective value: -1.433612
Epoch: 16 Objective value: -1.399759
Epoch: 17 Objective value: -1.368518
Epoch: 18 Objective value: -1.339566
Epoch: 19 Objective value: -1.312636
CRF - ID Features Accuracy Train: 0.949 Dev: 0.846 Test: 0.858

Exercise 3.2 Repeat the previous exercise using the extended feature set. Compare the results


In [7]:
feature_mapper = exfc.ExtendedFeatures(train_seq)
feature_mapper.build_features()

crf_online = crfo.CRFOnline(corpus.word_dict, corpus.tag_dict, feature_mapper)
crf_online.num_epochs = 20
crf_online.train_supervised(train_seq)

pred_train = crf_online.viterbi_decode_corpus(train_seq)
pred_dev = crf_online.viterbi_decode_corpus(dev_seq)
pred_test = crf_online.viterbi_decode_corpus(test_seq)
eval_train = crf_online.evaluate_corpus(train_seq, pred_train)
eval_dev = crf_online.evaluate_corpus(dev_seq, pred_dev)
eval_test = crf_online.evaluate_corpus(test_seq, pred_test)

print "CRF - Extended Features Accuracy Train: %.3f Dev: %.3f Test: %.3f" % (eval_train, eval_dev, eval_test)


Epoch: 0 Objective value: -7.141596
Epoch: 1 Objective value: -1.807511
Epoch: 2 Objective value: -1.218877
Epoch: 3 Objective value: -0.955739
Epoch: 4 Objective value: -0.807821
Epoch: 5 Objective value: -0.712858
Epoch: 6 Objective value: -0.647382
Epoch: 7 Objective value: -0.599442
Epoch: 8 Objective value: -0.562584
Epoch: 9 Objective value: -0.533411
Epoch: 10 Objective value: -0.509885
Epoch: 11 Objective value: -0.490548
Epoch: 12 Objective value: -0.474318
Epoch: 13 Objective value: -0.460438
Epoch: 14 Objective value: -0.448389
Epoch: 15 Objective value: -0.437800
Epoch: 16 Objective value: -0.428402
Epoch: 17 Objective value: -0.419990
Epoch: 18 Objective value: -0.412406
Epoch: 19 Objective value: -0.405524
CRF - Extended Features Accuracy Train: 0.984 Dev: 0.899 Test: 0.894

3.5 Structured Perceptron

The structured perceptron (Collins, 2002), namely its averaged version, is a very simple algorithm that relies on Viterbi decoding and very simple additive updates. In practice this algorithm is very easy to implement and behaves remarkably well in a variety of problems. These two characteristics make the structured perceptron algorithm a natural first choice to try and test a new problem or a new feature set. Recall what you learned about the perceptron algorithm (Section 1.4.3) and compare it against the structured perceptron (Algorithm 11).

Exercise 3.3 Implement the structured perceptron algorithm.


In [5]:
def perceptron_update(self, sequence):

    ##########################
    # Solution to Exercise 3.3
    ##########################

    num_labels = 0
    num_mistakes = 0

    predicted_sequence, _ = self.viterbi_decode(sequence)

    y_hat = predicted_sequence.y

    # Update initial features.
    y_t_true = sequence.y[0]
    y_t_hat = y_hat[0]

    if y_t_true != y_t_hat:
        true_initial_features = self.feature_mapper.get_initial_features(sequence, y_t_true)
        self.parameters[true_initial_features] += self.learning_rate
        hat_initial_features = self.feature_mapper.get_initial_features(sequence, y_t_hat)
        self.parameters[hat_initial_features] -= self.learning_rate

    for pos in xrange(len(sequence.x)):
        y_t_true = sequence.y[pos]
        y_t_hat = y_hat[pos]

        # Update emission features.
        num_labels += 1
        if y_t_true != y_t_hat:
            num_mistakes += 1
            true_emission_features = self.feature_mapper.get_emission_features(sequence, pos, y_t_true)
            self.parameters[true_emission_features] += self.learning_rate
            hat_emission_features = self.feature_mapper.get_emission_features(sequence, pos, y_t_hat)
            self.parameters[hat_emission_features] -= self.learning_rate

        if pos > 0:
            # update bigram features
            # If true bigram != predicted bigram update bigram features
            prev_y_t_true = sequence.y[pos-1]
            prev_y_t_hat = y_hat[pos-1]
            if y_t_true != y_t_hat or prev_y_t_true != prev_y_t_hat:
                true_transition_features = self.feature_mapper.get_transition_features(
                    sequence, pos-1, y_t_true, prev_y_t_true)
                self.parameters[true_transition_features] += self.learning_rate
                hat_transition_features = self.feature_mapper.get_transition_features(
                    sequence, pos-1, y_t_hat, prev_y_t_hat)
                self.parameters[hat_transition_features] -= self.learning_rate

    pos = len(sequence.x)
    y_t_true = sequence.y[pos-1]
    y_t_hat = y_hat[pos-1]

    if y_t_true != y_t_hat:
        true_final_features = self.feature_mapper.get_final_features(sequence, y_t_true)
        self.parameters[true_final_features] += self.learning_rate
        hat_final_features = self.feature_mapper.get_final_features(sequence, y_t_hat)
        self.parameters[hat_final_features] -= self.learning_rate

    return num_labels, num_mistakes

    #################################
    # End of solution to Exercise 3.3
    #################################

Exercise 3.4 Repeat Exercises 3.1–3.2 using the structured perceptron algorithm instead of a CRF. Report the results. Here is the code for the simple feature set:


In [6]:
print "Perceptron Exercise"

feature_mapper = idfc.IDFeatures(train_seq)
feature_mapper.build_features()

sp = spc.StructuredPerceptron(corpus.word_dict, corpus.tag_dict, feature_mapper)
sp.num_epochs = 20
sp.train_supervised(train_seq)

pred_train = sp.viterbi_decode_corpus(train_seq)
pred_dev = sp.viterbi_decode_corpus(dev_seq)
pred_test = sp.viterbi_decode_corpus(test_seq)
eval_train = sp.evaluate_corpus(train_seq, pred_train)
eval_dev = sp.evaluate_corpus(dev_seq, pred_dev)
eval_test = sp.evaluate_corpus(test_seq, pred_test)

print "Structured Perceptron - ID Features Accuracy Train: %.3f Dev: %.3f Test: %.3f" % (eval_train, eval_dev, eval_test)

feature_mapper = exfc.ExtendedFeatures(train_seq)
feature_mapper.build_features()

sp = spc.StructuredPerceptron(corpus.word_dict, corpus.tag_dict, feature_mapper)
sp.num_epochs = 20
sp.train_supervised(train_seq)

pred_train = sp.viterbi_decode_corpus(train_seq)
pred_dev = sp.viterbi_decode_corpus(dev_seq)
pred_test = sp.viterbi_decode_corpus(test_seq)
eval_train = sp.evaluate_corpus(train_seq, pred_train)
eval_dev = sp.evaluate_corpus(dev_seq, pred_dev)
eval_test = sp.evaluate_corpus(test_seq, pred_test)

print "Structured Perceptron - Extended Features Accuracy Train: %.3f Dev: %.3f Test: %.3f" % (eval_train, eval_dev, eval_test)


Perceptron Exercise
Epoch: 0 Accuracy: 0.656806
Epoch: 1 Accuracy: 0.820898
Epoch: 2 Accuracy: 0.879176
Epoch: 3 Accuracy: 0.907432
Epoch: 4 Accuracy: 0.925239
Epoch: 5 Accuracy: 0.939956
Epoch: 6 Accuracy: 0.946284
Epoch: 7 Accuracy: 0.953790
Epoch: 8 Accuracy: 0.958499
Epoch: 9 Accuracy: 0.955114
Epoch: 10 Accuracy: 0.959235
Epoch: 11 Accuracy: 0.968065
Epoch: 12 Accuracy: 0.968212
Epoch: 13 Accuracy: 0.966740
Epoch: 14 Accuracy: 0.971302
Epoch: 15 Accuracy: 0.968653
Epoch: 16 Accuracy: 0.970419
Epoch: 17 Accuracy: 0.971891
Epoch: 18 Accuracy: 0.971744
Epoch: 19 Accuracy: 0.973510
Structured Perceptron - ID Features Accuracy Train: 0.984 Dev: 0.835 Test: 0.840
Epoch: 0 Accuracy: 0.764386
Epoch: 1 Accuracy: 0.872701
Epoch: 2 Accuracy: 0.903458
Epoch: 3 Accuracy: 0.927594
Epoch: 4 Accuracy: 0.938484
Epoch: 5 Accuracy: 0.951141
Epoch: 6 Accuracy: 0.949816
Epoch: 7 Accuracy: 0.959529
Epoch: 8 Accuracy: 0.957616
Epoch: 9 Accuracy: 0.962325
Epoch: 10 Accuracy: 0.961148
Epoch: 11 Accuracy: 0.970567
Epoch: 12 Accuracy: 0.968212
Epoch: 13 Accuracy: 0.973216
Epoch: 14 Accuracy: 0.974393
Epoch: 15 Accuracy: 0.973951
Epoch: 16 Accuracy: 0.976600
Epoch: 17 Accuracy: 0.977483
Epoch: 18 Accuracy: 0.974834
Epoch: 19 Accuracy: 0.977042
Structured Perceptron - Extended Features Accuracy Train: 0.984 Dev: 0.888 Test: 0.890