[Your name]

Homework 2

The maximum score of this homework is 100 points. Grading is listed in this table:

Grade   Score range
5       85+
4       70-84
3       55-69
2       40-54
1       0-39

Most exercises include tests which should pass if your solution is correct. However, passing tests do not guarantee that your solution is correct. You are free to add more tests.

Deadline

Monday, November 20th, 2017, 23:59

Main exercise (60 points)

Implement the Viterbi algorithm for $k=3$ (a trigram model).

The input can be found here. You can use the 1, 10, or 100 million word corpus; it is advised to use the 1 million word corpus while you are developing, and you can try the larger ones afterwards.

Write a class called Viterbi with the following attributes:

  • __init__: has no arguments except self
  • train: one argument (besides self), an iterable object which generates 2-tuples of strings: [(word1, pos1), (word2, pos2), ...]
    • initializes the vocabularies, empirical probabilities and any other data attributes needed for the algorithm
    • you can read the data (generator) only once (see the single-pass sketch below)
    • returns None
  • predict: one argument (besides self), an iterable object: a list of words
    • returns the predicted label sequence: a list of labels with the same length as the input

Don't use global variables!
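
Since the generator can be consumed only once, one option is to accumulate all counts in a single pass. Here is a minimal sketch (the names count_pairs, emission and tag_count are hypothetical; data stands for the training generator):

In [ ]:
from collections import defaultdict

def count_pairs(data):
    """Single-pass sketch: accumulate emission and tag counts."""
    emission = defaultdict(int)   # (pos, word) -> count
    tag_count = defaultdict(int)  # pos -> count
    for word, pos in data:        # the generator is consumed exactly once
        emission[(pos, word)] += 1
        tag_count[pos] += 1
    return emission, tag_count

# P(word | pos) can then be estimated as emission[(pos, word)] / tag_count[pos].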

Hint

  • Use a 3-dimensional array for the transition probabilities $\mathbb{P}(v \ | \ w, u)$.
  • Use a 3-dimensional array for the Viterbi table, one index for time, one index for the previous state and one index for the state before that.
  • Same for the backtracking table.
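
A rough sketch of the suggested shapes (n_labels and n_words are hypothetical names for the size of the label set and the length of the sentence being decoded):

In [ ]:
import numpy

n_labels, n_words = 40, 5  # hypothetical sizes

# transitions[w, u, v] stores P(v | w, u)
transitions = numpy.zeros((n_labels, n_labels, n_labels))

# pi[k, u, v]: best score of a label sequence ending in (u, v) at position k
pi = numpy.zeros((n_words, n_labels, n_labels))

# back[k, u, v]: the label w that achieved the maximum in pi[k, u, v]
back = numpy.zeros((n_words, n_labels, n_labels), dtype=int)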

In [ ]:
import numpy

In [ ]:
class Viterbi(object):
    pass

with open("umbc.casesensitive.word_pos.1M.txt", "r", encoding="utf8", errors="ignore") as f:
    generator1 = (line.strip().split("\t") for line in f)               # split each line into fields
    generator2 = (fields for fields in generator1 if len(fields) == 2)  # keep only (word, pos) pairs

    viterbi = Viterbi()
    viterbi.train(generator2)

In [ ]:
assert(viterbi.predict("You talk the talk .".split()) == ['PRP', 'VB', 'DT', 'NN', '.'])
print(viterbi.predict("The dog .".split()))
print(viterbi.predict("The dog runs .".split()))
print(viterbi.predict("The dog runs slowly .".split()))
print(viterbi.predict("The dog 's run was slow .".split()))

Small exercise 1. (10 points)

Modify the Viterbi class: use a logarithmic scale for probabilities.

In the Viterbi table instead of $$ \pi(k,u,v) = \max_{w\in L} \pi(k-1, w, u)\cdot \mathbb{P}(v \ | \ w,u)\cdot \mathbb{P}(w_k \ | \ v) $$ use $$ \hat\pi(k,u,v) = \max_{w\in L} \left[ \hat\pi(k-1,w,u) + \log\mathbb{P}(v \ | \ w,u) + \log\mathbb{P}(w_k \ | \ v) \right] $$

Note that the minimum probability is $0$, but the corresponding logarithm is $-\infty$. Both numpy and Python floats can handle minus infinity.
Precalculate the log-probabilities in the initializer, not during the dynamic programming.

This should not affect the result, only the numbers in the Viterbi table.

Name the log-scaled implementation ViterbiLog; it can inherit from Viterbi or it can be a whole new class.
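
One way to precompute the log scale, as a minimal sketch (transitions is a hypothetical numpy array of probabilities):

In [ ]:
import numpy

transitions = numpy.array([[0.0, 0.5], [0.25, 0.25]])  # hypothetical probabilities

# log(0) becomes -inf; errstate silences the divide-by-zero warning
with numpy.errstate(divide="ignore"):
    log_transitions = numpy.log(transitions)

print(log_transitions)  # the zero entry shows up as -inf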


In [ ]:
with open("umbc.casesensitive.word_pos.1M.txt", "r", encoding="utf8", errors="ignore") as f:
    generator1 = (line.strip().split("\t") for line in f)               # split each line into fields
    generator2 = (fields for fields in generator1 if len(fields) == 2)  # keep only (word, pos) pairs

    viterbi_log = ViterbiLog()
    viterbi_log.train(generator2)

In [ ]:
assert(viterbi.predict("The dog runs slowly .".split()) == viterbi_log.predict("The dog runs slowly .".split()))

Small exercise 2. (30 points)

a) 15 points

Modify the Viterbi class: use sparse storage for the transition probabilities, not a 3-dimensional array.

Use a dict to store the frequencies of the 2- and 3-tuples of labels.

For example, if you saw "adjective noun" 10 times and "adjective noun determiner" 5 times, then store the following:


In [ ]:
{('JJ', 'NN'): 10, ('JJ', 'NN', 'DT'): 5}

In the example $\mathbb{P}(DT \ | \ JJ, NN ) = 5 / 10 = 0.5$

Note that whenever the count $\#(JJ, NN) = 0$ or $\#(JJ, NN, DT) = 0$ (the label pair or triple never occurs), define $\mathbb{P}(DT \ | \ JJ, NN ) = 0$.

Implement this in a new class ViterbiSparse; it can inherit from the original one or it can be a new class.
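
A minimal sketch of this counting scheme (counts and trans_prob are hypothetical names, and the toy data below is made up to reproduce the example):

In [ ]:
from collections import defaultdict

counts = defaultdict(int)

# during training, update both the pair and the triple for every label trigram
for w, u, v in [("JJ", "NN", "DT")] * 5 + [("JJ", "NN", "VB")] * 5:
    counts[(w, u)] += 1
    counts[(w, u, v)] += 1

def trans_prob(w, u, v):
    """P(v | w, u); zero whenever the pair or the triple was never seen."""
    pair = counts.get((w, u), 0)
    return counts.get((w, u, v), 0) / pair if pair else 0.0

print(trans_prob("JJ", "NN", "DT"))  # 0.5, matching the example above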

b) 15 points

Try to find a sparse representation (with dicts) which makes the inner for loop shorter. Note that you don't have to take the maximum over all $w\in L$, if you already know that some transition probabilities are zero.

$$ \max_{\substack{w\in L \\ \mathbb{P}(v \ | \ w,u) > 0}} \pi(k-1, w, u)\cdot \mathbb{P}(v \ | \ w,u)\cdot \mathbb{P}(w_k \ | \ v) $$
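
One possibility, as a sketch (predecessors is a hypothetical name): index the nonzero transition probabilities by the pair $(u, v)$, so that the inner maximum only visits labels $w$ that can actually precede it.

In [ ]:
from collections import defaultdict

# (u, v) -> [(w, P(v | w, u)), ...], keeping only nonzero probabilities
predecessors = defaultdict(list)

toy_probs = {("JJ", "NN", "DT"): 0.5}  # hypothetical nonzero transitions
for (w, u, v), p in toy_probs.items():
    predecessors[(u, v)].append((w, p))

# In the recursion, the maximum over w then iterates predecessors[(u, v)]
# instead of the whole label set L.
print(predecessors[("NN", "DT")])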

In [ ]: