**(C) 2016-2019 by Damir Cavar <dcavar@iu.edu>**

**Version:** 1.1, September 2019

**Download:** This and various other Jupyter notebooks are available from my GitHub repo.

The emission probabilities for *cola*, *iced tea*, and *lemonade* when the machine is in the *Cola Pref.* state, and correspondingly when it is in the *Iced Tea Pref.* state, are summarized in the matrices below.

We can describe the machine using state and emission matrices. The state matrix would be:

| | Cola Pref. | Iced Tea Pref. |
|---|---|---|
| Cola Pref. | 0.7 | 0.3 |
| Iced Tea Pref. | 0.5 | 0.5 |

The emission matrix is:

| | cola | iced tea | lemonade |
|---|---|---|---|
| Cola Pref. | 0.6 | 0.1 | 0.3 |
| Iced Tea Pref. | 0.1 | 0.7 | 0.2 |

Since we assume that the machine always starts in the *Cola Pref.* state, we specify the following initial probability matrix, which assigns a probability of 0 to the state *Iced Tea Pref.* as a start state:

| State | Prob. |
|---|---|
| Cola Pref. | 1.0 |
| Iced Tea Pref. | 0.0 |

We will use the *numpy* module for the matrix operations:

```python
import numpy
```

The state matrix could be coded in the following way:

```python
stateMatrix = numpy.matrix("0.7 0.3; 0.5 0.5")
```

You can inspect the *stateMatrix* content:

```python
stateMatrix
```

Multiplying the state matrix by itself yields the two-step transition probabilities:

```python
stateMatrix.dot(stateMatrix)
```

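Each entry of this product can be checked by hand: it sums over the intermediate state, giving the probability of being in a given state after two transitions. A minimal sketch of that check:

```python
import numpy

stateMatrix = numpy.matrix("0.7 0.3; 0.5 0.5")
twoStep = stateMatrix.dot(stateMatrix)

# entry [0, 0]: start in Cola Pref. and be in Cola Pref. two steps later,
# either by staying twice (0.7 * 0.7) or by switching to Iced Tea Pref.
# and coming back (0.3 * 0.5)
byHand = 0.7 * 0.7 + 0.3 * 0.5
print(twoStep[0, 0], byHand)  # both 0.64, up to floating-point rounding
```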
The emission matrix can be coded in the same way:

```python
emissionMatrix = numpy.matrix("0.6 0.1 0.3; 0.1 0.7 0.2")
```

You can inspect the *emissionMatrix* content:

```python
emissionMatrix
```

And finally we can encode the initial probability matrix as:

```python
initialMatrix = numpy.matrix("1 0")
```

We can inspect it using:

```python
initialMatrix
```

Assume that we want to know how likely it is that the machine emits *cola lemonade cola* in that particular sequence. We can calculate this probability by assuming that we start in the *Cola Pref.* state, as expressed by the initial probability matrix; paths starting in any other state have a probability of $0$ and are thus irrelevant. For just two observations *cola lemonade* there are 4 possible paths through the HMM above: (*Cola Pref.*, *Iced Tea Pref.*), (*Cola Pref.*, *Cola Pref.*), (*Iced Tea Pref.*, *Iced Tea Pref.*), and (*Iced Tea Pref.*, *Cola Pref.*). For three observations *cola lemonade cola* there are 8 paths with a probability larger than $0$ through our HMM. To compute the probability that our HMM emits a certain sequence along **one path**, we **multiply** the state transition probability and the output emission probability at every time point of the observation. To compute the overall probability of the observation, we **sum up** these products over **all possible paths** through the HMM.

Assume that we start in the *Cola Pref.* state. The probability of transitioning to the *Cola Pref.* state is $P(Cola\ Pref.\ |\ Cola\ Pref.) = 0.7$, and the likelihood of emitting a *cola* there is $P(cola\ |\ Cola\ Pref.) = 0.6$. That is, for the first step we have to multiply:

$$P(Cola\ Pref.\ |\ Cola\ Pref.)\ *\ P(cola\ |\ Cola\ Pref.) = 0.7 * 0.6 = 0.42$$

From the *Cola Pref.* state we could then transition to the *Iced Tea Pref.* state with $P(Iced\ Tea\ Pref.\ |\ Cola\ Pref.) = 0.3$. The emission probability for *lemonade* there is $P(lemonade\ |\ Iced\ Tea\ Pref.) = 0.2$. The probability of this sub-path so far, given the observation *cola lemonade*, is thus:

$$P(Cola\ Pref.\ |\ Cola\ Pref.) * P(cola\ |\ Cola\ Pref.) * P(Iced\ Tea\ Pref.\ |\ Cola\ Pref.) * P(lemonade\ |\ Iced\ Tea\ Pref.) =$$

$$0.7 * 0.6 * 0.3 * 0.2 = 0.0252$$

From the *Iced Tea Pref.* state we can transition back to the *Cola Pref.* state with $P(Cola\ Pref.\ |\ Iced\ Tea\ Pref.) = 0.5$, and the probability of emitting *cola* from the target state is $P(cola\ |\ Cola\ Pref.) = 0.6$. The likelihood of observing the emissions (*cola, lemonade, cola*) in that particular sequence when starting in the *Cola Pref.* state and taking the path through the states (*Cola Pref., Iced Tea Pref., Cola Pref.*) is thus $0.7 * 0.6 * 0.3 * 0.2 * 0.5 * 0.6 = 0.00756$. To obtain the probability of the observation itself, we have to sum this up with the respective probabilities of all the other possible paths.

To compute the probability of the observation *cola lemonade cola* for a given HMM, we thus calculate the probability of every possible path through the HMM and sum these probabilities:

$$P(\mathbf{O}) = \sum_{\mathbf{S}} P(\mathbf{O} \cap \mathbf{S}) =
\sum_{s_{i_1}, \ldots, s_{i_T}} (a_{i_1 k_1} \cdot v_{i_1}) \cdot
\prod_{t=2}^T p_{i_{t-1} i_t} \cdot a_{i_t k_t} $$
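This sum can be computed directly by brute force: enumerate all $2^3 = 8$ state sequences, multiply the transition and emission probabilities along each path, and add up the path probabilities. A minimal sketch, restating the matrices above as plain dictionaries:

```python
import itertools

states = ["Cola Pref.", "Iced Tea Pref."]
# transition probabilities P(next state | current state)
A = {"Cola Pref.":     {"Cola Pref.": 0.7, "Iced Tea Pref.": 0.3},
     "Iced Tea Pref.": {"Cola Pref.": 0.5, "Iced Tea Pref.": 0.5}}
# emission probabilities P(drink | state)
B = {"Cola Pref.":     {"cola": 0.6, "iced tea": 0.1, "lemonade": 0.3},
     "Iced Tea Pref.": {"cola": 0.1, "iced tea": 0.7, "lemonade": 0.2}}

observation = ["cola", "lemonade", "cola"]
total = 0.0
for path in itertools.product(states, repeat=len(observation)):
    prob, previous = 1.0, "Cola Pref."  # the machine starts in Cola Pref.
    for state, drink in zip(path, observation):
        prob *= A[previous][state] * B[state][drink]
        previous = state
    total += prob
print(total)  # approx. 0.051585
```

One of the eight summands is the path probability $0.00756$ worked out above; the sum over all paths gives the probability of the observation itself.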

Coming soon

For training an HMM-based part-of-speech tagger we can use the treebank sample corpus that ships with NLTK. We import it using:

```python
from nltk.corpus import treebank
```

We need to import the HMM module as well, using the following code:

```python
from nltk.tag import hmm
```

We can instantiate an HMM-Trainer object and assign it to a *trainer* variable using:

```python
trainer = hmm.HiddenMarkovModelTrainer()
```

*treebank.tagged_sents()* returns a list of sentences, each of which is a list of (token, tag) tuples. The following expression returns the first two tagged sentences from the treebank corpus:

```python
treebank.tagged_sents()[:2]
```
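The required data format can also be illustrated on a tiny hand-made corpus of the same shape. The sentences and tags below are made up for illustration, but training works exactly as with the treebank data:

```python
from nltk.tag import hmm

# two hypothetical tagged sentences in the treebank.tagged_sents() format:
# a list of sentences, each a list of (token, tag) tuples
tinyCorpus = [[("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
              [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")]]

tinyTagger = hmm.HiddenMarkovModelTrainer().train_supervised(tinyCorpus)
print(tinyTagger.tag(["the", "dog", "sleeps"]))
```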

We can now train an HMM tagger on the treebank sentences using supervised training:

```python
tagger = trainer.train_supervised(treebank.tagged_sents())
```

We can print out some information about the resulting tagger:

```python
tagger
```

```python
from nltk import word_tokenize
```

The function *word_tokenize* takes a string as parameter and returns a list of tokens:

```python
word_tokenize("Today is a good day.")
```

We can tag a tokenized sentence using the *tag* method of the *tagger* object:

```python
tagger.tag(word_tokenize("Today is a good day."))
```

Bird, Steven, Ewan Klein, Edward Loper (2009) *Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit.* O'Reilly Media.

Jurafsky, Daniel, James H. Martin (2014) *Speech and Language Processing*. Online draft of September 1, 2014.

Manning, Chris and Hinrich Schütze (1999) *Foundations of Statistical Natural Language Processing*, MIT Press. Cambridge, MA.
