This notebook implements filtering for a simple HMM with binary state and evidence variables, walking you through an example step by step.
You will also see how the HMM can be represented as two matrices, so that both stages of filtering reduce to efficient matrix multiplications.
In [ ]:
import numpy as np
In [ ]:
class BinaryHMM:
    """
    startprob: np.array(shape=(2,))
    transmat: np.array(shape=(2,2)) - first column is P(X|x), second is P(X|~x)
    emissionprob: np.array(shape=(2,2)) - first column is P(E|x), second is P(E|~x)
    """
    def __init__(self, startprob, transmat, emissionprob):
        self._startprob = startprob
        self._transmat = transmat
        self._emissionprob = emissionprob
        self._belief = startprob

    def timestep(self):
        self._belief = np.dot(self._transmat, self._belief)

    def evidence(self, e):
        row = 1 - e  # evidence True is row 0, False is row 1
        p_e = self._emissionprob[row] * self._belief  # reweight belief by likelihood
        self._belief = p_e / p_e.sum()  # normalize to sum to 1

    def filter(self, e):
        self.timestep()
        self.evidence(e)

    def reset(self, startprob=None):
        if startprob is not None:
            self._startprob = startprob
        if self._startprob is not None:
            self._belief = self._startprob
Let's take an example: the hidden variable is whether your advisor hates you, and the evidence is whether they sent you a curt email. We will assume that their mood doesn't change throughout the day and that they send you exactly one email per day.
Initially, we do not know anything about whether they hate you on Day 1 of the semester.
$$ P(X_1=hates) = 0.5 \quad \text{and therefore} \quad P(X_1=\neg hates) = 0.5 $$
In other words, we start with a uniform distribution over our belief at time $t=1$. We will simplify by writing just $P(X_i=h)$ and $P(X_i=\neg h)$.
The good thing about this advisor is that they don't hold a grudge. Or, more likely, they don't remember stuff from one day to the next. This is lucky, because if they hate you at time $t=i$, you can't say much about $t=i+1$. It's a complete toss-up what will happen the next day.
Formally:
$$ P(X_{i+1}=h | X_i=h) = 0.5 \quad \text{and therefore} \quad P(X_{i+1}=\neg h | X_i=h) = 0.5 $$
If they don't hate you at $t=i$, your odds are actually better the next day. The chance of them suddenly starting to hate you is small, a mere 10%.
$$ P(X_{i+1}=h | X_i=\neg h) = 0.1 \quad \text{and therefore} \quad P(X_{i+1}=\neg h | X_i=\neg h) = 0.9 $$
The only way for us to guess whether they hate us is based on whether they sent us a curt email. From talking to other students, we discover that there's a pattern to their email sending, conditioned on their feelings toward us. If they hate us, they are almost certain to send us a curt email:
$$ P(e_i=curt | X_i=h) = 0.99 \quad \text{and therefore} \quad P(e_i=\neg curt | X_i=h) = 0.01 $$
Again, we will shorten this to $P(e_i=c)$.
It turns out that if they don't hate us, they will still usually send a curt email, just because they're insanely busy. In fact, that happens 70% of the time when they don't hate us.
$$ P(e_i=c | X_i=\neg h) = 0.7 \quad \text{and therefore} \quad P(e_i=\neg c | X_i=\neg h) = 0.3 $$
In [ ]:
# These are all the vectors / matrices representing the above model
startprob = np.array([.5, .5])                       # [P(h), P(~h)] on day 1
transmat = np.array([[0.5, 0.1], [0.5, 0.9]])        # columns: given h, given ~h
emissionprob = np.array([[0.99, 0.7], [0.01, 0.3]])  # rows: curt, not curt; columns: given h, given ~h
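As a quick sanity check (an optional addition, not part of the original walkthrough), each column of these matrices is a conditional distribution and should therefore sum to 1:
In [ ]:
# Each column is a conditional distribution, so each column must sum to 1.
print(transmat.sum(axis=0))        # [1. 1.]
print(emissionprob.sum(axis=0))    # [1. 1.]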
Let's initialize the HMM with this model. As you can see, the current belief $B(X_1)$ is 50%-50%.
In [ ]:
model = BinaryHMM(startprob, transmat, emissionprob)
print(model._belief)
You wait a day, and now have a new belief based on your Transition Model. This is $B^*(X_2)$, or the intermediate belief after time has passed, but before you got today's email.
This time-step belief update is given by
$$ B^*(X_{t+1}) = \sum_{x_t} B(x_t)P(X_{t+1} | x_t) $$
Or in our case:
$$ B^*(X_2=h) = 0.5 \times 0.5 + 0.5 \times 0.1 = 0.25 + 0.05 = 0.3 $$
Read this as: what's the probability of them hating us today, knowing only the belief from the previous day?
Either they hated us yesterday (probability 0.5), in which case it's a toss-up today (0.5), or they didn't (also probability 0.5), in which case it's unlikely that they hate us today (0.1).
The probability of them not hating us is the complement, $1 - 0.3 = 0.7$; you could equally calculate it using the second row of the transition matrix.
Also notice that this update can be compactly represented as the dot product of the transition matrix with the previous day's belief vector. In our code it looks like this:
def timestep(self):
    self._belief = np.dot(self._transmat, self._belief)
In [ ]:
model.timestep()
print(model._belief)
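You can verify this number by hand with a plain dot product (a small check added here for illustration):
In [ ]:
# The time step is just a matrix-vector product; this reproduces the 0.3.
print(np.dot(transmat, startprob))  # [0.3 0.7]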
Now you got your daily email, and what do you know? It's curt.
You want to update your belief based on the evidence and get $B(X_2)$.
This evidence-based belief update is done by reweighting each of your belief values by the likelihood (the Bayesian "flip") of the evidence given that value. In other words, you take your intermediate belief $B^*(X_{t+1})$, multiply each value by the probability of seeing the specific evidence given that value, and then normalize so the posterior sums to 1.
Formally:
$$ B(x_{t+1}) = \frac{B^*(x_{t+1})P(e_{t+1} | x_{t+1})}{P(e_{t+1})} $$
Remember that the denominator is just the sum of all the values of the numerator.
Let's calculate the numerators:
$$ B(X_2=h) \propto 0.3 \times 0.99 = 0.297 $$
Read this as: the product of the current belief (probability) of being hated times the likelihood of the curt-email evidence if the advisor actually hated you.
And conversely:
$$ B(X_2=\neg h) \propto 0.7 \times 0.7 = 0.49 $$
Read this as: the product of the current belief of not being hated times the likelihood of the curt-email evidence if there is no hatred involved. Because they're so busy, that is still quite likely (70%).
To get the actual probability for $B(X_2=h)$, we normalize by the sum of all the numerator values:
$$ B(X_2=h) = \frac{0.297}{0.297+0.49} \approx 0.377 $$
And conversely:
$$ B(X_2=\neg h) = \frac{0.49}{0.297+0.49} \approx 0.623 $$
In our code it looks like this:
def evidence(self, e):
    row = 1 - e  # evidence True is row 0, False is row 1
    p_e = self._emissionprob[row] * self._belief  # reweight belief by likelihood
    self._belief = p_e / p_e.sum()  # normalize to sum to 1
In [ ]:
model.evidence(True)
print(model._belief)
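Again, you can reproduce these numbers by hand (a small illustrative check):
In [ ]:
# Reweight the intermediate belief [0.3, 0.7] by the curt-email likelihoods
# (row 0 of emissionprob), then normalize.
p_e = emissionprob[0] * np.array([0.3, 0.7])
print(p_e / p_e.sum())  # approx. [0.377 0.623]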
Looking at these numbers, it is interesting to note that even though the email was curt, our model infers that the probability of the advisor hating us is still low, and close to the pre-evidence value of 0.3. It is slightly higher because curt emails are more likely when they hate us, but the "prior" of 0.3 and the high likelihood of curt emails even without hate held the increase back.
The transition model made our original estimate go from 0.5 to 0.3 because, let's face it, the advisor most likely does not hate you. They are just busy. The transition model also says that even if they do hate you, they generally forget about it by the next day, which also contributes to the decline in $P(X_t=h)$.
Let's see what happens when we run this for ten more days, with you receiving a curt email every day.
In [ ]:
for i in range(10):
    model.timestep()
    print("After Transition")
    print(model._belief)
    model.evidence(True)
    print("After Curt Email Evidence")
    print(model._belief)
    print("------------")
Every day that passes, the probability of hatred goes down ($B^*$, before evidence), and with every curt email it goes back up. Eventually, our belief settles into an oscillation between roughly 0.21 and 0.27. This convergence is due to the fact that our evidence is constant.
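As a quick check of that steady state, here is a minimal sketch (the fresh `probe` model and the convergence tolerance are illustrative choices, not part of the original walkthrough) that iterates the filter under constant curt-email evidence until the belief stops changing:
In [ ]:
# Iterate time step + curt-email evidence on a fresh model until the
# post-evidence belief converges; the tolerance is an arbitrary choice.
probe = BinaryHMM(startprob, transmat, emissionprob)
prev = np.zeros(2)
while not np.allclose(prev, probe._belief, atol=1e-10):
    prev = probe._belief.copy()
    probe.timestep()
    b_star = probe._belief.copy()  # B*, after the transition
    probe.evidence(True)
print("B* after transition:", b_star)         # approx. [0.21 0.79]
print("B after evidence:   ", probe._belief)  # approx. [0.27 0.73]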
What would happen if we get a long email all of a sudden? Try to guess before running the code. Then try to calculate it by hand.
In [ ]:
model.timestep()
print("After Transition")
print(model._belief)
model.evidence(False)
print("After Long Email Evidence")
print(model._belief)
You can see that since the likelihood of a long email when the advisor hates you is so low (0.01), even a single long email drastically reduces the belief $B(X_t=h)$ to less than 0.01. Numerically speaking, this drop happened because that tiny likelihood (long email given hate) was multiplied by the already low probability of the hidden state (hating).
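Here is that single update reproduced by hand (a sketch; 0.2716 is approximately the converged post-evidence belief from the loop above):
In [ ]:
# One more time step, then a long-email (not curt) evidence update.
b = np.array([0.2716, 0.7284])   # approx. converged belief
b_star = np.dot(transmat, b)     # approx. [0.209 0.791]
p_e = emissionprob[1] * b_star   # row 1 = P(~curt | X)
print(p_e / p_e.sum())           # approx. [0.0087 0.9913]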
Now try out some different models of the system. What if the advisor does hold grudges? How would you model that? What if they are likely to send long emails when they don't hate you? How do these changes affect the dynamics of the inference system? A sketch of the first variation follows.
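For instance, a grudge-holding advisor might be modeled with a "stickier" transition matrix. The numbers below are made up purely for illustration:
In [ ]:
# A hypothetical grudge-holding advisor: hatred tends to persist day to day.
# These transition probabilities are illustrative, not part of the original model.
grudge_transmat = np.array([[0.9, 0.1],
                            [0.1, 0.9]])
grudge_model = BinaryHMM(startprob, grudge_transmat, emissionprob)
for i in range(10):
    grudge_model.filter(True)  # a curt email every day
print(grudge_model._belief)
Because hatred now persists from day to day, a streak of curt emails pushes the hate belief well above the 0.27 steady state we saw earlier.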
Enjoy, and remember: your advisor is just busy!