Learning

Consider a model with

  • Observed variables $X$
  • Hidden variables $Z$
  • Parameters $\theta$

In Lecture 16 we defined learning, i.e., estimating the parameters $\theta$ from the observed data $X$, as follows:

$$ P(\theta \mid X) = \sum_{z \in Z} P(\theta, z \mid X) = \sum_{z \in Z} P(\theta \mid z, X) P(z \mid X) $$

Make sure you understand this expression: use repeated applications of Bayes' rule to show that:

$$ P(\theta, z \mid X) = P(\theta \mid z, X) \, P(z \mid X) $$
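If you get stuck, one route (a minimal derivation, using only the definition of conditional probability) is:

$$ P(\theta, z \mid X) = \frac{P(\theta, z, X)}{P(X)} = \frac{P(\theta \mid z, X)\, P(z, X)}{P(X)} = P(\theta \mid z, X)\, \frac{P(z, X)}{P(X)} = P(\theta \mid z, X)\, P(z \mid X). $$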

Conditional independence and Graphical Models

In the following exercises, we ask whether variables in a graphical model are independent, possibly conditioned on other variables. In each case, justify your answer by finding an "active path" or by showing that no such active path exists (i.e., show D-separation), as appropriate.

For example, if we ask whether $A \perp B \mid C$, you could answer no by finding an active path from $A$ to $B$, or answer yes, they are independent, by showing that $A$ and $B$ are D-separated (there are no active paths).
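If you want to sanity-check answers of this kind, networkx can test D-separation on a DAG. Below is a minimal sketch on a small hypothetical graph (the graphs for Q1–Q7 are the ones in the accompanying figures, not this one); it assumes networkx ≥ 2.8, where the function is called `d_separated` (renamed `is_d_separator` in 3.3+):

```python
import networkx as nx

# Hypothetical DAG for illustration only; the exercise's actual graphs
# are the ones shown in the figures.
G = nx.DiGraph([("A", "C"), ("B", "C"), ("C", "E"), ("E", "F")])

# d_separated(G, x, y, z): True iff every path between node sets x and y
# is blocked given the conditioning set z, i.e., x and y are D-separated.
print(nx.d_separated(G, {"A"}, {"B"}, set()))   # True: C is an unobserved collider
print(nx.d_separated(G, {"A"}, {"B"}, {"C"}))   # False: conditioning on the collider opens the path
print(nx.d_separated(G, {"A"}, {"B"}, {"F"}))   # False: F is a descendant of the collider
```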

Q1: is $A \perp B$?

Q2: is $A \perp B \mid E$?

Q3: is $A \perp B \mid C$?

Q4: is $A \perp B \mid F$?

Q5: is $X_4 \perp \left\{ {X_1, X_3} \right\} \mid X_2$?

Q6: is $X_1 \perp X_6 \mid \left\{ {X_2, X_3} \right\} $?

Q7: is $X_2 \perp X_3 \mid \left\{ {X_1, X_6} \right\} $?

Hidden Markov Models

Noisy observations $X_k$ are generated from a hidden Markov chain $Y_k$: $$ P(\vec{X}, \vec{Y}) = P(Y_1) P(X_1 \mid Y_1) \prod_{k=2}^N \left(P(Y_k \mid Y_{k-1}) P(X_k \mid Y_k)\right) $$

Problem: Find the most likely sequence

Assume you are given the parameters of the HMM, i.e. $\pi_y = P(Y_1 = y)$, $B_{i,j} = P(X_k = j \mid Y_k = i)$, and $A_{i,j} = P(Y_k = j \mid Y_{k-1} = i)$.
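To make the notation concrete, here is a minimal sketch of sampling from this generative model, assuming discrete states and observation symbols encoded as integer indices (the `pi`, `A`, `B` arrays follow the conventions above but are otherwise hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hmm(pi, A, B, T):
    """Draw one (Y, X) pair of sequences of length T from the HMM (pi, A, B)."""
    Y = np.empty(T, dtype=int)
    X = np.empty(T, dtype=int)
    Y[0] = rng.choice(len(pi), p=pi)                  # Y_1 ~ pi
    X[0] = rng.choice(B.shape[1], p=B[Y[0]])          # X_1 ~ B[Y_1, :]
    for k in range(1, T):
        Y[k] = rng.choice(A.shape[1], p=A[Y[k - 1]])  # Y_k ~ A[Y_{k-1}, :]
        X[k] = rng.choice(B.shape[1], p=B[Y[k]])      # X_k ~ B[Y_k, :]
    return Y, X

# Hypothetical 2-state, 2-symbol HMM.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
Y, X = sample_hmm(pi, A, B, T=10)
```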

Assume you are given a sequence of observed variables $X_1, \ldots, X_T$. Can you figure out how to compute the most likely sequence of unobserved variables? That is, how can we compute: $$ \arg\max_{Y_1, \ldots, Y_T} P(Y_1, \ldots, Y_T \mid X_1, \ldots, X_T) $$

Hint: dynamic programming. Try to recursively compute $V_t(i)$, the probability of the most likely (sub)sequence $Y_1, \ldots, Y_t$ that ends in $Y_t = i$, jointly with the observations $X_1, \ldots, X_t$ seen so far.
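For reference after you have tried the derivation yourself: below is a minimal NumPy sketch of this recursion (the Viterbi algorithm), assuming observations are passed as integer indices; it works in log space for numerical stability, and the `pi`, `A`, `B` arrays at the bottom are the same hypothetical example as in the sampling sketch.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden sequence Y_1..Y_T for observations X_1..X_T.

    pi[i] = P(Y_1 = i), A[i, j] = P(Y_k = j | Y_{k-1} = i),
    B[i, j] = P(X_k = j | Y_k = i).
    """
    T, S = len(obs), len(pi)
    logV = np.full((T, S), -np.inf)     # logV[t, i] = log prob of best subsequence ending in Y_t = i
    back = np.zeros((T, S), dtype=int)  # backpointers to the best previous state

    logV[0] = np.log(pi) + np.log(B[:, obs[0]])
    for t in range(1, T):
        # scores[i, j] = logV[t-1, i] + log A[i, j]: extend each path by one step.
        scores = logV[t - 1][:, None] + np.log(A)
        back[t] = np.argmax(scores, axis=0)
        logV[t] = scores[back[t], np.arange(S)] + np.log(B[:, obs[t]])

    # Trace the best path backwards from the best final state.
    path = [int(np.argmax(logV[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Hypothetical 2-state, 2-symbol HMM (same as the sampling sketch).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(viterbi([0, 0, 1], pi, A, B))
```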


In [ ]: