In [1]:
import numpy as np
import matplotlib.pyplot as pl
%matplotlib inline

Abstract

  • HMMs can model only a sub-class of linearly dependent processes (LDPs). This article characterizes LDPs using observable operator models (OOMs); the learning algorithm has time complexity $O(N + nm^3)$, where $N$ is the length of the training set, $n$ is the number of distinguishable observations, and $m$ is the dimensionality of the model's state space

Introduction

  • LDPs introduced to decide when two HMMs are equivalent
  • See (Ito, 1992)
  • Linearly dependent processes
  • Notation:
    • Let $(X_n)_{n=0,1,2,\dots}$ be a discrete-time stochastic process
    • For $P(X_0 = a_0,\dots,X_N = a_N)$, write $P(a_0 \dots a_N)$ or $P(\bar{a})$
    • For $P(X_{N+1} = b_1, \dots, X_{N+M} = b_M \mid X_0 = a_0, \dots, X_N = a_N)$, write $P(b_1 \dots b_M \mid a_0 \dots a_N)$ or $P(\bar{b} \mid \bar{a})$
  • From HMMs to OOMs
    • In an HMM, we have a set of hidden states $S$, e.g. $S = \{ s_1,s_2 \}$, and a set of observable events $O$, e.g. $O = \{ a,b \}$
    • The transition rates between hidden states are represented by a transition matrix $M$ $$ \begin{array}{cc} M = \left[ \begin{array}{cc} 0.0 & 1.0 \\ 0.5 & 0.5 \end{array} \right] & M^T = \left[ \begin{array}{cc} 0.0 & 0.5 \\ 1.0 & 0.5 \end{array} \right] \end{array}$$
    • Observation probabilities are given by diagonal observation matrices $O_a$ and $O_b$ $$ \begin{array}{cc} O_a = \left[ \begin{array}{cc} 0.5 & 0.0 \\ 0.0 & 1.0 \end{array} \right] & O_b = \left[ \begin{array}{cc} 0.5 & 0.0 \\ 0.0 & 0.0 \end{array} \right] \end{array}$$
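The example above can be checked numerically. The sketch below builds the observable operators from $M$ and the observation matrices and computes sequence probabilities; the operator convention $T_x = M^\top O_x$ (column-vector states, emit then transition) and the choice of the stationary distribution as initial state $w_0$ are my assumptions, not fixed by these notes:

```python
import numpy as np

# Transition matrix: row i holds transition probabilities out of state s_i
M = np.array([[0.0, 1.0],
              [0.5, 0.5]])
# Diagonal observation matrices for symbols a and b (per-state emission probs)
O_a = np.diag([0.5, 1.0])
O_b = np.diag([0.5, 0.0])

# Observable operators on column-vector states: weight each state by the
# emission probability of the symbol, then propagate through transitions
T = {"a": M.T @ O_a, "b": M.T @ O_b}

# Stationary initial distribution: w0 = M^T w0, components summing to 1
w0 = np.array([1.0, 2.0]) / 3.0

def seq_prob(seq):
    """P(x_1 ... x_n) = 1^T T_{x_n} ... T_{x_1} w0."""
    w = w0
    for x in seq:
        w = T[x] @ w
    return w.sum()

print(seq_prob("a"), seq_prob("b"))  # P(a) = 5/6, P(b) = 1/6
```

A handy sanity check: the probabilities of all sequences of a fixed length sum to 1.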

[skipping ahead a little]

  • HMM vs. OOM:
    • An HMM is a structure $(\newcommand{\reals}{\mathbb{R}} \reals^m, \{ T_a, T_b \}, w_0)$
  • Definition: an OOM is a structure of the same form, $(\reals^m, \{ \tau_a : a \in O \}, w_0)$, but with general linear operators $\tau_a$ in place of the HMM-derived $T_a$, $T_b$
  • Interpretable OOMs
    • Definition:
      • Let $O$ be a finite set of observables and let $k \geq 1$. A $k$-event is a nonempty subset of $O^k$
      • Let $m \geq 1$. An $m$-dimensional OOM is interpretable with respect to $k$-events $A_1, \dots, A_m$ if the components of its states can be read as probabilities of those events
  • Learning OOMs from data:
    • In an interpretable OOM, $$\begin{align} w_0 & = (P(A_1),\dots,P(A_m)) \\ \tau_a w_0 & = (P(a A_1),\dots,P( a A_m)) \\ \tau_a \tau_b w_0 & = (P(b a A_1),\dots,P(b a A_m)) \\ & \dots \end{align}$$
    • $w_0, \tau_a w_0, \tau_a \tau_b w_0, \dots$ can be estimated from data by counting frequencies
      • Can obtain $\tau_a$ from argument-value pairs $$ \begin{align} w_0 & \to \tau_a w_0\\ \tau_b w_0 & \to \tau_a \tau_b w_0\\ & \dots \end{align} $$
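The argument-value idea above can be sketched numerically. Here I reuse the operators from the two-state HMM example as ground truth and recover $\tau_a$ by solving the linear system $\tau_a V = W_a$, where the columns of $V$ are the argument states $w_0, \tau_b w_0$ and the columns of $W_a$ are their images under $\tau_a$. In practice these vectors would be frequency estimates from data; the exact matrices and the convention $T_x = M^\top O_x$ are my assumptions:

```python
import numpy as np

# Ground-truth operators from the two-state HMM example (T_x = M^T O_x)
M = np.array([[0.0, 1.0], [0.5, 0.5]])
tau_a = M.T @ np.diag([0.5, 1.0])
tau_b = M.T @ np.diag([0.5, 0.0])
w0 = np.array([1.0, 2.0]) / 3.0  # stationary initial state

# Argument-value pairs: columns of V are states, columns of W_a their images
V = np.column_stack([w0, tau_b @ w0])                     # w0, tau_b w0
W_a = np.column_stack([tau_a @ w0, tau_a @ tau_b @ w0])   # values under tau_a

# Recover the operator: tau_a = W_a V^{-1}; this needs V to have full rank,
# i.e. the argument states must be linearly independent
tau_a_est = W_a @ np.linalg.inv(V)

print(np.allclose(tau_a_est, tau_a))  # True
```

With frequencies counted from a finite sample, the entries of $V$ and $W_a$ are only estimates, so $\tau_a$ is recovered approximately rather than exactly.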
