A (discrete time) linear dynamical system describes the evolution of the state of a system and the observations that can be obtained from the state.
The system is described at times $t=1,2,\dots$. At each time $t$, the state is denoted by an $N \times 1$ state vector $x_t$. We do not observe the states directly; instead, we obtain observations that are related to the states, denoted by the $M \times 1$ observation vector $y_t$. The mathematical model is
\begin{eqnarray} x_0 & \sim & {\mathcal N}(\mu, P) \\ x_{t} & = & A x_{t-1} + \epsilon_t \\ y_{t} & = & C x_{t} + \nu_t \end{eqnarray}Here, $\epsilon_t$ and $\nu_t$ are assumed to be zero mean Gaussian random variables with covariance matrices $Q$ and $R$, respectively.
Equivalently, we can express the system as a hierarchical generative model \begin{eqnarray} x_0 & \sim & {\mathcal N}(\mu, P) \\ x_{t}|x_{t-1} & \sim & {\mathcal N}(A x_{t-1}, Q) \\ y_{t}|x_{t} & \sim & {\mathcal N}(C x_{t}, R) \end{eqnarray} for $t=1\dots T$.
Both formulations are equivalent; the former is more common in engineering and the latter in statistics. Later, we will describe a few small extensions to the model.
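The hierarchical generative model is easy to simulate. Here is a minimal sketch (assuming the parameters $\mu, P, A, Q, C, R$ are given as NumPy arrays of compatible sizes; the function name is only illustrative):

import numpy as np

def sample_lds(mu, P, A, Q, C, R, T, seed=0):
    # Draw x_{0:T} and y_{1:T} from the linear dynamical system
    rng = np.random.default_rng(seed)
    N, M = A.shape[0], C.shape[0]
    x = np.zeros((T + 1, N))
    y = np.zeros((T, M))
    x[0] = rng.multivariate_normal(mu.ravel(), P)        # x_0 ~ N(mu, P)
    for t in range(1, T + 1):
        x[t] = rng.multivariate_normal(A @ x[t-1], Q)    # x_t | x_{t-1} ~ N(A x_{t-1}, Q)
        y[t-1] = rng.multivariate_normal(C @ x[t], R)    # y_t | x_t ~ N(C x_t, R)
    return x, y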
To understand qualitatively what the LDS is doing, we can examine the meaning of each parameter. The parameters reflect our prior knowledge about the initial state, the dynamics, and the observation process.
The initial state mean $\mu$ is an $N \times 1$ vector, and the initial state covariance $P$ is an $N \times N$ positive semidefinite matrix. These two parameters reflect our prior knowledge about the initial state of the process: the expected state and the uncertainty around it, as characterized by the covariance matrix.
Think of the pair $(\mu, P)$ as defining an ellipse; the initial state $x_0$ is most likely located somewhere in this ellipse.
The state transition matrix $A$ is $N \times N$, and the state transition noise covariance matrix $Q$ is $N \times N$, diagonal and positive semidefinite. The state transition matrix $A$ describes the dynamic behaviour of the linear dynamical system: if there were no uncertainty about the dynamics, the system would simply undergo a linear transformation $$ x_t = A x_{t-1} $$ The model also includes an additive noise term that lets us represent random deviations from an exactly deterministic linear transformation. The deviations are assumed to be zero mean with covariance matrix $Q$.
Given the previous state $x_{t-1}$, think of the pair $(A x_{t-1}, Q)$ as defining an ellipse; the current state $x_t$ is most likely located somewhere in this ellipse.
Finally, the observation matrix $C$ is $M \times N$, and the observation noise covariance matrix $R$ is $M \times M$, diagonal and positive semidefinite.
Example: A point object moving with (almost) constant velocity.
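As a concrete instance of this example (a sketch assuming a unit time step and a one-dimensional position), let the state collect the position and velocity of the object, $x_t = (p_t, \dot{p}_t)^\top$, let the velocity stay (almost) constant, and observe only the position: \begin{eqnarray} A & = & \left( \begin{array}{cc} 1 & 1 \\ 0 & 1 \end{array} \right), \qquad C = \left( \begin{array}{cc} 1 & 0 \end{array} \right) \end{eqnarray} so that $p_t = p_{t-1} + \dot{p}_{t-1} + \text{noise}$, $\dot{p}_t = \dot{p}_{t-1} + \text{noise}$, and $y_t$ is a noisy measurement of $p_t$. This is exactly the model used in the small numerical example further below.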
The inputs to the algorithm are the model parameters and a sequence of observations $y_t$ for $t=1 \dots T$, where each $y_t$ is $M \times 1$.
The outputs of the algorithm are, for $t=1\dots T$:
-- Predicted state mean: $m_{t|t-1}$
-- Predicted state covariance: $\Sigma_{t|t-1}$
-- Filtered state mean: $m_{t|t}$
-- Filtered state covariance: $\Sigma_{t|t}$
-- The accumulated log-likelihoods $l_t = \log p(y_{1:t}) = \sum_{k=1}^{t} \log p(y_k \mid y_{1:k-1})$
The Kalman filtering algorithm is as follows:
The algorithm is initialized by \begin{eqnarray} m_{1|0} & = & \mu \\ \Sigma_{1|0} & = & P \\ l_0 & = & 0 \end{eqnarray}
For $t = 1\dots T$:

Update step (condition on the observation $y_t$): \begin{eqnarray} S_{t} & = & C \Sigma_{t|t-1} C^\top + R \\ G_{t} & = & \Sigma_{t|t-1} C^\top S_{t}^{-1} \\ e_{t} & = & y_t - C m_{t|t-1} \\ m_{t|t} & = & m_{t|t-1} + G_{t} e_{t} \\ \Sigma_{t|t} & = & \Sigma_{t|t-1} - G_{t} C \Sigma_{t|t-1} \\ l_t & = & l_{t-1} - \frac{1}{2}\log \det(2\pi S_t) - \frac{1}{2} e_t^\top S_t^{-1} e_t \end{eqnarray}

Prediction step (propagate to the next time step): \begin{eqnarray} m_{t+1|t} & = & A m_{t|t} \\ \Sigma_{t+1|t} & = & A \Sigma_{t|t} A^\top + Q \end{eqnarray}

End For
In [24]:
import numpy as np

# Randomly generate a small test model
N = 2    # state dimension
M = 3    # observation dimension
A = np.matrix(np.ceil(5*np.random.randn(N,N)))    # state transition matrix
C = np.matrix(np.ceil(5*np.random.randn(M,N)))    # observation matrix
R = np.matrix(2*np.eye(M))                        # observation noise covariance
Q = np.matrix(0.1*np.eye(N))                      # state transition noise covariance
mu = np.matrix(10*np.ones((N,1)))                 # initial state mean
P = np.matrix(100*np.eye(N))                      # initial state covariance
print('mu=\n',mu)
print('P=\n',P)
print('A=\n',A)
print('C=\n',C)
print('Q=\n',Q)
print('R=\n',R)
T = 3
y = np.matrix(np.ceil(5*np.random.randn(M,T)))    # random test observations, one column per time step
print('observations=\n',y)
The loglikelihood computation seems to require the inversion of an $M \times M$ matrix and the computation of its determinant. We can circumvent this with a simple trick that works when the observation noise covariance matrix $R$ is diagonal: we update the state by processing the entries of the observation one at a time. This replaces the update step with operations in which only scalar inversions and divisions are needed.
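The trick works because, when $R$ is diagonal, the entries of $y_t$ are conditionally independent given $x_t$; combined with the chain rule, the predictive density factorizes into scalar terms, $$ \log p(y_t \mid y_{1:t-1}) = \sum_{i=1}^{M} \log p(y_t[i] \mid y_t[1:i-1], y_{1:t-1}) $$ and each factor is a univariate Gaussian whose variance is the scalar $S_{t,i}$ computed below, so each conditioning step is a Kalman update with a one-dimensional observation.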
Consider the observation matrix
\begin{eqnarray} C & = & \left( \begin{array}{c} c_1 \\ c_2 \\ \vdots \\ c_M \end{array} \right) \end{eqnarray}where the $c_i$ are row vectors. For example, if $$ C = \left( \begin{array}{ccc} 1 & 2 & 4 \\ 3 & 0 & 3 \end{array} \right) $$ then $c_1 = (1,2,4)$ and $c_2 = (3,0,3)$.
Similarly, assume that $R$ is diagonal with $$ R = \left( \begin{array}{cccc} r_1 & & & \\ & r_2 & & \\ & & \ddots & \\ & & & r_M \end{array} \right) $$
For example, if $$ R = \left( \begin{array}{cc} 1 & 0 \\ 0 & 4 \end{array} \right) $$ then $r_1 = 1$ and $r_2 = 4$.
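In NumPy, the rows $c_i$ and the diagonal entries $r_i$ can be picked out directly from $C$ and $R$; a small illustration with the example matrices above (the variable names are only for illustration):

import numpy as np

C = np.array([[1., 2., 4.],
              [3., 0., 3.]])     # rows c_1, c_2
R = np.diag([1., 4.])            # diagonal entries r_1, r_2

c_1 = C[0, :]     # c_1 = (1, 2, 4)
r_1 = R[0, 0]     # r_1 = 1.0

This is the same indexing, C[i,:] and R[i,i], used in the reference implementation below.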
The new algorithm is as follows:
The algorithm is initialized by $$ m_{1|0} = \mu $$ $$ \Sigma_{1|0} = P $$ $$ l_0 = 0 $$
For $t = 1\dots T $ (OuterFor)
Define the temporary covariance and mean $$ W_0 = \Sigma_{t|t-1} $$ $$ v_0 = m_{t|t-1} $$ $$ \lambda_0 = l_{t-1} $$
For $i=1 \dots M$ (InnerFor) $$ S_{t,i} = c_i W_{i-1} c_i^\top + r_i \\ $$ $$ G_{t,i} = \frac{1}{S_{t,i}} W_{i-1} c_i^\top \\ $$ $$ e_{t,i} = y_t[i] - c_i v_{i-1} \\ $$ $$ v_i = v_{i-1} + G_{t,i} e_{t,i} \\ $$ $$ W_i = W_{i-1} - G_{t,i} c_i W_{i-1} \\ $$ $$ \lambda_i = \lambda_{i-1} - \frac{1}{2}\log (2\pi S_{t,i}) - \frac{e_{t,i}^2}{2 S_{t,i}} $$ EndInnerfor
\begin{eqnarray} \Sigma_{t|t} & = & W_M \\ m_{t|t} & = & v_M \\ l_t & = & \lambda_M \end{eqnarray}EndOuterFor
Your task is to implement the Kalman filter with incremental updates as an object in C++. All parameters will be specified when constructing the object, and the observations will be provided to the filter as an $M \times T$ matrix.
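For concreteness, here is a minimal sketch of the intended interface, written in Python to stay consistent with the rest of this notebook (the class and method names are illustrative assumptions, not part of the assignment, and the parameters are assumed to be NumPy arrays with mu of shape $N \times 1$); the C++ object would expose an analogous constructor and filtering method:

import numpy as np

class KalmanFilter:
    # Kalman filter with incremental (per-observation-entry) updates
    def __init__(self, A, C, Q, R, mu, P):
        # All model parameters are fixed at construction time
        self.A, self.C, self.Q, self.R = A, C, Q, R
        self.mu, self.P = mu, P

    def filter(self, Y):
        # Y is an M x T matrix; returns filtered means, covariances and loglikelihoods
        M, T = Y.shape
        m_pred, Sig_pred, l = self.mu, self.P, 0.0
        means, covs, logliks = [], [], []
        for t in range(T):
            W, v, lam = Sig_pred, m_pred, l
            for i in range(M):                       # process one observation entry at a time
                c = self.C[i, :].reshape(1, -1)
                S = float(c @ W @ c.T) + self.R[i, i]
                G = (W @ c.T) / S
                e = float(Y[i, t] - c @ v)
                v = v + G * e
                W = W - G @ c @ W
                lam = lam - 0.5 * np.log(2 * np.pi * S) - e * e / (2 * S)
            means.append(v); covs.append(W); logliks.append(lam)
            m_pred = self.A @ v
            Sig_pred = self.A @ W @ self.A.T + self.Q
            l = lam
        return means, covs, logliks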
In [18]:
# Constant velocity model: state = (position, velocity), only the position is observed
A = np.mat('[1, 1; 0, 1]')      # state transition matrix
C = np.mat('[1, 0]')            # observation matrix
Q = 0.01*np.eye(2)              # state transition noise covariance
R = 0.01*np.mat('[1]')          # observation noise covariance
mu = np.mat('[0;1]')            # initial state mean
P = 0.01*np.eye(2)              # initial state covariance
y = np.mat('[0, 1, 2.2]')       # observed positions
N = 2
M = 1
T = y.size
Below is a not very efficient and numerically unstable, but nevertheless readable, implementation of the Kalman filter for your reference. For small model sizes $M, N, T$, the results provide a baseline to test against. Cross-check that your implementation produces approximately the same numerical results.
In [25]:
# Standard (batch) Kalman filter: one vector-valued update per time step
m_pred = mu       # m_{1|0}
Sig_pred = P      # Sigma_{1|0}
l = 0
for t in range(T):
    # Update step: condition on the observation y_t
    S = C*Sig_pred*C.T + R            # innovation covariance
    G = Sig_pred*C.T*(S.I)            # Kalman gain
    e = y[:,t] - C*m_pred             # innovation (prediction error)
    m = m_pred + G*e                  # filtered mean m_{t|t}
    Sig = Sig_pred - G*C*Sig_pred     # filtered covariance Sigma_{t|t}
    l = l - 0.5*np.log(np.linalg.det(2*np.pi*S)) - 0.5*e.T*S.I*e   # accumulated loglikelihood
    print('m_{}'.format(t+1), '=\n', m)
    print('Sig_{}'.format(t+1), '=\n', Sig)
    print('l_{}'.format(t+1), '=\n', l)
    print('\n')
    # Prediction step: propagate to time t+1
    Sig_pred = A*Sig*A.T + Q          # Sigma_{t+1|t}
    m_pred = A*m                      # m_{t+1|t}
In [26]:
# Kalman filter with incremental updates: process each observation entry separately
m_pred = mu
Sig_pred = P
l = 0
for t in range(T):
    # Temporary covariance, mean and loglikelihood
    W = Sig_pred
    v = m_pred
    lam = l
    for i in range(M):
        S = C[i,:]*W*C[i,:].T + R[i,i]   # scalar innovation variance S_{t,i}
        G = W*C[i,:].T/S                 # gain for the i-th observation entry
        e = y[i,t] - C[i,:]*v            # scalar innovation e_{t,i}
        v = v + G*e
        W = W - G*C[i,:]*W
        lam = lam - 0.5*np.log(2*np.pi*S) - e*e/(2*S)
    Sig = W
    m = v
    l = lam
    print('m_{}'.format(t+1), '=\n', m)
    print('Sig_{}'.format(t+1), '=\n', Sig)
    print('l_{}'.format(t+1), '=\n', l)
    print('\n')
    Sig_pred = A*Sig*A.T + Q
    m_pred = A*m
In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pylab as plt
# Scalar LDS: simulate a latent trajectory and its noisy observations
A = 0.9      # state transition coefficient
C = 1.0      # observation coefficient
R = 0.01     # observation noise variance
Q = 0.05     # state transition noise variance
x0 = 0
T = 100
x = np.zeros((T))
y = np.zeros((T))
for t in range(T):
    if t==0:
        x[0] = x0
    else:
        x[t] = A*x[t-1] + np.sqrt(Q)*np.random.randn(1)
    y[t] = C*x[t] + np.sqrt(R)*np.random.randn(1)
plt.plot(y,'r')      # observations in red
plt.plot(x,'b')      # latent states in blue
Out[1]:
In [2]:
# Scalar Kalman filter applied to the simulated data
m = np.zeros_like(x)      # filtered means
Sig = np.zeros_like(x)    # filtered variances
m0 = 0                    # initial state mean
Sg0 = 1                   # initial state variance
for t in range(T):
    # Prediction step
    if t==0:
        m_pred = A*m0
        SG_pred = A*Sg0*A + Q
    else:
        m_pred = A*m[t-1]
        SG_pred = A*Sig[t-1]*A + Q
    # Update step
    Gt = SG_pred*C/(C*SG_pred*C+R)      # Kalman gain
    m[t] = m_pred + Gt*(y[t]-C*m_pred)
    Sig[t] = SG_pred - Gt*C*SG_pred
plt.plot(x,'.r')      # true latent states
plt.plot(m,'ob')      # filtered means
Out[2]: