Dynamical systems are an invaluable concept in many areas of science, such as control theory and physics. Over the past decade, they have also found numerous applications in computer vision. In particular, with the advent of big video data (e.g., YouTube, Vimeo), people have looked into ways of describing both the dynamics and the appearance of such visual processes with generative models. This is appealing, since it allows us to synthesize observations based on the estimated parameters of the dynamical system.
Here are some important references to get started with (kernel) dynamic textures:
Dynamic texture models have recently been used in a variety of applications; here are just a few:
pydstk supports several ways to load video material. All routines are implemented in the dsutil package. Let's import this first ...
In [39]:
import dsutil.dsutil as dsutil
First, we will try to load a video (traffic sequence) from an ASCII file that contains all pixel values for each frame:
In [41]:
data, sz = dsutil.loadDataFromASCIIFile("tests/data/data1.txt")
In [42]:
data.shape
Out[42]:
Each of the 48 frames of the (48 $\times$ 48) video was vectorized into a 2304-dimensional column vector.
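To make this storage layout concrete, here is a small self-contained sketch (using a toy 4 $\times$ 4 "video" rather than the actual traffic data) showing how frames are vectorized into columns and recovered by reshaping:

```python
import numpy as np

# Toy example: a "video" of 3 frames, each 4x4 pixels.
frames = [np.arange(16).reshape(4, 4) + t for t in range(3)]

# Stack each vectorized frame as one column, mirroring the layout of `data`.
data_toy = np.column_stack([f.reshape(-1) for f in frames])
print(data_toy.shape)  # (16, 3): 16 pixels per frame, 3 frames

# Recover frame 0 by reshaping its column back to 4x4.
F0 = data_toy[:, 0].reshape((4, 4))
print(np.array_equal(F0, frames[0]))  # True
```

The same reshape trick is what we use below to visualize a single 48 $\times$ 48 frame from `data`.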
In [4]:
sz
Out[4]:
To see what the video looks like, we take a look at the first frame of the video:
In [5]:
F = data[:,0].reshape((48,48))
In [6]:
imshow(F.T, cmap=cm.gray)
Out[6]:
The (discrete-time) linear dynamical system (aka dynamic texture in a vision context) can be written as:
$x_{t+1} = \boldsymbol{A}\boldsymbol{x}_{t} + \boldsymbol{w}_t$
$y_{t} = \boldsymbol{C}\boldsymbol{x}_{t} + \boldsymbol{v}_t$
Both the state and observation evolution are linear, and the noise processes (i.e., $w_t\sim \mathcal{N}(0,\boldsymbol{R})$ and $v_t \sim \mathcal{N}(0,\boldsymbol{Q})$) are zero-mean Gaussian (with different covariances). The task of system identification is to estimate the model parameters $(\boldsymbol{A},\boldsymbol{C},\boldsymbol{R},\boldsymbol{Q},x_0)$ from a set of observations $[\boldsymbol{y}_0-\bar{y},\ldots,\boldsymbol{y}_N-\bar{y}],~\boldsymbol{y}_i \in \mathbb{R}^n$. In the case of standard dynamic textures, the observations $\boldsymbol{y}_i$ are our video frames (and $\bar{y}$ is the mean, taken over all frames of the video).
In our example, $\boldsymbol{y}_i \in \mathbb{R}^{2304}$ (each vectorized $48 \times 48$ frame) and $N=48$. Next, we will create a Linear Dynamical System (LDS) with 5 states:
In [43]:
from dscore.system import LinearDS
In [46]:
lds = LinearDS(5, False, False)
We can now run system identification to obtain the dynamic texture parameters. System identification is done via the suboptimal (SVD-based) approach that was originally proposed in [Soatto01a] (see References).
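This is not the pydstk implementation, but the core idea of the SVD-based identification can be sketched in a few lines of NumPy: take the SVD of the mean-subtracted observations, use the leading left singular vectors as $\boldsymbol{C}$, the scaled right singular vectors as the state trajectory, and solve a least-squares problem for $\boldsymbol{A}$. The data below is synthetic, standing in for the traffic sequence:

```python
import numpy as np

def suboptimal_sysid(Y, k):
    """SVD-based (suboptimal) identification of an LDS with k states.

    Y: (n, N) matrix of vectorized observations (one column per frame).
    Returns estimates of A (k, k), C (n, k), the state trajectory X (k, N),
    and the observation mean.
    """
    Ymean = Y.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(Y - Ymean, full_matrices=False)
    C = U[:, :k]                      # observation matrix (orthonormal columns)
    X = np.diag(s[:k]) @ Vt[:k, :]    # state trajectory
    # A maps x_t to x_{t+1} in the least-squares sense.
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])
    return A, C, X, Ymean

# Synthetic demo data (a stand-in for the real video frames):
rng = np.random.default_rng(0)
Y = rng.standard_normal((100, 30))
A, C, X, mu = suboptimal_sysid(Y, 5)
print(A.shape, C.shape)  # (5, 5) (100, 5)
```

Note that this truncated-SVD factorization is only an approximation to the maximum-likelihood solution, which is why the approach is called suboptimal.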
In [9]:
lds.check()
Out[9]:
In [47]:
lds.suboptimalSysID(data)
In [48]:
lds.check()
Out[48]:
As we have seen, check() returns False before we actually estimate the system parameters. This can be used, for instance, to test whether a dynamic texture model is ready for use.
In [12]:
lds._Ahat.shape
Out[12]:
Let's take a look at the spectral radius of the state-matrix (which should be $<1$ if the system is stable):
In [57]:
import numpy as np
specRad = np.max(np.abs(np.linalg.eig(lds._Ahat)[0]))
print specRad
Note: Although THIS system is stable, system identification using SVD does NOT enforce that!
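If an estimated state-transition matrix does turn out to be unstable, one crude (hypothetical) fix is to simply rescale it so its spectral radius drops below 1. This is not a pydstk feature, just a minimal sketch; more principled constrained least-squares approaches exist in the literature:

```python
import numpy as np

def stabilize(A, margin=0.99):
    """Scale A so its spectral radius is at most `margin`.

    A crude fix: divides A by rho/margin when the spectral radius rho
    exceeds margin. This distorts the dynamics, but keeps simulated
    state trajectories from blowing up.
    """
    rho = np.max(np.abs(np.linalg.eigvals(A)))
    if rho > margin:
        A = A * (margin / rho)
    return A

A_unstable = np.array([[1.2, 0.0], [0.0, 0.5]])
A_stable = stabilize(A_unstable)
print(np.max(np.abs(np.linalg.eigvals(A_stable))))  # 0.99
```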
In [56]:
lds._Chat.shape
Out[56]:
So, the state-transition matrix A is a (5 $\times$ 5) matrix and the observation matrix C is a (2304 $\times$ 5) matrix.
We will use the Martin Distance here as an example to measure similarity between two dynamical systems. There are many other recent proposals on suitable similarity measures, though (e.g., Binet-Cauchy kernels, etc.). First, we load a second model to compare to (another traffic sequence):
In [18]:
refData, refSz = dsutil.loadDataFromASCIIFile("tests/data/data2.txt")
In [19]:
refLDS = LinearDS(5, False, False)
In [20]:
refLDS.suboptimalSysID(refData)
Distance measurements in pydstk are implemented in the dscore.dsdist module, from which we'll import ldsMartinDistance:
In [21]:
from dscore.dsdist import ldsMartinDistance
In [64]:
D = ldsMartinDistance(lds, refLDS, 20)
Computation of the Martin distance would, in principle, require building the infinite observability matrix of the system, i.e., $[\boldsymbol{C}~\boldsymbol{C}\boldsymbol{A}~\boldsymbol{C}\boldsymbol{A}^2\cdots]^T$. The last argument to ldsMartinDistance (here, 20) specifies the maximum power to which $\boldsymbol{A}$ is raised.
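To illustrate the idea (this is a sketch of the general technique, not the pydstk code), one can truncate the observability matrices at a finite power, orthonormalize them, and turn the principal angles between their column spaces into a distance via $d^2 = -\sum_i \log \cos^2 \theta_i$:

```python
import numpy as np

def martin_distance(A1, C1, A2, C2, m=20):
    """Approximate Martin distance via truncated observability matrices.

    Stacks C A^0, ..., C A^m for each system, orthonormalizes the
    results, and converts the principal angles between their column
    spaces into a distance: d^2 = -sum(log(cos^2 theta_i)).
    """
    def obs_matrix(A, C, m):
        blocks, P = [], np.eye(A.shape[0])
        for _ in range(m + 1):
            blocks.append(C @ P)
            P = P @ A
        return np.vstack(blocks)

    Q1, _ = np.linalg.qr(obs_matrix(A1, C1, m))
    Q2, _ = np.linalg.qr(obs_matrix(A2, C2, m))
    # Singular values of Q1^T Q2 are the cosines of the principal angles.
    cosines = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    cosines = np.clip(cosines, 1e-12, 1.0)  # guard against round-off
    return np.sqrt(-2.0 * np.sum(np.log(cosines)))

# Identical systems should be at distance ~0:
A = np.array([[0.9, 0.1], [0.0, 0.8]])
C = np.vstack([np.eye(2), np.eye(2)])
print(martin_distance(A, C, A, C))  # ~0.0
```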
In [65]:
print D
When we compare lds to itself, we should get 0 (up to numerical accuracy):
In [69]:
np.testing.assert_almost_equal(ldsMartinDistance(lds, lds, 20), 0, 5)
Clustering dynamic texture models is an important task when building Bag-of-DT models, or when computational resources are limited and model comparisons need to be made against a small set of representative models from a database.
First, we are going to import a list of data file names. These files contain 2-D signals (sine, cosine) with different frequencies, embedded in 24-dimensional space.
Let's take a look at one 2-D signal:
In [75]:
sig0 = np.genfromtxt("tests/data/clustering/sig0001.txt")
In [77]:
plot(sig0[0,:])
plot(sig0[1,:])
Out[77]:
In [82]:
sig1 = np.genfromtxt("tests/data/clustering/sig0077.txt")
In [83]:
plot(sig1[0,:])
plot(sig1[1,:])
Out[83]:
Signals 0-49 were created using random frequencies in [0.9, 1]; signals 50-99 were created using random frequencies in [0.3, 0.4]. The signals are sampled at 80 linearly spaced points in [0, $6\pi$]. Finally, some random noise ($\sigma=0.01$) was added to the embedded signal (in 24-dimensional space).
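The exact generation procedure is not shown in the test data, but a plausible sketch of how one such signal could be produced (random frequency, 2-D sine/cosine, random linear embedding into 24-D, additive noise) looks like this:

```python
import numpy as np

rng = np.random.default_rng(42)

# 80 samples in [0, 6*pi], one random frequency per signal.
t = np.linspace(0, 6 * np.pi, 80)
freq = rng.uniform(0.9, 1.0)  # first group; use (0.3, 0.4) for the second
sig = np.vstack([np.sin(freq * t), np.cos(freq * t)])  # 2 x 80

# Embed the 2-D signal in 24-D via a random linear map, then add noise.
W = rng.standard_normal((24, 2))
embedded = W @ sig + 0.01 * rng.standard_normal((24, 80))
print(embedded.shape)  # (24, 80)
```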
In [84]:
with open("tests/data/clustering.files") as fid:
    content = fid.read().splitlines()
In [28]:
content[0:5]
Out[28]:
Next, we estimate an 8-state LDS for the data in each file listed in content:
In [86]:
import os
models = []
for data_file in content:
    data = np.genfromtxt(os.path.join("tests/data/", data_file))
    lds = LinearDS(8, False, False)
    lds.suboptimalSysID(data)
    models.append(lds)
Let's create the pairwise distance matrix (between all the models) next:
In [87]:
N = len(models)
D = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        D[i,j] = ldsMartinDistance(models[i], models[j], 50)
Let's take a look at the distance matrix:
In [88]:
imshow(D)
Out[88]:
This looks quite reasonable. Let's run LDS clustering (with 2 cluster centers):
In [89]:
eData, ids = LinearDS.cluster(D, 2)
In [90]:
# first, we plot the data
plot(eData[0:50,0],eData[0:50,1], color='b', marker='+', linestyle='')
plot(eData[50:,0],eData[50:,1], color='r', marker='o', linestyle='')
# then, we plot the representatives
plot(eData[ids[0],0],eData[ids[0],1], marker='s', color='black', markersize=10)
plot(eData[ids[1],0],eData[ids[1],1], marker='s', color='green', markersize=10)
Out[90]:
As we can see in this toy example, the LDSs are well separated and the cluster centers seem reasonable.
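The internals of LinearDS.cluster are not shown here, but clustering objects given only a pairwise distance matrix is a general technique. One plausible way to do it (hypothetical; not necessarily what pydstk implements) is to embed the distance matrix with classical MDS for visualization and run k-medoids on the raw distances, illustrated below on a toy 1-D dataset:

```python
import numpy as np

def cluster_from_distances(D, k, iters=50):
    """Toy distance-matrix clustering: classical MDS embedding + k-medoids.

    D: (n, n) symmetric pairwise distance matrix.
    Returns a 2-D embedding of the objects and the indices of k medoids.
    """
    n = D.shape[0]
    # Classical MDS: double-center the squared distances, eigendecompose.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:2]
    emb = vecs[:, order] * np.sqrt(np.maximum(vals[order], 0))

    # Farthest-point initialization of the medoids.
    medoids = [0]
    for _ in range(k - 1):
        medoids.append(int(np.argmax(D[medoids].min(axis=0))))
    medoids = np.array(medoids)

    # Standard k-medoids iterations on the original distances.
    for _ in range(iters):
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if members.size:
                costs = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[j] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return emb, medoids

# Two well-separated 1-D clusters (around 0 and around 10):
pts = np.concatenate([np.zeros(5), np.ones(5) * 10])
D = np.abs(pts[:, None] - pts[None, :])
emb, ids = cluster_from_distances(D, 2)
print(sorted(pts[ids]))  # one medoid from each cluster: [0.0, 10.0]
```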