We are going to solve a simple problem: recognising which of a small set of 2D gestures a user is drawing, given a time series of $x,y$ coordinates.
This is readily solved with "standard" machine learning algorithms, but we will show how a model-led approach lets us encode our assumptions elegantly while still getting all the benefits of probabilistic tracking. We'll also see how probabilistic filters degrade gracefully when our models are bad or our measurements are noisy.
We are going to view the problem as inferring which class a time series of observations belongs to. This is a very general problem in HCI; by explicitly treating this as a process rather than as a static classification problem, we can resolve much of the clumsiness of many gesture recognisers.
We will use the particle filter algorithm (technically the SIR variant, which is the simplest to understand).
The Kalman filter used the normal distribution to model all of the uncertainty in the system. This is great for computational efficiency, since the updates are simple linear transforms. It is also inferentially efficient; given only a small amount of evidence the Kalman filter will converge quickly compared to other approaches.
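To make this concrete, here is a minimal sketch of a single one-dimensional Kalman filter step (purely illustrative; the function and parameter names are our own, not from any library). The predict and correct updates are nothing more than a little linear/Gaussian algebra:
In [ ]:
# illustrative 1D Kalman filter step (not used in the rest of this notebook)
def kalman_step(mean, var, y, a=1.0, q=0.01, r=0.1):
    # predict: push the Gaussian through the linear dynamics x_{t+1} = a*x_t + noise with variance q
    pred_mean = a * mean
    pred_var = a * var * a + q
    # correct: fold in the observation y = x_t + noise with variance r
    k = pred_var / (pred_var + r)        # Kalman gain
    new_mean = pred_mean + k * (y - pred_mean)
    new_var = (1.0 - k) * pred_var
    return new_mean, new_var

print(kalman_step(0.0, 1.0, 0.5))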
But it has several significant drawbacks, which make it difficult to apply directly to infer gestures from observations:
The dynamics and observation models must be linear (the updates are linear transforms with Gaussian noise). This doesn't make much sense for tracking complex gesture trajectories; a dynamic model for a complete gesture is rarely going to be linear. We might want to be able to learn complex dynamics using a deep network, for example, and then plug them into a probabilistic filter. A Kalman filter does not support this.
The distribution over the state is a single Gaussian, so it is unimodal: it cannot represent several competing hypotheses at once. Imagine an object disappearing behind an obstruction which could reappear on either side; the Kalman filter can only spread out the distribution over the whole area, with an expected location in the middle of the obstacle! We would like instead to be able to track the two possibilities here by splitting up the hypotheses.
The state must be continuous. Very often in HCI we encounter problems with a combination of discrete variables and continuous ones. This is critical in gesture recognition, for example; at any point in time, our hypotheses might be split among multiple possible gestures with different spatial distributions. Being able to represent the combination of discrete + continuous states is critical. (Kalman filter banks are an alternative approach, which explicitly maintain the competing hypotheses as separate Kalman filters.)
As an aside, the hidden Markov model, formerly a key algorithm in speech recognition, is to discrete state tracking what the Kalman filter is to continuous state tracking. The HMM can track discrete hidden states easily (with discrete or continuous observation space), but cannot track continuous variables on its own.
Particle filters are very simple to understand: we represent the distribution over the hidden state as a set of concrete samples ("particles"), each a hypothesis about the state. At every time step we push each particle through the dynamics model, weight each particle by how well its predicted observation matches the actual observation, and then resample a new set of particles in proportion to those weights.
This can only ever approximate the "true" distribution; but the Monte Carlo approximation turns out to work surprisingly well, and has good theoretical guarantees.
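As a rough sketch of one update step (illustrative only; this is not the pfilter library we use below, and dynamics_fn, observe_fn and weight_fn here are placeholder names):
In [ ]:
import numpy as np

# illustrative SIR update step: predict, weight, resample
def sir_step(particles, observation, dynamics_fn, observe_fn, weight_fn):
    # 1. predict: push every particle through the (stochastic) dynamics model
    particles = dynamics_fn(particles)
    # 2. weight: score each particle by how well its expected observation matches the real one
    weights = weight_fn(observe_fn(particles), observation)
    weights = weights / np.sum(weights)   # normalise so the weights form a distribution
    # 3. resample: draw a new particle set in proportion to the weights
    indices = np.random.choice(len(particles), size=len(particles), p=weights)
    return particles[indices]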
Particle filters have many uses in HCI. For example, we have used them extensively to track finger configurations when using capacitive sensors. In this case, we have a finger pose state space (hidden) and a sensor matrix (observed), and the filter estimates pose in real-time.
A particle filter requires that we specify: a prior over the initial state, a dynamics (transition) function, an observation function mapping hidden states to expected observations, and a weight function scoring how similar an expected observation is to the one actually observed (along with the number of particles to use). These are exactly the arguments we will pass to pfilter.ParticleFilter below.
In [ ]:
# import the things we need
from __future__ import print_function, division
try:
    from importlib import reload   # Python 3: reload() is no longer a builtin
except ImportError:
    pass                           # Python 2: reload() is a builtin
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import pfilter
pfilter = reload(pfilter)
import ipywidgets
import IPython
import scipy.stats
import matplotlib, matplotlib.colors
matplotlib.rcParams['figure.figsize'] = (14.0, 8.0)
%matplotlib inline
from scipy.stats import norm
In [ ]:
%%javascript
IPython.OutputArea.auto_scroll_threshold = 9999;
OutputArea.prototype._should_scroll = function(){return false};
To test the particle filter, we will try and track a very simple, 1D sine wave: $${\bf x_t} = \sin(t)$$
This is a simple, smooth process. We will try and estimate our "hidden" value $\bf{x_t}$ via an observed variable $\bf{y_t}$ which is just $\bf x_t$ with some noise added.
In [ ]:
# 100 points sampled from a sine wave
t = np.linspace(0,20,100).reshape(-1,1)
x = np.sin(t)
plt.plot(t,x)
plt.title("Sine wave")
We will use a very simple model: the next state is just the previous state plus Gaussian noise, $x_{t+1} = x_t + N(0, \sigma_p)$, with process noise $\sigma_p$.
In [ ]:
sigma_p = 0.2 # the process noise
In [ ]:
### Identity Example
def dynamics(x):
# tomorrow is the same as today
# but slightly randomly different
# we literally *add* noise to our previous state
# x_{t+1} = x_t + N(0,\sigma)
return np.random.normal(x, sigma_p,x.shape)
We can show what this dynamics model looks like if there were no resampling step in the filter; this makes predictions without ever having seen any data, as we did with the Kalman filter.
We only have a simple state with no deterministic dynamics at all, so the result is just a random walk; our predictive model is not very strong!
In [ ]:
def simulate_dynamics(priors, dynamics, steps, n_runs=1):
runs = []
def simulate_run():
# draw samples from the priors
x = np.array([p() for p in priors])
xs = [x]
for i in range(steps):
x = dynamics(x)
xs.append(x)
return xs
return np.array([simulate_run() for j in range(n_runs)])
# run the simulation 20 times for 200 time steps each time
# we use a normal distribution as our prior on starting value
simulated = simulate_dynamics(priors=[norm(0,1).rvs], dynamics=dynamics, steps=200, n_runs=20)
plt.plot(simulated[:,:,0].T, alpha=0.3, c='C0');
plt.xlabel("Time step")
plt.ylabel("X")
plt.title("Simulated runs")
In [ ]:
def observe(x):
# we observe x directly
return x
Note that this gives more weight to particles that are more similar to the observation: it is a similarity function, not a distance function.
In [ ]:
beta = 0.25 # the RBF width
def weight(true_y, hypothesized_y):
# RBF similarity function
w = np.exp(-np.sum((hypothesized_y-true_y)**2, axis=1)/(0.5*beta**2))
return w
We can see what this function looks like, comparing a true value of 0 with values in [-1, 1], $\beta=0.25$:
In [ ]:
# plot the weight as a function of difference to hypothesized value
hypo = np.linspace(-1,1,100).reshape(100,1)
plt.plot(hypo.reshape(100,), weight(hypo,[0]))
plt.axvline(0,c='C1')
plt.xlabel("Hypothesized value")
plt.ylabel("Unnormalised weight")
plt.title("RBF kernel, $\\beta$=0.25")
Priors We assume a very simple prior on ${\bf x_0}$; that it is normally distributed, mean 0, variance 1, $X_0 \sim N(0,1)$
In [ ]:
# we assume that, before seeing any evidence, the particles are
# normally distributed about 0, with std. dev. 1.0
def prior(n):
    return scipy.stats.norm(0, 1).rvs(size=(n,1))
In [ ]:
pfilter = reload(pfilter)
pf_simple = pfilter.ParticleFilter(initial=prior,
observe_fn=observe,
n_particles=200,
dynamics_fn=dynamics,
weight_fn=weight,
resample_proportion=0.02)
In [ ]:
def run_pfilter(pfilter, inputs):
"""Apply a particle filter to a time series of inputs,
and return the particles, their weights, and the mean
estimated state"""
# reset filter
pfilter.init_filter()
particles, weights, means = [], [], []
# apply to each element of the time series
for i in range(len(inputs)):
pfilter.update([inputs[i]])
# store the trace of the particle filter
particles.append(pfilter.original_particles)
weights.append(pfilter.weights)
means.append(pfilter.mean_state)
return np.array(particles), np.array(weights), np.array(means)
We use a utility function to plot the results of running a particle filter:
In [ ]:
import particle_utils
particle_utils = reload(particle_utils)
from particle_utils import plot_pfilter, animate_pfilter
In [ ]:
def run_filter(sig=0.1, bet=0.5, noise=-2):
global sigma_p, beta, noise_level, frame
sigma_p = sig
beta = bet
noise_level = 10**noise
noise = np.random.normal(0, noise_level, x.shape)
plt.figure(figsize=(12,7))
y = x + noise
particles, weights, means = run_pfilter(pf_simple, y)
# animation: white particles, green MAP, orange observation, red hidden true, mean cyan
#animate_pfilter(t,x,y,particles,weights,means)
plot_pfilter(t, x, y, particles, weights, means)
%gui tk
plt.title("Sigma=%.2f, Beta=%.2f, Noise=%.2e" % (sigma_p, beta, noise_level))
plt.ylim(-1.5, 1.5)
run_filter(sig=0.1, bet=0.5, noise=-2)
There are a few parameters here: let's adjust them interactively to see the effect.
In [ ]:
ipywidgets.interact_manual(run_filter, sig=(0.0, 2, 0.05), bet=(0.1, 5.0, 0.1), noise = (-5.0, 1.0, 0.1))
Imagine we wanted to infer the phase of the oscillator driving this sine wave. The phase variable is not observable, but we can infer it from the observed oscillation. Furthermore, we want the unwrapped phase, i.e. we expect the phase to monotonically increase. We also drop some of our data to show how the filter responds when observations are not available; the filter has to predict without correction in the absence of observation.
We can encode these assumptions in our model, then see if the particle filter is able to infer the hidden parameter over time.
Observation We postulate an observation model: $${\bf y_t} = \sin({\bf x_t})$$ i.e. that what we see is the effect of sine on a hidden variable $\bf x_t$. Because we defined ${\bf y_t}=\sin(t)$, we are actually trying to infer $t$.
Dynamics We assume that we have a very simple dynamical system in discrete time, where we have the phase and its first time derivative. $${\bf x_t} = \begin{bmatrix}x \\ \dot{x}\end{bmatrix},$$ and $${\bf x_{t+1}} = {\bf x_t} + \begin{bmatrix}\dot{x} \\ 0 \end{bmatrix} + N(0, \Sigma),$$ where $$\Sigma= \begin{bmatrix} \sigma_x & 0 \\ 0 & \sigma_{\dot{x}} \end{bmatrix} $$
This means that if we start linearly increasing or decreasing at a certain rate, we should expect to keep doing so.
In [ ]:
### Example
sigma_x = 0.1
sigma_dx = 0.001
process_sigmas = [sigma_x, sigma_dx] # how much noise for x and dx
# transition function
def linear_dynamics(x):
nx = np.dot(x,np.array([[1,0],
[1,1]]))
nx += np.random.normal(0,process_sigmas,x.shape)
return nx
def observe_sin(x):
# y_t = sin(x_t[0])
return np.sin(x[:,0:1])
def weight_sin(hypothesized_y, true_y):
# RBF similarity function, as we used before
w = np.exp(-np.sum((hypothesized_y-true_y)**2, axis=1)/(0.5*beta**2))
return w
def prior_sin(n):
# simple normal prior on the initial state
return np.stack([scipy.stats.norm(0,1).rvs(n), scipy.stats.norm(0, 0.25).rvs(n)]).T
# drop some of the data
pfilter = reload(pfilter)
pf_sin = pfilter.ParticleFilter(initial=prior_sin,
observe_fn=observe_sin,
n_particles=200,
dynamics_fn=linear_dynamics,
weight_fn=weight_sin,
resample_proportion=0.01)
In [ ]:
def run_sinfilter(sig_x=0.1, sig_dx=0.01, bet=0.25, noise=-2):
    global sigma_x, sigma_dx, process_sigmas, beta, noise_level, frame
    sigma_x = sig_x
    sigma_dx = sig_dx
    process_sigmas = [sigma_x, sigma_dx]  # rebuild so linear_dynamics sees the new values
    beta = bet
    noise_level = 10**noise
    noise = np.random.normal(0, noise_level, x.shape)
plt.figure(figsize=(12,7))
y = x + noise
y[20:40,:] = np.nan
frame = 0
particles, weights, means = run_pfilter(pf_sin, y)
plot_pfilter(t, t, y, particles, weights, means)
plt.title("Sigma_x=%.2f, Sigma_dx=%.2f, Beta=%.2f, Noise=%.2e" % (sigma_x, sigma_dx, beta, noise_level))
ipywidgets.interact_manual(run_sinfilter, sig_x=(0.0, 2, 0.05), sig_dx=(0.0, 0.5, 0.01),
bet=(0.1, 5.0, 0.1), noise = (-5.0, 1.0, 0.1))
One element we have not discussed is the resample_proportion=0.01
parameter when creating the particle filter.
This tells the filter to replace 1% of particles in each time step with random draws from the original prior. There are formulations of the particle filter which omit this, but it is often very useful to introduce this resampling process.
If the filter drifts far from the observations, or a step change occurs in the observation (imagine a camera suddenly changing exposure), it may take a long time for the particle filter to adapt. In the worst case, it might never be able to move into the relevant portion of the state space. Prior resampling helps keep a "bank" of fresh particles which can rapidly lock onto new hypotheses.
This is very similar in motivation to the use of multiple chains or multiple restarts in MCMC sampling.
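A minimal sketch of the idea (this is not the actual pfilter implementation, which may differ in detail): at each step, a small random subset of the particles is simply re-drawn from the prior.
In [ ]:
# illustrative sketch of prior resampling: replace a fraction of particles with fresh prior draws
def refresh_from_prior(particles, prior_fn, resample_proportion=0.01):
    n = len(particles)
    n_fresh = int(np.ceil(resample_proportion * n))
    idx = np.random.choice(n, size=n_fresh, replace=False)   # particles to discard
    particles[idx] = prior_fn(n_fresh)                        # replace them with prior samples
    return particles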
Let's now apply these ideas to a practical HCI task: recognising 2D gestures. We will try and do the full set of gesture recognition tasks (deciding which gesture is being drawn, estimating its position, scale and rotation, and tracking how far through the gesture we are) in one single, probabilistic model.
This is a challenging task! We will assume a small set of Graffiti-like symbols as the basis for this example.
We will base our algorithm directly on the one given in A Probabilistic Framework for Matching Temporal Trajectories.
In [ ]:
import gestures
g = gestures.GestureData("gestures.txt")
In [ ]:
n_gestures = g.n_gestures
for i in range(n_gestures):
# one plot per gesture
sub = plt.subplot(1, n_gestures,i+1)
path = g.gestures[i]
plt.plot(path[:,0], path[:,1])
plt.plot(path[0,0], path[0,1], 'o')
plt.text(0,0,"%d"%i)
plt.axis("off")
plt.axis("equal")
plt.xlim(0,280)
plt.ylim(0,280)
sub.invert_yaxis() # make it appear correctly for screen co-ordinates
plt.suptitle("Gesture set")
In [ ]:
plt.figure()
for i in range(n_gestures):
plt.subplot(n_gestures,2,i*2+1)
path = g.gestures[i]
plt.plot(path[:,0], '-C0')
plt.plot(path[:,1], '-C1')
plt.xlabel("Samples")
plt.subplot(n_gestures,2,i*2+2)
plt.plot(path[:,0], path[:,1], 'C0')
plt.gca().invert_yaxis()
plt.axis("equal")
plt.axis("off")
In [ ]:
# test get_template
for i in range(100):
xy = g.get_template(1, i)
plt.plot(xy[0], -xy[1], 'C1.')
We want to recognize 2D gestures drawn with a mouse (or finger/stylus).
You have to implement the four key elements of the model: the prior, the dynamics, the observation function and the weight function.
The rest of the machinery for recognition is implemented for you.
With these four components implemented correctly, you will have a working gesture recogniser.
The following description outlines in more detail how such a model would work:
Putting our assumptions into the probabilistic framework, we want to infer a probability distribution over gesture classes, parameters and gesture completion state, given a time series of $x,y$ coordinates. The $x,y$ points at each time step form our observation vector $y_t$.
We assume that a performed gesture is in some way a "noisy reproduction" of the ideal template form, subject to various types of distortion. We formulate our problem as inferring a distribution over the parameters of these distortions.
We could look at the whole time series of $x,y$ points and try and classify that. However, there are two problems: the observation sequence grows as the gesture unfolds, so the classifier would have to handle variable-length input; and we would only get an answer once the whole gesture had been completed, rather than updating our belief as each new point arrives.
This is doable, but a simple way to eliminate these problems is to rewrite the recognizer so that it depends on nothing but its immediately previous state; i.e. so that it satisfies the Markov property.
To do this, we need to introduce additional variables into the state space; but with judicious choice these can be a very small number of additional variables. In particular, for this problem, we can just track how far along a gesture we are (the "phase") and update that over time. The phase state starts at zero and increments as the gesture is performed.
We are now in a position to write down a model for our gesture recognizer.
First of all, the state we are trying to infer:
We have one of $n$ possible gestures, indexed by $i$. Our model says a gesture will be identical to the template for that class of gesture, but might vary in: its position on screen (a translation $x_c, y_c$), its size (a scaling $s$), its orientation (a rotation $\theta$), and the rate at which it is performed (the phase rate $\dot{\phi}$).
Note that what we observe is a position at a single time point in a gesture; this means we must also estimate how far through the gesture we are. Given a gesture $i$, we have a template $G_i(\phi)$, which returns an $x,y$ point for any value of $\phi$.
We expect to observe $\hat{{\bf y}}=AG_i(\phi)$, where $A$ is a transformation matrix applying the translation $x_c,y_c$, the scaling $s$ and the rotation $\theta$.
The utility function linear_transform()
defined below is a useful component in implementing this:
In [ ]:
def linear_transform(xys, angle=0.0, scale=1.0, translate=(0,0)):
"""Takes a an n x 2 array of point `xys` and returns the 2D points transformed by
rotating by `angle` (degrees)
scaling by `scale` (proportional 1.0=no change, 0.5=half, etc.)
translating by `translate` ((x,y) offset)"""
ca, sa = np.cos(np.radians(angle)), np.sin(np.radians(angle))
rot = np.array([[ca, -sa],
[sa, ca]])
return np.dot(xys, rot)*scale + np.array(translate)
In [ ]:
# demo of linear transform on points distributed in a unit square
original = np.random.uniform(0,1, size=(200,2))
transformed = linear_transform(original, angle=45, scale=1.5, translate=(-5,2))
plt.plot(transformed[:,0], transformed[:,1], '.')
plt.plot(original[:,0], original[:,1], '.')
plt.axis("square");
In [ ]:
def gesture_observation(state):
# given an n x d matrix of n particle samples
    # return an n x 2 matrix of expected x,y positions for that gesture model
    # dummy code, which returns random results:
# obviously, this should be a transformation of the hidden state passed in!
observation_matrix = np.random.normal(200,20,size=(state.shape[0], 2))
return observation_matrix
We then need to specify some simple dynamics. Hint: these should allow the values to slowly change over time (i.e. some random drift), except for the phase $\phi$ which, by our model, we also expect to steadily increase at the rate $\dot{\phi}$. The gesture class should probably not change during a gesture!
In [ ]:
def gesture_dynamics(prev_states):
# take an n x d array of particle samples
# return an n x d array representing the next states
# dummy code: apply no dynamics
next_states = np.array(prev_states)
return next_states
In [ ]:
def gesture_prior(n):
    # return an n x d matrix with columns [i, s, x_c, y_c, \theta, \phi, \dot{\phi}] as an initial guess
    # each column should be drawn from a sensible prior distribution
    # dummy code: always class 0, unit scale and phase rate, everything else zero
    return np.stack([
        np.zeros(n,), # i
np.ones(n,), # s
np.zeros(n,), np.zeros(n,), # x_c, y_c
np.zeros(n,), # \theta
np.zeros(n,), np.ones(n,)]).T # \phi, \phi_dot
# print some test samples from the prior
print(gesture_prior(3))
We again use the simple heat kernel (or RBF): $$w_i = e^{\left(-\frac{(y-y^\prime)^2}{2\beta^2}\right)}$$ (the specific choice of weighting function is rarely very important, except if there are particularly unusual states to be compared).
The gesture_beta
parameter may need to be adjusted.
You can change this function to a different similarity measure if you want. Note the assumption that we only observe one $x,y$ location at any time $t$.
In [ ]:
def gesture_weight(hypothesized, true):
# take a 2D observation (x,y)
# and an n x 2 matrix of observation samples (returned from gesture_observation())
# return the weight for each, representing how similar they are
gesture_beta = 1000.0 # the RBF width
# RBF similarity function
w = np.exp(-np.sum((hypothesized-true)**2, axis=1)/(0.5*gesture_beta**2))
return w
In [ ]:
import gestures
gestures = reload(gestures)
gestures.interactive_recogniser(
dynamics=gesture_dynamics,
observation=gesture_observation,
prior=gesture_prior,
weight=gesture_weight,
gestures=g.gestures)
%gui tk
Probabilistic filters can be applied to many problems in HCI. Typically, if a process unfolds over time and there is uncertainty, a probabilistic filter is a strong candidate for inference.
The fact that inference is performed over time is a potential advantage over "static" classification approaches, as feedback can be generated on the fly, instead of only after completion of an action.
In the specific context of gestures, the ability to infer the start and end-point of gestures can solve the "segmentation problem" or "gesture spotting problem" that is often awkward and leads to kludges like button presses to segment actions.
Probabilistic motion models can easily be linked to higher-order probabilistic models which infer long-term actions on the part of the user. Because everything is a probability distribution, there is a simple common basis for integrating such models. This, for example, can include language models which estimate a distribution over text that is likely to be entered given both user input and a statistical model of language.
Much use of probabilistic filters has depended on strong mathematical models of the fundamental process. For example, in rocket guidance, sophisticated physics models were used to specify the Kalman filters that provide stable control.
However, it is becoming increasingly possible to infer these models from observations. Techniques such as deep learning (for example variational autoencoders or generative adversarial networks) make it possible to learn very sophisticated generative models from observations of data. These models can be dropped into probabilistic filters to produce robust inferential engines for user interaction.
As an illustrative example, we recently built a touch pose estimator to estimate the pose of a finger from a capacitive sensor array (as found on a touch screen). We trained a deep convolutional network to predict finger pose from sensor images (an inverse model), a separate deconvolutional CNN to predict sensor images from finger poses (a forward model), and then fused these using a particle filter.
This combined model gives substantial robustness, and we were able to introduce a simple dynamics model, which filters out completely implausible movements.