Probabilistic filtering for intention inference

Part II: Particle filters: Sequential Monte Carlo estimation

Problem description

We are going to solve a simple problem:

  • Recognising simple 2D gestures, drawn with a mouse.

This is readily solved with "standard" machine learning algorithms, but we will show how a model-led approach lets us encode our assumptions elegantly while retaining all the benefits of probabilistic tracking. We'll see how probabilistic filters degrade gracefully when our models are bad or measurements are noisy.

We are going to view the problem as inferring which class a time series of observations belongs to. This is a very general problem in HCI; by explicitly treating this as a process rather than as a static classification problem, we can resolve much of the clumsiness of many gesture recognisers.

Algorithm

We will use the particle filter algorithm (technically the sampling importance resampling (SIR) variant, which is the simplest to understand).

Why particle filter?

The Kalman filter used the normal distribution to model all of the uncertainty in the system. This is great for computational efficiency, since the updates are simple linear transforms. It is also inferentially efficient: given only a small amount of evidence, the Kalman filter will converge quickly compared to other approaches.

But it has several significant drawbacks, which make it difficult to apply directly to infer gestures from observations:

Kalman filter drawbacks

  • the dynamics have to be linear: we can't have complicated dynamic models (although we can linearise at each time step).

This doesn't make much sense for tracking complex gesture trajectories; a dynamic model for a complete gesture is rarely going to be linear. We might want to be able to learn complex dynamics using a deep network, for example, and then plug them into a probabilistic filter. A Kalman filter does not support this.

  • all of the uncertainty must be normal: so we can't track multiple modes, for example, because a normal distribution has exactly one mode.

Imagine an object disappearing behind an obstruction which could reappear on either side; the Kalman filter can only spread out the distribution over the whole area, with an expected location in the middle of the obstacle! We would like to instead be able to track the two possibilities here by splitting up the hypotheses.

[Waddington's epigenetic landscape, illustrating a dynamic system which develops multiple modes as it evolves; a Gaussian approximation is wholly inappropriate]

Very often in HCI we encounter problems with a combination of discrete variables and continuous ones. This is critical in gesture recognition for example; at any point in time, our hypotheses might be split among multiple possible gestures with different spatial distributions. Being able to represent the combination of discrete + continuous states is critical. (Kalman filter banks are an alternative approach, which explicitly maintain the competing hypotheses as separate Kalman filters).

As an aside, the hidden Markov model, formerly a key algorithm in speech recognition, is to discrete state tracking what the Kalman filter is to continuous state tracking. The HMM can track discrete hidden states easily (with discrete or continuous observation space), but cannot track continuous variables on its own.


Particle Filtering

Particle filters are very simple to understand:

  • Sample-based: Instead of trying to represent the PDFs of our distributions, which could be any function that integrates to unity, we represent distributions using samples from those distributions (i.e. it is a sequential Monte Carlo method; we use samples to represent distributions). We choose some fixed number of particles $N_s$ and have a set of particles $S_t = \{ x^{1}_t, x^{2}_t, \dots, x^{N_s}_t \}$. At any time point, our belief is captured by this sample set.
  • Arbitrary dynamics: we can apply any dynamics simply by applying our transition function $f({\bf x_t})$ to each sample from our current set $S_t$ to get a new predicted set of samples $S_{t+1}$. Likewise, for any sample, we can compute the corresponding observation by applying an arbitrary function $g(x_t)$ to each sample independently. There are no restrictions on the form of $f({\bf x_t})$ and $g({\bf x_t})$.
  • Arbitrary distribution: Our uncertainty is captured by the locations of the set of particles in the state space; they are samples from our current posterior. We do not have an explicit parametric form for the distribution; samples are all that we have. Therefore there are no restrictions on the form of the distribution; it can be multi-modal, heavy-tailed etc.
  • No explicit likelihood: We don't explicitly compute the likelihood of observations, but instead we apply some weighting function to the particles and then normalise these weights to approximate the likelihood. This means we only have to find a reasonable weighting function, and not a true likelihood.
  • Importance sampled: We resample particles which are likely after an observation, and discard those which are unlikely. This is importance resampling. Without this step, particles would quickly diffuse into very unlikely parts of the space. Importance resampling continuously adapts the particles to cover likely portions of the space.

This can only ever approximate the "true" distribution; but the Monte Carlo approximation turns out to work surprisingly well, and has good theoretical guarantees.
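To make the loop concrete, here is a minimal sketch of a single SIR update step; `sir_step` is a hypothetical helper for illustration only, not part of the `pfilter` module we use later (which wraps a loop much like this one):

In [ ]:
# a minimal sketch of one SIR update step (illustrative only)
import numpy as np

def sir_step(particles, y, dynamics_fn, observe_fn, weight_fn):
    particles = dynamics_fn(particles)            # predict: push each particle through the dynamics
    w = weight_fn(observe_fn(particles), y)       # weight: compare expected observations to the real one
    w = w / np.sum(w)                             # normalise weights to approximate a likelihood
    n = len(particles)
    resampled = np.random.choice(n, size=n, p=w)  # importance resampling: likely particles survive
    return particles[resampled]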


Particle filters have many uses in HCI. For example, we have used them extensively to track finger configurations when using capacitive sensors. In this case, we have a finger pose state space (hidden) and a sensor matrix (observed), and the filter estimates pose in real-time.

See the AnglePose video

Specifying a particle filter

A particle filter requires that we specify:

  • A dynamics or transition function $f({\bf {x}}_t)$ that predicts how we expect the world to evolve, which takes ${\bf{x}}^k_t \rightarrow {\bf x}^k_{t+1}$ for any given sample ${\bf{x}}^k_t$.
  • An observation function $g({\bf x}_{t})$ that predicts what we expect to observe, given a hypothesized state: ${\bf{x}_t^k} \rightarrow \hat{\bf y}_t^k $
  • A weight function, that, given a hypothesized observation $\hat{\bf y}_t^k$, can be used to approximate $p(\hat{\bf y}_t|{\bf y}_t)$. This is performed by computing weights $w_k$ for each particle $\bf x^k$ and then normalizing to produce the per-sample likelihood: $$p^{(k)}(\hat{\bf y}_t|{\bf y}_t) = \frac{w_k}{\sum_j w_j}$$
  • A set of prior distributions that specify our initial guesses for $\hat{\bf x}_0$, and allow us to draw the samples $\{ \hat{\bf x}^1_0, \hat{\bf x}^2_0, \dots, \hat{\bf x}^{N_s}_0 \}$

Basic filtering

We will first implement a basic particle filter that can track a very simple one dimensional time series. This toy problem is easy to understand and work with.

Then, we will show how this can generalise to the gesture tracking problem.


In [ ]:
# import the things we need
from __future__ import print_function, division
try:
    from importlib import reload   # Python 3: reload lives in importlib
except ImportError:
    pass                           # Python 2: reload is a builtin
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import pfilter
pfilter = reload(pfilter)
import ipywidgets
import IPython
import scipy.stats
import matplotlib, matplotlib.colors
matplotlib.rcParams['figure.figsize'] = (14.0, 8.0)
%matplotlib inline
from scipy.stats import norm

In [ ]:
%%javascript
IPython.OutputArea.auto_scroll_threshold = 9999;
IPython.OutputArea.prototype._should_scroll = function(){return false};

Some test data

To test the particle filter, we will try and track a very simple, 1D sine wave: $${\bf x}_t = \sin(t)$$

This is a simple, smooth process. We will try and estimate our "hidden" value $\bf{x_t}$ via an observed variable $\bf{y_t}$ which is just $\bf x_t$ with some noise added.


In [ ]:
# 100 points sampled from a sine wave
t = np.linspace(0,20,100).reshape(-1,1)
x = np.sin(t)
plt.plot(t,x)
plt.title("Sine wave")

Simple model

We will use a very simple model

  • Dynamics We assume that there are no predictable dynamics, just some Gaussian noise: $X_{t+1} = X_t + N(0,\sigma_p)$. Note that when we have a stochastic element in a particle filter, we simply draw a sample from a distribution, and treat it as part of the deterministic update. Each particle will therefore get a different draw. We can have any kind of stochastic system here, not just additive noise (as in the Kalman filter); we could easily implement multiplicative noise instead.

In [ ]:
sigma_p = 0.2              # the process noise

In [ ]:
### Identity Example
def dynamics(x):
    # tomorrow is the same as today
    # but slightly randomly different
    # we literally *add* noise to our previous state
    # x_{t+1} = x_t + N(0,\sigma)
    return np.random.normal(x, sigma_p,x.shape)

We can show what this dynamics model looks like if there were no resampling step in the filter. This makes predictions without ever having seen any data, as we did with the Kalman filter.

We only have a simple state with no deterministic dynamics at all, so the result is just a random walk; our predictive model is not very strong!


In [ ]:
def simulate_dynamics(priors, dynamics, steps, n_runs=1):
    runs = []
    def simulate_run():
        # draw samples from the priors
        x = np.array([p() for p in priors])
        xs = [x]    
        for i in range(steps):
            x = dynamics(x)
            xs.append(x)
        return xs    
    return np.array([simulate_run() for j in range(n_runs)])
    
# run the simulation 20 times for 200 time steps each time
# we use a normal distribution as our prior on starting value
simulated = simulate_dynamics(priors=[norm(0,1).rvs], dynamics=dynamics, steps=200, n_runs=20)   
plt.plot(simulated[:,:,0].T, alpha=0.3, c='C0');
plt.xlabel("Time step")
plt.ylabel("X")
plt.title("Simulated runs")
  • Observation We assume that the sensor measures exactly the value we want to infer, i.e. $\bf y_t=x_t$, and therefore $g({\bf x_t})$ is just the identity.

In [ ]:
def observe(x):
    # we observe x directly
    return x
  • Weighting We weight samples according to how similar they are to the observed output. We use a simple heat kernel: $$w_i = \exp\left(-\frac{(y-y^\prime)^2}{2\beta^2}\right)$$ $\beta$ is a parameter that lets us specify how precise our matching between observation and reality is.

Note that this gives more weight to particles that are more similar to the observation: it is a similarity function, not a distance function.


In [ ]:
beta = 0.25                # the RBF width

def weight(hypothesized_y, true_y):
    # RBF similarity function
    # (denominator 2*beta**2, matching the kernel formula above)
    w = np.exp(-np.sum((hypothesized_y-true_y)**2, axis=1)/(2*beta**2))
    return w

We can see what this function looks like, comparing a true value of 0 with values in [-1, 1], $\beta=0.25$:


In [ ]:
# plot the weight as a function of difference to hypothesized value
hypo = np.linspace(-1,1,100).reshape(100,1)
plt.plot(hypo.reshape(100,), weight(hypo,[0]))
plt.axvline(0,c='C1')
plt.xlabel("Hypothesized value")
plt.ylabel("Unnormalised weight")
plt.title("RBF kernel, $\\beta$=0.25")

Priors We assume a very simple prior on ${\bf x_0}$: that it is normally distributed with mean 0, variance 1, $X_0 \sim N(0,1)$.


In [ ]:
# we assume that, before seeing any evidence, that the particles are 
# normally distributed about 0, with std. dev. 1.0

def prior(n):
    return scipy.stats.norm(0, 1).rvs(size=(n,1))

Filter creation

We use a simple implementation of a particle filter (pfilter), which simply takes these functions and the number of particles to use in the sampling process.


In [ ]:
pfilter = reload(pfilter)
pf_simple = pfilter.ParticleFilter(initial=prior, 
                                    observe_fn=observe,
                                    n_particles=200,                                    
                                    dynamics_fn=dynamics,
                                    weight_fn=weight,                    
                                    resample_proportion=0.02)

In [ ]:
def run_pfilter(pfilter, inputs):
    """Apply a particle filter to a time series of inputs,
    and return the particles, their weights, and the mean
    estimated state"""    
    # reset filter
    pfilter.init_filter()
    particles, weights, means = [], [], []
    # apply to each element of the time series
    for i in range(len(inputs)):           
        pfilter.update([inputs[i]])
        # store the trace of the particle filter
        particles.append(pfilter.original_particles)    
        weights.append(pfilter.weights)
        means.append(pfilter.mean_state)        
    return np.array(particles), np.array(weights), np.array(means)

We use a utility function to plot the results of running a particle filter:


In [ ]:
import particle_utils
particle_utils = reload(particle_utils)
from particle_utils import plot_pfilter, animate_pfilter

Running the estimator


In [ ]:
def run_filter(sig=0.1, bet=0.5, noise=-2):
    global sigma_p, beta, noise_level, frame
    sigma_p = sig
    beta = bet
    noise_level = 10**noise
    noise = np.random.normal(0, noise_level, x.shape)
    plt.figure(figsize=(12,7))
    y = x + noise
    particles, weights, means = run_pfilter(pf_simple, y)
    # animation: white particles, green MAP, orange observation, red hidden true, mean cyan
    #animate_pfilter(t,x,y,particles,weights,means)
    plot_pfilter(t, x, y, particles, weights, means)
    %gui tk
    plt.title("Sigma=%.2f, Beta=%.2f, Noise=%.2e" % (sigma_p, beta, noise_level))
    plt.ylim(-1.5, 1.5)
    
run_filter(sig=0.1, bet=0.5, noise=-2)

There are a few parameters to adjust here: let's adjust these interactively to see the effect.


In [ ]:
ipywidgets.interact_manual(run_filter, sig=(0.0, 2, 0.05), bet=(0.1, 5.0, 0.1), noise = (-5.0, 1.0, 0.1))

Things to note

  • This is a trivial model, but still tracks "complex" functions, because it is adapting to observations
  • Things to adjust:
    • noise level
    • rbf width beta
    • dynamics noise sigma

A more interesting example

Imagine we wanted to infer the phase of the oscillator driving this sine wave. The phase variable is not observable, but we can infer it from the observed oscillation. Furthermore, we want the unwrapped phase, i.e. we expect the phase to monotonically increase. We also drop some of our data to show how the filter responds when observations are not available; the filter has to predict without correction in the absence of observation.

[Image credit: 1ucasvb]

We can encode these assumptions in our model, then see if the particle filter is able to infer the hidden parameter over time.

  • Observation We postulate an observation model: $${\bf y_t} = \sin({\bf x_t})$$ i.e. that what we see is the effect of sine on a hidden variable $\bf x_t$. Because we defined ${\bf y_t}=\sin(t)$, we are actually trying to infer $t$.

  • Dynamics We assume that we have a very simple dynamical system in discrete time, where we have the phase and its first time derivative: $${\bf x}_t = \begin{bmatrix}x \\ \dot{x}\end{bmatrix},$$ and $${\bf x}_{t+1} = {\bf x}_t + \begin{bmatrix}\dot{x} \\ 0 \end{bmatrix} + N(0, \Sigma),$$ where $$\Sigma= \begin{bmatrix} \sigma_x & 0 \\ 0 & \sigma_{\dot{x}} \end{bmatrix}$$

This means that if we start linearly increasing or decreasing at a certain rate, we should expect to keep doing so.

  • Priors We again assume that the initial distribution is normally distributed, with $$ {\bf x_0} \sim N(0, \Sigma_0) $$

In [ ]:
### Example
sigma_x = 0.1
sigma_dx = 0.001
process_sigmas = [sigma_x, sigma_dx] # how much noise for x and dx    

# transition function
def linear_dynamics(x):        
    nx = np.dot(x,np.array([[1,0],
                            [1,1]]))        
    nx += np.random.normal(0,process_sigmas,x.shape)
    return nx

def observe_sin(x):    
    # y_t = sin(x_t[0])    
    return np.sin(x[:,0:1])

def weight_sin(hypothesized_y, true_y):
    # RBF similarity function, as we used before     
    w = np.exp(-np.sum((hypothesized_y-true_y)**2, axis=1)/(2*beta**2))    
    return w   

def prior_sin(n):
    # simple normal prior on the initial state
    return np.stack([scipy.stats.norm(0,1).rvs(n), scipy.stats.norm(0, 0.25).rvs(n)]).T  

pfilter = reload(pfilter)
pf_sin = pfilter.ParticleFilter(initial=prior_sin, 
                                observe_fn=observe_sin,
                                n_particles=200,
                                dynamics_fn=linear_dynamics,
                                weight_fn=weight_sin,                    
                                resample_proportion=0.01)

In [ ]:
def run_sinfilter(sig_x=0.1, sig_dx=0.01, bet=0.25, noise=-2):
    global beta, noise_level, process_sigmas, frame
    # update the process noise in place, so that linear_dynamics() sees the new values
    process_sigmas[0] = sig_x
    process_sigmas[1] = sig_dx
    beta = bet
    noise_level = 10**noise
    noise = np.random.normal(0,noise_level, x.shape)
    plt.figure(figsize=(12,7))
    y = x + noise
    y[20:40,:] = np.nan   # drop some of the data to show prediction without observations
    frame = 0
    particles, weights, means = run_pfilter(pf_sin, y)
    plot_pfilter(t, t, y, particles, weights, means)
    plt.title("Sigma_x=%.2f, Sigma_dx=%.2f, Beta=%.2f, Noise=%.2e" % (sig_x, sig_dx, beta, noise_level))
    
ipywidgets.interact_manual(run_sinfilter, sig_x=(0.0, 2, 0.05), sig_dx=(0.0, 0.5, 0.01), 
                           bet=(0.1, 5.0, 0.1), noise = (-5.0, 1.0, 0.1))

Things to note

  • The particle filter was able to infer the hidden state, despite only having a forward model (i.e. knowledge of $\sin(x)$, not $\sin^{-1}(x)$)
  • It correctly unwrapped the phase, because we primed it with a dynamics model that expects values to increase at a linear rate.
  • This problem produces multimodal distributions: there are infinitely many solutions to $y=\sin(x)$, since we can add any multiple of $2\pi$ without changing anything. We can see these as fainter lines on the particle plot.
  • This means that the particle mean is not actually a good estimate in this case! A better choice might be the most likely particle -- the Maximum A Posteriori (MAP) estimate, sketched below.
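A MAP-style readout can be taken directly from the filter trace. A minimal sketch, assuming you capture the arrays returned by run_pfilter above (the step index t_step is an arbitrary choice):

In [ ]:
# sketch: read off the most likely particle (a MAP-style estimate)
# instead of the weighted mean, at some time step of interest
particles, weights, means = run_pfilter(pf_sin, np.sin(t))
t_step = 50
map_state = particles[t_step][np.argmax(weights[t_step])]
print("MAP estimate at step %d:" % t_step, map_state)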

Resampling

One element we have not discussed is the resample_proportion=0.01 parameter when creating the particle filter.

This tells the filter to replace 1% of particles in each time step with random draws from the original prior. There are formulations of the particle filter which omit this, but it is often very useful to introduce this resampling process.

If the filter drifts far from the observations, or a step change occurs in the observation (imagine a camera suddenly changing exposure), it may take a long time for the particle filter to adapt. In the worst case, it might never be able to move into the relevant portion of the state space. Prior resampling keeps a "bank" of fresh particles which can rapidly lock onto new hypotheses.

This is very similar in motivation to the use of multiple chains or multiple restarts in MCMC sampling.
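A minimal sketch of what this refresh might look like, assuming an (n, d) particle array and a prior(n) sampling function like the ones above (`refresh_from_prior` is a hypothetical helper; pfilter performs this internally when resample_proportion > 0):

In [ ]:
# sketch: replace a fraction of the particles with fresh prior draws
# (pfilter does this internally; this helper is purely illustrative)
def refresh_from_prior(particles, prior_fn, proportion):
    n = len(particles)
    k = int(proportion * n)           # number of particles to refresh
    if k > 0:
        idx = np.random.choice(n, size=k, replace=False)
        particles[idx] = prior_fn(k)  # overwrite with fresh samples from the prior
    return particles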


Challenge: Gesture recognition

Let's now apply these ideas to a practical HCI task: recognising 2D gestures. We will try and do the full set of gesture recognition tasks in one single, probabilistic model:

  • spot gestures: determine when they start and end, without any external segmentation cue like a button push.
  • recognise gestures: label them according to class
  • parameterise gestures: recover parameters like the size, speed or rotation of the gestures performed.

This is a challenging task! We will assume a small set of Graffiti-like symbols as the basis for this example.

We will base our algorithm directly on the one given in A Probabilistic Framework for Matching Temporal Trajectories.

Data

We have some example data with a few example 2D mouse-drawn gestures. Here we are assuming just a single template for each gesture, for simplicity.


In [ ]:
import gestures
g = gestures.GestureData("gestures.txt")

Gesture shapes

We can plot the shapes of the gesture trajectories.


In [ ]:
n_gestures = g.n_gestures
for i in range(n_gestures):
    # one plot per gesture
    sub = plt.subplot(1, n_gestures,i+1)
    path = g.gestures[i]    
    plt.plot(path[:,0], path[:,1])
    plt.plot(path[0,0], path[0,1], 'o')
    
    plt.text(0,0,"%d"%i)
    plt.axis("off")
    plt.axis("equal")
   
    plt.xlim(0,280)
    plt.ylim(0,280)
    sub.invert_yaxis() # make it appear correctly for screen co-ordinates
    plt.suptitle("Gesture set")

Timeseries view

We can also see each gesture as a trajectory of its $x,y$ coordinates over time. This is closer to the way in which the matching will work.


In [ ]:
plt.figure()
for i in range(n_gestures):
    plt.subplot(n_gestures,2,i*2+1)    
    path = g.gestures[i]    
    plt.plot(path[:,0], '-C0')
    plt.plot(path[:,1], '-C1')
    plt.xlabel("Samples")
    plt.subplot(n_gestures,2,i*2+2)
    plt.plot(path[:,0], path[:,1], 'C0')
    plt.gca().invert_yaxis()
    plt.axis("equal")
    plt.axis("off")

get_template

We have a simple utility function get_template(i, t) which returns the $x,y$ co-ordinate for gesture $i$ at sample $t$. It automatically clips $t$ from 0 to the length of the gesture (in frames).


In [ ]:
# test get_template
for i in range(100):
    xy = g.get_template(1, i)
    plt.plot(xy[0], -xy[1], 'C1.')

Challenge

We want to recognize 2D gestures drawn with a mouse (or finger/stylus).

We know:

  • We observe sequences of $x,y$ coordinates over time.
  • We have some example templates for particular shapes that we want to match (e.g. letters)

We don't know:

  • where the user will start drawing a gesture
  • how big the gesture will be
  • how fast the user will draw the gesture (it may well be drawn at a non-constant speed)

We want to know:

  • which gesture the user is performing, if any
  • when the user has finished doing the gesture
  • the parameters of the movement, like speed of performance

Your task

You have to implement the four key elements of the model:

  • the transition function $f(x_t)$
  • the observation function $g(x_t)$
  • the prior sampling function $p()$
  • the weighting function $w(y_t, \hat{y_t})$

The rest of the machinery for recognition is implemented for you.

With these four components implemented correctly, you will have a working gesture recogniser.

The following description outlines in more detail how such a model would work:


Probabilistic view

Putting our assumptions into the probabilistic framework, we want to infer a probability distribution over gesture classes, parameters and gesture completion state, given a time series of $x,y$ coordinates. The $x,y$ points at each time step form our observation vector $y_t$.

We assume gesture reproduction is in some way a "noisy reproduction" of the ideal template form, where there are various types of distortion that can be encountered. We formulate our problem as inferring a distribution over the parameters of these distortions.

Markov approximation

We could look at the whole time series of $x,y$ points and try and classify that. However, there are two problems:

  • What is the "whole" time series -- i.e. how do we segment the gesture?
  • We would have to store the entire series and somehow match it against templates.

This is doable, but a simple way to eliminate these problems is to rewrite the recognizer so that it depends on nothing but its immediately previous state; i.e. so that it satisfies the Markov property.

To do this, we need to introduce additional variables into the state space; but with judicious choice these can be a very small number of additional variables. In particular, for this problem, we can just track how far along a gesture we are (the "phase") and update that over time. The phase state starts at zero and increments as the gesture is performed.
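The phase bookkeeping itself is tiny. A sketch of the idea for a single hypothetical particle (illustrative values, not the exercise solution):

In [ ]:
# sketch: Markovian phase tracking -- each update depends only on the previous state
phi, phi_dot = 0.0, 0.02            # hypothetical phase and rate for one particle
for step in range(60):
    phi = min(phi + phi_dot, 1.0)   # phase advances at its rate, saturating at "gesture complete"
print(phi)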

Parameters

We are now in a position to write down a model for our gesture recognizer.

State

First of all, the state we are trying to infer:

$${\bf x_t} = [i,s,x_c,y_c,\theta,\phi,\dot{\phi}]$$

We have one of $n$ possible gestures:

  • $i$ the class index of the gesture.

Our model says a gesture will be identical to the template for that class of gesture, but might vary in:

  • $s$ overall scale, within some tolerance
  • $x_c,y_c$ center position (could be anywhere on screen)
  • $\theta$ small changes in rotation (e.g. $<45^\circ$)

We must take note that what we observe is a position at a single time point in a gesture. This means we must also estimate how far through the gesture we are (a sketch of the resulting state layout follows this list):

  • $\phi$ the proportion of the gesture completed, as a fraction in $[0,1]$.
  • $\dot{\phi}$ the rate at which the gesture is being performed (i.e. fast or slow).
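In code, these seven state columns can be unpacked from the particle array in one step. A sketch, assuming the particles arrive as an n x 7 array in the column order above:

In [ ]:
# sketch: unpack the state columns of an n x 7 particle array
# (column order [i, s, x_c, y_c, theta, phi, phi_dot], as above)
particles = np.random.normal(0, 1, size=(5, 7))    # stand-in particle array
i, s, x_c, y_c, theta, phi, phi_dot = particles.T  # each is a length-n vector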

Implementation

Modify the sections in the code cells below to implement the filter as you see it working. Run the testing cell at the end to bring up the interactive recogniser and see how well you do in recognising the six gestures above.

Hints:

  • Write the observation function first, then the prior, then the dynamics, then adjust the weight function if needed.
  • Test the filter in between to see if it behaves as you might expect it to.
  • As a start, you might want to try tracking the gestures without allowing any geometric transformations, then add these in.

Model specification

Observation

  • Given a gesture $i$, we have a template $G_i(\phi)$, which returns an $x,y$ point for any value of $\phi$.

  • We expect to observe $\hat{{\bf y}}=AG_i(\phi)$, where $A$ is a transformation matrix applying the translation $x_c,y_c$, the scaling $s$ and the rotation $\theta$.

  • The utility function linear_transform() defined below is a useful component in implementing this:


In [ ]:
def linear_transform(xys, angle=0.0, scale=1.0, translate=(0,0)):
    """Takes a an n x 2 array of point `xys` and returns the 2D points transformed by
    rotating by `angle` (degrees)
    scaling by `scale` (proportional 1.0=no change, 0.5=half, etc.)
    translating by `translate` ((x,y) offset)"""
    ca, sa = np.cos(np.radians(angle)), np.sin(np.radians(angle))
    rot = np.array([[ca, -sa], 
                    [sa, ca]])
    return np.dot(xys, rot)*scale + np.array(translate)

In [ ]:
# demo of linear transform on points distributed in a unit square
original = np.random.uniform(0,1, size=(200,2))
transformed = linear_transform(original, angle=45, scale=1.5, translate=(-5,2))
plt.plot(transformed[:,0], transformed[:,1], '.')
plt.plot(original[:,0], original[:,1], '.')
plt.axis("square");


In [ ]:
def gesture_observation(state):
    # given an n x d matrix of n particle samples
    # return a n x 2 matrix of expected x,y, positions for that gesture model
    # dummy code, which returns random results:
    # obviously, this should be a transformation of the hidden state passed in!
    observation_matrix = np.random.normal(200,20,size=(state.shape[0], 2))
    return observation_matrix

Dynamics

We then need to specify some simple dynamics. Hint: these should allow the values to slowly change over time (i.e. some random drift), except for the phase $\phi$ which, by our model, we also expect to steadily increase at the rate $\dot{\phi}$. The gesture class should probably not change during a gesture!


In [ ]:
def gesture_dynamics(prev_states):
    # take an n x d array of particle samples
    # return an n x d array representing the next states
    # dummy code: apply no dynamics
    next_states = np.array(prev_states)
    return next_states

Priors

We then need to define our initial guesses for the state of the system, encoded as prior probability distributions.


In [ ]:
def gesture_prior(n):
    # return an n x d matrix with columns [i, s, x_c, y_c, \theta, \phi, \phi_dot] as an initial guess
    # these should draw values from suitable prior distributions
    # dummy code: choose a random class and leave the other variables fixed
    return np.stack([
        np.random.randint(0, n_gestures, size=n),  # i: a random class
        np.ones(n,),  # s
        np.zeros(n,), np.zeros(n,), # x_c, y_c
        np.zeros(n,),  # \theta
        np.zeros(n,), np.ones(n,)]).T # \phi, \phi_dot

# print some test samples from the prior
print(gesture_prior(3))

Weighting function

We again use the simple heat kernel (or RBF): $$w_i = \exp\left(-\frac{(y-y^\prime)^2}{2\beta^2}\right)$$ (the specific choice of weighting function is rarely very important, except if there are particularly unusual states to be compared).

The gesture_beta parameter may need to be adjusted. You can change this function to a different similarity measure if you want. Note the assumption that we only observe one $x,y$ location at any time $t$.


In [ ]:
def gesture_weight(hypothesized, true):
    # take a 2D observation (x,y)
    # and an n x 2 matrix of observation samples (returned from gesture_observation())
    # return the weight for each, representing how similar they are
    gesture_beta = 1000.0               # the RBF width

    # RBF similarity function       
    w = np.exp(-np.sum((hypothesized-true)**2, axis=1)/(2*gesture_beta**2))
    return w

Testing the model

Running the cell below will start the gesture recogniser using the functions passed.


In [ ]:
import gestures
gestures = reload(gestures)
gestures.interactive_recogniser(
    dynamics=gesture_dynamics,
    observation=gesture_observation,
    prior=gesture_prior,
    weight=gesture_weight,
    gestures=g.gestures)
%gui tk

Outlook


Scope and limitations

Scope

  • Probabilistic filters can be applied to many problems in HCI. Typically, if a process unfolds over time and there is uncertainty, a probabilistic filter is a strong candidate for inference.

  • The fact that inference is performed over time is a potential advantage over "static" classification approaches, as feedback can be generated on the fly, instead of only after completion of an action.

  • In the specific context of gestures, the ability to infer the start and end-point of gestures can solve the "segmentation problem" or "gesture spotting problem" that is often awkward and leads to kludges like button presses to segment actions.

  • Probabilistic motion models can easily be linked to higher-order probabilistic models which infer long-term actions on the part of the user. Because everything is a probability distribution, there is a simple common basis for integrating such models. This, for example, can include language models which estimate a distribution over text that is likely to be entered given both user input and a statistical model of language.

Limitations

  • PFs can be computationally intensive to run.
  • The curse of dimensionality can make the attractive simplicity of PFs work poorly in practice as the state space expands (although they often do better than you might expect).
  • Sometimes the inverse probability model can be hard to formulate. Conversely, it is sometimes very much easier.
  • Particle filters are simple and elegant, but inferentially weak.
  • Kalman filters are rigid and restrictive, but very inferentially efficient.
  • Hybrid approaches (Ensemble Kalman filter, Unscented Kalman Filter, hybrid particle/Kalman filters, Rao-Blackwellized filters) can trade these qualities off, but they aren't off-the-shelf solutions (i.e. you need an expert!).



Future of probabilistic filtering

Learned models

Much use of probabilistic filters has depended on strong mathematical models of the fundamental process. For example, in rocket science, sophisticated physics models were used to specify the Kalman filters used for stable control.

However, it is becoming increasingly possible to infer these models from observations. Techniques such as deep learning (for example variational autoencoders or generative adversarial networks) make it possible to learn very sophisticated generative models from observations of data. These models can be dropped into probabilistic filters to produce robust inferential engines for user interaction.

Example

As an illustrative example, we recently built a touch pose estimator that infers the pose of a finger from a capacitive sensor array (as found on a touch screen). We trained a deep convolutional network to predict finger pose from sensor images (inverse model), a separate deconvolutional CNN to predict sensor images from finger poses (forward model), and then fused these using a particle filter.

This combined model gives substantial robustness, and we were able to introduce a simple dynamics model, which filters out completely implausible movements.