A simple temporal data assimilation example

J Gómez-Dans


In [1]:
# Initial set-up
import json
import urllib2
import matplotlib
s = json.load ( urllib2.urlopen("https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/raw/master/styles/bmh_matplotlibrc.json"))
matplotlib.rcParams.update(s)
%pylab inline
%config InlineBackend.figure_format = 'svg'
figsize( 7,7 )
from IPython.display import HTML
def css_styling():
    styles = "https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/raw/master/styles/custom.css"
    return HTML ( urllib2.urlopen ( styles ).read() ) # fetch the stylesheet contents and render them

css_styling()


Populating the interactive namespace from numpy and matplotlib
Out[1]:

Introduction

This example shows a very simple application of the eoldas_ng library (http://github.com/jgomezdans/eoldas_ng/). The library implements a family of components that allow the user to build up (and solve) complex data assimilation problems. That is, we provide a flexible framework for stating and solving problems where observations (and their imperfections) and prior knowledge (including mechanistic models) are combined to infer the most likely state of the system (plus its associated uncertainty). eoldas_ng implements a 4DVAR mechanism. Under the assumptions of linearity and normality, DA can be seen as the minimisation of a cost function. This cost function is a formalism for representing the compromise between fitting imperfect observations and fitting imperfect models.

The aim is to build a simple example of how the different terms that make up this cost function can be used. The example is one typically found in many areas of science where time series are used: we obtain some experimental data that has noise added to it and, due to imperfections in the acquisition system, data gaps. We shall see how to solve this problem in a DA framework.

The state

Crucial to our effort is the definition of the state of the system. This is the quantity of interest: an estimate of the state given all the observations, their limitations and as much additional information as we might have. In EOLDAS, the State class is the basic building block: it defines the state, collects the information containers for all the other pieces of information, and allows us to combine them optimally. The State requires some initialisation:

  1. a state_config dictionary. This dictionary has keys for each of the elements of the state (these could be e.g. temperature, leaf area index, turbidity...). Each entry is associated with one of three keys defined in eoldas_ng: FIXED, CONSTANT or VARIABLE. The first indicates that a given value is used for that particular part of the state (i.e., we assume that the parameter is known and prescribed). CONSTANT indicates that in a dynamic (temporal or spatial) problem, the state has a single scalar representation of that particular variable (i.e., all time periods share the same value of this parameter, as with FIXED, but now we are actually solving for it). Finally, VARIABLE is interpreted as solving for a value of the parameter for each time or space position.
  2. a state_grid array that defines the topology of the problem. It could be a one-dimensional array with the times we will estimate the state, or a 2D array that defines a grid on which the state is defined spatially, for example.
  3. a default_values dictionary that sets the default value for each parameter (i.e., the value that will be substituted in for a FIXED parameter).

In Python, we do this as follows:


In [2]:
import sys
sys.path.append("..")
from collections import OrderedDict # parameter bounds are stored in ordered dictionaries
from eoldas_ng import *
state_config = { 'magnitude': VARIABLE } # We solve for a parameter called 'magnitude'
default_values = { 'magnitude': 0.5 } # The default value is 0.5 but since it's defined
                                      # as VARIABLE above, this definition has no effect
parameter_min = OrderedDict ()
parameter_max = OrderedDict ()
parameter_min [ 'magnitude' ] = 0.
parameter_max [ 'magnitude' ] = 1.
state_grid = np.arange ( 1, 366 ) # A daily time series with 1 day sampling
# Now, just define the state
the_state = State ( state_config, state_grid, default_values, parameter_min, \
                   parameter_max)

The data

Let's assume that we have an annual time series of a given magnitude. It could be anything that changes slowly with time, such as the temporal trajectory of a vegetation index. Typically, these are smooth, as the phenomena that drive the signal (vegetation development, etc.) change slowly and continuously. In the case of a vegetation index, missing observations will happen because of orbital considerations, and also due to observational availability (cloud cover, nighttime overpasses, etc.). Let's write a Python function to create some synthetic data:


In [3]:
def create_data ( sigma_obs ):
    """Creates a 365-step long observations, contaminated with 
    additive Gaussian iid noise of standard deviation `sigma_obs`. 
    Random observations will be dropped by setting `obs_mask` to 
    `False`. We return the time axis, the original "clean" data, the
    noisy observations and the observation mask."""
    x = np.arange(1,366 )
    y_smooth = (1-np.cos(np.pi*x/366.)**2)
    y_noisy = y_smooth + np.random.randn(365)*sigma_obs
    obs_mask = np.random.rand(x.shape[0])
    obs_mask = np.where ( obs_mask > 0.7, True, False )

    return ( x, y_smooth, y_noisy, obs_mask )

In [4]:
sigma_obs = 0.15 # Observational uncertainty
( x, y_smooth, y_noisy, obs_mask ) = create_data ( sigma_obs )
plt.plot ( x, y_smooth, '-', lw=1.5, label="'True' data")
yN = np.ma.array ( y_noisy, mask=~obs_mask)
plt.plot ( x, yN, 's', label="Noisy data" )
plt.xlabel ("Time")
plt.ylabel("Magnitude")
plt.legend(loc='best')


Out[4]:
<matplotlib.legend.Legend at 0x1e7c8410>

Now, we need to define an Operator for dealing with the data. Typically, if we assume (like here) that data are corrupted by Gaussian iid noise of variance $\sigma^2$, the negative log-likelihood function is given (up to an additive constant) by $$ J_{obs} = \frac{1}{2}\left( \mathbf{y} - \mathcal{H}(\mathbf{x})\right)^{T}\mathbf{C}_{obs}^{-1}\left( \mathbf{y} - \mathcal{H}(\mathbf{x})\right), $$ where

$\mathbf{x}$ is the state vector

$\mathbf{y}$ is the observations vector. If an observation is missing, it does not contribute to this term.

$\mathbf{C}_{obs}^{-1}$ is the inverse covariance matrix of the observations.

$\mathcal{H}(\mathbf{x})$ is the observation operator. This operator allows us to map the state vector into the vector of the observations (for example, you might want to use some radiative transfer model to map from LAI, etc. to albedo or radiance).

A way of understanding this term is that we want to reduce the difference between $\mathbf{y}$ and $\mathcal{H}(\mathbf{x})$ so that they agree, but always modulated by how much trust we put in the observations (the covariance matrix).
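To make the algebra concrete, here is a minimal, self-contained sketch of this data-misfit term and its gradient for the iid Gaussian case with the identity observation operator. It does not use eoldas_ng at all; the function name gaussian_misfit and its arguments are just illustrative:

def gaussian_misfit ( x, y, sigma_obs, obs_mask ):
    """Cost J_obs and its gradient for iid Gaussian noise and H(x)=x.
    Only time steps where `obs_mask` is True contribute to the cost."""
    resid = np.where ( obs_mask, x - y, 0. ) # H(x) - y on observed steps, 0 elsewhere
    cost = 0.5 * np.sum ( resid**2 ) / sigma_obs**2
    grad = resid / sigma_obs**2              # dJ/dx vanishes where there is no observation
    return cost, grad

For instance, gaussian_misfit ( y_smooth, y_noisy, sigma_obs, obs_mask ) measures how far the true curve sits from the noisy samples, in units of the observational uncertainty. In eoldas_ng, this data-misfit term is encapsulated in an observation operator. In code: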


In [5]:
the_observations = ObservationOperator ( state_grid, y_noisy, sigma_obs, obs_mask)

The Operator classes are special because they have two specific functions: der_cost and der_der_cost. The former calculates $J_{obs}$ as above, but also its partial derivatives (the Jacobian). der_der_cost calculates the Hessian (second order partial derivatives matrix). We'll see why later. It is expected that the user will be able to subclass the original classes and adapt them to her circumstances.
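For the Gaussian term above with the identity operator, the second derivatives are particularly simple: the Hessian is just the inverse observational covariance on observed time steps and zero elsewhere. A small sketch complementing the hypothetical gaussian_misfit function above (again, this is not the library's actual der_der_cost):

def gaussian_misfit_hessian ( sigma_obs, obs_mask ):
    """Hessian of the iid Gaussian misfit with H(x)=x: a diagonal matrix with
    1/sigma_obs**2 on observed time steps and 0 where there is no observation."""
    return np.diag ( np.where ( obs_mask, 1./sigma_obs**2, 0. ) )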

In this particular case, $\mathcal{H}(\mathbf{x})$ is the identity operator ($\mathcal{H}(\mathbf{x})=\mathbf{x}$), but others might be required. Note that for the calculation of derivatives, the derivatives of $\mathcal{H}(\mathbf{x})$ will be required. If not available, they can be evaluated numerically.
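If the observation operator is nonlinear and its analytical derivatives are not available, one simple option is a forward finite-difference approximation of its Jacobian. The sketch below is generic rather than eoldas_ng code; H stands for any callable mapping the state vector into observation space, and eps is an arbitrary step size:

def numerical_jacobian ( H, x, eps=1e-6 ):
    """Forward finite-difference approximation of dH/dx at the state vector x."""
    f0 = H ( x )
    jac = np.zeros ( ( f0.size, x.size ) )
    for i in range ( x.size ):
        x_pert = x.copy()
        x_pert[i] += eps
        jac[:, i] = ( H ( x_pert ) - f0 ) / eps
    return jac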

the_observations now contains the information on the observations. We add this to the_state using the add_operator method. This method takes a string naming the cost function (this will be used for logging later; not yet implemented) and the operator itself:


In [6]:
the_state.add_operator ( "Observations", the_observations )

We can now "solve" this simple problem (in fact, this is the maximum likelihood approach), and minimise $J_{obs}$. The problem is that our data is noisy and incomplete, and therefore, the solution would be undefined for points with no observations. But as an example of how it works, it's useful. Basically, the procedure will minimise $J_{obs}$. This minimisation will be done efficiently using an optimiser that follows the partial derivatives to descend to the lowest point in the cost function (that's why the partial derivatives are required), or at least the point where the partial derivatives are 0 that are close to a given starting point. To minimise, we need to give the minimiser a starting point. You can choose whatever you want. A good starting point makes the minimisation faster, and might avoid local minima being selected to the detriment of a global minimum. Let's start with the true value, so we can take a look at how this works:


In [7]:
retval = the_state.optimize( y_smooth )
plt.plot ( x, retval[0], '-', label="Inferred")
plt.plot(x, yN, 'o', label="Obs")
plt.legend(loc='best')


Out[7]:
<matplotlib.legend.Legend at 0x2ab62008d1d0>

What's happened here? As we start with the correct answer, it is retained where there are no observations. But wherever we do have observations, we immediately see the state move towards them, because all we are doing is trying to minimise the distance between one point in the curve and the observations. There is no other constraint. Also note that if the observations go above 1 or below 0 (the upper and lower boundaries for the optimisation), the optimiser does not allow the state to fit the observations perfectly.

What is interesting to note is that this solution is quite spiky: it has the right shape (because we provided the exact true values where there are no observations), but we immediately see that this jaggedness is unnatural in a slowly-evolving system: smooth transitions are to be expected. This high-frequency noise is usually smoothed out using filtering or smoothing techniques.
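One way to impose that expectation of smoothness is to add a term to the cost function that penalises large changes between consecutive time steps. A minimal sketch of such a first-order difference penalty (and its gradient) is shown here; the exact formulation used by TemporalSmoother may differ (for example in the order of the differences), so treat this only as an illustration of the idea:

def smoothness_penalty ( x, gamma ):
    """First-order difference penalty J_model = 0.5*gamma*sum((x[t+1] - x[t])**2)."""
    d = np.diff ( x )
    cost = 0.5 * gamma * np.sum ( d**2 )
    grad = np.zeros_like ( x )
    grad[:-1] -= gamma * d   # contribution of each difference to x[t]
    grad[1:]  += gamma * d   # contribution of each difference to x[t+1]
    return cost, grad

With the observations, the prior and such a model term in place, the total cost being minimised is, schematically, $$ J(\mathbf{x}) = J_{obs}(\mathbf{x}) + J_{prior}(\mathbf{x}) + J_{model}(\mathbf{x}), $$ where $J_{prior}=\frac{1}{2}\left(\mathbf{x}-\mathbf{x}_{p}\right)^{T}\mathbf{C}_{p}^{-1}\left(\mathbf{x}-\mathbf{x}_{p}\right)$ is the same quadratic form written around the prior mean $\mathbf{x}_{p}$ and prior covariance $\mathbf{C}_{p}$, and $J_{model}$ is the smoothness term weighted by the regularisation constant $\gamma$.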

Now, let's put everything together: the prior, the "dynamic model" (or smoother) and the observations:


In [8]:
mu_prior = { 'magnitude': np.array([0.5]) }
inv_cov_prior = { 'magnitude': np.array([1./(2*2)]) }
the_prior = Prior ( mu_prior, inv_cov_prior ) # Uninformative prior
the_model = TemporalSmoother ( state_grid, gamma=5e3 )
the_state.add_operator ( "Prior", the_prior )
the_state.add_operator ( "Model", the_model )


retval = the_state.optimize( np.random.rand(y_smooth.size) )
plt.plot ( x, retval[0], '-', label="Inferred")
plt.plot ( x, y_smooth, '-', label="True")
plt.plot(x, yN, 'o', label="Obs")
plt.legend(loc='best')
plt.title("RMSE=%f" % ( np.std ( y_smooth - retval[0] ) ) )


Out[8]:
<matplotlib.text.Text at 0x1ebc49d0>

So we see that, for a suitable value of the regularisation constant, we get a reasonable reconstruction of the underlying trend. Note how this breaks down for the first sample: edge effects take precedence there, and the first sample essentially reverts to the prior value of 0.5.

But what if we had some prior idea of what the annual trajectory of our magnitude looks like? For example, we might guess that it follows a particular trend, with some uncertainty around it:


In [9]:
y_prior = 0.2 + 0.8 * np.sin(np.pi*x/365.)**5
plt.plot(x, y_prior )


Out[9]:
[<matplotlib.lines.Line2D at 0x1ef52390>]

In [10]:
mu_prior = { 'magnitude': y_prior }
inv_cov_prior = { 'magnitude': np.array([1./(0.2*0.2)]) }
the_prior = Prior ( mu_prior, inv_cov_prior ) # Informative prior, sigma = 0.2
the_model = TemporalSmoother ( state_grid, gamma=5e3 )
the_state.add_operator ( "Prior", the_prior )
the_state.add_operator ( "Model", the_model )


retval = the_state.optimize( y_smooth )
plt.plot ( x, retval[0], '-', label="Inferred")
plt.plot ( x, y_smooth, '-', label="True")
plt.plot(x, y_prior, '-', label="Prior" )
plt.plot(x, yN, 'o', label="Obs")
plt.legend(loc='best')
plt.title("RMSE=%f" % ( np.std ( y_smooth - retval[0] ) ) )


Out[10]:
<matplotlib.text.Text at 0x2ab6202304d0>

The previous case shows what happens when we place a lot of trust in the prior (the standard deviation was set to 0.2, a fairly tight value). In fact, the fit is worse than with the very broad prior we started with (which basically stated ignorance). Let's loosen it a bit:


In [11]:
mu_prior = { 'magnitude': y_prior }
inv_cov_prior = { 'magnitude': np.array([1./(0.4*0.4)]) }
the_prior = Prior ( mu_prior, inv_cov_prior ) # Looser informative prior, sigma = 0.4
the_model = TemporalSmoother ( state_grid, gamma=5e3 )
the_state.add_operator ( "Prior", the_prior )
the_state.add_operator ( "Model", the_model )


retval = the_state.optimize( y_smooth )
plt.plot ( x, retval[0], '-', label="Inferred")
plt.plot ( x, y_smooth, '-', label="True")
plt.plot(x, y_prior, '-', label="Prior" )
plt.plot(x, yN, 'o', label="Obs")
plt.legend(loc='best')
plt.title("RMSE=%f" % ( np.std ( y_smooth - retval[0] ) ) )


Out[11]:
<matplotlib.text.Text at 0x1eda0890>

So we get a better fit now than we started with.

Exercises

  1. Try different priors and try to ascertain the effect they have on the output. Instead of using an RMSE figure (or in addition to it), you could also think about looking at the maximum absolute deviation, for example.
  2. Try to find an optimal value for the regularisation constant (sweep through it, look at the RMSE and find a value that minimises it; a possible starting point is sketched after this list). Then change the priors, and see how that optimal value changes.
  3. In this case, y_smooth is the truth. In most cases, we don't have the luxury of comparing to the truth. How would you go about finding what the optimal regularisation constant is?
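As a starting point for exercise 2, the sweep could look something like the sketch below. The range of gamma values is arbitrary, and a fresh State is built at each iteration on the assumption that reusing one would accumulate operators; adapt as needed:

gammas = np.logspace ( 1, 6, 11 ) # candidate regularisation constants (illustrative range)
rmses = []
for gamma in gammas:
    test_state = State ( state_config, state_grid, default_values, parameter_min, \
                        parameter_max )
    test_state.add_operator ( "Observations", the_observations )
    test_state.add_operator ( "Prior", Prior ( { 'magnitude': np.array([0.5]) }, \
                             { 'magnitude': np.array([1./(2*2)]) } ) )
    test_state.add_operator ( "Model", TemporalSmoother ( state_grid, gamma=gamma ) )
    retval = test_state.optimize ( y_smooth )
    rmses.append ( np.sqrt ( np.mean ( ( y_smooth - retval[0] )**2 ) ) )
plt.semilogx ( gammas, rmses, 'o-' )
plt.xlabel ( r"$\gamma$" )
plt.ylabel ( "RMSE" )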
