In [1]:
# Initial set-up
import json
import urllib2
import matplotlib
s = json.load ( urllib2.urlopen("https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/raw/master/styles/bmh_matplotlibrc.json"))
matplotlib.rcParams.update(s)
%pylab inline
%config InlineBackend.figure_format = 'svg'
figsize( 7,7 )
from IPython.display import HTML
def css_styling():
    styles = "https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/raw/master/styles/custom.css"
    return HTML(styles)
css_styling()
Out[1]:
This example shows a very simple application of the eoldas_ng library (http://github.com/jgomezdans/eoldas_ng/). The library implements a family of components that allow the user to build up (and solve) complex data assimilation (DA) problems. That is, it provides a flexible framework for stating and solving problems where observations (and their imperfections) and prior knowledge (including mechanistic models) can be put together to infer the most likely state of the system (plus its associated uncertainty). eoldas_ng implements a 4DVAR mechanism. Under the assumptions of linearity and normality, DA can be seen as the minimisation of a cost function. This cost function is a formalism for representing the compromise between fitting imperfect observations and fitting imperfect models.
The aim is to build a simple example of how the different bits that make up this cost function can be used. The example is one typically found in many areas of science where time series are used: we obtain some experimental data that has noise added to it and, due to imperfections in the acquisition system, data gaps. We shall see how to solve this problem in a DA framework.
Crucial to our effort is the definition of the state of the system. This is the quantity of interest: an estimate of the state given all the observations, their limitations, and as much additional information as we might have. In EOLDAS, the State class is the basic building block: it defines the state, collects the information containers for all the other pieces of information, and allows us to combine them optimally. The State requires some initialisation:

- state_config: a dictionary with a key for each element of the state (these could be e.g. temperature, leaf area index, turbidity...). Each entry is associated with one of three values defined in eoldas_ng: FIXED, CONSTANT or VARIABLE. FIXED indicates that a given value is used for that particular part of the state (i.e., we assume that the parameter is known and prescribed). CONSTANT indicates that in a dynamic (temporal or spatial) problem, the state has a single scalar representation of that particular variable (i.e., all time periods share the same value of this parameter, as with FIXED, but now we are actually solving for it). Finally, VARIABLE is interpreted as solving for a value of the parameter at each time or space position.
- state_grid: an array that defines the topology of the problem. It could be a one-dimensional array with the times at which we will estimate the state, or a 2D array that defines a grid on which the state is defined spatially, for example.
- default_values: a dictionary that sets the default value for each parameter (i.e., the value that will be substituted in for a FIXED parameter).

In Python, we do this as follows:
In [2]:
import sys
sys.path.append("..")
from collections import OrderedDict
from eoldas_ng import *
state_config = { 'magnitude': VARIABLE } # We solve for a parameter called 'magnitude'
default_values = { 'magnitude': 0.5 }    # The default value is 0.5, but since the parameter
                                         # is defined as VARIABLE above, it has no effect
parameter_min = OrderedDict ()
parameter_max = OrderedDict ()
parameter_min [ 'magnitude' ] = 0.
parameter_max [ 'magnitude' ] = 1.
state_grid = np.arange ( 1, 366 ) # A daily time series with 1 day sampling
# Now, just define the state
the_state = State ( state_config, state_grid, default_values, parameter_min, \
                    parameter_max )
Let's assume that we have an annual time series of a given magnitude. It could be anything that changes slowly with time, such as the temporal trajectory of a vegetation index. Typically, these are smooth, as the phenomena that drive the signal (vegetation development, etc.) change slowly and continuously. In the case of a vegetation index, observations will be missing because of orbital considerations, as well as observational availability (cloud cover, nighttime overpasses, etc.). Let's write a Python function to create some synthetic data:
In [3]:
def create_data ( sigma_obs ):
    """Creates a 365-step observation time series, contaminated with
    additive Gaussian iid noise of standard deviation `sigma_obs`.
    Random observations are dropped by setting `obs_mask` to
    `False`. We return the time axis, the original "clean" data, the
    noisy observations and the observation mask."""
    x = np.arange ( 1, 366 )
    y_smooth = ( 1 - np.cos(np.pi*x/366.)**2 )
    y_noisy = y_smooth + np.random.randn(365)*sigma_obs
    obs_mask = np.random.rand ( x.shape[0] )
    obs_mask = np.where ( obs_mask > 0.7, True, False )
    return ( x, y_smooth, y_noisy, obs_mask )
In [4]:
sigma_obs = 0.15 # Observational uncertainty
( x, y_smooth, y_noisy, obs_mask ) = create_data ( sigma_obs )
plt.plot ( x, y_smooth, '-', lw=1.5, label="'True' data")
yN = np.ma.array ( y_noisy, mask=~obs_mask)
plt.plot ( x, yN, 's', label="Noisy data" )
plt.xlabel ("Time")
plt.ylabel("Magnitude")
plt.legend(loc='best')
Out[4]:
Now, we need to define an Operator for dealing with the data. Typically, if we assume (as here) that the data are corrupted by Gaussian iid noise of variance $\sigma^2$, the negative log-likelihood function is given (up to an additive constant) by
$$
J_{obs} = \frac{1}{2}\left( \mathbf{y} - \mathcal{H}(\mathbf{x})\right)^{T}\mathbf{C}_{obs}^{-1}\left( \mathbf{y} - \mathcal{H}(\mathbf{x})\right),
$$
where
- $\mathbf{x}$ is the state vector,
- $\mathbf{y}$ is the observations vector (if an observation is missing, it does not contribute to this term),
- $\mathbf{C}_{obs}^{-1}$ is the inverse covariance matrix of the observations,
- $\mathcal{H}(\mathbf{x})$ is the observation operator, which maps the state vector into the vector of observations (for example, you might want to use a radiative transfer model to map from LAI, etc. to albedo or radiance).
A way of understanding this term is that we want to reduce the difference between $\mathbf{y}$ and $\mathcal{H}(\mathbf{x})$, so they agree, but always modulated by how much trust we put in the observations (the covariance matrix). In code:
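For intuition, here is a minimal stand-alone sketch of this cost term for the case used in this example: an identity observation operator and a diagonal observation covariance. Note that this is an illustrative function written for this notebook, not the eoldas_ng implementation:

```python
import numpy as np

def cost_obs ( x, y, obs_mask, sigma_obs ):
    """Observation cost J_obs = 0.5*sum(((y-x)/sigma)**2) over the
    unmasked observations, plus its gradient with respect to x.
    Assumes the identity observation operator H(x) = x and a
    diagonal covariance matrix with constant sigma_obs."""
    resid = np.where ( obs_mask, y - x, 0. )  # missing obs contribute nothing
    cost = 0.5*np.sum ( (resid/sigma_obs)**2 )
    grad = -resid/sigma_obs**2                # dJ/dx_i = -(y_i - x_i)/sigma^2
    return cost, grad
```

Note how the mask simply zeroes out the residuals: a missing observation adds nothing to the cost or to the gradient, which is exactly the behaviour described above.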
In [5]:
the_observations = ObservationOperator ( state_grid, y_noisy, sigma_obs, obs_mask)
The Operator classes are special because they have two specific methods: der_cost and der_der_cost. The former calculates $J_{obs}$ as above, but also its partial derivatives (the Jacobian). der_der_cost calculates the Hessian (the matrix of second-order partial derivatives). We'll see why later. It is expected that the user will be able to subclass the original classes and adapt them to her circumstances.
In this particular case, $\mathcal{H}(\mathbf{x})$ is the identity operator ($\mathcal{H}(\mathbf{x})=\mathbf{x}$), but others might be required. Note that the calculation of the derivatives of the cost requires the derivatives of $\mathcal{H}(\mathbf{x})$. If these are not available, they can be evaluated numerically.
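As an aside, evaluating the Jacobian of $\mathcal{H}$ numerically usually means central differences. The following stand-alone sketch (written for this notebook, not part of eoldas_ng) illustrates the idea for any vector-valued operator:

```python
import numpy as np

def numerical_jacobian ( H, x, eps=1e-6 ):
    """Approximate the Jacobian of an observation operator H at x by
    central differences. H maps a state vector of size n to an
    observation vector of size m; the result is an (m, n) matrix."""
    x = np.asarray ( x, dtype=float )
    m = np.atleast_1d ( H(x) ).size
    jac = np.zeros ( (m, x.size) )
    for i in range ( x.size ):
        dx = np.zeros_like ( x )
        dx[i] = eps
        # Central difference: (H(x+dx) - H(x-dx)) / (2*eps)
        jac[:, i] = ( np.atleast_1d(H(x + dx)) - np.atleast_1d(H(x - dx)) )/(2.*eps)
    return jac
```

For the identity operator used here, this simply recovers the identity matrix, which is why the derivatives of $J_{obs}$ are so cheap in this example.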
the_observations now contains the information on the observations. We add this to the_state using the add_operator method. This method takes a string naming the cost function (to be used for logging later; not yet implemented), and the operator itself:
In [6]:
the_state.add_operator ( "Observations", the_observations )
We can now "solve" this simple problem (in fact, this is the maximum likelihood approach) and minimise $J_{obs}$. The problem is that our data are noisy and incomplete, and therefore the solution is undefined at points with no observations. But as an example of how it works, it's useful. Basically, the procedure will minimise $J_{obs}$. This minimisation is done efficiently using an optimiser that follows the partial derivatives to descend to the lowest point of the cost function (that's why the partial derivatives are required), or at least to the point closest to a given starting point where the partial derivatives are 0. To minimise, we need to give the minimiser a starting point. You can choose whatever you want: a good starting point makes the minimisation faster, and might avoid a local minimum being selected to the detriment of the global minimum. Let's start with the true value, so we can take a look at how this works:
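To make the mechanics concrete, here is a stand-alone sketch of this kind of bound-constrained, gradient-following minimisation using scipy.optimize (an illustration of the idea only; eoldas_ng's internal optimiser may well differ):

```python
import numpy as np
from scipy.optimize import minimize

def max_likelihood ( y, obs_mask, sigma_obs, x0, bounds=(0., 1.) ):
    """Minimise J_obs for the identity observation operator with a
    gradient-based optimiser (L-BFGS-B), respecting per-element box
    bounds, starting from x0."""
    def cost_and_grad ( x ):
        resid = np.where ( obs_mask, y - x, 0. )
        cost = 0.5*np.sum ( (resid/sigma_obs)**2 )
        grad = -resid/sigma_obs**2
        return cost, grad
    res = minimize ( cost_and_grad, x0, jac=True, method="L-BFGS-B",
                     bounds=[bounds]*x0.size )
    return res.x
```

With every point observed, the minimiser simply returns the observations clipped to the bounds; where observations are missing, the gradient is zero and the starting value is left untouched, which is exactly the behaviour we see in the next cell.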
In [7]:
retval = the_state.optimize( y_smooth )
plt.plot ( x, retval[0], '-', label="Inferred")
plt.plot(x, yN, 'o', label="Obs")
plt.legend(loc='best')
Out[7]:
What's happened here? As we start with the correct answer, it is retained where there are no observations. But wherever we do have observations, we immediately see how the state moves towards them, because all we are doing is minimising the distance between points on the curve and the observations; there is no other constraint. Also note that if the observations go above 1 or below 0 (the upper and lower boundaries for the optimisation), the optimiser does not allow the state to fit the observations perfectly.
It is interesting to note that this solution is quite spiky: it has the right shape (because we provided the exact true values where there are no observations), but we immediately see that this jaggedness is unnatural in a slowly evolving system: smooth transitions are to be expected. This high-frequency noise is usually smoothed out using filtering or smoothing techniques.
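One common form of such a smoothness constraint is a penalty on first differences, $J_{model}=\frac{\gamma}{2}\sum_{t}(x_{t+1}-x_{t})^{2}$. The stand-alone sketch below illustrates this term and its gradient (the exact difference order used by eoldas_ng's TemporalSmoother may differ; this is only the general idea):

```python
import numpy as np

def cost_smooth ( x, gamma ):
    """First-order difference regulariser
    J_model = 0.5*gamma*sum(diff(x)**2) and its gradient.
    Penalises jagged solutions; gamma sets how strongly."""
    d = np.diff ( x )
    cost = 0.5*gamma*np.sum ( d**2 )
    grad = np.zeros_like ( x )
    grad[:-1] -= gamma*d   # dJ/dx_t     gets -gamma*(x_{t+1} - x_t)
    grad[1:]  += gamma*d   # dJ/dx_{t+1} gets +gamma*(x_{t+1} - x_t)
    return cost, grad
```

Because this term is also quadratic with an analytic gradient (and Hessian), it slots into the same gradient-based minimisation as the observation term: the optimiser simply minimises the sum of the two costs.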
Now, let's put everything together: the prior, the "dynamic model" (or smoother) and the observations:
In [8]:
mu_prior = { 'magnitude': np.array([0.5]) }
inv_cov_prior = { 'magnitude': np.array([1./(2*2)]) }
the_prior = Prior ( mu_prior, inv_cov_prior ) # Uninformative prior
the_model = TemporalSmoother ( state_grid, gamma=5e3 )
the_state.add_operator ( "Prior", the_prior )
the_state.add_operator ( "Model", the_model )
retval = the_state.optimize( np.random.rand(y_smooth.size) )
plt.plot ( x, retval[0], '-', label="Inferred")
plt.plot ( x, y_smooth, '-', label="True")
plt.plot(x, yN, 'o', label="Obs")
plt.legend(loc='best')
plt.title("RMSE=%f" % ( np.std ( y_smooth - retval[0] ) ) )
Out[8]:
So we see that for a suitable value of the regularisation constant, we get a reasonable reconstruction of the underlying trend. Note how this breaks down for the first sample, as edge effects take precedence here and the first sample is just assumed to be the prior value of 0.5.
But what if we had some prior idea of what the annual trajectory of our magnitude looks like? For example, we might guess that it follows a particular trend, with some uncertainty around it:
In [9]:
y_prior = 0.2 + 0.8 * np.sin(np.pi*x/365.)**5
plt.plot(x, y_prior )
Out[9]:
In [10]:
mu_prior = { 'magnitude': y_prior }
inv_cov_prior = { 'magnitude': np.array([1./(0.2*0.2)]) }
the_prior = Prior ( mu_prior, inv_cov_prior ) # Informative prior (sd=0.2)
the_model = TemporalSmoother ( state_grid, gamma=5e3 )
the_state.add_operator ( "Prior", the_prior )
the_state.add_operator ( "Model", the_model )
retval = the_state.optimize( y_smooth )
plt.plot ( x, retval[0], '-', label="Inferred")
plt.plot ( x, y_smooth, '-', label="True")
plt.plot(x, y_prior, '-', label="Prior" )
plt.plot(x, yN, 'o', label="Obs")
plt.legend(loc='best')
plt.title("RMSE=%f" % ( np.std ( y_smooth - retval[0] ) ) )
Out[10]:
The previous case shows what happens when we put a lot of trust in the prior (the standard deviation was set to 0.2, a fairly tight value). In fact, the fit is worse than with the very broad prior we started with (which basically stated ignorance). Let's loosen it a bit:
In [11]:
mu_prior = { 'magnitude': y_prior }
inv_cov_prior = { 'magnitude': np.array([1./(0.4*0.4)]) }
the_prior = Prior ( mu_prior, inv_cov_prior ) # Informative prior (sd=0.4)
the_model = TemporalSmoother ( state_grid, gamma=5e3 )
the_state.add_operator ( "Prior", the_prior )
the_state.add_operator ( "Model", the_model )
retval = the_state.optimize( y_smooth )
plt.plot ( x, retval[0], '-', label="Inferred")
plt.plot ( x, y_smooth, '-', label="True")
plt.plot(x, y_prior, '-', label="Prior" )
plt.plot(x, yN, 'o', label="Obs")
plt.legend(loc='best')
plt.title("RMSE=%f" % ( np.std ( y_smooth - retval[0] ) ) )
Out[11]:
So we get a better fit now than the one we started with. Remember, though, that y_smooth is the truth, and in most cases we don't have the luxury of comparing to the truth. How would you go about finding the optimal regularisation constant?
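One common answer (an illustrative approach, not something eoldas_ng-specific) is hold-out cross-validation: hide a fraction of the available observations, solve for a range of regularisation constants, and keep the value that best predicts the hidden observations. A stand-alone sketch for the identity-operator, first-difference-smoother case, where the quadratic problem can be solved in closed form:

```python
import numpy as np

def solve_smoother ( y, obs_mask, sigma_obs, gamma ):
    """Closed-form minimiser of
    J(x) = 0.5*sum_obs(((y-x)/sigma)**2) + 0.5*gamma*sum(diff(x)**2),
    i.e. solve (C_inv + gamma*D^T D) x = C_inv y, with an identity
    observation operator and a diagonal observation covariance."""
    n = y.size
    D = np.diff ( np.eye(n), axis=0 )          # first-difference operator
    C_inv = np.diag ( obs_mask/sigma_obs**2 )  # zero rows where no observation
    A = C_inv + gamma*np.dot ( D.T, D )
    return np.linalg.solve ( A, np.dot ( C_inv, y ) )

def cross_validate ( y, obs_mask, sigma_obs, gammas, holdout_frac=0.3, seed=0 ):
    """Choose gamma by hiding a fraction of the available observations,
    fitting with each candidate gamma, and scoring the fit (RMSE) on
    the hidden observations only."""
    rng = np.random.RandomState ( seed )
    idx = np.where ( obs_mask )[0]
    held = rng.choice ( idx, size=max(1, int(holdout_frac*idx.size)),
                        replace=False )
    train_mask = obs_mask.copy()
    train_mask[held] = False
    scores = []
    for gamma in gammas:
        x_hat = solve_smoother ( y, train_mask, sigma_obs, gamma )
        scores.append ( np.sqrt ( np.mean ( (x_hat[held] - y[held])**2 ) ) )
    return gammas[np.argmin(scores)], scores
```

The hidden observations play the role of the truth: a gamma that is too small reproduces the noise, one that is too large flattens the signal, and both predict the held-out points poorly.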