Spatial regularisation example with eoldas_ng

J Gómez-Dans (NCEO & UCL)


In [1]:
# Initial set-up
import json
import urllib2
import matplotlib
s = json.load ( urllib2.urlopen("https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/raw/master/styles/bmh_matplotlibrc.json"))
matplotlib.rcParams.update(s)
%pylab inline
%config InlineBackend.figure_format = 'svg'
figsize( 7, 7)
from IPython.display import HTML
def css_styling():
    styles = "https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/raw/master/styles/custom.css"
    return HTML(styles)

css_styling()


Populating the interactive namespace from numpy and matplotlib

This section deals with spatial regularisation. In the last few examples we have always considered that things vary in time, and that we have some expectation of smoothness in time. An identical approach can be taken spatially, where we assume that the parameters of the system are spatially correlated. The same formalism can be applied in space, provided the site of interest is reasonably homogeneous, as the assumption of smoothness will tend to blur the borders between different fields, forests, etc. A spatial treatment of the data also allows us to merge data at different spatial resolutions, provided that we have a way to relate the state to each of the resolutions. In many cases, simple additive approaches can be very successful in doing this.

Creating some data

It is convenient to create some functionality to produce data. The data we will be looking at is an image of a given shape containing two smooth two-dimensional bumps. Noise will be added, and a fraction of the observations will be dropped to simulate, e.g., cloud cover in satellite data.


In [2]:
import sys
sys.path.append("..")
from eoldas_ng import *
import numpy as np
import matplotlib.pyplot as plt

from collections import OrderedDict



def gauss_kern(size, sizey=None):
    """ 
    Returns a normalized 2D gauss kernel array for convolutions 
    From http://www.scipy.org/Cookbook/SignalSmooth
    """
    import numpy as np
    size = int(size)
    if not sizey:
        sizey = size
    else:
        sizey = int(sizey)
    x, y = np.mgrid[-size:size+1, -sizey:sizey+1]
    g = np.exp(-(x**2/float(size)+y**2/float(sizey)))
    return g / g.sum()
def create_data ( nr,nc,n_per=1, noise=0.15, obs_off=0.1, \
                                        window_size=5):
    """
    Create synthetic "NDVI-like" data for a fictitious spatial field. We return
    the row and column grids, the original data, the noisy data (using IID
    Gaussian noise) and the QA flag.

    Missing observations are simulated by drawing a random number per pixel,
    optionally smoothing the random field (so that the gaps are spatially
    correlated) and flagging the fraction obs_off of pixels with the lowest
    values.

    Parameters
    ----------
    nr : number of rows
    nc : number of columns

    n_per : integer
    Sampling period. By default, every sample is used.

    noise : float
    The noise standard deviation. By default, 0.15.

    obs_off : float
    The fraction of observations flagged as missing.

    window_size : integer
    Size of the Gaussian kernel used to spatially correlate the missing
    observations. Set to 0 for spatially uncorrelated gaps.
    """
    import numpy as np
    from scipy import signal

    r,c = np.mgrid[1:nr+1:n_per,1:nc+1:n_per]
    ndvi_clean =  np.clip(np.sin(np.pi*2*(r-1)/nr)*np.sin(np.pi*2*(c-1)/nc), 0,1) 
    ndvi = ndvi_clean.copy()
    # add Gaussian noise of sd noise
    ndvi = np.random.normal(ndvi,noise,ndvi.shape)
     
    # set the qa flags for each sample to 1 (good data)
    qa_flag = np.ones_like ( ndvi).astype( np.int32 )
    passer = np.random.rand(ndvi.shape[0],ndvi.shape[1])
    if window_size >0:
        # force odd
        window_size = 2*(window_size/2)+1
        # smooth passer
        g = gauss_kern(window_size)
        passer = signal.convolve(passer,g,mode='same')
    # assign a proportion of the qa to 0 from an ordering of the smoothed 
    # random numbers
    qf = qa_flag.flatten()
    qf[np.argsort(passer,axis=None)[:int(passer.size * obs_off)]]  = 0
    qa_flag = qf.reshape(passer.shape)
    return ( r,c , ndvi_clean, ndvi, qa_flag )

With the previous helper function, we can now create some data:


In [3]:
cmap = plt.cm.spectral
cmap.set_bad ( '0.8' )
rows = 100
cols = 100
sigma_obs = 0.15
R, C, ndvi_true, ndvi_obs, qa_flag = create_data ( rows, cols, obs_off=0.3 )
plt.subplot(1, 2, 1)
plt.imshow ( ndvi_true, interpolation='nearest', vmin=0, vmax=1, cmap=cmap )
plt.xticks([])
plt.yticks([])
plt.subplot(1, 2, 2)
plt.imshow ( np.ma.array(ndvi_obs, mask=qa_flag==False), interpolation='nearest', \
            vmin=0, vmax=1, cmap=cmap )
plt.xticks([])
plt.yticks([])


[Figure: the "true" NDVI field (left) and the noisy observations with missing pixels masked in grey (right)]

In the above example, we are adding a significant amount of noise, as well as dropping 30% of the observations, a substantial amount. Let's see how eoldas_ng can be used to reconstruct the original data.

First, we define the state. As usual, the state will only have one parameter, which we'll call 'magnitude'. We need to supply a default value, bounds, and a state grid (a 2D array of the required shape). For simplicity we also create an initial estimate, x_dict, set to 0.25 everywhere. The variable the_state is defined with all that information:


In [4]:
state_config = { 'magnitude': VARIABLE } # We solve for a parameter called 
                                         # 'magnitude'
default_values = { 'magnitude': 0.5 } 
# The default value is 0.5 but since it's defined
# as VARIABLE above, this definition has no effect
parameter_min = OrderedDict ()
parameter_max = OrderedDict ()
parameter_min [ 'magnitude' ] = 0.
parameter_max [ 'magnitude' ] = 1.
state_grid = np.arange ( 1, rows*cols + 1 ).reshape((rows, cols)) # A 2d grid
# Now, just define the state
the_state = State ( state_config, state_grid, default_values, parameter_min, \
                   parameter_max)
x_dict = {'magnitude': np.ones(rows*cols)*0.25}

Next, we add a prior term stating that our prior belief is that the parameter of interest has a mean of 0.5 and a standard deviation of 2 (a very weak constraint). Note that the prior dictionary needs to be expressed in terms of the inverse covariance (here $1/2^2=0.25$), which, like the mean value, is identical for all pixels.


In [5]:
######################################################
###### The prior
##########################################################
mu_prior = { 'magnitude': np.array([0.5]) }
inv_cov_prior = { 'magnitude': np.array([1./(2*2)]) }
the_prior = Prior ( mu_prior, inv_cov_prior ) # Uninformative prior

the_state.add_operator ( "Prior", the_prior )
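
To make the quadratic form behind this prior concrete, here is a minimal sketch of the corresponding cost term for a diagonal inverse covariance. It is only an illustration of the mathematics; the Prior class handles this internally, and the function name is ours.

import numpy as np

def prior_cost_sketch(x, mu=0.5, inv_cov=1. / (2 * 2)):
    """Quadratic prior penalty 0.5 * (x - mu)^T C^-1 (x - mu) for a
    diagonal covariance C. Illustration only, not the eoldas_ng internals."""
    d = x - mu
    cost = 0.5 * np.sum(inv_cov * d * d)   # scalar cost
    grad = inv_cov * d                     # gradient with respect to x
    return cost, grad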

This is the spatial smoother. We need to give it a value for the regularisation constant, which in this case is set to 0.1. The state grid is also required. Note that this particular prior is a Markov Random Field with a first order neighbourhood, which by default uses 4 neighbours (up, down, left and right); it can also deal with 8 neighbours if required. The function that eoldas_ng implements is $$ J_{smooth} = -\frac{1}{2}\sum_{i}^{N_{rows}}\sum_{j}^{N_{cols}} \sum_{ii=-1}^{1}\sum_{jj=-1}^{1} \frac{(x_{i,j} - x_{i+ii,j+jj})^2}{\sigma_m^2}. $$ This is a way of writing the smoothness constraint without matrices, just as a number of nested summations. It basically means that for each pixel $(i,j)$ we consider its eight neighbours: $(i-1,j-1),\,(i,j-1),\,(i+1,j-1),\,(i-1,j),\,(i+1,j),\,(i-1,j+1),\,(i,j+1),\,(i+1,j+1)$. We assume that the smoothness term is isotropic and affects all directions equally (we ought to take into account the fact that the diagonal neighbours are a bit further away, though!), and ultimately, we are modelling the difference between pixel $x_{i,j}$ and its neighbours as a zero-mean Gaussian distribution with precision (inverse variance) given by $\gamma=\sigma_{m}^{-2}$.


In [6]:
######################################################
###### The spatial smoother (MRF)
##########################################################

the_smoother = SpatialSmoother ( state_grid, 0.1 )

the_state.add_operator ( "Smoother", the_smoother )
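
As an illustration of what the summation in $J_{smooth}$ amounts to, the short sketch below evaluates a 4-neighbour version of the penalty on a 2D array using shifted differences (each neighbouring pair counted once). It is only meant to clarify the equation, not to reproduce the SpatialSmoother implementation.

import numpy as np

def mrf_cost_sketch(x, gamma):
    """4-neighbour quadratic roughness penalty on a 2D field x.
    gamma plays the role of 1/sigma_m^2, and each neighbouring pair is
    counted once. Illustration only, not the SpatialSmoother code."""
    dx = x[:, 1:] - x[:, :-1]   # differences with the right-hand neighbour
    dy = x[1:, :] - x[:-1, :]   # differences with the neighbour below
    return 0.5 * gamma * (np.sum(dx ** 2) + np.sum(dy ** 2))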

The observational constraint term is very similar to the ones we have seen before: we need the data, a measure of uncertainty, and a mask or qa_flag to indicate which pixels have no observations.


In [7]:
######################################################
###### The observations
##########################################################

the_obs = ObservationOperator(state_grid, ndvi_obs, sigma_obs, qa_flag )

the_state.add_operator ( "Observations", the_obs )

At this stage, we can solve the inference problem with the_state.optimize and get an answer. However, we do not know what value of $\gamma$ to use. For expediency's sake, and given that we have the true data, we can calculate the RMSE between the inferred field and the true field and select the value of $\gamma$ that minimises it. If the true data were not available, we could do some sort of cross-validation, where some pixels are left out of the inversion and the prediction at those pixels is used to assess the success or otherwise of the assimilation experiment.


In [8]:
rmse = []
g_list = []
for gamma in np.logspace(-3,2,10):
    g_list.append ( gamma )
    the_smoother.gamma = gamma
    retval = the_state.optimize ( x_dict )
    rmse.append ( np.std( retval[0].reshape((rows,cols)) - ndvi_true ) )

Let's now plot the RMSE as a function of $\gamma$...


In [9]:
plt.semilogx ( g_list, rmse, 'o')


[Figure: RMSE of the inferred field as a function of $\gamma$]

The best solution appears to be with $\gamma=0.1668$, so we'll re-run the problem, and then plot the results:


In [19]:
the_smoother.gamma = 0.1668
retval = the_state.optimize ( x_dict )

In [20]:
plt.subplot(1, 3, 1)
plt.imshow ( ndvi_true, interpolation='nearest', vmin=0, vmax=1, cmap=cmap )
plt.xticks([])
plt.yticks([])
plt.title("True")
plt.subplot(1, 3, 2)
plt.imshow ( retval[0].reshape((rows, cols)), interpolation='nearest', \
            vmin=0, vmax=1, cmap=cmap )
plt.title("Inferred")
plt.xticks([])
plt.yticks([])
plt.subplot( 1, 3, 3)
plt.imshow ( np.ma.array(ndvi_obs, mask=qa_flag==False), interpolation='nearest', \
            vmin=0, vmax=1, cmap=cmap )
plt.title("Observed")
plt.xticks([])
plt.yticks([])


[Figure: true (left), inferred (centre) and observed (right) fields]

We see that we are doing a reasonable job both denoising and interpolating over the missing observations, although we have somewhat "cheated" by not doing a proper cross-validation! Ultimately, you will notice that all we have done is some smoothing and interpolation, but, critically, it is a coherent way to combine the information. For example, the prior might be more informative (it might be the result of a DA experiment at a previous date, with the uncertainty inflated in some way to cope with changes in the land surface between that last observation and now), and this method would still provide a consistent answer.
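
For completeness, a hold-out cross-validation along the lines mentioned above could look something like the sketch below: a fraction of the valid pixels is hidden from the inversion, the fit is re-run, and the error is scored on the held-out pixels. The fit_with_mask callable and the cv_fraction argument are our own constructs, not part of eoldas_ng; fit_with_mask would wrap the State/Prior/SpatialSmoother/ObservationOperator set-up shown above, passing it the reduced QA mask.

import numpy as np

def holdout_score(ndvi_obs, qa_flag, fit_with_mask, cv_fraction=0.1, seed=None):
    """Score a DA configuration on randomly held-out pixels.

    fit_with_mask is a user-supplied callable that runs the whole set-up
    (state, prior, smoother, observation operator) with the QA mask it is
    given and returns the inferred field as a 2D array. Sketch only."""
    rng = np.random.RandomState(seed)
    valid = np.flatnonzero(qa_flag.ravel() == 1)
    held = rng.choice(valid, int(cv_fraction * valid.size), replace=False)
    qa_cv = qa_flag.ravel().copy()
    qa_cv[held] = 0                              # hide held-out pixels from the fit
    inferred = fit_with_mask(qa_cv.reshape(qa_flag.shape))
    resid = inferred.ravel()[held] - ndvi_obs.ravel()[held]
    return np.sqrt(np.mean(resid ** 2))          # RMSE on the held-out pixels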

Let us now consider more practical aspects of spatial DA. One is that we can solve multi-resolution setups.

Multi-resolution processing

An annoying fact of EO data is that spatial resolutions are often different, spanning from a few metres to the kilometre. In some scenarios, moderate resolution data provide very frequent acquisitions, which makes it possible to capture the land surface even when clouds are persistent. Higher resolution data, due to typically smaller swaths, usually have fewer revisit opportunities, which can result in fewer actual observations. Additionally, some high resolution data might suffer from poor spectral or angular sampling. In any case, we want to combine both datasets in an optimal way to make full use of the information contained in both sets of observations.

In the present DA scheme, the resolution is only an issue for the observation operator. If we assume that two datasets at different resolutions are available (high and low resolution), both provide evidence on the state, which is defined on the high-resolution grid (say). What is needed is a way to map the state at high spatial resolution to the observation space. One simple way of doing this (if the data are aligned and linearly additive) is to just integrate the forward-modelled state over the low resolution gridcell and compare it to the actual low resolution observation. Let's see how this works in a bit more detail. For the high-resolution observations, the state and the observations are on the same grid, so the mismatch is straightforward:

$$ J_{obs, hires} =-\frac{1}{2}\sum_{i}^{N_{rows}}\sum_{j}^{N_{cols}} \frac{qa_{i,j}\cdot(x_{i,j} - x^{o,h}_{i,j})^2}{\sigma_{o,h}^2} $$

The fit to the low resolution observations using the integration rule is then simply

$$ J_{obs, lores} =-\frac{1}{2}\sum_{i}^{N_{rows}}\sum_{j}^{N_{cols}}\sum_{ii=0}^{N} \sum_{jj=0}^{M}\frac{qa_{i/N,j/M}\cdot(x_{i+ii,j+jj} - x^{o,l}_{i/N,j/M})^2}{\sigma_{o,l}^2}, $$

where we have introduced $N$ and $M$ as the ratio in sizes between rows and columns in the two datasets. In more complicated situations, where a point spread function (PSF) might be required, rather than a plain summation we would have an integration of the state over the low resolution PSF. Note how this is a very powerful statement that allows one to take in swath data, provided the PSF is known and the observations at the two resolutions are commensurate (or some model can be used to map the state to both observation spaces, for example by using an RT model).

So all that is needed is an observation operator that implements the second of the equations in the previous paragraph. We do this by using the factor option to ObservationOperator to provide $N$ and $M$. The other assumptions are that the data are co-registered and that $N$ and $M$ are integers.
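
To illustrate what the factor option implies, the sketch below block-averages a high-resolution field onto the coarse grid (assuming co-registered grids and integer factors $N$ and $M$) and evaluates the resulting quadratic mismatch with the low resolution observations. The function names are ours, and this is only one way of reading the integration rule above, not the ObservationOperator internals.

import numpy as np

def block_average(x_hires, N, M):
    """Aggregate a high-resolution 2D field onto a coarser grid by
    averaging non-overlapping N x M blocks of pixels."""
    nr, nc = x_hires.shape
    return x_hires.reshape(nr // N, N, nc // M, M).mean(axis=3).mean(axis=1)

def lores_cost_sketch(x_hires, obs_lores, qa_lores, sigma_lores, N=3, M=3):
    """Quadratic misfit between the aggregated state and the low resolution
    observations, skipping flagged (missing) pixels. Illustration only."""
    x_lo = block_average(x_hires, N, M)
    d = (x_lo - obs_lores) * (qa_lores == 1)
    return 0.5 * np.sum(d ** 2) / sigma_lores ** 2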

Let's start by creating the data. We will again use create_data, the utility function defined at the beginning. We will assume that the low resolution pixels are three times the size of the high resolution ones. We will also assume that the high-resolution data have 60% of the observations missing and a noise standard deviation of 0.2. By contrast, the low resolution data have 30% of the observations missing and a noise standard deviation of 0.15. This is a way of allowing for the likely increased observation opportunity of the low resolution data, as well as its improved signal to noise ratio.


In [21]:
rows = 90
cols = 90
sigma_obs = 0.15
R, C, ndvi_true_hires, ndvi_obs_hires, qa_flag_hires = create_data ( rows, cols, \
        obs_off=0.6, noise=0.20, window_size=11 )

cmap=plt.cm.spectral
cmap.set_bad('0.8')
plt.subplot ( 2,2,1 )
plt.imshow ( ndvi_true_hires, interpolation='nearest', vmin=0, vmax=1, cmap=cmap )
plt.xticks(visible=False)
plt.yticks(visible=False)
plt.ylabel("High Resolution")
plt.title("'True'")
plt.subplot ( 2,2,2)
plt.title("Observed")
plt.imshow ( np.ma.array(ndvi_obs_hires,mask=qa_flag_hires==0), \
    interpolation='nearest', vmin=0, vmax=1, cmap=cmap )
plt.xticks(visible=False)
plt.yticks(visible=False)
plt.ylabel("$\sigma=0.25$\nObsOff=60%%")
rows = 30
cols = 30
sigma_obs = 0.15
R, C, ndvi_true_lores, ndvi_obs_lores, qa_flag_lores = create_data ( rows, cols, \
        noise=0.15, obs_off=0.3)

cmap=plt.cm.spectral
cmap.set_bad('0.8')
plt.subplot ( 2,2,3 )
plt.imshow ( ndvi_true_lores, interpolation='nearest', vmin=0, vmax=1, cmap=cmap )
plt.xticks(visible=False)
plt.yticks(visible=False)


plt.ylabel("Low Resolution")
plt.subplot ( 2,2,4)
plt.imshow ( np.ma.array(ndvi_obs_lores,mask=qa_flag_lores==0), \
    interpolation='nearest', vmin=0, vmax=1, cmap=cmap )
plt.xticks(visible=False)
plt.yticks(visible=False)
plt.ylabel("$\sigma=0.15$\nObsOff=20%%")
cax = plt.axes([0.01, 0.05, 0.85, 0.05])
plt.colorbar(cax=cax,orientation='horizontal')


[Figure: 'true' and observed fields at high resolution (top) and low resolution (bottom), with missing pixels masked]

Let's start by defining the state. As you can see, this is exactly the same state as in the previous, single-resolution example:


In [22]:
rows, cols = 90, 90
state_config = { 'magnitude': VARIABLE } # We solve for a parameter called 
                                         # 'magnitude'
default_values = { 'magnitude': 0.5 } 
# The default value is 0.5 but since it's defined
# as VARIABLE above, this definition has no effect
parameter_min = OrderedDict ()
parameter_max = OrderedDict ()
parameter_min [ 'magnitude' ] = 0.
parameter_max [ 'magnitude' ] = 1.

x_dict = {'magnitude': np.ones(rows*cols)*0.25}
state_grid = np.arange ( 1, rows*cols + 1 ).reshape((rows, cols)) # A 2d grid
# Now, just define the state
the_state = State ( state_config, state_grid, default_values, parameter_min, \
                   parameter_max)

In [23]:
###################################################
### The prior
#######################################################
mu_prior = { 'magnitude': np.array([0.5]) }
inv_cov_prior = { 'magnitude': np.array([1./(2*2)]) }
the_prior = Prior ( mu_prior, inv_cov_prior ) # Uninformative prior

the_state.add_operator ( "Prior", the_prior )

the_smoother = SpatialSmoother ( state_grid, 0.1 )

the_state.add_operator ( "Smoother", the_smoother )

the_hires_obs = ObservationOperator(state_grid, ndvi_obs_hires, sigma_obs, \
                                    qa_flag_hires )

the_state.add_operator ( "HiRes Observations", the_hires_obs )

the_lores_obs = ObservationOperator ( state_grid, ndvi_obs_lores, sigma_obs, \
                                     qa_flag_lores, factor=[3,3])

the_state.add_operator ( "LoRes Observations", the_lores_obs )
#retval = the_state.optimize ( x_dict )

In [15]:
rmse = []
g_list = []
for gamma in np.logspace(-3,2,10):
    g_list.append ( gamma )
    the_smoother.gamma = gamma
    retval = the_state.optimize ( x_dict )
    rmse.append ( np.std( (retval[0].reshape((rows,cols)) - ndvi_true_hires ) ))

In [24]:
plt.semilogx ( g_list, rmse, 'o')
opt_gamma= g_list[np.argmin(rmse)]
print opt_gamma


0.16681005372

In [25]:
the_smoother.gamma = opt_gamma
retval = the_state.optimize ( x_dict )

In [28]:
plt.subplot ( 2,3,1 )
plt.imshow ( ndvi_true_hires, interpolation='nearest', vmin=0, vmax=1, cmap=cmap )
plt.xticks(visible=False)
plt.yticks(visible=False)
plt.ylabel("High Resolution")
plt.title("'True'")
plt.subplot ( 2,3,2)
plt.title("Observed")
plt.imshow ( np.ma.array(ndvi_obs_hires,mask=qa_flag_hires==0), \
    interpolation='nearest', vmin=0, vmax=1, cmap=cmap )
plt.xticks(visible=False)
plt.yticks(visible=False)
plt.ylabel("$\sigma=0.25$\nObsOff=60%")
cmap=plt.cm.spectral
cmap.set_bad('0.8')
plt.subplot ( 2,3,4 )
plt.imshow ( ndvi_true_lores, interpolation='nearest', vmin=0, vmax=1, cmap=cmap )
plt.xticks(visible=False)
plt.yticks(visible=False)


plt.ylabel("Low Resolution")
plt.subplot ( 2,3,5)
plt.imshow ( np.ma.array(ndvi_obs_lores,mask=qa_flag_lores==0), \
    interpolation='nearest', vmin=0, vmax=1, cmap=cmap )
plt.xticks(visible=False)
plt.yticks(visible=False)
plt.ylabel("$\sigma=0.15$\nObsOff=20%")
#cax = plt.axes([0.01, 0.05, 0.85, 0.05])
#plt.colorbar(cax=cax,orientation='horizontal')

plt.subplot ( 2,3,3)
plt.imshow ( retval[0].reshape((90,90)), \
    interpolation='nearest', vmin=0, vmax=1, cmap=cmap )

plt.title("Inferred")
plt.xticks(visible=False)
plt.yticks(visible=False)


[Figure: high and low resolution 'true' and observed fields, together with the inferred high resolution field]

At this stage, we have demonstrated how to use eoldas_ng to blend datasets of different spatial resolutions. This is achieved quite simply by instructing the observation operator that one pixel in one dataset is equivalent to a particular box in the state. More observations at different resolutions could be added in the same way. We haven't actually proved that this approach is optimal in the cross-validation sense, but that is an easy extension of this work that we encourage the reader to do.

