In [5]:

    
import json
import urllib2
import matplotlib
s = json.load ( urllib2.urlopen("https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/raw/master/styles/bmh_matplotlibrc.json"))
matplotlib.rcParams.update(s)
figsize( 11, 9)
%config InlineBackend.figure_format = 'svg'

Inverting the PROSAIL model with CHRIS/PROBA data using Gaussian Process emulators

J Gómez-Dans (NCEO & UCL)

Introduction

The aim of this document is to illustrate the usage of Gaussian Process (GPs) emulators to efficiently approximate a more costly function, as well as its partial derivatives. We illustrate this approach here using the PROSAIL radiative transfer model and CHRIS/PROBA atmospherically corrected surface reflectance.

The approach is as follows:

The RT model is run to produce a sample of outputs (directional reflectance) varying the input parameters (LAI, chlorophyll concentration, etc) for the required imaged geometry.
A GP emulator is trained to emulate the training set.
A set of cost functions are implemented using the emulator

This is a first attempt at usage. We shall only use a very simple inversion set-up.

The first stage is to import a number of utility functions:



In [6]:

    
import numpy as np
from lut_tools import lut, limits, samples, brdfModel, transform, invtransform, pack, unpack
from multivariate_gp import MultivariateEmulator
from scipy.optimize import *
import scipy.spatial
import time
import scipy.stats
from mpl_toolkits.axes_grid1 import ImageGrid

Creating the look-up table (LUT)

One can envisage the GP as a powerful look-up table (LUT), where pre-calculated outputs are stored and used in the inversion. We will create a database of model output (62 samples for CHRIS' 62 bands) as a function of parameters, for a single observation geometry (in this case, rather than use all of CHRIS' cameras, we will only use the nadir one). First, let's define the spectral and geometric aspects of the problem:



In [8]:

    
bands_txt = """410.501      405.633      415.244      9.61120
      441.286      436.253      446.572      10.3190
      451.155      446.572      455.890      9.31790
      460.776      455.890      465.796      9.90570
      470.956      465.796      476.259      10.4620
      481.712      476.259      487.298      11.0390
      491.899      487.298      496.596      9.29790
      501.417      496.596      506.357      9.76010
      511.431      506.357      516.657      10.3000
      522.043      516.657      527.573      10.9150
      531.848      527.573      536.238      8.66460
      542.221      536.238      548.447      12.2090
      553.251      548.447      558.182      9.73490
      563.245      558.182      568.430      10.2470
      573.769      568.430      579.258      10.8280
      582.975      579.258      586.800      7.54180
      592.636      586.800      598.645      11.8440
      604.822      598.645      611.168      12.5230
      615.512      611.168      619.917      8.74920
      624.419      619.917      628.969      9.05170
      633.636      628.969      638.395      9.42590
      643.218      638.395      648.135      9.73960
      653.172      648.135      658.276      10.1410
      663.488      658.276      668.815      10.5380
      674.242      668.815      679.748      10.9320
      682.559      679.748      685.324      5.57530
      688.174      685.324      691.070      5.74610
      693.963      691.070      696.883      5.81340
      699.860      696.883      702.841      5.95720
      705.839      702.841      708.888      6.04680
      711.954      708.888      715.058      6.17070
      718.192      715.058      721.320      6.26160
      724.529      721.320      727.724      6.40440
      730.932      727.724      734.266      6.54210
      737.563      734.266      740.901      6.63450
      744.231      740.901      747.686      6.78460
      751.100      747.686      754.582      6.89580
      758.096      754.582      761.634      7.05270
      765.172      761.634      768.737      7.10220
      772.388      768.737      776.076      7.33970
      779.735      776.076      783.491      7.41460
      787.287      783.491      791.046      7.55520
      794.949      791.046      798.750      7.70350
      802.702      798.750      806.614      7.86430
      810.582      806.614      814.631      8.01700
      831.065      822.750      839.468      16.7170
      843.759      839.468      848.038      8.57030
      852.458      848.038      856.874      8.83580
      861.241      856.874      865.688      8.81430
      870.174      865.688      874.757      9.06900
      879.316      874.757      883.899      9.14150
      888.505      883.899      893.240      9.34100
      898.019      893.240      902.751      9.51130
      907.533      902.751      912.337      9.58560
      917.255      912.337      922.147      9.81040
      927.173      922.147      932.027      9.88020
      942.121      932.027      952.277      20.2490
      957.371      952.277      962.585      10.3070
      967.744      962.585      972.983      10.3980
      978.331      972.983      983.534      10.5500
      988.911      983.534      994.152      10.6180
      999.543      994.152      1004.96      10.8060"""
# Convert the band info to an array. 
# Columns are centre, min, max and bandwidth
bands = np.fromstring ( bands_txt, sep="\n").reshape((62,4))

proba_min = bands[:, 1]
proba_max = bands[:, 2]
# Number of bands
n_bands = 62
# The bandpass functions are purely arrays
band_pass = np.zeros (( n_bands, 2101 ), dtype = np.bool )
bw = bands [ :, 3]
wv = np.arange ( 400, 2501 )
for i in xrange( n_bands ):
    band_pass[i,:] = np.logical_and ( wv >= proba_min[i], \
                      wv <= proba_max[i] )
# Geometry for NADIR camera only
sza = 22.4
saa = 134.7
vza = 19.4
vaa = 102.4
raa = vaa - saa

In order to create the LUT, we need to specify the parameters that we shall be varying, as well as their bounds (you might want to actually include prior information on this, but we will just use some reasoably large parameter spread). We also need to set the number of samples to use for the emulator, and additionally, an inddependent validation dataset to see how well the emulator performs. Note that we will work in transformed units. These transformations result in a small linearisation of the model, which both benefits the emulator and the conditioning of the actual problem we want to solve. See Weiss et al 2001



In [9]:

    
# Number of training samples
n_train = 200
# Number of validation samples
n_validate = 5000
# Parameter names
param_names = ['bsoil', 'cbrown', 'hspot', 'n', \
               'psoil', 'xala', 'xcab', 'xcar', \
               'xcm', 'xcw', 'xlai']
#training parameter arrays
train_brf = np.zeros ( ( n_train, n_bands ) )
validate_brf = np.zeros ( ( n_train, n_bands ) )
train_params = {}
# Set limits (hardcoded function)
( pmin, pmax ) = limits()
#  **Training** dataset using random sampling
strain = samples( n=n_train )

# **Validation** dataset using random sampling
svalidate = samples( n=n_validate )
# Output simulated reflectance arrays

rho_train = np.zeros (( n_train, n_bands ))
rho_validate = np.zeros (( n_validate, n_bands ))

# Do forward modelling of **training** dataset
a, trans_p, rho = brdfModel ( strain, vza, sza, raa )
# Apply bandpass functions
for j in xrange( n_bands ):
    rho_train[:, j] = rho[:, band_pass[j,:]].sum(axis=1)/bw[j]
# Transform parameters to "real units"
tstrain = transform ( strain )
train_parameters, param_names = unpack ( tstrain)

# Do forward modelling of **training** dataset
a, trans_p, rho = brdfModel ( svalidate, vza, sza, raa )
# Apply bandpass functions
for j in xrange( n_bands ):
    rho_validate[:, j] = rho[:, band_pass[j,:]].sum(axis=1)/bw[j]
# Transform parameters to "real units"
tsvalidate = transform ( svalidate )
validate_parameters, param_names = unpack ( tsvalidate )
# Create a bounds list-of-lists
bounds = []
# Change transformed LAI to be a bit smaller than 15
pmin['xlai'] = 0.018
for p in param_names:
    bounds.append ( [pmin[p], pmax[p]])

The emulator

At this stage, we have created a 200-sample LUT and a 5000-sample validation dataset. We are in a position to create and train a GP emulator. This is very easy to do with the MultivariateEmulator class, as we shall see. Finally, we will also use the emulator to forward model the validation dataset, to check how well we are doing. Note that this stage can take quite some time!



In [12]:

    
emulator = MultivariateEmulator ( X=rho_train, y=train_parameters )


res = np.array ( [ (emulator.predict( validate_parameters[i, :])[0] - rho_validate[i, :]) \
    for i in xrange(n_validate) ] )









    



Decomposing the input dataset into basis functions... Done!
 ====> Using 6 basis functions
	Fitting GP for basis function 0
	Fitting GP for basis function 1
	Fitting GP for basis function 2
	Fitting GP for basis function 3
	Fitting GP for basis function 4
	Fitting GP for basis function 5

Let's see how well the emulator works. Note that we are trying the emulator on a completely independent, much larger dataset. Let's plot the mean error as well as the $1\sigma$ error spread:



In [26]:

    
plt.plot ( bands[:,0], res.mean(axis=0))
plt.fill_between(bands[:,0], y1=res.mean(axis=0)-res.std(axis=0), 
    y2=res.mean(axis=0)+res.std(axis=0), color='0.8')
plt.xlabel("Wavelength[nm]")
plt.ylabel("Simulated-Emulated [-]")









    Out[26]:





<matplotlib.text.Text at 0x2b68dde479d0>

The above plot shows that the emulator is quite reasonable: the mean error is close to 0, and the spread is fairly small (between $\pm$ 0.01 and $\pm$ 0.025. We could use these figures to add this "model inadequacy error" to our analysis, or increase the number of training samples that went into the GP emulator to make the quality of the emulator higher (and hence the error negligible). Also note that the error is spread equally over all parameter space: shrinking this space by putting better limits or using a better space-filling algorithm could also improve on things. For this problem, we will assume that te quality of the emulator is good enough and can be ignored.

Application to CHRIS/PROBA data

Now let's load the CHRIS data, and extract a couple of spectra. We shall choose a high vegetation and a low vegetation pixel:



In [49]:

    
def get_proba ( xx = [150, 250], yy=[350, 250]):
    """Gets PROBA data from UCL storage, and plots a RGB composite"""
    import os
    import gdal
    proba_dir = "/data/netapp_4/ucfajlg/BARRAX/SPARC/" + \
        "SPARC_DB/SPARC2003/PROBA/corregidasAtm/corregidasAtm_120703"
    g = gdal.Open ( os.path.join ( proba_dir, "refl_030712_3598.tif" ) )
    R = g.GetRasterBand ( 18 ).ReadAsArray()/10000.
    G = g.GetRasterBand ( 14 ).ReadAsArray()/10000.
    B = g.GetRasterBand ( 5 ).ReadAsArray()/10000.
    RGB = np.dstack ( (R, G, B ))
    plt.figure()
    plt.imshow ( RGB, interpolation='nearest')
    plt.plot ( xx[0], yy[0], 'rx', ms=20 )
    plt.plot ( xx[1], yy[1], 'rx', ms=20 )
    plt.grid ( False)
    plt.xlim(0, 300)
    plt.ylim ( 0, 400 )
    ax = plt.gca()
    ax.set_xticklabels([])
    ax.set_yticklabels([])
    plt.figure()
    return g

g = get_proba ()









    
















    





<matplotlib.figure.Figure at 0xa968750>

We can now use the validation dataset as a LUT to get a first guess of what the inversion of each of these spectra could be:



In [50]:

    
rhop = []
buf = g.ReadRaster(151, 351, 1, 1)
rhop.append(np.frombuffer(buf, dtype=np.int16).T/10000.)
plt.plot ( bands[:,0],np.frombuffer(buf, dtype=np.int16).T/10000.)
rhof = []
buf = g.ReadRaster(250, 150, 1, 1)
rhof.append(np.frombuffer(buf, dtype=np.int16).T/10000.)
plt.plot ( bands[:,0],np.frombuffer(buf, dtype=np.int16).T/10000.)
plt.xlabel("Wavelength[nm]")
plt.ylabel("Reflectance[-]")

iloc_rhof = np.sum(( rho_validate - rhof)**2,axis=1).argmin()
iloc_rhop = np.sum(( rho_validate - rhop)**2,axis=1).argmin()
xstart_rhof = unpack (  tsvalidate )[0][iloc_rhof,:]
xstart_rhop = unpack (  tsvalidate )[0][iloc_rhop,:]
plt.plot ( bands[:,0], rho_validate[iloc_rhof,:], '--')
plt.plot ( bands[:,0], rho_validate[iloc_rhop,:], '--')









    Out[50]:





[<matplotlib.lines.Line2D at 0x2b68fdab5390>]

We can see that the LUT inversion is still quite far from the real observed data, particularly for the low vegetation pixel. Let's use now minimise a cost function to improve on this!



In [55]:

    
def der_cost_obs (  x, rho, emulator ):
    rho_sim, rho_der = emulator.predict ( x )
    cost = 0.5*np.sum ( ( rho - rho_sim )  **2 )
    der_cost = - ( rho - rho_sim )*rho_der
    return cost, der_cost.sum(axis=1)

retvalf = scipy.optimize.fmin_l_bfgs_b(der_cost_obs, xstart_rhof, fprime=None, args=(rhof, emulator ), \
                                       bounds = bounds, iprint=0 )
x_sol_f = invtransform( pack(retvalf[0], param_names) )

retvalp = scipy.optimize.fmin_l_bfgs_b(der_cost_obs, xstart_rhop, fprime=None, args=(rhop, emulator ), \
                                       bounds=bounds, iprint=0 )
x_sol_p = invtransform( pack(retvalp[0], param_names) )


plt.plot ( bands[:,0], rhof[0], '-', lw=1.5 )
plt.plot ( bands[:,0], rhop[0], '-', lw=1.5 )

plt.plot ( bands[:,0], emulator.predict ( retvalf[0] )[0], '--', lw=1.5 )
plt.plot ( bands[:,0], emulator.predict ( retvalp[0] )[0], '--', lw=1.5 )
print x_sol_f
print x_sol_p









    



{'bsoil': 1.6315730116275207, 'lai': 0.56595971912702969, 'cm': 0.033099999999999997, 'car': 25.300000000000001, 'n': 0.80000000000000004, 'cbrown': 0.052634483238933528, 'psoil': 1.0, 'cab': 0.19999999999999879, 'ala': 0.0, 'cw': 0.0043, 'hspot': 0.999}
{'bsoil': 0.73667273751630358, 'lai': 1.4769214180742407, 'cm': 0.0036988426129459196, 'car': 18.865752912244663, 'n': 1.3958834485461418, 'cbrown': 0.0094298596364187567, 'psoil': 0.98784249531496715, 'cab': 47.700276413695875, 'ala': 0.0, 'cw': 0.01723756602046855, 'hspot': 0.83171341352636186}

So, a better fit, but still not great parameter values! We could improve on this providing better prior distributions (effectively, we've just bound the prior space usinb bounds above)

Adding a prior

The previous example shows that retrieving so many parameters, even with an informative spectral configuration, is riddled with complexities. Many parameter combinations will result in a reasonable fit to the data, and note that in the previous example, we have not even added any uncertainty in the observations due to thermal, quantisation or noise arising from atmospheric correction.

One way to improve on this is to place prior distributions on the parameters. Priors encode our knowledge about the problem before we even look at the data. Many different kinds of prior knowledge are typically available, and the real difficulty is to phrase this knowledge in a consistent and quantitative way.

For the case above, we shall basically assume that the parameters are distributed a priori with a Gaussian distribution with a given mean value and a given standard deviation. We will assume no correlation between parameters. We will define the distribution in the transformed variables space. Clearly, the choice of priors is subjective, and the ones chosen here are just given as an example:



In [114]:

    
# Define Untransformed parameter names
parameter_names = ['bsoil', 'cbrown','hspot','n','psoil','ala','cab','car','cm','cw','lai']
# The prior mean, in transformed units
t_prior = {}
t_prior['bsoil'] = 1.0
t_prior['cbrown'] = 0.2
t_prior['hspot'] = 0.5
t_prior['n'] = 1.5
t_prior['psoil'] = 0.5
t_prior['xala'] = 0.5
t_prior['xcab'] = 0.6
t_prior['xcar'] = 0.95
t_prior['xcm'] = 0.5
t_prior['xcw'] = 0.4
t_prior['xlai'] = 0.2
# Define the variance, again, in transformed units
cov = np.array ( [2, 0.5, 0.005, 0.5, 0.2, 0.05, 0.2, 0.005, 0.5, 0.05,  0.95] )
# The **inverse** covariance matrix is diagonal, with elements equal to 1./sigma^2
inv_cov = np.diag(1./cov)
x_mean = np.zeros((11))
print "%-16s\t%12s\t%12s%12s" % ( "Parameter", "Mean", "Mean-sigma", "Mean+sigma")
for i,p in enumerate(param_names):
    j = parameter_names[i]
    x_mean[i] = t_prior[p]
    
    u, v = invtransform(dict(zip([p],[x_mean[i] + np.sqrt(cov[i])])))[j], \
        invtransform(dict(zip([p],[x_mean[i] - np.sqrt(cov[i])])))[j]
    if u > v:
        print "%-16s\t%12.2G\t%12.2G%12.2G" % ( j, invtransform(dict(zip([p],[x_mean[i]])))[j], v, u)
    else:
        print "%-16s\t%12.2G\t%12.2G\t%12.2G" % ( j,  invtransform(dict(zip([p],[x_mean[i]])))[j], u, v)









    



Parameter       	        Mean	  Mean-sigma  Mean+sigma
bsoil           	           1	       -0.41         2.4
cbrown          	         0.2	       -0.51        0.91
hspot           	         0.5	        0.43        0.57
n               	         1.5	        0.79         2.2
psoil           	         0.5	       0.053        0.95
ala             	          45	          25          65
cab             	          51	        -4.6	     1.9E+02
car             	         5.1	          -2	          13
cm              	      0.0069	     -0.0019	         NAN
cw              	       0.018	      0.0094	       0.035
lai             	         3.2	       -0.32	         NAN

The above table shows the prior spread. We can see that the values are not very sensible (some parameters go negative), but remember that we also have set boundaries for the parameters, so hopefully, we will not get stupid results back. We will define a new cost function, that itself requires the fit to the observations we used above function, plus a prior one:



In [115]:

    
def der_cost_prior ( x, x_mean, cov ):
    cost = 0.5* ( x-x_mean).dot ( \
            cov ).dot ( x - x_mean )
    der_cost = ( x-x_mean).dot ( \
            cov )
    return cost, der_cost

def der_cost_obs (  x, rho, emulator ):
    rho_sim, rho_der = emulator.predict ( x )
    cost = 0.5*np.sum ( ( rho - rho_sim )  **2 )
    der_cost = - ( rho - rho_sim )*rho_der
    return cost, der_cost.sum(axis=1)

def der_cost ( x, rho, x_mean, cov, emulator ):
    cost1, dcost1 = der_cost_obs ( x, rho, emulator )
    cost2, dcost2 = der_cost_prior ( x, x_mean, cov )
    return cost1+cost2, dcost1+dcost2



In [118]:

    
retvalf = scipy.optimize.fmin_l_bfgs_b(der_cost, xstart_rhof, fprime=None, args=(rhof, x_mean, inv_cov, emulator ), bounds = bounds, iprint=0 )
x_sol_f = invtransform( pack(retvalf[0], param_names) )

retvalp = scipy.optimize.fmin_l_bfgs_b(der_cost, xstart_rhop, fprime=None, args=(rhop, x_mean, inv_cov, emulator ), bounds=bounds, iprint=0 )
x_sol_p = invtransform( pack(retvalp[0], param_names) )
plt.plot ( bands[:,0], rhof[0], '-', lw=1.5 )
plt.plot ( bands[:,0], rhop[0], '-', lw=1.5 )
plt.plot ( bands[:,0], emulator.predict ( retvalf[0] )[0], '--', lw=2.5 )
plt.plot ( bands[:,0], emulator.predict ( retvalp[0] )[0], '--', lw=2.5 )
print "%-16s\t%12s\t%12s%12s" % ( "Parameter", "Prior Mean", "rhof", "rhop")
for i,p in enumerate(param_names):
    j = parameter_names[i]    
    print "%-16s\t%12.2G\t%12.2G%12.2G" % ( j, invtransform(dict(zip([p],[x_mean[i]])))[j], x_sol_f[j], x_sol_p[j])









    



Parameter       	  Prior Mean	        rhof        rhop
bsoil           	           1	         1.2         1.1
cbrown          	         0.2	       0.069        0.14
hspot           	         0.5	         0.5         0.5
n               	         1.5	         1.6         1.5
psoil           	         0.5	        0.52         0.5
ala             	          45	          42          43
cab             	          51	         0.2          43
car             	         5.1	         5.1         5.1
cm              	      0.0069	      0.0077      0.0062
cw              	       0.018	       0.018       0.018
lai             	         3.2	         3.6         5.1

We inmediately see that imposing priors results in a poorer fit to the observations: we are trading off our expectation of the parameter value with the fit to the observation. Interestingly, we can see how far the posterior estimate has shifted from the prior: it appears that not very much for many parameters (ala, cw, where we don't expect much information will be available from this type of observation are important). LAI is interesting: we have a fairly tight prior, so the bare soil pixel shows up with a large lai... What would happen if we loosen the priors on LAI and chlorophyll?



In [123]:

    
# The prior mean, in transformed units
t_prior = {}
t_prior['bsoil'] = 1.0
t_prior['cbrown'] = 0.2
t_prior['hspot'] = 0.5
t_prior['n'] = 1.5
t_prior['psoil'] = 0.5
t_prior['xala'] = 0.5
t_prior['xcab'] = 0.6
t_prior['xcar'] = 0.95
t_prior['xcm'] = 0.5
t_prior['xcw'] = 0.4
t_prior['xlai'] = 0.2
# Define the variance, again, in transformed units
cov = np.array ( [2, 0.5, 0.005, 0.5, 0.2, 0.05, 0.5, 0.005, 0.5, 0.05,  1.95] )
# The **inverse** covariance matrix is diagonal, with elements equal to 1./sigma^2
inv_cov = np.diag(1./cov)
x_mean = np.zeros((11))
print "%-16s\t%12s\t%12s%12s" % ( "Parameter", "Mean", "Mean-sigma", "Mean+sigma")
for i,p in enumerate(param_names):
    j = parameter_names[i]
    x_mean[i] = t_prior[p]
    
    u, v = invtransform(dict(zip([p],[x_mean[i] + np.sqrt(cov[i])])))[j], \
        invtransform(dict(zip([p],[x_mean[i] - np.sqrt(cov[i])])))[j]
    if u > v:
        print "%-16s\t%12.2G\t%12.2G%12.2G" % ( j, invtransform(dict(zip([p],[x_mean[i]])))[j], v, u)
    else:
        print "%-16s\t%12.2G\t%12.2G\t%12.2G" % ( j,  invtransform(dict(zip([p],[x_mean[i]])))[j], u, v)









    



Parameter       	        Mean	  Mean-sigma  Mean+sigma
bsoil           	           1	       -0.41         2.4
cbrown          	         0.2	       -0.51        0.91
hspot           	         0.5	        0.43        0.57
n               	         1.5	        0.79         2.2
psoil           	         0.5	       0.053        0.95
ala             	          45	          25          65
cab             	          51	         -27	         NAN
car             	         5.1	          -2	          13
cm              	      0.0069	     -0.0019	         NAN
cw              	       0.018	      0.0094	       0.035
lai             	         3.2	       -0.94	         NAN



In [124]:

    
retvalf = scipy.optimize.fmin_l_bfgs_b(der_cost, xstart_rhof, fprime=None, args=(rhof, x_mean, inv_cov, emulator ), bounds = bounds, iprint=0 )
x_sol_f = invtransform( pack(retvalf[0], param_names) )

retvalp = scipy.optimize.fmin_l_bfgs_b(der_cost, xstart_rhop, fprime=None, args=(rhop, x_mean, inv_cov, emulator ), bounds=bounds, iprint=0 )
x_sol_p = invtransform( pack(retvalp[0], param_names) )
plt.plot ( bands[:,0], rhof[0], '-', lw=1.5 )
plt.plot ( bands[:,0], rhop[0], '-', lw=1.5 )
plt.plot ( bands[:,0], emulator.predict ( retvalf[0] )[0], '--', lw=2.5 )
plt.plot ( bands[:,0], emulator.predict ( retvalp[0] )[0], '--', lw=2.5 )
print "%-16s\t%12s\t%12s%12s" % ( "Parameter", "Prior Mean", "rhof", "rhop")
for i,p in enumerate(param_names):
    j = parameter_names[i]    
    print "%-16s\t%12.2G\t%12.2G%12.2G" % ( j, invtransform(dict(zip([p],[x_mean[i]])))[j], x_sol_f[j], x_sol_p[j])









    



Parameter       	  Prior Mean	        rhof        rhop
bsoil           	           1	         1.5           1
cbrown          	         0.2	        0.13        0.15
hspot           	         0.5	         0.5         0.5
n               	         1.5	         1.5         1.5
psoil           	         0.5	        0.61         0.5
ala             	          45	          44          44
cab             	          51	         0.2          34
car             	         5.1	         5.2         5.1
cm              	      0.0069	      0.0068      0.0065
cw              	       0.018	       0.018       0.018
lai             	         3.2	        0.93         6.9

What we see is that the values are more in line with what we expect, although the fit to the bare soil is still quite poor (this could be due to the spectrum used for soil in PROSAIl being quite different from the one in Barrax, and also due to missing BRDF effects on the soil).

Application to the image

We can now use the previous approach to invert individual pixels over a large chunk of the image, and see how things progress. This is just a loop, the only "clever" thing is that we select the starting point to be either the LUT response (efficiently implemented by a KD-Tree), or the value of the previous pixel, whichver has a lowest cost value. We assume that in a "piecwise continuous" landscape as this, this will be a beneficial approach and put the solver closer to the actual solution than starting somewhere randomly.

Note that this code snippet takes a whole to run



In [128]:

    
# Read a spatial chunk from the image
print time.asctime(), "Reading data...", 
proba = g.ReadAsArray()[:,100:300, 100:200]/10000.
print "  Read!"
# We create a KDtree for the LUT starting point
#tree = scipy.spatial.cKDTree(np.r_[rho_validate])
#A=tree.query ( proba.reshape((62,200*100)).T, k=1 )[1]
#proba_rho = proba.reshape((62,200*100))
big_X = np.empty((11, 200*100))
xstart = validate_parameters[A[0]]
#xprev = validate_parameters[A[0]]
for i in xrange(200*100):
    rhog = proba_rho[:,i]
#    s1 = der_cost ( xstart, rhog, x_mean, inv_cov, emulator )[0]
#    s2 = der_cost ( xprev, rhog, x_mean, inv_cov, emulator )[0]
#    if s1 < s2:
#        x0 = xstart
#    else:
#        x0 = xprev
    x0 = xstart
    retvalp = scipy.optimize.fmin_l_bfgs_b(der_cost, x0, fprime=None, \
            args=(rhog, x_mean, inv_cov, emulator ), bounds=bounds, iprint=-1 )
    xprev = retvalp[0]    
    x_sol_p = invtransform( pack(retvalp[0], param_names) )
    if i % 1000 == 0:
        print time.asctime(), i, A[i], x_sol_p['lai']
    big_X[:, i] = np.array([ x_sol_p[k] for k in parameter_names ])









    



Fri Aug 30 15:32:10 2013 Reading data...   Read!
Fri Aug 30 15:32:10 2013 0 1871 1.94005869494
Fri Aug 30 15:36:39 2013 1000 2912 2.69729600626
Fri Aug 30 15:41:16 2013 2000 4634 5.40156319836
Fri Aug 30 15:46:00 2013 3000 2912 2.61225496971
Fri Aug 30 15:50:45 2013 4000 2136 8.03476704217
Fri Aug 30 15:55:26 2013 5000 2229 1.9223299441
Fri Aug 30 16:00:00 2013 6000 1871 1.96246680888
Fri Aug 30 16:04:45 2013 7000 4324 3.95847221431
Fri Aug 30 16:09:39 2013 8000 3540 0.568826933573
Fri Aug 30 16:14:37 2013 9000 1598 3.74848105587
Fri Aug 30 16:19:43 2013 10000 2912 2.97909658293
Fri Aug 30 16:24:21 2013 11000 1871 0.324467222357
Fri Aug 30 16:29:00 2013 12000 2229 1.50937741556
Fri Aug 30 16:33:33 2013 13000 2229 1.88822018574
Fri Aug 30 16:38:01 2013 14000 1511 1.69344838928
Fri Aug 30 16:42:12 2013 15000 684 0.638358526729
Fri Aug 30 16:46:39 2013 16000 1726 3.83883160296
Fri Aug 30 16:51:11 2013 17000 3565 2.77436526508
Fri Aug 30 16:55:46 2013 18000 3565 2.6187078667
Fri Aug 30 17:00:20 2013 19000 3565 2.4204589001



In [132]:

    
tabl = dict(zip ( ['bsoil', 'cbrown','hspot','n','psoil','ala','cab','car','cm','cw','lai','RGB'], [0,1,2,3,4,5,6,7,8,9,10]))
X = big_X.reshape((11, 200, 100))

fig = plt.figure ( 1, figsize=(11.69, 8.27))
grid = ImageGrid(fig, 111, # similar to subplot(111)
                   nrows_ncols = (2, 6),
                   direction="row",
                   axes_pad = 0.5,
                   add_all=True,
                   label_mode = "all",
                   share_all = True,
                   cbar_location="top",
                   cbar_mode="each",
                   cbar_size="5%",
                   cbar_pad="5%",
                   )
cmap = plt.cm.PiYG_r
cmap.set_under ( 'k' )
cmap.set_over ( 'k' )
P =  ['bsoil', 'cbrown','hspot','n','psoil','ala','cab','car','cm','cw','lai','RGB']
major_formatter = plt.FormatStrFormatter('%6.4f')
for ax, ( i, p) in zip ( grid, enumerate(P)):
    if p == "RGB":
        R = g.GetRasterBand ( 18 ).ReadAsArray()[100:300, 100:200]/10000.
        G = g.GetRasterBand ( 14 ).ReadAsArray()[100:300, 100:200]/10000.
        B = g.GetRasterBand ( 5 ).ReadAsArray()[100:300, 100:200]/10000.
        RGB = np.dstack ( (R, G, B ))
        im = ax.imshow( RGB, interpolation="nearest")
        ax.set_xticks ( [] )
        ax.set_yticks ( [] )
        ax.set_xlabel(p)
    else:
        
        vmin = scipy.stats.scoreatpercentile( X[tabl[p],:,:].flatten(), 1)
        vmax = scipy.stats.scoreatpercentile( X[tabl[p],:,:].flatten(), 95)
        smin = scipy.stats.scoreatpercentile( X[tabl[p],:,:].flatten(), 5)
        smax = scipy.stats.scoreatpercentile( X[tabl[p],:,:].flatten(), 95)
    
        im = ax.imshow( X[tabl[p], :, :], interpolation="nearest", vmin=vmin, vmax=vmax, cmap=cmap)
        ax.set_xticks ( [] )
        ax.set_yticks ( [] )
        C=ax.cax.colorbar(im )
        ax.set_xlabel(p)
        ax.cax.set_xticks ( [smin, smax])
        ax.cax.xaxis.set_major_formatter ( major_formatter )
        ax.cax.tick_params(labelsize=8) 
        print p, vmin, vmax









    



bsoil 0.934942237703 1.46921045466
cbrown 0.0941322926124 0.200497758433
hspot 0.500039828126 0.500771793426
n 1.49661623914 1.5421838917
psoil 0.494675562205 0.589523819769
ala 42.0045493899 45.1848717118
cab 0.2 49.5107879597
car 5.06408827978 5.2219026216
cm 0.00490497776193 0.00738317171237
cw 0.0182261962209 0.018331313704
lai 0.18891298192 8.03476704217

The results need to be interpreted with relationship to the prior. As mentioned before, there are quite a few parameters that do not move significantly away from the prior due to the little sensitivity of the observations to the processes controlled by those parameters. It appears that only cab, LAI, bsoil and dry matter are significantly different from the prior.



In [ ]: