Uncertainty in Environmental Modelling

May 7 2013 - UNESCO IHE - Delft


In [3]:
%pylab inline


 
Welcome to pylab, a matplotlib-based Python environment [backend: module://IPython.kernel.zmq.pylab.backend_inline].
For more information, type 'help(pylab)'.

In [22]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

4. PARAMETER UNCERTAINTY BASED ON DATA EVALUATION

We don't know the uncertain parameter values, but have (uncertain) data to evaluate the model performance

When we're having measurement data, we can verify the model behaviour and confront the (uncertain) measurement data with the (uncertain) model parameters and (uncertain) model structure.

This is directly related to Inverse modelling, Model Calibration, Parameter Estimation and Bayesian methods for model optimization, but this is out of the scope of the lesson today.

We will focus only on one specific method, called Generalised Likelihood Uncertainty Estimation (GLUE). There is a lot of scientific debate in literature about the statistical correctness of the method, but we will apply the method here as a model diagnosis tool, without stating that the final prediction uncertainty of the model is thé correct one. Hence, in uncertainty modelling, there is always a part of epistemic uncertainty...

Generalised Likelihood Uncertainty Estimation (GLUE)

Background information to read at home...

The basic principle of the GLUE approach is the recognition that, when using potentially overparameterized models, different parameter sets can have similar behaviour in terms of model performance.

So, rather than one single global optimal parameter set, there may be many different parameter sets that produce acceptable simulations, especially if the errors in both model input and observations are taken into account. Indeed, when large errors in the observations are not solely resulting from noise in the measure- ments and parameters are fitted by minimizing the error between the model output and observations, the risk of accepting non-realistic parameter sets arises. Moreover, appropriate parameter combinations can potentially be missed, comparable with a statistical type II error, where a 'hit' is disregarded and seen as a 'miss'. The GLUE approach accounts for this by accepting all model simulations (and corresponding parameter sets) with sufficient performance relative to the measurement errors, instead of looking to the overall best fit.

The major steps of the GLUE method are:

  • Decision about the evaluation function to be used in the evaluation of the different model simulations in combination with a rejection criterion (threshold) to identify non-behavioural model outputs. The evaluation function (called likelihood) can be both informal like common objective functions such as Sum of Squared Errors (SSE) measures or formal measures using an explicit representation of model and measurement error, although the latter is mostly hard to identify due to the unknown combinations of uncertainties and errors involved. However, when more specific information is available about the measurement and input errors, this could be included in the expression of the likelihood function. As, the likelihood function can be described as a formal likelihood function incorporating the error characteristics (e.g. non-normal, heteroscedastic or autocorrelated model errors). Ideally, the rejection criterion should be chosen before starting the simulations based on the possible observation errors, but in practice the definition of this criterion is mainly a learning process during the nalysis itself.

  • Selection of model parameters and input variables (or boundary conditions) to consider uncertain inputs and defining the prior distributions of these uncertain inputs. The Monte Carlo runs will be sampled randomly from these distributions and, as such, these must reject the prior knowledge of the parameters. When only little prior information is available, a non-informative uniform distribution is typically selected a priori. The range of the distribution is based on expert knowledge or reported values in literature.

  • Running a sufficient set of model simulations using a Monte Carlo-based technique, sampling from the prior distributions. Latin Hypercube Sampling (LHS) can be used in order to ensure a representative sampling of the parameter space at a lower number of samples than random sampling. The outcomes of the different runs are compared to the measured values and for each parameter set the corresponding likelihood function value (defined in step 1) is calculated.

  • The parameter sets with insufficient behaviour (objective function values below the agreed threshold) are considered non-behavioural and excluded from the subsequent analysis by attributing them a zero likelihood. Applying this threshold is a crucial step in the analysis, since it is directly related to the prediction uncertainty. In the ideal situation of exactly one single global optimal parameter set, a threshold would be sought resulting in a likelihood value of the optimal set of one and zero for all the other parameter sets.

  • The obtained likelihood values of the behavioural model outputs are normalised. To determine model prediction uncertainty, the model outputs are ranked at every time step and the normalised likelihood values are used to construct the cumulative distribution for the output variable. Prediction uncertainty is subsequently determined by selecting the appropriate percentiles (e.g. 5% and 95%) from the Cumulative Distribution Function (CDF) at that time step.

In short...

The GLUE method is similar with part 1 (the propagation), but uses data to evaluate if the simulations coming from the Monte Carlo runs are relevant, taking into account uncertainty of data and parameters.


In [2]:
from IPython.display import Image
Image(filename='glue.png')


Out[2]:

The GLUE method is also available in the pySTAN package and other environments. However, today we will give an introduction and script this ourselve to better understand the underlying idea.

EXAMPLE: The respirometric case


In [5]:
from definitions_exercises import RespiroModel #Load in the model

Now, we have data to analyse these model outcomes:

Time DO OURex min mg/l mg/l/min 0 8.48823381 0.00651276 0.014845 8.48869102 0.02220774 0.031511667 8.49305846 0.03790266 ...

So, we read in these data:


In [6]:
#first a definition to transform our data from minutes to the 
#days we're working in in the model
def min2d(TSin):
    return TSin/(60*24)

f=np.loadtxt('OUR_exp_3f.txt', skiprows=2)  #command to read the data from the textfile
Mtime_min=f[:,0]
Mtime_data=min2d(Mtime_min)
MDO=f[:,1]
MOURex=f[:,2]  #DATA of interest!

In [7]:
#visualize the data
fig,ax = plt.subplots(1,2, figsize=(10,6))
ax[0].plot(Mtime_data,MDO,'k'),ax[0].set_title('DO')
ax[1].plot(Mtime_data,MOURex,'k'),ax[1].set_title('OUR')


Out[7]:
([<matplotlib.lines.Line2D at 0x37fa8b0>], <matplotlib.text.Text at 0x3888290>)

So, we start with a Monte Carlo run (or Latin Hypercube), sampling our parameter space, just as in part 1


In [8]:
from definitions_exercises import lhs_uniform   #(nsamples,pmin,pmax)

In [9]:
Mtime_d = Mtime_data  #We use the same outputs for ourt model as the data is!

#run Monte Carlo for OUR with Latin Hypercube
##################################################
MCruns = 3000

#We save the output in an array with in the rows the different outputs
#and in the columns the timesteps
MCoutputs = np.zeros((MCruns, Mtime_d.size))

#We have to sample the distributions before we start the Monte-Carlo for loop:
X0 = lhs_uniform(MCruns, 650.,700.)    
mumax = lhs_uniform(MCruns,3.5,4.5)
Y = lhs_uniform(MCruns,0.77,0.81)
Ks = lhs_uniform(MCruns,0.3,0.5)
tau = lhs_uniform(MCruns,0.00001,0.001)

for i in range(MCruns):   
    #Run the model for the sampled parameters
    Init_unc=np.array([X0[i]])
    Pars=[mumax[i],Y[i],Ks[i],tau[i]]
    modeloutput = RespiroModel(Pars,Init_unc,Mtime_d)
    #Save only the OUR output of the model to analyze
    MCoutputs[i,:]=modeloutput[4]

So, we can now compare all the model simulations from the Monte Carlo with this measured output:


In [10]:
for i in range(100): #plot first 100
    plt.plot(modeloutput[0],MCoutputs[i,:])
plot(Mtime_data,MOURex,'gray', linewidth=3)


Out[10]:
[<matplotlib.lines.Line2D at 0x3888af0>]

Comparing the data with the model output can be done with many evaluation (objective) functions. Since we're working in a (Bayesian) statistics environment, these are called Likelihoods. In 'real' Bayesian statistics, this likelihood function is a statistical model describing the error-structure (out of scope for today).

However, to apply the GLUE method, the requirements are less strict and the well-known Nash-Sutcliff (NS) function can also be used:

  • Nash-Sutcliffe: $$ NS = 1- \frac{\sum_1^t(y_d^t - y_m^t)^2}{\sum_1^t(y_d^t-\overline{y_d^t})^2}$$

Basically, best model fits need higher values going to 1 and lower values would go to zero. With th Nash-sutcliffe, values lower than 0 would mean the model is worser than a model that woul represent the mean value of the data, so is considered unbehaviroual to describe the model behaviour.


In [11]:
def Nash_Sutcliff(data, model):
    '''
    Function for calculating the Nash Sutcliffe objective function
    '''
    residuals = data-model
    DMS = np.sum(residuals**2)
    MMS = (np.sum((data-np.mean(model))**2))
    Like=max(1.0-(DMS)/(MMS),0.0)
    return Like

#Test this for 1 output:
output2use=0 
print 'NS is ',Nash_Sutcliff(MOURex, MCoutputs[output2use,:])
plt.plot(Mtime_d,MCoutputs[output2use,:])
plt.plot(Mtime_d,MOURex)


NS is  0.87589266428
Out[11]:
[<matplotlib.lines.Line2D at 0x38bba30>]

Calculate the likelihood for every run:


In [12]:
Likels = np.zeros((MCruns,1))
for i in range(MCruns): 
    Likels[i] = Nash_Sutcliff(MOURex, MCoutputs[i,:])

#Let's find the best one
idmax = np.argmax(Likels)

One way of representing the output of the GLUE analysis, is to show the Likelihood of the analysis for 1 parameter in a scatterplot, in the methodology called 'dotty-plots'.

I'm also adding here the 'best' fit


In [13]:
par2use = mumax
plt.scatter(par2use,Likels, color='k')
plt.scatter(par2use[idmax],Likels[idmax],color='r')


Out[13]:
<matplotlib.collections.PathCollection at 0x378cd90>

Dotty plots are very useful for the evaluation of the identification of parameters.

Another way of checking the outcome is a two-dimensional plot, I prepared a script for to make this more visual:


In [14]:
from definitions_exercises import Scatter_hist

When we decided about the cut-off, we have to keep the 'behavioural' parameter sets and create an output graph of the outputs we selected

Let's set a threshold and check the parameter behaviour


In [15]:
#First stack all the parameters in one array:
all_pars = np.vstack((X0,mumax,Y,Ks,tau)).transpose()
all_pars.shape


Out[15]:
(3000, 5)

We can change the threshold with different values and see how the selected parameters behave:


In [23]:
from matplotlib import cm
threshold = 0.98
#select subset
behavpars = all_pars[np.where(Likels>threshold)[0]]
behavLik = Likels[np.where(Likels>threshold)]

bwx=3
bwy=0.03
x=np.arange(10)
y=np.sin(x)
Scatter_hist(behavpars[:,0], behavpars[:,1], xbinwidth = bwx, 
             ybinwidth=bwy,cleanstyle = True)


Out[23]:
(<matplotlib.figure.Figure at 0xa7d4570>,
 <matplotlib.axes.AxesSubplot at 0xa7cad30>,
 <matplotlib.axes.Axes at 0xaaa4470>,
 <matplotlib.axes.Axes at 0xa9a73f0>)

So, for higher values of the likelihood, we see a relationship between the parameter $\mu_{max}$ and our initial condition $X0$. We can visualize this better with some colors to explain this behaviour:


In [17]:
colormapss=cm.RdYlGn_r
threshold = 0.98
#select subset
behavpars = all_pars[np.where(Likels>threshold)[0]]
behavLik = Likels[np.where(Likels>threshold)]


fig,axScatter,axHistx,axHisty = Scatter_hist(behavpars[:,0], behavpars[:,1], 
                                             data1b=all_pars[:,0], data2b=all_pars[:,1], 
                                             xbinwidth = bwx, ybinwidth=bwy, 
                                             cleanstyle = True, s=25, marker='o', 
                                             SSE=1./behavLik, SSEb=1./Likels, 
                                             vmax=1./threshold+(1./threshold)*0.1, 
                                             colormaps= colormapss, roodlichter=1.0)

axScatter.set_ylabel(r'$\mu_{max}$',fontsize=20)
axScatter.set_xlabel(r'$X_0$',fontsize=20)


*args, **kwargs do not have any influcence when using two            options
Out[17]:
<matplotlib.text.Text at 0xa3d7570>

Another way of visualizing this is in 3D (without code, since computational too intensive, done with pure SSE):


In [4]:
Image(filename='goot.png')


Out[4]:

We can make some important issues about this correlation:

  • This correlation is directly related with uncertainty of the parameters and must be propagated towards our prediction uncertainty
  • In case we would be able to measure or better define one of them, we would be able to exclude this uncertainty
  • The Error propagation of the first part would actually over-estimate the uncertainty, since we can, based on the data, diminish the initial guessed uniform distribution towards a new posterior distribution.
  • The GLUE-method guides us in identifying this type of model uncertainty.

So, once we selected the parameters sets that we trust, based on the data, we can actually generate posterior bands of these, just as we did in the first part with error propagation. However, in the GLUE method, the selected outcomes are weighted, so better fits are having more weight.


In [18]:
#behavpars, behavLik we already hav, so now we do this for the outputs
behavMCoutputs = MCoutputs[np.where(Likels>threshold)[0]]
#We normalize the Likelihoods, in order to make the sum 1 
#(representing a probability distribution)
Likl_norm=behavLik/behavLik.sum()

Since we want to weight the outputs based on the Likelihood values, we have to adapt the percentile calculation, this is implemented in a separate function:


In [19]:
from definitions_exercises import UncertOut
[lowerb,upperb] = UncertOut(Likl_norm,behavMCoutputs, qval=0.025)

In [20]:
#So, what have we basically done, if we would do it for 1 timestep?
fig,ax = plt.subplots()
#the likelihood as weight:
ax.hist(behavMCoutputs[:,500], weights=Likl_norm, normed=True, color='k')


Out[20]:
(array([  3.94451755,  10.11042232,  10.57140826,  14.5551368 ,
        11.03786604,  14.99623748,  20.28512471,  12.75887673,
         9.67674756,   3.95337431]),
 array([ 0.46531078,  0.47424815,  0.48318553,  0.4921229 ,  0.50106027,
        0.50999764,  0.51893502,  0.52787239,  0.53680976,  0.54574713,
        0.5546845 ]),
 <a list of 10 Patch objects>)

So, we have our uncertainty boundaries an can plot them, similar to part 1:


In [25]:
plt.plot(Mtime_d, lowerb,'k--')
plt.plot(Mtime_d, upperb,'k--')
plt.fill_between(Mtime_d, lowerb, upperb, color='gray')
plt.plot(Mtime_d,MOURex,'r-')


Out[25]:
[<matplotlib.lines.Line2D at 0xaec5e70>]
------------------------------------------------------------------------------------------

EXERCISE: GLUE UNCERTAINTY EVALUATION

The python file ex7_glue.py brings together the information of the GLUE analysis: * Run the file and check the results. * Make a dotty plot of parameter $Y$ and discuss the result * Change the threshold value and see the effect it has on the final prediction uncertainty * Instead of the 5% confidence interval (2.5 and 97.5), you want to show the 10% confidence interval. Chnage this in the file and compare the result * Currently, the Scatter_hist-function gives the 2-d parameter scatterplot for $\mu_{max} and X_0$; change this into $Y$ and $\tau$. What can you say about these parameters? ------------------------------------------------------------------------------------------

Summary

The GLUE method evaluates the uncertain model based on available data to get more information about the model behaviour.

Advantages:

  • Easy to implement and to work with
  • The possibility of working with different evaluation functions
  • More information about the model behaviour as with error propagation methods
  • It is at the same time also a model optimization algorithm (however, very inefficient!)
  • It can also be used to define parameter sensitivity and identifiability (not shown today)

Disadvantages:

  • The selection of the Likelihood function is under discussion in literature
  • The method is inefficient for larger models, due to the high level of samples needed!
  • The selection of the threshold is very subjective and the resulting model prediction uncertainty is a direct effect of this decision

So what's more?

  • Other Bayesian methods are available, based on a more efficient way of sampling, called Markov Chain Monte Carlo.
  • These methods have both advantages and disadvantages compared to GLUE, see literature.

Further reading:

Beven, Keith. Environmental Modelling: An Uncertain Future? Taylor & Francis, 2008.

Vrugt, Jasper A., Cajo ter Braak, H. V. Gupta, and Bruce Robinson. “Equifinality of Formal (DREAM) and Informal (GLUE) Bayesian Approaches in Hydrologic Modeling?” Stochastic Environmental Research and Risk Assessment 23, no. 7 (2009): 1011–1026. http://dx.doi.org/10.1007/s00477-008-0274-y.