Uncertainty in Environmental Modelling

May 7 2013 - UNESCO IHE - Delft


In [10]:
%pylab inline


Welcome to pylab, a matplotlib-based Python environment [backend: module://IPython.kernel.zmq.pylab.backend_inline].
For more information, type 'help(pylab)'.

3. SENSITIVITY ANALYSIS

We don't know the uncertain inputs (parameters, data) and their characteristics

Even when we cannot explicitly quantify the uncertainty of the model inputs, we can still obtain useful information about the contribution of the input factors to the uncertainty of the model output. The analysis that identifies the most important input factors contributing to output uncertainty is called sensitivity analysis.

Different methods exist, but we will focus here on:

  • Local sensitivity analysis
  • Global sensitivity analysis: Morris screening method

Remark: We still assume that the model structure we're using is 'correct'.


In [11]:
#Loading the python modules
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
import random

Local Sensitivity Analysis (parameters)

In local sensitivity analysis, the change of the output as a consequence of a change in the input is analysed, expressed mathematically as $\frac{\delta y(t)}{\delta p}$ with $y$ the model output, and $p$ the model parameter.

On some occasions, this derivative can be calculated analytically, but in most instances a numerical approximation is used to analyse the sensitivity:

$$\frac{\delta y(t)}{\delta p} = \frac{y(t,p_j + \Delta p_j)-y(t,p_j - \Delta p_j)}{2\Delta p_j}$$

In [12]:
from IPython.display import Image
Image(filename='local_sens.png')


Out[12]:

So, basically, the model is run twice with slightly different parameter values, and the sensitivity is calculated from the difference between those two runs. All other parameters keep their original values.

An important choice in the analysis is the perturbation $\Delta p_j = \zeta \cdot p_j$, defined by the perturbation factor $\zeta$, a fixed fraction of the parameter value itself.

Importantly, the sensitivity is calculated around one point in the 'parameter space', i.e. for one chosen set of parameter values, and the numerical approach assumes linearity of the model around this point. So, when choosing the perturbation factor, the value can't be too large, or this assumption is violated. On the other hand, when the value is chosen too small, the analysis is hindered by the accuracy of the numerical solver.

The resulting $\frac{\delta y(t)}{\delta p}$ is also called the Centralised Absolute Sensitivity (CAS).
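As a quick check of this numerical approximation, here is a minimal standalone sketch (a toy model of my own choosing, not part of the course exercises): for $y(t,p) = e^{-pt}$ the analytic sensitivity $\frac{\delta y}{\delta p} = -t\,e^{-pt}$ is known, so the central-difference CAS can be verified directly.

```python
import numpy as np

def model(t, p):
    """Toy model output: exponential decay with rate p."""
    return np.exp(-p * t)

def central_sensitivity(t, p, zeta=1e-4):
    """Centralised Absolute Sensitivity via central differences,
    with perturbation Delta_p = zeta * p."""
    dp = zeta * p
    return (model(t, p + dp) - model(t, p - dp)) / (2.0 * dp)

t = np.linspace(0.0, 5.0, 50)
p = 0.8
cas_numeric = central_sensitivity(t, p)
cas_analytic = -t * np.exp(-p * t)
# For a moderate perturbation factor the two agree very closely
print(np.max(np.abs(cas_numeric - cas_analytic)))
```

Increasing `zeta` towards e.g. 0.5 makes the linearity assumption fail; decreasing it towards machine precision lets round-off dominate, which is exactly the trade-off described above.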

Let's test this for the respiromodel we already know: we want to check the sensitivity of the parameters $K_s$ and $\mu_{max}$ on the measured output $OUR_{ex}$:


In [13]:
from definitions_exercises import RespiroModel

#SET TIME
Mtime_d = np.arange(0.,0.05,0.0005)
#set X0 as initial condition
X0=6.75632395e+02
#set the parameter values
mumax=4.
Y=0.78
Ks=0.4
tau=2.25e-04
Init_unc=np.array([X0])


perturbation = 0.0001
#SENSITIVITY ANALYSIS  #PAR Ks
Ks1 = Ks + Ks*perturbation 
Pars=[mumax,Y,Ks1,tau]
Ks_plus=RespiroModel(Pars,Init_unc,Mtime_d)[4]  #returns [time,X,Ss,O,OUR_ex,infodic]

Ks2 = Ks - Ks*perturbation 
Pars=[mumax,Y,Ks2,tau]
Ks_min=RespiroModel(Pars,Init_unc,Mtime_d)[4] 

CAS_Ks = (Ks_plus-Ks_min)/(2.*perturbation*Ks)

#SENSITIVITY ANALYSIS  #PAR mumax
mumax1 = mumax + mumax*perturbation 
Pars=[mumax1,Y,Ks,tau]
mumax_plus=RespiroModel(Pars,Init_unc,Mtime_d)[4]  #returns [time,X,Ss,O,OUR_ex,infodic]

mumax2 = mumax - mumax*perturbation 
Pars=[mumax2,Y,Ks,tau]
mumax_min=RespiroModel(Pars,Init_unc,Mtime_d)[4] 

CAS_mumax = (mumax_plus-mumax_min)/(2.*perturbation*mumax)

fig,ax = plt.subplots(1,2, figsize=(10,4))
ax[0].plot(Mtime_d, CAS_Ks)
ax[0].set_title('Ks sens')
ax[1].plot(Mtime_d, CAS_mumax)
ax[1].set_title('mumax sens')
ax[0].set_ylabel('CAS')


Out[13]:
<matplotlib.text.Text at 0x395a7b0>

However, to make these results comparable with each other, we must take the differences in parameter values into account!

That's why we can also calculate:

CPRS: Centralised Parameter Relative Sensitivity = $CAS \cdot p_j$
(CTRS: Centralised Total Relative Sensitivity = $CAS \cdot \frac{p_j}{y}$)


In [14]:
CPRS_Ks = CAS_Ks * Ks
CPRS_mumax = CAS_mumax * mumax

fig,ax = plt.subplots(1,2, figsize=(10,4))
ax[0].plot(Mtime_d, CPRS_Ks)
ax[0].set_title('Ks sens')
ax[1].plot(Mtime_d, CPRS_mumax)
ax[1].set_title('mumax sens')
ax[0].set_ylabel('CPRS')


Out[14]:
<matplotlib.text.Text at 0x38bea10>
------------------------------------------------------------------------------------------

EXERCISE: LOCAL SENSITIVITY ANALYSIS

In the python file ex5_local_sens.py, the local sensitivity analysis as presented above is already prepared.

  • Implement the sensitivity for $Y$ and $\tau$ yourself and compare with the other two.
  • Change the perturbation factor to different values. What happens if you make the parameter value very small?
  • Local sensitivity values can be both positive and negative; what does this mean concerning the effect of the parameter on the model output (for that specific parameter value)?

------------------------------------------------------------------------------------------

Local sensitivity, summary

Drawbacks:

  • Only one single point in the entire 'uncertain parameter space' checked
  • Linearity of the model is 'locally' assumed and not always valid in non-linear environmental models

Advantages:

  • Information in function of the time
  • Minimal computational efforts (2 runs for each parameter)
  • Check the model behaviour and evaluate whether it corresponds to the expected behaviour

Global sensitivity analysis

As already mentioned, local sensitivity analysis is very dependent on the parameter values you are working with, certainly for non-linear models. With other parameter values, the results could be completely different, since other parameter interactions influence the importance.

To overcome this local behaviour and to check the sensitivity of the model parameters in the 'expected uncertainty range', global sensitivity analysis is used.

Different techniques exist:

  • Standardised regression coefficients (linearity assumption)
  • Screening techniques
  • Variance based techniques
  • Regional sensitivity analysis (see later, part 3, when data exist)

We will focus here on screening techniques, where the main aim is to rank the parameters according to their sensitivity (i.e. their contribution to output uncertainty). Basically, it is an extension of local sensitivity analysis, since a Monte Carlo sampling is used to analyse the sensitivity for different parameter sets.

EXAMPLE - Contamination of a river

Suppose a biodegradable contaminant is released into a river (e.g. an illegal industrial release). This release will affect the river downstream and the oxygen present in the river. Information about the oxygen in the river is given by:

  • Dissolved Oxygen (DO): maximum saturation of DO in water. Fresh water in rivers generally has a DO of around 9 $mg/l$
  • Biochemical Oxygen Demand (BOD): indication of the contamination and represents the amount of oxygen consumption when consumed biologically. In clean fresh water, BOD is generally less than 1 $mg/l$. In contaminated rivers, BOD can be 2-8 $mg/l$

We can develop a model for this bio-system. For this example, we're not considering the spatial component (i.e. discharge in the river is constant) and as such, we can describe the model as a function of time only:

$$\frac{\delta BZV}{\delta t} = BZV_{in} - k_1 \cdot BZV$$

$$\frac{\delta DO}{\delta t} = k_2 \cdot (DO_{sat} - DO) - k_1 \cdot BZV$$

with $BZV_{in}$ the BZV flux of the contamination (expressed in $mg\;l^{-1}\;min^{-1}$), $DO_{sat}$ the saturation concentration of dissolved oxygen in water, $k_1$ the microbial degradation rate ($min^{-1}$) and $k_2$ the rate of oxygen transfer from the atmosphere ($min^{-1}$).
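As a sanity check on these equations (a minimal sketch, assuming the parameter values used in the simulation cell further on), setting both derivatives to zero gives the steady state the river relaxes to: $BZV_{ss} = BZV_{in}/k_1$ and $DO_{ss} = DO_{sat} - BZV_{in}/k_2$.

```python
# Steady-state check for the river model; parameter values are taken
# from the simulation below, not fixed by the equations themselves
k1, k2 = 0.3, 0.4          # degradation and reaeration rates (1/min)
BZVin, DOsat = 1.0, 11.0   # contaminant flux and oxygen saturation

BZV_ss = BZVin / k1                 # from 0 = BZVin - k1*BZV
DO_ss = DOsat - k1 * BZV_ss / k2    # from 0 = k2*(DOsat - DO) - k1*BZV
print(BZV_ss, DO_ss)
```

With these values, $BZV$ settles around 3.33 $mg\;l^{-1}$ and $DO$ around 8.5 $mg\;l^{-1}$, which is what the simulated curves should level off to.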

Implementation of the model:


In [15]:
from scipy.integrate import odeint
#------------------------------------------------------------------------------
#DEFINITIONS TO RUN THE MODEL--------------------------------------------------
def deriv_river(u,t,Pars, dummy):
    '''
    Differential equations of the river contamination model
    '''
    #Define the parameters
    k1 = np.float64(Pars[0])
    k2 = np.float64(Pars[1])
    BZVin = np.float64(Pars[2])
    DOsat = np.float64(Pars[3])

    dBZVdt = BZVin - k1*u[0]                        #BZV
    dDOdt = k2 *(DOsat - u[1])-k1*u[0]              #DO

    return np.array([dBZVdt,dDOdt])

def RiverModel(Pars,time):
    '''
    Run the river contamination model
    '''
    #Define the initial conditions (Constants)
    BZV0 = 7.33
    DO0 = 8.5
    yinit = np.array([BZV0, DO0])  

    #Solve with LSODA scheme
    y,infodic=odeint(deriv_river,yinit,time,full_output=True, printmessg=False, args=(Pars,[]))

    #Get outputs
    BZV = y[:,0]
    DO = y[:,1]

    return [time,BZV,DO,infodic]
#END OF DEFINITIONS TO RUN THE MODEL-------------------------------------------
#------------------------------------------------------------------------------

#run model
#-----------------------------------
Mtime_d = np.linspace(0.,25.,250)
#set the parameters
k1 = 0.3
k2 = 0.4
BZVin = 1
DOsat = 11
Pars=[k1,k2,BZVin,DOsat]
outputmodel=RiverModel(Pars,Mtime_d)
fig,ax = plt.subplots()
ax.plot(outputmodel[0], outputmodel[1],'k--', label='BOD')
ax.plot(outputmodel[0], outputmodel[2],'k-', label='DO')
ax.legend(loc=7)
ax.set_xlabel('Time')
ax.set_ylabel(r'$mg\;l^{-1}$', fontsize = 18)


Out[15]:
<matplotlib.text.Text at 0x5a6fe10>

So, the question is: which parameter has the largest effect on the output of DO or BZV?

First, let's test the local sensitivity, but we will calculate it for different values of $k_1$ to check the local characteristic of the calculation:


In [16]:
k1t=[0.1,0.2,0.3,0.4,0.5] #values to check
perturbation = 0.0001

for k1 in k1t:
    k2 = 0.4
    BZVin = 1
    DOsat = 11
    k1_1 = k1+k1*perturbation
    Pars=[k1_1,k2,BZVin,DOsat]
    k1_plus=RiverModel(Pars,Mtime_d)[2]

    k1_2 = k1-k1*perturbation
    Pars=[k1_2,k2,BZVin,DOsat]
    k1_min=RiverModel(Pars,Mtime_d)[2]
    CAS_k1 = (k1_plus-k1_min)/(2.*perturbation*k1)
    plt.plot(Mtime_d, CAS_k1, label = 'k1 = '+str(k1))
    plt.legend(loc=4)
    plt.ylabel('CAS')


So, we see the dependence of the sensitivity on the parameter value itself. Let's move on to the global sensitivity of this model.

Morris Screening method

(For a full theoretical description of the Morris screening method, I advise reading Global Sensitivity Analysis, The Primer, Saltelli A. Only a short introduction is given here.)

The Morris screening method calculates so-called 'Elementary Effects': $$EE_i = \frac{y(p_i + \Delta)- y(p_i)}{\Delta}$$ (comparable to the numerical approach in the local sensitivity analysis, but here the step size is fixed and larger)

By calculating multiple of these elementary effects (with parameter values taken from a uniform distribution) and evaluating their mean ($\mu$), the mean of the absolute values ($\mu^*$) and the standard deviation ($\sigma$), a characterisation of the parameters is made:

  • Large values of $\mu$ are linked with important parameters
  • Large values of $\sigma$ are linked with non-linear effects of the model or parameter interaction
  • Small values of $\mu$ in combination with high $\mu^*$ values indicate that the parameter has both positive and negative effects on the model output.
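To make these measures concrete, here is a small self-contained sketch. It is illustrative only: a toy function of my own choosing, sampled with a simple random one-at-a-time design rather than the grid-based Morris trajectories, and not the pySTAN implementation used below.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # toy model: linear in x1, non-linear in x2, oscillating in x3
    return x[0] + x[1]**2 + np.cos(2.0 * np.pi * x[2])

delta = 0.25      # fixed step size of the elementary effect
n_repeats = 200
npars = 3
EE = np.zeros((n_repeats, npars))
for r in range(n_repeats):
    x = rng.uniform(0.0, 1.0 - delta, size=npars)  # keep x + delta in [0, 1]
    y0 = f(x)
    for i in range(npars):
        xp = x.copy()
        xp[i] += delta
        EE[r, i] = (f(xp) - y0) / delta  # elementary effect of parameter i

mu = EE.mean(axis=0)               # mean effect (sign-sensitive)
mu_star = np.abs(EE).mean(axis=0)  # mean absolute effect
sigma = EE.std(axis=0)             # non-linearity / interaction indicator
print(mu, mu_star, sigma)
```

For this toy function, $x_1$ has $\mu = \mu^* = 1$ with $\sigma \approx 0$ (purely linear effect), $x_2$ shows a non-zero $\sigma$ from its quadratic term, and $x_3$ combines a near-zero $\mu$ with a large $\mu^*$ and $\sigma$: its effect flips sign across the parameter range, exactly the situation the last bullet describes.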

In [17]:
from IPython.display import Image
Image(filename='morris.png')


Out[17]:

The implementation is already available as part of the pySTAN python package. In the folder sensitivity, the part needed for the Morris screening is present. By loading this part of the package, we will be able to do the Morris screening. More information is also available here.


In [18]:
import os
import sys
sys.path.append(os.path.join(os.getcwd(),'sensitivity'))
from sensitivity import *

First, we load our parameters and define the ranges in between we want to do the analysis, for each parameter we define the lower and upper boundary and the name.


In [19]:
Pars = [(0.1,0.6,r'$k_1$'),(0.1,0.6,r'$k_2$'),(10.,12.,r'$DO_{sat}$'), 
          (0.1,2.0,r'$BZV_{in}$')]
print Pars


[(0.1, 0.6, '$k_1$'), (0.1, 0.6, '$k_2$'), (10.0, 12.0, '$DO_{sat}$'), (0.1, 2.0, '$BZV_{in}$')]

We set up the Morris screening object; since we're using our own model, the ModelType is 'external':


In [20]:
morris = MorrisScreening(Pars, ModelType = 'external')


The analysed model is externally run

The implementation tries to optimize the coverage of the parameter space by taking samples such that the distance between the different trajectories is maximized. This is done with 'Optimized_Groups'. Checking the result visually is done with 'Optimized_diagnostic':


In [21]:
# calculate an optimized set of parameter sets to run model
OptMatrix, OptOutVec = morris.Optimized_Groups(nbaseruns=150, intervals = 4, 
                                               noptimized=10, Delta = 0.4)
# Check the quality of the selected trajects                                                                           
morris.Optimized_diagnostic(width=0.15)


The quality of the sampling strategy changed from 0.375 with the old strategy to 0.475 for the optimized strategy
  • nbaseruns: the set of total samples
  • noptimized: the subset we're using to do the Morris screening
  • intervals/Delta: parameters of the Morris screening method

The total number of model runs is noptimized $\cdot$ (number of pars + 1) = 10 $\cdot$ (4 + 1) = 50
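This run count can be checked directly: each retained trajectory needs one base run plus one perturbed run per parameter.

```python
npars = 4        # k1, k2, DOsat, BZVin
noptimized = 10  # trajectories kept after optimization
total_runs = noptimized * (npars + 1)
print(total_runs)  # → 50, the number of rows in morris.parset2run
```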


In [22]:
morris.parset2run.shape


Out[22]:
(50, 4)

For all the parameters in morris.parset2run, we now need to run the model and save the result. An important step here is deciding which information we want to extract from the model output. Different options are available and the choice depends on the specific model and research question:

  • Information for each timestep?
  • Taking the average of the model output?
  • Taking the minimum or maximum value of the model output during the simulation?

For this case, we will check DO, and we want to know which parameter most influences the minimum oxygen value that is reached. Hence, characterizing this 'depression' of the oxygen as accurately as possible is essential to evaluate the potential harm the contamination causes.

Running the model is completely similar to the Monte Carlo runs in the first part.
However, in this case, the samples were prepared by the Morris method!


In [23]:
#number of runs we have to do
Nruns = morris.parset2run.shape[0]
#We save the output in an array with in the rows the different outputs
MCoutputs = np.zeros((Nruns, 3)) #depends on the outputs

Mtime_d = np.linspace(0.,25.,250) #model output time
for i in range(Nruns):
    #set the parameters
    param = morris.parset2run[i]
    outputmodel=RiverModel(param,Mtime_d)[2]
    #only take the minimum value of DO:
    MCoutputs[i,0] = outputmodel.min() #Take the DO minimum here
    MCoutputs[i,1] = outputmodel.mean() #Take the DO average
    MCoutputs[i,2] = outputmodel[50] #Take the DO at timestep 50

In [24]:
#Calculate the Morris screening diagnostics
morris.Morris_Measure_Groups(MCoutputs) 

#plot the mu-barchart to rank the parameters
morris.plotmu(outputid=2, ec='grey',fc='grey') #change the outputid and compare!!


No Groups are used
Different outputs are used, so split them in comparing the output, by using outputid
Out[24]:
(<matplotlib.figure.Figure at 0x60f9190>,
 <matplotlib.axes.AxesSubplot at 0x60f98b0>)

In [25]:
#plot the mu* in function of the std to evaluate parameter importance and parameter interactions
morris.plotmustarsigma(zoomperc = 'none', outputid = 2)
#outputid=0 when only one output. If you're considering multiple outputs, this identifies the output


Out[25]:
(<matplotlib.figure.Figure at 0x60e18d0>,
 <matplotlib.axes.AxesSubplot at 0x5fbda30>,
 [<matplotlib.text.Text at 0x5ccbef0>,
  <matplotlib.text.Text at 0x5ccb450>,
  <matplotlib.text.Text at 0x5ccbe50>,
  <matplotlib.text.Text at 0x5faa4f0>])

In [26]:
#Save the results in a text-file
morris.latexresults()


tex: The 1 th output evaluation criterion is used
Latex Results latex table file saved in directory D:\Projecten\Githubs\ipython_slides\ipython
------------------------------------------------------------------------------------------

EXERCISE: MORRIS SCREENING METHOD

The python file ex6_global_sens.py contains the sensitivity analysis described above.

  • Run the script and verify the outputs; add the visualisation of a barplot for $\sigma$ (tip: tab-completion in python or check the website here).
  • You're interested in the sensitivity of the model parameters for the model output at calculated timestep 50. Add this in the script and check the difference with the current sensitivities. Is the chosen output important for the result of the analysis?

------------------------------------------------------------------------------------------

Summary

Sensitivity analysis is useful for comparing the effect of the uncertain inputs (parameters) on the output. Parameters with large sensitivity values will contribute more to the prediction uncertainty. Moreover, this information is also useful for parameter estimation, since these parameters influence the model fit the most.

When the values of your model parameters are known, local sensitivity analysis can give a lot of useful information with a limited number of runs. However, for complex non-linear models, global sensitivity analysis can give information about the influence over the entire parameter space.

When applying global sensitivity analysis, it is important to consider the output variable that is used, and the latter is always a function of the research question. Testing with multiple outputs is always recommended!

Other methods are available in different formats and environments; for an overview of methods, with a nice explanation of when to use which method, I would advise the following book:

Saltelli, Andrea, Marco Ratto, Terry Andres, Francesca Campolongo, Jessica Cariboni, Debora Gatelli, Michaela Saisana, and Stefano Tarantola. Global Sensitivity Analysis, The Primer. John Wiley & Sons Ltd, 2008.

The algorithms are available in:

For the latter, you can always contact me: stvhoey.vanhoey@ugent.be