In [10]:
%pylab inline
Even when we don't explicitly know the uncertainty of the inputs in the model, we can still get useful information about the contribution of the uncertain input factors to the model output. The analysis that identifies the most important factors contributing to the output uncertainty is called sensitivity analysis.
Different methods exist, but we will focus here on local sensitivity analysis and global sensitivity analysis (screening).
Remark: We still assume that the model structure we're using is 'correct'.
In [11]:
#Loading the python modules
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
import random
In local sensitivity analysis, the change of the output as a consequence of a change in the input is analysed, expressed mathematically as $\frac{\partial y(t)}{\partial p}$, with $y$ the model output and $p$ the model parameter.
On some occasions this derivative can be calculated analytically, but in most instances a numeric approximation is made to analyse the sensitivity:
$$\frac{\partial y(t)}{\partial p} = \frac{y(t,p_j + \Delta p_j)-y(t,p_j - \Delta p_j)}{2\Delta p_j}$$
In [12]:
from IPython.display import Image
Image(filename='local_sens.png')
Out[12]:
So, basically, the model runs twice with slightly different parameter values, and the sensitivity is calculated from the difference between those two runs. All other parameters keep their original value.
Important in the analysis is the definition of $\Delta p_j = \zeta \cdot p_j$: the perturbation is chosen as a fraction of the parameter value itself, with $\zeta$ the perturbation factor.
Note that the sensitivity is calculated around one point in the 'parameter space', i.e. for one chosen value of the parameters, and the numeric approach assumes linearity of the model around this point. So the perturbation factor can't be chosen too large, or this assumption is no longer fulfilled. On the other hand, when choosing it too small, the analysis will be hindered by the accuracy of the numerical solver.
The resulting $\frac{\partial y(t)}{\partial p}$ is also called the Centralised Absolute Sensitivity (CAS).
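This recipe is easy to wrap in a small helper function. A minimal sketch (an illustration added here, not part of the course code), assuming a model function f(pars) that returns the output time series as a numpy array:
def central_sensitivity(f, pars, j, zeta=1e-4):
    '''Central difference sensitivity (CAS) of model f to parameter j'''
    dp = zeta*pars[j]  #the perturbation Delta p_j = zeta*p_j
    p_plus = list(pars)
    p_min = list(pars)
    p_plus[j] = pars[j] + dp
    p_min[j] = pars[j] - dp
    return (f(p_plus) - f(p_min))/(2.*dp)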
Let's test this for the respiromodel we already know: we want to check the sensitivity of the measured output $OUR_{ex}$ to the parameters $K_s$ and $\mu_{max}$:
In [13]:
from definitions_exercises import RespiroModel
#SET TIME
Mtime_d = np.arange(0.,0.05,0.0005)
#set X0 as initial condition
X0=6.75632395e+02
#set the parameter values
mumax=4.
Y=0.78
Ks=0.4
tau=2.25e-04
Init_unc=np.array([X0])
perturbation = 0.0001
#SENSITIVITY ANALYSIS - PAR Ks
Ks1 = Ks + Ks*perturbation
Pars=[mumax,Y,Ks1,tau]
Ks_plus=RespiroModel(Pars,Init_unc,Mtime_d)[4] #returns [time,X,Ss,O,OUR_ex,infodic]
Ks2 = Ks - Ks*perturbation
Pars=[mumax,Y,Ks2,tau]
Ks_min=RespiroModel(Pars,Init_unc,Mtime_d)[4]
CAS_Ks = (Ks_plus-Ks_min)/(2.*perturbation*Ks)
#SENSITIVITY ANALYSIS - PAR mumax
mumax1 = mumax + mumax*perturbation
Pars=[mumax1,Y,Ks,tau]
mumax_plus=RespiroModel(Pars,Init_unc,Mtime_d)[4] #returns [time,X,Ss,O,OUR_ex,infodic]
mumax2 = mumax - mumax*perturbation
Pars=[mumax2,Y,Ks,tau]
mumax_min=RespiroModel(Pars,Init_unc,Mtime_d)[4]
CAS_mumax = (mumax_plus-mumax_min)/(2.*perturbation*mumax)
fig,ax = plt.subplots(1,2, figsize=(10,4))
ax[0].plot(Mtime_d, CAS_Ks)
ax[0].set_title('Ks sens')
ax[1].plot(Mtime_d, CAS_mumax)
ax[1].set_title('mumax sens')
ax[0].set_ylabel('CAS')
Out[13]:
However, to make these results comparable with each other, we must take the differences in parameter values into account!
That's why we can also calculate:
CPRS: Centralised Parameter Relative Sensitivity = $CAS \cdot p_j$
(CTRS: Centralised Total Relative Sensitivity = $CAS \cdot \frac{p_j}{y}$)
In [14]:
CPRS_Ks = CAS_Ks * Ks
CPRS_mumax = CAS_mumax * mumax
fig,ax = plt.subplots(1,2, figsize=(10,4))
ax[0].plot(Mtime_d, CPRS_Ks)
ax[0].set_title('Ks sens')
ax[1].plot(Mtime_d, CPRS_mumax)
ax[1].set_title('mumax sens')
ax[0].set_ylabel('CPRS')
Out[14]:
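CTRS is not computed above; a minimal sketch (an addition, not in the original cells), reusing the CAS arrays and one extra run of RespiroModel at the nominal parameter values (the small eps guard against division by zero is also an assumption of this sketch):
#CTRS needs the nominal model output as well
Pars = [mumax, Y, Ks, tau]
OUR_nominal = RespiroModel(Pars, Init_unc, Mtime_d)[4]  #nominal run
eps = 1e-12  #guard against division by (near-)zero output values
CTRS_Ks = CAS_Ks*Ks/(OUR_nominal + eps)
CTRS_mumax = CAS_mumax*mumax/(OUR_nominal + eps)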
Local sensitivity, summary
Drawbacks: the result is only valid around the chosen parameter values (linearity of the model is assumed), so the ranking can be completely different for other parameter values.
Advantages: a lot of useful information from a limited set of runs (two model runs per parameter).
As already mentioned, the outcome of a local sensitivity analysis depends strongly on the parameter values you are working with, certainly for non-linear models. With other parameter values the results could be completely different, since other parameter interactions then influence the importance.
To overcome this local behaviour and to check the sensitivity of the model parameters over their 'expected uncertainty range', global sensitivity analysis is used.
Different techniques exist, ranging from screening methods to variance-based methods.
We will focus here on screening techniques, where the main aim is to rank the parameters according to their sensitivity (i.e. their contribution to the output uncertainty). Basically, it is an extension of the local sensitivity analysis, since Monte Carlo sampling is used to analyse the sensitivity for different parameter sets.
EXAMPLE - Contamination of a river
Suppose the release of a biodegradable contamination in a river (e.g. an illegal industrial release). This release will affect the river downstream and the oxygen present in the river. Information about the oxygen in the river is given by the biochemical oxygen demand ($BZV$, i.e. BOD) and the dissolved oxygen concentration ($DO$).
We can develop a model for this bio-system. For this example we're not considering the spatial component (i.e. the discharge in the river is constant), and as such we can describe the model as a function of time only:
$$\frac{dBZV}{dt} = BZV_{in} - k_1 \cdot BZV$$
$$\frac{dDO}{dt} = k_2 \cdot (DO_{sat} - DO) - k_1 \cdot BZV$$
with $BZV_{in}$ the BZV flux of the contamination (expressed in $mg\;l^{-1}\;min^{-1}$), $DO_{sat}$ the saturation concentration of dissolved oxygen in water, $k_1$ the microbial degradation rate ($min^{-1}$) and $k_2$ the rate of oxygen transfer from the atmosphere ($min^{-1}$).
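A quick sanity check of these equations (a derivation added here, not in the original text): setting both derivatives to zero gives the steady state the model relaxes to,
$$BZV_{ss} = \frac{BZV_{in}}{k_1} \qquad DO_{ss} = DO_{sat} - \frac{k_1 \cdot BZV_{ss}}{k_2} = DO_{sat} - \frac{BZV_{in}}{k_2}$$
With the parameter values used below ($k_1 = 0.3$, $k_2 = 0.4$, $BZV_{in} = 1$, $DO_{sat} = 11$) this gives $BZV_{ss} \approx 3.33$ and $DO_{ss} = 8.5\;mg\;l^{-1}$, which should correspond to the plateaus in the plot.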
Implementation of the model:
In [15]:
from scipy.integrate import odeint
#------------------------------------------------------------------------------
#DEFINITIONS TO RUN THE MODEL--------------------------------------------------
def deriv_river(u, t, Pars, dummy):
    '''
    Differential equations of the river contamination model
    '''
    #Define the parameters
    k1 = np.float64(Pars[0])
    k2 = np.float64(Pars[1])
    BZVin = np.float64(Pars[2])
    DOsat = np.float64(Pars[3])
    #Differential equations
    dBZVdt = BZVin - k1*u[0]              #BZV
    dDOdt = k2*(DOsat - u[1]) - k1*u[0]   #DO
    return np.array([dBZVdt, dDOdt])
def RiverModel(Pars, time):
    '''
    Run the river contamination model
    '''
    #Define the initial conditions (constants)
    BZV0 = 7.33
    DO0 = 8.5
    yinit = np.array([BZV0, DO0])
    #Solve with LSODA scheme
    y, infodic = odeint(deriv_river, yinit, time, full_output=True,
                        printmessg=False, args=(Pars, []))
    #Get outputs
    BZV = y[:,0]
    DO = y[:,1]
    return [time, BZV, DO, infodic]
#END OF DEFINITIONS TO RUN THE MODEL-------------------------------------------
#------------------------------------------------------------------------------
#run model
#-----------------------------------
Mtime_d = np.linspace(0., 25., 250)
#set the parameters
k1 = 0.3
k2 = 0.4
BZVin = 1
DOsat = 11
Pars=[k1,k2,BZVin,DOsat]
outputmodel=RiverModel(Pars,Mtime_d)
fig,ax = plt.subplots()
ax.plot(outputmodel[0], outputmodel[1],'k--', label='BOD')
ax.plot(outputmodel[0], outputmodel[2],'k-', label='DO')
ax.legend(loc=7)
ax.set_xlabel('Time')
ax.set_ylabel(r'$mg\;l^{-1}$', fontsize = 18)
Out[15]:
So, the question is: which parameter has the largest effect on the DO or BZV output?
First, let's test the local sensitivity, but we will calculate it for different values of $k_1$ to illustrate the local character of the calculation:
In [16]:
k1t=[0.1,0.2,0.3,0.4,0.5] #values to check
perturbation = 0.0001
for k1 in k1t:
    k2 = 0.4
    BZVin = 1
    DOsat = 11
    k1_1 = k1 + k1*perturbation
    Pars = [k1_1, k2, BZVin, DOsat]
    k1_plus = RiverModel(Pars, Mtime_d)[2]
    k1_2 = k1 - k1*perturbation
    Pars = [k1_2, k2, BZVin, DOsat]
    k1_min = RiverModel(Pars, Mtime_d)[2]
    CAS_k1 = (k1_plus - k1_min)/(2.*perturbation*k1)
    plt.plot(Mtime_d, CAS_k1, label='k1 = '+str(k1))
plt.legend(loc=4)
plt.ylabel('CAS')
So, we see that the sensitivity depends on the parameter value itself. Let's go for the global sensitivity of this model.
(For a full theoretical description of the Morris screening method, I advise reading Global Sensitivity Analysis, The Primer by Saltelli et al. Here only a short introduction is given.)
The Morris screening method calculates so-called 'Elementary Effects': $$EE_i = \frac{y(p_i + \Delta)- y(p_i)}{\Delta}$$ (comparable to the numerical approach in the local sensitivity analysis, but here the step size is fixed and larger)
By calculating multiple of these Elementary Effects (with base points taken from a uniform distribution) and evaluating their mean ($\mu$), the mean of their absolute values ($\mu^*$) and their standard deviation ($\sigma$), the parameters can be characterised:
In [17]:
from IPython.display import Image
Image(filename='morris.png')
Out[17]:
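Before turning to the packaged implementation below, a minimal sketch (a hypothetical helper, not the pySTAN code, and simplified to one-at-a-time sampling instead of the full Morris trajectory design) of how the elementary effects and the $\mu$, $\mu^*$ and $\sigma$ measures could be computed for one parameter of a scalar-output model f:
def morris_measures(f, p_nominal, j, lower, upper, Delta=0.4, r=10):
    '''r elementary effects for parameter j of scalar model f -> mu, mu*, sigma'''
    EE = np.zeros(r)
    for i in range(r):
        p = np.array(p_nominal, dtype=float)
        #sample a base value for parameter j, leaving room for the step Delta
        p[j] = np.random.uniform(lower, upper - Delta)
        y_base = f(p)
        p[j] = p[j] + Delta
        EE[i] = (f(p) - y_base)/Delta
    return EE.mean(), np.abs(EE).mean(), EE.std()  #mu, mu*, sigma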
The implementation is already available as part of the pySTAN Python package. The part needed for the Morris screening is present in the folder sensitivity; by loading this part of the package, we will be able to do the Morris screening.
In [18]:
import os
import sys
sys.path.append(os.path.join(os.getcwd(),'sensitivity'))
from sensitivity import *
First, we load our parameters and define the ranges within which we want to do the analysis: for each parameter we define the lower boundary, the upper boundary and the name.
In [19]:
Pars = [(0.1,0.6,r'$k_1$'),(0.1,0.6,r'$k_2$'),(10.,12.,r'$DO_{sat}$'),
(0.1,2.0,r'$BZV_{in}$')]
print(Pars)
We set up the Morris screening object; since we're using our own model, the ModelType is set to 'external':
In [20]:
morris = MorrisScreening(Pars, ModelType = 'external')
The implementation tries to optimize the coverage of the parameter space by taking samples in such a way that the distance between the different trajectories is maximized. This is done with Optimized_Groups; checking the result visually is done with Optimized_diagnostic:
In [21]:
# calculate an optimized set of parameter sets to run model
OptMatrix, OptOutVec = morris.Optimized_Groups(nbaseruns=150, intervals = 4,
noptimized=10, Delta = 0.4)
# Check the quality of the selected trajects
morris.Optimized_diagnostic(width=0.15)
total number of model runs is noptimized(number of pars +1) = 10(4+1)
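Each Morris trajectory evaluates the model at a base point and once more after perturbing each of the 4 parameters in turn, so the total cost is $noptimized \cdot (n_{pars}+1) = 10 \cdot (4+1) = 50$ model runs.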
In [22]:
morris.parset2run.shape
Out[22]:
For all the parameter sets in morris.parset2run, we now need to run the model and save the result. An important step here is deciding which information we want to extract from the model output. Different options are available, and the choice depends on the specific model and the research question.
For this case we will check DO, and we want to know which parameter influences the minimum oxygen value the most. Hence, characterizing this 'depression' of the oxygen as accurately as possible is essential to evaluate the potential harm the contamination causes.
Running the model is completely similar to the Monte Carlo runs in the first part.
However, in this case the samples were prepared by the Morris method!
In [23]:
#number of runs we have to do
Nruns = morris.parset2run.shape[0]
#We save the output in an array: one row per run, one column per output feature
MCoutputs = np.zeros((Nruns, 3)) #3 output features per run
Mtime_d = np.linspace(0., 25., 250) #model output time
for i in range(Nruns):
    #set the parameters
    param = morris.parset2run[i]
    outputmodel = RiverModel(param, Mtime_d)[2]
    #extract the output features of interest from the DO time series:
    MCoutputs[i,0] = outputmodel.min()   #the DO minimum
    MCoutputs[i,1] = outputmodel.mean()  #the DO average
    MCoutputs[i,2] = outputmodel[50]     #the DO at timestep 50
In [24]:
#Calculate the Morris screening diagnostics
morris.Morris_Measure_Groups(MCoutputs)
#plot the mu-barchart to rank the parameters
morris.plotmu(outputid=2, ec='grey',fc='grey') #change the outputid and compare!!
Out[24]:
In [25]:
#plot mu* against sigma to evaluate parameter importance and parameter interactions
morris.plotmustarsigma(zoomperc = 'none', outputid = 2)
#outputid=0 when only one output. If you're considering multiple outputs, this identifies the output
Out[25]:
In [26]:
#Save the results as a LaTeX table in a text file
morris.latexresults()
Sensitivity analysis is useful for comparing the effect of the uncertain inputs (parameters) on the output. Parameters with large sensitivity values will contribute more to the prediction uncertainty. Moreover, this information is also useful for parameter estimation, since these parameters influence the model fit the most.
When the values of your model parameters are known, local sensitivity analysis can give a lot of useful information with a limited set of runs. However, for complex non-linear models, global sensitivity analysis can provide information about the influence over the entire parameter space.
When applying global sensitivity analysis, it is important to consider which output variable is used, and the latter always depends on the research question. Testing with multiple output variables is always recommended!
Other methods are available in different formats and environments; for an overview of methods, with a nice explanation of when to use which method, I would advise the following book:
Saltelli, Andrea, Marco Ratto, Terry Andres, Francesca Campolongo, Jessica Cariboni, Debora Gatelli, Michaela Saisana, and Stefano Tarantola. Global Sensitivity Analysis, The Primer. John Wiley & Sons Ltd, 2008.
The algorithms are available in various software packages, amongst others the pySTAN package used in this notebook.
For the latter, you can always contact me: stvhoey.vanhoey@ugent.be