We will do the exercises in Python, using a number of packages:
In [34]:
#This is only relevant for the course environment
%pylab inline
In [32]:
#How to work with distributions in numpy
import numpy as np #http://docs.scipy.org/doc/numpy/reference/
import matplotlib.pyplot as plt
from scipy.stats import norm
import random
In [35]:
#some things to recognize in the code:
x = np.linspace(0., 10, 500) #create an array with 500 values between [0,10]
y = np.random.normal(0.0, 10.0, x.size) #draw samples from a normal distribution (mean 0, std 10)
#plt.plot(x,y,'ro') #plot the numbers (line or markers)
plt.hist(y, edgecolor='white', facecolor='gray', normed=True) #make a histogram
#(frequency of values in certain intervals)
#calculations
w=x**2
z=np.sqrt(x) #square root
ze = np.ones(11) #make an array filled with ones, cfr. np.zeros() for zeros
print ze #print this output
In [33]:
ze
Out[33]:
In [36]:
#getting information by using tab-completion!
#np.arange(
#writing functions (def) in order to run code multiple times with different inputs:
def example_def(input1, input2):
    '''
    Here comes the documentation of what the function does,
    e.g. multiply the two input values
    '''
    def_output = input1*input2
    return def_output
#Let's use the function:
print example_def(2.,8.)
If we know the uncertainty of the model inputs, we can propagate these errors to the model output. In this way, the 'variability' contributed by the different sources is propagated to the model output.
We will focus on three methods:
Remark: We also assume that the model structure we're using is 'correct'
In the case of linear models, where the errors have a normal distribution and the uncertainties have no significant covariance (i.e. are not dependent on each other), classical error-propagation rules can be used (no correlation between input factors considered):
EXAMPLE - linear model
We define the linear 'model', describing the separation between inorganic and organic nitrogen in a mixed solution: $$ \quad \quad \quad N_{tot} = N_{NH_4} + N_{org}$$
Let's say $N_{NH_4}$ and $N_{org}$ have a known uncertainty, expressed as the variances $\sigma_{NH_4}^2$ and $\sigma_{org}^2$. If we want to propagate the uncertainty through this linear model, we can calculate the variance on $N_{tot}$ with the following calculation: $$\sigma_{tot}^2 = \sigma_{NH_4}^2 + \sigma_{org}^2$$
If the uncertainties of the two measurements were dependent on each other, this relation should be incorporated in the propagation: $$\sigma_{tot}^2 = \sigma_{NH_4}^2 + \sigma_{org}^2 + 2\cdot \rho \cdot \sigma_{NH_4}\sigma_{org}$$ with $\rho$ the correlation between both.
In [47]:
import numpy as np
import matplotlib.pyplot as plt
import random
#NH4
meanNH4 = 3.
stdNH4 = 0.6
NH4 = np.random.normal(loc=meanNH4, scale=stdNH4, size=2000)
#Norg
meanorg = 5.
stdorg = 0.2
Norg = np.random.normal(loc=meanorg, scale=stdorg, size=2000)
#plot NH4
plt.hist(NH4, bins = 20, color='k')
plt.vlines(meanNH4,0.,300.,color='white') #the mean of the uncertainty of NH4
plt.vlines(meanNH4+1.96*stdNH4,0.,300.,color='white',linestyle='--') # mu + 1.96*std
plt.vlines(meanNH4-1.96*stdNH4,0.,300.,color='white',linestyle='--') # mu - 1.96*std
#95% of the values lie between these two boundaries
#plot the Norg
plt.hist(Norg, bins = 20, color='gray', edgecolor='None')
plt.vlines(meanorg,0.,300.,color='white') #the mean of the uncertainty of Norg
plt.vlines(meanorg+1.96*stdorg,0.,300.,color='white',linestyle='--') # mu + 1.96*std
plt.vlines(meanorg-1.96*stdorg,0.,300.,color='white',linestyle='--') # mu - 1.96*std
Out[47]:
For this simple model, the only information we need is the standard deviation of both model inputs
So if we get measurement values of $N_{org}$ (5 mg/l) and $N_{NH_4}$ (3 mg/l), we can give an estimate of the model output together with its uncertainty:
In [48]:
#Calculation of the Ntot uncertainty, taking the correlation between the errors into account
rho = 0.7 #correlation between the two inputs (set rho = 0. for independent errors)
stdNH4 = 0.6 #uncertainty of the NH4 input
stdorg = 0.2 #uncertainty of the Norg input
std_tot = np.sqrt(stdNH4**2 + stdorg**2 + 2*rho*stdNH4*stdorg)
print 'The standard deviation of the linear model output is ', std_tot
In [49]:
N_tot = 3. + 5.
print 'Estimated total Nitrogen is', N_tot,'+/-', std_tot ,'mg/l'
Let's visualize this information (we work with normal distributions!)
In [50]:
#plot the linear model output distribution by doing this for many values
Ntot = np.random.normal(loc=meanNH4+meanorg, scale=std_tot, size=2000)
plt.hist(Ntot, bins = 20, color='k')
plt.vlines(meanNH4+meanorg,0.,300., color='white') #the mean of Ntot
plt.vlines(meanNH4+meanorg+1.96*std_tot,0.,300.,color='white',linestyle='--') # mu + 1.96*std
plt.vlines(meanNH4+meanorg-1.96*std_tot,0.,300.,color='white',linestyle='--') # mu - 1.96*std
plt.hist(NH4, bins = 15, facecolor='None', edgecolor = 'k')
plt.hist(Norg, bins = 20, facecolor='None', edgecolor='gray')
Out[50]:
------------------------------------------------------------------------------------------
The python file ex1_linprop.py is prepared for the following exercise; based on the following instructions, adapt the script in the #FILL IN places:
We define the linear 'model', describing the potential damage $P_d$ on a municipality by flooding as the combined effect of the flooding risk $R_f$ and the inhabitant concentration $H_c$: $$ \quad \quad \quad P_{d} = R_{f} \cdot H_{c}$$
Suppose we know the uncertainty/variability of the flooding risk and the inhabitant concentration, both having a normal distribution and being independent of each other:
Calculate the uncertainty of the model output and visualize the uncertainty for $R_f = 1.5$ and $H_c = 0.5$ in a normal distribution around your model output value.
Rule for multiplication: $$z = x\cdot y\; \rightarrow \frac{\sigma_z}{z} = \sqrt{(\frac{\sigma_x}{x})^2 + (\frac{\sigma_y}{y})^2}$$
if the inputs are correlated with correlation factor $\rho_{x,y}$: $$z = x\cdot y\; \rightarrow \frac{\sigma_z}{z} = \sqrt{\left(\frac{\sigma_x}{x}\right)^2 + \left(\frac{\sigma_y}{y}\right)^2 + 2\frac{\sigma_x \sigma_y}{x\cdot y}\rho_{x,y}}$$
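Purely as an illustration of how this multiplication rule translates into code (the standard deviations below are made up; the actual values to use are given with the ex1_linprop.py exercise):
In [ ]:
import numpy as np

#model inputs (values given in the exercise text)
Rf = 1.5   #flooding risk
Hc = 0.5   #inhabitant concentration
#ASSUMED standard deviations, purely illustrative
std_Rf = 0.3
std_Hc = 0.05

#model output
Pd = Rf*Hc
#multiplication rule for independent inputs: relative uncertainties add in quadrature
rel_std_Pd = np.sqrt((std_Rf/Rf)**2 + (std_Hc/Hc)**2)
std_Pd = rel_std_Pd*Pd
print 'Potential damage is', Pd, '+/-', std_Pd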
------------------------------------------------------------------------------------------
Problems!
As a result, we need more advanced and flexible methods (but basically, the idea remains the same)!!
Intermezzo: linear/non-linear models
A model is linear if $f(aA + bB) = af(A) + bf(B)$, but this is not always easy to check. It is therefore easier to check whether the partial derivative of the function (model) $f$ with respect to the variable $A$ no longer depends on $A$ (if it does not depend on $A$, the model is linear in $A$): $$\frac{\partial f}{\partial A}\neq g(A)$$
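Before the worked example below, a quick symbolic illustration of this check (a small sketch assuming the sympy package is available; it is not used elsewhere in this course):
In [ ]:
import sympy as sp

A, B, a, b = sp.symbols('A B a b')

#a linear model: the derivative with respect to A does not contain A anymore
f_lin = a*A + b*B
print sp.diff(f_lin, A)    # -> a (independent of A: linear in A)

#a non-linear model: the derivative with respect to A still contains A
f_nonlin = a*A**2 + b*B
print sp.diff(f_nonlin, A) # -> 2*a*A (depends on A: non-linear in A)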
For example the model $\mu = \frac{\mu_{max}S}{K_s+S}$ (growth rate of micro-organisms)
(calculation rule for derivative of $f(x)=g(x)/h(x) \rightarrow \frac{g'(x)h(x)-h'(x)g(x)}{h(x)^2}$)
$$\frac{\partial \mu}{\partial S} = \frac{\mu_{max}(K_s+S)-\mu_{max}S}{(K_s +S)^2} = \frac{\mu_{max}K_s}{(K_s+S)^2} = g(S)$$
The derivative still depends on $S$, so the model is non-linear.

There are situations where the computations are more complex, so we need a more general way of deriving expressions for the propagation of experimental errors. In linear error propagation, a linear approximation of the model is used to calculate the model output uncertainty, making use of the first-order term of the Taylor series approach.
The uncertainty on a model output $y$ as a consequence of the uncertainty on the different model inputs ($n$ uncertain inputs in total) is given by: $$\sigma_y^2 = \sum_{i=1}^{n} \left(\frac{\partial y}{\partial x_i}\right)^2 \sigma_{x_i}^2$$
with $\frac{\partial y}{\partial x_i}$ the partial derivative of the output $y$ with respect to the input $x_i$ (given without proof).
As such, this is actually a generalisation of the 'classic' error propagation, but its use is still limited:
(Taylor series approach, see also http://en.wikipedia.org/wiki/Taylor_series, of a function $f(x)$ around $a$: $f(a)+\frac {f'(a)}{1!} (x-a)+ \frac{f''(a)}{2!} (x-a)^2+\frac{f^{(3)}(a)}{3!}(x-a)^3+ \cdots$; only taking the first order means: $f(a)+\frac {f'(a)}{1!} (x-a)$)
EXAMPLE - non-linear model (illustrative)
The following model describes the nitrification rate (in soil or in water) $r_{NH}$, considering the autotrophic biomass ($X_{B,A}$) constant. The conversion of ammonia in this model depends on the concentration of ammonia ($S_{NH}$) and the oxygen concentration ($S_O$):
$$r_{NH} = \frac{\mu_A}{Y_A}\frac{S_{NH}}{S_{NH}+K_{NH}}\frac{S_O}{S_O + K_{O,A}}X_{B,A}$$
The first-order Taylor approximation (not taking into account the higher-order terms) of the model, with four uncertain parameters $Y_A, K_{NH}, K_{O,A}$ and $\mu_{A}$ and nominal values $\theta_0$, would be:
$$r_{NH}(\theta) \approx r_{NH}(\theta_0) + \sum_{i} \frac{\partial r_{NH}}{\partial \theta_i}\bigg|_{\theta_0}(\theta_i - \theta_{i,0})$$
Applying the linear uncertainty propagation (part 1) results in:
$$\sigma_{r_{NH}}^2 \approx \left(\frac{\partial r_{NH}}{\partial \mu_A}\right)^2\sigma_{\mu_A}^2 + \left(\frac{\partial r_{NH}}{\partial Y_A}\right)^2\sigma_{Y_A}^2 + \left(\frac{\partial r_{NH}}{\partial K_{NH}}\right)^2\sigma_{K_{NH}}^2 + \left(\frac{\partial r_{NH}}{\partial K_{O,A}}\right)^2\sigma_{K_{O,A}}^2$$
with all the partial-derivative terms evaluating to a number and the $\sigma$ values the known uncertainties of the inputs (purely illustrative).
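A sketch of how this could be evaluated numerically, using finite differences to approximate the partial derivatives (all nominal values and standard deviations below are made up, purely for illustration):
In [ ]:
import numpy as np

def r_NH(muA, YA, KNH, KOA, SNH=5., SO=2., XBA=100.):
    #nitrification rate model (SNH, SO and XBA are kept fixed here)
    return muA/YA * SNH/(SNH + KNH) * SO/(SO + KOA) * XBA

#ASSUMED nominal parameter values and uncertainties (illustrative only)
nominal = np.array([0.8, 0.24, 1.0, 0.4])   #muA, YA, KNH, KOA
sigmas  = np.array([0.1, 0.02, 0.2, 0.05])

var_r = 0.
for i in range(4):
    #central finite difference for the partial derivative towards parameter i
    up, down = nominal.copy(), nominal.copy()
    h = 1e-6*nominal[i]
    up[i] += h
    down[i] -= h
    dr_dpi = (r_NH(*up) - r_NH(*down))/(2*h)
    var_r += dr_dpi**2 * sigmas[i]**2

print 'r_NH =', r_NH(*nominal), '+/-', np.sqrt(var_r)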
The uncertain inputs (parameters, initial conditions,...) are given a stochastic distribution (based on expert knowledge or specific characteristics). The 'deterministic' model is then calculated for a set of N Monte Carlo runs, each with the inputs sampled from the defined distributions.
All uncertainty sources are lumped together, so the Monte Carlo simulation itself offers no knowledge about the importance of the different uncertainty sources for the model output $\rightarrow$ see later in sensitivity analysis!
Starting from a deterministic model, we have inputs with a certain probability distribution; we take multiple 'shots' and get an output distribution:
In [1]:
from IPython.display import Image
Image(filename='monte_carlo2.png', width=500)
Out[1]:
Caption: Random samples are taken from the 'known' input distributions, and for each 'shot' the model gives an output. These output distributions are analysed to derive uncertainty information.
First, let's revisit the example of the linear model to see the similarity.
EXAMPLE - the linear model
We take the linear 'model', describing the separation between inorganic and organic nitrogen in a mixed solution: $$ \quad \quad \quad N_{tot} = N_{NH_4} + N_{org}$$
In [43]:
#NH4
meanNH4 = 3.
stdNH4 = 0.4
#Norg
meanorg = 5.
stdorg = 0.2
#the model
def my_N_model(NH4, Norg):
    #This could be any very complex model
    return (NH4 + Norg)
#Basic Monte-Carlo setup:
number_of_montecarlo = 2000 #play with this value to see the importance of sufficient samples
Ntot = np.zeros(number_of_montecarlo) #we prepare the model output array to save results in
for i in range(number_of_montecarlo):
    #Only a little trick to follow the model running:
    if i%100 == 0:
        print 'Monte Carlo Run ',i,' is running'
    #we sample from the input distributions of the model inputs
    Norg = np.random.normal(loc=meanorg, scale=stdorg, size=1)
    #Norg = np.random.uniform(2.,6.0, size=1) #-> see question later
    NH4 = np.random.normal(loc=meanNH4, scale=stdNH4, size=1)
    #we calculate the model for these sampled values and save them in our prepared array
    Ntot[i] = my_N_model(NH4, Norg)
#Let's make a histogram of the result
plt.hist(Ntot, bins = 20, color='k', )
#We don't know the standard deviation of this 'unknown' distribution,
#but can easily calculate percentile values:
p1 = np.percentile(Ntot, 2.5) #2.5 percentile
p2 = np.percentile(Ntot, 97.5) #97.5 percentile
plt.vlines(p1,0.,100.,color='white',linestyle='--')
plt.vlines(p2,0.,100.,color='white',linestyle='--')
Out[43]:
Basic Monte Carlo simulation techniques boil down to: sample the uncertain inputs from their distributions, run the model for each sample, and analyse the resulting output distribution.
In essence, this is the same as error propagation, but we can also use the Monte Carlo method for non-linear and more complex models (a generic sketch of the recipe is given below):
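To make that general recipe explicit, here is a minimal sketch of a generic Monte Carlo loop (the monte_carlo helper below is illustrative and not part of the course scripts; any model function and any set of input samplers can be plugged in):
In [ ]:
import numpy as np

def monte_carlo(model, samplers, nruns):
    '''Generic Monte Carlo loop: sample the inputs, run the model, collect the outputs.
    model    : function taking the sampled inputs as arguments
    samplers : list of functions, each returning one sample of an input
    nruns    : number of Monte Carlo runs ('shots')'''
    outputs = np.zeros(nruns)
    for i in range(nruns):
        inputs = [sample() for sample in samplers]
        outputs[i] = model(*inputs)
    return outputs

#e.g. reproducing the linear example above:
#Ntot = monte_carlo(my_N_model,
#                   [lambda: np.random.normal(3., 0.4),
#                    lambda: np.random.normal(5., 0.2)],
#                   2000)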
EXAMPLE - Respirometric batch reactor model
The following respirometric model for aerobic degradation of acetate without biomass storage will be used (Gernaey, 2002). It predicts the model output variable $OUR_{ex}$, the exogenous oxygen uptake rate of the bacteria ($mg\; L^{-1} d^{-1}$).
with
$S$: substrate concentration ($mgCOD\; L^{-1}$ ),
$X$: biomass concentration ($mg COD\; L^{-1}$),
$\tau$: retardation on biomass activity ($d$),
$Y$: yield of the biomass (-),
$\mu_{max}$: maximum growth rate ($d^{-1}$),
$K_s$: half-saturation Monod constant ($mg COD\; L^{-1}$)
$\quad$(if $\mu$ equals half of $\mu_{max}$, the substrate $S$ equals $K_s$).
In [2]:
from IPython.display import Image
Image(filename='batcheactor.png', width=300)
Out[2]:
Caption: What the reactor with sludge and bacteria looks like. We add a small pulse of substrate (acetate) to the reactor, the bacteria consume the substrate and use oxygen, and we measure the Dissolved Oxygen and (indirectly) the OUR.
The parameters in the model ($\tau, Y, K_s$ and $\mu_{max}$) are uncertain and we characterise their uncertainty with a uniform distribution. Moreover, since we cannot easily measure the initial biomass when starting the experiment, we also consider the initial biomass $X_0$ as an uncertain input of our model.
Reference:
Gernaey, K., Petersen, B., Nopens, I., Comeau, Y. and Vanrolleghem, P.A. (2002). Modeling aerobic carbon source degradation processes using titrimetric data and combined respirometric-titrimetric data: experimental data and model structure. Biotechnology and Bioengineering, 79(7), 741-753.
To implement this model, we bring the differential equations into code. Based on two functions, we will be able to run the model with different parameter sets.
In [14]:
#The model consist of differential equations, which needs integration (solver-based)
from scipy.integrate import odeint
def deriv_works(u, t, Pars, Const): #Derivative used in the general model function
    '''
    Differential equations of the respirometric model in code
    '''
    #Define the parameters
    mumax = np.float64(Pars[0])
    Y = np.float64(Pars[1])
    Ks = np.float64(Pars[2])
    tau = np.float64(Pars[3])
    b = np.float64(Const[0])
    kla = np.float64(Const[1])
    SOeq = np.float64(Const[2])
    Monod = mumax*(u[1])/(u[1]+Ks) #Monod kinetics
    Expo = 1.0-np.exp(-t/tau) #Help function for a clearer implementation
    #The model equations
    dXdt = (Expo*Monod-b)*u[0] #Biomass
    dSsdt = -(1.0/Y)*Expo*Monod*u[0] #Substrate
    dOdt = kla*(SOeq-u[2])-((1-Y)/Y)*Expo*Monod*u[0] #Oxygen
    return np.array([dXdt,dSsdt,dOdt])
def RespiroModel(Pars, Init_unc, time):
    '''
    Run the respirometric model
    '''
    #Define the constants
    b = 0.62
    kla = 369.7334962
    SOeq = 8.4
    Constt = np.array([b,kla,SOeq])
    #Define the initial conditions (constant) -> Ss0
    Ss0 = 58.4899
    #Define the initial conditions (uncertain) -> X0
    X0 = Init_unc[0]
    yinit = np.array([X0,Ss0,SOeq])
    #Define the necessary parameters
    mumax = np.float64(Pars[0])
    Y = np.float64(Pars[1])
    Ks = np.float64(Pars[2])
    tau = np.float64(Pars[3])
    #Solve with the LSODA scheme
    y,infodic = odeint(deriv_works,yinit,time,full_output=True,
                       printmessg=False, args=(Pars,Constt))
    #Get the outputs
    X = y[:,0]
    Ss = y[:,1]
    O = y[:,2]
    OUR_ex = ((1-np.exp(-time/tau))*mumax*(1-Y)/Y*Ss/(Ss+Ks)*X)/(24*60)
    #when using this function, we get the time, biomass X, substrate Ss,
    #oxygen O and oxygen uptake rate OUR_ex back,
    #together with some information about the solver
    return [time,X,Ss,O,OUR_ex,infodic]
Once we have the model implementation, we can run the model using the RespiroModel function:
In [52]:
#SET TIME
Mtime_d = np.arange(0.,0.05,0.0005)
#Since everything is expressed in days,
#this represents more or less 1 hour and 12 minutes of time.
#set X0 as initial condition
X0=6.75632395e+02
#set the parameter values
mumax=4.
Y=0.78
Ks=0.4
tau=2.25e-04
Pars=[mumax,Y,Ks,tau]
Init_unc=np.array([X0])
#RUN THE MODEL WITH THE RespiroModel function, with the specified parameter values
#and X0 value. Everything is saved in outputmodel, which is a list with the outputs
outputmodel=RespiroModel(Pars,Init_unc,Mtime_d)
#check if the integration was successful:
print outputmodel[-1]['message']
We can make a plot with the different outputs of the model:
In [51]:
fig = plt.figure(figsize=(10,10))
fig.subplots_adjust(hspace=0.2, wspace = 0.3)
ax1 = fig.add_subplot(221)
ax1.plot(outputmodel[0],outputmodel[1],'k',label = 'X')
ax1.set_xticklabels([])
ax1.legend(loc=4)
ax2 = fig.add_subplot(222)
ax2.plot(outputmodel[0],outputmodel[2],'k',label = 'S')
ax2.set_xticklabels([])
ax2.legend(loc=4)
ax3 = fig.add_subplot(223)
ax3.plot(outputmodel[0],outputmodel[3],'k',label = 'O')
ax3.legend(loc=4)
ax3.set_xlabel('Time')
ax4 = fig.add_subplot(224)
ax4.plot(outputmodel[0],outputmodel[4],'k',label = 'OUR')
ax4.legend(loc=4)
ax4.set_xlabel('Time')
Out[51]:
Interpretation of the model:
In [53]:
#run Monte Carlo for OUR
#########################
MCruns = 1000
#We save the output in an array with in the rows the different outputs
#and in the columns the timesteps
MCoutputs = np.zeros((MCruns, Mtime_d.size))
for i in range(MCruns):
    #Sample the uncertain parameters, assume we know these values! -> shot
    X0 = np.random.uniform(650.,700.,1)
    mumax = np.random.uniform(3.5,4.5,1)
    Y = np.random.uniform(0.77,0.79,1)
    Ks = np.random.uniform(0.3,0.5,1)
    tau = np.random.uniform(0.00001,0.001,1)
    #Run the model for the sampled parameters
    Init_unc = X0
    Pars = [mumax[0],Y[0],Ks[0],tau[0]]
    modeloutput = RespiroModel(Pars,Init_unc,Mtime_d)
    #Save only the OUR output of the model to analyze
    MCoutputs[i,:] = modeloutput[4]
We have now stored all the OUR model outputs in one array. This can be analysed in the same way as the linear model, by calculating the percentile values for every timestep and plotting these in time. But first, let's just plot all the model runs:
In [54]:
for i in range(MCruns):
    plt.plot(modeloutput[0],MCoutputs[i,:])
We can get more information about the distribution of the model output uncertainty for every timestep and can also plot percentile intervals as a function of time:
In [55]:
#Choose a timestep and plot the histogram for this timestep:
timestep = 0.01
#plot a histogram at the place where the modelled timestep equals the timestep given
plt.hist(MCoutputs[:,np.where(Mtime_d==timestep)[0]], bins = 20, color='k')
#We plot again the percentile values:
p1 = np.percentile(MCoutputs[:,np.where(Mtime_d==timestep)[0]], 2.5) #2.5 percentile
p2 = np.percentile(MCoutputs[:,np.where(Mtime_d==timestep)[0]], 97.5) #97.5 percentile
plt.vlines(p1,0.,100.,color='white',linestyle='--')
plt.vlines(p2,0.,100.,color='white',linestyle='--')
Out[55]:
Calculating these percentile values for every timestep gives us the uncertainty of the model output as a function of time.
In [56]:
p25 = np.percentile(MCoutputs, 2.5, axis=0)
#axis=0 means the percentile is calculated over the rows (over all Monte Carlo runs),
#so every column (timestep) gets one summarized value
p975 = np.percentile(MCoutputs, 97.5, axis=0)
plt.plot(modeloutput[0], p25,'k--')
plt.plot(modeloutput[0], p975,'k--')
plt.fill_between(modeloutput[0], p25, p975, color='gray')
Out[56]:
------------------------------------------------------------------------------------------
In the python file ex2_respiromodel.py, the respirometric model is implemented and a setup as presented above is already prepared.
Monte Carlo sampling needs enough 'shots' of the distribution to be able to characterize the input and output distributions. A standard random sampling implementation is not always the most efficient way to sample the entire distribution. Other sampling procedures exist that achieve a reasonably accurate coverage of the distribution more quickly:
We will also focus today on Latin Hypercube/stratified sampling:
To perform the stratified sampling, the cumulative probability (100%) is divided into segments, one for each iteration of the Monte Carlo simulation. A probability is randomly picked within each segment using a uniform distribution, and then mapped to the corresponding representative value of the variable's actual distribution.
A simulation with 500 iterations would split the probability into 500 segments, each representing 0.2% of the total distribution. For the first segment, a number would be chosen between 0.0% and 0.2%. For the second segment, a number would be chosen between 0.2% and 0.4%. This number would be used to calculate the actual variable value based upon its distribution (inverse of the cumulative distribution).
Let's start with an example for 10 segments: 0%-10%, 10%-20%,...,90%-100%
In [57]:
#If we take 10 samples, we divide the cumulative distribution into 10 equal bins;
#so values 0.1,0.2,... 1.0
nsegments = 10
segments = np.linspace(0,1.,nsegments+1) #np.linspace gives nsegments+1 equally spaced boundary values
print segments
In [58]:
#In each of these segments, we now sample as we would sample in a uniform distribution:
samples = np.zeros(nsegments) #here we will store our sampled values
for i in range(nsegments):
    #sample uniformly between the boundaries of this segment
    samples[i] = np.random.uniform(segments[i],segments[i+1],1)
print samples
In [23]:
#So what have we done now?
fig,ax = plt.subplots()
ax.hlines(segments,0.,1.)
ax.plot(0.5*np.ones(nsegments), samples,'ko',markersize=5)
ax.set_xticklabels([])
#compare this with a standard random sampling procedure:
#ax.plot(0.5*np.ones(nsegments), np.random.uniform(0.,1.,nsegments),'r*',markersize=8)
Out[23]:
This is basically an LH-sampling of 10 values between 0 and 1. However, in reality we also want to map these samples to the actual distributions of the model inputs, not just to values between 0 and 1.
To do so, $P(X \leq x) = n$ is solved for $x$ (you're searching the $x$ value for which the cumulative probability is $n$), where $n$ is the random point in the segment. How this is done differs for every distribution and it is basically a matter of inverting the cumulative distribution function. However, this is not always straightforward: for some continuous distributions the inverse is not directly available and needs to be approximated. Here, we're not going into detail on the inverse calculation, but we will use the .ppf function of scipy.stats to solve this problem; more information/distributions: http://docs.scipy.org/doc/scipy/reference/stats.html
For a normal distribution:
In [59]:
#Let's plot what we actually want to do -> normal distribution!
from scipy.stats import norm
#other distributions: http://docs.scipy.org/doc/scipy/reference/stats.html
#mean 8 and std 2.
norm_samp = norm.ppf(samples, 8.,2.) #ppf-function of python-scipy solves the inverse distribution!
fig,ax = plt.subplots()
ax.scatter(norm_samp,samples,c='k',s=30)
ax.set_xlim(0,14),ax.set_ylim(0,1)
ax.hlines(segments,0,2., linestyles='--')
#compare this with a standard random sampling procedure:
standardsam=np.random.uniform(0.,1.,nsegments)
ax.scatter(norm.ppf(standardsam, 8.,2.),standardsam,c='r',marker='*',s=40, edgecolor='None' )
Out[59]:
In [25]:
#Let's plot what we actually want to do -> uniform distribution
#min value 2; max value 14: [2,14]
#---------------------------------------------
unif_samples = 2. + samples* (14.-2.)
#---------------------------------------------
fig,ax = plt.subplots()
ax.scatter(unif_samples,samples,c='k',s=30)
ax.set_xlim(2,14),ax.set_ylim(0,1)
ax.hlines(segments,2.,14., linestyles='--')
ax.vlines(2+segments*(14.-2.),0,1., linestyles='--')
#compare this with a standard random sampling procedure:
standardsam=np.random.uniform(0.,1.,nsegments)
ax.scatter(2+standardsam*(14-2),standardsam,c='r',marker='*',s=40, edgecolor='None' )
Out[25]:
For a higher-dimensional space, we shuffle the samples of the individual parameters and combine them to obtain a sampling of the multi-dimensional space:
In [5]:
from IPython.display import Image
Image(filename='LatinH_works (Small).png', width=500)
Out[5]:
Caption: 2-dimensional view of the Latin Hypercube sampling strategy.
In [26]:
#Shuffling the elements:
#--------------------------------------------------------
np.random.shuffle(unif_samples)
#--------------------------------------------------------
fig,ax = plt.subplots()
ax.scatter(unif_samples,samples,c='k',s=30)
ax.set_xlim([2,14]), ax.set_yticklabels([])
Out[26]:
We can then make a histogram of the sampling, to check if the sampling corresponds to the distribution we want:
In [27]:
from ex3_latinhypercube import lhs_uniform, lhs_normal
#FILL IN NUMBER OF SAMPLES-----------------------------------------------------
N = 1500
#------------------------------------------------------------------------------
#UNIFORM DISTRIBUTION
un_samp = lhs_uniform(N,2.,4.)
#fig
fig,ax=plt.subplots(1,2)
ax[0].hist(un_samp,facecolor='k', edgecolor='w',bins=20)
#classic random sampling in red
ax[0].hist(np.random.uniform(2,4,N),facecolor='r', edgecolor='w',bins=20)
#Normal distribution
norm_samp = lhs_normal(N,6.,0.3)
#fig
ax[1].hist(norm_samp,facecolor='gray', edgecolor='w',bins=20)
Out[27]:
We will focus only on normal and uniform distributions with LH:
For a normal distribution, the calculation is done as in the previous example:
x = norm.ppf(y, mean_of_distribution, std_of_distribution)
For a uniform distribution, the calculation is then straightforward:
x = min_value + y*(max_value - min_value)
Shuffling the elements for higher dimensionalities:
np.random.shuffle(array_to_shuffle)
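Putting the above together, a minimal sketch of what such Latin Hypercube sampling functions could look like (the actual implementations in ex3_latinhypercube.py may differ):
In [ ]:
import numpy as np
from scipy.stats import norm

def lhs_uniform_sketch(N, xmin, xmax):
    '''One possible Latin Hypercube sampling of a uniform distribution [xmin, xmax]'''
    #one stratified probability per segment: segment start + a random offset within the segment
    y = (np.arange(N) + np.random.uniform(0., 1., N))/float(N)
    #map the stratified probabilities to the uniform distribution
    x = xmin + y*(xmax - xmin)
    np.random.shuffle(x) #shuffle so the samples can be combined with other parameters
    return x

def lhs_normal_sketch(N, mean, std):
    '''One possible Latin Hypercube sampling of a normal distribution N(mean, std)'''
    y = (np.arange(N) + np.random.uniform(0., 1., N))/float(N)
    #map the stratified probabilities through the inverse cumulative distribution (ppf)
    x = norm.ppf(y, mean, std)
    np.random.shuffle(x)
    return x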
FINAL REMARK: take care of large dimensionalities!
<img src="sampling_comp.png" width=600>
The 'classic' error propagation or differential analysis gives a good estimate of the output uncertainty (for close-to-linear models). Its main advantage is computational time, since the Monte Carlo technique is always more computationally intensive!
However, since most environmental (spatial) models consist of many equations or cannot be accessed analytically, Monte Carlo based propagation is usually the most generic solution.
So, always consider first whether linear propagation is possible; if not, do Monte Carlo based error propagation, preferably with Latin Hypercube sampling.
Main drawback of the uncertainty propagation:
The definition of these uncertain inputs is mostly NOT straightforward! Many distributions exist and the choice of the distribution has a direct effect on the model output (prediction) uncertainty. Definition of these distributions/ranges can be done based on
(1) repeated experiments (not easy in the field...);
(2) confidence intervals of the parameters (see later this day; we need data to do this),
(3) literature values or
(4) expert knowledge.
Next step: get more information about the model behaviour, by checking how the uncertain inputs contribute to the output uncertainty => Sensitivity Analysis
After that: Introduce (uncertain) DATA to evaluate the model behaviour!