In [62]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Objectives: with this notebook you will learn how to state a sensitivity-analysis problem, draw a sampling, run your model on it, compute RBD-FAST first-order indices with SALib, and check the robustness of those indices (convergence and bootstrap)...
NOTE: for more detail on sensitivity analysis in general, refer to the café focus by Jeanne Goffart (April 2017); the presentation is on the server.
CAUTION: sensitivity analysis tools should be used and analyzed with care. Without a proper design, you won't be able to draw any conclusions at all, or worse, you will draw wrong ones...
... and a surpriiiise!
It is a regular Python dictionary with mandatory keys:
TRICKY ALERT: defining the bounds is one of the trickiest parts of a sensitivity analysis.
Do read the literature on the subject: what bounds have fellow researchers set for similar studies, etc.
Here we have chosen to focus on a specific area of all the possible values taken by the parameters: ±10% around a nominal value.
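As a minimal sketch of that choice (the nominal values below are made up purely for illustration), ±10% bounds can be built like this:
In [ ]:
# hypothetical nominal values for three parameters (illustration only)
nominal = np.array([10.0, 250.0, 0.5])
# lower/upper bound = 90% / 110% of each nominal value
bounds_10pct = [[0.9 * v, 1.1 * v] for v in nominal]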
In [5]:
# STATE THE PROBLEM DICTIONARY
# what will vary (= the inputs)? within what bounds?
problem = {
    'num_vars': 3,
    'names': ['x1', 'x2', 'x3'],
    'bounds': [[-np.pi, np.pi],
               [-np.pi, np.pi],
               [-np.pi, np.pi]]
}
In [6]:
# say we want to draw 150 samples
num_samples = 150
In [7]:
# we draw a Latin Hypercube Sampling (LHS), which is suited for an RBD-FAST analysis
# (other sampling methods are available in the library, though)
from SALib.sample.latin import sample
all_samples = sample(problem, num_samples)
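As a quick sanity check, the sampler returns one row per sample and one column per input variable:
In [ ]:
# expected shape: (150, 3) -> (num_samples, num_vars)
print(all_samples.shape)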
Here you plug in your own code: it can be Python code like here, or a call to any other program (Dymola, EnergyPlus, etc.)
We use a lumped thermal model, a.k.a. an RC model.
The scientific question behind it (it's not just a toy example ^^): can the model be entirely calibrated?
In [12]:
# this is where you define your own model, procedure, experiment...
import numpy as np

def run_model(x1, x2, x3):
    """
    COPY HERE YOUR OWN CODE
    the function takes one sample as input
    and returns one or more outputs
    """
    # Delete from HERE =======================================
    # As an example, we'll look at the famous Ishigami function
    # (A and B are specific parameters of the Ishigami function)
    A = 7
    B = 0.1
    y = (np.sin(x1) +
         A * np.sin(x2)**2 +
         B * x3**4 * np.sin(x1))
    # ========= TO HERE and replace with your own piece of code
    return y
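For reference, SALib ships its own implementation of the Ishigami function (with the same defaults A=7, B=0.1), so you don't have to rewrite it by hand:
In [ ]:
from SALib.test_functions import Ishigami
# evaluate() takes the full (num_samples, 3) sample array, one output per row
y_builtin = Ishigami.evaluate(all_samples)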
NOTE: you could also use outputs from a different program; you would then have to load your data into a NumPy array.
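A minimal sketch of that loading step, assuming a hypothetical plain-text file results.txt with one output value per line:
In [ ]:
# hypothetical file name; adapt to your own export format
external_results = np.loadtxt('results.txt')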
In [19]:
# run your model, procedure, experiment for each sample of your sampling
# unpack all_samples into 3 vectors x1, x2, x3
x1, x2, x3 = all_samples.T
# run the model, all samples at the same time
ishigami_results = run_model(x1, x2, x3)
SALib is a Python library for performing sensitivity analysis.
So far it contains the following analysis tools: Sobol, Morris, FAST, RBD-FAST, Delta Moment-Independent Measure, Derivative-based Global Sensitivity Measure (DGSM), and Fractional Factorial.
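Each tool lives in its own module under SALib.analyze; a sketch of the matching imports (module names as in the SALib versions we know of):
In [ ]:
# each analysis method has a dedicated module in SALib.analyze
from SALib.analyze import sobol, morris, fast, rbd_fast, delta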
Reminder: why do we want to use RBD-FAST? Because it estimates first-order sensitivity indices from a single, ordinary sampling (such as our LHS above), at a much lower computational cost than Sobol-type designs.
In [17]:
from SALib.analyze import rbd_fast
In [20]:
# Let us look at the analyze method
rbd_fast.analyze(problem=problem, Y=ishigami_results, X=all_samples)
Out[20]:
It is a dictionary with a single key 'S1', corresponding to a list of 3 items:
--> the first-order indices of the 3 input variables
In [21]:
# storing the first order indices of the analyze method
si1 = rbd_fast.analyze(problem=problem, Y=ishigami_results, X=all_samples)['S1']
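To pair each first-order index with its parameter name, a quick sketch:
In [ ]:
# print each parameter name next to its first-order index
for name, s in zip(problem['names'], si1):
    print(name, ':', round(s, 3))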
In [23]:
# make nice plots with the indices (looks good on your presentations)
# do not use the plotting tools of SALib, they are made for the method of Morris ...
fig, ax = plt.subplots()
fig.set_size_inches(18, 5)
ax.tick_params(labelsize=18)
# ===== X-AXIS =====
ax.set_xticks(np.arange(problem['num_vars']))
ax.set_xticklabels(problem['names'])
ax.set_xlim(xmin=-0.5, xmax=problem['num_vars'] - 0.5)
# ===== Y-AXIS =====
ax.set_ylabel('RBD-FAST\nSensitivity indices\n', fontsize=25)
# ===== BARS REPRESENTING THE SENSITIVITY INDICES =====
ax.bar(np.arange(problem['num_vars']), si1,
color='DarkSeaGreen');
# in striped grey: the zone of non-significant indices
ax.fill_between(x=[-0.5, problem['num_vars'] - 0.5], y1=-0.1, y2=0.1,
                color='grey', alpha=0.2, hatch='//', edgecolor='white');
In [29]:
# take a closer look to understand interactions (looks even better on your presentations)
# this part can be done without analyzing with SALib, just with the output from the samples
fig, ax = plt.subplots()
fig.set_size_inches(8, 7)
ax.tick_params(labelsize=15)
ax.set_title('Output of the model studied (Ishigami)\n'
             'according to the value of the 2 most influential parameters',
             fontsize=20)
# ===== SCATTER =====
size = np.ones(num_samples) * 75
sc = ax.scatter(x1, x2,
c=ishigami_results,
s=size,
vmin=ishigami_results.min(),
vmax=ishigami_results.max(),
cmap='seismic',
edgecolor=None)
# ===== X-AXIS =====
ax.set_xlim(xmin=problem['bounds'][0][0], xmax=problem['bounds'][0][1])
ax.set_xlabel('Parameter 1', fontsize=20)
# ===== Y-AXIS =====
ax.set_ylim(ymin=problem['bounds'][1][0], ymax=problem['bounds'][1][1])
ax.set_ylabel('Parameter 2', fontsize=20)
# ===== COLORBAR =====
ticks = np.arange(ishigami_results.min(), ishigami_results.max(), 5)
cb = plt.colorbar(sc, ticks=ticks);
cb.ax.set_yticklabels([str(round(i,1)) for i in ticks])
fig.tight_layout();
Even without running the sensitivity analysis with the analyze module, this graph shows some strong non-linearities (blue points very close to red ones). That alone tells us we should study this model with more samples: 150 is not enough.
In [53]:
def conv_study(n, Y, X):
    # take n samples among the num_samples, without replacement
    subset = np.random.choice(num_samples, size=n, replace=False)
    return rbd_fast.analyze(problem=problem,
                            Y=Y[subset],
                            X=X[subset])['S1']

all_indices = np.array([conv_study(n=n, Y=ishigami_results, X=all_samples)
                        for n in np.arange(50, num_samples + 1, 5)])
In [54]:
# convergence check
fig, ax = plt.subplots()
fig.set_size_inches(15,8)
ax.set_title('Convergence of the sensitivity indices', fontsize=20)
ax.tick_params(labelsize=16)
# one convergence point every 5 samples, from n=50 to n=num_samples
ax.set_xlim(xmin=0, xmax=(num_samples - 50) // 5)
ax.set_xticks(np.arange(0, (num_samples - 50) // 5 + 1, 10))
ax.set_xticklabels([str(i) for i in range(50, num_samples + 1, 50)])
ax.set_ylim(ymin=-0.15, ymax=1)
for p,param in enumerate(problem['names']):
ax.plot(all_indices[:,p], linewidth=3, label=param)
ax.fill_between(x=[0, (num_samples - 50) // 5], y1=-0.15, y2=0.1, color='grey', alpha=0.2, hatch='//', edgecolor='white',
                label='REMINDER: not significant')
ax.legend(fontsize=20, ncol=4);
Last but not least, the BOOTSTRAP principle: draw num_samples samples with replacement from your runs (or a subset of n_sub < num_samples, say 200 out of 300) and perform the sensitivity analysis on that resample. Repeat that operation 1000 times. The indices will vary: the more they vary, the larger the influence of the particular samples drawn (i.e., some samples strongly influence the outcome of the SA).
The values taken by the indices are trustworthy if they show robustness in the bootstrap process (i.e., the "confidence intervals" are not too wide). In any case, it is valuable information about the uncertainty around the indices.
In [59]:
def bootstrap(problem, Y, X):
    """
    Calculate confidence intervals of RBD-FAST indices
    1000 draws
    returns 95% confidence intervals of the 1000 indices
    problem : dictionary as SALib uses it
    X : SA input(s)
    Y : SA output(s)
    """
    all_indices = []
    for i in range(1000):
        # draw num_samples indices with replacement ('tirage' = draw)
        tirage_indices = np.random.randint(0, high=Y.shape[0], size=Y.shape[0])
        X_new = X[tirage_indices, :]
        Y_new = Y.flatten()[tirage_indices]
        all_indices.append(rbd_fast.analyze(problem=problem, Y=Y_new, X=X_new)['S1'])
    all_indices = np.array(all_indices)
    # mean +/- 2 standard deviations ~ 95% interval for each index
    means = all_indices.mean(axis=0)
    stds = all_indices.std(axis=0)
    return np.array([means - 2 * stds, means + 2 * stds])
In [60]:
# Get bootstrap confidence intervals for each index
bootstrap_conf = bootstrap(problem=problem, Y=ishigami_results, X=all_samples)
In [61]:
# make nice plots with the indices (looks good on your presentations)
# do not use the plotting tools of SALib, they are made for the method of Morris ...
fig, ax = plt.subplots()
fig.set_size_inches(8,4)
ax.tick_params(labelsize=18)
# ===== X-AXIS =====
ax.set_xticks(np.arange(problem['num_vars']))
ax.set_xticklabels(problem['names'])
ax.set_xlim(xmin=-0.5, xmax=problem['num_vars'] - 0.5)
# ===== Y-AXIS =====
ax.set_ylabel('RBD-FAST\nSensitivity indices\n', fontsize=25)
# ===== BARS REPRESENTING THE SENSITIVITY INDICES =====
ax.bar(np.arange(problem['num_vars']), si1,
color='DarkSeaGreen');
# ===== LINES REPRESENTING THE BOOTSTRAP "CONFIDENCE INTERVALS" =====
for j in range(problem['num_vars']):
ax.plot([j, j], [bootstrap_conf[0, j], bootstrap_conf[1, j]],
'k')
# ax.fill_between(x=[-0.5, problem['num_vars'] - 0.5], y1=-0.1, y2=0.1, color='grey', alpha=0.2, hatch='//', edgecolor='white');
Comment: indices below 0.1 are not significant, also because the bootstrap shows a standard deviation of about ±0.1.
S. Tarantola, D. Gatelli, T. Mara (2006) “Random Balance Designs for the Estimation of First Order Global Sensitivity Indices”, Reliability Engineering & System Safety, 91:6, 717-727.
Elmar Plischke (2010) “An effective algorithm for computing global sensitivity indices (EASI)”, Reliability Engineering & System Safety, 95:4, 354-360. doi:10.1016/j.ress.2009.11.005
Jean-Yves Tissot, Clémentine Prieur (2012) “Bias correction for the estimation of sensitivity indices based on random balance designs”, Reliability Engineering & System Safety, 107, 205-213. doi:10.1016/j.ress.2012.06.010
Jeanne Goffart, Mickael Rabouille, Nathan Mendes (2015) “Uncertainty and sensitivity analysis applied to hygrothermal simulation of a brick building in a hot and humid climate”, Journal of Building Performance Simulation. doi:10.1080/19401493.2015.1112430
Jeanne Goffart, Monika Woloszyn (2018) “RBD-FAST : une méthode d’analyse de sensibilité rapide et rigoureuse pour la thermique du bâtiment” (RBD-FAST: a fast and rigorous sensitivity analysis method for building thermal simulation), IBPSA France, Bordeaux.
... Download this notebook and try the analysis for yourself with 250, 500, or even 1000 samples (a sketch of that loop is given below).
Then answer these questions:
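A minimal sketch of the exercise, re-drawing and re-analyzing for each suggested sample size (using the sample, run_model and rbd_fast names defined above):
In [ ]:
# re-run the whole chain for several sample sizes and compare the indices
for n in (250, 500, 1000):
    samples_n = sample(problem, n)
    y_n = run_model(*samples_n.T)
    print(n, rbd_fast.analyze(problem=problem, Y=y_n, X=samples_n)['S1'])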