Confidence Intervals - Benchmarking cib module against R

The confidence intervals of a distribution parameters is implemented using a bootstrap technique in the confidence_intervals_bootstrap module, or cib.

The bootstrap technique according to [1] is implemented in the cib module using numpy and scipy.

Here this implementation is benchmarked against the equivalent R module fitdistrplus [2].

This is a python notebook using R code from the rpy2 module [3].

[1] Bootstrap confidence intervals

https://ocw.mit.edu/courses/mathematics/18-05-introduction-to-probability-and-statistics-spring-2014/readings/MIT18_05S14_Reading24.pdf

[2] R package fitdistrplus

https://cran.r-project.org/web/packages/fitdistrplus/fitdistrplus.pdf

[3] Using R inside Python

http://rpy2.readthedocs.io/en/version_2.8.x/introduction.html


Visund Nord - ITS Deployment

This notebook checks the critical seastates found for the ITS deployment for the Visund Nord project, August 2017.

Document 072110C001 AB.00.00.300-RT-5804-0001 - General Deployment Analyses, Campaign 1, rev. 2

I:\DCC_STA\072110C001\TNO\Docs_PDF\AB.00.00.300-RT-5804-0001_221-2A-FM-O16-00010-.pdf


@author: rarossi

Created on Mon Oct 9 13:14:17 2017


In [23]:
import sys
import time
from scipy import stats as ss
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import itertools

# Import my modules
import quantilelib as ql
sys.path.append(r'C:\Users\rarossi\Documents\git\yoshimi\Qt\ResultsVisualiser')
import confidence_interval_bootstrap as cib

# Import of R objects
from rpy2 import robjects
from rpy2.robjects.packages import importr
rfitdistrplus = importr('fitdistrplus')
rstats = importr('stats')
rbase = importr('base')

In [24]:
# R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R
# Define the Gumbel distribution in R
robjects.r('''
        # create functions for `gumbel`
        dgumbel <- function(x, a, b) 1/b*exp((a-x)/b)*exp(-exp((a-x)/b))
        pgumbel <- function(q, a, b) exp(-exp((a-q)/b))
        qgumbel <- function(p, a, b) a-b*log(-log(p))
        dgumbelmin <- function(x, a, b) 1/b*exp((x-a)/b)*exp(-exp((x-a)/b))
        pgumbelmin <- function(q, a, b) 1-exp(-exp((q-a)/b))
        qgumbelmin <- function(p, a, b) a+b*log(-log(1-p))
        ''')
#  R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R


Out[24]:
R object with classes: ('function',) mapped to:
<SignatureTranslatedFunction - Python:0x0000000014F15B48 / R:0x00000000168C5358>

In [25]:
# Load a Results.txt to use as test sample
# resfile = r'C:\Users\rarossi\Documents\git\yoshimi\Qt\ResultsVisualiser\Results.txt'
resfile = r'T:\Temp\AOA\ITS IDC\ITS splash zone - 18m radius\Results.txt'
df = pd.read_table(resfile)

variables = df.columns[3:5]
hss = set(df.WaveHs)
tps = set(df.WaveTp)
wds = set(df.WaveDirection)

# critical sea states for the ITS deployment
seastates = {'Winch1 Max Tension': [(2.0, 8, 195), (2.0, 9, 195), (2.2, 6, 195), (2.2, 7 , 165), (2.2, 7 , 180), 
                                    (2.2, 7 , 195), (2.2, 8 , 165), (2.2, 8 , 180), (2.2, 9 , 165), (2.2, 9 , 180), 
                                    (2.2, 10, 195), (2.4, 6 , 165), (2.4, 10, 165), (2.4, 10, 180), (2.4, 11, 180),
                                    (2.4, 11, 195), (2.4, 12, 195), (2.4, 13, 195), (2.6, 11, 165), (2.6, 14, 195)],
             'Sling3 Min Tension': [(2.4, 10, 165), (2.6, 7, 195), (2.6, 8, 195), (2.6, 10, 165), (2.6, 11, 165)]
            }

In [32]:
def params_fit_cib(sample, tail):
    """Returns the distribution parameters, and the confidence interval for these parameters."""
    stats_fun = ss.gumbel_r.fit if tail == 'upper' else ss.gumbel_l.fit
    pfit = stats_fun(sample)
    pci = cib.confidence_interval(sample, stats_fun, relative=False, repeat=200)
    return pfit, pci

In [33]:
# R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R
#  R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R
def params_fit_R(sample, tail):
    """Returns the distribution parameters, and the confidence intervals for these parameters."""
    # Convert sample into a R FloatVector
    rsample = robjects.FloatVector(sample)
    # use method of moments for first estimate of parameters
    b = np.std(sample)*np.sqrt(6)/np.pi
    a = np.mean(sample) - b * np.euler_gamma
    # Use R fitdist to fit a Gumbel distribution to sample
    Rfun = "gumbel" if tail == 'upper' else "gumbelmin"
    rfit = rfitdistrplus.fitdist(rsample, Rfun, start=rbase.list(a=a, b=b))
    # Use R confint to calculate the confidence intervals for the fitted
    # distro parameters
    rci = np.array(rstats.confint(rfit)).transpose()
    rfit = tuple(rfit[0])

    # ATTENTION HERE
    # The order of the returned rfit[0] seems to vary! 
    # Sometimes (loc, scale) is returned, and sometime (scale, loc). 
    # Need to check every time! And un/comment the lines below when needed.
    #rfit = tuple(np.flipud(rfit))
    #rci = np.fliplr(rci)
    
    return rfit, rci

#  R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R
# R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R

In [34]:
def get_y(params, x, tail):
    """distribution params, variate -> -log(-log(cdf))
    
    If tail is 'lower', sf = 1 - cdf, is used instead of cdf. 
    """
    if tail == 'upper':
        return -np.log(-ss.gumbel_r(*params).logcdf(x))
    else:
        return -np.log(-ss.gumbel_l(*params).logsf(x))

In [35]:
count = 0
for var, seasts in seastates.items():
    for hs, tp, wd in seasts:
        count += 1

        sample = df[(df.WaveHs == hs) & (df.WaveTp == tp) & (df.WaveDirection == wd)][var]

        if sample.empty:
            continue

        print('#'*80)
        print('{} - Hs {} - Tp {} - wd {}'.format(var, hs, tp, wd))
     
        tail = 'upper' if 'Max' in var else 'lower'
        sample = np.array(sample)
       
        # Now move into R and fit statistics on sample, and do the same in cib
        rfit, rci = params_fit_R(sample, tail)
        pfit, pci = params_fit_cib(sample, tail)

        # ###
        # Compare what has been done so far
        print('Pack \t Loc    \t Scale')
        print('R \t %.3f \t %.3f' % (rfit))
        print('Scipy \t %.3f \t %.3f' % (pfit))
        print('Confidence Interval from R')
        print(' 2.5%%\t %.4f\t%.4f' % tuple(rci[0]))
        print('97.5%%\t %.4f\t%.4f' % tuple(rci[1]))
        print('Confidence Interval from cib')
        print(' 2.5%%\t %.4f\t%.4f' % tuple(pci[0]))
        print('97.5%%\t %.4f\t%.4f' % tuple(pci[1]))
        print('Relative % differences')
        print(str((100 - 100*pci/rci).round(2)))

        # Show graphically the differences

        # plot sample
        sample.sort()
        y = ql.llquantiles(len(sample)) if tail == 'upper' else ql.llsurvivals(len(sample))
        plt.plot(sample, y, 'o', label='sample')

        # plot best fit and confidence intervals from cib
        x = sample.min(), sample.max()
        y = get_y(pfit, x, tail).transpose()
        plt.plot(x, y, '--b', label='cib')
        y = np.array([get_y(p, x, tail) for p in pci]).transpose()
        plt.plot(x, y, '--b')

        # plot confidence intervals from R
        y = get_y(rfit, x, tail).transpose()
        plt.plot(x, y, 'r', label='R')
        y = np.array([get_y(p, x, tail) for p in rci]).transpose()
        plt.plot(x, y, 'r')
        plt.title('{} - Hs {} - Tp {} - wd {}'.format(var, hs, tp, wd))
        plt.legend(loc='best')
        plt.grid()
        #plt.savefig(r'R_in_py\{}-Hs{}-Tp{}-wd{}.png'.format(var, hs, tp, wd))
        plt.show()
        plt.close()


################################################################################
Sling3 Min Tension - Hs 2.4 - Tp 10 - wd 165
Pack 	 Loc    	 Scale
R 	 459.944 	 54.881
Scipy 	 459.945 	 54.872
Confidence Interval from R
 2.5%	 443.8618	43.0533
97.5%	 476.0261	66.7078
Confidence Interval from cib
 2.5%	 445.7672	45.8473
97.5%	 473.3994	64.1390
Relative % differences
[[-0.43 -6.49]
 [ 0.55  3.85]]
################################################################################
Sling3 Min Tension - Hs 2.6 - Tp 7 - wd 195
Pack 	 Loc    	 Scale
R 	 441.934 	 58.954
Scipy 	 441.945 	 58.951
Confidence Interval from R
 2.5%	 424.7347	46.1359
97.5%	 459.1327	71.7712
Confidence Interval from cib
 2.5%	 423.7974	47.6854
97.5%	 457.4009	70.5575
Relative % differences
[[ 0.22 -3.36]
 [ 0.38  1.69]]
################################################################################
Sling3 Min Tension - Hs 2.6 - Tp 8 - wd 195
Pack 	 Loc    	 Scale
R 	 417.734 	 46.794
Scipy 	 417.719 	 46.801
Confidence Interval from R
 2.5%	 403.9559	37.5813
97.5%	 431.5114	56.0073
Confidence Interval from cib
 2.5%	 403.9022	36.3446
97.5%	 432.5902	55.0532
Relative % differences
[[ 0.01  3.29]
 [-0.25  1.7 ]]
################################################################################
Sling3 Min Tension - Hs 2.6 - Tp 10 - wd 165
Pack 	 Loc    	 Scale
R 	 443.477 	 59.532
Scipy 	 443.487 	 59.540
Confidence Interval from R
 2.5%	 426.0232	46.7745
97.5%	 460.9304	72.2890
Confidence Interval from cib
 2.5%	 427.3847	49.4972
97.5%	 457.6083	69.2941
Relative % differences
[[-0.32 -5.82]
 [ 0.72  4.14]]
################################################################################
Sling3 Min Tension - Hs 2.6 - Tp 11 - wd 165
Pack 	 Loc    	 Scale
R 	 445.356 	 53.518
Scipy 	 445.362 	 53.521
Confidence Interval from R
 2.5%	 429.7454	41.6894
97.5%	 460.9667	65.3474
Confidence Interval from cib
 2.5%	 430.1873	42.5549
97.5%	 460.2934	64.0290
Relative % differences
[[-0.1  -2.08]
 [ 0.15  2.02]]
################################################################################
Winch1 Max Tension - Hs 2.0 - Tp 8 - wd 195
Pack 	 Loc    	 Scale
R 	 3596.882 	 157.402
Scipy 	 3596.849 	 157.383
Confidence Interval from R
 2.5%	 3551.1906	121.7294
97.5%	 3642.5729	193.0753
Confidence Interval from cib
 2.5%	 3556.1615	117.8229
97.5%	 3649.2630	194.8518
Relative % differences
[[-0.14  3.21]
 [-0.18 -0.92]]
################################################################################
Winch1 Max Tension - Hs 2.0 - Tp 9 - wd 195
Pack 	 Loc    	 Scale
R 	 3587.558 	 144.685
Scipy 	 3587.579 	 144.704
Confidence Interval from R
 2.5%	 3545.3623	112.7553
97.5%	 3629.7532	176.6150
Confidence Interval from cib
 2.5%	 3549.4508	114.5178
97.5%	 3632.9288	169.0928
Relative % differences
[[-0.12 -1.56]
 [-0.09  4.26]]
################################################################################
Winch1 Max Tension - Hs 2.2 - Tp 6 - wd 195
Pack 	 Loc    	 Scale
R 	 3533.388 	 169.911
Scipy 	 3533.405 	 169.949
Confidence Interval from R
 2.5%	 3484.2256	130.4887
97.5%	 3582.5507	209.3324
Confidence Interval from cib
 2.5%	 3488.9814	127.6567
97.5%	 3589.8423	213.7499
Relative % differences
[[-0.14  2.17]
 [-0.2  -2.11]]
################################################################################
Winch1 Max Tension - Hs 2.2 - Tp 7 - wd 165
Pack 	 Loc    	 Scale
R 	 3531.949 	 159.314
Scipy 	 3531.914 	 159.290
Confidence Interval from R
 2.5%	 3485.3800	125.3504
97.5%	 3578.5190	193.2775
Confidence Interval from cib
 2.5%	 3488.8425	130.0725
97.5%	 3587.1192	193.6342
Relative % differences
[[-0.1  -3.77]
 [-0.24 -0.18]]
################################################################################
Winch1 Max Tension - Hs 2.2 - Tp 7 - wd 180
Pack 	 Loc    	 Scale
R 	 3530.215 	 145.019
Scipy 	 3530.240 	 145.034
Confidence Interval from R
 2.5%	 3487.9750	112.7017
97.5%	 3572.4549	177.3361
Confidence Interval from cib
 2.5%	 3492.2389	117.7450
97.5%	 3580.8964	180.6314
Relative % differences
[[-0.12 -4.47]
 [-0.24 -1.86]]
################################################################################
Winch1 Max Tension - Hs 2.2 - Tp 7 - wd 195
Pack 	 Loc    	 Scale
R 	 3587.724 	 140.509
Scipy 	 3587.751 	 140.579
Confidence Interval from R
 2.5%	 3547.0156	107.9191
97.5%	 3628.4321	173.0992
Confidence Interval from cib
 2.5%	 3548.5849	99.3673
97.5%	 3628.5902	175.8979
Relative % differences
[[-0.04  7.92]
 [-0.   -1.62]]
################################################################################
Winch1 Max Tension - Hs 2.2 - Tp 8 - wd 165
Pack 	 Loc    	 Scale
R 	 3547.748 	 146.393
Scipy 	 3547.727 	 146.368
Confidence Interval from R
 2.5%	 3504.9165	114.7114
97.5%	 3590.5800	178.0740
Confidence Interval from cib
 2.5%	 3506.8368	111.3673
97.5%	 3588.6183	173.4800
Relative % differences
[[-0.05  2.92]
 [ 0.05  2.58]]
################################################################################
Winch1 Max Tension - Hs 2.2 - Tp 8 - wd 180
Pack 	 Loc    	 Scale
R 	 3578.394 	 153.255
Scipy 	 3578.354 	 153.257
Confidence Interval from R
 2.5%	 3533.8142	119.3947
97.5%	 3622.9748	187.1146
Confidence Interval from cib
 2.5%	 3536.9345	117.2819
97.5%	 3628.7641	194.4620
Relative % differences
[[-0.09  1.77]
 [-0.16 -3.93]]
################################################################################
Winch1 Max Tension - Hs 2.2 - Tp 9 - wd 165
Pack 	 Loc    	 Scale
R 	 3518.883 	 145.713
Scipy 	 3518.892 	 145.746
Confidence Interval from R
 2.5%	 3476.5077	112.5899
97.5%	 3561.2591	178.8358
Confidence Interval from cib
 2.5%	 3476.0795	110.1469
97.5%	 3564.1940	179.6463
Relative % differences
[[ 0.01  2.17]
 [-0.08 -0.45]]
################################################################################
Winch1 Max Tension - Hs 2.2 - Tp 9 - wd 180
Pack 	 Loc    	 Scale
R 	 3533.562 	 149.360
Scipy 	 3533.538 	 149.342
Confidence Interval from R
 2.5%	 3490.1767	115.8626
97.5%	 3576.9480	182.8575
Confidence Interval from cib
 2.5%	 3494.5285	115.5863
97.5%	 3578.0864	191.3323
Relative % differences
[[-0.12  0.24]
 [-0.03 -4.63]]
################################################################################
Winch1 Max Tension - Hs 2.2 - Tp 10 - wd 195
Pack 	 Loc    	 Scale
R 	 3614.248 	 127.988
Scipy 	 3614.260 	 127.971
Confidence Interval from R
 2.5%	 3577.0311	99.6450
97.5%	 3651.4647	156.3316
Confidence Interval from cib
 2.5%	 3579.8204	99.3085
97.5%	 3660.1459	156.2544
Relative % differences
[[-0.08  0.34]
 [-0.24  0.05]]
################################################################################
Winch1 Max Tension - Hs 2.4 - Tp 6 - wd 165
Pack 	 Loc    	 Scale
R 	 3563.791 	 160.801
Scipy 	 3563.793 	 160.817
Confidence Interval from R
 2.5%	 3516.9292	124.9744
97.5%	 3610.6536	196.6272
Confidence Interval from cib
 2.5%	 3517.7026	125.4011
97.5%	 3614.9309	192.4268
Relative % differences
[[-0.02 -0.34]
 [-0.12  2.14]]
################################################################################
Winch1 Max Tension - Hs 2.4 - Tp 10 - wd 165
Pack 	 Loc    	 Scale
R 	 3523.024 	 170.859
Scipy 	 3523.036 	 170.877
Confidence Interval from R
 2.5%	 3473.2592	134.7280
97.5%	 3572.7895	206.9894
Confidence Interval from cib
 2.5%	 3471.2417	125.8375
97.5%	 3582.6458	201.1561
Relative % differences
[[ 0.06  6.6 ]
 [-0.28  2.82]]
################################################################################
Winch1 Max Tension - Hs 2.4 - Tp 10 - wd 180
Pack 	 Loc    	 Scale
R 	 3546.345 	 158.494
Scipy 	 3546.389 	 158.535
Confidence Interval from R
 2.5%	 3500.1052	123.9268
97.5%	 3592.5840	193.0616
Confidence Interval from cib
 2.5%	 3505.6898	127.7515
97.5%	 3593.0519	182.7352
Relative % differences
[[-0.16 -3.09]
 [-0.01  5.35]]
################################################################################
Winch1 Max Tension - Hs 2.4 - Tp 11 - wd 180
Pack 	 Loc    	 Scale
R 	 3538.439 	 159.920
Scipy 	 3538.439 	 159.901
Confidence Interval from R
 2.5%	 3491.8122	125.6018
97.5%	 3585.0655	194.2390
Confidence Interval from cib
 2.5%	 3499.5439	123.5357
97.5%	 3585.7708	189.4679
Relative % differences
[[-0.22  1.64]
 [-0.02  2.46]]
################################################################################
Winch1 Max Tension - Hs 2.4 - Tp 11 - wd 195
Pack 	 Loc    	 Scale
R 	 3662.631 	 136.856
Scipy 	 3662.680 	 136.872
Confidence Interval from R
 2.5%	 3622.5572	107.9195
97.5%	 3702.7057	165.7920
Confidence Interval from cib
 2.5%	 3623.6888	106.9821
97.5%	 3704.5048	159.3834
Relative % differences
[[-0.03  0.87]
 [-0.05  3.87]]
################################################################################
Winch1 Max Tension - Hs 2.4 - Tp 12 - wd 195
Pack 	 Loc    	 Scale
R 	 3603.823 	 153.494
Scipy 	 3603.818 	 153.517
Confidence Interval from R
 2.5%	 3558.7834	120.9190
97.5%	 3648.8617	186.0699
Confidence Interval from cib
 2.5%	 3564.6787	127.6683
97.5%	 3652.9860	173.1780
Relative % differences
[[-0.17 -5.58]
 [-0.11  6.93]]
################################################################################
Winch1 Max Tension - Hs 2.4 - Tp 13 - wd 195
Pack 	 Loc    	 Scale
R 	 3560.258 	 147.972
Scipy 	 3560.237 	 147.974
Confidence Interval from R
 2.5%	 3516.7640	117.3795
97.5%	 3603.7517	178.5644
Confidence Interval from cib
 2.5%	 3518.7011	124.1253
97.5%	 3600.9317	168.5902
Relative % differences
[[-0.06 -5.75]
 [ 0.08  5.59]]
################################################################################
Winch1 Max Tension - Hs 2.6 - Tp 11 - wd 165
Pack 	 Loc    	 Scale
R 	 3539.219 	 142.188
Scipy 	 3539.256 	 142.214
Confidence Interval from R
 2.5%	 3497.9445	111.4574
97.5%	 3580.4929	172.9179
Confidence Interval from cib
 2.5%	 3499.5282	102.4756
97.5%	 3581.0155	175.4456
Relative % differences
[[-0.05  8.06]
 [-0.01 -1.46]]
################################################################################
Winch1 Max Tension - Hs 2.6 - Tp 14 - wd 195
Pack 	 Loc    	 Scale
R 	 3576.854 	 143.834
Scipy 	 3576.802 	 143.795
Confidence Interval from R
 2.5%	 3534.7254	113.5872
97.5%	 3618.9829	174.0810
Confidence Interval from cib
 2.5%	 3531.8704	108.6292
97.5%	 3621.7511	170.7815
Relative % differences
[[ 0.08  4.36]
 [-0.08  1.9 ]]

In [38]:
print(time.asctime())


Tue Oct 10 10:15:37 2017

In [ ]: