In [1]:
import statsmodels.stats.power as pow
import pandas as pan
import math as math
First import the csv file with all the estimated Cohen's d
In [2]:
data = pan.read_csv('/home/pbellec/Desktop/adnet/adnet_dcohen_effect_size.csv')
data
Out[2]:
Seed
Connection
pooled
ADNI2
CRIUGMa
CRIUGMb
MNI
0
22
7
0.4145
0.3618
1.4910
0.3228
0.1523
1
22
9
0.3079
0.2680
0.7692
0.3094
0.1734
2
22
11
0.3035
0.2185
0.9931
0.0604
0.5118
3
22
17
0.3216
0.1853
0.8669
0.3478
0.4969
4
22
20
0.2793
0.2109
0.6371
0.0871
0.5520
5
22
23
0.2885
0.2469
1.6770
0.0480
0.0319
6
22
28
0.3064
0.2296
1.2467
0.3550
0.0208
7
22
30
0.2803
0.1700
1.0418
0.0726
0.5100
8
22
31
0.3430
0.2294
1.1769
0.0967
0.6050
9
2
1
0.4501
0.3020
0.6207
0.5683
0.8152
10
2
4
0.2915
0.2403
0.3460
0.1644
0.6182
11
2
7
0.3404
0.1046
1.4018
0.3998
0.6383
12
2
14
0.3934
0.2541
0.5958
0.6490
0.5318
13
2
17
0.4265
0.3060
1.0641
0.6963
0.2243
14
2
20
0.3596
0.3849
0.1638
0.3841
0.3640
15
2
31
0.3928
0.1799
0.9057
0.8256
0.4560
16
2
33
0.3632
0.2327
0.3245
0.4497
0.8414
17
9
5
0.4210
0.5660
0.1482
0.3707
0.1097
18
9
10
0.4475
0.4108
0.5355
0.6368
0.3224
19
9
16
0.3105
0.2297
1.0588
0.3139
0.1806
20
9
17
0.2894
0.4264
0.6001
0.0133
-0.1172
21
9
22
0.3079
0.2680
0.7692
0.3094
0.1734
22
9
25
0.3546
0.2962
1.2266
0.3544
0.0872
23
9
26
0.3305
0.2806
0.6138
0.5236
0.1275
24
12
8
0.3674
0.4551
0.4566
0.1929
0.1738
25
12
10
0.3842
0.4355
0.0969
0.2626
0.5220
26
12
16
0.3735
0.3865
0.9229
-0.0064
0.4317
27
12
20
0.3471
0.2720
0.6641
0.3732
0.4104
28
12
23
0.3192
0.2561
0.9392
0.3202
0.1893
29
12
28
0.2851
0.1613
0.7195
0.4142
0.3524
Now use statsmodel to convert the $d$ estimates into required sample size, for group comparisons with balanced group composition, a sensitivity of $80\%$ and a false-positive rate $\alpha$ of $5\%$.
In [6]:
help(pow.TTestIndPower)
Help on class TTestIndPower in module statsmodels.stats.power:
class TTestIndPower(Power)
| Statistical Power calculations for t-test for two independent sample
|
| currently only uses pooled variance
|
| Method resolution order:
| TTestIndPower
| Power
| __builtin__.object
|
| Methods defined here:
|
| power(self, effect_size, nobs1, alpha, ratio=1, df=None, alternative='two-sided')
| Calculate the power of a t-test for two independent sample
|
| Parameters
| ----------
| effect_size : float
| standardized effect size, difference between the two means divided
| by the standard deviation. `effect_size` has to be positive.
| nobs1 : int or float
| number of observations of sample 1. The number of observations of
| sample two is ratio times the size of sample 1,
| i.e. ``nobs2 = nobs1 * ratio``
| alpha : float in interval (0,1)
| significance level, e.g. 0.05, is the probability of a type I
| error, that is wrong rejections if the Null Hypothesis is true.
| ratio : float
| ratio of the number of observations in sample 2 relative to
| sample 1. see description of nobs1
| The default for ratio is 1; to solve for ratio given the other
| arguments, it has to be explicitly set to None.
| df : int or float
| degrees of freedom. By default this is None, and the df from the
| ttest with pooled variance is used, ``df = (nobs1 - 1 + nobs2 - 1)``
| alternative : string, 'two-sided' (default), 'larger', 'smaller'
| extra argument to choose whether the power is calculated for a
| two-sided (default) or one sided test. The one-sided test can be
| either 'larger', 'smaller'.
|
| Returns
| -------
| power : float
| Power of the test, e.g. 0.8, is one minus the probability of a
| type II error. Power is the probability that the test correctly
| rejects the Null Hypothesis if the Alternative Hypothesis is true.
|
| solve_power(self, effect_size=None, nobs1=None, alpha=None, power=None, ratio=1.0, alternative='two-sided')
| solve for any one parameter of the power of a two sample t-test
|
| for t-test the keywords are:
| effect_size, nobs1, alpha, power, ratio
|
| exactly one needs to be ``None``, all others need numeric values
|
| Parameters
| ----------
| effect_size : float
| standardized effect size, difference between the two means divided
| by the standard deviation. `effect_size` has to be positive.
| nobs1 : int or float
| number of observations of sample 1. The number of observations of
| sample two is ratio times the size of sample 1,
| i.e. ``nobs2 = nobs1 * ratio``
| alpha : float in interval (0,1)
| significance level, e.g. 0.05, is the probability of a type I
| error, that is wrong rejections if the Null Hypothesis is true.
| power : float in interval (0,1)
| power of the test, e.g. 0.8, is one minus the probability of a
| type II error. Power is the probability that the test correctly
| rejects the Null Hypothesis if the Alternative Hypothesis is true.
| ratio : float
| ratio of the number of observations in sample 2 relative to
| sample 1. see description of nobs1
| The default for ratio is 1; to solve for ratio given the other
| arguments it has to be explicitly set to None.
| alternative : string, 'two-sided' (default), 'larger', 'smaller'
| extra argument to choose whether the power is calculated for a
| two-sided (default) or one sided test. The one-sided test can be
| either 'larger', 'smaller'.
|
| Returns
| -------
| value : float
| The value of the parameter that was set to None in the call. The
| value solves the power equation given the remaining parameters.
|
|
| Notes
| -----
| The function uses scipy.optimize for finding the value that satisfies
| the power equation. It first uses ``brentq`` with a prior search for
| bounds. If this fails to find a root, ``fsolve`` is used. If ``fsolve``
| also fails, then, for ``alpha``, ``power`` and ``effect_size``,
| ``brentq`` with fixed bounds is used. However, there can still be cases
| where this fails.
|
| ----------------------------------------------------------------------
| Methods inherited from Power:
|
| __init__(self, **kwds)
|
| plot_power(self, dep_var='nobs', nobs=None, effect_size=None, alpha=0.05, ax=None, title=None, plt_kwds=None, **kwds)
| plot power with number of observations or effect size on x-axis
|
| Parameters
| ----------
| dep_var : string in ['nobs', 'effect_size', 'alpha']
| This specifies which variable is used for the horizontal axis.
| If dep_var='nobs' (default), then one curve is created for each
| value of ``effect_size``. If dep_var='effect_size' or alpha, then
| one curve is created for each value of ``nobs``.
| nobs : scalar or array_like
| specifies the values of the number of observations in the plot
| effect_size : scalar or array_like
| specifies the values of the effect_size in the plot
| alpha : float or array_like
| The significance level (type I error) used in the power
| calculation. Can only be more than a scalar, if ``dep_var='alpha'``
| ax : None or axis instance
| If ax is None, than a matplotlib figure is created. If ax is a
| matplotlib axis instance, then it is reused, and the plot elements
| are created with it.
| title : string
| title for the axis. Use an empty string, ``''``, to avoid a title.
| plt_kwds : None or dict
| not used yet
| kwds : optional keywords for power function
| These remaining keyword arguments are used as arguments to the
| power function. Many power function support ``alternative`` as a
| keyword argument, two-sample test support ``ratio``.
|
| Returns
| -------
| fig : matplotlib figure instance
|
| Notes
| -----
| This works only for classes where the ``power`` method has
| ``effect_size``, ``nobs`` and ``alpha`` as the first three arguments.
| If the second argument is ``nobs1``, then the number of observations
| in the plot are those for the first sample.
| TODO: fix this for FTestPower and GofChisquarePower
|
| TODO: maybe add line variable, if we want more than nobs and effectsize
|
| ----------------------------------------------------------------------
| Data descriptors inherited from Power:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
In [9]:
list_site = [ 'pooled' , 'ADNI2' , 'CRIUGMa' , 'CRIUGMb' , 'MNI' ]
ssize = data.copy()
for site in list_site:
for rr in range(0,30):
ssize.loc[rr,site] = 2*math.ceil(pow.TTestIndPower().solve_power(effect_size=math.fabs(data.loc[rr,site]), power=0.8, ratio=1, alpha=0.05))
ssize
Out[9]:
Seed
Connection
pooled
ADNI2
CRIUGMa
CRIUGMb
MNI
0
22
7
186
242
18
304
1356
1
22
9
334
440
56
330
1048
2
22
11
344
660
34
8608
122
3
22
17
306
918
44
262
130
4
22
20
406
708
80
4142
106
5
22
23
380
518
14
13630
30856
6
22
28
338
598
24
252
72570
7
22
30
402
1090
32
5960
124
8
22
31
270
600
26
3360
88
9
2
1
158
348
84
100
50
10
2
4
372
546
266
1164
86
11
2
7
274
2872
20
200
80
12
2
14
206
490
92
78
114
13
2
17
176
338
30
68
626
14
2
20
246
214
1174
216
240
15
2
31
206
972
42
50
154
16
2
33
240
582
302
158
48
17
9
5
180
100
1432
232
2612
18
9
10
160
188
112
80
304
19
9
16
328
598
32
322
966
20
9
17
378
176
90
177488
2288
21
9
22
334
440
56
330
1048
22
9
25
252
360
24
252
4132
23
9
26
290
402
86
118
1934
24
12
8
236
154
154
846
1042
25
12
10
216
168
3346
458
118
26
12
16
228
214
40
766494
172
27
12
20
264
428
74
228
190
28
12
23
312
482
38
310
880
29
12
28
390
1210
64
186
256
Now simply write the results into a csv
In [10]:
ssize.to_csv('/home/pbellec/Desktop/adnet/adnet_sample_size.csv')
Content source: SIMEXP/Projects
Similar notebooks: