Load Libraries



In [1]:

    
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline

import seaborn as sns
import bootstrap_contrast as bsc

import pandas as pd
import numpy as np
import scipy as sp

Create dummy dataset

Here, we create a dummy dataset to illustrate how bootstrap-contrast functions. In this dataset, each column corresponds to a group of observations, and each row is simply an index number referring to an observation. (This is known as a 'wide' dataset.)



In [2]:

    
dataset=list()
for seed in [10,11,12,13,14,15]:
    np.random.seed(seed) # fix the seed so we get the same numbers each time.
    dataset.append(np.random.randn(40))
df=pd.DataFrame(dataset).T
cols=['Control','Group1','Group2','Group3','Group4','Group5']
df.columns=cols
# Create some upwards/downwards shifts.
df['Group2']=df['Group2']-0.1
df['Group3']=df['Group3']+0.2
df['Group4']=(df['Group4']*1.1)+4
df['Group5']=(df['Group5']*1.1)-1
# Add gender column.
df['Gender']=np.concatenate([np.repeat('Male',20),np.repeat('Female',20)])

Note that we have 6 groups of observations, with an additional non-numerical column indicating gender.
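
As a quick sanity check of the wide layout, the sketch below rebuilds the same dataset and summarises each group. The names `wide` and `summary` are introduced here for illustration only and are not part of bootstrap-contrast.

```python
import numpy as np
import pandas as pd

# Rebuild the dummy dataset with the same seeds and shifts as above.
cols = ['Control', 'Group1', 'Group2', 'Group3', 'Group4', 'Group5']
data = {}
for name, seed in zip(cols, [10, 11, 12, 13, 14, 15]):
    np.random.seed(seed)
    data[name] = np.random.randn(40)
wide = pd.DataFrame(data)
wide['Group2'] = wide['Group2'] - 0.1
wide['Group3'] = wide['Group3'] + 0.2
wide['Group4'] = (wide['Group4'] * 1.1) + 4
wide['Group5'] = (wide['Group5'] * 1.1) - 1
wide['Gender'] = np.repeat(['Male', 'Female'], 20)

# One row per observation, one column per group, plus the Gender column.
print(wide.shape)

# Per-group count, mean and standard deviation (numerical columns only).
summary = wide[cols].agg(['count', 'mean', 'std']).T
print(summary.round(3))
```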

The `bootstrap` class

Here, we introduce a new class called bootstrap. Essentially, it will compute the summary statistic and its associated confidence interval using bootstrapping. It can do this for a single group of observations, or for two groups of observations (both paired and unpaired).
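
To make the idea concrete, here is a minimal sketch of a bootstrap confidence interval for a mean or for a difference of means, written with NumPy only. It is not the library's implementation: the function name `bootstrap_mean_ci` and its parameters are hypothetical, and it uses the simple percentile method, whereas the result keys shown later in this tutorial (`bca_ci_low`, `bca_ci_high`) suggest that the library reports bias-corrected and accelerated (BCa) intervals.

```python
import numpy as np

def bootstrap_mean_ci(x, y=None, n_boot=5000, ci=95.0, seed=0):
    """Percentile-bootstrap CI for mean(x), or for mean(y) - mean(x) if y is given.

    Hypothetical helper for illustration; not part of bootstrap-contrast.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # Each row is one resample of x, drawn with replacement.
    boot_x = rng.choice(x, size=(n_boot, x.size), replace=True).mean(axis=1)
    if y is None:
        observed, boots = x.mean(), boot_x
    else:
        y = np.asarray(y, dtype=float)
        # Unpaired case: the second group is resampled independently.
        boot_y = rng.choice(y, size=(n_boot, y.size), replace=True).mean(axis=1)
        observed, boots = y.mean() - x.mean(), boot_y - boot_x
    alpha = (100.0 - ci) / 2.0
    low, high = np.percentile(boots, [alpha, 100.0 - alpha])
    return {'stat_summary': observed, 'ci_low': low, 'ci_high': high, 'ci': ci}

# Same seeds as the 'Control' and 'Group1' columns of the dummy dataset.
np.random.seed(10)
control = np.random.randn(40)
np.random.seed(11)
group1 = np.random.randn(40)

result = bootstrap_mean_ci(control, group1)
print(result)
```

The interval limits from this sketch will not match the library's output exactly, because both the resampling draws and the interval method differ.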

Below, I obtain the bootstrapped contrast for 'Control' and 'Group1' in df.



In [3]:

    
contr = bsc.bootstrap(df['Control'],df['Group1'])

As mentioned above, contr is a bootstrap object. Evaluating it on its own displays only the default object representation, not the results.



In [4]:

    
contr









    Out[4]:





<bootstrap_contrast.bootstrap_tools.bootstrap at 0x1064726d8>

It has several attributes. Of interest is its results attribute, which holds a dictionary summarising the results of the contrast computation.



In [5]:

    
contr.results









    Out[5]:





{'bca_ci_high': 0.21530883448325841,
 'bca_ci_low': -0.59272396423959439,
 'ci': 95.0,
 'is_difference': True,
 'is_paired': False,
 'stat_summary': -0.1808044652703821}

is_paired indicates whether the two arrays are treated as paired (or repeated) observations. This is controlled by the paired flag.



In [6]:

    
contr_paired = bsc.bootstrap(df['Control'],df['Group1'],
                             paired=True)
contr_paired.results









    Out[6]:





{'bca_ci_high': 0.23101685944171341,
 'bca_ci_low': -0.57186646678627306,
 'ci': 95.0,
 'is_difference': True,
 'is_paired': True,
 'stat_summary': -0.18080446527038205}
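
In the dummy dataset the columns are generated independently, so the paired and unpaired intervals above are similar. The sketch below uses made-up, strongly correlated 'before' and 'after' measurements to show why pairing can matter: resampling whole pairs (that is, the within-pair differences) typically gives a much narrower interval than resampling each group independently. All names here are hypothetical, and the percentile method is used for simplicity rather than BCa.

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up repeated measurements: 'after' tracks 'before' closely.
before = rng.normal(0.0, 1.0, size=40)
after = before + 0.3 + rng.normal(0.0, 0.2, size=40)

n_boot = 5000
n = before.size

# Unpaired: each group gets its own, independent set of resampled indices.
idx_before = rng.integers(0, n, size=(n_boot, n))
idx_after = rng.integers(0, n, size=(n_boot, n))
unpaired = after[idx_after].mean(axis=1) - before[idx_before].mean(axis=1)

# Paired: resample the within-pair differences, keeping each pair intact.
diffs = after - before
idx_pairs = rng.integers(0, n, size=(n_boot, n))
paired = diffs[idx_pairs].mean(axis=1)

def ci_width(boots, ci=95.0):
    """Width of the percentile interval of a bootstrap distribution."""
    alpha = (100.0 - ci) / 2.0
    low, high = np.percentile(boots, [alpha, 100.0 - alpha])
    return high - low

print('unpaired CI width:', ci_width(unpaired))
print('paired CI width:  ', ci_width(paired))
```

The point estimate is the same in both cases, since the mean of the differences equals the difference of the means; only the interval changes.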

is_difference indicates whether two arrays (True) or one array (False) were passed to bootstrap. Observe what happens if we give just one array.



In [7]:

    
just_control = bsc.bootstrap(df['Control'])
just_control.results









    Out[7]:





{'bca_ci_high': 0.45933914074754423,
 'bca_ci_low': -0.13338863345306345,
 'ci': 95.0,
 'is_difference': False,
 'is_paired': False,
 'stat_summary': 0.17175621510073041}

Here, the confidence interval is with respect to the mean of the group Control.

The bootstrap object contains several other statistics; see its documentation for details. Below, I print two p-values for contr_paired as an example.



In [8]:

    
contr_paired.pvalue_2samp_paired_ttest









    Out[8]:





0.39310007728828344



In [9]:

    
contr_paired.pvalue_wilcoxon









    Out[9]:





0.35369319267722144
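
For comparison, the corresponding tests can be run directly with SciPy. This is a sketch under an assumption: the attribute names (pvalue_2samp_paired_ttest, pvalue_wilcoxon, pvalue_2samp_ind_ttest, pvalue_mannWhitney) suggest these scipy.stats functions, but the tutorial does not confirm which calls or options the library uses, so small numerical differences are possible, especially across SciPy versions.

```python
import numpy as np
from scipy import stats

# Same seeds as the 'Control' and 'Group1' columns of the dummy dataset.
np.random.seed(10)
control = np.random.randn(40)
np.random.seed(11)
group1 = np.random.randn(40)

# Tests that treat the two arrays as paired observations.
p_paired_ttest = stats.ttest_rel(control, group1).pvalue
p_wilcoxon = stats.wilcoxon(control, group1).pvalue

# Tests that treat the two arrays as independent groups.
p_ind_ttest = stats.ttest_ind(control, group1).pvalue
p_mann_whitney = stats.mannwhitneyu(control, group1,
                                    alternative='two-sided').pvalue

print('paired t-test:     ', p_paired_ttest)
print('Wilcoxon:          ', p_wilcoxon)
print('independent t-test:', p_ind_ttest)
print('Mann-Whitney U:    ', p_mann_whitney)
```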

Producing Plots

Version 0.3 of bootstrap-contrast has an optimised version of the contrastplot command.

Floating contrast plots—Two-group unpaired

Below we produce a Gardner-Altman floating contrast plot for two groups.

The contrastplot command will return 2 objects: a matplotlib Figure and a pandas DataFrame. In the Jupyter Notebook, with %matplotlib inline, the figure should automatically appear.

bsc.bootstrap will automatically drop any NaNs in the data. Note how the Ns (appended to the group names in the xtick labels) indicate the number of datapoints being plotted and used to calculate the contrasts.

The pandas DataFrame returned by bsc.contrastplot contains the pairwise comparisons made in the course of generating the plot, with confidence intervals (95% by default) and relevant p-values.
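
The NaN handling described above can be pictured with plain pandas. This is only a sketch on made-up data: the DataFrame `toy` and the label format are hypothetical and may not match what the library draws on the x-axis.

```python
import numpy as np
import pandas as pd

# Made-up wide data with missing values in both columns.
toy = pd.DataFrame({'Control': [1.0, 2.0, np.nan, 4.0],
                    'Group1': [2.0, np.nan, np.nan, 5.0]})

counts = {}
labels = {}
clean = {}
for name in toy.columns:
    values = toy[name].dropna()  # NaNs are dropped per group
    clean[name] = values.to_numpy()
    counts[name] = len(values)
    labels[name] = '{}\nn={}'.format(name, len(values))  # hypothetical label format

# The contrast is computed only from the observations that remain.
mean_diff = clean['Group1'].mean() - clean['Control'].mean()

print(counts)
print(mean_diff)
```

Note that dropping NaNs independently per column leaves groups of different sizes, which is fine for an unpaired comparison but would break the row-by-row correspondence that a paired comparison relies on.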



In [10]:

    
f, b = bsc.contrastplot(df,
                      idx=('Control','Group1'),
                      color_col='Gender',
                      fig_size=(4,6) # The (width, height) of the figure, in inches.
                      )
b









    Out[10]:







  
    
      
	reference_group	experimental_group	stat_summary	bca_ci_low	bca_ci_high	ci	is_difference	is_paired	pvalue_2samp_ind_ttest	pvalue_mannWhitney
0	Control	Group1	-0.180804	-0.594608	0.219648	95.0	True	False	0.395987	0.363178

Floating contrast plots—Two-group paired



In [11]:

    
f, b = bsc.contrastplot(df,
                        idx=('Control','Group2'),
                        color_col='Gender',
                        paired=True,
                        fig_size=(4,6))
b









    Out[11]:







  
    
      
	reference_group	experimental_group	stat_summary	bca_ci_low	bca_ci_high	ci	is_difference	is_paired	pvalue_2samp_paired_ttest	pvalue_wilcoxon
0	Control	Group2	-0.532006	-1.009002	-0.029927	95.0	True	True	0.04253	0.038456

If you want to plot the raw swarmplot instead of the paired lines, set the show_pairs flag to False. The contrasts computed will still be paired, as indicated by the DataFrame produced.



In [12]:

    
f, b = bsc.contrastplot(df,
                        idx=('Control','Group2'),
                        color_col='Gender',
                        paired=True,
                        show_pairs=False,
                        fig_size=(4,6))
b









    Out[12]:







  
    
      
	reference_group	experimental_group	stat_summary	bca_ci_low	bca_ci_high	ci	is_difference	is_paired	pvalue_2samp_paired_ttest	pvalue_wilcoxon
0	Control	Group2	-0.532006	-1.004253	-0.035963	95.0	True	True	0.04253	0.038456

Floating contrast plots—Multi-plot design

In a multi-plot design, you can horizontally tile two or more two-group floating-contrasts. This is designed to meet data visualization and presentation paradigms that are predominant in academic biomedical research.

This is done mainly through the idx option. Pass two or more tuples, and each tuple gets a separate subplot for its contrast.

The effect sizes and confidence intervals for each two-group plot will be computed.



In [13]:

    
f, b = bsc.contrastplot(df,
                        idx=(('Control','Group1'),
                             ('Group2','Group3')),
                        paired=True,
                        show_means='lines',
                        color_col='Gender')
b









    Out[13]:







  
    
      
	reference_group	experimental_group	stat_summary	bca_ci_low	bca_ci_high	ci	is_difference	is_paired	pvalue_2samp_paired_ttest	pvalue_wilcoxon
0	Control	Group1	-0.180804	-0.582288	0.217477	95.0	True	True	0.393100	0.353693
1	Group2	Group3	0.700802	0.235262	1.153145	95.0	True	True	0.005299	0.004378

Hub-and-spoke plots

A common experimental design seen in contemporary biomedical research is a shared-control, or 'hub-and-spoke' design. Two or more experimental groups are compared to a common control group.

A hub-and-spoke plot implements estimation statistics and aesthetics on such an experimental design.

If more than 2 columns/groups are indicated in a tuple passed to idx, then contrastplot will produce a hub-and-spoke plot, where the first group in the tuple is considered the control group. The mean difference and confidence intervals of each subsequent group will be computed against the first control group.
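
The way idx maps onto comparisons, for both the multi-plot and the hub-and-spoke forms, can be sketched as follows. The helper `expand_idx` is hypothetical and only mirrors the behaviour described in the text (the first group of each tuple is the reference); it is not taken from the library's source.

```python
import pandas as pd

def expand_idx(idx):
    """Turn idx into a list of subplots, each a list of
    (reference, experimental) pairs. Hypothetical helper."""
    idx = list(idx)
    # A flat sequence of group names describes a single subplot.
    if all(isinstance(item, str) for item in idx):
        idx = [idx]
    subplots = []
    for group in idx:
        group = list(group)
        reference, others = group[0], group[1:]
        # Every subsequent group is compared against the first one.
        subplots.append([(reference, other) for other in others])
    return subplots

# Two tiled two-group plots.
print(expand_idx((('Control', 'Group1'), ('Group2', 'Group3'))))

# One hub-and-spoke plot with 'Control' as the hub.
hub = expand_idx(('Control', 'Group1', 'Group2'))
print(hub)

# Mean differences for the hub-and-spoke case, on made-up data.
toy = pd.DataFrame({'Control': [0.0, 1.0, 2.0],
                    'Group1': [1.0, 2.0, 3.0],
                    'Group2': [2.0, 2.0, 2.0]})
mean_diffs = {(ref, exp): toy[exp].mean() - toy[ref].mean()
              for ref, exp in hub[0]}
print(mean_diffs)
```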



In [14]:

    
f, b = bsc.contrastplot(df,
                        idx=df.columns[:-1],
                        color_col='Gender')
b









    Out[14]:







  
    
      
	reference_group	experimental_group	stat_summary	bca_ci_low	bca_ci_high	ci	is_difference	is_paired	pvalue_2samp_ind_ttest	pvalue_mannWhitney
0	Control	Group1	-0.180804	-0.598903	0.232637	95.0	True	False	3.959867e-01	3.631777e-01
1	Control	Group2	-0.532006	-0.968608	-0.080709	95.0	True	False	2.157452e-02	1.358049e-02
2	Control	Group3	0.168796	-0.254775	0.602035	95.0	True	False	4.394990e-01	4.161598e-01
3	Control	Group4	3.709493	3.274405	4.130370	95.0	True	False	1.748492e-27	1.937605e-14
4	Control	Group5	-1.397284	-1.837762	-0.941801	95.0	True	False	4.761711e-08	4.267387e-07

Hub-and-spoke plots—multi-plot design

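You can also horizontally tile two or more plots by passing several tuples to idx. A tuple with more than two groups is drawn as a hub-and-spoke plot; the example below tiles three two-group tuples.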



In [15]:

    
f, b = bsc.contrastplot(df,
                        idx=(('Control','Group1'),('Group2','Group3'),
                             ('Group4','Group5')),
                        color_col='Gender')
b









    Out[15]:







  
    
      
	reference_group	experimental_group	stat_summary	bca_ci_low	bca_ci_high	ci	is_difference	is_paired	pvalue_2samp_ind_ttest	pvalue_mannWhitney
0	Control	Group1	-0.180804	-0.585926	0.214224	95.0	True	False	3.959867e-01	3.631777e-01
1	Group2	Group3	0.700802	0.273723	1.126667	95.0	True	False	2.720823e-03	2.555804e-03
2	Group4	Group5	-5.106777	-5.533075	-4.636955	95.0	True	False	7.985186e-35	1.435085e-14

Controlling Aesthetics



In [16]:

    
# Changing the contrast y-limits.
f, b = bsc.contrastplot(df,
                       idx=('Control','Group1','Group2'),
                       color_col='Gender',
                       contrast_ylim=(-2,2))



In [17]:

    
# Changing the swarmplot y-limits.
f, b = bsc.contrastplot(df,
                       idx=('Control','Group1','Group2'),
                       color_col='Gender',
                       swarm_ylim=(-10,10))



In [18]:

    
# Changing the size of the dots in the swarmplot.
# This is done through swarmplot_kwargs, which accepts a dictionary.
# You can pass any keywords that sns.swarmplot can accept.
f, b = bsc.contrastplot(df,
                       idx=('Control','Group1','Group2'),
                       color_col='Gender',
                       swarmplot_kwargs={'size':10} 
                      )



In [19]:

    
# Custom y-axis labels.
f, b = bsc.contrastplot(df,
                       idx=('Control','Group1','Group2'),
                       color_col='Gender',
                       swarm_label='My Custom\nSwarm Label',
                       contrast_label='This is the\nContrast Plot'
                       )



In [20]:

    
# Showing a bar for the mean summary instead of a horizontal line.
f, b = bsc.contrastplot(df,
                       idx=('Control','Group1','Group4'),
                       color_col='Gender',
                       show_means='bars',
                       means_width=0.6 # Changes the width of the summary bar or the summary line.
                      )



In [21]:

    
# Passing a list as a custom palette.
f, b = bsc.contrastplot(df,
                        idx=('Control','Group1','Group4'),
                        color_col='Gender',
                        show_means='bars',
                        means_width=0.6,
                        custom_palette=['green', 'tomato'],
                       )



In [22]:

    
# Passing a dict as a custom palette.
f, b = bsc.contrastplot(df,
                        idx=('Control','Group1','Group4'),
                        color_col='Gender',
                        show_means='bars',
                        means_width=0.6,
                        custom_palette=dict(Male='grey', Female='green')
                       )



In [23]:

    
# Custom y-axis labels for both the swarmplot and the contrast plot.
f, b = bsc.contrastplot(df,
                        idx=('Control','Group1','Group4'),
                        color_col='Gender',
                        swarm_label='my swarm',
                        contrast_label='The\nContrasts' # add line break.
                       )

Appendix: On working with 'melted' DataFrames.

bsc.contrastplot can also work with 'melted' or 'longform' data. In this layout, each row corresponds to a single datapoint, with one column carrying the value (here, my_metric) and other columns carrying 'metadata' describing that datapoint (in this case, group and Gender).

For more details on wide vs long or 'melted' data, see https://en.wikipedia.org/wiki/Wide_and_narrow_data

To read more about melting a DataFrame, see https://pandas.pydata.org/pandas-docs/stable/generated/pandas.melt.html
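
As a small illustration of the two layouts, the sketch below melts a made-up wide DataFrame and then pivots it back. The toy data and the variable names `toy_wide`, `toy_long` and `toy_back` are hypothetical; only pd.melt and DataFrame.pivot from pandas are used.

```python
import pandas as pd

# Made-up wide data: one column per group, plus a metadata column.
toy_wide = pd.DataFrame({'Control': [1.0, 2.0],
                         'Group1': [3.0, 4.0],
                         'Gender': ['Male', 'Female']})

# Wide to long: one row per datapoint.
toy_long = pd.melt(toy_wide.reset_index(),
                   id_vars=['index', 'Gender'],
                   value_vars=['Control', 'Group1'],
                   var_name='group',
                   value_name='my_metric')
print(toy_long)

# Long back to wide: the original 'index' column identifies each row.
toy_back = toy_long.pivot(index='index', columns='group', values='my_metric')
print(toy_back)
```

Keeping the original row index as an id variable is what makes the round trip possible, and it also preserves which observations belong together across groups.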



In [24]:

    
x='group'
y='my_metric'
color_col='Gender'

df_melt=pd.melt(df.reset_index(),
                id_vars=['index',color_col],
                value_vars=cols,value_name=y,var_name=x)

df_melt.head() # Gives the first five rows of `df_melt`.









    Out[24]:







  
    
      
	index	Gender	group	my_metric
0	0	Male	Control	1.331587
1	1	Male	Control	0.715279
2	2	Male	Control	-1.545400
3	3	Male	Control	-0.008384
4	4	Male	Control	0.621336

If you are using a melted DataFrame, you will need to specify the x (containing the categorical group names) and y (containing the numerical values for plotting) columns.
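
Conceptually, the x column tells the plotting function which group each row belongs to, and the y column supplies the numbers. A minimal sketch with made-up data follows; the variable names are hypothetical and this is not how the library is necessarily implemented.

```python
import pandas as pd

# Made-up long-form data.
toy_long = pd.DataFrame({
    'group': ['Control', 'Control', 'Group1', 'Group1'],
    'my_metric': [1.0, 3.0, 4.0, 6.0],
})
x, y = 'group', 'my_metric'

# Collect the y values for each category found in the x column.
values_by_group = {name: sub[y].to_numpy()
                   for name, sub in toy_long.groupby(x)}

# Mean difference of 'Group1' relative to 'Control'.
mean_diff = values_by_group['Group1'].mean() - values_by_group['Control'].mean()
print(mean_diff)  # 3.0
```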



In [25]:

    
df_melt









    Out[25]:







  
    
      
	index	Gender	group	my_metric
0	0	Male	Control	1.331587
1	1	Male	Control	0.715279
2	2	Male	Control	-1.545400
3	3	Male	Control	-0.008384
4	4	Male	Control	0.621336
5	5	Male	Control	-0.720086
6	6	Male	Control	0.265512
7	7	Male	Control	0.108549
8	8	Male	Control	0.004291
9	9	Male	Control	-0.174600
10	10	Male	Control	0.433026
11	11	Male	Control	1.203037
12	12	Male	Control	-0.965066
13	13	Male	Control	1.028274
14	14	Male	Control	0.228630
15	15	Male	Control	0.445138
16	16	Male	Control	-1.136602
17	17	Male	Control	0.135137
18	18	Male	Control	1.484537
19	19	Male	Control	-1.079805
20	20	Female	Control	-1.977728
21	21	Female	Control	-1.743372
22	22	Female	Control	0.266070
23	23	Female	Control	2.384967
24	24	Female	Control	1.123691
25	25	Female	Control	1.672622
26	26	Female	Control	0.099149
27	27	Female	Control	1.397996
28	28	Female	Control	-0.271248
29	29	Female	Control	0.613204
...	...	...	...	...
210	10	Male	Group5	-1.220654
211	11	Male	Group5	-0.609284
212	12	Male	Group5	-0.241531
213	13	Male	Group5	-0.548351
214	14	Male	Group5	-1.621476
215	15	Male	Group5	-0.340670
216	16	Male	Group5	-1.179230
217	17	Male	Group5	0.760236
218	18	Male	Group5	-0.250210
219	19	Male	Group5	-0.983632
220	20	Female	Group5	-1.096558
221	21	Female	Group5	-2.080330
222	22	Female	Group5	-0.866140
223	23	Female	Group5	-2.251181
224	24	Female	Group5	-0.616097
225	25	Female	Group5	-3.044364
226	26	Female	Group5	-2.283900
227	27	Female	Group5	0.567387
228	28	Female	Group5	0.646222
229	29	Female	Group5	0.418925
230	30	Female	Group5	-2.992920
231	31	Female	Group5	-2.648138
232	32	Female	Group5	-2.595158
233	33	Female	Group5	-2.863298
234	34	Female	Group5	-0.750010
235	35	Female	Group5	-1.538708
236	36	Female	Group5	-1.000581
237	37	Female	Group5	-1.539278
238	38	Female	Group5	-1.872530
239	39	Female	Group5	1.253789
    
  

240 rows × 4 columns



In [26]:

    
df



In [27]:

    
f, b = bsc.contrastplot(df_melt,
                        x='group',
                        y='my_metric',
                        fig_size=(4,6),
                        idx=('Control','Group1'),
                        color_col='Gender',
                        paired=True
                       )
b









    Out[27]:







  
    
      
	reference_group	experimental_group	stat_summary	bca_ci_low	bca_ci_high	ci	is_difference	is_paired	pvalue_2samp_paired_ttest	pvalue_wilcoxon
0	Control	Group1	-0.180804	-0.574237	0.250589	95.0	True	True	0.3931	0.353693



In [ ]:

	Control	Group1	Group2	Group3	Group4	Group5	Gender
0	1.331587	1.749455	0.372986	-0.512391	5.706473	-1.343561	Male
1	0.715279	-0.286073	-0.781426	0.953766	4.087105	-0.626787	Male
2	-1.545400	-0.484565	0.142439	0.155497	4.191374	-1.171499	Male
3	-0.008384	-2.653319	-1.800736	0.651812	3.920430	-1.551969	Male
4	0.621336	-0.008285	0.653143	1.545102	1.795238	-0.740874	Male
5	-0.720086	-0.319631	-1.634721	0.732338	4.159146	-2.939966	Male
6	0.265512	-0.536629	-0.094873	1.550188	2.348715	-2.205448	Male
7	0.108549	0.315403	-0.220228	1.061211	4.232220	-2.196542	Male
8	0.004291	0.421051	-0.906982	1.678686	3.385974	-1.335687	Male
9	-0.174600	-1.065603	2.771819	-0.845377	5.192982	-1.521123	Male
10	0.433026	-0.886240	-0.697823	-0.588989	3.795082	-1.220654	Male
11	1.203037	-0.475733	0.372457	-1.061606	4.016128	-0.609284	Male
12	-0.965066	0.689682	0.995956	0.762847	2.816874	-0.241531	Male
13	1.028274	0.561192	-1.315169	-0.043326	4.706477	-0.548351	Male
14	0.228630	-1.305549	1.242356	1.113741	3.801630	-1.621476	Male
15	0.445138	-1.119475	-0.222150	0.517351	4.682330	-0.340670	Male
16	-1.136602	0.736837	0.912515	0.327303	4.892072	-1.179230	Male
17	0.135137	1.574634	-1.013869	2.350383	4.855729	0.760236	Male
18	1.484537	-0.031075	-1.129530	0.806289	3.738761	-0.250210	Male
19	-1.079805	-0.683447	1.109796	0.173228	1.918896	-0.983632	Male
20	-1.977728	1.095630	0.401872	-0.784161	2.710666	-1.096558	Female
21	-1.743372	-0.309577	0.038846	1.390705	4.919828	-2.080330	Female
22	0.266070	0.725752	0.540761	1.152831	5.110201	-0.866140	Female
23	2.384967	1.549072	0.427333	-0.887182	5.422409	-2.251181	Female
24	1.123691	0.630080	-1.254360	0.054789	3.395736	-0.616097	Female
25	1.672622	0.073493	-2.313333	0.437858	2.920116	-3.044364	Female
26	0.099149	0.732271	-1.781757	-1.439093	5.006140	-2.283900	Female
27	1.397996	-0.642575	-1.888094	-0.078135	4.960377	0.567387	Female
28	-0.271248	-0.178093	-2.318535	1.599238	4.024322	0.646222	Female
29	0.613204	-0.573955	-0.747431	-1.415108	3.995442	0.418925	Female
30	-0.267317	-0.204375	-0.628404	0.690872	2.515961	-2.992920	Female
31	-0.549309	-0.486495	-0.139209	2.092742	3.770960	-2.648138	Female
32	0.132708	-0.185775	0.114976	-0.420980	5.459194	-2.595158	Female
33	-0.476142	-0.380536	-0.484359	-0.253752	2.992113	-2.863298	Female
34	1.308473	0.088978	-0.353904	0.417452	3.482986	-0.750010	Female
35	0.195013	0.063672	-0.026748	0.714329	3.835613	-1.538708	Female
36	0.400210	0.296347	-1.097204	0.597241	3.642214	-1.000581	Female
37	-0.337632	1.402771	-0.813856	-1.312845	2.013704	-1.539278	Female
38	1.256472	-1.546863	-0.064584	-0.564034	3.449593	-1.872530	Female
39	-0.731970	1.295619	-0.777945	0.301270	3.378741	1.253789	Female
