License: BSD
Copyright: Copyright American Gut Project, 2015



In [1]:

    
# This cell allows us to render the notebook in the way we wish no matter where
# the notebook is rendered.
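from IPython.display import HTML
css_file = './ag.css'
# Read the stylesheet and close the file handle before rendering it.
with open(css_file, "r") as css_handle:
    css = css_handle.read()
HTML(css)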









    Out[1]:

Introduction
Notebook Requirements
Function Imports
Analysis Parameters
Files and Directories
Data Download
Metadata Adjustment
Multiple Body Site Power Calculation
Power Calculation for Fecal Samples
Effect Size Estimation
Power Curve Plotting
- Alpha Diversity
- Beta Diversity
Discussion
References

Power Analysis

While null hypothesis statistical testing demonstrates whether something has an effect, effect sizes are used to estimate the importance of an effect. There have been many calls in the medical literature for the inclusion of effect sizes along with statistical p-values, as effect sizes increase the accuracy of comparison [1, 2]. We can leverage effect sizes within microbiome research in multiple ways. Understanding the effect size may help us rank which factors affect the microbiome the most. This, in turn, could help identify targets for intervention. Effect size can also help us design better studies. In the context of statistical power, effect size can help us estimate how many samples we need to have a reasonable probability of detecting an effect, if one exists, at a given significance threshold.

The complex relationship between humans and their resident microbes, especially the microbes in their guts, has been recognized in many areas of human health. Our microbial communities change during the course of our lives, although most of this change occurs within the first three years of life [3-6]. Long-term dietary patterns also have a large influence on the gut microbiome, although certain extreme dietary changes can force an acute change [6-9]. The gut microbiome can also be reshaped by antibiotic use [10-12]. Microbiome alterations (dysbiosis) have also been reported in a number of diseases. Inflammatory bowel disease (IBD) and obesity are well-studied examples in which the disease state is associated with dysbiosis [13-18]. Seasonal changes in the microbiome have also been reported [19]. We will demonstrate that in a healthy population of adults, age, alcohol consumption, exercise frequency, and sleep duration also impact the gut microbiome.

The list of factors that interact with the microbiome is based on the results of significance testing and does not consider effect size. However, effect size is still an important consideration in microbiome research for both experimental design and possible interventions. Using the American Gut data, we demonstrate that antibiotic use in the month prior to sample collection is associated with significantly decreased alpha diversity, compared with people who have not used antibiotics in the past year (p < 0.01). We also show that people who drink alcohol regularly (three or more times per week) have higher alpha diversity than those who do not consume alcohol (p < 0.01).

If these results were published, one can imagine a situation in which popular media might advise regular alcohol consumption following a dose of antibiotics to help repopulate the microbiome. If we ignore the number of logical fallacies associated with this practice and assume that regular alcohol consumption is able to reseed the gut microbiome in a healthy way, the question then becomes whether this will be an effective treatment. (Of course, this represents only one of many potential examples, and no real data beyond the significant association between alcohol consumption and microbiome diversity exists to support this mechanism.) Regular alcohol consumption following antibiotic use might be a good treatment strategy if the effect size of alcohol consumption is on the same order as, or greater than, the effect size of antibiotic use. On the other hand, if the effect size of alcohol consumption is less than that of antibiotic use, encouraging adults who do not drink or drink rarely to increase their alcohol consumption may not lead to the desired effects.

The American Gut data set presents a rare opportunity for effect size prediction within the microbiome field. The large size of the cohort, the high degree of heterogeneity among participants, and the amount of survey metadata collected from each participant facilitate the examination of effect size for factors that are either currently uncharacterized or undercharacterized in microbiome research. The accompanying challenge is that effect size calculations are difficult for the type of data collected in microbiome research. Most traditional effect size metrics for comparing samples between groups, such as Cohen’s d, assume that the data being studied are normally distributed, while microbiome data typically are not. As a result, we will leverage a method of empirically estimating statistical power.
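
The idea of empirical power estimation can be sketched in a few lines. The example below is a minimal illustration, not the notebook's own implementation: it repeatedly subsamples two groups, applies a nonparametric test, and reports the fraction of subsamples that reach significance. The choice of a Mann-Whitney U test with a normal approximation, the helper names `mann_whitney_p` and `empirical_power`, and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    # U counts the (x_i, y_j) pairs where x_i exceeds y_j; ties count as one half.
    u = sum((a > b) + 0.5 * (a == b) for a in x for b in y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))


def empirical_power(group_a, group_b, n, num_iter=200, p_crit=0.05, seed=0):
    """Fraction of random subsamples (n per group) that give p < p_crit."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_iter):
        p = mann_whitney_p(rng.sample(group_a, n), rng.sample(group_b, n))
        hits += p < p_crit
    return hits / num_iter


# Synthetic, made-up "alpha diversity" values (not American Gut data).
data_rng = random.Random(42)
low = [data_rng.gauss(10.0, 2.0) for _ in range(300)]
high = [data_rng.gauss(12.0, 2.0) for _ in range(300)]

power_small = empirical_power(low, high, n=5)
power_large = empirical_power(low, high, n=40)
print(power_small, power_large)
```

Because the estimate depends only on repeated subsampling and a test that makes no normality assumption, the same pattern applies to any statistic for which a p-value can be computed.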

Truth ↓ / Hypothesis $\rightarrow$	Reject H₀ ($cost_{lab} \neq cost_{dragon}$)	Fail to Reject H₀ ($cost_{lab} = cost_{dragon}$)
$cost_{lab} = cost_{dragon}$	False Positive	Correct
$cost_{lab} \neq cost_{dragon}$	Correct	False Negative
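
In conventional notation (standard definitions, not specific to this notebook), the two error cells of the table correspond to the error rates $\alpha$ and $\beta$, and statistical power is the complement of the false negative rate:

$$
\begin{aligned}
\alpha &= P(\text{reject } H_0 \mid H_0 \text{ is true}) && \text{(false positive rate)} \\
\beta &= P(\text{fail to reject } H_0 \mid H_0 \text{ is false}) && \text{(false negative rate)} \\
\text{power} &= 1 - \beta
\end{aligned}
$$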

txt_delim (string)	`txt_delim` specifies the way columns are separated in the files. QIIME typically consumes and produces tab-delimited (`"\t"`) text files (.txt) for metadata and results generation.
map_index (string)	The name of the column containing the sample IDs. In QIIME, this column is called `#SampleID`.
map_nas (list of strings)	A mapping file may be missing values, since American Gut participants are free to skip any question. The pandas package is able to omit these missing values from analysis. In raw American Gut files, missing values are typically denoted as `"NA"`, `"no_data"`, `"unknown"`, and empty strings (`""`).
write_na (string)	The value used to denote missing values when text files are written from pandas data frames. Using an empty string (`""`) will allow certain QIIME scripts, like [group_significance.py](http://qiime.org/scripts/group_significance.html), to ignore the missing values.
date_cols (list of strings)	Columns containing temporal data can be identified by listing their names in `date_cols`.
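
As a rough sketch of how these parameters might be passed to pandas, the example below reads a tiny made-up mapping file and writes it back out. The column names `AGE`, `BMI`, and `COLLECTION_DATE`, and the file contents, are hypothetical; only the parameter names come from the table above.

```python
import io

import pandas as pd

txt_delim = "\t"
map_index = "#SampleID"
map_nas = ["NA", "no_data", "unknown", ""]
write_na = ""
date_cols = ["COLLECTION_DATE"]  # hypothetical column name

# A tiny, made-up mapping file standing in for a real American Gut file.
raw = (
    "#SampleID\tAGE\tBMI\tCOLLECTION_DATE\n"
    "s1\t34\tNA\t2015-01-05\n"
    "s2\tunknown\t22.5\t2015-02-11\n"
    "s3\t51\tno_data\t2015-03-20\n"
)

# Values listed in map_nas are read as missing; date_cols are parsed as dates.
meta = pd.read_csv(
    io.StringIO(raw),
    sep=txt_delim,
    index_col=map_index,
    na_values=map_nas,
    parse_dates=date_cols,
)
n_missing = int(meta.isna().sum().sum())

# Missing values are written back out using write_na.
out_text = meta.to_csv(sep=txt_delim, na_rep=write_na)
print(n_missing)
```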

a_div_metric (string)	The alpha diversity metric to be used in the analysis. Mapping files generated by the Preprocessing Notebook have a set of mapping columns appended which provide the mean for several metrics. These are labeled as the metric name followed by `"_mean"`, to indicate that the values are the mean of 10 rarefactions.
a_title (string)	The title to be displayed on the alpha diversity power curve.
a_suffix (string)	If files are saved, this string is used to differentiate alpha diversity files from beta diversity files.

b_div_metric (string)	This identifies the beta diversity metric to be used in the analysis. This name will appear at the beginning of the distance matrix file.
b_num_iter (int)	Differences in beta diversity are frequently tested using a permutation test [[23](#Bondini)]. This takes care of many of the statistical constraints associated with distance matrices. `b_num_iter` sets the number of permutations performed on a distance matrix during beta diversity power calculation. A large number can slow processing considerably, since we must perform the permutation several times.
b_title (string)	The title to be displayed on the beta diversity power curve.
b_suffix (string)	If files are saved, this string is used to differentiate beta diversity files from alpha diversity files, and to distinguish between different beta diversity metrics.
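
To illustrate why `b_num_iter` drives the running time, here is a minimal sketch of a label permutation test on a distance matrix. It is a simplified stand-in, not the test cited in [23] and not the notebook's implementation: the statistic (mean between-group distance minus mean within-group distance), the function names, and the toy data are assumptions.

```python
import random


def between_minus_within(dm, labels):
    """Mean between-group distance minus mean within-group distance."""
    between, within = [], []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            (within if labels[i] == labels[j] else between).append(dm[i][j])
    return sum(between) / len(between) - sum(within) / len(within)


def permutation_p(dm, labels, b_num_iter=199, seed=0):
    """P-value from shuffling group labels b_num_iter times."""
    rng = random.Random(seed)
    observed = between_minus_within(dm, labels)
    shuffled = list(labels)
    extreme = 0
    for _ in range(b_num_iter):
        rng.shuffle(shuffled)
        # Each permutation recomputes the statistic over the full matrix.
        if between_minus_within(dm, shuffled) >= observed:
            extreme += 1
    return (extreme + 1) / (b_num_iter + 1)


# Toy data: two well-separated clusters on a line (made up).
coords = [0.0, 0.4, 0.8, 1.2, 1.6, 2.0, 10.0, 10.4, 10.8, 11.2, 11.6, 12.0]
dm = [[abs(a - b) for b in coords] for a in coords]
labels = ["a"] * 6 + ["b"] * 6

p_value = permutation_p(dm, labels, b_num_iter=199)
print(p_value)
```

In a power calculation this whole test is repeated for every subsample, so the total cost scales with the number of subsamples multiplied by `b_num_iter`.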

num_iter (int)	The number of times data should be subsampled at each sampling depth to calculate the statistical power for the sample.
num_runs (int)	The number of times paired samples should be drawn for confidence interval calculation.
p_crit (float)	The value of $\alpha$ (the probability of a false positive) acceptable for these power calculations. Empirical power is the fraction of iterations for a sample set in which the p-value is less than this value. For historical and cultural reasons, 0.05 is often used.
min_counts (int)	The minimum number of samples drawn from each group during statistical testing. This should be set based on the expected effect size and number of available samples.
max_counts (int)	The maximum number of samples drawn from each group during statistical testing. This should be set based on the expected effect size and number of available samples and should not exceed the size of the smallest group.
counts_interval (int)	A sampling interval used to determine the number of samples which should be drawn during statistical testing. Samples will be drawn in sizes increasing from `min_counts` to `min_counts + counts_interval`, `min_counts + 2 * counts_interval`, and so on, up to `max_counts`.
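
The sampling depths implied by `min_counts`, `max_counts`, and `counts_interval` can be listed with a short helper. This is only a sketch: the function name `sample_counts` is hypothetical, and whether `max_counts` itself is included is an assumption (here it is included when the steps land on it exactly).

```python
def sample_counts(min_counts, max_counts, counts_interval):
    """Group sizes from min_counts up to max_counts in steps of counts_interval.

    Assumption: max_counts is included when a step lands on it exactly.
    """
    return list(range(min_counts, max_counts + 1, counts_interval))


counts = sample_counts(min_counts=5, max_counts=50, counts_interval=10)
print(counts)  # [5, 15, 25, 35, 45]
```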

all_cat (string)	The metadata category used for body site comparison.
all_order (string)	The body sites being analyzed.
all_controls (string)	The metadata categories used to identify matched samples.

all_min_counts (int)	The minimum number of samples drawn from each group during statistical testing. This should be set based on the expected effect size and number of available samples.
all_max_counts (int)	The maximum number of samples drawn from each group during statistical testing. This should be set based on the expected effect size and number of available samples and should not exceed the size of the smallest group.
all_counts_interval (int)	A sampling interval used to determine the number of samples which should be drawn during statistical testing.

fecal_cats (list of tuples)	A list of tuples which follow the format `(category, order)`. For example, to look at inflammatory bowel disease status, this might be `('IBD', ['I do not have IBD', 'IBD'])`. The order list allows us to select which groups we'll compare. To analyze all groups in a category, the order position may take a value of `None`.
fecal_control_cats (list of strings)	The categories used to identify matched samples. So, if we are comparing in category A, but control for B, C, and D, samples will be selected where A is different but B, C, and D are the same.
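
The matching logic described for `fecal_control_cats` can be sketched as follows. This is an illustrative approach under stated assumptions, not the notebook's code: the function name `matched_pairs`, the control columns `AGE_CAT` and `SEX`, and the sample records are hypothetical.

```python
import random
from collections import defaultdict


def matched_pairs(samples, category, order, control_cats, seed=0):
    """Pair samples that differ in `category` but agree on every control category."""
    rng = random.Random(seed)
    # Group sample ids by their control-category values, then by group in `order`.
    strata = defaultdict(lambda: {order[0]: [], order[1]: []})
    for sample_id, meta in samples.items():
        if meta[category] in order:
            key = tuple(meta[cat] for cat in control_cats)
            strata[key][meta[category]].append(sample_id)
    pairs = []
    for groups in strata.values():
        first, second = groups[order[0]][:], groups[order[1]][:]
        rng.shuffle(first)
        rng.shuffle(second)
        # zip stops at the shorter list, so unmatched samples are dropped.
        pairs.extend(zip(first, second))
    return pairs


# Made-up metadata records for illustration only.
samples = {
    "s1": {"IBD": "I do not have IBD", "AGE_CAT": "30s", "SEX": "female"},
    "s2": {"IBD": "IBD", "AGE_CAT": "30s", "SEX": "female"},
    "s3": {"IBD": "I do not have IBD", "AGE_CAT": "50s", "SEX": "male"},
    "s4": {"IBD": "IBD", "AGE_CAT": "40s", "SEX": "male"},
    "s5": {"IBD": "IBD", "AGE_CAT": "50s", "SEX": "male"},
}
pairs = matched_pairs(
    samples, "IBD", ["I do not have IBD", "IBD"], ["AGE_CAT", "SEX"]
)
print(sorted(pairs))
```

Here `s4` has no partner with the same age category and sex, so it is left out of the comparison.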

plot_counts (array)	The number of samples which should be drawn to plot the curve. The minimum of this should not be less than two, although the maximum can exceed the number of samples in any group.
plot_colormap (array, None)	The colors used for the lines. If None is specified, the default colors from Statsmodels will be used. When a custom colormap is passed, it should have at least as many colors as there are categories in `fecal_cats`.
legend_size (int)	The size of the text appearing in the figure legend.
label_size (array of strings)	The way each category should appear in the final legend. This should include body site.

figure_size (tuple)	The height and width of the final figure, in inches.
legend_position (tuple)	Where the legend should be placed in the final figure. The tuple gives (left, bottom) as a fraction of the axis size.
print_position (tuple)	A four-element description of the size of the axis in the figure. This is given in inches. The tuple is given as (left, bottom, width, height).
space_position (tuple)	To render the legend correctly, we have to create a dummy axis. This gives the location of the dummy axis within the figure in inches from the bottom left corner. Positions are (left, bottom, width, height).

save_pad (tuple)	The extra space (in inches) for display around the edge of the figure.
save_bbox (tuple, str)	The size of the image to be saved. Using a value of `'tight'` will display the entire figure and allow padding.

Category	Alpha	Beta
IBD	20 ± 2	15 ± 2
Plants Consumed	35 ± 8	20 ± 2
Antibiotic Use	40 ± 7	25 ± 4
Age	100 ± 20	25 ± 2
Alcohol Use	60 ± 9	35 ± 2
Sleep Duration	40 ± 7	55 ± 7
Season	80 ± 11	45 ± 2
Exercise Frequency	270 ± 95	60 ± 8
BMI	435 ± 174	60 ± 9

base_dir (string)	The filepath for the directory where any files associated with the analysis should be saved. It is suggested this be a directory called agp_analysis, and be located in the same directory as the IPython notebooks.
working_dir (string)	The file path for the directory where all data files associated with this analysis have been stored. This should contain the results of the Preprocessing Notebook. The working_dir is expected to be a directory called sample_data in the `base_dir`.
analysis_dir (string)	The file path where analysis results should be stored. This is expected to be a folder in the `base_dir`.

all_dir (string)	The filepath for the directory where all bodysite files are stored. This should be a directory in the `working_dir`.
all_map_fp (string)	The filepath for the metadata file associated with all samples. This is expected to be a processed metadata file generated by the preprocessing notebook, and contain columns describing alpha diversity.
all_uud_fp (string)	The filepath for the unweighted UniFrac distance matrix associated with the all sample file.

site_dir (string)	The filepath for the directory where data sets from fecal samples are stored. This should be a directory in the `working_dir`.
data_dir (string)	The filepath of the all participant single sample directory. This should be a folder in the `site_dir`.
data_map_fp (string)	The filepath for the metadata file associated with the fecal samples. This is expected to be a processed metadata file generated by the preprocessing notebook, and contain columns describing alpha diversity.
data_uud_fp (string)	The filepath for the unweighted UniFrac distance matrix associated with the fecal sample dataset.

results_dir (string)	A folder where files summarizing the power calculation results for each run should be stored. This is expected to be a folder in the `analysis_dir`.
site_pickle_pattern (string)	Individual power analyses (numpy arrays of the power curve results) are saved using the Python `pickle` module. The file pattern contains blanks which can be filled in with information about the specific sample. The blanks specify the diversity metric used for comparison, the metadata category, and the two groups within that category.

image_dir (string)	If power curves are being saved as images, this specifies the directory where all images should be saved. This is expected to be a folder in the `analysis_dir`.
power_image_dir (string)	This directory allows us to separate power curve images from other images generated during the course of analysis. This is expected to be a directory in the `image_dir`.
image_pattern (string)	The file name pattern where images generated by this notebook should be saved. The blank indicates the type of diversity metric used to generate the image.
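
The directory layout described in these tables could be assembled along the following lines. This is a sketch only: `agp_analysis` and `sample_data` are the names given above, while every other folder name, both filename patterns, and the use of `str.format` placeholders for the "blanks" are hypothetical.

```python
from pathlib import Path

base_dir = Path("agp_analysis")          # suggested name from the text
working_dir = base_dir / "sample_data"   # name given in the text
analysis_dir = base_dir / "analysis_results"  # hypothetical folder name

all_dir = working_dir / "all"            # hypothetical folder name
site_dir = working_dir / "fecal"         # hypothetical folder name
data_dir = site_dir / "all_participants_one_sample"  # hypothetical folder name

results_dir = analysis_dir / "power"     # hypothetical folder name
image_dir = analysis_dir / "images"      # hypothetical folder name
power_image_dir = image_dir / "power"    # hypothetical folder name

# Hypothetical patterns; the blanks are written as named placeholders.
site_pickle_pattern = "{metric}__{category}__{group1}__vs__{group2}.p"
image_pattern = "power_{metric}.png"

pickle_fp = results_dir / site_pickle_pattern.format(
    metric="PD_whole_tree", category="IBD", group1="no_IBD", group2="IBD"
)
image_fp = power_image_dir / image_pattern.format(metric="PD_whole_tree")
print(pickle_fp)
print(image_fp)
```

Nothing is created on disk here; calling `mkdir(parents=True, exist_ok=True)` on each directory would be one way to create the tree before saving results.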
