License: BSD
Copyright: Copyright American Gut Project, 2015


In [1]:
# This cell allows us to render the notebook in the way we wish no matter where
# the notebook is rendered.
from IPython.core.display import HTML
css_file = './ag.css'
HTML(open(css_file, "r").read())


Out[1]:

Power Analysis

While null hypothesis statistical testing demonstrates whether something has an effect, effect sizes are used to estimate the importance of an effect. There have been many calls in medical literature for the inclusion of effect size along with statistical p-values, as effect sizes increase the accuracy of comparison [1, 2]. We can leverage effect sizes within microbiome research in multiple ways. Understanding the effect size may help us rank which factors affect the microbiome the most. This, in turn, could help identify targets for intervention. Effect size can also help us design better studies. In the context of statistical power, effect size can help us estimate how many samples we might need to be reasonably confident that our hypothesis is true, given some margin of error.

The complex relation between humans and their resident microbes, especially the microbes in their guts, has been recognized in many areas of human health. Our microbial communities change during the course of our lives, although most of this change occurs within the first three years of life [3-6]. Long-term dietary patterns also have a large influence on the gut microbiome, although certain extreme dietary changes can force an acute change [6-9]. The gut microbiome can also be re-shaped by antibiotic use [10-12]. Microbiome alterations (dysbiosis) have also been reported in a number of diseases. Inflammatory bowel disease (IBD) and obesity are well studied examples in which the disease state is associated with dysbiosis [13-18]. Seasonal changes in the microbiome have also been reported [19]. We will demonstrate that in a healthy population of adults, age, alcohol consumption, exercise frequency, and sleep duration also impact the gut microbiome.

The list of factors that interact with the microbiome is based on the results of significance testing and does not consider effect size. However, effect size is still an important consideration in microbiome research for both experimental design and possible interventions. Using the American Gut data, we demonstrate that antibiotic use in the month prior to sample collection is associated with significantly decreased alpha diversity, compared with people who have not used antibiotics in the past year (p < 0.01). We also show that people who drink alcohol regularly (three or more times per week) have higher alpha diversity than those who do not consume alcohol (p < 0.01).

If these results were published, one can imagine a situation in which popular media might advise the regular consumption of alcohol use following a dose of antibiotics to help repopulate the microbiome. If we ignore the number of logical fallacies associated with this practice and assume that regular alcohol consumption is able to reseed the gut microbiome in a healthy way, the question then becomes if this will be an effective treatment. (Of course, this represents only one of many potential examples, and no real data beyond the significant association between alcohol consumption and microbiome diversity exists to support this mechanism.) Regular alcohol consumption following antibiotic use might be a good treatment strategy, if the effect size of alcohol consumption is on the same order or greater than the effect of antibiotic use. On the other hand, if the effect size of alcohol consumption is less than that of antibiotic use, encouraging adults who do not drink or drink rarely to increase their alcohol consumption may not lead to the desired effects.

The American Gut data set presents a rare opportunity for effect size prediction within the microbiome field. The large size and high degree of heterogeneity among participants and the amount of survey metadata collected from each participant facilitates the examination of effect size for factors that are either currently uncharacterized or under characterized in microbiome research. The accompanying challenge is that effect size calculations are difficult for the type of data collected in microbiome research. Most traditional effect size metrics for comparing samples between groups, such as Cohen’s d, make assumptions about the normality of the data being studied, while microbiome data is not normal. As a result, we will leverage a method of empirically estimating statistical power.

Return to the top

Statistical Hypotheses and Error: A “toy” example

Using statistics, we can test hypotheses. For example, we might want to test whether or not individual toy bricks cost different amounts depending on which playset they come from. Perhaps we’re trying to build the most epic toy brick playset to represent places where the microbiome is studied, and we’re trying to do this in the cheapest way possible. We’ve determined that there are two relatively inexpensive sets which contain a specific brick we want: a chemistry laboratory and a Komodo dragon enclosure. However, we’d also like to repurpose the rest of the bricks in the sets. So, we’d like to consider the cost per brick in each of the sets.

Therefore, we’re going to test the hypothesis that the cost per brick in the lab playset is different from the cost per brick in the Komodo dragon playset. We can write two possible outcomes for our statistical test, and call them hypotheses:

H0: The playset type is not related to the cost per brick
(costlab = costdragon)
H1: The playset type is related to the cost per brick
(costlab $\neq$ costdragon)

In an ideal world, we would be able to collect information about every set sold in the country, which statisticians would call the population. But, presumably, this is a side project, and we don’t have enough time or energy to be able to collect information about every single toy brick set of a particular model sold at every single store in the country. Rather than collect information about every single set in the country, let’s assume that we were able to collect price information the three playsets per week at three stores across the country during the 2013 calendar year. Let’s call each record of a playset price an observation. The total set of observations of each model is a sample.

In this notebook, we will try to use consistent language, and a sample will refer to a set of observations (whether they be the cost of toy brick playsets, randomly drawn data points, dice rolls, or diversity metrics calculated after sequencing physical 16S swabs). This differs slightly from the use of “samples” in other American Gut documentation, where “sample” refers to a physical specimen and the resultant 16S data.

Hypothesis $\rightarrow$ Reject H0
($cost_{lab} \neq cost_{dragon}$)
Fail to Reject H0
($cost_{lab} = cost_{dragon}$)
Truth ↓
$cost_{lab} = cost_{dragon}$
False Positive
Correct
$cost_{lab} \neq cost_{dragon}$ Correct False Negative

Our statistical test can make a mistake in two ways. We can reject H0 when it is, in fact, true. This leads to a false positive. In our toy example, this would be drawing the conclusion that the bricks within the two sets cost significantly different amounts, when in fact, they don’t. In a medical context, this might mean saying that someone has cancer, when in fact, they do not. Statisticians try to limit the risk associated with false positives by selecting a critical value, sometimes called $\alpha$. The critical value is the acceptable probability that a false positive has occurred. Statistical tests calculate a p value, which is the probability a false positive has occurred. The comparison of p values and critical values prompts rejection, or failure to reject a hypothesis. The null hypothesis is rejected when $p < \alpha$.

We can also make a mistake and fail to reject H0 when it is true. In the toy example, this would mean that we conclude that the costs of the bricks in the two set are the same, when they actually have different costs. We have committed a Type II error or selected a false negative result. In the cancer example, it would mean saying a patient is cancer free when they actually have the disease. This is also dangerous, but more difficult to test. The statistical power of an experiment is

1 – (probability of a false negative)
Effect size is related to the rates of both false positives and false negatives, along with the number of samples present.

Return to the top

A More Formal Mathematical Definition for Power Analysis

Traditionally, statistical power can only be calculated for certain types of data and certain statistical tests. Most effect size measures make assumptions about the distribution of the data (for example, Cohen’s $f^{2}$ is the power of an ANOVA, which assumes a normal distribution) [20, 21]. Unfortunately, most microbiome data violates one or more traditional assumptions. For example, many bacterial taxa are sparsely represented and therefore do not follow a normal distribution [22]. One approach to overcome problems associated with non-normal distributions is to use permutations [23]. We have applied a permutative approach to empirically estimate power for microbiome data.

Definition

Let's assume there exist a set of populations, $\{ K_{1}, K{2}, ... , K_{k} \}$ which can be characterized by some parameter, $\chi$ such that $\chi_{1} \neq \chi_{2} \neq ... \neq \chi_{k}$. From each of these populations, there exist samples, $\{ S_{1}, S_{2}, ... , S_{k} \}$ of sizes $N_{1}, N_{2}, ... , N_{k}$ and characterized by some parameter, $X$ where $X \approx \chi$. For our set of samples, we can apply some statistical test, $\mathbf{F}$, to test the pair of hypotheses: $$\textrm{H}_{\textrm{0}}\textrm{: } \chi_{1} = \chi_{2} = ... = \chi_{k}$$ $$\textrm{H}_{\textrm{1}}\textrm{: } \chi_{1} \neq \chi_{2} \neq ... \neq \chi_{k}$$ Let's assume that when we apply $\mathbf{F}$ to our set of samples, it gives us a probability, $p_{all}$ where $p_{all}$ is the probability that we have committed a Type I Error. We can set some critical value, $\alpha$ such that we reject H0 if $p_{all} < \alpha$.

If $N_{i}$ is sufficiently large, we can randomly subsample $S_{i}$ to generate $s_{i}$, a subsample of size $n_{i}$ where $n_{i} < N_{i}$. Each subsample will have a parameter, $x_{i}$, where $x_{i} \approx X_{i}$, and by the transitive property, $x_{i} \approx \chi_{i}$.
If we subsamples over all our sample set, we can apply our test $\mathbf{F}$ to $m$ rounds of random subsampling to generate a set of $p$ values, $\{p_{1}, p_{2}, ..., p_{m}\}$.

Since we know that we should reject the null hypothesis at the critical value, $\alpha$, when $p_{j} \geq \alpha \textrm{ (for }\{j | j \textrm{ } \epsilon \textrm{ }\mathbb{N} \textrm{ and } 1 \leq j \leq m\}$, we have observed a false negative and committed a type II error. We can define the empirical probability of committing a type II error, $\beta$, as $$\beta = \frac{|p_{j} > \alpha|}{p} \tag{1}$$ where $|a|$ is the number of elements in set $a$. Since statistical power, $P$ is defined as $1 - \beta$, we can say that $$P = 1 - \beta = 1 - \frac{|p_{j} \geq \alpha|}{|p|} = \frac{|p_{j} < \alpha|}{|p|} \tag{2}$$

If we repeat the subsampling process $g$ times, the central limit theorem applied to weak convergence says the mean of the power estimate, $\overline{P} \rightarrow P$.

Return to the top

Proof of Principle

The method of power analysis we’re using is a relatively new method. We wanted to demonstrate the empirical method gives similar results as those obtained using traditional power analysis techniques. We applied a case II student’s t-test, which compares the hypotheses,

H0: $\mu_{1} = \mu_{2}$
H1: $\mu_{1} \neq \mu_{2}$
where $\mu$ is the mean of a normal distribution.

The power for this test is given by $$PWR(\overline{x}_{1}, \overline{x}_{2}, s_{1}, s_{2}, n_{1}, n_{2}, \alpha) = pnorm \left (-z_{1 - \alpha/2} + \sqrt{\frac{(\overline{x}_{1} - \overline{x}_{2})^{2}}{s_{1}^{2}/n_{1} + s_{2}^{2}/n_{2}}}, 0, 1 \right ) \tag{3}$$ where $pnorm(x, 0, 1)$ is the probability density function of $x$ within a normal distribution with a mean of 0 and a standard deviation of 1.

We simulated two normal distributions with the same standard deviation. The difference between the two means was half the standard deviation. We generated samples and subsamples of equal size from these populations. This allows us to simplify our power equation to $$PWR(\overline{x}_{1}, \overline{x}_{2}, s_{1}, s_{2}, n, \alpha) = pnorm \left (-z_{1 - \alpha / 2} + \sqrt{\frac{n * (\overline{x}_{1} + \overline{x}_{2})^{2}}{s_{1}^{2} + s_{2}^{2}}}, 0, 1\right ) \tag{4}$$

We drew samples of 25, 50, 100, and 1000 observations from each of the two distributions, and compared the results using a t-test. The 95% rejection level is highlighted on each of the distributions by shading values above this level gray. The samples were used to generate power curves three ways. First, curves were generated using the traditional equation for power analysis (solid line). Next, an empirical power was calculated for each sample by drawing subsamples of different sizes (blue circles). A mean effect size was calculated for the empirical power, and this was used to extrapolate a power curve (dashed blue line).

There is fairly good agreement between the traditional and elucidated power curves, especially at higher power levels. By 50% power, the two curves are within 10% window of each other (the extrapolated curve overestimates power at lower sample sizes). As a first pass for effect size estimation, empirical power and extrapolated power curves are a reliable method which accurately estimate the statistical power. These are an improvement on currently available methods, especially for nonparametric data.

Return to the top

Caveats and Considerations

The power analyses we’re performing here assume the following things about our data and statistical tests:

  1. The samples are a representative, random sample of the underlying population.

  2. Samples are sufficiently large to allow random subsampling at a depth that facilitates power analysis. For some statistical tests, a subsample of less than 5 observations per group may not be appropriate.

  3. There is a significant difference between the samples, or reason to believe there is a significant difference between the populations.

  4. Samples satisfy any requirements of the statistical test, $\textbf{F}$. These may include things like a requirement the data follow a distribution.

Return to the top

Groups and Calculations

In this notebook, we will look at the effect size of nine metadata categories on the human gut microbiome through alpha and beta diversity. We will also include the comparison of gut samples and skin samples as a control, since it is widely accepted in the field that body site has the largest effect on adult human microbial communities [24]. We will pick the two most extreme states within a metadata category, since we know that most categories involve a continuum, and more extreme states will be more likely to show the effect.

This analysis will focus on the following metadata categories and states

  • Body Site (feces vs hand)
  • Inflammatory bowel disease (people with IBD vs. those without)
  • Antibiotic use (people who have used antibiotics in the past week or month vs. people who have not used them for more than a year prior to sample collection)
    We will combine these two categories into a group we’ll refer to as “In the past month”.
  • Types of plants eaten in a week (less than 5 types of plants vs. more than 30 types of plants) Age (20s vs. 60s)
  • Body mass index (normal vs. obese)
  • Collection season (summer vs. winter)
  • Frequency of Alcohol Consumption (never vs. daily)
  • How frequently an individual exercises (never or rarely vs. daily)
  • The number of hours of sleep a participant gets per night (less than six vs. more than eight)

To reduce some of the noise within the data, we will focus on samples in which other metadata categories have been controlled. So, each sample pair in a category above will be matched based on the following metadata categories:

  • Body site
  • IBD status (IBD or not)
  • Antibiotic use
  • Number of types of plants consumed in a week
  • Diabetes diagnosis
  • Age by decade
  • Collection season
  • Hours of sleep per night

Return to the top

Notebook Requirements

Return to the top

Function Imports

We can start by importing functions we’ll need from other libraries.


In [2]:
import os
import shutil
import copy
import pickle

import numpy as np
import scipy
import skbio
import matplotlib.pylab as plt
import pandas as pd

import americangut.power_plots as pp
import americangut.diversity_analysis as div_an

from IPython.display import HTML
from matplotlib import rcParams

from biom import load_table
from skbio.stats.power import subsample_paired_power

We will also set up some plotting parameters so the generated figures use Helvetica or Arial as their default font. For more on font properties, see the matplotlib documentation on text objects and rendering text with $\LaTeX$. We will also prompt the IPython notebook to display the images we generate live in the notebook.


In [3]:
# Displays images inline
%matplotlib inline

# Sets up plotting parameters so that the default setting is use to Helvetica
# in plots
rcParams['font.family'] = 'sans-serif'
rcParams['font.sans-serif'] = ['Helvetica', 'Arial']
rcParams['text.usetex'] = True

Return to the top

Analysis Parameters

We can also set some necessary parameters for handling files and this analysis. It’s easier to set them as a block, here, so that our systems are consistent than it is to modify each of the variables later in the import if our needs or our data change.

File Saving Parameters

In the course of this analysis, a series of files can be generated. The File Saving Parameters determine whether new files are saved.

overwrite
(boolian)

When overwrite is True, new files will be generated and saved during data processing. It is recommended that overwrite be set to False, in which case new files will only be generated when the file does not exist. This substantially decreases analysis time.

Overwrite will also cause the notebook to generate new power calculations, even if the files do not exist. Whether or not these power calculation results are saved can be controlled by setting save_intermediates.

save_intermediates
(boolian)
The current power code is computationally expensive and takes a long time to compute, due to the iterative nature. While code optimization may come in the future, it can be advantageous to save the power calculation results somewhere, so they can be retrieved later.
save_images
(boolian)
This notebook will generate images of the power curves. By default, these will be displayed inside the notebook. However, some users also find it advantageous to save the images files. The file format is set in image_pattern.

In [4]:
overwrite = False
save_intermediates = True
save_images = True

Return to the top

Metadata and File Handling Parameters

QIIME-formatted metadata and results files are frequently tab-separated text (.txt) files. These files can be opened in Excel or spreadsheet programs. You can learn more about QIIME mapping files here. We use the pandas library to read most of our text files, which provides some spreadsheet-like functionality.

txt_delim
(string)
txt_delim specifies the way columns are separated in the files. QIIME typically consumes and produces tab-delimited ("\t") text files (.txt) for metadata and results generation.
map_index
(string)
The name of the column containing the sample id name. In QIIME, this column is called #SampleID.
map_nas
(list of strings)
It is possible a mapping file map be missing values, since American Gut participants are free to skip any question. The pandas package is able to omit these missing samples from analysis. In raw American Gut files, missing values are typically denoted as “NA”, “no_data”, “unknown”, and empty spaces (“”).
write_na
(string)
The value to denote missing values when text files are written from Pandas data frames. Using an empty space, (“”) will allow certain QIIME scripts, like [group_signigance.py](http://qiime.org/scripts/group_significance.html), to ignore the missing values.
date_cols
(list of strings)
Temporal data can be identified using the date_cols.

In [5]:
txt_delim = '\t'
map_index = '#SampleID'
map_nas = ['NA', 'no_data', 'unknown', '']
write_na = ''
date_cols = ['RUN_DATE', 'COLLECTION_DATE', 'BIRTH_DATE', 'SAMPLE_TIME']

Return to the top

Alpha Diversity Parameters

This notebook will generate power curves for metrics associated with two concepts in ecology: Alpha and Beta Diversity. Alpha diversity is a comparison of intra-community variation. When alpha diversity values are compared, the comparison does not take into account the community structure. So, two communities which share no species can have the same alpha diversity. American Gut Analyses primarily focus on an alpha diversity metric called PD Whole Tree Diversity [25]. PD Whole Tree is phylogenetically aware, meaning that it takes into account shared evolutionary history. Unweighted UniFrac distance, the metric we typically use for beta diversity, is also phylogenetically aware.

a_div_metric
(string)
The alpha diversity metric to be used in the analysis. Mapping files generated by the Preprocessing Notebook have a set of mapping columns appended which provide the mean for several metrics. These are labeled as the metric name with “_mean” to indicate the values are the mean of 10 rarefactions.
a_title
(string)
The title to be displayed on the alpha diversity power curve.
a_suffix
(string)
If files are saved, this string is used to differentiate alpha diversity files from beta diversity.

In [6]:
a_div_metric = 'PD_whole_tree_mean'
a_title = 'PD whole tree Diversity'
a_suffix = 'alpha_pd'

We’re also going to define a function for alpha diversity power analysis. We will test our alpha diversity using a Kruskal-Wallis test, which is non-parametric and does not require normally distributed data. Our function will take the alpha diversity value for each group, and then compare them using the test.


In [7]:
# Defines the alpha diversity test function
def test_alpha_kruskal(ids, meta, a_div_metric):
    """Calculates difference in alpha diversity for a category"""
    # Gets the alpha diversity values at the location given
    alpha = [meta.loc[id_, a_div_metric] for id_ in ids]
    # Calculates the p value by comparing categories
    return scipy.stats.kruskal(*alpha)[1]

Return to the top

Beta Diversity Parameters

Beta diversity looks at the difference in community structure across two communities. Each metric calculates a distance between the communities, which is reflective of their difference. American Gut Analyses have calculated weighted and unweighted UniFrac distance matrices for the communities [26]. UniFrac distance takes into account the evolutionary relationship between samples, by determining what fraction of evolutionary history is different between two samples. Weighted UniFrac also takes into account the relative abundance of each taxa, while unweighted UniFrac distance only considers presence and absence.
We will focus primarily on unweighted UniFrac in this analysis. We see more significantly different effects when we use Unweighted UniFrac distance. Compositional analysis also identifies more significantly different taxa than abundance-based comparisons.

b_div_metric
(string)
This identifies the beta diversity metric to be used in the analysis. This name will appear at the beginning of the distance matrix file.
b_num_iter
(int)
Differences in beta diversity are frequently tested using a permutative test [[23](#Bondini)]. his takes care of many of the statistical constraints associated with distance matrices. b_num_iter sets the number of permutations performed on a distance matrix during beta diversity power calculation. A large number can slow processing considerably, since we much perform the permutation several times.
b_title
(string)
The title to be displayed on the beta diversity power curve.
b_suffix
(string)
If files are saved, this string is used to differentiate alpha diversity files from beta diversity, and different beta diversity metrics.

In [8]:
b_div_metric = 'unweighted_unifrac'
b_num_iter = 249
b_title = 'Unweighted UniFrac Distance'
b_suffix = 'beta_unweighted_unifrac'

We test for differences in beta diversity using the scikit-bio function permanova. We’ll combine all the sample ids, and then use a map and distance matrix with only these observations.


In [9]:
def test_beta_permanova(ids, meta, dm, cat, num_iter=499):
    """Tests difference in beta diversity for a category and distance matrix"""
    # Gets the map and distance matrix subset
    all_ids = np.hstack(ids)
    # Calculates the permanova
    perma_results = skbio.stats.distance.permanova(dm.filter(all_ids), 
                                                   meta.loc[all_ids],
                                                   cat,
                                                   num_iter)
    # Returns the p value
    return perma_results['p-value']

Return to the top

General Parameters for Power Analysis

The empirical power calculation method in scikit-bio is based on iteration. It works by matching pairs of observations based on a set of metadata categories. A number of observations (set using min_counts, max_counts, and counts_interval) are then drawn at random from this set of matched data, and a statistical test is performed. This is repeated a number of times (given by num_iter), and the fraction of samples below a critical value (p_crit) is calculated as the power at that sample depth. This is repeated several times to determine the empirical power with some degree of confidence.

num_iter
(int)
The number of times data should be subsampled at each sampling depth to calculate the statistical power for the sample.
num_runs
(int)
The number of times paired samples should be drawn for confidence interval calculation.
p_crit
(float)
The value of $\alpha$ (the probability of a false positive) acceptable for these power calculations. Empirical power will be based on the number of iterations for a sample set that are less than this value. For historical and cultural reasons, 0.05 is often used.
min_counts
(int)
The minimum number of samples drawn from each group during statistical testing. This should be set based on the expected effect size and number of available samples.
max_counts
(int)
The maximum number of samples drawn from each group during statistical testing. This should be set based on the expected effect size and number of available samples and should not exceed the size of the smallest group.
counts_interval
(int)
A sampling interval used to determine the number of samples which should be drawn during statistical testing. Samples will be drawn in a size increasing from the min_counts, to min_counts + counts_interval, min_counts + 2*counts_interval, and so on, up to max_counts.

In [10]:
# Parameters for power iteration and calculation
num_iter = 500
num_runs = 10
p_crit = 0.05

# Parameters for drawing samples
min_counts = 5
max_counts = 51
counts_interval = 5

Return to the top

Multiple Body Site Parameters

We will determine the effect size of body site on the human microbiome positive control. Body site has been demonstrated as one of the major driving factors in the community structure of the human microbiome [24]. As a result, we expect this power analysis to show a strong effect associated with body site. We will analyze using the same metadata category used to split the data in the Preprocessing Notebook, and we will include control oral and fecal samples. To minimize variation between samples, we will pair samples from the same individual.

all_cat
(string)
The metadata category use for body site comparison.
all_order
(string)
The body sites being analyzed.
all_controls
(string)
The metadata categories used to identify matched samples.

In [11]:
# Parameters for selecting
all_cat = 'BODY_HABITAT'
all_order = ['UBERON:oral cavity', 'UBERON:feces']
all_controls = ['HOST_SUBJECT_ID']

Because the effect size of body site is expected to be quite large, the counts interval we set for general use may not be effective here. If we start at five samples and use five-sample steps, we may miss important points on the curve. So, we’ll use a different set of counts parameters for the comparison across body sites.

all_min_counts
(int)
The minimum number of samples drawn from each group during statistical testing. This should be set based on the expected effect size and number of available samples.
all_max_counts
(int)
The maximum number of samples drawn from each group during statistical testing. This should be set based on the expected effect size and number of available samples and should not exceed the size of the smallest group.
all_counts_interval
(int)
A sampling interval used to determine the number of samples which should be drawn during statistical testing.

In [12]:
all_min_counts = 2
all_max_counts = 21
all_count_interval=1

Return to the top

Parameters for Fecal Samples

We will test the effect of multiple metadata variables on fecal samples. For each of these categories, we will compare two extreme groups to look for the most extreme differences across categories. We’ll approach this power analysis by looping through a set of commands for each metadata category. As a result, it is most convenient to combine the category with the groups in the category we wish to analyze. We also select our control categories here. These are factors used to select which samples will be paired. Here, we’ve tried to select for factors which are known to have a large effect size in hopes that this will decrease noise.

fecal_cats
(list of tuples)
A list of tuples which follow the format (category, order). For example, to look at inflammatory bowel disease status, this might be (‘IDB’, [‘I do not have IBD’, ‘IBD’]). The order list allows us to select which groups we’ll compare. To analyze all groups in a category, the order position may take a value of None.
fecal_control_cats
(list of strings)
The categories used to identify matched samples. So, if we are comparing in category A, but control for B, C, and D, samples will be selected where A is different but B, C, and D are the same.

In [13]:
fecal_cats = [('IBD', ['I do not have IBD', 'IBD']),
              ('ANTIBIOTIC_SELECT', ['In the past month',
                                     'Not in the last year']),
              ('TYPES_OF_PLANTS', ['Less than 5', 'More than 30']),
              ('AGE_CAT', ['20s', '60s']),
              ('BMI_CAT', ['Normal', 'Obese']),
              ('COLLECTION_SEASON', ['Winter', 'Summer']),
              ('ALCOHOL_FREQUENCY', ['Never', 'Daily']),
              ('EXERCISE_FREQUENCY', ['Rarely', 'Daily']),
              ('SLEEP_DURATION', ['Less than 6 hours', '8 or more hours'])]

fecal_control_cats = ['IBD', 'BMI_CAT', 'TYPES_OF_PLANTS', 'DIABETES',
                      'ANTIBIOTIC_SELECT', 'AGE_CAT', 'COLLECTION_SEASON',
                      'SLEEP_DURATION']

Return to the top

Plotting Parameters

The results of our power estimation will be plotted as power curves. We can choose how the curves will be displayed by setting the plot_counts to designate smoothness, the colormap to determine the colors used, the size of the legend text, and how the categories will be labeled.

plot_counts
(array)
The number of samples which should be drawn to plot the curve. The minimum of this should not be less than two, although the maximum can exceed the number of samples in any group.
plot_colormap
(array, None)
The colors used for the lines. If None is specified, the default colors from Statsmodels will be used. When a custom colormap is passed, it should have at least as many colors as there are categories in fecal_cats.
legend_size
(int)
The size of the text appearing in the figure legend.
label_size
(array of strings)
The way each category should appear in the final legend. This should include body site.

In [14]:
# Parameters for plotting
plot_counts = np.hstack((np.array([2]), np.arange(5, 255, 5)))
plot_colormap = None
legend_size=11
labels = np.array(['Body Site', 'IBD', 'Antibiotic Use', 'Plants Consumed', 'Age',
                   'BMI', 'Season', 'Alcohol Use', 'Exercise Frequency',
                   'Sleep Duration'])

When the figures are displayed, they can be scaled for in screen display or for saving. The parameters here are set up for an optimal display when the figure is saved.

figure_size
(tuple)
The height and width of the final figure, in inches.
legend_position
(tuple)
Where the legend should be placed in the final figure. The tuple gives (left, bottom) as a fraction of the axis size.
print_position
(tuple)
A four-element description of the size of the axis in the figure. This is given in inches. The tuple is give as (left, bottom, width, height).
space_position
(tuple)
To render the legend correctly, we have to create a dummy axis. This gives the location of the dummy axis within the figure in inches from the bottom left corner. Positions are (left, bottom, width, height).

In [15]:
# Parameters for displaying the axis
figure_size = (7, 3.5)
legend_position = (1.05, 0.95)
print_position = (0.6, 0.6, 4., 2.5)
space_position = (5., 0.6, 2., 2.5)

If you choose to save the figures that are generated here, these variables create white space around the sides of the figure and make it save properly.

save_pad
(tuple)
The extra space (in inches) for display around the edge of the figure.
save_bbox
(tuple, str)
The size of the image to be saved. Using a value of 'tight' will display the entire figure and allow padding.

In [16]:
# Parameters for saving the results
save_pad = 0.5
save_bbox = 'tight'

Return to the top

Files and Directories

We need to import working OUT data for analysis and set up a location where results from our analysis can be saved.

This notebook consumes pre-processed tables (OTU tables, mapping files and distance matrices) produced by the Preprocessing Notebook. These can be downloaded individually, or the whole set is available here.

As we set up directories, we’ll make use the of the check_dir function. This will create the directories we identify if they do not exist.

Base and Working Directories

We need a general location to do all our analysis; this is the base_dir. All our other directories will exist within the base_dir.

base_dir
(string)
The filepath for the directory where any files associated with the analysis should be saved. It is suggested this be a directory called agp_analysis, and be located in the same directory as the IPython notebooks.
working_dir
(string)
The file path for the directory where all data files associated with this analysis have been stored. This should contain the results of the Preprocessing Notebook.
The working_dir is expected to be a directory called sample_data in the base_dir.
analysis_dir
(string)
The file path where analysis results should be stored. This is expected to be a folder in the base_dir.

In [17]:
base_dir = os.path.join(os.path.abspath('.'), 'agp_analysis')
div_an.check_dir(base_dir)

working_dir = os.path.join(base_dir, 'sample_data')
div_an.check_dir(working_dir)

analysis_dir = os.path.join(base_dir, 'analysis_results')
div_an.check_dir(analysis_dir)

Return to the top

All Sample Directory and Files

Part of this analysis will use the rarefied, all sample table (all samples, all body sites) for a comparison between oral and fecal samples.

all_dir
(string)
The filepath for the directory where all bodysite files are stored. This should be a directory in the working_dir.
all_map_fp
(string)
The filepath for the metadata file associated with all samples. This is expected to be a processed metadata file generated by the preprocessing notebook, and contain columns describing alpha diversity.
all_uud_fp
(string)
The filepath for the unweighted UniFrac distance matrix associated with the all sample file.

In [18]:
# Checks the all sample directory
all_dir = os.path.join(working_dir, 'all')

# Sets up the file paths in the all directory. We will only use unweighted UniFrac distance
all_map_fp = os.path.join(all_dir, 'AGP_100nt_even10k.txt')
all_uud_fp = os.path.join(all_dir, 'unweighted_unifrac_AGP_100nt_even10k.txt')

Return to the top

Fecal Sample Directories and Files

Power analysis will also focus on fecal samples. We’ve chosen to work with a table containing a single sample for all individuals who submitted a fecal sample to the American Gut Project. Within this directory, we’ll be using the results of rarefied data.

site_dir
(string)
The filepath for the directory where data sets from fecal samples are stored. This should be a directory in the working_dir.
data_dir
(string)
The filepath of the all participant single sample directory. This should be a folder in the site_dir.
data_map_fp
(string)
The filepath for the metadata file associated with the fecal samples. This is expected to be a processed metadata file generated by the preprocessing notebook, and contain columns describing alpha diversity.
data_uud_fp
(string)
The filepath for the unweighted UniFrac distance matrix associated with the fecal sample dataset.

In [19]:
site_dir = os.path.join(working_dir, 'fecal')
div_an.check_dir(site_dir)

data_dir = os.path.join(site_dir, 'all_participants_one_sample')

data_map_fp = os.path.join(data_dir, 'AGP_100nt_even10k_fecal.txt')

data_uud_fp = os.path.join(data_dir, '%s_AGP_100nt_even10k_fecal.txt' % b_div_metric)

Return to the top

Analysis Directories and Files

During the course of this analysis, we may choose to save the results we generate as files that can be re-loaded into power analysis in the future. It may be useful to save these intermediate files since the power analysis takes time to run.

results_dir
(string)
A folder where files summarizing the power calculation results for each run should be stored. This is expected to be a folder in the analysis_dir.
site_pickle_pattern
(string)
Individual power analyses (numpy arrays of the power curve results) are saved using the python Pickle module. The blanks specify the diversity metric used for comparison, the metadata category, and the two groups within that category.
The file pattern contains blanks which can be filled in with information about the specific sample.

In [20]:
results_dir = os.path.join(analysis_dir, 'power')
div_an.check_dir(results_dir)

sites_pickle_pattern = 'AGP_100nt_even10k_%(div_metric)s__%(category)s__%(group1)s__vs__%(group2)s.p'

We also have the option to save image results. Saving images this way gives you more control over the final format of the images. You can learn more about the way matplotlib saves images here.

image_dir
(string)
If power curves are being saved as images, this specifies the directory where all images should be saved. This is expected to be a folder in the analysis_dir.
power_image_dir
(string)
This directory allows us to specify power curve images from other images generated during the course of analysis. This is expected to be a directory in the image_dir.
image_pattern
(string)
The file name pattern where images generated by this notebook should be saved. The blank indicates the type of diversity metric used to generate the image.

In [21]:
image_dir = os.path.join(analysis_dir, 'images')
div_an.check_dir(image_dir)

power_image_dir = os.path.join(image_dir, 'power')
div_an.check_dir(power_image_dir)

image_pattern = os.path.join(power_image_dir, 'matched_power_%s.png')

Return to the top

Data Download

We will start our analysis using the clean, rarefied tables generated by the Preprocessing Notebook. If necessary, these files can be downloaded. The necessary files are then loaded into the notebook for analysis and processing.


In [22]:
# Gets the data for the all sample data directory
if overwrite or not (os.path.exists(all_map_fp) and os.path.exists(all_uud_fp)):
    # Downloads the directory file
    !curl -OL https://www.dropbox.com/s/x70bauf6k5cs59e/all.tgz
    # Extracts the data
    !tar -xzf all.tgz
    # Moves the directory
    os.remove(os.path.join('.', 'all.tgz'))
    shutil.move(os.path.join('.', 'all'), all_dir)

In [23]:
# Gets data for the single sample per participant fecal sample directory
if overwrite or not (os.path.exists(data_map_fp) and 
                     os.path.exists(data_uud_fp)):
    # Downloads the files
    !curl -OL https://www.dropbox.com/s/la3q3zntacei1c2/all_participants_all_samples.tgz
    # Extracts the data
    !tar -xzf all_participants_all_samples.tgz
    # Moves the directory
    os.remove(os.path.join('.', 'all_participants_all_samples.tgz'))
    shutil.move(os.path.join('.', 'all_participants_all_samples/'), data_dir)

In [24]:
# Loads the fecal files into the notebook
data_map = pd.read_csv(data_map_fp,
                       sep=txt_delim, 
                       na_values=map_nas,
                       parse_dates=date_cols,
                       index_col=False)
data_map.index = data_map[map_index]

data_uud = skbio.DistanceMatrix.read(data_uud_fp)

# Loads the all sample data into the notebook
all_map = pd.read_csv(all_map_fp,
                      sep=txt_delim, 
                      na_values=map_nas,
                      parse_dates=date_cols,
                      index_col=False)
all_map.index = all_map[map_index]

all_uud = skbio.DistanceMatrix.read(all_uud_fp)

Return to the top

Metadata Adjustment

Certain categories in American Gut are quite small, but still represent some extremes. So, we combine all participants with IBD (Crohns and Ulcerative Colitis) into a single category called IBD. We will combine people who used antibiotics in the week prior to sample collection with those who used them in the month before they took their sample. Among American Gut participants who submitted samples in rounds 1 -14, there are only 10 participants who report never exercising and provided enough data to allow them to be matched for the control categories. To compensate, we combine people who report never exercising and people who report rarely exercising.


In [25]:
# Combines individuals with Ulcerative Colitis and Crohn's disease into a
# single category
data_map.loc[data_map.IBD == "Crohn's disease", 'IBD'] = 'IBD'
data_map.loc[data_map.IBD == "Ulcerative colitis", 'IBD'] = 'IBD'

# Combines the never exercises category with the rarely exercises category
data_map.loc[data_map.EXERCISE_FREQUENCY == 'Never', 'EXERCISE_FREQUENCY'] = 'Rarely'
data_map.loc[data_map.EXERCISE_FREQUENCY == 'Rarely (few times/month)', 'EXERCISE_FREQUENCY'] = 'Rarely'

# Combines people who took antibiotics in the past week and in the past month
data_map.loc[data_map.ANTIBIOTIC_SELECT == 'In the past week', 'ANTIBIOTIC_SELECT'] = 'In the past month'

Return to the top

Multiple Bodysite Power Calculation

We’ll start by looking at the statistical power for comparisons of fecal and oral samples from the same individual for both alpha and beta diversity. We can do this either by generating new files (if overwrite is True, or the data from the analysis has not already been saved somewhere), or by loading the analysis files which already exist. Since power analysis can be computationally expensive, users who may need to reference or adjust their power curves (for example, playing with the metadata categories being examined) are encouraged to save their intermediate files, so these can be re-loaded. It can take upwards of an hour for each of these cells to run on an 8gb 2012 MacBook Pro while running other programs.


In [26]:
# Gets the alpha diversity power array. We can import this, or we can make a new one
all_alpha_blanks = {'div_metric': a_suffix,
                    'category': all_cat,
                    'group1': all_order[0].split(':')[1].replace(' ', '_'),
                    'group2': all_order[1].split(':')[1].replace(' ', '_')}
all_alpha_fp = os.path.join(results_dir, sites_pickle_pattern) % all_alpha_blanks

In [27]:
# If the power results don't exist, we generate them
if overwrite or not os.path.exists(all_alpha_fp):
    
    # Defines the test function
    alpha_test = lambda x: test_alpha_kruskal(x, all_map, a_div_metric)
    
    # Calculates the statistical power
    all_a_pwr, all_a_cnts = subsample_paired_power(test=alpha_test,
                                                   meta=all_map,
                                                   cat=all_cat,
                                                   control_cats=all_controls,
                                                   min_counts=all_min_counts,
                                                   counts_interval=all_count_interval,
                                                   max_counts=all_max_counts,
                                                   order=all_order,
                                                   num_iter=num_iter,
                                                   num_runs=num_runs,
                                                   alpha_pwr=p_crit)
    # Saves the power results
    if save_intermediates:
        pickle.dump([all_a_pwr, all_a_cnts], open(all_alpha_fp, 'wb'))

# If the power results exist somewhere else, we load them
else:
    [all_a_pwr, all_a_cnts] = pickle.load(open(all_alpha_fp, 'rb'))

In [28]:
# Gets the beta diversity power array. We can import this, or we can make a new one
all_beta_blanks = {'div_metric': b_suffix,
                   'category': all_cat,
                   'group1': all_order[0].split(':')[1].replace(' ', '_'),
                   'group2': all_order[1].split(':')[1].replace(' ', '_')}
all_beta_fp = os.path.join(results_dir, sites_pickle_pattern) % all_beta_blanks
# If the power results don't exist, we generate them
if overwrite or not os.path.exists(all_beta_fp):

    # Defines the test function
    beta_test = lambda x: test_beta_permanova(x, all_map, all_uud, all_cat, num_iter)
    # Calculates the statistical power    
    all_b_pwr, all_b_cnts = subsample_paired_power(test=beta_test,
                                                   meta=all_map,
                                                   cat=all_cat,
                                                   control_cats=all_controls,
                                                   min_counts=all_min_counts,
                                                   counts_interval=all_count_interval,
                                                   max_counts=all_max_counts,
                                                   order=all_order,
                                                   num_iter=num_iter,
                                                   num_runs=num_runs,
                                                   alpha_pwr=p_crit)
    # Saves the power results
    if save_intermediates:
        pickle.dump([all_b_pwr, all_b_cnts], open(all_beta_fp, 'wb'))

# If the power results exist somewhere else, we load them
else:
    [all_b_pwr, all_b_cnts] = pickle.load(open(all_beta_fp, 'rb'))

Return to the top

Power Calculation for Fecal Samples

We can now focus on calculating power for the fecal samples. The scikit bio function we’re using will draw fecal samples which are matched for the control categories we set earlier, but either belongs to group 1 or group 2 in the category we’re varying. If we’re varying a category that is also a control category (for example, if we’re looking at the power for IBD), this will be removed from the control categories. Once again, we have the option of saving, and then loading the analysis files, if necessary. In its current incarnation, this notebook can be run on a personal computer. However, running these two cells typically takes six to eight hours.


In [29]:
# Alocates a list for the power and count tables
a_counts = []
a_powers = []
# Prealocates an object to hold the results
for idx, (cat, order) in enumerate(fecal_cats):
    
    # Removes the category from the control cats, if necessary
    if cat in fecal_control_cats:        
        ctrl_cats = copy.deepcopy(fecal_control_cats)
        ctrl_cats.remove(cat)
    
    # Sets up the file names for the alpha and beta diversity pickle files
    alpha_blanks = {'div_metric': a_suffix,
                    'category': cat,
                    'group1': order[0].replace('In the', '').replace(' ', '_'),
                    'group2': order[1].replace(' in the', '').replace(' ', '_')}
    alpha_fp = os.path.join(results_dir, sites_pickle_pattern) % alpha_blanks
 
    # Loads or calculates the alpha diversity power
    if overwrite or not os.path.exists(alpha_fp):
        alpha_test = lambda x: test_alpha_kruskal(x, data_map, a_div_metric)
        a_pwr, a_cnts = subsample_paired_power(test=alpha_test,
                                               meta=data_map,
                                               cat=cat,
                                               control_cats=ctrl_cats,
                                               min_counts=min_counts,
                                               counts_interval=counts_interval,
                                               max_counts=max_counts,
                                               order=order,
                                               num_iter=num_iter,
                                               num_runs=num_runs,
                                               alpha_pwr=p_crit)
        if save_intermediates:
            pickle.dump([a_pwr, a_cnts], open(alpha_fp, 'wb'))
    else:
        [a_pwr, a_cnts] = pickle.load(open(alpha_fp, 'rb'))

    # Adds the results to the vector
    a_powers.append(a_pwr)
    a_counts.append(a_cnts)

In [30]:
# Prealocates a list for the power and count tables
b_counts = []
b_powers = []

# Prealocates an object to hold the results
for idx, (cat, order) in enumerate(fecal_cats):
    
    # Removes 
    if cat in fecal_control_cats:        
        ctrl_cats = copy.deepcopy(fecal_control_cats)
        ctrl_cats.remove(cat)
    beta_blanks = {'div_metric': b_suffix,
                   'category': cat,
                   'group1': order[0].replace('In the', '').replace(' ', '_'),
                   'group2': order[1].replace(' in the', '').replace(' ', '_')}
    beta_fp = os.path.join(results_dir, sites_pickle_pattern) % beta_blanks

    # Loads or calculates the beta diversity power
    if overwrite or not os.path.exists(beta_fp):
        beta_test = lambda x: test_beta_permanova(x, data_map, data_uud, cat, num_iter)
        b_pwr, b_cnts = subsample_paired_power(test=beta_test,
                                              meta=data_map,
                                               cat=cat,
                                               control_cats=ctrl_cats,
                                               min_counts=min_counts,
                                               counts_interval=counts_interval,
                                               max_counts=max_counts,
                                               order=order,
                                               num_iter=num_iter,
                                               num_runs=num_runs,
                                               alpha_pwr=p_crit)
        if save_intermediates:
            pickle.dump([b_pwr, b_cnts], open(beta_fp, 'wb'))
    else:
        [b_pwr, b_cnts] = pickle.load(open(beta_fp, 'rb'))
    # Adds the results to the vector
    b_powers.append(b_pwr)
    b_counts.append(b_cnts)

Return to the top

Effect Size Estimation

Now that we have a list of power estimations at count depths, we can use the Statsmodels python package to estimate the effect size [27]. Statsmodels will take the power, number of samples, and critical value and solve for an effect size.


In [31]:
# Calculates the effect size and confidence interval for the alpha and beta diversity
a_eff_means, a_eff_bounds = pp.collate_effect_size(a_counts, a_powers, 
                                                   alpha=p_crit)
b_eff_means, b_eff_bounds = pp.collate_effect_size(b_counts, b_powers, 
                                                   alpha=p_crit)

We will order our later displays from largest to smallest using the effect size. We’ll use beta diversity for this effect, since it is more likely to reveal differences in community structure.


In [32]:
# Determines the order for the category
plot_order = np.argsort((b_eff_means + a_eff_means)/2)[::-1]
a_order = np.argsort(a_eff_means)[::-1]
b_order = np.argsort(b_eff_means)[::-1]
# Orders the colormap, if necessary
if plot_colormap is not None:
    plot_colormap = plot_colormap[plot_order, :]

We can also use the effect sizes to calculate the number of samples we’d need to analyze to get 80% power in each sample category. Keep in mind that these are not the actual number of samples which need to be collected. A variety of factors, including a low biomass sample and sheer dumb luck, may contribute to or prevent a sample from being amplified during PCR. We expect between 80 and 95% of samples to be amplified, depending on the source. If samples are low biomass, difficult to obtain, or cannot be resequenced for some reason, technical replicates may be used to help correct for this sequencing failure. Within the American Gut Project, we resequence the second swab if the first fails to amplify.


In [33]:
html_table = pp.summarize_effect(labels[plot_order + 1], a_eff_means[plot_order], b_eff_means[plot_order],
                                 a_eff_bounds[plot_order], b_eff_bounds[plot_order])
# html_table
HTML(html_table)


Out[33]:
Category Alpha Beta
IBD 20 ± 2 15 ± 2
Plants Consumed 35 ± 8 20 ± 2
Antibiotic Use 40 ± 7 25 ± 4
Age 100 ± 20 25 ± 2
Alcohol Use 60 ± 9 35 ± 2
Sleep Duration 40 ± 7 55 ± 7
Season 80 ± 11 45 ± 2
Exercise Frequency 270 ± 95 60 ± 8
BMI 435 ± 174 60 ± 9

We can see from the effect size calculations that, at a minimum, single-site comparisons requires at least twenty cross sectional samples per group. So, for IBD sampling, we’d need to analyze 20 IBD cases and 20 IBD controls to see a significant difference 80% of the time.

Return to the top

Power Curve Plotting

Finally, we’re going to plot power curves to help us visualize the differences in effect size. We’re going to first plot the power from the fecal samples. Then, we’ll add the trace line for our bodysite data.

Alpha Diversity

Let’s start by plotting the alpha diversity curve.


In [34]:
# Plots the power curve figure
alpha_fig = pp.plot_effects(effect_means=a_eff_means[plot_order],
                            effect_bounds=a_eff_bounds[plot_order],
                            labels=labels[plot_order + 1],
                            sample_counts=plot_counts,
                            colormap=plot_colormap,
                            title=a_title,
                            legend_size=legend_size)
# Adds the body site figure
pp.add_average_trace(alpha_fig, all_a_pwr, all_a_cnts, 
                     labels[np.hstack((0, plot_order+1))],
                     legend_pad=space_position,
                     figure_size=figure_size,
                     legend_position=legend_position)

# Saves the figure, if desired
if save_images:
    alpha_fig.savefig(image_pattern % a_suffix,
                      pad_inches=save_pad,
                      bbox_inches=save_bbox)


When we read the power curve, it relates the number of samples to the power, or the fraction of times we see a significant difference, based on the assumption there is a significant difference. Steeper curves, closer to the left of the plot, being to factors which have larger effects. In these plots, we’ve chosen to represent the confidence interval around each power curve with dashed lines. We’d expect that in 95% of experiments, the number of samples needed to analyze the data would fall between these values, assuming we’ve selected a representative sample. Since we’re basing our results on the data we currently have, the effect may change as the population becomes more defined.

Return to the top

Beta Diversity

We can also look at the power associated with unweighted UniFrac Distance.


In [35]:
# Plots the power curve figure
beta_fig = pp.plot_effects(effect_means=b_eff_means[plot_order],
                           effect_bounds=b_eff_bounds[plot_order],
                           labels=labels[plot_order + 1],
                           sample_counts=plot_counts,
                           colormap=plot_colormap,
                           title=b_title,
                           legend_size=legend_size)
# Adds the body site figure
pp.add_average_trace(beta_fig, all_b_pwr, all_b_cnts, 
                     labels[np.hstack((0, b_order+1))],
                     legend_pad=space_position,
                     figure_size=figure_size,
                     legend_position=legend_position)

# Saves the figure, if desired
if save_images:
    beta_fig.savefig(image_pattern % b_suffix,
                      pad_inches=save_pad,
                      bbox_inches=save_bbox)


Return to the top

Discussion

From our empirical effect size estimations, we can begin to rank effects of different factors on the microbiome. As demonstrated by the Human Microbiome Project (HMP), body site has the largest effect on the human microbiome [24].

We find that an IBD diagnosis has one of the largest impacts on the gut microbiome. While dysbiosis in IBD is likely a complex relationship between the host genetics, host lifestyle, and the microbiome, the disease is characterized by a profoundly altered microbiome and intestinal inflammation [13-15]. As with IBD, the microbiome is considered strongly predictive of obesity [16, 17, 28]. However, we find the effect of obesity on the microbiome is smaller than lifestyle factors like sleep and exercise for both alpha and beta diversity. This finding is corroborated, to some degree, by a recent meta analysis which observed consistent IBD-related microbial signatures across studies while obesity-related dysbiosis was more strongly influenced by technical effects [13].

We might have predicted the difference in magnitude of these effects based on gut pathology. IBD has been recognized as a disease of the intestines and a diagnosis is based on intestinal pathology [29, 30]. It’s likely the complex interactions of IBD-specific bacteria and bacterial strains and host inflammation shapes the community structure in a unique way. It seems less likely we will see such a strong disease-associated effect in conditions that do not traditionally have gut involvement. It might be more reasonable to expect they will have an effect size closer to obesity.

We also find that antibiotic use has a large effect on both alpha and beta diversity. This has been previously recognized [10-12]. Intuitively, it’s also not entirely unexpected; the goal of antibiotic treatment is to kill bacteria. Therefore, a systemic dose of antibiotics will likely be associated with off-target effects.

Among lifestyle factors, we see the one dietary variable, the number of types of plants consumed in a week, has the largest influence. This effect is larger than the effect of recent antibiotic use. Long term dietary patterns are known to have a strong effect on the microbiome [6-9]. In conjunction with this finding, we also observe that daily alcohol consumption has a smaller effect on the microbiome than plant consumption, or age, but a larger effect than obesity.
We also found that sleep duration has a moderate effect. Individuals who sleep less six hours or less each night have a less diverse, shifted gut microbiome than those who sleep more than eight hours per night. This effect is larger than the effect of obesity. Circadian dysfunction has been linked to obesity, although the mechanism frequently cited is a disruption of mammalian core clock genes [31].
Exercise is known to have an effect on the microbiome, and the effect of daily exercise, compared to those who exercise less than once a week, is noticeable, but small in the scheme of the effects investigated here [32, 33]. The effect of daily exercise on beta diversity is of the same order of magnitude as the effect of obesity, while the effect on alpha diversity is much larger.

We also find strong effects associated with temporal variation. Changes in the microbiome at the extremes of the human lifespan - from early development to the effects of aging on the microbiome have been well studied [3-6]. The most extreme changes in the microbiome occur during the first three years of life; an infant’s microbiome is highly plastic and does not resemble an adult microbiome at the same site [3, 4, 34]. We see a strong effect of age in adults: age has almost as large an effect on beta diversity as antibiotic use, although the effect on alpha diversity is much smaller.

We also see a seasonal effect, which was previously observed in a population of 60 matched samples [19]. The effect of collection season on the microbiome is on the same order as the effect we see for sleep duration, and larger than the effect seen for obesity. This may encourage a shift in experiment design. At a minimum, it suggests that tracking collection date in human microbiome studies may be important.

Our power analysis allows us to rank the effect sizes of disease, lifestyle, and temporal factors on the human microbiome. We find that effect sizes are generally larger in beta diversity than in alpha diversity analyses, suggesting that most changes associated with the extreme states are caused by changes in the taxa present between the two states. A more concerning aspect to the power analysis, however, suggests that many cross-sectional microbiome studies currently in the literature are underpowered. We hope the effect size predictions here can serve as a guide for future studies, and encourage appropriate sample size selection.

Return to the top

References

  1. Nakagawa, S. and Cuthill, I.C. (2007) “Effect size, confidence interval and statistical significance: a practical guide for biologists.” Biol Rev Camb Philos Soc. 82: 591 - 605.

  2. Sullivan, G.M. and Feninn, R. (2012) “Using effect size - or why p value is not enough.” J Grad Med Educ. 4: 279 - 282.

  3. Koenig, J.E.; Spor, A.; Scalfone, N.; Fricker, A.D.; Stombaugh, J.; Knight, R.; Angenent, L.T.; and Ley, R.E. (2011). “Succession of microbial consortia in the developing infant gut microbiome.</a>” PNAS. 108 Suppl 1: 4578 - 4585.

  4. Yatsunenko, T.; Rey, F.E.; Manary, M.J.; Trehan, I.; Dominguez-Bello, M.G.; Contreras, M.; Magris, M.; Hidalgo, G.; Baldassano, R.N.; Anokhin, A.P.; Heath, A.C.; Warner, B.; Rdder, J.; Kuczynski, J.; Caporaso, J.G.; Lozupone, C.A.; Lauber, C.; Clemente, J.C.; Knights, D.; Knight, R. and Gordon, J.I. (2012) “Human Gut microbiome viewed across age and geography.” Nature. 486: 222-227.

  5. Claesson, M.J.; Cusacks, S.; O’Sullivan, O.; Greene-Diniz, R.; de Weerd, H.; Flannery, E.; Marchesi, J.R.; Falush, D.; Dinan, T.; Fitzgerald, G.; Stanton, C.; van Sinderen, D.; O’Connor, M.; Harnedy, N.; O’Connor, K.; Henry, C.; O’Mahony, D.; Fitzgerald, A.P.; Shananhan, F.; Twomey, C.; Hill, C.; Ross, R.P.; and O’Toole, P.W. (2011). Composition, variability and temporal stability of the intestinal microbiota of the elderly.” PNAS. 108 Suppl 1: 4586 - 4591.

  6. Claesson, M.J.; Jeffery, I.B.; Conde, S.; Power, S.E.; O’Connor, E.M.; Cusack, S.; Harris, H.M.; Coakley, M.; Lakshminarayanan, B.; O’Sullivan, O.; Fitzgerald, G.F; Deane, J.; O’Connor, M.; Harnedy, N.; O’Connor, K.; O’Mahony, D.; van Sinderen, D.; Wallace, M.; Brennan, L.; Stanton, C.; Marchesi, J.R.; Fitzgerald, A.P.; Shanahan, F.; Hill, C.; Ross, R.P.; and O’Toole, P.W. (2012). “Gut microbiota composition correlates with diet and health in the elderly.” Nature. 488: 178-184.

  7. Muegge, B.D.; Kuczynski, J.; Knights, D.; Clemente, J.C.; Gonzalez, A.; Fontana, L.; Henrissat, B.; Knight, R.; and Gordon, J.I. (2011) “Diet drive convergence in gut microbiome functions across mammalian phylogeny and within humans.” Science. 332: 970 - 974.

  8. Wu, G.D.; Chen, J.; Hoffmann, C.; Bittinger, K.; Chen, Y.Y.; Keilbaugh, S.A.; Bewtra, M.; Knights, D.; Walters, W.A.; Knight, R.; Sinah, R.; Gupta, K.; Baldassano, R.; Nessel, L.; Li, H.; Bushman, F.D.; Lewis, J.D. (2011) “Linking long-term dietary patterns with gut microbiome enterotypes.Science. 334: 105-108.

  9. David, L.A.; Maurice, C.F.; Carmondy, R.N.; Gottenberg, D.B.; Button, J.E.; Wolfe, B.E.; Ling, A.V.; Devlin, A.S.; Varma, Y.; Fischbach, M.A.; Biddinger, S.B.; Button, R.J.; Turnbaugh, P.J. “Diet rapidly and reproducibly alters the human gut microbiome.” Nature. 505: 559-563.

  10. Manichanh, C.; Reeder, J.; Gibert, P.; Varela, E.; Llopis, M.; Antolin, M.; Guigo, R.; Knight, R.; Gaurner, F. (2010) “Reshaping the gut microbiome with bacterial transplantation and antibiotic intake.” Genome Res. 20: 1411 - 1419.

  11. Dethlefsen, L. and Relman, D.A. (2011) “Incomplete recovery and individualized responses of the human distal gut microbiota to repeated antibiotic perturbation.” PNAS. 108 Suppl 1: 4554-4561.

  12. Jakobsson, H.E.; Jernberg, C.; Andersson, A.F.; Sjölund-Karlsson, M; Jansson, J.K.; and Engstrand, L. (2010). “Short-term antibiotic treatment has differing long-term impacts on the human throat and gut microbiome.” PLoS One. 5: e9836.

  13. Walters, W.A.; Zu, Z.; and Knight, R. (2014) "Meta-analysis of human gut microbes associated with obesity and IBD.” FEBS Letters. 588: 4223-4233.

  14. Knights, D.; Silverberg, M.S.; Weersma, R.K.; Gevers, D.; Dijkstra, G.; Huang, H.; Tyler, A.D.; von Sommeren, S.; Imhann, F.; Stempak, J.M.; Huang, H.; Vangay, P.; Al-Ghalith, G.A.; Russell, C.; Sauk, J.; Knight, J.; Daly, M.J.; Huttenhower, C.; and Xavier, R.J. (2014). "Complex host genetics influence the microbiome in inflammatory bowel disease." Genome Med. 6: 107.

  15. Rehman, A.; Rausch, P.; Wang, J.; Skieceviciene, J.; Kiudelis, G.; Bhagalia, K.; Amarapurkar, D.; Kupcinskas, L.; Schreiber, S.; Rosenstiel, P.; Baines, J.F.; and Ott, S. (2015). "Geographical patterns of the standing and active human gut microbiome in health and IBD." Gut. doi: 10.1136/gutjnl-2014-308341

  16. Turnbaugh, P.J.; Bäckhed, F.; Fulton, L.; and Gordon, J.I. (2008). “Diet-induced obesity is linked to marked but reversible alterations in the mouse distal gut microbiome.” Cell Host Microbe. 3: 213 - 223.

  17. Turnbaugh, P.J.; Hamady, M.; Yatsunenko, T.; Cantarel, B.L.; Duncan, A.; Ley, R.E.; Sogin, M.L.; Jones, W.J.; Roe, B.A.; Affourtit, J.P.; Egholm, M.; Henrissat, B.; Knight, R.; and Gordon, J.I. (2009). "A core gut microbiome in obese and lean twins." Nature. 457: 480 - 484.

  18. Ridaura, V.K.; Faith, J.J.; Rey, F.E.; Cheng, J.; Duncan, A.E.; Kau, A.L.; Griffin, N.W.; Lombard, V.; Henrissat, B.; Bain, J.R.; Muehlbauer, M.J.; Ilkayeva, O.; Semenkovich, C.F.; Funai, K.; Hayashi, D.K.; Lyle, B.J.; Martini, M.C.; Ursell, L.K.; Clemente, J.C.; Van Treuren, W.; Walters, W.A.; Knight, R.; Newgard, C.B.; Heath, A.C.; and Gordon, J.I. (2013). "Gut microbiota from twins discordant for obesity modulate metabolism in mice." Science. 341: 1241214.

  19. Davenport, E.R.; Mizrahi-Man, O.; Michelini, K.; Barreiro, L.B.; Ober, C.; Gilad, Y. (2014). “Seasonal Variation in human gut microbiome composition.” PLoS One. 9:e90731.

  20. Cohen, J. (1988) “The Analysis of Variance.pdf)”. Statistical Power Analysis for the Behavioral Sciences. Ch 8. Second Ed. Hillsdale: Lawrence Erlbaum and Associates. pp. 273 - 288.

  21. Zar, J. (1999) Biostatistical Analysis. Fourth Ed. Upper Saddle River: Prentice Hall. pp 185.

  22. La Rosa, P.S.; Brooks, J.P.; Deych, E.; Boone, E.L.; Edwards, D.J.; Wang, Q.; Sodergren, E.; Weinstock, G.; Shannon, W.D. (2012) “Hypothesis testing and power calculations for taxonomic-based human microbiome data.” Plos One. 7:e52078.

  23. Bondini, M.E.; Mielke, P.W.; Berry, K.J. (1988) "Data-dependent permutation techniques for the analysis of ecological data.” Vegetatio. 75: 161-168.

  24. The Human Microbiome Consortium. (2012) “Structure, Function and diversity of the healthy human microbiome.” Nature. 486: 207-214.

  25. Eckburg, P.B.; Bik, E.M.; Bernstein C.N.; Purdom, E.; Dethlefson, L.; Sargent, M.; Gill, S.R.; Nelson, K.E.; Relman, D.A. (2005) “Diversity of the human intestinal microbial flora.Science. 308: 1635-1638

  26. Lozupone, C.; and Knight, R. (2005). “UniFrac: a new phylogenetic method for comparing microbial communities.” Appl Enviro Microbiol. 71: 8228-8235.

  27. Seabold. J.S. and Perktold. J. (2010). “Statsmodels: Econometric and Statistical Modeling in Python.” Proceedings of the 9th Python in Science Conference.

  28. Hartstra, A.V.; Bouter, K.E.; Bäckhed, F.; Nieuwdorp, M. (2015) "Insights into the role of the microbiome in obesity and type 2 diabetes." Diabetes Care. 38: 159-165.

  29. Bernstein, C.N.; Fried, M.; Krabshuis, J.H.; Cohen, H.; Eliakim, R.; Fedail, S.; Gearry, R.; Goh, K.L.; Hamid, S.; Khan, A.G.; LeMair, A.W.; Malfertheiner, Ouyang, Q.; Rey, J.F.; Sood, A.; Steinwurz, F.; Thomsen, O.O.; Thomson, A; and Watermeyer, G. (2010). “World Gastroenterology Organization Practice Guidelines for the diagnosis and management of IBD in 2010.” Inflamm Bowel Dis. 16: 112-124.

  30. Cosnes, J.; Gower-Rousseau, C.; Seksik, P.; Cortot, A. (2011). “Epidemiology and natural history of inflammatory bowel disease.” Gastroenterology. 140: 1785 - 1794.

  31. Shi, S.Q.; Ansari, T.S.; McGuinness, O.P.; Wasserman, D.H.; and Johnson, C.H. (2013). “Circadian disruption leads to insulin resistance and obesity.” Curr Biol. 23: 372 - 381.

  32. Clarke, S.F.; Murphy, E.F.; O’Sullivan, O.; Lucey, A.J.; Humphreys, M.; Hogan, A.; Hayes, P.; O’Reilly, M.; Jeffery, I.B.; Wood-Martin, R.; Kerins, D.M.; Quigley, E.; Ross, R.P.; O’Toole, P.W.; Molloy, M.G.; Falvey, E.; Shanahan, F.; and Cotter, P.D. “Exercise and associated dietary extremes impact on gut microbial diversity.” Gut. 63: 1913-1920.

  33. Evans, C.C.; LePard, K.J.; Kwak, J.W.; Stancukas, M.C.; Laskowski, S.; Dougherty, J.; Moulton, L.; Glawe, A.; Wang, Y.; Leone, V.; Antonpoulos, D.A.; Smith, D.; Chang, E.B.; and Ciancio, M.J. (2014). “Exercise prevents weight gain and alters the gut microbiota in a mouse model of high fat diet-induced obesity.” PLoS One. 9: e92193.

  34. Dominguez-Bello, M.G.; Costello, E.K.; Contreras, M.; Magris, M.; Hidalgo, G.; Fierer, N.; and Knight, R. (2010). “Delivery mode shapes the acquisition and structure of the initial microbiota across multiple body habitats in newborns.” PNAS. 107: 11971-11975.

Return to the top