In [1]:
import calour as ca
In [2]:
ca.set_log_level(11)
In [3]:
%matplotlib notebook
We will use the chronic fatigue syndrome data from:
Giloteaux, L., Goodrich, J.K., Walters, W.A., Levine, S.M., Ley, R.E. and Hanson, M.R., 2016.
Reduced diversity and altered composition of the gut microbiome in individuals with myalgic encephalomyelitis/chronic fatigue syndrome.
Microbiome, 4(1), p.30.
In [4]:
cfs=ca.read_amplicon('data/chronic-fatigue-syndrome.biom',
                     'data/chronic-fatigue-syndrome.sample.txt',
                     normalize=10000,min_reads=1000)
In [5]:
cfs=cfs.filter_abundance(10)
In [6]:
cfs=cfs.cluster_features()
In [7]:
cfs=cfs.sort_samples('Subject')
In the interactive heatmap, clicking on a bacterium shows a list of all database results for the selected bacterium.
We can choose which databases to query with the databases=['dbbact',...]
parameter. The available databases depend on which database modules are installed.
Currently supported microbiome database interfaces include:
dbBact - a community database of manual annotations about bacteria (interface installation instructions at dbbact-calour).
SpongeEMP - an automatic database for sea sponge samples (interface installation instructions at spongeworld-calour).
phenoDB - phenotypic information about selected bacteria (interface installation instructions at pheno-calour).
By default, calour uses the dbBact database for microbiome data.
In [8]:
cfs.plot(sample_field='Subject',gui='jupyter')
Out[8]:
In [9]:
dd=cfs.diff_abundance(field='Subject',val1='Control',val2='Patient', random_seed=2018)
In [11]:
dd.plot(sample_field='Subject', gui='jupyter', databases=['dbbact','sponge'],bary_fields=['_calour_direction'])
Out[11]:
In [12]:
ax, enriched=dd.plot_diff_abundance_enrichment()
The enriched terms are returned as a calour experiment (terms are features, bacteria are samples), so we can list the enriched terms together with their p-value (pval) and effect size (odif).
In [13]:
enriched.feature_metadata
Out[13]:
We can plot a heatmap of the enriched terms to see the term scores for each bacterium.
Note that the rows are now the bacteria and the columns are the terms.
In [14]:
enriched.plot(gui='jupyter', databases=[], feature_field='term',sample_field='group',
              yticklabel_kwargs={'rotation': 0, 'size': 7})
Out[14]:
We want to see all the annotations in which a given term appears, and which bacteria from either group (CFS or healthy) appear in those annotations. To do this, we use dbbact.show_term_details_diff(). The output of this function is an experiment where each COLUMN is a bacterium and each row is an annotation, so we can see whether each bacterium appears in each annotation. Color indicates the annotation type.
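To make the layout of that output concrete, here is a minimal sketch of an annotation-by-bacterium presence table in plain Python. It is not the dbbact-calour implementation: the annotation texts, the sequence IDs and the helper name presence_matrix are all hypothetical toy values chosen for illustration.

```python
# Hypothetical toy data: annotation text -> set of bacteria (sequence IDs) it mentions.
annotations = {
    "higher in rural vs urban": {"seqA", "seqB"},
    "common in feces": {"seqA", "seqC"},
    "dominant in small village": {"seqB"},
}
bacteria = ["seqA", "seqB", "seqC"]  # column order


def presence_matrix(annotations, bacteria):
    """Return (row_labels, matrix) with one row per annotation and one
    column per bacterium; a cell is 1 if the bacterium appears in the annotation."""
    rows = sorted(annotations)
    matrix = [[1 if b in annotations[r] else 0 for b in bacteria] for r in rows]
    return rows, matrix


rows, matrix = presence_matrix(annotations, bacteria)
for label, row in zip(rows, matrix):
    print(row, label)
```

In the real heatmap the cell color also encodes the annotation type, which this sketch leaves out.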
In [15]:
dbbact=ca.database._get_database_class('dbbact')
In [16]:
term_info_exp = dbbact.show_term_details_diff('small village',dd,gui='jupyter')
In [17]:
ax, enriched=dd.plot_diff_abundance_enrichment(term_type='annotation')
In [18]:
enriched.feature_metadata
Out[18]:
In [19]:
ax, enriched=dd.plot_diff_abundance_enrichment(term_type='combined')
In [20]:
enriched.feature_metadata
Out[20]:
If our experiment is already in dbBact, or if there are other experiments in dbBact that we do not want to include in the enrichment analysis, we can exclude them using the ignore_exp=[expID,...]
parameter.
In our case, the cfs experiment has already been added to dbBact, so let's ignore its annotations when doing the analysis. By looking at dbBact.org we know its experimentID is 12. Alternatively, we can use ignore_exp=True
to automatically detect the current experimentID if it exists in dbBact (using the md5 hashes of the data and mapping files).
In [21]:
ax, enriched=dd.plot_diff_abundance_enrichment(term_type='combined', ignore_exp=[12])
In [22]:
cfs=cfs.add_terms_to_features(dbname='dbbact',use_term_list=['feces','saliva','skin','mus musculus'])
In [23]:
tt=cfs.sort_by_metadata('common_term',axis='feature')
In [25]:
tt.plot(sample_field='Subject', bary_fields=['common_term'], gui='jupyter')
Out[25]:
Instead of only comparing the bacteria enriched in each of the two groups (and then comparing the terms between them), we can compute a weighted term average for each group using all bacteria (weighting the terms of each bacterium by its frequency in the sample). This can work even if we don't have a strong set of bacteria separating the two groups.
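The weighting idea can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not how dbbact.sample_enrichment is implemented: the per-bacterium term scores, the sample frequencies and the helper name weighted_term_scores are all hypothetical.

```python
# Hypothetical toy data.
term_scores = {  # bacterium -> {term: score}
    "seqA": {"feces": 1.0, "saliva": 0.0},
    "seqB": {"feces": 0.0, "saliva": 1.0},
}
samples = {  # sample -> {bacterium: frequency in that sample}
    "control1": {"seqA": 0.75, "seqB": 0.25},
    "patient1": {"seqA": 0.25, "seqB": 0.75},
}


def weighted_term_scores(freqs, term_scores):
    """Average the term scores of all bacteria in one sample,
    weighting each bacterium by its relative frequency."""
    total = sum(freqs.values())
    result = {}
    if total == 0:
        return result  # empty sample: nothing to average
    for bact, freq in freqs.items():
        for term, score in term_scores.get(bact, {}).items():
            result[term] = result.get(term, 0.0) + score * freq / total
    return result


per_sample = {name: weighted_term_scores(f, term_scores) for name, f in samples.items()}
for name, scores in per_sample.items():
    print(name, scores)
```

Each sample ends up with one score per term, and those per-sample scores can then be compared between the two groups instead of comparing lists of enriched bacteria.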
In [26]:
dbbact=ca.database._get_database_class('dbbact')
In [27]:
enriched=dbbact.sample_enrichment(cfs,'Subject','Control','Patient',
                                  term_type='combined',ignore_exp=[12])
In [28]:
enriched.feature_metadata
Out[28]: