Calour microbiome databases interface tutorial

Setup


In [1]:
import calour as ca



In [2]:
ca.set_log_level(11)

In [3]:
%matplotlib notebook

Load the data

We will use the chronic fatigue syndrome data from:

Giloteaux, L., Goodrich, J.K., Walters, W.A., Levine, S.M., Ley, R.E. and Hanson, M.R., 2016. Reduced diversity and altered composition of the gut microbiome in individuals with myalgic encephalomyelitis/chronic fatigue syndrome. Microbiome, 4(1), p.30.


In [4]:
cfs=ca.read_amplicon('data/chronic-fatigue-syndrome.biom',
                     'data/chronic-fatigue-syndrome.sample.txt',
                     normalize=10000,min_reads=1000)


2019-04-18 14:16:27 INFO loaded 87 samples, 2129 features
2019-04-18 14:16:27 WARNING These have metadata but do not have data - dropped (1): {'ERR1331814'}
2019-04-18 14:16:27 INFO After filtering, 87 remaining
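
The normalize and min_reads arguments can be pictured with a small stand-alone sketch. It assumes that shallow samples are dropped first and that each remaining sample is then rescaled to the same total; the function name and the toy data are hypothetical and this is not calour's own code.

```python
def normalize_and_filter(samples, normalize=10000, min_reads=1000):
    """Drop samples with fewer than min_reads total reads, then rescale the rest.

    samples maps a sample ID to a {feature: read count} dict.
    """
    kept = {}
    for sample_id, counts in samples.items():
        total = sum(counts.values())
        if total < min_reads:
            continue  # too shallow to compare with the others; discard it
        scale = normalize / total
        kept[sample_id] = {feature: count * scale for feature, count in counts.items()}
    return kept


# Hypothetical toy data: two samples deep enough to keep, one too shallow.
toy = {
    'S1': {'seqA': 1500, 'seqB': 500},
    'S2': {'seqA': 300, 'seqB': 2700},
    'S3': {'seqA': 40, 'seqB': 10},
}
normalized = normalize_and_filter(toy)
```

After this step every kept sample sums to 10000, so abundances can be compared across samples regardless of sequencing depth.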

Preprocess

Remove uninteresting (low-abundance) bacteria, cluster the bacteria, and sort the samples by disease status.


In [5]:
cfs=cfs.filter_abundance(10)


2019-04-18 14:16:28 INFO After filtering, 1100 remaining
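
As a rough illustration of abundance filtering, the sketch below keeps only features whose abundance, summed over all samples, reaches a cutoff. That the cutoff applies to the sum across samples, and that the boundary is inclusive, are assumptions made for this example; the function name and the toy table are hypothetical.

```python
def filter_by_total_abundance(table, cutoff):
    """Keep features whose summed abundance over all samples is >= cutoff.

    table maps a feature to its list of per-sample abundances.
    """
    return {feature: values for feature, values in table.items() if sum(values) >= cutoff}


# Hypothetical toy table with three samples per feature.
toy_table = {
    'seqA': [4.0, 5.0, 6.0],   # total 15, kept
    'seqB': [0.0, 1.0, 2.0],   # total 3, dropped
    'seqC': [10.0, 0.0, 0.0],  # total 10, kept (equal to the cutoff)
}
filtered = filter_by_total_abundance(toy_table, cutoff=10)
```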

In [6]:
cfs=cfs.cluster_features()


2019-04-18 14:16:28 INFO After filtering, 1100 remaining

In [7]:
cfs=cfs.sort_samples('Subject')

Viewing database annotations

In the interactive heatmap, clicking on a bacterium shows a list of all database results for the selected bacterium.

We can choose which databases to use with the databases=['dbbact',...] parameter. The available databases depend on which database interface modules are installed.

Currently supported microbiome database interfaces include:

  • dbBact - a community database for manual annotations about bacteria (interface installation instructions at dbbact-calour).

  • SpongeEMP - an automatic database for sea sponge samples (interface installation instructions at spongeworld-calour).

  • phenoDB - phenotypic information about selected bacteria (interface installation instructions at pheno-calour).

By default, calour uses the dbBact database for microbiome data.


In [8]:
cfs.plot(sample_field='Subject',gui='jupyter')


Out[8]:
<calour.heatmap.plotgui_jupyter.PlotGUI_Jupyter at 0x1a18f962b0>

dbBact enrichment of selected bacteria

By selecting a set of bacteria (using shift+click or ctrl+click) and pressing the "Enrichment" button, we get a list of terms that are significantly enriched in the selected bacteria compared to the rest of the bacteria in the plot.

Adding dbBact annotations

(Only possible using the gui='qt5' GUI)

To add a new annotation to the selected set of bacteria, choose the "Annotate" button.

Detailed instructions are available at the dbBact.org website.

Differential abundance

Find the bacteria whose abundance differs significantly between the 'Control' (healthy) and 'Patient' (sick) samples, as recorded in the 'Subject' field.


In [9]:
dd=cfs.diff_abundance(field='Subject',val1='Control',val2='Patient', random_seed=2018)


2019-04-18 14:16:41 INFO 87 samples with both values
2019-04-18 14:16:41 INFO After filtering, 1100 remaining
2019-04-18 14:16:41 INFO 39 samples with value 1 (['Control'])
2019-04-18 14:16:41 INFO number of higher in Control: 38. number of higher in Patient : 16. total 54
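
To give a feel for what a seeded, permutation-based group comparison looks like, here is a minimal sketch for a single bacterium. It is a plain two-sided permutation test on the difference of means, which is a simplification chosen for illustration and not necessarily the statistic or multiple-testing correction that diff_abundance applies; the function name and data are hypothetical.

```python
import random
from statistics import mean


def permutation_pvalue(group1, group2, num_perm=1000, random_seed=2018):
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(random_seed)  # fixed seed makes the result reproducible
    observed = abs(mean(group1) - mean(group2))
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    hits = 0
    for _ in range(num_perm):
        rng.shuffle(pooled)  # randomly reassign the group labels
        if abs(mean(pooled[:n1]) - mean(pooled[n1:])) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (num_perm + 1)


# Hypothetical abundances of one bacterium in the two groups.
control = [10, 11, 12, 13, 14]
patient = [0, 1, 2, 3, 4]
pval = permutation_pvalue(control, patient)
```

Fixing the seed, as random_seed=2018 does in the cell above, makes repeated runs return the same set of significant bacteria.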

Plot the significant bacteria

When clicking on a bacterium, we'll get information from both dbBact and SpongeEMP, the two databases passed in the databases parameter.


In [11]:
dd.plot(sample_field='Subject', gui='jupyter', databases=['dbbact','sponge'],bary_fields=['_calour_direction'])


Out[11]:
<calour.heatmap.plotgui_jupyter.PlotGUI_Jupyter at 0x1a1946dfd0>

dbBact term enrichment (diff_abundance_enrichment)

We can ask what is special about the bacteria that are significantly higher in the Control group compared to the Patient group, and vice versa.

  • Note: since the per-feature annotations are fetched from dbBact, this command needs a live internet connection.

Default parameters


In [12]:
ax, enriched=dd.plot_diff_abundance_enrichment()


2019-04-18 14:17:31 INFO Getting dbBact annotations for 54 sequences, please wait...
2019-04-18 14:17:40 INFO got 609 annotations
2019-04-18 14:17:40 INFO Got 4096 annotation-sequence pairs
2019-04-18 14:17:40 INFO Added annotation data to experiment. Total 609 annotations, 54 terms
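
The idea behind term enrichment can be sketched without a database connection: given the terms attached to each bacterium, compare how often a term appears in one group of bacteria versus the other. The scoring below (a simple difference of fractions) is an assumption made for illustration and is not the odif statistic computed by the dbBact interface; all names and data are hypothetical.

```python
def term_fraction_difference(annotations, group1, group2):
    """For each term, fraction of group1 bacteria carrying it minus the fraction in group2.

    annotations maps a bacterium ID to the set of terms attached to it.
    """
    terms = set()
    for bacterium in list(group1) + list(group2):
        terms |= annotations.get(bacterium, set())

    def fraction(term, group):
        return sum(term in annotations.get(b, set()) for b in group) / len(group)

    return {term: fraction(term, group1) - fraction(term, group2) for term in terms}


# Hypothetical annotations for four bacteria.
toy_annotations = {
    'b1': {'feces', 'rural community'},
    'b2': {'feces', 'rural community'},
    'b3': {'feces', "crohn's disease"},
    'b4': {'feces'},
}
scores = term_fraction_difference(toy_annotations, ['b1', 'b2'], ['b3', 'b4'])
```

A term shared by every bacterium (here 'feces') scores zero, while terms concentrated in one group get a positive or negative score.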

The enriched terms are returned as a calour experiment (terms are features, bacteria are samples), so we can list the enriched terms together with their p-value (pvals) and effect size (odif).


In [13]:
enriched.feature_metadata


Out[13]:
num_enriched_exps num_total_exps odif pvals term
LOWER IN thailand * -1.0 -1.0 -20.072368 0.000999 LOWER IN thailand *
LOWER IN age 60-79 * -1.0 -1.0 -20.027961 0.000999 LOWER IN age 60-79 *
LOWER IN age 30-50 * -1.0 -1.0 -20.027961 0.000999 LOWER IN age 30-50 *
LOWER IN control -1.0 -1.0 -19.583882 0.000999 LOWER IN control
LOWER IN rural community -1.0 -1.0 -18.740132 0.000999 LOWER IN rural community
LOWER IN physical activity * -1.0 -1.0 -18.562500 0.000999 LOWER IN physical activity *
little physical activity * -1.0 -1.0 -18.562500 0.000999 little physical activity *
LOWER IN small village -1.0 -1.0 -17.452303 0.000999 LOWER IN small village
LOWER IN adult -1.0 -1.0 -17.185855 0.001998 LOWER IN adult
LOWER IN peru * -1.0 -1.0 -16.430921 0.000999 LOWER IN peru *
LOWER IN tunapuco * -1.0 -1.0 -16.430921 0.000999 LOWER IN tunapuco *
crohn's disease -1.0 -1.0 -15.365132 0.000999 crohn's disease
chronic fatigue syndrome * -1.0 -1.0 -15.187500 0.000999 chronic fatigue syndrome *
age 3-6 * -1.0 -1.0 -14.832237 0.001998 age 3-6 *
systemic lupus erythematosus * -1.0 -1.0 -13.144737 0.000999 systemic lupus erythematosus *
mus musculus -1.0 -1.0 -12.922697 0.001998 mus musculus
age > 1 year -1.0 -1.0 -12.478618 0.004995 age > 1 year
rattus norvegicus -1.0 -1.0 -12.389803 0.000999 rattus norvegicus
LOWER IN male -1.0 -1.0 -12.167763 0.002997 LOWER IN male
LOWER IN age 13-14 * -1.0 -1.0 -12.078947 0.000999 LOWER IN age 13-14 *
rat -1.0 -1.0 -12.034539 0.000999 rat
LOWER IN gay -1.0 -1.0 -11.990132 0.000999 LOWER IN gay
LOWER IN msm -1.0 -1.0 -11.990132 0.000999 LOWER IN msm
LOWER IN homosexual -1.0 -1.0 -11.990132 0.000999 LOWER IN homosexual
state of oklahoma -1.0 -1.0 -11.501645 0.010989 state of oklahoma
research facility -1.0 -1.0 -11.412829 0.004995 research facility
age 1 year -1.0 -1.0 -11.368421 0.000999 age 1 year
LOWER IN plant diet * -1.0 -1.0 -11.368421 0.001998 LOWER IN plant diet *
mouse -1.0 -1.0 -11.279605 0.003996 mouse
stroke * -1.0 -1.0 -11.235197 0.001998 stroke *
... ... ... ... ... ...
age 30-50 * -1.0 -1.0 17.052632 0.000999 age 30-50 *
nigeria -1.0 -1.0 17.052632 0.000999 nigeria
LOWER IN effluent -1.0 -1.0 17.230263 0.000999 LOWER IN effluent
influent * -1.0 -1.0 17.230263 0.000999 influent *
sewage * -1.0 -1.0 17.230263 0.000999 sewage *
wastewater treatment plant -1.0 -1.0 17.452303 0.000999 wastewater treatment plant
zoological garden -1.0 -1.0 17.585526 0.000999 zoological garden
LOWER IN finland -1.0 -1.0 17.629934 0.000999 LOWER IN finland
tanzania * -1.0 -1.0 17.763158 0.000999 tanzania *
hadza * -1.0 -1.0 17.763158 0.000999 hadza *
thailand * -1.0 -1.0 17.985197 0.000999 thailand *
egypt * -1.0 -1.0 18.118421 0.000999 egypt *
africa * -1.0 -1.0 18.162829 0.000999 africa *
monkey -1.0 -1.0 18.473684 0.000999 monkey
papio anubis * -1.0 -1.0 18.473684 0.000999 papio anubis *
LOWER IN duodenum -1.0 -1.0 19.050987 0.000999 LOWER IN duodenum
LOWER IN age 3-6 * -1.0 -1.0 19.184211 0.000999 LOWER IN age 3-6 *
right colon -1.0 -1.0 19.184211 0.000999 right colon
zambia * -1.0 -1.0 19.184211 0.000999 zambia *
LOWER IN united states of america -1.0 -1.0 19.894737 0.000999 LOWER IN united states of america
south america * -1.0 -1.0 20.250000 0.000999 south america *
venezuela -1.0 -1.0 20.605263 0.000999 venezuela
amerindian -1.0 -1.0 20.605263 0.000999 amerindian
tunapuco * -1.0 -1.0 21.315789 0.000999 tunapuco *
peru * -1.0 -1.0 21.315789 0.000999 peru *
el salvador * -1.0 -1.0 22.026316 0.000999 el salvador *
hunter gatherer -1.0 -1.0 22.736842 0.000999 hunter gatherer
rural community -1.0 -1.0 23.092105 0.000999 rural community
LOWER IN infant -1.0 -1.0 23.358553 0.000999 LOWER IN infant
small village -1.0 -1.0 23.758224 0.000999 small village

240 rows × 5 columns
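
A table like this is easier to read after filtering. The sketch below works on a handful of (term, odif, pvals) rows copied from the output above and returns the strongest terms on one side of zero; the helper name is hypothetical, and plain tuples are used so that the example runs without calour or pandas.

```python
rows = [
    # (term, odif, pvals), copied from the table above
    ('LOWER IN adult', -17.185855, 0.001998),
    ('state of oklahoma', -11.501645, 0.010989),
    ('hunter gatherer', 22.736842, 0.000999),
    ('rural community', 23.092105, 0.000999),
    ('small village', 23.758224, 0.000999),
]


def top_terms(rows, max_pval=0.001, positive=True):
    """Return terms passing the p-value cutoff on one side of zero, strongest first."""
    selected = [r for r in rows if r[2] <= max_pval and (r[1] > 0) == positive]
    ordered = sorted(selected, key=lambda r: abs(r[1]), reverse=True)
    return [term for term, _, _ in ordered]


strongest_positive = top_terms(rows)
negative_relaxed = top_terms(rows, max_pval=0.05, positive=False)
```

Since feature_metadata is displayed as a pandas table, the same selection could presumably be written as enriched.feature_metadata.query('pvals <= 0.001 and odif > 0').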

We can plot the enriched terms heatmap to see the term scores for each bacterium.

Note that now the rows are the bacteria and the columns are the terms.


In [14]:
enriched.plot(gui='jupyter', databases=[], feature_field='term',sample_field='group',
              yticklabel_kwargs={'rotation': 0, 'size': 7})


Out[14]:
<calour.heatmap.plotgui_jupyter.PlotGUI_Jupyter at 0x1a24a5a048>

Look at the behavior of a single term

We want to see all the annotations in which a given term appears, and which bacteria from either group (CFS or healthy) appear in those annotations. To do this, we use dbbact.show_term_details_diff(). The output of this function is an experiment where each column is a bacterium and each row is an annotation, showing whether each bacterium appears in the annotation. Color indicates the annotation type.


In [15]:
dbbact=ca.database._get_database_class('dbbact')

In [16]:
term_info_exp = dbbact.show_term_details_diff('small village',dd,gui='jupyter')


2019-04-18 14:18:02 WARNING Do you forget to normalize your data? It is required before running this function
2019-04-18 14:18:02 INFO After filtering, 12 remaining
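
The layout of the resulting experiment (annotations as rows, bacteria as columns) can be mimicked with a small presence/absence matrix. The annotation record layout, the function name, and the data below are all hypothetical and only illustrate the shape of the output, not how show_term_details_diff is implemented.

```python
def term_presence_matrix(term, annotations, bacteria):
    """Rows are annotations mentioning term; columns follow the order of bacteria.

    annotations is a list of dicts with 'description', 'terms' and 'sequences' keys
    (a hypothetical layout chosen for this sketch).
    """
    matrix = {}
    for annotation in annotations:
        if term not in annotation['terms']:
            continue  # this annotation does not mention the term
        matrix[annotation['description']] = [
            int(b in annotation['sequences']) for b in bacteria
        ]
    return matrix


# Hypothetical annotations and bacteria.
toy_annotations = [
    {'description': 'common in a small village', 'terms': {'small village', 'feces'},
     'sequences': {'b1', 'b3'}},
    {'description': 'high in city compared to small village', 'terms': {'small village', 'city'},
     'sequences': {'b2'}},
    {'description': 'common in saliva', 'terms': {'saliva'}, 'sequences': {'b1'}},
]
presence = term_presence_matrix('small village', toy_annotations, ['b1', 'b2', 'b3'])
```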

Getting enriched annotations instead of terms

Each annotation comes from a single experiment (as opposed to terms, which can come from annotations in multiple experiments).


In [17]:
ax, enriched=dd.plot_diff_abundance_enrichment(term_type='annotation')



In [18]:
enriched.feature_metadata


Out[18]:
num_enriched_exps num_total_exps odif pvals term
lower in people from thailand compared to 2nd generation immigrants to usa ( high in united states of america compared to thailand rural community in homo sapiens feces -1.0 -1.0 -20.072368 0.000999 lower in people from thailand compared to 2nd ...
higher in individuals with low physical activity ( high in little physical activity compared to physical activity in homo sapiens feces united states of america -1.0 -1.0 -18.562500 0.000999 higher in individuals with low physical activi...
higher in centenarians compared to adults ( high in age >94 compared to age 30-50 age 60-79 in homo sapiens feces china -1.0 -1.0 -16.430921 0.000999 higher in centenarians compared to adults ( hi...
high in united states of america city state of oklahoma compared to peru small village tunapuco rural community in feces homo sapiens adult -1.0 -1.0 -16.430921 0.001998 high in united states of america city state o...
high in chronic fatigue syndrome compared to control in homo sapiens feces new york county -1.0 -1.0 -15.187500 0.000999 high in chronic fatigue syndrome compared to...
high in children with Crohn's disease compared to healthy adult controls ( high in crohn's disease obsolete_juvenile stage child compared to control adult in homo sapiens feces glasgow -1.0 -1.0 -15.187500 0.000999 high in children with Crohn's disease compared...
high in female compared to male in homo sapiens feces united states of america -1.0 -1.0 -15.187500 0.000999 high in female compared to male in homo sap...
high in age 3-6 age 8-12 child compared to adult age 30-50 age 60-79 in homo sapiens feces china -1.0 -1.0 -13.322368 0.000999 high in age 3-6 age 8-12 child compared to a...
higher in kindergarten compared to primary and middle school kids ( high in age 3-6 compared to age 8-12 age 13-14 in homo sapiens feces china -1.0 -1.0 -12.078947 0.002997 higher in kindergarten compared to primary and...
high in systemic lupus erythematosus compared to control in feces homo sapiens commonwealth of virginia united states of america adult -1.0 -1.0 -11.812500 0.000999 high in systemic lupus erythematosus compare...
Higher in animal product diet compared to plant diet ( high in diet animal product diet compared to plant diet in homo sapiens feces united states of america -1.0 -1.0 -11.368421 0.002997 Higher in animal product diet compared to plan...
common feces, homo sapiens, commonwealth of virginia, united states of america, adult, systemic lupus erythematosus, -1.0 -1.0 -10.657895 0.003996 common feces, homo sapiens, commonwealth of v...
higher in stroke patients compared to healthy controls ( high in stroke compared to control in homo sapiens feces china guangzhou city prefecture adult -1.0 -1.0 -10.125000 0.000999 higher in stroke patients compared to healthy ...
common homo sapiens, feces, kingdom of norway, oslo, infant, age 1 year, -1.0 -1.0 -10.125000 0.000999 common homo sapiens, feces, kingdom of norway...
higher in antibiotics treated rats compared to controls ( high in antibiotic neomycin ampicillin compared to control in rat rattus norvegicus sprague dawley feces caecum research facility switzerland -1.0 -1.0 -10.125000 0.001998 higher in antibiotics treated rats compared to...
high in infant age 1 year compared to adult age 30-40 in homo sapiens feces kingdom of norway oslo -1.0 -1.0 -10.125000 0.000999 high in infant age 1 year compared to adult ...
lower in infants age<1 year compared to 1-3 years in baby feces ( high in age age > 1 year compared to age <1 year in homo sapiens feces infant finland -1.0 -1.0 -9.858553 0.015984 lower in infants age<1 year compared to 1-3 ye...
high in age 1 year compared to age 2 months in homo sapiens female feces state of california infant -1.0 -1.0 -9.414474 0.001998 high in age 1 year compared to age 2 months ...
lower in gay (msm) individuals compared to heterosexual (msw) ( high in heterosexual msw compared to gay msm homosexual in homo sapiens feces united states of america state of colorado denver -1.0 -1.0 -9.414474 0.000999 lower in gay (msm) individuals compared to het...
common felis catus, cat, feces, united states of america, state of colorado, -1.0 -1.0 -8.703947 0.001998 common felis catus, cat, feces, united states...
higher in lean participants in human feces ( high in low bmi compared to high bmi in united states of america feces adult homo sapiens -1.0 -1.0 -8.437500 0.003996 higher in lean participants in human feces ( h...
common homo sapiens, feces, diarrhea, clostridium difficile intestinal infectious disease, australia, hospital, -1.0 -1.0 -8.437500 0.002997 common homo sapiens, feces, diarrhea, clostri...
high in schizophrenia compared to control in homo sapiens feces adult united states of america -1.0 -1.0 -8.437500 0.002997 high in schizophrenia compared to control i...
common feces, homo sapiens, kingdom of denmark, infant, age one year, -1.0 -1.0 -8.437500 0.002997 common feces, homo sapiens, kingdom of denmar...
high in crohn's disease compared to control in homo sapiens feces belgium -1.0 -1.0 -8.437500 0.000999 high in crohn's disease compared to control ...
high in gangcha region compared to gannan tibetan autonomous prefecture in feces homo sapiens adult tibetan plateau tibet autonomous region -1.0 -1.0 -7.993421 0.011988 high in gangcha region compared to gannan ti...
common feces, united states of america, research facility, rat, rattus norvegicus, -1.0 -1.0 -7.726974 0.010989 common feces, united states of america, resea...
high in healthy dogs compared to EPI dogs without treatment ( high in control compared to exocrine pancreatic insufficiency in canis lupus familiaris dog feces united states of america -1.0 -1.0 -7.726974 0.008991 high in healthy dogs compared to EPI dogs with...
high in age age one month compared to age one week in feces homo sapiens kingdom of denmark infant -1.0 -1.0 -7.726974 0.007992 high in age age one month compared to age on...
high in equine grass sickness disease compared to control in equus caballus horse feces united kingdom farm -1.0 -1.0 -7.726974 0.006993 high in equine grass sickness disease compar...
... ... ... ... ... ...
high in control compared to chronic fatigue syndrome in homo sapiens feces new york county -1.0 -1.0 11.368421 0.003996 high in control compared to chronic fatigue ...
high in control compared to ulcerative colitis in feces united states of america homo sapiens -1.0 -1.0 11.546053 0.010989 high in control compared to ulcerative colit...
common homo sapiens, feces, lima, city, shantytown, -1.0 -1.0 11.812500 0.005994 common homo sapiens, feces, lima, city, shant...
higher in babies from russia compared to finland ( high in russia compared to finland in homo sapiens feces infant age < 3 years -1.0 -1.0 11.812500 0.001998 higher in babies from russia compared to finla...
common feces, united states of america, zoological garden, state of tennessee, macaca mulatta, macaque, -1.0 -1.0 12.078947 0.000999 common feces, united states of america, zoolo...
common feces, homo sapiens, nigeria, kebbi state, rural community, child, age 10-15 years, schistosomiasis, urinary schistosomiasis, -1.0 -1.0 12.078947 0.001998 common feces, homo sapiens, nigeria, kebbi st...
common pan troglodytes, chimpanzee, zoological garden, feces, united states of america, state of tennessee, -1.0 -1.0 12.078947 0.001998 common pan troglodytes, chimpanzee, zoologica...
common homo sapiens, tanzania, hadza, hunter gatherer, feces, -1.0 -1.0 12.789474 0.000999 common homo sapiens, tanzania, hadza, hunter ...
lower in small intestine compared to colon in pigs ( high in caecum right colon left colon compared to duodenum jejunum ileum in sus scrofa pig united kingdom -1.0 -1.0 12.789474 0.000999 lower in small intestine compared to colon in ...
high in adult age 30-40 compared to infant age 1 year in homo sapiens feces kingdom of norway oslo -1.0 -1.0 13.233553 0.001998 high in adult age 30-40 compared to infant a...
common homo sapiens, feces, colombia, -1.0 -1.0 13.233553 0.001998 common homo sapiens, feces, colombia,
high in age 30-50 age 60-79 adult compared to age 3-6 age 8-12 child in homo sapiens feces china -1.0 -1.0 13.500000 0.000999 high in age 30-50 age 60-79 adult compared t...
high in peru small village tunapuco rural community compared to united states of america city state of oklahoma in feces homo sapiens adult -1.0 -1.0 14.210526 0.001998 high in peru small village tunapuco rural com...
common feces, homo sapiens, nigeria, kebbi state, rural community, child, age 10-15 years, control, -1.0 -1.0 14.210526 0.000999 common feces, homo sapiens, nigeria, kebbi st...
common homo sapiens, feces, kingdom of denmark, metabolic syndrome, adult, -1.0 -1.0 14.388158 0.002997 common homo sapiens, feces, kingdom of denmar...
high in control compared to crohn's disease in feces united states of america homo sapiens -1.0 -1.0 14.565789 0.000999 high in control compared to crohn's disease ...
common sus scrofa, pig, united kingdom, caecum, right colon, left colon, -1.0 -1.0 14.654605 0.000999 common sus scrofa, pig, united kingdom, caecu...
high in zoological garden papio anubis compared to wild papio kindae papio ursinus in zambia feces monkey baboon papio -1.0 -1.0 14.921053 0.000999 high in zoological garden papio anubis compa...
common homo sapiens, feces, obsolete_juvenile stage, child, egypt, -1.0 -1.0 15.009868 0.000999 common homo sapiens, feces, obsolete_juvenile...
high in control compared to pancreatitis acute pancreatitis in homo sapiens adult feces china nanchang city prefecture -1.0 -1.0 15.365132 0.000999 high in control compared to pancreatitis acu...
high in male compared to female in homo sapiens feces united states of america -1.0 -1.0 15.631579 0.000999 high in male compared to female in homo sap...
lower in babies from finland compared to estonia ( high in estonia compared to finland in homo sapiens feces infant age < 3 years -1.0 -1.0 16.342105 0.000999 lower in babies from finland compared to eston...
common feces, monkey, united states of america, zoological garden, papio anubis, olive baboon, -1.0 -1.0 16.342105 0.000999 common feces, monkey, united states of americ...
high in feces compared to duodenum in homo sapiens africa child -1.0 -1.0 16.519737 0.000999 high in feces compared to duodenum in homo ...
lower in wastewater plant effluent compared to influent and sewer in south america ( high in sewage influent compared to effluent in south america wastewater treatment plant city -1.0 -1.0 17.230263 0.000999 lower in wastewater plant effluent compared to...
high in control compared to crohn's disease in homo sapiens feces belgium -1.0 -1.0 18.207237 0.000999 high in control compared to crohn's disease ...
common homo sapiens, venezuela, feces, amerindian, hunter gatherer, -1.0 -1.0 19.184211 0.000999 common homo sapiens, venezuela, feces, amerin...
common feces, homo sapiens, peru, small village, tunapuco, rural community, adult, -1.0 -1.0 19.184211 0.000999 common feces, homo sapiens, peru, small villa...
common homo sapiens, feces, city, el salvador, small village, -1.0 -1.0 20.605263 0.000999 common homo sapiens, feces, city, el salvador...
high in adult compared to infant age < 1 year in homo sapiens feces india -1.0 -1.0 22.470395 0.000999 high in adult compared to infant age < 1 yea...

149 rows × 5 columns

Getting both enriched terms and annotations


In [19]:
ax, enriched=dd.plot_diff_abundance_enrichment(term_type='combined')



In [20]:
enriched.feature_metadata


Out[20]:
num_enriched_exps num_total_exps odif pvals term
LOWER IN thailand -1.0 -1.0 -20.072368 0.000999 LOWER IN thailand
lower in people from thailand compared to 2nd generation immigrants to usa ( high in united states of america compared to thailand rural community in homo sapiens feces -1.0 -1.0 -20.072368 0.000999 lower in people from thailand compared to 2nd ...
LOWER IN age 60-79 -1.0 -1.0 -20.027961 0.000999 LOWER IN age 60-79
LOWER IN age 30-50 -1.0 -1.0 -20.027961 0.000999 LOWER IN age 30-50
LOWER IN control -1.0 -1.0 -19.583882 0.000999 LOWER IN control
LOWER IN rural community -1.0 -1.0 -18.740132 0.000999 LOWER IN rural community
LOWER IN physical activity -1.0 -1.0 -18.562500 0.000999 LOWER IN physical activity
little physical activity -1.0 -1.0 -18.562500 0.000999 little physical activity
higher in individuals with low physical activity ( high in little physical activity compared to physical activity in homo sapiens feces united states of america -1.0 -1.0 -18.562500 0.000999 higher in individuals with low physical activi...
LOWER IN small village -1.0 -1.0 -17.452303 0.000999 LOWER IN small village
LOWER IN adult -1.0 -1.0 -17.185855 0.000999 LOWER IN adult
LOWER IN tunapuco -1.0 -1.0 -16.430921 0.000999 LOWER IN tunapuco
LOWER IN peru -1.0 -1.0 -16.430921 0.000999 LOWER IN peru
higher in centenarians compared to adults ( high in age >94 compared to age 30-50 age 60-79 in homo sapiens feces china -1.0 -1.0 -16.430921 0.000999 higher in centenarians compared to adults ( hi...
high in united states of america city state of oklahoma compared to peru small village tunapuco rural community in feces homo sapiens adult -1.0 -1.0 -16.430921 0.000999 high in united states of america city state o...
crohn's disease -1.0 -1.0 -15.365132 0.000999 crohn's disease
high in female compared to male in homo sapiens feces united states of america -1.0 -1.0 -15.187500 0.000999 high in female compared to male in homo sap...
high in chronic fatigue syndrome compared to control in homo sapiens feces new york county -1.0 -1.0 -15.187500 0.000999 high in chronic fatigue syndrome compared to...
high in children with Crohn's disease compared to healthy adult controls ( high in crohn's disease obsolete_juvenile stage child compared to control adult in homo sapiens feces glasgow -1.0 -1.0 -15.187500 0.000999 high in children with Crohn's disease compared...
chronic fatigue syndrome -1.0 -1.0 -15.187500 0.000999 chronic fatigue syndrome
age 3-6 -1.0 -1.0 -14.832237 0.001998 age 3-6
high in age 3-6 age 8-12 child compared to adult age 30-50 age 60-79 in homo sapiens feces china -1.0 -1.0 -13.322368 0.000999 high in age 3-6 age 8-12 child compared to a...
systemic lupus erythematosus -1.0 -1.0 -13.144737 0.000999 systemic lupus erythematosus
mus musculus -1.0 -1.0 -12.922697 0.000999 mus musculus
age > 1 year -1.0 -1.0 -12.478618 0.002997 age > 1 year
rattus norvegicus -1.0 -1.0 -12.389803 0.001998 rattus norvegicus
LOWER IN male -1.0 -1.0 -12.167763 0.002997 LOWER IN male
higher in kindergarten compared to primary and middle school kids ( high in age 3-6 compared to age 8-12 age 13-14 in homo sapiens feces china -1.0 -1.0 -12.078947 0.001998 higher in kindergarten compared to primary and...
LOWER IN age 13-14 -1.0 -1.0 -12.078947 0.001998 LOWER IN age 13-14
rat -1.0 -1.0 -12.034539 0.001998 rat
... ... ... ... ... ...
wastewater treatment plant -1.0 -1.0 17.452303 0.000999 wastewater treatment plant
zoological garden -1.0 -1.0 17.585526 0.000999 zoological garden
LOWER IN finland -1.0 -1.0 17.629934 0.000999 LOWER IN finland
hadza -1.0 -1.0 17.763158 0.000999 hadza
tanzania -1.0 -1.0 17.763158 0.000999 tanzania
thailand -1.0 -1.0 17.985197 0.000999 thailand
egypt -1.0 -1.0 18.118421 0.000999 egypt
africa -1.0 -1.0 18.162829 0.000999 africa
high in control compared to crohn's disease in homo sapiens feces belgium -1.0 -1.0 18.207237 0.000999 high in control compared to crohn's disease ...
monkey -1.0 -1.0 18.473684 0.000999 monkey
papio anubis -1.0 -1.0 18.473684 0.000999 papio anubis
LOWER IN duodenum -1.0 -1.0 19.050987 0.000999 LOWER IN duodenum
right colon -1.0 -1.0 19.184211 0.000999 right colon
common homo sapiens, venezuela, feces, amerindian, hunter gatherer, -1.0 -1.0 19.184211 0.000999 common homo sapiens, venezuela, feces, amerin...
common feces, homo sapiens, peru, small village, tunapuco, rural community, adult, -1.0 -1.0 19.184211 0.000999 common feces, homo sapiens, peru, small villa...
LOWER IN age 3-6 -1.0 -1.0 19.184211 0.000999 LOWER IN age 3-6
zambia -1.0 -1.0 19.184211 0.000999 zambia
LOWER IN united states of america -1.0 -1.0 19.894737 0.000999 LOWER IN united states of america
south america -1.0 -1.0 20.250000 0.000999 south america
common homo sapiens, feces, city, el salvador, small village, -1.0 -1.0 20.605263 0.000999 common homo sapiens, feces, city, el salvador...
amerindian -1.0 -1.0 20.605263 0.000999 amerindian
venezuela -1.0 -1.0 20.605263 0.000999 venezuela
peru -1.0 -1.0 21.315789 0.000999 peru
tunapuco -1.0 -1.0 21.315789 0.000999 tunapuco
el salvador -1.0 -1.0 22.026316 0.000999 el salvador
high in adult compared to infant age < 1 year in homo sapiens feces india -1.0 -1.0 22.470395 0.000999 high in adult compared to infant age < 1 yea...
hunter gatherer -1.0 -1.0 22.736842 0.000999 hunter gatherer
rural community -1.0 -1.0 23.092105 0.000999 rural community
LOWER IN infant -1.0 -1.0 23.358553 0.000999 LOWER IN infant
small village -1.0 -1.0 23.758224 0.000999 small village

397 rows × 5 columns

Ignoring selected experiments already in dbBact

If our experiment is already in dbBact, or if there are other experiments in dbBact that we do not want to include in the enrichment analysis, we can specify them using the ignore_exp=[expID,...] parameter.

In our case, the cfs experiment has already been added to dbBact, so let's ignore its annotations when doing the analysis. By looking at dbBact.org we know its experimentID is 12. Alternatively, we can use ignore_exp=True to automatically detect the current experimentID if it exists in dbBact (using the md5 hashes of the data and mapping files).


In [21]:
ax, enriched=dd.plot_diff_abundance_enrichment(term_type='combined', ignore_exp=[12])
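
The two ideas in this section, fingerprinting a file by its md5 hash and leaving out annotations from chosen experiments, can be sketched with the standard library alone. How the dbBact interface actually computes and matches the hashes is not shown here; the function names, the 'expid' key, and the toy records are hypothetical.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=65536):
    """Return the hex MD5 digest of a binary file-like object, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b''):
        digest.update(chunk)
    return digest.hexdigest()


def drop_ignored(annotations, ignore_exp):
    """Remove annotations whose experiment ID is listed in ignore_exp."""
    ignored = set(ignore_exp)
    return [a for a in annotations if a['expid'] not in ignored]


# io.BytesIO stands in for a file opened in binary mode so the sketch runs anywhere.
fingerprint = md5_of_stream(io.BytesIO(b'toy table contents'))

# Hypothetical annotation records; experiment 12 is the one ignored above.
toy_annotations = [
    {'expid': 12, 'description': 'high in chronic fatigue syndrome compared to control'},
    {'expid': 7, 'description': 'hypothetical annotation from another experiment'},
]
remaining = drop_ignored(toy_annotations, ignore_exp=[12])
```

Ignoring the experiment's own annotations avoids a circular result in which the data appear enriched for terms that were derived from the same data.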


Adding common dbBact terms to features (add_terms_to_features)

We can attach to each bacterium the most common dbBact term associated with it.

The term is chosen from all of the dbBact terms, or from a supplied list of terms.


In [22]:
cfs=cfs.add_terms_to_features(dbname='dbbact',use_term_list=['feces','saliva','skin','mus musculus'])


2019-04-18 14:18:37 INFO Getting dbBact annotations for 1100 sequences, please wait...
2019-04-18 14:18:51 INFO got 2263 annotations
2019-04-18 14:18:51 INFO Got 41933 annotation-sequence pairs
2019-04-18 14:18:51 INFO Added annotation data to experiment. Total 2263 annotations, 1100 terms
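
Picking the most common term for a bacterium, optionally restricted to a supplied list, amounts to a counting exercise. The sketch below shows the idea with collections.Counter; the function name, the 'other' fallback label, and the toy data are assumptions for this example and may differ from what add_terms_to_features does.

```python
from collections import Counter


def most_common_term(term_lists, use_term_list=None, default='other'):
    """Return the term seen most often across a bacterium's annotations.

    term_lists holds one list of terms per annotation. When use_term_list is
    given, only terms from that list are counted.
    """
    counts = Counter(term for terms in term_lists for term in terms)
    if use_term_list is not None:
        allowed = set(use_term_list)
        counts = Counter({t: c for t, c in counts.items() if t in allowed})
    if not counts:
        return default  # no usable term found for this bacterium
    return counts.most_common(1)[0][0]


# Hypothetical annotations for one bacterium.
toy_terms = [
    ['feces', 'homo sapiens'],
    ['feces', 'homo sapiens', 'saliva'],
    ['homo sapiens'],
]
unrestricted = most_common_term(toy_terms)
restricted = most_common_term(toy_terms, use_term_list=['feces', 'saliva', 'skin', 'mus musculus'])
```

Restricting the list, as in the cell above, keeps the resulting labels to a few categories that are easy to tell apart in the heatmap side bar.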

In [23]:
tt=cfs.sort_by_metadata('common_term',axis='feature')

In [25]:
tt.plot(sample_field='Subject', bary_fields=['common_term'], gui='jupyter')


Out[25]:
<calour.heatmap.plotgui_jupyter.PlotGUI_Jupyter at 0x1a27dd4eb8>

Get enriched terms using all bacteria

Instead of comparing only the bacteria enriched in each of the two groups (and then comparing the terms between them), we can compute a weighted term average for each group using all bacteria, weighting the terms of each bacterium by its frequency in the sample. This can work even if we don't have a strong set of bacteria separating the two groups.


In [26]:
dbbact=ca.database._get_database_class('dbbact')

In [27]:
enriched=dbbact.sample_enrichment(cfs,'Subject','Control','Patient',
                                  term_type='combined',ignore_exp=[12])


2019-04-18 14:19:31 INFO 87 samples with both values
2019-04-18 14:19:31 WARNING Do you forget to normalize your data? It is required before running this function
2019-04-18 14:19:31 INFO After filtering, 4365 remaining
2019-04-18 14:19:31 INFO 39 samples with value 1 (['Control'])
2019-04-18 14:19:34 INFO number of higher in Control: 831. number of higher in Patient : 119. total 950
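
The weighting described above can be sketched for a single sample: each term receives the relative abundance of every bacterium annotated with it. This is a simplified illustration with hypothetical names and data, not the exact score that sample_enrichment computes.

```python
def weighted_term_scores(abundances, annotations):
    """Score each term in one sample by the relative abundance of the bacteria carrying it.

    abundances maps a bacterium ID to its abundance in the sample;
    annotations maps a bacterium ID to the set of terms attached to it.
    """
    total = sum(abundances.values())
    if total == 0:
        return {}  # empty sample, nothing to score
    scores = {}
    for bacterium, abundance in abundances.items():
        for term in annotations.get(bacterium, ()):
            scores[term] = scores.get(term, 0.0) + abundance / total
    return scores


# Hypothetical sample with three bacteria.
sample = {'b1': 6000, 'b2': 3000, 'b3': 1000}
toy_annotations = {'b1': {'feces'}, 'b2': {'feces', 'saliva'}, 'b3': {'skin'}}
term_scores = weighted_term_scores(sample, toy_annotations)
```

Computing such scores for every sample gives a sample-by-term table, on which the two groups can then be compared in the same way as bacteria are compared in diff_abundance.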

In [28]:
enriched.feature_metadata


Out[28]:
term num_features _calour_stat _calour_pval _calour_direction
enzyme supplement enzyme supplement 20 -1.467864 0.000999 Patient
-non c. diff diarrhea -non c. diff diarrhea 13 -1.467470 0.000999 Patient
higher in patients with c. diff diarrhea compared to non-c. diff diarrhea ( high in clostridium difficile intestinal infectious disease compared to non c. diff diarrhea in homo sapiens feces australia hospital diarrhea higher in patients with c. diff diarrhea compa... 13 -1.467470 0.000999 Patient
higher in antibiotics treated rats compared to controls ( high in antibiotic neomycin ampicillin compared to control in rat rattus norvegicus sprague dawley feces caecum research facility switzerland higher in antibiotics treated rats compared to... 21 -1.311713 0.000999 Patient
neomycin neomycin 21 -1.294715 0.000999 Patient
-no enzyme supplement -no enzyme supplement 20 -1.252388 0.000999 Patient
high in EPI dogs with enzyme supplement compared to no supplement ( high in enzyme supplement compared to no enzyme supplement in canis lupus familiaris dog feces united states of america exocrine pancreatic insufficiency high in EPI dogs with enzyme supplement compar... 20 -1.252388 0.000999 Patient
high in children with Crohn's disease compared to healthy adult controls ( high in crohn's disease obsolete_juvenile stage child compared to control adult in homo sapiens feces glasgow high in children with Crohn's disease compared... 47 -1.188369 0.000999 Patient
high in schizophrenia compared to control in homo sapiens feces adult united states of america high in schizophrenia compared to control i... 13 -1.031583 0.000999 Patient
-gastric bypass -gastric bypass 4 -1.009475 0.000999 Patient
lower in people with Roux-en-Y gastric bypass compared to controls ( high in control compared to gastric bypass in homo sapiens feces united states of america lower in people with Roux-en-Y gastric bypass ... 4 -1.009475 0.000999 Patient
-physical activity -physical activity 49 -0.964541 0.001998 Patient
higher in individuals with low physical activity ( high in little physical activity compared to physical activity in homo sapiens feces united states of america higher in individuals with low physical activi... 49 -0.964541 0.001998 Patient
probiotic probiotic 3 -0.958546 0.001998 Patient
little physical activity little physical activity 49 -0.931222 0.002997 Patient
defatted larvae insect diet defatted larvae insect diet 3 -0.921678 0.000999 Patient
highfreq feces, south africa, acinonyx jubatus, cheetah, highfreq feces, south africa, acinonyx jubatu... 10 -0.896606 0.005994 Patient
cheetah cheetah 20 -0.853389 0.001998 Patient
-age 30-40 -age 30-40 16 -0.832852 0.000999 Patient
high in infant age 1 year compared to adult age 30-40 in homo sapiens feces kingdom of norway oslo high in infant age 1 year compared to adult ... 16 -0.832852 0.000999 Patient
salmune vaccination salmune vaccination 12 -0.821224 0.001998 Patient
-salmune vaccination -salmune vaccination 28 -0.812220 0.001998 Patient
-vaccination -vaccination 28 -0.812220 0.001998 Patient
higher in non-vaccinated chickens ( high in control compared to vaccination salmune vaccination in gallus gallus chicken united states of america caecum higher in non-vaccinated chickens ( high in co... 28 -0.812220 0.001998 Patient
negatively correlated with age 8-35 days in chickens ( high in young age compared to age old age in gallus gallus chicken feces united kingdom negatively correlated with age 8-35 days in ch... 9 -0.801746 0.001998 Patient
high in ulcerative colitis compared to control in feces united states of america homo sapiens high in ulcerative colitis compared to contr... 20 -0.798372 0.003996 Patient
pulsed antibiotic treatment, macrolide tylosin tartrate pulsed antibiotic treatment, macrolide tylosin... 9 -0.771024 0.004995 Patient
exocrine pancreatic insufficiency exocrine pancreatic insufficiency 33 -0.728781 0.002997 Patient
high in crohn's disease compared to control in feces united states of america homo sapiens high in crohn's disease compared to control ... 18 -0.722724 0.002997 Patient
acinonyx jubatus acinonyx jubatus 22 -0.686487 0.001998 Patient
... ... ... ... ... ...
-camp hukamako -camp hukamako 6 0.993227 0.000999 Control
lower in Hadza camp Hukamako compared to hadza camp Sengeli ( high in camp sengeli compared to camp hukamako in homo sapiens tanzania hadza hunter gatherer feces lower in Hadza camp Hukamako compared to hadza... 6 0.993227 0.000999 Control
common feces, monkey, theropithecus gelada, ethiopia, common feces, monkey, theropithecus gelada, e... 17 1.005345 0.001998 Control
high in hiv infection compared to control in homo sapiens feces united states of america high in hiv infection compared to control i... 20 1.010040 0.000999 Control
low hdl low hdl 4 1.031434 0.000999 Control
plant based diet plant based diet 4 1.033398 0.000999 Control
-little physical activity -little physical activity 84 1.051961 0.001998 Control
higher in individuals with high physical activity ( high in physical activity compared to little physical activity in homo sapiens feces united states of america higher in individuals with high physical activ... 84 1.051961 0.001998 Control
high in male compared to female in homo sapiens feces toronto high in male compared to female in homo sap... 11 1.053019 0.000999 Control
-papio kindae -papio kindae 114 1.054979 0.000999 Control
-papio ursinus -papio ursinus 114 1.054979 0.000999 Control
high in zoological garden papio anubis compared to wild papio kindae papio ursinus in zambia feces monkey baboon papio high in zoological garden papio anubis compa... 114 1.054979 0.000999 Control
physical activity physical activity 84 1.072794 0.001998 Control
trichuriasis trichuriasis 6 1.078697 0.000999 Control
high filber high filber 7 1.083860 0.001998 Control
highfreq sus scrofa, pig, united kingdom, caecum, right colon, left colon, highfreq sus scrofa, pig, united kingdom, cae... 10 1.108316 0.000999 Control
low human contact low human contact 4 1.109711 0.000999 Control
plant fiber cell plant fiber cell 8 1.112771 0.000999 Control
hiv infection hiv infection 67 1.114019 0.000999 Control
-state of oklahoma -state of oklahoma 177 1.132083 0.000999 Control
high in peru small village tunapuco rural community compared to united states of america city state of oklahoma in feces homo sapiens adult high in peru small village tunapuco rural com... 177 1.132083 0.000999 Control
camp sengeli camp sengeli 6 1.177868 0.000999 Control
higher in gay (msm) individuals compared to heterosexual (msw) ( high in homosexual msm gay compared to heterosexual msw in homo sapiens feces united states of america state of colorado denver higher in gay (msm) individuals compared to he... 82 1.255350 0.000999 Control
-heterosexual -heterosexual 98 1.390927 0.000999 Control
-msw -msw 98 1.390927 0.000999 Control
-gangcha region -gangcha region 70 1.432247 0.001998 Control
high in gannan tibetan autonomous prefecture compared to gangcha region in tibet autonomous region feces homo sapiens tibetan plateau adult high in gannan tibetan autonomous prefecture ... 70 1.432247 0.001998 Control
high in hiv infection compared to control in united states of america feces homo sapiens high in hiv infection compared to control i... 45 1.496605 0.000999 Control
high in gay homosexual msm compared to heterosexual msw in homo sapiens feces united states of america adult state of colorado city high in gay homosexual msm compared to heter... 71 1.506088 0.000999 Control
gannan tibetan autonomous prefecture gannan tibetan autonomous prefecture 70 1.621142 0.000999 Control

950 rows × 5 columns
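
With 950 rows, the table is easier to read after narrowing it to the strongest terms per group. The following is a minimal sketch in plain pandas, not part of the calour or dbBact API. It rebuilds a few of the rows shown above by hand, and the column names (`term`, `count`, `effect`, `pval`, `group`) are placeholders assumed for illustration; substitute the real column names of your result table. In the rows displayed, negative values in the fourth column go with the Patient group and positive values with the Control group.

```python
import pandas as pd

# A handful of rows copied from the table above.
# Column names are hypothetical placeholders, not the names calour uses.
rows = [
    ('neomycin', 21, -1.294715, 0.000999, 'Patient'),
    ('probiotic', 3, -0.958546, 0.001998, 'Patient'),
    ('cheetah', 20, -0.853389, 0.001998, 'Patient'),
    ('low hdl', 4, 1.031434, 0.000999, 'Control'),
    ('physical activity', 84, 1.072794, 0.001998, 'Control'),
    ('hiv infection', 67, 1.114019, 0.000999, 'Control'),
]
df = pd.DataFrame(rows, columns=['term', 'count', 'effect', 'pval', 'group'])


def top_terms(table, group, n=2):
    """Return the n rows of one group with the largest absolute effect."""
    subset = table[table['group'] == group]
    # Rank by magnitude so the helper works for either sign of the effect.
    order = subset['effect'].abs().sort_values(ascending=False).index
    return subset.loc[order].head(n)


patient_top = top_terms(df, 'Patient')
control_top = top_terms(df, 'Control')
print(patient_top[['term', 'effect', 'pval']])
print(control_top[['term', 'effect', 'pval']])
```

Ranking by absolute value keeps one helper usable for both groups; a further filter on `count` could be added to drop terms supported by only a few features.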

