In [1]:
import calour as ca
Calour uses Python's builtin logging module to print out logging messages. By default the logging level is set to WARNING. Let's change it to INFO for the purpose of this tutorial, so we get more detailed information about the function outputs.
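As a quick stdlib illustration of what this setting does (the logger name `"demo"` below is arbitrary, not Calour's actual logger), lowering a logger's level from WARNING to INFO is exactly what makes INFO-level messages visible:

```python
import logging

# Illustration with the stdlib logging module that Calour builds on;
# the logger name "demo" is arbitrary, not Calour's internal logger name.
logger = logging.getLogger("demo")

logger.setLevel(logging.WARNING)
print(logger.isEnabledFor(logging.INFO))   # INFO messages are suppressed

logger.setLevel(logging.INFO)
print(logger.isEnabledFor(logging.INFO))   # INFO messages are now emitted
```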
In [2]:
ca.set_log_level('INFO')
In [3]:
%matplotlib notebook
This data set is from: Caporaso JG, Lauber CL, Costello EK, Berg-Lyons D, Gonzalez A, Stombaugh J, Knights D, Gajer P, Ravel J, Fierer N, et al. (2011) Moving pictures of the human microbiome. Genome Biology, 12, R50.
The raw data were reprocessed with the deblur method, which is published in mSystems.
We use read_amplicon to read the data into the AmpliconExperiment class. This class provides amplicon-specific functions such as filter_taxonomy etc.
Useful parameters are:
In [4]:
exp = ca.read_amplicon(data_file='data/moving_pic.biom', sample_metadata_file='data/moving_pic.sample.txt',
min_reads=1000, normalize=10000)
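The `normalize=10000` argument rescales each sample to the same total read count (total-sum scaling), which makes samples with different sequencing depths comparable. A minimal sketch of that idea, where `normalize_tss` is an illustrative helper name of ours, not part of Calour's API:

```python
# Minimal sketch of total-sum scaling, the kind of normalization applied above;
# normalize_tss is an illustrative helper, not a Calour function.
def normalize_tss(counts, depth=10000):
    """Rescale one sample's feature counts so they sum to `depth`."""
    total = sum(counts)
    return [c * depth / total for c in counts]

print(normalize_tss([100, 300, 600]))  # [1000.0, 3000.0, 6000.0]
```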
In [5]:
exp
Out[5]:
In [6]:
exp.data
Out[6]:
In [7]:
exp.sample_metadata.head(5)
Out[7]:
In [8]:
exp.feature_metadata.head(5)
Out[8]:
In [9]:
exp.sample_metadata.COMMON_SAMPLE_SITE.value_counts()
Out[9]:
In [10]:
exp = exp.sort_samples('DAYS_SINCE_EXPERIMENT_START').sort_samples('COMMON_SAMPLE_SITE').sort_samples('HOST_SUBJECT_ID')
In [11]:
exp.feature_metadata.head()
Out[11]:
In [12]:
exp.sample_metadata['BODY_PRODUCT'].value_counts()
Out[12]:
Let's split the dataset:
In [13]:
feces = exp.filter_samples('BODY_PRODUCT', 'UBERON:feces')
tongue = exp.filter_samples('BODY_PRODUCT', 'UBERON:tongue')
# just to have fun, let's negate and use multiple values
hand = exp.filter_samples('BODY_PRODUCT', ['UBERON:feces', 'UBERON:tongue'], negate=True)
NOTE: The data array and the sample and feature metadata are always kept synchronized in the same order through all the manipulations (filtering, sorting, transforming, etc.) done on an Experiment object.
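A plain-Python sketch of why this holds: conceptually, a single boolean mask selects rows from both the data matrix and the sample metadata, so the two can never drift out of alignment (this is illustrative, not Calour's actual implementation):

```python
# Illustrative sketch (not Calour internals): one boolean mask selects
# matching rows from both the data matrix and the sample metadata,
# so the two stay synchronized after every filter.
data = [[1, 2], [3, 4], [5, 6]]             # 3 samples x 2 features
sample_meta = ['feces', 'tongue', 'feces']  # per-sample annotation

mask = [m == 'feces' for m in sample_meta]
data_f = [row for row, keep in zip(data, mask) if keep]
meta_f = [m for m, keep in zip(sample_meta, mask) if keep]

print(len(data_f), len(meta_f))  # 2 2
```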
In [14]:
feces
Out[14]:
In [15]:
tongue
Out[15]:
In [16]:
hand
Out[16]:
Note that filtering samples does not change or remove features, even those no longer present in the retained set of samples.
In [17]:
tt = exp.filter_abundance(50)
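filter_abundance(50) keeps the features whose reads, summed over all samples, reach the cutoff. A sketch of that filter in plain Python (illustrative only, not Calour's implementation):

```python
# Illustrative sketch of an abundance filter (not Calour's code):
# keep only features whose counts summed over all samples reach the cutoff.
data = [[0, 10, 40],
        [0,  5, 30]]                       # 2 samples x 3 features
cutoff = 50

totals = [sum(col) for col in zip(*data)]  # per-feature totals: [0, 15, 70]
keep = [i for i, t in enumerate(totals) if t >= cutoff]
print(keep)  # [2] -> only the third feature passes
```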
We plot the heatmap with samples sorted by body site, adding sample color bars for the subject id.
The plot_sort function sorts on a field (or set of fields) on the fly and then plots; below we chain sort_samples and plot explicitly.
We can specify the type of interactive heatmap using the gui parameter, with the following options:
In [18]:
f = tt.sort_samples('BODY_PRODUCT').plot(gui='jupyter', feature_field='taxonomy', barx_fields=['HOST_SUBJECT_ID'], clim=[0, 1000])
In [19]:
# you can save figures
f.save_figure('fig.pdf')
In [19]:
zz = tt.sort_taxonomy()
zz.sort_samples('BODY_PRODUCT').plot(gui='jupyter', barx_fields=['HOST_SUBJECT_ID'], clim=[0, 1000])
Out[19]:
In [20]:
ttt = tt.cluster_features()
In [21]:
f = ttt.sort_samples('BODY_PRODUCT').plot(gui='jupyter', barx_fields=['HOST_SUBJECT_ID', 'COMMON_SAMPLE_SITE'], clim=[0, 1000])
This is the same plot but focused on a specific region that is interesting:
In [22]:
f = ttt.sort_samples('BODY_PRODUCT').plot(gui='jupyter', barx_fields=['HOST_SUBJECT_ID', 'COMMON_SAMPLE_SITE'],
                                          rect=[-0.5, 1966.5, 2511.5, 2354.5],
                                          clim=[0, 1000], feature_field=None)
We can see there is a set of sOTUs that show up together sporadically in individual "M3" (near the bottom of the heatmap above). This pattern is difficult to see in a naive heatmap without clustering and sorting the features.
In [23]:
tt = feces.filter_samples('HOST_SUBJECT_ID', 'M3')
tt = tt.cluster_features(50)
In [24]:
dd = tt.correlation('DAYS_SINCE_EXPERIMENT_START', random_seed=2018)
We get 259 features with a significant correlation after FDR control.
Note that the features are sorted by effect size (largest correlation at the top, smallest at the bottom).
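For reference, the classic Benjamini-Hochberg procedure sketches what FDR control means here: p-values are ranked and compared to a rank-scaled threshold. This is illustrative only; Calour's correlation uses a permutation-based FDR method (dsFDR) rather than this textbook version.

```python
# Sketch of Benjamini-Hochberg FDR control (illustrative; Calour itself
# uses a permutation-based dsFDR procedure rather than this textbook rule).
def bh_reject(pvals, alpha=0.1):
    """Return the indices of hypotheses rejected at FDR level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0                                   # largest rank passing the threshold
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k = rank
    return {order[j] for j in range(k)}     # reject all hypotheses up to rank k

print(sorted(bh_reject([0.001, 0.02, 0.04, 0.5], alpha=0.1)))  # [0, 1, 2]
```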
In [25]:
dd.plot(feature_field=None, gui='jupyter')
Out[25]:
In [26]:
ttt = tt.filter_prevalence(0.3)
In [27]:
ttt = ttt.sort_centroid()
In [28]:
ttt.sample_metadata.HOST_SUBJECT_ID.value_counts()
Out[28]:
In [29]:
f = ttt.plot(gui='jupyter', feature_field=None, clim=[0, 1000])
In [30]:
tt = feces.sort_abundance({'HOST_SUBJECT_ID': ['M3']})
In [31]:
tt.plot(sample_field='HOST_SUBJECT_ID', feature_field=None, gui='jupyter')
Out[31]:
In [32]:
m3s = hand.filter_samples('HOST_SUBJECT_ID', 'M3')
m3s = m3s.normalize_compositional()
m3s = m3s.sort_samples('DAYS_SINCE_EPOCH')
dd = m3s.diff_abundance('COMMON_SAMPLE_SITE', 'L_palm', random_seed=2018)
dd.sort_samples('COMMON_SAMPLE_SITE').plot(sample_field='COMMON_SAMPLE_SITE', gui='jupyter', title='Host M3')
Out[32]:
In [33]:
f4s = hand.filter_samples('HOST_SUBJECT_ID', 'F4')
f4s = f4s.sort_samples('DAYS_SINCE_EPOCH')
dd = f4s.diff_abundance('COMMON_SAMPLE_SITE', 'L_palm', random_seed=2018)
dd.sort_samples('COMMON_SAMPLE_SITE').plot(sample_field='COMMON_SAMPLE_SITE', gui='jupyter')
Out[33]: