This notebook contains all of the code from the corresponding post on the One Codex Blog. These snippets are exactly what appears in the blog post, so you can use them to reproduce those figures exactly.
This notebook is meant as a starting point for analyzing your own samples. You can copy it straight into your account using the button in the header. To run (execute) a cell, press Shift + Enter.
A few other resources you may find useful: notes on getting started with our One Codex library; the full (more technical) documentation for our API; a cheat sheet on Pandas, a Python library for data manipulation; and a few of our blog posts, where we plan to share demos built with these notebooks. As always, feel free to send us questions or suggestions by clicking the chat icon in the bottom-right corner!
Now we're going to dive right in and start crunching some numbers!
In [ ]:
from onecodex import Api
ocx = Api()
project = ocx.Projects.get("d53ad03b010542e3") # get DIABIMMUNE project by ID
samples = ocx.Samples.where(project=project.id, public=True, limit=50)
samples.metadata[[
    "gender",
    "host_age",
    "geo_loc_name",
    "totalige",
    "eggs",
    "vegetables",
    "milk",
    "wheat",
    "rice",
]]
In [ ]:
chao1 = samples.plot_metadata(vaxis="chao1", haxis="geo_loc_name", return_chart=True)
simpson = samples.plot_metadata(vaxis="simpson", haxis="geo_loc_name", return_chart=True)
shannon = samples.plot_metadata(vaxis="shannon", haxis="geo_loc_name", return_chart=True)
chao1 | simpson | shannon
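For reference, here is a minimal sketch of the three alpha-diversity metrics plotted above, written in plain Python from per-taxon read counts. These are textbook formulas: Shannon entropy with the natural log, the Gini-Simpson form (1 minus the sum of squared proportions), and Chao1 with the bias-corrected fallback when there are no doubletons. One Codex's own implementation may differ in details such as log base or bias correction, so treat this as an illustration rather than a reproduction of the plotted values.

```python
import math
from collections import Counter


def shannon(counts):
    """Shannon entropy H = -sum(p_i * ln(p_i)); natural log is an assumption."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Gini-Simpson index 1 - sum(p_i^2): chance two random reads come from different taxa."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def chao1(counts):
    """Chao1 richness estimate from integer read counts.

    Classic form S_obs + F1^2 / (2 * F2); when there are no doubletons
    (F2 == 0) it falls back to the bias-corrected S_obs + F1 * (F1 - 1) / 2.
    """
    observed = [c for c in counts if c > 0]
    freq = Counter(observed)          # how many taxa were seen exactly k times
    f1, f2 = freq[1], freq[2]         # singletons and doubletons
    s_obs = len(observed)
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2


print(chao1([10, 5, 2, 1, 1]))  # 7.0
print(simpson([1, 1, 1, 1]))    # 0.75
```

For counts `[10, 5, 2, 1, 1]`, five observed taxa with two singletons and one doubleton give a Chao1 estimate of 5 + 2²/(2 × 1) = 7, i.e. two more taxa than were actually observed.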
In [ ]:
from onecodex.notebooks.report import *
ref_text = 'Roo, et al. "How to Python." Nature, 2019.'
legend(f"Alpha diversity by location of birth{reference(text=ref_text, label='roo1')}")
In [ ]:
samples.plot_metadata(haxis="host_age", vaxis="Bacteroides", plot_type="scatter")
Here, we convert the samples into a pandas DataFrame, slice it to keep only the samples from a single study subject (P014839), sort them by host age, and generate a stacked bar plot. It's nice to see the expected high abundance of Bifidobacterium early in life giving way to Bacteroides near age three!
In [ ]:
# generate a dataframe containing relative abundances
df_rel = samples.to_df(rank="genus")
# fetch all samples for subject P014839
subject_metadata = samples.metadata.loc[samples.metadata["host_subject_id"] == "P014839"]
subject_df = df_rel.loc[subject_metadata.index]
# put them in order of sample date
subject_df = subject_df.loc[subject_metadata["host_age"].sort_values().index]
# you can access our library using the ocx accessor on pandas dataframes!
subject_df.ocx.plot_bargraph(
    rank="genus",
    label=lambda metadata: str(metadata["host_age"]),
    title="Subject P014839 Over Time",
    xlabel="Host Age at Sampling Time (days)",
    ylabel="Relative Abundance",
    legend="Genus",
)
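The boolean mask and reorder in the cell above can be mirrored without pandas, which makes the two steps explicit. In this sketch a dictionary of toy rows stands in for `samples.metadata`; the sample IDs, subject IDs and ages are hypothetical, not real DIABIMMUNE values, and `subject_timeline` is an illustrative helper name.

```python
# Toy stand-in for samples.metadata (hypothetical IDs and ages, not real data).
metadata = {
    "sample_1": {"host_subject_id": "subject_A", "host_age": 400},
    "sample_2": {"host_subject_id": "subject_B", "host_age": 120},
    "sample_3": {"host_subject_id": "subject_A", "host_age": 90},
    "sample_4": {"host_subject_id": "subject_A", "host_age": 250},
}


def subject_timeline(metadata, subject_id):
    """Return one subject's sample IDs ordered from youngest to oldest host age."""
    # Equivalent of metadata.loc[metadata["host_subject_id"] == subject_id]
    ids = [sid for sid, row in metadata.items() if row["host_subject_id"] == subject_id]
    # Equivalent of reindexing on metadata["host_age"].sort_values().index
    return sorted(ids, key=lambda sid: metadata[sid]["host_age"])


print(subject_timeline(metadata, "subject_A"))  # ['sample_3', 'sample_4', 'sample_1']
```

Sorting the sample IDs by age before plotting is what puts the bars in chronological order along the x-axis.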
In [ ]:
df_rel[:30].ocx.plot_heatmap(legend="Relative Abundance", tooltip="geo_loc_name")
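The heatmap is drawn from `df_rel`, which the notebook's comment describes as relative abundances. Conceptually, that means dividing each taxon's reads by the sample's total reads, so every sample's row sums to 1. The sketch below shows that step on hypothetical counts; `to_relative` is an illustrative helper, it assumes every sample has at least one read, and One Codex's `to_df` may handle details such as unclassified reads differently.

```python
def to_relative(counts_by_sample):
    """Divide each sample's read counts by that sample's total read count."""
    relative = {}
    for sample, counts in counts_by_sample.items():
        total = sum(counts.values())  # assumes total > 0
        relative[sample] = {taxon: count / total for taxon, count in counts.items()}
    return relative


# Hypothetical read counts for two samples (not real data).
counts = {
    "sample1": {"GenusA": 30, "GenusB": 10},
    "sample2": {"GenusA": 5, "GenusB": 15},
}
print(to_relative(counts))
# {'sample1': {'GenusA': 0.75, 'GenusB': 0.25}, 'sample2': {'GenusA': 0.25, 'GenusB': 0.75}}
```

Normalizing this way lets samples sequenced to very different depths share one color scale.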
In [ ]:
# generate a dataframe containing read counts
df_abs = samples.to_df()
df_abs[:30].ocx.plot_distance(metric="weighted_unifrac")
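Weighted UniFrac compares two samples on a tree. For every branch, it multiplies the branch length by the difference between the fractions of each sample's reads that descend from that branch, then sums over branches. The sketch below implements that unnormalized form on a hypothetical four-node tree with made-up branch lengths. How One Codex builds its tree and whether it normalizes the result are not covered here, so these numbers are illustrative only; `branch_fractions` and `weighted_unifrac` are hypothetical helper names.

```python
def branch_fractions(counts, parent):
    """Fraction of a sample's reads descending from each node's branch."""
    total = sum(counts.values())
    fractions = {}
    for tip, count in counts.items():
        node = tip
        # Walk from the tip up to the root, crediting every branch passed.
        while node in parent:
            fractions[node] = fractions.get(node, 0.0) + count / total
            node = parent[node][0]
    return fractions


def weighted_unifrac(counts_a, counts_b, parent):
    """Unnormalized weighted UniFrac: sum of length * |fraction_A - fraction_B|."""
    frac_a = branch_fractions(counts_a, parent)
    frac_b = branch_fractions(counts_b, parent)
    return sum(
        length * abs(frac_a.get(node, 0.0) - frac_b.get(node, 0.0))
        for node, (_, length) in parent.items()
    )


# Hypothetical tree: node -> (parent, branch length). "root" has no entry.
parent = {
    "TaxonA": ("clade1", 1.0),
    "TaxonB": ("clade1", 1.0),
    "clade1": ("root", 1.0),
    "TaxonC": ("root", 2.0),
}

print(weighted_unifrac({"TaxonA": 10}, {"TaxonB": 10}, parent))  # 2.0 (sister taxa)
print(weighted_unifrac({"TaxonA": 10}, {"TaxonC": 5}, parent))   # 4.0 (distant taxa)
```

The two single-taxon comparisons share no taxa, yet the sister taxa score closer because their reads sit on neighboring branches. Each sample is also divided by its own total, so read depth alone does not change the distance.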
In [ ]:
samples.plot_pca(color="geo_loc_name", size="Bifidobacterium", title="My PCA Plot")
In [ ]:
samples.plot_mds(
    metric="weighted_unifrac", method="pcoa", color="geo_loc_name", title="My PCoA Plot"
)
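PCoA (classical multidimensional scaling) turns a distance matrix into coordinates. It squares the distances, double-centers the matrix, and takes the leading eigenvectors scaled by the square roots of their eigenvalues. Below is a minimal pure-Python sketch of the first axis that uses power iteration in place of a full eigendecomposition. `pcoa_first_axis` is a hypothetical helper, and it assumes the dominant eigenvalue is positive. That holds for Euclidean distances but not necessarily for every metric, including weighted UniFrac.

```python
import math


def pcoa_first_axis(dist, iterations=200):
    """First principal coordinate of a symmetric distance matrix.

    Assumes the dominant eigenvalue of the double-centered matrix is
    positive. The sign of the returned axis is arbitrary.
    """
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    # Row means equal column means because the matrix is symmetric.
    means = [sum(row) / n for row in sq]
    grand = sum(means) / n
    # Double-centering: B = -1/2 * J D^2 J, written element by element.
    b = [[-0.5 * (sq[i][j] - means[i] - means[j] + grand) for j in range(n)]
         for i in range(n)]
    # Power iteration; the start vector must not be orthogonal to the answer.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * n  # B v vanished, e.g. all points coincide
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the unit vector v.
    eigenvalue = sum(v[i] * b[i][j] * v[j] for i in range(n) for j in range(n))
    return [x * math.sqrt(max(eigenvalue, 0.0)) for x in v]


print(pcoa_first_axis([[0, 1, 3], [1, 0, 2], [3, 2, 0]]))
```

For three points on a line at positions 0, 1 and 3, the first axis recovers their centered positions -4/3, -1/3 and 5/3, up to sign and floating-point rounding. With Euclidean distances like these, PCoA gives the same scores as PCA on the centered data, up to sign.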
In [ ]:
page_break()
In [ ]:
bibliography()