This tutorial walks through an analysis of fungal ITS amplicon data, comparing the composition of 9 soil communities using open-reference OTU picking against the QIIME/UNITE reference OTUs (alpha version 12_11). A link to the latest version of the reference OTUs is available on the QIIME resources page.
First we initialize our environment by obtaining the necessary files.
In [ ]:
tutorial_url = "https://s3.amazonaws.com/s3-qiime_tutorial_files/its-soils-tutorial.tgz"
reference_url = "https://github.com/downloads/qiime/its-reference-otus/its_12_11_otus.tar.gz"
!wget $tutorial_url
!wget $reference_url
Now unzip these files.
In [ ]:
!tar -xzf its-soils-tutorial.tgz
!tar -xzf its_12_11_otus.tar.gz
!gunzip ./its_12_11_otus/rep_set/97_otus.fasta.gz
!gunzip ./its_12_11_otus/taxonomy/97_otu_taxonomy.txt.gz
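If the gunzip command is not available on your system, the decompression step can be approximated with Python's standard library. The sketch below assumes Python 3; the gunzip helper and the demo file names are hypothetical and are not part of QIIME or of the tutorial files.

```python
import gzip
import os
import shutil
import tempfile


def gunzip(path):
    """Decompress a .gz file next to the original and remove the original,
    mirroring the default behaviour of the gunzip command."""
    if not path.endswith(".gz"):
        raise ValueError("expected a path ending in .gz: %s" % path)
    out_path = path[:-3]
    with gzip.open(path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path)
    return out_path


# Demonstration on a throwaway file (hypothetical content).
demo_dir = tempfile.mkdtemp()
demo_gz = os.path.join(demo_dir, "example.fasta.gz")
with gzip.open(demo_gz, "wb") as handle:
    handle.write(b">seq1\nACGT\n")
demo_out = gunzip(demo_gz)
```

Calling gunzip("its_12_11_otus/rep_set/97_otus.fasta.gz") would then stand in for the first of the two gunzip commands above.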
You can then view the files in each of these directories by passing the directory name to the FileLinks function.
In [ ]:
from IPython.display import FileLink, FileLinks
FileLinks('its-soils-tutorial')
The params.txt file contains many of the details of this analysis. You can review them by clicking the link or by printing the file with cat.
In [ ]:
!cat its-soils-tutorial/params.txt
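QIIME parameters files list one setting per line in the form script_name:option_name value. To inspect the settings programmatically rather than by eye, a small parser along these lines may help. This is a minimal sketch assuming Python 3; parse_qiime_params is a hypothetical helper, and the example entries are illustrative only, so the actual params.txt may contain different lines.

```python
def parse_qiime_params(lines):
    """Parse 'script:option value' lines into {script: {option: value}}."""
    params = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        script, _, option = parts[0].partition(":")
        value = parts[1].strip() if len(parts) > 1 else ""
        params.setdefault(script, {})[option] = value
    return params


# Illustrative entries only; the real params.txt may differ.
example_text = """\
# parameters for the ITS soils analysis
assign_taxonomy:id_to_taxonomy_fp its_12_11_otus/taxonomy/97_otu_taxonomy.txt
assign_taxonomy:reference_seqs_fp its_12_11_otus/rep_set/97_otus.fasta
beta_diversity:metrics bray_curtis
"""
params = parse_qiime_params(example_text.splitlines())
print(params["beta_diversity"]["metrics"])
```

To apply it to the tutorial file, pass an open file handle, for example parse_qiime_params(open("its-soils-tutorial/params.txt")).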
The parameters that differentiate ITS analysis from the analysis of other amplicons are the two assign_taxonomy parameters, which point to the reference collection that we just downloaded, and the beta_diversity parameter, which specifies that communities are compared using the non-phylogenetic Bray-Curtis metric. We use Bray-Curtis here because we don't currently have a phylogenetic tree relating these OTUs.
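To make "non-phylogenetic" concrete: Bray-Curtis dissimilarity depends only on per-OTU counts in the two samples, so no tree is required. The sketch below uses the standard formula (the sum of absolute count differences divided by the sum of all counts) and assumes Python 3. The bray_curtis function and the sample counts are hypothetical illustrations, not QIIME's implementation.

```python
def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two {otu_id: count} dicts.

    0.0 means identical counts; 1.0 means no shared OTUs.
    """
    otus = set(sample_a) | set(sample_b)
    numerator = sum(abs(sample_a.get(o, 0) - sample_b.get(o, 0)) for o in otus)
    denominator = sum(sample_a.get(o, 0) + sample_b.get(o, 0) for o in otus)
    if denominator == 0:
        raise ValueError("both samples are empty")
    return numerator / denominator


# Hypothetical OTU counts for two soil samples.
soil_a = {"otu1": 10, "otu3": 5}
soil_b = {"otu1": 5, "otu2": 5, "otu3": 5}
distance = bray_curtis(soil_a, soil_b)
print(distance)
```

Because the metric is sensitive to differences in total counts between samples, it is usually computed on tables that have been subsampled to an even depth.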
Set up paths for the analysis steps.
In [ ]:
input_seqs_fp = "its-soils-tutorial/seqs.fna"
reference_seqs_fp = "its_12_11_otus/rep_set/97_otus.fasta"
output_dir = "ucrss_fast/"
parameters_fp = "its-soils-tutorial/params.txt"
mapping_fp = "its-soils-tutorial/map.txt"
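Since the workflow takes a few minutes to run, it can be worth confirming that the input paths exist before launching it. The following is a small optional sketch; missing_paths is a hypothetical helper, and the paths are repeated here only so that the block is self-contained.

```python
import os


def missing_paths(paths):
    """Return the subset of paths that do not exist on disk."""
    return [p for p in paths if not os.path.exists(p)]


# Same inputs as defined in the cell above.
required = [
    "its-soils-tutorial/seqs.fna",
    "its_12_11_otus/rep_set/97_otus.fasta",
    "its-soils-tutorial/params.txt",
    "its-soils-tutorial/map.txt",
]
missing = missing_paths(required)
if missing:
    print("Missing input files: " + ", ".join(missing))
else:
    print("All input files found.")
```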
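We're now ready to run the pick_open_reference_otus.py workflow. You can find a description of this process here. We pass --suppress_align_and_tree to skip the alignment and tree-building steps, since the downstream analyses in this tutorial are non-phylogenetic.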
In [ ]:
!pick_open_reference_otus.py -i $input_seqs_fp -r $reference_seqs_fp -o $output_dir -p $parameters_fp --suppress_align_and_tree
After that completes (it will take a few minutes), we'll have the OTU table with taxonomy. You can review all of the files that are created by passing the path to the output directory to the FileLinks function.
In [ ]:
FileLinks("ucrss_fast/")
You can then pass the OTU table to biom summarize-table to view a summary of the information in the OTU table.
In [ ]:
otu_table_fp = "ucrss_fast/otu_table_mc2_w_tax.biom"
summary_fp = "ucrss_fast/otu_table_mc2_w_tax_summary.txt"
!biom summarize-table -i $otu_table_fp -o $summary_fp
To view a single file (in this case, the output table summary that we created above), use FileLink.
In [ ]:
FileLink(summary_fp)
Next we'll run a few representative analyses on the OTU table. First we'll compute beta diversity and generate PCoA plots, and then we'll generate taxonomic summaries of the samples. Note that for the beta diversity analysis we choose an even sampling depth of 287 based on the results of biom summarize-table.
In [ ]:
bdiv_output_dir = "ucrss_fast/bdiv_even287/"
!beta_diversity_through_plots.py -i $otu_table_fp -o $bdiv_output_dir -e 287 -p $parameters_fp -m $mapping_fp
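The even sampling depth passed with -e means that each sample is randomly subsampled to that many sequences before distances are computed, and samples with fewer sequences are excluded. The sketch below illustrates the idea in plain Python 3. The rarefy function and the sample counts are hypothetical, and this is not QIIME's own implementation.

```python
import random


def rarefy(otu_counts, depth, seed=0):
    """Subsample {otu_id: count} without replacement to exactly `depth`
    sequences. Returns None when the sample has fewer than `depth`."""
    total = sum(otu_counts.values())
    if total < depth:
        return None
    # Expand counts into one entry per sequence, then draw `depth` of them.
    pool = [otu for otu, n in otu_counts.items() for _ in range(n)]
    picked = random.Random(seed).sample(pool, depth)
    result = {}
    for otu in picked:
        result[otu] = result.get(otu, 0) + 1
    return result


# Hypothetical per-sample OTU counts.
samples = {
    "soil_A": {"otu1": 200, "otu2": 150, "otu3": 50},   # 400 sequences
    "soil_B": {"otu1": 100, "otu2": 87, "otu3": 100},   # exactly 287
    "soil_C": {"otu1": 90, "otu2": 10},                 # 100: dropped
}
even = {name: rarefy(counts, 287) for name, counts in samples.items()}
kept = sorted(name for name, table in even.items() if table is not None)
print(kept)
```

Choosing the depth is a trade-off: a higher value keeps more sequences per sample but excludes more samples, which is why the per-sample counts from biom summarize-table are a useful guide.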
In [ ]:
FileLinks("ucrss_fast/bdiv_even287/")
The primary files of interest here are the 2D and 3D PCoA plots. You can view those by clicking the corresponding links.
In [ ]:
taxa_output_dir = 'ucrss_fast/taxa_plots/'
!summarize_taxa_through_plots.py -i $otu_table_fp -o $taxa_output_dir
In [ ]:
FileLinks('ucrss_fast/taxa_plots/')
Here the most interesting file is bar_charts.html. Click it to open it in a new browser window/tab.
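Conceptually, the bar charts show the relative abundance of each taxon within a sample, obtained by collapsing OTU counts at a chosen taxonomic level. The following is a rough Python 3 sketch of that collapsing step for a single sample. The summarize_taxa function, the OTU identifiers, and the taxonomy strings are all hypothetical, and the real taxonomy file may be formatted differently.

```python
def summarize_taxa(otu_counts, otu_taxonomy, level):
    """Collapse {otu_id: count} to relative abundances per taxon,
    keeping the first `level` ranks of each ';'-separated lineage."""
    totals = {}
    for otu, count in otu_counts.items():
        ranks = [r.strip() for r in otu_taxonomy[otu].split(";")]
        taxon = ";".join(ranks[:level])
        totals[taxon] = totals.get(taxon, 0) + count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {taxon: n / grand_total for taxon, n in totals.items()}


# Hypothetical counts and lineages for one sample.
counts = {"otu1": 60, "otu2": 30, "otu3": 10}
taxonomy = {
    "otu1": "k__Fungi; p__Ascomycota; c__Sordariomycetes",
    "otu2": "k__Fungi; p__Ascomycota; c__Dothideomycetes",
    "otu3": "k__Fungi; p__Basidiomycota; c__Agaricomycetes",
}
phylum_summary = summarize_taxa(counts, taxonomy, level=2)
for taxon, fraction in sorted(phylum_summary.items()):
    print(taxon, round(fraction, 3))
```

Raising the level argument splits the same counts into finer groups, which corresponds to the separate charts produced for each taxonomic level.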