This notebook illustrates the commands in the Illumina Overview Tutorial, as well as some features that are convenient for working with the IPython Notebook and QIIME.
This tutorial makes use of the 12_10 release of the Greengenes reference OTUs. You can always find a link to the latest version of the reference OTUs on the QIIME resources page.
In [1]:
# need to reset this if running notebook multiple times
base_dir = 'qiime_test'
!rm -rf $base_dir
!mkdir $base_dir
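The reset above relies on shell commands. A minimal pure-Python sketch of the same reset, using only the standard library, might look like the following. The function name `reset_dir` is illustrative and is not part of QIIME or the original tutorial.

```python
import os
import shutil


def reset_dir(path):
    """Delete `path` if it exists, then recreate it as an empty directory."""
    # ignore_errors=True makes this a no-op when the directory is absent,
    # mirroring the behavior of `rm -rf`
    shutil.rmtree(path, ignore_errors=True)
    os.makedirs(path)
    return path
```

Calling `reset_dir(base_dir)` would then stand in for the `!rm -rf` and `!mkdir` pair.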
In [2]:
!wget https://s3.amazonaws.com/qiime-tutorial/moving_pictures_tutorial.tgz
!wget ftp://greengenes.microbio.me/greengenes_release/gg_12_10/gg_12_10_otus.tar.gz
!tar -xzf moving_pictures_tutorial.tgz -C $base_dir
!tar -xzf gg_12_10_otus.tar.gz -C $base_dir
!rm moving_pictures_tutorial.tgz gg_12_10_otus.tar.gz
!wget -O $base_dir/core_set_aligned.fasta.imputed http://greengenes.lbl.gov/Download/Sequence_Data/Fasta_data_files/core_set_aligned.fasta.imputed
!wget -O $base_dir/lanemask_in_1s_and_0s http://greengenes.lbl.gov/Download/Sequence_Data/lanemask_in_1s_and_0s
In [3]:
# get qiime dependencies and set env for running in KBase VM
import os
from os import environ, chdir, mkdir
from os.path import join
nb_dir = os.getcwd()
environ['RDP_JAR_PATH'] = '/kb/runtime/rdp_classifier_2.2/rdp_classifier-2.2.jar'
environ['QIIME_CONFIG_FP'] = join(nb_dir, '.qiime_config')
# create qiime config (tab-delimited key/value pairs, one per line)
with open('.qiime_config', 'w') as qconf:
    qconf.write("%s\t%s\n" % ('qiime_scripts_dir', '/usr/local/bin'))
    qconf.write("%s\t%s\n" % ('temp_dir', join(nb_dir, 'tmp')))
    qconf.write("%s\t%s\n" % ('pynast_template_alignment_fp', join(nb_dir, base_dir, 'core_set_aligned.fasta.imputed')))
    qconf.write("%s\t%s\n" % ('template_alignment_lanemask_fp', join(nb_dir, base_dir, 'lanemask_in_1s_and_0s')))
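To check what was written, the config file can be read back into a dictionary. This is a minimal sketch that assumes the tab-delimited key/value layout written above; `read_qiime_config` is a hypothetical helper, not QIIME's own config parser.

```python
def read_qiime_config(path):
    """Parse a tab-delimited key/value config file into a dict."""
    settings = {}
    with open(path) as handle:
        for line in handle:
            line = line.rstrip('\n')
            # skip blank lines and comment lines
            if not line.strip() or line.startswith('#'):
                continue
            # split on the first tab only, so values may contain tabs
            key, _, value = line.partition('\t')
            settings[key] = value
    return settings
```

For example, `read_qiime_config('.qiime_config')['temp_dir']` should return the temporary directory path set in the cell above.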
In [4]:
# these are only available in the current development branch of IPython
from IPython.display import FileLinks, FileLink
otu_base = join(base_dir, "gg_12_10_otus")
reference_seqs = join(otu_base, "rep_set/97_otus.fasta")
reference_tree = join(otu_base, "trees/97_otus.tree")
reference_tax = join(otu_base, "taxonomy/97_otu_taxonomy.txt")
Start by listing the files in our tutorial directory. We can do this using ls as we would on the command line, but here we prefix it with an ! to tell IPython that we're issuing a bash (i.e., command line) command rather than a Python command.
In [5]:
data_dir = join(base_dir, 'moving_pictures_tutorial')
!ls $data_dir
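The `!` prefix and the `$variable` expansion are IPython conveniences and are not available in a plain Python script. A rough standard-library equivalent of `!ls $data_dir` is sketched below; `list_dir` is an illustrative name, and the ordering may differ slightly from what `ls` prints on a given system.

```python
import os


def list_dir(path):
    """Return the sorted entry names in `path`, hiding dotfiles like plain `ls`."""
    return sorted(name for name in os.listdir(path)
                  if not name.startswith('.'))
```

In the notebook, `list_dir(data_dir)` would return the same file names as a Python list, which is convenient when later cells need to loop over them.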
QIIME additionally supports more convenient output formatting for the IPython notebook, so you can directly interact with or download your data.
In [6]:
FileLinks(base_dir)
Out[6]:
In [7]:
!check_id_map.py -o $base_dir/cid/ -m $data_dir/filtered_mapping_l1.txt
In this case there were no errors, but if there were, we would review the resulting HTML summary to find out what they are. You could then fix them in a spreadsheet program or text editor. To view that HTML file, call FileLinks on the output directory from the previous step and click the link to the html file.
In [8]:
FileLinks(base_dir+'/cid/')
Out[8]:
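When an output directory holds many files, it can help to pick out only the HTML reports before clicking through them. The following is a minimal sketch using the standard library; `find_html` is a hypothetical helper and not part of QIIME or IPython.

```python
import os


def find_html(root):
    """Return sorted paths of all .html files found under `root`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # case-insensitive match on the file extension
            if name.lower().endswith('.html'):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

Each returned path could then be passed to FileLink to produce a clickable link in the notebook.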
In [9]:
!split_libraries_fastq.py -o $base_dir/slout/ -i $data_dir/subsampled_fastq/subsampled_s_1_sequence.fastq,$data_dir/subsampled_fastq/subsampled_s_2_sequence.fastq,$data_dir/subsampled_fastq/subsampled_s_3_sequence.fastq,$data_dir/subsampled_fastq/subsampled_s_4_sequence.fastq,$data_dir/subsampled_fastq/subsampled_s_5_sequence.fastq,$data_dir/subsampled_fastq/subsampled_s_6_sequence.fastq -b $data_dir/subsampled_fastq/subsampled_s_1_sequence_barcodes.fastq,$data_dir/subsampled_fastq/subsampled_s_2_sequence_barcodes.fastq,$data_dir/subsampled_fastq/subsampled_s_3_sequence_barcodes.fastq,$data_dir/subsampled_fastq/subsampled_s_4_sequence_barcodes.fastq,$data_dir/subsampled_fastq/subsampled_s_5_sequence_barcodes.fastq,$data_dir/subsampled_fastq/subsampled_s_6_sequence_barcodes.fastq -m $data_dir/filtered_mapping_l1.txt,$data_dir/filtered_mapping_l2.txt,$data_dir/filtered_mapping_l3.txt,$data_dir/filtered_mapping_l4.txt,$data_dir/filtered_mapping_l5.txt,$data_dir/filtered_mapping_l6.txt
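The command above repeats the same three file name patterns for lanes 1 through 6, which is easy to mistype. One way to build the comma-separated arguments programmatically is sketched below. It assumes the file naming used in this tutorial's data directory, and `lane_args` is an illustrative helper rather than anything provided by QIIME.

```python
from os.path import join


def lane_args(data_dir, lanes=range(1, 7)):
    """Build comma-separated sequence, barcode, and mapping file lists."""
    fastq_dir = join(data_dir, 'subsampled_fastq')
    seqs = [join(fastq_dir, 'subsampled_s_%d_sequence.fastq' % n)
            for n in lanes]
    barcodes = [join(fastq_dir, 'subsampled_s_%d_sequence_barcodes.fastq' % n)
                for n in lanes]
    mappings = [join(data_dir, 'filtered_mapping_l%d.txt' % n)
                for n in lanes]
    # the three lists share the same lane order, so entries line up by position
    return ','.join(seqs), ','.join(barcodes), ','.join(mappings)
```

After `seqs, barcodes, mappings = lane_args(data_dir)`, the three strings can be passed to the script through IPython's variable expansion as `-i $seqs -b $barcodes -m $mappings`.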
We often want to see the results of running a command. Here we can do that by calling our output formatter again, this time passing the output directory from the previous step.
In [10]:
FileLinks(base_dir+'/slout/')
Out[10]:
In [11]:
!count_seqs.py -i $base_dir/slout/seqs.fna
In [12]:
!pick_subsampled_reference_otus_through_otu_table.py -o $base_dir/ucrss_fast/ -i $base_dir/slout/seqs.fna -r $reference_seqs -p $data_dir/ucrC_fast_params.txt
In [13]:
FileLinks(base_dir+'/ucrss_fast/')
Out[13]:
In [14]:
!print_biom_table_summary.py -i $base_dir/ucrss_fast/otu_table_mc2_w_tax_no_pynast_failures.biom
In [15]:
!merge_mapping_files.py -o $base_dir/combined_mapping_file.txt -m $data_dir/filtered_mapping_l1.txt,$data_dir/filtered_mapping_l2.txt,$data_dir/filtered_mapping_l3.txt,$data_dir/filtered_mapping_l4.txt,$data_dir/filtered_mapping_l5.txt,$data_dir/filtered_mapping_l6.txt
In [16]:
!head $base_dir/combined_mapping_file.txt
To view a single file (rather than a directory) we use the FileLink function instead of the FileLinks function.
In [17]:
FileLink(base_dir+'/combined_mapping_file.txt')
Out[17]:
Here we're running the core_diversity_analyses.py script, which applies many of the "first-pass" diversity analyses that users are generally interested in. The main output that users will interact with is the index.html file, which provides links into the different analysis results.
In [18]:
!core_diversity_analyses.py -o $base_dir/cd258/ -i $base_dir/ucrss_fast/otu_table_mc2_w_tax_no_pynast_failures.biom -m $base_dir/combined_mapping_file.txt -t $base_dir/ucrss_fast/rep_set.tre -e 258 -c "SampleType,days_since_epoch"
In [19]:
FileLink(base_dir+'/cd258/index.html')
Out[19]: