Notebook preparation

This notebook illustrates the commands in the Illumina Overview Tutorial, as well as some features that are convenient for working with the IPython Notebook and QIIME.

This tutorial makes use of the 12_10 release of the Greengenes reference OTUs. You can always find a link to the latest version of the reference OTUs on the QIIME resources page.

Getting started

We'll begin by downloading data and initializing some variables to configure our IPython computing environment.



In [1]:

    
# need to reset this if running notebook multiple times
base_dir = 'qiime_test'
!rm -rf $base_dir
!mkdir $base_dir



In [2]:

    
!wget https://s3.amazonaws.com/qiime-tutorial/moving_pictures_tutorial.tgz
!wget ftp://greengenes.microbio.me/greengenes_release/gg_12_10/gg_12_10_otus.tar.gz
!tar -xzf moving_pictures_tutorial.tgz -C $base_dir
!tar -xzf gg_12_10_otus.tar.gz -C $base_dir
!rm moving_pictures_tutorial.tgz gg_12_10_otus.tar.gz
!wget -O $base_dir/core_set_aligned.fasta.imputed http://greengenes.lbl.gov/Download/Sequence_Data/Fasta_data_files/core_set_aligned.fasta.imputed
!wget -O $base_dir/lanemask_in_1s_and_0s http://greengenes.lbl.gov/Download/Sequence_Data/lanemask_in_1s_and_0s









    



--2013-04-08 19:27:45--  https://s3.amazonaws.com/qiime-tutorial/moving_pictures_tutorial.tgz
Resolving s3.amazonaws.com... 72.21.214.199
Connecting to s3.amazonaws.com|72.21.214.199|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 29124406 (28M) [application/x-compressed]
Saving to: `moving_pictures_tutorial.tgz'

100%[======================================>] 29,124,406  2.66M/s   in 11s     

2013-04-08 19:27:56 (2.60 MB/s) - `moving_pictures_tutorial.tgz' saved [29124406/29124406]

--2013-04-08 19:27:56--  ftp://greengenes.microbio.me/greengenes_release/gg_12_10/gg_12_10_otus.tar.gz
           => `gg_12_10_otus.tar.gz'
Resolving greengenes.microbio.me... 128.138.93.17
Connecting to greengenes.microbio.me|128.138.93.17|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /greengenes_release/gg_12_10 ... done.
==> SIZE gg_12_10_otus.tar.gz ... 135667617
==> PASV ... done.    ==> RETR gg_12_10_otus.tar.gz ... done.
Length: 135667617 (129M) (unauthoritative)

100%[======================================>] 135,667,617 48.9M/s   in 2.6s    

2013-04-08 19:28:00 (48.9 MB/s) - `gg_12_10_otus.tar.gz' saved [135667617]

--2013-04-08 19:28:08--  http://greengenes.lbl.gov/Download/Sequence_Data/Fasta_data_files/core_set_aligned.fasta.imputed
Resolving greengenes.lbl.gov... 128.32.248.7
Connecting to greengenes.lbl.gov|128.32.248.7|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 37975992 (36M) [text/plain]
Saving to: `qiime_test/core_set_aligned.fasta.imputed'

100%[======================================>] 37,975,992   634K/s   in 56s     

2013-04-08 19:29:05 (666 KB/s) - `qiime_test/core_set_aligned.fasta.imputed' saved [37975992/37975992]

--2013-04-08 19:29:05--  http://greengenes.lbl.gov/Download/Sequence_Data/lanemask_in_1s_and_0s
Resolving greengenes.lbl.gov... 128.32.248.7
Connecting to greengenes.lbl.gov|128.32.248.7|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7683 (7.5K) [text/plain]
Saving to: `qiime_test/lanemask_in_1s_and_0s'

100%[======================================>] 7,683       --.-K/s   in 0s      

2013-04-08 19:29:05 (454 MB/s) - `qiime_test/lanemask_in_1s_and_0s' saved [7683/7683]



In [3]:

    
# get qiime dependcies and set env for running in KBase VM
import os
from os import environ, chdir, mkdir
from os.path import join
nb_dir = os.getcwd()
environ['RDP_JAR_PATH'] = '/kb/runtime/rdp_classifier_2.2/rdp_classifier-2.2.jar'
environ['QIIME_CONFIG_FP'] = join(nb_dir, '.qiime_config')
# create qiime config
qconf = open('.qiime_config', 'w')
qconf.write("%s\t%s\n"%('qiime_scripts_dir', '/usr/local/bin'))
qconf.write("%s\t%s\n"%('temp_dir', join(nb_dir, 'tmp')))
qconf.write("%s\t%s\n"%('pynast_template_alignment_fp', join(nb_dir, base_dir, 'core_set_aligned.fasta.imputed')))
qconf.write("%s\t%s\n"%('template_alignment_lanemask_fp', join(nb_dir, base_dir, 'lanemask_in_1s_and_0s')))
qconf.close()



In [4]:

    
# these are only available in the current development branch of IPython
from IPython.display import FileLinks, FileLink
otu_base = join(base_dir, "gg_12_10_otus")
reference_seqs = join(otu_base, "rep_set/97_otus.fasta")
reference_tree = join(otu_base, "trees/97_otus.tree")
reference_tax = join(otu_base, "taxonomy/97_otu_taxonomy.txt")

Start by seeing what files are in our tutorial direcotry. We can do this using ls as we would on the command line, but in this case we prefix with an ! to tell IPython that we're issuing a bash (i.e., command line) command, rather than a python command.



In [5]:

    
data_dir = join(base_dir, 'moving_pictures_tutorial')
!ls $data_dir









    



454_map.txt		 filtered_mapping_l5.txt      subsampled_454_seqs.fna
filtered_mapping_l1.txt  filtered_mapping_l6.txt      subsampled_fastq
filtered_mapping_l2.txt  procrustes_metadata_map.txt  ucrC_fast_params.txt
filtered_mapping_l3.txt  procrustes_sid_map.txt
filtered_mapping_l4.txt  README.txt

QIIME additionally supports more convenient output formattting for the IPython notebook so you can directly interact with or download your data.



In [6]:

    
FileLinks(base_dir)









    Out[6]:




qiime_test/

  lanemask_in_1s_and_0s

  core_set_aligned.fasta.imputed

qiime_test/gg_12_10_otus/

  note

qiime_test/gg_12_10_otus/otus/

  88_otu_map.txt

  97_clusters.uc

  85_clusters.uc

  67_clusters.uc

  82_otu_map.txt

  91_otu_map.txt

  97_otu_map.txt

  64_otu_map.txt

  99_clusters.uc

  73_clusters.uc

  79_clusters.uc

  88_clusters.uc

  61_otu_map.txt

  76_clusters.uc

  73_otu_map.txt

  94_clusters.uc

  82_clusters.uc

  70_clusters.uc

  70_otu_map.txt

  67_otu_map.txt

  64_clusters.uc

  61_clusters.uc

  76_otu_map.txt

  79_otu_map.txt

  85_otu_map.txt

  99_otu_map.txt

  91_clusters.uc

  94_otu_map.txt

qiime_test/gg_12_10_otus/rep_set/

  91_otus.fasta

  64_otus.fasta

  79_otus.fasta

  76_otus.fasta

  97_otus.fasta

  94_otus.fasta

  73_otus.fasta

  82_otus.fasta

  85_otus.fasta

  67_otus.fasta

  61_otus.fasta

  99_otus.fasta

  88_otus.fasta

  70_otus.fasta

qiime_test/gg_12_10_otus/taxonomy/

  73_otu_taxonomy.txt

  97_otu_taxonomy.txt

  70_otu_taxonomy.txt

  85_otu_taxonomy.txt

  76_otu_taxonomy.txt

  67_otu_taxonomy.txt

  94_otu_taxonomy.txt

  64_otu_taxonomy.txt

  99_otu_taxonomy.txt

  82_otu_taxonomy.txt

  79_otu_taxonomy.txt

  61_otu_taxonomy.txt

  88_otu_taxonomy.txt

  91_otu_taxonomy.txt

qiime_test/gg_12_10_otus/trees/

  88_otus.tree

  82_otus.tree

  73_otus.tree

  76_otus.tree

  67_otus.tree

  97_otus.tree

  79_otus.tree

  70_otus.tree

  85_otus.tree

  94_otus.tree

  61_otus.tree

  99_otus.tree

  64_otus.tree

  91_otus.tree

qiime_test/moving_pictures_tutorial/

  procrustes_sid_map.txt

  filtered_mapping_l4.txt

  filtered_mapping_l5.txt

  filtered_mapping_l6.txt

  filtered_mapping_l2.txt

  454_map.txt

  filtered_mapping_l3.txt

  ucrC_fast_params.txt

  README.txt

  filtered_mapping_l1.txt

  subsampled_454_seqs.fna

  procrustes_metadata_map.txt

qiime_test/moving_pictures_tutorial/subsampled_fastq/

  subsampled_s_6_sequence_barcodes.fastq

  subsampled_s_2_sequence_barcodes.fastq

  subsampled_s_3_sequence.fastq

  subsampled_s_4_sequence.fastq

  subsampled_s_1_sequence_barcodes.fastq

  subsampled_s_6_sequence.fastq

  subsampled_s_5_sequence.fastq

  subsampled_s_4_sequence_barcodes.fastq

  subsampled_s_2_sequence.fastq

  subsampled_s_1_sequence.fastq

  subsampled_s_3_sequence_barcodes.fastq

  subsampled_s_5_sequence_barcodes.fastq

First, let's check our mapping file for errors



In [7]:

    
!check_id_map.py -o $base_dir/cid/ -m $data_dir/filtered_mapping_l1.txt









    



No errors or warnings were found in mapping file.

In this case there were no errors, but if there were we would review the resulting html summary to find out what errors are present. You could then fix those in a spreadsheet program or text editor. To view that html file, call FileLinks on the output directory from the previous step and click the link to the html file.



In [8]:

    
FileLinks(base_dir+'/cid/')









    Out[8]:




qiime_test/cid/

  overlib.js

  filtered_mapping_l1.log

  filtered_mapping_l1.html

  filtered_mapping_l1_corrected.txt

Demultiplexing and quality filtering seqeunces



In [9]:

    
!split_libraries_fastq.py -o $base_dir/slout/ -i $data_dir/subsampled_fastq/subsampled_s_1_sequence.fastq,$data_dir/subsampled_fastq/subsampled_s_2_sequence.fastq,$data_dir/subsampled_fastq/subsampled_s_3_sequence.fastq,$data_dir/subsampled_fastq/subsampled_s_4_sequence.fastq,$data_dir/subsampled_fastq/subsampled_s_5_sequence.fastq,$data_dir/subsampled_fastq/subsampled_s_6_sequence.fastq -b $data_dir/subsampled_fastq/subsampled_s_1_sequence_barcodes.fastq,$data_dir/subsampled_fastq/subsampled_s_2_sequence_barcodes.fastq,$data_dir/subsampled_fastq/subsampled_s_3_sequence_barcodes.fastq,$data_dir/subsampled_fastq/subsampled_s_4_sequence_barcodes.fastq,$data_dir/subsampled_fastq/subsampled_s_5_sequence_barcodes.fastq,$data_dir/subsampled_fastq/subsampled_s_6_sequence_barcodes.fastq -m $data_dir/filtered_mapping_l1.txt,$data_dir/filtered_mapping_l2.txt,$data_dir/filtered_mapping_l3.txt,$data_dir/filtered_mapping_l4.txt,$data_dir/filtered_mapping_l5.txt,$data_dir/filtered_mapping_l6.txt

We often want to see the results of running a command. Here we can do that by calling our output formatter again, this time passing the output directory from the previous step.



In [10]:

    
FileLinks(base_dir+'/slout/')









    Out[10]:




qiime_test/slout/

  split_library_log.txt

  seqs.fna

  histograms.txt



In [11]:

    
!count_seqs.py -i $base_dir/slout/seqs.fna









    



66189  : qiime_test/slout/seqs.fna (Sequence lengths (mean +/- std): 132.1182 +/- 9.6185)
66189  : Total

OTU picking: using an open-reference OTU picking protocol by searching reads agsint the Greengenes database.



In [12]:

    
!pick_subsampled_reference_otus_through_otu_table.py -o $base_dir/ucrss_fast/ -i $base_dir/slout/seqs.fna -r $reference_seqs -p $data_dir/ucrC_fast_params.txt



In [13]:

    
FileLinks(base_dir+'/ucrss_fast/')









    Out[13]:




qiime_test/ucrss_fast/

  otu_table_mc2_w_tax.biom

  otu_table_mc2.biom

  log_20130408192934.txt

  final_otu_map.txt

  otu_table_mc2_w_tax_no_pynast_failures.biom

  new_refseqs.fna

  final_otu_map_mc2.txt

  rep_set.fna

  rep_set.tre

qiime_test/ucrss_fast/prefilter_otus/

  seqs_failures.txt

  seqs_otus.log

  prefiltered_seqs.fna

  seqs_otus.txt

  seqs_clusters.uc

qiime_test/ucrss_fast/pynast_aligned_seqs/

  rep_set_aligned.fasta

  rep_set_log.txt

  rep_set_aligned_pfiltered.fasta

  rep_set_failures.fasta

qiime_test/ucrss_fast/rdp_assigned_taxonomy/

  rep_set_tax_assignments.txt

  rep_set_tax_assignments.log

qiime_test/ucrss_fast/step1_otus/

  prefiltered_seqs_clusters.uc

  prefiltered_seqs_otus.log

  prefiltered_seqs_failures.txt

  failures.fasta

  subsampled_failures.fasta

  step1_rep_set.fna

  prefiltered_seqs_otus.txt

qiime_test/ucrss_fast/step2_otus/

  subsampled_failures_otus.txt

  subsampled_failures_clusters.uc

  subsampled_failures_otus.log

  step2_rep_set.fna

qiime_test/ucrss_fast/step3_otus/

  failures_failures.txt

  failures_clusters.uc

  failures_otus.log

  failures_otus.txt

  failures_failures.fasta

qiime_test/ucrss_fast/step4_otus/

  failures_failures_clusters.uc

  step4_rep_set.fna

  failures_failures_otus.log

  failures_failures_otus.txt



In [14]:

    
!print_biom_table_summary.py -i $base_dir/ucrss_fast/otu_table_mc2_w_tax_no_pynast_failures.biom









    



Num samples: 61
Num observations: 1845
Total count: 63399.0
Table density (fraction of non-zero values): 0.0860
Table md5 (unzipped): f8acb7249e8f337081d5d77ab4d61cc8

Counts/sample summary:
 Min: 122.0
 Max: 2560.0
 Median: 891.0
 Mean: 1039.32786885
 Std. dev.: 758.897511846
 Sample Metadata Categories: None provided
 Observation Metadata Categories: taxonomy

Counts/sample detail:
 L3S237: 122.0
 L3S235: 173.0
 L3S372: 186.0
 L3S373: 227.0
 L3S368: 238.0
 L3S370: 242.0
 L3S367: 243.0
 L3S236: 257.0
 L3S369: 260.0
 L3S238: 271.0
 L3S9: 271.0
 L3S8: 275.0
 L3S6: 279.0
 L3S371: 283.0
 L3S7: 301.0
 L5S234: 365.0
 L5S231: 381.0
 L5S235: 414.0
 L5S99: 426.0
 L5S98: 433.0
 L5S230: 442.0
 L5S232: 457.0
 L5S233: 459.0
 L5S229: 505.0
 L5S97: 512.0
 L5S100: 522.0
 L2S236: 584.0
 L2S234: 721.0
 L2S231: 769.0
 L2S229: 844.0
 L2S233: 891.0
 L2S99: 899.0
 L2S230: 935.0
 L2S232: 957.0
 L2S100: 1029.0
 L2S97: 1050.0
 L2S235: 1059.0
 L1S4: 1079.0
 L2S98: 1097.0
 L1S3: 1206.0
 L1S1: 1312.0
 L1S131: 1556.0
 L1S134: 1574.0
 L1S132: 1699.0
 L4S146: 1748.0
 L1S129: 1753.0
 L1S130: 1766.0
 L1S135: 1766.0
 L1S133: 1891.0
 L1S2: 1899.0
 L6S100: 1994.0
 L6S101: 2040.0
 L1S136: 2061.0
 L4S144: 2127.0
 L4S145: 2147.0
 L6S102: 2206.0
 L1S287: 2316.0
 L6S103: 2367.0
 L4S147: 2441.0
 L1S290: 2512.0
 L1S289: 2560.0

Create a single mapping file from the per-lane mapping files.



In [15]:

    
!merge_mapping_files.py -o $base_dir/combined_mapping_file.txt -m $data_dir/filtered_mapping_l1.txt,$data_dir/filtered_mapping_l2.txt,$data_dir/filtered_mapping_l3.txt,$data_dir/filtered_mapping_l4.txt,$data_dir/filtered_mapping_l5.txt,$data_dir/filtered_mapping_l6.txt



In [16]:

    
!head $base_dir/combined_mapping_file.txt









    



#SampleID	BarcodeSequence	LinkerPrimerSequence	days_since_epoch	SamplePlate	timestamp	SampleType	month	PrimerPlate	WellID	year	day	subject	Description
L6S100	GTCGCTGTCTTC	GTGCCAGCMGCCGCGGTAA	14355	20	20090421	Tongue	4	5	d1	2009	21	2	RK_Tong_4_21_2009
L2S229	CAACTCATCGTA	GTGCCAGCMGCCGCGGTAA	14355	7	20090421	L_palm	4	2	e5	2009	21	1	AB_L_Palm_4_21_2009
L6S102	GTTGACGACAGC	GTGCCAGCMGCCGCGGTAA	14357	20	20090423	Tongue	4	5	f1	2009	23	2	RK_Tong_4_23_2009
L6S103	TACGCGCTGAGA	GTGCCAGCMGCCGCGGTAA	14358	20	20090424	Tongue	4	5	g1	2009	24	2	RK_Tong_4_24_2009
L3S369	ATACAGAGCTCC	GTGCCAGCMGCCGCGGTAA	14357	12	20090423	R_palm	4	2	a11	2009	23	1	AB_R_Palm_4_23_2009
L3S368	CATCTGTAGCGA	GTGCCAGCMGCCGCGGTAA	14356	12	20090422	R_palm	4	2	h10	2009	22	1	AB_R_Palm_4_22_2009
L2S100	ACGCTCATGGAT	GTGCCAGCMGCCGCGGTAA	14176	6	20081024	L_palm	10	1	d1	2008	24	1	AB_L_Palm_10_24_2008
L3S373	CACATCTAACAC	GTGCCAGCMGCCGCGGTAA	14175	12	20081023	R_palm	10	2	e11	2008	23	2	RK_R_Palm_10_23_2008
L3S370	ATCCTCAGTAGT	GTGCCAGCMGCCGCGGTAA	14358	12	20090424	R_palm	4	2	b11	2009	24	1	AB_R_Palm_4_24_2009

To view a single file (rather than a directory) we use the FileLink function instead of the FileLinks function.



In [17]:

    
FileLink(base_dir+'/combined_mapping_file.txt')









    Out[17]:




qiime_test/combined_mapping_file.txt

Run diversity analyses

Here we're running the core_diversity_analyses.py script which applies many of the "first-pass" diversity analyses that users are generally interested in. The main output that users will interact with is the index.html file, which provides links into the different analysis results.



In [18]:

    
!core_diversity_analyses.py -o $base_dir/cd258/ -i $base_dir/ucrss_fast/otu_table_mc2_w_tax_no_pynast_failures.biom -m $base_dir/combined_mapping_file.txt -t $base_dir/ucrss_fast/rep_set.tre -e 258 -c "SampleType,days_since_epoch"









    



This code was only tested with Matplotlib-1.1.0

This code was only tested with Matplotlib-1.1.0

This code was only tested with Matplotlib-1.1.0

This code was only tested with Matplotlib-1.1.0

This code was only tested with Matplotlib-1.1.0

This code was only tested with Matplotlib-1.1.0

This code was only tested with Matplotlib-1.1.0

Warning, the following samples were in the category mapping file but not the OTU table and will be ignored: 
L3S368

L3S373

L3S370

L3S367

L3S372

L3S237

L3S236

L3S235


Warning, the following samples were in the category mapping file but not the OTU table and will be ignored: 
L3S368

L3S373

L3S370

L3S367

L3S372

L3S237

L3S236

L3S235



In [19]:

    
FileLink(base_dir+'/cd258/index.html')









    Out[19]:




qiime_test/cd258/index.html



In [19]: