Set up the data directory


In [1]:
cd ~/Desktop/SSUsearch/


/home/gjr/Desktop/SSUsearch

In [2]:
mkdir -p ./data

In [3]:
cd ./data


/home/gjr/Desktop/SSUsearch/data

Download database files


In [4]:
!wget https://s3.amazonaws.com/ssusearchdb/SSUsearch_db.tgz


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  117M  100  117M    0     0  2173k      0  0:00:55  0:00:55 --:--:-- 2380k

In [5]:
!tar -xzvf SSUsearch_db.tgz


SSUsearch_db/
SSUsearch_db/Gene_db.silva_108_rep_set.fasta
SSUsearch_db/Gene_tax.silva_taxa_family.tax
SSUsearch_db/Gene_model_org.16s_ecoli_J01695.fasta
SSUsearch_db/Gene_db_cc.greengene_97_otus.fasta
SSUsearch_db/Gene_tax_cc.greengene_97_otus.tax
SSUsearch_db/Copy_db.copyrighter.txt
SSUsearch_db/Ali_template.silva_ssu.fasta
SSUsearch_db/readme
SSUsearch_db/Ali_template.silva_lsu.fasta
SSUsearch_db/Ali_template.test.fasta
SSUsearch_db/Ali_template.test_lsu.fasta
SSUsearch_db/Gene_db.lsu_silva_rep.fasta
SSUsearch_db/Gene_db.ssu_rdp_rep.fasta
SSUsearch_db/Gene_tax.lsu_silva_rep.tax
SSUsearch_db/Gene_tax.ssu_rdp_rep.tax
SSUsearch_db/Hmm.lsu.hmm
SSUsearch_db/clean.sh
SSUsearch_db/Hmm.ssu.hmm
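
After extraction, it can be worth confirming that the database files actually landed on disk before moving on. The following is a minimal sketch, not part of SSUsearch itself: `missing_files` is a hypothetical helper, and `EXPECTED` is just a subset of the names shown in the tar listing above, chosen for illustration.

```python
import os

# File names copied from the tar listing above (a subset, for illustration only).
EXPECTED = [
    "Hmm.ssu.hmm",
    "Gene_db.silva_108_rep_set.fasta",
    "Gene_tax.silva_taxa_family.tax",
    "Ali_template.silva_ssu.fasta",
    "Copy_db.copyrighter.txt",
]


def missing_files(db_dir, expected=EXPECTED):
    """Return the names in `expected` that are not regular files under db_dir."""
    return [name for name in expected
            if not os.path.isfile(os.path.join(db_dir, name))]


if __name__ == "__main__":
    # Assumes the current directory is the data directory created earlier.
    missing = missing_files("SSUsearch_db")
    print("missing:", missing if missing else "none")
```

If any names are reported missing, re-running the download and extraction cells is the simplest fix.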

Download a small test dataset

Note: for a real (larger) dataset, make sure there is enough disk space.
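
One way to check free disk space from inside the notebook is Python's standard `shutil.disk_usage`. This is a minimal sketch: `has_free_space` is a hypothetical helper, and the 1 GiB threshold is an arbitrary assumption, so adjust it to the size of your own dataset.

```python
import shutil


def has_free_space(path, required_bytes):
    """Return True if the filesystem holding `path` has at least required_bytes free."""
    return shutil.disk_usage(path).free >= required_bytes


# Hypothetical threshold (1 GiB); not a requirement stated by SSUsearch.
REQUIRED = 1 * 1024 ** 3

if __name__ == "__main__":
    free_gib = shutil.disk_usage(".").free / 1024 ** 3
    print("free space: %.1f GiB" % free_gib)
    if not has_free_space(".", REQUIRED):
        print("warning: less than the assumed 1 GiB is free")
```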


In [9]:
!wget https://s3.amazonaws.com/ssusearchdb/test.tgz
!tar -xzvf test.tgz


--2015-10-25 08:22:59--  http://athyra.oxli.org/~gjr/public2/misc/SSUsearch/test.tgz
Resolving athyra.oxli.org (athyra.oxli.org)... 35.8.120.27
Connecting to athyra.oxli.org (athyra.oxli.org)|35.8.120.27|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9952 (9.7K) [application/x-gzip]
Saving to: `test.tgz.1'

100%[======================================>] 9,952       --.-K/s   in 0.004s  

2015-10-25 08:22:59 (2.61 MB/s) - `test.tgz.1' saved [9952/9952]

test/
test/SS.design
test/data/
test/data/1c.fa
test/data/1d.fa
test/data/2d.fa
test/data/2c.fa

In [10]:
ls test/data/


1c.fa  1d.fa  2c.fa  2d.fa
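
As a quick sanity check on the test data, the number of sequences in each FASTA file can be counted by counting header lines (those starting with `>`). This is a minimal sketch with a hypothetical helper, `count_fasta_records`; it assumes the files are plain, uncompressed FASTA.

```python
import glob
import os


def count_fasta_records(path):
    """Count sequences in a FASTA file by counting lines that start with '>'."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


if __name__ == "__main__":
    # Assumes the current directory is the data directory containing test/.
    for path in sorted(glob.glob(os.path.join("test", "data", "*.fa"))):
        print(path, count_fasta_records(path))
```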

This tutorial assumes that you have already finished quality trimming and, if your paired-end reads overlap, paired-end merging.

For quality trimming, we recommend Trimmomatic (written in Java) or fastq-mcf (written in C).

For merging paired-end reads, we recommend PANDAseq or FLASH.

