Set up data directory

In [2]:
cd /usr/local/notebooks


In [3]:
mkdir -p ./data

In [4]:
cd ./data


Download database files

In [5]:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  117M  100  117M    0     0   9.9M      0  0:00:11  0:00:11 --:--:-- 10.4M

In [6]:
!tar -xzvf SSUsearch_db.tgz

x SSUsearch_db/
x SSUsearch_db/Gene_db.silva_108_rep_set.fasta
x SSUsearch_db/
x SSUsearch_db/Gene_model_org.16s_ecoli_J01695.fasta
x SSUsearch_db/Gene_db_cc.greengene_97_otus.fasta
x SSUsearch_db/
x SSUsearch_db/Copy_db.copyrighter.txt
x SSUsearch_db/Ali_template.silva_ssu.fasta
x SSUsearch_db/readme
x SSUsearch_db/Ali_template.silva_lsu.fasta
x SSUsearch_db/Ali_template.test.fasta
x SSUsearch_db/Ali_template.test_lsu.fasta
x SSUsearch_db/Gene_db.lsu_silva_rep.fasta
x SSUsearch_db/Gene_db.ssu_rdp_rep.fasta
x SSUsearch_db/
x SSUsearch_db/
x SSUsearch_db/Hmm.lsu.hmm
x SSUsearch_db/
x SSUsearch_db/Hmm.ssu.hmm

Download a small test dataset

Note: for a real (larger) dataset, make sure there is enough disk space.
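
One way to check free space before downloading is sketched below. It reads the available space for the current directory from `df` and compares it to a threshold. The 1 GB threshold and the variable names are illustrative assumptions, not values from this tutorial; set the threshold to match your dataset.

```shell
# Hypothetical threshold (assumption): require at least 1 GB free, in 1024-byte blocks.
required_kb=1048576

# df -Pk prints POSIX-format output in 1024-byte blocks;
# on the second line, column 4 is the available space.
avail_kb=$(df -Pk . | awk 'NR==2 {print $4}')

if [ "$avail_kb" -ge "$required_kb" ]; then
    echo "OK: ${avail_kb} KB available"
else
    echo "WARNING: only ${avail_kb} KB available, need ${required_kb} KB"
fi
```

In a notebook cell, the same check can be run under a `%%bash` cell magic.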

In [ ]:
!tar -xzvf test.tgz

In [7]:
ls test/data/

ls: test/data/: No such file or directory
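
The `ls` error above is what you would expect if `test.tgz` was never downloaded or extracted (the extraction cell above has no execution count). A minimal sketch of a guarded extraction follows; it assumes the archive is named `test.tgz` and unpacks into `test/data/`, as the cells above suggest.

```shell
# Archive name taken from the cell above; change it if yours differs.
archive=test.tgz

if [ -f "$archive" ]; then
    # Extract quietly, then list the expected data directory.
    tar -xzf "$archive"
    ls test/data/
else
    echo "Missing $archive: download it into this directory before extracting"
fi
```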

This tutorial assumes that you have already finished quality trimming and, if your paired-end reads overlap, paired-end merging.

For quality trimming, we recommend Trimmomatic (written in Java) or fastq-mcf (written in C).

For merging paired-end reads, we recommend PANDAseq or FLASH.
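
As a rough illustration of the merging step, the sketch below runs FLASH on one pair of read files. The file names, output prefix, and output directory are hypothetical placeholders, not files shipped with this tutorial. The script only calls `flash` if it is installed and the inputs exist.

```shell
# Hypothetical input files and output names (assumptions); replace with your own.
r1=sample_R1.fastq
r2=sample_R2.fastq
prefix=sample_merged
outdir=merged

if ! command -v flash >/dev/null 2>&1; then
    status="flash not installed"
elif [ ! -f "$r1" ] || [ ! -f "$r2" ]; then
    status="input reads not found"
elif flash -o "$prefix" -d "$outdir" "$r1" "$r2"; then
    # -o sets the prefix of the output files, -d the output directory.
    status="merged"
else
    status="flash failed"
fi
echo "$status"
```

Check the FLASH manual for overlap-related options suited to your read length and library before using this on real data.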

In [ ]: