In [1]:
!mafft -h
/usr/local/bin/mafft: Cannot open -h.
------------------------------------------------------------------------------
MAFFT v7.187 (2014/10/02)
http://mafft.cbrc.jp/alignment/software/
MBE 30:772-780 (2013), NAR 30:3059-3066 (2002)
------------------------------------------------------------------------------
High speed:
% mafft in > out
% mafft --retree 1 in > out (fast)
High accuracy (for <~200 sequences x <~2,000 aa/nt):
% mafft --maxiterate 1000 --localpair in > out (% linsi in > out is also ok)
% mafft --maxiterate 1000 --genafpair in > out (% einsi in > out)
% mafft --maxiterate 1000 --globalpair in > out (% ginsi in > out)
If unsure which option to use:
% mafft --auto in > out
--op # : Gap opening penalty, default: 1.53
--ep # : Offset (works like gap extension penalty), default: 0.0
--maxiterate # : Maximum number of iterative refinement, default: 0
--clustalout : Output: clustal format, default: fasta
--reorder : Outorder: aligned, default: input order
--quiet : Do not report progress
--thread # : Number of threads (if unsure, --thread -1)
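The same invocations can also be scripted from Python instead of through the `!` shell escape. The following is only a minimal sketch: the helper names `build_mafft_command` and `run_mafft` are hypothetical, and the only flags used are ones listed in the usage text above (`--auto`, `--thread`, `--quiet`). It assumes `mafft` is on the `PATH`.

```python
import subprocess


def build_mafft_command(input_path, threads=None, auto=False, quiet=False):
    """Assemble a mafft argument list (hypothetical helper).

    Only flags that appear in the usage text above are used.
    """
    cmd = ["mafft"]
    if auto:
        cmd.append("--auto")
    if threads is not None:
        cmd += ["--thread", str(threads)]
    if quiet:
        cmd.append("--quiet")
    cmd.append(input_path)
    return cmd


def run_mafft(input_path, output_path, **options):
    """Run mafft and write the alignment (its stdout) to output_path."""
    with open(output_path, "w") as out:
        subprocess.run(
            build_mafft_command(input_path, **options),
            stdout=out,
            check=True,  # raise CalledProcessError if mafft exits non-zero
        )


# Inspect the command without running it.
print(build_mafft_command("in.fasta", threads=4, quiet=True))
```

Building the argument list separately from running it makes it possible to check the command before starting a long alignment.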
In [7]:
%%timeit
!mafft /Users/caporaso/data/gg_13_8_otus/rep_set/73_otus.fasta > out.fasta
nseq = 267
distance = ktuples
iterate = 0
cycle = 2
nguidetree = 2
nthread = 0
sueff_global = 0.100000
generating a scoring matrix for nucleotide (dist=200) ... done
done
done
scoremtx = -1
Gap Penalty = -1.53, +0.00, +0.00
tuplesize = 6, dorp = d
Making a distance matrix ..
There are 871 ambiguous characters.
201 / 267
done.
Constructing a UPGMA tree ...
260 / 267
done.
Progressive alignment 1/2...
STEP 121 / 266 d
Reallocating..done. *alloclen = 5644
STEP 186 / 266 f
Reallocating..done. *alloclen = 6953
STEP 222 / 266 d
Reallocating..done. *alloclen = 9470
STEP 262 / 266 d
Reallocating..done. *alloclen = 10797
STEP 266 / 266 d
done.
Constructing a UPGMA tree ...
260 / 267
done.
Progressive alignment 2/2...
STEP 181 / 266 d
Reallocating..done. *alloclen = 5680
STEP 202 / 266 d
Reallocating..done. *alloclen = 6703
STEP 231 / 266 d
Reallocating..done. *alloclen = 9258
STEP 265 / 266 d
Reallocating..done. *alloclen = 11212
STEP 266 / 266 d
done.
disttbfast (nuc) Version 7.187 alg=A, model=DNA200 (2), 1.53 (4.59), -0.00 (-0.00), noshift, amax=0.0
0 thread(s)
Strategy:
FFT-NS-2 (Fast but rough)
Progressive method (guide trees were built 2 times.)
If unsure which option to use, try 'mafft --auto input > output'.
For more information, see 'mafft --help', 'mafft --man' and the mafft page.
The default gap scoring scheme has been changed in version 7.110 (2013 Oct).
It tends to insert more gaps into gap-rich regions than previous versions.
To disable this change, add the --legacygappenalty option.
1 loops, best of 3: 28.6 s per loop
In [8]:
%%timeit
!mafft --thread 4 /Users/caporaso/data/gg_13_8_otus/rep_set/73_otus.fasta > out.fasta
nseq = 267
distance = ktuples
iterate = 0
cycle = 2
nguidetree = 2
nthread = 4
sueff_global = 0.100000
generating a scoring matrix for nucleotide (dist=200) ... done
done
done
scoremtx = -1
Gap Penalty = -1.53, +0.00, +0.00
tuplesize = 6, dorp = d
Making a distance matrix ..
There are 871 ambiguous characters.
261 / 267 (thread 1)
done.
Constructing a UPGMA tree ...
260 / 267
done.
Progressive alignment 1/2...
STEP 128 / 266 (thread 1)d
Reallocating..done. *alloclen = 5644
STEP 191 / 266 (thread 2)d
Reallocating..done. *alloclen = 6953
STEP 228 / 266 (thread 3)d
Reallocating..done. *alloclen = 9470
STEP 261 / 266 (thread 2)d
Reallocating..done. *alloclen = 10797
STEP 266 / 266 (thread 2)d
done.
Making a distance matrix from msa..
260 / 267 (thread 3)
done.
Constructing a UPGMA tree ...
260 / 267
done.
Progressive alignment 2/2...
STEP 180 / 266 (thread 0)f
Reallocating..done. *alloclen = 5680
STEP 201 / 266 (thread 1)d
Reallocating..done. *alloclen = 6703
STEP 234 / 266 (thread 0)dd
Reallocating..done. *alloclen = 9258
STEP 264 / 266 (thread 1)d
Reallocating..done. *alloclen = 11212
STEP 266 / 266 (thread 2)d
done.
disttbfast (nuc) Version 7.187 alg=A, model=DNA200 (2), 1.53 (4.59), -0.00 (-0.00), noshift, amax=0.0
4 thread(s)
Strategy:
FFT-NS-2 (Fast but rough)
Progressive method (guide trees were built 2 times.)
If unsure which option to use, try 'mafft --auto input > output'.
For more information, see 'mafft --help', 'mafft --man' and the mafft page.
The default gap scoring scheme has been changed in version 7.110 (2013 Oct).
It tends to insert more gaps into gap-rich regions than previous versions.
To disable this change, add the --legacygappenalty option.
nseq = 267
distance = ktuples
iterate = 0
cycle = 2
nguidetree = 2
nthread = 4
sueff_global = 0.100000
generating a scoring matrix for nucleotide (dist=200) ... done
done
done
scoremtx = -1
Gap Penalty = -1.53, +0.00, +0.00
tuplesize = 6, dorp = d
Making a distance matrix ..
There are 871 ambiguous characters.
261 / 267 (thread 2)
done.
Constructing a UPGMA tree ...
260 / 267
done.
Progressive alignment 1/2...
STEP 127 / 266 (thread 0)f
Reallocating..done. *alloclen = 5644
STEP 190 / 266 (thread 3)f
Reallocating..done. *alloclen = 6953
STEP 228 / 266 (thread 1)d
Reallocating..done. *alloclen = 9470
STEP 261 / 266 (thread 3)d
Reallocating..done. *alloclen = 10797
STEP 266 / 266 (thread 3)d
done.
Making a distance matrix from msa..
260 / 267 (thread 0)
done.
Constructing a UPGMA tree ...
260 / 267
done.
Progressive alignment 2/2...
STEP 180 / 266 (thread 0)f
Reallocating..done. *alloclen = 5680
STEP 201 / 266 (thread 3)d
Reallocating..done. *alloclen = 6703
STEP 234 / 266 (thread 0)dd
Reallocating..done. *alloclen = 9258
STEP 264 / 266 (thread 3)d
Reallocating..done. *alloclen = 11212
STEP 266 / 266 (thread 1)d
done.
disttbfast (nuc) Version 7.187 alg=A, model=DNA200 (2), 1.53 (4.59), -0.00 (-0.00), noshift, amax=0.0
4 thread(s)
Strategy:
FFT-NS-2 (Fast but rough)
Progressive method (guide trees were built 2 times.)
If unsure which option to use, try 'mafft --auto input > output'.
For more information, see 'mafft --help', 'mafft --man' and the mafft page.
The default gap scoring scheme has been changed in version 7.110 (2013 Oct).
It tends to insert more gaps into gap-rich regions than previous versions.
To disable this change, add the --legacygappenalty option.
nseq = 267
distance = ktuples
iterate = 0
cycle = 2
nguidetree = 2
nthread = 4
sueff_global = 0.100000
generating a scoring matrix for nucleotide (dist=200) ... done
done
done
scoremtx = -1
Gap Penalty = -1.53, +0.00, +0.00
tuplesize = 6, dorp = d
Making a distance matrix ..
There are 871 ambiguous characters.
261 / 267 (thread 1)
done.
Constructing a UPGMA tree ...
260 / 267
done.
Progressive alignment 1/2...
STEP 127 / 266 (thread 3)f
Reallocating..done. *alloclen = 5644
STEP 191 / 266 (thread 1)d
Reallocating..done. *alloclen = 6953
STEP 228 / 266 (thread 2)d
Reallocating..done. *alloclen = 9470
STEP 261 / 266 (thread 0)d
Reallocating..done. *alloclen = 10797
STEP 266 / 266 (thread 0)d
done.
Making a distance matrix from msa..
260 / 267 (thread 2)
done.
Constructing a UPGMA tree ...
260 / 267
done.
Progressive alignment 2/2...
STEP 180 / 266 (thread 2)f
Reallocating..done. *alloclen = 5680
STEP 201 / 266 (thread 2)d
Reallocating..done. *alloclen = 6703
STEP 234 / 266 (thread 0)d
Reallocating..done. *alloclen = 9258
STEP 264 / 266 (thread 3)d
Reallocating..done. *alloclen = 11212
STEP 266 / 266 (thread 1)d
done.
disttbfast (nuc) Version 7.187 alg=A, model=DNA200 (2), 1.53 (4.59), -0.00 (-0.00), noshift, amax=0.0
4 thread(s)
Strategy:
FFT-NS-2 (Fast but rough)
Progressive method (guide trees were built 2 times.)
If unsure which option to use, try 'mafft --auto input > output'.
For more information, see 'mafft --help', 'mafft --man' and the mafft page.
The default gap scoring scheme has been changed in version 7.110 (2013 Oct).
It tends to insert more gaps into gap-rich regions than previous versions.
To disable this change, add the --legacygappenalty option.
nseq = 267
distance = ktuples
iterate = 0
cycle = 2
nguidetree = 2
nthread = 4
sueff_global = 0.100000
generating a scoring matrix for nucleotide (dist=200) ... done
done
done
scoremtx = -1
Gap Penalty = -1.53, +0.00, +0.00
tuplesize = 6, dorp = d
Making a distance matrix ..
There are 871 ambiguous characters.
261 / 267 (thread 0)
done.
Constructing a UPGMA tree ...
260 / 267
done.
Progressive alignment 1/2...
STEP 127 / 266 (thread 3)f
Reallocating..done. *alloclen = 5644
STEP 191 / 266 (thread 0)d
Reallocating..done. *alloclen = 6953
STEP 228 / 266 (thread 1)d
Reallocating..done. *alloclen = 9470
STEP 261 / 266 (thread 2)d
Reallocating..done. *alloclen = 10797
STEP 266 / 266 (thread 2)d
done.
Making a distance matrix from msa..
260 / 267 (thread 0)
done.
Constructing a UPGMA tree ...
260 / 267
done.
Progressive alignment 2/2...
STEP 183 / 266 (thread 0)d
Reallocating..done. *alloclen = 5680
STEP 201 / 266 (thread 2)d
Reallocating..done. *alloclen = 6703
STEP 234 / 266 (thread 2)d
Reallocating..done. *alloclen = 9258
STEP 264 / 266 (thread 1)d
Reallocating..done. *alloclen = 11212
STEP 266 / 266 (thread 0)d
done.
disttbfast (nuc) Version 7.187 alg=A, model=DNA200 (2), 1.53 (4.59), -0.00 (-0.00), noshift, amax=0.0
4 thread(s)
Strategy:
FFT-NS-2 (Fast but rough)
Progressive method (guide trees were built 2 times.)
If unsure which option to use, try 'mafft --auto input > output'.
For more information, see 'mafft --help', 'mafft --man' and the mafft page.
The default gap scoring scheme has been changed in version 7.110 (2013 Oct).
It tends to insert more gaps into gap-rich regions than previous versions.
To disable this change, add the --legacygappenalty option.
1 loops, best of 3: 17.3 s per loop
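The two `%%timeit` results above (28.6 s with the default settings, 17.3 s with `--thread 4`) can be turned into a speedup and a per-thread efficiency. This is a rough sketch: the function name is hypothetical, and the figures are best-of-3 wall-clock times for the whole shell command, so they include startup and I/O as well as alignment.

```python
def speedup_and_efficiency(serial_seconds, parallel_seconds, threads):
    """Return (speedup, efficiency), where efficiency is speedup per thread."""
    speedup = serial_seconds / parallel_seconds
    return speedup, speedup / threads


# Timings reported by %%timeit above.
speedup, efficiency = speedup_and_efficiency(28.6, 17.3, threads=4)
print(f"speedup: {speedup:.2f}x, efficiency: {efficiency:.0%}")
# prints: speedup: 1.65x, efficiency: 41%
```

On this 267-sequence input, four threads give well under a fourfold improvement, which suggests that a sizeable part of the run does not parallelize.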
$ time /usr/bin/mafft --thread 30 /data/gg_13_5_otus/rep_set/97_otus.fasta > temp/out.fna
real 734m20.494s
user 4651m19.761s
sys 219m24.048s
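The `time` figures above give a rough idea of how many cores were busy on average during the 30-thread run: divide CPU time (user + sys) by wall-clock time (real). The sketch below assumes the bash `time` format of `<minutes>m<seconds>s`, and `to_seconds` is a hypothetical helper.

```python
import re


def to_seconds(stamp):
    """Convert a bash `time` value such as '734m20.494s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time value: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Values copied from the `time` output above.
real = to_seconds("734m20.494s")
user = to_seconds("4651m19.761s")
sys_time = to_seconds("219m24.048s")

avg_busy_cores = (user + sys_time) / real
print(f"wall clock: {real / 3600:.1f} h, average busy cores: {avg_busy_cores:.1f}")
# prints: wall clock: 12.2 h, average busy cores: 6.6
```

By this estimate, the job kept about 6.6 of the 30 requested threads busy on average over roughly 12 hours.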
$ count_seqs.py -i temp/out.fna
99322 : temp/out.fna (Sequence lengths (mean +/- std): 49071.0000 +/- 0.0000)
99322 : Total
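A standard deviation of 0 in the sequence lengths is what an alignment should show: every row has the same number of columns (49071 here). The same check can be sketched in plain Python without `count_seqs.py`. The function names and the toy records below are illustrative only, not taken from the real file.

```python
def read_fasta_lengths(lines):
    """Map each FASTA record ID to its total sequence length, gaps included."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # the ID is the first token of the header
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)  # sequences may wrap over several lines
    return lengths


def is_aligned(lengths):
    """True when all records have the same length."""
    return len(set(lengths.values())) <= 1


# Toy records standing in for an aligned file.
toy = [">seq1", "ACG-T", "--A", ">seq2", "ACGGTTTA"]
lengths = read_fasta_lengths(toy)
print(lengths, is_aligned(lengths))
```

For a real file, pass an open file handle, for example `read_fasta_lengths(open("temp/out.fna"))`.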
The output file above was moved to 2014.10-mafft-experiments/97_otus.mafft_aligned.fna.
Content source: gregcaporaso/sketchbook