In [1]:
!mafft -h


/usr/local/bin/mafft: Cannot open -h.

------------------------------------------------------------------------------
  MAFFT v7.187 (2014/10/02)
  http://mafft.cbrc.jp/alignment/software/
  MBE 30:772-780 (2013), NAR 30:3059-3066 (2002)
------------------------------------------------------------------------------
High speed:
  % mafft in > out
  % mafft --retree 1 in > out (fast)

High accuracy (for <~200 sequences x <~2,000 aa/nt):
  % mafft --maxiterate 1000 --localpair  in > out (% linsi in > out is also ok)
  % mafft --maxiterate 1000 --genafpair  in > out (% einsi in > out)
  % mafft --maxiterate 1000 --globalpair in > out (% ginsi in > out)

If unsure which option to use:
  % mafft --auto in > out

--op # :         Gap opening penalty, default: 1.53
--ep # :         Offset (works like gap extension penalty), default: 0.0
--maxiterate # : Maximum number of iterative refinement, default: 0
--clustalout :   Output: clustal format, default: fasta
--reorder :      Outorder: aligned, default: input order
--quiet :        Do not report progress
--thread # :     Number of threads (if unsure, --thread -1)
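
The options listed above can also be assembled from Python rather than typed into a shell escape. The following is a minimal sketch, not part of the original notebook: `build_mafft_command` and `run_mafft` are hypothetical helper names, and only flags that appear in the help text above (`--auto`, `--thread`, `--quiet`) are used.

```python
import shutil
import subprocess


def build_mafft_command(input_path, threads=None, auto=False, quiet=True):
    """Return the argv list for a MAFFT run (hypothetical helper).

    Only flags shown in the help output are used; options go before
    the input file, matching the usage line `mafft [options] in > out`.
    """
    cmd = ["mafft"]
    if auto:
        cmd.append("--auto")
    if threads is not None:
        cmd += ["--thread", str(threads)]
    if quiet:
        cmd.append("--quiet")
    cmd.append(str(input_path))
    return cmd


def run_mafft(input_path, output_path, **options):
    """Run MAFFT and write the alignment (its stdout) to output_path."""
    if shutil.which("mafft") is None:
        raise FileNotFoundError("mafft executable not found on PATH")
    with open(output_path, "w") as out:
        subprocess.run(
            build_mafft_command(input_path, **options),
            stdout=out,
            check=True,  # raise CalledProcessError on a non-zero exit status
        )


print(build_mafft_command("in.fasta", threads=4))
```

Keeping command construction separate from execution makes it possible to inspect the command line on a machine where MAFFT is not installed.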

In [7]:
%%timeit

!mafft /Users/caporaso/data/gg_13_8_otus/rep_set/73_otus.fasta > out.fasta


nseq =  267
distance =  ktuples
iterate =  0
cycle =  2
nguidetree = 2
nthread = 0
sueff_global = 0.100000
generating a scoring matrix for nucleotide (dist=200) ... done
done
done
scoremtx = -1
Gap Penalty = -1.53, +0.00, +0.00

tuplesize = 6, dorp = d


Making a distance matrix ..

There are 871 ambiguous characters.
  201 / 267
done.

Constructing a UPGMA tree ... 
  260 / 267
done.

Progressive alignment 1/2... 
STEP   121 / 266 d
Reallocating..done. *alloclen = 5644
STEP   186 / 266 f
Reallocating..done. *alloclen = 6953
STEP   222 / 266 d
Reallocating..done. *alloclen = 9470
STEP   262 / 266 d
Reallocating..done. *alloclen = 10797
STEP   266 / 266 d
done.

Constructing a UPGMA tree ... 
  260 / 267
done.

Progressive alignment 2/2... 
STEP   181 / 266 d
Reallocating..done. *alloclen = 5680
STEP   202 / 266 d
Reallocating..done. *alloclen = 6703
STEP   231 / 266 d
Reallocating..done. *alloclen = 9258
STEP   265 / 266 d
Reallocating..done. *alloclen = 11212
STEP   266 / 266 d
done.

disttbfast (nuc) Version 7.187 alg=A, model=DNA200 (2), 1.53 (4.59), -0.00 (-0.00), noshift, amax=0.0
0 thread(s)


Strategy:
 FFT-NS-2 (Fast but rough)
 Progressive method (guide trees were built 2 times.)

If unsure which option to use, try 'mafft --auto input > output'.
For more information, see 'mafft --help', 'mafft --man' and the mafft page.

The default gap scoring scheme has been changed in version 7.110 (2013 Oct).
It tends to insert more gaps into gap-rich regions than previous versions.
To disable this change, add the --legacygappenalty option.



1 loops, best of 3: 28.6 s per loop

In [8]:
%%timeit

!mafft --thread 4 /Users/caporaso/data/gg_13_8_otus/rep_set/73_otus.fasta > out.fasta


nseq =  267
distance =  ktuples
iterate =  0
cycle =  2
nguidetree = 2
nthread = 4
sueff_global = 0.100000
generating a scoring matrix for nucleotide (dist=200) ... done
done
done
scoremtx = -1
Gap Penalty = -1.53, +0.00, +0.00

tuplesize = 6, dorp = d


Making a distance matrix ..

There are 871 ambiguous characters.
  261 / 267 (thread    1)
done.

Constructing a UPGMA tree ... 
  260 / 267
done.

Progressive alignment 1/2... 
STEP   128 / 266 (thread    1)d
Reallocating..done. *alloclen = 5644
STEP   191 / 266 (thread    2)d
Reallocating..done. *alloclen = 6953
STEP   228 / 266 (thread    3)d
Reallocating..done. *alloclen = 9470
STEP   261 / 266 (thread    2)d
Reallocating..done. *alloclen = 10797
STEP   266 / 266 (thread    2)d
done.

Making a distance matrix from msa.. 
  260 / 267 (thread    3)
done.

Constructing a UPGMA tree ... 
  260 / 267
done.

Progressive alignment 2/2... 
STEP   180 / 266 (thread    0)f
Reallocating..done. *alloclen = 5680
STEP   201 / 266 (thread    1)d
Reallocating..done. *alloclen = 6703
STEP   234 / 266 (thread    0)dd
Reallocating..done. *alloclen = 9258
STEP   264 / 266 (thread    1)d
Reallocating..done. *alloclen = 11212
STEP   266 / 266 (thread    2)d
done.

disttbfast (nuc) Version 7.187 alg=A, model=DNA200 (2), 1.53 (4.59), -0.00 (-0.00), noshift, amax=0.0
4 thread(s)


Strategy:
 FFT-NS-2 (Fast but rough)
 Progressive method (guide trees were built 2 times.)

If unsure which option to use, try 'mafft --auto input > output'.
For more information, see 'mafft --help', 'mafft --man' and the mafft page.

The default gap scoring scheme has been changed in version 7.110 (2013 Oct).
It tends to insert more gaps into gap-rich regions than previous versions.
To disable this change, add the --legacygappenalty option.



1 loops, best of 3: 17.3 s per loop
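
The two `%%timeit` results (28.6 s with the default settings, 17.3 s with `--thread 4`) can be turned into a speedup and a parallel efficiency. This is a small sketch for illustration; it assumes the default run, which reports `nthread = 0`, is a fair single-threaded baseline, and the function name is hypothetical.

```python
def speedup_and_efficiency(serial_seconds, parallel_seconds, n_threads):
    """Return (speedup, efficiency) for a parallel run against a serial baseline."""
    speedup = serial_seconds / parallel_seconds
    efficiency = speedup / n_threads  # 1.0 would mean perfect scaling
    return speedup, efficiency


# Best-of-3 timings reported by %%timeit above.
speedup, efficiency = speedup_and_efficiency(28.6, 17.3, 4)
print(f"speedup: {speedup:.2f}x, efficiency: {efficiency:.0%}")
# speedup: 1.65x, efficiency: 41%
```

An efficiency well below 100% suggests that part of this FFT-NS-2 run does not benefit from the extra threads on a 267-sequence input.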

Results from experiment on bacon

$ time /usr/bin/mafft --thread 30 /data/gg_13_5_otus/rep_set/97_otus.fasta > temp/out.fna

real    734m20.494s
user    4651m19.761s
sys     219m24.048s
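
The `real`, `user` and `sys` figures above give a rough idea of how many cores were busy on average during the 30-thread run. The sketch below parses the shell `time` format and divides CPU time by wall-clock time; `parse_time` is a hypothetical helper, and the estimate ignores any time the process spent waiting on I/O.

```python
import re


def parse_time(value):
    """Convert a shell `time` field such as '734m20.494s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value)
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


real = parse_time("734m20.494s")
user = parse_time("4651m19.761s")
sys_time = parse_time("219m24.048s")

# CPU seconds consumed per wall-clock second, i.e. average busy cores.
avg_cores = (user + sys_time) / real

print(f"wall clock: {real / 3600:.1f} h")
print(f"average busy cores: {avg_cores:.1f} of 30 requested threads")
# wall clock: 12.2 h
# average busy cores: 6.6 of 30 requested threads
```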

$ count_seqs.py -i temp/out.fna

99322  : temp/out.fna (Sequence lengths (mean +/- std): 49071.0000 +/- 0.0000)
99322  : Total
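
A standard deviation of 0 in the sequence lengths is what an alignment should show: every row has been padded with gaps to the same width (49071 columns here). The same check can be sketched in plain Python without `count_seqs.py`; the parser below is a simplified, hypothetical FASTA reader, and the toy records are made up for illustration.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file-like object."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        yield header, "".join(chunks)


def alignment_lengths(handle):
    """Return the set of distinct sequence lengths; one element means aligned."""
    return {len(seq) for _, seq in read_fasta(handle)}


# Toy aligned input; record 'c' is wrapped across two lines.
toy = io.StringIO(">a\nAC-GT\n>b\nACGGT\n>c\nA--\nGT\n")
lengths = alignment_lengths(toy)
print(lengths)  # {5}
```

For a real file, pass `open("temp/out.fna")` instead of the `StringIO` object; a result with more than one length would indicate a truncated or unaligned output.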

Moved the above output file to 2014.10-mafft-experiments/97_otus.mafft_aligned.fna.

