In these analyses our interest is primarily in inferring accurate branch lengths under a relaxed molecular clock model. This means that tips are forced to line up at the present (time) but that rates of substitutions are allowed to vary among branches to best explain the variation in the sequence data.
There is a huge range of models that can be employed using mrbayes by employing different combinations of parameter settings, model definitions, and prior settings. The ipyrad-analysis tool here is intended to make it easy to run such jobs many times (e.g., distributed in parallel) once you have decided on your settings. In addition, we provide a number of pre-set models (e.g., clock_model=2) that may be useful for simple scenarios.
Here we use simulations to demonstrate the accuracy of branch length estimation when sequences come from a single versus multiple distinct genealogies (e.g., gene tree vs species tree), and show an option to fix the topology to speed up analyses when your only interest is to estimate branch lengths.
In [1]:
# conda install ipyrad toytree mrbayes -c conda-forge -c bioconda
In [2]:
import toytree
import ipcoal
import ipyrad.analysis as ipa
In [3]:
TREE = toytree.rtree.bdtree(ntips=8, b=0.8, d=0.2, seed=123)
TREE = TREE.mod.node_scale_root_height(1e6)
TREE.draw(ts='o', layout='d', scalebar=True);
In [68]:
# init simulator
model = ipcoal.Model(TREE, Ne=2e4, nsamples=2, recomb=0)
# simulate sequence data on coalescent genealogies
model.sim_loci(nloci=1, nsites=20000)
# write results to database file
model.write_concat_to_nexus(name="mbtest-1", outdir='/tmp', diploid=True)
In [69]:
# the simulated genealogy of haploid alleles
gene = model.df.genealogy[0]
# draw the genealogy
toytree.tree(gene).draw(ts='o', layout='d', scalebar=True);
In [70]:
model.draw_seqview(idx=0, start=0, end=50);
In [71]:
# init the mb object
mb = ipa.mrbayes(
data="/tmp/mbtest-1.nex",
name="itest-1",
workdir="/tmp",
clock_model=2,
constraints=TREE,
ngen=int(1e6),
nruns=2,
)
# summary of priors/params
print(mb.params)
In [72]:
# start the run
mb.run(force=True)
In [76]:
# load the inferred tree
mbtre = toytree.tree("/tmp/itest-1.nex.con.tre", 10)
# scale root node to 1e6
mbtre = mbtre.mod.node_scale_root_height(1e6)
# draw inferred tree
c, a, m = mbtre.draw(ts='o', layout='d', scalebar=True);
# draw TRUE tree in orange on the same axes
TREE.draw(
axes=a,
ts='o', layout='d', scalebar=True,
edge_colors="darkorange",
node_sizes=0,
);
In [77]:
# check convergence statistics
mb.convergence_stats
Out[77]:
Here we use concatenated sequence data from 100 loci where each represents one or more distinct genealogies. In addition, Ne is increased to 1e5, allowing for more genealogical variation. We expect the accuracy of estimated edge lengths will decrease since we are now adequately modeling the genealogical variation when using concatenation.
In [78]:
# init simulator
model = ipcoal.Model(TREE, Ne=1e5, nsamples=2, recomb=0)
# simulate sequence data on coalescent genealogies
model.sim_loci(nloci=100, nsites=200)
# write results to database file
model.write_concat_to_nexus(name="mbtest-2", outdir='/tmp', diploid=True)
In [87]:
# the simulated genealogies of haploid alleles
genes = model.df.genealogy[:4]
# draw the genealogies of the first four loci
toytree.mtree(genes).draw_tree_grid(ts='o', layout='r');
In [88]:
# init the mb object
mb = ipa.mrbayes(
data="/tmp/mbtest-2.nex",
workdir="/tmp",
name="itest-2",
clock_model=2,
constraints=TREE,
ngen=int(1e6),
nruns=2,
)
# summary of priors/params
print(mb.params)
# start the run
mb.run(force=True)
In [91]:
# load the inferred tree
mbtre = toytree.tree("/tmp/itest-2.nex.con.tre", 10)
# scale root node from unitless to 1e6
mbtre = mbtre.mod.node_scale_root_height(1e6)
# draw inferred tree
c, a, m = mbtre.draw(ts='o', layout='d', scalebar=True);
# draw true tree in orange on the same axes
TREE.draw(
axes=a,
ts='o', layout='d', scalebar=True,
edge_colors="darkorange",
node_sizes=0,
);
In [92]:
mb.convergence_stats
Out[92]:
In [93]:
mb.print_nexus_string()
In [95]:
# init the mb object
mb = ipa.mrbayes(
data="/tmp/mbtest-2.nex",
name="itest-3",
workdir="/tmp",
clock_model=2,
ngen=int(2e6),
nruns=2,
)
# summary of priors/params
print(mb.params)
# start run
mb.run(force=True)
In [ ]:
# load the inferred tree
mbtre = toytree.tree("/tmp/itest-3.nex.con.tre", 10)
# scale root node from unitless to 1e6
mbtre = mbtre.mod.node_scale_root_height(1e6)
# draw inferred tree
c, a, m = mbtre.draw(
#ts='s',
layout='d',
scalebar=True,
node_sizes=18,
node_labels="prob{percent}",
);