Species-tree & species-delimitation using bpp (BP&P)

The program bpp by Rannala & Yang (2010; 2015) is a powerful tool for inferring species tree parameters and testing species delimitation hypotheses. It is relatively easy to use, and best of all, it's quite fast, although not highly parallelizable. This notebook describes a streamlined approach to easily setup input files for testing different hypthotheses in bpp, and to do so in a clear programmatic way that makes it easy to perform many tests over many different parameter settings. We also show how to distribute many separate jobs to run in parallel on a cluster.

Notebook setup

This is a Jupyter notebook, a reproducible and executable document. The code in this notebook is Python (2.7), and should be executed in a jupyter-notebook like this one. Execute each cell in order to reproduce our entire analysis. The example data set used in this analysis is from the empirical example ipyrad tutorial.

Install required software

All software required for this notebook can be installed using conda.



In [1]:

    
## conda install -c ipyrad ipyrad
## conda install -c ipyrad bpp
## conda install -c eaton-lab toytree



In [2]:

    
import ipyrad.analysis as ipa         ## ipyrad analysis tools
import ipyparallel as ipp             ## parallelization
import pandas as pd                   ## DataFrames
import numpy as np                    ## data generation
import toytree                        ## tree plotting
import toyplot                        ## data plotting

Connect to an ipyparallel cluster

We will use the ipyparallel library to submit jobs to run in parallel on a cluster. You will need to have an ipcluster instance running in a separate terminal on your machine (or ideally, it is running on your HPC cluster). The code below simply connects to that cluster and prints how many CPUs are available for use.



In [4]:

    
## Connect to a running ipcluster instance
ipyclient = ipp.Client()

## print information about our cluster
print "Connected to {} cores".format(len(ipyclient))









    



Connected to 40 cores

Analysis setup

You must define a tree with the "species" names in your analysis which will act either as a fixed-tree or as a guide-tree for your analysis. You must also define an IMAP dictionary which maps sample names to "species" names. And you can optionally also define a MINMAP dictionary which is used to filter RAD loci to include only loci that have at least N samples with data from each species in a locus.



In [5]:

    
## set the location of our input .loci file
locifile = "./analysis-ipyrad/pedicularis_outfiles/pedicularis.alleles.loci"

## a tree hypothesis (guidetree) (here based on tetrad results)
newick = "((((((rex, lip),rck),tha),cup),(cys,(cya, sup))),prz);"

## a dictionary mapping sample names to 'species' names
imap = {
    "prz": ["32082_przewalskii", "33588_przewalskii"],
    "cys": ["41478_cyathophylloides", "41954_cyathophylloides"],
    "cya": ["30686_cyathophylla"],
    "sup": ["29154_superba"],
    "cup": ["33413_thamno"],
    "tha": ["30556_thamno"],
    "rck": ["35236_rex"],
    "rex": ["35855_rex", "40578_rex"],
    "lip": ["39618_rex", "38362_rex"],  
    }

## optional: loci will be filtered if they do not have data for at
## least N samples/individuals in each species.
minmap = {
    "prz": 2,
    "cys": 2,
    "cya": 1,
    "sup": 1,
    "cup": 1,
    "tha": 1, 
    "rck": 1,
    "rex": 2,
    "lip": 2,
    }



In [8]:

    
## check your (starting) tree hypothesis
toytree.tree(newick).draw(use_edge_lengths=False, width=250);

The bpp Class object

To simplify the creation of input files for bpp analyses we've created a bpp job generator object that can be accessed from ipa.bpp(). Running bpp requires three input files (.ctl, .imap, and .seq) of which the .ctl file is the most important since it contains the parameters for a run and points to the location of the other two files. The ipa.bpp() object can be used to easily modify parameter settings for a run, to generate the input files, and if desired, to submit the bpp jobs to run on a cluster (your ipyclient cluster).



In [9]:

    
## create a bpp object to run algorithm 00
b = ipa.bpp(
    name="test",
    data=locifile,
    guidetree=newick, 
    imap=imap,
    minmap=minmap,  
    )



In [10]:

    
## set some optional params, leaving others at their defaults
b.params.burnin = 2000
b.params.nsample = 20000
b.params.sampfreq = 2
b.params.seed = 33333
b.params.tauprior = (2, 200, 1)

## print params
b.params









    Out[10]:





binary          bpp                 
burnin          2000                
cleandata       0                   
delimit_alg     (0, 5)              
finetune        (0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01)
infer_delimit   0                   
infer_sptree    0                   
nsample         20000               
sampfreq        2                   
seed            33333               
tauprior        (2, 200, 1)         
thetaprior      (2, 2000)           
usedata         1



In [11]:

    
## set some optional filters leaving others at their defaults
b.filters.maxloci=100
b.filters.minsnps=0

## print filters
b.filters









    Out[11]:





maxloci   100                 
minmap    {'cys': 4, 'rex': 4, 'cup': 2, 'rck': 2, 'cya': 2, 'lip': 4, 'sup': 2, 'tha': 2, 'prz': 4}
minsnps   0

Generating files &/or submitting jobs

When you create a bpp object you save it with a variable name (in this example named test) which is simply the name you will use to reference the object. You must also provide a name (prefix for output files) that will be used by either of the two main functions of the bpp object: write_bpp_files() or run(). Both functions make it easy to sample different distributions of loci to include in different replicate bpp analyses. Each replicate will start from a different random seed after the initial seed. If you used a maxloci argument to limit the number of loci that will used in the analysis then you can also use the randomize_order argument to select a different random number of N loci in each rep, otherwise just the first N loci that pass filtering will be sampled.

`write_bpp_files()`

This writes the .ctl, .seq, and .imap files for the specified run.



In [12]:

    
## write files (filtered based on test.filters and test.params)
b.write_bpp_files()









    



input files created for job test (100 loci)

`run()`

This writes the files for each job and submits the bpp jobs to run on the cluster designated by the ipyclient object. You can efficiently submit many replicate jobs in this way.



In [13]:

    
## or, write files and submit job to run on ipyclient 
b.run(
    nreps=2, 
    ipyclient=ipyclient, 
    randomize_order=True,
    )









    



submitted 2 bpp jobs [test] (100 loci)

Accessing job results

When you submit jobs the results files will be stored in the bpp objects .files attribute. In addition, the asychronous result objects (a representation of the running job) from each submitted job will be accessible from the .asyncs attribute of the bpp object. You can view these objects to see if your job has finished or use them to trace errors if an error arises, as we do below.



In [15]:

    
## check async objects from the bpp object
for job in b.asyncs:
    if job.ready():
        print 'job finished'
    else:
        print 'job running'









    



job finished
job finished



In [29]:

    
## uncomment and run this to block until all running jobs are finished
#ipyclient.wait()

Results



In [34]:

    
## set options to print prettier tables
## (this suppresses scientific notation)
pd.set_option('display.float_format', lambda x: '%.4f' % x)



In [35]:

    
## uncomment to print a subset of the table above
b.summarize_results()









    Out[35]:







  
    
      
      count
      mean
      std
      min
      25%
      50%
      75%
      max
    
  
  
    
      theta_1cup
      40000.0000
      0.0016
      0.0005
      0.0004
      0.0012
      0.0016
      0.0019
      0.0046
    
    
      theta_2cya
      40000.0000
      0.0005
      0.0003
      0.0000
      0.0002
      0.0004
      0.0007
      0.0027
    
    
      theta_3cys
      40000.0000
      0.0004
      0.0003
      0.0000
      0.0001
      0.0004
      0.0007
      0.0016
    
    
      theta_4lip
      40000.0000
      0.0010
      0.0003
      0.0004
      0.0008
      0.0010
      0.0012
      0.0023
    
    
      theta_5prz
      40000.0000
      0.0039
      0.0007
      0.0019
      0.0034
      0.0038
      0.0043
      0.0070
    
    
      theta_6rck
      40000.0000
      0.0014
      0.0005
      0.0004
      0.0011
      0.0014
      0.0017
      0.0042
    
    
      theta_7rex
      40000.0000
      0.0028
      0.0008
      0.0009
      0.0022
      0.0028
      0.0033
      0.0063
    
    
      theta_8sup
      40000.0000
      0.0011
      0.0004
      0.0002
      0.0008
      0.0010
      0.0013
      0.0039
    
    
      theta_9tha
      40000.0000
      0.0015
      0.0005
      0.0005
      0.0012
      0.0014
      0.0017
      0.0052
    
    
      theta_10rexliprckthacupcyscyasupprz
      40000.0000
      0.0039
      0.0032
      0.0002
      0.0011
      0.0024
      0.0068
      0.0144
    
    
      theta_11rexliprckthacupcyscyasup
      40000.0000
      0.0029
      0.0011
      0.0004
      0.0021
      0.0030
      0.0038
      0.0084
    
    
      theta_12rexliprckthacup
      40000.0000
      0.0044
      0.0013
      0.0009
      0.0035
      0.0044
      0.0052
      0.0116
    
    
      theta_13rexliprcktha
      40000.0000
      0.0026
      0.0017
      0.0001
      0.0013
      0.0022
      0.0037
      0.0104
    
    
      theta_14rexliprck
      40000.0000
      0.0017
      0.0010
      0.0000
      0.0010
      0.0015
      0.0022
      0.0088
    
    
      theta_15rexlip
      40000.0000
      0.0018
      0.0010
      0.0001
      0.0010
      0.0016
      0.0023
      0.0079
    
    
      theta_16cyscyasup
      40000.0000
      0.0023
      0.0012
      0.0001
      0.0014
      0.0021
      0.0030
      0.0094
    
    
      theta_17cyasup
      40000.0000
      0.0026
      0.0011
      0.0003
      0.0018
      0.0025
      0.0033
      0.0085
    
    
      tau_10rexliprckthacupcyscyasupprz
      40000.0000
      0.0111
      0.0034
      0.0043
      0.0075
      0.0122
      0.0141
      0.0191
    
    
      tau_11rexliprckthacupcyscyasup
      40000.0000
      0.0041
      0.0005
      0.0020
      0.0037
      0.0040
      0.0044
      0.0064
    
    
      tau_12rexliprckthacup
      40000.0000
      0.0017
      0.0004
      0.0007
      0.0014
      0.0018
      0.0021
      0.0033
    
    
      tau_13rexliprcktha
      40000.0000
      0.0014
      0.0005
      0.0005
      0.0009
      0.0013
      0.0018
      0.0027
    
    
      tau_14rexliprck
      40000.0000
      0.0013
      0.0005
      0.0005
      0.0009
      0.0012
      0.0017
      0.0025
    
    
      tau_15rexlip
      40000.0000
      0.0012
      0.0005
      0.0004
      0.0008
      0.0011
      0.0016
      0.0025
    
    
      tau_16cyscyasup
      40000.0000
      0.0032
      0.0006
      0.0010
      0.0028
      0.0032
      0.0036
      0.0053
    
    
      tau_17cyasup
      40000.0000
      0.0015
      0.0008
      0.0001
      0.0008
      0.0014
      0.0021
      0.0041
    
    
      lnL
      40000.0000
      -12151.9644
      271.9991
      -12524.5230
      -12422.8427
      -12161.9050
      -11879.7943
      -11807.7240



In [33]:

    
## uncomment to print a subset of the table above
#b.summarize_results()[["mean", "std", "min", "max"]]

Examples

Algorithm 00 - fixed tree parameter inference

The 00 algorithm means 'infer_sptree=0' and 'infer_delimit=0', thus the tree that you enter will be treated as the fixed species tree and the analysis will infer parameters for the tree under the multispecies coalescent model. This will yield values of $\Theta$ for each branch of the tree, and divergence times ($\tau$) for each split in the tree.



In [16]:

    
## create two new copies of the bpp object above, but with new names
A00 = b.copy(name="A00")
A00n = b.copy(name="A00-nodata")

## set params on the 'nodata' object to not use seq data
A00n.params.usedata = 0



In [34]:

    
## submit a few replicate jobs from different random subsets of loci 
A00.run(nreps=3, randomize_order=True, ipyclient=ipyclient, force=True)
A00n.run(nreps=1, randomize_order=True, ipyclient=ipyclient, force=True)









    



submitted 3 bpp jobs [A00] (100 loci)
submitted 1 bpp jobs [A00-nodata] (100 loci)



In [35]:

    
## wait for jobs to finish
ipyclient.wait()









    Out[35]:





True

Summarize results tables for algorithm 00

Different bpp algorithms produce different types of results files. For algorithm 00 the mcmc results file is simply a table of $\Theta$ and $\tau$ values so we can just parse it as a CSV file to summarize results. The same results will be available in the .out.txt file, but I find that parsing the results this way is a bit easier to view. Take note that these results tables are showing percentiles which is similar but not identical to posterior densities. You can get those from the .out file produced by bpp, or using the individual_results=True argument to .summarize_results() as shown below.

The 'no-data' results

We expect that when we use no data the priors will be returned. In this case we see no significant differences in theta values among samples, which is good. The tau values will be distributed according to the prior on the root divergence, with the other nodes distributed according to a dirichlet process. It will be of interest to compare these divergence time estimates to the ones produced by the analysis with sequence data, to assess whether the sequence data causing the divergence times estimates to diverge significantly from their priors.



In [17]:

    
## summary for the first 8 params of the no-data run
A00n.summarize_results().head(8)

Results with data

You can see here that the theta and tau values are quite different from the priors, which is good and tells us that our sequence data is informative. If you check the .out file produced by bpp you can also see "effective-sample-size (ESS)" which is indicative of whether or not we've run the mcmc long enough. Ideally we would like each ESS value to be >100.



In [18]:

    
## summary for the first 8 params of the with-data run
A00.summarize_results().head(8)



In [20]:

    
## get individual results for each rep
reps = A00.summarize_results(individual_results=True)

## check convergence (ESS*) for each param in rep 1
reps[1][["mean", "S.D.", "ESS*"]]









    Out[20]:







  
    
      
      mean
      S.D.
      ESS*
    
  
  
    
      theta_1cup
      0.001350
      0.000420
      40.3
    
    
      theta_2cya
      0.000692
      0.000324
      43.3
    
    
      theta_3cys
      0.000717
      0.000177
      60.4
    
    
      theta_4lip
      0.000920
      0.000204
      105.9
    
    
      theta_5prz
      0.003490
      0.000489
      377.8
    
    
      theta_6rck
      0.001130
      0.000321
      121.2
    
    
      theta_7rex
      0.002355
      0.000554
      98.2
    
    
      theta_8sup
      0.000954
      0.000396
      51.8
    
    
      theta_9tha
      0.001655
      0.000485
      95.3
    
    
      theta_10rexliprckthacupcyscyasupprz
      0.006313
      0.002876
      7.6
    
    
      theta_11rexliprckthacupcyscyasup
      0.002176
      0.000815
      22.9
    
    
      theta_12rexliprckthacup
      0.004195
      0.001267
      36.9
    
    
      theta_13rexliprcktha
      0.003357
      0.001846
      14.0
    
    
      theta_14rexliprck
      0.001622
      0.000920
      116.8
    
    
      theta_15rexlip
      0.001853
      0.001100
      95.7
    
    
      theta_16cyscyasup
      0.002449
      0.001300
      172.2
    
    
      theta_17cyasup
      0.003107
      0.001055
      110.5
    
    
      tau_10rexliprckthacupcyscyasupprz
      0.008021
      0.002025
      7.6
    
    
      tau_11rexliprckthacupcyscyasup
      0.004015
      0.000519
      107.8
    
    
      tau_12rexliprckthacup
      0.001575
      0.000461
      15.9
    
    
      tau_13rexliprcktha
      0.000965
      0.000194
      22.1
    
    
      tau_14rexliprck
      0.000911
      0.000181
      21.9
    
    
      tau_15rexlip
      0.000822
      0.000176
      27.5
    
    
      tau_16cyscyasup
      0.002851
      0.000577
      124.8
    
    
      tau_17cyasup
      0.000876
      0.000438
      30.4
    
    
      lnL
      -11880.819322
      21.881173
      22.1

Draw some histograms

Below we draw the posterior parameter distributions for the analyses with and without sequence data. In a real analysis you will probably want to run the analyses for longer, but you can see here that the results depart significantly from the prior expectation.



In [21]:

    
## get full MCMC table from the .files attribute
table_1 = pd.read_csv(A00n.files.mcmcfiles[0], sep='\t', index_col=0)
table_2 = pd.read_csv(A00.files.mcmcfiles[0], sep='\t', index_col=0)

## draw barplots
canvas = toyplot.Canvas(width=800, height=800)

for aidx in range(table_1.shape[1]):
    ## get data
    param = table_1.columns[aidx]
    hist1 = np.histogram(table_1[param], density=True)
    hist2 = np.histogram(table_2[param], density=True)

    ## build plot
    nx = int(np.ceil(np.sqrt(table_1.shape[1])))
    axes = canvas.cartesian(grid=(nx, nx, aidx), margin=25)
    axes.bars(hist1, opacity=0.65, style={"stroke-width": 0})
    axes.bars(hist2, opacity=0.65, style={"stroke-width": 0})

    ## style axes (hide tick labels for clarity)
    axes.label.text = param[:15]
    axes.label.style["font-size"] = "12px"
    axes.x.ticks.labels.show = False
    axes.y.ticks.labels.show = False
    
## save plot to pdf
import toyplot.pdf
toyplot.pdf.render(canvas, "analysis-bpp/bpp-A00.pdf")
canvas









    Out[21]:

The FigTree file

Only algorithm 00 produces a FigTree file, so be aware that you will not find one when you run the other algorithms. There are other types of results files that are produced instead.



In [40]:

    
## the tree file -- can be input to the program FigTree
tre = A00n.files.treefiles[0]
! cat $tre









    



#NEXUS
BEGIN TREES;

	UTREE 1 = ((((((rex: 0.001598, lip: 0.001598) [&height_95%_HPD={0.0009684, 0.002092}, theta=0.00171779]: 0.000096, rck: 0.001695) [&height_95%_HPD={0.0010488, 0.0021839}, theta=0.0017499]: 0.000091, tha: 0.001785) [&height_95%_HPD={0.0010807, 0.0023218}, theta=0.00193679]: 0.000123, cup: 0.001908) [&height_95%_HPD={0.001117, 0.0025796}, theta=0.00467129]: 0.002187, (cys: 0.003522, (cya: 0.002113, sup: 0.002113) [&height_95%_HPD={0.0010666, 0.0031479}, theta=0.00212346]: 0.001409) [&height_95%_HPD={0.0026035, 0.0044186}, theta=0.00208407]: 0.000574) [&height_95%_HPD={0.0033103, 0.0050114}, theta=0.00372337]: 0.010006, prz: 0.014101) [&height_95%_HPD={0.011928, 0.016307}, theta=0.00146896];

END;


[Species tree with tau as branch lengths and theta as labels, for FigTree.
In FigTree, choose 95%HPD for Node Bars and label for Node Labels]

Algorithm 10 - species tree inference

The algorithm 10 aims to infer the correct species tree from the data by implemented a tree search method, thus the input tree is treated only as a starting tree. Based on the results above I'm also going to increase the priors on the divergence times (tau) by about 10X.



In [55]:

    
## create new bpp objects
A10 = A00.copy("A10")
A10n = A00.copy("A10-nodata")

## set new params
A10.params.infer_sptree = 1
A10.params.infer_delimit = 0
A10.params.tauprior = (2, 200, 1)
A10n.params.infer_sptree = 1
A10n.params.infer_delimit = 0
A10n.params.tauprior = (2, 200, 1)

## also set no-data on the 'n' run
A10n.params.usedata = 0



In [26]:

    
## submit job reps to the cluster
A10.run(nreps=2, randomize_order=True, ipyclient=ipyclient, force=True)
A10n.run(nreps=1, randomize_order=True, ipyclient=ipyclient, force=True)









    



submitted 2 bpp jobs [A10] (100 loci)
submitted 1 bpp jobs [A10-nodata] (100 loci)



In [27]:

    
## block until finished
ipyclient.wait()









    Out[27]:





True

Plot the distribution of species trees from algorithm 10

Here we use the multitree() object of toytree to plot the posterior distribution of trees. The code for drawing 'cloudtrees' here is still in development and a littl clunky, but will improve in the future. From the example here (which has not been run long enough, btw), you can see clearly that the two replicates yield quite different trees. In this case the replicates are starting from different random seeds, but probably more importantly they are sampling a different random subset of 100 RAD loci that met our filtering requirements (e.g., minsamples, minSNPs).



In [52]:

    
## load trees slicing out every Nth: (start:end:by)
trees1 = toytree.multitree(
    newick=A10.files.mcmcfiles[0],
    treeslice=(500, 5000, 30),
    )

## build canvas
canvas = toyplot.Canvas(width=800, height=400)
axes1 = canvas.cartesian(grid=(1, 2, 0), padding=25)
axes2 = canvas.cartesian(grid=(1, 2, 1), padding=25)

## draw cloud tree
trees1.draw_cloudtree(
    axes=axes1,
    use_edge_lengths=True,
    orient='right',
    edge_style={"opacity": 0.025},
);

## draw consensus (doesn't save edge lengths currently)
cons = trees1.get_consensus_tree()
cons.root(wildcard="prz")
cons.draw(
    axes=axes2,
    node_labels=cons.get_node_values("support"),
);

## style axes
axes1.y.show = False
axes2.y.show = False



In [54]:

    
## load trees slicing out every N: [100:10000:100]
trees2 = toytree.multitree(
    newick=A10.files.mcmcfiles[1],
    treeslice=(500, 5000, 30),
    )

## build canvas
canvas = toyplot.Canvas(width=800, height=400)
axes1 = canvas.cartesian(grid=(1, 2, 0), padding=25)
axes2 = canvas.cartesian(grid=(1, 2, 1), padding=25)

## draw cloud tree
trees2.draw_cloudtree(
    axes=axes1,
    use_edge_lengths=True,
    orient='right',
    edge_style={"opacity": 0.025},
);

## draw consensus (doesn't save edge lengths currently)
cons = trees2.get_consensus_tree()
cons.root(wildcard="prz")
cons.draw(
    axes=axes2,
    node_labels=cons.get_node_values("support"),
);

## style axes and save
axes1.y.show = False
axes2.y.show = False
toyplot.html.render(canvas, "analysis-bpp/tree.html")
canvas









    Out[54]:

Algorithm 01 -- Species delimitation



In [56]:

    
## create new bpp objects
A01 = A00.copy("A01")
A01n = A00.copy("A01-nodata")

## set new params
A01.params.infer_sptree = 0
A01.params.infer_delimit = 1
A01.params.tauprior = (2, 200, 1)
A01n.params.infer_sptree = 0
A01n.params.infer_delimit = 1
A01n.params.tauprior = (2, 200, 1)

## also set no-data on the 'n' run
A01n.params.usedata = 0



In [57]:

    
## submit job reps to the cluster
A01.run(nreps=2, randomize_order=True, ipyclient=ipyclient, force=True)
A01n.run(nreps=1, randomize_order=True, ipyclient=ipyclient, force=True)









    



submitted 2 bpp jobs [A01] (100 loci)
submitted 1 bpp jobs [A01-nodata] (100 loci)



In [58]:

    
## block until jobs finish
ipyclient.wait()









    Out[58]:





True

Summarize results



In [64]:

    
## obviously this was not run long enough b/c when there is no data
## the posterior should match the prior !!!
A01n.summarize_results()



In [65]:

    
## results from two replicate runs
A01.summarize_results(individual_results=True)









    Out[65]:





[       delim    prior posterior  nspecies
 0   00000000  0.05882       0.0         1
 1   10000000  0.05882       0.0         2
 2   11000000  0.05882       0.0         3
 3   11000010  0.05882       0.0         4
 4   11000011  0.05882       0.0         5
 5   11100000  0.05882       0.0         4
 6   11100010  0.05882       0.0         5
 7   11100011  0.05882       0.0         6
 8   11110000  0.05882       0.0         5
 9   11110010  0.05882       0.0         6
 10  11110011  0.05882       0.0         7
 11  11111000  0.05882       0.0         6
 12  11111010  0.05882       0.0         7
 13  11111011  0.05882       0.0         8
 14  11111100  0.05882       0.0         7
 15  11111110  0.05882       0.0         8
 16  11111111  0.05882       1.0         9,
        delim    prior posterior  nspecies
 0   00000000  0.05882       0.0         1
 1   10000000  0.05882       0.0         2
 2   11000000  0.05882       0.0         3
 3   11000010  0.05882       0.0         4
 4   11000011  0.05882       0.0         5
 5   11100000  0.05882       0.0         4
 6   11100010  0.05882       0.0         5
 7   11100011  0.05882       0.0         6
 8   11110000  0.05882       0.0         5
 9   11110010  0.05882       0.0         6
 10  11110011  0.05882   0.00325         7
 11  11111000  0.05882       0.0         6
 12  11111010  0.05882       0.0         7
 13  11111011  0.05882    0.0288         8
 14  11111100  0.05882       0.0         7
 15  11111110  0.05882       0.0         8
 16  11111111  0.05882   0.96795         9]



In [66]:

    
## or, averaged results across independent replicates
A01.summarize_results(individual_results=False)

Other Examples: Setting up tests of multiple prior settings

Rannala and Yang suggest that you try out several species delimitation algorithms and that you do so over a range of params for the two algorithms. They suggest that you run algorithm 0 with $\epsilon$=(2, 5, 10, 20), and algorithm 1 with $\alpha$=(1, 1.5, 2) and $m$=(1, 1.5, 2). And also to do this with different starting trees. Using our programmatic approach you can easily set up all of these tests and run them in parallel using a simple for-loop setup.



In [297]:

    
## create a bpp object with algorithm 01 settings
A01 = A10.copy(name="A01")
A01.params.infer_sptree = 0
A01.params.infer_delimit = 1



In [298]:

    
## submit jobs using a range of param options
DELIMIT_TESTS = [
    (0, 2),
    (0, 5),
    (0, 10),
    (1, 1.0, 1.0),
    (1, 1.0, 1.5),
    (1, 1.0, 2.0),
    (1, 1.5, 1.0), 
    (1, 1.5, 1.5), 
    (1, 1.5, 2.0),
    (1, 2.0, 1.0), 
    (1, 2.0, 1.5), 
    (1, 2.0, 2.0)
]

## a dictionary to store our bpp objects
bpps = {}

## iterate over delimit args
for test in DELIMIT_TESTS:
    ## make a copy
    name = "A10-{}".format("-".join([str(i) for i in test]))
    bpps[name] = A10.copy(name)
    
    ## set the delimit algorithm
    bpps[name].params.delimit_alg = test
    
    ## submit the job
    bpps[name].run(nreps=1, seed=123, ipyclient=ipyclient)









    



submitted 1 bpp jobs [A10-0-2] (100 loci)
submitted 1 bpp jobs [A10-0-5] (100 loci)
submitted 1 bpp jobs [A10-0-10] (100 loci)
submitted 1 bpp jobs [A10-1-1.0-1.0] (100 loci)
submitted 1 bpp jobs [A10-1-1.0-1.5] (100 loci)
submitted 1 bpp jobs [A10-1-1.0-2.0] (100 loci)
submitted 1 bpp jobs [A10-1-1.5-1.0] (100 loci)
submitted 1 bpp jobs [A10-1-1.5-1.5] (100 loci)
submitted 1 bpp jobs [A10-1-1.5-2.0] (100 loci)
submitted 1 bpp jobs [A10-1-2.0-1.0] (100 loci)
submitted 1 bpp jobs [A10-1-2.0-1.5] (100 loci)
submitted 1 bpp jobs [A10-1-2.0-2.0] (100 loci)

	count	mean	std	min	25%	50%	75%	max
theta_1cup	40000.0000	0.0016	0.0005	0.0004	0.0012	0.0016	0.0019	0.0046
theta_2cya	40000.0000	0.0005	0.0003	0.0000	0.0002	0.0004	0.0007	0.0027
theta_3cys	40000.0000	0.0004	0.0003	0.0000	0.0001	0.0004	0.0007	0.0016
theta_4lip	40000.0000	0.0010	0.0003	0.0004	0.0008	0.0010	0.0012	0.0023
theta_5prz	40000.0000	0.0039	0.0007	0.0019	0.0034	0.0038	0.0043	0.0070
theta_6rck	40000.0000	0.0014	0.0005	0.0004	0.0011	0.0014	0.0017	0.0042
theta_7rex	40000.0000	0.0028	0.0008	0.0009	0.0022	0.0028	0.0033	0.0063
theta_8sup	40000.0000	0.0011	0.0004	0.0002	0.0008	0.0010	0.0013	0.0039
theta_9tha	40000.0000	0.0015	0.0005	0.0005	0.0012	0.0014	0.0017	0.0052
theta_10rexliprckthacupcyscyasupprz	40000.0000	0.0039	0.0032	0.0002	0.0011	0.0024	0.0068	0.0144
theta_11rexliprckthacupcyscyasup	40000.0000	0.0029	0.0011	0.0004	0.0021	0.0030	0.0038	0.0084
theta_12rexliprckthacup	40000.0000	0.0044	0.0013	0.0009	0.0035	0.0044	0.0052	0.0116
theta_13rexliprcktha	40000.0000	0.0026	0.0017	0.0001	0.0013	0.0022	0.0037	0.0104
theta_14rexliprck	40000.0000	0.0017	0.0010	0.0000	0.0010	0.0015	0.0022	0.0088
theta_15rexlip	40000.0000	0.0018	0.0010	0.0001	0.0010	0.0016	0.0023	0.0079
theta_16cyscyasup	40000.0000	0.0023	0.0012	0.0001	0.0014	0.0021	0.0030	0.0094
theta_17cyasup	40000.0000	0.0026	0.0011	0.0003	0.0018	0.0025	0.0033	0.0085
tau_10rexliprckthacupcyscyasupprz	40000.0000	0.0111	0.0034	0.0043	0.0075	0.0122	0.0141	0.0191
tau_11rexliprckthacupcyscyasup	40000.0000	0.0041	0.0005	0.0020	0.0037	0.0040	0.0044	0.0064
tau_12rexliprckthacup	40000.0000	0.0017	0.0004	0.0007	0.0014	0.0018	0.0021	0.0033
tau_13rexliprcktha	40000.0000	0.0014	0.0005	0.0005	0.0009	0.0013	0.0018	0.0027
tau_14rexliprck	40000.0000	0.0013	0.0005	0.0005	0.0009	0.0012	0.0017	0.0025
tau_15rexlip	40000.0000	0.0012	0.0005	0.0004	0.0008	0.0011	0.0016	0.0025
tau_16cyscyasup	40000.0000	0.0032	0.0006	0.0010	0.0028	0.0032	0.0036	0.0053
tau_17cyasup	40000.0000	0.0015	0.0008	0.0001	0.0008	0.0014	0.0021	0.0041
lnL	40000.0000	-12151.9644	271.9991	-12524.5230	-12422.8427	-12161.9050	-11879.7943	-11807.7240

	count	mean	std	min	25%	50%	75%	max
theta_1cup	20000.0000	0.0008	0.0007	0.0000	0.0003	0.0007	0.0011	0.0061
theta_2cya	20000.0000	0.0011	0.0007	0.0001	0.0006	0.0009	0.0014	0.0054
theta_3cys	20000.0000	0.0011	0.0007	0.0002	0.0006	0.0009	0.0014	0.0052
theta_4lip	20000.0000	0.0009	0.0007	0.0001	0.0003	0.0007	0.0013	0.0049
theta_5prz	20000.0000	0.0008	0.0006	0.0001	0.0003	0.0006	0.0011	0.0045
theta_6rck	20000.0000	0.0010	0.0007	0.0001	0.0005	0.0009	0.0013	0.0071
theta_7rex	20000.0000	0.0011	0.0007	0.0001	0.0006	0.0009	0.0014	0.0072
theta_8sup	20000.0000	0.0011	0.0007	0.0001	0.0006	0.0010	0.0015	0.0059

	count	mean	std	min	25%	50%	75%	max
theta_1cup	60000.0000	0.0017	0.0005	0.0004	0.0013	0.0016	0.0020	0.0046
theta_2cya	60000.0000	0.0006	0.0004	0.0000	0.0003	0.0005	0.0008	0.0027
theta_3cys	60000.0000	0.0005	0.0003	0.0000	0.0002	0.0006	0.0008	0.0019
theta_4lip	60000.0000	0.0010	0.0002	0.0004	0.0009	0.0010	0.0012	0.0025
theta_5prz	60000.0000	0.0040	0.0007	0.0019	0.0035	0.0040	0.0045	0.0071
theta_6rck	60000.0000	0.0015	0.0005	0.0004	0.0012	0.0015	0.0018	0.0051
theta_7rex	60000.0000	0.0027	0.0007	0.0009	0.0022	0.0026	0.0032	0.0063
theta_8sup	60000.0000	0.0011	0.0004	0.0002	0.0008	0.0010	0.0013	0.0039

	mean	S.D.	ESS*
theta_1cup	0.001350	0.000420	40.3
theta_2cya	0.000692	0.000324	43.3
theta_3cys	0.000717	0.000177	60.4
theta_4lip	0.000920	0.000204	105.9
theta_5prz	0.003490	0.000489	377.8
theta_6rck	0.001130	0.000321	121.2
theta_7rex	0.002355	0.000554	98.2
theta_8sup	0.000954	0.000396	51.8
theta_9tha	0.001655	0.000485	95.3
theta_10rexliprckthacupcyscyasupprz	0.006313	0.002876	7.6
theta_11rexliprckthacupcyscyasup	0.002176	0.000815	22.9
theta_12rexliprckthacup	0.004195	0.001267	36.9
theta_13rexliprcktha	0.003357	0.001846	14.0
theta_14rexliprck	0.001622	0.000920	116.8
theta_15rexlip	0.001853	0.001100	95.7
theta_16cyscyasup	0.002449	0.001300	172.2
theta_17cyasup	0.003107	0.001055	110.5
tau_10rexliprckthacupcyscyasupprz	0.008021	0.002025	7.6
tau_11rexliprckthacupcyscyasup	0.004015	0.000519	107.8
tau_12rexliprckthacup	0.001575	0.000461	15.9
tau_13rexliprcktha	0.000965	0.000194	22.1
tau_14rexliprck	0.000911	0.000181	21.9
tau_15rexlip	0.000822	0.000176	27.5
tau_16cyscyasup	0.002851	0.000577	124.8
tau_17cyasup	0.000876	0.000438	30.4
lnL	-11880.819322	21.881173	22.1

	delim	prior	posterior	nspecies
0	00000000	0.05882	0.0	1
1	10000000	0.05882	0.0	2
2	11000000	0.05882	0.0	3
3	11000010	0.05882	0.0	4
4	11000011	0.05882	0.0	5
5	11100000	0.05882	0.0	4
6	11100010	0.05882	0.0	5
7	11100011	0.05882	0.0	6
8	11110000	0.05882	0.9862	5
9	11110010	0.05882	0.0	6
10	11110011	0.05882	0.0	7
11	11111000	0.05882	0.0138	6
12	11111010	0.05882	0.0	7
13	11111011	0.05882	0.0	8
14	11111100	0.05882	0.0	7
15	11111110	0.05882	0.0	8
16	11111111	0.05882	0.0	9