As part of the ipyrad.analysis
toolkit we've created convenience functions for easily running common RAxML commands. This can be useful when you want to run all of your analyses in a clean, streamlined way in a Jupyter notebook to create a completely reproducible study.
In [1]:
## conda install ipyrad -c ipyrad
## conda install toytree -c eaton-lab
## conda install raxml -c bioconda
In [2]:
import ipyrad.analysis as ipa
import toyplot
import toytree
In [3]:
## Initialize a raxml analysis object for the phylip alignment produced
## by ipyrad; results will be written into the "analysis-raxml" directory.
phyfile = "./analysis-ipyrad/aligntest_outfiles/aligntest.phy"
rax = ipa.raxml(
    name="aligntest",
    data=phyfile,
    workdir="analysis-raxml",
)
In [4]:
## Configure additional raxml parameters before building the command:
## N replicate searches, T threads, and (optionally) an outgroup.
rax.params.N = 10    ## number of searches / bootstrap replicates
rax.params.T = 2     ## number of threads
rax.params.o = None  ## no outgroup set; the tree will be unrooted
## example of setting an outgroup by sample names:
## rax.params.o = ["32082_przewalskii", "33588_przewalskii"]
In [5]:
## show the raxml command string that will be executed.
## NOTE: print must be called as a function — the Python 2 print
## statement ("print rax.command") is a SyntaxError in Python 3.
print(rax.command)
In [6]:
## run raxml; force=True overwrites existing results with this name/workdir
rax.run(force=True)
In [7]:
## file paths to the tree outputs produced by the finished run
rax.trees
Out[7]:
In [8]:
## Load the bipartitions (bootstrap support) tree, root it on samples
## matching "3", and draw it with support values shown at the nodes.
## The trailing semicolon suppresses the returned canvas repr in a notebook.
tre = toytree.tree(rax.trees.bipartitions)
tre.root(wildcard="3")
supports = tre.get_node_values("support")
tre.draw(
    width=300,
    height=300,
    node_labels=supports,
);
In [9]:
##
## ipcluster start --n=20
##
Create a Client connected to the cluster
In [10]:
import ipyparallel as ipp
## connect to the running ipcluster instance (started above with `ipcluster start`)
ipyclient = ipp.Client()
Create several raxml objects for different data sets
In [11]:
## Initialize one raxml object per data set, each using 4 threads
## and 100 replicates.
base = "~/Documents/ipyrad/tests/analysis-ipyrad"
rax1 = ipa.raxml(
    name="rax1", T=4, N=100,
    data=base + "/pedic_outfiles/pedic.phy",
)
rax2 = ipa.raxml(
    name="rax2", T=4, N=100,
    data=base + "/aligntest_outfiles/aligntest.phy",
)
Submit jobs to run on the cluster queue.
In [12]:
## submit both jobs to the cluster; these calls return without blocking
## (see below: the jobs are queried/waited on after submission)
rax1.run(ipyclient=ipyclient, force=True)
rax2.run(ipyclient=ipyclient, force=True)
Wait for jobs to finish
In [14]:
## you can query each job while it's running.
## NOTE: "async" became a reserved keyword in Python 3.7, so the
## dotted access `rax1.async` is a SyntaxError there; getattr reaches
## the same stored attribute without using the keyword.
getattr(rax1, "async").ready()
Out[14]:
In [13]:
## or just block until all jobs submitted through this ipyclient are finished
ipyclient.wait()
Out[13]:
In [15]:
## load trees and add to axes
## Load the bootstrap tree from each finished run, root it, and draw it.
## Trailing semicolons suppress the canvas repr in a notebook cell.
newick1 = rax1.trees.bipartitions
tre1 = toytree.tree(newick1)
tre1.root(wildcard="prz")   ## root on samples matching "prz"
tre1.draw(width=300);

newick2 = rax2.trees.bipartitions
tre2 = toytree.tree(newick2)
tre2.root(wildcard="3")     ## root on samples matching "3"
tre2.draw(width=300);