Cookbook: RAxML analyses in a notebook

As part of the ipyrad.analysis toolkit we've created convenience functions for running common RAxML commands. This is useful when you want to run all of your analyses in a clean, streamlined way in a Jupyter notebook to create a completely reproducible study.

Install software

There are many ways to install raxml, the simplest of which is to use conda. This installs several raxml binaries into your conda path. To call a different version of raxml, change the 'binary' parameter.


In [1]:
## conda install ipyrad -c ipyrad
## conda install toytree -c eaton-lab
## conda install raxml -c bioconda
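Before creating a raxml object it can help to confirm that a RAxML executable is actually on your PATH. The following is a minimal sketch using only the Python 3 standard library. The helper name `find_raxml` and the list of candidate names are assumptions for illustration; only `raxmlHPC-PTHREADS-SSE3` appears in this notebook.

```python
import shutil

# Candidate binary names. Only the first one appears in this notebook;
# the others are assumed names of other RAxML builds.
CANDIDATES = ["raxmlHPC-PTHREADS-SSE3", "raxmlHPC-PTHREADS", "raxmlHPC"]

def find_raxml(candidates=CANDIDATES):
    """Return the full path of the first candidate found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None

# Prints a path if one of the candidates is installed, otherwise None.
print(find_raxml())
```

If this prints None, the conda install above probably did not land in the environment that the notebook kernel is using.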

Create a raxml Class object

Create a raxml object which has a bunch of default parameters associated with it. The only required argument to initialize the object is a phylip formatted sequence file. In this example I provide a name and working directory as well.


In [2]:
import ipyrad.analysis as ipa
import toyplot
import toytree

In [3]:
rax = ipa.raxml(
    data="./analysis-ipyrad/aligntest_outfiles/aligntest.phy",
    name="aligntest", 
    workdir="analysis-raxml",
    );

Additional options

You can also modify many of the other command line arguments to raxml by changing values in the params dictionary of your raxml object. These values could also have been set when you initialized the object.


In [4]:
## set some other params
rax.params.N = 10
rax.params.T = 2
rax.params.o = None 
#rax.params.o = ["32082_przewalskii", "33588_przewalskii"]

It is good practice to always print the command string so that you know exactly what was called for your analysis and it is documented.


In [5]:
print(rax.command)


raxmlHPC-PTHREADS-SSE3 -f a -T 2 -m GTRGAMMA -N 10 -x 12345 -p 54321 -n aligntest -w /home/deren/Documents/ipyrad/tests/analysis-raxml -s /home/deren/Documents/ipyrad/tests/analysis-ipyrad/aligntest_outfiles/aligntest.phy
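To make the link between the params values and the printed command string concrete, here is a rough sketch of how such a string could be assembled from a dictionary of flags. This is not ipyrad's actual implementation; `build_command` is a hypothetical helper, and the rule that a list of outgroups is joined with commas follows the usual RAxML `-o` convention.

```python
import shlex

def build_command(binary, params):
    """Join a binary name and a flag -> value mapping into one command string.

    Flags whose value is None are skipped; list values are comma-joined.
    """
    parts = [binary]
    for flag, value in params.items():
        if value is None:
            continue
        if isinstance(value, (list, tuple)):
            value = ",".join(value)
        parts.append("-" + flag)
        parts.append(shlex.quote(str(value)))
    return " ".join(parts)

# Flag values copied from the command printed above (paths omitted).
params = {
    "f": "a", "T": 2, "m": "GTRGAMMA", "N": 10,
    "x": 12345, "p": 54321, "n": "aligntest", "o": None,
}
cmd = build_command("raxmlHPC-PTHREADS-SSE3", params)
print(cmd)

# With an outgroup list, as in the commented-out line of the params cell.
params["o"] = ["32082_przewalskii", "33588_przewalskii"]
print(build_command("raxmlHPC-PTHREADS-SSE3", params))
```

Skipping None values mirrors the behaviour seen above, where `rax.params.o = None` leaves no `-o` flag in the printed command.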

Run the job

This will start the job running. We haven't made a progress bar yet but we will add one soon.


In [6]:
rax.run(force=True)


job aligntest finished successfully

Access results

One of the reasons it is so convenient to run your raxml jobs this way is that the results files are easily accessible from your raxml objects.


In [7]:
rax.trees


Out[7]:
bestTree                   ~/Documents/ipyrad/tests/analysis-raxml/RAxML_bestTree.aligntest
bipartitions               ~/Documents/ipyrad/tests/analysis-raxml/RAxML_bipartitions.aligntest
bipartitionsBranchLabels   ~/Documents/ipyrad/tests/analysis-raxml/RAxML_bipartitionsBranchLabels.aligntest
bootstrap                  ~/Documents/ipyrad/tests/analysis-raxml/RAxML_bootstrap.aligntest
info                       ~/Documents/ipyrad/tests/analysis-raxml/RAxML_info.aligntest
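The listing above follows the pattern `RAxML_<type>.<name>` inside the working directory. As a hedged illustration of that naming scheme (not how ipyrad itself builds `rax.trees`), the sketch below collects such files into a dictionary. The function `collect_results` is hypothetical, and the demo runs on empty placeholder files in a temporary directory rather than on real results.

```python
import os
import tempfile

def collect_results(workdir, name):
    """Map result type -> path for files named RAxML_<type>.<name> in workdir."""
    results = {}
    prefix, suffix = "RAxML_", "." + name
    for fname in sorted(os.listdir(workdir)):
        if fname.startswith(prefix) and fname.endswith(suffix):
            key = fname[len(prefix):-len(suffix)]
            results[key] = os.path.join(workdir, fname)
    return results

# Demonstrate on a throwaway directory holding empty placeholder files.
demo_dir = tempfile.mkdtemp()
for kind in ["bestTree", "bipartitions", "bootstrap", "info"]:
    open(os.path.join(demo_dir, "RAxML_%s.aligntest" % kind), "w").close()

found = collect_results(demo_dir, "aligntest")
print(sorted(found))
```

A check like this can be handy for confirming that a finished job wrote every file you expect before moving on to plotting.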

Plot the results

Here we use toytree to plot the bootstrap results.


In [8]:
tre = toytree.tree(rax.trees.bipartitions)
tre.root(wildcard="3")
tre.draw(
    height=300,
    width=300,
    node_labels=tre.get_node_values("support"),
);


[Figure: tree of the aligntest data, rooted with wildcard "3". Tip labels: 3L_0, 3K_0, 3I_0, 3J_0, 2H_0, 2G_0, 2F_0, 2E_0, 1D_0, 1C_0, 1B_0, 1A_0. Every labeled node has support 100.]
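If you want the support values as plain numbers rather than as labels on a figure, they can also be pulled straight out of the Newick text. The sketch below assumes the bipartitions file stores support as an integer label directly after each closing parenthesis; `support_values` is a hypothetical helper, and the Newick string is a made-up toy example, not output from this run.

```python
import re

def support_values(newick):
    """Return the integer labels that directly follow a closing parenthesis."""
    return [int(s) for s in re.findall(r"\)(\d+)", newick)]

# Toy Newick string, made up for illustration.
demo = "((A:0.1,B:0.2)100:0.05,(C:0.1,D:0.3)87:0.05);"
values = support_values(demo)
print(values)
print(min(values))
```

For anything beyond a quick check, reading the tree with toytree as in the cell above is the more robust route, since a regular expression knows nothing about tree structure.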

[optional] Submit raxml jobs to run on a cluster

Using the ipyparallel library you can submit raxml jobs to run in parallel on a cluster in a load-balanced fashion. You can then tell the notebook to wait until all jobs are finished before moving on to draw trees, etc.

Start an ipyparallel cluster

In a separate terminal start an ipcluster instance and tell it how many engines to start.


In [9]:
##
##  ipcluster start --n=20
##

Create a Client connected to the cluster


In [10]:
import ipyparallel as ipp
ipyclient = ipp.Client()

Create several raxml objects for different data sets


In [11]:
rax1 = ipa.raxml(
    data="~/Documents/ipyrad/tests/analysis-ipyrad/pedic_outfiles/pedic.phy", 
    name="rax1", T=4, N=100)

rax2 = ipa.raxml(
    data="~/Documents/ipyrad/tests/analysis-ipyrad/aligntest_outfiles/aligntest.phy", 
    name="rax2", T=4, N=100)

Submit jobs to run on the cluster queue.


In [12]:
rax1.run(ipyclient=ipyclient, force=True)
rax2.run(ipyclient=ipyclient, force=True)


job rax1 submitted to cluster
job rax2 submitted to cluster

Wait for jobs to finish


In [14]:
## you can query each job while it's running
rax1.async.ready()


Out[14]:
True

In [13]:
## or just block until all jobs on ipyclient are finished
ipyclient.wait()


Out[13]:
True
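Between querying a single job and blocking on the whole client there is a middle ground: polling a specific set of jobs with a timeout. The sketch below assumes only that each job object exposes a `ready()` method, as the async result queried above does. `wait_for_jobs` and `FakeJob` are hypothetical names, and `FakeJob` is a stand-in so the example runs without a cluster.

```python
import time

def wait_for_jobs(jobs, interval=1.0, timeout=None):
    """Poll objects exposing .ready() until all are done or timeout elapses.

    Returns True if every job finished, False if the timeout was reached.
    """
    start = time.time()
    while True:
        if all(job.ready() for job in jobs.values()):
            return True
        if timeout is not None and time.time() - start > timeout:
            return False
        time.sleep(interval)

class FakeJob:
    """Stand-in for an async result: reports ready after a number of polls."""
    def __init__(self, polls_needed):
        self.polls_left = polls_needed

    def ready(self):
        self.polls_left -= 1
        return self.polls_left <= 0

jobs = {"rax1": FakeJob(1), "rax2": FakeJob(3)}
done = wait_for_jobs(jobs, interval=0.01, timeout=5)
print(done)
```

With real jobs you would pass the async result objects in place of the `FakeJob` instances and pick a polling interval of several seconds.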

Plot trees when jobs are finished

Here we load the bipartitions tree from each job, root it, and draw it.


In [15]:
## load trees and add to axes
tre1 = toytree.tree(rax1.trees.bipartitions)
tre1.root(wildcard="prz")
tre1.draw(width=300);

tre2 = toytree.tree(rax2.trees.bipartitions)
tre2.root(wildcard="3")
tre2.draw(width=300);


[Figure: two trees. rax1 tip labels: 32082_przewalskii, 33588_przewalskii, 29154_superba, 30686_cyathophylla, 41478_cyathophylloides, 41954_cyathophylloides, 33413_thamno, 30556_thamno, 40578_rex, 35855_rex, 35236_rex, 38362_rex, 39618_rex. rax2 tip labels: 3L_0, 3K_0, 3I_0, 3J_0, 2H_0, 2G_0, 2F_0, 2E_0, 1D_0, 1C_0, 1B_0, 1A_0.]