I have made a little analysis to test our ability to date a tree with node order constraints. I use the following tree:
In [53]:
import sys
from ete3 import Tree, TreeStyle, NodeStyle
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import math
import scipy
import re
t = Tree("(((a:0.1,b:0.1):0.2, (c:0.2,d:0.2):0.1):0.6, ((e:0.4,f:0.4):0.3, (g:0.5,h:0.5):0.2):0.2);")
ts = TreeStyle()
ts.min_leaf_separation= 0
ts.scale = 88
nstyle = NodeStyle()
nstyle["size"] = 0
for n in t.traverse():
n.set_style(nstyle)
t.render("%%inline", tree_style=ts)
Out[53]:
Then I will compare two MCMC runs: one with 2 node order constraints, and one without. The node order constraints are as follow:
c d a b
g h e f
meaning that the MRCA of cd is older than the MRCA of AB (same thing for the second line), just like in the tree. First we take a short look at the MCMC run.
In [58]:
runConsMet = pd.read_csv("output/testConstraintsMet.log", sep="\t")
In [59]:
runConsMet.describe()
Out[59]:
In [60]:
%matplotlib inline
plt.plot(runConsMet['Iteration'], runConsMet['Posterior'], 'b-')
plt.xlabel("Iteration", fontsize=15)
plt.ylabel("Posterior", fontsize=15)
Out[60]:
In [61]:
%matplotlib inline
plt.plot(runConsMet['Iteration'], runConsMet['root'], 'b-')
plt.xlabel("Iteration", fontsize=15)
plt.ylabel("Root age", fontsize=15)
Out[61]:
In [62]:
def readTreesFromRBOutput (file):
try:
f=open(file, 'r')
except IOError:
print ("Unknown file: "+file)
sys.exit()
line = ""
treeStrings = list()
for l in f:
if "teration" not in l:
m = re.sub('\[&index=\d+\]', "", l)
treeStrings.append(m.split()[4])
trees=list()
for l in treeStrings:
trees.append ( Tree( l ) )
return trees
trees = readTreesFromRBOutput("output/testConstraintsMet.trees")
In [63]:
def getNodeHeights( t ):
node2Height = dict()
id2Height = dict()
for node in t.traverse("postorder"):
if node not in node2Height:
node2Height[node] = 0.0
id2Height[node.name] = 0.0
if node.up:
if node.up.name =='':
leaves = node.up.get_leaves()
name=""
for l in leaves:
name += l.name
node.up.name=name
node2Height[node.up] = node2Height[node] + node.dist
id2Height[str(node.up.name)] = node2Height[node] + node.dist
# print node.name + " : " + str(node2Height[node])
#return node2Height,id2Height
return id2Height
allHeights = list()
for t in trees:
allHeights.append(getNodeHeights(t))
Let's look at the age difference between two nodes constrained in an older-younger fashion (the first constraint.
In [64]:
diffABMinusCD = list()
for a in allHeights:
diffABMinusCD.append(a["ab"] - a["cd"])
pd.DataFrame( diffABMinusCD ).describe()
Out[64]:
They are all negative, and thus agree with the constraint. We look at the other constraint.
In [65]:
diffEFMinusGH = list()
for a in allHeights:
diffEFMinusGH.append(a["ef"] - a["gh"])
pd.DataFrame( diffEFMinusGH ).describe()
Out[65]:
Same thing. Now we look into an unconstrained run.
In [66]:
treesNC = readTreesFromRBOutput("output/testNoConstraint.trees")
allHeightsNC = list()
for t in treesNC:
allHeightsNC.append(getNodeHeights(t))
diffABMinusCDNC = list()
for a in allHeightsNC:
diffABMinusCDNC.append(a["ab"] - a["cd"])
pd.DataFrame( diffABMinusCDNC ).describe()
Out[66]:
The difference is not always negative.
In [67]:
diffEFMinusGHNC = list()
for a in allHeightsNC:
diffEFMinusGHNC.append(a["ef"] - a["gh"])
pd.DataFrame( diffEFMinusGHNC ).describe()
Out[67]:
In [ ]: