# Analysis of a run with constraints, and a run without

I have made a little analysis to test our ability to date a tree with node order constraints. I use the following tree:

``````

In [53]:

import sys
from ete3 import Tree, TreeStyle, NodeStyle
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import math
import scipy
import re

t = Tree("(((a:0.1,b:0.1):0.2, (c:0.2,d:0.2):0.1):0.6, ((e:0.4,f:0.4):0.3, (g:0.5,h:0.5):0.2):0.2);")
ts = TreeStyle()
ts.min_leaf_separation= 0
ts.scale = 88
nstyle = NodeStyle()
nstyle["size"] = 0
for n in t.traverse():
n.set_style(nstyle)

t.render("%%inline", tree_style=ts)

``````
``````

Out[53]:

``````

Then I will compare two MCMC runs: one with 2 node order constraints, and one without. The node order constraints are as follow:

c d a b

g h e f

meaning that the MRCA of cd is older than the MRCA of AB (same thing for the second line), just like in the tree. First we take a short look at the MCMC run.

``````

In [58]:

``````
``````

In [59]:

runConsMet.describe()

``````
``````

Out[59]:

Iteration
Posterior
Likelihood
Prior
diversification
extinction
gc
root
speciation
tstv
turnover

count
1001.000000
1001.000000
1001.000000
1001.000000
1001.000000
1001.000000
1001.000000
1001.000000
1001.000000
1001.000000
1001.000000

mean
5000.000000
-113.993874
-98.425971
-15.567914
4.416536
5.334900
0.111875
0.774875
9.751436
0.942513
5.334900

std
2891.081113
2.214749
1.565877
1.904898
1.507350
2.319868
0.107302
0.198239
2.547804
0.552286
2.319868

min
0.000000
-123.309000
-105.959000
-23.481300
0.738713
0.506599
0.000130
0.305117
3.853600
0.064296
0.506599

25%
2500.000000
-115.260000
-99.369100
-16.632500
3.373290
3.620480
0.032300
0.635708
7.979750
0.528046
3.620480

50%
5000.000000
-113.593000
-98.226200
-15.475800
4.261310
4.927260
0.080141
0.770879
9.403520
0.833976
4.927260

75%
7500.000000
-112.454000
-97.292100
-14.235800
5.233970
6.774960
0.153066
0.893176
11.209300
1.241400
6.774960

max
10000.000000
-109.414000
-94.888500
-10.750500
11.936400
17.340100
0.672720
1.806330
22.550600
3.337300
17.340100

``````
``````

In [60]:

%matplotlib inline
plt.plot(runConsMet['Iteration'], runConsMet['Posterior'], 'b-')
plt.xlabel("Iteration", fontsize=15)
plt.ylabel("Posterior", fontsize=15)

``````
``````

Out[60]:

<matplotlib.text.Text at 0x10c16a6d8>

``````
``````

In [61]:

%matplotlib inline
plt.plot(runConsMet['Iteration'], runConsMet['root'], 'b-')
plt.xlabel("Iteration", fontsize=15)
plt.ylabel("Root age", fontsize=15)

``````
``````

Out[61]:

<matplotlib.text.Text at 0x1051757b8>

``````
``````

In [62]:

try:
f=open(file, 'r')
except IOError:
print ("Unknown file: "+file)
sys.exit()
line = ""
treeStrings = list()
for l in f:
if "teration" not in l:
m = re.sub('\[&index=\d+\]', "", l)
treeStrings.append(m.split()[4])
trees=list()
for l in treeStrings:
trees.append ( Tree( l ) )
return trees

``````
``````

In [63]:

def getNodeHeights( t ):
node2Height = dict()
id2Height = dict()
for node in t.traverse("postorder"):
if node not in node2Height:
node2Height[node] = 0.0
id2Height[node.name] = 0.0
if node.up:
if node.up.name =='':
leaves = node.up.get_leaves()
name=""
for l in leaves:
name += l.name
node.up.name=name
node2Height[node.up] = node2Height[node] + node.dist
id2Height[str(node.up.name)] = node2Height[node] + node.dist
# print node.name + " : " + str(node2Height[node])
#return node2Height,id2Height
return id2Height

allHeights = list()
for t in trees:
allHeights.append(getNodeHeights(t))

``````

Let's look at the age difference between two nodes constrained in an older-younger fashion (the first constraint.

``````

In [64]:

diffABMinusCD = list()
for a in allHeights:
diffABMinusCD.append(a["ab"] - a["cd"])

pd.DataFrame( diffABMinusCD ).describe()

``````
``````

Out[64]:

0

count
1001.000000

mean
-0.146086

std
0.113012

min
-0.629282

25%
-0.205406

50%
-0.122005

75%
-0.059797

max
-0.000474

``````

They are all negative, and thus agree with the constraint. We look at the other constraint.

``````

In [65]:

diffEFMinusGH = list()
for a in allHeights:
diffEFMinusGH.append(a["ef"] - a["gh"])

pd.DataFrame( diffEFMinusGH ).describe()

``````
``````

Out[65]:

0

count
1001.000000

mean
-0.147083

std
0.114754

min
-0.598449

25%
-0.216569

50%
-0.123322

75%
-0.055978

max
-0.000051

``````

Same thing. Now we look into an unconstrained run.

``````

In [66]:

allHeightsNC = list()
for t in treesNC:
allHeightsNC.append(getNodeHeights(t))

diffABMinusCDNC = list()
for a in allHeightsNC:
diffABMinusCDNC.append(a["ab"] - a["cd"])

pd.DataFrame( diffABMinusCDNC ).describe()

``````
``````

Out[66]:

0

count
1001.000000

mean
-0.014525

std
0.258753

min
-1.001023

25%
-0.172961

50%
-0.019800

75%
0.144513

max
1.010819

``````

The difference is not always negative.

``````

In [67]:

diffEFMinusGHNC = list()
for a in allHeightsNC:
diffEFMinusGHNC.append(a["ef"] - a["gh"])

pd.DataFrame( diffEFMinusGHNC ).describe()

``````
``````

Out[67]:

0

count
1001.000000

mean
-0.012098

std
0.265463

min
-1.402574

25%
-0.152929

50%
-0.003269

75%
0.140969

max
0.970412

``````

In this case as well, the difference is not always negative.

# To conclude, it seems like the constraints have an impact on the posterior chronogram distribution, so we have a program to do relaxed molecular clock with node order constraints.

``````

In [ ]:

``````