In [1]:
%pylab inline
import treeCl


Populating the interactive namespace from numpy and matplotlib

This loads the alignments into treeCl. You'll need to change the path to match your filesystem.
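treeCl reads every alignment file in `input_dir` for you. To show roughly what that step involves, here is a minimal standard-library sketch that parses sequential (relaxed) PHYLIP files into a name-to-sequence dictionary. The helpers `read_phylip` and `load_alignments`, and the `.phy` file extension, are hypothetical and not part of treeCl. Interleaved PHYLIP is not handled.

```python
import os


def read_phylip(path):
    """Parse a sequential (relaxed) PHYLIP file into {name: sequence}.

    Assumes the first line holds the number of taxa and sites, and each
    following non-empty line holds a name, whitespace, then the sequence.
    Interleaved PHYLIP is NOT handled by this hypothetical helper.
    """
    with open(path) as handle:
        lines = [line.strip() for line in handle if line.strip()]
    ntaxa, nsites = (int(x) for x in lines[0].split()[:2])
    records = {}
    for line in lines[1:1 + ntaxa]:
        name, seq = line.split(None, 1)
        seq = seq.replace(' ', '')  # allow sequences written in spaced blocks
        if len(seq) != nsites:
            raise ValueError('%s: expected %d sites, got %d'
                             % (name, nsites, len(seq)))
        records[name] = seq
    return records


def load_alignments(input_dir, extension='.phy'):
    """Load every alignment in input_dir, keyed by file name minus extension.

    The '.phy' default is an assumption; adjust it to your file names.
    """
    alignments = {}
    for filename in sorted(os.listdir(input_dir)):
        if filename.endswith(extension):
            stem = filename[:-len(extension)]
            alignments[stem] = read_phylip(os.path.join(input_dir, filename))
    return alignments
```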


In [2]:
c = treeCl.Collection(input_dir='/Users/kgori/aa_alignments', file_format='phylip')

In [3]:
len(c)


Out[3]:
60

In [4]:
print(', '.join([rec.name
                 for rec in c.records[:5]]) + '...')


class1_1, class1_2, class1_3, class1_4, class1_5...

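Quickly calculate a neighbour-joining tree for each alignment with PhyML, then build pairwise tree-distance matrices using the Robinson-Foulds ('rf'), Euclidean ('euc') and geodesic ('geo') distances.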
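The 'rf' metric is the Robinson-Foulds distance: each tree is reduced to its set of non-trivial splits (bipartitions of the taxa), and the distance counts the splits found in only one of the two trees. The sketch below illustrates the idea on topologies written as nested tuples, a hypothetical representation chosen for brevity. It is not treeCl's implementation, may differ from it in normalisation, and ignores branch lengths, which the Euclidean and geodesic distances take into account.

```python
from itertools import combinations


def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Collect the non-trivial bipartitions (splits) induced by a tree.

    Each clade becomes a split {clade, rest}. Splits with fewer than two
    taxa on either side are trivial and are dropped. Because a split is a
    frozenset of its two sides, the two root clades of an unrooted tree
    written as a rooted tuple yield the same split only once.
    """
    taxa = leaves(tree)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return
        clade = leaves(node)
        rest = taxa - clade
        if len(clade) > 1 and len(rest) > 1:
            found.add(frozenset([clade, rest]))
        for child in node:
            walk(child)

    walk(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Unweighted Robinson-Foulds distance: splits present in exactly one tree."""
    return len(splits(tree_a) ^ splits(tree_b))


def distance_matrix(trees, metric=rf_distance):
    """Symmetric matrix of pairwise distances, zeros on the diagonal."""
    n = len(trees)
    matrix = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = metric(trees[i], trees[j])
    return matrix
```

Each matrix needs one comparison per unordered pair of trees, so the 60 trees here give 1770 comparisons per metric.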


In [5]:
c.calc_phyml_trees(analysis='nj', verbosity=1)
rf_matrix = c.distance_matrix('rf')
euc_matrix = c.distance_matrix('euc')
geo_matrix = c.distance_matrix('geo')
print(geo_matrix)


Running phyml on class4_15
[[ 0.     0.297  0.308 ...,  3.064  2.946  2.778]
 [ 0.297  0.     0.213 ...,  3.039  2.92   2.744]
 [ 0.308  0.213  0.    ...,  3.027  2.905  2.732]
 ..., 
 [ 3.064  3.039  3.027 ...,  0.     0.241  0.485]
 [ 2.946  2.92   2.905 ...,  0.241  0.     0.432]
 [ 2.778  2.744  2.732 ...,  0.485  0.432  0.   ]]

Use these geodesic distances to cluster the trees into six groups with multidimensional scaling (MDS), the same technique that underlies the embedding plot.
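Classical MDS embeds a distance matrix by double-centring the squared distances and taking the leading eigenvectors; the embedded points can then be grouped with an ordinary clustering algorithm. The pure-Python sketch below uses power iteration for the eigenvectors and k-means for the clustering. Using k-means is an assumption (`MDS_cluster` may use a different method), and `classical_mds`, `kmeans` and `mds_cluster` are hypothetical helpers. Labels are numbered from 1 in order of first appearance, so the result has the same form as the partition tuple printed below.

```python
import math
import random


def classical_mds(dm, dims=2, iterations=500, seed=0):
    """Classical (Torgerson) MDS of a symmetric distance matrix given as lists."""
    n = len(dm)
    sq = [[-0.5 * dm[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_means = [sum(row) / n for row in sq]
    grand_mean = sum(row_means) / n
    # Double centring: B = -1/2 * J D^2 J. The matrix is assumed symmetric,
    # so column means equal row means.
    b = [[sq[i][j] - row_means[i] - row_means[j] + grand_mean
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        # Power iteration finds the eigenvector with the largest |eigenvalue|
        v = [rng.uniform(-1, 1) for _ in range(n)]
        for _ in range(iterations):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * sum(b[i][j] * v[j] for j in range(n))
                         for i in range(n))
        # Negative eigenvalues (non-Euclidean distances) are clipped to zero
        scale = math.sqrt(max(eigenvalue, 0.0))
        for i in range(n):
            coords[i][d] = v[i] * scale
        # Deflate so the next pass finds the next eigenvector
        b = [[b[i][j] - eigenvalue * v[i] * v[j] for j in range(n)]
             for i in range(n)]
    return coords


def _sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points,
                           key=lambda p: min(_sq_dist(p, c) for c in centres)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: _sq_dist(p, centres[c]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def mds_cluster(dm, k, dims=2):
    """Embed with classical MDS, cluster with k-means, number clusters 1..k."""
    labels = kmeans(classical_mds(dm, dims), k)
    renumber = {}
    for lab in labels:
        renumber.setdefault(lab, len(renumber) + 1)
    return tuple(renumber[lab] for lab in labels)
```

For matrices of realistic size, `numpy.linalg.eigh` would be the natural replacement for the power-iteration loop.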


In [6]:
clustering = treeCl.Clustering(geo_matrix)
decomp = clustering.MDS_decomp()
partitioning = clustering.MDS_cluster(6, decomp)
print(partitioning)


(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 3, 3, 2, 2, 2, 3, 2, 2, 3, 3, 2, 2, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 6, 6, 6, 5, 5, 5, 6, 6, 5, 5, 6, 6, 6, 5)

Now we can use a Plotter object to draw some pictures: a heatmap of the geodesic distance matrix, and an embedding plot showing the partitioning above.
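Members of a cluster are not always adjacent in the collection (clusters 2 and 3 are interleaved in the partition above), so a heatmap of the raw matrix can hide the cluster structure. If you draw a heatmap yourself, permuting rows and columns so that each cluster's members sit together makes the clusters show up as blocks along the diagonal. The helpers `order_by_partition` and `reorder_matrix` are hypothetical, not part of `Plotter`, which may already order the matrix in its own way.

```python
def order_by_partition(partition):
    """Indices sorted so that members of the same cluster are adjacent.

    Python's sort is stable, so the original order is kept within a cluster.
    """
    return sorted(range(len(partition)), key=lambda i: partition[i])


def reorder_matrix(dm, order):
    """Permute rows and columns of a square matrix with the same ordering."""
    return [[dm[i][j] for j in order] for i in order]
```

Assuming `geo_matrix` is a NumPy array, as its printout suggests, something like `imshow(reorder_matrix(geo_matrix.tolist(), order_by_partition(partitioning)))` would draw the reordered matrix in the `%pylab` session.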


In [7]:
plotter = treeCl.Plotter(c, dm=geo_matrix)

In [8]:
hm = plotter.heatmap()



In [9]:
embed = plotter.embedding(partition=partitioning)