In [1]:
%pylab inline
import treeCl
This loads the data into treeCl. You will need to change the path to the data to match your filesystem.
In [2]:
c = treeCl.Collection(input_dir='/Users/kgori/aa_alignments', datatype='protein', file_format='phylip')
In [3]:
len(c)
Out[3]:
In [4]:
print(', '.join([rec.name for rec in c.records[:5]]) + '...')
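Quickly calculate a Neighbour-Joining tree for each alignment, then compute pairwise tree-to-tree distance matrices under three metrics: Robinson-Foulds ('rf'), Euclidean ('euc') and geodesic ('geo').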
In [5]:
c.calc_phyml_trees(analysis='nj', verbosity=1)
rf_matrix = c.distance_matrix('rf')
euc_matrix = c.distance_matrix('euc')
geo_matrix = c.distance_matrix('geo')
print(geo_matrix)
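To make the idea of a tree-to-tree distance matrix concrete, here is a minimal sketch of the Robinson-Foulds case. It assumes each unrooted tree is summarised by its set of non-trivial splits (bipartitions of the taxa), so the distance between two trees is the number of splits found in one tree but not the other. The helper names `rf_distance` and `pairwise_matrix` and the three toy trees are hypothetical and only for illustration; this is not treeCl's implementation.

```python
def rf_distance(splits_a, splits_b):
    """Robinson-Foulds distance: number of splits present in exactly one tree."""
    return len(splits_a ^ splits_b)


def pairwise_matrix(trees):
    """Symmetric matrix of RF distances with a zero diagonal."""
    n = len(trees)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = rf_distance(trees[i], trees[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


# Three toy unrooted trees on taxa A-E. Each split is stored as its
# smaller side, which is unambiguous for five taxa.
tree_1 = {frozenset('AB'), frozenset('DE')}  # ((A,B),C,(D,E))
tree_2 = {frozenset('AC'), frozenset('DE')}  # ((A,C),B,(D,E))
tree_3 = {frozenset('AB'), frozenset('CE')}  # ((A,B),D,(C,E))

toy_matrix = pairwise_matrix([tree_1, tree_2, tree_3])
print(toy_matrix)  # [[0, 2, 2], [2, 0, 4], [2, 4, 0]]
```

Entry (i, j) of the matrix is the distance between tree i and tree j, which is the general shape that the clustering step takes as input.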
Use these distances to cluster the trees with multidimensional scaling (MDS), the same technique that underlies the embedding plot.
In [6]:
clustering = treeCl.Clustering(geo_matrix)
decomp = clustering.MDS_decomp()
partitioning = clustering.MDS_cluster(6, decomp)
print(partitioning)
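For readers unfamiliar with MDS, the following is a rough sketch of classical (Torgerson) MDS in pure Python: it turns a distance matrix into point coordinates whose pairwise distances approximate the input. The function names (`double_center`, `top_eigenpair`, `classical_mds`) and the toy points are hypothetical, and the eigenvectors are found by simple power iteration with deflation. treeCl's `MDS_decomp` may differ in detail; this only illustrates the general technique.

```python
import math
import random


def double_center(dist):
    """Gram matrix B = -0.5 * J * D^2 * J from a symmetric distance matrix."""
    n = len(dist)
    sq = [[float(dist[i][j]) ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # sq is symmetric, so column means equal row means.
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]


def top_eigenpair(mat, iterations=500, seed=0):
    """Dominant eigenvalue and unit eigenvector by power iteration."""
    n = len(mat)
    rng = random.Random(seed)
    vec = [rng.random() for _ in range(n)]
    for _ in range(iterations):
        new = [sum(mat[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0:
            break
        vec = [x / norm for x in new]
    # Rayleigh quotient gives the eigenvalue estimate for the current vector.
    value = sum(vec[i] * sum(mat[i][j] * vec[j] for j in range(n))
                for i in range(n))
    return value, vec


def classical_mds(dist, dims=2):
    """Embed the items of a distance matrix in `dims` dimensions."""
    gram = double_center(dist)
    n = len(dist)
    coords = [[] for _ in range(n)]
    for _ in range(dims):
        value, vec = top_eigenpair(gram)
        # Non-Euclidean distances can yield negative eigenvalues; clamp them.
        scale = math.sqrt(max(value, 0.0))
        for i in range(n):
            coords[i].append(scale * vec[i])
        # Deflate: remove the component just found before the next pass.
        gram = [[gram[i][j] - value * vec[i] * vec[j] for j in range(n)]
                for i in range(n)]
    return coords


# Toy check: distances between four corners of a 3 x 4 rectangle.
points = [(0, 0), (3, 0), (0, 4), (3, 4)]
dist = [[math.hypot(a[0] - b[0], a[1] - b[1]) for b in points] for a in points]
coords = classical_mds(dist, dims=2)
```

The recovered coordinates match the original points only up to rotation, reflection and translation, but their pairwise distances reproduce `dist`. A clustering algorithm can then be run on such coordinates to group similar trees.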
Now we can use a Plotter object to draw some pictures: a heatmap of the distance matrix and an embedding plot coloured by the partition.
In [7]:
plotter = treeCl.Plotter(c, dm=geo_matrix)
In [8]:
hm = plotter.heatmap()
In [9]:
embed = plotter.embedding(partition=partitioning)