The underlying idea is to map the different flagella genes (COG) in selected species of bacteria to the operon graph. I want to see if we can understand the structure of the operons and the different path each organism took to establish its flagella structure.Is it feasible then to engineer E.coli to take on different flagella structures.
(a) Monotrichous (polar): Contains only a single sheathed polar flagella
- Vibrio cholerae - vch; Vibrio_cholerae_O1_biovar_El_Tor_N16961_uid57623; NC_002505; REF : Richardson K. 1991. Infection and Immunity. 59 (8): 2727; [link](http://iai.asm.org/content/59/8/2727)
- Pseudomonas aueriginosa - pae; Pseudomonas_aeruginosa_PAO1_uid57945; NC_002516; REF : Bucior I, et.al. 2012. Plos Pathogens ; [DOI: 10.1371/journal.ppat.1002616](http://journals.plos.org/plospathogens/article?id=10.1371/journal.ppat.1002616)
(b) Lophotrichous : Contains multiple flagella but all at a pole. They may be sheathed as in Helicobacter
- Helicobacter pylori - hpg; Helicobacter_pylori_G27_uid59305; NC_011333 REF : Ottemann KM and Lowenthal AC. 2002. 70(4): 1984; [DOI : 10.1128/IAI.70.4.1984-1990.2002](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC127824/)
- Vibrio fischeri - vfi; Vibrio_fischeri_ES114_uid58163; NC_006840 (In Kegg: Aliivibrio fischeri ES114) REF : Millikan DS and Ruby EG .2004. Journal of Bacteriology. 186(13): 4315; [DOI: 10.1128/JB.186.13.4315-4325.2004](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC421587/)
(c) Peritrichous : The flagella surrounds most of the bacteria.
- E.coli - eco; Escherichia_coli_K_12_substr__MG1655_uid57779; NC_000913;
REF : Copeland M, et.al. 2010. 76(4):1242; [DOI: 10.1128/AEM.02153-09](http://aem.asm.org/content/76/4/1241.full)- Salmonella typhi - seo; Salmonella_enterica_serovar_Typhimurium_14028S_uid86059; NC_016856;
REF : Paul K, et.al. 2010. Molecular Cell. 1:128; [DOI: 10.1016/j.molcel.2010.03.001](http://www.sciencedirect.com/science/article/pii/S1097276510002005)- Bacillus subtilus - bsu; Bacillus_subtilis_168_uid57675; NC_000964;
REF : Mukherjee S and Kearns D B. 2014. 48: 319; [DOI: 10.1146/annurev-genet-120213-092406](http://www.annualreviews.org/doi/full/10.1146/annurev-genet-120213-092406)
(d) Amphitrichous : I am considering both single (as in Campylobacter) as well as multiply flagellated (Rhodospirillum rubrum) organism. However these are bipolar i.e. the flagella exists only at the poles.
- Campylobacter jejuni - cjj; Campylobacter_jejuni_81_176_uid58503; NC_008787;
REF : Balaban M, Hendrixson D R. 2011. Plos Pathogens; [DOI: 10.1371/journal.ppat.1002420](http://journals.plos.org/plospathogens/article?id=10.1371/journal.ppat.1002420)- Rhodospirillum rubrum -rru; Rhodospirillum_rubrum_ATCC_11170_uid57655; NC_007643
REF : Lee AG, Fitzsimons JTR. 1975. Journal of General Microbiology. 93: 346; [link](http://mic.microbiologyresearch.org/content/journal/micro/10.1099/00221287-93-2-346?crawler=true)
(e) Periplasmic : The flagella is located in the periplasmic space. These are mainly spirochetes. The rotation of the flagella causes a cork-screw movement of the bacteria.
- Borrelia burgdorferi - bbu; Borrelia_burgdorferi_B31_uid57581; NC_001318;
REF : Motaleb MA, et.al. 2000. Proceedings of National Academy of Sciences. 97(20): 10899–10904; [DOI: 10.1073/pnas.200221797](http://www.pnas.org/content/97/20/10899.long)- Brachyspira hyodysenteriae: bhy; Brachyspira_hyodysenteriae_WA1_uid59291; NC_012225;
REF : Li C, et.al. 2002. Proceedings of the National Academy of Sciences. 99 (9): 6169 [DOI: 10.1073/pnas.092010499](http://www.pnas.org/content/99/9/6169.full)
From Kegg database, Flagellar assembly is considered. Flagellar assembly pathway: (KEGG pathway ko: ko02040); The orthologs involved are -
K02400 flagellar biosynthesis protein FlhA
K02401 flagellar biosynthetic protein FlhB
K02421 flagellar biosynthetic protein FliR
K13820 flagellar biosynthetic protein FliR/FlhB
K02420 flagellar biosynthetic protein FliQ
K02419 flagellar biosynthetic protein FliP
K02418 flagellar protein FliO/FliZ
K02417 flagellar motor switch protein FliN/FliY
K02416 flagellar motor switch protein FliM
K02557 chemotaxis protein MotB
K02556 chemotaxis protein MotA
K02402 flagellar transcriptional activator FlhC ; Unmapped to COG
K02403 flagellar transcriptional activator FlhD ; Unmapped to COG
K02390 flagellar hook protein FlgE
K02389 flagellar basal-body rod modification protein FlgD
K02414 flagellar hook-length control protein FliK
K02413 flagellar FliJ protein
K02412 flagellum-specific ATP synthase
K02411 flagellar assembly protein FliH
K02410 flagellar motor switch protein FliG
K02409 flagellar M-ring protein FliF
K02408 flagellar hook-basal body complex protein FliE
K02388 flagellar basal-body rod protein FlgC
K02387 flagellar basal-body rod protein FlgB
K02386 flagella basal body P-ring formation protein FlgA
K02398 negative regulator of flagellin synthesis FlgM
K02399 flagella synthesis protein FlgN
K02391 flagellar basal-body rod protein FlgF
K02392 flagellar basal-body rod protein FlgG
K02393 flagellar L-ring protein precursor FlgH
K02394 flagellar P-ring protein precursor FlgI
K02396 flagellar hook-associated protein 1 FlgK
K02397 flagellar hook-associated protein 3 FlgL
K02406 flagellin
K02407 flagellar hook-associated protein 2
K02422 flagellar protein FliS
K02423 flagellar protein FliT ; Unmapped to COG
In [33]:
# Loading dictionaries
import os
import pickle
import time
import sys
def loadPkl(fname):
pklDir = '/home/jaggu/research/projectFiles/operons/pklFiles'
f = os.path.join(pklDir,fname)
db = pickle.load(open(f))
return db
kog_cogList_dict = loadPkl('kog_cogList.dict.pkl')
cog_kogList_dict = loadPkl('cog_kogList.dict.pkl')
org_cogPair_dict = loadPkl('org_cogPair.dict.pkl')
cogPair_fgocInfo_dict = loadPkl('cogPair_fgocInfo.dict.pkl')
print "Dictionaries loaded ",time.ctime()
In [4]:
orgDict = {'vch': ('Vibrio_cholerae_O1_biovar_El_Tor_N16961_uid57623','NC_002505'),
'pae': ('Pseudomonas_aeruginosa_PAO1_uid57945','NC_002516'),
'hpg': ('Helicobacter_pylori_G27_uid59305','NC_011333'),
'vfi': ('Vibrio_fischeri_ES114_uid58163','NC_006840'),
'eco': ('Escherichia_coli_K_12_substr__MG1655_uid57779','NC_000913'),
'seo': ('Salmonella_enterica_serovar_Typhimurium_LT2_uid57799','NC_003197'),
'bsu': ('Bacillus_subtilis_168_uid57675','NC_000964'),
'cjj': ('Campylobacter_jejuni_81_176_uid58503','NC_008787'),
'rru': ('Rhodospirillum_rubrum_ATCC_11170_uid57655','NC_007643'),
'bbu': ('Borrelia_burgdorferi_B31_uid57581','NC_001318'),
'bhy': ('Brachyspira_hyodysenteriae_WA1_uid59291','NC_012225')
}
print "Organism dictionary created"
In [46]:
# Flagella list
fla_kogList = [ 'K02400','K02401','K02421','K13820','K02420','K02419','K02418','K02417','K02416','K02557',\
'K02556','K02402','K02403','K02390','K02389','K02414','K02413','K02412','K02411','K02410',\
'K02409','K02408','K02388','K02387','K02386','K02398','K02399','K02391','K02392','K02393',\
'K02394','K02396','K02397','K02406','K02407','K02422','K02423']
# COG List
fla_cogList = list()
for item in fla_kogList:
try:
cogL = kog_cogList_dict[item]
except KeyError:
continue;
fla_cogList.extend(cogL)
allCogPair = cogPair_fgocInfo_dict.keys()
fla_cogList_cutoff = list()
# Making a metagenome graph
cutoff = 0.1
destDir = '/home/jaggu/research/projectFiles/operons/graphFiles/flagella'
fname1 = 'AllBacteria.cutoff.noNAN.graph.tab'
fname2 = 'AllBacteria.cutoff.graph.tab'
ofile1 = open(os.path.join(destDir,fname1),'w')
ofile2 = open(os.path.join(destDir,fname2),'w')
header = '\t'.join(['COG-A','COG-B','COUNT','dirFGOC','FGOC','COUNT(COG-A)','COUNT(COG-B)','\n'])
ofile1.write(header)
ofile2.write(header)
for cog1,cog2 in allCogPair:
if cog1 in fla_cogList or cog2 in fla_cogList:
#values: [cogA, cogB, count, dfgoc, fgoc, aNbr, bNbr]
# '-': This is replaced with nan
vals = cogPair_fgocInfo_dict[(cog1,cog2)]
dfgoc = float(vals[3])
if dfgoc > cutoff:
cogPairs_fla_cutoff_allOrg_list.extend([(cog1,cog2)])
if not (cog1 == '-' or cog2 == '-'):
info1 = '\t'.join(map(str,vals))+'\n'
ofile1.write(info1)
fla_cogList_cutoff.append(vals)
elif cog1 == '-' and cog2 == '-':
continue;
else:
info2 = '\t'.join(map(str,vals))+'\n'
ofile2.write(info2)
ofile1.close()
# Appending the graph file (with unassigned COG) with the graph file
ofile1 = open(os.path.join(destDir,fname1),'r')
lines = ofile1.readlines()[1:]
for line in lines:
ofile2.write(line)
ofile2.close()
print "Flagella assembly pathway as COG list made"
print "Operon graph with flagella COG genes made "
The total number of orthologous group assigned for Flagella assembly are -
In [106]:
# Defining organisms; This is where we can change or iterate through organisms
destDir = '/home/jaggu/research/projectFiles/operons/graphFiles/flagella'
org_FlagellaPairs_dict = dict()
def makeOperon(org,orgName,fla_cogList):
cogPair = list()
fname = org + '.graph.tab'
ofile = open(os.path.join(destDir,fname),'w')
header = '\t'.join(['COG-A','COG-B','\n'])
ofile.write(header)
allCogPairs = org_cogPair_dict[orgName]
for cogA,cogB in allCogPairs:
if cogA in fla_cogList or cogB in fla_cogList:
ofile.write(cogA+'\t'+cogB+'\t'+'\n')
cogPair.extend([(cogA,cogB)])
ofile.close()
org_FlagellaPairs_dict[org]=cogPair
print "Flagella for %s org done "%(org)
return True
def savePkl(db,pklFname):
pklPath = '/home/jaggu/research/projectFiles/operons/pklFiles'
f = os.path.join(pklPath,pklFname)
pickle.dump(db,open(f,'w'))
return
for org, orgName in orgDict.items():
makeOperon(org,orgName,fla_cogList)
savePkl(org_FlagellaPairs_dict,'selected_org.flagellaCOGPairs.dict.pkl')
print "Completed graph file and dictionary (contains unassigned COG :'-')"
In [8]:
# Coloring Nodes; Associating COG groups with functional linkage
import os
cog_func_list = [('COG1345','K02407','CAP','fliD'),
('COG1344','K02397','HOOK ASSOCIATED','flgL'),
('COG1256','K02396','HOOK ASSOCIATED','flgK'),
('COG1344','K02406','FILAMENT','fliC'),
('COG3144','K02414','HOOK','fliK'),
('COG1843','K02389','HOOK','flgD'),
('COG1749','K02392','DISTAL RING','flgG'),
('COG2063','K02393','L RING','flgH'),
('COG1749','K02391','P ROD','flgF'),
('COG1706','K02394','P RING','flgI'),
('COG1677','K02408','P ROD','fliE'),
('COG1815','K02387','P ROD','flgB'),
('COG1558','K02388','P ROD','flgC'),
('COG1749','K02390','HOOK','flgE'),
('COG1766','K02409','MS RING','fliF'),
('COG1536','K02410','C RING','fliG'),
('COG1868','K02416','C RING','fliM'),
('COG1886','K02417','C RING','fliN'),
('COG1360','K02557','MOTOR','motB'),
('COG1291','K02556','MOTOR','motA'),
('COG1298','K02400','TYPE III SS','flhA'),
('COG1377','K02401','TYPE III SS','flhB'),
('COG1317','K02411','TYPE III SS','fliH'),
('COG1157','K02412','TYPE III SS','fliI'),
('COG3190','K02418','TYPE III SS','fliO'),
('COG1987','K02420','TYPE III SS','fliQ'),
('COG1338','K02419','TYPE III SS','fliP'),
('COG1684','K02421','TYPE III SS','fliR'),
('COG1684','K02415','BASAL BODY','fliI'),
('COG1261','K02386','BASAL BODY','flgA'),
('COG3418','K02399','CHAPERONE','flgN'),
('COG2882','K02413','CHAPERONE','fliJ'),
('COG1516','K02422','CHAPERONE','fliS'),
('COG2747','K02398','REGULATOR','flgM'),
]
destDir = '/home/jaggu/research/projectFiles/operons/graphFiles/flagella'
fname = 'flagella.COG.attr.tab'
ofile = open(os.path.join(destDir,fname),'w')
header = '\t'.join(['COG','KEGG','FUNCTION','GENE-NAME','PATHWAY-PRESENT','\n'])
ofile.write(header)
for cog, kog,func,gname in cog_func_list:
line = '\t'.join([cog,kog,func,gname,'YES','\n'])
ofile.write(line)
ofile.close()
print "Nodes (KEGG pathway present) with color attributes written"
In [9]:
cog_notPathway_func_list = [('COG4791','K0NONE','TYPE III SS-No','escT'),
('COG4789','K0NONE','TYPE III SS-No','escV'),
('COG4790','K0NONE','TYPE III SS-No','escR'),
('COG5443','K06601','REGULATOR','flbT'),
('COG1582','K02385','HOOK','flbD'),
('COG1419','K02404','TYPE III SS-No','flhF'),
('COG4786','K0NONE','BASAL BODY','flgG'),
('COG4787','K0NONE','BASAL BODY','flgF'),
('COG1776','K03410','CHEMOTAXIS','cheC'),
('COG1705','K02395','REGULATOR','flgJ'),
('COG3951','K0NONE','BASAL BODY','none'),
('COG1580','K02415','BASAL BODY','fliL'),
('COG3334','K0NONE','CHAPERONE','motE'),
('COG4465','K0NONE','REGULATOR','codY'),
('COG3143','K03414','CHEMOTAXIS','cheZ'),
('COG1334','K06603','FILAMENT','flaG')]
destDir = '/home/jaggu/research/projectFiles/operons/graphFiles/flagella'
fname = 'flagella.COG.notInPathway.attr.tab'
ofile = open(os.path.join(destDir,fname),'w')
header = '\t'.join(['COG','KEGG','FUNCTION','GENE-NAME','PATHWAY-PRESENT','\n'])
ofile.write(header)
for cog, kog,func,gname in cog_notPathway_func_list:
line = '\t'.join([cog,kog,func,gname,'NO','\n'])
ofile.write(line)
ofile.close()
print "Nodes (Genes not present in KEGG pathway) with color attributes written"
orgDict - Dictionary - {organism (three letter code) : organism Name, accession number},...
fla_cogList_cutoff - List - [cogA, cogB, COUNT, dirFGOC, FGOC,COUNT(COG-A),COUNT(COG-B)],...;
This also has cutoff score and all unassigned COG members are removed
org_cogPair_dict - Dictionary - {(organism Name, accession number): (COG-A, COG-B), ...},...;
Unassigned COG genes are not discarded.
org_FlagellaPairs_dict - Dictionary - {organism (three letter code) : [(COG-A,COG-B),...]...;
cogPairs_fla_cutoff_allOrg_list - List - [(COG-A,COG-B),....]; This contains '-' or unassigned COG group.
'AllBacteria.cutoff.graph.tab' - Graph File - COG-A, COG-B, count, dfgoc, fgoc, aNbr, bNbr,...
A Tab delimited file for all COG pairs occuring for the flagellar assembly. The file contains unassigned COG groups (assigned as nan302 (for example)).
In [104]:
"""
I am going to provide an annotation file for the edges in the Allbacteria.cutoff.graph.tab file. The all flagella
graph (brings in the '-' or unassigned COG). The mapping may be a little more inelegant. The AllBacteria graph has unassigned COG as nan[#number]. But the org_flagella pairs is in the form
COG1,'-' while the flagellar COG Pairs in organisms is assigned as '-'.
"""
def isPairPresent(cog1,cog2,flaCOGPairs):
flag = True
if (cog1,cog2) in flaCOGPairs:flag = True
else: flag = False
return flag
def makeEdgeAttribution(org,allCogPair_edgeAttr):
orgFlaPairs = org_FlagellaPairs_dict[org]
newPair_list = list()
for line in allCogPair_edgeAttr:
cog1,cog2,count,dfgoc,fgoc,anbr,bnbr,edgeAttr = line
if cog1.startswith('nan'):
anbr = 0
if isPairPresent('-',cog2,orgFlaPairs):
edgeAttr+='_'+str(org)
elif cog2.startswith('nan'):
bnbr = 0
if isPairPresent(cog1,'-',orgFlaPairs):
edgeAttr+='_'+str(org)
else:
if isPairPresent(cog1,cog2,orgFlaPairs):
edgeAttr+='_'+str(org)
newPair_list.append([cog1,cog2,count,dfgoc,fgoc,anbr,bnbr,edgeAttr])
return newPair_list
def mapAttributionAll(lines):
allpair_edge = list()
edgeAttr = ['all']
for line in lines[1:]:
val = line.strip().split('\t')
val.extend(edgeAttr)
allpair_edge.append(val)
return allpair_edge
def overlayOrg(orgL):
"""
This function overlays single or pair of organism on the metagenome graph
"""
allCogPair_edgeAttr = mapAttributionAll(lines)
destDir = '/home/jaggu/research/projectFiles/operons/graphFiles/flagella/overlayOrgs'
oname = '_'.join(org for org in orgL)
fname2 = oname + '.flagella.overlayAll.graph.tab'
ofile = open(os.path.join(destDir,fname2),'w')
header = '\t'.join(['COG-A','COG-B','COUNT','dirFGOC','FGOC','COUNT(COG-A)','COUNT(COG-B)','ORG','\n'])
ofile.write(header)
for org in orgL:
allCogPair_edgeAttr = makeEdgeAttribution(org,allCogPair_edgeAttr)
for item in allCogPair_edgeAttr:
ofile.write('\t'.join(map(str,item))+'\n')
ofile.close()
print "flagella overlay on the metagenome graph for organism done : %s ;"%(oname)
print "Graph file created : %s"%(fname2)
return True
"""
Retrieve the big metagenome graph for the flagella assembly genes (both the present). Overlay the graphs for the
queried organisms.
"""
allCogPair_edgeAttr = list()
destDir = '/home/jaggu/research/projectFiles/operons/graphFiles/flagella'
fname1 = 'AllBacteria.cutoff.graph.tab'
ifile = open(os.path.join(destDir,fname1),'r')
lines = ifile.readlines()
ifile.close()
#orgL = ['eco','pae']
allOrg = orgDict.keys()
allOrgPair = [('eco',org) for org in allOrg]
for orgL in allOrgPair:
overlayOrg(orgL)
In [ ]:
In [20]:
# Calculating an organism gene order score (but only between flagella cogPairs) and a matrix between the organisms
# With cutoff applied to the metagenomic flagella: Number of cogPairs = 51;
# Then clustering it to see if we can get
"""
variables required - (1) fla_cogList_cutoff (also does not have NAN) (2) orgDict - {org:orgName} (3)
"""
from __future__ import division
destDir = '/home/jaggu/research/projectFiles/operons/graphFiles/flagella'
org_FlagellaPairs_noNAN_dict = dict()
def getCOGpair(org):
cogPair = list()
orgFname = os.path.join(destDir,org+'.graph.tab')
ifile = open(orgFname,'r')
lines = ifile.readlines()
ifile.close()
for line in lines[1:]:
cog1, cog2 = line.split('\t')[0], line.split('\t')[1]
if not (cog1 == '-' or cog2 == '-'):
cogPair.extend([(cog1,cog2)])
return cogPair
# Getting COGpair of organisms and removing '-' in COG pair
for org in orgDict.keys():
org_FlagellaPairs_noNAN_dict[org]=getCOGpair(org)
# Calculating the organism flagella comparison score; A matrix of all by all organism and the value being
# Score between number of COG pairs between the two organisms.
def intersect(a,b):
return list(set(a)&set(b))
org_flaCompare_dict = dict()
for org1 in orgDict.keys():
for org2 in orgDict.keys():
cogPairList_1 = org_FlagellaPairs_noNAN_dict[org1]
cogPairList_2 = org_FlagellaPairs_noNAN_dict[org2]
nbr_pair = len(intersect(cogPairList_1,cogPairList_2))
nbr_org1 = len(set(cogPairList_1))
org_flaCompare_dict[(org1,org2)]= nbr_pair/nbr_org1
print "Dictionary comparing flagellar COG pairs done : {((ORG1,ORG2) : Flagella Score),....}"
In [21]:
"""
Plotting a heat map between the comparison of organisms and the conservation of flagellar order
Need variables: org_flaCompare_dict
"""
% matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
imgDir = '/home/jaggu/research/projectFiles/operons/flagellaFigures'
# Sorting order of organism; Default is just the order entered; Can be changed by patristic distance
x_org_order = orgDict.keys()
y_org_order = x_org_order
col_labels = x_org_order
row_labels = y_org_order
data = np.zeros((len(x_org_order),len(y_org_order)),dtype=float)
for i, org1 in enumerate(x_org_order):
for j, org2 in enumerate(y_org_order):
data[(i,j)]= org_flaCompare_dict[(org1,org2)]
df = pd.DataFrame(data,columns=col_labels,index=row_labels)
# Heatmap
#axm = fig.add_axes([0.26,0.1,0.6,0.6]) # x-pos, y-pos, width, height
fig,axm = plt.subplots()
cax = axm.matshow(df,interpolation='nearest', cmap='Blues')
#fig.colorbar(cax)
axm.set_xticks(range(len(df.columns)),minor=False)
axm.set_yticks(range(len(df.index)),minor=False)
axm.set_xticklabels(list(df.columns),rotation='vertical')
axm.set_yticklabels(list(df.index))
# Saving figure
fname = 'flagella.orgCOGPairs.NoCluster.heatmap.svg'
f = os.path.join(imgDir,fname)
plt.savefig(f,dpi=300)
In [22]:
import pandas as pd
import scipy
import scipy.cluster.hierarchy as sch
D = data
df = pd.DataFrame(D,columns=col_labels,index=row_labels)
clusterMethod = 'single'
# Compute and plot dendrogram.
fig = plt.figure()
axd = fig.add_axes([0.09,0.1,0.2,0.6])
row_clusters = sch.linkage(df, method=clusterMethod,metric='euclidean')
# Row label 1, 2, distance, #members
row_dendr = sch.dendrogram(row_clusters,orientation='right',labels=row_labels)
# reorder data with respect to clustering
df_rowclust = df.ix[row_dendr['leaves'][::-1]]
axd.set_xticks([])
axd.set_yticks([])
# remove axes spines from dendrogram
for i in axd.spines.values():
i.set_visible(False)
# plot heatmap
axm = fig.add_axes([0.26,0.1,0.6,0.6]) # x-pos, y-pos, width, height
cax = axm.matshow(df_rowclust, interpolation='nearest', cmap='Blues')
fig.colorbar(cax)
axm.set_xticks(range(len(df_rowclust.columns)),minor=False)
axm.set_yticks(range(len(df_rowclust.index)),minor=False)
axm.set_xticklabels(list(df_rowclust.columns),rotation='vertical')
axm.set_yticklabels(list(df_rowclust.index))
# Saving figure
fname = 'flagella.orgCluster.method_'+clusterMethod+'.row.heatmap.svg'
f = os.path.join(imgDir,fname)
plt.savefig(f,dpi=300)
In [23]:
# Clustering by both row and column
import pandas as pd
import scipy
import scipy.cluster.hierarchy as sch
D = data
df = pd.DataFrame(D,columns=col_labels,index=row_labels)
# Compute pairwise distances for columns
col_clusters = sch.linkage(df.T, method='complete',metric='euclidean')
row_clusters = sch.linkage(df, method='complete',metric='euclidean')
# plot column dendrogram
sch.set_link_color_palette(['black'])
fig = plt.figure()
axd2 = fig.add_axes([0.38,0.74,0.36,0.10])
col_dendr = sch.dendrogram(col_clusters, orientation='top',
color_threshold=np.inf) # makes dendrogram black)
axd2.set_xticks([])
axd2.set_yticks([])
# plot row dendrogram
axd1 = fig.add_axes([0.09,0.1,0.2,0.6])
row_dendr = sch.dendrogram(row_clusters, orientation='right',
count_sort='ascending',
color_threshold=np.inf) # makes dendrogram black
axd1.set_xticks([])
axd1.set_yticks([])
# remove axes spines from dendrogram
for i,j in zip(axd1.spines.values(), axd2.spines.values()):
i.set_visible(False)
j.set_visible(False)
# reorder columns and rows with respect to the clustering
df_rowclust = df.ix[row_dendr['leaves'][::-1]]
df_rowclust.columns = [df_rowclust.columns[col_dendr['leaves']]]
# plot heatmap
axm = fig.add_axes([0.26,0.1,0.6,0.6])
cax = axm.matshow(df_rowclust, interpolation='nearest', cmap='Blues')
fig.colorbar(cax)
axm.set_xticks(range(len(df_rowclust.columns)),minor=False)
axm.set_yticks(range(len(df_rowclust.index)),minor=False)
axm.set_xticklabels(list(df_rowclust.columns),rotation='vertical')
axm.set_yticklabels(list(df_rowclust.index))
plt.show()