Comparing the flagella operons across selected species

The underlying idea is to map the different flagella genes (COG) in selected species of bacteria to the operon graph. I want to see if we can understand the structure of the operons and the different path each organism took to establish its flagella structure.Is it feasible then to engineer E.coli to take on different flagella structures.

The selected bacterial species are -

(a) Monotrichous (polar): Contains only a single sheathed polar flagella

  • Vibrio cholerae - vch; Vibrio_cholerae_O1_biovar_El_Tor_N16961_uid57623; NC_002505; REF : Richardson K. 1991. Infection and Immunity. 59 (8): 2727; [link](http://iai.asm.org/content/59/8/2727)
  • Pseudomonas aueriginosa - pae; Pseudomonas_aeruginosa_PAO1_uid57945; NC_002516; REF : Bucior I, et.al. 2012. Plos Pathogens ; [DOI: 10.1371/journal.ppat.1002616](http://journals.plos.org/plospathogens/article?id=10.1371/journal.ppat.1002616)

(b) Lophotrichous : Contains multiple flagella but all at a pole. They may be sheathed as in Helicobacter

  • Helicobacter pylori - hpg; Helicobacter_pylori_G27_uid59305; NC_011333 REF : Ottemann KM and Lowenthal AC. 2002. 70(4): 1984; [DOI : 10.1128/IAI.70.4.1984-1990.2002](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC127824/)
  • Vibrio fischeri - vfi; Vibrio_fischeri_ES114_uid58163; NC_006840 (In Kegg: Aliivibrio fischeri ES114) REF : Millikan DS and Ruby EG .2004. Journal of Bacteriology. 186(13): 4315; [DOI: 10.1128/JB.186.13.4315-4325.2004](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC421587/)

(c) Peritrichous : The flagella surrounds most of the bacteria.

  • E.coli - eco; Escherichia_coli_K_12_substr__MG1655_uid57779; NC_000913;
    REF : Copeland M, et.al. 2010. 76(4):1242; [DOI: 10.1128/AEM.02153-09](http://aem.asm.org/content/76/4/1241.full)
  • Salmonella typhi - seo; Salmonella_enterica_serovar_Typhimurium_14028S_uid86059; NC_016856;
    REF : Paul K, et.al. 2010. Molecular Cell. 1:128; [DOI: 10.1016/j.molcel.2010.03.001](http://www.sciencedirect.com/science/article/pii/S1097276510002005)
  • Bacillus subtilus - bsu; Bacillus_subtilis_168_uid57675; NC_000964;
    REF : Mukherjee S and Kearns D B. 2014. 48: 319; [DOI: 10.1146/annurev-genet-120213-092406](http://www.annualreviews.org/doi/full/10.1146/annurev-genet-120213-092406)

(d) Amphitrichous : I am considering both single (as in Campylobacter) as well as multiply flagellated (Rhodospirillum rubrum) organism. However these are bipolar i.e. the flagella exists only at the poles.

  • Campylobacter jejuni - cjj; Campylobacter_jejuni_81_176_uid58503; NC_008787;
    REF : Balaban M, Hendrixson D R. 2011. Plos Pathogens; [DOI: 10.1371/journal.ppat.1002420](http://journals.plos.org/plospathogens/article?id=10.1371/journal.ppat.1002420)
  • Rhodospirillum rubrum -rru; Rhodospirillum_rubrum_ATCC_11170_uid57655; NC_007643
    REF : Lee AG, Fitzsimons JTR. 1975. Journal of General Microbiology. 93: 346; [link](http://mic.microbiologyresearch.org/content/journal/micro/10.1099/00221287-93-2-346?crawler=true)

(e) Periplasmic : The flagella is located in the periplasmic space. These are mainly spirochetes. The rotation of the flagella causes a cork-screw movement of the bacteria.

  • Borrelia burgdorferi - bbu; Borrelia_burgdorferi_B31_uid57581; NC_001318;
    REF : Motaleb MA, et.al. 2000. Proceedings of National Academy of Sciences. 97(20): 10899–10904; [DOI: 10.1073/pnas.200221797](http://www.pnas.org/content/97/20/10899.long)
  • Brachyspira hyodysenteriae: bhy; Brachyspira_hyodysenteriae_WA1_uid59291; NC_012225;
    REF : Li C, et.al. 2002. Proceedings of the National Academy of Sciences. 99 (9): 6169 [DOI: 10.1073/pnas.092010499](http://www.pnas.org/content/99/9/6169.full)

From Kegg database, Flagellar assembly is considered. Flagellar assembly pathway: (KEGG pathway ko: ko02040); The orthologs involved are - K02400 flagellar biosynthesis protein FlhA
K02401 flagellar biosynthetic protein FlhB
K02421 flagellar biosynthetic protein FliR
K13820 flagellar biosynthetic protein FliR/FlhB
K02420 flagellar biosynthetic protein FliQ
K02419 flagellar biosynthetic protein FliP
K02418 flagellar protein FliO/FliZ
K02417 flagellar motor switch protein FliN/FliY
K02416 flagellar motor switch protein FliM
K02557 chemotaxis protein MotB
K02556 chemotaxis protein MotA
K02402 flagellar transcriptional activator FlhC ; Unmapped to COG
K02403 flagellar transcriptional activator FlhD ; Unmapped to COG
K02390 flagellar hook protein FlgE
K02389 flagellar basal-body rod modification protein FlgD
K02414 flagellar hook-length control protein FliK
K02413 flagellar FliJ protein
K02412 flagellum-specific ATP synthase
K02411 flagellar assembly protein FliH
K02410 flagellar motor switch protein FliG
K02409 flagellar M-ring protein FliF
K02408 flagellar hook-basal body complex protein FliE
K02388 flagellar basal-body rod protein FlgC
K02387 flagellar basal-body rod protein FlgB
K02386 flagella basal body P-ring formation protein FlgA
K02398 negative regulator of flagellin synthesis FlgM
K02399 flagella synthesis protein FlgN
K02391 flagellar basal-body rod protein FlgF
K02392 flagellar basal-body rod protein FlgG
K02393 flagellar L-ring protein precursor FlgH
K02394 flagellar P-ring protein precursor FlgI
K02396 flagellar hook-associated protein 1 FlgK
K02397 flagellar hook-associated protein 3 FlgL
K02406 flagellin
K02407 flagellar hook-associated protein 2
K02422 flagellar protein FliS
K02423 flagellar protein FliT ; Unmapped to COG


In [33]:
# Loading dictionaries
import os
import pickle 
import time
import sys

def loadPkl(fname):
    pklDir = '/home/jaggu/research/projectFiles/operons/pklFiles'
    f = os.path.join(pklDir,fname)
    db = pickle.load(open(f))
    return db

kog_cogList_dict = loadPkl('kog_cogList.dict.pkl')
cog_kogList_dict = loadPkl('cog_kogList.dict.pkl')
org_cogPair_dict = loadPkl('org_cogPair.dict.pkl')
cogPair_fgocInfo_dict = loadPkl('cogPair_fgocInfo.dict.pkl')

print "Dictionaries loaded ",time.ctime()


Dictionaries loaded  Sat Oct 24 14:32:26 2015

In [4]:
orgDict = {'vch': ('Vibrio_cholerae_O1_biovar_El_Tor_N16961_uid57623','NC_002505'),
           'pae': ('Pseudomonas_aeruginosa_PAO1_uid57945','NC_002516'),
           'hpg': ('Helicobacter_pylori_G27_uid59305','NC_011333'),
           'vfi': ('Vibrio_fischeri_ES114_uid58163','NC_006840'),
           'eco': ('Escherichia_coli_K_12_substr__MG1655_uid57779','NC_000913'),
           'seo': ('Salmonella_enterica_serovar_Typhimurium_LT2_uid57799','NC_003197'),
           'bsu': ('Bacillus_subtilis_168_uid57675','NC_000964'),
           'cjj': ('Campylobacter_jejuni_81_176_uid58503','NC_008787'),
           'rru': ('Rhodospirillum_rubrum_ATCC_11170_uid57655','NC_007643'),
           'bbu': ('Borrelia_burgdorferi_B31_uid57581','NC_001318'),
           'bhy': ('Brachyspira_hyodysenteriae_WA1_uid59291','NC_012225')
          }
print "Organism dictionary created"


Organism dictionary created

In [46]:
# Flagella list
fla_kogList = [     'K02400','K02401','K02421','K13820','K02420','K02419','K02418','K02417','K02416','K02557',\
                    'K02556','K02402','K02403','K02390','K02389','K02414','K02413','K02412','K02411','K02410',\
                    'K02409','K02408','K02388','K02387','K02386','K02398','K02399','K02391','K02392','K02393',\
                    'K02394','K02396','K02397','K02406','K02407','K02422','K02423']
# COG List
fla_cogList = list()

for item in fla_kogList:
    try: 
        cogL = kog_cogList_dict[item]
    except KeyError:
        continue;
    fla_cogList.extend(cogL)
allCogPair = cogPair_fgocInfo_dict.keys()
fla_cogList_cutoff = list()

# Making a metagenome graph
cutoff = 0.1
destDir = '/home/jaggu/research/projectFiles/operons/graphFiles/flagella'
fname1 = 'AllBacteria.cutoff.noNAN.graph.tab'
fname2 = 'AllBacteria.cutoff.graph.tab'
ofile1 = open(os.path.join(destDir,fname1),'w')
ofile2 = open(os.path.join(destDir,fname2),'w')
header = '\t'.join(['COG-A','COG-B','COUNT','dirFGOC','FGOC','COUNT(COG-A)','COUNT(COG-B)','\n'])
ofile1.write(header)
ofile2.write(header)

for cog1,cog2 in allCogPair:
    if cog1 in fla_cogList or cog2 in fla_cogList:
    #values: [cogA, cogB, count, dfgoc, fgoc, aNbr, bNbr] 
    # '-': This is replaced with nan
        vals = cogPair_fgocInfo_dict[(cog1,cog2)]
        dfgoc = float(vals[3])
        if dfgoc > cutoff:
            cogPairs_fla_cutoff_allOrg_list.extend([(cog1,cog2)])
            if not (cog1 == '-' or cog2 == '-'):
                info1 = '\t'.join(map(str,vals))+'\n'
                ofile1.write(info1)
                fla_cogList_cutoff.append(vals)
            elif cog1 == '-' and cog2 == '-':
                continue;
            else:
                info2 = '\t'.join(map(str,vals))+'\n'
                ofile2.write(info2)

            
ofile1.close()
# Appending the graph file (with unassigned COG) with the graph file 
ofile1 = open(os.path.join(destDir,fname1),'r')
lines = ofile1.readlines()[1:]
for line in lines:
    ofile2.write(line)

ofile2.close()

print "Flagella assembly pathway as COG list made"
print "Operon graph with flagella COG genes made "


Flagella assembly pathway as COG list made
Operon graph with flagella COG genes made 

The total number of orthologous group assigned for Flagella assembly are -

  • KEGG : 37;
  • COG : 34; Unmapped groups are K02402 (flhC) ,K02403 (flhD) and K02423 (fliT: flagellar protein)
  • Using COG pairs (cutoff=0.1 and non annotated groups ignored), number of Nodes = 47 ; Edges = 51

In [106]:
# Defining organisms; This is where we can change or iterate through organisms


destDir = '/home/jaggu/research/projectFiles/operons/graphFiles/flagella'
org_FlagellaPairs_dict = dict()

def makeOperon(org,orgName,fla_cogList):
    cogPair = list()
    fname = org + '.graph.tab'
    ofile = open(os.path.join(destDir,fname),'w')
    header = '\t'.join(['COG-A','COG-B','\n'])
    ofile.write(header)

    allCogPairs = org_cogPair_dict[orgName]
    for cogA,cogB in allCogPairs:
        if cogA in fla_cogList or cogB in fla_cogList:
            ofile.write(cogA+'\t'+cogB+'\t'+'\n')
            cogPair.extend([(cogA,cogB)])
    ofile.close()
    org_FlagellaPairs_dict[org]=cogPair
    print "Flagella for %s org done "%(org)
    return True

def savePkl(db,pklFname):
    pklPath = '/home/jaggu/research/projectFiles/operons/pklFiles'
    f = os.path.join(pklPath,pklFname)
    pickle.dump(db,open(f,'w'))
    return 

for org, orgName in orgDict.items():
    makeOperon(org,orgName,fla_cogList)

savePkl(org_FlagellaPairs_dict,'selected_org.flagellaCOGPairs.dict.pkl')

print "Completed graph file and dictionary (contains unassigned COG :'-')"


Flagella for eco org done 
Flagella for bsu org done 
Flagella for bhy org done 
Flagella for seo org done 
Flagella for rru org done 
Flagella for cjj org done 
Flagella for vfi org done 
Flagella for vch org done 
Flagella for pae org done 
Flagella for bbu org done 
Flagella for hpg org done 
Completed graph file and dictionary (contains unassigned COG :'-')

In [8]:
# Coloring Nodes; Associating COG groups with functional linkage 
import os

cog_func_list = [('COG1345','K02407','CAP','fliD'),
                 ('COG1344','K02397','HOOK ASSOCIATED','flgL'),
                 ('COG1256','K02396','HOOK ASSOCIATED','flgK'),
                 ('COG1344','K02406','FILAMENT','fliC'),
                 ('COG3144','K02414','HOOK','fliK'),
                 ('COG1843','K02389','HOOK','flgD'),
                 ('COG1749','K02392','DISTAL RING','flgG'),
                 ('COG2063','K02393','L RING','flgH'),
                 ('COG1749','K02391','P ROD','flgF'),
                 ('COG1706','K02394','P RING','flgI'),
                 ('COG1677','K02408','P ROD','fliE'),
                 ('COG1815','K02387','P ROD','flgB'),
                 ('COG1558','K02388','P ROD','flgC'),
                 ('COG1749','K02390','HOOK','flgE'),                 
                 ('COG1766','K02409','MS RING','fliF'),
                 ('COG1536','K02410','C RING','fliG'),
                 ('COG1868','K02416','C RING','fliM'),
                 ('COG1886','K02417','C RING','fliN'),
                 ('COG1360','K02557','MOTOR','motB'),
                 ('COG1291','K02556','MOTOR','motA'),
                 ('COG1298','K02400','TYPE III SS','flhA'),
                 ('COG1377','K02401','TYPE III SS','flhB'),
                 ('COG1317','K02411','TYPE III SS','fliH'),
                 ('COG1157','K02412','TYPE III SS','fliI'),
                 ('COG3190','K02418','TYPE III SS','fliO'),
                 ('COG1987','K02420','TYPE III SS','fliQ'),
                 ('COG1338','K02419','TYPE III SS','fliP'),
                 ('COG1684','K02421','TYPE III SS','fliR'),
                 ('COG1684','K02415','BASAL BODY','fliI'),
                 ('COG1261','K02386','BASAL BODY','flgA'),
                 ('COG3418','K02399','CHAPERONE','flgN'),
                 ('COG2882','K02413','CHAPERONE','fliJ'),
                 ('COG1516','K02422','CHAPERONE','fliS'),
                 ('COG2747','K02398','REGULATOR','flgM'),
                ]


destDir = '/home/jaggu/research/projectFiles/operons/graphFiles/flagella'
fname = 'flagella.COG.attr.tab'
ofile = open(os.path.join(destDir,fname),'w')
header = '\t'.join(['COG','KEGG','FUNCTION','GENE-NAME','PATHWAY-PRESENT','\n'])
ofile.write(header)
for cog, kog,func,gname in cog_func_list:
    line = '\t'.join([cog,kog,func,gname,'YES','\n'])
    ofile.write(line)
ofile.close()
print "Nodes (KEGG pathway present) with color attributes written"


Nodes (KEGG pathway present) with color attributes written

In [9]:
cog_notPathway_func_list = [('COG4791','K0NONE','TYPE III SS-No','escT'),
                 ('COG4789','K0NONE','TYPE III SS-No','escV'),
                 ('COG4790','K0NONE','TYPE III SS-No','escR'),             
                 ('COG5443','K06601','REGULATOR','flbT'),
                 ('COG1582','K02385','HOOK','flbD'),
                 ('COG1419','K02404','TYPE III SS-No','flhF'),
                 ('COG4786','K0NONE','BASAL BODY','flgG'),
                 ('COG4787','K0NONE','BASAL BODY','flgF'),
                 ('COG1776','K03410','CHEMOTAXIS','cheC'),
                 ('COG1705','K02395','REGULATOR','flgJ'),
                 ('COG3951','K0NONE','BASAL BODY','none'),
                 ('COG1580','K02415','BASAL BODY','fliL'),
                 ('COG3334','K0NONE','CHAPERONE','motE'),
                 ('COG4465','K0NONE','REGULATOR','codY'),
                 ('COG3143','K03414','CHEMOTAXIS','cheZ'),
                 ('COG1334','K06603','FILAMENT','flaG')]

destDir = '/home/jaggu/research/projectFiles/operons/graphFiles/flagella'
fname = 'flagella.COG.notInPathway.attr.tab'
ofile = open(os.path.join(destDir,fname),'w')
header = '\t'.join(['COG','KEGG','FUNCTION','GENE-NAME','PATHWAY-PRESENT','\n'])
ofile.write(header)
for cog, kog,func,gname in cog_notPathway_func_list:
    line = '\t'.join([cog,kog,func,gname,'NO','\n'])
    ofile.write(line)
ofile.close()
print "Nodes (Genes not present in KEGG pathway) with color attributes written"


Nodes (Genes not present in KEGG pathway) with color attributes written

Graphing an organism's operon path on top of the all organism path.

In essence I am trying to see how the graph matches to the path taken by one organism and compare with a path taken by another. Dictionaries that needs to be imported -

orgDict - Dictionary - {organism (three letter code) : organism Name, accession number},...
fla_cogList_cutoff - List - [cogA, cogB, COUNT, dirFGOC, FGOC,COUNT(COG-A),COUNT(COG-B)],...;
This also has cutoff score and all unassigned COG members are removed
org_cogPair_dict - Dictionary - {(organism Name, accession number): (COG-A, COG-B), ...},...;
Unassigned COG genes are not discarded.
org_FlagellaPairs_dict - Dictionary - {organism (three letter code) : [(COG-A,COG-B),...]...;
cogPairs_fla_cutoff_allOrg_list - List - [(COG-A,COG-B),....]; This contains '-' or unassigned COG group.
'AllBacteria.cutoff.graph.tab' - Graph File - COG-A, COG-B, count, dfgoc, fgoc, aNbr, bNbr,...
A Tab delimited file for all COG pairs occuring for the flagellar assembly. The file contains unassigned COG groups (assigned as nan302 (for example)).


In [104]:
"""
I am going to provide an annotation file for the edges in the Allbacteria.cutoff.graph.tab file. The all flagella 
graph (brings in the '-' or unassigned COG). The mapping may be a little more inelegant. The AllBacteria graph has unassigned COG as nan[#number]. But the org_flagella pairs is in the form 
COG1,'-' while the flagellar COG Pairs in organisms is assigned as '-'. 
"""

def isPairPresent(cog1,cog2,flaCOGPairs):
    flag = True
    if (cog1,cog2) in flaCOGPairs:flag = True
    else: flag = False
    return flag
        
    
def makeEdgeAttribution(org,allCogPair_edgeAttr):
    orgFlaPairs = org_FlagellaPairs_dict[org]
    newPair_list = list()
    for line in allCogPair_edgeAttr:
        cog1,cog2,count,dfgoc,fgoc,anbr,bnbr,edgeAttr = line
        if cog1.startswith('nan'): 
            anbr = 0
            if isPairPresent('-',cog2,orgFlaPairs):
                edgeAttr+='_'+str(org)
        elif cog2.startswith('nan'):
            bnbr = 0
            if isPairPresent(cog1,'-',orgFlaPairs):
                edgeAttr+='_'+str(org)
        else:
            if isPairPresent(cog1,cog2,orgFlaPairs):
                edgeAttr+='_'+str(org)
        newPair_list.append([cog1,cog2,count,dfgoc,fgoc,anbr,bnbr,edgeAttr])
    return newPair_list

def mapAttributionAll(lines):
    allpair_edge = list()
    edgeAttr = ['all']
    for line in lines[1:]:
        val = line.strip().split('\t')
        val.extend(edgeAttr)
        allpair_edge.append(val)
    return allpair_edge

def overlayOrg(orgL):
    """
    This function overlays single or pair of organism on the metagenome graph
    """
    allCogPair_edgeAttr = mapAttributionAll(lines)

    destDir = '/home/jaggu/research/projectFiles/operons/graphFiles/flagella/overlayOrgs'
    oname = '_'.join(org for org in orgL)
    fname2 = oname + '.flagella.overlayAll.graph.tab' 
    ofile = open(os.path.join(destDir,fname2),'w')
    header = '\t'.join(['COG-A','COG-B','COUNT','dirFGOC','FGOC','COUNT(COG-A)','COUNT(COG-B)','ORG','\n'])
    ofile.write(header)

    for org in orgL:
        allCogPair_edgeAttr = makeEdgeAttribution(org,allCogPair_edgeAttr)

    for item in allCogPair_edgeAttr:
        ofile.write('\t'.join(map(str,item))+'\n')
    ofile.close()

    print "flagella overlay on the metagenome graph for organism done : %s ;"%(oname)
    print "Graph file created : %s"%(fname2)
    return True
    
"""
Retrieve the big metagenome graph for the flagella assembly genes (both the present). Overlay the graphs for the 
queried organisms.
"""
allCogPair_edgeAttr = list()
destDir = '/home/jaggu/research/projectFiles/operons/graphFiles/flagella'
fname1 = 'AllBacteria.cutoff.graph.tab'
ifile = open(os.path.join(destDir,fname1),'r')
lines = ifile.readlines()
ifile.close()

#orgL = ['eco','pae']
allOrg = orgDict.keys()
allOrgPair = [('eco',org) for org in allOrg]

for orgL in allOrgPair:
    overlayOrg(orgL)


Parsing organism eco ... 
Parsing organism eco ... 
flagella overlay on the metagenome graph for organism done : eco_eco ;
Graph file created : eco_eco.flagella.overlayAll.graph.tab
Parsing organism eco ... 
Parsing organism bsu ... 
flagella overlay on the metagenome graph for organism done : eco_bsu ;
Graph file created : eco_bsu.flagella.overlayAll.graph.tab
Parsing organism eco ... 
Parsing organism bhy ... 
flagella overlay on the metagenome graph for organism done : eco_bhy ;
Graph file created : eco_bhy.flagella.overlayAll.graph.tab
Parsing organism eco ... 
Parsing organism seo ... 
flagella overlay on the metagenome graph for organism done : eco_seo ;
Graph file created : eco_seo.flagella.overlayAll.graph.tab
Parsing organism eco ... 
Parsing organism rru ... 
flagella overlay on the metagenome graph for organism done : eco_rru ;
Graph file created : eco_rru.flagella.overlayAll.graph.tab
Parsing organism eco ... 
Parsing organism cjj ... 
flagella overlay on the metagenome graph for organism done : eco_cjj ;
Graph file created : eco_cjj.flagella.overlayAll.graph.tab
Parsing organism eco ... 
Parsing organism vfi ... 
flagella overlay on the metagenome graph for organism done : eco_vfi ;
Graph file created : eco_vfi.flagella.overlayAll.graph.tab
Parsing organism eco ... 
Parsing organism vch ... 
flagella overlay on the metagenome graph for organism done : eco_vch ;
Graph file created : eco_vch.flagella.overlayAll.graph.tab
Parsing organism eco ... 
Parsing organism pae ... 
flagella overlay on the metagenome graph for organism done : eco_pae ;
Graph file created : eco_pae.flagella.overlayAll.graph.tab
Parsing organism eco ... 
Parsing organism bbu ... 
flagella overlay on the metagenome graph for organism done : eco_bbu ;
Graph file created : eco_bbu.flagella.overlayAll.graph.tab
Parsing organism eco ... 
Parsing organism hpg ... 
flagella overlay on the metagenome graph for organism done : eco_hpg ;
Graph file created : eco_hpg.flagella.overlayAll.graph.tab

In [ ]:


Evaluating the conservation of Flagellar gene pairs between organisms

Evaluating the graph for all the organisms is difficult to visualize. For the bacterial species considered, we could calculate a pathway order score (similar to the genomic order score calculated earlier). The pathway order score would take the following formalism - $$ pathway\,order\,score (Organism 1 , Organism 2) = \frac {Count \,( COG \; Pair (A,B) )_{\small (between \; organism - 1,2)}}{Count (COG Pairs)_{\small(organism-1)}} $$ There is a direction for the score. I am hoping this would indicate the feasibility of changing the set of flagellar genes from one organism to another. This will account for missing cog pairs in organism 2 if they were present in organism 1. By making a heat map, we can visualize the similarity between organisms. A cluster of this will indicate which organisms are similar in terms of COG pairs. Perhaps there is a unifying reason the organisms will cluster. If the clustering is based on flagella arrangement, then it is home run!

In [20]:
# Calculating an organism gene order score (but only between flagella cogPairs) and a matrix between the organisms 
# With cutoff applied to the metagenomic flagella: Number of cogPairs = 51; 
# Then clustering it to see if we can get 
"""
variables required - (1) fla_cogList_cutoff (also does not have NAN) (2) orgDict - {org:orgName} (3)

"""
from __future__ import division

destDir = '/home/jaggu/research/projectFiles/operons/graphFiles/flagella'
org_FlagellaPairs_noNAN_dict = dict()

def getCOGpair(org):
    cogPair = list()
    orgFname = os.path.join(destDir,org+'.graph.tab')
    ifile = open(orgFname,'r')
    lines = ifile.readlines()
    ifile.close()
    for line in lines[1:]:
        cog1, cog2 = line.split('\t')[0], line.split('\t')[1]
        if not (cog1 == '-' or cog2 == '-'):
            cogPair.extend([(cog1,cog2)])
    return cogPair

# Getting COGpair of organisms and removing '-' in COG pair
for org in orgDict.keys():
    org_FlagellaPairs_noNAN_dict[org]=getCOGpair(org) 

# Calculating the organism flagella comparison score; A matrix of all by all organism and the value being 
# Score between number of COG pairs between the two organisms. 
def intersect(a,b):
    return list(set(a)&set(b))

org_flaCompare_dict = dict()

for org1 in orgDict.keys():
    for org2 in orgDict.keys():
        cogPairList_1 = org_FlagellaPairs_noNAN_dict[org1]
        cogPairList_2 = org_FlagellaPairs_noNAN_dict[org2]
        nbr_pair = len(intersect(cogPairList_1,cogPairList_2))
        nbr_org1 = len(set(cogPairList_1))
        org_flaCompare_dict[(org1,org2)]= nbr_pair/nbr_org1

print "Dictionary comparing flagellar COG pairs done : {((ORG1,ORG2) : Flagella Score),....}"


Dictionary comparing flagellar COG pairs done : {((ORG1,ORG2) : Flagella Score),....}

In [21]:
"""
Plotting a heat map between the comparison of organisms and the conservation of flagellar order
Need variables: org_flaCompare_dict
"""
% matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

imgDir = '/home/jaggu/research/projectFiles/operons/flagellaFigures'

# Sorting order of organism; Default is just the order entered; Can be changed by patristic distance
x_org_order = orgDict.keys() 
y_org_order = x_org_order
col_labels = x_org_order
row_labels = y_org_order

data = np.zeros((len(x_org_order),len(y_org_order)),dtype=float)

for i, org1 in enumerate(x_org_order):
    for j, org2 in enumerate(y_org_order):
        data[(i,j)]= org_flaCompare_dict[(org1,org2)]

df = pd.DataFrame(data,columns=col_labels,index=row_labels)
    
# Heatmap
#axm = fig.add_axes([0.26,0.1,0.6,0.6]) # x-pos, y-pos, width, height
fig,axm = plt.subplots()
cax = axm.matshow(df,interpolation='nearest', cmap='Blues')
#fig.colorbar(cax)
axm.set_xticks(range(len(df.columns)),minor=False)
axm.set_yticks(range(len(df.index)),minor=False)
axm.set_xticklabels(list(df.columns),rotation='vertical')
axm.set_yticklabels(list(df.index))

# Saving figure
fname = 'flagella.orgCOGPairs.NoCluster.heatmap.svg'
f = os.path.join(imgDir,fname)
plt.savefig(f,dpi=300)



In [22]:
import pandas as pd
import scipy
import scipy.cluster.hierarchy as sch

D = data
df = pd.DataFrame(D,columns=col_labels,index=row_labels)
clusterMethod = 'single'

# Compute and plot dendrogram.
fig = plt.figure()
axd = fig.add_axes([0.09,0.1,0.2,0.6])
row_clusters = sch.linkage(df, method=clusterMethod,metric='euclidean') 
# Row label 1, 2, distance, #members
row_dendr = sch.dendrogram(row_clusters,orientation='right',labels=row_labels)

# reorder data with respect to clustering
df_rowclust = df.ix[row_dendr['leaves'][::-1]]
axd.set_xticks([])
axd.set_yticks([])

# remove axes spines from dendrogram
for i in axd.spines.values():
        i.set_visible(False)
        
# plot heatmap
axm = fig.add_axes([0.26,0.1,0.6,0.6]) # x-pos, y-pos, width, height
cax = axm.matshow(df_rowclust, interpolation='nearest', cmap='Blues')
fig.colorbar(cax)
axm.set_xticks(range(len(df_rowclust.columns)),minor=False)
axm.set_yticks(range(len(df_rowclust.index)),minor=False)
axm.set_xticklabels(list(df_rowclust.columns),rotation='vertical')
axm.set_yticklabels(list(df_rowclust.index))

# Saving figure
fname = 'flagella.orgCluster.method_'+clusterMethod+'.row.heatmap.svg'
f = os.path.join(imgDir,fname)
plt.savefig(f,dpi=300)



In [23]:
# Clustering by both row and column

import pandas as pd
import scipy
import scipy.cluster.hierarchy as sch

D = data
df = pd.DataFrame(D,columns=col_labels,index=row_labels)

# Compute pairwise distances for columns
col_clusters = sch.linkage(df.T, method='complete',metric='euclidean')
row_clusters = sch.linkage(df, method='complete',metric='euclidean') 

# plot column dendrogram
sch.set_link_color_palette(['black'])
fig = plt.figure()

axd2 = fig.add_axes([0.38,0.74,0.36,0.10]) 
col_dendr = sch.dendrogram(col_clusters, orientation='top',
                       color_threshold=np.inf) # makes dendrogram black)
axd2.set_xticks([])
axd2.set_yticks([])

# plot row dendrogram
axd1 = fig.add_axes([0.09,0.1,0.2,0.6])
row_dendr = sch.dendrogram(row_clusters, orientation='right',  
                       count_sort='ascending',
                       color_threshold=np.inf) # makes dendrogram black
axd1.set_xticks([])
axd1.set_yticks([])

# remove axes spines from dendrogram
for i,j in zip(axd1.spines.values(), axd2.spines.values()):
        i.set_visible(False)
        j.set_visible(False)
        
# reorder columns and rows with respect to the clustering
df_rowclust = df.ix[row_dendr['leaves'][::-1]]
df_rowclust.columns = [df_rowclust.columns[col_dendr['leaves']]]

# plot heatmap
axm = fig.add_axes([0.26,0.1,0.6,0.6])
cax = axm.matshow(df_rowclust, interpolation='nearest', cmap='Blues')
fig.colorbar(cax)
axm.set_xticks(range(len(df_rowclust.columns)),minor=False)
axm.set_yticks(range(len(df_rowclust.index)),minor=False)
axm.set_xticklabels(list(df_rowclust.columns),rotation='vertical')
axm.set_yticklabels(list(df_rowclust.index))

plt.show()