MYC analyses in Pathways

In Leiserson et al. MYC is left out, though it is presented as a rather "hot" node with frequent copynumber amplifications.

Leiserson:Cancer cells are thought to harbor multiple driver mutations that perturb multiple biological functions15. Consistent with this model, we found that four pairs of subnetworks, including TP53 and NOTCH signaling, TP53 and RTK signaling, PI3K signaling and the cohesin complex, and PI3K signaling and the ASCOM complex, exhibited significant co-occurrence (P < 0.05, multiple-hypotheses corrected) across the pan-cancer cohort (Fig. 2b) or in individual cancer types (Fig. 2c). Multiple pairs of genes within these subnetworks showed co-occurring mutations (Supplementary Table 24). In contrast, mutually exclusive mutations are typically expected within a pathway and not across pathways32,33.

Ref 32 Yeang, C.-H., McCormick, F. & Levine, A. Combinatorial patterns of somatic gene mutations in cancer. FASEB J. 22, 2605–2622 (2008).

Ref 33 Vandin, F., Upfal, E. & Raphael, B.J. De novo discovery of mutated driver pathways in cancer. Genome Res. 22, 375–385 (2012).

Where does MYC belong to?


In [1]:
import NotebookImport
from TCGA_analysis_BRCA_import import *
from IPython.display import HTML


importing IPython notebook from TCGA_analysis_BRCA_import.ipynb
Loading Focal Amplification data
Loading RNASeq data
Loading RNASeq data of controls
Loading CNV Data
Loading Somatic Mutation Data
TCGA-A8-A07C-01A not found
TCGA-BH-A0HN-01A not found
TCGA-BH-A0B8-01A not found
TCGA-BH-A0HF-01A not found
TCGA-A7-A4SC-01A not found
TCGA-AR-A0TU-01A not found
TCGA-AR-A1AT-01A not found
TCGA-E2-A1LS-01A not found
TCGA-A2-A0CZ-01A not found
TCGA-BH-A0HL-01A not found
TCGA-AN-A0G0-01A not found
TCGA-BH-A0B1-01A not found
TCGA-B6-A0I8-01A not found
TCGA-B6-A0I6-01A not found
Loading Clinical Data
Samples: 1086
  --Focal Data: 1086
  --CNV:        1086
  --RNASeq:     1081
  --Somatic:    968
  --Clinical:   1058

Which pathways are affected in MYC focally amplified samples

For all pathways described by Leiserson: Check which pathway is more often co-mutated by MYC


In [2]:
TP53_pathway=["CCND1","CDKN2A", "CUL9", "NPM1", "PTEN", "TP53","AGBL2", "ALS2CR8", "AMOTL1", "ANKRD12", "BRPF1", "CACHD1", "CARNS1", "CDKN2AIP", "CELSR3", "CHD8", "EPHA3", "HECW2", "IFT140", "IWS1", "MAGI2", "MAST3", "MDM4", "MLL5", "PLEKHA8", "PRDM2", "PRKRIR", "SCAPER", "SETD2", "SMG1", "SMG5", "SMG7", "SNRK", "SPTBN2", "STK11IP", "STRADA", "SWAP70", "TEP1", "TRIP12", "TTLL5", "UACA", "ZFP91", "ZMIZ1", "ZNF227", "ZNF668"]
STAG1_pathway=["PDS5B", "STAG1", "STAG2", "WAPAL","PDS5A", "RAD21", "SMC1A"]
MLL_pathway=["KDM6A", "MLL2", "MLL3","E2F3", "N4BP2", "PROSER1"]
BAP1_pathway=["ASXL1", "ASXL2", "BAP1","ANKRD17", "FOXK1", "FOXK2", "KDM1B"]
KRAS_pathway=["BRAF", "KRAS", "PIK3CA","ARHGAP4", "EIF2AK1", "FAM118B", "NRAS", "PLCE1", "RASGRF2", "RASGRP2", "RGL1", "VARS2"]
HLA_pathway=["B2M", "CD1D","HLA-A", "HLA-B", "P4HTM"]
HER2_pathway=["EGFR", "ERBB2","AREG", "ELF3", "ERBB4", "LRIG1", "OSMR"]
CDON_pathway=["BOC", "CDON"]
KEAP1_pathway=["KEAP1", "NFE2L2","CHD6", "PTMA", "WAC"]
SPOP_pathway=["MYD88", "SPOP"]
RUNX1_pathway=["CBFB", "RUNX1","ELF4"]
NOTCH_pathway=["FBXW7", "NOTCH1","CCNE1", "DCLRE1C", "DLL1", "DTX4", "FAM101A", "FOXM1", "JAG1", "KLF5", "LFNG", "LPP", "MAML1", "MMS22L", "NOTCH3", "NOTCH4", "SHPRH"]
ARID_pathway=["ARID1A", "PBRM1","ADNP", "ARID1B", "ARID2"]
SMARC_pathway=["SMARCB1", "SMARCA4"]
SMC_pathway=["NCAPD3","SMC4","NCAPG2","NCAPH2","SMC2","NCAPD2"]
CLASP_pathway=["CLASP1","CLASP2","CLIP2"]

pathways={"TP53_pathway": TP53_pathway, "STAG1_pathway":STAG1_pathway, "MLL_pathway":MLL_pathway,"BAP1_pathway":BAP1_pathway,"KRAS_pathway":KRAS_pathway, "HLA_pathway":HLA_pathway,  "HER2_pathway":HER2_pathway, "CDON_pathway":CDON_pathway, "KEAP1_pathway":KEAP1_pathway,"SPOP_pathway":SPOP_pathway, "RUNX1_pathway":RUNX1_pathway, "NOTCH_pathway":NOTCH_pathway, "ARID_pathway":ARID_pathway, "SMARC_pathway":SMARC_pathway, "SMC_pathway":SMC_pathway,"CLASP_pathway":CLASP_pathway }

MYC_pos_pathway_affected={"TP53_pathway": 0, "STAG1_pathway":0, "MLL_pathway":0,"BAP1_pathway":0,"KRAS_pathway":0, "HLA_pathway":0,  "HER2_pathway":0, "CDON_pathway":0, "KEAP1_pathway":0,"SPOP_pathway":0, "RUNX1_pathway":0, "NOTCH_pathway":0, "ARID_pathway":0, "SMARC_pathway":0, "SMC_pathway":0,"CLASP_pathway":0 }
MYC_pos_pathway_unaffected={"TP53_pathway": 0, "STAG1_pathway":0, "MLL_pathway":0,"BAP1_pathway":0,"KRAS_pathway":0, "HLA_pathway":0,  "HER2_pathway":0, "CDON_pathway":0, "KEAP1_pathway":0,"SPOP_pathway":0, "RUNX1_pathway":0, "NOTCH_pathway":0, "ARID_pathway":0, "SMARC_pathway":0, "SMC_pathway":0,"CLASP_pathway":0 }
MYC_neg_pathway_affected={"TP53_pathway": 0, "STAG1_pathway":0, "MLL_pathway":0,"BAP1_pathway":0,"KRAS_pathway":0, "HLA_pathway":0,  "HER2_pathway":0, "CDON_pathway":0, "KEAP1_pathway":0,"SPOP_pathway":0, "RUNX1_pathway":0, "NOTCH_pathway":0, "ARID_pathway":0, "SMARC_pathway":0, "SMC_pathway":0,"CLASP_pathway":0 }
MYC_neg_pathway_unaffected={"TP53_pathway": 0, "STAG1_pathway":0, "MLL_pathway":0,"BAP1_pathway":0,"KRAS_pathway":0, "HLA_pathway":0,  "HER2_pathway":0, "CDON_pathway":0, "KEAP1_pathway":0,"SPOP_pathway":0, "RUNX1_pathway":0, "NOTCH_pathway":0, "ARID_pathway":0, "SMARC_pathway":0, "SMC_pathway":0,"CLASP_pathway":0 }


for sample in samples.values():
    if not sample.somatic_mutation_data:
        continue
    for pathway in pathways.keys():
        pathway_affected = False
        for gene in pathways[pathway]:
            if gene in sample.genes_affected or sample.checkFocalGeneAmp(gene):
                pathway_affected = True
                break
        if pathway_affected and sample.checkFocalGeneAmp("MYC"):
            MYC_pos_pathway_affected[pathway] += 1
        elif not pathway_affected and sample.checkFocalGeneAmp("MYC"):
            MYC_pos_pathway_unaffected[pathway] += 1
        elif pathway_affected and not sample.checkFocalGeneAmp("MYC"):
            MYC_neg_pathway_affected[pathway] += 1
        elif not pathway_affected and not sample.checkFocalGeneAmp("MYC"):
            MYC_neg_pathway_unaffected[pathway] += 1

s = "<table>"
s += "<th></th><th colspan=\"4\">Sample</th><th colspan=\"3\">Fisher's exact test (two-sided)</th><th colspan=\"3\">Fisher's exact test (one-sided greater)</th><th colspan=\"3\">Fisher's exact test (one-sided less)</th></tr>"
s += "<tr><th>Pathway</th><th>MYC + pathway affected</th><th>MYC + pathway unaffected</th><th>MYC - pathway affected</th><th>MYC - pathway unaffected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th></tr>"

for pathway in pathways.keys():
   oddsratio, pvalue = scipy.stats.fisher_exact([[MYC_pos_pathway_affected[pathway], MYC_pos_pathway_unaffected[pathway]], [MYC_neg_pathway_affected[pathway], MYC_neg_pathway_unaffected[pathway]]])
   oddsratio_greater, pvalue_greater = scipy.stats.fisher_exact([[MYC_pos_pathway_affected[pathway], MYC_pos_pathway_unaffected[pathway]], [MYC_neg_pathway_affected[pathway], MYC_neg_pathway_unaffected[pathway]]], alternative='greater')
   oddsratio_less, pvalue_less = scipy.stats.fisher_exact([[MYC_pos_pathway_affected[pathway], MYC_pos_pathway_unaffected[pathway]], [MYC_neg_pathway_affected[pathway], MYC_neg_pathway_unaffected[pathway]]], alternative='less')
   s += "<tr><th>"+pathway+"</th><td>"+str(MYC_pos_pathway_affected[pathway])+"</td>"
   s += "<td>"+str(MYC_pos_pathway_unaffected[pathway])+"</td>"
   s += "<td>"+str(MYC_neg_pathway_affected[pathway])+"</td>"
   s += "<td>"+str(MYC_neg_pathway_unaffected[pathway])+"</td>"
   s += "<td>"+"{:.2f}".format(oddsratio)+"</td>"
   if pvalue < 0.05: 
      s += "<td>"+"{:.5f}".format(pvalue)+"</td>"
      s += "<td>"+"{:.5f}".format(pvalue * 16)+"</td>"
   else:
      s += "<td>"+"{:.2f}".format(pvalue)+"</td>"
      s += "<td>"+"{:.2f}".format(pvalue * 16)+"</td>"
   s += "<td>"+"{:.2f}".format(oddsratio_greater)+"</td>"
   if pvalue_greater < 0.05: 
      s += "<td>"+"{:.5f}".format(pvalue_greater)+"</td>"
      s += "<td>"+"{:.5f}".format(pvalue_greater * 16)+"</td>"
   else:
      s += "<td>"+"{:.2f}".format(pvalue_greater)+"</td>"
      s += "<td>"+"{:.2f}".format(pvalue_greater * 16)+"</td>"
   s += "<td>"+"{:.2f}".format(oddsratio_less)+"</td>"
   if pvalue_less < 0.05: 
      s += "<td>"+"{:.5f}".format(pvalue_less)+"</td>"
      s += "<td>"+"{:.5f}".format(pvalue_less * 16)+"</td></tr>"
   else: 
      s += "<td>"+"{:.2f}".format(pvalue_less)+"</td>"
      s += "<td>"+"{:.2f}".format(pvalue_less * 16)+"</td></tr>"

s += "</table>"
h = HTML(s);h


Out[2]:
SampleFisher's exact test (two-sided)Fisher's exact test (one-sided greater)Fisher's exact test (one-sided less)
PathwayMYC + pathway affectedMYC + pathway unaffectedMYC - pathway affectedMYC - pathway unaffectedOdd's ratiop-valuep-value correctedOdd's ratiop-valuep-value correctedOdd's ratiop-valuep-value corrected
STAG1_pathway259548530.540.579.150.540.8814.130.540.304.77
KRAS_pathway21403735340.750.355.550.750.8814.060.750.192.97
CLASP_pathway457218862.960.071.072.960.071.072.960.9815.73
SMARC_pathway06198980.001.0016.000.001.0016.000.000.568.88
SPOP_pathway06159020.001.0016.000.001.0016.000.000.7211.55
RUNX1_pathway160588490.240.172.720.240.9815.700.240.101.59
MLL_pathway14471157922.050.030860.493832.050.023570.377112.050.9915.84
TP53_pathway49125443632.720.001590.025442.720.000850.013542.721.0016.00
ARID_pathway358668410.660.8012.720.660.8313.250.660.355.63
KEAP1_pathway457308772.050.162.572.050.162.572.050.9415.09
SMC_pathway259278801.100.7011.261.100.568.911.100.7311.61
NOTCH_pathway12491307771.460.264.191.460.172.701.460.9014.47
BAP1_pathway556588491.310.599.401.310.365.831.310.8012.82
CDON_pathway259168911.890.325.041.890.325.041.890.9014.42
HLA_pathway160278800.541.0016.000.540.8413.480.540.467.40
HER2_pathway18431677401.850.042390.678321.850.028900.462351.850.9915.78

Remove TP53 from TP53 pathway


In [3]:
TP53_pathway=["CCND1","CDKN2A", "CUL9", "NPM1", "PTEN", "AGBL2", "ALS2CR8", "AMOTL1", "ANKRD12", "BRPF1", "CACHD1", "CARNS1", "CDKN2AIP", "CELSR3", "CHD8", "EPHA3", "HECW2", "IFT140", "IWS1", "MAGI2", "MAST3", "MDM4", "MLL5", "PLEKHA8", "PRDM2", "PRKRIR", "SCAPER", "SETD2", "SMG1", "SMG5", "SMG7", "SNRK", "SPTBN2", "STK11IP", "STRADA", "SWAP70", "TEP1", "TRIP12", "TTLL5", "UACA", "ZFP91", "ZMIZ1", "ZNF227", "ZNF668"]
STAG1_pathway=["PDS5B", "STAG1", "STAG2", "WAPAL","PDS5A", "RAD21", "SMC1A"]
MLL_pathway=["KDM6A", "MLL2", "MLL3","E2F3", "N4BP2", "PROSER1"]
BAP1_pathway=["ASXL1", "ASXL2", "BAP1","ANKRD17", "FOXK1", "FOXK2", "KDM1B"]
KRAS_pathway=["BRAF", "KRAS", "PIK3CA","ARHGAP4", "EIF2AK1", "FAM118B", "NRAS", "PLCE1", "RASGRF2", "RASGRP2", "RGL1", "VARS2"]
HLA_pathway=["B2M", "CD1D","HLA-A", "HLA-B", "P4HTM"]
HER2_pathway=["EGFR", "ERBB2","AREG", "ELF3", "ERBB4", "LRIG1", "OSMR"]
CDON_pathway=["BOC", "CDON"]
KEAP1_pathway=["KEAP1", "NFE2L2","CHD6", "PTMA", "WAC"]
SPOP_pathway=["MYD88", "SPOP"]
RUNX1_pathway=["CBFB", "RUNX1","ELF4"]
NOTCH_pathway=["FBXW7", "NOTCH1","CCNE1", "DCLRE1C", "DLL1", "DTX4", "FAM101A", "FOXM1", "JAG1", "KLF5", "LFNG", "LPP", "MAML1", "MMS22L", "NOTCH3", "NOTCH4", "SHPRH"]
ARID_pathway=["ARID1A", "PBRM1","ADNP", "ARID1B", "ARID2"]
SMARC_pathway=["SMARCB1", "SMARCA4"]
SMC_pathway=["NCAPD3","SMC4","NCAPG2","NCAPH2","SMC2","NCAPD2"]
CLASP_pathway=["CLASP1","CLASP2","CLIP2"]

pathways={"TP53_pathway": TP53_pathway, "STAG1_pathway":STAG1_pathway, "MLL_pathway":MLL_pathway,"BAP1_pathway":BAP1_pathway,"KRAS_pathway":KRAS_pathway, "HLA_pathway":HLA_pathway,  "HER2_pathway":HER2_pathway, "CDON_pathway":CDON_pathway, "KEAP1_pathway":KEAP1_pathway,"SPOP_pathway":SPOP_pathway, "RUNX1_pathway":RUNX1_pathway, "NOTCH_pathway":NOTCH_pathway, "ARID_pathway":ARID_pathway, "SMARC_pathway":SMARC_pathway, "SMC_pathway":SMC_pathway,"CLASP_pathway":CLASP_pathway }

MYC_pos_pathway_affected={"TP53_pathway": 0, "STAG1_pathway":0, "MLL_pathway":0,"BAP1_pathway":0,"KRAS_pathway":0, "HLA_pathway":0,  "HER2_pathway":0, "CDON_pathway":0, "KEAP1_pathway":0,"SPOP_pathway":0, "RUNX1_pathway":0, "NOTCH_pathway":0, "ARID_pathway":0, "SMARC_pathway":0, "SMC_pathway":0,"CLASP_pathway":0 }
MYC_pos_pathway_unaffected={"TP53_pathway": 0, "STAG1_pathway":0, "MLL_pathway":0,"BAP1_pathway":0,"KRAS_pathway":0, "HLA_pathway":0,  "HER2_pathway":0, "CDON_pathway":0, "KEAP1_pathway":0,"SPOP_pathway":0, "RUNX1_pathway":0, "NOTCH_pathway":0, "ARID_pathway":0, "SMARC_pathway":0, "SMC_pathway":0,"CLASP_pathway":0 }
MYC_neg_pathway_affected={"TP53_pathway": 0, "STAG1_pathway":0, "MLL_pathway":0,"BAP1_pathway":0,"KRAS_pathway":0, "HLA_pathway":0,  "HER2_pathway":0, "CDON_pathway":0, "KEAP1_pathway":0,"SPOP_pathway":0, "RUNX1_pathway":0, "NOTCH_pathway":0, "ARID_pathway":0, "SMARC_pathway":0, "SMC_pathway":0,"CLASP_pathway":0 }
MYC_neg_pathway_unaffected={"TP53_pathway": 0, "STAG1_pathway":0, "MLL_pathway":0,"BAP1_pathway":0,"KRAS_pathway":0, "HLA_pathway":0,  "HER2_pathway":0, "CDON_pathway":0, "KEAP1_pathway":0,"SPOP_pathway":0, "RUNX1_pathway":0, "NOTCH_pathway":0, "ARID_pathway":0, "SMARC_pathway":0, "SMC_pathway":0,"CLASP_pathway":0 }


for sample in samples.values():
    if not sample.somatic_mutation_data:
        continue
    for pathway in pathways.keys():
        pathway_affected = False
        for gene in pathways[pathway]:
            if gene in sample.genes_affected or sample.checkFocalGeneAmp(gene):
                pathway_affected = True
                break
        if pathway_affected and sample.checkFocalGeneAmp("MYC"):
            MYC_pos_pathway_affected[pathway] += 1
        elif not pathway_affected and sample.checkFocalGeneAmp("MYC"):
            MYC_pos_pathway_unaffected[pathway] += 1
        elif pathway_affected and not sample.checkFocalGeneAmp("MYC"):
            MYC_neg_pathway_affected[pathway] += 1
        elif not pathway_affected and not sample.checkFocalGeneAmp("MYC"):
            MYC_neg_pathway_unaffected[pathway] += 1

s = "<table>"
s += "<th></th><th colspan=\"4\">Sample</th><th colspan=\"3\">Fisher's exact test (two-sided)</th><th colspan=\"3\">Fisher's exact test (one-sided greater)</th><th colspan=\"3\">Fisher's exact test (one-sided less)</th></tr>"
s += "<tr><th>Pathway</th><th>MYC + pathway affected</th><th>MYC + pathway unaffected</th><th>MYC - pathway affected</th><th>MYC - pathway unaffected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th></tr>"

for pathway in pathways.keys():
   oddsratio, pvalue = scipy.stats.fisher_exact([[MYC_pos_pathway_affected[pathway], MYC_pos_pathway_unaffected[pathway]], [MYC_neg_pathway_affected[pathway], MYC_neg_pathway_unaffected[pathway]]])
   oddsratio_greater, pvalue_greater = scipy.stats.fisher_exact([[MYC_pos_pathway_affected[pathway], MYC_pos_pathway_unaffected[pathway]], [MYC_neg_pathway_affected[pathway], MYC_neg_pathway_unaffected[pathway]]], alternative='greater')
   oddsratio_less, pvalue_less = scipy.stats.fisher_exact([[MYC_pos_pathway_affected[pathway], MYC_pos_pathway_unaffected[pathway]], [MYC_neg_pathway_affected[pathway], MYC_neg_pathway_unaffected[pathway]]], alternative='less')
   s += "<tr><th>"+pathway+"</th><td>"+str(MYC_pos_pathway_affected[pathway])+"</td>"
   s += "<td>"+str(MYC_pos_pathway_unaffected[pathway])+"</td>"
   s += "<td>"+str(MYC_neg_pathway_affected[pathway])+"</td>"
   s += "<td>"+str(MYC_neg_pathway_unaffected[pathway])+"</td>"
   s += "<td>"+"{:.2f}".format(oddsratio)+"</td>"
   if pvalue < 0.05: 
      s += "<td>"+"{:.5f}".format(pvalue)+"</td>"
      s += "<td>"+"{:.5f}".format(pvalue * 16)+"</td>"
   else:
      s += "<td>"+"{:.2f}".format(pvalue)+"</td>"
      s += "<td>"+"{:.2f}".format(pvalue * 16)+"</td>"
   s += "<td>"+"{:.2f}".format(oddsratio_greater)+"</td>"
   if pvalue_greater < 0.05: 
      s += "<td>"+"{:.5f}".format(pvalue_greater)+"</td>"
      s += "<td>"+"{:.5f}".format(pvalue_greater * 16)+"</td>"
   else:
      s += "<td>"+"{:.2f}".format(pvalue_greater)+"</td>"
      s += "<td>"+"{:.2f}".format(pvalue_greater * 16)+"</td>"
   s += "<td>"+"{:.2f}".format(oddsratio_less)+"</td>"
   if pvalue_less < 0.05: 
      s += "<td>"+"{:.5f}".format(pvalue_less)+"</td>"
      s += "<td>"+"{:.5f}".format(pvalue_less * 16)+"</td></tr>"
   else: 
      s += "<td>"+"{:.2f}".format(pvalue_less)+"</td>"
      s += "<td>"+"{:.2f}".format(pvalue_less * 16)+"</td></tr>"

s += "</table>"
h = HTML(s);h


Out[3]:
SampleFisher's exact test (two-sided)Fisher's exact test (one-sided greater)Fisher's exact test (one-sided less)
PathwayMYC + pathway affectedMYC + pathway unaffectedMYC - pathway affectedMYC - pathway unaffectedOdd's ratiop-valuep-value correctedOdd's ratiop-valuep-value correctedOdd's ratiop-valuep-value corrected
STAG1_pathway259548530.540.579.150.540.8814.130.540.304.77
KRAS_pathway21403735340.750.355.550.750.8814.060.750.192.97
CLASP_pathway457218862.960.071.072.960.071.072.960.9815.73
SMARC_pathway06198980.001.0016.000.001.0016.000.000.568.88
SPOP_pathway06159020.001.0016.000.001.0016.000.000.7211.55
RUNX1_pathway160588490.240.172.720.240.9815.700.240.101.59
MLL_pathway14471157922.050.030860.493832.050.023570.377112.050.9915.84
TP53_pathway34274015061.590.091.361.590.050.851.590.9715.52
ARID_pathway358668410.660.8012.720.660.8313.250.660.355.63
KEAP1_pathway457308772.050.162.572.050.162.572.050.9415.09
SMC_pathway259278801.100.7011.261.100.568.911.100.7311.61
NOTCH_pathway12491307771.460.264.191.460.172.701.460.9014.47
BAP1_pathway556588491.310.599.401.310.365.831.310.8012.82
CDON_pathway259168911.890.325.041.890.325.041.890.9014.42
HLA_pathway160278800.541.0016.000.540.8413.480.540.467.40
HER2_pathway18431677401.850.042390.678321.850.028900.462351.850.9915.78

Which pathways are affected in MYC CN>4 samples

For all pathways described by Leiserson: Check which pathway is more often co-mutated by MYC


In [4]:
TP53_pathway=["CCND1","CDKN2A", "CUL9", "NPM1", "PTEN", "TP53","AGBL2", "ALS2CR8", "AMOTL1", "ANKRD12", "BRPF1", "CACHD1", "CARNS1", "CDKN2AIP", "CELSR3", "CHD8", "EPHA3", "HECW2", "IFT140", "IWS1", "MAGI2", "MAST3", "MDM4", "MLL5", "PLEKHA8", "PRDM2", "PRKRIR", "SCAPER", "SETD2", "SMG1", "SMG5", "SMG7", "SNRK", "SPTBN2", "STK11IP", "STRADA", "SWAP70", "TEP1", "TRIP12", "TTLL5", "UACA", "ZFP91", "ZMIZ1", "ZNF227", "ZNF668"]
STAG1_pathway=["PDS5B", "STAG1", "STAG2", "WAPAL","PDS5A", "RAD21", "SMC1A"]
MLL_pathway=["KDM6A", "MLL2", "MLL3","E2F3", "N4BP2", "PROSER1"]
BAP1_pathway=["ASXL1", "ASXL2", "BAP1","ANKRD17", "FOXK1", "FOXK2", "KDM1B"]
KRAS_pathway=["BRAF", "KRAS", "PIK3CA","ARHGAP4", "EIF2AK1", "FAM118B", "NRAS", "PLCE1", "RASGRF2", "RASGRP2", "RGL1", "VARS2"]
HLA_pathway=["B2M", "CD1D","HLA-A", "HLA-B", "P4HTM"]
HER2_pathway=["EGFR", "ERBB2","AREG", "ELF3", "ERBB4", "LRIG1", "OSMR"]
CDON_pathway=["BOC", "CDON"]
KEAP1_pathway=["KEAP1", "NFE2L2","CHD6", "PTMA", "WAC"]
SPOP_pathway=["MYD88", "SPOP"]
RUNX1_pathway=["CBFB", "RUNX1","ELF4"]
NOTCH_pathway=["FBXW7", "NOTCH1","CCNE1", "DCLRE1C", "DLL1", "DTX4", "FAM101A", "FOXM1", "JAG1", "KLF5", "LFNG", "LPP", "MAML1", "MMS22L", "NOTCH3", "NOTCH4", "SHPRH"]
ARID_pathway=["ARID1A", "PBRM1","ADNP", "ARID1B", "ARID2"]
SMARC_pathway=["SMARCB1", "SMARCA4"]
SMC_pathway=["NCAPD3","SMC4","NCAPG2","NCAPH2","SMC2","NCAPD2"]
CLASP_pathway=["CLASP1","CLASP2","CLIP2"]

pathways={"TP53_pathway": TP53_pathway, "STAG1_pathway":STAG1_pathway, "MLL_pathway":MLL_pathway,"BAP1_pathway":BAP1_pathway,"KRAS_pathway":KRAS_pathway, "HLA_pathway":HLA_pathway,  "HER2_pathway":HER2_pathway, "CDON_pathway":CDON_pathway, "KEAP1_pathway":KEAP1_pathway,"SPOP_pathway":SPOP_pathway, "RUNX1_pathway":RUNX1_pathway, "NOTCH_pathway":NOTCH_pathway, "ARID_pathway":ARID_pathway, "SMARC_pathway":SMARC_pathway, "SMC_pathway":SMC_pathway,"CLASP_pathway":CLASP_pathway }

MYC_pos_pathway_affected={"TP53_pathway": 0, "STAG1_pathway":0, "MLL_pathway":0,"BAP1_pathway":0,"KRAS_pathway":0, "HLA_pathway":0,  "HER2_pathway":0, "CDON_pathway":0, "KEAP1_pathway":0,"SPOP_pathway":0, "RUNX1_pathway":0, "NOTCH_pathway":0, "ARID_pathway":0, "SMARC_pathway":0, "SMC_pathway":0,"CLASP_pathway":0 }
MYC_pos_pathway_unaffected={"TP53_pathway": 0, "STAG1_pathway":0, "MLL_pathway":0,"BAP1_pathway":0,"KRAS_pathway":0, "HLA_pathway":0,  "HER2_pathway":0, "CDON_pathway":0, "KEAP1_pathway":0,"SPOP_pathway":0, "RUNX1_pathway":0, "NOTCH_pathway":0, "ARID_pathway":0, "SMARC_pathway":0, "SMC_pathway":0,"CLASP_pathway":0 }
MYC_neg_pathway_affected={"TP53_pathway": 0, "STAG1_pathway":0, "MLL_pathway":0,"BAP1_pathway":0,"KRAS_pathway":0, "HLA_pathway":0,  "HER2_pathway":0, "CDON_pathway":0, "KEAP1_pathway":0,"SPOP_pathway":0, "RUNX1_pathway":0, "NOTCH_pathway":0, "ARID_pathway":0, "SMARC_pathway":0, "SMC_pathway":0,"CLASP_pathway":0 }
MYC_neg_pathway_unaffected={"TP53_pathway": 0, "STAG1_pathway":0, "MLL_pathway":0,"BAP1_pathway":0,"KRAS_pathway":0, "HLA_pathway":0,  "HER2_pathway":0, "CDON_pathway":0, "KEAP1_pathway":0,"SPOP_pathway":0, "RUNX1_pathway":0, "NOTCH_pathway":0, "ARID_pathway":0, "SMARC_pathway":0, "SMC_pathway":0,"CLASP_pathway":0 }


for sample in samples.values():
    if not sample.somatic_mutation_data:
        continue
    copynumber = "n/a"
    log2 = sample.getLog2FromGene(gene_positions['MYC'].chrom,gene_positions['MYC'].start, gene_positions['MYC'].end)
    if (log2 != None):
        copynumber = 2.0 * 2.0 ** log2
    else:
        copynumber="n/a"
    for pathway in pathways.keys():
        pathway_affected = False
        for gene in pathways[pathway]:
            if gene in sample.genes_affected or sample.checkFocalGeneAmp(gene):
                pathway_affected = True
                break
        if copynumber != "n/a" and pathway_affected and copynumber > 4:
            MYC_pos_pathway_affected[pathway] += 1
        elif copynumber != "n/a" and not pathway_affected and  copynumber > 4:
            MYC_pos_pathway_unaffected[pathway] += 1
        elif copynumber != "n/a" and pathway_affected and not  copynumber > 4:
            MYC_neg_pathway_affected[pathway] += 1
        elif copynumber != "n/a" and not pathway_affected and not  copynumber > 4:
            MYC_neg_pathway_unaffected[pathway] += 1

s = "<table>"
s += "<th></th><th colspan=\"4\">Sample</th><th colspan=\"3\">Fisher's exact test (two-sided)</th><th colspan=\"3\">Fisher's exact test (one-sided greater)</th><th colspan=\"3\">Fisher's exact test (one-sided less)</th></tr>"
s += "<tr><th>Pathway</th><th>MYC + pathway affected</th><th>MYC + pathway unaffected</th><th>MYC - pathway affected</th><th>MYC - pathway unaffected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th></tr>"

for pathway in pathways.keys():
   oddsratio, pvalue = scipy.stats.fisher_exact([[MYC_pos_pathway_affected[pathway], MYC_pos_pathway_unaffected[pathway]], [MYC_neg_pathway_affected[pathway], MYC_neg_pathway_unaffected[pathway]]])
   oddsratio_greater, pvalue_greater = scipy.stats.fisher_exact([[MYC_pos_pathway_affected[pathway], MYC_pos_pathway_unaffected[pathway]], [MYC_neg_pathway_affected[pathway], MYC_neg_pathway_unaffected[pathway]]], alternative='greater')
   oddsratio_less, pvalue_less = scipy.stats.fisher_exact([[MYC_pos_pathway_affected[pathway], MYC_pos_pathway_unaffected[pathway]], [MYC_neg_pathway_affected[pathway], MYC_neg_pathway_unaffected[pathway]]], alternative='less')
   s += "<tr><th>"+pathway+"</th><td>"+str(MYC_pos_pathway_affected[pathway])+"</td>"
   s += "<td>"+str(MYC_pos_pathway_unaffected[pathway])+"</td>"
   s += "<td>"+str(MYC_neg_pathway_affected[pathway])+"</td>"
   s += "<td>"+str(MYC_neg_pathway_unaffected[pathway])+"</td>"
   s += "<td>"+"{:.2f}".format(oddsratio)+"</td>"
   if pvalue < 0.05: 
      s += "<td>"+"{:.5f}".format(pvalue)+"</td>"
      s += "<td>"+"{:.5f}".format(pvalue * 16)+"</td>"
   else:
      s += "<td>"+"{:.2f}".format(pvalue)+"</td>"
      s += "<td>"+"{:.2f}".format(pvalue * 16)+"</td>"
   s += "<td>"+"{:.2f}".format(oddsratio_greater)+"</td>"
   if pvalue_greater < 0.05: 
      s += "<td>"+"{:.5f}".format(pvalue_greater)+"</td>"
      s += "<td>"+"{:.5f}".format(pvalue_greater * 16)+"</td>"
   else:
      s += "<td>"+"{:.2f}".format(pvalue_greater)+"</td>"
      s += "<td>"+"{:.2f}".format(pvalue_greater * 16)+"</td>"
   s += "<td>"+"{:.2f}".format(oddsratio_less)+"</td>"
   if pvalue_less < 0.05: 
      s += "<td>"+"{:.5f}".format(pvalue_less)+"</td>"
      s += "<td>"+"{:.5f}".format(pvalue_less * 16)+"</td></tr>"
   else: 
      s += "<td>"+"{:.2f}".format(pvalue_less)+"</td>"
      s += "<td>"+"{:.2f}".format(pvalue_less * 16)+"</td></tr>"

s += "</table>"
h = HTML(s);h


Out[4]:
SampleFisher's exact test (two-sided)Fisher's exact test (one-sided greater)Fisher's exact test (one-sided less)
PathwayMYC + pathway affectedMYC + pathway unaffectedMYC - pathway affectedMYC - pathway unaffectedOdd's ratiop-valuep-value correctedOdd's ratiop-valuep-value correctedOdd's ratiop-valuep-value corrected
STAG1_pathway1111558000.130.015580.249260.131.0015.990.130.007210.11543
KRAS_pathway42703515040.860.548.630.860.7912.700.860.274.31
CLASP_pathway4108218341.470.528.321.470.335.231.470.8513.55
SMARC_pathway111188470.951.0016.000.950.6710.740.950.7211.52
SPOP_pathway011258500.001.0016.000.001.0016.000.000.548.63
RUNX1_pathway2110577980.250.050.880.250.9915.920.250.023520.37631
MLL_pathway16961137421.090.7712.281.090.426.781.090.6910.96
TP53_pathway90225023532.880.000010.000112.880.000000.000062.881.0016.00
ARID_pathway8104617941.001.0016.001.000.568.951.000.599.52
KEAP1_pathway5107298261.330.589.311.330.365.721.330.8112.94
SMC_pathway5107248311.620.375.911.620.243.811.620.8914.27
NOTCH_pathway19931237321.220.487.651.220.274.391.220.8112.94
BAP1_pathway8104558001.120.6911.011.120.457.181.120.7011.20
CDON_pathway3109158401.540.457.271.540.355.561.540.8513.67
HLA_pathway3109258300.911.0016.000.910.6510.370.910.599.43
HER2_pathway29831566991.570.060.901.570.038490.615811.570.9815.64

Remove TP53 from TP53 pathway


In [5]:
TP53_pathway=["CCND1","CDKN2A", "CUL9", "NPM1", "PTEN","AGBL2", "ALS2CR8", "AMOTL1", "ANKRD12", "BRPF1", "CACHD1", "CARNS1", "CDKN2AIP", "CELSR3", "CHD8", "EPHA3", "HECW2", "IFT140", "IWS1", "MAGI2", "MAST3", "MDM4", "MLL5", "PLEKHA8", "PRDM2", "PRKRIR", "SCAPER", "SETD2", "SMG1", "SMG5", "SMG7", "SNRK", "SPTBN2", "STK11IP", "STRADA", "SWAP70", "TEP1", "TRIP12", "TTLL5", "UACA", "ZFP91", "ZMIZ1", "ZNF227", "ZNF668"]
STAG1_pathway=["PDS5B", "STAG1", "STAG2", "WAPAL","PDS5A", "RAD21", "SMC1A"]
MLL_pathway=["KDM6A", "MLL2", "MLL3","E2F3", "N4BP2", "PROSER1"]
BAP1_pathway=["ASXL1", "ASXL2", "BAP1","ANKRD17", "FOXK1", "FOXK2", "KDM1B"]
KRAS_pathway=["BRAF", "KRAS", "PIK3CA","ARHGAP4", "EIF2AK1", "FAM118B", "NRAS", "PLCE1", "RASGRF2", "RASGRP2", "RGL1", "VARS2"]
HLA_pathway=["B2M", "CD1D","HLA-A", "HLA-B", "P4HTM"]
HER2_pathway=["EGFR", "ERBB2","AREG", "ELF3", "ERBB4", "LRIG1", "OSMR"]
CDON_pathway=["BOC", "CDON"]
KEAP1_pathway=["KEAP1", "NFE2L2","CHD6", "PTMA", "WAC"]
SPOP_pathway=["MYD88", "SPOP"]
RUNX1_pathway=["CBFB", "RUNX1","ELF4"]
NOTCH_pathway=["FBXW7", "NOTCH1","CCNE1", "DCLRE1C", "DLL1", "DTX4", "FAM101A", "FOXM1", "JAG1", "KLF5", "LFNG", "LPP", "MAML1", "MMS22L", "NOTCH3", "NOTCH4", "SHPRH"]
ARID_pathway=["ARID1A", "PBRM1","ADNP", "ARID1B", "ARID2"]
SMARC_pathway=["SMARCB1", "SMARCA4"]
SMC_pathway=["NCAPD3","SMC4","NCAPG2","NCAPH2","SMC2","NCAPD2"]
CLASP_pathway=["CLASP1","CLASP2","CLIP2"]

pathways={"TP53_pathway": TP53_pathway, "STAG1_pathway":STAG1_pathway, "MLL_pathway":MLL_pathway,"BAP1_pathway":BAP1_pathway,"KRAS_pathway":KRAS_pathway, "HLA_pathway":HLA_pathway,  "HER2_pathway":HER2_pathway, "CDON_pathway":CDON_pathway, "KEAP1_pathway":KEAP1_pathway,"SPOP_pathway":SPOP_pathway, "RUNX1_pathway":RUNX1_pathway, "NOTCH_pathway":NOTCH_pathway, "ARID_pathway":ARID_pathway, "SMARC_pathway":SMARC_pathway, "SMC_pathway":SMC_pathway,"CLASP_pathway":CLASP_pathway }

MYC_pos_pathway_affected={"TP53_pathway": 0, "STAG1_pathway":0, "MLL_pathway":0,"BAP1_pathway":0,"KRAS_pathway":0, "HLA_pathway":0,  "HER2_pathway":0, "CDON_pathway":0, "KEAP1_pathway":0,"SPOP_pathway":0, "RUNX1_pathway":0, "NOTCH_pathway":0, "ARID_pathway":0, "SMARC_pathway":0, "SMC_pathway":0,"CLASP_pathway":0 }
MYC_pos_pathway_unaffected={"TP53_pathway": 0, "STAG1_pathway":0, "MLL_pathway":0,"BAP1_pathway":0,"KRAS_pathway":0, "HLA_pathway":0,  "HER2_pathway":0, "CDON_pathway":0, "KEAP1_pathway":0,"SPOP_pathway":0, "RUNX1_pathway":0, "NOTCH_pathway":0, "ARID_pathway":0, "SMARC_pathway":0, "SMC_pathway":0,"CLASP_pathway":0 }
MYC_neg_pathway_affected={"TP53_pathway": 0, "STAG1_pathway":0, "MLL_pathway":0,"BAP1_pathway":0,"KRAS_pathway":0, "HLA_pathway":0,  "HER2_pathway":0, "CDON_pathway":0, "KEAP1_pathway":0,"SPOP_pathway":0, "RUNX1_pathway":0, "NOTCH_pathway":0, "ARID_pathway":0, "SMARC_pathway":0, "SMC_pathway":0,"CLASP_pathway":0 }
MYC_neg_pathway_unaffected={"TP53_pathway": 0, "STAG1_pathway":0, "MLL_pathway":0,"BAP1_pathway":0,"KRAS_pathway":0, "HLA_pathway":0,  "HER2_pathway":0, "CDON_pathway":0, "KEAP1_pathway":0,"SPOP_pathway":0, "RUNX1_pathway":0, "NOTCH_pathway":0, "ARID_pathway":0, "SMARC_pathway":0, "SMC_pathway":0,"CLASP_pathway":0 }


for sample in samples.values():
    if not sample.somatic_mutation_data:
        continue
    copynumber = "n/a"
    log2 = sample.getLog2FromGene(gene_positions['MYC'].chrom,gene_positions['MYC'].start, gene_positions['MYC'].end)
    if (log2 != None):
        copynumber = 2.0 * 2.0 ** log2
    else:
        copynumber="n/a"
    for pathway in pathways.keys():
        pathway_affected = False
        for gene in pathways[pathway]:
            if gene in sample.genes_affected or sample.checkFocalGeneAmp(gene):
                pathway_affected = True
                break
        if copynumber != "n/a" and pathway_affected and copynumber > 4:
            MYC_pos_pathway_affected[pathway] += 1
        elif copynumber != "n/a" and not pathway_affected and  copynumber > 4:
            MYC_pos_pathway_unaffected[pathway] += 1
        elif copynumber != "n/a" and pathway_affected and not  copynumber > 4:
            MYC_neg_pathway_affected[pathway] += 1
        elif copynumber != "n/a" and not pathway_affected and not  copynumber > 4:
            MYC_neg_pathway_unaffected[pathway] += 1

s = "<table>"
s += "<th></th><th colspan=\"4\">Sample</th><th colspan=\"3\">Fisher's exact test (two-sided)</th><th colspan=\"3\">Fisher's exact test (one-sided greater)</th><th colspan=\"3\">Fisher's exact test (one-sided less)</th></tr>"
s += "<tr><th>Pathway</th><th>MYC + pathway affected</th><th>MYC + pathway unaffected</th><th>MYC - pathway affected</th><th>MYC - pathway unaffected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th></tr>"

for pathway in pathways.keys():
   oddsratio, pvalue = scipy.stats.fisher_exact([[MYC_pos_pathway_affected[pathway], MYC_pos_pathway_unaffected[pathway]], [MYC_neg_pathway_affected[pathway], MYC_neg_pathway_unaffected[pathway]]])
   oddsratio_greater, pvalue_greater = scipy.stats.fisher_exact([[MYC_pos_pathway_affected[pathway], MYC_pos_pathway_unaffected[pathway]], [MYC_neg_pathway_affected[pathway], MYC_neg_pathway_unaffected[pathway]]], alternative='greater')
   oddsratio_less, pvalue_less = scipy.stats.fisher_exact([[MYC_pos_pathway_affected[pathway], MYC_pos_pathway_unaffected[pathway]], [MYC_neg_pathway_affected[pathway], MYC_neg_pathway_unaffected[pathway]]], alternative='less')
   s += "<tr><th>"+pathway+"</th><td>"+str(MYC_pos_pathway_affected[pathway])+"</td>"
   s += "<td>"+str(MYC_pos_pathway_unaffected[pathway])+"</td>"
   s += "<td>"+str(MYC_neg_pathway_affected[pathway])+"</td>"
   s += "<td>"+str(MYC_neg_pathway_unaffected[pathway])+"</td>"
   s += "<td>"+"{:.2f}".format(oddsratio)+"</td>"
   if pvalue < 0.05: 
      s += "<td>"+"{:.5f}".format(pvalue)+"</td>"
      s += "<td>"+"{:.5f}".format(pvalue * 16)+"</td>"
   else:
      s += "<td>"+"{:.2f}".format(pvalue)+"</td>"
      s += "<td>"+"{:.2f}".format(pvalue * 16)+"</td>"
   s += "<td>"+"{:.2f}".format(oddsratio_greater)+"</td>"
   if pvalue_greater < 0.05: 
      s += "<td>"+"{:.5f}".format(pvalue_greater)+"</td>"
      s += "<td>"+"{:.5f}".format(pvalue_greater * 16)+"</td>"
   else:
      s += "<td>"+"{:.2f}".format(pvalue_greater)+"</td>"
      s += "<td>"+"{:.2f}".format(pvalue_greater * 16)+"</td>"
   s += "<td>"+"{:.2f}".format(oddsratio_less)+"</td>"
   if pvalue_less < 0.05: 
      s += "<td>"+"{:.5f}".format(pvalue_less)+"</td>"
      s += "<td>"+"{:.5f}".format(pvalue_less * 16)+"</td></tr>"
   else: 
      s += "<td>"+"{:.2f}".format(pvalue_less)+"</td>"
      s += "<td>"+"{:.2f}".format(pvalue_less * 16)+"</td></tr>"

s += "</table>"
h = HTML(s);h


Out[5]:
SampleFisher's exact test (two-sided)Fisher's exact test (one-sided greater)Fisher's exact test (one-sided less)
PathwayMYC + pathway affectedMYC + pathway unaffectedMYC - pathway affectedMYC - pathway unaffectedOdd's ratiop-valuep-value correctedOdd's ratiop-valuep-value correctedOdd's ratiop-valuep-value corrected
STAG1_pathway1111558000.130.015580.249260.131.0015.990.130.007210.11543
KRAS_pathway42703515040.860.548.630.860.7912.700.860.274.31
CLASP_pathway4108218341.470.528.321.470.335.231.470.8513.55
SMARC_pathway111188470.951.0016.000.950.6710.740.950.7211.52
SPOP_pathway011258500.001.0016.000.001.0016.000.000.548.63
RUNX1_pathway2110577980.250.050.880.250.9915.920.250.023520.37631
MLL_pathway16961137421.090.7712.281.090.426.781.090.6910.96
TP53_pathway59533754801.420.091.381.420.048410.774511.420.9715.50
ARID_pathway8104617941.001.0016.001.000.568.951.000.599.52
KEAP1_pathway5107298261.330.589.311.330.365.721.330.8112.94
SMC_pathway5107248311.620.375.911.620.243.811.620.8914.27
NOTCH_pathway19931237321.220.487.651.220.274.391.220.8112.94
BAP1_pathway8104558001.120.6911.011.120.457.181.120.7011.20
CDON_pathway3109158401.540.457.271.540.355.561.540.8513.67
HLA_pathway3109258300.911.0016.000.910.6510.370.910.599.43
HER2_pathway29831566991.570.060.901.570.038490.615811.570.9815.64

Which genes are affected in MYC focally amplified samples

Check all genes which appear in any pathway in Leiserson paper Only output genes where any p-value is lower than 0.05


In [6]:
genes=["CCND1","CDKN2A", "CUL9", "NPM1", "PTEN", "TP53","AGBL2", "ALS2CR8", "AMOTL1", "ANKRD12", "BRPF1", "CACHD1", "CARNS1", "CDKN2AIP", "CELSR3", "CHD8", "EPHA3", "HECW2", "IFT140", "IWS1", "MAGI2", "MAST3", "MDM4", "MLL5", "PLEKHA8", "PRDM2", "PRKRIR", "SCAPER", "SETD2", "SMG1", "SMG5", "SMG7", "SNRK",
       "SPTBN2", "STK11IP", "STRADA", "SWAP70", "TEP1", "TRIP12", "TTLL5", "UACA", "ZFP91", "ZMIZ1", "ZNF227", "ZNF668",
       "PDS5B", "STAG1", "STAG2", "WAPAL","PDS5A", "RAD21", "SMC1A",
       "KDM6A", "MLL2", "MLL3","E2F3", "N4BP2", "PROSER1",
       "ASXL1", "ASXL2", "BAP1","ANKRD17", "FOXK1", "FOXK2", "KDM1B",
       "BRAF", "KRAS", "PIK3CA","ARHGAP4", "EIF2AK1", "FAM118B", "NRAS", "PLCE1", "RASGRF2", "RASGRP2", "RGL1", "VARS2",
       "B2M", "CD1D","HLA-A", "HLA-B", "P4HTM",
       "EGFR", "ERBB2","AREG", "ELF3", "ERBB4", "LRIG1", "OSMR",
       "BOC", "CDON",
       "KEAP1", "NFE2L2","CHD6", "PTMA", "WAC",
       "MYD88", "SPOP",
       "CBFB", "RUNX1","ELF4",
       "FBXW7", "NOTCH1","CCNE1", "DCLRE1C", "DLL1", "DTX4", "FAM101A", "FOXM1", "JAG1", "KLF5", "LFNG", "LPP", "MAML1", "MMS22L", "NOTCH3", "NOTCH4", "SHPRH",
       "ARID1A", "PBRM1","ADNP", "ARID1B", "ARID2",
       "SMARCB1", "SMARCA4",
       "NCAPD3","SMC4","NCAPG2","NCAPH2","SMC2","NCAPD2",
       "CLASP1","CLASP2","CLIP2"]

s = "<table>"
s += "<th></th><th colspan=\"4\">Sample</th><th colspan=\"3\">Fisher's exact test (two-sided)</th><th colspan=\"3\">Fisher's exact test (one-sided greater)</th><th colspan=\"3\">Fisher's exact test (one-sided less)</th></tr>"
s += "<tr><th>Gene</th><th>MYC + Gene affected</th><th>MYC + Gene unaffected</th><th>MYC - Gene affected</th><th>MYC - Gene unaffected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th></tr>"

gene_count = 0
for gene in genes:
    gene_count += 1
    MYC_pos_gene_affected = 0
    MYC_pos_gene_unaffected = 0
    MYC_neg_gene_affected = 0
    MYC_neg_gene_unaffected = 0

    for sample in samples.values():
        if not sample.somatic_mutation_data:
            continue
        if sample.checkFocalGeneAmp(gene) or gene in sample.genes_affected:
            if sample.checkFocalGeneAmp("MYC"):
                MYC_pos_gene_affected += 1
            else:
                MYC_neg_gene_affected += 1
        else:
            if sample.checkFocalGeneAmp("MYC"):
                MYC_pos_gene_unaffected += 1
            else:
                MYC_neg_gene_unaffected += 1

    oddsratio, pvalue = scipy.stats.fisher_exact([[MYC_pos_gene_affected, MYC_pos_gene_unaffected], [MYC_neg_gene_affected, MYC_neg_gene_unaffected]])
    oddsratio_greater, pvalue_greater = scipy.stats.fisher_exact([[MYC_pos_gene_affected, MYC_pos_gene_unaffected], [MYC_neg_gene_affected, MYC_neg_gene_unaffected]], alternative='greater')
    oddsratio_less, pvalue_less = scipy.stats.fisher_exact([[MYC_pos_gene_affected, MYC_pos_gene_unaffected], [MYC_neg_gene_affected, MYC_neg_gene_unaffected]], alternative='less')

        
    s += "<tr><th>"+gene+"</th><td>"+str(MYC_pos_gene_affected)+"</td>"
    s += "<td>"+str(MYC_pos_gene_unaffected)+"</td>"
    s += "<td>"+str(MYC_neg_gene_affected)+"</td>"
    s += "<td>"+str(MYC_neg_gene_unaffected)+"</td>"
    s += "<td>"+"{:.2f}".format(oddsratio)+"</td>"
    if pvalue < 0.05: 
       s += "<td>"+"{:.5f}".format(pvalue)+"</td>"
       s += "<td>"+"{:.5f}".format(pvalue * 134)+"</td>"
    else:
       s += "<td>"+"{:.2f}".format(pvalue)+"</td>"
       s += "<td>"+"{:.2f}".format(pvalue * 134)+"</td>"
    s += "<td>"+"{:.2f}".format(oddsratio_greater)+"</td>"
    if pvalue_greater < 0.05: 
       s += "<td>"+"{:.5f}".format(pvalue_greater)+"</td>"
       s += "<td>"+"{:.5f}".format(pvalue_greater * 134)+"</td>"
    else:
       s += "<td>"+"{:.2f}".format(pvalue_greater)+"</td>"
       s += "<td>"+"{:.2f}".format(pvalue_greater * 134)+"</td>"
    s += "<td>"+"{:.2f}".format(oddsratio_less)+"</td>"
    if pvalue_less < 0.05: 
       s += "<td>"+"{:.5f}".format(pvalue_less)+"</td>"
       s += "<td>"+"{:.5f}".format(pvalue_less * 134)+"</td></tr>"
    else: 
       s += "<td>"+"{:.2f}".format(pvalue_less)+"</td>"
       s += "<td>"+"{:.2f}".format(pvalue_less * 134)+"</td></tr>"

s += "</table>"
print "Genes: "+str(gene_count)
h = HTML(s);h


Genes: 134
Out[6]:
SampleFisher's exact test (two-sided)Fisher's exact test (one-sided greater)Fisher's exact test (one-sided less)
GeneMYC + Gene affectedMYC + Gene unaffectedMYC - Gene affectedMYC - Gene unaffectedOdd's ratiop-valuep-value correctedOdd's ratiop-valuep-value correctedOdd's ratiop-valuep-value corrected
CCND110511827250.780.6282.900.780.80107.690.780.3140.95
CDKN2A16049033.760.2837.293.760.2837.293.760.97129.37
CUL916088991.870.4459.601.870.4459.601.870.89119.84
NPM106149030.001.00134.000.001.00134.000.000.77103.24
PTEN259338740.901.00134.000.900.6688.650.900.6282.92
TP5336252616463.560.000000.000323.560.000000.000273.561.00134.00
AGBL206149030.001.00134.000.001.00134.000.000.77103.24
ALS2CR806149030.001.00134.000.001.00134.000.000.77103.24
AMOTL1061108970.001.00134.000.001.00134.000.000.5269.67
ANKRD12061208870.000.6384.580.001.00134.000.000.2735.97
BRPF106169010.001.00134.000.001.00134.000.000.6890.58
CACHD116069012.500.3749.162.500.3749.162.500.93125.04
CARNS1259448630.660.76102.230.660.80107.490.660.4358.24
CDKN2AIP06149030.001.00134.000.001.00134.000.000.77103.24
CELSR3061128950.001.00134.000.001.00134.000.000.4661.08
CHD8061158920.000.6282.660.001.00134.000.000.3750.11
EPHA3358228852.080.2027.462.080.2027.462.080.93125.08
HECW2259138942.330.2432.562.330.2432.562.330.94125.56
IFT14025979004.360.1114.164.360.1114.164.360.98131.96
IWS106159020.001.00134.000.001.00134.000.000.7296.71
MAGI206198980.001.00134.000.001.00134.000.000.5674.40
MAST306159020.001.00134.000.001.00134.000.000.7296.71
MDM4358238841.990.2229.721.990.2229.721.990.92123.94
MLL5259108973.040.1723.023.040.1723.023.040.97129.33
PLEKHA806149030.001.00134.000.001.00134.000.000.77103.24
PRDM216079002.140.4154.552.140.4154.552.140.91122.53
PRKRIR754858221.250.5167.771.250.3647.931.250.79105.25
SCAPER061108970.001.00134.000.001.00134.000.000.5269.67
SETD2259158922.020.2939.022.020.2939.022.020.91122.47
SMG1160188890.821.00134.000.820.7195.560.820.6688.55
SMG506188990.001.00134.000.001.00134.000.000.5979.45
SMG706188990.001.00134.000.001.00134.000.000.5979.45
SNRK25959026.120.078.966.120.078.966.120.99133.07
SPTBN2259418660.721.00134.000.720.77103.170.720.4864.57
STK11IP06149030.001.00134.000.001.00134.000.000.77103.24
STRADA06179000.001.00134.000.001.00134.000.000.6384.84
SWAP7006139040.001.00134.000.001.00134.000.000.82110.21
TEP1259158922.020.2939.022.020.2939.022.020.91122.47
TRIP1206188990.001.00134.000.001.00134.000.000.5979.45
TTLL5061108970.001.00134.000.001.00134.000.000.5269.67
UACA16069012.500.3749.162.500.3749.162.500.93125.04
ZFP91061108970.001.00134.000.001.00134.000.000.5269.67
ZMIZ1457228852.820.0810.062.820.0810.062.820.98131.34
ZNF227160190615.100.1216.3615.100.1216.3615.101.00133.48
ZNF66816049033.760.2837.293.760.2837.293.760.97129.37
PDS5B061138940.001.00134.000.001.00134.000.000.4357.18
STAG1160118961.360.5472.921.360.5472.921.360.83110.98
STAG2061138940.001.00134.000.001.00134.000.000.4357.18
WAPAL06129050.001.00134.000.001.00134.000.000.88117.64
PDS5A06159020.001.00134.000.001.00134.000.000.7296.71
RAD2116059023.010.3243.423.010.3243.423.010.95127.34
SMC1A061118960.001.00134.000.001.00134.000.000.4965.24
KDM6A259148932.160.2735.792.160.2735.792.160.93124.07
MLL2160228850.671.00134.000.670.78104.540.670.5776.15
MLL3754608471.830.1824.701.830.1216.171.830.95126.96
E2F3457188893.470.044355.943003.470.044355.943003.470.99132.71
N4BP2061128950.001.00134.000.001.00134.000.000.4661.08
PROSER106179000.001.00134.000.001.00134.000.000.6384.84
ASXL116059023.010.3243.423.010.3243.423.010.95127.34
ASXL2061168910.000.6282.690.001.00134.000.000.3546.90
BAP106179000.001.00134.000.001.00134.000.000.6384.84
ANKRD17259228851.360.6688.371.360.4560.811.360.81108.77
FOXK106119060.001.00134.000.001.00134.000.000.94125.56
FOXK206149030.001.00134.000.001.00134.000.000.77103.24
KDM1B25979004.360.1114.164.360.1114.164.360.98131.96
BRAF160128951.240.5776.821.240.5776.821.240.80107.84
KRAS259318760.961.00134.000.960.6384.280.960.6587.70
PIK3CA14473175900.550.079.260.550.98131.650.550.035284.72707
ARHGAP406169010.001.00134.000.001.00134.000.000.6890.58
EIF2AK116069012.500.3749.162.500.3749.162.500.93125.04
FAM118B25969015.090.0911.475.090.0911.475.090.99132.58
NRAS06139040.001.00134.000.001.00134.000.000.82110.21
PLCE1160158920.991.00134.000.990.6587.100.990.7398.21
RASGRF206159020.001.00134.000.001.00134.000.000.7296.71
RASGRP216069012.500.3749.162.500.3749.162.500.93125.04
RGL106139040.001.00134.000.001.00134.000.000.82110.21
VARS206119060.001.00134.000.001.00134.000.000.94125.56
B2M06159020.001.00134.000.001.00134.000.000.7296.71
CD1D16059023.010.3243.423.010.3243.423.010.95127.34
HLA-A06159020.001.00134.000.001.00134.000.000.7296.71
HLA-B06188990.001.00134.000.001.00134.000.000.5979.45
P4HTM06159020.001.00134.000.001.00134.000.000.7296.71
EGFR061248830.000.3952.790.001.00134.000.000.2127.56
ERBB215461207872.140.020422.736542.140.015322.053182.140.99133.15
AREG259108973.040.1723.023.040.1723.023.040.97129.33
ELF306149030.001.00134.000.001.00134.000.000.77103.24
ERBB4259218861.430.6587.301.430.4357.851.430.83110.94
LRIG106159020.001.00134.000.001.00134.000.000.7296.71
OSMR06159020.001.00134.000.001.00134.000.000.7296.71
BOC061108970.001.00134.000.001.00134.000.000.5269.67
CDON25979004.360.1114.164.360.1114.164.360.98131.96
KEAP106139040.001.00134.000.001.00134.000.000.82110.21
NFE2L216059023.010.3243.423.010.3243.423.010.95127.34
CHD6259188891.670.3648.611.670.3648.611.670.87117.07
PTMA06119060.001.00134.000.001.00134.000.000.94125.56
WAC16088991.870.4459.601.870.4459.601.870.89119.84
MYD8806119060.001.00134.000.001.00134.000.000.94125.56
SPOP06149030.001.00134.000.001.00134.000.000.77103.24
CBFB061238840.000.3952.520.001.00134.000.000.2229.46
RUNX1160318760.470.7295.920.470.88117.890.470.3952.02
ELF406159020.001.00134.000.001.00134.000.000.7296.71
FBXW7259158922.020.2939.022.020.2939.022.020.91122.47
NOTCH1160118961.360.5472.921.360.5472.921.360.83110.98
CCNE1853378703.550.005280.706893.550.005280.706893.551.00133.83
DCLRE1C16079002.140.4154.552.140.4154.552.140.91122.53
DLL106159020.001.00134.000.001.00134.000.000.7296.71
DTX406188990.001.00134.000.001.00134.000.000.5979.45
FAM101A16049033.760.2837.293.760.2837.293.760.97129.37
FOXM116069012.500.3749.162.500.3749.162.500.93125.04
JAG1061138940.001.00134.000.001.00134.000.000.4357.18
KLF50610907nan1.00134.00nan1.00134.00nan1.00134.00
LFNG06119060.001.00134.000.001.00134.000.000.94125.56
LPP06169010.001.00134.000.001.00134.000.000.6890.58
MAML106139040.001.00134.000.001.00134.000.000.82110.21
MMS22L06159020.001.00134.000.001.00134.000.000.7296.71
NOTCH3061148930.001.00134.000.001.00134.000.000.4053.53
NOTCH4061108970.001.00134.000.001.00134.000.000.5269.67
SHPRH061118960.001.00134.000.001.00134.000.000.4965.24
ARID1A061278800.000.4154.580.001.00134.000.000.1722.55
PBRM116069012.500.3749.162.500.3749.162.500.93125.04
ADNP160148931.061.00134.001.060.6383.891.060.76101.44
ARID1B160158920.991.00134.000.990.6587.100.990.7398.21
ARID2061118960.001.00134.000.001.00134.000.000.4965.24
SMARCB106139040.001.00134.000.001.00134.000.000.82110.21
SMARCA406179000.001.00134.000.001.00134.000.000.6384.84
NCAPD3061118960.001.00134.000.001.00134.000.000.4965.24
SMC406188990.001.00134.000.001.00134.000.000.5979.45
NCAPG206139040.001.00134.000.001.00134.000.000.82110.21
NCAPH206139040.001.00134.000.001.00134.000.000.82110.21
SMC206159020.001.00134.000.001.00134.000.000.7296.71
NCAPD2259390410.210.034544.6278110.210.034544.6278110.211.00133.71
CLASP116088991.870.4459.601.870.4459.601.870.89119.84
CLASP216079002.140.4154.552.140.4154.552.140.91122.53
CLIP225979004.360.1114.164.360.1114.164.360.98131.96

Which genes are affected in MYC Copynumber > 4 samples


In [7]:
genes=["CCND1","CDKN2A", "CUL9", "NPM1", "PTEN", "TP53","AGBL2", "ALS2CR8", "AMOTL1", "ANKRD12", "BRPF1", "CACHD1", "CARNS1", "CDKN2AIP", "CELSR3", "CHD8", "EPHA3", "HECW2", "IFT140", "IWS1", "MAGI2", "MAST3", "MDM4", "MLL5", "PLEKHA8", "PRDM2", "PRKRIR", "SCAPER", "SETD2", "SMG1", "SMG5", "SMG7", "SNRK",
       "SPTBN2", "STK11IP", "STRADA", "SWAP70", "TEP1", "TRIP12", "TTLL5", "UACA", "ZFP91", "ZMIZ1", "ZNF227", "ZNF668",
       "PDS5B", "STAG1", "STAG2", "WAPAL","PDS5A", "RAD21", "SMC1A",
       "KDM6A", "MLL2", "MLL3","E2F3", "N4BP2", "PROSER1",
       "ASXL1", "ASXL2", "BAP1","ANKRD17", "FOXK1", "FOXK2", "KDM1B",
       "BRAF", "KRAS", "PIK3CA","ARHGAP4", "EIF2AK1", "FAM118B", "NRAS", "PLCE1", "RASGRF2", "RASGRP2", "RGL1", "VARS2",
       "B2M", "CD1D","HLA-A", "HLA-B", "P4HTM",
       "EGFR", "ERBB2","AREG", "ELF3", "ERBB4", "LRIG1", "OSMR",
       "BOC", "CDON",
       "KEAP1", "NFE2L2","CHD6", "PTMA", "WAC",
       "MYD88", "SPOP",
       "CBFB", "RUNX1","ELF4",
       "FBXW7", "NOTCH1","CCNE1", "DCLRE1C", "DLL1", "DTX4", "FAM101A", "FOXM1", "JAG1", "KLF5", "LFNG", "LPP", "MAML1", "MMS22L", "NOTCH3", "NOTCH4", "SHPRH",
       "ARID1A", "PBRM1","ADNP", "ARID1B", "ARID2",
       "SMARCB1", "SMARCA4",
       "NCAPD3","SMC4","NCAPG2","NCAPH2","SMC2","NCAPD2",
       "CLASP1","CLASP2","CLIP2"]

s = "<table>"
s += "<th></th><th colspan=\"4\">Sample</th><th colspan=\"3\">Fisher's exact test (two-sided)</th><th colspan=\"3\">Fisher's exact test (one-sided greater)</th><th colspan=\"3\">Fisher's exact test (one-sided less)</th></tr>"
s += "<tr><th>Gene</th><th>MYC + Gene affected</th><th>MYC + Gene unaffected</th><th>MYC - Gene affected</th><th>MYC - Gene unaffected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th></tr>"

gene_count = 0
for gene in genes:
    gene_count += 1
    MYC_pos_gene_affected = 0
    MYC_pos_gene_unaffected = 0
    MYC_neg_gene_affected = 0
    MYC_neg_gene_unaffected = 0

    for sample in samples.values():
        if not sample.somatic_mutation_data:
            continue
        copynumber = "n/a"
        log2 = sample.getLog2FromGene(gene_positions['MYC'].chrom,gene_positions['MYC'].start, gene_positions['MYC'].end)
        if (log2 != None):
            copynumber = 2.0 * 2.0 ** log2
        else:
            copynumber="n/a"
        if sample.checkFocalGeneAmp(gene) or gene in sample.genes_affected:
            if copynumber != "n/a" and copynumber > 4:
                MYC_pos_gene_affected += 1
            elif  copynumber != "n/a" and not copynumber > 4:
                MYC_neg_gene_affected += 1
        else:
            if copynumber != "n/a" and copynumber > 4:
                MYC_pos_gene_unaffected += 1
            elif  copynumber != "n/a" and not copynumber > 4:
                MYC_neg_gene_unaffected += 1

    oddsratio, pvalue = scipy.stats.fisher_exact([[MYC_pos_gene_affected, MYC_pos_gene_unaffected], [MYC_neg_gene_affected, MYC_neg_gene_unaffected]])
    oddsratio_greater, pvalue_greater = scipy.stats.fisher_exact([[MYC_pos_gene_affected, MYC_pos_gene_unaffected], [MYC_neg_gene_affected, MYC_neg_gene_unaffected]], alternative='greater')
    oddsratio_less, pvalue_less = scipy.stats.fisher_exact([[MYC_pos_gene_affected, MYC_pos_gene_unaffected], [MYC_neg_gene_affected, MYC_neg_gene_unaffected]], alternative='less')

    #if (pvalue > 0.05 and pvalue_greater > 0.05 and pvalue_less > 0.05):
        #continue
        
    s += "<tr><th>"+gene+"</th><td>"+str(MYC_pos_gene_affected)+"</td>"
    s += "<td>"+str(MYC_pos_gene_unaffected)+"</td>"
    s += "<td>"+str(MYC_neg_gene_affected)+"</td>"
    s += "<td>"+str(MYC_neg_gene_unaffected)+"</td>"
    s += "<td>"+"{:.2f}".format(oddsratio)+"</td>"
    if pvalue < 0.05: 
       s += "<td>"+"{:.5f}".format(pvalue)+"</td>"
       s += "<td>"+"{:.5f}".format(pvalue * 134)+"</td>"
    else:
       s += "<td>"+"{:.2f}".format(pvalue)+"</td>"
       s += "<td>"+"{:.2f}".format(pvalue * 134)+"</td>"
    s += "<td>"+"{:.2f}".format(oddsratio_greater)+"</td>"
    if pvalue_greater < 0.0005: 
       s += "<td>"+"{:.10f}".format(pvalue_greater)+"</td>"
       s += "<td>"+"{:.10f}".format(pvalue_greater * 134)+"</td>"
    elif pvalue_greater < 0.05: 
       s += "<td>"+"{:.5f}".format(pvalue_greater)+"</td>"
       s += "<td>"+"{:.5f}".format(pvalue_greater * 134)+"</td>"
    else:
       s += "<td>"+"{:.2f}".format(pvalue_greater)+"</td>"
       s += "<td>"+"{:.2f}".format(pvalue_greater * 134)+"</td>"
    s += "<td>"+"{:.2f}".format(oddsratio_less)+"</td>"
    if pvalue_less < 0.05: 
       s += "<td>"+"{:.5f}".format(pvalue_less)+"</td>"
       s += "<td>"+"{:.5f}".format(pvalue_less * 134)+"</td></tr>"
    else: 
       s += "<td>"+"{:.2f}".format(pvalue_less)+"</td>"
       s += "<td>"+"{:.2f}".format(pvalue_less * 134)+"</td></tr>"

s += "</table>"
print "Genes: "+str(gene_count)
h = HTML(s);h


Genes: 134
Out[7]:
SampleFisher's exact test (two-sided)Fisher's exact test (one-sided greater)Fisher's exact test (one-sided less)
GeneMYC + Gene affectedMYC + Gene unaffectedMYC - Gene affectedMYC - Gene unaffectedOdd's ratiop-valuep-value correctedOdd's ratiop-valuep-value correctedOdd's ratiop-valuep-value corrected
CCND127851656901.330.2634.351.330.1419.021.330.91121.33
CDKN2A011258500.001.00134.000.001.00134.000.000.5472.31
CUL9011298460.000.6181.590.001.00134.000.000.3344.04
NPM1011248510.001.00134.000.001.00134.000.000.6181.83
PTEN3109328230.710.79105.690.710.79106.220.710.4154.53
TP5363492336223.430.000000.000003.430.00000000170.00000022493.431.00134.00
AGBL2011248510.001.00134.000.001.00134.000.000.6181.83
ALS2CR8011248510.001.00134.000.001.00134.000.000.6181.83
AMOTL1211088471.930.3343.621.930.3343.621.930.90120.74
ANKRD120112208350.000.1520.660.001.00134.000.000.0811.13
BRPF1111158501.530.5270.111.530.5270.111.530.85114.41
CACHD1111168491.270.5877.551.270.5877.551.270.81108.57
CARNS15107418140.931.00134.000.930.6384.900.930.5574.28
CDKN2AIP2110185415.530.036894.9435515.530.036894.9435515.531.00133.80
CELSR31111118440.691.00134.000.690.77103.680.690.5978.59
CHD80112158400.000.2432.220.001.00134.000.000.1620.84
EPHA35107208351.950.2026.471.950.1520.551.950.94126.08
HECW22110138421.180.6992.361.180.5371.521.180.75100.90
IFT140211078482.200.2837.552.200.2837.552.200.92123.89
IWS1011258500.001.00134.000.001.00134.000.000.5472.31
MAGI2011298460.000.6181.590.001.00134.000.000.3344.04
MAST3011248510.001.00134.000.001.00134.000.000.6181.83
MDM45107218341.860.2128.441.860.1723.221.860.93124.65
MLL5310998462.590.1520.532.590.1520.532.590.96128.58
PLEKHA8011248510.001.00134.000.001.00134.000.000.6181.83
PRDM2011288470.000.6181.320.001.00134.000.000.3749.86
PRKRIR1399797761.290.3952.911.290.2634.421.290.84112.01
SCAPER111198460.851.00134.000.850.7195.110.850.6790.38
SETD22110158401.021.00134.001.020.6080.961.020.6991.97
SMG12110178380.901.00134.000.900.6789.280.900.6282.97
SMG5111178481.091.00134.001.090.6384.141.090.77102.55
SMG7011278480.001.00134.000.001.00134.000.000.4256.45
SNRK211058503.090.1925.433.090.1925.433.090.96128.99
SPTBN23109408150.560.4762.420.560.89119.820.560.2432.75
STK11IP011248510.001.00134.000.001.00134.000.000.6181.83
STRADA111168491.270.5877.551.270.5877.551.270.81108.57
SWAP70011238520.001.00134.000.001.00134.000.000.6992.59
TEP13109148411.650.4358.261.650.3142.031.650.88117.42
TRIP12111178481.091.00134.001.090.6384.141.090.77102.55
TTLL5111198460.851.00134.000.850.7195.110.850.6790.38
UACA211058503.090.1925.433.090.1925.433.090.96128.99
ZFP910112108450.000.6282.510.001.00134.000.000.2938.89
ZMIZ17105198362.930.023383.132472.930.023383.132472.930.99133.12
ZNF227111118547.690.2229.267.690.2229.267.690.99132.22
ZNF668111148511.920.4661.691.920.4661.691.920.89119.90
PDS5B0112138420.000.3851.310.001.00134.000.000.2026.76
STAG10112128430.000.3850.850.001.00134.000.000.2330.32
STAG20112138420.000.3851.310.001.00134.000.000.2026.76
WAPAL011228530.001.00134.000.001.00134.000.000.78104.74
PDS5A011258500.001.00134.000.001.00134.000.000.5472.31
RAD21111158501.530.5270.111.530.5270.111.530.85114.41
SMC1A0112118440.000.6383.940.001.00134.000.000.2634.34
KDM6A3109138421.780.4255.931.780.2837.541.780.90120.14
MLL21111228330.340.5067.620.340.94126.370.340.2331.22
MLL38104597961.040.84113.191.040.5270.131.040.6384.46
E2F34108188371.720.3141.451.720.2432.801.720.90120.57
N4BP20112128430.000.3850.850.001.00134.000.000.2330.32
PROSER1011278480.001.00134.000.001.00134.000.000.4256.45
ASXL1111158501.530.5270.111.530.5270.111.530.85114.41
ASXL22110148411.090.7194.771.090.5776.381.090.7296.46
BAP1011278480.001.00134.000.001.00134.000.000.4256.45
ANKRD172110228330.691.00134.000.690.79105.560.690.4661.72
FOXK111110855inf0.1215.52inf0.1215.52inf1.00134.00
FOXK2111138522.560.3952.172.560.3952.172.560.93124.86
KDM1B111188470.951.00134.000.950.6789.960.950.7296.45
BRAF2110118441.400.6687.781.400.4661.021.400.82109.45
KRAS7105268292.130.0912.412.130.0810.262.130.97130.11
PIK3CA29833025530.640.067.540.640.98131.710.640.028883.87038
ARHGAP4111158501.530.5270.111.530.5270.111.530.85114.41
EIF2AK1211058503.090.1925.433.090.1925.433.090.96128.99
FAM118B211068492.570.2331.452.570.2331.452.570.95126.65
NRAS011238520.001.00134.000.001.00134.000.000.6992.59
PLCE11111158400.501.00134.000.500.86115.610.500.4357.62
RASGRF2011258500.001.00134.000.001.00134.000.000.5472.31
RASGRP2111168491.270.5877.551.270.5877.551.270.81108.57
RGL1011228530.001.00134.000.001.00134.000.000.78104.74
VARS2011218540.001.00134.000.001.00134.000.000.88118.48
B2M111148511.920.4661.691.920.4661.691.920.89119.90
CD1D011268490.001.00134.000.001.00134.000.000.4863.89
HLA-A011258500.001.00134.000.001.00134.000.000.5472.31
HLA-B111178481.091.00134.001.090.6384.141.090.77102.55
P4HTM111148511.920.4661.691.920.4661.691.920.89119.90
EGFR7105178383.290.014982.007203.290.014982.007203.291.00133.49
ERBB219931167391.300.3141.961.300.2026.901.300.87116.27
AREG310998462.590.1520.532.590.1520.532.590.96128.58
ELF3011248510.001.00134.000.001.00134.000.000.6181.83
ERBB41111228330.340.5067.620.340.94126.370.340.2331.22
LRIG1111148511.920.4661.691.920.4661.691.920.89119.90
OSMR111148511.920.4661.691.920.4661.691.920.89119.90
BOC111198460.851.00134.000.850.7195.110.850.6790.38
CDON211078482.200.2837.552.200.2837.552.200.92123.89
KEAP1011238520.001.00134.000.001.00134.000.000.6992.59
NFE2L2211048513.870.1519.593.870.1519.593.870.98130.88
CHD63109178381.360.5066.601.360.4155.471.360.81108.13
PTMA011218540.001.00134.000.001.00134.000.000.88118.48
WAC011298460.000.6181.590.001.00134.000.000.3344.04
MYD88011218540.001.00134.000.001.00134.000.000.88118.48
SPOP011248510.001.00134.000.001.00134.000.000.6181.83
CBFB1111228330.340.5067.620.340.94126.370.340.2331.22
RUNX11111318240.240.1621.940.240.98131.560.240.1013.03
ELF4011258500.001.00134.000.001.00134.000.000.5472.31
FBXW72110158401.021.00134.001.020.6080.961.020.6991.97
NOTCH1310998462.590.1520.532.590.1520.532.590.96128.58
CCNE17105388171.430.3546.471.430.2634.591.430.86115.44
DCLRE1C211068492.570.2331.452.570.2331.452.570.95126.65
DLL1011258500.001.00134.000.001.00134.000.000.5472.31
DTX4111178481.091.00134.001.090.6384.141.090.77102.55
FAM101A111148511.920.4661.691.920.4661.691.920.89119.90
FOXM1111168491.270.5877.551.270.5877.551.270.81108.57
JAG12110118441.400.6687.781.400.4661.021.400.82109.45
KLF501120855nan1.00134.00nan1.00134.00nan1.00134.00
LFNG011218540.001.00134.000.001.00134.000.000.88118.48
LPP011268490.001.00134.000.001.00134.000.000.4863.89
MAML1011238520.001.00134.000.001.00134.000.000.6992.59
MMS22L011258500.001.00134.000.001.00134.000.000.5472.31
NOTCH33109118442.110.2128.752.110.2128.752.110.93124.87
NOTCH40112108450.000.6282.510.001.00134.000.000.2938.89
SHPRH0112118440.000.6383.940.001.00134.000.000.2634.34
ARID1A2110258300.600.76101.830.600.84112.630.600.3850.54
PBRM1111168491.270.5877.551.270.5877.551.270.81108.57
ADNP1111148410.541.00134.000.540.84113.160.540.4762.48
ARID1B3109138421.780.4255.931.780.2837.541.780.90120.14
ARID21111108450.761.00134.000.760.7499.660.760.6384.40
SMARCB1111128533.840.3141.413.840.3141.413.840.96129.06
SMARCA4011278480.001.00134.000.001.00134.000.000.4256.45
NCAPD3211098461.710.3749.601.710.3749.601.710.87117.25
SMC4011288470.000.6181.320.001.00134.000.000.3749.86
NCAPG2011238520.001.00134.000.001.00134.000.000.6992.59
NCAPH2011238520.001.00134.000.001.00134.000.000.6992.59
SMC2111148511.920.4661.691.920.4661.691.920.89119.90
NCAPD2211038525.160.1114.105.160.1114.105.160.99132.30
CLASP1111188470.951.00134.000.950.6789.960.950.7296.45
CLASP2211068492.570.2331.452.570.2331.452.570.95126.65
CLIP2111188470.951.00134.000.950.6789.960.950.7296.45

Which pathways are affected in MYC positive samples (Vogelstein)

This time use pathway definitions of Vogelstein et al.

Note that MYC has been deleted from pathway: Cell Cycle / Apoptosis


In [8]:
chromatin_modification_pathway=["ARID1A","ARID1B","ARID2","ASXL1","ATRX","CREBBP","DAXX","DNMT1","DNMT3A","EP300","EZH2","H3H3A","HIST1H3B","IDH1","IDH2","KDM5C","KDM6A","MEN1","MLL2","MLL3","NCOR1","PAX5","PBRM1","PRDM1","SETD2","SETBP1","SMARCA4","SMARCB1","SPOP","TET2","WT1","NCOA3"]
cell_cycle_apoptosis_pathway=["ABL1","BCL2","CARD1","CASP8","CDC73","CDKN2A","CYLD","DAXX","FUBP1","MED12","MYD88","NFE2L2","NPM2","PPP2R1A","RB1","TNFAIP3","TRAF7","TP53","CCND1","CDKN2C","MDM2","MDM4","MYCL1","MYCN","SKP2"]
tgf_beta_pathway=["ACVR1B","EP300","FOXL2","GATA1","GATA2","GNAS","MED12","SMAD2","SMAD4"]
PI3K_pathway=["AKT1","ALK","B2M","CBL","CEBPA","CSF1R","EGFR","ERBB2","FGFR2","FGFR3","FLT3","GNA11","GNAQ","GNAS","KIT","MET","PDGFRA","PIK2CA","PIK3R1","PTEN","RET","TSC1","TSHR","VHL","NCOA3"]
RAS_pathway=["ALK","B2M","BRAF","CBL","CEBPA","CIC","CSF1R","EGFR","ERBB2","FGFR2","FGFR3","FLT3","GNA11","GNAQ","GNAS","HRAS","KIT","KRAS","MAP2K1","MAP3K1","MET","NF1","NRAS","PDGFRA","PTPN11","RET","VHL"]
APC_pathway=["AXIN1","CDH1","CTNNB1","EP300","FAM123B","GNAS","HNF1A","NF2","RNF43","SOX9"]
transcription_regulation_pathway=["AR","BCOR","CREBBP","GATA3","KLF4","PHF6","RUNX1","SF3B1","SRSF2","U2AF1","IKZF1","LMO1"]
dna_damage_pathway=["ATM","BAP1","BRCA1","BRCA2","MLH1","MSH2","MSH6","STAG2","TP53"]
mapkinase_pathway=["B2M","CEBPA","GNA11","GNAQ","MAP3K1","TNFAIP3","TSHR","MAP2K4","NKX2-1"]
STAT_pathway=["CLRF","FGFR2","FGFR3","FLT3","JAK1","JAK2","JAK3","STAT","MPL","SOCS1","VHL"]
NOTCH_pathway=["EP300","FBXW7","GATA1","GATA2","NOTCH1","NOTCH2"]
wnt_pathway=["KLF4"]
hedgehog_pathway=["PTCH1","SMO","SPOP"]
mTOR_pathway=["STK11"]
replicationpathway=["SETBP1"]

pathways={"Chromatin Modification": chromatin_modification_pathway, "Cell Cycle\\Apoptosis":cell_cycle_apoptosis_pathway, 
        "TGF beta":tgf_beta_pathway,"PI3K":PI3K_pathway,"RAS":RAS_pathway, "APC":APC_pathway,  
        "Transcription Regulation":transcription_regulation_pathway, 
        "DNA Damage Repair":dna_damage_pathway, "MAP Kinase":mapkinase_pathway,
        "STAT":STAT_pathway, "NOTCH":NOTCH_pathway, 
        "WNT":wnt_pathway, "Hedgehog":hedgehog_pathway, 
        "mTOR":mTOR_pathway, "Replication":replicationpathway}

MYC_pos_pathway_affected={"Chromatin Modification": 0, "Cell Cycle\\Apoptosis":0, "TGF beta":0,"PI3K":0,"RAS":0, "APC":0,
                          "Transcription Regulation":0, "DNA Damage Repair":0, "MAP Kinase":0,"STAT":0, "NOTCH":0, "WNT":0, "Hedgehog":0,
                          "mTOR":0, "Replication":0}
MYC_pos_pathway_unaffected={"Chromatin Modification": 0, "Cell Cycle\\Apoptosis":0, "TGF beta":0,"PI3K":0,"RAS":0, "APC":0,
                          "Transcription Regulation":0, "DNA Damage Repair":0, "MAP Kinase":0,"STAT":0, "NOTCH":0, "WNT":0, "Hedgehog":0,
                          "mTOR":0, "Replication":0}
MYC_neg_pathway_affected={"Chromatin Modification": 0, "Cell Cycle\\Apoptosis":0, "TGF beta":0,"PI3K":0,"RAS":0, "APC":0,  
                          "Transcription Regulation":0, "DNA Damage Repair":0, "MAP Kinase":0,"STAT":0, "NOTCH":0, "WNT":0, "Hedgehog":0,
                          "mTOR":0, "Replication":0}
MYC_neg_pathway_unaffected={"Chromatin Modification": 0, "Cell Cycle\\Apoptosis":0, "TGF beta":0,"PI3K":0,"RAS":0, "APC":0, 
                          "Transcription Regulation":0, "DNA Damage Repair":0, "MAP Kinase":0,"STAT":0, "NOTCH":0, "WNT":0, "Hedgehog":0,
                          "mTOR":0, "Replication":0}


for sample in samples.values():
    if not sample.somatic_mutation_data:
        continue
    for pathway in pathways.keys():
        pathway_affected = False
        for gene in pathways[pathway]:
            if gene in sample.genes_affected or sample.checkFocalGeneAmp(gene):
                pathway_affected = True
                break
        if pathway_affected and sample.checkFocalGeneAmp("MYC"):
            MYC_pos_pathway_affected[pathway] += 1
        elif not pathway_affected and sample.checkFocalGeneAmp("MYC"):
            MYC_pos_pathway_unaffected[pathway] += 1
        elif pathway_affected and not sample.checkFocalGeneAmp("MYC"):
            MYC_neg_pathway_affected[pathway] += 1
        elif not pathway_affected and not sample.checkFocalGeneAmp("MYC"):
            MYC_neg_pathway_unaffected[pathway] += 1

s = "<table>"
s += "<th></th><th colspan=\"4\">Sample</th><th colspan=\"3\">Fisher's exact test (two-sided)</th><th colspan=\"3\">Fisher's exact test (one-sided greater)</th><th colspan=\"3\">Fisher's exact test (one-sided less)</th></tr>"
s += "<tr><th>Pathway</th><th>MYC + pathway affected</th><th>MYC + pathway unaffected</th><th>MYC - pathway affected</th><th>MYC - pathway unaffected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th></tr>"

for pathway in pathways.keys():
   oddsratio, pvalue = scipy.stats.fisher_exact([[MYC_pos_pathway_affected[pathway], MYC_pos_pathway_unaffected[pathway]], [MYC_neg_pathway_affected[pathway], MYC_neg_pathway_unaffected[pathway]]])
   oddsratio_greater, pvalue_greater = scipy.stats.fisher_exact([[MYC_pos_pathway_affected[pathway], MYC_pos_pathway_unaffected[pathway]], [MYC_neg_pathway_affected[pathway], MYC_neg_pathway_unaffected[pathway]]], alternative='greater')
   oddsratio_less, pvalue_less = scipy.stats.fisher_exact([[MYC_pos_pathway_affected[pathway], MYC_pos_pathway_unaffected[pathway]], [MYC_neg_pathway_affected[pathway], MYC_neg_pathway_unaffected[pathway]]], alternative='less')
   s += "<tr><th>"+pathway+"</th><td>"+str(MYC_pos_pathway_affected[pathway])+"</td>"
   s += "<td>"+str(MYC_pos_pathway_unaffected[pathway])+"</td>"
   s += "<td>"+str(MYC_neg_pathway_affected[pathway])+"</td>"
   s += "<td>"+str(MYC_neg_pathway_unaffected[pathway])+"</td>"
   s += "<td>"+"{:.2f}".format(oddsratio)+"</td>"
   if pvalue < 0.05: 
      s += "<td>"+"{:.5f}".format(pvalue)+"</td>"
      s += "<td>"+"{:.5f}".format(pvalue * 15)+"</td>"
   else:
      s += "<td>"+"{:.2f}".format(pvalue)+"</td>"
      s += "<td>"+"{:.2f}".format(pvalue * 15)+"</td>"
   s += "<td>"+"{:.2f}".format(oddsratio_greater)+"</td>"
   if pvalue_greater < 0.0005: 
      s += "<td>"+"{:.10f}".format(pvalue_greater)+"</td>"
      s += "<td>"+"{:.10f}".format(pvalue_greater * 15)+"</td>"
   elif pvalue_greater < 0.05: 
      s += "<td>"+"{:.5f}".format(pvalue_greater)+"</td>"
      s += "<td>"+"{:.5f}".format(pvalue_greater * 15)+"</td>"
   else:
      s += "<td>"+"{:.2f}".format(pvalue_greater)+"</td>"
      s += "<td>"+"{:.2f}".format(pvalue_greater * 15)+"</td>"
   s += "<td>"+"{:.2f}".format(oddsratio_less)+"</td>"
   if pvalue_less < 0.05: 
      s += "<td>"+"{:.5f}".format(pvalue_less)+"</td>"
      s += "<td>"+"{:.5f}".format(pvalue_less * 15)+"</td></tr>"
   else: 
      s += "<td>"+"{:.2f}".format(pvalue_less)+"</td>"
      s += "<td>"+"{:.2f}".format(pvalue_less * 15)+"</td></tr>"

s += "</table>"
h = HTML(s);h


Out[8]:
SampleFisher's exact test (two-sided)Fisher's exact test (one-sided greater)Fisher's exact test (one-sided less)
PathwayMYC + pathway affectedMYC + pathway unaffectedMYC - pathway affectedMYC - pathway unaffectedOdd's ratiop-valuep-value correctedOdd's ratiop-valuep-value correctedOdd's ratiop-valuep-value corrected
STAT13481337741.580.192.891.580.111.711.580.9414.08
Chromatin Modification28332686392.020.009520.142822.020.006680.100202.021.0014.95
PI3K34273085992.450.000820.012302.450.000610.009192.451.0015.00
Transcription Regulation12491757321.021.0015.001.020.537.891.020.619.08
NOTCH556588491.310.598.821.310.365.471.310.8012.02
DNA Damage Repair42193115964.240.000000.000004.240.00000011310.00000169694.241.0015.00
TGF beta358708370.620.629.230.620.8612.850.620.314.61
MAP Kinase10511417661.070.8612.831.070.497.311.070.659.78
Cell Cycle\Apoptosis47144784293.010.000180.002633.010.00012585000.00188774933.011.0015.00
Replication061138940.001.0015.000.001.0015.000.000.436.40
mTOR160190615.100.121.8315.100.121.8315.101.0014.94
RAS29323625451.360.284.211.360.152.241.360.9013.56
WNT06129050.001.0015.000.001.0015.000.000.8813.17
APC8531567510.730.487.260.730.8412.620.730.273.98
Hedgehog259278801.100.7010.561.100.568.351.100.7310.89

Remove TP53 from Pathways: Cell Cycle/Apoptosis and DNA Damaga Control


In [9]:
chromatin_modification_pathway=["ARID1A","ARID1B","ARID2","ASXL1","ATRX","CREBBP","DAXX","DNMT1","DNMT3A","EP300","EZH2","H3H3A","HIST1H3B","IDH1","IDH2","KDM5C","KDM6A","MEN1","MLL2","MLL3","NCOR1","PAX5","PBRM1","PRDM1","SETD2","SETBP1","SMARCA4","SMARCB1","SPOP","TET2","WT1","NCOA3"]
cell_cycle_apoptosis_pathway=["ABL1","BCL2","CARD1","CASP8","CDC73","CDKN2A","CYLD","DAXX","FUBP1","MED12","MYD88","NFE2L2","NPM2","PPP2R1A","RB1","TNFAIP3","TRAF7","CCND1","CDKN2C","MDM2","MDM4","MYCL1","MYCN","SKP2"]
tgf_beta_pathway=["ACVR1B","EP300","FOXL2","GATA1","GATA2","GNAS","MED12","SMAD2","SMAD4"]
PI3K_pathway=["AKT1","ALK","B2M","CBL","CEBPA","CSF1R","EGFR","ERBB2","FGFR2","FGFR3","FLT3","GNA11","GNAQ","GNAS","KIT","MET","PDGFRA","PIK2CA","PIK3R1","PTEN","RET","TSC1","TSHR","VHL","NCOA3"]
RAS_pathway=["ALK","B2M","BRAF","CBL","CEBPA","CIC","CSF1R","EGFR","ERBB2","FGFR2","FGFR3","FLT3","GNA11","GNAQ","GNAS","HRAS","KIT","KRAS","MAP2K1","MAP3K1","MET","NF1","NRAS","PDGFRA","PTPN11","RET","VHL"]
APC_pathway=["AXIN1","CDH1","CTNNB1","EP300","FAM123B","GNAS","HNF1A","NF2","RNF43","SOX9"]
transcription_regulation_pathway=["AR","BCOR","CREBBP","GATA3","KLF4","PHF6","RUNX1","SF3B1","SRSF2","U2AF1","IKZF1","LMO1"]
dna_damage_pathway=["ATM","BAP1","BRCA1","BRCA2","MLH1","MSH2","MSH6","STAG2"]
mapkinase_pathway=["B2M","CEBPA","GNA11","GNAQ","MAP3K1","TNFAIP3","TSHR","MAP2K4","NKX2-1"]
STAT_pathway=["CLRF","FGFR2","FGFR3","FLT3","JAK1","JAK2","JAK3","STAT","MPL","SOCS1","VHL"]
NOTCH_pathway=["EP300","FBXW7","GATA1","GATA2","NOTCH1","NOTCH2"]
wnt_pathway=["KLF4"]
hedgehog_pathway=["PTCH1","SMO","SPOP"]
mTOR_pathway=["STK11"]
replicationpathway=["SETBP1"]

pathways={"Chromatin Modification": chromatin_modification_pathway, "Cell Cycle\\Apoptosis":cell_cycle_apoptosis_pathway, 
        "TGF beta":tgf_beta_pathway,"PI3K":PI3K_pathway,"RAS":RAS_pathway, "APC":APC_pathway,  
        "Transcription Regulation":transcription_regulation_pathway, 
        "DNA Damage Repair":dna_damage_pathway, "MAP Kinase":mapkinase_pathway,
        "STAT":STAT_pathway, "NOTCH":NOTCH_pathway, 
        "WNT":wnt_pathway, "Hedgehog":hedgehog_pathway, 
        "mTOR":mTOR_pathway, "Replication":replicationpathway}

MYC_pos_pathway_affected={"Chromatin Modification": 0, "Cell Cycle\\Apoptosis":0, "TGF beta":0,"PI3K":0,"RAS":0, "APC":0,
                          "Transcription Regulation":0, "DNA Damage Repair":0, "MAP Kinase":0,"STAT":0, "NOTCH":0, "WNT":0, "Hedgehog":0,
                          "mTOR":0, "Replication":0}
MYC_pos_pathway_unaffected={"Chromatin Modification": 0, "Cell Cycle\\Apoptosis":0, "TGF beta":0,"PI3K":0,"RAS":0, "APC":0,
                          "Transcription Regulation":0, "DNA Damage Repair":0, "MAP Kinase":0,"STAT":0, "NOTCH":0, "WNT":0, "Hedgehog":0,
                          "mTOR":0, "Replication":0}
MYC_neg_pathway_affected={"Chromatin Modification": 0, "Cell Cycle\\Apoptosis":0, "TGF beta":0,"PI3K":0,"RAS":0, "APC":0,  
                          "Transcription Regulation":0, "DNA Damage Repair":0, "MAP Kinase":0,"STAT":0, "NOTCH":0, "WNT":0, "Hedgehog":0,
                          "mTOR":0, "Replication":0}
MYC_neg_pathway_unaffected={"Chromatin Modification": 0, "Cell Cycle\\Apoptosis":0, "TGF beta":0,"PI3K":0,"RAS":0, "APC":0, 
                          "Transcription Regulation":0, "DNA Damage Repair":0, "MAP Kinase":0,"STAT":0, "NOTCH":0, "WNT":0, "Hedgehog":0,
                          "mTOR":0, "Replication":0}


for sample in samples.values():
    if not sample.somatic_mutation_data:
        continue
    for pathway in pathways.keys():
        pathway_affected = False
        for gene in pathways[pathway]:
            if gene in sample.genes_affected or sample.checkFocalGeneAmp(gene):
                pathway_affected = True
                break
        if pathway_affected and sample.checkFocalGeneAmp("MYC"):
            MYC_pos_pathway_affected[pathway] += 1
        elif not pathway_affected and sample.checkFocalGeneAmp("MYC"):
            MYC_pos_pathway_unaffected[pathway] += 1
        elif pathway_affected and not sample.checkFocalGeneAmp("MYC"):
            MYC_neg_pathway_affected[pathway] += 1
        elif not pathway_affected and not sample.checkFocalGeneAmp("MYC"):
            MYC_neg_pathway_unaffected[pathway] += 1

s = "<table>"
s += "<th></th><th colspan=\"4\">Sample</th><th colspan=\"3\">Fisher's exact test (two-sided)</th><th colspan=\"3\">Fisher's exact test (one-sided greater)</th><th colspan=\"3\">Fisher's exact test (one-sided less)</th></tr>"
s += "<tr><th>Pathway</th><th>MYC + pathway affected</th><th>MYC + pathway unaffected</th><th>MYC - pathway affected</th><th>MYC - pathway unaffected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th></tr>"

for pathway in pathways.keys():
   oddsratio, pvalue = scipy.stats.fisher_exact([[MYC_pos_pathway_affected[pathway], MYC_pos_pathway_unaffected[pathway]], [MYC_neg_pathway_affected[pathway], MYC_neg_pathway_unaffected[pathway]]])
   oddsratio_greater, pvalue_greater = scipy.stats.fisher_exact([[MYC_pos_pathway_affected[pathway], MYC_pos_pathway_unaffected[pathway]], [MYC_neg_pathway_affected[pathway], MYC_neg_pathway_unaffected[pathway]]], alternative='greater')
   oddsratio_less, pvalue_less = scipy.stats.fisher_exact([[MYC_pos_pathway_affected[pathway], MYC_pos_pathway_unaffected[pathway]], [MYC_neg_pathway_affected[pathway], MYC_neg_pathway_unaffected[pathway]]], alternative='less')
   s += "<tr><th>"+pathway+"</th><td>"+str(MYC_pos_pathway_affected[pathway])+"</td>"
   s += "<td>"+str(MYC_pos_pathway_unaffected[pathway])+"</td>"
   s += "<td>"+str(MYC_neg_pathway_affected[pathway])+"</td>"
   s += "<td>"+str(MYC_neg_pathway_unaffected[pathway])+"</td>"
   s += "<td>"+"{:.2f}".format(oddsratio)+"</td>"
   if pvalue < 0.05: 
      s += "<td>"+"{:.5f}".format(pvalue)+"</td>"
      s += "<td>"+"{:.5f}".format(pvalue * 15)+"</td>"
   else:
      s += "<td>"+"{:.2f}".format(pvalue)+"</td>"
      s += "<td>"+"{:.2f}".format(pvalue * 15)+"</td>"
   s += "<td>"+"{:.2f}".format(oddsratio_greater)+"</td>"
   if pvalue_greater < 0.05: 
      s += "<td>"+"{:.5f}".format(pvalue_greater)+"</td>"
      s += "<td>"+"{:.5f}".format(pvalue_greater * 15)+"</td>"
   else:
      s += "<td>"+"{:.2f}".format(pvalue_greater)+"</td>"
      s += "<td>"+"{:.2f}".format(pvalue_greater * 15)+"</td>"
   s += "<td>"+"{:.2f}".format(oddsratio_less)+"</td>"
   if pvalue_less < 0.05: 
      s += "<td>"+"{:.5f}".format(pvalue_less)+"</td>"
      s += "<td>"+"{:.5f}".format(pvalue_less * 15)+"</td></tr>"
   else: 
      s += "<td>"+"{:.2f}".format(pvalue_less)+"</td>"
      s += "<td>"+"{:.2f}".format(pvalue_less * 15)+"</td></tr>"

s += "</table>"
h = HTML(s);h


Out[9]:
SampleFisher's exact test (two-sided)Fisher's exact test (one-sided greater)Fisher's exact test (one-sided less)
PathwayMYC + pathway affectedMYC + pathway unaffectedMYC - pathway affectedMYC - pathway unaffectedOdd's ratiop-valuep-value correctedOdd's ratiop-valuep-value correctedOdd's ratiop-valuep-value corrected
STAT13481337741.580.192.891.580.111.711.580.9414.08
Chromatin Modification28332686392.020.009520.142822.020.006680.100202.021.0014.95
PI3K34273085992.450.000820.012302.450.000610.009192.451.0015.00
Transcription Regulation12491757321.021.0015.001.020.537.891.020.619.08
NOTCH556588491.310.598.821.310.365.471.310.8012.02
DNA Damage Repair952748331.950.091.401.950.071.031.950.9714.56
TGF beta358708370.620.629.230.620.8612.850.620.314.61
MAP Kinase10511417661.070.8612.831.070.497.311.070.659.78
Cell Cycle\Apoptosis26353175901.380.274.021.380.142.121.380.9113.66
Replication061138940.001.0015.000.001.0015.000.000.436.40
mTOR160190615.100.121.8315.100.121.8315.101.0014.94
RAS29323625451.360.284.211.360.152.241.360.9013.56
WNT06129050.001.0015.000.001.0015.000.000.8813.17
APC8531567510.730.487.260.730.8412.620.730.273.98
Hedgehog259278801.100.7010.561.100.568.351.100.7310.89

Which pathways are affected in MYC copynumber > 4 samples (Vogelstein)


In [10]:
chromatin_modification_pathway=["ARID1A","ARID1B","ARID2","ASXL1","ATRX","CREBBP","DAXX","DNMT1","DNMT3A","EP300","EZH2","H3H3A","HIST1H3B","IDH1","IDH2","KDM5C","KDM6A","MEN1","MLL2","MLL3","NCOR1","PAX5","PBRM1","PRDM1","SETD2","SETBP1","SMARCA4","SMARCB1","SPOP","TET2","WT1","NCOA3"]
cell_cycle_apoptosis_pathway=["ABL1","BCL2","CARD1","CASP8","CDC73","CDKN2A","CYLD","DAXX","FUBP1","MED12","MYD88","NFE2L2","NPM2","PPP2R1A","RB1","TNFAIP3","TRAF7","TP53","CCND1","CDKN2C","MDM2","MDM4","MYCL1","MYCN","SKP2"]
tgf_beta_pathway=["ACVR1B","EP300","FOXL2","GATA1","GATA2","GNAS","MED12","SMAD2","SMAD4"]
PI3K_pathway=["AKT1","ALK","B2M","CBL","CEBPA","CSF1R","EGFR","ERBB2","FGFR2","FGFR3","FLT3","GNA11","GNAQ","GNAS","KIT","MET","PDGFRA","PIK2CA","PIK3R1","PTEN","RET","TSC1","TSHR","VHL","NCOA3"]
RAS_pathway=["ALK","B2M","BRAF","CBL","CEBPA","CIC","CSF1R","EGFR","ERBB2","FGFR2","FGFR3","FLT3","GNA11","GNAQ","GNAS","HRAS","KIT","KRAS","MAP2K1","MAP3K1","MET","NF1","NRAS","PDGFRA","PTPN11","RET","VHL"]
APC_pathway=["AXIN1","CDH1","CTNNB1","EP300","FAM123B","GNAS","HNF1A","NF2","RNF43","SOX9"]
transcription_regulation_pathway=["AR","BCOR","CREBBP","GATA3","KLF4","PHF6","RUNX1","SF3B1","SRSF2","U2AF1","IKZF1","LMO1"]
dna_damage_pathway=["ATM","BAP1","BRCA1","BRCA2","MLH1","MSH2","MSH6","STAG2","TP53"]
mapkinase_pathway=["B2M","CEBPA","GNA11","GNAQ","MAP3K1","TNFAIP3","TSHR","MAP2K4","NKX2-1"]
STAT_pathway=["CLRF","FGFR2","FGFR3","FLT3","JAK1","JAK2","JAK3","STAT","MPL","SOCS1","VHL"]
NOTCH_pathway=["EP300","FBXW7","GATA1","GATA2","NOTCH1","NOTCH2"]
wnt_pathway=["KLF4"]
hedgehog_pathway=["PTCH1","SMO","SPOP"]
mTOR_pathway=["STK11"]
replicationpathway=["SETBP1"]

pathways={"Chromatin Modification": chromatin_modification_pathway, "Cell Cycle\\Apoptosis":cell_cycle_apoptosis_pathway, 
        "TGF beta":tgf_beta_pathway,"PI3K":PI3K_pathway,"RAS":RAS_pathway, "APC":APC_pathway,  
        "Transcription Regulation":transcription_regulation_pathway, 
        "DNA Damage Repair":dna_damage_pathway, "MAP Kinase":mapkinase_pathway,
        "STAT":STAT_pathway, "NOTCH":NOTCH_pathway, 
        "WNT":wnt_pathway, "Hedgehog":hedgehog_pathway, 
        "mTOR":mTOR_pathway, "Replication":replicationpathway}

MYC_pos_pathway_affected={"Chromatin Modification": 0, "Cell Cycle\\Apoptosis":0, "TGF beta":0,"PI3K":0,"RAS":0, "APC":0,
                          "Transcription Regulation":0, "DNA Damage Repair":0, "MAP Kinase":0,"STAT":0, "NOTCH":0, "WNT":0, "Hedgehog":0,
                          "mTOR":0, "Replication":0}
MYC_pos_pathway_unaffected={"Chromatin Modification": 0, "Cell Cycle\\Apoptosis":0, "TGF beta":0,"PI3K":0,"RAS":0, "APC":0,
                          "Transcription Regulation":0, "DNA Damage Repair":0, "MAP Kinase":0,"STAT":0, "NOTCH":0, "WNT":0, "Hedgehog":0,
                          "mTOR":0, "Replication":0}
MYC_neg_pathway_affected={"Chromatin Modification": 0, "Cell Cycle\\Apoptosis":0, "TGF beta":0,"PI3K":0,"RAS":0, "APC":0,  
                          "Transcription Regulation":0, "DNA Damage Repair":0, "MAP Kinase":0,"STAT":0, "NOTCH":0, "WNT":0, "Hedgehog":0,
                          "mTOR":0, "Replication":0}
MYC_neg_pathway_unaffected={"Chromatin Modification": 0, "Cell Cycle\\Apoptosis":0, "TGF beta":0,"PI3K":0,"RAS":0, "APC":0, 
                          "Transcription Regulation":0, "DNA Damage Repair":0, "MAP Kinase":0,"STAT":0, "NOTCH":0, "WNT":0, "Hedgehog":0,
                          "mTOR":0, "Replication":0}


for sample in samples.values():
    if not sample.somatic_mutation_data:
        continue
    copynumber = "n/a"
    log2 = sample.getLog2FromGene(gene_positions['MYC'].chrom,gene_positions['MYC'].start, gene_positions['MYC'].end)
    if (log2 != None):
        copynumber = 2.0 * 2.0 ** log2
    else:
        copynumber="n/a"
    if copynumber == "n/a":
        continue
    for pathway in pathways.keys():
        pathway_affected = False
        for gene in pathways[pathway]:
            if gene in sample.genes_affected or sample.checkFocalGeneAmp(gene):
                pathway_affected = True
                break
        if pathway_affected and copynumber > 4:
            MYC_pos_pathway_affected[pathway] += 1
        elif not pathway_affected and copynumber > 4:
            MYC_pos_pathway_unaffected[pathway] += 1
        elif pathway_affected and not copynumber > 4:
            MYC_neg_pathway_affected[pathway] += 1
        elif not pathway_affected and not copynumber > 4:
            MYC_neg_pathway_unaffected[pathway] += 1

s = "<table>"
s += "<th></th><th colspan=\"4\">Sample</th><th colspan=\"3\">Fisher's exact test (two-sided)</th><th colspan=\"3\">Fisher's exact test (one-sided greater)</th><th colspan=\"3\">Fisher's exact test (one-sided less)</th></tr>"
s += "<tr><th>Pathway</th><th>MYC + pathway affected</th><th>MYC + pathway unaffected</th><th>MYC - pathway affected</th><th>MYC - pathway unaffected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th></tr>"

for pathway in pathways.keys():
   oddsratio, pvalue = scipy.stats.fisher_exact([[MYC_pos_pathway_affected[pathway], MYC_pos_pathway_unaffected[pathway]], [MYC_neg_pathway_affected[pathway], MYC_neg_pathway_unaffected[pathway]]])
   oddsratio_greater, pvalue_greater = scipy.stats.fisher_exact([[MYC_pos_pathway_affected[pathway], MYC_pos_pathway_unaffected[pathway]], [MYC_neg_pathway_affected[pathway], MYC_neg_pathway_unaffected[pathway]]], alternative='greater')
   oddsratio_less, pvalue_less = scipy.stats.fisher_exact([[MYC_pos_pathway_affected[pathway], MYC_pos_pathway_unaffected[pathway]], [MYC_neg_pathway_affected[pathway], MYC_neg_pathway_unaffected[pathway]]], alternative='less')
   s += "<tr><th>"+pathway+"</th><td>"+str(MYC_pos_pathway_affected[pathway])+"</td>"
   s += "<td>"+str(MYC_pos_pathway_unaffected[pathway])+"</td>"
   s += "<td>"+str(MYC_neg_pathway_affected[pathway])+"</td>"
   s += "<td>"+str(MYC_neg_pathway_unaffected[pathway])+"</td>"
   s += "<td>"+"{:.2f}".format(oddsratio)+"</td>"
   if pvalue < 0.05: 
      s += "<td>"+"{:.5f}".format(pvalue)+"</td>"
      s += "<td>"+"{:.5f}".format(pvalue * 15)+"</td>"
   else:
      s += "<td>"+"{:.2f}".format(pvalue)+"</td>"
      s += "<td>"+"{:.2f}".format(pvalue * 15)+"</td>"
   s += "<td>"+"{:.2f}".format(oddsratio_greater)+"</td>"
   if pvalue_greater < 0.0005: 
      s += "<td>"+"{:.10f}".format(pvalue_greater)+"</td>"
      s += "<td>"+"{:.10f}".format(pvalue_greater * 15)+"</td>"
   elif pvalue_greater < 0.05: 
      s += "<td>"+"{:.5f}".format(pvalue_greater)+"</td>"
      s += "<td>"+"{:.5f}".format(pvalue_greater * 15)+"</td>"
   else:
      s += "<td>"+"{:.2f}".format(pvalue_greater)+"</td>"
      s += "<td>"+"{:.2f}".format(pvalue_greater * 15)+"</td>"
   s += "<td>"+"{:.2f}".format(oddsratio_less)+"</td>"
   if pvalue_less < 0.05: 
      s += "<td>"+"{:.5f}".format(pvalue_less)+"</td>"
      s += "<td>"+"{:.5f}".format(pvalue_less * 15)+"</td></tr>"
   else: 
      s += "<td>"+"{:.2f}".format(pvalue_less)+"</td>"
      s += "<td>"+"{:.2f}".format(pvalue_less * 15)+"</td></tr>"

s += "</table>"
h = HTML(s);h


Out[10]:
SampleFisher's exact test (two-sided)Fisher's exact test (one-sided greater)Fisher's exact test (one-sided less)
PathwayMYC + pathway affectedMYC + pathway unaffectedMYC - pathway affectedMYC - pathway unaffectedOdd's ratiop-valuep-value correctedOdd's ratiop-valuep-value correctedOdd's ratiop-valuep-value corrected
STAT21911257301.350.263.921.350.162.351.350.9013.48
Chromatin Modification47652496061.760.008570.128591.760.004510.067671.761.0014.96
PI3K49632935621.490.060.871.490.032010.480101.490.9814.70
Transcription Regulation18941686870.780.446.660.780.8512.740.780.223.32
NOTCH10102538021.480.314.581.480.182.731.480.9013.50
DNA Damage Repair68442845713.110.000000.000003.110.00000002280.00000034153.111.0015.00
TGF beta11101627931.390.345.111.390.213.201.390.8713.12
MAP Kinase18941337221.040.8913.351.040.497.331.040.629.28
Cell Cycle\Apoptosis87254374183.330.000000.000003.330.00000004200.00000062963.331.0015.00
Replication1111128430.631.0015.000.630.8012.000.630.548.17
mTOR111118547.690.223.287.690.223.287.690.9914.80
RAS52603385171.330.182.741.330.101.471.330.9313.99
WNT011228530.001.0015.000.001.0015.000.000.7811.72
APC15971497060.730.355.230.730.8913.320.730.182.63
Hedgehog4108258301.230.7711.481.230.446.571.230.7611.43

Remove TP53 from pathways Cell Cycle / Apoptosis and DNA Damage repair


In [11]:
chromatin_modification_pathway=["ARID1A","ARID1B","ARID2","ASXL1","ATRX","CREBBP","DAXX","DNMT1","DNMT3A","EP300","EZH2","H3H3A","HIST1H3B","IDH1","IDH2","KDM5C","KDM6A","MEN1","MLL2","MLL3","NCOR1","PAX5","PBRM1","PRDM1","SETD2","SETBP1","SMARCA4","SMARCB1","SPOP","TET2","WT1","NCOA3"]
cell_cycle_apoptosis_pathway=["ABL1","BCL2","CARD1","CASP8","CDC73","CDKN2A","CYLD","DAXX","FUBP1","MED12","MYD88","NFE2L2","NPM2","PPP2R1A","RB1","TNFAIP3","TRAF7","CCND1","CDKN2C","MDM2","MDM4","MYCL1","MYCN","SKP2"]
tgf_beta_pathway=["ACVR1B","EP300","FOXL2","GATA1","GATA2","GNAS","MED12","SMAD2","SMAD4"]
PI3K_pathway=["AKT1","ALK","B2M","CBL","CEBPA","CSF1R","EGFR","ERBB2","FGFR2","FGFR3","FLT3","GNA11","GNAQ","GNAS","KIT","MET","PDGFRA","PIK2CA","PIK3R1","PTEN","RET","TSC1","TSHR","VHL","NCOA3"]
RAS_pathway=["ALK","B2M","BRAF","CBL","CEBPA","CIC","CSF1R","EGFR","ERBB2","FGFR2","FGFR3","FLT3","GNA11","GNAQ","GNAS","HRAS","KIT","KRAS","MAP2K1","MAP3K1","MET","NF1","NRAS","PDGFRA","PTPN11","RET","VHL"]
APC_pathway=["AXIN1","CDH1","CTNNB1","EP300","FAM123B","GNAS","HNF1A","NF2","RNF43","SOX9"]
transcription_regulation_pathway=["AR","BCOR","CREBBP","GATA3","KLF4","PHF6","RUNX1","SF3B1","SRSF2","U2AF1","IKZF1","LMO1"]
dna_damage_pathway=["ATM","BAP1","BRCA1","BRCA2","MLH1","MSH2","MSH6","STAG2"]
mapkinase_pathway=["B2M","CEBPA","GNA11","GNAQ","MAP3K1","TNFAIP3","TSHR","MAP2K4","NKX2-1"]
STAT_pathway=["CLRF","FGFR2","FGFR3","FLT3","JAK1","JAK2","JAK3","STAT","MPL","SOCS1","VHL"]
NOTCH_pathway=["EP300","FBXW7","GATA1","GATA2","NOTCH1","NOTCH2"]
wnt_pathway=["KLF4"]
hedgehog_pathway=["PTCH1","SMO","SPOP"]
mTOR_pathway=["STK11"]
replicationpathway=["SETBP1"]

pathways={"Chromatin Modification": chromatin_modification_pathway, "Cell Cycle\\Apoptosis":cell_cycle_apoptosis_pathway, 
        "TGF beta":tgf_beta_pathway,"PI3K":PI3K_pathway,"RAS":RAS_pathway, "APC":APC_pathway,  
        "Transcription Regulation":transcription_regulation_pathway, 
        "DNA Damage Repair":dna_damage_pathway, "MAP Kinase":mapkinase_pathway,
        "STAT":STAT_pathway, "NOTCH":NOTCH_pathway, 
        "WNT":wnt_pathway, "Hedgehog":hedgehog_pathway, 
        "mTOR":mTOR_pathway, "Replication":replicationpathway}

MYC_pos_pathway_affected={"Chromatin Modification": 0, "Cell Cycle\\Apoptosis":0, "TGF beta":0,"PI3K":0,"RAS":0, "APC":0,
                          "Transcription Regulation":0, "DNA Damage Repair":0, "MAP Kinase":0,"STAT":0, "NOTCH":0, "WNT":0, "Hedgehog":0,
                          "mTOR":0, "Replication":0}
MYC_pos_pathway_unaffected={"Chromatin Modification": 0, "Cell Cycle\\Apoptosis":0, "TGF beta":0,"PI3K":0,"RAS":0, "APC":0,
                          "Transcription Regulation":0, "DNA Damage Repair":0, "MAP Kinase":0,"STAT":0, "NOTCH":0, "WNT":0, "Hedgehog":0,
                          "mTOR":0, "Replication":0}
MYC_neg_pathway_affected={"Chromatin Modification": 0, "Cell Cycle\\Apoptosis":0, "TGF beta":0,"PI3K":0,"RAS":0, "APC":0,  
                          "Transcription Regulation":0, "DNA Damage Repair":0, "MAP Kinase":0,"STAT":0, "NOTCH":0, "WNT":0, "Hedgehog":0,
                          "mTOR":0, "Replication":0}
MYC_neg_pathway_unaffected={"Chromatin Modification": 0, "Cell Cycle\\Apoptosis":0, "TGF beta":0,"PI3K":0,"RAS":0, "APC":0, 
                          "Transcription Regulation":0, "DNA Damage Repair":0, "MAP Kinase":0,"STAT":0, "NOTCH":0, "WNT":0, "Hedgehog":0,
                          "mTOR":0, "Replication":0}


for sample in samples.values():
    if not sample.somatic_mutation_data:
        continue
    copynumber = "n/a"
    log2 = sample.getLog2FromGene(gene_positions['MYC'].chrom,gene_positions['MYC'].start, gene_positions['MYC'].end)
    if (log2 != None):
        copynumber = 2.0 * 2.0 ** log2
    else:
        copynumber="n/a"
    if copynumber == "n/a":
        continue
    for pathway in pathways.keys():
        pathway_affected = False
        for gene in pathways[pathway]:
            if gene in sample.genes_affected or sample.checkFocalGeneAmp(gene):
                pathway_affected = True
                break
        if pathway_affected and copynumber > 4:
            MYC_pos_pathway_affected[pathway] += 1
        elif not pathway_affected and copynumber > 4:
            MYC_pos_pathway_unaffected[pathway] += 1
        elif pathway_affected and not copynumber > 4:
            MYC_neg_pathway_affected[pathway] += 1
        elif not pathway_affected and not copynumber > 4:
            MYC_neg_pathway_unaffected[pathway] += 1

s = "<table>"
s += "<th></th><th colspan=\"4\">Sample</th><th colspan=\"3\">Fisher's exact test (two-sided)</th><th colspan=\"3\">Fisher's exact test (one-sided greater)</th><th colspan=\"3\">Fisher's exact test (one-sided less)</th></tr>"
s += "<tr><th>Pathway</th><th>MYC + pathway affected</th><th>MYC + pathway unaffected</th><th>MYC - pathway affected</th><th>MYC - pathway unaffected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th></tr>"

for pathway in pathways.keys():
   oddsratio, pvalue = scipy.stats.fisher_exact([[MYC_pos_pathway_affected[pathway], MYC_pos_pathway_unaffected[pathway]], [MYC_neg_pathway_affected[pathway], MYC_neg_pathway_unaffected[pathway]]])
   oddsratio_greater, pvalue_greater = scipy.stats.fisher_exact([[MYC_pos_pathway_affected[pathway], MYC_pos_pathway_unaffected[pathway]], [MYC_neg_pathway_affected[pathway], MYC_neg_pathway_unaffected[pathway]]], alternative='greater')
   oddsratio_less, pvalue_less = scipy.stats.fisher_exact([[MYC_pos_pathway_affected[pathway], MYC_pos_pathway_unaffected[pathway]], [MYC_neg_pathway_affected[pathway], MYC_neg_pathway_unaffected[pathway]]], alternative='less')
   s += "<tr><th>"+pathway+"</th><td>"+str(MYC_pos_pathway_affected[pathway])+"</td>"
   s += "<td>"+str(MYC_pos_pathway_unaffected[pathway])+"</td>"
   s += "<td>"+str(MYC_neg_pathway_affected[pathway])+"</td>"
   s += "<td>"+str(MYC_neg_pathway_unaffected[pathway])+"</td>"
   s += "<td>"+"{:.2f}".format(oddsratio)+"</td>"
   if pvalue < 0.05: 
      s += "<td>"+"{:.5f}".format(pvalue)+"</td>"
      s += "<td>"+"{:.5f}".format(pvalue * 15)+"</td>"
   else:
      s += "<td>"+"{:.2f}".format(pvalue)+"</td>"
      s += "<td>"+"{:.2f}".format(pvalue * 15)+"</td>"
   s += "<td>"+"{:.2f}".format(oddsratio_greater)+"</td>"
   if pvalue_greater < 0.05: 
      s += "<td>"+"{:.5f}".format(pvalue_greater)+"</td>"
      s += "<td>"+"{:.5f}".format(pvalue_greater * 15)+"</td>"
   else:
      s += "<td>"+"{:.2f}".format(pvalue_greater)+"</td>"
      s += "<td>"+"{:.2f}".format(pvalue_greater * 15)+"</td>"
   s += "<td>"+"{:.2f}".format(oddsratio_less)+"</td>"
   if pvalue_less < 0.05: 
      s += "<td>"+"{:.5f}".format(pvalue_less)+"</td>"
      s += "<td>"+"{:.5f}".format(pvalue_less * 15)+"</td></tr>"
   else: 
      s += "<td>"+"{:.2f}".format(pvalue_less)+"</td>"
      s += "<td>"+"{:.2f}".format(pvalue_less * 15)+"</td></tr>"

s += "</table>"
h = HTML(s);h


Out[11]:
SampleFisher's exact test (two-sided)Fisher's exact test (one-sided greater)Fisher's exact test (one-sided less)
PathwayMYC + pathway affectedMYC + pathway unaffectedMYC - pathway affectedMYC - pathway unaffectedOdd's ratiop-valuep-value correctedOdd's ratiop-valuep-value correctedOdd's ratiop-valuep-value corrected
STAT21911257301.350.263.921.350.162.351.350.9013.48
Chromatin Modification47652496061.760.008570.128591.760.004510.067671.761.0014.96
PI3K49632935621.490.060.871.490.032010.480101.490.9814.70
Transcription Regulation18941686870.780.446.660.780.8512.740.780.223.32
NOTCH10102538021.480.314.581.480.182.731.480.9013.50
DNA Damage Repair9103747810.921.0015.000.920.649.640.920.507.49
TGF beta11101627931.390.345.111.390.213.201.390.8713.12
MAP Kinase18941337221.040.8913.351.040.497.331.040.629.28
Cell Cycle\Apoptosis50622935621.550.035560.533431.550.021060.315851.550.9914.81
Replication1111128430.631.0015.000.630.8012.000.630.548.17
mTOR111118547.690.223.287.690.223.287.690.9914.80
RAS52603385171.330.182.741.330.101.471.330.9313.99
WNT011228530.001.0015.000.001.0015.000.000.7811.72
APC15971497060.730.355.230.730.8913.320.730.182.63
Hedgehog4108258301.230.7711.481.230.446.571.230.7611.43

Which Genes are affected in MYC positive samples (Vogelstein)

This time use genes of pathway definitions of Vogelstein et al.

Note that MYC has been deleted from the gene list


In [12]:
genes=["ABL1","ACVR1B","AKT1","ALK","APC","AR","ARID1A","ARID1B","ARID2","ASXL1","ATM","ATRX","AXIN1","B2M","BAP1","BCL2","BCOR","BRAF",
       "BRCA1","BRCA2","CARD11","CASP8","CBL","CDC73","CDH1","CDKN2A","CEBPA","CIC","CREBBP","CRLF2","CSF1R","CTNNB1","CYLD","DAXX",
       "DNMT1","DNMT3A","EGFR","EP300","ERBB2","EZH2","FAM123B","FBXW7","FGFR2","FGFR3","FLT3","FOXL2","FUBP1","GATA1","GATA2","GATA3",
       "GNA11","GNAQ","GNAS","H3F3A","HIST1H3B","HNF1A","HRAS","IDH1","IDH2","JAK1","JAK2","JAK3","KDM5C","KDM6A","KIT","KLF4","KRAS",
       "MAP2K1","MAP3K1","MED12","MEN1","MET","MLH1","MLL2","MLL3","MPL","MSH2","MSH6","MYD88","NCOR1","NF1","NF2","NFE2L2","NOTCH1",
       "NOTCH2","NPM1","NRAS","PAX5","PBRM1","PDGFRA","PHF6","PIK3CA","PIK3R1","PPP2R1A","PRDM1","PTCH1","PTEN","PTPN11","RB1","RET",
       "RNF43","RUNX1","SETD2","SETBP1","SF3B1","SMAD2","SMAD4","SMARCA4","SMARCB1","SMO","SOCS1","SOX9","SPOP","SRSF2","STAG2","STK11",
       "TET2","TNFAIP3","TRAF7","TP53","TSC1","TSHR","U2AF1","VHL","WT1","CCND1","CDKN2C","IKZF1","LMO1","MAP2K4","MDM2","MDM4",
       "MYCL1","MYCN","NCOA3","NKX2-1","SKP2"]


s = "<table>"
s += "<th></th><th colspan=\"4\">Sample</th><th colspan=\"3\">Fisher's exact test (two-sided)</th><th colspan=\"3\">Fisher's exact test (one-sided greater)</th><th colspan=\"3\">Fisher's exact test (one-sided less)</th></tr>"
s += "<tr><th>Gene</th><th>MYC + Gene affected</th><th>MYC + Gene unaffected</th><th>MYC - Gene affected</th><th>MYC - Gene unaffected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th></tr>"

gene_count = 0
for gene in genes:
    gene_count += 1
    MYC_pos_gene_affected = 0
    MYC_pos_gene_unaffected = 0
    MYC_neg_gene_affected = 0
    MYC_neg_gene_unaffected = 0

    for sample in samples.values():
        if not sample.somatic_mutation_data:
            continue
        if sample.checkFocalGeneAmp(gene) or gene in sample.genes_affected:
            if sample.checkFocalGeneAmp("MYC"):
                MYC_pos_gene_affected += 1
            else:
                MYC_neg_gene_affected += 1
        else:
            if sample.checkFocalGeneAmp("MYC"):
                MYC_pos_gene_unaffected += 1
            else:
                MYC_neg_gene_unaffected += 1

    oddsratio, pvalue = scipy.stats.fisher_exact([[MYC_pos_gene_affected, MYC_pos_gene_unaffected], [MYC_neg_gene_affected, MYC_neg_gene_unaffected]])
    oddsratio_greater, pvalue_greater = scipy.stats.fisher_exact([[MYC_pos_gene_affected, MYC_pos_gene_unaffected], [MYC_neg_gene_affected, MYC_neg_gene_unaffected]], alternative='greater')
    oddsratio_less, pvalue_less = scipy.stats.fisher_exact([[MYC_pos_gene_affected, MYC_pos_gene_unaffected], [MYC_neg_gene_affected, MYC_neg_gene_unaffected]], alternative='less')

    if (pvalue > 0.05 and pvalue_greater > 0.05 and pvalue_less > 0.05):
        continue
        
    s += "<tr><th>"+gene+"</th><td>"+str(MYC_pos_gene_affected)+"</td>"
    s += "<td>"+str(MYC_pos_gene_unaffected)+"</td>"
    s += "<td>"+str(MYC_neg_gene_affected)+"</td>"
    s += "<td>"+str(MYC_neg_gene_unaffected)+"</td>"
    s += "<td>"+"{:.2f}".format(oddsratio)+"</td>"
    if pvalue < 0.05: 
       s += "<td>"+"{:.5f}".format(pvalue)+"</td>"
       s += "<td>"+"{:.5f}".format(pvalue * 137)+"</td>"
    else:
       s += "<td>"+"{:.2f}".format(pvalue)+"</td>"
       s += "<td>"+"{:.2f}".format(pvalue * 137)+"</td>"
    s += "<td>"+"{:.2f}".format(oddsratio_greater)+"</td>"
    if pvalue_greater < 0.05: 
       s += "<td>"+"{:.5f}".format(pvalue_greater)+"</td>"
       s += "<td>"+"{:.5f}".format(pvalue_greater * 137)+"</td>"
    else:
       s += "<td>"+"{:.2f}".format(pvalue_greater)+"</td>"
       s += "<td>"+"{:.2f}".format(pvalue_greater * 137)+"</td>"
    s += "<td>"+"{:.2f}".format(oddsratio_less)+"</td>"
    if pvalue_less < 0.05: 
       s += "<td>"+"{:.5f}".format(pvalue_less)+"</td>"
       s += "<td>"+"{:.5f}".format(pvalue_less * 137)+"</td></tr>"
    else: 
       s += "<td>"+"{:.2f}".format(pvalue_less)+"</td>"
       s += "<td>"+"{:.2f}".format(pvalue_less * 137)+"</td></tr>"

s += "</table>"
print "Genes: "+str(gene_count)
h = HTML(s);h


Genes: 137
Out[12]:
SampleFisher's exact test (two-sided)Fisher's exact test (one-sided greater)Fisher's exact test (one-sided less)
GeneMYC + Gene affectedMYC + Gene unaffectedMYC - Gene affectedMYC - Gene unaffectedOdd's ratiop-valuep-value correctedOdd's ratiop-valuep-value correctedOdd's ratiop-valuep-value corrected
AXIN1259390410.210.034544.7314210.210.034544.7314210.211.00136.70
BRCA1556108978.010.001550.212788.010.001550.212788.011.00136.98
CEBPA457490315.840.000830.1134415.840.000830.1134415.841.00136.99
CYLD35888995.810.027333.744485.810.027333.744485.811.00136.54
ERBB215461207872.140.020422.797812.140.015322.099152.140.99136.13
PIK3CA14473175900.550.079.460.550.98134.590.550.035284.83290
SOCS125949037.650.049726.812147.650.049726.812147.651.00136.43
TP5336252616463.560.000000.000333.560.000000.000283.561.00137.00
NCOA3754168917.220.000310.042297.220.000310.042297.221.00137.00

Which Genes are affected in MYC copynumber > 4 samples (Vogelstein)


In [13]:
genes=["ABL1","ACVR1B","AKT1","ALK","APC","AR","ARID1A","ARID1B","ARID2","ASXL1","ATM","ATRX","AXIN1","B2M","BAP1","BCL2","BCOR","BRAF",
       "BRCA1","BRCA2","CARD11","CASP8","CBL","CDC73","CDH1","CDKN2A","CEBPA","CIC","CREBBP","CRLF2","CSF1R","CTNNB1","CYLD","DAXX",
       "DNMT1","DNMT3A","EGFR","EP300","ERBB2","EZH2","FAM123B","FBXW7","FGFR2","FGFR3","FLT3","FOXL2","FUBP1","GATA1","GATA2","GATA3",
       "GNA11","GNAQ","GNAS","H3F3A","HIST1H3B","HNF1A","HRAS","IDH1","IDH2","JAK1","JAK2","JAK3","KDM5C","KDM6A","KIT","KLF4","KRAS",
       "MAP2K1","MAP3K1","MED12","MEN1","MET","MLH1","MLL2","MLL3","MPL","MSH2","MSH6","MYD88","NCOR1","NF1","NF2","NFE2L2","NOTCH1",
       "NOTCH2","NPM1","NRAS","PAX5","PBRM1","PDGFRA","PHF6","PIK3CA","PIK3R1","PPP2R1A","PRDM1","PTCH1","PTEN","PTPN11","RB1","RET",
       "RNF43","RUNX1","SETD2","SETBP1","SF3B1","SMAD2","SMAD4","SMARCA4","SMARCB1","SMO","SOCS1","SOX9","SPOP","SRSF2","STAG2","STK11",
       "TET2","TNFAIP3","TRAF7","TP53","TSC1","TSHR","U2AF1","VHL","WT1","CCND1","CDKN2C","IKZF1","LMO1","MAP2K4","MDM2","MDM4",
       "MYCL1","MYCN","NCOA3","NKX2-1","SKP2"]


s = "<table>"
s += "<th></th><th colspan=\"4\">Sample</th><th colspan=\"3\">Fisher's exact test (two-sided)</th><th colspan=\"3\">Fisher's exact test (one-sided greater)</th><th colspan=\"3\">Fisher's exact test (one-sided less)</th></tr>"
s += "<tr><th>Gene</th><th>MYC + Gene affected</th><th>MYC + Gene unaffected</th><th>MYC - Gene affected</th><th>MYC - Gene unaffected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th></tr>"

gene_count = 0
for gene in genes:
    gene_count += 1
    MYC_pos_gene_affected = 0
    MYC_pos_gene_unaffected = 0
    MYC_neg_gene_affected = 0
    MYC_neg_gene_unaffected = 0

    for sample in samples.values():
        if not sample.somatic_mutation_data:
            continue
        copynumber = "n/a"
        log2 = sample.getLog2FromGene(gene_positions['MYC'].chrom,gene_positions['MYC'].start, gene_positions['MYC'].end)
        if (log2 != None):
           copynumber = 2.0 * 2.0 ** log2
        else:
           copynumber="n/a"
        if copynumber == "n/a":
           continue
        if sample.checkFocalGeneAmp(gene) or gene in sample.genes_affected:
            if copynumber > 4:
                MYC_pos_gene_affected += 1
            else:
                MYC_neg_gene_affected += 1
        else:
            if copynumber > 4:
                MYC_pos_gene_unaffected += 1
            else:
                MYC_neg_gene_unaffected += 1

    oddsratio, pvalue = scipy.stats.fisher_exact([[MYC_pos_gene_affected, MYC_pos_gene_unaffected], [MYC_neg_gene_affected, MYC_neg_gene_unaffected]])
    oddsratio_greater, pvalue_greater = scipy.stats.fisher_exact([[MYC_pos_gene_affected, MYC_pos_gene_unaffected], [MYC_neg_gene_affected, MYC_neg_gene_unaffected]], alternative='greater')
    oddsratio_less, pvalue_less = scipy.stats.fisher_exact([[MYC_pos_gene_affected, MYC_pos_gene_unaffected], [MYC_neg_gene_affected, MYC_neg_gene_unaffected]], alternative='less')

    if (pvalue > 0.05 and pvalue_greater > 0.05 and pvalue_less > 0.05):
        continue
        
    s += "<tr><th>"+gene+"</th><td>"+str(MYC_pos_gene_affected)+"</td>"
    s += "<td>"+str(MYC_pos_gene_unaffected)+"</td>"
    s += "<td>"+str(MYC_neg_gene_affected)+"</td>"
    s += "<td>"+str(MYC_neg_gene_unaffected)+"</td>"
    s += "<td>"+"{:.2f}".format(oddsratio)+"</td>"
    if pvalue < 0.05: 
       s += "<td>"+"{:.5f}".format(pvalue)+"</td>"
       s += "<td>"+"{:.5f}".format(pvalue * 137)+"</td>"
    else:
       s += "<td>"+"{:.2f}".format(pvalue)+"</td>"
       s += "<td>"+"{:.2f}".format(pvalue * 137)+"</td>"
    s += "<td>"+"{:.2f}".format(oddsratio_greater)+"</td>"
    if pvalue_greater < 0.05: 
       s += "<td>"+"{:.5f}".format(pvalue_greater)+"</td>"
       s += "<td>"+"{:.5f}".format(pvalue_greater * 137)+"</td>"
    else:
       s += "<td>"+"{:.2f}".format(pvalue_greater)+"</td>"
       s += "<td>"+"{:.2f}".format(pvalue_greater * 137)+"</td>"
    s += "<td>"+"{:.2f}".format(oddsratio_less)+"</td>"
    if pvalue_less < 0.05: 
       s += "<td>"+"{:.5f}".format(pvalue_less)+"</td>"
       s += "<td>"+"{:.5f}".format(pvalue_less * 137)+"</td></tr>"
    else: 
       s += "<td>"+"{:.2f}".format(pvalue_less)+"</td>"
       s += "<td>"+"{:.2f}".format(pvalue_less * 137)+"</td></tr>"

s += "</table>"
print "Genes: "+str(gene_count)
h = HTML(s);h


Genes: 137
Out[13]:
SampleFisher's exact test (two-sided)Fisher's exact test (one-sided greater)Fisher's exact test (one-sided less)
GeneMYC + Gene affectedMYC + Gene unaffectedMYC - Gene affectedMYC - Gene unaffectedOdd's ratiop-valuep-value correctedOdd's ratiop-valuep-value correctedOdd's ratiop-valuep-value corrected
AXIN13109285311.740.012711.7416811.740.012711.7416811.741.00136.89
CDH171051057500.480.068.290.480.98134.910.480.036174.95570
CYLD610658509.620.000600.082549.620.000600.082549.621.00136.99
EGFR7105178383.290.014982.052133.290.014982.052133.291.00136.48
FGFR36106158403.170.026613.645893.170.026613.645893.170.99136.07
PIK3CA29833025530.640.067.710.640.98134.660.640.028883.95703
SOCS1310938527.820.023293.191187.820.023293.191187.821.00136.71
TP5363492336223.430.000000.000003.430.000000.000003.431.00137.00
MDM210102388172.110.068.122.110.041545.690562.110.98134.71
NCOA38104158404.310.002780.381354.310.002780.381354.311.00136.92

Which pathways are affected in MYC CN>4 but CN<6 samples


In [14]:
TP53_pathway=["CCND1","CDKN2A", "CUL9", "NPM1", "PTEN", "TP53","AGBL2", "ALS2CR8", "AMOTL1", "ANKRD12", "BRPF1", "CACHD1", "CARNS1", "CDKN2AIP", "CELSR3", "CHD8", "EPHA3", "HECW2", "IFT140", "IWS1", "MAGI2", "MAST3", "MDM4", "MLL5", "PLEKHA8", "PRDM2", "PRKRIR", "SCAPER", "SETD2", "SMG1", "SMG5", "SMG7", "SNRK", "SPTBN2", "STK11IP", "STRADA", "SWAP70", "TEP1", "TRIP12", "TTLL5", "UACA", "ZFP91", "ZMIZ1", "ZNF227", "ZNF668"]
STAG1_pathway=["PDS5B", "STAG1", "STAG2", "WAPAL","PDS5A", "RAD21", "SMC1A"]
MLL_pathway=["KDM6A", "MLL2", "MLL3","E2F3", "N4BP2", "PROSER1"]
BAP1_pathway=["ASXL1", "ASXL2", "BAP1","ANKRD17", "FOXK1", "FOXK2", "KDM1B"]
KRAS_pathway=["BRAF", "KRAS", "PIK3CA","ARHGAP4", "EIF2AK1", "FAM118B", "NRAS", "PLCE1", "RASGRF2", "RASGRP2", "RGL1", "VARS2"]
HLA_pathway=["B2M", "CD1D","HLA-A", "HLA-B", "P4HTM"]
HER2_pathway=["EGFR", "ERBB2","AREG", "ELF3", "ERBB4", "LRIG1", "OSMR"]
CDON_pathway=["BOC", "CDON"]
KEAP1_pathway=["KEAP1", "NFE2L2","CHD6", "PTMA", "WAC"]
SPOP_pathway=["MYD88", "SPOP"]
RUNX1_pathway=["CBFB", "RUNX1","ELF4"]
NOTCH_pathway=["FBXW7", "NOTCH1","CCNE1", "DCLRE1C", "DLL1", "DTX4", "FAM101A", "FOXM1", "JAG1", "KLF5", "LFNG", "LPP", "MAML1", "MMS22L", "NOTCH3", "NOTCH4", "SHPRH"]
ARID_pathway=["ARID1A", "PBRM1","ADNP", "ARID1B", "ARID2"]
SMARC_pathway=["SMARCB1", "SMARCA4"]
SMC_pathway=["NCAPD3","SMC4","NCAPG2","NCAPH2","SMC2","NCAPD2"]
CLASP_pathway=["CLASP1","CLASP2","CLIP2"]

pathways={"TP53_pathway": TP53_pathway, "STAG1_pathway":STAG1_pathway, "MLL_pathway":MLL_pathway,"BAP1_pathway":BAP1_pathway,"KRAS_pathway":KRAS_pathway, "HLA_pathway":HLA_pathway,  "HER2_pathway":HER2_pathway, "CDON_pathway":CDON_pathway, "KEAP1_pathway":KEAP1_pathway,"SPOP_pathway":SPOP_pathway, "RUNX1_pathway":RUNX1_pathway, "NOTCH_pathway":NOTCH_pathway, "ARID_pathway":ARID_pathway, "SMARC_pathway":SMARC_pathway, "SMC_pathway":SMC_pathway,"CLASP_pathway":CLASP_pathway }

MYC_pos_pathway_affected={"TP53_pathway": 0, "STAG1_pathway":0, "MLL_pathway":0,"BAP1_pathway":0,"KRAS_pathway":0, "HLA_pathway":0,  "HER2_pathway":0, "CDON_pathway":0, "KEAP1_pathway":0,"SPOP_pathway":0, "RUNX1_pathway":0, "NOTCH_pathway":0, "ARID_pathway":0, "SMARC_pathway":0, "SMC_pathway":0,"CLASP_pathway":0 }
MYC_pos_pathway_unaffected={"TP53_pathway": 0, "STAG1_pathway":0, "MLL_pathway":0,"BAP1_pathway":0,"KRAS_pathway":0, "HLA_pathway":0,  "HER2_pathway":0, "CDON_pathway":0, "KEAP1_pathway":0,"SPOP_pathway":0, "RUNX1_pathway":0, "NOTCH_pathway":0, "ARID_pathway":0, "SMARC_pathway":0, "SMC_pathway":0,"CLASP_pathway":0 }
MYC_neg_pathway_affected={"TP53_pathway": 0, "STAG1_pathway":0, "MLL_pathway":0,"BAP1_pathway":0,"KRAS_pathway":0, "HLA_pathway":0,  "HER2_pathway":0, "CDON_pathway":0, "KEAP1_pathway":0,"SPOP_pathway":0, "RUNX1_pathway":0, "NOTCH_pathway":0, "ARID_pathway":0, "SMARC_pathway":0, "SMC_pathway":0,"CLASP_pathway":0 }
MYC_neg_pathway_unaffected={"TP53_pathway": 0, "STAG1_pathway":0, "MLL_pathway":0,"BAP1_pathway":0,"KRAS_pathway":0, "HLA_pathway":0,  "HER2_pathway":0, "CDON_pathway":0, "KEAP1_pathway":0,"SPOP_pathway":0, "RUNX1_pathway":0, "NOTCH_pathway":0, "ARID_pathway":0, "SMARC_pathway":0, "SMC_pathway":0,"CLASP_pathway":0 }


for sample in samples.values():
    if not sample.somatic_mutation_data:
        continue
    copynumber = "n/a"
    log2 = sample.getLog2FromGene(gene_positions['MYC'].chrom,gene_positions['MYC'].start, gene_positions['MYC'].end)
    if (log2 != None):
        copynumber = 2.0 * 2.0 ** log2
    else:
        copynumber="n/a"
    for pathway in pathways.keys():
        pathway_affected = False
        for gene in pathways[pathway]:
            if gene in sample.genes_affected or sample.checkFocalGeneAmp(gene):
                pathway_affected = True
                break
        if copynumber != "n/a" and pathway_affected and copynumber > 4 and copynumber < 6:
            MYC_pos_pathway_affected[pathway] += 1
        elif copynumber != "n/a" and not pathway_affected and  copynumber > 4 and copynumber < 6:
            MYC_pos_pathway_unaffected[pathway] += 1
        elif copynumber != "n/a" and pathway_affected and not  copynumber > 4:
            MYC_neg_pathway_affected[pathway] += 1
        elif copynumber != "n/a" and not pathway_affected and not  copynumber > 4:
            MYC_neg_pathway_unaffected[pathway] += 1

s = "<table>"
s += "<th></th><th colspan=\"4\">Sample</th><th colspan=\"3\">Fisher's exact test (two-sided)</th><th colspan=\"3\">Fisher's exact test (one-sided greater)</th><th colspan=\"3\">Fisher's exact test (one-sided less)</th></tr>"
s += "<tr><th>Pathway</th><th>MYC + pathway affected</th><th>MYC + pathway unaffected</th><th>MYC - pathway affected</th><th>MYC - pathway unaffected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th></tr>"

for pathway in pathways.keys():
   oddsratio, pvalue = scipy.stats.fisher_exact([[MYC_pos_pathway_affected[pathway], MYC_pos_pathway_unaffected[pathway]], [MYC_neg_pathway_affected[pathway], MYC_neg_pathway_unaffected[pathway]]])
   oddsratio_greater, pvalue_greater = scipy.stats.fisher_exact([[MYC_pos_pathway_affected[pathway], MYC_pos_pathway_unaffected[pathway]], [MYC_neg_pathway_affected[pathway], MYC_neg_pathway_unaffected[pathway]]], alternative='greater')
   oddsratio_less, pvalue_less = scipy.stats.fisher_exact([[MYC_pos_pathway_affected[pathway], MYC_pos_pathway_unaffected[pathway]], [MYC_neg_pathway_affected[pathway], MYC_neg_pathway_unaffected[pathway]]], alternative='less')
   s += "<tr><th>"+pathway+"</th><td>"+str(MYC_pos_pathway_affected[pathway])+"</td>"
   s += "<td>"+str(MYC_pos_pathway_unaffected[pathway])+"</td>"
   s += "<td>"+str(MYC_neg_pathway_affected[pathway])+"</td>"
   s += "<td>"+str(MYC_neg_pathway_unaffected[pathway])+"</td>"
   s += "<td>"+"{:.2f}".format(oddsratio)+"</td>"
   if pvalue < 0.05: 
      s += "<td>"+"{:.5f}".format(pvalue)+"</td>"
      s += "<td>"+"{:.5f}".format(pvalue * 16)+"</td>"
   else:
      s += "<td>"+"{:.2f}".format(pvalue)+"</td>"
      s += "<td>"+"{:.2f}".format(pvalue * 16)+"</td>"
   s += "<td>"+"{:.2f}".format(oddsratio_greater)+"</td>"
   if pvalue_greater < 0.05: 
      s += "<td>"+"{:.5f}".format(pvalue_greater)+"</td>"
      s += "<td>"+"{:.5f}".format(pvalue_greater * 16)+"</td>"
   else:
      s += "<td>"+"{:.2f}".format(pvalue_greater)+"</td>"
      s += "<td>"+"{:.2f}".format(pvalue_greater * 16)+"</td>"
   s += "<td>"+"{:.2f}".format(oddsratio_less)+"</td>"
   if pvalue_less < 0.05: 
      s += "<td>"+"{:.5f}".format(pvalue_less)+"</td>"
      s += "<td>"+"{:.5f}".format(pvalue_less * 16)+"</td></tr>"
   else: 
      s += "<td>"+"{:.2f}".format(pvalue_less)+"</td>"
      s += "<td>"+"{:.2f}".format(pvalue_less * 16)+"</td></tr>"

s += "</table>"
h = HTML(s);h


Out[14]:
SampleFisher's exact test (two-sided)Fisher's exact test (one-sided greater)Fisher's exact test (one-sided less)
PathwayMYC + pathway affectedMYC + pathway unaffectedMYC - pathway affectedMYC - pathway unaffectedOdd's ratiop-valuep-value correctedOdd's ratiop-valuep-value correctedOdd's ratiop-valuep-value corrected
STAG1_pathway093558000.000.004380.070140.001.0016.000.000.002860.04578
KRAS_pathway33603515040.790.325.100.790.8814.010.790.182.83
CLASP_pathway489218341.780.304.751.780.223.581.780.9114.57
SMARC_pathway19288471.150.619.711.150.619.711.150.7812.51
SPOP_pathway09358500.001.0016.000.001.0016.000.000.609.54
RUNX1_pathway291577980.310.111.770.310.9915.770.310.060.91
MLL_pathway14791137421.160.6310.091.160.365.741.160.7511.99
TP53_pathway75185023532.930.000030.000442.930.000020.000252.931.0016.00
ARID_pathway687617940.901.0016.000.900.6610.580.900.518.12
KEAP1_pathway489298261.280.568.901.280.416.541.280.7812.55
SMC_pathway588248311.970.193.111.970.152.341.970.9415.10
NOTCH_pathway17761237321.330.355.671.330.193.121.330.8814.00
BAP1_pathway786558001.180.6610.541.180.416.521.180.7411.91
CDON_pathway390158401.870.416.541.870.264.091.870.9114.54
HLA_pathway390258301.110.7512.001.110.538.471.110.7111.33
HER2_pathway25681566991.650.050.821.650.033940.542981.650.9815.70

Which pathways are affected in MYC CN>6 samples


In [15]:
TP53_pathway=["CCND1","CDKN2A", "CUL9", "NPM1", "PTEN", "TP53","AGBL2", "ALS2CR8", "AMOTL1", "ANKRD12", "BRPF1", "CACHD1", "CARNS1", "CDKN2AIP", "CELSR3", "CHD8", "EPHA3", "HECW2", "IFT140", "IWS1", "MAGI2", "MAST3", "MDM4", "MLL5", "PLEKHA8", "PRDM2", "PRKRIR", "SCAPER", "SETD2", "SMG1", "SMG5", "SMG7", "SNRK", "SPTBN2", "STK11IP", "STRADA", "SWAP70", "TEP1", "TRIP12", "TTLL5", "UACA", "ZFP91", "ZMIZ1", "ZNF227", "ZNF668"]
STAG1_pathway=["PDS5B", "STAG1", "STAG2", "WAPAL","PDS5A", "RAD21", "SMC1A"]
MLL_pathway=["KDM6A", "MLL2", "MLL3","E2F3", "N4BP2", "PROSER1"]
BAP1_pathway=["ASXL1", "ASXL2", "BAP1","ANKRD17", "FOXK1", "FOXK2", "KDM1B"]
KRAS_pathway=["BRAF", "KRAS", "PIK3CA","ARHGAP4", "EIF2AK1", "FAM118B", "NRAS", "PLCE1", "RASGRF2", "RASGRP2", "RGL1", "VARS2"]
HLA_pathway=["B2M", "CD1D","HLA-A", "HLA-B", "P4HTM"]
HER2_pathway=["EGFR", "ERBB2","AREG", "ELF3", "ERBB4", "LRIG1", "OSMR"]
CDON_pathway=["BOC", "CDON"]
KEAP1_pathway=["KEAP1", "NFE2L2","CHD6", "PTMA", "WAC"]
SPOP_pathway=["MYD88", "SPOP"]
RUNX1_pathway=["CBFB", "RUNX1","ELF4"]
NOTCH_pathway=["FBXW7", "NOTCH1","CCNE1", "DCLRE1C", "DLL1", "DTX4", "FAM101A", "FOXM1", "JAG1", "KLF5", "LFNG", "LPP", "MAML1", "MMS22L", "NOTCH3", "NOTCH4", "SHPRH"]
ARID_pathway=["ARID1A", "PBRM1","ADNP", "ARID1B", "ARID2"]
SMARC_pathway=["SMARCB1", "SMARCA4"]
SMC_pathway=["NCAPD3","SMC4","NCAPG2","NCAPH2","SMC2","NCAPD2"]
CLASP_pathway=["CLASP1","CLASP2","CLIP2"]

pathways={"TP53_pathway": TP53_pathway, "STAG1_pathway":STAG1_pathway, "MLL_pathway":MLL_pathway,"BAP1_pathway":BAP1_pathway,"KRAS_pathway":KRAS_pathway, "HLA_pathway":HLA_pathway,  "HER2_pathway":HER2_pathway, "CDON_pathway":CDON_pathway, "KEAP1_pathway":KEAP1_pathway,"SPOP_pathway":SPOP_pathway, "RUNX1_pathway":RUNX1_pathway, "NOTCH_pathway":NOTCH_pathway, "ARID_pathway":ARID_pathway, "SMARC_pathway":SMARC_pathway, "SMC_pathway":SMC_pathway,"CLASP_pathway":CLASP_pathway }

MYC_pos_pathway_affected={"TP53_pathway": 0, "STAG1_pathway":0, "MLL_pathway":0,"BAP1_pathway":0,"KRAS_pathway":0, "HLA_pathway":0,  "HER2_pathway":0, "CDON_pathway":0, "KEAP1_pathway":0,"SPOP_pathway":0, "RUNX1_pathway":0, "NOTCH_pathway":0, "ARID_pathway":0, "SMARC_pathway":0, "SMC_pathway":0,"CLASP_pathway":0 }
MYC_pos_pathway_unaffected={"TP53_pathway": 0, "STAG1_pathway":0, "MLL_pathway":0,"BAP1_pathway":0,"KRAS_pathway":0, "HLA_pathway":0,  "HER2_pathway":0, "CDON_pathway":0, "KEAP1_pathway":0,"SPOP_pathway":0, "RUNX1_pathway":0, "NOTCH_pathway":0, "ARID_pathway":0, "SMARC_pathway":0, "SMC_pathway":0,"CLASP_pathway":0 }
MYC_neg_pathway_affected={"TP53_pathway": 0, "STAG1_pathway":0, "MLL_pathway":0,"BAP1_pathway":0,"KRAS_pathway":0, "HLA_pathway":0,  "HER2_pathway":0, "CDON_pathway":0, "KEAP1_pathway":0,"SPOP_pathway":0, "RUNX1_pathway":0, "NOTCH_pathway":0, "ARID_pathway":0, "SMARC_pathway":0, "SMC_pathway":0,"CLASP_pathway":0 }
MYC_neg_pathway_unaffected={"TP53_pathway": 0, "STAG1_pathway":0, "MLL_pathway":0,"BAP1_pathway":0,"KRAS_pathway":0, "HLA_pathway":0,  "HER2_pathway":0, "CDON_pathway":0, "KEAP1_pathway":0,"SPOP_pathway":0, "RUNX1_pathway":0, "NOTCH_pathway":0, "ARID_pathway":0, "SMARC_pathway":0, "SMC_pathway":0,"CLASP_pathway":0 }


for sample in samples.values():
    if not sample.somatic_mutation_data:
        continue
    copynumber = "n/a"
    log2 = sample.getLog2FromGene(gene_positions['MYC'].chrom,gene_positions['MYC'].start, gene_positions['MYC'].end)
    if (log2 != None):
        copynumber = 2.0 * 2.0 ** log2
    else:
        copynumber="n/a"
    for pathway in pathways.keys():
        pathway_affected = False
        for gene in pathways[pathway]:
            if gene in sample.genes_affected or sample.checkFocalGeneAmp(gene):
                pathway_affected = True
                break
        if copynumber != "n/a" and pathway_affected and copynumber > 6:
            MYC_pos_pathway_affected[pathway] += 1
        elif copynumber != "n/a" and not pathway_affected and  copynumber > 6:
            MYC_pos_pathway_unaffected[pathway] += 1
        elif copynumber != "n/a" and pathway_affected and not  copynumber > 6:
            MYC_neg_pathway_affected[pathway] += 1
        elif copynumber != "n/a" and not pathway_affected and not  copynumber > 6:
            MYC_neg_pathway_unaffected[pathway] += 1

s = "<table>"
s += "<th></th><th colspan=\"4\">Sample</th><th colspan=\"3\">Fisher's exact test (two-sided)</th><th colspan=\"3\">Fisher's exact test (one-sided greater)</th><th colspan=\"3\">Fisher's exact test (one-sided less)</th></tr>"
s += "<tr><th>Pathway</th><th>MYC + pathway affected</th><th>MYC + pathway unaffected</th><th>MYC - pathway affected</th><th>MYC - pathway unaffected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th></tr>"

for pathway in pathways.keys():
   oddsratio, pvalue = scipy.stats.fisher_exact([[MYC_pos_pathway_affected[pathway], MYC_pos_pathway_unaffected[pathway]], [MYC_neg_pathway_affected[pathway], MYC_neg_pathway_unaffected[pathway]]])
   oddsratio_greater, pvalue_greater = scipy.stats.fisher_exact([[MYC_pos_pathway_affected[pathway], MYC_pos_pathway_unaffected[pathway]], [MYC_neg_pathway_affected[pathway], MYC_neg_pathway_unaffected[pathway]]], alternative='greater')
   oddsratio_less, pvalue_less = scipy.stats.fisher_exact([[MYC_pos_pathway_affected[pathway], MYC_pos_pathway_unaffected[pathway]], [MYC_neg_pathway_affected[pathway], MYC_neg_pathway_unaffected[pathway]]], alternative='less')
   s += "<tr><th>"+pathway+"</th><td>"+str(MYC_pos_pathway_affected[pathway])+"</td>"
   s += "<td>"+str(MYC_pos_pathway_unaffected[pathway])+"</td>"
   s += "<td>"+str(MYC_neg_pathway_affected[pathway])+"</td>"
   s += "<td>"+str(MYC_neg_pathway_unaffected[pathway])+"</td>"
   s += "<td>"+"{:.2f}".format(oddsratio)+"</td>"
   if pvalue < 0.05: 
      s += "<td>"+"{:.5f}".format(pvalue)+"</td>"
      s += "<td>"+"{:.5f}".format(pvalue * 16)+"</td>"
   else:
      s += "<td>"+"{:.2f}".format(pvalue)+"</td>"
      s += "<td>"+"{:.2f}".format(pvalue * 16)+"</td>"
   s += "<td>"+"{:.2f}".format(oddsratio_greater)+"</td>"
   if pvalue_greater < 0.05: 
      s += "<td>"+"{:.5f}".format(pvalue_greater)+"</td>"
      s += "<td>"+"{:.5f}".format(pvalue_greater * 16)+"</td>"
   else:
      s += "<td>"+"{:.2f}".format(pvalue_greater)+"</td>"
      s += "<td>"+"{:.2f}".format(pvalue_greater * 16)+"</td>"
   s += "<td>"+"{:.2f}".format(oddsratio_less)+"</td>"
   if pvalue_less < 0.05: 
      s += "<td>"+"{:.5f}".format(pvalue_less)+"</td>"
      s += "<td>"+"{:.5f}".format(pvalue_less * 16)+"</td></tr>"
   else: 
      s += "<td>"+"{:.2f}".format(pvalue_less)+"</td>"
      s += "<td>"+"{:.2f}".format(pvalue_less * 16)+"</td></tr>"

s += "</table>"
h = HTML(s);h


Out[15]:
SampleFisher's exact test (two-sided)Fisher's exact test (one-sided greater)Fisher's exact test (one-sided less)
PathwayMYC + pathway affectedMYC + pathway unaffectedMYC - pathway affectedMYC - pathway unaffectedOdd's ratiop-valuep-value correctedOdd's ratiop-valuep-value correctedOdd's ratiop-valuep-value corrected
STAG1_pathway118558930.901.0016.000.900.6810.910.900.7011.16
KRAS_pathway9103845641.320.6410.221.320.355.641.320.8012.80
CLASP_pathway019259230.001.0016.000.001.0016.000.000.619.68
SMARC_pathway01999390.001.0016.000.001.0016.000.000.8413.37
SPOP_pathway01959430.001.0016.000.001.0016.000.000.9114.49
RUNX1_pathway019598890.000.629.980.001.0016.000.000.304.78
MLL_pathway2171278210.761.0016.000.760.7411.910.760.528.39
TP53_pathway1545773712.410.152.442.410.081.332.410.9715.55
ARID_pathway217678811.550.6410.241.550.406.371.550.8513.63
KEAP1_pathway118339151.540.507.951.540.507.951.540.8613.74
SMC_pathway019299190.001.0016.000.001.0016.000.000.568.92
NOTCH_pathway2171408080.681.0016.000.680.7912.700.680.467.28
BAP1_pathway118628860.791.0016.000.790.7311.610.790.6510.33
CDON_pathway019189300.001.0016.000.001.0016.000.000.7011.16
HLA_pathway019289200.001.0016.000.001.0016.000.000.579.11
HER2_pathway4151817671.130.7712.341.130.518.121.130.7111.35

Which genes of the Leiserson Pathway are affected at MYC copynumber > 4 but copynumber < 6 samples


In [16]:
genes=["CCND1","CDKN2A", "CUL9", "NPM1", "PTEN", "TP53","AGBL2", "ALS2CR8", "AMOTL1", "ANKRD12", "BRPF1", "CACHD1", "CARNS1", "CDKN2AIP", "CELSR3", "CHD8", "EPHA3", "HECW2", "IFT140", "IWS1", "MAGI2", "MAST3", "MDM4", "MLL5", "PLEKHA8", "PRDM2", "PRKRIR", "SCAPER", "SETD2", "SMG1", "SMG5", "SMG7", "SNRK",
       "SPTBN2", "STK11IP", "STRADA", "SWAP70", "TEP1", "TRIP12", "TTLL5", "UACA", "ZFP91", "ZMIZ1", "ZNF227", "ZNF668",
       "PDS5B", "STAG1", "STAG2", "WAPAL","PDS5A", "RAD21", "SMC1A",
       "KDM6A", "MLL2", "MLL3","E2F3", "N4BP2", "PROSER1",
       "ASXL1", "ASXL2", "BAP1","ANKRD17", "FOXK1", "FOXK2", "KDM1B",
       "BRAF", "KRAS", "PIK3CA","ARHGAP4", "EIF2AK1", "FAM118B", "NRAS", "PLCE1", "RASGRF2", "RASGRP2", "RGL1", "VARS2",
       "B2M", "CD1D","HLA-A", "HLA-B", "P4HTM",
       "EGFR", "ERBB2","AREG", "ELF3", "ERBB4", "LRIG1", "OSMR",
       "BOC", "CDON",
       "KEAP1", "NFE2L2","CHD6", "PTMA", "WAC",
       "MYD88", "SPOP",
       "CBFB", "RUNX1","ELF4",
       "FBXW7", "NOTCH1","CCNE1", "DCLRE1C", "DLL1", "DTX4", "FAM101A", "FOXM1", "JAG1", "KLF5", "LFNG", "LPP", "MAML1", "MMS22L", "NOTCH3", "NOTCH4", "SHPRH",
       "ARID1A", "PBRM1","ADNP", "ARID1B", "ARID2",
       "SMARCB1", "SMARCA4",
       "NCAPD3","SMC4","NCAPG2","NCAPH2","SMC2","NCAPD2",
       "CLASP1","CLASP2","CLIP2"]

s = "<table>"
s += "<th></th><th colspan=\"4\">Sample</th><th colspan=\"3\">Fisher's exact test (two-sided)</th><th colspan=\"3\">Fisher's exact test (one-sided greater)</th><th colspan=\"3\">Fisher's exact test (one-sided less)</th></tr>"
s += "<tr><th>Gene</th><th>MYC + Gene affected</th><th>MYC + Gene unaffected</th><th>MYC - Gene affected</th><th>MYC - Gene unaffected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th></tr>"

gene_count = 0
for gene in genes:
    gene_count += 1
    MYC_pos_gene_affected = 0
    MYC_pos_gene_unaffected = 0
    MYC_neg_gene_affected = 0
    MYC_neg_gene_unaffected = 0

    for sample in samples.values():
        if not sample.somatic_mutation_data:
            continue
        copynumber = "n/a"
        log2 = sample.getLog2FromGene(gene_positions['MYC'].chrom,gene_positions['MYC'].start, gene_positions['MYC'].end)
        if (log2 != None):
            copynumber = 2.0 * 2.0 ** log2
        else:
            copynumber="n/a"
        if sample.checkFocalGeneAmp(gene) or gene in sample.genes_affected:
            if copynumber != "n/a" and copynumber > 4 and copynumber < 6:
                MYC_pos_gene_affected += 1
            elif  copynumber != "n/a" and not copynumber > 4:
                MYC_neg_gene_affected += 1
        else:
            if copynumber != "n/a" and copynumber > 4 and copynumber < 6:
                MYC_pos_gene_unaffected += 1
            elif  copynumber != "n/a" and not copynumber > 4:
                MYC_neg_gene_unaffected += 1

    oddsratio, pvalue = scipy.stats.fisher_exact([[MYC_pos_gene_affected, MYC_pos_gene_unaffected], [MYC_neg_gene_affected, MYC_neg_gene_unaffected]])
    oddsratio_greater, pvalue_greater = scipy.stats.fisher_exact([[MYC_pos_gene_affected, MYC_pos_gene_unaffected], [MYC_neg_gene_affected, MYC_neg_gene_unaffected]], alternative='greater')
    oddsratio_less, pvalue_less = scipy.stats.fisher_exact([[MYC_pos_gene_affected, MYC_pos_gene_unaffected], [MYC_neg_gene_affected, MYC_neg_gene_unaffected]], alternative='less')

    if (pvalue > 0.05 and pvalue_greater > 0.05 and pvalue_less > 0.05):
        continue
        
    s += "<tr><th>"+gene+"</th><td>"+str(MYC_pos_gene_affected)+"</td>"
    s += "<td>"+str(MYC_pos_gene_unaffected)+"</td>"
    s += "<td>"+str(MYC_neg_gene_affected)+"</td>"
    s += "<td>"+str(MYC_neg_gene_unaffected)+"</td>"
    s += "<td>"+"{:.2f}".format(oddsratio)+"</td>"
    if pvalue < 0.05: 
       s += "<td>"+"{:.5f}".format(pvalue)+"</td>"
       s += "<td>"+"{:.5f}".format(pvalue * 134)+"</td>"
    else:
       s += "<td>"+"{:.2f}".format(pvalue)+"</td>"
       s += "<td>"+"{:.2f}".format(pvalue * 134)+"</td>"
    s += "<td>"+"{:.2f}".format(oddsratio_greater)+"</td>"
    if pvalue_greater < 0.05: 
       s += "<td>"+"{:.5f}".format(pvalue_greater)+"</td>"
       s += "<td>"+"{:.5f}".format(pvalue_greater * 134)+"</td>"
    else:
       s += "<td>"+"{:.2f}".format(pvalue_greater)+"</td>"
       s += "<td>"+"{:.2f}".format(pvalue_greater * 134)+"</td>"
    s += "<td>"+"{:.2f}".format(oddsratio_less)+"</td>"
    if pvalue_less < 0.05: 
       s += "<td>"+"{:.5f}".format(pvalue_less)+"</td>"
       s += "<td>"+"{:.5f}".format(pvalue_less * 134)+"</td></tr>"
    else: 
       s += "<td>"+"{:.2f}".format(pvalue_less)+"</td>"
       s += "<td>"+"{:.2f}".format(pvalue_less * 134)+"</td></tr>"

s += "</table>"
print "Genes: "+str(gene_count)
h = HTML(s);h


Genes: 134
Out[16]:
SampleFisher's exact test (two-sided)Fisher's exact test (one-sided greater)Fisher's exact test (one-sided less)
GeneMYC + Gene affectedMYC + Gene unaffectedMYC - Gene affectedMYC - Gene unaffectedOdd's ratiop-valuep-value correctedOdd's ratiop-valuep-value correctedOdd's ratiop-valuep-value corrected
TP5352412336223.390.000000.000013.390.000000.000013.391.00134.00
CDKN2AIP291185418.770.026763.5855418.770.026763.5855418.771.00133.88
ZMIZ1687198363.030.028853.865723.030.028853.865723.030.99132.97
KRAS786268292.600.035384.741002.600.035384.741002.600.99132.50
PIK3CA21723025530.530.015212.038250.531.00133.460.530.008101.08584
EGFR786178384.010.005990.803334.010.005990.803334.011.00133.83

Which genes of the Leiserson Pathway are affected at MYC copynumber > 6


In [17]:
genes=["CCND1","CDKN2A", "CUL9", "NPM1", "PTEN", "TP53","AGBL2", "ALS2CR8", "AMOTL1", "ANKRD12", "BRPF1", "CACHD1", "CARNS1", "CDKN2AIP", "CELSR3", "CHD8", "EPHA3", "HECW2", "IFT140", "IWS1", "MAGI2", "MAST3", "MDM4", "MLL5", "PLEKHA8", "PRDM2", "PRKRIR", "SCAPER", "SETD2", "SMG1", "SMG5", "SMG7", "SNRK",
       "SPTBN2", "STK11IP", "STRADA", "SWAP70", "TEP1", "TRIP12", "TTLL5", "UACA", "ZFP91", "ZMIZ1", "ZNF227", "ZNF668",
       "PDS5B", "STAG1", "STAG2", "WAPAL","PDS5A", "RAD21", "SMC1A",
       "KDM6A", "MLL2", "MLL3","E2F3", "N4BP2", "PROSER1",
       "ASXL1", "ASXL2", "BAP1","ANKRD17", "FOXK1", "FOXK2", "KDM1B",
       "BRAF", "KRAS", "PIK3CA","ARHGAP4", "EIF2AK1", "FAM118B", "NRAS", "PLCE1", "RASGRF2", "RASGRP2", "RGL1", "VARS2",
       "B2M", "CD1D","HLA-A", "HLA-B", "P4HTM",
       "EGFR", "ERBB2","AREG", "ELF3", "ERBB4", "LRIG1", "OSMR",
       "BOC", "CDON",
       "KEAP1", "NFE2L2","CHD6", "PTMA", "WAC",
       "MYD88", "SPOP",
       "CBFB", "RUNX1","ELF4",
       "FBXW7", "NOTCH1","CCNE1", "DCLRE1C", "DLL1", "DTX4", "FAM101A", "FOXM1", "JAG1", "KLF5", "LFNG", "LPP", "MAML1", "MMS22L", "NOTCH3", "NOTCH4", "SHPRH",
       "ARID1A", "PBRM1","ADNP", "ARID1B", "ARID2",
       "SMARCB1", "SMARCA4",
       "NCAPD3","SMC4","NCAPG2","NCAPH2","SMC2","NCAPD2",
       "CLASP1","CLASP2","CLIP2"]

s = "<table>"
s += "<th></th><th colspan=\"4\">Sample</th><th colspan=\"3\">Fisher's exact test (two-sided)</th><th colspan=\"3\">Fisher's exact test (one-sided greater)</th><th colspan=\"3\">Fisher's exact test (one-sided less)</th></tr>"
s += "<tr><th>Gene</th><th>MYC + Gene affected</th><th>MYC + Gene unaffected</th><th>MYC - Gene affected</th><th>MYC - Gene unaffected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th><th>Odd's ratio</th><th>p-value</th><th>p-value corrected</th></tr>"

gene_count = 0
for gene in genes:
    gene_count += 1
    MYC_pos_gene_affected = 0
    MYC_pos_gene_unaffected = 0
    MYC_neg_gene_affected = 0
    MYC_neg_gene_unaffected = 0

    for sample in samples.values():
        if not sample.somatic_mutation_data:
            continue
        copynumber = "n/a"
        log2 = sample.getLog2FromGene(gene_positions['MYC'].chrom,gene_positions['MYC'].start, gene_positions['MYC'].end)
        if (log2 != None):
            copynumber = 2.0 * 2.0 ** log2
        else:
            copynumber="n/a"
        if sample.checkFocalGeneAmp(gene) or gene in sample.genes_affected:
            if copynumber != "n/a" and copynumber > 6:
                MYC_pos_gene_affected += 1
            elif  copynumber != "n/a" and not copynumber > 6:
                MYC_neg_gene_affected += 1
        else:
            if copynumber != "n/a" and copynumber > 6:
                MYC_pos_gene_unaffected += 1
            elif  copynumber != "n/a" and not copynumber > 6:
                MYC_neg_gene_unaffected += 1

    oddsratio, pvalue = scipy.stats.fisher_exact([[MYC_pos_gene_affected, MYC_pos_gene_unaffected], [MYC_neg_gene_affected, MYC_neg_gene_unaffected]])
    oddsratio_greater, pvalue_greater = scipy.stats.fisher_exact([[MYC_pos_gene_affected, MYC_pos_gene_unaffected], [MYC_neg_gene_affected, MYC_neg_gene_unaffected]], alternative='greater')
    oddsratio_less, pvalue_less = scipy.stats.fisher_exact([[MYC_pos_gene_affected, MYC_pos_gene_unaffected], [MYC_neg_gene_affected, MYC_neg_gene_unaffected]], alternative='less')

    if (pvalue > 0.05 and pvalue_greater > 0.05 and pvalue_less > 0.05):
        continue
        
    s += "<tr><th>"+gene+"</th><td>"+str(MYC_pos_gene_affected)+"</td>"
    s += "<td>"+str(MYC_pos_gene_unaffected)+"</td>"
    s += "<td>"+str(MYC_neg_gene_affected)+"</td>"
    s += "<td>"+str(MYC_neg_gene_unaffected)+"</td>"
    s += "<td>"+"{:.2f}".format(oddsratio)+"</td>"
    if pvalue < 0.05: 
       s += "<td>"+"{:.5f}".format(pvalue)+"</td>"
       s += "<td>"+"{:.5f}".format(pvalue * 134)+"</td>"
    else:
       s += "<td>"+"{:.2f}".format(pvalue)+"</td>"
       s += "<td>"+"{:.2f}".format(pvalue * 134)+"</td>"
    s += "<td>"+"{:.2f}".format(oddsratio_greater)+"</td>"
    if pvalue_greater < 0.05: 
       s += "<td>"+"{:.5f}".format(pvalue_greater)+"</td>"
       s += "<td>"+"{:.5f}".format(pvalue_greater * 134)+"</td>"
    else:
       s += "<td>"+"{:.2f}".format(pvalue_greater)+"</td>"
       s += "<td>"+"{:.2f}".format(pvalue_greater * 134)+"</td>"
    s += "<td>"+"{:.2f}".format(oddsratio_less)+"</td>"
    if pvalue_less < 0.05: 
       s += "<td>"+"{:.5f}".format(pvalue_less)+"</td>"
       s += "<td>"+"{:.5f}".format(pvalue_less * 134)+"</td></tr>"
    else: 
       s += "<td>"+"{:.2f}".format(pvalue_less)+"</td>"
       s += "<td>"+"{:.2f}".format(pvalue_less * 134)+"</td></tr>"

s += "</table>"
print "Genes: "+str(gene_count)
h = HTML(s);h


Genes: 134
Out[17]:
SampleFisher's exact test (two-sided)Fisher's exact test (one-sided greater)Fisher's exact test (one-sided less)
GeneMYC + Gene affectedMYC + Gene unaffectedMYC - Gene affectedMYC - Gene unaffectedOdd's ratiop-valuep-value correctedOdd's ratiop-valuep-value correctedOdd's ratiop-valuep-value corrected
TP531182856633.200.020222.708953.200.011641.560393.201.00133.58
IFT140217794115.820.012141.6262715.820.012141.6262715.821.00133.93
MLL52171093811.040.021482.8782911.040.021482.8782911.041.00133.83
FOXK11180948inf0.019652.63289inf0.019652.63289inf1.00134.00