Looking at Switchiness

I call the genes that are associated with tumor tissue, independent of proliferation as exhibiting switchiness. The idea is that rather than just being associated with growth, these signals switch on at some point in the process of tumoregenesis. We can't say they this is a causal signal of cancer, but we have a bit we have a bit more evidence that these signals play a role in the tranformation of the tumor cell as opposed to being bystander processes that reflect the increased flux through growth and proliferation pathways.



In [1]:

    
import NotebookImport
from metaPCNA import *









    




importing IPython notebook from metaPCNA






    




importing IPython notebook from Imports






    




importing IPython notebook from Global_Parameters



In [36]:

    
pcna_corr.name = 'PCNA corr'



In [2]:

    
f_win.order().head()









    Out[2]:





GPM6B     -0.38
SFRP1     -0.38
PIK3C2G   -0.36
PKHD1L1   -0.36
SCNN1B    -0.36
Name: fraction overexpressed\n(detrended), dtype: float64



In [3]:

    
f_win.order().tail()









    Out[3]:





LOC283914    0.41
CST2         0.42
C16orf13     0.43
GABRD        0.47
SEMA5B       0.48
Name: fraction overexpressed\n(detrended), dtype: float64



In [4]:

    
pcna_win.order().tail()









    Out[4]:





KIF23     0.43
ZNF488    0.44
CDC25A    0.45
VRK1      0.47
GMNN      0.49
Name: meta-PCNA\n(detrended), dtype: float64



In [5]:

    
switch_plot('GABRD')



In [6]:

    
switch_plot('FOXM1')



In [7]:

    
switch_plot('SEMA5B')



In [29]:

    
gs2 = gene_sets.ix[f_win.dropna().index].fillna(0)
rr = screen_feature(f_win, rev_kruskal, gs2.T, 
                    align=False)
fp = (1.*gene_sets.T * f_win).T.dropna().replace(0, np.nan).mean().order()
fp.name = 'mean score'



In [30]:

    
(rr.q < .00001).value_counts()









    Out[30]:





False    1271
True       59
dtype: int64

Greedy filter based on p-value



In [31]:

    
f2 = fp.ix[ti(rr.q < .0001)]
ff_u = filter_pathway_hits(rr.ix[ti(f2>0)].p.order(), gs2)
ff_p = filter_pathway_hits(rr.ix[ti(f2<0)].p.order(), gs2)
ff = ff_u.append(ff_p)

selected = rr.ix[ff[ff < .00001].index].join(fp)
selected.sort('p')









    Out[31]:






  
    
      
      H
      p
      q
      mean score
    
  
  
    
      REACTOME_TRANSLATION
      158.08
      2.97e-36
      3.86e-33
      0.13
    
    
      KEGG_SYSTEMIC_LUPUS_ERYTHEMATOSUS
      82.42
      1.10e-19
      1.22e-17
      0.10
    
    
      KEGG_FATTY_ACID_METABOLISM
      42.00
      9.14e-11
      5.79e-09
      -0.13
    
    
      REACTOME_INTERFERON_ALPHA_BETA_SIGNALING
      41.59
      1.13e-10
      6.81e-09
      0.11
    
    
      KEGG_LYSOSOME
      38.06
      6.86e-10
      3.51e-08
      0.07
    
    
      REACTOME_GENERIC_TRANSCRIPTION_PATHWAY
      34.89
      3.48e-09
      1.36e-07
      -0.03
    
    
      REACTOME_VIF_MEDIATED_DEGRADATION_OF_APOBEC3G
      34.13
      5.16e-09
      1.96e-07
      0.09
    
    
      REACTOME_ASPARAGINE_N_LINKED_GLYCOSYLATION
      29.17
      6.63e-08
      1.76e-06
      0.07
    
    
      REACTOME_EXTRACELLULAR_MATRIX_ORGANIZATION
      21.82
      2.99e-06
      5.45e-05
      0.08

Greedy filter based on effect size



In [32]:

    
f2 = fp.ix[ti(rr.q < .0001)]
ff_u = filter_pathway_hits(fp.ix[ti(f2>0)].order()[::-1], gs2)
ff_p = filter_pathway_hits(fp.ix[ti(f2<0)].order(), gs2)
ff = ff_u.append(ff_p)

selected = rr.ix[ff.index].join(f2)
selected.sort('p')









    Out[32]:






  
    
      
      H
      p
      q
      mean score
    
  
  
    
      KEGG_RIBOSOME
      144.04
      3.47e-33
      1.16e-30
      0.16
    
    
      REACTOME_PACKAGING_OF_TELOMERE_ENDS
      45.35
      1.65e-11
      1.15e-09
      0.12
    
    
      KEGG_FATTY_ACID_METABOLISM
      42.00
      9.14e-11
      5.79e-09
      -0.13
    
    
      KEGG_LYSOSOME
      38.06
      6.86e-10
      3.51e-08
      0.07
    
    
      KEGG_GRAFT_VERSUS_HOST_DISEASE
      35.52
      2.52e-09
      1.05e-07
      0.11
    
    
      REACTOME_GENERIC_TRANSCRIPTION_PATHWAY
      34.89
      3.48e-09
      1.36e-07
      -0.03
    
    
      KEGG_PROTEASOME
      31.14
      2.40e-08
      7.43e-07
      0.09
    
    
      REACTOME_N_GLYCAN_TRIMMING_IN_THE_ER_AND_CALNEXIN_CALRETICULIN_CYCLE
      24.76
      6.48e-07
      1.35e-05
      0.18
    
    
      REACTOME_EXTRACELLULAR_MATRIX_ORGANIZATION
      21.82
      2.99e-06
      5.45e-05
      0.08

I am pulling Ribosome as the top hit because it has the largest effect size of a few very significant gene sets.



In [37]:

    
p = gs2['KEGG_RIBOSOME']
fig, axs = subplots(2,2, figsize=(8,6), sharex=True)
axs = axs.flatten()
violin_plot_pandas(p, dx_rna.frac, ax=axs[0])
violin_plot_pandas(p, pcna_corr, ax=axs[2])
violin_plot_pandas(p, f_win, ax=axs[1])
violin_plot_pandas(p, pcna_win, ax=axs[3])
for ax in axs:
    prettify_ax(ax)
fig.tight_layout()



In [38]:

    
fig, ax = subplots()
series_scatter(pcna_corr.ix[ti(p==0)], dx_rna.frac, color='grey', 
               s=10, alpha=.1, ax=ax, ann=None)
series_scatter(pcna_corr.ix[ti(p>0)], dx_rna.frac, color=colors[0], ax=ax,
               alpha=1, s=20, ann=None)



In [39]:

    
p = gs2['KEGG_LYSOSOME']
fig, axs = subplots(2,2, figsize=(8,6), sharex=True)
axs = axs.flatten()
violin_plot_pandas(p, dx_rna.frac, ax=axs[0])
violin_plot_pandas(p, pcna_corr, ax=axs[2])
violin_plot_pandas(p, f_win, ax=axs[1])
violin_plot_pandas(p, pcna_win, ax=axs[3])
for ax in axs:
    prettify_ax(ax)
fig.tight_layout()



In [40]:

    
p = gs2['REACTOME_PACKAGING_OF_TELOMERE_ENDS']
fig, axs = subplots(2,2, figsize=(8,6), sharex=True)
axs = axs.flatten()
violin_plot_pandas(p, dx_rna.frac, ax=axs[0])
violin_plot_pandas(p, pcna_corr, ax=axs[2])
violin_plot_pandas(p, f_win, ax=axs[1])
violin_plot_pandas(p, pcna_win, ax=axs[3])
for ax in axs:
    prettify_ax(ax)
fig.tight_layout()



In [41]:

    
fig, ax = subplots()
series_scatter(pcna_corr.ix[ti(p==0)], dx_rna.frac, color='grey', 
               s=10, alpha=.1, ax=ax, ann=None)
series_scatter(pcna_corr.ix[ti(p>0)], dx_rna.frac, color=colors[0], ax=ax,
               alpha=1, s=20, ann=None)



In [42]:

    
p = gs2['KEGG_FATTY_ACID_METABOLISM']
fig, axs = subplots(2,2, figsize=(8,6), sharex=True)
axs = axs.flatten()
violin_plot_pandas(p, dx_rna.frac, ax=axs[0])
violin_plot_pandas(p, pcna_corr, ax=axs[2])
violin_plot_pandas(p, f_win, ax=axs[1])
violin_plot_pandas(p, pcna_win, ax=axs[3])
for ax in axs:
    prettify_ax(ax)
fig.tight_layout()



In [43]:

    
fig, ax = subplots()
series_scatter(pcna_corr.ix[ti(p==0)], dx_rna.frac, color='grey', 
               s=10, alpha=.1, ax=ax, ann=None)
series_scatter(pcna_corr.ix[ti(p>0)], dx_rna.frac, color=colors[0], ax=ax,
               alpha=1, s=20, ann=None)

The histone genes are not in our gene sets, but some fishing around showed that they are overexpressed in the tumor much more than their proliferation scores would suggest.



In [44]:

    
g = pd.Series(f_win.index, f_win.index)
p = g.str.startswith('HIST')
fig, axs = subplots(2,2, figsize=(8,6), sharex=True)
axs = axs.flatten()
violin_plot_pandas(p, dx_rna.frac, ax=axs[0])
violin_plot_pandas(p, pcna_corr, ax=axs[2])
violin_plot_pandas(p, f_win, ax=axs[1])
violin_plot_pandas(p, pcna_win, ax=axs[3])
for ax in axs:
    prettify_ax(ax)
fig.tight_layout()

	H	p	q	mean score
REACTOME_TRANSLATION	158.08	2.97e-36	3.86e-33	0.13
KEGG_SYSTEMIC_LUPUS_ERYTHEMATOSUS	82.42	1.10e-19	1.22e-17	0.10
KEGG_FATTY_ACID_METABOLISM	42.00	9.14e-11	5.79e-09	-0.13
REACTOME_INTERFERON_ALPHA_BETA_SIGNALING	41.59	1.13e-10	6.81e-09	0.11
KEGG_LYSOSOME	38.06	6.86e-10	3.51e-08	0.07
REACTOME_GENERIC_TRANSCRIPTION_PATHWAY	34.89	3.48e-09	1.36e-07	-0.03
REACTOME_VIF_MEDIATED_DEGRADATION_OF_APOBEC3G	34.13	5.16e-09	1.96e-07	0.09
REACTOME_ASPARAGINE_N_LINKED_GLYCOSYLATION	29.17	6.63e-08	1.76e-06	0.07
REACTOME_EXTRACELLULAR_MATRIX_ORGANIZATION	21.82	2.99e-06	5.45e-05	0.08

	H	p	q	mean score
KEGG_RIBOSOME	144.04	3.47e-33	1.16e-30	0.16
REACTOME_PACKAGING_OF_TELOMERE_ENDS	45.35	1.65e-11	1.15e-09	0.12
KEGG_FATTY_ACID_METABOLISM	42.00	9.14e-11	5.79e-09	-0.13
KEGG_LYSOSOME	38.06	6.86e-10	3.51e-08	0.07
KEGG_GRAFT_VERSUS_HOST_DISEASE	35.52	2.52e-09	1.05e-07	0.11
REACTOME_GENERIC_TRANSCRIPTION_PATHWAY	34.89	3.48e-09	1.36e-07	-0.03
KEGG_PROTEASOME	31.14	2.40e-08	7.43e-07	0.09
REACTOME_N_GLYCAN_TRIMMING_IN_THE_ER_AND_CALNEXIN_CALRETICULIN_CYCLE	24.76	6.48e-07	1.35e-05	0.18
REACTOME_EXTRACELLULAR_MATRIX_ORGANIZATION	21.82	2.99e-06	5.45e-05	0.08