I describe genes that are associated with tumor tissue, independent of proliferation, as exhibiting switchiness. The idea is that rather than just being associated with growth, these signals switch on at some point in the process of tumorigenesis. We can't say that this is a causal signal of cancer, but we have a bit more evidence that these signals play a role in the transformation of the tumor cell, as opposed to being bystander processes that reflect the increased flux through growth and proliferation pathways.
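One simple way to make "associated with tumor tissue, independent of proliferation" concrete is to regress a per-gene tumor-association score on a per-gene proliferation score and keep the residual. This is only a sketch under that assumption: the scores used below in this notebook (`f_win`, `pcna_win`) come from `metaPCNA`, which is not shown here and may be computed differently. The function name, gene names, and toy values are all hypothetical.

```python
def residual_scores(tumor_assoc, prolif_corr):
    """Residual of tumor association after a least-squares fit on proliferation.

    Both arguments are dicts keyed by gene name (hypothetical inputs).
    A large positive residual means the gene is more tumor-associated
    than its proliferation score alone would predict.
    """
    genes = sorted(set(tumor_assoc) & set(prolif_corr))
    n = len(genes)
    if n < 2:
        raise ValueError("need at least two genes in common")

    x = [prolif_corr[g] for g in genes]
    y = [tumor_assoc[g] for g in genes]
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Ordinary least squares for a single predictor.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("proliferation scores are constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    return {g: yi - (intercept + slope * xi) for g, xi, yi in zip(genes, x, y)}


# Toy values only, not taken from the data in this notebook.
toy_prolif = {"GENE_A": 0.0, "GENE_B": 0.5, "GENE_C": 1.0, "GENE_D": 0.5}
toy_tumor = {"GENE_A": 0.4, "GENE_B": 0.6, "GENE_C": 0.8, "GENE_D": 0.9}

scores = residual_scores(toy_tumor, toy_prolif)
top_gene = max(scores, key=scores.get)
print(top_gene)
```

In the toy data, `GENE_D` has an average proliferation score but a high tumor association, so it ends up with the largest residual.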
In [1]:
import NotebookImport
from metaPCNA import *
In [36]:
pcna_corr.name = 'PCNA corr'
In [2]:
f_win.order().head()
Out[2]:
In [3]:
f_win.order().tail()
Out[3]:
In [4]:
pcna_win.order().tail()
Out[4]:
In [5]:
switch_plot('GABRD')
In [6]:
switch_plot('FOXM1')
In [7]:
switch_plot('SEMA5B')
In [29]:
gs2 = gene_sets.ix[f_win.dropna().index].fillna(0)
rr = screen_feature(f_win, rev_kruskal, gs2.T,
                    align=False)
fp = (1.*gene_sets.T * f_win).T.dropna().replace(0, np.nan).mean().order()
fp.name = 'mean score'
In [30]:
(rr.q < .00001).value_counts()
Out[30]:
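The cell above counts gene sets whose `q` falls below a cutoff. Assuming `q` holds Benjamini-Hochberg adjusted p-values (the notebook does not show how `screen_feature` computes it, so this is an assumption), the adjustment and the count can be sketched in plain Python as follows. The function name and the toy p-values are hypothetical.

```python
from collections import Counter


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that each q-value is the
    # minimum of p * m / rank over this rank and all larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Toy p-values, not results from this notebook.
toy_p = [0.01, 0.04, 0.03, 0.005]
toy_q = bh_qvalues(toy_p)

# Count how many tests pass or fail a cutoff, similar in spirit to
# (rr.q < cutoff).value_counts() above.
cutoff = 0.03
print(Counter(q < cutoff for q in toy_q))
```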
Greedy filter based on p-value
In [31]:
f2 = fp.ix[ti(rr.q < .0001)]
ff_u = filter_pathway_hits(rr.ix[ti(f2>0)].p.order(), gs2)
ff_p = filter_pathway_hits(rr.ix[ti(f2<0)].p.order(), gs2)
ff = ff_u.append(ff_p)
selected = rr.ix[ff[ff < .00001].index].join(fp)
selected.sort('p')
Out[31]:
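`filter_pathway_hits` is imported from elsewhere and its implementation is not shown in this notebook. As a rough illustration of what a greedy filter over ranked gene sets can look like, the sketch below walks the sets in ranked order and keeps one only if most of its genes are not already covered by a previously kept set. The function name, the overlap rule, and the `max_overlap` parameter are all assumptions made for this example, not the behavior of `filter_pathway_hits`.

```python
def greedy_filter(ranked_names, members, max_overlap=0.5):
    """Keep gene sets in ranked order, skipping largely redundant ones.

    ranked_names: gene set names, best hit first (by p-value or effect size).
    members: dict mapping gene set name to a set of gene names.
    max_overlap: hypothetical cutoff on the fraction of a set's genes
        that may already be covered by previously kept sets.
    """
    kept = []
    covered = set()
    for name in ranked_names:
        genes = members.get(name, set())
        if not genes:
            continue  # nothing to score for an empty or unknown set
        overlap = len(genes & covered) / len(genes)
        if overlap <= max_overlap:
            kept.append(name)
            covered |= genes
    return kept


# Toy gene sets, not taken from the gene_sets table used in this notebook.
toy_members = {
    "SET_A": {"g1", "g2", "g3"},
    "SET_B": {"g2", "g3", "g4"},  # mostly redundant with SET_A
    "SET_C": {"g5", "g6"},
}
print(greedy_filter(["SET_A", "SET_B", "SET_C"], toy_members))
```

The same loop works for either ranking: pass the names ordered by p-value for a significance-driven filter, or ordered by mean score for an effect-size-driven one.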
Greedy filter based on effect size
In [32]:
f2 = fp.ix[ti(rr.q < .0001)]
ff_u = filter_pathway_hits(fp.ix[ti(f2>0)].order()[::-1], gs2)
ff_p = filter_pathway_hits(fp.ix[ti(f2<0)].order(), gs2)
ff = ff_u.append(ff_p)
selected = rr.ix[ff.index].join(f2)
selected.sort('p')
Out[32]:
I am pulling out Ribosome as the top hit because it has the largest effect size among the few very significant gene sets.
In [37]:
p = gs2['KEGG_RIBOSOME']
fig, axs = subplots(2,2, figsize=(8,6), sharex=True)
axs = axs.flatten()
violin_plot_pandas(p, dx_rna.frac, ax=axs[0])
violin_plot_pandas(p, pcna_corr, ax=axs[2])
violin_plot_pandas(p, f_win, ax=axs[1])
violin_plot_pandas(p, pcna_win, ax=axs[3])
for ax in axs:
    prettify_ax(ax)
fig.tight_layout()
In [38]:
fig, ax = subplots()
series_scatter(pcna_corr.ix[ti(p==0)], dx_rna.frac, color='grey',
               s=10, alpha=.1, ax=ax, ann=None)
series_scatter(pcna_corr.ix[ti(p>0)], dx_rna.frac, color=colors[0], ax=ax,
               alpha=1, s=20, ann=None)
In [39]:
p = gs2['KEGG_LYSOSOME']
fig, axs = subplots(2,2, figsize=(8,6), sharex=True)
axs = axs.flatten()
violin_plot_pandas(p, dx_rna.frac, ax=axs[0])
violin_plot_pandas(p, pcna_corr, ax=axs[2])
violin_plot_pandas(p, f_win, ax=axs[1])
violin_plot_pandas(p, pcna_win, ax=axs[3])
for ax in axs:
    prettify_ax(ax)
fig.tight_layout()
In [40]:
p = gs2['REACTOME_PACKAGING_OF_TELOMERE_ENDS']
fig, axs = subplots(2,2, figsize=(8,6), sharex=True)
axs = axs.flatten()
violin_plot_pandas(p, dx_rna.frac, ax=axs[0])
violin_plot_pandas(p, pcna_corr, ax=axs[2])
violin_plot_pandas(p, f_win, ax=axs[1])
violin_plot_pandas(p, pcna_win, ax=axs[3])
for ax in axs:
    prettify_ax(ax)
fig.tight_layout()
In [41]:
fig, ax = subplots()
series_scatter(pcna_corr.ix[ti(p==0)], dx_rna.frac, color='grey',
               s=10, alpha=.1, ax=ax, ann=None)
series_scatter(pcna_corr.ix[ti(p>0)], dx_rna.frac, color=colors[0], ax=ax,
               alpha=1, s=20, ann=None)
In [42]:
p = gs2['KEGG_FATTY_ACID_METABOLISM']
fig, axs = subplots(2,2, figsize=(8,6), sharex=True)
axs = axs.flatten()
violin_plot_pandas(p, dx_rna.frac, ax=axs[0])
violin_plot_pandas(p, pcna_corr, ax=axs[2])
violin_plot_pandas(p, f_win, ax=axs[1])
violin_plot_pandas(p, pcna_win, ax=axs[3])
for ax in axs:
    prettify_ax(ax)
fig.tight_layout()
In [43]:
fig, ax = subplots()
series_scatter(pcna_corr.ix[ti(p==0)], dx_rna.frac, color='grey',
               s=10, alpha=.1, ax=ax, ann=None)
series_scatter(pcna_corr.ix[ti(p>0)], dx_rna.frac, color=colors[0], ax=ax,
               alpha=1, s=20, ann=None)
The histone genes are not in our gene sets, but some fishing around showed that they are overexpressed in the tumor much more than their proliferation scores would suggest.
In [44]:
g = pd.Series(f_win.index, f_win.index)
p = g.str.startswith('HIST')
fig, axs = subplots(2,2, figsize=(8,6), sharex=True)
axs = axs.flatten()
violin_plot_pandas(p, dx_rna.frac, ax=axs[0])
violin_plot_pandas(p, pcna_corr, ax=axs[2])
violin_plot_pandas(p, f_win, ax=axs[1])
violin_plot_pandas(p, pcna_win, ax=axs[3])
for ax in axs:
    prettify_ax(ax)
fig.tight_layout()