Global Imports

Import shared globals from the HNSCC_Imports notebook.


In [2]:
import NotebookImport
from HNSCC_Imports import *

Calculate MATH Score


In [4]:
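maf = FH.get_submaf(run.data_path, 'HNSC', fields='All')
# Keep primary tumor samples (barcode sample type 01, vial A)
maf = maf[maf.patient.map(lambda s: s[13:16] == '01A')]
# Trim barcodes to the 12-character patient ID
maf.patient = maf.patient.map(lambda s: s[:12])
maf = maf.reset_index()
# Read counts use '---' for missing values; convert them to NaN floats
maf['t_alt_count'] = maf['t_alt_count'].replace('---', nan).astype(float)
maf['t_ref_count'] = maf['t_ref_count'].replace('---', nan).astype(float)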

In [5]:
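# Variant allele fraction (VAF) for each mutation call
frac = maf.t_alt_count / maf[['t_alt_count','t_ref_count']].sum(1)
# Drop low-fraction calls
frac = frac[frac > .075]

# Scaled median absolute deviation; defined here but not used below
get_mad = lambda s: (s - s.median()).abs().median() * 1.826
# Per-patient median and spread of the VAFs
med = frac.groupby(maf.patient).median()
mad = frac.groupby(maf.patient).mad()  # pandas .mad() is the mean absolute deviation
math = (mad / med) * 100
math.name = 'MATH'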

My numbers are a bit off; I should try to reconcile this.
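One possible source of the discrepancy, offered as a guess rather than a diagnosis: the cell above calls pandas `.mad()`, which in older pandas versions computes the mean absolute deviation, while the unused `get_mad` lambda computes a median absolute deviation. The sketch below compares the two on made-up allele fractions in plain Python. It assumes the published MATH score is 100 times the scaled median absolute deviation over the median, with the conventional scale factor 1.4826; the function names and the sample values are hypothetical.

```python
from statistics import median

def math_score(vafs, min_vaf=0.075, scale=1.4826):
    """MATH = 100 * scaled median absolute deviation / median of the VAFs.

    The scale factor 1.4826 is the conventional one (an assumption here).
    """
    kept = [v for v in vafs if v > min_vaf]
    med = median(kept)
    mad = median([abs(v - med) for v in kept]) * scale
    return 100.0 * mad / med

def mean_abs_dev(vafs):
    """Mean absolute deviation about the mean, for comparison."""
    mu = sum(vafs) / float(len(vafs))
    return sum(abs(v - mu) for v in vafs) / float(len(vafs))

# Hypothetical variant allele fractions for one tumor.
vafs = [0.10, 0.20, 0.30, 0.40, 0.50]

print(math_score(vafs))                          # median-based: about 49.4
print(100 * mean_abs_dev(vafs) / median(vafs))   # mean-based: about 40.0
```

On this toy input the two definitions differ by roughly a fifth, so the choice of deviation alone could shift a fixed cutoff noticeably.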


In [6]:
math.describe()


Out[6]:
count    302.00
mean      35.78
std        9.41
min       17.13
25%       28.69
50%       34.05
75%       41.84
max       67.77
dtype: float64

In [7]:
math.hist()


Out[7]:
<matplotlib.axes.AxesSubplot at 0x7f6f10f01f50>

I'm going to tweak their threshold as my calculation is a bit miscalibrated.


In [8]:
(math > 32).value_counts()


Out[8]:
True     182
False    120
dtype: int64

This is a bit closer, but it's hard to tell where the missing samples are.


In [9]:
(math > 31.5).value_counts()


Out[9]:
True     191
False    111
dtype: int64

In [10]:
math_t = (math > 31.5).map({True:'MATH High', False:'MATH Low'})
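The threshold comparison above can be repeated over several cutoffs at once to see how sensitive the High/Low split is. This is a minimal sketch in plain Python; `split_counts` and the scores are hypothetical stand-ins for the `math` series.

```python
def split_counts(scores, threshold):
    """Return (n_high, n_low) for a strict 'score > threshold' split."""
    n_high = sum(1 for s in scores if s > threshold)
    return n_high, len(scores) - n_high

# Hypothetical MATH scores, not taken from the data above.
scores = [17.1, 28.7, 31.8, 34.0, 41.8, 67.8]

for threshold in (31.5, 32.0, 33.0):
    print(threshold, split_counts(scores, threshold))
```

Samples whose scores sit between two candidate cutoffs (31.8 here) are the ones that switch groups, which is a quick way to list the borderline cases.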

Survival Analysis

The curves look a little off; maybe I'm using more recent data?


In [11]:
survival_and_stats(math_t, surv)
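To see why extra follow-up can change the shape of a curve, here is a minimal Kaplan-Meier sketch in plain Python. It is not the implementation behind `survival_and_stats`, which is not shown in this notebook; the function name and the toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  : follow-up time per patient
    events : True if the patient died at that time, False if censored
    """
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            surv *= 1.0 - float(deaths) / at_risk
            curve.append((t, surv))
        # Everyone observed at t (dead or censored) leaves the risk set.
        at_risk -= leaving
    return curve

# Hypothetical follow-up in years.
times = [1, 2, 2, 3, 4]
events = [True, True, False, True, False]
print(kaplan_meier(times, events))
```

Each censored patient still shrinks the risk set, so updated clinical data that converts censored observations into events, or extends follow-up, moves every later step of the curve.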


Combination of HPV and MATH

Not sure why they cut off survival at 4 years.
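For reference, cutting survival off at a fixed horizon usually means administrative censoring: follow-up is truncated at the horizon and any later event is treated as censored. The sketch below illustrates that idea only; `censor_at` is a hypothetical helper, and it is an assumption that `survival_5y` used later in this notebook was built the same way with a 5-year horizon.

```python
def censor_at(days, event, horizon_days):
    """Truncate follow-up at the horizon; later events become censored."""
    if days > horizon_days:
        return horizon_days, False
    return days, event

FOUR_YEARS = 1461  # days, approximately 4 * 365.25

# Hypothetical patients: (days of follow-up, died?)
print(censor_at(2000, True, FOUR_YEARS))   # death after the horizon is censored
print(censor_at(500, True, FOUR_YEARS))    # early death is unchanged
print(censor_at(900, False, FOUR_YEARS))   # early censoring is unchanged
```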


In [12]:
violin_plot_pandas(hpv, math)



In [13]:
survival_and_stats(combine(math_t=='MATH High', hpv), surv)


From another angle, I don't see the HPV effect in my data.


In [14]:
draw_survival_curves(math_t, surv, hpv)


Combination Analysis in HPV-

  • TP53-3p (our finding from this paper).
  • I use a different working set, with some old patients filtered out.

In [15]:
survival_and_stats(combine(del_3p<0, mut.features.ix['TP53']>0).ix[keepers_o].dropna(), 
                   clinical.survival.survival_5y)


This is with the old patients back in.


In [16]:
survival_and_stats(combine(del_3p<0, mut.features.ix['TP53']>0).ix[ti(hpv==False)].dropna(), 
                   clinical.survival.survival_5y)


TP53-MATH (Figure 6b)


In [17]:
violin_plot_pandas(mut.df.ix['TP53'], math)



In [18]:
survival_and_stats(combine(math_t=='MATH High', mut.features.ix['TP53']>0).ix[ti(hpv==False)].dropna(), 
                   clinical.survival.survival_5y)


3p Deletion-MATH


In [19]:
violin_plot_pandas(del_3p, math, order=[-2,-1,0,1])



In [20]:
survival_and_stats(combine(math_t=='MATH High', del_3p < 0).ix[ti(hpv==False)].dropna(), 
                   clinical.survival.survival_5y)


TP53-3p combination + MATH


In [23]:
combo = combine(mut.features.ix['TP53']>0, del_3p < 0)

In [24]:
violin_plot_pandas(combo, math, order=['neither','3p_deletion',
                                       'TP53','both'])
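The labels passed to `order` above suggest that `combine` turns two boolean features into a four-level category named after its inputs. The helper's source is not shown in this notebook, so the following is only a guess at its behavior, written in plain Python with hypothetical sample IDs.

```python
def combine_labels(a, b, name_a, name_b):
    """Label each shared sample by which of two boolean features it carries.

    a, b : dicts mapping sample ID -> bool
    """
    labels = {}
    for sample in set(a) & set(b):
        if a[sample] and b[sample]:
            labels[sample] = 'both'
        elif a[sample]:
            labels[sample] = name_a
        elif b[sample]:
            labels[sample] = name_b
        else:
            labels[sample] = 'neither'
    return labels

# Hypothetical patients.
tp53 = {'P1': True, 'P2': True, 'P3': False, 'P4': False}
del_3p_hit = {'P1': True, 'P2': False, 'P3': True, 'P4': False}

combo_sketch = combine_labels(tp53, del_3p_hit, 'TP53', '3p_deletion')
print(sorted(combo_sketch.items()))
```

Only samples present in both inputs receive a label here, which mirrors the `.dropna()` calls that follow `combine` elsewhere in this notebook.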



In [25]:
survival_and_stats(combine(math_t=='MATH High', combo=='both').ix[ti(hpv==False)].dropna(), 
                   clinical.survival.survival_5y)


TP53-3p combination in the context of MATH


In [26]:
draw_survival_curves(combo, surv, math_t)


MATH in the context of TP53-3p


In [27]:
draw_survival_curves(math_t=='MATH High', surv, combo)


Add third subtype from our paper


In [28]:
two_hit = combine(del_3p<0, mut_new.ix['TP53']>0) == 'both'
two_hit.name = 'two_hit'
subtypes = combine(mirna.ix['hsa-mir-548k'][:,'01'] > -1, two_hit)
subtypes = subtypes.map({'hsa-mir-548k':1, 'neither':1, 'two_hit':2, 'both': 3})
subtypes.name = 'subtype'

In [29]:
violin_plot_pandas(subtypes, math)



In [30]:
draw_survival_curves(math_t=='MATH High', surv, subtypes)



In [31]:
draw_survival_curves(subtypes, surv, math_t=='MATH High')