UNKOWN genotype demography experiment

Summary

to be filled
to install run pip install git+https://github.com/ctzhu/lifemodels, assuming a running python installation is inplace
I have to make WIN32 dynamic libraries, before that the following notebook can only run in MacOX



In [1]:

    
import os
#os.chdir('/Users/q6600sl/IPython_NB')
from lifemodels import s_models
%matplotlib inline

· Read the data and create a survival object using `.survt_df()` method

The only thing I add is the column name for the genotypes ('Genotype')



In [2]:

    
#Read the data
original_df = pd.read_csv('/Users/q6600sl/Downloads/SP_12-22-15.txt', sep=' ')
#1st step: create a survival object
surv_df    = s_models.survt_df(original_df)



In [3]:

    
original_df.head()

· Specify the model and create the model fit object using `.distfit_df()` method



In [4]:

    
gp3_fit    = s_models.distfit_df(surv_df, 'gp3')



In [5]:

    
gp3_fit.mdl_all_free









    Out[5]:






  
    
      
      alpha
      beta
      c
      logL
      optimizer_flag
    
    
      Genotype
      
      
      
      
      
    
  
  
    
      Mut-1
      -8.538838
      -2.185154
      -6.430742
      679.274162
      0
    
    
      Mut-12-1
      -5.341972
      -2.250901
      -4.752080
      551.716119
      0
    
    
      Mut-12-2
      -9.383152
      -1.832199
      -7.601744
      954.244459
      0
    
    
      Mut-12-3
      -6.429694
      -2.342217
      -34.558168
      1139.981590
      0
    
    
      Mut-2
      -9.890958
      -1.887198
      -6.820190
      641.991652
      0
    
    
      WT
      -6.787571
      -2.011953
      -6.565602
      651.463453
      0

· We can get the variance of the parameter estimates

It might not always be there, due to float overflow and underflow problems



In [6]:

    
gp3_fit.var

· Get the estimated log time-to-reach-%-of-death using `.logq()` method

.logq(key, q), key is genotype here. q is that target % of death. q is default to 50%, therefore by default it returns the estimate for medium lifespan.

.logq() method and .logq_nc() method are similar: the latter ignores the Makehamm term, if there is one.



In [7]:

    
(gp3_fit.logq('Mut-1'), gp3_fit.logq_nc('Mut-1'))









    Out[7]:





(3.9562204965442995, 3.9751914478113077)



In [8]:

    
medium_df = pd.DataFrame({'With_C'   : map(gp3_fit.logq, gp3_fit.mdl_all_free.index),
                          'Without_C': map(gp3_fit.logq_nc, gp3_fit.mdl_all_free.index)},
                         index=gp3_fit.mdl_all_free.index)
np.exp(medium_df)



In [9]:

    
ax = medium_df.plot(kind='bar', alpha=0.5)
ax.set_ylim(3, 4.5)

right_y = ax.twinx()
right_y.set_ylim(3, 4.5)
_ = right_y.set_yticklabels(map('{:.2f}'.format,  np.exp(right_y.get_yticks())))

ax.set_ylabel('Medium survival time in log(days)')
right_y.set_ylabel('Medium survival time in days')
plt.title('Medium lifespan estimated based on Gompertz Makeham Model', y=1.05)









    Out[9]:





<matplotlib.text.Text at 0x121dbec50>

· Get the estimated log mean lifespan using `.logm()` method



In [10]:

    
(gp3_fit.logm('Mut-1'), gp3_fit.logm_nc('Mut-1'))









    Out[10]:





(3.9031295162640411, 3.940987933505439)

· We can also calculate the variance of time-to-percentage-death estimiates



In [11]:

    
gp3_fit.logt_var('Mut-1', 3.9)









    Out[11]:





0.0029731768200757795

· Plot for parameter estimates



In [12]:

    
f, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(16, 5))

plot_par = {'logy' : True,
            'alpha': 0.5,
            'color': list('BBBBBR')} 

np.exp(gp3_fit.mdl_all_free).ix[:, 'alpha'].plot(kind='bar', ax=ax1, **plot_par)
np.exp(gp3_fit.mdl_all_free).ix[:, 'beta'].plot(kind='bar', ax=ax2, **plot_par)
np.exp(gp3_fit.mdl_all_free).ix[:, 'c'].plot(kind='bar', ax=ax3, **plot_par)

ax1.set_title(r'Gompertz $\alpha$')
ax2.set_title(r'Gompertz $\beta$')
ax3.set_title(r'Gompertz $C$')

plt.suptitle('Gompertz Makeham Model parameter estimates', fontsize=16, y=0.99)









    Out[12]:





<matplotlib.text.Text at 0x121e953d0>

· Plot harzard plot, survival plot and other plots using `.plot()` method



In [13]:

    
f, axes = plt.subplots(2, 3, figsize=(14, 8), sharex=True, sharey=True)
axes = axes.ravel()
for ax, key in zip(axes, gp3_fit.mdl_all_free.index.tolist()):
    gp3_fit.plot(key, np.linspace(0,80,100),ax,'hzd')
    ax.set_ylim((-8,1))
    ax.set_title(key)
plt.suptitle('log-harzard plot', fontsize=20)









    Out[13]:





<matplotlib.text.Text at 0x121d97690>



In [14]:

    
f, axes = plt.subplots(2, 3, figsize=(14, 8), sharex=True, sharey=True)
axes = axes.ravel()
for ax, key in zip(axes, gp3_fit.mdl_all_free.index.tolist()):
    gp3_fit.plot(key, np.linspace(0,80,100),ax,'sf')
    ax.set_ylim((-0.01,1.01))
    ax.set_title(key)
plt.suptitle('survival plot', fontsize=20)









    Out[14]:





<matplotlib.text.Text at 0x109009d10>



In [ ]:

	alpha	beta	c
Genotype
Mut-1	0.368075	0.060483	0.343668
Mut-12-1	0.550833	0.159461	0.363832
Mut-12-2	0.372298	0.051876	0.755458
Mut-12-3	NaN	NaN	inf
Mut-2	0.445852	0.058488	0.414629
WT	0.343995	0.073766	0.658950

	With_C	Without_C
Genotype
Mut-1	52.259438	53.260313
Mut-12-1	25.143313	26.476313
Mut-12-2	44.720688	44.890313
Mut-12-3	39.864938	38.962875
Mut-2	49.901563	50.415500
WT	32.962188	33.061313

	alpha	beta	c	logL	optimizer_flag
Genotype
Mut-1	-8.538838	-2.185154	-6.430742	679.274162	0
Mut-12-1	-5.341972	-2.250901	-4.752080	551.716119	0
Mut-12-2	-9.383152	-1.832199	-7.601744	954.244459	0
Mut-12-3	-6.429694	-2.342217	-34.558168	1139.981590	0
Mut-2	-9.890958	-1.887198	-6.820190	641.991652	0
WT	-6.787571	-2.011953	-6.565602	651.463453	0

UNKOWN genotype demography experiment

Summary

· Read the data and create a survival object using .survt_df() method

· Specify the model and create the model fit object using .distfit_df() method