## Custom regression models

Like for univariate models, it is possible to create your own custom parametric survival models. Why might you want to do this?

• Create new / extend AFT models using known probability distributions
• Create a piecewise model using domain knowledge about subjects
• Iterate and fit a more accurate parametric model

lifelines has a very simple API to create custom parametric regression models. You only need to define the cumulative hazard function. For example, the cumulative hazard for the constant-hazard regression model looks like:

$$H(t, x) = \frac{t}{\lambda(x)}\\ \lambda(x) = \exp{(\vec{\beta} \cdot \vec{x}^{\,T})}$$

where $\beta$ are the unknowns we will optimize over.

Below are some example custom models.



In [1]:

from lifelines.fitters import ParametricRegressionFitter
from autograd import numpy as np

class ExponentialAFTFitter(ParametricRegressionFitter):

# this class property is necessary, and should always be a non-empty list of strings.
_fitted_parameter_names = ['lambda_']

def _cumulative_hazard(self, params, t, Xs):
# params is a dictionary that maps unknown parameters to a numpy vector.
# Xs is a dictionary that maps unknown parameters to a numpy 2d array
beta = params['lambda_']
X = Xs['lambda_']
lambda_ = np.exp(np.dot(X, beta))
return t / lambda_

rossi['intercept'] = 1.0

# the below variables maps dataframe columns to parameters
regressors = {
'lambda_': rossi.columns
}

eaf = ExponentialAFTFitter().fit(rossi, 'week', 'arrest', regressors=regressors)
eaf.print_summary()




.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}

.dataframe tbody tr th {
vertical-align: top;
}

text-align: right;
}

model
lifelines.ExponentialAFTFitter

duration col
'week'

event col
'arrest'

number of observations
432

number of events observed
114

log-likelihood
-686.37

time fit was run
2020-01-21 22:13:30 UTC

coef
exp(coef)
se(coef)
coef lower 95%
coef upper 95%
exp(coef) lower 95%
exp(coef) upper 95%
z
p
-log2(p)

lambda_
fin
0.37
1.44
0.19
-0.01
0.74
0.99
2.10
1.92
0.06
4.18

age
0.06
1.06
0.02
0.01
0.10
1.01
1.10
2.55
0.01
6.52

race
-0.30
0.74
0.31
-0.91
0.30
0.40
1.35
-0.99
0.32
1.63

wexp
0.15
1.16
0.21
-0.27
0.56
0.76
1.75
0.69
0.49
1.03

mar
0.43
1.53
0.38
-0.32
1.17
0.73
3.24
1.12
0.26
1.93

paro
0.08
1.09
0.20
-0.30
0.47
0.74
1.59
0.42
0.67
0.57

prio
-0.09
0.92
0.03
-0.14
-0.03
0.87
0.97
-3.03
<0.005
8.65

intercept
4.05
57.44
0.59
2.90
5.20
18.21
181.15
6.91
<0.005
37.61

.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}

.dataframe tbody tr th {
vertical-align: top;
}

text-align: right;
}

Log-likelihood ratio test
31.22 on 6 df, -log2(p)=15.41




In [17]:

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

class DependentCompetingRisksHazard(ParametricRegressionFitter):
"""

Reference
--------------
Frees and Valdez, UNDERSTANDING RELATIONSHIPS USING COPULAS

"""

_fitted_parameter_names = ["lambda1", "rho1", "lambda2", "rho2", "alpha"]

def _cumulative_hazard(self, params, T, Xs):
lambda1 = np.exp(np.dot(Xs["lambda1"], params["lambda1"]))
lambda2 = np.exp(np.dot(Xs["lambda2"], params["lambda2"]))
rho2 =    np.exp(np.dot(Xs["rho2"],    params["rho2"]))
rho1 =    np.exp(np.dot(Xs["rho1"],    params["rho1"]))
alpha =   np.exp(np.dot(Xs["alpha"],   params["alpha"]))

return ((T / lambda1) ** rho1 + (T / lambda2) ** rho2) ** alpha

fitter = DependentCompetingRisksHazard(penalizer=0.1)

rossi["intercept"] = 1.0
rossi["week"] = rossi["week"] / rossi["week"].max() # scaling often helps with convergence

covariates = {
"lambda1": rossi.columns,
"lambda2": rossi.columns,
"rho1": ["intercept"],
"rho2": ["intercept"],
"alpha": ["intercept"],
}

fitter.fit(rossi, "week", event_col="arrest", regressors=covariates, timeline=np.linspace(0, 2))
fitter.print_summary(2)

ax = fitter.plot()

ax = fitter.predict_survival_function(rossi.loc[::100]).plot(figsize=(8, 4))
ax.set_title("Predicted survival functions for selected subjects")




.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}

.dataframe tbody tr th {
vertical-align: top;
}

text-align: right;
}

model
lifelines.DependentCompetingRisksHazard

duration col
'week'

event col
'arrest'

penalizer
0.1

number of observations
432

number of events observed
114

log-likelihood
-227.75

time fit was run
2020-01-21 22:18:10 UTC

coef
exp(coef)
se(coef)
coef lower 95%
coef upper 95%
exp(coef) lower 95%
exp(coef) upper 95%
z
p
-log2(p)

lambda1
fin
0.02
1.02
0.22
-0.40
0.44
0.67
1.55
0.09
0.93
0.11

age
0.01
1.01
0.00
0.01
0.02
1.01
1.02
3.34
<0.005
10.25

race
0.05
1.05
0.17
-0.29
0.39
0.75
1.48
0.31
0.76
0.40

wexp
-0.21
0.81
0.04
-0.29
-0.13
0.75
0.87
-5.44
<0.005
24.18

mar
0.18
1.20
0.19
-0.19
0.56
0.83
1.75
0.96
0.34
1.56

paro
0.02
1.02
0.12
-0.20
0.25
0.82
1.28
0.20
0.84
0.25

prio
-0.01
0.99
0.01
-0.03
0.00
0.97
1.00
-1.35
0.18
2.49

intercept
-0.00
1.00
0.03
-0.06
0.06
0.94
1.06
-0.08
0.94
0.10

lambda2
fin
0.21
1.23
0.20
-0.19
0.61
0.83
1.84
1.03
0.30
1.72

age
0.02
1.02
0.01
0.00
0.04
1.00
1.04
2.05
0.04
4.64

race
-0.21
0.81
0.22
-0.64
0.21
0.53
1.24
-0.98
0.33
1.62

wexp
0.24
1.27
0.13
-0.02
0.49
0.98
1.64
1.79
0.07
3.77

mar
0.17
1.18
0.21
-0.25
0.58
0.78
1.79
0.78
0.43
1.20

paro
0.01
1.01
0.14
-0.27
0.29
0.76
1.33
0.06
0.95
0.07

prio
-0.07
0.93
0.02
-0.11
-0.03
0.90
0.97
-3.22
<0.005
9.59

intercept
0.61
1.84
0.07
0.48
0.74
1.61
2.09
9.14
<0.005
63.82

rho1
intercept
2.87
17.69
0.39
2.12
3.63
8.31
37.63
7.46
<0.005
43.38

rho2
intercept
0.30
1.35
0.66
-0.99
1.59
0.37
4.89
0.46
0.65
0.63

alpha
intercept
-0.00
1.00
0.14
-0.29
0.28
0.75
1.32
-0.03
0.98
0.04

.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}

.dataframe tbody tr th {
vertical-align: top;
}

text-align: right;
}

Log-likelihood ratio test
36.86 on 17 df, -log2(p)=8.15

Out[17]:

Text(0.5, 1.0, 'Predicted survival functions for selected subjects')



### Cure models

Suppose in our population we have a subpopulation that will never experience the event of interest. Or, for some subjects the event will occur so far in the future that it's essentially at time infinity. In this case, the survival function for an individual should not asymptically approach zero, but some positive value. Models that describe this are sometimes called cure models (i.e. the subject is "cured" of death and hence no longer susceptible) or time-lagged conversion models.

It would be nice to be able to use common survival models and have some "cure" component. Let's suppose that for individuals that will experience the event of interest, their survival distrubtion is a Weibull, denoted $S_W(t)$. For a random selected individual in the population, thier survival curve, $S(t)$, is:

\begin{align*} S(t) = P(T > t) &= P(\text{cured}) P(T > t\;|\;\text{cured}) + P(\text{not cured}) P(T > t\;|\;\text{not cured}) \\ &= p + (1-p) S_W(t) \end{align*}

Even though it's in an unconvential form, we can still determine the cumulative hazard (which is the negative logarithm of the survival function):

$$H(t) = -\log{\left(p + (1-p) S_W(t)\right)}$$


In [9]:

class CureModel(ParametricRegressionFitter):
_scipy_fit_method = "SLSQP"
_scipy_fit_options = {"ftol": 1e-10, "maxiter": 200}

_fitted_parameter_names = ["lambda_", "beta_", "rho_"]

def _cumulative_hazard(self, params, T, Xs):
c = expit(np.dot(Xs["beta_"], params["beta_"]))

lambda_ = np.exp(np.dot(Xs["lambda_"], params["lambda_"]))
rho_ = np.exp(np.dot(Xs["rho_"], params["rho_"]))
sf = np.exp(-(T / lambda_) ** rho_)

return -np.log((1 - c) + c * sf)

cm = CureModel(penalizer=0.0)

rossi["intercept"] = 1.0

covariates = {"lambda_": rossi.columns, "rho_": ["intercept"], "beta_": ['intercept', 'fin']}

cm.fit(rossi, "week", event_col="arrest", regressors=covariates, timeline=np.arange(250))
cm.print_summary(2)




.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}

.dataframe tbody tr th {
vertical-align: top;
}

text-align: right;
}

model
lifelines.CureModel

duration col
'week'

event col
'arrest'

number of observations
432

number of events observed
114

log-likelihood
-679.51

time fit was run
2020-01-21 22:14:20 UTC

coef
exp(coef)
se(coef)
coef lower 95%
coef upper 95%
exp(coef) lower 95%
exp(coef) upper 95%
z
p
-log2(p)

lambda_
fin
0.15
1.16
1.09
-1.99
2.29
0.14
9.89
0.14
0.89
0.17

age
0.03
1.04
0.16
-0.28
0.35
0.76
1.42
0.22
0.83
0.27

race
-0.28
0.75
0.58
-1.43
0.86
0.24
2.37
-0.49
0.63
0.67

wexp
0.20
1.23
0.20
-0.19
0.60
0.82
1.82
1.01
0.31
1.67

mar
0.34
1.41
0.23
-0.10
0.79
0.90
2.20
1.50
0.13
2.92

paro
0.03
1.04
0.18
-0.32
0.39
0.73
1.47
0.19
0.85
0.24

prio
-0.08
0.93
0.03
-0.13
-0.02
0.87
0.98
-2.52
0.01
6.42

intercept
3.76
43.15
0.07
3.62
3.91
37.27
49.95
50.39
<0.005
inf

rho_
intercept
0.43
1.54
0.07
0.30
0.56
1.35
1.75
6.37
<0.005
32.28

beta_
fin
-0.41
0.66
0.98
-2.34
1.52
0.10
4.56
-0.42
0.68
0.56

intercept
0.70
2.02
0.12
0.47
0.93
1.60
2.54
5.95
<0.005
28.50

.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}

.dataframe tbody tr th {
vertical-align: top;
}

text-align: right;
}

Log-likelihood ratio test
34.22 on 9 df, -log2(p)=13.58




In [18]:

cm.predict_survival_function(rossi.loc[::100]).plot(figsize=(12,6))




Out[18]:

<matplotlib.axes._subplots.AxesSubplot at 0x1278726a0>




In [11]:

# what's the effect on the survival curve if I vary "age"
fig, ax = plt.subplots(figsize=(12, 6))

cm.plot_covariate_groups(['age'], values=np.arange(20, 50, 5), cmap='coolwarm', ax=ax)




Out[11]:

<matplotlib.axes._subplots.AxesSubplot at 0x11167bb38>



### Spline models

See royston_parmar_splines.py in the examples folder: https://github.com/CamDavidsonPilon/lifelines/tree/master/examples