This notebook is to facilitate collaboration on building and evaluating models to estimate pharmaceutical product expiration date. The notebook contains:
In this example the impurity product, $P$, can either be produced from crystalline drug, $D$, or an amorphous intermediate, $I$, that is more reactive (this example is extracted from Waterman et al. Pharm Res Vol 24 2007 p 783).
Expressing the system in matrix form. \begin{equation} \frac{d}{dt} \begin{bmatrix} P \\ I \\ D \end{bmatrix} = \begin{bmatrix} 0 & k_2 & k_3 \\ 0 & (-k_{1r} - k_2) & k_{1f} \\ 0 & k_{1r} & (-k_{1f}-k_3) \end{bmatrix} \begin{bmatrix} P \\ I \\ D \end{bmatrix} \end{equation}
The rate constants are assumed to follow an Arrhenius temperature dependence: \begin{equation} k = A \exp \left (- \frac{E}{RT} \right ) \end{equation}
Edit the parameters below for the above reaction model.
A : refers to the Arrhenius prefactor ($\textrm{sec}^{-1}$). The subfix is defined by the above model.
E : refers to the Arrhenius activation energy (kcal/mol). The subfix is defined by the above model.
Po, Io, Do : refer to the initial product $P$, intermediate, $I$ and drug, $D$, component concentrations in the above model. The concentrations are defined as the fraction of component relative to the total amount of starting reactant. For example, if the starting formulation has 90% of the drug as crystalline, and 10% amorphous, then the ratios would be $D_o$=0.9, $I_o$=0.1, and $P_o$=0.
Temperatures: refers to the stability testing temperatures.
days: refers to the time points that each sample is measured.
impurity_setpoint: defines the acceptance criteria for the impurity concentration
When finished run menu Cell -> Run Cells. Scroll down to see the updated results.
In [82]:
# kinetic parameters (kcal/mol)
A1f = 1e4
E1f = 22
A1r = 1e4
E1r = 26
A2 = 1e6
E2 = 20
A3 = 1e5
E3 = 21
Po = 0
Io = .1
Do = 0.9
# temperatures (up to 4 different temperatures)
Temperatures = [25, 40, 60, 80]
# time points in days
days = [[0, 7, 14], # days at first temperature
[0, 5, 10], # days at second temperature
[0, 3, 7], # days at third temperature
[0, 1, 5]] # days at fourth temperature
# acceptance criteria for impurity concentration
# final drug concentration = initial drug concentration (1 - impurity setpoint)
# or
# impurity concentration = initial drug concentration * impurity setpoint
impurity_setpoint = 0.05
In [83]:
# Python module imports
import numpy as np
from matplotlib import pyplot as plt
import pandas as pd
from scipy.integrate import odeint
from IPython.display import Image
from IPython.core.display import HTML
from scipy.optimize import minimize
from scipy.optimize import curve_fit
import statsmodels.formula.api as sm
%matplotlib inline
from pdb import set_trace
plt.rcParams.update({'figure.figsize':(8,6),
'figure.titlesize': 16.,
'axes.labelsize':12.,
'xtick.labelsize':12.,
'legend.fontsize': 12.})
In [84]:
# gas constant kcal/mol K
R = 0.00198588
# define the domain for the ODEs solution
ndays = np.max(days) + 365*10
dt = 5000.
t = np.arange(0, ndays*(24*3600), dt)
npts = t.shape[0]
# differential equation solution variable
cols = [(T, C) for T in Temperatures for C in ['P','I','D']]
concentrations = pd.DataFrame({col: np.zeros(t.shape[0]) for col in cols})
concentrations['t'] = t
# experimental results
iterables = [(T, d) for i, T in enumerate(Temperatures) for d in days[i]]
index = pd.MultiIndex.from_tuples(iterables, names=['Temperature', 'days'])
results = pd.DataFrame(data=np.zeros(12), index=index, columns=['P'])
# predicted times
predictions = pd.DataFrame(data=np.zeros(len(Temperatures)), index=Temperatures, columns=['Theory'])
# calculate rate constants from the user defined Arrhenius values
def Arrhenius(A, E, T):
return A*np.exp(-E/(R*T))
# calculate the rate constants for each user defined temperature
k1f = [Arrhenius(A1f, E1f, T+273.) for T in Temperatures]
k1r = [Arrhenius(A1r, E1r, T+273.) for T in Temperatures]
k2 = [Arrhenius(A2, E2, T+273.) for T in Temperatures]
k3 = [Arrhenius(A3, E3, T+273.) for T in Temperatures]
# definition of the derivative of the concentrations vector variable.
def deriv(y, t, k1f, k1r, k2, k3):
P, I, D = y
dPdt = k2*I + k3*D
dIdt = (-k1r-k2)*I + k1f*D
dDdt = k1r*I + (-k1f-k3)*D
dydt = [dPdt, dIdt, dDdt]
return dydt
# initial conditions (user defined)
yo = [Po, Io, Do]
# call solver for each user defined temperature
for i, T in enumerate(Temperatures):
yt = odeint(deriv, yo, t, args=(k1f[i], k1r[i], k2[i], k3[i]))
for j, C in enumerate(['P', 'I', 'D']):
concentrations[T,C] = yt[:,j]
# find the impurity concentration at the user defined time points
index = (np.array(days)/float(ndays)*npts).astype('int')
for i, T in enumerate(Temperatures):
for j, d in enumerate(days[i]):
results.loc[T,d].P = concentrations[T,'P'].iloc[index[i,j]]
# find the time that the reaction will produce the user-defined impurity concentration
for T in Temperatures:
predictions.loc[T].Theory = t[np.argmin(np.abs(concentrations[T,"P"] - impurity_setpoint))]/(3600*24*365)
# define a "private" dictionary to contain all data
_dat = dict(kinetics={'A1f': A1f, 'E1f': E1f, 'A1r': A1r, 'E1r': E1r, 'A2': A2, 'E2': E2, 'A3':A3, 'E3': E3},
initial={'Po': Po, 'Io': Io, 'Do':Do},
setpt=impurity_setpoint)
_dat['const'] = {'R': R}
_dat['conc'] = concentrations
_dat['exppts'] = results
_dat['predictions'] = predictions
In [85]:
t = _dat['conc']['t'].values
t_days = t / (24*3600)
max_days_index = np.argmin(np.abs(t_days - (np.max(_dat['exppts'].index.levels[1].values)+1)))
t_days = t_days[:max_days_index]
c = _dat['conc'].iloc[:max_days_index]
max_conc = c.iloc[:, c.columns.get_level_values(1)=='P'].max().max()
fig = plt.figure(figsize=(14,10))
for i, T in enumerate(Temperatures):
ax = fig.add_subplot(2,2,i+1)
ax.plot(t_days, c[T,'P'], color='red', label='P')
ax.plot(_dat['exppts'].loc[T].index.values, _dat['exppts'].loc[T].P, ls="none", marker='o', color="red")
ax2 = ax.twinx()
ax2.plot(t_days, c[T,'I'], color='green', label='I')
ax2.plot(t_days, c[T,'D'], color='blue', label='D')
ax.set_ylim([0, max_conc])
ax2.set_ylim([0,1])
ax.set_xlabel("time (days)")
ax.set_ylabel("impurity concentration (%)")
ax2.set_ylabel("intermediate and drug concentration (%) ")
ax.legend(['P'], loc="upper left")
ax2.legend(loc="upper right")
ax.set_title("%s C"%(Temperatures[i]))
Concentration profiles: The solid lines are the exact concentrations of the components $P$, $I$, $D$ as a function of time calculated from the defined reaction parameters. The filled symbols denote the experimental time points (defined by the user in the parameter list).
In [86]:
_dat['exppts']
Out[86]:
Theoretical concentration measurements: The above tabulates the theoretical impurity concentrations, $P$, at the measurement time points.
In [87]:
_dat['setpt']
Out[87]:
In [88]:
t = _dat['conc']['t'].values
c = _dat['conc']
t_days = t / (24*3600)
max_conc = c.iloc[:, c.columns.get_level_values(1)=='P'].max().max()
fig = plt.figure(figsize=(14,10))
for i, T in enumerate(Temperatures):
ax = fig.add_subplot(2,2,i+1)
ax.plot(t_days, c[T,'P'], color='red', label='P')
ax.plot(t_days, [_dat['setpt']]*t_days.shape[0], color='red', ls=':')
ax.plot(_dat['exppts'].loc[T].index.values, _dat['exppts'].loc[T].P, ls="none", marker='o', color="red")
ax.set_ylim([0, max_conc])
ax.set_xlabel("time (days)")
ax.set_ylabel(r"impurity concentration (%)")
ax.legend(['P'], loc="upper left")
ax.set_title("%s C"%(Temperatures[i]))
ax2 = ax.twinx()
ax2.plot(t_days, c[T,'I'], color='green', label='I')
ax2.plot(t_days, c[T,'D'], color='blue', label='D')
ax2.set_ylim([0,1])
#ax2.set_ylabel(r"drug concentration")
ax2.legend(loc="upper right")
Concentration profiles: The plots above show the concentration profiles over a duration sufficient that the impurity concentrations reaches the user defined set point (denoted by the red dashed line).
In [89]:
_dat['predictions']
Out[89]:
Shelf life: The above tabulates the theoretical shelf life in years calculated from the user defined kinetic parameters. The value for $25~ ^\circ C$ is the most useful as this is the number that the following models will attempt to predict.
The actual reaction mechanism is unknown in advance, thus a simple reaction mechanism is assumed.
The time is then calculated as:
\begin{equation}
t = \frac{P}{\hat{A} \exp(-\frac{\hat{E}}{RT})}
\end{equation}
In [90]:
fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)
r = _dat['exppts'].iloc[_dat['exppts'].index.get_level_values(1)!=0].copy()
r['y'] = np.log(np.divide(r['P'].values, r.index.get_level_values(1).astype('f8').values*24*3600+1))
r['x'] = 1./(r.index.get_level_values(0).astype('f8').values + 273)
ax.plot(r.x, r.y, ls='none', marker='o')
ax.set_xlabel(r'$1/T$')
ax.set_ylabel(r'$\ln (P/t)$')
reg = sm.ols(formula="y ~ x", data=r).fit()
pred = reg.params[1]*r.x + reg.params[0]
ax.plot(r.x, pred)
A = np.exp(reg.params[0])
E = -1*reg.params[1]*R
print "regression fit for A = %s "%(A)
print "regression fit for E = %s"%(E)
Regression analysis: The above figure plots the logarithm of the measured impurity concentration divided by time, $\ln (P/t)$, against the reciprical temperature, $1/T$ (filled circles), which would be linear for a zero-order kinetics. The line is the result of a linear fit to the data.
In [91]:
print reg.summary()
In [92]:
temp = _dat['const']['R']*(_dat['predictions'].index.astype('f8').values+273)
k = A * np.exp(-E/temp)
_dat['predictions']['zero_order'] = _dat['setpt'] / k * 1/(3600*24*365)
In [93]:
_dat['predictions']
Out[93]:
Table of predicted results The above table compares the theoretical shelf-life in years (calculated from the exact reaction mechanism and user defined parameters) to the predicted shelf-life assuming zero order kinetics.
Assuming a reaction mechanism as:
One method to obtain the Arrhenius parameters is to a stepwise calculation. First, the impurity concentration experiments conducted at the same temperature, $T$, the kinetic parameter, $k$, at the temperature $T$ can be obtained by plotting log of the drug concentration over time (recall that $P = D_{To} - D$). \begin{equation} D_{To} - D = D_{To} (1-e^{-k t}) \end{equation} \begin{equation} D = D_{To} e^{-k t} \end{equation} \begin{equation} \ln D = \ln D_{To} - k t \end{equation} \begin{equation} \ln \left ( \frac{D}{D_{To}} \right ) = - k t \end{equation}
In [94]:
D = Do + Io - _dat['exppts']['P']
data = np.log(D/(Do+Io)).to_frame('ln(D/Do)')
data['t'] = data.index.get_level_values(1)*24*3600
data['k'] = [0]*data['t'].shape[0]
colors = ['blue','green','red','orange']
fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)
for i, T in enumerate(_dat['exppts'].index.levels[0]):
ax.plot(data.loc[T]['t'], data.loc[T]['ln(D/Do)'], ls='none', marker='o', color=colors[i])
slope = sm.OLS(data.loc[T]['ln(D/Do)'], data.loc[T]['t']).fit().params[0]
data.loc[(T,slice(None)),'k'] = -1 * slope
ax.plot(data.loc[T]['t'], data.loc[T]['t']*slope,
ls='--', color=colors[i], label="%.2e 1/s"%(-1*slope))
ax.set_ylabel(r'ln $\left ( D/D_{To} \right )$')
ax.set_xlabel(r'$t$ (sec)')
plt.legend(loc='best')
Out[94]:
Figure $\ln \left ( \frac{D}{D_o} \right ) ~\mathrm{vs}~t$: to estimate the the rate constant at each experimental temperature, which is subsequently used to estimate the activation energy and collision factor for initial guess in the non-linear regression.
In [95]:
data
Out[95]:
Table of the kinetic parameter, $k$ regressed from data at the different experimental temperatures.
Once the kinetic parameter, $k$, is estimated at the different temperatures, the Arrenius parameters, $A$ and $E$, can be obtained by plotting the logarithm of the rate constant at each experimental temperature against the reciprical temperature. \begin{equation} \ln k = \ln A - \frac{E}{R} \frac{1}{T} \end{equation}
In [96]:
_dat['exppts']
Out[96]:
In [97]:
# variable data is used from above cell
df = pd.DataFrame({'x': 1./(data.index.levels[0]+273).values, 'y': np.log(data.loc[(slice(None),0),'k'])})
reg = sm.ols(formula='y~x', data=df).fit()
A = np.exp(reg.params['Intercept'])
E = -1 * R * reg.params['x']
fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)
ax.plot(df.x, df.y, marker='o', ls='none', color='blue', label='data')
ax.plot(df.x, df.x*reg.params['x'] + reg.params['Intercept'], ls="--", color='blue', label='fit')
ax.set_xlabel(r"$1/T$ (1/K)")
ax.set_ylabel(r"log $k$")
ax.legend()
print "regression fit for A = %.2e"%A
print "regression fit for E = %.2f"%E
Figure $\ln k ~\mathrm{vs}~\frac{1}{T}$: to estimate the activation energy and the collision factor to use as initial guesses in the subsequent nonlinear regression.
The estimated Arrhenius parameters can now be used to estimate shelf-life.
In [98]:
temp = _dat['const']['R']*(_dat['predictions'].index.astype('f8').values+273)
k = A * np.exp(-E/temp)
_dat['predictions']['first_order'] = _dat['setpt'] / k * 1/(3600*24*365)
_dat['predictions']
Out[98]:
According to a paper by Fung (Statistical prediction of drug stability based on nonlinear parameter estimation, Fung et al Journal Pharm Sci Vol 73 No 5 pg 657-662 1984), improved confidence can be obtained by performing a non-linear regression of all the data simuntaneously, instead of the stepwise approach. In Fung's paper, they pre-assumed a shelf-life defined by 10% degredation of the product, which simplifies the estimation. However, the regression done here will be performed without the assumption.
Thus, the Arrhenius parameters are found by finding the roots to the above equation, where $A$ and $E$ are the variables.
In [99]:
D = Do + Io - _dat['exppts']['P'].values
df = pd.DataFrame(np.log(D/(Do+Io)), columns=['D*'])
df['t'] = _dat['exppts'].index.get_level_values(1)*24*3600
df['T'] = _dat['exppts'].index.get_level_values(0) + 273.
df = df.loc[df['t']!=0]
def error(p):
A = p[0]
E = p[1]
return np.sum((df['D*']/df['t'] + A * np.exp(-1. * E/(R*df['T'])))**2)
The optimization will be performed over a range of initial conditions to inspect convergence problems.
In [100]:
xo = np.array([A,E])
opt = []
for d in np.arange(.1,1.5,.1):
opt.append(minimize(error, xo*d, tol=1e-30))
Plot the square errors for a series of initial conditions.
In [101]:
e2 = np.array([error(r.x) for r in opt])
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(e2)
ax.set_ylabel(r"$\sum \mathrm{error}^2$")
ax.set_xlabel("initial condition index")
A, E = (opt[e2.argmin()]).x
print "optimal A = %.2e 1/s" %A
print "optimal E = %.2f kcal/mol K" %E
Figure of square error indicates that the objective function likely has some very shallow gradients near the minimum, thus inhibiting convergence to a unique value.
In [102]:
temp = _dat['const']['R']*(_dat['predictions'].index.astype('f8').values+273)
k = A * np.exp(-E/temp)
_dat['predictions']['first_order_nonlinear'] = impurity_setpoint / k * 1/(3600*24*365)
_dat['predictions']
Out[102]:
Table of precition comparison suggests that zero_order was actually the most accurate of the models. It should be noted that the convergence of the non-linear regression is questionable. The under prediction of shelf-life is to be expected, because of the faster reaction of the amorphous form that gets consumed early in the process.
As illustrated above, there is a weak dependence of the objective function on the kinetic parameters near the minimum. Consequently, the optimization problem has difficulty converging to an absolute minimum. The paper by King et al suggests some modification to the equations prior to the non-linear regression. First, the concentration profiles are written as a function of the temperature dependent reaction rate constant, $k$, for zero, first and second order reactions.
Here, $Co$ is the total starting drug concentration and $C$ is the total drug concentration at time, $t$. The rate constant is related to temperature via the Arrhenius equation. \begin{equation} k = Ae^{-\frac{E}{RT}} \end{equation} The Arrhenius relation could be substituted into the above concentration equations, and a non-linear regression be applied to estimate the parametes $A$ and $E$. However, as illustrated in the previous method example above, the gradients are very small in the vicinity of the global minimum, which makes the parameter estimation difficult. Part of the problem seems to aries because the rate constant, $A$ is on the order of $10^5$, whereas the activation energy is on the order of $10^1$. The disparity in size of the parameters, may lead to some instabilities. Fung et. al eliminate the collision parameter, $A$ by using the Arrhenius equation at a temperature of 298 K. \begin{equation} A = k_{298} e^{\frac{E}{R*298}} \end{equation} Substitution gives: \begin{equation} k = k_{298} e^{\frac{E}{R*298}} e^{-\frac{E}{RT}} = k_{298} e^{\frac{E}{R} \left ( \frac{1}{298} - \frac{1}{T} \right ) } \end{equation}
The rate constant at 298 K, $k_{298}$, is then exressed as a function of the user defined concentration set-point to define the product shelf-life, $C_f$, and the time it will take the drug to reach the user set-point at 298 K, $t_{298}$. \begin{equation} k_{298} = f(C_f, t_{298}) \end{equation} Here, $f$ is a function that depends on the reaction order. The shelf-life concentration set-point can be exressed in terms of the variable impurity-setpoint, $S_p$ as: \begin{equation} S_p = \frac{C_o - C_f}{C_o} \rightarrow C_f = C_o(1-S_p) \end{equation} Rearranging the above concentration equations for zero, first and second order reactions and defining the parameters using the 298 K subscript gives: \begin{align*} & \mathrm{zero~order} & {\Huge |} & \mathrm{first~order} & {\Huge |} & \mathrm{second~order} & \\ & k_{298} = \frac{C_o - C_f}{t^*_{298}} & {\Huge |} & k_{298} = -\frac{ln \frac{C_f}{C_o}}{t^*_{298}} & {\Huge |} & k_{298} = \frac{1}{t^*_{298}} \left ( \frac{1}{C_f} - \frac{1}{C_o} \right ) \\ & k_{298} = \frac{S_p }{t^*_{298}} & {\Huge |} & k_{298} = -\frac{ \ln (1 - S_p) }{t^*_{298}} & {\Huge |} & k_{298} = \frac{ 1 }{C_o t^*_{298}} \left ( \frac{S_p}{1-S_p} \right ) \end{align*} Substituting the expression for the rate constant at 298 K, $k_{298}$, into the concentration equations gives:
In the above relationships, the fit parameters are the activation energy, $E$, and the product shelf-life at 298 K, $t^*_{298}$.
Define the variables that will be used for the subsequent non-linear regression analysis.
In [103]:
Co = Do + Io
C = Co - _dat['exppts']['P'].values
t = _dat['exppts'].index.get_level_values(1).values*24*3600.
T = _dat['exppts'].index.get_level_values(0).values + 273.
C = C[t!=0]
T = T[t!=0]
t = t[t!=0]
Sp = _dat['setpt']
R = _dat['const']['R']
Define the objective function for a first order reaaction.
In [104]:
def error(p):
t_298 = p[0]
E = p[1]
k = np.exp(E/R * (1/298. - 1/T))
k = Sp / t_298 * k
err = Co * (1 - k * t)
return np.sum(err**2)
Perform the non-linear regression over a range of initial conditions.
In [105]:
t_298 = 10.*365*24*3600
E = 10.
xo = np.array([t_298,E])
opt = []
for d in np.arange(.1,10,.1):
opt.append(minimize(error, xo*d, tol=1e-30))
Plot the error of the regression as a function of the initial conditions.
In [106]:
e2 = np.array([error(r.x) for r in opt])
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(e2)
ax.set_ylabel(r"$\sum \mathrm{error}^2$")
ax.set_xlabel("initial condition index")
t_298, E = (opt[e2.argmin()]).x
print "optimal t_298 = %.1f y" %(t_298/(365*24*3600))
print "optimal E = %.2f kcal/mol K" %E
Figure of sum of residual erros illustrates that the expressions for the objective function has very shallow gradients near the minimum, which results in difficult in converging to a global minimum.
In [107]:
_dat['predictions']['KKF_zero'] = np.nan
_dat['predictions']['KKF_zero'].loc[25] = t_298 / (365*24*3600)
Define the objective function for a first order reaaction.
In [108]:
def error(p):
t_298 = p[0]
E = p[1]
k = np.exp(E/R * (1/298. - 1/T))
k = np.log(1-Sp) / t_298 * k
err = Co * np.exp(k*t) - C
return np.sum(err**2)
Perform the non-linear regression over a range of initial conditions.
In [109]:
t_298 = 10.*365*24*3600
E = 10.
xo = np.array([t_298,E])
opt = []
for d in np.arange(.1,10,.1):
opt.append(minimize(error, xo*d, tol=1e-30))
Plot the error of the regression as a function of the initial conditions.
In [110]:
e2 = np.array([error(r.x) for r in opt])
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(e2)
ax.set_ylabel(r"$\sum \mathrm{error}^2$")
ax.set_xlabel("initial condition index")
t_298, E = (opt[e2.argmin()]).x
print "optimal t_298 = %.1f y" %(t_298/(365*24*3600))
print "optimal E = %.2f kcal/mol K" %E
Figure of sum of residual erros illustrates that the expressions for the objective function has very shallow gradients near the minimum, which results in difficult in converging to a global minimum.
In [111]:
_dat['predictions']['KKF_first'] = np.nan
_dat['predictions']['KKF_first'].loc[25] = t_298 / (365*24*3600)
Define the objective function for a first order reaaction.
In [112]:
def error(p):
t_298 = p[0]
E = p[1]
k = np.exp(E/R * (1/298. - 1/T))
k = 1 / t_298 * (Sp/(1-Sp)) * k
err = Co / (1 + k*t)
return np.sum(err**2)
Perform the non-linear regression over a range of initial conditions.
In [113]:
t_298 = 10.*365*24*3600
E = 10.
xo = np.array([t_298,E])
opt = []
for d in np.arange(.1,10,.1):
opt.append(minimize(error, xo*d, tol=1e-30))
Plot the error of the regression as a function of the initial conditions.
In [114]:
e2 = np.array([error(r.x) for r in opt])
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(e2)
ax.set_ylabel(r"$\sum \mathrm{error}^2$")
ax.set_xlabel("initial condition index")
t_298, E = (opt[e2.argmin()]).x
print "optimal t_298 = %.1f y" %(t_298/(365*24*3600))
print "optimal E = %.2f kcal/mol K" %E
Figure of sum of residual erros illustrates that the expressions for the objective function has very shallow gradients near the minimum, which results in difficult in converging to a global minimum.
In [115]:
_dat['predictions']['KKF_second'] = np.nan
_dat['predictions']['KKF_second'].loc[25] = t_298 / (365*24*3600)
In [116]:
_dat['predictions']
Out[116]: