NOTE: THIS IS A TWO-DAY PROJECT! We don't expect you to get it all done in one day!
Instructor notes: This is a project that will require substantial assistance from us, particularly for students whose background math skills are a bit weaker. Please be careful!
Work in pairs, and put the names of both people in your group here!
The main goal of this assignment is to model a biological process (namely, the competition between viruses and the human body's immune system) in a mathematical way, and in doing so reproduce an experimental dataset. More specifically:
Some background: In this project we explore a model of the viral load—the number of virions in the blood of a patient infected with HIV—after the administration of an antiretroviral drug.
Viruses multiply in more than one way. One of the most common is called the Lytic Cycle, and the other is the Lysogenic Cycle. Both cycles are similar in that the virus takes cells hostage and use the cell's resources to multiply, making many copies of itself. Once enough new viruses are produced inside the cell it bursts, and the newly-created viruses then are released into the bloodstream to carry on the process and search for new host cells to invade.
Antiviral drugs behave differently than antibiotics - rather than directly destroying the virus population in a patient, they instead generally inhibit the creation of new viruses by preventing viruses from entering target cells, by preventing the viruses from synthesizing new viruses once they have invaded new cells, or by preventing the release of newly-created viruses from the host cell.
This project: The first part of this project will involve visualizing and fitting a simple model to experimental measurements of the HIV viral load in a person's body after they are administered, and the second part will involve creating a more sophisticated mathematical model of the biological processes involved, and comparing it to the experimental data.
We're going to use experimental data of actual viral loads provided by Kindler and Nelson (2015). They write:
File
HIVseries.mat
contains variable "a" with two columns of data. The first is the time in days since administration of a treatment to an HIV positive patient; the second contains the concentration of virus in that patient's blood, in arbitrary units.
HIVseries.csv
andHIVseries.npy
contain the same data in the same format.HIVseries.npz
contains the same data in two separate arrays calledtime_in_days
andviral_load
.Data from A. Perelson. Modelling viral and immune system dynamics. Nature Revs. Immunol. (2002) vol. 2 (1) pp. 28--36 (Box 1).
So, to summarize, the dataset hiv_data
has 2 columns:
time_in_days
is the number of days since an HIV-positive patient received a treatment.viral_load
is the concentraiton of the virus in that patients blood, in arbitrary units.In the cells below, we're going to load, examine, and visualize this data and make a simple model to describe it.
In [ ]:
# some code to set up the problem.
# Make plots inline
%matplotlib inline
# Make inline plots vector graphics instead of raster graphics
from IPython.display import set_matplotlib_formats
set_matplotlib_formats('pdf', 'svg')
# import modules for plotting and data analysis
import matplotlib.pyplot as plt
import numpy as np
import pandas
pandas.read_csv()
to Load the DataThe data file we'll use is in a file format called CSV, which stands for comma-separated values. It's a commonly used format for storing 2-dimensional data, and programs like Microsoft Excel can import and export .CSV files.
The code below will use the read_csv()
function from the pandas data analysis library to load the CSV file you need from the web, then store the data as a variable called hiv_data
.
In [ ]:
# Loading the data in a data frame
hiv_data = pandas.read_csv(
"https://raw.githubusercontent.com/ComputationalModeling/IPML-Data/master/01HIVseries/HIVseries.csv",
header = None,
names = ["time_in_days", "viral_load"]
)
Now, use the Pandas analysis tools you've used in the past to analyze and visualize the plots!
Some useful analysis methods include .describe()
, .head()
, .tail()
, .mean()
,.min()
,.max()
.
Some useful plotting functions include .plot.scatter()
and .plot.histogram()
.
In particular, make sure to show the time evolution of the viral load (i.e., viral load as a function of time).
In [ ]:
# Put your code here, and add additional cells as necessary.
hiv_data.describe()
hiv_data.plot.scatter('time_in_days','viral_load')
Now we're going to try some simple fits to the viral load as a function of time.
It's fairly common for processes in nature to behave in an exponential or power-law way, so we'll try to create fits for two simple models:
Model 1 (power law): $L(t)_{V,pow} = L_{V0,p} (t/\tau_p)^{-\alpha}$
Model 2 (exponential decay): $L(t)_{V,exp} = L_{V0,e} e^{-t/\tau_e}$
In these models, the values $L_{V0,p}$, $L_{V0,e}$, $\tau_p$, $\tau_e$, and $\alpha$ are constants describing different parts of the model's behavior (more on this later).
Use np.linspace()
to create an array of times (at least 100, going from t=0 to t=10 days), and then create two additional arrays of values corresponding to Models 1 and 2 above. Plot them both on the same graph. Note that you can plot multiple datasets together by repeated calls to plt.plot()
, or alternately by giving multiple sets of x,y,format data to plt.plot()
. For example:
plt.plot(x1vals,y1vals,'r--')
plt.plot(x2vals,y2vals,'go')
and
plt.plot(x1vals,y1vals,'r--',x2vals,y2vals,'go')
will produce the same plot. Note also that you can manipulate the plot's axes and limits using the pyplot xscale()
, yscale()
, xlim()
, and ylim()
methods. Making one or more of the axis scales a log axis (i.e., calling plt.yscale('log')
) may make it easier to see similarities or differences between the models.
As a reminder: if you have a question about a particular python method or function, you can learn about that by typing a question mark after it. For example, np.linspace?
will give you information about that method.
In [ ]:
# Put your code here, and add additional cells as necessary.
model_time = np.linspace(0,10,200)
LV0p = 1.0e+5
TauP = 2.0
alpha = 2.0
LV0e = 1.8e+5
TauE = 2.25
powerlaw = LV0p * (model_time/TauP)**-alpha
exponential = LV0e * np.exp(-model_time/TauE)
plt.plot(model_time,powerlaw,'b-',model_time,exponential,'r-')
plt.yscale('log')
plt.xlim(0,10)
plt.ylim(1.0e+3,1.0e+6)
Try to find values for the model parameters that fit well.
Adjust the values of $L_{V0,p}$, $L_{V0,e}$, $\tau_p$, $\tau_e$, and $\alpha$ to obtain the best fit to the experimental data. As in the previous section, adjust the plot limits and scales as necessary so you can identify which model (and set of model parameters) provides the best fit?
In [ ]:
# Put your code here, and add additional cells as necessary.
plt.plot(hiv_data['time_in_days'],hiv_data['viral_load'],'bo')
plt.plot(model_time,exponential,'b--')
plt.plot(model_time,powerlaw,'r--')
plt.yscale('log')
plt.xlim(0,10)
plt.ylim(1.0e+3,1.0e+6)
Question: Which model does a better job of describing the experimental data? And, does it succeed equally well at both early and late times in the data? And, what do you think the various quantities in the model ($L_{V0,p}$, $L_{V0,e}$, $\tau_p$, $\tau_e$, and $\alpha$) represent?
Answer: The exponential model does a better job - we can get it to fit perfectly after t ~ 0.5 days. The power law model is not great but does ok after ~2 days. Neither model does a good job of fitting the data at very early times (less than 0.5 days) - we're clearly missing something here!
The quantities $L_{V0,p}$ and $L_{V0,e}$ represent the amount of virions in the sytem at a particular time (t=0 for the exponential model and t=$\tau_p$ for the power-law model). $\tau_p$ is a timescale corresponding to when the amount of virions equals $L_{V0,p}$, and $\tau_e$ is the exponential decay timescale in the exponential model. $\alpha$ is a power law exponent that controls how quickly the virions disappear in the power-law model.
In general, we can think of what happenes to an infected patient that has been administered an antiviral drug using a simple model. The key points are:
A table of parameters in this model is here, for your convenience:
Parameter | Meaning |
---|---|
$N_v$ | Number of virions in bloodstream |
$N_I$ | Number of infected cells |
$t_{mul}$ | Timescale for virus multiplication |
$\gamma$ | Number of virions produced per infected cell |
$T_{crit}$ | Amount of time it takes for drugs to stop virus reproduction |
$t_{rel}$ | Timescale on which infected cells release virions |
$t_{clr}$ | Timescale on which virions are cleared from the body |
You have a mission that will take place in three parts:
Understanding the math
It's instructive to look at section 1.2 of Nelson's book.
There is an initial phase where the number of viruses grows exponentially, since $\frac{dN_v}{dt} \propto \frac{dN_I}{dt} \propto \gamma N_I / t_{mul}$. This leads to $N_v(t) \simeq N_v(0) e^{t/t_{mul}}$, where $N_v(0)$ is the number of virions in the bloodstream at t=0.
Some time $T_{crit}$ after the drug is administered, cell infection stops and the number of infected cells changes as $\frac{dN_I}{dt} = -N_{I}/t_{rel}$. This leads to the exponential solution $N_I(t) = N_I(0) e^{-t/t_{rel}}$. (Where $t=0$ is assumed to be at the time that cell infection stops, which is really $T_{crit}$
The total viral load depends on the rate that virions are dumped into the bloodstream by dying cells and cleared from the body by its immune system. In other words:
$\frac{dN_v}{dt} = -N_v/t_{clr} + \gamma N_{I}/t_{rel}$
Or, taking into account the fact that we know $N_I(t)$ already, with t=0 redefined to be the time where cell infection stops:
$\frac{dN_v}{dt} = -N_v/t_{clr} + \frac{\gamma N_I(0)}{t_{rel}} e^{-t/t_{rel}}$
The computer program: The students should create either a for loop or a while loop that evolves the system forward in time, and which basically solves $N_v(t)$ with two distinct phases:
There should be if statements involved, and possibly break/continue statements if they're feeling clever.
In [ ]:
# put your computer program here!
# put your computer program here!
%matplotlib inline
import matplotlib.pyplot as plt
import math
import numpy as np
NV0 = 1.0e+5 # number of initial viral cells
NI0 = 1.0e+4 # number of initial infected cells
T_crit = 0.1 # time scale (in days) on which the drug becomes effective
t_clr = 1.1 # time scale (in days) on which your body clears out virions
t_rel = 2.0 # time scale (in days) where inected cells die and release virions
t_mul = 0.2 # growth time of viruses (in days) prior to drug administration
dt = 0.01 # timestep length (in days)
end_time = 10.0 # simulation end time (in days)
gamma = 10.0 # number of virions/infected cell
NV = NV0
NI = NI0
time = []
viral_load = []
infected_cells = []
this_time = 0.0
while(this_time <= end_time):
if this_time <= T_crit:
# change in virions w/time
dNVdt = gamma * NI0 / t_mul * math.exp(this_time/t_mul)
NI_inf = NI0 * math.exp(this_time/t_mul)
NV += dNVdt*dt
else: # this_time > T_crit
dNVdt = gamma*NI_inf/t_rel*math.exp(-(this_time-T_crit)/t_rel) - NV/t_clr
NV += dNVdt*dt
#NI_inf +=
this_time += dt
time.append(this_time)
#infected_cells.append(NI_inf)
viral_load.append(NV)
plt.plot(hiv_data['time_in_days'],hiv_data['viral_load'],'bo')
plt.plot(model_time,exponential,'b--')
plt.plot(model_time,powerlaw,'r--')
plt.plot(time,viral_load,'k-')
plt.xlim(0,10)
plt.ylim(1.0e+3,5.0e+5)
plt.yscale('log')
In [ ]:
from IPython.display import HTML
HTML(
"""
<iframe
src="https://goo.gl/forms/cHBqN8XXeOOTt0K32?embedded=true"
width="80%"
height="1200px"
frameborder="0"
marginheight="0"
marginwidth="0">
Loading...
</iframe>
"""
)
Note: this assignment was inspired by Nelson's Physical models of Living Systems, Chapter 1, and Kinder and Nelson's A Student's Guide to Python for Physical Modeling, Chapter 4.