Work in pairs, and put the names of both people in your group here!
The main goal of this assignment is to model a biological process (namely, the competition between viruses and the human body's immune system) in a mathematical way, and in doing so reproduce an experimental dataset. More specifically:
Some background: In this project we explore a model of the viral load—the number of virions in the blood of a patient infected with HIV—after the administration of an antiretroviral drug.
Viruses multiply in more than one way. One of the most common is called the Lytic Cycle, and the other is the Lysogenic Cycle. Both cycles are similar in that the virus takes cells hostage and use the cell's resources to multiply, making many copies of itself. Once enough new viruses are produced inside the cell it bursts, and the newly-created viruses then are released into the bloodstream to carry on the process and search for new host cells to invade.
Antiviral drugs behave differently than antibiotics - rather than directly destroying the virus population in a patient, they instead generally inhibit the creation of new viruses by preventing viruses from entering target cells, by preventing the viruses from synthesizing new viruses once they have invaded new cells, or by preventing the release of newly-created viruses from the host cell.
This project: The first part of this project will involve visualizing and fitting a simple model to experimental measurements of the HIV viral load in a person's body after they are administered, and the second part will involve creating a more sophisticated mathematical model of the biological processes involved, and comparing it to the experimental data.
We're going to use experimental data of actual viral loads provided by Kindler and Nelson (2015). They write:
File
HIVseries.mat
contains variable "a" with two columns of data. The first is the time in days since administration of a treatment to an HIV positive patient; the second contains the concentration of virus in that patient's blood, in arbitrary units.
HIVseries.csv
andHIVseries.npy
contain the same data in the same format.HIVseries.npz
contains the same data in two separate arrays calledtime_in_days
andviral_load
.Data from A. Perelson. Modelling viral and immune system dynamics. Nature Revs. Immunol. (2002) vol. 2 (1) pp. 28--36 (Box 1).
So, to summarize, the dataset hiv_data
has 2 columns:
time_in_days
is the number of days since an HIV-positive patient received a treatment.viral_load
is the concentraiton of the virus in that patients blood, in arbitrary units.In the cells below, we're going to load, examine, and visualize this data and make a simple model to describe it.
In [ ]:
# some code to set up the problem.
# Make plots inline
%matplotlib inline
# Make inline plots vector graphics instead of raster graphics
from IPython.display import set_matplotlib_formats
set_matplotlib_formats('pdf', 'svg')
# import modules for plotting and data analysis
import matplotlib.pyplot as plt
import numpy as np
import pandas
pandas.read_csv()
to Load the DataThe data file we'll use is in a file format called CSV, which stands for comma-separated values. It's a commonly used format for storing 2-dimensional data, and programs like Microsoft Excel can import and export .CSV files.
The code below will use the read_csv()
function from the pandas data analysis library to load the CSV file you need from the web, then store the data as a variable called hiv_data
.
In [ ]:
# Loading the data in a data frame
hiv_data = pandas.read_csv(
"https://raw.githubusercontent.com/ComputationalModeling/IPML-Data/master/01HIVseries/HIVseries.csv",
header = None,
names = ["time_in_days", "viral_load"]
)
Now, use the Pandas analysis tools you've used in the past to analyze and visualize the plots!
Some useful analysis methods include .describe()
, .head()
, .tail()
, .mean()
,.min()
,.max()
.
Some useful plotting functions include .plot.scatter()
and .plot.histogram()
.
In particular, make sure to show the time evolution of the viral load (i.e., viral load as a function of time).
In [ ]:
# Put your code here, and add additional cells as necessary.
Now we're going to try some simple fits to the viral load as a function of time.
It's fairly common for processes in nature to behave in an exponential or power-law way, so we'll try to create fits for two simple models:
Model 1 (power law): $L(t)_{V,pow} = L_{V0,p} (t/\tau_p)^{-\alpha}$
Model 2 (exponential decay): $L(t)_{V,exp} = L_{V0,e} e^{-t/\tau_e}$
In these models, the values $L_{V0,p}$, $L_{V0,e}$, $\tau_p$, $\tau_e$, and $\alpha$ are constants describing different parts of the model's behavior (more on this later).
Use np.linspace()
to create an array of times (at least 100, going from t=0 to t=10 days), and then create two additional arrays of values corresponding to Models 1 and 2 above. Plot them both on the same graph. Note that you can plot multiple datasets together by repeated calls to plt.plot()
, or alternately by giving multiple sets of x,y,format data to plt.plot()
. For example:
plt.plot(x1vals,y1vals,'r--')
plt.plot(x2vals,y2vals,'go')
and
plt.plot(x1vals,y1vals,'r--',x2vals,y2vals,'go')
will produce the same plot. Note also that you can manipulate the plot's axes and limits using the pyplot xscale()
, yscale()
, xlim()
, and ylim()
methods. Making one or more of the axis scales a log axis (i.e., calling plt.yscale('log')
) may make it easier to see similarities or differences between the models.
As a reminder: if you have a question about a particular python method or function, you can learn about that by typing a question mark after it. For example, np.linspace?
will give you information about that method.
In [ ]:
# Put your code here, and add additional cells as necessary.
Try to find values for the model parameters that fit well.
Adjust the values of $L_{V0,p}$, $L_{V0,e}$, $\tau_p$, $\tau_e$, and $\alpha$ to obtain the best fit to the experimental data. As in the previous section, adjust the plot limits and scales as necessary so you can identify which model (and set of model parameters) provides the best fit?
In [ ]:
# Put your code here, and add additional cells as necessary.
Question: Which model does a better job of describing the experimental data? And, does it succeed equally well at both early and late times in the data? And, what do you think the various quantities in the model ($L_{V0,p}$, $L_{V0,e}$, $\tau_p$, $\tau_e$, and $\alpha$) represent?
put your answer here!
In general, we can think of what happenes to an infected patient that has been administered an antiviral drug using a simple model. The key points are:
A table of parameters in this model is here, for your convenience:
Parameter | Meaning |
---|---|
$N_v$ | Number of virions in bloodstream |
$N_I$ | Number of infected cells |
$t_{mul}$ | Timescale for virus multiplication |
$\gamma$ | Number of virions produced per infected cell |
$T_{crit}$ | Amount of time it takes for drugs to stop virus reproduction |
$t_{rel}$ | Timescale on which infected cells release virions |
$t_{clr}$ | Timescale on which virions are cleared from the body |
You have a mission that will take place in three parts:
In [ ]:
# put your program here, and add additional cells as necessary!
In [ ]:
from IPython.display import HTML
HTML(
"""
<iframe
src="https://goo.gl/forms/cHBqN8XXeOOTt0K32?embedded=true"
width="80%"
height="1200px"
frameborder="0"
marginheight="0"
marginwidth="0">
Loading...
</iframe>
"""
)
Note: this assignment was inspired by Nelson's Physical models of Living Systems, Chapter 1, and Kinder and Nelson's A Student's Guide to Python for Physical Modeling, Chapter 4.