Virus and drug interactions in the human body

+++++ SOLUTIONS - DO NOT DISTRIBUTE +++++

NOTE: THIS IS A TWO-DAY PROJECT! We don't expect you to get it all done in one day!

Instructor notes: This is a project that will require substantial assistance from us, particularly for students whose background math skills are a bit weaker. Please be careful!

Names of group members

Work in pairs, and put the names of both people in your group here!

Goals of this assignment

The main goal of this assignment is to model a biological process (namely, the competition between viruses and the human body's immune system) in a mathematical way, and in doing so reproduce an experimental dataset. More specifically:

  • To develop an intuition about how equation-based models work
  • To understand how to evaluate models by plotting them together with experimental data
  • To practice predicting what the effect will be when we change the parameters of a model
  • To learn how we can iterate to improve the fit of a model

Our project - working with data and models

Some background: In this project we explore a model of the viral load—the number of virions in the blood of a patient infected with HIV—after the administration of an antiretroviral drug.

Viruses multiply in more than one way. One of the most common is called the Lytic Cycle, and the other is the Lysogenic Cycle. Both cycles are similar in that the virus takes cells hostage and use the cell's resources to multiply, making many copies of itself. Once enough new viruses are produced inside the cell it bursts, and the newly-created viruses then are released into the bloodstream to carry on the process and search for new host cells to invade.

Antiviral drugs behave differently than antibiotics - rather than directly destroying the virus population in a patient, they instead generally inhibit the creation of new viruses by preventing viruses from entering target cells, by preventing the viruses from synthesizing new viruses once they have invaded new cells, or by preventing the release of newly-created viruses from the host cell.

This project: The first part of this project will involve visualizing and fitting a simple model to experimental measurements of the HIV viral load in a person's body after they are administered, and the second part will involve creating a more sophisticated mathematical model of the biological processes involved, and comparing it to the experimental data.


Loading and Examining Experimental Data

What is this data?

We're going to use experimental data of actual viral loads provided by Kindler and Nelson (2015). They write:

File HIVseries.mat contains variable "a" with two columns of data. The first is the time in days since administration of a treatment to an HIV positive patient; the second contains the concentration of virus in that patient's blood, in arbitrary units.

HIVseries.csv and HIVseries.npy contain the same data in the same format. HIVseries.npz contains the same data in two separate arrays called time_in_days and viral_load.

Data from A. Perelson. Modelling viral and immune system dynamics. Nature Revs. Immunol. (2002) vol. 2 (1) pp. 28--36 (Box 1).

So, to summarize, the dataset hiv_data has 2 columns:

  • time_in_days is the number of days since an HIV-positive patient received a treatment.
  • viral_load is the concentraiton of the virus in that patients blood, in arbitrary units.

In the cells below, we're going to load, examine, and visualize this data and make a simple model to describe it.


In [ ]:
# some code to set up the problem.

# Make plots inline
%matplotlib inline

# Make inline plots vector graphics instead of raster graphics
from IPython.display import set_matplotlib_formats
set_matplotlib_formats('pdf', 'svg')

# import modules for plotting and data analysis
import matplotlib.pyplot as plt
import numpy as np
import pandas

Use pandas.read_csv() to Load the Data

The data file we'll use is in a file format called CSV, which stands for comma-separated values. It's a commonly used format for storing 2-dimensional data, and programs like Microsoft Excel can import and export .CSV files.

The code below will use the read_csv() function from the pandas data analysis library to load the CSV file you need from the web, then store the data as a variable called hiv_data.


In [ ]:
# Loading the data in a data frame

hiv_data = pandas.read_csv(
    "https://raw.githubusercontent.com/ComputationalModeling/IPML-Data/master/01HIVseries/HIVseries.csv",
    header = None,
    names = ["time_in_days", "viral_load"]
)

Examine the data

Now, use the Pandas analysis tools you've used in the past to analyze and visualize the plots!

Some useful analysis methods include .describe(), .head(), .tail(), .mean(),.min(),.max().

Some useful plotting functions include .plot.scatter() and .plot.histogram().

In particular, make sure to show the time evolution of the viral load (i.e., viral load as a function of time).


In [ ]:
# Put your code here, and add additional cells as necessary.

hiv_data.describe()

hiv_data.plot.scatter('time_in_days','viral_load')

Make a simple fit to the experimental data

Now we're going to try some simple fits to the viral load as a function of time.

It's fairly common for processes in nature to behave in an exponential or power-law way, so we'll try to create fits for two simple models:

Model 1 (power law): $L(t)_{V,pow} = L_{V0,p} (t/\tau_p)^{-\alpha}$

Model 2 (exponential decay): $L(t)_{V,exp} = L_{V0,e} e^{-t/\tau_e}$

In these models, the values $L_{V0,p}$, $L_{V0,e}$, $\tau_p$, $\tau_e$, and $\alpha$ are constants describing different parts of the model's behavior (more on this later).

Use np.linspace() to create an array of times (at least 100, going from t=0 to t=10 days), and then create two additional arrays of values corresponding to Models 1 and 2 above. Plot them both on the same graph. Note that you can plot multiple datasets together by repeated calls to plt.plot(), or alternately by giving multiple sets of x,y,format data to plt.plot(). For example:

plt.plot(x1vals,y1vals,'r--')
plt.plot(x2vals,y2vals,'go')

and

plt.plot(x1vals,y1vals,'r--',x2vals,y2vals,'go')

will produce the same plot. Note also that you can manipulate the plot's axes and limits using the pyplot xscale(), yscale(), xlim(), and ylim() methods. Making one or more of the axis scales a log axis (i.e., calling plt.yscale('log')) may make it easier to see similarities or differences between the models.

As a reminder: if you have a question about a particular python method or function, you can learn about that by typing a question mark after it. For example, np.linspace? will give you information about that method.


In [ ]:
# Put your code here, and add additional cells as necessary.


model_time = np.linspace(0,10,200)


LV0p = 1.0e+5
TauP = 2.0
alpha = 2.0

LV0e = 1.8e+5
TauE = 2.25

powerlaw = LV0p * (model_time/TauP)**-alpha

exponential = LV0e * np.exp(-model_time/TauE)

plt.plot(model_time,powerlaw,'b-',model_time,exponential,'r-')
plt.yscale('log')
plt.xlim(0,10)
plt.ylim(1.0e+3,1.0e+6)

Plot the model and data together

Try to find values for the model parameters that fit well.

Adjust the values of $L_{V0,p}$, $L_{V0,e}$, $\tau_p$, $\tau_e$, and $\alpha$ to obtain the best fit to the experimental data. As in the previous section, adjust the plot limits and scales as necessary so you can identify which model (and set of model parameters) provides the best fit?


In [ ]:
# Put your code here, and add additional cells as necessary.

plt.plot(hiv_data['time_in_days'],hiv_data['viral_load'],'bo')
plt.plot(model_time,exponential,'b--')
plt.plot(model_time,powerlaw,'r--')
plt.yscale('log')
plt.xlim(0,10)
plt.ylim(1.0e+3,1.0e+6)

Question: Which model does a better job of describing the experimental data? And, does it succeed equally well at both early and late times in the data? And, what do you think the various quantities in the model ($L_{V0,p}$, $L_{V0,e}$, $\tau_p$, $\tau_e$, and $\alpha$) represent?

Answer: The exponential model does a better job - we can get it to fit perfectly after t ~ 0.5 days. The power law model is not great but does ok after ~2 days. Neither model does a good job of fitting the data at very early times (less than 0.5 days) - we're clearly missing something here!

The quantities $L_{V0,p}$ and $L_{V0,e}$ represent the amount of virions in the sytem at a particular time (t=0 for the exponential model and t=$\tau_p$ for the power-law model). $\tau_p$ is a timescale corresponding to when the amount of virions equals $L_{V0,p}$, and $\tau_e$ is the exponential decay timescale in the exponential model. $\alpha$ is a power law exponent that controls how quickly the virions disappear in the power-law model.


Creating a mathematical model describing the system

In this section, we're going to use our understanding of the biological and biochemical processes that are at work in this system to make a model describing the system's evolution over time.

Some background knowledge we need for this model

In general, we can think of what happenes to an infected patient that has been administered an antiviral drug using a simple model. The key points are:

  • Viruses multiply rapidly if uninhibited by the body's immune system and infect cells at a rate that is proportional to the number of virions (virus particles), $N_v$, that are in the bloodstream. In other words, $\frac{dN_I}{dt}$, the rate at which the number of cells that are infected ($N_I$) changes, depends on $N_v$ and the time scale for multiplication $t_{mul}$. $N_v$ in turn depends on the number of infected cells and the number of virions produced per infected cell, $\gamma$.
  • As antiviral drugs are administered at a constant rate, it takes some amount of time $T_{crit}$ for the amount of drug in the bloodstream to reach a high enough level that it suppresses the formation of new viruses. (This time varies from patient to patient, but is typically one day to a few days.)
  • After the drug takes effect, we can assume that cell infection immediately stops. After infection stops, the number of infected cells $N_I$ decreases at a rate $\frac{dN_I}{dt} = -N_{I}/t_{rel}$, where $t_{rel}$ is the time scale on which infected cells release virions into the bloodstream and die.
  • Once cells can no longer be infected, virions are released into the bloodstream through the death of previously infected cells. The rate at which these virions are released behaves as $\frac{dN_v}{dt} = \gamma N_{I}/t_{rel}$.
  • The body clears virions out of the body at a rate that depends on the amount of virions that are in the bloodstream, $\frac{dN_v}{dt} = -N_v/t_{clr}$ ($N_v$ is the number of virions in the bloodstream and $t_{clr}$ is the time scale on which virions are cleared from the body).

A table of parameters in this model is here, for your convenience:

Parameter Meaning
$N_v$ Number of virions in bloodstream
$N_I$ Number of infected cells
$t_{mul}$ Timescale for virus multiplication
$\gamma$ Number of virions produced per infected cell
$T_{crit}$ Amount of time it takes for drugs to stop virus reproduction
$t_{rel}$ Timescale on which infected cells release virions
$t_{clr}$ Timescale on which virions are cleared from the body

Your mission

You have a mission that will take place in three parts:

  1. Using the information above and your whiteboards, create a mathematical model for how the viral load in the bloodstream, $N_v$, varies as a function of time. Don't use numbers - just symbols!
  2. After you are happy with your model (and after you've talked to one of the instructors about it), figure out how to implement it as a computer program. Do so below!
  3. Compare the shape of the plot created by your model with the experimental data from earlier in this project. How do they compare? (Suggestion: assume that all of the time scales in your model are roughly equal, and roughly a day, and vary them from there.)

INSTRUCTOR NOTES

Understanding the math

It's instructive to look at section 1.2 of Nelson's book.

There is an initial phase where the number of viruses grows exponentially, since $\frac{dN_v}{dt} \propto \frac{dN_I}{dt} \propto \gamma N_I / t_{mul}$. This leads to $N_v(t) \simeq N_v(0) e^{t/t_{mul}}$, where $N_v(0)$ is the number of virions in the bloodstream at t=0.

Some time $T_{crit}$ after the drug is administered, cell infection stops and the number of infected cells changes as $\frac{dN_I}{dt} = -N_{I}/t_{rel}$. This leads to the exponential solution $N_I(t) = N_I(0) e^{-t/t_{rel}}$. (Where $t=0$ is assumed to be at the time that cell infection stops, which is really $T_{crit}$

The total viral load depends on the rate that virions are dumped into the bloodstream by dying cells and cleared from the body by its immune system. In other words:

$\frac{dN_v}{dt} = -N_v/t_{clr} + \gamma N_{I}/t_{rel}$

Or, taking into account the fact that we know $N_I(t)$ already, with t=0 redefined to be the time where cell infection stops:

$\frac{dN_v}{dt} = -N_v/t_{clr} + \frac{\gamma N_I(0)}{t_{rel}} e^{-t/t_{rel}}$

The computer program: The students should create either a for loop or a while loop that evolves the system forward in time, and which basically solves $N_v(t)$ with two distinct phases:

  1. Prior to the drug kicking in, where the virus load grows exponentially
  2. After the drug starts working, where the virus load starts falling, first relatively slowly and then more quickly. (Double exponential model.)

There should be if statements involved, and possibly break/continue statements if they're feeling clever.


In [ ]:
# put your computer program here!

# put your computer program here!
%matplotlib inline
import matplotlib.pyplot as plt
import math
import numpy as np

NV0 = 1.0e+5  # number of initial viral cells
NI0 = 1.0e+4  # number of initial infected cells
T_crit = 0.1  # time scale (in days) on which the drug becomes effective
t_clr = 1.1   # time scale (in days) on which your body clears out virions
t_rel = 2.0   # time scale (in days) where inected cells die and release virions
t_mul = 0.2   # growth time of viruses (in days) prior to drug administration
dt = 0.01     # timestep length (in days)
end_time = 10.0 # simulation end time (in days)
gamma = 10.0  # number of virions/infected cell

NV = NV0
NI = NI0

time = []
viral_load = []
infected_cells = []

this_time = 0.0

while(this_time <= end_time):

    if this_time <= T_crit:

        # change in virions w/time 
        dNVdt = gamma * NI0 / t_mul * math.exp(this_time/t_mul) 
    
        NI_inf = NI0 * math.exp(this_time/t_mul)
    
        NV += dNVdt*dt
    
    else:  # this_time > T_crit
    
        dNVdt = gamma*NI_inf/t_rel*math.exp(-(this_time-T_crit)/t_rel) - NV/t_clr
        
        NV += dNVdt*dt
        #NI_inf += 
    
    this_time += dt
    time.append(this_time)
    #infected_cells.append(NI_inf)
    
    viral_load.append(NV)
  


plt.plot(hiv_data['time_in_days'],hiv_data['viral_load'],'bo')
plt.plot(model_time,exponential,'b--')
plt.plot(model_time,powerlaw,'r--')
plt.plot(time,viral_load,'k-')
plt.xlim(0,10)
plt.ylim(1.0e+3,5.0e+5)
plt.yscale('log')

Assignment wrapup

Please fill out the form that appears when you run the code below. You must completely fill this out in order to receive credit for the assignment!


In [ ]:
from IPython.display import HTML
HTML(
"""
<iframe 
	src="https://goo.gl/forms/cHBqN8XXeOOTt0K32?embedded=true" 
	width="80%" 
	height="1200px" 
	frameborder="0" 
	marginheight="0" 
	marginwidth="0">
	Loading...
</iframe>
"""
)

Turn it in!

Whether you've completed it or not, turn this assignment in to the Day 9 and/or 10 dropbox in the "in-class activities" folder.

Note: this assignment was inspired by Nelson's Physical models of Living Systems, Chapter 1, and Kinder and Nelson's A Student's Guide to Python for Physical Modeling, Chapter 4.