Work in pairs, and put the names of both people in your group here! (If you're in a group of 3, just move your chairs so you can work together.)
Here we explore a model of the viral load—the number of virions in the blood of a patient infected with HIV—after the administration of an antiretroviral drug. One model for the viral load predicts that the concentration $V (t)$ of HIV in the blood at time t after the start of treatment will be:
$$ \begin{equation} V(t) = A \cdot \mathrm{exp}(-\alpha t) + B \cdot \mathrm{exp}(-\beta t) \end{equation}$$When we write mathematics, $\mathrm{exp}(\dots)$ is notational shorthand for $e^{(\dots)}$. So, we can rewrite the viral load model this way:
$$ \begin{equation} V(t) = A e^{(-\alpha t)} + B e^{(-\beta t)} \end{equation} $$Two things to note about this particular model:
Note:
You probably know that there are black-box software packages that do such “curve fitting” automatically. In this lab, you should do it manually, just to see how the curves respond to changes in the parameters.]
Kinder, Jesse M.; Nelson, Philip (2015-07-01). A Student's Guide to Python for Physical Modeling (Page 63). Princeton University Press. Kindle Edition.
Doing it manually also helps build our intuition for how mathematical models behave when we visualize them.
Let's get started by setting some options and loading the modules we'll need.
In [ ]:
# Make plots inline
%matplotlib inline
# Make inline plots vector graphics instead of raster graphics
from IPython.display import set_matplotlib_formats
set_matplotlib_formats('pdf', 'svg')
# import modules for plotting and data analysis
import matplotlib.pyplot as plt
import numpy as np
import pandas
Now, we'll tackle the "function in time" part of this model by learning how to make and use arrays to represent time.
In [ ]:
import numpy as np
np.linspace(
0, # where the interval starts
1, # where the interval ends
11 # How many steps (elements) we want in the final array
)
We can also try assigning that array to a variable, so we our later code can use it and refer back to it. Here, we'll:
trying_to_make_a_time_interval
(it helps when our names are descriptive)
In [ ]:
import numpy as np
# Make our array and assign it to `time`
trying_to_make_a_time_interval = np.linspace(
0, # where the interval starts
1, # where the interval ends
11 # How many steps (elements) we want in the final array
)
# Check that there are 11 elements in our array (which we asked for)
trying_to_make_a_time_interval.size
In [ ]:
# Create your time array here
time = np.linspace(0, 10, 101)
Remember how our model equation has four parameters $(A, \alpha, B, \beta )$? Below, you're going to use Python code to see what happens to the model when we try (and change) values for those parameters. Below, write code that sets $B = 0$ and chooses non-zero values for the other three parameters.
In [ ]:
# Make B equal to zero and set some non-zero values for the other parameters
A = 3
B = 0
alpha = 2
beta = 5
Here's that viral load equation again:
$$ \begin{equation} V(t) = \textbf{A} e^{(-\boldsymbol{\alpha} t)} + \textbf{B} e^{(- \boldsymbol{\beta} t)} \end{equation} $$Just like the order-of-magnitude approximations we've been doing, thinking before we plot and evaluate helps us develop our intution about models. In your code above, you've set $B = 0$. Use the markdown cell below to predict (in words) how you think setting $B = 0$ affects the equation (and by extension, the model).
We can write the viral load function (from above) in Python, using the parameters we set above and our time
array. In the cell below, write viral_load
as a function of time
.
Note: We can write exponentation in numpy using np.exp()
. So,
In [ ]:
# Write and evaluate viral_load = ...
viral_load = (A * np.exp(-alpha * time)) + (B * np.exp(-beta * time))
In [ ]:
# Verify that both arrays are the same length.
# You can use .size, as in time.size or viral_load.size
# Then, try plotting viral_load vs. time
# Make plots inline
%matplotlib inline
# Make inline plots vector graphics instead of raster graphics
from IPython.display import set_matplotlib_formats
set_matplotlib_formats('pdf', 'svg')
import matplotlib.pyplot as plt
# Put your plot code here
plt.plot(viral_load, time)
plt.yscale('log')
Create a few more plots using different values of the four model parameters. For each new plot,
viral_load
Remember: You can try negative (and non-integer) values for parameters
In [ ]:
# Change the values and make a new plot
A = 10
B = -0.5
alpha = 1
beta = 1
viral_load = (A * np.exp(-alpha * time)) + (B * np.exp(-beta * time))
plt.plot(time, viral_load)
// Note each change you made and what you saw
We're going to use experimental data of actual viral loads provided by Kindler and Nelson (2015). They write:
File
HIVseries.mat
contains variable "a" with two columns of data. The first is the time in days since administration of a treatment to an HIV positive patient; the second contains the concentration of virus in that patient's blood, in arbitrary units.
HIVseries.csv
andHIVseries.npy
contain the same data in the same format.HIVseries.npz
contains the same data in two separate arrays calledtime_in_days
andviral_load
.Data from A. Perelson. Modelling viral and immune system dynamics. Nature Revs. Immunol. (2002) vol. 2 (1) pp. 28--36 (Box 1).
So, to summarize, the dataset hiv_data
has 2 columns:
time_in_days
is the number of days since an HIV-positive patient received a treatment.viral_load
is the concentraiton of the virus in that patients blood, in arbitrary units.pandas.read_csv()
to Load the DataThe data file we'll use is in a file format called CSV, which stands for comma-separated values. It's a commonly used format for storing 2-dimensional data, and programs like Microsoft Excel can import and export .CSV files.
The code below will use the read_csv()
function from the pandas data analysis library to load the CSV file you need from the web, then store the data as a variable called hiv_data
.
In [ ]:
# Loading the data
hiv_data = pandas.read_csv(
"https://raw.githubusercontent.com/ComputationalModeling/IPML-Data/master/01HIVseries/HIVseries.csv",
header = None,
names = ["time_in_days", "viral_load"]
)
# the data type of hiv_data is "dataframe"
type(hiv_data)
In [ ]:
# Execute this cell (Shift + Enter) to see the data
hiv_data
In [ ]:
# If you have a pandas dataframe, you can call `head()` on it like this:
hiv_data.head()
In [ ]:
# To see the last few rows, call `tail()` on it
hiv_data.tail()
In [ ]:
# How to view an individual column
hiv_data["time_in_days"] # or
hiv_data["viral_load"]
In [ ]:
# Here's the viral load column again
hiv_data["viral_load"]
In [ ]:
# And we can calulate its mean, max, size, and other properties
# Just like we would on a numpy array
hiv_data["viral_load"].mean()
In [ ]:
hiv_data["viral_load"].max()
In [ ]:
hiv_data["viral_load"].size
In a Python cell below, plot the viral load versus time from the hiv_data
we loaded.
In [ ]:
# Plot viral load vs. time
plt.plot(hiv_data["time_in_days"],hiv_data["viral_load"],'ro')
plt.yscale('log')
# note that there's some very interesting double-exponential behavior here!
Now that we've seen what the experimental data look like, we'll take our model one step further by putting the data and our model together on the same plot.
In the Python Cell below, create a plot that contains both the data and the model we were working with earlier. So, we'll superimpose the datapoints on a plot of $V(t) = A e^{(-\alpha t)} + B e^{(-\beta t)}$. You may need to adjust the model parameters until you can see both the data and model in your plot.
In [ ]:
# Plot the data and model together
A = 95000
B = 150000
alpha = 0.3
beta = 2.0
viral_load = (A * np.exp(-alpha * time)) + (B * np.exp(-beta * time))
plt.plot(
time, viral_load, "r",
hiv_data['time_in_days'], hiv_data['viral_load'], "b."
)
#plt.plot(
# hiv_data['time_in_days'], hiv_data['viral_load'], "b."
#)
#plt.yscale('log') # much less clear what's going on here
The goal will be to tune the four parameters of $V(t) = A e^{(-\alpha t)} + B e^{(-\beta t)}$ until the model agrees with the data. It is hard to find the right needle in a four-dimensional haystack! We need a more systematic approach than just guessing. Consider and try to answer each of the following in a Markdown cell below:
// Answer the questions here
Kinder and Nelson (2015) recommend that you:
Carry out this analysis so that you have only one remaining free parameter, which you can adjust fairly easily. Adjust this parameter until you like what you see.
We suggest you try and save different models by assigning those models to named variables. For example, you might try:
# Parameters for Model 1
A = 1
B = 3
### Evaluate model 1
model_01 = (A * np.exp(alpha * time)) + (B * np.exp(beta * time))
### Plot model 1
plt.plot(time, model_01)
# Trying new parameter values for Model 2
A = 3
B = -5
alpha = 0.5
# Evaluate model 2
model_02 = (A * np.exp(alpha * time)) + (B * np.exp(beta * time))
# Plot model 2
plt.plot(time, viral_load)
# and so on until you find values you like
Use one or more Python cells below to create multiple models and decide on good parameter values.
In [ ]:
# Do whatever work you need here to determine
# the parameter values you think work best.
# REMEMBER: You can assign each new model to a new variable,
# like `model_01`, `model_02`, ...
# fiddle with parameters
A = 10
B = 160000 - A
alpha = .002
beta = 10
time1 = np.linspace(0, 2, 100)
slope1 = -np.log((106100 - 66435) / 2)
y_int1 = np.log(106100)
line1 = (slope1 * time1) + y_int1
slope2 = -np.log((75906 / 5))
y_int2 = np.log(4785.2)
line2 = slope2 * (time - 7) + y_int2
log_model = np.log((A * np.exp(-alpha * time)) + (B * np.exp(-beta * time)))
log_hiv_data = np.log(hiv_data["viral_load"])
plt.plot(
hiv_data['time_in_days'], np.log(hiv_data['viral_load']), 'bo',
# ,time, line2, 'g'
time1, line1, 'r'
)
line1[0]
hiv_data['viral_load'].where(hiv_data['viral_load'] < 60000)
In [ ]:
from IPython.display import IFrame
IFrame('http://goo.gl/forms/v8oZUSLDaa', width=800, height=1200)
Log into the course Desire2Learn website (d2l.msu.edu) and go to the "In-class assignments" folder. There will be a dropbox labeled "Day 6
". Upload this notebook there (but not pictures of drawings, etc.). You only have to upload one notebook per group - just make sure that everybody's name is at the top of the notebook!
In [ ]: