Table of Contents

1.10 - Please Give Feedback on this assignment

Understanding Parameter-Fitting by Exploring Viral Load

Student names

Work in pairs, and put the names of both people in your group here! (If you're in a group of 3, just move your chairs so you can work together.)

Learning Goals (Why Are We Asking You To Do This?)

  • To develop an intuition about how equation-based models work
  • To understand how to evaluate models by plotting them together with experimental data
  • To practice predicting what the effect will be when we change the parameters of a model
  • To learn how we can iterate to improve the fit of a model

Understanding How We Treat the Human Immunodeficiency Virus (HIV)

Here we explore a model of the viral load—the number of virions in the blood of a patient infected with HIV—after the administration of an antiretroviral drug. One model for the viral load predicts that the concentration $V (t)$ of HIV in the blood at time t after the start of treatment will be:

$$ \begin{equation} V(t) = A \cdot \mathrm{exp}(-\alpha t) + B \cdot \mathrm{exp}(-\beta t) \end{equation}$$

When we write mathematics, $\mathrm{exp}(\dots)$ is notational shorthand for $e^{(\dots)}$. So, we can rewrite the viral load model this way:

$$ \begin{equation} V(t) = A e^{(-\alpha t)} + B e^{(-\beta t)} \end{equation} $$

Two things to note about this particular model:

  1. Viral load is a function of time $t$. That's why we're writing it as $V(t) = \dots$.
  2. There are four modeling parameters (numbers) we can vary: $(A, \alpha, B, \beta )$.:
$$ \begin{equation} V(t) = \textbf{A} e^{(-\boldsymbol{\alpha} t)} + \textbf{B} e^{(- \boldsymbol{\beta} t)} \end{equation} $$

Note:

You probably know that there are black-box software packages that do such “curve fitting” automatically. In this lab, you should do it manually, just to see how the curves respond to changes in the parameters.]

Kinder, Jesse M.; Nelson, Philip (2015-07-01). A Student's Guide to Python for Physical Modeling (Page 63). Princeton University Press. Kindle Edition.

Doing it manually also helps build our intuition for how mathematical models behave when we visualize them.

Let's get started by setting some options and loading the modules we'll need.

Setting options and loading modules


In [ ]:
# Make plots inline
%matplotlib inline

# Make inline plots vector graphics instead of raster graphics
from IPython.display import set_matplotlib_formats
set_matplotlib_formats('pdf', 'svg')

# import modules for plotting and data analysis
import matplotlib.pyplot as plt
import numpy as np
import pandas

Now, we'll tackle the "function in time" part of this model by learning how to make and use arrays to represent time.

Making intervals in time

Using numpy, we can conveniently make arrays that would represent slices in time. We'll use the np.linspace command, which takes three arguments (inputs):

  • Where you want the interval to start
  • Where you want the interval to end
  • How many steps there should be in the interval

We can use np.linspace to make time intervals


In [ ]:
import numpy as np

np.linspace(
    0,  # where the interval starts
    1,  # where the interval ends
    11  # How many steps (elements) we want in the final array    
)

We can assign time intervals to variables

We can also try assigning that array to a variable, so we our later code can use it and refer back to it. Here, we'll:

  1. Make the same interval as before (0 to 1),
  2. With eleven steps,
  3. Assign it to a variable we'll call trying_to_make_a_time_interval (it helps when our names are descriptive)
  4. Check how many items are in the array we made. (It should be 11).

In [ ]:
import numpy as np

# Make our array and assign it to `time`
trying_to_make_a_time_interval = np.linspace(
    0,  # where the interval starts
    1,  # where the interval ends
    11  # How many steps (elements) we want in the final array    
)

# Check that there are 11 elements in our array (which we asked for)
trying_to_make_a_time_interval.size

Your Turn - 'Use np.linspace() to make a time interval for our model

Now that you've seen how to make evenly-spaced time intervals, we'll start creating an array of time slices for our viral load model. Create a single array called time that:

  • Starts at 0,
  • Goes to 10, and
  • Has 101 elements in it

In [ ]:
# Create your time array here
time = np.linspace(0, 10, 101)

We can set and change parameter values to see how the model behaves

Remember how our model equation has four parameters $(A, \alpha, B, \beta )$? Below, you're going to use Python code to see what happens to the model when we try (and change) values for those parameters. Below, write code that sets $B = 0$ and chooses non-zero values for the other three parameters.


In [ ]:
# Make B equal to zero and set some non-zero values for the other parameters

A = 3
B = 0 
alpha = 2
beta = 5

PREDICT BEFORE YOU PLOT

Here's that viral load equation again:

$$ \begin{equation} V(t) = \textbf{A} e^{(-\boldsymbol{\alpha} t)} + \textbf{B} e^{(- \boldsymbol{\beta} t)} \end{equation} $$

Just like the order-of-magnitude approximations we've been doing, thinking before we plot and evaluate helps us develop our intution about models. In your code above, you've set $B = 0$. Use the markdown cell below to predict (in words) how you think setting $B = 0$ affects the equation (and by extension, the model).

  • How will the equation change?
    • Setting $B = 0$ drops out the second term, which means the viral load is wholly determined by the first term
  • How does setting $B = 0$ affect the other model parameters (if at all)
    • With $B = 0$, the value of $Beta$ is inconsequential because its term ultimately gets multiplied by zero.

Your Prediction

When we set B = 0, we think what will happen is...

Explore the model

Explore the model by evaluating the equation

We can write the viral load function (from above) in Python, using the parameters we set above and our time array. In the cell below, write viral_load as a function of time.

Note: We can write exponentation in numpy using np.exp(). So,

$$ \begin{equation} e^{(\dots)} = \tt{np.exp(\dots)} \end{equation} $$

In [ ]:
# Write and evaluate viral_load = ...
viral_load = (A * np.exp(-alpha * time)) + (B * np.exp(-beta * time))

Explore the model by visualizing it

You should now have two arrays of the same length, time and viral load. In the Python cell below, verify that they're the same length (have the same number of values), then plot them.


In [ ]:
# Verify that both arrays are the same length. 
# You can use .size, as in time.size or viral_load.size

# Then, try plotting viral_load vs. time

# Make plots inline
%matplotlib inline

# Make inline plots vector graphics instead of raster graphics
from IPython.display import set_matplotlib_formats
set_matplotlib_formats('pdf', 'svg')
import matplotlib.pyplot as plt

# Put your plot code here
plt.plot(viral_load, time)
plt.yscale('log')

Explore the model by changing parameter values

Create a few more plots using different values of the four model parameters. For each new plot,

  1. Use a Python cell to change the model parameters however you'd like,
  2. Re-evaluate the viral_load
  3. Use a markdown cell to explain how the curve changed

Remember: You can try negative (and non-integer) values for parameters


In [ ]:
# Change the values and make a new plot

A = 10
B = -0.5
alpha = 1
beta = 1

viral_load = (A * np.exp(-alpha * time)) + (B * np.exp(-beta * time))
plt.plot(time, viral_load)

// Note each change you made and what you saw

Loading and Examining Experimental Data

What is this data?

We're going to use experimental data of actual viral loads provided by Kindler and Nelson (2015). They write:

File HIVseries.mat contains variable "a" with two columns of data. The first is the time in days since administration of a treatment to an HIV positive patient; the second contains the concentration of virus in that patient's blood, in arbitrary units.

HIVseries.csv and HIVseries.npy contain the same data in the same format. HIVseries.npz contains the same data in two separate arrays called time_in_days and viral_load.

Data from A. Perelson. Modelling viral and immune system dynamics. Nature Revs. Immunol. (2002) vol. 2 (1) pp. 28--36 (Box 1).

So, to summarize, the dataset hiv_data has 2 columns:

  • time_in_days is the number of days since an HIV-positive patient received a treatment.
  • viral_load is the concentraiton of the virus in that patients blood, in arbitrary units.

Use pandas.read_csv() to Load the Data

The data file we'll use is in a file format called CSV, which stands for comma-separated values. It's a commonly used format for storing 2-dimensional data, and programs like Microsoft Excel can import and export .CSV files.

The code below will use the read_csv() function from the pandas data analysis library to load the CSV file you need from the web, then store the data as a variable called hiv_data.


In [ ]:
# Loading the data
hiv_data = pandas.read_csv(
    "https://raw.githubusercontent.com/ComputationalModeling/IPML-Data/master/01HIVseries/HIVseries.csv",
    header = None,
    names = ["time_in_days", "viral_load"]
)

# the data type of hiv_data is "dataframe"
type(hiv_data)

You Can View a Pandas DataFrame by Executing It


In [ ]:
# Execute this cell (Shift + Enter) to see the data
hiv_data

You can view the first/last few rows of data with .head() and .tail() functions


In [ ]:
# If you have a pandas dataframe, you can call `head()` on it like this:
hiv_data.head()

In [ ]:
# To see the last few rows, call `tail()` on it
hiv_data.tail()

Use data["column_name"] to View or Refer to a Column of Data


In [ ]:
# How to view an individual column
hiv_data["time_in_days"] # or
hiv_data["viral_load"]

Pandas DataFrame Columns Behave Like Numpy Arrays


In [ ]:
# Here's the viral load column again
hiv_data["viral_load"]

In [ ]:
# And we can calulate its mean, max, size, and other properties
# Just like we would on a numpy array
hiv_data["viral_load"].mean()

In [ ]:
hiv_data["viral_load"].max()

In [ ]:
hiv_data["viral_load"].size

Plotting the Experimental Data

In a Python cell below, plot the viral load versus time from the hiv_data we loaded.


In [ ]:
# Plot viral load vs. time

plt.plot(hiv_data["time_in_days"],hiv_data["viral_load"],'ro')
plt.yscale('log')
# note that there's some very interesting double-exponential behavior here!

Fitting Our Model To Experimental Data

Now that we've seen what the experimental data look like, we'll take our model one step further by putting the data and our model together on the same plot.

Plot the Data and Model Together

In the Python Cell below, create a plot that contains both the data and the model we were working with earlier. So, we'll superimpose the datapoints on a plot of $V(t) = A e^{(-\alpha t)} + B e^{(-\beta t)}$. You may need to adjust the model parameters until you can see both the data and model in your plot.


In [ ]:
# Plot the data and model together

A = 95000
B = 150000
alpha = 0.3
beta = 2.0

viral_load = (A * np.exp(-alpha * time)) + (B * np.exp(-beta * time))

plt.plot(
    time, viral_load, "r",
    hiv_data['time_in_days'], hiv_data['viral_load'], "b."
)

#plt.plot(
#    hiv_data['time_in_days'], hiv_data['viral_load'], "b."
#)

#plt.yscale('log')  # much less clear what's going on here

THINK about Tuning the Model Parameters to Fit the Model to Data

The goal will be to tune the four parameters of $V(t) = A e^{(-\alpha t)} + B e^{(-\beta t)}$ until the model agrees with the data. It is hard to find the right needle in a four-dimensional haystack! We need a more systematic approach than just guessing. Consider and try to answer each of the following in a Markdown cell below:

  1. Assuming $\beta > \alpha$, how does the trial solution behave at long times?
  2. If the data also behave that way, can we use the long-time behavior to determine two of the four unknown constants, then hold them fixed while adjusting the other two?
  3. Even two constants is a lot to adjust by hand, so let’s think some more: How does the initial value $V(0)$ depend on the four constant parameters?
  4. Can you vary these constants in a way that always gives the correct long-time behavior and initial value? (Kinder and Nelson, 2015, p. 62)

// Answer the questions here

Make and Plot Multiple Models to Find Model Parameters that Best Fit the Model to the Data

Kinder and Nelson (2015) recommend that you:

Carry out this analysis so that you have only one remaining free parameter, which you can adjust fairly easily. Adjust this parameter until you like what you see.

We suggest you try and save different models by assigning those models to named variables. For example, you might try:

# Parameters for Model 1
A = 1
B = 3

### Evaluate model 1
model_01 = (A * np.exp(alpha * time)) + (B * np.exp(beta * time))

### Plot model 1
plt.plot(time, model_01)

# Trying new parameter values for Model 2
A = 3
B = -5
alpha = 0.5

# Evaluate model 2
model_02 = (A * np.exp(alpha * time)) + (B * np.exp(beta * time))

# Plot model 2
plt.plot(time, viral_load)

# and so on until you find values you like

Use one or more Python cells below to create multiple models and decide on good parameter values.


In [ ]:
# Do whatever work you need here to determine
#   the parameter values you think work best.
# REMEMBER: You can assign each new model to a new variable,
#   like `model_01`, `model_02`, ...

# fiddle with parameters
A = 10
B = 160000 - A
alpha = .002
beta = 10

time1 = np.linspace(0, 2, 100)
slope1 = -np.log((106100 - 66435) / 2)
y_int1 = np.log(106100)
line1 = (slope1 * time1) + y_int1


slope2 = -np.log((75906 / 5))
y_int2 = np.log(4785.2)
line2 = slope2 * (time - 7) + y_int2

log_model = np.log((A * np.exp(-alpha * time)) + (B * np.exp(-beta * time)))
log_hiv_data = np.log(hiv_data["viral_load"])



plt.plot(
    hiv_data['time_in_days'], np.log(hiv_data['viral_load']), 'bo',
#     ,time, line2, 'g'
    time1, line1, 'r'
)


line1[0]
hiv_data['viral_load'].where(hiv_data['viral_load'] < 60000)

Please Give Feedback on this assignment


In [ ]:
from IPython.display import IFrame
IFrame('http://goo.gl/forms/v8oZUSLDaa', width=800, height=1200)

How to submit this assignment

Log into the course Desire2Learn website (d2l.msu.edu) and go to the "In-class assignments" folder. There will be a dropbox labeled "Day 6". Upload this notebook there (but not pictures of drawings, etc.). You only have to upload one notebook per group - just make sure that everybody's name is at the top of the notebook!

Source

This tutorial is adapted from:

Kinder, Jesse M.; Nelson, Philip (2015-07-01). A Student's Guide to Python for Physical Modeling (pp. 61–64). Princeton University Press. Kindle Edition.


In [ ]: