This Jupyter Notebook illustrates how feedback mediated by cell lineages can arrest tumor growth.
The Notebook lives on GitHub.
The rendered version of the notebook includes inline animations and interactive plots.
Many growing tumors do not progress, and instead arrest their growth, meaning that their cells stop replicating. Some tumors that have previously arrested their growth can reinitiate it. Physicians would like to be able to distinguish tumors that are destined to escape growth arrest from those that are not, yet current methods for doing so are unreliable.
Nevi (pigmented spots) on human skin are good examples of this. Physicians are poorly equipped to predict whether a given nevus will remain benign or become a melanoma. Faced with evidence that a patient is at risk of melanoma (e.g. the existence of predisposing mutations in their genome), physicians attempt to lower the risk by excising lots of nevi, proceeding under the assumption that the patient harbors an unidentifiable rogue nevus, which is more likely to be removed the more nevi are excised. The problem with this approach is that many nevi that would never have progressed to melanoma are nevertheless excised, placing a physical and emotional burden on the patient, and a financial burden on the healthcare system.
An understanding of what makes nevi stop growing in the first place may help predict the fate of nevi, yet currently no good mechanistic hypotheses exist. It is often implicitly assumed that cells autonomously arrest their growth. Yet, we know that many, if not all, tissues are composed of cell lineages, e.g. a stem cell state that becomes a terminally differentiated state, and that cells in different lineage states often cooperate to control their numbers. The most commonly observed strategy is an integral-feedback mechanism in which terminally differentiated (quiescent) cells secrete molecules that lower the probability that nearby progenitor (cycling) cells replicate versus differentiate (e.g. see this publication).
I used modeling, simulations and analysis (featured below) to determine conditions under which nevi, and tumors in general, are expected to arrest their growth through integral feedback control mediated by cell lineages.
In [1]:
import tumor_package
help(tumor_package)
In [2]:
# To make these modules visible to the Notebook, edit the python search path.
import os
repository_directory = os.getcwd()
package_directory = repository_directory + '/tumor_package'
print('package directory:\t' + package_directory)
import sys
print('Python searches these paths when asked to import a module:')
sys.path.append(package_directory)
for path in sys.path:
    print(path)
In [3]:
# the initial condition and parameter values are stored in Dictionaries
from read import read_into_dict
initialCondition = read_into_dict('data/animation/initialCondition.in')
parameterValues = read_into_dict('data/animation/parameterValues.in')
# simulate and store the resultant time course in an animation object
from animate import animate_tumor_growth_base
fig, anim = animate_tumor_growth_base(initialCondition, parameterValues, number_of_frames=400, random_seed=2)
To embed the matplotlib animation anim in the notebook, I follow Louis Tiao's blog post. This requires a recent version of Matplotlib and an animation writer called ffmpeg, which you can install by first installing Homebrew and then typing brew install ffmpeg in the Terminal.
In [4]:
# Check matplotlib and ffmpeg
import matplotlib as mpl
print('matplotlib version:\t' + mpl.__version__ + '\t (should be at least 1.5.1)')
from matplotlib import pyplot as plt
print('animation writer:\t' + plt.rcParams['animation.writer'] + '\t (should be ffmpeg)')
In [5]:
# Simulate and visualize the tumor, using a HTML5 video tag
from IPython.display import HTML
HTML(anim.to_html5_video())
Out[5]:
In this movie, each disc represents a tumor cell. Color indicates whether the cell is in a cycling state (green) or a quiescent state (red). Time is measured in units of a cell cycle: the average time that elapses between a cell's birth and its division.
The movie shows the interplay between cycling (C) and quiescent (Q) cells during tumor growth. Early on, C-cells self-renew, mostly amplifying their own numbers but also occasionally differentiating to produce Q-cells. Once the number of Q-cells crosses a threshold value (inverse feedback gain; see below), a brief second phase begins during which the C-cells differentiate, rapidly reducing their numbers, increasing the numbers of Q-cells, and arresting tumor growth. Finally, in a third phase, the tumor regresses due to cell death.
Having visualized the output of the model, let us now describe it in more detail. The fate of a C-cell is modeled as a continuous-time Poisson process in which the cell either divides or becomes quiescent after an exponentially distributed random time interval. If a C-cell self-renews, its daughters are displaced in opposite directions. All this can be conveniently represented mathematically as
\begin{equation} C_{\vec{x}} \xrightarrow{1} \left\{ \begin{array}{ll} C_{\vec{x} + \delta\vec{x}} + C_{\vec{x} - \delta\vec{x}} & \mbox{with prob $p$}\\ Q_{\vec{x}} & \mbox{with prob $1-p$} \end{array} \right. \end{equation}where $C_{\vec{x}}$ represents a C-cell located at position $\vec{x}$, the $1$ over the arrow on the left indicates the mean waiting time for something to happen to that cell, $p$ represents the self-renewal probability, and $\delta\vec{x}$ is a vector of fixed length but random direction.
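To make this rule concrete, here is a minimal, self-contained sketch (my own illustration, not the package's implementation) of how a single C-cell event could be sampled. The self-renewal probability p is simply passed in, and the daughter displacement has fixed length but random direction:

import numpy as np

def sample_C_cell_fate(position, p, displacement_length=1.0):
    """Sketch: draw the Exp(1) waiting time, then choose between
    self-renewal (two displaced daughters) and quiescence."""
    waiting_time = np.random.exponential(1.0)        # mean waiting time of one cell cycle
    if np.random.rand() < p:                         # self-renewal with probability p
        theta = np.random.uniform(0.0, 2.0*np.pi)    # random direction, fixed length
        delta = displacement_length * np.array([np.cos(theta), np.sin(theta)])
        return waiting_time, 'self-renewal', [position + delta, position - delta]
    return waiting_time, 'quiescence', [position]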
We suppose that C- and Q-cells within a neighborhood of a C-cell influence the chance that it self-renews:
\begin{equation} p = p_{max} W_{neg} W_{pos} \end{equation}where $W_{neg}$ and $W_{pos}$ are negative and positive feedback functions, respectively, of $N_C$ and $N_Q$, the number of C- and Q-cells in a neighborhood of the C-cell under consideration. Examples of feedback functions are
\begin{eqnarray} W_{neg} & = & \frac{1}{1 + s} \\ W_{pos} & = & \frac{s}{1 + s} \end{eqnarray}where
\begin{equation} s = (g_C N_C)^{n_C} + (g_Q N_Q)^{n_Q} \end{equation}and the $g$'s are 'feedback gains'. Notice that feedback 'kicks in' when cell numbers become comparable to the inverse value of the corresponding feedback gain because it is then that the $W$ functions, which control the self-renewal probability $p$, cross over from 0 to 1 (or vice versa).
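As a quick numerical illustration (a sketch with made-up parameter names and default values; the real names and values are read from parameterValues.in), the self-renewal probability can be computed as:

def self_renewal_probability(N_C, N_Q, p_max=1.0, g_C=0.0, g_Q=0.1, n_C=1, n_Q=1):
    """Sketch of the feedback functions acting on the self-renewal probability."""
    s = (g_C * N_C)**n_C + (g_Q * N_Q)**n_Q
    W_neg = 1.0 / (1.0 + s)   # negative feedback: decreases as s grows
    W_pos = s / (1.0 + s)     # positive feedback: increases as s grows
    return p_max * W_neg * W_pos

With g_Q = 0.1, for example, the negative feedback only becomes appreciable once roughly 10 Q-cells (the inverse of the gain) occupy the neighborhood.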
To see how these behaviors are implemented, look at execute_divisionQuiescence_event:
In [6]:
from events import execute_divisionQuiescence_event
help(execute_divisionQuiescence_event)
A cell physically interacts with other cells in its proximity in two fundamental ways. First, the existence of a cell precludes the possibility of another cell occupying the same position, a phenomenon called 'volume exclusion'. Second, cells tend to adhere to one another via molecules on their surfaces. Put another way, cell-cell interactions are strongly repulsive at short intercellular distances, weakly attractive at intermediate distances, and zero at long distances. One way to capture these constraints mathematically is to ascribe to each pair of cells an interaction energy $V$ that depends on their separation $r$ in the following way:
\begin{equation} V(r) = \left\{ \begin{array}{ll} V_a \left(\frac{a}{r}\right)^n - V_c & r < a\\ -V_c & a < r < b \\ -V_c \frac{c-r}{c-b} & b < r < c \\ 0 & c < r \end{array} \right. \end{equation}The following code plots this function:
In [7]:
import numpy as np
def cell_cell_energy(rr):
    """interaction energy between two cells located a distance rr apart"""
    Va = 120
    Vc = 12
    aa = 1
    bb = 1.2
    cc = 2
    nn = 2
    if rr < aa:
        return Va * np.power(aa/rr, nn) - float(Vc)
    elif rr < bb:
        return -float(Vc)
    elif rr < cc:
        return -float(Vc) * (cc - rr)/(cc - bb)
    else:
        return 0.0
%matplotlib inline
fontsize = 16
fontsize_tick = 14
fig = plt.figure(figsize=(7,5), facecolor='w')
ax = fig.add_subplot(111)
rr = np.linspace(0.9, 2.25, 100)
ax.plot(rr, np.vectorize(cell_cell_energy)(rr), linewidth=2)
ax.set_xlabel('distance between cells, r (cell diameters)', fontsize=fontsize)
ax.set_ylabel('cell-cell interaction energy, V(r)', fontsize=fontsize)
ax.tick_params(axis='both', which='major', labelsize=fontsize_tick)
You can see that there is a huge energy penalty at $r<1$ reflecting the tendency of cells closer than one cell diameter to repel one another. In general, cells will tend to move to positions in which they minimize their interaction energy, which, for this energy function, corresponds to intercellular distances between 1 and 1.2 cell diameters.
Cell growth, division and death are slow processes that induce intercellular forces because of changes in cell size and number. The tissue responds by re-arranging its cells in such a way as to remove these forces, a process termed 'relaxation'. It occurs on a faster timescale than cell growth, division and death, probabilistically favoring cell displacements that lower the tissue energy.
Mathematically speaking, a cell $Z$ is chosen at random at a rate $\nu\gg 1$ (recall that $1$ is the rate at which cells divide) and a random displacement $\delta \vec{x}$ proposed:
\begin{equation} Z_{\vec{x}} \xrightarrow{\nu} Z_{\vec{x} + \delta \vec{x}} \end{equation}where $\delta \vec{x}$ is drawn from a circular Gaussian distribution. Associated with the proposed move is an energy change $\Delta E = E_{new} - E_{old}$, where
\begin{equation} E = \sum_{i < j} V (|\vec{x}_i - \vec{x}_j|) \end{equation}is the tumor energy. We implement the bias towards more energetically favorable cell configurations via the Metropolis algorithm in which we always accept the proposed cell displacement when it lowers the energy and accept it with probability $e^{-\Delta E}$ when it raises the energy (by a positive amount $\Delta E$).
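Here is a minimal sketch of one relaxation attempt, reusing the cell_cell_energy function defined above and recomputing the full tumor energy naively. This is for illustration only, not the package's implementation, which is faster (as discussed next):

def total_energy(positions):
    """Naive tumor energy: sum V over all cell pairs (quadratic in the number of cells)."""
    E = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            E += cell_cell_energy(np.linalg.norm(positions[i] - positions[j]))
    return E

def relaxation_step(positions, sigma=0.1):
    """Pick a random cell, propose a Gaussian displacement, and apply the Metropolis rule."""
    i = np.random.randint(len(positions))
    old_position = positions[i].copy()
    E_old = total_energy(positions)
    positions[i] = old_position + np.random.normal(0.0, sigma, size=2)
    delta_E = total_energy(positions) - E_old
    if delta_E > 0 and np.random.rand() >= np.exp(-delta_E):
        positions[i] = old_position   # reject: revert the move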
A note on performance: the time needed to compute the tumor energy $E$ by summing over all cell pairs scales quadratically with tumor size. It is much more efficient to update only the cell-cell interaction energies that change after each event (division, death, relaxation), producing an algorithm that scales linearly with tumor size. The tumor package implements the faster linear algorithm.
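To illustrate the idea behind the faster update (again just a sketch, not the package's actual data structures), the energy change of a single proposed displacement can be computed from the pairs involving the moved cell alone, rather than from the full double sum; restricting the loop to cells within the interaction range c would make the cost of each update essentially constant:

def energy_change_of_move(positions, i, new_position):
    """Only interactions involving cell i change when cell i moves,
    so recompute just those pairwise energies."""
    delta_E = 0.0
    for j in range(len(positions)):
        if j == i:
            continue
        r_old = np.linalg.norm(positions[i] - positions[j])
        r_new = np.linalg.norm(new_position - positions[j])
        delta_E += cell_cell_energy(r_new) - cell_cell_energy(r_old)
    return delta_E

Substituting this for the naive recomputation in relaxation_step above avoids touching pairs whose energy did not change.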
Generate an ensemble of tumor-growth time courses either by running

from ensemble import generate_tumor_growth_trajectories_base
generate_tumor_growth_trajectories_base(initialCondition, parameterValues, number_realizations=100, random_seed=2, output_directory_name='data/animation/')

in this Notebook, or by running

cd <path to repository>/data/animation/
python ../../tumor_package/ensemble.py

in the Terminal. Generating a large number of realizations of tumor growth takes a while. To speed up the execution of this Notebook, I will therefore use pre-existing data for long- and short-range negative feedback. Let's now read in these data and visually compare the simulated tumor-growth time courses.
In [8]:
# Read in data
parameterValues_short = read_into_dict('data/short_range_negative_feedback/parameterValues.in')
parameterValues_long = read_into_dict('data/long_range_negative_feedback/parameterValues.in')
class Data(object):
    """Container for simulation data"""
    def __init__(self, time_points, number_C_cells, number_Q_cells):
        self.time_points = time_points
        self.number_C_cells = number_C_cells
        self.number_Q_cells = number_Q_cells
        self.total_number_cells = self.number_C_cells + self.number_Q_cells

def create_data_object_from_file(data_file_name):
    with np.load(data_file_name) as data_set:
        time_points = data_set['time_points']
        number_C_cells = data_set['number_C_cells']
        number_Q_cells = data_set['number_Q_cells']
    return Data(time_points, number_C_cells, number_Q_cells)
data_short = create_data_object_from_file('data/short_range_negative_feedback/tumor_time_courses.npz')
data_long = create_data_object_from_file('data/long_range_negative_feedback/tumor_time_courses.npz')
In [9]:
# Visually compare time courses for short- and long-range negative feedback
fig = plt.figure(figsize=(12, 7), facecolor='w')
def plot_time_courses(data, axis_number, title):
    number_subplot_rows, number_subplot_columns = 1, 2
    ax = fig.add_subplot(number_subplot_rows, number_subplot_columns, axis_number)
    ax.plot(data.time_points.T, data.total_number_cells.T, linewidth=1)
    ax.set_xlabel('time (cell cycles)', fontsize=fontsize)
    ax.set_ylabel('number of cells per tumor', fontsize=fontsize)
    ax.set_title(title, fontsize=fontsize)
    ax.tick_params(axis='both', which='major', labelsize=fontsize_tick)
    ax.set_yscale('log')
    ax.set_ylim(1, 1e3)  # lower limit must be positive on a log scale

def thin(data, step):
    """down-sample time-course data"""
    def thin_base(arr):
        return arr[0::step, :]
    return Data(thin_base(data.time_points), thin_base(data.number_C_cells), thin_base(data.number_Q_cells))

step = 2
plot_time_courses(thin(data_short, step), axis_number=1, title='short-range high-gain feedback')
plot_time_courses(thin(data_long, step), axis_number=2, title='long-range low-gain feedback')
The two graphs correspond to two different pairs of values of negative feedback range and gain while each color represents a single realization of a growing tumor. When increasing the spatial range of negative feedback, I also decreased its gain, so as to make the average trajectories comparable in both cases. A number of trends are evident in these plots.
First, negative feedback can arrest tumor growth even when the spatial range of feedback is smaller than the maximum tumor size (left panel). Fundamentally, this is possible because controlling the growth of the tumor early is more important than doing so late; in other words, errors have less impact when corrected early rather than late.
Second, increasing the spatial range of the negative feedback makes the large-number part of the trajectories less variable (right panel). In particular, you can clearly see examples of trajectories that possess an early exponential growth phase (due to the tumors being mainly composed of self-replicating C-cells) and a late exponential regression phase (during which tumors are composed mainly of Q-cells that are killed off at a constant rate per cell).
Third, some tumors survive the initial fluctuations in tumor size to grow deterministically (right panel). For these tumors, most of their variability is explained by the randomness in the time at which they 'take off', e.g. the time at which there are at least 10 cells. Other tumors are not so 'lucky': the initial fluctuations prevent them from ever entering the deterministic growth phase. These 'unlucky' tumors — doomed to regression from their inception — are not rare. In fact, as many as half of the tumors are destined to 'extinguish', as the following histogram shows...
In [10]:
# plot the distribution of the number of cells per tumor at a single time point
def plot_distribution(data, bin_width=10, time_at_which_to_plot_histogram=14):
    index_at_which_to_plot_histogram = (np.abs(data.time_points[0, :] - time_at_which_to_plot_histogram)).argmin()
    total_number_cells_at_time_point = data.total_number_cells[:, index_at_which_to_plot_histogram]
    fig = plt.figure(figsize=(7, 5), facecolor='w')
    ax = fig.add_subplot(111)
    x_max = max(data.total_number_cells.flatten()) + 2*bin_width
    bin_edges = np.arange(-0.5*bin_width, x_max, bin_width)
    counts = np.histogram(total_number_cells_at_time_point, bin_edges)[0]
    probabilities = counts/float(sum(counts))
    rectangle_width = 0.7 * bin_width
    bin_centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
    rectangle_left_edges = bin_centers - 0.5*rectangle_width
    ax.bar(left=rectangle_left_edges, height=probabilities, width=rectangle_width, color='red')
    ax.set_xlabel('number of cells per tumor', fontsize=fontsize)
    ax.set_ylabel('probability', fontsize=fontsize)
    ax.tick_params(axis='both', which='major', labelsize=fontsize_tick)
    print('number of samples = ' + str(len(total_number_cells_at_time_point)))

plot_distribution(data_long)
The following interactive visualization was inspired by this mpld3 example. If your Python distribution doesn't already have mpld3, you'll need to install it, e.g., by executing pip install mpld3 in the Terminal. mpld3 allows the user to combine Matplotlib, the popular Python-based graphing library, and D3, the popular JavaScript library for creating interactive data visualizations for the web.
In [11]:
# adjust the Notebook display properties so that mpld3 will be used to display figures
from mpld3 import enable_notebook
enable_notebook()
# define a 'plugin' to add interactive behavior to matplotlib plots rendered in d3
from mpld3 import plugins, utils
class LinkedView(plugins.PluginBase):
    """Link scatter plot to two line plots"""

    JAVASCRIPT = """
    mpld3.register_plugin("linkedview", LinkedViewPlugin);
    LinkedViewPlugin.prototype = Object.create(mpld3.Plugin.prototype);
    LinkedViewPlugin.prototype.constructor = LinkedViewPlugin;
    LinkedViewPlugin.prototype.requiredProps = ["idpts", "idline1", "data1", "idline2", "data2"];
    LinkedViewPlugin.prototype.defaultProps = {}

    function LinkedViewPlugin(fig, props){
        mpld3.Plugin.call(this, fig, props);
    };

    LinkedViewPlugin.prototype.draw = function(){
        var pts = mpld3.get_element(this.props.idpts);
        var line1 = mpld3.get_element(this.props.idline1);
        var data1 = this.props.data1;
        var line2 = mpld3.get_element(this.props.idline2);
        var data2 = this.props.data2;

        function mouseover(d, i){
            line1.data = data1[i];
            line2.data = data2[i];
            line1.elements().transition()
                .attr("d", line1.datafunc(line1.data))
                .style("stroke", this.style.fill);
            line2.elements().transition()
                .attr("d", line2.datafunc(line2.data))
                .style("stroke", this.style.fill);
        }
        pts.elements().on("mouseover", mouseover);
    };
    """

    def __init__(self, points, line1, linedata1, line2, linedata2):
        if isinstance(points, mpl.lines.Line2D):
            suffix = "pts"
        else:
            suffix = None
        self.dict_ = {"type": "linkedview",
                      "idpts": utils.get_id(points, suffix),
                      "idline1": utils.get_id(line1),
                      "data1": linedata1,
                      "idline2": utils.get_id(line2),
                      "data2": linedata2}
Having set up the interactive data visualization tools, we use them to produce an interactive scatter plot:
In [12]:
def remove_low_trajectories(data, inverse_gain):
    """remove time-courses in which number of Q-cells never reaches inverse feedback gain"""
    tj = data.number_Q_cells.max(axis=1) > inverse_gain
    def thin_base(arr, indices):
        return arr[indices, :]
    return Data(thin_base(data.time_points, tj), thin_base(data.number_C_cells, tj), thin_base(data.number_Q_cells, tj))

def extract_scatter(data, inverse_gain):
    """extract data for scatter plot"""
    # get a list of indices of all trajectories
    number_trajectories, number_time_points = data.time_points.shape
    trajectory_indices = list(range(number_trajectories))  # in Python 3, range returns a lazy sequence, not a list
    # for each trajectory, compute the time at which the number of C-cells (first) reaches its maximum, and compute that maximum
    time_indices = data.number_C_cells.argmax(axis=1)
    times_C_max = data.time_points[trajectory_indices, time_indices]
    values_C_max = data.number_C_cells[trajectory_indices, time_indices]
    def first_time_crossing_threshold(number_cells, threshold):
        """for each trajectory, compute the first time
        at which the number of cells crosses a threshold"""
        time_indices = np.argmax(number_cells > threshold, axis=1)
        return data.time_points[trajectory_indices, time_indices]
    times_C_inverseGain = first_time_crossing_threshold(data.number_C_cells, inverse_gain)
    times_Q_inverseGain = first_time_crossing_threshold(data.number_Q_cells, inverse_gain)
    return times_C_max, values_C_max, times_C_inverseGain, times_Q_inverseGain

def process_data(data, inverse_gain):
    """remove time courses and extract scatter data"""
    data = remove_low_trajectories(data, inverse_gain)
    extracted_data = extract_scatter(data, inverse_gain)
    return extracted_data + (data,)

def interactive_scatter_plot(data, inverse_gain, title_string):
    """use mpld3 to interactively visualize correlations"""
    times_C_max, values_C_max, times_C_inverseGain, times_Q_inverseGain, data = process_data(data, inverse_gain)
    # set up the figure
    fig = plt.figure(figsize=(10, 7), facecolor='w')
    ax1 = plt.subplot2grid((2, 2), (0, 0), rowspan=2)
    ax2 = plt.subplot2grid((2, 2), (0, 1))
    ax3 = plt.subplot2grid((2, 2), (1, 1))
    # scatter plot
    points = ax1.scatter(times_C_max, times_Q_inverseGain, s=values_C_max, alpha=0.5)
    ax1.set_xlabel('time at which C is maximal', labelpad=15, fontsize=fontsize)
    ax1.set_ylabel('time at which Q first reaches inverse of feedback gain', labelpad=15, fontsize=fontsize)
    ax1.set_title(title_string, fontsize=fontsize)
    min_time = np.amin([times_C_max.min(), times_Q_inverseGain.min()])
    max_time = np.amax([times_C_max.max(), times_Q_inverseGain.max()])
    ax1.plot([min_time, max_time], [min_time, max_time], color='black')
    ax1.text(0.05, 0.90, 'hover over points to see lines', transform=ax1.transAxes, fontsize=fontsize_tick)
    def line_plot(axis_object, y_data, ylabel_string):
        """set up axis and data needed to generate time courses"""
        # line plot
        t = data.time_points[0, :]  # necessary for mpld3 to update line plots correctly
        lines = axis_object.plot(t, 0*t, '-w', lw=2, alpha=0.5)
        line = lines[0]
        axis_object.set_xlim(data.time_points.min(), data.time_points.max())
        axis_object.set_ylim(0, y_data.max())
        axis_object.set_ylabel(ylabel_string, labelpad=15, fontsize=fontsize)
        # line data
        linedata = np.dstack((data.time_points, y_data)).tolist()
        return line, linedata
    line_2, linedata_2 = line_plot(ax2, y_data=data.number_C_cells, ylabel_string='C (# cells)')
    line_3, linedata_3 = line_plot(ax3, y_data=data.number_Q_cells, ylabel_string='Q (# cells)')
    ax3.set_xlabel('time', labelpad=15, fontsize=fontsize)
    # use mpld3 to connect scatter and line plots interactively
    plugins.connect(fig, LinkedView(points, line_2, linedata_2, line_3, linedata_3))
interactive_scatter_plot(data_short, 1.0/parameterValues_short['negative_gain_Q'], 'short-range high-gain feedback')
Each point in the scatter plot corresponds to a single growth trajectory. C and Q denote the numbers of C- and Q-cells, respectively. The size of each point in the scatter plot (left) reflects the maximum number of C-cells reached during the course of the simulation. When that number is small, trajectories are highly variable, whereas when it is large, trajectories are more uniform, as can be seen by hovering over the points in the scatter plot and viewing the associated time courses on the right. (Use the buttons at the bottom-left of the plot to enable zooming and panning, and to reset the view.)
Notice that most of the scatter about the regression line appears to be explained by small points. This means that noisy trajectories exhibit a poorer correlation between the time courses of C- and Q-cells than do deterministic ones. Could averaging out noise, making all trajectories more deterministic, improve the correlation? One way to average out noise is spatially, by extending the spatial range of negative feedback (and decreasing the gain). When we do this and re-examine the scatter plot (shown below), we do indeed observe a tighter correlation between C- and Q-trajectories.
In [13]:
interactive_scatter_plot(data_long, 1.0/parameterValues_long['negative_gain_Q'], 'long-range low-gain feedback')