Session 2 Primer

Things to do until next week (29.April 2015)

Python

  • install some python stack
    • python, ipython, numpy, scipy, matplotlib, seaborn
    • e.g. anaconda
  • get familiar with python
    • basic synthax
    • numpy for array operations
    • scipy for some stats
    • some plotting (seaborn or matplotlib)

Excercise

  • read next weeks' paper :
    • Till, J. E., McCulloch, E. a. & Siminovitch, L. A stochastic model of stem cell proliferation, based on the growth of spleen colony-forming cells. Proc. Natl. Acad. Sci. U. S. A. 51, 29–36 (1964).
  • get this notebook
  • make sure the notebook works for you (dependencies on some libraries)
  • understand how simulateTree(p0, nGens) works and use it to simulate 500 trees of 10 generations each (p0=0.4)
  • plot the distribution of clonesize over time (violin plot e.g.):

Till/McCulloch model simulation


In [1]:
# ensure that plots are shown inline
%matplotlib inline
import numpy as np # <- efficient vector/matrix operations (similar to MATLAB)

# the next ones are not required here, but might become useful later on, check if they're installed
import matplotlib as plt # <- basic plotting
import seaborn as sns    # <- fancy plotting
import pandas as pd      # <- a powerful data analysis and manipulation library for Python
import sklearn           # <- machine learning libray
import scipy             # <- scientific computing in general
import sympy             # <- symbolic calculations

'---------------------------------------------'

def simulateTree(p0, nGens):
    """
    simulates a single tree from the Till/McCulloch model
    
    inputs:
        p0: probability that a single cell undergoes terminal differentiation (i.e. no more division)
        nGens: number of generations to simulate
    
    returns: 
        a list (one element per generation) of single cells present at that generation.
        a single element is just an array of cells present at that time 
        (zeros for stem cells, 1s for differentiated cells).
    """
    
    # cell state is either 0 (stem cell) or 1 (differentiated),
    # which is the only thing we keep track of here
    
    theGenerations = list()
    theGenerations.append(np.array(0))
    
    for g in range(nGens):
    
        lastGen = theGenerations[-1]
        
        # for each of the last generation, roll a dice whether it terminally diffs
        newState = roll_the_dice(lastGen, p0)
        
        #all the zeros divide, the 1's just stay
        n0 = sum(newState==0) # beware: this is pythons interal sum(), not the one from numpy (which is loads fasters)
        n1 = sum(newState==1) # however, speed doesnt really matter here
        nextGen = np.concatenate([np.repeat(0, 2*n0), np.repeat(1,n1)])

        theGenerations.append(nextGen)

    return theGenerations


def roll_the_dice(cellstate_array, p0):
    """
    decide if a cell goes from 0->1 (wit probability p0)
    does that for an entire vector of zeros and ones in paralell
    """
    # helper function so that we can index into it via generation
    # makes sure that as soon as cell_state==1 it wont change anymore
    tmpP = np.array([p0, 1])     
    p = tmpP[cellstate_array]
    
    r = np.random.rand(cellstate_array.size)
    
    newGeneration = r<p
    
    return newGeneration.astype(int)

Example call


In [2]:
# Example call
aSingleTree = simulateTree(p0=0.1, nGens=5)

print aSingleTree # just a list of numpy.arrays


[array(0), array([0, 0]), array([0, 0, 0, 0]), array([0, 0, 0, 0, 0, 0, 1]), array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]), array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1])]

TODO

  • simulate 500 trees (10 generations, p0=0.4)
  • get the clone size distribution over time. clone size is just the number of cells in a given generation
  • plot it, e.g. simple boxplots, violin plots(seaborn)

In [ ]: