INF 674 S1: Galton-Watson Process

Céline Comte & Fabien Mathieu

2016-2017

To start the course, we propose to investigate the Galton-Watson process. If you want to deepen your theoretical knowledge of Galton-Watson processes, you can read Chapter 1 from the book Epidemics and Rumours in Complex Networks (this is not mandatory).

The goal of the process is to depict the propagation of some feature (family surname, DNA) through generations, and in particular to estimate the probability of extinction.

The process builds a tree as follows: We start with one node (the $0th$ generation, patient $ 0 $), the tree root. Each node from a given generation $ i $ will give birth to a certain number of nodes from generation $ i+1 $. The number of children nodes $ k $ is drawn i.i.d. according to some given distribution $(p_k)_{k\in \mathbb{N}}$ s.t. $\sum_{k=0}^{\infty} p_k = 1$. We call $\mu$ the average number of children, supposed finite: $\mu = \sum_{k=0}^{\infty} k p_k <+\infty$.

Note that there is multiple ways to build evolution of the tree. For example:

  • Generation by generation: we then can call $ G_i $ the random variable that express the number of nodes for generation $ i $.
  • Active node by active node: you keep in mind the number of nodes for which you have not decided the number of offsprings. As long as it is not empty, you can perform a termination (also called activation in previous editions of this course): remove one active node and add new ones according to $(p_k)_{k\in \mathbb{N}}$. We call $ X_t $ the number of active nodes after $ t $ terminations.

The goal of the present is to play with the two views to understand GW.

1. Bimodal distribution

We first assume a very simple children distribution, the bimodal distribution, where a node can only have 0 or 2 children: $p_0=1-\mu/2, p_2=\mu/2, p_k=0$ for $k\notin \{0,2\}$.

For this Section, we take $\mu = \frac{4}{3}$, and perform an empirical study of the associated GW process. Note: Try to write a flexible code, as parameters and distributions will change.

Question 1

Write a function that returns the values $G_i$ observed during one run (fix a maximal number of generation). Try it a few times. Can you comment? np.random.choice may be handy.


In [ ]:
%pylab inline

Answer:

Question 2

Write a function that returns the values $X_t$ observed during one run (fix a maximal number of terminations). Warning: avoid your function to return negative values. Let's try it a few times. Can you comment?

Answer:

Question 3

Write two functions that estimate $\mathbb{E}(G_i)$ and $\mathbb{E}(X_t)$ by averaging over $ n $ runs. Display the results in figures.

Answer:

Question 4

At the end of a given run (last generation / activation), you may face extinction. Write a function that uses $ X_t $ to estimate the probability of extinction $ P_{ext} $.

Answer:

Question 5 (Bonus)

Display $\mathbb{E}(G_i)$ and $\mathbb{E}(X_t)$ conditioned on the run has lead to extinction or not. Discuss the results.

Answer:

2. Extinction

We focus now on the probability of extinction $P_{ext}$, which is the probability that eventually no alive node remains. We will admit that if $\mu > 1$, $P_{ext} < 1$.

Question 1

Give an equation that relates $P_{ext}$ and $(p_k)_{k\in \mathbb{N}}$.

Answer:

Question 2

Relate $P_{ext}$ and $\mu$. Write a (very) small function that gives $P_{ext}$ for a list of $\mu$'s.

Answer:

Question 3

Adapt the function from previous Question 4 to estimate $P_{ext}$ for multiple values of $\mu$. Suggested values: $t=10, t=100, t=1000$, $n=1000$, $\mu= \texttt{np.arange(0,2.05,.05)}$. Display the results against the value of previous question.

Answer:

Question 4

Using $n$ runs has an inherent lack of accuracy. Try to compute exactly the probability that all nodes are dead after $t$ activations. Display the results and compare. Hint: for $t <\infty$, write a function pop_after_t that computes as a function of $ p $ the active population distribution after $t$ elementary steps. np.convolve may be handy.

Answer:

3. Other Distributions

Question 1

We now consider a geometric distribution $p_k=(1-a)a^k$ for some $0\leq a<1$. Relate $a$ and $\mu$ and study the extinction like you did for the bimodal case. For the approach with $n$ trials, np.random.geometric may be handy. For the approach that computes the active distribution, as the geometric distribution has infinite support, you may want to truncate it.

Answer:

Question 2

We now consider a Poisson distribution of parameter $\mu$ ($ p_k = e^{-\mu}\frac{\mu^k}{k!} $). Study the extinction like you did for the \emph{bimodal} case.

Remind the equation $ P_{ext} $ should verify. Compute $ P_{ext} $ as a function of $ \mu $ for $ \mu\in [0, 2] $. For the non trivial cases, one can use an iterative computation of the solution. To validate the result, try to compute $ P_{ext} $ after $ t=1000 $ steps using: pop_after_t and a truncated Poisson distribution; multiple simulations using a Poisson generator. Display and comment.

Answer:

Question 3 (Bonus)

Plot in the same figure the three theoretical $ P_{ext} $ you obtained. Try to informally discuss the differences.

Answer:

Question 4

Start other this NoteBook for a Galton-Watson process that starts with two nodes (you can re-use results from above).