To start the course, we propose to investigate the Galton-Watson process. This process was introduced to depict the propagation of some feature (family surname, DNA) through generations, and in particular to estimate its probability of extinction.
A Galton-Watson process can be represented as a (directed) random tree. It is built generation by generation as follows. At generation $0$, there is a single individual, the ancestor of the population, represented by the tree root. Starting from this, each individual from a given generation $ i $ gives birth to a certain number of individual at generation $ i+1 $, represented by its direct successors in the tree.
The unique parameter of the process is the distribution of the number of children of each node. Specifically, we assume that the number of children of a node is independent from the other nodes, drawn according to some given distribution $(p_k)_{k\in \mathbb{N}}$ (s.t. $\sum_{k=0}^{\infty} p_k = 1$). We let $\mu$ denote the average number of children, supposed finite: $$ \mu = \sum_{k=0}^{\infty} k p_k <+\infty. $$
Note that there are multiple ways to explore the tree. For example:
The goal of the practical is to play with the two views to understand Galton-Watson processes.
If you want to deepen your theoretical knowledge of this process, you can read Chapter 1 from the book Epidemics and Rumours in Complex Networks (which is not mandatory).
In [ ]:
%pylab inline
We first assume a very simple children distribution, the bimodal distribution, where a node can only have 0 or 2 children: $p_0=1-\mu/2, p_2=\mu/2, p_k=0$ for $k\notin \{0,2\}$. In this section, we focus on values of the mean $\mu$ which range between $0$ and $2$, and perform an empirical study of the associated Galton-Watson process.
Note: Try to write a flexible code, as parameters and distributions will change.
We first consider the generation by generation exploration. Write a function generation_growth
that returns the values $G_i$ observed during a realization of the process (up to generation imax
).
Run it a few times with different values of the mean $\mu$. Can you comment?
The function random.rand
of numpy
package may be handy.
Code:
In [ ]:
def generation_growth(μ, imax = 20):
g = ones(imax + 1, dtype = int)
# to be completed
return g
Discussion:
In [ ]:
μ = 1.
generation_growth(μ)
We now consider the active node by active node exploration. Write a function active_growth
that returns the values $X_t$ observed during a realization of the process (fix a maximal number tmax
of terminations). Warning: your function should not return negative values.
Run it a few timess with different values of the mean $\mu$. Can you comment?
Code:
In [ ]:
def active_growth(μ, tmax = 20):
x = zeros(tmax + 1, dtype = int)
# to be completed
return x
Discussion:
In [ ]:
μ = 1.
active_growth(μ)
Write two functions estimate_generation
and estimate_active
that estimate $\mathbb{E}(G_i)$ and $\mathbb{E}(X_t)$ by averaging the results over $ n $ runs.
For $\mu = 1/2$ and $\mu = 4/3$, display the results in figures and comment.
Code:
In [ ]:
def estimate_generation(μ, imax = 10, n = 1000):
g = zeros(imax+1)
# to be completed
return g / n
In [ ]:
def estimate_active(μ, tmax = 100, n = 1000):
x = np.zeros(tmax+1)
# to be completed
return x/n
Discussion:
In [ ]:
μ = 4/3
In [ ]:
imax = 10
figure()
semilogy(arange(imax+1), estimate_generation(μ, imax), label='Estimate of $\mathbb{E}(G_i)$')
xlabel('Generation')
ylabel('Number of individuals')
legend()
show()
In [ ]:
tmax = 100
figure()
plot(arange(tmax+1), estimate_active(μ, tmax), label='Estimate of $\mathbb{E}(X_t)$')
xlabel('Step')
ylabel('Number of individuals')
legend(loc=2)
show()
At each realization, you may face extinction in the sense that none of the nodes of this generation has children. Write a function estimate_extinction
that uses the function active_growth
of Question 2 to estimate the probability of extinction $ P_{ext} $. Run it on a few values of $\mu$.
Code:
Evaluate $\mathbb{E}(G_i)$ conditioned on the run has lead to extinction or not. Discuss the results.
Answer:
We focus now on the probability $P_{ext}$ of extinction, which is the probability that the total population is finite. We will observe experimentally the following phase transition:
The proof of this result is given in Epidemics and Rumours in Complex Networks, along with more details (in particular, the behavior of the process when $\mu = 1$).
Give an equality that relates $P_{ext}$ and $(p_k)_{k\in \mathbb{N}}$.
Answer:
We consider the bimodal distribution of Exercice 1.
Admitting that $P_{ext}$ is the smallest solution of the previous equation in the interval $[0,1]$, relate $P_{ext}$ and $\mu$.
Write a (very) small function pext_bim_exact
that computes $P_{ext}$ for a list of $\mu$'s.
Answer:
Adapt the function from Question 4 of Exercice 1 to estimate $P_{ext}$ by simulation for multiple values of $\mu$. Suggested values: t=10
, t=100
, t=1000
, n=1000
, $\mu$ = np.arange(0,2.05,.05)
. Display the results you obtain against the theoretical value obtained in the previous question. Discuss the difference.
Code:
Discussion:
Evaluating the results by simulation has an inherent lack of accuracy. Try to compute exactly the probability that all nodes are dead after $t$ terminations. Display the results and compare.
Hint: for $t <\infty$, write a function pop_after_t
that computes the distribution of the number of active nodes after $t$ terminations as a function of $ p = (p_k)_{k \in \mathbb{N}}$. The function convolve
from numpy
package may be handy.
Answer:
Code:
We now consider a geometric distribution $p_k=(1-a)a^k$, or each $k \in \mathbb{N}$, for some $0\leq a<1$. Relate $a$ and $\mu$ and study the extinction probability like you did for the bimodal case. In particular, give the equation $ P_{ext} $ should verify. Compute $ P_{ext} $ as a function of $ \mu $ for $ \mu\in [0, 2] $. For the non trivial cases, you can use an iterative computation of the solution. To validate the result, you should for instance:
pop_after_t
to compute $ P_{ext} $ after t=1000
terminations for a truncated geometric distribution.random.geometric
from numpy
package may be handy.Display the results.
Code:
We finally consider a Poisson distribution of parameter $\mu$: $$ p_k = e^{-\mu}\frac{\mu^k}{k!}, \quad \forall k \in \mathbb{N}. $$ Study the extinction probability. In particular, give the equation $ P_{ext} $ should verify. Compute $ P_{ext} $ as a function of $ \mu $ for $ \mu\in [0, 2] $. For the non trivial cases, you can use an iterative computation of the solution. To validate the result:
pop_after_t
to compute $ P_{ext} $ after t=1000
terminations for a truncated Poisson distribution.random.poisson
from numpy
package may be handy.Display the results.
Code:
Plot the three theoretical $ P_{ext} $ you obtained on the same figure, for the same mean value $\mu$. Try to informally discuss the differences.
Answer:
Redo the exercices of this Notebook for a Galton-Watson process that starts with two nodes (you can re-use results from above).