In [ ]:
%pylab inline
For the first session, we propose to investigate the Galton-Watson process. This process was first introduced to analyse the propagation of a feature (family surname, DNA) through generations, and in particular to estimate its probability of extinction.
A Galton-Watson process can be depicted as a (directed) random tree. The nodes represent individuals and the arrows family relations. The tree is built generation by generation as follows. At generation $0$, there is a single individual, the population ancestor, represented by the tree root. Then, each individual from a given generation $ i $ gives birth to a random number of individuals at generation $ i+1 $, represented by its direct successors in the tree.
The unique parameter of the process is the distribution of the number of children per individual. Specifically, we assume that the numbers of children are independent between individuals and drawn from the same distribution $(p_k)_{k\in \mathbb{N}}$ (with $\sum_{k=0}^{\infty} p_k = 1$). We let $\mu$ denote the mean number of children per individual, which we assume to be finite: $$ \mu = \sum_{k=0}^{\infty} k p_k <+\infty. $$
Note that, in practice, there are multiple ways to explore the tree. We will consider two of them in the pratical:
The goal of the practical is to play with these two viewpoints in order to better understand the Galton-Watson process. We will focus on two metrics: the mean population size $\chi$ and the extinction probability $P_{ext}$ (i.e., the probability that the population is finite).
If you want to deepen your theoretical knowledge of this process, you can read Chapter 1 from the book Epidemics and Rumours in Complex Networks (which is not mandatory).
Throughout the practical, we will first gain insight into the Galton-Watson process by running simulations with a specific distribution of the number of children, and then generalize our observations to any distribution.
For the simulations, we will focus on the bimodal distribution, in which a node can only have 0 or 2 children: $p_0 = 1-\frac{\mu}{2}$, $p_2 = \frac{\mu}{2}$, and $p_k = 0$ for $k \notin \{0,2\}$. Note that the mean $\mu$ can range from $0$ to $2$. The objective of this first exercice is to build a realization of the Galton-Watson process with the bimodal distribution, using either the generation by generation traversal or the active node by active node traversal.
Note: Try to write a flexible code, as the parameters and distributions will change later.
We consider the generation by generation traversal.
First, complete the function generation_growth(μ, imax)
that returns the values of $G_i$ observed during a realization of the process, up to generation imax
. The function random.rand
from numpy
package may be handy.
Second, complete the function display_generation_growth(μ, imax = 20, n = 20)
using the function plot
from matplotlib
package. You should plot the number of nodes as a function of the generation. The additional argument n
gives the number of realizations of the process to plot on the same graph.
In [ ]:
def generation_growth(μ, imax):
g = ones(imax + 1, dtype = int)
# to be completed
return g
In [ ]:
def display_generation_growth(μ, imax = 20, n = 20):
figure()
for i in range(n):
# to be completed
# just to make the plot prettier
xlim([0, imax])
axes = gca(); yl = axes.get_ylim(); ylim([0, yl[1]])
xlabel('Generation'); ylabel('Number of nodes')
title("μ = %.1f" % μ)
show()
In [ ]:
for μ in arange(.4, 1.5, .2):
display_generation_growth(μ)
We now consider the active node by active node traversal.
First, complete the function active_growth(μ, tmax)
that returns the values of $X_t$ observed during a realization of the process, up to tmax
terminations. Look out! Your function shouldn't return negative values.
Second, complete the function display_active_growth(μ, tmax = 50, n = 10)
using the function plot
from matplotlib
package. You should plot the number of active nodes as a function of the termination step.
In [ ]:
def active_growth(μ, tmax):
x = zeros(tmax + 1, dtype = int)
# to be completed
return x
In [ ]:
def display_active_growth(μ, tmax = 50, n = 10):
figure()
for i in range(n):
# to be completed
xlim([0, tmax])
axes = gca(); yl = axes.get_ylim(); ylim([0, yl[1]])
xlabel('Termination step'); ylabel('Number of active nodes')
title("μ = %.1f" % μ)
show()
In [ ]:
for μ in arange(.4, 1.5, .2):
display_active_growth(μ)
The first quantity we evaluate is the mean population size. The results obtained by simulation under the bimodal distribution will prove representative of what we obtain under any distribution.
Complete the function chi_bim_generation(μ, imax, n = 1000)
that estimates $\mathbb{E}(G_i)$ for $i$ up to imax
by averaging the results of generation_growth
over n
independent runs.
For $\mu = \frac12$ and $\mu = \frac43$, plot the results and comment them. We advise you to start with $\mu = \frac43$ and use a semi-log plot.
In [ ]:
def chi_bim_generation(μ, imax, n = 1000):
# to be completed
Your turn!
We now focus on the extinction probability $P_{ext}$, that is, the probability that the total population is finite.
Complete the function pext_bim_trials(range_μ, tmax, n)
that estimates the probability that the population extincts after no more than tmax
termination steps by averaging the results of active_growth
over n
independent runs, for several values of $\mu$.
Plot the results and comment them. Suggested values: range_μ = (np.)linspace(2, 0, 40, endpoint = False)
, tmax = 10, 100, 1000
, n = 1000
.
In [ ]:
def pext_bim_trials(range_μ, tmax, n):
# to be completed
In the special case of the bimodal distribution, use this equation to write $P_{ext}$ as a function of $\mu$.
Write a (very) small function pext_bim_exact(range_μ)
that computes $P_{ext}$ for a list of values of $\mu$ and display the results against the empirical values obtained in Question 3.2.
Evaluating the results by simulation has an inherent lack of accuracy. Write a function pext_bim_distrib(range_μ, tmax)
that computes exactly the probability of an extinction after no more than tmax
terminations. Display the results and compare them with those obtained by simulation.
Hint: write a function pop_after_t(p, tmax)
that computes the distribution of the number of active nodes after tmax
termination steps as a function of $ p = (p_k)_{k \in \mathbb{N}}$. The function convolve
from numpy
package may be handy.
We again focus on the probability $P_{ext}$ of an extinction. For the bimodal distribution, we observed the following phase transition:
This result is valid for any distribution of the number of children per node, as long as its mean $\mu$ is finite. The proof will be given orally during the practical. You can also find it in Epidemics and Rumours in Complex Networks, along with more details (in particular, what happens when $\mu = 1$). We now consider two other distributions of the number of children per node and look at the extinction probability when $\mu > 1$.
We now consider a geometric distribution $p_k=(1-a)a^k$, $k \in \mathbb{N}$, for some $0\leq a<1$. Relate $a$ and $\mu$ and study the extinction probability as you did for the bimodal case. In particular,
random.geometric
from numpy
package may be handy.pop_after_t
of Question 3.5 to compute the probability of an extinction after tmax = 1000
termination steps for a truncated geometric distribution.We finally consider a Poisson distribution of parameter $\mu$: $$ p_k = e^{-\mu}\frac{\mu^k}{k!}, \quad \forall k \in \mathbb{N}. $$ Study the extinction probability. In particular,
random.poisson
from numpy
package may be handy.pop_after_t
of Question 3.5 to compute the probability of an extinction after t = 1000
termination steps for a truncated Poisson distribution.