In [ ]:
%pylab inline
If you want to deepen your theoretical knowledge of the content of this class, you can read (this is not mandatory)
You can skip this section if you are already familiar with graph theory.
A graph is a couple $G = (V, E)$ where $V$ is a set of vertices (also called nodes) and $E$ is a set of edges. Typically, we would take $V = \{1,\ldots,N\}$ where $N$ is the total number of vertices in the graph. The graph is said directed if the edges have an orientation. In this case, $E$ is a set of couples $(i,j)$, where $j \in V$ is the head of the oriented edge (also called an arrow) and $i \in V$ is its tail. The graph is said undirected the edges don't have an orientation. Then $E$ is a set of unordered pairs $\{i, j\}$, where $i \in V$ and $j \in V$ need not be distinct. There are several ways to describe a graph, for instance:
In the sequel, we will focus on undirected graphs. Then the degree of a node $i$ is the total number of edges that have $i$ as one of their endpoints. Observe that the sum of the degrees is twice the number of edges (because each edge has two end points). A path is an ordered sequence of nodes such that that two consecutive nodes are connected by an edge. A connected component is a maximal set of nodes that are two-by-two connected by a path. The graph is said connected if it contains a single connected component, which means that there is a path between any two nodes. A cycle is a path that starts and finishes at the same node (as a special case, an edge that has two identical extremities is called a loop).
The Erdös-Rényi graph $\mathbf{G(n,p)}$ is an undirected random graph without loop. Unlike the Galton-Watson process, the population contains a finite number $\mathbf{n}$ of nodes and the randomness only lies in the construction of the edges. Specifically, for each pair of (distinct) nodes $i$ and $j$, there is an edge between $i$ and $j$ with probability $\mathbf{p}$, independently of the existence of the other edges. Thoughout this practical, we will study the behavior of the Erdös-Rényi graph $G(n,p)$ as $n$ goes to infinity.
Write a function erdos_renyi(n, p)
that returns the adjacency list of a realization of the Erdös-Rényi graph $G(n,p)$.
For instance, if you build a graph on $n = 5$ nodes numbered $0$, $1$, $2$, $3$, and $4$, and with edges between the nodes $0$ and $3$, $0$ and $4$, $1$ and $4$, $2$ and $3$, and $2$ and $4$, the output should look like [[3, 4], [4], [3, 4], [0, 2], [0, 1, 2]]
.
Remarks:
numpy
functions rather than for
loops in order to boost the performance. You can reuse some of the functions of the first practical. Depending on your implementation, the following functions might also be useful: triu
, transpose
, where
.Write a function size_components(L)
that takes as input the adjacency list L
of an undirected graph and returns a list of the sizes (in number of nodes) of its connected components.
Remark: For this question and the next one, it is best to verify the output of your function on small graphs, such as instances of the Erdös-Rényi graph with $n = 5$. You will use these functions a lot in the rest of the practical.
We now describe the Reed-Frost epidemic, which is a simple model to analyze the propagation of an epidemic (disease, rumor...) in a population.
We consider a population of $\mathbf{n}$ individuals. The time is slotted and, at the end of each time slot, each individual can be either susceptible, infected, or removed. At the beginning of the first time slot, a single individual is infected and the others are susceptible. Then the epidemic evolves over time as follows. Any individual that is infected at the beginning of one time slot is contagious during this time slot and (definitely) removed at the end. During this unique time slot when they are contagious, an individual can infect any other susceptible individual independently at random with some infection probability $\mathbf{p}$. Therefore, the individuals who were infected at the beginning of some time slot are removed at the end of this time slot, those who were infected during this time slot become infected, and the others remain susceptible.
We will look at the behavior of the epidemic as the population size $n$ grows. The exercise is divided into two parts that focus on two (related but different) questions:
The Reed-Frost epidemic is related to the Erdös-Rényi random graph as follows. If we consider a Reed-Frost epidemic that starts at some individual $i$ and has an infection probability $p$, the individuals who are infected at the end of some time slot $t \geq 1$ correspond to the nodes that are at distance $t$ of node $i$ in the Erdös-Rényi random graph $G(n,p)$.
Using this relation with Erdös-Rényi graphs, adapt the functions of Exercise 1 to compute the mean number of individuals that are eventually removed in the Reed-Frost epidemic, starting from an arbitrary individual of the population (say, the first one). The inputs of this new function infection_size(n, p, nb_trials)
are the number n
of individuals in the population, the infection probability p
, and the number nb_trials
of independent realizations to consider.
For a given population size n
, plot the mean number of individuals that are eventually infected by a Reed-Frost epidemic as a function of the infection probability p
.
We advise you to consider n = 500
with nb_trials = 200
trials and to focus on small values of p
.
Discuss the results. What do you observe?
Give the probability that a given node in the Erdös-Rényi graph $G(n,p)$ is isolated, in the sense that this is not connected to any other nodes in the graph. Use this result to compute the mean number of isolated nodes in the graph. Look at the limit of this value as $n \to +\infty$, when $p = \lambda \frac{\log(n)}n$ for some $\lambda > 0$. What does the result suggest? Is it consistent with the observation of Question 8?
We finally consider a heterogeneous random graph $\mathbf{G(n_1, p_1, n_2, p_2, p)}$ defined as follows:
This heterogeneous random graph is used to describe the propagation of an extension of the Reed-Frost epidemic with a population of size $n = n_1 + n_2$, where the infection probabilities depend on the types of the individuals. The objective of this exercise is to generalize the result of Exercise 2.1 about the mean infection size.
Assuming that $p_1 = p_2 = 0$, use the same approach as in Question 6 to intuit the critical value of $p$ that guarantees that the infection propagates to a non-negligible fraction of the population as $n_1$ and $n_2$ go to infinity. The objective is not to prove anything, but just to reason about the model.