ACN 903 S2: Erdös-Rényi Graph

Céline Comte & Fabien Mathieu


In [ ]:
%pylab inline

If you want to deepen your theoretical knowledge of the content of this class, you can read (this is not mandatory)

Reminders: graph theory definitions

You can skip this section if you are already familiar with graph theory.

A graph is a couple $G = (V, E)$ where $V$ is a set of vertices (also called nodes) and $E$ is a set of edges. Typically, we would take $V = \{1,\ldots,N\}$ where $N$ is the total number of vertices in the graph. The graph is said directed if the edges have an orientation. In this case, $E$ is a set of couples $(i,j)$, where $j \in V$ is the head of the oriented edge (also called an arrow) and $i \in V$ is its tail. The graph is said undirected the edges don't have an orientation. Then $E$ is a set of unordered pairs $\{i, j\}$, where $i \in V$ and $j \in V$ need not be distinct. There are several ways to describe a graph, for instance:

  • Adjacency matrix: An $N \times N$ matrix $A = (A_{i,j})_{i, j = 1,\ldots,N}$, where $A_{i,j} = 1$ if there is an edge between nodes $i$ and $j$ (or from node $i$ to node $j$ if the graph is directed) and $A_{i,j} = 0$ otherwise. If the graph is undirected, the matrix $A$ is symmetrical.
  • Adjacency list: The list of the lists of neighbors of each node.

In the sequel, we will focus on undirected graphs. Then the degree of a node $i$ is the total number of edges that have $i$ as one of their endpoints. Observe that the sum of the degrees is twice the number of edges (because each edge has two end points). A path is an ordered sequence of nodes such that that two consecutive nodes are connected by an edge. A connected component is a maximal set of nodes that are two-by-two connected by a path. The graph is said connected if it contains a single connected component, which means that there is a path between any two nodes. A cycle is a path that starts and finishes at the same node (as a special case, an edge that has two identical extremities is called a loop).

1. Erdös-Rényi random graph

The Erdös-Rényi graph $\mathbf{G(n,p)}$ is an undirected random graph without loop. Unlike the Galton-Watson process, the population contains a finite number $\mathbf{n}$ of nodes and the randomness only lies in the construction of the edges. Specifically, for each pair of (distinct) nodes $i$ and $j$, there is an edge between $i$ and $j$ with probability $\mathbf{p}$, independently of the existence of the other edges. Thoughout this practical, we will study the behavior of the Erdös-Rényi graph $G(n,p)$ as $n$ goes to infinity.

Question 1

Write a function erdos_renyi(n, p) that returns the adjacency list of a realization of the Erdös-Rényi graph $G(n,p)$. For instance, if you build a graph on $n = 5$ nodes numbered $0$, $1$, $2$, $3$, and $4$, and with edges between the nodes $0$ and $3$, $0$ and $4$, $1$ and $4$, $2$ and $3$, and $2$ and $4$, the output should look like [[3, 4], [4], [3, 4], [0, 2], [0, 1, 2]].

Remarks:

  1. There are many different ways of implementing this function. Remember that you want an undirected graph without loop. In particular, if $j$ is in the adjacency list of $i$, then $i$ should also be in the adjacency list of $j$.
  2. Since this function will be used over and over again throughout the practical, we recommend that your final implementation relies on numpy functions rather than for loops in order to boost the performance. You can reuse some of the functions of the first practical. Depending on your implementation, the following functions might also be useful: triu, transpose, where.

 Question 2

Give the mean number of edges in the graph $G(n,p)$. Verify that, for n = 1000, the mean number of edges in the instance returned by the function erdos_renyi of Question 1 is close to this value.

 Question 3

Write a function size_components(L) that takes as input the adjacency list L of an undirected graph and returns a list of the sizes (in number of nodes) of its connected components.

Remark: For this question and the next one, it is best to verify the output of your function on small graphs, such as instances of the Erdös-Rényi graph with $n = 5$. You will use these functions a lot in the rest of the practical.

2. Reed-Frost Epidemic

We now describe the Reed-Frost epidemic, which is a simple model to analyze the propagation of an epidemic (disease, rumor...) in a population.

We consider a population of $\mathbf{n}$ individuals. The time is slotted and, at the end of each time slot, each individual can be either susceptible, infected, or removed. At the beginning of the first time slot, a single individual is infected and the others are susceptible. Then the epidemic evolves over time as follows. Any individual that is infected at the beginning of one time slot is contagious during this time slot and (definitely) removed at the end. During this unique time slot when they are contagious, an individual can infect any other susceptible individual independently at random with some infection probability $\mathbf{p}$. Therefore, the individuals who were infected at the beginning of some time slot are removed at the end of this time slot, those who were infected during this time slot become infected, and the others remain susceptible.

We will look at the behavior of the epidemic as the population size $n$ grows. The exercise is divided into two parts that focus on two (related but different) questions:

  • Part 2.1: What is the mean size of the infection, starting from an arbitrary individual of the population? In particular, is there a minimum value of the infection probability $p$ that guarantees that the infection propagates to a non-negligible fraction of the population?
  • Part 2.2: Is there a minimum value of the infection probability $p$ that guarantees that the infection propagates to all $n$ individuals?

2.1 Mean size of the infection

Question 4

The Reed-Frost epidemic is related to the Erdös-Rényi random graph as follows. If we consider a Reed-Frost epidemic that starts at some individual $i$ and has an infection probability $p$, the individuals who are infected at the end of some time slot $t \geq 1$ correspond to the nodes that are at distance $t$ of node $i$ in the Erdös-Rényi random graph $G(n,p)$.

Using this relation with Erdös-Rényi graphs, adapt the functions of Exercise 1 to compute the mean number of individuals that are eventually removed in the Reed-Frost epidemic, starting from an arbitrary individual of the population (say, the first one). The inputs of this new function infection_size(n, p, nb_trials) are the number n of individuals in the population, the infection probability p, and the number nb_trials of independent realizations to consider.

 Question 5

For a given population size n, plot the mean number of individuals that are eventually infected by a Reed-Frost epidemic as a function of the infection probability p. We advise you to consider n = 500 with nb_trials = 200 trials and to focus on small values of p. Discuss the results. What do you observe?

Question 6

At this point, can you make a parallel with the phase transition observed for the Galton-Watson process? Can you explain it?

Hint: Compare the generation by generation traversal of the Galton-Watson process with the propagation of the Reed-Frost epidemic.

2.2 Probability of a total infection

Question 7

Write a function infection_proba(n, p, nb_trials) that estimates the probability that the Reed-Frost epidemic with infection probability p propagates to all n individuals. Each value should be obtained by averaging nb_trials independent realizations of the infection.

Question 8

Plot the probability that the Reed-Frost epidemic propagates to all n individuals as a function of the infection probability p, for the same values of n and nb_trials as before. Discuss the results. Do you observe the same transition as in Question 2?

Question 9

Give the probability that a given node in the Erdös-Rényi graph $G(n,p)$ is isolated, in the sense that this is not connected to any other nodes in the graph. Use this result to compute the mean number of isolated nodes in the graph. Look at the limit of this value as $n \to +\infty$, when $p = \lambda \frac{\log(n)}n$ for some $\lambda > 0$. What does the result suggest? Is it consistent with the observation of Question 8?

3. The heterogeneous Erdös-Rényi graph, a.k.a. the stochastic block model (optional)

We finally consider a heterogeneous random graph $\mathbf{G(n_1, p_1, n_2, p_2, p)}$ defined as follows:

  • There are $n_1$ nodes of type 1 and $n_2$ nodes of type 2;
  • Two distinct nodes of type 1 are connected by an edge with probability $p_1$;
  • Two distinct nodes of type 2 are connected by an edge with probability $p_2$;
  • A node of type 1 and a node of type 2 are connected by an edge with probability $p$.

This heterogeneous random graph is used to describe the propagation of an extension of the Reed-Frost epidemic with a population of size $n = n_1 + n_2$, where the infection probabilities depend on the types of the individuals. The objective of this exercise is to generalize the result of Exercise 2.1 about the mean infection size.

Question 10

Write a function heterogeneous_erdos_renyi(n1, p1, n2, p2, p) that returns the adjacency list of a realization of the heterogeneous Erdös-Rényi graph $G(n_1, p_1, n_2, p_2, p)$.

Question 11

Assuming that $p_1 = p_2 = 0$, use the same approach as in Question 6 to intuit the critical value of $p$ that guarantees that the infection propagates to a non-negligible fraction of the population as $n_1$ and $n_2$ go to infinity. The objective is not to prove anything, but just to reason about the model.

Question 12

Verify your guess by simulation. You can either reuse the function size_components of Questions 1.2 or write a new function.

Question 13

Generalize your answer to arbitrary values of $p_1$ and $p_2$.