ACN 903 S6: Markov Chains

Céline Comte & Fabien Mathieu

The objective of this practical is to make you see a Markov chain in action. In particular, you will observe what happens if the conditions of the Perron-Frobenius theorem are not met.


In [ ]:
%pylab inline

from matplotlib import animation, rc
from IPython.display import HTML

# Uncomment the following line if you want to pimp your animation a little bit
# with xkcd-like display (will cost extra computation time)
# xkcd()

Things to know (matrix-oriented theoretical take-away)

Right stochastic matrix. Let $A = (a_{i,j})$ be an $n \times n$ matrix. We say that $A$ is a right stochastic matrix if:

  • The coefficients of $A$ are non-negative: $\forall i, j, a_{i,j}\geq 0$.
  • Each row sums to 1: $\forall i, \sum_j a_{i,j} = 1$.
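These two conditions are easy to check numerically. Below is a minimal sketch, using a small hypothetical $3 \times 3$ matrix chosen for illustration only (it is not one of the matrices used later in this practical):

```python
import numpy as np

def is_right_stochastic(A, tol=1e-12):
    """Return True if A is non-negative and each of its rows sums to 1."""
    A = np.asarray(A)
    return bool(np.all(A >= -tol) and np.allclose(A.sum(axis=1), 1, atol=tol))

# A hypothetical 3x3 example, for illustration only
A = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.0, 0.9],
              [0.3, 0.3, 0.4]])
print(is_right_stochastic(A))        # True
print(is_right_stochastic(A - 0.1))  # False: rows no longer sum to 1
```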

Homogeneous Markov chain. An $n \times n$ right stochastic matrix $A = (a_{i,j})$ defines a Markov chain that describes the evolution of a distribution over $n$ states as follows: if the chain is in state $i$ at step $k$ of the process, the probability of being in state $j$ at step $k+1$ is $a_{i,j}$.

With this matrix notation, the evolution of the Markov chain is easy to study: if $P_k$ is the probability distribution at step $k$ (in particular, $P_k\geq 0$ and $\sum_{i}P_k[i]=1$), then we have $$P_{k+1}=P_k A, \quad \forall k \in \mathbb{N}.$$
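To make this evolution concrete, here is a minimal sketch iterating $P_{k+1} = P_k A$ on a small hypothetical matrix (any right stochastic matrix would do; this one is only an illustration):

```python
import numpy as np

# A hypothetical 3-state right stochastic matrix, for illustration only
A = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.0, 0.9],
              [0.3, 0.3, 0.4]])

P = np.array([1.0, 0.0, 0.0])  # the chain starts in state 0
for k in range(5):
    P = np.dot(P, A)           # P_{k+1} = P_k A
    print(k + 1, P, P.sum())   # P remains a probability distribution
```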

Irreducibility. Let $A$ be a non-negative matrix ($\forall i,j, a_{i,j}\geq 0$). Let $G=(V,E)$ be the oriented graph associated with $A$: $(i,j)\in E$ if, and only if, $A[i,j]>0$. The following propositions are equivalent:

  • $A$ is irreducible (that is, the Markov chain associated to the normalized version of $A$ is irreducible);
  • $G$ is strongly connected: $\forall (i,j)\in V^2$, there exists an oriented path in $G$ from $i$ to $j$;
  • For each $i, j = 1,\ldots,n$, there exists $k > 0$ such that $A^k[i,j]>0$.

Intuitively, irreducibility means that, starting from any state, the Markov chain can reach any other state with positive (i.e. $>0$) probability after some number of steps.
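Irreducibility can be tested numerically from the third characterization: a standard trick is to check that $(I + A)^{n-1}$ has only positive entries, which holds if and only if the graph of $A$ is strongly connected. A minimal sketch, on two small matrices chosen for illustration:

```python
import numpy as np

def is_irreducible(A):
    """True iff the graph of A is strongly connected, i.e. (I + [A > 0])^(n-1) > 0 entrywise."""
    n = A.shape[0]
    M = np.linalg.matrix_power(np.eye(n) + (A > 0), n - 1)
    return bool(np.all(M > 0))

cycle = np.roll(np.eye(4), 1, axis=1)  # 0 -> 1 -> 2 -> 3 -> 0: strongly connected
print(is_irreducible(cycle))           # True

upper = np.triu(np.ones((4, 4)))       # edges i -> j only for j >= i: no path from 3 back to 0
print(is_irreducible(upper))           # False
```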

Aperiodicity (from Wikipedia). The period of a state $i$ is the greatest common divisor of the lengths of the cycles that state $i$ belongs to. If we reformulate this definition in terms of the transition matrix, the period of state $i$ is the greatest common divisor of all natural numbers $k$ such that $A^k[i,i] > 0$. When the Markov chain is irreducible, the period of every state is the same and is called the period of the Markov chain.

A Markov chain is said to be aperiodic if the period of each state is $1$.

Intuitively, a period $k>1$ indicates that the length of any cycle through the state must be a multiple of $k$.
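The period of a state can be estimated directly from this definition by truncating to the first few powers of $A$. A minimal sketch (the truncation bound k_max is a heuristic for this illustration, not part of the definition):

```python
import numpy as np
from functools import reduce
from math import gcd

def period(A, i, k_max=None):
    """gcd of all k <= k_max such that A^k[i, i] > 0 (finite truncation of the definition)."""
    n = A.shape[0]
    k_max = k_max or 2 * n
    M, ks = np.eye(n), []
    for k in range(1, k_max + 1):
        M = np.dot(M, (A > 0))  # positivity pattern of A^k
        if M[i, i] > 0:
            ks.append(k)
    return reduce(gcd, ks, 0)

cycle = np.roll(np.eye(4), 1, axis=1)  # a pure 4-cycle
print(period(cycle, 0))                # 4
lazy = 0.5 * (np.eye(4) + cycle)       # same cycle plus self-loops
print(period(lazy, 0))                 # 1: the self-loop makes the chain aperiodic
```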

Perron-Frobenius Theorem (a variant, actually). If a matrix $A$ is right stochastic, irreducible and aperiodic, then $A^k \to B$ as $k \to +\infty$, where $B$ is the right stochastic matrix having all its rows equal to the same row vector $Q$ defined as the unique normalized solution to the equation $QA = Q$.

Interpretation. When the conditions of the Perron-Frobenius theorem are met, the Markov chain eventually converges to a unique stationary distribution, independently of its initial state, which is progressively forgotten.
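A quick numerical illustration of the theorem, on a small hypothetical matrix that is right stochastic, irreducible and aperiodic (the self-loop on state 0 ensures aperiodicity):

```python
import numpy as np

# Hypothetical matrix, for illustration only
A = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.0, 0.9],
              [0.3, 0.3, 0.4]])

B = np.linalg.matrix_power(A, 100)   # A^k for a large k
print(B)                             # all rows are (numerically) identical
Q = B[0]
print(np.allclose(np.dot(Q, A), Q))  # True: Q solves QA = Q
```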

1. Markov chains animations

To start with, we consider a small experiment that plays with the assumptions of the Perron-Frobenius theorem.

Consider a circular game board made of $ n = 36 $ squares numbered from $ 0 $ to $ 35 $. At the (discrete) turn $ k=0 $, a player stands on square $ 0 $. Between two successive turns, the player moves forward by a certain number of squares that depends on the game rules. Remember that the board is circular: if the player is on square $35$ and moves one square forward, she lands on square $0$.

The objective of this exercise is to visualize the impact of the game rules on the (probabilistic) position of the player on the board.

Toy example

We now give you the code to visualize the game evolution in a toy example. For now, the player (deterministically) moves one square forward at each turn. The function evolution displays the evolution of a distribution. It takes three arguments:

  • next_step: a function that takes a distribution as input and returns the resulting distribution after one step of the Markov process.
  • k_max: the number of steps you want to watch.
  • n: the size of the game board.

In [ ]:
def evolution(next_step, k_max=180, n=36):
    # Turn interactive plotting off
    plt.ioff()
    # Initialize figure
    fig, ax = subplots()
    # Initialize the probability distribution: the initial position of the player is known: she is on square 0
    P = zeros(n)
    P[0] = 1
    # Display probability
    pbar = ax.bar(range(n), P, 1)
    xlim([0, n])

    # init is only required for blitting, to give a clean slate
    def init():
        for rect, y in zip(pbar, P):
            rect.set_height(y)
        return pbar
    
    # animate tells what to do at step i of the process
    def animate(i):
        for rect, y in zip(pbar, P):
            rect.set_height(y)
        P[:] = next_step(P)  # Update the values using the next_step function
        return pbar
    
    # create the animation object
    ani = animation.FuncAnimation(
        fig, animate, frames=k_max, init_func=init, interval=50, blit=True)
    
    # return the video as HTML to display in the output of the cell
    return HTML(ani.to_html5_video())

The rule for the toy example is to move one square forward. This can easily be done with the function roll from the numpy package.


In [ ]:
def next_step(P):
    # Roll, baby!
    return roll(P, 1)

We now call the function evolution. It can take a few seconds to initialize, depending on the value of k_max you choose.


In [ ]:
evolution(next_step, 180)

Question 1

Are the conditions of the Perron-Frobenius theorem met in the toy example above? Justify your answer by:

  • commenting on the animation above;
  • and using the theoretical definitions (in particular, you should clearly define the Markov chain you consider).

Answer:

Question 2

We now change the rules. At each turn, the player tosses an unbiased coin. If it is a head, she moves forward $ a = 1 $ step. If it is a tail, she moves forward $ b = 2 $ steps. Visualize the evolution of the position of the player. Comment what you observe.

Remark: The game rules are no longer trivial, so we will work with more general Markov chains. Updating the vector P by applying the function roll won't be enough anymore. You may want to use dot(P, A), which multiplies a vector $P$ by a matrix $A$, with a properly built matrix $A$. To do so, we define below the function next_stepA(P) that takes a vector $P$ as input. In each question of the exercise, you will simply have to

  • build the matrix $A$ that encodes the game rules described in the question;
  • run the instruction evolution(next_stepA, nb_of_steps) to generate the video, where nb_of_steps is the number of steps you wish to observe;
  • discuss the result.

In [ ]:
def next_stepA(P):
    return dot(P, A)
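As a sanity check of this matrix approach, here is how the toy rule of the previous section (move one square forward) translates into a matrix $A$; this is only an illustration of the encoding, not the answer to the questions below:

```python
import numpy as np

n = 36
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = 1.0  # from square i, move to square i+1 with probability 1

P = np.zeros(n)
P[0] = 1.0
# dot(P, A) reproduces exactly what roll(P, 1) did in the toy example
print(np.array_equal(np.dot(P, A), np.roll(P, 1)))  # True
```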

Answer:

Question 3

New rules. We now assume that $ a = 1 $ and $ b = i^2 $, where $ i $ is the current position of the player.

Visualize the evolution of the position of the player. Comment what you observe.

Answer:

Question 4

New rules. We now assume that $ a = 1 $ and $ b = 7 $.

Visualize the evolution of the position of the player. Comment what you observe.

Answer:

Question 5

New rules. We now assume that $ a = 2 $ and $ b = 4 $.

Visualize the evolution of the position of the player. Comment what you observe.

Answer:

Question 6

New rule. When the player stands on square 35, she rolls a die. If she scores 6 (which happens with probability $\frac16$), she realizes that the game is pointless and quits the game. Redo questions 2 to 5 with this new rule (theory, practice, discussions...).

Answer:

2. PageRank on a small graph (if time)

The goal of this second exercise is to test a few variants of PageRank on small graphs and see how they differ. The next practical will give you the opportunity to experience PageRank on larger (and more realistic) graphs.

Let $ h $ be an integer (we will use $h = 2$ for the toy evaluation and $h = 9$ for the complete evaluation). We consider the following web graph $ G $:

  • There are $ n = 2^{h+1}-1 $ nodes (representing web pages) numbered from $0$ to $ n-1 = 2^{h+1}-2 $.
  • The oriented edges (representing hyperlinks) are as follows:
    • For each node $ i\geq 1 $, there is an edge from node $ i $ to node $ i - 1 $.
    • For each node $ i\geq 1 $, there is an edge from node $ i $ to node $ \left\lfloor \frac{i-1}2 \right\rfloor $.
    • For each node $j$ such that $ j \in [2^h-1, 2^{h+1}-2] $, there is an edge from $0$ to $j$.

Remark: there is at most one edge from a node $i$ to a node $j$, even when several rules indicate the existence of the edge.

Question 7

Build the transition matrix $A$ associated with the Markov chain defined by the position of the random surfer on the graph $G$ described above. As a reminder, this matrix is defined by $ a_{i,j} = \frac1{\text{deg}_+(i)}$ if there is an edge from $ i $ to $ j $, and $a_{i,j} = 0$ otherwise (where $\text{deg}_+(i)$ is the outdegree of node $i$).

Answer:

Question 8

Try to guess which nodes of the graph are important. Justify your answer (in particular, copying/pasting the answer to the next question is not enough).

Answer:

Question 9

Compute the PageRank $P$, which is a solution of $ P = P A $. Proceed iteratively, starting from the uniform distribution $ P = \frac{\mathbf{1}}n $ and updating $ P \leftarrow P A $ until $ ||P-PA||_1<\epsilon $ (recommendation: $ \epsilon = 10^{-8} $).

Your function should display:

  • the current top ten (that is, the ten nodes that have highest value in current $P$) at the end of each of the first 10 iterations,
  • the top ten at the end of the last iteration,
  • and the total number of iterations.

You should impose a maximum number of iterations to avoid infinite loops (recommendation: 2000 iterations). When the maximum number of iterations is reached, your function should display a message saying that the process has not converged, along with the last top ten that was computed.

Answer:

Question 10

We supplement the web graph $G$ with $b = 10$ new nodes numbered from $ n $ to $ n+b-1 $. For each new node $ i $, we add an edge from $ i-1 $ to $i$.

  • Do Questions 7, 8, and 9 over with this new graph. What happens (in theory and in practice)?
  • Use $ P \leftarrow \frac{P A}{||PA||_1} $ instead of $P \leftarrow P A$ to update $P$. What happens?

Answer:

Question 11

We add a single edge to the previous graph, from node $ n+b-1 $ to node $ n+b-2 $.

  • Do Questions 7, 8, and 9 over with this new graph, along with the update proposed in Question 10. What happens (in theory and in practice)?
  • Use $ P \leftarrow d P A + (1-d)\frac{\mathbf{1}}n $ for updating $P$, with $d = 0.85$. What happens? Can you explain it?
  • (Bonus) Try numerically to simply use $ P \leftarrow d P A + \mathbf{1}$ for updating $P$. Can you explain the result?

Answer:

Question 12

In case you haven't done it yet, please compare and discuss the convergence and rankings you observed in this practical.

Answer: