INF 674 S7: Markov Chains

Céline Comte & Fabien Mathieu

2016-2017

Remark: this session is numbered 7, and not 6, because the sessions have been renumbered:

  • S0 Python Introduction
  • S1 Galton-Watson
  • S2 Erdös-Rényi
  • S3 Competitive Epidemics
  • S4 Small-Worlds
  • S5 Power Laws
  • S6 Distances and Clustering
  • S7 Markov Chains
  • S8 PageRank on a real dataset
  • S9 Navigability

The objective of this practical is to show you a Markov chain in action. In particular, you will observe what happens when the conditions of the Perron-Frobenius theorem are not met.

Things to know

Right stochastic matrix Let $A = (a_{i,j})$ be an $n\times n$ matrix. We say that $A$ is a right stochastic matrix if:

  • The coefficients of $A$ are non-negative: $\forall i, j, a_{i,j}\geq 0$.
  • Each row sums to 1: $\forall i, \sum_j a_{i,j} = 1$.

Homogeneous Markov chain: A right stochastic matrix $A$ defines a Markov chain that describes the evolution of a distribution over $n$ states as follows: if one is in state $i$ at step $k$ of the process, the probability of being in state $j$ at step $k+1$ is $a_{i,j}$.

With this notation, the evolution of a Markov chain is easy to study: if $P_k$ is the probability distribution at step $k$ ($P_k\geq 0, \sum_{i}P_k[i]=1$), then we have

$$P_{k+1}=P_k A$$
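As a quick illustration (not part of the game below; this 2-state matrix is arbitrary), the evolution $P_{k+1}=P_k A$ can be iterated with numpy:

```python
import numpy as np

# An arbitrary 2-state right stochastic matrix (rows sum to 1).
A = np.array([[0.9, 0.1],
              [0.5, 0.5]])

P = np.array([1.0, 0.0])  # start in state 0 with probability 1
for _ in range(100):      # iterate P_{k+1} = P_k A
    P = P.dot(A)

# P approaches the unique distribution Q with Q A = Q, here (5/6, 1/6).
print(P)
```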

irreducibility: Let $A$ be a non-negative matrix ($\forall i,j, A[i,j]\geq 0$). Let $G=(V,E)$ be the oriented graph associated to $A$: $(i,j)\in E$ if, and only if, $A[i,j]>0$. The following propositions are equivalent:

  • $A$ is irreducible
  • $\forall i,j, \exists k>0, A^k[i,j]>0$
  • $G$ is strongly connected: $\forall (i,j)\in V^2$, there exists an oriented path in $G$ from $i$ to $j$.

Intuitively, the irreducibility property indicates that starting from any state, any state can be reached with a positive (i.e. $>0$) probability after some steps.

aperiodicity: (from wikipedia) For an index $i$, the period of $i$ is the greatest common divisor of all natural numbers $k$ such that $A^k[i,i]>0$. When $A$ is irreducible, the period of every index is the same and is called the period of $A$.

If the period is 1, $A$ is aperiodic.

Intuitively, a period $k>1$ indicates that the length of any cycle must be a multiple of $k$.
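As a sketch, the period of an index can be computed numerically as the gcd of its return times over a bounded horizon (`period` is a name chosen here):

```python
import numpy as np
from functools import reduce
from math import gcd

def period(A, i, k_max=50):
    """gcd of all k <= k_max such that A^k[i, i] > 0 (0 if no return)."""
    returns = []
    Ak = np.eye(len(A))
    for k in range(1, k_max + 1):
        Ak = Ak.dot(A)
        if Ak[i, i] > 0:
            returns.append(k)
    return reduce(gcd, returns) if returns else 0

# A deterministic 3-cycle (0 -> 1 -> 2 -> 0) has period 3.
C = np.roll(np.eye(3), 1, axis=1)
print(period(C, 0))  # 3
```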

Perron-Frobenius Theorem (a variant, actually) If $A$ is right stochastic, irreducible and aperiodic, then $A^k\xrightarrow[k\to \infty]{} B$, where $B$ is the right stochastic matrix having all its rows equal to the same row vector $Q$ defined as the unique normalized solution to the equation $QA = Q$.

Interpretation: when the conditions of the Perron-Frobenius theorem are met, the process eventually converges to a unique distribution that does not depend on its initial state, which is forgotten.

Markov chain animations

To start with, here is a small experiment that plays with the assumptions of the Perron-Frobenius theorem.

Consider the following game: you have a circular game board made of $ n = 36 $ squares numbered from $ 0 $ to $ 35 $. At (discrete) turn $ t=0 $, a player stands on square $ 0 $. Between two turns, the player moves forward a certain number of squares. Remember that the board is circular: if the player is in square $35$ and moves one square forward, she lands in square $0$. We propose to visualize the evolution of the (probabilistic) position of the player on the board.

We give you the code to visualize the game where at each turn, the player moves one square forward.

First, evaluate the following cells. Note that you should install ffmpeg if it is not available on your system (https://ffmpeg.org/ ). If you are on a TPT machine, ffmpeg is not available; use the avconv writer instead.


In [ ]:
%pylab inline

In [ ]:
from matplotlib import animation, rc
from IPython.display import HTML

# use writer = 'avconv' on TPT computers
rc('animation', html='html5', writer = 'ffmpeg')

xkcd()

n = 36

Remark: the use of xkcd() is just to have funny movies. You can remove it.

The following function will be used to display the evolution of a distribution. It takes two arguments:

  • next_step is a function that takes a distribution as input and outputs the resulting distribution after one step of the Markov process.
  • k_max indicates the number of steps you want to watch.

In [ ]:
def evolution(next_step, k_max):
    # Create the figure and the bar plot
    fig, ax = subplots()
    # Initiate probability distribution: the position of the player is known.
    P = zeros(n)
    P[0] = 1
    # Display probability
    pbar = ax.bar(range(n),P,1)
    xlim([0,n])
    #Init only required for blitting to give a clean slate.
    def init():
        for rect, y in zip(pbar, P):
            rect.set_height(y)        
        return pbar
    def animate(i):
        P[:] = next_step(P) # Update the values using the next_step function
        for rect, y in zip(pbar, P):
            rect.set_height(y)        
        return pbar
    ani = animation.FuncAnimation(fig, animate, frames=k_max, init_func=init,
                                  interval=25, blit=True)
    return ani

The rule for the toy example is to move one square forward. This can be easily done with the np.roll function.


In [ ]:
def next_step(P):
    # Roll, baby!
    return roll(P, 1)

We now call evolution. The %%capture command hides a static picture that would be shown otherwise.


In [ ]:
%%capture
ani = evolution(next_step,180)

And... Showtime (this can take a few seconds to initiate, depending on the k_max you set)


In [ ]:
ani

Question 1

Are the conditions of the Perron-Frobenius theorem met? Justify your answer:

  • using the theoretical definitions
  • by commenting the animation above

Answer:

Question 2

We now change the rules. At each turn, the player tosses an unbiased coin. If it is a head, she moves forward $ a = 1 $ step. If it is a tail, she moves forward $ b = 2 $ steps.

As we are now playing with more general Markov chains, roll may not be enough. You may want to use dot(P, A), which multiplies a vector P by a matrix A.


In [ ]:
def next_stepA(P):
    return dot(P, A)

Visualize the evolution of the position of the player. Comment what you observe.

Answer:
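A possible sketch of the answer: build the transition matrix for $a=1$, $b=2$ and iterate. The construction below mirrors the rules (the chain here is irreducible and aperiodic, so convergence is expected):

```python
import numpy as np

n = 36
a, b = 1, 2
# From square i, move a squares forward with prob. 1/2, b squares with prob. 1/2.
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + a) % n] += 0.5
    A[i, (i + b) % n] += 0.5

P = np.zeros(n)
P[0] = 1
for _ in range(5000):
    P = P.dot(A)

# A is doubly stochastic here, so the limit distribution is uniform.
print(P)  # every entry close to 1/36
```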

Question 3

New rules. We assume now $ a = 1 $, $ b = i^2 $, where $ i $ is the number of the square where the player stands.

Visualize the evolution of the position of the player. Comment what you observe.

Answer:
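A possible sketch for these rules, where the second jump length depends on the current square $i$:

```python
import numpy as np

n = 36
# From square i, move forward 1 square or i**2 squares, with prob. 1/2 each.
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] += 0.5
    A[i, (i + i**2) % n] += 0.5

P = np.zeros(n)
P[0] = 1
for _ in range(2000):
    P = P.dot(A)

# Square 0 has a self-loop (0 + 0**2 = 0), so the chain is aperiodic,
# and the +1 moves make it irreducible; but A is no longer doubly
# stochastic, so the limit distribution is not uniform.
print(P)
```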

There is convergence. Compared to the previous case, note that:

  • It is much faster (if you want to go deeper in that direction, the reason is that the spectral gap is greater with the new rules).
  • The asymptotic distribution is very heterogeneous.

Question 4

New rules. We assume now $ a = 1 $, $ b = 7 $.

Visualize the evolution of the position of the player. Comment what you observe.

Answer:

The aperiodicity condition does not hold. In fact, it is quite easy to see that the period of the graph is 6: consider a loop. It is made of a certain number of $a$-jumps, say $i$, and a certain number of $b$-jumps, say $j$. The fact that it is a loop means that $i a + j b = 36 k$ for some $k$, i.e. $i+7j = 36k$. Hence we have $$(i+7j)\%6 = (i+j)\%6 = 36k\%6 = 0$$

In other words, $i+j$, the length of the loop, is a multiple of 6.

This period of 6 is indeed easy to see in the animation.
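The period can also be checked numerically, as a sketch: take the gcd of the return times to square 0 over a finite horizon.

```python
import numpy as np
from functools import reduce
from math import gcd

n = 36
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] += 0.5  # a = 1
    A[i, (i + 7) % n] += 0.5  # b = 7

# Collect all k < 100 such that A^k[0, 0] > 0.
returns = []
Ak = np.eye(n)
for k in range(1, 100):
    Ak = Ak.dot(A)
    if Ak[0, 0] > 0:
        returns.append(k)

p = reduce(gcd, returns)
print(p)  # 6
```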

Question 5

New rules. We assume now $ a = 2 $, $ b = 4 $.

Visualize the evolution of the position of the player. Comment what you observe.

Answer:

It is like question 2 except that only even squares are covered.

  • Regarding Perron-Frobenius, the irreducibility property is lost: there are two strongly connected components, the even squares and the odd squares (which are indeed disconnected from each other).
  • The practical effect of losing irreducibility is that some states can have probability 0 (the odd squares), whereas when all conditions of Perron-Frobenius are met, all states have a positive (i.e. $>0$) probability in the steady-state distribution.
  • While $2 \wedge 4 = 2$, 2 is NOT a period of the graph (the graph is actually aperiodic on both of its strongly connected components).

Question 6

With probability $ 1/5 $, the coin lands on its edge (it is a very thick coin). Default procedure: toss again until it does not land on its edge. Exception: if it lands on its edge while the player stands on square 35, she realizes that the game is pointless and quits the game. Redo questions 2 to 5 with this new rule.

Answer:

A remark first: many of you assumed that tossing an edge on a square other than 35 makes you skip a square. What was intended for squares less than 35 was that a step is the action of tossing until you get a head or a tail, then moving. The problem statement was probably not precise enough on that point, so both interpretations are accepted for squares less than 35.
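Under the intended interpretation, the transition matrix becomes substochastic: row 35 leaks the probability $1/5$ of quitting. A sketch with $a=1$, $b=2$ (`coin_matrix` is a name chosen here):

```python
import numpy as np

n = 36

def coin_matrix(a, b):
    """On squares != 35, re-toss on edge: move a or b with prob. 1/2 each.
    On square 35, quit with prob. 1/5, else move a or b with prob. 2/5 each."""
    A = np.zeros((n, n))
    for i in range(n - 1):
        A[i, (i + a) % n] += 0.5
        A[i, (i + b) % n] += 0.5
    A[n - 1, (n - 1 + a) % n] += 0.4
    A[n - 1, (n - 1 + b) % n] += 0.4
    return A  # row 35 sums to 0.8: the matrix is substochastic

A = coin_matrix(1, 2)
P = np.zeros(n)
P[0] = 1
for _ in range(1000):
    P = P.dot(A)

# The total probability decreases at each visit of square 35:
# eventually the player quits with probability 1.
print(P.sum())
```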

PageRank on a small graph (if time)

The goal of this last part is to test a few variants of PageRank on small graphs and see how they differ.

It is entirely optional, but it can help you before computing PageRank on bigger graphs.

Let $ h $ be an integer (use $ h $ = 3 for toy evaluation, $ h = 9 $ for complete evaluation).

Let $ G $ be the following graph:

  • There are $ n = 2^{h+1}-1 $ nodes numbered from $0$ to $ 2^{h+1}-2 $.
  • The oriented edges are as follows:
    • For all nodes $ i\geq 1 $, there is an edge from $ i $ to $ \lfloor (i-1) / 2 \rfloor $.
    • For all nodes $ i\geq 1 $, there is an edge from $ i $ to $ i - 1 $.
    • For all nodes $j$ such that $ j \in [2^h-1, 2^{h+1}-2] $, there is an edge from $0$ to $j$.

Question 1

Build the stochastic matrix $A$ associated to $G$, defined by $ A(i,j) = 1/\deg_+(i)$ if there is an edge from $ i $ to $ j $, and $0$ otherwise ($\deg_+$ denotes the out-degree).

Answer:
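A possible construction, as a sketch (`build_A` is a name chosen here; note that from node 1 both edges point to node 0, so parallel edges accumulate weight):

```python
import numpy as np

def build_A(h):
    """Stochastic matrix of the graph G described above."""
    n = 2**(h + 1) - 1
    A = np.zeros((n, n))
    for i in range(1, n):
        A[i, (i - 1) // 2] += 1   # edge to the parent in the binary tree
        A[i, i - 1] += 1          # edge to the previous node
    for j in range(2**h - 1, n):  # node 0 points to every leaf
        A[0, j] += 1
    # normalize each row by its out-degree
    return A / A.sum(axis=1, keepdims=True)

A = build_A(3)
print(A.shape)  # (15, 15)
```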

Question 2

Try to guess which nodes of the graph are important. Justify your answer (copying/pasting the answer from the next question is not enough).

Answer:

Question 3

Compute the PageRank $P$ which is a solution of $ P = P A $. Proceed iteratively starting from $ P = \mathbf{1}/n $ and updating $ P \leftarrow P A $ until $ ||P-PA||_1<\epsilon $ (recommendation: $ \epsilon = 10^{-8} $). Display for the $10$ first iterations the current top ten (the ten nodes that have highest value in current $P$), the total number of iterations and the final top ten.

Configure a maximal number of authorized iterations to avoid infinite loops (recommendation: 2000 iterations). When reaching the maximal number of iterations, your function should display a message saying the process has not converged and give the past top ten obtained.

Answer:
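A possible sketch of the iteration (`pagerank` is a name chosen here; the current top ten is printed for the first 10 iterations, then the outcome). The tiny 2-state matrix at the end is only a sanity check; applied to the matrix of Question 1, this produces the ranking commented on below.

```python
import numpy as np

def pagerank(A, eps=1e-8, max_iter=2000):
    """Iterate P <- P A until ||P - P A||_1 < eps, starting from P = 1/n."""
    n = A.shape[0]
    P = np.ones(n) / n
    for it in range(1, max_iter + 1):
        newP = P.dot(A)
        if np.abs(P - newP).sum() < eps:
            print("converged after", it, "iterations; top ten:",
                  np.argsort(-newP)[:10])
            return newP, it
        P = newP
        if it <= 10:
            print("top ten at iteration", it, ":", np.argsort(-P)[:10])
    print("no convergence after", max_iter, "iterations; last top ten:",
          np.argsort(-P)[:10])
    return P, max_iter

# Sanity check on a 2-state chain with stationary distribution (2/3, 1/3).
P, iters = pagerank(np.array([[0.5, 0.5], [1.0, 0.0]]))
```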

We observe the expected ranking.

Question 4

We add $b$ new nodes to $G$ ($ b = 10 $), numbered from $ n $ to $ n+b-1 $. For each new node $ i $, we add an edge from $ i-1 $ to $i$.

  • Do Questions 1, 2, 3 over with this new graph. What happens (in theory and in practice)?
  • Use $ P \leftarrow P A / ||PA||_1 $ for updating $P$. What happens?

Answer:

First, let us do the modification, then compute the PageRank.

It seems that convergence cannot be observed. The reason is that we introduced a leak on the last node. In theory, $P$ should converge towards 0, but

  • the leak is big enough to prevent $||P-PA||_1<\epsilon$ while $P$ is still large,
  • the leak is small enough for $P$ to stay large for a long time.

However, notice that the ranking after the maximum allowed number of iterations seems to be correct.

Now, change the algorithm so that $P$ stays normalized.
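A sketch of the normalized variant (`pagerank_normalized` is a name chosen here; the substochastic 2-state matrix at the end is only an illustration of the leak):

```python
import numpy as np

def pagerank_normalized(A, eps=1e-8, max_iter=2000):
    """Update P <- P A / ||P A||_1, compensating the leak at each step."""
    n = A.shape[0]
    P = np.ones(n) / n
    for it in range(1, max_iter + 1):
        newP = P.dot(A)
        newP /= np.abs(newP).sum()  # renormalize to a probability vector
        if np.abs(P - newP).sum() < eps:
            return newP, it
        P = newP
    return P, max_iter

# Sanity check on a substochastic matrix (row 0 leaks 0.1).
P, iters = pagerank_normalized(np.array([[0.45, 0.45], [0.5, 0.5]]))
print(P)  # [0.5, 0.5]
```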

Convergence works again! (with a similar number of iterations)

Question 5

We add one edge to the previous graph, from $ n+b-1 $ to $ n+b-2 $.

  • Do Questions 1, 2, 3 over with this new graph, along with the update proposed in Question 4. What happens (in theory and in practice)?
  • Use $ P \leftarrow d P A + (1-d)\mathbf{1} $ for updating $P$, with $d = 0.85$. What happens?

Answer:

First, change the graph by adding one last edge.

  • No convergence.
  • In theory, the nodes 1031 and 1032 form a probability trap: it is a strongly connected component that can be reached from any other node and from which you cannot exit. So they should be ranked first, with all other nodes ranked last.
  • Yet, the returned ranking is [0, 1031, 1032, ...]: 1031 and 1032 did not have enough time to converge.
  • Remark that normalizing does not change anything: there is no leak to compensate for.
  • It is not very satisfying that one single edge completely changes the ranking.
  • Now, let us see what happens with damping.
  • Faster convergence
  • The ranking [0, 1, 2, ...] is returned: the effect of local traps is ignored.
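The damped update can be sketched as follows (`pagerank_damped` is a name chosen here; note that with the $(1-d)\mathbf{1}$ term the fixed point sums to $n$, not 1, and the 2-cycle demo matrix is just an illustration):

```python
import numpy as np

def pagerank_damped(A, d=0.85, eps=1e-8, max_iter=2000):
    """Update P <- d P A + (1 - d) 1; converges for any stochastic A
    since the iteration contracts by a factor d < 1 in l1 norm."""
    n = A.shape[0]
    P = np.ones(n) / n
    for it in range(1, max_iter + 1):
        newP = d * P.dot(A) + (1 - d) * np.ones(n)
        if np.abs(P - newP).sum() < eps:
            return newP, it
        P = newP
    return P, max_iter

# Even a periodic 2-cycle now converges, to the fixed point (1, 1).
P, iters = pagerank_damped(np.array([[0., 1.], [1., 0.]]))
print(P, iters)
```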

Question 6

Compare and discuss the convergence and rankings you observed in this practical.

Answer:

See the markdown above.