Given a two player game $(A,B)\in{\mathbb{R}^{m\times n}}^2$, referred to as a stage game, a $T$-stage repeated game is a game in which players play that stage game for $T > 0$ periods. Players make decisions based on the full history of play over all the periods.
For example consider the game:
$$ A = \begin{pmatrix} 0 & 6 & 1\\ 1 & 7 & 5 \end{pmatrix} \qquad B = \begin{pmatrix} 0 & 3 & 1\\ 1 & 0 & 1 \end{pmatrix} $$by identifying the best responses:
$$ A = \begin{pmatrix} 0 & 6 & 1\\ \underline{1} & \underline{7} & \underline{5} \end{pmatrix} \qquad B = \begin{pmatrix} 0 & \underline{3} & 1\\ \underline{1} & 0 & \underline{1} \end{pmatrix} $$it is immediate to find two Nash equilibria:
$$((0,1), (1,0,0))\qquad((0,1), (0,0,1))$$
In [1]:
import nashpy as nash
import numpy as np
A = np.array([[0, 6, 1],
[1, 7, 5]])
B = np.array([[0, 3, 1],
[1, 0, 1]])
game = nash.Game(A, B)
list(game.support_enumeration())
Out[1]:
If we were to repeat this game twice ($T=2$) we obtain a new game. However to be able to think of this we need to define what a strategy in a repeated game is.
Given a two player stage game $(A,B)\in{\mathbb{R}^{m\times n}}^2$, repeated to give a $T$-stage repeated game. A strategy is a mapping from the entire history of play to an action of the stage game:
$$ \bigcup_{t=0}^{T-1}H(t)\to a $$where:
To help avoid confusion, whilst we have referred to pure strategies as choices made in stage games, here we will call those actions.
The actions for our example:
A strategy for the row/column player thus needs to map an element of the following set to an element of $\{r_1, r_2\}$/$\{c_1, c_2, c_3\}$:
$$\bigcup_{t=0}^{1}H(t)=\{(\emptyset, \emptyset), (r_1, c_1), (r_1, c_2), (r_1, c_3), (r_2, c_1), (r_2, c_2), (r_3, c_3)\}$$In other words, in our example, a strategy answers both of the following questions:
The following theorem allows us to find a Nash equilibrium:
For any repeated game, any sequence of stage Nash profiles gives a Nash equilibrium.
Consider the following strategy:
The row/column player should play action $a_{r/c}$ regardless of the play of any previous strategy profiles.
where $(a_{r}, a_{c})$ is a given stage Nash equilibrium.
Using backwards induction, this is a Nash equilibrium for the last stage game. Thus, at the last stage, no player has a reason to deviate. Similarly at the $T-1$th stage. The proof follows.
Thus, for our example we have the four Nash equilibria:
Note however that it is not the only equilibria for our repeated game.
In a repeated game it is possible for players to encode reputation and trust in their strategies.
Consider the following two strategies:
For the row player:
$$(\emptyset, \emptyset) \to r_1$$ $$(r_1, c_1) \to r_2$$ $$(r_1, c_2) \to r_2$$ $$(r_1, c_3) \to r_2$$
For the column player:
$$(\emptyset, \emptyset) \to c_2$$ $$(r_1, c_2) \to c_3$$ $$(r_2, c_2) \to c_1$$
Note that here we omit some of the histories which are not possible based on the first play by each player.
This strategy corresponds to the following scenario:
Play $(r_1,c_2)$ in first stage and $(r_2,c_3)$ in second stage unless the row player does not cooperate in which case play $(r_2, c_1)$.
If both players play these strategies their utilities are: $(11, 4)$ which is better for both players then the utilities at any sequence of pure stage Nash equilibria. But is this a Nash equilibrium? To find out we investigate if either player has an incentive to deviate.
Thus this strategy pair is a Nash equilibrium and evidences how a reputation can be built and cooperation can emerge from complex dynamics.