An algorithm for two-player repeated games with perfect monitoring

Paper Authors: Dilip Abreu and Yuliy Sannikov

Notebook Author: Chase Coleman

This notebook is meant to be for a "quick" overview of what is presented in the paper. If you find this helpful, then we highly recommend reading the original paper as it is more thorough in its presentation of the material.

Abstract

Consider repeated two-player games with perfect monitoring and discounting. We provide an algorithm that computes the set $V^*$ of payoff pairs of all pure-strategy subgame perfect equilibria with public randomization. The algorithm provides significant efficiency gains over existing implementations of the algorithm from Abreu, Pearce, Stacchetti (1990). These efficiency gains arise from a better understanding of the manner in which extreme points of the equilibrium payoff set are generated. An important theoretical implication of our algorithm is that the set of extreme points $E$ of $V^*$ is finite. Indeed, $|E| \leq 3 |A|$, where $A$ is the set of action profiles of the stage game.

Set up

This paper considers a two-player repeated game with finite simultaneous actions and discounting. The stage game will be denoted by $G = \{N, \{ A_i \}_{i \in N}, \{g_i\}_{i \in N} \}$ in which players $N=\{1, 2\}$ share a discount factor $\delta$.

We will begin with some definitions that will make the presentation more clean.

  • Define the best possible payoff from in response to the other player choosing $a_j$. $$\bar{g_i}(a_j) := \max_{a_i} g_i(a_i, a_j)$$
  • Define the gain from deviating from a specified set of actions by $$h_i(a) = \bar{g_i}(a_j) - g_i(a)$$
  • Define the worst punishment an agent can be given for a specific set of continuation values $X$ by $$P_i(X) = \min \{ x_i | (x_1, x_2) \in X \text{ for some } x_j \}$$
  • Define the set of values associated with all subgame-perfect equilibria as $V^*$.|

We will say that a point $v \in \mathcal{R}$ is generated by the set $X$ if there is an action pair $a \in A$ in the current period and a pair of continuation values $w \in X$ such that:

  • It can be obtained by $(a, w)$ $\Rightarrow$ $v = (1-\delta)g(a) + \delta w$
  • It is incentive compatible $\Rightarrow$ $\delta(w - P(X)) \geq (1-\delta)h(a)$

Then in conjunction with previous APS papers, we will say that a set $X$ is self-generating if each extreme point of $X$ can be generated by $X$.

Algorithm proposed by APS

In their original papers, Abreu, Pearce, and Stacchetti proposed a repeated application of a "$B$" operator to a set of continuation values and showed that the operator satisfied certain conditions that imply it will converge to the set of values which correspond to the subgame-perfect equilibria. The APS papers prove (among other things) that $B^n(W) \rightarrow V^*$. The $B$ operator is defined by

$$B(W) = \text{co}\{v | \exists w \in W, a \in A \text{ s.t } v=(1-\delta)g(a) + \delta w \text{ and } \delta(w - P(W)) \geq (1-\delta)h(a) \}$$

Given a current set of continuation values, $W$, this operation could be split into several simple steps

  1. Given an action profile, $a$, and a threat point, $u \in \mathcal{R}^2$, we can define a set $Q(a, W, u) := \{w \in \mathcal{R}^2 | \delta (w - u) \geq (1-\delta) h(a) \} \cap W$.
  2. Define $B_a(W) = (1-\delta)g(a) + \delta Q(a, W, P(W))$
  3. $B(W) = \cup_{a \in A} B_a(W)$

One potential issue of this approach in a numerical setting is that the number of vertices in the set $B_a(W)$ isn't necessarily bounded -- Luckily, Abreu Sannikov will show (using the following theorem) that we can bound this and give a specific bound.

Theorem 1: Any action profile $a$ such that $g(a) \geq \underline{v} + (1-\delta)/\delta h(a)$ generates at most one extreme point of $V^*$, $v = g(a)$. Any action profile for which

  • (1) $g_1(a) < \underline{v}_1 + \frac{1-\delta}{\delta} h_1(a)$ or
  • (2) $g_2(a) < \underline{v}_2 + \frac{1-\delta}{\delta} h_2(a)$

generates at most four extreme points of $V^*$, using continuation payoff vectors $w$ that are extreme points of $Q(a, V^*, \underline{v})$ such that

  • $\delta (w_1 - \underline{v}_1) = (1-\delta) h_1(a)$ or
  • $\delta (w_2 - \underline{v}_2) = (1-\delta) h_2(a)$

This bound on the number of extreme points is a particularly important theoretical result. The authors are able to show in the appendix that the bound of extreme points for $V^*$ actually is less than or equal to $3 |A|$ which is stronger than this theorem which implies it is less than or equal to $4 |A|$. These results rely on the way in which incentive constraints bind in different cases.

Algorithm proposed by Abreu Sannikov

This paper introduces a new way to compute the subgame-perfect value set. Instead of the $B$ operator, it uses a slightly stronger operator which the authors refer to as the $R$ operator. Given a current set of continuation values, $W$, we can describe this operator in steps that are similar to the $B$ operator.

  1. Define the same $Q(a, W, u)$ operation as before -- Namely, $Q(a, W, u) := \{w \in \mathcal{R}^2 | \delta (w - u) \geq (1-\delta) h(a) \} \cap W$

  2. Given an action profile $a$, and a threat point, $u \in \mathcal{R}^2$ define $$C(a, W, u) = \begin{cases} \{ g(a) \} \text{ if } \delta (g(a) - u) \geq (1-\delta) h(a) \\ \{ x \in \text{Ext} \left( Q(a, W, u) \right) | \delta (w_i - u_i) = (1-\delta)h_i(a) \text{ for some $i$}\} \text{ otherwise }\end{cases}$$ where $\text{Ext}(X)$ gives the extreme points of a set $X$.

  3. $R(W, u) = \text{co} \cup_{a \in A} (1-\delta) g(a) + \delta C(a, W, u)$

Theorem 1 implies that $v^* = R(V^*, \underline{v})$, but it doesn't necessarily imply that repeated application of the $R$ operator will result in convergence to the set $V^*$. The authors find a way to get around this issue by showing that although the $R$ operator is not necessarily monotonic in $u$, it is monotonic in $W$. Monotonicity in $W$ means given $W \subset W'$, then, for a given $u$, $R(W, u) \subset R(W', u)$. In order to deal with the lack of monotonicity in $u$, the authors define a set $V(u) = B(V(u), u)$. They are then able to show that $V^* \subseteq V(u)$ whenever $u \leq \underline{v}$.

Theorem 2: $V(u) = R(V(u), u)$

TODO: Add more theoretical exposition here. The point is that they show that repeated application of the $R$ operator (under $V^* \neq \emptyset$) converges towards $V^*$.

The algorithm then takes the form of

  • Start with $W^0 \supset V^*$, a vector $u^0 \in \mathcal{R}^2$ such that $u^0 \leq \underline{v}$
  • Inductively compute $W^n = R(W^{n-1}, u^{n-1})$ and denote $u^n = \max(u^{n-1}, P(W^n))$.

In [ ]: