An algorithm for two-player repeated games with perfect monitoring

Authors: Dilip Abreu and Yuliy Sannikov

This notebook is meant to be for a "quick" overview of what is presented in the paper. If you find this helpful, then we highly recommend reading the original paper as it is more thorough in its presentation of the material.

Abstract

Consider repeated two-player games with perfect monitoring and discounting. We provide an algorithm that computes the set $V^*$ of payoff pairs of all pure-strategy subgame perfect equilibria with public randomization. The algorithm provides significant efficiency gains over existing implementations of the algorithm from Abreu, Pearce, Stacchetti (1990). These efficiency gains arise from a better understanding of the manner in which extreme points of the equilibrium payoff set are generated. An important theoretical implication of our algorithm is that the set of extreme points $E$ of $V^*$ is finite. Indeed, $|E| \leq 3 |A|$, where $A$ is the set of action profiles of the stage game.

Set up

This paper considers a two-player repeated game with finite simultaneous actions and discounting. The stage game will be denoted by $G = \{N, \{ A_i \}_{i \in N}, \{g_i\}_{i \in N} \}$ in which players $N=\{1, 2\}$ share a discount factor $\delta$.

We will begin with some definitions that will make the presentation more clean.

  • Define the best possible payoff from in response to the other player choosing $a_j$. $$\bar{g_i}(a_j) := \max_{a_i} g_i(a_i, a_j)$$
  • Define the gain from deviating from a specified set of actions by $$h_i(a) = \bar{g_i}(a_j) - g_i(a)$$
  • Define the worst punishment an agent can be given for a specific set of continuation values $X$ by $$P_i(X) = \min \{ x_i | (x_1, x_2) \in X \text{ for some } x_j \}$$
  • Define the set of values associated with all subgame-perfect equilibria as $V^*$.|

We will say that a point $v \in \mathcal{R}$ is generated by the set $X$ if there is an action pair $a \in A$ in the current period and a pair of continuation values $w \in X$ such that:

  • It can be obtained by $(a, w)$ --> $v = (1-\delta)g(a) + \delta w$
  • It is incentive compatible --> $\delta(w - P(X)) \geq (1-\delta)h(a)$

Then in conjunction with previous APS papers, we will say that a set $X$ is self-generating if each extreme point of $X$ can be generated by $X$.

Algorithm proposed by APS

In their original papers, Abreu, Pearce, and Stacchetti proposed a repeated application of a "$B$" operator to a set of continuation values and showed that the operator satisfied certain conditions that imply it will converge to the set of values which correspond to the subgame-perfect equilibria. The APS papers prove (among other things) that $B^n(W) \rightarrow V^*$. The $B$ operator is defined by

$$B(W) = \text{co}\{v | \exists w \in W, a \in A \text{ s.t } v=(1-\delta)g(a) + \delta w \text{ and } \delta(w - P(W)) \geq (1-\delta)h(a) \}$$

Given a current set of continuation values, $W$, this operation could be split into several simple steps

  1. Given an action profile, $a$, and a threat point, $u \in \mathcal{R}^2$, we can define a set $Q(a, W, u) := \{w \in \mathcal{R}^2 | \delta (w - u) \geq (1-\delta) h(a) \} \cap W$.
  2. Define $B_a(W) = (1-\delta)g(a) + \delta Q(a, W, P(W))$
  3. $B(W) = \cup_{a \in A} B_a(W)$

One potential issue of this approach in a numerical setting is that the number of vertices in the set $B_a(W)$ isn't necessarily bounded -- Luckily, Abreu Sannikov will show (using the following theorem) that we can bound this and give a specific bound.

Theorem 1: Any action profile $a$ such that $g(a) \geq \underline{v} + (1-\delta)/\delta h(a)$ generates at most one extreme point of $V^*$, $v = g(a)$. Any action profile for which

  • (1) $g_1(a) < \underline{v}_1 + \frac{1-\delta}{\delta} h_1(a)$ or
  • (2) $g_2(a) < \underline{v}_2 + \frac{1-\delta}{\delta} h_2(a)$

generates at most four extreme points of $V^*$, using continuation payoff vectors $w$ that are extreme points of $Q(a, V^*, \underline{v})$ such that

  • $\delta (w_1 - \underline{v}_1) = (1-\delta) h_1(a)$ or
  • $\delta (w_2 - \underline{v}_2) = (1-\delta) h_2(a)$

This bound on the number of extreme points is a particularly important theoretical result and relies on the fact that (insert more here).

Algorithm proposed by Abreu Sannikov

This paper introduces a new way to compute the subgame-perfect value set. Instead of the $B$ operator, it uses a slightly stronger operator which the authors refer to as the $R$ operator.


In [ ]: