Adversarial Search

This notebook serves as supporting material for the chapter Adversarial Search. Mathematical game theory views any multiagent environment as a game, provided that the impact of each agent on the others is "significant", regardless of whether the agents are cooperative or competitive. In this chapter we cover competitive environments, in which the agents' goals are in conflict, giving rise to adversarial search problems, often known as games. The discussion begins with a definition of the optimal move and an algorithm for finding it. We then look at techniques for choosing a good move when time is limited.


In [8]:
%classpath add jar ../out/artifacts/aima_core_jar/aima-core.jar

Games

We first consider games with two players, whom we call MAX and MIN. MAX moves first, and then the players take turns moving until the game is over. Now let's formally define a game. According to the textbook, a game can be formally defined as a kind of search problem with the following elements:

  • $S_0$: The initial state, which specifies how the game is set up at the start.
  • $PLAYER(s)$: Defines which player has the move in a state.
  • $ACTIONS(s)$: Returns the set of legal moves in a state.
  • $RESULT(s,a)$: The transition model, which defines the result of a move.
  • $TERMINAL$-$TEST(s)$: A terminal test, which is true when the game is over and false otherwise. States where the game has ended are called terminal states.
  • $UTILITY(s,p)$: A utility function defines the final numeric value for a game that ends in terminal state $s$ for a player $p$. For example, in chess, the outcome is a win, loss, or draw, with values +1, 0, or 1/2 respectively.

This six-component structure is implemented as an interface named Game.java in the repository. Let's have a look at the implementation:

public interface Game<S, A, P> {

    S getInitialState();

    P[] getPlayers();

    P getPlayer(S state);

    List<A> getActions(S state);

    S getResult(S state, A action);

    boolean isTerminal(S state);

    double getUtility(S state, P player);
}

The states, actions, and players are represented by the generic type parameters S, A, and P respectively. The methods map onto the six components of a particular game as follows:

  • initial state ← getInitialState()
  • player having the move at current state ← getPlayer(S state)
  • applicable actions ← getActions(S state)
  • the transition model ← getResult(S state, A action)
  • the terminal test ← isTerminal(S state)
  • utility function ← getUtility(S state, P player)
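To make this mapping concrete, here is a minimal sketch of the interface implemented for a toy game: one-pile Nim, where a move removes 1 to 3 stones and whoever takes the last stone wins. The NimGame and NimState classes are illustrative inventions, not part of aima-core, and the Game interface is reproduced so the snippet stands alone:

```java
import java.util.ArrayList;
import java.util.List;

// The Game interface from the repository, reproduced so this sketch is self-contained.
interface Game<S, A, P> {
    S getInitialState();
    P[] getPlayers();
    P getPlayer(S state);
    List<A> getActions(S state);
    S getResult(S state, A action);
    boolean isTerminal(S state);
    double getUtility(S state, P player);
}

// A state of one-pile Nim: how many stones remain and whose turn it is.
class NimState {
    final int stones;
    final String playerToMove;
    NimState(int stones, String playerToMove) {
        this.stones = stones;
        this.playerToMove = playerToMove;
    }
}

// Illustrative example: a move takes 1-3 stones; taking the last stone wins.
class NimGame implements Game<NimState, Integer, String> {
    public NimState getInitialState() { return new NimState(7, "MAX"); }
    public String[] getPlayers() { return new String[] { "MAX", "MIN" }; }
    public String getPlayer(NimState s) { return s.playerToMove; }
    public List<Integer> getActions(NimState s) {
        List<Integer> actions = new ArrayList<>();
        for (int take = 1; take <= Math.min(3, s.stones); take++) actions.add(take);
        return actions;
    }
    public NimState getResult(NimState s, Integer take) {
        String next = s.playerToMove.equals("MAX") ? "MIN" : "MAX";
        return new NimState(s.stones - take, next);
    }
    public boolean isTerminal(NimState s) { return s.stones == 0; }
    public double getUtility(NimState s, String player) {
        // The player who moved last took the final stone and wins.
        String lastMover = s.playerToMove.equals("MAX") ? "MIN" : "MAX";
        return lastMover.equals(player) ? 1 : 0;
    }
}
```

Any game expressible through these seven methods can be handled by the generic search algorithms discussed below.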

The initial state, ACTIONS function, and RESULT function define the game tree for the game: a tree where the nodes are game states and the edges are moves. Note that, regardless of the size of the game tree, it is MAX's job to search for a good move. We use the term search tree for a tree that is superimposed on the full game tree and examines enough nodes to allow a player to determine what move to make.

Optimal Decisions in Games

In adversarial search, MIN's moves influence MAX's next move. MAX, therefore, must find a contingent strategy, which specifies MAX's move in the initial state, then MAX's moves in the states resulting from every possible response by MIN, then MAX's moves in the states resulting from every possible response by MIN to those moves, and so on. Let's begin with the question: how do we find this optimal strategy?

Given a game tree, the optimal strategy can be determined from the minimax value of each node, which we write as $MINIMAX(n)$. The minimax value of a node is the utility (for MAX) of being in the corresponding state, assuming that both players play optimally from there to the end of the game. Obviously, the minimax value of a terminal state is just its utility. Furthermore, given a choice, MAX prefers to move to a state of maximum value, whereas MIN prefers a state of minimum value. Therefore:

$MINIMAX(s) = \begin{cases} UTILITY(s) & \text{if } TERMINAL\text{-}TEST(s) \\ \max_{a\in ACTIONS(s)} MINIMAX(RESULT(s,a)) & \text{if } PLAYER(s) = MAX \\ \min_{a\in ACTIONS(s)} MINIMAX(RESULT(s,a)) & \text{if } PLAYER(s) = MIN \end{cases}$

This definition of optimal play for MAX assumes that MIN also plays optimally; it maximizes the worst-case outcome for MAX. If MIN does not play optimally, then MAX will do even better.
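As a concrete instance of the recurrence, take a two-ply tree in which MAX's three moves lead to MIN nodes $b$, $c$, and $d$, whose leaves have the illustrative utilities $\{3, 12, 8\}$, $\{2, 4, 6\}$, and $\{14, 5, 2\}$:

```latex
\begin{aligned}
MINIMAX(b) &= \min(3, 12, 8) = 3\\
MINIMAX(c) &= \min(2, 4, 6) = 2\\
MINIMAX(d) &= \min(14, 5, 2) = 2\\
MINIMAX(root) &= \max(3, 2, 2) = 3
\end{aligned}
```

So MAX's optimal first move is the one leading to $b$, with a guaranteed utility of 3.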

The Minimax Algorithm

The minimax algorithm computes the minimax decision from the current state. It uses a simple recursive computation of the minimax values of each successor state. The recursion proceeds all the way down to the leaves of the game tree and then the minimax values are backed up through the tree as the recursion unwinds. Therefore, it performs a complete depth-first exploration of the game tree. If the maximum depth of the tree is $m$ and there are $b$ legal moves at each point, then the time complexity of the minimax algorithm is $O(b^m)$.

Let's have a look at the pseudocode of the minimax decision.


In [9]:
%%python
from notebookUtils import *
pseudocode('Minimax Decision')


Out[9]:

AIMA3e

function MINIMAX-DECISION(state) returns an action
return arg max_{a ∈ ACTIONS(state)} MIN-VALUE(RESULT(state, a))


function MAX-VALUE(state) returns a utility value
if TERMINAL-TEST(state) then return UTILITY(state)
v ← −∞
for each a in ACTIONS(state) do
   v ← MAX(v, MIN-VALUE(RESULT(state, a)))
return v


function MIN-VALUE(state) returns a utility value
if TERMINAL-TEST(state) then return UTILITY(state)
v ← ∞
for each a in ACTIONS(state) do
   v ← MIN(v, MAX-VALUE(RESULT(state, a)))
return v


Figure ?? An algorithm for calculating minimax decisions. It returns the action corresponding to the best possible move, that is, the move that leads to the outcome with the best utility, under the assumption that the opponent plays to minimize utility. The functions MAX-VALUE and MIN-VALUE go through the whole game tree, all the way to the leaves, to determine the backed-up value of a state. The notation argmax_{a ∈ S} f(a) computes the element a of set S that has the maximum value of f(a).


function EXPECTIMINIMAX(s) =
 UTILITY(s)   if TERMINAL-TEST(s)
 max_a EXPECTIMINIMAX(RESULT(s, a))   if PLAYER(s) = MAX
 min_a EXPECTIMINIMAX(RESULT(s, a))   if PLAYER(s) = MIN
 ∑_r P(r) EXPECTIMINIMAX(RESULT(s, r))   if PLAYER(s) = CHANCE
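The MINIMAX-DECISION pseudocode translates almost line for line into plain recursive Java. The following is a minimal self-contained sketch over a toy tree type (the Node class is an illustrative stand-in for real game states, independent of the aima-core classes used below):

```java
import java.util.List;

// A minimal game-tree node for illustration: a leaf carries a utility,
// an internal node carries its children.
class Node {
    final double utility;      // used only at leaves
    final List<Node> children; // empty list marks a terminal node
    Node(double utility) { this.utility = utility; this.children = List.of(); }
    Node(Node... children) { this.utility = 0; this.children = List.of(children); }
}

class SimpleMinimax {
    // MINIMAX-DECISION: pick the child (action index) with the largest backed-up value.
    static int minimaxDecision(Node root) {
        int bestAction = -1;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (int a = 0; a < root.children.size(); a++) {
            double v = value(root.children.get(a), false); // MIN moves next
            if (v > bestValue) { bestValue = v; bestAction = a; }
        }
        return bestAction;
    }

    // MAX-VALUE and MIN-VALUE folded into one method; maxToMove toggles each ply.
    static double value(Node n, boolean maxToMove) {
        if (n.children.isEmpty()) return n.utility; // TERMINAL-TEST / UTILITY
        double v = maxToMove ? Double.NEGATIVE_INFINITY : Double.POSITIVE_INFINITY;
        for (Node child : n.children) {
            double w = value(child, !maxToMove);
            v = maxToMove ? Math.max(v, w) : Math.min(v, w);
        }
        return v;
    }
}
```

For example, on a two-ply tree whose MIN nodes have leaves {3, 12, 8}, {2, 4, 6}, and {14, 5, 2}, minimaxDecision returns the first action, whose backed-up value is 3.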

The above pseudocode is implemented in MinimaxSearch.java in the code repository. Now let's analyze the minimax decision in a Tic-Tac-Toe game.


In [10]:
import aima.core.environment.tictactoe.TicTacToeGame;
import aima.core.environment.tictactoe.TicTacToeState;
import aima.core.search.adversarial.AdversarialSearch;
import aima.core.search.adversarial.MinimaxSearch;
import aima.core.util.datastructure.XYLocation;

System.out.println("MINI MAX DEMO\n");
TicTacToeGame game = new TicTacToeGame();
TicTacToeState currState = game.getInitialState();
AdversarialSearch<TicTacToeState, XYLocation> search = MinimaxSearch
        .createFor(game);
while (!(game.isTerminal(currState))) {
    System.out.println(game.getPlayer(currState) + "  playing ... ");
    XYLocation action = search.makeDecision(currState);
    currState = game.getResult(currState, action);
    System.out.println(currState);
}


MINI MAX DEMO

X  playing ... 
X - - 
- - - 
- - - 

O  playing ... 
X - - 
- O - 
- - - 

X  playing ... 
X - - 
X O - 
- - - 

O  playing ... 
X - - 
X O - 
O - - 

X  playing ... 
X - X 
X O - 
O - - 

O  playing ... 
X O X 
X O - 
O - - 

X  playing ... 
X O X 
X O - 
O X - 

O  playing ... 
X O X 
X O O 
O X - 

X  playing ... 
X O X 
X O O 
O X X 

Out[10]:
null

Optimal decisions in multiplayer games

Here we need to replace the single value for each node with a vector of values. For example, in a 3-player game with players A, B, and C, a vector $<v_a, v_b, v_c>$ is associated with each node. For terminal states, this vector gives the utility of the state from each player's viewpoint. The simplest way to implement this is to have the $UTILITY$ function return a vector of utilities. The backed-up value of a node $n$ is then the utility vector of the successor state with the highest value for the player playing at $n$. It is important to note that multiplayer games usually involve alliances among the players, which are made and broken as the game proceeds. For example, suppose A and B are in weak positions and C is in a stronger position. Then it is often optimal for both A and B to attack C rather than each other, lest C destroy each of them individually. In this way, collaboration emerges from purely selfish behavior. Of course, as soon as C weakens, the alliance loses its value, and either A or B could violate the agreement.
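The vector backup can be sketched as follows, assuming utilities are stored in a double[] indexed by player (a hypothetical helper for illustration, not an aima-core API):

```java
// Vector-valued backup for an n-player game: among the successors' utility
// vectors, the player to move keeps the vector whose own component is largest.
class MultiplayerBackup {
    static double[] backUp(double[][] successorVectors, int playerIndex) {
        double[] best = successorVectors[0];
        for (double[] v : successorVectors) {
            if (v[playerIndex] > best[playerIndex]) best = v;
        }
        return best;
    }
}
```

If it is A's turn (index 0) and the successors back up ⟨1, 2, 6⟩, ⟨4, 2, 3⟩, and ⟨6, 1, 2⟩, the node's value becomes ⟨6, 1, 2⟩: A cares only about the first component.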

Alpha-Beta Pruning

The problem with minimax search is that the number of game states it has to examine is exponential in the depth of the tree. Unfortunately, we can't eliminate the exponent, but we can effectively cut it in half. The trick is that it is possible to compute the correct minimax decision without looking at every node in the game tree. The particular technique, called alpha-beta pruning, when applied to a standard minimax tree, returns the same move as minimax would, but prunes away the branches that cannot influence the final decision. Alpha-beta pruning can be applied to trees of any depth, and it is often possible to prune entire subtrees rather than just leaves. The general principle is: consider a node $n$ somewhere in the tree such that the player has a choice of moving to that node. If the player has a better choice $m$ either at the parent node of $n$ or at any choice point further up, then $n$ will never be reached in actual play. So once we have found out enough about $n$ to reach this conclusion, we can prune it.

Alpha and beta are the two parameters that describe bounds on the backed-up values that appear anywhere along the path:

  • $\alpha$ = the highest-value choice we have found so far at any choice point along the path, i.e., the value of the best choice found so far for MAX
  • $\beta$ = the lowest-value choice we have found so far at any choice point along the path, i.e., the value of the best choice found so far for MIN

Alpha-beta search updates the values of $\alpha$ and $\beta$ as it goes along and prunes the remaining branches at a node as soon as the value of the current node is known to be worse than the current $\alpha$ or $\beta$ value for MAX or MIN respectively.

Let's have a look at the pseudocode of alpha-beta search.


In [11]:
%%python
from notebookUtils import *
pseudocode('Alpha Beta Search')


Out[11]:

AIMA3e

function ALPHA-BETA-SEARCH(state) returns an action
v ← MAX-VALUE(state, −∞, +∞)
return the action in ACTIONS(state) with value v


function MAX-VALUE(state, α, β) returns a utility value
if TERMINAL-TEST(state) then return UTILITY(state)
v ← −∞
for each a in ACTIONS(state) do
   v ← MAX(v, MIN-VALUE(RESULT(state, a), α, β))
   if v ≥ β then return v
   α ← MAX(α, v)
return v


function MIN-VALUE(state, α, β) returns a utility value
if TERMINAL-TEST(state) then return UTILITY(state)
v ← +∞
for each a in ACTIONS(state) do
   v ← MIN(v, MAX-VALUE(RESULT(state, a), α, β))
   if v ≤ α then return v
   β ← MIN(β, v)
return v


Figure ?? The alpha-beta search algorithm. Notice that these routines are the same as the MINIMAX functions in Figure ??, except for the two lines in each of MIN-VALUE and MAX-VALUE that maintain α and β (and the bookkeeping to pass these parameters along).
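The alpha-beta pseudocode above can be sketched in compact, self-contained Java (the GNode tree type is an illustrative stand-in; the demo below uses aima-core's AlphaBetaSearch):

```java
// Minimal tree node for illustration: a leaf holds a utility,
// an internal node holds its children.
class GNode {
    final double utility;
    final GNode[] children;
    GNode(double utility) { this.utility = utility; this.children = new GNode[0]; }
    GNode(GNode... children) { this.utility = 0; this.children = children; }
}

class AlphaBeta {
    // ALPHA-BETA-SEARCH: start MAX at the root with the widest possible window.
    static double search(GNode root) {
        return maxValue(root, Double.NEGATIVE_INFINITY, Double.POSITIVE_INFINITY);
    }
    static double maxValue(GNode n, double alpha, double beta) {
        if (n.children.length == 0) return n.utility;
        double v = Double.NEGATIVE_INFINITY;
        for (GNode child : n.children) {
            v = Math.max(v, minValue(child, alpha, beta));
            if (v >= beta) return v;   // β-cutoff: MIN above will avoid this node
            alpha = Math.max(alpha, v);
        }
        return v;
    }
    static double minValue(GNode n, double alpha, double beta) {
        if (n.children.length == 0) return n.utility;
        double v = Double.POSITIVE_INFINITY;
        for (GNode child : n.children) {
            v = Math.min(v, maxValue(child, alpha, beta));
            if (v <= alpha) return v;  // α-cutoff: MAX above already has better
            beta = Math.min(beta, v);
        }
        return v;
    }
}
```

On a two-ply tree whose MIN nodes have leaves {3, 12, 8}, {2, 4, 6}, and {14, 5, 2}, the first MIN node backs up 3, so at the second MIN node the leaf 2 already triggers v ≤ α and the remaining leaves 4 and 6 are never examined; the root value is still 3, exactly as plain minimax computes.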

The above pseudocode is implemented in AlphaBetaSearch.java. Let's analyze the states of a Tic-Tac-Toe game played using alpha-beta search.


In [12]:
import aima.core.environment.tictactoe.TicTacToeGame;
import aima.core.environment.tictactoe.TicTacToeState;
import aima.core.search.adversarial.AdversarialSearch;
import aima.core.search.adversarial.AlphaBetaSearch;
import aima.core.util.datastructure.XYLocation;

System.out.println("ALPHA BETA DEMO\n");
TicTacToeGame game = new TicTacToeGame();
TicTacToeState currState = game.getInitialState();
AdversarialSearch<TicTacToeState, XYLocation> search = AlphaBetaSearch
        .createFor(game);
while (!(game.isTerminal(currState))) {
    System.out.println(game.getPlayer(currState) + "  playing ... ");
    XYLocation action = search.makeDecision(currState);
    currState = game.getResult(currState, action);
    System.out.println(currState);
}


ALPHA BETA DEMO

X  playing ... 
X - - 
- - - 
- - - 

O  playing ... 
X - - 
- O - 
- - - 

X  playing ... 
X - - 
X O - 
- - - 

O  playing ... 
X - - 
X O - 
O - - 

X  playing ... 
X - X 
X O - 
O - - 

O  playing ... 
X O X 
X O - 
O - - 

X  playing ... 
X O X 
X O - 
O X - 

O  playing ... 
X O X 
X O O 
O X - 

X  playing ... 
X O X 
X O O 
O X X 

Out[12]:
null

The entire game tree and the optimal decision making in the Tic-Tac-Toe game can be visualized here.


In [ ]: