This notebook serves as supporting material for the chapter Adversarial Search. Mathematical game theory views any multiagent environment as a game, provided that the impact of each agent on the others is "significant", regardless of whether the agents are cooperative or competitive. In this chapter we cover competitive environments, in which the agents' goals are in conflict, giving rise to adversarial search problems, often known as games. The discussion begins with a definition of the optimal move and an algorithm for finding it. We then look at techniques for choosing a good move when time is limited.
In [8]:
%classpath add jar ../out/artifacts/aima_core_jar/aima-core.jar
We first consider games with two players, whom we call MAX and MIN. MAX moves first, and then the players take turns moving until the game is over. Now let's formally define a game. According to the textbook, a game can be formally defined as a kind of search problem with the following six elements:

- $S_0$: The initial state, which specifies how the game is set up at the start.
- $PLAYER(s)$: Defines which player has the move in state $s$.
- $ACTIONS(s)$: Returns the set of legal moves in state $s$.
- $RESULT(s, a)$: The transition model, which defines the result of a move.
- $TERMINAL\text{-}TEST(s)$: A terminal test, which is true when the game is over and false otherwise.
- $UTILITY(s, p)$: A utility function, which defines the final numeric value for a game that ends in terminal state $s$ for player $p$.
This six-component structure is implemented as an interface named Game.java in the repository. Let's have a look at the implementation:
public interface Game<S, A, P> {
    S getInitialState();                  // the initial state S0
    P[] getPlayers();                     // all players participating in the game
    P getPlayer(S state);                 // PLAYER(s): which player has the move?
    List<A> getActions(S state);          // ACTIONS(s): legal moves in a state
    S getResult(S state, A action);       // RESULT(s, a): the transition model
    boolean isTerminal(S state);          // TERMINAL-TEST(s): is the game over?
    double getUtility(S state, P player); // UTILITY(s, p): final value for player p
}
The states, actions, and players are represented by the generic type parameters S, A, and P respectively. Clearly, the methods represent the six components of a particular game as follows:
- getInitialState() corresponds to the initial state $S_0$.
- getPlayer(S state) corresponds to $PLAYER(s)$.
- getActions(S state) corresponds to $ACTIONS(s)$.
- getResult(S state, A action) corresponds to $RESULT(s, a)$.
- isTerminal(S state) corresponds to $TERMINAL\text{-}TEST(s)$.
- getUtility(S state, P player) corresponds to $UTILITY(s, p)$.
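To make the mapping concrete, here is a deliberately tiny implementation of the interface. It is illustrative only and not part of aima-core: a one-move game in which MAX chooses between "L" and "R" and the game ends immediately, with made-up utilities.

import java.util.Arrays;
import java.util.List;
import aima.core.search.adversarial.Game;

// Toy game (illustrative only, not part of aima-core). The point is
// only to show how the six components map onto the interface methods.
class TrivialGame implements Game<String, String, String> {
    public String getInitialState() { return "start"; }                              // S0
    public String[] getPlayers() { return new String[] { "MAX", "MIN" }; }
    public String getPlayer(String state) { return "MAX"; }                          // PLAYER(s)
    public List<String> getActions(String state) { return Arrays.asList("L", "R"); } // ACTIONS(s)
    public String getResult(String state, String action) { return action; }          // RESULT(s, a)
    public boolean isTerminal(String state) { return !state.equals("start"); }       // TERMINAL-TEST(s)
    public double getUtility(String state, String player) {                          // UTILITY(s, p)
        double forMax = state.equals("L") ? 3 : 12;
        return player.equals("MAX") ? forMax : -forMax; // zero-sum: MIN gets the negation
    }
}

In this toy game MAX would of course pick "R" for a utility of 12; the interesting cases, like Tic-Tac-Toe below, are those where terminal states lie many moves deep.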
The initial state, ACTIONS function, and RESULT function define the game tree for the game: a tree where the nodes are game states and the edges are moves. Note that, regardless of the size of the game tree, it is MAX's job to search for a good move. We use the term search tree for a tree that is superimposed on the full game tree and examines enough nodes to allow a player to determine what move to make.
In adversarial search, MIN's move impacts MAX's next move. MAX, therefore, must find a contingent strategy, which specifies MAX's move in the initial state, then MAX's moves in the states resulting from every possible response by MIN, then MAX's moves in the states resulting from every possible response by MIN to those moves, and so on. Let's begin with the question: how do we find this optimal strategy?
Given a game tree, the optimal strategy can be determined from the minimax value of each node, which we write as $MINIMAX(n)$. The minimax value of a node is the utility (for MAX) of being in the corresponding state, assuming that both players play optimally from there to the end of the game. Obviously, the minimax value of a terminal state is just its utility. Furthermore, given a choice, MAX prefers to move to a state of maximum value, whereas MIN prefers a state of minimum value. Therefore:
$MINIMAX(s) = \begin{cases} UTILITY(s) & \text{if } TERMINAL\text{-}TEST(s) \\ \max_{a \in ACTIONS(s)} MINIMAX(RESULT(s, a)) & \text{if } PLAYER(s) = MAX \\ \min_{a \in ACTIONS(s)} MINIMAX(RESULT(s, a)) & \text{if } PLAYER(s) = MIN \end{cases}$
This definition of optimal play for MAX assumes that MIN also plays optimally: it maximizes the worst-case outcome for MAX. If MIN does not play optimally, then MAX will do even better.
The minimax algorithm computes the minimax decision from the current state. It uses a simple recursive computation of the minimax values of each successor state. The recursion proceeds all the way down to the leaves of the game tree and then the minimax values are backed up through the tree as the recursion unwinds. Therefore, it performs a complete depth-first exploration of the game tree. If the maximum depth of the tree is $m$ and there are $b$ legal moves at each point, then the time complexity of the minimax algorithm is $O(b^m)$.
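As a concrete illustration, here is a minimal sketch of this recursive computation written against the Game interface above. It is a sketch only, assuming a two-player zero-sum game; the library's MinimaxSearch, used below, is the real implementation.

import aima.core.search.adversarial.Game;

class MinimaxSketch {
    // Minimax value of a state from the given player's viewpoint
    // (sketch, assuming a two-player zero-sum game; not the aima-core code).
    static <S, A, P> double minimaxValue(Game<S, A, P> game, S state, P player) {
        if (game.isTerminal(state))
            return game.getUtility(state, player);        // utility of a leaf
        boolean maximizing = player.equals(game.getPlayer(state));
        double best = maximizing ? Double.NEGATIVE_INFINITY : Double.POSITIVE_INFINITY;
        for (A action : game.getActions(state)) {
            double v = minimaxValue(game, game.getResult(state, action), player);
            best = maximizing ? Math.max(best, v) : Math.min(best, v);
        }
        return best;                                      // backed-up value
    }
}

The minimax decision from a state is then simply the action whose successor has the highest minimax value for the player to move.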
Let's have a look at the pseudocode of the minimax decision.
In [9]:
%%python
from notebookUtils import *
pseudocode('Minimax Decision')
Out[9]:
The above pseudocode is implemented here in the code repository. Now let's try to analyze the minimax decision in a Tic-Tac-Toe game.
In [10]:
import aima.core.environment.tictactoe.TicTacToeGame;
import aima.core.environment.tictactoe.TicTacToeState;
import aima.core.search.adversarial.AdversarialSearch;
import aima.core.search.adversarial.MinimaxSearch;
import aima.core.util.datastructure.XYLocation;
System.out.println("MINI MAX DEMO\n");
TicTacToeGame game = new TicTacToeGame();
TicTacToeState currState = game.getInitialState();
AdversarialSearch<TicTacToeState, XYLocation> search = MinimaxSearch.createFor(game);
while (!game.isTerminal(currState)) {
    System.out.println(game.getPlayer(currState) + " playing ... ");
    XYLocation action = search.makeDecision(currState);
    currState = game.getResult(currState, action);
    System.out.println(currState);
}
Out[10]:
Many games allow more than two players, and minimax can be extended to handle them. Here we need to replace the single value for each node with a vector of values. For example, in a 3-player game with players A, B, and C, a vector $\langle v_A, v_B, v_C \rangle$ is associated with each node. For terminal states, this vector gives the utility of the state from each player's viewpoint. The simplest way to implement this is to have the $UTILITY$ function return a vector of utilities. Hence, the backed-up value of a node $n$ is always the utility vector of the successor state with the highest value for the player playing at $n$.

It is important to notice that multiplayer games usually involve alliances among the players. Alliances are made and broken as the game proceeds. For example, suppose A and B are in weak positions and C is in a stronger position. Then it is often optimal for both A and B to attack C rather than each other, lest C destroy each of them individually. In this way, collaboration emerges from purely selfish behavior. Of course, as soon as C weakens, the alliance loses its value, and either A or B could violate the agreement.
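A minimal sketch of this idea follows, assuming a hypothetical VectorGame interface whose getUtilityVector method returns one utility per player. Neither the interface nor the method is part of aima-core.

import java.util.List;
import java.util.Map;

// Hypothetical interface (not part of aima-core): like Game<S, A, P>, but
// a terminal state's utility is a vector with one entry per player.
interface VectorGame<S, A, P> {
    P getPlayer(S state);
    List<A> getActions(S state);
    S getResult(S state, A action);
    boolean isTerminal(S state);
    Map<P, Double> getUtilityVector(S state);
}

class MultiplayerMinimax {
    // Backed-up value of a node: the utility vector of the successor
    // with the highest value for the player moving at that node.
    static <S, A, P> Map<P, Double> vectorValue(VectorGame<S, A, P> game, S state) {
        if (game.isTerminal(state))
            return game.getUtilityVector(state);
        P mover = game.getPlayer(state);
        Map<P, Double> best = null;
        for (A action : game.getActions(state)) {
            Map<P, Double> v = vectorValue(game, game.getResult(state, action));
            if (best == null || v.get(mover) > best.get(mover))
                best = v;
        }
        return best;
    }
}

Each player simply maximizes its own component of the backed-up vector; the two-player zero-sum case is recovered when the vector has two components that always sum to zero.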
The problem with minimax search is that the number of game states it has to examine is exponential in the depth of the tree. Unfortunately, we can't eliminate the exponent, but we can effectively cut it in half. The trick is that it is possible to compute the correct minimax decision without looking at every node in the game tree. The particular technique, called alpha-beta pruning, when applied to a standard minimax tree, returns the same move as minimax would, but prunes away the branches that cannot influence the final decision. Alpha-beta pruning can be applied to trees of any depth, and it is often possible to prune entire subtrees rather than just leaves. The general principle is this: consider a node $n$ somewhere in the tree, such that the player has a choice of moving to that node. If the player has a better choice $m$ either at the parent node of $n$ or at any choice point further up, then $n$ will never be reached in actual play. So once we have found out enough about $n$ to reach this conclusion, we can prune it.
Alpha and beta are the two parameters that describe bounds on the backed-up values that appear anywhere along the path:

- $\alpha$ = the value of the best (i.e., highest-value) choice we have found so far at any choice point along the path for MAX.
- $\beta$ = the value of the best (i.e., lowest-value) choice we have found so far at any choice point along the path for MIN.
Alpha-beta search updates the values of $\alpha$ and $\beta$ as it goes along, and prunes the remaining branches at a node as soon as the value of the current node is known to be worse than the current $\alpha$ or $\beta$ value for MAX or MIN respectively.
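The following is a minimal sketch of the two mutually recursive value functions with pruning, written against the Game interface above. It is a sketch under the same two-player zero-sum assumption as before; AlphaBetaSearch, used below, is the library's implementation.

import aima.core.search.adversarial.Game;

class AlphaBetaSketch {
    // MAX's best achievable value at this state, pruning when v >= beta
    // (sketch, assuming a two-player zero-sum game; not the aima-core code).
    static <S, A, P> double maxValue(Game<S, A, P> g, S s, P max, double alpha, double beta) {
        if (g.isTerminal(s)) return g.getUtility(s, max);
        double v = Double.NEGATIVE_INFINITY;
        for (A a : g.getActions(s)) {
            v = Math.max(v, minValue(g, g.getResult(s, a), max, alpha, beta));
            if (v >= beta) return v;       // prune: MIN above will avoid this branch
            alpha = Math.max(alpha, v);
        }
        return v;
    }

    // MIN's counterpart, pruning when v <= alpha.
    static <S, A, P> double minValue(Game<S, A, P> g, S s, P max, double alpha, double beta) {
        if (g.isTerminal(s)) return g.getUtility(s, max);
        double v = Double.POSITIVE_INFINITY;
        for (A a : g.getActions(s)) {
            v = Math.min(v, maxValue(g, g.getResult(s, a), max, alpha, beta));
            if (v <= alpha) return v;      // prune: MAX above already has a better option
            beta = Math.min(beta, v);
        }
        return v;
    }
}

The search starts from the root with maxValue(game, state, maxPlayer, Double.NEGATIVE_INFINITY, Double.POSITIVE_INFINITY), i.e., with the widest possible bounds.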
Let's have a look at the pseudocode of alpha-beta search.
In [11]:
%%python
from notebookUtils import *
pseudocode('Alpha Beta Search')
Out[11]:
The above pseudocode is implemented here. Let's analyze the states of a Tic-Tac-Toe game played using alpha-beta search.
In [12]:
import aima.core.environment.tictactoe.TicTacToeGame;
import aima.core.environment.tictactoe.TicTacToeState;
import aima.core.search.adversarial.AdversarialSearch;
import aima.core.search.adversarial.AlphaBetaSearch;
import aima.core.util.datastructure.XYLocation;
System.out.println("ALPHA BETA DEMO\n");
TicTacToeGame game = new TicTacToeGame();
TicTacToeState currState = game.getInitialState();
AdversarialSearch<TicTacToeState, XYLocation> search = AlphaBetaSearch.createFor(game);
while (!game.isTerminal(currState)) {
    System.out.println(game.getPlayer(currState) + " playing ... ");
    XYLocation action = search.makeDecision(currState);
    currState = game.getResult(currState, action);
    System.out.println(currState);
}
Out[12]:
The entire game tree and the optimal decision making in the Tic-Tac-Toe game can be visualized here.