This is the IPython notebook you should use as a template for your agent. Your task for this assignment is to implement a winning AI for the game of Isolation, as specified in the assignment PDF you have been issued.
The following random agent simply selects a move from the set of legal moves. Note that when your agent is asked for a move, it is already provided with the set of moves available to it; this is done for your convenience. If your agent attempts to perform an illegal move, it will lose, so please refrain from doing so. Your agent is also provided with a function that, when invoked, returns the amount of time left for it to make its move. If your agent fails to make a move in the allotted time, it will lose.
In [98]:
from random import randint
class RandomPlayer():
"""Player that chooses a move randomly."""
def move(self, game, legal_moves, time_left):
if not legal_moves: return (-1,-1)
return legal_moves[randint(0,len(legal_moves)-1)]
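The random agent above never consults time_left. As a rough illustration of how an agent might budget its time, consider the sketch below (SkeletonPlayer and its 50 ms safety margin are illustrative choices, not part of the assignment skeleton):

# Minimal time-aware agent skeleton (illustrative sketch only).
# It shows the move(game, legal_moves, time_left) contract: keep a legal
# fallback move ready, and stop working when the clock runs low.
class SkeletonPlayer():
    SAFETY_MARGIN_MS = 50  # arbitrary buffer before the deadline; tune as needed

    def move(self, game, legal_moves, time_left):
        if not legal_moves:
            return (-1, -1)  # no legal moves left: return the forfeit sentinel
        best_move = legal_moves[0]  # always have a legal fallback ready
        for candidate in legal_moves:
            if time_left() < self.SAFETY_MARGIN_MS:
                break  # running low on time: play the best move found so far
            # ... evaluate `candidate` here and update best_move ...
        return best_move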
The following are functions that might be useful to you in developing your agent (a short usage sketch follows this list):
game.get_legal_moves(): Returns a list of legal moves for the active player.
game.get_opponent_moves(): Returns a list of legal moves for the inactive player.
game.forecast_move(move): This returns a new board, whose state is the result of making the move specified on the current board.
game.get_state(): This returns a 2D array containing a copy of the explicit state of the board.
game.is_winner(player): Returns whether your player agent has won.
game.is_opponent_winner(player): Returns whether your player's opponent has won.
game.print_board(): Returns a string representation of the game board. This should be useful for debugging.
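For instance, these helpers compose as in the sketch below, which assumes a Board instance named game already exists; the snippet is illustrative only:

# Illustrative sketch of the board helpers above (assumes an existing Board
# instance named `game`; everything except the documented methods is made up).
my_moves = game.get_legal_moves()        # moves open to the active player
opp_moves = game.get_opponent_moves()    # moves open to the inactive player
if my_moves:
    next_board = game.forecast_move(my_moves[0])  # successor state after one move
    print(next_board.print_board())               # string form, handy for debugging
state = game.get_state()                 # a 2D-array copy of the board's state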
In [99]:
class HumanPlayer():
"""Player that chooses a move according to
user's input."""
def move(self, game, legal_moves, time_left):
print('\t'.join(['[%d] %s'%(i,str(move)) for i,move in enumerate(legal_moves)] ))
valid_choice = False
while not valid_choice:
try:
index = int(raw_input('Select move index:'))
valid_choice = 0 <= index < len(legal_moves)
if not valid_choice:
print('Illegal move! Try again.')
except ValueError:
print('Invalid index! Try again.')
return legal_moves[index]
This is the first part of the assignment you are expected to implement. It is the evaluation function we've been using in class. The score of a specified game state is just the number of moves open to the active player.
In [100]:
class OpenMoveEvalFn():
    """Evaluation function: the number of moves open to the player.

    maximizing_player defaults to True so the sanity test at the bottom of
    this notebook can call score() with just a board."""
    def score(self, game, maximizing_player=True):
        # From the maximizing player's perspective our player is active, so
        # count its legal moves; otherwise count the inactive player's moves.
        if maximizing_player:
            return len(game.get_legal_moves())
        return len(game.get_opponent_moves())
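Before wiring an evaluation function into a full search, you can sanity-check it with a greedy one-ply player. The sketch below is illustrative only (OneStepPlayer is not part of the assignment skeleton); it scores each forecast board from the minimizing perspective, since the opponent becomes the active player after our move:

# Greedy one-ply player: picks the move whose successor board scores best.
# Illustrative sketch; OneStepPlayer is not part of the assignment skeleton.
class OneStepPlayer():
    def __init__(self, eval_fn=OpenMoveEvalFn()):
        self.eval_fn = eval_fn

    def move(self, game, legal_moves, time_left):
        if not legal_moves:
            return (-1, -1)
        # After we move, the opponent is active on the forecast board, so
        # score it with maximizing_player=False to count our own mobility.
        return max(legal_moves,
                   key=lambda m: self.eval_fn.score(game.forecast_move(m), False))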
The following is a custom evaluation function that acts however you think it should. This is not required, but it is highly encouraged if you want to build the best AI possible.
In [101]:
class CustomEvalFn():
    def score(self, game, maximizing_player=True):
        # Maximize your own mobility while minimizing the opponent's.
        if maximizing_player:
            my_moves = len(game.get_legal_moves())
            opponents_moves = len(game.get_opponent_moves())
        else:
            opponents_moves = len(game.get_legal_moves())
            my_moves = len(game.get_opponent_moves())
        # The difference is largest when we are mobile and the opponent is not.
        return my_moves - opponents_moves
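The difference my_moves - opponents_moves rewards states that keep your mobility high while starving the opponent. A common refinement is to weight the opponent's mobility more heavily so the agent plays aggressively; the weight of 2 in the sketch below is an arbitrary illustration, not something prescribed by the assignment:

# Variant of the custom eval: penalize opponent mobility more heavily.
# The weight of 2 is an arbitrary illustration, not an assignment requirement.
class AggressiveEvalFn():
    def score(self, game, maximizing_player=True):
        if maximizing_player:
            my_moves = len(game.get_legal_moves())
            opp_moves = len(game.get_opponent_moves())
        else:
            opp_moves = len(game.get_legal_moves())
            my_moves = len(game.get_opponent_moves())
        return my_moves - 2 * opp_moves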
Implement a Player below that chooses a move using your evaluation function and a depth-limited minimax algorithm with alpha-beta pruning. You must finish and test this player to make sure it properly uses minimax and alpha-beta to return a good move in less than 500 milliseconds.
In [102]:
class CustomPlayer():
    def __init__(self, search_depth=15, eval_fn=CustomEvalFn(), threshold=50):
        self.eval_fn = eval_fn
        self.search_depth = search_depth
        self.thresh = threshold  # time (ms) below which we stop searching

    def move(self, game, legal_moves, time_left):
        if not legal_moves:
            return (-1, -1)
        # Opening moves carry little information, so just play randomly.
        if game.move_count < 2:
            return legal_moves[randint(0, len(legal_moves) - 1)]
        # Otherwise run iterative-deepening alpha-beta search.
        best_move = self.alphabeta_id(game, time_left, self.search_depth)
        return best_move
    def utility(self, game, maximizing_player):
        # Large terminal values ensure a won/lost game dominates any heuristic.
        if game.is_winner(self):
            return 50000
        if game.is_opponent_winner(self):
            return -50000
        return self.eval_fn.score(game, maximizing_player)
    def minimax(self, game, time_left, depth=float("inf"), maximizing_player=True):
        # Terminal states: the game is already decided at this node
        # (at a max node the opponent has won; at a min node we have won).
        if (maximizing_player and game.is_opponent_winner(self)) or \
           (not maximizing_player and game.is_winner(self)):
            return ((-1, -1), self.utility(game, maximizing_player))
        # Cut off when the depth limit is reached or time is nearly up
        # (self.thresh is an arbitrary safety margin in milliseconds).
        if depth == 0 or time_left() < self.thresh:
            return ((-1, -1), self.utility(game, maximizing_player))
        actions = game.get_legal_moves()
        best_move = (-1, -1)
        if maximizing_player:
            # Choose the move whose minimax value is largest.
            best_val = float("-inf")
            for a in actions:
                _, score_of_action = self.minimax(game.forecast_move(a), time_left, depth - 1, False)
                if score_of_action >= best_val:
                    best_move, best_val = a, score_of_action
                if time_left() < self.thresh:
                    return (best_move, best_val)
        else:
            # Choose the move whose minimax value is smallest.
            best_val = float("inf")
            for a in actions:
                _, score_of_action = self.minimax(game.forecast_move(a), time_left, depth - 1, True)
                if score_of_action <= best_val:
                    best_move, best_val = a, score_of_action
                if time_left() < self.thresh:
                    return (best_move, best_val)
        return (best_move, best_val)
    def alphabeta(self, game, time_left, depth=float("inf"), alpha=float("-inf"), beta=float("inf"), maximizing_player=True):
        # Terminal states: the game is already decided at this node
        # (at a max node the opponent has won; at a min node we have won).
        if (maximizing_player and game.is_opponent_winner(self)) or \
           (not maximizing_player and game.is_winner(self)):
            return ((-1, -1), self.utility(game, maximizing_player))
        # Cut off when the depth limit is reached or time is nearly up
        # (self.thresh is the same millisecond safety margin used above).
        if depth == 0 or time_left() < self.thresh:
            return ((-1, -1), self.utility(game, maximizing_player))
        actions = game.get_legal_moves()
        best_move = (-1, -1)
        if maximizing_player:
            # Maximize over child values; prune once alpha meets or exceeds beta.
            best_val = float("-inf")
            for a in actions:
                _, score_of_action = self.alphabeta(game.forecast_move(a), time_left, depth - 1, alpha, beta, False)
                if score_of_action > best_val:
                    best_move, best_val = a, score_of_action
                alpha = max(alpha, best_val)
                if alpha >= beta:
                    return (best_move, best_val)  # beta cutoff
                if time_left() < self.thresh:
                    return (best_move, best_val)
        else:
            # Minimize over child values; prune once beta drops to or below alpha.
            best_val = float("inf")
            for a in actions:
                _, score_of_action = self.alphabeta(game.forecast_move(a), time_left, depth - 1, alpha, beta, True)
                if score_of_action < best_val:
                    best_move, best_val = a, score_of_action
                beta = min(beta, best_val)
                if beta <= alpha:
                    return (best_move, best_val)  # alpha cutoff
                if time_left() < self.thresh:
                    return (best_move, best_val)
        return (best_move, best_val)
    def alphabeta_id(self, game, time_left, max_depth):
        # Iterative deepening: search progressively deeper until the depth
        # limit is reached or time runs low, keeping the best move found.
        best_move = (-1, -1)
        for depth in range(1, max_depth + 1):
            best_move, _ = self.alphabeta(game, time_left, depth)
            if time_left() < self.thresh:
                break
        return best_move
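Before running full games, you may want to verify that a single search call respects the 500 ms budget. The harness below is a sketch only: it fakes the time_left() callback that play_isolation normally supplies, and assumes a fresh Board from the tests below is searchable as-is.

# Rough timing harness (illustrative sketch; assumes a fresh Board behaves
# like a mid-game one for search purposes).
import time
from isolation import Board

player, opponent = CustomPlayer(), RandomPlayer()
board = Board(player, opponent)
start = time.time()
time_left = lambda: 500 - (time.time() - start) * 1000  # milliseconds remaining
chosen, value = player.alphabeta(board, time_left, depth=5)
print('alphabeta chose %s (value %s) with %.0f ms to spare.'
      % (str(chosen), str(value), time_left()))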
The following are some basic tests you can use to sanity-check your code. You will also be provided with a test server to which you will be able to submit your agents later this week. Good luck!
In [103]:
"""Example test you can run
to make sure your AI does better
than random."""
from isolation import Board
if __name__ == "__main__":
r = RandomPlayer()
h = CustomPlayer()
game = Board(h,r)
game.play_isolation(500)
In [104]:
"""Example test you can run
to make sure your basic evaluation
function works."""
from isolation import Board
if __name__ == "__main__":
sample_board = Board(RandomPlayer(),RandomPlayer())
# setting up the board as though we've been playing
sample_board.move_count = 3
sample_board.__active_player__ = 0 # player 1 = 0, player 2 = 1
# 1st board = 16 moves
sample_board.__board_state__ = [
[0,2,0,0,0],
[0,0,0,0,0],
[0,0,1,0,0],
[0,0,0,0,0],
[0,0,0,0,0]]
sample_board.__last_player_move__ = [(2,2),(0,1)]
# player 1 should have 16 moves available,
# so board gets a score of 16
h = OpenMoveEvalFn()
print('This board has a score of %s.'%(h.score(sample_board)))