We will describe in this notebook a very simple two-player card game and a way to implement an AI that plays it using an artificial neural network (a multilayer perceptron model).
The AI will be trained only by playing, which means that no "expert" human player implements strategies for it.
The only information about the game given to the AI is a function that tells it which cards it is allowed to play.
The game is a very basic two-player trick-taking card game: the deck contains 2 colors of 10 cards each and every player receives 10 cards; the trick leader plays any card, the other player must follow the led color if possible, and the highest card of the led color wins the trick and leads the next one (a minimal sketch of this rule follows below).
In order to compare strategies, each deal is played twice: the second time the hands are exchanged, so that each strategy plays each hand and the same hand leads both games.
Then we compare both games: the strategy that takes more tricks with that hand wins the match, and equal counts are a draw.
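To make the trick rule concrete, here is a minimal self-contained sketch (the trick_winner name is only for this illustration; the notebook's own implementation is getWinner below):
def trick_winner(cards):
    # cards are (value, color) pairs in play order; only cards of the led color can win
    led_color = cards[0][1]
    best = 0
    for i, (value, color) in enumerate(cards):
        if color == led_color and value > cards[best][0]:
            best = i
    return best
print(trick_winner([(5, 0), (9, 0)]))  # 1: same color, higher value wins
print(trick_winner([(5, 0), (9, 1)]))  # 0: an off-color card cannot win the trick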
We will implement a simple MLP model using the Keras library.
In [10]:
# Imports
import numpy as np
import keras
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import SGD, RMSprop
In [15]:
# General functions to generate hands
def generateDeck(N=10, P=2):
deck = np.zeros((P*N, 2))
for i in range(0,P):
deck[i * N:i * N + N,0] = np.arange(N)
deck[i * N:i * N + N,1] = i
return deck
def generateHand(N=10, P=2, seed=0):
    """ Generate an N-card hand for each of the P players.
    The function returns the deck indices"""
np.random.seed(seed)
deck = np.arange(N*P)
np.random.shuffle(deck)
return np.reshape(deck, (P,-1))
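As a quick sanity check of these helpers (output shown as comments; the hand contents depend on the seeded shuffle):
deck = generateDeck(N=3, P=2)
print(deck)         # rows are (value, color): [0,0], [1,0], [2,0], [0,1], [1,1], [2,1]
hands = generateHand(N=3, P=2, seed=0)
print(hands.shape)  # (2, 3): one row of deck indices per player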
In [6]:
def playable(hand, deck, trick=[]):
"""Returns indices of playable cards """
allrange = np.arange(len(hand))
if len(trick) == 0:
return allrange # First to play
color = deck[trick[0]][1]
colors = allrange[deck[hand][:,1] == color]
if len(colors) > 0: # Forced to play same color
return colors
else: # Does not have the color
return allrange
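A small usage example of playable(), assuming the standard N=10, P=2 deck (deck indices 0-9 are color 0, indices 10-19 are color 1); the hands below are arbitrary illustrations:
deck = generateDeck(N=10, P=2)
hand = np.array([2, 5, 13])                             # cards (2,0), (5,0) and (3,1)
print(playable(hand, deck))                             # [0 1 2]: leading the trick, any card is allowed
print(playable(hand, deck, trick=[11]))                 # [2]: card (1,1) was led, we must follow color 1
print(playable(np.array([2, 5, 7]), deck, trick=[11]))  # [0 1 2]: no color-1 card in hand, anything goes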
In [7]:
def firstPlay(hand, deck, trick=[]):
choices = playable(hand, deck, trick)
return choices[0]
In [12]:
def playTrick(hands, hist, deck, dealer=0):
"""Let the players play one trick """
tricks = []
P = len(hands)
for player in range(dealer, dealer+P):
player %= P
if player == 0:
card_index = firstPlay(hands[player], deck, tricks) # Player 1 plays "firstPlay strategy"
elif player == 1:
card_index = firstPlay(hands[player], deck, tricks) # Player 2 plays "firstPlay strategy"
tricks.append(hands[player][card_index])
return tricks
def getWinner(trick, deck):
"""Computes which player wins the trick """
good_trump = deck[trick][:,1] == deck[trick[0],1] # players with the right color
winner = 0
for i in range(1, len(trick)):
if good_trump[i] and deck[trick[i]][0] > deck[trick[winner]][0]:
winner = i
return winner
def getHands(hands, played_cards):
""" Extract current hands from played cards"""
cleanHand = []
for hand in hands:
cleanHand.append(hand[np.array([card not in played_cards for card in hand])])
return cleanHand
def play(hands, deck, dealer=0):
"""Plays N tricks and returns played_card order, first player, and winner"""
played_cards = []
who_play = []
winner = []
for i in range(0, len(hands[0])): # N tricks
# P players starting from dealer
who_play += list((np.arange(0, len(hands)) + dealer) % len(hands))
trick = playTrick(getHands(hands, played_cards), played_cards, deck, dealer=dealer)
played_cards += trick # Save played card history
winner_rel = getWinner(trick, deck)
dealer = (dealer + winner_rel) % len(hands) # Compute which player will start next
winner.append(dealer)
return (played_cards, winner, who_play)
def shiftHand(hands):
"""Exchange hands so that each player can play each hand"""
hands2 = np.zeros(hands.shape, dtype=hands.dtype )
hands2[0] = hands[-1]
hands2[1:] = hands[0:-1]
return hands2
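A quick end-to-end check of the engine with the deterministic firstPlay strategy used by both players (a sketch; the exact trick counts depend on the deal):
deck0 = generateDeck(N=10, P=2)
hand0 = generateHand(N=10, P=2, seed=0)
(hist, win, who_play) = play(hand0, deck0, 0)
print(len(hist), len(win))             # 20 cards played, 10 trick winners
print(sum(w == 0 for w in win))        # tricks taken by player 0 with hand0[0]
(shift_hist, shift_win, shift_who_play) = play(shiftHand(hand0), deck0, 1)
print(sum(w == 1 for w in shift_win))  # tricks taken with the same hand by player 1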
In [14]:
deck0 = generateDeck(N=10, P=2)
num_of_games = 1000
w1 = 0
w2 = 0
d = 0
for i in range(0, num_of_games):
    # Let's play num_of_games games
    hand0 = generateHand(N=10, P=2, seed=i) # deterministic seed: game i is reproducible
(hist, win, who_play) = play(hand0, deck0, 0)
(shift_hist, shift_win, shift_who_play) = play(shiftHand(hand0), deck0, 1)
if sum([w == 0 for w in win]) > sum([w == 1 for w in shift_win]):
w1 += 1
elif sum([w == 0 for w in win]) < sum([w == 1 for w in shift_win]):
w2 += 1
else:
d += 1
print(w1 / num_of_games * 100, w2 / num_of_games * 100, d / num_of_games * 100)
Expected result:
0.0 0.0 100.0
=> 100% of the played games end in a draw (as expected: both players use the same deterministic strategy and each hand is played by both of them)
In [39]:
def randomPlay(hand, deck, trick=[]):
choices = playable(hand, deck, trick)
    return choices[np.random.random_integers(0, len(choices) - 1)]  # pick a random playable card index
def playTrick(hands, hist, deck, dealer=0):
"""Let the players play one trick """
tricks = []
P = len(hands)
for player in range(dealer, dealer+P):
player %= P
if player == 0:
card_index = randomPlay(hands[player], deck, tricks) # Player 1 plays "randomPlay strategy"
elif player == 1:
card_index = randomPlay(hands[player], deck, tricks) # Player 2 plays "randomPlay strategy"
tricks.append(hands[player][card_index])
return tricks
In [45]:
deck0 = generateDeck(N=10, P=2)
num_of_games = 1000
w1 = 0
w2 = 0
d = 0
for i in range(0, num_of_games):
    # Let's play num_of_games games
    hand0 = generateHand(N=10, P=2, seed=i) # deterministic seed: game i is reproducible
(hist, win, who_play) = play(hand0, deck0, 0)
(shift_hist, shift_win, shift_who_play) = play(shiftHand(hand0), deck0, 1)
if sum([w == 0 for w in win]) > sum([w == 1 for w in shift_win]):
w1 += 1
elif sum([w == 0 for w in win]) < sum([w == 1 for w in shift_win]):
w2 += 1
else:
d += 1
print(w1 / num_of_games * 100, w2 / num_of_games * 100, d / num_of_games * 100)
Expected results: P1 P2 (100 - P1 - P2), with P1 close to P2. Here we have for example:
Now we will implement a neural network (without training it yet) with one hidden layer.
The network has one output per card of the deck. The chosen card will be the allowed card with the highest network output.
The input is a concatenation of 3 binary vectors: the cards already played, the player's current hand, and the card on the table in the current trick.
Example with N = 3:
[0, 0, 1, 1, 0, 0] represents a game in which cards 2 and 3 of the deck have already been played, which with 2 colors means the cards (2, 0) and (0, 1).
[1, 0, 0, 0, 0, 1] represents a hand containing cards 0 and 5 of the deck, i.e. the cards (0, 0) and (2, 1).
When the player leads the trick, the trick vector is [0, 0, 0, 0, 0, 0]. When the player is second to play, the trick vector encodes the card already on the table: for example [0, 1, 0, 0, 0, 0] means that the other player played the card (1, 0).
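Here is a minimal sketch of this encoding for N = 3 and 2 colors, matching the examples above (plain NumPy, illustration only):
n_cards = 6                                       # N = 3 values times 2 colors
played = np.zeros(n_cards); played[[2, 3]] = 1    # cards (2,0) and (0,1) already played
hand = np.zeros(n_cards);   hand[[0, 5]] = 1      # our hand: cards (0,0) and (2,1)
trick = np.zeros(n_cards);  trick[1] = 1          # the opponent led the card (1,0)
x = np.concatenate([played, hand, trick])         # 3 * 6 = 18 inputs for the network
print(x.astype(int))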
In [47]:
def compile_nn(in_, hidden_, out_):
"""Creates the neural network"""
model = Sequential()
model.add(Dense(in_, hidden_, init='uniform', activation='tanh'))
model.add(Dropout(0.5))
model.add(Dense(hidden_, out_, init='uniform', activation='softmax'))
rms = RMSprop()
model.compile(loss='categorical_crossentropy', optimizer=rms)
return model
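Note that compile_nn uses the old Keras 0.x Sequential API, where Dense takes the input and output sizes explicitly. If you run a recent Keras, a roughly equivalent definition would be the sketch below (compile_nn_recent is only an illustrative name; also, nb_epoch becomes epochs in fit()):
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import RMSprop
def compile_nn_recent(in_, hidden_, out_):
    # Same architecture as compile_nn above, written for Keras >= 1.x
    model = Sequential()
    model.add(Dense(hidden_, activation='tanh', input_dim=in_))
    model.add(Dropout(0.5))
    model.add(Dense(out_, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer=RMSprop())
    return model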
Let's implement the NN playing strategy. We add helper functions to convert a card to and from its binary array representation.
In [49]:
def card2vec(card_value, card_color, N=10, P=2):
"""Convert a card from its (value, color) representation to its binary array representation"""
vec = np.zeros(N * P)
vec[card_color * N + card_value] = 1
return vec
def vec2card(vec, N=10):
"""Convert a card from its binary array representation to its (value, color) representation"""
index = np.argwhere(vec == 1)[0][0]
return (index % N, int(index / N))
def hist2vec(hist, deck, N=10, P=2):
"""Convert a list of card"""
hist_vec = np.zeros((len(hist), len(deck)))
for i in range(0, len(hist)):
vec = card2vec(deck[hist[i]][0], deck[hist[i]][1], N=N, P=P)
hist_vec[i] = vec
return hist_vec
def nnPlay(mod, hand, hist, deck, trick=[]):
choices = playable(hand, deck, trick)
    hist_cards = np.sum(hist2vec(hist, deck, N=10, P=2), axis=0)
    current_hand_vec = np.zeros(len(deck))
for j in hand:
current_hand_vec[j] = 1
if trick:
new_X = np.concatenate([
hist_cards,
current_hand_vec,
hist2vec(trick, deck)[0]
])
else:
new_X = np.concatenate([
hist_cards,
current_hand_vec,
np.zeros(20)
])
probas = mod.predict(new_X.reshape((1,-1)), verbose=0)
return choices[np.argmax(probas[0][hand[choices]])]
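A quick round-trip check of the conversion helpers (with the default N=10, P=2 deck, the card (3, 1) lives at deck index 13):
v = card2vec(3, 1, N=10, P=2)
print(np.argmax(v))       # 13
print(vec2card(v, N=10))  # (3, 1)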
In [51]:
modl = compile_nn(3 * len(deck0), 2 * len(deck0), len(deck0))
# We keep modl as a global variable in order to only change the interesting functions
def playTrick(hands, hist, deck, dealer=0):
tricks = []
P = len(hands)
for player in range(dealer, dealer+P):
player %= P
if player == 0:
card_index = nnPlay(modl, hands[player], hist, deck, tricks)
elif player == 1:
card_index = firstPlay(hands[player], deck, tricks)
tricks.append(hands[player][card_index])
return tricks
In [53]:
deck0 = generateDeck(N=10, P=2)
num_of_games = 1000
w1 = 0
w2 = 0
d = 0
for i in range(0, num_of_games):
    # Let's play num_of_games games
    hand0 = generateHand(N=10, P=2, seed=10 * i) # deterministic seed
(hist, win, who_play) = play(hand0, deck0, 0)
(shift_hist, shift_win, shift_who_play) = play(shiftHand(hand0), deck0, 1)
if sum([w == 0 for w in win]) > sum([w == 1 for w in shift_win]):
w1 += 1
elif sum([w == 0 for w in win]) < sum([w == 1 for w in shift_win]):
w2 += 1
else:
d += 1
print(w1 / num_of_games * 100, w2 / num_of_games * 100, d / num_of_games * 100)
Expected results: P1 P2 (100 - P1 - P2), with P1 close to P2 since the network is not trained yet. Here we have for example:
First, we will train the network by feeding it the cards it played in the games it won.
For each won game, we treat every card the network played during that game as the correct card to play, i.e. as the expected label.
We collect these "good" examples with the learn_win() function, which is called whenever the player wins (see the sketch of one training pair below).
In [55]:
def learn_win(hist, X, y):
    """Append one (input, target) training pair per trick of a won game.
    Relies on the notebook globals deck0, hand0 and who_play."""
hist_vec = hist2vec(hist, deck0, 10, 2)
hist_cards = np.zeros(20)
current_hand = hand0[0]
for i in range(0, int(len(hist_vec) / 2)): # For each trick
# Re-construct the current hand of this particular trick
current_hand_vec = np.zeros(len(deck0))
for j in current_hand:
current_hand_vec[j] = 1
# 2 cases : Player plays first or second
if (who_play[2 * i + 1] == 0): # Win when 2nd to play
y.append(hist_vec[2 * i + 1]) # It was THE card to play !
new_X = np.concatenate([
hist_cards,
current_hand_vec,
hist_vec[2 * i]
])
X.append(new_X)
current_hand = np.delete(current_hand, np.where(current_hand == hist[2 * i + 1])[0][0])
elif (who_play[2 * i] == 0): # Win when 1st to play
y.append(hist_vec[2 * i]) # It was THE card to play !
new_X = np.concatenate([
hist_cards,
current_hand_vec,
np.zeros(20)
])
X.append(new_X)
current_hand = np.delete(current_hand, np.where(current_hand == hist[2 * i])[0][0])
hist_cards += hist_vec[2 * i] + hist_vec[2 * i + 1]
return (X, y)
Now we will play several batches of num_of_games games, so that the network can learn from each batch.
In [62]:
modl = compile_nn(3 * len(deck0), 2 * len(deck0), len(deck0))
deck0 = generateDeck(N=10, P=2)
num_of_games = 1000
for j in range(0, 10): # LETS PLAY 10 TIMES !
w1 = 0
w2 = 0
d = 0
# Examples used to train the network
# We accumulate examples during all the num_of_games games
X = []
y = []
for i in range(0, num_of_games):
        hand0 = generateHand(N=10, P=2, seed=i + j * num_of_games) # the seed is always different
(hist, win, who_play) = play(hand0, deck0, 0)
(shift_hist, shift_win, shift_who_play) = play(shiftHand(hand0), deck0, 1)
if sum([w == 0 for w in win]) > sum([w == 1 for w in shift_win]):
# HERE WE WIN
# So we use the information to train the network
(X, y) = learn_win(hist, X, y)
w1 += 1
elif sum([w == 0 for w in win]) < sum([w == 1 for w in shift_win]):
w2 += 1
else:
d += 1
print(w1 / num_of_games * 100, w2 / num_of_games * 100, d / num_of_games * 100)
# Now we have good examples, we can train the network for the next time !
modl.fit(np.array(X), np.array(y), nb_epoch=20, verbose=0)
mod0 = modl # Save the model
Expected results: we expect our network to do noticeably better than before, which means that P1 must increase and exceed P2.
Example of generated results:
OK, it seems to work: the NN player wins about 50% of the games whereas its opponent (the deterministic firstPlay strategy) wins only about 25%!
Next we also learn from the lost games: when the network loses, the target marks every other card it still had in hand, meaning "maybe another card would have been a better choice".
In [59]:
def learn_loose(hist, X, y):
    """Append one (input, target) training pair per trick of a lost game.
    The target marks every other card still in hand ("maybe another card was better").
    Relies on the notebook globals deck0, hand0 and who_play."""
hist_vec = hist2vec(hist, deck0, 10, 2)
hist_cards = np.zeros(20)
current_hand = hand0[0]
for i in range(0, int(len(hist_vec) / 2)):
current_hand_vec = np.zeros(len(deck0))
for j in current_hand:
current_hand_vec[j] = 1
        if (who_play[2 * i + 1] == 0): # Lost game, player 0 was 2nd to play in this trick
ye = np.zeros(20)
ye[current_hand] = 1
ye -= hist_vec[2 * i + 1]
y.append(ye) # Maybe I could choose another card of my hand
new_X = np.concatenate([
hist_cards,
current_hand_vec,
hist_vec[2 * i]
])
X.append(new_X)
current_hand = np.delete(current_hand, np.where(current_hand == hist[2 * i + 1])[0][0])
        elif (who_play[2 * i] == 0): # Lost game, player 0 was 1st to play in this trick
ye = np.zeros(20)
ye[current_hand] = 1
ye -= hist_vec[2 * i]
y.append(ye) # Maybe I could choose another card of my hand
new_X = np.concatenate([
hist_cards,
current_hand_vec,
np.zeros(20)
])
X.append(new_X)
current_hand = np.delete(current_hand, np.where(current_hand == hist[2 * i])[0][0])
hist_cards += hist_vec[2 * i] + hist_vec[2 * i + 1]
return (X, y)
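Unlike learn_win, the target here is not a one-hot vector: it marks every card still in hand except the one actually played. A small sketch of that target, with a hypothetical three-card hand given as deck indices:
current_hand = np.array([3, 7, 12])  # hypothetical remaining hand (deck indices)
ye = np.zeros(20)
ye[current_hand] = 1
ye[7] -= 1                           # 7 was the card actually played (ye -= hist_vec[...] above)
print(np.nonzero(ye)[0])             # [ 3 12 ]: the cards we did not play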
And now we train it each time, on both won and lost games!
In [72]:
modl = compile_nn(3 * len(deck0), 2 * len(deck0), len(deck0))
deck0 = generateDeck(N=10, P=2)
num_of_games = 1000
for j in range(0, 10): # LETS PLAY 10 TIMES !
w1 = 0
w2 = 0
d = 0
# Examples used to train the network
# We accumulate examples during all the num_of_games games
X = []
y = []
for i in range(0, num_of_games):
        hand0 = generateHand(N=10, P=2, seed=i + j * num_of_games) # the seed is always different
(hist, win, who_play) = play(hand0, deck0, 0)
(shift_hist, shift_win, shift_who_play) = play(shiftHand(hand0), deck0, 1)
if sum([w == 0 for w in win]) > sum([w == 1 for w in shift_win]):
# HERE WE WIN
# So we use the information to train the network
(X, y) = learn_win(hist, X, y)
w1 += 1
elif sum([w == 0 for w in win]) < sum([w == 1 for w in shift_win]):
            # HERE WE LOSE
# So we use the information to train the network
(X, y) = learn_loose(hist, X, y)
w2 += 1
else:
d += 1
print(w1 / num_of_games * 100, w2 / num_of_games * 100, d / num_of_games * 100)
# Now we have good examples, we can train the network for the next time !
modl.fit(np.array(X), np.array(y), nb_epoch=20, verbose=0)
mod2 = modl # Save the model
Example of generated results:
Our network now wins about 72% of the games versus about 12% for its opponent (the firstPlay strategy).
In [69]:
def playTrick(hands, hist, deck, dealer=0):
tricks = []
P = len(hands)
for player in range(dealer, dealer+P):
player %= P
if player == 0:
card_index = nnPlay(mod0, hands[player], hist, deck, tricks)
elif player == 1:
# card_index = firstPlay(hands[player], deck, tricks)
            card_index = nnPlay(mod2, hands[player], hist, deck, tricks)  # the network trained on wins and losses
tricks.append(hands[player][card_index])
return tricks
In [71]:
deck0 = generateDeck(N=10, P=2)
num_of_games = 1000
w1 = 0
w2 = 0
d = 0
for i in range(0, num_of_games):
    # Let's play num_of_games games
    hand0 = generateHand(N=10, P=2, seed=10000 * i) # deterministic seeds, mostly unseen during training
(hist, win, who_play) = play(hand0, deck0, 0)
(shift_hist, shift_win, shift_who_play) = play(shiftHand(hand0), deck0, 1)
if sum([w == 0 for w in win]) > sum([w == 1 for w in shift_win]):
w1 += 1
elif sum([w == 0 for w in win]) < sum([w == 1 for w in shift_win]):
w2 += 1
else:
d += 1
print(w1 / num_of_games * 100, w2 / num_of_games * 100, d / num_of_games * 100)
Example of generated results:
The second neural network (the one also trained on lost games) seems way better!