Simple card game IA

We will describe in this notebook a very simple 2 players card game and a way to implement an IA to play this game using an artificial neural network (we will use a multi layer perceptron model)

The IA will be trained only by playing, which means that no "expert" human player will implement strategies.

The only information about the game given to the IA is a function which indicates to the IA the cards allowed to play.

The game

The game is a very basic 2 player "Trick-taking" card game :

General description

  • There are 2 colors (color 0 and color 1)
  • Each color has N cards (valued from 0 to N-1)
  • The game is composed of N "tricks"
  • Player 0 starts to play for the first trick
  • The winner of a trick starts to play the next trick

Trick rules

  • The first player can play any card
  • If the second player has cards with the same color, he has to play the same color
  • Otherwise he plays the other color

Trick winner

  • If the second player plays the same color. The card with the highest value wins the trick
  • If the second player does not have the same color. The first player wins

Example

  • N = 3
  • We denote (1,0) card 1 of color 0
  • Player 0's hand = [(0, 0), (2, 0), (1, 1)]
  • Player 1's hand = [(1, 0), (0, 1), (2, 1)]
    • Player 0 plays (2,0)
    • Player 1 plays (1,0)
      • Player 0 wins the trick since 2 > 1 and start to play next trick
    • Player 0 plays (0,0)
    • Player 1 plays (0,1)
      • Player 0 wins the trick since player 1 with no color 0, played color 1
    • Player 0 plays (1, 1)
    • Player 1 plays (2, 1)
      • Player 1 wins the trick since 2 > 1
  • At the end of the game, player 0 wins 2 tricks, player 1 wins 1 trick

Strategies comparison

In order to compare strategies, the players will play a twice :

  • Player 0 with hand 0, Player 1 with hand 1, Player 0 stats
  • Player 1 with hand 0, Player 0 with hand 1, Player 1 stats

Then we compare both games :

  • If player 0 wins more tricks with hand 0 than player 1 with hand 0, player 0 wins the whole game
  • If player 1 wins more tricks with hand 0 than player 0 with hand 0, player 1 wins the whole game
  • Otherwise its draw

IA implementation

We will implement a simple MLP model using keras library


In [10]:
# Importation
import numpy as np

import keras
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import SGD, RMSprop

In [15]:
# General functions to generate hands
def generateDeck(N=10, P=2):
    deck = np.zeros((P*N, 2))
    for i in range(0,P):
        deck[i * N:i * N + N,0] = np.arange(N)
        deck[i * N:i * N + N,1] = i
    return deck

def gererateHand(N=10, P=2, seed=0):
    """ Generate N cards hands for P players.
    The function returns the deck indices"""
    np.random.seed(seed)
    deck = np.arange(N*P)
    np.random.shuffle(deck)
    return np.reshape(deck, (P,-1))

Rules

Here is the rule function that indicates to the player the cards they can play, according to their hand and the previous trick.


In [6]:
def playable(hand, deck, trick=[]):
    """Returns indices of playable cards """
    allrange = np.arange(len(hand))
    if len(trick) == 0:
        return allrange # First to play

    color = deck[trick[0]][1]
    colors = allrange[deck[hand][:,1] == color]
    if len(colors) > 0: # Forced to play same color
        return colors
    else: # Does not have the color
        return allrange

Random Strategy

Here is the random strategy.

The player plays the first card he founds and since cards are distributed in a random way (see gererateHand()) its equivalent to a random selection.


In [7]:
def firstPlay(hand, deck, trick=[]):
    choices = playable(hand, deck, trick)
    return choices[0]

First implementation

We will first implement 2 players playing the same random strategy. Since everything is seeded, all the game should end with a draw result


In [12]:
def playTrick(hands, hist, deck, dealer=0):
    """Let the players play one trick """
    tricks = []
    P = len(hands)
    for player in range(dealer, dealer+P):
        player %= P
        if player == 0:
            card_index = firstPlay(hands[player], deck, tricks) # Player 1 plays "firstPlay strategy"
        elif player == 1:
            card_index = firstPlay(hands[player], deck, tricks) # Player 2 plays "firstPlay strategy"
        tricks.append(hands[player][card_index])
    return tricks


def getWinner(trick, deck):
    """Computes which player wins the trick """
    good_trump = deck[trick][:,1] == deck[trick[0],1] # players with the right color
    winner = 0
    for i in range(1, len(trick)):
        if good_trump[i] and deck[trick[i]][0] > deck[trick[winner]][0]:
            winner = i
    return winner


def getHands(hands, played_cards):
    """ Extract current hands from played cards"""
    cleanHand = []
    for hand in hands:
        cleanHand.append(hand[np.array([card not in played_cards for card in hand])])
    return cleanHand


def play(hands, deck, dealer=0):
    """Plays N tricks and returns played_card order, first player, and winner"""
    played_cards = []
    who_play = []
    winner = []
    
    for i in range(0, len(hands[0])): # N tricks
        # P players starting from dealer
        who_play += list((np.arange(0, len(hands)) + dealer) % len(hands))

        trick = playTrick(getHands(hands, played_cards), played_cards, deck, dealer=dealer)      
        played_cards += trick # Save played card history
        
        winner_rel = getWinner(trick, deck)
        dealer = (dealer + winner_rel) % len(hands) # Compute which player will start next
        winner.append(dealer)
    return (played_cards, winner, who_play)


def shiftHand(hands):
    """Exchange hands so that each player can play each hand"""
    hands2 = np.zeros(hands.shape, dtype=hands.dtype )
    hands2[0] = hands[-1]
    hands2[1:] = hands[0:-1]
    return hands2

Lets play

Now both players can play the first random strategy Lets generate a deck, play several games and compute the winner. According to the implementation, all games must be draw


In [14]:
deck0 = generateDeck(N=10, P=2)
num_of_games = 1000
w1 = 0
w2 = 0
d = 0

for i in range(0, num_of_games):
    # Lets play num_of_games games
    hand0 = gererateHand(N=10, P=2, seed=i) # seed is fixed

    (hist, win, who_play) = play(hand0, deck0, 0)
    (shift_hist, shift_win, shift_who_play) = play(shiftHand(hand0), deck0, 1)

    if sum([w == 0 for w in win]) > sum([w == 1 for w in shift_win]):
        w1 += 1
    elif sum([w == 0 for w in win]) < sum([w == 1 for w in shift_win]):
        w2 += 1
    else:
        d += 1
print(w1 / num_of_games * 100, w2 / num_of_games * 100, d / num_of_games * 100)


0.0 0.0 100.0

Expected result:

0.0 0.0 100.0

=> 100% of played games ends with draw (as expected)

Random strategies

Lets see how 2 players with random selection (but not the same) plays together


In [39]:
def randomPlay(hand, deck, trick=[]):
    choices = playable(hand, deck, trick)
    return np.random.random_integers(0,len(choices) - 1)

def playTrick(hands, hist, deck, dealer=0):
    """Let the players play one trick """
    tricks = []
    P = len(hands)
    for player in range(dealer, dealer+P):
        player %= P
        if player == 0:
            card_index = randomPlay(hands[player], deck, tricks) # Player 1 plays "randomPlay strategy"
        elif player == 1:
            card_index = randomPlay(hands[player], deck, tricks) # Player 2 plays "randomPlay strategy"
        tricks.append(hands[player][card_index])
    return tricks

In [45]:
deck0 = generateDeck(N=10, P=2)
num_of_games = 1000
w1 = 0
w2 = 0
d = 0

for i in range(0, num_of_games):
    # Lets play num_of_games games
    hand0 = gererateHand(N=10, P=2, seed=i) # seed is fixed

    (hist, win, who_play) = play(hand0, deck0, 0)
    (shift_hist, shift_win, shift_who_play) = play(shiftHand(hand0), deck0, 1)

    if sum([w == 0 for w in win]) > sum([w == 1 for w in shift_win]):
        w1 += 1
    elif sum([w == 0 for w in win]) < sum([w == 1 for w in shift_win]):
        w2 += 1
    else:
        d += 1
print(w1 / num_of_games * 100, w2 / num_of_games * 100, d / num_of_games * 100)


41.8 43.3 14.899999999999999

Expected results : P1 P2 (100 - P1 - P2) with P1 close to P2 Here we have for example:

  • 44.80 41.6 13.60 with seed=i
  • 41.80 43.3 14.90 with seed=10*i

Artificial Neural Network

Now we will implement a neural network (without training it) with one hidden layer

Network output

We implement one output per card. The choosen card will be the allowed card with the highest value

Network input

We implement a concatenation of 3 binary inputs :

  • One vector of size len(deck) representing the cards already played
  • One vector of size len(deck) representing the current hand
  • One vector of size len(deck) representing the current trick

Card already played

Example with N = 3

[0, 0, 1, 1, 0, 0] will represent a game in which card 2 and 3 of the deck have been already played. Which means in the case of 2 colors the card (2, 0) and (0, 1)

Current hand

[1, 0, 0, 0, 0, 1] will represent a hand with cards 2 and 3 of the deck, i.e card (0,0) and (2,1)

Trick

When player is the first player to play, the trick will be [0, 0, 0, 0, 0, 0] When player is the second player to play, the trick represents the played card. Example [0, 1, 0, 0, 0, 0] means that other player plays card (1, 0)


In [47]:
def compile_nn(in_, hidden_, out_):
    """Creates the neural network"""
    model = Sequential()
    model.add(Dense(in_, hidden_, init='uniform', activation='tanh'))
    model.add(Dropout(0.5))
    model.add(Dense(hidden_, out_, init='uniform', activation='softmax'))

    rms = RMSprop()
    model.compile(loss='categorical_crossentropy', optimizer=rms)
    return model

Lets implement the NN playing strategy. We add some function allowing us convert a card to its binary array representation.


In [49]:
def card2vec(card_value, card_color, N=10, P=2):
    """Convert a card from its (value, color) representation to its binary array representation"""
    vec = np.zeros(N * P)
    vec[card_color * N + card_value] = 1
    return vec

def vec2card(vec, N=10):
    """Convert a card from its binary array representation to its (value, color) representation"""
    index = np.argwhere(vec == 1)[0][0]
    return (index % N, int(index / N))

def hist2vec(hist, deck, N=10, P=2):
    """Convert a list of card"""
    hist_vec = np.zeros((len(hist), len(deck)))
    for i in range(0, len(hist)):
        vec = card2vec(deck[hist[i]][0], deck[hist[i]][1], N=N, P=P)
        hist_vec[i] = vec
    return hist_vec

def nnPlay(mod, hand, hist, deck, trick=[]):
    choices = playable(hand, deck, trick)

    hist_cards = np.sum(hist2vec(hist, deck0, 10, 2), axis=0)
    
    current_hand_vec = np.zeros(len(deck0))
    for j in hand:
        current_hand_vec[j] = 1

    if trick:
        new_X = np.concatenate([
            hist_cards,
            current_hand_vec,
            hist2vec(trick, deck)[0]
        ])
    else:
        new_X = np.concatenate([
            hist_cards,
            current_hand_vec,
            np.zeros(20)
        ])

    probas = mod.predict(new_X.reshape((1,-1)), verbose=0)
    return choices[np.argmax(probas[0][hand[choices]])]

Untrained NN

Lets try our untrained neural network agaist a firstPlay strategy We use 2 * len(deck) neurons in the hidden layer


In [51]:
modl = compile_nn(3 * len(deck0),  2 * len(deck0), len(deck0))
# We keep modl has a global variable in order to only change interesing functions

def playTrick(hands, hist, deck, dealer=0):
    tricks = []
    P = len(hands)
    for player in range(dealer, dealer+P):
        player %= P
        if player == 0:
            card_index = nnPlay(modl, hands[player], hist, deck, tricks)
        elif player == 1:
            card_index = firstPlay(hands[player], deck, tricks)
        tricks.append(hands[player][card_index])
    return tricks

In [53]:
deck0 = generateDeck(N=10, P=2)
num_of_games = 1000
w1 = 0
w2 = 0
d = 0

for i in range(0, num_of_games):
    # Lets play num_of_games games
    hand0 = gererateHand(N=10, P=2, seed=10 * i) # seed is fixed

    (hist, win, who_play) = play(hand0, deck0, 0)
    (shift_hist, shift_win, shift_who_play) = play(shiftHand(hand0), deck0, 1)

    if sum([w == 0 for w in win]) > sum([w == 1 for w in shift_win]):
        w1 += 1
    elif sum([w == 0 for w in win]) < sum([w == 1 for w in shift_win]):
        w2 += 1
    else:
        d += 1
print(w1 / num_of_games * 100, w2 / num_of_games * 100, d / num_of_games * 100)


38.4 37.6 24.0

Expected results : P1 P2 (100 - P1 - P2) with P1 close to P2 Here we have for example:

  • 37.8 39.2 23.0 with seed=i
  • 38.4 37.6 24.0 with seed=10*i

Lets train the Network !

Good games

Firsly we will train the network by feeding it with cards played when he wins the game

For each won game, we feed the network as if each card he plays during this game was the card to play, i.e. was the expected label.

We catch "good" examples with learn_win() functions called when the player wins


In [55]:
def learn_win(hist, X, y):
    hist_vec = hist2vec(hist, deck0, 10, 2)
    hist_cards = np.zeros(20)
    current_hand = hand0[0]
    
    for i in range(0, int(len(hist_vec) / 2)): # For each trick
        
        # Re-construct the current hand of this particular trick
        current_hand_vec = np.zeros(len(deck0))
        for j in current_hand:
            current_hand_vec[j] = 1
        
        # 2 cases : Player plays first or second
        if (who_play[2 * i + 1] == 0): # Win when 2nd to play
            y.append(hist_vec[2 * i + 1]) # It was THE card to play !
            new_X = np.concatenate([
                hist_cards,
                current_hand_vec,
                hist_vec[2 * i]
            ])
            X.append(new_X)
            current_hand = np.delete(current_hand, np.where(current_hand == hist[2 * i + 1])[0][0])

        elif (who_play[2 * i] == 0): # Win when 1st to play
            y.append(hist_vec[2 * i]) # It was THE card to play !
            new_X = np.concatenate([
                hist_cards,
                current_hand_vec,
                np.zeros(20)
            ])
            X.append(new_X)
            current_hand = np.delete(current_hand, np.where(current_hand == hist[2 * i])[0][0])
        hist_cards += hist_vec[2 * i] + hist_vec[2 * i + 1]       
    return (X, y)

Now we will play several batches of different num_of_games. So that the network can learn from each one


In [62]:
modl = compile_nn(3 * len(deck0),  2 * len(deck0), len(deck0))

deck0 = generateDeck(N=10, P=2)
num_of_games = 1000

for j in range(0, 10): # LETS PLAY 10 TIMES !
    w1 = 0
    w2 = 0
    d = 0

    # Examples used to train the network
    # We accumulate examples during all the num_of_games games
    X = []
    y = []

    for i in range(0, num_of_games):
        hand0 = gererateHand(N=10, P=2, seed=i + j * num_of_games) # seed is always differebt

        (hist, win, who_play) = play(hand0, deck0, 0)
        (shift_hist, shift_win, shift_who_play) = play(shiftHand(hand0), deck0, 1)

        if sum([w == 0 for w in win]) > sum([w == 1 for w in shift_win]):
            # HERE WE WIN
            # So we use the information to train the network
            (X, y) = learn_win(hist, X, y)
            w1 += 1
        elif sum([w == 0 for w in win]) < sum([w == 1 for w in shift_win]):
            w2 += 1
        else:
            d += 1
    print(w1 / num_of_games * 100, w2 / num_of_games * 100, d / num_of_games * 100)
    
    # Now we have good examples, we can train the network for the next time !
    modl.fit(np.array(X), np.array(y), nb_epoch=20, verbose=0)

mod0 = modl # Save the model


35.0 37.8 27.200000000000003
40.699999999999996 33.300000000000004 26.0
44.800000000000004 28.199999999999996 27.0
45.9 31.1 23.0
51.6 25.900000000000002 22.5
48.199999999999996 26.5 25.3
48.9 26.3 24.8
52.800000000000004 23.7 23.5
51.5 24.099999999999998 24.4
46.400000000000006 27.400000000000002 26.200000000000003

Expected results : We expect our network to be quite better than the random wich means that P1 must increase and be better than P2

Example of generated results :

  • 35.5 37.6 26.90
  • 40.1 33.6 26.3
  • 40.5 36.20 23.3
  • 41.70 33.30 25.0
  • 45.2 27.50 27.3
  • 47.4 26.8 25.8
  • 45.1 30.0 24.9
  • 48.4 23.1 28.50
  • 48.9 27.80 23.3
  • 50.1 24.7 25.2

Ok seems to work since NN player wins 50% of the time whereas random strategy wins only 25% !

Learn when loosing

With the first model, the network learn only when it wins. We can try to train it when it loose, by teaching it not to play the card it plays.

We feed the labels we a zero probabiliy for the played cards and feed other allowed card with 1


In [59]:
def learn_loose(hist, X, y):
    hist_vec = hist2vec(hist, deck0, 10, 2)
    hist_cards = np.zeros(20)
    current_hand = hand0[0]
    
    for i in range(0, int(len(hist_vec) / 2)):
        current_hand_vec = np.zeros(len(deck0))
        for j in current_hand:
            current_hand_vec[j] = 1

        if (who_play[2 * i + 1] == 0): # Loose when 2nd to play           
            ye = np.zeros(20)
            ye[current_hand] = 1
            ye -= hist_vec[2 * i + 1]            
            y.append(ye) # Maybe I could choose another card of my hand
            
            new_X = np.concatenate([
                hist_cards,
                current_hand_vec,
                hist_vec[2 * i]
            ])
            X.append(new_X)
            current_hand = np.delete(current_hand, np.where(current_hand == hist[2 * i + 1])[0][0])
        elif (who_play[2 * i] == 0): # Loose when 1st to play

            ye = np.zeros(20)
            ye[current_hand] = 1
            ye -= hist_vec[2 * i]            
            y.append(ye) # Maybe I could choose another card of my hand

            new_X = np.concatenate([
                hist_cards,
                current_hand_vec,
                np.zeros(20)
            ])
            X.append(new_X)
            current_hand = np.delete(current_hand, np.where(current_hand == hist[2 * i])[0][0])
        hist_cards += hist_vec[2 * i] + hist_vec[2 * i + 1]       
    return (X, y)

And now we train it each time !


In [72]:
modl = compile_nn(3 * len(deck0),  2 * len(deck0), len(deck0))

deck0 = generateDeck(N=10, P=2)
num_of_games = 1000

for j in range(0, 10): # LETS PLAY 10 TIMES !
    w1 = 0
    w2 = 0
    d = 0

    # Examples used to train the network
    # We accumulate examples during all the num_of_games games
    X = []
    y = []

    for i in range(0, num_of_games):
        hand0 = gererateHand(N=10, P=2, seed=i + j * num_of_games) # seed is always differebt

        (hist, win, who_play) = play(hand0, deck0, 0)
        (shift_hist, shift_win, shift_who_play) = play(shiftHand(hand0), deck0, 1)

        if sum([w == 0 for w in win]) > sum([w == 1 for w in shift_win]):
            # HERE WE WIN
            # So we use the information to train the network
            (X, y) = learn_win(hist, X, y)
            w1 += 1
        elif sum([w == 0 for w in win]) < sum([w == 1 for w in shift_win]):
            # HERE WE LOOSE
            # So we use the information to train the network
            (X, y) = learn_loose(hist, X, y)
            w2 += 1
        else:
            d += 1
    print(w1 / num_of_games * 100, w2 / num_of_games * 100, d / num_of_games * 100)
    
    # Now we have good examples, we can train the network for the next time !
    modl.fit(np.array(X), np.array(y), nb_epoch=20, verbose=0)

mod2 = modl # Save the model


13.0 71.0 16.0
11.5 72.5 16.0
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-72-8c7f54da2f4c> in <module>()
     18 
     19         (hist, win, who_play) = play(hand0, deck0, 0)
---> 20         (shift_hist, shift_win, shift_who_play) = play(shiftHand(hand0), deck0, 1)
     21 
     22         if sum([w == 0 for w in win]) > sum([w == 1 for w in shift_win]):

<ipython-input-12-813b9d854d72> in play(hands, deck, dealer)
     44         played_cards += trick # Save played card history
     45 
---> 46         winner_rel = getWinner(trick, deck)
     47         dealer = (dealer + winner_rel) % len(hands) # Compute which player will start next
     48         winner.append(dealer)

<ipython-input-12-813b9d854d72> in getWinner(trick, deck)
     15 def getWinner(trick, deck):
     16     """Computes which player wins the trick """
---> 17     good_trump = deck[trick][:,1] == deck[trick[0],1] # players with the right color
     18     winner = 0
     19     for i in range(1, len(trick)):

KeyboardInterrupt: 

Example of generated results :

  • 35.5 38.7 25.8
  • 44.7 32.9 22.40
  • 58.5 18.3 23.20
  • 67.60 12.0 20.4
  • 69.5 12.4 18.10
  • 69.5 12.6 17.9
  • 69.1 13.5 17.4
  • 70.3 11.4 18.3
  • 73.7 10.4 15.9
  • 72.3 11.4 16.3

Well our network now wins 72% of the time vs 12 % for the random strategy

NN1 VS NN2

Lets now fight our 2 models together ! We put very large seeds to be sure both NN have not learn from training examples


In [69]:
def playTrick(hands, hist, deck, dealer=0):
    tricks = []
    P = len(hands)
    for player in range(dealer, dealer+P):
        player %= P
        if player == 0:
            card_index = nnPlay(mod0, hands[player], hist, deck, tricks)
        elif player == 1:
            # card_index = firstPlay(hands[player], deck, tricks)
            card_index = nnPlay(mod1, hands[player], hist, deck, tricks)
        tricks.append(hands[player][card_index])
    return tricks

In [71]:
deck0 = generateDeck(N=10, P=2)
num_of_games = 1000
w1 = 0
w2 = 0
d = 0

for i in range(0, num_of_games):
    # Lets play num_of_games games
    hand0 = gererateHand(N=10, P=2, seed=10000 * i) # seed is fixed

    (hist, win, who_play) = play(hand0, deck0, 0)
    (shift_hist, shift_win, shift_who_play) = play(shiftHand(hand0), deck0, 1)

    if sum([w == 0 for w in win]) > sum([w == 1 for w in shift_win]):
        w1 += 1
    elif sum([w == 0 for w in win]) < sum([w == 1 for w in shift_win]):
        w2 += 1
    else:
        d += 1
print(w1 / num_of_games * 100, w2 / num_of_games * 100, d / num_of_games * 100)


10.7 73.4 15.9

Example of generated results :

  • 10.7 73.4 15.9

The second neural network seem way better !


In [ ]: