The Gibbs Sampling Algorithm (Explained Through a Practical Example)

In this notebook we are going to explain how the Gibbs sampling algorithm works through a very easy-to-understand example. But before we jump into the example, let's review some theory.

A little bit of theory

The Gibbs sampler is a method that helps us obtain the joint probability distribution of two or more variables when the direct calculation is very difficult. In other words, we estimate p(A,B) using a sampling method because computing it directly is hard (for example, when it requires solving a very complicated integral).

The Gibbs sampler recreates the probability distribution p(A,B) by using p(A|B) and p(B|A) (assuming that both are very easy to calculate). It repeatedly draws values for A and B from p(A|B) and p(B|A) until a sufficient number of samples is reached. Once all the samples are obtained, p(A=a,B=b) is estimated by counting the number of samples with A=a and B=b and dividing by the total number of samples.

The algorithm is the following:

  1. Set random initial values for the variables A and B.
  2. Inside a loop, and until the desired number of cycles is reached, do the following steps:
    1. The cycle starts with A=a_t and B=b_t.
    2. Randomly select a variable to sample.
    3. Leave the other variable fixed. In other words, if A is selected, then B stays at b_t.
    4. Calculate p(A|B=b_t) if A was selected, or p(B|A=a_t) if B was selected.
    5. Generate a random number in [0, 1) and, by comparing it against the conditional probability, obtain the new value of A, which we will call a_(t+1).
    6. The variable that was left fixed keeps its value: b_(t+1) = b_t.
  3. Repeat the cycle until the desired number of samples is reached.
  4. Finally, with a number of samples that is large enough, an accurate estimate of p(A,B) can be calculated.

Since the Gibbs sampler is a Markov chain Monte Carlo technique, each sample is not independent from the previous one. In order to obtain a less correlated list of samples, a technique called thinning is used. This technique consists of keeping only every Nth sample, for instance taking one sample every 100 cycles.

Another common technique is discarding the first N samples; this is called the burn-in period.
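The two techniques above can be sketched as a simple slice over the raw sample list. The burn-in and thinning values here are illustrative choices, not values prescribed by the algorithm:

```python
# Stand-in for the raw (correlated) outputs of a Gibbs chain
raw_samples = list(range(10000))

burn_in = 1000  # discard the first 1000 samples (burn-in period)
thin = 100      # then keep only every 100th sample (thinning)

kept = raw_samples[burn_in::thin]
print(len(kept))           # 90 samples remain
print(kept[0], kept[-1])   # 1000 9900
```

In practice the chosen values depend on how quickly the chain mixes; larger thinning intervals reduce correlation at the cost of more cycles.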

A very easy to understand example

To explain this algorithm we are going to use a card example.

Let's say we have a deck of cards, but some cards are missing. The missing cards are:

  • no faces in clubs
  • 1 face in spades
  • 2 faces in diamonds
  • 3 faces in hearts
  • 6 numbers in clubs
  • 8 numbers in spades
  • 7 numbers in diamonds
  • 10 numbers in hearts

In total, 37 cards are missing.

Probability table of our card problem

To validate that our Gibbs sampler is working correctly, here is the full probability table of the problem:

p(a=red) = 22/37
p(a=black) = 15/37

p(b=face) = 6/37
p(b=number) = 31/37

p(a=red, b=face) = 5/37
p(a=black, b=face) = 1/37
p(a=red, b=number) = 17/37
p(a=black, b=number) = 14/37

p(a=red | b=face) = 5/6
p(a=black | b=face) = 1/6
p(a=red | b=number) = 17/31
p(a=black | b=number) = 14/31

p(b=face | a=red) = 5/22
p(b=number | a=red) = 17/22
p(b=face | a=black) = 1/15
p(b=number | a=black) = 14/15
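As a sanity check, the conditional probabilities above can be recomputed directly from the missing-card counts. This block only restates the numbers from the example; the variable names are my own:

```python
from fractions import Fraction

# Missing cards per suit, from the example above
faces = {'clubs': 0, 'spades': 1, 'diamonds': 2, 'hearts': 3}
numbers = {'clubs': 6, 'spades': 8, 'diamonds': 7, 'hearts': 10}
red_suits = {'diamonds', 'hearts'}

red_faces = sum(faces[s] for s in red_suits)                              # 5
black_faces = sum(n for s, n in faces.items() if s not in red_suits)      # 1
red_numbers = sum(numbers[s] for s in red_suits)                          # 17
black_numbers = sum(n for s, n in numbers.items() if s not in red_suits)  # 14

print(Fraction(red_faces, red_faces + black_faces))        # p(a=red | b=face) = 5/6
print(Fraction(red_numbers, red_numbers + black_numbers))  # p(a=red | b=number) = 17/31
print(Fraction(red_faces, red_faces + red_numbers))        # p(b=face | a=red) = 5/22
print(Fraction(black_faces, black_faces + black_numbers))  # p(b=face | a=black) = 1/15
```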

The idea is then to estimate the values of p(A,B) just by using p(A|B) and p(B|A).

Implementation


In [1]:
import random

__author__ = 'fpena'


colors = ['red', 'black']
symbols = ['faces', 'numbers']

probability_table = [
    # Probability that color is red given that symbols is face (position 0) or
    # symbol is number (position 1)
    [5./6, 17./31],

    # Probability that symbol is face given that color is red (position 0) or
    # color is black (position 1)
    [5./22, 1./15]
]

class GibbsSampler:

    def __init__(self):
        self.sample_list = []

    def sample(self):

        # Set the initial values for the variables
        values = [
            random.randint(0, 1),  # Color value
            random.randint(0, 1)   # Symbol value
        ]

        for i in range(3700000):
            # Select which variable to take (colors or card types)
            var_type = random.randint(0, 1)
            if var_type == 0:
                still_variable = 1
            else:
                still_variable = 0

            # Compare a uniform draw in [0, 1) against the conditional
            # probability that the selected variable takes value 0,
            # i.e. p(red | symbol) or p(face | color).
            new_random_value = random.random()

            if new_random_value > probability_table[var_type][values[still_variable]]:
                values[var_type] = 1
            else:
                values[var_type] = 0
            if i % 100 == 0:
                self.store_variables(values[0], values[1])
            # self.store_variables(values[0], values[1])

        print('Length', len(self.sample_list))

    def store_variables(self, color, symbol):
        self.sample_list.append((colors[color], symbols[symbol]))
        # print_variables(color, symbol)

    def count_results(self):

        pairs = [
            ('red', 'faces'),
            ('red', 'numbers'),
            ('black', 'faces'),
            ('black', 'numbers')
        ]

        for pair in pairs:
            pair_count = 0
            for sample in self.sample_list:
                if sample == pair:
                    pair_count += 1

            print(pair, pair_count)


def print_variables(color, symbol):
    print(colors[color], symbols[symbol])


sampler = GibbsSampler()
sampler.sample()
sampler.count_results()


Length 37000
('red', 'faces') 4980
('red', 'numbers') 17036
('black', 'faces') 981
('black', 'numbers') 14003

As you can see in the results above, the probability values obtained are very close to the original ones (just divide each count by 1000 to obtain the expected number of cards of each type).
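For reference, multiplying the exact joint probabilities from the table by the 37,000 kept samples gives the counts the sampler should approach. This is a small sketch using only numbers already stated above:

```python
from fractions import Fraction

n_samples = 37000  # samples kept after thinning in the run above

# Exact joint probabilities from the probability table
joint = {
    ('red', 'faces'): Fraction(5, 37),
    ('red', 'numbers'): Fraction(17, 37),
    ('black', 'faces'): Fraction(1, 37),
    ('black', 'numbers'): Fraction(14, 37),
}

for pair, p in joint.items():
    print(pair, int(p * n_samples))  # e.g. ('red', 'faces') 5000
```

Comparing 5000, 17000, 1000, and 14000 with the observed 4980, 17036, 981, and 14003 shows the sampler is within a fraction of a percent of the exact values.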