Code and exercises from my workshop on Bayesian statistics in Python.

``````

In [ ]:

from __future__ import print_function, division

%matplotlib inline

import numpy as np

from thinkbayes2 import Suite
import thinkplot

import warnings
warnings.filterwarnings('ignore')

``````

## The likelihood function

Here's a definition for `Bandit`, which extends `Suite` and defines a likelihood function that computes the probability of the data (win or lose) for a given value of `x` (the probability of win).

Note that `hypo` is in the range 0 to 100.

``````

In [ ]:

class Bandit(Suite):

def Likelihood(self, data, hypo):
"""
hypo is the prob of win (0-100)
data is a string, either 'W' or 'L'
"""
x = hypo / 100
if data == 'W':
return x
else:
return 1-x

``````

``````

In [ ]:

bandit = Bandit(range(101))
thinkplot.Pdf(bandit)
thinkplot.Config(xlabel='x', ylabel='Probability')

``````

Now we can update with a single loss:

``````

In [ ]:

bandit.Update('L')
thinkplot.Pdf(bandit)
thinkplot.Config(xlabel='x', ylabel='Probability', legend=False)

``````

Another loss:

``````

In [ ]:

bandit.Update('L')
thinkplot.Pdf(bandit)
thinkplot.Config(xlabel='x', ylabel='Probability', legend=False)

``````

And a win:

``````

In [ ]:

bandit.Update('W')
thinkplot.Pdf(bandit)
thinkplot.Config(xlabel='x', ylabel='Probability', legend=False)

``````

Starting over, here's what it looks like after 1 win and 9 losses.

``````

In [ ]:

bandit = Bandit(range(101))

for outcome in 'WLLLLLLLLL':
bandit.Update(outcome)

thinkplot.Pdf(bandit)
thinkplot.Config(xlabel='x', ylabel='Probability', legend=False)

``````

The posterior mean is about 17%

``````

In [ ]:

bandit.Mean()

``````

The most likely value is the observed proportion 1/10

``````

In [ ]:

bandit.MAP()

``````

The posterior credible interval has a 90% chance of containing the true value (provided that the prior distribution truly represents our background knowledge).

``````

In [ ]:

bandit.CredibleInterval(90)

``````

## Multiple bandits

Now suppose we have several bandits and we want to decide which one to play.

For this example, we have 4 machines with these probabilities:

``````

In [ ]:

actual_probs = [0.10, 0.20, 0.30, 0.40]

``````

The following function simulates playing one machine once.

``````

In [ ]:

from random import random
from collections import Counter

counter = Counter()

def flip(p):
return random() < p

def play(i):
counter[i] += 1
p = actual_probs[i]
if flip(p):
return 'W'
else:
return 'L'

``````

Here's a test, playing machine 3 twenty times:

``````

In [ ]:

for i in range(20):
result = play(3)
print(result, end=' ')

``````

Now I'll make 4 `Bandit` objects to represent our beliefs about the 4 machines.

``````

In [ ]:

prior = range(101)
beliefs = [Bandit(prior) for i in range(4)]

``````

This function displays the four posterior distributions

``````

In [ ]:

options = dict(yticklabels='invisible')

def plot(beliefs, **options):
thinkplot.preplot(rows=2, cols=2)
for i, b in enumerate(beliefs):
thinkplot.subplot(i+1)
thinkplot.Pdf(b, label=i)
thinkplot.Config(**options)

``````
``````

In [ ]:

plot(beliefs, legend=True)

``````

Now suppose we play each machine 10 times. This function updates our beliefs about one of the machines based on one outcome.

``````

In [ ]:

def update(beliefs, i, outcome):
beliefs[i].Update(outcome)

``````
``````

In [ ]:

for i in range(4):
for _ in range(10):
outcome = play(i)
update(beliefs, i, outcome)

``````
``````

In [ ]:

plot(beliefs, legend=True)

``````

After playing each machine 10 times, we have some information about their probabilies:

``````

In [ ]:

[belief.Mean() for belief in beliefs]

``````

## Bayesian Bandits

To get more information, we could play each machine 100 times, but while we are gathering data, we are not making good use of it. The kernel of the Bayesian Bandits algorithm is that is collects and uses data at the same time. In other words, it balances exploration and exploitation.

The following function chooses among the machines so that the probability of choosing each machine is proportional to its "probability of superiority".

`Random` chooses a value from the posterior distribution.

`argmax` returns the index of the machine that chose the highest value.

``````

In [ ]:

def choose(beliefs):
ps = [b.Random() for b in beliefs]
return np.argmax(ps)

``````

Here's an example.

``````

In [ ]:

choose(beliefs)

``````

Putting it all together, the following function chooses a machine, plays once, and updates `beliefs`:

``````

In [ ]:

def choose_play_update(beliefs, verbose=False):
i = choose(beliefs)
outcome = play(i)
update(beliefs, i, outcome)
if verbose:
print(i, outcome, beliefs[i].Mean())

``````

Here's an example

``````

In [ ]:

counter = Counter()
choose_play_update(beliefs, verbose=True)

``````

## Trying it out

Let's start again with a fresh set of machines:

``````

In [ ]:

beliefs = [Bandit(prior) for i in range(4)]

``````

Now we can play a few times and see how `beliefs` gets updated:

``````

In [ ]:

num_plays = 100

for i in range(num_plays):
choose_play_update(beliefs)

plot(beliefs)

``````

We can summarize `beliefs` by printing the posterior mean and credible interval:

``````

In [ ]:

for i, b in enumerate(beliefs):
print(b.Mean(), b.CredibleInterval(90))

``````

The credible intervals usually contain the true values (10, 20, 30, and 40).

The estimates are still rough, especially for the lower-probability machines. But that's a feature, not a bug: the goal is to play the high-probability machines most often. Making the estimates more precise is a means to that end, but not an end itself.

Let's see how many times each machine got played. If things go according to play, the machines with higher probabilities should get played more often.

``````

In [ ]:

for machine, count in sorted(counter.items()):
print(machine, count)

``````

Exercise: Go back and run this section again with a different value of `num_play` and see how it does.

``````

In [ ]:

``````