Code and exercises from my workshop on Bayesian statistics in Python.
Copyright 2018 Allen Downey
MIT License: https://opensource.org/licenses/MIT
In [ ]:
from __future__ import print_function, division
%matplotlib inline
import numpy as np
from thinkbayes2 import Suite
import thinkplot
import warnings
warnings.filterwarnings('ignore')
In [ ]:
class Bandit(Suite):
def Likelihood(self, data, hypo):
"""
hypo is the prob of win (0-100)
data is a string, either 'W' or 'L'
"""
x = hypo / 100
if data == 'W':
return x
else:
return 1-x
We'll start with a uniform distribution from 0 to 100.
In [ ]:
bandit = Bandit(range(101))
thinkplot.Pdf(bandit)
thinkplot.Config(xlabel='x', ylabel='Probability')
Now we can update with a single loss:
In [ ]:
bandit.Update('L')
thinkplot.Pdf(bandit)
thinkplot.Config(xlabel='x', ylabel='Probability', legend=False)
Another loss:
In [ ]:
bandit.Update('L')
thinkplot.Pdf(bandit)
thinkplot.Config(xlabel='x', ylabel='Probability', legend=False)
And a win:
In [ ]:
bandit.Update('W')
thinkplot.Pdf(bandit)
thinkplot.Config(xlabel='x', ylabel='Probability', legend=False)
Starting over, here's what it looks like after 1 win and 9 losses.
In [ ]:
bandit = Bandit(range(101))
for outcome in 'WLLLLLLLLL':
bandit.Update(outcome)
thinkplot.Pdf(bandit)
thinkplot.Config(xlabel='x', ylabel='Probability', legend=False)
The posterior mean is about 17%
In [ ]:
bandit.Mean()
The most likely value is the observed proportion 1/10
In [ ]:
bandit.MAP()
The posterior credible interval has a 90% chance of containing the true value (provided that the prior distribution truly represents our background knowledge).
In [ ]:
bandit.CredibleInterval(90)
Now suppose we have several bandits and we want to decide which one to play.
For this example, we have 4 machines with these probabilities:
In [ ]:
actual_probs = [0.10, 0.20, 0.30, 0.40]
The following function simulates playing one machine once.
In [ ]:
from random import random
from collections import Counter
counter = Counter()
def flip(p):
return random() < p
def play(i):
counter[i] += 1
p = actual_probs[i]
if flip(p):
return 'W'
else:
return 'L'
Here's a test, playing machine 3 twenty times:
In [ ]:
for i in range(20):
result = play(3)
print(result, end=' ')
Now I'll make 4 Bandit
objects to represent our beliefs about the 4 machines.
In [ ]:
prior = range(101)
beliefs = [Bandit(prior) for i in range(4)]
This function displays the four posterior distributions
In [ ]:
options = dict(yticklabels='invisible')
def plot(beliefs, **options):
thinkplot.preplot(rows=2, cols=2)
for i, b in enumerate(beliefs):
thinkplot.subplot(i+1)
thinkplot.Pdf(b, label=i)
thinkplot.Config(**options)
In [ ]:
plot(beliefs, legend=True)
Now suppose we play each machine 10 times. This function updates our beliefs about one of the machines based on one outcome.
In [ ]:
def update(beliefs, i, outcome):
beliefs[i].Update(outcome)
In [ ]:
for i in range(4):
for _ in range(10):
outcome = play(i)
update(beliefs, i, outcome)
In [ ]:
plot(beliefs, legend=True)
After playing each machine 10 times, we have some information about their probabilies:
In [ ]:
[belief.Mean() for belief in beliefs]
To get more information, we could play each machine 100 times, but while we are gathering data, we are not making good use of it. The kernel of the Bayesian Bandits algorithm is that is collects and uses data at the same time. In other words, it balances exploration and exploitation.
The following function chooses among the machines so that the probability of choosing each machine is proportional to its "probability of superiority".
Random
chooses a value from the posterior distribution.
argmax
returns the index of the machine that chose the highest value.
In [ ]:
def choose(beliefs):
ps = [b.Random() for b in beliefs]
return np.argmax(ps)
Here's an example.
In [ ]:
choose(beliefs)
Putting it all together, the following function chooses a machine, plays once, and updates beliefs
:
In [ ]:
def choose_play_update(beliefs, verbose=False):
i = choose(beliefs)
outcome = play(i)
update(beliefs, i, outcome)
if verbose:
print(i, outcome, beliefs[i].Mean())
Here's an example
In [ ]:
counter = Counter()
choose_play_update(beliefs, verbose=True)
Let's start again with a fresh set of machines:
In [ ]:
beliefs = [Bandit(prior) for i in range(4)]
Now we can play a few times and see how beliefs
gets updated:
In [ ]:
num_plays = 100
for i in range(num_plays):
choose_play_update(beliefs)
plot(beliefs)
We can summarize beliefs
by printing the posterior mean and credible interval:
In [ ]:
for i, b in enumerate(beliefs):
print(b.Mean(), b.CredibleInterval(90))
The credible intervals usually contain the true values (10, 20, 30, and 40).
The estimates are still rough, especially for the lower-probability machines. But that's a feature, not a bug: the goal is to play the high-probability machines most often. Making the estimates more precise is a means to that end, but not an end itself.
Let's see how many times each machine got played. If things go according to play, the machines with higher probabilities should get played more often.
In [ ]:
for machine, count in sorted(counter.items()):
print(machine, count)
Exercise: Go back and run this section again with a different value of num_play
and see how it does.
In [ ]: