2015 August 28
This section isn't a cookbook for A/B testing. Rather, I am pointing to some key aspects of how we design and analyse A/B tests that will be useful when we get to the section on bandit algorithms.
In [ ]:
from IPython.display import Image
Image(filename='img/treat_aud_reward.jpg')
In [ ]:
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import pandas as pd
from numpy.random import binomial
from ggplot import *
import random
import sys
plt.figure(figsize=(6,6),dpi=80);
%matplotlib inline
In [ ]:
Image(filename='img/ab.jpg')
Each test is like flipping a fair coin N times
In [ ]:
Image(filename='img/a.jpg')
In [ ]:
# This is A/ testing!
# This is the result of 1 arm, 100 trials
df = pd.DataFrame({"coin_toss":binomial(1,0.5,100)})
df.hist()
# Everyone got the same treatment, this is the distribution of the outcome
# reward is the total height of the right-hand bar
plt.show()
Run the cell above a few times.
Can you easily make the case that the average reward is 50?
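One way to answer is to repeat the 100-flip experiment many times and look at how the total reward spreads out. A minimal sketch, reusing the imports above (the 1,000 repeats are an arbitrary choice):
In [ ]:
# Repeat the 100-flip experiment many times and summarize the total reward.
n_repeats = 1000
rewards = np.array([binomial(1, 0.5, 100).sum() for _ in range(n_repeats)])
print("Mean reward over {} experiments = {:.1f}".format(n_repeats, rewards.mean()))
print("Std dev of the reward = {:.2f}".format(rewards.std()))
# For a fair coin the expected reward is 50 with standard deviation
# sqrt(100 * 0.5 * 0.5) = 5, so a single 100-flip experiment can easily
# land anywhere from ~40 to ~60.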
In [ ]:
# every sample is 0/1, heads or tails
df.head()
In [ ]:
# now with a high probability of heads
df = pd.DataFrame({"coin_toss":binomial(1,0.6,100)})
df.hist()
plt.show()
In [ ]:
# Compare the variability across many different experiments
# of 100 flips each (variability of the mean)
df = pd.DataFrame({"coin_%i"%i:binomial(1,0.5,100) for i in range(20)})
df.hist()
plt.show()
In [ ]:
# Can we distinguish a small difference in probability?
df = pd.DataFrame({"coin_%i"%i:binomial(1,0.52,100) for i in range(20)})
df.hist()
plt.show()
In [ ]:
# 1 arm
payoff = [-0.1,0.5]
a = np.bincount(binomial(1,0.5,100))
print "Number of 0s and 1s:", a
print "Total reward with pay off specified =", np.dot(a, payoff)
In [ ]:
# 2-arm, equal unity reward per coin
# (4 raw outcomes, but (1,0) and (0,1) earn the same with this payoff vector)
payoff = [0,1,2]
a = np.bincount(binomial(2,0.5,100))
print a
print np.dot(a, payoff)
But more often, the two arms do not pay off equally.
For example, imagine the case of tweet engagement.
In [ ]:
payoff1=[0,1]
reward1 = np.dot(np.bincount(binomial(1,0.5,100)), payoff1)
print "Arm A reward = ", reward1
payoff2=[0,1.05]
reward2 = np.dot(np.bincount(binomial(1,0.5,100)), payoff2)
print "Arm B reward = ", reward2
total_reward = reward1 + reward2
print "Total reward for arms A and B = ", total_reward
Why worry about the total reward? I thought we wanted to know if A > B?
Hold that thought for a couple more cells...
From now on, assume reward = 0 for outcome 0 to keep things a little simpler. Everything we do can be generalized to a reward vector as above.
Let's ask about choosing a winner.
In [ ]:
def a_b_test(one_payoff=[1, 1.01]):
# assume payoff for outcome 0 is 0
reward1 = np.bincount(binomial(1,0.5,100))[1] * one_payoff[0]
reward2 = np.bincount(binomial(1,0.5,100))[1] * one_payoff[1]
return reward1, reward2, reward1 + reward2, reward1-reward2
n_tests = 1000
sim = np.array([a_b_test() for i in range(n_tests)])
df = pd.DataFrame(sim, columns=["t1", "t2", "tot", "diff"])
print "Number of tests in which Arm B won (expect > {} because of payoff) = {}".format(
n_tests/2
, len(df[df["diff"] <= 0.0]))
df.hist()
plt.show()
Now is the time to reach for your power and significance-testing expertise.
We are going to continue building intuition through simulation.
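If you do want a classical yardstick here, a minimal sketch (assuming scipy is available; a chi-squared test on one simulated A/B pair is just one choice):
In [ ]:
# A classical significance test on one simulated A/B comparison.
# The 2x2 table is [heads, tails] per arm.
from scipy.stats import chi2_contingency
heads_a = binomial(1, 0.50, 100).sum()
heads_b = binomial(1, 0.51, 100).sum()
table = [[heads_a, 100 - heads_a],
         [heads_b, 100 - heads_b]]
chi2, p_value, dof, expected = chi2_contingency(table)
print("p-value = {:.3f}".format(p_value))
# With 100 flips per arm and a 1% difference in probability, this p-value
# will rarely be small -- which is exactly the power problem.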
In [ ]:
def a_b_test(ps=[0.5, 0.51], one_payoff=[1, 1]):
reward1 = np.bincount(binomial(1,ps[0],100))[1] * one_payoff[0]
reward2 = np.bincount(binomial(1,ps[1],100))[1] * one_payoff[1]
return reward1, reward2, reward1 + reward2, reward1-reward2
n_tests= 100
sim = np.array([a_b_test() for i in range(n_tests)])
df = pd.DataFrame(sim, columns=["t1", "t2", "tot", "diff"])
print "Number of tests in which Arm B won (expect > {} because of probability) = {}".format(
n_tests/2
, len(df[df["diff"] <= 0.0]))
df.hist()
plt.show()
In [ ]:
Image(filename='img/abcd.jpg')
In [ ]:
# repeating what we did before with equal payoff, more arms
# remember the degenerate outcomes
df = pd.DataFrame({"tot_reward":binomial(2,0.5,100)})
df.hist()
plt.show()
In [ ]:
# ok, now with 4
df = pd.DataFrame({"tot_reward":binomial(4,0.5,100)})
df.hist()
plt.show()
In [ ]:
# a little more practice with total reward distribution
trials = 100
probabilities = [0.1, 0.1, 0.9]
reward = np.zeros(trials)
for m in probabilities:
# equal rewards of 1 or 0
reward += binomial(1,m,trials)
df = pd.DataFrame({"reward":reward, "fair__uniform_reward":binomial(3,0.5,trials)})
df.hist()
plt.show()
So maybe set some new objectives instead of just asking whether A > B: for example, earn as much total reward as possible while the test runs.
Careful: always playing the arm that currently looks best can starve the exploration needed to find the arm that actually is best.
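One standard way to state the new objective (an addition here, but consistent with the cumulative-reward plots that follow): maximize the total reward collected over the $T$ rounds of the test,
$\text{TotalReward}(T) = \sum_{t=1}^{T} r_t$
where $r_t$ is the reward received at round $t$. Equivalently, minimize the gap $T\,\mu^{*} - \sum_{t=1}^{T} r_t$ relative to always playing the best arm with mean $\mu^{*}$.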
In [ ]:
sys.path.append('../../BanditsBook/python')
from core import *
How can we explore arms and exploit the best arm more often, but still explore?
Answer 1: occasionally, we randomly explore losers.
Notes:
What's a good value for $\epsilon$?
What are these parameters when one of the options is 9 times better than all of the others?
Needs a simulation!
(To keep it simple, outcome 1 has reward 1 and outcome 0 has reward 0.)
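Before calling the library, here is a minimal sketch of the $\epsilon$-greedy selection rule itself (the estimated values and the default $\epsilon$ = 0.1 are illustrative, not taken from the library):
In [ ]:
# Minimal epsilon-greedy selection: exploit the best-looking arm most of
# the time, explore a uniformly random arm with probability epsilon.
def epsilon_greedy_choice(estimated_values, epsilon=0.1):
    if random.random() < epsilon:
        return random.randrange(len(estimated_values))  # explore
    return int(np.argmax(estimated_values))             # exploit

# Illustrative running estimates for 5 arms (made up, not from the library).
estimated_values = [0.12, 0.08, 0.11, 0.10, 0.85]
print([epsilon_greedy_choice(estimated_values) for _ in range(20)])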
In [ ]:
random.seed(1)
# Mean (arm probabilities) (Bernoulli)
means = [0.1, 0.1, 0.1, 0.1, 0.9]
# Multiple arms!
n_arms = len(means)
random.shuffle(means)
arms = [BernoulliArm(mu) for mu in means]
print("Best arm is " + str(ind_max(means)))
t_horizon = 250
n_sims = 1000
data = []
for epsilon in [0.1, 0.2, 0.3, 0.4, 0.5]:
algo = EpsilonGreedy(epsilon, [], [])
algo.initialize(n_arms)
# results are column oriented
# simulation_num, time, chosen arm, reward, cumulative reward
results = test_algorithm(algo, arms, n_sims, t_horizon)
results.append([epsilon]*len(results[0]))
data.extend(np.array(results).T)
df = pd.DataFrame(data
, columns = ["Sim"
, "T"
, "ChosenArm"
, "Reward"
, "CumulativeReward"
, "Epsilon"])
df.head()
In [ ]:
a=df.groupby(["Epsilon", "T"]).mean().reset_index()
a.head()
In [ ]:
ggplot(aes(x="T",y="Reward", color="Epsilon"), data=a) + geom_line()
In [ ]:
ggplot(aes(x="T",y="CumulativeReward", color="Epsilon"), data=a) + geom_line()
Upgrades to $\epsilon$-Greedy:
Tempted to choose each arm in proportion to its current value, i.e.:
$p(A) \propto \frac{r_A}{r_A + r_B}$
$p(B) \propto \frac{r_B}{r_A + r_B}$
Remember Boltzmann, and think about adding a temperature, $\tau$:
$p(A) \propto \frac{\exp(r_A/\tau)}{\exp(r_A/\tau) + \exp(r_B/\tau)}$
$p(B) \propto \frac{\exp(r_B/\tau)}{\exp(r_A/\tau) + \exp(r_B/\tau)}$
And what is annealing?
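Annealing means lowering the temperature $\tau$ as trials accumulate, so the algorithm explores broadly early on and exploits later. A minimal sketch (the $1/\log(t)$ schedule is one common choice, assumed here rather than read from the library's AnnealingSoftmax):
In [ ]:
import math
# Softmax (Boltzmann) selection with an annealed temperature:
# a high tau early on gives near-uniform exploration,
# a low tau later gives near-greedy exploitation.
def annealing_softmax_choice(estimated_values, t):
    tau = 1.0 / math.log(t + 1.0001)   # one common annealing schedule (an assumption here)
    weights = [math.exp(v / tau) for v in estimated_values]
    total = sum(weights)
    r = random.random()
    cumulative = 0.0
    for arm, w in enumerate(weights):
        cumulative += w / total
        if r <= cumulative:
            return arm
    return len(weights) - 1

estimated_values = [0.12, 0.08, 0.11, 0.10, 0.85]
print([annealing_softmax_choice(estimated_values, t) for t in range(1, 21)])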
In [ ]:
t_horizon = 250
n_sims = 1000
algo = AnnealingSoftmax([], [])
algo.initialize(n_arms)
data = np.array(test_algorithm(algo, arms, n_sims, t_horizon)).T
df = pd.DataFrame(data)
#df.head()
df.columns = ["Sim", "T", "ChosenArm", "Reward", "CumulativeReward"]
df.head()
a=df.groupby(["T"]).mean().reset_index()
a.head()
In [ ]:
ggplot(aes(x="T",y="Reward", color="Sim"), data=a) + geom_line()
In [ ]:
ggplot(aes(x="T",y="CumulativeReward", color="Sim"), data=a) + geom_line()
In [ ]:
t_horizon = 250
n_sims = 1000
data = []
for alpha in [0.1, 0.3, 0.5, 0.7, 0.9]:
algo = UCB2(alpha, [], [])
algo.initialize(n_arms)
results = test_algorithm(algo, arms, n_sims, t_horizon)
results.append([alpha]*len(results[0]))
data.extend(np.array(results).T)
df = pd.DataFrame(data, columns = ["Sim", "T", "ChosenArm", "Reward", "CumulativeReward", "Alpha"])
df.head()
In [ ]:
a=df.groupby(["Alpha", "T"]).mean().reset_index()
a.head()
In [ ]:
ggplot(aes(x="T",y="Reward", color="Alpha"), data=a) + geom_line()
In [ ]:
ggplot(aes(x="T",y="CumulativeReward", color="Alpha"), data=a) + geom_line()
The end