2015 August 28

## Agenda

1. A simulation approach to building intuition about A/B testing
2. Practical requirements of testing and operation
3. Optimizing outcomes with multiple options with different payoffs

## 1. A simulation approach to building intuition about A/B testing

### A/B Test:

• Treatment (e.g. content presented)
• Reward (e.g. impression, click, proceeds of sale)
• Audience-in-context


In [ ]:

from IPython.display import Image
Image(filename='img/treat_aud_reward.jpg')




In [ ]:

import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import pandas as pd
from numpy.random import binomial
from ggplot import *
import random
import sys
plt.figure(figsize=(6,6),dpi=80);
%matplotlib inline



### Simulating A/B testing



In [ ]:

Image(filename='img/ab.jpg')



Each test is like flipping a fair coin N times



In [ ]:

Image(filename='img/a.jpg')




In [ ]:

df = pd.DataFrame({"coin_toss":binomial(1,0.5,100)})
df.hist()
plt.show()




In [ ]:




In [ ]:

df = pd.DataFrame({"coin_toss":binomial(1,0.6,100)})
df.hist()
plt.show()




In [ ]:

df = pd.DataFrame({"coin_%i"%i:binomial(1,0.5,100) for i in range(20)})
df.hist()
plt.show()




In [ ]:

df = pd.DataFrame({"coin_%i"%i:binomial(1,0.52,100) for i in range(20)})
df.hist()
plt.show()



### Rewards for choosing A or B



In [ ]:

# 1 arm
payoff = [-0.1,0.5]
a = np.bincount(binomial(1,0.5,100))
print np.dot(a, payoff)



### Rewards for choosing A or B or C



In [ ]:

# 2-arm, equal reward
payoff = [0,1,2]
a = np.bincount(binomial(2,0.5,100))
print np.dot(a, payoff)



### Simulating multiple options with different payoffs



In [ ]:

payoff=[1,1.05]
reward1 = np.bincount(binomial(1,0.5,100))[1] * payoff[0]
print reward1
reward2 = np.bincount(binomial(1,0.5,100))[1] * payoff[1]
print reward2
total_reward = reward1 + reward2
print total_reward



### Can we differentiat two arms with different payoffs?



In [ ]:

def trial(payoff=[1, 1.01]):
reward1 = np.bincount(binomial(1,0.5,100))[1] * payoff[0]
reward2 = np.bincount(binomial(1,0.5,100))[1] * payoff[1]
return reward1, reward2, reward1 + reward2, reward1-reward2

trials = 100
sim = np.array([trial() for i in range(trials)])
df = pd.DataFrame(sim, columns=["t1", "t2", "tot", "diff"])
print len(df[df["diff"] <= 0.0])
df.hist()
plt.show()



### Can we differentiat two arms with different probabilities?



In [ ]:

def trial(ps=[0.5, 0.51], payoff=[1, 1]):
reward1 = np.bincount(binomial(1,ps[0],100))[1] * payoff[0]
reward2 = np.bincount(binomial(1,ps[1],100))[1] * payoff[1]
return reward1, reward2, reward1 + reward2, reward1-reward2
trials = 100
sim = np.array([trial() for i in range(trials)])
df = pd.DataFrame(sim, columns=["t1", "t2", "tot", "diff"])
print len(df[df["diff"] <= 0.0])
df.hist()
plt.show()



### More Arms



In [ ]:

Image(filename='img/abcd.jpg')




In [ ]:

df = pd.DataFrame({"tot_reward":binomial(2,0.5,100)})
df.hist()
plt.show()




In [ ]:

df = pd.DataFrame({"tot_reward":binomial(3,0.5,100)})
df.hist()
plt.show()




In [ ]:

trials = 100
means = [0.1, 0.1, 0.9]
reward = np.zeros(trials)
for m in means:
# equal rewards of 1 or 0
reward += binomial(1,m,trials)
df = pd.DataFrame({"reward":reward, "fair_reward":binomial(3,0.5,trials)})
df.hist()
plt.show()



## 2. Practical requirements of testing and operation

### Explore vs Exploit?

• Marketer is going to have a hard time waiting for rigorous testing as winners appear to emerge
• Implement online system vs. testing system
• Flavor: Chase successes vs. analyzing failures
• Want to automate balance of explore and exploit and run continuously
• Some danger in forgetting about significance and power

### Bandit Book Utilities

from "Bandit Algorithms" by John Myles White

Three utilities:

• arms
• algorithms
• monte carlo testing framework


In [ ]:

sys.path.append('../../BanditsBook/python')
from core import *



## 3. Optimizing outcomes with multiple options with different payoffs

### Epsilon Greedy

1. Split on audience + context + creative with fraction
2. Keep track of past results
3. Split experiment and exploit offerings with probability epsilon

Notes:

• epsilon is the fraction of exploration
• randomize everything all the time


In [ ]:

random.seed(1)
# Mean arm payoff (Bernoulli)
means = [0.1, 0.1, 0.1, 0.1, 0.9]
# Mulitple arms!
n_arms = len(means)
random.shuffle(means)
arms = map(lambda (mu): BernoulliArm(mu), means)
print("Best arm is " + str(ind_max(means)))

t_horizon = 250
n_sims = 1000

data = []
for epsilon in [0.1, 0.2, 0.3, 0.4, 0.5]:
algo = EpsilonGreedy(epsilon, [], [])
algo.initialize(n_arms)
# results are column oriented
# simulation_num, time, chosen arm, reward, cumulative reward
results = test_algorithm(algo, arms, n_sims, t_horizon)
results.append([epsilon]*len(results[0]))
data.extend(np.array(results).T)

df = pd.DataFrame(data, columns = ["Sim", "T", "ChosenArm", "Reward", "CumulativeReward", "Epsilon"])




In [ ]:

a=df.groupby(["Epsilon", "T"]).mean().reset_index()




In [ ]:

ggplot(aes(x="T",y="Reward", color="Epsilon"), data=a) + geom_line()




In [ ]:

ggplot(aes(x="T",y="CumulativeReward", color="Epsilon"), data=a) + geom_line()



### Anealing Softmax

Upgrades to $\epsilon$-Greedy:

• Need to run more experiments if rewards appear to be nearly equal
• Keep track of results for exploration as well as exploitation

Tempted choose each are in proportion to its current value, i.e.:

$p(A) \propto \frac{rA}{rA + RB}$

$p(B) \propto \frac{rB}{rA + RB}$

Remember Boltzmann, and about adding a temperature, $\tau$:

$p(A) \propto \frac{-\exp(rA/\tau)}{(\exp(rA/\tau) + \exp(rB/\tau))}$

$p(B) \propto \frac{-\exp(rB/\tau)}{(\exp(rA/\tau) + \exp(rB/\tau))}$

And what is annealing?

• $\tau = 0$ is deterministic case of winner takes all
• $\tau = \infty$ is all random, all time
• Let the temperature go to zero over time to settle into the state slowly (adiabatically)


In [ ]:

t_horizon = 250
n_sims = 1000

algo = AnnealingSoftmax([], [])
algo.initialize(n_arms)
data = np.array(test_algorithm(algo, arms, n_sims, t_horizon)).T
df = pd.DataFrame(data)
df.columns = ["Sim", "T", "ChosenArm", "Reward", "CumulativeReward"]
a=df.groupby(["T"]).mean().reset_index()




In [ ]:

ggplot(aes(x="T",y="Reward", color="Sim"), data=a) + geom_line()




In [ ]:

ggplot(aes(x="T",y="CumulativeReward", color="Sim"), data=a) + geom_line()



### UCB2

Add a confidnece measure to our estimates of averages!



In [ ]:

t_horizon = 250
n_sims = 1000

data = []
for alpha in [0.1, 0.3, 0.5, 0.7, 0.9]:
algo = UCB2(alpha, [], [])
algo.initialize(n_arms)
results = test_algorithm(algo, arms, n_sims, t_horizon)
results.append([alpha]*len(results[0]))
data.extend(np.array(results).T)

df = pd.DataFrame(data, columns = ["Sim", "T", "ChosenArm", "Reward", "CumulativeReward", "Alpha"])




In [ ]:

a=df.groupby(["Alpha", "T"]).mean().reset_index()




In [ ]:

ggplot(aes(x="T",y="Reward", color="Alpha"), data=a) + geom_line()




In [ ]:

ggplot(aes(x="T",y="CumulativeReward", color="Alpha"), data=a) + geom_line()



The end