Hot Hot Hands

An exercise in Kruschke's Doing Bayesian Data Analysis has you build a probability model for pairs of free throws, the second conditioned on the success of the first, and asks whether a "hot hand" phenomenon exists among NBA players.

There's a nice demo running PyMC on IPCluster to solve this problem for many players at once, but why should they have all the fun? Here we work the original Larry Bird hot-hands exercise, with a few bonuses along the way.


In [21]:
# Set up the environment for the computational cells that follow.
# This stuff will not be repeated below; be sure to run this cell first.

%pylab inline
# (the pylab magic provides hist, axvline, annotate, legend, show, mean, etc. used below)

import pymc as mc
import numpy as np
from pymc import deterministic
from IPython.display import Image

class Struct:
    # convenience 'bag of attributes' class
    def __init__(self, **entries): self.__dict__.update(entries)
        
def show_graph(model):
    """Display a graphical representation of the model."""
    graph = mc.graph.graph(model)
    return Image(graph.create_png())  # return the Image so the notebook renders it

def posterior_hist(trace, label="", HDI=.95, alpha=1.0):
    """Render a histogram of the posterior sampling distribution of a model variable."""
    t = sorted(trace)
    hist(t, normed=True, alpha=alpha, label="%s:%.2f" % (label, mean(t)))
    axvline(mean(t), c='red', linestyle='-.')
    # mark a central credible interval (an equal-tailed stand-in for the HDI)
    hdidx = int(len(t)*(1 - HDI)/2.)
    annotate("", xy=(t[hdidx], 0), xytext=(t[-hdidx], 0),
             arrowprops=dict(arrowstyle="|-|", connectionstyle="arc3", color='k', lw=2))

The text presents Larry Bird's free throw data as shot totals following an initial make or miss. We need to synthesize lists of individual throw outcomes (success/failure) matching those totals, to serve as the observed data for our Bernoulli likelihoods:


In [22]:
# Free throw data for Larry Bird, 1980-1982.
#
throws = Struct(
    hit_hit  = [i < 251 for i in range(285)], # outcomes of 285 second throws after a make: 251 hits
    hit_miss = [i < 48 for i in range(53)]    # outcomes of 53 second throws after a miss: 48 hits
)
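As a quick sanity check (a throwaway expression, not one of the numbered cells), evaluating

    sum(throws.hit_hit), len(throws.hit_hit), sum(throws.hit_miss), len(throws.hit_miss)

returns (251, 285, 48, 53), matching the published totals.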

On to the model. We are estimating two parameters: the probabilities of making a second throw after making or missing the first throw. We are given the make/miss counts, which can be treated as Bernoulli trial outcomes. To perform Bayesian estimation of the two probabilities, Kruschke suggests a Beta(30, 10) prior for each, corresponding to the belief that NBA players make about 75% of their throws, since $E[\mathrm{Beta}(\alpha,\beta)] = \alpha/(\alpha+\beta) = 30/40 = 0.75$.
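To sanity-check that prior's shape, a throwaway snippet (assuming scipy is available; the notebook doesn't otherwise use it):

    from scipy.stats import beta
    beta(30, 10).mean(), beta(30, 10).std()   # -> (0.75, ~0.068): centered at 75%, fairly tight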


In [23]:
def make_model():
    """
    Return a dict populated with our model variables.
    """
    # We need to feed a Model constructor a collection of named variables, and the 
    # prettiest way to do that is by exporting a dict of the local scope of a function, 
    # like we do here.

    # Beta priors for our Bernoulli likelihoods. Assume NBA players make 75% of their shots.
    phit_hit =  mc.Beta('phit_hit',  30, 10) # prob. of making second after making first
    phit_miss = mc.Beta('phit_miss', 30, 10) # prob. of making second after missing first

    # likelihood for observed second throws. Each throw is a Bernoulli trial.
    hit_hit = mc.Container(
                [mc.Bernoulli('hit_hit%d' % i, p=phit_hit, value=throw, observed=True) 
                for i, throw in enumerate(throws.hit_hit)])
    hit_miss = mc.Container(
                [mc.Bernoulli('hit_miss%d' % i, p=phit_miss, value=throw, observed=True)
                for i, throw in enumerate(throws.hit_miss)])

    # The statistic we care about is the difference between the hit/miss probabilities
    # You could get a deterministic implicitly with 'diff = phit_hit - phit_miss' but it 
    # would default to having a strange name and trace=False. LinearCombination sets it 
    # up as a one-liner.
    diff = mc.LinearCombination("diff", [phit_hit, phit_miss], [1, -1])
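    # As an aside, the explicit alternative -- which the 'deterministic' import
    # in the setup cell enables -- is a decorated deterministic; a sketch, left
    # commented out so the LinearCombination node above stays in play:
    #
    #   @deterministic
    #   def diff(h=phit_hit, m=phit_miss):
    #       return h - m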
    
    return locals() # give it up for the MC!

# Instantiate and fit the model
S = mc.MCMC(make_model(), db='ram')
mc.MAP(S).fit() # load MAP estimates into our nodes as starting points
S.sample(10000, burn=1000, thin=2)
posterior_hist(S.phit_hit.trace(), "hit_hit", alpha=.5)
posterior_hist(S.phit_miss.trace(), "hit_miss", alpha=.5)
legend(loc='upper left')
show()
posterior_hist(S.diff.trace(), "diff", alpha=.5)
legend(loc='upper left')
show()
S.summary()


[****************100%******************]  10000 of 10000 complete
diff:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	0.022            0.042            0.001            [-0.054  0.106]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	-0.051           -0.006          0.019          0.05          0.111
	

phit_miss:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	0.843            0.039            0.001            [ 0.765  0.911]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	0.758            0.82            0.848          0.87          0.906
	

phit_hit:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	0.865            0.018            0.001            [ 0.829  0.9  ]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	0.826            0.853           0.866          0.876         0.899
	

Notice that a difference of 0 lies within the 95% HDI, so it remains credible that there is no hot hand for Mr. Bird. But I'm bothered by using the same prior for the hit and miss conditions when they have such different numbers of observations. To see this more clearly, consider a reparameterization of Beta in terms of its mean $\mu$ and a "confidence" $K$:

$\mathrm{Beta}(a, b) = \mathrm{Beta}(\mu K,\ (1-\mu)K)$, so that $\mu = a/(a+b)$ and $K = a+b$.

$K$ controls the prior mean's weight against new observations during updating; our Beta(30, 10) prior corresponds to $\mu = .75$, $K = 40$. Since we use the same prior for both throw probabilities, we have built in an initial bias toward these probabilities being equal. But because there are far more hit_hit than hit_miss events (285 vs. 53), their posteriors feel the effect of an equivalent $K$ differently. If 75% is the proper unconditional mean, we should be correspondingly less certain a priori about the hit_miss outcomes; as a rough adjustment, we give that prior one third of the confidence:
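A tiny helper makes the (mu, K) conversion concrete (illustrative only; the model below just inlines the arithmetic):

    def beta_params(mu, K):
        # convert a (mean, confidence) pair to Beta's (alpha, beta)
        return mu * K, (1 - mu) * K

    beta_params(.75, 40)    # -> (30.0, 10.0), the Beta(30, 10) prior used above
    beta_params(.75, 13.3)  # -> (9.975, 3.325), the weaker prior for the miss case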


In [24]:
def make_model():
    """
    Return a dict populated with our model variables.
    """
    # Beta priors for our Bernoulli likelihoods. Assume NBA players make 75% of their shots.
    # We reparameterize Beta in terms of mean (mu) and confidence (K):
    # Beta(a, b) = Beta(mu*K, (1-mu)*K).
    mu, K_hit, K_miss = (.75, 40, 13.3)
    phit_hit = mc.Beta('phit_hit', mu*K_hit, (1-mu)*K_hit)     # prob. of making second after making first
    phit_miss = mc.Beta('phit_miss', mu*K_miss, (1-mu)*K_miss) # prob. of making second after missing first

    # likelihood for observed second throws. Each throw is a Bernoulli trial.
    hit_hit = mc.Container(
                [mc.Bernoulli('hit_hit%d' % i, p=phit_hit, value=throw, observed=True) 
                for i, throw in enumerate(throws.hit_hit)])
    hit_miss = mc.Container(
                [mc.Bernoulli('hit_miss%d' % i, p=phit_miss, value=throw, observed=True)
                for i, throw in enumerate(throws.hit_miss)])

    # The statistic we care about is the difference between the hit/miss probabilities
    diff = mc.LinearCombination("diff", [phit_hit, phit_miss], [1, -1])
    
    return locals()

# Instantiate and fit the model
S = mc.MCMC(make_model(), db='ram')
mc.MAP(S).fit() # load MAP estimates into our nodes as starting points
S.sample(10000, burn=1000, thin=2)
posterior_hist(S.phit_hit.trace(), "hit_hit", alpha=.5)
posterior_hist(S.phit_miss.trace(), "hit_miss", alpha=.5)
legend()
show()
posterior_hist(S.diff.trace(), "diff", alpha=.5)
legend()
show()
S.summary()


[****************100%******************]  10000 of 10000 complete
phit_miss:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	0.875            0.039            0.002            [ 0.8    0.941]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	0.792            0.849           0.881          0.903         0.934
	

phit_hit:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	0.865            0.02             0.001            [ 0.826  0.904]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	0.825            0.852           0.867          0.879         0.903
	

diff:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	-0.01            0.042            0.002            [-0.087  0.076]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	-0.084           -0.037          -0.014         0.018         0.081
	

Wha? The miss density is centered farther right than the hit density? Consider the original raw data: 48/53 is indeed greater than 251/285. The only reason our initial estimate showed a slightly hot hand is the unequal shrinkage toward the shared prior caused by the difference in sample sizes!
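Spelled out (float() guards against Python 2 integer division):

    sum(throws.hit_miss) / float(len(throws.hit_miss))   # -> ~0.906 after a miss
    sum(throws.hit_hit) / float(len(throws.hit_hit))     # -> ~0.881 after a hit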

Continuing this example, recall that the count of successes in $N$ Bernoulli($p$) trials (like our throws here) is distributed as Binomial($N, p$). If we specify a Binomial likelihood for the throw counts instead of a product of $N$ Bernoulli likelihoods for the individual throws, we should get the same estimate of the difference: the Binomial pmf differs from the Bernoulli product only by the binomial coefficient, a constant that cancels when the posterior is normalized.
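A quick numerical check of that identity (again assuming scipy, which is not otherwise loaded here):

    from scipy import special, stats

    n, k, p = 285, 251, .87                      # an arbitrary test point
    bernoulli_product = p**k * (1 - p)**(n - k)  # product of n individual Bernoulli pmfs
    np.allclose(stats.binom.pmf(k, n, p),
                special.binom(n, k) * bernoulli_product)   # -> True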


In [25]:
def make_model():
    """
    Return a dict populated with our model variables.
    """
    # Beta priors for our Binomial likelihoods. Assume NBA players make 75% of their shots.
    mu, K_hit, K_miss = (.75, 40, 13.3)
    phit_hit = mc.Beta('phit_hit', mu*K_hit, (1-mu)*K_hit)     # prob. of making second after making first
    phit_miss = mc.Beta('phit_miss', mu*K_miss, (1-mu)*K_miss) # prob. of making second after missing first

    # likelihood for the observed number of second-throw successes. Each count is ~ Binomial.
    hit_hit = mc.Binomial('hit_hit', n=len(throws.hit_hit), p=phit_hit, value=sum(throws.hit_hit), observed=True)
    hit_miss = mc.Binomial('hit_miss', n=len(throws.hit_miss), p=phit_miss, value=sum(throws.hit_miss), observed=True)

    # The statistic we care about is the difference between the hit/miss probabilities
    diff = mc.LinearCombination("diff", [phit_hit, phit_miss], [1, -1])
    
    return locals()

# Instantiate and fit the model
S = mc.MCMC(make_model(), db='ram')
mc.MAP(S).fit() # load MAP estimates into our nodes as starting points
S.sample(10000, burn=1000, thin=2)
posterior_hist(S.phit_hit.trace(), "hit_hit", alpha=.5)
posterior_hist(S.phit_miss.trace(), "hit_miss", alpha=.5)
legend(loc='upper left')
show()
posterior_hist(S.diff.trace(), "diff", alpha=.5)
legend()
show()
S.summary()


[****************100%******************]  10000 of 10000 complete
phit_hit:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	0.863            0.019            0.001            [ 0.823  0.901]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	0.823            0.852           0.863          0.876         0.901
	

phit_miss:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	0.875            0.042            0.002            [ 0.782  0.942]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	0.776            0.85            0.879          0.905         0.94
	

diff:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	-0.011           0.046            0.002            [-0.102  0.083]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	-0.095           -0.043          -0.016         0.016         0.097
	

There is a very small difference between this and the Bernoulli-likelihood results, but it is just Monte Carlo sampling noise rather than a modeling change: since the Binomial likelihood equals the product of the individual Bernoulli likelihoods up to a constant factor, the two models have identical posteriors, and K needs no recalibration for the repackaging.
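In fact, Beta priors are conjugate to the Binomial, so we can check the sampled means against the exact posterior (a throwaway computation, not one of the numbered cells): a Beta(a, b) prior plus k successes in n trials yields a Beta(a + k, b + n - k) posterior, with mean (a + k)/(a + b + n).

    mu, K_hit, K_miss = .75, 40, 13.3
    (mu*K_hit + 251) / (K_hit + 285)   # -> ~0.865, matching phit_hit's sampled mean
    (mu*K_miss + 48) / (K_miss + 53)   # -> ~0.874, matching phit_miss's sampled mean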

