Overdispersion for binomial

Let's look at an example of how overdispersion can happen in a binomial setting and how to detect it. Consider three binomial random variables, and three vectors of trials, one with plow = .1, one with phigh = .9, and one with pmiddle = .5phigh + .5 plow = .5.



In [41]:

    
trials <- 500
n <- 100
plow <- 0.1 
xlow <- rbinom(trials,n,plow)
phigh <- 0.9
xhigh <- rbinom(trials,n,phigh)
xmixed <- c(x,y)
trials <- 1000
pmiddle <- 0.5 
xmiddle <- rbinom(trials,n,pmiddle)

The means and hence mles are are what we expect.



In [45]:

    
c( mean(xlow)/n, mean(xhigh)/n, mean(xmiddle)/n, mean(xmixed)/n )

The unmixed distributions are just binomial, with variances given as:



In [47]:

    
n*c( plow*(1-plow), phigh*(1-phigh), pmiddle*(1-pmiddle) )









    Out[47]:





	9
	9
	25



In [ ]:

    
Which matches with the experiement:



In [48]:

    
c( var(xlow),var(xhigh),var(xmiddle) )









    Out[48]:





	9.3031623246493
	9.02500601202405
	24.5825665665666

But have a look at the mle =mean/n and variance of the mixed vector, where half the trials have p=.1 and half p=.9:



In [53]:

    
c( mean(xmixed)/n, var(xmixed))









    Out[53]:





	0.49885
	1613.24301801802

This much bigger variance is overdispersion. The histograms look as you'd expect:



In [56]:

    
hist(xmixed)



In [57]:

    
hist(xmiddle)



In [58]:

    
hist(xlow)



In [59]:

    
hist(xhigh)



In [ ]: