Overdispersion for the binomial

Let's look at an example of how overdispersion can arise in a binomial setting and how to detect it. Draw samples from three binomial distributions with n = 100: one with plow = 0.1, one with phigh = 0.9, and one with pmiddle = 0.5 phigh + 0.5 plow = 0.5. Then concatenate the low and high samples into a mixed vector whose overall success probability is also 0.5.


In [41]:
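n <- 100            # binomial size: trials per draw
trials <- 500       # number of draws in each pure sample
plow <- 0.1
xlow <- rbinom(trials, n, plow)
phigh <- 0.9
xhigh <- rbinom(trials, n, phigh)
# Mixture: half the draws have p = 0.1, half have p = 0.9
xmixed <- c(xlow, xhigh)
# Pure p = 0.5 sample with the same length (1000) as xmixed
trials <- 1000
pmiddle <- 0.5
xmiddle <- rbinom(trials, n, pmiddle)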

The sample means divided by n, which are the MLEs of p, are what we expect:


In [45]:
c( mean(xlow)/n, mean(xhigh)/n, mean(xmiddle)/n, mean(xmixed)/n )


Out[45]:
  1. 0.10038
  2. 0.89862
  3. 0.49996
  4. 0.49885
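As a quick check that mean/n really is the maximum-likelihood estimate of p, the sketch below maximizes the binomial log-likelihood numerically with base R's `optimize()` and compares the result with the closed form. It simulates its own sample under an assumed seed so it does not depend on the cells above; the helper `binom_loglik` and the variable `x_sim` are hypothetical names used only for illustration.

```r
# Minimal sketch: maximize the binomial log-likelihood in p numerically
# and compare with the closed-form MLE mean(x)/n.
set.seed(1)                      # assumed seed, for reproducibility only
n <- 100
x_sim <- rbinom(500, n, 0.1)     # hypothetical sample with p = 0.1

# Hypothetical helper: log-likelihood of p given counts x, each out of n trials
binom_loglik <- function(p, x, n) sum(dbinom(x, n, p, log = TRUE))

fit_p <- optimize(binom_loglik, interval = c(1e-6, 1 - 1e-6),
                  x = x_sim, n = n, maximum = TRUE)

p_numeric <- fit_p$maximum       # numerical maximizer
p_closed  <- mean(x_sim) / n     # closed-form MLE
c(p_numeric, p_closed)           # agree up to the optimizer's tolerance
```

The two values differ only by the default tolerance of `optimize()`, since the log-likelihood is concave in p with its maximum at the sample proportion.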

The unmixed samples are plain binomial, so their theoretical variances are np(1 - p):


In [47]:
n*c( plow*(1-plow), phigh*(1-phigh), pmiddle*(1-pmiddle) )


Out[47]:
  1. 9
  2. 9
  3. 25

This matches the sample variances from the simulation:

In [48]:
c( var(xlow),var(xhigh),var(xmiddle) )


Out[48]:
  1. 9.3031623246493
  2. 9.02500601202405
  3. 24.5825665665666

Now look at the MLE (mean/n) and the variance of the mixed vector, in which half the draws have p = 0.1 and half have p = 0.9:


In [53]:
c( mean(xmixed)/n, var(xmixed))


Out[53]:
  1. 0.49885
  2. 1613.24301801802

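The MLE is still about 0.5, but a binomial with n = 100 and p = 0.5 would have variance 25, whereas the mixed sample's variance is roughly 1600. This excess variance relative to the binomial model is overdispersion. The histograms show where it comes from: xmixed is bimodal, with peaks near 10 and 90, while xmiddle is a single hump centered on 50.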
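The size of the excess can be predicted with the law of total variance. Treat the success probability P of each draw as random, equal to 0.1 or 0.9 with probability 1/2 each (an idealization: the notebook uses an exact 500/500 split rather than a random one), with X | P ~ Binomial(100, P):

```latex
\begin{aligned}
\operatorname{E}[X] &= n\,\operatorname{E}[P] = 100 \times 0.5 = 50,\\
\operatorname{Var}(X) &= \operatorname{E}\big[\operatorname{Var}(X \mid P)\big] + \operatorname{Var}\big(\operatorname{E}[X \mid P]\big)\\
  &= \operatorname{E}\big[nP(1-P)\big] + n^2 \operatorname{Var}(P)\\
  &= 100 \times 0.09 + 100^2 \times \big(0.5\,(0.1^2 + 0.9^2) - 0.5^2\big)\\
  &= 9 + 10000 \times 0.16 = 1609.
\end{aligned}
```

The first term is the binomial variance within each component (9, as computed for xlow and xhigh); the second is the spread between the component means, and it accounts for almost all of the observed variance of about 1613.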


In [56]:
hist(xmixed)



In [57]:
hist(xmiddle)



In [58]:
hist(xlow)



In [59]:
hist(xhigh)
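One way to detect overdispersion without relying on histograms is to compare the observed variance with the binomial variance n p̂(1 - p̂) implied by the fitted p̂. The sketch below does this two ways: a hand-computed Pearson dispersion statistic, and the dispersion that base R's `glm()` estimates for an intercept-only model with `family = quasibinomial`. It regenerates the samples under an assumed seed, and `pearson_dispersion` is a hypothetical helper name. Values near 1 are consistent with a binomial; values far above 1 signal overdispersion.

```r
# Minimal sketch: detect overdispersion with a Pearson dispersion statistic.
set.seed(42)                           # assumed seed, for reproducibility only
n <- 100
xlow    <- rbinom(500, n, 0.1)
xhigh   <- rbinom(500, n, 0.9)
xmixed  <- c(xlow, xhigh)              # half p = 0.1, half p = 0.9
xmiddle <- rbinom(1000, n, 0.5)        # genuinely binomial with p = 0.5

# Hypothetical helper: sum of squared Pearson residuals divided by the
# residual degrees of freedom (N - 1 for an intercept-only fit).
pearson_dispersion <- function(x, n) {
  p_hat <- mean(x) / n
  r <- (x - n * p_hat) / sqrt(n * p_hat * (1 - p_hat))
  sum(r^2) / (length(x) - 1)
}

pearson_dispersion(xmiddle, n)         # close to 1
pearson_dispersion(xmixed, n)          # far above 1 (roughly 1609 / 25)

# Same quantity from an intercept-only quasibinomial GLM
fit <- glm(cbind(xmixed, n - xmixed) ~ 1, family = quasibinomial)
summary(fit)$dispersion
```

The hand-computed statistic and `summary(fit)$dispersion` should agree, because for a quasibinomial fit R estimates the dispersion as the Pearson chi-square statistic divided by the residual degrees of freedom.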


