In [41]:
trials <- 500
n <- 100
plow <- 0.1
xlow <- rbinom(trials,n,plow)
phigh <- 0.9
xhigh <- rbinom(trials,n,phigh)
xmixed <- c(x,y)
trials <- 1000
pmiddle <- 0.5
xmiddle <- rbinom(trials,n,pmiddle)
The means and hence mles are are what we expect.
In [45]:
c( mean(xlow)/n, mean(xhigh)/n, mean(xmiddle)/n, mean(xmixed)/n )
Out[45]:
The unmixed distributions are just binomial, with variances given as:
In [47]:
n*c( plow*(1-plow), phigh*(1-phigh), pmiddle*(1-pmiddle) )
Out[47]:
In [ ]:
Which matches with the experiement:
In [48]:
c( var(xlow),var(xhigh),var(xmiddle) )
Out[48]:
But have a look at the mle =mean/n and variance of the mixed vector, where half the trials have p=.1 and half p=.9:
In [53]:
c( mean(xmixed)/n, var(xmixed))
Out[53]:
This much bigger variance is overdispersion. The histograms look as you'd expect:
In [56]:
hist(xmixed)
In [57]:
hist(xmiddle)
In [58]:
hist(xlow)
In [59]:
hist(xhigh)
In [ ]: