Given the issues we saw with the confidence interval for binomial using Wald and Fisher information, let's look at the logit or log-odds transformed version. We begin the same way.


In [333]:
trials<-1000
n<-100
p<-0.1 # make this smaller to see the CI % get farther off
#rbinom(100,100,.2) draws from Bin(n,p) ten times
x<-rbinom(trials,n,p)

And the vector of maximum likelihood estimators is constructed in the same way.


In [334]:
phat = x/n

The difference is that now we are going to look at the variance of the maximum likelihood estimator as a random variable in a different coordinate system (a nonlinear change of coordinates). This eliminates the hard bondaries at 0 and 1. The transformation we will use is called the logit or log-odds transform.


In [335]:
ps = seq(from=0,to=1,by=.01)
logit_of_p = log(ps/(1-ps))
plot(ps,logit_of_p, type='l')
title("graph of logit")


Applying this transformation, lets define $$\hat{\phi} = \log \frac{\hat{p}}{1-\hat{p}}$$


In [336]:
phihat = log (phat/(1-phat)) #vector of estimates
phi  = log (p/(1-p)) # true value
phi


Out[336]:
-2.19722457733622

And now proceed as before in confidence interval construction, obtaining a vector of Fisher informations. The wrinkle is that we need to adjust our formula for the Fisher information using a little calculus. We have $$I(\hat{\phi})=\dfrac{I(\hat{p})}{[\phi'(\hat{p})]^2}, \;\;\text{and}$$ $$\phi'(\hat{p}) = \frac{1}{\hat{p}(1-\hat{p})}.$$ So the Fisher information is computed from $\hat{\phi}$ as follows.


In [337]:
phi_prime = 1/(phat*(1-phat))
FI_of_p = n/(phat*(1-phat))
FI = FI_of_p / (phi_prime^2)

In [338]:
varhat = 1/FI # the vector of estimated variances for the transformed mle, in phi-coordinates

In [339]:
#varhat

Now we compute the vectors of lower and upper bounds of our confidence intervals, this time in the $\phi$ coordinate:


In [340]:
lower = phihat - 1.96*sqrt(varhat)

In [341]:
upper = phihat + 1.96*sqrt(varhat)

If this is a good CI, the next number should converge to .95 as trials $\rightarrow \infty$


In [342]:
outcome = lower< phi & phi<upper
outcome[is.na(outcome)]=FALSE # NAs happen when phat = 0, so the CI is degenerate and doesn't contain the true value
sum(outcome)/trials


Out[342]:
0.95

In [343]:
meta = 10000
metatrials = rep(0,meta);
for(i in 0:meta)
{
  x<-rbinom(trials,n,p);       # vector of outcomes
  phat = x/n;                  # vector of mles
  phihat = log (phat/(1-phat)) # vector of estimates
  phi  = log (p/(1-p))         # true value in phi coordinates (scalar)
    #Fisher information calculation
  phi_prime = 1/(phat*(1-phat))
  FI_of_p = n/(phat*(1-phat))
  FI = FI_of_p / (phi_prime^2)
   
  varhat = 1/FI;
  lower = phihat - 1.96*sqrt(varhat);
  upper = phihat + 1.96*sqrt(varhat);
  outcome = lower< phi & phi<upper
  outcome[is.na(outcome)]=FALSE # NAs happen when phat = 0, so the CI is degenerate and doesn't contain the true value
  inthere = sum(outcome)/trials 
  metatrials[i]=inthere;
}

In [344]:
#metatrials

In [345]:
plot(density(metatrials))


This gives better performance for our 95% CI.


In [ ]: