In [1]:
import numpy
%pylab inline


Populating the interactive namespace from numpy and matplotlib

Mean and variance PDF of a Gaussian using ABC

The mean

As an illustration of Approximate Bayesian Computation (ABC), we will infer first the mean and then the variance of a Gaussian. First we generate the mock data


In [2]:
data= numpy.random.normal(size=100)

First we assume that we know the variance and constrain the PDF for the mean. Let's write a simple function that samples the PDF using ABC. The sample mean is a sufficient statistic for the mean, so we will use that together with the absolute value of the difference between that and that of the simulated data as the distance function. We use a prior on the mean that is flat between -2 and 2:


In [3]:
def Mean_ABC(n=1000,threshold=0.05):
    out= []
    for ii in range(n):
        d= threshold+1.
        while d > threshold:
            m= numpy.random.uniform()*4-2.
            sim= numpy.random.normal(size=len(data))+m
            d= numpy.fabs(numpy.mean(sim)-numpy.mean(data))
        out.append(m)
    return out

Now we sample the PDF using ABC:


In [4]:
mean_pdfsamples_abc= Mean_ABC()

Let's plot this, as well as the analytical PDF


In [5]:
h= hist(mean_pdfsamples_abc,range=[-1.,1.],bins=51,normed=True)
xs= numpy.linspace(-1.,1.,1001)
plot(xs,numpy.sqrt(len(data)/2./numpy.pi)*numpy.exp(-(xs-numpy.mean(data))**2./2.*len(data)),lw=2.)


Out[5]:
[<matplotlib.lines.Line2D at 0x10c838790>]

What happens when we make the threshold larger?


In [6]:
mean_pdfsamples_abc= Mean_ABC(threshold=1.)
h= hist(mean_pdfsamples_abc,range=[-1.,1.],bins=51,normed=True)
plot(xs,numpy.sqrt(len(data)/2./numpy.pi)*numpy.exp(-(xs-numpy.mean(data))**2./2.*len(data)),lw=2.)


Out[6]:
[<matplotlib.lines.Line2D at 0x10c8e2510>]

That's not good! What if we make it smaller?


In [7]:
mean_pdfsamples_abc= Mean_ABC(threshold=0.001)
h= hist(mean_pdfsamples_abc,range=[-1.,1.],bins=51,normed=True)
plot(xs,numpy.sqrt(len(data)/2./numpy.pi)*numpy.exp(-(xs-numpy.mean(data))**2./2.*len(data)),lw=2.)


Out[7]:
[<matplotlib.lines.Line2D at 0x10c8e22d0>]

This runs very long, because it's difficult to find simulated data sets that are close enough to the true one.

The variance

Let's know look at the variance, assuming that we know that the mean is zero. The sample variance is a sufficient statistic in this case and we can again write an ABC function, similar to that above:


In [8]:
def Var_ABC(n=1000,threshold=0.05):
    out= []
    for ii in range(n):
        d= threshold+1.
        while d > threshold:
            v= numpy.random.uniform()*4
            sim= numpy.random.normal(size=len(data))*numpy.sqrt(v)
            d= numpy.fabs(numpy.var(sim)-numpy.var(data))
        out.append(v)
    return out

We again run this to get the PDF and compare to the analytical one


In [9]:
var_pdfsamples_abc= Var_ABC()
h= hist(var_pdfsamples_abc,range=[0.,2.],bins=51,normed=True)
xs= numpy.linspace(0.001,2.,1001)
ys= xs**(-len(data)/2.)*numpy.exp(-1./xs/2.*len(data)*(numpy.var(data)+numpy.mean(data)**2.))
ys/= numpy.sum(ys)*(xs[1]-xs[0])
plot(xs,ys,lw=2.)


Out[9]:
[<matplotlib.lines.Line2D at 0x10ec11950>]

What if we make the threshold larger?


In [10]:
var_pdfsamples_abc= Var_ABC(threshold=1.)
h= hist(var_pdfsamples_abc,range=[0.,2.],bins=51,normed=True)
plot(xs,ys,lw=2.)


Out[10]:
[<matplotlib.lines.Line2D at 0x10c838f10>]

That's not good! What if we make the threshold smaller?


In [11]:
var_pdfsamples_abc= Var_ABC(threshold=0.005)
h= hist(var_pdfsamples_abc,range=[0.,2.],bins=51,normed=True)
plot(xs,ys,lw=2.)


Out[11]:
[<matplotlib.lines.Line2D at 0x10f3afb90>]

In [ ]: