Multi-Gaussian example

As an introductory example we can use astroABC to find the posterior distribution for some Gaussian distributed data. Although in this case the likelihood is known, this example illustrates how to call astroABC and how to provide user-defined metrics.


In [1]:
# start by importing astroabc and numpy
import numpy as np
import astroabc

We need to provide:

  • a dataset
  • a forwards simulating model for the data
  • a method defining the distance metric.

For this example we generate a dataset of 1000 draws from a 2D Gaussian where the true means are

$\mu_{1,true} = 0.037579, \, \mu_{2, true}=0.573537$

and we have fixed the covariance matrix, given as a flattened $2\times2$ array with variances $\sigma_1^2 = 0.01$, $\sigma_2^2 = 0.1$ and off-diagonal covariance $0.005$. We can do this using an inbuilt model class in astroabc.


In [2]:
#make the fake data with the fixed covariance matrix
means = np.array([0.037579, 0.573537])
cov = np.array([0.01, 0.005, 0.005, 0.1])
data = astroabc.Model("normal",1000).make_mock(means,cov)

In this example the make_mock method also provides a forwards simulating model for the data.
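
Under the hood the mock data is simply a set of draws from a 2D Gaussian. A rough numpy equivalent (illustrative only; not astroABC's actual implementation) is:

```python
import numpy as np

# Illustrative stand-in for astroabc.Model("normal", 1000).make_mock(means, cov):
# draw 1000 samples from a 2D Gaussian with the given means and covariance.
rng = np.random.default_rng(0)
means = np.array([0.037579, 0.573537])
cov = np.array([0.01, 0.005, 0.005, 0.1]).reshape(2, 2)
mock = rng.multivariate_normal(means, cov, size=1000)
```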


In [3]:
#define a method for simulating the data given input parameters
def simulation(param, pool=None):
    cov = np.array([0.01, 0.005, 0.005, 0.1])
    #ideally do something with the pool here
    return astroabc.Model("normal", 10000).make_mock(param, cov)

model_sim = simulation

Next we define a distance metric method. In this example, instead of using all of the data (all 1000 draws from the 2D Gaussian), we use the sample means of the data as a summary statistic, and our distance metric is the sum of the absolute differences between the means of the observed and simulated data.


In [4]:
def dist_metric(d, x):
    #d: observed data, x: simulated data; compare the sample mean of each dimension
    return np.sum(np.abs(np.mean(x, axis=0) - np.mean(d, axis=0)))
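
To see what this metric computes, here is a quick sanity check on tiny stand-in arrays (illustrative only):

```python
import numpy as np

# Stand-in "observed" and "simulated" samples, two 2D points each.
d = np.array([[0.0, 0.0], [1.0, 1.0]])  # column means: [0.5, 0.5]
x = np.array([[1.0, 1.0], [2.0, 2.0]])  # column means: [1.5, 1.5]

# Sum of absolute differences of the column means: |1.5-0.5| + |1.5-0.5| = 2.0
dist = np.sum(np.abs(np.mean(x, axis=0) - np.mean(d, axis=0)))
```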

Next we specify priors for each of the parameters we want to vary in the sampler. This is done by specifying a list of tuples. The zeroth element in each tuple should be a string specifying the prior for this parameter and the first element should be a list of the hyperparameters needed for this prior. e.g. below we use a normal distribution with mean 0.03 and variance 0.1 for the first parameter and a uniform distribution from 0.1 to 0.9 for the second parameter.


In [5]:
priors =  [('normal', [0.03,0.1]), ('uniform', [0.1, 0.9])]
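
As a sanity check, these priors correspond roughly to the following plain numpy draws (assuming the 'normal' hyperparameters are [mean, variance] and 'uniform' takes [lower, upper] bounds):

```python
import numpy as np

# Draw from the two priors directly (illustrative; astroABC handles this internally).
rng = np.random.default_rng(0)
theta1 = rng.normal(loc=0.03, scale=np.sqrt(0.1), size=10000)  # normal, mean 0.03, variance 0.1
theta2 = rng.uniform(low=0.1, high=0.9, size=10000)            # uniform on [0.1, 0.9]
```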

Next we need to set some keywords for astroABC. This can be done by creating a dictionary of inputs which is passed to the sampler. Many of these entries have defaults and do not need to be specified explicitly; only the distance metric method needs to be explicitly provided as a keyword. The full set of keywords is given in the doc string of the class. Some examples are:

  • tol_type: specifies how the tolerance levels decrease; "exp", "lin", "log" and "const" are options (default = 'exp')

  • verbose: level of verbosity, 0 = no printout to screen, 1 = print to screen (default = 0)

  • adapt_t: Boolean True/False for adaptive threshold setting (default = False)

  • threshold: qth quantile used in adaptive threshold setting (default = 75)

  • pert_kernel: 1 = component-wise perturbation with local diagonal variance; 2 = multivariate perturbation based on local covariance

  • variance_method: 0 = weighted covariance, 1 = Filippi, 2 = TVZ, 3 = Leodoit_Wolf, 4 = k-nn (default = 0)

  • dfunc: method for calculating the distance metric

  • from_restart: Boolean True/False

  • restart: string name of restart file

  • outfile: string specifying name of output file (default = abc_out.txt)

  • mpi: Boolean True/False (default = False)

  • mp: Boolean True/False (default = False)

  • num_proc: number of threads for mp setting (default = None)

Please see the doc strings of the astroABC sampler for details on each of these settings.
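
To make the adaptive threshold option more concrete, the idea (a sketch of the assumed mechanism, not astroABC's exact code) is that the next tolerance is taken as the qth percentile of the accepted distances from the current iteration, so the threshold shrinks as the particles improve:

```python
import numpy as np

# Stand-in for the accepted particle distances at one iteration.
rng = np.random.default_rng(1)
distances = rng.exponential(scale=0.1, size=100)

# With threshold=75 the next tolerance is the 75th percentile of these distances.
next_tol = np.percentile(distances, 75)
```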


In [6]:
prop={'dfunc':dist_metric, 'outfile':"gaussian_example.txt", 'verbose':1, 'adapt_t': True, 'pert_kernel':2}

Now we are ready to create an instance of our sampler. To do this we just need to pass the following to astroabc.ABC_class:

astroabc.ABC_class(number of parameters, number of particles, data, tolerance levels, number of iterations, priors, **prop)

To begin let's run in serial using 100 particles for 20 iterations, with tolerance levels decreasing from a maximum threshold of 0.5 to a minimum threshold of 0.002:


In [7]:
sampler = astroabc.ABC_class(2,100,data,[0.5,0.002],20,priors,**prop)


	 	
	 ########################     astroABC     ########################	
	 	
	 Npart=100 	 numt=20 	 tol=[0.5000,0.0020] exp
	 Priors= [('normal', [0.03, 0.1]), ('uniform', [0.1, 0.9])]

Then we simply begin sampling on our data...


In [8]:
sampler.sample(model_sim)


	 Step: 0 	 tol: 0.5 	 Params: [0.004831022751718228, 0.5316987441872384]
	 Step: 1 	 tol: 0.3674356729086831 	 Params: [0.02321074264082056, 0.5742511100054313]
	 Step: 2 	 tol: 0.30166515686162976 	 Params: [0.036441292743712723, 0.5849798902653633]
	 Step: 3 	 tol: 0.2339575792400317 	 Params: [0.04721763109698714, 0.5835376098791599]
	 Step: 4 	 tol: 0.19340880389783577 	 Params: [0.043681453658168924, 0.5825229124687235]
	 Step: 5 	 tol: 0.1698794471891293 	 Params: [0.025025449631187073, 0.5864426900677008]
	 Step: 6 	 tol: 0.1447905652346464 	 Params: [0.03934585197338663, 0.5838000248009465]
	 Step: 7 	 tol: 0.1243271216418435 	 Params: [0.04108654251443204, 0.5850115948283879]
	 Step: 8 	 tol: 0.10398803512341327 	 Params: [0.03269305321900622, 0.5840846504880716]
	 Step: 9 	 tol: 0.08868583214998207 	 Params: [0.03250952497115013, 0.5855360667857193]
	 Step: 10 	 tol: 0.0756948315479395 	 Params: [0.035796696950396154, 0.580821486914087]
	 Step: 11 	 tol: 0.06511416184901217 	 Params: [0.03984439373095, 0.5839234813748221]
	 Step: 12 	 tol: 0.05343054662205011 	 Params: [0.03617377326621689, 0.5849199639229566]
	 Step: 13 	 tol: 0.04621903048072745 	 Params: [0.037625099407746984, 0.5880483203398328]
	 Step: 14 	 tol: 0.037262989801758156 	 Params: [0.038806559919571944, 0.5851600667183121]
	 Step: 15 	 tol: 0.031602756631985564 	 Params: [0.03701197479380795, 0.5831796705165299]
	 Step: 16 	 tol: 0.026952790181097103 	 Params: [0.037998247499682836, 0.5832717144071579]
	 Step: 17 	 tol: 0.023255214366412626 	 Params: [0.038519266122226475, 0.5825269763266807]
	 Step: 18 	 tol: 0.019505198348594738 	 Params: [0.039200951041888976, 0.5838628870292696]
	 Step: 19 	 tol: 0.01637099476629593 	 Params: [0.038146724068983597, 0.5849711692529307]

The output above shows the estimated means of the 2D Gaussian averaged over all 100 particles at each iteration, together with the tolerance level. Note that the sampling may end before 20 iterations if the minimum tolerance level is reached first. Recall that the true parameter values are $\mu_{1,true} = 0.037579, \, \mu_{2, true}=0.573537$.

K-Nearest Neighbours estimation for data sample with covariance matrix

We could also have created a dataset with a full covariance matrix using


In [17]:
means= np.array([0.7579, 0.273537])
cov = np.array([0.1,0.01,0.01,0.1])
data_cov = astroabc.Model("normal",1000).make_mock(means,cov)

Keeping the model simulation and distance methods the same as above, we can select a different way of estimating the covariance amongst all the particles using k-nearest neighbours. This returns a local covariance estimate and in many cases reaches convergence faster than using a weighted covariance amongst all particles.
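
The idea behind a k-NN local covariance estimate can be sketched in a few lines (illustrative only; `knn_covariance` is a hypothetical helper, not astroABC's implementation): for each particle, the perturbation covariance is computed from only its k nearest neighbours rather than the whole population.

```python
import numpy as np

def knn_covariance(particles, i, k):
    #distance from particle i to every particle in the population
    dists = np.linalg.norm(particles - particles[i], axis=1)
    #indices of the k nearest neighbours (the particle itself is included)
    nearest = np.argsort(dists)[:k + 1]
    #local covariance estimated from the neighbourhood only
    return np.cov(particles[nearest].T)
```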


In [18]:
priors =  [('uniform', [0.1,0.9]), ('uniform', [0.1, 0.9])]
prop={'dfunc':dist_metric, 'outfile':"gaussian_example.txt", 'verbose':1, \
      'adapt_t': True, 'variance_method':4, 'k_near':10 }

In [19]:
sampler = astroabc.ABC_class(2,100,data_cov,[0.5,0.002],25,priors,**prop)


	 	
	 ########################     astroABC     ########################	
	 	
	 Npart=100 	 numt=25 	 tol=[0.5000,0.0020] exp
	 Priors= [('uniform', [0.1, 0.9]), ('uniform', [0.1, 0.9])]

In [20]:
sampler.sample(model_sim)


	 Step: 0 	 tol: 0.5 	 Params: [0.6417417888667538, 0.3769805241152575]
	 Step: 1 	 tol: 0.43598082896982815 	 Params: [0.6728761879909225, 0.3626886691562053]
	 Step: 2 	 tol: 0.358520432619466 	 Params: [0.7227864485788014, 0.31132619316770155]
	 Step: 3 	 tol: 0.2833997638790343 	 Params: [0.7440094967360175, 0.2987062443319014]
	 Step: 4 	 tol: 0.21592705669630358 	 Params: [0.7578314852638425, 0.28997038404805464]
	 Step: 5 	 tol: 0.1832059820435786 	 Params: [0.7725999199112689, 0.2711524459774648]
	 Step: 6 	 tol: 0.1563979221312114 	 Params: [0.7549734561659688, 0.280156875829783]
	 Step: 7 	 tol: 0.12951796909825125 	 Params: [0.7561589203189962, 0.28703131841749463]
	 Step: 8 	 tol: 0.1093959516000065 	 Params: [0.7646331246580728, 0.2767507341644505]
	 Step: 9 	 tol: 0.08927254752993467 	 Params: [0.7631681442434737, 0.2820566810999924]
	 Step: 10 	 tol: 0.07633646127328605 	 Params: [0.7584185873502948, 0.2795299299446455]
	 Step: 11 	 tol: 0.061068540755597894 	 Params: [0.752618957552296, 0.27880575189303963]
	 Step: 12 	 tol: 0.05061479870880048 	 Params: [0.7569674388914795, 0.2742962959261214]
	 Step: 13 	 tol: 0.04088574143743237 	 Params: [0.7574683792627619, 0.2777022756484693]
	 Step: 14 	 tol: 0.03449755151429326 	 Params: [0.7602516050532038, 0.27900294921886354]
	 Step: 15 	 tol: 0.028311973885197322 	 Params: [0.7594578199160145, 0.2814250761815206]
	 Step: 16 	 tol: 0.024854485949822425 	 Params: [0.758612139644833, 0.28191228349811726]
	 Step: 17 	 tol: 0.02080047547436792 	 Params: [0.7580105537118261, 0.28091831607808415]
	 Step: 18 	 tol: 0.018123121350981278 	 Params: [0.759419825029535, 0.2812787295688279]
	 Step: 19 	 tol: 0.01488107118582739 	 Params: [0.7593977382315026, 0.2808831912135362]
	 Step: 20 	 tol: 0.012772810365051915 	 Params: [0.7588644879425643, 0.2804332292642587]
	 Step: 21 	 tol: 0.01054868328411554 	 Params: [0.7592091183480909, 0.2806129347880487]
	 Step: 22 	 tol: 0.008556361908366159 	 Params: [0.7589018644937893, 0.28085938213260714]
	 Step: 23 	 tol: 0.0070204547318383215 	 Params: [0.7590723043195702, 0.2809934891621416]
	 Step: 24 	 tol: 0.006059962305213662 	 Params: [0.7590556114343996, 0.2802493203704434]
