Supernova Example

Introduction

In this toy example we compare ABC and MCMC as methods of estimating cosmological parameters from supernova data. The following model describes the distance modulus as a function of redshift:

$$\mu_{i}^{model}(z_{i};\Omega_{m},w_{0}) \propto 5\log_{10}\left[\frac{c(1+z_{i})}{H_{0}}\int_{0}^{z_{i}}\frac{dz'}{E(z')}\right]$$

$$E(z) = \sqrt{\Omega_{m} (1+z)^{3} + (1-\Omega_{m})\,e^{3\int_{0}^{z} d\ln(1+z')\,[1+w(z')]}}$$

Both ABC and MCMC use this model to estimate the posterior distributions of matter density $\Omega_{m}$ and the dark energy equation of state $w_{0}$.
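
For concreteness, the model can be evaluated numerically. The following is a minimal sketch, assuming a flat universe and a constant equation of state $w(z) = w_{0}$, so that the exponential term in $E(z)$ reduces to $(1+z)^{3(1+w_{0})}$; the notebook itself uses the DistanceCalc class from distance_calc for this calculation.

import numpy as np
from scipy.integrate import quad

C_KM_S = 299792.458  # speed of light [km/s]
H0 = 70.0            # Hubble constant [km/s/Mpc], i.e. h0 = 0.7

def E(z, om, w0):
    # for constant w0 the exponential term reduces to (1+z)**(3*(1+w0))
    return np.sqrt(om * (1 + z)**3 + (1 - om) * (1 + z)**(3 * (1 + w0)))

def mu_model(z, om, w0):
    # luminosity distance in Mpc, then mu = 5*log10(d_L / 10 pc)
    integral, _ = quad(lambda zp: 1.0 / E(zp, om, w0), 0.0, z)
    d_l = (C_KM_S / H0) * (1 + z) * integral
    return 5.0 * np.log10(d_l) + 25.0

print(mu_model(0.5, 0.3, -1.0))  # ~42.3 for the fiducial parameters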

To demonstrate the difference between ABC and MCMC, we first generate a dataset containing artificial noise such that the distribution of the data is non-Gaussian.

  • To use MCMC we are required to specify a likelihood function $\mathcal{L}(D|\Omega_{m}, w_{0})$. By making the standard assumption of a Gaussian likelihood, we are (incorrectly) assuming that the data are Gaussian distributed. Under this assumption we expect biased results from MCMC.
  • To use ABC, we must be able to simulate the data at every point in parameter space. The simulation can naturally include non-Gaussian noise (in many physical examples it is easier to include noise in a simulation than to account for it analytically).

In [1]:
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(color_codes=True)
import numpy as np
from scipy.stats import skewnorm
import math
import astroabc
from distance_calc import DistanceCalc   
from bin_data import *

The Data

First, we need to provide a dataset. The aim is to generate values of the distance modulus, $\mu_{i}$, with fixed "true" parameters $\Omega_{m}$ and $w_{0}$:

$$\Omega_{m} = 0.3$$

$$w_{0} = -1.0$$

SNANA is used to generate ~400 supernova light curves, which are then fit with the SALT-II light-curve fitter. The data span the redshift interval $z \in [0.5,1.0]$ and are binned into 20 redshift bins.
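
The helper read_data (imported above from bin_data) returns, among other things, the bin centres, the mean distance modulus in each bin, and the error on that mean. A hedged sketch of this kind of binning is shown below; the unbinned arrays z_sn and mu_sn are hypothetical stand-ins for the SALT-II fit results, not part of the repository.

import numpy as np

def bin_supernovae(z_sn, mu_sn, nbins=20, zmin=0.5, zmax=1.0):
    edges = np.linspace(zmin, zmax, nbins + 1)
    zbins = 0.5 * (edges[:-1] + edges[1:])        # bin centres
    idx = np.clip(np.digitize(z_sn, edges) - 1, 0, nbins - 1)
    avmu_bin = np.array([mu_sn[idx == b].mean() for b in range(nbins)])
    averr_bin = np.array([mu_sn[idx == b].std(ddof=1) / np.sqrt(np.sum(idx == b))
                          for b in range(nbins)])  # error on the mean
    return zbins, avmu_bin, averr_bin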


In [2]:
zbins, avmu_bin, averr_bin, _, mu_in_bin_new = read_data()  # fourth return value unused here

Adding noise to the data

To add artificial noise to the data we use a skewed normal distribution. The standard normal distribution has a probability density function given by

$$ \phi(x) = \frac{1}{\sqrt{2 \pi}} e^{-\frac{x^{2}}{2}}$$

and a cumulative distribution function:

$$\Phi(x) = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{x}{\sqrt{2}}\right)\right]$$

The skewed normal distribution $f(x)$ with parameter $\alpha$ is given by

$$f(x) = 2\phi(x)\Phi(\alpha x)$$

Using this probability density function, we can draw random samples from the skewed normal distribution.
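
As a quick sanity check (not part of the original notebook), scipy's skewnorm implements exactly this density in its standardised form; the loc and scale arguments used below simply shift and rescale it.

import numpy as np
from scipy.stats import norm, skewnorm

alpha = 5.0
x = np.linspace(-3, 3, 7)
f_manual = 2.0 * norm.pdf(x) * norm.cdf(alpha * x)  # f(x) = 2*phi(x)*Phi(alpha*x)
print(np.allclose(f_manual, skewnorm.pdf(x, alpha)))  # True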


In [3]:
e = -0.1 #location
w = 0.3 #scale
a = 5.0 #skew

plt.figure(figsize=(17,8))
plt.hist(skewnorm.rvs(a, loc=e, scale=w, size=10000), density=True, bins=20, color='#593686')
plt.title("Distribution of a random sample",fontsize=17);



From this distribution, the noisy data is generated. At each $z_{i}$ a random number is drawn from the above distribution and added to $\mu_{i}$.


In [4]:
data = np.zeros(len(zbins)) 

for i in range(len(zbins)):
    data[i] = avmu_bin[i] + skewnorm.rvs(a, loc=e, scale=w, size=1)

A comparison of the data before and after noise is added is shown below:


In [5]:
plt.figure(figsize=(17,8))
plt.errorbar(zbins,avmu_bin,averr_bin,marker="o",linestyle="None",label="without noise",color='#593686')
plt.scatter(zbins,data,color='r',label="with noise")
plt.legend(loc="upper left",prop={'size':17});
plt.xlabel("$z$",fontsize=20)
plt.ylabel("$\mu(z)$",fontsize=20)
plt.title("Data before and after noise is added",fontsize=17);


To demonstrate the non-Gaussian distribution of the data at each $z_{i}$, we focus on the data in the first redshift bin, $z_{1}=0.5$. The distribution of a large random sample at this redshift is shown below. Each value in this sample is generated by adding a number randomly drawn from the skewed normal distribution to $\mu_{1}$. The value of $\mu_{1}$ before noise is added is shown in red. As we can see, the data now follow a skewed distribution around the expected mean.


In [6]:
z = 0  # index of the first redshift bin, z = 0.5
distribution = np.zeros(10000)

for j in range(10000):
    distribution[j] = avmu_bin[z] + skewnorm.rvs(a, loc=e, scale=w, size=1)

plt.figure(figsize=(17,8))
plt.title("Distribution of the data at redshift z=0.5",fontsize=17);
plt.hist(distribution, bins=20, color='#593686', density=True)
plt.plot((avmu_bin[z], avmu_bin[z]), (0, 2.5), 'r-', label="True $\mu$ at $z = 0.5$");
plt.legend(prop={'size':16});



ABC

ABC is used to estimate the posterior distribution of the unknown parameters $\Omega_{m}$ and $w_{0}$ from the data. To use ABC we need to specify:

  • A metric $\rho$
  • Prior distributions of $\Omega_{m}$ and $w_{0}$
  • A simulation function

For more information on each of these, please see the Introduction page or the simple Gaussian demo.

In this example the metric $\rho$ is defined to be $$\rho(\mu,\mu_{sim}(z)) = \sum_{i} \frac{(\mu_{i} - \mu_{sim}(z_{i}))^{2}}{\sigma_{i}^{2}}$$ where $\sigma_{i}$ is the error on the data point $\mu_{i}$, matching the implementation below.


In [7]:
def my_dist(d, x):
    # return an infinite distance if the simulation returned no data
    if x[0] is None:
        return float('Inf')
    else:
        return np.sum(((x - d) / averr_bin)**2)

Parameters needed for ABC are specified:


In [8]:
nparam = 2
npart = 100 #number of particles/walkers
niter = 20  #number of iterations
tlevels = [500.0,0.005] #maximum,minimum tolerance

prop = {'tol_type': 'exp', 'verbose': 1, 'adapt_t': True,
        'threshold': 75, 'pert_kernel': 2, 'variance_method': 0,
        'dist_type': 'user', 'dfunc': my_dist, 'restart': "restart_test.txt",
        'outfile': "abc_pmc_output_" + str(nparam) + "param.txt", 'mpi': False,
        'mp': True, 'num_proc': 2, 'from_restart': False}

The prior distributions of $\Omega_{m}$ and $w_{0}$ are chosen as follows:

  • For $\Omega_{m}$ we use a normal distribution with mean $0.3$ and standard deviation $0.5$.
  • For $w_{0}$ we use a normal distribution with mean $-1.0$ and standard deviation $0.5$.

In [12]:
priorname  = ["normal","normal"]
hyperp = [[0.3,0.5], [-1.0,0.5]]
prior = list(zip(priorname,hyperp))

Finally, we need a simulation function. This must be able to simulate the data at every point in parameter space. At each $z_{i}$ the simulation uses $\mu^{model}(z_{i};\Omega_{m},w_{0})$, given by

$$\mu_{i}^{model}(z_{i};\Omega_{m},w_{0}) \propto 5\log_{10}\left[\frac{c(1+z_{i})}{H_{0}}\int_{0}^{z_{i}}\frac{dz'}{E(z')}\right]$$

to produce a value of the distance modulus. To account for noise in the data we then add a number randomly drawn from a skewed normal distribution.


In [13]:
def ABCsimulation(param):  # param = [om, w0]
    if param[0] < 0.0 or param[0] > 1.0:
        return [None] * len(zbins)
    else:
        model_1_class = DistanceCalc(param[0], 0, 1 - param[0], 0, [param[1], 0], 0.7)  # om, ok, ol, wmodel, de_params, h0
        data_abc = np.zeros(len(zbins))
        for i in range(len(zbins)):
            data_abc[i] = model_1_class.mu(zbins[i]) + skewnorm.rvs(a, loc=e, scale=w, size=1)
        return data_abc

The ABC sampler is run for 20 iterations. At each iteration the current estimate of the unknown parameters is shown.


In [14]:
sampler = astroabc.ABC_class(nparam,npart,data,tlevels,niter,prior,**prop)
sampler.sample(ABCsimulation)


	 	
	 ########################     astroABC     ########################	
	 	
	 Npart=100 	 numt=20 	 tol=[500.0000,0.0050] exp
	 Priors= [('normal', [0.3, 0.5]), ('normal', [-1.0, 0.5])]
	 Step: 0 	 tol: 499.99999999999994 	 Params: [0.4314501461477677, -1.0361649052741613]
	 Step: 1 	 tol: 89.97788102957212 	 Params: [0.4499554338699862, -1.1645308832694952]
	 Step: 2 	 tol: 68.52023537487136 	 Params: [0.40215707373000653, -1.2107503351789481]
	 Step: 3 	 tol: 47.535428147776244 	 Params: [0.3912204223672498, -1.2797094972855634]
	 Step: 4 	 tol: 38.74827917104074 	 Params: [0.33421301841620116, -1.2570767734248782]
	 Step: 5 	 tol: 34.81883379198433 	 Params: [0.3523042489544083, -1.218582387791155]
	 Step: 6 	 tol: 31.36903356527919 	 Params: [0.3388598662166609, -1.14323336218714]
	 Step: 7 	 tol: 28.988937637671725 	 Params: [0.3283966070540986, -1.1629625809514204]
	 Step: 8 	 tol: 26.48307102422561 	 Params: [0.2987620265919426, -1.0947907139493558]
	 Step: 9 	 tol: 23.791438035587092 	 Params: [0.2816623794449367, -1.0714735098270267]
	 Step: 10 	 tol: 21.479243045053007 	 Params: [0.2951541504504166, -1.0998598883929875]
	 Step: 11 	 tol: 20.083836464851615 	 Params: [0.29782107263662727, -1.1014385282019075]
	 Step: 12 	 tol: 19.22018180211868 	 Params: [0.2835521320637793, -1.0589836298288662]
	 Step: 13 	 tol: 18.528690584461767 	 Params: [0.2806198442574929, -1.0649410809825235]
	 Step: 14 	 tol: 17.026713222581655 	 Params: [0.29250544469051404, -1.1086168237276182]
	 Step: 15 	 tol: 16.106581286077347 	 Params: [0.290764298977372, -1.1423623721617904]
	 Step: 16 	 tol: 15.200701247918136 	 Params: [0.28393933984778763, -1.1213557785879256]
	 Step: 17 	 tol: 14.35912136591412 	 Params: [0.25634609119770263, -1.0206581367070218]
	 Step: 18 	 tol: 13.70586300037004 	 Params: [0.26433859810238775, -1.032137932348848]
	 Step: 19 	 tol: 12.906888706458806 	 Params: [0.2931433055714998, -1.0966384845563952]

Results

The following figure shows the progress of the ABC particles as the iteration number increases (and the tolerance level decreases). Initially, at iteration 1, the particles are scattered over a wide area of parameter space according to the prior. As the iteration number increases the tolerance $\epsilon_{i}$ decreases, causing the particles to converge to the true values of $\Omega_{m}$ and $w_{0}$.

After only 20 iterations the ABC sampler obtains the following results for $\Omega_{m}$ and $w_{0}$: $$\Omega_{m} = 0.36 \pm 0.12$$ $$w_{0} = -1.22 \pm 0.4$$
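
The quoted numbers can be computed as weighted means and standard deviations over the final iteration's particles. A minimal sketch of that calculation, assuming the particle positions (shape [npart, 2]) and their weights have been loaded from the sampler's output file (the exact file format is not shown here):

import numpy as np

def weighted_estimate(particles, weights):
    # weighted mean and standard deviation of each parameter
    weights = weights / weights.sum()
    mean = np.average(particles, axis=0, weights=weights)
    var = np.average((particles - mean)**2, axis=0, weights=weights)
    return mean, np.sqrt(var)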

The 1- and 2-$\sigma$ contours for $\Omega_{m}$ and $w_{0}$ are shown below:

MCMC

Next we use MCMC to estimate the posterior distributions of the parameters $\Omega_{m}$ and $w_{0}$.

The prior distributions of $\Omega_{m}$ and $w_{0}$ are chosen to be the same as those used in ABC:

  • For $\Omega_{m}$ we use a normal distribution with mean $0.3$ and standard deviation $0.5$.
  • For $w_{0}$ we use a normal distribution with mean $-1.0$ and standard deviation $0.5$.

The proposal distribution is chosen to be a normal distribution for both parameters.

A Gaussian likelihood $\mathcal{L}(D|\Omega_{m}, w_{0})$ is specified: $$\mathcal{L}(D|\Omega_{m}, w_{0}) \propto \exp\left(-\sum_{i}\frac{(\mu_{i} - \mu_{model}(z_{i}))^{2}}{2\sigma_{i}^{2}}\right)$$

By using this likelihood, we assume that the distribution of each data point is Gaussian, as shown by the distribution on the left. In reality, this is not the case, as shown by the skewed distribution on the right:

By making this assumption we expect MCMC to produce biased results.
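
For reference, the following is a minimal sketch of the log-posterior one might hand to an MCMC sampler such as emcee; the MCMC implementation is not part of astroabc, and mu_model stands for the distance-modulus function sketched earlier, so both are assumptions rather than the original notebook's code.

import numpy as np
from scipy.stats import norm

def log_posterior(theta, z, mu_obs, sigma):
    om, w0 = theta
    if not 0.0 < om < 1.0:
        return -np.inf
    # same Gaussian priors as used for ABC above
    lp = norm.logpdf(om, 0.3, 0.5) + norm.logpdf(w0, -1.0, 0.5)
    # (incorrect) Gaussian likelihood from the equation above
    mu_th = np.array([mu_model(zi, om, w0) for zi in z])
    return lp - 0.5 * np.sum(((mu_obs - mu_th) / sigma)**2)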

Results

The results of running an MCMC sampler are:

$$\Omega_{m} = 0.17 \pm 0.11$$

$$w_{0} = -1.26 \pm 0.55$$

These results are quite different from the true values of $\Omega_{m}$ and $w_{0}$ and there is a clear bias in the value of $\Omega_m$ recovered. The following distributions show samples from the posterior distributions of $\Omega_{m}$ and $w_{0}$ obtained by MCMC.

The 1- and 2-$\sigma$ contours of the $\Omega_{m}$ and $w_{0}$ posterior distribution are shown below, where the red star indicates the true values $\Omega_{m} = 0.3$ and $w_{0} = -1.0$. We can clearly see that the assumption of a Gaussian likelihood has led to biased results from the MCMC sampler.

Summary

The following table summarises the values of $\Omega_{m}$ and $w_{0}$ obtained by the ABC and MCMC samplers and compares them to the true values of the parameters.

                 True values    ABC                  MCMC
  $\Omega_{m}$   0.3            0.36 $\pm$ 0.12      0.17 $\pm$ 0.11
  $w_{0}$        -1.0           -1.22 $\pm$ 0.4      -1.26 $\pm$ 0.55

Within error, ABC recovers the correct values of $\Omega_{m}$ and $w_{0}$. In contrast, MCMC does not. This can be seen more clearly in a side-by-side comparison of the contour plots: