This notebook is a continuation of the previous one (Introduction to fitting). The first part is identical: make some fake data (an emission line) and fit it with a non-linear model (Gaussian + background). But this time, we place priors on the parameters and use Markov Chain Monte Carlo to solve the problem. Another notebook will do the same, but using a different MCMC sampler (`pystan`).

This notebook requires the `emcee` and `corner` packages. You can install them by running:

```
pip install emcee
pip install corner
```

In [ ]:
```
from numpy import *   # mmmmmm crunchy

# Start by defining some parameters. Change these if you like!
cont_zp = 500.0
cont_slope = 5.0
amplitude = 150.0
width = 0.5
center = 5.0

# Next, a grid of wavelength channels (assumed to have no uncertainty)
wave = linspace(0, 10, 100)

# The 'true' observations
flux = amplitude*exp(-0.5*power(wave-center, 2)/width**2) + \
       cont_zp + cont_slope*wave

# The actual observations = true observations + Poisson noise
obs_flux = random.poisson(flux)
```

In [ ]:
```
%matplotlib inline
from matplotlib.pyplot import plot, step, xlabel, ylabel, show, subplots

# Plot the true model (red) and the noisy observations (black histogram)
plot(wave, flux, 'r-')
step(wave, obs_flux, color='k')
xlabel('Wavelength (Angstroms)')
ylabel('Counts')
show()
```

Bayesian statistics is based on Bayes' theorem (for an excellent intro, see 3blue1brown's video). It's actually a very simple idea and an equally simple equation. It's *dealing* with the equation that gets complicated. Let's say we have some data $\vec{D}$. In the case of our emission line, the data is the number of counts in each wavelength bin. We have a model with some number of parameters $\vec{\theta}$. Bayes' theorem simply states:
$$P\left(\vec{\theta}\left|\vec{D}\right.\right) = \frac{P\left(\vec{D}\left|\vec{\theta}\right.\right)P\left(\vec{\theta}\right)}{P\left(\vec{D}\right)}$$
What this says is that the probability that we get a particular set of parameters given a fixed set of data (which is what we want) is proportional to the probability that we get the data given a fixed set of parameters (which we can calculate) times the probability of the parameters (the priors). The denominator is the probability that we got the data we did, which requires integrating over all possible parameters:
$$P\left(\vec{D}\right) = \int P\left(\vec{D}\left|\vec{\theta}\right.\right)P\left(\vec{\theta}\right)\ d\vec{\theta}$$
and really just ensures that the probability is normalized to 1.
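
As a concrete illustration of that normalization, here is a minimal one-parameter sketch with made-up numbers (not tied to the emission-line fit below), in which the integral over $\vec{\theta}$ becomes a simple numerical one:

```
# Hypothetical example: posterior of a single parameter `mu` given one datum.
# The evidence P(D) is just the integral of likelihood*prior over the grid.
from numpy import linspace, exp, trapz

mu_grid = linspace(-5, 5, 1001)                   # candidate values of mu
datum, sigma = 1.2, 1.0                           # one measurement and its error
prior = exp(-0.5*mu_grid**2/3.0**2)               # e.g. a broad Gaussian prior on mu
like = exp(-0.5*(datum - mu_grid)**2/sigma**2)    # P(D|mu)
evidence = trapz(like*prior, mu_grid)             # P(D)
posterior = like*prior/evidence                   # P(mu|D), normalized
print(trapz(posterior, mu_grid))                  # ~ 1.0
```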

You might wonder what the difference is between the priors $P\left(\vec{\theta}\right)$ and the likelihood $P\left(\vec{D}\left|\vec{\theta}\right.\right)$. The likelihood is what your data tells you about the parameters. The priors are constraints that are external to the data. It could be a previous experiment's result that you are incorporating into your own. It could be a purely logical constraint (e.g., the age of the universe must be greater than 0), or it could even be a *gut feeling*.

Working with the above equation isn't too bad if the number of parameters is small and the priors and likelihoods are all simple. In fact, if you use uniform priors and normally-distributed errors, you get the good old least-squares formalism. But pretty quickly you can get into a situation where the equation (and integrals of it) cannot be evaluated analytically. This is where Markov Chain Monte Carlo (MCMC) is useful.
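
As an aside (a quick sanity check, not part of the original derivation): with uniform priors the posterior is proportional to the likelihood, and for independent, normally-distributed errors $\sigma_i$,
$$-2\ln P\left(\vec{D}\left|\vec{\theta}\right.\right) = \sum_i \frac{\left(f_i - M_i\left(\vec{\theta}\right)\right)^2}{\sigma_i^2} + \mathrm{const} = \chi^2 + \mathrm{const},$$
where $M_i\left(\vec{\theta}\right)$ is the model prediction in channel $i$. Maximizing the posterior is then exactly the same as minimizing $\chi^2$, i.e., least squares.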

Our model has five parameters: `cont_zp`, `cont_slope`, `amp`, `center`, and `width`. As in the previous tutorial, the order of these parameters will be fixed. The MCMC module we will be using is called `emcee`. Let's first define the model: a function that, given the parameters, predicts the observations.

In [ ]:
```
def model(x, cont, slope, amp, center, width):
    model = amp*exp(-0.5*power(x-center, 2)/width**2) + cont + \
            slope*x
    return model
```

Next, we define the priors on the parameters. The width of the line must be positive, and we are looking for an *emission* line, so `amp` should be positive. If we don't specify anything, parameters are assumed to have a uniform (equal) probability. Emcee also wants the natural logarithm of the probability, so we call the function `lnprior()`.

In [ ]:
```
def lnprior(p):
    cont, slope, amp, center, width = p
    # zero prior probability (ln(0) = -inf) for non-physical parameters
    if width <= 0 or amp < 0:
        return -inf
    return 0
```

Next, we need the likelihood: the probability of the data given a set of parameters, which we compute with `model()`. Under our assumptions, this model will differ from the observed data because of Poisson errors. For large counts, the Poisson distribution is well-approximated by a normal distribution with variance ($\sigma^2$) equal to the counts. So, given a set of parameters $\vec{\theta}$, the probability that we measure the flux in channel $i$ to be $f_i$ is given by:
$$P\left(f_i\left|\vec{\theta}\right.\right) = N\left(M_i\left(\vec{\theta}\right), \sqrt{f_i}\right),$$
where $N$ is the normal distribution and $M_i\left(\vec{\theta}\right)$ is the model prediction in channel $i$. For the entire data set, we multiply the probabilities of all the individual channels:
$$P\left(\vec{D}\left|\vec{\theta}\right.\right) = \prod_i P\left(f_i\left|\vec{\theta}\right.\right)$$
We'll use scipy's stats module, which has the normal distribution (and its logarithm) built in. Just like the priors, emcee wants the natural logarithm of the probability, so instead of multiplying all the probabilities, we sum their logarithms.

In [ ]:
```
from scipy.stats import norm

def lnlike(p, wave, flux):
    cont, slope, amp, center, width = p
    m = model(wave, *p)
    # Gaussian approximation to the Poisson errors: variance = counts
    return sum(norm.logpdf(flux, loc=m, scale=sqrt(flux)))
```

In [ ]:
```
def lnprob(p, wave, flux):
    # ln(posterior) = ln(prior) + ln(likelihood), up to a constant
    lp = lnprior(p)
    if not isfinite(lp):
        return -inf
    return lp + lnlike(p, wave, flux)
```

Now that we have the probability all figured out, we could in principle figure out where it is maximal and compute the 1-2-3-sigma intervals. This may or may not be possible in "closed form". The more parameters and priors there are, and the more complicated the model gets, the less likely you'll be able to compute the derivatives (for optimization) and integrals (for expectation values and confidence intervals). But we can always compute these numerically, and that's what MCMC is all about. With the `emcee` module, we do this by creating a bunch of "walkers" that wander around parameter space, always seeking higher-probability regions, but also randomly sampling the space. After a certain amount of time, they wander around randomly enough that they lose track of where they started. When this happens, the steps the walkers take are a reflection of $P\left(\vec{\theta}\left|\vec{D}\right.\right)$. So inferences about the moments of $P\left(\vec{\theta}\left|\vec{D}\right.\right)$ can be made by doing statistics on the walkers' steps. For example, the expectation (mean) value of the amplitude is:
$$\left<A\right> \equiv \int P\left(\vec{\theta}\left|\vec{D}\right.\right)A\,d\vec{\theta} \simeq \mathrm{mean}\left(A_i\right),$$
where $A_i$ are the values of `amp` at each step $i$. The more steps you take, the more accurate the estimate.
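
As a quick numerical sanity check of that statement (a hypothetical example, independent of the fit), draw samples from a known distribution and compare the sample mean with the true expectation value:

```
# Sketch: the mean of samples drawn from a distribution approximates its
# expectation value, and the estimate improves with the number of samples.
from numpy import random, mean

samples = random.normal(150.0, 10.0, size=100000)  # stand-in for chain samples of `amp`
print(mean(samples))   # close to 150.0; the scatter shrinks like 1/sqrt(Nsamples)
```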

So now we create a number of walkers and start them off at random locations in parameter space. In this example, we know the true values, so we just perturb around them. When you don't know the true values, you could start at completely random locations or use other tools (like `curve_fit`) to find an initial starting point.
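
If you did not know the true values, a sketch of that `curve_fit` approach might look like this (the initial guess below is just made up by eye and is purely illustrative):

```
# Sketch (hypothetical starting guess): use scipy's curve_fit to locate a
# reasonable starting point for the walkers when the true values are unknown.
from scipy.optimize import curve_fit

p_guess = [400., 1., 100., 4., 1.]   # rough guess: cont_zp, cont_slope, amp, center, width
popt, pcov = curve_fit(model, wave, obs_flux, p0=p_guess)
print(popt)   # could be used in place of `ptrue` below to seed the walkers
```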

In [ ]:
```
Nwalker, Ndim = 50, 5
ptrue = array([500., 5.0, 150., 5.0, 0.5])
# add a random vector 0.1 times the true vector to the true vector
p0 = [ptrue + 0.1*random.randn(Ndim)*ptrue for i in range(Nwalker)]
```

We now have `Nwalker` initial points. We can run the emcee sampler, giving it the `lnprob` function and any extra arguments it needs. The `run_mcmc` function takes the initial starting points and how many steps you want each walker to take. It returns the last position, probability, and state of each walker.

In [ ]:
```
import emcee
sampler = emcee.EnsembleSampler(Nwalker, Ndim, lnprob, args=(wave, obs_flux))
pos, prob, state = sampler.run_mcmc(p0, 500)
```

Let's plot the chains of the walkers for each parameter to see how they are doing (the horizontal lines mark the true values):

In [ ]:
```
fig, ax = subplots(5, 1)
res = [ax[i].plot(sampler.chain[:, :, i].T, '-', color='k', alpha=0.3) for i in range(5)]
res = [ax[i].axhline(ptrue[i]) for i in range(5)]
```

The walkers need some time to converge (the "burn-in"). We can reset the sampler to throw away those early steps and then run for longer, starting from the walkers' last positions:

In [ ]:
```
sampler.reset()
pos, prob, state = sampler.run_mcmc(pos, 1000)
```

The `sampler` object has an attribute `flatchain`, in which the chains of all the walkers are combined. This gives us `Nwalker*Nsteps` samples from the posterior. We can get the best-fit values and their errors by doing statistics on the chains:

In [ ]:
```
print(mean(sampler.flatchain, axis=0))  # best-fit, well really expectation value
print(std(sampler.flatchain, axis=0))   # errors
# deviation from true parameters in units of standard error
print((mean(sampler.flatchain, axis=0)-ptrue)/std(sampler.flatchain, axis=0))
print(cov(sampler.flatchain.T))         # covariance matrix
```
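
As a quick visual check (a sketch using only pieces defined above), we can overplot the model evaluated at the posterior-mean parameters on the data:

```
# Sketch: compare the observations with the model at the posterior-mean parameters.
pmean = mean(sampler.flatchain, axis=0)
step(wave, obs_flux, color='k')
plot(wave, model(wave, *pmean), 'r-')
xlabel('Wavelength (Angstroms)')
ylabel('Counts')
show()
```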

Finally, we can visualize the *posterior* probabilities of the parameters, as well as the covariances between them, by plotting a `corner` plot.

In [ ]:
```
import corner
rmp = corner.corner(sampler.flatchain, labels=['cont_zp', 'cont_slope', 'amp', 'cent', 'width'],
                    truths=[cont_zp, cont_slope, amplitude, center, width])
```