We are going to use parallel-tempering, implemented via the python emcee package, to explore our posterior, which consists of the set of distances and gas to dust conversion coefficients to the six velocity slices towards the center of the Cepheus molecular cloud. Since we need to explore a 12 dimensional parameter space, we are going to use 50 walkers, 10000 steps each, at 5 different temperatures. If you would like to edit this parameters, simply edit "nwalkers", "ntemps", and "nsteps" in the cell below. However, we are only going to keep the lowest temperature chain ($\beta=1$) for analysis. Since the sampler.chain object from PTSampler returns an array of shape (Ntemps, Nwalkers, Nsteps, Ndim), returning the samples for all walkers, steps, and dimensions at $\beta=1$ would correspond to sampler.chain[0,:,:,:]. To decrease your value of $\beta$ simply increase the index for the first dimension. For more information on how PTSampler works, see http://dan.iel.fm/emcee/current/user/pt/. We will set off our walkers in a Gaussian ball around a) the kinematic distance estimates for the Cepheus molecular cloud given by a flat rotation curve from Leroy & Rosolowsky 2006 and b) the gas-to-dust coefficient given by the literature. We perturb the walkers in a Gaussian ball with mean 0 and variance 1. You can edit the starting positions of the walkers by editing the "result" variable below. We are going to discard the first half of every walker's chain as burn-in.
We need to feed PTSampler the required positional arguments for the log_likelihood and log_prior function. We do this using the fetch_args function from the io module, which creates an instance of the pixclass object that holds our data and metadata. Fetch_args accepts three arguments:
Fetch_args will then return the correct arguments for the log_likelihood and log_prior functions within the model module.
Here we go!
The sampler is done running, so now let's check out the results. We are going to print out our mean acceptance fraction across all walkers for the coldest temperature chain.
We are also going to discard the first half of each walker's chain as burn-in; to change the number of steps to burn off, simply edit the 3rd dimension of sampler.chain[0,:,n:,:] and input your desired value of n. Next, we are going to compute and print out the 50th, 16th, and 84th percentile of the chains for each distance parameter, using the "quantile" attribute of a pandas dataframe object. The 50th percentile measurement represents are best guess for the each distance parameter, while the difference between the 16th and 50th gives us a lower limit and the difference between the 50th and 84th percentile gives us an upper limit:
In [1]:
import emcee
from dustcurve import model
import seaborn as sns
import numpy as np
from dustcurve import pixclass
import matplotlib.pyplot as plt
import pandas as pd
import warnings
from dustcurve import io
from dustcurve import hputils
from dustcurve import kdist
import h5py
from dustcurve import globalvars as gv
%matplotlib inline
f=h5py.File('/n/fink1/czucker/Output/2degrees_jul1_0.1sig_cropped.h5')
samples=f['/chains']
nsteps=samples.shape[2]
ndim=samples.shape[3]
#Extract the coldest [beta=1] temperature chain from the sampler object; discard first half of samples as burnin
samples_cold = samples[0,:,int(.5*nsteps):,:]
traces_cold = samples_cold.reshape(-1, ndim).T
#find best fit values for each of the 24 parameters (12 d's and 12 c's)
theta=pd.DataFrame(traces_cold)
quantile_50=theta.quantile(.50, axis=1).values
quantile_84=theta.quantile(.84, axis=1).values
quantile_16=theta.quantile(.16, axis=1).values
upperlim=quantile_84-quantile_50
lowerlim=quantile_50-quantile_16
#print out distances
for i in range(0,int(len(quantile_50)/2)):
print('d%i: %.3f + %.3f - %.3f' % (i+1,quantile_50[i],upperlim[i], lowerlim[i]))
#print out coefficients
for i in range(int(len(quantile_50)/2), int(len(quantile_50))):
print('c%i: %.3f + %.3f - %.3f' % (i+1-int(len(quantile_50)/2),quantile_50[i],upperlim[i], lowerlim[i]))
Let's see what our chains look like by producing trace plots:
In [5]:
#set up subplots for chain plotting
axes=['ax'+str(i) for i in range(ndim)]
fig, (axes) = plt.subplots(ndim, figsize=(10,60))
plt.tight_layout()
for i in range(0,ndim):
if i<int(ndim/2):
axes[i].set(ylabel='d%i' % (i+1))
else:
axes[i].set(ylabel='c%i' % (i-5))
#plot traces for each parameter
for i in range(0,ndim):
sns.tsplot(traces_cold[i],ax=axes[i])
Now we are going to use the seaborn distplot function to plot histograms of the last half of the traces for each parameter.
In [6]:
#set up subplots for histogram plotting
axes=['ax'+str(i) for i in range(ndim)]
fig, (axes) = plt.subplots(ndim, figsize=(10,60))
plt.tight_layout()
for i in range(0,ndim):
if i<int(ndim/2):
axes[i].set(ylabel='d%i' % (i+1))
else:
axes[i].set(ylabel='c%i' % (i-5))
#plot traces for each parameter
for i in range(0,ndim):
sns.distplot(traces_cold[i],ax=axes[i],hist=True,norm_hist=False)
Now we want to see how similar the parameters at different steps are. To do this, we draw one thousand random samples from the last half of the chain and plot the reddening profile corresponding to those parameters in light blue. Then, we plot the "best fit" reddening profile corresponding to the 50th quantile parameters (essentially the median of the last half of the chains). In all cases, we take the average of the CO for all nside 128 pixels at each slice. As you can see, drawing random samples from the last half of the chain produce reddening profiles essentially identical to the best fit values.
In [3]:
from dustcurve import plot_posterior
ratio=0.06
plot_posterior.plot_samples(np.asarray(post_all),np.linspace(4,19,120),np.linspace(0,7,700),quantile_50,traces_cold,ratio,gv.unique_co,y_range=[0,2],vmax=20,normcol=False)
In [ ]: