Looking into the covariance matrix used to make the Metropolis-Hastings jumps.
gully


In [2]:
import h5py

In [3]:
import numpy as np

In [4]:
import emcee

No covariance matrix used during sampling.


In [5]:
f1 = h5py.File('mc_noCov.hdf5', mode='r')

d1 = f1['samples']

print("Acceptance fraction: {:.>10.1%}".format(np.float(d1.attrs['acceptance'])))

#f1.close()


Acceptance fraction: .....73.2%

On the high side... an acceptance fraction near 73% usually means the proposal steps are too small, so the chain takes many tiny, highly correlated steps.

With covariance matrix during sampling:


In [6]:
f2 = h5py.File('mc_Cov.hdf5', mode='r')

d2 = f2['samples']

print("Acceptance fraction: {:.>10.1%}".format(np.float(d2.attrs['acceptance'])))

#f2.close()


Acceptance fraction: .....43.0%

Looks better.

How do the chains compare?

[Figure: chain traces. Top: no covariance matrix used during sampling. Bottom: with covariance matrix used during sampling.]

The bottom one is better: it has a lower, but not too low, acceptance ratio.
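
To see why the acceptance ratio behaves this way, here is a minimal sketch (a toy problem, not this notebook's sampler): Metropolis-Hastings on a correlated 2-D Gaussian, comparing a small isotropic proposal against a proposal whose covariance is matched to the target and scaled by the classic $2.38^2/d$ factor. All names in this cell are made up for the demo.

In [ ]:
rng = np.random.RandomState(0)
cov_target = np.array([[1.0, 0.9],
                       [0.9, 1.0]])
icov = np.linalg.inv(cov_target)

def log_prob(x):
    return -0.5 * x.dot(icov).dot(x)

def mh_acceptance(prop_cov, n_steps=5000):
    L = np.linalg.cholesky(prop_cov)          # proposals ~ N(current, prop_cov)
    x, lp, n_acc = np.zeros(2), 0.0, 0
    for _ in range(n_steps):
        x_new = x + L.dot(rng.randn(2))
        lp_new = log_prob(x_new)
        if np.log(rng.rand()) < lp_new - lp:  # standard MH accept/reject
            x, lp, n_acc = x_new, lp_new, n_acc + 1
    return n_acc / float(n_steps)

# tiny isotropic steps: almost everything is accepted, but the walk is slow
print("small isotropic:  {:.1%}".format(mh_acceptance(0.01 * np.eye(2))))
# covariance matched to the target, scaled by 2.38^2/d (d = 2 here)
print("tuned covariance: {:.1%}".format(mh_acceptance((2.38**2 / 2.0) * cov_target)))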

The path forward

So in order to provide a guess for the transition probability matrix, we need to have already run the MCMC sampling, and then performed the covariance analysis: chain.py --files mc.hdf5 --cov.

So we have to sample twice: a preliminary run, then a production run. This strategy already jibes with our plan for tuning the spectral line outliers: we run a SampleThetaPhi, then we run a SamplePhiLines. A sketch of the covariance step follows.
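
For concreteness, the covariance step amounts to something like the cell below (a sketch, not chain.py itself): take the sample covariance of the preliminary run after discarding burn-in, and hand it off as the proposal covariance guess. The output filename proposal_cov.npy is made up for illustration.

In [ ]:
with h5py.File('mc_noCov.hdf5', mode='r') as f:
    samples = f['samples'][:]                        # preliminary run's samples

n_disc = 50                                          # drop burn-in, as below
cov_guess = np.cov(samples[n_disc:], rowvar=False)   # (n_params, n_params) covariance
np.save('proposal_cov.npy', cov_guess)               # hypothetical hand-off file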

Why we should specify a transition probability matrix

The main advantage of specifying a transition probability (covariance) matrix is that it decreases the integrated autocorrelation time, $\tau_{int}$. Let's check that below.
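
For reference, the quantity estimated below is the standard integrated autocorrelation time,

$$\tau_{int} = 1 + 2 \sum_{t=1}^{\infty} \rho(t),$$

where $\rho(t)$ is the normalized autocorrelation of the chain at lag $t$. A chain of $N$ samples contains roughly $N / \tau_{int}$ effectively independent samples, so a smaller $\tau_{int}$ means more information per sample.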


In [7]:
d1.shape


Out[7]:
(500, 6)

In [16]:
d1[:].shape


Out[16]:
(500, 6)

In [24]:
n_disc = 50                 # discard the first 50 samples as burn-in
chain = d1[n_disc:, 5]      # parameter index 5 of the no-covariance chain

print("{: <8} {: <8} {: <8}".format("window", "tau_int", "x6"))
for w in [10*2**n for n in range(6)]:
    # emcee's integrated autocorrelation time, summed out to lag `window`
    # (the `window` keyword is the emcee 2.x API)
    tau_int = emcee.autocorr.integrated_time(chain, window=w)
    print("{:.>8} {:.>8.1f} {:.>8.1f}".format(w, tau_int, 6.0*tau_int))


window   tau_int  x6      
......10 .....9.5 ....56.9 
......20 ....10.6 ....63.7 
......40 .....8.9 ....53.5 
......80 ....-0.5 ....-2.8 
.....160 .....9.3 ....55.6 
.....320 .....4.1 ....24.6 

It's hard to estimate $\tau_{int}$ from such a short chain; we need more samples. But at any rate, the acceptance fraction has moved much closer to the commonly recommended 20-40% range, so that seems like a good thing.
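
To put a number on "we need more samples": taking $\tau_{int} \approx 9$ as a rough read of the table above, the chain contains only a few dozen effectively independent samples.

In [ ]:
# back-of-the-envelope effective sample size, assuming tau_int ~ 9
n_kept = 500 - n_disc                         # 450 samples after burn-in
tau = 9.0                                     # rough tau_int from the table above
print("N_eff ~ {:.0f}".format(n_kept / tau))  # ~50 effectively independent samples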


In [ ]:
f1.close()
f2.close()

The end.