In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
In [2]:
def tracePlot(chains, labels=None, truths=None):
    n_dim = chains.shape[2]
    fig, ax = plt.subplots(n_dim, 1, figsize=(8., 27.), sharex=True)
    ax[-1].set_xlabel('Iteration', fontsize=20.)
    for i in range(len(ax)):
        try:
            ax[i].set_ylabel(labels[i], fontsize=20.)
        except (TypeError, IndexError):
            pass    # no label supplied for this parameter
        ax[i].tick_params(which='major', axis='both', length=10., labelsize=16.)
        for j in range(len(chains)):
            try:
                # dashed horizontal line marking the true value, if provided
                ax[i].plot([0, len(chains[j,:,i])], [truths[i], truths[i]], '-', lw=4,
                           dashes=(20., 10.), c='#B22222')
            except (TypeError, IndexError):
                pass    # no truth value supplied for this parameter
            ax[i].plot(chains[j,:,i], '-', lw=1, c='#0473B3', alpha=0.5)
    fig.tight_layout()
In [3]:
def GelmanRubin(chains, labels=None):
    n_chains = chains.shape[0]
    n_iter = chains.shape[1]//2   # use only the last half of each chain
    n_params = chains.shape[2]
    # take the last n samples if the total was 2n
    sample = chains[:, -n_iter:, :]
    # compute the mean of the intra-chain (within) variances
    W = np.mean(np.var(sample, axis=1), axis=0)
    # compute the inter-chain (between) variance from the chain means
    chain_means = np.mean(sample, axis=1)
    mean_of_chain_means = np.mean(chain_means, axis=0)
    B = np.empty(n_params)
    for i in range(n_params):
        B[i] = np.sum((chain_means[:, i] - mean_of_chain_means[i])**2)*n_iter/(n_chains - 1.)
    # estimated variance (likely an over-estimate)
    Sigma_hat_2 = ((n_iter - 1.)*W + B)/n_iter
    # pooled posterior variance
    Var_hat = Sigma_hat_2 + B/(n_chains*n_iter)
    # compute the potential scale reduction factor
    PSRF = np.sqrt(Var_hat/W)
    return W, B, Var_hat, PSRF
In [4]:
flatchain = np.genfromtxt('/Users/grefe950/Software/StarBay/interbay/logs/Sun_W0300_N0600_B0000.dat')
#flatchain = np.genfromtxt('/Users/grefe950/evolve/data/mltcal/model/chains/Sun_W0300_N0600_B0000.dat')
chains = flatchain.reshape(300, -1, 9)
labels=['Mass', '[Fe/H]', 'Y', 'log(Age)', 'Distance', 'alpha', 'log(Teff)', 'log(Fbol)', 'theta']
truths = [1.0, 0.00, 0.278, np.log10(4.568e9), 5.0, 1.884, np.log10(5779.), np.log10(1.286e-6), 1.860]
In [5]:
tracePlot(chains, labels=labels, truths=truths)
In [6]:
GelmanRubin(chains, labels=labels)
Out[6]:
Based on the trace plots, it appears that the ensemble of chains has converged on a solution. However, the Gelman-Rubin diagnostic tells a different story: the potential scale reduction factors are well above unity, larger than one would like.
It is therefore likely that we should continue running the chains and repeatedly apply the Gelman-Rubin diagnostic to assess the convergence of each parameter.
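For convenience, a small helper along the following lines (a sketch only; the convergence_report name and the 1.2 cutoff are my own choices, not part of the original analysis) could report which parameters still exceed a chosen PSRF threshold:

def convergence_report(chains, labels, threshold=1.2):
    # run the Gelman-Rubin diagnostic and flag parameters whose potential
    # scale reduction factor still exceeds `threshold`
    W, B, Var_hat, PSRF = GelmanRubin(chains)
    for label, R in zip(labels, PSRF):
        flag = '' if R < threshold else '  <-- not converged'
        print('{:>10s}  R = {:6.3f}{:s}'.format(label, R, flag))

convergence_report(chains, labels)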
In [7]:
PSRF = np.empty((581, 9))
for i in range(10, 591):
    W, B, var_hat, PSRF[i - 10] = GelmanRubin(chains[:, :i, :])  # note: GelmanRubin() discards the first half of each chain
In [8]:
fig, ax = plt.subplots(9, 1, figsize=(8., 18.), sharex=True)
ax[-1].set_xlabel('Iteration', fontsize=20.)
for i in range(len(ax)):
    try:
        ax[i].set_ylabel('$R$, {:s}'.format(labels[i]), fontsize=18., family='serif')
    except IndexError:
        pass
    ax[i].tick_params(which='major', axis='both', length=10., labelsize=16.)
    ax[i].plot(PSRF[:,i], '-', lw=3, c='#0473B3', alpha=0.9)
fig.tight_layout()
There are clear issues with theta and Teff, which show a high level of variance even after nearly 600 iterations. This can be traced to a couple of outlier chains in the trace plots above, which have nonsensical values for the effective temperature, bolometric flux, and angular diameter. Additional chains and iterations should mitigate the influence of these outliers.
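As a quick sanity check (a sketch, not part of the original pipeline; the log(Teff) > 3.4 cut anticipates the filter applied to the flattened chain below, and averaging over the last 100 iterations is an arbitrary choice), one could count how many walkers wander into this nonsensical region:

# column 6 of the chain array holds log(Teff); flag walkers whose mean value
# over the last 100 iterations falls below the (assumed) physical cut of 3.4
mean_logTeff = np.mean(chains[:, -100:, 6], axis=1)
outliers = np.where(mean_logTeff < 3.4)[0]
print('{:d} of {:d} walkers flagged as outliers: {}'.format(len(outliers), chains.shape[0], outliers))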
In the interest of determining "optimal" MCMC parameters (number of walkers and iterations), I've started a new solar run. Instead of continuing the run used above (300 walkers, 600 iterations), I've started a new sampler with 1000 walkers over 500 steps. Depending on how well converged the final set of chains looks, it may be worth decreasing the number of walkers and increasing the number of steps.
Tracking the convergence of the chains above showed that convergence was slow, with the potential scale reduction factor creeping slowly (albeit steadily) toward acceptable values.
In [9]:
new_flatchain = np.genfromtxt('/Users/grefe950/Software/StarBay/interbay/logs/Sun_W1000_N0500_B0000.dat')
new_chains = new_flatchain.reshape(1000, -1, 9)
In [10]:
tracePlot(new_chains, labels=labels, truths=truths)
In [11]:
GelmanRubin(new_chains, labels=labels)
Out[11]:
In [12]:
new_PSRF = np.empty((481, 9))
for i in range(10, 491):
    W, B, var_hat, new_PSRF[i - 10] = GelmanRubin(new_chains[:, :i, :])  # note: GelmanRubin() discards the first half of each chain
In [13]:
fig, ax = plt.subplots(9, 1, figsize=(8., 18.), sharex=True)
ax[-1].set_xlabel('Iteration', fontsize=20.)
for i in range(len(ax)):
    try:
        ax[i].set_ylabel('$R$, {:s}'.format(labels[i]), fontsize=18., family='serif')
    except IndexError:
        pass
    ax[i].tick_params(which='major', axis='both', length=10., labelsize=16.)
    ax[i].plot(new_PSRF[:,i], '-', lw=3, c='#0473B3', alpha=0.9)
fig.tight_layout()
This time we use a larger number of iterations but a smaller number of walkers (back to 300), since the previous experiment suggests that, at least in terms of achieving convergence across the full set of chains, the limiting factor is the number of iterations rather than the number of walkers.
In [14]:
newer_flatchain = np.genfromtxt('/Users/grefe950/Software/StarBay/interbay/logs/Sun_W0300_N1500_B0000.dat')
newer_chains = newer_flatchain.reshape(300, -1, 9)
In [15]:
tracePlot(newer_chains, labels=labels, truths=truths)
In [16]:
GelmanRubin(newer_chains, labels=labels)
Out[16]:
In [17]:
newer_PSRF = np.empty((1490, 9))
for i in range(10, 1500):
    W, B, var_hat, newer_PSRF[i - 10] = GelmanRubin(newer_chains[:, :i, :])  # note: GelmanRubin() discards the first half of each chain
In [18]:
fig, ax = plt.subplots(9, 1, figsize=(8., 18.), sharex=True)
ax[-1].set_xlabel('Iteration', fontsize=20.)
for i in range(len(ax)):
    try:
        ax[i].set_ylabel('$R$, {:s}'.format(labels[i]), fontsize=18., family='serif')
    except IndexError:
        pass
    ax[i].tick_params(which='major', axis='both', length=10., labelsize=16.)
    ax[i].plot(newer_PSRF[:,i], '-', lw=3, c='#0473B3', alpha=0.9)
fig.tight_layout()
It is clear that pushing the potential scale reduction factor below 2.0 is exceptionally difficult in most situations, with convergence occurring very slowly. To estimate how many more iterations would be necessary to push $\hat{R}$ below 1.2, we perform a simple linear extrapolation based on the slope of $\hat{R}$ as a function of iteration number over the final 500 iterations.
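Explicitly, the extrapolated total number of iterations is $N_{1.2} \approx N + (1.2 - \hat{R}_N)/m$, where $N$ is the current number of iterations and $m = (\hat{R}_N - \hat{R}_{N-499})/499$ is the per-iteration slope of $\hat{R}$ over the final 500 iterations; since $\hat{R}$ is decreasing, $m < 0$ and the correction term is positive. This is what the cell below computes for each parameter.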
In [19]:
# per-iteration slope of R over the final 500 iterations (negative, since R is decreasing)
slopes = (newer_PSRF[-1,:] - newer_PSRF[-500,:])/(len(newer_PSRF[:-1]) - len(newer_PSRF[:-500]))
# extrapolated total number of iterations needed to bring R down to 1.2 for each parameter
print((1.2 - newer_PSRF[-1,:])/slopes + len(newer_PSRF[:-1]))
In [20]:
# remove spurious chains/points from the flattened chain (keep the last 300 iterations)
newer_flatchain = newer_chains[:,-300:,:].reshape(-1, 9)
newer_flatchain = np.array([x for x in newer_flatchain if x[0] > 0.0 and x[6] > 3.4])

fig, ax = plt.subplots(3, 3, figsize=(12, 12))
xlimits = [(0.7, 1.0), (-0.4, 0.4), (0.21, 0.33), (8.7, 10.1), (2, 7), (0.5, 3.0),
           (3.6, 4.0), (-6.5, -5.8), (1., 3.)]
for i in range(9):
    row = int(np.floor(i / 3.))
    col = i%3
    ax[row,col].set_xlabel(labels[i], fontsize=20.)
    ax[row,col].set_ylabel('Number', fontsize=20.)
    ax[row,col].set_xlim(xlimits[i])
    ax[row,col].tick_params(which='major', axis='both', length=10., labelsize=12.)
    ax[row,col].hist(newer_flatchain[:,i], bins=50, facecolor='#0473B3', alpha=0.8)
fig.tight_layout()
In [21]:
# compute statistics on the flattened chains
print '\n================'
print "\nW = 300, N = 600"
print "\nMeans"
print np.mean(flatchain, axis=0)
print "\nMedians"
print np.median(flatchain, axis=0)
print "\nStandard Deviations"
print np.std(flatchain, axis=0)
print '\n--'
print "\n50th Percentiles (Median)"
print np.percentile(flatchain, 50., axis=0)
print "\n16th Percentiles"
print np.percentile(flatchain, 16., axis=0)
print "\n84th Percentiles"
print np.percentile(flatchain, 84., axis=0)
print '\n================'
print "\nW = 1000, N = 500"
print "\nMeans"
print np.mean(new_flatchain, axis=0)
print "\nMedians"
print np.median(new_flatchain, axis=0)
print "\nStandard Deviations"
print np.std(new_flatchain, axis=0)
print '\n--'
print "\n50th Percentiles (Median)"
print np.percentile(new_flatchain, 50., axis=0)
print "\n16th Percentiles"
print np.percentile(new_flatchain, 16., axis=0)
print "\n84th Percentiles"
print np.percentile(new_flatchain, 84., axis=0)
print '\n================'
print "\nW = 300, N = 1500"
print "\nMeans"
print np.mean(newer_flatchain, axis=0)
print "\nMedians"
print np.median(newer_flatchain, axis=0)
print "\nStandard Deviations"
print np.std(newer_flatchain, axis=0)
print '\n--'
print "\n50th Percentiles (Median)"
print np.percentile(newer_flatchain, 50., axis=0)
print "\n16th Percentiles"
print np.percentile(newer_flatchain, 16., axis=0)
print "\n84th Percentiles"
print np.percentile(newer_flatchain, 84., axis=0)
What messages can we take away from these experiments? First, one can easily note that our original set of walkers delivered a very robust solution; it therefore appears that the walkers were adequately sampling from the target distribution. However, we did not recover the mass of the Sun, which is almost certainly a consequence of the fact that our grid tops out somewhere between 0.95 and 1.0 solar masses. Not surprisingly, the simulation attempted to produce a star with a hotter effective temperature and larger bolometric flux by other means, namely by increasing the helium abundance, lowering the overall metallicity, and estimating a closer distance.
However, it is interesting to note that the simulations did not perform poorly. The solar age is nearly recovered, with a final estimate of $4.0^{+3.6}_{-2.1}$ Gyr, where the quoted errors are the 68% confidence interval from the posterior distribution (the 16th and 84th percentiles of the cumulative distribution). The solar helium abundance is also partially recovered, despite the simulation's preference for higher initial helium contents to compensate for the poor grid coverage at higher masses. The solar-calibrated helium abundance is $0.276$, whereas our simulation suggests $0.30^{+0.03}_{-0.02}$. Agreement is tenuous, and perhaps to be expected given the fairly narrow range of permitted helium abundances.
We also observe that the convective mixing length parameter is reproduced. This should be interpreted cautiously, since the distribution, as displayed in the histograms above, peaks around 1.5 with a small bump thereafter (centered on the median). How, precisely, to treat the analysis of the posterior distribution for the mixing length is, to my mind, an open question.
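One possible approach (a sketch only, not part of the analysis above; it assumes scipy is available and uses the cleaned newer_flatchain from the histogram cell) would be to locate the posterior mode with a kernel density estimate and compare it against the median:

from scipy.stats import gaussian_kde

# column 5 of the flattened chain holds the mixing length parameter, alpha
alpha_samples = newer_flatchain[:, 5]
kde = gaussian_kde(alpha_samples)
grid = np.linspace(alpha_samples.min(), alpha_samples.max(), 500)
mode = grid[np.argmax(kde(grid))]
print('alpha mode   = {:5.3f}'.format(mode))
print('alpha median = {:5.3f}'.format(np.median(alpha_samples)))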
Brooks & Gelman 1998, Journal of Computational and Graphical Statistics, 7, 434–455: General Methods for Monitoring Convergence of Iterative Simulations
Gelman & Rubin 1992, Statistical Science, 7, 457–511: Inference from Iterative Simulation Using Multiple Sequences