We have observed strange posterior-sample behavior from our likelihood calculation and use this notebook to examine the individual integrals. We begin by fixing a set of hyperparameters and generating the distribution of integral values in the log-likelihood calculation.
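Throughout, `bl.single_integral` is presumably a Monte Carlo (importance-sampling) estimate of one galaxy's integral, computed as a logsumexp over per-sample log-weights. As a reference point, here is a minimal self-contained sketch of such an estimator; the helper names are ours, not bigmali's:

```python
import numpy as np

def logsumexp(a):
    # numerically stable log(sum(exp(a)))
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))

def mc_log_integral(log_weights):
    # log of the Monte Carlo average: log((1/n) * sum_i exp(w_i))
    return logsumexp(log_weights) - np.log(len(log_weights))

# sanity check on a known integral: log(int_0^1 e^x dx) = log(e - 1)
np.random.seed(0)
x = np.random.uniform(0, 1, 100000)  # samples from U(0, 1), density 1
est = mc_log_integral(x)             # log-weight of each sample is x itself
```

The estimator's variance, not its mean, is what depends on `nsamples`, which is the question the cells below probe.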
In [205]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
In [206]:
from bigmali.grid import Grid
from bigmali.likelihood import BiasedLikelihood
from bigmali.prior import TinkerPrior
from bigmali.hyperparameter import get
In [207]:
import numpy as np
import pandas as pd

data = pd.read_csv('/Users/user/Code/PanglossNotebooks/MassLuminosityProject/mock_data.csv')
grid = Grid()
prior = TinkerPrior(grid)
lum_obs = data.lum_obs[:10 ** 4]
z = data.z[:10 ** 4]
bl = BiasedLikelihood(grid, prior, lum_obs, z)
Below we collect the values of the integrals for a subset of $10^4$ galaxies (with their corresponding lum_obs and z). We collect these values using 100, 1,000, and 10,000 draws from the biased distribution in order to compare how sensitive the integrals are to the number of biased-distribution samples. We can also examine the runtime and see that, from 1,000 to 10,000 samples, it grows close to linearly in the number of samples.
In [25]:
hypers = get()
%time vals100 = map(lambda lum_obs,z: bl.single_integral(*(hypers + [lum_obs, z]), nsamples=100), lum_obs, z)
%time vals1000 = map(lambda lum_obs,z: bl.single_integral(*(hypers + [lum_obs, z]), nsamples=1000), lum_obs, z)
%time vals10000 = map(lambda lum_obs,z: bl.single_integral(*(hypers + [lum_obs, z]), nsamples=10000), lum_obs, z)
In [42]:
import matplotlib.pyplot as plt
font = {'family': 'normal',
        'weight': 'normal',
        'size': 16}
plt.rc('font', **font)
plt.hist(vals100, alpha=0.5, label='nsamples=100', bins=20, normed=True)
plt.hist(vals1000, alpha=0.5, label='nsamples=1000', bins=20, normed=True)
plt.hist(vals10000, alpha=0.5, label='nsamples=10000', bins=20, normed=True)
plt.title('Distribution of Integral Values at Different Sample Sizes')
plt.ylabel('Density')
plt.xlabel('Value')
plt.gcf().set_size_inches((10,6))
plt.legend(loc=2);
This result is a bit surprising. It suggests we may not need many samples to characterize an integral. To examine this further, let's isolate a single integral and see how its value changes as we increase the number of samples. We do this for three different galaxies to limit the chance that we are drawing conclusions from outliers.
In [109]:
single_lum_obs0 = lum_obs[0]
single_z0 = z[0]
single_lum_obs1 = lum_obs[100]
single_z1 = z[100]
single_lum_obs2 = lum_obs[5000]
single_z2 = z[5000]
space = np.linspace(1, 1000, 1000)
single_vals = np.zeros((1000, 3))
for i, nsamples in enumerate(space):
    # pass nsamples through; otherwise every iteration uses the default
    single_vals[i, 0] = bl.single_integral(*(hypers + [single_lum_obs0, single_z0]), nsamples=int(nsamples))
    single_vals[i, 1] = bl.single_integral(*(hypers + [single_lum_obs1, single_z1]), nsamples=int(nsamples))
    single_vals[i, 2] = bl.single_integral(*(hypers + [single_lum_obs2, single_z2]), nsamples=int(nsamples))
plt.subplot(311)
plt.plot(space, single_vals[:,0])
plt.title('Integral 0')
plt.xlabel('Biased Distribution Samples')
plt.ylabel('Log-Likelihood')
plt.subplot(312)
plt.plot(space, single_vals[:,1])
plt.title('Integral 1')
plt.xlabel('Biased Distribution Samples')
plt.ylabel('Log-Likelihood')
plt.subplot(313)
plt.plot(space, single_vals[:,2])
plt.title('Integral 2')
plt.xlabel('Biased Distribution Samples')
plt.ylabel('Log-Likelihood')
plt.gcf().set_size_inches((10,6))
plt.tight_layout()
A few things about these results are concerning. First, the estimates bobble within a window and never converge. Second, even a single sample provides a reasonable approximation. We will need to start examining the individual weights of the biased samples. Below we extract the internal weights used in a single integral. The keys of the dataframe correspond to the distributions here:
In [79]:
single_lum_obs0 = lum_obs[0]
single_z0 = z[0]
internals = bl.single_integral_samples_and_weights(*(hypers + [single_lum_obs0, single_z0]))
print single_lum_obs0
print single_z0
internals
Out[79]:
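For reference, a single integral value is recovered from these weight columns on the log scale. A minimal sketch of that combination (an assumption about how `single_integral` aggregates; it mirrors the explicit logsumexp cells further down):

```python
import numpy as np

def combine_log_weights(w):
    # per-sample log-weight, matching the notebook's columns:
    # v1 + v2 + v3 - v4 - v5
    arg = w['v1'] + w['v2'] + w['v3'] - w['v4'] - w['v5']
    m = np.max(arg)
    # stable logsumexp, then subtract log(n) to average rather than sum
    return m + np.log(np.sum(np.exp(arg - m))) - np.log(len(arg))

# two samples with all log-weights zero: the average of exp(0) is 1, log(1) = 0
toy = {k: np.zeros(2) for k in ['v1', 'v2', 'v3', 'v4', 'v5']}
```

Because the combination happens inside an exponential, a single extreme column value can dominate the whole estimate.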
In [130]:
import seaborn as sns
cols = ['v1','v2','v3','v4','v5']
plt.title('Correlation Between Various Weights')
sns.heatmap(internals[cols].corr(), xticklabels=cols, yticklabels=cols);
In [120]:
font = {'family': 'normal',
        'weight': 'normal',
        'size': 16}
z_z1 = z[z < 1]
z_z2 = z[z > 1]
lum_obs_z1 = lum_obs[z < 1]
lum_obs_z2 = lum_obs[z > 1]
vals100_z1 = np.array(vals100)[np.where(z < 1)]
vals100_z2 = np.array(vals100)[np.where(z > 1)]
plt.subplot(121)
plt.hist(z, bins=100, alpha=0.6)
plt.xlabel('Value')
plt.ylabel('Count')
plt.title('z Histogram')
plt.subplot(122)
plt.scatter(lum_obs_z1, vals100_z1, label='z < 1', alpha=0.1)
plt.scatter(lum_obs_z2, vals100_z2, label='z > 1', alpha=0.1, color='green')
plt.legend()
plt.ylabel('Log-Likelihood')
plt.xlabel('Luminosity')
plt.title('Log-Likelihood vs Luminosity')
plt.gcf().set_size_inches((14,5))
plt.tight_layout()
Hmmm ... have to think about this a bit.
The next question I want to explore is: How do our biased distributions change as we move towards more probable posterior samples?
In [133]:
hypers2 = list(hypers)  # copy; plain assignment would alias and mutate hypers
hypers2[-1] = 1
internals2 = bl.single_integral_samples_and_weights(*(hypers2 + [single_lum_obs0, single_z0]))
In [141]:
plt.scatter(np.log(internals['mass_samples']), np.log(internals['lum_samples']), color='blue', label='S = 0.155')
plt.scatter(np.log(internals2['mass_samples']), np.log(internals2['lum_samples']), color='green', label='S = 1')
plt.scatter(np.log(data['mass'][:100]), np.log(data['lum'][:100]), color='red', label='True')
plt.gca().axhline(np.log(single_lum_obs0), color='k', label='lum_obs')
plt.title('Scatter Plots for S=0.155, S=1')
plt.ylabel('Log-Luminosity')
plt.xlabel('Log-Mass')
plt.legend();
In [142]:
hypers3 = list(hypers)  # copy; plain assignment would alias and mutate hypers
hypers3[-1] = 10
internals3 = bl.single_integral_samples_and_weights(*(hypers3 + [single_lum_obs0, single_z0]))
plt.scatter(np.log(internals['mass_samples']), np.log(internals['lum_samples']), color='blue', label='S = 0.155')
plt.scatter(np.log(internals3['mass_samples']), np.log(internals3['lum_samples']), color='green', label='S = 10')
plt.scatter(np.log(data['mass'][:100]), np.log(data['lum'][:100]), color='red', label='True')
plt.gca().axhline(np.log(single_lum_obs0), color='k', label='lum_obs')
plt.title('Scatter Plots for S=0.155, S=10')
plt.ylabel('Log-Luminosity')
plt.xlabel('Log-Mass')
plt.legend();
Need to think about handling out of bounds of prior more gracefully. Look forward to discussing with Phil ...
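One graceful option is to assign zero weight (a log-density of -inf) to samples that fall outside the prior's grid support, rather than raising or extrapolating. A sketch with hypothetical names; TinkerPrior's real interface may differ:

```python
import numpy as np

def safe_log_pdf(log_pdf_on_grid, grid_min, grid_max):
    # wrap a log-pdf so samples outside [grid_min, grid_max] get -inf
    # (zero weight) instead of an error or a garbage extrapolation
    def wrapped(x):
        x = np.asarray(x, dtype=float)
        out = np.full(x.shape, -np.inf)
        ok = (x >= grid_min) & (x <= grid_max)
        out[ok] = log_pdf_on_grid(x[ok])
        return out
    return wrapped
```

Samples with -inf log-weight then drop out of the logsumexp automatically, so the integral estimate degrades gracefully rather than failing.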
In [161]:
print np.sum(vals100)
print np.sum(vals100_s10)
In [160]:
font = {'family': 'normal',
        'weight': 'normal',
        'size': 16}
vals100_s10 = map(lambda lum_obs,z: bl.single_integral(*(hypers3 + [lum_obs, z]), nsamples=100), lum_obs, z)
z_z1 = z[z < 1]
z_z2 = z[z > 1]
lum_obs_z1 = lum_obs[z < 1]
lum_obs_z2 = lum_obs[z > 1]
vals100_s10_z1 = np.array(vals100_s10)[np.where(z < 1)]
vals100_s10_z2 = np.array(vals100_s10)[np.where(z > 1)]
plt.subplot(121)
plt.hist(z, bins=100, alpha=0.6)
plt.xlabel('Value')
plt.ylabel('Count')
plt.title('z Histogram')
plt.subplot(122)
plt.scatter(lum_obs_z1, vals100_s10_z1, label='S=10, z < 1', alpha=0.2, s=1)
plt.scatter(lum_obs_z2, vals100_s10_z2, label='S=10, z > 1', alpha=0.2, color='green', s=2)
plt.scatter(lum_obs_z1, vals100_z1, label='S=0.155, z < 1', alpha=0.2, color='red', s=2)
plt.scatter(lum_obs_z2, vals100_z2, label='S=0.155, z > 1', alpha=0.2, color='orange', s=2)
plt.legend()
plt.ylabel('Log-Likelihood')
plt.xlabel('Luminosity')
plt.title('Log-Likelihood vs Luminosity')
plt.gcf().set_size_inches((14,5))
plt.tight_layout()
Our posterior is chasing after low mass!?
In [148]:
internals.describe()
Out[148]:
In [165]:
internals['v1'].mean() + internals['v2'].mean() + internals['v3'].mean() - internals['v4'].mean() - internals['v5'].mean()
Out[165]:
In [166]:
from scipy.misc import logsumexp
logsumexp(internals['v1'] + internals['v2'] + internals['v3'] - internals['v4'] - internals['v5'])
Out[166]:
In [150]:
internals3.describe()
Out[150]:
In [162]:
internals3['v1'].mean() + internals3['v2'].mean() + internals3['v3'].mean() - internals3['v4'].mean() - internals3['v5'].mean()
Out[162]:
In [164]:
from scipy.misc import logsumexp
logsumexp(internals3['v1'] + internals3['v2'] + internals3['v3'] - internals3['v4'] - internals3['v5'])
Out[164]:
In [168]:
internals['arg'] = internals['v1'] + internals['v2'] + internals['v3'] - internals['v4'] - internals['v5']
internals3['arg'] = internals3['v1'] + internals3['v2'] + internals3['v3'] - internals3['v4'] - internals3['v5']
In [179]:
plt.title('The Value of the logsumexp Arg')
plt.xlabel('Value')
plt.ylabel('Count')
plt.hist(internals['arg'], bins=20, alpha=0.5, label='S = 0.155')
plt.hist(internals3['arg'], bins=20, alpha=0.5, label='S = 10')
plt.legend();
In [177]:
internals.describe()
Out[177]:
In [174]:
internals3
Out[174]:
FOUND THE ISSUE!
Our current Tinker10 mass prior favors the lower-mass points so heavily that it outweighs the other components of the likelihood and dominates the posterior probability. To get meaningful results we will have to devise a way around this dilemma.
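A toy illustration of the failure mode (synthetic numbers, not the actual Tinker10 weights): when one log-weight dwarfs the rest, logsumexp is determined almost entirely by that single sample, so a steep prior can swamp every other component of the integrand:

```python
import numpy as np

def logsumexp(a):
    # numerically stable log(sum(exp(a)))
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))

# 99 moderate log-weights plus one dominant one, e.g. from a steep prior
arg = np.concatenate([np.full(99, -10.0), [5.0]])
total = logsumexp(arg)
# total sits within ~3e-5 of the single largest term: the other
# 99 samples contribute essentially nothing to the estimate
```

If the estimate is driven by one sample, adding more samples barely changes it, which is consistent with the non-converging "bobbling" seen above.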
In [223]:
plt.hist(data['z'][:10000], normed=True, alpha=0.6)
Out[223]:
In [220]:
plt.title('True Mass PDF')
plt.hist(np.log10(data['mass'][:10000]), normed=True, alpha=0.6)
plt.gca().set_yscale("log")
plt.xlabel('Log Mass')
plt.ylabel('Density')
Out[220]:
In [218]:
plt.title('Prior Mass PDF Evaluations')
space = np.linspace(24, 30, 100)
vals = prior.pdf(np.exp(space), 1.0)
plt.plot(space, vals)
plt.gca().set_yscale("log")
plt.xlabel('Natural Log Mass')
plt.ylabel('Density');