Before doing anything, make a copy of this notebook in your clone of the private course repository. Edit that copy, not the version in the public repository clone. Please follow the instructions in the private repo README and locate your solutions on a path like
phys366_2019/HW_week<N>/<Your Name(s)>/
(note that case matters). Take a minute to look hard at the directory structure and choose your filenames carefully: if your solutions are not in the right place, they will not be graded.
Remember that you will submit your solution by pull request to the private repository.
This week you also need to produce a project plan and PGM. As explained in the "Project Milestones" instructions:
At this point, you should have an idea and a team that you expect to stick with. Your task this week is to submit a rough plan for what you aim to accomplish, i.e. the scope of the project, and a guess at when various things will be done. In addition, sketch a PGM for the model involved in your project.
As with your homework solution, push your (team's) plan and PGM to your fork of the private repo, into a file named
phys366_2019/Project_milestones/<project name>/plan.ipynb
and submit a PR to the private repository. Writing your plan as a Jupyter notebook is a good idea, so that you can iterate on any PGM code you write later (or just use the notebook to display a graphics file of your PGM as part of your plan).
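If you do end up drawing your PGM in code, one option (not required by the assignment) is the daft package; the cell below is only an illustrative sketch, with placeholder node names and layout rather than anything specific to your project, and it also shows the simpler route of just displaying an image file.
In [ ]:
# Illustrative sketch only: a toy PGM drawn with the daft package.
# Node names, positions, and the plate are placeholders, not a required model.
import daft

pgm = daft.PGM(shape=[3, 3])
pgm.add_node(daft.Node("theta", r"$\theta$", 1, 2))            # model parameter
pgm.add_node(daft.Node("y", r"$y_k$", 1, 1, observed=True))    # observed data
pgm.add_edge("theta", "y")
pgm.add_plate(daft.Plate([0.5, 0.4, 1.0, 1.1], label=r"$k$"))  # plate over data points
pgm.render()

# Alternatively, just display a graphics file of your hand-drawn PGM:
# from IPython.display import Image
# Image(filename="my_pgm.png")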
Once you've submitted your solution, don't forget to also fill out the very quick (and required!) weekly feedback form.
In the Week 5 Tutorial we developed machinery for evaluating a simple model for our cluster mis-centering distance data. We will now re-use this machinery to compare the simple model we assumed in class (Model 1, $H_1$) with a more complex model of your design (Model 2, $H_2$), repeating for this new model the various fitting and evaluation steps that we carried out for Model 1.
If you did not complete the tutorial in class, do so now, since this homework depends on it.
In [ ]:
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt
plt.rcParams['xtick.labelsize'] = 'large'
plt.rcParams['ytick.labelsize'] = 'large'
%matplotlib inline
# or %matplotlib notebook
In [ ]:
# Read in the data to a global variable, as in the tutorial:
y = np.loadtxt('../data/model_comparison.dat')
In [ ]:
# Edit, uncomment and execute this cell to read in your tutorial code, downloaded as python from the notebook.
# Then edit it to just contain the code you need.
# %load phils_model_comparison_tutorial.py
Once you have your machinery loaded in, re-do your evaluation of Model 1 from the tutorial, step by step, and check that it reproduces your tutorial results.
In [ ]:
Model1 = SimpleModel()
In [ ]:
samples1 = Model1.draw_samples_from_posterior()
In [ ]:
visual_check(Model1)
In [ ]:
p1 = posterior_predictive_check(Model1)
In [ ]:
DIC1, pD1 = DIC(Model1)
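If the DIC function didn't make it into your loaded tutorial code, recall that it is built from the deviance $D(\theta) = -2\ln P(\mathbf{y}|\theta)$, with $\mathrm{DIC} = \langle D\rangle + p_D$ and $p_D = \langle D\rangle - D(\langle\theta\rangle)$, where the averages are over the posterior. A minimal sketch is below; it assumes (hypothetically) that the model object exposes a log_likelihood method and that you pass in an array of posterior samples, which may not match your tutorial's interface exactly.
In [ ]:
# Hedged sketch of a DIC calculation; assumes Model.log_likelihood(theta) exists
# and that `samples` is an array of posterior draws (1D or 2D) -- adapt as needed.
def DIC_from_samples(Model, samples):
    # Deviance D(theta) = -2 log L(theta), evaluated at each posterior sample:
    D = np.array([-2.0 * Model.log_likelihood(theta) for theta in samples])
    D_bar = D.mean()  # posterior mean deviance
    # Effective number of parameters, pD = <D> - D(<theta>):
    pD = D_bar + 2.0 * Model.log_likelihood(np.mean(samples, axis=0))
    return D_bar + pD, pD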
In [ ]:
logE1 = log_evidence(Model1)
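Similarly, if you need to reconstruct the log-evidence step, one simple (if noisy) estimator is simple Monte Carlo over draws from the prior, $P(\mathbf{y}|H) \approx \frac{1}{N}\sum_i P(\mathbf{y}|\theta_i,H)$ with $\theta_i \sim P(\theta|H)$. The sketch below assumes hypothetical draw_samples_from_prior and log_likelihood methods; your tutorial code may organize this differently.
In [ ]:
# Hedged sketch of a simple-Monte-Carlo evidence estimate; the method names on
# Model are guesses and should be adapted to your own class.
def log_evidence_SMC(Model, N=10000):
    prior_draws = Model.draw_samples_from_prior(N)
    logL = np.array([Model.log_likelihood(theta) for theta in prior_draws])
    # log of the prior-averaged likelihood, via the log-sum-exp trick for stability:
    return np.logaddexp.reduce(logL) - np.log(N)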
Choose a more complex model to compare to Model 1 from the tutorial. A good strategy could be to use a mixture of two standard PDFs (implemented in scipy) for the sampling distribution, with a parameter indicating the relative proportion of the two components, such that
$P(y|\theta_1,\theta_2,f,H_2) = f\,P_1(y|\theta_1,H_2) + (1-f)\,P_2(y|\theta_2,H_2)$.
But feel free to do something different if you like. You will also need to assign sensible prior PDFs for your model parameters $\mathbf{a}$. Note that you will now have an array of parameters to infer: in the mixture model suggested above, $f$ is itself a parameter of the model, so you'd have $\mathbf{a}=(\theta_1,\theta_2,f)$.
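As a concrete illustration (not a prescription), here is how a two-component mixture built from scipy.stats distributions could be evaluated; the particular components below (an exponential and a Rayleigh) are placeholders, and the choice is yours.
In [ ]:
# Illustrative sketch of a two-component mixture sampling distribution,
# P(y|theta1, theta2, f) = f * P1(y|theta1) + (1 - f) * P2(y|theta2).
# The exponential and Rayleigh components are placeholders -- pick your own.
def mixture_pdf(y, theta1, theta2, f):
    return f * st.expon.pdf(y, scale=theta1) + (1.0 - f) * st.rayleigh.pdf(y, scale=theta2)

def mixture_loglike(y, theta1, theta2, f):
    # Log-likelihood for independent data points: sum of log mixture densities
    return np.sum(np.log(mixture_pdf(y, theta1, theta2, f)))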
Write out the formula for your model sampling distribution $P(\mathbf{y}|\mathbf{a},H_2)$, and for all the terms in the joint prior $P(\mathbf{a}|H_2)$. Write a few sentences explaining your choices.
Model 2: $P(\mathbf{y}|\mathbf{a},H_2) = \ldots$
Priors: $P(\mathbf{a}|H_2) = \ldots$
Explanation: ...
Now copy, extend, or sub-class SimpleModel to include implementations of the likelihood, posterior, and predictive functions for your Model 2. As in the tutorial, you'll want to do a few spot checks to make sure the class methods are doing what you want them to do.
NB. You'll be moving from 1D to multiple dimensions, so watch out for array shape errors.
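Purely as a starting point (the empty class in the next cell is the one you should actually fill in), the sketch below shows one way to organize the likelihood, prior, and posterior around a parameter vector $\mathbf{a}=(\theta_1,\theta_2,f)$; the method names mirror what the tutorial's SimpleModel is assumed to provide and will need adapting to your code.
In [ ]:
# Starting-point sketch only; it reuses mixture_pdf from above and the global
# data array y, and assumes method names similar to the tutorial's SimpleModel.
class MixtureModelSketch(object):
    """Two-component mixture model with parameter vector a = (theta1, theta2, f)."""

    def log_prior(self, a):
        theta1, theta2, f = a
        if theta1 <= 0.0 or theta2 <= 0.0 or not (0.0 <= f <= 1.0):
            return -np.inf
        return 0.0  # placeholder: replace with your actual log prior density

    def log_likelihood(self, a):
        theta1, theta2, f = a
        return np.sum(np.log(mixture_pdf(y, theta1, theta2, f)))

    def log_posterior(self, a):
        lnp = self.log_prior(a)
        if not np.isfinite(lnp):
            return -np.inf
        return lnp + self.log_likelihood(a)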
In [ ]:
class AlternativeModel(object):
"""
Parameter inference and model evaluation in a more advanced cluster mis-centering analysis.
"""
In [ ]:
Model2 = AlternativeModel()
In [ ]:
samples2 = Model2.draw_samples_from_posterior()
In [ ]:
visual_check(Model2)
In the tutorial you designed a test statistic $T(y)$ for use in a quantitative posterior predictive model check. Carry out the same check for Model 2, and report the probability $P(T > T(y)|y,H_2)$. Does your new model fit the data better than Model 1 in this regard?
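In case it helps to see the logic spelled out, the sketch below estimates the posterior predictive p-value under illustrative assumptions: that the model can generate a replica dataset from a given posterior sample (via a hypothetical generate_replica_dataset method) and that T is the test statistic function you wrote in the tutorial.
In [ ]:
# Hedged sketch of a posterior predictive check: estimate P(T > T(y) | y, H) by
# generating replica datasets from posterior draws. Method names are assumptions.
def ppp_value(Model, samples, T):
    T_data = T(y)  # test statistic evaluated on the real data
    T_rep = np.array([T(Model.generate_replica_dataset(theta)) for theta in samples])
    return np.mean(T_rep > T_data)  # fraction of replicas exceeding T(y)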
In [ ]:
p2 = posterior_predictive_check(Model2)
In [ ]:
DIC2, pD2 = DIC(Model2)
In [ ]:
logE2 = log_evidence(Model2)
In [ ]:
Use emcee's parallel tempering functionality to calculate the log-evidences for the two models above (see the documentation). Compare the difference in log-evidence with what you found above.
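A minimal sketch of how this could look is below. It assumes emcee v2.x, where PTSampler is built in (in emcee >= 3 this functionality lives in the separate ptemcee package), and that your model class exposes log_likelihood and log_prior methods taking a parameter vector; adapt the names, dimensions, and initialization to your own code, and repeat for Model 1.
In [ ]:
# Hedged sketch of parallel-tempered evidence estimation with emcee v2.x.
# Method names on Model2 and the walker initialization are assumptions.
import emcee

ntemps, nwalkers, ndim = 20, 100, 3   # ndim = 3 for a = (theta1, theta2, f)
sampler = emcee.PTSampler(ntemps, nwalkers, ndim,
                          logl=Model2.log_likelihood, logp=Model2.log_prior)

# Initialize every walker at every temperature (adjust ranges to suit your priors):
p0 = np.random.uniform(0.01, 1.0, size=(ntemps, nwalkers, ndim))

for p, lnprob, lnlike in sampler.sample(p0, iterations=1000):
    pass

# Thermodynamic integration over the temperature ladder gives the log-evidence:
lnZ2, dlnZ2 = sampler.thermodynamic_integration_log_evidence(fburnin=0.1)
print("Model 2: ln E =", lnZ2, "+/-", dlnZ2)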
In [ ]: