REMINDERS

Submitting your homework

Before doing anything,

  1. Make a copy of this notebook in your clone of the private course repository. Edit that copy, not the version in the public repository clone. Please follow the instructions in the private repo README by locating your solutions on a path like phys366_2019/HW_week<N>/<Your Name(s)>/ and note that case matters. Take a minute to look hard at the directory structure and choose your filenames carefully: if your solutions are not in the right place they will not be graded.

  2. Remember that you will submit your solution by pull request to the private repository.

Submitting your project pitch

This week you also need to produce a project pitch. As explained in the "Project Milestones" instructions:

With your team, write up a proto-abstract for your project concept, explaining what you want to learn, what data can help, and roughly how you plan to proceed.

As with your homework solution, push your (team's) pitch to your fork of the private repo in a folder like phys366_2019/Project_pitches/<your name(s)>/ and submit a PR to the private repository. You can write your pitch in markdown, plain text, or jupyter notebook form - all will display reasonably well on the repo.

Sending us feedback

Once you've submitted your solution, don't forget to also fill out the very quick (and required!) weekly feedback form.


Week 4 Homework

Articulating Uncertainty: Credible Regions and Confidence Intervals

Let's work through the example inference described by Jake Vanderplas here (pages 4-6), and get some practice in handling and interpreting both Bayesian credible regions and Frequentist confidence intervals.

Jaynes' Truncated Exponential Model

Let's explore Jaynes' model that Vanderplas relays:

Consider a device that “...will operate without failure for a time θ because of a protective chemical inhibitor injected into it; but at time θ the supply of the chemical is exhausted, and failures then commence, following an exponential failure law [with a characteristic decay time of 1 day]. It is not feasible to observe the depletion of this inhibitor directly; one can observe only the resulting failures. From data on actual failure times, estimate the time θ of guaranteed safe operation...”

Suppose we observe three device failures. We don't know the exhaustion time $\theta$, but we see the three devices fail after $\mathbf{x} = [10, 12, 15]$ days. What is the value of $\theta$?

This is an inference problem: we have some data $\mathbf{x}$ and a model for how it was generated; the task is to infer the exhaustion time $\theta$.

  • The Bayesian solution is to compute the posterior probability distribution for $\theta$ given the data $\mathbf{x}$, and report a credible region for $\theta$.

  • The Frequentist solution is to find an estimator $\hat{\theta}$ for $\theta$ given the data $\mathbf{x}$, and report a confidence interval for that estimator.

Both approaches will use the same model, which dictates a sampling distribution for $\mathbf{x}$, and hence the same likelihood function (which is the sampling distribution written as a function of the model parameter $\theta$). Whether or not we actually use this likelihood function will turn out to be crucial.

The results need to be articulated carefully - in Vanderplas' words:

  • Bayesian credible region: "Given our observed data, there is a 95% probability that the true value of $\theta$ lies within the credible region."

  • Frequentist confidence interval: “If the experiment is repeated many times, in 95% of these cases the computed confidence interval will contain the true value of $\theta$."

Frequentist Approach

Here's the sampling distribution for the $k^{th}$ failure time implied by Jaynes' model:

$P(x_k|\theta) = \exp(\theta - x_k)$ for $x_k > \theta$, and 0 otherwise.

Vanderplas points out that the mean of $x$ over its sampling distribution $P(x|\theta)$ is $\theta + 1$, and hence that an estimator for $\theta$ is $\hat{\theta} = \bar{x} - 1$ where $\bar{x}$ is the sample mean of the data $\mathbf{x}$.

Vanderplas then derives an approximate form for the sampling distribution for this estimator:

"The exponential distribution has a standard deviation of 1, so in the limit of large $N$, we can use the standard error of the mean to show that the sampling distribution of $\hat{\theta}$ will approach normal with variance $\sigma =1/\sqrt{N}$."

Under this Gaussian approximation, the $\pm 2\sigma$ range corresponds to the 95.4% Frequentist confidence interval for the estimator $\hat{\theta}$:

$ CI \approx (\hat{\theta} - 2/\sqrt{N}, \hat{\theta} + 2/\sqrt{N})$

We'll use "95%" interchangeably with "$2\sigma$" below, but note that this is actually bad practice: people generally do mean 95% when they say 95%, and the difference between a 95% range and a 95.4% range can be noticeable, given that we're out in the tails of the distribution.

1. Warm-up: write functions to compute the sample mean estimator and its confidence interval (given above), and calculate them for the given $N=3$ dataset.


It seems problematic that the Frequentist confidence interval calculated above contains only unphysical values of the exhaustion time $\theta$ (values all greater than the minimum observed failure time). Recall that the meaning of the confidence interval is the following, in Vanderplas' words:

“If this experiment is repeated many times, in 95% of these cases the computed confidence interval will contain the true value of $\theta$."

2. Generate 10000 3-value mock datasets using the same true value of $\theta$, and compute the 95% confidence interval for $\theta$ from each one. Show that the 95% confidence interval does contain the true value of $\theta$ about 9500 times. Given this result, what can you say about the particular dataset $\mathbf{x}$ that was given in this problem?


3. Write down the likelihood function $\mathcal{L}(\theta; x) = P(x|\theta)$ and plot it for the data set $\mathbf{x}$ given above. What do you notice about the unphysical values of $\theta$ in this plot? Derive an expression for the maximum likelihood estimator (MLE) for $\theta$.


We saw in class that for a Gaussian likelihood function, the quantity $-2 \log \mathcal{L} = \chi^2$ is expected to be distributed (over datasets) according to a $\chi^2$ distribution, such that the $2\sigma$, 95% confidence interval for the single parameter $\theta$ has edges where $-2 \Delta \log \mathcal{L} = 4$ relative to the maximum likelihood point.

4. Show that the joint likelihood can be written as $\mathcal{L}(\theta; x) = e^{N(\theta-\bar{x})}$, for $\theta \leq \min x$ and 0 for $\theta > \min x$, and derive analytic expressions for the upper and lower edge of the 95% (i.e. $2\sigma$) confidence interval under the assumption that $-2 \log \mathcal{L}(\theta; x) \sim \chi^2$.


5. Write a function to compute a 95% confidence interval for $\theta$ using the criterion $-2 \Delta \log \mathcal{L} = 4$. Does this interval contain the true value of $\theta$ in about 9500 mock datasets out of 10000? Comment on your result.


Bayesian Approach

Assuming a very broad uniform prior PDF for $\theta$, Vanderplas gives the posterior PDF for $\theta$ as:

$P(\theta|\mathbf{x}) = N e^{N(\theta-\min \mathbf{x})}$, for $\theta \leq \min \mathbf{x}$; 0 for $\theta > \min \mathbf{x}$

(i.e. the posterior PDF is simply proportional to the likelihood function in part 4, as expected for a uniform prior).

The 95% maximum density credible region extends from $\theta_1$ to $\theta_2 = \min \mathbf{x}$, where $\theta_1$ is determined by the integral

$\int_{\theta_1}^{\theta_2} P(\theta|\mathbf{x}) d\theta = 0.95$

Vanderplas does the integral for us:

$\theta_1 = \theta_2 + \frac{1}{N} \log \left[ 1 - 0.95 ( 1 - e^{-N \theta_2} ) \right] $

6. Write a function to compute the Bayesian 95% credible region for the parameter $\theta$ and compare the result for the single dataset $x = [10,12,15]$ with the Frequentist confidence intervals from parts 1 and 5 above.


The Bayesian credible region has the following meaning - again articulated by Vanderplas:

"Given our observed data, there is a 95% probability that the true value of $\theta$ lies within the credible region."

To test that our credible region is accurate, we can again do a numerical test involving a large number of datasets, but this time each will need its own $\theta$ value; the 95% credible region for each $\theta$ given its own dataset should contain the true value of that $\theta$ with 95% probability.

7. Simulate 10,000 inferences: for each one, draw a $\theta$ value from the prior PDF, generate a mock 3-value dataset, and compute the 95% credible region for $\theta$. Does the true value of $\theta$ lie within its 95% credible region with 95% probability?


Solution

1. Frequentist Sample Mean Estimator


In [ ]:
import numpy as np
import scipy.stats

x0 = np.array([10., 12., 15.])

In [ ]:
# From class notes: the relationship between the confidence level in a confidence interval, 
# and the number of sigma it corresponds to:

dimensions = 1
level = 0.95
sigma = np.sqrt(scipy.stats.chi2.ppf(level, dimensions))

print("A {:.1f} sigma confidence interval in {:d} dimensions is also a {:.0f}% confidence interval, assuming the estimator is Gaussian distributed.".format(sigma, dimensions, 100*level))

2. Checking the Sample Mean Estimator Confidence Interval


In [ ]:
# Generate 10000 N=3 datasets with the same true value of theta, 
# compute the CI from each dataset, and test it for coverage.

Ns = 10000
theta = 10.0
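
One way to complete this test (a sketch: per the sampling distribution above, each failure time is just $\theta$ plus a unit-rate exponential deviate):

In [ ]:
# Each mock dataset has 3 failure times, x_k = theta + Exp(1):
x_mock = theta + np.random.exponential(scale=1.0, size=(Ns, 3))

# Sample mean estimator and +/- 2/sqrt(N) interval for each dataset:
theta_hat = np.mean(x_mock, axis=1) - 1.0
half_width = 2.0 / np.sqrt(3)

# Coverage: how many intervals contain the true theta?
covered = (theta_hat - half_width <= theta) & (theta <= theta_hat + half_width)
print("CI contains the true theta in {} / {} mock datasets".format(covered.sum(), Ns))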

3. Likelihood function

Derivation:
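
A sketch of the argument: the joint likelihood $\mathcal{L}(\theta; \mathbf{x}) = \prod_k e^{\theta - x_k} = e^{N(\theta - \bar{x})}$ vanishes for $\theta > \min \mathbf{x}$ (a failure cannot occur before the inhibitor runs out), and increases monotonically with $\theta$ below that cutoff. The maximum therefore sits at the cutoff itself,

$\hat{\theta}_{\rm MLE} = \min \mathbf{x}$.

Note that every "unphysical" value $\theta > \min \mathbf{x}$ has exactly zero likelihood.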

Plot:


In [ ]:
import matplotlib.pyplot as plt
%matplotlib inline
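
A minimal plotting sketch, using the joint likelihood form derived in part 4:

In [ ]:
def log_likelihood(theta, x):
    """Joint log likelihood: N*(theta - xbar) for theta <= min(x), -inf otherwise."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    logL = len(x) * (theta - np.mean(x))
    logL[theta > np.min(x)] = -np.inf
    return logL

theta_grid = np.linspace(5.0, 12.0, 500)
plt.plot(theta_grid, np.exp(log_likelihood(theta_grid, x0)))
plt.axvline(np.min(x0), linestyle='--', color='gray')
plt.xlabel(r'$\theta$ (days)')
plt.ylabel(r'$\mathcal{L}(\theta; \mathbf{x})$')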

4. MLE and Confidence Interval: Derivation
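
A sketch, under the stated $\chi^2$ assumption: from part 4, $\log \mathcal{L}(\theta; \mathbf{x}) = N(\theta - \bar{x})$ for $\theta \leq \min \mathbf{x}$, with its maximum at $\hat{\theta}_{\rm MLE} = \min \mathbf{x}$. The criterion $-2 \Delta \log \mathcal{L} = 4$ then gives

$2N(\min \mathbf{x} - \theta) = 4 \quad \Rightarrow \quad \theta = \min \mathbf{x} - 2/N$.

Since the likelihood vanishes above $\min \mathbf{x}$, the interval is one-sided:

$CI = (\min \mathbf{x} - 2/N, \; \min \mathbf{x})$,

which for the given data is $(10 - 2/3, \; 10)$ - entirely physical values, unlike the interval from part 1.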

5. MLE and Confidence Interval: Calculation and Verification


In [ ]:
# Generate 10000 N=3 datasets with the same true value of theta, 
# compute the CI from each dataset, and test it for coverage.

Ns = 10000
theta = 10.0
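
A sketch of the calculation and coverage test, reusing the mock-data recipe from part 2:

In [ ]:
# Each mock dataset has 3 failure times, x_k = theta + Exp(1):
x_mock = theta + np.random.exponential(scale=1.0, size=(Ns, 3))

# MLE and the one-sided interval (min(x) - 2/N, min(x)) from part 4:
theta_mle = np.min(x_mock, axis=1)
lower = theta_mle - 2.0 / 3.0

covered = (lower <= theta) & (theta <= theta_mle)
print("Likelihood-based CI contains the true theta in {} / {} mock datasets".format(covered.sum(), Ns))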

Commentary:

6. Bayesian Inference
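
A minimal sketch, implementing Vanderplas' expression for $\theta_1$ directly (the function name is my own):

In [ ]:
def credible_region(x, level=0.95):
    """Maximum density credible region (theta_1, theta_2), with theta_2 = min(x)."""
    N = len(x)
    theta2 = np.min(x)
    theta1 = theta2 + np.log(1.0 - level * (1.0 - np.exp(-N * theta2))) / N
    return (theta1, theta2)

CR = credible_region(x0)
print("95% credible region: ({:.2f}, {:.2f})".format(*CR))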

7. Bayesian Credible Region Verification


In [ ]:
# Draw 10000 values of theta from the prior uniform range, 
# and generate an N=3 dataset for each one.
# Then compute the CR from each dataset and test it for coverage.
# Here I've assumed that an upper limit of 30 puts us in the regime of a "broad, uniform prior" assumed above.

Ns = 10000
thetas = 30.0*np.random.random(Ns)
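
A sketch of the coverage test, assuming (as in the comment above) a Uniform(0, 30) prior:

In [ ]:
# Each mock dataset has 3 failure times, x_k = theta + Exp(1),
# with a different true theta for each dataset:
x_mock = thetas[:, None] + np.random.exponential(scale=1.0, size=(Ns, 3))

# Credible region edges for each dataset, with N = 3:
theta2 = np.min(x_mock, axis=1)
theta1 = theta2 + np.log(1.0 - 0.95 * (1.0 - np.exp(-3.0 * theta2))) / 3.0

covered = (theta1 <= thetas) & (thetas <= theta2)
print("CR contains its true theta in {} / {} inferences".format(covered.sum(), Ns))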

Commentary: