```
In [1]:
```# Configure Jupyter so figures appear in the notebook
%matplotlib inline
# Configure Jupyter to display the assigned value after an assignment
%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'
import numpy as np
import pandas as pd
# import classes from thinkbayes2
from thinkbayes2 import Pmf, Cdf, Suite, Joint
from thinkbayes2 import MakePoissonPmf, EvalBinomialPmf, MakeMixture
import thinkplot

According to hydrologists, the distribution of total daily rainfall (for days with rain) is well modeled by a two-parameter gamma distribution. There are several ways to parameterize the gamma distribution; we'll use the one with a shape parameter, $k$, and a scale parameter, $\theta$, described by this PDF:

$ \mathrm{pdf}(x; k, \theta) = \frac{1}{\Gamma(k)\theta^k} x^{k-1} \exp(-x/\theta) $

where $\Gamma$ is the gamma function.

Evaluate this PDF for $x=2$, $k=3$, and $\theta=2$ using NumPy functions and

`scipy.special.gamma`

.Evaluate this PDF using

`scipy.stats.gamma`

, and confirm that you get the same answer.During the last three days in the Boston area, we have measured the following rainfalls in inches: 0.78, 0.87, 0.64.

Use this data to compute a joint posterior distributions for the parameters of the gamma distribution.

You can use the following priors:

For $k$, Half-normal with $\sigma = 0.5$.

For $\theta$, Half-normal with $\sigma = 4$.

What are the posterior means for $k$ and $\theta$?

Generate a predictive distribution for the amount of rain we will get tomorrow (assuming that it rains at all). What is the predictive mean?

```
In [2]:
```from scipy.special import gamma
def gamma_pdf(x, k, theta):
return x**(k-1) * np.exp(-x/theta) / gamma(k) / theta**k

`scipy.stats`

, translating from the $k$, $\theta$ parameterization to SciPy's inhumane parameterization.

```
In [3]:
```from scipy import stats
def gamma_pdf2(x, k, theta):
return stats.gamma(k, scale=theta).pdf(x)

Evaluting the PDF at a test location...

```
In [4]:
```x = 2
k = 3
theta = 2
gamma_pdf(x, k, theta)

```
Out[4]:
```

And comparing to the results from SciPy

```
In [5]:
```gamma_pdf2(x, k, theta)

```
Out[5]:
```

Now here's the `Suite`

we'll use to estimate parameters from data.

```
In [6]:
```class Rainfall(Suite, Joint):
def Likelihood(self, data, hypo):
"""
data: observed rainfall
hypo: k, theta
"""
k, theta = hypo
x = data
like = gamma_pdf(x, k, theta)
return like

For the priors, we'll use a HalfNormal for `k`

```
In [7]:
```from scipy.stats import norm
ks = np.linspace(0.01, 2, 101)
ps = norm(0, 0.5).pdf(ks)
pmf_k = Pmf(dict(zip(ks, ps)));

And a HalfNormal for `theta`

```
In [8]:
```thetas = np.linspace(0.01, 12, 101)
ps = norm(0, 4).pdf(thetas)
pmf_theta = Pmf(dict(zip(thetas, ps)));

Now we can initialize the suite.

```
In [9]:
```from thinkbayes2 import MakeJoint
suite = Rainfall(MakeJoint(pmf_k, pmf_theta));

And update it.

```
In [10]:
```data = [0.78, 0.87, 0.64]
%time suite.UpdateSet(data)

```
Out[10]:
```

To my surprise, the simple implementation of the PDF using NumPy functions is substantially faster than the SciPy implementation, which evaluates the log-PDF and then exponentiates it.

If there's a good reason for that, it's probably because the numerical behavior is better, but the performance hit is big.

Anyway, here's the posterior marginal for `k`

:

```
In [11]:
```post_k = suite.Marginal(0)
print(post_k.Mean())
thinkplot.Pdf(post_k)
thinkplot.decorate(xlabel='k', ylabel='PDF')

```
```

And here's the posterior marginal for `theta`

```
In [12]:
```post_theta = suite.Marginal(1)
print(post_theta.Mean())
thinkplot.Pdf(post_theta)
thinkplot.decorate(xlabel='theta', ylabel='PDF')

```
```

To make the predictive distribution, we'll need to make PMF approximations to gamma distributions.

```
In [13]:
```def make_gamma_pmf(xs, k, theta):
ps = gamma_pdf(xs, k, theta)
return Pmf(dict(zip(xs, ps)))

Here's a test case.

```
In [14]:
```xs = np.linspace(0, 20)
pmf = make_gamma_pmf(xs, 3, 2)
thinkplot.Pdf(pmf)

```
```

```
In [27]:
```xs = np.linspace(0.001, 30, 1001)
metapmf = Pmf()
for (k, theta), p in suite.Items():
pmf = make_gamma_pmf(xs, k, theta)
metapmf[pmf] = p

```
In [28]:
```pred_pmf = MakeMixture(metapmf)
print(pred_pmf.Mean())
thinkplot.Pdf(pred_pmf)

```
```

```
In [17]:
```from warnings import simplefilter
simplefilter('ignore', FutureWarning)
import pymc3 as pm

Here's the model in three lines. The only trick part is translating to yet another parameterization.

```
In [18]:
```model = pm.Model()
with model:
k = pm.HalfNormal('k', 0.5)
theta = pm.HalfNormal('theta', 4)
rain = pm.Gamma('rain', alpha=k, beta=1/theta, observed=data)

Sampling worked well enough with the default parameters.

```
In [19]:
```with model:
trace = pm.sample()

```
```

Here are the posterior distributions.

```
In [20]:
```pm.traceplot(trace);

```
```

```
In [21]:
```pm.plot_posterior(trace);

```
```

Here are the posterior means.

```
In [22]:
```trace['k'].mean()

```
Out[22]:
```

```
In [23]:
```trace['theta'].mean()

```
Out[23]:
```

```
In [24]:
```with model:
pred = pm.sample_ppc(trace)

```
```

And the posterior predictive mean.

```
In [25]:
```pred['rain'].mean()

```
Out[25]:
```

Comparing the results from MCMC and the grid algorithm

```
In [26]:
```cdf = Cdf(pred['rain'].flatten())
thinkplot.Cdf(cdf, label='MCMC')
thinkplot.Cdf(pred_pmf.MakeCdf(), label='Grid')
thinkplot.decorate(xlabel='Predicted rainfall',
ylabel='CDF')

```
```

```
In [ ]:
```