Bayesian Statistics (Unfinished)

In this notebook, we discuss Bayesian statistics from a computational/practical point of view. We borrow some code examples from chapter 2 of Bayesian Analysis with Python (2nd edn) by Osvaldo Martin.


In [1]:
%matplotlib inline
import sklearn
import scipy.stats as stats
import scipy.optimize
import matplotlib.pyplot as plt
import seaborn as sns
import time
import numpy as np
import os
import pandas as pd



In [2]:
# We install various packages for approximate Bayesian inference
# To avoid installing packages from the internet every time you open a colab,
# you can use this trick:
# https://stackoverflow.com/questions/55253498/how-do-i-install-a-library-permanently-in-colab

# The PyMC3 package (https://docs.pymc.io) supports HMC and variational inference
# https://docs.pymc.io/notebooks/api_quickstart.html
!pip install pymc3==3.8
import pymc3 as pm
pm.__version__

# The arviz package (https://github.com/arviz-devs/arviz) can be used to make various plots
# of posterior samples generated by any algorithm. 
!pip install arviz
import arviz as az


Successfully installed arviz-0.7.0 cftime-1.1.3 netcdf4-1.5.3 pymc3-3.8

Beta-Binomial model

Exact inference
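
For a Beta prior and a Bernoulli/Binomial likelihood, the posterior is available in closed form: if $\theta \sim \text{Beta}(\alpha, \beta)$ and we observe $y$ successes in $N$ trials, then

$p(\theta | y, N) = \text{Beta}(\theta | \alpha + y, \beta + N - y)$

The cells below plot the Binomial likelihood, the Beta prior, and the resulting posterior for various settings.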


In [0]:
# Plot the Binomial likelihood

n_params = [1, 2, 4]  # Number of trials
p_params = [0.25, 0.5, 0.75]  # Probability of success

x = np.arange(0, max(n_params)+1)
f,ax = plt.subplots(len(n_params), len(p_params), sharex=True, sharey=True,
                    figsize=(8, 7), constrained_layout=True)

for i in range(len(n_params)):
    for j in range(len(p_params)):
        n = n_params[i]
        p = p_params[j]

        y = stats.binom(n=n, p=p).pmf(x)

        ax[i,j].vlines(x, 0, y, colors='C0', lw=5)
        ax[i,j].set_ylim(0, 1)
        ax[i,j].plot(0, 0, label="N = {:3.2f}\nθ = {:3.2f}".format(n,p), alpha=0)
        ax[i,j].legend()

        ax[2,1].set_xlabel('y')
        ax[1,0].set_ylabel('p(y | θ, N)')
        ax[0,0].set_xticks(x)



In [0]:
# Plot the beta prior

params = [0.5, 1, 2, 3]
x = np.linspace(0, 1, 100)
f, ax = plt.subplots(len(params), len(params), sharex=True, sharey=True,
                     figsize=(8, 7), constrained_layout=True)
for i in range(4):
    for j in range(4):
        a = params[i]
        b = params[j]
        y = stats.beta(a, b).pdf(x)
        ax[i,j].plot(x, y)
        ax[i,j].plot(0, 0, label="α = {:2.1f}\nβ = {:2.1f}".format(a, b), alpha=0)
        ax[i,j].legend()
ax[1,0].set_yticks([])
ax[1,0].set_xticks([0, 0.5, 1])
f.text(0.5, 0.05, 'θ', ha='center')
f.text(0.07, 0.5, 'p(θ)', va='center', rotation=0)



In [0]:
# Compute and plot posterior (black vertical line = true parameter value)

plt.figure(figsize=(10, 8))

n_trials = [0, 1, 2, 3, 4, 8, 16, 32, 50, 150]
data = [0, 1, 1, 1, 1, 4, 6, 9, 13, 48]
theta_real = 0.35

beta_params = [(1, 1), (20, 20), (1, 4)]
dist = stats.beta
x = np.linspace(0, 1, 200)

for idx, N in enumerate(n_trials):
    if idx == 0:
        plt.subplot(4, 3, 2)
        plt.xlabel('θ')
    else:
        plt.subplot(4, 3, idx+3)
        plt.xticks([])
    y = data[idx]
    for (a_prior, b_prior) in beta_params:
        p_theta_given_y = dist.pdf(x, a_prior + y, b_prior + N - y)
        plt.fill_between(x, 0, p_theta_given_y, alpha=0.7)

    plt.axvline(theta_real, ymax=0.3, color='k')
    plt.plot(0, 0, label=f'{N:4d} trials\n{y:4d} heads', alpha=0)
    plt.xlim(0, 1)
    plt.ylim(0, 12)
    plt.legend()
    plt.yticks([])
plt.tight_layout()


Credible intervals


In [3]:
# We illustrate how to compute a 95% posterior credible interval for a random variable
# with a beta distribution.

from scipy.stats import beta

np.random.seed(42)
theta_real = 0.35
ntrials = 100
data = stats.bernoulli.rvs(p=theta_real, size=ntrials)

N = ntrials; N1 = sum(data); N0 = N-N1; # Sufficient statistics
aprior = 1; bprior = 1; # prior
apost = aprior + N1; bpost = bprior + N0 # posterior

# Interval function
alpha = 0.05
CI1 = beta.interval(1-alpha, apost, bpost)
print('{:0.2f}--{:0.2f}'.format(CI1[0], CI1[1])) # 0.24--0.42

# Use the inverse CDF (percent point function)
l  = beta.ppf(alpha/2, apost, bpost)
u  = beta.ppf(1-alpha/2, apost, bpost)
CI2 = (l,u)
print('{:0.2f}--{:0.2f}'.format(CI2[0], CI2[1])) # 0.24--0.42

# Use Monte Carlo sampling
samples = beta.rvs(apost, bpost, size=10000)
samples = np.sort(samples)
CI3 = np.percentile(samples, 100*np.array([alpha/2, 1-alpha/2])) 
print('{:0.2f}--{:0.2f}'.format(CI3[0], CI3[1])) # 0.24--0.42 (Monte Carlo approximation)
print(np.mean(samples))


0.24--0.42
0.24--0.42
0.24--0.42
0.3230834644012448

In [4]:
# Use arviz to plot the posterior.
# By default, it shows the 94% interval, but we change it to 95%.

az.plot_posterior({'θ':samples}, credible_interval=0.95);



In [5]:
# See if the parameter is inside the region of practical equivalence centered at 0.5
az.plot_posterior(samples, rope=[0.45, .55]);



In [6]:
# From the above plot, we see that the HPD does not overlap the ROPE,
# so we can confidently say the parameter is "significantly different" from 0.5

# We can also verify this by checking if 0.5 is in the HPD
az.plot_posterior(samples,  credible_interval=0.95, ref_val=0.5);



In [19]:
# Summarize posterior samples
az.summary(samples)
# We can ignore the warning about not having enough 'chains',
# since we are drawing exact iid samples from the posterior.


arviz.stats.stats_utils - WARNING - Shape validation failed: input_shape: (1, 10000), minimum_shape: (chains=2, draws=4)
Out[19]:
mean sd hpd_3% hpd_97% mcse_mean mcse_sd ess_mean ess_sd ess_bulk ess_tail r_hat
x 0.323 0.046 0.239 0.41 0.041 0.04 1.0 1.0 1.0 10.0 NaN

Point estimates

We minimize the posterior expected loss, using L2 loss (estimator is posterior mean) or L1 loss (estimator is posterior median).


In [0]:
grid = np.linspace(0, 1, 200)
θ_pos = samples #trace['θ']
lossf_a = [np.mean(abs(i - θ_pos)) for i in grid]
lossf_b = [np.mean((i - θ_pos)**2) for i in grid]

for lossf, c in zip([lossf_a, lossf_b], ['C0', 'C1']):
    mini = np.argmin(lossf)
    plt.plot(grid, lossf, c)
    plt.plot(grid[mini], lossf[mini], 'o', color=c)
    plt.annotate('{:.3f}'.format(grid[mini]),
                 (grid[mini], lossf[mini] + 0.03), color=c)
    plt.yticks([])
    plt.xlabel(r'$\hat \theta$')
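
As a quick sanity check (a minimal sketch, assuming the `samples` array from the credible-interval cell above), the two grid minimizers should roughly match the posterior median and mean computed directly from the samples:

In [0]:
# The posterior median minimizes the expected L1 loss,
# and the posterior mean minimizes the expected L2 loss.
print('posterior median (minimizes L1 loss): {:.3f}'.format(np.median(samples)))
print('posterior mean   (minimizes L2 loss): {:.3f}'.format(np.mean(samples)))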


MCMC inference

We will use pymc3 to approximate the posterior of this simple model. Code is based on this notebook.


In [7]:
data # same as above


Out[7]:
array([0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1,
       0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
       0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0,
       1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0])

In [9]:
with pm.Model() as model:
    # a priori
    θ = pm.Beta('θ', alpha=aprior, beta=bprior)
    # likelihood
    y = pm.Bernoulli('y', p=θ, observed=data)

pm.model_to_graphviz(model) # show the DAG


Out[9]:
[Graphviz DAG: θ ~ Beta → y ~ Bernoulli, plate over the 100 observations]

In [10]:
# run MCMC (defaults to 2 chains)
with model:
    trace = pm.sample(1000, random_seed=123)


Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (2 chains in 1 job)
NUTS: [θ]
Sampling chain 0, 0 divergences: 100%|██████████| 1500/1500 [00:01<00:00, 1155.94it/s]
Sampling chain 1, 0 divergences: 100%|██████████| 1500/1500 [00:00<00:00, 1877.55it/s]

In [13]:
# Standard MCMC diagnostics
pm.traceplot(trace);
Rhat = pm.rhat(trace);
print(Rhat) # should be close to 1.0


<xarray.Dataset>
Dimensions:  ()
Data variables:
    θ        float64 1.006

In [14]:
pm.plot_posterior(trace);
# The samples from MCMC (called "trace") should be similar to the exact
# iid samples from the posterior, plotted above
# Under the hood, pm.plot_posterior(trace) calls az.plot_posterior(trace)



In [21]:
# Summarize posterior samples
pm.summary(trace)


Out[21]:
mean sd hpd_3% hpd_97% mcse_mean mcse_sd ess_mean ess_sd ess_bulk ess_tail r_hat
θ 0.324 0.045 0.237 0.407 0.002 0.001 867.0 867.0 851.0 1292.0 1.01

In [20]:
# Convert the posterior samples into an Empirical approximation object
trace_approx = pm.Empirical(trace, model=model)
# Now plot samples from this distribution
az.plot_posterior(trace_approx.sample(1000));


Variational inference

We use automatic differentiation variational inference (ADVI). Details can be found at https://docs.pymc.io/notebooks/variational_api_quickstart.html


In [25]:
niter = 10000
with model:
    post = pm.fit(niter, method='advi'); # mean field approximation


Average Loss = 64.881: 100%|██████████| 10000/10000 [00:03<00:00, 2560.06it/s]
Finished [100%]: Average Loss = 64.882

In [26]:
# Plot negative ELBO vs iteration to assess convergence
plt.plot(post.hist);



In [27]:
pm.plot_posterior(post.sample(1000));
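
PyMC3's `pm.fit` also supports other approximating families. As a minimal sketch (not part of the original notebook; for this one-parameter model it gives essentially the same answer as mean-field ADVI), a full-rank Gaussian approximation can be fit as follows:

In [0]:
# Full-rank ADVI fits a Gaussian with a full covariance matrix,
# instead of the factorized (mean-field) Gaussian used by method='advi'.
with model:
    post_fr = pm.fit(niter, method='fullrank_advi')
pm.plot_posterior(post_fr.sample(1000));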


1d Gaussian

Exact inference

For simplicity, we assume the variance is known, and we just want to infer the mean.
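
With a Gaussian prior $\mu \sim \mathcal{N}(m_0, \tau_0^2)$ and likelihood $x_i \sim \mathcal{N}(\mu, \sigma^2)$ with $\sigma^2$ known, Bayes' rule for Gaussians gives the closed-form posterior

$p(\mu|D) = \mathcal{N}(\mu | m_N, \tau_N^2), \quad \frac{1}{\tau_N^2} = \frac{1}{\tau_0^2} + \frac{N}{\sigma^2}, \quad m_N = \tau_N^2 \left(\frac{m_0}{\tau_0^2} + \frac{N \bar{x}}{\sigma^2}\right)$

which is exactly what the code below computes.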


In [36]:
np.random.seed(0)
N = 100
x = np.random.randn(N)

# Parameters of prior
mu_prior = 1.1
sigma_prior = 1.2 #std
Sigma_prior = sigma_prior**2 #var

# Parameters of likelihood
sigma_x = 1.3
Sigma_x = sigma_x**2

# Bayes rule for Gaussians 
Sigma_post = 1/( 1/Sigma_prior + N/Sigma_x )
xbar = np.mean(x)
mu_post = Sigma_post * (1/Sigma_x * N * xbar + 1/Sigma_prior * mu_prior);
print('p(mu|D)=N(mu|{:.3f}, {:.3f})'.format(mu_post, Sigma_post))


p(mu|D)=N(mu|0.072, 0.017)

MCMC inference

Initially we assume the variance is known, so we can compare the results to the exact inference above.


In [31]:
with pm.Model() as model:
    mu = pm.Normal('mu', mu=mu_prior, sd=sigma_prior)
    obs = pm.Normal('obs', mu=mu, sd=sigma_x, observed=x)
    mcmc_samples = pm.sample(1000, tune=500) # mcmc
    
pm.plot_posterior(mcmc_samples);


Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (2 chains in 1 job)
NUTS: [mu]
Sampling chain 0, 0 divergences: 100%|██████████| 1500/1500 [00:00<00:00, 1881.55it/s]
Sampling chain 1, 0 divergences: 100%|██████████| 1500/1500 [00:00<00:00, 1957.38it/s]
The acceptance probability does not match the target. It is 0.8864843167489614, but should be close to 0.8. Try to increase the number of tuning steps.

In [37]:
vals = mcmc_samples.get_values('mu')
mu_post_mcmc = np.mean(vals)
Sigma_post_mcmc = np.var(vals)
print('pMCMC(mu|D)=N(mu|{:.3f}, {:.3f})'.format(mu_post_mcmc, Sigma_post_mcmc))
assert np.isclose(mu_post, mu_post_mcmc, atol=1e-1)
assert np.isclose(Sigma_post, Sigma_post_mcmc, atol=1e-1)


pMCMC(mu|D)=N(mu|0.069, 0.016)

In [0]:
# We can also evaluate the log joint at any given point in parameter space.
# The 'obs' variable is already observed (value=x)
# so the only unknown is mu. Let's clamp it to some value
# and compute log p(mu, D)
mu_clamped = -0.5    
logp = model.logp({'mu': mu_clamped})

# Compute the log joint "by hand"
log_prior = scipy.stats.norm(mu_prior, sigma_prior).logpdf(mu_clamped)
log_lik  = np.sum(scipy.stats.norm(mu_clamped, sigma_x).logpdf(x))
log_joint = log_prior + log_lik

assert np.isclose(logp, log_joint)

Now we consider the case where the mean and variance are both unknown. We also switch to a "real world" dataset, of "chemical shifts", that has a couple of "outliers".


In [39]:
#url = 'https://github.com/aloctavodia/BAP/blob/master/code/data/chemical_shifts.csv'
url = 'https://raw.githubusercontent.com/aloctavodia/BAP/master/code/data/chemical_shifts.csv'

df = pd.read_csv(url)
# b=df.iloc[:,1:].values
#data = df.to_numpy() 
data = df.iloc[:,0].values
print(data.shape)
print(data)


(47,)
[55.12 53.73 50.24 52.05 56.4  48.45 52.34 55.65 51.49 51.86 63.43 53.
 56.09 51.93 52.31 52.33 57.48 57.44 55.14 53.93 54.62 56.09 68.58 51.36
 55.47 50.73 51.94 54.95 50.39 52.91 51.5  52.68 47.72 49.73 51.82 54.99
 52.84 53.19 54.52 51.46 53.73 51.61 49.81 52.42 54.3  53.84 53.16]

In [40]:
az.plot_kde(data, rug=True)
plt.yticks([0], alpha=0)



In [41]:
# We will infer a posterior for the mean and standard deviation.
# We use a uniform prior for the mean, with support slightly wider than the range of the data.
# We use a half-normal prior for the standard deviation; its scale of 10 comfortably covers 3*std of the data.
r = np.max(data)-np.min(data)
min_mu = np.min(data)-0.1*r
max_mu = np.max(data)+0.1*r
prior_std = 3*np.std(data)
print([min_mu, max_mu, prior_std])

with pm.Model() as model_g:
    μ = pm.Uniform('μ', lower=min_mu, upper=max_mu)
    σ = pm.HalfNormal('σ', sd=10)
    y = pm.Normal('y', mu=μ, sd=σ, observed=data)

pm.model_to_graphviz(model_g) # show the DAG


[45.634, 70.666, 10.312458871064944]
Out[41]:
[Graphviz DAG: μ ~ Uniform, σ ~ HalfNormal → y ~ Normal, plate over the 47 observations]

In [42]:
with model_g:
    trace_g = pm.sample(1000, random_seed=123)


Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (2 chains in 1 job)
NUTS: [σ, μ]
Sampling chain 0, 0 divergences: 100%|██████████| 1500/1500 [00:01<00:00, 887.75it/s]
Sampling chain 1, 0 divergences: 100%|██████████| 1500/1500 [00:01<00:00, 1474.21it/s]

In [43]:
az.plot_trace(trace_g);



In [44]:
az.summary(trace_g)


Out[44]:
mean sd hpd_3% hpd_97% mcse_mean mcse_sd ess_mean ess_sd ess_bulk ess_tail r_hat
μ 53.544 0.526 52.549 54.486 0.012 0.008 1929.0 1929.0 1929.0 1423.0 1.0
σ 3.566 0.387 2.835 4.266 0.010 0.008 1366.0 1297.0 1438.0 1157.0 1.0

In [46]:
az.plot_joint(trace_g, kind='kde', fill_last=False);


Posterior predictive checks


In [48]:
# We check how well the Gaussian assumption fits our data
# by sampling from the fitted model, and plotting the samples
# and the original data.
# For details, see https://docs.pymc.io/notebooks/posterior_predictive.html

# For the Gaussian model, the posterior predictive samples have a mean pulled slightly
# to the right of the bulk of the data and a larger spread, because the outliers inflate
# the estimates; this indicates a poor fit.
y_pred_g = pm.sample_posterior_predictive(trace_g, 100, model_g)

print(y_pred_g.keys())
v=y_pred_g['y']
print(type(v))
print(v.shape) # 100 x 47


/usr/local/lib/python3.6/dist-packages/pymc3/sampling.py:1247: UserWarning: samples parameter is smaller than nchains times ndraws, some draws and/or chains may not be represented in the returned posterior predictive sample
  "samples parameter is smaller than nchains times ndraws, some draws "
100%|██████████| 100/100 [00:00<00:00, 1094.21it/s]
dict_keys(['y'])
<class 'numpy.ndarray'>
(100, 47)
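
As a quick numeric check on the comment above (a minimal sketch, assuming `data` and `y_pred_g` from the cells above):

In [0]:
# Compare the mean/std of the observed data with the mean/std of the
# posterior predictive samples, pooled over all draws.
print('data:                 mean {:.2f}, std {:.2f}'.format(np.mean(data), np.std(data)))
print('posterior predictive: mean {:.2f}, std {:.2f}'.format(
    np.mean(y_pred_g['y']), np.std(y_pred_g['y'])))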


In [49]:
data_ppc = az.from_pymc3(trace=trace_g, posterior_predictive=y_pred_g)
ax = az.plot_ppc(data_ppc, figsize=(12, 6), mean=False);
ax[0].legend(fontsize=15)


arviz.data.io_pymc3 - WARNING - posterior predictive variable y's shape not compatible with number of chains and draws. This can mean that some draws or even whole chains are not represented.

Robust likelihood (1d Student distribution)


In [35]:
# We replace the above Gaussian likelihood with a Student t distribution.
# The degrees-of-freedom parameter \nu > 0 (also called the "normality parameter")
# determines how close to Normal the distribution is:
# nu=1 corresponds to a Cauchy, and nu >> 10 is close to a Gaussian.
# We put an Exponential prior on nu, with a mean of 30.

with pm.Model() as model_t:
    μ = pm.Uniform('μ', 40, 75)
    σ = pm.HalfNormal('σ', sd=10)
    ν = pm.Exponential('ν', 1/30) # PyMC3 uses inverse of the mean
    y = pm.StudentT('y', mu=μ, sd=σ, nu=ν, observed=data)

pm.model_to_graphviz(model_t) # show the DAG

with model_t:
    trace_t = pm.sample(1000)

az.plot_trace(trace_t)

az.summary(trace_t)
# We see that E[nu]=4.5, which is fairly far from Gaussian.
# We see E[sigma]=2.2, which is less than the 3.6 estimate from the Gaussian likelihood.


Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (2 chains in 1 job)
NUTS: [ν, σ, μ]
Sampling chain 0, 0 divergences: 100%|██████████| 1500/1500 [00:01<00:00, 850.29it/s]
Sampling chain 1, 0 divergences: 100%|██████████| 1500/1500 [00:01<00:00, 857.23it/s]
Out[35]:
mean sd hpd_3% hpd_97% mcse_mean mcse_sd ess_mean ess_sd ess_bulk ess_tail r_hat
μ 53.077 0.388 52.340 53.774 0.010 0.007 1365.0 1365.0 1366.0 867.0 1.0
σ 2.188 0.403 1.507 2.977 0.012 0.009 1043.0 1043.0 1035.0 1275.0 1.0
ν 4.500 3.510 1.270 9.556 0.115 0.081 933.0 933.0 1095.0 1125.0 1.0

In [36]:
# posterior predictive check

y_ppc_t = pm.sample_posterior_predictive(
    trace_t, 100, model_t, random_seed=123)
y_pred_t = az.from_pymc3(trace=trace_t, posterior_predictive=y_ppc_t)
ax = az.plot_ppc(y_pred_t, figsize=(12, 6), mean=False)
ax[0].legend(fontsize=15)
plt.xlim(40, 70)


/usr/local/lib/python3.6/dist-packages/pymc3/sampling.py:1247: UserWarning: samples parameter is smaller than nchains times ndraws, some draws and/or chains may not be represented in the returned posterior predictive sample
  "samples parameter is smaller than nchains times ndraws, some draws "
100%|██████████| 100/100 [00:00<00:00, 1055.67it/s]
arviz.data.io_pymc3 - WARNING - posterior predictive variable y's shape not compatible with number of chains and draws. This can mean that some draws or even whole chains are not represented.

In [38]:
# Remove outliers from data and compare empirical mean and variance of cleaned data
# to posterior mean and posterior scale of a Student likelihood

# https://gist.github.com/vishalkuo/f4aec300cf6252ed28d3
def removeOutliers(x, outlierConstant=1.5):
    a = np.array(x)
    upper_quartile = np.percentile(a, 75)
    lower_quartile = np.percentile(a, 25)
    IQR = (upper_quartile - lower_quartile) * outlierConstant
    quartileSet = (lower_quartile - IQR, upper_quartile + IQR)
    result = a[np.where((a >= quartileSet[0]) & (a <= quartileSet[1]))]
    return result.tolist()

data_clean = removeOutliers(data)
mu_mcmc = np.mean(trace_t.get_values('μ'))
sigma_mcmc = np.mean(trace_t.get_values('σ'))

print([np.mean(data), np.mean(data_clean), mu_mcmc])
print([np.std(data), np.std(data_clean), sigma_mcmc])


[53.548297872340434, 52.994666666666674]
[3.4374862903549817, 2.200877198856048]

Comparing means of different datasets

We often want to know whether one dataset, $D_i$, has a "statistically significant" difference in one of its parameters, such as its mean $\mu_i$, compared to some other dataset, $D_j$. We can answer this in a Bayesian way by computing $p(\delta_{ij}|D_i,D_j)$, where

$\delta_{ij}=\mu_i-\mu_j$.

To do this, we draw samples from $p(\mu_i|D_i)$ and $p(\mu_j|D_j)$.

Since the magnitude of $\delta_{ij}$ can be hard to interpret, it is common to divide it by the pooled standard deviation, to get a metric known as Cohen's d (see this website for details):

$d_{ij} = \frac{\mu_i - \mu_j}{\sqrt{\frac{\sigma_i^2 + \sigma_j^2}{2}}}$

We can compute $p(d_{ij}|D_i,D_j)$ using posterior samples of $\mu_i,\mu_j,\sigma_i,\sigma_j$.
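
As a small self-contained sketch of this computation (using hypothetical posterior samples, purely to show the mechanics; the real comparison on the tips data follows below):

In [0]:
# Hypothetical posterior samples for the means and standard deviations of two groups.
mu_i = np.random.normal(3.2, 0.15, size=5000)
mu_j = np.random.normal(2.8, 0.15, size=5000)
sigma_i = np.abs(np.random.normal(1.2, 0.1, size=5000))
sigma_j = np.abs(np.random.normal(1.3, 0.1, size=5000))

delta = mu_i - mu_j                                 # posterior samples of the difference in means
d = delta / np.sqrt((sigma_i**2 + sigma_j**2) / 2)  # posterior samples of Cohen's d
az.plot_posterior(d, ref_val=0);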


In [48]:
# We illustrate this below using the same dataset as used in chap 2 of
# ["Bayesian Analysis with Python (2nd edn)"](https://github.com/aloctavodia/BAP), which records how much waiters made in tips.

url = 'https://raw.githubusercontent.com/aloctavodia/BAP/master/code/data/tips.csv'
df = pd.read_csv(url)
df.tail()


Out[48]:
total_bill tip sex smoker day time size
239 29.03 5.92 Male No Sat Dinner 3
240 27.18 2.00 Female Yes Sat Dinner 2
241 22.67 2.00 Male Yes Sat Dinner 2
242 17.82 1.75 Male No Sat Dinner 2
243 18.78 3.00 Female No Thur Dinner 2

In [46]:
# We look at the effect of day on tip amount.
# We ignore other covariates, such as gender.

sns.violinplot(x='day', y='tip', data=df)



In [51]:
# The data contains 4 groups, corresponding to the days Thur, Fri, Sat and Sun.
x=df['day'].values
print(type(x))
print(x)
print(np.unique(x))


<class 'numpy.ndarray'>
['Sun' 'Sun' 'Sun' 'Sun' 'Sun' 'Sun' 'Sun' 'Sun' 'Sun' 'Sun' 'Sun' 'Sun'
 'Sun' 'Sun' 'Sun' 'Sun' 'Sun' 'Sun' 'Sun' 'Sat' 'Sat' 'Sat' 'Sat' 'Sat'
 'Sat' 'Sat' 'Sat' 'Sat' 'Sat' 'Sat' 'Sat' 'Sat' 'Sat' 'Sat' 'Sat' 'Sat'
 'Sat' 'Sat' 'Sat' 'Sat' 'Sat' 'Sun' 'Sun' 'Sun' 'Sun' 'Sun' 'Sun' 'Sun'
 'Sun' 'Sun' 'Sun' 'Sun' 'Sun' 'Sun' 'Sun' 'Sun' 'Sat' 'Sat' 'Sat' 'Sat'
 'Sat' 'Sat' 'Sat' 'Sat' 'Sat' 'Sat' 'Sat' 'Sat' 'Sat' 'Sat' 'Sat' 'Sat'
 'Sat' 'Sat' 'Sat' 'Sat' 'Sat' 'Thur' 'Thur' 'Thur' 'Thur' 'Thur' 'Thur'
 'Thur' 'Thur' 'Thur' 'Thur' 'Thur' 'Thur' 'Thur' 'Fri' 'Fri' 'Fri' 'Fri'
 'Fri' 'Fri' 'Fri' 'Fri' 'Fri' 'Fri' 'Fri' 'Fri' 'Sat' 'Sat' 'Sat' 'Sat'
 'Sat' 'Sat' 'Sat' 'Sat' 'Sat' 'Sat' 'Sun' 'Sun' 'Sun' 'Sun' 'Sun' 'Thur'
 'Thur' 'Thur' 'Thur' 'Thur' 'Thur' 'Thur' 'Thur' 'Thur' 'Thur' 'Thur'
 'Thur' 'Thur' 'Thur' 'Thur' 'Thur' 'Thur' 'Thur' 'Thur' 'Thur' 'Thur'
 'Thur' 'Thur' 'Thur' 'Thur' 'Thur' 'Thur' 'Thur' 'Thur' 'Thur' 'Thur'
 'Thur' 'Thur' 'Sun' 'Sun' 'Sun' 'Sun' 'Sun' 'Sun' 'Sun' 'Sun' 'Sun' 'Sun'
 'Sun' 'Sun' 'Sun' 'Sun' 'Sun' 'Sun' 'Sun' 'Sun' 'Sat' 'Sat' 'Sat' 'Sat'
 'Sun' 'Sun' 'Sun' 'Sun' 'Sun' 'Sun' 'Sun' 'Sun' 'Sun' 'Sun' 'Sun' 'Sun'
 'Sun' 'Sun' 'Sun' 'Sun' 'Sun' 'Sun' 'Sun' 'Thur' 'Thur' 'Thur' 'Thur'
 'Thur' 'Thur' 'Thur' 'Thur' 'Thur' 'Thur' 'Thur' 'Thur' 'Thur' 'Thur'
 'Thur' 'Sat' 'Sat' 'Sat' 'Sat' 'Sat' 'Sat' 'Sat' 'Sat' 'Sat' 'Sat' 'Sat'
 'Sat' 'Sat' 'Sat' 'Fri' 'Fri' 'Fri' 'Fri' 'Fri' 'Fri' 'Fri' 'Sat' 'Sat'
 'Sat' 'Sat' 'Sat' 'Sat' 'Sat' 'Sat' 'Sat' 'Sat' 'Sat' 'Sat' 'Sat' 'Sat'
 'Sat' 'Sat' 'Thur']
['Fri' 'Sat' 'Sun' 'Thur']

In [67]:
tip_amount = df['tip'].values
days = ['Thur', 'Fri', 'Sat', 'Sun']
idx = pd.Categorical(df['day'], categories=days).codes
ngroups = len(np.unique(idx))
print(idx)


[3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2
 2 3 3 3 3 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3
 3 3 3 3 3 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1
 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 0]

In [69]:
with pm.Model() as model_cg:
    μ = pm.Normal('μ', mu=0, sd=10, shape=ngroups)
    σ = pm.HalfNormal('σ', sd=10, shape=ngroups)
    y = pm.Normal('y', mu=μ[idx], sd=σ[idx], observed=tip_amount)

print(model_cg.basic_RVs)

with model_cg:
    trace_cg = pm.sample(5000)

az.plot_trace(trace_cg)
az.summary(trace_cg)


Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
[μ, σ_log__, y]
Sequential sampling (2 chains in 1 job)
NUTS: [σ, μ]
Sampling chain 0, 0 divergences: 100%|██████████| 5500/5500 [00:05<00:00, 938.62it/s]
Sampling chain 1, 0 divergences: 100%|██████████| 5500/5500 [00:05<00:00, 919.67it/s]
Out[69]:
mean sd hpd_3% hpd_97% mcse_mean mcse_sd ess_mean ess_sd ess_bulk ess_tail r_hat
μ[0] 2.770 0.162 2.469 3.078 0.001 0.001 12816.0 12699.0 12846.0 7972.0 1.0
μ[1] 2.732 0.250 2.268 3.208 0.003 0.002 8791.0 8791.0 8927.0 6985.0 1.0
μ[2] 2.992 0.178 2.653 3.320 0.002 0.001 12015.0 12015.0 12037.0 7042.0 1.0
μ[3] 3.255 0.144 2.985 3.523 0.001 0.001 12028.0 11919.0 12093.0 7467.0 1.0
σ[0] 1.264 0.116 1.064 1.494 0.001 0.001 11665.0 11170.0 12329.0 7697.0 1.0
σ[1] 1.094 0.198 0.768 1.473 0.002 0.002 8779.0 7947.0 9898.0 6861.0 1.0
σ[2] 1.654 0.126 1.433 1.901 0.001 0.001 11351.0 11129.0 11410.0 7870.0 1.0
σ[3] 1.255 0.105 1.071 1.460 0.001 0.001 10241.0 9897.0 10768.0 7375.0 1.0

In [70]:
# Looking at the posterior mean for the mu_i,
# we see Thur ~ Fri < Sat < Sun.
# But to see if these differences are significant, we should take
# into account the variability. We illustrate this below.
# We see that Thursday and Friday both earn significantly less than Sunday.
# (The threshold of 0 is outside the 95% HPD).
# Other differences are less significant.

fig, ax = plt.subplots(3, 2, figsize=(14, 8), constrained_layout=True)

comparisons = [(i, j) for i in range(ngroups) for j in range(i+1, ngroups)]
pos = [(k, l) for k in range(ngroups-1) for l in (0, 1)] # position of plot

for (i, j), (k, l) in zip(comparisons, pos):
    means_diff = trace_cg['μ'][:, i] - trace_cg['μ'][:, j]
    d_cohen = (means_diff / np.sqrt((trace_cg['σ'][:, i]**2 + trace_cg['σ'][:, j]**2) / 2)).mean()
    az.plot_posterior(means_diff, ref_val=0, ax=ax[k, l],  credible_interval=0.95)
    name_i = days[i]
    name_j = days[j]
    title = 'mean {} - mean {}'.format(name_i, name_j)
    ax[k, l].set_title(title)
    ax[k, l].plot(0, label=f"Cohen's d = {d_cohen:.2f}")
    ax[k, l].legend()



In [0]:

Hierarchical Bayes

Binomial rates (cancer data)

We use the "cancer rates" example from the MLAPP2.0 book. For PyMC3 code for a similar rats" example from BDA3, see here.


In [13]:
#https://github.com/probml/pmtk3/blob/master/demos/cancerRatesEb.m

data_y = np.array([0, 0, 2, 0, 1, 1, 0, 2, 1, 3, 0, 1, 1, 1, 54, 0, 0, 1, 3, 0]);
data_n = np.array([1083, 855, 3461, 657, 1208, 1025, 527, 1668, 583, 582, 917, 857,
    680, 917, 53637, 874, 395, 581, 588, 383]);
N = len(data_n)

# We put a prior on the mean (mu) and precision (kappa) of the Beta distribution,
# instead of directly on the alpha and beta parameters.
with pm.Model() as model_h:
    mu = pm.Beta('mu', 1., 1.)
    kappa = pm.HalfNormal('kappa', 500)
    alpha = pm.Deterministic('alpha', mu*kappa)
    beta = pm.Deterministic('beta', (1.0-mu)*kappa)
    theta = pm.Beta('theta', alpha=alpha, beta=beta, shape=N)
    y = pm.Binomial('y', p=theta, observed=data_y, n=data_n)

np.random.seed(0)
with model_h:
  trace_h = pm.sample(1000, chains=4)


Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (4 chains in 1 job)
NUTS: [theta, kappa, mu]
Sampling chain 0, 0 divergences: 100%|██████████| 1500/1500 [00:03<00:00, 414.72it/s]
Sampling chain 1, 0 divergences: 100%|██████████| 1500/1500 [00:03<00:00, 427.66it/s]
Sampling chain 2, 0 divergences: 100%|██████████| 1500/1500 [00:03<00:00, 388.88it/s]
Sampling chain 3, 0 divergences: 100%|██████████| 1500/1500 [00:04<00:00, 366.06it/s]
The acceptance probability does not match the target. It is 0.8860999355152308, but should be close to 0.8. Try to increase the number of tuning steps.
The acceptance probability does not match the target. It is 0.8922335926247497, but should be close to 0.8. Try to increase the number of tuning steps.
The acceptance probability does not match the target. It is 0.9040355530698296, but should be close to 0.8. Try to increase the number of tuning steps.

In [14]:
az.summary(trace_h)


Out[14]:
mean sd hpd_3% hpd_97% mcse_mean mcse_sd ess_mean ess_sd ess_bulk ess_tail r_hat
mu 0.001 0.000 0.001 0.002 0.000 0.000 2985.0 2948.0 3017.0 3066.0 1.0
kappa 768.204 306.928 249.011 1356.562 7.212 5.100 1811.0 1811.0 1692.0 2178.0 1.0
alpha 1.055 0.419 0.400 1.866 0.013 0.009 1096.0 1096.0 970.0 1622.0 1.0
beta 767.149 306.613 248.607 1354.849 7.201 5.093 1813.0 1813.0 1693.0 2165.0 1.0
theta[0] 0.001 0.001 0.000 0.002 0.000 0.000 4150.0 3848.0 2472.0 1526.0 1.0
theta[1] 0.001 0.001 0.000 0.002 0.000 0.000 4051.0 4006.0 1938.0 1308.0 1.0
theta[2] 0.001 0.000 0.000 0.001 0.000 0.000 4410.0 3838.0 3884.0 2253.0 1.0
theta[3] 0.001 0.001 0.000 0.002 0.000 0.000 4041.0 3797.0 2305.0 1357.0 1.0
theta[4] 0.001 0.001 0.000 0.002 0.000 0.000 4349.0 3764.0 3489.0 2266.0 1.0
theta[5] 0.001 0.001 0.000 0.003 0.000 0.000 4776.0 3974.0 3840.0 1975.0 1.0
theta[6] 0.001 0.001 0.000 0.002 0.000 0.000 4311.0 3737.0 1983.0 1097.0 1.0
theta[7] 0.001 0.001 0.000 0.003 0.000 0.000 5951.0 5036.0 4966.0 2804.0 1.0
theta[8] 0.002 0.001 0.000 0.004 0.000 0.000 5728.0 4474.0 3853.0 2085.0 1.0
theta[9] 0.003 0.002 0.001 0.006 0.000 0.000 4680.0 4357.0 4085.0 2378.0 1.0
theta[10] 0.001 0.001 0.000 0.002 0.000 0.000 5483.0 5037.0 2839.0 1820.0 1.0
theta[11] 0.001 0.001 0.000 0.003 0.000 0.000 5324.0 4342.0 3747.0 1825.0 1.0
theta[12] 0.001 0.001 0.000 0.003 0.000 0.000 5051.0 4094.0 4020.0 2187.0 1.0
theta[13] 0.001 0.001 0.000 0.003 0.000 0.000 5220.0 4370.0 4287.0 2282.0 1.0
theta[14] 0.001 0.000 0.001 0.001 0.000 0.000 5364.0 5178.0 5386.0 3053.0 1.0
theta[15] 0.001 0.001 0.000 0.002 0.000 0.000 4457.0 3987.0 2666.0 1360.0 1.0
theta[16] 0.001 0.001 0.000 0.003 0.000 0.000 4824.0 4265.0 2440.0 1511.0 1.0
theta[17] 0.002 0.001 0.000 0.004 0.000 0.000 4558.0 3619.0 3650.0 1944.0 1.0
theta[18] 0.003 0.002 0.001 0.006 0.000 0.000 4109.0 3643.0 3975.0 2559.0 1.0
theta[19] 0.001 0.001 0.000 0.003 0.000 0.000 4775.0 4391.0 2542.0 1862.0 1.0

In [15]:
az.plot_forest(trace_h, var_names=["theta"], credible_interval=0.95);



In [16]:
az.plot_forest(trace_h, var_names=["theta"], combined=True, credible_interval=0.95);



In [17]:
az.plot_forest(trace_h, var_names=["theta"], combined=True, kind='ridgeplot');


Gaussian means (8 schools data)

This example is from "Bayesian Data Analysis".
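
In the centered parameterization used first, the model is

$\mu_\alpha \sim \mathcal{N}(0, 5^2), \quad \sigma_\alpha \sim \text{HalfCauchy}(5), \quad \alpha_j \sim \mathcal{N}(\mu_\alpha, \sigma_\alpha^2), \quad y_j \sim \mathcal{N}(\alpha_j, \sigma_j^2), \quad j=1,\dots,8$

where the observation noise levels $\sigma_j$ are treated as known.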


In [71]:
# https://github.com/probml/pyprobml/blob/master/scripts/schools8_pymc3.py

# Data of the Eight Schools Model
J = 8
y = np.array([28.,  8., -3.,  7., -1.,  1., 18., 12.])
sigma = np.array([15., 10., 16., 11.,  9., 11., 10., 18.])

names = ['theta {}'.format(t) for t in range(8)]
print(names)

# Plot raw data
fig, ax = plt.subplots()
y_pos = np.arange(8)
ax.errorbar(y,y_pos, xerr=sigma, fmt='o')
ax.set_yticks(y_pos)
ax.set_yticklabels(names)
ax.invert_yaxis()  # labels read top-to-bottom
plt.show()


['theta 0', 'theta 1', 'theta 2', 'theta 3', 'theta 4', 'theta 5', 'theta 6', 'theta 7']

In [72]:
# Centered model
with pm.Model() as Centered_eight:
    mu_alpha = pm.Normal('mu_alpha', mu=0, sigma=5)
    sigma_alpha = pm.HalfCauchy('sigma_alpha', beta=5)
    alpha = pm.Normal('alpha', mu=mu_alpha, sigma=sigma_alpha, shape=J)
    obs = pm.Normal('obs', mu=alpha, sigma=sigma, observed=y)
    
np.random.seed(0)
with Centered_eight:
    trace_centered = pm.sample(1000, chains=4)
    
pm.summary(trace_centered).round(2)
# PyMC3 gives multiple warnings about divergences.
# Also, see r_hat ~ 1.01, ESS << nchains*1000, especially for sigma_alpha
# We can solve these problems below by using a non-centered parameterization.
# In practice, for this model, the results are very similar.


Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (4 chains in 1 job)
NUTS: [alpha, sigma_alpha, mu_alpha]
Sampling chain 0, 6 divergences: 100%|██████████| 1500/1500 [00:03<00:00, 427.03it/s]
Sampling chain 1, 16 divergences: 100%|██████████| 1500/1500 [00:02<00:00, 508.83it/s]
Sampling chain 2, 13 divergences: 100%|██████████| 1500/1500 [00:03<00:00, 461.16it/s]
Sampling chain 3, 19 divergences: 100%|██████████| 1500/1500 [00:03<00:00, 470.60it/s]
There were 6 divergences after tuning. Increase `target_accept` or reparameterize.
There were 22 divergences after tuning. Increase `target_accept` or reparameterize.
There were 35 divergences after tuning. Increase `target_accept` or reparameterize.
There were 54 divergences after tuning. Increase `target_accept` or reparameterize.
The number of effective samples is smaller than 10% for some parameters.
Out[72]:
mean sd hpd_3% hpd_97% mcse_mean mcse_sd ess_mean ess_sd ess_bulk ess_tail r_hat
mu_alpha 4.34 3.27 -2.01 10.25 0.13 0.10 647.0 497.0 644.0 647.0 1.01
alpha[0] 6.62 5.95 -3.81 17.70 0.19 0.14 997.0 966.0 980.0 1534.0 1.01
alpha[1] 5.08 4.95 -4.06 14.67 0.14 0.10 1235.0 1235.0 1143.0 1860.0 1.01
alpha[2] 3.84 5.61 -6.89 14.05 0.16 0.12 1271.0 1200.0 1106.0 1401.0 1.01
alpha[3] 4.75 4.96 -5.07 13.70 0.14 0.10 1334.0 1263.0 1171.0 1929.0 1.01
alpha[4] 3.46 4.78 -5.76 12.42 0.16 0.12 927.0 809.0 848.0 1538.0 1.01
alpha[5] 4.02 4.82 -4.97 13.47 0.14 0.10 1114.0 1080.0 1006.0 1738.0 1.01
alpha[6] 6.63 5.38 -2.60 17.34 0.17 0.12 965.0 965.0 894.0 1541.0 1.01
alpha[7] 4.86 5.64 -5.88 15.72 0.15 0.11 1389.0 1242.0 1200.0 1734.0 1.01
sigma_alpha 4.13 3.20 0.80 9.72 0.16 0.11 400.0 400.0 235.0 147.0 1.02

In [73]:
# Display the total number and percentage of divergent transitions
diverging = trace_centered['diverging']
print('Number of Divergent Transitions: {}'.format(diverging.nonzero()[0].size))
diverging_pct = diverging.nonzero()[0].size / len(trace_centered) * 100
print('Percentage of Divergent Transitions: {:.1f}'.format(diverging_pct))


Number of Divergent Transitions: 54
Percentage of Divergent Transitions: 5.4

In [27]:
# We can see somewhat high autocorrelation of the samples
az.plot_autocorr(trace_centered, var_names=['mu_alpha', 'sigma_alpha']);



In [38]:
az.plot_forest(trace_centered, var_names="alpha", 
               credible_interval=0.95, combined=True);
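
The non-centered parameterization below replaces $\alpha_j \sim \mathcal{N}(\mu_\alpha, \sigma_\alpha^2)$ with $\alpha_j = \mu_\alpha + \sigma_\alpha \tilde{\alpha}_j$, where $\tilde{\alpha}_j \sim \mathcal{N}(0, 1)$. The two models define the same joint distribution, but the latter decouples $\sigma_\alpha$ from the $\tilde{\alpha}_j$ in the prior, which removes the "funnel" geometry that causes the divergences seen in the centered model.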



In [29]:
# Non-centered parameterization

with pm.Model() as NonCentered_eight:
    mu_alpha = pm.Normal('mu_alpha', mu=0, sigma=5)
    sigma_alpha = pm.HalfCauchy('sigma_alpha', beta=5)
    alpha_offset = pm.Normal('alpha_offset', mu=0, sigma=1, shape=J)
    alpha = pm.Deterministic('alpha', mu_alpha + sigma_alpha * alpha_offset)
    #alpha = pm.Normal('alpha', mu=mu_alpha, sigma=sigma_alpha, shape=J)
    obs = pm.Normal('obs', mu=alpha, sigma=sigma, observed=y)
    
np.random.seed(0)
with NonCentered_eight:
    trace_noncentered = pm.sample(1000)
    
pm.summary(trace_noncentered).round(2)
# Samples look good: r_hat = 1, ESS ~= nchains*1000


Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (2 chains in 1 job)
NUTS: [alpha_offset, sigma_alpha, mu_alpha]
Sampling chain 0, 0 divergences: 100%|██████████| 1500/1500 [00:01<00:00, 769.54it/s]
Sampling chain 1, 6 divergences: 100%|██████████| 1500/1500 [00:01<00:00, 790.51it/s]
There were 6 divergences after tuning. Increase `target_accept` or reparameterize.
Out[29]:
mean sd hpd_3% hpd_97% mcse_mean mcse_sd ess_mean ess_sd ess_bulk ess_tail r_hat
mu_alpha 4.35 3.33 -1.60 10.80 0.07 0.06 2109.0 1785.0 2128.0 1133.0 1.0
alpha_offset[0] 0.32 0.97 -1.56 2.02 0.02 0.02 2201.0 823.0 2178.0 1134.0 1.0
alpha_offset[1] 0.10 0.95 -1.80 1.82 0.02 0.02 2368.0 879.0 2360.0 1510.0 1.0
alpha_offset[2] -0.08 1.00 -1.82 1.95 0.02 0.03 2302.0 740.0 2299.0 1198.0 1.0
alpha_offset[3] 0.04 0.95 -1.72 1.86 0.02 0.02 2624.0 850.0 2616.0 1380.0 1.0
alpha_offset[4] -0.17 0.94 -2.01 1.50 0.02 0.02 2348.0 1021.0 2342.0 1520.0 1.0
alpha_offset[5] -0.08 0.99 -2.00 1.68 0.02 0.02 2581.0 856.0 2665.0 1197.0 1.0
alpha_offset[6] 0.38 0.98 -1.50 2.09 0.02 0.02 2256.0 1117.0 2249.0 1480.0 1.0
alpha_offset[7] 0.12 0.97 -1.72 1.86 0.02 0.02 2216.0 836.0 2229.0 1501.0 1.0
sigma_alpha 3.60 3.17 0.02 9.39 0.09 0.06 1280.0 1280.0 1161.0 1066.0 1.0
alpha[0] 6.11 5.56 -3.29 17.28 0.13 0.10 1841.0 1409.0 2077.0 1472.0 1.0
alpha[1] 4.93 4.54 -3.68 13.11 0.10 0.09 1920.0 1381.0 2086.0 1462.0 1.0
alpha[2] 4.03 5.26 -6.54 13.50 0.12 0.10 1899.0 1484.0 2045.0 1266.0 1.0
alpha[3] 4.56 4.84 -5.06 12.98 0.11 0.08 2087.0 1676.0 2273.0 1268.0 1.0
alpha[4] 3.44 4.67 -5.05 11.93 0.10 0.08 2011.0 1719.0 2138.0 1605.0 1.0
alpha[5] 3.96 4.98 -5.31 12.80 0.10 0.08 2280.0 1728.0 2398.0 1630.0 1.0
alpha[6] 6.50 5.29 -2.97 17.01 0.12 0.09 2032.0 1622.0 2177.0 1495.0 1.0
alpha[7] 4.97 5.40 -5.63 14.53 0.13 0.10 1623.0 1463.0 1759.0 1191.0 1.0

In [31]:
az.plot_autocorr(trace_noncentered, var_names=['mu_alpha', 'sigma_alpha']);



In [51]:
az.plot_forest(trace_noncentered, var_names="theta",
               combined=True, credible_interval=0.95);



In [0]:
# Plot the "funnel of hell"
# Based on
# https://github.com/twiecki/WhileMyMCMCGentlySamples/blob/master/content/downloads/notebooks/GLM_hierarchical_non_centered.ipynb

group = 0
x = pd.Series(trace_centered['alpha'][:, group], name=f'alpha {group}')
y = pd.Series(trace_centered['sigma_alpha'], name='sigma_alpha')
sns.jointplot(x, y);
plt.suptitle('centered')

In [0]:
group = 0
x = pd.Series(trace_noncentered['alpha'][:, group], name=f'alpha {group}')
y = pd.Series(trace_noncentered['sigma_alpha'], name='sigma_alpha')
sns.jointplot(x, y);
plt.suptitle('noncentered')

In [40]:
for group in range(J):
  fig, axs = plt.subplots(ncols=2, sharex=True, sharey=True)
  x = pd.Series(trace_centered['alpha'][:, group], name=f'alpha {group}')
  y = pd.Series(trace_centered['sigma_alpha'], name='sigma_alpha')
  axs[0].plot(x, y, '.')
  axs[0].set(title='Centered', ylabel='sigma_alpha', xlabel=f'alpha {group}')
  x = pd.Series(trace_noncentered['alpha'][:, group], name=f'alpha {group}')
  y = pd.Series(trace_noncentered['sigma_alpha'], name='sigma_alpha')
  axs[1].plot(x, y, '.')
  axs[1].set(title='Non-centered', ylabel='sigma_alpha', xlabel=f'alpha {group}')


Linear regression (synthetic 1d data)

This section is based on Bayesian Analysis with Python, ch 3 and this blog post from Thomas Wiecki.
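
All of the models below assume $y_{ij} = \alpha_j + \beta_j x_{ij} + \text{noise}$ for observation $i$ in group $j$, where the noise is Gaussian with scale $\epsilon \sim \text{HalfCauchy}(5)$. In the unpooled model each group gets independent $\mathcal{N}(0, 10^2)$ priors on $\alpha_j$ and $\beta_j$; in the hierarchical (partially pooled) models the group parameters share hyper-priors, $\alpha_j \sim \mathcal{N}(\mu_\alpha, \sigma_\alpha^2)$ and $\beta_j \sim \mathcal{N}(\mu_\beta, \sigma_\beta^2)$, fit below in both centered and non-centered form.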


In [57]:
N = 10 #20
M = 8 # num groups
idx = np.repeat(range(M-1), N) # N samples for groups 0-6
idx = np.append(idx, 7) # 1 sample for the 7th group
np.random.seed(123)

#alpha_real = np.random.normal(2.5, 0.5, size=M)
#beta_real = np.random.beta(6, 1, size=M)
#eps_real = np.random.normal(0, 0.5, size=len(idx))

alpha_real = np.random.normal(2.5, 0.5, size=M)
beta_real = np.random.beta(1, 1, size=M) # slope is closer to 0
eps_real = np.random.normal(0, 0.5, size=len(idx))

print(beta_real)

y_m = np.zeros(len(idx))
x_m = np.random.normal(10, 1, len(idx))
y_m = alpha_real[idx] + beta_real[idx] * x_m + eps_real
x_centered = x_m - x_m.mean()

_, ax = plt.subplots(2, 4, figsize=(10, 5), sharex=True, sharey=True)
ax = np.ravel(ax)
j, k = 0, N
for i in range(M):
    ax[i].scatter(x_m[j:k], y_m[j:k])
    ax[i].set_xlabel(f'x_{i}')
    ax[i].set_ylabel(f'y_{i}', rotation=0, labelpad=15)
    ax[i].set_xlim(6, 15)
    ax[i].set_ylim(1, 18)
    j += N
    k += N
plt.tight_layout()


[0.88022503 0.50983392 0.61314719 0.31763509 0.17516902 0.46602537
 0.57693427 0.84366837]

In [58]:
# Fit a separate model per group

with pm.Model() as unpooled_model:
    α = pm.Normal('α', mu=0, sd=10, shape=M)
    β = pm.Normal('β', mu=0, sd=10, shape=M)
    ϵ = pm.HalfCauchy('ϵ', 5)

    y_pred = pm.Normal('y_pred', mu=α[idx] + β[idx] * x_m,
                         sd=ϵ,  observed=y_m)
    trace_up = pm.sample(1000)

az.summary(trace_up)


Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (2 chains in 1 job)
NUTS: [ϵ, β, α]
Sampling chain 0, 0 divergences: 100%|██████████| 1500/1500 [00:24<00:00, 60.08it/s]
Sampling chain 1, 0 divergences: 100%|██████████| 1500/1500 [00:23<00:00, 63.04it/s]
Out[58]:
mean sd hpd_3% hpd_97% mcse_mean mcse_sd ess_mean ess_sd ess_bulk ess_tail r_hat
α[0] 2.749 1.645 -0.314 5.723 0.036 0.026 2125.0 1993.0 2132.0 1486.0 1.00
α[1] 3.116 2.507 -1.687 7.561 0.051 0.047 2455.0 1395.0 2506.0 1305.0 1.00
α[2] 2.900 2.017 -0.841 6.609 0.052 0.040 1493.0 1289.0 1502.0 1282.0 1.00
α[3] 0.630 3.235 -5.386 6.556 0.065 0.071 2494.0 1052.0 2484.0 1361.0 1.00
α[4] 4.467 1.946 1.228 8.495 0.047 0.034 1686.0 1686.0 1692.0 1369.0 1.00
α[5] 0.315 2.317 -3.947 4.733 0.048 0.054 2315.0 930.0 2340.0 1412.0 1.00
α[6] -2.924 3.008 -7.909 3.450 0.062 0.052 2371.0 1648.0 2380.0 1305.0 1.01
α[7] -0.019 10.236 -18.613 19.346 0.199 0.247 2653.0 859.0 2646.0 1285.0 1.00
β[0] 0.822 0.163 0.524 1.124 0.004 0.003 2131.0 2038.0 2143.0 1416.0 1.00
β[1] 0.523 0.245 0.080 0.990 0.005 0.004 2427.0 2277.0 2478.0 1287.0 1.00
β[2] 0.588 0.199 0.214 0.950 0.005 0.004 1566.0 1478.0 1576.0 1335.0 1.00
β[3] 0.404 0.310 -0.201 0.941 0.006 0.005 2500.0 1876.0 2492.0 1360.0 1.00
β[4] -0.046 0.198 -0.429 0.304 0.005 0.004 1695.0 1224.0 1701.0 1406.0 1.00
β[5] 0.765 0.232 0.321 1.183 0.005 0.004 2331.0 2098.0 2352.0 1458.0 1.00
β[6] 0.991 0.298 0.381 1.498 0.006 0.004 2361.0 2340.0 2379.0 1297.0 1.01
β[7] 1.025 0.896 -0.683 2.646 0.017 0.016 2649.0 1568.0 2645.0 1303.0 1.00
ϵ 0.596 0.059 0.492 0.706 0.001 0.001 2028.0 1988.0 2056.0 1358.0 1.00

In [0]:
def plot_post_pred_samples(trace, nsamples=20):
    _, ax = plt.subplots(2, 4, figsize=(10, 5), sharex=True, sharey=True,
                       constrained_layout=True)
    ax = np.ravel(ax)
    j, k = 0, N
    x_range = np.linspace(x_m.min(), x_m.max(), 10)
    X =  x_range[:, np.newaxis]
    
    for i in range(M):
        ax[i].scatter(x_m[j:k], y_m[j:k])
        ax[i].set_xlabel(f'x_{i}')
        ax[i].set_ylabel(f'y_{i}', labelpad=17, rotation=0)
        alpha_m = trace['α'][:, i].mean()
        beta_m = trace['β'][:, i].mean()
        ax[i].plot(x_range, alpha_m + beta_m * x_range, c='r', lw=3,
                  label=f'y = {alpha_m:.2f} + {beta_m:.2f} * x')
        plt.xlim(x_m.min()-1, x_m.max()+1)
        plt.ylim(y_m.min()-1, y_m.max()+1)
        alpha_samples = trace['α'][:,i]
        beta_samples = trace['β'][:,i]
        ndx = np.random.choice(np.arange(len(alpha_samples)), nsamples)
        alpha_samples_thinned = alpha_samples[ndx]
        beta_samples_thinned = beta_samples[ndx]
        ax[i].plot(x_range, alpha_samples_thinned + beta_samples_thinned * X,
            c='gray', alpha=0.5)
        
        j += N
        k += N

In [60]:
plot_post_pred_samples(trace_up)



In [61]:
# Fit the centered model to the raw data
with pm.Model() as model_centered:
    # hyper-priors
    μ_α = pm.Normal('μ_α', mu=0, sd=10)
    σ_α = pm.HalfNormal('σ_α', 10)
    μ_β = pm.Normal('μ_β', mu=0, sd=10)
    σ_β = pm.HalfNormal('σ_β', sd=10)

    # priors
    α = pm.Normal('α', mu=μ_α, sd=σ_α, shape=M)
    β = pm.Normal('β', mu=μ_β, sd=σ_β, shape=M)
    ϵ = pm.HalfCauchy('ϵ', 5)

    y_pred = pm.Normal('y_pred', mu=α[idx] + β[idx] * x_m,
                         sd=ϵ, observed=y_m)

    trace_centered = pm.sample(1000)

az.summary(trace_centered)
# Lots of warnings about divergence. We will fix this below
# when we switch to non-centered parameterization.


Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (2 chains in 1 job)
NUTS: [ν, ϵ, β, α, σ_β, μ_β, σ_α, μ_α]
Sampling chain 0, 10 divergences: 100%|██████████| 1500/1500 [00:19<00:00, 76.42it/s]
Sampling chain 1, 3 divergences: 100%|██████████| 1500/1500 [00:18<00:00, 83.23it/s]
There were 10 divergences after tuning. Increase `target_accept` or reparameterize.
There were 13 divergences after tuning. Increase `target_accept` or reparameterize.
The estimated number of effective samples is smaller than 200 for some parameters.
Out[61]:
mean sd hpd_3% hpd_97% mcse_mean mcse_sd ess_mean ess_sd ess_bulk ess_tail r_hat
μ_α 2.264 1.044 0.219 4.124 0.050 0.036 442.0 411.0 434.0 700.0 1.0
μ_β 0.557 0.141 0.298 0.813 0.006 0.004 518.0 518.0 516.0 944.0 1.0
α[0] 2.878 1.190 0.882 5.400 0.059 0.042 411.0 411.0 415.0 652.0 1.0
α[1] 2.503 1.342 -0.248 4.953 0.054 0.039 621.0 605.0 599.0 968.0 1.0
α[2] 2.642 1.224 0.401 5.009 0.054 0.039 518.0 495.0 519.0 1105.0 1.0
α[3] 1.539 1.618 -1.732 4.426 0.074 0.052 480.0 480.0 486.0 888.0 1.0
α[4] 2.255 1.356 -0.747 4.543 0.054 0.038 642.0 642.0 646.0 526.0 1.0
α[5] 1.914 1.253 -0.645 4.034 0.057 0.041 482.0 474.0 476.0 871.0 1.0
α[6] 1.541 1.536 -1.291 4.413 0.067 0.047 525.0 525.0 544.0 817.0 1.0
α[7] 2.842 1.757 -0.274 6.420 0.075 0.059 546.0 441.0 565.0 660.0 1.0
β[0] 0.807 0.118 0.580 1.021 0.006 0.004 425.0 425.0 427.0 661.0 1.0
β[1] 0.583 0.132 0.327 0.839 0.005 0.004 638.0 638.0 613.0 955.0 1.0
β[2] 0.613 0.120 0.393 0.839 0.005 0.004 525.0 525.0 527.0 1145.0 1.0
β[3] 0.318 0.155 0.028 0.617 0.007 0.005 483.0 449.0 488.0 921.0 1.0
β[4] 0.180 0.139 -0.057 0.481 0.005 0.004 660.0 581.0 661.0 542.0 1.0
β[5] 0.606 0.126 0.382 0.850 0.006 0.004 487.0 487.0 481.0 1032.0 1.0
β[6] 0.549 0.152 0.293 0.859 0.007 0.005 542.0 520.0 557.0 820.0 1.0
β[7] 0.768 0.157 0.438 1.023 0.007 0.005 537.0 537.0 558.0 729.0 1.0
σ_α 1.334 0.987 0.133 3.057 0.064 0.045 238.0 238.0 170.0 254.0 1.0
σ_β 0.288 0.123 0.095 0.512 0.004 0.003 827.0 727.0 831.0 540.0 1.0
ϵ 0.579 0.057 0.480 0.688 0.002 0.001 1300.0 1300.0 1262.0 1048.0 1.0
ν 44.776 33.242 3.721 104.553 0.862 0.610 1488.0 1483.0 1322.0 1146.0 1.0

In [62]:
az.plot_autocorr(trace_centered, var_names=['μ_α', 'σ_α', 'μ_β', 'σ_β']); #sigma_alpha chain is highly correlated



In [63]:
plot_post_pred_samples(trace_centered)



In [65]:
# Fit the non-centered model to the raw data
with pm.Model() as model_noncentered:
    # hyper-priors
    μ_α = pm.Normal('μ_α', mu=0, sd=10)
    σ_α = pm.HalfNormal('σ_α', 10)
    μ_β = pm.Normal('μ_β', mu=0, sd=10)
    σ_β = pm.HalfNormal('σ_β', sd=10)

    # priors
    α_offset = pm.Normal('α_offset', mu=0, sd=1, shape=M)
    α = pm.Deterministic('α', μ_α + σ_α * α_offset) 
    β_offset = pm.Normal('β_offset', mu=0, sd=1, shape=M)
    β = pm.Deterministic('β', μ_β + σ_β * β_offset) 

    ϵ = pm.HalfCauchy('ϵ', 5)

    y_pred = pm.Normal('y_pred', mu=α[idx] + β[idx] * x_m,
                         sd=ϵ, observed=y_m)

    trace_noncentered = pm.sample(1000)

az.summary(trace_noncentered)
# Ideally this parameterization yields fewer divergences, although some remain in this run.


Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (2 chains in 1 job)
NUTS: [ϵ, β_offset, α_offset, σ_β, μ_β, σ_α, μ_α]
Sampling chain 0, 13 divergences: 100%|██████████| 1500/1500 [00:31<00:00, 48.14it/s]
Sampling chain 1, 51 divergences: 100%|██████████| 1500/1500 [00:32<00:00, 46.79it/s]
There were 13 divergences after tuning. Increase `target_accept` or reparameterize.
There were 64 divergences after tuning. Increase `target_accept` or reparameterize.
The estimated number of effective samples is smaller than 200 for some parameters.
Out[65]:
mean sd hpd_3% hpd_97% mcse_mean mcse_sd ess_mean ess_sd ess_bulk ess_tail r_hat
μ_α 2.271 0.973 0.513 4.191 0.031 0.023 978.0 932.0 1002.0 748.0 1.00
μ_β 0.557 0.146 0.278 0.824 0.005 0.004 749.0 700.0 754.0 670.0 1.01
α_offset[0] 0.356 0.852 -1.308 1.935 0.026 0.019 1093.0 970.0 1042.0 1308.0 1.00
α_offset[1] 0.168 0.830 -1.492 1.618 0.023 0.020 1338.0 868.0 1321.0 1340.0 1.00
α_offset[2] 0.218 0.815 -1.296 1.742 0.020 0.019 1611.0 957.0 1590.0 1254.0 1.00
α_offset[3] -0.399 0.918 -2.167 1.275 0.021 0.021 1889.0 966.0 1839.0 1442.0 1.00
α_offset[4] 0.054 0.924 -1.667 1.807 0.025 0.024 1374.0 736.0 1370.0 1289.0 1.01
α_offset[5] -0.207 0.845 -1.703 1.531 0.022 0.022 1533.0 723.0 1517.0 1160.0 1.01
α_offset[6] -0.479 0.928 -2.187 1.303 0.030 0.025 973.0 708.0 858.0 739.0 1.01
α_offset[7] 0.290 0.946 -1.571 2.045 0.024 0.027 1573.0 610.0 1599.0 653.0 1.00
β_offset[0] 0.977 0.577 -0.171 1.954 0.032 0.024 329.0 292.0 346.0 722.0 1.00
β_offset[1] 0.037 0.576 -1.463 0.872 0.052 0.050 121.0 66.0 193.0 33.0 1.01
β_offset[2] 0.177 0.517 -0.914 1.046 0.036 0.026 201.0 201.0 229.0 66.0 1.01
β_offset[3] -0.875 0.585 -2.035 0.096 0.027 0.019 483.0 483.0 473.0 1104.0 1.00
β_offset[4] -1.365 0.605 -2.438 -0.218 0.022 0.015 785.0 776.0 790.0 912.0 1.00
β_offset[5] 0.146 0.484 -0.835 0.958 0.021 0.015 527.0 527.0 521.0 876.0 1.00
β_offset[6] 0.034 0.675 -0.830 1.918 0.067 0.059 102.0 67.0 173.0 38.0 1.01
β_offset[7] 0.757 0.573 -0.237 1.820 0.026 0.019 469.0 469.0 470.0 800.0 1.00
σ_α 1.304 1.155 0.001 3.663 0.131 0.117 78.0 49.0 150.0 40.0 1.01
σ_β 0.307 0.127 0.123 0.526 0.005 0.004 595.0 580.0 654.0 892.0 1.00
α[0] 2.707 1.118 0.578 4.754 0.051 0.036 490.0 490.0 456.0 589.0 1.00
α[1] 2.628 1.464 0.419 6.469 0.133 0.126 121.0 68.0 191.0 45.0 1.01
α[2] 2.678 1.269 0.669 5.541 0.101 0.087 159.0 107.0 200.0 78.0 1.01
α[3] 1.602 1.505 -1.274 4.153 0.057 0.040 692.0 692.0 781.0 983.0 1.00
α[4] 2.291 1.264 -0.187 4.521 0.048 0.034 694.0 694.0 634.0 1401.0 1.00
α[5] 1.991 1.207 -0.511 4.076 0.035 0.025 1183.0 1137.0 1192.0 1043.0 1.00
α[6] 1.430 1.734 -2.838 3.927 0.145 0.103 142.0 142.0 242.0 61.0 1.01
α[7] 2.794 1.643 0.057 6.148 0.102 0.078 258.0 220.0 292.0 312.0 1.01
β[0] 0.824 0.112 0.593 1.009 0.005 0.004 530.0 482.0 493.0 580.0 1.00
β[1] 0.569 0.145 0.189 0.782 0.013 0.009 119.0 119.0 186.0 40.0 1.01
β[2] 0.609 0.125 0.328 0.800 0.010 0.007 163.0 163.0 207.0 57.0 1.01
β[3] 0.312 0.144 0.059 0.586 0.005 0.004 734.0 647.0 892.0 985.0 1.00
β[4] 0.176 0.130 -0.079 0.410 0.005 0.003 799.0 799.0 736.0 1343.0 1.00
β[5] 0.597 0.121 0.380 0.840 0.003 0.003 1204.0 1158.0 1216.0 1099.0 1.00
β[6] 0.561 0.172 0.314 0.983 0.014 0.012 144.0 108.0 243.0 58.0 1.01
β[7] 0.771 0.149 0.502 1.026 0.008 0.006 335.0 335.0 362.0 788.0 1.00
ϵ 0.596 0.056 0.486 0.689 0.001 0.001 1829.0 1772.0 1892.0 1188.0 1.00

In [66]:
az.plot_autocorr(trace_noncentered, var_names=['μ_α', 'σ_α', 'μ_β', 'σ_β']);



In [67]:
plot_post_pred_samples(trace_noncentered)



In [0]: