In [1]:
# https://github.com/pymc-devs/pymc3/blob/master/docs/source/notebooks/updating_priors.ipynb
In [ ]:
In [1]:
# pymc3.distributions.DensityDist?
In [2]:
import matplotlib.pyplot as plt
import matplotlib as mpl
from pymc3 import Model, Normal, Slice
from pymc3 import sample
from pymc3 import traceplot
from pymc3.distributions import Interpolated
from theano import as_op
import theano.tensor as tt
import numpy as np
from scipy import stats
%matplotlib inline
%load_ext version_information
%version_information pymc3
Out[2]:
In [5]:
from sklearn.neighbors import KernelDensity  # sklearn.neighbors.kde is deprecated

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
kde = KernelDensity(kernel='gaussian', bandwidth=0.2).fit(X)
print(kde.score_samples(X))  # log-density of the training points
plt.scatter(X[:, 0], X[:, 1])
Out[5]:
In [3]:
# Initialize random number generator
np.random.seed(123)
# True parameter values
alpha_true = 5
beta0_true = 7
beta1_true = 13
# Size of dataset
size = 100
# Predictor variable
X1 = np.random.randn(size)
X2 = np.random.randn(size) * 0.2
# Simulate outcome variable
Y = alpha_true + beta0_true * X1 + beta1_true * X2 + np.random.randn(size)
In [4]:
basic_model = Model()
with basic_model:
    # Priors for unknown model parameters
    alpha = Normal('alpha', mu=0, sd=1)
    beta0 = Normal('beta0', mu=12, sd=1)
    beta1 = Normal('beta1', mu=18, sd=1)

    # Expected value of outcome
    mu = alpha + beta0 * X1 + beta1 * X2

    # Likelihood (sampling distribution) of observations
    Y_obs = Normal('Y_obs', mu=mu, sd=1, observed=Y)

    # draw 1000 posterior samples
    trace = sample(1000)
In [5]:
traceplot(trace);
To update our beliefs about the parameters, we use the posterior distributions from one inference as the prior distributions for the next. The data used in each iteration must be independent of the data used in previous iterations; otherwise the same (possibly wrong) belief is injected into the system over and over, amplifying errors and misleading the inference. As long as the data are independent, the estimates should converge to the true parameter values.
Because sampling only gives us draws from the posterior (shown on the right in the trace plot above), we need to estimate its probability density (shown on the left). Kernel density estimation (KDE) is one way to do this, and it is the technique we use here. The result is an empirical distribution that cannot be expressed analytically, but PyMC3 still lets us use it as a prior via the Interpolated class.
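As a quick illustration of the idea (a sketch, not part of the original analysis), scipy's gaussian_kde turns a set of samples into a smooth density estimate that can be evaluated on a grid:
In [ ]:
# Sketch: KDE turns samples into an evaluable density estimate.
samples = np.random.randn(1000)              # stand-in for posterior samples
grid = np.linspace(samples.min(), samples.max(), 100)
density = stats.gaussian_kde(samples)(grid)  # estimated density on the grid
plt.plot(grid, density);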
In [6]:
def from_posterior(param, samples):
    smin, smax = np.min(samples), np.max(samples)
    width = smax - smin
    x = np.linspace(smin, smax, 100)
    y = stats.gaussian_kde(samples)(x)

    # what was never sampled should have a small probability but not 0,
    # so we'll extend the domain and use linear approximation of density on it
    x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
    y = np.concatenate([[0], y, [0]])
    return Interpolated(param, x, y)
Now we just need to generate more data and rebuild the model so that the priors for the current iteration are the posteriors from the previous iteration. We can keep using the NUTS sampler because the Interpolated class implements the gradient calculations that Hamiltonian Monte Carlo samplers require.
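To see this concretely, here is a minimal sketch (not from the original notebook, and assuming the PyMC3 3.x API where Interpolated.dist builds a free-standing distribution outside a model context): we build an interpolated distribution and ask Theano for the gradient of its log-density, which is exactly what NUTS relies on.
In [ ]:
import theano

# Sketch (assumption: PyMC3 3.x / Theano API): the interpolated prior exposes
# a logp whose gradient Theano can compute symbolically.
xs = np.linspace(-3, 3, 100)
ys = stats.norm.pdf(xs)                      # any positive density values work
dist = Interpolated.dist(x_points=xs, pdf_points=ys)

value = tt.dscalar('value')
logp = dist.logp(value)
dlogp = tt.grad(logp, value)                 # gradient used by HMC/NUTS
print(theano.function([value], [logp, dlogp])(0.5))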
In [7]:
traces = [trace]
In [8]:
for _ in range(10):
    # generate more data
    X1 = np.random.randn(size)
    X2 = np.random.randn(size) * 0.2
    Y = alpha_true + beta0_true * X1 + beta1_true * X2 + np.random.randn(size)

    model = Model()
    with model:
        # Priors are posteriors from previous iteration
        alpha = from_posterior('alpha', trace['alpha'])
        beta0 = from_posterior('beta0', trace['beta0'])
        beta1 = from_posterior('beta1', trace['beta1'])

        # Expected value of outcome
        mu = alpha + beta0 * X1 + beta1 * X2

        # Likelihood (sampling distribution) of observations
        Y_obs = Normal('Y_obs', mu=mu, sd=1, observed=Y)

        # draw 1000 posterior samples
        trace = sample(1000)

    traces.append(trace)
In [9]:
print('Posterior distributions after ' + str(len(traces)) + ' iterations.')
cmap = mpl.cm.autumn
for param in ['alpha', 'beta0', 'beta1']:
    plt.figure(figsize=(8, 2))
    for update_i, trace in enumerate(traces):
        samples = trace[param]
        smin, smax = np.min(samples), np.max(samples)
        x = np.linspace(smin, smax, 100)
        y = stats.gaussian_kde(samples)(x)
        plt.plot(x, y, color=cmap(1 - update_i / len(traces)))
    plt.axvline({'alpha': alpha_true, 'beta0': beta0_true, 'beta1': beta1_true}[param], c='k')
    plt.ylabel('Density')
    plt.title(param)
plt.show()
You can re-execute the last two cells to generate more updates.
Notice that the posterior distributions for our parameters tend to center on their true values (vertical lines) and become thinner and thinner with each update. In other words, we become more confident with every iteration, and the (wrong) beliefs we started with are gradually washed out by the new data we incorporate.
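A quick numerical check (a sketch, not in the original notebook) makes the shrinkage explicit: the posterior standard deviation of each parameter should decrease as updates accumulate.
In [ ]:
# Sketch: posterior standard deviations should shrink across updates.
for param in ['alpha', 'beta0', 'beta1']:
    widths = [t[param].std() for t in traces]
    print(param, np.round(widths, 3))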
In [10]:
for _ in range(10):
    # generate more data
    X1 = np.random.randn(size)
    X2 = np.random.randn(size) * 0.2
    Y = alpha_true + beta0_true * X1 + beta1_true * X2 + np.random.randn(size)

    model = Model()
    with model:
        # Priors are posteriors from previous iteration
        alpha = from_posterior('alpha', trace['alpha'])
        beta0 = from_posterior('beta0', trace['beta0'])
        beta1 = from_posterior('beta1', trace['beta1'])

        # Expected value of outcome
        mu = alpha + beta0 * X1 + beta1 * X2

        # Likelihood (sampling distribution) of observations
        Y_obs = Normal('Y_obs', mu=mu, sd=1, observed=Y)

        # draw 1000 posterior samples
        trace = sample(1000)

    traces.append(trace)
In [11]:
print('Posterior distributions after ' + str(len(traces)) + ' iterations.')
cmap = mpl.cm.autumn
for param in ['alpha', 'beta0', 'beta1']:
    plt.figure(figsize=(8, 2))
    for update_i, trace in enumerate(traces):
        samples = trace[param]
        smin, smax = np.min(samples), np.max(samples)
        x = np.linspace(smin, smax, 100)
        y = stats.gaussian_kde(samples)(x)
        plt.plot(x, y, color=cmap(1 - update_i / len(traces)))
    plt.axvline({'alpha': alpha_true, 'beta0': beta0_true, 'beta1': beta1_true}[param], c='k')
    plt.ylabel('Density')
    plt.title(param)
plt.show()
In [12]:
for _ in range(10):
    # generate more data
    X1 = np.random.randn(size)
    X2 = np.random.randn(size) * 0.2
    Y = alpha_true + beta0_true * X1 + beta1_true * X2 + np.random.randn(size)

    model = Model()
    with model:
        # Priors are posteriors from previous iteration
        alpha = from_posterior('alpha', trace['alpha'])
        beta0 = from_posterior('beta0', trace['beta0'])
        beta1 = from_posterior('beta1', trace['beta1'])

        # Expected value of outcome
        mu = alpha + beta0 * X1 + beta1 * X2

        # Likelihood (sampling distribution) of observations
        Y_obs = Normal('Y_obs', mu=mu, sd=1, observed=Y)

        # draw 1000 posterior samples
        trace = sample(1000)

    traces.append(trace)
In [13]:
print('Posterior distributions after ' + str(len(traces)) + ' iterations.')
cmap = mpl.cm.autumn
for param in ['alpha', 'beta0', 'beta1']:
    plt.figure(figsize=(8, 2))
    for update_i, trace in enumerate(traces):
        samples = trace[param]
        smin, smax = np.min(samples), np.max(samples)
        x = np.linspace(smin, smax, 100)
        y = stats.gaussian_kde(samples)(x)
        plt.plot(x, y, color=cmap(1 - update_i / len(traces)))
    plt.axvline({'alpha': alpha_true, 'beta0': beta0_true, 'beta1': beta1_true}[param], c='k')
    plt.ylabel('Density')
    plt.title(param)
plt.show()
In [ ]: