The Gaussian copula and the meta-Gaussian distribution


In [1]:
%matplotlib inline
import elfi
import scipy.stats as ss
import numpy as np
import matplotlib.pyplot as plt

In [2]:
# This hack is needed for relative imports.
import os
import sys
sys.path.insert(0, os.path.abspath('.'))

import toy_model
import copula_abc as cop

By Sklar's theorem, the cumulative distribution function $F$ of a $p$-dimensional random variable $X$ can be expressed in terms of its marginal distribution functions $F_i$ and a copula $C$ as $$F(x_1, \dots, x_p) = C(F_1(x_1), \dots, F_p(x_p)) \, .$$ If the random variable $X$ also has a density function $f$, it further holds that $$f(x) = c(F_1(x_1), \dots, F_p(x_p)) \prod_{i=1}^p f_i(x_i) \, ,$$ where $f_i$ are the marginal densities and $c = \frac{\partial^p C}{\partial F_1 \cdots \partial F_p}$ is the density of the copula. Notably, a multivariate Gaussian distribution can be written in this form: $$h(x) = \underbrace{\frac{1}{|R|^{\frac12}} \exp \left\{ -\frac12 u^T (R^{-1} - I) u \right\}}_{c_{G}(H_1(x_1), \dots, H_p(x_p))} \prod_{i=1}^p \underbrace{\frac{1}{\sigma_i}\phi(u_i)}_{h_i(x_i)}\, ,$$ where $u_i = \Phi^{-1}(H_i(x_i))$, $\phi$ is the standard normal density, $H_i$ and $h_i$ are the marginal CDFs and densities respectively, and $R$ is a correlation matrix. The term $c_G$ is the density of the Gaussian copula. The meta-Gaussian distribution is obtained by substituting arbitrary marginal densities for the normal marginals $h_i$.
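Written out, this substitution gives the meta-Gaussian density with marginal densities $g_i$ and CDFs $G_i$: $$g(x) = \frac{1}{|R|^{\frac12}} \exp \left\{ -\frac12 u^T (R^{-1} - I) u \right\} \prod_{i=1}^p g_i(x_i) \, , \qquad u_i = \Phi^{-1}(G_i(x_i)) \, .$$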

As can be seen from the density function $h$, the meta-Gaussian distribution has the useful property that the joint distribution is fully specified by the marginal distributions and the pairwise correlations between the components. In large-sample settings the posterior distribution of an ABC model is approximately normal, and Li et al. 2017 [1] use this fact to justify a meta-Gaussian approximation of the posterior. This lets us break down the inference of high-dimensional ABC models into multiple easier subproblems of estimating 1- and 2-dimensional distributions.

A meta-Gaussian distribution can be constructed in ELFI by specifying the marginal distributions $g_i$ and the correlation matrix $R$. The marginal distributions can be any objects that implement the cdf and ppf methods, for example frozen distributions from scipy. Here we define a meta-Gaussian distribution with beta- and gamma-distributed marginals and a correlation of 0.9.


In [3]:
rho = 0.9
corr = np.array([[1, rho],
                 [rho, 1]])
marginals = [ss.beta(5, 2),
             ss.gamma(3, 2)]

mg = cop.MetaGaussian(corr=corr, marginals=marginals)

t1 = np.linspace(0.2, 1.1, 30)
t2 = np.linspace(1, 11, 30)
grid, vals = cop.tabulate(mg.pdf, t1, t2)
plt.contour(*grid, vals);


The marginal distributions can also be estimated from data using the EmpiricalDensity class. It estimates the density function using a Gaussian kernel density estimate and the cumulative distribution function using linear interpolation of the empirical CDF.


In [4]:
from elfi.distributions import EmpiricalDensity
density_estimate = EmpiricalDensity(np.random.randn(1000))

t = np.linspace(-5, 5, 100)
plt.plot(t, density_estimate.pdf(t));
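For intuition, here is a minimal sketch of the idea behind such an estimator, using only numpy and scipy (illustrative only; the actual EmpiricalDensity implementation in ELFI may differ in detail):

import numpy as np
import scipy.stats as ss

class EmpiricalDensitySketch:
    """Sketch: Gaussian KDE for the pdf, interpolated empirical CDF for cdf/ppf."""
    def __init__(self, data):
        self.data = np.sort(np.asarray(data))
        self.kde = ss.gaussian_kde(self.data)
        # empirical CDF values at the sorted observations
        self.ecdf = np.arange(1, len(self.data) + 1) / len(self.data)

    def pdf(self, x):
        # Gaussian kernel density estimate
        return self.kde.evaluate(x)

    def cdf(self, x):
        # linear interpolation of the empirical CDF
        return np.interp(x, self.data, self.ecdf)

    def ppf(self, q):
        # inverse obtained by interpolating the other way around
        return np.interp(q, self.ecdf, self.data)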


Copula ABC

Li et al. 2017 [1] exploit the structure of the meta-Gaussian distribution to scale ABC to higher-dimensional models. We assume that the summary statistic can be factorized into subsets that are informative for each component of the parameter vector of interest.

The principal difficulty of using Copula ABC in ELFI is the construction of the submodels needed for approximating the parameters of the meta-Gaussian distribution. The main idea is to produce an inference model that has a discrepancy node corresponding to each informative subset of the summary statistic. Providing a general interface for estimating meta-Gaussian distributions for arbitrary ELFI models is very difficult, so instead ELFI provides several utilities for constructing suitable inference models.

These utilities assume that there is a single summary statistic whose components form the informative subsets for the components of the parameter. We also assume that there is only a single multivariate prior distribution, and that the bivariate summary statistics are the unions of the univariate ones. In other cases the inference model has to be constructed by the user.

Constructing the inference model

We will look at an example taken from Li et al. 2017 [1] to illustrate these tools. Assume that the data $y$ is generated from a multivariate Gaussian $y \sim N_p(\mu, I)$, $p \geq 2$. We use a "twisted" Gaussian distribution as our prior for $\mu$, which is obtained by transforming the Gaussian distribution $N(0, \Sigma)$, where $\Sigma = \text{diag}([100, 1, 1, \dots, 1])$, by the function $g_b(x) = (x_1, x_2 + b x_1^2 - 100 b, x_3, \dots, x_p)$. We use $b=0.1$ to get a strong correlation between the first two components, $y_{obs} = [10, 0, \dots, 0]$ as the observed data, and the identity function as the summary statistic $s = S(y) = y$.
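For concreteness, a small numpy sketch of how one might sample from this twisted-Gaussian prior (illustrative only; the model used below is implemented in toy_model.py):

import numpy as np

def twisted_gaussian_rvs(b=0.1, p=3, size=1, random_state=None):
    """Draw from N(0, diag([100, 1, ..., 1])) and apply the twist g_b."""
    random_state = random_state or np.random.RandomState()
    sigma = np.ones(p)
    sigma[0] = 10  # standard deviation 10, i.e. variance 100, for the first component
    z = sigma * random_state.randn(size, p)
    z[:, 1] = z[:, 1] + b * z[:, 0]**2 - 100 * b
    return z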

Exploiting our knowledge of the model, we take the summary statistic informative about $\mu_i$ to be $s_i$, with the exception of the second component, for which we set $s_{(2)} = (s_1, s_2)$. For the bivariate marginals, we use the unions of the univariate summary statistics, e.g. $s_{(2, 3)} = (s_1, s_2, s_3)$.

This model is implemented in the file toy_model.py.


In [5]:
p = 3

m = toy_model.get_model(p)
elfi.draw(m)


Out[5]:
[Graph of the model: mu → Gauss → summary]

The informative indices of the summary statistic are defined in a dictionary where each component index of the parameter $\theta$ maps to the set of summary statistic indices that are informative for that component.


In [6]:
informative_indices = {i: set([i]) for i in range(p)}
informative_indices[1] = {0, 1}
informative_indices


Out[6]:
{0: {0}, 1: {0, 1}, 2: {2}}

The function complete_informative_indices completes the definition of the informative indices by adding all the pairwise unions of the univariate summary statistics.


In [7]:
full_indices = cop.complete_informative_indices(informative_indices); full_indices


Out[7]:
{0: {0}, 1: {0, 1}, 2: {2}, (0, 1): {0, 1}, (0, 2): {0, 2}, (1, 2): {0, 1, 2}}

Next, we need to add the summary statistic and discrepancy nodes corresponding to the subsets we defined. The make_distances function adds summary and distance nodes corresponding to the specified indices. By default the ELFI model is copied instead of being modified in place. This behaviour can be changed with the inplace keyword.


In [8]:
summary = m['summary']
distance_nodes = cop.make_distances(full_indices, summary)

# The model is not modified in place by default
elfi.draw(m)


Out[8]:
[Graph of the original model, unchanged: mu → Gauss → summary]

In [9]:
# Here is the modified model
elfi.draw(distance_nodes[0].model)


Out[9]:
[Graph of the modified model: mu → Gauss → summary, with sub-summary nodes S0–S5 attached to summary and a distance node D0–D5 attached to each]

Sampling and estimating parameters

Finally, we must specify which ABC method to use for each of the submodels. The make_samplers function will construct a dictionary of sampling algorithms for each of the marginal distributions. The sampler_factory argument needs to be given a function which, given a discrepancy node, will produce an ABC method for sampling the corresponding subtree.


In [10]:
def Rejection(**kwargs):
    def wrapper(dist):
        return elfi.Rejection(dist, **kwargs)
    
    return wrapper

samplers = cop.make_samplers(distance_nodes, Rejection(batch_size=100)); samplers


Out[10]:
{0: <elfi.methods.parameter_inference.Rejection at 0x7fed3a418a58>,
 1: <elfi.methods.parameter_inference.Rejection at 0x7fed3a418828>,
 2: <elfi.methods.parameter_inference.Rejection at 0x7fed3a42cda0>,
 (0, 1): <elfi.methods.parameter_inference.Rejection at 0x7fed3a42ca20>,
 (0, 2): <elfi.methods.parameter_inference.Rejection at 0x7fed3a42c550>,
 (1, 2): <elfi.methods.parameter_inference.Rejection at 0x7fed3a42c320>}

The get_samples function provides a convenient way to obtain samples from the marginal distributions.


In [11]:
# sample from the first component
sample0 = cop.get_samples(marginal=0, samplers=samplers, n_samples=10, parameter='mu', quantile=0.01)

# sample from the bivariate marginal of the first and second component
sample01 = cop.get_samples(marginal=(0, 1), samplers=samplers, n_samples=10, parameter='mu', quantile=0.1)
sample01


Out[11]:
array([[ 10.47952793,   0.82166285],
       [ 10.28033902,   1.61649735],
       [ 10.91953318,   0.90371598],
       [  8.30900669,  -1.72784092],
       [  8.57632374,  -2.51606538],
       [ 11.36845168,   3.69932377],
       [  8.57595868,  -3.10522639],
       [  8.67280917,  -2.7638273 ],
       [  8.7046615 ,  -0.91064065],
       [  7.88197058,  -3.44926375]])

The entries in the correlation matrix $R$ can be estimated using bivariate samples from $(\mu_i, \mu_j)$ as described in Li et al. 2017 [1]. This is done with the estimate_correlation function. There is no guarantee that the correlation matrix constructed from the pairwise correlations is positive definite. In such cases a different estimate of the correlation matrix has to be used.
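One common way to compute such a pairwise estimate is the Gaussian rank correlation: map each margin of the bivariate sample to normal scores through its ranks and take the Pearson correlation of the scores. A minimal sketch (illustrative only; the details of estimate_correlation may differ):

import numpy as np
import scipy.stats as ss

def gaussian_rank_correlation(x, y):
    """Pearson correlation of the normal scores of x and y."""
    n = len(x)
    u = ss.norm.ppf(ss.rankdata(x) / (n + 1))
    v = ss.norm.ppf(ss.rankdata(y) / (n + 1))
    return np.corrcoef(u, v)[0, 1]

# e.g. applied to the bivariate sample drawn above:
# gaussian_rank_correlation(sample01[:, 0], sample01[:, 1])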


In [12]:
cop.estimate_correlation(marginal=(0, 1), samplers=samplers, parameter='mu', n_samples=100, quantile=0.01)


Out[12]:
0.68707977623011141

Estimating the meta-Gaussian

The copula_abc function combines all the previous steps to produce a meta-Gaussian approximation of the posterior distribution. In this case we are only interested in the marginal distribution of the first two components, so we define only the corresponding informative indices. Note that the dimensionality of the full model does not affect this estimation, since each marginal is estimated in the same way regardless of $p$.


In [13]:
informative_indices = {0: 0, 1: {0, 1}}
posterior = cop.copula_abc(informative_indices, summary=m['summary'],
                           parameter='mu',
                           sampler_factory=Rejection(batch_size=1000),
                           n_samples=10000, quantile=0.01)


Estimating marginal densities: 100%|██████████| 2/2 [06:28<00:00, 194.20s/it]
Estimating correlations: 100%|██████████| 1/1 [03:15<00:00, 195.86s/it]

Comparison to naive rejection sampling

A naive application of rejection sampling does not take the structure of the model into account, and its performance depends heavily on the dimensionality of the model: the acceptance rate deteriorates rapidly as the dimension of the summary statistic grows.


In [14]:
p = 100
m2 = toy_model.get_model(p)
d = elfi.Distance('euclidean', m2['Gauss'])
rej = elfi.Rejection(d)
res = rej.sample(n_samples=1000, quantile=0.01)

In [15]:
rejection_posterior = res.outputs['mu'][:, [0, 1]]
rp0 = rejection_posterior[:, 0]
rp1 = rejection_posterior[:, 1]
kde_rp0 = ss.gaussian_kde(rp0)
kde_rp1 = ss.gaussian_kde(rp1)

In [16]:
def mposterior(x, b=0.1):
    """Marginal posterior for the first two components."""
    t1 = x[0]
    t2 = x[1]
    C = 1/(40*np.pi**2)
    return C*np.exp(-(-10 + t1)**2/2. - t1**2/200. - t2**2/2. - (-(b*(-100 + t1**2)) + t2)**2/2.)

In [17]:
def posterior_0(t1, b=0.1):
    """Un-normalized marginal"""
    return np.exp(-50 + 10*t1 - (101*t1**2)/200. - (b**2*(-100 + t1**2)**2)/4.)*np.sqrt(np.pi)

In [18]:
MAP = mposterior(np.array([10, 0]))

In [19]:
MAP


Out[19]:
0.0015363601089363004

In [20]:
from copula_abc import overlay

levels = np.logspace(np.log(0.0001), np.log(MAP), 8, base=np.e)

t1 = np.linspace(4, 16, 30)
t2 = np.linspace(-9, 9, 30)
overlay({posterior.pdf: {'levels': levels},
         mposterior: {'linestyles': 'dashed', 'levels': levels}}, t1, t2)



In [21]:
# t1 = np.linspace(6, 13, 30)
# t2 = np.linspace(-8, 8, 30)
kde = ss.gaussian_kde(np.transpose(rejection_posterior))
overlay({kde.pdf: {'levels': levels},
         mposterior: {'linestyles': 'dashed', 'levels': levels}}, t1, t2)


Marginals

Copula ABC (blue line) uses more informative summary statistics for the second component and therefore produces a better result than the naive rejection sampling approach (orange dashed line).


In [22]:
overlay({posterior.marginals[0].pdf: {}, kde_rp0.pdf: {'linestyle': 'dashed'},
         posterior_0:{}}, np.linspace(5, 15, 50))



In [23]:
overlay({posterior.marginals[1].pdf: {}, kde_rp1.pdf: {'linestyle': 'dashed'}}, np.linspace(-5, 5, 50))


Model validation

The Copula ABC method relies on the approximate normality of the posterior. Li et al. 2017 [1] suggest validating the approximation by comparing the bivariate meta-Gaussian approximations to density estimates of corrected samples based on the informative summary statistics.
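For example, one can overlay the estimated bivariate meta-Gaussian for the first two components on a kernel density estimate computed from the corresponding bivariate samples. A rough sketch reusing the objects defined above (the sample correction step discussed in [1] is not shown):

# draw a larger bivariate sample for the pair (0, 1) and fit a KDE to it
pair_sample = cop.get_samples(marginal=(0, 1), samplers=samplers, n_samples=1000,
                              parameter='mu', quantile=0.01)
pair_kde = ss.gaussian_kde(pair_sample.T)

t1 = np.linspace(4, 16, 30)
t2 = np.linspace(-9, 9, 30)
overlay({posterior.pdf: {},                        # bivariate meta-Gaussian approximation
         pair_kde.pdf: {'linestyles': 'dashed'}},  # KDE of the bivariate samples
        t1, t2)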

References

  • [1] J. Li, D. J. Nott, Y. Fan, S. A. Sisson. Extending approximate Bayesian computation methods to high dimensions via a Gaussian copula model. Computational Statistics & Data Analysis, Volume 106, February 2017, Pages 77-89. doi: 10.1016/j.csda.2016.07.005