Making Custom Distributions

Introduction

By using the InterpolatedUnivariateDistribution class, you can easily create a single-variable distribution by specifying its PDF as a callable function. Here, we'll demonstrate this functionality by implementing the asymmetric Lorentz distribution of Stancik and Brauns.

Preamble

As always, we start by setting up the Python environment for inline plotting and true division.



In [1]:

    
from __future__ import division
%matplotlib inline









    



/home/cgranade/anaconda/envs/qinfer-binder/lib/python3.5/site-packages/matplotlib/__init__.py:872: UserWarning: axes.color_cycle is deprecated and replaced with axes.prop_cycle; please use the latter.
  warnings.warn(self.msg_depr % (key, alt_key))



In [2]:

    
import numpy as np
import matplotlib.pyplot as plt
try: plt.style.use('ggplot')
except: pass

Next, we import the InterpolatedUnivariateDistribution class.



In [3]:

    
from qinfer.distributions import InterpolatedUnivariateDistribution









    



/home/cgranade/anaconda/envs/qinfer-binder/lib/python3.5/site-packages/qinfer/metrics.py:51: UserWarning: Could not import scikit-learn. Some features may not work.
  warnings.warn("Could not import scikit-learn. Some features may not work.")
/home/cgranade/anaconda/envs/qinfer-binder/lib/python3.5/site-packages/IPython/parallel.py:13: ShimWarning: The `IPython.parallel` package has been deprecated. You should import from ipyparallel instead.
  "You should import from ipyparallel instead.", ShimWarning)
/home/cgranade/anaconda/envs/qinfer-binder/lib/python3.5/site-packages/qinfer/parallel.py:53: UserWarning: Could not import IPython parallel. Parallelization support will be disabled.
  "Could not import IPython parallel. "

Defining Distributions

The asymmetric Lorentz distribution is defined by letting the scale parameter $\gamma$ of a Lorentz distribution be a function of the random variable $x$, $$ \gamma(x) = \frac{2\gamma_0}{1 + \exp(a [x - x_0])}. $$ It is straightforward to implement this in a vectorized way by defining this function and then substituting it into the PDF of a Lorentz distribution.



In [4]:

    
def asym_lorentz_scale(x, x_0, gamma_0, a):
    return 2 * gamma_0 / (1 + np.exp(a * (x - x_0)))



In [5]:

    
def asym_lorentz_pdf(x, x_0, gamma_0, a):
    gamma = asym_lorentz_scale(x, x_0, gamma_0, a)
    return 2 * gamma / (np.pi * gamma_0 * (1 + 4 * ((x - x_0) / (gamma_0))**2))

Once we have this, we can pass the PDF as a lambda function to InterpolatedUnivariateDistribution in order to specify the values of the location $x_0$, the nominal scale $\gamma_0$ and the asymmetry parameter $a$.



In [6]:

    
dist = InterpolatedUnivariateDistribution(lambda x: asym_lorentz_pdf(x, 0, 1, 2), 2, 1200)

The resulting distribution can be sampled like any other, such that we can quickly check that it produces something of the desired shape.



In [7]:

    
hist(dist.sample(n=10000), bins=100);

We note that making this distribution object is fast enough that it can conceivably be embedded within a likelihood function itself, so as to enable using the method of hyperparameters to estimate the parameters of the asymmetric Lorentz distribution.



In [8]:

    
%timeit dist = InterpolatedUnivariateDistribution(lambda x: asym_lorentz_pdf(x, 0, 1, 2), 2, 120)









    



1000 loops, best of 3: 764 µs per loop



In [9]:

    
%timeit dist = InterpolatedUnivariateDistribution(lambda x: asym_lorentz_pdf(x, 0, 1, 2), 2, 1200)









    



1000 loops, best of 3: 957 µs per loop



In [10]:

    
%timeit dist = InterpolatedUnivariateDistribution(lambda x: asym_lorentz_pdf(x, 0, 1, 2), 2, 12000)









    



100 loops, best of 3: 2.65 ms per loop



In [ ]: