An Aside on Alpha Stable Distributions

Alpha-stable distributions are a fascinating family of distributions. Some basic practices of statistics like hypothesis testing likely wouldn't exist if there were no $\alpha$-stable distributions (we'll see why as we continue). This notebook will serve as an extremely brief introduction and "learning by doing" exercise, particularly focused on Python implementations of this family of distributions.

Whenever possible I will draw discussion from external (open) sources, and focus energy and time on code.

A Lightning Intro

Let's start with a quick description from wikipedia:

In probability theory, a distribution or a random variable is said to be stable if a linear combination of two independent copies of a random sample has the same distribution, up to location and scale parameters. The stable distribution family is also sometimes referred to as the Lévy alpha-stable distribution, after Paul Lévy, the first mathematician to have studied it.

More description from U Huntsville's great Random online notes, specifically the stable distribution section:

Stable distributions are an important general class of probability distributions on $\mathbb{R}$ that are defined in terms of location-scale transformations. Stable distributions occur as limits (in distribution) of scaled and centered sums of independent, identically distributed variables. Such limits generalize the central limit theorem, and so stable distributions generalize the normal distribution in a sense. The pioneering work on stable distributions was done by Paul Lévy.

...

Recall that two distributions on $\mathbb{R}$ that are related by a location-scale transformation are said to be of the same type, and that being of the same type defines an equivalence relation on the class of distributions on $\mathbb{R}$. With this terminology, the definition of stability has a more elegant expression: X has a stable distribution if the sum of a finite number of independent copies of X is of the same type as X.
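As a quick sanity check of that definition, here is a minimal sketch (my own, not from the sources above) using the Cauchy case, where the scaling constant for a sum of two independent copies is known exactly: $2^{1/\alpha} = 2$ when $\alpha = 1$.

In [ ]:
# Sketch: if X1, X2 are iid standard Cauchy, then X1 + X2 should have the
# same distribution as 2*X for another independent Cauchy X.
import numpy as np
import scipy.stats as stats

n = 100000
x1 = stats.cauchy.rvs(size=n, random_state=1)
x2 = stats.cauchy.rvs(size=n, random_state=2)
scaled_copy = 2.0 * stats.cauchy.rvs(size=n, random_state=3)

# Two-sample KS test; a large p-value is consistent with the stability claim.
print(stats.ks_2samp(x1 + x2, scaled_copy))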

This family of distributions is defined by four parameters, which have a variety of different names depending on the text you reference. To be consistent with Dominicy and Veredas, I'll use the following:

  • $\alpha \in (0,2]$ is the "tail index": it measures the thickness of the tails of the distribution.
    • Also called the "characteristic index," "stability index," or the first shape parameter.
    • Governs existence of moments: $\mathbb{E}\left[ |X|^{p} \right] < \infty$ for all $p < \alpha$.
    • Note this implies that for $\alpha < 2$ the variance (and every moment of order $p \geq \alpha$) does not exist. (In fact only the Normal distribution, a special case of the stable family, has finite variance. Very interesting; see the sketch after the parameter discussion below.)
  • $\beta \in [-1,1]$ is the "skewness" parameter.
    • $\beta < 0 \rightarrow $ left-skewed (left-asymmetric), while $\beta > 0 \rightarrow $ right-skewed (right-asymmetric).
    • Also referred to as the second shape parameter.
    • Note: as $\alpha \rightarrow 2$, $\beta$ becomes unidentified; for $\alpha = 2$ the distribution is Normal and $\beta$ is ignored.
  • $\sigma \in \mathbb{R}^{+}$ is the "scale" or "dispersion" parameter.
  • $\mu \in \mathbb{R}$ is the location parameter.

An instance of this family may be denoted $S(\alpha, \beta, \sigma, \mu)$. Be careful with the last two parameters, $\sigma$ and $\mu$ -- these do not correspond directly to the usual mean and variance. For example, for the Normal special case, $2 \sigma^{2}_{stable} = \sigma^{2}_{normal}$.
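To make the moment condition above concrete, here is a small sketch (parameter choices are mine): the running sample variance of $\alpha = 1.5$ draws never settles down as the sample grows, while for Gaussian draws it converges.

In [ ]:
# Sketch: sample variance along a growing sample, stable (alpha=1.5, infinite
# variance) vs. Gaussian (finite variance).
import numpy as np
import scipy.stats as stats

heavy = stats.levy_stable.rvs(alpha=1.5, beta=0, size=100000, random_state=0)
light = stats.norm.rvs(size=100000, random_state=0)

for m in (1000, 10000, 100000):
    print(m, np.var(heavy[:m]), np.var(light[:m]))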

There are three special cases of the family of stable distributions:

  • Normal: $S(\alpha=2, \beta=NA, \frac{\sigma}{\sqrt{2}}, \mu) \rightarrow \mathscr{N}(\mu, \sigma^2)$
  • Cauchy: $S(\alpha=1, \beta=0, \sigma, \mu)$
  • Lévy: $S(\alpha=0.5, \beta=1, \sigma, \mu)$

Directly quoting Dominicy and Veredas again:

"A related property of the $\alpha$-stable distribution is that it is a domain of attraction – this is often known as the Generalized Central Limit Theorem – which states that the only distribution that arises as limit from sums of i.i.d. random variables (suitably scaled and centered) is the $\alpha$-stable distribution."

Combine this with the observation that the Normal distribution is the only stable distribution with finite variance. Might it be that many distributions we care about have no second moment? (This should be of particular interest to anyone wishing to use ABMs (agent-based models), which typically involve convolutions of multiple distributions.)
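Here is a hedged sketch of that intuition (my own construction, not from the paper): sums of iid Pareto-tailed variables with tail index 1.5, centered and scaled by $n^{1/\alpha}$ rather than the CLT's $\sqrt{n}$, stay heavy-tailed instead of converging to a Gaussian.

In [ ]:
# Sketch of the Generalized CLT: scaled sums of infinite-variance Pareto
# draws remain heavy-tailed (they approach a stable law, not a Gaussian).
import numpy as np
import scipy.stats as stats

rng = np.random.RandomState(0)
alpha = 1.5
n_terms, n_sums = 1000, 5000

# Pareto(x_m=1) draws with tail index alpha, via the inverse CDF; the
# variance is infinite for alpha < 2, but the mean is finite for alpha > 1.
u = rng.uniform(size=(n_sums, n_terms))
pareto = u ** (-1.0 / alpha)

# Center by the exact mean and scale by n^(1/alpha) instead of n^(1/2).
mean_x = alpha / (alpha - 1.0)
scaled_sums = (pareto.sum(axis=1) - n_terms * mean_x) / n_terms ** (1.0 / alpha)

# Excess kurtosis far above the Gaussian value of 0 signals heavy tails.
print("excess kurtosis:", stats.kurtosis(scaled_sums))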

Let's Play with Simulated Data

Fortunately SciPy has a (somewhat limited) levy_stable distribution in its stats library. Let's test out its special cases.


In [1]:
# Import libraries
from __future__ import print_function, division
import scipy.stats as stats
import matplotlib.pyplot as plt
import numpy as np

In [3]:
# Let's draw some rvs from the Gaussian special case, and compare to the
# appropriate Gaussian from the stats library.
# Form: rvs(alpha, beta, loc=0, scale=1, size=1, random_state=None)
n = 10000
mu = 0.0
sigma = 1.0
# beta is ignored when alpha=2; scale sigma/sqrt(2) yields N(mu, sigma^2)
gaussian_special_case_rvs = stats.levy_stable.rvs(alpha=2, beta=0, loc=mu, scale=sigma/np.sqrt(2), size=n)

gaussian_mean = np.mean(gaussian_special_case_rvs)
gaussian_std = np.std(gaussian_special_case_rvs)
print("mean:", gaussian_mean)
print("stdev:", gaussian_std)

x_range = np.linspace(-4, 4, 1000)

plt.hist(gaussian_special_case_rvs, bins=21, density=True)
plt.plot(x_range, stats.norm.pdf(x_range, loc=gaussian_mean, scale=gaussian_std))
plt.show()


mean: 0.014236791461
stdev: 1.00867489694

In [4]:
# Let's look at a Cauchy:
cauchy_special_case_rvs = stats.levy_stable.rvs(alpha=1, beta=0, loc=mu, scale=sigma/2.0, size=n)

# Note: the Cauchy distribution has no mean or variance; the sample mean
# below does not converge as n grows.
cauchy_mean = np.mean(cauchy_special_case_rvs)
print("mean:", cauchy_mean)

# Trim extreme values so the histogram is readable:
trimval = 10
trimndx = abs(cauchy_special_case_rvs) < trimval
print("fraction vals < trimval="+str(trimval)+": ", np.mean(trimndx))
plot_rvs = cauchy_special_case_rvs[trimndx]

x_range = np.linspace(-trimval, trimval, 1000)

plt.hist(plot_rvs, bins=51, density=True)
plt.plot(x_range, stats.cauchy.pdf(x_range, loc=mu, scale=sigma/2.0))
plt.show()


mean: 0.306210593497
fraction vals < trimval=10:  0.9712

In [5]:
# Let's look at a Lévy:
levy_special_case_rvs = stats.levy_stable.rvs(alpha=0.5, beta=1.0, loc=mu, scale=sigma/2.0, size=n)

# Note: the Lévy distribution has no finite mean (alpha < 1), so the sample
# mean below can be enormous and does not converge.
levy_mean = np.mean(levy_special_case_rvs)
print("mean:", levy_mean)

# Trim extreme values so the histogram is readable:
trimval = 10
trimndx = abs(levy_special_case_rvs) < trimval
print("fraction vals < trimval="+str(trimval)+": ", np.mean(trimndx))
plot_rvs = levy_special_case_rvs[trimndx]

x_range = np.linspace(0.0, trimval, 1000)

plt.hist(plot_rvs, bins=51, density=True)
plt.plot(x_range, stats.levy.pdf(x_range, loc=mu, scale=sigma/2.0))
plt.show()


mean: 98538.1572388
fraction vals < trimval=10:  0.8156

Excellent. Not shown: small perturbations of the $\alpha, \beta$ parameters show how the distributions "move away" from the special cases as the parameterization moves away; a quick sketch of this is below.
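For reference, a minimal version of that perturbation exercise might look like the following (the values of $\alpha$ are arbitrary, and note that levy_stable.pdf is evaluated numerically, so it can be slow).

In [ ]:
# Sketch: overlay stable pdfs as alpha moves away from the Gaussian case.
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

x = np.linspace(-5, 5, 201)
for a in (2.0, 1.9, 1.5):
    plt.plot(x, stats.levy_stable.pdf(x, alpha=a, beta=0), label="alpha=%.1f" % a)
plt.legend()
plt.show()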

Next steps: generate the data needed for the MSQ estimation experiments.

Notes:

  • [ ] Need to determine how to standardize the sample when the norming parameters are unknown.
    • Specifically, need to query about how to standardize.
    • [x] NOTE: ASSUME that this is "standard" standardization. See if this replicates. (One hedged guess at what that might mean is sketched after this list.)
  • [ ] Code two versions of the THETA function, and one THETAR function.
  • [ ] Generate a sample, normalize it, and run it through the THETAR function.
  • [ ] Then let's try this thing.
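As a placeholder for the standardization question above, here is one guess at what "standard" standardization might mean for a stable sample -- quantile-based location and scale, which are finite for every $\alpha$. This is purely an assumption on my part, not Dominicy and Veredas' method.

In [ ]:
# Hypothetical sketch: standardize a sample without knowing mu or sigma,
# using quantile-based location/scale. ASSUMED interpretation, to be checked.
import numpy as np

def quantile_standardize(x):
    """Center by the median and scale by the interquartile range."""
    q25, q50, q75 = np.percentile(x, [25, 50, 75])
    return (x - q50) / (q75 - q25)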

Additional Reading

To explore further:


In [5]:
# Note: levy_stable.rvs takes a random_state keyword argument, not seed;
# passing seed=1234 raised a TypeError here.
stats.levy_stable.rvs(alpha=1.2, beta=0.5,
                      loc=1, scale=1, size=(5,10), random_state=1234)
