Stable distributions, also known as $\alpha$-stable or Lévy alpha-stable distributions, are a family of distributions characterized by four parameters: the stability parameter $\alpha \in (0,2]$, the skewness parameter $\beta \in [-1,1]$, the scale parameter $\sigma > 0$ and the location parameter $\mu \in \mathbb{R}$.
If two independent random variables follow a stable distribution, then their linear combination also follows the same distribution, up to scale and location parameters [1]. That is, if two random variables are generated using the same stability and skewness parameters, their linear combinations have the same $\alpha$ and $\beta$ parameters. Note that the parameterization is different for multivariate stable distributions, but in this work we are only interested in the univariate version.
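As a quick, illustrative check of this property (a sketch added for this write-up, not part of the inference pipeline below), the cell here draws two independent samples with the same $(\alpha, \beta)$ using scipy's levy_stable, forms a linear combination, and compares it against a freshly drawn stable sample rescaled by the theoretically implied scale $(a^\alpha + b^\alpha)^{1/\alpha}$. A large Kolmogorov–Smirnov p-value indicates the two samples are compatible with the same distribution; the seed and sample sizes are arbitrary choices.
In [ ]:
import numpy as np
import scipy.stats as ss

rng = np.random.RandomState(0)
alpha, beta = 1.5, 0.5
x1 = ss.levy_stable.rvs(alpha, beta, size=5000, random_state=rng)
x2 = ss.levy_stable.rvs(alpha, beta, size=5000, random_state=rng)
s = 2*x1 + 3*x2                              # a linear combination of the two samples
scale = (2**alpha + 3**alpha)**(1/alpha)     # scale implied by the stability property
ref = scale * ss.levy_stable.rvs(alpha, beta, size=5000, random_state=rng)
print(ss.ks_2samp(s, ref))                   # large p-value: same distribution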
The distribution has several attractive properties, such as allowing infinite variance, skewness and heavy tails, and has therefore been applied in many domains, including statistics, finance and signal processing [2]. Some special cases of the stable distribution are the Gaussian distribution ($\alpha=2$), the Cauchy distribution ($\alpha=1$, $\beta=0$) and the Lévy distribution ($\alpha=1/2$, $\beta=1$).
Stable distributions have no general analytic expressions for the density, median, mode or entropy. On the other hand, it is possible to generate random variables for fixed parameters [2,3]. Therefore, ABC techniques are well suited to estimating the unknown parameters of stable distributions. In this notebook, we show how the estimation can be done using the ELFI framework. A good list of alternative methods for parameter estimation is given in [2].
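To make the idea concrete before setting up the ELFI model, here is a minimal, self-contained sketch of rejection ABC for this problem (illustrative only; the summary, acceptance quantile and sample sizes are arbitrary choices, and the actual inference below is done with ELFI's Rejection and SMC samplers). For simplicity it infers only $\alpha$, keeping the other parameters fixed at their true values.
In [ ]:
import numpy as np
import scipy.stats as ss

rng = np.random.RandomState(1)
y_obs = ss.levy_stable.rvs(1.7, 0.9, loc=10, scale=10, size=200, random_state=rng)

def summary(x):
    # a crude summary vector: median and interquartile range
    return np.array([np.median(x), ss.iqr(x)])

s_obs = summary(y_obs)
n_sim = 1000
alpha_prop = rng.uniform(1.1, 2.0, size=n_sim)        # proposals from a uniform prior
dists = np.empty(n_sim)
for i, a in enumerate(alpha_prop):
    x = ss.levy_stable.rvs(a, 0.9, loc=10, scale=10, size=200, random_state=rng)
    dists[i] = np.linalg.norm(summary(x) - s_obs)     # Euclidean distance between summaries
keep = dists <= np.quantile(dists, 0.05)              # accept the closest 5% of proposals
print('posterior mean of alpha:', alpha_prop[keep].mean())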
In [12]:
import elfi
import scipy.stats as ss
import numpy as np
import matplotlib.pyplot as plt
import pickle
# http://www.ams.sunysb.edu/~yiyang/research/presentation/proof_simulate_stable_para_estimate.pdf
def stable_dist_rvs(alpha,beta,mu,sig,Ns=200,batch_size=1,random_state=None):
    '''
    Generates random numbers from a stable distribution using the
    transformation method in the reference linked above (requires alpha != 1).
    Input
        alpha - stability parameters, one per batch
        beta - skewness parameters, one per batch
        mu - location parameters, one per batch
        sig - scale parameters, one per batch
        Ns - number of samples in each batch
        batch_size - number of batches (must equal len(alpha))
        random_state - np.random.RandomState instance or None
    Output
        Rs - array of shape (batch_size, Ns), one batch per row
    '''
    assert(np.all(np.asarray(alpha)!=1))
    Rs = np.zeros((batch_size,Ns))
    for i in range(len(alpha)):
        U = ss.uniform.rvs(size=Ns,random_state=random_state)*np.pi - np.pi/2
        E = ss.expon.rvs(size=Ns,random_state=random_state)
        S = (1 + (beta[i]*np.tan(np.pi*alpha[i]/2))**2 )**(1/2/alpha[i])
        B = np.arctan(beta[i]*np.tan(np.pi*alpha[i]/2)) / alpha[i]
        X = S * np.sin(alpha[i]*(U+B)) / (np.cos(U)**(1/alpha[i])) * \
            (np.cos(U-alpha[i]*(U+B))/E)**((1-alpha[i])/alpha[i])
        R = sig[i]*X + mu[i]
        Rs[i,:] = R.reshape((1,-1))
    return Rs
# check the simulator is correct
randstate = np.random.RandomState(seed=102340)
y0 = stable_dist_rvs([1.7],[0.9],[10],[10],200,1,random_state=randstate)
randstate = np.random.RandomState(seed=102340)
y0scipy = ss.levy_stable.rvs(1.7,0.9,10,10,200,random_state=randstate).reshape((1,-1))
print(np.mean(y0) - np.mean(y0scipy))
print(np.var(y0) - np.var(y0scipy))
We first examine how changing the stability parameter, while keeping the others fixed, affects the distribution. The histograms indicate that larger $\alpha$ values yield samples that are more concentrated around the location, $\mu = 10$ in these figures. Note that this is not the same as controlling the variance: only a few extreme samples are scattered far away, so $\alpha$ governs the heaviness of the tails rather than the spread around the mean.
In [16]:
alphas = np.array([1.4, 1.6, 1.8, 1.99])
betas = np.array([-0.99, -0.5, 0.5, 0.99])
mus = np.array([-200, -100, 0, 100])
sigs = np.array([0.1, 1, 5, 100])
beta = 0
mu = 10
sig = 5
y = stable_dist_rvs(alphas,np.repeat(beta,4),np.repeat(mu,4),\
np.repeat(sig,4),Ns=1000,batch_size=4)
plt.figure(figsize=(20,5))
for i in range(4):
plt.subplot(1,4,i+1)
plt.hist(y[i,:], bins=10)
plt.title("({0:.3g}, {1:.3g}, {2:.3g}, {3:.3g})".format(alphas[i],beta,mu,sig),fontsize=18)
In [3]:
alpha = 1.4
mu = 0
sig = 0.1
y = stable_dist_rvs(np.repeat(alpha,4),betas,np.repeat(mu,4),np.repeat(sig,4),Ns=200,batch_size=4)
plt.figure(figsize=(20,5))
for i in range(4):
plt.subplot(1,4,i+1)
plt.hist(y[i,:])
plt.title("({0:.3g}, {1:.3g}, {2:.3g}, {3:.3g})".format(alpha,betas[i],mu,sig),fontsize=18)
In [4]:
alpha = 1.9
beta = 0.5
sigma = 1
y = stable_dist_rvs(np.repeat(alpha,4),np.repeat(beta,4),mus,\
np.repeat(sigma,4),Ns=1000,batch_size=4)
plt.figure(figsize=(20,5))
for i in range(4):
plt.subplot(1,4,i+1)
plt.hist(y[i,:])
plt.title("({0:.3g}, {1:.3g}, {2:.3g}, {3:.3g})".format(alpha,beta,mus[i],sigma),fontsize=18)
In [5]:
alpha = 1.8
beta = 0.5
mu = 0
y = stable_dist_rvs(np.repeat(alpha,4),np.repeat(beta,4),np.repeat(mu,4),sigs,Ns=1000,batch_size=4)
plt.figure(figsize=(20,5))
for i in range(4):
plt.subplot(1,4,i+1)
plt.hist(y[i,:])
plt.title("({0:.3g}, {1:.3g}, {2:.3g}, {3:.3g})".format(alpha,beta,mu,sigs[i]),fontsize=18)
In [6]:
y = stable_dist_rvs(np.array([2]),np.array([0]),np.array([0]),np.array([1]),Ns=10000,batch_size=1)
plt.figure(figsize=(4,4))
plt.hist(y[0,:],bins=10)
plt.title('Standard Gaussian samples drawn using simulator');
To analyze the performance of likelihood-free inference methods on this problem, we follow an experimental setup similar to that of [2]: three data sets of 200 samples each are generated with $(\alpha, \beta, \mu, \sigma)$ set to $(1.7, 0.9, 10, 10)$, $(1.2, 0.3, 10, 10)$ and $(0.7, 0.3, 10, 10)$, referred to below as the Easy, Medium and Hard data sets.
Similar to [4,2], we consider a restricted domain for the priors: $\alpha \sim \mathrm{Uniform}(1.1,\,2)$, $\beta \sim \mathrm{Uniform}(-1,\,1)$, $\mu \sim \mathrm{Uniform}(-300,\,300)$ and $\sigma \sim \mathrm{Uniform}(0,\,300)$. Limiting the domains of $\mu$ and $\sigma$ is meaningful for practical purposes. We also restrict the values that $\alpha$ can take because the summary statistics used below are defined only for $\alpha>1$.
In [17]:
elfi.new_model()
alpha = elfi.Prior(ss.uniform, 1.1, 0.9)
beta = elfi.Prior(ss.uniform, -1, 2)
mu = elfi.Prior(ss.uniform, -300, 600)
sigma = elfi.Prior(ss.uniform, 0, 300)
In this section, we give brief explanations of the 5 sets of summary statistics. The second, third and fourth sets are based on the characteristic function of the stable distribution, so their full expressions are too lengthy to reproduce here. We give the exact formulas only for the first set; the formulas for the remaining sets can be found in Section 3.1.1 of [2].
McCulloch developed a method for estimating the parameters based on sample quantiles [5]. He gives consistent estimators of the population quantiles as functions of the sample quantiles, and provides tables for recovering $\alpha$ and $\beta$ from the estimated quantiles. Note that, in contrast to his work, we use the sample mean as the summary for the location parameter $\mu$. If $\hat{q}_p(x)$ denotes the estimate of the $p$-th quantile, the statistics are as follows: \begin{align} \hat{v}_\alpha = \frac{\hat{q}_{0.95}(\cdot)-\hat{q}_{0.05}(\cdot)}{\hat{q}_{0.75}(\cdot)-\hat{q}_{0.25}(\cdot)} \qquad \hat{v}_\beta = \frac{\hat{q}_{0.95}(\cdot)+\hat{q}_{0.05}(\cdot)-2\hat{q}_{0.5}(\cdot)}{\hat{q}_{0.95}(\cdot)-\hat{q}_{0.05}(\cdot)} \qquad \hat{v}_\mu = \frac{1}{N}\sum_{i=1}^N x_i \qquad \hat{v}_\sigma = \frac{\hat{q}_{0.75}(\cdot)-\hat{q}_{0.25}(\cdot)}{\sigma} \end{align} where the samples are denoted by $x_i$. In practice, we observe that the $\frac{1}{\sigma}$ term in $\hat{v}_\sigma$ deteriorates the performance, so we drop it.
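As a quick sanity check of these quantile summaries (a sketch added for illustration, not part of the original pipeline): for a standard Gaussian sample, which corresponds to $\alpha=2$ and $\beta=0$, $\hat{v}_\alpha$ should come out near $3.29/1.35 \approx 2.44$, while $\hat{v}_\beta$ and $\hat{v}_\mu$ should be close to zero.
In [ ]:
rng = np.random.RandomState(2)
x = rng.randn(10000)                           # a standard Gaussian sample (alpha = 2, beta = 0)
q05, q25, q50, q75, q95 = np.percentile(x, [5, 25, 50, 75, 95])
print((q95 - q05) / (q75 - q25))               # v_alpha, about 2.44 for a Gaussian
print((q95 + q05 - 2*q50) / (q95 - q05))       # v_beta, about 0 for a symmetric law
print(x.mean(), q75 - q25)                     # v_mu and the unscaled v_sigma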
Zolotarev gives an alternative parameterization of the stable distribution in terms of its characteristic function [6]. The characteristic function of a probability density function is simply its Fourier transform, and it completely determines the distribution [7]. More formally, the characteristic function of a random variable $X$ is defined as \begin{align} \phi_X(t) = \mathbb{E}[e^{itX}] \end{align} The exact statistics are not reproduced here, as that would be out of the scope of this project; see, for example, [6] or [2] for details.
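To illustrate the definition (an illustrative sketch only, with an arbitrary seed), the empirical characteristic function of a standard Gaussian sample should be close to the known closed form $\phi(t) = e^{-t^2/2}$:
In [ ]:
rng = np.random.RandomState(3)
x = rng.randn(10000)
for t in [0.5, 1.0, 2.0]:
    ecf = np.mean(np.exp(1j*t*x))              # empirical estimate of E[exp(itX)]
    print(t, np.abs(ecf - np.exp(-t**2/2)))    # should be small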
By evaluating the characteristic function at particular time points, it is possible to obtain method-of-moments equations [8]. In turn, these equations can be solved to obtain estimates of the model parameters. We follow the evaluation time points recommended in [2].
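As a brief numerical illustration of this idea (a sketch using the same first two evaluation points, $t_1=0.2$ and $t_2=0.8$, that appear in the S3 implementation below): for a symmetric stable law $|\phi(t)| = e^{-(\sigma|t|)^\alpha}$, so $\alpha$ can be recovered from the modulus of the empirical characteristic function at two time points.
In [ ]:
rng = np.random.RandomState(4)
x = ss.levy_stable.rvs(1.5, 0.0, loc=0, scale=1, size=10000, random_state=rng)
t1, t2 = 0.2, 0.8
p1 = np.abs(np.mean(np.exp(1j*t1*x)))
p2 = np.abs(np.mean(np.exp(1j*t2*x)))
print(np.log(np.log(p1)/np.log(p2)) / np.log(t1/t2))   # should be close to the true alpha = 1.5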
The empirical characteristic function is \begin{align} \hat{\phi}_X(t) = \frac{1}{N} \sum_{i=1}^N e^{itX_i} \end{align} where $X_i$ denotes the samples and $t \in (-\infty,\infty)$. The extracted statistics are $\left(\hat{\phi}_X(t_1),\hat{\phi}_X(t_2),\ldots,\hat{\phi}_X(t_{20})\right)$ with $t \in \left\{ \pm 0.5, \pm 1, \ldots, \pm 5 \right\}$.
The Kolmogorov–Smirnov statistic measures the maximum absolute distance between a cumulative distribution function and the empirical distribution function, which is a step function that jumps by $1/n$ at each of the $n$ data points. By computing the statistic on two empirical distribution functions (rather than on a cdf), one can test whether two underlying one-dimensional probability distributions differ. In the ABC setting, we compare the empirical distribution function of the observed data with that of the data generated from a set of candidate parameters. In addition to the Kolmogorov–Smirnov statistic, we include the mean and a set of quantiles $\hat{q}_p(x)$ with $p \in \{0.01, 0.05, 0.1, 0.15, \ldots, 0.9, 0.95, 0.99\}$.
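As a small illustration of why this is a useful summary (a sketch with arbitrarily chosen parameters), the two-sample KS distance between two samples drawn with identical parameters is typically much smaller than between samples drawn with clearly different stability parameters:
In [ ]:
rng = np.random.RandomState(5)
a = ss.levy_stable.rvs(1.7, 0.9, loc=10, scale=10, size=500, random_state=rng)
b = ss.levy_stable.rvs(1.7, 0.9, loc=10, scale=10, size=500, random_state=rng)
c = ss.levy_stable.rvs(1.2, 0.9, loc=10, scale=10, size=500, random_state=rng)
print(ss.ks_2samp(a, b).statistic)   # same parameters: small distance (typically)
print(ss.ks_2samp(a, c).statistic)   # different alpha: larger distance (typically)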
In [18]:
def S1(X):
    # McCulloch's quantile-based summaries (see the formulas above);
    # the 1/sigma factor of v_sigma is intentionally dropped.
    q95 = np.percentile(X,95,axis=1)
    q75 = np.percentile(X,75,axis=1)
    q50 = np.percentile(X,50,axis=1)
    q25 = np.percentile(X,25,axis=1)
    q05 = np.percentile(X,5,axis=1)
    Xalpha = (q95-q05) / (q75-q25)
    Xbeta = (q95+q05-2*q50) / (q95-q05)
    Xmu = np.mean(X,axis=1)
    Xsig = (q75-q25)
    return np.column_stack((Xalpha,Xbeta,Xmu,Xsig))
def S1_alpha(X):
X = S1(X)
return X[:,0]
def S1_beta(X):
X = S1(X)
return X[:,1]
def S1_mu(X):
X = S1(X)
return X[:,2]
def S1_sigma(X):
X = S1(X)
return X[:,3]
def S2(X):
    # Summaries based on Zolotarev's parameterization (see Section 3.1.1 of [2]),
    # built from non-overlapping triples of observations.
    ksi = 0.25
    N = int(np.floor((X.shape[1]-1)/3))
    R = X.shape[0]
    Z = np.zeros((R,N))
    for i in range(N):
        Z[:,i] = X[:,3*i] - ksi*X[:,3*i+1] - (1-ksi)*X[:,3*i+2]
    V = np.log(np.abs(Z))
    U = np.sign(Z)
    sighat = np.mean(V,1)
    betahat = np.mean(U,1)
    t1 = 6/np.pi/np.pi*np.var(V,1) - 3/2*np.var(U,1)+1
    t2 = np.power(1+np.abs(betahat),2)/4
    alphahat = np.max(np.vstack((t1,t2)),0)
    muhat = np.mean(X,axis=1)
    return np.column_stack((alphahat,betahat,muhat,sighat))
def S2_alpha(X):
X = S2(X)
return X[:,0]
def S2_beta(X):
X = S2(X)
return X[:,1]
def S2_mu(X):
X = S2(X)
return X[:,2]
def S2_sigma(X):
X = S2(X)
return X[:,3]
def S3(X):
    # Method-of-moments summaries from the characteristic function, evaluated at the
    # time points recommended in [2]. The CF values are stored in a complex array so
    # that np.abs returns the modulus (a float array would silently discard the
    # imaginary part).
    t = [0.2, 0.8, 1, 0.4]
    pht = np.zeros((X.shape[0],len(t)),dtype=complex)
    uhat = np.zeros((X.shape[0],len(t)))
    for i in range(len(t)):
        pht[:,i] = np.mean(np.exp(1j*t[i]*X),1)
        uhat[:,i] = np.arctan( np.sum(np.cos(t[i]*X),1) / np.sum(np.sin(t[i]*X),1) )
    alphahat = np.log(np.log(np.abs(pht[:,0]))/np.log(np.abs(pht[:,1]))) / np.log(t[0]/t[1])
    sighat = np.exp( (np.log(np.abs(t[0]))*np.log(-np.log(np.abs(pht[:,1]))) - \
                      np.log(np.abs(t[1]))*np.log(-np.log(np.abs(pht[:,0]))))/ \
                     np.log(t[0]/t[1]) )
    betahat = (uhat[:,3]/t[3]-uhat[:,2]/t[2]) / np.power(sighat,alphahat) / np.tan(alphahat*np.pi/2) \
              / (np.power(np.abs(t[3]),alphahat-1) - np.power(np.abs(t[2]),alphahat-1) )
    muhat = (np.power(np.abs(t[3]),alphahat-1)*uhat[:,3]/t[3] - np.power(np.abs(t[2]),alphahat-1)*uhat[:,2]/t[2]) \
            / (np.power(np.abs(t[3]),alphahat-1) - np.power(np.abs(t[2]),alphahat-1) )
    return np.column_stack((alphahat,betahat,muhat,sighat))
def S3_alpha(X):
X = S3(X)
return X[:,0]
def S3_beta(X):
X = S3(X)
return X[:,1]
def S3_mu(X):
X = S3(X)
return X[:,2]
def S3_sigma(X):
X = S3(X)
return X[:,3]
def S4(X):
    # Empirical characteristic function on a fixed grid of time points (the grid
    # below also contains t = 0, where the CF is identically 1). The data are first
    # centred by the median and scaled by twice the IQR; only the real part of the
    # empirical CF is kept.
    Xc = X.copy()
    Xc = Xc - np.median(Xc,1).reshape((-1,1))
    Xc = Xc / 2 / ss.iqr(Xc,1).reshape((-1,1))
    ts = np.linspace(-5,5,21)
    phts = np.zeros((Xc.shape[0],len(ts)))
    for i in range(len(ts)):
        phts[:,i] = np.real(np.mean(np.exp(1j*ts[i]*Xc),1))
    return phts
def S5(X):
    # Kolmogorov-Smirnov statistic computed against the module-level observed data z
    # (defined further below), plus the mean and a set of quantiles.
    N = X.shape[0]
    R = np.zeros((N,23))
    for i in range(N):
        R[i,0] = ss.ks_2samp(X[i,:],z.flatten()).statistic
    R[:,1] = np.mean(X,axis=1)
    qs = np.linspace(0,100,21)
    qs[0] = 1       # use the 1st and 99th percentiles instead of the min and max
    qs[-1] = 99
    for i in range(21):
        R[:,i+2] = np.percentile(X,qs[i],axis=1)
    return R
Below, I visualize the ELFI graph. The sole purpose of the cell below is to draw the graph; therefore, in the cell that follows it, I re-create the initial model (and the priors).
In [26]:
sim = elfi.Simulator(stable_dist_rvs,alpha,beta,mu,sigma,observed=None)
SS1 = elfi.Summary(S1, sim)
SS2 = elfi.Summary(S2, sim)
SS3 = elfi.Summary(S3, sim)
SS4 = elfi.Summary(S4, sim)
SS5 = elfi.Summary(S5, sim)
d = elfi.Distance('euclidean',SS1,SS2,SS3,SS4,SS5)
elfi.draw(d)
Out[26]:
In [27]:
elfi.new_model()
alpha = elfi.Prior(ss.uniform, 1.1, 0.9)
beta = elfi.Prior(ss.uniform, -1, 2)
mu = elfi.Prior(ss.uniform, -300, 600)
sigma = elfi.Prior(ss.uniform, 0, 300)
In [9]:
Ns = 200
alpha0 = np.array([1.7,1.2,0.7])
beta0 = np.array([0.9,0.3,0.3])
mu0 = np.array([10,10,10])
sig0 = np.array([10,10,10])
y0 = stable_dist_rvs(alpha0,beta0,mu0,sig0,Ns,batch_size=3)
plt.figure(figsize=(18,6))
plt.subplot(1,3,1)
plt.hist(y0[0,:])
plt.title('Data Set 1 (Easy)', fontsize=14)
plt.subplot(1,3,2)
plt.hist(y0[1,:])
plt.title('Data Set 2 (Medium)', fontsize=14)
plt.subplot(1,3,3)
plt.hist(y0[2,:])
plt.title('Data Set 3 (Hard)', fontsize=14);
z = y0[0,:].reshape((1,-1))
In [20]:
class Experiment():
def __init__(self, inf_method, dataset, sumstats, quantile=None, schedule=None, Nsamp=100):
'''
inf_method - 'SMC' or 'Rejection'
dataset - 'Easy', 'Medium' or 'Hard'
sumstats - 'S1', 'S2', 'S3', 'S4' or 'S5'
'''
self.inf_method = inf_method
self.dataset = dataset
self.sumstats = sumstats
self.quantile = quantile
self.schedule = schedule
self.Nsamp = Nsamp
self.res = None
def infer(self):
# read the dataset
if self.dataset == 'Easy':
z = y0[0,:].reshape((1,-1))
elif self.dataset == 'Medium':
z = y0[1,:].reshape((1,-1))
elif self.dataset == 'Hard':
z = y0[2,:].reshape((1,-1))
sim = elfi.Simulator(stable_dist_rvs,alpha,beta,mu,sigma,observed=z)
ss_func = globals().get(self.sumstats)
SS = elfi.Summary(ss_func, sim)
d = elfi.Distance('euclidean',SS)
# run the inference and save the results
if self.inf_method == 'SMC':
algo = elfi.SMC(d, batch_size=Ns)
self.res = algo.sample(self.Nsamp, self.schedule)
elif self.inf_method == 'Rejection':
algo = elfi.Rejection(d, batch_size=Ns)
self.res = algo.sample(self.Nsamp, quantile=self.quantile)
In [ ]:
datasets = ['Easy','Medium','Hard']
sumstats = ['S1','S2','S3','S4','S5']
inferences = ['Rejection','SMC']
exps_alt = []
for dataset in datasets:
for sumstat in sumstats:
rej = Experiment('Rejection',dataset,sumstat,quantile=0.01,Nsamp=1000)
rej.infer()
exps_alt.append(rej)
schedule = [rej.res.threshold*4, rej.res.threshold*2, rej.res.threshold]
smc = Experiment('SMC',dataset,sumstat,schedule=schedule,Nsamp=1000)
smc.infer()
exps_alt.append(smc)
Now we try to infer each model parameter separately. This approach applies to the first three sets of summary statistics. More concretely, the summary statistics are no longer vectors but scalars, each informative about only a single parameter. Note that we repeat the procedure for all four parameters.
In [ ]:
params = ['alpha','beta','mu','sigma']
exps_sep = []
for dataset in datasets:
for sumstat in sumstats[0:3]:
for param in params:
sumstat_ = sumstat + '_' + param
rej = Experiment('Rejection',dataset,sumstat_,quantile=0.01,Nsamp=250)
rej.infer()
exps_sep.append(rej)
schedule = [rej.res.threshold*4, rej.res.threshold*2, rej.res.threshold]
smc = Experiment('SMC',dataset,sumstat_,schedule=schedule,Nsamp=250)
smc.infer()
exps_sep.append(smc)
with open('exps_sep.obj', 'wb') as file_pi:
    pickle.dump(exps_sep, file_pi)
In [27]:
from IPython.display import display, Markdown, Latex
def print_results_alt(dataset,inference,exps_):
nums = []
for sumstat in sumstats:
exp_ = [e for e in exps_ if e.sumstats==sumstat][0]
for param in params:
nums.append(np.mean(exp_.res.outputs[param]))
nums.append(np.std(exp_.res.outputs[param]))
if dataset == 'Easy':
ds_id = 0
elif dataset == 'Medium':
ds_id = 1
elif dataset == 'Hard':
ds_id = 2
tmp = '### {0:s} Data Set - {1:s}'.format(dataset,inference)
display(Markdown(tmp))
tmp = "| Variable | True Value | S1 | S2 | S3 | S4 | S5 | \n \
|:-----: | :----------: |:-------------:|:-----:|:-----:|:-----:|:-----:| \n \
| {44:s} | {40:4.2f} | {0:4.2f} $\pm$ {1:4.2f} | {8:4.2f} $\pm$ {9:4.2f} | {16:4.2f} $\pm$ {17:4.2f} | {24:4.2f} $\pm$ {25:4.2f} | {32:4.2f} $\pm$ {33:4.2f} | \n \
| {45:s} | {41:4.2f} | {2:4.2f} $\pm$ {3:4.2f} | {10:4.2f} $\pm$ {11:4.2f} | {18:4.2f} $\pm$ {19:4.2f} | {26:4.2f} $\pm$ {27:4.2f} | {34:4.2f} $\pm$ {35:4.2f} | \n \
| $\mu$ | {42:4.2f} | {4:4.2f} $\pm$ {5:4.2f} | {12:4.2f} $\pm$ {13:4.2f} | {20:4.2f} $\pm$ {21:4.2f} | {28:4.2f} $\pm$ {29:4.2f} | {36:4.2f} $\pm$ {37:4.2f} | \n \
| $\gamma$ | {43:4.2f} | {6:4.2f} $\pm$ {7:4.2f} | {14:4.2f} $\pm$ {15:4.2f} | {22:4.2f} $\pm$ {23:4.2f} | {30:4.2f} $\pm$ {31:4.2f} | {38:4.2f} $\pm$ {39:4.2f} |".format(\
nums[0],nums[1],nums[2],nums[3],nums[4],nums[5],nums[6],nums[7],nums[8],nums[9], \
nums[10],nums[11],nums[12],nums[13],nums[14],nums[15],nums[16],nums[17],nums[18],nums[19], \
nums[20],nums[21],nums[22],nums[23],nums[24],nums[25],nums[26],nums[27],nums[28],nums[29], \
nums[30],nums[31],nums[32],nums[33],nums[34],nums[35],nums[36],nums[37],nums[38],nums[39], \
alpha0[ds_id],beta0[ds_id],mu0[ds_id],sig0[ds_id],r'$\alpha$',r'$\beta$')
display(Markdown(tmp))
def print_results_sep(dataset,inference,exps_):
nums = []
for sumstat in sumstats[0:3]:
for param in params:
exp_ = [e for e in exps_ if str(e.sumstats)==str(sumstat)+'_'+param][0]
nums.append(np.mean(exp_.res.outputs[param]))
nums.append(np.std(exp_.res.outputs[param]))
if dataset == 'Easy':
ds_id = 0
elif dataset == 'Medium':
ds_id = 1
elif dataset == 'Hard':
ds_id = 2
tmp = '### {0:s} Data Set - {1:s}'.format(dataset,inference)
display(Markdown(tmp))
tmp = "| Variable | True Value | S1 | S2 | S3 | \n \
|:-----: | :----------: |:-------------:|:-----:|:-----:|:-----:|:-----:| \n \
| {28:s} | {24:4.2f} | {0:4.2f} $\pm$ {1:4.2f} | {8:4.2f} $\pm$ {9:4.2f} | {16:4.2f} $\pm$ {17:4.2f} | \n \
| {29:s} | {25:4.2f} | {2:4.2f} $\pm$ {3:4.2f} | {10:4.2f} $\pm$ {11:4.2f} | {18:4.2f} $\pm$ {19:4.2f} | \n \
| $\mu$ | {26:4.2f} | {4:4.2f} $\pm$ {5:4.2f} | {12:4.2f} $\pm$ {13:4.2f} | {20:4.2f} $\pm$ {21:4.2f} | \n \
| $\gamma$ | {27:4.2f} | {6:4.2f} $\pm$ {7:4.2f} | {14:4.2f} $\pm$ {15:4.2f} | {22:4.2f} $\pm$ {23:4.2f} |".format(\
nums[0],nums[1],nums[2],nums[3],nums[4],nums[5],nums[6],nums[7],nums[8],nums[9], \
nums[10],nums[11],nums[12],nums[13],nums[14],nums[15],nums[16],nums[17],nums[18],nums[19], \
nums[20],nums[21],nums[22],nums[23], \
alpha0[ds_id],beta0[ds_id],mu0[ds_id],sig0[ds_id],r'$\alpha$',r'$\beta$')
display(Markdown(tmp))
In [28]:
for inference in inferences:
for dataset in datasets:
results = [e for e in exps_alt if e.inf_method==inference and e.dataset==dataset]
print_results_alt(dataset,inference,results)
The tables may not show the performance clearly, so we plot the marginals obtained with the 5 summary vectors when rejection sampling is run on the easy data set. As can be seen, only $\mu$ is identified well. S1 recovers a reasonably good interval for $\sigma$, while all the other marginals are essentially draws from the prior.
In [29]:
# rejection-sampling results on the easy data set for S1-S5
exps_alt[0].res.plot_marginals();
exps_alt[2].res.plot_marginals();
exps_alt[4].res.plot_marginals();
exps_alt[6].res.plot_marginals();
exps_alt[8].res.plot_marginals();
In [35]:
for inference in inferences[0:1]:
for dataset in datasets:
results = [e for e in exps_sep if e.inf_method==inference and e.dataset==dataset]
print_results_sep(dataset,inference,results)
Now we take a look at the marginals for S1 only (note that, due to the design of the inference, each experiment outputs the marginal of a single variable). Below are the marginals from the first four rejection-sampling experiments, which correspond to the $\alpha$, $\beta$, $\mu$ and $\sigma$ variables. The true values of the parameters are given in parentheses.
In [36]:
alphas = exps_sep[0].res.samples['alpha']
betas = exps_sep[2].res.samples['beta']
mus = exps_sep[4].res.samples['mu']
sigmas = exps_sep[6].res.samples['sigma']
plt.figure(figsize=(20,5))
plt.subplot(141)
plt.hist(alphas)
plt.title('alpha (1.7)')
plt.subplot(142)
plt.hist(betas)
plt.title('beta (0.9)')
plt.subplot(143)
plt.hist(mus)
plt.title('mu (10)')
plt.subplot(144)
plt.hist(sigmas)
plt.title('sigma (10)');
Unsurprisingly, the easier the data set, the better the results. The difficulty of a data set is essentially controlled by the spread of the data, so we can conclude that when the samples are concentrated in a region, which happens when $\alpha$ is large, the estimation task becomes easier.
Overall, we see that the estimates are much better when we infer the parameters separately. A possible explanation is the curse of dimensionality: it may be easier to perform four searches in 1D than a single search in 4D.
The best summary statistics appear to be S1 (quantiles), which have long been used for this problem. When the parameters are inferred separately with S1, the estimates are arguably good enough (and can always be improved by lowering the threshold). The only peculiarity is with $\alpha$, for which I have no good explanation. Nonetheless, I re-ran the same experiments (with different data sets) and found that the $\alpha$ estimates are usually much better than this. Again, I am not fully able to explain why, but it could simply be the nature of the problem or the amount of data.
[1] https://en.wikipedia.org/wiki/Stable_distribution
[2] Peters, Gareth W., Scott A. Sisson, and Yanan Fan. "Likelihood-free Bayesian inference for α-stable models." Computational Statistics & Data Analysis 56.11 (2012): 3743-3756.
[3] Yang, Yiyang. "Simulate α-Stable Random Variable and Estimate Stable Parameters Based on Market Data." (2012).
[4] Buckle, D. J. "Bayesian inference for stable distributions." Journal of the American Statistical Association 90.430 (1995): 605-613.
[5] McCulloch, J. Huston. "Simple consistent estimators of stable distribution parameters." Communications in Statistics-Simulation and Computation 15.4 (1986): 1109-1136.
[6] Zolotarev, Vladimir M. One-dimensional stable distributions. Vol. 65. American Mathematical Soc., 1986.
[7] https://en.wikipedia.org/wiki/Characteristic_function_(probability_theory)
[8] Press, S. James. "Estimation in univariate and multivariate stable distributions." Journal of the American Statistical Association 67.340 (1972): 842-846.