In [3]:

import numpy as np
import scipy.stats as ss
from scipy.stats import norm
from scipy.special import erf,erfc
import matplotlib.pyplot as plt
%matplotlib inline



From Reed $$f(x) = \frac{\alpha\beta}{\alpha+\beta}\left[A(\alpha,\nu,\tau^2)x^{-\alpha-1}\Phi\left(\frac{\log x-\nu-\alpha \tau^2}{\tau} \right)+A(-\beta,\nu,\tau^2)x^{\beta-1}\Phi^c\left(\frac{\log x-\nu+\beta \tau^2}{\tau} \right)\right]$$ where $A(\theta,\nu,\tau^2) = \exp\left(\theta \nu + \theta^2\tau^2/2\right)$. Parameter shapes are something like

• $\nu$ is like the mean of the lognormal
• $\alpha,\beta$ are the PL slopes of the tails
• $\tau^2$ is the precision of the normal


In [4]:

def dpln(x,alpha=1,beta=0.5,nu=-1,tau2=1):
A1 = np.exp(alpha*nu+alpha**2*tau2/2)
A2 = np.exp(-beta*nu+beta**2*tau2/2)
term1 = A1*x**(-alpha-1)*norm.cdf((np.log(x)-nu-alpha*tau2)/tau2**0.5)
term2 = A2*x**(beta-1)*norm.sf((np.log(x)-nu+beta*tau2)/tau2**0.5)
fofx = alpha*beta/(alpha+beta)*(term2+term1)
return fofx



This also suggests the Power-log normal, which is just one side of the equation (and also likely needs a proper normalization). $$f(x) = \alpha x^{-\alpha-1}\Phi\left(\frac{\log x-\nu-\alpha \tau^2}{\tau} \right)$$

If we consider the dpln distribution in the limit as $$\beta \rightarrow \infty,$$ we find the above since dpln may be written as mixture of the limiting cases (and fitted as such).



In [5]:

def pln(x,alpha=1,tau2=1,nu=1):
A1 = np.exp(alpha*nu+alpha**2*tau2/2)
fofx = alpha*x**(-alpha-1)*norm.cdf((np.log(x)-nu-alpha*tau2)/tau2**0.5)
return fofx




In [6]:

trialx = np.logspace(-2,3,100)
pdf = dpln(trialx,alpha=1,nu=-1, beta = 3.0, tau2=0.50)
pdf2 = pln(trialx, alpha=1,nu=-1,tau2=0.5)
plt.loglog(trialx,pdf,label='Double power lognorm')
plt.loglog(trialx,pdf2,label='Power lognorm')
plt.legend(loc=4)




Out[6]:

<matplotlib.legend.Legend at 0x10d976d50>



Using the scipy rv_continuous class, we can define dPlN as a subclass.



In [7]:

from dpln_distrib import dpln



Let's load in some data to test it...



In [8]:

from astropy.io.fits import getdata
img = getdata("../testingdata/hd22.13co.intintensity.fits")
data = img[np.isfinite(img)]




WARNING: AstropyDeprecationWarning: Config parameter 'enabled_record_valued_keyword_cards' in section [io.fits] of the file '/Users/ekoch/.astropy/config/astropy.cfg' is deprecated. Use 'enable_record_valued_keyword_cards' in section [fits] instead. [astropy.config.configuration]
WARNING:astropy:AstropyDeprecationWarning: Config parameter 'enabled_record_valued_keyword_cards' in section [io.fits] of the file '/Users/ekoch/.astropy/config/astropy.cfg' is deprecated. Use 'enable_record_valued_keyword_cards' in section [fits] instead.



scipy's general definitions are a little weird, and dpln will likely need to be adjusted to some degree (like doing away with nu and tau2 in favour of the built-ins). For now, I've found it necessary to freeze the loc parameter to zero. Also parameter constraints should assist with the fitting.



In [9]:

fit = dpln.fit(data, floc=0.0, fscale=1.0)
print(fit)




/Users/ekoch/anaconda/lib/python2.7/site-packages/scipy/integrate/quadpack.py:321: IntegrationWarning: Extremely bad integrand behavior occurs at some points of the
integration interval.
warnings.warn(msg, IntegrationWarning)
/Users/ekoch/anaconda/lib/python2.7/site-packages/scipy/integrate/quadpack.py:321: IntegrationWarning: The integral is probably divergent, or slowly convergent.
warnings.warn(msg, IntegrationWarning)

(1.2231409354398668, 1.4283787570663817, 1.3551816091802553, 7.9469599648260317e-10, 0.0, 1.0)

/Users/ekoch/anaconda/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py:2075: RuntimeWarning: invalid value encountered in double_scalars
Lhat = muhat - Shat*mu



How did it do?



In [10]:

plt.hist(data, bins=100, normed=1, alpha=0.1)
print(dpln.ppf(0.05, *fit), dpln.ppf(0.95, *fit))
x = np.linspace(dpln.ppf(0.001, *fit), dpln.ppf(0.95, *fit), 1000)
plt.plot(x, dpln.pdf(x, *fit))




(3.8774799664443576, 27.07614232728028)

Out[10]:

[<matplotlib.lines.Line2D at 0x111896550>]



Clearly, there's still some issues with this fit... Now try the single powerlaw form.



In [11]:

from pln_distrib import pln




In [18]:

data = data[data<15]
fit2 = pln.fit(data, fscale=1.0, floc=0.0)
print(fit2)




(1.9385391274499997, 1.0032408991775963, 0.051442094897371768, 0.0, 1.0)




In [19]:

plt.hist(data, bins=100, normed=1)
print(pln.ppf(0.01, *fit2))
x2 = np.linspace(pln.ppf(0.01, *fit2), pln.ppf(0.99, *fit2), 1000)
plt.plot(x2, pln.pdf(x2, *fit2))
print(pln.stats(*fit2, moments='mvsk'))




1.92250944215
(array(5.779558763617037), array(-293.396422840921), array(nan), array(-3.6325896982437436))



This seems to have worked fairly well. To ensure the code is working properly, let's try out some of the features.



In [20]:

rv = pln(fit2[0], fit2[1], fit2[2], loc=fit2[3], scale=fit2[4])
plt.hist(data, bins=100, color="b", alpha=0.3, normed=1)
plt.hist(rv.rvs(size=5000), bins=200, color="g", alpha=0.4, normed=1)
plt.plot(x2, rv.pdf(x2))
# plt.xlim([0,30])




Out[20]:

[<matplotlib.lines.Line2D at 0x112302990>]




In [24]:

plt.plot(x2, rv.cdf(x2))




Out[24]:

[<matplotlib.lines.Line2D at 0x113a2ab50>]



This looks pretty good. In log-scale...



In [25]:

plt.hist(data, bins=100, log=True, normed=1, alpha=0.3)
plt.hist(rv.rvs(size=5000), bins=200, color="g", alpha=0.2, normed=1, log=True)
plt.plot(x2, rv.pdf(x2))
plt.xlim([0, 40])




Out[25]:

(0, 40)



There's some obvious discrepancies here. The fit hasn't been able to pick up the proper power-law for the tail.



In [23]: