In [ ]:
import numpy as np
import numpy.random as rnd
import numpy.linalg as la
import matplotlib.pyplot as plt
%matplotlib inline
## Stop with this nonsense already!
# plt.xkcd() <- this must not be overused!
plt.style.use( "ggplot" )
So, the pdf of the power law distribution is $$ p(x) = Cx^{-\alpha}, \quad x \geq x_{\text{min}}, $$ where $C$ is the normalization constant and $\alpha>0$ is called the exponent of the distribution.
From the lecture we know that (unless the seminar starts first; if so, let's derive it...)$$C = \frac{\alpha - 1}{x_{\text{min}}^{- \alpha + 1}}.$$ Let's try to generate a power-law random variable (and pretend we don't know about np.random.pareto() ).
EXERCISE 1
The first step is to derive the cdf of the power law: $F(x) = P(X \leq x)$
$$ F(x) = 1 - \int_{x}^{\infty} p(t)\, dt $$ (here some random student goes to the whiteboard...)
In [ ]:
numStud = 10
## Pick a student uniformly at random
print('Student {0:2d} is going to the whiteboard'.format(rnd.randint(1, numStud + 1)))
SOLUTION
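For reference, a sketch of the derivation, substituting $p(t) = C t^{-\alpha}$ and the $C$ from above: $$ F(x) = 1 - \int_{x}^{\infty} C t^{-\alpha}\, dt = 1 - \frac{C}{\alpha - 1} x^{-\alpha + 1} = 1 - \left( \frac{x}{x_{\text{min}}} \right)^{-\alpha + 1}, \quad x \geq x_{\text{min}}. $$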
EXERCISE 2
If we define a random variable $R$, s.t. $R = F(X)$, then $R$ will be uniformly distributed on the interval $[0, 1]$.
The good thing here is that we can easily generate uniformly distributed pseudorandom numbers. Let's find an expression for $x = F^{-1}(r)$.
SOLUTION
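For reference, setting $r = F(x)$ and solving for $x$: $$ r = 1 - \left( \frac{x}{x_{\text{min}}} \right)^{-\alpha + 1} \quad \Longrightarrow \quad x = F^{-1}(r) = x_{\text{min}} \left( 1 - r \right)^{-\frac{1}{\alpha - 1}}. $$ Since $1 - R$ is uniform on $[0, 1]$ whenever $R$ is, $x = x_{\text{min}}\, r^{-1/(\alpha - 1)}$ works just as well.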
EXERCISE 3
1. Generate 10000 uniformly distributed pseudorandom numbers. Set $\alpha = 2.5$ and $x_{\text{min}} = 1$
2. Produce power law!
3. Plot a histogram of the results (instead of using plt.hist(), compute the counts with np.histogram( ..., bins = 5000 ) and plot them yourself)
4. See something unpleasant?
In [ ]:
## Define the parameters of a particular power-law random variable
x_min = 10 ; beta = 2.3
## Generate some variates
size = 10000
## First sample some Exp(beta - 1) random variates (exponential with rate beta - 1)
exp = - np.log( rnd.uniform( size = size ) ) / ( beta - 1 )
## Then using the inverse sampling technique generate power law-distributed variates.
pwr = x_min * np.exp( exp )
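## Note: this two-step route is equivalent to direct inverse-transform sampling,
##   pwr = x_min * rnd.uniform( size = size ) ** ( -1.0 / ( beta - 1 ) ),
## since if E ~ Exp(beta - 1), then P( x_min * e^E > x ) = ( x / x_min )^{-(beta - 1)},
## i.e. a power law with exponent beta.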
## Build a histogram with equal-width bins
freq, bin_edges = np.histogram( pwr, bins = 5000 )
bincenters = .5 * (bin_edges[1:] + bin_edges[:-1])
binwidths = (bin_edges[1:] - bin_edges[:-1])
## Plot
plt.figure(1, figsize=(10, 10))
plt.loglog( bincenters, freq, "xb" )
EXERCISE 4
Given the results you've obtained in the previous section, try to estimate $\alpha$ via linear regression (don't forget to take the $\log$ of both the bin centers and the counts)
In [ ]:
def lm( X, Y, intercept = True ):
    T = np.asarray( Y )
    M = np.asarray( X )
    if intercept :
        M = np.vstack( [ np.ones( len( Y ) ), M ] ).T
    else :
        M = M.reshape( -1, 1 )
    ## OLS estimate: coef = (X'X)^{-1} (X'Y)
    coef = np.dot(
        la.inv( np.dot( M.T, M ) ),
        np.dot( M.T, T ) )
    return ( M, coef )
## Estimate the model, but first drop the empty (undersampled) bins
idx = freq > 0
bincenters, binwidths, freq = bincenters[ idx ], binwidths[ idx ], freq[ idx ]
M, coef = lm( np.log( bincenters ), np.log( freq ), True )
freq_fit = np.dot( M, coef )
## Plot
plt.figure(1, figsize = (10,10) )
plt.loglog( bincenters, freq, "xr")
plt.loglog( bincenters, np.exp( freq_fit ), "b-")
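As a quick sanity check (an addition, not part of the original exercise), np.polyfit fits the same degree-1 model in log-log space; the negated slope is the estimate of $\alpha$:
In [ ]:
## Cross-check the hand-rolled OLS against np.polyfit
slope, icept = np.polyfit( np.log( bincenters ), np.log( freq ), 1 )
print( 'lm slope: {0:.3f}, polyfit slope: {1:.3f}'.format( coef[ 1 ], slope ) )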
You probably won't be satisfied with the results... By the way, why? (Hint: the many noisy, undersampled bins in the tail dominate the regression and bias the slope.)
Thankfully, there are some options to solve the problem:
1. Exponential binning
2. CDF estimation
3. Likelihood estimation
During the seminar we will work with option 1 (and maybe option 2); a sketch of option 3 is given below for reference.
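For a continuous power law, option 3 has a well-known closed form (see e.g. Clauset, Shalizi and Newman, 2009): $$ \hat{\alpha} = 1 + n \left( \sum_{i=1}^{n} \ln \frac{x_i}{x_{\text{min}}} \right)^{-1}. $$ A minimal sketch on the sample pwr generated above:
In [ ]:
## Closed-form maximum likelihood estimate of the exponent
alpha_mle = 1.0 + len( pwr ) / np.sum( np.log( pwr / x_min ) )
print( 'MLE estimate of alpha: {0:.3f}'.format( alpha_mle ) )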
EXERCISE 5
Perform exponential binning, that is, instead of using uniform bins, spread them on a log scale
Hint: use np.logspace()
In [ ]:
# Write your code here..
#
#
## Log-spaced bin edges covering the full range of the sample
bins = np.logspace( np.log2( pwr.min() ), np.log2( pwr.max() ), base = 2, num = 10 )
hist, bin_edges = np.histogram( pwr, bins )
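One thing to keep in mind (an addition, not part of the original exercise): with exponential bins the raw counts must be divided by the bin widths to obtain a density, otherwise the growing widths bias the slope. A minimal sketch reusing lm() from above; the names centers, widths and density are ours:
In [ ]:
## Normalize the counts by bin width to get a density estimate
centers = .5 * ( bin_edges[ 1: ] + bin_edges[ :-1 ] )
widths = bin_edges[ 1: ] - bin_edges[ :-1 ]
keep = hist > 0
density = hist[ keep ] / widths[ keep ]
## Fit the log-log density with the same OLS helper
M2, coef2 = lm( np.log( centers[ keep ] ), np.log( density ), True )
print( 'Estimated exponent: {0:.3f}'.format( -coef2[ 1 ] ) )
## Plot the density and the fitted line
plt.figure( figsize = ( 10, 10 ) )
plt.loglog( centers[ keep ], density, "xr" )
plt.loglog( centers[ keep ], np.exp( np.dot( M2, coef2 ) ), "b-" )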
In [ ]: