In [ ]:
import numpy as np
import numpy.random as rnd
import numpy.linalg as la
import matplotlib.pyplot as plt
%matplotlib inline
## Stop with this nonsense already!
# plt.xkcd() <- this must not be overused!
plt.style.use( "ggplot" )
So, the pdf of the power law distribution is $$ p(x) = Cx^{-\alpha}, \quad x \geq x_{\text{min}}, $$ where $C$ is the normalization constant and $\alpha>0$ is called the exponent of the distribution.
From the lecture we know that (unless the seminar starts first; if so, let's derive it...)$$C = \frac{\alpha - 1}{x_{\text{min}}^{- \alpha + 1}}.$$ Let's try to generate a power-law random variable (and pretend we don't know about np.random.pareto() ).
EXERCISE 1
The first step is to derive the cdf of the power law: $F(x) = P(X \leq x)$
$$ F(x) = 1 - \int_{x}^{\infty} p(t)\, dt $$ (here some random student goes to the whiteboard...)
In [ ]:
numStud = 10
## Pick a student uniformly at random
print('Student {0:2d} is going to the whiteboard'.format(rnd.randint(1, numStud + 1)))
SOLUTION
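For reference, a sketch of the derivation, substituting $p(t) = C t^{-\alpha}$ and the $C$ from above: $$ F(x) = 1 - \int_{x}^{\infty} C t^{-\alpha}\, dt = 1 - \frac{C}{\alpha - 1} x^{-\alpha + 1} = 1 - \left( \frac{x}{x_{\text{min}}} \right)^{-\alpha + 1}, \quad x \geq x_{\text{min}}. $$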
EXERCISE 2
If we define a random variable $R$, s.t. $R = F(X)$, then $R$ will be uniformly distributed on the interval $[0, 1]$.
The good thing here is that we can easily generate uniformly distributed pseudorandom numbers. Let's find an expression for $x = F^{-1}(r)$.
SOLUTION
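For reference, setting $r = F(x)$ and solving for $x$: $$ r = 1 - \left( \frac{x}{x_{\text{min}}} \right)^{-\alpha + 1} \quad \Longrightarrow \quad x = F^{-1}(r) = x_{\text{min}} \left( 1 - r \right)^{-\frac{1}{\alpha - 1}}. $$ Since $1 - R$ is uniform on $[0, 1]$ whenever $R$ is, $x = x_{\text{min}}\, r^{-1/(\alpha - 1)}$ works just as well.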
EXERCISE 3
1. Generate 10000 uniformly distributed pseudorandom numbers. Set $\alpha = 2.5$ and $x_{\text{min}} = 1$
2. Produce power law!
3. Plot a histogram of the results (instead of using plt.hist(), compute the counts with np.histogram( ..., bins = 5000 ) and plot them yourself)
4. See something unpleasant?
In [ ]:
## Define the parameters of a particular power-law random variable
x_min = 10 ; beta = 2.3
## Generate some variates
size = 10000
## First sample some Exp(beta - 1) random variates (exponential with rate beta - 1)
exp = - np.log( rnd.uniform( size = size ) ) / ( beta - 1 )
## Then using the inverse sampling technique generate power law-distributed variates.
pwr = x_min * np.exp( exp )
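## Note: this two-step route is equivalent to direct inverse-transform sampling,
##   pwr = x_min * rnd.uniform( size = size ) ** ( -1.0 / ( beta - 1 ) ),
## since if E ~ Exp(beta - 1), then P( x_min * e^E > x ) = ( x / x_min )^{-(beta - 1)},
## i.e. a power law with exponent beta.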
## Build a histogram with equal-width bins
freq, bin_edges = np.histogram( pwr, bins = 5000 )
bincenters = .5 * (bin_edges[1:] + bin_edges[:-1])
binwidths = (bin_edges[1:] - bin_edges[:-1])
## Plot
plt.figure(1, figsize=(10, 10))
plt.loglog( bincenters, freq, "xb" )
EXERCISE 4
Given the results you've obtained in the previous section, try to estimate $\alpha$ via linear regression (don't forget to take the $\log$ of both the bin centers and the counts)
In [ ]:
def lm( X, Y, intercept = True ):
    T = np.asarray( Y )
    M = np.asarray( X )
    if intercept :
        M = np.vstack( [ np.ones( len( Y ) ), M ] ).T
    else :
        M = M.reshape( -1, 1 )
    ## OLS estimate: coef = (X'X)^{-1} (X'Y)
    coef = np.dot(
        la.inv( np.dot( M.T, M ) ),
        np.dot( M.T, T ) )
    return ( M, coef )
## Estimate the model, but first drop the empty (undersampled) bins
idx = freq > 0
bincenters, binwidths, freq = bincenters[ idx ], binwidths[ idx ], freq[ idx ]
M, coef = lm( np.log( bincenters ), np.log( freq ), True )
freq_fit = np.dot( M, coef )
## Plot
plt.figure(1, figsize = (10,10) )
plt.loglog( bincenters, freq, "xr")
plt.loglog( bincenters, np.exp( freq_fit ), "b-")
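As a quick sanity check (an addition, not part of the original exercise), np.polyfit fits the same degree-1 model in log-log space; the negated slope is the estimate of $\alpha$:
In [ ]:
## Cross-check the hand-rolled OLS against np.polyfit
slope, icept = np.polyfit( np.log( bincenters ), np.log( freq ), 1 )
print( 'lm slope: {0:.3f}, polyfit slope: {1:.3f}'.format( coef[ 1 ], slope ) )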
You probably won't be satisfied with the results... By the way, why? (Hint: the many noisy, undersampled bins in the tail dominate the regression and bias the slope.)
Thankfully, there are some options to solve the problem:
1. Exponential binning
2. CDF estimation
3. Likelihood estimation
During the seminar we will work with option 1 (and maybe option 2); a sketch of option 3 is given below for reference.
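For a continuous power law, option 3 has a well-known closed form (see e.g. Clauset, Shalizi and Newman, 2009): $$ \hat{\alpha} = 1 + n \left( \sum_{i=1}^{n} \ln \frac{x_i}{x_{\text{min}}} \right)^{-1}. $$ A minimal sketch on the sample pwr generated above:
In [ ]:
## Closed-form maximum likelihood estimate of the exponent
alpha_mle = 1.0 + len( pwr ) / np.sum( np.log( pwr / x_min ) )
print( 'MLE estimate of alpha: {0:.3f}'.format( alpha_mle ) )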
EXERCISE 5
Perform exponential binning, that is, instead of using uniform bins, spread them on a log scale
Hint: use np.logspace()
In [ ]:
# Write your code here..
#
#
## Log-spaced bin edges covering the full range of the sample
bins = np.logspace( np.log2( pwr.min() ), np.log2( pwr.max() ), base = 2, num = 10 )
hist, bin_edges = np.histogram( pwr, bins )
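One thing to keep in mind (an addition, not part of the original exercise): with exponential bins the raw counts must be divided by the bin widths to obtain a density, otherwise the growing widths bias the slope. A minimal sketch reusing lm() from above; the names centers, widths and density are ours:
In [ ]:
## Normalize the counts by bin width to get a density estimate
centers = .5 * ( bin_edges[ 1: ] + bin_edges[ :-1 ] )
widths = bin_edges[ 1: ] - bin_edges[ :-1 ]
keep = hist > 0
density = hist[ keep ] / widths[ keep ]
## Fit the log-log density with the same OLS helper
M2, coef2 = lm( np.log( centers[ keep ] ), np.log( density ), True )
print( 'Estimated exponent: {0:.3f}'.format( -coef2[ 1 ] ) )
## Plot the density and the fitted line
plt.figure( figsize = ( 10, 10 ) )
plt.loglog( centers[ keep ], density, "xr" )
plt.loglog( centers[ keep ], np.exp( np.dot( M2, coef2 ) ), "b-" )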
In [ ]: