SciPy provides many user-friendly and efficient numerical routines, e.g. for numerical integration and optimization. The full documentation is available at http://docs.scipy.org/doc/scipy/reference/.
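As a first taste of the interface, the following minimal sketch integrates $\sin(x)$ over $[0, \pi]$ with scipy.integrate.quad; the exact answer is 2.
In [ ]:
# Minimal example of SciPy's numerical integration routines.
import numpy as np
from scipy.integrate import quad
value, abserr = quad(np.sin, 0, np.pi)
print(value)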
We will use the provided tools to simulate and estimate an Ordinary Least Squares (OLS) regression: $Y = X\beta + \epsilon$
We will proceed in three steps: (1) simulate a sample of agents, (2) compute the least squares estimates in closed form, and (3) recover the parameters by Maximum Likelihood estimation.
Of course, OLS and other statistical models are readily available in the StatsModels library (http://statsmodels.sourceforge.net/); we cross-check our results against it below.
In [7]:
# third-party library
import numpy as np
# Parametrization
num_agents = 1000
num_covars = 3
betas_true = np.array([0.22, 0.30, -0.1])
sd_true = 0.01
# Sampling of observables
np.random.seed(123)
X = np.random.rand(num_agents, num_covars)
X[:,0] = 1
# Sampling disturbances
eps = np.random.normal(loc=0.0, scale=sd_true, size=num_agents)
# Create endogenous outcome
idx_true = np.dot(X, betas_true)
Y = idx_true + eps
# Checks
assert (X.dtype == 'float')
assert (Y.dtype == 'float')
assert (np.all(np.isfinite(X)))
assert (np.all(np.isfinite(Y)))
assert (X.shape == (num_agents, num_covars))
assert (Y.shape == (num_agents, ))
assert (np.all(X[:,0] == 1.0))
Let us have a look at the relationship.
In [3]:
%matplotlib inline
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
ax.set_xlabel(r'$X\beta$')
ax.set_ylabel(r'$Y$')
ax.plot(idx_true, Y, 'o')
Out[3]: [scatter plot of $Y$ against the true index $X\beta$]
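The next cell implements the closed-form least squares estimator $$\hat{\beta} = (X^\top X)^{-1} X^\top Y$$ together with the maximum likelihood estimate of the standard deviation of the disturbances, $\hat{\sigma} = \sqrt{\frac{1}{N}\sum_{i} (Y_i - X_i\hat{\beta})^2}$.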
In [4]:
# Compute the least squares estimates from the normal equations.
beta_hat = np.dot(np.dot(np.linalg.inv(np.dot(X.T, X)), X.T), Y)
# Standard deviation of the disturbances (ML variant, dividing by N).
sd_hat = np.sqrt(np.var(Y - np.dot(X, beta_hat)))
# Let us have a look now.
print('Results for beta', beta_hat, 'Results for sd', sd_hat)
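As noted at the outset, these estimates are also available off the shelf in StatsModels. A minimal cross-check, assuming the statsmodels package is installed:
In [ ]:
# Cross-check of the closed-form estimates against StatsModels.
import statsmodels.api as sm
rslt_sm = sm.OLS(Y, X).fit()
print(rslt_sm.params)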
Let us now determine $\hat{\beta}$ using Maximum Likelihood (ML) estimation. So, we need to maximize the following log-likelihood function:
$$\mathcal{L}(\beta, \sigma) = \sum_{i = 1}^{N} \log\left(\frac{1}{\sigma}\phi\left(\frac{Y_i - X_i\beta}{\sigma}\right)\right)$$
In the code below we equivalently minimize the negative mean log-likelihood. SciPy offers a convenient interface to alternative optimization algorithms. Let us check it out online.
In [ ]:
# third-party libraries
from scipy.optimize import minimize
from scipy.stats import norm
# Auxiliary functions.
def sample_likelihood(paras, X, Y):
    ''' Construct the negative mean log-likelihood of the sample.
    '''
    # Antibugging.
    assert (isinstance(paras, np.ndarray))
    assert (paras.dtype == 'float')
    assert (X.ndim == 2) and (Y.ndim == 1)
    # Auxiliary objects.
    num_agents = Y.shape[0]
    # Summing over the sample.
    contribs = 0.0
    for i in range(num_agents):
        contrib = individual_likelihood(paras, X[i, :], Y[i])
        contribs += contrib
    # Rescale to the mean contribution.
    contribs = contribs / num_agents
    # Finishing.
    return contribs
def individual_likelihood(paras, x, y):
    ''' This function determines an individual's contribution to the sample
        likelihood, returned as a negative log-likelihood.
    '''
    # Antibugging.
    assert (isinstance(paras, np.ndarray))
    assert (paras.dtype == 'float')
    assert (x.ndim == 1)
    # Distribute parameters.
    betas, sd = paras[:-1], paras[-1]
    # Calculate the likelihood contribution.
    resid = (y - np.dot(x, betas)) / sd
    contrib = (1.0 / sd) * norm.pdf(resid)
    # Guard against zero likelihoods before taking logs.
    contrib = np.clip(contrib, 1e-20, np.inf)
    contrib = -np.log(contrib)
    # Finishing.
    return contrib
''' Main calculations.
'''
# Construct the parameter vector at its true values.
paras = np.concatenate((betas_true, [sd_true]))
# Single individual.
print(individual_likelihood(paras, X[1, :], Y[1]))
# Full sample.
print(sample_likelihood(paras, X, Y))
# Optimization.
x0 = paras
#x0 = np.array([0.0, 0.0, 0.0, 1.0])
for optimizer in ['BFGS', 'Nelder-Mead']:
    rslt = minimize(sample_likelihood, x0, args=(X, Y), method=optimizer)
    print(optimizer, rslt.x, rslt.success)
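Since the disturbances are Gaussian, the ML point estimates should line up closely with the closed-form least squares results from above. A quick comparison, using the rslt from the last optimizer in the loop:
In [ ]:
# Compare the ML estimates to the closed-form results from above.
betas_ml, sd_ml = rslt.x[:-1], rslt.x[-1]
print('OLS:', beta_hat, sd_hat)
print('ML: ', betas_ml, sd_ml)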