SciPy for Economists

SciPy provides many user-friendly and efficient numerical routines, e.g. for numerical integration and optimization. The full documentation is available at http://docs.scipy.org/doc/scipy/reference/.

We will use the provided tools to simulate and estimate an Ordinary Least Squares (OLS) regression: $Y = X\beta + \epsilon$, where the disturbances $\epsilon$ are drawn from a normal distribution with mean zero and standard deviation $\sigma$.

We will proceed in three steps:

  • Simulate Sample
  • Estimate Model using Linear Algebra Tools (NumPy)
  • Estimate Model using Optimization Tools (SciPy)

Of course, OLS and other statistical models are readily available in the StatsModels Library http://statsmodels.sourceforge.net/.

Simulate Sample


In [7]:
# third-party library
import numpy as np

# Parametrization
num_agents = 1000
num_covars = 3

betas_true = np.array([0.22, 0.30, -0.1])
sd_true    = 0.01

# Sampling of observables
np.random.seed(123)

X = np.random.rand(num_agents, num_covars)
X[:, 0] = 1  # the first covariate is a constant term

# Sampling disturbances
eps = np.random.normal(loc=0.0, scale=sd_true, size=num_agents)

# Create endogenous outcome
idx_true = np.dot(X, betas_true)
Y   = idx_true + eps

# Checks
assert (X.dtype == 'float')
assert (Y.dtype == 'float')
assert (np.all(np.isfinite(X)))
assert (np.all(np.isfinite(Y)))
assert (X.shape == (num_agents, num_covars))
assert (Y.shape == (num_agents, ))
assert (np.all(X[:,0] == 1.0))

Let us have a look at the relationship.


In [3]:
%matplotlib inline
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(111)

ax.set_xlabel(r'$X\beta$')
ax.set_ylabel(r'$Y$')

ax.plot(idx_true, Y, 'o')


[Figure: scatter plot of $Y$ against the true index $X\beta$]

Estimation using Linear Algebra Tools

We can determine $\hat{\beta} = (X^T X)^{-1}X^T Y$ using basic linear algebra tools from NumPy.


In [4]:
# Let us get the estimates.
beta_hat = np.dot(np.dot(np.linalg.inv(np.dot(X.T,X)), X.T), Y)
sd_hat   = np.sqrt(np.var(Y - np.dot(X, beta_hat)))

# Let us have a look now.
print('Results for beta', beta_hat, '  Results for sd', sd_hat)


Results for beta [ 0.2198005   0.29961809 -0.09872063]   Results for sd 0.00972580820726
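Explicitly inverting $X^T X$ works here, but solving the least-squares problem directly is numerically more stable. A minimal sketch using np.linalg.lstsq, which is equivalent for our full-rank design matrix (rcond=None follows current NumPy defaults):


In [ ]:
# Solve min ||Y - X beta|| directly instead of forming the inverse.
beta_lstsq = np.linalg.lstsq(X, Y, rcond=None)[0]
print('Results for beta', beta_lstsq)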

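As noted in the introduction, OLS is also available off the shelf in the StatsModels library. A quick cross-check, as a sketch assuming statsmodels is installed:


In [ ]:
import statsmodels.api as sm

# X already contains the constant column, so no need to add one.
rslt_sm = sm.OLS(Y, X).fit()
print(rslt_sm.params)
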
Estimation using Optimization Tools

Let us now determine $\hat{\beta}$ using Maximum Likelihood (ML) estimation. To do so, we need to maximize the following log-likelihood function:

$$L(\beta, \sigma) = \sum_{i=1}^{N} \log\left(\frac{1}{\sigma}\phi\left(\frac{Y_i - X_i\beta}{\sigma}\right)\right)$$

SciPy offers a convenient interface to alternative optimization algorithms through scipy.optimize.minimize. Since minimize minimizes its objective, the functions below evaluate the negative (average) log-likelihood.
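As a warm-up, here is a minimal sketch of the minimize interface on a simple quadratic (a hypothetical example, just to show the call signature):


In [ ]:
from scipy.optimize import minimize

# Minimize f(z) = (z - 2)^2; the optimum is z = 2.
rslt = minimize(lambda z: (z - 2.0) ** 2, x0=0.0, method='BFGS')
print(rslt.x)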


In [ ]:
# third-party libraries
from scipy.optimize import minimize
from scipy.stats    import norm

# Auxiliary functions.
def sample_likelihood(paras, X, Y):
    ''' Construct the sample's negative average log-likelihood.
    '''
    # Antibugging.
    assert (isinstance(paras, np.ndarray))
    assert (paras.dtype == 'float')
    assert (X.ndim == 2)
    assert (Y.ndim == 1)
    
    # Auxiliary objects.
    num_agents = Y.shape[0]
    
    # Summing over the sample.
    contribs = 0.0
    
    for i in range(num_agents):
        
        contrib = individual_likelihood(paras, X[i,:], Y[i])
        
        contribs += contrib
    
    # Scale by the number of agents to obtain the average contribution.
    contribs = contribs / num_agents
    
    # Finishing.
    return contribs

def individual_likelihood(paras, x, y):
    ''' This function determines an individual's contribution to the sample likelihood.
    '''
    # Antibugging.
    assert (isinstance(paras, np.ndarray))
    assert (paras.dtype == 'float')
    assert (x.ndim == 1)
    
    # Distribute parameters.
    betas, sd = paras[:-1], paras[-1]
    
    # Calculate likelihood contribution.
    resid = (y - np.dot(x, betas))/sd
     
    contrib = (1.0/sd)*norm.pdf(resid)
    
    # Bound the contribution away from zero to avoid taking the log of zero.
    contrib = np.clip(contrib, 1e-20, np.inf)
    
    # Return the negative log-contribution for minimization.
    contrib = -np.log(contrib)
    
    # Finishing.
    return contrib
    
''' Main calculations.
'''

# Construct parameters.
paras = np.concatenate((betas_true, [sd_true]))

# Single individual.
individual_likelihood(paras, X[1,:], Y[1])

# Full sample.
sample_likelihood(paras, X, Y)

# Optimization.
x0 = paras
#x0 = [0.0, 0.0, 0.0, 1.0]

for optimizer in ['BFGS', 'Nelder-Mead']:
    rslt = minimize(sample_likelihood, x0, args=(X, Y), method=optimizer)
    print(optimizer, rslt.x)
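
The loop above makes each individual's contribution explicit. For larger samples, a vectorized objective is considerably faster; a minimal sketch, equivalent under the same assumptions and using scipy.stats.norm.logpdf:


In [ ]:
def sample_likelihood_vectorized(paras, X, Y):
    ''' Negative average log-likelihood, computed without an explicit loop.
    '''
    betas, sd = paras[:-1], paras[-1]

    # Standardized residuals for all agents at once.
    resid = (Y - np.dot(X, betas)) / sd

    # The 1/sigma factor enters the log as -log(sd).
    return -np.mean(norm.logpdf(resid) - np.log(sd))

rslt = minimize(sample_likelihood_vectorized, x0, args=(X, Y), method='BFGS')
print(rslt.x)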
