This command is used to set the location of the data directory.
import os
os.chdir(r'C:\Users\kevin.sheppard\Dropbox\Teaching\Graduate\2013-MFE\Python\Python_Introduction\data')
In [1]:
import os
os.chdir(r'C:\Users\kevin.sheppard\Dropbox\Teaching\Graduate\2013-MFE\Python\Python_Introduction\data')
This example highlights how to implement a Fama-MacBeth 2-stage regression to estimate factor risk premia, make inference on the risk premia, and test whether a linear factor model can explain a cross-section of portfolio returns. This example closely follows [Cochrane::2001] (See also [JagannathanSkoulakisWang::2010]). As in the previous example, the first segment contains the imports.
In [2]:
from __future__ import print_function, division
from numpy import mat, cov, mean, hstack, multiply,sqrt,diag, genfromtxt, \
squeeze, ones, array, vstack, kron, zeros, eye, savez_compressed
from numpy.linalg import lstsq, inv
from scipy.stats import chi2
from pandas import read_csv
Next, the data are imported. I formatted the data downloaded from Ken French's website into an easy-to-import CSV which can be read by pandas.read_csv. The data is split using named columns for the small sets of variables and ix for the portfolios. The code uses pure NumPy arrays, and so values is used to retrieve the array from the DataFrame. The dimensions are determined using shape. Finally the risk free rate is forced to have 2 dimensions so that it will be broadcastable with the portfolio returns in the construction of the excess returns to the Size and Value-weighted portfolios. asmatrix is used to return matrix views of all of the arrays. This code is linear algebra-heavy and so matrices are easier to use than arrays.
In [3]:
data = read_csv('FamaFrench.csv')
# Split using both named colums and ix for larger blocks
dates = data['date'].values
factors = data[['VWMe', 'SMB', 'HML']].values
riskfree = data['RF'].values
portfolios = data.ix[:, 5:].values
# Use mat for easier linear algebra
factors = mat(factors)
riskfree = mat(riskfree)
portfolios = mat(portfolios)
# Shape information
T,K = factors.shape
T,N = portfolios.shape
# Reshape rf and compute excess returns
riskfree.shape = T,1
excessReturns = portfolios - riskfree
The next block does 2 things:
In [4]:
# Time series regressions
X = hstack((ones((T, 1)), factors))
out = lstsq(X, excessReturns)
alpha = out[0][0]
beta = out[0][1:]
avgExcessReturns = mean(excessReturns, 0)
# Cross-section regression
out = lstsq(beta.T, avgExcessReturns.T)
riskPremia = out[0]
The asymptotic variance requires computing the covariance of the demeaned returns and the weighted pricing errors. The problem is formulated using 2-step GMM where the moment conditions are \begin{equation} g_{t}\left(\theta\right)=\left[\begin{array}{c} \epsilon_{1t}\\ \epsilon_{1t}f_{t}\\ \epsilon_{2t}\\ \epsilon_{2t}f_{t}\\ \vdots\\ \epsilon_{Nt}\\ \epsilon_{Nt}f_{t}\\ \beta u_{t} \end{array}\right] \end{equation}
where $\epsilon_{it}=r_{it}^{e}-\alpha_{i}-\beta_{i}^{\prime}f_{t}$, $\beta_{i}$ is a $K$ by 1 vector of factor loadings, $f_{t}$ is a $K$ by 1 set of factors, $\beta=\left[\beta_{1}\,\beta_{2}\ldots\beta_{N}\right]$ is a $K$ by $N$ matrix of all factor loadings, $u_{t}=r_{t}^{e}-\beta'\lambda$ are the $N$ by 1 vector of pricing errors and $\lambda$ is a $K$ by 1 vector of risk premia. The vector of parameters is then $\theta= \left[\alpha_{1}\:\beta_{1}^{\prime}\:\alpha_{2}\:\beta_{2}^{\prime}\:\ldots\:\alpha_{N}\,\beta_{N}^{\prime}\:\lambda'\right]'$ To make inference on this problem, the derivative of the moments with respect to the parameters, $\partial g_{t}\left(\theta\right)/\partial\theta^{\prime}$ is needed. With some work, the estimator of this matrix can be seen to be
\begin{equation} G=E\left[\frac{\partial g_{t}\left(\theta\right)}{\partial\theta^{\prime}}\right]=\left[\begin{array}{cc} -I_{n}\otimes\Sigma_{X} & 0\\ G_{21} & -\beta\beta^{\prime} \end{array}\right]. \end{equation}where $X_{t}=\left[1\: f_{t}^{\prime}\right]'$ and $\Sigma_{X}=E\left[X_{t}X_{t}^{\prime}\right]$. $G_{21}$ is a matrix with the structure
\begin{equation} G_{21}=\left[G_{21,1}\, G_{21,2}\,\ldots G_{21,N}\right] \end{equation}where
\begin{equation} G_{21,i}=\left[\begin{array}{cc} 0_{K,1} & \textrm{diag}\left(E\left[u_{i}\right]-\beta_{i}\odot\lambda\right)\end{array}\right]\end{equation}and where $E\left[u_{i}\right]$ is the expected pricing error. In estimation, all expectations are replaced with their sample analogues.
In [5]:
# Moment conditions
X = hstack((ones((T, 1)), factors))
p = vstack((alpha, beta))
epsilon = excessReturns - X * p
moments1 = kron(epsilon, ones((1, K + 1)))
moments1 = multiply(moments1, kron(ones((1, N)), X))
u = excessReturns - riskPremia.T * beta
moments2 = u * beta.T
# Score covariance
S = mat(cov(hstack((moments1, moments2)).T))
# Jacobian
G = mat(zeros((N * K + N + K, N * K + N + K)))
SigmaX = X.T * X / T
G[:N * K + N, :N * K + N] = kron(eye(N), SigmaX)
G[N * K + N:, N * K + N:] = -beta * beta.T
for i in xrange(N):
temp = zeros((K, K + 1))
values = mean(u[:, i]) - multiply(beta[:, i], riskPremia)
temp[:, 1:] = diag(values.A1)
G[N * K + N:, i * (K + 1):(i + 1) * (K + 1)] = temp
vcv = inv(G.T) * S * inv(G) / T
The $J$-test examines whether the average pricing errors, $\hat{\alpha}$, are zero. The $J$ statistic has an asymptotic $\chi_{N}^{2}$ distribution, and the model is badly rejected.
In [6]:
vcvAlpha = vcv[0:N * K + N:4, 0:N * K + N:4]
J = alpha * inv(vcvAlpha) * alpha.T
J = J[0, 0]
Jpval = 1 - chi2(25).cdf(J)
The final block using formatted output to present all of the results in a readable manner.
In [7]:
vcvRiskPremia = vcv[N * K + N:, N * K + N:]
annualizedRP = 12 * riskPremia
arp = list(squeeze(annualizedRP.A))
arpSE = list(sqrt(12 * diag(vcvRiskPremia)))
print(' Annualized Risk Premia')
print(' Market SMB HML')
print('--------------------------------------')
print('Premia {0:0.4f} {1:0.4f} {2:0.4f}'.format(arp[0], arp[1], arp[2]))
print('Std. Err. {0:0.4f} {1:0.4f} {2:0.4f}'.format(arpSE[0], arpSE[1], arpSE[2]))
print('\n\n')
print('J-test: {:0.4f}'.format(J))
print('P-value: {:0.4f}'.format(Jpval))
i = 0
betaSE = []
for j in xrange(5):
for k in xrange(5):
a = alpha[0, i]
b = beta[:, i].A1
variances = diag(vcv[(K + 1) * i:(K + 1) * (i + 1), (K + 1) * i:(K + 1) * (i + 1)])
betaSE.append(sqrt(variances))
s = sqrt(variances)
c = hstack((a, b))
t = c / s
print('Size: {:}, Value:{:} Alpha Beta(VWM) Beta(SMB) Beta(HML)'.format(j + 1, k + 1))
print('Coefficients: {:>10,.4f} {:>10,.4f} {:>10,.4f} {:>10,.4f}'.format(a, b[0], b[1], b[2]))
print('Std Err. {:>10,.4f} {:>10,.4f} {:>10,.4f} {:>10,.4f}'.format(s[0], s[1], s[2], s[3]))
print('T-stat {:>10,.4f} {:>10,.4f} {:>10,.4f} {:>10,.4f}'.format(t[0], t[1], t[2], t[3]))
print('')
i += 1
The final block converts the standard errors of $\beta$ to be an array and saves the results.
In [8]:
betaSE = array(betaSE)
savez_compressed('Fama-MacBeth_results', alpha=alpha, \
beta=beta, betaSE=betaSE, arpSE=arpSE, arp=arp, J=J, Jpval=Jpval)