By Evgenia "Jenny" Nitishinskaya and Delaney Granizo-Mackenzie
Notebook released under the Creative Commons Attribution 4.0 License.
Fundamentals are data about the asset issuer, such as the company's sector, size, and expenses. We can use these data to build a linear factor model, expressing the return on asset $i$ as
$$R_i = a_i + b_{i1} F_1 + b_{i2} F_2 + \ldots + b_{iK} F_K + \epsilon_i$$

There are two different approaches to computing the factors $F_j$, which represent the returns associated with some fundamental characteristics, and the factor sensitivities $b_{ij}$.
In the first approach, we start by representing each characteristic of interest by a portfolio: we sort all assets by that characteristic, then build the portfolio by going long the top quantile of assets and short the bottom quantile. The factor corresponding to this characteristic is the return on this portfolio. Then, the $b_{ij}$ are estimated for each asset $i$ by regressing the historical values of $R_i$ on the historical values of the factors.
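The portfolio construction described above can be sketched on synthetic data. This is a minimal illustration, not the notebook's own code: the helper `long_short_factor`, the asset names, and the random returns are all hypothetical, and each leg is assumed to be equal-weighted.

```python
import numpy as np
import pandas as pd

def long_short_factor(characteristic, returns, quantile=0.3):
    """Return per-date returns of a portfolio that is long the top `quantile`
    of assets by `characteristic` and short the bottom `quantile`.

    characteristic: Series indexed by asset (hypothetical input)
    returns: DataFrame of returns, rows = dates, columns = assets
    """
    ranked = characteristic.sort_values()
    # Number of assets in each leg (at least one)
    n = max(int(len(ranked) * quantile), 1)
    bottom = ranked.index[:n]
    top = ranked.index[-n:]
    # Equal-weighted average return of each leg, one value per date
    return returns[top].mean(axis=1) - returns[bottom].mean(axis=1)

# Synthetic universe: 10 assets, 5 days of made-up returns
rng = np.random.RandomState(0)
assets = ['asset_%d' % i for i in range(10)]
dates = pd.date_range('2015-01-01', periods=5)
market_cap = pd.Series(np.arange(1.0, 11.0), index=assets)
returns = pd.DataFrame(rng.normal(0, 0.01, size=(5, 10)),
                       index=dates, columns=assets)

factor = long_short_factor(market_cap, returns)
print(factor)
```

The result is one factor return per date, which is the shape needed for the time-series regression that estimates the $b_{ij}$.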
We start by getting the fundamentals data for all assets and constructing the portfolios for each characteristic:
In [58]:
import numpy as np
import statsmodels.api as sm
from statsmodels import regression
import matplotlib.pyplot as plt
import pandas as pd
# Get market cap and book-to-price for all assets in universe
fundamentals = init_fundamentals()
data = get_fundamentals(query(fundamentals.valuation.market_cap,
                              fundamentals.valuation_ratios.book_value_yield),
                        '2015-07-31').T
# Drop missing data
data.dropna(inplace=True)
# Following the Fama-French model, ignore assets with negative book-to-price
data = data.loc[data['book_value_yield'] > 0]
In [59]:
# As per Fama-French, get the top 30% and bottom 30% of stocks by market cap
market_cap_top = data.sort_values('market_cap')[7*len(data)//10:]
market_cap_bottom = data.sort_values('market_cap')[:3*len(data)//10]
# Factor 1 is the daily return on a portfolio that is long the top stocks and short the bottom stocks
f1 = (np.mean(get_pricing(market_cap_top.index, fields='price',
                          start_date='2014-07-31', end_date='2015-07-31').pct_change()[1:].T.dropna()) -
      np.mean(get_pricing(market_cap_bottom.index, fields='price',
                          start_date='2014-07-31', end_date='2015-07-31').pct_change()[1:].T.dropna()))
In [60]:
# Repeat above procedure for book-to-price
bp_top = data.sort_values('book_value_yield')[7*len(data)//10:]
bp_bottom = data.sort_values('book_value_yield')[:3*len(data)//10]
# Factor 2 is the daily return on a portfolio that is long high book-to-price and short low book-to-price
f2 = (np.mean(get_pricing(bp_top.index, fields='price',
                          start_date='2014-07-31', end_date='2015-07-31').pct_change()[1:].T.dropna()) -
      np.mean(get_pricing(bp_bottom.index, fields='price',
                          start_date='2014-07-31', end_date='2015-07-31').pct_change()[1:].T.dropna()))
Now that we have returns series representing our factors, we can compute the factor model for any return stream using a linear regression. Below, we compute the factor sensitivities for returns on Alcoa stock:
In [36]:
# Get returns data for our asset
asset = get_pricing('AA', fields='price', start_date='2014-07-31', end_date='2015-07-31').pct_change()[1:]
In [82]:
# Perform linear regression to get the coefficients in the model
# (params[0] is the intercept, params[1] and params[2] are the factor sensitivities)
mlr = regression.linear_model.OLS(asset, sm.add_constant(np.column_stack((f1, f2)))).fit()
# Print the coefficients from the linear regression
print('Historical sensitivities of AA returns to factors:\nMarket cap: %f\nB/P: %f' % (mlr.params[1],
                                                                                       mlr.params[2]))
# Print the latest values for each of the factors
print('\nValues of factors on 2015-07-31:\nMarket cap: %f\nB/P: %f' % (f1[-1], f2[-1]))
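To see what this time-series regression recovers, here is a small sketch on synthetic data where the true sensitivities are known in advance. The factor series, the coefficients `true_b1` and `true_b2`, and the noise level are invented for illustration, and `numpy.linalg.lstsq` is used in place of statsmodels to keep the example self-contained.

```python
import numpy as np

rng = np.random.RandomState(42)
T = 250  # number of daily observations (assumed)

# Made-up factor return series
f1 = rng.normal(0, 0.01, T)
f2 = rng.normal(0, 0.01, T)

# Asset returns generated from a known two-factor model plus noise
true_a, true_b1, true_b2 = 0.0002, 0.8, -0.5
asset = true_a + true_b1 * f1 + true_b2 * f2 + rng.normal(0, 0.001, T)

# Design matrix: a column of ones for the intercept, then the factors
X = np.column_stack((np.ones(T), f1, f2))
params, _, _, _ = np.linalg.lstsq(X, asset, rcond=None)
a_hat, b1_hat, b2_hat = params

print('Estimated intercept: %f' % a_hat)
print('Estimated sensitivities: %f, %f' % (b1_hat, b2_hat))
```

With enough observations the estimated sensitivities should land close to the values used to generate the data, which is the sense in which the regression "computes the factor model" for a return stream.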
In the second approach, we calculate the coefficients $b_{ij}$ from the formula
$$ b_{ij} = \frac{\text{Value of factor for asset }i - \text{Average value of factor}}{\sigma(\text{Factor values})} $$

By scaling the value of the factor in this way, we make the coefficients comparable across factors. The exceptions to this formula are indicator variables, which are set to 1 for true and 0 for false. One example is industry membership: the coefficient tells us whether the asset belongs to the industry or not. After we calculate all of the coefficients, we estimate $F_j$ and $a_i$ using a cross-sectional regression (i.e., at each time step, we perform a regression using the equations for all of the assets).
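As a small illustration of the standardization formula and the indicator-variable exception, the sketch below builds exposures for four hypothetical assets. The tickers, fundamental values, and the `is_tech` flag are made up. Note that pandas' `std()` uses the sample standard deviation (`ddof=1`) by default, which is an implementation choice rather than something the formula dictates.

```python
import pandas as pd

# Hypothetical fundamentals for four assets
fundamentals = pd.DataFrame({
    'market_cap': [1e9, 5e9, 2e10, 8e10],
    'book_value_yield': [0.2, 0.5, 0.8, 1.1],
}, index=['A', 'B', 'C', 'D'])

# Standardize each characteristic: subtract the cross-sectional mean,
# then divide by the cross-sectional standard deviation
exposures = (fundamentals - fundamentals.mean()) / fundamentals.std()

# Indicator variables are not standardized: 1 for members, 0 otherwise
is_tech = pd.Series([True, False, False, True], index=fundamentals.index)
exposures['is_tech'] = is_tech.astype(float)

print(exposures)
```

After standardization each continuous column has mean zero and unit standard deviation across assets, so a one-unit exposure means the same thing for market cap as for book-to-price.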
Following this procedure, we get the cross-sectional returns on 2015-07-31, and compute the coefficients for all assets:
In [66]:
# Get one day's worth of cross-sectional returns
cs_returns = get_pricing(data.index, fields='price',
                         start_date='2015-07-30', end_date='2015-07-31').pct_change()[1:].T.dropna()
# Only look at fundamentals data of assets that we have pricing data for
data = data.loc[cs_returns.index]
# Compute coefficients according to formula above
coeffs = (data - data.mean())/data.std()
Now that we have the factor sensitivities, we use a linear regression to compute the factors on 2015-07-31:
In [67]:
mlr = regression.linear_model.OLS(cs_returns,
                                  sm.add_constant(coeffs)).fit()
In [68]:
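# Print the coefficients we computed for AA
# (iloc[0] selects the first row of coeffs, which is assumed here to be AA)
print('Sensitivities of AA returns:\n%s' % coeffs.iloc[0])
# Print factor values from linear regression (params[0] is the intercept)
print('\nFactors on 2015-07-31:\n%s' % mlr.params[1:])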
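The cross-sectional step can also be checked on synthetic data where the factor returns for the day are known. Everything in this sketch is assumed for illustration: the number of assets, the exposures, the values in `true_F`, and the noise level. It fits a single intercept shared by all assets, which is what a one-day cross-sectional regression with a constant term estimates.

```python
import numpy as np

rng = np.random.RandomState(7)
n_assets = 500

# Standardized exposures of each asset to two factors (made up)
b = rng.normal(size=(n_assets, 2))

# One day's returns generated from known factor returns plus noise
true_F = np.array([0.004, -0.002])
true_a = 0.001
returns = true_a + b.dot(true_F) + rng.normal(0, 0.002, n_assets)

# Regress the cross-section of returns on the exposures:
# here the unknowns are the factor returns, not the sensitivities
X = np.column_stack((np.ones(n_assets), b))
est, _, _, _ = np.linalg.lstsq(X, returns, rcond=None)
a_hat, F_hat = est[0], est[1:]

print('Estimated intercept: %f' % a_hat)
print('Estimated factor returns: %s' % F_hat)
```

Compared with the first approach, the roles are reversed: the sensitivities are fixed inputs and the regression is repeated at each time step to produce one value of each factor per date.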