By Evgenia "Jenny" Nitishinskaya and Delaney Granizo-Mackenzie.
Part of the Quantopian Lecture Series:
Notebook released under the Creative Commons Attribution 4.0 License.
As always, this analysis is based on historical data, and risk exposures estimated on historical data may or may not affect the exposures going forward. As such, computing the risk exposure of to a factor is not enough. You must put confidence bounds on that risk exposure, and determine whether the risk exposure can even be modeled reasonably. For more information on this, please see our other lectures, especially Instability of Parameter Estimates.
We can use factor models to analyze the sources of risks and returns in portfolios. Recall that a factor model expresses the returns as
$$R_i = a_i + b_{i1} F_1 + b_{i2} F_2 + \ldots + b_{iK} F_K + \epsilon_i$$By modelling the historical returns, we can see how much of them is due to speculation on different factors and how much to asset-specific fluctuations ($\epsilon_p$). We can also examine what sources of risk the portfolio is exposed to.
In risk analysis, we often model active returns (returns relative to a benchmark) and active risk (standard deviation of active returns, also known as tracking error or tracking risk).
For instance, we can find a factor's marginal contribution to active risk squared (FMCAR). For factor $j$, this is
$$ \text{FMCAR}_j = \frac{b_j^a \sum_{i=1}^K b_i^a Cov(F_j, F_i)}{(\text{Active risk})^2} $$where $b_i^a$ is the portfolio's active exposure (exposure different from the benchmark's) to factor $i$. This tells us how much risk we incur by being exposed to factor $j$, given all the other factors we're already exposed to.
Fundamental factor models are often used to evaluate portfolios because they correspond directly to investment choices (e.g. whether we invest in small-cap or large-cap stocks, etc.). Below, we construct a model to evaluate a single asset; for more information on the model construction, check out the fundamental factor models notebook.
We'll use the canonical Fama-French factors for this example, which are the returns of portfolios constructred based on fundamental factors.
In the Arbitrage Pricing Theory lecture we mention that for predictive models you want fewer parameters. However, this doesn't quite hold for risk exposure. Anything left over in our $\alpha$ is risk exposure that is currently unexplained by the selected factors. You want your strategy's return stream to be all alpha, and to be unexplained by as many parameters as possible. If you can show that your historical returns have little to no dependence on many factors, this is very positive.
In [1]:
import numpy as np
import statsmodels.api as sm
from statsmodels import regression
import matplotlib.pyplot as plt
import pandas as pd
In [2]:
# Get market cap and book-to-price for all assets in universe
fundamentals = init_fundamentals()
data = get_fundamentals(query(fundamentals.valuation.market_cap,
fundamentals.valuation_ratios.book_value_yield), '2015-07-31').T
# Drop missing data
data.dropna(inplace=True)
# Following the Fama-French model, ignore assets with negative book-to-price
data = data.loc[data['book_value_yield'] > 0]
In [3]:
# As per Fama-French, get the top 30% and bottom 30% of stocks by market cap
market_cap_top = data.sort('market_cap')[7*len(data)/10:]
market_cap_bottom = data.sort('market_cap')[:3*len(data)/10]
# Factor 1 is returns on portfolio that is long the top stocks and short the bottom stocks
f1 = (np.mean(get_pricing(market_cap_top.index, fields='price',
start_date='2014-07-31', end_date='2015-07-31').pct_change()[1:].T.dropna()) -
np.mean(get_pricing(market_cap_bottom.index, fields='price',
start_date='2014-07-31', end_date='2015-07-31').pct_change()[1:].T.dropna()))
# Repeat above procedure for book-to-price
bp_top = data.sort('book_value_yield')[7*len(data)/10:]
bp_bottom = data.sort('book_value_yield')[:3*len(data)/10]
f2 = (np.mean(get_pricing(bp_top.index, fields='price',
start_date='2014-07-31', end_date='2015-07-31').pct_change()[1:].T.dropna()) -
np.mean(get_pricing(bp_bottom.index, fields='price',
start_date='2014-07-31', end_date='2015-07-31').pct_change()[1:].T.dropna()))
Now that we have our factors, we will use them to model active returns (that is, asset returns less benchmark returns):
In [4]:
start_date = '2014-07-31'
end_date = '2015-07-31'
# Get returns data for our asset
asset = get_pricing('HSC', fields='price', start_date=start_date, end_date=end_date).pct_change()[1:]
bench = get_pricing('SPY', fields='price', start_date=start_date, end_date=end_date).pct_change()[1:]
# The excess returns of our active management, in this case just holding a portfolio of our one asset
active = asset - bench
# Define a constant to compute intercept
constant = pd.TimeSeries(np.ones(len(asset.index)), index=asset.index)
df = pd.DataFrame({'R': active,
'F1': f1,
'F2': f2,
'Constant': constant})
df = df.dropna()
In [5]:
# Perform linear regression to get the coefficients in the model
b1, b2 = regression.linear_model.OLS(active, df[['F1', 'F2']]).fit().params
# Print the coefficients from the linear regression
print 'Sensitivities of active returns to factors:\nMarket cap: %f\nB/P: %f' % (b1, b2)
Using the formula above, we compute the factors' marginal contributions to active risk squared:
In [6]:
cov = np.cov(f1, f2)
ar_squared = (active.std())**2
fmcar1 = (b1*(b2*cov[0,1] + b1*cov[0,0]))/ar_squared
fmcar2 = (b2*(b1*cov[0,1] + b2*cov[1,1]))/ar_squared
print 'Market Cap Risk Contribution:', fmcar1
print 'Book to Price Risk Contribution:', fmcar2
The first factor has a small negative contribution to active risk squared, while the second accounts for about 6.2% of that risk. The rest can be attributed to active specific risk, i.e. factors that we did not take into account or the asset's idiosyncratic risk.
However, as usual we will look at how the exposure to these factors changes over time. As we lose a tremendous amount of information by just looking at one data point.
In [7]:
# Compute the rolling betas
model = pd.stats.ols.MovingOLS(y = df['R'], x=df[['F1', 'F2']],
window_type='rolling',
window=30)
rolling_parameter_estimates = model.beta
rolling_parameter_estimates.plot();
plt.title('Computed Betas');
plt.legend(['F1 Beta', 'F2 Beta', 'Intercept']);
Here we'll define a function to compute FMCAR for a given date.
In [8]:
active_risk_sqaured = pd.rolling_std(active, window = 30)**2
# Remove the first 30, which are all NaN
active_risk_sqaured = active_risk_sqaured[29:]
def compute_FMCAR(asset, bench, factor_df, factor_name, t, window = 30):
# Transform to integer rather than date for indexing
t = asset.index.get_loc(t)
if t < window:
return
# Compute exess returns of the asset
excess_returns = (asset-bench).iloc[t-window:t]
# Compute the squared risk of the excess returns
ar_squared = (excess_returns.std())**2
# Compute the betas for the factors
model = regression.linear_model.OLS(excess_returns, factor_df[t-window:t])
fitted_model = model.fit()
# Compute the bulk of FMCAR
f_j = factor_df[factor_name].iloc[t-window:t]
b_j = fitted_model.params[factor_name]
s = 0.0
for factor in factor_df:
b_factor = fitted_model.params[factor]
s += b_factor * np.cov(f_j, df[factor].iloc[t-window:t])[0, 1]
return b_j * s / ar_squared
Now we'll compute it over our data on a rolling basis. We can see how the risk exposure to the different factors changes.
In [9]:
# Compute the FMCAR values for all timepoints
F1_FMCAR = [compute_FMCAR(asset, bench, df[['F1', 'F2', 'Constant']], 'F1', date, window=30) for date in asset.index]
F2_FMCAR = [compute_FMCAR(asset, bench, df[['F1', 'F2', 'Constant']], 'F2', date, window=30) for date in asset.index]
# Add the date index back in
F1_FMCAR = pd.TimeSeries(F1_FMCAR, index=asset.index)
F2_FMCAR = pd.TimeSeries(F2_FMCAR, index=asset.index)
# See how it looks
F1_FMCAR.plot(alpha=0.5)
F2_FMCAR.plot(alpha=0.5)
plt.ylabel('Marginal Contribution to Active Risk Squared')
plt.legend(['F1 FMCAR', 'F2 FMCAR'])
Out[9]:
We'd like to be able to make a meaningful statement about how exposued our asset is to these two factors, however, as you saw the exposure varies quite a bit, so taking the average is dangerous. We could put confidence intervals around that average, but that would only work if the distribution of exposures were normal. Let's check using our old buddy, the Jarque-Bera test.
In [10]:
from statsmodels.stats.stattools import jarque_bera
_, pvalue1, _, _ = jarque_bera(F1_FMCAR.dropna().values)
_, pvalue2, _, _ = jarque_bera(F2_FMCAR.dropna().values)
print 'p-value F1_FMCAR is normally distributed', pvalue1
print 'p-value F2_FMCAR is normally distributed', pvalue2
The p-values are below our default cutoff of 0.05. We can't even put good confidence intervals on the risk exposure of the asset, so making any statement about exposure in the future is very difficult right now. Any hedge we took out to cancel the exposure to one of the factors might be way over or under hedged.
We are trying to predict future exposure, and predicting the future is incredibly difficult. One must be very careful with statistical methods to ensure that false predictions are not made.
We can use factor and tracking portfolios to tweak a portfolio's sensitivities to different sources of risk.
A factor portfolio has a sensitivity of 1 to a particular factor and 0 to all other factors. In other words, it represents the risk of that one factor. We can add a factor portfolio to a larger portfolio to adjust its exposure to that factor.
A similar concept is a tracking portfolio, which is constructed to have the same factor sensitivities as a benchmark or other portfolio. Like a factor portfolio, this allows us to either speculate on or hedge out the risks associated with that benchmark or portfolio. For instance, we regularly hedge out the market, because we care about how our portfolio performs relative to the market, and we don't want to be subject to the market's fluctuations.
To construct a factor or tracking portfolio, we need the factor sensitivities of what we want to track. We already know what these are in the former case, but we need to compute them in the latter using usual factor model methods. Then, we pick some $K+1$ assets (where $K$ is the number of factors we're considering) and solve for the weights of the assets in the portfolio.
Say we have two factors $F_1$ and $F_2$, and a benchmark with sensitivities of 1 and 1.1 to the factors, respectively. We identify 3 securities $x_1, x_2, x_3$ that we would like to use in composing a portfolio that tracks the benchmark, whose sensitivities are $b_{11} = 0.7$, $b_{12} = 1.1$, $b_{21} = 0.1$, $b_{22} = 0.5$, $b_{31} = 1.5$, $b_{32} = 1.3$. We would like to compute weights $w_1$, $w_2$, $w_3$ so that our tracking portfolio is
$$ P = w_1 x_1 + w_2 x_2 + w_3 x_3 $$We want our portfolio sensitivities to match the benchmark:
$$ w_1 b_{11} + w_2 b_{21} + w_3 b_{31} = 1 $$$$ w_1 b_{12} + w_2 b_{22} + w_3 b_{32} = 1.1 $$Also, the weights need to sum to 1:
$$ w_1 + w_2 + w_3 = 1 $$Solving this system of 3 linear equations, we find that $w_1 = 1/3$, $w_2 = 1/6$, and $w_3 = 1/2$. Putting the securities together into a portfolio using these weights, we obtain a portfolio with the same risk profile as the benchmark.
Once we know our risk exposures, we can do a few things. We can not enter into positions that have high exposures to certain factors, or we can hedge our positions to try to neutralize the exposure.
Often times funds will have a layer of protection over their traders/algorithms. This layer of protection takes in the trades that the fund wants to make, then computes the exposure of the new portfolio, and checks to make sure they're within pre-defined ranges. If they are not, it does not place the trade and files a warning.
Another method of dealing with exposure is to take out hedges. You can determine, for example, your exposure to each sector of the market. You can then take out a hedge if a particular sector seems to affect your returns too much. For more information on hedging, please see our Beta Hedging lecture. Good algorithms will have built-in hedging logic that ensures they are never over-exposed.