This notebook analyzes some securities -- stocks, bonds, commodities -- computes their correlations, and optimizes risk vs. reward for a set of portfolio scenarios.
Select "Run All" to update this notebook with the latest stock quotes.
Caveat: This analysis covers a date range starting around 2007 (if run without modification). Obviously, the analysis can vary dramatically if you choose another date range.
In [1]:
from sklearn import cluster # machine learning
import pandas # data manipulation and computation
import ystockquote as ys # stock prices
In [2]:
rcParams['figure.figsize'] = (13, 6) # default figure size
In [3]:
symbols = array([
'blv', # bond: long-term bond
'bnd', # bond: total bond market
'bsv', # bond: short-term bond
'tip', # bond: treasury inflation protected
'gld', # commodity: gold (SPDR)
'iau', # commodity: gold (iShares)
'slv', # commodity: silver
'vti', # stock: US total stock market
'vbk', # stock: US small-cap stock
'aapl', # stock: Apple
'ua', # stock: Under Armour
'cmg', # stock: Chipotle
])
In [4]:
prices = pandas.DataFrame({
symbol: {
date: quote['Adj Close']
for date, quote in ys.get_historical_prices(symbol, '2000-01-01', '2015-01-01').iteritems()
}
for symbol in symbols
}, columns=symbols, dtype=float)
prices.tail()
Out[4]:
In [5]:
prices.head()
Out[5]:
In [6]:
prices.dropna(inplace=True)
prices.head()
Out[6]:
In [7]:
prices = prices.ix['2013-01-01':]
prices.head()
Out[7]:
In [8]:
prices[['vti', 'bnd', 'iau']].plot() # plot a few, or...
#prices.plot() # plot all
ylabel('Daily Close')
Out[8]:
In [9]:
prices_norm = prices.copy()
for symbol in symbols:
prices_norm[symbol] = prices[symbol]/prices[symbol][0]
prices_norm[['vti', 'bnd', 'iau']].plot()
Out[9]:
In 2013-14, stocks are increasing faster than gold.
In [10]:
(prices_norm.vti/prices_norm.iau).plot(logy=True)
ylim(0.3, 1.5)
Out[10]:
In [11]:
prices.pct_change().tail()
Out[11]:
In [12]:
prices.pct_change().describe()
Out[12]:
In [13]:
C = prices.pct_change().corr()
C
Out[13]:
Not surprisingly, the securities cluster by stocks, bonds, and commodities! And that's without predefining the number of clusters.
In [14]:
_, labels = cluster.affinity_propagation(C)
for label in range(max(labels)+1):
print label, symbols[labels==label]
We find that the daily percent change is roughly Gaussian, which implies that the daily change follows a log-normal distribution.
In [15]:
prices.pct_change().hist(column=['vti', 'iau', 'bnd', 'cmg'], bins=40, sharex=True, sharey=True)
Out[15]:
There is a slight inverse correlation between BND and VTI.
The plot also suggests that the joint daily change is jointly Gaussian. If two jointly Gaussian variables are uncorrelated, then they are statistically independent.
That is a good property to have when choosing securities for your portfolio. If two securities are perfectly correlated, well, then you should just own more of one. If two securities are perfectly inversely correlated, then their movements cancel out, so it's as if you're investing less. By choosing statistically independent securities, we have a broader range of risk vs. reward combinations.
In [16]:
scatter(prices.bnd.pct_change(), prices.vti.pct_change())
xlabel('BND')
ylabel('VTI')
Out[16]:
As shown earlier, the securities cluster into three categories: stocks, bonds, and commodities. So let's choose one from each.
In [17]:
# assuming you use these proportions on day 1
portfolio = 0.5*prices_norm.vti + 0.3*prices_norm.bnd + 0.2*prices_norm.iau
portfolio.pct_change().describe()
Out[17]:
Compared to the previous portfolio, this portfolio has similar reward but far more risk.
In [18]:
(0.8*prices_norm.vti + 0.2).pct_change().describe()
Out[18]:
In [19]:
pct_vti = linspace(0, 1, 100, endpoint=False)
vti_iau = pandas.DataFrame({
'reward': pandas.Series([(r*prices_norm.vti + (1-r)*prices_norm.iau).pct_change().mean() for r in pct_vti], index=pct_vti),
'risk': pandas.Series([(r*prices_norm.vti + (1-r)*prices_norm.iau).pct_change().std() for r in pct_vti], index=pct_vti)
})
ax = vti_iau.plot(secondary_y='risk')
ax.set_xlabel('Percent of portfolio in VTI')
ax.set_ylabel('Mean of daily percent change')
ax.right_ax.set_ylabel('Standard deviation of daily percent change')
title('VTI vs. IAU: Reward and Risk')
Out[19]:
In [20]:
reward_over_risk = vti_iau.reward/vti_iau.risk
reward_over_risk.plot()
xlabel('Percent of portfolio in VTI')
ylabel('Reward/Risk')
Out[20]:
Reward/risk is maximized when the portfolio is approximately 53% VTI and 47% IAU.
In [21]:
reward_over_risk.argmax()
Out[21]:
In [22]:
vti_bnd = pandas.DataFrame({
'reward': pandas.Series([(r*prices_norm.vti + (1-r)*prices_norm.bnd).pct_change().mean() for r in pct_vti], index=pct_vti),
'risk': pandas.Series([(r*prices_norm.vti + (1-r)*prices_norm.bnd).pct_change().std() for r in pct_vti], index=pct_vti)
})
ax = vti_bnd.plot(secondary_y='risk')
ax.set_xlabel('Percent of portfolio in VTI')
ax.set_ylabel('Mean of daily percent change')
ax.right_ax.set_ylabel('Standard deviation of daily percent change')
title('VTI vs. BND: Reward and Risk')
Out[22]:
In [23]:
reward_over_risk = vti_bnd.reward/vti_bnd.risk
reward_over_risk.plot()
xlabel('Percent of portfolio in VTI')
ylabel('Reward/Risk')
Out[23]:
Reward/risk is mazimized at 14% VTI and 86% BND.
In [24]:
reward_over_risk.argmax()
Out[24]:
Suppose that we require the standard deviation of the daily percent change to be less than 0.01.
Given this risk constraint, reward is maximized at 77% VTI and 23% BND.
In [25]:
vti_bnd[vti_bnd.risk < 0.01].iloc[-1]
Out[25]:
In [26]:
portfolios = pandas.DataFrame({
'bnd heavy': 0.15*prices_norm.vti + 0.15*prices_norm.iau + 0.70*prices_norm.bnd,
'iau heavy': 0.15*prices_norm.vti + 0.70*prices_norm.iau + 0.15*prices_norm.bnd,
'vti heavy': 0.70*prices_norm.vti + 0.15*prices_norm.iau + 0.15*prices_norm.bnd,
'mixed': 0.45*prices_norm.vti + 0.43*prices_norm.iau + 0.12*prices_norm.bnd,
})
portfolios.plot()
Out[26]:
In [27]:
portfolios.pct_change().describe()
Out[27]:
So what does this all mean? I have no idea.