Stock Analysis

This notebook analyzes some securities -- stocks, bonds, commodities -- computes their correlations, and optimizes risk vs. reward for a set of portfolio scenarios.

Select "Run All" to update this notebook with the latest stock quotes.

Caveat: This analysis covers a date range starting around 2007 (if run without modification). Obviously, the analysis can vary dramatically if you choose another date range.

Initialization


In [1]:
from sklearn import cluster  # machine learning
import pandas             # data manipulation and computation
import ystockquote as ys  # stock prices

In [2]:
rcParams['figure.figsize'] = (13, 6)  # default figure size

1. Data Collection

Choose a few securities.


In [3]:
symbols = array([
           'blv',  # bond: long-term bond
           'bnd',  # bond: total bond market
           'bsv',  # bond: short-term bond
           'tip',  # bond: treasury inflation protected
           'gld',  # commodity: gold (SPDR)
           'iau',  # commodity: gold (iShares)
           'slv',  # commodity: silver
           'vti',  # stock: US total stock market
           'vbk',  # stock: US small-cap stock
           'aapl', # stock: Apple
           'ua',   # stock: Under Armour
           'cmg',  # stock: Chipotle
          ])

Get daily close prices.


In [4]:
prices = pandas.DataFrame({
             symbol: {
                 date: quote['Adj Close'] 
                 for date, quote in ys.get_historical_prices(symbol, '2000-01-01', '2015-01-01').iteritems()
             }
             for symbol in symbols
         }, columns=symbols, dtype=float)
prices.tail()


Out[4]:
blv bnd bsv tip gld iau slv vti vbk aapl ua cmg
2014-11-03 91.10 82.15 80.17 113.08 112.15 11.29 15.48 104.08 124.94 108.93 65.99 638.53
2014-11-04 91.18 82.13 80.17 113.04 112.22 11.30 15.36 103.70 124.12 108.13 64.79 647.68
2014-11-05 91.36 82.23 80.12 113.15 109.79 11.06 14.66 104.19 123.87 108.39 64.25 642.59
2014-11-06 91.01 82.04 80.10 112.96 109.88 11.07 14.82 104.70 124.75 108.70 66.28 647.12
2014-11-07 91.60 82.32 80.21 113.62 112.97 11.37 15.10 104.78 125.07 109.01 66.98 649.03

In [5]:
prices.head()


Out[5]:
blv bnd bsv tip gld iau slv vti vbk aapl ua cmg
2000-01-03 NaN NaN NaN NaN NaN NaN NaN NaN NaN 3.79 NaN NaN
2000-01-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN 3.47 NaN NaN
2000-01-05 NaN NaN NaN NaN NaN NaN NaN NaN NaN 3.52 NaN NaN
2000-01-06 NaN NaN NaN NaN NaN NaN NaN NaN NaN 3.21 NaN NaN
2000-01-07 NaN NaN NaN NaN NaN NaN NaN NaN NaN 3.37 NaN NaN

Only preserve dates for which all stocks exist.


In [6]:
prices.dropna(inplace=True)
prices.head()


Out[6]:
blv bnd bsv tip gld iau slv vti vbk aapl ua cmg
2007-04-10 50.97 57.73 62.42 78.08 67.16 6.71 13.85 62.02 66.83 12.75 13.26 64.67
2007-04-11 50.90 57.58 62.38 77.96 67.08 6.71 13.77 61.62 66.59 12.53 13.17 63.78
2007-04-12 50.96 57.57 62.53 77.96 66.99 6.71 13.81 62.03 67.16 12.47 13.16 64.16
2007-04-13 50.86 57.48 62.35 77.77 67.84 6.79 13.99 62.24 67.51 12.21 12.97 64.07
2007-04-16 51.03 57.53 62.43 78.01 68.40 6.84 13.98 62.86 68.31 12.37 13.13 65.20

Optional: choose a custom start date for analysis.


In [7]:
prices = prices.ix['2013-01-01':]
prices.head()


Out[7]:
blv bnd bsv tip gld iau slv vti vbk aapl ua cmg
2013-01-02 85.86 79.92 79.00 118.16 163.17 16.40 29.92 72.75 90.81 74.93 24.17 301.06
2013-01-03 84.98 79.68 78.97 117.35 161.20 16.19 29.18 72.64 90.74 73.99 24.83 300.95
2013-01-04 85.37 79.81 78.98 117.30 160.44 16.11 29.24 73.01 91.32 71.93 24.67 300.18
2013-01-07 85.37 79.75 78.96 117.70 159.43 16.02 29.18 72.81 91.19 71.50 24.22 299.59
2013-01-08 85.73 79.83 79.00 117.86 160.56 16.12 29.38 72.62 90.92 71.70 24.32 297.76

Plot the daily close prices.


In [8]:
prices[['vti', 'bnd', 'iau']].plot()  # plot a few, or...
#prices.plot()  # plot all
ylabel('Daily Close')


Out[8]:
<matplotlib.text.Text at 0x10b782890>

Plot prices normalized from Day 1.


In [9]:
prices_norm = prices.copy()
for symbol in symbols:
    prices_norm[symbol] = prices[symbol]/prices[symbol][0]
prices_norm[['vti', 'bnd', 'iau']].plot()


Out[9]:
<matplotlib.axes.AxesSubplot at 0x107559310>

Plot the ratio VTI/IAU.

In 2013-14, stocks are increasing faster than gold.


In [10]:
(prices_norm.vti/prices_norm.iau).plot(logy=True)
ylim(0.3, 1.5)


Out[10]:
(0.3, 1.5)

2. Data Analysis

Compute the daily percent change.


In [11]:
prices.pct_change().tail()


Out[11]:
blv bnd bsv tip gld iau slv vti vbk aapl ua cmg
2014-11-03 -0.002955 -0.000487 -0.000623 0.000000 -0.004527 -0.005286 -0.001290 0.000577 -0.000720 0.013020 0.006252 0.000831
2014-11-04 0.000878 -0.000243 0.000000 -0.000354 0.000624 0.000886 -0.007752 -0.003651 -0.006563 -0.007344 -0.018185 0.014330
2014-11-05 0.001974 0.001218 -0.000624 0.000973 -0.021654 -0.021239 -0.045573 0.004725 -0.002014 0.002405 -0.008335 -0.007859
2014-11-06 -0.003831 -0.002311 -0.000250 -0.001679 0.000820 0.000904 0.010914 0.004895 0.007104 0.002860 0.031595 0.007050
2014-11-07 0.006483 0.003413 0.001373 0.005843 0.028122 0.027100 0.018893 0.000764 0.002565 0.002852 0.010561 0.002952

Get basic statistics for the daily percent change.


In [12]:
prices.pct_change().describe()


Out[12]:
blv bnd bsv tip gld iau slv vti vbk aapl ua cmg
count 467.000000 467.000000 467.000000 467.000000 467.000000 467.000000 467.000000 467.000000 467.000000 467.000000 467.000000 467.000000
mean 0.000153 0.000065 0.000033 -0.000078 -0.000721 -0.000718 -0.001314 0.000807 0.000730 0.000933 0.002431 0.001813
std 0.005318 0.001986 0.000802 0.003354 0.011482 0.011484 0.017203 0.007089 0.009358 0.016040 0.022617 0.018602
min -0.025536 -0.011150 -0.003264 -0.013636 -0.087808 -0.088459 -0.126187 -0.024777 -0.035620 -0.123450 -0.073851 -0.069645
25% -0.003047 -0.001057 -0.000380 -0.001910 -0.005575 -0.005242 -0.009114 -0.002752 -0.004452 -0.006993 -0.008279 -0.006315
50% 0.000365 0.000125 0.000000 0.000000 -0.000241 0.000000 -0.000585 0.001216 0.001855 0.001070 0.002025 0.000433
75% 0.003590 0.001261 0.000504 0.001917 0.005494 0.005135 0.005532 0.005202 0.006979 0.009821 0.011570 0.008570
max 0.014883 0.007994 0.003556 0.013260 0.043557 0.044025 0.064408 0.022464 0.024459 0.081912 0.229289 0.160954

Compute the correlation between each pair of ETFs.


In [13]:
C = prices.pct_change().corr()
C


Out[13]:
blv bnd bsv tip gld iau slv vti vbk aapl ua cmg
blv 1.000000 0.883149 0.668414 0.832665 0.235849 0.233196 0.138741 -0.170632 -0.175536 -0.118826 -0.117368 -0.003331
bnd 0.883149 1.000000 0.827002 0.865648 0.332163 0.330167 0.227901 -0.088770 -0.101778 -0.090745 -0.103985 0.001950
bsv 0.668414 0.827002 1.000000 0.748721 0.324600 0.326571 0.238990 -0.053424 -0.049828 -0.052260 -0.057686 -0.031347
tip 0.832665 0.865648 0.748721 1.000000 0.272348 0.273133 0.181317 -0.121027 -0.112926 -0.110870 -0.078758 -0.001065
gld 0.235849 0.332163 0.324600 0.272348 1.000000 0.998278 0.883991 0.031921 0.052756 0.105621 -0.116838 -0.018545
iau 0.233196 0.330167 0.326571 0.273133 0.998278 1.000000 0.883531 0.033171 0.052788 0.104115 -0.112072 -0.018756
slv 0.138741 0.227901 0.238990 0.181317 0.883991 0.883531 1.000000 0.099791 0.112499 0.143244 -0.091253 -0.004312
vti -0.170632 -0.088770 -0.053424 -0.121027 0.031921 0.033171 0.099791 1.000000 0.924305 0.277488 0.425141 0.370118
vbk -0.175536 -0.101778 -0.049828 -0.112926 0.052756 0.052788 0.112499 0.924305 1.000000 0.226838 0.486547 0.393794
aapl -0.118826 -0.090745 -0.052260 -0.110870 0.105621 0.104115 0.143244 0.277488 0.226838 1.000000 0.068858 0.031359
ua -0.117368 -0.103985 -0.057686 -0.078758 -0.116838 -0.112072 -0.091253 0.425141 0.486547 0.068858 1.000000 0.301409
cmg -0.003331 0.001950 -0.031347 -0.001065 -0.018545 -0.018756 -0.004312 0.370118 0.393794 0.031359 0.301409 1.000000

Cluster the stocks based upon their pairwise correlations.

Not surprisingly, the securities cluster by stocks, bonds, and commodities! And that's without predefining the number of clusters.


In [14]:
_, labels = cluster.affinity_propagation(C)
for label in range(max(labels)+1):
    print label, symbols[labels==label]


0 ['blv' 'bnd' 'bsv' 'tip']
1 ['gld' 'iau' 'slv']
2 ['vti' 'vbk' 'aapl' 'ua' 'cmg']

Plot histograms of the daily percent change.

We find that the daily percent change is roughly Gaussian, which implies that the daily change follows a log-normal distribution.


In [15]:
prices.pct_change().hist(column=['vti', 'iau', 'bnd', 'cmg'], bins=40, sharex=True, sharey=True)


Out[15]:
array([[<matplotlib.axes.AxesSubplot object at 0x10b9f0a50>,
        <matplotlib.axes.AxesSubplot object at 0x10b966290>],
       [<matplotlib.axes.AxesSubplot object at 0x10ba96310>,
        <matplotlib.axes.AxesSubplot object at 0x10b894150>]], dtype=object)

Scatter plot the daily percent change between BND and VTI.

There is a slight inverse correlation between BND and VTI.

The plot also suggests that the joint daily change is jointly Gaussian. If two jointly Gaussian variables are uncorrelated, then they are statistically independent.

That is a good property to have when choosing securities for your portfolio. If two securities are perfectly correlated, well, then you should just own more of one. If two securities are perfectly inversely correlated, then their movements cancel out, so it's as if you're investing less. By choosing statistically independent securities, we have a broader range of risk vs. reward combinations.


In [16]:
scatter(prices.bnd.pct_change(), prices.vti.pct_change())
xlabel('BND')
ylabel('VTI')


Out[16]:
<matplotlib.text.Text at 0x10ba66450>

3. Portfolio Assignment

As shown earlier, the securities cluster into three categories: stocks, bonds, and commodities. So let's choose one from each.

Get statistics for a portfolio with positions in VTI, BND, and IAU.


In [17]:
# assuming you use these proportions on day 1
portfolio = 0.5*prices_norm.vti + 0.3*prices_norm.bnd + 0.2*prices_norm.iau
portfolio.pct_change().describe()


Out[17]:
count    467.000000
mean       0.000343
std        0.004556
min       -0.027941
25%       -0.001741
50%        0.000672
75%        0.002995
max        0.015430
dtype: float64

Get statistics for a portfolio that is partially allocated in VTI and partially unallocated.

Compared to the previous portfolio, this portfolio has similar reward but far more risk.


In [18]:
(0.8*prices_norm.vti + 0.2).pct_change().describe()


Out[18]:
count    467.000000
mean       0.000664
std        0.005887
min       -0.020276
25%       -0.002300
50%        0.001026
75%        0.004367
max        0.018488
dtype: float64

Plot the risk and reward of a portfolio with varying positions in VTI and IAU.


In [19]:
pct_vti = linspace(0, 1, 100, endpoint=False)
vti_iau = pandas.DataFrame({
    'reward': pandas.Series([(r*prices_norm.vti + (1-r)*prices_norm.iau).pct_change().mean() for r in pct_vti], index=pct_vti),
    'risk': pandas.Series([(r*prices_norm.vti + (1-r)*prices_norm.iau).pct_change().std() for r in pct_vti], index=pct_vti)
})
ax = vti_iau.plot(secondary_y='risk')
ax.set_xlabel('Percent of portfolio in VTI')
ax.set_ylabel('Mean of daily percent change')
ax.right_ax.set_ylabel('Standard deviation of daily percent change')
title('VTI vs. IAU: Reward and Risk')


Out[19]:
<matplotlib.text.Text at 0x10c451990>

Plot risk vs. reward.


In [20]:
reward_over_risk = vti_iau.reward/vti_iau.risk
reward_over_risk.plot()
xlabel('Percent of portfolio in VTI')
ylabel('Reward/Risk')


Out[20]:
<matplotlib.text.Text at 0x10c9749d0>

Compute optimal allocation between VTI and IAU.

Reward/risk is maximized when the portfolio is approximately 53% VTI and 47% IAU.


In [21]:
reward_over_risk.argmax()


Out[21]:
0.98999999999999999

Repeat for VTI and BND.


In [22]:
vti_bnd = pandas.DataFrame({
    'reward': pandas.Series([(r*prices_norm.vti + (1-r)*prices_norm.bnd).pct_change().mean() for r in pct_vti], index=pct_vti),
    'risk': pandas.Series([(r*prices_norm.vti + (1-r)*prices_norm.bnd).pct_change().std() for r in pct_vti], index=pct_vti)
})
ax = vti_bnd.plot(secondary_y='risk')
ax.set_xlabel('Percent of portfolio in VTI')
ax.set_ylabel('Mean of daily percent change')
ax.right_ax.set_ylabel('Standard deviation of daily percent change')
title('VTI vs. BND: Reward and Risk')


Out[22]:
<matplotlib.text.Text at 0x10cc528d0>

In [23]:
reward_over_risk = vti_bnd.reward/vti_bnd.risk
reward_over_risk.plot()
xlabel('Percent of portfolio in VTI')
ylabel('Reward/Risk')


Out[23]:
<matplotlib.text.Text at 0x10cc72690>

Reward/risk is mazimized at 14% VTI and 86% BND.


In [24]:
reward_over_risk.argmax()


Out[24]:
0.41000000000000003

Find the maximum reward given a risk constraint.

Suppose that we require the standard deviation of the daily percent change to be less than 0.01.

Given this risk constraint, reward is maximized at 77% VTI and 23% BND.


In [25]:
vti_bnd[vti_bnd.risk < 0.01].iloc[-1]


Out[25]:
reward    0.000800
risk      0.007029
Name: 0.99, dtype: float64

Plot the trajectory of three hypothetical portfolios.


In [26]:
portfolios = pandas.DataFrame({
    'bnd heavy': 0.15*prices_norm.vti + 0.15*prices_norm.iau + 0.70*prices_norm.bnd,
    'iau heavy': 0.15*prices_norm.vti + 0.70*prices_norm.iau + 0.15*prices_norm.bnd,
    'vti heavy': 0.70*prices_norm.vti + 0.15*prices_norm.iau + 0.15*prices_norm.bnd,
    'mixed':     0.45*prices_norm.vti + 0.43*prices_norm.iau + 0.12*prices_norm.bnd,
})
portfolios.plot()


Out[26]:
<matplotlib.axes.AxesSubplot at 0x10c44c790>

Get the statistics of these portfolios.


In [27]:
portfolios.pct_change().describe()


Out[27]:
bnd heavy iau heavy mixed vti heavy
count 467.000000 467.000000 467.000000 467.000000
mean 0.000089 -0.000305 0.000161 0.000522
std 0.002568 0.007536 0.005703 0.005609
min -0.017204 -0.062706 -0.045805 -0.028828
25% -0.001041 -0.003362 -0.002408 -0.002371
50% 0.000293 0.000025 0.000579 0.000952
75% 0.001456 0.003327 0.003191 0.004036
max 0.012962 0.031291 0.022142 0.015467

So what does this all mean? I have no idea.