Stock analysis: returns and volatility

This notebook aims to explore the Markowitz theory on modern portfolios with a little of code and a little of maths. The modern portfolio theory seeks to build a portfolio of different assets in such way that increases the returns and reduces the risk of holding the portfolio. In almost any treatment of risk I have read, the risk is repressented by the standar deviation of the returns and is called volatility. Let's assume that the random walk for stocks is of the form

$$ S_{t+1} = S_t \mu \Delta t + S_t \sigma \epsilon \sqrt{\Delta t} $$

where $S$ is the stock price at time $t$, $\mu$ and $\sigma$ are mean and standar deviation, $\Delta t$ the time step and $\epsilon$ a normal distributed random variable with mean zero and variance one. Under this assumption, $\sigma$ indicates how scattered are the returns in the future and under certain conditions, there is a probability that it can lead to a loss due to this scattering. So, an investor seeks to maximize returns, but keeping volatility at bay, so to reduce the chance of lossing money.

For this analysis we will use only basic numerical libraries, to show how some algorithms work without much black-box and we will work with real stock data.


In [1]:
import numpy as np
import pandas as pd
import datetime as dt
import matplotlib.pyplot as plt
%matplotlib inline
plt.figure(figsize=(12,12))


Out[1]:
<matplotlib.figure.Figure at 0x7fec4544bb38>
<matplotlib.figure.Figure at 0x7fec4544bb38>

The volatility calculation is made using the return for each day, given the close price of any stock. As we are interested in the annual volatility, this needs to be scaled by a factor $\sqrt{n}$, where $n$ is the number of working days in a year (I will assume 260)


In [2]:
# beware! it takes a numpy array
def calculate_volatility(df, t=10):  # default timespan for volatility
    rets = np.diff(np.log(df))
    vol = np.zeros(rets.size - t)
    for i in range(vol.size):
        vol[i] = np.std(rets[i:(t+i)])
    return np.sqrt(260)*vol

Using the scripts on utils it's possible to download the data from Yahoo! Finance for the following stocks (by default):

  • AAPL
  • AMZN
  • GOOG
  • MSFT

We will use the function defined above to plot the running volatility of those stocks on a 63 days window.


In [3]:
symbols = np.loadtxt("../utils/symbols", dtype=str)
n_sym = symbols.size

data = {}
for s in map(lambda x: x[2:-1], symbols):
    data[str(s)] = pd.read_csv("../data/{}.csv".format(str(s)))
    data[str(s)].sort_values(by="Date", ascending=True, inplace=True)

    
t = data["AAPL"].Date[64:]
t = [dt.datetime.strptime(d,'%Y-%m-%d').date() for d in t]
plt.subplot(411)
plt.plot(t, calculate_volatility(data["AAPL"].Close.values, 63))
plt.subplot(412)
plt.plot(t, calculate_volatility(data["AMZN"].Close.values, 63))
plt.subplot(413)
plt.plot(t, calculate_volatility(data["GOOG"].Close.values, 63))
plt.subplot(414)
plt.plot(t, calculate_volatility(data["MSFT"].Close.values, 63))
plt.tight_layout()


I forgot to check if in the time in which those stocks are defined there was any splitting (that's why it's important explorative analysis).

AAPL splitted on Febbruary 28 2005 by 2:1 and June 9 2014 by 7:1, while GOOG splitted on April 2 2014 (not a real split, it generated another kind of stock, splitting by 2:1)


In [4]:
def repair_split(df, info):   # info is a list of tuples [(date, split-ratio)]
    temp = df.Close.values.copy()
    for i in info:
        date, ratio = i
        mask = np.array(df.Date >= date)
        temp[mask] = temp[mask]*ratio
    return temp


aapl_info = [("2005-02-28", 2), ("2014-06-09", 7)]
plt.figure()
plt.subplot(411)
plt.plot(t, calculate_volatility(repair_split(data["AAPL"], aapl_info), 63))
plt.subplot(412)
plt.plot(t, calculate_volatility(data["AMZN"].Close.values, 63))
plt.subplot(413)
plt.plot(t, calculate_volatility(repair_split(data["GOOG"], [("2014-03-27", 2)]), 63))
plt.subplot(414)
plt.plot(t, calculate_volatility(data["MSFT"].Close.values, 63))
plt.tight_layout()


Now let's build a data structure with only daily return for every stock. The model for returns is given by a compounding of the daily return, like

$$ S_{n+1} = S_{n}e^{r} $$

In [5]:
rets = {key:np.diff(np.log(df.Close.values)) for key, df in data.items()}

In [6]:
corr = np.corrcoef(list(rets.values()))
rets.keys(), corr


Out[6]:
(dict_keys(['GOOG', 'MSFT', 'AMZN', 'AAPL']),
 array([[ 1.        ,  0.40808215,  0.40073829,  0.19216722],
        [ 0.40808215,  1.        ,  0.44450401,  0.21672978],
        [ 0.40073829,  0.44450401,  1.        ,  0.20380946],
        [ 0.19216722,  0.21672978,  0.20380946,  1.        ]]))

In [7]:
plt.xticks(range(4), rets.keys(), rotation=45)
plt.yticks(range(4), rets.keys())
plt.imshow(corr, interpolation='nearest')


Out[7]:
<matplotlib.image.AxesImage at 0x7fec1a512b00>

Too bad there arent negative correlations, after all, they all play more or less on the same areas. So, given those stocks, what combination yields the best portfolio (bigger return, lower risk) with a given ammount of investment capital? The optimization problem isn't so hard numericaly, but its possible to derive an analytical expression. What follows is know as modern portfolio theory.

First, let's explore what I think is a pretty beautifull result (because I didn't expected it). We will allocate randomly those stocks in order to generate a lot of random portfolios with the same initial value.


In [8]:
def normalize(v):
    return v/v.sum()

portfolios = np.random.uniform(0, 1, size=(1000, 4))        # 1000 random portfolios
portfolios = np.apply_along_axis(normalize, 1, portfolios)  # normalize so that they sum 1
# total returns per dollar per portfolio
total_returns = np.dot(portfolios, list(rets.values()))

mean = 260*total_returns.mean(axis=1)
std = np.sqrt(260)*total_returns.std(axis=1)
plt.scatter(std, mean)
plt.xlabel("Annual volatility")
plt.ylabel("Annual returns")


Out[8]:
<matplotlib.text.Text at 0x7fec1a483c50>

It's easy to see that there is a hard superior limit on the points for a given volatility. That curve is called efficient frontier and it represents the best portfolio allocation (less risk for an expected return or bigger returns for a choice of risk). It seems somewhat to a parabolla. As said before, it's possible to derive some analytical expressions for that curve.

Let $w_i$ be a weight vector that represents the ammount of stock $i$ and let its sum be one (scale isn't important here, so it can be put as the initial investment, but the number one is easy and I don't carry more symbols).

The expected return of the i-th stock is $r_i$, so the total return for a portfolio is

$$ r = w_i r^i $$

(summation of same indices is implied). In the same way, the total volatility (standar deviation) of the portfolio is

$$ \sigma^2 = K_i^j w_j w^i $$

where $K$ is the covariance matrix, and the condition on the weights is expressed as

$$ w_i 1^i = 1 $$

where $1^i$ is a vector of ones. If we choice an expected return, we can build an optimal portfolio by minimizing the standar deviation. So the problem becomes

$$ min\left( K_i^j w_j w^i \,\,\,|\,\,\, w_i 1^i = 1,\,\,\, r = w_i r^i \right) $$

the right side I think may bring some confusion: the $w_i$ isn't bounded, only the $r$. In fact, if $r^i$ is a n-dimentional vector, for a given $r$ there is a full subspace of dimension $n-1$ of weights. The Lagrange multiplier problem can be solved by minimizing

$$ \Lambda(w, \lambda) = K_j^i w_i w^j + \lambda_1 \left( w_i 1^i - 1 \right) + \lambda_2 \left( w_i r^i - r \right) $$$$ \frac{\partial\Lambda}{\partial w_i} = 2 K_j^i w^j + \lambda_1 1^i + \lambda_2 r^i = 0 $$

and solving for $w^j$ yields

$$ w^j = -\frac{1}{2} (K_j^i)^{-1} \left( \lambda_1 1^i + \lambda_2 r^i \right) $$

the term between parentesis can be put in a concise way as

$$ (\lambda \cdot R)^T $$

where $\lambda$ is a 2-dimensional row vector and R a $2 \times q$ matrix (with q the number of stocks)

$$ \lambda = (\lambda_1 \,\,\,\,\,\lambda_2) $$$$ R = (1^i\,\,\,\,\,r^i)^T $$

this way, the bounding conditions can be put also as

$$ R w^j = (1\,\,\,\,\,r)^T $$

In this last expression, the weight can be changed with the solution above, returning

$$ -\frac{1}{2}\lambda \cdot \left[ R (K_j^i)^{-1} R^T \right] = (1\,\,\,\,\,r) $$

calling $M$ that messy $2\times 2$ matrix in brackets, it's possible to solve $\lambda$ as

$$ \lambda = -(2\,\,\,\,\,2r) \cdot M^{-1} $$

It's easy to check that the matrix $M$, and hence also it's inverse, are symmetric. And with this, the variance can be (finaly) solved:

$$ \sigma^2 = K_i^j w_j w^i = \frac{1}{4}\lambda R K^{-1} K K^{-1} R^T \lambda^T $$$$ = \frac{1}{4}\lambda R K^{-1} R^T \lambda^T = \frac{1}{4}\lambda M \lambda^T $$$$ = (1\,\,\,\,\,r) M^{-1} (1\,\,\,\,\,r)^T $$

That will be a very long calculation. I will just put the final result (remember, that formula is a scalar). The elements of M are

$$ M_{00} = 1_i (K_j^i)^{-1} 1^i $$$$ M_{11} = r_i (K_j^i)^{-1} r^i $$$$ M_{10} = M_{01} = 1_i (K_j^i)^{-1} r^i $$

and the minimal variance, in function of the desidered return is

$$ \sigma^2(r) = \frac{M_{00} r^2 - 2M_{01} r + M_{11}}{M_{00}M_{11} - M_{01}^2} $$

and the weights are

$$ w^j = (K_j^i)^{-1} R^T M^{-1} (1\,\,\,\,\,r)^T $$

I was wrong, the plot of variance-mean is a parabola, but it seems that volatility-mean is a hyperbolla.

So, returning to code:


In [9]:
K = 260*np.cov(list(rets.values()))  # annual covariance
R = np.array([np.ones(4), 260*np.mean(list(rets.values()), axis=1)])
x = np.array([1, 0.15])  # I will select a 15% of annual return
M = np.dot(R, np.dot(np.linalg.inv(K), R.transpose()))
variance = np.dot(x, np.dot(np.linalg.inv(M), x.transpose()))
volatility = np.sqrt(variance)
weigths = np.dot(np.linalg.inv(K), np.dot(R.transpose(), np.dot(np.linalg.inv(M), x.transpose())))

In [10]:
volatility, weigths


Out[10]:
(0.27167570703300697,
 array([ 0.21001437,  0.42240463,  0.36320754,  0.00437345]))

For example, at this day (24 Jannuary 2017) the market closed with the following prices for the stocks in this list:

  • GOOG: 823.87
  • MSFT: 63.52
  • AMZN: 822.44
  • AAPL: 119.97

with the assets allocation suggested by the optimum, if I have $10000 to invest, I will need to:

  • Buy 2 stocks of GOOG (2.5)
  • Buy 66 stocks of MSFT (66.5)
  • Buy 4 stocks of AMZN (4.4)
  • Don't buy AAPL (0.4)
  • Put the remaining $870.18 to take LIBOR rate? or to rebalance the portfolio? options?

Another kind of optimization that it's possible is to maximize the Sharpe ratio, defined as the ration of the expected return and the volatility. One can think of it as the returns for unit of risk, so maximizing it yields an optimization indeed. We know that any optimal portfolio is in the efficient frontier, so having an expression of this curve we only need to maximize

$$ S = \frac{r}{\sigma} $$

The expression for the volatility in function of the desidered return can be put as

$$ \sigma^2 = ar^2 + br + c $$

As we are interested only on the optimal curve, we will consider only the right side of this parabolla. This way, we have an additional advantage of having an invertible function on its domain. The return then have solution

$$ r = \frac{-b + \sqrt{b^2 - 4a(c - \sigma^2)}}{2a} $$

(seems like cheating, I know...), so the Sharpe ratio becomes

$$ S = \frac{-b + \sqrt{b^2 - 4a(c - \sigma^2)}}{2a \sigma} $$

do note that the part beyond square root is always positive thanks to the Bessel inequality, at least for this problem, so the Sharpe ratio will always defined positive.

Doing the derivative of S with respect to $\sigma$ and solving the problem for the maximum the solutions are

$$ \sigma = \pm \sqrt{\frac{4ac^2 - b^2 c}{b^2}} $$

and we take the positive value. With the real values back we obtain a very simple expression for the volatility:

$$ \sigma = \sqrt{\frac{M_{11}}{M_{01}^2}} $$

and the return:

$$ r = \frac{M_{00}}{M_{10}} $$

With the volatility the other quantities can be calculated as well using the formulas, so let's return again to code:


In [11]:
volatility = np.sqrt(M[1,1]/(M[0,1]*M[0,1]))
returns = M[1,1]/M[1,0]
x = np.array([1, returns])
weigths = np.dot(np.linalg.inv(K), np.dot(R.transpose(), np.dot(np.linalg.inv(M), x.transpose())))
returns, volatility, weigths


Out[11]:
(0.25450263575661258,
 0.40190208713762882,
 array([ 0.20758566, -0.05706873,  0.928021  , -0.07853793]))

This time the algorithm ask to buy a lot of AMZN, some of GOOG and sell a little of the others. With the same $10000 the portfolio distribution would be:

  • Buy 2 stocks of GOOG (2.5)
  • Sell 9 stocks of MSFT (8.9)
  • Buy 11 stocks of AMZN (11.3)
  • Sell 6 stocks of AAPL (6.5)
  • With remaining $596.92 to play

It's possible to visualize the Sharpe factor for differents volatilities, rewriting the equation of the returns in function of the volatility as

$$ r = \frac{M_{10} + \sqrt{det(M)}\sqrt{M_{00}\sigma^2 - 1}}{M_{00}} $$

and hence the Sharpe ratio as

$$ S = \frac{M_{10} + \sqrt{det(M)}\sqrt{M_{00}\sigma^2 - 1}}{\sigma M_{00}} $$

In [12]:
sig = np.linspace(0.28, 0.6, 100)
sharpe = (M[1,0] + np.sqrt(np.linalg.det(M))*np.sqrt(sig*sig*M[0,0] - 1))/(sig*M[0,0])
plt.plot(sig, sharpe)


Out[12]:
[<matplotlib.lines.Line2D at 0x7fec1a83ed30>]

In [ ]: