Borrowed and updated from Michael C. Grant, Continuum Analytics
In [1]:
import pandas as pd
import numpy as np
from math import sqrt
import sys
from bokeh.plotting import figure, show, ColumnDataSource, save
from bokeh.models import Range1d, HoverTool
from bokeh.io import output_notebook, output_file
import quandl
from gurobipy import *
# output_notebook() #To enable Bokeh output in notebook, uncomment this line
First of all, we need some data. For that we use Quandl, so you'll want the quandl package. It isn't strictly necessary, since pulling from the API is simple with or without it, but it makes things a bit easier and saves a few steps. The quandl package can be installed from PyPI (pip install quandl). With quandl set up, the next step is to choose some stocks to import. The following is a fairly arbitrary selection.
In [2]:
APIToken = "xxx-xxxxxx"
quandlcodes = ["GOOG/NASDAQ_AAPL.4","WIKI/GOOGL.4", "GOOG/NASDAQ_CSCO.4","GOOG/NASDAQ_FB.4",
"GOOG/NASDAQ_MSFT.4","GOOG/NASDAQ_TSLA.4","GOOG/NASDAQ_YHOO.4","GOOG/PINK_CSGKF.4",
"YAHOO/F_EOAN.4","YAHOO/F_BMW.4","YAHOO/F_ADS.4","GOOG/NYSE_ABB.4","GOOG/VTX_ADEN.4",
"GOOG/VTX_NOVN.4","GOOG/VTX_HOLN.4","GOOG/NYSE_UBS.4", "GOOG/NYSE_SAP.4", "YAHOO/SW_SNBN.4",
"YAHOO/IBM.4", "YAHOO/RIG.4" , "YAHOO/CTXS.4", "YAHOO/INTC.4","YAHOO/KO.4",
"YAHOO/NKE.4","YAHOO/MCD.4","YAHOO/EBAY.4","GOOG/VTX_NESN.4","YAHOO/MI_ALV.4","YAHOO/AXAHF.4",
"GOOG/VTX_SREN.4"]
The command to import those stocks is quandl.get(). With trim_start and trim_end we can choose the desired time horizon.
In [3]:
data = quandl.get(quandlcodes, authtoken=APIToken, trim_start='2009-01-01', trim_end='2016-11-09')
Let's now calculate the growth rates and some stats:
In [4]:
GrowthRates = data.pct_change()*100
syms = GrowthRates.columns
Sigma = GrowthRates.cov()
stats = pd.concat((GrowthRates.mean(),GrowthRates.std()),axis=1)
stats.columns = ['Mean_return', 'Volatility']
extremes = pd.concat((stats.idxmin(),stats.min(),stats.idxmax(),stats.max()),axis=1)
extremes.columns = ['Minimizer','Minimum','Maximizer','Maximum']
stats
Out[4]:
As we move towards our Markowitz portfolio designs it makes sense to view the stocks on a mean/variance scatter plot.
In [5]:
fig = figure(tools="pan,box_zoom,reset,resize")
source = ColumnDataSource(stats)
hover = HoverTool(tooltips=[('Symbol','@index'),('Volatility','@Volatility'),('Mean return','@Mean_return')])
fig.add_tools(hover)
fig.circle('Volatility', 'Mean_return', size=5, color='maroon', source=source)
fig.text('Volatility', 'Mean_return', text='index', text_font_size='10px', x_offset=3, y_offset=-2, source=source)
fig.xaxis.axis_label='Volatility (standard deviation)'
fig.yaxis.axis_label='Mean return'
output_file("portfolio.html")
show(fig)
Time to bring in the big guns. Expressed in mathematical terms, we will be solving models in this form:
$$\begin{array}{lll} \text{minimize} & x^T \Sigma x \\ \text{subject to} & \sum_i x_i = 1 & \text{fixed budget} \\ & r^T x = \gamma & \text{fixed return} \\ & x \geq 0 \end{array}$$

In this model, the optimization variable $x\in\mathbb{R}^N$ is a vector representing the fraction of the budget allocated to each stock; that is, $x_i$ is the amount allocated to stock $i$. The parameters of the model are the mean returns $r$, the covariance matrix $\Sigma$, and the target return $\gamma$. We will sweep $\gamma$ between the worst and best mean returns observed above and, for each value, compute the portfolio that achieves that target return with as little risk as possible.
The covariance matrix $\Sigma$ merits some explanation. Along the diagonal, it contains the squares of the volatilities (standard deviations) computed above. But off the diagonal, it contains measures of the correlation between two stocks: that is, whether they tend to move in the same direction (positive correlation), in opposite directions (negative correlation), or a mixture of both (small correlation). This entire matrix is computed with a single call to Pandas.
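To make that concrete, here is a small optional check, restricted to dates where every series has a value so the arithmetic is exact: each covariance entry is the corresponding correlation scaled by the two volatilities. This is just an illustrative sketch using the objects computed above.
# Optional check: cov_ij = corr_ij * vol_i * vol_j (on rows with no missing data)
G = GrowthRates.dropna()
rebuilt = G.corr() * np.outer(G.std(), G.std())
print(np.allclose(rebuilt.values, G.cov().values))   # expect True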
We are not solving just one model here but thousands of them, one for each return target. One very nice feature of the Gurobi Python interface is that we can build a single "base" model and reuse it for every scenario, simply updating the return constraint between solves.
First, let's initialize the model and declare the variables. We create one non-negative variable per stock (the model is long-only, so short positions are excluded by the bound $x \geq 0$), and collect the variables in a Pandas DataFrame for easy organization. Another nice feature of Gurobi's Python interface is that the variable objects can be used in simple linear and quadratic expressions using familiar Python syntax.
In [6]:
# Instantiate our model
m = Model("portfolio")
# Create one variable for each stock
portvars = [m.addVar(name=symb,lb=0.0) for symb in syms]
# Cap the allocation to the pink-sheet listing at 50% of the budget by replacing its variable
portvars[7] = m.addVar(name='GOOG/PINK_CSGKF - Close', lb=0.0, ub=0.5)
portvars = pd.Series(portvars, index=syms)
portfolio = pd.DataFrame({'Variables':portvars})
# Commit the changes to the model
m.update()
# The total budget
p_total = portvars.sum()
# The mean return for the portfolio
p_return = stats['Mean_return'].dot(portvars)
# The (squared) volatility of the portfolio
p_risk = Sigma.dot(portvars).dot(portvars)
# Set the objective: minimize risk
m.setObjective(p_risk, GRB.MINIMIZE)
# Fix the budget
m.addConstr(p_total, GRB.EQUAL, 1)
# Select a simplex algorithm (to ensure a vertex solution)
m.setParam('Method', 1)
m.optimize()
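As an aside, the compact p_risk expression above (built with Pandas' dot) is just the usual double sum $\sum_{i,j} \Sigma_{ij} x_i x_j$; written out explicitly with gurobipy's quicksum it would look like the following sketch (equivalent, only slower to construct):
# Illustrative: the same quadratic risk expression as an explicit double sum
p_risk_explicit = quicksum(Sigma.loc[i, j] * portvars[i] * portvars[j]
                           for i in syms for j in syms)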
In [7]:
portfolio['Minimum risk'] = portvars.apply(lambda x:x.getAttr('x'))
portfolio
Out[7]:
In [8]:
# Add the return target
ret50 = 0.5 * extremes.loc['Mean_return','Maximum']
fixreturn = m.addConstr(p_return, GRB.EQUAL, ret50)
m.optimize()
portfolio['50% Max'] = portvars.apply(lambda x:x.getAttr('x'))
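As a quick optional sanity check (a sketch using the expressions defined earlier), the optimized portfolio's mean return should now sit exactly at the 50% target:
# Optional sanity check: achieved mean return vs. the target
print(p_return.getValue(), ret50)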
Now we're going to sweep the return target over a range of values, from the smallest mean return to the largest. For each target we construct the minimum-risk portfolio. This gives us a tradeoff curve known in the business as the efficient frontier, or the Pareto-optimal curve.
Note that we're using the same model object we've already constructed! All we have to do is set the return target and re-optimize for each value of interest.
In [9]:
m.setParam('OutputFlag',False)
# Determine the range of returns. Make sure to include the lowest-risk
# portfolio in the list of options
minret = extremes.loc['Mean_return','Minimum']
maxret = extremes.loc['Mean_return','Maximum']
# Mean return of the minimum-risk portfolio found earlier
riskret = sum(portfolio['Minimum risk'] * stats['Mean_return'])
returns = np.unique(np.hstack((np.linspace(minret,maxret,10000),riskret)))
# Iterate through all returns
risks = returns.copy()
for k in range(len(returns)):
    fixreturn.rhs = returns[k]
    m.optimize()
    risks[k] = sqrt(p_risk.getValue())
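If you also want the allocations along the frontier, not just the risk values, the same loop can be extended to record them. A sketch (frontier_weights is a name introduced here purely for illustration):
# Illustrative variant of the loop above that also records the allocations
frontier_weights = np.zeros((len(returns), len(syms)))
for k in range(len(returns)):
    fixreturn.rhs = returns[k]
    m.optimize()
    risks[k] = sqrt(p_risk.getValue())
    frontier_weights[k, :] = [v.x for v in portvars]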
In [10]:
fig = figure(tools="pan,box_zoom,reset,resize")
# Individual stocks
fig.circle(stats['Volatility'], stats['Mean_return'], size=5, color='maroon')
fig.text(stats['Volatility'], stats['Mean_return'], text=list(syms), text_font_size='10px', x_offset=3, y_offset=-2)
# Divide the efficient frontier into two sections: returns below
# that of the minimum-risk portfolio, and returns above it.
tpos_n = returns >= riskret
tneg_n = returns <= riskret
fig.line(risks[tneg_n], returns[tneg_n], color='red')
fig.line(risks[tpos_n], returns[tpos_n], color='blue')
fig.xaxis.axis_label='Volatility (standard deviation)'
fig.yaxis.axis_label='Mean return'
fig.legend.location='bottom_left'
output_file("efffront.html")
show(fig)