This tutorial demonstrates how to use Python tools and libraries for volatility modelling, specifically the generalized autoregressive conditional heteroskedasticity (GARCH) model.
When working with financial data, we often have to process time series, that is, sequences of observations at discrete points in time (e.g. the daily closing value of a stock market index). To work with such data, we can use parametric models in the time domain.
A GARCH model is essentially an autoregressive conditional heteroskedasticity (ARCH) model in which an autoregressive moving average (ARMA) structure is assumed for the error variance.
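In its GARCH(1,1) form, which is the one we will build below, the conditional variance follows the recursion

$$\sigma_t^2 = \omega + \alpha\, r_{t-1}^2 + \beta\, \sigma_{t-1}^2,$$

where $r_t$ is the (zero-mean) return at time $t$, $\omega > 0$, $\alpha, \beta \geq 0$, and $\alpha + \beta < 1$ ensures a finite unconditional variance.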
The first step of our tutorial is to get the data. We will use the adjusted closing price of 10 stocks (the adjustment accounts for dividends and splits).
The pandas_datareader library could prove useful for this task (note that such tools are very sensitive to changes in the target API), but that will be the subject of another tutorial. For now, let's just read the local files.
In [1]:
import pandas as pd
import string
raw_prices = pd.DataFrame(columns=['Date'])
stock_names = string.ascii_uppercase[:10]
for stock_name in stock_names:
    raw_data = pd.read_csv('data/' + stock_name + '.csv')[['Date', 'Adj Close']]
    raw_data.rename(columns={'Adj Close': stock_name}, inplace=True)
    raw_data.dropna(inplace=True)
    raw_prices = pd.merge(raw_prices, raw_data, how='outer', on='Date')
raw_prices.sort_values(by='Date', inplace=True)
raw_prices.head()
Out[1]:
We can see that some of the assets had prices back in the 1960s but that many others didn't. Let's drop the incomplete rows. We could have used inner joins instead of an outer join followed by dropping incomplete rows, but we wanted to show the complete picture first. (We leave it as an exercise to the reader to write the inner joins.)
In [2]:
raw_prices.dropna(inplace=True)
raw_prices.head()
Out[2]:
In [3]:
raw_prices.tail()
Out[3]:
This looks better. Let's restrict our analysis to the period from Mar 1, 1990, to Jun 30, 2017, in order to have complete calendar quarters. (We leave it as an exercise to the reader to verify that we have data for every trading day in that period.)
In [4]:
# ISO-formatted date strings compare lexicographically, so these comparisons
# keep Mar 1, 1990 through Jun 30, 2017
raw_prices = raw_prices[(raw_prices.Date > '1990-03') & (raw_prices.Date < '2017-07')]
raw_prices.head(2)
Out[4]:
In [5]:
raw_prices.tail(2)
Out[5]:
For simplicity, let's pretend that our portfolio consists of one share of each stock.
In [6]:
# Portfolio value: the sum of the adjusted closes (one share of each stock)
raw_prices['Close'] = raw_prices[list(stock_names)].sum(axis=1)
portfolio = raw_prices.reset_index()[['Date','Close']].copy()
portfolio.head()
Out[6]:
In [7]:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
%matplotlib inline
In [8]:
f, ax = plt.subplots(figsize=(12, 6))
sns.set(color_codes=True)
# sns.tsplot was removed in recent seaborn releases; a plain line plot is equivalent here
ax.plot(portfolio.Close)
xticks = range(0, len(portfolio), 365*2)
ax.set_xticks(xticks)
ax.set_xticklabels([pd.to_datetime(portfolio.Date).dt.date[i] for i in xticks], rotation=90)
ax.set_ylabel('Portfolio - Closing value')
Out[8]:
In [9]:
portfolio['Return'] = np.log(portfolio.Close / portfolio.Close.shift())  # daily log return
portfolio['R2'] = np.square(portfolio['Return'])  # squared return, our variance proxy
portfolio.tail()
Out[9]:
Note that we use the daily continuously compounded return (log return) instead of the daily simple rate of return as given by the pct_change method:
In [10]:
portfolio.Close.pct_change().tail()
Out[10]:
Both return series look very similar, and each has its pros and cons. The main reason we prefer the log return is that the compounded return over any horizon is simply the sum of the daily log returns, a fact that greatly simplifies model development.
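To see this additivity in action, here is a quick sanity check using the series defined above: the sum of the daily log returns equals the log return over the whole sample.
In [ ]:
# Log returns are time-additive: summing the daily log returns reproduces
# the log return over the entire period (simple returns do not add up this way)
total_from_daily = portfolio.Return[1:].sum()
total_direct = np.log(portfolio.Close.iloc[-1] / portfolio.Close.iloc[0])
np.isclose(total_from_daily, total_direct)  # expected: True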
In [11]:
def plot_autocorrelation(returns, y_label, zero_line=False):
    lags = np.arange(1, 101)
    return_autocorr = [returns.autocorr(lag=lag) for lag in lags]
    f, ax = plt.subplots(figsize=(12, 6))
    sns.set(color_codes=True)
    # sns.tsplot was removed in recent seaborn releases; plot the series directly
    ax.plot(lags, return_autocorr)
    ax.set_xticks(np.arange(lags[0]-1, lags[-1]+1, 10))
    ax.set_xlabel('Lag order')
    ax.set_ylabel(y_label)
    if zero_line:
        plt.plot([lags[0], lags[-1]], [0, 0], 'b')
In [12]:
plot_autocorrelation(portfolio.Return, 'Autocorrelation of daily returns',zero_line=True)
We see that there is very little autocorrelation for daily returns of the portfolio over this period. Let's look at the autocorrelation of the squared returns.
In [13]:
plot_autocorrelation(portfolio.R2, 'Autocorrelation of squared daily returns')
We notice a fair amount of autocorrelation in the squared returns: large moves tend to be followed by large moves, a phenomenon known as volatility clustering.
For the purpose of this tutorial, let's first build our own model. We consider one lag of squared returns and one lag of variance, i.e. a GARCH(1,1) specification.
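Concretely, with the timing convention used in the code below, the conditional variance is updated as

$$\sigma_{t+1}^2 = \omega + \alpha\, r_t^2 + \beta\, \sigma_t^2.$$

Instead of estimating $\omega$ directly, we use variance targeting: $\omega = \bar{\sigma}^2 (1 - \alpha - \beta)$, where $\bar{\sigma}^2$ is the sample variance of the returns. The parameters $(\alpha, \beta)$ are then chosen to minimize the Gaussian negative log-likelihood, up to additive constants:

$$\frac{1}{2} \sum_t \left( \ln \sigma_t^2 + \frac{r_t^2}{\sigma_t^2} \right).$$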
In [14]:
import scipy.optimize as opt

def compute_omega(unconditional_variance, alpha, beta):
    # Variance targeting: choose omega so that the model's long-run
    # variance matches the sample variance
    persistence = alpha + beta
    return unconditional_variance * (1 - persistence)

def compute_cond_var(data, initial_value, alpha, beta, omega=None):
    if omega is None:
        omega = compute_omega(initial_value, alpha, beta)
    cond_var = np.array([initial_value], dtype=np.float64)
    for time in range(1, len(data)):
        # GARCH(1,1) recursion; cond_var[t] is the variance forecast for the squared return at t+1
        cond_var = np.append(cond_var, omega + beta*cond_var[time-1] + alpha*data[time]**2)
    return cond_var
In [15]:
def fn(x, df, r_var):
    # Negative Gaussian log-likelihood (up to additive constants);
    # x = (alpha, beta), r_var = sample variance used as the starting value
    sig_2 = compute_cond_var(df.Return, r_var, x[0], x[1])[:-1]
    return (np.log(sig_2) + df.R2[1:]/sig_2).sum() / 2

res = opt.minimize(fn, [0.08, 0.82], bounds=[[0, 1], [0, 1]],
                   args=(portfolio, portfolio.Return.std()**2))
res
Out[15]:
In [16]:
compute_cond_var(portfolio.Return,portfolio.Return.std()**2,res.x[0],res.x[1])
Out[16]:
It turns out that a Python library, arch, exists for such models. Let's use it to build a GARCH(1,1) model and compare it to our own.
In [17]:
from arch import arch_model
# Returns are multiplied by 100 (i.e. expressed in percent), which helps the optimizer converge
garch = arch_model(100 * portfolio.Return[1:])
res = garch.fit(update_freq=5)
print(res.summary())
In [18]:
# Rescale omega back to the original return units (variance scales with 100**2)
omega = res.params['omega'] / (100**2)
omega
Out[18]:
In [19]:
compute_cond_var(portfolio.Return, portfolio.Return.std()**2,
                 res.params['alpha[1]'], res.params['beta[1]'], omega)
Out[19]:
Our model was quite simple. Let's make it slightly more flexible by taking the leverage effect into account (nonlinear asymmetric GARCH, also known as NGARCH).
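In the NGARCH(1,1) specification implemented below, the variance recursion becomes

$$\sigma_{t+1}^2 = \omega + \alpha \left( r_t - \theta\, \sigma_t \right)^2 + \beta\, \sigma_t^2,$$

so that, for $\theta > 0$, negative returns raise the variance more than positive returns of the same magnitude (the leverage effect). The persistence is now $\alpha (1 + \theta^2) + \beta$, and variance targeting gives $\omega = \bar{\sigma}^2 \left(1 - \alpha (1 + \theta^2) - \beta\right)$.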
In [20]:
import scipy.optimize as opt
from functools import partial
import sys

# Linear constraints for the optimization algorithm: 0 <= alpha, beta, theta
# and alpha, beta <= 1. The last (all-zero) row is a placeholder that is
# overwritten with the nonlinear persistence constraint in ieqcons.
a = np.array([[ 1.,  0.,  0.],
              [ 0.,  1.,  0.],
              [ 0.,  0.,  1.],
              [-1.,  0.,  0.],
              [ 0., -1.,  0.],
              [ 0.,  0.,  0.]])
b = [0., 0., 0., -1., -1., 0.]

def ieqcons(x, *args, **kwargs):
    ineq = kwargs['a'].dot(x) - kwargs['b']
    ineq[-1] = 1 - compute_persistence(x[0], x[1], x[2])
    return ineq

def f_ieqcons(a, b):
    return partial(ieqcons, a=a, b=b)

def compute_persistence(alpha, beta, theta=0.0):
    return alpha*(1 + theta**2) + beta

def compute_omega(unconditional_variance, persistence):
    # Variance targeting, as before
    return unconditional_variance * (1 - persistence)

def compute_cond_var(returns, unconditional_variance, alpha, beta, omega=None, theta=0.0):
    if omega is None:
        omega = compute_omega(unconditional_variance, compute_persistence(alpha, beta, theta))
    cond_var = np.array([unconditional_variance], dtype=np.float64)
    for time in range(1, len(returns)):
        # NGARCH recursion: theta shifts the shock, creating the leverage asymmetry
        cond_var = np.append(cond_var, omega + beta*cond_var[time-1]
                             + alpha*(returns[time-1] - theta*cond_var[time-1]**0.5)**2)
    return cond_var

def our_garch(x, returns, unconditional_variance):
    alpha = x[0]
    beta = x[1]
    theta = 0 if len(x) <= 2 else x[2]
    persistence = compute_persistence(alpha, beta, theta)
    # fmin_slsqp sometimes evaluates the objective at points that violate the
    # inequality constraints (values in ieqcons are < 0), so we guard against them here
    if 0 <= persistence <= 1:
        omega = compute_omega(unconditional_variance, persistence)
        cond_var = compute_cond_var(returns, unconditional_variance, alpha, beta, omega, theta)
        score = (np.log(cond_var) + np.square(returns)/cond_var).sum()
    else:
        score = sys.float_info.max
    return score

returns = portfolio.Return[1:].tolist()
unconditional_variance = portfolio.Return[1:].std()**2
Without the leverage term, we get the same results as before (without warnings this time):
In [21]:
res = opt.fmin_slsqp(our_garch, [0.08, 0.82, 0.0],
                     bounds=((0, 1), (0, 1), (0, 0)),  # theta is pinned to 0 by its bounds
                     f_ieqcons=f_ieqcons(a, b),
                     args=(returns, unconditional_variance), iter=100, acc=1e-10,
                     epsilon=1.4901161193847656e-08)
res
Out[21]:
Now including the leverage effect:
In [22]:
res = opt.fmin_slsqp(our_garch, [0.08,0.82,0.5], f_ieqcons=f_ieqcons(a,b),
args=(returns,unconditional_variance), iter=100, acc=1e-06,
epsilon=1.4901161193847656e-08)
res
Out[22]:
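As a final illustration, here is a sketch of how one might visualize the conditional volatility implied by the fitted NGARCH parameters. Note that the annualization factor of 252 trading days per year is an assumption, not something established above.
In [ ]:
# Sketch: plot the annualized conditional volatility implied by the NGARCH fit.
# fmin_slsqp returns the optimal parameter vector, here (alpha, beta, theta).
alpha, beta, theta = res
cond_var = compute_cond_var(returns, unconditional_variance, alpha, beta, theta=theta)
ann_vol = np.sqrt(252 * cond_var)  # assumes ~252 trading days per year
f, ax = plt.subplots(figsize=(12, 6))
ax.plot(ann_vol)
ax.set_ylabel('Portfolio - Annualized conditional volatility')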