# Volatility Modelling in Python

This tutorial demonstrates how to use Python tools and libraries for volatility modelling, focusing on the generalized autoregressive conditional heteroscedasticity (GARCH) model.

Working with financial data, we are often faced with processing time series, that is, sequences of data points indexed in time order (e.g. the daily closing value of a stock market index). To model such data, we can use parametric models in the time domain.

A GARCH model is essentially an autoregressive conditional heteroscedasticity (ARCH) model in which an autoregressive moving average (ARMA) structure is assumed for the error variance.
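In its simplest form, GARCH(1,1), the conditional variance of the returns follows the recursion below, where one lag of the squared return and one lag of the variance enter the equation:

``````
\sigma_t^2 = \omega + \alpha\, r_{t-1}^2 + \beta\, \sigma_{t-1}^2
``````

The process is stationary when the persistence alpha + beta is strictly less than one. This is the model we will estimate below.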

## Getting the data

The first step is to get the data. For this tutorial, we will use the adjusted closing price of 10 stocks (the adjustment accounts for dividends and splits).

The pandas_datareader library could prove useful for this task (note that such tools are very sensitive to changes in the target API). However, this will be the subject of some other tutorial. For now, let's just read the local files.
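For reference, here is a minimal sketch of what fetching one series with pandas_datareader could look like (the ticker, source, and date range are placeholders, and the remote API may well have changed or broken since this was written):

``````
# Hypothetical sketch -- not used in this tutorial
import pandas_datareader.data as web

prices = web.DataReader('AAPL', 'yahoo', '1990-01-01', '2017-07-06')
adj_close = prices['Adj Close']
``````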

``````

In [1]:

import pandas as pd
import string

raw_prices = pd.DataFrame(columns=['Date'])
stock_names = string.ascii_uppercase[:10]

for stock_name in stock_names:
    # Hypothetical file layout: one CSV per stock ('A.csv', ..., 'J.csv')
    # with 'Date' and 'Adj Close' columns
    raw_data = pd.read_csv(stock_name + '.csv', usecols=['Date', 'Adj Close'])
    raw_data.rename(columns={'Adj Close': stock_name}, inplace=True)
    raw_data.dropna(inplace=True)
    raw_prices = pd.merge(raw_prices, raw_data, how='outer', on='Date')

raw_prices.sort_values(by='Date', inplace=True)
raw_prices.head()

``````
``````

Out[1]:

            Date   A         B    C         D    E    F    G    H    I    J
8450  1962-01-02 NaN  0.202636  NaN  0.059620  NaN  NaN  NaN  NaN  NaN  NaN
8451  1962-01-03 NaN  0.206688  NaN  0.060421  NaN  NaN  NaN  NaN  NaN  NaN
8452  1962-01-04 NaN  0.204662  NaN  0.060421  NaN  NaN  NaN  NaN  NaN  NaN
8453  1962-01-05 NaN  0.200609  NaN  0.060621  NaN  NaN  NaN  NaN  NaN  NaN
8454  1962-01-08 NaN  0.201116  NaN  0.060421  NaN  NaN  NaN  NaN  NaN  NaN

``````

We can see that some of the assets have prices going back to the 1960s but that many others don't. Let's drop the incomplete rows. We could have used inner joins instead of an outer join followed by dropping incomplete rows, but we wanted to show the complete picture. (We leave it as an exercise to the reader to rework the loading loop with inner joins; one possible shape is sketched below.)
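A sketch of the inner-join variant, reusing the hypothetical one-CSV-per-stock layout from above:

``````
# Successive inner joins keep only the dates present for every stock
raw_prices = None
for stock_name in stock_names:
    raw_data = pd.read_csv(stock_name + '.csv', usecols=['Date', 'Adj Close'])
    raw_data.rename(columns={'Adj Close': stock_name}, inplace=True)
    raw_data.dropna(inplace=True)
    if raw_prices is None:
        raw_prices = raw_data
    else:
        raw_prices = pd.merge(raw_prices, raw_data, how='inner', on='Date')
``````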

``````

In [2]:

raw_prices.dropna(inplace=True)
raw_prices.head()

``````
``````

Out[2]:

            Date         A          B         C         D         E         F         G         H         I         J
1550  1990-02-16  3.657441  12.040095  0.064382  6.770798  4.859589  0.456696  3.713418  1.243683  0.841391  3.756277
1551  1990-02-20  3.556310  11.776785  0.066552  6.647411  4.784537  0.467182  3.615980  1.222314  0.815098  3.688598
1552  1990-02-21  3.539457  11.585296  0.065105  6.593432  4.690724  0.447959  3.605154  1.260778  0.817727  3.637836
1553  1990-02-22  3.488895  11.609239  0.065829  6.562585  4.690724  0.449124  3.551025  1.291745  0.815098  3.595536
1554  1990-02-23  3.438330  11.489553  0.065467  6.462332  4.615672  0.454366  3.518545  1.253250  0.809839  3.527858

``````
``````

In [3]:

raw_prices.tail()

``````
``````

Out[3]:

            Date          A           B          C           D           E          F          G           H          I           J
8444  2017-06-28  89.328644  199.619995  31.783060  106.070923  154.300003  69.800003  76.510002  153.229996  34.200001  133.820007
8445  2017-06-29  90.651367  197.460007  31.119259  104.929413  153.130005  68.489998  75.930000  152.160004  33.540001  132.639999
8446  2017-06-30  90.900002  197.750000  31.010277  105.465424  153.160004  68.930000  75.680000  153.399994  33.740002  132.289993
8447  2017-07-03  92.750000  198.589996  31.039999  106.666489  152.500000  68.169998  75.360001  154.009995  33.459999  132.899994
8449  2017-07-06  93.379997  201.479996  30.719999  103.349998  153.089996  68.570000  75.470001  152.050003  33.630001  132.520004

``````

This looks better. Let's restrict our analysis to the period from Mar 1, 1990, to Jun 30, 2017, so that we have complete calendar quarters. (We leave it as an exercise to the reader to verify that we have data for every trading day in that period.)

``````

In [4]:

raw_prices = raw_prices[(raw_prices.Date > '1990-03') & (raw_prices.Date < '2017-07')]
raw_prices.head(2)

``````
``````

Out[4]:

            Date        A          B         C         D         E         F         G         H         I         J
1558  1990-03-01  3.43833  12.447020  0.066552  6.670546  4.739837  0.466017  3.864985  1.364459  0.854538  3.654757
1559  1990-03-02  3.50575  12.423082  0.067275  6.732243  4.777454  0.483492  3.908290  1.364459  0.857166  3.722438

``````
``````

In [5]:

raw_prices.tail(2)

``````
``````

Out[5]:

            Date          A           B          C           D           E          F      G           H          I           J
8445  2017-06-29  90.651367  197.460007  31.119259  104.929413  153.130005  68.489998  75.93  152.160004  33.540001  132.639999
8446  2017-06-30  90.900002  197.750000  31.010277  105.465424  153.160004  68.930000  75.68  153.399994  33.740002  132.289993

``````

For simplicity, let's pretend that our portfolio consists of one share of each stock.

``````

In [6]:

# Value of a portfolio holding one share of each stock
raw_prices['Close'] = raw_prices[list(stock_names)].sum(axis=1)
portfolio = raw_prices.reset_index()[['Date','Close']].copy()
portfolio.head()

``````
``````

Out[6]:

         Date      Close
0  1990-03-01  37.567041
1  1990-03-02  37.841649
2  1990-03-05  37.961847
3  1990-03-06  38.351505
4  1990-03-07  38.399774

``````

## Plotting the data

``````

In [7]:

import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
import seaborn as sns

``````
``````

In [8]:

f, ax = plt.subplots(figsize=(12, 6))
sns.set(color_codes=True)
sns.tsplot(data=portfolio.Close, ax=ax)

# Place a tick every 730 trading days and label it with the corresponding date
dates = pd.to_datetime(portfolio.Date).dt.date
xticks = range(0, len(portfolio), 365*2)
ax.set_xticks(xticks)
ax.set_xticklabels([dates[i] for i in xticks], rotation=90)
ax.set_ylabel('Portfolio - Closing value')

``````
[Figure: portfolio closing value, 1990-2017]

## Computing returns and squared returns

``````

In [9]:

portfolio['Return'] = np.log(portfolio.Close / portfolio.Close.shift())
portfolio['R2'] = np.square(portfolio['Return'])
portfolio.tail()

``````
``````

Out[9]:

            Date        Close    Return        R2
6884  2017-06-26  1045.294773 -0.001264  0.000002
6885  2017-06-27  1042.395983 -0.002777  0.000008
6886  2017-06-28  1048.662634  0.005994  0.000036
6887  2017-06-29  1040.050053 -0.008247  0.000068
6888  2017-06-30  1042.325696  0.002186  0.000005

``````

Note that we use the daily continuously compounded return (log return) instead of the daily simple rate of return as given by the pct_change method:

``````

In [10]:

portfolio.Close.pct_change().tail()

``````
``````

Out[10]:

6884   -0.001263
6885   -0.002773
6886    0.006012
6887   -0.008213
6888    0.002188
Name: Close, dtype: float64

``````

The two measures look very similar, and each has its pros and cons. The main reason we prefer the log return is that it aggregates over time by simple addition: the compounded return over any horizon is the sum of the daily log returns, which simplifies the development of models.
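In symbols, with P_t the closing value of the portfolio, the daily log return and its additivity over an n-day horizon read:

``````
r_t = \ln\frac{P_t}{P_{t-1}}, \qquad \sum_{t=1}^{n} r_t = \ln\frac{P_n}{P_0}
``````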

## Autocorrelation

The autocorrelation is the correlation of a series with a delayed, or shifted, copy of itself. It is a useful tool for detecting linear dynamics in time series analysis.
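Formally, for a stationary series r_t, the autocorrelation at lag k is:

``````
\rho_k = \mathrm{Corr}(r_t,\, r_{t-k}) = \frac{\mathrm{Cov}(r_t,\, r_{t-k})}{\mathrm{Var}(r_t)}
``````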

Let's look at the autocorrelation of daily returns.

``````

In [11]:

def plot_autocorrelation(returns, y_label, zero_line=False):
    # Autocorrelation of the series for lags 1 through 100
    lags = np.arange(1, 101)
    return_autocorr = [returns.autocorr(lag=lag) for lag in lags]

    f, ax = plt.subplots(figsize=(12, 6))
    sns.set(color_codes=True)
    sns.tsplot(data=return_autocorr, time=lags, ax=ax)
    ax.set_xticks(np.arange(lags[0]-1, lags[-1]+1, 10))
    ax.set_xlabel('Lag order')
    ax.set_ylabel(y_label)
    if zero_line:
        plt.plot([lags[0], lags[-1]], [0, 0], 'b')

``````
``````

In [12]:

plot_autocorrelation(portfolio.Return, 'Autocorrelation of daily returns',zero_line=True)

``````
[Figure: autocorrelation of daily returns, lags 1 to 100]

We see that there is very little autocorrelation for daily returns of the portfolio over this period. Let's look at the autocorrelation of the squared returns.

``````

In [13]:

plot_autocorrelation(portfolio.R2, 'Autocorrelation of squared daily returns')

``````
[Figure: autocorrelation of squared daily returns, lags 1 to 100]

We notice a fair amount of autocorrelation in the squared returns: large moves tend to be followed by large moves, a phenomenon known as volatility clustering. This is precisely the kind of dynamics a GARCH model is designed to capture.

## GARCH(1,1)

For the purpose of this tutorial, let's first build our own model. We consider one lag of squared returns and one lag of conditional variance, hence the (1,1) in GARCH(1,1).
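Instead of estimating omega directly, we use variance targeting: we fix the unconditional variance at its sample estimate, which pins down omega through omega = variance * (1 - alpha - beta). The remaining parameters alpha and beta are then found by minimizing the negative Gaussian log-likelihood (up to additive constants):

``````
\frac{1}{2} \sum_t \left( \ln \sigma_t^2 + \frac{r_t^2}{\sigma_t^2} \right)
``````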

``````

In [14]:

import scipy.optimize as opt

def compute_omega(factor, alpha, beta):
    # Variance targeting: omega is implied by the unconditional variance
    persistence = alpha + beta
    return factor * (1 - persistence)

def compute_cond_var(data, initial_value, alpha, beta, omega=None):
    if omega is None:
        omega = compute_omega(initial_value, alpha, beta)
    cond_var = np.array([initial_value], dtype=np.float64)
    for time in range(1, len(data)):
        # GARCH(1,1) recursion on the conditional variance
        cond_var = np.append(cond_var, omega + beta*cond_var[time-1] + alpha*data[time]**2)
    return cond_var

``````
``````

In [15]:

def fn(x, df, r_var):
    # Negative Gaussian log-likelihood (up to constants) of the GARCH(1,1) model
    sig_2 = compute_cond_var(df.Return, r_var, x[0], x[1])[:-1]
    return (np.log(sig_2) + df.R2[1:]/sig_2).sum() / 2

res = opt.minimize(fn, [0.08, 0.82], bounds=[[0,1],[0,1]],
                   args=(portfolio, portfolio.Return.std()**2))
res

``````
``````

/usr/local/lib/python2.7/dist-packages/ipykernel_launcher.py:3: RuntimeWarning: invalid value encountered in log
This is separate from the ipykernel package so we can avoid doing imports until

Out[15]:

fun: -28107.387398143728
hess_inv: <2x2 LbfgsInvHessProduct with dtype=float64>
jac: array([ 0.11132215,  0.08949428])
message: 'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
nfev: 54
nit: 10
status: 0
success: True
x: array([ 0.08210269,  0.908509  ])

``````
``````

In [16]:

compute_cond_var(portfolio.Return, portfolio.Return.std()**2, res.x[0], res.x[1])

``````
``````

Out[16]:

array([  1.46845582e-04,   1.39144331e-04,   1.28618231e-04, ...,
3.01296863e-05,   3.43355444e-05,   3.29649827e-05])

``````

It turns out that a Python library, arch, already exists for such models. Let's use it to build a GARCH(1,1) model and compare it to our own.

``````

In [17]:

from arch import arch_model

# Scale returns to percentages to help the optimizer's numerical stability
garch = arch_model(100*portfolio.Return[1:])
res = garch.fit(update_freq=5)
print(res.summary())

``````
``````

Iteration:      5,   Func. Count:     39,   Neg. LLF: 9921.63392772
Iteration:     10,   Func. Count:     72,   Neg. LLF: 9918.29018018
Optimization terminated successfully.    (Exit mode 0)
Current function value: 9918.28947529
Iterations: 12
Function evaluations: 84
Constant Mean - GARCH Model Results
==============================================================================
Dep. Variable:                 Return   R-squared:                      -0.000
Mean Model:             Constant Mean   Adj. R-squared:                 -0.000
Vol Model:                      GARCH   Log-Likelihood:               -9918.29
Distribution:                  Normal   AIC:                           19844.6
Method:            Maximum Likelihood   BIC:                           19871.9
No. Observations:                 6888
Date:                Tue, Jul 11 2017   Df Residuals:                     6884
Time:                        11:20:05   Df Model:                            4
Mean Model
============================================================================
coef    std err          t      P>|t|      95.0% Conf. Int.
----------------------------------------------------------------------------
mu             0.0730  1.064e-02      6.863  6.743e-12 [5.219e-02,9.391e-02]
Volatility Model
============================================================================
coef    std err          t      P>|t|      95.0% Conf. Int.
----------------------------------------------------------------------------
omega          0.0142  3.680e-03      3.853  1.166e-04 [6.966e-03,2.139e-02]
alpha[1]       0.0864  1.305e-02      6.621  3.577e-11   [6.084e-02,  0.112]
beta[1]        0.9048  1.371e-02     65.992      0.000     [  0.878,  0.932]
============================================================================

Covariance estimator: robust

``````
``````

In [18]:

# The model was fitted on returns scaled by 100, so the fitted omega is in
# squared-percent units; rescale by 100**2 (alpha and beta are scale-free)
omega = res.params['omega']/(100**2)
omega

``````
``````

Out[18]:

1.4177695568094985e-06

``````
``````

In [19]:

compute_cond_var(portfolio.Return, portfolio.Return.std()**2,
                 res.params['alpha[1]'], res.params['beta[1]'], omega)

``````
``````

Out[19]:

array([  1.46845582e-04,   1.38870052e-04,   1.27938520e-04, ...,
3.00012610e-05,   3.44409699e-05,   3.29932817e-05])

``````

The conditional variances are very close to those of our own model. Our model was quite simple, though. Let's make it slightly more robust by taking into account the leverage effect (nonlinear GARCH, also known as NGARCH).
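The NGARCH(1,1) recursion adds a leverage parameter theta, so that negative returns push the variance up more than positive returns of the same size:

``````
\sigma_t^2 = \omega + \alpha\, (r_{t-1} - \theta\, \sigma_{t-1})^2 + \beta\, \sigma_{t-1}^2
``````

Its persistence becomes alpha * (1 + theta^2) + beta, which the code below uses both for variance targeting and for the stationarity constraint.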

``````

In [20]:

import scipy.optimize as opt
from functools import partial
import sys

# Inequality constraints for the optimization algorithm: the first five rows
# keep alpha, beta and theta in their boxes; the last row is overridden below
# with the stationarity constraint
a = np.array([[ 1.,  0.,  0.],
              [ 0.,  1.,  0.],
              [ 0.,  0.,  1.],
              [-1.,  0.,  0.],
              [ 0., -1.,  0.],
              [ 0.,  0.,  0.]])

b = [0., 0., 0., -1., -1., 0.]

def ieqcons(x, *args, **kwargs):
    ineq = kwargs['a'].dot(x) - kwargs['b']
    # Stationarity: the persistence must stay below 1
    ineq[len(a)-1] = 1 - compute_persistance(x[0], x[1], x[2])
    return ineq

def f_ieqcons(a, b):
    return partial(ieqcons, a=a, b=b)

def compute_persistance(alpha, beta, theta=0.0):
    return alpha*(1 + theta**2) + beta

def compute_omega(unconditional_variance, persistence):
    # Variance targeting: omega is implied by the unconditional variance
    return unconditional_variance * (1 - persistence)

def compute_cond_var(returns, unconditional_variance, alpha, beta, omega=None, theta=0.0):
    if omega is None:
        omega = compute_omega(unconditional_variance, compute_persistance(alpha, beta, theta))
    cond_var = np.array([unconditional_variance], dtype=np.float64)
    for time in range(1, len(returns)):
        # NGARCH(1,1) recursion; theta=0 reduces it to plain GARCH(1,1)
        cond_var = np.append(cond_var, omega + beta*cond_var[time-1]
                             + alpha*(returns[time-1] - theta*cond_var[time-1]**0.5)**2)
    return cond_var

def our_garch(x, returns, unconditional_variance):
    alpha = x[0]
    beta = x[1]
    theta = 0 if len(x) <= 2 else x[2]
    persistence = compute_persistance(alpha, beta, theta)
    # fmin_slsqp sometimes runs iterations that fail to respect the given
    # constraints (values in ieqcons are < 0), so this is a workaround
    if 0 <= persistence <= 1:
        omega = compute_omega(unconditional_variance, persistence)
        cond_var = compute_cond_var(returns, unconditional_variance, alpha, beta, omega, theta)
        score = (np.log(cond_var) + np.square(returns)/cond_var).sum()
    else:
        score = sys.float_info.max
    return score

returns = portfolio.Return[1:].tolist()
unconditional_variance = portfolio.Return[1:].std()**2

``````

Without leverage, we get the same parameter estimates as before (without warnings this time):

``````

In [21]:

# The (0,0) bound pins theta at zero, reducing the model to plain GARCH(1,1)
res = opt.fmin_slsqp(our_garch, [0.08, 0.82, 0.0], bounds=((0,1),(0,1),(0,0)),
                     f_ieqcons=f_ieqcons(a,b),
                     args=(returns, unconditional_variance), iter=100, acc=1e-10,
                     epsilon=1.4901161193847656e-08)
res

``````
``````

Optimization terminated successfully.    (Exit mode 0)
Current function value: -56214.7747963
Iterations: 13
Function evaluations: 72

Out[21]:

array([ 0.08210233,  0.9085093 ,  0.        ])

``````

Including the leverage:

``````

In [22]:

res = opt.fmin_slsqp(our_garch, [0.08, 0.82, 0.5], f_ieqcons=f_ieqcons(a,b),
                     args=(returns, unconditional_variance), iter=100, acc=1e-06,
                     epsilon=1.4901161193847656e-08)
res

``````
``````

Optimization terminated successfully.    (Exit mode 0)
Current function value: -56442.8246217
Iterations: 19
Function evaluations: 104

Out[22]:

array([ 0.06924546,  0.87205728,  0.80197473])

``````
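
As a quick sanity check (a sketch using the fitted values printed above), the implied persistence should stay below one for the process to remain stationary:

``````
alpha, beta, theta = res
print(compute_persistance(alpha, beta, theta))  # about 0.986: stationary, but highly persistent
``````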