Tutorial with Examples based on the VSTOXX Volatility Index
Dr. Yves J. Hilpisch
Continuum Analytics Europe GmbH
For Python Quants – 14 March 2014
You can find the presentation and the IPython Notebook here:
A brief bio:
See www.hilpisch.com.
This tutorial focuses on
It does not address such important issues as
A fundamental Python stack for interactive data analytics and visualization should at least contain the following libraries and tools:
It is best to use a Python distribution such as Anaconda to ensure consistency across libraries.
We need to make a couple of imports for what is to come.
In [1]:
import numpy as np
import pandas as pd
import pandas.io.data as pdd  # in later pandas versions: the separate pandas-datareader package
from urllib import urlretrieve  # Python 3: from urllib.request import urlretrieve
%matplotlib inline
The convenience function DataReader makes it easy to read historical stock price data from Yahoo! Finance (http://finance.yahoo.com).
In [2]:
try:
    index = pdd.DataReader('^GDAXI', data_source='yahoo', start='2007/3/30')
    # e.g. the EURO STOXX 50 ticker symbol -- ^SX5E
except:
    index = pd.read_csv('dax.txt', index_col=0, parse_dates=True)
In [3]:
index.info()
A particular strength of pandas is the handling of indexed/labeled/structured data, such as time series data.
In [4]:
index.tail()
Out[4]:
pandas makes it easy to implement vectorized operations, like calculating log-returns over whole time series.
In [5]:
index['Returns'] = np.log(index['Close'] / index['Close'].shift(1))
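To see the mechanics in isolation, here is a minimal sketch on a small synthetic price series (hypothetical values, not the downloaded index data):

```python
import numpy as np
import pandas as pd

# small synthetic closing price series (hypothetical values)
close = pd.Series([100.0, 105.0, 102.9, 105.0])

# vectorized log returns: log(P_t / P_{t-1}); the first entry is NaN
returns = np.log(close / close.shift(1))
print(returns)
```

A convenient property of log returns is that they telescope: their sum equals the log return over the whole period.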
In addition, pandas makes plotting quite simple and compact.
In [6]:
index[['Close', 'Returns']].plot(subplots=True, style='b', figsize=(8, 5))
Out[6]:
We now want to check how annual volatility changes over time.
In [7]:
index['Mov_Vol'] = pd.rolling_std(index['Returns'], window=252) * np.sqrt(252)
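Note that pd.rolling_std reflects the pandas API of early 2014; in later pandas versions the same calculation uses the .rolling() accessor. A minimal sketch on synthetic returns (illustrative values only):

```python
import numpy as np
import pandas as pd

# synthetic daily log returns (hypothetical, for illustration only)
rng = np.random.default_rng(0)
rets = pd.Series(rng.normal(0.0, 0.01, 300))

# moving annualized volatility: rolling std over 252 trading days,
# scaled by sqrt(252); first 251 entries are NaN
mov_vol = rets.rolling(window=252).std() * np.sqrt(252)
print(mov_vol.tail())
```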
Obviously, the annual volatility changes significantly over time.
In [8]:
index[['Close', 'Returns', 'Mov_Vol']].plot(subplots=True, style='b', figsize=(8, 5))
Out[8]:
Trend-based investment strategy with the EURO STOXX 50 index:
Signal generation:
It is a stylized fact that stock indexes and related volatility indexes are highly negatively correlated. The following example analyzes this stylized fact based on the EURO STOXX 50 stock index and the VSTOXX volatility index using ordinary least-squares (OLS) regression.
First, we collect historical data for both the EURO STOXX 50 stock and the VSTOXX volatility index.
In [9]:
import pandas as pd
import datetime as dt
from urllib import urlretrieve
In [10]:
try:
    es_url = 'http://www.stoxx.com/download/historical_values/hbrbcpe.txt'
    vs_url = 'http://www.stoxx.com/download/historical_values/h_vstoxx.txt'
    urlretrieve(es_url, 'es.txt')
    urlretrieve(vs_url, 'vs.txt')
except:
    pass
The EURO STOXX 50 data is not yet in the right format. Some house cleaning is necessary (I).
In [11]:
lines = open('es.txt').readlines() # reads the whole file line-by-line
In [12]:
lines[:5] # header not well formatted
Out[12]:
The EURO STOXX 50 data is not yet in the right format. Some house cleaning is necessary (II).
In [13]:
lines[3883:3890] # from 27.12.2001 on, an additional semicolon appears
Out[13]:
The EURO STOXX 50 data is not yet in the right format. Some house cleaning is necessary (III).
In [14]:
lines = open('es.txt').readlines() # reads the whole file line-by-line
new_file = open('es50.txt', 'w') # opens a new file
new_file.writelines('date' + lines[3][:-1].replace(' ', '') + ';DEL' + lines[3][-1])
# writes the corrected third line (additional column name)
# of the original file as the first line of the new file
new_file.writelines(lines[4:]) # writes the remaining lines of the original file
new_file.close() # close so that all data is flushed to disk
The EURO STOXX 50 data is not yet in the right format. Some house cleaning is necessary (IV).
In [15]:
list(open('es50.txt'))[:5] # opens the new file for inspection
Out[15]:
Now, the data can be safely read into a DataFrame object.
In [16]:
es = pd.read_csv('es50.txt', index_col=0, parse_dates=True, sep=';', dayfirst=True)
In [17]:
del es['DEL'] # delete the helper column
In [18]:
es.info()
The VSTOXX data can be read without touching the raw data.
In [19]:
vs = pd.read_csv('vs.txt', index_col=0, header=2, parse_dates=True, sep=',', dayfirst=True)
# you can alternatively read from the Web source directly
# without saving the csv file to disk:
# vs = pd.read_csv(vs_url, index_col=0, header=2,
# parse_dates=True, sep=',', dayfirst=True)
We now merge the data for further analysis.
In [20]:
import datetime as dt
data = pd.DataFrame({'EUROSTOXX':
        es['SX5E'][es.index > dt.datetime(1999, 12, 31)]})
data = data.join(pd.DataFrame({'VSTOXX':
        vs['V2TX'][vs.index > dt.datetime(1999, 12, 31)]}))
data.info()
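The join call aligns the two frames on their date indexes; dates present in only one frame receive NaN values. A toy sketch of this alignment (hypothetical dates and values):

```python
import pandas as pd

# two hypothetical frames with partially overlapping date indexes
a = pd.DataFrame({'EUROSTOXX': [100.0, 101.0]},
                 index=pd.to_datetime(['2020-01-01', '2020-01-02']))
b = pd.DataFrame({'VSTOXX': [20.0, 21.0]},
                 index=pd.to_datetime(['2020-01-02', '2020-01-03']))

joined = a.join(b)  # left join on the index by default
print(joined)
```

Only the dates of the left frame survive; the unmatched 2020-01-01 row gets NaN in the VSTOXX column.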
Let's inspect the two time series.
In [21]:
data.head()
Out[21]:
A picture can tell almost the complete story.
In [22]:
data.plot(subplots=True, grid=True, style='b', figsize=(10, 5))
Out[22]:
We now generate log returns for both time series.
In [23]:
rets = np.log(data / data.shift(1))
rets.head()
Out[23]:
To this new data set, also stored in a DataFrame object, we apply OLS.
In [24]:
xdat = rets['EUROSTOXX']
ydat = rets['VSTOXX']
model = pd.ols(y=ydat, x=xdat)  # pandas OLS (removed in later versions; see statsmodels)
model
Out[24]:
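pd.ols was removed from pandas after this tutorial was written; the same univariate regression can be sketched with np.polyfit on synthetic, negatively correlated returns (all values here are illustrative assumptions):

```python
import numpy as np

# synthetic paired returns with a built-in negative relation (illustrative only)
rng = np.random.default_rng(1)
x = rng.normal(0.0, 0.01, 500)               # stands in for equity returns
y = -2.5 * x + rng.normal(0.0, 0.005, 500)   # stands in for volatility returns

# OLS fit of a straight line: polyfit returns (slope, intercept)
beta, alpha = np.polyfit(x, y, deg=1)
print(beta, alpha)
```

The estimated slope should recover the built-in negative relation closely, which mirrors the negative correlation seen in the real data.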
Again, we want to see how our results look graphically.
In [25]:
import matplotlib.pyplot as plt
plt.plot(xdat, ydat, 'r.')
ax = plt.axis() # grab axis values
x = np.linspace(ax[0], ax[1] + 0.01)
plt.plot(x, model.beta[1] + model.beta[0] * x, 'b', lw=2)
plt.grid(True)
plt.axis('tight')
Out[25]:
Let us see if we can identify systematic behavior over time. And indeed, during the 2007/2008 crisis (yellow dots) volatility was more pronounced than more recently (red dots).
In [26]:
import matplotlib as mpl
mpl_dates = mpl.dates.date2num(rets.index)
plt.figure(figsize=(8, 4))
plt.scatter(rets['EUROSTOXX'], rets['VSTOXX'], c=mpl_dates, marker='o')
plt.grid(True)
plt.xlabel('EUROSTOXX')
plt.ylabel('VSTOXX')
plt.colorbar(ticks=mpl.dates.DayLocator(interval=250),
             format=mpl.dates.DateFormatter('%d %b %y'))
Out[26]:
We want to test whether the EURO STOXX 50 and/or the VSTOXX returns are normally distributed (e.g. whether they might have fat tails).
Add-on: plot a histogram of the log-return frequencies and compare it to a normal distribution with the same mean and variance (using e.g. norm.pdf from scipy.stats).
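A possible sketch for this add-on exercise, using synthetic returns in place of the real series (the sample parameters are assumptions for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# synthetic log returns standing in for the real series (illustrative only)
rng = np.random.default_rng(2)
rets = rng.normal(0.0005, 0.012, 2000)

# histogram of return frequencies, normalized to a density
plt.hist(rets, bins=50, density=True, label='frequency')

# normal pdf with the sample mean and standard deviation for comparison
x = np.linspace(rets.min(), rets.max(), 200)
pdf = norm.pdf(x, loc=rets.mean(), scale=rets.std())
plt.plot(x, pdf, 'r', lw=2, label='normal pdf')
plt.legend()
```

With real index returns, the histogram typically shows a higher peak and fatter tails than the fitted normal curve.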
There have been a number of studies illustrating that constant-proportion investments in volatility derivatives – given a diversified equity portfolio – might improve investment performance considerably. See, for instance, the study
The Benefits of Volatility Derivatives in Equity Portfolio Management
We now want to replicate (in a simplified fashion) what you can flexibly test here on the basis of two backtesting applications for VSTOXX-based investment strategies:
The strategy we are going to implement and test is characterized as follows:
We already have the necessary data available. However, we want to drop NaN values and normalize the index values.
In [27]:
data = data.dropna()
In [28]:
data = data / data.ix[0] * 100  # .ix is the 2014 indexer; later pandas: data.iloc[0]
In [29]:
data.head()
Out[29]:
First, the initial investment.
In [30]:
invest = 100
cratio = 0.3
data['Equity'] = (1 - cratio) * invest / data['EUROSTOXX'][0]
data['Volatility'] = cratio * invest / data['VSTOXX'][0]
This can already be considered a static investment strategy.
In [31]:
data['Static'] = (data['Equity'] * data['EUROSTOXX']
                  + data['Volatility'] * data['VSTOXX'])
In [32]:
data[['EUROSTOXX', 'Static']].plot(figsize=(10, 5))
Out[32]:
Second, the dynamic strategy with daily adjustments to keep the value ratio constant.
In [33]:
for i in range(1, len(data)):
    evalue = data['Equity'][i - 1] * data['EUROSTOXX'][i]
    # value of the equity position
    vvalue = data['Volatility'][i - 1] * data['VSTOXX'][i]
    # value of the volatility position
    tvalue = evalue + vvalue
    # total wealth
    data['Equity'][i] = (1 - cratio) * tvalue / data['EUROSTOXX'][i]
    # re-allocation of total wealth to the equity ...
    data['Volatility'][i] = cratio * tvalue / data['VSTOXX'][i]
    # ... and the volatility position
Third, the total wealth position.
In [34]:
data['Dynamic'] = (data['Equity'] * data['EUROSTOXX']
                   + data['Volatility'] * data['VSTOXX'])
In [35]:
data.head()
Out[35]:
A brief check whether the ratios are indeed constant.
In [36]:
(data['Volatility'] * data['VSTOXX'] / data['Dynamic'])[:5]
Out[36]:
In [37]:
(data['Equity'] * data['EUROSTOXX'] / data['Dynamic'])[:5]
Out[37]:
Let us inspect the performance of the strategy.
In [38]:
data[['EUROSTOXX', 'Dynamic']].plot(figsize=(10, 5))
Out[38]:
Write a Python function which allows for an arbitrary but constant ratio to be invested in the VSTOXX index and which returns net performance values (in percent) for the constant proportion VSTOXX strategy.
Add-on: find the ratio invested in the VSTOXX that yields the maximum performance.
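A possible sketch for the first exercise; the function name cp_performance and the toy data frame are assumptions for illustration, not part of the original backtest:

```python
import pandas as pd

def cp_performance(data, cratio, invest=100.0):
    """Net performance (in percent) of a daily rebalanced constant-proportion
    strategy that keeps the fraction `cratio` of wealth in the VSTOXX."""
    eq = data['EUROSTOXX'].values
    vs = data['VSTOXX'].values
    wealth = invest
    for i in range(1, len(data)):
        # let both positions move with the day's index changes,
        # then re-allocate total wealth at the constant ratio
        wealth = (wealth * (1 - cratio) / eq[i - 1] * eq[i]
                  + wealth * cratio / vs[i - 1] * vs[i])
    return (wealth - invest) / invest * 100

# hypothetical toy data: equity up 10%, volatility down 20% overall
toy = pd.DataFrame({'EUROSTOXX': [100.0, 105.0, 110.0],
                    'VSTOXX': [20.0, 18.0, 16.0]})
print(cp_performance(toy, cratio=0.3))
```

For the add-on, one could evaluate cp_performance over a grid of cratio values and pick the maximum.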
Using standard Python functionality and pandas, the code that follows reads intraday, high-frequency data from a Web source, plots it, and resamples it.
In [39]:
try:
    url = 'http://hopey.netfonds.no/posdump.php?'
    url += 'date=%s%s%s&paper=AAPL.O&csv_format=csv' % ('2014', '03', '12')
    # you may have to adjust the date since only recent dates are available
    urlretrieve(url, 'aapl.csv')
except:
    pass
In [40]:
AAPL = pd.read_csv('aapl.csv', index_col=0, header=0, parse_dates=True)
In [41]:
AAPL.info()
The intraday evolution of the Apple stock price.
In [42]:
AAPL['bid'].plot()
Out[42]:
In [43]:
AAPL = AAPL[AAPL.index > dt.datetime(2014, 3, 12, 10, 0, 0)]
# only data later than 10am on that day
A resampling of the data is easily accomplished with pandas.
In [44]:
# this resamples the record frequency to 5 minutes, using mean as aggregation rule
AAPL_5min = AAPL.resample(rule='5min', how='mean').fillna(method='ffill')
AAPL_5min.head()
Out[44]:
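resample(rule=..., how=...) is the early-2014 signature; later pandas versions chain the aggregation after resample. A self-contained sketch on synthetic one-minute quotes (hypothetical values):

```python
import numpy as np
import pandas as pd

# synthetic one-minute bid quotes (hypothetical values)
idx = pd.date_range('2014-03-12 10:00', periods=10, freq='min')
ticks = pd.DataFrame({'bid': np.arange(10.0)}, index=idx)

# newer pandas API: aggregation is chained after resample;
# ffill carries the last value into otherwise empty bins
five_min = ticks.resample('5min').mean().ffill()
print(five_min)
```

The ten one-minute quotes collapse into two five-minute bars, each holding the mean of its bin.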
Let's have a graphical look at the new data set.
In [45]:
AAPL_5min['bid'].plot()
Out[45]:
With pandas you can easily apply custom functions to time series data.
In [46]:
AAPL_5min['bid'].apply(lambda x: 2 * 530 - x).plot()
# this mirrors the stock price development at a level of 530
Out[46]:
Ten years ago, Python was considered exotic in the analytics space – at best. Languages and packages like R and Matlab dominated the scene. Today, Python has become a major force in financial analytics and visualization due to a number of characteristics:
Continuum Analytics Inc. – the company Web site
Dr. Yves J. Hilpisch – my personal Web site
Python for Finance – my NEW book (out as Early Release)
Derivatives Analytics with Python – my current book
www.derivatives-analytics-with-python.com
Contact Us
yves@continuum.io | europe@continuum.io | @dyjh | @ContinuumIO