In [1]:
# Copyright (c) Thalesians Ltd, 2019. All rights reserved
# Copyright (c) Paul Alexander Bilokon, 2019. All rights reserved
# Author: Paul Alexander Bilokon <paul@thalesians.com>
# Version: 1.0 (2019.04.23)
# Email: education@thalesians.com
# Platform: Tested on Windows 10 with Python 3.6

Datasets

We introduce several datasets used as running examples in TSA.


In [2]:
import pandas as pd

Dataset 1

This dataset is derived from a time series of daily GBP/USD exchange rates, $(S_t)_{t=0,1,\ldots,n}$, $n = 945$, from 1981.10.01 to 1985.06.28, both inclusive. Logarithmic (continuously compounded) daily returns were computed, scaled by 100, and the resulting time series was mean-adjusted: $$X_t = 100 \cdot \left[ \ln S_t - \ln S_{t-1} - \frac{1}{n} \sum_{u=1}^n (\ln S_u - \ln S_{u-1}) \right], \quad t = 1, 2, \ldots, n.$$

This dataset has been studied extensively in the literature [HRS94, SP97, KSC98, DK00, MY00]. We obtained it as part of the course materials for [Mey10], which are publicly available for download. It is not clear which fixing was used as the daily exchange rate when the dataset was generated. We attempted to reconstruct it from a time series of WM/Reuters fixes and observed significant differences; Meyer's time series was also eight points longer than the one provided by Reuters. For the sake of reproducibility, we use Meyer's dataset without modification.

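The transformation above can be sketched in a few lines of numpy. This is a minimal illustration of the formula, not the code used to produce the dataset; the price values below are hypothetical placeholders, not the actual GBP/USD fixes:

```python
import numpy as np

def mean_adjusted_log_returns(prices):
    """100 * (ln S_t - ln S_{t-1}), with the sample mean subtracted."""
    log_returns = 100.0 * np.diff(np.log(prices))
    return log_returns - log_returns.mean()

# Hypothetical prices for illustration only (not the actual GBP/USD fixes):
prices = np.array([1.80, 1.81, 1.79, 1.82, 1.80])
x = mean_adjusted_log_returns(prices)
```

Note that a series of $n + 1$ prices yields $n$ returns, and by construction the mean-adjusted series has a sample mean that is numerically zero.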

In [3]:
df1 = pd.read_csv('../../../../data/dataset-1.csv')

In [4]:
df1.head()


Out[4]:
daily_log_return
0 -0.320221
1 1.460719
2 -0.408630
3 1.060960
4 1.712889

In [5]:
y1 = df1['daily_log_return'].values

Dataset 2

This dataset is derived from a time series of daily closing prices of the Standard & Poor's (S&P) 500, a stock market index based on the market capitalizations of 500 large companies with common stock listed on the New York Stock Exchange (NYSE) or NASDAQ. The data was obtained from the Yahoo! Finance service. The closing prices were adjusted by Yahoo! for all applicable splits and dividend distributions in adherence to Center for Research in Security Prices (CRSP) standards. From the prices, $(S_t)_{t=0,1,\ldots,n}$, $n = 2022$, for the dates from 1980.01.02 to 1987.12.31, both inclusive, we computed the logarithmic (continuously compounded) daily returns, scaled by 100, and mean-adjusted the resulting time series: $$X_t = 100 \cdot \left[ \ln S_t - \ln S_{t-1} - \frac{1}{n} \sum_{u=1}^n (\ln S_u - \ln S_{u-1}) \right], \quad t = 1, 2, \ldots, n.$$

This is one of the time series used in [Yu05]. We generated the time series ourselves, as we did not have access to the author's input data. The number of data points in our time series matches that reported in [Yu05, p. 172], and we were able to reproduce some of the results in that paper very closely using our series.


In [6]:
df2 = pd.read_csv('../../../../data/dataset-2.csv')

In [7]:
df2.head()


Out[7]:
daily_log_return
0 -0.553864
1 1.185967
2 0.229915
3 1.941784
4 0.049783

In [8]:
y2 = df2['daily_log_return'].values
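A quick sanity check on either series is that mean adjustment forces the sample mean to be numerically zero. A minimal sketch of such a check, demonstrated on a synthetic series so that it is self-contained; on the actual data one would pass `y1` or `y2`:

```python
import numpy as np

def is_mean_adjusted(x, tol=1e-8):
    """Return True if the series has a numerically zero sample mean."""
    return abs(float(np.mean(x))) < tol

# Demonstrated on a synthetic series; on the actual data one would call
# is_mean_adjusted(y1) or is_mean_adjusted(y2).
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
x = x - x.mean()
```

Here `is_mean_adjusted(x)` returns `True`, whereas shifting the series by a constant, e.g. `x + 1.0`, makes it fail the check.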

Bibliography

[DK00] James Durbin and Siem Jan Koopman. Time series analysis of non-Gaussian observations based on state space models from both classical and Bayesian perspectives (with discussion). Journal of the Royal Statistical Society, Series B, 62:3-56, 2000.

[HRS94] Andrew C. Harvey, Esther Ruiz, and Neil Shephard. Multivariate stochastic variance models. Review of Economic Studies, 61:247-264, 1994.

[KSC98] Sangjoon Kim, Neil Shephard, and Siddhartha Chib. Stochastic volatility: Likelihood inference and comparison with ARCH models. The Review of Economic Studies, 65(3):361-393, July 1998.

[MY00] Renate Meyer and Jun Yu. BUGS for a Bayesian analysis of stochastic volatility models. Econometrics Journal, 3:198-215, 2000.

[SP97] Neil Shephard and Michael K. Pitt. Likelihood analysis of non-Gaussian measurement time series. Biometrika, 84:653-667, 1997.

[Yu05] Jun Yu. On leverage in a statistical volatility model. Journal of Econometrics, 127:165-178, 2005.