In [1]:
library("quantmod")
library("repr")
#Change plot size.
options(repr.plot.width=14, repr.plot.height=10)
The data used in this analysis consists of the closing prices of two indices, BIST 100 (XU100) and BIST TUM (XUTUM), between 2001 and 2016. The BIST 100 index covers 100 leading companies of Borsa Istanbul, while the BIST TUM index covers every publicly listed company. Selection into BIST 100 is based mainly on trading volume and market capitalization, and its constituents are reviewed four times a year by Borsa Istanbul. Company weights are calculated according to market capitalization. Both indices are calculated with the same formula, the value of which is represented as $E_t$:
$$E_t=\frac{\sum_{i=1}^n (F_{it}/D_t) \cdot N_{it} \cdot H_{it} \cdot K_{it}}{B_t}$$

where:
- $E_t$ = value of the index at $t$
- $n$ = number of companies in the index
- $F_{it}$ = price of the $i$th company at $t$
- $D_t$ = exchange rate of the index at $t$
- $N_{it}$ = total number of shares of the $i$th company at $t$
- $H_{it}$ = free float rate of the $i$th company at $t$
- $K_{it}$ = coefficient of the $i$th company at $t$
- $B_t$ = denominator of the index at $t$
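As a quick illustration of the formula (every number below is invented, not actual BIST constituent data), the weighted numerator can be computed directly:

```r
# Toy computation of E_t for two hypothetical companies; all values are
# made up for illustration only.
price  <- c(10.5, 24.0)   # F_it: company prices at t
fx     <- 1               # D_t: exchange rate of the index at t
shares <- c(1e9, 5e8)     # N_it: total shares outstanding
ffloat <- c(0.35, 0.60)   # H_it: free float rates
coefs  <- c(1, 1)         # K_it: coefficients
denom  <- 1.8e9           # B_t: index denominator
E_t <- sum((price / fx) * shares * ffloat * coefs) / denom
E_t
```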
The data is stored in a comma-separated values (CSV) file and read into a data frame with the read.csv function.
In [2]:
closingdata <- read.csv(file="BIST.csv", header=TRUE, sep=",")
The data frame is converted into an Extensible Time Series (xts) object for easier calculation and better compatibility with the quantmod package. The Date column is first parsed and then dropped from the data frame so that only the numeric price columns are passed to the xts constructor.
In [3]:
date <- strptime(closingdata$Date, format = "%d/%m/%Y")  # parse dates
closingdata <- closingdata[,-1]                          # drop the Date column
xclosingdata <- xts(closingdata, date)                   # index prices by date
XU100 <- xclosingdata$XU100
XUTUM <- xclosingdata$XUTUM
Plotting XU100 and XUTUM for the periods 2001/2008 and 2008/2016, with XUTUM overlaid on a secondary axis.
In [4]:
par(mfrow=c(2,1))
# First panel: XU100 (left axis) with XUTUM overlaid on a secondary right axis.
plot(XU100["2001/2008"], xlab="", ylab="", main="XU100 and XUTUM between 2001 and 2008")
par(new=TRUE)
plot(as.vector(XUTUM["2001/2008"]), type="l", col="blue", xaxt="n", yaxt="n", ylab="", xlab="", main="")
axis(4, col.axis="blue", col="blue")  # secondary axis for XUTUM
legend("topleft", c("XU100", "XUTUM"), lty=c(1,1), col=c("black", "blue"), bg="white", cex=0.7)
# Second panel: same overlay for 2008/2016.
plot(XU100["2008/2016"], xlab="", ylab="", main="XU100 and XUTUM between 2008 and 2016")
par(new=TRUE)
plot(as.vector(XUTUM["2008/2016"]), type="l", col="blue", xaxt="n", yaxt="n", ylab="", xlab="", main="")
axis(4, col.axis="blue", col="blue")  # secondary axis for XUTUM
legend("topleft", c("XU100", "XUTUM"), lty=c(1,1), col=c("black", "blue"), bg="white", cex=0.7)
The null hypothesis ($H_0$) states that the compared samples are drawn from the same population with regard to the outcome variable. This means that any observed difference in the dependent variable (outcome) must be due to sampling error; the independent variable makes no difference.
The research hypothesis ($H_1$) is what we expect to happen, our prediction. It is also called the alternative hypothesis because it is an alternative to the null hypothesis. Technically, the research hypothesis claims that, with respect to the outcome variable, our samples come from different populations.
Example: if we predict that exercise results in better weight loss, we are predicting that after the treatment (exercise) the treated sample truly differs from the untreated one and is therefore from a different population.
In this case $H_0$ would be that exercise is unrelated to weight loss, and $H_1$ would be that exercise leads to weight loss.
The p value determines whether or not we reject the null hypothesis. It estimates how often we would obtain the observed result by chance if the null hypothesis were in fact true. If the p value is small, we reject the null hypothesis and conclude that the samples truly differ with regard to the outcome; if the p value is large, we simply fail to reject the null hypothesis.
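A small simulation (hypothetical normal samples, not BIST returns) makes this concrete: when two samples really do come from the same population, small p values occur only at the nominal false-positive rate.

```r
# When H0 is true (both samples drawn from the same population), a small
# p value is just the occasional chance result.
set.seed(1)
pvals <- replicate(1000, t.test(rnorm(50), rnorm(50))$p.value)
mean(pvals < 0.05)  # close to the nominal 5% false-positive rate
```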
In this analysis we use the time intervals 2001/2008 and 2008/2016 because of the global financial crisis of 2008. The main goal of this section is to determine, from a statistical standpoint, whether the crisis caused the market structure to change.
Accordingly, our null hypothesis $H_0$ is that the crisis caused no change in the market structure. The alternative hypothesis $H_1$ is that the market structure changed as an effect of the crisis.
XU100 and XUTUM will also be tested against each other within the same time frames in order to compare the effect on the top companies with the effect on the whole stock market.
Log returns, rather than the closing price series themselves, will be used for the statistical tests and calculations that follow.
The advantage of returns over closing prices is normalization: all variables are measured in a comparable metric, so analytic relationships can be evaluated even though the underlying price series have very different levels.
$$ R_t = \frac{P_t - P_{t-1}}{P_{t-1}} $$

The Taylor expansion of $\log(1+x)$ is $x-\frac{x^2}{2}+\frac{x^3}{3}+O(x^4)$.
When $x$ is a small number, $\log(1+x) \approx x$.
Substituting $R_t$ gives us $\log(1+R_t) \approx R_t$:

$$ \log(1+R_t)=\log\left(1+\frac{P_t-P_{t-1}}{P_{t-1}}\right)=\log\frac{P_t}{P_{t-1}}=\log P_t - \log P_{t-1} \approx R_t $$

where:
- $R_t$ = return at $t$
- $P_t$ = closing price at $t$
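A quick numeric check (the price vector below is made up) shows how close the two definitions are for typical daily moves:

```r
# Compare simple returns with log returns on a small hypothetical price series.
p <- c(100, 101.5, 100.8, 102.3)      # hypothetical closing prices
simple_ret <- diff(p) / head(p, -1)   # R_t = (P_t - P_{t-1}) / P_{t-1}
log_ret    <- diff(log(p))            # log(P_t) - log(P_{t-1})
round(cbind(simple_ret, log_ret), 5)  # the two columns nearly coincide
```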
Summaries are very useful for getting a first look at the data before analyzing it.
In [5]:
summary(dailyReturn(log(XU100["2001/2008"])))
summary(dailyReturn(log(XU100["2008/2016"])))
In [6]:
summary(dailyReturn(log(XUTUM["2001/2008"])))
summary(dailyReturn(log(XUTUM["2008/2016"])))
Looking at the summaries, between 2001 and 2008 the mean log return for XU100 was $5.717 \times 10^{-5}$ while the median was $7.182 \times 10^{-5}$; in the same period the XUTUM mean and median were $5.967 \times 10^{-5}$ and $7.182 \times 10^{-5}$.
Between 2008 and 2016 the mean log return for XU100 was $1.543 \times 10^{-5}$ while the median was $6.952 \times 10^{-5}$; in the same period the mean and median of XUTUM were $1.763 \times 10^{-5}$ and $7.232 \times 10^{-5}$.
The medians barely changed across the two periods, while the mean returns are noticeably lower after 2008; within each period the means of XUTUM and XU100 are close to each other, with XUTUM slightly higher.
In [7]:
par(mfrow=c(2,2))
plot(density(dailyReturn(log(XU100["2001/2008"]))),type="h", main="XU100:2001/2008")
plot(density(dailyReturn(log(XU100["2008/2016"]))),type="h", main="XU100:2008/2016")
plot(density(dailyReturn(log(XUTUM["2001/2008"]))),type="h", main="XUTUM:2001/2008")
plot(density(dailyReturn(log(XUTUM["2008/2016"]))),type="h", main="XUTUM:2008/2016")
The t-test is used to determine whether the means of two groups differ. The classical test assumes that the two groups are sampled from normal distributions with equal variances; R's t.test defaults to the Welch variant, which relaxes the equal-variance assumption. The null hypothesis of the t-test is that the sample means are equal, and the alternative is that they are not.
In our case we will use the t-test to determine whether there really is a structural difference between the two periods.
In [8]:
t.test(dailyReturn(log(XUTUM["2008/2016"])),dailyReturn(log(XUTUM["2001/2008"])))
In [9]:
t.test(dailyReturn(log(XU100["2008/2016"])),dailyReturn(log(XU100["2001/2008"])))
In [10]:
t.test(dailyReturn(log(XUTUM["2001/2008"])),dailyReturn(log(XU100["2001/2008"])))
In [11]:
t.test(dailyReturn(log(XUTUM["2008/2016"])),dailyReturn(log(XU100["2008/2016"])))
The p values of the t-tests are far too high to reject the null hypothesis: judged by their means alone, the return series show no significant difference. Note also that the t-test tells us nothing about whether the data are normally distributed.
Log returns of equities and assets are generally assumed to be normally distributed. In order to reach more robust results in this analysis, we normalize the log returns so that their distributions can be compared directly and checked against that assumption.
The following formula is used for our normalization process:
$$ X_{\text{normalized}} = \frac{X-\bar{X}}{\sigma_X} $$
In [12]:
# Standardize each return series to mean 0 and standard deviation 1.
normalize <- function(x) (x - mean(x)) / sd(x)
XU100_0108 <- normalize(dailyReturn(log(XU100["2001/2008"])))
XU100_0816 <- normalize(dailyReturn(log(XU100["2008/2016"])))
XUTUM_0108 <- normalize(dailyReturn(log(XUTUM["2001/2008"])))
XUTUM_0816 <- normalize(dailyReturn(log(XUTUM["2008/2016"])))
In [13]:
par(mfrow=c(2,2))
plot(density(XU100_0108),type="h", main="XU100:2001/2008")
plot(density(XU100_0816),type="h", main="XU100:2008/2016")
plot(density(XUTUM_0108),type="h", main="XUTUM:2001/2008")
plot(density(XUTUM_0816),type="h", main="XUTUM:2008/2016")
Means are fixed at 0 and standard deviations at 1 so that the distributions of the log returns of XUTUM and XU100 can be compared directly.
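As a quick sanity check (using the objects created above), a normalized series should indeed come out with mean roughly 0 and standard deviation roughly 1:

```r
# Verify the normalization for one of the series.
round(c(mean = mean(XU100_0108), sd = sd(XU100_0108)), 6)
```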
In [14]:
summary(XU100_0108)
summary(XU100_0816)
In [15]:
summary(XUTUM_0108)
summary(XUTUM_0816)
The Kolmogorov–Smirnov test is a test on probability distributions: it compares a sample against a reference probability distribution (one-sample), or compares two samples against each other (two-sample K–S test), to judge whether they come from the same distribution.
In this case, the two-sample KS test is used to determine the differences between the normalized log-return distributions of XU100 and XUTUM in the specified periods.
The D statistic of the two-sample test is the largest distance between the two empirical distribution functions; the closer D is to 0, the more compatible the samples are with having come from the same distribution.
$$ D_{n,m} = \sup_x \left| F_{1,n}(x) - F_{2,m}(x) \right| $$
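A minimal sketch of the two-sample statistic (run here on toy normal samples, not the index returns) shows what ks.test computes:

```r
# Hand-rolled two-sample D: the largest gap between the two empirical CDFs.
two_sample_D <- function(x, y) {
  grid <- sort(c(x, y))  # the ECDFs can only change at observed sample points
  max(abs(ecdf(x)(grid) - ecdf(y)(grid)))
}
set.seed(2)
a <- rnorm(500); b <- rnorm(500)
c(manual = two_sample_D(a, b), ks = unname(ks.test(a, b)$statistic))
```

The two numbers should agree, since the supremum of the difference between two step functions is attained at one of the pooled sample points.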
In [16]:
ks.test(XUTUM_0108,XUTUM_0816)
In [17]:
ks.test(XU100_0108, XU100_0816)
In [18]:
ks.test(XU100_0108, XUTUM_0108)
In [19]:
ks.test(XU100_0816, XUTUM_0816)
The KS tests show little difference in the distributions of XU100 and XUTUM across the time frames 2001-2008 and 2008-2016. On the other hand, we can reject the null hypothesis that XUTUM and XU100 are drawn from the same distribution within identical time frames.
Volatility refers to the amount of uncertainty or risk about the size of changes in a security's value. A higher volatility means that a security's value can potentially be spread out over a larger range of values. This means that the price of the security can change dramatically over a short time period in either direction. A lower volatility means that a security's value does not fluctuate dramatically, but changes in value at a steady pace over a period of time.
The formula used by the volatility function from the TTR package, where $ R_i = \log{\frac{C_i}{C_{i-1}}} $ and $ \bar{R} = \frac{R_1 + R_2 + \dots + R_{n-1}}{n-1} $:
$$ \sigma_{cl}=\sqrt{\frac{N}{n-2}\sum_{i=1}^{n-1}\left(R_i - \bar{R}\right)^2} $$
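As a sketch of what this estimator does (a direct transcription of the equation above, not TTR's internal implementation), it can be rolled by hand:

```r
# Rolling close-to-close volatility, transcribed from the formula above.
library(zoo)  # for rollapply; zoo is already a dependency of quantmod
close_vol <- function(close, n = 24, N = 286) {
  r <- diff(log(as.numeric(close)))  # R_i = log(C_i / C_{i-1})
  rollapply(r, width = n - 1, align = "right",
            FUN = function(w) sqrt(N / (n - 2) * sum((w - mean(w))^2)))
}
```

With the same n and N, this should closely track volatility(..., calc = "close") apart from the leading NA padding.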
In [20]:
# Annualized close-to-close volatility: 24-day rolling window, N = 286 periods per year.
V_XU100_0108 <- volatility(XU100["2001/2008"], N=286, n=24, calc="close")
V_XU100_0816 <- volatility(XU100["2008/2016"], N=286, n=24, calc="close")
V_XUTUM_0108 <- volatility(XUTUM["2001/2008"], N=286, n=24, calc="close")
V_XUTUM_0816 <- volatility(XUTUM["2008/2016"], N=286, n=24, calc="close")
par(mfrow=c(2,2))
plot(density(na.omit(V_XU100_0108)),type="h", main="XU100:2001/2008")
plot(density(na.omit(V_XU100_0816)),type="h", main="XU100:2008/2016")
plot(density(na.omit(V_XUTUM_0108)),type="h", main="XUTUM:2001/2008")
plot(density(na.omit(V_XUTUM_0816)),type="h", main="XUTUM:2008/2016")
In [21]:
summary(V_XUTUM_0108)
summary(V_XUTUM_0816)
In [22]:
summary(V_XU100_0108)
summary(V_XU100_0816)
Looking at the summaries, between 2001 and 2008 the mean volatility of XU100 was $0.378$ while the median was $0.333$; in the same period the XUTUM mean and median were $0.363$ and $0.319$.
Between 2008 and 2016 the mean volatility of XU100 was $0.2643$ while the median was $0.2344$; in the same period the mean and median of XUTUM were $0.2530$ and $0.2227$.
The mean volatilities clearly dropped from the first period to the second for both XUTUM and XU100, and within each period there is also a consistent gap between the mean volatilities of XU100 and XUTUM.
In order to test our hypothesis, we will run t-tests on the volatility series.
In [23]:
t.test(V_XU100_0108,V_XU100_0816)
In [24]:
t.test(V_XUTUM_0108,V_XUTUM_0816)
In [25]:
t.test(V_XU100_0108,V_XUTUM_0108)
In [26]:
t.test(V_XU100_0816,V_XUTUM_0816)
Looking at the p values, we can safely reject the null hypothesis when comparing the two time frames, concluding that the volatility distributions have different means. A difference in means, however, tells us nothing about whether the distributions are normal. The gap between XUTUM and XU100 within the same time frame remains much smaller than the shift across the crisis.
In our analysis we statistically demonstrated that the volatility and log-return distributions of XU100 and XUTUM changed after the crisis.
The KS test on normalized returns was used to establish the differences in the log-return distributions of XU100 and XUTUM between the time frames 2001/2008 and 2008/2016.
The t-test was used to analyze the distributions of volatilities across the time frames. Its results show that the risk profile of the market shifted after the crisis and that the top-100 universe behaved differently from the whole stock market during the same period.
It is no surprise that the log-return and volatility distributions of XUTUM and XU100 differ within the same period, since the market is more liquid and more actively traded for larger-capitalization companies. In both periods the mean volatility of XU100 is substantially higher than that of XUTUM.
All in all, the 2008 crisis affected the psychology of investors and shifted them toward a risk-averse mindset, which lowered both the mean log returns and the volatilities of XU100 and XUTUM. From a statistical standpoint, we can safely conclude that 2008 fundamentally changed the structure of Borsa Istanbul.