By Christopher van Hoecke and Maxwell Margenot
Part of the Quantopian Lecture Series: https://www.quantopian.com/lectures/hypothesis-testing
This lecture corresponds to the Hypothesis Testing lecture, which is part of the Quantopian lecture series. This homework expects you to rely heavily on the code presented in the corresponding lecture. Please copy and paste regularly from that lecture when starting to work on the problems, as trying to do them from scratch will likely be too difficult.
When you feel comfortable with the topics presented here, see if you can create an algorithm that qualifies for the Quantopian Contest. Participants are evaluated on their ability to produce risk-constrained alpha, and the top 10 contest participants are awarded cash prizes on a daily basis.
https://www.quantopian.com/contest
In [1]:
# Useful Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import t
import scipy.stats
Using the techniques laid out in the lecture, determine whether we can state that the mean daily return of TSLA is greater than 0.
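Written out explicitly (restating the lecture's framework for clarity), we are testing
$$H_0: \mu \leq 0 \quad \text{versus} \quad H_A: \mu > 0$$
with the one-sample test statistic
$$t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}},$$
where $\bar{X}$ is the sample mean of TSLA's daily returns, $s$ the sample standard deviation, $n$ the sample size, and $\mu_0 = 0$ the hypothesized mean.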
In [2]:
prices1 = get_pricing('TSLA', start_date = '2015-01-01', end_date = '2016-01-01', fields = 'price')
returns_sample_tsla = prices1.pct_change()[1:]
print 'Tesla return sample mean', returns_sample_tsla.mean()
print 'Tesla return sample standard deviation', returns_sample_tsla.std()
print 'Tesla return sample size', len(returns_sample_tsla)
In [3]:
# Testing
## t-statistic (we use the sample standard deviation, so we compare against the t-distribution)
test_stat = (returns_sample_tsla.mean() - 0) / \
( returns_sample_tsla.std() / np.sqrt( len(returns_sample_tsla) ) )
print 't-statistic is:', test_stat
## Finding the p-value for a one-tailed test
p_val = (1 - t.cdf(test_stat, len(returns_sample_tsla) - 1))
print 'p-value is: ', p_val
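As a sanity check (a sketch, not part of the original answer key), the same statistic and a two-sided p-value can be computed with scipy.stats.ttest_1samp; halving (or complementing) its p-value recovers the one-sided value, assuming the returns_sample_tsla series from the cell above.
In [ ]:
# Cross-check with scipy's built-in one-sample t-test (two-sided by default)
stat, p_two_sided = scipy.stats.ttest_1samp(returns_sample_tsla, 0)
if stat > 0:
    p_one_sided = p_two_sided / 2
else:
    p_one_sided = 1 - p_two_sided / 2
print 'scipy t-statistic: ', stat
print 'scipy one-sided p-value: ', p_one_sided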
For all three significance levels, $\alpha = 0.01$, $\alpha = 0.05$, and $\alpha = 0.1$, our p-value is greater than $\alpha$; we thus fail to reject the null hypothesis in every case.
In [4]:
## Graph for visualization.
x = np.linspace(-3, 3, 100)
norm_pdf = lambda x: (1/np.sqrt(2 * np.pi)) * np.exp(-x * x / 2)
y = norm_pdf(x)
fig, ax = plt.subplots(1, 1, sharex=True)
ax.plot(x, y)
ax.fill_between(x, 0, y, where = x > 1.282, label = 'Rejection region, alpha = 0.1')
ax.fill_between(x, 0, y, where = x > 1.645, label = 'Rejection region, alpha = 0.05', color = 'green')
ax.fill_between(x, 0, y, where = x > 2.326, label = 'Rejection region, alpha = 0.01', color = 'red')
plt.axvline(test_stat, linestyle = 'dashed', label = 'Test statistic')
plt.title('Rejection regions and test statistic (one-tailed test)')
plt.xlabel('x')
plt.ylabel('p(x)')
plt.legend();
In the graph above, we can clearly see that the test statistic falls to the left of (outside) the rejection regions for all three values of $\alpha$, so in each case we fail to reject the null hypothesis.
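For the two-tailed test computed next, the p-value doubles the tail probability beyond the absolute value of the statistic,
$$p = 2\left(1 - F_{t_{n-1}}\left(|t|\right)\right),$$
where $F_{t_{n-1}}$ is the CDF of the t-distribution with $n - 1$ degrees of freedom; taking the absolute value keeps the p-value in $[0, 1]$ even when the statistic is negative.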
In [5]:
## Finding the p-value for a two-tailed test.
p_val = 2 * (1 - t.cdf(abs(test_stat), len(returns_sample_tsla) - 1))
print 'p-value is: ', p_val
For all three significance levels, $\alpha = 0.01$, $\alpha = 0.05$, and $\alpha = 0.1$, our p-value is greater than $\alpha$; we thus fail to reject the null hypothesis in every case.
In [6]:
x = np.linspace(-3, 3, 100)
norm_pdf = lambda x: (1/np.sqrt(2 * np.pi)) * np.exp(-x * x / 2)
y = norm_pdf(x)
fig, ax = plt.subplots(1, 1, sharex=True)
ax.plot(x, y)
ax.fill_between(x, 0, y, where = x > 1.645, label = 'Rejection region, alpha = 0.1')
ax.fill_between(x, 0, y, where = x < -1.645)
ax.fill_between(x, 0, y, where = x > 1.96, label = 'Rejection region, alpha = 0.05', color = 'green')
ax.fill_between(x, 0, y, where = x < -1.96, color = 'green')
ax.fill_between(x, 0, y, where = x > 2.576, label = 'Rejection region, alpha = 0.01', color = 'red')
ax.fill_between(x, 0, y, where = x < -2.576, color = 'red')
plt.axvline(test_stat, linestyle = 'dashed', label = 'Test statistic')
plt.title('Rejection regions and test statistic (two-tailed test)')
plt.xlabel('x')
plt.ylabel('p(x)')
plt.legend();
Find the critical values associated with $\alpha = 1\%, 5\%, 10\%$ and graph the rejection regions on a plot for a two-tailed test.
Useful formula: $$ f = 1 - \frac{\alpha}{2} $$
In order to find the z-value associated with each $f$ value, use the z-table here.
You can read more about how to read z-tables here.
In [7]:
# For alpha = 10%
alpha = 0.1
f = 1 - (alpha/2)
print 'alpha = 10%: f = ', f
# For alpha = 5%
alpha = 0.05
f = 1 - (alpha/2)
print 'alpha = 5%: f = ', f
# For alpha = 1%
alpha = 0.01
f = 1 - (alpha/2)
print 'alpha = 1%: f = ', f
Using the z-table, we find that for
$\alpha = 10\%$, x = $\pm 1.645$
$\alpha = 5\%$, x = $\pm 1.96$
$\alpha = 1\%$, x = $\pm 2.575$
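These table lookups can also be verified in code (a quick sketch, not part of the original key) using the inverse CDF of the standard normal distribution, scipy.stats.norm.ppf:
In [ ]:
# Verify the z-table lookups with the inverse CDF (percent point function)
from scipy.stats import norm
for alpha in [0.1, 0.05, 0.01]:
    z_crit = norm.ppf(1 - alpha/2)
    print 'alpha = {0}: critical value = +/- {1}'.format(alpha, round(z_crit, 3))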
In [8]:
# Plot a standard normal distribution and mark the critical regions with shading
x = np.linspace(-3, 3, 100)
norm_pdf = lambda x: (1/np.sqrt(2 * np.pi)) * np.exp(-x * x / 2)
y = norm_pdf(x)
fig, ax = plt.subplots(1, 1, sharex=True)
ax.plot(x, y)
# Critical values for alpha = 10%
ax.fill_between(x, 0, y, where = x > 1.645, label = 'alpha = 10%')
ax.fill_between(x, 0, y, where = x < -1.645)
# Value for alpha = 5%
ax.fill_between(x, 0, y, where = x > 1.96, color = 'red', label = 'alpha = 5%')
ax.fill_between(x, 0, y, where = x < -1.96, color = 'red')
# Critical values for alpha = 1%
ax.fill_between(x, 0, y, where = x > 2.575, facecolor='green', label = 'alpha = 1%')
ax.fill_between(x, 0, y, where = x < -2.575, facecolor='green')
plt.title('Rejection regions for a two-tailed hypothesis test at 90%, 95%, 99% confidence')
plt.xlabel('x')
plt.ylabel('p(x)')
plt.legend();
In [9]:
# Probability level used to look up the critical value for a two-tailed test at alpha = 0.1
alpha = 0.1
v = 1 - (alpha/2)
print 'f = 1 - alpha/2 = ', v
From the previous question, we find a critical value of 1.645.
In [10]:
data = get_pricing('SPY', start_date = '2016-01-01', end_date = '2017-01-01', fields = 'price')
returns_sample = data.pct_change()[1:]
# Running the t-test.
n = len(returns_sample)
test_statistic = ((returns_sample.mean() - 0) /
(returns_sample.std()/np.sqrt(n)))
print 't test statistic: ', test_statistic
We find that $-1.645 < 1.05 < 1.645$; the test statistic falls between the critical values, so we fail to reject $H_0$.
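An equivalent way to reach the same decision (added here as a sketch; it reuses n and returns_sample from the cell above) is to form the 90% confidence interval for SPY's mean daily return: failing to reject at $\alpha = 0.1$ in the two-tailed test corresponds to this interval containing 0.
In [ ]:
# 90% two-sided confidence interval for SPY's mean daily return
t_crit = t.ppf(1 - 0.1/2, n - 1)
half_width = t_crit * returns_sample.std() / np.sqrt(n)
lower = returns_sample.mean() - half_width
upper = returns_sample.mean() + half_width
print '90% confidence interval: ({0}, {1})'.format(lower, upper)
print 'Interval contains 0 (fail to reject): ', lower <= 0 <= upper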
In [11]:
# Running the p-value test.
alpha = 0.1
p_val = 2 * (1 - t.cdf(abs(test_statistic), n - 1))
print 'p-value is: ', p_val
if p_val > alpha:
    print 'p-value is greater than our significance level; we thus fail to reject the null hypothesis.'
else:
    print 'p-value is less than or equal to our significance level; we thus reject the null hypothesis.'
As we can see above, our p-value is greater than our significance level, $\alpha = 0.1$; we thus fail to reject the null hypothesis.
Note: one formula for the t-statistic assumes equal variances, the other does not. Use the right one given the information above.
Answer:
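Since there is no reason to assume the two return series have equal variances, the unequal-variance (Welch) form is appropriate. For reference, its test statistic and approximate degrees of freedom (the Welch-Satterthwaite formula implemented in the cell below) are
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \qquad\qquad \nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$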
In [12]:
# Data Collection
alpha = 0.1
symbol_list = ['XLF', 'MCD']
start = '2015-01-01'
end = '2016-01-01'
pricing_sample = get_pricing(symbol_list, start_date = start, end_date = end, fields='price')
pricing_sample.columns = map(lambda x: x.symbol, pricing_sample.columns)
returns_sample = pricing_sample.pct_change()[1:]
# Sample means and standard deviations
mu_xlf, mu_mcd = returns_sample.mean()
s_xlf, s_mcd = returns_sample.std()
n_xlf = len(returns_sample['XLF'])
n_mcd = len(returns_sample['MCD'])
# Welch's t-statistic (unequal variances) and the Welch-Satterthwaite degrees of freedom
test_statistic = ((mu_xlf - mu_mcd) - 0) / ((s_xlf**2/n_xlf) + (s_mcd**2/n_mcd))**0.5
df = ((s_xlf**2/n_xlf) + (s_mcd**2/n_mcd))**2 / \
     (((s_xlf**2/n_xlf)**2 / (n_xlf-1)) + ((s_mcd**2/n_mcd)**2 / (n_mcd-1)))
print 't test statistic: ', test_statistic
print 'Degrees of freedom (Welch-Satterthwaite): ', df
print 'p-value: ', 2 * (1 - t.cdf(abs(test_statistic), df))
With a confidence level of 90%, our test statistic falls within the range $(-1.645, 1.645)$. We thus fail to reject the null hypothesis and conclude that the difference between XLF and MCD returns is not significantly different from $0$.
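As a cross-check (a sketch, not part of the original answer key), scipy.stats.ttest_ind with equal_var=False runs the same unequal-variance (Welch) test on the two return series:
In [ ]:
# Cross-check the Welch t-test with scipy (equal_var=False selects the unequal-variance form)
stat, p_value = scipy.stats.ttest_ind(returns_sample['XLF'], returns_sample['MCD'], equal_var=False)
print 'scipy Welch t-statistic: ', stat
print 'scipy two-sided p-value: ', p_value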
Answer:
$$1. \ H_0: \sigma_1^2 = \sigma_2^2, \quad H_A: \sigma_1^2 \neq \sigma_2^2$$
$$2. \ H_0: \sigma_1^2 \leq \sigma_2^2, \quad H_A: \sigma_1^2 > \sigma_2^2$$
$$3. \ H_0: \sigma_1^2 \geq \sigma_2^2, \quad H_A: \sigma_1^2 < \sigma_2^2$$
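For the two-sided pair of hypotheses tested below, the statistic is the ratio of the two sample variances, which under $H_0$ (and normally distributed returns) follows an F-distribution:
$$F = \frac{s_1^2}{s_2^2} \sim F(n_1 - 1, \, n_2 - 1)$$
with the larger sample variance conventionally placed in the numerator, as in the cell below.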
In [13]:
# Data
symbol_list = ['XLF', 'MCD']
start = "2015-01-01"
end = "2016-01-01"
pricing_sample = get_pricing(symbol_list, start_date = start, end_date = end, fields = 'price')
pricing_sample.columns = map(lambda x: x.symbol, pricing_sample.columns)
returns_sample = pricing_sample.pct_change()[1:]
# Take returns from above, MCD and XLF, and compare their variances
xlf_std_dev, mcd_std_dev = returns_sample.std()
print 'XLF standard deviation is: ', xlf_std_dev
print 'MCD standard deviation is: ', mcd_std_dev
# F-statistic with the MCD sample variance in the numerator
test_statistic = (mcd_std_dev / xlf_std_dev)**2
print "F Test statistic: ", test_statistic
# Degrees of freedom (numerator: MCD, denominator: XLF)
df1 = len(returns_sample['MCD']) - 1
df2 = len(returns_sample['XLF']) - 1
print 'df1 (numerator): ', df1
print 'df2 (denominator): ', df2
# Calculate critical values for a two-tailed test at alpha = 0.05.
from scipy.stats import f
upper_crit_value = f.ppf(0.975, df1, df2)
lower_crit_value = f.ppf(0.025, df1, df2)
print 'Upper critical value at alpha = 0.05 with df1 = {0} and df2 = {1}: '.format(df1, df2), upper_crit_value
print 'Lower critical value at alpha = 0.05 with df1 = {0} and df2 = {1}: '.format(df1, df2), lower_crit_value
We can see that our F-test statistic lies below the upper critical value, so we fail to reject the null hypothesis and conclude that the variances of XLF and MCD do not differ significantly.
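Since the F-test is sensitive to departures from normality, a common robustness check (added here as a sketch, not part of the original key) is Levene's test, scipy.stats.levene:
In [ ]:
# Robustness check: Levene's test for equal variances (less sensitive to non-normality)
stat, p_value = scipy.stats.levene(returns_sample['XLF'], returns_sample['MCD'])
print 'Levene statistic: ', stat
print 'Levene p-value: ', p_value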
Congratulations on completing the Hypothesis Testing answer key!
As you learn more about writing trading models and the Quantopian platform, enter the daily Quantopian Contest. Your strategy will be evaluated for a cash prize every day.
Start by going through the Writing a Contest Algorithm tutorial.
This presentation is for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation for any security; nor does it constitute an offer to provide investment advisory or other services by Quantopian, Inc. ("Quantopian"). Nothing contained herein constitutes investment advice or offers any opinion with respect to the suitability of any security, and any views expressed herein should not be taken as advice to buy, sell, or hold any security or as an endorsement of any security or company. In preparing the information contained herein, Quantopian, Inc. has not taken into account the investment needs, objectives, and financial circumstances of any particular investor. Any views expressed and data illustrated herein were prepared based upon information, believed to be reliable, available to Quantopian, Inc. at the time of publication. Quantopian makes no guarantees as to their accuracy or completeness. All information is subject to change and may quickly become unreliable for various reasons, including changes in market conditions or economic circumstances.