Shapiro-Wilk hypothesis test
The best-fit slope
The standard error of the residuals.
SSR is the sum of squared distances between the fitted $y$ and the data $y$. TSS is the sum of squared distances between the mean of $y$ and the data $y$, so $TSS \geq SSR$.
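A quick numeric check of $TSS \geq SSR$; the data and the linear fit below are made-up assumptions for illustration:

```python
import numpy as np

# Hypothetical data and a least-squares line fit
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
slope, intercept = np.polyfit(x, y, 1)
yhat = slope * x + intercept

ssr = np.sum((y - yhat) ** 2)        # squared distances to the fit
tss = np.sum((y - np.mean(y)) ** 2)  # squared distances to the mean
print(ssr, tss, ssr <= tss)
```

For a least-squares fit with an intercept, the mean is one of the candidate models, so the fit can never do worse and $TSS \geq SSR$ always holds.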
OLS-1D, NLS-ND
(1) Justify with a Spearman correlation test (2) Check normality of residuals (3) Run hypothesis tests/confidence intervals as needed
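These steps can be sketched as follows; the data here is a made-up assumption:

```python
import numpy as np
import scipy.stats as ss

# Hypothetical data for illustration
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
y = np.array([0.9, 2.1, 2.8, 4.2, 4.9, 6.1, 6.8, 8.2])

# (1) Justify regression with a Spearman correlation test
rho, p_corr = ss.spearmanr(x, y)

# (2) Fit, then check normality of the residuals with Shapiro-Wilk
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)
stat, p_norm = ss.shapiro(residuals)

# (3) Proceed to hypothesis tests/confidence intervals as needed
print(rho, p_corr, p_norm)
```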
yes, $y$ vs $\hat{y}$
When the transformation doesn't change the noise in the model from normal to some other distribution.
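For example, a linear transform keeps normal noise normal, while exponentiating does not. A quick check with simulated noise (the sample size and seed are arbitrary choices):

```python
import numpy as np
import scipy.stats as ss

rng = np.random.default_rng(0)
eps = rng.normal(0, 1, size=500)  # normal noise

# A linear transform of normal noise is still normal
_, p_linear = ss.shapiro(2 * eps + 3)

# Exponentiating makes the noise log-normal, not normal
_, p_exp = ss.shapiro(np.exp(eps))

print(p_linear, p_exp)
```

The Shapiro-Wilk $p$-value for the exponentiated noise should be far below any reasonable significance threshold.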
$\hat{y}$ is the best fit and $y$ is the data. When we write $y$, to achieve equality with our model we must add $\epsilon$, a noise term describing the discrepancy between our model and the data.
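In symbols, with the usual assumption of normally distributed noise:

$$ y = \hat{y} + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \sigma^2) $$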
In [1]:
import scipy.stats as ss
# Shapiro-Wilk test for normality of the sample
ss.shapiro([-26.3, -24.2, -20.9, -25.8, -24.3, -22.6, -23.0, -26.8, -26.5, -23.1, -20.0, -23.1, -22.4, -22.8])
Out[1]:
The $p$-value is 0.43, so we cannot reject the null hypothesis that the data are normally distributed.
In [2]:
import numpy as np
T = (0.2 - 0) / np.sqrt(0.4)
# Use 11 - 1 because null hypothesis is there is no intercept!
1 - (ss.t.cdf(T, 11 - 1) - ss.t.cdf(-T, 11 - 1))
Out[2]:
The $p$-value is 0.76, so we cannot reject the null hypothesis of no intercept.
Conduct a hypothesis test for the slope being positive using the above data. This is a one-sided hypothesis test. Hint: a good null hypothesis would be that the slope is negative. Describe your test in Markdown first, then complete it in Python, and finally write an explanation of the $p$-value in the final cell.
Let's make the null hypothesis that the slope is negative, as suggested. We will compute a $T$ statistic whose corresponding $p$-value gets smaller (closer to our significance threshold) as the slope becomes more positive. This will work:
$$ p = 1 - \int_{0}^{T} p(t)\,dt $$

where $T$ is our positive value reflecting how positive the slope is.
You can deduct either 1 or 2 degrees of freedom. Deducting 1 is correct, since there is no degree of freedom for the intercept here, but that is a little tricky to see.
In [4]:
T = 1.6 / np.sqrt(4)
ss.t.cdf(T, 11 - 1) - ss.t.cdf(0, 11 - 1)
Out[4]:
The $p$-value is 0.28, so we cannot conclude that the slope is positive. This is due to the large uncertainty in the slope.
In [ ]:
def ssr(beta):
    yhat = beta[0] * x + beta[1] * np.exp(-beta[2] * x)
    return np.sum((y - yhat)**2)
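A sketch of minimizing this objective with `scipy.optimize.minimize` for the non-linear least-squares fit; the synthetic data, true coefficients, and initial guess below are all assumptions for illustration:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data generated from y = b0*x + b1*exp(-b2*x) plus small noise
rng = np.random.default_rng(1)
x = np.linspace(0.1, 5, 11)
y = 0.5 * x + 2.0 * np.exp(-1.5 * x) + rng.normal(0, 0.05, size=x.size)

def ssr(beta):
    yhat = beta[0] * x + beta[1] * np.exp(-beta[2] * x)
    return np.sum((y - yhat)**2)

# Minimize the sum of squared residuals from an arbitrary starting guess
result = minimize(ssr, x0=[1.0, 1.0, 1.0])
print(result.x, result.fun)
```

The minimizer returns the best-fit coefficients in `result.x` and the minimized SSR in `result.fun`.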
$11 - 1 = 10$. Only deduct the number of fit coefficients for non-linear regression.