In [2]:
import scipy.stats as ss
import numpy as np
Describe the general process for parametric hypothesis tests.
Why would you choose a non-parametric hypothesis test over a parametric one?
Why would you choose a parametric hypothesis test over a non-parametric one?
If you do not reject the null hypothesis, does that mean you've proved it?
In [4]:
import scipy.stats as ss
ss.t.cdf(-2, 4) * 2
Out[4]:
In [11]:
-ss.norm.ppf(0.025)
Out[11]:
In [7]:
1 - ss.poisson.cdf(7, mu=4.3)
Out[7]:
In [8]:
import math
0.4 / math.sqrt(11)
Out[8]:
State which hypothesis test best fits each example below and state the null hypothesis. You can justify your answer if you feel that multiple tests fit.
You know that coffee should be brewed at 186 $^\circ{}$F. You measure coffee from Starbucks 10 times over a week and want to know if they're brewing at the correct temperature.
You believe that the real estate market in SF is the same as NYC. You gather 100 home prices from both markets to compare them.
Australia banned most guns in 2002. You compare homicide rates before and after this date.
A number of states have recently legalized recreational marijuana. You gather teen drug use data for the year prior and two years after the legislation took effect.
You think your mail is being stolen. You know that you typically get five pieces of mail on Wednesdays, but this Wednesday you got no mail.
t-test
The coffee is brewed at the correct temperature.
Wilcoxon Sum of Ranks
The real estate prices in SF and NYC are from the same distribution.
Wilcoxon Sum of Ranks
The homicide rates before and after the date are from the same distribution.
Wilcoxon Signed Ranks
The teen drug use data for the year prior and for the two years after the legislation took effect are from the same distribution.
Poisson
Your mail is not being stolen
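The mail answer can be sketched numerically. This is a minimal sketch, assuming the number of pieces of mail on a Wednesday is Poisson distributed with a mean of five (the "typical" value from the problem) and that only unusually low counts count as evidence of theft. The variable names are illustrative.

```python
import math
import scipy.stats as ss

mu = 5        # assumed Poisson mean: typical Wednesday mail count
observed = 0  # pieces of mail received this Wednesday

# One-sided p-value: probability of receiving this few pieces or fewer
p_value = ss.poisson.cdf(observed, mu=mu)
print(p_value)

# For an observation of zero this is just the Poisson probability of zero, exp(-mu)
print(math.exp(-mu))
```

Because the observation is zero, the tail probability reduces to a single term, which makes the result easy to check by hand.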
Do the following:
Morning | Evening |
---|---|
181 | 180 |
182 | 179 |
181 | 184 |
182 | 179 |
182 | 180 |
183 | 183 |
185 | 180 |
On a typical day in Rochester, there are 11 major car accidents. On the Monday after daylight saving time begins in the spring, there are 18 major car accidents. Is this significant?
Your cellphone bill is typically \$20. The last four have been \$21, \$30, \$25, \$23. Has it significantly changed?
In [10]:
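# morning and evening readings are paired by day, so use the signed-rank test
p = ss.wilcoxon([181, 182, 181, 182, 182, 183, 185], [180, 179, 184, 179, 180, 183, 180])
# index 1 of the result is the p-value
print(p[1])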
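To make the pairing explicit, here is a small sketch that computes the per-row differences first and passes them to `ss.wilcoxon` directly. The names `morning`, `evening`, and `diff` are illustrative. Note that one pair (183 and 183) has a zero difference, which SciPy's default `zero_method="wilcox"` drops before ranking.

```python
import numpy as np
import scipy.stats as ss

morning = np.array([181, 182, 181, 182, 182, 183, 185])
evening = np.array([180, 179, 184, 179, 180, 183, 180])

# Per-row differences; the 183/183 pair gives a zero difference
diff = morning - evening
print(diff)

# Passing the differences is equivalent to passing both samples
p_pairs = ss.wilcoxon(morning, evening)[1]
p_diff = ss.wilcoxon(diff)[1]
print(p_pairs, p_diff)
```

With so few pairs, and with tied differences, SciPy may warn about the accuracy of the p-value, so treat the result as approximate.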
In [11]:
1 - ss.poisson.cdf(17, mu=11)
Out[11]:
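The `17` in the cell above comes from needing the probability of 18 or more accidents, which is one minus the probability of 17 or fewer. As a sketch, the same tail can be written with the survival function `sf`, which avoids the explicit subtraction. The variable names are illustrative.

```python
import scipy.stats as ss

mu = 11        # typical daily accident count, from the problem statement
observed = 18  # accidents on the Monday in question

# P(X >= 18) written two equivalent ways
p_from_cdf = 1 - ss.poisson.cdf(observed - 1, mu=mu)
p_from_sf = ss.poisson.sf(observed - 1, mu=mu)
print(p_from_cdf, p_from_sf)
```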
In [15]:
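import numpy as np
data = [21, 30, 25, 23]
# standard error of the mean, using the sample standard deviation
se = np.std(data, ddof=1) / np.sqrt(len(data))
# T statistic against the typical bill of $20
T = (np.mean(data) - 20) / se
# two-sided p-value with N - 1 degrees of freedom
ss.t.cdf(-abs(T), df=len(data) - 1) * 2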
Out[15]:
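As a cross-check on the manual calculation, SciPy's built-in one-sample t-test should give the same statistic and two-sided p-value. This is a sketch rather than part of the original solution; `p_manual` and `result` are illustrative names.

```python
import numpy as np
import scipy.stats as ss

data = [21, 30, 25, 23]

# Manual calculation, as in the cell above
se = np.std(data, ddof=1) / np.sqrt(len(data))
T = (np.mean(data) - 20) / se
p_manual = ss.t.cdf(-abs(T), df=len(data) - 1) * 2

# Built-in one-sample t-test against a population mean of 20
result = ss.ttest_1samp(data, popmean=20)
print(T, result.statistic)
print(p_manual, result.pvalue)
```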
In [17]:
1 - ss.expon.cdf(16, scale=10)
Out[17]:
No