Homework 10 Key

CHE 116: Numerical Methods and Statistics

4/3/2019



In [2]:

    
import scipy.stats as ss
import numpy as np

1. Conceptual Questions

Describe the general process for parametric hypothesis tests.
Why would you choose a non-parametric hypothesis test over a parametric one?
Why would you choose a parametric hypothesis test over a non-parametric one?
If you do not reject the null hypothesis, does that mean you've proved it?

1.1

You compute the end-points of the interval containing values as extreme as, or more extreme than, your sample data. You integrate the probability distribution of the null hypothesis over this interval to obtain your p-value. If the p-value is less than your significance threshold, you reject the null hypothesis.
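
As a minimal sketch of these steps, consider a two-sided one-sample $zM$ test. The population mean, standard deviation, sample value and threshold below are hypothetical numbers chosen only for illustration; they do not come from the homework.

```python
import scipy.stats as ss

# Hypothetical values, assumed only for this illustration
mu, sigma = 10.0, 2.0   # assumed known population mean and standard deviation
x = 14.5                # assumed single sample
alpha = 0.05            # significance threshold

# Step 1: express the sample as a Z-score under the null hypothesis
Z = (x - mu) / sigma

# Step 2: integrate both tails, covering values as extreme or more extreme than the sample
p_value = 2 * ss.norm.cdf(-abs(Z))

# Step 3: compare the p-value with the significance threshold
reject = p_value < alpha
print(p_value, reject)
```

With these assumed numbers the sample is 2.25 standard deviations from the mean, so the p-value falls below 0.05 and the null hypothesis is rejected.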

1.2

To avoid assuming a normal or other specific distribution.

1.3

A parametric test can show significance with small amounts of data.

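1.4

No. Failing to reject the null hypothesis only means the data are not inconsistent with it; it does not prove the null hypothesis.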

2. Short Answer Questions

If your p-value is 0.4 and $\alpha = 0.1$, should you reject the null hypothesis?
What is your p-value if your $T$-value is -2 in the two-tailed/two-sided $t$-test with a DOF of 4?
For a one-sample $zM$ test, what is the minimum number of standard deviations away from the population mean a sample should be to reject the null hypothesis with $\alpha = 0.05$?
For an N-sample $zM$ test, what is the minimum number of standard deviations away from the population mean a sample should be to reject the null hypothesis with $\alpha = 0.05$ in terms of $N$?
In a Poisson hypothesis test, what is the p-value if $\mu = 4.3$ and the sample is 8?
What is the standard error for $\bar{x} = 4$, $\sigma_x = 0.4$ and $N = 11$?

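2.1

No. The p-value (0.4) is greater than $\alpha$ (0.1), so you should not reject the null hypothesis.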

2.2



In [4]:

    
import scipy.stats as ss
ss.t.cdf(-2, 4) * 2









    Out[4]:





0.1161165235168155

2.3



In [11]:

    
-ss.norm.ppf(0.025)









    Out[11]:





1.9599639845400545

2.4

$$ 1.96 = \frac{\sqrt{N}\,|\bar{x} - \mu|}{\sigma} $$

The sample mean should be at least $ \frac{1.96}{\sqrt{N}} $ standard deviations away from the population mean.
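
The scaling with $N$ can be checked numerically. The sketch below reuses the critical value from 2.3 and prints the minimum distance, in units of $\sigma$, for a few sample sizes. The values of $N$ are arbitrary choices for illustration.

```python
import numpy as np
import scipy.stats as ss

# Two-sided critical value for alpha = 0.05 (same calculation as in 2.3)
z_crit = -ss.norm.ppf(0.025)

# Minimum distance of the sample mean from the population mean, in units of sigma
# The sample sizes here are arbitrary illustrative choices
min_dist = {N: z_crit / np.sqrt(N) for N in [1, 4, 25, 100]}
for N, d in min_dist.items():
    print(N, d)
```

For $N = 1$ this reduces to the one-sample answer of about 1.96, and the required distance shrinks as more samples are averaged.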

2.5



In [7]:

    
1 - ss.poisson.cdf(7, mu=4.3)









    Out[7]:





0.07103170453943586
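
As a cross-check, the same upper-tail probability can be computed with the survival function, which SciPy defines as `1 - cdf`. The CDF is evaluated at 7 rather than 8 because the p-value must include the observed count, $P(X \geq 8)$. The variable names below are only for illustration.

```python
import scipy.stats as ss

# P(X >= 8) for a Poisson distribution with mu = 4.3
p_cdf = 1 - ss.poisson.cdf(7, mu=4.3)  # via the CDF, as in the cell above
p_sf = ss.poisson.sf(7, mu=4.3)        # via the survival function, P(X > 7)
print(p_cdf, p_sf)
```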

2.6



In [8]:

    
import math
0.4 / math.sqrt(11)









    Out[8]:





0.12060453783110546

3. Choose the hypothesis test

State which hypothesis test best fits the example below and state the null hypothesis. You can justify your answer if you feel like multiple tests fit.

You know that coffee should be brewed at 186 $^\circ{}$F. You measure coffee from Starbucks 10 times over a week and want to know if they're brewing at the correct temperature.
You believe that the real estate market in SF is the same as NYC. You gather 100 home prices from both markets to compare them.
Australia banned most guns in 2002. You compare homicide rates before and after this date.
A number of states have recently legalized recreational marijuana. You gather teen drug use data for the year prior and two years after the legislation took effect.
You think your mail is being stolen. You know that you typically get five pieces of mail on Wednesdays, but this Wednesday you got no mail.

3.1

t-test

The coffee is brewed at the correct temperature.

3.2

Wilcoxon Sum of Ranks

The real estate prices in SF and NYC are from the same distribution.

3.3

Wilcoxon Sum of Ranks

The homicide rates before and after the date are from the same distribution.

3.4

Wilcoxon Signed Ranks

The teen drug use data for the year prior to and the two years after the legislation are from the same distribution.

3.5

Poisson

Your mail is not being stolen

4. Hypothesis Tests

Do the following:

[1 Point] State the test type
[1 Point] State the null hypothesis
[2 Points] State the p-value
[1 Point] State if you accept/reject the null hypothesis
[1 Point] Answer the question

You have heard an urban legend that you are taller in the morning. Using the height measurements in centimeters below, answer the question

Morning	Evening
181	180
182	179
181	184
182	179
182	180
183	183
185	180

On a typical day in Rochester, there are 11 major car accidents. On the Monday after daylight savings time in the Spring, there are 18 major car accidents. Is this significant?
Your cellphone bill is typically \$20. The last four have been \$21, \$30, \$25, \$23. Has it significantly changed?

4.1

Wilcoxon Signed Rank Test
The two heights are from the same distribution
0.17
Cannot reject
No evidence for a difference in heights



In [10]:

    
p = ss.wilcoxon([181, 182, 181, 182, 182, 183, 185], [180, 179, 184, 179, 180, 183, 180])
print(p[1])









    



0.16820413904818

4.2

Poisson
The number of accidents is from the population distribution
0.032
Reject
Yes, there is a significant difference



In [11]:

    
1 - ss.poisson.cdf(17, mu=11)









    Out[11]:





0.03219052223037644

4.3

t-test
The new bills are from the population distribution of previous bills
0.09
Do not reject
No, the new bill is not significantly different



In [15]:

    
import numpy as np
data = [21, 30, 25, 23]
se = np.std(data, ddof=1) / np.sqrt(len(data))
T = (np.mean(data) - 20) / se
ss.t.cdf(-abs(T), df=len(data) - 1) * 2









    Out[15]:





0.09088922301434657
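
The manual calculation above can be cross-checked against SciPy's built-in one-sample $t$-test, `ss.ttest_1samp`, which is two-sided by default. This is offered as a sanity check, not as the required method.

```python
import scipy.stats as ss

data = [21, 30, 25, 23]

# One-sample, two-sided t-test against the typical bill of $20
result = ss.ttest_1samp(data, 20)
print(result.statistic, result.pvalue)
```

The p-value should agree with the one computed by hand from the standard error and the $t$ distribution.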

5. Exponential Test (5 Bonus Points)

Your dog typically greets you within 10 seconds of coming home. Is it significant that your dog took 16 seconds?



In [17]:

    
1 - ss.expon.cdf(16, scale=10)









    Out[17]:





0.20189651799465536
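
The upper tail of an exponential distribution has the closed form $P(X > x) = e^{-x/\text{scale}}$, so this p-value can be checked by hand. The sketch below compares the closed form with SciPy's survival function; the variable names are only for illustration.

```python
import math
import scipy.stats as ss

# Closed-form upper tail for an exponential with mean (scale) 10 seconds
p_closed = math.exp(-16 / 10)

# Same probability from SciPy's survival function
p_sf = ss.expon.sf(16, scale=10)
print(p_closed, p_sf)
```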