Homework 10 Key

CHE 116: Numerical Methods and Statistics

4/3/2019



In [2]:
import scipy.stats as ss
import numpy as np

1. Conceptual Questions

  1. Describe the general process for parametric hypothesis tests.

  2. Why would you choose a non-parametric hypothesis test over a parametric one?

  3. Why would you choose a parametric hypothesis test over a non-parametric one?

  4. If you do not reject the null hypothesis, does that mean you've proved it?

1.1

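You choose a test statistic whose distribution under the null hypothesis is known (e.g., normal or $t$) and compute it from your sample. You then find the end-points of the region containing values as extreme as or more extreme than your sample value, and integrate the probability over that region to obtain the p-value. If the p-value is less than your significance level $\alpha$, you reject the null hypothesis.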
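The process can be sketched for a two-sided $zM$ test. `z_test_p_value` is a hypothetical helper name, and the input numbers are made up so that the statistic comes out to $z = 2$:

```python
import scipy.stats as ss

def z_test_p_value(sample_mean, mu, sigma, n):
    """Two-sided p-value for a zM test (hypothetical helper)."""
    # Test statistic: distance of the sample mean from mu in standard errors
    z = (sample_mean - mu) / (sigma / n ** 0.5)
    # Probability of a value at least as extreme, summed over both tails
    return 2 * ss.norm.cdf(-abs(z))

alpha = 0.05
# Hypothetical numbers chosen so that z = 2
p = z_test_p_value(sample_mean=10.5, mu=10.0, sigma=1.0, n=16)
print(p, "reject" if p < alpha else "do not reject")
```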

1.2

To avoid assuming that the data follow a normal or other specific distribution.

1.3

A parametric test is more powerful when its assumptions hold, so it can show significance with smaller amounts of data.
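A small illustration of 1.3, using a hypothetical sample of five measurements tested against $\mu = 1.0$. It assumes exact signed-rank p-values. Because scipy's `wilcoxon` may use a normal approximation depending on version and settings, the smallest attainable exact p-value is computed directly instead:

```python
import scipy.stats as ss

# Hypothetical sample of 5 measurements, tested against mu = 1.0
x = [1.1, 1.3, 1.2, 1.4, 1.25]

# Parametric: one-sample t-test (assumes approximately normal data)
t_p = ss.ttest_1samp(x, 1.0).pvalue

# Non-parametric: with n nonzero differences, the exact two-sided signed-rank
# p-value can never be smaller than 2 / 2**n (all differences with the same sign)
min_signed_rank_p = 2 / 2 ** len(x)

print(t_p, min_signed_rank_p)
```

With only five points the exact signed-rank test cannot reach $\alpha = 0.05$, while the $t$-test can.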

1.4

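No. Failing to reject the null hypothesis only means the data do not provide enough evidence against it; it does not prove the null hypothesis is true.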

2. Short Answer Questions

  1. If your p-value is 0.4 and $\alpha = 0.1$, should you reject the null hypothesis?
  2. What is your p-value if your $T$-value is -2 in the two-tailed/two-sided $t$-test with a DOF of 4?
  3. For a one-sample $zM$ test, what is the minimum number of standard deviations away from the population mean a sample should be to reject the null hypothesis with $\alpha = 0.05$?
  4. For an N-sample $zM$ test, what is the minimum number of standard deviations away from the population mean a sample should be to reject the null hypothesis with $\alpha = 0.05$ in terms of $N$?
  5. In a Poisson hypothesis test, what is the p-value if $\mu = 4.3$ and the sample is 8?
  6. What is the standard error for $\bar{x} = 4$, $\sigma_x = 0.4$ and $N = 11$?

2.1

No. The p-value of 0.4 is greater than $\alpha = 0.1$, so you cannot reject the null hypothesis.

2.2


In [4]:
import scipy.stats as ss
ss.t.cdf(-2, 4) * 2


Out[4]:
0.1161165235168155
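The same calculation as a reusable sketch. `two_sided_t_p_value` is a hypothetical helper name. Taking `abs(T)` makes the result independent of the sign of $T$:

```python
import scipy.stats as ss

def two_sided_t_p_value(T, dof):
    """Two-sided p-value for a t statistic (hypothetical helper name)."""
    # sf(|T|) is the upper-tail area; the t distribution is symmetric, so double it
    return 2 * ss.t.sf(abs(T), dof)

print(two_sided_t_p_value(-2, 4))
print(two_sided_t_p_value(2, 4))  # same value: the sign of T does not matter
```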

2.3


In [11]:
-ss.norm.ppf(0.025)


Out[11]:
1.9599639845400545
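A sketch generalizing 2.3 to any significance level: a two-sided test puts $\alpha/2$ in each tail, so the cutoff is the $1 - \alpha/2$ quantile of the standard normal. `critical_z` is a hypothetical helper name:

```python
import scipy.stats as ss

def critical_z(alpha):
    """|z| beyond which a two-sided zM test rejects at level alpha."""
    # Put alpha/2 in each tail, so the cutoff is the (1 - alpha/2) quantile
    return ss.norm.ppf(1 - alpha / 2)

for alpha in (0.10, 0.05, 0.01):
    print(alpha, critical_z(alpha))
```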

2.4

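$$ 1.96 = \frac{|\bar{x} - \mu|}{\sigma / \sqrt{N}} $$

Solving for the distance in units of $\sigma$:

$$ \frac{|\bar{x} - \mu|}{\sigma} = \frac{1.96}{\sqrt{N}} $$

You should be at least $ \frac{1.96}{\sqrt{N}} $ standard deviations away from the population mean.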
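A sketch of how the threshold shrinks with sample size, starting from $z = (\bar{x} - \mu)/(\sigma/\sqrt{N})$ and the cutoff 1.96 for $\alpha = 0.05$. `min_distance_in_sigmas` is a hypothetical helper name:

```python
import math

def min_distance_in_sigmas(N, z_crit=1.96):
    """Smallest |x_bar - mu| / sigma that rejects the null at alpha = 0.05."""
    # From z = (x_bar - mu) / (sigma / sqrt(N)), solve for (x_bar - mu) / sigma
    return z_crit / math.sqrt(N)

for N in (1, 4, 25, 100):
    print(N, min_distance_in_sigmas(N))
```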

2.5


In [7]:
1 - ss.poisson.cdf(7, mu=4.3)


Out[7]:
0.07103170453943586
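Why the code uses 7 rather than 8: the cdf includes its argument, so $1 - P(X \le 7) = P(X \ge 8)$. scipy's survival function `sf(k)` $= P(X > k)$ gives the same tail directly, as this sketch checks:

```python
import scipy.stats as ss

mu = 4.3
# P(X >= 8) = 1 - P(X <= 7); the cdf includes 7, so subtracting it keeps 8
p_from_cdf = 1 - ss.poisson.cdf(7, mu=mu)
# The survival function sf(k) = P(X > k) gives the same tail directly
p_from_sf = ss.poisson.sf(7, mu=mu)
print(p_from_cdf, p_from_sf)
```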

2.6


In [8]:
import math
0.4 / math.sqrt(11)


Out[8]:
0.12060453783110546

3. Choose the hypothesis test

State which hypothesis test best fits each example below and state the null hypothesis. If you think multiple tests fit, justify your choice.

  1. You know that coffee should be brewed at 186 $^\circ{}$F. You measure coffee from Starbucks 10 times over a week and want to know if they're brewing at the correct temperature.

  2. You believe that the real estate market in SF is the same as NYC. You gather 100 home prices from both markets to compare them.

  3. Australia banned most guns in 2002. You compare homicide rates before and after this date.

  4. A number of states have recently legalized recreational marijuana. You gather teen drug use data for the year prior and two years after the legislation took effect.

  5. You think your mail is being stolen. You know that you typically get five pieces of mail on Wednesdays, but this Wednesday you got no mail.

3.1

t-test

The coffee is brewed at the correct temperature.
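A sketch of how 3.1 could be run. The ten temperatures below are hypothetical, not real measurements. The manual $T$ statistic (same recipe as in 4.3) should agree with scipy's `ttest_1samp`:

```python
import numpy as np
import scipy.stats as ss

# Hypothetical temperatures (deg F) from 10 visits; not real data
temps = np.array([184, 187, 183, 185, 186, 182, 184, 188, 183, 185])

# Manual T statistic against the target brewing temperature
se = np.std(temps, ddof=1) / np.sqrt(len(temps))
T = (np.mean(temps) - 186) / se
p_manual = 2 * ss.t.cdf(-abs(T), df=len(temps) - 1)

# scipy's one-sample t-test computes the same two-sided p-value
result = ss.ttest_1samp(temps, 186)
print(T, p_manual, result.pvalue)
```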

3.2

Wilcoxon Sum of Ranks

The real estate prices in SF and NYC are from the same distribution.
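A sketch of the Wilcoxon rank-sum test with scipy's `ranksums`, which uses a normal approximation. The prices are hypothetical values for illustration, not real market data:

```python
import scipy.stats as ss

# Hypothetical home prices (in thousands of dollars); illustration only
sf_prices = [1200, 1350, 1100, 1500, 1420]
nyc_prices = [800, 950, 700, 1000, 880]

# Rank-sum test: null is that both samples come from the same distribution
result = ss.ranksums(sf_prices, nyc_prices)
print(result.statistic, result.pvalue)
```

The same call would apply to 3.3, with the before and after homicide rates as the two samples.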

3.3

Wilcoxon Sum of Ranks

The homicide rates before and after the date are from the same distribution.

3.4

Wilcoxon Signed Ranks

The teen drug use data for the year prior to and the two years after the legislation took effect are from the same distribution.

3.5

Poisson

Your mail is not being stolen
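A sketch of the p-value for 3.5. It assumes that the typical five pieces of mail is the Poisson mean, and it uses a lower-tail test because getting too little mail is the suspicious outcome:

```python
import math
import scipy.stats as ss

mu = 5  # typical Wednesday mail count, treated as the Poisson mean
# Lower tail: probability of receiving 0 or fewer pieces of mail
p = ss.poisson.cdf(0, mu=mu)
print(p)  # equals exp(-5), about 0.0067
```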

4. Hypothesis Tests

Do the following:

  1. [1 Point] State the test type
  2. [1 Point] State the null hypothesis
  3. [2 Points] State the p-value
  4. [1 Point] State if you accept/reject the null hypothesis
  5. [1 Point] Answer the question

  1. You have heard an urban legend that you are taller in the morning. Using the height measurements below, answer the question.

     Morning (cm)   Evening (cm)
     181            180
     182            179
     181            184
     182            179
     182            180
     183            183
     185            180
  2. On a typical day in Rochester, there are 11 major car accidents. On the Monday after the switch to daylight saving time in the spring, there are 18 major car accidents. Is this significant?

  3. Your cellphone bill is typically \$20. The last four bills have been \$21, \$30, \$25, and \$23. Has it significantly changed?

4.1

  1. Wilcoxon Signed Rank Test
  2. The two heights are from the same distribution
  3. 0.17
  4. Cannot reject
  5. No evidence for a difference in heights

In [10]:
# Paired signed-rank test on morning vs. evening heights
result = ss.wilcoxon([181, 182, 181, 182, 182, 183, 185], [180, 179, 184, 179, 180, 183, 180])
print(result[1])  # p-value


0.16820413904818
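To make "signed ranks" concrete, here is a sketch that recomputes the rank sums for the 4.1 data by hand. It assumes scipy's default handling of zero differences (`zero_method="wilcox"`, which drops them), and the variable names are illustrative:

```python
import numpy as np
import scipy.stats as ss

morning = np.array([181, 182, 181, 182, 182, 183, 185])
evening = np.array([180, 179, 184, 179, 180, 183, 180])

# Paired differences; zero differences are dropped (scipy's default "wilcox" handling)
d = morning - evening
d = d[d != 0]

# Rank |d|, averaging ranks for ties, then sum the ranks by sign
ranks = ss.rankdata(np.abs(d))
w_plus = ranks[d > 0].sum()
w_minus = ranks[d < 0].sum()
print(w_plus, w_minus, min(w_plus, w_minus))
```

The smaller of the two sums is the two-sided test statistic.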

4.2

  1. Poisson
  2. The number of accidents is from the population distribution
  3. 0.032
  4. Reject
  5. Yes, there is a significant difference

In [11]:
1 - ss.poisson.cdf(17, mu=11)


Out[11]:
0.03219052223037644

4.3

  1. t-test
  2. The new bills are from the population distribution of previous bills
  3. 0.09
  4. Do not reject
  5. No, the new bills are not significantly different from the typical \$20 bill

In [15]:
import numpy as np
data = [21, 30, 25, 23]
se = np.std(data, ddof=1) / np.sqrt(len(data))
T = (np.mean(data) - 20) / se
ss.t.cdf(-abs(T), df=len(data) - 1) * 2


Out[15]:
0.09088922301434657
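As a cross-check, scipy's one-sample $t$-test (two-sided by default) should reproduce the manual calculation above:

```python
import scipy.stats as ss

data = [21, 30, 25, 23]
# One-sample t-test against the typical 20 dollar bill; two-sided by default
result = ss.ttest_1samp(data, 20)
print(result.statistic, result.pvalue)
```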

5. Exponential Test (5 Bonus Points)

Your dog typically greets you within 10 seconds of coming home. Is it significant that your dog took 16 seconds?


In [17]:
1 - ss.expon.cdf(16, scale=10)


Out[17]:
0.20189651799465536

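No. Treating 10 seconds as the mean of an exponential distribution, the p-value is 0.20, which is well above a typical significance level such as $\alpha = 0.05$, so you cannot reject the null hypothesis.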
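The exponential upper tail also has a closed form, $P(T \ge t) = e^{-t/\text{mean}}$. This sketch checks it against scipy's survival function, treating 10 seconds as the mean as above:

```python
import math
import scipy.stats as ss

mean_wait = 10  # seconds; treated as the mean of an exponential distribution
t = 16
# Upper tail P(T >= 16) = exp(-t / mean) for an exponential distribution
p_closed_form = math.exp(-t / mean_wait)
p_scipy = ss.expon.sf(t, scale=mean_wait)
print(p_closed_form, p_scipy)
```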