General Instructions

For full credit, you must have the following items for each problem:

  • [1 point] Describe which method you are using and why it is applicable. For example, 'I chose the signed rank test because these are two matched datasets describing one measurement'

  • [1 point] Write out the null hypothesis. For example, 'The null hypothesis is that the two sets of measurements came from the same population (synonymous with probability distribution)'

  • [1 point] Report the p-value and your alpha value (significance level)

  • [1 point] State whether you reject or do not reject the null hypothesis, and answer the question

Put your work into the Python cell and your answers to the questions into the Markdown cell.
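The last two rubric items can be sketched as a small helper. This is only an illustration: the function name `report` and its output wording are assumptions, not part of the assignment.

```python
def report(p_value, alpha=0.05):
    """Format the p-value, the significance level, and the resulting decision."""
    # Reject the null hypothesis only when the p-value falls below alpha
    decision = "reject" if p_value < alpha else "do not reject"
    return f"p = {p_value:.3g}, alpha = {alpha}: {decision} the null hypothesis"

print(report(0.02))
print(report(0.25))
```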

Problem 1

You have a sample of an unknown metal with a melting point of $1,070^\circ{}$ C. You know that gold has a melting point of $1,064^\circ{}$ C and your measurements have a standard deviation of $7^\circ{}$ C. Is the unknown metal likely to be gold?


In [19]:
import scipy.stats as ss
import numpy as np

Z = (1070 - 1064) / 7
# Two-sided p-value: probability of landing outside the interval [-Z, Z]
p = 1 - (ss.norm.cdf(Z) - ss.norm.cdf(-Z))
print(f'{p:.3f}')


0.391
  • zM test because it's a single sample against a normal population with known parameters
  • Sample is from population distribution
  • $p = 0.39$ and $\alpha = 0.05$
  • Do not reject: could be gold
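As a cross-check on the two-sided p-value, the same calculation can be written with the normal survival function. This is a minimal sketch; the helper name `two_sided_z_pvalue` is an assumption introduced here for illustration.

```python
import scipy.stats as ss

def two_sided_z_pvalue(x, mu, sigma):
    """Two-sided p-value for one observation x under a Normal(mu, sigma) null."""
    z = abs(x - mu) / sigma
    # sf(z) = 1 - cdf(z) is the upper tail; doubling it covers both tails
    return 2 * ss.norm.sf(z)

p_gold = two_sided_z_pvalue(1070, 1064, 7)
print(f"{p_gold:.3f}")
```

By symmetry of the normal distribution, this equals `1 - (ss.norm.cdf(Z) - ss.norm.cdf(-Z))` with `Z = 6 / 7`.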

Problem 2

Historically, your taxes have had a population mean of \$3,452 and a standard deviation of \$120. This year your taxes are \$2,341. Should you be concerned that you made a mistake, or does this appear to be a usual amount?


In [17]:
# Population mean of 3452, as given in the problem statement
Z = (3452 - 2341) / 120
p = 1 - (ss.norm.cdf(Z) - ss.norm.cdf(-Z))
print(p)


0.0
  • zM test because it's a single sample against a normal population with known parameters
  • Sample is from population distribution
  • $p \approx 0.0$ and $\alpha = 0.05$
  • Reject: This is an unusual amount

Problem 3

Usually you run an 8-minute mile. After training with a new program for 8 weeks, your latest results are a 7:30 mile, a 10:20 mile, an 8:25 mile, a 7:45 mile, and a 9:20 mile. Has your new program made a significant change?


In [20]:
d = [7.5, 10 + 20/60, 8 + 25 / 60, 7 + 45/60, 9 + 20/60]
# The standard deviation is estimated from the sample (ddof=1),
# so the t-distribution has len(d) - 1 degrees of freedom
T = (np.mean(d) - 8) / (np.std(d, ddof=1) / np.sqrt(len(d)))
p = 1 - (ss.t.cdf(T, df=len(d) - 1) - ss.t.cdf(-T, df=len(d) - 1))
print(f'{p:.3f}')


0.272
  • t-test because it's a set of samples against a normal population with known mean but unknown standard deviation
  • Samples are from population distribution
  • $p = 0.272$ and $\alpha = 0.05$
  • Do not reject: There is not a significant change
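The same one-sample t-test can be cross-checked with `scipy.stats.ttest_1samp`, which uses `len(times) - 1` degrees of freedom and returns a two-sided p-value by default. A minimal sketch; the variable names `times` and `result` are chosen here for illustration.

```python
import scipy.stats as ss

# Mile times in minutes, converted from the mm:ss values in the problem
times = [7.5, 10 + 20/60, 8 + 25/60, 7 + 45/60, 9 + 20/60]

# Null hypothesis: the times come from a population with mean 8 minutes
result = ss.ttest_1samp(times, popmean=8)
print(f"T = {result.statistic:.3f}, p = {result.pvalue:.3f}")
```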

Problem 4

Your manufacturing plant has made a significant investment in quality control to improve yields. Your job is to determine if these investments have improved yields. Results on yield for the last 10 batches and for 5 batches from prior to the changes are available. Did these quality control improvements significantly change yields?


In [21]:
prior = [0.96, 0.97, 0.92, 0.88, 0.99]
post = [0.97, 0.96, 0.95, 0.97, 0.95, 0.85, 0.98, 0.77, 0.99, 0.97]

### BEGIN SOLUTION
ss.ranksums(prior, post)
### END SOLUTION


Out[21]:
RanksumsResult(statistic=-0.06123724356957945, pvalue=0.9511702692969356)
  • Sum of ranks test because there are two unpaired sets of samples
  • Samples are from the same distribution
  • $p = 0.95$ and $\alpha = 0.05$
  • Do not reject: There is no significant difference in yields

Problem 5

You are doing the statistical analysis for the efficacy of a new acne treatment. Each patient applies a control solution on half their face and a drug-containing solution on the other half. After 4 weeks, they report the number of pimples on both sides. Is the drug effective?


In [7]:
control = [2, 0, 3, 4, 0, 2, 6, 3, 11, 4, 0, 4]
drug = [1, 0, 3, 2, 1, 0, 1, 2, 4, 2, 1, 2]

### BEGIN SOLUTION
import scipy.stats as ss
ss.wilcoxon(control, drug)
### END SOLUTION


Out[7]:
WilcoxonResult(statistic=5.0, pvalue=0.020136751550346339)
  • Signed rank test because it's a set of paired data
  • Samples are from the same distribution
  • $p = 0.02$ and $\alpha = 0.05$
  • Reject: There is a significant change after applying the drug

Problem 6

9 out of 10 professors recommend Colgate. After polling 52 professors at the University of Rochester, 11 do not recommend Colgate. Are the UR faculty significantly different from those at most other universities?


In [23]:
# 11 is put into the interval of "extreme" values.
# Including 11 in that interval makes our estimate
# more conservative: P(X >= 11) = 1 - P(X <= 10)
p = 1 - ss.poisson.cdf(10, 0.1 * 52)
print(f'{p:.4f}')


0.0177
  • Poisson test because the population distribution can be taken as a Poisson approximation of a binomial.
  • Samples are from the population distribution
  • $p = 0.018$ and $\alpha = 0.05$
  • Reject: UR professors are significantly different in their choice of toothpaste
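To see how good the Poisson approximation is here, the upper tail $P(X \geq 11)$, which counts 11 itself as extreme, can be computed under both the Poisson model and the exact binomial model. This is a sketch for comparison only; the variable names are assumptions introduced here.

```python
import scipy.stats as ss

n, rate, observed = 52, 0.1, 11

# sf(k) = P(X > k), so sf(observed - 1) = P(X >= observed)
# Poisson approximation with mean n * rate
p_poisson = ss.poisson.sf(observed - 1, n * rate)
# Exact binomial tail for the same event
p_binom = ss.binom.sf(observed - 1, n, rate)

print(f"Poisson: {p_poisson:.4f}, binomial: {p_binom:.4f}")
```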

Identifying the Correct Hypothesis Test

For each of the following datasets and questions, say which test you can use and what the null hypothesis would be.

Problem 7

You have data on the grades of BME majors and CHE majors in fluids. How could you see if there is a significant difference in their grades?

Sum of ranks test - they are from the same distribution

Problem 8

Male labrador retrievers on average have an adult weight of 73 pounds with a standard deviation of 11 pounds. Your dog has a weight of 62 pounds. Could it be a labrador retriever?

zM test - The dog is a labrador retriever

Problem 9

Teen drivers get into car accidents within their first 12 months of driving at a rate of 1 in 30. 200 teen drivers participate in a new program whereby their car's speeds are governed using an electronic device. This results in 12 car accidents within 12 months. Is this program effective?

Poisson test - The rate of accidents follows the population Poisson distribution

Problem 10

You would like to know if there is a correlation between day of the week and number of open parking spots on campus.

Spearman correlation test - There is no correlation
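A Spearman correlation test could be run with `scipy.stats.spearmanr`. The sketch below uses made-up numbers purely to show the call; the data, the day encoding, and the variable names are all assumptions, not values from the problem.

```python
import scipy.stats as ss

# Hypothetical data, invented for illustration only:
# day index (0 = Monday ... 6 = Sunday) and open spots counted that day
day_of_week = [0, 1, 2, 3, 4, 5, 6]
open_spots = [12, 15, 9, 14, 30, 80, 95]

# Null hypothesis: there is no correlation between the two variables
rho, p = ss.spearmanr(day_of_week, open_spots)
print(f"rho = {rho:.2f}, p = {p:.3f}")
```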