Homework 9 Key

CHE 116: Numerical Methods and Statistics

Prof. Andrew White

Version 1.0 (3/23/2016)



In [4]:

    
from math import erf, sqrt
import numpy as np
import scipy.stats

General Instructions

For full credit, you must have the following items for each problem:

[1 point] Describe which method you're using and why it is applicable. For example, 'I chose the signed rank test because these are two matched datasets describing one measurement'
[1 point] Write out the null hypothesis. For example, 'The null hypothesis is that the two sets of measurements came from the same population (synonymous with probability distribution)'
[1 point] Report the p-value and your alpha value
[1 point] State whether you reject the null hypothesis and answer the question

1. $zM$ Tests (8 Points)

You have a sample of an unknown metal with a melting point of $1,070^\circ{}$ C. You know that gold has a melting point of $1,064^\circ{}$ C and your measurements have a standard deviation of $5^\circ{}$ C. Is the unknown metal likely to be gold?
Recall from confidence intervals that the standard deviation in distance from the true mean is $\sigma / \sqrt{N}$ when you know the true standard deviation, $\sigma$. You take three additional samples and get $1,071^\circ{}$ C, $1,067^\circ{}$ C, and $1,075^\circ{}$ C. Does your evidence for gold change? Use the original measurement as well.

1.1 Answer

$zM$ test is chosen because we have one sample compared with a parent group whose mean and standard deviation are known.
The null hypothesis: The sample is gold



In [8]:

    
mu_sample=1070
mu_popul=1064.
st_dev=5
z=(-mu_popul+mu_sample)/st_dev
print('Z:', z)
p=(1 - np.abs((scipy.stats.norm.cdf(z)-scipy.stats.norm.cdf(-z))))
print('P-Value:', p)









    



Z: 1.2
P-Value: 0.230139340443

The $p$-value is 0.23
We do not reject the null hypothesis, so the sample could be gold

1.2 Answer

$zM$ test is chosen because we have a sample compared with a parent group whose mean and standard deviation are known.
The null hypothesis: The sample is gold

The formula for a $Z$-statistic with a sample size greater than 1 is:

$$ Z = \frac{\mu - \bar{x}}{\sigma / \sqrt{N}}$$



In [14]:

    
mu = 1064.
sigma = 5.
data = [1070, 1071, 1067, 1075]
Z = (mu - np.mean(data)) / (sigma / sqrt(len(data)))
print('Z:', Z)
p = 1 - (scipy.stats.norm.cdf(abs(Z)) - scipy.stats.norm.cdf(-abs(Z)))
print('P-Value:', p)









    



Z: -2.7
P-Value: 0.00693394760608

The $p$-value is 0.007
We reject the null hypothesis, so the sample is likely not gold. This differs from the conclusion in 1.1
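Both parts of Problem 1 are the same calculation with a different number of measurements, so they can be wrapped in one function. The following is a minimal sketch; the helper name `z_test_p_value` is hypothetical and not part of the original key. It uses `scipy.stats.norm.sf` (the survival function, `1 - cdf`) and doubles the upper tail, which is equivalent to the interval-based expression used in the cells above.

```python
import numpy as np
import scipy.stats


def z_test_p_value(data, mu, sigma):
    """Two-sided zM test p-value for data against a known mean and standard deviation.

    Hypothetical helper for illustration; not part of the original key.
    """
    data = np.atleast_1d(data)
    # The standard error of the sample mean shrinks as sigma / sqrt(N)
    standard_error = sigma / np.sqrt(len(data))
    z = (np.mean(data) - mu) / standard_error
    # sf(x) = 1 - cdf(x); doubling the upper tail gives the two-sided p-value
    return 2 * scipy.stats.norm.sf(abs(z))


p_single = z_test_p_value([1070], mu=1064, sigma=5)
p_four = z_test_p_value([1070, 1071, 1067, 1075], mu=1064, sigma=5)
print('one measurement:', p_single)
print('four measurements:', p_four)
```

The sample mean moves only slightly (from 1070 to 1070.75), but the standard error halves with four measurements, which is what drives the $p$-value down.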

2. $t$-Tests (4 Points)

The median snowfall in Rochester is 89.3. The last four snowfalls have been 112.7, 78, 59.9 and 127. Are these snowfalls abnormal?
Repeat problem 1.2 without knowing the standard deviation

2.1 Answer

$t$-test is chosen because we have a sample compared with a parent group whose mean is known but not standard deviation.
The null hypothesis: The snowfall is about the same as usual



In [12]:

    
mu = 89.3
data = [112.7, 78, 59.9, 127]
T = (mu - np.mean(data)) / np.sqrt(np.var(data, ddof=1) / len(data))
T = np.abs(T)
print('T:', T)
# A one-sample t statistic has N - 1 degrees of freedom
dof = len(data) - 1
p = 1 - (scipy.stats.t.cdf(T, dof) - scipy.stats.t.cdf(-T, dof))
print('p-value:', round(p, 3))

T: 0.330534137464
p-value: 0.763

The $p$-value is 0.76
We do not reject the null hypothesis, so the snowfall is consistent with the usual

2.2 Answer

$t$-test because we're comparing a single sample with a parent group whose standard deviation is unknown
The null hypothesis: the samples are gold



In [15]:

    
mu = 1064
data = [1070, 1071, 1067, 1075]
T = (mu - np.mean(data)) / np.sqrt(np.var(data, ddof=1) / len(data))
T = np.abs(T)
print('T:', T)
# A one-sample t statistic has N - 1 degrees of freedom
dof = len(data) - 1
p = 1 - (scipy.stats.t.cdf(T, dof) - scipy.stats.t.cdf(-T, dof))
print('p-value:', round(p, 3))

T: 4.08590950567
p-value: 0.026

The $p$-value is 0.026
We still reject the null hypothesis
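As a cross-check on the hand-computed statistic, SciPy provides a one-sample $t$-test directly. This is a sketch rather than the method used in the key; the variable names are illustrative. `scipy.stats.ttest_1samp` returns the $t$ statistic and a two-sided $p$-value computed with $N - 1$ degrees of freedom.

```python
import scipy.stats

gold_melting_point = 1064
melting_points = [1070, 1071, 1067, 1075]
snowfall_mean = 89.3
snowfalls = [112.7, 78, 59.9, 127]

# ttest_1samp returns the t statistic and a two-sided p-value,
# using len(data) - 1 degrees of freedom
t_gold, p_gold = scipy.stats.ttest_1samp(melting_points, gold_melting_point)
t_snow, p_snow = scipy.stats.ttest_1samp(snowfalls, snowfall_mean)
print('gold:', t_gold, p_gold)
print('snowfall:', t_snow, p_snow)
```

The sign of the statistic depends on the order of subtraction (sample mean minus reference mean here), which does not affect a two-sided $p$-value.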

3. Wilcoxon's Sum of Rank Test (4 Points)

You are comparing the GPAs of students who take a new Freshman preparedness course and those who do not. Their GPAs are given below. Does the course help the students?



In [17]:

    
data_1 = [3.05, 3.01, 3.20, 3.16, 3.11, 3.09]
data_2 = [3.18, 3.23, 3.19, 3.28, 3.08, 3.18]

3.1 Answer

The Wilcoxon sum of ranks test is chosen because we are comparing two unpaired sample groups
The null hypothesis: The two sample groups came from the same parent distribution.



In [16]:

    
_,p = scipy.stats.ranksums(data_1, data_2)
print(p)









    



0.0781690858243

The $p$-value is 0.08
We do not reject the null hypothesis, so there is no significant evidence that the course affects GPA
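To see what the sum of ranks test computes, the statistic can be built by hand. The sketch below ranks all twelve GPAs together, sums the ranks belonging to `data_1`, and compares that sum with its expected value under the null hypothesis using a normal approximation. The intermediate names (`rank_sum_1`, `expected`, `std`) are illustrative.

```python
import numpy as np
import scipy.stats

data_1 = [3.05, 3.01, 3.20, 3.16, 3.11, 3.09]
data_2 = [3.18, 3.23, 3.19, 3.28, 3.08, 3.18]

n1, n2 = len(data_1), len(data_2)
# Rank all GPAs together; tied values share their average rank
ranks = scipy.stats.rankdata(np.concatenate([data_1, data_2]))
rank_sum_1 = ranks[:n1].sum()

# Under the null hypothesis the rank sum is approximately normal
expected = n1 * (n1 + n2 + 1) / 2
std = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (rank_sum_1 - expected) / std
p_manual = 2 * scipy.stats.norm.sf(abs(z))

_, p_scipy = scipy.stats.ranksums(data_1, data_2)
print('rank sum of data_1:', rank_sum_1)
print('manual p-value:', p_manual)
print('scipy p-value:', p_scipy)
```

The rank sum for `data_1` is 28 against an expected 39, so the first group tends to rank lower, but not by enough to reach significance.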

4. Wilcoxon's Signed Rank Test (4 Points)

You calculate how long it takes someone to run two miles before and after they've eaten a garbage plate. Does eating a garbage plate influence your ability to run?



In [21]:

    
data_empty_tummy = [17.1, 29.5, 23.8, 37.3, 19.6, 24.2, 30.0, 20.9]
data_garbage_tummy = [14.2, 30.3, 21.5, 36.3, 19.6, 24.5, 26.7, 20.6]

4.1 Answer

Wilcoxon signed rank is chosen because we have two paired sample groups that we're comparing
The null hypothesis: The two sample groups are from the same parent distribution (no change)



In [25]:

    
_,p = scipy.stats.wilcoxon(data_empty_tummy, data_garbage_tummy)
print('p-value:', p)









    



p-value: 0.128190174345






    



/opt/conda/lib/python3.5/site-packages/scipy/stats/morestats.py:2384: UserWarning: Warning: sample size too small for normal approximation.
  warnings.warn("Warning: sample size too small for normal approximation.")

The $p$-value is 0.13
We do not reject the null hypothesis, so there appears to be no difference.
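The signed rank statistic itself is simple to compute by hand, which helps show why the test needs paired data. The sketch below takes the paired differences, ranks their absolute values, and sums the ranks separately for positive and negative differences. Dropping zero differences is one common convention and is assumed here; the rounding step is also an assumption, added so that floating-point noise does not break the tie between 0.3 and -0.3.

```python
import numpy as np
import scipy.stats

data_empty_tummy = [17.1, 29.5, 23.8, 37.3, 19.6, 24.2, 30.0, 20.9]
data_garbage_tummy = [14.2, 30.3, 21.5, 36.3, 19.6, 24.5, 26.7, 20.6]

# Round so floating-point noise does not break ties such as 0.3 vs. -0.3
differences = np.round(np.subtract(data_empty_tummy, data_garbage_tummy), 6)
# Zero differences carry no sign information, so they are dropped here
nonzero = differences[differences != 0]
ranks = scipy.stats.rankdata(np.abs(nonzero))

w_plus = ranks[nonzero > 0].sum()
w_minus = ranks[nonzero < 0].sum()
print('W+:', w_plus, 'W-:', w_minus)
print('test statistic:', min(w_plus, w_minus))
```

With only seven nonzero pairs, the set of possible rank sums is small, which is why SciPy warns that the normal approximation is unreliable at this sample size.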

5. Spearman's Correlation Test (4 Points)

We've performed a chemical reaction at different temperatures and would like to see if there is a relationship with temperature and yield. Is there one?



In [26]:

    
temperature = [15, 18, 21, 24, 27, 30, 33]
chem_yield = [66, 69, 69, 70, 64, 73, 75]

5.1 Answer

Spearman's Correlation coefficient because we've measured two different things for one sample group
Null hypothesis: there is no correlation



In [27]:

    
scipy.stats.spearmanr(temperature, chem_yield)









    Out[27]:





SpearmanrResult(correlation=0.63065622388689124, pvalue=0.12888769568495784)

The $p$-value is 0.13
There is not enough evidence to reject the null hypothesis, so we cannot claim a correlation between temperature and yield
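Spearman's coefficient is the ordinary (Pearson) correlation computed on ranks instead of raw values, which is why it only measures whether the relationship is monotonic. A short sketch that reproduces the coefficient this way (the `_ranks` and `r_` names are illustrative):

```python
import numpy as np
import scipy.stats

temperature = [15, 18, 21, 24, 27, 30, 33]
chem_yield = [66, 69, 69, 70, 64, 73, 75]

# Replace each value by its rank; tied yields share their average rank
temperature_ranks = scipy.stats.rankdata(temperature)
yield_ranks = scipy.stats.rankdata(chem_yield)

# Pearson correlation of the ranks
r_manual = np.corrcoef(temperature_ranks, yield_ranks)[0, 1]
r_scipy, _ = scipy.stats.spearmanr(temperature, chem_yield)
print('manual:', r_manual)
print('scipy: ', r_scipy)
```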

6. Poisson Test (4 Points)

Some speculate that the lottery is an elaborate trap for time-travelers. We set-up a lottery where the odds of winning are one in 10 million. If one million people play and we get 3 winners, should we be suspicious of the number of winners?

6.1 Answer

Poisson's test is chosen, because we're comparing a count to a known parent distribution
Null hypothesis: The count is from the known parent distribution



In [29]:

    
p_winning = 1 / 10**7
expected = p_winning * 10**6

# sf(2) = P(X > 2) = P(X >= 3), so the observed count of 3 is included in the tail
p = 2 * scipy.stats.poisson.sf(2, mu=expected)
print('{:.2g}'.format(p))

0.00031

The $p$-value is 0.0003
The null hypothesis is rejected, so we should arrest the winners for time travel
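For a discrete distribution, the tail probability depends on whether the observed count itself is included. The sketch below contrasts the two for this problem; a $p$-value is the probability of a result at least as extreme as the one observed, so the inclusive tail is the relevant one. The names `observed`, `p_at_least`, and `p_more_than` are illustrative.

```python
import scipy.stats

expected = 10**6 / 10**7  # expected number of winners under the null hypothesis
observed = 3

# sf(k) = P(X > k), so the inclusive tail P(X >= observed) needs sf(observed - 1)
p_at_least = scipy.stats.poisson.sf(observed - 1, mu=expected)
# 1 - cdf(observed) leaves out the observed count itself
p_more_than = 1 - scipy.stats.poisson.cdf(observed, mu=expected)
print('P(X >= 3):', p_at_least)
print('P(X > 3): ', p_more_than)
```

Either tail is far below any usual alpha here, so the conclusion does not depend on the choice, but the two values differ by more than an order of magnitude.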

7. Binomial Test (4 Points)

You're wondering if you have a fair coin or not. You've flipped it 25 times and gotten heads 8 times. Is there evidence that the coin is unfair?

7.1 Answer

A binomial test is appropriate because we're comparing a sample to a known distribution where the number of trials is fixed and the probability of an outcome is constant
The null hypothesis is that the outcome of the experiment came from the known binomial distribution



In [42]:

    
p = 0.5
N = 25
n = 8

print(2 *scipy.stats.binom.cdf(n, N, p))









    



0.107752144337

The set-up for this p-value is to construct an interval over the known binomial distribution that just includes the value. I've done this by integrating from 0 up to the value. Our value is lower, so we're getting the left side of the interval. I multiply by 2 to get the other side.

The $p$-value is 0.11
The null hypothesis is NOT rejected; the coin appears to be close enough to fair
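Multiplying the left tail by 2 works here because a binomial distribution with $p = 0.5$ is symmetric: 8 or fewer heads is exactly as likely as 17 or more. A sketch that checks this by computing both tails explicitly (the names `lower` and `upper` are illustrative):

```python
import scipy.stats

p = 0.5
N = 25
n = 8

# Lower tail: probability of n or fewer heads
lower = scipy.stats.binom.cdf(n, N, p)
# Mirror-image upper tail: probability of N - n or more heads
upper = scipy.stats.binom.sf(N - n - 1, N, p)
print('lower tail:', lower)
print('upper tail:', upper)
print('two-sided p-value:', lower + upper)
```

For a coin with $p \neq 0.5$ the two tails would no longer match, and doubling one side would only be an approximation.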

8. Choosing the test (5 Points)

State which test is most appropriate for the following:

You drive two different routes home from work. You drive each route 10 times. Are they significantly different?
You normally get 10 likes when you post a selfie. Today you got 25 likes. Are you looking significantly good today?
You have the number of computer viruses a set of 25 users has on their computers after 4 weeks. They each take a training course about risky Internet behavior and you have the number of viruses on their computers in the following 4 weeks. Does the course help them?
On average, students get a 94% on a homework. These are the grades on this homework: 78, 85, 67, 53, 57, 84, 26. Are these significantly different than the previous average?
A drug trial showed that patients using a new drug develop liver problems at a rate of 1 per 25. In a group of 10 patients using the drug, two are showing liver problems. Is this significantly different?

8.1 Answer

Wilcoxon's sum of ranks
Poisson's Test
Wilcoxon's signed rank test
Student's $t$-test
Binomial test


