Homework 10

CHE 116: Numerical Methods and Statistics

5/5/2018

Homework Requirements:

Write all equations in $\LaTeX$
Simplify all expressions
Put comments in your Python code
Explain or show your work
Follow the academic honesty guidelines in the syllabus

1. Conceptual Questions

In the following picture, which color area corresponds to the p-value?
If a significance level goes up, is it easier or harder to reject a null hypothesis?
If you only have one data point, which hypothesis test or tests can be used?
Is it meaningful to perform a Wilcoxon Signed Rank Test if the two paired data are in different unit systems?
We haven't learned about a "binomial hypothesis test", but what would the null hypothesis of such a test be? Provide a situation where you would use it.

1.1

Blue

1.2

Easier: a larger significance level means a larger p-value still falls below it.

1.3

zM test

1.4

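No. The signed rank test ranks the differences between paired values, and those differences are meaningless when the two values are in different units.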

1.5

The null hypothesis is that a count of successes out of $N$ trials comes from the population binomial distribution. Example: you count how many questions out of 6 you answer correctly on a homework when you normally have a probability of 0.3 of answering a question correctly.
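
A minimal sketch of such a test for the homework example, assuming (hypothetically, this number is not part of the question) that you answered 5 of the 6 questions correctly and that only unusually high scores count as extreme, so the test is one-sided. The p-value is the probability of 5 or more correct out of 6 when each question is correct with probability 0.3.

```python
import scipy.stats as ss

n = 6            # questions on the homework
p_correct = 0.3  # usual probability of answering a question correctly
k = 5            # hypothetical observed number correct (assumption, not from the problem)

# P(X >= k) = 1 - P(X <= k - 1); sf(k - 1) computes this directly
p_value = ss.binom.sf(k - 1, n, p_correct)

# the same upper-tail sum written out term by term with the binomial PMF
p_value_sum = sum(ss.binom.pmf(i, n, p_correct) for i in range(k, n + 1))

print(p_value, p_value_sum)
```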

2. Hypothesis Tests

For the following questions, state the following in Markdown and show your numerical work in Python:

The null hypothesis
The choice of test
The $p$-value and whether you are considering both tails (extreme values above and below) or only one side
Whether the null hypothesis is rejected

Each hypothesis test occurs once in the following, so make sure you do not repeat any of them!

On average, 3 people fall asleep in class. Today 11 fall asleep in class. Is this significant?
Your average running pace over the last few years has been an 8:00 minute mile. You've tried changing running shoes and recorded the following paces on your most recent runs: 7:56, 7:45, 7:34, 8:05, 7:35. Is your running pace significantly different?
You are comparing two batches of a compound prepared by different technicians. The following purities have been recorded for technician A: 0.87, 0.86, 0.88, 0.93, 0.85, 0.67 and the following by technician B: 0.86, 0.96, 0.90, 0.76, 0.87, 0.83, 0.84, 0.80. Are they achieving similar purity?
You are assessing the efficacy of a drug that helps people lose weight. 13 people who enrolled had the following weights at admission and after 8 weeks of the drug:

Person	Weight at Start	Weight at 8 Weeks
1	150	163
2	212	194
3	320	280
4	250	265
5	215	132
6	186	172
7	195	185
8	203	187
9	145	135
10	168	140
11	172	178
12	240	211
13	272	268

Is there a significant effect from the drug?

A chemical refinery has input crude with an average sulfur concentration of 0.7% and a variance of 0.015 (in units of %²). A sample from the crude reveals a concentration of 1.2%. Is this significant enough that you should investigate?

You are assessing whether a correlation exists between literacy rate and birthrate. You've found the following data for several countries:

Country	Literacy Rate	Birthrate per 1000
Afghanistan	38.2%	37.90
Belize	82.7%	24.00
Laos	79.9%	23.60
Lebanon	93.9%	14.30
India	72.1%	19.00
Russia	99.7%	11.00
Argentina	98.1%	16.70
South Africa	94.3%	20.20
Venezuela	95.4%	18.80
Cameroon	75%	35.40
Chad	40.2%	35.60

Is there a relationship between these two?

2.1

The count of 11 comes from the population Poisson distribution with a mean of 3
Poisson test
0.0003, one-sided (only counts of 11 or more are considered extreme)
reject

#2.1
import numpy as np
import scipy.stats as ss
print(1 - ss.poisson.cdf(11 - 1, 3))

0.0002923369506473428
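
The `11 - 1` in the code above is there because `cdf(10)` is $P(X \le 10)$, so its complement is $P(X \ge 11)$. A short sketch checking that equivalence with `scipy.stats.poisson.sf` and with an explicit sum over the PMF (the upper cutoff of 100 is an assumption; the terms beyond it are negligible for a mean of 3):

```python
import numpy as np
import scipy.stats as ss

mu = 3      # average number of people who fall asleep
count = 11  # number observed today

# survival function: sf(k) = P(X > k), so sf(count - 1) = P(X >= count)
p_sf = ss.poisson.sf(count - 1, mu)

# explicit one-sided tail sum, truncated at 100 (remaining terms are negligible)
p_sum = np.sum(ss.poisson.pmf(np.arange(count, 101), mu))

print(p_sf, p_sum)
```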

2.2

The new paces come from the population normal distribution with a mean of 8:00 minutes per mile
t-test
0.0965, two-sided (both faster and slower paces count as extreme)
do not reject

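#2.2
# must convert to seconds!
times = [7 * 60 + 56, 7 * 60 + 45, 7 * 60 + 34, 8 * 60 + 5, 7 * 60 + 35]
# t statistic comparing the sample mean to the 8:00 (480 s) population mean
T = (8 * 60 - np.mean(times)) / (np.std(times, ddof=1) / np.sqrt(len(times)))
# we look at both tails, so double the one-tail area (abs makes this sign-independent)
p = 2 * ss.t.cdf(-abs(T), len(times) - 1)
# print the statistic, the p-value, and the new mean pace in minutes
print(T, p, np.mean(times) / 60)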

2.16366366222047 0.09649223504829538 7.783333333333333
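
As a cross-check, `scipy.stats.ttest_1samp` runs the same two-sided one-sample t-test in one call. It computes the statistic as sample mean minus population mean, so its sign is flipped relative to `T` above while the p-value is the same:

```python
import numpy as np
import scipy.stats as ss

# paces converted to seconds; the population mean of 8:00 is 480 s
times = [7 * 60 + 56, 7 * 60 + 45, 7 * 60 + 34, 8 * 60 + 5, 7 * 60 + 35]
result = ss.ttest_1samp(times, 8 * 60)

# statistic = (mean - 480) / (s / sqrt(n)); the p-value is two-sided by default
print(result.statistic, result.pvalue)
```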

2.3

The purities from technicians A and B come from the same distribution
Wilcoxon sum of ranks
0.70, two-sided
do not reject

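#2.3
# purities recorded by each technician
A = [0.87, 0.86, 0.88, 0.93, 0.85, 0.67]
B = [0.86, 0.96, 0.90, 0.76, 0.87, 0.83, 0.84, 0.80]
# unpaired samples of different sizes, so use the Wilcoxon rank-sum test
print(ss.ranksums(A, B).pvalue)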

0.6985353583033387
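
To show what `ss.ranksums` computes, here is a sketch of the rank-sum statistic done by hand: rank the pooled purities (tied values get the average rank), sum the ranks of technician A, and compare that sum to its expected value under the null hypothesis with a normal approximation. Like `ranksums`, it applies no tie correction to the variance, so it should reproduce the p-value above.

```python
import numpy as np
import scipy.stats as ss

A = [0.87, 0.86, 0.88, 0.93, 0.85, 0.67]
B = [0.86, 0.96, 0.90, 0.76, 0.87, 0.83, 0.84, 0.80]
n1, n2 = len(A), len(B)

# rank the pooled data; tied values share the average rank
ranks = ss.rankdata(A + B)
rank_sum_A = np.sum(ranks[:n1])

# mean and standard deviation of the rank sum under the null hypothesis
expected = n1 * (n1 + n2 + 1) / 2
sigma = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (rank_sum_A - expected) / sigma

# two-sided p-value from the standard normal
p = 2 * ss.norm.sf(abs(z))
print(rank_sum_A, z, p)
```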

2.4

The weights at the start and after 8 weeks come from the same distribution (the drug has no effect)
Wilcoxon Signed Rank Test
0.028, two-sided
Reject

#2.4
# nested Python list converted to a NumPy array
# columns: person, weight at start, weight at 8 weeks
data = np.array([
[ 1, 150, 163],
[ 2, 212, 194],
[ 3, 320, 280],
[ 4, 250, 265],
[ 5, 215, 132],
[ 6, 186, 172],
[ 7, 195, 185],
[ 8, 203, 187],
[ 9, 145, 135],
[10, 168, 140],
[11, 172, 178],
[12, 240, 211],
[13, 272, 268]
])
ss.wilcoxon(data[:,1], data[:,2])

WilcoxonResult(statistic=14.0, pvalue=0.027660332975047608)
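
A sketch of where `statistic=14.0` comes from: take the paired differences, rank their absolute values (ties averaged), and sum the ranks of the negative and positive differences separately; the statistic is the smaller sum. The p-value here uses a normal approximation with a tie correction for the two tied differences of 10. That this matches SciPy's method is an assumption, but it agrees with the reported p-value to about four decimal places.

```python
import numpy as np
import scipy.stats as ss

start = np.array([150, 212, 320, 250, 215, 186, 195, 203, 145, 168, 172, 240, 272])
week8 = np.array([163, 194, 280, 265, 132, 172, 185, 187, 135, 140, 178, 211, 268])

d = start - week8               # positive means weight was lost
d = d[d != 0]                   # drop zero differences (there are none here)
ranks = ss.rankdata(np.abs(d))  # ties share the average rank

W_minus = ranks[d < 0].sum()    # rank sum of weight gains
W_plus = ranks[d > 0].sum()     # rank sum of weight losses
W = min(W_minus, W_plus)

# normal approximation to W under the null hypothesis
n = len(d)
mean = n * (n + 1) / 4
var = n * (n + 1) * (2 * n + 1) / 24
# tie correction: subtract (t^3 - t) / 48 for each group of t tied |d| values
_, t = np.unique(np.abs(d), return_counts=True)
var -= np.sum(t**3 - t) / 48
z = (W - mean) / np.sqrt(var)
p = 2 * ss.norm.sf(abs(z))
print(W_minus, W_plus, p)
```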

2.5

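The sample concentration of 1.2% comes from the population normal distribution (mean 0.7%, variance 0.015)
zM test
$4.5 \times 10^{-5}$ (effectively 0), two-sided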
reject

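# 2.5
# quick syntax: pass loc and scale instead of computing the z-score by hand
# scale is the standard deviation, the square root of the variance
# the CDF integrates from -infinity up to 1.2
# 1 - CDF gives the upper-tail area above 1.2
# multiply by 2 to include the equally extreme lower tail
print(2 * (1 - ss.norm.cdf(1.2, loc=0.7, scale=np.sqrt(0.015))))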

4.455709060402491e-05
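
The same p-value with the $z$-score made explicit, as a sketch: $z = (1.2 - 0.7)/\sqrt{0.015} \approx 4.08$, and the two-sided p-value is twice the upper-tail area of the standard normal beyond $|z|$.

```python
import numpy as np
import scipy.stats as ss

mu = 0.7                # population mean sulfur concentration (%)
sigma = np.sqrt(0.015)  # population standard deviation from the variance
x = 1.2                 # sampled concentration (%)

z = (x - mu) / sigma
# sf(z) = 1 - cdf(z) is the upper tail; double it to count both tails
p = 2 * ss.norm.sf(abs(z))
print(z, p)
```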

2.6

There is no correlation between literacy rate and birthrate
Spearman Correlation Test
0.0013, two-sided
reject

#2.6
# columns: literacy rate (%), birthrate per 1000
data = np.array([
    [38.2, 37.90],
    [82.7, 24.00],
    [79.9, 23.60],
    [93.9, 14.30],
    [72.1, 19.00],
    [99.7, 11.00],
    [98.1, 16.70],
    [94.3, 20.20],
    [95.4, 18.80],
    [75.0, 35.40],
    [40.2, 35.60]
])

ss.spearmanr(data[:,0], data[:,1])

SpearmanrResult(correlation=-0.8363636363636365, pvalue=0.0013331850799508562)
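
Spearman's $\rho$ is the Pearson correlation of the ranks. With no ties in either column, it reduces to $\rho = 1 - \frac{6\sum_i d_i^2}{n(n^2-1)}$, where $d_i$ is the difference between a country's two ranks. A sketch checking both forms against the value above:

```python
import numpy as np
import scipy.stats as ss

literacy = np.array([38.2, 82.7, 79.9, 93.9, 72.1, 99.7, 98.1, 94.3, 95.4, 75.0, 40.2])
birthrate = np.array([37.90, 24.00, 23.60, 14.30, 19.00, 11.00,
                      16.70, 20.20, 18.80, 35.40, 35.60])

r_lit = ss.rankdata(literacy)
r_birth = ss.rankdata(birthrate)

# Pearson correlation of the ranks
rho_ranks = np.corrcoef(r_lit, r_birth)[0, 1]

# shortcut formula, valid only when there are no ties
n = len(literacy)
d = r_lit - r_birth
rho_formula = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))

print(rho_ranks, rho_formula)
```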