Homework 10

CHE 116: Numerical Methods and Statistics

5/5/2018


Homework Requirements:

  1. Write all equations in $\LaTeX$
  2. Simplify all expressions
  3. Put comments in your Python code
  4. Explain or show your work
  5. Follow the academic honesty guidelines in the syllabus

1. Conceptual Questions

  1. In the following picture, which color area corresponds to the p-value?
  2. If the significance level goes up, is it easier or harder to reject a null hypothesis?
  3. If you only have one data point, which hypothesis test or tests can be used?
  4. Is it meaningful to perform a Wilcoxon Signed Rank Test if the two sets of paired data are in different unit systems?
  5. We haven't learned about a "binomial hypothesis test", but what would the null hypothesis of such a test be? Provide a situation where you would use it.

1.1

Blue

1.2

Easier. With a higher significance level, larger $p$-values still fall below it.
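
One way to check this is to compare a single $p$-value against several significance levels. The sketch below reuses the $p$-value of 0.0965 from problem 2.2; the list of significance levels is an illustrative assumption and not part of the assignment.

```python
# p-value taken from problem 2.2 of this homework
p_value = 0.0965

# illustrative significance levels (an assumption, not from the assignment)
alphas = [0.01, 0.05, 0.10]

# the null hypothesis is rejected when the p-value falls below alpha
decisions = {alpha: p_value < alpha for alpha in alphas}
for alpha, reject in decisions.items():
    print(f"alpha = {alpha:.2f}: {'reject' if reject else 'do not reject'}")
```

Only the largest of the three levels leads to rejection, so raising the significance level makes rejection easier.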

1.3

zM test

1.4

yes

1.5

The null hypothesis is that a count out of $N$ trials came from the population binomial distribution. For example, you count how many questions out of 6 you get correct on a homework when you normally have a probability of 0.3 of getting a question correct.
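
A minimal sketch of how such a test could be carried out, following the same "sum the tail of the population distribution" pattern as the Poisson test. The observed count of 5 correct answers is a hypothetical value chosen for illustration; $N = 6$ and $p = 0.3$ come from the example above.

```python
from math import comb

N = 6    # questions on the homework (from the example above)
p = 0.3  # usual probability of answering a question correctly
k = 5    # hypothetical number correct; this value is an assumption

# one-sided p-value: probability of k or more correct under the null hypothesis
p_value = sum(comb(N, i) * p**i * (1 - p)**(N - i) for i in range(k, N + 1))
print(p_value)
```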

2. Hypothesis Tests

For the following questions, state the following in Markdown and show your numerical work in Python:

  • The null hypothesis
  • The choice of test
  • The $p$-value and whether you are considering both tails (extreme values above and below) or only one side
  • Whether the null hypothesis is rejected

Each hypothesis test occurs once in the following, so make sure you do not repeat any of them!

  1. On average, 3 people fall asleep in class. Today 11 fall asleep in class. Is this significant?
  2. Your average running pace over the last few years has been an 8:00 minute mile. You've tried changing running shoes and recorded the following paces on your most recent runs: 7:56, 7:45, 7:34, 8:05, 7:35. Is your running pace significantly different?
  3. You are comparing two batches of a compound prepared by different technicians. The following purities have been recorded for technician A: 0.87, 0.86, 0.88, 0.93, 0.85, 0.67 and the following by technician B: 0.86, 0.96, 0.90, 0.76, 0.87, 0.83, 0.84, 0.80. Are they achieving similar purity?
  4. You are assessing the efficacy of a drug that helps people lose weight. 13 people who enrolled had the following weights at admission and after 8 weeks of the drug:
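| Person | Weight at Start | Weight at 8 Weeks |
|--------|-----------------|-------------------|
| 1      | 150             | 163               |
| 2      | 212             | 194               |
| 3      | 320             | 280               |
| 4      | 250             | 265               |
| 5      | 215             | 132               |
| 6      | 186             | 172               |
| 7      | 195             | 185               |
| 8      | 203             | 187               |
| 9      | 145             | 135               |
| 10     | 168             | 140               |
| 11     | 172             | 178               |
| 12     | 240             | 211               |
| 13     | 272             | 268               |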

Is there a significant effect from the drug?

5. A chemical refinery has input crude with a concentration of sulfur of 0.7% on average with a variance of 0.015%. A sample from the crude reveals a concentration of 1.2%. Is this significant enough that you should investigate?

6. You are assessing if a correlation exists between literacy rate and birthrate. You've found the following data from countries:

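| Country      | Literacy Rate | Birthrate per 1000 |
|--------------|---------------|--------------------|
| Afghanistan  | 38.2%         | 37.90              |
| Belize       | 82.7%         | 24.00              |
| Laos         | 79.9%         | 23.60              |
| Lebanon      | 93.9%         | 14.30              |
| India        | 72.1%         | 19.00              |
| Russia       | 99.7%         | 11.00              |
| Argentina    | 98.1%         | 16.70              |
| South Africa | 94.3%         | 20.20              |
| Venezuela    | 95.4%         | 18.80              |
| Cameroon     | 75%           | 35.40              |
| Chad         | 40.2%         | 35.60              |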

Is there a relationship between these two?

2.1

  • The count of 11 is a sample from the population Poisson distribution with a mean of 3
  • Poisson test
  • 0.0003, considering only one side (11 or more)
  • Reject

In [4]:
#2.1
import numpy as np
import scipy.stats as ss
# probability of 11 or more: 1 - P(X <= 10) for a Poisson with mean 3
print(1 - ss.poisson.cdf(11 - 1, 3))


0.0002923369506473428
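
As a cross-check that does not rely on `scipy`, the same tail probability can be computed by summing the Poisson probability mass function directly. This is only a sketch of what the CDF call does; the variable names are illustrative.

```python
from math import exp, factorial

mu = 3   # average number of people asleep in class
x = 11   # number asleep today

# P(X >= x) = 1 - P(X <= x - 1), summing the Poisson PMF term by term
cdf_below = sum(exp(-mu) * mu**k / factorial(k) for k in range(x))
p_value = 1 - cdf_below
print(p_value)
```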

2.2

  • These times come from the population normal distribution with a mean of 8:00
  • t-test
  • 0.0965, considering both sides
  • Do not reject

In [17]:
#2.2
# must convert to seconds!
times = [7 * 60 + 56, 7 * 60 + 45, 7 * 60 + 34, 8 * 60 + 5, 7 * 60 + 35]
# T statistic: distance of the sample mean from 8:00 in standard errors
T = (8 * 60 - np.mean(times)) / (np.std(times, ddof=1) / np.sqrt(len(times)))
# we look at both sides; abs() keeps this valid if T is negative
p = 2 * ss.t.cdf(-abs(T), len(times) - 1)
# print stat, p-value and new mean pace in minutes
print(T, p, np.mean(times) / 60)


2.16366366222047 0.09649223504829538 7.783333333333333
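
The $p$-value is the area under the $t$-distribution outside $[-|T|, |T|]$. The sketch below reproduces it with the standard library only, by integrating the $t$-distribution density with Simpson's rule. The helper `t_pdf` and the number of intervals are illustrative choices, not part of the original solution.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev

times = [7 * 60 + 56, 7 * 60 + 45, 7 * 60 + 34, 8 * 60 + 5, 7 * 60 + 35]
n = len(times)
dof = n - 1

# T statistic; stdev() uses the n - 1 denominator, like ddof=1 in numpy
T = (8 * 60 - mean(times)) / (stdev(times) / sqrt(n))

def t_pdf(x, nu):
    """Density of Student's t-distribution with nu degrees of freedom."""
    norm = gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2))
    return norm * (1 + x * x / nu) ** (-(nu + 1) / 2)

# Simpson's rule on [-|T|, |T|]; the interval count is an arbitrary even number
intervals = 2000
a, b = -abs(T), abs(T)
h = (b - a) / intervals
total = t_pdf(a, dof) + t_pdf(b, dof)
for i in range(1, intervals):
    total += (4 if i % 2 else 2) * t_pdf(a + i * h, dof)
inside = total * h / 3

# the two-sided p-value is the probability outside [-|T|, |T|]
p_value = 1 - inside
print(T, p_value)
```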

2.3

  • The two sets of purities are from the same distribution
  • Wilcoxon Sum of Ranks Test
  • 0.70, considering both sides
  • Do not reject

In [22]:
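#2.3
# purities recorded by each technician
A = [0.87, 0.86, 0.88, 0.93, 0.85, 0.67]
B = [0.86, 0.96, 0.90, 0.76, 0.87, 0.83, 0.84, 0.80]

# unpaired samples of different size, so use the sum of ranks test
print(ss.ranksums(A, B).pvalue)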


0.6985353583033387
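
To show what the sum of ranks test does with these data, here is a sketch using only the standard library. It pools the two samples, ranks them (ties share the average rank), sums the ranks for technician A, and uses a normal approximation for the two-sided $p$-value. The helper `average_ranks` is an illustrative name, and the choice of a normal approximation without a tie correction is an assumption that appears to agree with the output above.

```python
from math import erfc, sqrt

A = [0.87, 0.86, 0.88, 0.93, 0.85, 0.67]
B = [0.86, 0.96, 0.90, 0.76, 0.87, 0.83, 0.84, 0.80]

def average_ranks(values):
    """Rank from 1; tied values share the average of the ranks they span."""
    return [sum(x < v for x in values) + (sum(x == v for x in values) + 1) / 2
            for v in values]

ranks = average_ranks(A + B)
n1, n2 = len(A), len(B)
rank_sum_A = sum(ranks[:n1])

# mean and standard deviation of the rank sum under the null hypothesis
expected = n1 * (n1 + n2 + 1) / 2
std = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)

# two-sided p-value from the normal approximation
z = (rank_sum_A - expected) / std
p_value = erfc(abs(z) / sqrt(2))
print(rank_sum_A, z, p_value)
```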

2.4

  • The two sets of weights are from the same distribution
  • Wilcoxon Signed Rank Test
  • 0.028, considering both sides
  • Reject

In [31]:
#2.4
# use python list to array syntax
data = np.array([
[ 1, 150, 163],
[ 2, 212, 194],
[ 3, 320, 280],
[ 4, 250, 265],
[ 5, 215, 132],
[ 6, 186, 172],
[ 7, 195, 185],
[ 8, 203, 187],
[ 9, 145, 135],
[10, 168, 140],
[11, 172, 178],
[12, 240, 211],
[13, 272, 268]
])
ss.wilcoxon(data[:,1], data[:,2])


Out[31]:
WilcoxonResult(statistic=14.0, pvalue=0.027660332975047608)
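
The statistic of 14 can be traced by hand: take the paired differences, rank their absolute values, and sum the ranks separately for positive and negative differences. The sketch below does this with the standard library. The normal approximation with a tie correction and no continuity correction is an assumption chosen because it reproduces the $p$-value printed above; other versions or settings of `ss.wilcoxon` may compute the $p$-value differently.

```python
from math import erfc, sqrt

start = [150, 212, 320, 250, 215, 186, 195, 203, 145, 168, 172, 240, 272]
week8 = [163, 194, 280, 265, 132, 172, 185, 187, 135, 140, 178, 211, 268]

# paired differences; none are zero here, so nothing has to be dropped
diffs = [s - w for s, w in zip(start, week8)]
abs_diffs = [abs(d) for d in diffs]

# rank |d| from 1, with ties sharing the average rank
ranks = [sum(x < v for x in abs_diffs) + (sum(x == v for x in abs_diffs) + 1) / 2
         for v in abs_diffs]

# rank sums for weight losses (d > 0) and weight gains (d < 0)
w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
statistic = min(w_plus, w_minus)

# normal approximation with a tie correction and no continuity correction
n = len(diffs)
mean_w = n * (n + 1) / 4
tie_sizes = [abs_diffs.count(v) for v in set(abs_diffs)]
var_w = n * (n + 1) * (2 * n + 1) / 24 - sum(t**3 - t for t in tie_sizes) / 48
z = (statistic - mean_w) / sqrt(var_w)
p_value = erfc(abs(z) / sqrt(2))
print(statistic, p_value)
```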

2.5

  • The sample is from the population normal distribution with a mean of 0.7 and a variance of 0.015
  • zM test
  • $4.5 \times 10^{-5}$, considering both sides
  • Reject

In [30]:
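#2.5
# quick syntax without making a z score
# the CDF integrates from -\infty up to the observed value
# 1 - CDF gives the upper tail beyond the observed value
# 2 * adds the matching lower tail
# note: 0.015 is used directly as the variance, so scale is its square root
print(2 * (1 - ss.norm.cdf(1.2, loc=0.7, scale=np.sqrt(0.015))))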


4.455709060402491e-05
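
For comparison, the same result can be reached by forming the $z$ score explicitly, $z = (x - \mu) / \sigma$, and taking both tails of the standard normal distribution. This sketch uses `math.erfc` in place of `scipy`; like the cell above, it treats 0.015 as the variance in the same units as the concentrations, which is an interpretation of the problem statement.

```python
from math import erfc, sqrt

mu = 0.7          # mean sulfur concentration (%)
variance = 0.015  # variance, used as-is like in the cell above
x = 1.2           # observed concentration (%)

# z score: distance from the mean in standard deviations
z = (x - mu) / sqrt(variance)

# two-sided p-value: erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|))
p_value = erfc(abs(z) / sqrt(2))
print(z, p_value)
```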

2.6

  • There is no correlation between literacy rate and birthrate
  • Spearman Correlation Test
  • 0.001, considering both sides
  • Reject

In [32]:
#2.6
# columns: literacy rate (%), birthrate per 1000
data = np.array([
    [38.2, 37.90],
    [82.7, 24.00],
    [79.9, 23.60],
    [93.9, 14.30],
    [72.1, 19.00],
    [99.7, 11.00],
    [98.1, 16.70],
    [94.3, 20.20],
    [95.4, 18.80],
    [75, 35.40],
    [40.2, 35.60]
])

ss.spearmanr(data[:,0], data[:,1])


Out[32]:
SpearmanrResult(correlation=-0.8363636363636365, pvalue=0.0013331850799508562)
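
The correlation coefficient itself can be reproduced from the ranks. Because no value repeats in either column, the shortcut formula $r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$ applies, where $d_i$ is the difference between the two ranks of country $i$. The sketch below uses only the standard library and does not compute the $p$-value; the helper `ranks` is an illustrative name.

```python
literacy = [38.2, 82.7, 79.9, 93.9, 72.1, 99.7, 98.1, 94.3, 95.4, 75, 40.2]
birthrate = [37.90, 24.00, 23.60, 14.30, 19.00, 11.00, 16.70, 20.20, 18.80,
             35.40, 35.60]

def ranks(values):
    """Rank from 1 (smallest); valid here because no value repeats."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

n = len(literacy)

# sum of squared rank differences across countries
d_squared = sum((a - b) ** 2 for a, b in zip(ranks(literacy), ranks(birthrate)))

# Spearman correlation via the no-ties shortcut formula
rho = 1 - 6 * d_squared / (n * (n**2 - 1))
print(d_squared, rho)
```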