For the following questions, state the following in Markdown and show your numerical work in Python:
Each hypothesis test occurs once in the following, so make sure you do not repeat any of them!
Person | Weight at Start | Weight at 8 Weeks |
---|---|---|
1 | 150 | 163 |
2 | 212 | 194 |
3 | 320 | 280 |
4 | 250 | 265 |
5 | 215 | 132 |
6 | 186 | 172 |
7 | 195 | 185 |
8 | 203 | 187 |
9 | 145 | 135 |
10 | 168 | 140 |
11 | 172 | 178 |
12 | 240 | 211 |
13 | 272 | 268 |
Is there a significant effect from the drug?
5. A chemical refinery has input crude with an average sulfur concentration of 0.7% and a variance of 0.015%. A sample from the crude reveals a concentration of 1.2%. Is this significant enough that you should investigate?
6. You are assessing if a correlation exists between literacy rate and birthrate. You've found the following data from countries:
Country | Literacy Rate | Birthrate per 1000 |
---|---|---|
Afghanistan | 38.2% | 37.90 |
Belize | 82.7% | 24.00 |
Laos | 79.9% | 23.60 |
Lebanon | 93.9% | 14.30 |
India | 72.1% | 19.00 |
Russia | 99.7% | 11.00 |
Argentina | 98.1% | 16.70 |
South Africa | 94.3% | 20.20 |
Venezuela | 95.4% | 18.80 |
Cameroon | 75% | 35.40 |
Chad | 40.2% | 35.60 |
Is there a relationship between these two?
In [4]:
#2.1
import numpy as np
import scipy.stats as ss
print(1 - ss.poisson.cdf(11 - 1, 3))
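As a cross-check on the tail probability in 2.1, the same number can be obtained by summing the Poisson probability mass function directly with the standard library. This is only a minimal sketch: `poisson_pmf` is a helper name introduced here for illustration, and the values 11 and 3 are taken from the call above.

```python
import math

def poisson_pmf(k, mu):
    """Probability of exactly k events for a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)

mu = 3          # mean used in the cell above
threshold = 11  # we want P(X >= 11)

# P(X >= 11) = 1 - P(X <= 10), the same quantity as 1 - cdf(11 - 1, mu)
p_tail = 1 - sum(poisson_pmf(k, mu) for k in range(threshold))
print(p_tail)
```

The `11 - 1` in the original call is what makes the tail include 11 itself, since the CDF covers everything up to and including its argument.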
In [17]:
#2.2
#must convert to seconds!
times = [ 7 * 60 + 56, 7 * 60 + 45, 7 * 60 + 34, 8 * 60 + 5, 7 * 60 + 35]
T = (8 * 60 - np.mean(times)) / (np.std(times, ddof=1) / np.sqrt(len(times)))
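# two-sided test: use -abs(T) so both tails are counted regardless of the sign of T
p = 2 * ss.t.cdf(-abs(T), len(times) - 1)
# print the test statistic, the p-value, and the sample mean in minutes
print(T, p, np.mean(times) / 60)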
In [22]:
#2.3
A = [0.87, 0.86, 0.88, 0.93, 0.85, 0.67]
B = [0.86, 0.96, 0.90, 0.76, 0.87, 0.83, 0.84, 0.80]
print(ss.ranksums(A, B).pvalue)
In [31]:
#2.4
# use python list to array syntax
data = np.array([
[ 1, 150, 163],
[ 2, 212, 194],
[ 3, 320, 280],
[ 4, 250, 265],
[ 5, 215, 132],
[ 6, 186, 172],
[ 7, 195, 185],
[ 8, 203, 187],
[ 9, 145, 135],
[10, 168, 140],
[11, 172, 178],
[12, 240, 211],
[13, 272, 268]
])
ss.wilcoxon(data[:,1], data[:,2])
Out[31]:
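To see what the signed-rank test in 2.4 is working with, the rank sums can be computed by hand. The sketch below uses only the standard library; `average_ranks` is a helper name introduced here, and the code assumes no difference is exactly zero (true for this table), so zero handling is not shown.

```python
# (start, end) weights copied from the table in the question
weights = [
    (150, 163), (212, 194), (320, 280), (250, 265), (215, 132),
    (186, 172), (195, 185), (203, 187), (145, 135), (168, 140),
    (172, 178), (240, 211), (272, 268),
]

# positive difference = weight lost over the 8 weeks
diffs = [start - end for start, end in weights]

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    ordered = sorted(values)
    return [
        # first 1-based position plus half the number of extra tied positions
        ordered.index(v) + 1 + (ordered.count(v) - 1) / 2
        for v in values
    ]

# rank the absolute differences, then split the ranks by sign
ranks = average_ranks([abs(d) for d in diffs])
w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

print(diffs)
print(w_plus, w_minus)
```

As far as I know, the default two-sided `ss.wilcoxon` reports the smaller of these two sums as its statistic, so the smaller value printed here should match the statistic in the output of the cell above.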
In [30]:
# 2.5
# quick syntax without computing a z-score explicitly
# cdf integrates from -infinity up to the observed value
# 1 - cdf gives the upper tail beyond the observed value
# 2 * accounts for the matching lower tail (two-sided test)
print(2 * (1 - ss.norm.cdf(1.2, loc=0.7, scale=np.sqrt(0.015))))
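For comparison, here is the same calculation with the z-score written out explicitly, using only the standard library. It is a sketch under the same assumption as the cell above, namely that 0.015 is a variance in the same units as the concentrations; the variable names are introduced here for illustration.

```python
import math

mean = 0.7        # average sulfur concentration, in percent
variance = 0.015  # treated as a variance in the same units, as in the cell above
sample = 1.2      # observed concentration, in percent

# standardize the observation
z = (sample - mean) / math.sqrt(variance)

# two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
p = math.erfc(abs(z) / math.sqrt(2))
print(z, p)
```

Writing the p-value with `math.erfc` avoids subtracting a CDF value very close to 1, which can matter numerically when the z-score is large.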
In [32]:
#2.6
data = np.array([
[38.2,37.90],
[82.7,24.00],
[79.9,23.60],
[93.9,14.30],
[72.1,19.00],
[99.7,11.00],
[98.1,16.70],
[94.3,20.20],
[95.4,18.80],
[75,35.40],
[40.2,35.60]
])
ss.spearmanr(data[:,0], data[:,1])
Out[32]:
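The Spearman coefficient in 2.6 can also be reproduced by ranking each column and applying the shortcut formula based on squared rank differences. This sketch uses only the standard library and assumes there are no tied values in either column, which holds for this table; `ranks` is a helper name introduced here.

```python
# (literacy rate in percent, births per 1000) copied from the table
pairs = [
    (38.2, 37.90), (82.7, 24.00), (79.9, 23.60), (93.9, 14.30),
    (72.1, 19.00), (99.7, 11.00), (98.1, 16.70), (94.3, 20.20),
    (95.4, 18.80), (75.0, 35.40), (40.2, 35.60),
]

def ranks(values):
    """Rank values from 1..n; assumes there are no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

literacy_ranks = ranks([x for x, _ in pairs])
birth_ranks = ranks([y for _, y in pairs])

n = len(pairs)
# sum of squared rank differences
d2 = sum((a - b) ** 2 for a, b in zip(literacy_ranks, birth_ranks))

# Spearman's rho via the no-ties shortcut formula
rho = 1 - 6 * d2 / (n * (n**2 - 1))
print(rho)
```

A value of rho near -1 means that countries ranking high on literacy tend to rank low on birthrate; whether that is statistically significant is what the p-value from `ss.spearmanr` addresses.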