In [67]:
%load_ext rpy2.ipython
P(A or B) = P(A) + P(B) − P(A and B)
P(A and B) = P(A) × P(B), for independent events
P(MR) ** 2 * P(FL)
P(A|B) = P(A and B) / P(B) **or** P(A and B) = P(A|B) × P(B)
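As a quick sketch of these rules with made-up probabilities (P(A) = 0.4, P(B) = 0.5, and the independence of A and B are assumptions for illustration only):
%%R
# Hypothetical probabilities, for illustration only
pA = 0.4
pB = 0.5
pAB = pA * pB                # independence: P(A and B) = P(A) * P(B)
print(pA + pB - pAB)         # addition rule: P(A or B)
print(pAB / pB)              # conditional probability: P(A|B)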
For the z-score
In [3]:
%%R
mean = 1500   # distribution mean
sd = 300      # standard deviation
d = 1800      # observed value
(d-mean)/sd   # z-score
To calculate the percentiles
In [9]:
%%R
mean = 1500
sd = 300
point = 2100
LT = T
pnorm(point,mean=mean,sd=sd,lower.tail=LT)
In [11]:
%%R
mean = 1500
sd = 300
percentile = 0.4
LT = T
qnorm(percentile,mean=mean,sd=sd,lower.tail=LT)
In [30]:
%%R
mean = 70
sd = 3.3
lower = 69
upper = 74
LT = T
# IN RANGE: P(lower < X < upper)
1 - pnorm(upper,mean=mean,sd=sd,lower.tail=!LT) - pnorm(lower,mean=mean,sd=sd,lower.tail=LT)
# OUT RANGE: P(X < lower or X > upper)
# pnorm(upper,mean=mean,sd=sd,lower.tail=!LT) + pnorm(lower,mean=mean,sd=sd,lower.tail=LT)
Screenshot taken from Coursera video, 09:53
In [5]:
%%R
k = 8
n = 10
p = 0.3
dbinom(k,size=n,prob=p)
To calculate combinations:
In [25]:
%%R
k_success = 8
trials = 8
choose(trials,k_success)
Screenshot taken from Coursera video, 12:07
If the data is in fact not sufficiently large, then we can't take advantage of the normal approximation. The only way to do it is to manually compute the probability of each value.
P(k < 2) = P(k = 0) + P(k = 1)
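A minimal sketch of that sum, reusing n = 10 and p = 0.3 from the dbinom cell above:
%%R
n = 10
p = 0.3
# P(k < 2) = P(k = 0) + P(k = 1), computed manually
print(dbinom(0,size=n,prob=p) + dbinom(1,size=n,prob=p))
# equivalently, via the cumulative distribution
pbinom(1,size=n,prob=p)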
When we know the mean and standard deviation of the binomial under the normal approximation, we can determine the probability using the z-score.
Screenshot taken from Coursera video, 09:59
If all the binomial conditions are met, then this also works.
In [75]:
import numpy as np
n = 40
p = 0.35
sd = round(np.sqrt(n*p*(1-p)),2)
print('Expected value of point success is', n*p)
print('standard deviation is', sd)
print('variance is', sd**2)
In [74]:
import numpy as np
n = 2500
p = 0.7
sd = round(np.sqrt(n*p*(1-p)),2)
print('Expected value of point success is', n*p)
print('standard deviation is', sd)
print('variance is', sd**2)
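Putting the two together: with mean = n*p and sd = sqrt(n*p*(1-p)), a tail probability follows from the z-score. A sketch reusing n = 40 and p = 0.35 from above; the cutoff k = 20 is a made-up value for illustration:
%%R
n = 40
p = 0.35
k = 20                      # hypothetical cutoff, for illustration
mu = n*p                    # expected value
sigma = sqrt(n*p*(1-p))     # standard deviation
print((k-mu)/sigma)         # z-score
pnorm(k,mean=mu,sd=sigma,lower.tail=F)  # P(X > k) under the normal approximation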
To check whether the data is sufficiently large (the success-failure condition):
In [19]:
p = 0.01
n = 300
assert (n*p >= 10 and n*(1-p) >= 10), 'Not large enough to take advantage of normal approximation'
For the binomial-to-normal approximation, and for taking the probability of more than one outcome:
In [37]:
%%R
val = 59
mu = 80
sd = 8
tail = T
# Continuity correction: shift val by 0.5 when looking at a small range of observations
ifsmall = (-0.5+!tail)
pnorm(val+ifsmall,mean=mu,sd=sd,lower.tail=tail)
Because the binomial is discrete while the normal is continuous, there's no exact matching value, so we have to decrement by 0.5 (the continuity correction).
Snapshot taken from Coursera 04:26
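To sanity-check the correction against the exact binomial: mu = 80 and sd = 8 are consistent with, for example, n = 400 and p = 0.2 (an assumed parameterization, since the cell above only gives the mean and sd):
%%R
# Assumed n and p consistent with mu = 80, sd = 8
n = 400
p = 0.2
# Exact binomial: P(X < 59) = P(X <= 58)
print(pbinom(58,size=n,prob=p))
# Normal approximation with the continuity correction
pnorm(58.5,mean=n*p,sd=sqrt(n*p*(1-p)),lower.tail=T)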
Let us state that:
n = 50
mu = 3.2
s = 1.74
z = 1.96
$$SE = \frac{s}{\sqrt{n}} = \frac{1.74}{\sqrt{50}} \approx 0.246$$

$$\bar{x} \pm z \times SE = 3.2 \pm 1.96 \times 0.246$$

$$3.2 \pm 0.48 = (2.72, 3.68)$$
In [76]:
%%R
#95% = 1.96
#99% = 2.58
n = 50
mu = 3.2
s = 1.74
# z = 1.96
CL = 0.95
z = round(qnorm((1-CL)/2,lower.tail=F),digits=2)
SE = s/sqrt(n)
ME = z*SE
c(mu-ME,mu+ME)
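To find the required sample size for a desired margin of error ME, solve ME = z* × SE for n:

$$n = \left(\frac{z^\star \times sd}{ME}\right)^2$$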
In [82]:
%%R
CL = 0.9
ME = 4
sd = 18
#CONFIDENCE LEVEL
#95% = 1.96
#99% = 2.58
#90% = 1.65
#####
z_star = round(qnorm((1-CL)/2,lower.tail=F),digits=2)
#z_star = 1.65
((z_star*sd)/ME)**2
Snapshot taken from Coursera 14:01
Snapshot taken from Coursera 05:19
Interpreting p-value:
This is a one-sided HT; a two-sided test needs to double the tail probability: 0.209 × 2 = 0.418.
To calculate a HT for a population mean, two-sided (drop the ×2 for one-sided):
In [71]:
%%R
xbar = 118.2
mu = 100
sd = 6.5
n = 36
SE = round(sd/sqrt(n),digits=2)
# put the test statistic in the tail away from mu, then double for the two-sided p-value
pnorm(xbar, mean=mu,sd=SE, lower.tail=xbar < mu) * 2
Statistical significance and practical significance are different things: statistically you calculate different z-scores, but practical significance is where you get the same impact even though the results are statistically different. Z-scores of 5 and 100 are statistically different but practically the same: both produce p-values of approximately zero. Therefore one must be careful not to increase the sample size too much, because beyond a point it is practically useless and consumes extra resources.
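A quick sketch of that point, using the z-scores 5 and 100 from the paragraph above:
%%R
# Two-sided p-values for z = 5 and z = 100; both are approximately zero,
# so the practical conclusion is the same
print(pnorm(5,lower.tail=F) * 2)
pnorm(100,lower.tail=F) * 2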