Probability of "at least" (upper tail)


In [1]:
pnorm(0.95,mean=0.9,sd=0.0212,lower.tail=F)


Out[1]:
[1] 0.009174713

In [2]:
sum(dbinom(190:200,200,0.90))


Out[2]:
[1] 0.00807125
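The standard deviation 0.0212 in the normal calculation is not derived in the cells above. A minimal sketch, assuming it is the standard error of a sample proportion with n = 200 and p = 0.90 (the values used in the binomial cell), and that the cutoff 0.95 corresponds to 190 successes out of 200:

```r
# Assumed setup: n = 200 trials, success probability p = 0.90,
# and we want P(at least 95% successes) = P(X >= 190).
n <- 200
p <- 0.90

# Standard error of the sample proportion; about 0.0212.
se <- sqrt(p * (1 - p) / n)

# Normal approximation to the upper-tail probability.
p_normal <- pnorm(0.95, mean = p, sd = se, lower.tail = FALSE)

# Exact binomial upper tail: P(X >= 190) = P(X > 189).
p_exact <- pbinom(189, size = n, prob = p, lower.tail = FALSE)

c(se = se, normal = p_normal, exact = p_exact)
```

The two answers differ slightly because the normal curve only approximates the discrete binomial distribution.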

One categorical variable


In [15]:
# Confidence interval for one categorical variable with two outcome levels (proportion of successes)

p = 0.85
n = 670
CL = 0.95
SE = sqrt(p*(1-p)/n)
z_star = round(qnorm((1-CL)/2,lower.tail=F),digits=2)
ME = z_star * SE

c(p-ME, p+ME)


Out[15]:
[1] 0.822962 0.877038

Based on these data, the confidence interval can be interpreted in two ways:

  • We are 95% confident that 82.3% to 87.7% of all Americans have good intuition about experimental design.
  • About 95% of random samples of 670 Americans will yield confidence intervals that capture the true proportion of Americans who have good intuition about experimental design.
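The second interpretation can be checked by simulation. The sketch below is illustrative only: it assumes the true population proportion is 0.85 (in practice it is unknown), and `true_p`, `n_sims`, and `coverage` are names introduced here, not part of the original analysis.

```r
# Assumption for illustration only: treat 0.85 as the true population proportion.
set.seed(1)
true_p <- 0.85
n <- 670
z_star <- 1.96
n_sims <- 10000

# Draw n_sims sample proportions, each from a random sample of size n.
p_hat <- rbinom(n_sims, size = n, prob = true_p) / n

# Build a 95% interval around each sample proportion.
se <- sqrt(p_hat * (1 - p_hat) / n)
lower <- p_hat - z_star * se
upper <- p_hat + z_star * se

# Fraction of intervals that capture the true proportion.
coverage <- mean(lower <= true_p & true_p <= upper)
coverage
```

The resulting fraction should land close to 0.95, which is what the confidence level describes.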

In [4]:
# Required sample size for a proportion, given a desired margin of error (ME)
# Round the result up to the next whole number of observations.

p = 0.85
z_star = 1.96
ME = 0.01

z_star^2*p*(1-p)/ME^2


Out[4]:
[1] 4898.04

In [31]:
# Required sample size when no prior estimate of p is available:
# p = 0.5 maximizes p*(1-p), giving the most conservative (largest) n

p = 0.5
z_star = 1.96
ME = 0.01

z_star^2*p*(1-p)/ME^2


Out[31]:
[1] 9604
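The two sample-size cells differ only in the value of p, so they can be folded into one function. This is a sketch; `required_n` is a hypothetical helper name, not something defined in the original notebook. It rounds up, because a fractional observation cannot be sampled.

```r
# Hypothetical helper (not from the original notebook).
required_n <- function(p, ME, z_star = 1.96) {
  n_raw <- z_star^2 * p * (1 - p) / ME^2
  # Round first to guard against floating-point noise, then round up.
  ceiling(round(n_raw, 6))
}

required_n(0.85, 0.01)  # raw value 4898.04
required_n(0.50, 0.01)  # raw value 9604
```

Using p = 0.5 roughly doubles the required sample here, which is the price of not having a prior estimate of the proportion.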

In [6]:
# Confidence interval for the difference between two proportions
# (one level of a two-level categorical variable, compared across two groups)
# 1 = Coursera, 2 = US
n_1 = 83
p_1 = 0.71

n_2 = 1028
p_2 = 0.25

CL = 0.95

SE = sqrt(   (p_1*(1-p_1)/n_1)+(p_2*(1-p_2)/n_2)   )
z_star = round(qnorm((1-CL)/2, lower.tail=F),digits=2)
ME = z_star*SE

c((p_1-p_2)-ME, (p_1-p_2)+ME)


Out[6]:
[1] 0.3588534 0.5611466

In [5]:
# Hypothesis test for one categorical variable, given a null value (p)

p = 0.5        # null value
p_hat = 0.6    # observed sample proportion
n = 1983
SL = 0.05      # significance level
SE = sqrt(p*(1-p)/n)   # standard error under the null

# One-sided p-value, in the direction of the observed difference
pnorm(p_hat,mean=p,sd=SE,lower.tail=p_hat < p)


Out[5]:
[1] 2.641113e-19

Question: do a majority of Americans believe in evolution? The p-value is far below the 5% significance level, so the data provide convincing evidence that a majority of all Americans believe in evolution.
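The same test can be written in terms of a z-score, which shows how far the observed proportion lies from the null value. A minimal sketch using the numbers from the cell above, assuming a one-sided alternative (p > 0.5); `z` and `p_value` are names introduced here.

```r
p <- 0.5       # null value
p_hat <- 0.6   # observed sample proportion
n <- 1983

# Standard error under the null hypothesis (uses p, not p_hat).
SE <- sqrt(p * (1 - p) / n)

# Standardize the observed proportion.
z <- (p_hat - p) / SE

# One-sided p-value for HA: p > 0.5.
p_value <- pnorm(z, lower.tail = FALSE)

c(z = z, p_value = p_value)
```

An observed proportion nearly nine standard errors above the null value explains why the p-value is so small.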


In [27]:
# Confidence interval for the difference between two proportions
# (one level of a two-level categorical variable, compared across two groups)
#1  = Coursera, 2 = US
n_1 = 144
p_1 = 71/144

n_2 = 389
p_2 = 224/389

CL = 0.95

SE = sqrt(   (p_1*(1-p_1)/n_1)+(p_2*(1-p_2)/n_2)   )
z_star = round(qnorm((1-CL)/2, lower.tail=F),digits=2)
ME = z_star*SE

c((p_1-p_2)-ME, (p_1-p_2)+ME)


Out[27]:
[1] -0.17807031  0.01251047

In [30]:
SE


Out[30]:
[1] 0.04861754

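For the Coursera versus US interval (0.359 to 0.561): we are 95% confident that the proportion of Courserians who believe there should be a law banning gun possession is 36 to 56 percentage points higher than the corresponding proportion of the US population.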
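The difference-of-proportions cells repeat the same arithmetic with different inputs. As a sketch, that arithmetic can be wrapped in a function; `diff_prop_ci` is a hypothetical helper name, and it keeps the notebook's choice of rounding z* to two digits.

```r
# Hypothetical helper (not from the original notebook).
diff_prop_ci <- function(p_1, n_1, p_2, n_2, CL = 0.95) {
  SE <- sqrt(p_1 * (1 - p_1) / n_1 + p_2 * (1 - p_2) / n_2)
  z_star <- round(qnorm((1 - CL) / 2, lower.tail = FALSE), digits = 2)
  ME <- z_star * SE
  c(lower = (p_1 - p_2) - ME, upper = (p_1 - p_2) + ME)
}

# Coursera (1) versus US (2), as in the cell above.
ci <- diff_prop_ci(p_1 = 0.71, n_1 = 83, p_2 = 0.25, n_2 = 1028)
ci

# An interval that excludes 0 points to a difference between the groups.
unname(ci["lower"] > 0)
```

The interval from the 71/144 versus 224/389 cell, by contrast, contains 0, so it does not indicate a difference at the same confidence level.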


In [13]:
# Hypothesis test for a difference between two proportions, null value zero
# (one level of a two-level categorical variable, compared across two groups)
# 1 = Male, 2 = Female

n_1 = 90
np_1 = 34
p_1 = round(np_1/n_1,digits=2)


n_2 = 122
np_2 = 61
p_2 = round(np_2/n_2,digits=2)

p_pool = round((np_1+np_2)/(n_1+n_2),digits=2)
null = 0

SE = sqrt((p_pool*(1-p_pool)/n_1) + (p_pool*(1-p_pool)/n_2))
pe = p_1 - p_2

pnorm(pe,mean=null,sd=SE, lower.tail=pe < null) * 2


Out[13]:
[1] 0.08257998

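The p-value (about 0.083) is above the 5% significance level, so we fail to reject the null hypothesis: the data do not provide convincing evidence of a difference between males and females in the likelihood of reporting that their kids are being bullied. This is not the same as showing that there is no difference.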
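The cell above rounds each proportion to two digits before computing the p-value. The sketch below repeats the pooled test without intermediate rounding and cross-checks it against base R's `prop.test` with the continuity correction turned off. `x_1`, `x_2`, and `p_check` are names introduced here.

```r
# Counts from the cell above: 1 = male, 2 = female.
n_1 <- 90;  x_1 <- 34
n_2 <- 122; x_2 <- 61

p_1 <- x_1 / n_1
p_2 <- x_2 / n_2
p_pool <- (x_1 + x_2) / (n_1 + n_2)

# Standard error under H0: p_1 = p_2, using the pooled proportion.
SE <- sqrt(p_pool * (1 - p_pool) * (1 / n_1 + 1 / n_2))
z <- (p_1 - p_2) / SE

# Two-sided p-value.
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)

# Cross-check with prop.test, without continuity correction.
p_check <- prop.test(c(x_1, x_2), c(n_1, n_2), correct = FALSE)$p.value

c(z = z, p_value = p_value, prop_test = p_check)
```

The unrounded p-value differs slightly from the rounded one but stays above 0.05, so the conclusion at the 5% level is unchanged.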


In [14]:
source("http://bit.ly/dasi_inference")

Simulation-based inference

One small-sample proportion


In [16]:
paul = factor(c(rep('yes',8),rep('no',0)), levels=c('yes','no'))
inference(paul,est='proportion',type='ht',method='simulation',success='yes',null=0.5,alternative='greater')


Single proportion -- success: yes 
Summary statistics: p_hat = 1 ;  n = 8 
H0: p = 0.5 
HA: p > 0.5 
p-value =  0.0048 
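The simulation that `inference()` runs can be approximated by hand. This is a sketch of the general idea, not a reproduction of that function's internals: simulate many sets of 8 trials under H0 (p = 0.5) and count how often the simulated proportion is at least as large as the observed 1. The seed and `n_sims` are arbitrary choices made here.

```r
set.seed(42)
n <- 8            # number of trials in the sample
null_p <- 0.5     # H0: p = 0.5
n_sims <- 10000

# Simulate the number of successes under H0, expressed as proportions.
sim_p_hat <- rbinom(n_sims, size = n, prob = null_p) / n

# p-value: share of simulated proportions at least as large as the observed 1.
p_sim <- mean(sim_p_hat >= 1)

# Exact value for comparison: 8 successes out of 8 has probability 0.5^8.
p_exact <- null_p^n

c(simulated = p_sim, exact = p_exact)
```

Simulated p-values vary from run to run, which is why the 0.0048 reported above is close to, but not exactly, 0.5^8.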

Comparing two small-sample proportions

Screenshot taken from Coursera 04:57

Chi-Square GOF

Screenshot taken from Coursera 11:12


In [17]:
chi_square = 22.63
dof = 4

pchisq(chi_square,dof,lower.tail = F)


Out[17]:
[1] 0.000150104

The data provide convincing evidence that the observed distribution of jurors' race/ethnicity does not follow the population distribution.

Chi-square independence test


In [18]:
chi_square = 22.63
dof = 4

pchisq(chi_square,dof,lower.tail = F)


Out[18]:
[1] 0.000150104

The data provide convincing evidence that obesity and relationship status are associated.


In [21]:
male = 6+15+3
nostop = 4+3
grandtotal = 6+16+4+6+15+3

(male+nostop)/grandtotal


Out[21]:
[1] 0.62

In [26]:
7/50


Out[26]:
[1] 0.14

In [24]:
0.14*male


Out[24]:
[1] 3.36

In [25]:
0.14*24


Out[25]:
[1] 3.36
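The last few cells compute an expected cell count for a chi-square independence test in two steps (7/50, then 0.14 times 24). A sketch of the same calculation as a single formula, margin total times margin total divided by the grand total, follows. `expected_count` is a hypothetical helper name, and reading `male` and `nostop` as the two margin totals for that cell is an assumption based on the variable names.

```r
# Margin totals taken from the cells above.
male <- 6 + 15 + 3                       # 24
nostop <- 4 + 3                          # 7
grandtotal <- 6 + 16 + 4 + 6 + 15 + 3    # 50

# Expected count under independence:
# (one margin total) * (other margin total) / (grand total).
expected_count <- function(row_total, col_total, grand_total) {
  row_total * col_total / grand_total
}

expected_count(male, nostop, grandtotal)
```

This matches the 3.36 obtained above.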
