8. Comparing Gaussian means

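When testing the validity of a claim, we use the terms null hypothesis and alternative hypothesis.

Null hypothesis (H0) : the existing claim

Alternative hypothesis (H1) : the new claim that needs to be supported by evidence

Critical value : the threshold used to decide whether to reject the null hypothesis H0

Critical region : the range of test statistic values, bounded by the critical value, in which the null hypothesis is rejected

Test statistic : a quantity computed from the sample, typically based on a point estimator of the parameter, that is used to test the hypothesis

Significance level (α) : the probability of wrongly rejecting the null hypothesis H0, that is, of rejecting H0 when it is true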
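
To make the link between significance level, critical value, and critical region concrete, here is a minimal sketch. It assumes a z-test, meaning the test statistic follows N(0,1) under H0; the variable names are illustrative only.

```r
# Sketch: critical values of a z-test at significance level alpha
# (assumes the test statistic follows N(0,1) under H0)
alpha <- 0.05

# Right-tailed test (H1: mu > mu0): reject H0 when z > z_upper
z_upper <- qnorm(1 - alpha)

# Left-tailed test (H1: mu < mu0): reject H0 when z < z_lower
z_lower <- qnorm(alpha)

# Two-sided test (H1: mu != mu0): reject H0 when |z| > z_two
z_two <- qnorm(1 - alpha / 2)

round(c(upper = z_upper, lower = z_lower, two.sided = z_two), 3)
```

With alpha = 0.05 the one-sided critical values are 1.645 in absolute value and the two-sided one is 1.960, which matches a standard normal table.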

Example : sizes measured on 40 one-year-old red pine seedlings


In [14]:
options(jupyter.plot_mimetypes = 'image/png')
tree=c(2.6, 1.9, 1.8, 1.6, 1.4, 2.2, 1.2, 1.6, 1.6, 1.5,
       1.4, 1.6, 2.3, 1.5, 1.1, 1.6, 2.0, 1.5, 1.7, 1.5,
       1.6, 2.1, 2.8, 1.0, 1.2, 1.2, 1.8, 1.7, 0.8, 1.5,
       2.0, 2.2, 1.5, 1.6, 2.2, 2.1, 3.1, 1.7, 1.7, 1.2)
# Example: find the mean, standard deviation (SD), and standard error (SE)
mean(tree) # mean
sd(tree) # standard deviation
sd(tree)/sqrt(length(tree)) # standard error
plot(density(tree))


Out[14]:
1.715
Out[14]:
0.474773900304404
Out[14]:
0.0750683449281813

In [16]:
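# Standardization of the statistic
# Center the data and scale by the standard error (SE), not by the SD.
# Note: the result has SD sqrt(40) = 6.32 rather than 1, so the values are not N(0,1);
# dividing by sd(tree) instead would give unit SD.
stree=(tree-mean(tree))/(sd(tree)/sqrt(length(tree)))
mean(stree) # mean (zero up to floating-point error)
sd(stree) # standard deviation (= sqrt(40))
sd(stree)/sqrt(length(stree)) # standard error (= 1)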
plot(density(stree))


Out[16]:
-8.04911692853238e-16
Out[16]:
6.32455532033676
Out[16]:
1
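
The cell above divides by the standard error, which is why its output shows an SD of 6.32 instead of 1. If the goal is to put the individual observations on a scale with mean 0 and SD 1, the usual approach is to divide by the sample SD. A minimal sketch follows; `ztree` is a name introduced here for illustration.

```r
# Seedling data from the example above
tree <- c(2.6, 1.9, 1.8, 1.6, 1.4, 2.2, 1.2, 1.6, 1.6, 1.5,
          1.4, 1.6, 2.3, 1.5, 1.1, 1.6, 2.0, 1.5, 1.7, 1.5,
          1.6, 2.1, 2.8, 1.0, 1.2, 1.2, 1.8, 1.7, 0.8, 1.5,
          2.0, 2.2, 1.5, 1.6, 2.2, 2.1, 3.1, 1.7, 1.7, 1.2)

# Standardize each observation: subtract the mean, divide by the sample SD
ztree <- (tree - mean(tree)) / sd(tree)
mean(ztree)  # zero up to floating-point error
sd(ztree)    # 1

# Base R's scale() performs the same centering and scaling
all.equal(ztree, as.numeric(scale(tree)))
```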

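In the example above, n = 40, the sample mean is x̄ = 1.715, and the sample standard deviation is s = 0.475. At significance level 0.05, we test

H0 : μ = 1.9, H1 : μ > 1.9

by finding the critical region, computing the observed value of the test statistic, and checking whether it falls inside the critical region.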


In [19]:
# significance level a = 0.05
# critical value z0.05 = 1.645 (see a standard normal table)
# reject H0 if z > 1.645
z=(1.715-1.9)/(0.475/(sqrt(40))) # -2.4632
print(z)
# -2.4632 > 1.645 is false, so z is outside the critical region and H0 is not rejected


[1] -2.463248
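
The calculation above uses the normal critical value together with rounded summary values. Because the standard deviation is estimated from the sample, a one-sample t-test is a common alternative; with n = 40 the two give very similar answers. The sketch below uses base R's `t.test` on the raw data and is meant as a cross-check, not as a replacement for the worked example.

```r
# Seedling data from the example above
tree <- c(2.6, 1.9, 1.8, 1.6, 1.4, 2.2, 1.2, 1.6, 1.6, 1.5,
          1.4, 1.6, 2.3, 1.5, 1.1, 1.6, 2.0, 1.5, 1.7, 1.5,
          1.6, 2.1, 2.8, 1.0, 1.2, 1.2, 1.8, 1.7, 0.8, 1.5,
          2.0, 2.2, 1.5, 1.6, 2.2, 2.1, 3.1, 1.7, 1.7, 1.2)

# H0: mu = 1.9 versus H1: mu > 1.9
res <- t.test(tree, mu = 1.9, alternative = "greater")
res$statistic  # t statistic, close to the z value computed above
res$parameter  # degrees of freedom: n - 1 = 39
res$p.value    # far above 0.05, so H0 is not rejected
```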

One-sided tests and two-sided tests

A test of the form H0 : μ = 1.9, H1 : μ > 1.9 (or H0 : μ = 1.9, H1 : μ < 1.9) is called a one-sided test, and a test of the form H0 : μ = 1.9, H1 : μ ≠ 1.9 is called a two-sided test.
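
The choice of alternative changes which tail of the distribution counts as evidence against H0. As a rough illustration, the sketch below reuses the z value from the seedling example and computes the p-value under each of the three alternatives, assuming a standard normal test statistic.

```r
# z value from the seedling example (n = 40, mean 1.715, sd 0.475, mu0 = 1.9)
z <- (1.715 - 1.9) / (0.475 / sqrt(40))
alpha <- 0.05

p_greater <- pnorm(z, lower.tail = FALSE)  # H1: mu > 1.9
p_less    <- pnorm(z)                      # H1: mu < 1.9
p_two     <- 2 * pnorm(-abs(z))            # H1: mu != 1.9

p_values <- c(greater = p_greater, less = p_less, two.sided = p_two)
round(p_values, 4)
p_values < alpha  # TRUE means H0 is rejected at level alpha
```

For these data only the right-tailed alternative fails to reject H0; the left-tailed and two-sided alternatives both reject it at the 0.05 level, so the direction of H1 should be fixed before looking at the data.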


In [2]:
# SMM (Seasonal Memory Model) of Dr John
# SMM predicts that a glucose-driven increase in recall performance is more pronounced in summer than in winter
# Recall performance increase : summer > winter

# But Dr Smith finds the opposite result:
# the increase in recall performance is smaller in summer than in winter
# Recall performance increase : summer < winter

# t-test example
# 1. Group selection (two groups: Winter and Summer)
Winter = c(-0.05,0.41,0.17,-0.13,0.00,-0.05,0.00,0.17,0.29,0.04,0.21,0.08,0.37,0.17,0.08,-0.04,-0.04,0.04,-0.13,-0.12,0.04,0.21,0.17,
            0.17,0.17,0.33,0.04,0.04,0.04,0.00,0.21,0.13,0.25,-0.05,0.29,0.42,-0.05,0.12,0.04,0.25,0.12)

Summer = c(0.00,0.38,-0.12,0.12,0.25,0.12,0.13,0.37,0.00,0.50,0.00,0.00,-0.13,-0.37,-0.25,-0.12,0.50,0.25,0.13,0.25,0.25,0.38,0.25,0.12,
            0.00,0.00,0.00,0.00,0.25,0.13,-0.25,-0.38,-0.13,-0.25,0.00,0.00,-0.12,0.25,0.00,0.50,0.00)
summary(Winter)
summary(Summer)
sd(Winter) # Standard Deviation Winter
sd(Summer) # Standard Deviation Summer


Out[2]:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-0.1300  0.0000  0.0800  0.1076  0.2100  0.4200 
Out[2]:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-0.38000  0.00000  0.00000  0.07341  0.25000  0.50000 
Out[2]:
0.145426622181169
Out[2]:
0.225384224781789

In [5]:
# 2. Density
plot(density(Winter))
lines(density(Summer), lty=2)



In [6]:
# 4. t-test
t.test(Winter, Summer, var.equal=TRUE, alternative='two.sided')
# t-value : 0.8151
# p-value : 0.4174


Out[6]:
	Two Sample t-test

data:  Winter and Summer
t = 0.8151, df = 80, p-value = 0.4174
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.04921817  0.11751085
sample estimates:
 mean of x  mean of y 
0.10756098 0.07341463 
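
To show where the t value and degrees of freedom in this output come from, here is a sketch of the pooled-variance (equal-variance) two-sample t statistic computed by hand. The variable names are illustrative; the formulas are the standard ones for Student's two-sample t-test.

```r
Winter <- c(-0.05,0.41,0.17,-0.13,0.00,-0.05,0.00,0.17,0.29,0.04,0.21,0.08,0.37,0.17,0.08,-0.04,-0.04,0.04,-0.13,-0.12,0.04,0.21,0.17,
            0.17,0.17,0.33,0.04,0.04,0.04,0.00,0.21,0.13,0.25,-0.05,0.29,0.42,-0.05,0.12,0.04,0.25,0.12)
Summer <- c(0.00,0.38,-0.12,0.12,0.25,0.12,0.13,0.37,0.00,0.50,0.00,0.00,-0.13,-0.37,-0.25,-0.12,0.50,0.25,0.13,0.25,0.25,0.38,0.25,0.12,
            0.00,0.00,0.00,0.00,0.25,0.13,-0.25,-0.38,-0.13,-0.25,0.00,0.00,-0.12,0.25,0.00,0.50,0.00)

n1 <- length(Winter)
n2 <- length(Summer)

# Pooled variance: weighted average of the two sample variances
sp2 <- ((n1 - 1) * var(Winter) + (n2 - 1) * var(Summer)) / (n1 + n2 - 2)

# Standard error of the difference in means
se_diff <- sqrt(sp2 * (1 / n1 + 1 / n2))

t_stat  <- (mean(Winter) - mean(Summer)) / se_diff
df      <- n1 + n2 - 2
p_value <- 2 * pt(-abs(t_stat), df)  # two-sided p-value

c(t = t_stat, df = df, p.value = p_value)
```

The result should agree with the `t.test` output above (t = 0.8151, df = 80, p-value = 0.4174).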

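The smaller the p-value, the stronger the evidence against the null hypothesis (H0).

Because the observed significance level, the p-value (0.4174), is larger than the preset threshold (usually 5% or 1%), H0 cannot be rejected (the difference is not statistically significant).

p-value : 0.4174 > 0.05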
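
The two sample standard deviations differ noticeably (about 0.145 versus 0.225), so the equal-variance assumption behind `var.equal=TRUE` may be questionable. As a hedged robustness check, not part of the original analysis, the sketch below compares the pooled test with Welch's t-test, which is what `t.test` runs by default.

```r
Winter <- c(-0.05,0.41,0.17,-0.13,0.00,-0.05,0.00,0.17,0.29,0.04,0.21,0.08,0.37,0.17,0.08,-0.04,-0.04,0.04,-0.13,-0.12,0.04,0.21,0.17,
            0.17,0.17,0.33,0.04,0.04,0.04,0.00,0.21,0.13,0.25,-0.05,0.29,0.42,-0.05,0.12,0.04,0.25,0.12)
Summer <- c(0.00,0.38,-0.12,0.12,0.25,0.12,0.13,0.37,0.00,0.50,0.00,0.00,-0.13,-0.37,-0.25,-0.12,0.50,0.25,0.13,0.25,0.25,0.38,0.25,0.12,
            0.00,0.00,0.00,0.00,0.25,0.13,-0.25,-0.38,-0.13,-0.25,0.00,0.00,-0.12,0.25,0.00,0.50,0.00)

pooled <- t.test(Winter, Summer, var.equal = TRUE)  # Student's t-test
welch  <- t.test(Winter, Summer)                    # Welch's t-test (var.equal = FALSE is the default)

# Compare t statistics, degrees of freedom, and p-values side by side
c(pooled = unname(pooled$statistic), welch = unname(welch$statistic))
c(pooled = unname(pooled$parameter), welch = unname(welch$parameter))
c(pooled = pooled$p.value, welch = welch$p.value)
```

Because both groups have the same size (41), the two t statistics coincide; Welch's version only reduces the degrees of freedom, and the conclusion that H0 is not rejected at the 0.05 level is unchanged.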