In the 1960s, racial bias in jury selection became an issue in the American South.
In theory, a jury panel is drawn at random from the eligible citizens,
but in the Southern states, although about 50% of eligible citizens were black, only 4 of the 80 people on one jury panel were black.
The defendants therefore challenged the jury's verdict, and testimony on this point was given as expert evidence on appeal.
Prefixing a distribution name with d gives the probability density function (PDF).
Prefixing it with p gives the cumulative distribution function (CDF).
In [1]:
options(scipen=999)
pbinom(4, 80, prob=0.5)
Out[1]:
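As a quick check of the d/p convention above, the cumulative probability returned by pbinom should equal the sum of the individual dbinom densities (a minimal sketch, not part of the original notebook):
# P(X <= 4) for X ~ Binomial(n=80, p=0.5), computed two equivalent ways
pbinom(4, 80, prob=0.5)           # CDF: P(X <= 4)
sum(dbinom(0:4, 80, prob=0.5))    # summing the PDF over 0..4 gives the same value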
$H_0$, the null hypothesis: in the example above, the jurors were selected at random from the whole population, so the probability that a black person is selected is $p=0.5$. $H_1$, the alternative hypothesis: the probability that a black person is selected as a juror is $p<0.5$.
Choose a test statistic for evaluating the evidence against the null hypothesis: here it is the binomial random variable X with $p=0.5$, $n=80$ (the number of black jurors on the panel).
If the null hypothesis is true, compute the probability of observing the test statistic: $pbinom(4, 80, prob=0.5) = 0.0000000000000000014$ (about $1.4 \times 10^{-18}$).
$\alpha$ is the threshold for judging a result to be statistically significant (0.05 and 0.01 are commonly used).
That is, if $p\text{-value} \le \alpha$, the null hypothesis $H_0$ is rejected (in the example above, the judge rejected the null hypothesis).
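The decision rule can be written directly in R (a small sketch; the threshold alpha = 0.05 is an assumed example value, not taken from the text above):
alpha = 0.05                           # assumed significance level
p.value = pbinom(4, 80, prob=0.5)      # P(X <= 4) under H0: p = 0.5
p.value <= alpha                       # TRUE, so H0 is rejected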
In [2]:
# t-test example
# 1. Group selection (two groups of scores: Winter vs. Summer)
Winter = c(-0.05,0.41,0.17,-0.13,0.00,-0.05,0.00,0.17,0.29,0.04,0.21,0.08,0.37,0.17,0.08,-0.04,-0.04,0.04,-0.13,-0.12,0.04,0.21,0.17,
0.17,0.17,0.33,0.04,0.04,0.04,0.00,0.21,0.13,0.25,-0.05,0.29,0.42,-0.05,0.12,0.04,0.25,0.12)
Summer = c(0.00,0.38,-0.12,0.12,0.25,0.12,0.13,0.37,0.00,0.50,0.00,0.00,-0.13,-0.37,-0.25,-0.12,0.50,0.25,0.13,0.25,0.25,0.38,0.25,0.12,
0.00,0.00,0.00,0.00,0.25,0.13,-0.25,-0.38,-0.13,-0.25,0.00,0.00,-0.12,0.25,0.00,0.50,0.00)
summary(Winter)
summary(Summer)
sd(Winter) # Standard Deviation Winter
sd(Summer) # Standard Deviation Summer
Out[2]:
In [9]:
# 2. Density
options(jupyter.plot_mimetypes = 'image/png')
plot(density(Winter))
lines(density(Summer), lty=2)
In [10]:
options(jupyter.plot_mimetypes = 'image/png')
x = Winter-Summer # allowed because it is a within-subjects design
x = x/sd(x) # standardize: the scaled scores are distributed as N(mu/sigma, 1)
plot(density(x))
In [7]:
# t-test(p-value = 0.4365)
t.test(x, var.equal=T, alt='two.sided')
Out[7]:
When a one-sample t-test is performed, the sample data are assumed to follow a normal distribution $N(\mu, \sigma)$.
Dr. Smith's experimental data (winter - summer) are an example in which the summer scores are subtracted from the winter scores and the differences are standardized (divided by $\sigma$).
Standardized difference scores (i.e., "winter scores - summer scores").
The null hypothesis is that the mean of the differences (winter - summer) is 0: $H_0 : \mu = 0$.
The alternative hypothesis is that the mean of the differences (winter - summer) is not 0: $H_1 : \mu \ne 0$.
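For reference, the t statistic and p-value reported by t.test above can be reproduced from the definition $t = \bar{x}/(s/\sqrt{n})$ (a sketch that assumes the standardized difference scores x from the previous cell are still in the workspace):
n = length(x)
t.stat = mean(x) / (sd(x) / sqrt(n))     # one-sample t statistic against mu = 0
p.val = 2 * pt(-abs(t.stat), df = n - 1) # two-sided p-value, should match t.test(x)
t.stat
p.val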
In [9]:
options(jupyter.plot_mimetypes = 'image/png')
dcauchy(0, location=0, scale=1)
pcauchy(0, location=0, scale=1)
qcauchy(0.5, location=0, scale=1) # quantiles(50%)
rcauchy(5, location=0, scale=1)
delta1=rcauchy(1000,location=0, scale=2 )
delta2=rcauchy(1000,location=1, scale=1 )
delta3=rcauchy(1000,location=2, scale=0.5 )
par(mfrow=c(1,1))
plot(density(delta1), xlim=c(-10, 15),ylim=c(0,1), lty=1, col='red')
lines(density(delta2), lty=2,col='blue')
lines(density(delta3), lty=3,col='green')
Out[9]:
In [ ]:
# One-Sample Comparison of Means
modelString = "
model{
# Data
for (i in 1:ndata){
x[i] ~ dnorm(mu,lambda)
}
mu <- delta*sigma
lambda <- pow(sigma,-2)
# delta and sigma Come From (Half) Cauchy Distributions
lambdadelta ~ dchisqr(1)
delta ~ dnorm(0,lambdadelta)
lambdasigma ~ dchisqr(1)
sigmatmp ~ dnorm(0,lambdasigma)
sigma <- abs(sigmatmp)
# Sampling from Prior Distribution for Delta
deltaprior ~ dnorm(0,lambdadeltaprior)
lambdadeltaprior ~ dchisqr(1)
}"
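The "(Half) Cauchy" comment above refers to a scale-mixture construction: if the precision lambda follows a chi-square distribution with 1 degree of freedom and x ~ Normal(0, precision = lambda), then x follows a standard Cauchy distribution (a t distribution with 1 df); taking the absolute value, as done for sigma, gives a half-Cauchy. A small simulation sketch of this equivalence (the seed and sample size are arbitrary choices, not part of the original notebook):
set.seed(1)                                    # arbitrary seed, for reproducibility only
lambda = rchisq(100000, df=1)                  # precision ~ chi-square(1)
x = rnorm(100000, mean=0, sd=1/sqrt(lambda))   # Normal with that precision
plot(density(x), xlim=c(-10,10), col='red')    # density of the simulated mixture
curve(dcauchy(x, 0, 1), add=TRUE, lty=2)       # overlays the Cauchy(0,1) density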
In [1]:
dchisq(0, df=10)
pchisq(0, df=10)
qchisq(0.5, df=10)
rchisq(5, df=10)
Out[1]:
In [8]:
options(jupyter.plot_mimetypes = 'image/png')
x=rchisq(10000, df=10)
z=(x-mean(x))/(sd(x)) # z transformation
plot(density(x), xlim=c(0,20), ylim=c(0,0.5), col='red')
lines(density(z), col='black')
In [15]:
options(jupyter.plot_mimetypes = 'image/png')
# clears workspace:
rm(list=ls())
# JAGS (via R2jags)
library(R2jags)
# Read data Dr. Smith
Winter = c(-0.05,0.41,0.17,-0.13,0.00,-0.05,0.00,0.17,0.29,0.04,0.21,0.08,0.37,0.17,0.08,-0.04,-0.04,0.04,-0.13,-0.12,0.04,0.21,0.17,
0.17,0.17,0.33,0.04,0.04,0.04,0.00,0.21,0.13,0.25,-0.05,0.29,0.42,-0.05,0.12,0.04,0.25,0.12)
Summer = c(0.00,0.38,-0.12,0.12,0.25,0.12,0.13,0.37,0.00,0.50,0.00,0.00,-0.13,-0.37,-0.25,-0.12,0.50,0.25,0.13,0.25,0.25,0.38,0.25,0.12,
0.00,0.00,0.00,0.00,0.25,0.13,-0.25,-0.38,-0.13,-0.25,0.00,0.00,-0.12,0.25,0.00,0.50,0.00)
x = Winter-Summer # allowed because it is a within-subjects design
x = x/sd(x) # standardize
ndata = length(Winter) # number of subjects
data = list("x", "ndata") # to be passed on to JAGS
# initial values
myinits <- list(
list(delta = rnorm(1,0,3), sigmatmp = rnorm(1,0,1)),
list(delta = rnorm(1,0,3), sigmatmp = rnorm(1,0,1)),
list(delta = rnorm(1,0,3), sigmatmp = rnorm(1,0,1)))
# Parameters to be monitored
parameters = c("delta")
# 1. Model
# One-Sample Comparison of Means
modelString = "
model{
# Data
for (i in 1:ndata){
x[i] ~ dnorm(mu,lambda)
}
mu <- delta*sigma
lambda <- pow(sigma,-2)
# delta and sigma Come From (Half) Cauchy Distributions
lambdadelta ~ dchisqr(1)
delta ~ dnorm(0,lambdadelta)
lambdasigma ~ dchisqr(1)
sigmatmp ~ dnorm(0,lambdasigma)
sigma <- abs(sigmatmp)
# Sampling from Prior Distribution for Delta
deltaprior ~ dnorm(0,lambdadeltaprior)
lambdadeltaprior ~ dchisqr(1)
}"
# The following command calls JAGS with specific options.
# For a detailed description see the R2jags documentation.
samples = jags(data, inits=myinits, parameters,
model.file=textConnection(modelString),
n.chains=3, n.iter=10000, n.burnin=1000, n.thin=1, DIC=T)
# Now the values for the monitored parameters are in the "samples" object,
# ready for inspection.
plot(samples)
In [16]:
# Collect posterior samples across all chains:
delta.posterior = samples$BUGSoutput$sims.list$delta
#============ BFs based on logspline fit ===========================
library(polspline) # this package can be installed from within R
fit.posterior = logspline(delta.posterior)
# 95% credible interval:
x0 = qlogspline(0.025,fit.posterior)
x1 = qlogspline(0.975,fit.posterior)
posterior = dlogspline(0, fit.posterior) # this gives the pdf at point delta = 0
prior = dcauchy(0) # height of the Cauchy(0,1) prior at delta = 0
BF01 = posterior/prior
BF01
Out[16]:
In [8]:
#============ Plot Prior and Posterior ===========================
par(cex.main = 1.5, mar = c(5, 6, 4, 5) + 0.1, mgp = c(3.5, 1, 0), cex.lab = 1.5,
font.lab = 2, cex.axis = 1.3, bty = "n", las=1)
xlow = -3
xhigh = 3
yhigh = 4
Nbreaks = 80
y = hist(delta.posterior, Nbreaks, plot=F)
plot(c(y$breaks, max(y$breaks)), c(0,y$density,0), type="S", lwd=2, lty=2,
xlim=c(xlow,xhigh), ylim=c(0,yhigh), xlab=" ", ylab="Density", axes=F)
axis(1, at = c(-4,-3,-2,-1,0,1,2,3,4), lab=c("-4","-3","-2","-1","0", "1", "2", "3", "4"))
axis(2)
mtext(expression(delta), side=1, line = 2.8, cex=2)
#now bring in log spline density estimation:
par(new=T)
plot(fit.posterior, ylim=c(0,yhigh), xlim=c(xlow,xhigh), lty=1, lwd=1, axes=F)
points(0, dlogspline(0, fit.posterior),pch=19, cex=2)
# plot the prior:
par(new=T)
plot ( function( x ) dcauchy( x, 0, 1 ), xlow, xhigh, ylim=c(0,yhigh), xlim=c(xlow,xhigh), lwd=2, lty=1, ylab=" ", xlab = " ", axes=F)
axis(1, at = c(-4,-3,-2,-1,0,1,2,3,4), lab=c("-4","-3","-2","-1","0", "1", "2", "3", "4"))
axis(2)
points(0, dcauchy(0), pch=19, cex=2)
According to the plot, the posterior distribution of $\delta$ (effect size) peaks near 0, with more mass on the positive side than on the negative side.
At $\delta=0$ the posterior is about 5 times higher than the prior,
which means the Bayes factor supports the null hypothesis $H_0$ at roughly 5:1.
Earlier, this Bayes factor was computed as the degree of support for $H_0 : \delta=0$ versus $H_1 : \delta \sim Cauchy(0,1)$ (about 5:1).
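Given a BF01 of roughly 5 in favor of $H_0$, the support for $H_1$ is its reciprocal, and with equal prior odds the posterior probability of $H_0$ follows directly (a sketch using the BF01 computed above):
BF10 = 1 / BF01              # evidence for H1 relative to H0
post.H0 = BF01 / (BF01 + 1)  # posterior probability of H0 assuming equal prior odds
BF10
post.H0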
In [29]:
options(jupyter.plot_mimetypes = 'image/png')
# clears workspace:
rm(list=ls())
# JAGS (via R2jags)
library(R2jags)
# Read data Dr. Smith
Winter = c(-0.05,0.41,0.17,-0.13,0.00,-0.05,0.00,0.17,0.29,0.04,0.21,0.08,0.37,0.17,0.08,-0.04,-0.04,0.04,-0.13,-0.12,0.04,0.21,0.17,
0.17,0.17,0.33,0.04,0.04,0.04,0.00,0.21,0.13,0.25,-0.05,0.29,0.42,-0.05,0.12,0.04,0.25,0.12)
Summer = c(0.00,0.38,-0.12,0.12,0.25,0.12,0.13,0.37,0.00,0.50,0.00,0.00,-0.13,-0.37,-0.25,-0.12,0.50,0.25,0.13,0.25,0.25,0.38,0.25,0.12,
0.00,0.00,0.00,0.00,0.25,0.13,-0.25,-0.38,-0.13,-0.25,0.00,0.00,-0.12,0.25,0.00,0.50,0.00)
x = Winter-Summer # allowed because it is a within-subjects design
x = x/sd(x) # standardize
ndata = length(Winter) # number of subjects
data = list("x", "ndata") # to be passed on to JAGS
# initial values
myinits = list(
list(delta = -abs(rnorm(1,0,1)), sigmatmp = 0.1),
list(delta = -abs(rnorm(1,0,1)), sigmatmp = 0.2),
list(delta = -abs(rnorm(1,0,1)), sigmatmp = 0.3))
# Parameters to be monitored
parameters = c("delta")
# 1. Model
# One-Sample Order Restricted Comparison of Means
modelString = "
model{
# Data
for (i in 1:ndata){
x[i] ~ dnorm(mu,lambda)
}
mu <- delta*sigma
lambda <- pow(sigma,-2)
# delta and sigma Come From (Half) Cauchy Distributions
lambdadelta ~ dchisqr(1)
delta ~ dnorm(0,lambdadelta)T(,0)
lambdasigma ~ dchisqr(1)
sigmatmp ~ dnorm(0,lambdasigma)
sigma <- abs(sigmatmp)
# Sampling from Prior Distribution for Delta
deltaprior ~ dnorm(0,lambdadeltaprior)T(,0)
lambdadeltaprior ~ dchisqr(1)
}"
# The following command calls JAGS with specific options.
# For a detailed description see the R2jags documentation.
samples = jags(data, inits=myinits, parameters,
model.file=textConnection(modelString),
n.chains=3, n.iter=10000, n.burnin=1000, n.thin=1, DIC=T)
# Now the values for the monitored parameters are in the "samples" object,
# ready for inspection.
plot(samples)
In [30]:
# Collect posterior samples across all chains:
delta.posterior <- samples$BUGSoutput$sims.list$delta
#============ BFs based on logspline fit ===========================
library(polspline) # this package can be installed from within R
fit.posterior <- logspline(delta.posterior)
# 95% credible interval:
x0 <- qlogspline(0.025,fit.posterior)
x1 <- qlogspline(0.975,fit.posterior)
posterior <- dlogspline(0, fit.posterior) # this gives the pdf at point delta = 0
prior <- dcauchy(0) # height of order-restricted prior at delta = 0
BF01 <- posterior/prior
BF01
Out[30]:
In [33]:
#============ Plot Prior and Posterior ===========================
par(cex.main = 1.5, mar = c(5, 6, 4, 5) + 0.1, mgp = c(3.5, 1, 0), cex.lab = 1.5,
font.lab = 2, cex.axis = 1.3, bty = "n", las=1)
xlow <- -3
xhigh <- 0
yhigh <- 4
Nbreaks <- 80
y <- hist(delta.posterior, Nbreaks, plot=F)
plot(c(y$breaks, max(y$breaks)), c(0,y$density,0), type="S", lwd=2, lty=2,
xlim=c(xlow,xhigh), ylim=c(0,yhigh), xlab=" ", ylab="Density", axes=F)
axis(1, at = c(-4,-3,-2,-1,0,1,2,3,4), lab=c("-4","-3","-2","-1","0", "1", "2", "3", "4"))
axis(2)
mtext(expression(delta), side=1, line = 2.8, cex=2)
#now bring in log spline density estimation:
par(new=T)
plot(fit.posterior, ylim=c(0,yhigh), xlim=c(xlow,xhigh), lty=1, lwd=1, axes=F)
points(0, dlogspline(0, fit.posterior),pch=19, cex=2)
# plot the prior:
par(new=T)
plot ( function( x ) dcauchy( x, 0, 1 ), xlow, xhigh, ylim=c(0,yhigh), xlim=c(xlow,xhigh), lwd=2, lty=1, ylab=" ", xlab = " ", axes=F)
axis(1, at = c(-4,-3,-2,-1,0,1,2,3,4), lab=c("-4","-3","-2","-1","0", "1", "2", "3", "4"))
axis(2)
points(0, dcauchy(0), pch=19, cex=2)
According to the plot, the Bayes factor is now computed for $H_0 : \delta=0$ versus $H_2 : \delta \sim Cauchy(0,1)_{I(-\infty,0)}$ (the order-restricted prior).
At $\delta=0$ the posterior is about 7.6 times higher than the prior,
which means the Bayes factor supports the null hypothesis $H_0$ at roughly 7.6:1.
In cognitive science, comparing the means of two independent groups of data is common.
The two-sample t-test is the tool a standard frequentist typically uses for this.
The example compares the effect of drinking plain water with the effect of drinking oxygenated water.
Each sample consists of 20 observations:
Oxygenated water (x): (70,80,79,83,77,75,84,78,75,75,78,82,74,81,72,70,75,72,76,77)
Plain water (y): (56,80,63,62,67,71,68,76,79,67,76,74,67,70,62,65,72,72,69,71)
In [1]:
# oxygenated water
x = c(70,80,79,83,77,75,84,78,75,75,78,82,74,81,72,70,75,72,76,77)
mean(x)
sd(x)
Out[1]:
In [2]:
# plain water
y=c(56,80,63,62,67,71,68,76,79,67,76,74,67,70,62,65,72,72,69,71)
mean(y)
sd(y)
Out[2]:
In [3]:
#t-test
# t value is 4.47, p-value is 0.00008625
# p-value < 0.01, so H0(there is no difference) rejected
options(scipen=999)
t.test(x,y,"two.sided")
Out[3]:
In [4]:
# 2. Density
options(jupyter.plot_mimetypes = 'image/png')
plot(density(x))
lines(density(y), lty=2)
In [8]:
# oxygenated water
x = c(70,80,79,83,77,75,84,78,75,75,78,82,74,81,72,70,75,72,76,77)
# plain water
y=c(56,80,63,62,67,71,68,76,79,67,76,74,67,70,62,65,72,72,69,71)
# Rescale
zy = (y-mean(x))/sd(x)
zx = (x-mean(x))/sd(x)
#t-test
# t value is 4.4708 p-value is 0.00008625
# p-value < 0.01, so H0(there is no difference) rejected
options(scipen=999)
t.test(zx,zy,"two.sided")
Out[8]:
In [9]:
plot(density(zx))
lines(density(zy), lty=2)
In [15]:
options(jupyter.plot_mimetypes = 'image/png')
# clears workspace:
rm(list=ls())
# JAGS (via R2jags)
library(R2jags)
# oxygenated water
x = c(70,80,79,83,77,75,84,78,75,75,78,82,74,81,72,70,75,72,76,77)
# plain water
y = c(56,80,63,62,67,71,68,76,79,67,76,74,67,70,62,65,72,72,69,71)
n1 = length(x)
n2 = length(y)
# Rescale
y = y - mean(x)
y = y/sd(x)
x = (x-mean(x))/sd(x);
data <- list("x", "y", "n1", "n2") # to be passed on to JAGS
myinits <- list(
list(delta = rnorm(1,0,3), mu = rnorm(1,0,1), sigmatmp = runif(1,0,5)),
list(delta = rnorm(1,0,3), mu = rnorm(1,0,1), sigmatmp = runif(1,0,5)),
list(delta = rnorm(1,0,3), mu = rnorm(1,0,1), sigmatmp = runif(1,0,5)))
# Parameters to be monitored
parameters = c("delta")
# 1. Model
# Two-sample Comparison of Means
modelString = "
model{
# Data
for (i in 1:n1){
x[i] ~ dnorm(mux,lambda)
}
for (j in 1:n2){
y[j] ~ dnorm(muy,lambda)
}
# Means and precision
alpha <- delta*sigma
mux <- mu+alpha/2
muy <- mu-alpha/2
lambda <- pow(sigma,-2)
# delta, mu, and sigma Come From (Half) Cauchy Distributions
lambdadelta ~ dchisqr(1)
delta ~ dnorm(0,lambdadelta)
lambdamu ~ dchisqr(1)
mu ~ dnorm(0,lambdamu)
lambdasigma ~ dchisqr(1)
sigmatmp ~ dnorm(0,lambdasigma)
sigma <- abs(sigmatmp)
# Sampling from Prior Distribution for Delta
lambdadeltaprior ~ dchisqr(1)
deltaprior ~ dnorm(0,lambdadeltaprior)
}"
# The following command calls JAGS with specific options.
# For a detailed description see the R2jags documentation.
samples = jags(data, inits=myinits, parameters,
model.file=textConnection(modelString),
n.chains=3, n.iter=10000, n.burnin=5000, n.thin=1, DIC=T)
# Now the values for the monitored parameters are in the "samples" object,
# ready for inspection.
plot(samples)
# Collect posterior samples across all chains:
delta.posterior <- samples$BUGSoutput$sims.list$delta
In [16]:
#============ BFs based on logspline fit ===========================
library(polspline) # this package can be installed from within R
fit.posterior <- logspline(delta.posterior)
# 95% credible interval:
x0 <- qlogspline(0.025,fit.posterior)
x1 <- qlogspline(0.975,fit.posterior)
posterior <- dlogspline(0, fit.posterior) # this gives the pdf at point delta = 0
prior <- dcauchy(0) # height of the Cauchy(0,1) prior at delta = 0
BF01 <- posterior/prior
BF01
Out[16]:
In [17]:
#============ Plot Prior and Posterior ===========================
par(cex.main = 1.5, mar = c(5, 6, 4, 5) + 0.1, mgp = c(3.5, 1, 0), cex.lab = 1.5,
font.lab = 2, cex.axis = 1.3, bty = "n", las=1)
xlow <- -3
xhigh <- 3
yhigh <- 2
Nbreaks <- 80
y <- hist(delta.posterior, Nbreaks, plot=F)
plot(c(y$breaks, max(y$breaks)), c(0,y$density,0), type="S", lwd=2, lty=2,
xlim=c(xlow,xhigh), ylim=c(0,yhigh), xlab=" ", ylab="Density", axes=F)
axis(1, at = c(-4,-3,-2,-1,0,1,2,3,4), lab=c("-4","-3","-2","-1","0", "1", "2", "3", "4"))
axis(2)
mtext(expression(delta), side=1, line = 2.8, cex=2)
#now bring in log spline density estimation:
par(new=T)
plot(fit.posterior, ylim=c(0,yhigh), xlim=c(xlow,xhigh), lty=1, lwd=1, axes=F)
points(0, dlogspline(0, fit.posterior),pch=19, cex=2)
# plot the prior:
par(new=T)
plot ( function( x ) dcauchy( x, 0, 1 ), xlow, xhigh, ylim=c(0,yhigh), xlim=c(xlow,xhigh), lwd=2, lty=1, ylab=" ", xlab = " ", axes=F)
axis(1, at = c(-4,-3,-2,-1,0,1,2,3,4), lab=c("-4","-3","-2","-1","0", "1", "2", "3", "4"))
axis(2)
points(0, dcauchy(0), pch=19, cex=2)