In [1]:
%load_ext rpy2.ipython
In [119]:
%%R
setwd("/Users/justin/Dropbox/Articles/databozo/data/ProgramsDoingBayesianDataAnalysis/")
source('BernBeta.R')
source('BernGrid.R')
In [13]:
%%R
BernBeta(c(4,4), c(1))
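BernBeta is one of the book's helper scripts; the update it illustrates is the conjugate beta-Bernoulli rule, which can be sketched without the helper (the variable names here are my own):

```r
# Conjugate beta-Bernoulli update: a Beta(a, b) prior with z heads in
# n flips gives a Beta(a + z, b + n - z) posterior.
prior_shape = c(4, 4)
data_vec = c(1)  # a single head
z = sum(data_vec)
n = length(data_vec)
post_shape = c(prior_shape[1] + z, prior_shape[2] + n - z)
post_shape  # 5 4, i.e. a Beta(5, 4) posterior
```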
In [150]:
%%R
exercise_6_8_1_a <- function(){
nIntervals = 10
width=1/nIntervals
Theta=seq(from=width/2, to=1-width/2, by=width)
approxMass = dbeta(Theta, 8, 4)*width
pTheta = approxMass/sum(approxMass)
sum(approxMass)
}
exercise_6_8_1_a()
The sum isn't exactly one because the grid gives a discrete, and therefore approximate, answer. The more buckets we use, though, the closer the sum gets to one. For example:
In [152]:
%%R
exercise_6_8_1_a_extra <- function(){
nIntervals = 1000
width=1/nIntervals
Theta=seq(from=width/2, to=1-width/2, by=width)
approxMass = dbeta(Theta, 8, 4)*width
pTheta = approxMass/sum(approxMass)
sum(approxMass)
}
exercise_6_8_1_a_extra()
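Tabulating several grid sizes at once shows the convergence directly. This standalone sketch (no helper scripts needed, and the `grid_mass` name is my own) reuses the same midpoint grid:

```r
# Total probability mass of a midpoint-grid approximation to Beta(8, 4).
# The sum approaches 1 as the number of intervals grows.
grid_mass <- function(nIntervals) {
width = 1/nIntervals
Theta = seq(from=width/2, to=1-width/2, by=width)
sum(dbeta(Theta, 8, 4) * width)
}
masses = sapply(c(10, 100, 1000, 10000), grid_mass)
print(masses)
```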
In [35]:
%%R
# base plot() has no type='bar'; type='h' draws histogram-like vertical bars
plot(approxMass, type='h')
In [36]:
%%R
Theta=seq(from=0, to=1, by=width)
approxMass = dbeta(Theta, 8, 4)*width
pTheta = approxMass/sum(approxMass)
In [38]:
%%R
sum(approxMass)
In [37]:
%%R
# base plot() has no type='bar'; type='h' draws histogram-like vertical bars
plot(approxMass, type='h')
In [66]:
%%R
pTheta = c(50:1, 1:50, 50:1, 1:50)
pTheta = pTheta / sum(pTheta)
width = 1/length(pTheta)
Theta = seq(from=width/2, to=1-width/2, by=width)
In [67]:
%%R
plot(pTheta)
In [68]:
%%R
pThetaGivenData = BernGrid(Theta, pTheta, c(rep(1, 15),rep(0,5)))
In [69]:
%%R
pThetaGivenData = BernGrid(Theta, pTheta, c(rep(1, 3),rep(0,1)))
In [70]:
%%R
pThetaGivenDataNext = BernGrid(Theta, pThetaGivenData, c(rep(1, 12),rep(0,4)))
Then just for fun let's see what happens if I give a ton more data.
In [71]:
%%R
pThetaGivenDataNextNext = BernGrid(Theta, pThetaGivenDataNext, c(rep(1, 121),rep(0,40)))
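BernGrid itself comes from the book's code, but the update it performs is just Bayes' rule on a grid, so the sequential-updating property used above can be checked with a minimal stand-in (my own sketch, plotting omitted): updating on two batches of data yields the same posterior as updating once on the combined batch.

```r
# Minimal grid update: posterior is proportional to likelihood * prior.
# dataVec contains 1s (heads) and 0s (tails).
grid_update <- function(Theta, pTheta, dataVec) {
z = sum(dataVec)
n = length(dataVec)
likelihood = Theta^z * (1-Theta)^(n-z)
posterior = likelihood * pTheta
posterior / sum(posterior)
}
width = 1/100
Theta = seq(from=width/2, to=1-width/2, by=width)
prior = rep(1, length(Theta)) / length(Theta)
seq_post = grid_update(Theta, grid_update(Theta, prior, c(1,1,1,0)), c(1,0,0))
batch_post = grid_update(Theta, prior, c(1,1,1,0,1,0,0))
max(abs(seq_post - batch_post))  # effectively zero
```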
In [81]:
%%R
exercise_6_4_a <- function(){
pTheta = rep(1,101)
pTheta = pTheta / sum(pTheta)
width = 1/length(pTheta)
Theta = seq(from=width/2, to=1-width/2, by=width)
pThetaGivenData = BernGrid(Theta, pTheta, c(rep(1, 58),rep(0,42)))
}
exercise_6_4_a()
In [83]:
%%R
exercise_6_4_a_2 <- function(){
pTheta = rep(1,101)
pTheta = pTheta / sum(pTheta)
width = 1/length(pTheta)
Theta = seq(from=width/2, to=1-width/2, by=width)
pThetaGivenData = BernGrid(Theta, pTheta, c(rep(1, 42),rep(0,58)))
}
exercise_6_4_a_2()
In [129]:
%%R
exercise_6_4_c <- function(){
pTheta = rep(1,1001)
pTheta = pTheta / sum(pTheta)
width = 1/length(pTheta)
Theta = seq(from=width/2, to=1-width/2, by=width)
pThetaGivenData = BernGrid(Theta, pTheta, c(rep(1, 58),rep(0,42)))
pThetaGivenDataNext = BernGrid(Theta, pThetaGivenData, c(rep(1, 57),rep(0,43)))
}
exercise_6_4_c()
In [128]:
%%R
exercise_6_5 = function(){
pTheta = rep(1,1001)
pTheta = pTheta / sum(pTheta)
width = 1/length(pTheta)
Theta = seq(from=width/2, to=1-width/2, by=width)
pThetaGivenData = BernGrid(Theta, pTheta, c(rep(1, 28),rep(0, 472)))
return(pThetaGivenData)
}
given = exercise_6_5()
I used a non-informative (uniform) prior since we have no prior information. Even with this most conservative choice, the 95% HDI runs from roughly 4% to 8%. Since 10% lies outside the credible interval, it is not credible that the machine is producing defective widgets at a 10% rate; the process appears to be within its control limits.
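With a uniform prior, the posterior here has a closed form: Beta(1 + 28, 1 + 472), so the grid result can be cross-checked exactly. One way to find the 95% HDI of a unimodal posterior is to minimize the interval width over the lower tail probability (a `qbeta`-based search; the helper name is my own):

```r
# Exact posterior: Beta(29, 473). The 95% HDI is the narrowest interval
# containing 95% of the mass; search over the lower tail probability p.
a = 29
b = 473
hdi_width <- function(p) qbeta(p + 0.95, a, b) - qbeta(p, a, b)
opt = optimize(hdi_width, interval=c(0, 0.05))
hdi = c(qbeta(opt$minimum, a, b), qbeta(opt$minimum + 0.95, a, b))
round(hdi, 3)
```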
In [126]:
%%R
exercise_6_6 <- function(){
pTheta = seq(from=0, to=1001, by=1)^2
pTheta = pTheta / sum(pTheta)
width = 1/length(pTheta)
Theta = seq(from=width/2, to=1-width/2, by=width)
pThetaGivenData = BernGrid(Theta, pTheta, c(rep(1, 2),rep(0, 2)))
}
exercise_6_6()
In [130]:
%%R
exercise_6_7_1 <- function(){
pTheta = seq(from=0, to=1001, by=1)^2
pTheta = pTheta / sum(pTheta)
width = 1/length(pTheta)
Theta = seq(from=width/2, to=1-width/2, by=width)
pThetaGivenData = BernGrid(Theta, pTheta, c(rep(1, 6),rep(0, 2)))
}
exercise_6_7_2 <- function(){
pTheta = (1001-seq(from=0, to=1001, by=1))^2
pTheta = pTheta / sum(pTheta)
width = 1/length(pTheta)
Theta = seq(from=width/2, to=1-width/2, by=width)
pThetaGivenData = BernGrid(Theta, pTheta, c(rep(1, 6),rep(0, 2)))
}
exercise_6_7_1()
exercise_6_7_2()
The ratio of posterior beliefs is 4.66 (optimistic model to pessimistic).
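That figure is the ratio of the two models' evidences, p(D | optimistic) / p(D | pessimistic), with D being 6 heads in 8 flips. It can be cross-checked by summing likelihood times prior over the grid; analytically the ratio is B(9,3)/B(7,5) = 14/3, about 4.67. The `evidence` helper below is my own sketch:

```r
# Evidence p(D) for a grid prior: sum over theta of likelihood * prior,
# for z heads in n flips.
evidence <- function(Theta, pTheta, z, n) {
sum(Theta^z * (1-Theta)^(n-z) * pTheta)
}
width = 1/1000
Theta = seq(from=width/2, to=1-width/2, by=width)
p_optimistic = Theta^2 / sum(Theta^2)          # prior favoring high theta
p_pessimistic = (1-Theta)^2 / sum((1-Theta)^2) # prior favoring low theta
ratio = evidence(Theta, p_optimistic, 6, 8) / evidence(Theta, p_pessimistic, 6, 8)
round(ratio, 2)  # 4.67, i.e. 14/3
```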
In [154]:
%%R
exercise_6_8_a <- function(){
pTheta = c(rep(1,51),49:1)
pTheta = pTheta / sum(pTheta)
plot(pTheta)
}
exercise_6_8_a()
I created this prior to express strong optimism that the drug has SOME effect; the prior weight then declines, reaching its minimum at 100%. This is a fairly unusual prior, and I'm curious how the rest of the exercise turns out as a result. To improve it, I'd probably give it a peak in the middle so it bulges between 50% and 100% instead of simply declining.
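The bulge described in that last sentence can be built with the same piecewise trick. This is my own sketch, not part of the exercise: flat and low below 50%, rising to a peak near 75%, then falling toward 100%:

```r
# Flat below theta = 0.5, ramp up to a peak near theta = 0.75, ramp back down.
pTheta = c(rep(1,50), 1:25, 25:1)
pTheta = pTheta / sum(pTheta)
plot(pTheta, type='h')
```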
In [155]:
%%R
exercise_6_8_b = function(){
pTheta = c(rep(1,501),500:1)
pTheta = pTheta / sum(pTheta)
width = 1/length(pTheta)
Theta = seq(from=width/2, to=1-width/2, by=width)
pThetaGivenData = BernGrid(Theta, pTheta, c(rep(1, 30),rep(0, 20)))
return(pThetaGivenData)
}
given = exercise_6_8_b()
My reasoning for this prior is that we have very little belief that the drug decreases the proportion of boys; what we doubt is really only the magnitude of the increase. As an example, this is what would happen if the experiment's results were reversed:
In [156]:
%%R
exercise_6_8_b_2 = function(){
pTheta = c(rep(1,501),500:1)
pTheta = pTheta / sum(pTheta)
width = 1/length(pTheta)
Theta = seq(from=width/2, to=1-width/2, by=width)
pThetaGivenData = BernGrid(Theta, pTheta, c(rep(1, 20),rep(0, 30)))
return(pThetaGivenData)
}
given = exercise_6_8_b_2()
When I first saw this, I wondered whether anything could shift the posterior to the left. Suppose the experiment had five times the data. Then we see this:
In [157]:
%%R
exercise_6_8_b_3 = function(){
pTheta = c(rep(1,501),500:1)
pTheta = pTheta / sum(pTheta)
width = 1/length(pTheta)
Theta = seq(from=width/2, to=1-width/2, by=width)
pThetaGivenData = BernGrid(Theta, pTheta, c(rep(1, 100),rep(0, 150)))
return(pThetaGivenData)
}
given = exercise_6_8_b_3()
Finally the data overwhelm the prior and produce an almost split-personality posterior. The 95% HDI is actually two intervals: 34.3%-46.1% and 50.2%-52.7%; it is 95% likely that the true value lies in one of them. The single most likely value, however, is 50.2%, only slightly better than chance.
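A multimodal HDI like this one can be computed on a grid with a "waterline": keep the most probable cells until they hold 95% of the mass, then report each run of consecutive kept cells as an interval. This is my own sketch (the book ships its own grid-HDI routine), demonstrated on an artificial bimodal distribution:

```r
# Waterline HDI: keep the highest-probability grid cells until their total
# mass reaches 95%, then report runs of consecutive kept cells as intervals.
grid_hdi <- function(Theta, pTheta, mass=0.95) {
ord = order(pTheta, decreasing=TRUE)
inHdi = sort(ord[cumsum(pTheta[ord]) <= mass])
runs = split(inHdi, cumsum(c(1, diff(inHdi) != 1)))  # consecutive runs
t(sapply(runs, function(r) c(lower=Theta[min(r)], upper=Theta[max(r)])))
}
# Demo on an artificial bimodal grid distribution:
width = 1/1000
Theta = seq(from=width/2, to=1-width/2, by=width)
p = dnorm(Theta, 0.3, 0.03) + dnorm(Theta, 0.7, 0.03)
p = p / sum(p)
hdi = grid_hdi(Theta, p)
print(hdi)  # two rows: one interval around each mode
```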
In [149]:
%%R
exercise_6_8_c_company = function(){
print("Company subjective beliefs")
pTheta = rep(50, 1001)
pTheta = pTheta / sum(pTheta)
width = 1/length(pTheta)
Theta = seq(from=width/2, to=1-width/2, by=width)
pThetaGivenData = BernGrid(Theta, pTheta, c(rep(1, 60),rep(0, 40)))
pThetaGivenDataNext = BernGrid(Theta, pThetaGivenData, c(rep(1, 30),rep(0, 20)))
return(pThetaGivenDataNext)
}
exercise_6_8_c_skeptic = function(){
print("Skeptic subjective beliefs")
pTheta = rep(50, 1001)
pTheta = pTheta / sum(pTheta)
width = 1/length(pTheta)
Theta = seq(from=width/2, to=1-width/2, by=width)
pThetaGivenData = BernGrid(Theta, pTheta, c(rep(1, 50),rep(0, 50)))
pThetaGivenDataNext = BernGrid(Theta, pThetaGivenData, c(rep(1, 30),rep(0, 20)))
return(pThetaGivenDataNext)
}
given = exercise_6_8_c_company()
given_2 = exercise_6_8_c_skeptic()
This data wouldn't be enough to move the skeptic, but it would be enough for the company to believe that its drug works. It's an interesting mathematical illustration of how prior bias shapes the conclusions of an analysis.
In [ ]: