In [1]:
p <- 0.002 ## probability of success.
ans <- 1 - pbinom(q=0, size=55, prob=p) ## P(X >= 1) = 1 - P(X = 0)
round(ans, 3) ## round the answer to 3 decimal places.
Out[1]:
On an assembly line of Acme Radio, the probability that a radio is defective is 1/15. If an inspector randomly checks 10 items, what is the probability that no more than 2 are defective?
In [2]:
p <- 1/15
n <- 10
## ---- finish the rest!
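One way to finish it (a sketch, mirroring the `pbinom` call from In [1]; since we want "no more than 2," that is $P(X \leq 2)$ directly, so no complement is needed):

```r
p <- 1/15
n <- 10
## P(no more than 2 defective) = P(X <= 2) for X ~ Binomial(10, 1/15)
ans <- pbinom(q=2, size=n, prob=p)
round(ans, 3) ## approximately 0.975
```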
Let's go ahead and derive some intuition for the normal distribution. If you don't care, skip down to the problems!
I want to go from
$$ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}\right\} $$to something that makes sense to everyone.
Let's first define distance. A distance metric is a measure of the distance between two points, so we want the following qualities for a distance function:
All distances are non-negative (you can't have a negative distance), and the distance between a point and itself is 0.
Some functions that work well with this include:
$$ d(x,y) = \sqrt{(x-y)^2} $$But let's say that we want to include the variance of the data in this equation. This makes sense because if $\sigma^2$ is large, that conveys different information about the distance (or spread) of the data. For a concrete example, let's take the function that has been included for hw3:
In [3]:
## ---- Don't change this function, or you risk it not working correctly!
norm_samp_dist_plot <- function(n, mean=0, sd=1, num_samples=100000){
  ## ---- load necessary packages (but check first and make it easy to install
  ## if not already there).
  list_of_packages <- c("ggplot2", "reshape2")
  new_packages <- list_of_packages[!(list_of_packages %in% installed.packages()[,"Package"])]
  if(length(new_packages) > 0) {
    install.packages(new_packages, repos="http://cran.rstudio.com/")
  }
  library(ggplot2)
  ## ---- simulate draws from the original distribution and from the
  ## sampling distribution of the sample mean (sd shrinks by sqrt(n))
  samps <- rnorm(num_samples, mean, sd)
  sd_samps <- rnorm(num_samples, mean, sd/sqrt(n))
  ## ---- collect into long format for plotting
  dat <- data.frame(samps, sd_samps)
  colnames(dat) <- c("original", "sampling_dist")
  dat_m <- reshape2::melt(dat)
  colnames(dat_m) <- c("dist", "value")
  print(ggplot(dat_m, aes(value)) + geom_density(aes(color=dist)) + theme_bw())
}
In [5]:
norm_samp_dist_plot(n=5)
We see that the sampling distribution of the sample mean has a much smaller variance than that of the original distribution. So take any two points on the x axis of this plot and measure the distance between them. The distance is the same number no matter which curve we're looking at, but is that really a fair comparison? Relative to its spread, the same two points seem farther apart on the narrow curve than on the wide one. This is really determined by the variance. So why don't we divide our squared distance by the variance? Thus we get a variance-adjusted distance measure, i.e.
$$ d_{kyle}(x,y) = \sqrt{\frac{(x-y)^2}{\sigma^2}} $$Called $d_{kyle}$ because I came up with it. j.k.
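To see what this buys us (a quick sketch; the helper function is just made up here for illustration), the same pair of points is "closer" under this measure when the variance is large:

```r
## Variance-adjusted distance between x and y
d_kyle <- function(x, y, sigma2) sqrt((x - y)^2 / sigma2)

d_kyle(0, 2, sigma2=1) ## 2: two "units of spread" apart
d_kyle(0, 2, sigma2=4) ## 1: same points, but the larger variance makes them closer
```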
Now we have a much fairer distance measure between $x$ and $y$. Let's say we want to know the distance between the mean $\mu$ and any value $x$ taken by a random variable with variance $\sigma^2$; then this is:
$$ d(x,\mu) = \sqrt{\frac{(x-\mu)^2}{\sigma^2}} $$Then we can create a probability distribution that is largest at the mean and assigns less and less probability to values farther from the mean. So we want a probability distribution that is a function of the distance from the mean, i.e.:
$$ P(X=x) = f(d(x, \mu)) $$such that all of the requirements of a probability distribution hold, namely:
$$ 0 \leq P(X=x) \leq 1 $$and, if $A$ is the universe of possible values, $$ P(X\in A) = 1 $$
Then a good candidate for this function is the exponential function, namely
$$ f(x) = c \times \exp\left\{ -\tfrac{1}{2}x^2 \right\} = c \times e^{-\frac{1}{2} x^2} $$Here $c$ is some constant: at $x=0$ the function attains its maximum $c \times e^0 = c \times 1$, and $e^{-\frac{1}{2} x^2}$ gets smaller as $|x|$ gets larger. Thus, if we plug the distance function in we get:
$$ P(X=x) = c \times \exp\left\{-\tfrac{1}{2} (d(x,\mu))^2 \right\} $$The constant $c=\frac{1}{\sqrt{2\pi\sigma^2}}$ is found by requiring the density to integrate to 1 over the whole real line. And that's the normal density! As a final bit of intuition, let's watch the Central Limit Theorem in action: sample means of decidedly non-normal data still look normal.
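Rather than doing the integral by hand, we can check the constant numerically (a sketch for the $\sigma^2 = 1$ case, using base R's `integrate`):

```r
## The integral of exp(-x^2/2) over the real line should be sqrt(2*pi),
## so the normalizing constant is c = 1/sqrt(2*pi) when sigma^2 = 1.
total <- integrate(function(x) exp(-x^2 / 2), -Inf, Inf)$value
c_const <- 1 / total
c_const          ## approximately 0.3989423
1 / sqrt(2 * pi) ## same value
```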
In [9]:
x <- rpois(10000, lambda=1) ## draws from a skewed, clearly non-normal distribution
In [10]:
hist(x)
In [11]:
x_bars <- numeric(1000)
for(k in 1:length(x_bars)){
  ## draw a fresh Poisson sample and record its mean
  x <- rpois(1000, lambda=1)
  x_bars[k] <- mean(x)
}
plot(density(x_bars))
Looking pretty normal, right?
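We can also check the simulated sample means against what the CLT predicts (a sketch; for $X \sim \text{Poisson}(1)$, the sample mean of $n=1000$ draws should be approximately normal with mean $1$ and standard deviation $\sqrt{1/1000} \approx 0.0316$):

```r
## Re-run the simulation and compare to the CLT's predictions
set.seed(1) ## arbitrary seed, just for reproducibility
x_bars <- replicate(1000, mean(rpois(1000, lambda=1)))
mean(x_bars) ## close to 1
sd(x_bars)   ## close to sqrt(1/1000)
```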