[1 point] A probability mass function must give a positive number for each element in the sample space and $\underline{\hspace{0.5in}}$?
[1 point] Which of these are invalid sample spaces and which are valid: $\{1,3,-2\}$, $\{A, B\}$, $\{\textrm{Ace of hearts}, \textrm{king of diamonds}\}$, all real numbers.
[1 point] What rule allows me to rewrite $P(x \,|\,y)P(y)$ as $P(x, y)$?
[2 points] If there is a 10% chance of rain for 3 days in a row, what's the probability of there being rain at least once within those days?
[2 points] Harry says that expected value is like an average, so you can compute it two ways: $ E[X] = \sum_i^N \frac{x_i}{N} $ and the way we learned in class: $E[X] = \sum_i P(x_i) \cdot x_i$. Is Harry correct or is there an issue with his logic?
[1 point] How many elements will I have in my list if I create it using list(range(5,8))?
[2 points] In the binomial distribution, we only consider number of successes. Let's try considering each permutation as unique. For example, if $N = 3$ and $n = 1$, you could have $100$, $010$, and $001$. If $N = 10$, how many unique permutations are possible for all numbers of successes? Review your HW 5, questions 1.2-1.5.
sum to 1
all are valid
Definition of conditional
Binomial with $p=0.1,\,N=3$. Being asked $1 - P(n = 0)$. Binomial coefficient is 1 for $0$, so just $1 - (1 - 0.1)^{3} = 0.271$
Expected value is only conceptually like an average. The first expression is a sum over data, and we do not have data here. We have elements in a sample space and their probabilities, so only the second equation can be used. The law of large numbers connects the two, but only in the limit of large amounts of data.
3
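The rain answer and the expected-value answer above can be checked numerically. The following is a minimal sketch that is not part of the original key: it assumes `scipy.stats.binom` for the rain question and uses a fair six-sided die (a hypothetical example, with a fixed seed chosen only for repeatability) to show the sample average approaching the expected value when many samples are drawn.

```python
import numpy as np
from scipy import stats

# Rain question: P(at least one rainy day) = 1 - P(n = 0) for Binomial(N=3, p=0.1)
p_rain = 1 - stats.binom.pmf(0, 3, 0.1)
print('P(rain at least once) = {:.3f}'.format(p_rain))

# Expected value computed from the sample space of a fair die (hypothetical example)
faces = np.arange(1, 7)
expected = np.sum(faces * (1 / 6))

# Sample average computed from simulated data
rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)  # upper bound is exclusive
sample_mean = rolls.mean()
print('E[X] = {:.2f}, sample mean = {:.2f}'.format(expected, sample_mean))
```

The two numbers in the last line agree closely only because the sample is large; with a handful of rolls the sample mean can be far from $E[X]$.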
You are a baby being carried by a stork to your parents. Your parents live in either:
The probability of your birth location is proportional to the populations. As a baby, you are concerned with your career options, which are
Answer the following using $B$ as the random variable for birthplace and $J$ as the random variable for job. We have the following information:
$$P(J = r \,|\, B = c) = 0.05$$
$$P(J = d \,|\, B = c) = 0.5$$
$$P(J = r \,|\, B = u) = 0.8$$
$$P(J = p\,|\, B = u) = 0.01$$
$$P(J = p\,|\, B = g) = 0.75$$
$$P(J = d \,|\, B = g) = 0.2$$
Label your axes, add a title, and use LaTeX in your labels when necessary. Use dots connected by lines for discrete and lines for continuous.
[6 points] Plot the geometric distribution for three different parameters: $p = 0.2$, $p = 0.5$, $p = 0.8$. Add vertical lines at their means. Extra credit: accomplish the plot of the three lines using a for loop.
[4 points] Plot the binomial distribution for $N = 25, p = 0.7$. Recall that the Poisson is an approximation to the Binomial. Plot the Poisson approximation to this Binomial distribution on the same plot.
[2 points] Make a second plot with the binomial and Poisson, but use $N = 25, p = 0.10$. How good is the approximation?
[6 points] The command plt.fill_between can be used to plot an area under a curve. For example, fill_between(x, 0, y) will fill the area between 0 and y, where y could be a numpy array. Using fill_between, show the cumulative probability function for the exponential distribution from $t = 0$ to $t = 5$ with $\lambda = 0.25$. Ensure that there are two lines on your plot: one that is the exponential pdf and one that is the fill_between. The pdf should extend further than your fill_between line. Add a vertical line at $t=5$. No legend necessary.
In [3]:
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
In [4]:
#3.1
p = [0.2, 0.5, 0.8]
n = np.arange(1, 8)
for i, pi in enumerate(p):
    # geometric pmf, with a matching-color vertical line at the mean 1/p
    plt.plot(n, pi * (1 - pi)**(n - 1), 'o-', label='$p={}$'.format(pi), color='C{}'.format(i))
    plt.axvline(x=1 / pi, color='C{}'.format(i))
plt.title('Problem 3.1 - Geometric')
plt.xlabel('$n$')
plt.ylabel('$P(n)$')
plt.legend()
plt.show()
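The vertical lines in 3.1 sit at $1/p$, the mean of the geometric distribution. As a rough numerical check (a sketch, not part of the original solution), the mean can also be computed from the pmf directly by truncating the infinite sum at a large $n$; the cutoff of 2000 is an arbitrary choice that leaves a negligible tail for these values of $p$.

```python
import numpy as np

# Truncate the infinite sum at a large n; the tail is negligible for these p
n = np.arange(1, 2001)
means = {}
for pi in [0.2, 0.5, 0.8]:
    pmf = pi * (1 - pi)**(n - 1)
    means[pi] = np.sum(n * pmf)  # E[n] = sum over n of n * P(n)
    print('p = {}: numerical mean = {:.4f}, 1/p = {:.4f}'.format(pi, means[pi], 1 / pi))
```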
In [3]:
#3.2
from scipy.special import comb,factorial
N = 25
p = 0.70
mu = N * p
x = np.arange(0, N+1)
plt.plot(x, comb(N, x) * p**x *(1 - p)**(N - x), 'o-', label='binomial')
plt.plot(x, np.exp(-mu) * mu**x / factorial(x), 'o-', label='Poisson')
plt.title('Problem 3.2 - Binomial vs Poisson')
plt.xlabel('$n$')
plt.ylabel('$P(n)$')
plt.legend()
plt.show()
In [4]:
#3.3
from scipy.special import comb,factorial
N = 25
p = 0.10
mu = N * p
x = np.arange(0, N+1)
plt.plot(x, comb(N, x) * p**x *(1 - p)**(N - x), 'o-', label='binomial')
plt.plot(x, np.exp(-mu) * mu**x / factorial(x), 'o-', label='Poisson')
plt.title('Problem 3.3 - Binomial vs Poisson')
plt.xlabel('$n$')
plt.ylabel('$P(n)$')
plt.legend()
plt.show()
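To put a number on "how good is the approximation", one option is the largest pointwise gap between the two pmfs. The sketch below assumes that measure (other choices, such as a summed absolute difference, would work as well) and uses `scipy.stats.binom` and `scipy.stats.poisson` in place of the hand-written formulas; the helper name `max_abs_diff` is made up for this illustration.

```python
import numpy as np
from scipy import stats

def max_abs_diff(N, p):
    """Largest pointwise gap between Binomial(N, p) and Poisson(mu = N * p) on 0..N."""
    x = np.arange(0, N + 1)
    binom_pmf = stats.binom.pmf(x, N, p)
    poisson_pmf = stats.poisson.pmf(x, N * p)
    return np.max(np.abs(binom_pmf - poisson_pmf))

diff_hi = max_abs_diff(25, 0.70)  # parameters from 3.2
diff_lo = max_abs_diff(25, 0.10)  # parameters from 3.3
print('p = 0.70: max difference = {:.4f}'.format(diff_hi))
print('p = 0.10: max difference = {:.4f}'.format(diff_lo))
```

The gap is noticeably smaller for $p = 0.10$, consistent with the Poisson being an approximation for small $p$.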
In [5]:
#3.4
L = 1 / 4
t = np.linspace(0,7,100)
tsmall = np.linspace(0,5,100)
plt.plot(t, L * np.exp(-L * t))
plt.fill_between(tsmall, 0, L * np.exp(-L * tsmall))
plt.axvline(x=5)
plt.title('Problem 3.4 - Exponential')
plt.xlabel('$t$')
plt.ylabel('$P(t)$')
plt.show()
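The shaded area in 3.4 is the cumulative probability $P(t \leq 5)$. A short sketch of how that value could be computed three ways is given below: the closed form $1 - e^{-\lambda t}$, `scipy.stats.expon.cdf` (which takes `scale = 1 / lambda`), and a trapezoidal sum over the same pdf that was plotted. The grid size of 1001 points is an arbitrary choice.

```python
import numpy as np
from scipy import stats

L = 0.25
t_max = 5

# Closed form of the exponential CDF: 1 - exp(-lambda * t)
analytic = 1 - np.exp(-L * t_max)

# scipy parameterizes the exponential by scale = 1 / lambda
from_scipy = stats.expon.cdf(t_max, scale=1 / L)

# Trapezoidal integration of the pdf on a fine grid, as a numerical cross-check
t = np.linspace(0, t_max, 1001)
pdf = L * np.exp(-L * t)
numeric = np.sum((pdf[1:] + pdf[:-1]) / 2 * np.diff(t))

print('analytic = {:.4f}, scipy = {:.4f}, numeric = {:.4f}'.format(analytic, from_scipy, numeric))
```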
[1 point] "The 95% prediction interval for a geometric probability distribution" can be described with what mathematical equation? Answer as a $\LaTeX$ equation.
[6 points] Using a for loop, compute a lower (starting at 0) 90% prediction interval for the binomial distribution with $N = 12, p = 0.3$.
[6 points] Using a for loop, compute an upper (ending at N) 95% prediction interval for the binomial distribution with $N = 20, p = 0.6$.
[6 points] Using a for loop, compute an 80% prediction interval for the geometric distribution for $p = 0.02$. Just pick a large number for the upper bound of the for loop.
[12 Extra Credit Points] Repeat 4.3 using a while loop.
In [6]:
#4.2
N = 12
p = 0.3
psum = 0
for ni in range(0, N+1):
    # accumulate the binomial pmf until we cover 90%
    psum += comb(N, ni) * p**ni * (1 - p)**(N - ni)
    if psum >= 0.9:
        break
print('Interval is [0, {}]'.format(ni))
In [7]:
#4.3
N = 20
p = 0.6
psum = 0
#reverse the range so we count down from the top
for ni in range(N, -1, -1):
    # start at N (not N + 1) and accumulate the pmf from the top
    psum += comb(N, ni) * p**ni * (1 - p)**(N - ni)
    if psum >= 0.95:
        break
print('Interval is [{}, N]'.format(ni))
In [1]:
#4.4
p = 0.02
psum = 0
for ni in range(1, 500):
    # accumulate the geometric pmf until we cover 80%
    psum += p * (1 - p) ** (ni - 1)
    if psum >= 0.8:
        break
print('Interval is [1, {}]'.format(ni))
In [15]:
#4.5
N = 20
p = 0.6
psum = 0
# count down
ni = N
while psum < 0.95:
    psum += comb(N, ni) * p**ni * (1 - p)**(N - ni)
    ni -= 1
#add 1, since when we broke we had just subtracted 1
print('Interval is [{}, N]'.format(ni + 1))
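The loop results can be cross-checked against the inverse CDF in `scipy.stats`. This is a sketch rather than the assigned method (the problems ask for explicit loops): `ppf(q, ...)` returns the smallest value whose cumulative probability is at least `q`. For the upper interval the sketch counts failures instead of successes, which is an equivalent reformulation chosen here for convenience.

```python
from scipy import stats

# 4.2: smallest x with P(n <= x) >= 0.90 for Binomial(N=12, p=0.3)
lower_end = int(stats.binom.ppf(0.90, 12, 0.3))
print('4.2 interval is [0, {}]'.format(lower_end))

# 4.3: count failures instead of successes. If k is the smallest value with
# P(failures <= k) >= 0.95, then P(successes >= N - k) >= 0.95.
N, p = 20, 0.6
upper_start = N - int(stats.binom.ppf(0.95, N, 1 - p))
print('4.3 interval is [{}, N]'.format(upper_start))

# 4.4: smallest x with P(n <= x) >= 0.80 for Geometric(p=0.02)
geom_end = int(stats.geom.ppf(0.80, 0.02))
print('4.4 interval is [1, {}]'.format(geom_end))
```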
Use scipy.stats here as needed. Except for 5.1 and 5.3, answer in Python.
scipy.stats to answer this one.
[2 points] Given that $\mu = 2$, $\sigma = 1.2$, what is the probability of observing a sample between -2 and 0? Answer using a $Z$-score.
[2 points] Given that $\mu = 2$, $\sigma = 1.2$, what is the probability of observing a sample between -2 and 0? Answer without using a $Z$-score.
In [10]:
#5.2
import scipy.stats as ss
print(ss.norm.cdf(-2))
In [11]:
#5.4
# Z = (x - mu) / sigma with mu = 2, sigma = 1.2
zlo = (-2 - 2) / 1.2
zhi = (0 - 2) / 1.2
print(ss.norm.cdf(zhi) - ss.norm.cdf(zlo))
In [12]:
#5.5
print(ss.norm.cdf(0, loc=2, scale=1.2) - ss.norm.cdf(-2, loc=2, scale=1.2))
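Standardizing with $Z = (x - \mu) / \sigma$ and passing `loc` and `scale` to `scipy.stats.norm.cdf` should give the same probability. The sketch below compares the two for $\mu = 2$, $\sigma = 1.2$ and adds an independent check through the error function, $\Phi(z) = \frac{1}{2}\left[1 + \operatorname{erf}(z / \sqrt{2})\right]$; the helper `phi` is a name introduced only for this illustration.

```python
import math
from scipy import stats

mu, sigma = 2, 1.2
lo, hi = -2, 0

# Standardize the bounds: z = (x - mu) / sigma
z_lo = (lo - mu) / sigma
z_hi = (hi - mu) / sigma
p_z = stats.norm.cdf(z_hi) - stats.norm.cdf(z_lo)

# Same probability, letting scipy handle mu and sigma
p_direct = stats.norm.cdf(hi, loc=mu, scale=sigma) - stats.norm.cdf(lo, loc=mu, scale=sigma)

def phi(z):
    """Standard normal CDF written with the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

p_erf = phi(z_hi) - phi(z_lo)
print('z-score: {:.4f}, direct: {:.4f}, erf: {:.4f}'.format(p_z, p_direct, p_erf))
```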