Homework 6

CHE 116: Numerical Methods and Statistics

2/22/2018


1. Review Questions (10 Points)

  1. [1 point] A probability mass function must give a positive number for each element in the sample space and $\underline{\hspace{0.5in}}$?

  2. [1 point] Which of these are invalid sample spaces and which are valid: $\{1,3,-2\}$, $\{A, B\}$, $\{\textrm{Ace of hearts}, \textrm{king of diamonds}\}$, all real numbers.

  3. [1 point] What rule allows me to rewrite $P(x \,|\,y)P(y)$ as $P(x, y)$?

  4. [2 points] If there is a 10% chance of rain for 3 days in a row, what's the probability of there being rain at least once within those days?

  5. [2 points] Harry says that expected value is like an average, so you can compute it two ways: $ E[X] = \sum_i^N \frac{x_i}{N} $ and the way we learned in class: $E[X] = \sum_i P(x_i) \cdot x_i$. Is Harry correct, or is there an issue with his logic?

  6. [1 point] How many elements will I have in my list if I create it using list(range(5,8))?

  7. [2 points] In the binomial distribution, we only consider the number of successes. Let's try considering each permutation as unique. For example, if $N = 3$ and $n = 1$, you could have $100$, $010$, and $001$. If $N = 10$, how many unique permutations are possible across all numbers of successes? Review your HW 5, questions 1.2-1.5.

1.1

sum to 1

1.2

all are valid

1.3

The definition of conditional probability

1.4

Binomial with $p=0.1,\,N=3$. We are asked for $1 - P(n = 0)$. The binomial coefficient is 1 for $n = 0$, so the answer is $1 - (1 - 0.1)^{3} = 0.271$.
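
As a quick numerical check of 1.4 (a sketch, assuming `scipy` is available; the three days are treated as independent, which is the same assumption the binomial model makes), the complement rule and the binomial PMF give the same value:

```python
from scipy.stats import binom

p, N = 0.1, 3
complement = 1 - (1 - p) ** N        # closed form: 1 - P(no rain on any day)
via_pmf = 1 - binom.pmf(0, N, p)     # same quantity from the binomial PMF at n = 0
print(round(complement, 3), round(via_pmf, 3))  # 0.271 0.271
```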

1.5

Expected value is only conceptually like an average. Harry's first expression is a sample mean: it requires a set of observed data $x_1, \ldots, x_N$, which we do not have. We only have the elements of a sample space and their probabilities, so only the second equation applies. The law of large numbers connects the two, but only in the limit of a large amount of data.
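
A small simulation sketch illustrates the law-of-large-numbers connection. The fair six-sided die here is a hypothetical example, not part of the question: the sample mean (Harry's first formula) drifts toward the expected value (the second formula) as the number of samples grows.

```python
import numpy as np

# Hypothetical example: a fair six-sided die
outcomes = np.arange(1, 7)
probs = np.full(6, 1 / 6)

# Expected value from the sample space: E[X] = sum_x P(x) * x
expected = np.sum(probs * outcomes)

# Sample mean from simulated data: (1/N) * sum_i x_i
rng = np.random.default_rng(0)
for n_samples in [10, 1000, 100000]:
    rolls = rng.choice(outcomes, size=n_samples, p=probs)
    print(n_samples, rolls.mean(), expected)
```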

1.6

3

1.7

$$ 2^{10} = 1024 $$
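
The count in 1.7 can be checked by summing binomial coefficients over every number of successes, since each coefficient counts the sequences with exactly $n$ successes. This sketch assumes Python 3.8+ for `math.comb`:

```python
from math import comb

N = 10
# Sequences with exactly n successes, summed over n = 0, 1, ..., N
total = sum(comb(N, n) for n in range(N + 1))
print(total, 2 ** N)  # 1024 1024
```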

2. Marginal Probability Review (19 Points)

You are a baby being carried by a stork to your parents. Your parents live in one of:

  1. USA (u, 320)
  2. China (c, 1300)
  3. Germany (g, 80)

The probability of your birth location is proportional to the populations. As a baby, you are concerned with your career options, which are

  1. Rock star (r)
  2. Professor (p)
  3. Doctor (d)

Answer the following using $B$ as the random variable for birthplace and $J$ as the random variable for job. We have the following information:

$$P(J = r \,|\, B = c) = 0.05$$
$$P(J = d \,|\, B = c) = 0.5$$
$$P(J = r \,|\, B = u) = 0.8$$
$$P(J = p \,|\, B = u) = 0.01$$
$$P(J = p \,|\, B = g) = 0.75$$
$$P(J = d \,|\, B = g) = 0.2$$

  1. [2 points] Write out the missing conditionals and marginal probabilities.
  2. [4 points] What is the probability that you will be a professor?
  3. [3 points] What is the probability that you will be a rock star born in China?
  4. [2 points] You were born in Germany. What's the probability of becoming a doctor?
  5. [4 points] Consider the random variable $Z$, which indicates whether you are a doctor or rock star (true for $J=d$ and $J=r$). What is $P(Z = 1 \,|\, B=u)$?
  6. [4 points] What is $P(B=g \,|\, Z = 0)$? Find a way to reuse the calculation you did in 2.2 to help.

2.1

$$P(J = p \,|\, B = c) = 0.45$$
$$P(J = d \,|\, B = u) = 0.19$$
$$P(J = r \,|\, B = g) = 0.05$$
$$P(B = c) = \frac{1300}{1700} \approx 0.76$$
$$P(B = u) = \frac{320}{1700} \approx 0.19$$
$$P(B = g) = \frac{80}{1700} \approx 0.05$$
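
The bookkeeping in 2.1 can be sketched in Python. The dictionary layout is just one illustrative choice; each missing conditional follows from $\sum_j P(J = j \,|\, B = b) = 1$, and the marginals are the populations divided by their total:

```python
# Given conditionals P(J = j | B = b) from the problem statement
given = {
    'c': {'r': 0.05, 'd': 0.5},
    'u': {'r': 0.8, 'p': 0.01},
    'g': {'p': 0.75, 'd': 0.2},
}
jobs = ['r', 'p', 'd']

# Each conditional distribution sums to 1, so the missing entry is 1 minus the known ones
cond = {}
for b, known in given.items():
    missing = [j for j in jobs if j not in known][0]
    cond[b] = dict(known)
    cond[b][missing] = 1 - sum(known.values())

# Birthplace probabilities are proportional to the listed populations
pop = {'u': 320, 'c': 1300, 'g': 80}
total = sum(pop.values())
P_B = {b: n / total for b, n in pop.items()}
print(cond)
print(P_B)
```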

2.2

$$ P(J = p) = \sum_b P(J = p \,|\, B = b) P(B = b) = 0.45 \times 0.76 + 0.01 \times 0.19 + 0.75 \times 0.05 \approx 0.38 $$
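
The marginal in 2.2 is the law of total probability. A sketch using the unrounded birthplace marginals (so the professor probability comes out near 0.381 rather than exactly 0.38):

```python
# Completed conditionals P(J = j | B = b) from 2.1
cond = {
    'c': {'r': 0.05, 'p': 0.45, 'd': 0.5},
    'u': {'r': 0.8, 'p': 0.01, 'd': 0.19},
    'g': {'r': 0.05, 'p': 0.75, 'd': 0.2},
}
pop = {'u': 320, 'c': 1300, 'g': 80}
P_B = {b: n / sum(pop.values()) for b, n in pop.items()}

# Law of total probability: P(J = j) = sum_b P(J = j | B = b) P(B = b)
P_J = {j: sum(cond[b][j] * P_B[b] for b in P_B) for j in ['r', 'p', 'd']}
print(P_J)
```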

2.3

$$ P(J = r, B = c) = P(J = r\, |\, B = c) P(B = c) = 0.05\times 0.76 = 0.038 $$

2.4

$$ P(J = d\, |\, B = g) = 0.2 $$

2.5

$$ P(Z = 1 \,|\, B = u) = P(J = d \,|\, B = u) + P(J = r \,|\, B = u) = 1 - P(J = p \,|\, B = u) = 0.99 $$
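
Because $J = d$ and $J = r$ are disjoint outcomes, their conditional probabilities add. A minimal check with the values for $B = u$:

```python
# P(J = j | B = u): r and p from the problem statement, d from 2.1
cond_u = {'r': 0.8, 'p': 0.01, 'd': 0.19}

# Z = 1 for J = d or J = r; disjoint events, so the probabilities add
p_z1 = cond_u['d'] + cond_u['r']
p_z1_complement = 1 - cond_u['p']   # same event computed via the complement
print(p_z1, p_z1_complement)
```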

2.6

Since $Z = 0$ exactly when $J = p$:

$$ P(B = g \,|\, Z = 0) = \frac{P(Z = 0 \,|\, B = g) P(B = g)}{P(Z = 0)} = \frac{P(J = p \,|\, B = g) P(B = g)}{P(J = p)} $$
$$ P(B = g \,|\, Z = 0) = \frac{0.75 \times 0.05}{0.38} \approx 0.1 $$
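
A sketch of the Bayes computation in 2.6, reusing the total-probability sum from 2.2 as the denominator. With unrounded marginals the result is about 0.093; the value of about 0.1 above comes from rounding $P(B = g)$ to 0.05.

```python
# P(J = p | B = b); Z = 0 means J = p
cond_p = {'c': 0.45, 'u': 0.01, 'g': 0.75}
pop = {'u': 320, 'c': 1300, 'g': 80}
P_B = {b: n / sum(pop.values()) for b, n in pop.items()}

# Denominator: P(Z = 0) = P(J = p), the marginal from 2.2
P_Z0 = sum(cond_p[b] * P_B[b] for b in P_B)

# Bayes' theorem: P(B = g | Z = 0) = P(Z = 0 | B = g) P(B = g) / P(Z = 0)
posterior_g = cond_p['g'] * P_B['g'] / P_Z0
print(round(posterior_g, 3))
```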

3. Plotting Probability Distributions (18 Points)

Label your axes, add a title, and use LaTeX in your labels when necessary. Use dots connected by lines for discrete and lines for continuous.

  1. [6 points] Plot the geometric distribution for three different parameters: $p = 0.2, p = 0.5, p = 0.8$. Add vertical lines at their means. Extra credit: make the three lines using a for loop.

  2. [4 points] Plot the binomial distribution for $N = 25, p = 0.7$. Recall that the Poisson is an approximation to the Binomial. Plot the Poisson approximation to this Binomial distribution on the same plot.

  3. [2 points] Make a second plot with the binomial and Poisson, but use $N = 25, p = 0.10$. How good is the approximation?

  4. [6 points] The command plt.fill_between can be used to plot an area under a curve. For example, fill_between(x, 0, y) will fill the area between 0 and y, where y could be a numpy array. Using fill_between, show the cumulative probability function for the exponential distribution from $t = 0$ to $t = 5$ with $\lambda = 0.25$. Ensure that there are two lines on your plot: one that is the exponential pdf and one that is the fill_between. The pdf should extend further than your fill_between line. Add a vertical line at $t=5$. No legend necessary.


In [3]:
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np

In [4]:
#3.1

p = [0.2, 0.5, 0.8]
n = np.arange(1, 8)
for i, pi in enumerate(p):
    plt.plot(n, pi * (1 - pi)**(n - 1), 'o-', label='$p={}$'.format(pi), color='C{}'.format(i))
    plt.axvline(x = 1/ pi, color='C{}'.format(i))
    
plt.title('Problem 3.1 - Geometric')
plt.xlabel('$n$')
plt.ylabel('$P(n)$')
plt.legend()

plt.show()
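
The vertical lines sit at $1/p$ because that is the mean of a geometric distribution on $n = 1, 2, \ldots$. A sketch that checks this numerically by truncating the infinite sum (the cutoff of 2000 terms is an arbitrary choice that makes the omitted tail negligible for these $p$):

```python
import numpy as np

# Truncate the infinite sum; for these p the omitted tail is negligible
n = np.arange(1, 2000)
means = {}
for p in [0.2, 0.5, 0.8]:
    pmf = p * (1 - p) ** (n - 1)   # geometric PMF on n = 1, 2, ...
    means[p] = np.sum(n * pmf)     # E[n] = sum_n n P(n)
    print(p, means[p], 1 / p)
```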



In [3]:
#3.2
from scipy.special import comb,factorial
N = 25
p = 0.70
mu = N * p
x = np.arange(0, N+1)
plt.plot(x, comb(N, x) * p**x *(1 - p)**(N - x), 'o-', label='binomial')
plt.plot(x, np.exp(-mu) * mu**x / factorial(x), 'o-', label='Poisson')

plt.title('Problem 3.2 - Binomial vs Poisson')
plt.xlabel('$n$')
plt.ylabel('$P(n)$')

plt.legend()
plt.show()



In [4]:
#3.3
from scipy.special import comb,factorial
N = 25
p = 0.10
mu = N * p
x = np.arange(0, N+1)
plt.plot(x, comb(N, x) * p**x *(1 - p)**(N - x), 'o-', label='binomial')
plt.plot(x, np.exp(-mu) * mu**x / factorial(x), 'o-', label='Poisson')

plt.title('Problem 3.3 - Binomial vs Poisson')
plt.xlabel('$n$')
plt.ylabel('$P(n)$')

plt.legend()
plt.show()
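
One way to answer "how good is the approximation?" quantitatively is to compare the largest pointwise gap between the two PMFs. This sketch assumes `scipy.stats.binom` and `scipy.stats.poisson` and only compares $n = 0, \ldots, N$. The Poisson has variance $Np$ while the binomial has $Np(1-p)$, so the match should be much closer for $p = 0.10$ than for $p = 0.7$:

```python
import numpy as np
from scipy.stats import binom, poisson

def max_pmf_gap(N, p):
    """Largest absolute difference between Binomial(N, p) and Poisson(N p) PMFs on 0..N."""
    x = np.arange(0, N + 1)
    return np.max(np.abs(binom.pmf(x, N, p) - poisson.pmf(x, N * p)))

for p in [0.7, 0.10]:
    print(p, max_pmf_gap(25, p))
```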



In [5]:
#3.4

L = 1 / 4
t = np.linspace(0,7,100)
tsmall = np.linspace(0,5,100)
plt.plot(t, L * np.exp(-L * t))
plt.fill_between(tsmall, 0, L * np.exp(-L * tsmall))
plt.axvline(x=5)

plt.title('Problem 3.4 - Exponential')
plt.xlabel('$t$')
plt.ylabel('$P(t)$')

plt.show()
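
The shaded area is the exponential CDF at $t = 5$, which has the closed form $1 - e^{-\lambda t}$. A sketch comparing that closed form, `scipy.stats.expon` (parameterized by `scale` $= 1/\lambda$), and numerical integration of the pdf with `scipy.integrate.quad`:

```python
import numpy as np
from scipy.stats import expon
from scipy.integrate import quad

lam = 0.25
t_end = 5
# Closed-form exponential CDF: 1 - exp(-lambda t)
closed_form = 1 - np.exp(-lam * t_end)
# scipy's exponential uses scale = 1 / lambda
via_scipy = expon.cdf(t_end, scale=1 / lam)
# Numerically integrate the pdf over the shaded interval [0, 5]
via_quad, _ = quad(lambda t: lam * np.exp(-lam * t), 0, t_end)
print(closed_form, via_scipy, via_quad)
```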


4. Prediction Intervals and Loops (19 Points + 12 EC)

  1. [1 point] "The 95% prediction interval for a geometric probability distribution" can be described with what mathematical equation? Answer as a $\LaTeX$ equation.

  2. [6 points] Using a for loop, compute a lower (starting at 0) 90% prediction interval for the binomial distribution with $N = 12, p = 0.3$.

  3. [6 points] Using a for loop, compute an upper (ending at N) 95% prediction interval for the binomial distribution with $N = 20, p = 0.6$.

  4. [6 points] Using a for loop, compute an 80% prediction interval for the geometric distribution for $p = 0.02$. Just pick a large number for the upper bound of the for loop.

  5. [12 Extra Credit Points]. Repeat 4.3 using a while loop.

4.1

$$ P(1 \leq n \leq x) = \sum_{n=1}^{x} p(1-p)^{n-1} = 0.95 $$

In [6]:
#4.2
N = 12
p = 0.3
psum = 0
for ni in range(0, N+1):
    psum += comb(N, ni) * p**ni * (1 - p)**(N - ni)
    if psum >= 0.9:
        break
print('Interval is [0, {}]'.format(ni))


Interval is [0, 6]
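
The loop's stopping rule (smallest $n$ with cumulative probability at least 0.9) is what `scipy.stats.binom.ppf` computes, so it can serve as a cross-check. A sketch:

```python
from scipy.stats import binom

N, p = 12, 0.3
# ppf(q) returns the smallest n with P(X <= n) >= q,
# the same stopping rule as the loop above
upper = binom.ppf(0.9, N, p)
print(upper, binom.cdf(upper - 1, N, p), binom.cdf(upper, N, p))
```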

In [7]:
#4.3
N = 20
p = 0.6
psum = 0
#reverse the range so we count down from the top
for ni in range(N, -1, -1):
    psum += comb(N, ni) * p**ni * (1 - p)**(N - ni)
    if psum >= 0.95:
        break
print('Interval is [{}, N]'.format(ni))


Interval is [8, N]
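
The count-down loop in 4.3 can also be written with a reversed cumulative sum of the PMF, which gives every upper-tail probability $P(n \geq k)$ at once. A sketch assuming `numpy` and `scipy.stats`:

```python
import numpy as np
from scipy.stats import binom

N, p = 20, 0.6
k = np.arange(0, N + 1)
pmf = binom.pmf(k, N, p)
# upper_tail[k] = P(n >= k): reverse, accumulate, reverse back
upper_tail = np.cumsum(pmf[::-1])[::-1]
# Upper tails shrink as k grows, so the loop's stopping point is the
# largest k whose upper tail still reaches 0.95
lower = k[upper_tail >= 0.95].max()
print('Interval is [{}, {}]'.format(lower, N), upper_tail[lower])
```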

In [1]:
#4.4

p = 0.02
psum = 0
for ni in range(1, 500):
    psum += p * (1 - p) ** (ni - 1)
    if psum >= 0.8:
        break

print('Interval is [1, {}]'.format(ni))


Interval is [1, 80]
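
For the geometric distribution the loop has a closed-form counterpart: the CDF is $P(n \leq x) = 1 - (1-p)^x$, so the bound is the smallest integer with $(1-p)^x \leq 0.2$. A sketch that checks this against `scipy.stats.geom` (whose support also starts at 1):

```python
import math
from scipy.stats import geom

p = 0.02
# Solve 1 - (1 - p)^x >= 0.8 for the smallest integer x
x_closed = math.ceil(math.log(1 - 0.8) / math.log(1 - p))
# scipy's geom is defined on n = 1, 2, ..., matching the loop
x_scipy = geom.ppf(0.8, p)
print(x_closed, x_scipy)
```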

In [15]:
#4.5
N = 20
p = 0.6
psum = 0
# count down
ni = N
while psum < 0.95:
    psum += comb(N, ni) * p**ni * (1 - p)**(N - ni)
    ni -= 1
#add 1, since when we broke we had just subtracted 1
print('Interval is [{}, N]'.format(ni + 1))


Interval is [8, N]

5. Normal Distribution (8 Points)

Use scipy.stats here as needed. Except for 5.1 and 5.3, answer in Python.

  1. [2 points] In the $Z$-score equation $ Z = (x - \mu) / \sigma$, what is $x$?
  2. [1 point] What is $P(X < -2)$ for a standard normal distribution?
  3. [1 point] What is $P(X > 2)$ for a standard normal distribution? Use your knowledge of probability expression, not scipy.stats to answer this one.
  4. [2 points] Given that $\mu = 2$, $\sigma = 1.2$, what is the probability of observing a sample between -2 and 0? Answer using a $Z$-score.

  5. [2 points] Given that $\mu = 2$, $\sigma = 1.2$, what is the probability of observing a sample between -2 and 0? Answer without using a $Z$-score.

5.1

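$x$ is the value of the random variable being converted to a $Z$-score; in practice, it is a bound of the integral of the probability density we want to evaluate.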
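
To make the integral-bound reading concrete, a sketch assuming `scipy`: the area under $N(\mu = 2, \sigma = 1.2)$ up to $x = 0$ equals the area under the standard normal up to $Z = (x - \mu)/\sigma$. The specific $x = 0$ is only an illustrative choice taken from 5.4.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

mu, sigma = 2, 1.2
x = 0.0                   # the integral bound in the original units
z = (x - mu) / sigma      # the same bound expressed as a Z-score

# Area under N(mu, sigma) from -inf up to x
area_x, _ = quad(lambda t: norm.pdf(t, loc=mu, scale=sigma), -np.inf, x)
# Area under the standard normal from -inf up to z
area_z, _ = quad(norm.pdf, -np.inf, z)
print(z, area_x, area_z, norm.cdf(z))
```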


In [10]:
#5.2

import scipy.stats as ss

print(ss.norm.cdf(-2))


0.0227501319482

5.3

The standard normal is symmetric about 0, so

$$ P(X > 2) = P(X < -2) \approx 0.023 $$
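
A quick `scipy.stats` cross-check of the symmetry argument (the question asks for a hand answer, so this is only a check). `norm.sf` is the upper-tail probability, $1 - \text{cdf}$:

```python
from scipy.stats import norm

upper_2 = norm.sf(2)             # P(X > 2)
lower_neg2 = norm.cdf(-2)        # P(X < -2), equal to P(X > 2) by symmetry
complement = 1 - norm.cdf(-2)    # P(X > -2), which equals P(X < 2)
print(upper_2, lower_neg2, complement)
```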


In [11]:
#5.4

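mu, sigma = 2, 1.2
# Convert both bounds to Z-scores using the given mean and standard deviation
zlo = (-2 - mu) / sigma
zhi = (0 - mu) / sigma
print(ss.norm.cdf(zhi) - ss.norm.cdf(zlo))


0.0473612919396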

In [12]:
#5.5
print(ss.norm.cdf(0, loc=2, scale=1.2) - ss.norm.cdf(-2, loc=2, scale=1.2))


0.0473612919396