Homework 6

CHE 116: Numerical Methods and Statistics

Version 1.0 (2/17/2016)


1. Misc Distribution Problems (10 Points)

Answer symbolically first, indicating what equations your Python program is using, and then compute the answer in Python. If not specified, say which distribution you're assuming.

  1. [1] The time between traffic tickets is exponentially distributed. Based on past experience, you receive a traffic ticket about every 3 years. What's the probability of having a ticket within 12 months? For 2 bonus points, what's the probability of having 2 tickets within 12 months? Use scipy stats.

  2. [2] You see two deer per day. How many days must pass before you have a 99% of having seen a deer? Answer in days, hours, and minutes.

  3. [1] The expected score on a test is 90% with a standard deviation of 15%. You cannot receive more than 100% on this test. What's the probability failing (< 60%)?

  4. [2] Using the above parameters, what's the probability of getting an A (93%-100%)?

  5. [4] Using the definition of expected value, write a for loop that computes the expected value of a binomial distribution with $N = 10$ and $p = 0.3$. Do not use scipy stats. Compre with the fomula $E[x] = pN$ for binomial.

1.1 Answer

One ticket: We are being asked:

$$\int_0^{12} \lambda e^{-\lambda t}\, dt$$

where $\lambda = \frac{1}{36}$, based on the prompt. The answer is $0.28$

Two tickets : We can view this as kind of a binomial distribution, where each month we can get a traffic ticket. That would give these parameters:

$$N = 12$$$$ p = \frac{1}{36}$$$$P(x = 2) = \binom{x}{N} p^x \,\left(1 - p\right)^{N - x} = 0.03$$

However, of course it's possible to get two tickets in a month. SO, let's try breaking it down by day:

$$N = 365$$$$ p = \frac{1}{3\times365}$$$$P(x = 2) = \binom{x}{N} p^x \,\left(1 - p\right)^{N - x} = 0.0398$$

Ah, we see that it's approximately the same. To go to the extreme, we can use a Poisson distribution, where $\mu = \frac{1}{3}$. That gives $0.0398$, which is the same. To make sure we're sane, we can check if the two distributions are consistent. This Poisson distribution gives 0.24 (instead of 0.28) for a single ticket. Sort of close.


In [14]:
from scipy import stats as ss

print(ss.expon.cdf(12, scale=36))
print(ss.binom.pmf(2, p=1 / 36, n=12))
print(ss.binom.pmf(2, p=1 / (3 * 365), n=365))
print(ss.poisson.pmf(2, mu=1 / 3))
print(ss.poisson.pmf(1, mu=1 / 3))


0.283468689426
0.0384232741801
0.0397647849595
0.0398072950319
0.238843770191

In [3]:
ss.binom?

1.2 Answer

This is an exponential distribution with $\lambda = \frac{2}{24 \times 60}$. We are being aksed

$$P(t > a) = 0.99$$

which is 2 days, 7 hours, and 16 minutes.


In [14]:
result = ss.expon.ppf(0.99, scale = 24 * 60 / 2)
days = int(result / 24 / 60)
hours = int((result / 60 - days * 24))
minutes = (result - days * 24 * 60 - hours * 60)
print(days, hours, minutes)


2 7 15.7225339114

1.3 Answer

We are being asked:

$$\int_{-\infty}^{60} \cal{N}(90, 12)$$

which is 2.3%


In [17]:
ss.norm.cdf(60, scale=15, loc=90)


Out[17]:
0.022750131948179195

1.4 Answer

We are being asked:

$$\int_{93}^{\infty} \cal{N}(90, 12)$$

The reason it goes to infinity is that many who got 100%, would have gotten higher scores had the test allowed it. So they would be greater than 100% if the test scores truly followed a normal distribution. The answer is 42%.


In [18]:
1 - ss.norm.cdf(93, scale=15, loc=90)


Out[18]:
0.42074029056089701

1.7 Answer

The definition of expected value is:

$$E[x] = \sum_0^N x P(x)$$

where $P(x)$ is the binomial distribution for us


In [28]:
from scipy.special import comb

sum = 0
N = 10
p = 0.3

for i in range(0,N+1):
    sum += i * comb(N, i) * p**i * (1 - p)**(N - i)
print(sum, p * N)


3.0 3.0

2. CLT Theory (4 Points)

Indicate if the CLT applies with yes or no. If no, state why.

  1. You measure the density of a solution 50 times and take the average
  2. You sum the number of students who attended the last 20 lectures
  3. Flip a coin 25 times and consider a heads 0 and a tails 1 and take the average
  4. Your final grade, which is the average of 25 homeworks, tests, and a project

2.1 Answer

CLT applies. Enough samples and you took an average

2.2 Answer

CLT applies. It's proportional to an average (you didn't divide by 20)

2.4 Answer

CLT applies

2.5 Answer

CLT applies. You have 27 rvs averaged together

3. Confidence Intervals (12 Points)

Report the given confidence interval for error in the mean using the data in the next cell and describe in words what the confidence interval is for each example

  1. 80% Double
  2. 99% Upper ( a value such that the mean lies above that value 99% of the time)
  3. 95% Double
  4. Redo part 3 with a known standard deviation of 2

In [2]:
data_3_1 = [93.14,94.66, 102.1, 79.98, 96.85, 106.79, 101.92, 91.99, 97.22, 99.1, 88.7, 123.66, 99.7, 115.03, 99.28, 114.59, 102.25, 88.4, 111.06, 75.19, 107.32, 81.21, 100.49, 109.04, 105.09, 96.17, 78.13, 98.37, 104.47, 95.41]
data_3_2 = [2.24,3.86, 2.19, 1.5, 2.34, 2.55, 1.8, 3.99, 2.64, 3.8]
data_3_3 = [53.43,50.49, 52.55, 51.73]

3.1 Answer

This is a normal distribution because $N > 25$. Our interval will run from 10% to 90% to have an area of 80%


In [10]:
import numpy as np
from scipy import stats as ss

sample_mean = np.mean(data_3_1)
sample_std = np.std(data_3_1, ddof=1)
Zlo = ss.norm.ppf(0.1)
Xlo = Zlo * sample_std / np.sqrt(len(data_3_1)) + sample_mean
Xhi = -Zlo * sample_std / np.sqrt(len(data_3_1)) + sample_mean

print(Xlo, 'to', Xhi)


95.9671913504 to 101.18680865

3.2 Answer

This is a t distribution because $N < 25$. Our interval will run from 1% to 100% to have an area of 99%


In [11]:
sample_mean = np.mean(data_3_2)
sample_std = np.std(data_3_2, ddof=1)
Tlo = ss.t.ppf(0.01, len(data_3_2) - 1)
Xlo = Tlo * sample_std /np.sqrt(len(data_3_2)) + sample_mean
print(Xlo)


1.91493832637

3.3 Answer

This is a t distribution because $N < 25$. Our interval will run from 2.5% to 97.5% to have an area of 95%


In [12]:
sample_mean = np.mean(data_3_3)
sample_std = np.std(data_3_3, ddof=1)
Tlo = ss.t.ppf(0.025, len(data_3_3) - 1)
Xlo = Tlo * sample_std / np.sqrt(len(data_3_3)) + sample_mean
Xhi = -Tlo * sample_std / np.sqrt(len(data_3_3)) + sample_mean
print(Xlo, 'to', Xhi)


50.3141851129 to 53.7858148871

3.4 Answer

We can now use a normal distribution because we know the true standard deviation


In [13]:
sample_mean = np.mean(data_3_3)
true_std = 2
Zlo = ss.norm.ppf(0.025)
Xlo = Zlo * true_std / np.sqrt(len(data_3_3)) + sample_mean
Xhi = -Zlo * true_std / np.sqrt(len(data_3_3)) + sample_mean
print(Xlo, 'to', Xhi)


50.0900360155 to 54.0099639845

4. Sample Statistics (17 Points)

Answer the following questions using the data given in the next cell.

  1. [1] What is the sample correlation between $X$ and $Y$?
  2. [5] Write your own method to compute sample covariance using a for loop. Use the code in the second cell below as a starting point. Do not use any numpy methods except to check your answer. Hint: You will need to use two loops
  3. [5 + 1] What is the median of $Y$? Use Python and Numpy. Be careful if you use the sort method, since it will permanently alter your $Y$ array.

In [23]:
X = [1.6,0.4, -1.05, -0.08, 0.99, -1.89, 0.29, 0.71, -0.47, 1.15]
Y = [3.59,1.49, -2.57, -0.0, 2.0, -3.48, 0.14, 1.38, -1.48, 2.6]

In [31]:
for xi, yi in zip(X, Y): 
    print(xi, yi)


1.6 3.59
0.4 1.49
-1.05 -2.57
-0.08 -0.0
0.99 2.0
-1.89 -3.48
0.29 0.14
0.71 1.38
-0.47 -1.48
1.15 2.6

4.1 Answer

$$r_{xy} = -0.16$$

In [24]:
ans = np.corrcoef(X, Y, ddof=1)
print(ans[0,1])


0.985277421286

4.2 Answer


In [36]:
#compute the mean first
xmean = 0
ymean = 0
for xi, yi in zip(X, Y): 
    xmean += xi
    ymean += yi
    
xmean /= len(X)
ymean /= len(Y)

#now compute covariance using our previous calculation.
cov = 0
for xi, yi in zip(X, Y): 
    cov += (xi - xmean) * (yi - ymean)
cov /= len(X) - 1

print(cov)


2.4106833333333326

4.3 Answer

It is an even number, so there is no center element


In [29]:
from math import *

YSort = Y[:]
YSort.sort()
N = len(YSort)
print((YSort[int(N / 2)] + YSort[int(N / 2 - 1)]) / 2)


0.76

In [ ]: