Answer symbolically first, indicating what equations your Python program is using, and then compute the answer in Python. If not specified, say which distribution you're assuming.
[1] The time between traffic tickets is exponentially distributed. Based on past experience, you receive a traffic ticket about every 3 years. What's the probability of having a ticket within 12 months? For 2 bonus points, what's the probability of having 2 tickets within 12 months? Use scipy stats.
[2] You see two deer per day. How many days must pass before you have a 99% chance of having seen a deer? Answer in days, hours, and minutes.
[1] The expected score on a test is 90% with a standard deviation of 15%. You cannot receive more than 100% on this test. What's the probability of failing (< 60%)?
[2] Using the above parameters, what's the probability of getting an A (93%-100%)?
[4] Using the definition of expected value, write a for loop that computes the expected value of a binomial distribution with $N = 10$ and $p = 0.3$. Do not use scipy stats. Compare with the formula $E[x] = pN$ for a binomial distribution.
One ticket: We are being asked:
$$\int_0^{12} \lambda e^{-\lambda t}\, dt$$
where $\lambda = \frac{1}{36}$ tickets per month (one ticket every 3 years), based on the prompt. The answer is $0.28$.
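Since the problem asks for a symbolic answer first, the integral can be evaluated in closed form. This is the exponential CDF with $t$ measured in months:

```latex
$$\int_0^{12} \lambda e^{-\lambda t}\, dt = \left[-e^{-\lambda t}\right]_0^{12} = 1 - e^{-12\lambda} = 1 - e^{-1/3} \approx 0.28$$
```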
Two tickets: We can approximate this with a binomial distribution in which each month is a trial that may produce one traffic ticket. That gives these parameters:
$$N = 12$$
$$p = \frac{1}{36}$$
$$P(x = 2) = \binom{N}{x} p^x \left(1 - p\right)^{N - x} = 0.038$$
However, it is of course possible to get two tickets in one month. So let's try breaking it down by day:
$$N = 365$$
$$p = \frac{1}{3\times365}$$
$$P(x = 2) = \binom{N}{x} p^x \left(1 - p\right)^{N - x} = 0.0398$$
We see that it's approximately the same. To go to the extreme, we can use a Poisson distribution with $\mu = \frac{1}{3}$. That gives $0.0398$, which matches the daily binomial. As a sanity check, we can test whether the Poisson and exponential distributions are consistent. The Poisson distribution gives $0.24$ for exactly one ticket, while the exponential result of $0.28$ is the probability of at least one ticket. The Poisson equivalent of the latter is $1 - P(x = 0) = 1 - e^{-1/3} = 0.28$, so the two agree.
In [14]:
from scipy import stats as ss
print(ss.expon.cdf(12, scale=36))
print(ss.binom.pmf(2, p=1 / 36, n=12))
print(ss.binom.pmf(2, p=1 / (3 * 365), n=365))
print(ss.poisson.pmf(2, mu=1 / 3))
print(ss.poisson.pmf(1, mu=1 / 3))
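As a cross-check on the sanity test described above, the sketch below compares like with like. The exponential CDF is the probability of at least one ticket in 12 months, so the matching Poisson quantity is $1 - P(x = 0)$ rather than $P(x = 1)$. The variable names are illustrative only.

```python
from scipy import stats as ss

# Probability of at least one ticket in 12 months, computed two ways
p_expon = ss.expon.cdf(12, scale=36)          # waiting time, in months
p_poisson = 1 - ss.poisson.pmf(0, mu=1 / 3)   # ticket count over one year

# Probability of exactly one ticket, which is a different event
p_exactly_one = ss.poisson.pmf(1, mu=1 / 3)

print(p_expon, p_poisson, p_exactly_one)
```

The first two numbers should agree, while the third is smaller because it excludes years with two or more tickets.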
In [14]:
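# Waiting time in minutes with 99% probability; scale is the mean wait (half a day)
result = ss.expon.ppf(0.99, scale=24 * 60 / 2)
# Split the total minutes into whole days, whole hours, and leftover minutes
days = int(result / 24 / 60)
hours = int(result / 60 - days * 24)
minutes = result - days * 24 * 60 - hours * 60
print(days, hours, minutes)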
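The symbolic form behind this cell, assuming exponentially distributed waiting times with rate $\lambda = 2$ per day, is $1 - e^{-\lambda t} = 0.99$, so $t = \ln(100) / \lambda$. A minimal sketch that evaluates this without SciPy and splits the result with `divmod` (all names are illustrative):

```python
import math

rate_per_day = 2                         # two deer per day
t_days = math.log(100) / rate_per_day    # solves 1 - exp(-rate * t) = 0.99
total_minutes = t_days * 24 * 60

# Break the total into whole days, whole hours, and remaining minutes
days, rem = divmod(total_minutes, 24 * 60)
hours, minutes = divmod(rem, 60)
print(int(days), int(hours), minutes)
```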
In [17]:
ss.norm.cdf(60, scale=15, loc=90)
Out[17]:
In [18]:
1 - ss.norm.cdf(93, scale=15, loc=90)
Out[18]:
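The two cells above use an unbounded normal distribution, so a small amount of probability lies above 100%. One possible way to respect the cap, assuming the 90% and 15% describe the normal distribution before truncation, is to renormalize by the probability of scoring at most 100%. This is a sketch of that alternative, not necessarily the intended answer:

```python
from scipy import stats as ss

score = ss.norm(loc=90, scale=15)
p_valid = score.cdf(100)   # probability mass at or below the 100% cap

# Conditional probabilities, given that the score cannot exceed 100%
p_fail = score.cdf(60) / p_valid
p_A = (score.cdf(100) - score.cdf(93)) / p_valid
print(p_fail, p_A)
```

Under this assumption the probability of an A drops noticeably, because the unbounded calculation counts every score above 100% as an A.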
In [28]:
from scipy.special import comb

N = 10
p = 0.3
# E[x] is the sum over x of x * P(x), with P(x) the binomial probability
expected = 0
for i in range(N + 1):
    expected += i * comb(N, i) * p**i * (1 - p)**(N - i)
print(expected, p * N)
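Written symbolically, the loop above evaluates the definition of expected value for a discrete distribution, with the binomial probability as $P(x)$:

```latex
$$E[x] = \sum_{x=0}^{N} x\, P(x) = \sum_{x=0}^{N} x \binom{N}{x} p^x \left(1 - p\right)^{N - x} = pN = 10 \times 0.3 = 3$$
```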
Indicate if the CLT applies with yes or no. If no, state why.
Report the given confidence interval for error in the mean using the data in the next cell, and describe in words what the confidence interval is for each example.
In [2]:
data_3_1 = [93.14, 94.66, 102.1, 79.98, 96.85, 106.79, 101.92, 91.99, 97.22, 99.1, 88.7, 123.66, 99.7, 115.03, 99.28, 114.59, 102.25, 88.4, 111.06, 75.19, 107.32, 81.21, 100.49, 109.04, 105.09, 96.17, 78.13, 98.37, 104.47, 95.41]
data_3_2 = [2.24, 3.86, 2.19, 1.5, 2.34, 2.55, 1.8, 3.99, 2.64, 3.8]
data_3_3 = [53.43, 50.49, 52.55, 51.73]
In [10]:
import numpy as np
from scipy import stats as ss

# 30 samples: use the normal distribution for a two-sided 80% interval
sample_mean = np.mean(data_3_1)
sample_std = np.std(data_3_1, ddof=1)
Zlo = ss.norm.ppf(0.1)
Xlo = Zlo * sample_std / np.sqrt(len(data_3_1)) + sample_mean
Xhi = -Zlo * sample_std / np.sqrt(len(data_3_1)) + sample_mean
print(Xlo, 'to', Xhi)
In [11]:
# 10 samples: use the t-distribution for a one-sided 99% lower bound
sample_mean = np.mean(data_3_2)
sample_std = np.std(data_3_2, ddof=1)
Tlo = ss.t.ppf(0.01, len(data_3_2) - 1)
Xlo = Tlo * sample_std / np.sqrt(len(data_3_2)) + sample_mean
print(Xlo)
In [12]:
# 4 samples: use the t-distribution for a two-sided 95% interval
sample_mean = np.mean(data_3_3)
sample_std = np.std(data_3_3, ddof=1)
Tlo = ss.t.ppf(0.025, len(data_3_3) - 1)
Xlo = Tlo * sample_std / np.sqrt(len(data_3_3)) + sample_mean
Xhi = -Tlo * sample_std / np.sqrt(len(data_3_3)) + sample_mean
print(Xlo, 'to', Xhi)
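As a cross-check, SciPy's `t.interval` method returns the same two-sided bounds in one call. This sketch assumes a 95% interval, matching the `0.025` quantile used above, and passes the confidence level positionally because its keyword name has differed between SciPy versions:

```python
import numpy as np
from scipy import stats as ss

data_3_3 = [53.43, 50.49, 52.55, 51.73]
n = len(data_3_3)
sample_mean = np.mean(data_3_3)
std_error = np.std(data_3_3, ddof=1) / np.sqrt(n)

# Two-sided 95% interval from the t-distribution with n - 1 degrees of freedom
lo, hi = ss.t.interval(0.95, n - 1, loc=sample_mean, scale=std_error)

# The same bounds built from the lower quantile directly
t_lo = ss.t.ppf(0.025, n - 1)
print(lo, 'to', hi)
print(sample_mean + t_lo * std_error, 'to', sample_mean - t_lo * std_error)
```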
In [13]:
# Known true standard deviation: use the normal distribution, two-sided 95% interval
sample_mean = np.mean(data_3_3)
true_std = 2
Zlo = ss.norm.ppf(0.025)
Xlo = Zlo * true_std / np.sqrt(len(data_3_3)) + sample_mean
Xhi = -Zlo * true_std / np.sqrt(len(data_3_3)) + sample_mean
print(Xlo, 'to', Xhi)
Answer the following questions using the data given in the next cell.
In [23]:
X = [1.6, 0.4, -1.05, -0.08, 0.99, -1.89, 0.29, 0.71, -0.47, 1.15]
Y = [3.59, 1.49, -2.57, -0.0, 2.0, -3.48, 0.14, 1.38, -1.48, 2.6]
In [31]:
for xi, yi in zip(X, Y):
    print(xi, yi)
In [24]:
# ddof is deprecated in np.corrcoef and has no effect on the result, so it is omitted
ans = np.corrcoef(X, Y)
print(ans[0, 1])
In [36]:
# compute the means first
xmean = 0
ymean = 0
for xi, yi in zip(X, Y):
    xmean += xi
    ymean += yi
xmean /= len(X)
ymean /= len(Y)
# now compute the sample covariance using the means from above
cov = 0
for xi, yi in zip(X, Y):
    cov += (xi - xmean) * (yi - ymean)
cov /= len(X) - 1
print(cov)
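The hand-computed sample covariance can be checked against NumPy, and dividing it by the two sample standard deviations recovers the correlation coefficient. A short sketch, with the data repeated so the block runs on its own:

```python
import numpy as np

X = [1.6, 0.4, -1.05, -0.08, 0.99, -1.89, 0.29, 0.71, -0.47, 1.15]
Y = [3.59, 1.49, -2.57, -0.0, 2.0, -3.48, 0.14, 1.38, -1.48, 2.6]

# Sample covariance with the N - 1 denominator (off-diagonal matrix entry)
cov_xy = np.cov(X, Y, ddof=1)[0, 1]

# Correlation is the covariance scaled by both sample standard deviations
r_from_cov = cov_xy / (np.std(X, ddof=1) * np.std(Y, ddof=1))
r_numpy = np.corrcoef(X, Y)[0, 1]
print(cov_xy, r_from_cov, r_numpy)
```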
In [29]:
# Median: sort a copy, then average the two middle values (N is even here)
YSort = sorted(Y)
N = len(YSort)
print((YSort[N // 2] + YSort[N // 2 - 1]) / 2)
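The standard library offers a quick check on the hand-rolled median. For an even number of points, `statistics.median` averages the two middle values, which is the same rule used above:

```python
import statistics

Y = [3.59, 1.49, -2.57, -0.0, 2.0, -3.48, 0.14, 1.38, -1.48, 2.6]

# median() sorts internally and averages the two central values when len(Y) is even
median_y = statistics.median(Y)
print(median_y)
```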