Homework 8 Key

CHE 116: Numerical Methods and Statistics

3/22/2018

1. CLT Concepts (8 Points)

If you sum together 20 numbers sampled from a binomial distribution and 10 from a Poisson distribution, how is your sum distributed?
If you sample 25 numbers from different beta distributions, how will each of the numbers be distributed?
Assume a HW grade is determined as the average of 3 HW assignments. How is the HW grade distributed?
You measure the height of 3 people. What distribution will the uncertainty of the mean of the heights follow?

Answers:

1.1 The sum should follow a normal distribution. Since the total sample size is 30, it is large enough for the CLT to apply. Hence the differences between the distributions that we are sampling from do not matter.

1.2 Since we are just sampling 25 numbers without taking either their sum or mean, each of the numbers will reflect the beta distribution that it is sampled from.

1.3 The HW grade is the mean of only 3 HW assignments, which is too few for the CLT alone to guarantee normality. If we assume that the individual HW assignments are normally distributed and that we know the standard deviation of each one, the HW grade will follow a normal distribution. However, if the standard deviation of each HW assignment is not known, the HW grade will follow a t-distribution.

1.4 The uncertainty of the mean of the heights will follow a t-distribution. This is because 3 is a small sample size and we do not know the true standard deviation of the heights.
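The claim in 1.1 can be checked numerically. The following is a minimal simulation sketch, not part of the original key: the binomial parameters (`n=10`, `p=0.3`), the Poisson rate (`lam=2.0`), the seed, and the number of trials are all hypothetical values chosen for illustration. It compares the simulated sums with the mean and variance predicted by adding the contributions of the 30 independent draws.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n_trials = 100_000

# Hypothetical parameters, chosen only for illustration
binom = rng.binomial(n=10, p=0.3, size=(n_trials, 20))  # 20 binomial draws per trial
pois = rng.poisson(lam=2.0, size=(n_trials, 10))        # 10 Poisson draws per trial
sums = binom.sum(axis=1) + pois.sum(axis=1)             # one sum of 30 numbers per trial

# Means and variances of independent draws add
expected_mean = 20 * 10 * 0.3 + 10 * 2.0        # 20 binomial means + 10 Poisson means
expected_var = 20 * 10 * 0.3 * 0.7 + 10 * 2.0   # 20 binomial variances + 10 Poisson variances

# Fraction of sums within one standard deviation of the mean.
# A normal distribution gives roughly 0.68; the discreteness of the sums shifts it slightly.
z = (sums - expected_mean) / np.sqrt(expected_var)
frac_within_1sd = np.mean(np.abs(z) <= 1)

print(sums.mean(), sums.var(ddof=1), frac_within_1sd)
```

A histogram of `sums` should look close to a bell curve, even though neither parent distribution is normal.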

2. Confidence Interval (16 Points)

Report the given confidence interval for error in the mean using the data in the next cell and describe in words what the confidence interval is for each example. 4 points each

80% Double.
99% Upper (a value such that the mean lies above that value 99% of the time)
95% Double
Redo part 3 with a known standard deviation of 2

data_1 = [0.41,2.69,3.82,0.42,1.20]

data_2 = [5.07,2.79,1.24,6.50,3.17,3.59,5.42,4.10,1.26,0.54,1.22,4.43,3.83,0.93,3.45,5.24,3.51,4.64,0.65,3.27,2.41,4.31,4.15,2.24,2.30,3.3]

data_3 = [5.62,2.34,2.76,2.80,1.15,5.19,-0.91]

2.1 Answer

Since $N=5$ and the true standard deviation is not known, we use the t-distribution. We can say with 80% confidence that the true mean lies in the interval $1.7 \pm 1.0$.



In [1]:

    
from scipy import stats as ss
import numpy as np



In [2]:

    
data1 = np.array([0.41,2.69,3.82,0.42,1.20])
CI = 0.80
sample_mean = np.mean(data1)
sample_var = np.var(data1, ddof=1)  # unbiased sample variance
# lower-tail t critical value (negative), hence the sign flip below
T = ss.t.ppf((1 - CI) / 2, df=len(data1)-1)
y = -T * np.sqrt(sample_var / len(data1))  # half-width of the interval

print('{} +/- {}'.format(sample_mean, y))

1.7079999999999997 +/- 1.0300293818335124
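As a cross-check on the manual calculation above, the same interval can be obtained from SciPy's built-in helpers. This is a sketch, not the method used in the key: it assumes `scipy.stats.sem` (which uses `ddof=1` by default) and `scipy.stats.t.interval`, with the confidence level passed positionally.

```python
import numpy as np
from scipy import stats as ss

data1 = np.array([0.41, 2.69, 3.82, 0.42, 1.20])
mean = np.mean(data1)
sem = ss.sem(data1)  # standard error of the mean, sample std / sqrt(N)

# Two-sided interval: confidence level first, then df = N - 1
low, high = ss.t.interval(0.80, len(data1) - 1, loc=mean, scale=sem)
half_width = (high - low) / 2

print('{:.2f} +/- {:.2f}'.format(mean, half_width))
```

The half-width should agree with the value of `y` computed by hand from `ss.t.ppf`.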

2.2 Answer

Since $N=26$, the sample is large enough to use the normal distribution. We can say with 99% confidence that the true mean lies above 2.487 (3.214 - 0.727).



In [3]:

    
data2 = np.array([5.07,2.79,1.24,6.50,3.17,3.59,5.42,4.10,1.26,0.54,1.22,4.43,3.83,0.93,3.45,5.24,3.51,4.64,0.65,3.27,2.41,4.31,4.15,2.24,2.30,3.3])
CI = 0.99
sample_mean = np.mean(data2)
sample_var = np.var(data2, ddof=1)  # unbiased sample variance
# one-sided bound: the entire 1% tail is placed on one side
Z = ss.norm.ppf((1 - CI))  # negative critical value
y = -Z * np.sqrt(sample_var / len(data2))

# the true mean lies above sample_mean - y with 99% confidence
print('{} - {}'.format(sample_mean, y))

3.213846153846154 - 0.727156929961314
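The one-sided bound asked for in part 2 (a value that the mean lies above 99% of the time) can be wrapped in a small helper. This is a sketch under the same normal approximation used above; the function name `lower_bound` is hypothetical and not part of the original key.

```python
import numpy as np
from scipy import stats as ss

def lower_bound(data, confidence):
    """Value that the true mean exceeds with the given confidence (normal approximation)."""
    data = np.asarray(data, dtype=float)
    se = np.std(data, ddof=1) / np.sqrt(len(data))  # standard error of the mean
    z = ss.norm.ppf(confidence)  # one-sided: the whole tail probability sits below the bound
    return np.mean(data) - z * se

data2 = [5.07, 2.79, 1.24, 6.50, 3.17, 3.59, 5.42, 4.10, 1.26, 0.54, 1.22, 4.43, 3.83,
         0.93, 3.45, 5.24, 3.51, 4.64, 0.65, 3.27, 2.41, 4.31, 4.15, 2.24, 2.30, 3.3]
bound = lower_bound(data2, 0.99)

print('true mean > {:.3f} with 99% confidence'.format(bound))
```

Because the bound is one-sided, the critical value comes from `ppf(0.99)` rather than `ppf(0.995)`, which a two-sided 99% interval would use.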

2.3 Answer

Since $N=7$ and the true standard deviation is not known, we use the t-distribution. We can say with 95% confidence that the true mean lies in the interval $2.7 \pm 2.1$.



In [4]:

    
data3 = np.array([5.62,2.34,2.76,2.80,1.15,5.19,-0.91])
CI = 0.95
sample_mean = np.mean(data3)
sample_var = np.var(data3,ddof=1)  # unbiased sample variance
# lower-tail t critical value (negative), hence the sign flip below
T = ss.t.ppf((1 - CI)/2,df=len(data3)-1)
y = -T * np.sqrt(sample_var / len(data3))  # half-width of the interval

print('{} +/- {}'.format(sample_mean, y))

2.707142857142857 +/- 2.0784675474628465

2.4 Answer

Even though we have a small sample size, $N=7$, we use the normal distribution since we know the true standard deviation. We can say with 95% confidence that the true mean lies in the interval $2.7 \pm 1.5$.



In [5]:

    
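data3 = np.array([5.62,2.34,2.76,2.80,1.15,5.19,-0.91])
CI = 0.95
sample_mean = np.mean(data3)
true_var = 2**2  # known standard deviation of 2
# lower-tail normal critical value (negative), hence the sign flip below
Z = ss.norm.ppf((1 - CI)/2)
y = -Z * np.sqrt(true_var / len(data3))  # half-width of the interval

print('{} +/- {}'.format(sample_mean, y))

2.707142857142857 +/- 1.4815935090674932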
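The difference between 2.3 and 2.4 comes from two places: the critical value (t versus normal) and the standard deviation (estimated versus known). The sketch below puts the two side by side for `data3`; the variable names are illustrative and not taken from the key.

```python
import numpy as np
from scipy import stats as ss

data3 = np.array([5.62, 2.34, 2.76, 2.80, 1.15, 5.19, -0.91])
n = len(data3)
CI = 0.95
upper_tail = 1 - (1 - CI) / 2  # 0.975 for a two-sided 95% interval

t_crit = ss.t.ppf(upper_tail, df=n - 1)  # heavier tails, larger critical value
z_crit = ss.norm.ppf(upper_tail)

width_t = t_crit * np.std(data3, ddof=1) / np.sqrt(n)  # sigma estimated from the data
width_z = z_crit * 2 / np.sqrt(n)                      # sigma = 2 taken as known

print('t: {:.3f} (crit {:.3f}), z: {:.3f} (crit {:.3f})'.format(
    width_t, t_crit, width_z, z_crit))
```

With only 6 degrees of freedom the t critical value is noticeably larger than the normal one, which is part of why the interval in 2.3 is wider.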

3. Confidence Intervals (8 Points)

State the distribution and its parameters for each of the following cases. 2 points each.

$P(\mu - \bar{x})$, $\sigma = 2.4$, $N = 4$
$P(\mu)$, $\bar{x} = 11$, $\sigma_x = 3.2$, $N = 11$
$P(\mu)$, $\bar{x} = -3$, $\sigma_x = 2.1$, $N = 35$
$P(\mu)$, $\bar{x} = 6$, $\sigma = 11$, $N = 30$

Answers

3.1 Normal distribution

mean=0, standard deviation=$\sigma/\sqrt N$ = 1.2

3.2 t-distribution

parameters: mean=11, scale=$\sigma_x/\sqrt N$ = 0.965, degrees of freedom=$N-1$ = 10

3.3 Normal distribution

mean=-3, standard deviation=$\sigma_x/\sqrt N$ = 0.355

3.4 Normal distribution

mean=6, standard deviation=$\sigma/\sqrt N$ = 2.008
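The four answers above can be written as frozen `scipy.stats` distributions, which makes it easy to query intervals from them. This is a sketch using the numbers from the problem statements; the names `dist_1` through `dist_4` are illustrative. Note that for `ss.t`, `scale` is the scale parameter and not the standard deviation of the distribution.

```python
import numpy as np
from scipy import stats as ss

dist_1 = ss.norm(loc=0, scale=2.4 / np.sqrt(4))             # P(mu - xbar), sigma known
dist_2 = ss.t(df=11 - 1, loc=11, scale=3.2 / np.sqrt(11))   # sample std, small N
dist_3 = ss.norm(loc=-3, scale=2.1 / np.sqrt(35))           # sample std, large N
dist_4 = ss.norm(loc=6, scale=11 / np.sqrt(30))             # sigma known

for name, dist in [('3.1', dist_1), ('3.2', dist_2), ('3.3', dist_3), ('3.4', dist_4)]:
    low, high = dist.interval(0.95)  # central 95% interval of each distribution
    print('{}: [{:.3f}, {:.3f}]'.format(name, low, high))
```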