0. Warmup

Ungraded

1. Book Problems

6.1

This is an exponential distribution with rate $\lambda = 2$ inverse minutes. The question asks for $1 - \int_0^{0.75} P(t)\,dt$, the probability that $t$ exceeds 0.75 minutes.


In [119]:
#Method 1
import scipy.stats
#documentation says scale = 1 / lambda 
print "Scipy.stats", 1 - scipy.stats.expon.cdf(0.75, scale=1 / 2.)

#Method 2
import numpy as np  # needed for np.exp below
from scipy.integrate import quad
lamb = 2.0
ans, err =  quad(lambda s: lamb * np.exp(-lamb * s ), 0, 0.75)
print "quad integration", 1 - ans


Scipy.stats 0.223130160148
quad integration 0.223130160148
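
Both methods can be cross-checked against the closed form of the exponential survival function, $P(T > t) = e^{-\lambda t}$. The following is a minimal sketch using only the standard library; the variable names are illustrative.

```python
from math import exp

lamb = 2.0  # rate in inverse minutes, as in the cell above
t = 0.75    # upper limit of the integral, in minutes

# Survival function of the exponential distribution: P(T > t) = exp(-lambda * t)
closed_form = exp(-lamb * t)
print("closed form: {:.12f}".format(closed_form))
```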

7.1

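This is a normal distribution with $\mu = 100$ and $\sigma = 16$. Each part integrates the density, which is done through the error function: $P(X < x) = \frac{1}{2}\left[1 + \mathrm{erf}\left(Z / \sqrt{2}\right)\right]$ with $Z = (x - \mu) / \sigma$.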


In [8]:
from math import sqrt, erf

mu = 100.0
sigma = 16.0

#Part a
Z = (90 - mu) / sigma
part_a = 0.5 * (1 + erf(Z /sqrt(2.)))
print 'a) {}'.format(part_a)

#Part b
Z = (130 - mu) / sigma
part_b = 1 - 0.5 * (1 + erf(Z / sqrt(2.)))
print 'b) {}'.format(part_b)

#Part c
Z_lo = (95 - mu) / sigma
Z_hi = (105 - mu) / sigma
part_c = 0.5  * ( erf(Z_hi / sqrt(2)) - erf(Z_lo / sqrt(2)))
print 'c) {}'.format(part_c)


a) 0.265985529049
b) 0.0303963617653
c) 0.24533943694
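
The three parts repeat the same error-function expression, so it may be convenient to wrap it in a helper. This is a sketch only; `normal_cdf` is a hypothetical name, not something from the original solution.

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal distribution, written with the error function."""
    z = (x - mu) / sigma
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, sigma = 100.0, 16.0

p_below_90 = normal_cdf(90, mu, sigma)                             # part a
p_above_130 = 1.0 - normal_cdf(130, mu, sigma)                     # part b
p_95_to_105 = normal_cdf(105, mu, sigma) - normal_cdf(95, mu, sigma)  # part c

print("a) {:.6f}  b) {:.6f}  c) {:.6f}".format(p_below_90, p_above_130, p_95_to_105))
```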

Q15


In [17]:
import numpy as np

data = [22, 26, 34, 26, 24, 20, 28, 24, 24, 26, 26, 28,
        26, 30, 20, 24, 28, 26, 24, 22, 24, 26, 22, 26,
        28, 24, 24, 28, 28, 26, 22, 30, 24]
print 'Mean:', np.mean(data)
print 'Standard Deviation:', sqrt(np.var(data, ddof=1))


Mean: 25.4545454545
Standard Deviation: 2.96954235837

Q17


In [34]:
data = range(1, 8)
print 'Mean:', np.mean(data)
print 'Median:', data[len(data) / 2]


Mean: 4.0
Median: 4

Q25


In [31]:
data = [14, 13, 12, 15, 16, 14, 15, 14, 15, 12,
        13, 15, 15, 13, 13, 15, 15, 17, 14, 16,
        19, 17, 16, 17, 19, 13, 12, 14, 16, 15,
        15, 17, 16, 17, 20, 16, 15, 15, 15, 14,
        11, 14, 18, 15, 15, 16, 15, 14, 15, 18]
data = np.array(data)  # convert the list to a NumPy array (integer dtype is kept)

print 'Mean:', np.mean(data)

print np.sort(data)
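# Upper-middle element of the sorted data; with 50 points the two middle
# values are both 15, so this equals the median
print np.sort(data)[len(data) / 2]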


Mean: 15.1
[11 12 12 12 13 13 13 13 13 14 14 14 14 14 14 14 14 15 15 15 15 15 15 15 15
 15 15 15 15 15 15 15 15 16 16 16 16 16 16 16 17 17 17 17 17 18 18 19 19 20]
15

The mean is 54.151", the mode is 54.15", and the median is 54.15". They nearly coincide because the distribution is approximately symmetric.
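
The mode is quoted above but not computed in the cell, and indexing the middle of the sorted array only gives the median when the two middle values agree. A minimal standard-library sketch of both statistics follows; the helper names `median` and `mode` are illustrative, and the values are in the same coded units as the data list.

```python
from collections import Counter

def median(values):
    """Median; averages the two middle values when the length is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2.0

def mode(values):
    """Most frequent value; if several values tie, only one of them is returned."""
    return Counter(values).most_common(1)[0][0]

data = [14, 13, 12, 15, 16, 14, 15, 14, 15, 12,
        13, 15, 15, 13, 13, 15, 15, 17, 14, 16,
        19, 17, 16, 17, 19, 13, 12, 14, 16, 15,
        15, 17, 16, 17, 20, 16, 15, 15, 15, 14,
        11, 14, 18, 15, 15, 16, 15, 14, 15, 18]

print("median: {}, mode: {}".format(median(data), mode(data)))
```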


2. CLT

  1. No, not enough samples
  2. Yes ("No, not enough samples" is also an acceptable answer)
  3. No, there is no sum/average
  4. Yes
  5. Yes

3. Confidence Intervals


In [2]:
data_3_1 = [5.81, 5.27,  5.4,  4.9,  5.83,  3.2,  6.76,  4.29,  4.76,  5.51]
data_3_2 = [51.47, 48.18,  48.35,  53.57]
data_3_3 = [91.8, 104.04,  129.62,  99.34,  75.92,  56.03,  103.87,  66.27,  88.41,  105.17,  115.05,  111.13,  86.2,  113.48,  96.25,  100.81,  96.56,  89.02,  111.9,  106.55,  117.35,  87.61,  81.97,  106.32,  78.38,  102.38,  80.87,  110.6,  89.09,  132.1]
data_3_4 = [5.89, 3.73,  -10.77,  -13.92,  0.73,  -2.52,  -9.69,  14.15,  -8.16,  2.62,  -0.93,  -13.46,  -2.95,  -7.13,  1.01,  1.45,  16.0,  -17.47,  9.58,  13.3]

Answer 3.1


In [17]:
import scipy.stats
import numpy as np
from math import *

sample_mean = np.mean(data_3_1 )
sample_var = np.var(data_3_1, ddof=1)
T = scipy.stats.t.ppf(0.95, len(data_3_1) - 1)  # t-distribution with n - 1 degrees of freedom
y = T * sample_var**0.5 / sqrt(len(data_3_1))
print 'The true mean is {} +/- {:.3f} with 90% confidence: {:.3f} to {:.3f}'.format(sample_mean, y, sample_mean-y, sample_mean+y)


The true mean is 5.173 +/- 0.562 with 90% confidence: 4.611 to 5.735

Answer 3.2


In [16]:
sample_mean = np.mean(data_3_2 )
sample_var = np.var(data_3_2, ddof=1)
T = scipy.stats.t.ppf(0.975, len(data_3_2) - 1)  # t-distribution with n - 1 degrees of freedom
y = T * sample_var**0.5 / sqrt(len(data_3_2))
print 'The true mean is {} +/- {:.3f} with 95% confidence: {:.3f} to {:.3f}'.format(sample_mean, y, sample_mean-y, sample_mean+y)


The true mean is 50.3925 +/- 4.142 with 95% confidence: 46.251 to 54.534
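
The small-sample intervals in 3.1 and 3.2 follow the same recipe, so they can be expressed with one helper. This is a sketch assuming the usual convention of $n - 1$ degrees of freedom for Student's t; `t_interval` is a hypothetical name, not part of SciPy.

```python
import numpy as np
import scipy.stats

def t_interval(data, confidence):
    """Two-sided confidence interval for the mean, Student's t with n - 1 degrees of freedom."""
    n = len(data)
    mean = np.mean(data)
    std_err = np.std(data, ddof=1) / np.sqrt(n)  # sample standard deviation over sqrt(n)
    # Split the excluded probability evenly between the two tails
    t_crit = scipy.stats.t.ppf(1 - (1 - confidence) / 2.0, n - 1)
    return mean - t_crit * std_err, mean + t_crit * std_err

data_3_2 = [51.47, 48.18, 48.35, 53.57]
lo, hi = t_interval(data_3_2, 0.95)
print("95% interval: {:.3f} to {:.3f}".format(lo, hi))
```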

Answer 3.3


In [24]:
sample_mean = np.mean(data_3_3 )
sample_var = np.var(data_3_3, ddof=1)
Z = scipy.stats.norm.ppf(0.995)
y = Z * sample_var**0.5 / sqrt(len(data_3_3))
print 'The true mean is {} +/- {} with 99% confidence {}, {}'.format(sample_mean, y,  sample_mean-y, sample_mean+y)


The true mean is 97.803 +/- 8.10703239366 with 99% confidence 89.6959676063, 105.910032394

Answer 3.4


In [38]:
sample_mean = np.mean(data_3_4 )
sample_var = np.var(data_3_4, ddof=1)
Z = scipy.stats.norm.ppf(0.90)
y = Z * sample_var**0.5 / sqrt(len(data_3_4))
print sample_mean
print 'The true mean is less than {} with 90% confidence'.format(sample_mean+y)


-0.927
The true mean is less than 1.8545989135 with 90% confidence

4. Math Equations

  1. $\cos(x^2) - x = 0$
  2. $x^2 - 4839 = 0$
  3. $\int_\pi^x \sin^2 s\,ds - 1 = 0$

In [1]:
from scipy.optimize import newton
import numpy as np
from math import *

print newton(lambda x: cos(x**2) - x, x0=0)


0.801070765209

In [2]:
print newton(lambda x: x**2 - 4839, x0=1)


69.5629211578
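
When no derivative is passed, `scipy.optimize.newton` falls back to the secant method. For comparison, here is a hand-rolled Newton-Raphson iteration for this equation using the analytic derivative $2x$; `newton_sqrt` and its default tolerances are illustrative choices, not part of the original solution.

```python
def newton_sqrt(a, x0=1.0, tol=1e-12, max_iter=100):
    """Solve x**2 - a = 0 with Newton-Raphson: x <- x - f(x) / f'(x), where f'(x) = 2x."""
    x = x0
    for _ in range(max_iter):
        step = (x * x - a) / (2.0 * x)
        x -= step
        if abs(step) < tol:  # stop once the update is negligible
            break
    return x

root = newton_sqrt(4839)
print("root: {:.10f}".format(root))
```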

In [107]:
from scipy.integrate import quad

def integral(x):
    ans, err = quad(lambda y: np.sin(y)**2, pi, x)
    return ans - 1

x =  newton(integral, x0=1)
print x, integral(x)


4.93041265958 -3.33066907388e-16
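
The integral also has a closed form, which gives an independent check on the numerical root. Using $\sin^2 s = (1 - \cos 2s)/2$, the equation becomes $(x - \pi)/2 - \sin(2x)/4 - 1 = 0$. The sketch below evaluates that residual at the root printed above; `closed_form_residual` is an illustrative name.

```python
from math import pi, sin

def closed_form_residual(x):
    """Integral of sin^2(s) from pi to x, minus 1, using the antiderivative s/2 - sin(2s)/4."""
    return (x - pi) / 2.0 - sin(2.0 * x) / 4.0 - 1.0

x_root = 4.93041265958  # value printed by the cell above
residual = closed_form_residual(x_root)
print("residual at root: {:.2e}".format(residual))
```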