Homework 4

CHE 116: Numerical Methods and Statistics

Prof. Andrew White

Version 1.1(2/5/2015)


1. Plotting Practice (6 Points)

Create a plot of $\sin(2x)$ and $\sin(x)$ from $0$ to $2\pi$ with the following properties:

  1. In Seaborn in the "Poster" context
  2. LaTeX equations labeling the two curves
  3. A Vertical line at $\pi$ that is the color of the same sine wave
  4. A legend
  5. An $x$ and $y$ label
  6. A title that says "I love Python"

In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from math import pi
import seaborn

seaborn.set_context("talk")

x = np.linspace(0, 2*pi, 250)
              
y1 = np.sin(x)
y2 = np.sin(2*x)
plt.xlim(0, 2*pi)

plt.xlabel('$x$')
plt.ylabel('$sine$   $wave$')
plt.plot(x, y1, label='$sin(x)$')
lines1 = plt.plot(x, y2, label='$sin(2x)$')
# A backslace, \, allows you to wrap a line
plt.vlines(x=pi, ymin=-1.5, ymax=1.5, label='$x=\pi$', color=lines1[0].get_color())
plt.xlabel('$x$')
plt.ylabel('$y$')
plt.title(u'Me Gusta el Pitón!!!')
plt.legend(loc='lower left')
plt.show()


2. Working with Data (5 Points)

Make sure you have downloaded the Excel spreadsheet for this week's homework. The data in the spreadsheet is from a 4-plate hydrogen fuel cell. The cell below loads the data. Run the cell and then write your own python code in subsequent cells.

  1. Using a for loop, calculate the sample mean of Power of the fuel cell.
  2. Using the numpy mean function, calculate the sample mean of the Power of the fuel cell.
  3. Use a for loop to calculate the standard deviation of the Power of the fuel cell.
  4. Using the numpy var function, calculate the standard deviation of the Power of the fuel cell.
  5. Your answers between 2.3 and 2.4 should be slightly different. Why?

In [2]:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn
seaborn.set_context("poster")

#load the fuel cell data
fuel_data = pd.read_excel('fuel_cell.xlsx')

#show the columns
for i in fuel_data.columns:
    print '\'{}\''.format(i)


'Time'
'Voltage'
'Current'
'Power'
'Efficiency'
'Resistance'
'Control Sig'
'Fan %'
'H2 In'
'H2 Out'
'Cell 1'
'Cell 2'
'Cell 3'
'Cell 4'
'RH %'
'Temperature'

In [4]:
#This is how you access data
print fuel_data['Time']


0     3691.651
1     3693.891
2     3696.715
3     3699.508
4     3702.460
5     3705.275
6     3707.923
7     3710.187
8     3712.795
9     3715.428
10    3718.260
11    3721.035
12    3723.707
13    3725.963
14    3728.203
...
662    5496.243
663    5499.492
664    5502.652
665    5505.819
666    5508.987
667    5512.171
668    5514.691
669    5516.948
670    5519.284
671    5522.412
672    5525.595
673    5528.739
674    5531.987
675    5535.188
676    5538.324
Name: Time, Length: 677, dtype: float64

2.1 Answer


In [5]:
#Ex2.1
sum=0
power=fuel_data['Power']
totalnumber=len(power)
for i in power:
    sum+=i
print sum/totalnumber, sum/676


0.898273435746 0.899602242604

2.2 Answer


In [6]:
#Ex 2.2
mu=np.mean(power)
print mu


0.898273435746

2.3 Answer


In [8]:
# Ex 2.3
import  math

sume=0
for i in power:
    sume+=(i-mu)**2
print math.sqrt(sume/(totalnumber-1))
print math.sqrt(np.var(power, ddof=1))


0.0123478111501
0.01234781115

2.4 Answer


In [8]:
#Ex 2.4
var= np.var(power)
print math.sqrt(var)


0.0123386882739

2.5 Answer

2.3 divides by N and 2.4 divides by $(N-1)$

3. Watching Youtube with the Geometric Distribution (3 Points)

Write what quantity the question asks for symbolically, write the equation you need to compute symbolically (if necessary) and compute your answer in Python.

  1. You accidentally open youtube while doing your homework. After watching one video, you find another interesting video you must watch with probability 75%. What is the probability you return to your homework after 1 video?

  2. What is the probability you return to your homework after exactly 5 videos?

  3. What is the expected number of videos you will watch? You may use any method to compute this.

3.1 Answer

Given: $p=1-0.75$ - probability of not finding a video that you will watch.

$P(n)=(1-p)^{(n-1)}\times p$

$n = 1$ because we have success (go back to HW) on the first video.


In [9]:
n=1
p=0.25
P=(1-p)**(n-1)*p
print P


0.25

3.2 Answer

$P(n = 5)$


In [10]:
n=5
p=0.25
P=(1-p)**(n-1)*p
print P


0.0791015625

3.3 Answer

There are many ways to answer this one. I'll show two.

$E(N=n)=\displaystyle\sum\limits_{n}^N nP(N=n)$.

For geometric we know $E[n] = 1 / p$.


In [11]:
#Print the analytical expected value.

p=0.25
E=1/p
print E

#Compute using numpy
n = np.arange(1,100)
exp_n = n * (1 - p)**(n-1) * p
E = np.sum(exp_n)
print E


4.0
3.99999999996

4. Python Practice (10 Points)

Answer the following problems in Python

  1. Create an array containing the integers from 1 to 100 squared. Note: do not include 0.
  2. Using a for loop, sum the squared integers and break if your sum is greater than 1000. Print the integer which causes your sum to exceed 1000.
  3. Take the geometric distribution from Problem 3 and sum the probability of from 1 to 100 videos using a for loop. It should be very close to 1.
  4. Add a break to your for loop from 4.3 and print the number of videos when the sum is 0.95. What does that represent?
  5. [3 Points] Turn your for loop into while loop, so that you do not need to guess how big your array will need to be and turn it into a function. Your function should take two arguments: the probability of success and the confidence level (default 5%) for a geometric distribution. It should return a number of trials for which there is $>=$ 95% probability of success before reaching that trial number.
  6. [3 Points] Given a normal distribution with $\mu=5$, $\sigma = 2.5$, what is $P(-5 < x < 1)$?

4.1 Answer


In [12]:
import numpy as np
some_ints = np.arange(1,101) ** 2
print some_ints


[    1     4     9    16    25    36    49    64    81   100   121   144
   169   196   225   256   289   324   361   400   441   484   529   576
   625   676   729   784   841   900   961  1024  1089  1156  1225  1296
  1369  1444  1521  1600  1681  1764  1849  1936  2025  2116  2209  2304
  2401  2500  2601  2704  2809  2916  3025  3136  3249  3364  3481  3600
  3721  3844  3969  4096  4225  4356  4489  4624  4761  4900  5041  5184
  5329  5476  5625  5776  5929  6084  6241  6400  6561  6724  6889  7056
  7225  7396  7569  7744  7921  8100  8281  8464  8649  8836  9025  9216
  9409  9604  9801 10000]

4.2 Answer


In [13]:
#4.2
sqsum=0 # sum of the squared integers
for i in some_ints:
    sqsum += i
    if(sqsum > 1000):
        break
    
print '{} was an integer that gives a sum greater than 1000. The sum finished with a value of {}'.format(i, sqsum)


196 was an integer that gives a sum greater than 1000. The sum finished with a value of 1015

4.3 Answer


In [14]:
#4.3
#create domain
n=np.arange(1, 101)
#set parameters
p=0.25
#initialize sum
Psum = 0
for ni in n:
    Psum += (1-p)**(ni-1)*p #add to Psum
print "{:e}is the probability of watching {} or less videos.".format(Psum, ni)


1.000000e+00is the probability of watching 100 or less videos.

4.4 Answer


In [15]:
#4.4
#create domain
n=np.arange(1, 101)
#set parameters
p=0.25
#initialize sum
Psum = 0
for ni in n:
    Psum += (1-p)**(ni-1)*p #add to Psum
    if (Psum>0.95):
        break
print "When {} videos are watched, the probability is greater than 0.95" .format(ni)


When 11 videos are watched, the probability is greater than 0.95

4.5 Answer


In [16]:
def geometric_trial(p, confidence=0.05):
    #Takes in a probability of success and confidence level
    #Returns the maxmium number of trials which will occur according
    #to the confidence level.
    n = 1
    psum = 0
    
    while psum < 1 - confidence:
        psum += (1 - p) ** (n - 1) * p
        n += 1
        
    return n - 1 #Note: Since we added 1 before our check in the while loop, we must subtract one
print geometric_trial(0.25)
print geometric_trial(0.25, .1)


11
9

4.5 Answer


In [17]:
from math import pi, sqrt

dx = 0.000025
T = np.arange(-5, 1, dx)
sigma = 2.5
mu = 5
prob = np.sum( 1 / (sigma * sqrt(2 * pi) ) * np.exp(-(T - mu)**2 / (2 * sigma ** 2)) ) * dx
print "P(-5 < x < 1)=", prob


P(-5 < x < 1)= 0.0547670665237

5. Laughing at Tweets with Poisson Distribution (3 Points)

Write what quantity the question asks for symbolically, write the equation you need to compute symbolically and compute your answer in Python.

  1. The probability of laughing at a tweet is 0.1%. If you browse 500 tweets, what's the probability that you laugh three times?
  2. What is the probability you laugh less than three times?
  3. What is the probability you laugh more than three times?

5.1 Answer

$$\mu = 500 \times 0.001$$$$P(x = 3) = \frac{e^{-\mu} \mu^x}{x!}$$

In [18]:
from math import exp, factorial
p=0.001# probability of the laugh
N=500 # number of tweets
x=3 # number of times one laughs
mu = N * p
#P(x=3)?
P3=exp(-mu)*(mu)**x / factorial(x)
print "The probability of laughing {} times is {:%}" .format(x, P3)


The probability of laughing 3 times is 1.263606%

5.2 Answer

$$P(x < 3) = P(1) + P(2)$$

In [19]:
Psum = exp(-mu)*(mu)**1 / factorial(1) + exp(-mu)*(mu)**2 / factorial(2)
print "The probability of laughing less than three time is: {:%}"  .format(Psum)


The probability of laughing less than three time is: 37.908166%

5.3 Answer

$$P(x > 3) = 1 - P(x < 3) - P(x = 3)$$

In [20]:
Pmore=(1-Psum-P3) 
print "The probability of laughing more than three times is: {:%}" .format(Pmore)


The probability of laughing more than three times is: 60.828228%

6. Weekend Dating with Binomial Distribution (4 Points)

  1. The probability that you go on a second date is 10%. If you go on 8 first dates this weekend, what's the probability you'll have 3 second dates next weekend? State the distribution, its parameters and the question being asked symbolically and answer the problem in Python.

  2. What is the probability of having more than 3 second dates for next weekend? State the question being asked symbolically and answer the problem in Python. Hint: consider using the np.sum function to make it easy, but to answer 6.5 you might want to try it with a for loop.

  3. To better understand your dating agenda, make a plot of the probability of the number of second dates.

  4. Caclulate the expected number of second dates with the parameters above using a for loop. Do not just use the expected value equation for the binomial distribution.

6.1 Answer

$P(x;N,p) = {N \choose x} p^{x}(1−p)^{N−x}$

$p=0.10$ - probability of the second date

$N=8$ number of dates per first week

$P(x=3)= {8 \choose 3} 0.1^{3}(1−0.1)^{8−3}$


In [21]:
from scipy.special import comb
pbnm=0.1
N=8
x=3
P3=comb(N,x)*pbnm**x*(1-pbnm)**(N-x)
print "The probability of going to {} second dates is {:%}".format(x, P3)


The probability of going to 3 second dates is 3.306744%

6.2 Answer


In [22]:
dates=np.arange(4, 9)
pbnm=0.1
N=8
P_3_more = 0
for di in dates:
    P_3_more += comb(N,di)*pbnm**di*(1-pbnm)**(N-di)
print "The probability of having more than three dates is", P_3_more


The probability of having more than three dates is 0.00502435

6.3 Answer


In [23]:
dates=np.arange(0, 9)
P3more=comb(N, dates)*pbnm**dates*(1-pbnm)**(N-dates)
plt.plot(dates, P3more, 'yo-')
plt.title("Second Dates")
plt.show()


6.4 Answer

$$E[n] = \sum_0^N n P(n)$$

In [24]:
dates=np.arange(0, 9)
En = 0
for di in dates:
    En += di * comb(N,di)*pbnm**di*(1-pbnm)**(N-di)

analytical= N * pbnm
print "The numerical {} and analytical, {} expected values are the same!" .format(En, analytical)


The numerical 0.8 and analytical, 0.8 expected values are the same!

In [24]: