Complete the homework notebook in a homework directory with your name, zip up the homework directory, and submit it to our class Blackboard/eLearn site. Complete all parts, 6.1 to 6.5, for a score of 3.
Investigate plotting, linear regression, or complex matrix manipulation for a score of 4, or cover two additional investigations for a score of 5.
Write a function, estimate_prob, that uses flip_sum to estimate the following probability:
$P(k_1 \le \text{number of heads in } n \text{ flips} < k_2)$
The function should estimate the probability by running $m$ different trials of flip_sum(n), probably using a for loop.
To receive full credit, estimate_prob must call flip_sum (that is, the call to flip_sum must appear inside the estimate_prob function).
In [1]:
def estimate_prob(n, k1, k2, m):
    """Estimate the probability that n flips of a fair coin result in k1 to k2-1 heads
    n: the number of coin flips (length of the sequence)
    k1, k2: the trial is successful if the number of heads is
            between k1 and k2-1 (inclusive)
    m: the number of trials (number of sequences of length n)
    output: the estimated probability
    """
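One way to fill in the stub above, offered as a minimal sketch rather than the reference solution. flip_sum is not shown in this notebook, so the version below is an assumption: it is taken to return the number of heads in n flips of a fair coin.

```python
import random

def flip_sum(n):
    """Assumed helper (not defined in this notebook): heads in n fair coin flips."""
    return sum(1 for _ in range(n) if random.random() < 0.5)

def estimate_prob(n, k1, k2, m):
    """Fraction of m trials in which k1 <= flip_sum(n) < k2."""
    successes = 0
    for _ in range(m):
        # flip_sum is called inside estimate_prob, once per trial
        if k1 <= flip_sum(n) < k2:
            successes += 1
    # float() keeps this a true division under Python 2 as well
    return successes / float(m)

x = estimate_prob(100, 45, 55, 1000)
```

Because the trials are random, x changes from run to run; calling random.seed with a fixed value before the estimate makes a run repeatable.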
In [4]:
# this is a small sanity check
x = estimate_prob(100,45,55,1000)
print(x)
assert 'float' in str(type(x))
print("does x==0.687?")
In a recent study, the following data were obtained in response to the question: "Do you favor the proposal of the school’s combining the elementary and middle school students in one building?"
Answers = [Yes, No, No opinion]
Males = [75, 89, 10]
Females = [105, 56, 6]
If a person is selected at random, find these probabilities using Python.
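The specific probabilities to find are not listed here, so the quantities below are illustrative choices only. This sketch shows the general pattern of turning the counts above into marginal, conditional, and "or" probabilities.

```python
# Counts from the survey table, in the order Yes, No, No opinion
answers = ["Yes", "No", "No opinion"]
males = [75, 89, 10]
females = [105, 56, 6]

total = sum(males) + sum(females)  # everyone surveyed

# Illustrative probabilities (the assignment's own list is not given):
p_female = sum(females) / float(total)                  # P(female)
p_yes = (males[0] + females[0]) / float(total)          # P(answered Yes)
p_yes_given_male = males[0] / float(sum(males))         # P(Yes | male)
p_male_or_no = (sum(males) + females[1]) / float(total) # P(male or answered No)

print(p_female, p_yes, p_yes_given_male, p_male_or_no)
```

For the "or" case, the males who answered No are counted once: all males plus only the females who answered No.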
In [7]:
from numpy import array
array([[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
       [ 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24],
       [ 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36],
       [ 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48],
       [ 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60],
       [ 6, 12, 18, 24, 30, 36, 42, 48, 54, 60, 66, 72],
       [ 7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, 84],
       [ 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96],
       [ 9, 18, 27, 36, 45, 54, 63, 72, 81, 90, 99, 108],
       [ 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120],
       [ 11, 22, 33, 44, 55, 66, 77, 88, 99, 110, 121, 132],
       [ 12, 24, 36, 48, 60, 72, 84, 96, 108, 120, 132, 144]])
Out[7]:
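The array literal above appears to be the 12-by-12 multiplication table. Assuming the goal is to build it without typing every entry, the sketch below shows two equivalent NumPy approaches; the variable names are illustrative.

```python
import numpy as np

n = np.arange(1, 13)           # the integers 1..12

# Outer product: entry (i, j) is n[i] * n[j]
table = np.outer(n, n)

# Same result via broadcasting a column vector against a row vector
table_b = n[:, None] * n[None, :]

print(table)
```

Both forms avoid explicit loops; the broadcasting version generalizes more readily to operations other than multiplication.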
Answer the following questions with respect to the NCHS Leading Causes of Death: United States dataset (https://data.cdc.gov/NCHS/NCHS-Leading-Causes-of-Death-United-States/bi63-dtpu):
How many patients were censored?
What is the correlation coefficient between state and Suicide for deaths above 100?
What is the average number of deaths for each state and type of cause?
Which year was the most deadly for each cause name?
In [15]:
import pandas as pd
# Forward slashes avoid backslash escapes such as "\N", which is a syntax error in Python 3
dfh = pd.read_csv("./data/NCHS_-_Leading_Causes_of_Death__United_States.csv")
dfh.head()
Out[15]:
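The per-state and per-cause questions map naturally onto pandas groupby. The sketch below assumes the CSV has columns named "Year", "Cause Name", "State", and "Deaths"; check these against dfh.columns before relying on them. The helper names and the small toy frame are hypothetical and contain made-up placeholder values, not real data.

```python
import pandas as pd

def average_deaths(df):
    """Mean of Deaths for each (State, Cause Name) pair. Column names are assumed."""
    return df.groupby(["State", "Cause Name"])["Deaths"].mean()

def deadliest_year_by_cause(df):
    """For each Cause Name, the Year with the largest total Deaths."""
    yearly = df.groupby(["Cause Name", "Year"])["Deaths"].sum()
    # Rows become causes, columns become years; idxmax picks the column label
    return yearly.unstack("Year").idxmax(axis=1)

# Toy frame with placeholder values, only to show the shape of the results
toy = pd.DataFrame({
    "Year": [2000, 2001, 2000, 2001],
    "Cause Name": ["A", "A", "B", "B"],
    "State": ["X", "X", "X", "X"],
    "Deaths": [10, 30, 50, 20],
})

avg = average_deaths(toy)
worst = deadliest_year_by_cause(toy)
print(avg)
print(worst)
```

With the real file loaded, the same functions could be called as average_deaths(dfh) and deadliest_year_by_cause(dfh). If the file contains national aggregate rows or an all-causes category, those may need to be filtered out first so they do not dominate the totals.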