HW 6 Statistics and probability homework

Complete the homework notebook in a homework directory named after you, zip up that directory, and submit it to our class Blackboard/eLearn site. Complete all the parts, 6.1 to 6.5, for a score of 3.

Investigate plotting, linear regression, or complex matrix manipulation for a score of 4, or cover two additional investigations for a score of 5.

6.1 Coin flipping


In [ ]:

6.1.1

Write a function, flip_sum, which generates $n$ random coin flips from a fair coin and then returns the number of heads.

A fair coin is defined to be a coin where $P($heads$)=\frac{1}{2}$

The output type should be a numpy integer. Hint: use numpy.random.rand().
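One possible shape for such a function is sketched below. It is only an illustration that follows the hint: it assumes heads is encoded as a uniform draw below 0.5, which is one of several equivalent conventions, and it is not the required solution.

```python
import numpy as np

def flip_sum(n):
    """Flip n fair coins and return the number of heads as a numpy integer."""
    # Assumption: a uniform draw in [0, 1) below 0.5 counts as heads (probability 1/2).
    flips = np.random.rand(n) < 0.5
    # Summing a boolean array counts the True entries and yields a numpy integer.
    return np.sum(flips)
```

Calling flip_sum(100) should return a value near 50, though any single call can land some distance away.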


In [ ]:

6.1.2 Test it

Check it by showing the results of 100 coins being flipped


In [ ]:

6.1.3 Create and display a histogram of 200 experiments of flipping 5 coins.
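As a rough sketch of what the histogram data looks like, the block below tallies the head counts for 200 experiments of 5 coins and prints a text bar per outcome. It draws all the flips in one vectorized call instead of reusing flip_sum, which is an assumption made to keep the example self-contained; a graphical version could pass the same head counts to matplotlib.pyplot.hist.

```python
import numpy as np

experiments, coins = 200, 5

# One row per experiment, one column per coin; True marks heads.
heads = (np.random.rand(experiments, coins) < 0.5).sum(axis=1)

# counts[k] is the number of experiments that produced exactly k heads.
counts = np.bincount(heads, minlength=coins + 1)

for k, c in enumerate(counts):
    print(f"{k} heads: {'#' * c} ({c})")
```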


In [ ]:

6.1.4

Write a function, estimate_prob, that uses flip_sum to estimate the following probability:

$P( k_1 \le \text{number of heads in } n \text{ flips} < k_2 )$

The function should estimate the probability by running $m$ different trials of flip_sum(n), probably using a for loop.

To receive full credit, estimate_prob must call flip_sum (that is, the call to flip_sum is located inside the estimate_prob function).


In [1]:
def estimate_prob(n,k1,k2,m):
    """Estimate the probability that n flips of a fair coin result in k1 to k2 heads
         n: the number of coin flips (length of the sequence)
         k1,k2: the trial is successful if the number of heads is 
                between k1 and k2-1
         m: the number of trials (number of sequences of length n)
         
         output: the estimated probability 
         """

In [4]:
# this is a small sanity check


x = estimate_prob(100,45,55,1000)
print(x)
assert 'float' in str(type(x))
print("does x==0.687?")


does x==0.687?

6.2.2 Calculate the actual probabilities and compare them to your estimates for:

n = number of coins, k1 = minimum number of heads, k2 = upper limit on the number of heads (exclusive), m = number of experiments

6.2.2.a n=100, k1=40, k2=60, m=100


In [ ]:

6.2.2.b n=100, k1=40, k2=60, m=1000


In [ ]:
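For the exact side of the comparison in 6.2.2, the number of heads in $n$ fair flips follows a binomial distribution, so $P(k_1 \le \text{heads} < k_2) = \sum_{k=k_1}^{k_2-1} \binom{n}{k} / 2^n$. The sketch below evaluates that sum directly; the helper name exact_prob is hypothetical and not part of the assignment.

```python
from math import comb

def exact_prob(n, k1, k2):
    """Exact P(k1 <= heads < k2) for n flips of a fair coin."""
    # Each of the 2**n flip sequences is equally likely; comb(n, k) of them have k heads.
    favorable = sum(comb(n, k) for k in range(k1, k2))
    return favorable / 2 ** n

print(exact_prob(100, 40, 60))
```

The estimates from estimate_prob should scatter around this value, with less spread for m=1000 than for m=100.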

6.3 Conditional probability

In a recent study, the following data were obtained in response to the question: "Do you favor the proposal of the school's combining the elementary and middle school students in one building?"

Answers = [Yes, No, No opinion]
Males   = [75, 89, 10]
Females = [105, 56, 6]

If a person is selected at random, find these probabilities using Python.

  1. The person has no opinion.
  2. The person is a male or is against the issue.
  3. The person is a female, given that the person opposes the issue.
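The three probabilities above can be read off a 2 by 3 table of counts. The sketch below is one way to set that up with numpy; the row and column ordering and the variable names are choices made for this illustration, and "against the issue" is taken to mean a "No" answer.

```python
import numpy as np

# Rows: males, females. Columns: yes, no, no opinion (counts from the survey above).
counts = np.array([[75, 89, 10],
                   [105, 56, 6]])
total = counts.sum()

# 1. P(no opinion): everyone in the last column over everyone surveyed.
p_no_opinion = counts[:, 2].sum() / total

# 2. P(male or against) = P(male) + P(against) - P(male and against).
p_male_or_against = (counts[0].sum() + counts[:, 1].sum() - counts[0, 1]) / total

# 3. P(female | against) = P(female and against) / P(against).
p_female_given_against = counts[1, 1] / counts[:, 1].sum()

print(p_no_opinion, p_male_or_against, p_female_given_against)
```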

6.4 Matrix creation

Write the 12 by 12 times table matrix shown below. Do this three ways:

6.4.1 using nested for loops
6.4.2 using the numpy fromfunction array constructor
6.4.3 using numpy broadcasting
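The three approaches can be sketched side by side as follows. The variable names are illustrative only; the point is that all three produce the same matrix, with entry (i, j) equal to (i + 1) * (j + 1) under zero-based indexing.

```python
import numpy as np

n = 12

# Nested for loops: fill each entry one at a time.
loops = np.zeros((n, n), dtype=int)
for i in range(n):
    for j in range(n):
        loops[i, j] = (i + 1) * (j + 1)

# fromfunction: the lambda receives whole index arrays i and j.
from_func = np.fromfunction(lambda i, j: (i + 1) * (j + 1), (n, n), dtype=int)

# Broadcasting: a (12, 1) column times a (12,) row expands to (12, 12).
k = np.arange(1, n + 1)
broadcast = k[:, np.newaxis] * k

print(np.array_equal(loops, from_func) and np.array_equal(loops, broadcast))
```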


In [7]:
from numpy import array 
array([[  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12],
       [  2,   4,   6,   8,  10,  12,  14,  16,  18,  20,  22,  24],
       [  3,   6,   9,  12,  15,  18,  21,  24,  27,  30,  33,  36],
       [  4,   8,  12,  16,  20,  24,  28,  32,  36,  40,  44,  48],
       [  5,  10,  15,  20,  25,  30,  35,  40,  45,  50,  55,  60],
       [  6,  12,  18,  24,  30,  36,  42,  48,  54,  60,  66,  72],
       [  7,  14,  21,  28,  35,  42,  49,  56,  63,  70,  77,  84],
       [  8,  16,  24,  32,  40,  48,  56,  64,  72,  80,  88,  96],
       [  9,  18,  27,  36,  45,  54,  63,  72,  81,  90,  99, 108],
       [ 10,  20,  30,  40,  50,  60,  70,  80,  90, 100, 110, 120],
       [ 11,  22,  33,  44,  55,  66,  77,  88,  99, 110, 121, 132],
       [ 12,  24,  36,  48,  60,  72,  84,  96, 108, 120, 132, 144]])


Out[7]:
array([[  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12],
       [  2,   4,   6,   8,  10,  12,  14,  16,  18,  20,  22,  24],
       [  3,   6,   9,  12,  15,  18,  21,  24,  27,  30,  33,  36],
       [  4,   8,  12,  16,  20,  24,  28,  32,  36,  40,  44,  48],
       [  5,  10,  15,  20,  25,  30,  35,  40,  45,  50,  55,  60],
       [  6,  12,  18,  24,  30,  36,  42,  48,  54,  60,  66,  72],
       [  7,  14,  21,  28,  35,  42,  49,  56,  63,  70,  77,  84],
       [  8,  16,  24,  32,  40,  48,  56,  64,  72,  80,  88,  96],
       [  9,  18,  27,  36,  45,  54,  63,  72,  81,  90,  99, 108],
       [ 10,  20,  30,  40,  50,  60,  70,  80,  90, 100, 110, 120],
       [ 11,  22,  33,  44,  55,  66,  77,  88,  99, 110, 121, 132],
       [ 12,  24,  36,  48,  60,  72,  84,  96, 108, 120, 132, 144]])

6.5

Answer the following questions with respect to the NCHS Leading Causes of Death: United States dataset at https://data.cdc.gov/NCHS/NCHS-Leading-Causes-of-Death-United-States/bi63-dtpu

  1. How many patients were censored?
  2. What is the correlation coefficient between state and Suicide for deaths above 100?
  3. What is the average number of deaths for each state and type of cause?
  4. Which year was the most deadly for each cause name?


In [15]:
import pandas as pd
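# Forward slashes avoid backslash escapes such as "\N", which is a syntax error in Python 3 strings
dfh = pd.read_csv("./data/NCHS_-_Leading_Causes_of_Death__United_States.csv")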
dfh.head()


Out[15]:
   Year  113 Cause Name                                     Cause Name              State       Deaths  Age-adjusted Death Rate
0  2016  Accidents (unintentional injuries) (V01-X59,Y8...  Unintentional injuries  Alabama     2755    55.5
1  2016  Accidents (unintentional injuries) (V01-X59,Y8...  Unintentional injuries  Alaska      439     63.1
2  2016  Accidents (unintentional injuries) (V01-X59,Y8...  Unintentional injuries  Arizona     4010    54.2
3  2016  Accidents (unintentional injuries) (V01-X59,Y8...  Unintentional injuries  Arkansas    1604    51.8
4  2016  Accidents (unintentional injuries) (V01-X59,Y8...  Unintentional injuries  California  13213   32.0
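The per-group questions in 6.5 map naturally onto pandas groupby. The sketch below shows the pattern on a small frame built only from the five rows displayed above, so its results say nothing about the full dataset; the helper function names are hypothetical, and the same calls would be applied to dfh.

```python
import pandas as pd

# Only the five rows shown in the dfh.head() output; the full CSV has many more.
sample = pd.DataFrame({
    "Year": [2016] * 5,
    "Cause Name": ["Unintentional injuries"] * 5,
    "State": ["Alabama", "Alaska", "Arizona", "Arkansas", "California"],
    "Deaths": [2755, 439, 4010, 1604, 13213],
})

def mean_deaths_by_state_and_cause(df):
    """Average Deaths for every (State, Cause Name) pair."""
    return df.groupby(["State", "Cause Name"])["Deaths"].mean()

def deadliest_year_by_cause(df):
    """Year with the largest total Deaths for each Cause Name."""
    totals = df.groupby(["Cause Name", "Year"])["Deaths"].sum()
    # idxmax gives the (Cause Name, Year) label of the largest total per cause.
    best = totals.groupby(level="Cause Name").idxmax()
    return best.map(lambda label: label[1])

print(mean_deaths_by_state_and_cause(sample))
print(deadliest_year_by_cause(sample))
```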

In [ ]: