In this notebook we will reason about recent presidential approval poll data. We will explore how the concepts of conditional probability, the Law of Total Probability, and Bayes' Theorem help us better understand a simple survey. Along the way we will learn how the Python data analysis library pandas facilitates easy manipulation of data tables.
Learning Goals:
pandas skills
Problem: You collect data on whether or not people approve of President Trump, a potential candidate in the upcoming election. We have collected real poll data from the last 13 CNN polls, which can be found here (link directly to the CNN poll here).
Let $A$ be the event that a respondent says they approve of the way President Trump is handling his job as president. Let $M$ be the event that a respondent answered "No opinion." We are interested in estimating $P(A)$; however, that is hard given the small but significant number of respondents who answered "No opinion."
Note 1: We assume in our model that, given enough information, the "No opinion" respondents would make an approve/disapprove decision.
Note 2: The latest CNN poll (Jan 16-19, 2020) had a sample of 1156 respondents. For simplicity we will assume all polls also had this sample size.
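As a worked equation for the setup above: since $M$ and $M^C$ partition the respondents, the Law of Total Probability splits $P(A)$ into a part the poll can estimate and a part it cannot. This is only a sketch of the decomposition, using the events $A$ and $M$ defined above:

$$P(A) = P(A \mid M^C)\,P(M^C) + P(A \mid M)\,P(M)$$

The poll data give estimates of $P(M)$, $P(M^C)$, and $P(A \mid M^C)$. The term $P(A \mid M)$, how the "No opinion" respondents would answer if they had to choose, is not observed, so any estimate of $P(A)$ depends on what we assume about it.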
In [0]:
num_respondents = 1156
dates = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'June', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec', 'Jan2020']
data = {}
data['approve'] = [37, 40, 42, 43, 43, 43, 40, 39, 41, 42, 43, 43]
data['disapprove'] = [57, 55, 51, 52, 52, 52, 54, 55, 57, 54, 53, 53]
data['no_opinion'] = [7, 5, 8, 5, 5, 5, 6, 6, 2, 4, 4, 4]
In the cell below, import pandas and make a DataFrame object using the above poll data and using the dates list as the index.
Then, display the data by printing your DataFrame object.
Hint: Instead of using print, try using the DataFrame variable name alone on a single line at the end of the cell. It will look prettier :)
In [0]:
import pandas as pd
polldf = # TODO
polldf
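One possible way to fill in the TODO above is sketched below. It assumes the data dictionary and dates list from the earlier cell (repeated here so the snippet runs on its own) and passes them straight to the pd.DataFrame constructor. Treat it as a sketch rather than the only valid answer.

```python
import pandas as pd

# Poll data repeated from the earlier cell so this snippet is self-contained.
dates = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'June', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec', 'Jan2020']
data = {
    'approve': [37, 40, 42, 43, 43, 43, 40, 39, 41, 42, 43, 43],
    'disapprove': [57, 55, 51, 52, 52, 52, 54, 55, 57, 54, 53, 53],
    'no_opinion': [7, 5, 8, 5, 5, 5, 6, 6, 2, 4, 4, 4],
}

# Each dictionary key becomes a column; the dates list labels the rows.
polldf = pd.DataFrame(data, index=dates)
polldf
```

Passing index=dates means individual polls can later be looked up by label, for example polldf.loc['Oct'].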
Using your DataFrame object created above, compute $P(M^C)$, the probability that a respondent did not answer "No opinion." See pandas.DataFrame.sum to sum rows or columns of the table.
Hint: Try accessing the DataFrame using its column names and then doing elementwise vector math. For example, use polldf['approve'] / ... instead of for loops.
In [0]:
# TODO
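The sketch below shows one reading of this exercise: a per-poll estimate of $P(M^C)$, normalized by each row's total, since the published percentages are rounded and do not always sum to exactly 100. The variable names row_total and p_not_m are illustrative choices, not part of the original notebook, and the DataFrame is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

# Rebuild the poll table so this snippet is self-contained.
dates = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'June', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec', 'Jan2020']
data = {
    'approve': [37, 40, 42, 43, 43, 43, 40, 39, 41, 42, 43, 43],
    'disapprove': [57, 55, 51, 52, 52, 52, 54, 55, 57, 54, 53, 53],
    'no_opinion': [7, 5, 8, 5, 5, 5, 6, 6, 2, 4, 4, 4],
}
polldf = pd.DataFrame(data, index=dates)

# Sum across the three answer columns of each row (axis=1) to get per-poll totals.
row_total = polldf[['approve', 'disapprove', 'no_opinion']].sum(axis=1)

# Elementwise vector math: the share of respondents who gave an opinion in each poll.
p_not_m = (polldf['approve'] + polldf['disapprove']) / row_total
p_not_m
```

If a single pooled estimate across all polls is wanted instead, the same idea applies with column sums (axis=0) in place of row sums.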
You know the drill :)
In [0]:
# TODO
In [0]:
polldf['P(A) w/ A.1'] = # TODO
polldf['P(A) w/ A.2'] = # TODO
polldf['P(A) w/ A.3'] = # TODO
polldf
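The assumptions labeled A.1 to A.3 are not spelled out in this section, so the sketch below does not try to reproduce them. Instead it shows how the Law of Total Probability could be applied once a value for $P(A \mid M)$ is chosen. The function name p_approve and its parameter p_a_given_m are hypothetical, introduced only for illustration.

```python
import pandas as pd

# Rebuild the poll table so this snippet is self-contained.
dates = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'June', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec', 'Jan2020']
data = {
    'approve': [37, 40, 42, 43, 43, 43, 40, 39, 41, 42, 43, 43],
    'disapprove': [57, 55, 51, 52, 52, 52, 54, 55, 57, 54, 53, 53],
    'no_opinion': [7, 5, 8, 5, 5, 5, 6, 6, 2, 4, 4, 4],
}
polldf = pd.DataFrame(data, index=dates)


def p_approve(df, p_a_given_m):
    """Hypothetical helper: P(A) = P(A|M^C) P(M^C) + P(A|M) P(M), per poll.

    p_a_given_m is an assumed value for P(A|M); the poll does not observe it.
    """
    total = df[['approve', 'disapprove', 'no_opinion']].sum(axis=1)
    p_m = df['no_opinion'] / total
    p_not_m = 1 - p_m
    # Among respondents with an opinion, the fraction who approve.
    p_a_given_not_m = df['approve'] / (df['approve'] + df['disapprove'])
    return p_a_given_not_m * p_not_m + p_a_given_m * p_m


# Two extreme assumed values bracket every other choice of P(A|M).
low = p_approve(polldf, 0.0)   # no "No opinion" respondent would approve
high = p_approve(polldf, 1.0)  # every "No opinion" respondent would approve
```

Any assumed value of $P(A \mid M)$ between 0 and 1 yields an estimate between low and high, which gives a sense of how much the "No opinion" group can move $P(A)$ in each poll.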