In this notebook we will reason about recent presidential approval poll data. We will explore how the concepts of conditional probability, the Law of Total Probability and Bayes' Theorem help us better understand a simple survey. Along the way we will learn how the Python data analysis library `pandas` facilitates easy manipulation of data tables.

**Learning Goals:**

- Analyze poll data with conditional probability, Law of Total Probability and Bayes' Theorem
- Learn some basic `pandas` skills

**Problem:** You collect data on whether or not people approve of President Trump, a potential candidate in the upcoming election. We have collected real poll data from the last 13 CNN polls, which can be found here (link directly to the CNN poll here).

Let $A$ be the event that a person says they approve of the way President Trump is handling his job as president. Let $M$ be the event that a person answered "No opinion." We are interested in estimating $P(A)$; however, that is hard given the small but significant number of respondents who answered "No opinion."

**Note 1:** We assume in our model that, given enough information, the "No opinion" respondents would make an approve/disapprove decision.

**Note 2:** The latest CNN poll (Jan 16-19, 2020) had a sample of 1156 respondents. For simplicity we will assume all polls also had this sample size.
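
With $A$ and $M$ as defined above, the Law of Total Probability splits $P(A)$ according to whether a respondent answered "No opinion." This is only a sketch of the decomposition, not a worked solution: $P(M)$, $P(M^C)$ and $P(A \mid M^C)$ can be estimated from the poll table, whereas $P(A \mid M)$ is not observed and has to come from a modeling assumption.

$$P(A) = P(A \mid M)\,P(M) + P(A \mid M^C)\,P(M^C)$$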

```python
num_respondents = 1156
dates = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'June', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec', 'Jan2020']
data = {}
data['approve'] = [37, 40, 42, 43, 43, 43, 40, 39, 41, 42, 43, 43]
data['disapprove'] = [57, 55, 51, 52, 52, 52, 54, 55, 57, 54, 53, 53]
data['no_opinion'] = [7, 5, 8, 5, 5, 5, 6, 6, 2, 4, 4, 4]
```

In the cell below, import `pandas` and make a DataFrame object using the above poll data, with the `dates` list as the `index`.

Then, display the data by printing your DataFrame object.

**Hint:** Instead of using `print`, try using the DataFrame variable name alone on a single line at the end of the cell. It will look prettier :)
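
As a rough illustration of the pattern (not the exercise solution), the sketch below builds a DataFrame from a dictionary of lists and passes row labels through `index`. The names `toy_labels`, `toy_data` and `toy_df`, and the numbers in them, are hypothetical and are not the poll data.

```python
import pandas as pd

# Hypothetical toy data (NOT the poll numbers): two columns, three rows.
toy_labels = ['x', 'y', 'z']
toy_data = {'yes': [10, 20, 30], 'no': [90, 80, 70]}

# Each dictionary key becomes a column; `index` supplies the row labels.
toy_df = pd.DataFrame(toy_data, index=toy_labels)

# Rows can then be looked up by label instead of by position.
row_y = toy_df.loc['y']
```

In a notebook, ending the cell with `toy_df` on its own line would render the table.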

```python
import pandas as pd
polldf = # TODO
polldf
```

Using your DataFrame object created above, compute $P(M^C)$. See `pandas.DataFrame.sum` to sum rows or columns of the table.

**Hint:** Try accessing the DataFrame using its column names and then doing elementwise vector math. For example, use `polldf['approve'] / ...` instead of `for` loops.
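
To make the hint concrete, here is a minimal sketch of elementwise column arithmetic and of `DataFrame.sum` on a small hypothetical table. The table `toy_df` and its values are invented for illustration and are not the poll data, so this does not answer the exercise.

```python
import pandas as pd

# Hypothetical toy table (NOT the poll numbers).
toy_df = pd.DataFrame(
    {'yes': [10, 20, 30], 'no': [90, 80, 70]},
    index=['x', 'y', 'z'],
)

# Elementwise math on whole columns: one fraction per row, no for loop.
frac_yes = toy_df['yes'] / (toy_df['yes'] + toy_df['no'])

# A scalar combines with every element, e.g. to take a complement.
frac_not_yes = 1 - frac_yes

# sum(axis=1) adds across the columns of each row;
# sum() with the default axis=0 adds down each column.
row_totals = toy_df.sum(axis=1)
col_totals = toy_df.sum()
```

The results are `Series` objects that share the DataFrame's index, so they can be assigned back as new columns, for example `toy_df['frac_yes'] = frac_yes`.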

```python
# TODO
```

You know the drill :)

```python
# TODO
```

```python
polldf['P(A) w/ A.1'] = # TODO
polldf['P(A) w/ A.2'] = # TODO
polldf['P(A) w/ A.3'] = # TODO
polldf
```