Using data from this FiveThirtyEight post, write code to calculate the correlation of the responses from the poll. Respond to the story in your PR. Is this a good example of data journalism? Why or why not?
In [1]:
import pandas as pd
%matplotlib inline
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
In [4]:
df = pd.read_csv("data-iran.csv")
In [6]:
# Rename the columns to short identifiers that work in a model formula
df.columns = ['Group', 'Favor_Iran_Deal', 'Obama_Approve']
In [8]:
df.head()
Out[8]:
Calculate the correlation between favoring the Iran deal and approving of Obama
In [10]:
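# Select the numeric columns explicitly: 'Group' is a label column, and
# pandas 2.0+ raises an error when corr() is given non-numeric data
df[['Favor_Iran_Deal', 'Obama_Approve']].corr()['Favor_Iran_Deal']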
Out[10]:
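For reference, here is a minimal sketch of what that correlation call computes. It assumes the default Pearson method and uses made-up numbers in a hypothetical `demo` frame, not the poll's figures, so only the mechanics carry over.

```python
import numpy as np
import pandas as pd

# Hypothetical percentages for illustration only; these are NOT the poll's figures.
demo = pd.DataFrame({
    "Favor_Iran_Deal": [30.0, 45.0, 50.0, 62.0, 70.0],
    "Obama_Approve": [20.0, 40.0, 55.0, 65.0, 85.0],
})

x = demo["Obama_Approve"].to_numpy()
y = demo["Favor_Iran_Deal"].to_numpy()

# Pearson r: sum of products of deviations from the mean, divided by
# the square root of the product of the summed squared deviations.
x_dev = x - x.mean()
y_dev = y - y.mean()
r_manual = (x_dev * y_dev).sum() / np.sqrt((x_dev ** 2).sum() * (y_dev ** 2).sum())

# Series.corr uses Pearson by default, so the two values should agree.
r_pandas = demo["Obama_Approve"].corr(demo["Favor_Iran_Deal"])
print(r_manual, r_pandas)
```

A value near 1 means the two percentages rise together across groups; it does not say anything about why they do.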
In [12]:
lm = smf.ols(formula="Favor_Iran_Deal~Obama_Approve",data=df).fit()
In [13]:
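# Pull the coefficients out by name rather than relying on their order
intercept = lm.params['Intercept']
slope = lm.params['Obama_Approve']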
In [14]:
lm.summary()
Out[14]:
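With a single predictor, the R-squared reported in the regression summary should equal the square of the Pearson correlation. The sketch below checks that identity on made-up numbers (not the poll's figures). It swaps in `np.polyfit` for the statsmodels fit purely to keep the example dependency-light; both are ordinary least squares.

```python
import numpy as np

# Hypothetical values for illustration only; these are NOT the poll's figures.
x = np.array([20.0, 40.0, 55.0, 65.0, 85.0])  # stand-in for Obama_Approve
y = np.array([30.0, 45.0, 50.0, 62.0, 70.0])  # stand-in for Favor_Iran_Deal

# Least-squares line; polyfit returns coefficients from highest degree down.
slope, intercept = np.polyfit(x, y, 1)
fitted = slope * x + intercept

# R-squared: share of the variance in y explained by the fitted line.
ss_res = ((y - fitted) ** 2).sum()
ss_tot = ((y - y.mean()) ** 2).sum()
r_squared = 1 - ss_res / ss_tot

# Pearson correlation between x and y.
r = np.corrcoef(x, y)[0, 1]
print(r_squared, r ** 2)
```

This is why the correlation and the one-variable regression tell the same story here: the regression adds a slope and intercept, but no extra explanatory power.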
In [16]:
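# Scatter plot of the two measures, one point per group
df.plot(kind="scatter", x="Obama_Approve", y="Favor_Iran_Deal")
# Overlay the fitted regression line on the same axes
plt.plot(df["Obama_Approve"], slope * df["Obama_Approve"] + intercept, "-", color="red")
plt.xlabel('Approve of Obama')
plt.ylabel('Favors Iran Deal')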
Out[16]: