Assignment 4

Using data from this FiveThirtyEight post, write code to calculate the correlation between the responses from the poll. Respond to the story in your PR: is this a good example of data journalism? Why or why not?

http://fivethirtyeight.com/datalab/opinions-about-the-iran-deal-are-more-about-obama-than-iran/


In [1]:
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt        # plotting (needed for drawing the regression line)
import statsmodels.formula.api as smf  # package we'll be using for linear regression
%matplotlib inline
plt.style.use('ggplot')


matplotlib.rcParams['ps.fonttype'] = 42

In [2]:
df = pd.read_excel("iran_obama.xlsx")
df.head()


Out[2]:
             Iran_Deal  Approval_for_Obama
Democrats           60                  78
Republicans         34                  10
Black               54                  85
White               45                  35
Ind                 44                  37

In [10]:
lm = smf.ols(formula="Iran_Deal~Approval_for_Obama",data=df).fit() #notice the formula regresses Y on X (Y~X)
lm.params


Out[10]:
Intercept             30.730004
Approval_for_Obama     0.354593
dtype: float64
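The assignment asks for the correlation itself, not just the regression line. A minimal sketch with pandas, using only the five rows shown in `df.head()` above (the full spreadsheet has 11 groups, so the value on the complete data will differ somewhat):

```python
import pandas as pd

# The five groups visible in df.head() above; illustration only,
# not the full 11-row dataset.
sample = pd.DataFrame(
    {"Iran_Deal": [60, 34, 54, 45, 44],
     "Approval_for_Obama": [78, 10, 85, 35, 37]},
    index=["Democrats", "Republicans", "Black", "White", "Ind"],
)

# Series.corr computes the Pearson correlation by default.
r = sample["Iran_Deal"].corr(sample["Approval_for_Obama"])
print(round(r, 3))  # 0.95 on these five rows
```

On the full data, `df["Iran_Deal"].corr(df["Approval_for_Obama"])` gives the correlation directly, with no need to fit a model first.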

In [11]:
intercept, slope = lm.params

In [12]:
fig, ax = plt.subplots(figsize=(10,5))
df.plot(kind="scatter",x="Approval_for_Obama",y="Iran_Deal", ax=ax)
plt.plot(df["Approval_for_Obama"],slope*df["Approval_for_Obama"]+intercept,"-",color="red")
ax.set_ylim([0, 100])
ax.set_title("Predict Feelings on Iran Deal", color='brown')
ax.set_xlabel('Approval for Obama', color='brown')
ax.set_ylabel('Favour Iran Deal', color='brown')


Out[12]:
<matplotlib.text.Text at 0x9b28d90>

In [6]:
lm.summary()


c:\users\radhika\appdata\local\programs\python\python35-32\lib\site-packages\scipy\stats\stats.py:1535: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=11
  "anyway, n=%i" % int(n))
Out[6]:
OLS Regression Results
Dep. Variable: Approval_for_Obama R-squared: 0.849
Model: OLS Adj. R-squared: 0.833
Method: Least Squares F-statistic: 50.70
Date: Tue, 26 Jul 2016 Prob (F-statistic): 5.54e-05
Time: 23:53:00 Log-Likelihood: -38.948
No. Observations: 11 AIC: 81.90
Df Residuals: 9 BIC: 82.69
Df Model: 1
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
Intercept -66.5279 16.172 -4.114 0.003 -103.111 -29.945
Iran_Deal 2.3950 0.336 7.121 0.000 1.634 3.156
Omnibus: 11.277 Durbin-Watson: 2.838
Prob(Omnibus): 0.004 Jarque-Bera (JB): 5.627
Skew: 1.476 Prob(JB): 0.0600
Kurtosis: 4.889 Cond. No. 280.
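In a one-predictor OLS fit, R-squared is the square of the Pearson correlation between the two variables (and R-squared is the same whichever variable is treated as dependent, so the orientation of the summary above doesn't matter). The correlation can therefore be read off the summary:

```python
import math

# R-squared reported in the OLS summary above; in simple regression this is
# the squared Pearson correlation between the two variables.
r_squared = 0.849
r = math.sqrt(r_squared)  # the fitted slope is positive, so r is positive
print(round(r, 3))  # 0.921
```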

In [7]:
neg_df = pd.read_excel("negiran_obama.xlsx")
neg_df.head()


Out[7]:
             Iran_neg  Approval_for_Obama
Democrats          29                  78
Republicans         4                  10
Black              37                  85
White              11                  35
Ind                15                  37

In [13]:
neg_lm = smf.ols(formula="Iran_neg~Approval_for_Obama",data=neg_df).fit() #notice the formula regresses Y on X (Y~X)
neg_lm.params


Out[13]:
Intercept            -0.762126
Approval_for_Obama    0.399968
dtype: float64

In [14]:
intercept, slope = neg_lm.params

In [15]:
fig, ax = plt.subplots(figsize=(10,5))
neg_df.plot(kind="scatter",x="Approval_for_Obama",y="Iran_neg", ax=ax)
plt.plot(neg_df["Approval_for_Obama"],slope*neg_df["Approval_for_Obama"]+intercept,"-",color="red")
ax.set_ylim([0, 100])
ax.set_title("Predict Confidence in Iran Negotiations", color='brown')
ax.set_xlabel('Approval for Obama', color='brown')
ax.set_ylabel('Confident in Negotiations with Iran', color='brown')


Out[15]:
<matplotlib.text.Text at 0x9b3cb90>

In [16]:
neg_lm.summary()


c:\users\radhika\appdata\local\programs\python\python35-32\lib\site-packages\scipy\stats\stats.py:1535: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=11
  "anyway, n=%i" % int(n))
Out[16]:
OLS Regression Results
Dep. Variable: Iran_neg R-squared: 0.948
Model: OLS Adj. R-squared: 0.942
Method: Least Squares F-statistic: 163.1
Date: Thu, 28 Jul 2016 Prob (F-statistic): 4.52e-07
Time: 11:18:50 Log-Likelihood: -23.340
No. Observations: 11 AIC: 50.68
Df Residuals: 9 BIC: 51.48
Df Model: 1
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
Intercept -0.7621 1.616 -0.472 0.648 -4.418 2.893
Approval_for_Obama 0.4000 0.031 12.772 0.000 0.329 0.471
Omnibus: 0.645 Durbin-Watson: 2.464
Prob(Omnibus): 0.724 Jarque-Bera (JB): 0.571
Skew: 0.153 Prob(JB): 0.752
Kurtosis: 1.927 Cond. No. 124.
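The same identity (in a one-predictor OLS, R-squared equals the squared Pearson correlation) recovers the correlation for this framing too, using R-squared = 0.948 from the summary above:

```python
import math

# R-squared from the negotiations regression summary above.
r = math.sqrt(0.948)  # positive slope, so r is positive
print(round(r, 3))  # 0.974
```

At roughly 0.97 versus 0.92, confidence in the negotiations tracks Obama approval even more tightly than support for the deal does, which is consistent with the story's framing that these opinions are more about Obama than about Iran.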

In [ ]: