Exercise 3.9


In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf

3.9 a


In [2]:
auto = pd.read_csv('data/Auto.csv', na_values=['?'])  # the CSV uses '?' for missing values
subset = auto.dropna()                                # drop rows containing any NaN
sns.pairplot(subset)  # pairplot doesn't handle NaNs, so plot the NaN-free subset


Out[2]:
[Figure: scatterplot matrix (pairplot) of the numeric Auto variables]
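Why `na_values=['?']` matters: the sketch below reads a hypothetical three-row CSV (made-up values, not the real Auto data) twice. Without the argument, the '?' keeps the horsepower column non-numeric; with it, '?' becomes NaN, the column parses as numbers, and `dropna()` removes the incomplete row.

```python
import io

import pandas as pd

# Hypothetical toy CSV that uses '?' for a missing horsepower value, like Auto.csv.
csv_text = """mpg,horsepower,name
18.0,130,car a
25.0,?,car b
15.0,165,car c
"""

# Without na_values, '?' is read as text, so the column is not numeric.
raw = pd.read_csv(io.StringIO(csv_text))
print(pd.api.types.is_numeric_dtype(raw['horsepower']))    # False

# With na_values=['?'], '?' becomes NaN and the column parses as float.
clean = pd.read_csv(io.StringIO(csv_text), na_values=['?'])
print(pd.api.types.is_numeric_dtype(clean['horsepower']))  # True

# dropna() then removes the row with the missing value.
subset = clean.dropna()
print(len(clean), len(subset))                             # 3 2
```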

3.9 b


In [3]:
auto.drop(columns='name').corr()  # 'name' is qualitative, so exclude it from the correlations


Out[3]:
                    mpg  cylinders  displacement  horsepower     weight  acceleration       year     origin
mpg            1.000000  -0.776260     -0.804443   -0.778427  -0.831739      0.422297   0.581469   0.563698
cylinders     -0.776260   1.000000      0.950920    0.842983   0.897017     -0.504061  -0.346717  -0.564972
displacement  -0.804443   0.950920      1.000000    0.897257   0.933104     -0.544162  -0.369804  -0.610664
horsepower    -0.778427   0.842983      0.897257    1.000000   0.864538     -0.689196  -0.416361  -0.455171
weight        -0.831739   0.897017      0.933104    0.864538   1.000000     -0.419502  -0.307900  -0.581265
acceleration   0.422297  -0.504061     -0.544162   -0.689196  -0.419502      1.000000   0.282901   0.210084
year           0.581469  -0.346717     -0.369804   -0.416361  -0.307900      0.282901   1.000000   0.184314
origin         0.563698  -0.564972     -0.610664   -0.455171  -0.581265      0.210084   0.184314   1.000000
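To read the table from the response's point of view, one option is to sort the mpg correlations by absolute value. The sketch below copies the mpg row from Out[3] into a Series so it runs without the CSV; with the data loaded, the `'mpg'` column of the correlation matrix would serve the same purpose.

```python
import pandas as pd

# Correlations with mpg, copied from the Out[3] table.
mpg_corr = pd.Series({
    'cylinders': -0.776260,
    'displacement': -0.804443,
    'horsepower': -0.778427,
    'weight': -0.831739,
    'acceleration': 0.422297,
    'year': 0.581469,
    'origin': 0.563698,
})

# Order predictors by strength of linear association, ignoring the sign.
ranked = mpg_corr.sort_values(key=lambda s: s.abs(), ascending=False)
print(ranked)
```

The top four (weight, displacement, horsepower, cylinders) are all negatively correlated with mpg and, per the same table, strongly correlated with one another (cylinders and displacement at 0.95), which is worth keeping in mind when reading the multiple regression in part (c).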

3.9 c


In [4]:
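# Regress mpg on every other column except the qualitative 'name'.
# Index.difference returns the names sorted, hence the alphabetical coefficient order.
formula = 'mpg ~ ' + ' + '.join(auto.columns.difference(['mpg', 'name']))
model = smf.ols(formula, data=auto)  # rows with NaN (missing horsepower) are dropped
fit = model.fit()
print(fit.summary())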


                            OLS Regression Results                            
==============================================================================
Dep. Variable:                    mpg   R-squared:                       0.821
Model:                            OLS   Adj. R-squared:                  0.818
Method:                 Least Squares   F-statistic:                     252.4
Date:                Sun, 09 Aug 2015   Prob (F-statistic):          2.04e-139
Time:                        13:14:41   Log-Likelihood:                -1023.5
No. Observations:                 392   AIC:                             2063.
Df Residuals:                     384   BIC:                             2095.
Df Model:                           7                                         
Covariance Type:            nonrobust                                         
================================================================================
                   coef    std err          t      P>|t|      [95.0% Conf. Int.]
--------------------------------------------------------------------------------
Intercept      -17.2184      4.644     -3.707      0.000       -26.350    -8.087
acceleration     0.0806      0.099      0.815      0.415        -0.114     0.275
cylinders       -0.4934      0.323     -1.526      0.128        -1.129     0.142
displacement     0.0199      0.008      2.647      0.008         0.005     0.035
horsepower      -0.0170      0.014     -1.230      0.220        -0.044     0.010
origin           1.4261      0.278      5.127      0.000         0.879     1.973
weight          -0.0065      0.001     -9.929      0.000        -0.008    -0.005
year             0.7508      0.051     14.729      0.000         0.651     0.851
==============================================================================
Omnibus:                       31.906   Durbin-Watson:                   1.309
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               53.100
Skew:                           0.529   Prob(JB):                     2.95e-12
Kurtosis:                       4.460   Cond. No.                     8.59e+04
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 8.59e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
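The F-statistic in the summary tests the null hypothesis that all seven slope coefficients are zero, i.e. whether there is any relationship between the predictors and mpg. As a sanity check, it can be recovered from R² via F = (R²/p) / ((1 - R²)/(n - p - 1)). The sketch below plugs in the values printed in the summary (R² is rounded to three decimals there, so the result matches 252.4 only approximately) and uses `scipy.stats.f` for the upper-tail probability.

```python
from scipy import stats

# Values read off the regression summary; R^2 is rounded in the printout.
r2, n, p = 0.821, 392, 7

# Overall F-statistic for H0: all p slopes are zero (model with intercept).
f_stat = (r2 / p) / ((1 - r2) / (n - p - 1))

# Upper-tail probability of the F(p, n - p - 1) distribution.
p_value = stats.f.sf(f_stat, p, n - p - 1)

print(round(f_stat, 1))  # 251.6
print(p_value)
```

The tiny p-value agrees with the summary's Prob (F-statistic): there is strong evidence that at least one predictor is related to mpg.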
