3.4 Statistical inference


In [1]:
from __future__ import print_function, division
%matplotlib inline

import matplotlib
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# use matplotlib style sheet
plt.style.use('ggplot')

# import statsmodels for R-style regression
import statsmodels.formula.api as smf

Read the data

Data are in the child.iq directory of the ARM_Data download; you may have to change the path used below to reflect the location on your computer.


In [2]:
kidiq = pd.read_stata("../../ARM_Data/child.iq/kidiq.dta")
kidiq.head()


Out[2]:
kid_score mom_hs mom_iq mom_work mom_age
0 65 1 121.117529 4 27
1 98 1 89.361882 4 25
2 85 1 115.443165 4 27
3 83 1 99.449639 3 25
4 115 1 92.745710 4 27

Regression, to demonstrate reports of fit, Pg 38


In [3]:
fit = smf.ols('kid_score ~ mom_hs + mom_iq', data=kidiq).fit()

Display, Pg 38

There is no Python counterpart to R's display function, but we can quickly write one. The idea is to report an intermediate amount of information about the fit, the level the authors prefer.


In [4]:
def display(f):
    """Replicate R-style display command."""

    output = "{:<12s}  {:>10s}   {:>10s}\n".format("", "coef.est", "coef.se")
    for p in f.bse.index:
        output += "{:<12s}  {:>10.2f}   {:>10.2f}\n".format(p, f.params[p],
                                                            f.bse[p])

    output += "---\n"
    output += "n = {}, k = {}\n".format(int(f.nobs), int(f.df_model) + 1)

    # residual sd from Pg 41
    resid_sd = np.sqrt(np.sum(f.resid**2) / (f.nobs - f.df_model - 1))
    output += "residual sd = {:.2f}, R-squared = {:.2f}\n".format(resid_sd,
                                                                  f.rsquared)

    print(output)

Now, use it:


In [5]:
display(fit)


                coef.est      coef.se
Intercept          25.73         5.88
mom_hs              5.95         2.21
mom_iq              0.56         0.06
---
n = 434, k = 3
residual sd = 18.14, R-squared = 0.21
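As a sanity check on the formulas inside display, here is a minimal numpy-only sketch (on synthetic data, not the kidiq dataset) showing that the residual sd is sqrt(SSR / (n - k)) and R-squared is one minus the residual sum of squares over the total sum of squares:

```python
import numpy as np

# synthetic regression, n = 50 points, k = 3 coefficients (incl. intercept)
rng = np.random.RandomState(0)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.randn(n), rng.randn(n)])
y = X @ np.array([25.0, 6.0, 0.5]) + rng.randn(n) * 2.0

# least-squares fit and residuals
beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

resid_sd = np.sqrt(np.sum(resid**2) / (n - k))                # sqrt(SSR / (n - k))
r_squared = 1 - np.sum(resid**2) / np.sum((y - y.mean())**2)  # 1 - SSR / SST
```

With a noise sd of 2, resid_sd should come out close to 2 and R-squared close to 0.9, since the signal variance dominates the noise.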

Print, Pg 39

Too little information? Just the parameters.


In [6]:
print(fit.params)


Intercept    25.731538
mom_hs        5.950117
mom_iq        0.563906
dtype: float64
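Between the bare parameters and the full summary, the fit object exposes its other pieces individually (params, bse, conf_int, pvalues are all standard statsmodels results attributes). A quick sketch on synthetic data, so it runs without the kidiq file; sim_fit is used to avoid clobbering the fit above:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# synthetic data: y = 1 + 2x + noise
rng = np.random.RandomState(0)
df = pd.DataFrame({"x": rng.randn(100)})
df["y"] = 1.0 + 2.0 * df["x"] + rng.randn(100)

sim_fit = smf.ols("y ~ x", data=df).fit()
print(sim_fit.bse)         # standard errors only
print(sim_fit.conf_int())  # 95% confidence intervals
print(sim_fit.pvalues)     # t-test p-values
```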

Summary, Pg 38

Too much information? A ton of (undesirable?) information.


In [7]:
print(fit.summary())


                            OLS Regression Results                            
==============================================================================
Dep. Variable:              kid_score   R-squared:                       0.214
Model:                            OLS   Adj. R-squared:                  0.210
Method:                 Least Squares   F-statistic:                     58.72
Date:                Thu, 30 Jul 2015   Prob (F-statistic):           2.79e-23
Time:                        15:30:59   Log-Likelihood:                -1872.0
No. Observations:                 434   AIC:                             3750.
Df Residuals:                     431   BIC:                             3762.
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept     25.7315      5.875      4.380      0.000        14.184    37.279
mom_hs         5.9501      2.212      2.690      0.007         1.603    10.297
mom_iq         0.5639      0.061      9.309      0.000         0.445     0.683
==============================================================================
Omnibus:                        7.327   Durbin-Watson:                   1.625
Prob(Omnibus):                  0.026   Jarque-Bera (JB):                7.530
Skew:                          -0.313   Prob(JB):                       0.0232
Kurtosis:                       2.845   Cond. No.                         683.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
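That warning means the default ("nonrobust") standard errors assume homoskedastic errors. A minimal numpy sketch, on synthetic heteroskedastic data rather than the kidiq dataset, comparing the classical SEs with HC0 sandwich SEs, which drop that assumption:

```python
import numpy as np

# synthetic data with noise sd growing in |x| (heteroskedastic)
rng = np.random.RandomState(1)
n = 200
x = rng.randn(n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.randn(n) * (1 + np.abs(x))

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# classical SEs: sqrt(diag(s^2 (X'X)^-1))
s2 = resid @ resid / (n - X.shape[1])
se_classical = np.sqrt(np.diag(s2 * XtX_inv))

# HC0 sandwich SEs: sqrt(diag((X'X)^-1 X' diag(e^2) X (X'X)^-1))
meat = X.T @ (X * resid[:, None] ** 2)
se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
```

In statsmodels the same thing is available directly by passing cov_type (e.g. 'HC0') to .fit(), with no hand-rolled algebra needed.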