3.4 Statistical inference


In [1]:
from __future__ import print_function, division
%matplotlib inline

import matplotlib
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# use matplotlib style sheet
plt.style.use('ggplot')

# import statsmodels for R-style regression
import statsmodels.formula.api as smf

Read the data

Data are in the child.iq directory of the ARM_Data download; you may have to change the path used below to reflect the location on your computer.


In [2]:
kidiq = pd.read_stata("../../ARM_Data/child.iq/kidiq.dta")
kidiq.head()


Out[2]:
kid_score mom_hs mom_iq mom_work mom_age
0 65 1 121.117529 4 27
1 98 1 89.361882 4 25
2 85 1 115.443165 4 27
3 83 1 99.449639 3 25
4 115 1 92.745710 4 27

Regression, to demonstrate reports of fit, Pg 38


In [3]:
fit = smf.ols('kid_score ~ mom_hs + mom_iq', data=kidiq).fit()

Display, Pg 38

There is no Python counterpart to R's display function, but we can quickly write one. The idea is to report an intermediate amount of information about the fit, the level the authors prefer.


In [4]:
def display(f):
    """Replicate R-style display command."""

    output = "{:<12s}  {:>10s}   {:>10s}\n".format("", "coef.est", "coef.se")
    for p in f.bse.index:
        output += "{:<12s}  {:>10.2f}   {:>10.2f}\n".format(p, f.params[p],
                                                            f.bse[p])

    output += "---\n"
    output += "n = {}, k = {}\n".format(int(f.nobs), int(f.df_model) + 1)

    # residual sd from Pg 41
    resid_sd = np.sqrt(np.sum(f.resid**2) / (f.nobs - f.df_model - 1))
    output += "residual sd = {:.2f}, R-squared = {:.2f}\n".format(resid_sd,
                                                                  f.rsquared)

    print(output)

Now, use it:


In [5]:
display(fit)


                coef.est      coef.se
Intercept          25.73         5.88
mom_hs              5.95         2.21
mom_iq              0.56         0.06
---
n = 434, k = 3
residual sd = 18.14, R-squared = 0.21
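As a sanity check on the formulas inside display, here is a minimal numpy-only sketch (on synthetic data, not the kidiq dataset) showing that the residual sd is sqrt(SSR / (n - k)) and R-squared is one minus the residual sum of squares over the total sum of squares:

```python
import numpy as np

# synthetic regression, n = 50 points, k = 3 coefficients (incl. intercept)
rng = np.random.RandomState(0)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.randn(n), rng.randn(n)])
y = X @ np.array([25.0, 6.0, 0.5]) + rng.randn(n) * 2.0

# least-squares fit and residuals
beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

resid_sd = np.sqrt(np.sum(resid**2) / (n - k))                # sqrt(SSR / (n - k))
r_squared = 1 - np.sum(resid**2) / np.sum((y - y.mean())**2)  # 1 - SSR / SST
```

With a noise sd of 2, resid_sd should come out close to 2 and R-squared close to 0.9, since the signal variance dominates the noise.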

Print, Pg 39

Too little information? Just the parameters.


In [6]:
print(fit.params)


Intercept    25.731538
mom_hs        5.950117
mom_iq        0.563906
dtype: float64
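Between the bare parameters and the full summary, the fit object exposes its other pieces individually (params, bse, conf_int, pvalues are all standard statsmodels results attributes). A quick sketch on synthetic data, so it runs without the kidiq file; sim_fit is used to avoid clobbering the fit above:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# synthetic data: y = 1 + 2x + noise
rng = np.random.RandomState(0)
df = pd.DataFrame({"x": rng.randn(100)})
df["y"] = 1.0 + 2.0 * df["x"] + rng.randn(100)

sim_fit = smf.ols("y ~ x", data=df).fit()
print(sim_fit.bse)         # standard errors only
print(sim_fit.conf_int())  # 95% confidence intervals
print(sim_fit.pvalues)     # t-test p-values
```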

Summary, Pg 38

Too much information? A ton of (undesirable?) information.


In [7]:
print(fit.summary())


                            OLS Regression Results                            
==============================================================================
Dep. Variable:              kid_score   R-squared:                       0.214
Model:                            OLS   Adj. R-squared:                  0.210
Method:                 Least Squares   F-statistic:                     58.72
Date:                Thu, 30 Jul 2015   Prob (F-statistic):           2.79e-23
Time:                        15:30:59   Log-Likelihood:                -1872.0
No. Observations:                 434   AIC:                             3750.
Df Residuals:                     431   BIC:                             3762.
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept     25.7315      5.875      4.380      0.000        14.184    37.279
mom_hs         5.9501      2.212      2.690      0.007         1.603    10.297
mom_iq         0.5639      0.061      9.309      0.000         0.445     0.683
==============================================================================
Omnibus:                        7.327   Durbin-Watson:                   1.625
Prob(Omnibus):                  0.026   Jarque-Bera (JB):                7.530
Skew:                          -0.313   Prob(JB):                       0.0232
Kurtosis:                       2.845   Cond. No.                         683.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
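That warning means the default ("nonrobust") standard errors assume homoskedastic errors. A minimal numpy sketch, on synthetic heteroskedastic data rather than the kidiq dataset, comparing the classical SEs with HC0 sandwich SEs, which drop that assumption:

```python
import numpy as np

# synthetic data with noise sd growing in |x| (heteroskedastic)
rng = np.random.RandomState(1)
n = 200
x = rng.randn(n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.randn(n) * (1 + np.abs(x))

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# classical SEs: sqrt(diag(s^2 (X'X)^-1))
s2 = resid @ resid / (n - X.shape[1])
se_classical = np.sqrt(np.diag(s2 * XtX_inv))

# HC0 sandwich SEs: sqrt(diag((X'X)^-1 X' diag(e^2) X (X'X)^-1))
meat = X.T @ (X * resid[:, None] ** 2)
se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
```

In statsmodels the same thing is available directly by passing cov_type (e.g. 'HC0') to .fit(), with no hand-rolled algebra needed.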