Multiple Regression Analysis (重回帰分析)

Author: Yoshimasa Ogawa
LastModified: 2015-12-23

This notebook works through Chapter 2, "Multiple Regression Analysis," of 圓川隆夫『多変量のデータ解析』(Takao Enkawa, *Multivariate Data Analysis*, Asakura Shoten, 1988) in Python.


In [1]:
%matplotlib inline
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

In [2]:
# Load the data
data = pd.read_csv('data/tab26.csv')
data.head()


Out[2]:
y x1 x2 x3 x4
0 2.9 2 0 0 1
1 10.5 8 1 0 3
2 7.5 6 1 1 2
3 6.5 4 1 0 1
4 6.0 3 1 1 1

In [3]:
# Set the explanatory variables
X = data[['x1', 'x2', 'x3', 'x4']]
X = sm.add_constant(X)
# Set the response variable
Y = data['y']

In [4]:
# Run OLS
model1 = sm.OLS(Y, X)
results1 = model1.fit()
results1.summary()


Out[4]:
OLS Regression Results
Dep. Variable: y R-squared: 0.897
Model: OLS Adj. R-squared: 0.814
Method: Least Squares F-statistic: 10.86
Date: Wed, 23 Dec 2015 Prob (F-statistic): 0.0111
Time: 00:28:14 Log-Likelihood: -9.7935
No. Observations: 10 AIC: 29.59
Df Residuals: 5 BIC: 31.10
Df Model: 4
Covariance Type: nonrobust
coef std err t P>|t| [95.0% Conf. Int.]
const 2.2947 0.904 2.539 0.052 -0.029 4.618
x1 0.5771 0.395 1.463 0.203 -0.437 1.591
x2 1.4843 0.723 2.053 0.095 -0.374 3.343
x3 -0.8356 0.707 -1.182 0.290 -2.653 0.981
x4 0.5600 1.151 0.487 0.647 -2.398 3.518
Omnibus: 0.141 Durbin-Watson: 2.294
Prob(Omnibus): 0.932 Jarque-Bera (JB): 0.156
Skew: 0.162 Prob(JB): 0.925
Kurtosis: 2.483 Cond. No. 28.8

In [5]:
# Set the explanatory variables
X = data[['x1', 'x2', 'x3']]
X = sm.add_constant(X)
# Run OLS
model2 = sm.OLS(Y, X)
results2 = model2.fit()
results2.summary()


Out[5]:
OLS Regression Results
Dep. Variable: y R-squared: 0.892
Model: OLS Adj. R-squared: 0.838
Method: Least Squares F-statistic: 16.51
Date: Wed, 23 Dec 2015 Prob (F-statistic): 0.00265
Time: 00:28:14 Log-Likelihood: -10.025
No. Observations: 10 AIC: 28.05
Df Residuals: 6 BIC: 29.26
Df Model: 3
Covariance Type: nonrobust
coef std err t P>|t| [95.0% Conf. Int.]
const 2.5003 0.746 3.350 0.015 0.674 4.327
x1 0.7599 0.112 6.765 0.001 0.485 1.035
x2 1.3900 0.651 2.136 0.077 -0.202 2.982
x3 -0.8633 0.658 -1.311 0.238 -2.474 0.747
Omnibus: 0.107 Durbin-Watson: 2.260
Prob(Omnibus): 0.948 Jarque-Bera (JB): 0.300
Skew: 0.158 Prob(JB): 0.861
Kurtosis: 2.213 Cond. No. 19.5

In [6]:
# Set the explanatory variables
X = data[['x1', 'x2']]
X = sm.add_constant(X)
# Run OLS
model3 = sm.OLS(Y, X)
results3 = model3.fit()
results3.summary()


Out[6]:
OLS Regression Results
Dep. Variable: y R-squared: 0.861
Model: OLS Adj. R-squared: 0.821
Method: Least Squares F-statistic: 21.67
Date: Wed, 23 Dec 2015 Prob (F-statistic): 0.00100
Time: 00:28:14 Log-Likelihood: -11.285
No. Observations: 10 AIC: 28.57
Df Residuals: 7 BIC: 29.48
Df Model: 2
Covariance Type: nonrobust
coef std err t P>|t| [95.0% Conf. Int.]
const 2.6155 0.778 3.360 0.012 0.775 4.456
x1 0.7369 0.117 6.324 0.000 0.461 1.012
x2 1.0233 0.617 1.658 0.141 -0.436 2.483
Omnibus: 0.240 Durbin-Watson: 2.132
Prob(Omnibus): 0.887 Jarque-Bera (JB): 0.396
Skew: -0.021 Prob(JB): 0.820
Kurtosis: 2.026 Cond. No. 18.0

In [7]:
# Set the explanatory variable
X = data[['x1']]
X = sm.add_constant(X)
# Run OLS
model4 = sm.OLS(Y, X)
results4 = model4.fit()
results4.summary()


Out[7]:
OLS Regression Results
Dep. Variable: y R-squared: 0.806
Model: OLS Adj. R-squared: 0.782
Method: Least Squares F-statistic: 33.31
Date: Wed, 23 Dec 2015 Prob (F-statistic): 0.000419
Time: 00:28:14 Log-Likelihood: -12.942
No. Observations: 10 AIC: 29.88
Df Residuals: 8 BIC: 30.49
Df Model: 1
Covariance Type: nonrobust
coef std err t P>|t| [95.0% Conf. Int.]
const 3.3053 0.726 4.551 0.002 1.630 4.980
x1 0.7421 0.129 5.771 0.000 0.446 1.039
Omnibus: 1.684 Durbin-Watson: 2.321
Prob(Omnibus): 0.431 Jarque-Bera (JB): 0.731
Skew: -0.652 Prob(JB): 0.694
Kurtosis: 2.766 Cond. No. 13.5

In [8]:
# Model selection
criteria = pd.DataFrame(index=['results1', 'results2', 'results3', 'results4'])
criteria["AIC"] = [results1.aic, results2.aic, results3.aic, results4.aic]
criteria["BIC"] = [results1.bic, results2.bic, results3.bic, results4.bic]
criteria


Out[8]:
AIC BIC
results1 29.587070 31.099996
results2 28.049772 29.260113
results3 28.570178 29.477933
results4 29.883355 30.488525
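
Per the table, results2 (x1, x2, x3) minimizes both AIC and BIC. The row label of the minimum can be read off directly with `idxmin`; restating the table values as a standalone check:

```python
import pandas as pd

# Criterion values copied from the table above
criteria = pd.DataFrame(
    {'AIC': [29.587070, 28.049772, 28.570178, 29.883355],
     'BIC': [31.099996, 29.260113, 29.477933, 30.488525]},
    index=['results1', 'results2', 'results3', 'results4'])

best = criteria.idxmin()  # index label of the minimum in each column
print(best)
```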