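# Economics of Education: Assignment 1

## "Regression Analysis of the University Enrollment Rate"

(Source) Computed from 「家計調査結果」 (Family Income and Expenditure Survey results; Statistics Bureau, Ministry of Internal Affairs and Communications), 「日本の統計 2015」 (Statistics Bureau, Ministry of Internal Affairs and Communications), and surveys by the Ministry of Education, Culture, Sports, Science and Technology.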

``````

In [1]:

%matplotlib inline

``````
``````

In [2]:

# -*- coding:utf-8 -*-
from __future__ import print_function
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from scipy import stats

``````
``````

In [3]:

# Load the data (indexed by year) and take logs of the explanatory variables
df = pd.read_csv("domestic.csv", index_col='year', dtype='float')
df['log_income'] = np.log(df['income'])
df['log_pay'] = np.log(df['pay'])
df['log_pop'] = np.log(df['pop'])

``````
``````

In [4]:

# Summary statistics
df[['enroll', 'log_income', 'log_pay', 'log_pop']].describe()

``````
``````

Out[4]:

          enroll  log_income    log_pay    log_pop
count  47.000000   47.000000  47.000000  47.000000
mean   37.798936   12.864880  13.588054  14.342272
std    11.082234    0.234680   0.341217   0.165470
min    15.660000   12.275375  12.826275  14.008883
25%    34.905000   12.824653  13.230646  14.249943
50%    37.610000   12.949692  13.685634  14.307435
75%    48.390000   13.037063  13.893989  14.448059
max    56.200000   13.089220  13.984339  14.728288

``````
``````

In [5]:

# Correlation matrix of the explanatory variables
df[['log_income', 'log_pay', 'log_pop']].corr()

``````
``````

Out[5]:

            log_income   log_pay   log_pop
log_income    1.000000  0.857604 -0.401428
log_pay       0.857604  1.000000 -0.383214
log_pop      -0.401428 -0.383214  1.000000

``````
``````

In [6]:

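# Simple regression: university enrollment rate on log disposable income
# Explanatory variable (add_constant supplies the intercept reported as `const` below)
X = sm.add_constant(df[['log_income']])
# Dependent variable
Y = df['enroll']
# Run OLS (Ordinary Least Squares)
model1 = sm.OLS(Y, X)
results1 = model1.fit()
print(results1.summary())
# Scatter plot of the data with the fitted line
plt.scatter(df['log_income'], df['enroll'])
plt.plot(df['log_income'], results1.predict())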

``````
``````

OLS Regression Results
==============================================================================
Dep. Variable:                 enroll   R-squared:                       0.780
Model:                            OLS   Adj. R-squared:                  0.775
Method:                 Least Squares   F-statistic:                     159.3
Date:                Sun, 25 Oct 2015   Prob (F-statistic):           2.20e-16
Time:                        23:39:17   Log-Likelihood:                -143.69
No. Observations:                  47   AIC:                             291.4
Df Residuals:                      45   BIC:                             295.1
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const       -498.6344     42.515    -11.728      0.000      -584.264  -413.004
log_income    41.6975      3.304     12.620      0.000        35.042    48.353
==============================================================================
Omnibus:                        4.017   Durbin-Watson:                   0.128
Prob(Omnibus):                  0.134   Jarque-Bera (JB):                3.172
Skew:                           0.628   Prob(JB):                        0.205
Kurtosis:                       3.211   Cond. No.                         717.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Out[6]:

[<matplotlib.lines.Line2D at 0x108e2dd10>]

``````
``````

In [7]:

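# Simple regression: university enrollment rate on log first-year fees
# Explanatory variable (add_constant supplies the intercept reported as `const` below)
X = sm.add_constant(df[['log_pay']])
# Dependent variable
Y = df['enroll']
# Run OLS (Ordinary Least Squares)
model2 = sm.OLS(Y, X)
results2 = model2.fit()
print(results2.summary())
# Scatter plot of the data with the fitted line
plt.scatter(df['log_pay'], df['enroll'])
plt.plot(df['log_pay'], results2.predict())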

``````
``````

OLS Regression Results
==============================================================================
Dep. Variable:                 enroll   R-squared:                       0.722
Model:                            OLS   Adj. R-squared:                  0.716
Method:                 Least Squares   F-statistic:                     117.0
Date:                Sun, 25 Oct 2015   Prob (F-statistic):           4.18e-14
Time:                        23:39:17   Log-Likelihood:                -149.13
No. Observations:                  47   AIC:                             302.3
Df Residuals:                      45   BIC:                             306.0
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const       -337.2651     34.681     -9.725      0.000      -407.116  -267.414
log_pay       27.6025      2.552     10.818      0.000        22.463    32.742
==============================================================================
Omnibus:                        2.065   Durbin-Watson:                   0.180
Prob(Omnibus):                  0.356   Jarque-Bera (JB):                1.438
Skew:                           0.424   Prob(JB):                        0.487
Kurtosis:                       3.127   Cond. No.                         550.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Out[7]:

[<matplotlib.lines.Line2D at 0x1094ee1d0>]

``````
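Because `enroll` is measured in percent and the regressor is in logs, the slope has a level-log reading: a 1% rise in first-year fees is associated with a change of roughly `coef / 100` percentage points in the enrollment rate. The short calculation below works this out for the slope reported above. It describes an association in the fitted line, not a causal effect, and `effect_of_pct_change` is a hypothetical helper name.

```python
import math

# Slope on log_pay copied from the OLS summary above
beta_log_pay = 27.6025


def effect_of_pct_change(beta, pct):
    """Change in the dependent variable when the (unlogged) regressor rises by pct percent."""
    return beta * math.log(1.0 + pct / 100.0)


exact_1pct = effect_of_pct_change(beta_log_pay, 1.0)
approx_1pct = beta_log_pay / 100.0   # common rule-of-thumb approximation
print(exact_1pct, approx_1pct)
```

Both values are close to 0.27 to 0.28 percentage points; the rule of thumb slightly overstates the exact figure because `log(1.01)` is a little less than 0.01.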
``````

In [8]:

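# Simple regression: university enrollment rate on population
# Explanatory variable. Note: `log_pop` is already log(pop), so np.log here takes
# the log a second time; the output below was produced with this regressor.
X = sm.add_constant(np.log(df[['log_pop']]))
# Dependent variable
Y = df['enroll']
# Run OLS (Ordinary Least Squares)
model3 = sm.OLS(Y, X)
results3 = model3.fit()
print(results3.summary())
# Scatter plot of the data with the fitted line
plt.scatter(np.log(df['log_pop']), df['enroll'])
plt.plot(np.log(df['log_pop']), results3.predict())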

``````
``````

OLS Regression Results
==============================================================================
Dep. Variable:                 enroll   R-squared:                       0.514
Model:                            OLS   Adj. R-squared:                  0.504
Method:                 Least Squares   F-statistic:                     47.68
Date:                Sun, 25 Oct 2015   Prob (F-statistic):           1.41e-08
Time:                        23:39:17   Log-Likelihood:                -162.26
No. Observations:                  47   AIC:                             328.5
Df Residuals:                      45   BIC:                             332.2
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const       1875.1078    266.083      7.047      0.000      1339.189  2411.027
log_pop     -689.9016     99.912     -6.905      0.000      -891.135  -488.668
==============================================================================
Omnibus:                       15.329   Durbin-Watson:                   0.150
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               18.906
Skew:                          -1.133   Prob(JB):                     7.85e-05
Kurtosis:                       5.126   Cond. No.                         710.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Out[8]:

[<matplotlib.lines.Line2D at 0x10992af10>]

``````
``````

In [9]:

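# Multiple regression: university enrollment rate on all three explanatory variables
# Explanatory variables (add_constant supplies the intercept reported as `const` below)
X = sm.add_constant(df[['log_income', 'log_pay', 'log_pop']])
# Dependent variable
Y = df['enroll']
# Run OLS (Ordinary Least Squares)
model4 = sm.OLS(Y, X)
results4 = model4.fit()
print(results4.summary())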

``````
``````

OLS Regression Results
==============================================================================
Dep. Variable:                 enroll   R-squared:                       0.958
Model:                            OLS   Adj. R-squared:                  0.956
Method:                 Least Squares   F-statistic:                     330.6
Date:                Sun, 25 Oct 2015   Prob (F-statistic):           1.04e-29
Time:                        23:39:18   Log-Likelihood:                -104.49
No. Observations:                  47   AIC:                             217.0
Df Residuals:                      43   BIC:                             224.4
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const         29.1362     45.057      0.647      0.521       -61.731   120.003
log_income    22.0736      2.888      7.642      0.000        16.248    27.899
log_pay        9.3663      1.970      4.755      0.000         5.394    13.339
log_pop      -28.0695      2.281    -12.305      0.000       -32.670   -23.469
==============================================================================
Omnibus:                        1.450   Durbin-Watson:                   0.398
Prob(Omnibus):                  0.484   Jarque-Bera (JB):                1.362
Skew:                          -0.295   Prob(JB):                        0.506
Kurtosis:                       2.410   Cond. No.                     3.12e+03
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.12e+03. This might indicate that there are
strong multicollinearity or other numerical problems.

``````
``````

In [10]:

# Actual vs. fitted enrollment rate by year (model 1: log_income)
plt.plot(np.asarray(df.index).astype(int), df['enroll'], label='actual')
plt.plot(np.asarray(df.index).astype(int), results1.predict(), label='predict1')
plt.xlabel('year')
plt.ylabel('Enrollment Rate')
plt.legend(loc=2)
plt.savefig('predict1.png')

``````
``````

In [11]:

# Actual vs. fitted enrollment rate by year (model 2: log_pay)
plt.plot(np.asarray(df.index).astype(int), df['enroll'], label='actual')
plt.plot(np.asarray(df.index).astype(int), results2.predict(), label='predict2')
plt.xlabel('year')
plt.ylabel('Enrollment Rate')
plt.legend(loc=2)
plt.savefig('predict2.png')

``````
``````

In [12]:

# Actual vs. fitted enrollment rate by year (model 3: population)
plt.plot(np.asarray(df.index).astype(int), df['enroll'], label='actual')
plt.plot(np.asarray(df.index).astype(int), results3.predict(), label='predict3')
plt.xlabel('year')
plt.ylabel('Enrollment Rate')
plt.legend(loc=2)
plt.savefig('predict3.png')

``````
``````

In [13]:

# Actual vs. fitted enrollment rate by year (model 4: all three regressors)
plt.plot(np.asarray(df.index).astype(int), df['enroll'], label='actual')
plt.plot(np.asarray(df.index).astype(int), results4.predict(), label='predict4')
plt.xlabel('year')
plt.ylabel('Enrollment Rate')
plt.legend(loc=2)
plt.savefig('predict4.png')

``````
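The four plotting cells above differ only in the fitted values, the legend label, and the output file name, so they could be folded into one helper. The following is a minimal sketch of that idea; `plot_actual_vs_fitted` is a hypothetical name, the demo series is made up, and the `Agg` backend line is only there so the sketch runs as a plain script without a display (it is not needed inside the notebook).

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_actual_vs_fitted(years, actual, fitted, label, path):
    """Plot actual and fitted series against year and save the figure to path."""
    fig, ax = plt.subplots()
    ax.plot(years, actual, label='actual')
    ax.plot(years, fitted, label=label)
    ax.set_xlabel('year')
    ax.set_ylabel('Enrollment Rate')
    ax.legend(loc=2)
    fig.savefig(path)
    plt.close(fig)
    return path


# Demo with made-up numbers (hypothetical, not the series in domestic.csv)
years = np.arange(1968, 2015)
actual = np.linspace(20.0, 55.0, years.size)
fitted = actual + np.sin(years / 3.0)
out_path = plot_actual_vs_fitted(years, actual, fitted, 'predict1',
                                 os.path.join(tempfile.gettempdir(), 'predict_demo.png'))
```

With the notebook's objects, a call might look like `plot_actual_vs_fitted(np.asarray(df.index).astype(int), df['enroll'], results1.predict(), 'predict1', 'predict1.png')`.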