Energy Efficiency - UCI

Analysis of the energy efficiency dataset from UCI.


In [1]:
import numpy as np
import pandas as pd
%pylab inline
pylab.style.use('ggplot')
import seaborn as sns


Populating the interactive namespace from numpy and matplotlib

In [2]:
data_df = pd.read_csv('energy_efficiency.csv')
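
The CSV above is read from the local working directory. For reference, a minimal sketch of producing that file straight from the UCI repository (assuming the dataset is still distributed as ENB2012_data.xlsx at the URL below, and that an Excel engine such as openpyxl is installed for pd.read_excel) could look like this; it is not one of the original cells:

# Hypothetical download/convert step; the URL and filename are assumptions
# based on the UCI "Energy efficiency" dataset page, not taken from this notebook.
UCI_URL = ('https://archive.ics.uci.edu/ml/machine-learning-databases/'
           '00242/ENB2012_data.xlsx')
raw_df = pd.read_excel(UCI_URL)                 # columns X1-X8, Y1, Y2
raw_df.to_csv('energy_efficiency.csv', index=False)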

In [4]:
data_df.head()


Out[4]:
X1 X2 X3 X4 X5 X6 X7 X8 Y1 Y2
0 0.98 514.5 294.0 110.25 7.0 2 0.0 0 15.55 21.33
1 0.98 514.5 294.0 110.25 7.0 3 0.0 0 15.55 21.33
2 0.98 514.5 294.0 110.25 7.0 4 0.0 0 15.55 21.33
3 0.98 514.5 294.0 110.25 7.0 5 0.0 0 15.55 21.33
4 0.90 563.5 318.5 122.50 7.0 2 0.0 0 20.84 28.28

Attribute Information

X1 Relative Compactness
X2 Surface Area
X3 Wall Area
X4 Roof Area
X5 Overall Height
X6 Orientation
X7 Glazing Area
X8 Glazing Area Distribution
Y1 Heating Load
Y2 Cooling Load
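
Before the bivariate plots, a quick structural check (a small sketch, output omitted; not one of the original cells) confirms that the eight features and two targets are numeric and that their ranges look sensible:

# Sanity check on the loaded frame: dtypes, missing values, and summary stats
data_df.info()
data_df.describe().T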

Bivariate Analysis - Y1


In [9]:
# The features are the eight X columns; drop the two targets before plotting.
feature_df = data_df.drop(['Y1', 'Y2'], axis=1)

for fname in feature_df:
    # jointplot creates its own figure, so no explicit pylab.figure() is needed
    sns.jointplot(x=fname, y='Y1', data=data_df)


[Eight joint plots, one per feature X1-X8, each plotted against Y1]

Feature Correlations with Y1


In [10]:
feature_df = data_df.drop(['Y1', 'Y2'], axis=1)

In [11]:
y1_corrs = feature_df.corrwith(data_df.Y1)
y1_corrs.plot(kind='bar')


Out[11]:
[Bar chart of each feature's correlation with Y1]
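
To read the strongest linear relationships off the bar chart more directly, the same correlations can be ranked by absolute value (a small sketch, not one of the original cells):

# Rank features by the magnitude of their correlation with the heating load
y1_corrs.abs().sort_values(ascending=False)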

Between Feature Correlations


In [15]:
f_corrs = feature_df.corr()
sns.heatmap(f_corrs, annot=True)


Out[15]:
[Annotated heatmap of pairwise correlations between the features X1-X8]
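
The heatmap suggests that several of the geometry features move together, which matters for interpreting individual regression coefficients. A minimal multicollinearity check with variance inflation factors (a sketch using statsmodels' variance_inflation_factor; not part of the original notebook) could look like:

from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

# Build a design matrix with an intercept, then compute a VIF per feature;
# very large values flag features that are nearly linear combinations of others.
X = add_constant(feature_df)
vifs = pd.Series([variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
                 index=X.columns)
vifs.drop('const')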

The Regression Model


In [17]:
import statsmodels.formula.api as sm

In [24]:
y1_model = sm.ols(data=data_df, 
                  formula='Y1 ~ X4 + X2 + X7')
y1_result = y1_model.fit()
y1_result.summary()


Out[24]:
                            OLS Regression Results
==============================================================================
Dep. Variable:                     Y1   R-squared:                       0.861
Model:                            OLS   Adj. R-squared:                  0.860
Method:                 Least Squares   F-statistic:                     1577.
Date:                Mon, 07 Aug 2017   Prob (F-statistic):               0.00
Time:                        23:16:38   Log-Likelihood:                -2106.9
No. Observations:                 768   AIC:                             4222.
Df Residuals:                     764   BIC:                             4240.
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     32.5391      1.343     24.229      0.000      29.903      35.175
X4            -0.2810      0.006    -44.166      0.000      -0.294      -0.269
X2             0.0515      0.003     15.792      0.000       0.045       0.058
X7            20.4380      1.022     20.002      0.000      18.432      22.444
==============================================================================
Omnibus:                      144.564   Durbin-Watson:                   0.535
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              310.656
Skew:                           1.039   Prob(JB):                     3.48e-68
Kurtosis:                       5.322   Cond. No.                     7.06e+03
==============================================================================
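
The fit explains most of the variance in Y1, but the summary also reports noticeable skew and a low Durbin-Watson statistic, so the residuals are worth a look. A small diagnostic sketch using the fitted result (not one of the original cells) might be:

# Fitted values vs. residuals for the Y1 model; curvature or a funnel shape
# here would suggest the linear specification is missing something.
pylab.figure()
pylab.scatter(y1_result.fittedvalues, y1_result.resid, alpha=0.5)
pylab.axhline(0, color='k', linewidth=1)
pylab.xlabel('Fitted Y1')
pylab.ylabel('Residual')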

Bivariate Analysis - Y2


In [26]:
for fname in feature_df:
    # as before, jointplot manages its own figure
    sns.jointplot(x=fname, y='Y2', data=data_df)


[Eight joint plots, one per feature X1-X8, each plotted against Y2]

In [27]:
y2_corrs = feature_df.corrwith(data_df.Y2)
y2_corrs.plot(kind='bar')


Out[27]:
[Bar chart of each feature's correlation with Y2]
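
For a direct comparison of the two targets' correlation profiles, the Y1 and Y2 series can be plotted side by side (a sketch, not one of the original cells):

# Heating (Y1) and cooling (Y2) correlations for each feature, side by side
pd.DataFrame({'Y1': y1_corrs, 'Y2': y2_corrs}).plot(kind='bar')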

In [28]:
y2_model = sm.ols(data=data_df, 
                  formula='Y2 ~ X4 + X2 + X7')
y2_result = y2_model.fit()
y2_result.summary()


Out[28]:
[OLS regression summary for the Y2 model]
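
With both fits in hand, the coefficients can be lined up to see how heating and cooling load respond to the same three predictors (a sketch assuming the y2_result fitted above; not one of the original cells):

# Coefficients of the Y1 and Y2 models, side by side
pd.DataFrame({'Y1 coef': y1_result.params, 'Y2 coef': y2_result.params})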
