Multiple Regression

Simple Linear Regression:

$$ y = \beta_0 + \beta_1 X $$

Multiple Linear Regression:

$$ y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots $$

Regression is a well-studied field in statistics. The focus here is on what is relevant for data science: the practical side, aimed at prediction.
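To make the multiple regression formula concrete, here is a minimal sketch that estimates the coefficients $\beta_0, \beta_1, \beta_2$ from data. It assumes ordinary least squares as the fitting method and uses synthetic data; the names `true_beta`, `X_design`, and `beta_hat` are illustrative and are not part of the notebook.

```python
import numpy as np

# Synthetic data: an assumption for illustration, not the notebook's dataset.
rng = np.random.default_rng(0)
n_samples = 200
X = rng.normal(size=(n_samples, 2))           # two predictors, X_1 and X_2
true_beta = np.array([1.5, 2.0, -3.0])        # beta_0, beta_1, beta_2
noise = rng.normal(scale=0.1, size=n_samples)

# y = beta_0 + beta_1 * X_1 + beta_2 * X_2 + noise
y = true_beta[0] + X @ true_beta[1:] + noise

# Prepend a column of ones so the intercept beta_0 is estimated as well.
X_design = np.column_stack([np.ones(n_samples), X])

# Ordinary least squares: minimise ||y - X_design @ beta||^2.
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(beta_hat)
```

With little noise and a few hundred samples, the estimates in `beta_hat` should land close to `true_beta`. Adding more predictors only means adding more columns to `X`.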


In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

In [2]:
# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older release.
from sklearn.datasets import load_boston

In [3]:
boston_data = load_boston()

In [4]:
df = pd.DataFrame(boston_data.data, columns=boston_data.feature_names)

In [5]:
df.head()


Out[5]:
CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT
0 0.00632 18.0 2.31 0.0 0.538 6.575 65.2 4.0900 1.0 296.0 15.3 396.90 4.98
1 0.02731 0.0 7.07 0.0 0.469 6.421 78.9 4.9671 2.0 242.0 17.8 396.90 9.14
2 0.02729 0.0 7.07 0.0 0.469 7.185 61.1 4.9671 2.0 242.0 17.8 392.83 4.03
3 0.03237 0.0 2.18 0.0 0.458 6.998 45.8 6.0622 3.0 222.0 18.7 394.63 2.94
4 0.06905 0.0 2.18 0.0 0.458 7.147 54.2 6.0622 3.0 222.0 18.7 396.90 5.33

In [6]:
df.shape


Out[6]:
(506, 13)

In [7]:
X = df

In [8]:
y = boston_data.target

Statsmodels


In [10]:
# Requires the statsmodels package to be installed (e.g. pip install statsmodels).
import statsmodels.api as sm
import statsmodels.formula.api as smf


---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-10-93c018dacf9a> in <module>()
----> 1 import statsmodels.api as sm
      2 import statsmodels.formula.api as smf

ModuleNotFoundError: No module named 'statsmodels'
