Multiple Linear Regression

We just saw how we can predict life expectancy using BMI. Here, BMI was the predictor, also known as an independent variable. A predictor is a variable we're looking at in order to make predictions about other variables, while the values we are trying to predict are known as dependent variables. In this case, life expectancy was the dependent variable.

If the outcome we want to predict depends on more than one variable, we can build a more complex model that takes all of them into account. As long as they're relevant to the situation, additional independent (predictor) variables can help us get a better prediction.

When there's just one predictor, the linear regression model is a line, but as we add more predictor variables, we're adding more dimensions.

When we have one predictor variable, the equation of the line is

$ y = mx + b $

and the plot might look something like we saw before.

With two predictor variables, the prediction equation becomes:

$ y=m_1x_1 +m_2x_2 + b $

To represent this graphically, we'll need a three-dimensional plot, with the linear regression model represented as a plane.
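
As a minimal sketch of the equation above, the prediction is a weighted sum of the predictor values plus the intercept, and the same form extends to any number of predictors. The function name `predict` and the values chosen for `m1`, `m2`, and `b` are hypothetical and serve only as an illustration.

```python
def predict(coefficients, intercept, features):
    """Return sum(m_i * x_i) + b for a single observation."""
    if len(coefficients) != len(features):
        raise ValueError("need exactly one coefficient per feature")
    return sum(m * x for m, x in zip(coefficients, features)) + intercept

# Hypothetical coefficients and intercept, chosen only for illustration
m1, m2, b = 2.0, -0.5, 10.0

# Two predictors: y = m1*x1 + m2*x2 + b with x1 = 3.0 and x2 = 4.0
y = predict([m1, m2], b, [3.0, 4.0])
print(y)  # 2.0*3.0 + (-0.5)*4.0 + 10.0 = 14.0
```

With a single coefficient the function reduces to the line $ y = mx + b $; with two it describes a plane.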

Data discovery

We'll be using the Boston house-prices dataset. It contains 506 records, each with 13 features and a median home value in thousands of dollars. We'll fit a model on the 13 features to predict the home value.

Load the libraries


In [1]:
from sklearn.linear_model import LinearRegression
# Import the loader for the Boston dataset bundled with scikit-learn
# (load_boston was removed in scikit-learn 1.2, so this needs an older release)
from sklearn.datasets import load_boston

Load the data


In [2]:
# Load the data from the Boston house-prices dataset
boston_data = load_boston()
x = boston_data['data']
y = boston_data['target']

Linear Regression


In [3]:
# Create the linear regression model and assign it to the model variable,
# then fit it to the features (x) and target values (y)
model = LinearRegression()
model.fit(x,y)


Out[3]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

In [4]:
# Define a sample house (one row of 13 feature values) to make a prediction for
sample_house = [[2.29690000e-01, 0.00000000e+00, 1.05900000e+01, 0.00000000e+00, 4.89000000e-01,
                6.32600000e+00, 5.25000000e+01, 4.35490000e+00, 4.00000000e+00, 2.77000000e+02,
                1.86000000e+01, 3.94870000e+02, 1.09700000e+01]]

Prediction


In [5]:
# Predict housing price for the sample_house
prediction = model.predict(sample_house)
print(prediction)


[ 23.68420569]
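
To connect the fitted model back to the equation from earlier, the sketch below fits `LinearRegression` on a small synthetic dataset and checks that `predict` agrees with computing $ m_1x_1 + m_2x_2 + b $ by hand from the model's `coef_` and `intercept_` attributes. The data and the names `X_demo`, `y_demo`, `demo_model`, and `new_point` are assumptions made for this illustration and are not part of the Boston example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data generated from y = 2*x1 - 3*x2 + 5 (illustrative only)
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(50, 2))
y_demo = 2.0 * X_demo[:, 0] - 3.0 * X_demo[:, 1] + 5.0

demo_model = LinearRegression()
demo_model.fit(X_demo, y_demo)

# Prediction for one new observation, computed two ways
new_point = np.array([[1.0, 2.0]])
by_hand = new_point[0] @ demo_model.coef_ + demo_model.intercept_
by_model = demo_model.predict(new_point)[0]

print(demo_model.coef_, demo_model.intercept_)  # close to [2, -3] and 5
print(by_hand, by_model)                        # both close to 1
```

The same relationship holds for the 13-feature Boston model: its `coef_` attribute holds one coefficient per feature, and `intercept_` holds the constant term.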