In [50]:
from __future__ import division
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
Linear regression is a type of parametrized model identification problem in which we assume a linear relationship between the input and output variables.
For a simple regression over a set of $N$ input-output pairs $(x_i, y_i)$, we assume that the output $y$ has the following dependence on input $x$: $$y = m x + b,$$ where $m$ and $b$ are model parameters that must be recovered from the data.
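As a minimal sketch of what "recovering $m$ and $b$ from data" means, the following cell generates synthetic points from a known slope and intercept, adds noise, and recovers the parameters with a degree-1 least-squares fit via np.polyfit. The synthetic values here are arbitrary examples and are unrelated to the housing data loaded below.
# sketch: recover known parameters from noisy synthetic data (values are arbitrary examples)
rng = np.random.RandomState(0)
x_syn = np.linspace(0, 10, 50)
y_syn = 2.0*x_syn + 1.0 + rng.normal(scale=0.5, size=x_syn.shape)   # true m = 2.0, b = 1.0
m_fit, b_fit = np.polyfit(x_syn, y_syn, 1)                          # degree-1 least-squares fit
print(m_fit, b_fit)                                                 # should be close to 2.0 and 1.0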
In [27]:
# read in data; strip stray whitespace from the column names and the Rating entries
df = pd.read_csv('Table7-1.csv', skiprows=[0]).rename(index=str, columns=(lambda x: x.strip()))
df['Rating'] = df['Rating'].apply(lambda x: x.strip())
In [46]:
# data visualization: scatter of the data with a hand-picked trial line
plt.scatter(df.Size, df.Price);
plt.plot(df.Size, df.Size*1.2 - 400, 'r');
plt.xlim([400, 1080]);
plt.ylim([250, 700]);
A standard approach is to select $m$ and $b$ that minimize the mean squared error (MSE) over the available dataset. For a given value of $m$ and $b$, and estimated output $\hat{y}$, the MSE is defined as: $$MSE(m,b) = \frac{1}{2N} \sum_{i=1}^N \left(\hat{y}_i - y_i\right)^2 = \frac{1}{2N} \sum_{i=1}^N \left(m x_i + b - y_i\right)^2.$$
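Before scanning the MSE over a grid of parameter values, it can be convenient to wrap it in a small helper. This is just a sketch (the cells below compute the same quantity inline), and the function name is illustrative.
# MSE as a function of candidate slope m and intercept b (helper name is illustrative)
def mse(m, b, x, y):
    residuals = m*x + b - y
    return np.sum(residuals**2) / (2*len(y))

# e.g. mse(1.2, -400, df.Size, df.Price) scores the trial line drawn above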
In [198]:
# the MSE for various values of slope m and y-intercept b:
m = np.linspace(-10, 15, 40);
b = np.linspace(-3000, 3000, 20);
MSEmb = np.zeros((len(m), len(b)));
for ii in range(len(m)):
    for jj in range(len(b)):
        MSEmb[ii, jj] = sum((m[ii]*df.Size + b[jj] - df.Price)**2)/(2*len(df))
In [199]:
# MSE landscape over the (b, m) grid
plt.pcolor(b, m, MSEmb);
To minimize the MSE, consider the derivative with respect to each of the model parameters. The minimum MSE occurs where the derivative is zero with respect to both parameters:
$$\frac{\partial MSE}{\partial m} = \frac{1}{N} \sum_{i=1}^N \left((m x_i + b) - y_i\right)x_i = 0 \Rightarrow m = \frac{\sum_{i=1}^N (y_i - b)x_i}{\sum_{i=1}^N x_i^2}$$
$$\frac{\partial MSE}{\partial b} = \frac{1}{N} \sum_{i=1}^N \left((m x_i + b) - y_i\right) = 0 \Rightarrow b = \frac{\sum_{i=1}^N (y_i - m x_i)}{N}$$
This gives a set of two coupled linear equations, which can be solved exactly, but it is often useful to use a more generally applicable iterative method such as gradient descent.
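For reference, here is a minimal sketch of the direct solution: setting both derivatives to zero gives the normal equations $m\sum x_i^2 + b\sum x_i = \sum x_i y_i$ and $m\sum x_i + bN = \sum y_i$, a 2x2 linear system that np.linalg.solve handles (the variable names below are illustrative).
# exact least-squares solution from the normal equations (sketch; names are illustrative)
xs, ys = df.Size.values, df.Price.values
A = np.array([[np.sum(xs**2), np.sum(xs)],
              [np.sum(xs),    len(ys)  ]])
rhs = np.array([np.sum(xs*ys), np.sum(ys)])
m_exact, b_exact = np.linalg.solve(A, rhs)
print(m_exact, b_exact)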
Gradient descent is an iterative function-minimization method (its mirror image, gradient ascent, handles maximization) that is used in a wide range of applications.
How it works: start from an initial guess for the parameters, compute the gradient of the cost function at the current parameter values, take a step in the direction of the negative gradient scaled by a learning rate, and repeat until the cost stops decreasing or a maximum number of steps is reached.
Notes on choosing the learning rate: if the learning rate is too large, the updates can overshoot the minimum and the iteration may diverge; if it is too small, convergence is slow. A practical check is to plot the cost against the step number and adjust the learning rate accordingly, as in the toy sketch below.
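To see the effect of the learning rate in isolation, here is a toy sketch that minimizes $f(x) = x^2$ (gradient $2x$) from the same starting point with three different step sizes; the specific values are arbitrary examples.
# toy illustration of the learning rate on f(x) = x**2, whose gradient is 2*x
for a_try in [0.01, 0.1, 1.1]:         # too small, reasonable, too large (arbitrary example values)
    xk = 5.0                           # starting point
    for _ in range(20):
        xk -= a_try * 2*xk             # gradient-descent update
    print("learning rate %.2f -> x after 20 steps: %.4g" % (a_try, xk))
# 0.01 creeps toward 0, 0.1 gets close to the minimum, and 1.1 makes |x| grow (divergence)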
In [213]:
# set learning rate (step size):
a = 0.000001
N = len(df)
# set initial parameter guesses
b = 10
m = 0.8
# set initial MSE values
MSE_old = 1.e8
MSE = sum((m*df.Size + b - df.Price)**2)/(2*N)
steps = 0
MSE_vals = [MSE]
b_vals = [b]
m_vals = [m]
# run gradient descent:
while (MSE_old - MSE > 0.00001) and (steps < 1e5):
    # compute gradient
    dMSE_db = sum(m*df.Size + b - df.Price)/N
    dMSE_dm = sum((m*df.Size + b - df.Price)*df.Size)/N
    # update parameter values
    m -= a*dMSE_dm
    b -= a*dMSE_db
    # update MSE values
    MSE_old = MSE
    MSE = sum((m*df.Size + b - df.Price)**2)/(2*N)
    MSE_vals.append(MSE)
    b_vals.append(b)
    m_vals.append(m)
    steps += 1
In [208]:
# decrease in MSE with increasing step number
plt.plot(MSE_vals);
plt.ylabel("MSE");
plt.xlabel("step");
In [211]:
# path in parameter space
# the MSE for various values of slope m and y-intercept b:
mv = np.linspace(-0.5, 1.5, 40);
bv = np.linspace(0, 15, 20);
MSEmb = np.zeros((len(mv), len(bv)));
for ii in range(len(mv)):
    for jj in range(len(bv)):
        MSEmb[ii, jj] = sum((mv[ii]*df.Size + bv[jj] - df.Price)**2)/(2*len(df))
plt.pcolor(bv, mv, MSEmb);
plt.plot(b_vals, m_vals, 'xg');   # gradient-descent path in (b, m) space
plt.plot(6.47, 0.62, 'or');       # reference fit values (b = 6.47, m = 0.62)
plt.xlabel("b");
plt.ylabel("m");
In [216]:
plt.plot(df.Size, df.Price, 'ob');               # data
plt.plot(df.Size, m*df.Size + b, 'r');           # gradient-descent fit
plt.plot(df.Size, 0.62*df.Size + 6.47, '--k');   # reference fit line
Multilinear regression is the same as linear regression, but now you have more than one input feature. That is, you have a set of input features $x^1, \ldots, x^M$ and input-output data $(x^1_i, \ldots, x^M_i, y_i)$, and you are trying to optimize over parameters $(\beta_0, \beta_1, \ldots, \beta_M)$, where you assume a model of the form: $$y = \beta_0 + \beta_1 x^1 + \ldots + \beta_M x^M$$ For convenience, it is helpful to define an additional "feature" $x^0$ that is always equal to $1$. Then you can write: $$y = \beta_0 x^0 + \beta_1 x^1 + \ldots + \beta_M x^M = B^T X,$$ where $B = (\beta_0, \ldots, \beta_M)^T$ and $X = (x^0, \ldots, x^M)^T$.
Just like before, your goal is to minimize the MSE: $$MSE(B) = \frac{1}{2N} \sum_{i=1}^N \left(\hat{y}_i - y_i\right)^2 = \frac{1}{2N} \sum_{i=1}^N \left(\beta_0 x^0_i + \ldots + \beta_M x^M_i - y_i\right)^2 = \frac{1}{2N} \sum_{i=1}^N \left(B^TX_i - y_i\right)^2.$$
As before, the parameters are found by minimizing the MSE with gradient descent, now updating all of the $\beta_j$ simultaneously along the negative gradient; a vectorized sketch is given below.
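Here is a minimal vectorized sketch of that update, assuming a design matrix whose first column is the constant feature $x^0 = 1$. The variable names, learning rate, and step count are illustrative (unscaled features make convergence slow); the actual fit below uses scikit-learn.
# vectorized gradient descent for multilinear regression (sketch; names/values are illustrative)
X = np.column_stack([np.ones(len(df)), df.Size, df.Floor, df.Rate])   # first column is x^0 = 1
y = df.Price.values
B = np.zeros(X.shape[1])                   # parameters (beta_0, ..., beta_M)
a_lr = 1e-6                                # learning rate (illustrative; features are not rescaled)
for _ in range(100000):
    grad = X.T.dot(X.dot(B) - y) / len(y)  # dMSE/dB = (1/N) X^T (X B - y)
    B -= a_lr * grad
print(B)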
In [218]:
from sklearn import linear_model
from sklearn.metrics import mean_squared_error, r2_score
In [219]:
# linear regression model from sklearn
lin_mod = linear_model.LinearRegression()
In [222]:
lin_mod.fit(df[['Size','Floor','Rate']], df.Price)
In [244]:
# Fitted intercept and coefficients:
print('Intercept: %s, Coefficients: %s' % (lin_mod.intercept_, ", ".join(map(str, lin_mod.coef_))))
In [245]:
plt.plot(df.Size, df.Price, 'ob');   # data
plt.plot(df.Size, lin_mod.intercept_ + lin_mod.coef_[0]*df.Size + lin_mod.coef_[1]*df.Floor + lin_mod.coef_[2]*df.Rate, 'r');   # sklearn fit
plt.plot(df.Size, -0.1513 + 0.6270*df.Size - 0.1781*df.Floor + 0.0714*df.Rate, '--k');   # reference coefficient values
plt.xlabel("Size");
plt.ylabel("Price");
In [246]:
Price_pred = lin_mod.predict(df[['Size','Floor','Rate']])
In [248]:
mean_squared_error(df.Price, Price_pred)
In [249]:
r2_score(df.Price, Price_pred)
Your (multi)linear regression becomes a ridge regression if you add an extra term to your cost function (rather than just using the MSE): $$J(B) = \frac{1}{2N} \sum_{i=1}^N \left(\hat{y}_i - y_i\right)^2 + \frac{\lambda}{2M}\sum_{j=1}^M\beta_j^2.$$
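In scikit-learn this corresponds to linear_model.Ridge; its alpha parameter plays the role of $\lambda$, although sklearn scales the penalty term differently from the $\lambda/2M$ convention above, and the value used here is just an example.
# ridge regression sketch (alpha is an arbitrary example value)
ridge_mod = linear_model.Ridge(alpha=1.0)
ridge_mod.fit(df[['Size','Floor','Rate']], df.Price)
print(ridge_mod.intercept_, ridge_mod.coef_)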
Your (multi)linear regression becomes a LASSO regression if you add an extra term to your cost function (rather than just using the MSE): $$J(B) = \frac{1}{2N} \sum_{i=1}^N \left(\hat{y}_i - y_i\right)^2 + \frac{\lambda}{2M}\sum_{j=1}^M\left|\beta_j\right|.$$
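Similarly, linear_model.Lasso implements the $\ell_1$-penalized version; alpha again plays the role of $\lambda$ (with sklearn's own scaling), and the value below is just an example. The $\ell_1$ penalty tends to drive some coefficients exactly to zero, which makes LASSO useful for feature selection.
# LASSO regression sketch (alpha is an arbitrary example value)
lasso_mod = linear_model.Lasso(alpha=0.1)
lasso_mod.fit(df[['Size','Floor','Rate']], df.Price)
print(lasso_mod.intercept_, lasso_mod.coef_)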