Consider the multivariate linear model:
$$y = X\beta + \epsilon$$where $y$ is a length $n$ vector, $X$ is an $n \times p$ matrix, and $\beta$ is a length $p$ vector of coefficients.
OLS Regression seeks to minimize the following cost function:
$$\|y - X\beta\|^{2}$$The best fit coefficients can be obtained by:
$$\hat{\beta} = (X^T X)^{-1}X^Ty$$where $X^T$ is the transpose of the matrix $X$ and $(X^T X)^{-1}$ is the inverse of the matrix $X^T X$.
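As a quick illustration, the normal-equations solution can be computed directly with NumPy. This is a minimal sketch on synthetic data; the data and variable names are purely illustrative.
import numpy as np

# synthetic data: 100 samples, 3 features (values are illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
beta_true = np.array([1.5, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=100)

# normal equations: beta_hat = (X^T X)^{-1} X^T y
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y
print(beta_hat)  # should be close to beta_true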
Ridge Regression introduces an L2 regularization term to the cost function:
$$\|y - X\beta\|^{2}+\|\Gamma \beta \|^{2}$$Where $\Gamma = \alpha I$ for some constant $\alpha$ and the identity matrix $I$.
The best fit coefficients can be obtained by: $$\hat{\beta} = (X^T X+\Gamma^T\Gamma)^{-1}X^Ty$$
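A sketch of this closed-form solution, again on synthetic data (the value of $\alpha$ below is an arbitrary illustrative choice):
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

# Gamma = alpha * I; alpha = 0.1 is an arbitrary illustrative value
alpha = 0.1
Gamma = alpha * np.eye(X.shape[1])
beta_ridge = np.linalg.inv(X.T @ X + Gamma.T @ Gamma) @ X.T @ y
print(beta_ridge)  # coefficients are shrunk slightly toward zero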
Lasso Regression introduces an L1 constraint on the coefficients, which tends to drive some of them exactly to zero and so limits the number of predictor variables that remain in the model. The following cost function: $${\displaystyle \min _{\beta _{0},\beta }\left\{{\frac {1}{n}}\left\|y-\beta _{0}-X\beta \right\|_{2}^{2}\right\}{\text{ subject to }}\|\beta \|_{1}\leq \alpha.}$$
does not have a nice closed form solution. For the sake of this exercise, you may use the sklearn.linear_model.Lasso class, which uses a coordinate descent algorithm to find the best fit. You should only use this class in the fit() method of this exercise (i.e., do not use sklearn for the other methods in your class).
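For reference, here is a minimal sketch of how the sklearn Lasso class is typically used (it solves the equivalent penalized form of the problem above); the data and the value alpha = 0.1 are illustrative assumptions.
import numpy as np
from sklearn import linear_model

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

# alpha = 0.1 is an arbitrary illustrative value
clf = linear_model.Lasso(alpha=0.1)
clf.fit(X, y)
print(clf.coef_)       # fitted coefficients; some may be exactly zero
print(clf.intercept_)  # fitted intercept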
The $R^2$ score is defined as: $${R^{2} = {1-{SS_E \over SS_T}}}$$
Where:
$$SS_T=\sum_i (y_i-\bar{y})^2, \quad SS_R=\sum_i (\hat{y_i}-\bar{y})^2, \quad SS_E=\sum_i (y_i - \hat{y_i})^2$$where ${y_i}$ are the original data values, $\hat{y_i}$ are the predicted values, and $\bar{y}$ is the mean of the original data values.
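A tiny worked example of the $R^2$ computation, using made-up observed and predicted values:
import numpy as np

# made-up observed and predicted values (illustrative only)
y     = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.8, 5.3, 6.9, 9.2])
y_bar = y.mean()

SS_T = np.sum((y - y_bar) ** 2)   # total sum of squares
SS_E = np.sum((y - y_hat) ** 2)   # residual (error) sum of squares
print(1 - SS_E / SS_T)            # R^2, close to 1 for a good fit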
Write a class called Regression with the following methods:
$fit(X, y)$: Fits linear model to $X$ and $y$.
$get\_params()$: Returns $\hat{\beta}$ for the fitted model. The parameters should be stored in a dictionary.
$predict(X)$: Predict new values with the fitted model given $X$.
$score(X, y)$: Returns $R^2$ value of the fitted model.
$set\_params()$: Manually set the parameters of the linear model.
This parent class should throw a NotImplementedError
for methods that are intended to be implemented by subclasses.
In [1]:
class Regression(object):
    """Base class: stores parameters and defines the required interface."""
    def __init__(self):
        self.params = dict()
    def get_params(self):
        return self.params
    def set_params(self, **kwargs):
        for k, v in kwargs.items():
            self.params[k] = v
    def fit(self, X, y):
        raise NotImplementedError()
    def predict(self, X):
        raise NotImplementedError()
    def score(self, X, y):
        raise NotImplementedError()
In [2]:
import numpy as np

class LinearRegression(Regression):
    def __init__(self):
        super().__init__()
        # super(LinearRegression, self).__init__()  # Python 2.7 form
    def fit(self, X, y):
        # prepend a column of ones so the intercept is estimated as beta[0]
        X = np.append(np.ones((X.shape[0], 1)), X, axis=1)
        # normal equations via the pseudo-inverse: beta = (X^T X)^+ X^T y
        beta = np.linalg.pinv(X.T.dot(X)).dot(X.T).dot(y)
        self.params['coef'] = beta[1:]
        self.params['intercept'] = beta[0]
    def predict(self, X):
        return np.dot(X, self.params['coef']) + self.params['intercept']
    def score(self, X, y):
        y_hat = self.predict(X)
        y_bar = np.mean(y)
        SST = np.sum((y - y_bar)**2)
        SSRes = np.sum((y_hat - y)**2)
        R2 = 1 - SSRes / SST
        return R2
In [3]:
class RidgeRegression(LinearRegression):
    def __init__(self, alpha=0.1):
        super().__init__()
        # super(RidgeRegression, self).__init__()  # Python 2.7 form
        self.params['alpha'] = alpha
    def fit(self, X, y):
        X = np.append(np.ones((X.shape[0], 1)), X, axis=1)
        # closed-form ridge solution; the penalty added here is alpha * I
        # (i.e. Gamma^T Gamma with Gamma = sqrt(alpha) * I), and it also
        # penalizes the intercept column
        C = X.T.dot(X) + self.params['alpha'] * np.eye(X.shape[1])
        beta = np.linalg.inv(C).dot(X.T.dot(y))
        self.params['coef'] = beta[1:]
        self.params['intercept'] = beta[0]
In [4]:
from sklearn import linear_model
from sklearn.model_selection import train_test_split

class LassoRegression(LinearRegression):
    def __init__(self, alpha=0.1):
        super().__init__()
        # super(LassoRegression, self).__init__()  # Python 2.7 form
        self.params['alpha'] = alpha
    def fit(self, X, y):
        # delegate to sklearn's coordinate-descent Lasso solver
        clf = linear_model.Lasso(alpha=self.params['alpha'])
        clf.fit(X, y)
        self.params['coef'] = clf.coef_
        self.params['intercept'] = clf.intercept_
You will use the Diabetes dataset (sklearn.datasets.load_diabetes) for this part.
Instantiate each of the three models above. Using a for loop, fit (on the training data) and score (on the testing data) each model on the Diabetes dataset.
Print out the $R^2$ value for each model and the parameters for the best model using the get_params()
method. Use an $\alpha$ value of 0.1.
Hint: You can use the sklearn.model_selection.train_test_split function to create the training and test datasets.
In [5]:
import numpy as np
from sklearn import datasets
import matplotlib.pyplot as plt
%matplotlib inline

dataset = datasets.load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(dataset['data'],
                                                    dataset['target'],
                                                    test_size=0.25,
                                                    random_state=42)
alpha = 0.1
models = [LinearRegression(),
          RidgeRegression(alpha),
          LassoRegression(alpha)]
scores = []
for model in models:
    model.fit(X_train, y_train)
    scores.append(model.score(X_test, y_test))
    print(scores[-1])

# report the parameters of the best-scoring model
best_model = models[int(np.argmax(scores))]
print(best_model.get_params())
We can evaluate how the models perform for various values of $\alpha$. Calculate the $R^2$ scores for each model for $\alpha \in [0.05, 1]$ and plot the three lines on the same graph. To change the parameters, use the set_params()
method. Be sure to label each line and add axis labels.
In [54]:
lines = np.zeros((3, 50, 2))
alpha = 0.1
models = [LinearRegression(),
          RidgeRegression(alpha),
          LassoRegression(alpha)]
# sweep alpha over [0.05, 1]; note that LinearRegression ignores alpha,
# so its curve will be flat
for idx, alpha in enumerate(np.linspace(0.05, 1, 50)):
    for num, model in enumerate(models):
        model.set_params(alpha=alpha)
        model.fit(X_train, y_train)
        lines[num, idx] = [model.score(X_test, y_test), alpha]
In [61]:
plt.xlabel("alpha value")
plt.ylabel("$R^2$ Score")
for line in lines:
    plt.plot(line[:, 1], line[:, 0])
plt.legend(["Linear Regression", "Ridge Regression", "Lasso Regression"],
           bbox_to_anchor=(0.5, 0.5))
plt.show()