Hopefully, by now you have understood the intuition behind Regularized Linear Regression and the purpose of the lambda value.
The goal of this exercise is to help you qualitatively understand how changing the lambda value impacts model accuracy. Working through it will give you a visual sense of how different lambda values affect the model. Make sure you go through the exercise fully and understand what is going on, and feel free to change the lambda values to see how the model's generalizability is affected.
Data source: http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=MachineLearning&doc=exercises/ex5/ex5.html
In [46]:
# import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn as sk
%matplotlib inline
In [47]:
# Load the data: whitespace-separated values, no header, skipping the first row
x_values = pd.read_csv('ex5Linx.dat', sep=r'\s+', header=None, skiprows=1)
y_values = pd.read_csv('ex5Liny.dat', sep=r'\s+', header=None, skiprows=1)
To perform Ridge Regression, we will use another scikit-learn estimator called Ridge, which has methods similar to LinearRegression. Its alpha argument plays the role of the lambda value. Below, we build three different Ridge Regression models (each on top of degree-6 polynomial features) with lambda values of 0, 0.8, and 20, respectively. You are free to choose different values. First, we plot the model's prediction when the lambda value is 0, which is equivalent to non-regularized linear regression.
In [48]:
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
# lambda value of 0: degree-6 polynomial features, no regularization
ridge_reg_0 = make_pipeline(PolynomialFeatures(6), Ridge(alpha=0))
ridge_reg_0.fit(x_values, y_values)
y_predicted = ridge_reg_0.predict(x_values)

# red points: training data; blue curve: model prediction
plt.scatter(x_values, y_values, c='r')
plt.plot(x_values, y_predicted, c='b')
Out[48]:
It is fairly obvious that this model is overfitting the data. This is where regularization, and more specifically the lambda value, comes into the picture. By increasing the lambda value, we penalize overly complex models and move toward a more generalizable model.
In [49]:
# lambda value of 0.8: same pipeline, with moderate regularization
ridge_reg_1 = make_pipeline(PolynomialFeatures(6), Ridge(alpha=0.8))
ridge_reg_1.fit(x_values, y_values)
y_predicted = ridge_reg_1.predict(x_values)

plt.scatter(x_values, y_values, c='r')
plt.plot(x_values, y_predicted, c='b')
Out[49]:
With a lambda value of 0.8, notice the drastic change in the model. All the conditions are the same as before except for the lambda value, which is now 0.8 instead of 0. That change alone has reduced the complexity of the model, and it no longer appears to be overfitting like the previous model. This plot should help build intuition for how the lambda value affects a model. Now the question is: does increasing the lambda value always guarantee a better model? Think about this question before you move on to the last part of the exercise.
In [50]:
# lambda value of 20: same pipeline, with heavy regularization
ridge_reg_20 = make_pipeline(PolynomialFeatures(6), Ridge(alpha=20))
ridge_reg_20.fit(x_values, y_values)
y_predicted = ridge_reg_20.predict(x_values)

plt.scatter(x_values, y_values, c='r')
plt.plot(x_values, y_predicted, c='b')
Out[50]:
As you can see, increasing the lambda value does not guarantee a better model. Recall the error equation presented in the slides: as the lambda value grows, the model-complexity (penalty) term dominates the training-error term. In other words, the model is oversimplified, which results in underfitting (the opposite of overfitting).
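As a reminder (your slides' notation may differ slightly), a common way to write the regularized least-squares cost is

$$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$$

where the first sum is the training error and the second is the complexity penalty. A large lambda shrinks the coefficients toward zero, which is exactly the flattened curve seen in the plot above.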
Hopefully, by now, you understand the concept of Regularized Linear Regression. We now have an optional coding exercise for you to try on your own time.
Given the modified version of the Boston Housing Data from the previous modules, try to apply Ridge Regression to this data and determine the optimal lambda-value using K-fold Cross-Validation. For reference, you can use the pseudocode given in the Module 3b slides. Good luck!
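If you want a starting point, here is a minimal sketch of how K-fold Cross-Validation over the lambda value could look with scikit-learn. It assumes you have already loaded the modified Boston Housing data into a feature matrix X and a target vector y (the file name and column layout are not given here, so that part is left to you), and the candidate lambda grid is only an illustrative choice.

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def best_lambda(X, y, lambdas=(0.01, 0.1, 1, 10, 100), n_splits=5):
    """Return the lambda (alpha) with the lowest mean K-fold MSE, plus all scores."""
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = {}
    for lam in lambdas:
        # Standardize features so a single alpha penalizes all coefficients on the same scale
        model = make_pipeline(StandardScaler(), Ridge(alpha=lam))
        # cross_val_score maximizes its score, so use negative MSE and flip the sign back
        mse = -cross_val_score(model, X, y, cv=kfold,
                               scoring='neg_mean_squared_error').mean()
        scores[lam] = mse
    return min(scores, key=scores.get), scores

# Example usage once X and y are loaded:
# best_alpha, all_scores = best_lambda(X, y)

scikit-learn also ships a RidgeCV estimator that searches over a list of alphas for you, but writing the loop yourself, as above, is closer to the pseudocode in the Module 3b slides.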