Ridge & Lasso Regression Tutorial

Ridge and Lasso regression are techniques used for preventing overfitting. Before going into the details, lets try to figure out how exactly can we detect overfitting, which will give us some untuition towards why the Ridge and Lasso models actually work.

Consider a simple scenario of modeling the 'sine' function. Below I have simulated the sine function with some noise. We will use this data for further analysis.

Define a sine simulation for explaining ridge:

Getting Started

Let's load the required modules first


In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 12, 10
import random

Creating the data:

Here I am going to take a sine function with angles between 60 and 3000 degrees. The corresponding sine value is modifier with some random noise. The aim is to get as close to the sine function as possible using regression models.


In [2]:
x = np.array([i*np.pi/180 for i in range(60,300,4)])
np.random.seed(10)  #Setting seed for reproducability
y = np.sin(x) + np.random.normal(0,0.15,len(x))
data = pd.DataFrame(np.column_stack([x,y]),columns=['x','y'])
plt.plot(data['x'],data['y'],'.')


Out[2]:
[<matplotlib.lines.Line2D at 0x10c1c4c10>]

In [3]:
print data.shape


(60, 2)

Fit simple linear regression:

Lets set the ball rolling by fitting a simple linear regression


In [4]:
#Import Linear Regression model from scikit-learn.
from sklearn.linear_model import LinearRegression
def linear_regression(data, power, models_to_plot):
    
    #initialize predictors:
    predictors=['x']
    if power>=2:
        predictors.extend(['x_%d'%i for i in range(2,power+1)])
    
    #Fit the model
    linreg = LinearRegression(normalize=True)
    linreg.fit(data[predictors],data['y'])
    y_pred = linreg.predict(data[predictors])
    
    #Check if a plot is to be made for the entered power
    if power in models_to_plot:
        plt.subplot(models_to_plot[power])
        plt.tight_layout()
        plt.plot(data['x'],y_pred)
        plt.plot(data['x'],data['y'],'.')
        plt.title('Plot for power: %d'%power)
    
    #Return the result in pre-defined format
    rss = sum((y_pred-data['y'])**2)
    ret = [rss]
    ret.extend([linreg.intercept_])
    ret.extend(linreg.coef_)
    return ret

Determining overfitting:

Create 15 powers:


In [5]:
#Create powers upto 15:
for i in range(2,16):
    colname = 'x_%d'%i
    data[colname] = data['x']**i
print data.head()


          x         y       x_2       x_3       x_4       x_5       x_6  \
0  1.047198  1.065763  1.096623  1.148381  1.202581  1.259340  1.318778   
1  1.117011  1.006086  1.247713  1.393709  1.556788  1.738948  1.942424   
2  1.186824  0.695374  1.408551  1.671702  1.984016  2.354677  2.794587   
3  1.256637  0.949799  1.579137  1.984402  2.493673  3.133642  3.937850   
4  1.326450  1.063496  1.759470  2.333850  3.095735  4.106339  5.446854   

        x_7       x_8        x_9       x_10       x_11       x_12       x_13  \
0  1.381021  1.446202   1.514459   1.585938   1.660790   1.739176   1.821260   
1  2.169709  2.423588   2.707173   3.023942   3.377775   3.773011   4.214494   
2  3.316683  3.936319   4.671717   5.544505   6.580351   7.809718   9.268760   
3  4.948448  6.218404   7.814277   9.819710  12.339811  15.506664  19.486248   
4  7.224981  9.583578  12.712139  16.862020  22.366630  29.668222  39.353420   

        x_14       x_15  
0   1.907219   1.997235  
1   4.707635   5.258479  
2  11.000386  13.055521  
3  24.487142  30.771450  
4  52.200353  69.241170  

Fit a Linear regression model on the 15 powers:


In [6]:
#Initialize a dataframe to store the results:
col = ['rss','intercept'] + ['coef_x_%d'%i for i in range(1,16)]
ind = ['model_pow_%d'%i for i in range(1,16)]
coef_matrix_simple = pd.DataFrame(index=ind, columns=col)

#Define the powers for which a plot is required:
models_to_plot = {1:231,3:232,6:233,9:234,12:235,15:236}

#Iterate through all powers and assimilate results
for i in range(1,16):
    coef_matrix_simple.iloc[i-1,0:i+2] = linear_regression(data, power=i, models_to_plot=models_to_plot)



In [7]:
#Set the display format to be scientific for ease of analysis
pd.options.display.float_format = '{:,.2g}'.format
coef_matrix_simple


Out[7]:
rss intercept coef_x_1 coef_x_2 coef_x_3 coef_x_4 coef_x_5 coef_x_6 coef_x_7 coef_x_8 coef_x_9 coef_x_10 coef_x_11 coef_x_12 coef_x_13 coef_x_14 coef_x_15
model_pow_1 3.3 2 -0.62 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
model_pow_2 3.3 1.9 -0.58 -0.006 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
model_pow_3 1.1 -1.1 3 -1.3 0.14 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
model_pow_4 1.1 -0.27 1.7 -0.53 -0.036 0.014 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
model_pow_5 1 3 -5.1 4.7 -1.9 0.33 -0.021 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
model_pow_6 0.99 -2.8 9.5 -9.7 5.2 -1.6 0.23 -0.014 NaN NaN NaN NaN NaN NaN NaN NaN NaN
model_pow_7 0.93 19 -56 69 -45 17 -3.5 0.4 -0.019 NaN NaN NaN NaN NaN NaN NaN NaN
model_pow_8 0.92 43 -1.4e+02 1.8e+02 -1.3e+02 58 -15 2.4 -0.21 0.0077 NaN NaN NaN NaN NaN NaN NaN
model_pow_9 0.87 1.7e+02 -6.1e+02 9.6e+02 -8.5e+02 4.6e+02 -1.6e+02 37 -5.2 0.42 -0.015 NaN NaN NaN NaN NaN NaN
model_pow_10 0.87 1.4e+02 -4.9e+02 7.3e+02 -6e+02 2.9e+02 -87 15 -0.81 -0.14 0.026 -0.0013 NaN NaN NaN NaN NaN
model_pow_11 0.87 -75 5.1e+02 -1.3e+03 1.9e+03 -1.6e+03 9.1e+02 -3.5e+02 91 -16 1.8 -0.12 0.0034 NaN NaN NaN NaN
model_pow_12 0.87 -3.4e+02 1.9e+03 -4.4e+03 6e+03 -5.2e+03 3.1e+03 -1.3e+03 3.8e+02 -80 12 -1.1 0.062 -0.0016 NaN NaN NaN
model_pow_13 0.86 3.2e+03 -1.8e+04 4.5e+04 -6.7e+04 6.6e+04 -4.6e+04 2.3e+04 -8.5e+03 2.3e+03 -4.5e+02 62 -5.7 0.31 -0.0078 NaN NaN
model_pow_14 0.79 2.4e+04 -1.4e+05 3.8e+05 -6.1e+05 6.6e+05 -5e+05 2.8e+05 -1.2e+05 3.7e+04 -8.5e+03 1.5e+03 -1.8e+02 15 -0.73 0.017 NaN
model_pow_15 0.7 -3.6e+04 2.4e+05 -7.5e+05 1.4e+06 -1.7e+06 1.5e+06 -1e+06 5e+05 -1.9e+05 5.4e+04 -1.2e+04 1.9e+03 -2.2e+02 17 -0.81 0.018

Though RSS is going down, but the coefficients are increasing in magnitude.

Ridge Modeling:


In [8]:
from sklearn.linear_model import Ridge
def ridge_regression(data, predictors, alpha, models_to_plot={}):
    #Fit the model
    ridgereg = Ridge(alpha=alpha,normalize=True)
    ridgereg.fit(data[predictors],data['y'])
    y_pred = ridgereg.predict(data[predictors])
    
    #Check if a plot is to be made for the entered alpha
    if alpha in models_to_plot:
        plt.subplot(models_to_plot[alpha])
        plt.tight_layout()
        plt.plot(data['x'],y_pred)
        plt.plot(data['x'],data['y'],'.')
        plt.title('Plot for alpha: %.3g'%alpha)
    
    #Return the result in pre-defined format
    rss = sum((y_pred-data['y'])**2)
    ret = [rss]
    ret.extend([ridgereg.intercept_])
    ret.extend(ridgereg.coef_)
    return ret

In [9]:
# predictors=['x']
# predictors.extend(['x_%d'%i for i in range(2,16)])
# alp = 1e5
# print ridge_regression(data, predictors, alpha=alp, models_to_plot={alp:111})

In [10]:
predictors=['x']
    predictors.extend(['x_%d'%i for i in range(2,16)])

    alpha_ridge = [1e-15, 1e-10, 1e-8, 1e-4, 1e-3,1e-2, 1, 5, 10, 20]

    col = ['rss','intercept'] + ['coef_x_%d'%i for i in range(1,16)]
    ind = ['alpha_%.2g'%alpha_ridge[i] for i in range(0,10)]
    coef_matrix_ridge = pd.DataFrame(index=ind, columns=col)

    models_to_plot = {1e-15:231, 1e-10:232, 1e-4:233, 1e-3:234, 1e-2:235, 5:236}
    for i in range(10):
        coef_matrix_ridge.iloc[i,] = ridge_regression(data, predictors, alpha_ridge[i], models_to_plot)



In [11]:
#Set the display format to be scientific for ease of analysis
pd.options.display.float_format = '{:,.2g}'.format
coef_matrix_ridge


Out[11]:
rss intercept coef_x_1 coef_x_2 coef_x_3 coef_x_4 coef_x_5 coef_x_6 coef_x_7 coef_x_8 coef_x_9 coef_x_10 coef_x_11 coef_x_12 coef_x_13 coef_x_14 coef_x_15
alpha_1e-15 0.87 95 -3e+02 3.8e+02 -2.4e+02 66 0.96 -4.8 0.64 0.15 -0.026 -0.0054 0.00086 0.00022 -5.5e-05 3.9e-06 -6.9e-08
alpha_1e-10 0.92 11 -29 31 -15 2.9 0.17 -0.091 -0.011 0.002 0.00064 2.4e-05 -2e-05 -4.2e-06 2.2e-07 2.3e-07 -2.3e-08
alpha_1e-08 0.95 1.3 -1.5 1.7 -0.68 0.039 0.016 0.00016 -0.00036 -5.4e-05 -2.9e-07 1.1e-06 1.9e-07 2e-08 3.9e-09 8.2e-10 -4.6e-10
alpha_0.0001 0.96 0.56 0.55 -0.13 -0.026 -0.0028 -0.00011 4.1e-05 1.5e-05 3.7e-06 7.4e-07 1.3e-07 1.9e-08 1.9e-09 -1.3e-10 -1.5e-10 -6.2e-11
alpha_0.001 1 0.82 0.31 -0.087 -0.02 -0.0028 -0.00022 1.8e-05 1.2e-05 3.4e-06 7.3e-07 1.3e-07 1.9e-08 1.7e-09 -1.5e-10 -1.4e-10 -5.2e-11
alpha_0.01 1.4 1.3 -0.088 -0.052 -0.01 -0.0014 -0.00013 7.2e-07 4.1e-06 1.3e-06 3e-07 5.6e-08 9e-09 1.1e-09 4.3e-11 -3.1e-11 -1.5e-11
alpha_1 5.6 0.97 -0.14 -0.019 -0.003 -0.00047 -7e-05 -9.9e-06 -1.3e-06 -1.4e-07 -9.3e-09 1.3e-09 7.8e-10 2.4e-10 6.2e-11 1.4e-11 3.2e-12
alpha_5 14 0.55 -0.059 -0.0085 -0.0014 -0.00024 -4.1e-05 -6.9e-06 -1.1e-06 -1.9e-07 -3.1e-08 -5.1e-09 -8.2e-10 -1.3e-10 -2e-11 -3e-12 -4.2e-13
alpha_10 18 0.4 -0.037 -0.0055 -0.00095 -0.00017 -3e-05 -5.2e-06 -9.2e-07 -1.6e-07 -2.9e-08 -5.1e-09 -9.1e-10 -1.6e-10 -2.9e-11 -5.1e-12 -9.1e-13
alpha_20 23 0.28 -0.022 -0.0034 -0.0006 -0.00011 -2e-05 -3.6e-06 -6.6e-07 -1.2e-07 -2.2e-08 -4e-09 -7.5e-10 -1.4e-10 -2.5e-11 -4.7e-12 -8.7e-13

In [12]:
coef_matrix_ridge.apply(lambda x: sum(x.values==0),axis=1)


Out[12]:
alpha_1e-15     0
alpha_1e-10     0
alpha_1e-08     0
alpha_0.0001    0
alpha_0.001     0
alpha_0.01      0
alpha_1         0
alpha_5         0
alpha_10        0
alpha_20        0
dtype: int64

Lasso Modeling:


In [18]:
from sklearn.linear_model import Lasso
def lasso_regression(data, predictors, alpha, models_to_plot={}):
    #Fit the model
    lassoreg = Lasso(alpha=alpha,normalize=True, max_iter=1e6)
    lassoreg.fit(data[predictors],data['y'])
    y_pred = lassoreg.predict(data[predictors])
    
    #Check if a plot is to be made for the entered alpha
    if alpha in models_to_plot:
        plt.subplot(models_to_plot[alpha])
        plt.tight_layout()
        plt.plot(data['x'],y_pred)
        plt.plot(data['x'],data['y'],'.')
        plt.title('Plot for alpha: %.3g'%alpha)
    
    #Return the result in pre-defined format
    rss = sum((y_pred-data['y'])**2)
    ret = [rss]
    ret.extend([lassoreg.intercept_])
    ret.extend(lassoreg.coef_)
    return ret

In [20]:
predictors=['x']
predictors.extend(['x_%d'%i for i in range(2,16)])

alpha_lasso = [1e-15, 1e-10, 1e-8, 1e-5,1e-4, 1e-3,1e-2, 1, 5, 10]

col = ['rss','intercept'] + ['coef_x_%d'%i for i in range(1,16)]
ind = ['alpha_%.2g'%alpha_lasso[i] for i in range(0,10)]
coef_matrix_lasso = pd.DataFrame(index=ind, columns=col)

models_to_plot = {1e-10:231, 1e-5:232,1e-4:233, 1e-3:234, 1e-2:235, 1:236}
for i in range(10):
    coef_matrix_lasso.iloc[i,] = lasso_regression(data, predictors, alpha_lasso[i], models_to_plot)



In [21]:
#Set the display format to be scientific for ease of analysis
pd.options.display.float_format = '{:,.2g}'.format
coef_matrix_lasso


Out[21]:
rss intercept coef_x_1 coef_x_2 coef_x_3 coef_x_4 coef_x_5 coef_x_6 coef_x_7 coef_x_8 coef_x_9 coef_x_10 coef_x_11 coef_x_12 coef_x_13 coef_x_14 coef_x_15
alpha_1e-15 0.95 -0.033 1.4 -0.46 -0.027 0.01 0.0011 -0.00011 -5.2e-05 -7.5e-06 -9.8e-08 2.6e-07 8.6e-08 1.7e-08 2e-09 -1.8e-10 -2e-10
alpha_1e-10 0.95 -0.033 1.4 -0.46 -0.027 0.01 0.0011 -0.00011 -5.2e-05 -7.5e-06 -1e-07 2.6e-07 8.6e-08 1.7e-08 2e-09 -1.8e-10 -2e-10
alpha_1e-08 0.95 -0.024 1.4 -0.45 -0.028 0.0098 0.0011 -9.5e-05 -5.2e-05 -7.5e-06 -2.1e-07 2.6e-07 8.6e-08 1.7e-08 1.9e-09 -1.3e-10 -2.1e-10
alpha_1e-05 0.96 0.5 0.6 -0.13 -0.038 -0 0 0 0 7.7e-06 1e-06 7.7e-08 0 0 0 -0 -7e-11
alpha_0.0001 1 0.9 0.17 -0 -0.048 -0 -0 0 0 9.5e-06 5.1e-07 0 0 0 -0 -0 -4.4e-11
alpha_0.001 1.7 1.3 -0 -0.13 -0 -0 -0 0 0 0 0 0 1.5e-08 7.5e-10 0 0 0
alpha_0.01 3.6 1.8 -0.55 -0.00056 -0 -0 -0 -0 -0 -0 -0 0 0 0 0 0 0
alpha_1 37 0.038 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0
alpha_5 37 0.038 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0
alpha_10 37 0.038 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0 -0

In [22]:
coef_matrix_lasso.apply(lambda x: sum(x.values==0),axis=1)


Out[22]:
alpha_1e-15      0
alpha_1e-10      0
alpha_1e-08      0
alpha_1e-05      8
alpha_0.0001    10
alpha_0.001     12
alpha_0.01      13
alpha_1         15
alpha_5         15
alpha_10        15
dtype: int64

Check correlation:


In [23]:
data.corr()


Out[23]:
x y x_2 x_3 x_4 x_5 x_6 x_7 x_8 x_9 x_10 x_11 x_12 x_13 x_14 x_15
x 1 -0.95 0.99 0.95 0.91 0.87 0.84 0.8 0.77 0.75 0.72 0.7 0.68 0.66 0.64 0.62
y -0.95 1 -0.94 -0.9 -0.85 -0.8 -0.76 -0.71 -0.67 -0.64 -0.61 -0.58 -0.56 -0.53 -0.52 -0.5
x_2 0.99 -0.94 1 0.99 0.97 0.94 0.91 0.88 0.86 0.83 0.81 0.79 0.77 0.75 0.73 0.71
x_3 0.95 -0.9 0.99 1 0.99 0.98 0.96 0.94 0.92 0.9 0.87 0.86 0.84 0.82 0.8 0.79
x_4 0.91 -0.85 0.97 0.99 1 1 0.98 0.97 0.95 0.94 0.92 0.9 0.89 0.87 0.86 0.84
x_5 0.87 -0.8 0.94 0.98 1 1 1 0.99 0.98 0.97 0.95 0.94 0.92 0.91 0.9 0.88
x_6 0.84 -0.76 0.91 0.96 0.98 1 1 1 0.99 0.98 0.97 0.96 0.95 0.94 0.93 0.91
x_7 0.8 -0.71 0.88 0.94 0.97 0.99 1 1 1 0.99 0.99 0.98 0.97 0.96 0.95 0.94
x_8 0.77 -0.67 0.86 0.92 0.95 0.98 0.99 1 1 1 0.99 0.99 0.98 0.97 0.97 0.96
x_9 0.75 -0.64 0.83 0.9 0.94 0.97 0.98 0.99 1 1 1 1 0.99 0.99 0.98 0.97
x_10 0.72 -0.61 0.81 0.87 0.92 0.95 0.97 0.99 0.99 1 1 1 1 0.99 0.99 0.98
x_11 0.7 -0.58 0.79 0.86 0.9 0.94 0.96 0.98 0.99 1 1 1 1 1 0.99 0.99
x_12 0.68 -0.56 0.77 0.84 0.89 0.92 0.95 0.97 0.98 0.99 1 1 1 1 1 0.99
x_13 0.66 -0.53 0.75 0.82 0.87 0.91 0.94 0.96 0.97 0.99 0.99 1 1 1 1 1
x_14 0.64 -0.52 0.73 0.8 0.86 0.9 0.93 0.95 0.97 0.98 0.99 0.99 1 1 1 1
x_15 0.62 -0.5 0.71 0.79 0.84 0.88 0.91 0.94 0.96 0.97 0.98 0.99 0.99 1 1 1

In [ ]: