Title: Effect Of Alpha On Lasso Regression
Slug: effect_of_alpha_on_lasso_regression
Summary: The effect of alpha values on lasso regression in Scikit-Learn
Date: 2016-11-01 12:00
Category: Machine Learning
Tags: Feature Selection
Authors: Chris Albon
Often we want to conduct a process called regularization, wherein we penalize the size of a model's coefficients in order to keep only the most important features. This can be particularly important when you have a dataset with 100,000+ features.
Lasso regression is a common modeling technique for regularization. The math behind it is pretty interesting, but practically, what you need to know is that Lasso regression comes with a parameter, alpha, and the higher the alpha, the more feature coefficients are zero.
That is, when alpha is 0, Lasso regression produces the same coefficients as a linear regression. When alpha is very, very large, all coefficients are zero.
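To make the two extremes concrete, here is a minimal sketch on synthetic data. The dataset from make_regression, the names X_demo and y_demo, and the specific alpha values are assumptions chosen for illustration, not part of the tutorial below. A tiny alpha stands in for 0, since scikit-learn advises using LinearRegression rather than Lasso with alpha set to exactly 0.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration, not the tutorial's dataset)
X_demo, y_demo = make_regression(
    n_samples=200, n_features=10, n_informative=4, noise=5.0, random_state=0
)
X_demo = StandardScaler().fit_transform(X_demo)

# Ordinary linear regression as the reference point
ols = LinearRegression().fit(X_demo, y_demo)

# A tiny alpha: the penalty is negligible, so coefficients should match OLS closely
tiny = Lasso(alpha=1e-6, tol=1e-10, max_iter=10000).fit(X_demo, y_demo)

# A huge alpha: the penalty dominates, so every coefficient is shrunk to zero
huge = Lasso(alpha=1e6).fit(X_demo, y_demo)

max_gap = np.max(np.abs(tiny.coef_ - ols.coef_))
all_zero = bool(np.all(huge.coef_ == 0))

print("Largest gap between tiny-alpha lasso and OLS coefficients:", max_gap)
print("All coefficients zero at huge alpha:", all_zero)
```

The interesting behavior lies between these two extremes, where only some of the coefficients have been pushed to zero.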
In this tutorial, I run three lasso regressions, with varying levels of alpha, and show the resulting effect on the coefficients.
In [1]:
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler
# Note: load_boston was removed in scikit-learn 1.2, so this import needs an older version
from sklearn.datasets import load_boston
import pandas as pd
In [2]:
boston = load_boston()
scaler = StandardScaler()
X = scaler.fit_transform(boston["data"])
Y = boston["target"]
names = boston["feature_names"]
In [3]:
# Create a function called lasso
def lasso(alphas):
    '''
    Takes in a list of alphas. Outputs a dataframe containing the coefficients of lasso regressions from each alpha.
    '''
    # Create an empty data frame
    df = pd.DataFrame()
    # Create a column of feature names
    df['Feature Name'] = names
    # For each alpha value in the list of alpha values,
    for alpha in alphas:
        # Create a lasso regression with that alpha value
        # (named model so it does not shadow the function name)
        model = Lasso(alpha=alpha)
        # Fit the lasso regression
        model.fit(X, Y)
        # Create a column name for that alpha value
        column_name = 'Alpha = %f' % alpha
        # Create a column of coefficient values
        df[column_name] = model.coef_
    # Return the dataframe
    return df
In [4]:
# Run the function called lasso with three alpha values
lasso([.0001, .5, 10])
Out[4]:
Notice that as the alpha value increases, more features have a coefficient of 0.
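One way to check this without reading the table by eye is to count the zero coefficients for each alpha. The sketch below is self-contained and uses synthetic data as a stand-in; the helper count_zero_coefficients, the names X_demo and y_demo, and the extra alpha of 1000 are hypothetical and only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption; not the dataset used above)
X_demo, y_demo = make_regression(
    n_samples=200, n_features=10, n_informative=4, noise=5.0, random_state=0
)
X_demo = StandardScaler().fit_transform(X_demo)


def count_zero_coefficients(alphas):
    """Return a Series mapping each alpha to its number of zero coefficients."""
    counts = {}
    for alpha in alphas:
        # Fit a lasso regression with this alpha value
        model = Lasso(alpha=alpha, max_iter=10000).fit(X_demo, y_demo)
        # Count the coefficients that are exactly zero
        counts[alpha] = int(np.sum(model.coef_ == 0))
    return pd.Series(counts, name="Zero coefficients")


# The three alphas from the tutorial, plus one much larger value
zero_counts = count_zero_coefficients([0.0001, 0.5, 10, 1000])
print(zero_counts)
```

The same counting step could be applied to the coefficient columns of the dataframe returned by the lasso function above.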