This exercise compares OLS, Ridge, and Lasso regression by examining the coefficient (beta) profiles each estimator recovers.
In [16]:
import numpy as np
from sklearn import linear_model
import matplotlib.pyplot as plt
%matplotlib inline
In [17]:
N = 10000 # number of samples
p = 100 # number of predictors
In [31]:
# Generate samples X from a multivariate normal with equicorrelated predictors
# (unit variance, pairwise correlation rho)
rho = 0.9
SigmaX = np.empty([p,p])
SigmaX.fill(rho)
SigmaX = SigmaX + np.identity(p) * (1-rho)
muX = np.zeros(p)
X = np.random.multivariate_normal(muX, SigmaX, N)
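As a quick sanity check (not part of the original notebook; it assumes the X and p defined above are in scope), the empirical pairwise correlations should come out close to rho = 0.9:
In [ ]:
# Sanity check: the off-diagonal entries of the empirical correlation matrix
# should be close to rho = 0.9
emp_corr = np.corrcoef(X, rowvar=False)          # p x p empirical correlation matrix
off_diag = emp_corr[~np.eye(p, dtype=bool)]      # drop the diagonal (all ones)
print("mean off-diagonal correlation:", off_diag.mean())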
In [21]:
X.shape
Out[21]:
In [32]:
# True value of beta: first half of the coefficients are 0, second half are 1
true_beta = np.zeros([p,1])
true_beta[p//2:p, 0] = 1
plt.plot(true_beta)
plt.ylabel("true_beta")
Out[32]:
In [33]:
# Generate Gaussian noise and observations y
noise_std = 1
noise = np.random.normal(0, noise_std, [N,1])
y = np.dot(X, true_beta) + noise
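As a rough check of how strong the signal is relative to the noise (a small sketch, not in the original notebook; it assumes X, true_beta, and noise_std from the cells above), one can estimate the signal-to-noise ratio:
In [ ]:
# Rough signal-to-noise ratio: variance of the noiseless signal vs. the noise variance
signal = np.dot(X, true_beta)
snr = signal.var() / noise_std**2
print("approximate SNR:", snr)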
In [34]:
lr = linear_model.LinearRegression()
In [35]:
# OLS
lr.fit(X,y)
beta_OLS = np.squeeze(lr.coef_)
plt.plot(beta_OLS)
plt.ylabel("beta_OLS")
Out[35]:
In [30]:
np.squeeze(lr.coef_).shape
Out[30]:
In [15]:
lr.intercept_
Out[15]:
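To quantify how close the OLS estimate is to the truth (a sketch assuming beta_OLS and true_beta from the cells above), compute the coefficient error and overlay the two profiles:
In [ ]:
# Compare the OLS estimate against the true coefficients
coef_err = np.linalg.norm(beta_OLS - true_beta.ravel())
print("||beta_OLS - true_beta||_2 =", coef_err)
plt.plot(true_beta, label="true_beta")
plt.plot(beta_OLS, label="beta_OLS")
plt.legend()
plt.ylabel("coefficient value")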
In [11]:
# Ridge
# Note: the `normalize` argument was removed from scikit-learn's linear models (1.2+),
# so it is dropped here; the columns of X have unit variance by construction, so the
# predictors are effectively standardized already.
ridge = linear_model.Ridge(alpha = 0.1, fit_intercept = False)
ridge.fit(X,y)
beta_ridge = np.squeeze(ridge.coef_)
plt.plot(beta_ridge)
plt.ylabel("beta_ridge")
Out[11]:
This is a bit unexpected: why does Ridge give essentially the same result as OLS? A likely explanation is that with N = 10000 samples and alpha = 0.1, the ridge penalty is tiny relative to the least-squares term, so the shrinkage is negligible; a much larger alpha is needed before the coefficients visibly shrink.
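To test that explanation (a sketch, not part of the original notebook; it assumes X and y from above), refit Ridge over a range of alpha values and watch the coefficient norm shrink:
In [ ]:
# Refit Ridge for increasing alpha and track the size of the estimated coefficients
for a in [0.1, 10, 1000, 100000]:
    r = linear_model.Ridge(alpha = a, fit_intercept = False)
    r.fit(X, y)
    print("alpha = %g, ||beta_ridge||_2 = %.3f" % (a, np.linalg.norm(r.coef_)))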
In [12]:
# Lasso (the removed `normalize` argument is likewise dropped here)
lasso = linear_model.Lasso(alpha = 0.1, fit_intercept = False, max_iter = 50000)
In [13]:
lasso.fit(X,y)
beta_lasso = np.squeeze(lasso.coef_)
plt.plot(beta_lasso)
plt.ylabel("Beta_Lasso")
Out[13]:
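Finally, to make the comparison this exercise is after, overlay the three estimated profiles against the truth (a sketch assuming beta_OLS, beta_ridge, and beta_lasso are all available from the cells above):
In [ ]:
# Overlay the true coefficients and the three estimates for a direct visual comparison
plt.plot(true_beta, label="true_beta", linewidth=2)
plt.plot(beta_OLS, label="OLS")
plt.plot(beta_ridge, label="Ridge")
plt.plot(beta_lasso, label="Lasso")
plt.legend()
plt.ylabel("coefficient value")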