Listings for over 370,000 used cars were scraped from Ebay-Kleinanzeigen. The content of the data is in German, so it has to be translated into English first. The data is available here. The fields included in the file data/autos.csv are:
Goal
Given the characteristics/features of the car, the sale price of the car is to be predicted.
In [1]:
#import the required libraries
In [2]:
#Load the data
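A minimal sketch of the loading step, assuming pandas is installed and the file sits at data/autos.csv as stated above. The helper name `load_autos` is hypothetical, and the `latin-1` default is an assumption (scraped German text is often not UTF-8); adjust it if the file decodes differently.

```python
import pandas as pd


def load_autos(path="data/autos.csv", encoding="latin-1"):
    """Read the scraped listings into a DataFrame.

    The encoding default is an assumption; try "utf-8" if the file
    decodes cleanly that way.
    """
    return pd.read_csv(path, encoding=encoding)
```

With `cars = load_autos()`, the sanity checks in the next cells reduce to `cars.head()`, `cars.columns` and `cars.dtypes`.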
In [3]:
#Do basic sanity check.
#1. Look into the first few records
In [4]:
#2. What are the column names?
In [5]:
#3. What are the column types?
In [6]:
#4. Do label encoding
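One possible way to label-encode the text columns, sketched on a tiny hypothetical frame so it runs without the CSV. The column names and values here are assumptions for illustration, not the actual fields of autos.csv.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the listings; names and values are assumptions.
cars = pd.DataFrame({
    "brand": ["volkswagen", "bmw", "audi", "bmw"],
    "gearbox": ["manuell", "automatik", "manuell", None],
    "price": [1500, 9800, 4200, 7000],
})

encoders = {}
for col in cars.select_dtypes(exclude="number").columns:
    enc = LabelEncoder()
    # Missing values get a placeholder so the encoder only sees strings.
    cars[col] = enc.fit_transform(cars[col].fillna("missing").astype(str))
    encoders[col] = enc  # kept so codes can be mapped back via inverse_transform

print(cars)
```

Label encoding assigns integer codes in sorted order of the category names, so a linear model will read an ordering into them that the categories do not have. One-hot encoding is the usual alternative when that matters.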
In [7]:
#5. Ideally, we would do some exploratory analysis.
#For practice, plot: year of registration vs price
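A sketch of the scatter plot, using synthetic data in place of the real listings. The column names `yearOfRegistration` and `price` are assumptions, and the generated values carry no information about the actual dataset.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in data; the column names are assumptions.
rng = np.random.default_rng(0)
cars = pd.DataFrame({"yearOfRegistration": rng.integers(1990, 2017, size=200)})
cars["price"] = 500 * (cars["yearOfRegistration"] - 1989) + rng.normal(0, 1500, size=200)

fig, ax = plt.subplots()
ax.scatter(cars["yearOfRegistration"], cars["price"], s=8, alpha=0.5)
ax.set_xlabel("yearOfRegistration")
ax.set_ylabel("price")
ax.set_title("Year of registration vs price")
```

On scraped data it is usually worth filtering implausible years and prices before plotting, otherwise a few outliers can compress the rest of the points into a corner.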
In [51]:
#6. Build OLS model - sklearn
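A minimal OLS sketch with `sklearn.linear_model.LinearRegression`. The feature matrix is synthetic with known coefficients, which is an assumption made so the fit can be checked; in the notebook, `X` would be the encoded car features and `y` the price.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data with known coefficients (an assumption for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_coef = np.array([2.0, -1.0, 0.5])
y = X @ true_coef + 3.0 + rng.normal(scale=0.1, size=200)

ols = LinearRegression()
ols.fit(X, y)
print("coefficients:", ols.coef_)
print("intercept:", ols.intercept_)
```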
In [9]:
#7. Report the diagnostics and discuss the results
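sklearn does not print a regression summary table, so one option is to compute a few diagnostics by hand. The sketch below reports in-sample R², RMSE and the residuals on synthetic data; which diagnostics the exercise expects is an assumption.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic data (an assumption for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=200)

ols = LinearRegression().fit(X, y)
y_hat = ols.predict(X)
residuals = y - y_hat

r2 = r2_score(y, y_hat)                        # share of variance explained
rmse = np.sqrt(mean_squared_error(y, y_hat))   # in the same units as y
print(f"R^2:  {r2:.4f}")
print(f"RMSE: {rmse:.4f}")
print(f"mean residual: {residuals.mean():.2e}")
```

These numbers are computed on the data the model was fitted to, so they are optimistic; a held-out split gives a fairer estimate.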
In [10]:
#8. Build L2 Regression using sklearn - linear_model.Ridge
In [11]:
#9. Try with different values of alpha (0.001, 0.01, 0.05, 0.1, 0.5)
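Steps 8 and 9 can be sketched together: fit `linear_model.Ridge` once per alpha and compare the coefficients. The last alpha in the list is taken to be 0.5, and the data is synthetic (an assumption) so the block runs on its own.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (an assumption for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=200)

alphas = [0.001, 0.01, 0.05, 0.1, 0.5]
norms = []
for alpha in alphas:
    ridge = Ridge(alpha=alpha).fit(X, y)
    norms.append(np.linalg.norm(ridge.coef_))
    print(f"alpha={alpha:<6} coef={np.round(ridge.coef_, 4)}")
```

The L2 norm of the coefficient vector shrinks as alpha grows. On well-conditioned data like this the change is small, and it becomes dramatic only when features are strongly correlated.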
10. The following code is from the official sklearn documentation:
# Author: Fabian Pedregosa -- <fabian.pedregosa@inria.fr>
# License: BSD 3 clause
print(__doc__)
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
# X is the 10x10 Hilbert matrix
X = 1. / (np.arange(1, 11) + np.arange(0, 10)[:, np.newaxis])
y = np.ones(10)
#compute paths
n_alphas = 200
alphas = np.logspace(-10, -2, n_alphas)
clf = linear_model.Ridge(fit_intercept=False)
coefs = []
for a in alphas:
    clf.set_params(alpha=a)
    clf.fit(X, y)
    coefs.append(clf.coef_)
#Display results
ax = plt.gca()
ax.plot(alphas, coefs)
ax.set_xscale('log')
ax.set_xlim(ax.get_xlim()[::-1]) # reverse axis
plt.xlabel('alpha')
plt.ylabel('weights')
plt.title('Ridge coefficients as a function of the regularization')
plt.axis('tight')
plt.show()
Can you modify this code to plot for 10 values of alpha and see how the weights change?
In [ ]:
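One possible answer, assuming the 10 alphas should span the same range as the original 200: only the alpha grid changes, and markers are added so the individual fits are visible. This is a sketch, not the only valid modification.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model

# X is the 10x10 Hilbert matrix, as in the original example
X = 1.0 / (np.arange(1, 11) + np.arange(0, 10)[:, np.newaxis])
y = np.ones(10)

# 10 alphas, log-spaced over the same range as before (an assumption)
alphas = np.logspace(-10, -2, 10)
coefs = []
for a in alphas:
    ridge = linear_model.Ridge(alpha=a, fit_intercept=False)
    ridge.fit(X, y)
    coefs.append(ridge.coef_)
coefs = np.array(coefs)  # one row per alpha, one column per weight

fig, ax = plt.subplots()
ax.plot(alphas, coefs, marker="o")
ax.set_xscale("log")
ax.set_xlim(ax.get_xlim()[::-1])  # reverse axis: strongest penalty on the left
ax.set_xlabel("alpha")
ax.set_ylabel("weights")
ax.set_title("Ridge coefficients for 10 values of alpha")
```

Reading the plot from right to left, the weights are large and of mixed sign at tiny alpha because the Hilbert matrix is badly conditioned, and they collapse toward small values as the penalty increases.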
In [12]:
#11. Build an L1 Linear Model
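A sketch of an L1-penalised linear model using `sklearn.linear_model.Lasso`. The data is synthetic, with three of the six features deliberately irrelevant, and `alpha=0.1` is an arbitrary choice; both are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: features 1, 3 and 4 have no effect on y (an assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
true_coef = np.array([3.0, 0.0, -2.0, 0.0, 0.0, 1.5])
y = X @ true_coef + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1)
lasso.fit(X, y)
print("coefficients:", np.round(lasso.coef_, 3))
```

Unlike the L2 penalty, the L1 penalty sets some coefficients exactly to zero, at the cost of shrinking the remaining ones toward zero as well.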
In [13]:
#12. Feature selection from L1 Linear Model
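Feature selection from an L1 model can be sketched with `sklearn.feature_selection.SelectFromModel`, which keeps the features whose Lasso coefficient is not (numerically) zero. The feature names below are hypothetical and the data is synthetic.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

# Hypothetical feature names and synthetic data (assumptions).
feature_names = np.array(["f0", "f1", "f2", "f3", "f4", "f5"])
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = X @ np.array([3.0, 0.0, -2.0, 0.0, 0.0, 1.5]) + rng.normal(scale=0.1, size=200)

selector = SelectFromModel(Lasso(alpha=0.1))
selector.fit(X, y)

mask = selector.get_support()       # boolean mask over the input columns
X_selected = selector.transform(X)  # keeps only the selected columns
print("selected:", feature_names[mask].tolist())
```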
In [14]:
#13. Find the generalization error.
#Split dataset into two: Train and Test : 80% and 20%
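The 80/20 split can be sketched with `sklearn.model_selection.train_test_split`. The data is synthetic and `random_state=0` is an arbitrary seed chosen for reproducibility; both are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=200)

# test_size=0.2 holds out 20% of the rows; the rest is used for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape)
```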
In [15]:
#14. Build L2 Regularization model on train and predict on test
In [16]:
#15. Report the RMSE
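Steps 14 and 15 can be sketched end to end: fit Ridge on the training split, predict on the test split, and report the root mean squared error. The data is synthetic and `alpha=1.0` is an arbitrary choice (both assumptions).

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

ridge = Ridge(alpha=1.0).fit(X_train, y_train)
y_pred = ridge.predict(X_test)

# RMSE on unseen rows is the estimate of the generalization error.
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"test RMSE: {rmse:.4f}")
```

Taking the square root of `mean_squared_error` by hand keeps the snippet independent of which RMSE helper a given sklearn version provides.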