Exercise 11

Car Price Prediction

Predict whether the price of a car is low or high


In [9]:
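%matplotlib inline
import pandas as pd

# Load the car listings (pandas reads the zipped CSV directly from the URL)
data = pd.read_csv('https://github.com/albahnsen/PracticalMachineLearningClass/raw/master/datasets/dataTrain_carListings.zip')
# Keep only Camry listings and drop the columns that are not used as features
data = data.loc[data['Model'].str.contains('Camry')].drop(['Make', 'State'], axis=1)
# One-hot encode the model variant (dtype=int keeps 0/1 columns on recent pandas versions)
data = data.join(pd.get_dummies(data['Model'], prefix='M', dtype=int))
# Binary target: 1 if the price is above the mean Camry price, 0 otherwise
data['HighPrice'] = (data['Price'] > data['Price'].mean()).astype(int)
data = data.drop(['Model', 'Price'], axis=1)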

data.head()


Out[9]:
     Year  Mileage  M_Camry  M_Camry4dr  M_CamryBase  M_CamryL  M_CamryLE  M_CamrySE  M_CamryXLE  HighPrice
15   2016    29242        0           0            0         0          1          0           0          1
47   2015    26465        0           0            0         0          1          0           0          1
85   2012    46739        0           1            0         0          0          0           0          1
141  2017    41722        0           0            0         0          0          1           0          1
226  2014    77669        0           0            0         0          0          0           1          0

In [12]:
data.shape


Out[12]:
(13150, 10)

In [10]:
y = data['HighPrice']
X = data.drop(['HighPrice'], axis=1)

In [11]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

In [ ]:

Exercise 11.1

Estimate a decision tree classifier manually, using the code created in Notebook #13

Evaluate the accuracy on the testing set
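
As a starting point, here is a minimal sketch of a manual binary decision tree based on Gini impurity. The Notebook #13 code is not reproduced here, so every function name below (gini, best_split, grow_tree, tree_predict) is an assumption for illustration and may differ from the course code. The synthetic arrays stand in for X_train.values and y_train.values so that the block runs on its own.

```python
import numpy as np

def gini(y):
    """Gini impurity of a binary 0/1 label vector."""
    if y.shape[0] == 0:
        return 0.0
    p = y.mean()
    return 1.0 - p ** 2 - (1.0 - p) ** 2

def best_split(X, y):
    """Return (feature index, threshold, gain) of the best binary split."""
    best = (None, None, 0.0)
    parent = gini(y)
    n = y.shape[0]
    for j in range(X.shape[1]):
        # Candidate thresholds: every observed value except the smallest
        for t in np.unique(X[:, j])[1:]:
            left = X[:, j] < t
            n_l = left.sum()
            gain = (parent - (n_l / n) * gini(y[left])
                    - ((n - n_l) / n) * gini(y[~left]))
            if gain > best[2]:
                best = (j, t, gain)
    return best

def grow_tree(X, y, depth=0, max_depth=3, min_gain=1e-3):
    """Recursively grow a tree stored as nested dictionaries."""
    node = {'pred': int(y.mean() >= 0.5), 'split': None}
    if depth >= max_depth or y.shape[0] < 2:
        return node
    j, t, gain = best_split(X, y)
    if j is None or gain < min_gain:
        return node
    left = X[:, j] < t
    node['split'] = (j, t)
    node['left'] = grow_tree(X[left], y[left], depth + 1, max_depth, min_gain)
    node['right'] = grow_tree(X[~left], y[~left], depth + 1, max_depth, min_gain)
    return node

def predict_one(node, x):
    """Walk down the tree until a leaf is reached."""
    while node['split'] is not None:
        j, t = node['split']
        node = node['left'] if x[j] < t else node['right']
    return node['pred']

def tree_predict(node, X):
    return np.array([predict_one(node, x) for x in X])

# Synthetic stand-in data (assumption): the label depends only on feature 0
rng = np.random.RandomState(42)
X_demo = rng.uniform(0, 1, size=(300, 3))
y_demo = (X_demo[:, 0] > 0.5).astype(int)
X_tr, X_te, y_tr, y_te = X_demo[:200], X_demo[200:], y_demo[:200], y_demo[200:]

tree = grow_tree(X_tr, y_tr, max_depth=3)
acc_tree = (tree_predict(tree, X_te) == y_te).mean()
print('test accuracy:', acc_tree)
```

On the car data, scanning every unique Mileage value as a threshold can be slow; restricting the candidates to a few percentiles per feature is one common way to speed up the search.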


In [ ]:

Exercise 11.2

Estimate a bagging ensemble of 10 decision tree classifiers manually, using the code created in Notebook #13

Evaluate the accuracy on the testing set
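
One possible shape for manual bagging is sketched below: draw a bootstrap sample for each tree, fit the tree, and combine the predictions by majority vote. To keep the block self-contained, sklearn's DecisionTreeClassifier is used as a stand-in base learner; in the exercise, the manual tree would take its place. The helper names fit_bagging and predict_bagging and the synthetic data are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_bagging(X, y, n_estimators=10, seed=42, **tree_params):
    """Fit n_estimators trees, each on a bootstrap sample of (X, y)."""
    rng = np.random.RandomState(seed)
    n = X.shape[0]
    trees = []
    for _ in range(n_estimators):
        # Bootstrap: sample n rows with replacement
        idx = rng.choice(n, size=n, replace=True)
        tree = DecisionTreeClassifier(random_state=rng.randint(0, 10**6),
                                      **tree_params)
        trees.append(tree.fit(X[idx], y[idx]))
    return trees

def predict_bagging(trees, X):
    """Majority vote over the trees (a tie is resolved in favor of class 1)."""
    votes = np.array([t.predict(X) for t in trees])
    return (votes.mean(axis=0) >= 0.5).astype(int)

# Synthetic stand-in for X_train / X_test (assumption)
rng = np.random.RandomState(0)
X_demo = rng.uniform(0, 1, size=(400, 4))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 1).astype(int)
X_tr, X_te = X_demo[:300], X_demo[300:]
y_tr, y_te = y_demo[:300], y_demo[300:]

trees = fit_bagging(X_tr, y_tr, n_estimators=10)
acc_bag = (predict_bagging(trees, X_te) == y_te).mean()
print('bagging test accuracy:', acc_bag)
```

With the notebook's DataFrames, the same functions could be called as fit_bagging(X_train.values, y_train.values).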


In [ ]:

Exercise 11.3

Implement the max_features parameter in the decision tree classifier created in 11.1.

Compare the results obtained when varying max_features

Evaluate the accuracy on the testing set
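
A common way to implement max_features is to consider only a random subset of the features each time a split is searched. The sketch below shows this for a Gini-based split search; the function names, the rng argument, and the synthetic data are assumptions, not the Notebook #13 code. In a recursive tree builder, a search like this would be called at every node, so each node sees a different random subset.

```python
import numpy as np

def gini(y):
    """Gini impurity of a binary 0/1 label vector."""
    if y.shape[0] == 0:
        return 0.0
    p = y.mean()
    return 1.0 - p ** 2 - (1.0 - p) ** 2

def best_split(X, y, max_features=None, rng=None):
    """Best split among a random subset of max_features features."""
    rng = rng if rng is not None else np.random.RandomState(0)
    n, n_features = X.shape
    k = n_features if max_features is None else min(max_features, n_features)
    # Sample k distinct feature indices to evaluate at this node
    candidates = rng.choice(n_features, size=k, replace=False)
    parent = gini(y)
    best = (None, None, 0.0)
    for j in candidates:
        for t in np.unique(X[:, j])[1:]:
            left = X[:, j] < t
            n_l = left.sum()
            gain = (parent - (n_l / n) * gini(y[left])
                    - ((n - n_l) / n) * gini(y[~left]))
            if gain > best[2]:
                best = (int(j), float(t), gain)
    return best

# Synthetic stand-in data (assumption): only feature 0 is informative
rng = np.random.RandomState(42)
X_demo = rng.uniform(0, 1, size=(200, 3))
y_demo = (X_demo[:, 0] > 0.5).astype(int)

# With all features available, the informative feature is selected
split_all = best_split(X_demo, y_demo, max_features=None, rng=rng)
# With max_features=1, the selected feature depends on the random draw
chosen = [best_split(X_demo, y_demo, max_features=1, rng=rng)[0]
          for _ in range(20)]
print(split_all[0], set(chosen))
```

Smaller values of max_features make individual trees weaker but less correlated with each other, which is the effect the comparison in this exercise is meant to expose.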


In [ ]:

Exercise 11.4

Estimate a bagging ensemble of 10 decision tree classifiers with max_features = log(n_features)

Evaluate the accuracy on the testing set
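
The exercise does not state the base of the logarithm, so the sketch below assumes the natural log, truncated to an integer and floored at 1. It uses sklearn's BaggingClassifier around a DecisionTreeClassifier rather than manual code, and synthetic data with 9 features in place of X_train; both choices are assumptions made to keep the block self-contained.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in with 9 features, like the car data (assumption)
rng = np.random.RandomState(0)
X_demo = rng.uniform(0, 1, size=(600, 9))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 1).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.33, random_state=42)

# Number of features examined at each split: int(ln(9)) = 2
n_features = X_tr.shape[1]
k = max(1, int(np.log(n_features)))

# max_features is set on the tree so that the subset is redrawn at every split
base_tree = DecisionTreeClassifier(max_features=k, random_state=42)
bag = BaggingClassifier(base_tree, n_estimators=10, random_state=42)
bag.fit(X_tr, y_tr)

acc_bag_log = accuracy_score(y_te, bag.predict(X_te))
print('max_features =', k, '| test accuracy:', acc_bag_log)
```

If the base-2 logarithm is intended instead, DecisionTreeClassifier accepts max_features='log2' directly. Note that BaggingClassifier has its own max_features argument, which selects one feature subset per tree rather than per split.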


In [ ]:

Exercise 11.5

Using sklearn, train a RandomForestClassifier

Evaluate the accuracy on the testing set
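
A minimal sketch of the sklearn workflow follows. The synthetic data stands in for the notebook's X_train, X_test, y_train, and y_test, and the hyperparameter values are illustrative assumptions rather than recommended settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the car data (assumption)
rng = np.random.RandomState(0)
X_demo = rng.uniform(0, 1, size=(400, 4))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 1).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.33, random_state=42)

# Fit the forest and score it on the held-out test set
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_tr, y_tr)

acc_rf = accuracy_score(y_te, rf.predict(X_te))
print('random forest test accuracy:', acc_rf)
```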


In [ ]:

Exercise 11.6

Find the best parameters of the RandomForestClassifier (max_depth, max_features, n_estimators)

Evaluate the accuracy on the testing set
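
One way to search the three parameters is a cross-validated grid search on the training set, followed by a single evaluation on the test set. The sketch below uses GridSearchCV with a deliberately small grid; the grid values and the synthetic data are assumptions and would need to be widened for the car data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the car data (assumption)
rng = np.random.RandomState(0)
X_demo = rng.uniform(0, 1, size=(400, 4))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 1).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.33, random_state=42)

# Small illustrative grid over the three parameters named in the exercise
param_grid = {
    'max_depth': [2, 4, None],
    'max_features': [1, 2, None],
    'n_estimators': [10, 50],
}

# 3-fold cross-validation on the training set only
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, cv=3, scoring='accuracy')
search.fit(X_tr, y_tr)

# best_estimator_ is refit on the full training set with the best parameters
acc_best = accuracy_score(y_te, search.best_estimator_.predict(X_te))
print(search.best_params_)
print('test accuracy:', acc_best)
```

Selecting the parameters by cross-validation keeps the test set untouched until the final evaluation, so the reported accuracy is not biased by the search itself.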


In [ ]: