Exercise 11

Car Price Prediction

Predict whether the price of a Camry listing is high (above the mean price) or low



In [9]:

    
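%matplotlib inline
import pandas as pd

# Load the car listings (zip-compressed CSV) from the course repository
data = pd.read_csv('https://github.com/albahnsen/PracticalMachineLearningClass/raw/master/datasets/dataTrain_carListings.zip')
# Keep only Camry listings; Make and State are not used as features
data = data.loc[data['Model'].str.contains('Camry')].drop(['Make', 'State'], axis=1)
# One-hot encode the model variant (dtype=int keeps 0/1 columns on pandas >= 2.0)
data = data.join(pd.get_dummies(data['Model'], prefix='M', dtype=int))
# Target: 1 if the listing price is above the mean price, else 0
data['HighPrice'] = (data['Price'] > data['Price'].mean()).astype(int)
data = data.drop(['Model', 'Price'], axis=1)

data.head()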









    Out[9]:







	Year	Mileage	M_Camry	M_Camry4dr	M_CamryBase	M_CamryL	M_CamryLE	M_CamrySE	M_CamryXLE	HighPrice
15	2016	29242	0	0	0	0	1	0	0	1
47	2015	26465	0	0	0	0	1	0	0	1
85	2012	46739	0	1	0	0	0	0	0	1
141	2017	41722	0	0	0	0	0	1	0	1
226	2014	77669	0	0	0	0	0	0	1	0



In [12]:

    
data.shape









    Out[12]:





(13150, 10)



In [10]:

    
y = data['HighPrice']
X = data.drop(['HighPrice'], axis=1)



In [11]:

    
from sklearn.model_selection import train_test_split
# Hold out 33% of the listings as the test set (fixed seed for reproducibility)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)



In [ ]:

Exercise 11.1

Estimate a decision tree classifier manually, using the code created in Notebook #13

Evaluate the accuracy on the testing set
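A minimal sketch of a hand-written binary decision tree, assuming Gini impurity as the split criterion and a small set of quantile thresholds per feature. The function names (`gini`, `best_split`, `grow_tree`, `tree_predict`) are hypothetical and may not match the code in Notebook #13. The sketch runs on a synthetic stand-in dataset from `make_classification` so it works offline; with the Camry data you would pass `X_train.values`, `y_train.values`, `X_test.values` and `y_test.values` instead.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split


def gini(y):
    """Gini impurity of a 0/1 label vector: 1 - p^2 - (1 - p)^2."""
    if len(y) == 0:
        return 0.0
    p = y.mean()
    return 1.0 - p ** 2 - (1.0 - p) ** 2


def best_split(X, y, n_thresholds=10):
    """Try quantile thresholds on every feature; return (feature, threshold, gain)."""
    best = (None, None, 0.0)
    parent = gini(y)
    qs = np.linspace(0, 1, n_thresholds + 2)[1:-1]  # interior quantiles only
    for j in range(X.shape[1]):
        for t in np.unique(np.quantile(X[:, j], qs)):
            left = X[:, j] < t
            n_l = left.sum()
            n_r = len(y) - n_l
            if n_l == 0 or n_r == 0:
                continue
            # Weighted impurity of the two child nodes
            child = (n_l * gini(y[left]) + n_r * gini(y[~left])) / len(y)
            if parent - child > best[2]:
                best = (j, t, parent - child)
    return best


def grow_tree(X, y, depth=0, max_depth=None, min_gain=1e-3):
    """Recursively grow a tree stored as nested dicts."""
    node = {"y_pred": int(y.mean() >= 0.5), "n": len(y)}
    if max_depth is not None and depth >= max_depth:
        return node
    j, t, gain = best_split(X, y)
    if j is None or gain < min_gain:
        return node  # leaf: no split improves impurity enough
    left = X[:, j] < t
    node["split"] = (j, t)
    node["sl"] = grow_tree(X[left], y[left], depth + 1, max_depth, min_gain)
    node["sr"] = grow_tree(X[~left], y[~left], depth + 1, max_depth, min_gain)
    return node


def tree_predict(X, node):
    """Route each row down the tree and return the leaf's majority class."""
    if "split" not in node:
        return np.full(X.shape[0], node["y_pred"])
    j, t = node["split"]
    left = X[:, j] < t
    pred = np.empty(X.shape[0], dtype=int)
    pred[left] = tree_predict(X[left], node["sl"])
    pred[~left] = tree_predict(X[~left], node["sr"])
    return pred


# Synthetic stand-in data (9 features, like the Camry feature matrix)
Xs, ys = make_classification(n_samples=600, n_features=9, random_state=0)
Xs_tr, Xs_te, ys_tr, ys_te = train_test_split(Xs, ys, test_size=0.33, random_state=42)

tree = grow_tree(Xs_tr, ys_tr, max_depth=4)
acc = (tree_predict(Xs_te, tree) == ys_te).mean()
print(f"test accuracy: {acc:.3f}")
```

The `max_depth` and `min_gain` stopping rules are illustrative choices; without them the tree keeps splitting until leaves are pure, which usually overfits.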



In [ ]:

Exercise 11.2

Estimate a bagging ensemble of 10 decision tree classifiers manually, using the code created in Notebook #13

Evaluate the accuracy on the testing set
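A minimal sketch of manual bagging: draw 10 bootstrap samples (rows sampled with replacement), fit one tree per sample, and combine them by majority vote. To keep the block self-contained it swaps in scikit-learn's `DecisionTreeClassifier` as the base learner instead of the Notebook #13 tree; the helper names `bagging_fit` and `bagging_predict` are hypothetical. Synthetic data stands in for the Camry listings; with the real data pass NumPy arrays (for example `X_train.values`), because `X[idx]` on a DataFrame would select columns rather than rows.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def bagging_fit(X, y, n_estimators=10, seed=1, **tree_params):
    """Fit one tree per bootstrap sample (n rows drawn with replacement)."""
    rng = np.random.RandomState(seed)
    n = X.shape[0]
    trees = []
    for _ in range(n_estimators):
        idx = rng.randint(0, n, size=n)  # bootstrap row indices
        tree = DecisionTreeClassifier(random_state=rng.randint(1_000_000), **tree_params)
        trees.append(tree.fit(X[idx], y[idx]))
    return trees


def bagging_predict(trees, X):
    """Majority vote: predict 1 when at least half of the trees vote 1."""
    votes = np.column_stack([t.predict(X) for t in trees])
    return (votes.mean(axis=1) >= 0.5).astype(int)


# Synthetic stand-in data (9 features, like the Camry feature matrix)
Xs, ys = make_classification(n_samples=600, n_features=9, random_state=0)
Xs_tr, Xs_te, ys_tr, ys_te = train_test_split(Xs, ys, test_size=0.33, random_state=42)

trees = bagging_fit(Xs_tr, ys_tr, n_estimators=10)
acc = (bagging_predict(trees, Xs_te) == ys_te).mean()
print(f"bagged test accuracy: {acc:.3f}")
```

With an even number of trees a 5 to 5 tie is possible; this sketch breaks ties toward class 1, which is an arbitrary choice.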



In [ ]:

Exercise 11.3

Add the max_features parameter to the decision tree classifier created in Exercise 11.1.

Compare the impact on the results of varying max_features

Evaluate the accuracy on the testing set
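One way to add `max_features` to a hand-written tree is to restrict the split search at each node to a random subset of `max_features` columns, drawn without replacement. The sketch below shows only that modified split search (the function name `best_split` and its signature are hypothetical, not taken from Notebook #13), on a small synthetic dataset where column 0 determines the label and columns 1 and 2 are noise.

```python
import numpy as np


def gini(y):
    """Gini impurity of a 0/1 label vector."""
    if len(y) == 0:
        return 0.0
    p = y.mean()
    return 1.0 - p ** 2 - (1.0 - p) ** 2


def best_split(X, y, max_features=None, n_thresholds=10, rng=None):
    """Best (feature, threshold, gain) over a random subset of max_features
    columns; max_features=None searches every column (a plain decision tree)."""
    rng = np.random.default_rng() if rng is None else rng
    n_features = X.shape[1]
    k = n_features if max_features is None else min(max_features, n_features)
    # Candidate columns are redrawn at every node, without replacement
    candidates = rng.choice(n_features, size=k, replace=False)
    best = (None, None, 0.0)
    parent = gini(y)
    qs = np.linspace(0, 1, n_thresholds + 2)[1:-1]
    for j in candidates:
        for t in np.unique(np.quantile(X[:, j], qs)):
            left = X[:, j] < t
            n_l = left.sum()
            n_r = len(y) - n_l
            if n_l == 0 or n_r == 0:
                continue
            child = (n_l * gini(y[left]) + n_r * gini(y[~left])) / len(y)
            if parent - child > best[2]:
                best = (int(j), t, parent - child)
    return best


# Synthetic toy data: column 0 determines the label, columns 1-2 are noise
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 3))
y_toy = (X_toy[:, 0] > 0).astype(int)

full = best_split(X_toy, y_toy, max_features=None, rng=rng)
chosen = [best_split(X_toy, y_toy, max_features=1, rng=rng)[0] for _ in range(50)]
print("all features ->", full[0])
print("max_features=1 ->", sorted({j for j in chosen if j is not None}))
```

With every column available the informative column 0 always wins, while with `max_features=1` the node is often forced onto a noise column. To compare accuracies, pass `max_features` (and a shared `rng`) down through the recursive tree-growing function and loop over values such as 1, 3, 6 and 9.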



In [ ]:

Exercise 11.4

Estimate a bagging ensemble of 10 decision tree classifiers with max_features = log(n_features)

Evaluate the accuracy on the testing set
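A sketch using scikit-learn's `BaggingClassifier` around a `DecisionTreeClassifier` whose `max_features` is set to log(n_features). It assumes base-2 logarithm rounded down (with 9 features that gives 3; the natural log would give 2). Note that `BaggingClassifier` averages the trees' predicted probabilities when the base estimator supports `predict_proba`, which is a soft vote rather than the hard majority vote of a manual implementation. Synthetic data stands in for `X_train`, `y_train`, `X_test`, `y_test`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (9 features, like the Camry feature matrix)
Xs, ys = make_classification(n_samples=600, n_features=9, random_state=0)
Xs_tr, Xs_te, ys_tr, ys_te = train_test_split(Xs, ys, test_size=0.33, random_state=42)

# Features considered at each split: floor(log2(n_features)), at least 1
n_features = Xs_tr.shape[1]
k = max(1, int(np.log2(n_features)))

bag = BaggingClassifier(
    DecisionTreeClassifier(max_features=k),  # base learner, passed positionally
    n_estimators=10,
    bootstrap=True,  # each tree sees a bootstrap sample of the rows
    random_state=1,
)
bag.fit(Xs_tr, ys_tr)
acc = bag.score(Xs_te, ys_te)
print(f"k = {k}, bagged test accuracy: {acc:.3f}")
```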



In [ ]:

Exercise 11.5

Using sklearn, train a RandomForestClassifier

Evaluate the accuracy on the testing set
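A minimal sketch with scikit-learn's `RandomForestClassifier`. The hyperparameters shown are illustrative defaults, not tuned values, and synthetic data stands in for the Camry split; with the notebook's data call `clf.fit(X_train, y_train)` and `clf.score(X_test, y_test)` directly, since scikit-learn accepts DataFrames.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (9 features, like the Camry feature matrix)
Xs, ys = make_classification(n_samples=600, n_features=9, random_state=0)
Xs_tr, Xs_te, ys_tr, ys_te = train_test_split(Xs, ys, test_size=0.33, random_state=42)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(Xs_tr, ys_tr)
acc = clf.score(Xs_te, ys_te)  # mean accuracy on the held-out set
print(f"random forest test accuracy: {acc:.3f}")
```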



In [ ]:

Exercise 11.6

Find the best parameters of the RandomForestClassifier (max_depth, max_features, n_estimators)

Evaluate the accuracy on the testing set
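A sketch of the parameter search using `GridSearchCV`, which picks the combination with the best cross-validated accuracy on the training set only, so the test set is used once at the end. The grid values are assumptions chosen to keep the search small, and synthetic data again stands in for the Camry split.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (9 features, like the Camry feature matrix)
Xs, ys = make_classification(n_samples=600, n_features=9, random_state=0)
Xs_tr, Xs_te, ys_tr, ys_te = train_test_split(Xs, ys, test_size=0.33, random_state=42)

# Illustrative grid over the three parameters named in the exercise
param_grid = {
    "max_depth": [None, 3, 6],
    "max_features": ["sqrt", "log2", None],
    "n_estimators": [10, 50],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(Xs_tr, ys_tr)  # refits the best model on all training data
best = search.best_params_
acc = search.score(Xs_te, ys_te)  # accuracy of the refitted best model
print("best parameters:", best)
print(f"test accuracy: {acc:.3f}")
```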


