This notebook fits several classifiers (KNN, logistic regression, decision trees, support vector machines, random forests and extra trees), with and without grid search and bagging, to find which one best predicts car acceptability.


In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
%matplotlib inline

In [5]:
df = pd.read_csv("car.csv")
df.head()


Out[5]:
buying maint doors persons lug_boot safety acceptability
0 vhigh vhigh 2 2 small low unacc
1 vhigh vhigh 2 2 small med unacc
2 vhigh vhigh 2 2 small high unacc
3 vhigh vhigh 2 2 med low unacc
4 vhigh vhigh 2 2 med med unacc

Checking the unique values for each of the columns.


In [6]:
print df.buying.unique()

print df.maint.unique()

print df.doors.unique()

print df.persons.unique()

print df.lug_boot.unique()

print df.safety.unique()

print df.acceptability.unique()


['vhigh' 'high' 'med' 'low']
['vhigh' 'high' 'med' 'low']
['2' '3' '4' '5more']
['2' '4' 'more']
['small' 'med' 'big']
['low' 'med' 'high']
['unacc' 'acc' 'vgood' 'good']

The categories in every column are ordered, so the dictionaries below map them onto integer scales. Note that 'more' (persons) and '5more' (doors) are both coded as 5.


In [7]:
map1 = {'low':1,
        'med':2,
        'high':3,
        'vhigh':4}

map2 = {'small':1,
        'med':2,
        'big':3}

map3 = {'unacc':1,
        'acc':2,
        'good':3,
        'vgood':4}

map4 = {'2': 2,
        '4': 4,
        'more': 5}

map5 = {'2': 2,
        '3': 3,
        '4': 4,
        '5more': 5}

Applying the maps and separating the features from the target, which is acceptability.


In [9]:
# Every column except the target, 'acceptability'.
features = [c for c in df.columns if c != 'acceptability']

# Work on a copy so the raw string values stay available in df.
df1 = df.copy()

df1.buying = df.buying.map(map1)
df1.maint = df.maint.map(map1)
df1.doors = df.doors.map(map5)
df1.persons = df.persons.map(map4)
df1.lug_boot = df.lug_boot.map(map2)
df1.safety = df.safety.map(map1)
df1.acceptability = df.acceptability.map(map3)

X = df1[features]
y = df1['acceptability']

# Making sure the mapping worked.
X.head(10)


Out[9]:
buying maint doors persons lug_boot safety
0 4 4 2 2 1 1
1 4 4 2 2 1 2
2 4 4 2 2 1 3
3 4 4 2 2 2 1
4 4 4 2 2 2 2
5 4 4 2 2 2 3
6 4 4 2 2 3 1
7 4 4 2 2 3 2
8 4 4 2 2 3 3
9 4 4 2 4 1 1
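One thing `Series.map` does quietly is turn any value that is missing from the dictionary into NaN. The sketch below shows how a check for unmapped values might look. It runs on a small hypothetical frame rather than car.csv; the frame `toy` and the helper `unmapped_values` are illustrative assumptions, not part of the notebook.

```python
import pandas as pd

# Ordinal scale copied from map1 in the notebook.
map1 = {'low': 1, 'med': 2, 'high': 3, 'vhigh': 4}

# Hypothetical toy column; 'vhgh' is a deliberate typo to show the failure mode.
toy = pd.DataFrame({'buying': ['vhigh', 'med', 'vhgh', 'low']})

def unmapped_values(series, mapping):
    """Return the distinct non-null values of `series` that have no key in `mapping`."""
    return sorted(set(series.dropna().unique()) - set(mapping))

encoded = toy['buying'].map(map1)

print(unmapped_values(toy['buying'], map1))  # values that would become NaN
print(int(encoded.isnull().sum()))           # number of NaN rows after mapping
```

Running the same helper on each real column before mapping would confirm that the dictionaries cover every category listed in the `unique()` output.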

Splitting the data into training and test sets (70/30, stratified on the target) and defining a function to evaluate each of the models that follow.


In [13]:
from sklearn.cross_validation import train_test_split, KFold
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix, classification_report

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

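def evaluate_model(model):
    """Fit on the training split, print the confusion matrix and
    classification report for the test split, and return test accuracy."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    a = accuracy_score(y_test, y_pred)

    cm = confusion_matrix(y_test, y_pred)
    cr = classification_report(y_test, y_pred)

    print cm
    print cr

    return a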

various_models = {}


//anaconda/lib/python2.7/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)

KNN Classifier


In [14]:
from sklearn.neighbors import KNeighborsClassifier

a = evaluate_model(KNeighborsClassifier())


[[354   9   0   0]
 [  8 107   0   0]
 [  0   9  11   1]
 [  0   2   0  18]]
             precision    recall  f1-score   support

          1       0.98      0.98      0.98       363
          2       0.84      0.93      0.88       115
          3       1.00      0.52      0.69        21
          4       0.95      0.90      0.92        20

avg / total       0.95      0.94      0.94       519


In [15]:
from sklearn.grid_search import GridSearchCV

params = {'n_neighbors': range(2,60)}

gsknn = GridSearchCV(KNeighborsClassifier(),
                     params, n_jobs=-1,
                     cv=KFold(len(y), n_folds=3, shuffle=True))


//anaconda/lib/python2.7/site-packages/sklearn/grid_search.py:43: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
  DeprecationWarning)
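Both deprecation warnings so far point to `sklearn.model_selection`, which also hosts `GridSearchCV`. A minimal sketch of the equivalent split and fold setup with that module follows, assuming scikit-learn 0.18 or later. The arrays `X_toy` and `y_toy` are synthetic stand-ins, not the car data.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

# Synthetic stand-in for X and y (an assumption; not the car data).
rng = np.random.RandomState(0)
X_toy = rng.randint(1, 5, size=(200, 6))
y_toy = np.repeat([1, 2, 3, 4], [140, 40, 10, 10])

# Same call shape as in the notebook; only the import path changes.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=42, stratify=y_toy)

# New-style KFold takes the number of splits, not the number of samples;
# the data is supplied later through cv.split(...).
cv = KFold(n_splits=3, shuffle=True, random_state=42)
fold_sizes = [len(test_idx) for _, test_idx in cv.split(X_toy)]

print(len(y_te), np.bincount(y_te)[1:])  # test size and per-class counts
print(fold_sizes)
```

The main difference to keep in mind is that `KFold(len(y), n_folds=3, shuffle=True)` becomes `KFold(n_splits=3, shuffle=True)`.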

In [16]:
gsknn.fit(X, y)


Out[16]:
GridSearchCV(cv=sklearn.cross_validation.KFold(n=1728, n_folds=3, shuffle=True, random_state=None),
       error_score='raise',
       estimator=KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=5, p=2,
           weights='uniform'),
       fit_params={}, iid=True, n_jobs=-1,
       param_grid={'n_neighbors': [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59]},
       pre_dispatch='2*n_jobs', refit=True, scoring=None, verbose=0)

In [17]:
gsknn.best_params_


Out[17]:
{'n_neighbors': 5}

In [18]:
gsknn.best_score_


Out[18]:
0.9537037037037037
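The search in In [16] is fit on all of X and y, so the rows later used as X_test also take part in choosing n_neighbors. A stricter variant tunes on the training split only and keeps the test rows for the final score. The sketch below illustrates that idea on synthetic data with the `sklearn.model_selection` API (assumed: scikit-learn 0.18 or later); it is not the procedure the notebook ran.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (an assumption; the notebook uses car.csv).
X_toy, y_toy = make_classification(n_samples=300, n_features=6,
                                   n_informative=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=42, stratify=y_toy)

search = GridSearchCV(KNeighborsClassifier(),
                      {'n_neighbors': list(range(2, 20))},
                      cv=KFold(n_splits=3, shuffle=True, random_state=0))
search.fit(X_tr, y_tr)  # the test rows never influence the choice of k

# With refit=True (the default) the best model is refit on X_tr,
# so this score comes from rows the search has not seen.
held_out_accuracy = search.score(X_te, y_te)
print(search.best_params_, round(held_out_accuracy, 3))
```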

In [19]:
evaluate_model(gsknn.best_estimator_)


[[354   9   0   0]
 [  8 107   0   0]
 [  0   9  11   1]
 [  0   2   0  18]]
             precision    recall  f1-score   support

          1       0.98      0.98      0.98       363
          2       0.84      0.93      0.88       115
          3       1.00      0.52      0.69        21
          4       0.95      0.90      0.92        20

avg / total       0.95      0.94      0.94       519

Out[19]:
0.94412331406551064

In [20]:
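# `a` is the accuracy returned in In [14]; the tuned model also uses
# n_neighbors=5, so it equals the value in Out[19].
various_models['knn'] = {'model': gsknn.best_estimator_,
                     'score': a}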

Bagging KNN Classifier. With default settings, bagging gave a slightly lower test accuracy than plain KNN (0.940 versus 0.944).


In [21]:
from sklearn.ensemble import BaggingClassifier
baggingknn = BaggingClassifier(KNeighborsClassifier())

In [22]:
evaluate_model(baggingknn)


[[351  12   0   0]
 [  6 107   2   0]
 [  0   6  14   1]
 [  0   4   0  16]]
             precision    recall  f1-score   support

          1       0.98      0.97      0.97       363
          2       0.83      0.93      0.88       115
          3       0.88      0.67      0.76        21
          4       0.94      0.80      0.86        20

avg / total       0.94      0.94      0.94       519

Out[22]:
0.94026974951830444

In [23]:
bagging_params = {'n_estimators': [10, 20],
                  'max_samples': [0.7, 1.0],
                  'max_features': [0.7, 1.0],
                  'bootstrap_features': [True, False]}


gsbaggingknn = GridSearchCV(baggingknn,
                            bagging_params, n_jobs=-1,
                            cv=KFold(len(y), n_folds=3, shuffle=True))

In [24]:
gsbaggingknn.fit(X, y)


//anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py:540: UserWarning: Multiprocessing-backed parallel loops cannot be nested, setting n_jobs=1
  **self._backend_args)
Out[24]:
GridSearchCV(cv=sklearn.cross_validation.KFold(n=1728, n_folds=3, shuffle=True, random_state=None),
       error_score='raise',
       estimator=BaggingClassifier(base_estimator=KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=5, p=2,
           weights='uniform'),
         bootstrap=True, bootstrap_features=False, max_features=1.0,
         max_samples=1.0, n_estimators=10, n_jobs=1, oob_score=False,
         random_state=None, verbose=0, warm_start=False),
       fit_params={}, iid=True, n_jobs=-1,
       param_grid={'n_estimators': [10, 20], 'max_samples': [0.7, 1.0], 'bootstrap_features': [True, False], 'max_features': [0.7, 1.0]},
       pre_dispatch='2*n_jobs', refit=True, scoring=None, verbose=0)

In [25]:
gsbaggingknn.best_params_


Out[25]:
{'bootstrap_features': False,
 'max_features': 1.0,
 'max_samples': 1.0,
 'n_estimators': 20}
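Out[24] shows bootstrap=True, and the selected max_samples is 1.0, so each of the 20 base estimators is trained on as many rows as the fitting set holds, drawn with replacement. The sketch below uses NumPy only (no scikit-learn) to show what one such draw looks like when `evaluate_model` fits on the training split; the seed is an arbitrary assumption.

```python
import numpy as np

n_train = 1728 - 519            # rows left for training after the 30% test split
rng = np.random.RandomState(0)  # fixed seed so the sketch is repeatable

# One bootstrap sample: n_train row indices drawn with replacement.
sample = rng.randint(0, n_train, size=n_train)
unique_fraction = len(np.unique(sample)) / float(n_train)

# Large-n limit of the expected unique fraction: 1 - 1/e.
print(round(unique_fraction, 3), round(1 - np.exp(-1), 3))
```

Each base KNN therefore sees roughly two thirds of the distinct training rows, with the rest of its sample made up of repeats.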

In [26]:
various_models['gsbaggingknn'] = {'model': gsbaggingknn.best_estimator_,
                     'score': evaluate_model(gsbaggingknn.best_estimator_)}


[[356   7   0   0]
 [  6 109   0   0]
 [  0   9  12   0]
 [  0   3   0  17]]
             precision    recall  f1-score   support

          1       0.98      0.98      0.98       363
          2       0.85      0.95      0.90       115
          3       1.00      0.57      0.73        21
          4       1.00      0.85      0.92        20

avg / total       0.96      0.95      0.95       519

Moving on to logistic regression.


In [27]:
from sklearn.linear_model import LogisticRegression

lr = LogisticRegression()
various_models['lr'] = {'model': lr,
                    'score': evaluate_model(lr)}


[[347  11   4   1]
 [ 59  53   3   0]
 [  5  15   1   0]
 [  0  19   0   1]]
             precision    recall  f1-score   support

          1       0.84      0.96      0.90       363
          2       0.54      0.46      0.50       115
          3       0.12      0.05      0.07        21
          4       0.50      0.05      0.09        20

avg / total       0.73      0.77      0.74       519


In [28]:
params = {'C': [0.001, 0.01, 0.1, 1.0, 10.0, 100.0],
          'penalty': ['l1', 'l2']}

gslr = GridSearchCV(lr,
                    params, n_jobs=-1,
                    cv=KFold(len(y), n_folds=3, shuffle=True))

gslr.fit(X, y)

print gslr.best_params_
print gslr.best_score_

various_models['gslr'] = {'model': gslr.best_estimator_,
                             'score': evaluate_model(gslr.best_estimator_)}


{'penalty': 'l1', 'C': 10.0}
0.827546296296
[[344  14   4   1]
 [ 48  64   3   0]
 [  4  14   2   1]
 [  0   9   0  11]]
             precision    recall  f1-score   support

          1       0.87      0.95      0.91       363
          2       0.63      0.56      0.59       115
          3       0.22      0.10      0.13        21
          4       0.85      0.55      0.67        20

avg / total       0.79      0.81      0.80       519
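The grid above mixes 'l1' and 'l2' penalties, which worked with the solver this older scikit-learn used by default. In more recent releases the default solver is lbfgs, which does not accept penalty='l1', so a comparable search would need a solver chosen explicitly. The sketch below is one way that might look, on synthetic binary data to keep it short; exact behaviour depends on the installed scikit-learn version, and for the four-class target here 'saga' is another solver that accepts both penalties.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic binary stand-in data (an assumption; not the car data).
X_toy, y_toy = make_classification(n_samples=200, n_features=6,
                                   n_informative=4, random_state=0)

# 'liblinear' handles both penalties in the grid.
params = {'C': [0.01, 1.0, 10.0], 'penalty': ['l1', 'l2']}
search = GridSearchCV(LogisticRegression(solver='liblinear'), params,
                      cv=KFold(n_splits=3, shuffle=True, random_state=0))
search.fit(X_toy, y_toy)
print(search.best_params_)
```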


In [48]:
gsbagginglr = GridSearchCV(BaggingClassifier(gslr.best_estimator_),
                           bagging_params, n_jobs=-1,
                           cv=KFold(len(y), n_folds=3, shuffle=True))

gsbagginglr.fit(X, y)

print gsbagginglr.best_params_
print gsbagginglr.best_score_

various_models['gsbagginglr'] = {'model': gsbagginglr.best_estimator_,
                             'score': evaluate_model(gsbagginglr.best_estimator_)}


//anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py:540: UserWarning: Multiprocessing-backed parallel loops cannot be nested, setting n_jobs=1
  **self._backend_args)
{'max_features': 1.0, 'max_samples': 1.0, 'n_estimators': 20, 'bootstrap_features': False}
0.829282407407
[[344  14   4   1]
 [ 47  65   3   0]
 [  4  12   4   1]
 [  0   9   0  11]]
             precision    recall  f1-score   support

          1       0.87      0.95      0.91       363
          2       0.65      0.57      0.60       115
          3       0.36      0.19      0.25        21
          4       0.85      0.55      0.67        20

avg / total       0.80      0.82      0.80       519

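Decision trees are next. These cells show no output (In [ ]), and no decision-tree models appear in the comparison tables further down.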


In [ ]:
from sklearn.tree import DecisionTreeClassifier

dt = DecisionTreeClassifier()
various_models['dt'] = {'model': dt,
                    'score': evaluate_model(dt)}

In [ ]:
params = {'criterion': ['gini', 'entropy'],
          'splitter': ['best', 'random'],
          'max_depth': [None, 5, 10],
          'min_samples_split': [2, 5],
          'min_samples_leaf': [1, 2, 3]}

gsdt = GridSearchCV(dt,
                    params, n_jobs=-1,
                    cv=KFold(len(y), n_folds=3, shuffle=True))

gsdt.fit(X, y)
print gsdt.best_params_
print gsdt.best_score_

various_models['gsdt'] = {'model': gsdt.best_estimator_,
                      'score': evaluate_model(gsdt.best_estimator_)}

In [ ]:
gsbaggingdt = GridSearchCV(BaggingClassifier(gsdt.best_estimator_),
                           bagging_params, n_jobs=-1,
                           cv=KFold(len(y), n_folds=3, shuffle=True))

gsbaggingdt.fit(X, y)

print gsbaggingdt.best_params_
print gsbaggingdt.best_score_

various_models['gsbaggingdt'] = {'model': gsbaggingdt.best_estimator_,
                             'score': evaluate_model(gsbaggingdt.best_estimator_)}

On to Support Vector Machines.


In [30]:
from sklearn.svm import SVC

svm = SVC()
various_models['svm'] = {'model': svm,
                     'score': evaluate_model(svm)}


[[352  11   0   0]
 [  4 110   1   0]
 [  0   5  14   2]
 [  0   1   0  19]]
             precision    recall  f1-score   support

          1       0.99      0.97      0.98       363
          2       0.87      0.96      0.91       115
          3       0.93      0.67      0.78        21
          4       0.90      0.95      0.93        20

avg / total       0.96      0.95      0.95       519


In [31]:
params = {'C': [0.01, 0.1, 1.0, 10.0, 30.0, 100.0],
          'gamma': ['auto', 0.1, 1.0, 10.0],
          'kernel': ['linear', 'rbf']}


gssvm = GridSearchCV(svm,
                    params, n_jobs=-1,
                    cv=KFold(len(y), n_folds=3, shuffle=True))

gssvm.fit(X, y)
print gssvm.best_params_
print gssvm.best_score_

various_models['gssvm'] = {'model': gssvm.best_estimator_,
                      'score': evaluate_model(gssvm.best_estimator_)}


{'kernel': 'rbf', 'C': 30.0, 'gamma': 'auto'}
0.988425925926
[[363   0   0   0]
 [  4 111   0   0]
 [  0   2  19   0]
 [  0   0   0  20]]
             precision    recall  f1-score   support

          1       0.99      1.00      0.99       363
          2       0.98      0.97      0.97       115
          3       1.00      0.90      0.95        21
          4       1.00      1.00      1.00        20

avg / total       0.99      0.99      0.99       519


In [32]:
gsbaggingsvm = GridSearchCV(BaggingClassifier(gssvm.best_estimator_),
                           bagging_params, n_jobs=-1,
                           cv=KFold(len(y), n_folds=3, shuffle=True))

gsbaggingsvm.fit(X, y)

print gsbaggingsvm.best_params_
print gsbaggingsvm.best_score_

various_models['gsbaggingsvm'] = {'model': gsbaggingsvm.best_estimator_,
                             'score': evaluate_model(gsbaggingsvm.best_estimator_)}


//anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py:540: UserWarning: Multiprocessing-backed parallel loops cannot be nested, setting n_jobs=1
  **self._backend_args)
{'max_features': 1.0, 'max_samples': 1.0, 'n_estimators': 20, 'bootstrap_features': False}
0.981481481481
[[363   0   0   0]
 [  4 111   0   0]
 [  0   1  19   1]
 [  0   0   0  20]]
             precision    recall  f1-score   support

          1       0.99      1.00      0.99       363
          2       0.99      0.97      0.98       115
          3       1.00      0.90      0.95        21
          4       0.95      1.00      0.98        20

avg / total       0.99      0.99      0.99       519

Random Forests and Extra Trees are up next.


In [44]:
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier

rf = RandomForestClassifier()
various_models['rf'] = {'model': rf,
                    'score': evaluate_model(rf)}



et = ExtraTreesClassifier()
various_models['et'] = {'model': et,
                    'score': evaluate_model(et)}


[[357   6   0   0]
 [  4 111   0   0]
 [  0   8  13   0]
 [  0   5   0  15]]
             precision    recall  f1-score   support

          1       0.99      0.98      0.99       363
          2       0.85      0.97      0.91       115
          3       1.00      0.62      0.76        21
          4       1.00      0.75      0.86        20

avg / total       0.96      0.96      0.95       519

[[357   5   1   0]
 [ 11 104   0   0]
 [  0   4  16   1]
 [  0   3   0  17]]
             precision    recall  f1-score   support

          1       0.97      0.98      0.98       363
          2       0.90      0.90      0.90       115
          3       0.94      0.76      0.84        21
          4       0.94      0.85      0.89        20

avg / total       0.95      0.95      0.95       519


In [45]:
params = {'n_estimators':[3, 5, 10, 50],
          'criterion': ['gini', 'entropy'],
          'max_depth': [None, 3, 5],
          'min_samples_split': [2,5],
          'class_weight':[None, 'balanced']}


gsrf = GridSearchCV(RandomForestClassifier(n_jobs=-1),
                    params, n_jobs=-1,
                    cv=KFold(len(y), n_folds=3, shuffle=True))

gsrf.fit(X, y)
print gsrf.best_params_
print gsrf.best_score_

various_models['gsrf'] = {'model': gsrf.best_estimator_,
                      'score': evaluate_model(gsrf.best_estimator_)}


{'min_samples_split': 2, 'n_estimators': 50, 'criterion': 'entropy', 'max_depth': None, 'class_weight': None}
0.974537037037
[[357   6   0   0]
 [  4 109   0   2]
 [  0   4  16   1]
 [  0   1   0  19]]
             precision    recall  f1-score   support

          1       0.99      0.98      0.99       363
          2       0.91      0.95      0.93       115
          3       1.00      0.76      0.86        21
          4       0.86      0.95      0.90        20

avg / total       0.97      0.97      0.97       519


In [47]:
gset = GridSearchCV(ExtraTreesClassifier(n_jobs=-1),
                    params, n_jobs=-1,
                    cv=KFold(len(y), n_folds=3, shuffle=True))

gset.fit(X, y)
print gset.best_params_
print gset.best_score_

various_models['gset'] = {'model': gset.best_estimator_,
                      'score': evaluate_model(gset.best_estimator_)}


{'min_samples_split': 5, 'n_estimators': 50, 'criterion': 'entropy', 'max_depth': None, 'class_weight': None}
0.97337962963
[[357   5   1   0]
 [  4 109   1   1]
 [  0   6  13   2]
 [  0   1   0  19]]
             precision    recall  f1-score   support

          1       0.99      0.98      0.99       363
          2       0.90      0.95      0.92       115
          3       0.87      0.62      0.72        21
          4       0.86      0.95      0.90        20

avg / total       0.96      0.96      0.96       519

Creating a dataframe to compare the models.


In [50]:
scores = pd.DataFrame([(k, v['score']) for k, v in various_models.iteritems()],
             columns=['model', 'score']).set_index('model').sort_values('score', ascending=False)

plt.style.use('fivethirtyeight')
scores.plot(kind='bar')
plt.ylim(0.5, 1.05)

scores


Out[50]:
model         score
gsbaggingsvm  0.988439
gssvm         0.988439
gsrf          0.965318
gset          0.959538
rf            0.955684
svm           0.953757
et            0.951830
gsbaggingknn  0.951830
knn           0.944123
gsbagginglr   0.816956
gslr          0.811175
lr            0.774566
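The comparison cell relies on `dict.iteritems()`, which exists only in Python 2. A sketch of the same table-building step with `.items()`, which works on both Python 2 and 3, is shown here with three scores copied from the table above. The dictionary `various_models_demo` is a cut-down stand-in that omits the fitted models.

```python
import pandas as pd

# Three entries copied from the table above; the 'model' slot is omitted for brevity.
various_models_demo = {'gssvm': {'score': 0.988439},
                       'knn': {'score': 0.944123},
                       'lr': {'score': 0.774566}}

# .items() works on Python 2 and 3; .iteritems() is Python 2 only.
scores_demo = (pd.DataFrame([(k, v['score']) for k, v in various_models_demo.items()],
                            columns=['model', 'score'])
               .set_index('model')
               .sort_values('score', ascending=False))
print(scores_demo)
```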

Grid-search SVM with bagging and grid-search SVM without it reached the same test accuracy (0.988439) in the comparison above.
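Equal accuracy does not have to mean equal predictions. Recomputing both scores from the two SVM confusion matrices printed earlier shows the same number of correct rows, with the errors landing in different cells:

```python
# Confusion matrices printed earlier for the two SVM variants.
cm_gssvm = [[363, 0, 0, 0], [4, 111, 0, 0], [0, 2, 19, 0], [0, 0, 0, 20]]
cm_bagged = [[363, 0, 0, 0], [4, 111, 0, 0], [0, 1, 19, 1], [0, 0, 0, 20]]

def accuracy(cm):
    """Fraction of test rows on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / float(total)

# Cells where the two matrices disagree, as (true_class, predicted_class), 1-based.
differing = [(i + 1, j + 1) for i in range(4) for j in range(4)
             if cm_gssvm[i][j] != cm_bagged[i][j]]

print(round(accuracy(cm_gssvm), 6), round(accuracy(cm_bagged), 6))
print(differing)
```

Both models get 513 of 519 test rows right; they differ only in how one class-3 car is misclassified.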


In [51]:
#Repeating the tests on my various models
from sklearn.cross_validation import cross_val_score, StratifiedKFold

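def retest(model):
    """Return the mean and standard deviation of the model's
    stratified, shuffled 3-fold cross-validation accuracy on (X, y)."""
    scores = cross_val_score(model, X, y,
                             cv=StratifiedKFold(y, shuffle=True),
                             n_jobs=-1)
    m = scores.mean()
    s = scores.std()

    return m, s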

for k, v in various_models.iteritems():
    cvres = retest(v['model'])
    print k, 
    various_models[k]['cvres'] = cvres


knn
//anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py:540: UserWarning: Multiprocessing-backed parallel loops cannot be nested, setting n_jobs=1
  **self._backend_args)
 gsbagginglr gsrf svm et
 gsbaggingsvm gslr rf lr gset gssvm
 gsbaggingknn
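For reference, the `retest` helper can be written against `sklearn.model_selection` roughly as follows. This is a sketch under assumptions: scikit-learn 0.18 or later, synthetic data in place of X and y, and a hypothetical name `retest_modern`. The old `StratifiedKFold(y, shuffle=True)` defaulted to three folds, which is made explicit here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (an assumption; not the car data).
X_toy, y_toy = make_classification(n_samples=300, n_features=6,
                                   n_informative=4, random_state=0)

def retest_modern(model, X, y, seed=0):
    """Mean and standard deviation of 3-fold stratified CV accuracy."""
    cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
    scores = cross_val_score(model, X, y, cv=cv)
    return scores.mean(), scores.std()

mean_acc, std_acc = retest_modern(KNeighborsClassifier(), X_toy, y_toy)
print(round(mean_acc, 3), round(std_acc, 3))
```

Fixing `random_state` makes the folds repeatable, so two models can be compared on identical splits.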

In [52]:
cvscores = pd.DataFrame([(k, v['cvres'][0], v['cvres'][1] ) for k, v in various_models.iteritems()],
                        columns=['model', 'score', 'error']).set_index('model').sort_values('score', ascending=False)



fig, ax = plt.subplots()
rects1 = ax.bar(range(len(cvscores)), cvscores.score,
                yerr=cvscores.error,
                tick_label=cvscores.index)

plt.style.use('fivethirtyeight')
ax.set_ylabel('Scores')
plt.xticks(rotation=70)
plt.ylim(0.5, 1.05)

cvscores


Out[52]:
model         score     error
gsbaggingsvm  0.983221  0.002931
gssvm         0.982070  0.007246
gsrf          0.976858  0.007782
gset          0.970486  0.002456
rf            0.967591  0.004343
gsbaggingknn  0.962387  0.004952
svm           0.955445  0.003505
et            0.954871  0.007436
knn           0.947919  0.002797
gslr          0.831575  0.016026
gsbagginglr   0.827558  0.010062
lr            0.804410  0.010016

The top seven models listed above scored within about three percentage points of each other (0.955 to 0.983). SVM with grid search and bagging edged out grid-search SVM by a minuscule amount, 0.983 to 0.982.
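One rough way to judge how small that gap is: compare it with the fold-to-fold spread reported in the error column. The sketch below uses the two rows from the table above and checks whether the one-standard-deviation intervals overlap. This is an informal reading based on three folds, not a significance test.

```python
# (mean, std) pairs copied from the cross-validation table above.
cv_results = {'gsbaggingsvm': (0.983221, 0.002931),
              'gssvm': (0.982070, 0.007246)}

def one_std_interval(mean, std):
    """Interval of one standard deviation either side of the mean."""
    return (mean - std, mean + std)

lo_a, hi_a = one_std_interval(*cv_results['gsbaggingsvm'])
lo_b, hi_b = one_std_interval(*cv_results['gssvm'])

gap = cv_results['gsbaggingsvm'][0] - cv_results['gssvm'][0]
intervals_overlap = lo_a <= hi_b and lo_b <= hi_a

print(round(gap, 6), intervals_overlap)
```

The gap between the two means is smaller than either standard deviation, so the ordering of these two models could easily flip on a different shuffle.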

This lab was extensive with all the different models used; it was extremely helpful in giving me a deeper understanding of the modeling process. For this reason it is one of my favorite labs from our course.