This is an introduction to the babysaver module, written by Team Babies as part of DSSG 2015 for our project with the Illinois Department of Human Services (IDHS).
First, we need to establish a connection to our PostgreSQL server and add the location of the babysaver module to our system path.
In [1]:
import pandas as pd
import numpy as np
import psycopg2
from sqlalchemy import create_engine
import json

# Load the psycopg2 connection parameters from a JSON file
with open('/mnt/data/predicting-adverse-births/passwords/psql_psycopg2.password', 'r') as f:
    params = json.load(f)

try:
    conn = psycopg2.connect(**params)
    conn.autocommit = True  # a bare `conn.autocommit` would only read the attribute
    cur = conn.cursor()
except psycopg2.Error:
    print('Unable to connect to database')

# Load the SQLAlchemy connection string
with open('/mnt/data/predicting-adverse-births/passwords/psql_engine.password', 'r') as f:
    engine = create_engine(f.read())

babysaver_parent = '/mnt/data/predicting-adverse-births/babies/'  # clone the babies repo
import sys
sys.path.insert(0, babysaver_parent)

from babysaver import features
from babysaver import models
from babysaver.models import WeightedQuestions
from babysaver import evaluation
The next step is to gather our data. A data configuration file lets you specify which questions to extract from which assessments, over which time frame and for which populations, along with any additional features and the outcome you would like to pull from our database. The config_writer() function writes a dictionary of values to a config file, so you do not have to enter the values in a spreadsheet. The data_getter() function can also create all two-way interaction terms and apply basic imputation strategies (such as imputing all missing question values with 0).
In [2]:
config_add1 = {
    'Features': None,
    'Include 707G?': 'Y',
    '707G Questions': range(35, 52),
    '707G Start Date': '2014-07-01',
    '707G End Date': None,
    'Include 711?': 'N',
    '711 Questions': [],
    '711 Start Date': None,
    '711 End Date': None,
    'Include FCM?': 'Y',
    'Include BBO?': 'Y',
    'Include other?': 'Y',
    'Outcome': 'ADVB1_OTC'
}
features.config_writer(config_add1, '/home/ipan/configs/config_add1.csv')
data_dct = features.data_getter('/home/ipan/configs/config_add1.csv',
                                conn=conn,
                                unique_identifier='UNI_PART_ID_I',
                                impute='fill_mode',
                                interactions=False)
data_getter: there are no continuous values to standardize
data_getter: dataset has dimensions (6457, 22)
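To make the imputation and interaction options more concrete, here is a minimal pandas sketch of the general ideas. It is not the babysaver implementation: the toy column names are hypothetical, and reading 'fill_mode' as "fill with the most frequent value per column" is an assumption based on the name alone.

```python
from itertools import combinations

import pandas as pd

# Toy stand-in for assessment answers; the column names are hypothetical.
df = pd.DataFrame({
    'q35': [1, None, 0, 1],
    'q36': [0, 1, None, 1],
    'q37': [1, 1, 0, None],
})

# Basic imputation: treat every missing answer as 0.
zero_filled = df.fillna(0)

# Alternative (assumed meaning of 'fill_mode'): fill with each column's mode.
mode_filled = df.fillna(df.mode().iloc[0])

# Two-way interaction terms: the product of every pair of columns.
interactions = pd.DataFrame({
    '{}_x_{}'.format(a, b): zero_filled[a] * zero_filled[b]
    for a, b in combinations(zero_filled.columns, 2)
})

# 3 original columns + 3 pairwise products.
full = pd.concat([zero_filled, interactions], axis=1)
print(full.shape)
```

With p features there are p * (p - 1) / 2 pairwise products, so turning on interactions can widen the dataset considerably.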
data_getter() returns a dictionary that includes the resulting dataframe.
It also contains the path to the config file used to generate the dataset, the list of features, the outcome, the unique identifier column, the holdout dataset (if specified), and the date column (deprecated).
In [4]:
data_dct.keys()
Out[4]:
['config_file',
'features',
'dataframe',
'date',
'holdout',
'outcome',
'unique_id']
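As a rough illustration of how such a dictionary could be split into a feature matrix and an outcome vector, the sketch below builds a small stand-in with some of the same keys. The toy values are hypothetical, and it assumes that 'features' holds column names and 'outcome' holds a single column name; check the real data_dct before relying on this.

```python
import pandas as pd

# Hypothetical stand-in that mirrors some of the keys listed above.
# The real values come from data_getter().
toy_dct = {
    'dataframe': pd.DataFrame({
        'UNI_PART_ID_I': [101, 102, 103],
        'q35': [1, 0, 1],
        'q36': [0, 0, 1],
        'ADVB1_OTC': [0, 1, 0],
    }),
    'features': ['q35', 'q36'],   # assumed: list of feature column names
    'outcome': 'ADVB1_OTC',       # assumed: name of the outcome column
    'unique_id': 'UNI_PART_ID_I',
}

df = toy_dct['dataframe']
X = df[toy_dct['features']]   # feature matrix
y = df[toy_dct['outcome']]    # outcome vector
print(X.shape, y.shape)
```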
In [5]:
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB, BernoulliNB
from sklearn.lda import LDA
from sklearn.qda import QDA
from sklearn.svm import SVC

logit_lib = {'clf': LogisticRegression,
             'param_dict': {'C': [1e-4, 1e-3, 0.01, 0.1, 1, 10, 1e3, 1e4, 1e20],
                            'penalty': ['l1', 'l2'],
                            'class_weight': [None, 'auto']
                            }
             }
rf_lib = {'clf': RandomForestClassifier,
          'param_dict': {'n_estimators': [100],
                         'max_depth': [None, 2, 5, 10, 20, 50],
                         'max_features': [None, 'sqrt', 'log2'],
                         'min_samples_split': [2, 5, 10],
                         'n_jobs': [-1],
                         'criterion': ['entropy', 'gini']
                         }
          }
adaboost_lib = {'clf': AdaBoostClassifier,
                'param_dict': {'n_estimators': [100],
                               'learning_rate': [0.1, 0.5, 1, 2, 5]
                               }
                }
gnb_lib = {'clf': GaussianNB,
           'param_dict': {}
           }
bnb_lib = {'clf': BernoulliNB,
           'param_dict': {}
           }
lda_lib = {'clf': LDA,
           'param_dict': {}
           }
qda_lib = {'clf': QDA,
           'param_dict': {}
           }
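Each param_dict describes a grid of settings, and the number of models grows multiplicatively with the number of values per parameter. The sketch below expands a grid with itertools.product to show how many combinations the libraries above describe. expand_grid is a hypothetical helper written for this illustration; it is not part of babysaver, and machine_learner() may enumerate the grid in a different order.

```python
from itertools import product


def expand_grid(param_dict):
    """Return one dict per combination of parameter values (hypothetical helper)."""
    keys = sorted(param_dict)
    value_lists = [param_dict[k] for k in keys]
    return [dict(zip(keys, values)) for values in product(*value_lists)]


logit_params = {'C': [1e-4, 1e-3, 0.01, 0.1, 1, 10, 1e3, 1e4, 1e20],
                'penalty': ['l1', 'l2'],
                'class_weight': [None, 'auto']}

rf_params = {'n_estimators': [100],
             'max_depth': [None, 2, 5, 10, 20, 50],
             'max_features': [None, 'sqrt', 'log2'],
             'min_samples_split': [2, 5, 10],
             'n_jobs': [-1],
             'criterion': ['entropy', 'gini']}

logit_grid = expand_grid(logit_params)
rf_grid = expand_grid(rf_params)

print(len(logit_grid))       # 9 * 2 * 2 combinations
print(len(rf_grid))          # 1 * 6 * 3 * 3 * 1 * 2 combinations
print(len(expand_grid({})))  # an empty grid yields a single default configuration
```

The random forest grid is the expensive one here: it has three times as many configurations as the logistic regression grid, and each fit is much slower.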
Now we can pass a list of these libraries, together with the data_dct from data_getter(), to the machine_learner() function to start training our classifiers. We can also have it print evaluation sheets (make_evals=True), but this will increase the runtime. Models are pickled in the specified directory (./pickles/ by default). Note that if you do not specify a folder, the function prompts you to confirm the default one, so specify a folder name explicitly if you want to run this from an automated script. Other cross-validation schemes are available, but kfold_cv is the only one that has been fully tested. The function returns a dictionary of dataframes containing the metrics for each classifier, as well as a dictionary with the pickle file name of each classifier. See help(models.machine_learner) for more info.
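For readers unfamiliar with k-fold cross-validation, the following sketch shows how a dataset with 6457 rows (the size reported by data_getter() above) could be split into 10 folds. It only illustrates the general technique with NumPy; whether kfold_cv shuffles or stratifies the rows is not stated here, so the shuffling and the fixed seed are assumptions.

```python
import numpy as np

n_rows, n_folds = 6457, 10

# Shuffle the row indices once, then cut them into 10 nearly equal folds.
rng = np.random.RandomState(0)  # fixed seed, chosen only for reproducibility
folds = np.array_split(rng.permutation(n_rows), n_folds)

for i, test_idx in enumerate(folds):
    # Every fold serves as the test set once; the other nine form the training set.
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    # A classifier would be fit on train_idx and scored on test_idx here.
    assert len(train_idx) + len(test_idx) == n_rows

fold_sizes = [len(f) for f in folds]
print(fold_sizes)
```

Each configuration is therefore trained n_folds times, which is worth keeping in mind when reading the per-model timings in the log that follows.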
In [6]:
eval_dct, pkl_dct = models.machine_learner(data_dct,
                                           clf_library=[logit_lib, rf_lib, adaboost_lib,
                                                        gnb_lib, bnb_lib, lda_lib, qda_lib],
                                           cv='kfold_cv', verbose=True, n_folds=10,
                                           pkl_folder='yay_pkls',
                                           k=[0.05, 0.1, 0.15, 0.2, 0.25, 0.3])
Running LogisticRegression(C=0.0001, class_weight=None, dual=False,
fit_intercept=True, intercept_scaling=1, max_iter=100,
multi_class='ovr', penalty='l1', random_state=None,
solver='liblinear', tol=0.0001, verbose=0)
...
Finished in: 0:00:00.320212
Running LogisticRegression(C=0.0001, class_weight='auto', dual=False,
fit_intercept=True, intercept_scaling=1, max_iter=100,
multi_class='ovr', penalty='l1', random_state=None,
solver='liblinear', tol=0.0001, verbose=0)
...
Finished in: 0:00:00.320910
Running LogisticRegression(C=0.001, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='ovr',
penalty='l1', random_state=None, solver='liblinear', tol=0.0001,
verbose=0)
...
Finished in: 0:00:00.328651
Running LogisticRegression(C=0.001, class_weight='auto', dual=False,
fit_intercept=True, intercept_scaling=1, max_iter=100,
multi_class='ovr', penalty='l1', random_state=None,
solver='liblinear', tol=0.0001, verbose=0)
...
Finished in: 0:00:00.316168
Running LogisticRegression(C=0.01, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='ovr',
penalty='l1', random_state=None, solver='liblinear', tol=0.0001,
verbose=0)
...
Finished in: 0:00:00.330891
Running LogisticRegression(C=0.01, class_weight='auto', dual=False,
fit_intercept=True, intercept_scaling=1, max_iter=100,
multi_class='ovr', penalty='l1', random_state=None,
solver='liblinear', tol=0.0001, verbose=0)
...
Finished in: 0:00:00.318869
Running LogisticRegression(C=0.1, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='ovr',
penalty='l1', random_state=None, solver='liblinear', tol=0.0001,
verbose=0)
...
Finished in: 0:00:00.382091
Running LogisticRegression(C=0.1, class_weight='auto', dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='ovr',
penalty='l1', random_state=None, solver='liblinear', tol=0.0001,
verbose=0)
...
Finished in: 0:00:00.382712
Running LogisticRegression(C=1, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='ovr',
penalty='l1', random_state=None, solver='liblinear', tol=0.0001,
verbose=0)
...
Finished in: 0:00:00.396965
Running LogisticRegression(C=1, class_weight='auto', dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='ovr',
penalty='l1', random_state=None, solver='liblinear', tol=0.0001,
verbose=0)
...
Finished in: 0:00:00.402331
Running LogisticRegression(C=10, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='ovr',
penalty='l1', random_state=None, solver='liblinear', tol=0.0001,
verbose=0)
...
Finished in: 0:00:00.402694
Running LogisticRegression(C=10, class_weight='auto', dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='ovr',
penalty='l1', random_state=None, solver='liblinear', tol=0.0001,
verbose=0)
...
Finished in: 0:00:00.401536
Running LogisticRegression(C=1000.0, class_weight=None, dual=False,
fit_intercept=True, intercept_scaling=1, max_iter=100,
multi_class='ovr', penalty='l1', random_state=None,
solver='liblinear', tol=0.0001, verbose=0)
...
Finished in: 0:00:00.402182
Running LogisticRegression(C=1000.0, class_weight='auto', dual=False,
fit_intercept=True, intercept_scaling=1, max_iter=100,
multi_class='ovr', penalty='l1', random_state=None,
solver='liblinear', tol=0.0001, verbose=0)
...
Finished in: 0:00:00.400726
Running LogisticRegression(C=10000.0, class_weight=None, dual=False,
fit_intercept=True, intercept_scaling=1, max_iter=100,
multi_class='ovr', penalty='l1', random_state=None,
solver='liblinear', tol=0.0001, verbose=0)
...
Finished in: 0:00:00.399456
Running LogisticRegression(C=10000.0, class_weight='auto', dual=False,
fit_intercept=True, intercept_scaling=1, max_iter=100,
multi_class='ovr', penalty='l1', random_state=None,
solver='liblinear', tol=0.0001, verbose=0)
...
Finished in: 0:00:00.404000
Running LogisticRegression(C=1e+20, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='ovr',
penalty='l1', random_state=None, solver='liblinear', tol=0.0001,
verbose=0)
...
Finished in: 0:00:00.403667
Running LogisticRegression(C=1e+20, class_weight='auto', dual=False,
fit_intercept=True, intercept_scaling=1, max_iter=100,
multi_class='ovr', penalty='l1', random_state=None,
solver='liblinear', tol=0.0001, verbose=0)
...
Finished in: 0:00:00.402328
Running LogisticRegression(C=0.0001, class_weight=None, dual=False,
fit_intercept=True, intercept_scaling=1, max_iter=100,
multi_class='ovr', penalty='l2', random_state=None,
solver='liblinear', tol=0.0001, verbose=0)
...
Finished in: 0:00:00.364524
Running LogisticRegression(C=0.0001, class_weight='auto', dual=False,
fit_intercept=True, intercept_scaling=1, max_iter=100,
multi_class='ovr', penalty='l2', random_state=None,
solver='liblinear', tol=0.0001, verbose=0)
...
Finished in: 0:00:00.381151
Running LogisticRegression(C=0.001, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='ovr',
penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
verbose=0)
...
Finished in: 0:00:00.379761
Running LogisticRegression(C=0.001, class_weight='auto', dual=False,
fit_intercept=True, intercept_scaling=1, max_iter=100,
multi_class='ovr', penalty='l2', random_state=None,
solver='liblinear', tol=0.0001, verbose=0)
...
Finished in: 0:00:00.376176
Running LogisticRegression(C=0.01, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='ovr',
penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
verbose=0)
...
Finished in: 0:00:00.391615
Running LogisticRegression(C=0.01, class_weight='auto', dual=False,
fit_intercept=True, intercept_scaling=1, max_iter=100,
multi_class='ovr', penalty='l2', random_state=None,
solver='liblinear', tol=0.0001, verbose=0)
...
Finished in: 0:00:00.434814
Running LogisticRegression(C=0.1, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='ovr',
penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
verbose=0)
...
Finished in: 0:00:00.406865
Running LogisticRegression(C=0.1, class_weight='auto', dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='ovr',
penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
verbose=0)
...
Finished in: 0:00:00.414197
Running LogisticRegression(C=1, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='ovr',
penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
verbose=0)
...
Finished in: 0:00:00.426004
Running LogisticRegression(C=1, class_weight='auto', dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='ovr',
penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
verbose=0)
...
Finished in: 0:00:00.430902
Running LogisticRegression(C=10, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='ovr',
penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
verbose=0)
...
Finished in: 0:00:00.426705
Running LogisticRegression(C=10, class_weight='auto', dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='ovr',
penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
verbose=0)
...
Finished in: 0:00:00.432601
Running LogisticRegression(C=1000.0, class_weight=None, dual=False,
fit_intercept=True, intercept_scaling=1, max_iter=100,
multi_class='ovr', penalty='l2', random_state=None,
solver='liblinear', tol=0.0001, verbose=0)
...
Finished in: 0:00:00.421830
Running LogisticRegression(C=1000.0, class_weight='auto', dual=False,
fit_intercept=True, intercept_scaling=1, max_iter=100,
multi_class='ovr', penalty='l2', random_state=None,
solver='liblinear', tol=0.0001, verbose=0)
...
Finished in: 0:00:00.432508
Running LogisticRegression(C=10000.0, class_weight=None, dual=False,
fit_intercept=True, intercept_scaling=1, max_iter=100,
multi_class='ovr', penalty='l2', random_state=None,
solver='liblinear', tol=0.0001, verbose=0)
...
Finished in: 0:00:00.422352
Running LogisticRegression(C=10000.0, class_weight='auto', dual=False,
fit_intercept=True, intercept_scaling=1, max_iter=100,
multi_class='ovr', penalty='l2', random_state=None,
solver='liblinear', tol=0.0001, verbose=0)
...
Finished in: 0:00:00.444862
Running LogisticRegression(C=1e+20, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='ovr',
penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
verbose=0)
...
Finished in: 0:00:00.420858
Running LogisticRegression(C=1e+20, class_weight='auto', dual=False,
fit_intercept=True, intercept_scaling=1, max_iter=100,
multi_class='ovr', penalty='l2', random_state=None,
solver='liblinear', tol=0.0001, verbose=0)
...
Finished in: 0:00:00.437257
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=None, max_features=None, max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.306707
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=2, max_features=None, max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.533452
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=5, max_features=None, max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.001351
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=10, max_features=None, max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.822077
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=20, max_features=None, max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.129220
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=50, max_features=None, max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.495920
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=None, max_features='sqrt', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.030545
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=2, max_features='sqrt', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.863402
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=5, max_features='sqrt', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.906797
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=10, max_features='sqrt', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.898788
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=20, max_features='sqrt', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.183787
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=50, max_features='sqrt', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.024716
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=None, max_features='log2', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.062133
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=2, max_features='log2', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.826304
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=5, max_features='log2', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.919846
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=10, max_features='log2', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.937089
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=20, max_features='log2', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.170421
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=50, max_features='log2', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.247641
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=None, max_features=None, max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.081492
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=2, max_features=None, max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.878534
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=5, max_features=None, max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.062954
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=10, max_features=None, max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.838082
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=20, max_features=None, max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.025704
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=50, max_features=None, max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.149634
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=None, max_features='sqrt', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.116550
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=2, max_features='sqrt', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.656106
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=5, max_features='sqrt', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.865192
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=10, max_features='sqrt', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.839699
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=20, max_features='sqrt', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.079139
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=50, max_features='sqrt', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.076720
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=None, max_features='log2', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.091777
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=2, max_features='log2', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.560966
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=5, max_features='log2', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.870882
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=10, max_features='log2', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.927998
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=20, max_features='log2', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.063943
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=50, max_features='log2', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.078795
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=None, max_features=None, max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=5,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.209857
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=2, max_features=None, max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=5,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.535220
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=5, max_features=None, max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=5,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.005319
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=10, max_features=None, max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=5,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.833844
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=20, max_features=None, max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=5,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.315878
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=50, max_features=None, max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=5,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.364685
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=None, max_features='sqrt', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=5,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.029695
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=2, max_features='sqrt', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=5,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.659135
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=5, max_features='sqrt', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=5,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.884351
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=10, max_features='sqrt', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=5,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.851791
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=20, max_features='sqrt', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=5,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.072281
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=50, max_features='sqrt', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=5,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.034257
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=None, max_features='log2', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=5,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.061289
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=2, max_features='log2', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=5,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.781985
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=5, max_features='log2', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=5,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.910828
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=10, max_features='log2', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=5,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.042829
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=20, max_features='log2', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=5,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.022270
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=50, max_features='log2', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=5,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.021346
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=None, max_features=None, max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=5,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.005198
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=2, max_features=None, max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=5,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.772041
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=5, max_features=None, max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=5,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.961416
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=10, max_features=None, max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=5,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.824478
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=20, max_features=None, max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=5,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.195120
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=50, max_features=None, max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=5,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.972312
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=None, max_features='sqrt', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=5,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.010753
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=2, max_features='sqrt', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=5,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.886135
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=5, max_features='sqrt', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=5,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.939969
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=10, max_features='sqrt', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=5,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.856663
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=20, max_features='sqrt', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=5,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.002592
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=50, max_features='sqrt', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=5,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.066369
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=None, max_features='log2', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=5,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.102459
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=2, max_features='log2', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=5,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.839102
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=5, max_features='log2', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=5,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.844212
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=10, max_features='log2', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=5,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.896800
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=20, max_features='log2', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=5,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.012214
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=50, max_features='log2', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=5,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.070955
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=None, max_features=None, max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=10,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.271440
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=2, max_features=None, max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=10,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.827547
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=5, max_features=None, max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=10,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.956711
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=10, max_features=None, max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=10,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.861525
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=20, max_features=None, max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=10,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.919479
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=50, max_features=None, max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=10,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.378601
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=None, max_features='sqrt', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=10,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.028166
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=2, max_features='sqrt', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=10,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.767604
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=5, max_features='sqrt', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=10,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.053902
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=10, max_features='sqrt', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=10,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.932311
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=20, max_features='sqrt', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=10,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.036924
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=50, max_features='sqrt', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=10,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.017551
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=None, max_features='log2', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=10,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.099946
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=2, max_features='log2', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=10,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.774105
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=5, max_features='log2', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=10,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.863050
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=10, max_features='log2', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=10,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.826703
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=20, max_features='log2', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=10,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.002088
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
max_depth=50, max_features='log2', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=10,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.967269
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=None, max_features=None, max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=10,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.715545
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=2, max_features=None, max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=10,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.824188
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=5, max_features=None, max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=10,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.953321
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=10, max_features=None, max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=10,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.730658
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=20, max_features=None, max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=10,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.949908
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=50, max_features=None, max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=10,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.186746
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=None, max_features='sqrt', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=10,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.974667
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=2, max_features='sqrt', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=10,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.661516
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=5, max_features='sqrt', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=10,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.867236
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=10, max_features='sqrt', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=10,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.951479
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=20, max_features='sqrt', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=10,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.071011
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=50, max_features='sqrt', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=10,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.080103
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=None, max_features='log2', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=10,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.983088
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=2, max_features='log2', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=10,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.570611
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=5, max_features='log2', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=10,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.825060
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=10, max_features='log2', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=10,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.904813
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=20, max_features='log2', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=10,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:05.941646
Running RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=50, max_features='log2', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=10,
min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)
...
Finished in: 0:00:06.138915
Running AdaBoostClassifier(algorithm='SAMME.R', base_estimator=None,
learning_rate=0.1, n_estimators=100, random_state=None)
...
Finished in: 0:00:05.195547
Running AdaBoostClassifier(algorithm='SAMME.R', base_estimator=None,
learning_rate=0.5, n_estimators=100, random_state=None)
...
Finished in: 0:00:05.194237
Running AdaBoostClassifier(algorithm='SAMME.R', base_estimator=None, learning_rate=1,
n_estimators=100, random_state=None)
...
Finished in: 0:00:05.351599
Running AdaBoostClassifier(algorithm='SAMME.R', base_estimator=None, learning_rate=2,
n_estimators=100, random_state=None)
...
Finished in: 0:00:05.182008
Running AdaBoostClassifier(algorithm='SAMME.R', base_estimator=None, learning_rate=5,
n_estimators=100, random_state=None)
...
Finished in: 0:00:04.435360
Running GaussianNB()
...
Finished in: 0:00:00.362537
Running BernoulliNB(alpha=1.0, binarize=0.0, class_prior=None, fit_prior=True)
...
Finished in: 0:00:00.370889
Running LDA(n_components=None, priors=None, shrinkage=None, solver='svd',
store_covariance=False, tol=0.0001)
...
Finished in: 0:00:00.401709
Running QDA(priors=None, reg_param=0.0)
...
Finished in: 0:00:00.408922
machine_learner: finished running models
machine_learner: pickle files available in yay_pkls/
machine_learner: total runtime was 0:11:25.637773
/opt/anaconda/lib/python2.7/site-packages/sklearn/metrics/classification.py:958: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples.
'precision', 'predicted', average, warn_for)
In [7]:
evaluation_df = evaluation.dict_to_dataframe(eval_dct, pkl_dct)
evaluation_df.head()
Out[7]:
(shown transposed: one column per model; the column labels are defined below the table)

metric                          RF126      RF102     Ada148       LR29      RF107
avg_prec_score_mean          0.227689   0.228491   0.349449   0.269039   0.226853
avg_prec_score_std           0.037991   0.036778   0.121060   0.036457   0.034828
roc_auc_mean                 0.564351   0.563230   0.634610   0.594393   0.563092
roc_auc_std                  0.035016   0.036368   0.085664   0.032791   0.036765
avg_prec_0.05 mean           0.042793   0.046269   0.000000   0.071361   0.045904
avg_prec_0.05 std            0.026549   0.030128   0.000000   0.032236   0.027237
precision at 0.05 mean       0.280492   0.301346   0.000000   0.381378   0.306021
precision at 0.05 std        0.096231   0.128485   0.000000   0.097234   0.124561
recall at 0.05 mean          0.092683   0.083150   0.000000   0.120357   0.085998
recall at 0.05 std           0.033848   0.034220   0.000000   0.033885   0.034483
...                               ...        ...        ...        ...        ...
avg_prec_0.3 std             0.038559   0.037046   0.000000   0.040510   0.033072
precision at 0.3 mean        0.215661   0.211890   0.000000   0.226779   0.211291
precision at 0.3 std         0.030034   0.024902   0.000000   0.023373   0.027636
recall at 0.3 mean           0.421209   0.435476   0.000000   0.430751   0.435485
recall at 0.3 std            0.052806   0.069877   0.000000   0.063380   0.078397
test_count at 0.3 mean          205.5      215.5        0.0      198.6      215.8
test_count at 0.3 std       16.641648  24.636242   0.000000  16.153431  24.956852
test_percent at 0.3 mean     0.318261   0.333738   0.000000   0.307571   0.334204
test_percent at 0.3 std      0.025804   0.038073   0.000000   0.025008   0.038582

RF126 (pickle_file: yay_pkls/RandomForestClassifier126.pkl)
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini', max_depth=None, max_features=None, max_leaf_nodes=None, min_samples_leaf=1, min_samples_split=10, min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1, oob_score=False, random_state=None, verbose=0, warm_start=False)
RF102 (pickle_file: yay_pkls/RandomForestClassifier102.pkl)
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini', max_depth=None, max_features='log2', max_leaf_nodes=None, min_samples_leaf=1, min_samples_split=5, min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1, oob_score=False, random_state=None, verbose=0, warm_start=False)
Ada148 (pickle_file: yay_pkls/AdaBoostClassifier148.pkl)
AdaBoostClassifier(algorithm='SAMME.R', base_estimator=None, learning_rate=5, n_estimators=100, random_state=None)
LR29 (pickle_file: yay_pkls/LogisticRegression29.pkl)
LogisticRegression(C=10, class_weight='auto', dual=False, fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='ovr', penalty='l2', random_state=None, solver='liblinear', tol=0.0001, verbose=0)
RF107 (pickle_file: yay_pkls/RandomForestClassifier107.pkl)
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini', max_depth=50, max_features='log2', max_leaf_nodes=None, min_samples_leaf=1, min_samples_split=5, min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1, oob_score=False, random_state=None, verbose=0, warm_start=False)

5 rows × 65 columns
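Every metric in the table above comes as a mean/std pair, which suggests each score is aggregated over several cross-validation folds. The snippet below is only a sketch of that kind of aggregation, not the code inside dict_to_dataframe(): the function name summarize_folds, the layout of its input, and the use of the sample standard deviation are all assumptions made for illustration.

```python
from statistics import mean, stdev


def summarize_folds(fold_scores):
    """Collapse per-fold scores into '<metric>_mean' and '<metric>_std' entries.

    fold_scores maps a metric name to a list holding one score per fold.
    (Hypothetical input layout; babysaver's internal structure may differ.)
    """
    row = {}
    for metric, scores in fold_scores.items():
        row[metric + "_mean"] = mean(scores)
        # Sample standard deviation (an assumption); needs at least two folds.
        row[metric + "_std"] = stdev(scores) if len(scores) > 1 else 0.0
    return row


# Made-up per-fold scores for a single classifier.
example = {"roc_auc": [0.55, 0.60, 0.65], "avg_prec_score": [0.20, 0.25, 0.30]}
summary = summarize_folds(example)
print(summary)
```

A large std relative to its mean, as in the AdaBoost row, is a hint that a ranking based on the mean alone may not be stable across folds.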
Now we can sort the dataframe by a particular metric to see which classifier did the best on that metric. For example, we might be interested in precision at 10%.
In [8]:
print(evaluation_df.columns.values)
# On pandas >= 0.17 use evaluation_df.sort_values(...) here; DataFrame.sort was later removed.
sorted_df = evaluation_df.sort('precision at 0.1 mean', ascending=False)
sorted_df[['precision at 0.1 mean', 'precision at 0.1 std', 'roc_auc_mean',
           'roc_auc_std', 'test_percent at 0.1 mean', 'pickle_file']].head()
['avg_prec_score_mean' 'avg_prec_score_std' 'roc_auc_mean' 'roc_auc_std'
'avg_prec_0.05 mean' 'avg_prec_0.05 std' 'precision at 0.05 mean'
'precision at 0.05 std' 'recall at 0.05 mean' 'recall at 0.05 std'
'test_count at 0.05 mean' 'test_count at 0.05 std'
'test_percent at 0.05 mean' 'test_percent at 0.05 std' 'avg_prec_0.1 mean'
'avg_prec_0.1 std' 'precision at 0.1 mean' 'precision at 0.1 std'
'recall at 0.1 mean' 'recall at 0.1 std' 'test_count at 0.1 mean'
'test_count at 0.1 std' 'test_percent at 0.1 mean'
'test_percent at 0.1 std' 'avg_prec_0.15 mean' 'avg_prec_0.15 std'
'precision at 0.15 mean' 'precision at 0.15 std' 'recall at 0.15 mean'
'recall at 0.15 std' 'test_count at 0.15 mean' 'test_count at 0.15 std'
'test_percent at 0.15 mean' 'test_percent at 0.15 std' 'avg_prec_0.2 mean'
'avg_prec_0.2 std' 'precision at 0.2 mean' 'precision at 0.2 std'
'recall at 0.2 mean' 'recall at 0.2 std' 'test_count at 0.2 mean'
'test_count at 0.2 std' 'test_percent at 0.2 mean'
'test_percent at 0.2 std' 'avg_prec_0.25 mean' 'avg_prec_0.25 std'
'precision at 0.25 mean' 'precision at 0.25 std' 'recall at 0.25 mean'
'recall at 0.25 std' 'test_count at 0.25 mean' 'test_count at 0.25 std'
'test_percent at 0.25 mean' 'test_percent at 0.25 std' 'avg_prec_0.3 mean'
'avg_prec_0.3 std' 'precision at 0.3 mean' 'precision at 0.3 std'
'recall at 0.3 mean' 'recall at 0.3 std' 'test_count at 0.3 mean'
'test_count at 0.3 std' 'test_percent at 0.3 mean'
'test_percent at 0.3 std' 'pickle_file']
Out[8]:
(shown transposed: one column per model, best first; the column labels are defined below the table)

metric                            LR9       LR26       LR27        LR8       LR10
precision at 0.1 mean        0.319557   0.318762   0.317870   0.316662   0.316258
precision at 0.1 std         0.065447   0.068244   0.065527   0.066351   0.068131
roc_auc_mean                 0.594181   0.590139   0.593924   0.592775   0.590239
roc_auc_std                  0.032089   0.032635   0.033104   0.033538   0.032488
test_percent at 0.1 mean     0.099116   0.099116   0.099581   0.099735   0.098806

LR9 (pickle_file: yay_pkls/LogisticRegression9.pkl)
LogisticRegression(C=1, class_weight='auto', dual=False, fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='ovr', penalty='l1', random_state=None, solver='liblinear', tol=0.0001, verbose=0)
LR26 (pickle_file: yay_pkls/LogisticRegression26.pkl)
LogisticRegression(C=1, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='ovr', penalty='l2', random_state=None, solver='liblinear', tol=0.0001, verbose=0)
LR27 (pickle_file: yay_pkls/LogisticRegression27.pkl)
LogisticRegression(C=1, class_weight='auto', dual=False, fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='ovr', penalty='l2', random_state=None, solver='liblinear', tol=0.0001, verbose=0)
LR8 (pickle_file: yay_pkls/LogisticRegression8.pkl)
LogisticRegression(C=1, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='ovr', penalty='l1', random_state=None, solver='liblinear', tol=0.0001, verbose=0)
LR10 (pickle_file: yay_pkls/LogisticRegression10.pkl)
LogisticRegression(C=10, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='ovr', penalty='l1', random_state=None, solver='liblinear', tol=0.0001, verbose=0)
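To make the "precision at 0.1" columns concrete: we read them as the precision among the top 10% of cases ranked by predicted score, which is consistent with 'test_percent at 0.1 mean' sitting just under 0.1. The function below is a minimal, self-contained sketch of that reading. Its name and the toy data are hypothetical, and babysaver's own evaluation code may handle ties, rounding, and thresholds differently.

```python
def precision_recall_at_fraction(y_true, scores, fraction):
    """Flag the top `fraction` of cases by score and return (precision, recall)."""
    n_flagged = int(round(fraction * len(scores)))
    if n_flagged == 0:
        # Nobody is flagged, so precision is undefined; report 0.0 for both.
        return 0.0, 0.0
    # Case indices ordered from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = order[:n_flagged]
    true_positives = sum(y_true[i] for i in flagged)
    total_positives = sum(y_true)
    precision = true_positives / float(n_flagged)
    recall = true_positives / float(total_positives) if total_positives else 0.0
    return precision, recall


# Toy labels (1 = adverse outcome) and scores, invented for illustration.
y_true = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
precision, recall = precision_recall_at_fraction(y_true, scores, 0.2)
print(precision, recall)
```

The zero-flagged branch mirrors the situation behind the UndefinedMetricWarning in the log: when a model predicts no positive cases, precision has no denominator and is reported as 0.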
If we want to load the best model back into memory, we can do that using joblib. This is especially useful when running a model in the backend. That wraps up the babysaver module!
In [9]:
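from sklearn.externals import joblib  # on scikit-learn >= 0.23, use `import joblib` instead
# sorted_df is indexed by the classifier repr, so pick the top row by position
best_clf = joblib.load(sorted_df['pickle_file'].iloc[0])
best_clf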
Out[9]:
LogisticRegression(C=1, class_weight='auto', dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='ovr',
penalty='l1', random_state=None, solver='liblinear', tol=0.0001,
verbose=0)
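For readers who have not used joblib before, here is a small round trip showing the dump/load pattern that the pickle files in yay_pkls/ rely on. It is a sketch under stated assumptions: it imports the standalone joblib package rather than sklearn.externals, uses a plain dictionary as a stand-in for a fitted classifier, and writes to a temporary directory with a made-up file name.

```python
import os
import tempfile

import joblib  # standalone package; assumed to be installed

# Stand-in for a fitted classifier; any picklable Python object behaves the same way.
model_like = {"name": "LogisticRegression", "C": 1, "penalty": "l1"}

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "model.pkl")  # hypothetical file name

joblib.dump(model_like, path)   # serialize the object to disk
restored = joblib.load(path)    # read it back into memory
print(restored)
```

As with any pickle-based format, a file should only be loaded if it comes from a trusted source, and ideally with the same library versions that wrote it.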
Content source: dssg/babies-public