Reading the file toto2.csv
Read csv file: toto2.csv
args: {'encoding': 'utf-8-sig', 'sep': ',', 'decimal': ',', 'engine': 'python', 'filepath_or_buffer': 'toto2.csv', 'thousands': '.', 'parse_dates': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'predict'], 'infer_datetime_format': True}
Initial dtypes: a float64
b float64
c float64
d float64
e float64
f float64
g float64
h float64
predict float64
dtype: object
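For reference, a minimal sketch of how the argument dict logged above maps onto pandas.read_csv; the variable name df and the prints are illustrative, and with a recent pandas the infer_datetime_format option is deprecated and ignored.

```python
import pandas as pd

# European-style numbers (',' as decimal mark, '.' as thousands separator),
# a possible UTF-8 BOM, and an attempt to parse every column as a date;
# columns that cannot be parsed as dates keep their inferred numeric dtype.
df = pd.read_csv(
    'toto2.csv',
    encoding='utf-8-sig',
    sep=',',
    decimal=',',
    thousands='.',
    engine='python',
    parse_dates=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'predict'],
    infer_datetime_format=True,
)
print(df.dtypes)   # here all nine columns come back as float64
print(df.shape)    # (10000, 9)
```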
Work on PolynomialFeatures: degree 1
Optimal number of clusters
(10000, 9)
Polynomial Features: generate a new feature matrix
consisting of all polynomial combinations of the features.
For 2 features [a, b]:
the degree 1 polynomial gives [a, b]
the degree 2 polynomial gives [1, a, b, a^2, ab, b^2]
...
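As a small, self-contained illustration of that expansion with scikit-learn's PolynomialFeatures (the sample values are made up):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two features [a, b] = [2, 3]; degree 2 yields [1, a, b, a^2, ab, b^2].
X = np.array([[2.0, 3.0]])
poly = PolynomialFeatures(degree=2)   # include_bias=True adds the leading 1
print(poly.fit_transform(X))          # [[1. 2. 3. 4. 6. 9.]]
```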
ELBOW: explains the variance as a function of the number of clusters.
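The elbow curve itself is not part of this log; a minimal sketch of how such a curve is usually built from k-means inertia, on synthetic data:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Fit k-means for k = 1..10 and record the within-cluster sum of squares
# (inertia_); the "elbow" where the curve stops dropping sharply is a
# reasonable guess for the number of clusters.
X, _ = make_blobs(n_samples=1000, centers=4, random_state=0)
for k in range(1, 11):
    inertia = KMeans(n_clusters=k, random_state=0).fit(X).inertia_
    print(k, round(inertia, 1))
```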
OOB: the average error for each training observation, calculated using
only the trees that did not contain that observation when they
were built.
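A minimal sketch of the out-of-bag idea with a random forest on synthetic data (the estimator and data here are illustrative, not the tool's):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Each tree sees a bootstrap sample, so every observation is missing from
# roughly a third of the trees; predicting it with only those trees gives a
# held-out-like (out-of-bag) error estimate without a separate validation set.
X, y = make_regression(n_samples=1000, n_features=8, noise=0.1, random_state=0)
forest = RandomForestRegressor(n_estimators=100, oob_score=True, random_state=0)
forest.fit(X, y)
print(forest.oob_score_)   # R^2 computed from the out-of-bag predictions
```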
Estimator DecisionTreeRegressor
Decision Tree Regressor: poses a series of carefully crafted questions
about the attributes of the test record, even with additional noisy observations.
Fitting 3 folds for each of 10 candidates, totalling 30 fits
[Parallel(n_jobs=1)]: Done 30 out of 30 | elapsed: 16.7s finished
Best params => {'min_samples_split': 4, 'min_samples_leaf': 4, 'max_depth': 10, 'criterion': 'mse'}
Best Score => 0.974
Check the decision tree: 2017-08-1813:12:56.922924.png
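The search behind those 30 fits is not shown in the log; below is a sketch of an equivalent randomized search and of the graphviz export that would produce such a PNG. The parameter grid and data are illustrative, and the log's 'mse'/'mae' criteria are named 'squared_error'/'absolute_error' in recent scikit-learn.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeRegressor, export_graphviz

# Synthetic stand-in for the (10000, 9) feature matrix and target.
X, y = make_regression(n_samples=10000, n_features=9, noise=0.1, random_state=0)

# 10 random candidates x 3 folds = the "30 fits" reported above.
param_dist = {
    'criterion': ['squared_error', 'absolute_error'],   # 'mse'/'mae' in older releases
    'max_depth': list(range(4, 12)),
    'min_samples_split': list(range(2, 11)),
    'min_samples_leaf': list(range(1, 11)),
}
search = RandomizedSearchCV(DecisionTreeRegressor(random_state=0), param_dist,
                            n_iter=10, cv=3, verbose=1, random_state=0)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))   # R^2, as in "Best Score"

# A PNG like the one referenced above is typically produced by dumping the
# fitted tree to DOT and rendering it with graphviz.
export_graphviz(search.best_estimator_, out_file='tree.dot')
```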
Estimator ExtraTreesRegressor
dot: graph is too large for cairo-renderer bitmaps. Scaling by 0.303036 to fit
ExtraTreesRegressor: as in random forests, a random subset of candidate
features is used, but instead of looking for the most discriminative
thresholds, thresholds are drawn at random for each candidate feature and
the best of these randomly-generated thresholds is picked as
the splitting rule.
Fitting 3 folds for each of 10 candidates, totalling 30 fits
[Parallel(n_jobs=1)]: Done 30 out of 30 | elapsed: 8.4s finished
Best params => {'n_estimators': 75, 'min_samples_split': 7, 'min_samples_leaf': 3, 'max_features': 0.8, 'bootstrap': True}
Best Score => 0.990
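Because the thresholds are drawn at random, extra-trees are cheap to grow and rely on averaging many of them. A sketch that refits the best degree-1 parameters reported above on synthetic data, scored with the same 3-fold R^2:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import cross_val_score

# Best degree-1 parameters from the log; the data is synthetic and only
# illustrative, so the score will not match the 0.990 above.
X, y = make_regression(n_samples=10000, n_features=9, noise=0.1, random_state=0)
model = ExtraTreesRegressor(n_estimators=75, min_samples_split=7,
                            min_samples_leaf=3, max_features=0.8,
                            bootstrap=True, random_state=0)
print(cross_val_score(model, X, y, cv=3).mean())
```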
Estimator ElasticNetCV
ElasticNetCV: linear regression with combined
L1 (lasso penalty) and L2 (ridge penalty) priors as regularizer.
Fitting 3 folds for each of 10 candidates, totalling 30 fits
[Parallel(n_jobs=1)]: Done 30 out of 30 | elapsed: 1.9s finished
Best params => {'tol': 0.1, 'l1_ratio': 0.9}
Best Score => 1.000
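A sketch of what those tuned parameters mean for ElasticNetCV on synthetic data: l1_ratio is the L1/L2 mixing weight (1.0 is pure lasso, 0.0 is pure ridge), and the regularization strength alpha is picked by cross-validation.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV

# Candidate l1_ratio values and a coarse tolerance, mirroring the kind of
# settings tuned above; data and values are illustrative.
X, y = make_regression(n_samples=10000, n_features=9, noise=0.1, random_state=0)
model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=3, tol=0.1)
model.fit(X, y)
print(model.l1_ratio_, model.alpha_, round(model.score(X, y), 3))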
Estimator LassoLarsCV
LassoLarsCV: performs L1 regularization; it adds the sum of the
absolute values of the coefficients to the optimization objective.
Useful with many features, as it performs some feature selection.
Fitting 3 folds for each of 10 candidates, totalling 30 fits
Fitting 3 folds for each of 2 candidates, totalling 6 fits
[Parallel(n_jobs=1)]: Done 6 out of 6 | elapsed: 0.1s finished
Best params => {'normalize': True}
Best Score => 1.000
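A minimal LassoLarsCV sketch on synthetic data showing the feature-selection effect of the L1 penalty; the normalize option tuned above has since been deprecated and removed from scikit-learn, so it is omitted here.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoLarsCV

# Only 4 of the 9 features carry signal; the L1 penalty drives the useless
# coefficients to exactly zero, which is why lasso doubles as feature selection.
X, y = make_regression(n_samples=10000, n_features=9, n_informative=4,
                       noise=0.1, random_state=0)
model = LassoLarsCV(cv=3)
model.fit(X, y)
print(model.alpha_)               # penalty strength chosen by cross-validation
print((model.coef_ != 0).sum())   # number of features kept
```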
Estimator RidgeCV
RidgeCV: performs L2 regularization; it adds the sum of the squares
of the coefficients to the optimization objective.
Useful with highly correlated features.
Fitting 3 folds for each of 10 candidates, totalling 30 fits
Fitting 3 folds for each of 1 candidates, totalling 3 fits
[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 0.0s finished
Best params => {}
Best Score => 1.000
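A minimal RidgeCV sketch with two nearly collinear features; the alphas are the same default candidates (0.1, 1.0, 10.0) visible in the results table further down. Data and values are illustrative.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Ridge keeps every coefficient but shrinks them, which stabilises the fit
# when features are highly correlated (here x2 is almost a copy of x1).
rng = np.random.RandomState(0)
x1 = rng.normal(size=1000)
x2 = x1 + rng.normal(scale=0.01, size=1000)
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.1, size=1000)
model = RidgeCV(alphas=(0.1, 1.0, 10.0)).fit(X, y)
print(model.alpha_)   # penalty picked by (generalized) cross-validation
print(model.coef_)    # the 3.0 effect is shared across the two collinear columns
```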
Estimator XGBRegressor
Gradient boosting is an approach where new models are created that predict
the residuals or errors of prior models and then added together to make
the final prediction. It is called gradient boosting because it uses a
gradient descent algorithm to minimize the loss when adding new models.
Fitting 3 folds for each of 10 candidates, totalling 30 fits
[Parallel(n_jobs=1)]: Done 30 out of 30 | elapsed: 14.4s finished
Best params => {'subsample': 0.8, 'n_estimators': 75, 'min_child_weight': 6, 'max_depth': 8, 'learning_rate': 0.1}
Best Score => 0.996
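A sketch that refits XGBRegressor with the best degree-1 parameters reported above, on synthetic data, using the same 3-fold R^2 evaluation:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

# Gradient boosting: each new tree is fit to the residuals of the current
# ensemble, and learning_rate scales how much of that correction is added.
# Parameters are the best degree-1 values from the log; the data is synthetic,
# so the score will not match 0.996.
X, y = make_regression(n_samples=10000, n_features=9, noise=0.1, random_state=0)
model = XGBRegressor(n_estimators=75, max_depth=8, learning_rate=0.1,
                     subsample=0.8, min_child_weight=6, random_state=0)
print(cross_val_score(model, X, y, cv=3).mean())
```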
Work on PolynomialFeatures: degree 2
Optimal number of clusters
Polynomial Features: generate a new feature matrix
consisting of all polynomial combinations of the features.
For 2 features [a, b]:
the degree 1 polynomial gives [a, b]
the degree 2 polynomial gives [1, a, b, a^2, ab, b^2]
...
ELBOW: explains the variance as a function of the number of clusters.
OOB: the average error for each training observation, calculated using
only the trees that did not contain that observation when they
were built.
Estimator DecisionTreeRegressor
Decision Tree Regressor: poses a series of carefully crafted questions
about the attributes of the test record, even with additional noisy observations.
Fitting 3 folds for each of 10 candidates, totalling 30 fits
[Parallel(n_jobs=1)]: Done 30 out of 30 | elapsed: 1.8min finished
Best params => {'min_samples_split': 8, 'min_samples_leaf': 7, 'max_depth': 10, 'criterion': 'mae'}
Best Score => 0.967
Check the decision tree: 2017-08-1813:18:33.319946.png
Estimator ExtraTreesRegressor
dot: graph is too large for cairo-renderer bitmaps. Scaling by 0.570446 to fit
ExtraTreesRegressor: as in random forests, a random subset of candidate
features is used, but instead of looking for the most discriminative
thresholds, thresholds are drawn at random for each candidate feature and
the best of these randomly-generated thresholds is picked as
the splitting rule.
Fitting 3 folds for each of 10 candidates, totalling 30 fits
[Parallel(n_jobs=1)]: Done 30 out of 30 | elapsed: 7.5s finished
Best params => {'n_estimators': 10, 'min_samples_split': 10, 'min_samples_leaf': 4, 'max_features': 0.7, 'bootstrap': False}
Best Score => 0.991
Estimator ElasticNetCV
ElasticNetCV: linear regression with combined
L1 (lasso penalty) and L2 (ridge penalty) priors as regularizer.
Fitting 3 folds for each of 10 candidates, totalling 30 fits
[Parallel(n_jobs=1)]: Done 30 out of 30 | elapsed: 3.3s finished
Best params => {'tol': 0.5, 'l1_ratio': 0.9}
Best Score => 1.000
Estimator LassoLarsCV
LassoLarsCV: performs L1 regularization; it adds the sum of the
absolute values of the coefficients to the optimization objective.
Useful with many features, as it performs some feature selection.
Fitting 3 folds for each of 10 candidates, totalling 30 fits
Fitting 3 folds for each of 2 candidates, totalling 6 fits
[Parallel(n_jobs=1)]: Done 6 out of 6 | elapsed: 0.1s finished
Best params => {'normalize': True}
Best Score => 1.000
Estimator RidgeCV
RidgeCV: performs L2 regularization; it adds the sum of the squares
of the coefficients to the optimization objective.
Useful with highly correlated features.
Fitting 3 folds for each of 10 candidates, totalling 30 fits
Fitting 3 folds for each of 1 candidates, totalling 3 fits
[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 0.1s finished
Best params => {}
Best Score => 1.000
Estimator XGBRegressor
Gradient boosting is an approach where new models are created that predict
the residuals or errors of prior models and then added together to make
the final prediction. It is called gradient boosting because it uses a
gradient descent algorithm to minimize the loss when adding new models.
Fitting 3 folds for each of 10 candidates, totalling 30 fits
[Parallel(n_jobs=1)]: Done 30 out of 30 | elapsed: 25.0s finished
Best params => {'subsample': 0.1, 'n_estimators': 100, 'min_child_weight': 10, 'max_depth': 5, 'learning_rate': 0.1}
Best Score => 0.992
Estimator Score Degree
0 LassoLarsCV(copy_X=True, cv=None, eps=2.220446... 1.000000 1
1 LassoLarsCV(copy_X=True, cv=None, eps=2.220446... 1.000000 2
2 RidgeCV(alphas=(0.1, 1.0, 10.0), cv=None, fit_... 1.000000 1
3 RidgeCV(alphas=(0.1, 1.0, 10.0), cv=None, fit_... 1.000000 2
4 ElasticNetCV(alphas=None, copy_X=True, cv=None... 0.999896 1
5 ElasticNetCV(alphas=None, copy_X=True, cv=None... 0.999896 2
6 XGBRegressor(base_score=0.5, colsample_bylevel... 0.996338 1
7 XGBRegressor(base_score=0.5, colsample_bylevel... 0.992464 2
8 (ExtraTreeRegressor(criterion='mse', max_depth... 0.990852 2
9 (ExtraTreeRegressor(criterion='mse', max_depth... 0.990334 1
10 DecisionTreeRegressor(criterion='mse', max_dep... 0.974015 1
11 DecisionTreeRegressor(criterion='mae', max_dep... 0.967276 2
Stacking: a model ensembling technique used to combine information
from multiple predictive models to generate a new model.
task: [regression]
metric: [r2_score]
model 0: [LassoLarsCV]
----
MEAN: [1.00000000]
model 1: [RidgeCV]
----
MEAN: [1.00000000]
model 2: [ElasticNetCV]
----
MEAN: [0.99989605]
model 3: [XGBRegressor]
----
MEAN: [0.99643045]
model 4: [ExtraTreesRegressor]
----
MEAN: [0.99009462]
model 5: [DecisionTreeRegressor]
----
MEAN: [0.97408961]
Stacking 6 models: 100%|██████████| 63/63 [00:41<00:00, 1.33it/s]
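The stacking implementation itself is not shown in the log; below is a generic out-of-fold stacking sketch with scikit-learn, where each base model's out-of-fold predictions become the features of a second-level model. The base models and data are illustrative, not the six tuned models above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, RidgeCV
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeRegressor

# Out-of-fold stacking: each base model predicts every training row using only
# folds it was not trained on, and those predictions feed a second-level model.
X, y = make_regression(n_samples=10000, n_features=9, noise=0.1, random_state=0)
base_models = [RidgeCV(), DecisionTreeRegressor(max_depth=10, random_state=0)]
level_two_features = np.column_stack([
    cross_val_predict(m, X, y, cv=3) for m in base_models
])
meta_model = LinearRegression().fit(level_two_features, y)
print(meta_model.score(level_two_features, y))   # R^2 of the stacked model
```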