Machine Learning with H2O - Tutorial 3b: Regression Models (Grid Search)


Objective:

  • This tutorial explains how to fine-tune regression models for better out-of-sample performance, evaluated on a held-out test set.

Wine Quality Dataset:

  • White wine quality data from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Wine+Quality): 4,898 wines described by 11 physicochemical features, with a quality score as the regression target.

Steps:

  1. GBM with default settings
  2. GBM with manual settings
  3. GBM with manual settings & cross-validation
  4. GBM with manual settings, cross-validation and early stopping
  5. GBM with cross-validation, early stopping and full grid search
  6. GBM with cross-validation, early stopping and random grid search
  7. Model stacking (combining different GLM, DRF, GBM and DNN models)

Full Technical Reference:

  • H2O Python Module Documentation: http://docs.h2o.ai/h2o/latest-stable/h2o-py/docs/index.html

In [1]:
# Start and connect to a local H2O cluster
import h2o
h2o.init(nthreads = -1)


Checking whether there is an H2O instance running at http://localhost:54321..... not found.
Attempting to start a local H2O server...
  Java Version: openjdk version "1.8.0_131"; OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-0ubuntu1.16.04.2-b11); OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
  Starting server from /home/joe/anaconda3/lib/python3.6/site-packages/h2o/backend/bin/h2o.jar
  Ice root: /tmp/tmp68uwhnzg
  JVM stdout: /tmp/tmp68uwhnzg/h2o_joe_started_from_python.out
  JVM stderr: /tmp/tmp68uwhnzg/h2o_joe_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321... successful.
H2O cluster uptime: 01 secs
H2O cluster version: 3.10.5.2
H2O cluster version age: 10 days
H2O cluster name: H2O_from_python_joe_wncaln
H2O cluster total nodes: 1
H2O cluster free memory: 5.210 Gb
H2O cluster total cores: 8
H2O cluster allowed cores: 8
H2O cluster status: accepting new members, healthy
H2O connection url: http://127.0.0.1:54321
H2O connection proxy: None
H2O internal security: False
Python version: 3.6.1 final


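If memory is tight on a shared machine, h2o.init() also accepts a max_mem_size argument to cap the JVM heap; a minimal sketch, assuming roughly 4 GB is free (if a cluster is already running, init simply connects to it and these settings are ignored):

In [ ]:
# Optional: start H2O with an explicit memory cap
h2o.init(nthreads = -1, max_mem_size = "4G")
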

In [2]:
# Import wine quality data from a local CSV file
wine = h2o.import_file("winequality-white.csv")
wine.head(5)


Parse progress: |█████████████████████████████████████████████████████████| 100%
fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  free sulfur dioxide  total sulfur dioxide  density  pH    sulphates  alcohol  quality
7              0.27              0.36         20.7            0.045      45                   170                   1.001    3     0.45       8.8      6
6.3            0.3               0.34         1.6             0.049      14                   132                   0.994    3.3   0.49       9.5      6
8.1            0.28              0.4          6.9             0.05       30                   97                    0.9951   3.26  0.44       10.1     6
7.2            0.23              0.32         8.5             0.058      47                   186                   0.9956   3.19  0.4        9.9      6
7.2            0.23              0.32         8.5             0.058      47                   186                   0.9956   3.19  0.4        9.9      6
Out[2]:

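Before modelling, it is worth confirming that every column (including the target 'quality') parsed as numeric. The frame's describe() method prints column types and summary statistics:

In [ ]:
# Confirm column types and summary statistics
wine.describe()
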

In [3]:
# Define features (or predictors)
features = list(wine.columns) # we want to use all the information
features.remove('quality')    # we need to exclude the target 'quality' (otherwise there is nothing to predict)
features


Out[3]:
['fixed acidity',
 'volatile acidity',
 'citric acid',
 'residual sugar',
 'chlorides',
 'free sulfur dioxide',
 'total sulfur dioxide',
 'density',
 'pH',
 'sulphates',
 'alcohol']

In [4]:
# Split the H2O data frame into training/test sets
# so we can evaluate out-of-sample performance
wine_split = wine.split_frame(ratios = [0.8], seed = 1234)

wine_train = wine_split[0] # using 80% for training
wine_test = wine_split[1]  # using the remaining 20% for out-of-sample evaluation

In [5]:
wine_train.shape


Out[5]:
(3932, 12)

In [6]:
wine_test.shape


Out[6]:
(966, 12)

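Note that split_frame() assigns rows probabilistically, so the split is approximately (not exactly) 80/20; here 3,932 + 966 = 4,898 rows, the full dataset. A quick sanity check:

In [ ]:
# Sanity check: the two splits should cover the full dataset
print(wine.nrow, wine_train.nrow + wine_test.nrow)  # both 4898
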

Step 1 - Gradient Boosting Machines (GBM) with Default Settings


In [7]:
# Build a Gradient Boosting Machines (GBM) model with default settings

# Import the function for GBM
from h2o.estimators.gbm import H2OGradientBoostingEstimator

# Set up GBM for regression
# Add a seed for reproducibility
gbm_default = H2OGradientBoostingEstimator(model_id = 'gbm_default', 
                                           seed = 1234)

# Use .train() to build the model
gbm_default.train(x = features, 
                  y = 'quality', 
                  training_frame = wine_train)


gbm Model Build progress: |███████████████████████████████████████████████| 100%

In [8]:
# Check the model performance on test dataset
gbm_default.model_performance(wine_test)


ModelMetricsRegression: gbm
** Reported on test data. **

MSE: 0.45511211588709155
RMSE: 0.6746199788674299
MAE: 0.5219768028633305
RMSLE: 0.10013755931021842
Mean Residual Deviance: 0.45511211588709155
Out[8]:

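This model used H2O's default GBM settings (at the time of writing, 50 trees of depth 5 with a learn rate of 0.1). You can confirm what was actually built from the model summary:

In [ ]:
# Inspect the structure the default model actually built
gbm_default.summary()
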

Step 2 - GBM with Manual Settings


In [9]:
# Build a GBM with manual settings

# Set up GBM for regression
# Add a seed for reproducibility
gbm_manual = H2OGradientBoostingEstimator(model_id = 'gbm_manual', 
                                          seed = 1234,
                                          ntrees = 100,
                                          sample_rate = 0.9,
                                          col_sample_rate = 0.9)

# Use .train() to build the model
gbm_manual.train(x = features, 
                 y = 'quality', 
                 training_frame = wine_train)


gbm Model Build progress: |███████████████████████████████████████████████| 100%

In [10]:
# Check the model performance on test dataset
gbm_manual.model_performance(wine_test)


ModelMetricsRegression: gbm
** Reported on test data. **

MSE: 0.44325665649714924
RMSE: 0.6657752297113112
MAE: 0.5114358481376113
RMSLE: 0.09895809708429235
Mean Residual Deviance: 0.44325665649714924
Out[10]:


Step 3 - GBM with Manual Settings & Cross-Validation (CV)


In [11]:
# Build a GBM with manual settings & cross-validation

# Set up GBM for regression
# Add a seed for reproducibility
gbm_manual_cv = H2OGradientBoostingEstimator(model_id = 'gbm_manual_cv', 
                                             seed = 1234,
                                             ntrees = 100,
                                             sample_rate = 0.9,
                                             col_sample_rate = 0.9,
                                             nfolds = 5)
                                            
# Use .train() to build the model
gbm_manual_cv.train(x = features, 
                    y = 'quality', 
                    training_frame = wine_train)


gbm Model Build progress: |███████████████████████████████████████████████| 100%

In [12]:
# Check the cross-validation model performance
gbm_manual_cv


Model Details
=============
H2OGradientBoostingEstimator :  Gradient Boosting Machine
Model Key:  gbm_manual_cv


ModelMetricsRegression: gbm
** Reported on train data. **

MSE: 0.27438346229216
RMSE: 0.5238162485950202
MAE: 0.4075920913493524
RMSLE: 0.0774835431572533
Mean Residual Deviance: 0.27438346229216

ModelMetricsRegression: gbm
** Reported on cross-validation data. **

MSE: 0.45021820302163834
RMSE: 0.6709830124687497
MAE: 0.5185163944803867
RMSLE: 0.10007842575662584
Mean Residual Deviance: 0.45021820302163834
Cross-Validation Metrics Summary: 
mean sd cv_1_valid cv_2_valid cv_3_valid cv_4_valid cv_5_valid
mae 0.5183839 0.0057996 0.5173063 0.5328950 0.507699 0.5151827 0.5188366
mean_residual_deviance 0.4501753 0.0090064 0.4428255 0.4701306 0.4537360 0.4317105 0.4524740
mse 0.4501753 0.0090064 0.4428255 0.4701306 0.4537360 0.4317105 0.4524740
r2 0.4309830 0.0176490 0.4365418 0.3850305 0.4380041 0.4612002 0.4341383
residual_deviance 0.4501753 0.0090064 0.4428255 0.4701306 0.4537360 0.4317105 0.4524740
rmse 0.670884 0.0067072 0.6654513 0.6856607 0.6735993 0.6570468 0.6726618
rmsle 0.1000738 0.0007271 0.0983455 0.1015277 0.0999000 0.1004637 0.1001324
Scoring History: 
timestamp duration number_of_trees training_rmse training_mae training_deviance
2017-06-29 23:25:30 2.770 sec 0.0 0.8900853 0.6768335 0.7922518
2017-06-29 23:25:30 2.779 sec 1.0 0.8599939 0.6513445 0.7395894
2017-06-29 23:25:30 2.784 sec 2.0 0.8321552 0.6295051 0.6924822
2017-06-29 23:25:30 2.791 sec 3.0 0.8081890 0.6151421 0.6531695
2017-06-29 23:25:30 2.797 sec 4.0 0.7882018 0.6064024 0.6212620
--- --- --- --- --- --- ---
2017-06-29 23:25:30 3.190 sec 96.0 0.5267685 0.4106401 0.2774850
2017-06-29 23:25:30 3.193 sec 97.0 0.5255850 0.4095203 0.2762396
2017-06-29 23:25:30 3.196 sec 98.0 0.5252608 0.4091795 0.2758989
2017-06-29 23:25:30 3.200 sec 99.0 0.5239927 0.4076997 0.2745683
2017-06-29 23:25:30 3.203 sec 100.0 0.5238162 0.4075921 0.2743835
See the whole table with table.as_data_frame()
Variable Importances: 
variable relative_importance scaled_importance percentage
alcohol 3520.4504395 1.0 0.3371040
volatile acidity 1474.0030518 0.4186973 0.1411445
free sulfur dioxide 1111.8027344 0.3158126 0.1064617
pH 621.6004639 0.1765684 0.0595219
residual sugar 608.0207520 0.1727111 0.0582216
total sulfur dioxide 592.9692993 0.1684356 0.0567803
fixed acidity 558.3989868 0.1586158 0.0534700
density 545.3736572 0.1549159 0.0522228
citric acid 479.6713257 0.1362528 0.0459314
sulphates 474.8290405 0.1348774 0.0454677
chlorides 456.0965881 0.1295563 0.0436740
Out[12]:


In [13]:
# Check the model performance on test dataset
gbm_manual_cv.model_performance(wine_test)
# Results should match gbm_manual above, as the model is trained
# with the same parameters (cross-validation only adds evaluation)


ModelMetricsRegression: gbm
** Reported on test data. **

MSE: 0.44325665649714924
RMSE: 0.6657752297113112
MAE: 0.5114358481376113
RMSLE: 0.09895809708429235
Mean Residual Deviance: 0.44325665649714924
Out[13]:

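The metrics for gbm_manual_cv can also be pulled out programmatically, which is handy when comparing many models; a minimal sketch using the model's metric accessors:

In [ ]:
# Retrieve metrics programmatically instead of reading the printout
print('Train MSE:', gbm_manual_cv.mse(train = True))
print('CV MSE   :', gbm_manual_cv.mse(xval = True))
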

Step 4 - GBM with Manual Settings, CV and Early Stopping


In [14]:
# Build a GBM with manual settings, CV and early stopping

# Set up GBM for regression
# Add a seed for reproducibility
gbm_manual_cv_es = H2OGradientBoostingEstimator(model_id = 'gbm_manual_cv_es', 
                                                seed = 1234,
                                                ntrees = 10000,   # increase the number of trees 
                                                sample_rate = 0.9,
                                                col_sample_rate = 0.9,
                                                nfolds = 5,
                                                stopping_metric = 'mse',  # let early stopping determine
                                                stopping_rounds = 15,     # the optimal number of trees
                                                score_tree_interval = 1)  # by monitoring MSE after every tree
# Use .train() to build the model
gbm_manual_cv_es.train(x = features, 
                       y = 'quality', 
                       training_frame = wine_train)


gbm Model Build progress: |███████████████████████████████████████████████| 100%

In [15]:
# Check the model summary
gbm_manual_cv_es.summary()


Model Summary: 
number_of_trees number_of_internal_trees model_size_in_bytes min_depth max_depth mean_depth min_leaves max_leaves mean_leaves
155.0 155.0 49771.0 5.0 5.0 5.0 7.0 32.0 20.374193
Out[15]:


In [16]:
# Check the cross-validation model performance
gbm_manual_cv_es


Model Details
=============
H2OGradientBoostingEstimator :  Gradient Boosting Machine
Model Key:  gbm_manual_cv_es


ModelMetricsRegression: gbm
** Reported on train data. **

MSE: 0.22107991282362896
RMSE: 0.470191357665822
MAE: 0.36200557768890596
RMSLE: 0.06954327915133354
Mean Residual Deviance: 0.22107991282362896

ModelMetricsRegression: gbm
** Reported on cross-validation data. **

MSE: 0.44288792056243054
RMSE: 0.665498249856775
MAE: 0.5094014952755754
RMSLE: 0.09937081861305609
Mean Residual Deviance: 0.44288792056243054
Cross-Validation Metrics Summary: 
mean sd cv_1_valid cv_2_valid cv_3_valid cv_4_valid cv_5_valid
mae 0.5095274 0.0064806 0.4993615 0.5179948 0.4980329 0.5125335 0.5197147
mean_residual_deviance 0.4430673 0.0067716 0.4305977 0.4522071 0.4484727 0.4323305 0.4517285
mse 0.4430673 0.0067716 0.4305977 0.4522071 0.4484727 0.4323305 0.4517285
r2 0.4401194 0.0126575 0.4521007 0.4084759 0.4445232 0.4604265 0.4350706
residual_deviance 0.4430673 0.0067716 0.4305977 0.4522071 0.4484727 0.4323305 0.4517285
rmse 0.665594 0.0050970 0.6561994 0.6724635 0.6696811 0.6575184 0.6721075
rmsle 0.0993869 0.0008079 0.0972056 0.0996676 0.0995299 0.1005154 0.1000163
Scoring History: 
timestamp duration number_of_trees training_rmse training_mae training_deviance
2017-06-29 23:25:37 6.820 sec 0.0 0.8900853 0.6768335 0.7922518
2017-06-29 23:25:37 6.825 sec 1.0 0.8599939 0.6513445 0.7395894
2017-06-29 23:25:37 6.829 sec 2.0 0.8321552 0.6295051 0.6924822
2017-06-29 23:25:37 6.832 sec 3.0 0.8081890 0.6151421 0.6531695
2017-06-29 23:25:37 6.836 sec 4.0 0.7882018 0.6064024 0.6212620
--- --- --- --- --- --- ---
2017-06-29 23:25:38 7.395 sec 151.0 0.4719094 0.3635129 0.2226985
2017-06-29 23:25:38 7.399 sec 152.0 0.4714653 0.3632114 0.2222796
2017-06-29 23:25:38 7.403 sec 153.0 0.4712549 0.3630109 0.2220812
2017-06-29 23:25:38 7.409 sec 154.0 0.4704735 0.3622888 0.2213453
2017-06-29 23:25:38 7.413 sec 155.0 0.4701914 0.3620056 0.2210799
See the whole table with table.as_data_frame()
Variable Importances: 
variable relative_importance scaled_importance percentage
alcohol 3619.5112305 1.0 0.3124334
volatile acidity 1571.8613281 0.4342745 0.1356818
free sulfur dioxide 1227.1920166 0.3390491 0.1059302
pH 727.2174072 0.2009159 0.0627728
residual sugar 694.1660156 0.1917845 0.0599199
total sulfur dioxide 680.5446777 0.1880212 0.0587441
fixed acidity 664.0781860 0.1834718 0.0573227
density 651.0623779 0.1798758 0.0561992
citric acid 594.4930420 0.1642468 0.0513162
sulphates 592.9971313 0.1638335 0.0511870
chlorides 561.7833252 0.1552097 0.0484927
Out[16]:


In [17]:
# Check the model performance on test dataset
gbm_manual_cv_es.model_performance(wine_test)


ModelMetricsRegression: gbm
** Reported on test data. **

MSE: 0.4287344643828695
RMSE: 0.6547781795256081
MAE: 0.4990124321946826
RMSLE: 0.09753734379917677
Mean Residual Deviance: 0.4287344643828695
Out[17]:

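Early stopping halted training at 155 trees out of the 10,000 allowed. To see how the training error evolved and where stopping kicked in, inspect the scoring history (returned as a pandas DataFrame):

In [ ]:
# Scoring history: one row per scored tree (score_tree_interval = 1)
history = gbm_manual_cv_es.scoring_history()
history[['number_of_trees', 'training_rmse']].tail()
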

Step 5 - GBM with CV, Early Stopping and Full Grid Search

In [18]:
# import Grid Search
from h2o.grid.grid_search import H2OGridSearch

In [19]:
# define the criteria for full grid search
search_criteria = {'strategy': "Cartesian"}

In [20]:
# define the range of hyper-parameters for grid search
# 3 x 3 = 9 combinations in total
hyper_params = {'sample_rate': [0.7, 0.8, 0.9],
                'col_sample_rate': [0.7, 0.8, 0.9]}

In [21]:
# Set up GBM grid search
# Add a seed for reproducibility
gbm_full_grid = H2OGridSearch(
                    H2OGradientBoostingEstimator(
                        model_id = 'gbm_full_grid', 
                        seed = 1234,
                        ntrees = 10000,   
                        nfolds = 5,
                        stopping_metric = 'mse', 
                        stopping_rounds = 15,     
                        score_tree_interval = 1),
                    search_criteria = search_criteria, # full grid search
                    hyper_params = hyper_params)

In [22]:
# Use .train() to start the grid search
gbm_full_grid.train(x = features, 
                    y = 'quality', 
                    training_frame = wine_train)


gbm Grid Build progress: |████████████████████████████████████████████████| 100%

In [23]:
# Sort and show the grid search results
gbm_full_grid_sorted = gbm_full_grid.get_grid(sort_by='mse', decreasing=False)
print(gbm_full_grid_sorted)


    col_sample_rate sample_rate  \
0               0.8         0.9   
1               0.7         0.9   
2               0.8         0.8   
3               0.9         0.9   
4               0.9         0.8   
5               0.9         0.7   
6               0.7         0.8   
7               0.7         0.7   
8               0.8         0.7   

                                                     model_ids  \
0  Grid_GBM_py_4_sid_9f52_model_python_1498775122375_1_model_7   
1  Grid_GBM_py_4_sid_9f52_model_python_1498775122375_1_model_6   
2  Grid_GBM_py_4_sid_9f52_model_python_1498775122375_1_model_4   
3  Grid_GBM_py_4_sid_9f52_model_python_1498775122375_1_model_8   
4  Grid_GBM_py_4_sid_9f52_model_python_1498775122375_1_model_5   
5  Grid_GBM_py_4_sid_9f52_model_python_1498775122375_1_model_2   
6  Grid_GBM_py_4_sid_9f52_model_python_1498775122375_1_model_3   
7  Grid_GBM_py_4_sid_9f52_model_python_1498775122375_1_model_0   
8  Grid_GBM_py_4_sid_9f52_model_python_1498775122375_1_model_1   

                   mse  
0  0.43780785687779805  
1  0.44060532786277523  
2  0.44096100224896634  
3  0.44288792056243054  
4  0.44475412455519636  
5   0.4457317997358452  
6    0.448140619501795  
7   0.4528872144586896  
8   0.4529771807006373  


In [24]:
# Extract the best model from full grid search
best_model_id = gbm_full_grid_sorted.model_ids[0]
best_gbm_from_full_grid = h2o.get_model(best_model_id)
best_gbm_from_full_grid.summary()


Model Summary: 
number_of_trees number_of_internal_trees model_size_in_bytes min_depth max_depth mean_depth min_leaves max_leaves mean_leaves
187.0 187.0 57180.0 5.0 5.0 5.0 7.0 31.0 19.160427
Out[24]:


In [25]:
# Check the model performance on test dataset
best_gbm_from_full_grid.model_performance(wine_test)


ModelMetricsRegression: gbm
** Reported on test data. **

MSE: 0.4196124030489544
RMSE: 0.6477749632773363
MAE: 0.48965435078727043
RMSLE: 0.09630232810628427
Mean Residual Deviance: 0.4196124030489544
Out[25]:

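The best model from a grid can be persisted to disk and reloaded later; a minimal sketch (the path below is just a placeholder):

In [ ]:
# Save the best model to disk and load it back
model_path = h2o.save_model(model = best_gbm_from_full_grid,
                            path = "/tmp/h2o_models",  # placeholder path
                            force = True)              # overwrite if present
loaded_model = h2o.load_model(model_path)
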
Step 6 - GBM with CV, Early Stopping and Random Grid Search

In [26]:
# define the criteria for random grid search
search_criteria = {'strategy': "RandomDiscrete", 
                   'max_models': 9,
                   'seed': 1234}

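Besides max_models, a random search can also be bounded by wall-clock time; a hedged variant of the criteria above using the standard max_runtime_secs option:

In [ ]:
# Alternative: bound the random search by time instead of model count
search_criteria_timed = {'strategy': "RandomDiscrete",
                         'max_runtime_secs': 120,  # stop after ~2 minutes
                         'seed': 1234}
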
In [27]:
# define the range of hyper-parameters for grid search
# 27 combinations in total (the random search will try at most 9 of them)
hyper_params = {'sample_rate': [0.7, 0.8, 0.9],
                'col_sample_rate': [0.7, 0.8, 0.9],
                'max_depth': [3, 5, 7]}

In [28]:
# Set up GBM grid search
# Add a seed for reproducibility
gbm_rand_grid = H2OGridSearch(
                    H2OGradientBoostingEstimator(
                        model_id = 'gbm_rand_grid', 
                        seed = 1234,
                        ntrees = 10000,   
                        nfolds = 5,
                        stopping_metric = 'mse', 
                        stopping_rounds = 15,     
                        score_tree_interval = 1),
                    search_criteria = search_criteria, # random grid search
                    hyper_params = hyper_params)

In [29]:
# Use .train() to start the grid search
gbm_rand_grid.train(x = features, 
                    y = 'quality', 
                    training_frame = wine_train)


gbm Grid Build progress: |████████████████████████████████████████████████| 100%

In [30]:
# Sort and show the grid search results
gbm_rand_grid_sorted = gbm_rand_grid.get_grid(sort_by='mse', decreasing=False)
print(gbm_rand_grid_sorted)


    col_sample_rate max_depth sample_rate  \
0               0.9         7         0.9   
1               0.7         7         0.7   
2               0.9         7         0.7   
3               0.8         7         0.7   
4               0.7         5         0.8   
5               0.8         3         0.9   
6               0.9         3         0.9   
7               0.8         3         0.8   
8               0.7         3         0.7   

                                                     model_ids  \
0  Grid_GBM_py_4_sid_9f52_model_python_1498775122375_2_model_5   
1  Grid_GBM_py_4_sid_9f52_model_python_1498775122375_2_model_1   
2  Grid_GBM_py_4_sid_9f52_model_python_1498775122375_2_model_6   
3  Grid_GBM_py_4_sid_9f52_model_python_1498775122375_2_model_4   
4  Grid_GBM_py_4_sid_9f52_model_python_1498775122375_2_model_0   
5  Grid_GBM_py_4_sid_9f52_model_python_1498775122375_2_model_7   
6  Grid_GBM_py_4_sid_9f52_model_python_1498775122375_2_model_2   
7  Grid_GBM_py_4_sid_9f52_model_python_1498775122375_2_model_3   
8  Grid_GBM_py_4_sid_9f52_model_python_1498775122375_2_model_8   

                   mse  
0   0.4227388012308513  
1   0.4327748309201154  
2   0.4369533108701783  
3   0.4397321318633594  
4    0.448140619501795  
5   0.4647039373596571  
6   0.4690321721360509  
7  0.47384072192391513  
8  0.47745552186979223  


In [31]:
# Extract the best model from random grid search
best_model_id = gbm_rand_grid_sorted.model_ids[0]
best_gbm_from_rand_grid = h2o.get_model(best_model_id)
best_gbm_from_rand_grid.summary()


Model Summary: 
number_of_trees number_of_internal_trees model_size_in_bytes min_depth max_depth mean_depth min_leaves max_leaves mean_leaves
142.0 142.0 87920.0 7.0 7.0 7.0 16.0 82.0 44.049297
Out[31]:


In [32]:
# Check the model performance on test dataset
best_gbm_from_rand_grid.model_performance(wine_test)


ModelMetricsRegression: gbm
** Reported on test data. **

MSE: 0.4047189762404106
RMSE: 0.636175271635428
MAE: 0.47321498369668896
RMSLE: 0.09498904157909563
Mean Residual Deviance: 0.4047189762404106
Out[32]:


Comparison of Model Performance on Test Data


In [33]:
print('GBM with Default Settings                        :', gbm_default.model_performance(wine_test).mse())
print('GBM with Manual Settings                         :', gbm_manual.model_performance(wine_test).mse())
print('GBM with Manual Settings & CV                    :', gbm_manual_cv.model_performance(wine_test).mse())
print('GBM with Manual Settings, CV & Early Stopping    :', gbm_manual_cv_es.model_performance(wine_test).mse())
print('GBM with CV, Early Stopping & Full Grid Search   :', 
          best_gbm_from_full_grid.model_performance(wine_test).mse())
print('GBM with CV, Early Stopping & Random Grid Search :', 
          best_gbm_from_rand_grid.model_performance(wine_test).mse())


GBM with Default Settings                        : 0.45511211588709155
GBM with Manual Settings                         : 0.44325665649714924
GBM with Manual Settings & CV                    : 0.44325665649714924
GBM with Manual Settings, CV & Early Stopping    : 0.4287344643828695
GBM with CV, Early Stopping & Full Grid Search   : 0.4196124030489544
GBM with CV, Early Stopping & Random Grid Search : 0.4047189762404106
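
Finally, the winning model can score new data with predict(); a quick sketch on the test frame:

In [ ]:
# Generate predictions on the test set with the best model
predictions = best_gbm_from_rand_grid.predict(wine_test)
predictions.head(5)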