H2O Tutorial: EEG Eye State Classification

Author: Erin LeDell

Contact: erin@h2o.ai

This tutorial steps through a quick introduction to H2O's Python API. The goal is to introduce H2O's capabilities from Python through a complete, worked example.

Most of the functionality of a Pandas DataFrame is available with identical syntax on an H2OFrame, so if you are comfortable with Pandas, data frame manipulation in H2O will come naturally to you. The modeling syntax in the H2O Python API may also remind you of scikit-learn.
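
For example, these familiar Pandas-style operations carry over directly to an H2OFrame (a quick sketch, using a hypothetical frame named df):

df.shape              # (rows, columns), as in Pandas
df['AF3'].mean()      # column selection and aggregation
df[df['AF3'] > 4300]  # boolean-mask row filtering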

References: H2O Python API documentation and H2O general documentation

Install H2O in Python

Prerequisites

This tutorial assumes you have Python 2.7 installed. The h2o Python package depends on a few other packages (each with dependencies of its own), all of which can be installed using pip:

pip install requests
pip install tabulate
pip install scikit-learn

If you have any problems (for example, installing the scikit-learn package), check out this page for tips.

Install h2o

Once the dependencies are installed, you can install H2O. We will use the latest stable version of the h2o package. The installation instructions are on the "Install in Python" tab.

For reference, the Python documentation for the latest stable release of H2O is here, and the general H2O User Guide is here.

Start up an H2O cluster

In a Python terminal, we can import the h2o package and start up an H2O cluster.


In [2]:
import h2o

# Start an H2O Cluster on your local machine
h2o.init()


Checking whether there is an H2O instance running at http://localhost:54321. connected.

If you already have an H2O cluster running that you'd like to connect to (for example, in a multi-node Hadoop environment), then you can specify the IP and port of that cluster as follows:


In [2]:
# This will not actually do anything since it's a fake IP address
# h2o.init(ip="123.45.67.89", port=54321)

Download EEG Data

The following code downloads a copy of the EEG Eye State dataset. All data is from one continuous EEG measurement with the Emotiv EEG Neuroheadset. The duration of the measurement was 117 seconds. The eye state was detected via a camera during the EEG measurement and added to the file manually after analyzing the video frames. '1' indicates the eye-closed state and '0' the eye-open state. All values are in chronological order, with the first measured value at the top of the file.

We can import the data directly into H2O using the import_file method in the Python API. The import path can be a URL, a local path, a path to an HDFS file, or a file on Amazon S3.
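
For illustration, here are a few other (hypothetical) import paths that import_file would accept:

# data = h2o.import_file("/path/to/eeg_eyestate_splits.csv")        # local file
# data = h2o.import_file("hdfs://namenode/data/eeg_eyestate.csv")   # HDFS
# data = h2o.import_file("s3://bucket/data/eeg_eyestate.csv")       # Amazon S3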


In [4]:
#csv_url = "http://www.stat.berkeley.edu/~ledell/data/eeg_eyestate_splits.csv"
csv_url = "https://h2o-public-test-data.s3.amazonaws.com/smalldata/eeg/eeg_eyestate_splits.csv"
data = h2o.import_file(csv_url)


Parse Progress: [##################################################] 100%

Explore Data

Once we have loaded the data, let's take a quick look. First, the dimensions of the frame:


In [6]:
data.shape


Out[6]:
(14980, 16)

Now let's take a look at the top of the frame:


In [7]:
data.head()


Out[7]:
AF3      F7       F3       FC5      T7       P7       O1       O2       P8       T8       FC6      F4       F8       AF4      eyeDetection  split
4329.23  4009.23  4289.23  4148.21  4350.26  4586.15  4096.92  4641.03  4222.05  4238.46  4211.28  4280.51  4635.9   4393.85  0             valid
4324.62  4004.62  4293.85  4148.72  4342.05  4586.67  4097.44  4638.97  4210.77  4226.67  4207.69  4279.49  4632.82  4384.1   0             test
4327.69  4006.67  4295.38  4156.41  4336.92  4583.59  4096.92  4630.26  4207.69  4222.05  4206.67  4282.05  4628.72  4389.23  0             train
4328.72  4011.79  4296.41  4155.9   4343.59  4582.56  4097.44  4630.77  4217.44  4235.38  4210.77  4287.69  4632.31  4396.41  0             train
4326.15  4011.79  4292.31  4151.28  4347.69  4586.67  4095.9   4627.69  4210.77  4244.1   4212.82  4288.21  4632.82  4398.46  0             train
4321.03  4004.62  4284.1   4153.33  4345.64  4587.18  4093.33  4616.92  4202.56  4232.82  4209.74  4281.03  4628.21  4389.74  0             train
4319.49  4001.03  4280.51  4151.79  4343.59  4584.62  4089.74  4615.9   4212.31  4226.67  4201.03  4269.74  4625.13  4378.46  0             test
4325.64  4006.67  4278.46  4143.08  4344.1   4583.08  4087.18  4614.87  4205.64  4230.26  4195.9   4266.67  4622.05  4380.51  0             test
4326.15  4010.77  4276.41  4139.49  4345.13  4584.1   4091.28  4608.21  4187.69  4229.74  4202.05  4273.85  4627.18  4389.74  0             test
4326.15  4011.28  4276.92  4142.05  4344.1   4582.56  4092.82  4608.72  4194.36  4228.72  4212.82  4277.95  4637.44  4393.33  0             train

The first 14 columns are numeric values that represent EEG measurements from the headset. The "eyeDetection" column is the response. There is an additional column called "split" that was added (by me) in order to specify partitions of the data (so we can easily benchmark against other tools outside of H2O using the same splits). I randomly divided the dataset into three partitions: train (60%), valid (20%) and test (20%), and marked which split each row belongs to in the "split" column.
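
As a quick sanity check on those proportions, we can tabulate the "split" column (a sketch, using the same table method we apply to the response later):

data['split'].table()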

Let's take a look at the column names:


In [6]:
data.columns


Out[6]:
[u'AF3',
 u'F7',
 u'F3',
 u'FC5',
 u'T7',
 u'P7',
 u'O1',
 u'O2',
 u'P8',
 u'T8',
 u'FC6',
 u'F4',
 u'F8',
 u'AF4',
 u'eyeDetection',
 u'split']

To select a subset of the columns to look at, typical Pandas indexing applies:


In [8]:
columns = ['AF3', 'eyeDetection', 'split']
data[columns].head()


Out[8]:
AF3      eyeDetection  split
4329.23  0             valid
4324.62  0             test
4327.69  0             train
4328.72  0             train
4326.15  0             train
4321.03  0             train
4319.49  0             test
4325.64  0             test
4326.15  0             test
4326.15  0             train

Now let's select a single column, for example -- the response column, and look at the data more closely:


In [9]:
y = 'eyeDetection'
data[y]


Out[9]:
eyeDetection
0
0
0
0
0
0
0
0
0
0

It looks like a binary response, but let's validate that assumption:


In [10]:
data[y].unique()


Out[10]:
C1
0
1

If you don't specify the column types when you import the file, H2O guesses them for you. A column containing only 0's and 1's will be parsed as numeric by default.

Therefore, we should convert the response column to a more efficient "enum" representation -- in this case it is a categorical variable with two levels, 0 and 1. If the only categorical column in my data is the response, I typically don't bother specifying the column type during the parse, and instead use this one-liner to convert it afterwards:


In [11]:
data[y] = data[y].asfactor()
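
Alternatively, if you prefer to set the type at parse time, import_file accepts a col_types argument; a minimal sketch reusing the csv_url from above:

# data = h2o.import_file(csv_url, col_types={'eyeDetection': 'enum'})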

Now we can check that there are two levels in our response column:


In [12]:
data[y].nlevels()


Out[12]:
[2]

We can query the categorical "levels" as well ('0' and '1' stand for "eye open" and "eye closed") to see what they are:


In [13]:
data[y].levels()


Out[13]:
[['0', '1']]

We may want to check if there are any missing values, so let's look for NAs in our dataset. For tree-based methods like GBM and RF, H2O handles missing feature values automatically, so it's not a problem if we are missing certain feature values. However, it is always a good idea to check to make sure that you are not missing any of the training labels.

To figure out which, if any, values are missing, we can use the isna method on the response column. The columns in an H2O Frame are also H2O Frames themselves, so all the methods that apply to a Frame also apply to a single column.


In [14]:
data.isna()


Out[14]:
C1  C2  C3  C4  C5  C6  C7  C8  C9  C10  C11  C12  C13  C14  C15  C16
0   0   0   0   0   0   0   0   0   0    0    0    0    0    0    0
0   0   0   0   0   0   0   0   0   0    0    0    0    0    0    0
0   0   0   0   0   0   0   0   0   0    0    0    0    0    0    0
0   0   0   0   0   0   0   0   0   0    0    0    0    0    0    0
0   0   0   0   0   0   0   0   0   0    0    0    0    0    0    0
0   0   0   0   0   0   0   0   0   0    0    0    0    0    0    0
0   0   0   0   0   0   0   0   0   0    0    0    0    0    0    0
0   0   0   0   0   0   0   0   0   0    0    0    0    0    0    0
0   0   0   0   0   0   0   0   0   0    0    0    0    0    0    0
0   0   0   0   0   0   0   0   0   0    0    0    0    0    0    0


In [15]:
data[y].isna()


Out[15]:
C1
0
0
0
0
0
0
0
0
0
0

The isna method doesn't directly answer the question, "Does the response column contain any NAs?"; rather, it returns a 0 if that cell is not missing (Is NA? FALSE == 0) and a 1 if it is missing (Is NA? TRUE == 1). So if there are no missing values, summing over the whole column should produce a sum equal to 0.0. Let's take a look:


In [16]:
data[y].isna().sum()


Out[16]:
0.0

Great, no missing labels. :-)

Out of curiosity, let's see if there is any missing data in this frame:


In [17]:
data.isna().sum()


Out[17]:
0.0

The sum is still zero, so there are no missing values in any of the cells.

The next thing I may wonder about in a binary classification problem is the distribution of the response in the training data. Is one of the two outcomes under-represented in the training set? Many real datasets have what's called a class "imbalance" problem, where one of the classes has far fewer training examples than the other class. Let's take a look at the distribution:


In [18]:
data[y].table()


Out[18]:
eyeDetection  Count
0             8257
1             6723

Ok, the data is not exactly evenly distributed between the two classes -- there are more 0's than 1's in the dataset. However, this level of imbalance shouldn't be much of an issue for the machine learning algorithms. (We will revisit this briefly in the modeling section below.)

Let's calculate the percentage that each class represents:


In [19]:
n = data.shape[0]  # Total number of training samples
data[y].table()['Count']/n


Out[19]:
Count
0.551202
0.448798

Split H2O Frame into a train and test set

So far we have explored the original dataset (all rows). For the machine learning portion of this tutorial, we will break the dataset into three parts: a training set, a validation set, and a test set.

If you want H2O to do the splitting for you, you can use the split_frame method. However, we have explicit splits that we want (for reproducibility reasons), so we can just subset the Frame to get the partitions we want.
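
For reference, a random-split version using split_frame might look like the following sketch (ratios gives the train and valid fractions; the remainder becomes the test split):

# train, valid, test = data.split_frame(ratios=[0.6, 0.2], seed=1)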

Subset the data H2O Frame on the "split" column:


In [20]:
train = data[data['split']=="train"]
train.shape


Out[20]:
(8988, 16)

In [21]:
valid = data[data['split']=="valid"]
valid.shape


Out[21]:
(2996, 16)

In [22]:
test = data[data['split']=="test"]
test.shape


Out[22]:
(2996, 16)

Machine Learning in H2O

We will do a quick demo of the H2O software using a Gradient Boosting Machine (GBM). The goal of this problem is to train a model to predict eye state (open vs closed) from EEG data.

Train and Test a GBM model


In [23]:
# Import H2O GBM:
from h2o.estimators.gbm import H2OGradientBoostingEstimator

We first create a model object of class "H2OGradientBoostingEstimator". This does not actually do any training; it just sets the model up for training by specifying model parameters.


In [24]:
model = H2OGradientBoostingEstimator(distribution='bernoulli',
                                    ntrees=100,
                                    max_depth=4,
                                    learn_rate=0.1)
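
As promised in the exploration section, a note on class imbalance: the mild imbalance we saw (roughly 55/45) does not require special handling, but for heavily imbalanced data, H2O's tree-based estimators accept a balance_classes parameter that oversamples the minority class during training. A hedged sketch (not run here):

# model_bal = H2OGradientBoostingEstimator(distribution='bernoulli',
#                                          balance_classes=True)  # oversample the minority class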

Specify the predictor set and response

The model object, like all H2O estimator objects, has a train method, which actually performs model training. At this step we specify the training frame and (optionally) a validation frame, along with the response and predictor variables.

The x argument should be a list of predictor names in the training frame, and y specifies the response column. We have already set y = "eyeDetection" above, but we still need to specify x.


In [25]:
x = list(train.columns)
x


Out[25]:
[u'AF3',
 u'F7',
 u'F3',
 u'FC5',
 u'T7',
 u'P7',
 u'O1',
 u'O2',
 u'P8',
 u'T8',
 u'FC6',
 u'F4',
 u'F8',
 u'AF4',
 u'eyeDetection',
 u'split']

In [27]:
del x[14:16]  # Remove the last two entries, 'eyeDetection' and 'split'
x


Out[27]:
[u'AF3',
 u'F7',
 u'F3',
 u'FC5',
 u'T7',
 u'P7',
 u'O1',
 u'O2',
 u'P8',
 u'T8',
 u'FC6',
 u'F4',
 u'F8',
 u'AF4']
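
An equivalent, less position-dependent way to build the predictor list is to filter by name; a quick sketch:

x = [col for col in train.columns if col not in [y, 'split']]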

Now that we have specified x and y, we can train the model:


In [28]:
model.train(x=x, y=y, training_frame=train, validation_frame=valid)


gbm Model Build Progress: [##################################################] 100%

Inspect Model

The results shown when you print a model are determined by the following:

  • Model class of the estimator (e.g. GBM, RF, GLM, DL)
  • The type of machine learning problem (e.g. binary classification, multiclass classification, regression)
  • The data you specify (e.g. training_frame only, training_frame and validation_frame, or training_frame and nfolds)

Below, we see a GBM Model Summary, as well as training and validation metrics, since we supplied a validation_frame. Since this is a binary classification task, we are shown the relevant performance metrics, which include: MSE, R^2, LogLoss, AUC and Gini. We are also shown a Confusion Matrix, where the threshold for classification is chosen automatically (by H2O) as the threshold which maximizes the F1 score.

The scoring history is also printed, which shows how the performance metrics evolve over some training increment, such as the number of trees in the case of GBM and RF.

Lastly, for tree-based methods (GBM and RF), we also print variable importance.
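
If you prefer to pull out individual pieces rather than print the whole model, the same information is available through accessor methods on the trained model; a quick sketch:

model.auc(valid=True)               # validation AUC
model.confusion_matrix(valid=True)  # confusion matrix at the max-F1 threshold
model.varimp()                      # variable importances (tree-based models)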


In [27]:
print(model)


Model Details
=============
H2OGradientBoostingEstimator :  Gradient Boosting Machine
Model Key:  GBM_model_python_1448559565749_9080

Model Summary:
number_of_trees  model_size_in_bytes  min_depth  max_depth  mean_depth  min_leaves  max_leaves  mean_leaves
100.0            23614.0              4.0        4.0        4.0         10.0        16.0        14.9

ModelMetricsBinomial: gbm
** Reported on train data. **

MSE: 0.114026790434
R^2: 0.539835211
LogLoss: 0.376005292812
AUC: 0.936370388939
Gini: 0.872740777878

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.43076103173:
0 1 Error Rate
0 4102.0 814.0 0.1656 (814.0/4916.0)
1 534.0 3538.0 0.1311 (534.0/4072.0)
Total 4636.0 4352.0 0.15 (1348.0/8988.0)
Maximum Metrics: Maximum metrics at their respective thresholds

metric threshold value idx
max f1 0.4307610 0.8399810 225.0
max f2 0.2903699 0.8927355 280.0
max f0point5 0.5822403 0.8685376 167.0
max accuracy 0.5002179 0.8529150 199.0
max precision 0.9870341 1.0 0.0
max absolute_MCC 0.5002179 0.7030709 199.0
max min_per_class_accuracy 0.4473262 0.8492677 218.0
Gains/Lift Table: Avg response rate: 45.30 %

group lower_threshold cumulative_data_fraction response_rate cumulative_response_rate capture_rate cumulative_capture_rate lift cumulative_lift gain cumulative_gain
1 0.9206586 0.0500668 1.0 1.0 0.1105108 0.1105108 2.2072692 2.2072692 120.7269155 120.7269155
2 0.8721988 0.1000223 0.9955457 0.9977753 0.1097741 0.2202849 2.1974372 2.2023587 119.7437221 120.2358657
3 0.8154129 0.1500890 0.9888889 0.9948110 0.1092829 0.3295678 2.1827439 2.1958156 118.2743942 119.5815572
4 0.7587556 0.2000445 0.9599109 0.9860957 0.1058448 0.4354126 2.1187818 2.1765785 111.8781750 117.6578538
5 0.7006297 0.25 0.9020045 0.9692924 0.0994597 0.5348723 1.9909666 2.1394892 99.0966610 113.9489195
6 0.6450366 0.3000668 0.8644444 0.9517983 0.0955305 0.6304028 1.9080616 2.1008750 90.8061559 110.0875017
7 0.5826161 0.3500223 0.7260579 0.9195804 0.0800589 0.7104617 1.6026052 2.0297615 60.2605222 102.9761496
8 0.5183693 0.3999777 0.5924276 0.8787204 0.0653242 0.7757859 1.3076472 1.9395725 30.7647206 93.9572534
9 0.4640757 0.4500445 0.5222222 0.8390606 0.0577112 0.8334971 1.1526850 1.8520325 15.2685003 85.2032512
10 0.4124356 0.5 0.4365256 0.7988429 0.0481336 0.8816306 0.9635295 1.7632613 -3.6470480 76.3261297
11 0.3623212 0.5499555 0.3385301 0.7570301 0.0373281 0.9189587 0.7472270 1.6709693 -25.2773025 67.0969286
12 0.3185648 0.6000223 0.2666667 0.7161135 0.0294695 0.9484283 0.5886051 1.5806552 -41.1394892 58.0655197
13 0.2778815 0.6499777 0.1826281 0.6751113 0.0201375 0.9685658 0.4031093 1.4901523 -59.6890711 49.0152268
14 0.2374738 0.6999332 0.1469933 0.6374185 0.0162083 0.9847741 0.3244538 1.4069543 -67.5546182 40.6954270
15 0.2014388 0.75 0.0644444 0.5991693 0.0071218 0.9918959 0.1422462 1.3225278 -85.7753766 32.2527832
16 0.1656835 0.7999555 0.0356347 0.5639777 0.0039293 0.9958251 0.0786555 1.2448507 -92.1344529 24.4850685
17 0.1326734 0.8499110 0.0200445 0.5320068 0.0022102 0.9980354 0.0442437 1.1742822 -95.5756298 17.4282216
18 0.1036424 0.8999777 0.0088889 0.5029052 0.0009823 0.9990177 0.0196202 1.1100471 -98.0379830 11.0047092
19 0.0758269 0.9499332 0.0044543 0.4766924 0.0004912 0.9995088 0.0098319 1.0521885 -99.0168066 5.2188506
20 0.0081794 1.0 0.0044444 0.4530485 0.0004912 1.0 0.0098101 1.0 -99.0189915 0.0

ModelMetricsBinomial: gbm
** Reported on validation data. **

MSE: 0.124121459821
R^2: 0.499326493922
LogLoss: 0.400023227684
AUC: 0.917514329947
Gini: 0.835028659894

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.460577730095:
0 1 Error Rate
0 1364.0 271.0 0.1657 (271.0/1635.0)
1 223.0 1138.0 0.1639 (223.0/1361.0)
Total 1587.0 1409.0 0.1649 (494.0/2996.0)
Maximum Metrics: Maximum metrics at their respective thresholds

metric threshold value idx
max f1 0.4605777 0.8216606 211.0
max f2 0.3288904 0.8816658 269.0
max f0point5 0.5924794 0.8415575 158.0
max accuracy 0.4749281 0.8354473 205.0
max precision 0.9728563 1.0 0.0
max absolute_MCC 0.4620595 0.6691432 210.0
max min_per_class_accuracy 0.4605777 0.8342508 211.0
Gains/Lift Table: Avg response rate: 45.43 %

group lower_threshold cumulative_data_fraction response_rate cumulative_response_rate capture_rate cumulative_capture_rate lift cumulative_lift gain cumulative_gain
1 0.9213965 0.0500668 1.0 1.0 0.1102131 0.1102131 2.2013226 2.2013226 120.1322557 120.1322557
2 0.8773782 0.1001335 0.9933333 0.9966667 0.1094783 0.2196914 2.1866471 2.1939848 118.6647073 119.3984815
3 0.8248667 0.1502003 0.9866667 0.9933333 0.1087436 0.3284350 2.1719716 2.1866471 117.1971590 118.6647073
4 0.7540418 0.2002670 0.92 0.975 0.1013960 0.4298310 2.0252168 2.1462895 102.5216752 114.6289493
5 0.7023105 0.25 0.8590604 0.9519359 0.0940485 0.5238795 1.8910690 2.0955180 89.1069042 109.5518001
6 0.6471993 0.3000668 0.7666667 0.9210234 0.0844967 0.6083762 1.6876806 2.0274695 68.7680627 102.7469496
7 0.5924509 0.3501335 0.7066667 0.8903718 0.0778839 0.6862601 1.5556013 1.9599955 55.5601274 95.9995489
8 0.5395983 0.4002003 0.6133333 0.8557131 0.0675974 0.7538575 1.3501445 1.8837005 35.0144502 88.3700537
9 0.4826484 0.4499332 0.5167785 0.8182493 0.0565760 0.8104335 1.1375962 1.8012305 13.7596221 80.1230549
10 0.4285123 0.5 0.4066667 0.7770360 0.0448200 0.8552535 0.8952045 1.7105070 -10.4795494 71.0506980
11 0.3800505 0.5500668 0.42 0.7445388 0.0462895 0.9015430 0.9245555 1.6389701 -7.5444526 63.8970132
12 0.3377836 0.6001335 0.3066667 0.7080089 0.0337987 0.9353417 0.6750723 1.5585560 -32.4927749 55.8555959
13 0.2898491 0.6498665 0.1946309 0.6687211 0.0213079 0.9566495 0.4284453 1.4720709 -57.1554670 47.2070862
14 0.2490383 0.6999332 0.1 0.6280401 0.0110213 0.9676708 0.2201323 1.3825187 -77.9867744 38.2518745
15 0.2117511 0.75 0.0866667 0.5919003 0.0095518 0.9772226 0.1907813 1.3029635 -80.9218712 30.2963507
16 0.1752189 0.8000668 0.1 0.5611181 0.0110213 0.9882439 0.2201323 1.2352019 -77.9867744 23.5201852
17 0.1445499 0.8497997 0.0469799 0.5310291 0.0051433 0.9933872 0.1034178 1.1689663 -89.6582162 16.8966260
18 0.1085791 0.8998665 0.0266667 0.5029674 0.0029390 0.9963262 0.0587019 1.1071934 -94.1298065 10.7193393
19 0.0788782 0.9499332 0.02 0.4775123 0.0022043 0.9985305 0.0440265 1.0511586 -95.5973549 5.1158593
20 0.0087775 1.0 0.0133333 0.4542724 0.0014695 1.0 0.0293510 1.0 -97.0649033 0.0

Scoring History:
timestamp duration number_of_trees training_MSE training_logloss training_AUC training_lift training_classification_error validation_MSE validation_logloss validation_AUC validation_lift validation_classification_error
2015-11-26 10:25:17 0.036 sec 1.0 0.2393135 0.6716245 0.7312101 1.9746189 0.3159769 0.2393579 0.6717168 0.7310525 2.0087068 0.3197597
2015-11-26 10:25:17 0.071 sec 2.0 0.2324178 0.6576643 0.7498719 2.0431475 0.3172007 0.2326440 0.6581388 0.7446153 2.0202614 0.3147530
2015-11-26 10:25:18 0.102 sec 3.0 0.2261150 0.6447969 0.7735294 2.0350459 0.3147530 0.2265954 0.6458099 0.7649803 2.0103402 0.2973965
2015-11-26 10:25:18 0.138 sec 4.0 0.2205965 0.6333786 0.7768265 2.0432395 0.2890521 0.2210435 0.6343441 0.7709237 2.0264511 0.3034045
2015-11-26 10:25:18 0.183 sec 5.0 0.2152810 0.6223737 0.7890473 2.1408557 0.2900534 0.2157338 0.6233813 0.7850207 2.0890102 0.3017356
--- --- --- --- --- --- --- --- --- --- --- --- --- ---
2015-11-26 10:25:21 3.451 sec 34.0 0.1554295 0.4845459 0.8784078 2.2023641 0.2209613 0.1583969 0.4916962 0.8697198 2.2013226 0.2166222
2015-11-26 10:25:21 3.612 sec 35.0 0.1541826 0.4815032 0.8802874 2.2024603 0.2212951 0.1571958 0.4887622 0.8718223 2.2013226 0.2192924
2015-11-26 10:25:21 3.770 sec 36.0 0.1533055 0.4793517 0.8821833 2.2024073 0.2170672 0.1562774 0.4865299 0.8738704 2.2013226 0.2176235
2015-11-26 10:25:21 3.927 sec 37.0 0.1523871 0.4767990 0.8825759 2.2024073 0.2159546 0.1554071 0.4841040 0.8742059 2.2013226 0.2152870
2015-11-26 10:25:22 4.811 sec 100.0 0.1140268 0.3760053 0.9363704 2.2072692 0.1499777 0.1241215 0.4000232 0.9175143 2.2013226 0.1648865
Variable Importances:
variable relative_importance scaled_importance percentage
P7 1308.6181641 1.0 0.2080290
O1 1043.3781738 0.7973129 0.1658642
F7 835.8756714 0.6387468 0.1328779
AF3 730.2410889 0.5580246 0.1160853
F4 465.6864014 0.3558612 0.0740295
O2 465.3877869 0.3556330 0.0739820
T8 340.6835938 0.2603384 0.0541580
FC6 282.6287231 0.2159749 0.0449291
FC5 249.6704864 0.1907894 0.0396897
F3 238.9603577 0.1826051 0.0379872
T7 233.7027283 0.1785874 0.0371514
P8 95.7217865 0.0731472 0.0152167

Model Performance on a Test Set

Once a model has been trained, you can also use it to make predictions on a test set. In the case above, we only ran the model once, so our validation set (passed as validation_frame) could also have served as a "test set." We technically have already created test set predictions and evaluated test set performance.

However, when performing model selection over a variety of model parameters, it is common to train a variety of models (using different parameters) on the training set, train, evaluating each against a validation set, valid. Once the best model is selected (based on validation set performance), a final set of predictions on the held-out (never used before) test set, test, provides the true measure of model performance.

You can use the model_performance method to generate predictions on a new dataset. The results are stored in an object of class "H2OBinomialModelMetrics".


In [28]:
perf = model.model_performance(test)
print(perf.__class__)


<class 'h2o.model.metrics_base.H2OBinomialModelMetrics'>

Individual model performance metrics can be extracted using methods like auc and mse. In the case of binary classification, we may be most interested in evaluating test set Area Under the ROC Curve (AUC).


In [30]:
perf.auc()


Out[30]:
0.91671733144306

In [31]:
perf.mse()


Out[31]:
0.12372290870105287
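
Other metrics are exposed on the same object; for example, a quick sketch:

perf.logloss()           # test set log loss
perf.confusion_matrix()  # test set confusion matrix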

Cross-validated Performance

To perform k-fold cross-validation, you use the same code as above, but you specify nfolds as an integer greater than 1, or add a "fold_column" to your H2O Frame which indicates a fold ID for each row.

Unless you have a specific reason to manually assign the observations to folds, you will find it easiest to simply use the nfolds argument.

When performing cross-validation, you can still pass a validation_frame, but you can also choose to use the original dataset that contains all the rows. We will cross-validate a model below using the original H2O Frame which is called data.
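
If you do need explicit fold assignments, one approach (a sketch, not run here) is to generate a fold ID column with kfold_column and pass its name as the fold_column argument to train:

# data['fold_id'] = data.kfold_column(n_folds=5, seed=1)
# cvmodel.train(x=x, y=y, training_frame=data, fold_column='fold_id')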


In [32]:
cvmodel = H2OGradientBoostingEstimator(distribution='bernoulli',
                                       ntrees=100,
                                       max_depth=4,
                                       learn_rate=0.1,
                                       nfolds=5)

cvmodel.train(x=x, y=y, training_frame=data)


gbm Model Build Progress: [##################################################] 100%

This time around, we will simply pull the training and cross-validation metrics out of the model. To do so, you use the auc method again, and you can specify train or xval as True to get the correct metric.


In [33]:
print(cvmodel.auc(train=True))
print(cvmodel.auc(xval=True))


0.926208136139
0.909288088259

One way of evaluating models with different parameters is to perform a grid search over a set of parameter values. For example, in GBM, here are three model parameters that may be useful to search over:

  • ntrees: Number of trees
  • max_depth: Maximum depth of a tree
  • learn_rate: Learning rate in the GBM

We will define a grid as follows:


In [34]:
ntrees_opt = [5,50,100]
max_depth_opt = [2,3,5]
learn_rate_opt = [0.1,0.2]

hyper_params = {'ntrees': ntrees_opt, 
                'max_depth': max_depth_opt,
                'learn_rate': learn_rate_opt}

Define an "H2OGridSearch" object by specifying the algorithm (GBM) and the hyperparameters:


In [35]:
from h2o.grid.grid_search import H2OGridSearch

gs = H2OGridSearch(H2OGradientBoostingEstimator, hyper_params = hyper_params)

An "H2OGridSearch" object also has a train method, which is used to train all the models in the grid.


In [36]:
gs.train(x=x, y=y, training_frame=train, validation_frame=valid)


gbm Grid Build Progress: [##################################################] 100%

Compare Models


In [37]:
print(gs)


Grid Search Results for H2OGradientBoostingEstimator:
Model Id Hyperparameters: [learn_rate, ntrees, max_depth] mse
Grid_GBM_py_17_model_python_1448559565749_9958_model_17 [0.2, 100, 5] 0.0511374
Grid_GBM_py_17_model_python_1448559565749_9958_model_16 [0.2, 50, 5] 0.0825649
Grid_GBM_py_17_model_python_1448559565749_9958_model_8 [0.1, 100, 5] 0.0827864
Grid_GBM_py_17_model_python_1448559565749_9958_model_7 [0.1, 50, 5] 0.1148579
Grid_GBM_py_17_model_python_1448559565749_9958_model_14 [0.2, 100, 3] 0.1183549
Grid_GBM_py_17_model_python_1448559565749_9958_model_13 [0.2, 50, 3] 0.1433345
Grid_GBM_py_17_model_python_1448559565749_9958_model_5 [0.1, 100, 3] 0.1446235
Grid_GBM_py_17_model_python_1448559565749_9958_model_4 [0.1, 50, 3] 0.1671745
Grid_GBM_py_17_model_python_1448559565749_9958_model_11 [0.2, 100, 2] 0.1718563
Grid_GBM_py_17_model_python_1448559565749_9958_model_15 [0.2, 5, 5] 0.1796775
Grid_GBM_py_17_model_python_1448559565749_9958_model_10 [0.2, 50, 2] 0.1876036
Grid_GBM_py_17_model_python_1448559565749_9958_model_2 [0.1, 100, 2] 0.1879522
Grid_GBM_py_17_model_python_1448559565749_9958_model_1 [0.1, 50, 2] 0.2024828
Grid_GBM_py_17_model_python_1448559565749_9958_model_6 [0.1, 5, 5] 0.2042339
Grid_GBM_py_17_model_python_1448559565749_9958_model_12 [0.2, 5, 3] 0.2126602
Grid_GBM_py_17_model_python_1448559565749_9958_model_3 [0.1, 5, 3] 0.2264710
Grid_GBM_py_17_model_python_1448559565749_9958_model_9 [0.2, 5, 2] 0.2276228
Grid_GBM_py_17_model_python_1448559565749_9958_model_0 [0.1, 5, 2] 0.2355396


In [38]:
# print out the auc for all of the models
auc_table = gs.sort_by('auc(valid=True)',increasing=False)
print(auc_table)


Grid Search Results for H2OGradientBoostingEstimator:
Model Id Hyperparameters: [learn_rate, ntrees, max_depth] auc(valid=True)
Grid_GBM_py_17_model_python_1448559565749_9958_model_17 [0.2, 100, 5] 0.9604258
Grid_GBM_py_17_model_python_1448559565749_9958_model_8 [0.1, 100, 5] 0.9422169
Grid_GBM_py_17_model_python_1448559565749_9958_model_16 [0.2, 50, 5] 0.9417059
Grid_GBM_py_17_model_python_1448559565749_9958_model_7 [0.1, 50, 5] 0.9169205
Grid_GBM_py_17_model_python_1448559565749_9958_model_14 [0.2, 100, 3] 0.9134707
Grid_GBM_py_17_model_python_1448559565749_9958_model_13 [0.2, 50, 3] 0.8833912
Grid_GBM_py_17_model_python_1448559565749_9958_model_5 [0.1, 100, 3] 0.8803587
Grid_GBM_py_17_model_python_1448559565749_9958_model_4 [0.1, 50, 3] 0.8480756
Grid_GBM_py_17_model_python_1448559565749_9958_model_15 [0.2, 5, 5] 0.8447013
Grid_GBM_py_17_model_python_1448559565749_9958_model_6 [0.1, 5, 5] 0.8227407
Grid_GBM_py_17_model_python_1448559565749_9958_model_11 [0.2, 100, 2] 0.8102571
Grid_GBM_py_17_model_python_1448559565749_9958_model_2 [0.1, 100, 2] 0.7821284
Grid_GBM_py_17_model_python_1448559565749_9958_model_10 [0.2, 50, 2] 0.7818671
Grid_GBM_py_17_model_python_1448559565749_9958_model_12 [0.2, 5, 3] 0.7646862
Grid_GBM_py_17_model_python_1448559565749_9958_model_1 [0.1, 50, 2] 0.7548347
Grid_GBM_py_17_model_python_1448559565749_9958_model_3 [0.1, 5, 3] 0.7282790
Grid_GBM_py_17_model_python_1448559565749_9958_model_9 [0.2, 5, 2] 0.6927028
Grid_GBM_py_17_model_python_1448559565749_9958_model_0 [0.1, 5, 2] 0.6761445

The "best" model in terms of validation set AUC is listed first in auc_table.


In [39]:
best_model = h2o.get_model(auc_table['Model Id'][0])
best_model.auc()


Out[39]:
0.9894042107804035

The last thing we may want to do is generate predictions on the test set using the "best" model, and evaluate the test set AUC.


In [40]:
best_perf = best_model.model_performance(test)
best_perf.auc()


Out[40]:
0.9609710824540837

The test set AUC is approximately 0.96. Not bad!!
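
If you also want the predictions themselves (rather than just performance metrics), the predict method returns an H2O Frame containing the predicted label and the class probabilities; a quick sketch:

preds = best_model.predict(test)  # columns: predict, p0, p1
preds.head()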

