In [12]:
#conda install -c r ipython-notebook r-irkernel

#install.packages("RCurl")
#install.packages("jsonlite")
#install.packages("statmod")
#install.packages(c("devtools", "roxygen2", "testthat"))

In [1]:
library(h2o)
h2o.init(nthreads=-1)
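## return() is only valid inside a function, so abort the cell with stop() instead
if (!h2o.deepwater.available()) stop("Deep Water is not available in this H2O build")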


Loading required package: statmod

----------------------------------------------------------------------

Your next step is to start H2O:
    > h2o.init()

For H2O package documentation, ask for help:
    > ??h2o

After starting H2O, you can use the Web UI at http://localhost:54321
For more information visit http://docs.h2o.ai

----------------------------------------------------------------------


Attaching package: ‘h2o’

The following objects are masked from ‘package:stats’:

    cor, sd, var

The following objects are masked from ‘package:base’:

    &&, %*%, %in%, ||, apply, as.factor, as.numeric, colnames,
    colnames<-, ifelse, is.character, is.factor, is.numeric, log,
    log10, log1p, log2, round, signif, trunc

 Connection successful!

R is connected to the H2O cluster: 
    H2O cluster uptime:         21 minutes 43 seconds 
    H2O cluster version:        3.11.0.99999 
    H2O cluster version age:    21 minutes  
    H2O cluster name:           arno 
    H2O cluster total nodes:    1 
    H2O cluster total memory:   13.93 GB 
    H2O cluster total cores:    12 
    H2O cluster allowed cores:  12 
    H2O cluster healthy:        TRUE 
    H2O Connection ip:          localhost 
    H2O Connection port:        54321 
    H2O Connection proxy:       NA 
    R Version:                  R version 3.3.1 (2016-06-21) 

Error in eval(expr, envir, enclos): could not find function "h2o.deepwater.available"
Traceback:

In [2]:
train <- as.h2o(iris)
predictors <- 1:4
response_col <- 5
hidden_opts <- list(c(20, 20), c(50, 50, 50), c(200, 200), c(50, 50, 50, 50, 50))
activation_opts <- c("tanh", "rectifier")
learnrate_opts <- seq(1e-3, 1e-2, 1e-3)
max_models <- 1000      ## don't build more than this many models (this won't trigger)
nfolds <- 3             ## use cross-validation to rank models and to find optimal number of epochs for each model
seed <- 42
max_runtime_secs <- 30  ## limit overall time (this triggers)


  |======================================================================| 100%
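
The comment on max_models says the limit won't trigger. A minimal sketch of why, assuming the random search treats each distinct (hidden, activation, learning_rate) combination as one candidate: the discrete space defined above is far smaller than 1000. The name n_combinations is illustrative and not part of the original notebook; the option vectors are repeated here so the snippet runs on its own in base R.

```r
## Hyperparameter options, copied from the cell above
hidden_opts <- list(c(20, 20), c(50, 50, 50), c(200, 200), c(50, 50, 50, 50, 50))
activation_opts <- c("tanh", "rectifier")
learnrate_opts <- seq(1e-3, 1e-2, 1e-3)
max_models <- 1000

## Size of the full discrete search space: one factor per hyperparameter
## (n_combinations is a hypothetical helper name used only for this sketch)
n_combinations <- length(hidden_opts) * length(activation_opts) * length(learnrate_opts)
print(n_combinations)

## TRUE when the model cap cannot be reached by distinct combinations alone
print(n_combinations < max_models)
```

With 4 hidden layouts, 2 activations and 10 learning rates this gives 80 combinations, so in practice max_runtime_secs or the leaderboard early stopping ends the search first.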

In [3]:
hyper_params <- list(activation = activation_opts, hidden = hidden_opts, learning_rate = learnrate_opts)
search_criteria <- list(strategy = "RandomDiscrete",
                        max_models = max_models, seed = seed, max_runtime_secs = max_runtime_secs,
                        stopping_rounds = 5,         ## enable early stopping of the overall leaderboard
                        stopping_metric = "logloss",
                        stopping_tolerance = 1e-4)

In [4]:
dw_grid <- h2o.grid("deepwater", grid_id = "deepwater_grid",
                    x = predictors, y = response_col, training_frame = train,
                    epochs = 500,                ## upper bound; long enough to allow early stopping
                    nfolds = nfolds,
                    stopping_rounds = 3,         ## enable early stopping of each model in the hyperparameter search
                    stopping_metric = "logloss",
                    stopping_tolerance = 1e-3,   ## stop once the logloss of the cv models stops improving by at least this relative amount
                    hyper_params = hyper_params,
                    search_criteria = search_criteria)


  |======================================================================| 100%

In [5]:
dw_grid


H2O Grid Details
================

Grid ID: deepwater_grid 
Used hyper parameters: 
  -  activation 
  -  hidden 
  -  learning_rate 
Number of models: 17 
Number of failed models: 0 

Hyper-Parameter Search Summary: ordered by increasing logloss
   activation               hidden learning_rate               model_ids             logloss
1        Tanh             [20, 20]          0.01  deepwater_grid_model_5 0.10360537308271477
2   Rectifier             [20, 20]         0.001 deepwater_grid_model_12 0.11026169578623926
3   Rectifier             [20, 20]         0.004  deepwater_grid_model_0  0.1625494618028055
4   Rectifier             [20, 20]         0.001  deepwater_grid_model_3 0.18870357386174968
5        Tanh             [20, 20]          0.01 deepwater_grid_model_14  0.2840712183726002
6        Tanh           [200, 200]         0.005 deepwater_grid_model_16  0.3728919690428224
7        Tanh [50, 50, 50, 50, 50]         0.009  deepwater_grid_model_2  0.4132216265277581
8        Tanh           [200, 200]         0.005  deepwater_grid_model_7  0.4676174315562619
9        Tanh         [50, 50, 50]         0.002 deepwater_grid_model_15  0.5330389890839002
10       Tanh         [50, 50, 50]         0.002  deepwater_grid_model_6  0.6295032648019662
11       Tanh [50, 50, 50, 50, 50]         0.009 deepwater_grid_model_11  0.7263985566596572
12  Rectifier           [200, 200]         0.008 deepwater_grid_model_13  0.8995364557765182
13  Rectifier         [50, 50, 50]         0.009  deepwater_grid_model_1  1.2044396481497057
14  Rectifier           [200, 200]         0.008  deepwater_grid_model_4  1.4559550555517036
15  Rectifier         [50, 50, 50]         0.009 deepwater_grid_model_10  1.7264889062929403
16  Rectifier [50, 50, 50, 50, 50]         0.007  deepwater_grid_model_8  2.3025850929940455
17  Rectifier [50, 50, 50, 50, 50]         0.007 deepwater_grid_model_17  2.5328436022934504