In [1]:
library(h2o)
h2o.init(nthreads=-1)
if (!h2o.deepwater_available()) return()


Loading required package: statmod

----------------------------------------------------------------------

Your next step is to start H2O:
    > h2o.init()

For H2O package documentation, ask for help:
    > ??h2o

After starting H2O, you can use the Web UI at http://localhost:54321
For more information visit http://docs.h2o.ai

----------------------------------------------------------------------


Attaching package: ‘h2o’

The following objects are masked from ‘package:stats’:

    cor, sd, var

The following objects are masked from ‘package:base’:

    ||, &&, %*%, apply, as.factor, as.numeric, colnames, colnames<-,
    ifelse, %in%, is.character, is.factor, is.numeric, log, log10,
    log1p, log2, round, signif, trunc

 Connection successful!

R is connected to the H2O cluster: 
    H2O cluster uptime:         7 minutes 30 seconds 
    H2O cluster version:        3.11.0.99999 
    H2O cluster version age:    9 hours and 21 minutes  
    H2O cluster name:           ubuntu 
    H2O cluster total nodes:    1 
    H2O cluster total memory:   3.11 GB 
    H2O cluster total cores:    8 
    H2O cluster allowed cores:  8 
    H2O cluster healthy:        TRUE 
    H2O Connection ip:          localhost 
    H2O Connection port:        54321 
    H2O Connection proxy:       NA 
    R Version:                  R version 3.2.3 (2015-12-10) 

Error in eval(expr, envir, enclos): could not find function "h2o.deepwater_available"
Traceback:

Data Ingest

Image dataset

For simplicity, H2O Deep Water lets you specify a list of URIs (file paths) or URLs (links) to images, together with a response column containing either class labels (enum) or a regression target (numeric).

For this example, we use a simple cat/dog/mouse dataset with a few hundred images and a label of cardinality 3.
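Conceptually, such a dataset is just a two-column frame. A minimal sketch (with hypothetical paths, assuming a running H2O cluster so `as.h2o` can upload the frame):

```r
library(h2o)

# Hypothetical sketch: column 1 holds the image URI, column 2 the response.
uris   <- c("images/cat/0001.jpg", "images/dog/0001.jpg", "images/mouse/0001.jpg")
labels <- c("cat", "dog", "mouse")

# as.h2o() pushes the local data.frame into the H2O cluster as an H2OFrame.
df <- as.h2o(data.frame(C1 = uris, C2 = as.factor(labels)))
```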


In [2]:
df <- h2o.importFile("/home/ubuntu/h2o-3/bigdata/laptop/deepwater/imagenet/cat_dog_mouse.csv")
print(head(df))
path = 1 ## must be the first column
response = 2


  |======================================================================| 100%
                                                               C1  C2
1  bigdata/laptop/deepwater/imagenet/cat/102194502_49f003abd9.jpg cat
2   bigdata/laptop/deepwater/imagenet/cat/11146807_00a5f35255.jpg cat
3 bigdata/laptop/deepwater/imagenet/cat/1140846215_70e326f868.jpg cat
4  bigdata/laptop/deepwater/imagenet/cat/114170569_6cbdf4bbdb.jpg cat
5 bigdata/laptop/deepwater/imagenet/cat/1217664848_de4c7fc296.jpg cat
6 bigdata/laptop/deepwater/imagenet/cat/1241603780_5e8c8f1ced.jpg cat

Let's look at a random subset of 10 images.
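The original cell is not shown here; a minimal sketch (assuming the H2O frame `df` imported above, with the paths in column `C1`) could sample 10 rows and print their paths:

```r
# Pull the frame down locally and sample 10 random image paths.
# (Hypothetical sketch; the actual notebook cell rendered the images themselves.)
local_df <- as.data.frame(df)
idx <- sample(nrow(local_df), 10)
print(local_df$C1[idx])
```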

Now we build a classic convolutional neural network called LeNet.

We'll use a GPU to train the LeNet model in seconds.

To build a LeNet image classification model in H2O, simply specify network = "lenet":


In [4]:
model <- h2o.deepwater(x=path, y=response, 
                       training_frame=df, epochs=50, 
                       learning_rate=1e-3, network = "lenet")
model


  |======================================================================| 100%
Model Details:
==============

H2OMultinomialModel: deepwater
Model ID:  DeepWater_model_R_1477378862430_2 
Status of Deep Learning Model: lenet, 1.6 MB, predicting C2, 3-class classification, 14,336 training samples, mini-batch size 32
  input_neurons     rate momentum
1          2352 0.000986 0.990000


H2OMultinomialMetrics: deepwater
** Reported on training data. **
** Metrics reported on full training frame **

Training Set Metrics: 
=====================

Extract training frame with `h2o.getFrame("cat_dog_mouse.hex_sid_95f8_1")`
MSE: (Extract with `h2o.mse`) 0.131072
RMSE: (Extract with `h2o.rmse`) 0.3620386
Logloss: (Extract with `h2o.logloss`) 0.4176429
Mean Per-Class Error: 0.1165104
Confusion Matrix: Extract with `h2o.confusionMatrix(<model>,train = TRUE)`
=========================================================================
Confusion Matrix: vertical: actual; across: predicted
       cat dog mouse  Error       Rate
cat     75   4    11 0.1667 =  15 / 90
dog      4  75     6 0.1176 =  10 / 85
mouse    3   3    86 0.0652 =   6 / 92
Totals  82  82   103 0.1161 = 31 / 267

Hit Ratio Table: Extract with `h2o.hit_ratio_table(<model>,train = TRUE)`
=======================================================================
Top-3 Hit Ratios: 
  k hit_ratio
1 1  0.883895
2 2  0.973783
3 3  1.000000



If you'd like to build your own LeNet network architecture, that is easy as well. In this example script, we use the 'mxnet' backend. Models can easily be imported and exported between H2O and MXNet, since H2O uses MXNet's format for model definitions.


In [5]:
get_symbol <- function(num_classes = 1000) {
  library(mxnet)
  data <- mx.symbol.Variable('data')
  # first conv
  conv1 <- mx.symbol.Convolution(data = data, kernel = c(5, 5), num_filter = 20)

  tanh1 <- mx.symbol.Activation(data = conv1, act_type = "tanh")
  pool1 <- mx.symbol.Pooling(data = tanh1, pool_type = "max", kernel = c(2, 2), stride = c(2, 2))

  # second conv
  conv2 <- mx.symbol.Convolution(data = pool1, kernel = c(5, 5), num_filter = 50)
  tanh2 <- mx.symbol.Activation(data = conv2, act_type = "tanh")
  pool2 <- mx.symbol.Pooling(data = tanh2, pool_type = "max", kernel = c(2, 2), stride = c(2, 2))
  # first fullc
  flatten <- mx.symbol.Flatten(data = pool2)
  fc1 <- mx.symbol.FullyConnected(data = flatten, num_hidden = 500)
  tanh3 <- mx.symbol.Activation(data = fc1, act_type = "tanh")
  # second fullc
  fc2 <- mx.symbol.FullyConnected(data = tanh3, num_hidden = num_classes)
  # loss
  lenet <- mx.symbol.SoftmaxOutput(data = fc2, name = 'softmax')
  return(lenet)
}

In [7]:
nclasses = h2o.nlevels(df[,response])
network <- get_symbol(nclasses)
cat(network$as.json(), file = "/tmp/symbol_lenet-R.json", sep = '')

In [8]:
# sudo apt-get install graphviz
graph.viz(network$as.json())


HTML widgets cannot be represented in plain text (need html)

In [9]:
model = h2o.deepwater(x=path, y=response, training_frame = df,
                      epochs=500, ## early stopping is on by default and might trigger before
                      network_definition_file="/tmp/symbol_lenet-R.json",  ## specify the model
                      image_shape=c(28,28),                                ## provide expected (or matching) image size
                      channels=3)                                          ## 3 for color, 1 for monochrome


  |======================================================================| 100%

In [10]:
summary(model)


Model Details:
==============

H2OMultinomialModel: deepwater
Model Key:  DeepWater_model_R_1477378862430_3 
Status of Deep Learning Model: user, 1.6 MB, predicting C2, 3-class classification, 134,144 training samples, mini-batch size 32
  input_neurons     rate momentum
1          2352 0.004409 0.990000

H2OMultinomialMetrics: deepwater
** Reported on training data. **
** Metrics reported on full training frame **

Training Set Metrics: 
=====================

Extract training frame with `h2o.getFrame("cat_dog_mouse.hex_sid_95f8_1")`
MSE: (Extract with `h2o.mse`) 0.03078524
RMSE: (Extract with `h2o.rmse`) 0.1754572
Logloss: (Extract with `h2o.logloss`) 0.1154222
Mean Per-Class Error: 0.03366487
Confusion Matrix: Extract with `h2o.confusionMatrix(<model>,train = TRUE)`
=========================================================================
Confusion Matrix: vertical: actual; across: predicted
       cat dog mouse  Error      Rate
cat     88   2     0 0.0222 =  2 / 90
dog      2  82     1 0.0353 =  3 / 85
mouse    1   3    88 0.0435 =  4 / 92
Totals  91  87    89 0.0337 = 9 / 267

Hit Ratio Table: Extract with `h2o.hit_ratio_table(<model>,train = TRUE)`
=======================================================================
Top-3 Hit Ratios: 
  k hit_ratio
1 1  0.966292
2 2  0.996255
3 3  1.000000





Scoring History: 
            timestamp   duration training_speed    epochs iterations
1 2016-10-25 07:10:50  0.000 sec                  0.00000          0
2 2016-10-25 07:10:52  1.755 sec    627 obs/sec   3.83521          1
3 2016-10-25 07:10:57  6.907 sec   4990 obs/sec 126.56180         33
4 2016-10-25 07:11:02 12.020 sec   5607 obs/sec 249.28839         65
5 2016-10-25 07:11:07 17.160 sec   5843 obs/sec 372.01498         97
6 2016-10-25 07:11:12 22.305 sec   5969 obs/sec 494.74157        129
7 2016-10-25 07:11:13 22.656 sec   5971 obs/sec 502.41199        131
8 2016-10-25 07:11:13 22.713 sec   5966 obs/sec 502.41199        131
        samples training_rmse training_logloss training_classification_error
1      0.000000                                                             
2   1024.000000       0.66039          1.15794                       0.62547
3  33792.000000       0.56444          0.86754                       0.42322
4  66560.000000       0.54945          0.90199                       0.40075
5  99328.000000       0.32128          0.30245                       0.13483
6 132096.000000       0.17546          0.11542                       0.03371
7 134144.000000       0.30373          0.28780                       0.13483
8 134144.000000       0.17546          0.11542                       0.03371

To see how much slower a convolutional neural net trains without a GPU, disable the 'gpu' flag. Note that using MKL or other optimized BLAS implementations can shrink this difference considerably, but a GPU is generally at least 5x faster than the best CPU implementations for realistic workloads, and often 50x faster or more.

Instead of training for 500 epochs on the GPU, we'll train for 10 epochs on the CPU.


In [11]:
model = h2o.deepwater(x=path, y=response, training_frame = df,
                      epochs=10,
                      network_definition_file="/tmp/symbol_lenet-R.json",
                      image_shape=c(28,28),
                      channels=3,
                      gpu=FALSE)


  |======================================================================| 100%

In [12]:
summary(model)


Model Details:
==============

H2OMultinomialModel: deepwater
Model Key:  DeepWater_model_R_1477378862430_4 
Status of Deep Learning Model: user, 1.6 MB, predicting C2, 3-class classification, 3,072 training samples, mini-batch size 32
  input_neurons     rate momentum
1          2352 0.004985 0.927648

H2OMultinomialMetrics: deepwater
** Reported on training data. **
** Metrics reported on full training frame **

Training Set Metrics: 
=====================

Extract training frame with `h2o.getFrame("cat_dog_mouse.hex_sid_95f8_1")`
MSE: (Extract with `h2o.mse`) 0.3830328
RMSE: (Extract with `h2o.rmse`) 0.6188964
Logloss: (Extract with `h2o.logloss`) 1.007723
Mean Per-Class Error: 0.5300322
Confusion Matrix: Extract with `h2o.confusionMatrix(<model>,train = TRUE)`
=========================================================================
Confusion Matrix: vertical: actual; across: predicted
       cat dog mouse  Error        Rate
cat     59   7    24 0.3444 =   31 / 90
dog     42  17    26 0.8000 =   68 / 85
mouse   37   4    51 0.4457 =   41 / 92
Totals 138  28   101 0.5243 = 140 / 267

Hit Ratio Table: Extract with `h2o.hit_ratio_table(<model>,train = TRUE)`
=======================================================================
Top-3 Hit Ratios: 
  k hit_ratio
1 1  0.475655
2 2  0.801498
3 3  1.000000





Scoring History: 
            timestamp   duration training_speed   epochs iterations     samples
1 2016-10-25 07:11:20  0.000 sec                 0.00000          0    0.000000
2 2016-10-25 07:11:27  7.852 sec    141 obs/sec  3.83521          1 1024.000000
3 2016-10-25 07:11:34 15.001 sec    147 obs/sec  7.67041          2 2048.000000
4 2016-10-25 07:11:42 22.781 sec    144 obs/sec 11.50562          3 3072.000000
5 2016-10-25 07:11:43 23.312 sec    144 obs/sec 11.50562          3 3072.000000
  training_rmse training_logloss training_classification_error
1                                                             
2       0.61890          1.00772                       0.52434
3       0.64650          1.21219                       0.51311
4       0.66322          1.17192                       0.62921
5       0.61890          1.00772                       0.52434