In [1]:
library(h2o)
#Start H2O on your local machine using all available cores.
#By default, CRAN policies limit use to only 2 cores.
h2o.init(nthreads = -1)
#Show a demo
demo(h2o.glm)
demo(h2o.gbm)
demo(h2o.deeplearning)
# very positive test


----------------------------------------------------------------------

Your next step is to start H2O:
    > h2o.init()

For H2O package documentation, ask for help:
    > ??h2o

After starting H2O, you can use the Web UI at http://localhost:54321
For more information visit http://docs.h2o.ai

----------------------------------------------------------------------


Attaching package: ‘h2o’

The following objects are masked from ‘package:stats’:

    cor, sd, var

The following objects are masked from ‘package:base’:

    &&, %*%, %in%, ||, apply, as.factor, as.numeric, colnames,
    colnames<-, ifelse, is.character, is.factor, is.numeric, log,
    log10, log1p, log2, round, signif, trunc

H2O is not running yet, starting it now...

Note:  In case of errors look at the following log files:
    /tmp/RtmpiLXSNs/h2o_micio1970_started_from_r.out
    /tmp/RtmpiLXSNs/h2o_micio1970_started_from_r.err


Starting H2O JVM and connecting: ... Connection successful!

R is connected to the H2O cluster: 
    H2O cluster uptime:         2 seconds 655 milliseconds 
    H2O cluster version:        3.10.2.2 
    H2O cluster version age:    1 month and 6 days  
    H2O cluster name:           H2O_started_from_R_micio1970_hro550 
    H2O cluster total nodes:    1 
    H2O cluster total memory:   1.71 GB 
    H2O cluster total cores:    4 
    H2O cluster allowed cores:  4 
    H2O cluster healthy:        TRUE 
    H2O Connection ip:          localhost 
    H2O Connection port:        54321 
    H2O Connection proxy:       NA 
    R Version:                  R version 3.3.2 (2016-10-31) 



	demo(h2o.glm)
	---- ~~~~~~~

> # This is a demo of H2O's GLM function
> # It imports a data set, parses it, and prints a summary
> # Then, it runs GLM with a binomial link function using 10-fold cross-validation
> # Note: This demo runs H2O on localhost:54321
> library(h2o)

> h2o.init()
 Connection successful!

R is connected to the H2O cluster: 
    H2O cluster uptime:         2 seconds 827 milliseconds 
    H2O cluster version:        3.10.2.2 
    H2O cluster version age:    1 month and 6 days  
    H2O cluster name:           H2O_started_from_R_micio1970_hro550 
    H2O cluster total nodes:    1 
    H2O cluster total memory:   1.71 GB 
    H2O cluster total cores:    4 
    H2O cluster allowed cores:  4 
    H2O cluster healthy:        TRUE 
    H2O Connection ip:          localhost 
    H2O Connection port:        54321 
    H2O Connection proxy:       NA 
    R Version:                  R version 3.3.2 (2016-10-31) 


> prostate.hex = h2o.uploadFile(path = system.file("extdata", "prostate.csv", package="h2o"), destination_frame = "prostate.hex")
  |======================================================================| 100%

> summary(prostate.hex)
Warning message in summary.H2OFrame(prostate.hex):
“Approximated quantiles computed! If you are interested in exact quantiles, please pass the `exact_quantiles=TRUE` parameter.”
 ID               CAPSULE          AGE             RACE           
 Min.   :  1.00   Min.   :0.0000   Min.   :43.00   Min.   :0.000  
 1st Qu.: 95.75   1st Qu.:0.0000   1st Qu.:62.00   1st Qu.:1.000  
 Median :190.50   Median :0.0000   Median :67.00   Median :1.000  
 Mean   :190.50   Mean   :0.4026   Mean   :66.04   Mean   :1.087  
 3rd Qu.:285.25   3rd Qu.:1.0000   3rd Qu.:71.00   3rd Qu.:1.000  
 Max.   :380.00   Max.   :1.0000   Max.   :79.00   Max.   :2.000  
 DPROS           DCAPS           PSA               VOL            
 Min.   :1.000   Min.   :1.000   Min.   :  0.300   Min.   : 0.00  
 1st Qu.:1.000   1st Qu.:1.000   1st Qu.:  4.900   1st Qu.: 0.00  
 Median :2.000   Median :1.000   Median :  8.664   Median :14.20  
 Mean   :2.271   Mean   :1.108   Mean   : 15.409   Mean   :15.81  
 3rd Qu.:3.000   3rd Qu.:1.000   3rd Qu.: 17.063   3rd Qu.:26.40  
 Max.   :4.000   Max.   :2.000   Max.   :139.700   Max.   :97.60  
 GLEASON        
 Min.   :0.000  
 1st Qu.:6.000  
 Median :6.000  
 Mean   :6.384  
 3rd Qu.:7.000  
 Max.   :9.000  

> prostate.glm = h2o.glm(x = c("AGE","RACE","PSA","DCAPS"), y = "CAPSULE", training_frame = prostate.hex, family = "binomial", alpha = 0.5)
  |======================================================================| 100%

> print(prostate.glm)
Model Details:
==============

H2OBinomialModel: glm
Model ID:  GLM_model_R_1487447500666_1 
GLM Model: summary
    family  link                                regularization
1 binomial logit Elastic Net (alpha = 0.5, lambda = 3.247E-4 )
  number_of_predictors_total number_of_active_predictors number_of_iterations
1                          4                           4                    4
  training_frame
1   prostate.hex

Coefficients: glm coefficients
      names coefficients standardized_coefficients
1 Intercept    -1.114418                 -0.337704
2       AGE    -0.010977                 -0.071648
3      RACE    -0.623216                 -0.192433
4     DCAPS     1.314591                  0.408386
5       PSA     0.046892                  0.937727

H2OBinomialMetrics: glm
** Reported on training data. **

MSE:  0.2027036
RMSE:  0.4502261
LogLoss:  0.5914634
Mean Per-Class Error:  0.3826121
AUC:  0.717601
Gini:  0.435202
R^2:  0.1572256
Null Deviance:  512.2888
Residual Deviance:  449.5122
AIC:  459.5122

Confusion Matrix for F1-optimal threshold:
        0   1    Error      Rate
0      80 147 0.647577  =147/227
1      18 135 0.117647   =18/153
Totals 98 282 0.434211  =165/380

Maximum Metrics: Maximum metrics at their respective thresholds
                        metric threshold    value idx
1                       max f1  0.284048 0.620690 274
2                       max f2  0.207093 0.778230 360
3                 max f0point5  0.413268 0.636672 108
4                 max accuracy  0.413268 0.705263 108
5                max precision  0.998478 1.000000   0
6                   max recall  0.207093 1.000000 360
7              max specificity  0.998478 1.000000   0
8             max absolute_mcc  0.413268 0.369123 108
9   max min_per_class_accuracy  0.331806 0.647577 176
10 max mean_per_class_accuracy  0.373175 0.672123 126

Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
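
Several of the GLM training metrics printed above are tied together by simple identities, which makes the report easy to sanity-check. The sketch below uses base R only and retypes the printed values by hand. The variable names are illustrative, and the AIC and deviance formulas are the standard ones for a Bernoulli model that happen to reproduce the printed numbers; whether H2O computes them this way internally is an assumption.

```r
# Values copied from the printed GLM report (training data, 380 rows).
mse               <- 0.2027036
logloss           <- 0.5914634
auc               <- 0.717601
residual_deviance <- 449.5122
n_rows            <- 380
n_coef            <- 5          # intercept + 4 predictors

# Confusion matrix at the F1-optimal threshold (rows = actual class).
tn <- 80; fp <- 147             # actual 0
fn <- 18; tp <- 135             # actual 1

rmse <- sqrt(mse)                               # RMSE is the square root of MSE
gini <- 2 * auc - 1                             # Gini coefficient from AUC
aic  <- residual_deviance + 2 * n_coef          # AIC = deviance + 2 * parameters
deviance_from_logloss <- 2 * n_rows * logloss   # Bernoulli deviance = 2 * n * logloss

error_class0 <- fp / (tn + fp)                  # share of actual 0 predicted as 1
error_class1 <- fn / (fn + tp)                  # share of actual 1 predicted as 0
mean_per_class_error <- (error_class0 + error_class1) / 2
f1 <- 2 * tp / (2 * tp + fp + fn)               # F1 at this threshold

print(round(c(rmse = rmse, gini = gini, aic = aic,
              deviance = deviance_from_logloss,
              mean_per_class_error = mean_per_class_error,
              f1 = f1), 6))
```

Each recomputed value can be compared against the corresponding line of the report (RMSE, Gini, AIC, Residual Deviance, Mean Per-Class Error, and the `max f1` row).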



> myLabels = c(prostate.glm@model$x, "Intercept")

> plot(prostate.glm@model$coefficients, xaxt = "n", xlab = "Coefficients", ylab = "Values")

> axis(1, at = 1:length(myLabels), labels = myLabels)

> abline(h = 0, col = 2, lty = 2)

> title("Coefficients from Logistic Regression\n of Prostate Cancer Data")

> barplot(prostate.glm@model$coefficients, main = "Coefficients from Logistic Regression\n of Prostate Cancer Data")
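
The echoed `h2o.glm` call passes no `nfolds` argument, so every metric in the GLM report is a training metric. A minimal sketch of a cross-validated variant follows. `train_glm_cv` is a hypothetical helper name, not part of h2o. The conversion of `CAPSULE` to a factor is also an assumption: the run shown here (3.10.2.2) accepted a numeric response, but newer H2O releases may require a categorical response for the binomial family.

```r
# Sketch only: assumes the h2o package and a Java runtime are installed.
# train_glm_cv is a hypothetical helper, not an h2o function.
train_glm_cv <- function(nfolds = 10) {
  h2o::h2o.init()

  # Same bundled data set the demo uses.
  prostate.hex <- h2o::h2o.uploadFile(
    path = system.file("extdata", "prostate.csv", package = "h2o"),
    destination_frame = "prostate.hex"
  )

  # Assumption: treat the 0/1 response as a class label.
  prostate.hex$CAPSULE <- h2o::as.factor(prostate.hex$CAPSULE)

  # Same predictors and elastic-net mixing as the demo, plus k-fold CV.
  h2o::h2o.glm(
    x = c("AGE", "RACE", "PSA", "DCAPS"),
    y = "CAPSULE",
    training_frame = prostate.hex,
    family = "binomial",
    alpha = 0.5,
    nfolds = nfolds
  )
}
```

With a cluster available, `m <- train_glm_cv()` followed by `h2o::h2o.auc(m, xval = TRUE)` should return the cross-validated AUC, which is usually a more honest estimate than the training AUC.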

	demo(h2o.gbm)
	---- ~~~~~~~

> # This is a demo of H2O's GBM function
> # It imports a data set, parses it, and prints a summary
> # Then, it runs GBM on a subset of the dataset
> # Note: This demo runs H2O on localhost:54321
> library(h2o)

> h2o.init()
 Connection successful!

R is connected to the H2O cluster: 
    H2O cluster uptime:         28 seconds 380 milliseconds 
    H2O cluster version:        3.10.2.2 
    H2O cluster version age:    1 month and 6 days  
    H2O cluster name:           H2O_started_from_R_micio1970_hro550 
    H2O cluster total nodes:    1 
    H2O cluster total memory:   1.54 GB 
    H2O cluster total cores:    4 
    H2O cluster allowed cores:  4 
    H2O cluster healthy:        TRUE 
    H2O Connection ip:          localhost 
    H2O Connection port:        54321 
    H2O Connection proxy:       NA 
    R Version:                  R version 3.3.2 (2016-10-31) 


> prostate.hex = h2o.uploadFile(path = system.file("extdata", "prostate.csv", package="h2o"), destination_frame = "prostate.hex")
  |======================================================================| 100%

> summary(prostate.hex)
Warning message in summary.H2OFrame(prostate.hex):
“Approximated quantiles computed! If you are interested in exact quantiles, please pass the `exact_quantiles=TRUE` parameter.”
 ID               CAPSULE          AGE             RACE           
 Min.   :  1.00   Min.   :0.0000   Min.   :43.00   Min.   :0.000  
 1st Qu.: 95.75   1st Qu.:0.0000   1st Qu.:62.00   1st Qu.:1.000  
 Median :190.50   Median :0.0000   Median :67.00   Median :1.000  
 Mean   :190.50   Mean   :0.4026   Mean   :66.04   Mean   :1.087  
 3rd Qu.:285.25   3rd Qu.:1.0000   3rd Qu.:71.00   3rd Qu.:1.000  
 Max.   :380.00   Max.   :1.0000   Max.   :79.00   Max.   :2.000  
 DPROS           DCAPS           PSA               VOL            
 Min.   :1.000   Min.   :1.000   Min.   :  0.300   Min.   : 0.00  
 1st Qu.:1.000   1st Qu.:1.000   1st Qu.:  4.900   1st Qu.: 0.00  
 Median :2.000   Median :1.000   Median :  8.664   Median :14.20  
 Mean   :2.271   Mean   :1.108   Mean   : 15.409   Mean   :15.81  
 3rd Qu.:3.000   3rd Qu.:1.000   3rd Qu.: 17.063   3rd Qu.:26.40  
 Max.   :4.000   Max.   :2.000   Max.   :139.700   Max.   :97.60  
 GLEASON        
 Min.   :0.000  
 1st Qu.:6.000  
 Median :6.000  
 Mean   :6.384  
 3rd Qu.:7.000  
 Max.   :9.000  

> prostate.gbm = h2o.gbm(x = setdiff(colnames(prostate.hex), "CAPSULE"), y = "CAPSULE", training_frame = prostate.hex, ntrees = 10, max_depth = 5, learn_rate = 0.1)
  |======================================================================| 100%

> print(prostate.gbm)
Model Details:
==============

H2ORegressionModel: gbm
Model ID:  GBM_model_R_1487447500666_3 
Model Summary: 
  number_of_trees number_of_internal_trees model_size_in_bytes min_depth
1              10                       10                3311         5
  max_depth mean_depth min_leaves max_leaves mean_leaves
1         5    5.00000         17         24    21.40000


H2ORegressionMetrics: gbm
** Reported on training data. **

MSE:  0.1358996
RMSE:  0.3686456
MAE:  0.3391817
RMSLE:  0.259886
Mean Residual Deviance :  0.1358996





> prostate.gbm2 = h2o.gbm(x = c("AGE", "RACE", "PSA", "VOL", "GLEASON"), y = "CAPSULE", training_frame = prostate.hex, ntrees = 10, max_depth = 8, min_rows = 10, learn_rate = 0.2)
  |======================================================================| 100%

> print(prostate.gbm2)
Model Details:
==============

H2ORegressionModel: gbm
Model ID:  GBM_model_R_1487447500666_4 
Model Summary: 
  number_of_trees number_of_internal_trees model_size_in_bytes min_depth
1              10                       10                3985         6
  max_depth mean_depth min_leaves max_leaves mean_leaves
1         8    7.50000         19         30    26.60000


H2ORegressionMetrics: gbm
** Reported on training data. **

MSE:  0.1051889
RMSE:  0.3243284
MAE:  0.2698054
RMSLE:  0.2289359
Mean Residual Deviance :  0.1051889
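
Both GBM runs on the prostate data report `H2ORegressionModel` because `CAPSULE` is a numeric 0/1 column, so H2O fits a regression and prints MSE-style metrics. The deep learning demo in this same output converts the column with `as.factor` before training. A sketch of the same step applied to the second GBM call is shown here. `train_capsule_classifier` is a hypothetical helper name, and the hyperparameters are simply copied from the echoed call.

```r
# Sketch only: assumes the h2o package and a Java runtime are installed.
# train_capsule_classifier is a hypothetical helper, not an h2o function.
train_capsule_classifier <- function() {
  h2o::h2o.init()

  prostate.hex <- h2o::h2o.uploadFile(
    path = system.file("extdata", "prostate.csv", package = "h2o"),
    destination_frame = "prostate.hex"
  )

  # A factor response makes H2O treat CAPSULE as a class label,
  # so the model is trained as a binary classifier.
  prostate.hex$CAPSULE <- h2o::as.factor(prostate.hex$CAPSULE)

  # Same predictors and settings as the second GBM call in the demo.
  h2o::h2o.gbm(
    x = c("AGE", "RACE", "PSA", "VOL", "GLEASON"),
    y = "CAPSULE",
    training_frame = prostate.hex,
    ntrees = 10,
    max_depth = 8,
    min_rows = 10,
    learn_rate = 0.2
  )
}
```

Printing the returned model should then show binomial metrics (AUC, LogLoss, a confusion matrix) instead of the regression metrics above.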





> # This is a demo of H2O's GBM use of default parameters on iris dataset (three classes)
> iris.hex = h2o.uploadFile(path = system.file("extdata", "iris.csv", package="h2o"), destination_frame = "iris.hex")
  |======================================================================| 100%

> summary(iris.hex)
Warning message in summary.H2OFrame(iris.hex):
“Approximated quantiles computed! If you are interested in exact quantiles, please pass the `exact_quantiles=TRUE` parameter.”
 C1              C2              C3              C4              
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.1000  
 1st Qu.:5.099   1st Qu.:2.799   1st Qu.:1.596   1st Qu.:0.2992  
 Median :5.798   Median :2.998   Median :4.348   Median :1.3000  
 Mean   :5.843   Mean   :3.054   Mean   :3.759   Mean   :1.1987  
 3rd Qu.:6.399   3rd Qu.:3.298   3rd Qu.:5.095   3rd Qu.:1.7992  
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.5000  
 C5                 
 Iris-setosa    :50 
 Iris-versicolor:50 
 Iris-virginica :50 
                    
                    
                    

> iris.gbm = h2o.gbm(x = 1:4, y = 5, training_frame = iris.hex)
  |======================================================================| 100%

> print(iris.gbm)
Model Details:
==============

H2OMultinomialModel: gbm
Model ID:  GBM_model_R_1487447500666_5 
Model Summary: 
  number_of_trees number_of_internal_trees model_size_in_bytes min_depth
1              50                      150               28329         1
  max_depth mean_depth min_leaves max_leaves mean_leaves
1         5    4.92000          2         12    10.07333


H2OMultinomialMetrics: gbm
** Reported on training data. **

Training Set Metrics: 
=====================

Extract training frame with `h2o.getFrame("iris.hex")`
MSE: (Extract with `h2o.mse`) 0.00283639
RMSE: (Extract with `h2o.rmse`) 0.05325777
Logloss: (Extract with `h2o.logloss`) 0.01881246
Mean Per-Class Error: 0
Confusion Matrix: Extract with `h2o.confusionMatrix(<model>,train = TRUE)`)
=========================================================================
Confusion Matrix: vertical: actual; across: predicted
                Iris-setosa Iris-versicolor Iris-virginica  Error      Rate
Iris-setosa              50               0              0 0.0000 =  0 / 50
Iris-versicolor           0              50              0 0.0000 =  0 / 50
Iris-virginica            0               0             50 0.0000 =  0 / 50
Totals                   50              50             50 0.0000 = 0 / 150

Hit Ratio Table: Extract with `h2o.hit_ratio_table(<model>,train = TRUE)`
=======================================================================
Top-3 Hit Ratios: 
  k hit_ratio
1 1  1.000000
2 2  1.000000
3 3  1.000000






	demo(h2o.deeplearning)
	---- ~~~~~~~~~~~~~~~~

> # This is a demo of H2O's Deep Learning function
> # It imports a data set, parses it, and prints a summary
> # Then, it runs Deep Learning on the dataset
> # Note: This demo runs H2O on localhost:54321
> library(h2o)

> h2o.init()
 Connection successful!

R is connected to the H2O cluster: 
    H2O cluster uptime:         33 seconds 190 milliseconds 
    H2O cluster version:        3.10.2.2 
    H2O cluster version age:    1 month and 6 days  
    H2O cluster name:           H2O_started_from_R_micio1970_hro550 
    H2O cluster total nodes:    1 
    H2O cluster total memory:   1.54 GB 
    H2O cluster total cores:    4 
    H2O cluster allowed cores:  4 
    H2O cluster healthy:        TRUE 
    H2O Connection ip:          localhost 
    H2O Connection port:        54321 
    H2O Connection proxy:       NA 
    R Version:                  R version 3.3.2 (2016-10-31) 


> prostate.hex = h2o.uploadFile(path = system.file("extdata", "prostate.csv", package="h2o"), destination_frame = "prostate.hex")
  |======================================================================| 100%

> summary(prostate.hex)
Warning message in summary.H2OFrame(prostate.hex):
“Approximated quantiles computed! If you are interested in exact quantiles, please pass the `exact_quantiles=TRUE` parameter.”
 ID               CAPSULE          AGE             RACE           
 Min.   :  1.00   Min.   :0.0000   Min.   :43.00   Min.   :0.000  
 1st Qu.: 95.75   1st Qu.:0.0000   1st Qu.:62.00   1st Qu.:1.000  
 Median :190.50   Median :0.0000   Median :67.00   Median :1.000  
 Mean   :190.50   Mean   :0.4026   Mean   :66.04   Mean   :1.087  
 3rd Qu.:285.25   3rd Qu.:1.0000   3rd Qu.:71.00   3rd Qu.:1.000  
 Max.   :380.00   Max.   :1.0000   Max.   :79.00   Max.   :2.000  
 DPROS           DCAPS           PSA               VOL            
 Min.   :1.000   Min.   :1.000   Min.   :  0.300   Min.   : 0.00  
 1st Qu.:1.000   1st Qu.:1.000   1st Qu.:  4.900   1st Qu.: 0.00  
 Median :2.000   Median :1.000   Median :  8.664   Median :14.20  
 Mean   :2.271   Mean   :1.108   Mean   : 15.409   Mean   :15.81  
 3rd Qu.:3.000   3rd Qu.:1.000   3rd Qu.: 17.063   3rd Qu.:26.40  
 Max.   :4.000   Max.   :2.000   Max.   :139.700   Max.   :97.60  
 GLEASON        
 Min.   :0.000  
 1st Qu.:6.000  
 Median :6.000  
 Mean   :6.384  
 3rd Qu.:7.000  
 Max.   :9.000  

> # Set the CAPSULE column to be a factor column then build model.
> prostate.hex$CAPSULE = as.factor(prostate.hex$CAPSULE)

> model = h2o.deeplearning(x = setdiff(colnames(prostate.hex), c("ID","CAPSULE")), y = "CAPSULE", training_frame = prostate.hex, activation = "Tanh", hidden = c(10, 10, 10), epochs = 10000)
  |======================================================================| 100%

> print(model@model$model_summary)
Status of Neuron Layers: predicting CAPSULE, 2-class classification, bernoulli distribution, CrossEntropy loss, 322 weights/biases, 8,4 KB, 3.800.000 training samples, mini-batch size 1
  layer units    type dropout       l1       l2 mean_rate rate_rms momentum
1     1     7   Input  0.00 %                                              
2     2    10    Tanh  0.00 % 0.000000 0.000000  0.002345 0.001300 0.000000
3     3    10    Tanh  0.00 % 0.000000 0.000000  0.005969 0.007077 0.000000
4     4    10    Tanh  0.00 % 0.000000 0.000000  0.015578 0.019282 0.000000
5     5     2 Softmax         0.000000 0.000000  0.012518 0.001613 0.000000
  mean_weight weight_rms mean_bias bias_rms
1                                          
2   -0.129283   0.902729  0.007419 0.497297
3   -0.028854   0.993868 -0.197894 0.573969
4    0.048970   1.482009 -0.170856 0.789989
5   -0.297535   3.766088  0.001749 0.130906

> # Make predictions with the trained model with training data.
> predictions = predict(object = model, newdata = prostate.hex)
  |======================================================================| 100%

> # Export predictions from H2O Cluster as R dataframe.
> predictions.R = as.data.frame(predictions)

> head(predictions.R)
  predict           p0           p1
1       0 9.999997e-01 3.382484e-07
2       0 9.989477e-01 1.052301e-03
3       0 9.998880e-01 1.119508e-04
4       0 9.982859e-01 1.714129e-03
5       0 9.989841e-01 1.015916e-03
6       1 5.329026e-08 9.999999e-01

> tail(predictions.R)
    predict          p0           p1
375       0 0.999999978 2.203862e-08
376       0 0.999995975 4.024867e-06
377       0 0.999950882 4.911775e-05
378       1 0.005341035 9.946590e-01
379       0 0.999999671 3.285235e-07
380       0 0.998471500 1.528500e-03

> # Check performance of classification model.
> performance = h2o.performance(model = model)

> print(performance)
H2OBinomialMetrics: deeplearning
** Reported on training data. **
** Metrics reported on full training frame **

MSE:  0.01138505
RMSE:  0.1067008
LogLoss:  0.04281056
Mean Per-Class Error:  0.01420921
AUC:  0.9986179
Gini:  0.9972359

Confusion Matrix for F1-optimal threshold:
         0   1    Error    Rate
0      225   2 0.008811  =2/227
1        3 150 0.019608  =3/153
Totals 228 152 0.013158  =5/380

Maximum Metrics: Maximum metrics at their respective thresholds
                        metric threshold    value idx
1                       max f1  0.624901 0.983607 109
2                       max f2  0.134930 0.988296 114
3                 max f0point5  0.824712 0.989376 107
4                 max accuracy  0.824712 0.986842 107
5                max precision  1.000000 1.000000   0
6                   max recall  0.007199 1.000000 141
7              max specificity  1.000000 1.000000   0
8             max absolute_mcc  0.824712 0.972691 107
9   max min_per_class_accuracy  0.141093 0.982379 112
10 max mean_per_class_accuracy  0.624901 0.985791 109

Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
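
The `predict` column in the prediction tables is consistent with thresholding `p1` at the `max f1` threshold from the report above (0.624901). H2O's documentation describes this as the default labelling rule for binomial models, but treat that as an assumption for this particular version. The base R sketch below retypes the `p1` values from the `head` and `tail` output and reproduces the labels; the variable names are illustrative.

```r
# p1 values copied from head(predictions.R) and tail(predictions.R).
p1_head <- c(3.382484e-07, 1.052301e-03, 1.119508e-04,
             1.714129e-03, 1.015916e-03, 9.999999e-01)
p1_tail <- c(2.203862e-08, 4.024867e-06, 4.911775e-05,
             9.946590e-01, 3.285235e-07, 1.528500e-03)

# F1-optimal threshold from the "max f1" row of the performance report.
f1_threshold <- 0.624901

# Label a row 1 when its class-1 probability reaches the threshold.
label_head <- as.integer(p1_head >= f1_threshold)
label_tail <- as.integer(p1_tail >= f1_threshold)

print(label_head)
print(label_tail)
```

Because these predictions are made on the training frame after 10000 epochs, the near-perfect metrics describe fit to the training data and say little about performance on unseen rows.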
