library(h2o)
# Start H2O on your local machine using all available cores.
# By default, CRAN policies limit use to only 2 cores.
h2o.init(nthreads = -1)
# Run the GLM, GBM and Deep Learning demos
demo(h2o.glm)
demo(h2o.gbm)
demo(h2o.deeplearning)
# A very positive test
----------------------------------------------------------------------
Your next step is to start H2O:
> h2o.init()
For H2O package documentation, ask for help:
> ??h2o
After starting H2O, you can use the Web UI at http://localhost:54321
For more information visit http://docs.h2o.ai
----------------------------------------------------------------------
Attaching package: ‘h2o’
The following objects are masked from ‘package:stats’:
cor, sd, var
The following objects are masked from ‘package:base’:
&&, %*%, %in%, ||, apply, as.factor, as.numeric, colnames,
colnames<-, ifelse, is.character, is.factor, is.numeric, log,
log10, log1p, log2, round, signif, trunc
H2O is not running yet, starting it now...
Note: In case of errors look at the following log files:
/tmp/RtmpiLXSNs/h2o_micio1970_started_from_r.out
/tmp/RtmpiLXSNs/h2o_micio1970_started_from_r.err
Starting H2O JVM and connecting: ... Connection successful!
R is connected to the H2O cluster:
H2O cluster uptime: 2 seconds 655 milliseconds
H2O cluster version: 3.10.2.2
H2O cluster version age: 1 month and 6 days
H2O cluster name: H2O_started_from_R_micio1970_hro550
H2O cluster total nodes: 1
H2O cluster total memory: 1.71 GB
H2O cluster total cores: 4
H2O cluster allowed cores: 4
H2O cluster healthy: TRUE
H2O Connection ip: localhost
H2O Connection port: 54321
H2O Connection proxy: NA
R Version: R version 3.3.2 (2016-10-31)
demo(h2o.glm)
---- ~~~~~~~
> # This is a demo of H2O's GLM function
> # It imports a data set, parses it, and prints a summary
> # Then, it runs GLM with a binomial link function using 10-fold cross-validation
> # Note: This demo runs H2O on localhost:54321
> library(h2o)
> h2o.init()
Connection successful!
R is connected to the H2O cluster:
H2O cluster uptime: 2 seconds 827 milliseconds
H2O cluster version: 3.10.2.2
H2O cluster version age: 1 month and 6 days
H2O cluster name: H2O_started_from_R_micio1970_hro550
H2O cluster total nodes: 1
H2O cluster total memory: 1.71 GB
H2O cluster total cores: 4
H2O cluster allowed cores: 4
H2O cluster healthy: TRUE
H2O Connection ip: localhost
H2O Connection port: 54321
H2O Connection proxy: NA
R Version: R version 3.3.2 (2016-10-31)
> prostate.hex = h2o.uploadFile(path = system.file("extdata", "prostate.csv", package="h2o"), destination_frame = "prostate.hex")
|======================================================================| 100%
> summary(prostate.hex)
Warning message in summary.H2OFrame(prostate.hex):
“Approximated quantiles computed! If you are interested in exact quantiles, please pass the `exact_quantiles=TRUE` parameter.”
ID               CAPSULE          AGE             RACE
Min.   :  1.00   Min.   :0.0000   Min.   :43.00   Min.   :0.000
1st Qu.: 95.75   1st Qu.:0.0000   1st Qu.:62.00   1st Qu.:1.000
Median :190.50   Median :0.0000   Median :67.00   Median :1.000
Mean   :190.50   Mean   :0.4026   Mean   :66.04   Mean   :1.087
3rd Qu.:285.25   3rd Qu.:1.0000   3rd Qu.:71.00   3rd Qu.:1.000
Max.   :380.00   Max.   :1.0000   Max.   :79.00   Max.   :2.000
DPROS           DCAPS           PSA               VOL
Min.   :1.000   Min.   :1.000   Min.   :  0.300   Min.   : 0.00
1st Qu.:1.000   1st Qu.:1.000   1st Qu.:  4.900   1st Qu.: 0.00
Median :2.000   Median :1.000   Median :  8.664   Median :14.20
Mean   :2.271   Mean   :1.108   Mean   : 15.409   Mean   :15.81
3rd Qu.:3.000   3rd Qu.:1.000   3rd Qu.: 17.063   3rd Qu.:26.40
Max.   :4.000   Max.   :2.000   Max.   :139.700   Max.   :97.60
GLEASON
Min.   :0.000
1st Qu.:6.000
Median :6.000
Mean   :6.384
3rd Qu.:7.000
Max.   :9.000
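The warning above says these quartiles are approximate. As a quick sanity check, and assuming the ID column simply numbers the rows 1 to 380 (consistent with the min, max and mean shown, but not stated in the output), base R's default quantile method (type 7, linear interpolation) reproduces the ID quartiles exactly. This says nothing about how H2O computes its approximation for other columns.

```r
# Assumption: ID is the row index 1..380 (inferred from min = 1, max = 380, mean = 190.5).
ids <- 1:380

# Base R's default quantile method (type 7) interpolates linearly between order statistics.
q <- quantile(ids, probs = c(0.25, 0.50, 0.75))
print(q)
print(mean(ids))
```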
> prostate.glm = h2o.glm(x = c("AGE","RACE","PSA","DCAPS"), y = "CAPSULE", training_frame = prostate.hex, family = "binomial", alpha = 0.5)
|======================================================================| 100%
> print(prostate.glm)
Model Details:
==============
H2OBinomialModel: glm
Model ID: GLM_model_R_1487447500666_1
GLM Model: summary
    family  link                                regularization
1 binomial logit Elastic Net (alpha = 0.5, lambda = 3.247E-4 )
  number_of_predictors_total number_of_active_predictors number_of_iterations
1                          4                           4                    4
  training_frame
1   prostate.hex
Coefficients: glm coefficients
      names coefficients standardized_coefficients
1 Intercept    -1.114418                 -0.337704
2       AGE    -0.010977                 -0.071648
3      RACE    -0.623216                 -0.192433
4     DCAPS     1.314591                  0.408386
5       PSA     0.046892                  0.937727
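With a binomial family and logit link, the fitted probability is the inverse logit of the linear predictor. A minimal base-R sketch, assuming the `coefficients` column is on the original (unstandardized) predictor scale, as the separate `standardized_coefficients` column suggests. The patient values are hypothetical and chosen only for illustration.

```r
# Coefficients copied from the GLM output above (assumed to be on the original scale).
beta <- c(Intercept = -1.114418, AGE = -0.010977, RACE = -0.623216,
          DCAPS = 1.314591, PSA = 0.046892)

# Hypothetical patient, for illustration only (the 1 multiplies the intercept).
patient <- c(Intercept = 1, AGE = 65, RACE = 1, DCAPS = 1, PSA = 10)

# Logit link: eta = sum(beta * x); P(CAPSULE = 1) = 1 / (1 + exp(-eta)) = plogis(eta).
eta <- sum(beta * patient[names(beta)])
p_capsule <- plogis(eta)

# exp(beta) is the multiplicative change in the odds of CAPSULE = 1 per unit increase.
odds_ratios <- exp(beta[names(beta) != "Intercept"])

print(round(eta, 4))
print(round(p_capsule, 4))
print(round(odds_ratios, 4))
```

Positive coefficients (DCAPS, PSA) give odds ratios above 1; negative ones (AGE, RACE) give odds ratios below 1.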
H2OBinomialMetrics: glm
** Reported on training data. **
MSE: 0.2027036
RMSE: 0.4502261
LogLoss: 0.5914634
Mean Per-Class Error: 0.3826121
AUC: 0.717601
Gini: 0.435202
R^2: 0.1572256
Null Deviance: 512.2888
Residual Deviance: 449.5122
AIC: 459.5122
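Several of these figures are tied together by standard identities. The sketch below recomputes them from the reported MSE, LogLoss and AUC, plus the class counts from the confusion matrix (153 positives out of 380 rows). The parameter count of 5 assumes the intercept is counted alongside the four predictors. That these reproduce H2O's printed values is a consistency check, not documentation of H2O's internals.

```r
# Values copied from the GLM training metrics above.
n       <- 380        # rows in prostate.hex
n_pos   <- 153        # rows with CAPSULE == 1 (from the confusion matrix)
mse     <- 0.2027036
logloss <- 0.5914634
auc     <- 0.717601
n_coef  <- 5          # assumption: intercept + AGE, RACE, DCAPS, PSA

rmse      <- sqrt(mse)               # RMSE is the square root of MSE
gini      <- 2 * auc - 1             # Gini is AUC rescaled to [-1, 1]
resid_dev <- 2 * n * logloss         # binomial deviance = 2 x total log loss
aic       <- resid_dev + 2 * n_coef  # AIC = deviance + 2 x number of parameters

# Null model: predict the overall positive rate for every row.
p_bar    <- n_pos / n
null_dev <- -2 * (n_pos * log(p_bar) + (n - n_pos) * log(1 - p_bar))
r2       <- 1 - mse / (p_bar * (1 - p_bar))  # 1 - MSE / variance of the 0/1 response

print(round(c(RMSE = rmse, Gini = gini, ResidDev = resid_dev, AIC = aic,
              NullDev = null_dev, R2 = r2), 4))
```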
Confusion Matrix for F1-optimal threshold:
        0   1    Error      Rate
0      80 147 0.647577  =147/227
1      18 135 0.117647   =18/153
Totals 98 282 0.434211  =165/380
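The error rates in the confusion matrix, the mean per-class error, and the F1 score at this threshold all follow from the four counts. A base-R check, treating CAPSULE = 1 as the positive class:

```r
# Counts copied from the confusion matrix above (rows = actual, columns = predicted).
cm <- matrix(c(80, 147,
               18, 135),
             nrow = 2, byrow = TRUE,
             dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

# Per-class error: misclassified rows in each actual class / rows in that class.
class_error          <- 1 - diag(cm) / rowSums(cm)
overall_error        <- 1 - sum(diag(cm)) / sum(cm)
mean_per_class_error <- mean(class_error)

# Precision, recall and F1 with "1" as the positive class.
tp <- cm["1", "1"]; fp <- cm["0", "1"]; fn <- cm["1", "0"]
precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)
f1        <- 2 * precision * recall / (precision + recall)

print(round(c(class_error, overall = overall_error,
              mean_per_class = mean_per_class_error, F1 = f1), 6))
```

The resulting F1 matches the `max f1` value reported in the table that follows.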
Maximum Metrics: Maximum metrics at their respective thresholds
                        metric threshold    value idx
1                       max f1  0.284048 0.620690 274
2                       max f2  0.207093 0.778230 360
3                 max f0point5  0.413268 0.636672 108
4                 max accuracy  0.413268 0.705263 108
5                max precision  0.998478 1.000000   0
6                   max recall  0.207093 1.000000 360
7              max specificity  0.998478 1.000000   0
8             max absolute_mcc  0.413268 0.369123 108
9   max min_per_class_accuracy  0.331806 0.647577 176
10 max mean_per_class_accuracy  0.373175 0.672123 126
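The `max f1` row reports the cutoff at which F1 peaks on the training data. A minimal sketch of such a sweep, using a small made-up set of scores and labels (not the prostate data) and trying every observed score as a candidate cutoff. H2O's own search may differ in details such as how thresholds are binned.

```r
# Hypothetical scores and true labels, for illustration only.
scores <- c(0.95, 0.80, 0.70, 0.60, 0.45, 0.40, 0.30, 0.20, 0.10, 0.05)
labels <- c(1,    1,    0,    1,    1,    0,    0,    1,    0,    0)

# F1 when every score at or above `threshold` is predicted as class 1.
f1_at <- function(threshold, scores, labels) {
  pred <- as.integer(scores >= threshold)
  tp <- sum(pred == 1 & labels == 1)
  fp <- sum(pred == 1 & labels == 0)
  fn <- sum(pred == 0 & labels == 1)
  if (tp == 0) return(0)
  2 * tp / (2 * tp + fp + fn)
}

# Evaluate each distinct score as a cutoff and keep the one with the highest F1.
candidates <- sort(unique(scores), decreasing = TRUE)
f1s  <- vapply(candidates, f1_at, numeric(1), scores = scores, labels = labels)
best <- which.max(f1s)
print(c(threshold = candidates[best], f1 = f1s[best]))
```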
Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
> myLabels = c(prostate.glm@model$x, "Intercept")
> plot(prostate.glm@model$coefficients, xaxt = "n", xlab = "Coefficients", ylab = "Values")
> axis(1, at = 1:length(myLabels), labels = myLabels)
> abline(h = 0, col = 2, lty = 2)
> title("Coefficients from Logistic Regression\n of Prostate Cancer Data")
> barplot(prostate.glm@model$coefficients, main = "Coefficients from Logistic Regression\n of Prostate Cancer Data")
demo(h2o.gbm)
---- ~~~~~~~
> # This is a demo of H2O's GBM function
> # It imports a data set, parses it, and prints a summary
> # Then, it runs GBM on a subset of the dataset
> # Note: This demo runs H2O on localhost:54321
> library(h2o)
> h2o.init()
Connection successful!
R is connected to the H2O cluster:
H2O cluster uptime: 28 seconds 380 milliseconds
H2O cluster version: 3.10.2.2
H2O cluster version age: 1 month and 6 days
H2O cluster name: H2O_started_from_R_micio1970_hro550
H2O cluster total nodes: 1
H2O cluster total memory: 1.54 GB
H2O cluster total cores: 4
H2O cluster allowed cores: 4
H2O cluster healthy: TRUE
H2O Connection ip: localhost
H2O Connection port: 54321
H2O Connection proxy: NA
R Version: R version 3.3.2 (2016-10-31)
> prostate.hex = h2o.uploadFile(path = system.file("extdata", "prostate.csv", package="h2o"), destination_frame = "prostate.hex")
|======================================================================| 100%
> summary(prostate.hex)
Warning message in summary.H2OFrame(prostate.hex):
“Approximated quantiles computed! If you are interested in exact quantiles, please pass the `exact_quantiles=TRUE` parameter.”
ID               CAPSULE          AGE             RACE
Min.   :  1.00   Min.   :0.0000   Min.   :43.00   Min.   :0.000
1st Qu.: 95.75   1st Qu.:0.0000   1st Qu.:62.00   1st Qu.:1.000
Median :190.50   Median :0.0000   Median :67.00   Median :1.000
Mean   :190.50   Mean   :0.4026   Mean   :66.04   Mean   :1.087
3rd Qu.:285.25   3rd Qu.:1.0000   3rd Qu.:71.00   3rd Qu.:1.000
Max.   :380.00   Max.   :1.0000   Max.   :79.00   Max.   :2.000
DPROS           DCAPS           PSA               VOL
Min.   :1.000   Min.   :1.000   Min.   :  0.300   Min.   : 0.00
1st Qu.:1.000   1st Qu.:1.000   1st Qu.:  4.900   1st Qu.: 0.00
Median :2.000   Median :1.000   Median :  8.664   Median :14.20
Mean   :2.271   Mean   :1.108   Mean   : 15.409   Mean   :15.81
3rd Qu.:3.000   3rd Qu.:1.000   3rd Qu.: 17.063   3rd Qu.:26.40
Max.   :4.000   Max.   :2.000   Max.   :139.700   Max.   :97.60
GLEASON
Min.   :0.000
1st Qu.:6.000
Median :6.000
Mean   :6.384
3rd Qu.:7.000
Max.   :9.000
> prostate.gbm = h2o.gbm(x = setdiff(colnames(prostate.hex), "CAPSULE"), y = "CAPSULE", training_frame = prostate.hex, ntrees = 10, max_depth = 5, learn_rate = 0.1)
|======================================================================| 100%
> print(prostate.gbm)
Model Details:
==============
H2ORegressionModel: gbm
Model ID: GBM_model_R_1487447500666_3
Model Summary:
  number_of_trees number_of_internal_trees model_size_in_bytes min_depth
1              10                       10                3311         5
  max_depth mean_depth min_leaves max_leaves mean_leaves
1         5    5.00000         17         24    21.40000
H2ORegressionMetrics: gbm
** Reported on training data. **
MSE: 0.1358996
RMSE: 0.3686456
MAE: 0.3391817
RMSLE: 0.259886
Mean Residual Deviance : 0.1358996
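Because CAPSULE is still numeric in this demo, h2o.gbm fits a regression model, which is why the output reports regression metrics. The helper below computes them from actual and predicted vectors using the usual definitions. The log1p-based RMSLE formula is the standard one and is assumed rather than taken from H2O's source, and the input vectors are toy values. For a Gaussian model the mean residual deviance equals the MSE, as the output above shows.

```r
# Standard regression metrics; inputs must be numeric vectors of equal length.
regression_metrics <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  err <- predicted - actual
  mse <- mean(err^2)
  c(MSE   = mse,
    RMSE  = sqrt(mse),
    MAE   = mean(abs(err)),
    # RMSLE compares log(1 + x); it requires actual and predicted values > -1.
    RMSLE = sqrt(mean((log1p(predicted) - log1p(actual))^2)),
    # For a Gaussian model, the mean residual deviance is the MSE.
    MeanResidualDeviance = mse)
}

# Toy 0/1 targets and predictions, purely illustrative.
actual    <- c(0, 1, 1, 0)
predicted <- c(0.1, 0.8, 0.6, 0.3)
m <- regression_metrics(actual, predicted)
print(round(m, 4))

# Consistency check on the reported training metrics: RMSE^2 should match MSE.
print(all.equal(0.3686456^2, 0.1358996, tolerance = 1e-6))
```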
> prostate.gbm2 = h2o.gbm(x = c("AGE", "RACE", "PSA", "VOL", "GLEASON"), y = "CAPSULE", training_frame = prostate.hex, ntrees = 10, max_depth = 8, min_rows = 10, learn_rate = 0.2)
|======================================================================| 100%
> print(prostate.gbm2)
Model Details:
==============
H2ORegressionModel: gbm
Model ID: GBM_model_R_1487447500666_4
Model Summary:
  number_of_trees number_of_internal_trees model_size_in_bytes min_depth
1              10                       10                3985         6
  max_depth mean_depth min_leaves max_leaves mean_leaves
1         8    7.50000         19         30    26.60000
H2ORegressionMetrics: gbm
** Reported on training data. **
MSE: 0.1051889
RMSE: 0.3243284
MAE: 0.2698054
RMSLE: 0.2289359
Mean Residual Deviance : 0.1051889
> # This is a demo of H2O's GBM use of default parameters on iris dataset (three classes)
> iris.hex = h2o.uploadFile(path = system.file("extdata", "iris.csv", package="h2o"), destination_frame = "iris.hex")
|======================================================================| 100%
> summary(iris.hex)
Warning message in summary.H2OFrame(iris.hex):
“Approximated quantiles computed! If you are interested in exact quantiles, please pass the `exact_quantiles=TRUE` parameter.”
C1              C2              C3              C4
Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.1000
1st Qu.:5.099   1st Qu.:2.799   1st Qu.:1.596   1st Qu.:0.2992
Median :5.798   Median :2.998   Median :4.348   Median :1.3000
Mean   :5.843   Mean   :3.054   Mean   :3.759   Mean   :1.1987
3rd Qu.:6.399   3rd Qu.:3.298   3rd Qu.:5.095   3rd Qu.:1.7992
Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.5000
C5
Iris-setosa    :50
Iris-versicolor:50
Iris-virginica :50
> iris.gbm = h2o.gbm(x = 1:4, y = 5, training_frame = iris.hex)
|======================================================================| 100%
> print(iris.gbm)
Model Details:
==============
H2OMultinomialModel: gbm
Model ID: GBM_model_R_1487447500666_5
Model Summary:
  number_of_trees number_of_internal_trees model_size_in_bytes min_depth
1              50                      150               28329         1
  max_depth mean_depth min_leaves max_leaves mean_leaves
1         5    4.92000          2         12    10.07333
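For this three-class response the summary lists 150 internal trees for 50 trees, which is consistent with one tree per class per boosting iteration. A trivial check using base R's built-in copy of the iris data (three species); `ntrees = 50` is the value shown in the summary above:

```r
n_classes <- nlevels(iris$Species)  # base R iris data: setosa, versicolor, virginica
ntrees <- 50                        # number_of_trees from the model summary
internal_trees <- ntrees * n_classes
print(internal_trees)
```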
H2OMultinomialMetrics: gbm
** Reported on training data. **
Training Set Metrics:
=====================
Extract training frame with `h2o.getFrame("iris.hex")`
MSE: (Extract with `h2o.mse`) 0.00283639
RMSE: (Extract with `h2o.rmse`) 0.05325777
Logloss: (Extract with `h2o.logloss`) 0.01881246
Mean Per-Class Error: 0
Confusion Matrix: Extract with `h2o.confusionMatrix(<model>,train = TRUE)`
=========================================================================
Confusion Matrix: vertical: actual; across: predicted
                Iris-setosa Iris-versicolor Iris-virginica  Error      Rate
Iris-setosa              50               0              0 0.0000  = 0 / 50
Iris-versicolor           0              50              0 0.0000  = 0 / 50
Iris-virginica            0               0             50 0.0000  = 0 / 50
Totals                   50              50             50 0.0000 = 0 / 150
Hit Ratio Table: Extract with `h2o.hit_ratio_table(<model>,train = TRUE)`
=======================================================================
Top-3 Hit Ratios:
  k hit_ratio
1 1  1.000000
2 2  1.000000
3 3  1.000000
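A top-k hit ratio is the fraction of rows whose true class is among the k classes with the highest predicted probability, so it can only grow with k. A base-R sketch with hypothetical probabilities (not output from the iris model):

```r
# Fraction of rows whose true class is among the k most probable classes.
hit_ratio <- function(probs, actual, k) {
  stopifnot(nrow(probs) == length(actual), k >= 1, k <= ncol(probs))
  hits <- vapply(seq_len(nrow(probs)), function(i) {
    top_k <- colnames(probs)[order(probs[i, ], decreasing = TRUE)[seq_len(k)]]
    actual[i] %in% top_k
  }, logical(1))
  mean(hits)
}

# Hypothetical class probabilities for four rows (illustrative only).
probs <- rbind(c(0.70, 0.20, 0.10),
               c(0.10, 0.50, 0.40),
               c(0.35, 0.25, 0.40),
               c(0.25, 0.60, 0.15))
colnames(probs) <- c("setosa", "versicolor", "virginica")
actual <- c("setosa", "virginica", "setosa", "versicolor")

ratios <- sapply(1:3, function(k) hit_ratio(probs, actual, k))
print(ratios)
```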
demo(h2o.deeplearning)
---- ~~~~~~~~~~~~~~~~
> # This is a demo of H2O's Deep Learning function
> # It imports a data set, parses it, and prints a summary
> # Then, it runs Deep Learning on the dataset
> # Note: This demo runs H2O on localhost:54321
> library(h2o)
> h2o.init()
Connection successful!
R is connected to the H2O cluster:
H2O cluster uptime: 33 seconds 190 milliseconds
H2O cluster version: 3.10.2.2
H2O cluster version age: 1 month and 6 days
H2O cluster name: H2O_started_from_R_micio1970_hro550
H2O cluster total nodes: 1
H2O cluster total memory: 1.54 GB
H2O cluster total cores: 4
H2O cluster allowed cores: 4
H2O cluster healthy: TRUE
H2O Connection ip: localhost
H2O Connection port: 54321
H2O Connection proxy: NA
R Version: R version 3.3.2 (2016-10-31)
> prostate.hex = h2o.uploadFile(path = system.file("extdata", "prostate.csv", package="h2o"), destination_frame = "prostate.hex")
|======================================================================| 100%
> summary(prostate.hex)
Warning message in summary.H2OFrame(prostate.hex):
“Approximated quantiles computed! If you are interested in exact quantiles, please pass the `exact_quantiles=TRUE` parameter.”
ID               CAPSULE          AGE             RACE
Min.   :  1.00   Min.   :0.0000   Min.   :43.00   Min.   :0.000
1st Qu.: 95.75   1st Qu.:0.0000   1st Qu.:62.00   1st Qu.:1.000
Median :190.50   Median :0.0000   Median :67.00   Median :1.000
Mean   :190.50   Mean   :0.4026   Mean   :66.04   Mean   :1.087
3rd Qu.:285.25   3rd Qu.:1.0000   3rd Qu.:71.00   3rd Qu.:1.000
Max.   :380.00   Max.   :1.0000   Max.   :79.00   Max.   :2.000
DPROS           DCAPS           PSA               VOL
Min.   :1.000   Min.   :1.000   Min.   :  0.300   Min.   : 0.00
1st Qu.:1.000   1st Qu.:1.000   1st Qu.:  4.900   1st Qu.: 0.00
Median :2.000   Median :1.000   Median :  8.664   Median :14.20
Mean   :2.271   Mean   :1.108   Mean   : 15.409   Mean   :15.81
3rd Qu.:3.000   3rd Qu.:1.000   3rd Qu.: 17.063   3rd Qu.:26.40
Max.   :4.000   Max.   :2.000   Max.   :139.700   Max.   :97.60
GLEASON
Min.   :0.000
1st Qu.:6.000
Median :6.000
Mean   :6.384
3rd Qu.:7.000
Max.   :9.000
> # Set the CAPSULE column to be a factor column then build model.
> prostate.hex$CAPSULE = as.factor(prostate.hex$CAPSULE)
> model = h2o.deeplearning(x = setdiff(colnames(prostate.hex), c("ID","CAPSULE")), y = "CAPSULE", training_frame = prostate.hex, activation = "Tanh", hidden = c(10, 10, 10), epochs = 10000)
|======================================================================| 100%
> print(model@model$model_summary)
Status of Neuron Layers: predicting CAPSULE, 2-class classification, bernoulli distribution, CrossEntropy loss, 322 weights/biases, 8.4 KB, 3,800,000 training samples, mini-batch size 1
  layer units    type dropout       l1       l2 mean_rate rate_rms momentum
1     1     7   Input  0.00 %
2     2    10    Tanh  0.00 % 0.000000 0.000000  0.002345 0.001300 0.000000
3     3    10    Tanh  0.00 % 0.000000 0.000000  0.005969 0.007077 0.000000
4     4    10    Tanh  0.00 % 0.000000 0.000000  0.015578 0.019282 0.000000
5     5     2 Softmax         0.000000 0.000000  0.012518 0.001613 0.000000
  mean_weight weight_rms mean_bias bias_rms
1
2   -0.129283   0.902729  0.007419 0.497297
3   -0.028854   0.993868 -0.197894 0.573969
4    0.048970   1.482009 -0.170856 0.789989
5   -0.297535   3.766088  0.001749 0.130906
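The 322 weights/biases in the header follow from the layer sizes: 7 inputs (AGE, RACE, DPROS, DCAPS, PSA, VOL and GLEASON, each a single numeric column), three hidden layers of 10 units, and 2 softmax outputs. Likewise, 3,800,000 training samples is 380 rows times 10,000 epochs. A base-R check, assuming fully connected layers with one bias per non-input unit:

```r
# Layer sizes from the table: 7 inputs, three Tanh layers of 10 units, 2 softmax outputs.
units <- c(7, 10, 10, 10, 2)

# A fully connected layer has (units in) x (units out) weights plus one bias per unit out.
n_weights <- sum(head(units, -1) * tail(units, -1))  # 70 + 100 + 100 + 20
n_biases  <- sum(tail(units, -1))                    # 10 + 10 + 10 + 2
print(c(weights = n_weights, biases = n_biases, total = n_weights + n_biases))

# Training samples processed = rows in prostate.hex x epochs.
n_rows <- 380
epochs <- 10000
print(format(n_rows * epochs, big.mark = ",", scientific = FALSE))
```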
> # Make predictions with the trained model with training data.
> predictions = predict(object = model, newdata = prostate.hex)
|======================================================================| 100%
> # Export predictions from H2O Cluster as R dataframe.
> predictions.R = as.data.frame(predictions)
> head(predictions.R)
  predict           p0           p1
1       0 9.999997e-01 3.382484e-07
2       0 9.989477e-01 1.052301e-03
3       0 9.998880e-01 1.119508e-04
4       0 9.982859e-01 1.714129e-03
5       0 9.989841e-01 1.015916e-03
6       1 5.329026e-08 9.999999e-01
> tail(predictions.R)
    predict          p0           p1
375       0 0.999999978 2.203862e-08
376       0 0.999995975 4.024867e-06
377       0 0.999950882 4.911775e-05
378       1 0.005341035 9.946590e-01
379       0 0.999999671 3.285235e-07
380       0 0.998471500 1.528500e-03
> # Check performance of classification model.
> performance = h2o.performance(model = model)
> print(performance)
H2OBinomialMetrics: deeplearning
** Reported on training data. **
** Metrics reported on full training frame **
MSE: 0.01138505
RMSE: 0.1067008
LogLoss: 0.04281056
Mean Per-Class Error: 0.01420921
AUC: 0.9986179
Gini: 0.9972359
Confusion Matrix for F1-optimal threshold:
         0   1    Error    Rate
0      225   2 0.008811  =2/227
1        3 150 0.019608  =3/153
Totals 228 152 0.013158  =5/380
Maximum Metrics: Maximum metrics at their respective thresholds
                        metric threshold    value idx
1                       max f1  0.624901 0.983607 109
2                       max f2  0.134930 0.988296 114
3                 max f0point5  0.824712 0.989376 107
4                 max accuracy  0.824712 0.986842 107
5                max precision  1.000000 1.000000   0
6                   max recall  0.007199 1.000000 141
7              max specificity  1.000000 1.000000   0
8             max absolute_mcc  0.824712 0.972691 107
9   max min_per_class_accuracy  0.141093 0.982379 112
10 max mean_per_class_accuracy  0.624901 0.985791 109
Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
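The `predict` column printed earlier by head() and tail() is consistent with labelling a row 1 when p1 reaches the F1-optimal threshold from the table above (0.624901). Treat that rule as an assumption: these twelve rows would be classified the same way by many other cutoffs. The sketch also checks that p0 and p1 sum to 1 up to rounding.

```r
# p0 / p1 for the twelve rows printed by head() and tail() above.
p1 <- c(3.382484e-07, 1.052301e-03, 1.119508e-04, 1.714129e-03, 1.015916e-03, 9.999999e-01,
        2.203862e-08, 4.024867e-06, 4.911775e-05, 9.946590e-01, 3.285235e-07, 1.528500e-03)
p0 <- c(9.999997e-01, 9.989477e-01, 9.998880e-01, 9.982859e-01, 9.989841e-01, 5.329026e-08,
        0.999999978, 0.999995975, 0.999950882, 0.005341035, 0.999999671, 0.998471500)
reported <- c(0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0)

# Assumption: label = 1 when p1 is at or above the max-F1 threshold (0.624901).
threshold <- 0.624901
labels <- as.integer(p1 >= threshold)

print(all(labels == reported))    # agrees with the printed predict column
print(max(abs(p0 + p1 - 1)))      # probabilities sum to 1 up to printed rounding
```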
Content source: micio1970/H2oaiMaster