In [1]:
import h2o

In [2]:
h2o.init()


H2O cluster uptime: 4 minutes 24 seconds 528 milliseconds
H2O cluster version: 3.5.0.99999
H2O cluster name: ece
H2O cluster total nodes: 1
H2O cluster total memory: 10.67 GB
H2O cluster total cores: 8
H2O cluster allowed cores: 8
H2O cluster healthy: True
H2O Connection ip: 127.0.0.1
H2O Connection port: 54321

In [3]:
from h2o.utils.shared_utils import _locate  # private helper that finds files inside the h2o git project directory

prostate = h2o.upload_file(path=_locate("smalldata/logreg/prostate.csv"))
prostate.describe()


Parse Progress: [##################################################] 100%
Uploaded py9b6e1d63-2887-488f-9a46-785fdf2434e5 into cluster with 380 rows and 9 cols
Rows: 380 Cols: 9

Chunk compression summary:
chunk_type  chunk_name                 count  count_percentage  size    size_percentage
CBS         Bits                       1      11.111112         118 B   2.4210093
C1N         1-Byte Integers (w/o NAs)  5      55.555557         2.2 KB  45.958145
C2          2-Byte Integers            1      11.111112         828 B   16.9881
C2S         2-Byte Fractions           2      22.222223         1.6 KB  34.632744

Frame distribution summary:
                 size    number_of_rows  number_of_chunks_per_column  number_of_chunks
10.0.0.24:54321  4.8 KB  380.0           1.0                          9.0
mean             4.8 KB  380.0           1.0                          9.0
min              4.8 KB  380.0           1.0                          9.0
max              4.8 KB  380.0           1.0                          9.0
stddev           0 B     0.0             0.0                          0.0
total            4.8 KB  380.0           1.0                          9.0

Column-by-Column Summary:

               ID             CAPSULE         AGE            RACE            DPROS          DCAPS           PSA            VOL            GLEASON
type           int            int             int            int             int            int             real           real           int
mins           1.0            0.0             43.0           0.0             1.0            1.0             0.3            0.0            0.0
maxs           380.0          1.0             79.0           2.0             4.0            2.0             139.7          97.6           9.0
mean           190.5          0.402631578947  66.0394736842  1.08684210526   2.27105263158  1.10789473684   15.4086315789  15.8129210526  6.38421052632
sigma          109.840793879  0.491074338963  6.52707126917  0.308773258025  1.00010761815  0.310656449351  19.9975726686  18.3476199673  1.09195337443
zero_count     0              227             0              3               0              0               0              167            2
missing_count  0              0               0              0               0              0               0              0              0

In [4]:
prostate["CAPSULE"] = prostate["CAPSULE"].asfactor()  # categorical response, so the model is a classifier
predictors = [c for c in prostate.columns if c not in ("ID", "CAPSULE")]  # drop the row ID and the response
model = h2o.deeplearning(x=prostate[predictors], y=prostate["CAPSULE"], training_frame=prostate,
                         activation="Tanh", hidden=[10, 10, 10], epochs=10000)
model.show()


deeplearning Model Build Progress: [##################################################] 100%
Model Details
=============
H2OBinomialModel :  Deep Learning
Model Key:  DeepLearning_model_python_1444621872790_25

Status of Neuron Layers: predicting CAPSULE, 2-class classification, bernoulli distribution, CrossEntropy loss, 322 weights/biases, 8.5 KB, 3,800,000 training samples, mini-batch size 1

layer  units  type     dropout  l1   l2   mean_rate        rate_RMS         momentum  mean_weight      weight_RMS     mean_bias        bias_RMS
1      7      Input    0.0
2      10     Tanh     0.0      0.0  0.0  0.0431974798491  0.0596570521593  0.0       0.339393859047   1.28530693054  0.0687711206469  0.90153336525
3      10     Tanh     0.0      0.0  0.0  0.0849079652375  0.0955412983894  0.0       0.184064850851   1.30200242996  0.173068487576   0.660982608795
4      10     Tanh     0.0      0.0  0.0  0.197120434018   0.265433907509   0.0       -0.221858388707  1.96628475189  1.03366405584    1.32316446304
5      2      Softmax           0.0  0.0  0.0747286665253  0.0741221606731  0.0       0.0154564060271  5.87673377991  -0.154785814024  0.984552383423
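
The "322 weights/biases" figure in the status line can be checked by hand from the units column of the layer table. The sketch below assumes a plain fully connected network, with one weight per connection between adjacent layers and one bias per non-input unit. The function name `count_parameters` is illustrative and is not part of the h2o API.

```python
# Layer widths read from the "units" column of the layer table.
layer_units = [7, 10, 10, 10, 2]


def count_parameters(units):
    """Count weights and biases of a fully connected feed-forward network."""
    # One weight for every (input unit, output unit) pair of adjacent layers.
    weights = sum(n_in * n_out for n_in, n_out in zip(units, units[1:]))
    # One bias for every unit that is not in the input layer.
    biases = sum(units[1:])
    return weights, biases


weights, biases = count_parameters(layer_units)
total = weights + biases
print(weights, biases, total)  # 290 32 322
```

The 7 input units correspond to the 9 columns of the frame minus ID and CAPSULE.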

ModelMetricsBinomial: deeplearning
** Reported on train data. **

MSE: 0.0109153292711
R^2: 0.954617674506
LogLoss: 0.0385994595722
AUC: 0.999424145576
Gini: 0.998848291152
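
Two of the summary metrics can be derived from the others. The sketch below assumes Gini is computed as 2 * AUC - 1 and that R^2 is 1 - MSE / Var(y), using the population variance of the 0/1 response. Both assumptions reproduce the printed values, but the output itself does not state the definitions H2O uses. The class counts (227 zeros out of 380 rows) come from the CAPSULE column of the describe() output.

```python
# Values copied from the printed training metrics.
auc = 0.999424145576
mse = 0.0109153292711

# Class counts from the describe() output: 227 zeros among 380 rows.
n_rows = 380
n_neg = 227
n_pos = n_rows - n_neg

# Assumed definition: Gini = 2 * AUC - 1.
gini = 2 * auc - 1

# Assumed definition: R^2 = 1 - MSE / Var(y), with Var(y) = p * (1 - p)
# for a 0/1 response (population variance).
p = n_pos / n_rows
response_variance = p * (1 - p)
r2 = 1 - mse / response_variance

print(round(gini, 6), round(r2, 6))
```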

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.0371296297347:
       0      1      Error   Rate
0      224.0  3.0    0.0132  (3.0/227.0)
1      0.0    153.0  0.0     (0.0/153.0)
Total  224.0  156.0  0.0079  (3.0/380.0)

Maximum Metrics: Maximum metrics at their respective thresholds

metric                      threshold        value           idx
max f1                      0.0371296297347  0.990291262136  96.0
max f2                      0.0371296297347  0.99609375      96.0
max f0point5                0.401888940217   0.986928104575  93.0
max accuracy                0.0371296297347  0.992105263158  96.0
max precision               0.999999999312   1.0             0.0
max absolute_MCC            0.0371296297347  0.983772088887  96.0
max min_per_class_accuracy  0.401888940217   0.986928104575  93.0
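
The metrics reported at threshold 0.0371296297347 (max f1, max f2, max accuracy and max absolute_MCC) can be recomputed from the confusion matrix, which is shown at that same threshold. The sketch below uses the textbook definitions of F-beta, accuracy and the Matthews correlation coefficient. It is not taken from the H2O source, and the helper name `f_beta` is illustrative.

```python
import math

# Cells of the confusion matrix, with class 1 as the positive class.
tn, fp = 224, 3   # actual 0: predicted 0, predicted 1
fn, tp = 0, 153   # actual 1: predicted 0, predicted 1


def f_beta(tp, fp, fn, beta):
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    b2 = beta * beta
    return (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp)


f1 = f_beta(tp, fp, fn, beta=1.0)
f2 = f_beta(tp, fp, fn, beta=2.0)
accuracy = (tp + tn) / (tp + tn + fp + fn)
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

print(round(f1, 6), round(f2, 6), round(accuracy, 6), round(mcc, 6))
```

The f0point5, precision and min_per_class_accuracy rows are maximized at other thresholds, so they cannot be recovered from this particular matrix.
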

Scoring History:
timestamp            duration    training_speed       epochs   samples    training_MSE      training_r2     training_logloss  training_AUC    training_classification_error
2015-10-11 20:55:38  0.000 sec   None                 0.0      0.0        nan               nan             nan               nan             nan
2015-10-11 20:55:38  0.026 sec   146153.846 rows/sec  10.0     3800.0     0.196451486022    0.183219758096  0.58663358323     0.77783536322   0.297368421053
2015-10-11 20:55:43  5.029 sec   315092.464 rows/sec  4170.0   1584600.0  0.00864409272223  0.964060724163  0.0314068106534   0.999539316461  0.00789473684211
2015-10-11 20:55:48  10.030 sec  303090.728 rows/sec  8000.0   3040000.0  0.00904158664897  0.96240807601   0.0303556924702   0.99951052374   0.0105263157895
2015-10-11 20:55:50  12.754 sec  297945.743 rows/sec  10000.0  3800000.0  0.0109153292711   0.954617674506  0.0385994595722   0.999424145576  0.00789473684211
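
The epochs, samples and training_speed columns of the scoring history are tied together by the size of the training frame. The sketch below checks two relationships that the printed rows appear to follow: samples = epochs * rows, and training_speed = samples / duration. This is an observation about the printed numbers, not a statement of how H2O computes them internally.

```python
n_rows = 380  # rows in the prostate training frame

# (duration in seconds, reported rows/sec, epochs, samples) for each
# scoring-history row that has a training speed.
history = [
    (0.026, 146153.846, 10.0, 3800.0),
    (5.029, 315092.464, 4170.0, 1584600.0),
    (10.030, 303090.728, 8000.0, 3040000.0),
    (12.754, 297945.743, 10000.0, 3800000.0),
]

checks = []
for duration, speed, epochs, samples in history:
    samples_ok = epochs * n_rows == samples
    speed_ok = abs(samples / duration - speed) < 0.01
    checks.append((samples_ok, speed_ok))

print(checks)
```

The final row also accounts for the "3,800,000 training samples" in the status line: 10000 epochs over 380 rows.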

In [5]:
predictions = model.predict(prostate)
predictions.show()


H2OFrame with 380 rows and 3 columns: 
   predict  p0        p1
0  0        0.999988  1.187702e-05
1  0        1.000000  2.372383e-08
2  0        1.000000  1.921410e-12
3  0        0.999998  1.825251e-06
4  0        1.000000  5.822914e-21
5  1        0.000036  9.999637e-01
6  0        0.999920  7.996056e-05
7  0        1.000000  1.378506e-13
8  0        0.999927  7.308038e-05
9  0        1.000000  2.075219e-13
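
The predict column is consistent with labelling a row as class 1 when p1 reaches the max-F1 threshold from the training metrics (0.0371296297347). That labelling rule is an assumption here. For these ten rows a 0.5 cutoff would give the same labels, so the sample alone cannot confirm it. The sketch below applies the assumed rule to the p1 values shown, in plain Python without the h2o API.

```python
# Max-F1 threshold reported in the training metrics.
threshold = 0.0371296297347

# p1 values of the ten rows displayed by predictions.show().
p1 = [
    1.187702e-05, 2.372383e-08, 1.921410e-12, 1.825251e-06, 5.822914e-21,
    9.999637e-01, 7.996056e-05, 1.378506e-13, 7.308038e-05, 2.075219e-13,
]

# Assumed rule: predict class 1 when p1 is at or above the threshold.
labels = [1 if p >= threshold else 0 for p in p1]
print(labels)  # [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
```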

In [6]:
performance = model.model_performance(prostate)
performance.show()


ModelMetricsBinomial: deeplearning
** Reported on test data. **

MSE: 0.0109153292711
R^2: 0.954617674506
LogLoss: 0.0385994595722
AUC: 0.999424145576
Gini: 0.998848291152

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.0371296297347:
       0      1      Error   Rate
0      224.0  3.0    0.0132  (3.0/227.0)
1      0.0    153.0  0.0     (0.0/153.0)
Total  224.0  156.0  0.0079  (3.0/380.0)

Maximum Metrics: Maximum metrics at their respective thresholds

metric                      threshold        value           idx
max f1                      0.0371296297347  0.990291262136  144.0
max f2                      0.0371296297347  0.99609375      144.0
max f0point5                0.401888940217   0.986928104575  141.0
max accuracy                0.0371296297347  0.992105263158  144.0
max precision               1.0              1.0             0.0
max absolute_MCC            0.0371296297347  0.983772088887  144.0
max min_per_class_accuracy  0.401888940217   0.986928104575  141.0