In [13]:
import h2o

In [14]:
h2o.init()


H2O cluster uptime: 1 minutes 47 seconds 826 milliseconds
H2O cluster version: 3.1.0.99999
H2O cluster name: ece
H2O cluster total nodes: 1
H2O cluster total memory: 4.44 GB
H2O cluster total cores: 8
H2O cluster allowed cores: 8
H2O cluster healthy: True
H2O Connection ip: 127.0.0.1
H2O Connection port: 54321

In [15]:
prostate = h2o.upload_file(path=h2o.locate("smalldata/logreg/prostate.csv"))
prostate.describe()


Parse Progress: [##################################################] 100%
Uploaded py9485de29-6e44-4cde-bd77-b480bdc9d6cb into cluster with 380 rows and 9 cols
Rows: 380 Cols: 9

Chunk compression summary:

chunk_type  chunk_name                 count  count_percentage  size    size_percentage
CBS         Bits                       1      11.111112         118 B   2.4210093
C1N         1-Byte Integers (w/o NAs)  5      55.555557         2.2 KB  45.958145
C2          2-Byte Integers            1      11.111112         828 B   16.9881
C2S         2-Byte Fractions           2      22.222223         1.6 KB  34.632744

Frame distribution summary:

                   size    number_of_rows  number_of_chunks_per_column  number_of_chunks
172.16.2.41:54321  4.8 KB  380.0           1.0                          9.0
mean               4.8 KB  380.0           1.0                          9.0
min                4.8 KB  380.0           1.0                          9.0
max                4.8 KB  380.0           1.0                          9.0
stddev             0 B     0.0             0.0                          0.0
total              4.8 KB  380.0           1.0                          9.0

Column-by-Column Summary:

               ID             CAPSULE         AGE            RACE            DPROS          DCAPS           PSA            VOL            GLEASON
type           int            int             int            int             int            int             real           real           int
mins           1.0            0.0             43.0           0.0             1.0            1.0             0.3            0.0            0.0
mean           190.5          0.402631578947  66.0394736842  1.08684210526   2.27105263158  1.10789473684   15.4086315789  15.8129210526  6.38421052632
maxs           380.0          1.0             79.0           2.0             4.0            2.0             139.7          97.6           9.0
sigma          109.840793879  0.491074338963  6.52707126917  0.308773258025  1.00010761815  0.310656449351  19.9975726686  18.3476199673  1.09195337443
zero_count     0              227             0              3               0              0               0              167            2
missing_count  0              0               0              0               0              0               0              0              0
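The mean, sigma and zero_count rows of this summary can be reproduced from the counts alone. A minimal sketch in plain Python, assuming CAPSULE is exactly 227 zeros and 153 ones (380 rows minus the reported zero_count) and that sigma is the sample standard deviation with an n - 1 denominator, which is what the reported value matches:

```python
import statistics

# CAPSULE rebuilt from the summary counts (assumption: 227 zeros, 153 ones).
capsule = [0] * 227 + [1] * 153

mean = statistics.mean(capsule)    # compare with the "mean" row
sigma = statistics.stdev(capsule)  # sample std (n - 1), compare with "sigma"
zero_count = capsule.count(0)      # compare with "zero_count"

print(mean, sigma, zero_count)
```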

In [16]:
prostate["CAPSULE"] = prostate["CAPSULE"].asfactor()  # binary response must be a factor for classification
predictors = [c for c in prostate.columns if c not in ("ID", "CAPSULE")]  # drop row ID and response, keep column order
model = h2o.deeplearning(x=prostate[predictors], y=prostate["CAPSULE"], training_frame=prostate,
                         activation="Tanh", hidden=[10, 10, 10], epochs=10000)
model.show()


deeplearning Model Build Progress: [##################################################] 100%
Model Details
=============
H2OBinomialModel :  Deep Learning
Model Key:  DeepLearningModel__bd863374e41fb02662d41e05ea504f7e

Status of Neuron Layers:

layer  units  type     dropout  l1   l2   mean_rate     rate_RMS      momentum  mean_weight   weight_RMS  mean_bias     bias_RMS
1      7      Input    0.0
2      10     Tanh     0.0      0.0  0.0  0.0053118872  0.0064453385  0.0       -0.052273955  1.2920758   -0.015679162  0.6260032
3      10     Tanh     0.0      0.0  0.0  0.04846302    0.12402522    0.0       -0.18807818   1.5033592   -0.11971774   1.2407272
4      10     Tanh     0.0      0.0  0.0  0.022704514   0.043265637   0.0       -0.06319846   1.7067846   0.33969718    0.96845585
5      2      Softmax           0.0  0.0  0.0053924634  0.000767734   0.0       -0.3149048    4.5431895   -0.00843814   0.7755003
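The layer sizes follow from the call above: 7 input units for the 9 columns minus ID and CAPSULE, three Tanh layers from hidden = [10, 10, 10], and a 2-unit Softmax output, one unit per CAPSULE class. A short sketch counting the weights and biases of such a stack, assuming a plain fully connected layout (one weight per unit pair between adjacent layers plus one bias per non-input unit); it does not query H2O:

```python
# Column names as listed by describe(); ID and CAPSULE are not predictors.
columns = ["ID", "CAPSULE", "AGE", "RACE", "DPROS", "DCAPS", "PSA", "VOL", "GLEASON"]
predictors = [c for c in columns if c not in ("ID", "CAPSULE")]

# Widths from the "Status of Neuron Layers" table: input, 3 hidden, output.
layer_sizes = [len(predictors), 10, 10, 10, 2]


def count_parameters(sizes):
    """Weights plus biases of a fully connected feed-forward network."""
    total = 0
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        total += fan_in * fan_out  # weight matrix between two layers
        total += fan_out           # one bias per unit in the receiving layer
    return total


n_params = count_parameters(layer_sizes)
print(layer_sizes, n_params)  # [7, 10, 10, 10, 2] 322
```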

ModelMetricsBinomial: deeplearning
** Reported on train data. **

MSE: 0.00515095678079
R^2: 0.978640384798
LogLoss: 0.0189778247313
AUC: 0.999740865509
Gini: 0.999481731018
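Two of these numbers are derived from the others. Gini is 2 * AUC - 1, and the reported R^2 agrees with 1 - MSE / sigma^2, where sigma = 0.491074338963 is the CAPSULE standard deviation from describe(). A quick check on the printed values (the R^2 formula here is inferred from the numbers, not taken from H2O documentation):

```python
# Values copied from the training metrics and from describe().
auc = 0.999740865509
mse = 0.00515095678079
capsule_sigma = 0.491074338963  # sample std of CAPSULE

gini = 2 * auc - 1
r2 = 1 - mse / capsule_sigma ** 2  # inferred relation, matches the printed R^2

print(gini, r2)
```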

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.558165788651:

       0      1      Error   Rate
0      226.0  1.0    0.0044  (1.0/227.0)
1      1.0    152.0  0.0065  (1.0/153.0)
Total  227.0  153.0  0.0109  (0.0109/380.0)

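In this matrix rows are actual classes and columns are predicted classes. Recomputing the error rates from the counts gives 1/227 and 1/153 per class and 2/380 ≈ 0.0053 overall, which agrees with the final training_classification_error in the scoring history; the 0.0109 in the Total row appears to be the sum of the two per-class rates rather than an error over all 380 rows. A sketch of that arithmetic:

```python
# Confusion matrix at the max-F1 threshold (rows = actual, columns = predicted).
confusion = {
    0: {0: 226, 1: 1},
    1: {0: 1, 1: 152},
}

per_class_error = {}
for actual, row in confusion.items():
    class_total = sum(row.values())
    wrong = class_total - row[actual]  # off-diagonal cells of this row
    per_class_error[actual] = wrong / class_total

n_rows = sum(sum(row.values()) for row in confusion.values())
n_wrong = sum(sum(row.values()) - row[a] for a, row in confusion.items())
overall_error = n_wrong / n_rows

print(per_class_error, overall_error)
```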
Maximum Metrics:

metric                  threshold         value           idx
f1                      0.558165788651    0.993464052288  95.0
f2                      0.558165788651    0.993464052288  95.0
f0point5                0.784003376961    0.997357992074  93.0
accuracy                0.784003376961    0.994736842105  93.0
precision               1.0               1.0             0.0
absolute_MCC            0.784003376961    0.989094861551  93.0
min_per_class_accuracy  0.558165788651    0.993464052288  95.0
tns                     1.0               227.0           0.0
fns                     1.0               102.0           0.0
fps                     1.6900796636e-24  227.0           322.0
tps                     0.0501889586449   153.0           103.0
tnr                     1.0               1.0             0.0
fnr                     1.0               0.666666666667  0.0
fpr                     1.6900796636e-24  1.0             322.0
tpr                     0.0501889586449   1.0             103.0

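At the max-F1 threshold (0.558165788651) the confusion matrix has TP = 152, FP = 1, FN = 1 and TN = 226, so precision and recall are both 152/153. With precision equal to recall, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R) reduces to that same value for every beta, which explains why f1, f2 and min_per_class_accuracy all report 0.993464052288 at the same threshold. A small check using the standard F-beta definition:

```python
def f_beta(tp, fp, fn, beta):
    """Standard F-beta score from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Counts from the confusion matrix at threshold 0.558165788651.
tp, fp, fn, tn = 152, 1, 1, 226

f1 = f_beta(tp, fp, fn, beta=1)
f2 = f_beta(tp, fp, fn, beta=2)
# Lowest per-class accuracy: min(true negative rate, true positive rate).
min_per_class_accuracy = min(tn / (tn + fp), tp / (tp + fn))

print(f1, f2, min_per_class_accuracy)
```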
Scoring History:

timestamp            duration    training_speed       epochs   samples    training_MSE      training_r2     training_logloss  training_AUC  training_classification_error
2015-05-22 13:22:58  0.000 sec   None                 0.0      0.0        nan               nan             nan               nan           nan
2015-05-22 13:22:58  0.012 sec   316666.667 rows/sec  10.0     3800.0     0.204413234872    0.15235397523   0.597336218462    nan           0.313157894737
2015-05-22 13:23:03  5.020 sec   372430.279 rows/sec  4920.0   1869600.0  0.0165889609441   0.931210096019  0.0492657643152   nan           0.0184210526316
2015-05-22 13:23:08  10.028 sec  355444.755 rows/sec  9380.0   3564400.0  0.00753752319028  0.968743943743  0.0230601929024   nan           0.00789473684211
2015-05-22 13:23:09  10.754 sec  353356.890 rows/sec  10000.0  3800000.0  0.00515095678079  0.978640384798  0.0189778247313   nan           0.00526315789474
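The scoring history is internally consistent: epochs is samples divided by the 380 training rows, and training_speed is samples divided by duration (durations are rounded to the millisecond). The last row reaches the requested epochs = 10000, i.e. 3,800,000 rows processed. A check over the four non-empty rows:

```python
TRAIN_ROWS = 380

# (duration_sec, samples, logged_epochs, logged_rows_per_sec) from the scoring history.
history = [
    (0.012, 3800.0, 10.0, 316666.667),
    (5.020, 1869600.0, 4920.0, 372430.279),
    (10.028, 3564400.0, 9380.0, 355444.755),
    (10.754, 3800000.0, 10000.0, 353356.890),
]

# Recompute (epochs, rows/sec) for each row.
computed = [(samples / TRAIN_ROWS, samples / duration)
            for duration, samples, _, _ in history]

for (epochs, speed), (_, _, logged_epochs, logged_speed) in zip(computed, history):
    print(f"epochs {epochs:.1f} vs {logged_epochs}, rows/sec {speed:.3f} vs {logged_speed}")
```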

In [17]:
predictions = model.predict(prostate)
predictions.show()


Displaying 10 row(s):
Row ID  predict  p0                       p1
1       [u'0']   [0.9999997615814209]     [2.7966822813141334e-07]
2       [u'0']   [0.9999793767929077]     [2.0565463273669593e-05]
3       [u'0']   [0.9999675750732422]     [3.237894270569086e-05]
4       [u'0']   [0.9999769926071167]     [2.2972568331169896e-05]
5       [u'0']   [1.0]                    [2.3572571168249332e-14]
6       [u'1']   [5.477976405821039e-10]  [1.0]
7       [u'0']   [0.9999896287918091]     [1.033373519021552e-05]
8       [u'0']   [0.9994649291038513]     [0.0005351081490516663]
9       [u'0']   [0.9906359314918518]     [0.009364110417664051]
10      [u'0']   [1.0]                    [3.102281764810755e-10]
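The predict column is derived from p1 and a decision threshold. A sketch, assuming a row is labelled 1 when p1 is at least the max-F1 threshold reported in the model metrics (0.558165788651), which is assumed here to be H2O's default for binomial models. For these ten rows any threshold between about 0.01 and 1.0 gives the same labels, since every p1 is below 0.01 except row 6:

```python
# Assumption: H2O labels binomial predictions with the max-F1 threshold.
THRESHOLD = 0.558165788651

# p1 for the ten rows shown by predictions.show().
p1 = [2.7966822813141334e-07, 2.0565463273669593e-05, 3.237894270569086e-05,
      2.2972568331169896e-05, 2.3572571168249332e-14, 1.0,
      1.033373519021552e-05, 0.0005351081490516663, 0.009364110417664051,
      3.102281764810755e-10]

labels = ["1" if p >= THRESHOLD else "0" for p in p1]
print(labels)
```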

In [18]:
performance = model.model_performance(prostate)
performance.show()


ModelMetricsBinomial: deeplearning
** Reported on test data. **

MSE: 0.00515095678079
R^2: 0.978640384798
LogLoss: 0.0189778247313
AUC: 0.999740865509
Gini: 0.999481731018

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.558165788651:

       0      1      Error   Rate
0      226.0  1.0    0.0044  (1.0/227.0)
1      1.0    152.0  0.0065  (1.0/153.0)
Total  227.0  153.0  0.0109  (0.0109/380.0)

Maximum Metrics:

metric                  threshold         value           idx
f1                      0.558165788651    0.993464052288  95.0
f2                      0.558165788651    0.993464052288  95.0
f0point5                0.784003376961    0.997357992074  93.0
accuracy                0.784003376961    0.994736842105  93.0
precision               1.0               1.0             0.0
absolute_MCC            0.784003376961    0.989094861551  93.0
min_per_class_accuracy  0.558165788651    0.993464052288  95.0
tns                     1.0               227.0           0.0
fns                     1.0               102.0           0.0
fps                     1.6900796636e-24  227.0           322.0
tps                     0.0501889586449   153.0           103.0
tnr                     1.0               1.0             0.0
fnr                     1.0               0.666666666667  0.0
fpr                     1.6900796636e-24  1.0             322.0
tpr                     0.0501889586449   1.0             103.0