In [1]:
import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

In [2]:
h2o.init()


Warning: Version mismatch. H2O is version 3.5.0.99999, but the python package is version UNKNOWN.
H2O cluster uptime: 58 minutes 48 seconds 43 milliseconds
H2O cluster version: 3.5.0.99999
H2O cluster name: ludirehak
H2O cluster total nodes: 1
H2O cluster total memory: 4.44 GB
H2O cluster total cores: 8
H2O cluster allowed cores: 8
H2O cluster healthy: True
H2O Connection ip: 127.0.0.1
H2O Connection port: 54321

In [3]:
# _locate is a private helper that finds files inside the h2o git project directory.
from h2o.utils.shared_utils import _locate

prostate = h2o.upload_file(path=_locate("smalldata/logreg/prostate.csv"))
prostate.describe()


Parse Progress: [##################################################] 100%
Uploaded py2a71800e-2ec6-4f71-b955-854a4f22aeb3 into cluster with 380 rows and 9 cols
Rows: 380 Cols: 9

Chunk compression summary:
chunk_type  chunk_name                 count  count_percentage  size    size_percentage
CBS         Bits                       1      11.111112         118 B   2.4210093
C1N         1-Byte Integers (w/o NAs)  5      55.555557         2.2 KB  45.958145
C2          2-Byte Integers            1      11.111112         828 B   16.9881
C2S         2-Byte Fractions           2      22.222223         1.6 KB  34.632744

Frame distribution summary:
                   size    number_of_rows  number_of_chunks_per_column  number_of_chunks
172.16.2.37:54321  4.8 KB  380.0           1.0                          9.0
mean               4.8 KB  380.0           1.0                          9.0
min                4.8 KB  380.0           1.0                          9.0
max                4.8 KB  380.0           1.0                          9.0
stddev             0 B     0.0             0.0                          0.0
total              4.8 KB  380.0           1.0                          9.0
Column-by-Column Summary:

               ID     CAPSULE  AGE   RACE  DPROS  DCAPS  PSA    VOL   GLEASON
type           int    int      int   int   int    int    real   real  int
mins           1.0    0.0      43.0  0.0   1.0    1.0    0.3    0.0   0.0
maxs           380.0  1.0      79.0  2.0   4.0    2.0    139.7  97.6  9.0
mean           190.5  0.4      66.0  1.1   2.3    1.1    15.4   15.8  6.4
sigma          109.8  0.5      6.5   0.3   1.0    0.3    20.0   18.3  1.1
zero_count     0      227      0     3     0      0      0      167   2
missing_count  0      0        0     0     0      0      0      0     0

In [4]:
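# Convert the response to a factor so H2O trains a classifier rather than a regressor.
prostate["CAPSULE"] = prostate["CAPSULE"].asfactor()
# Use every column except the row ID and the response as a predictor, keeping frame order.
predictors = [col for col in prostate.columns if col not in ("ID", "CAPSULE")]
model = H2ODeepLearningEstimator(activation="Tanh", hidden=[10, 10, 10], epochs=10000)
model.train(x=predictors, y="CAPSULE", training_frame=prostate)
model.show()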


deeplearning Model Build Progress: [##################################################] 100%
Model Details
=============
H2ODeepLearningEstimator :  Deep Learning
Model Key:  DeepLearning_model_python_1445544453075_137

Status of Neuron Layers: predicting CAPSULE, 2-class classification, bernoulli distribution, CrossEntropy loss, 322 weights/biases, 8.5 KB, 3,800,000 training samples, mini-batch size 1

layer  units  type     dropout  l1   l2   mean_rate  rate_RMS  momentum  mean_weight  weight_RMS  mean_bias  bias_RMS
1      7      Input    0.0
2      10     Tanh     0.0      0.0  0.0  0.1        0.1       0.0       0.1          1.0         0.4        0.7
3      10     Tanh     0.0      0.0  0.0  0.1        0.1       0.0       0.0          1.4         1.1        0.7
4      10     Tanh     0.0      0.0  0.0  0.2        0.2       0.0       -0.2         1.8         -0.2       0.8
5      2      Softmax           0.0  0.0  0.4        0.1       0.0       -0.2         5.7         0.1        0.3
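
The `322 weights/biases` figure in the status line can be checked by hand from the layer sizes in the table above: each fully connected layer contributes (inputs × units) weights plus one bias per unit. The sketch below assumes plain dense layers with no other parameters; `count_parameters` is an illustrative helper, not part of the H2O API. The same kind of arithmetic accounts for the 3,800,000 training samples (380 rows × 10,000 epochs).

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network (assumed dense layers)."""
    total = 0
    # Pair each layer with the next one: n_in inputs feed n_out units.
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # weights
        total += n_out         # one bias per unit
    return total


# Units per layer, copied from the "Status of Neuron Layers" table.
layer_sizes = [7, 10, 10, 10, 2]
n_params = count_parameters(layer_sizes)

# Rows in the frame times the requested number of epochs.
n_samples = 380 * 10000

print(n_params)
print(n_samples)
```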

ModelMetricsBinomial: deeplearning
** Reported on train data. **

MSE: 0.010708193224
R^2: 0.955478877615
LogLoss: 0.0689458344205
AUC: 0.996818404307
Gini: 0.993636808615
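
The Gini value reported above is consistent with the usual relation Gini = 2 × AUC − 1. The short check below uses only the numbers printed in this output; `gini_from_auc` is a hypothetical helper name, and the assumption is that H2O derives Gini from AUC in this way.

```python
def gini_from_auc(auc):
    """Gini coefficient from AUC, assuming Gini = 2 * AUC - 1."""
    return 2.0 * auc - 1.0


train_auc = 0.996818404307       # AUC as printed by model.show()
reported_gini = 0.993636808615   # Gini as printed by model.show()

train_gini = gini_from_auc(train_auc)
# The two values should agree up to the rounding of the printed digits.
print(abs(train_gini - reported_gini) < 1e-9)
```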

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.85259659057:
Act/Pred  0      1      Error   Rate
0         224.0  3.0    0.0132  (3.0/227.0)
1         0.0    153.0  0.0     (0.0/153.0)
Total     224.0  156.0  0.0079  (3.0/380.0)
Maximum Metrics: Maximum metrics at their respective thresholds

metric threshold value idx
max f1 0.9 1.0 107.0
max f2 0.9 1.0 107.0
max f0point5 0.9 1.0 107.0
max accuracy 0.9 1.0 107.0
max precision 1.0 1.0 0.0
max absolute_MCC 0.9 1.0 107.0
max min_per_class_accuracy 0.9 1.0 105.0
Scoring History:
timestamp            duration    training_speed   epochs   samples    training_MSE  training_r2  training_logloss  training_AUC  training_classification_error
2015-10-22 14:06:21  0.000 sec   None             0.0      0.0        nan           nan          nan               nan           nan
2015-10-22 14:06:21  0.038 sec   115151 rows/sec  10.0     3800.0     0.2           0.2          0.6               0.8           0.3
2015-10-22 14:06:26  5.047 sec   309126 rows/sec  4100.0   1558000.0  0.0           0.9          0.1               1.0           0.0
2015-10-22 14:06:31  10.051 sec  308783 rows/sec  8160.0   3100800.0  0.0           0.9          0.1               1.0           0.0
2015-10-22 14:06:34  12.360 sec  307717 rows/sec  10000.0  3800000.0  0.0           1.0          0.1               1.0           0.0

In [5]:
predictions = model.predict(prostate)
predictions.show()


H2OFrame with 380 rows and 3 columns:
   predict  p0            p1
0  0        9.993875e-01  6.125394e-04
1  0        9.999998e-01  1.937478e-07
2  0        9.999646e-01  3.535732e-05
3  0        1.000000e+00  2.235483e-12
4  0        9.999950e-01  5.024862e-06
5  1        1.237468e-07  9.999999e-01
6  0        9.992793e-01  7.206910e-04
7  0        1.000000e+00  9.146884e-19
8  0        1.000000e+00  8.434714e-13
9  0        9.999994e-01  6.112821e-07
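
The `predict` column in these ten rows is consistent with labelling a row 1 whenever `p1` reaches the max-F1 threshold printed by `model.show()` (0.85259659057). That this is the rule H2O applies is an assumption inferred from the output, and `label_from_p1` is a hypothetical helper, not an H2O function. A minimal sketch:

```python
# Max-F1 threshold copied from the confusion-matrix header in the model output.
MAX_F1_THRESHOLD = 0.85259659057

# p1 values of the ten rows displayed by predictions.show().
p1_values = [
    6.125394e-04, 1.937478e-07, 3.535732e-05, 2.235483e-12, 5.024862e-06,
    9.999999e-01, 7.206910e-04, 9.146884e-19, 8.434714e-13, 6.112821e-07,
]


def label_from_p1(p1, threshold=MAX_F1_THRESHOLD):
    """Return class 1 if the class-1 probability reaches the threshold, else 0."""
    return 1 if p1 >= threshold else 0


labels = [label_from_p1(p) for p in p1_values]
print(labels)
```

Only row 5 crosses the threshold, which matches the displayed `predict` values.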

In [6]:
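# This scores the model on the same frame it was trained on, so the metrics below are
# not a held-out estimate, even though the output header reads "Reported on test data".
performance = model.model_performance(prostate)
performance.show()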


ModelMetricsBinomial: deeplearning
** Reported on test data. **

MSE: 0.010708193224
R^2: 0.955478877615
LogLoss: 0.0689458344205
AUC: 0.996804007947
Gini: 0.993608015894

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.85259659057:
Act/Pred  0      1      Error   Rate
0         224.0  3.0    0.0132  (3.0/227.0)
1         0.0    153.0  0.0     (0.0/153.0)
Total     224.0  156.0  0.0079  (3.0/380.0)
Maximum Metrics: Maximum metrics at their respective thresholds

metric threshold value idx
max f1 0.9 1.0 155.0
max f2 0.9 1.0 155.0
max f0point5 0.9 1.0 155.0
max accuracy 0.9 1.0 155.0
max precision 1.0 1.0 0.0
max absolute_MCC 0.9 1.0 155.0
max min_per_class_accuracy 0.9 1.0 153.0
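
The error rates in the confusion matrix, and the F1 score at that threshold, can be recomputed from the four cell counts. The sketch below uses the standard definitions of precision, recall, and F1; the variable names are illustrative and nothing here calls the H2O API.

```python
# Cell counts from the confusion matrix (rows are actual classes, columns are predicted).
tn, fp = 224.0, 3.0    # actual 0: predicted 0, predicted 1
fn, tp = 0.0, 153.0    # actual 1: predicted 0, predicted 1

# Per-class and overall error rates, as shown in the "Error" column.
error_class0 = fp / (tn + fp)
error_class1 = fn / (fn + tp)
error_total = (fp + fn) / (tn + fp + fn + tp)

# Standard precision, recall, and F1 for the positive class.
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(round(error_class0, 4), round(error_class1, 4), round(error_total, 4))
print(round(f1, 4))
```

The F1 value is slightly below 1; the `max f1` row in the Maximum Metrics table shows 1.0 because that table prints one decimal place.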