In [1]:

    
import h2o



In [2]:

    
h2o.init()









    




H2O cluster uptime:         4 minutes 24 seconds 528 milliseconds
H2O cluster version:        3.5.0.99999
H2O cluster name:           ece
H2O cluster total nodes:    1
H2O cluster total memory:   10.67 GB
H2O cluster total cores:    8
H2O cluster allowed cores:  8
H2O cluster healthy:        True
H2O Connection ip:          127.0.0.1
H2O Connection port:        54321



In [3]:

    
# _locate is a private helper used to find files within the h2o git project directory.
from h2o.utils.shared_utils import _locate

prostate = h2o.upload_file(path=_locate("smalldata/logreg/prostate.csv"))
prostate.describe()









    



Parse Progress: [##################################################] 100%
Uploaded py9b6e1d63-2887-488f-9a46-785fdf2434e5 into cluster with 380 rows and 9 cols
Rows: 380 Cols: 9

Chunk compression summary:






    




chunk_type  chunk_name                 count  count_percentage  size    size_percentage
CBS         Bits                       1      11.111112         118 B   2.4210093
C1N         1-Byte Integers (w/o NAs)  5      55.555557         2.2 KB  45.958145
C2          2-Byte Integers            1      11.111112         828 B   16.9881
C2S         2-Byte Fractions           2      22.222223         1.6 KB  34.632744






    



Frame distribution summary:






    





                 size    number_of_rows  number_of_chunks_per_column  number_of_chunks
10.0.0.24:54321  4.8 KB  380.0           1.0                          9.0
mean             4.8 KB  380.0           1.0                          9.0
min              4.8 KB  380.0           1.0                          9.0
max              4.8 KB  380.0           1.0                          9.0
stddev           0 B     0.0             0.0                          0.0
total            4.8 KB  380.0           1.0                          9.0






    



Column-by-Column Summary:







    





               ID              CAPSULE         AGE             RACE            DPROS           DCAPS           PSA             VOL             GLEASON
type           int             int             int             int             int             int             real            real            int
mins           1.0             0.0             43.0            0.0             1.0             1.0             0.3             0.0             0.0
maxs           380.0           1.0             79.0            2.0             4.0             2.0             139.7           97.6            9.0
mean           190.5           0.402631578947  66.0394736842   1.08684210526   2.27105263158   1.10789473684   15.4086315789   15.8129210526   6.38421052632
sigma          109.840793879   0.491074338963  6.52707126917   0.308773258025  1.00010761815   0.310656449351  19.9975726686   18.3476199673   1.09195337443
zero_count     0               227             0               3               0               0               0               167             2
missing_count  0               0               0               0               0               0               0               0               0
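
The `mean` and `sigma` rows can be cross-checked without a cluster. The following is a minimal sketch, not part of the original notebook. It assumes that `ID` runs from 1 to 380 without gaps and that `CAPSULE` holds 227 zeros and 153 ones (both read off the summary above), and it assumes that `sigma` is the sample standard deviation with an n - 1 denominator, which is the convention that reproduces the reported values.

```python
import statistics

# Assumption: ID is the sequence 1..380 (mins=1, maxs=380, 380 rows).
ids = list(range(1, 381))
# Assumption: CAPSULE has 227 zeros (zero_count) and therefore 153 ones.
capsule = [0] * 227 + [1] * 153

id_mean = statistics.mean(ids)
id_sigma = statistics.stdev(ids)            # sample standard deviation (n - 1)
capsule_mean = statistics.mean(capsule)
capsule_sigma = statistics.stdev(capsule)   # sample standard deviation (n - 1)

print("ID:      mean=%.12g sigma=%.12g" % (id_mean, id_sigma))
print("CAPSULE: mean=%.12g sigma=%.12g" % (capsule_mean, capsule_sigma))
```

The `CAPSULE` mean is also the share of positive cases (153 of 380), which is worth keeping in mind when reading the accuracy figures further down.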



In [4]:

    
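# Treat the response as categorical so that a classifier is trained.
prostate["CAPSULE"] = prostate["CAPSULE"].asfactor()

# Use every column except the row ID and the response as a predictor.
predictors = list(set(prostate.columns) - set(["ID", "CAPSULE"]))
model = h2o.deeplearning(x=prostate[predictors],
                         y=prostate["CAPSULE"],
                         training_frame=prostate,
                         activation="Tanh",
                         hidden=[10, 10, 10],
                         epochs=10000)
model.show()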









    



deeplearning Model Build Progress: [##################################################] 100%
Model Details
=============
H2OBinomialModel :  Deep Learning
Model Key:  DeepLearning_model_python_1444621872790_25

Status of Neuron Layers: predicting CAPSULE, 2-class classification, bernoulli distribution, CrossEntropy loss, 322 weights/biases, 8.5 KB, 3,800,000 training samples, mini-batch size 1







    





layer  units  type     dropout  l1   l2   mean_rate        rate_RMS         momentum  mean_weight      weight_RMS     mean_bias        bias_RMS
1      7      Input    0.0
2      10     Tanh     0.0      0.0  0.0  0.0431974798491  0.0596570521593  0.0       0.339393859047   1.28530693054  0.0687711206469  0.90153336525
3      10     Tanh     0.0      0.0  0.0  0.0849079652375  0.0955412983894  0.0       0.184064850851   1.30200242996  0.173068487576   0.660982608795
4      10     Tanh     0.0      0.0  0.0  0.197120434018   0.265433907509   0.0       -0.221858388707  1.96628475189  1.03366405584    1.32316446304
5      2      Softmax           0.0  0.0  0.0747286665253  0.0741221606731  0.0       0.0154564060271  5.87673377991  -0.154785814024  0.984552383423
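
The "322 weights/biases" figure in the status line follows from the layer sizes in the table above. A short sketch of the arithmetic, assuming a plain fully connected network in which every unit has one bias (the helper `count_parameters` is illustrative and not part of h2o):

```python
# Layer sizes read from the "Status of Neuron Layers" table:
# 7 inputs, three hidden layers of 10 units, 2 output units.
layer_sizes = [7, 10, 10, 10, 2]


def count_parameters(sizes):
    """Count weights and biases of a fully connected feed-forward network."""
    total = 0
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        total += fan_in * fan_out  # one weight per connection
        total += fan_out           # one bias per receiving unit
    return total


n_params = count_parameters(layer_sizes)
print(n_params)  # 322
```

With 322 parameters and only 380 training rows, near-perfect training metrics are not surprising on their own.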






    




ModelMetricsBinomial: deeplearning
** Reported on train data. **

MSE: 0.0109153292711
R^2: 0.954617674506
LogLoss: 0.0385994595722
AUC: 0.999424145576
Gini: 0.998848291152

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.0371296297347:






    





       0      1      Error   Rate
0      224.0  3.0    0.0132  (3.0/227.0)
1      0.0    153.0  0.0     (0.0/153.0)
Total  224.0  156.0  0.0079  (3.0/380.0)






    



Maximum Metrics: Maximum metrics at their respective thresholds







    




metric                      threshold        value           idx
max f1                      0.0371296297347  0.990291262136  96.0
max f2                      0.0371296297347  0.99609375      96.0
max f0point5                0.401888940217   0.986928104575  93.0
max accuracy                0.0371296297347  0.992105263158  96.0
max precision               0.999999999312   1.0             0.0
max absolute_MCC            0.0371296297347  0.983772088887  96.0
max min_per_class_accuracy  0.401888940217   0.986928104575  93.0
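
The metrics that share the threshold 0.0371296297347 (F1, F2, accuracy and MCC) can be recomputed from the confusion matrix counts, and Gini from AUC. This is a minimal sketch using the textbook definitions; it is assumed, not confirmed by the notebook, that h2o uses the same formulas, although the values agree.

```python
import math

# Counts from the confusion matrix at the max-F1 threshold.
tn, fp = 224, 3    # actual class 0: predicted 0, predicted 1
fn, tp = 0, 153    # actual class 1: predicted 0, predicted 1

precision = tp / (tp + fp)
recall = tp / (tp + fn)


def f_beta(beta):
    """F-beta score from the precision and recall computed above."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


f1 = f_beta(1.0)
f2 = f_beta(2.0)
accuracy = (tp + tn) / (tp + tn + fp + fn)
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

# Gini coefficient as a rescaling of AUC.
auc = 0.999424145576
gini = 2 * auc - 1

for name, value in [("f1", f1), ("f2", f2), ("accuracy", accuracy),
                    ("mcc", mcc), ("gini", gini)]:
    print("%-8s %.12f" % (name, value))
```

`max f0point5` and `max min_per_class_accuracy` are reported at a different threshold (0.401888940217), so they cannot be derived from this confusion matrix.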






    



Scoring History:






    





timestamp            duration    training_speed       epochs   samples    training_MSE      training_r2     training_logloss  training_AUC    training_classification_error
2015-10-11 20:55:38  0.000 sec   None                 0.0      0.0        nan               nan             nan               nan             nan
2015-10-11 20:55:38  0.026 sec   146153.846 rows/sec  10.0     3800.0     0.196451486022    0.183219758096  0.58663358323     0.77783536322   0.297368421053
2015-10-11 20:55:43  5.029 sec   315092.464 rows/sec  4170.0   1584600.0  0.00864409272223  0.964060724163  0.0314068106534   0.999539316461  0.00789473684211
2015-10-11 20:55:48  10.030 sec  303090.728 rows/sec  8000.0   3040000.0  0.00904158664897  0.96240807601   0.0303556924702   0.99951052374   0.0105263157895
2015-10-11 20:55:50  12.754 sec  297945.743 rows/sec  10000.0  3800000.0  0.0109153292711   0.954617674506  0.0385994595722   0.999424145576  0.00789473684211
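
The `samples` and `training_speed` columns are tied to `epochs` and `duration` by simple arithmetic: one epoch is one pass over the 380 rows. A small sketch that reproduces both columns from the values above (the helper `training_progress` is hypothetical and only illustrates the relationship):

```python
ROWS = 380  # rows in the prostate frame

# (epochs, duration in seconds) read from the scoring history above.
history = [(10.0, 0.026), (4170.0, 5.029), (8000.0, 10.030), (10000.0, 12.754)]


def training_progress(epochs, seconds, rows=ROWS):
    """Return (samples seen, rows processed per second)."""
    samples = epochs * rows
    return samples, samples / seconds


progress = [training_progress(e, s) for e, s in history]
for (epochs, _), (samples, speed) in zip(history, progress):
    print("%8.1f epochs  %10.1f samples  %11.3f rows/sec" % (epochs, samples, speed))
```

Note that the training log loss is lowest at 8000 epochs and rises again by 10000, while every metric in this table is measured on the training data only.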



In [5]:

    
predictions = model.predict(prostate)
predictions.show()









    



H2OFrame with 380 rows and 3 columns: 






    






  
    
      
   predict  p0        p1
0  0        0.999988  1.187702e-05
1  0        1.000000  2.372383e-08
2  0        1.000000  1.921410e-12
3  0        0.999998  1.825251e-06
4  0        1.000000  5.822914e-21
5  1        0.000036  9.999637e-01
6  0        0.999920  7.996056e-05
7  0        1.000000  1.378506e-13
8  0        0.999927  7.308038e-05
9  0        1.000000  2.075219e-13
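
The `predict` column can be read as a thresholded version of `p1`. The sketch below assumes that the label is assigned by comparing `p1` against the max-F1 threshold reported in the model metrics; the ten rows shown are consistent with that assumption but do not prove it, since any threshold between roughly 8e-05 and 0.9999 would give the same labels here.

```python
# Assumption: the max-F1 threshold from the model metrics is the cut-off.
THRESHOLD = 0.0371296297347

# p1 values of the ten rows shown above.
p1_values = [
    1.187702e-05, 2.372383e-08, 1.921410e-12, 1.825251e-06, 5.822914e-21,
    9.999637e-01, 7.996056e-05, 1.378506e-13, 7.308038e-05, 2.075219e-13,
]


def label(p1, threshold=THRESHOLD):
    """Return 1 when the class-1 probability reaches the threshold, else 0."""
    return 1 if p1 >= threshold else 0


labels = [label(p) for p in p1_values]
print(labels)
```

A threshold this far below 0.5 means a row is labelled 1 as soon as the model gives it about a 4% probability, which favours recall on class 1.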



In [6]:

    
performance = model.model_performance(prostate)
performance.show()









    



ModelMetricsBinomial: deeplearning
** Reported on test data. **

MSE: 0.0109153292711
R^2: 0.954617674506
LogLoss: 0.0385994595722
AUC: 0.999424145576
Gini: 0.998848291152

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.0371296297347:






    





       0      1      Error   Rate
0      224.0  3.0    0.0132  (3.0/227.0)
1      0.0    153.0  0.0     (0.0/153.0)
Total  224.0  156.0  0.0079  (3.0/380.0)
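
The `Error` and `Rate` columns are per-row misclassification rates. Because `model_performance` was given the same frame the model was trained on, the counts repeat the training confusion matrix even though the header says "test data". A brief sketch of how the rates follow from the counts (the dictionary layout and helper name are illustrative choices):

```python
# confusion[actual][predicted] = count, taken from the matrix above.
confusion = {0: {0: 224, 1: 3}, 1: {0: 0, 1: 153}}


def error_rate(actual):
    """Share of rows with the given actual class that were misclassified."""
    row = confusion[actual]
    wrong = sum(n for predicted, n in row.items() if predicted != actual)
    return wrong / sum(row.values())


total_rows = sum(sum(row.values()) for row in confusion.values())
total_wrong = confusion[0][1] + confusion[1][0]
overall_error = total_wrong / total_rows

print(round(error_rate(0), 4), round(error_rate(1), 4), round(overall_error, 4))
```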






    



Maximum Metrics: Maximum metrics at their respective thresholds







    




metric                      threshold        value           idx
max f1                      0.0371296297347  0.990291262136  144.0
max f2                      0.0371296297347  0.99609375      144.0
max f0point5                0.401888940217   0.986928104575  141.0
max accuracy                0.0371296297347  0.992105263158  144.0
max precision               1.0              1.0             0.0
max absolute_MCC            0.0371296297347  0.983772088887  144.0
max min_per_class_accuracy  0.401888940217   0.986928104575  141.0
