H2O Use Case - Predictive Maintenance


In [1]:
# Load h2o library
suppressPackageStartupMessages(library(h2o))

In [2]:
# Start and connect to a local H2O cluster
h2o.init(nthreads = -1)


H2O is not running yet, starting it now...

Note:  In case of errors look at the following log files:
    /tmp/RtmpCsdpyq/h2o_joe_started_from_r.out
    /tmp/RtmpCsdpyq/h2o_joe_started_from_r.err


Starting H2O JVM and connecting: .. Connection successful!

R is connected to the H2O cluster: 
    H2O cluster uptime:         2 seconds 539 milliseconds 
    H2O cluster version:        3.10.4.4 
    H2O cluster version age:    3 days  
    H2O cluster name:           H2O_started_from_R_joe_vlz382 
    H2O cluster total nodes:    1 
    H2O cluster total memory:   5.21 GB 
    H2O cluster total cores:    8 
    H2O cluster allowed cores:  8 
    H2O cluster healthy:        TRUE 
    H2O Connection ip:          localhost 
    H2O Connection port:        54321 
    H2O Connection proxy:       NA 
    H2O Internal Security:      FALSE 
    R Version:                  R version 3.3.2 (2016-10-31) 


In [3]:
# Import data from a local CSV file
h_secom <- h2o.importFile(path = "secom.csv", destination_frame = "h_secom")


  |======================================================================| 100%

In [4]:
# Print out column names
colnames(h_secom)


  1. 'Classification'
  2. 'Feature 001'
  3. 'Feature 002'
  4. 'Feature 003'
  5. 'Feature 004'
  6. 'Feature 005'
  7. 'Feature 006'
  8. 'Feature 007'
  9. 'Feature 008'
  10. 'Feature 009'
  11. 'Feature 010'
  12. 'Feature 011'
  13. 'Feature 012'
  14. 'Feature 013'
  15. 'Feature 014'
  16. 'Feature 015'
  17. 'Feature 016'
  18. 'Feature 017'
  19. 'Feature 018'
  20. 'Feature 019'
  21. 'Feature 020'
  22. 'Feature 021'
  23. 'Feature 022'
  24. 'Feature 023'
  25. 'Feature 024'
  26. 'Feature 025'
  27. 'Feature 026'
  28. 'Feature 027'
  29. 'Feature 028'
  30. 'Feature 029'
  31. 'Feature 030'
  32. 'Feature 031'
  33. 'Feature 032'
  34. 'Feature 033'
  35. 'Feature 034'
  36. 'Feature 035'
  37. 'Feature 036'
  38. 'Feature 037'
  39. 'Feature 038'
  40. 'Feature 039'
  41. 'Feature 040'
  42. 'Feature 041'
  43. 'Feature 042'
  44. 'Feature 043'
  45. 'Feature 044'
  46. 'Feature 045'
  47. 'Feature 046'
  48. 'Feature 047'
  49. 'Feature 048'
  50. 'Feature 049'
  51. 'Feature 050'
  52. 'Feature 051'
  53. 'Feature 052'
  54. 'Feature 053'
  55. 'Feature 054'
  56. 'Feature 055'
  57. 'Feature 056'
  58. 'Feature 057'
  59. 'Feature 058'
  60. 'Feature 059'
  61. 'Feature 060'
  62. 'Feature 061'
  63. 'Feature 062'
  64. 'Feature 063'
  65. 'Feature 064'
  66. 'Feature 065'
  67. 'Feature 066'
  68. 'Feature 067'
  69. 'Feature 068'
  70. 'Feature 069'
  71. 'Feature 070'
  72. 'Feature 071'
  73. 'Feature 072'
  74. 'Feature 073'
  75. 'Feature 074'
  76. 'Feature 075'
  77. 'Feature 076'
  78. 'Feature 077'
  79. 'Feature 078'
  80. 'Feature 079'
  81. 'Feature 080'
  82. 'Feature 081'
  83. 'Feature 082'
  84. 'Feature 083'
  85. 'Feature 084'
  86. 'Feature 085'
  87. 'Feature 086'
  88. 'Feature 087'
  89. 'Feature 088'
  90. 'Feature 089'
  91. 'Feature 090'
  92. 'Feature 091'
  93. 'Feature 092'
  94. 'Feature 093'
  95. 'Feature 094'
  96. 'Feature 095'
  97. 'Feature 096'
  98. 'Feature 097'
  99. 'Feature 098'
  100. 'Feature 099'
  101. 'Feature 100'
  102. 'Feature 101'
  103. 'Feature 102'
  104. 'Feature 103'
  105. 'Feature 104'
  106. 'Feature 105'
  107. 'Feature 106'
  108. 'Feature 107'
  109. 'Feature 108'
  110. 'Feature 109'
  111. 'Feature 110'
  112. 'Feature 111'
  113. 'Feature 112'
  114. 'Feature 113'
  115. 'Feature 114'
  116. 'Feature 115'
  117. 'Feature 116'
  118. 'Feature 117'
  119. 'Feature 118'
  120. 'Feature 119'
  121. 'Feature 120'
  122. 'Feature 121'
  123. 'Feature 122'
  124. 'Feature 123'
  125. 'Feature 124'
  126. 'Feature 125'
  127. 'Feature 126'
  128. 'Feature 127'
  129. 'Feature 128'
  130. 'Feature 129'
  131. 'Feature 130'
  132. 'Feature 131'
  133. 'Feature 132'
  134. 'Feature 133'
  135. 'Feature 134'
  136. 'Feature 135'
  137. 'Feature 136'
  138. 'Feature 137'
  139. 'Feature 138'
  140. 'Feature 139'
  141. 'Feature 140'
  142. 'Feature 141'
  143. 'Feature 142'
  144. 'Feature 143'
  145. 'Feature 144'
  146. 'Feature 145'
  147. 'Feature 146'
  148. 'Feature 147'
  149. 'Feature 148'
  150. 'Feature 149'
  151. 'Feature 150'
  152. 'Feature 151'
  153. 'Feature 152'
  154. 'Feature 153'
  155. 'Feature 154'
  156. 'Feature 155'
  157. 'Feature 156'
  158. 'Feature 157'
  159. 'Feature 158'
  160. 'Feature 159'
  161. 'Feature 160'
  162. 'Feature 161'
  163. 'Feature 162'
  164. 'Feature 163'
  165. 'Feature 164'
  166. 'Feature 165'
  167. 'Feature 166'
  168. 'Feature 167'
  169. 'Feature 168'
  170. 'Feature 169'
  171. 'Feature 170'
  172. 'Feature 171'
  173. 'Feature 172'
  174. 'Feature 173'
  175. 'Feature 174'
  176. 'Feature 175'
  177. 'Feature 176'
  178. 'Feature 177'
  179. 'Feature 178'
  180. 'Feature 179'
  181. 'Feature 180'
  182. 'Feature 181'
  183. 'Feature 182'
  184. 'Feature 183'
  185. 'Feature 184'
  186. 'Feature 185'
  187. 'Feature 186'
  188. 'Feature 187'
  189. 'Feature 188'
  190. 'Feature 189'
  191. 'Feature 190'
  192. 'Feature 191'
  193. 'Feature 192'
  194. 'Feature 193'
  195. 'Feature 194'
  196. 'Feature 195'
  197. 'Feature 196'
  198. 'Feature 197'
  199. 'Feature 198'
  200. 'Feature 199'
  201. 'Feature 200'
  202. 'Feature 201'
  203. 'Feature 202'
  204. 'Feature 203'
  205. 'Feature 204'
  206. 'Feature 205'
  207. 'Feature 206'
  208. 'Feature 207'
  209. 'Feature 208'
  210. 'Feature 209'
  211. 'Feature 210'
  212. 'Feature 211'
  213. 'Feature 212'
  214. 'Feature 213'
  215. 'Feature 214'
  216. 'Feature 215'
  217. 'Feature 216'
  218. 'Feature 217'
  219. 'Feature 218'
  220. 'Feature 219'
  221. 'Feature 220'
  222. 'Feature 221'
  223. 'Feature 222'
  224. 'Feature 223'
  225. 'Feature 224'
  226. 'Feature 225'
  227. 'Feature 226'
  228. 'Feature 227'
  229. 'Feature 228'
  230. 'Feature 229'
  231. 'Feature 230'
  232. 'Feature 231'
  233. 'Feature 232'
  234. 'Feature 233'
  235. 'Feature 234'
  236. 'Feature 235'
  237. 'Feature 236'
  238. 'Feature 237'
  239. 'Feature 238'
  240. 'Feature 239'
  241. 'Feature 240'
  242. 'Feature 241'
  243. 'Feature 242'
  244. 'Feature 243'
  245. 'Feature 244'
  246. 'Feature 245'
  247. 'Feature 246'
  248. 'Feature 247'
  249. 'Feature 248'
  250. 'Feature 249'
  251. 'Feature 250'
  252. 'Feature 251'
  253. 'Feature 252'
  254. 'Feature 253'
  255. 'Feature 254'
  256. 'Feature 255'
  257. 'Feature 256'
  258. 'Feature 257'
  259. 'Feature 258'
  260. 'Feature 259'
  261. 'Feature 260'
  262. 'Feature 261'
  263. 'Feature 262'
  264. 'Feature 263'
  265. 'Feature 264'
  266. 'Feature 265'
  267. 'Feature 266'
  268. 'Feature 267'
  269. 'Feature 268'
  270. 'Feature 269'
  271. 'Feature 270'
  272. 'Feature 271'
  273. 'Feature 272'
  274. 'Feature 273'
  275. 'Feature 274'
  276. 'Feature 275'
  277. 'Feature 276'
  278. 'Feature 277'
  279. 'Feature 278'
  280. 'Feature 279'
  281. 'Feature 280'
  282. 'Feature 281'
  283. 'Feature 282'
  284. 'Feature 283'
  285. 'Feature 284'
  286. 'Feature 285'
  287. 'Feature 286'
  288. 'Feature 287'
  289. 'Feature 288'
  290. 'Feature 289'
  291. 'Feature 290'
  292. 'Feature 291'
  293. 'Feature 292'
  294. 'Feature 293'
  295. 'Feature 294'
  296. 'Feature 295'
  297. 'Feature 296'
  298. 'Feature 297'
  299. 'Feature 298'
  300. 'Feature 299'
  301. 'Feature 300'
  302. 'Feature 301'
  303. 'Feature 302'
  304. 'Feature 303'
  305. 'Feature 304'
  306. 'Feature 305'
  307. 'Feature 306'
  308. 'Feature 307'
  309. 'Feature 308'
  310. 'Feature 309'
  311. 'Feature 310'
  312. 'Feature 311'
  313. 'Feature 312'
  314. 'Feature 313'
  315. 'Feature 314'
  316. 'Feature 315'
  317. 'Feature 316'
  318. 'Feature 317'
  319. 'Feature 318'
  320. 'Feature 319'
  321. 'Feature 320'
  322. 'Feature 321'
  323. 'Feature 322'
  324. 'Feature 323'
  325. 'Feature 324'
  326. 'Feature 325'
  327. 'Feature 326'
  328. 'Feature 327'
  329. 'Feature 328'
  330. 'Feature 329'
  331. 'Feature 330'
  332. 'Feature 331'
  333. 'Feature 332'
  334. 'Feature 333'
  335. 'Feature 334'
  336. 'Feature 335'
  337. 'Feature 336'
  338. 'Feature 337'
  339. 'Feature 338'
  340. 'Feature 339'
  341. 'Feature 340'
  342. 'Feature 341'
  343. 'Feature 342'
  344. 'Feature 343'
  345. 'Feature 344'
  346. 'Feature 345'
  347. 'Feature 346'
  348. 'Feature 347'
  349. 'Feature 348'
  350. 'Feature 349'
  351. 'Feature 350'
  352. 'Feature 351'
  353. 'Feature 352'
  354. 'Feature 353'
  355. 'Feature 354'
  356. 'Feature 355'
  357. 'Feature 356'
  358. 'Feature 357'
  359. 'Feature 358'
  360. 'Feature 359'
  361. 'Feature 360'
  362. 'Feature 361'
  363. 'Feature 362'
  364. 'Feature 363'
  365. 'Feature 364'
  366. 'Feature 365'
  367. 'Feature 366'
  368. 'Feature 367'
  369. 'Feature 368'
  370. 'Feature 369'
  371. 'Feature 370'
  372. 'Feature 371'
  373. 'Feature 372'
  374. 'Feature 373'
  375. 'Feature 374'
  376. 'Feature 375'
  377. 'Feature 376'
  378. 'Feature 377'
  379. 'Feature 378'
  380. 'Feature 379'
  381. 'Feature 380'
  382. 'Feature 381'
  383. 'Feature 382'
  384. 'Feature 383'
  385. 'Feature 384'
  386. 'Feature 385'
  387. 'Feature 386'
  388. 'Feature 387'
  389. 'Feature 388'
  390. 'Feature 389'
  391. 'Feature 390'
  392. 'Feature 391'
  393. 'Feature 392'
  394. 'Feature 393'
  395. 'Feature 394'
  396. 'Feature 395'
  397. 'Feature 396'
  398. 'Feature 397'
  399. 'Feature 398'
  400. 'Feature 399'
  401. 'Feature 400'
  402. 'Feature 401'
  403. 'Feature 402'
  404. 'Feature 403'
  405. 'Feature 404'
  406. 'Feature 405'
  407. 'Feature 406'
  408. 'Feature 407'
  409. 'Feature 408'
  410. 'Feature 409'
  411. 'Feature 410'
  412. 'Feature 411'
  413. 'Feature 412'
  414. 'Feature 413'
  415. 'Feature 414'
  416. 'Feature 415'
  417. 'Feature 416'
  418. 'Feature 417'
  419. 'Feature 418'
  420. 'Feature 419'
  421. 'Feature 420'
  422. 'Feature 421'
  423. 'Feature 422'
  424. 'Feature 423'
  425. 'Feature 424'
  426. 'Feature 425'
  427. 'Feature 426'
  428. 'Feature 427'
  429. 'Feature 428'
  430. 'Feature 429'
  431. 'Feature 430'
  432. 'Feature 431'
  433. 'Feature 432'
  434. 'Feature 433'
  435. 'Feature 434'
  436. 'Feature 435'
  437. 'Feature 436'
  438. 'Feature 437'
  439. 'Feature 438'
  440. 'Feature 439'
  441. 'Feature 440'
  442. 'Feature 441'
  443. 'Feature 442'
  444. 'Feature 443'
  445. 'Feature 444'
  446. 'Feature 445'
  447. 'Feature 446'
  448. 'Feature 447'
  449. 'Feature 448'
  450. 'Feature 449'
  451. 'Feature 450'
  452. 'Feature 451'
  453. 'Feature 452'
  454. 'Feature 453'
  455. 'Feature 454'
  456. 'Feature 455'
  457. 'Feature 456'
  458. 'Feature 457'
  459. 'Feature 458'
  460. 'Feature 459'
  461. 'Feature 460'
  462. 'Feature 461'
  463. 'Feature 462'
  464. 'Feature 463'
  465. 'Feature 464'
  466. 'Feature 465'
  467. 'Feature 466'
  468. 'Feature 467'
  469. 'Feature 468'
  470. 'Feature 469'
  471. 'Feature 470'
  472. 'Feature 471'
  473. 'Feature 472'
  474. 'Feature 473'
  475. 'Feature 474'
  476. 'Feature 475'
  477. 'Feature 476'
  478. 'Feature 477'
  479. 'Feature 478'
  480. 'Feature 479'
  481. 'Feature 480'
  482. 'Feature 481'
  483. 'Feature 482'
  484. 'Feature 483'
  485. 'Feature 484'
  486. 'Feature 485'
  487. 'Feature 486'
  488. 'Feature 487'
  489. 'Feature 488'
  490. 'Feature 489'
  491. 'Feature 490'
  492. 'Feature 491'
  493. 'Feature 492'
  494. 'Feature 493'
  495. 'Feature 494'
  496. 'Feature 495'
  497. 'Feature 496'
  498. 'Feature 497'
  499. 'Feature 498'
  500. 'Feature 499'
  501. 'Feature 500'
  502. 'Feature 501'
  503. 'Feature 502'
  504. 'Feature 503'
  505. 'Feature 504'
  506. 'Feature 505'
  507. 'Feature 506'
  508. 'Feature 507'
  509. 'Feature 508'
  510. 'Feature 509'
  511. 'Feature 510'
  512. 'Feature 511'
  513. 'Feature 512'
  514. 'Feature 513'
  515. 'Feature 514'
  516. 'Feature 515'
  517. 'Feature 516'
  518. 'Feature 517'
  519. 'Feature 518'
  520. 'Feature 519'
  521. 'Feature 520'
  522. 'Feature 521'
  523. 'Feature 522'
  524. 'Feature 523'
  525. 'Feature 524'
  526. 'Feature 525'
  527. 'Feature 526'
  528. 'Feature 527'
  529. 'Feature 528'
  530. 'Feature 529'
  531. 'Feature 530'
  532. 'Feature 531'
  533. 'Feature 532'
  534. 'Feature 533'
  535. 'Feature 534'
  536. 'Feature 535'
  537. 'Feature 536'
  538. 'Feature 537'
  539. 'Feature 538'
  540. 'Feature 539'
  541. 'Feature 540'
  542. 'Feature 541'
  543. 'Feature 542'
  544. 'Feature 543'
  545. 'Feature 544'
  546. 'Feature 545'
  547. 'Feature 546'
  548. 'Feature 547'
  549. 'Feature 548'
  550. 'Feature 549'
  551. 'Feature 550'
  552. 'Feature 551'
  553. 'Feature 552'
  554. 'Feature 553'
  555. 'Feature 554'
  556. 'Feature 555'
  557. 'Feature 556'
  558. 'Feature 557'
  559. 'Feature 558'
  560. 'Feature 559'
  561. 'Feature 560'
  562. 'Feature 561'
  563. 'Feature 562'
  564. 'Feature 563'
  565. 'Feature 564'
  566. 'Feature 565'
  567. 'Feature 566'
  568. 'Feature 567'
  569. 'Feature 568'
  570. 'Feature 569'
  571. 'Feature 570'
  572. 'Feature 571'
  573. 'Feature 572'
  574. 'Feature 573'
  575. 'Feature 574'
  576. 'Feature 575'
  577. 'Feature 576'
  578. 'Feature 577'
  579. 'Feature 578'
  580. 'Feature 579'
  581. 'Feature 580'
  582. 'Feature 581'
  583. 'Feature 582'
  584. 'Feature 583'
  585. 'Feature 584'
  586. 'Feature 585'
  587. 'Feature 586'
  588. 'Feature 587'
  589. 'Feature 588'
  590. 'Feature 589'
  591. 'Feature 590'

In [5]:
# Look at "Classification"
summary(h_secom$Classification, exact_quantiles = TRUE)


 Classification   
 Min.   :-1.0000  
 1st Qu.:-1.0000  
 Median :-1.0000  
 Mean   :-0.8673  
 3rd Qu.:-1.0000  
 Max.   : 1.0000  

In [6]:
# "Classification" is stored as a numeric column (-1 = pass, 1 = fail)
# Convert it to a factor so H2O treats this as a binary classification problem
h_secom$Classification <- as.factor(h_secom$Classification)

In [7]:
# Look at "Classification" again
summary(h_secom$Classification, exact_quantiles = TRUE)


 Classification
 -1:1463       
 1 : 104       

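The two classes are heavily imbalanced (1463 passes vs. 104 failures, roughly 14:1), which is why `balance_classes = TRUE` is used when training below. As a minimal sketch, the class counts and imbalance ratio can be computed directly on the cluster with `h2o.table` (a standard h2o function; the `Count` column name is what `h2o.table` returns):

```r
# Tabulate class counts on the H2O cluster, then pull them into R
class_counts <- as.data.frame(h2o.table(h_secom$Classification))
print(class_counts)

# Ratio of majority to minority class
ratio <- max(class_counts$Count) / min(class_counts$Count)
print(ratio)
```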
In [8]:
# Define target (y) and features (x)
target <- "Classification"
features <- setdiff(colnames(h_secom), target)
print(features)


  [1] "Feature 001" "Feature 002" "Feature 003" "Feature 004" "Feature 005"
  [6] "Feature 006" "Feature 007" "Feature 008" "Feature 009" "Feature 010"
 [11] "Feature 011" "Feature 012" "Feature 013" "Feature 014" "Feature 015"
 [16] "Feature 016" "Feature 017" "Feature 018" "Feature 019" "Feature 020"
 [21] "Feature 021" "Feature 022" "Feature 023" "Feature 024" "Feature 025"
 [26] "Feature 026" "Feature 027" "Feature 028" "Feature 029" "Feature 030"
 [31] "Feature 031" "Feature 032" "Feature 033" "Feature 034" "Feature 035"
 [36] "Feature 036" "Feature 037" "Feature 038" "Feature 039" "Feature 040"
 [41] "Feature 041" "Feature 042" "Feature 043" "Feature 044" "Feature 045"
 [46] "Feature 046" "Feature 047" "Feature 048" "Feature 049" "Feature 050"
 [51] "Feature 051" "Feature 052" "Feature 053" "Feature 054" "Feature 055"
 [56] "Feature 056" "Feature 057" "Feature 058" "Feature 059" "Feature 060"
 [61] "Feature 061" "Feature 062" "Feature 063" "Feature 064" "Feature 065"
 [66] "Feature 066" "Feature 067" "Feature 068" "Feature 069" "Feature 070"
 [71] "Feature 071" "Feature 072" "Feature 073" "Feature 074" "Feature 075"
 [76] "Feature 076" "Feature 077" "Feature 078" "Feature 079" "Feature 080"
 [81] "Feature 081" "Feature 082" "Feature 083" "Feature 084" "Feature 085"
 [86] "Feature 086" "Feature 087" "Feature 088" "Feature 089" "Feature 090"
 [91] "Feature 091" "Feature 092" "Feature 093" "Feature 094" "Feature 095"
 [96] "Feature 096" "Feature 097" "Feature 098" "Feature 099" "Feature 100"
[101] "Feature 101" "Feature 102" "Feature 103" "Feature 104" "Feature 105"
[106] "Feature 106" "Feature 107" "Feature 108" "Feature 109" "Feature 110"
[111] "Feature 111" "Feature 112" "Feature 113" "Feature 114" "Feature 115"
[116] "Feature 116" "Feature 117" "Feature 118" "Feature 119" "Feature 120"
[121] "Feature 121" "Feature 122" "Feature 123" "Feature 124" "Feature 125"
[126] "Feature 126" "Feature 127" "Feature 128" "Feature 129" "Feature 130"
[131] "Feature 131" "Feature 132" "Feature 133" "Feature 134" "Feature 135"
[136] "Feature 136" "Feature 137" "Feature 138" "Feature 139" "Feature 140"
[141] "Feature 141" "Feature 142" "Feature 143" "Feature 144" "Feature 145"
[146] "Feature 146" "Feature 147" "Feature 148" "Feature 149" "Feature 150"
[151] "Feature 151" "Feature 152" "Feature 153" "Feature 154" "Feature 155"
[156] "Feature 156" "Feature 157" "Feature 158" "Feature 159" "Feature 160"
[161] "Feature 161" "Feature 162" "Feature 163" "Feature 164" "Feature 165"
[166] "Feature 166" "Feature 167" "Feature 168" "Feature 169" "Feature 170"
[171] "Feature 171" "Feature 172" "Feature 173" "Feature 174" "Feature 175"
[176] "Feature 176" "Feature 177" "Feature 178" "Feature 179" "Feature 180"
[181] "Feature 181" "Feature 182" "Feature 183" "Feature 184" "Feature 185"
[186] "Feature 186" "Feature 187" "Feature 188" "Feature 189" "Feature 190"
[191] "Feature 191" "Feature 192" "Feature 193" "Feature 194" "Feature 195"
[196] "Feature 196" "Feature 197" "Feature 198" "Feature 199" "Feature 200"
[201] "Feature 201" "Feature 202" "Feature 203" "Feature 204" "Feature 205"
[206] "Feature 206" "Feature 207" "Feature 208" "Feature 209" "Feature 210"
[211] "Feature 211" "Feature 212" "Feature 213" "Feature 214" "Feature 215"
[216] "Feature 216" "Feature 217" "Feature 218" "Feature 219" "Feature 220"
[221] "Feature 221" "Feature 222" "Feature 223" "Feature 224" "Feature 225"
[226] "Feature 226" "Feature 227" "Feature 228" "Feature 229" "Feature 230"
[231] "Feature 231" "Feature 232" "Feature 233" "Feature 234" "Feature 235"
[236] "Feature 236" "Feature 237" "Feature 238" "Feature 239" "Feature 240"
[241] "Feature 241" "Feature 242" "Feature 243" "Feature 244" "Feature 245"
[246] "Feature 246" "Feature 247" "Feature 248" "Feature 249" "Feature 250"
[251] "Feature 251" "Feature 252" "Feature 253" "Feature 254" "Feature 255"
[256] "Feature 256" "Feature 257" "Feature 258" "Feature 259" "Feature 260"
[261] "Feature 261" "Feature 262" "Feature 263" "Feature 264" "Feature 265"
[266] "Feature 266" "Feature 267" "Feature 268" "Feature 269" "Feature 270"
[271] "Feature 271" "Feature 272" "Feature 273" "Feature 274" "Feature 275"
[276] "Feature 276" "Feature 277" "Feature 278" "Feature 279" "Feature 280"
[281] "Feature 281" "Feature 282" "Feature 283" "Feature 284" "Feature 285"
[286] "Feature 286" "Feature 287" "Feature 288" "Feature 289" "Feature 290"
[291] "Feature 291" "Feature 292" "Feature 293" "Feature 294" "Feature 295"
[296] "Feature 296" "Feature 297" "Feature 298" "Feature 299" "Feature 300"
[301] "Feature 301" "Feature 302" "Feature 303" "Feature 304" "Feature 305"
[306] "Feature 306" "Feature 307" "Feature 308" "Feature 309" "Feature 310"
[311] "Feature 311" "Feature 312" "Feature 313" "Feature 314" "Feature 315"
[316] "Feature 316" "Feature 317" "Feature 318" "Feature 319" "Feature 320"
[321] "Feature 321" "Feature 322" "Feature 323" "Feature 324" "Feature 325"
[326] "Feature 326" "Feature 327" "Feature 328" "Feature 329" "Feature 330"
[331] "Feature 331" "Feature 332" "Feature 333" "Feature 334" "Feature 335"
[336] "Feature 336" "Feature 337" "Feature 338" "Feature 339" "Feature 340"
[341] "Feature 341" "Feature 342" "Feature 343" "Feature 344" "Feature 345"
[346] "Feature 346" "Feature 347" "Feature 348" "Feature 349" "Feature 350"
[351] "Feature 351" "Feature 352" "Feature 353" "Feature 354" "Feature 355"
[356] "Feature 356" "Feature 357" "Feature 358" "Feature 359" "Feature 360"
[361] "Feature 361" "Feature 362" "Feature 363" "Feature 364" "Feature 365"
[366] "Feature 366" "Feature 367" "Feature 368" "Feature 369" "Feature 370"
[371] "Feature 371" "Feature 372" "Feature 373" "Feature 374" "Feature 375"
[376] "Feature 376" "Feature 377" "Feature 378" "Feature 379" "Feature 380"
[381] "Feature 381" "Feature 382" "Feature 383" "Feature 384" "Feature 385"
[386] "Feature 386" "Feature 387" "Feature 388" "Feature 389" "Feature 390"
[391] "Feature 391" "Feature 392" "Feature 393" "Feature 394" "Feature 395"
[396] "Feature 396" "Feature 397" "Feature 398" "Feature 399" "Feature 400"
[401] "Feature 401" "Feature 402" "Feature 403" "Feature 404" "Feature 405"
[406] "Feature 406" "Feature 407" "Feature 408" "Feature 409" "Feature 410"
[411] "Feature 411" "Feature 412" "Feature 413" "Feature 414" "Feature 415"
[416] "Feature 416" "Feature 417" "Feature 418" "Feature 419" "Feature 420"
[421] "Feature 421" "Feature 422" "Feature 423" "Feature 424" "Feature 425"
[426] "Feature 426" "Feature 427" "Feature 428" "Feature 429" "Feature 430"
[431] "Feature 431" "Feature 432" "Feature 433" "Feature 434" "Feature 435"
[436] "Feature 436" "Feature 437" "Feature 438" "Feature 439" "Feature 440"
[441] "Feature 441" "Feature 442" "Feature 443" "Feature 444" "Feature 445"
[446] "Feature 446" "Feature 447" "Feature 448" "Feature 449" "Feature 450"
[451] "Feature 451" "Feature 452" "Feature 453" "Feature 454" "Feature 455"
[456] "Feature 456" "Feature 457" "Feature 458" "Feature 459" "Feature 460"
[461] "Feature 461" "Feature 462" "Feature 463" "Feature 464" "Feature 465"
[466] "Feature 466" "Feature 467" "Feature 468" "Feature 469" "Feature 470"
[471] "Feature 471" "Feature 472" "Feature 473" "Feature 474" "Feature 475"
[476] "Feature 476" "Feature 477" "Feature 478" "Feature 479" "Feature 480"
[481] "Feature 481" "Feature 482" "Feature 483" "Feature 484" "Feature 485"
[486] "Feature 486" "Feature 487" "Feature 488" "Feature 489" "Feature 490"
[491] "Feature 491" "Feature 492" "Feature 493" "Feature 494" "Feature 495"
[496] "Feature 496" "Feature 497" "Feature 498" "Feature 499" "Feature 500"
[501] "Feature 501" "Feature 502" "Feature 503" "Feature 504" "Feature 505"
[506] "Feature 506" "Feature 507" "Feature 508" "Feature 509" "Feature 510"
[511] "Feature 511" "Feature 512" "Feature 513" "Feature 514" "Feature 515"
[516] "Feature 516" "Feature 517" "Feature 518" "Feature 519" "Feature 520"
[521] "Feature 521" "Feature 522" "Feature 523" "Feature 524" "Feature 525"
[526] "Feature 526" "Feature 527" "Feature 528" "Feature 529" "Feature 530"
[531] "Feature 531" "Feature 532" "Feature 533" "Feature 534" "Feature 535"
[536] "Feature 536" "Feature 537" "Feature 538" "Feature 539" "Feature 540"
[541] "Feature 541" "Feature 542" "Feature 543" "Feature 544" "Feature 545"
[546] "Feature 546" "Feature 547" "Feature 548" "Feature 549" "Feature 550"
[551] "Feature 551" "Feature 552" "Feature 553" "Feature 554" "Feature 555"
[556] "Feature 556" "Feature 557" "Feature 558" "Feature 559" "Feature 560"
[561] "Feature 561" "Feature 562" "Feature 563" "Feature 564" "Feature 565"
[566] "Feature 566" "Feature 567" "Feature 568" "Feature 569" "Feature 570"
[571] "Feature 571" "Feature 572" "Feature 573" "Feature 574" "Feature 575"
[576] "Feature 576" "Feature 577" "Feature 578" "Feature 579" "Feature 580"
[581] "Feature 581" "Feature 582" "Feature 583" "Feature 584" "Feature 585"
[586] "Feature 586" "Feature 587" "Feature 588" "Feature 589" "Feature 590"

In [9]:
# Split the dataset into training and test sets
h_split <- h2o.splitFrame(h_secom, ratios = 0.7, seed = 1234)
h_train <- h_split[[1]] # 70%
h_test  <- h_split[[2]] # 30%

In [10]:
# Check the dimensions of each split
dim(h_train)
dim(h_test)


[1] 1105  591
[1]  462  591

In [11]:
# Check Classification in each dataset
summary(h_train$Classification, exact_quantiles = TRUE)
summary(h_test$Classification, exact_quantiles = TRUE)


 Classification
 -1:1028       
 1 :  77       
 Classification
 -1:435        
 1 : 27        


Build GBM Models using Random Grid Search and Extract the Best Model


In [12]:
# Define the criteria for random grid search
search_criteria <- list(strategy = "RandomDiscrete",
                        max_models = 10,
                        seed = 1234)

In [13]:
# Define the range of hyper-parameters for grid search
hyper_params <- list(
    sample_rate = c(0.6, 0.7, 0.8, 0.9),
    col_sample_rate = c(0.6, 0.7, 0.8, 0.9),
    max_depth = c(4, 5, 6)
)

In [14]:
# Set up grid search
# Add a seed for reproducibility
rand_grid <- h2o.grid(
  
    # Core parameters for model training
    x = features,
    y = target,
    training_frame = h_train,
    ntrees = 500,
    learn_rate = 0.05,
    balance_classes = TRUE,
    seed = 1234,
    
    # Settings for Cross-Validation
    nfolds = 5,
    fold_assignment = "Stratified",
    
    # Parameters for early stopping
    stopping_metric = "mean_per_class_error",
    stopping_rounds = 15,
    score_tree_interval = 1,
        
    # Parameters for grid search
    grid_id = "rand_grid",
    hyper_params = hyper_params,
    algorithm = "gbm",
    search_criteria = search_criteria  
  
)


  |======================================================================| 100%

In [15]:
# Sort and show the grid search results
rand_grid <- h2o.getGrid(grid_id = "rand_grid", sort_by = "mean_per_class_error", decreasing = FALSE)
print(rand_grid)


H2O Grid Details
================

Grid ID: rand_grid 
Used hyper parameters: 
  -  col_sample_rate 
  -  max_depth 
  -  sample_rate 
Number of models: 10 
Number of failed models: 0 

Hyper-Parameter Search Summary: ordered by increasing mean_per_class_error
   col_sample_rate max_depth sample_rate         model_ids mean_per_class_error
1              0.6         4         0.8 rand_grid_model_2   0.3678874627318207
2              0.6         4         0.7 rand_grid_model_1  0.37603592905149325
3              0.6         6         0.6 rand_grid_model_5   0.3794658648744252
4              0.8         4         0.8 rand_grid_model_6   0.3818725049269796
5              0.7         4         0.7 rand_grid_model_8   0.3851066248926171
6              0.8         4         0.7 rand_grid_model_9    0.389534589923695
7              0.8         4         0.6 rand_grid_model_7  0.38985042195158925
8              0.9         6         0.7 rand_grid_model_4   0.4093562079943403
9              0.8         6         0.6 rand_grid_model_0  0.41636136237303556
10             0.9         6         0.8 rand_grid_model_3  0.42382763151245645

In [16]:
# Extract the best model from random grid search
best_model_id <- rand_grid@model_ids[[1]] # top of the list
best_model <- h2o.getModel(best_model_id)
print(best_model)


Model Details:
==============

H2OBinomialModel: gbm
Model ID:  rand_grid_model_2 
Model Summary: 
  number_of_trees number_of_internal_trees model_size_in_bytes min_depth
1              61                       61               15119         4
  max_depth mean_depth min_leaves max_leaves mean_leaves
1         4    4.00000         11         16    13.77049


H2OBinomialMetrics: gbm
** Reported on training data. **

MSE:  0.2561832
RMSE:  0.5061454
LogLoss:  0.6409075
Mean Per-Class Error:  0.0004863813
AUC:  0.9999754
Gini:  0.9999509

Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
         -1    1    Error     Rate
-1     1027    1 0.000973  =1/1028
1         0 1030 0.000000  =0/1030
Totals 1027 1031 0.000486  =1/2058

Maximum Metrics: Maximum metrics at their respective thresholds
                        metric threshold    value idx
1                       max f1  0.166784 0.999515  75
2                       max f2  0.166784 0.999806  75
3                 max f0point5  0.166784 0.999224  75
4                 max accuracy  0.166784 0.999514  75
5                max precision  0.365659 1.000000   0
6                   max recall  0.166784 1.000000  75
7              max specificity  0.365659 1.000000   0
8             max absolute_mcc  0.166784 0.999029  75
9   max min_per_class_accuracy  0.166784 0.999027  75
10 max mean_per_class_accuracy  0.166784 0.999514  75

Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`

H2OBinomialMetrics: gbm
** Reported on cross-validation data. **
** 5-fold cross-validation on training data (Metrics computed for combined holdout predictions) **

MSE:  0.06572745
RMSE:  0.2563737
LogLoss:  0.2767561
Mean Per-Class Error:  0.3678875
AUC:  0.6739855
Gini:  0.3479711

Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
        -1   1    Error       Rate
-1     819 209 0.203307  =209/1028
1       41  36 0.532468     =41/77
Totals 860 245 0.226244  =250/1105

Maximum Metrics: Maximum metrics at their respective thresholds
                        metric threshold    value idx
1                       max f1  0.028906 0.223602 174
2                       max f2  0.011951 0.328467 297
3                 max f0point5  0.082901 0.204778  44
4                 max accuracy  0.164703 0.929412   0
5                max precision  0.084812 0.229167  40
6                   max recall  0.001585 1.000000 391
7              max specificity  0.164703 0.999027   0
8             max absolute_mcc  0.028906 0.161951 174
9   max min_per_class_accuracy  0.018432 0.610390 243
10 max mean_per_class_accuracy  0.028906 0.632113 174

Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
Cross-Validation Metrics Summary: 
                                 mean          sd   cv_1_valid  cv_2_valid
accuracy                   0.85497105 0.028622998    0.8867925  0.78629035
auc                        0.70107853 0.040595975   0.79950494  0.64812684
err                        0.14502896 0.028622998   0.11320755  0.21370968
err_count                        32.6    8.057295         24.0        53.0
f0point5                   0.24869587 0.032992177   0.23584905   0.1826484
f1                         0.29050645 0.026739191   0.29411766  0.23188406
f2                         0.35414198 0.018278506     0.390625  0.31746033
lift_top_group              1.0666667   1.5084945          0.0         0.0
logloss                    0.27472907 0.040207863   0.20996137  0.32169387
max_per_class_error         0.5788664   0.0290633          0.5  0.57894737
mcc                        0.23437075 0.032911915    0.2716149  0.15754637
mean_per_class_accuracy     0.6542992 0.019754296    0.7029703  0.61882323
mean_per_class_error        0.3457008 0.019754296    0.2970297  0.38117674
mse                        0.06533862 0.009910801  0.045586333  0.07369682
precision                  0.22766261  0.03529381   0.20833333        0.16
r2                      -0.0136614265 0.017498508 -0.014273335 -0.04174886
recall                      0.4211336   0.0290633          0.5  0.42105263
rmse                       0.25412405 0.019488202   0.21350956   0.2714716
specificity                0.88746476 0.028320558    0.9059406   0.8165939
                          cv_3_valid    cv_4_valid  cv_5_valid
accuracy                  0.84792626          0.85  0.90384614
auc                       0.66700506     0.6582114  0.73254436
err                       0.15207373          0.15  0.09615385
err_count                       33.0          33.0        20.0
f0point5                  0.29411766    0.22222222  0.30864197
f1                         0.3265306    0.26666668  0.33333334
f2                        0.36697248    0.33333334  0.36231884
lift_top_group                   0.0           0.0   5.3333335
logloss                   0.35666713    0.26497382  0.22034925
max_per_class_error              0.6           0.6  0.61538464
mcc                        0.2494203    0.20780657   0.2854656
mean_per_class_accuracy    0.6467005     0.6414634   0.6615385
mean_per_class_error       0.3532995    0.35853657  0.33846155
mse                       0.08653373   0.063971125 0.056905072
precision                 0.27586207           0.2  0.29411766
r2                      -0.034209877 -0.0068951435  0.02882008
recall                           0.4           0.4   0.3846154
rmse                      0.29416618    0.25292513  0.23854785
specificity                 0.893401     0.8829268  0.93846154

In [17]:
# Check performance on test set
h2o.performance(best_model, h_test)


H2OBinomialMetrics: gbm

MSE:  0.05339771
RMSE:  0.2310794
LogLoss:  0.2138174
Mean Per-Class Error:  0.2693487
AUC:  0.7553427
Gini:  0.5106854

Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
        -1  1    Error     Rate
-1     394 41 0.094253  =41/435
1       12 15 0.444444   =12/27
Totals 406 56 0.114719  =53/462

Maximum Metrics: Maximum metrics at their respective thresholds
                        metric threshold    value idx
1                       max f1  0.044940 0.361446  55
2                       max f2  0.032325 0.465686  92
3                 max f0point5  0.121518 0.338983   7
4                 max accuracy  0.121518 0.941558   7
5                max precision  0.121518 0.500000   7
6                   max recall  0.007008 1.000000 329
7              max specificity  0.167899 0.997701   0
8             max absolute_mcc  0.044940 0.331555  55
9   max min_per_class_accuracy  0.032325 0.703704  92
10 max mean_per_class_accuracy  0.032325 0.763346  92

Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
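Individual metrics can also be pulled out of the performance object programmatically, e.g. for logging or comparing candidate models. A brief sketch using the standard h2o accessor functions (`h2o.auc`, `h2o.mse`, `h2o.confusionMatrix`):

```r
# Compute performance on the test set and keep the object around
perf <- h2o.performance(best_model, newdata = h_test)

# Scalar metrics
h2o.auc(perf)   # area under the ROC curve
h2o.mse(perf)   # mean squared error

# Confusion matrix at the F1-optimal threshold
h2o.confusionMatrix(perf)
```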

Making Predictions


In [18]:
# Use the model for predictions
yhat_test <- h2o.predict(best_model, h_test)


  |======================================================================| 100%

In [19]:
# Show first 10 rows
head(yhat_test, 10)


   predict       p-1         p1
1       -1 0.9643505 0.03564950
2       -1 0.9805827 0.01941735
3       -1 0.9763582 0.02364177
4       -1 0.8712040 0.12879599
5       -1 0.9765110 0.02348895
6       -1 0.9884546 0.01154544
7       -1 0.9591888 0.04081122
8       -1 0.9747553 0.02524473
9       -1 0.9782788 0.02172118
10      -1 0.9652645 0.03473554
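The predictions live on the H2O cluster; for downstream use in R they can be combined with the actual labels and pulled into a local data frame. A small sketch using `h2o.cbind` and `as.data.frame` (both standard h2o functions):

```r
# Attach the actual labels to the predictions (both are H2OFrames)
h_results <- h2o.cbind(h_test$Classification, yhat_test)

# Pull the combined frame into a local R data frame for inspection
d_results <- as.data.frame(h_results)
head(d_results)
```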