In [1]:
# Load h2o library
suppressPackageStartupMessages(library(h2o))
In [2]:
# Start and connect to a local H2O cluster
h2o.init(nthreads = -1)
H2O is not running yet, starting it now...
Note: In case of errors look at the following log files:
/tmp/Rtmpf9glXx/h2o_joe_started_from_r.out
/tmp/Rtmpf9glXx/h2o_joe_started_from_r.err
Starting H2O JVM and connecting: ... Connection successful!
R is connected to the H2O cluster:
H2O cluster uptime: 3 seconds 160 milliseconds
H2O cluster version: 3.10.4.4
H2O cluster version age: 3 days
H2O cluster name: H2O_started_from_R_joe_cqn703
H2O cluster total nodes: 1
H2O cluster total memory: 5.21 GB
H2O cluster total cores: 8
H2O cluster allowed cores: 8
H2O cluster healthy: TRUE
H2O Connection ip: localhost
H2O Connection port: 54321
H2O Connection proxy: NA
H2O Internal Security: FALSE
R Version: R version 3.3.2 (2016-10-31)
In [3]:
# Importing data from local CSV
h_secom <- h2o.importFile(path = "secom.csv", destination_frame = "h_secom")
|======================================================================| 100%
In [4]:
# Print out column names
colnames(h_secom)
- 'Classification'
- 'Feature 001'
- 'Feature 002'
- 'Feature 003'
- 'Feature 004'
- 'Feature 005'
- 'Feature 006'
- 'Feature 007'
- 'Feature 008'
- 'Feature 009'
- 'Feature 010'
- 'Feature 011'
- 'Feature 012'
- 'Feature 013'
- 'Feature 014'
- 'Feature 015'
- 'Feature 016'
- 'Feature 017'
- 'Feature 018'
- 'Feature 019'
- 'Feature 020'
- 'Feature 021'
- 'Feature 022'
- 'Feature 023'
- 'Feature 024'
- 'Feature 025'
- 'Feature 026'
- 'Feature 027'
- 'Feature 028'
- 'Feature 029'
- 'Feature 030'
- 'Feature 031'
- 'Feature 032'
- 'Feature 033'
- 'Feature 034'
- 'Feature 035'
- 'Feature 036'
- 'Feature 037'
- 'Feature 038'
- 'Feature 039'
- 'Feature 040'
- 'Feature 041'
- 'Feature 042'
- 'Feature 043'
- 'Feature 044'
- 'Feature 045'
- 'Feature 046'
- 'Feature 047'
- 'Feature 048'
- 'Feature 049'
- 'Feature 050'
- 'Feature 051'
- 'Feature 052'
- 'Feature 053'
- 'Feature 054'
- 'Feature 055'
- 'Feature 056'
- 'Feature 057'
- 'Feature 058'
- 'Feature 059'
- 'Feature 060'
- 'Feature 061'
- 'Feature 062'
- 'Feature 063'
- 'Feature 064'
- 'Feature 065'
- 'Feature 066'
- 'Feature 067'
- 'Feature 068'
- 'Feature 069'
- 'Feature 070'
- 'Feature 071'
- 'Feature 072'
- 'Feature 073'
- 'Feature 074'
- 'Feature 075'
- 'Feature 076'
- 'Feature 077'
- 'Feature 078'
- 'Feature 079'
- 'Feature 080'
- 'Feature 081'
- 'Feature 082'
- 'Feature 083'
- 'Feature 084'
- 'Feature 085'
- 'Feature 086'
- 'Feature 087'
- 'Feature 088'
- 'Feature 089'
- 'Feature 090'
- 'Feature 091'
- 'Feature 092'
- 'Feature 093'
- 'Feature 094'
- 'Feature 095'
- 'Feature 096'
- 'Feature 097'
- 'Feature 098'
- 'Feature 099'
- 'Feature 100'
- 'Feature 101'
- 'Feature 102'
- 'Feature 103'
- 'Feature 104'
- 'Feature 105'
- 'Feature 106'
- 'Feature 107'
- 'Feature 108'
- 'Feature 109'
- 'Feature 110'
- 'Feature 111'
- 'Feature 112'
- 'Feature 113'
- 'Feature 114'
- 'Feature 115'
- 'Feature 116'
- 'Feature 117'
- 'Feature 118'
- 'Feature 119'
- 'Feature 120'
- 'Feature 121'
- 'Feature 122'
- 'Feature 123'
- 'Feature 124'
- 'Feature 125'
- 'Feature 126'
- 'Feature 127'
- 'Feature 128'
- 'Feature 129'
- 'Feature 130'
- 'Feature 131'
- 'Feature 132'
- 'Feature 133'
- 'Feature 134'
- 'Feature 135'
- 'Feature 136'
- 'Feature 137'
- 'Feature 138'
- 'Feature 139'
- 'Feature 140'
- 'Feature 141'
- 'Feature 142'
- 'Feature 143'
- 'Feature 144'
- 'Feature 145'
- 'Feature 146'
- 'Feature 147'
- 'Feature 148'
- 'Feature 149'
- 'Feature 150'
- 'Feature 151'
- 'Feature 152'
- 'Feature 153'
- 'Feature 154'
- 'Feature 155'
- 'Feature 156'
- 'Feature 157'
- 'Feature 158'
- 'Feature 159'
- 'Feature 160'
- 'Feature 161'
- 'Feature 162'
- 'Feature 163'
- 'Feature 164'
- 'Feature 165'
- 'Feature 166'
- 'Feature 167'
- 'Feature 168'
- 'Feature 169'
- 'Feature 170'
- 'Feature 171'
- 'Feature 172'
- 'Feature 173'
- 'Feature 174'
- 'Feature 175'
- 'Feature 176'
- 'Feature 177'
- 'Feature 178'
- 'Feature 179'
- 'Feature 180'
- 'Feature 181'
- 'Feature 182'
- 'Feature 183'
- 'Feature 184'
- 'Feature 185'
- 'Feature 186'
- 'Feature 187'
- 'Feature 188'
- 'Feature 189'
- 'Feature 190'
- 'Feature 191'
- 'Feature 192'
- 'Feature 193'
- 'Feature 194'
- 'Feature 195'
- 'Feature 196'
- 'Feature 197'
- 'Feature 198'
- 'Feature 199'
- 'Feature 200'
- 'Feature 201'
- 'Feature 202'
- 'Feature 203'
- 'Feature 204'
- 'Feature 205'
- 'Feature 206'
- 'Feature 207'
- 'Feature 208'
- 'Feature 209'
- 'Feature 210'
- 'Feature 211'
- 'Feature 212'
- 'Feature 213'
- 'Feature 214'
- 'Feature 215'
- 'Feature 216'
- 'Feature 217'
- 'Feature 218'
- 'Feature 219'
- 'Feature 220'
- 'Feature 221'
- 'Feature 222'
- 'Feature 223'
- 'Feature 224'
- 'Feature 225'
- 'Feature 226'
- 'Feature 227'
- 'Feature 228'
- 'Feature 229'
- 'Feature 230'
- 'Feature 231'
- 'Feature 232'
- 'Feature 233'
- 'Feature 234'
- 'Feature 235'
- 'Feature 236'
- 'Feature 237'
- 'Feature 238'
- 'Feature 239'
- 'Feature 240'
- 'Feature 241'
- 'Feature 242'
- 'Feature 243'
- 'Feature 244'
- 'Feature 245'
- 'Feature 246'
- 'Feature 247'
- 'Feature 248'
- 'Feature 249'
- 'Feature 250'
- 'Feature 251'
- 'Feature 252'
- 'Feature 253'
- 'Feature 254'
- 'Feature 255'
- 'Feature 256'
- 'Feature 257'
- 'Feature 258'
- 'Feature 259'
- 'Feature 260'
- 'Feature 261'
- 'Feature 262'
- 'Feature 263'
- 'Feature 264'
- 'Feature 265'
- 'Feature 266'
- 'Feature 267'
- 'Feature 268'
- 'Feature 269'
- 'Feature 270'
- 'Feature 271'
- 'Feature 272'
- 'Feature 273'
- 'Feature 274'
- 'Feature 275'
- 'Feature 276'
- 'Feature 277'
- 'Feature 278'
- 'Feature 279'
- 'Feature 280'
- 'Feature 281'
- 'Feature 282'
- 'Feature 283'
- 'Feature 284'
- 'Feature 285'
- 'Feature 286'
- 'Feature 287'
- 'Feature 288'
- 'Feature 289'
- 'Feature 290'
- 'Feature 291'
- 'Feature 292'
- 'Feature 293'
- 'Feature 294'
- 'Feature 295'
- 'Feature 296'
- 'Feature 297'
- 'Feature 298'
- 'Feature 299'
- 'Feature 300'
- 'Feature 301'
- 'Feature 302'
- 'Feature 303'
- 'Feature 304'
- 'Feature 305'
- 'Feature 306'
- 'Feature 307'
- 'Feature 308'
- 'Feature 309'
- 'Feature 310'
- 'Feature 311'
- 'Feature 312'
- 'Feature 313'
- 'Feature 314'
- 'Feature 315'
- 'Feature 316'
- 'Feature 317'
- 'Feature 318'
- 'Feature 319'
- 'Feature 320'
- 'Feature 321'
- 'Feature 322'
- 'Feature 323'
- 'Feature 324'
- 'Feature 325'
- 'Feature 326'
- 'Feature 327'
- 'Feature 328'
- 'Feature 329'
- 'Feature 330'
- 'Feature 331'
- 'Feature 332'
- 'Feature 333'
- 'Feature 334'
- 'Feature 335'
- 'Feature 336'
- 'Feature 337'
- 'Feature 338'
- 'Feature 339'
- 'Feature 340'
- 'Feature 341'
- 'Feature 342'
- 'Feature 343'
- 'Feature 344'
- 'Feature 345'
- 'Feature 346'
- 'Feature 347'
- 'Feature 348'
- 'Feature 349'
- 'Feature 350'
- 'Feature 351'
- 'Feature 352'
- 'Feature 353'
- 'Feature 354'
- 'Feature 355'
- 'Feature 356'
- 'Feature 357'
- 'Feature 358'
- 'Feature 359'
- 'Feature 360'
- 'Feature 361'
- 'Feature 362'
- 'Feature 363'
- 'Feature 364'
- 'Feature 365'
- 'Feature 366'
- 'Feature 367'
- 'Feature 368'
- 'Feature 369'
- 'Feature 370'
- 'Feature 371'
- 'Feature 372'
- 'Feature 373'
- 'Feature 374'
- 'Feature 375'
- 'Feature 376'
- 'Feature 377'
- 'Feature 378'
- 'Feature 379'
- 'Feature 380'
- 'Feature 381'
- 'Feature 382'
- 'Feature 383'
- 'Feature 384'
- 'Feature 385'
- 'Feature 386'
- 'Feature 387'
- 'Feature 388'
- 'Feature 389'
- 'Feature 390'
- 'Feature 391'
- 'Feature 392'
- 'Feature 393'
- 'Feature 394'
- 'Feature 395'
- 'Feature 396'
- 'Feature 397'
- 'Feature 398'
- 'Feature 399'
- 'Feature 400'
- 'Feature 401'
- 'Feature 402'
- 'Feature 403'
- 'Feature 404'
- 'Feature 405'
- 'Feature 406'
- 'Feature 407'
- 'Feature 408'
- 'Feature 409'
- 'Feature 410'
- 'Feature 411'
- 'Feature 412'
- 'Feature 413'
- 'Feature 414'
- 'Feature 415'
- 'Feature 416'
- 'Feature 417'
- 'Feature 418'
- 'Feature 419'
- 'Feature 420'
- 'Feature 421'
- 'Feature 422'
- 'Feature 423'
- 'Feature 424'
- 'Feature 425'
- 'Feature 426'
- 'Feature 427'
- 'Feature 428'
- 'Feature 429'
- 'Feature 430'
- 'Feature 431'
- 'Feature 432'
- 'Feature 433'
- 'Feature 434'
- 'Feature 435'
- 'Feature 436'
- 'Feature 437'
- 'Feature 438'
- 'Feature 439'
- 'Feature 440'
- 'Feature 441'
- 'Feature 442'
- 'Feature 443'
- 'Feature 444'
- 'Feature 445'
- 'Feature 446'
- 'Feature 447'
- 'Feature 448'
- 'Feature 449'
- 'Feature 450'
- 'Feature 451'
- 'Feature 452'
- 'Feature 453'
- 'Feature 454'
- 'Feature 455'
- 'Feature 456'
- 'Feature 457'
- 'Feature 458'
- 'Feature 459'
- 'Feature 460'
- 'Feature 461'
- 'Feature 462'
- 'Feature 463'
- 'Feature 464'
- 'Feature 465'
- 'Feature 466'
- 'Feature 467'
- 'Feature 468'
- 'Feature 469'
- 'Feature 470'
- 'Feature 471'
- 'Feature 472'
- 'Feature 473'
- 'Feature 474'
- 'Feature 475'
- 'Feature 476'
- 'Feature 477'
- 'Feature 478'
- 'Feature 479'
- 'Feature 480'
- 'Feature 481'
- 'Feature 482'
- 'Feature 483'
- 'Feature 484'
- 'Feature 485'
- 'Feature 486'
- 'Feature 487'
- 'Feature 488'
- 'Feature 489'
- 'Feature 490'
- 'Feature 491'
- 'Feature 492'
- 'Feature 493'
- 'Feature 494'
- 'Feature 495'
- 'Feature 496'
- 'Feature 497'
- 'Feature 498'
- 'Feature 499'
- 'Feature 500'
- 'Feature 501'
- 'Feature 502'
- 'Feature 503'
- 'Feature 504'
- 'Feature 505'
- 'Feature 506'
- 'Feature 507'
- 'Feature 508'
- 'Feature 509'
- 'Feature 510'
- 'Feature 511'
- 'Feature 512'
- 'Feature 513'
- 'Feature 514'
- 'Feature 515'
- 'Feature 516'
- 'Feature 517'
- 'Feature 518'
- 'Feature 519'
- 'Feature 520'
- 'Feature 521'
- 'Feature 522'
- 'Feature 523'
- 'Feature 524'
- 'Feature 525'
- 'Feature 526'
- 'Feature 527'
- 'Feature 528'
- 'Feature 529'
- 'Feature 530'
- 'Feature 531'
- 'Feature 532'
- 'Feature 533'
- 'Feature 534'
- 'Feature 535'
- 'Feature 536'
- 'Feature 537'
- 'Feature 538'
- 'Feature 539'
- 'Feature 540'
- 'Feature 541'
- 'Feature 542'
- 'Feature 543'
- 'Feature 544'
- 'Feature 545'
- 'Feature 546'
- 'Feature 547'
- 'Feature 548'
- 'Feature 549'
- 'Feature 550'
- 'Feature 551'
- 'Feature 552'
- 'Feature 553'
- 'Feature 554'
- 'Feature 555'
- 'Feature 556'
- 'Feature 557'
- 'Feature 558'
- 'Feature 559'
- 'Feature 560'
- 'Feature 561'
- 'Feature 562'
- 'Feature 563'
- 'Feature 564'
- 'Feature 565'
- 'Feature 566'
- 'Feature 567'
- 'Feature 568'
- 'Feature 569'
- 'Feature 570'
- 'Feature 571'
- 'Feature 572'
- 'Feature 573'
- 'Feature 574'
- 'Feature 575'
- 'Feature 576'
- 'Feature 577'
- 'Feature 578'
- 'Feature 579'
- 'Feature 580'
- 'Feature 581'
- 'Feature 582'
- 'Feature 583'
- 'Feature 584'
- 'Feature 585'
- 'Feature 586'
- 'Feature 587'
- 'Feature 588'
- 'Feature 589'
- 'Feature 590'
In [5]:
# Look at "Classification"
summary(h_secom$Classification, exact_quantiles=TRUE)
Classification
Min. :-1.0000
1st Qu.:-1.0000
Median :-1.0000
Mean :-0.8673
3rd Qu.:-1.0000
Max. : 1.0000
In [6]:
# "Classification" is a column of numerical values
# Convert it to a factor so that H2O fits a classification model rather than a regression
h_secom$Classification <- as.factor(h_secom$Classification)
In [7]:
# Look at "Classification" again
summary(h_secom$Classification, exact_quantiles=TRUE)
Classification
-1:1463
1 : 104
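The two summaries above are consistent with each other: with labels coded -1 and 1, the numeric mean is fully determined by the class counts. A quick check in plain R, using only the counts printed above (the variable names are illustrative and not part of the original notebook):

```r
# Class counts copied from the factor summary above
n_neg <- 1463   # rows labelled -1
n_pos <- 104    # rows labelled  1
n_total <- n_neg + n_pos

# Mean of a -1/1 coded column is (n_pos - n_neg) / n_total
mean_label <- (n_pos - n_neg) / n_total

# Share of the minority class
frac_pos <- n_pos / n_total

cat(sprintf("rows: %d, mean label: %.4f, positive share: %.4f\n",
            n_total, mean_label, frac_pos))
```

The mean comes out at -0.8673, matching the numeric summary, and only about 6.6% of the rows carry the label 1, so the classes are heavily imbalanced.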
In [8]:
# Define target (y) and features (x)
target <- "Classification"
features <- setdiff(colnames(h_secom), target)
print(features)
[1] "Feature 001" "Feature 002" "Feature 003" "Feature 004" "Feature 005"
[6] "Feature 006" "Feature 007" "Feature 008" "Feature 009" "Feature 010"
[11] "Feature 011" "Feature 012" "Feature 013" "Feature 014" "Feature 015"
[16] "Feature 016" "Feature 017" "Feature 018" "Feature 019" "Feature 020"
[21] "Feature 021" "Feature 022" "Feature 023" "Feature 024" "Feature 025"
[26] "Feature 026" "Feature 027" "Feature 028" "Feature 029" "Feature 030"
[31] "Feature 031" "Feature 032" "Feature 033" "Feature 034" "Feature 035"
[36] "Feature 036" "Feature 037" "Feature 038" "Feature 039" "Feature 040"
[41] "Feature 041" "Feature 042" "Feature 043" "Feature 044" "Feature 045"
[46] "Feature 046" "Feature 047" "Feature 048" "Feature 049" "Feature 050"
[51] "Feature 051" "Feature 052" "Feature 053" "Feature 054" "Feature 055"
[56] "Feature 056" "Feature 057" "Feature 058" "Feature 059" "Feature 060"
[61] "Feature 061" "Feature 062" "Feature 063" "Feature 064" "Feature 065"
[66] "Feature 066" "Feature 067" "Feature 068" "Feature 069" "Feature 070"
[71] "Feature 071" "Feature 072" "Feature 073" "Feature 074" "Feature 075"
[76] "Feature 076" "Feature 077" "Feature 078" "Feature 079" "Feature 080"
[81] "Feature 081" "Feature 082" "Feature 083" "Feature 084" "Feature 085"
[86] "Feature 086" "Feature 087" "Feature 088" "Feature 089" "Feature 090"
[91] "Feature 091" "Feature 092" "Feature 093" "Feature 094" "Feature 095"
[96] "Feature 096" "Feature 097" "Feature 098" "Feature 099" "Feature 100"
[101] "Feature 101" "Feature 102" "Feature 103" "Feature 104" "Feature 105"
[106] "Feature 106" "Feature 107" "Feature 108" "Feature 109" "Feature 110"
[111] "Feature 111" "Feature 112" "Feature 113" "Feature 114" "Feature 115"
[116] "Feature 116" "Feature 117" "Feature 118" "Feature 119" "Feature 120"
[121] "Feature 121" "Feature 122" "Feature 123" "Feature 124" "Feature 125"
[126] "Feature 126" "Feature 127" "Feature 128" "Feature 129" "Feature 130"
[131] "Feature 131" "Feature 132" "Feature 133" "Feature 134" "Feature 135"
[136] "Feature 136" "Feature 137" "Feature 138" "Feature 139" "Feature 140"
[141] "Feature 141" "Feature 142" "Feature 143" "Feature 144" "Feature 145"
[146] "Feature 146" "Feature 147" "Feature 148" "Feature 149" "Feature 150"
[151] "Feature 151" "Feature 152" "Feature 153" "Feature 154" "Feature 155"
[156] "Feature 156" "Feature 157" "Feature 158" "Feature 159" "Feature 160"
[161] "Feature 161" "Feature 162" "Feature 163" "Feature 164" "Feature 165"
[166] "Feature 166" "Feature 167" "Feature 168" "Feature 169" "Feature 170"
[171] "Feature 171" "Feature 172" "Feature 173" "Feature 174" "Feature 175"
[176] "Feature 176" "Feature 177" "Feature 178" "Feature 179" "Feature 180"
[181] "Feature 181" "Feature 182" "Feature 183" "Feature 184" "Feature 185"
[186] "Feature 186" "Feature 187" "Feature 188" "Feature 189" "Feature 190"
[191] "Feature 191" "Feature 192" "Feature 193" "Feature 194" "Feature 195"
[196] "Feature 196" "Feature 197" "Feature 198" "Feature 199" "Feature 200"
[201] "Feature 201" "Feature 202" "Feature 203" "Feature 204" "Feature 205"
[206] "Feature 206" "Feature 207" "Feature 208" "Feature 209" "Feature 210"
[211] "Feature 211" "Feature 212" "Feature 213" "Feature 214" "Feature 215"
[216] "Feature 216" "Feature 217" "Feature 218" "Feature 219" "Feature 220"
[221] "Feature 221" "Feature 222" "Feature 223" "Feature 224" "Feature 225"
[226] "Feature 226" "Feature 227" "Feature 228" "Feature 229" "Feature 230"
[231] "Feature 231" "Feature 232" "Feature 233" "Feature 234" "Feature 235"
[236] "Feature 236" "Feature 237" "Feature 238" "Feature 239" "Feature 240"
[241] "Feature 241" "Feature 242" "Feature 243" "Feature 244" "Feature 245"
[246] "Feature 246" "Feature 247" "Feature 248" "Feature 249" "Feature 250"
[251] "Feature 251" "Feature 252" "Feature 253" "Feature 254" "Feature 255"
[256] "Feature 256" "Feature 257" "Feature 258" "Feature 259" "Feature 260"
[261] "Feature 261" "Feature 262" "Feature 263" "Feature 264" "Feature 265"
[266] "Feature 266" "Feature 267" "Feature 268" "Feature 269" "Feature 270"
[271] "Feature 271" "Feature 272" "Feature 273" "Feature 274" "Feature 275"
[276] "Feature 276" "Feature 277" "Feature 278" "Feature 279" "Feature 280"
[281] "Feature 281" "Feature 282" "Feature 283" "Feature 284" "Feature 285"
[286] "Feature 286" "Feature 287" "Feature 288" "Feature 289" "Feature 290"
[291] "Feature 291" "Feature 292" "Feature 293" "Feature 294" "Feature 295"
[296] "Feature 296" "Feature 297" "Feature 298" "Feature 299" "Feature 300"
[301] "Feature 301" "Feature 302" "Feature 303" "Feature 304" "Feature 305"
[306] "Feature 306" "Feature 307" "Feature 308" "Feature 309" "Feature 310"
[311] "Feature 311" "Feature 312" "Feature 313" "Feature 314" "Feature 315"
[316] "Feature 316" "Feature 317" "Feature 318" "Feature 319" "Feature 320"
[321] "Feature 321" "Feature 322" "Feature 323" "Feature 324" "Feature 325"
[326] "Feature 326" "Feature 327" "Feature 328" "Feature 329" "Feature 330"
[331] "Feature 331" "Feature 332" "Feature 333" "Feature 334" "Feature 335"
[336] "Feature 336" "Feature 337" "Feature 338" "Feature 339" "Feature 340"
[341] "Feature 341" "Feature 342" "Feature 343" "Feature 344" "Feature 345"
[346] "Feature 346" "Feature 347" "Feature 348" "Feature 349" "Feature 350"
[351] "Feature 351" "Feature 352" "Feature 353" "Feature 354" "Feature 355"
[356] "Feature 356" "Feature 357" "Feature 358" "Feature 359" "Feature 360"
[361] "Feature 361" "Feature 362" "Feature 363" "Feature 364" "Feature 365"
[366] "Feature 366" "Feature 367" "Feature 368" "Feature 369" "Feature 370"
[371] "Feature 371" "Feature 372" "Feature 373" "Feature 374" "Feature 375"
[376] "Feature 376" "Feature 377" "Feature 378" "Feature 379" "Feature 380"
[381] "Feature 381" "Feature 382" "Feature 383" "Feature 384" "Feature 385"
[386] "Feature 386" "Feature 387" "Feature 388" "Feature 389" "Feature 390"
[391] "Feature 391" "Feature 392" "Feature 393" "Feature 394" "Feature 395"
[396] "Feature 396" "Feature 397" "Feature 398" "Feature 399" "Feature 400"
[401] "Feature 401" "Feature 402" "Feature 403" "Feature 404" "Feature 405"
[406] "Feature 406" "Feature 407" "Feature 408" "Feature 409" "Feature 410"
[411] "Feature 411" "Feature 412" "Feature 413" "Feature 414" "Feature 415"
[416] "Feature 416" "Feature 417" "Feature 418" "Feature 419" "Feature 420"
[421] "Feature 421" "Feature 422" "Feature 423" "Feature 424" "Feature 425"
[426] "Feature 426" "Feature 427" "Feature 428" "Feature 429" "Feature 430"
[431] "Feature 431" "Feature 432" "Feature 433" "Feature 434" "Feature 435"
[436] "Feature 436" "Feature 437" "Feature 438" "Feature 439" "Feature 440"
[441] "Feature 441" "Feature 442" "Feature 443" "Feature 444" "Feature 445"
[446] "Feature 446" "Feature 447" "Feature 448" "Feature 449" "Feature 450"
[451] "Feature 451" "Feature 452" "Feature 453" "Feature 454" "Feature 455"
[456] "Feature 456" "Feature 457" "Feature 458" "Feature 459" "Feature 460"
[461] "Feature 461" "Feature 462" "Feature 463" "Feature 464" "Feature 465"
[466] "Feature 466" "Feature 467" "Feature 468" "Feature 469" "Feature 470"
[471] "Feature 471" "Feature 472" "Feature 473" "Feature 474" "Feature 475"
[476] "Feature 476" "Feature 477" "Feature 478" "Feature 479" "Feature 480"
[481] "Feature 481" "Feature 482" "Feature 483" "Feature 484" "Feature 485"
[486] "Feature 486" "Feature 487" "Feature 488" "Feature 489" "Feature 490"
[491] "Feature 491" "Feature 492" "Feature 493" "Feature 494" "Feature 495"
[496] "Feature 496" "Feature 497" "Feature 498" "Feature 499" "Feature 500"
[501] "Feature 501" "Feature 502" "Feature 503" "Feature 504" "Feature 505"
[506] "Feature 506" "Feature 507" "Feature 508" "Feature 509" "Feature 510"
[511] "Feature 511" "Feature 512" "Feature 513" "Feature 514" "Feature 515"
[516] "Feature 516" "Feature 517" "Feature 518" "Feature 519" "Feature 520"
[521] "Feature 521" "Feature 522" "Feature 523" "Feature 524" "Feature 525"
[526] "Feature 526" "Feature 527" "Feature 528" "Feature 529" "Feature 530"
[531] "Feature 531" "Feature 532" "Feature 533" "Feature 534" "Feature 535"
[536] "Feature 536" "Feature 537" "Feature 538" "Feature 539" "Feature 540"
[541] "Feature 541" "Feature 542" "Feature 543" "Feature 544" "Feature 545"
[546] "Feature 546" "Feature 547" "Feature 548" "Feature 549" "Feature 550"
[551] "Feature 551" "Feature 552" "Feature 553" "Feature 554" "Feature 555"
[556] "Feature 556" "Feature 557" "Feature 558" "Feature 559" "Feature 560"
[561] "Feature 561" "Feature 562" "Feature 563" "Feature 564" "Feature 565"
[566] "Feature 566" "Feature 567" "Feature 568" "Feature 569" "Feature 570"
[571] "Feature 571" "Feature 572" "Feature 573" "Feature 574" "Feature 575"
[576] "Feature 576" "Feature 577" "Feature 578" "Feature 579" "Feature 580"
[581] "Feature 581" "Feature 582" "Feature 583" "Feature 584" "Feature 585"
[586] "Feature 586" "Feature 587" "Feature 588" "Feature 589" "Feature 590"
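`setdiff()` keeps every column name except the target, in the original order. A minimal base-R sketch that rebuilds the same names locally (the `sprintf()` pattern is an assumption inferred from the printed names, not something read from the file):

```r
# Rebuild the column names seen above without touching the H2O frame
all_cols <- c("Classification", sprintf("Feature %03d", 1:590))

target   <- "Classification"
features <- setdiff(all_cols, target)   # drops the target, keeps order

length(features)      # number of predictors
head(features, 3)
```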
In [9]:
# Split the dataset into training and test sets (the split is approximate, not exactly 70/30)
h_split <- h2o.splitFrame(h_secom, ratios = 0.7, seed = 1234)
h_train <- h_split[[1]] # about 70% of rows
h_test <- h_split[[2]] # about 30% of rows
In [10]:
# Look at the size
dim(h_train)
dim(h_test)
[1] 1105  591
[1]  462  591
In [11]:
# Check Classification in each dataset
summary(h_train$Classification, exact_quantiles = TRUE)
summary(h_test$Classification, exact_quantiles = TRUE)
Classification
-1:1028
1 : 77
Classification
-1:435
1 : 27
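`h2o.splitFrame()` assigns rows at random, so `ratios = 0.7` yields an approximate rather than exact 70/30 split, and the minority class is not stratified across the two frames. The arithmetic below, in plain R with the counts printed above, shows how close this particular split came (variable names are illustrative):

```r
# Row and class counts copied from the outputs above
train <- c(neg = 1028, pos = 77)
test  <- c(neg = 435,  pos = 27)

n_train <- sum(train)                        # 1105
n_test  <- sum(test)                         # 462

train_share    <- n_train / (n_train + n_test)  # share of rows in training
pos_rate_train <- train[["pos"]] / n_train      # minority rate in training
pos_rate_test  <- test[["pos"]]  / n_test       # minority rate in test

round(c(train_share = train_share,
        pos_rate_train = pos_rate_train,
        pos_rate_test = pos_rate_test), 4)
```

Training holds about 70.5% of the rows, and label 1 makes up roughly 7.0% of the training rows against 5.8% of the test rows.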
In [12]:
# H2O Gradient Boosting Machine with default settings
model <- h2o.gbm(x = features,
                 y = target,
                 training_frame = h_train,
                 seed = 1234)
Warning message in .h2o.startModelJob(algo, params, h2oRestApiVersion):
“Dropping constant columns: [Feature 516, Feature 234, Feature 233, Feature 236, Feature 235, Feature 510, Feature 238, Feature 513, Feature 237, Feature 479, Feature 515, Feature 514, Feature 193, Feature 192, Feature 195, Feature 194, Feature 075, Feature 230, Feature 232, Feature 231, Feature 529, Feature 244, Feature 365, Feature 401, Feature 400, Feature 006, Feature 403, Feature 402, Feature 405, Feature 404, Feature 241, Feature 482, Feature 243, Feature 242, Feature 180, Feature 179, Feature 459, Feature 050, Feature 053, Feature 450, Feature 210, Feature 331, Feature 452, Feature 330, Feature 451, Feature 191, Feature 070, Feature 190, Feature 506, Feature 505, Feature 508, Feature 507, Feature 509, Feature 465, Feature 343, Feature 464, Feature 467, Feature 466, Feature 227, Feature 348, Feature 502, Feature 504, Feature 503, Feature 463, Feature 187, Feature 462, Feature 399, Feature 277, Feature 398, Feature 315, Feature 314, Feature 316, Feature 150, Feature 395, Feature 397, Feature 396, Feature 329, Feature 323, Feature 326, Feature 207, Feature 328, Feature 327, Feature 285, Feature 043, Feature 539, Feature 538, Feature 531, Feature 014, Feature 376, Feature 530, Feature 258, Feature 379, Feature 533, Feature 257, Feature 499, Feature 532, Feature 535, Feature 259, Feature 534, Feature 537, Feature 415, Feature 536, Feature 371, Feature 370, Feature 373, Feature 098, Feature 372, Feature 375, Feature 374, Feature 267, Feature 266, Feature 423, Feature 380, Feature 261, Feature 382, Feature 260, Feature 381, Feature 142, Feature 263, Feature 262, Feature 265, Feature 264].
”
|======================================================================| 100%
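The warning lists 122 of the 590 features, which leaves 468 for the model. A column is constant when it holds a single distinct value in the training frame, so no tree can split on it. H2O performs this check inside the cluster; the sketch below is only a plain-R analogue on a made-up data frame (`toy` and `is_constant` are hypothetical names) to show what is being tested. Ignoring missing values is a choice of this sketch, and H2O's own handling of them may differ.

```r
# Toy data: column b never changes, column c changes only via NA
toy <- data.frame(
  a = c(1.2, 3.4, 5.6, 7.8),
  b = c(100, 100, 100, 100),
  c = c(0, NA, 0, 0)
)

# TRUE when a column has at most one distinct non-missing value
is_constant <- function(col) length(unique(col[!is.na(col)])) <= 1

const_cols <- names(toy)[vapply(toy, is_constant, logical(1))]
const_cols                       # columns flagged as constant

kept <- setdiff(names(toy), const_cols)
kept                             # columns that remain usable
```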
In [13]:
# Print out model summary
summary(model)
Model Details:
==============
H2OBinomialModel: gbm
Model Key: GBM_model_R_1492548973134_3
Model Summary:
number_of_trees number_of_internal_trees model_size_in_bytes min_depth
1 50 50 11653 5
max_depth mean_depth min_leaves max_leaves mean_leaves
1 5 5.00000 7 18 12.56000
H2OBinomialMetrics: gbm
** Reported on training data. **
MSE: 0.004654337
RMSE: 0.0682227
LogLoss: 0.03489075
Mean Per-Class Error: 0
AUC: 1
Gini: 1
Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
-1 1 Error Rate
-1 1028 0 0.000000 =0/1028
1 0 77 0.000000 =0/77
Totals 1028 77 0.000000 =0/1105
Maximum Metrics: Maximum metrics at their respective thresholds
metric threshold value idx
1 max f1 0.510169 1.000000 75
2 max f2 0.510169 1.000000 75
3 max f0point5 0.510169 1.000000 75
4 max accuracy 0.510169 1.000000 75
5 max precision 0.929811 1.000000 0
6 max recall 0.510169 1.000000 75
7 max specificity 0.929811 1.000000 0
8 max absolute_mcc 0.510169 1.000000 75
9 max min_per_class_accuracy 0.510169 1.000000 75
10 max mean_per_class_accuracy 0.510169 1.000000 75
Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
Scoring History:
timestamp duration number_of_trees training_rmse training_logloss
1 2017-04-18 21:56:21 0.010 sec 0 0.25461 0.25282
2 2017-04-18 21:56:21 0.430 sec 1 0.24691 0.23022
3 2017-04-18 21:56:21 0.673 sec 2 0.23629 0.20687
4 2017-04-18 21:56:21 0.934 sec 3 0.22676 0.19006
5 2017-04-18 21:56:22 1.086 sec 4 0.21903 0.17883
training_auc training_lift training_classification_error
1 0.50000 1.00000 0.93032
2 0.80543 9.32792 0.07602
3 0.90912 14.35065 0.06063
4 0.93813 14.35065 0.03620
5 0.95722 14.35065 0.03077
---
timestamp duration number_of_trees training_rmse
35 2017-04-18 21:56:24 3.674 sec 34 0.10475
36 2017-04-18 21:56:24 3.768 sec 35 0.10214
37 2017-04-18 21:56:24 3.849 sec 36 0.09871
38 2017-04-18 21:56:24 3.928 sec 37 0.09683
39 2017-04-18 21:56:25 4.003 sec 38 0.09387
40 2017-04-18 21:56:25 4.754 sec 50 0.06822
training_logloss training_auc training_lift training_classification_error
35 0.05787 0.99999 14.35065 0.00090
36 0.05603 0.99999 14.35065 0.00090
37 0.05377 0.99999 14.35065 0.00090
38 0.05254 1.00000 14.35065 0.00000
39 0.05051 1.00000 14.35065 0.00000
40 0.03489 1.00000 14.35065 0.00000
Variable Importances: (Extract with `h2o.varimp`)
=================================================
Variable Importances:
variable relative_importance scaled_importance percentage
1 Feature 060 13.682457 1.000000 0.059038
2 Feature 065 7.080432 0.517483 0.030551
3 Feature 122 6.408298 0.468359 0.027651
4 Feature 017 5.218538 0.381404 0.022517
5 Feature 133 4.659705 0.340561 0.020106
---
variable relative_importance scaled_importance percentage
463 Feature 579 0.000000 0.000000 0.000000
464 Feature 582 0.000000 0.000000 0.000000
465 Feature 584 0.000000 0.000000 0.000000
466 Feature 585 0.000000 0.000000 0.000000
467 Feature 586 0.000000 0.000000 0.000000
468 Feature 590 0.000000 0.000000 0.000000
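The metrics above are computed on the same rows the model was trained on, so an AUC of 1 says little about unseen data. One common way to get a less optimistic estimate without touching the test set is k-fold cross-validation. The function below is a sketch and not part of the original notebook: `fit_gbm_cv` is a hypothetical helper, and the choice of 5 folds with early stopping on AUC is an assumption made for illustration. It relies on the `nfolds`, `stopping_metric`, `stopping_rounds` and `score_tree_interval` arguments of `h2o.gbm()`.

```r
# Hypothetical helper: GBM with k-fold cross-validation and early stopping.
# Calling it needs a running H2O cluster; defining it does not.
fit_gbm_cv <- function(frame, x, y, nfolds = 5, seed = 1234) {
  h2o::h2o.gbm(
    x = x,
    y = y,
    training_frame = frame,
    nfolds = nfolds,              # k-fold cross-validation on the training frame
    stopping_metric = "AUC",      # stop adding trees when AUC stops improving
    stopping_rounds = 3,
    score_tree_interval = 5,      # score every 5 trees so stopping can trigger
    seed = seed
  )
}

# Usage, assuming the objects defined earlier in this notebook:
# model_cv <- fit_gbm_cv(h_train, features, target)
# h2o.auc(model_cv, xval = TRUE)   # cross-validated AUC
```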
In [14]:
# Check performance on test set
h2o.performance(model, h_test)
H2OBinomialMetrics: gbm
MSE: 0.05435156
RMSE: 0.2331342
LogLoss: 0.2154845
Mean Per-Class Error: 0.3260536
AUC: 0.7404427
Gini: 0.4808855
Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
-1 1 Error Rate
-1 393 42 0.096552 =42/435
1 15 12 0.555556 =15/27
Totals 408 54 0.123377 =57/462
Maximum Metrics: Maximum metrics at their respective thresholds
metric threshold value idx
1 max f1 0.056665 0.296296 53
2 max f2 0.038989 0.382653 86
3 max f0point5 0.084880 0.259259 26
4 max accuracy 0.584555 0.939394 0
5 max precision 0.084880 0.259259 26
6 max recall 0.007542 1.000000 375
7 max specificity 0.584555 0.997701 0
8 max absolute_mcc 0.056665 0.254007 53
9 max min_per_class_accuracy 0.026387 0.703704 137
10 max mean_per_class_accuracy 0.023328 0.711750 153
Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
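The headline numbers in this report can be reproduced from the confusion matrix with a few lines of arithmetic, which helps when reading them. A sketch in plain R, treating label 1 as the positive class and copying the four cell counts and the AUC from the output above:

```r
# Cells of the test-set confusion matrix (rows: actual, columns: predicted)
tn <- 393; fp <- 42    # actual -1
fn <- 15;  tp <- 12    # actual  1

precision <- tp / (tp + fp)                     # 12 / 54
recall    <- tp / (tp + fn)                     # 12 / 27
f1        <- 2 * tp / (2 * tp + fp + fn)        # 24 / 81

error_rate <- (fp + fn) / (tn + fp + fn + tp)   # 57 / 462
mean_per_class_error <- (fp / (tn + fp) + fn / (fn + tp)) / 2

auc  <- 0.7404427
gini <- 2 * auc - 1                             # Gini coefficient derived from AUC

round(c(precision = precision, recall = recall, f1 = f1,
        error_rate = error_rate,
        mean_per_class_error = mean_per_class_error,
        gini = gini), 4)
```

The F1 of 0.2963 matches `max f1`, and the error rate, mean per-class error and Gini agree with the report up to rounding. At this threshold the model finds 12 of the 27 rows labelled 1, and 42 of its 54 positive calls are wrong, a large drop from the perfect training metrics.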
In [15]:
# Use the model for predictions
yhat_test <- h2o.predict(model, h_test)
|======================================================================| 100%
In [16]:
# Show first 10 rows
head(yhat_test, 10)
predict p-1 p1
-1 0.9891922 0.010807836
-1 0.9667993 0.033200722
-1 0.9404231 0.059576900
-1 0.8932517 0.106748270
-1 0.9900038 0.009996166
-1 0.9528278 0.047172190
-1 0.8492997 0.150700332
-1 0.9855647 0.014435350
-1 0.9465926 0.053407411
-1 0.9617147 0.038285313
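In the prediction frame, `p-1` and `p1` are the class probabilities and `predict` is the label obtained by comparing `p1` with a cut-off. H2O's documentation describes that cut-off as the max-F1 threshold from the training metrics when no validation frame is supplied; the value 0.510169 from the model summary is used here on that assumption. The plain-R snippet below re-applies two cut-offs to the ten `p1` values shown above to illustrate how strongly the labels depend on that choice (`label_at` is a hypothetical helper):

```r
# p1 values copied from the ten rows above
p1 <- c(0.010807836, 0.033200722, 0.059576900, 0.106748270, 0.009996166,
        0.047172190, 0.150700332, 0.014435350, 0.053407411, 0.038285313)

# Label a row 1 when p1 reaches the cut-off, otherwise -1
label_at <- function(p, threshold) ifelse(p >= threshold, 1, -1)

train_cut <- 0.510169   # F1-optimal threshold on training data (model summary)
test_cut  <- 0.056665   # F1-optimal threshold on the test set (performance report)

table(label_at(p1, train_cut))   # all ten rows labelled -1
table(label_at(p1, test_cut))    # three rows flip to 1
```

The second cut-off is shown only for comparison; choosing a threshold on the test set would bias the evaluation.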
Content source: woobe/h2o_tutorials