Using Random Grid Search to Fine-Tune Model Parameters
In [1]:
# Load h2o library
suppressPackageStartupMessages(library(h2o))
In [2]:
# Start and connect to a local H2O cluster
h2o.init(nthreads = -1)
H2O is not running yet, starting it now...
Note: In case of errors look at the following log files:
/tmp/RtmpCsdpyq/h2o_joe_started_from_r.out
/tmp/RtmpCsdpyq/h2o_joe_started_from_r.err
Starting H2O JVM and connecting: .. Connection successful!
R is connected to the H2O cluster:
H2O cluster uptime: 2 seconds 539 milliseconds
H2O cluster version: 3.10.4.4
H2O cluster version age: 3 days
H2O cluster name: H2O_started_from_R_joe_vlz382
H2O cluster total nodes: 1
H2O cluster total memory: 5.21 GB
H2O cluster total cores: 8
H2O cluster allowed cores: 8
H2O cluster healthy: TRUE
H2O Connection ip: localhost
H2O Connection port: 54321
H2O Connection proxy: NA
H2O Internal Security: FALSE
R Version: R version 3.3.2 (2016-10-31)
In [3]:
# Importing data from local CSV
h_secom <- h2o.importFile(path = "secom.csv", destination_frame = "h_secom")
|======================================================================| 100%
In [4]:
# Print out column names
colnames(h_secom)
- 'Classification'
- 'Feature 001'
- 'Feature 002'
- 'Feature 003'
- 'Feature 004'
- 'Feature 005'
- 'Feature 006'
- 'Feature 007'
- 'Feature 008'
- 'Feature 009'
- 'Feature 010'
- 'Feature 011'
- 'Feature 012'
- 'Feature 013'
- 'Feature 014'
- 'Feature 015'
- 'Feature 016'
- 'Feature 017'
- 'Feature 018'
- 'Feature 019'
- 'Feature 020'
- 'Feature 021'
- 'Feature 022'
- 'Feature 023'
- 'Feature 024'
- 'Feature 025'
- 'Feature 026'
- 'Feature 027'
- 'Feature 028'
- 'Feature 029'
- 'Feature 030'
- 'Feature 031'
- 'Feature 032'
- 'Feature 033'
- 'Feature 034'
- 'Feature 035'
- 'Feature 036'
- 'Feature 037'
- 'Feature 038'
- 'Feature 039'
- 'Feature 040'
- 'Feature 041'
- 'Feature 042'
- 'Feature 043'
- 'Feature 044'
- 'Feature 045'
- 'Feature 046'
- 'Feature 047'
- 'Feature 048'
- 'Feature 049'
- 'Feature 050'
- 'Feature 051'
- 'Feature 052'
- 'Feature 053'
- 'Feature 054'
- 'Feature 055'
- 'Feature 056'
- 'Feature 057'
- 'Feature 058'
- 'Feature 059'
- 'Feature 060'
- 'Feature 061'
- 'Feature 062'
- 'Feature 063'
- 'Feature 064'
- 'Feature 065'
- 'Feature 066'
- 'Feature 067'
- 'Feature 068'
- 'Feature 069'
- 'Feature 070'
- 'Feature 071'
- 'Feature 072'
- 'Feature 073'
- 'Feature 074'
- 'Feature 075'
- 'Feature 076'
- 'Feature 077'
- 'Feature 078'
- 'Feature 079'
- 'Feature 080'
- 'Feature 081'
- 'Feature 082'
- 'Feature 083'
- 'Feature 084'
- 'Feature 085'
- 'Feature 086'
- 'Feature 087'
- 'Feature 088'
- 'Feature 089'
- 'Feature 090'
- 'Feature 091'
- 'Feature 092'
- 'Feature 093'
- 'Feature 094'
- 'Feature 095'
- 'Feature 096'
- 'Feature 097'
- 'Feature 098'
- 'Feature 099'
- 'Feature 100'
- 'Feature 101'
- 'Feature 102'
- 'Feature 103'
- 'Feature 104'
- 'Feature 105'
- 'Feature 106'
- 'Feature 107'
- 'Feature 108'
- 'Feature 109'
- 'Feature 110'
- 'Feature 111'
- 'Feature 112'
- 'Feature 113'
- 'Feature 114'
- 'Feature 115'
- 'Feature 116'
- 'Feature 117'
- 'Feature 118'
- 'Feature 119'
- 'Feature 120'
- 'Feature 121'
- 'Feature 122'
- 'Feature 123'
- 'Feature 124'
- 'Feature 125'
- 'Feature 126'
- 'Feature 127'
- 'Feature 128'
- 'Feature 129'
- 'Feature 130'
- 'Feature 131'
- 'Feature 132'
- 'Feature 133'
- 'Feature 134'
- 'Feature 135'
- 'Feature 136'
- 'Feature 137'
- 'Feature 138'
- 'Feature 139'
- 'Feature 140'
- 'Feature 141'
- 'Feature 142'
- 'Feature 143'
- 'Feature 144'
- 'Feature 145'
- 'Feature 146'
- 'Feature 147'
- 'Feature 148'
- 'Feature 149'
- 'Feature 150'
- 'Feature 151'
- 'Feature 152'
- 'Feature 153'
- 'Feature 154'
- 'Feature 155'
- 'Feature 156'
- 'Feature 157'
- 'Feature 158'
- 'Feature 159'
- 'Feature 160'
- 'Feature 161'
- 'Feature 162'
- 'Feature 163'
- 'Feature 164'
- 'Feature 165'
- 'Feature 166'
- 'Feature 167'
- 'Feature 168'
- 'Feature 169'
- 'Feature 170'
- 'Feature 171'
- 'Feature 172'
- 'Feature 173'
- 'Feature 174'
- 'Feature 175'
- 'Feature 176'
- 'Feature 177'
- 'Feature 178'
- 'Feature 179'
- 'Feature 180'
- 'Feature 181'
- 'Feature 182'
- 'Feature 183'
- 'Feature 184'
- 'Feature 185'
- 'Feature 186'
- 'Feature 187'
- 'Feature 188'
- 'Feature 189'
- 'Feature 190'
- 'Feature 191'
- 'Feature 192'
- 'Feature 193'
- 'Feature 194'
- 'Feature 195'
- 'Feature 196'
- 'Feature 197'
- 'Feature 198'
- 'Feature 199'
- 'Feature 200'
- 'Feature 201'
- 'Feature 202'
- 'Feature 203'
- 'Feature 204'
- 'Feature 205'
- 'Feature 206'
- 'Feature 207'
- 'Feature 208'
- 'Feature 209'
- 'Feature 210'
- 'Feature 211'
- 'Feature 212'
- 'Feature 213'
- 'Feature 214'
- 'Feature 215'
- 'Feature 216'
- 'Feature 217'
- 'Feature 218'
- 'Feature 219'
- 'Feature 220'
- 'Feature 221'
- 'Feature 222'
- 'Feature 223'
- 'Feature 224'
- 'Feature 225'
- 'Feature 226'
- 'Feature 227'
- 'Feature 228'
- 'Feature 229'
- 'Feature 230'
- 'Feature 231'
- 'Feature 232'
- 'Feature 233'
- 'Feature 234'
- 'Feature 235'
- 'Feature 236'
- 'Feature 237'
- 'Feature 238'
- 'Feature 239'
- 'Feature 240'
- 'Feature 241'
- 'Feature 242'
- 'Feature 243'
- 'Feature 244'
- 'Feature 245'
- 'Feature 246'
- 'Feature 247'
- 'Feature 248'
- 'Feature 249'
- 'Feature 250'
- 'Feature 251'
- 'Feature 252'
- 'Feature 253'
- 'Feature 254'
- 'Feature 255'
- 'Feature 256'
- 'Feature 257'
- 'Feature 258'
- 'Feature 259'
- 'Feature 260'
- 'Feature 261'
- 'Feature 262'
- 'Feature 263'
- 'Feature 264'
- 'Feature 265'
- 'Feature 266'
- 'Feature 267'
- 'Feature 268'
- 'Feature 269'
- 'Feature 270'
- 'Feature 271'
- 'Feature 272'
- 'Feature 273'
- 'Feature 274'
- 'Feature 275'
- 'Feature 276'
- 'Feature 277'
- 'Feature 278'
- 'Feature 279'
- 'Feature 280'
- 'Feature 281'
- 'Feature 282'
- 'Feature 283'
- 'Feature 284'
- 'Feature 285'
- 'Feature 286'
- 'Feature 287'
- 'Feature 288'
- 'Feature 289'
- 'Feature 290'
- 'Feature 291'
- 'Feature 292'
- 'Feature 293'
- 'Feature 294'
- 'Feature 295'
- 'Feature 296'
- 'Feature 297'
- 'Feature 298'
- 'Feature 299'
- 'Feature 300'
- 'Feature 301'
- 'Feature 302'
- 'Feature 303'
- 'Feature 304'
- 'Feature 305'
- 'Feature 306'
- 'Feature 307'
- 'Feature 308'
- 'Feature 309'
- 'Feature 310'
- 'Feature 311'
- 'Feature 312'
- 'Feature 313'
- 'Feature 314'
- 'Feature 315'
- 'Feature 316'
- 'Feature 317'
- 'Feature 318'
- 'Feature 319'
- 'Feature 320'
- 'Feature 321'
- 'Feature 322'
- 'Feature 323'
- 'Feature 324'
- 'Feature 325'
- 'Feature 326'
- 'Feature 327'
- 'Feature 328'
- 'Feature 329'
- 'Feature 330'
- 'Feature 331'
- 'Feature 332'
- 'Feature 333'
- 'Feature 334'
- 'Feature 335'
- 'Feature 336'
- 'Feature 337'
- 'Feature 338'
- 'Feature 339'
- 'Feature 340'
- 'Feature 341'
- 'Feature 342'
- 'Feature 343'
- 'Feature 344'
- 'Feature 345'
- 'Feature 346'
- 'Feature 347'
- 'Feature 348'
- 'Feature 349'
- 'Feature 350'
- 'Feature 351'
- 'Feature 352'
- 'Feature 353'
- 'Feature 354'
- 'Feature 355'
- 'Feature 356'
- 'Feature 357'
- 'Feature 358'
- 'Feature 359'
- 'Feature 360'
- 'Feature 361'
- 'Feature 362'
- 'Feature 363'
- 'Feature 364'
- 'Feature 365'
- 'Feature 366'
- 'Feature 367'
- 'Feature 368'
- 'Feature 369'
- 'Feature 370'
- 'Feature 371'
- 'Feature 372'
- 'Feature 373'
- 'Feature 374'
- 'Feature 375'
- 'Feature 376'
- 'Feature 377'
- 'Feature 378'
- 'Feature 379'
- 'Feature 380'
- 'Feature 381'
- 'Feature 382'
- 'Feature 383'
- 'Feature 384'
- 'Feature 385'
- 'Feature 386'
- 'Feature 387'
- 'Feature 388'
- 'Feature 389'
- 'Feature 390'
- 'Feature 391'
- 'Feature 392'
- 'Feature 393'
- 'Feature 394'
- 'Feature 395'
- 'Feature 396'
- 'Feature 397'
- 'Feature 398'
- 'Feature 399'
- 'Feature 400'
- 'Feature 401'
- 'Feature 402'
- 'Feature 403'
- 'Feature 404'
- 'Feature 405'
- 'Feature 406'
- 'Feature 407'
- 'Feature 408'
- 'Feature 409'
- 'Feature 410'
- 'Feature 411'
- 'Feature 412'
- 'Feature 413'
- 'Feature 414'
- 'Feature 415'
- 'Feature 416'
- 'Feature 417'
- 'Feature 418'
- 'Feature 419'
- 'Feature 420'
- 'Feature 421'
- 'Feature 422'
- 'Feature 423'
- 'Feature 424'
- 'Feature 425'
- 'Feature 426'
- 'Feature 427'
- 'Feature 428'
- 'Feature 429'
- 'Feature 430'
- 'Feature 431'
- 'Feature 432'
- 'Feature 433'
- 'Feature 434'
- 'Feature 435'
- 'Feature 436'
- 'Feature 437'
- 'Feature 438'
- 'Feature 439'
- 'Feature 440'
- 'Feature 441'
- 'Feature 442'
- 'Feature 443'
- 'Feature 444'
- 'Feature 445'
- 'Feature 446'
- 'Feature 447'
- 'Feature 448'
- 'Feature 449'
- 'Feature 450'
- 'Feature 451'
- 'Feature 452'
- 'Feature 453'
- 'Feature 454'
- 'Feature 455'
- 'Feature 456'
- 'Feature 457'
- 'Feature 458'
- 'Feature 459'
- 'Feature 460'
- 'Feature 461'
- 'Feature 462'
- 'Feature 463'
- 'Feature 464'
- 'Feature 465'
- 'Feature 466'
- 'Feature 467'
- 'Feature 468'
- 'Feature 469'
- 'Feature 470'
- 'Feature 471'
- 'Feature 472'
- 'Feature 473'
- 'Feature 474'
- 'Feature 475'
- 'Feature 476'
- 'Feature 477'
- 'Feature 478'
- 'Feature 479'
- 'Feature 480'
- 'Feature 481'
- 'Feature 482'
- 'Feature 483'
- 'Feature 484'
- 'Feature 485'
- 'Feature 486'
- 'Feature 487'
- 'Feature 488'
- 'Feature 489'
- 'Feature 490'
- 'Feature 491'
- 'Feature 492'
- 'Feature 493'
- 'Feature 494'
- 'Feature 495'
- 'Feature 496'
- 'Feature 497'
- 'Feature 498'
- 'Feature 499'
- 'Feature 500'
- 'Feature 501'
- 'Feature 502'
- 'Feature 503'
- 'Feature 504'
- 'Feature 505'
- 'Feature 506'
- 'Feature 507'
- 'Feature 508'
- 'Feature 509'
- 'Feature 510'
- 'Feature 511'
- 'Feature 512'
- 'Feature 513'
- 'Feature 514'
- 'Feature 515'
- 'Feature 516'
- 'Feature 517'
- 'Feature 518'
- 'Feature 519'
- 'Feature 520'
- 'Feature 521'
- 'Feature 522'
- 'Feature 523'
- 'Feature 524'
- 'Feature 525'
- 'Feature 526'
- 'Feature 527'
- 'Feature 528'
- 'Feature 529'
- 'Feature 530'
- 'Feature 531'
- 'Feature 532'
- 'Feature 533'
- 'Feature 534'
- 'Feature 535'
- 'Feature 536'
- 'Feature 537'
- 'Feature 538'
- 'Feature 539'
- 'Feature 540'
- 'Feature 541'
- 'Feature 542'
- 'Feature 543'
- 'Feature 544'
- 'Feature 545'
- 'Feature 546'
- 'Feature 547'
- 'Feature 548'
- 'Feature 549'
- 'Feature 550'
- 'Feature 551'
- 'Feature 552'
- 'Feature 553'
- 'Feature 554'
- 'Feature 555'
- 'Feature 556'
- 'Feature 557'
- 'Feature 558'
- 'Feature 559'
- 'Feature 560'
- 'Feature 561'
- 'Feature 562'
- 'Feature 563'
- 'Feature 564'
- 'Feature 565'
- 'Feature 566'
- 'Feature 567'
- 'Feature 568'
- 'Feature 569'
- 'Feature 570'
- 'Feature 571'
- 'Feature 572'
- 'Feature 573'
- 'Feature 574'
- 'Feature 575'
- 'Feature 576'
- 'Feature 577'
- 'Feature 578'
- 'Feature 579'
- 'Feature 580'
- 'Feature 581'
- 'Feature 582'
- 'Feature 583'
- 'Feature 584'
- 'Feature 585'
- 'Feature 586'
- 'Feature 587'
- 'Feature 588'
- 'Feature 589'
- 'Feature 590'
In [5]:
# Look at "Classification"
summary(h_secom$Classification, exact_quantiles=TRUE)
Classification
Min. :-1.0000
1st Qu.:-1.0000
Median :-1.0000
Mean :-0.8673
3rd Qu.:-1.0000
Max. : 1.0000
In [6]:
# "Classification" is a column of numerical values
# Convert "Classification" in secom dataset from numerical to categorical value
h_secom$Classification <- as.factor(h_secom$Classification)
In [7]:
# Look at "Classification" again
summary(h_secom$Classification, exact_quantiles=TRUE)
Classification
-1:1463
1 : 104
In [8]:
# Define target (y) and features (x)
target <- "Classification"
features <- setdiff(colnames(h_secom), target)
print(features)
[1] "Feature 001" "Feature 002" "Feature 003" "Feature 004" "Feature 005"
[6] "Feature 006" "Feature 007" "Feature 008" "Feature 009" "Feature 010"
[11] "Feature 011" "Feature 012" "Feature 013" "Feature 014" "Feature 015"
[16] "Feature 016" "Feature 017" "Feature 018" "Feature 019" "Feature 020"
[21] "Feature 021" "Feature 022" "Feature 023" "Feature 024" "Feature 025"
[26] "Feature 026" "Feature 027" "Feature 028" "Feature 029" "Feature 030"
[31] "Feature 031" "Feature 032" "Feature 033" "Feature 034" "Feature 035"
[36] "Feature 036" "Feature 037" "Feature 038" "Feature 039" "Feature 040"
[41] "Feature 041" "Feature 042" "Feature 043" "Feature 044" "Feature 045"
[46] "Feature 046" "Feature 047" "Feature 048" "Feature 049" "Feature 050"
[51] "Feature 051" "Feature 052" "Feature 053" "Feature 054" "Feature 055"
[56] "Feature 056" "Feature 057" "Feature 058" "Feature 059" "Feature 060"
[61] "Feature 061" "Feature 062" "Feature 063" "Feature 064" "Feature 065"
[66] "Feature 066" "Feature 067" "Feature 068" "Feature 069" "Feature 070"
[71] "Feature 071" "Feature 072" "Feature 073" "Feature 074" "Feature 075"
[76] "Feature 076" "Feature 077" "Feature 078" "Feature 079" "Feature 080"
[81] "Feature 081" "Feature 082" "Feature 083" "Feature 084" "Feature 085"
[86] "Feature 086" "Feature 087" "Feature 088" "Feature 089" "Feature 090"
[91] "Feature 091" "Feature 092" "Feature 093" "Feature 094" "Feature 095"
[96] "Feature 096" "Feature 097" "Feature 098" "Feature 099" "Feature 100"
[101] "Feature 101" "Feature 102" "Feature 103" "Feature 104" "Feature 105"
[106] "Feature 106" "Feature 107" "Feature 108" "Feature 109" "Feature 110"
[111] "Feature 111" "Feature 112" "Feature 113" "Feature 114" "Feature 115"
[116] "Feature 116" "Feature 117" "Feature 118" "Feature 119" "Feature 120"
[121] "Feature 121" "Feature 122" "Feature 123" "Feature 124" "Feature 125"
[126] "Feature 126" "Feature 127" "Feature 128" "Feature 129" "Feature 130"
[131] "Feature 131" "Feature 132" "Feature 133" "Feature 134" "Feature 135"
[136] "Feature 136" "Feature 137" "Feature 138" "Feature 139" "Feature 140"
[141] "Feature 141" "Feature 142" "Feature 143" "Feature 144" "Feature 145"
[146] "Feature 146" "Feature 147" "Feature 148" "Feature 149" "Feature 150"
[151] "Feature 151" "Feature 152" "Feature 153" "Feature 154" "Feature 155"
[156] "Feature 156" "Feature 157" "Feature 158" "Feature 159" "Feature 160"
[161] "Feature 161" "Feature 162" "Feature 163" "Feature 164" "Feature 165"
[166] "Feature 166" "Feature 167" "Feature 168" "Feature 169" "Feature 170"
[171] "Feature 171" "Feature 172" "Feature 173" "Feature 174" "Feature 175"
[176] "Feature 176" "Feature 177" "Feature 178" "Feature 179" "Feature 180"
[181] "Feature 181" "Feature 182" "Feature 183" "Feature 184" "Feature 185"
[186] "Feature 186" "Feature 187" "Feature 188" "Feature 189" "Feature 190"
[191] "Feature 191" "Feature 192" "Feature 193" "Feature 194" "Feature 195"
[196] "Feature 196" "Feature 197" "Feature 198" "Feature 199" "Feature 200"
[201] "Feature 201" "Feature 202" "Feature 203" "Feature 204" "Feature 205"
[206] "Feature 206" "Feature 207" "Feature 208" "Feature 209" "Feature 210"
[211] "Feature 211" "Feature 212" "Feature 213" "Feature 214" "Feature 215"
[216] "Feature 216" "Feature 217" "Feature 218" "Feature 219" "Feature 220"
[221] "Feature 221" "Feature 222" "Feature 223" "Feature 224" "Feature 225"
[226] "Feature 226" "Feature 227" "Feature 228" "Feature 229" "Feature 230"
[231] "Feature 231" "Feature 232" "Feature 233" "Feature 234" "Feature 235"
[236] "Feature 236" "Feature 237" "Feature 238" "Feature 239" "Feature 240"
[241] "Feature 241" "Feature 242" "Feature 243" "Feature 244" "Feature 245"
[246] "Feature 246" "Feature 247" "Feature 248" "Feature 249" "Feature 250"
[251] "Feature 251" "Feature 252" "Feature 253" "Feature 254" "Feature 255"
[256] "Feature 256" "Feature 257" "Feature 258" "Feature 259" "Feature 260"
[261] "Feature 261" "Feature 262" "Feature 263" "Feature 264" "Feature 265"
[266] "Feature 266" "Feature 267" "Feature 268" "Feature 269" "Feature 270"
[271] "Feature 271" "Feature 272" "Feature 273" "Feature 274" "Feature 275"
[276] "Feature 276" "Feature 277" "Feature 278" "Feature 279" "Feature 280"
[281] "Feature 281" "Feature 282" "Feature 283" "Feature 284" "Feature 285"
[286] "Feature 286" "Feature 287" "Feature 288" "Feature 289" "Feature 290"
[291] "Feature 291" "Feature 292" "Feature 293" "Feature 294" "Feature 295"
[296] "Feature 296" "Feature 297" "Feature 298" "Feature 299" "Feature 300"
[301] "Feature 301" "Feature 302" "Feature 303" "Feature 304" "Feature 305"
[306] "Feature 306" "Feature 307" "Feature 308" "Feature 309" "Feature 310"
[311] "Feature 311" "Feature 312" "Feature 313" "Feature 314" "Feature 315"
[316] "Feature 316" "Feature 317" "Feature 318" "Feature 319" "Feature 320"
[321] "Feature 321" "Feature 322" "Feature 323" "Feature 324" "Feature 325"
[326] "Feature 326" "Feature 327" "Feature 328" "Feature 329" "Feature 330"
[331] "Feature 331" "Feature 332" "Feature 333" "Feature 334" "Feature 335"
[336] "Feature 336" "Feature 337" "Feature 338" "Feature 339" "Feature 340"
[341] "Feature 341" "Feature 342" "Feature 343" "Feature 344" "Feature 345"
[346] "Feature 346" "Feature 347" "Feature 348" "Feature 349" "Feature 350"
[351] "Feature 351" "Feature 352" "Feature 353" "Feature 354" "Feature 355"
[356] "Feature 356" "Feature 357" "Feature 358" "Feature 359" "Feature 360"
[361] "Feature 361" "Feature 362" "Feature 363" "Feature 364" "Feature 365"
[366] "Feature 366" "Feature 367" "Feature 368" "Feature 369" "Feature 370"
[371] "Feature 371" "Feature 372" "Feature 373" "Feature 374" "Feature 375"
[376] "Feature 376" "Feature 377" "Feature 378" "Feature 379" "Feature 380"
[381] "Feature 381" "Feature 382" "Feature 383" "Feature 384" "Feature 385"
[386] "Feature 386" "Feature 387" "Feature 388" "Feature 389" "Feature 390"
[391] "Feature 391" "Feature 392" "Feature 393" "Feature 394" "Feature 395"
[396] "Feature 396" "Feature 397" "Feature 398" "Feature 399" "Feature 400"
[401] "Feature 401" "Feature 402" "Feature 403" "Feature 404" "Feature 405"
[406] "Feature 406" "Feature 407" "Feature 408" "Feature 409" "Feature 410"
[411] "Feature 411" "Feature 412" "Feature 413" "Feature 414" "Feature 415"
[416] "Feature 416" "Feature 417" "Feature 418" "Feature 419" "Feature 420"
[421] "Feature 421" "Feature 422" "Feature 423" "Feature 424" "Feature 425"
[426] "Feature 426" "Feature 427" "Feature 428" "Feature 429" "Feature 430"
[431] "Feature 431" "Feature 432" "Feature 433" "Feature 434" "Feature 435"
[436] "Feature 436" "Feature 437" "Feature 438" "Feature 439" "Feature 440"
[441] "Feature 441" "Feature 442" "Feature 443" "Feature 444" "Feature 445"
[446] "Feature 446" "Feature 447" "Feature 448" "Feature 449" "Feature 450"
[451] "Feature 451" "Feature 452" "Feature 453" "Feature 454" "Feature 455"
[456] "Feature 456" "Feature 457" "Feature 458" "Feature 459" "Feature 460"
[461] "Feature 461" "Feature 462" "Feature 463" "Feature 464" "Feature 465"
[466] "Feature 466" "Feature 467" "Feature 468" "Feature 469" "Feature 470"
[471] "Feature 471" "Feature 472" "Feature 473" "Feature 474" "Feature 475"
[476] "Feature 476" "Feature 477" "Feature 478" "Feature 479" "Feature 480"
[481] "Feature 481" "Feature 482" "Feature 483" "Feature 484" "Feature 485"
[486] "Feature 486" "Feature 487" "Feature 488" "Feature 489" "Feature 490"
[491] "Feature 491" "Feature 492" "Feature 493" "Feature 494" "Feature 495"
[496] "Feature 496" "Feature 497" "Feature 498" "Feature 499" "Feature 500"
[501] "Feature 501" "Feature 502" "Feature 503" "Feature 504" "Feature 505"
[506] "Feature 506" "Feature 507" "Feature 508" "Feature 509" "Feature 510"
[511] "Feature 511" "Feature 512" "Feature 513" "Feature 514" "Feature 515"
[516] "Feature 516" "Feature 517" "Feature 518" "Feature 519" "Feature 520"
[521] "Feature 521" "Feature 522" "Feature 523" "Feature 524" "Feature 525"
[526] "Feature 526" "Feature 527" "Feature 528" "Feature 529" "Feature 530"
[531] "Feature 531" "Feature 532" "Feature 533" "Feature 534" "Feature 535"
[536] "Feature 536" "Feature 537" "Feature 538" "Feature 539" "Feature 540"
[541] "Feature 541" "Feature 542" "Feature 543" "Feature 544" "Feature 545"
[546] "Feature 546" "Feature 547" "Feature 548" "Feature 549" "Feature 550"
[551] "Feature 551" "Feature 552" "Feature 553" "Feature 554" "Feature 555"
[556] "Feature 556" "Feature 557" "Feature 558" "Feature 559" "Feature 560"
[561] "Feature 561" "Feature 562" "Feature 563" "Feature 564" "Feature 565"
[566] "Feature 566" "Feature 567" "Feature 568" "Feature 569" "Feature 570"
[571] "Feature 571" "Feature 572" "Feature 573" "Feature 574" "Feature 575"
[576] "Feature 576" "Feature 577" "Feature 578" "Feature 579" "Feature 580"
[581] "Feature 581" "Feature 582" "Feature 583" "Feature 584" "Feature 585"
[586] "Feature 586" "Feature 587" "Feature 588" "Feature 589" "Feature 590"
In [9]:
# Splitting dataset into training and test
h_split <- h2o.splitFrame(h_secom, ratios = 0.7, seed = 1234)
h_train <- h_split[[1]] # 70%
h_test <- h_split[[2]] # 30%
In [10]:
# Look at the size
dim(h_train)
dim(h_test)
[1] 1105  591
[1] 462 591
In [11]:
# Check Classification in each dataset
summary(h_train$Classification, exact_quantiles = TRUE)
summary(h_test$Classification, exact_quantiles = TRUE)
Classification
-1:1028
1 : 77
Classification
-1:435
1 : 27
In [12]:
# Define the criteria for random grid search
search_criteria <- list(strategy = "RandomDiscrete",
max_models = 10,
seed = 1234)
In [13]:
# Define the range of hyper-parameters for grid search
hyper_params <- list(
sample_rate = c(0.6, 0.7, 0.8, 0.9),
col_sample_rate = c(0.6, 0.7, 0.8, 0.9),
max_depth = c(4, 5, 6)
)
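The grid above spans 4 × 4 × 3 = 48 candidate combinations, while `max_models = 10` in the search criteria caps the random search at roughly a fifth of them. A quick sanity check in plain R (no H2O cluster needed):

```r
# Count the full Cartesian grid implied by hyper_params
hyper_params <- list(
  sample_rate     = c(0.6, 0.7, 0.8, 0.9),
  col_sample_rate = c(0.6, 0.7, 0.8, 0.9),
  max_depth       = c(4, 5, 6)
)
prod(lengths(hyper_params))  # 48 combinations in total
```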
In [14]:
# Set up grid search
# Add a seed for reproducibility
rand_grid <- h2o.grid(
# Core parameters for model training
x = features,
y = target,
training_frame = h_train,
ntrees = 500,
learn_rate = 0.05,
balance_classes = TRUE,
seed = 1234,
# Settings for Cross-Validation
nfolds = 5,
fold_assignment = "Stratified",
# Parameters for early stopping
stopping_metric = "mean_per_class_error",
stopping_rounds = 15,
score_tree_interval = 1,
# Parameters for grid search
grid_id = "rand_grid",
hyper_params = hyper_params,
algorithm = "gbm",
search_criteria = search_criteria
)
|======================================================================| 100%
In [15]:
# Sort and show the grid search results
rand_grid <- h2o.getGrid(grid_id = "rand_grid", sort_by = "mean_per_class_error", decreasing = FALSE)
print(rand_grid)
H2O Grid Details
================
Grid ID: rand_grid
Used hyper parameters:
- col_sample_rate
- max_depth
- sample_rate
Number of models: 10
Number of failed models: 0
Hyper-Parameter Search Summary: ordered by increasing mean_per_class_error
col_sample_rate max_depth sample_rate model_ids mean_per_class_error
1 0.6 4 0.8 rand_grid_model_2 0.3678874627318207
2 0.6 4 0.7 rand_grid_model_1 0.37603592905149325
3 0.6 6 0.6 rand_grid_model_5 0.3794658648744252
4 0.8 4 0.8 rand_grid_model_6 0.3818725049269796
5 0.7 4 0.7 rand_grid_model_8 0.3851066248926171
6 0.8 4 0.7 rand_grid_model_9 0.389534589923695
7 0.8 4 0.6 rand_grid_model_7 0.38985042195158925
8 0.9 6 0.7 rand_grid_model_4 0.4093562079943403
9 0.8 6 0.6 rand_grid_model_0 0.41636136237303556
10 0.9 6 0.8 rand_grid_model_3 0.42382763151245645
In [16]:
# Extract the best model from random grid search
best_model_id <- rand_grid@model_ids[[1]] # top of the list
best_model <- h2o.getModel(best_model_id)
print(best_model)
Model Details:
==============
H2OBinomialModel: gbm
Model ID: rand_grid_model_2
Model Summary:
number_of_trees number_of_internal_trees model_size_in_bytes min_depth
1 61 61 15119 4
max_depth mean_depth min_leaves max_leaves mean_leaves
1 4 4.00000 11 16 13.77049
H2OBinomialMetrics: gbm
** Reported on training data. **
MSE: 0.2561832
RMSE: 0.5061454
LogLoss: 0.6409075
Mean Per-Class Error: 0.0004863813
AUC: 0.9999754
Gini: 0.9999509
Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
-1 1 Error Rate
-1 1027 1 0.000973 =1/1028
1 0 1030 0.000000 =0/1030
Totals 1027 1031 0.000486 =1/2058
Maximum Metrics: Maximum metrics at their respective thresholds
metric threshold value idx
1 max f1 0.166784 0.999515 75
2 max f2 0.166784 0.999806 75
3 max f0point5 0.166784 0.999224 75
4 max accuracy 0.166784 0.999514 75
5 max precision 0.365659 1.000000 0
6 max recall 0.166784 1.000000 75
7 max specificity 0.365659 1.000000 0
8 max absolute_mcc 0.166784 0.999029 75
9 max min_per_class_accuracy 0.166784 0.999027 75
10 max mean_per_class_accuracy 0.166784 0.999514 75
Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
H2OBinomialMetrics: gbm
** Reported on cross-validation data. **
** 5-fold cross-validation on training data (Metrics computed for combined holdout predictions) **
MSE: 0.06572745
RMSE: 0.2563737
LogLoss: 0.2767561
Mean Per-Class Error: 0.3678875
AUC: 0.6739855
Gini: 0.3479711
Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
-1 1 Error Rate
-1 819 209 0.203307 =209/1028
1 41 36 0.532468 =41/77
Totals 860 245 0.226244 =250/1105
Maximum Metrics: Maximum metrics at their respective thresholds
metric threshold value idx
1 max f1 0.028906 0.223602 174
2 max f2 0.011951 0.328467 297
3 max f0point5 0.082901 0.204778 44
4 max accuracy 0.164703 0.929412 0
5 max precision 0.084812 0.229167 40
6 max recall 0.001585 1.000000 391
7 max specificity 0.164703 0.999027 0
8 max absolute_mcc 0.028906 0.161951 174
9 max min_per_class_accuracy 0.018432 0.610390 243
10 max mean_per_class_accuracy 0.028906 0.632113 174
Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
Cross-Validation Metrics Summary:
mean sd cv_1_valid cv_2_valid
accuracy 0.85497105 0.028622998 0.8867925 0.78629035
auc 0.70107853 0.040595975 0.79950494 0.64812684
err 0.14502896 0.028622998 0.11320755 0.21370968
err_count 32.6 8.057295 24.0 53.0
f0point5 0.24869587 0.032992177 0.23584905 0.1826484
f1 0.29050645 0.026739191 0.29411766 0.23188406
f2 0.35414198 0.018278506 0.390625 0.31746033
lift_top_group 1.0666667 1.5084945 0.0 0.0
logloss 0.27472907 0.040207863 0.20996137 0.32169387
max_per_class_error 0.5788664 0.0290633 0.5 0.57894737
mcc 0.23437075 0.032911915 0.2716149 0.15754637
mean_per_class_accuracy 0.6542992 0.019754296 0.7029703 0.61882323
mean_per_class_error 0.3457008 0.019754296 0.2970297 0.38117674
mse 0.06533862 0.009910801 0.045586333 0.07369682
precision 0.22766261 0.03529381 0.20833333 0.16
r2 -0.0136614265 0.017498508 -0.014273335 -0.04174886
recall 0.4211336 0.0290633 0.5 0.42105263
rmse 0.25412405 0.019488202 0.21350956 0.2714716
specificity 0.88746476 0.028320558 0.9059406 0.8165939
cv_3_valid cv_4_valid cv_5_valid
accuracy 0.84792626 0.85 0.90384614
auc 0.66700506 0.6582114 0.73254436
err 0.15207373 0.15 0.09615385
err_count 33.0 33.0 20.0
f0point5 0.29411766 0.22222222 0.30864197
f1 0.3265306 0.26666668 0.33333334
f2 0.36697248 0.33333334 0.36231884
lift_top_group 0.0 0.0 5.3333335
logloss 0.35666713 0.26497382 0.22034925
max_per_class_error 0.6 0.6 0.61538464
mcc 0.2494203 0.20780657 0.2854656
mean_per_class_accuracy 0.6467005 0.6414634 0.6615385
mean_per_class_error 0.3532995 0.35853657 0.33846155
mse 0.08653373 0.063971125 0.056905072
precision 0.27586207 0.2 0.29411766
r2 -0.034209877 -0.0068951435 0.02882008
recall 0.4 0.4 0.3846154
rmse 0.29416618 0.25292513 0.23854785
specificity 0.893401 0.8829268 0.93846154
In [17]:
# Check performance on test set
h2o.performance(best_model, h_test)
H2OBinomialMetrics: gbm
MSE: 0.05339771
RMSE: 0.2310794
LogLoss: 0.2138174
Mean Per-Class Error: 0.2693487
AUC: 0.7553427
Gini: 0.5106854
Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
-1 1 Error Rate
-1 394 41 0.094253 =41/435
1 12 15 0.444444 =12/27
Totals 406 56 0.114719 =53/462
Maximum Metrics: Maximum metrics at their respective thresholds
metric threshold value idx
1 max f1 0.044940 0.361446 55
2 max f2 0.032325 0.465686 92
3 max f0point5 0.121518 0.338983 7
4 max accuracy 0.121518 0.941558 7
5 max precision 0.121518 0.500000 7
6 max recall 0.007008 1.000000 329
7 max specificity 0.167899 0.997701 0
8 max absolute_mcc 0.044940 0.331555 55
9 max min_per_class_accuracy 0.032325 0.703704 92
10 max mean_per_class_accuracy 0.032325 0.763346 92
Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
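If the test-set performance is acceptable, the best model can be persisted and reloaded in a later session. A minimal sketch using `h2o.saveModel` / `h2o.loadModel` (the `"./models"` path is illustrative; requires the running cluster and `best_model` from above):

```r
# Save the best model to disk; force = TRUE overwrites an existing file
model_path <- h2o.saveModel(best_model, path = "./models", force = TRUE)

# Later, e.g. in a fresh session after h2o.init():
# best_model <- h2o.loadModel(model_path)
```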
In [18]:
# Use the model for predictions
yhat_test <- h2o.predict(best_model, h_test)
|======================================================================| 100%
In [19]:
# Show first 10 rows
head(yhat_test, 10)
predict p-1 p1
-1 0.9643505 0.03564950
-1 0.9805827 0.01941735
-1 0.9763582 0.02364177
-1 0.8712040 0.12879599
-1 0.9765110 0.02348895
-1 0.9884546 0.01154544
-1 0.9591888 0.04081122
-1 0.9747553 0.02524473
-1 0.9782788 0.02172118
-1 0.9652645 0.03473554
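The `predict` labels above use the F1-optimal threshold found during training. With a target this imbalanced, it can be worth relabelling at a custom cut-off on the `p1` probability column; a sketch (the 0.03 threshold is purely illustrative, and `yhat_test` is the prediction frame from above):

```r
# Pull predictions into an R data frame and apply a custom probability cut-off
df_pred <- as.data.frame(yhat_test)
custom_label <- ifelse(df_pred$p1 > 0.03, 1, -1)
table(custom_label)
```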
Content source: woobe/h2o_tutorials