In [1]:
library(h2o)
h2o.init()


----------------------------------------------------------------------

Your next step is to start H2O:
    > h2o.init()

For H2O package documentation, ask for help:
    > ??h2o

After starting H2O, you can use the Web UI at http://localhost:54321
For more information visit http://docs.h2o.ai

----------------------------------------------------------------------


Attaching package: ‘h2o’

The following objects are masked from ‘package:stats’:

    cor, sd, var

The following objects are masked from ‘package:base’:

    &&, %*%, %in%, ||, apply, as.factor, as.numeric, colnames,
    colnames<-, ifelse, is.character, is.factor, is.numeric, log,
    log10, log1p, log2, round, signif, trunc

H2O is not running yet, starting it now...

Note:  In case of errors look at the following log files:
    /tmp/RtmpBMvgSb/h2o_micio1970_started_from_r.out
    /tmp/RtmpBMvgSb/h2o_micio1970_started_from_r.err


Starting H2O JVM and connecting: ... Connection successful!

R is connected to the H2O cluster: 
    H2O cluster uptime:         3 seconds 917 milliseconds 
    H2O cluster version:        3.10.2.2 
    H2O cluster version age:    3 months and 24 days !!! 
    H2O cluster name:           H2O_started_from_R_micio1970_bmf429 
    H2O cluster total nodes:    1 
    H2O cluster total memory:   1.71 GB 
    H2O cluster total cores:    4 
    H2O cluster allowed cores:  2 
    H2O cluster healthy:        TRUE 
    H2O Connection ip:          localhost 
    H2O Connection port:        54321 
    H2O Connection proxy:       NA 
    R Version:                  R version 3.3.2 (2016-10-31) 
Warning message in h2o.clusterInfo():
“
Your H2O cluster version is too old (3 months and 24 days)!
Please download and install the latest version from http://h2o.ai/download/”
Note:  As started, H2O is limited to the CRAN default of 2 CPUs.
       Shut down and restart H2O as shown below to use all your CPUs.
           > h2o.shutdown()
           > h2o.init(nthreads = -1)


In [2]:
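# Load the data; the first column (X) holds the car names
mtcar <- read.csv("../data/auto_design.csv")
# Treat the discrete variables as categorical
mtcar$gear <- as.factor(mtcar$gear)
mtcar$carb <- as.factor(mtcar$carb)
mtcar$cyl <- as.factor(mtcar$cyl)
mtcar$vs  <- as.factor(mtcar$vs)
mtcar$am  <- as.factor(mtcar$am)
# Row counter, then upload the data frame to the H2O cluster
mtcar$ID  <- 1:nrow(mtcar)
mtcar.hex  <- as.h2o(mtcar)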


  |======================================================================| 100%

In [3]:
# Train an autoencoder with three hidden layers of 50 units each
mtcar.dl <- h2o.deeplearning(x = 1:10, training_frame = mtcar.hex, autoencoder = TRUE,
                             hidden = c(50, 50, 50), epochs = 100, seed = 1)


  |======================================================================| 100%
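
A note on `x = 1:10`: it selects columns by position, so it picks up the car-name column `X` and leaves out `gear` and `carb`. The sketch below shows one way to select predictors by name instead. It assumes that `X` and `ID` are not meant to be model inputs, which the notebook does not state, and it hard-codes the column names from the outlier table further down so that it runs on its own.

```r
# Column names as they appear in the outlier table further down
# (X holds the car name; ID is the row counter added in cell 2).
cols <- c("X", "mpg", "cyl", "disp", "hp", "drat", "wt", "qsec",
          "vs", "am", "gear", "carb", "ID")

# What x = 1:10 selects: X through am
print(cols[1:10])

# Assumption: drop the name and row-counter columns, keep everything else
predictors <- setdiff(cols, c("X", "ID"))
print(predictors)
```

Inside the notebook, `names(mtcar)` would replace the hard-coded vector, and `predictors` could be passed as `x` to `h2o.deeplearning()`, which accepts column names as well as indices. The reconstruction errors would then differ from the ones printed below.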

In [10]:
errors <- h2o.anomaly(mtcar.dl, mtcar.hex, per_feature = FALSE)
errors_r <- as.data.frame(errors)
print(errors_r)


   Reconstruction.MSE
1          0.08062820
2          0.08062627
3          0.08107896
4          0.08023649
5          0.08041644
6          0.08035883
7          0.08059006
8          0.08080999
9          0.08077441
10         0.08058066
11         0.13421215
12         0.08028465
13         0.08024953
14         0.08034106
15         0.08179186
16         0.08165718
17         0.08113470
18         0.11731048
19         0.09553176
20         0.08212478
21         0.08095087
22         0.08039952
23         0.08038477
24         0.08060766
25         0.08064635
26         0.08155116
27         0.08104111
28         0.08149853
29         0.08047391
30         0.09316467
31         0.08041732
32         0.08092992
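
One way to choose a cut-off is to plot the sorted reconstruction errors and look for the point where the tail lifts off from the bulk. The following is a minimal base-R sketch, not necessarily what the author did. The values are copied from the output above so that it runs on its own, and the variable names `mse` and `cutoff` are illustrative.

```r
# Reconstruction MSE per row, copied from the output above; inside the
# notebook, errors_r$Reconstruction.MSE holds the same values.
mse <- c(0.08062820, 0.08062627, 0.08107896, 0.08023649, 0.08041644,
         0.08035883, 0.08059006, 0.08080999, 0.08077441, 0.08058066,
         0.13421215, 0.08028465, 0.08024953, 0.08034106, 0.08179186,
         0.08165718, 0.08113470, 0.11731048, 0.09553176, 0.08212478,
         0.08095087, 0.08039952, 0.08038477, 0.08060766, 0.08064635,
         0.08155116, 0.08104111, 0.08149853, 0.08047391, 0.09316467,
         0.08041732, 0.08092992)
cutoff <- 0.099

# Sorted errors: most rows sit near 0.08, a few stand out in the tail
plot(sort(mse), type = "b",
     xlab = "Rows (sorted by error)", ylab = "Reconstruction MSE")
abline(h = cutoff, lty = 2)  # dashed line at the chosen cut-off

# Rows above the cut-off
row_outliers <- which(mse > cutoff)
print(row_outliers)  # [1] 11 18
```

Rows 19 and 30 (about 0.096 and 0.093) also sit above the bulk but fall just under 0.099, so a slightly lower cut-off would flag four rows instead of two.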

In [15]:
# Outliers: rows whose reconstruction MSE exceeds the 0.099 cut-off
# (cut-off chosen by inspecting the errors printed above)
row_outliers <- which(errors_r$Reconstruction.MSE > 0.099)
mtcar[row_outliers, ]


           X  mpg cyl  disp   hp drat  wt qsec vs am gear carb ID
11 Merc 280C 17.8   6 167.6  210  800 900 1000  1  0    4    4 11
18  Fiat 128 32.4   4 780.0 2100  400 200  700  1  1    4    1 18
