Machine Learning with H2O - Tutorial 1: Data in H2O with R


Objective:

  • This tutorial demonstrates three different ways to import data into H2O.

Wine Quality Dataset:


Methods:

  1. Import data from a local CSV file.
  2. Import data from the web.
  3. Convert an R data frame into an H2O data frame.

Full Technical Reference:



In [1]:
# Start and connect to a local H2O cluster
suppressPackageStartupMessages(library(h2o))
h2o.init(nthreads = -1)


H2O is not running yet, starting it now...

Note:  In case of errors look at the following log files:
    /tmp/RtmpBf35BI/h2o_joe_started_from_r.out
    /tmp/RtmpBf35BI/h2o_joe_started_from_r.err


Starting H2O JVM and connecting: .. Connection successful!

R is connected to the H2O cluster: 
    H2O cluster uptime:         2 seconds 553 milliseconds 
    H2O cluster version:        3.10.4.4 
    H2O cluster version age:    5 days  
    H2O cluster name:           H2O_started_from_R_joe_esh642 
    H2O cluster total nodes:    1 
    H2O cluster total memory:   5.21 GB 
    H2O cluster total cores:    8 
    H2O cluster allowed cores:  8 
    H2O cluster healthy:        TRUE 
    H2O Connection ip:          localhost 
    H2O Connection port:        54321 
    H2O Connection proxy:       NA 
    H2O Internal Security:      FALSE 
    R Version:                  R version 3.3.2 (2016-10-31) 



In [2]:
# Method 1 - Import data from a local CSV file
data_from_csv = h2o.importFile("winequality-white.csv")
head(data_from_csv, 5)


  |======================================================================| 100%
fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  free sulfur dioxide  total sulfur dioxide  density  pH    sulphates  alcohol  quality
7.0            0.27              0.36         20.7            0.045      45                   170                   1.0010   3.00  0.45       8.8      6
6.3            0.30              0.34         1.6             0.049      14                   132                   0.9940   3.30  0.49       9.5      6
8.1            0.28              0.40         6.9             0.050      30                   97                    0.9951   3.26  0.44       10.1     6
7.2            0.23              0.32         8.5             0.058      47                   186                   0.9956   3.19  0.40       9.9      6
7.2            0.23              0.32         8.5             0.058      47                   186                   0.9956   3.19  0.40       9.9      6
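
Method 1 relies on H2O detecting the delimiter and header row on its own. The file is semicolon-separated (the same file is read with sep = ';' in Method 3), so if auto-detection ever guesses wrong, the parse settings can be passed explicitly. The following is a minimal sketch, assuming the CSV sits in the working directory as in Method 1; the frame key "wine_explicit" is a hypothetical name chosen for illustration.

```r
library(h2o)
h2o.init(nthreads = -1)

# Assumption: winequality-white.csv is in the working directory (as in Method 1)
wine_explicit <- h2o.importFile(
  path = "winequality-white.csv",
  destination_frame = "wine_explicit",  # hypothetical key, not from the tutorial
  header = TRUE,                        # first line holds the column names
  sep = ";"                             # the file is semicolon-separated
)

# Row and column counts of the parsed frame
dim(wine_explicit)
```

Naming the frame with destination_frame can make it easier to find later in the cluster's key listing than an auto-generated key.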



In [3]:
# Method 2 - Import data from the web
data_from_web = h2o.importFile("https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv")
head(data_from_web, 5)


  |======================================================================| 100%
fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  free sulfur dioxide  total sulfur dioxide  density  pH    sulphates  alcohol  quality
7.0            0.27              0.36         20.7            0.045      45                   170                   1.0010   3.00  0.45       8.8      6
6.3            0.30              0.34         1.6             0.049      14                   132                   0.9940   3.30  0.49       9.5      6
8.1            0.28              0.40         6.9             0.050      30                   97                    0.9951   3.26  0.44       10.1     6
7.2            0.23              0.32         8.5             0.058      47                   186                   0.9956   3.19  0.40       9.9      6
7.2            0.23              0.32         8.5             0.058      47                   186                   0.9956   3.19  0.40       9.9      6



In [4]:
# Method 3 - Convert R data frame into H2O data frame

## Import Wine Quality data using R
wine_df = read.csv('winequality-white.csv', sep = ';')
head(wine_df, 5)


fixed.acidity  volatile.acidity  citric.acid  residual.sugar  chlorides  free.sulfur.dioxide  total.sulfur.dioxide  density  pH    sulphates  alcohol  quality
7.0            0.27              0.36         20.7            0.045      45                   170                   1.0010   3.00  0.45       8.8      6
6.3            0.30              0.34         1.6             0.049      14                   132                   0.9940   3.30  0.49       9.5      6
8.1            0.28              0.40         6.9             0.050      30                   97                    0.9951   3.26  0.44       10.1     6
7.2            0.23              0.32         8.5             0.058      47                   186                   0.9956   3.19  0.40       9.9      6
7.2            0.23              0.32         8.5             0.058      47                   186                   0.9956   3.19  0.40       9.9      6
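
The column names printed here contain dots (fixed.acidity), while the frames from Methods 1 and 2 keep the spaces (fixed acidity). This comes from read.csv, which by default rewrites headers into syntactically valid R names. A minimal sketch of the difference, using a small inline sample (three columns, two rows, made up for illustration in the same semicolon-separated layout) instead of the full file:

```r
# Inline sample in the same layout as winequality-white.csv (illustrative subset)
sample_csv <- '"fixed acidity";"volatile acidity";"quality"
7;0.27;6
6.3;0.3;6'

# Default behaviour: check.names = TRUE turns spaces into dots
df_default <- read.csv(text = sample_csv, sep = ";")
names(df_default)

# check.names = FALSE keeps the headers exactly as written in the file
df_verbatim <- read.csv(text = sample_csv, sep = ";", check.names = FALSE)
names(df_verbatim)
```

If the three import methods need to produce identical column names, passing check.names = FALSE to read.csv before calling as.h2o is one option.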

In [5]:
## Convert R data frame into H2O data frame
data_from_df = as.h2o(wine_df)
head(data_from_df, 5)


  |======================================================================| 100%
fixed.acidity  volatile.acidity  citric.acid  residual.sugar  chlorides  free.sulfur.dioxide  total.sulfur.dioxide  density  pH    sulphates  alcohol  quality
7.0            0.27              0.36         20.7            0.045      45                   170                   1.0010   3.00  0.45       8.8      6
6.3            0.30              0.34         1.6             0.049      14                   132                   0.9940   3.30  0.49       9.5      6
8.1            0.28              0.40         6.9             0.050      30                   97                    0.9951   3.26  0.44       10.1     6
7.2            0.23              0.32         8.5             0.058      47                   186                   0.9956   3.19  0.40       9.9      6
7.2            0.23              0.32         8.5             0.058      47                   186                   0.9956   3.19  0.40       9.9      6
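
After a conversion with as.h2o, it can be useful to confirm what the cluster actually holds and to pull a frame back into R. The sketch below is illustrative only: it uses the iris data set that ships with R rather than the wine data, so that it runs without the CSV file, and the key "iris_hex" is a hypothetical name.

```r
library(h2o)
h2o.init(nthreads = -1)

# iris is built into R, so this sketch does not depend on the wine CSV
iris_hex <- as.h2o(iris, destination_frame = "iris_hex")  # hypothetical key

# Inspect the frame as stored in the H2O cluster
dim(iris_hex)           # row and column counts
h2o.describe(iris_hex)  # per-column type, missing count, and summary statistics

# Convert the H2O frame back into an ordinary R data frame
iris_back <- as.data.frame(iris_hex)
dim(iris_back)
```

The same calls apply to data_from_df or either of the imported wine frames; as.data.frame copies the data into R memory, so it is best reserved for frames that fit comfortably there.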