Machine Learning with H2O - Tutorial 1: Data in H2O with Python


Objective:

  • This tutorial demonstrates three different ways to import data into H2O.

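Wine Quality Dataset:

  • White wine data (winequality-white.csv) from the UCI Machine Learning Repository; Method 2 below loads it from its URL.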

Methods:

  1. Import data from a local CSV file.
  2. Import data from the web.
  3. Convert a pandas data frame into an H2O data frame.

Full Technical Reference:



In [1]:
# Start and connect to a local H2O cluster
import h2o
h2o.init(nthreads=-1)  # nthreads=-1 lets H2O use all available cores


Checking whether there is an H2O instance running at http://localhost:54321..... not found.
Attempting to start a local H2O server...
  Java Version: openjdk version "1.8.0_131"; OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-0ubuntu1.16.04.2-b11); OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
  Starting server from /home/joe/anaconda3/lib/python3.6/site-packages/h2o/backend/bin/h2o.jar
  Ice root: /tmp/tmp2g_oakfh
  JVM stdout: /tmp/tmp2g_oakfh/h2o_joe_started_from_python.out
  JVM stderr: /tmp/tmp2g_oakfh/h2o_joe_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321... successful.
H2O cluster uptime: 01 secs
H2O cluster version: 3.10.5.2
H2O cluster version age: 10 days
H2O cluster name: H2O_from_python_joe_oc2161
H2O cluster total nodes: 1
H2O cluster free memory: 5.210 Gb
H2O cluster total cores: 8
H2O cluster allowed cores: 8
H2O cluster status: accepting new members, healthy
H2O connection url: http://127.0.0.1:54321
H2O connection proxy: None
H2O internal security: False
Python version: 3.6.1 final



In [2]:
# Method 1 - Import data from a local CSV file
data_from_csv = h2o.import_file("winequality-white.csv")
data_from_csv.head(5)


Parse progress: |█████████████████████████████████████████████████████████| 100%
fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  free sulfur dioxide  total sulfur dioxide  density  pH    sulphates  alcohol  quality
7              0.27              0.36         20.7            0.045      45                   170                   1.001    3     0.45       8.8      6
6.3            0.3               0.34         1.6             0.049      14                   132                   0.994    3.3   0.49       9.5      6
8.1            0.28              0.4          6.9             0.05       30                   97                    0.9951   3.26  0.44       10.1     6
7.2            0.23              0.32         8.5             0.058      47                   186                   0.9956   3.19  0.4        9.9      6
7.2            0.23              0.32         8.5             0.058      47                   186                   0.9956   3.19  0.4        9.9      6
Out[2]:


In [3]:
# Method 2 - Import data from the web
data_from_web = h2o.import_file("https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv")
data_from_web.head(5)


Parse progress: |█████████████████████████████████████████████████████████| 100%
fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  free sulfur dioxide  total sulfur dioxide  density  pH    sulphates  alcohol  quality
7              0.27              0.36         20.7            0.045      45                   170                   1.001    3     0.45       8.8      6
6.3            0.3               0.34         1.6             0.049      14                   132                   0.994    3.3   0.49       9.5      6
8.1            0.28              0.4          6.9             0.05       30                   97                    0.9951   3.26  0.44       10.1     6
7.2            0.23              0.32         8.5             0.058      47                   186                   0.9956   3.19  0.4        9.9      6
7.2            0.23              0.32         8.5             0.058      47                   186                   0.9956   3.19  0.4        9.9      6
Out[3]:
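
Method 2 fetches the file from the remote server every time the cell runs. If that becomes slow, one option is to download the file once and then use Method 1 on the local copy. The following is a minimal sketch using only the Python standard library; fetch_if_missing is a hypothetical helper name and is not part of H2O.

```python
import os
import urllib.request


def fetch_if_missing(url, path):
    """Download url to path unless a file already exists there; return path."""
    if not os.path.exists(path):
        # The network is only used when no local copy is present.
        urllib.request.urlretrieve(url, path)
    return path
```

With the URL from the cell above stored in a variable url, h2o.import_file(fetch_if_missing(url, "winequality-white.csv")) would then parse the cached copy.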


In [4]:
# Method 3 - Convert a pandas data frame into an H2O data frame

## Import Wine Quality data using pandas (the file is semicolon-separated)
import pandas as pd
wine_df = pd.read_csv('winequality-white.csv', sep=';')
wine_df.head(5)


Out[4]:
   fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  free sulfur dioxide  total sulfur dioxide  density  pH    sulphates  alcohol  quality
0  7.0            0.27              0.36         20.7            0.045      45.0                 170.0                 1.0010   3.00  0.45       8.8      6
1  6.3            0.30              0.34         1.6             0.049      14.0                 132.0                 0.9940   3.30  0.49       9.5      6
2  8.1            0.28              0.40         6.9             0.050      30.0                 97.0                  0.9951   3.26  0.44       10.1     6
3  7.2            0.23              0.32         8.5             0.058      47.0                 186.0                 0.9956   3.19  0.40       9.9      6
4  7.2            0.23              0.32         8.5             0.058      47.0                 186.0                 0.9956   3.19  0.40       9.9      6
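
The separator argument in the read_csv call is needed because this file separates fields with semicolons rather than commas, whereas h2o.import_file in Methods 1 and 2 was called without a separator and still parsed the columns correctly. A quick way to check which separator a file uses is to count the candidates in its header line. The sketch below does that with plain Python; guess_delimiter is a hypothetical helper, and the sample header is an assumption modelled on the column names shown above (the real file may quote them).

```python
def guess_delimiter(header_line, candidates=(";", ",", "\t")):
    """Return the candidate separator that occurs most often in header_line."""
    return max(candidates, key=header_line.count)


# Assumed sample: the twelve column names joined with semicolons.
sample_header = (
    "fixed acidity;volatile acidity;citric acid;residual sugar;chlorides;"
    "free sulfur dioxide;total sulfur dioxide;density;pH;sulphates;"
    "alcohol;quality"
)

sep = guess_delimiter(sample_header)
print(repr(sep), len(sample_header.split(sep)))
```

In the notebook, the header line could instead be read from the file itself with open('winequality-white.csv').readline(), and the result passed to pd.read_csv as sep.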

In [5]:
## Convert the pandas data frame into an H2O data frame
data_from_df = h2o.H2OFrame(wine_df)
data_from_df.head(5)


Parse progress: |█████████████████████████████████████████████████████████| 100%
fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  free sulfur dioxide  total sulfur dioxide  density  pH    sulphates  alcohol  quality
7              0.27              0.36         20.7            0.045      45                   170                   1.001    3     0.45       8.8      6
6.3            0.3               0.34         1.6             0.049      14                   132                   0.994    3.3   0.49       9.5      6
8.1            0.28              0.4          6.9             0.05       30                   97                    0.9951   3.26  0.44       10.1     6
7.2            0.23              0.32         8.5             0.058      47                   186                   0.9956   3.19  0.4        9.9      6
7.2            0.23              0.32         8.5             0.058      47                   186                   0.9956   3.19  0.4        9.9      6
Out[5]:
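
All three methods should produce the same table. One way to confirm this is to compare the shape and column names of each result. The sketch below relies only on the shape and columns attributes, which, to my knowledge, both pandas data frames and H2O frames expose; layout and same_layout are hypothetical helper names.

```python
def layout(frame):
    """Return (shape, column names) for any object with .shape and .columns."""
    return tuple(frame.shape), [str(name) for name in frame.columns]


def same_layout(*frames):
    """True if every frame has the same shape and the same column names."""
    layouts = [layout(f) for f in frames]
    return all(item == layouts[0] for item in layouts)
```

In this notebook the call would be same_layout(data_from_csv, data_from_web, data_from_df, wine_df), which is expected to return True when all imports succeeded. It checks structure only, not the cell values.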