DeepLearning

MNIST Dataset using DeepWater and TensorFlow

The MNIST database is a well-known academic dataset used to benchmark classification performance. The data consists of 60,000 training images and 10,000 test images. Each image is a standardized $28 \times 28$ pixel greyscale image of a single handwritten digit. An example of the scanned handwritten digits is shown


In [1]:
import h2o
h2o.init()


Checking whether there is an H2O instance running at http://localhost:54321. connected.
H2O cluster uptime: 1 hour 44 mins
H2O cluster version: 3.11.0.99999
H2O cluster version age: 28 days, 23 hours and 53 minutes
H2O cluster name: ubuntu
H2O cluster total nodes: 1
H2O cluster free memory: 8.74 Gb
H2O cluster total cores: 8
H2O cluster allowed cores: 8
H2O cluster status: locked, healthy
H2O connection url: http://localhost:54321
H2O connection proxy: None
Python version: 2.7.12 final

In [2]:
import os.path
PATH = os.path.expanduser("~/h2o-3/")

In [3]:
test_df = h2o.import_file(PATH + "bigdata/laptop/mnist/test.csv.gz")


Parse progress: |█████████████████████████████████████████████████████████| 100%

In [4]:
train_df = h2o.import_file(PATH + "bigdata/laptop/mnist/train.csv.gz")


Parse progress: |█████████████████████████████████████████████████████████| 100%

Specify the response and predictor columns


In [5]:
y = "C785"
x = train_df.names[0:784]
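
The selection above relies on the CSV layout: each row holds 784 pixel intensities followed by the digit label. As a rough sketch, assuming H2O's default `C1`, `C2`, ... naming for header-less files and row-by-row pixel storage, the split can be reproduced in plain Python without a cluster. The `names` list is a stand-in for `train_df.names`, and `pixel_position` is a hypothetical helper introduced only for illustration.

```python
# Stand-in for train_df.names, assuming default H2O column names C1..C785.
side = 28
n_pixels = side * side                  # 784 pixel columns
names = ["C%d" % i for i in range(1, n_pixels + 2)]

x = names[0:n_pixels]                   # predictors: C1..C784
y = names[n_pixels]                     # response: C785

def pixel_position(col_name):
    """Hypothetical helper: map a pixel column name to its (row, col) position,
    assuming the pixels are stored row by row."""
    index = int(col_name[1:]) - 1       # zero-based pixel index
    return divmod(index, side)

print(len(x), y, pixel_position("C29"))
```

Slicing with `[0:784]` keeps the label column out of the predictor list, which is why the response is named separately.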

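Convert the response column from a number to a categorical class (factor), so that the model treats the task as classification rather than regression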


In [6]:
train_df[y] = train_df[y].asfactor()
test_df[y] = test_df[y].asfactor()
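
Conceptually, converting a numeric column to a factor replaces each value with a reference to one of a fixed set of levels. The plain-Python sketch below illustrates the idea only; it is not how H2O implements `asfactor()` internally, and the `labels` list is made-up sample data.

```python
# Made-up numeric labels standing in for a few rows of the response column.
labels = [5, 0, 4, 1, 9, 2, 1, 3]

# The distinct values become the categorical levels (stored as strings).
levels = sorted(set(str(v) for v in labels))

# Each row is then represented by the index of its level.
codes = [levels.index(str(v)) for v in labels]

print(levels)
print(codes)
```

With the full MNIST data there would be ten levels, one per digit, so the model is trained as a 10-class classifier.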

Train Deep Learning model and validate on test set

LeNet (1989)

In this demo you will learn how to train a simple LeNet model with the TensorFlow backend.

Using the LeNet model architecture for training in H2O

We are ready to start the training procedure.


In [7]:
from h2o.estimators.deepwater import H2ODeepWaterEstimator

In [8]:
lenet_model = H2ODeepWaterEstimator(
    epochs=10,
    learning_rate=1e-3, 
    mini_batch_size=64,
    network='lenet',        
    image_shape=[28,28],
    problem_type='dataset',      ## Not 'image' since we're not passing paths to image files, but raw numbers
    ignore_const_cols=False,     ## We need to keep all 28x28=784 pixel values, even if some are always 0
    channels=1,
    backend="tensorflow"
)
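
To get a feel for what `epochs=10` and `mini_batch_size=64` imply, the following back-of-the-envelope sketch counts the weight updates for the 60,000 training rows. It assumes that one epoch is a single full pass over the training data and that a final partial batch still counts as one update; Deep Water's actual scheduling may differ.

```python
import math

n_train = 60000          # training rows in MNIST
mini_batch_size = 64     # same value as in the estimator above
epochs = 10

# Assumption: one epoch = one full pass, last partial batch included.
batches_per_epoch = int(math.ceil(n_train / float(mini_batch_size)))
total_updates = batches_per_epoch * epochs
rows_processed = n_train * epochs

print(batches_per_epoch, total_updates, rows_processed)
```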

In [9]:
lenet_model.train(x=x, y=y, training_frame=train_df, validation_frame=test_df)


deepwater Model Build progress: |█████████████████████████████████████████| 100%

In [10]:
error = lenet_model.model_performance(valid=True).mean_per_class_error()
print("model error: %s" % error)


model error: 0.0132572646506
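
The reported value is the mean per-class error on the test set: the error rate is computed separately for each digit and then averaged, so every class counts equally regardless of how many examples it has. A value of about 0.013 therefore corresponds to an average per-digit error of roughly 1.3%. The sketch below shows the calculation on a small, made-up 3-class confusion matrix; the function is an illustration of the metric, not H2O's implementation.

```python
def mean_per_class_error(confusion):
    """confusion[i][j] = number of rows with true class i predicted as class j."""
    errors = []
    for i, row in enumerate(confusion):
        total = sum(row)
        if total == 0:
            continue                      # skip classes with no examples
        errors.append(1.0 - row[i] / float(total))
    if not errors:
        raise ValueError("confusion matrix contains no examples")
    return sum(errors) / len(errors)

# Made-up counts for three classes (rows: actual, columns: predicted).
confusion = [
    [98, 1, 1],
    [2, 95, 3],
    [0, 5, 95],
]

print(mean_per_class_error(confusion))
```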