The MNIST database is a well-known academic dataset used to benchmark
classification performance. The data consists of 60,000 training images and
10,000 test images. Each image is a standardized $28 \times 28$ pixel greyscale
image (784 pixel values) of a single handwritten digit. An example of the
scanned handwritten digits is shown.
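To make the layout concrete, here is a minimal sketch of how one flat row of 784 pixel values maps back onto the 28 by 28 grid. It assumes the CSV stores pixels in row-major order (left to right, then top to bottom); the helper `row_to_grid` and the sample row are hypothetical and not part of H2O.

```python
def row_to_grid(pixels, side=28):
    """Reshape a flat list of side*side pixel values into a list of rows."""
    if len(pixels) != side * side:
        raise ValueError("expected %d values, got %d" % (side * side, len(pixels)))
    # Assumption: row-major order, so each consecutive run of `side` values is one image row.
    return [pixels[r * side:(r + 1) * side] for r in range(side)]

# Hypothetical flat row: all background (0) except one bright pixel.
flat = [0] * 784
flat[28 * 3 + 5] = 255  # image row 3, column 5 (both zero-based)
grid = row_to_grid(flat)
```

With this layout, `grid[3][5]` holds the bright pixel and every other entry is 0.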
In [1]:
import h2o
h2o.init()
In [2]:
import os.path
PATH = os.path.expanduser("~/h2o-3/")
In [3]:
test_df = h2o.import_file(PATH + "bigdata/laptop/mnist/test.csv.gz")
In [4]:
train_df = h2o.import_file(PATH + "bigdata/laptop/mnist/train.csv.gz")
Specify the response and predictor columns
In [5]:
y = "C785"
x = train_df.names[0:784]
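The slice above relies on the column layout: 784 pixel columns followed by one label column. As a sketch, assuming the CSV has no header row so that H2O assigns the default names `C1` through `C785`, the same split can be reproduced on a plain list of names without a running cluster (the `names` list here is a stand-in for `train_df.names`):

```python
n_pixels = 28 * 28  # 784 predictor columns, one per pixel

# Stand-in for train_df.names, assuming default names C1..C785.
names = ["C%d" % i for i in range(1, n_pixels + 2)]

x = names[0:n_pixels]  # C1..C784: the pixel values
y = names[n_pixels]    # C785: the digit label
```

The last predictor is `C784` and the response is `C785`, matching the hard-coded `y` above.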
Convert the response column from a number to a categorical (factor), so that the model is trained as a classifier
In [6]:
train_df[y] = train_df[y].asfactor()
test_df[y] = test_df[y].asfactor()
Train the Deep Learning model and validate it on the test set
In [7]:
from h2o.estimators.deepwater import H2ODeepWaterEstimator
In [8]:
lenet_model = H2ODeepWaterEstimator(
    epochs=10,
    learning_rate=1e-3,
    mini_batch_size=64,
    network='lenet',
    image_shape=[28,28],
    problem_type='dataset',      ## Not 'image' since we're not passing paths to image files, but raw numbers
    ignore_const_cols=False,     ## We need to keep all 28x28=784 pixel values, even if some are always 0
    channels=1,
    backend="tensorflow"
)
In [9]:
lenet_model.train(x=x, y=y, training_frame=train_df, validation_frame=test_df)
In [10]:
error = lenet_model.model_performance(valid=True).mean_per_class_error()
print("model error: {}".format(error))
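The mean per-class error is the average of the error rates computed separately for each digit class, so a rarely seen class counts as much as a common one. The following is a minimal pure-Python sketch of that definition on made-up labels; the function and the sample data are illustrative only and are not H2O's implementation.

```python
from collections import Counter

def mean_per_class_error(actual, predicted):
    """Average, over the classes present in `actual`, of each class's error rate."""
    totals = Counter(actual)
    wrong = Counter(a for a, p in zip(actual, predicted) if a != p)
    rates = [wrong[c] / float(totals[c]) for c in totals]
    return sum(rates) / len(rates)

# Made-up labels: class 0 has 1 of 4 wrong, class 1 has 1 of 2 wrong.
actual = [0, 0, 0, 0, 1, 1]
predicted = [0, 0, 0, 1, 1, 0]
sketch_error = mean_per_class_error(actual, predicted)  # (0.25 + 0.5) / 2
```

Here the overall error rate would be 2/6, while the mean per-class error is 0.375, because the smaller class contributes its 50% error rate with equal weight.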