DeepLearning

MNIST Dataset using DeepWater and TensorFlow

The MNIST database is a well-known academic dataset used to benchmark classification performance. The data consists of 60,000 training images and 10,000 test images. Each image is a standardized 28 × 28 pixel greyscale image of a single handwritten digit. An example of the scanned handwritten digits is shown below.

[Figure: examples of scanned MNIST handwritten digits]
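Because 28 × 28 = 784, each image becomes one row of 784 pixel values in the CSV files used below. The following is a minimal sketch of turning such a row back into a grid. It assumes row-major flattening and 0-255 intensities, and both are assumptions rather than facts taken from the dataset files. The helper names to_grid and render are hypothetical.

```python
# Sketch: view one flattened MNIST row (784 pixel values) as a 28x28 grid.
# Assumption: pixels are stored row-major, i.e. flat index k (0-based)
# holds the pixel at row k // 28, column k % 28.
SIDE = 28

def to_grid(pixels, side=SIDE):
    """Split a flat list of side*side pixel values into `side` rows."""
    if len(pixels) != side * side:
        raise ValueError("expected %d pixels, got %d" % (side * side, len(pixels)))
    return [pixels[r * side:(r + 1) * side] for r in range(side)]

def render(grid, threshold=128):
    """Crude ASCII rendering: '#' for pixels at or above threshold, '.' otherwise."""
    return "\n".join("".join("#" if p >= threshold else "." for p in row) for row in grid)

# A synthetic "image" (made-up data): a vertical bar in columns 13-14, rows 4-23,
# roughly resembling a handwritten 1.
flat = [255 if c in (13, 14) and 4 <= r < 24 else 0
        for r in range(SIDE) for c in range(SIDE)]
grid = to_grid(flat)
print(render(grid))
```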



In [1]:

    
import h2o
h2o.init()









    



Checking whether there is an H2O instance running at http://localhost:54321. connected.






    




H2O cluster uptime:	1 hour 44 mins
H2O cluster version:	3.11.0.99999
H2O cluster version age:	28 days, 23 hours and 53 minutes
H2O cluster name:	ubuntu
H2O cluster total nodes:	1
H2O cluster free memory:	8.74 Gb
H2O cluster total cores:	8
H2O cluster allowed cores:	8
H2O cluster status:	locked, healthy
H2O connection url:	http://localhost:54321
H2O connection proxy:	None
Python version:	2.7.12 final



In [2]:

    
import os.path
PATH = os.path.expanduser("~/h2o-3/")



In [3]:

    
test_df = h2o.import_file(PATH + "bigdata/laptop/mnist/test.csv.gz")









    



Parse progress: |█████████████████████████████████████████████████████████| 100%



In [4]:

    
train_df = h2o.import_file(PATH + "bigdata/laptop/mnist/train.csv.gz")









    



Parse progress: |█████████████████████████████████████████████████████████| 100%
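Before parsing with H2O, you may want to confirm that each row holds 784 pixel values plus one label. The expected width of 785 columns is inferred from the C785 response name used below. This is a small standard-library sketch written for Python 3, because gzip text mode is not available in Python 2. The helper first_row_width is hypothetical, and the demo writes a made-up file instead of reading the real dataset.

```python
import csv
import gzip
import os
import tempfile

def first_row_width(path):
    """Return the number of comma-separated fields in the first row of a gzipped CSV."""
    with gzip.open(path, "rt") as fh:
        return len(next(csv.reader(fh)))

# Demo on a tiny synthetic file standing in for mnist/train.csv.gz:
# one row of 784 zero pixels followed by the label 7 (made-up data).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv.gz")
with gzip.open(demo_path, "wt") as fh:
    fh.write(",".join(["0"] * 784 + ["7"]) + "\n")

width = first_row_width(demo_path)
print(width)
```

On the real data, you would call first_row_width(PATH + "bigdata/laptop/mnist/train.csv.gz") instead.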

Specify the response column (C785, the digit label) and the predictor columns (C1 to C784, the pixel values)



In [5]:

    
y = "C785"
x = train_df.names[0:784]
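The slice above works because the CSV has no header, so H2O falls back to generic column names. Here is a sketch of that layout, assuming the names run C1 through C785, as the choice of y = "C785" suggests. The pixel_position helper is hypothetical and also assumes row-major flattening.

```python
# Sketch: assumed column-name layout for the 785-column MNIST frame.
names = ["C%d" % i for i in range(1, 786)]

y = "C785"            # last column: the digit label
x = names[0:784]      # first 784 columns: pixel intensities

def pixel_position(col_name, side=28):
    """Map a pixel column name like 'C30' to its (row, col) in the image,
    assuming row-major flattening (an assumption, not documented H2O behaviour)."""
    k = int(col_name[1:]) - 1   # 0-based flat index
    return divmod(k, side)

print("%d predictors, %s..%s, response in x: %s" % (len(x), x[0], x[-1], y in x))
print(pixel_position("C30"))
```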

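Convert the numeric label to a categorical (factor) column, so that H2O treats this as a classification problem rather than a regression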



In [6]:

    
train_df[y] = train_df[y].asfactor()
test_df[y] = test_df[y].asfactor()
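Conceptually, converting to a factor replaces each distinct label value with a named level and stores an index into those levels. This sketch illustrates the idea only and is not H2O's implementation. The to_factor helper is hypothetical, and the labels are made up.

```python
# Conceptual sketch of a numeric -> categorical conversion.
def to_factor(values):
    """Return (levels, codes): sorted distinct values as strings, and per-row level indices."""
    levels = sorted(set(values))
    index = {v: i for i, v in enumerate(levels)}
    return [str(v) for v in levels], [index[v] for v in values]

labels = [7, 2, 1, 0, 4, 1, 4, 9]
levels, codes = to_factor(labels)
print(levels)  # ['0', '1', '2', '4', '7', '9']
print(codes)   # [4, 2, 1, 0, 3, 1, 3, 5]
```

With the label column stored this way, a model predicts one of ten classes instead of a continuous number.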

Train a Deep Learning model and validate it on the test set

LeNet (1989)

In this demo you will learn how to train a simple LeNet model with TensorFlow as the backend.
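LeNet stacks convolution and pooling layers before its fully connected layers. The spatial size after each layer follows (n - k + 2p) / s + 1, where n is the input size, k the kernel size, p the padding, and s the stride. This sketch assumes a classic LeNet-style layout: two 5×5 convolutions with 20 and 50 filters, each followed by 2×2 max pooling. The exact layer sizes of Deep Water's built-in 'lenet' network may differ.

```python
def conv_out(n, kernel, stride=1, pad=0):
    """Spatial size after a convolution or pooling layer: (n - k + 2p) // s + 1."""
    return (n - kernel + 2 * pad) // stride + 1

# Assumed LeNet-style stack: (description, kernel, stride, output filters or None).
layers = [
    ("conv 5x5, 20 filters", 5, 1, 20),
    ("max-pool 2x2", 2, 2, None),
    ("conv 5x5, 50 filters", 5, 1, 50),
    ("max-pool 2x2", 2, 2, None),
]

size, depth = 28, 1   # 28x28 greyscale input, 1 channel
for name, k, s, filters in layers:
    size = conv_out(size, k, s)
    if filters is not None:
        depth = filters   # pooling keeps the channel count unchanged
    print("%-22s -> %dx%dx%d" % (name, size, size, depth))

flat = size * size * depth
print("flattened features: %d" % flat)
```

Under these assumptions, the 28×28 input shrinks to 24, 12, 8, and finally 4 pixels per side. That leaves 800 features for the fully connected layers that produce the 10 digit scores.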

Using the LeNet model architecture for training in H2O

We are ready to start the training procedure.



In [7]:

    
from h2o.estimators.deepwater import H2ODeepWaterEstimator



In [8]:

    
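lenet_model = H2ODeepWaterEstimator(
    epochs=10,                   ## number of passes over the training data
    learning_rate=1e-3,
    mini_batch_size=64,
    network='lenet',             ## built-in LeNet architecture
    image_shape=[28,28],         ## height and width of each image, in pixels
    problem_type='dataset',      ## Not 'image' since we're not passing paths to image files, but raw numbers
    ignore_const_cols=False,     ## We need to keep all 28x28=784 pixel values, even if some are always 0
    channels=1,                  ## greyscale images have a single channel
    backend="tensorflow"         ## use TensorFlow as the deep learning backend
)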
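The image_shape and channels settings describe how each row is reshaped: 28 × 28 × 1 = 784 values. This is also why ignore_const_cols=False matters. If pixels that are 0 in every image were dropped, fewer than 784 columns would remain and the rows could no longer be reshaped into images. The toy sketch below uses 2×2 "images" and hypothetical helper names to illustrate that failure mode.

```python
def expected_inputs(image_shape, channels):
    """Number of raw values one image occupies once flattened into a row."""
    height, width = image_shape
    return height * width * channels

def non_constant_columns(rows):
    """Indices of columns whose value varies across rows (what survives constant-column removal)."""
    return [j for j in range(len(rows[0])) if len(set(r[j] for r in rows)) > 1]

# MNIST settings from the estimator above: 28 x 28 pixels, 1 greyscale channel.
print(expected_inputs([28, 28], 1))  # 784

# Toy 2x2 "images" flattened to 4 columns; the first and last pixels are always 0.
toy_rows = [
    [0, 0, 5, 0],
    [0, 3, 9, 0],
]
kept = non_constant_columns(toy_rows)
print(kept)  # [1, 2]
# Only 2 of the 4 values remain, so the rows no longer form 2x2 images.
print(len(kept) == expected_inputs([2, 2], 1))  # False
```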



In [9]:

    
lenet_model.train(x=x, y=y, training_frame=train_df, validation_frame=test_df)









    



deepwater Model Build progress: |█████████████████████████████████████████| 100%
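With mini_batch_size=64 and epochs=10, you can estimate how many weight updates the training run performs. This back-of-the-envelope sketch assumes that each epoch visits all 60,000 training images once, including a final partial batch. Deep Water's actual sampling and batching may differ.

```python
# Rough count of gradient updates for the settings used above.
n_train = 60000
mini_batch_size = 64
epochs = 10

batches_per_epoch = -(-n_train // mini_batch_size)   # integer ceiling division
total_updates = batches_per_epoch * epochs
print("%d batches per epoch, %d updates in total" % (batches_per_epoch, total_updates))
```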



In [10]:

    
error = lenet_model.model_performance(valid=True).mean_per_class_error()
print "model error:", error









    



model error: 0.0132572646506
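The reported value is the mean per-class error. It computes the error rate separately for each digit class and then averages those rates, so every class counts equally regardless of how many test images it has. The sketch below applies that definition to a made-up 3-class confusion matrix (these are not MNIST results) and contrasts it with the overall error rate. The helper names are hypothetical.

```python
def mean_per_class_error(confusion):
    """confusion[i][j] = number of rows with true class i predicted as class j."""
    errors = [1.0 - row[i] / float(sum(row)) for i, row in enumerate(confusion)]
    return sum(errors) / len(errors)

def overall_error(confusion):
    """Fraction of all rows that were misclassified."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return 1.0 - correct / float(total)

cm = [
    [98, 1, 1],   # class 0: 2 of 100 wrong -> error 0.02
    [0, 45, 5],   # class 1: 5 of 50 wrong  -> error 0.10
    [3, 0, 147],  # class 2: 3 of 150 wrong -> error 0.02
]
print(round(mean_per_class_error(cm), 4))  # 0.0467
print(round(overall_error(cm), 4))         # 0.0333
```

The two metrics differ because the small class 1 has the highest error rate. Averaging per class gives it as much weight as the larger classes.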
