Autoencoder

This notebook shows how to invoke the SystemML autoencoder script and two ways of passing data in and out: through files on disk and through Python variables.

This notebook requires SystemML 0.14.0 or later.


In [ ]:
!pip show systemml

In [ ]:
import pandas as pd
from systemml import MLContext, dml
ml = MLContext(sc)
print(ml.info())
sc.version

Read and write data on the local file system with SystemML


In [ ]:
FsPath = "/tmp/data/"
inp  = FsPath + "Input/"
outp = FsPath + "Output/"

Generate data and write it out to a file.


In [ ]:
import numpy as np
# The np.float alias was removed in NumPy 1.24; the builtin float is equivalent.
X_pd = pd.DataFrame(np.arange(1,2001, dtype=float)).values.reshape(100,20)
# X_pd = pd.DataFrame(range(1, 2001,1),dtype=float).values.reshape(100,20)
script ="""
    write(X, $Xfile)
"""
prog = dml(script).input(X=X_pd).input(**{"$Xfile":inp+"X.csv"})
ml.execute(prog)
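
To make the shape of the generated data concrete, here is a minimal NumPy-only sketch of the same matrix. It does not call SystemML; the variable name `X_check` is introduced here for illustration only.

```python
import numpy as np

# Same values as X_pd above: the integers 1..2000 as floats,
# laid out row by row in a 100 x 20 matrix.
X_check = np.arange(1, 2001, dtype=float).reshape(100, 20)

print(X_check.shape)   # rows x columns
print(X_check[0, :5])  # first five values of the first row
print(X_check[-1, -1]) # last value in the matrix
```

Each row holds 20 consecutive numbers, so row `i` (zero-based) starts at `20 * i + 1`.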

In [ ]:
!ls -l /tmp/data/Input

In [ ]:
autoencoderURL = "https://raw.githubusercontent.com/apache/incubator-systemml/master/scripts/staging/autoencoder-2layer.dml"
rets = ("iter", "num_iters_per_epoch", "beg", "end", "o")

prog = dml(autoencoderURL).input(**{"$X":inp+"X.csv"}) \
                          .input(**{"$H1":500, "$H2":2, "$BATCH":36, "$EPOCH":5,
                                    "$W1_out":outp+"W1_out", "$b1_out":outp+"b1_out",
                                    "$W2_out":outp+"W2_out", "$b2_out":outp+"b2_out",
                                    "$W3_out":outp+"W3_out", "$b3_out":outp+"b3_out",
                                    "$W4_out":outp+"W4_out", "$b4_out":outp+"b4_out"
                                   }).output(*rets)
iter, num_iters_per_epoch, beg, end, o = ml.execute(prog).get(*rets)
print (iter, num_iters_per_epoch, beg, end, o)
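
As a rough guide to the printed values, the sketch below works out the mini-batch arithmetic for the parameters used above (100 rows, `$BATCH` of 36, `$EPOCH` of 5). It assumes the script counts iterations per epoch as the row count divided by the batch size, rounded up, and shortens the last batch of each epoch; that is an assumption about the script, not something this notebook confirms. The variable names are illustrative.

```python
import math

n_rows = 100     # rows in X
batch_size = 36  # value passed as $BATCH
epochs = 5       # value passed as $EPOCH

# Assumed rule: one iteration per mini-batch, with a final partial batch.
iters_per_epoch = math.ceil(n_rows / batch_size)
total_iters = iters_per_epoch * epochs
last_batch_rows = n_rows - (iters_per_epoch - 1) * batch_size

print(iters_per_epoch, total_iters, last_batch_rows)
```

Under that assumption, each epoch takes 3 iterations, the run takes 15 in total, and the third batch of each epoch covers the remaining 28 rows.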

In [ ]:
!ls -l /tmp/data/Output

Instead of passing input and output file names, you can pass the data in and read the results back as Python variables.


In [ ]:
autoencoderURL = "https://raw.githubusercontent.com/apache/incubator-systemml/master/scripts/staging/autoencoder-2layer.dml"
rets = ("iter", "num_iters_per_epoch", "beg", "end", "o")
rets2 = ("W1", "b1", "W2", "b2", "W3", "b3", "W4", "b4")

prog = dml(autoencoderURL).input(X=X_pd) \
                          .input(**{ "$H1":500, "$H2":2, "$BATCH":36, "$EPOCH":5}) \
                          .output(*rets) \
                          .output(*rets2)
result = ml.execute(prog)
iter, num_iters_per_epoch, beg, end, o = result.get(*rets)
W1, b1, W2, b2, W3, b3, W4, b4 = result.get(*rets2)

print (iter, num_iters_per_epoch, beg, end, o)
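
Once the weights are available in Python, they could be used to project the data into the 2-dimensional bottleneck. The following is a minimal NumPy sketch under stated assumptions, not the script's own code: it assumes `W1` has shape (H1, features), `W2` has shape (H2, H1), the biases are column vectors, and the activation is `tanh`. Random stand-in weights are used so the block runs without SystemML; in the notebook, the returned matrices would first need converting to NumPy arrays (for example with a `toNumPy()` method, if the returned objects provide one). The function name `encode` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_features, h1, h2 = 100, 20, 500, 2

X = np.arange(1, 2001, dtype=float).reshape(n_rows, n_features)

# Stand-ins for the trained parameters; shapes are assumptions.
W1 = rng.normal(scale=0.01, size=(h1, n_features))
b1 = np.zeros((h1, 1))
W2 = rng.normal(scale=0.01, size=(h2, h1))
b2 = np.zeros((h2, 1))

def encode(X, W1, b1, W2, b2):
    """Apply the two encoder layers (assumed tanh activation) row-wise."""
    hidden = np.tanh(X @ W1.T + b1.T)     # (n_rows, h1)
    return np.tanh(hidden @ W2.T + b2.T)  # (n_rows, h2)

codes = encode(X, W1, b1, W2, b2)
print(codes.shape)
```

With `$H2` set to 2, each input row maps to a pair of values that can be plotted directly. If the script normalizes its input before training, the same normalization would have to be applied to `X` here.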