Alink demonstration

This is a sample machine learning demo created with Alink from Alibaba. Cleuton Sampaio, Data Learning Hub


In [1]:
#Imports
from pyalink.alink import *


Use one of the following command to start using pyalink:
 - useLocalEnv(parallelism, flinkHome=None, config=None)
 - useRemoteEnv(host, port, parallelism, flinkHome=None, localIp="localhost", config=None)
Call resetEnv() to reset environment and switch to another.


In [2]:
#Environment configuration
useLocalEnv(1, flinkHome=None, config=None)
#We won't use parallelism here (it is set to 1), but we could run on a Flink cluster: https://flink.apache.org/poweredby.html


JVM listening on 127.0.0.1:41249
Out[2]:
JavaObject id=o6

In [3]:
#Preparing the dataset
#We'll read a CSV dataset containing weights and heights of students. We'll try to predict height based on weight
URL = "./weight-height.csv"
SCHEMA_STR = "weight double,height double"
data = CsvSourceBatchOp() \
    .setFilePath(URL) \
    .setSchemaStr(SCHEMA_STR) \
    .setFieldDelimiter(",")
#Main output: 80% of the rows, used for training
splitter = SplitBatchOp().setFraction(0.8)
train = splitter.linkFrom(data)
#Side output 0: the remaining rows, kept for testing
test = splitter.getSideOutput(0)
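
To make the 80/20 split concrete, here is a minimal plain-Python sketch of the idea: shuffle the rows and cut them at the requested fraction. It is not how Alink implements `SplitBatchOp`; the helper `split_rows`, its `seed` parameter, and the ten placeholder rows are assumptions made only for illustration.

```python
import random


def split_rows(rows, fraction, seed=0):
    """Return (main, side): a random `fraction` of the rows, and the rest.

    Hypothetical helper for illustration, not part of pyalink.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


# Ten placeholder row ids stand in for the CSV rows.
rows = list(range(10))
train_rows, test_rows = split_rows(rows, 0.8)
print(len(train_rows), len(test_rows))
```

Every row ends up in exactly one of the two parts, which mirrors how the main output and side output 0 are used in the cell above.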

In [4]:
#Creating the linear regression model: weight is the feature and height is the label
lr = LinearRegression().setFeatureCols(["weight"]).setLabelCol("height").setPredictionCol("prediction")
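
For intuition about what a one-feature linear regression computes, the sketch below fits a line by ordinary least squares in plain Python. It is an illustration under assumptions, not Alink's solver, which may work differently: the helper `fit_line` is hypothetical, and the sample rows are the ten (weight, height) pairs visible in the training output printed later in this notebook, not the full dataset.

```python
def fit_line(xs, ys):
    """Closed-form least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# The ten rows shown in the printed training output (a sample only).
weights = [61.0, 61.0, 68.0, 73.0, 67.0, 66.0, 50.0, 70.0, 58.0, 73.0]
heights = [1.62, 1.63, 1.68, 1.75, 1.68, 1.67, 1.51, 1.70, 1.59, 1.75]

slope, intercept = fit_line(weights, heights)
print(slope, intercept)
```

Because only ten rows are used here, the coefficients will be close to, but not identical with, those of the model trained on all 239 rows.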

In [5]:
#Training the model and printing its predictions on the training set
model = lr.fit(train)
model.transform(train).print()


     weight  height  prediction
0      61.0    1.62    1.610758
1      61.0    1.63    1.610758
2      68.0    1.68    1.681880
3      73.0    1.75    1.732682
4      67.0    1.68    1.671720
..      ...     ...         ...
234    66.0    1.67    1.661560
235    50.0    1.51    1.498994
236    70.0    1.70    1.702201
237    58.0    1.59    1.580277
238    73.0    1.75    1.732682

[239 rows x 3 columns]
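
Since the model is linear in a single feature, the fitted line can be recovered from any two printed predictions with different weights. The sketch below does that in plain Python and checks the result against the other printed rows. The numbers are copied from the output above; the variable names and the `predict` helper are assumptions for illustration.

```python
# Two (weight, prediction) pairs copied from the printed output.
p1 = (61.0, 1.610758)
p2 = (68.0, 1.681880)

# A straight line through two points gives the slope and intercept.
slope = (p2[1] - p1[1]) / (p2[0] - p1[0])
intercept = p1[1] - slope * p1[0]


def predict(weight):
    """Height predicted by the recovered line (hypothetical helper)."""
    return intercept + slope * weight


# Remaining distinct (weight, prediction) pairs from the printed output.
others = [
    (73.0, 1.732682),
    (67.0, 1.671720),
    (66.0, 1.661560),
    (50.0, 1.498994),
    (70.0, 1.702201),
    (58.0, 1.580277),
]
max_error = max(abs(predict(w) - p) for w, p in others)
print(slope, intercept, max_error)
```

The recovered slope is about 0.0102 and the intercept about 0.99, and the line reproduces the other printed predictions to within rounding of the six displayed decimals.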
