Creating a Dataset is a simple REST call. The notebook cells below use pymldb's Connection class to make REST API calls. You can check out the Using pymldb Tutorial for more details.
In [11]:
from pymldb import Connection
mldb = Connection()
Let's create a dataset called example.
In [12]:
mldb.put('/v1/datasets/example', { "type":"sparse.mutable" })
Out[12]:
That's all there is to it, and now we can add some rows.
In [13]:
# Each entry in "columns" is [column name, value, timestamp].
mldb.post('/v1/datasets/example/rows', {
    "rowName": "first row",
    "columns": [
        ["first column", 1, 0],
        ["second column", 2, 0]
    ]
})
mldb.post('/v1/datasets/example/rows', {
    "rowName": "second row",
    "columns": [
        ["first column", 3, 0],
        ["second column", 4, 0]
    ]
})
# Commit so the newly added rows become available to queries.
mldb.post("/v1/datasets/example/commit")
Out[13]:
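The two row payloads above share one shape, so they can be generated rather than typed out by hand. The sketch below is only an illustration: `make_row` is a hypothetical helper, not part of pymldb, and it assumes each entry in `columns` is a `[name, value, timestamp]` triple, as in the cell above.

```python
def make_row(row_name, values, ts=0):
    """Build a payload for POST /v1/datasets/<name>/rows.

    Hypothetical helper (not part of pymldb). `values` maps column
    names to cell values; `ts` is the timestamp applied to every cell.
    """
    return {
        "rowName": row_name,
        "columns": [[name, value, ts] for name, value in values.items()],
    }


# Rebuild the payload used for "first row" above.
payload = make_row("first row", {"first column": 1, "second column": 2})
print(payload)
```

The resulting dictionary could then be passed as the second argument to `mldb.post('/v1/datasets/example/rows', ...)`, exactly like the literal payloads above.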
So now we have a little bit of data in our dataset. Let's check.
In [14]:
mldb.query("select * from example")
Out[14]:
We can also load data from a file. Let's first write a small CSV file to disk.
In [15]:
%mkdir /mldb_data/datasets
In [16]:
%%writefile /mldb_data/datasets/sample.csv
a column,another column
a,b
c,d
We can import this dataset with a simple procedure.
In [17]:
mldb.put('/v1/procedures/import_example2', {
    "type": "import.text",
    "params": {
        "dataFileUrl": "file:///mldb_data/datasets/sample.csv",
        "outputDataset": "example2",
        "runOnCreation": True
    }
})
Out[17]:
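If several files need importing, the procedure configuration above can be built by a small function. This is a minimal sketch under stated assumptions: `import_text_config` is a hypothetical helper, not an MLDB or pymldb API, and it only handles absolute local paths, which it turns into `file://` URLs like the one used above.

```python
def import_text_config(csv_path, output_dataset, run_on_creation=True):
    """Build an import.text procedure config (hypothetical helper).

    Only absolute local paths are supported, so that "file://" + path
    yields a URL of the form file:///dir/file.csv.
    """
    if not csv_path.startswith("/"):
        raise ValueError("csv_path must be an absolute path")
    return {
        "type": "import.text",
        "params": {
            "dataFileUrl": "file://" + csv_path,
            "outputDataset": output_dataset,
            "runOnCreation": run_on_creation,
        },
    }


# Reproduce the configuration from the cell above.
config = import_text_config("/mldb_data/datasets/sample.csv", "example2")
print(config["params"]["dataFileUrl"])
```

The returned dictionary could be passed to `mldb.put('/v1/procedures/import_example2', config)` in place of the literal configuration.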
Now let's run a query to validate that the data was loaded correctly.
In [18]:
mldb.query("select * from example2")
Out[18]:
Procedures take Datasets as inputs and can create Datasets as outputs. This is how you can do data cleanup/transformation in MLDB. Here's a simple example with the transform Procedure:
In [19]:
mldb.put('/v1/procedures/example', {
    "type": "transform",
    "params": {
        "inputData": 'select "first column" + "second column" as "transformed column" from example',
        "outputDataset": "example3"
    }
})
mldb.post('/v1/procedures/example/runs')
Out[19]:
In [20]:
mldb.query("select * from example3")
Out[20]:
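To make the arithmetic of the transform concrete, the sketch below reproduces it in plain Python on the two rows added to the example dataset earlier. It does not call MLDB; it only mirrors what the `select "first column" + "second column" as "transformed column"` expression computes for each row.

```python
# The two rows inserted into the "example" dataset earlier in the notebook.
rows = {
    "first row": {"first column": 1, "second column": 2},
    "second row": {"first column": 3, "second column": 4},
}

# Plain-Python equivalent of the transform's select expression:
# one output column per row, holding the sum of the two input columns.
transformed = {
    row_name: {"transformed column": cols["first column"] + cols["second column"]}
    for row_name, cols in rows.items()
}

print(transformed)
```

With this input, "first row" gets 3 and "second row" gets 7 in the transformed column, which is what the query on example3 should return.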
Check out the other Tutorials and Demos.