Interaction with MLDB occurs via a REST API. Interacting with a REST API over HTTP from a Notebook interface can be a little laborious if you're using a general-purpose Python library like requests directly, so MLDB comes with a Python library called pymldb to ease the pain.
pymldb does this in three ways:
%mldb magics: these are Jupyter line- and cell-magic commands which allow you to make raw HTTP calls to MLDB, and which also provide some higher-level functions. This tutorial shows you how to use them.
Resource class: this is a simple class which wraps the requests library so as to make HTTP calls to the MLDB API more friendly in a Notebook environment. Check out the Resource Wrapper Tutorial for more info on the Resource class.
BatFrame class: this is a class that behaves like the Pandas DataFrame but offloads computation to the server via HTTP calls. Check out the BatFrame Tutorial for more info on the BatFrame.
In [1]:
%reload_ext pymldb
And then we'll ask it for some help:
In [2]:
%mldb help
The most basic way in which the %mldb magic can help us with MLDB's REST API is by allowing us to type natural-feeling REST commands, like this one, which will list all of the available dataset types:
In [3]:
%mldb GET /v1/types/datasets
Out[3]:
You can use similar syntax to run PUT, POST and DELETE queries as well.
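To make the mapping between such a command and an HTTP request concrete, here is a minimal sketch that turns a line of the form `<VERB> <path> [JSON payload]` into an unsent request using only the standard library. It is an illustration, not pymldb's own parser: the `build_request` helper, the `MLDB_HOST` value and the idea that a payload follows the path on the same line are all assumptions made for this example.

```python
import json
import urllib.request

# Hypothetical host: replace with wherever your MLDB instance is listening.
MLDB_HOST = "http://localhost"


def build_request(line, host=MLDB_HOST):
    """Turn '<VERB> <path> [JSON payload]' into an unsent urllib Request.

    Illustration only; this is not how pymldb is implemented.
    """
    parts = line.split(None, 2)  # verb, path, optional JSON payload
    if len(parts) < 2:
        raise ValueError("expected '<VERB> <path> [JSON payload]'")
    verb, path = parts[0].upper(), parts[1]
    if verb not in ("GET", "PUT", "POST", "DELETE"):
        raise ValueError("unsupported verb: " + verb)
    data, headers = None, {}
    if len(parts) == 3:
        # Round-trip through json so a malformed payload fails early.
        data = json.dumps(json.loads(parts[2])).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        host + path, data=data, headers=headers, method=verb
    )


# Nothing is sent here; we only inspect the request that would be made.
req = build_request("GET /v1/types/datasets")
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would actually send it, which is essentially the boilerplate the magic saves you from typing.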
The %mldb magic system also includes syntax to do more advanced operations like loading and querying data. Let's load the dataset from the Predicting Titanic Survival demo with a single command (after deleting it first if it's already loaded):
In [4]:
%mldb DELETE /v1/datasets/titanic
%mldb loadcsv titanic https://raw.githubusercontent.com/datacratic/mldb-pytanic-plugin/master/titanic_train.csv
And now let's run an SQL query on it:
In [5]:
%mldb query select * from titanic limit 5
Out[5]:
We can get the results out as a Pandas DataFrame just as easily:
In [6]:
df = %mldb query select * from titanic
type(df)
Out[6]:
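Queries can also be expressed as plain REST calls. As a rough sketch, assuming MLDB exposes a `/v1/query` endpoint that takes the SQL text in a `q` parameter, the URL for the query above could be built as follows. The `query_url` helper and the `MLDB_HOST` value are hypothetical names introduced for this example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical host: replace with wherever your MLDB instance is listening.
MLDB_HOST = "http://localhost"


def query_url(sql, host=MLDB_HOST, **params):
    """Build a URL for a GET against /v1/query.

    Assumes the endpoint accepts the SQL text in a 'q' parameter; any
    extra keyword arguments are passed through as query parameters.
    """
    query = dict(params, q=sql)
    # urlencode percent-escapes spaces, '*' and other special characters.
    return host + "/v1/query?" + urlencode(query)


url = query_url("select * from titanic limit 5")
print(url)

# Decoding the query string recovers the original SQL unchanged.
print(parse_qs(urlsplit(url).query)["q"][0])
```

The escaping step is the part that is easy to get wrong by hand, which is one reason a dedicated query command is more comfortable than a raw GET.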
Python code which is executed in a normal Notebook cell runs within the Notebook Python interpreter. MLDB supports the sending of Python scripts via HTTP for execution within its own in-process Python interpreter. Server-side Python code gets access to a high-performance version of the REST API which bypasses HTTP, via an mldb.perform() function.
There's an %mldb magic command for running server-side Python code, from the comfort of your Notebook:
In [7]:
%%mldb py
# this code will run on the server!
print(mldb.perform("GET", "/v1/types/datasets", [], {})["response"])
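The cell above prints the raw "response" entry of whatever mldb.perform() returns. A slightly more defensive pattern is sketched below, under the assumption that the result is a dictionary with a numeric `statusCode` entry alongside a JSON-encoded `response` string. Because the real `mldb` object only exists inside the server, the sketch uses a stand-in `StubMldb` class with a made-up payload so that it runs anywhere; `perform_json` is likewise a hypothetical helper, not part of MLDB.

```python
import json


class StubMldb(object):
    """Stand-in for the server-side `mldb` object (assumption for this demo).

    The canned payload is invented and is not real MLDB output.
    """

    def perform(self, verb, resource, params, payload):
        body = json.dumps(["example.type.a", "example.type.b"])
        return {"statusCode": 200, "response": body}


def perform_json(mldb, verb, resource, params=None, payload=None):
    """Call mldb.perform() and decode the JSON body, raising on errors.

    Assumes the result has 'statusCode' and 'response' entries.
    """
    result = mldb.perform(verb, resource, params or [], payload or {})
    if result["statusCode"] >= 400:
        raise RuntimeError(
            "%s %s failed: %s" % (verb, resource, result["response"])
        )
    return json.loads(result["response"])


types = perform_json(StubMldb(), "GET", "/v1/types/datasets")
print(types)
```

Inside a real `%%mldb py` cell you would pass the built-in `mldb` object instead of the stub.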
Now that you've seen the basics, check out the Mapping Reddit demo to see how to use the %mldb magic system to do machine learning with MLDB.