This notebook demonstrates how to turn an annotated notebook into an HTTP API using the Jupyter Kernel Gateway. The result is a simple scotch recommendation engine.
The original scotch data is from https://www.mathstat.strath.ac.uk/outreach/nessie/nessie_whisky.html.
In [1]:
import pandas as pd
import pickle
import requests
import json
In [2]:
features_uri = 'https://ibm.box.com/shared/static/2vntdqbozf9lzmukkeoq1lfi2pcb00j1.dataframe'
sim_uri = 'https://ibm.box.com/shared/static/54kzs5zquv0vjycemjckjbh0n00e7m5t.dataframe'
In [3]:
resp = requests.get(features_uri)
resp.raise_for_status()
features_df = pickle.loads(resp.content)
In [4]:
resp = requests.get(sim_uri)
resp.raise_for_status()
sim_df = pickle.loads(resp.content)
Drop the cluster column; it is not needed here.
In [5]:
features_df = features_df.drop('cluster', axis=1)
In [6]:
REQUEST = json.dumps({
    'path': {},
    'args': {}
})
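The empty REQUEST above is a stand-in that lets the notebook run interactively. When the notebook is served, the gateway is expected to replace it with a JSON string describing the incoming request. The sketch below shows what such a value might look like for GET /scotches/Talisker/similar?count=3. It assumes, following the Kernel Gateway's notebook-http documentation as best recalled and not verified here, that query values arrive as lists of strings. The helpers first_arg and as_bool are hypothetical and not part of the original notebook.

```python
import json

# Hypothetical REQUEST value; the exact shape is an assumption, not taken from the notebook
EXAMPLE_REQUEST = json.dumps({
    'path': {'scotch': 'Talisker'},
    'args': {'count': ['3'], 'include_features': ['false']}
})

def first_arg(args, key, default, cast=str):
    """Return the first value for `key` after casting, or `default` if the key is absent."""
    if key not in args:
        return default
    value = args[key]
    # accept either a bare value or a list of values
    if isinstance(value, (list, tuple)):
        value = value[0]
    return cast(value)

def as_bool(text):
    """Treat a few common truthy strings as True; anything else is False."""
    return str(text).lower() in ('1', 'true', 'yes')

request = json.loads(EXAMPLE_REQUEST)
count = first_arg(request['args'], 'count', 5, int)
inc_features = first_arg(request['args'], 'include_features', True, as_bool)
print(count, inc_features)
```

Normalizing the arguments this way keeps the handler cells working both interactively, where the defaults apply, and when real query strings are passed in.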
Provide a way to get the names of all the scotches known by the model.
In [7]:
# GET /scotches
names = sim_df.columns.tolist()
print(json.dumps(dict(names=names)))
Let clients query for features about a specific scotch given its name.
In [ ]:
# GET /scotches/:scotch
request = json.loads(REQUEST)
name = request['path'].get('scotch', 'Talisker')
features = features_df.loc[name]
# use to_json rather than to_dict: to_dict can retain numpy types, which json.dumps cannot serialize
print('{"features":%s}' % features.to_json())
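To illustrate the serialization concern noted in the cell above, here is a minimal sketch on made-up data (the feature values are invented for illustration). It shows json.dumps rejecting a numpy integer, while Series.to_json produces valid JSON directly. Whether to_dict actually returns numpy types depends on the pandas version, so the example uses a numpy scalar explicitly.

```python
import json
import numpy as np
import pandas as pd

# Invented feature values, for illustration only
toy_features = pd.Series({'Body': 2, 'Smoky': 4}, name='Talisker')

# The standard json module does not know how to encode numpy integer scalars
try:
    json.dumps({'Body': np.int64(2)})
    numpy_ok = True
except TypeError:
    numpy_ok = False

# to_json lets pandas do the encoding, so the result can be embedded in a JSON string
payload = '{"features":%s}' % toy_features.to_json()
parsed = json.loads(payload)
print(numpy_ok, parsed)
```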
Let clients request a set of scotches similar to the one named. Clients can specify how many results they wish to receive (count) and whether the raw feature data should be included in the result (include_features).
In [ ]:
# GET /scotches/:scotch/similar
request = json.loads(REQUEST)
name = request['path'].get('scotch', 'Talisker')
count = request['args'].get('count', 5)
inc_features = request['args'].get('include_features', True)
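# rank all scotches by similarity to the named one, most similar first
similar = sim_df[name].sort_values(ascending=False)
similar.name = 'Similarity'
# skip the top-ranked row (expected to be the queried scotch itself) and keep the next `count` rows
df = pd.DataFrame(similar).iloc[1:count+1]
if inc_features:
    df = df.join(features_df)
df = df.reset_index().rename(columns={'Distillery': 'Name'})
result = {
    'recommendations': [row[1].to_dict() for row in df.iterrows()],
    'for': name
}
print(json.dumps(result))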
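The ranking step in the cell above can be tried without downloading the real data. The following is a minimal sketch on a small, invented similarity matrix. The scores and the recommend function are hypothetical, and the index name 'Distillery' is inferred from the rename in the cell above rather than confirmed from the data.

```python
import json
import pandas as pd

# Invented symmetric similarity scores, for illustration only
names = ['Talisker', 'Lagavulin', 'Oban', 'Glenlivet']
toy_sim = pd.DataFrame(
    [[1.0, 0.9, 0.6, 0.2],
     [0.9, 1.0, 0.5, 0.1],
     [0.6, 0.5, 1.0, 0.4],
     [0.2, 0.1, 0.4, 1.0]],
    index=pd.Index(names, name='Distillery'),
    columns=names,
)

def recommend(sim, name, count=5):
    """Return the `count` scotches most similar to `name`, excluding the top-ranked row."""
    ranked = sim[name].sort_values(ascending=False)
    ranked.name = 'Similarity'
    # position 0 is the queried scotch itself in this toy matrix, so start at 1
    top = ranked.iloc[1:count + 1].reset_index()
    top = top.rename(columns={'Distillery': 'Name'})
    return {'recommendations': top.to_dict(orient='records'), 'for': name}

result = recommend(toy_sim, 'Talisker', count=2)
print(json.dumps(result))
```

Wrapping the logic in a function is a design choice made for this sketch only; the notebook keeps it inline so that each annotated cell can serve as a request handler.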