Got Scotch API?

This notebook demonstrates how to turn an annotated notebook into an HTTP API using the Jupyter Kernel Gateway. The result is a simple scotch recommendation engine.

The original scotch data is from https://www.mathstat.strath.ac.uk/outreach/nessie/nessie_whisky.html.


In [1]:
import pandas as pd
import pickle
import requests
import json

Data

We read the scotch data from a public Box URL to make this notebook more portable. This is acceptable only because the data is small, public, and meant for demos.


In [2]:
features_uri = 'https://ibm.box.com/shared/static/2vntdqbozf9lzmukkeoq1lfi2pcb00j1.dataframe' 
sim_uri = 'https://ibm.box.com/shared/static/54kzs5zquv0vjycemjckjbh0n00e7m5t.dataframe'

In [3]:
resp = requests.get(features_uri)
resp.raise_for_status()
features_df = pickle.loads(resp.content)

In [4]:
resp = requests.get(sim_uri)
resp.raise_for_status()
sim_df = pickle.loads(resp.content)

Drop the cluster column, which is not needed here.


In [5]:
features_df = features_df.drop('cluster', axis=1)

API

We need to define a global REQUEST JSON string that will be replaced on each invocation of the API. We only care about path parameters and query string arguments, so we default those to blank here for development.


In [6]:
REQUEST = json.dumps({
    'path' : {},
    'args' : {}
})
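
To make the shape of REQUEST concrete, here is a minimal sketch of what a handler cell might receive for GET /scotches/Talisker/similar?count=3. The values are invented for illustration, and representing each query argument as a list of strings is an assumption about the kernel gateway's behavior, so check it against the version you run.

```python
import json

# Hypothetical REQUEST for GET /scotches/Talisker/similar?count=3.
# Assumption: each query argument arrives as a list of strings.
EXAMPLE_REQUEST = json.dumps({
    'path': {'scotch': 'Talisker'},
    'args': {'count': ['3']}
})

example = json.loads(EXAMPLE_REQUEST)
# Path parameters are read by name, with a fallback for development.
example_name = example['path'].get('scotch', 'Talisker')
# Take the first value of the argument and convert it from text.
example_count = int(example['args'].get('count', ['5'])[0])
print(example_name, example_count)
```

The separate variable names (example, example_name, example_count) are used only so that this sketch does not overwrite the notebook's own REQUEST handling.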

Provide a way to get the names of all the scotches known by the model.


In [7]:
# GET /scotches
names = sim_df.columns.tolist()
print(json.dumps(dict(names=names)))


{"names": ["Aberfeldy", "Aberlour", "AnCnoc", "Ardbeg", "Ardmore", "ArranIsleOf", "Auchentoshan", "Auchroisk", "Aultmore", "Balblair", "Balmenach", "Belvenie", "BenNevis", "Benriach", "Benrinnes", "Benromach", "Bladnoch", "BlairAthol", "Bowmore", "Bruichladdich", "Bunnahabhain", "Caol Ila", "Cardhu", "Clynelish", "Craigallechie", "Craigganmore", "Dailuaine", "Dalmore", "Dalwhinnie", "Deanston", "Dufftown", "Edradour", "GlenDeveronMacduff", "GlenElgin", "GlenGarioch", "GlenGrant", "GlenKeith", "GlenMoray", "GlenOrd", "GlenScotia", "GlenSpey", "Glenallachie", "Glendronach", "Glendullan", "Glenfarclas", "Glenfiddich", "Glengoyne", "Glenkinchie", "Glenlivet", "Glenlossie", "Glenmorangie", "Glenrothes", "Glenturret", "Highland Park", "Inchgower", "Isle of Jura", "Knochando", "Lagavulin", "Laphroig", "Linkwood", "Loch Lomond", "Longmorn", "Macallan", "Mannochmore", "Miltonduff", "Mortlach", "Oban", "OldFettercairn", "OldPulteney", "RoyalBrackla", "RoyalLochnagar", "Scapa", "Speyburn", "Speyside", "Springbank", "Strathisla", "Strathmill", "Talisker", "Tamdhu", "Tamnavulin", "Teaninich", "Tobermory", "Tomatin", "Tomintoul", "Tormore", "Tullibardine"]}

Let clients query for features about a specific scotch given its name.


In [ ]:
# GET /scotches/:scotch
request = json.loads(REQUEST)
name = request['path'].get('scotch', 'Talisker')
features = features_df.loc[name]
# can't use to_dict because it retains numpy types which blow up when we json.dumps
print('{"features":%s}' % features.to_json())
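
The string-formatting approach above can be checked on a toy row. This sketch uses a made-up Series (the feature names and values are illustrative, not taken from the scotch data) to show that splicing the to_json() output into the envelope yields valid JSON.

```python
import json
import pandas as pd

# Toy stand-in for one row of features_df; names and values are made up.
toy_features = pd.Series({'Body': 2, 'Smoky': 4}, name='ToyScotch')

# to_json() serializes the values itself, so its result is already a JSON
# object and can be spliced into the response envelope as text.
toy_body = '{"features":%s}' % toy_features.to_json()

# Parsing the response back confirms that it is well-formed JSON.
toy_parsed = json.loads(toy_body)
print(toy_parsed['features'])
```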

Let clients request a set of scotches similar to the one named. Clients can specify how many results they wish to receive (count) and whether the raw feature data should be included in the result (include_features).
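
The ranking idea can be tried on a toy similarity matrix first. Everything here is an assumption made for illustration: the distillery names A, B, and C, the feature values, and the use of correlation between feature rows as the similarity measure (the notebook does not say how sim_df was built). The point is only to show why the first row of the sorted column is skipped: a scotch is most similar to itself.

```python
import pandas as pd

# Made-up feature table, indexed by distillery like features_df.
toy_features_df = pd.DataFrame(
    {'Body': [2, 4, 1], 'Smoky': [1, 4, 0], 'Sweetness': [3, 1, 4]},
    index=pd.Index(['A', 'B', 'C'], name='Distillery'),
)

# Assumed similarity measure: correlation between the feature rows.
# The result is square, with distilleries as both index and columns.
toy_sim_df = toy_features_df.T.corr()

# Sort one column from most to least similar, then drop position 0,
# which is the named distillery itself.
toy_ranked = toy_sim_df['A'].sort_values(ascending=False).iloc[1:3]
print(toy_ranked.index.tolist())
```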


In [ ]:
# GET /scotches/:scotch/similar
request = json.loads(REQUEST)
name = request['path'].get('scotch', 'Talisker')
count = request['args'].get('count', 5)
inc_features = request['args'].get('include_features', True)

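# rank every scotch by similarity to the named one, most similar first
similar = sim_df[name].sort_values(ascending=False)
similar.name = 'Similarity'
# skip position 0 (normally the named scotch itself) and keep the next `count` rows
df = pd.DataFrame(similar).iloc[1:count+1]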

if inc_features:
    df = df.join(features_df)
    
df = df.reset_index().rename(columns={'Distillery': 'Name'})
result = {
    'recommendations' : [row[1].to_dict() for row in df.iterrows()],
    'for': name
}
print(json.dumps(result))
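
Query string values generally reach a handler as text rather than as Python integers or booleans, so count and include_features may need converting before use. The helpers below are a hypothetical sketch: the names first_value, parse_count, and parse_flag are not part of the notebook or of the kernel gateway, and accepting either a bare value or a list of strings is an assumption meant to cover both shapes the args dictionary might take.

```python
def first_value(args, key, default):
    """Return the first value for key, unwrapping a list if one was sent."""
    value = args.get(key, default)
    if isinstance(value, (list, tuple)):
        return value[0] if value else default
    return value


def parse_count(args, default=5):
    """Read count as a positive integer, falling back to the default."""
    try:
        count = int(first_value(args, 'count', default))
    except (TypeError, ValueError):
        return default
    return count if count > 0 else default


def parse_flag(args, key, default=True):
    """Interpret common textual spellings of a boolean query argument."""
    value = first_value(args, key, default)
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in ('1', 'true', 'yes', 'on')


print(parse_count({'count': ['3']}))
print(parse_flag({'include_features': ['false']}, 'include_features'))
```

In the handler cell, these could stand in for the two request['args'].get(...) lookups, for example count = parse_count(request['args']).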