This notebook demonstrates how to turn an annotated notebook into an HTTP API using the Jupyter Kernel Gateway. The result is a simple scotch recommendation engine.

The original scotch data is from https://www.mathstat.strath.ac.uk/outreach/nessie/nessie_whisky.html.


In [ ]:
library(RCurl)
library(jsonlite)

Data

We read the scotch data from a public Dropbox URL to make this notebook more portable. This is acceptable for small, public demo data such as this.


In [ ]:
whisky_json <- getURL("https://dl.dropboxusercontent.com/u/19043899/whisky_features_df.json", ssl.verifypeer = FALSE, useragent = "R")
whisky_similarity_json <- getURL("https://dl.dropboxusercontent.com/u/19043899/whisky_similarity_features_df.json", ssl.verifypeer = FALSE, useragent = "R")

In [ ]:
whisky_df <- fromJSON(whisky_json)
whisky_similarity_df <- fromJSON(whisky_similarity_json)
whisky_similarity_df

Drop the cluster column; we don't need it here.


In [ ]:
whisky_df <- subset(whisky_df, select=-c(cluster))

API

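We need to define a global REQUEST JSON string that will be replaced on each invocation of the API. We only care about path parameters and query string arguments, so for development we default REQUEST to an empty JSON object. An empty string would not work here, because fromJSON cannot parse it; with an empty object, lookups such as $path$scotch simply return NULL and the defaults below apply.


In [ ]:
REQUEST <- "{}"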
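For local testing it can help to set REQUEST to a hand-written string that mimics a real call. The sketch below assumes the request carries a path object and an args object, and that each query string value arrives as an array of strings; the concrete values are made up for illustration and are not produced by the gateway.

```r
library(jsonlite)

# Hypothetical request for GET /scotches/Talisker/similar?count=3&include_features=True
# (hand-written for development; the values are illustrative assumptions).
REQUEST <- '{"path": {"scotch": "Talisker"}, "args": {"count": ["3"], "include_features": ["True"]}}'

request <- fromJSON(REQUEST)

# Path parameters are plain strings.
scotch <- request$path$scotch

# Query string values are strings, so convert before doing arithmetic.
count <- as.integer(request$args$count)

# Compare against the literal string the handler expects.
include_features <- identical(request$args$include_features, "True")

# Keys that are absent come back as NULL, which is what the default checks rely on.
missing_scotch <- fromJSON("{}")$path$scotch
```

Parsing the request once and reusing the result, as done here, also avoids calling fromJSON(REQUEST) separately for every parameter.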

Provide a way to get the names of all the scotches known by the model.


In [ ]:
# GET /scotches
scotches <- subset(whisky_df, select=c(Distillery))
print(toJSON(scotches))

Let clients query for features about a specific scotch given its name.


In [ ]:
# GET /scotches/:scotch
scotch_requested <- fromJSON(REQUEST)$path$scotch
if(is.null(scotch_requested)) {
    scotch_requested <- "Talisker"
}
scotch_features <- whisky_df[whisky_df$Distillery==scotch_requested,]
print(toJSON(scotch_features))
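To make the response shape concrete, here is a minimal sketch on a toy data frame. The distillery names and the Smoky column are made up for illustration and are not taken from the real data set. By default, toJSON serializes a data frame as an array with one record per row, so a name that matches nothing yields an empty array rather than an error.

```r
library(jsonlite)

# Toy stand-in for whisky_df (made-up values).
toy_df <- data.frame(
  Distillery = c("A", "B", "C"),
  Smoky = c(1L, 3L, 2L),
  stringsAsFactors = FALSE
)

# A known name selects exactly one row, serialized as a one-record array.
found_json <- toJSON(toy_df[toy_df$Distillery == "B", ])
print(found_json)

# An unknown name selects zero rows.
missing_json <- toJSON(toy_df[toy_df$Distillery == "Nope", ])
print(missing_json)

# Round-trip to check what a client would receive.
found <- fromJSON(found_json)
missing <- fromJSON(missing_json)
```

A client therefore has to treat an empty array as "not found"; returning an explicit error status would need extra handling that this notebook does not attempt.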

Let clients request a set of scotches similar to the one named. Clients can specify how many results they wish to receive (count) and whether they want all of the raw feature data included in the result (include_features).


In [ ]:
# GET /scotches/:scotch/similar
scotch_requested <- fromJSON(REQUEST)$path$scotch
if(is.null(scotch_requested)) {
    scotch_requested <- "Talisker"
}
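similarity_count <- fromJSON(REQUEST)$args$count
if(is.null(similarity_count)) {
    similarity_count <- 5
}
# query string values arrive as strings, so convert before doing arithmetic
similarity_count <- as.integer(similarity_count)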
features_requested <- fromJSON(REQUEST)$args$include_features
if(is.null(features_requested)) {
    features_requested <- ""
}

# subset df columns to Distillery and the scotch requested
top_similar <- subset(whisky_similarity_df, select=c('Distillery',scotch_requested))
# order df by similarity to the scotch requested, most similar first
top_similar <- top_similar[order(top_similar[[scotch_requested]], decreasing=TRUE),]
# skip the first row (the requested scotch itself) and take the next similarity_count rows
top_similar <- top_similar[2:(similarity_count+1),]

if (features_requested == "True") {
    top_similar_with_features <- merge(x = top_similar, y = whisky_df, by = "Distillery", all.x = TRUE)
    #print(top_similar_with_features)
    print(toJSON(top_similar_with_features))
} else {
    #print(top_similar)
    print(toJSON(top_similar))
}
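The ranking step above can be sketched on its own with a toy similarity table. The values below are made up, and top_similar_to is a hypothetical helper, not part of the notebook. It is a slight variation on the cell above: it drops the requested scotch by name instead of assuming it always sorts into the first row.

```r
# Toy similarity table (made-up values, not the real data set).
# Each named column holds the similarity of every distillery to that one.
toy_similarity <- data.frame(
  Distillery = c("A", "B", "C", "D"),
  A = c(1.0, 0.2, 0.9, 0.5),
  B = c(0.2, 1.0, 0.4, 0.7),
  C = c(0.9, 0.4, 1.0, 0.3),
  D = c(0.5, 0.7, 0.3, 1.0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rank all other distilleries by similarity to `name`.
top_similar_to <- function(similarity_df, name, n = 5) {
  # Keep only the label column and the column for the requested scotch.
  ranked <- similarity_df[, c("Distillery", name)]
  # Drop the requested scotch itself.
  ranked <- ranked[ranked$Distillery != name, ]
  # Sort by similarity, most similar first.
  ranked <- ranked[order(ranked[[name]], decreasing = TRUE), ]
  # head() returns fewer rows when fewer than n are available.
  head(ranked, n)
}

result <- top_similar_to(toy_similarity, "A", n = 2)
print(result$Distillery)  # prints [1] "C" "D"
```

Using head() instead of a fixed row range also means that a count larger than the number of available scotches returns a shorter result rather than rows filled with NA.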