This notebook serves a pre-trained sentiment analysis pipeline, consisting of a regex pre-processing step, tokenization, n-gram computation, and a logistic regression model, as a RESTful API.
In [ ]:
import cPickle  # Python 2; on Python 3, use the built-in pickle module instead
import json
import pandas as pd
import sklearn
import requests
In [ ]:
# Download the pickled scikit-learn model from GitHub and deserialize it
resp = requests.get("https://raw.githubusercontent.com/crawles/gpdb_sentiment_analysis_twitter_model/master/twitter_sentiment_model.pkl")
resp.raise_for_status()
cl = cPickle.loads(resp.content)
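As a quick sanity check (an addition, not part of the original pipeline), printing the deserialized object shows its type and, for a scikit-learn pipeline, its steps, assuming the unpickling succeeded:
In [ ]:
# Inspect the deserialized model; for a scikit-learn Pipeline the repr lists its steps
print(type(cl))
print(cl)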
In [ ]:
def regex_preprocess(raw_tweets):
    """Mask usernames and URLs and collapse repeated characters in raw tweets."""
    pp_text = pd.Series(raw_tweets)
    user_pat = r'(?<=^|(?<=[^a-zA-Z0-9-_\.]))@([A-Za-z]+[A-Za-z0-9]+)'
    http_pat = r'(https?:\/\/(?:www\.|(?!www))[^\s\.]+\.[^\s]{2,}|www\.[^\s]+\.[^\s]{2,})'
    repeat_pat, repeat_repl = r'(.)\1\1+', r'\1\1'
    # on pandas >= 2.0, also pass regex=True, since regex matching is no longer the default
    pp_text = pp_text.str.replace(pat=user_pat, repl='USERNAME')
    pp_text = pp_text.str.replace(pat=http_pat, repl='URL')
    pp_text = pp_text.str.replace(pat=repeat_pat, repl=repeat_repl)  # e.g. 'soooo' -> 'soo'
    return pp_text
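To see the preprocessing and the model working end-to-end, the cleaned text can be inspected and scored interactively. This is a minimal sketch, not part of the original notebook, and the sample tweets are hypothetical:
In [ ]:
# Hypothetical sample tweets to exercise the preprocessing and the model
sample = ['@alice I loooove this!', 'Terrible service, see http://example.com']
print(regex_preprocess(sample))  # usernames masked, URLs masked, repeats collapsed
# Positive-class probability for each cleaned tweet
print(cl.predict_proba(regex_preprocess(sample))[:, 1])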
The Jupyter Kernel Gateway supplies a global REQUEST JSON string that is replaced on each invocation of the API. The cell below defines a placeholder so the notebook can also be run interactively.
In [ ]:
REQUEST = json.dumps({
    'path' : {},
    'args' : {},
    # hypothetical placeholder body so the handler cell below can also run interactively
    'body' : {'data' : ['placeholder tweet']}
})
Using the Kernel Gateway, a cell is registered as an HTTP handler by means of a single-line comment. The handler supports common HTTP verbs (GET, POST, DELETE, etc.). For more information, view the docs.
In [ ]:
# POST /polarity_compute
req = json.loads(REQUEST)
tweets = req['body']['data']
# Respond with the positive-class probability for each tweet
print(cl.predict_proba(regex_preprocess(tweets))[:, 1])
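Once the notebook is saved and served via the Kernel Gateway, for example with `jupyter kernelgateway --KernelGatewayApp.api='kernel_gateway.notebook_http' --KernelGatewayApp.seed_uri='sentiment_api.ipynb'` (the notebook filename here is hypothetical; the gateway listens on port 8888 by default), the endpoint can be exercised from any HTTP client. A minimal sketch using requests, with hypothetical sample tweets:
In [ ]:
# Hypothetical client call against the running gateway (default port 8888)
import requests
resp = requests.post('http://localhost:8888/polarity_compute',
                     json={'data': ['I love this!', 'I hate this.']})
print(resp.text)  # the polarity scores printed by the handler cell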