This notebook serves a pre-trained sentiment analysis pipeline, consisting of a regex pre-processing step, tokenization, n-gram computation, and a logistic regression model, as a RESTful API.
In [ ]:
import cPickle  # Python 2; on Python 3, use the built-in pickle module instead
import json
import pandas as pd
import sklearn
import requests
In [ ]:
# Download the pickled scikit-learn model from GitHub and deserialize it
resp = requests.get("https://raw.githubusercontent.com/crawles/gpdb_sentiment_analysis_twitter_model/master/twitter_sentiment_model.pkl")
resp.raise_for_status()
cl = cPickle.loads(resp.content)
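As a quick sanity check (an addition, not part of the original pipeline), printing the deserialized object shows its type and, for a scikit-learn pipeline, its steps, assuming the unpickling succeeded:
In [ ]:
# Inspect the deserialized model; for a scikit-learn Pipeline the repr lists its steps
print(type(cl))
print(cl)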
In [ ]:
def regex_preprocess(raw_tweets):
    """Mask usernames and URLs and collapse repeated characters in raw tweets."""
    pp_text = pd.Series(raw_tweets)
    user_pat = r'(?<=^|(?<=[^a-zA-Z0-9-_\.]))@([A-Za-z]+[A-Za-z0-9]+)'
    http_pat = r'(https?:\/\/(?:www\.|(?!www))[^\s\.]+\.[^\s]{2,}|www\.[^\s]+\.[^\s]{2,})'
    repeat_pat, repeat_repl = r'(.)\1\1+', r'\1\1'
    # on pandas >= 2.0, also pass regex=True, since regex matching is no longer the default
    pp_text = pp_text.str.replace(pat=user_pat, repl='USERNAME')
    pp_text = pp_text.str.replace(pat=http_pat, repl='URL')
    pp_text = pp_text.str.replace(pat=repeat_pat, repl=repeat_repl)  # e.g. 'soooo' -> 'soo'
    return pp_text
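To see the preprocessing and the model working end-to-end, the cleaned text can be inspected and scored interactively. This is a minimal sketch, not part of the original notebook, and the sample tweets are hypothetical:
In [ ]:
# Hypothetical sample tweets to exercise the preprocessing and the model
sample = ['@alice I loooove this!', 'Terrible service, see http://example.com']
print(regex_preprocess(sample))  # usernames masked, URLs masked, repeats collapsed
# Positive-class probability for each cleaned tweet
print(cl.predict_proba(regex_preprocess(sample))[:, 1])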
The Jupyter Kernel Gateway supplies a global REQUEST JSON string that is replaced on each invocation of the API. The cell below defines a placeholder so the notebook can also be run interactively.
In [ ]:
REQUEST = json.dumps({
    'path' : {},
    'args' : {},
    # hypothetical placeholder body so the handler cell below can also run interactively
    'body' : {'data' : ['placeholder tweet']}
})
Using the Kernel Gateway, a cell is registered as an HTTP handler by means of a single-line comment. The handler supports common HTTP verbs (GET, POST, DELETE, etc.). For more information, view the docs.
In [ ]:
# POST /polarity_compute
req = json.loads(REQUEST)
tweets = req['body']['data']
# Respond with the positive-class probability for each tweet
print(cl.predict_proba(regex_preprocess(tweets))[:, 1])
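Once the notebook is saved and served via the Kernel Gateway, for example with `jupyter kernelgateway --KernelGatewayApp.api='kernel_gateway.notebook_http' --KernelGatewayApp.seed_uri='sentiment_api.ipynb'` (the notebook filename here is hypothetical; the gateway listens on port 8888 by default), the endpoint can be exercised from any HTTP client. A minimal sketch using requests, with hypothetical sample tweets:
In [ ]:
# Hypothetical client call against the running gateway (default port 8888)
import requests
resp = requests.post('http://localhost:8888/polarity_compute',
                     json={'data': ['I love this!', 'I hate this.']})
print(resp.text)  # the polarity scores printed by the handler cell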