It's common for companies to have useful data hidden in large volumes of text.
For example, when shopping it can be hard to decide between products that share the same star rating. When this happens, shoppers often sift through the raw text of reviews to understand the strengths and weaknesses of each option.
In this notebook we automate the task of determining product strengths and weaknesses from review text.
GraphLab Create includes feature engineering objects that leverage spaCy, a high-performance NLP package. Here we use it to extract parts of speech and parse reviews into sentences.
In [1]:
import graphlab as gl
In [2]:
from graphlab.toolkits.text_analytics import trim_rare_words, split_by_sentence, extract_part_of_speech, stopwords, PartOfSpeech
def nlp_pipeline(reviews, title, aspects):
    print(title)
    print('1. Getting reviews for this product')
    reviews = reviews.filter_by(title, 'name')
    print('2. Splitting reviews into sentences')
    reviews['sentences'] = split_by_sentence(reviews['review'])
    sentences = reviews.stack('sentences', 'sentence').dropna()
    print('3. Tagging relevant sentences')
    tags = gl.SFrame({'tag': aspects})
    tagger_model = gl.data_matching.autotagger.create(tags, verbose=False)
    tagged = tagger_model.tag(sentences, query_name='sentence',
                              similarity_threshold=.3, verbose=False)\
                         .join(sentences, on='sentence')
    print('4. Extracting adjectives')
    tagged['cleaned'] = trim_rare_words(tagged['sentence'], stopwords=list(stopwords()))
    tagged['adjectives'] = extract_part_of_speech(tagged['cleaned'], [PartOfSpeech.ADJ])
    print('5. Predicting sentence-level sentiment')
    model = gl.sentiment_analysis.create(tagged, features=['sentence'])
    tagged['sentiment'] = model.predict(tagged)
    return tagged
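Steps 2 and 4 of the pipeline above can be sketched with only the standard library. This is a simplified stand-in: GraphLab's `split_by_sentence` and `trim_rare_words` are backed by spaCy and a trained stopword list, while the regex splitter and hand-picked `STOPWORDS` set below are assumptions made for illustration.

```python
import re

# Hypothetical, tiny stopword set; GraphLab's stopwords() list is much larger.
STOPWORDS = {'a', 'an', 'the', 'is', 'it', 'this', 'and', 'but', 'very', 'for'}

def split_by_sentence(review):
    # Naive splitter: break after ., !, or ? followed by whitespace.
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', review) if s.strip()]

def trim_stopwords(sentence):
    # Lowercase, tokenize on letter runs, and drop stopwords.
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return [t for t in tokens if t not in STOPWORDS]

review = "The range is great. But the battery life is very short!"
sentences = split_by_sentence(review)
cleaned = [trim_stopwords(s) for s in sentences]
```

A real sentence splitter must also handle abbreviations and decimals ("Dr.", "2.4 GHz"), which is why the pipeline delegates to spaCy instead of a regex.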
In [3]:
reviews = gl.SFrame('amazon_baby.gl')
In [4]:
reviews
Out[4]:
In [5]:
from helper_util import *
In [6]:
aspects = ['audio', 'price', 'signal', 'range', 'battery life']
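To make the tagging step concrete: the autotagger associates each sentence with the aspects it discusses. A minimal stand-in using exact substring matching is sketched below; the real `gl.data_matching.autotagger` does fuzzy similarity matching against the tag list (hence the `similarity_threshold=.3` above), so this only illustrates the input/output shape, not the matching quality.

```python
# Hypothetical stand-in for the autotagger: tag a sentence with every
# aspect whose text appears verbatim in it (case-insensitive).
def tag_sentence(sentence, aspects):
    lowered = sentence.lower()
    return [a for a in aspects if a in lowered]

tags = tag_sentence("Great battery life, but the price is steep.",
                    ['audio', 'price', 'signal', 'range', 'battery life'])
```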
In [7]:
reviews = search(reviews, 'monitor')
In [8]:
reviews
Out[8]:
In [9]:
item_a = 'Infant Optics DXR-5 2.4 GHz Digital Video Baby Monitor with Night Vision'
reviews_a = nlp_pipeline(reviews, item_a, aspects)
In [10]:
reviews_a
Out[10]:
In [11]:
dropdown = get_dropdown(reviews)
display(dropdown)
In [12]:
item_b = dropdown.value
reviews_b = nlp_pipeline(reviews, item_b, aspects)
counts, sentiment, adjectives = get_comparisons(reviews_a, reviews_b, item_a, item_b, aspects)
Comparing the number of sentences that mention each aspect
In [13]:
counts
Out[13]:
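The `get_comparisons` helper is defined in `helper_util`, which is not shown here; a plausible sketch of the counting it performs is to tally, per product, how many tagged sentences mention each aspect. The sample data below is invented for illustration.

```python
from collections import Counter

# (aspect tag, sentence) pairs as produced by the tagging step.
tagged_a = [('price', 'Worth the price.'),
            ('range', 'Range is poor upstairs.'),
            ('price', 'Cheap for what you get.')]

# Number of sentences mentioning each aspect for this product.
counts_a = Counter(tag for tag, _ in tagged_a)
```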
Comparing the sentence-level sentiment for each aspect of each product
In [14]:
sentiment
Out[14]:
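The sentiment comparison can likewise be sketched as averaging the per-sentence sentiment scores within each aspect. The scores here are invented (the model's `predict` returns a probability of positive sentiment, between 0 and 1).

```python
from collections import defaultdict

# (aspect tag, predicted sentiment) pairs from the sentiment step.
scored = [('price', 0.9), ('price', 0.7), ('range', 0.2)]

# Group scores by aspect, then take the mean of each group.
by_aspect = defaultdict(list)
for aspect, score in scored:
    by_aspect[aspect].append(score)
mean_sentiment = {a: sum(v) / len(v) for a, v in by_aspect.items()}
```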
Comparing the use of adjectives for each aspect
In [ ]:
adjectives
In [ ]:
good, bad = get_extreme_sentences(reviews_a)
Print the good sentences for the first item, with adjectives and aspects highlighted.
In [ ]:
print_sentences(good['highlighted'])
Print the bad sentences for the first item, with adjectives and aspects highlighted.
In [ ]:
print_sentences(bad['highlighted'])
In [17]:
service = gl.deploy.predictive_service.load("s3://gl-demo-usw2/predictive_service/demolab/ps-1.8.5")
In [18]:
service.get_predictive_objects_status()
In [ ]:
def word_count(text):
    # Wrap the text in an SArray so count_words can process it,
    # then return the bag-of-words dict for the single element.
    sa = gl.SArray([text])
    sa = gl.text_analytics.count_words(sa)
    return sa[0]
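`count_words` returns, for each string, a dictionary mapping tokens to their counts. The stdlib approximation below conveys the idea; the tokenization rule (lowercase, split on letter runs) is an assumption and will not match GraphLab's delimiter handling exactly.

```python
import re
from collections import Counter

# Stdlib approximation of the word_count service endpoint:
# lowercase the text, extract word tokens, and count occurrences.
def word_count_local(text):
    return Counter(re.findall(r"[a-z']+", text.lower()))

counts = word_count_local(
    "It's a beautiful day in the neighborhood. Beautiful day for a neighbor.")
```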
In [ ]:
service.update('chris_bow', word_count)
In [ ]:
service.apply_changes()
In [ ]:
service.query('chris_bow', text=["It's a beautiful day in the neighborhood. Beautiful day for a neighbor."])