Predicting sentiment from product reviews

Fire up GraphLab Create



In [ ]:

    
import graphlab

Read some product review data

Loading reviews for a set of baby products.



In [ ]:

    
products = graphlab.SFrame('amazon_baby.gl/')

Let's explore this data together

The data includes the product name, the review text, and the rating given in the review.



In [ ]:

    
products.head()

Build the word count vector for each review



In [ ]:

    
products['word_count'] = graphlab.text_analytics.count_words(products['review'])
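To make the word count vector concrete, here is a minimal plain-Python sketch of the idea. It assumes lowercasing and whitespace splitting; the helper name `count_words_sketch` is hypothetical, and the exact tokenization rules of `graphlab.text_analytics.count_words` may differ.

```python
from collections import Counter

def count_words_sketch(text):
    """Return a {word: count} dict for one review.

    Assumption: lowercase the text and split on whitespace only,
    so punctuation stays attached to the neighbouring word.
    """
    return dict(Counter(text.lower().split()))

review = "Love it love it, my baby loves it"
word_count = count_words_sketch(review)
print(word_count)
```

Each review becomes one dictionary, which is the shape of the values stored in the `word_count` column.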



In [ ]:

    
products.head()



In [ ]:

    
graphlab.canvas.set_target('ipynb')



In [ ]:

    
products['name'].show()

Examining the reviews for the most-reviewed product: 'Vulli Sophie the Giraffe Teether'



In [ ]:

    
giraffe_reviews = products[products['name'] == 'Vulli Sophie the Giraffe Teether']



In [ ]:

    
len(giraffe_reviews)



In [ ]:

    
giraffe_reviews['rating'].show(view='Categorical')

Build a sentiment classifier



In [ ]:

    
products['rating'].show(view='Categorical')

Define what's a positive and a negative sentiment

We will ignore all reviews with a rating of 3, since they tend to express neutral sentiment. Reviews with a rating of 4 or higher are labeled positive, and reviews with a rating of 2 or lower are labeled negative.



In [ ]:

    
# ignore all 3-star reviews
products = products[products['rating'] != 3]



In [ ]:

    
# positive sentiment = 4-star or 5-star reviews
products['sentiment'] = products['rating'] >= 4
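The labeling rule can be sketched on a plain list of ratings. The function name `label_sentiment` and the sample ratings are made up for illustration. The notebook itself does this in two steps on the SFrame: a filter, then a comparison.

```python
def label_sentiment(rating):
    """Return None for a neutral 3-star rating, True for 4-5 stars,
    and False for 1-2 stars."""
    if rating == 3:
        return None
    return rating >= 4

ratings = [5, 3, 1, 4, 2]  # illustrative ratings, not taken from the dataset
labels = [label_sentiment(r) for r in ratings]

# Drop the neutral reviews, mirroring the filter on rating != 3.
kept = [(r, s) for r, s in zip(ratings, labels) if s is not None]
print(kept)
```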



In [ ]:

    
products.head()

Let's train the sentiment classifier



In [ ]:

    
train_data, test_data = products.random_split(.8, seed=0)
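A seeded random split can be sketched in plain Python as follows. This assumes each row is assigned independently with probability 0.8 to the first part, so the split is only approximately 80/20. The helper `random_split_sketch` is hypothetical and will not reproduce the exact rows GraphLab selects for the same seed.

```python
import random

def random_split_sketch(rows, fraction, seed):
    """Send each row to the first list with probability `fraction`,
    otherwise to the second list. A fixed seed makes the split repeatable."""
    rng = random.Random(seed)
    first, second = [], []
    for row in rows:
        if rng.random() < fraction:
            first.append(row)
        else:
            second.append(row)
    return first, second

rows = list(range(1000))
train, test = random_split_sketch(rows, 0.8, seed=0)
print(len(train))
print(len(test))
```

Fixing the seed matters here because the later comparison between models is only meaningful if both are evaluated on the same held-out rows.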



In [ ]:

    
sentiment_model = graphlab.logistic_classifier.create(
    train_data,
    target='sentiment',
    features=['word_count'],
    validation_set=test_data)
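For intuition about what a logistic classifier learns from word counts, here is a small self-contained sketch. It uses plain stochastic gradient ascent on a toy dataset and is not GraphLab's solver. The function names, step size, epoch count, and example reviews are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def score(weights, bias, word_count):
    """Linear score: bias plus the sum of weight * count over words."""
    return bias + sum(weights.get(w, 0.0) * c for w, c in word_count.items())

def train_logistic(examples, epochs=200, step=0.1):
    """examples: list of (word_count dict, label in {0, 1})."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for word_count, label in examples:
            # Gradient of the log-likelihood for one example.
            error = label - sigmoid(score(weights, bias, word_count))
            bias += step * error
            for w, c in word_count.items():
                weights[w] = weights.get(w, 0.0) + step * error * c
    return weights, bias

# Toy training set (made up): two positive and two negative reviews.
toy = [
    ({"love": 1, "it": 1}, 1),
    ({"great": 1, "toy": 1}, 1),
    ({"terrible": 1, "toy": 1}, 0),
    ({"hate": 1, "it": 1}, 0),
]
weights, bias = train_logistic(toy)
p_pos = sigmoid(score(weights, bias, {"love": 1, "toy": 1}))
p_neg = sigmoid(score(weights, bias, {"hate": 1, "toy": 1}))
print(round(p_pos, 2))
print(round(p_neg, 2))
```

Words that appear mostly in positive reviews end up with positive weights, and words that appear mostly in negative reviews end up with negative weights. Words shared by both classes stay close to zero.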

Evaluate the sentiment model



In [ ]:

    
sentiment_model.evaluate(test_data, metric='roc_curve')
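An ROC curve plots the true positive rate against the false positive rate as the decision threshold varies. The sketch below computes a few such points by hand. The helper `roc_points`, the labels, and the probabilities are illustrative assumptions, and GraphLab's own threshold grid and tie handling may differ.

```python
def roc_points(labels, probabilities, thresholds):
    """Return (threshold, false_positive_rate, true_positive_rate) tuples.

    A row is predicted positive when its probability is >= the threshold.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        tp = sum(1 for y, p in zip(labels, probabilities) if p >= t and y == 1)
        fp = sum(1 for y, p in zip(labels, probabilities) if p >= t and y == 0)
        points.append((t, fp / float(negatives), tp / float(positives)))
    return points

labels = [1, 1, 0, 1, 0]                   # made-up true sentiments
probabilities = [0.9, 0.8, 0.7, 0.4, 0.2]  # made-up model outputs
points = roc_points(labels, probabilities, [0.0, 0.5, 1.0])
for point in points:
    print(point)
```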



In [ ]:

    
sentiment_model.show(view='Evaluation')

Applying the learned model to understand sentiment for the giraffe



In [ ]:

    
giraffe_reviews['predicted_sentiment'] = sentiment_model.predict(giraffe_reviews, output_type='probability')



In [ ]:

    
giraffe_reviews.head()

Sort the reviews based on the predicted sentiment and explore



In [ ]:

    
giraffe_reviews = giraffe_reviews.sort('predicted_sentiment', ascending=False)



In [ ]:

    
giraffe_reviews.head()

Most positive reviews for the giraffe



In [ ]:

    
giraffe_reviews[0]['review']



In [ ]:

    
giraffe_reviews[1]['review']

Most negative reviews for the giraffe



In [ ]:

    
giraffe_reviews[-1]['review']



In [ ]:

    
giraffe_reviews[-2]['review']



In [ ]:

    
selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']



In [ ]:

    
def awesome_count(word_count):
    return word_count.get('awesome', 0)



In [ ]:

    
products['awesome'] = products['word_count'].apply(awesome_count)



In [ ]:

    
def get_count(word_count, word):
    return word_count.get(word, 0)



In [ ]:

    
products['awesome'].head()



In [ ]:

    
products['awesome'].sum()



In [ ]:

    
for word in selected_words:
    # bind the current word as a default argument so each lambda keeps its own word
    products[word] = products['word_count'].apply(lambda word_count, word=word: get_count(word_count, word))
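The effect of this loop can be sketched without an SFrame: each selected word becomes its own column of per-review counts, with 0 where the word is absent. The names `selected`, `word_counts`, `columns`, and `totals` are illustrative, and the data is made up.

```python
def get_count(word_count, word):
    """Return how often `word` occurs in one review's word-count dict."""
    return word_count.get(word, 0)

selected = ['awesome', 'great', 'hate']  # a short, illustrative word list
word_counts = [
    {'awesome': 2, 'toy': 1},
    {'great': 1, 'hate': 1},
    {},  # a review with no words at all
]

# One column per selected word, one entry per review.
columns = {word: [get_count(wc, word) for wc in word_counts] for word in selected}

# Column sums show how often each word occurs across all reviews.
totals = {word: sum(col) for word, col in columns.items()}
print(columns['awesome'])
print(totals['awesome'])
```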



In [ ]:

    
products.head()



In [ ]:

    
len(selected_words)



In [ ]:

    
for word in selected_words:
    print('{} {}'.format(word, products[word].sum()))



In [ ]:

    
train_data, test_data = products.random_split(.8, seed=0)



In [ ]:

    
simple_model = graphlab.logistic_classifier.create(
    train_data,
    target='sentiment',
    features=selected_words,
    validation_set=test_data)



In [ ]:

    
simple_model['coefficients'].sort('value').print_rows(15)



In [ ]:

    
simple_model.evaluate(test_data)



In [ ]:

    
sentiment_model.evaluate(test_data)



In [ ]:

    
diaper_champ_reviews = products[products['name'] == 'Baby Trend Diaper Champ']



In [ ]:

    
diaper_champ_reviews['predicted_sentiment'] = sentiment_model.predict(diaper_champ_reviews, output_type='probability')



In [ ]:

    
diaper_champ_reviews = diaper_champ_reviews.sort('predicted_sentiment', ascending=False)



In [ ]:

    
simple_model.predict(diaper_champ_reviews[0:1], output_type='probability')



In [ ]:

    
diaper_champ_reviews[0]



In [ ]:

    
test_data['sentiment'].show(view='Categorical')
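One way to read this class distribution is as a baseline: a classifier that always predicts the most common class gets an accuracy equal to that class's share of the data. The sketch below shows the computation. The function `majority_class_accuracy` and the labels are made up for illustration and are not the dataset's actual proportions.

```python
from collections import Counter

def majority_class_accuracy(labels):
    """Return (most common label, accuracy of always predicting it)."""
    most_common_label, count = Counter(labels).most_common(1)[0]
    return most_common_label, count / float(len(labels))

labels = [1, 1, 1, 0, 1, 0, 1, 1]  # illustrative sentiments, mostly positive
label, accuracy = majority_class_accuracy(labels)
print(label)
print(accuracy)
```

A trained model is only useful to the extent that its accuracy on the test data exceeds this baseline.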



In [ ]:

    
products[['word_count']].stack('word_count', new_column_name=['word', 'count'])
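Stacking a dictionary column turns each key and value pair into its own row. A rough plain-Python equivalent is sketched below. The helper `stack_word_counts` is hypothetical, and the sorted key order is an assumption made so the output is predictable (the row order produced by `SFrame.stack` may differ).

```python
def stack_word_counts(rows):
    """Expand each row's word_count dict into one {'word', 'count'} row per entry."""
    stacked = []
    for row in rows:
        for word, count in sorted(row['word_count'].items()):
            stacked.append({'word': word, 'count': count})
    return stacked

rows = [
    {'word_count': {'love': 2, 'it': 1}},
    {'word_count': {'bad': 1}},
]
stacked = stack_word_counts(rows)
for entry in stacked:
    print(entry)
```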


