Predicting sentiment from product reviews

Fire up GraphLab Create



In [1]:

    
import graphlab

Read some product review data

Loading reviews for a set of baby products.



In [2]:

    
products = graphlab.SFrame('amazon_baby.gl/')









    




Let's explore this data together

The data includes the product name, the review text, and the rating given in the review.



In [3]:

    
products.head()









    Out[3]:





    
+----------------------------------------------+--------------------------------------------------------+--------+
| name                                         | review                                                 | rating |
+----------------------------------------------+--------------------------------------------------------+--------+
| Planetwise Flannel Wipes                     | These flannel wipes are OK, but in my opinion ...      | 3.0    |
| Planetwise Wipe Pouch                        | it came early and was not disappointed. i love ...     | 5.0    |
| Annas Dream Full Quilt with 2 Shams ...      | Very soft and comfortable and warmer than it ...       | 5.0    |
| Stop Pacifier Sucking without tears with ... | This is a product well worth the purchase.  I ...      | 5.0    |
| Stop Pacifier Sucking without tears with ... | All of my kids have cried non-stop when I tried to ... | 5.0    |
| Stop Pacifier Sucking without tears with ... | When the Binky Fairy came to our house, we didn't ...  | 5.0    |
| A Tale of Baby's Days with Peter Rabbit ...  | Lovely book, it's bound tightly so you may no ...      | 4.0    |
| Baby Tracker® - Daily Childcare Journal, ... | Perfect for new parents. We were able to keep ...      | 5.0    |
| Baby Tracker® - Daily Childcare Journal, ... | A friend of mine pinned this product on Pinte ...      | 5.0    |
| Baby Tracker® - Daily Childcare Journal, ... | This has been an easy way for my nanny to record ...   | 4.0    |
+----------------------------------------------+--------------------------------------------------------+--------+

[10 rows x 3 columns]

Build the word count vector for each review



In [4]:

    
products['word_count'] = graphlab.text_analytics.count_words(products['review'])
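To make the word_count column concrete, here is a minimal plain-Python sketch of the kind of dictionary it holds. It assumes lowercasing and whitespace splitting with punctuation left attached, which is consistent with keys such as 'parents!!' and 'journal.' in the word_count column of this notebook, but it is not GraphLab's implementation. The helper name `count_words_sketch` is hypothetical.

```python
from collections import Counter

def count_words_sketch(text):
    # Hypothetical helper: lowercase, split on whitespace, count each token.
    # Punctuation stays attached to tokens (e.g. 'journal.').
    return dict(Counter(text.lower().split()))

example = "I love this journal. Love it and love the price"
counts = count_words_sketch(example)
print(counts)
```

Because punctuation is not stripped in this sketch, 'journal.' and 'journal' would be counted as different words.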



In [6]:

    
products.head()









    Out[6]:





    
+----------------------------------------------+--------------------------------------------------------+--------+------------------------------------------------------+
| name                                         | review                                                 | rating | word_count                                           |
+----------------------------------------------+--------------------------------------------------------+--------+------------------------------------------------------+
| Planetwise Flannel Wipes                     | These flannel wipes are OK, but in my opinion ...      | 3.0    | {'and': 5, 'stink': 1, 'because': 1, 'ordered': ...  |
| Planetwise Wipe Pouch                        | it came early and was not disappointed. i love ...     | 5.0    | {'and': 3, 'love': 1, 'it': 2, 'highly': 1, ...      |
| Annas Dream Full Quilt with 2 Shams ...      | Very soft and comfortable and warmer than it ...       | 5.0    | {'and': 2, 'quilt': 1, 'it': 1, 'comfortable': ...   |
| Stop Pacifier Sucking without tears with ... | This is a product well worth the purchase.  I ...      | 5.0    | {'ingenious': 1, 'and': 3, 'love': 2, ...            |
| Stop Pacifier Sucking without tears with ... | All of my kids have cried non-stop when I tried to ... | 5.0    | {'and': 2, 'parents!!': 1, 'all': 2, 'puppet.': ...  |
| Stop Pacifier Sucking without tears with ... | When the Binky Fairy came to our house, we didn't ...  | 5.0    | {'and': 2, 'cute': 1, 'help': 2, 'doll': 1, ...      |
| A Tale of Baby's Days with Peter Rabbit ...  | Lovely book, it's bound tightly so you may no ...      | 4.0    | {'shop': 1, 'be': 1, 'is': 1, 'it': 1, 'as': ...     |
| Baby Tracker® - Daily Childcare Journal, ... | Perfect for new parents. We were able to keep ...      | 5.0    | {'feeding,': 1, 'and': 2, 'all': 1, 'right': 1, ...  |
| Baby Tracker® - Daily Childcare Journal, ... | A friend of mine pinned this product on Pinte ...      | 5.0    | {'and': 1, 'help': 1, 'give': 1, 'is': 1, ...        |
| Baby Tracker® - Daily Childcare Journal, ... | This has been an easy way for my nanny to record ...   | 4.0    | {'journal.': 1, 'all': 1, 'standarad': 1, ...        |
+----------------------------------------------+--------------------------------------------------------+--------+------------------------------------------------------+

[10 rows x 4 columns]



In [5]:

    
graphlab.canvas.set_target('ipynb')



In [6]:

    
products['rating'].show()
#print products['rating']

Examining the reviews for the most-sold product: 'Vulli Sophie the Giraffe Teether'



In [194]:

    
giraffe_reviews = products[products['name'] == 'Vulli Sophie the Giraffe Teether']



In [195]:

    
len(giraffe_reviews)









    Out[195]:





785



In [196]:

    
giraffe_reviews['rating'].show(view='Categorical')

Build a sentiment classifier



In [7]:

    
products['rating'].show(view='Categorical')

Define what counts as positive and negative sentiment

We will ignore all reviews with rating = 3, since they tend to have a neutral sentiment. Reviews with a rating of 4 or higher are treated as positive, and reviews with a rating of 2 or lower are treated as negative.



In [8]:

    
#ignore all 3* reviews
products = products[products['rating'] != 3]



In [9]:

    
#positive sentiment = 4* or 5* reviews
products['sentiment'] = products['rating'] >=4
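The labeling rule in the two cells above can be sketched on a few toy rows in plain Python. The (name, rating) pairs below are made up for illustration and are not taken from the dataset.

```python
# Toy (name, rating) pairs standing in for rows of the review table.
reviews = [("wipes", 3.0), ("pouch", 5.0), ("quilt", 4.0),
           ("pail", 1.0), ("teether", 2.0)]

# Drop neutral 3-star reviews, then label 4 stars and above as positive (1).
kept = [(name, rating) for name, rating in reviews if rating != 3]
labeled = [(name, rating, int(rating >= 4)) for name, rating in kept]
print(labeled)
```

After filtering, every remaining rating is either at most 2 or at least 4, so the single comparison `rating >= 4` is enough to separate the two classes.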




Let's train the sentiment classifier



In [10]:

    
train_data,test_data = products.random_split(.8, seed=0)
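As a rough illustration of a seeded 80/20 split, the sketch below assigns each row to the first part with probability 0.8 using a seeded generator. This per-row mechanism and the helper name `random_split_sketch` are assumptions made for illustration; the sketch will not reproduce the exact rows GraphLab selects.

```python
import random

def random_split_sketch(rows, fraction, seed):
    # Hypothetical helper: each row lands in the first part with
    # probability `fraction`; the seed makes the split repeatable.
    rng = random.Random(seed)
    first, second = [], []
    for row in rows:
        if rng.random() < fraction:
            first.append(row)
        else:
            second.append(row)
    return first, second

rows = list(range(1000))
train, test = random_split_sketch(rows, 0.8, seed=0)
train_again, test_again = random_split_sketch(rows, 0.8, seed=0)
print(len(train) + len(test))
```

With this approach the part sizes are only approximately 80% and 20%, and reusing the same seed gives the same split every time.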



In [11]:

    
sentiment_model = graphlab.logistic_classifier.create(train_data,
                                                     target='sentiment',
                                                     features=['word_count'],
                                                     validation_set=test_data)









    



PROGRESS: Logistic regression:
PROGRESS: --------------------------------------------------------
PROGRESS: Number of examples          : 133448
PROGRESS: Number of classes           : 2
PROGRESS: Number of feature columns   : 1
PROGRESS: Number of unpacked features : 219217
PROGRESS: Number of coefficients    : 219218
PROGRESS: Starting L-BFGS
PROGRESS: --------------------------------------------------------
PROGRESS: +-----------+----------+-----------+--------------+-------------------+---------------------+
PROGRESS: | Iteration | Passes   | Step size | Elapsed Time | Training-accuracy | Validation-accuracy |
PROGRESS: +-----------+----------+-----------+--------------+-------------------+---------------------+
PROGRESS: | 1         | 5        | 0.000002  | 2.717550     | 0.841481          | 0.839989            |
PROGRESS: | 2         | 9        | 3.000000  | 4.360094     | 0.947425          | 0.894877            |
PROGRESS: | 3         | 10       | 3.000000  | 4.974734     | 0.923768          | 0.866232            |
PROGRESS: | 4         | 11       | 3.000000  | 5.590956     | 0.971779          | 0.912743            |
PROGRESS: | 5         | 12       | 3.000000  | 6.205668     | 0.975511          | 0.908900            |
PROGRESS: | 6         | 13       | 3.000000  | 6.823351     | 0.899991          | 0.825967            |
PROGRESS: | 10        | 18       | 1.000000  | 9.760932     | 0.988715          | 0.916256            |
PROGRESS: +-----------+----------+-----------+--------------+-------------------+---------------------+

Evaluate the sentiment model



In [39]:

    
sentiment_model.evaluate(test_data, metric='roc_curve')









    Out[39]:





{'roc_curve': Columns:
 	threshold	float
 	fpr	float
 	tpr	float
 	p	int
 	n	int
 
 Rows: 1001
 
 Data:
 +------------------+----------------+------------------+-------+------+
 |    threshold     |      fpr       |       tpr        |   p   |  n   |
 +------------------+----------------+------------------+-------+------+
 |       0.0        | 0.222096743836 | 0.00438533941814 | 28048 | 5313 |
 | 0.0010000000475  | 0.777903256164 |  0.995614660582  | 28048 | 5313 |
 | 0.00200000009499 | 0.738377564465 |  0.99447375927   | 28048 | 5313 |
 | 0.00300000002608 | 0.716167890081 |  0.993725042784  | 28048 | 5313 |
 | 0.00400000018999 | 0.70092226614  |  0.993190245294  | 28048 | 5313 |
 | 0.00499999988824 | 0.689629211368 |  0.992833713634  | 28048 | 5313 |
 | 0.00600000005215 | 0.679841897233 |  0.992298916144  | 28048 | 5313 |
 | 0.00700000021607 | 0.66930171278  |  0.991942384484  | 28048 | 5313 |
 | 0.00800000037998 | 0.658949745906 |  0.991692812322  | 28048 | 5313 |
 | 0.00899999961257 | 0.651797477884 |  0.991336280662  | 28048 | 5313 |
 +------------------+----------------+------------------+-------+------+
 [1001 rows x 5 columns]
 Note: Only the head of the SFrame is printed.
 You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.}
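Each row of the ROC output pairs a threshold with a false positive rate (fpr) and a true positive rate (tpr). The sketch below shows one common way to compute a single such point in plain Python, assuming a review is predicted positive when its probability is at or above the threshold. GraphLab's exact thresholding convention may differ, and the labels and probabilities are toy values.

```python
def roc_point(labels, probabilities, threshold):
    # Count true and false positives when predicting positive
    # for every probability at or above the threshold.
    p = sum(1 for y in labels if y == 1)
    n = len(labels) - p
    tp = fp = 0
    for y, prob in zip(labels, probabilities):
        if prob >= threshold:
            if y == 1:
                tp += 1
            else:
                fp += 1
    return float(fp) / n, float(tp) / p  # (fpr, tpr)

labels = [1, 1, 1, 0, 0]
probabilities = [0.9, 0.8, 0.3, 0.6, 0.1]
low = roc_point(labels, probabilities, 0.0)
mid = roc_point(labels, probabilities, 0.5)
print(low)
print(mid)
```

Raising the threshold can only lower both rates in this sketch, which is why an ROC curve traces a path from the top-right corner toward the bottom-left.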



In [40]:

    
sentiment_model.show(view='Evaluation')

Applying the learned model to understand sentiment for Giraffe



In [205]:

    
giraffe_reviews['predicted_sentiment'] = sentiment_model.predict(giraffe_reviews, output_type='probability')



In [13]:

    
giraffe_reviews.head()









    




Sort the reviews based on the predicted sentiment and explore



In [29]:

    
giraffe_reviews = giraffe_reviews.sort('predicted_sentiment', ascending=False)



In [14]:

    
giraffe_reviews.head()









    




Most positive reviews for the giraffe



In [22]:

    
giraffe_reviews[0]['review']









    Out[22]:





"Sophie, oh Sophie, your time has come. My granddaughter, Violet is 5 months old and starting to teeth. What joy little Sophie brings to Violet. Sophie is made of a very pliable rubber that is sturdy but not tough. It is quite easy for Violet to twist Sophie into unheard of positions to get Sophie into her mouth. The little nose and hooves fit perfectly into small mouths, and the drooling has purpose. The paint on Sophie is food quality.Sophie was born in 1961 in France. The maker had wondered why there was nothing available for babies and made Sophie from the finest rubber, phthalate-free on St Sophie's Day, thus the name was born. Since that time millions of Sophie's populate the world. She is soft and for babies little hands easy to grasp. Violet especially loves the bumpy head and horns of Sophie. Sophie has a long neck that easy to grasp and twist. She has lovely, sizable spots that attract Violet's attention. Sophie has happy little squeaks that bring squeals of delight from Violet. She is able to make Sophie squeak and that brings much joy. Sophie's smooth skin is soothing to Violet's little gums. Sophie is 7 inches tall and is the exact correct size for babies to hold and love.As you well know the first thing babies grasp, goes into their mouths- how wonderful to have a toy that stimulates all of the senses and helps with the issue of teething. Sophie is small enough to fit into any size pocket or bag. Sophie is the perfect find for babies from a few months to a year old. How wonderful to hear the giggles and laughs that emanate from babies who find Sophie irresistible. Viva La Sophie!Highly Recommended.  prisrob 12-11-09"



In [23]:

    
giraffe_reviews[1]['review']









    Out[23]:





"I'm not sure why Sophie is such a hit with the little ones, but my 7 month old baby girl is one of her adoring fans.  The rubber is softer and more pleasant to handle, and my daughter has enjoyed chewing on her legs and the nubs on her head even before she started teething.  She also loves the squeak that Sophie makes when you squeeze her.  Not sure what it is but if Sophie is amongst a pile of her other toys, my daughter will more often than not reach for Sophie.  And I have the peace of mind of knowing that only edible and safe paints and materials have been used to make Sophie, as opposed to Bright Starts and other baby toys made in China.  Now that the research is out on phthalates and other toxic substances in baby toys, I think it's more important than ever to find good quality toys that are also safe for our babies to handle and put in their mouths.  Sophie is a must-have for every new mom in my opinion.  Even if your kid is one of the few that can take or leave her, it's worth a try.  Vulli, the makers of Sophie, also make natural rubber teething rings that my daughter loves as well."

Most negative review for the giraffe



In [24]:

    
giraffe_reviews[-1]['review']









    Out[24]:





"My son (now 2.5) LOVED his Sophie, and I bought one for every baby shower I've gone to. Now, my daughter (6 months) just today nearly choked on it and I will never give it to her again. Had I not been within hearing range it could have been fatal. The strange sound she was making caught my attention and when I went to her and found the front curved leg shoved well down her throat and her face a purply/blue I panicked. I pulled it out and she vomited all over the carpet before screaming her head off. I can't believe how my opinion of this toy has changed from a must-have to a must-not-use. Please don't disregard any of the choking hazard comments, they are not over exaggerated!"




Building the awesome count function



In [13]:

    
selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']



In [46]:

    
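def awesome_count(dicts):
    # Return how many times the word appears in one review's word-count dict.
    word = 'awesome'
    if word in dicts:
        return dicts[word]
    else:
        return 0

# Draft automation helper (defined but not called here):
# adds one count column to products for every word in word_list.
def automate(word_list, products):
    for word in word_list:
        print word
        # Bind word as a default argument so each column counts its own word.
        products[word] = products['word_count'].apply(lambda d, w=word: d.get(w, 0))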



In [17]:

    
products.head()









    Out[17]:





    
+----------------------------------------------+--------------------------------------------------------+--------+------------------------------------------------------+-----------+---------+
| name                                         | review                                                 | rating | word_count                                           | sentiment | awesome |
+----------------------------------------------+--------------------------------------------------------+--------+------------------------------------------------------+-----------+---------+
| Planetwise Wipe Pouch                        | it came early and was not disappointed. i love ...     | 5.0    | {'and': 3, 'love': 1, 'it': 2, 'highly': 1, ...      | 1         | 0       |
| Annas Dream Full Quilt with 2 Shams ...      | Very soft and comfortable and warmer than it ...       | 5.0    | {'and': 2, 'quilt': 1, 'it': 1, 'comfortable': ...   | 1         | 0       |
| Stop Pacifier Sucking without tears with ... | This is a product well worth the purchase.  I ...      | 5.0    | {'ingenious': 1, 'and': 3, 'love': 2, ...            | 1         | 0       |
| Stop Pacifier Sucking without tears with ... | All of my kids have cried non-stop when I tried to ... | 5.0    | {'and': 2, 'parents!!': 1, 'all': 2, 'puppet.': ...  | 1         | 0       |
| Stop Pacifier Sucking without tears with ... | When the Binky Fairy came to our house, we didn't ...  | 5.0    | {'and': 2, 'cute': 1, 'help': 2, 'doll': 1, ...      | 1         | 0       |
| A Tale of Baby's Days with Peter Rabbit ...  | Lovely book, it's bound tightly so you may no ...      | 4.0    | {'shop': 1, 'be': 1, 'is': 1, 'it': 1, 'as': ...     | 1         | 0       |
| Baby Tracker® - Daily Childcare Journal, ... | Perfect for new parents. We were able to keep ...      | 5.0    | {'feeding,': 1, 'and': 2, 'all': 1, 'right': 1, ...  | 1         | 0       |
| Baby Tracker® - Daily Childcare Journal, ... | A friend of mine pinned this product on Pinte ...      | 5.0    | {'and': 1, 'help': 1, 'give': 1, 'is': 1, ...        | 1         | 0       |
| Baby Tracker® - Daily Childcare Journal, ... | This has been an easy way for my nanny to record ...   | 4.0    | {'journal.': 1, 'all': 1, 'standarad': 1, ...        | 1         | 0       |
| Baby Tracker® - Daily Childcare Journal, ... | I love this journal and our nanny uses it ...          | 4.0    | {'all': 1, 'forget': 1, 'just': 1, "daughter's": ... | 1         | 0       |
+----------------------------------------------+--------------------------------------------------------+--------+------------------------------------------------------+-----------+---------+

[10 rows x 6 columns]




Building the awesome feature column



In [16]:

    
def awesome_count(dicts):
    word = 'awesome'
    if word in dicts:
        return dicts[word]
    else:
        return 0
products['awesome'] = products['word_count'].apply(awesome_count)




Add 10 more features



In [18]:

    
def awesome_count(dicts):
    word = 'great'
    if word in dicts:
        return dicts[word]
    else:
        return 0
## Great
products['great'] = products['word_count'].apply(awesome_count)



In [19]:

    
def awesome_count(dicts):
    word = 'fantastic'
    if word in dicts:
        return dicts[word]
    else:
        return 0
# Fantastic
products['fantastic'] = products['word_count'].apply(awesome_count)



In [20]:

    
def awesome_count(dicts):
    word = 'amazing'
    if word in dicts:
        return dicts[word]
    else:
        return 0
## amazing
products['amazing'] = products['word_count'].apply(awesome_count, skip_undefined=True)



In [21]:

    
def awesome_count(dicts):
    word = 'love'
    if word in dicts:
        return dicts[word]
    else:
        return 0
## love
products['love'] = products['word_count'].apply(awesome_count)



In [22]:

    
def awesome_count(dicts):
    word = 'horrible'
    if word in dicts:
        return dicts[word]
    else:
        return 0
## horrible
products['horrible'] = products['word_count'].apply(awesome_count)



In [23]:

    
def awesome_count(dicts):
    word = 'bad'
    if word in dicts:
        return dicts[word]
    else:
        return 0
## bad
products['bad'] = products['word_count'].apply(awesome_count)



In [24]:

    
def awesome_count(dicts):
    word = 'terrible'
    if word in dicts:
        return dicts[word]
    else:
        return 0
## terrible
products['terrible'] = products['word_count'].apply(awesome_count)






In [25]:

    
def awesome_count(dicts):
    word = 'awful'
    if word in dicts:
        return dicts[word]
    else:
        return 0
## awful
products['awful'] = products['word_count'].apply(awesome_count)



In [26]:

    
def awesome_count(dicts):
    word = 'wow'
    if word in dicts:
        return dicts[word]
    else:
        return 0
## wow
products['wow'] = products['word_count'].apply(awesome_count)



In [27]:

    
def awesome_count(dicts):
    word = 'hate'
    if word in dicts:
        return dicts[word]
    else:
        return 0
## hate
products['hate'] = products['word_count'].apply(awesome_count)
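The cells above redefine the same function once per word. One possible alternative is a small factory that returns a counting function for any word. The sketch below demonstrates the idea on plain dictionaries; the name `make_word_counter` is hypothetical, and using the returned function with `.apply` on the word_count column is an assumption based on how the cells above call `apply`.

```python
def make_word_counter(word):
    # Return a function that looks up `word` in a word-count dictionary
    # and falls back to 0 when the word is absent.
    def count(word_counts):
        return word_counts.get(word, 0)
    return count

selected = ['awesome', 'great', 'love', 'hate']
word_count_rows = [
    {'and': 3, 'love': 1, 'it': 2},
    {'great': 2, 'awesome': 1},
]

# One list of counts per selected word, in row order.
columns = {}
for word in selected:
    counter = make_word_counter(word)
    columns[word] = [counter(row) for row in word_count_rows]
print(columns)
```

Because each call to the factory captures its own `word`, the counters do not interfere with each other the way a single reused function name can.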






In [28]:

    
products









    Out[28]:





    
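+----------------------------------------------+--------------------------------------------------------+--------+------------------------------------------------------+-----------+---------+
| name                                         | review                                                 | rating | word_count                                           | sentiment | awesome |
+----------------------------------------------+--------------------------------------------------------+--------+------------------------------------------------------+-----------+---------+
| Planetwise Wipe Pouch                        | it came early and was not disappointed. i love ...     | 5.0    | {'and': 3, 'love': 1, 'it': 2, 'highly': 1, ...      | 1         | 0       |
| Annas Dream Full Quilt with 2 Shams ...      | Very soft and comfortable and warmer than it ...       | 5.0    | {'and': 2, 'quilt': 1, 'it': 1, 'comfortable': ...   | 1         | 0       |
| Stop Pacifier Sucking without tears with ... | This is a product well worth the purchase.  I ...      | 5.0    | {'ingenious': 1, 'and': 3, 'love': 2, ...            | 1         | 0       |
| Stop Pacifier Sucking without tears with ... | All of my kids have cried non-stop when I tried to ... | 5.0    | {'and': 2, 'parents!!': 1, 'all': 2, 'puppet.': ...  | 1         | 0       |
| Stop Pacifier Sucking without tears with ... | When the Binky Fairy came to our house, we didn't ...  | 5.0    | {'and': 2, 'cute': 1, 'help': 2, 'doll': 1, ...      | 1         | 0       |
| A Tale of Baby's Days with Peter Rabbit ...  | Lovely book, it's bound tightly so you may no ...      | 4.0    | {'shop': 1, 'be': 1, 'is': 1, 'it': 1, 'as': ...     | 1         | 0       |
| Baby Tracker® - Daily Childcare Journal, ... | Perfect for new parents. We were able to keep ...      | 5.0    | {'feeding,': 1, 'and': 2, 'all': 1, 'right': 1, ...  | 1         | 0       |
| Baby Tracker® - Daily Childcare Journal, ... | A friend of mine pinned this product on Pinte ...      | 5.0    | {'and': 1, 'help': 1, 'give': 1, 'is': 1, ...        | 1         | 0       |
| Baby Tracker® - Daily Childcare Journal, ... | This has been an easy way for my nanny to record ...   | 4.0    | {'journal.': 1, 'all': 1, 'standarad': 1, ...        | 1         | 0       |
| Baby Tracker® - Daily Childcare Journal, ... | I love this journal and our nanny uses it ...          | 4.0    | {'all': 1, 'forget': 1, 'just': 1, "daughter's": ... | 1         | 0       |
+----------------------------------------------+--------------------------------------------------------+--------+------------------------------------------------------+-----------+---------+

+-------+-----------+---------+------+----------+-----+----------+-------+-----+------+
| great | fantastic | amazing | love | horrible | bad | terrible | awful | wow | hate |
+-------+-----------+---------+------+----------+-----+----------+-------+-----+------+
| 0     | 0         | 0       | 1    | 0        | 0   | 0        | 0     | 0   | 0    |
| 0     | 0         | 0       | 0    | 0        | 0   | 0        | 0     | 0   | 0    |
| 0     | 0         | 0       | 2    | 0        | 0   | 0        | 0     | 0   | 0    |
| 1     | 0         | 0       | 0    | 0        | 0   | 0        | 0     | 0   | 0    |
| 1     | 0         | 0       | 0    | 0        | 0   | 0        | 0     | 0   | 0    |
| 0     | 0         | 0       | 0    | 0        | 0   | 0        | 0     | 0   | 0    |
| 0     | 0         | 0       | 0    | 0        | 0   | 0        | 0     | 0   | 0    |
| 0     | 0         | 0       | 0    | 0        | 0   | 0        | 0     | 0   | 0    |
| 0     | 0         | 0       | 0    | 0        | 0   | 0        | 0     | 0   | 0    |
| 0     | 0         | 0       | 2    | 0        | 0   | 0        | 0     | 0   | 0    |
+-------+-----------+---------+------+----------+-----+----------+-------+-----+------+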

[166752 rows x 16 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.



In [29]:

    
# Total number of occurrences of each selected word across all reviews
result = []
for word in selected_words:
    total = products[word].sum()
    print word
    print total
    result.append(total)

print result









    



awesome
2002
great
42420
fantastic
873
amazing
1305
love
40277
horrible
659
bad
3197
terrible
673
awful
345
wow
131
hate
1057
[2002, 42420, 873, 1305, 40277, 659, 3197, 673, 345, 131, 1057]
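Using the totals printed above, a short plain-Python sketch can pick out the most and least frequent of the selected words. The two lists are copied from the notebook output; the variable names `totals`, `most_used`, and `least_used` are illustrative.

```python
selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love',
                  'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']
result = [2002, 42420, 873, 1305, 40277, 659, 3197, 673, 345, 131, 1057]

# Pair each word with its total, then look up the extremes by count.
totals = dict(zip(selected_words, result))
most_used = max(totals, key=totals.get)
least_used = min(totals, key=totals.get)
print(most_used + " " + str(totals[most_used]))
print(least_used + " " + str(totals[least_used]))
```

On these numbers the most used word is 'great' and the least used is 'wow'.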



In [30]:

    
train_data,test_data = products.random_split(.8, seed=0)

Build a new sentiment analysis model using selected_words



In [31]:

    
selected_words_model = graphlab.logistic_classifier.create(train_data,
                                                     target='sentiment',
                                                     features=selected_words,
                                                     validation_set=test_data)









    



PROGRESS: Logistic regression:
PROGRESS: --------------------------------------------------------
PROGRESS: Number of examples          : 133448
PROGRESS: Number of classes           : 2
PROGRESS: Number of feature columns   : 11
PROGRESS: Number of unpacked features : 11
PROGRESS: Number of coefficients    : 12
PROGRESS: Starting Newton Method
PROGRESS: --------------------------------------------------------
PROGRESS: +-----------+----------+--------------+-------------------+---------------------+
PROGRESS: | Iteration | Passes   | Elapsed Time | Training-accuracy | Validation-accuracy |
PROGRESS: +-----------+----------+--------------+-------------------+---------------------+
PROGRESS: | 1         | 2        | 0.354655     | 0.844299          | 0.842842            |
PROGRESS: | 2         | 3        | 0.578448     | 0.844186          | 0.842842            |
PROGRESS: | 3         | 4        | 0.783086     | 0.844276          | 0.843142            |
PROGRESS: | 4         | 5        | 0.994790     | 0.844269          | 0.843142            |
PROGRESS: | 5         | 6        | 1.203048     | 0.844269          | 0.843142            |
PROGRESS: | 6         | 7        | 1.412275     | 0.844269          | 0.843142            |
PROGRESS: +-----------+----------+--------------+-------------------+---------------------+



In [38]:

    
coefficients = selected_words_model['coefficients']
# sort() returns a new SFrame, so keep the sorted result before printing
coefficients = coefficients.sort('value', ascending=False)
coefficients.print_rows(12,4)



+-------------+-------+-------+------------------+
|     name    | index | class |      value       |
+-------------+-------+-------+------------------+
|     love    |  None |   1   |  1.39989834302   |
| (intercept) |  None |   1   |  1.36728315229   |
|   awesome   |  None |   1   |  1.05800888878   |
|   amazing   |  None |   1   |  0.892802422508  |
|  fantastic  |  None |   1   |  0.891303090304  |
|    great    |  None |   1   |  0.883937894898  |
|     wow     |  None |   1   | -0.0541450123333 |
|     bad     |  None |   1   | -0.985827369929  |
|     hate    |  None |   1   |  -1.40916406276  |
|    awful    |  None |   1   |  -1.76469955631  |
|   horrible  |  None |   1   |  -1.99651800559  |
|   terrible  |  None |   1   |  -2.09049998487  |
+-------------+-------+-------+------------------+
[12 rows x 4 columns]
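To see how these coefficients turn word counts into a prediction, the sketch below applies the standard logistic regression formula, P(positive) = 1 / (1 + exp(-score)), where score is the intercept plus each coefficient times its word count. The coefficient values are copied from the table above; the function name `predict_probability` is hypothetical, and GraphLab's own `predict` may differ in details.

```python
import math

# Coefficients copied from the selected_words_model table.
coefficients = {
    '(intercept)': 1.36728315229,
    'awesome': 1.05800888878,
    'great': 0.883937894898,
    'fantastic': 0.891303090304,
    'amazing': 0.892802422508,
    'love': 1.39989834302,
    'horrible': -1.99651800559,
    'bad': -0.985827369929,
    'terrible': -2.09049998487,
    'awful': -1.76469955631,
    'wow': -0.0541450123333,
    'hate': -1.40916406276,
}

def predict_probability(counts):
    # score = intercept + sum(coefficient * count); words outside the model add 0.
    score = coefficients['(intercept)']
    for word, count in counts.items():
        score += coefficients.get(word, 0.0) * count
    return 1.0 / (1.0 + math.exp(-score))

p_none = predict_probability({})
p_love = predict_probability({'love': 1})
p_terrible = predict_probability({'terrible': 1})
print(p_none)
print(p_love)
print(p_terrible)
```

A review containing none of the selected words gets a probability of roughly 0.80 from the intercept alone, so under this formula it is predicted positive regardless of what else it says.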



Comparing the accuracy of different sentiment analysis models



In [44]:

    
selected_words_model.evaluate(test_data, metric='roc_curve')









    Out[44]:





{'roc_curve': Columns:
 	threshold	float
 	fpr	float
 	tpr	float
 	p	int
 	n	int
 
 Rows: 1001
 
 Data:
 +------------------+-------------------+-----+-------+------+
 |    threshold     |        fpr        | tpr |   p   |  n   |
 +------------------+-------------------+-----+-------+------+
 |       0.0        | 0.000188608072425 | 0.0 | 28012 | 5302 |
 | 0.0010000000475  |   0.999811391928  | 1.0 | 28012 | 5302 |
 | 0.00200000009499 |   0.999622783855  | 1.0 | 28012 | 5302 |
 | 0.00300000002608 |   0.999622783855  | 1.0 | 28012 | 5302 |
 | 0.00400000018999 |   0.999434175783  | 1.0 | 28012 | 5302 |
 | 0.00499999988824 |   0.999434175783  | 1.0 | 28012 | 5302 |
 | 0.00600000005215 |   0.99924556771   | 1.0 | 28012 | 5302 |
 | 0.00700000021607 |   0.99924556771   | 1.0 | 28012 | 5302 |
 | 0.00800000037998 |   0.99924556771   | 1.0 | 28012 | 5302 |
 | 0.00899999961257 |   0.99924556771   | 1.0 | 28012 | 5302 |
 +------------------+-------------------+-----+-------+------+
 [1001 rows x 5 columns]
 Note: Only the head of the SFrame is printed.
 You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.}
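A useful reference point when comparing accuracies is a classifier that always predicts the majority class. The sketch below computes that baseline from the positive (p) and negative (n) counts shown in the ROC output above, assuming those counts describe the test set used for evaluation.

```python
positives = 28012  # column p in the ROC output
negatives = 5302   # column n in the ROC output

# Accuracy of always predicting the positive (majority) class.
majority_accuracy = float(positives) / (positives + negatives)
print(round(majority_accuracy, 4))
```

This comes to about 0.84, very close to the validation accuracy of about 0.843 that the selected_words_model reached in its training log, which suggests the eleven-word model does little better than always predicting positive.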



In [46]:

    
selected_words_model.show(view='Evaluation')

Interpreting the difference in performance between the models



In [47]:

    
daiper_champ_reviews = products[products['name']=='Baby Trend Diaper Champ']



In [48]:

    
daiper_champ_reviews.head()









    Out[48]:





    
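+-------------------------+--------------------------------------------------------+--------+-----------------------------------------------------+-----------+---------+
| name                    | review                                                 | rating | word_count                                          | sentiment | awesome |
+-------------------------+--------------------------------------------------------+--------+-----------------------------------------------------+-----------+---------+
| Baby Trend Diaper Champ | Ok - newsflash.  Diapers are just smelly.  We've ...   | 4.0    | {'just': 2, 'less': 1, '-': 3, 'smell- ...          | 1         | 0       |
| Baby Trend Diaper Champ | My husband and I selected the Diaper "Champ" ma ...    | 1.0    | {'just': 1, 'less': 1, 'when': 3, 'over': 1, ...    | 0         | 0       |
| Baby Trend Diaper Champ | Excellent diaper disposal unit.  I used it in ...      | 5.0    | {'control': 1, 'am': 1, 'it': 1, 'used': 1, ' ...   | 1         | 0       |
| Baby Trend Diaper Champ | We love our diaper champ. It is very easy to use ...   | 5.0    | {'and': 3, 'over.': 1, 'all': 1, 'love': 1, ...     | 1         | 0       |
| Baby Trend Diaper Champ | Two girlfriends and two family members put me ...      | 5.0    | {'just': 1, 'when': 1, 'both': 1, 'results': 1, ... | 1         | 0       |
| Baby Trend Diaper Champ | I waited to review this until I saw how it ...         | 4.0    | {'lysol': 1, 'all': 1, 'mom.': 1, 'busy': 1, ...    | 1         | 0       |
| Baby Trend Diaper Champ | I have had a diaper genie for almost 4 years since ... | 1.0    | {'all': 1, 'bags.': 1, 'just': 1, "don't": 2, ...   | 0         | 0       |
| Baby Trend Diaper Champ | I originally put this item on my baby registry ...     | 5.0    | {'lysol': 1, 'all': 2, 'bags.': 1, 'feedback': ...  | 1         | 0       |
| Baby Trend Diaper Champ | I am so glad I got the Diaper Champ instead of ...     | 5.0    | {'and': 2, 'all': 1, 'just': 1, 'is': 2, ' ...      | 1         | 0       |
| Baby Trend Diaper Champ | We had 2 diaper Genie's both given to us as a ...      | 4.0    | {'hand.': 1, '(required': 1, 'before': 1, ...       | 1         | 0       |
+-------------------------+--------------------------------------------------------+--------+-----------------------------------------------------+-----------+---------+

+-------+-----------+---------+------+----------+-----+----------+-------+-----+------+
| great | fantastic | amazing | love | horrible | bad | terrible | awful | wow | hate |
+-------+-----------+---------+------+----------+-----+----------+-------+-----+------+
| 0     | 0         | 0       | 0    | 0        | 0   | 0        | 0     | 0   | 0    |
| 0     | 0         | 0       | 0    | 0        | 0   | 0        | 0     | 0   | 0    |
| 0     | 0         | 0       | 0    | 0        | 0   | 0        | 0     | 0   | 0    |
| 0     | 0         | 0       | 1    | 0        | 0   | 0        | 0     | 0   | 0    |
| 0     | 0         | 0       | 0    | 1        | 0   | 0        | 0     | 0   | 0    |
| 0     | 0         | 0       | 0    | 0        | 1   | 0        | 0     | 0   | 0    |
| 0     | 0         | 0       | 0    | 0        | 0   | 0        | 0     | 0   | 0    |
| 0     | 0         | 0       | 0    | 0        | 0   | 0        | 0     | 0   | 0    |
| 0     | 0         | 0       | 0    | 0        | 0   | 0        | 0     | 0   | 0    |
| 0     | 0         | 0       | 2    | 0        | 0   | 0        | 0     | 0   | 0    |
+-------+-----------+---------+------+----------+-----+----------+-------+-----+------+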

[10 rows x 16 columns]

Make predictions



In [49]:

    
daiper_champ_reviews['predicted_sentiment'] = sentiment_model.predict(daiper_champ_reviews, output_type='probability')



In [53]:

    
daiper_champ_reviews.head()









    Out[53]:





    
+-------------------------+------------------------------------------------------+--------+------------------------------------------------------+-----------+---------+
| name                    | review                                               | rating | word_count                                           | sentiment | awesome |
+-------------------------+------------------------------------------------------+--------+------------------------------------------------------+-----------+---------+
| Baby Trend Diaper Champ | Baby Luke can turn a clean diaper to a dirty ...     | 5.0    | {'all': 1, 'less': 1, "friend's": 1, '(which': ...   | 1         | 0       |
| Baby Trend Diaper Champ | I LOOOVE this diaper pail!  Its the easies ...       | 5.0    | {'just': 1, 'over': 1, 'rweek': 1, 'sooo': 1, ...    | 1         | 0       |
| Baby Trend Diaper Champ | We researched all of the different types of di ...   | 4.0    | {'all': 2, 'just': 4, "don't": 2, 'one,': 1, ...     | 1         | 0       |
| Baby Trend Diaper Champ | My baby is now 8 months and the can has been ...     | 5.0    | {"don't": 1, 'when': 1, 'over': 1, 'soon': 1, ...    | 1         | 0       |
| Baby Trend Diaper Champ | This is absolutely, by far, the best diaper  ...     | 5.0    | {'just': 3, 'money': 1, 'not': 2, 'mechanism' ...    | 1         | 0       |
| Baby Trend Diaper Champ | Diaper Champ or Diaper Genie? That was my ...        | 5.0    | {'all': 1, 'bags.': 1, 'son,': 1, '(i': 1, ...       | 1         | 0       |
| Baby Trend Diaper Champ | Wow!  This is fabulous. It was a toss-up between ... | 5.0    | {'and': 4, '"genie".': 1, 'since': 1, 'garbage' ...  | 1         | 0       |
| Baby Trend Diaper Champ | I originally put this item on my baby registry ...   | 5.0    | {'lysol': 1, 'all': 2, 'bags.': 1, 'feedback': ...   | 1         | 0       |
| Baby Trend Diaper Champ | Two girlfriends and two family members put me ...    | 5.0    | {'just': 1, 'when': 1, 'both': 1, 'results': 1, ...  | 1         | 0       |
| Baby Trend Diaper Champ | I am one of those super-critical shoppers who ...    | 5.0    | {'taller': 1, 'bags.': 1, 'just': 1, "don't": 4, ... | 1         | 0       |
+-------------------------+------------------------------------------------------+--------+------------------------------------------------------+-----------+---------+


    
        great
        fantastic
        amazing
        love
        horrible
        bad
        terrible
        awful
        wow
        hate
        predicted_sentiment
    
    
        0
        0
        0
        0
        0
        0
        0
        0
        0
        0
        0.999999937267
    
    
        0
        0
        0
        1
        0
        0
        0
        0
        0
        0
        0.999999917406
    
    
        0
        0
        0
        0
        0
        1
        0
        0
        0
        0
        0.999999899509
    
    
        2
        0
        0
        0
        0
        1
        0
        0
        0
        0
        0.999999836182
    
    
        0
        0
        0
        2
        0
        0
        0
        0
        0
        0
        0.999999824745
    
    
        0
        0
        0
        0
        0
        0
        0
        0
        0
        0
        0.999999759315
    
    
        0
        0
        0
        0
        0
        0
        0
        0
        0
        0
        0.999999692111
    
    
        0
        0
        0
        0
        0
        0
        0
        0
        0
        0
        0.999999642488
    
    
        0
        0
        0
        0
        1
        0
        0
        0
        0
        0
        0.999999604504
    
    
        0
        0
        0
        1
        0
        0
        0
        0
        0
        0
        0.999999486804
    

[10 rows x 17 columns]

Sort the results and explore



In [52]:

    
daiper_champ_reviews=daiper_champ_reviews.sort('predicted_sentiment', ascending=False)
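The sort above orders the reviews from the highest predicted probability to the lowest, so index 0 is the review the model considers most positive and index -1 the most negative. A minimal plain-Python sketch of the same idea, using a hypothetical list of dicts with made-up probabilities in place of the SFrame:

```python
# Hypothetical stand-in for the SFrame rows; the probabilities are made up.
reviews = [
    {'review': 'works fine', 'predicted_sentiment': 0.91},
    {'review': 'smells awful', 'predicted_sentiment': 0.02},
    {'review': 'best purchase we made', 'predicted_sentiment': 0.99},
]

# Descending order, mirroring sort('predicted_sentiment', ascending=False).
ranked = sorted(reviews, key=lambda r: r['predicted_sentiment'], reverse=True)

most_positive = ranked[0]['review']    # first row after a descending sort
most_negative = ranked[-1]['review']   # last row after a descending sort

print(most_positive)
print(most_negative)
```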



In [54]:

    
daiper_champ_reviews[0]['review']









    Out[54]:





'Baby Luke can turn a clean diaper to a dirty diaper in 3 seconds flat. The diaper champ turns the smelly diaper into "what diaper smell" in less time than that. I hesitated and wondered what I REALLY needed for the nursery. This is one of the best purchases we made. The champ, the baby bjorn, fluerville diaper bag, and graco pack and play bassinet all vie for the best baby purchase.Great product, easy to use, economical, effective, absolutly fabulous.UpdateI knew that I loved the champ, and useing the diaper genie at a friend\'s house REALLY reinforced that!! There is no comparison, the chanp is easy and smell free, the genie was difficult to use one handed (which is absolutly vital if you have a little one on a changing pad) and there was a deffinite odor eminating from the genieplus we found that the quick tie garbage bags where the ties are integrated into the bag work really well because there isn\'t any added bulk around the sealing edge of the champ.'



In [55]:

    
daiper_champ_reviews[-1]['review']









    Out[55]:





'My husband and I selected the Diaper "Champ" mainly because you can use ordinary trash bags and not be roped into buying the specialty refill bags, and it was moderately priced (a little less than the Diaper Dekor). It also seemed that the reviews of this product were generally more positive...The positives are:1. You can use any trash bag2. Easy to use and refillThe negatives are:1. The bag doesn\'t seal around the dirty diapers, so when it comes time to refill the bag, it\'s just like opening a regular trash can. Smells like the Champ is trying to knock YOU out with odor!2. The plastic seems to smell, ie. You put a dirty diaper in the hole, and flip the handle to dump the diaper into the champ. That "side" of the plastic dumper-thingie is in contact with the air inside the dirty diaper changer, so when you flip it over the next time to dispose of another diaper, you smell the last 8 diapers you put in there...pretty gross.3. The "odor seal" (some soft material) really seems to retain odor. It cannot be washed or replaced, so after a while, the Diaper "Champ" smells even when freshly washed and deodorized (I\'m talking about hosing down outside and scrubbing with Clorox cleanser!). Super-frustrating! This is my primary complaint.Okay, so some things are a given as far as disposal systems go (ie. *some* odor, must empty frequently, must wash and disinfect occasionally), but still I think this product leaves much to be desired.We\'re going to try another disposal system.'

Prediction using selected_words_model



In [59]:

    
selected_words_model.predict(daiper_champ_reviews[0:1], output_type='probability')









    Out[59]:





dtype: float
Rows: 1
[0.7969408512906712]
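The probability here (about 0.797) is far lower than the 0.9999999 that sentiment_model gave the same review. One plausible reading, assuming selected_words_model is a logistic classifier trained only on the selected word counts (this section does not show how it was built): the Out[53] table lists 0 for every selected word in this review, so the linear score reduces to the intercept and the prediction is just the sigmoid of that intercept. The sketch below works backwards from the observed probability; the weights are hypothetical and only illustrate that zero counts contribute nothing.

```python
import math

def sigmoid(score):
    """Map a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))

def logit(p):
    """Inverse of sigmoid: recover the linear score from a probability."""
    return math.log(p / (1.0 - p))

# Probability reported in Out[59] for the top-ranked review.
p_all_zero = 0.7969408512906712

# If every feature is zero, the score equals the intercept (assumption:
# a logistic model with no other features than the selected word counts).
implied_intercept = logit(p_all_zero)

# Hypothetical weights; any values give the same result when counts are 0.
weights = {'love': 1.0, 'awful': -2.0}
counts = {'love': 0, 'awful': 0}
score = implied_intercept + sum(weights[w] * counts[w] for w in weights)

print(implied_intercept)
print(sigmoid(score))
```

Under that assumption, every review containing none of the selected words would receive this same probability, whatever else it says.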



In [63]:

    
daiper_champ_reviews[0]









    Out[63]:





{'amazing': 0,
 'awesome': 0,
 'awful': 0,
 'bad': 0,
 'fantastic': 0,
 'great': 0,
 'hate': 0,
 'horrible': 0,
 'love': 0,
 'name': 'Baby Trend Diaper Champ',
 'predicted_sentiment': 0.9999999372669541,
 'rating': 5.0,
 'review': 'Baby Luke can turn a clean diaper to a dirty diaper in 3 seconds flat. The diaper champ turns the smelly diaper into "what diaper smell" in less time than that. I hesitated and wondered what I REALLY needed for the nursery. This is one of the best purchases we made. The champ, the baby bjorn, fluerville diaper bag, and graco pack and play bassinet all vie for the best baby purchase.Great product, easy to use, economical, effective, absolutly fabulous.UpdateI knew that I loved the champ, and useing the diaper genie at a friend\'s house REALLY reinforced that!! There is no comparison, the chanp is easy and smell free, the genie was difficult to use one handed (which is absolutly vital if you have a little one on a changing pad) and there was a deffinite odor eminating from the genieplus we found that the quick tie garbage bags where the ties are integrated into the bag work really well because there isn\'t any added bulk around the sealing edge of the champ.',
 'sentiment': 1,
 'terrible': 0,
 'word_count': {'"what': 1,
  '(which': 1,
  '3': 1,
  'a': 6,
  'absolutly': 2,
  'added': 1,
  'all': 1,
  'and': 6,
  'any': 1,
  'are': 1,
  'around': 1,
  'at': 1,
  'baby': 3,
  'bag': 1,
  'bag,': 1,
  'bags': 1,
  'bassinet': 1,
  'because': 1,
  'best': 2,
  'bjorn,': 1,
  'bulk': 1,
  'can': 1,
  'champ': 1,
  'champ,': 2,
  'champ.': 1,
  'changing': 1,
  'chanp': 1,
  'clean': 1,
  'comparison,': 1,
  'deffinite': 1,
  'diaper': 7,
  'difficult': 1,
  'dirty': 1,
  'easy': 2,
  'economical,': 1,
  'edge': 1,
  'effective,': 1,
  'eminating': 1,
  'fabulous.updatei': 1,
  'flat.': 1,
  'fluerville': 1,
  'for': 2,
  'found': 1,
  'free,': 1,
  "friend's": 1,
  'from': 1,
  'garbage': 1,
  'genie': 2,
  'genieplus': 1,
  'graco': 1,
  'handed': 1,
  'have': 1,
  'hesitated': 1,
  'house': 1,
  'i': 3,
  'if': 1,
  'in': 2,
  'integrated': 1,
  'into': 2,
  'is': 4,
  "isn't": 1,
  'knew': 1,
  'less': 1,
  'little': 1,
  'loved': 1,
  'luke': 1,
  'made.': 1,
  'needed': 1,
  'no': 1,
  'nursery.': 1,
  'odor': 1,
  'of': 2,
  'on': 1,
  'one': 3,
  'pack': 1,
  'pad)': 1,
  'play': 1,
  'product,': 1,
  'purchase.great': 1,
  'purchases': 1,
  'quick': 1,
  'really': 3,
  'reinforced': 1,
  'sealing': 1,
  'seconds': 1,
  'smell': 1,
  'smell"': 1,
  'smelly': 1,
  'than': 1,
  'that': 2,
  'that!!': 1,
  'that.': 1,
  'the': 17,
  'there': 3,
  'this': 1,
  'tie': 1,
  'ties': 1,
  'time': 1,
  'to': 3,
  'turn': 1,
  'turns': 1,
  'use': 1,
  'use,': 1,
  'useing': 1,
  'vie': 1,
  'vital': 1,
  'was': 2,
  'we': 2,
  'well': 1,
  'what': 1,
  'where': 1,
  'wondered': 1,
  'work': 1,
  'you': 1},
 'wow': 0}
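The review text contains "Great product" and "loved", yet 'great' and 'love' are both 0. The word_count keys suggest why: tokens such as 'purchase.great', 'product,' and 'loved' keep their punctuation and inflection, so an exact lookup of 'great' or 'love' finds nothing. The sketch below imitates that behaviour with a lowercase-and-split-on-whitespace counter and compares it with one that strips punctuation. Both helpers are illustrative assumptions, not GraphLab's actual implementation.

```python
import re
from collections import Counter

def count_words_whitespace(text):
    """Lowercase and split on whitespace only; punctuation stays attached."""
    return Counter(text.lower().split())

def count_words_stripped(text):
    """Lowercase and keep only runs of letters and apostrophes."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

# Fragment of the review shown in Out[54].
snippet = "all vie for the best baby purchase.Great product, easy to use"

raw = count_words_whitespace(snippet)
clean = count_words_stripped(snippet)

print(raw.get('purchase.great', 0))  # token as it appears in word_count
print(raw.get('great', 0))           # exact match fails on the raw tokens
print(clean.get('great', 0))         # found once punctuation is stripped
```

Stripping punctuation would recover 'great' here, but 'loved' would still not match 'love' without stemming or a broader word list.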


