In [5]:
import graphlab

Load the data


In [6]:
products = graphlab.SFrame('amazon_baby.gl/')

Explore the data


In [9]:
products.head()


Out[9]:
+--------------------------+------------------------------+--------+
| name                     | review                       | rating |
+--------------------------+------------------------------+--------+
| Planetwise Flannel Wipes | These flannel wipes are      | 3.0    |
|                          | OK, but in my opinion ...    |        |
| Planetwise Wipe Pouch    | it came early and was not    | 5.0    |
|                          | disappointed. i love ...     |        |
| Annas Dream Full Quilt   | Very soft and comfortable    | 5.0    |
| with 2 Shams ...         | and warmer than it ...       |        |
| Stop Pacifier Sucking    | This is a product well       | 5.0    |
| without tears with ...   | worth the purchase. I ...    |        |
| Stop Pacifier Sucking    | All of my kids have cried    | 5.0    |
| without tears with ...   | non-stop when I tried to ... |        |
| Stop Pacifier Sucking    | When the Binky Fairy came    | 5.0    |
| without tears with ...   | to our house, we didn't ...  |        |
| A Tale of Baby's Days    | Lovely book, it's bound      | 4.0    |
| with Peter Rabbit ...    | tightly so you may no ...    |        |
| Baby Tracker® - Daily    | Perfect for new parents.     | 5.0    |
| Childcare Journal, ...   | We were able to keep ...     |        |
| Baby Tracker® - Daily    | A friend of mine pinned      | 5.0    |
| Childcare Journal, ...   | this product on Pinte ...    |        |
| Baby Tracker® - Daily    | This has been an easy way    | 4.0    |
| Childcare Journal, ...   | for my nanny to record ...   |        |
+--------------------------+------------------------------+--------+
[10 rows x 3 columns]

Build word counts for each review


In [13]:
products['word_count'] = graphlab.text_analytics.count_words(products['review'])
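
To make the word-count step concrete, here is a minimal pure-Python sketch of what a per-review word count looks like. It assumes lowercasing and whitespace-only splitting, which is consistent with keys such as 'parents!!' and 'puppet.' in the output below, but GraphLab's own tokenization may differ in details. The helper name count_words_sketch is hypothetical.

```python
from collections import Counter


def count_words_sketch(text):
    """Return a {word: count} dict for one review.

    Assumption: lowercase the text and split on whitespace only, so
    punctuation stays attached to words (for example 'great.').
    """
    return dict(Counter(text.lower().split()))


counts = count_words_sketch("It came early and it was great.")
print(counts)
```

Because punctuation is not stripped in this sketch, 'great.' and 'great' would be counted as different words, which is one reason the feature space in the training log further down is so large.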

In [33]:
products.head()


Out[33]:
+--------------------------+------------------------------+--------+------------------------------+-----------+
| name                     | review                       | rating | word_count                   | sentiment |
+--------------------------+------------------------------+--------+------------------------------+-----------+
| Planetwise Wipe Pouch    | it came early and was not    | 5.0    | {'and': 3, 'love': 1,        | 1         |
|                          | disappointed. i love ...     |        | 'it': 2, 'highly': 1, ...    |           |
| Annas Dream Full Quilt   | Very soft and comfortable    | 5.0    | {'and': 2, 'quilt': 1,       | 1         |
| with 2 Shams ...         | and warmer than it ...       |        | 'it': 1, 'comfortable': ...  |           |
| Stop Pacifier Sucking    | This is a product well       | 5.0    | {'ingenious': 1, 'and':      | 1         |
| without tears with ...   | worth the purchase. I ...    |        | 3, 'love': 2, ...            |           |
| Stop Pacifier Sucking    | All of my kids have cried    | 5.0    | {'and': 2, 'parents!!':      | 1         |
| without tears with ...   | non-stop when I tried to ... |        | 1, 'all': 2, 'puppet.': ...  |           |
| Stop Pacifier Sucking    | When the Binky Fairy came    | 5.0    | {'and': 2, 'this': 2,        | 1         |
| without tears with ...   | to our house, we didn't ...  |        | 'her': 1, 'help': 2, ...     |           |
| A Tale of Baby's Days    | Lovely book, it's bound      | 4.0    | {'shop': 1, 'noble': 1,      | 1         |
| with Peter Rabbit ...    | tightly so you may no ...    |        | 'is': 1, 'it': 1, 'as': ...  |           |
| Baby Tracker® - Daily    | Perfect for new parents.     | 5.0    | {'and': 2, 'all': 1,         | 1         |
| Childcare Journal, ...   | We were able to keep ...     |        | 'right': 1, 'when': 1, ...   |           |
| Baby Tracker® - Daily    | A friend of mine pinned      | 5.0    | {'and': 1, 'help': 1,        | 1         |
| Childcare Journal, ...   | this product on Pinte ...    |        | 'give': 1, 'is': 1, ' ...    |           |
| Baby Tracker® - Daily    | This has been an easy way    | 4.0    | {'journal.': 1, 'nanny':     | 1         |
| Childcare Journal, ...   | for my nanny to record ...   |        | 1, 'standarad': 1, ...       |           |
| Baby Tracker® - Daily    | I love this journal and      | 4.0    | {'all': 1, 'forget': 1,      | 1         |
| Childcare Journal, ...   | our nanny uses it ...        |        | 'just': 1, 'food': 1, ...    |           |
+--------------------------+------------------------------+--------+------------------------------+-----------+
[10 rows x 5 columns]


In [29]:
graphlab.canvas.set_target('ipynb')

In [15]:
products['name'].show()


Explore the most popular product: Vulli Sophie


In [16]:
giraffe_views = products[products['name'] == 'Vulli Sophie the Giraffe Teether']

In [19]:
giraffe_views['rating'].show(view='Categorical')


Build a sentiment classifier

1. Define positive and negative reviews


In [25]:
# Ignore 3-star reviews, which are treated as neutral
products = products[products['rating'] != 3]

In [27]:
# Positive sentiment: 4 or 5 stars
# Negative sentiment: 1 or 2 stars
products['sentiment'] = products['rating'] >= 4
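
The labeling rule in the two cells above can be checked on a handful of ratings without GraphLab. The following is a small sketch in plain Python; the function name label_sentiment and the sample ratings are illustrative, not part of the notebook.

```python
def label_sentiment(ratings):
    """Drop 3-star ratings, then label 4-5 stars as 1 and 1-2 stars as 0."""
    kept = [r for r in ratings if r != 3]
    return [(r, int(r >= 4)) for r in kept]


labeled = label_sentiment([5.0, 3.0, 1.0, 4.0, 2.0])
print(labeled)
```

The 3-star rating disappears entirely, so the classifier only ever sees clearly positive or clearly negative examples.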

2. Train the classifier


In [28]:
train_data, test_data = products.random_split(.8, seed=0)
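
As a rough illustration of a seeded 80/20 split, the sketch below assigns each row to the first part with probability 0.8, using a fixed seed so the split is reproducible. This is an assumption about the general idea only; it does not reproduce the exact rows that random_split selects, and the helper name random_split_sketch is hypothetical.

```python
import random


def random_split_sketch(rows, fraction, seed):
    """Split rows into two lists; each row goes to the first list
    with probability `fraction`. The seed makes the result repeatable."""
    rng = random.Random(seed)
    first, second = [], []
    for row in rows:
        if rng.random() < fraction:
            first.append(row)
        else:
            second.append(row)
    return first, second


rows = list(range(1000))
train, test = random_split_sketch(rows, 0.8, seed=0)
print(len(train), len(test))
```

The split sizes are only approximately 80% and 20%, since each row is assigned independently.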

In [32]:
sentiment_model = graphlab.logistic_classifier.create(train_data, target='sentiment', features=['word_count'], validation_set=test_data)


WARNING: The number of feature dimensions in this problem is very large in comparison with the number of examples. Unless an appropriate regularization value is set, this model may not provide accurate predictions for a validation/test set.
Logistic regression:
--------------------------------------------------------
Number of examples          : 133448
Number of classes           : 2
Number of feature columns   : 1
Number of unpacked features : 219217
Number of coefficients    : 219218
Starting L-BFGS
--------------------------------------------------------
+-----------+----------+-----------+--------------+-------------------+---------------------+
| Iteration | Passes   | Step size | Elapsed Time | Training-accuracy | Validation-accuracy |
+-----------+----------+-----------+--------------+-------------------+---------------------+
| 1         | 5        | 0.000002  | 6.220948     | 0.841481          | 0.839989            |
| 2         | 9        | 3.000000  | 12.053105    | 0.947425          | 0.894877            |
| 3         | 10       | 3.000000  | 14.086256    | 0.923768          | 0.866232            |
| 4         | 11       | 3.000000  | 15.705626    | 0.971779          | 0.912743            |
| 5         | 12       | 3.000000  | 17.282436    | 0.975511          | 0.908900            |
| 6         | 13       | 3.000000  | 19.140040    | 0.899991          | 0.825967            |
| 7         | 15       | 1.000000  | 21.783334    | 0.984548          | 0.921451            |
| 8         | 16       | 1.000000  | 23.310993    | 0.985118          | 0.921871            |
| 9         | 17       | 1.000000  | 25.148886    | 0.987066          | 0.919709            |
| 10        | 18       | 1.000000  | 27.294883    | 0.988715          | 0.916256            |
+-----------+----------+-----------+--------------+-------------------+---------------------+
TERMINATED: Iteration limit reached.
This model may not be optimal. To improve it, consider increasing `max_iterations`.
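
The training log above reports 219217 unpacked features and 219218 coefficients: one weight per distinct word plus one intercept. A logistic classifier turns a review's word counts into a probability by summing weight times count, adding the intercept, and applying the sigmoid function. The sketch below shows that computation with made-up coefficients; the values and the function name predict_probability are hypothetical and are not taken from the trained model.

```python
import math


def predict_probability(word_count, coefficients, intercept):
    """P(positive) = sigmoid(intercept + sum of coefficient * count)."""
    score = intercept + sum(
        coefficients.get(word, 0.0) * count  # unseen words contribute 0
        for word, count in word_count.items()
    )
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical coefficients, not values learned by the model above.
coefficients = {'love': 1.5, 'great': 1.0, 'broke': -2.0, 'warning': -1.5}
intercept = 0.5

p_pos = predict_probability({'love': 2, 'and': 3}, coefficients, intercept)
p_neg = predict_probability({'broke': 1, 'warning': 1}, coefficients, intercept)
print(p_pos, p_neg)
```

With far more coefficients (219218) than training examples (133448), the gap between training accuracy and validation accuracy in the log is the overfitting risk that the WARNING line describes.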

Evaluate the model


In [35]:
sentiment_model.evaluate(test_data, metric='roc_curve')


Out[35]:
{'roc_curve': Columns:
 	threshold	float
 	fpr	float
 	tpr	float
 	p	int
 	n	int
 
 Rows: 100001
 
 Data:
 +-----------+----------------+----------------+-------+------+
 | threshold |      fpr       |      tpr       |   p   |  n   |
 +-----------+----------------+----------------+-------+------+
 |    0.0    |      1.0       |      1.0       | 27976 | 5328 |
 |   1e-05   | 0.909346846847 | 0.998856162425 | 27976 | 5328 |
 |   2e-05   | 0.896021021021 | 0.998748927652 | 27976 | 5328 |
 |   3e-05   | 0.886448948949 | 0.998462968259 | 27976 | 5328 |
 |   4e-05   | 0.879692192192 | 0.998284243637 | 27976 | 5328 |
 |   5e-05   | 0.875187687688 | 0.998212753789 | 27976 | 5328 |
 |   6e-05   | 0.872184684685 | 0.998177008865 | 27976 | 5328 |
 |   7e-05   | 0.868618618619 | 0.998034029168 | 27976 | 5328 |
 |   8e-05   | 0.864677177177 | 0.997998284244 | 27976 | 5328 |
 |   9e-05   | 0.860735735736 | 0.997962539319 | 27976 | 5328 |
 +-----------+----------------+----------------+-------+------+
 [100001 rows x 5 columns]
 Note: Only the head of the SFrame is printed.
 You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.}
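
Each row of the ROC table pairs a probability threshold with a false positive rate (fpr) and a true positive rate (tpr); the p and n columns presumably hold the number of positive and negative test examples. The sketch below computes one such point on toy data. It assumes a review is predicted positive when its probability is at or above the threshold, which may differ from GraphLab's exact convention; the labels and probabilities are made up.

```python
def roc_point(labels, probabilities, threshold):
    """Return (fpr, tpr) when predicting positive for prob >= threshold."""
    tp = fp = p = n = 0
    for y, prob in zip(labels, probabilities):
        predicted_positive = prob >= threshold
        if y == 1:
            p += 1
            tp += predicted_positive
        else:
            n += 1
            fp += predicted_positive
    return float(fp) / n, float(tp) / p


# Toy data for illustration only.
labels = [1, 1, 1, 0, 0]
probabilities = [0.9, 0.8, 0.3, 0.6, 0.1]
for t in (0.0, 0.5, 1.0):
    print(t, roc_point(labels, probabilities, t))
```

At threshold 0.0 everything is predicted positive, so both rates are 1.0, which matches the first row of the table above.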

In [37]:
sentiment_model.show(view='Evaluation')


Apply the model


In [38]:
giraffe_views['predicted_sentiment'] = sentiment_model.predict(giraffe_views, output_type='probability')

In [39]:
giraffe_views.head()


Out[39]:
+--------------------------+------------------------------+--------+------------------------------+---------------------+
| name                     | review                       | rating | word_count                   | predicted_sentiment |
+--------------------------+------------------------------+--------+------------------------------+---------------------+
| Vulli Sophie the Giraffe | He likes chewing on all      | 5.0    | {'and': 1, 'all': 1,         | 0.999513023521      |
| Teether ...              | the parts especially the ... |        | 'because': 1, 'it': 1, ...   |                     |
| Vulli Sophie the Giraffe | My son loves this toy and    | 5.0    | {'and': 1, 'right': 1,       | 0.999320678306      |
| Teether ...              | fits great in the diaper ... |        | 'help': 1, 'just': 1, ...    |                     |
| Vulli Sophie the Giraffe | There really should be a     | 1.0    | {'and': 2, 'all': 1,         | 0.013558811687      |
| Teether ...              | large warning on the ...     |        | 'would': 1, 'latex.': 1, ... |                     |
| Vulli Sophie the Giraffe | All the moms in my moms'     | 5.0    | {'and': 2, 'one!': 1,        | 0.995769474148      |
| Teether ...              | group got Sophie for ...     |        | 'all': 1, 'love': 1, ...     |                     |
| Vulli Sophie the Giraffe | I was a little skeptical     | 5.0    | {'and': 3, 'all': 1,         | 0.662374415673      |
| Teether ...              | on whether Sophie was ...    |        | 'months': 1, 'old': 1, ...   |                     |
| Vulli Sophie the Giraffe | I have been reading about    | 5.0    | {'and': 6, 'seven': 1,       | 0.999997148186      |
| Teether ...              | Sophie and was going ...     |        | 'already': 1, 'love': 1, ... |                     |
| Vulli Sophie the Giraffe | My neice loves her sophie    | 5.0    | {'and': 4, 'drooling,':      | 0.989190989536      |
| Teether ...              | and has spent hours ...      |        | 1, 'love': 1, ...            |                     |
| Vulli Sophie the Giraffe | What a friendly face!        | 5.0    | {'and': 3, 'chew': 1,        | 0.999563518413      |
| Teether ...              | And those mesmerizing ...    |        | 'be': 1, 'is': 1, ...        |                     |
| Vulli Sophie the Giraffe | We got this just for my      | 5.0    | {'chew': 2, 'seemed': 1,     | 0.970160542725      |
| Teether ...              | son to chew on instea ...    |        | 'because': 1, 'about.': ...  |                     |
| Vulli Sophie the Giraffe | My baby seems to like        | 3.0    | {'and': 2, 'already': 1,     | 0.195367644588      |
| Teether ...              | this toy, but I could ...    |        | 'some': 1, 'it': 3, ...      |                     |
+--------------------------+------------------------------+--------+------------------------------+---------------------+
[10 rows x 5 columns]

Sort the reviews


In [41]:
giraffe_views = giraffe_views.sort('predicted_sentiment', ascending=False)
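
Sorting in descending order of predicted probability puts the most positive reviews first and leaves the most negative ones at the end. The plain-Python sketch below shows the same ordering on three review snippets and probabilities copied from the Out[39] table; the variable names are illustrative.

```python
# Snippets and probabilities copied from the Out[39] table.
scored_reviews = [
    ("He likes chewing on all the parts especially the ...", 0.999513023521),
    ("There really should be a large warning on the ...", 0.013558811687),
    ("I was a little skeptical on whether Sophie was ...", 0.662374415673),
]

# Highest predicted probability of positive sentiment first.
ranked = sorted(scored_reviews, key=lambda pair: pair[1], reverse=True)
most_positive = ranked[0]
most_negative = ranked[-1]
print(most_positive[1], most_negative[1])
```

On the SFrame itself, the most negative reviews would be the last rows of the sorted result.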

In [42]:
giraffe_views.head()


Out[42]:
+--------------------------+------------------------------+--------+------------------------------+---------------------+
| name                     | review                       | rating | word_count                   | predicted_sentiment |
+--------------------------+------------------------------+--------+------------------------------+---------------------+
| Vulli Sophie the Giraffe | Sophie, oh Sophie, your      | 5.0    | {'giggles': 1, 'all': 1,     | 1.0                 |
| Teether ...              | time has come. My ...        |        | "violet's": 2, 'bring': ...  |                     |
| Vulli Sophie the Giraffe | I'm not sure why Sophie      | 4.0    | {'adoring': 1, 'find': 1,    | 0.999999999703      |
| Teether ...              | is such a hit with the ...   |        | 'month': 1, 'bright': 1, ... |                     |
| Vulli Sophie the Giraffe | I'll be honest...I bought    | 4.0    | {'all': 2, 'discovered':     | 0.999999999392      |
| Teether ...              | this toy because all the ... |        | 1, 'existence.': 1, ...      |                     |
| Vulli Sophie the Giraffe | We got this little           | 5.0    | {'all': 2, "don't": 1,       | 0.99999999919       |
| Teether ...              | giraffe as a gift from a ... |        | '(literally).so': 1, ...     |                     |
| Vulli Sophie the Giraffe | As a mother of 16month       | 5.0    | {'cute': 1, 'all': 1,        | 0.999999998657      |
| Teether ...              | old twins; I bought ...      |        | 'reviews.': 2, 'just' ...    |                     |
| Vulli Sophie the Giraffe | Sophie the Giraffe is the    | 5.0    | {'just': 2, 'both': 1,       | 0.999999997108      |
| Teether ...              | perfect teething toy. ...    |        | 'month': 1, 'ears,': 1, ...  |                     |
| Vulli Sophie the Giraffe | Sophie la giraffe is         | 5.0    | {'and': 5, 'the': 1,         | 0.999999995589      |
| Teether ...              | absolutely the best toy ...  |        | 'all': 1, 'that': 2, ...     |                     |
| Vulli Sophie the Giraffe | My 5-mos old son took to     | 5.0    | {'just': 1, 'shape': 2,      | 0.999999995573      |
| Teether ...              | this immediately. The ...    |        | 'mutt': 1, '"dog': 1, ...    |                     |
| Vulli Sophie the Giraffe | My nephews and my four       | 5.0    | {'and': 4, 'chew': 1,        | 0.999999989527      |
| Teether ...              | kids all had Sophie in ...   |        | 'all': 1, 'perfect;': 1, ... |                     |
| Vulli Sophie the Giraffe | Never thought I'd see my     | 5.0    | {'giggles': 1, 'all': 1,     | 0.999999985069      |
| Teether ...              | son French kissing a ...     |        | 'out,': 1, 'over': 1, ...    |                     |
+--------------------------+------------------------------+--------+------------------------------+---------------------+
[10 rows x 5 columns]

