Exploring precision and recall

The goal of this second notebook is to understand precision-recall in the context of classifiers.

Use Amazon review data in its entirety.
Train a logistic regression model.
Explore various evaluation metrics: accuracy, confusion matrix, precision, recall.
Explore how various metrics can be combined to produce a cost of making an error.
Explore precision and recall curves.

Because we are using the full Amazon review dataset (not a subset of words or reviews), in this assignment we return to using GraphLab Create for its efficiency. As usual, let's start by firing up GraphLab Create.

Make sure you have the latest version of GraphLab Create (1.8.3 or later). If a module used in this notebook is missing, upgrade graphlab-create using

   pip install graphlab-create --upgrade

See this page for detailed instructions on upgrading.



In [1]:

    
# __future__ imports must come first when this code is run as a plain script.
from __future__ import division
import graphlab
import numpy as np
graphlab.canvas.set_target('ipynb')

Load amazon review dataset



In [2]:

    
products = graphlab.SFrame('amazon_baby.gl/')









    






In [3]:

    
len(products)









    Out[3]:





183531

Extract word counts and sentiments

As in the first assignment of this course, we compute the word counts for individual words and extract positive and negative sentiments from ratings. To summarize, we perform the following:

Remove punctuation.
Remove reviews with "neutral" sentiment (rating 3).
Set reviews with rating 4 or more to be positive and those with 2 or less to be negative.



In [3]:

    
def remove_punctuation(text):
    import string
    return text.translate(None, string.punctuation) 

# Remove punctuation.
review_clean = products['review'].apply(remove_punctuation)

# Count words
products['word_count'] = graphlab.text_analytics.count_words(review_clean)

# Drop neutral sentiment reviews.
products = products[products['rating'] != 3]

# Positive sentiment to +1 and negative sentiment to -1
products['sentiment'] = products['rating'].apply(lambda rating : +1 if rating > 3 else -1)
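
The call `text.translate(None, string.punctuation)` relies on Python 2 byte strings. As a rough sketch for readers on Python 3 (an assumption; this notebook itself targets Python 2 with GraphLab Create), the same cleanup and labeling could be written with `str.maketrans`. The names `remove_punctuation_py3` and `rating_to_sentiment` are illustrative helpers, not part of the assignment.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans('', '', string.punctuation)

def remove_punctuation_py3(text):
    """Return text with ASCII punctuation removed (Python 3 str API)."""
    return text.translate(_PUNCT_TABLE)

def rating_to_sentiment(rating):
    """Map a non-neutral rating to +1 (4 or 5 stars) or -1 (1 or 2 stars)."""
    return +1 if rating > 3 else -1

print(remove_punctuation_py3("Lovely book, it's bound tightly!"))
print(rating_to_sentiment(5), rating_to_sentiment(2))
```

Neutral reviews (rating 3) are assumed to have been dropped before `rating_to_sentiment` is applied, as in the cell above.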

Now, let's remember what the dataset looks like by taking a quick peek:



In [4]:

    
products









    Out[4]:





    
+-----------------------------------------------+--------------------------------------------------------+--------+------------------------------------------------------+-----------+
| name                                          | review                                                 | rating | word_count                                           | sentiment |
+-----------------------------------------------+--------------------------------------------------------+--------+------------------------------------------------------+-----------+
| Planetwise Wipe Pouch                         | it came early and was not disappointed. i love ...     | 5.0    | {'and': 3, 'love': 1, 'it': 3, 'highly': 1, ...      | 1         |
| Annas Dream Full Quilt with 2 Shams ...       | Very soft and comfortable and warmer than it ...       | 5.0    | {'and': 2, 'quilt': 1, 'it': 1, 'comfortable': ...   | 1         |
| Stop Pacifier Sucking without tears with ... | This is a product well worth the purchase.  I ...      | 5.0    | {'and': 3, 'ingenious': 1, 'love': 2, 'what': 1, ... | 1         |
| Stop Pacifier Sucking without tears with ... | All of my kids have cried non-stop when I tried to ... | 5.0    | {'and': 2, 'all': 2, 'help': 1, 'cried': 1, ...      | 1         |
| Stop Pacifier Sucking without tears with ... | When the Binky Fairy came to our house, we didn't ...  | 5.0    | {'and': 2, 'this': 2, 'her': 1, 'help': 2, ...       | 1         |
| A Tale of Baby's Days with Peter Rabbit ...   | Lovely book, it's bound tightly so you may no ...      | 4.0    | {'shop': 1, 'noble': 1, 'is': 1, 'it': 1, 'as': ...  | 1         |
| Baby Tracker® - Daily Childcare Journal, ... | Perfect for new parents. We were able to keep ...      | 5.0    | {'and': 2, 'all': 1, 'right': 1, 'had': 1, ...       | 1         |
| Baby Tracker® - Daily Childcare Journal, ... | A friend of mine pinned this product on Pinte ...      | 5.0    | {'and': 1, 'fantastic': 1, 'help': 1, 'give': 1, ... | 1         |
| Baby Tracker® - Daily Childcare Journal, ... | This has been an easy way for my nanny to record ...   | 4.0    | {'all': 1, 'standarad': 1, 'another': 1, 'when': ... | 1         |
| Baby Tracker® - Daily Childcare Journal, ... | I love this journal and our nanny uses it ...          | 4.0    | {'all': 2, 'nannys': 1, 'just': 1, 'food': 1, ...    | 1         |
+-----------------------------------------------+--------------------------------------------------------+--------+------------------------------------------------------+-----------+

[166752 rows x 5 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.

Split data into training and test sets

We split the data 80-20: 80% goes into the training set and 20% into the test set.



In [5]:

    
train_data, test_data = products.random_split(.8, seed=1)

Train a logistic regression classifier

We will now train a logistic regression classifier with sentiment as the target and word_count as the features. We will set validation_set=None to make sure everyone gets exactly the same results.

Remember, even though we now know how to implement logistic regression, we will use GraphLab Create for its efficiency at processing this Amazon dataset in its entirety. The focus of this assignment is instead on the topic of precision and recall.



In [6]:

    
model = graphlab.logistic_classifier.create(train_data, target='sentiment',
                                            features=['word_count'],
                                            validation_set=None)









    




Logistic regression:
--------------------------------------------------------
Number of examples          : 133416
Number of classes           : 2
Number of feature columns   : 1
Number of unpacked features : 121712
Number of coefficients      : 121713
Starting L-BFGS
--------------------------------------------------------
+-----------+----------+-----------+--------------+-------------------+
| Iteration | Passes   | Step size | Elapsed Time | Training-accuracy |
+-----------+----------+-----------+--------------+-------------------+
| 1         | 5        | 0.000002  | 2.001432     | 0.840754          |
| 2         | 9        | 3.000000  | 3.036611     | 0.931350          |
| 3         | 10       | 3.000000  | 3.427701     | 0.882046          |
| 4         | 11       | 3.000000  | 3.811500     | 0.954076          |
| 5         | 12       | 3.000000  | 4.160261     | 0.960964          |
| 6         | 13       | 3.000000  | 4.571008     | 0.975033          |
+-----------+----------+-----------+--------------+-------------------+
TERMINATED: Terminated due to numerical difficulties.
This model may not be ideal. To improve it, consider doing one of the following:
(a) Increasing the regularization.
(b) Standardizing the input data.
(c) Removing highly correlated features.
(d) Removing `inf` and `NaN` values in the training data.

Model Evaluation

We will explore the advanced model evaluation concepts that were discussed in the lectures.

Accuracy

One performance metric we will use for our more advanced exploration is accuracy, which we have seen many times in past assignments. Recall that the accuracy is given by

$$ \mbox{accuracy} = \frac{\mbox{# correctly classified data points}}{\mbox{# total data points}} $$

To obtain the accuracy of our trained models using GraphLab Create, simply pass the option metric='accuracy' to the evaluate function. We compute the accuracy of our logistic regression model on the test_data as follows:



In [7]:

    
accuracy = model.evaluate(test_data, metric='accuracy')['accuracy']
print "Test Accuracy: %s" % accuracy









    



Test Accuracy: 0.914536837053

Baseline: Majority class prediction

Recall from an earlier assignment that we used the majority class classifier as a baseline (i.e., reference) model for comparison with a more sophisticated classifier. The majority class classifier predicts the majority class for all data points.

Typically, a good model should beat the majority class classifier. Since the majority class in this dataset is the positive class (i.e., there are more positive than negative reviews), the accuracy of the majority class classifier can be computed as follows:



In [8]:

    
baseline = len(test_data[test_data['sentiment'] == 1])/len(test_data)
print "Baseline accuracy (majority class classifier): %s" % baseline









    



Baseline accuracy (majority class classifier): 0.842782577394

Quiz Question: Using accuracy as the evaluation metric, was our logistic regression model better than the baseline (majority class classifier)?

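Answer: Yes. The test accuracy of the logistic regression model (about 0.915) exceeds the baseline accuracy (about 0.843).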

Confusion Matrix

The accuracy, while convenient, does not tell the whole story. For a fuller picture, we turn to the confusion matrix. In the case of binary classification, the confusion matrix is a 2-by-2 matrix laying out correct and incorrect predictions made in each label as follows:

              +---------------------------------------------+
              |                Predicted label              |
              +----------------------+----------------------+
              |          (+1)        |         (-1)         |
+-------+-----+----------------------+----------------------+
| True  |(+1) | # of true positives  | # of false negatives |
| label +-----+----------------------+----------------------+
|       |(-1) | # of false positives | # of true negatives  |
+-------+-----+----------------------+----------------------+

To print out the confusion matrix for a classifier, use metric='confusion_matrix':



In [9]:

    
confusion_matrix = model.evaluate(test_data, metric='confusion_matrix')['confusion_matrix']
confusion_matrix









    Out[9]:





    
        target_label
        predicted_label
        count
    
    
        1
        -1
        1406
    
    
        -1
        -1
        3798
    
    
        -1
        1
        1443
    
    
        1
        1
        26689
    

[4 rows x 3 columns]
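
As a quick cross-check, the accuracy and the majority-class baseline reported earlier can be recomputed from the four counts in this confusion matrix. This is a minimal sketch in plain Python, independent of GraphLab Create; the variable names are illustrative.

```python
# Counts copied from the confusion matrix printed above.
true_positives = 26689   # target +1, predicted +1
false_negatives = 1406   # target +1, predicted -1
false_positives = 1443   # target -1, predicted +1
true_negatives = 3798    # target -1, predicted -1

total = true_positives + false_negatives + false_positives + true_negatives

# Accuracy: fraction of reviews whose predicted label matches the target.
accuracy_from_counts = (true_positives + true_negatives) / float(total)

# Majority-class baseline: predict +1 everywhere, so every true +1 is correct.
baseline_from_counts = (true_positives + false_negatives) / float(total)

print(round(accuracy_from_counts, 4))
print(round(baseline_from_counts, 4))
```

Both values agree with the numbers returned by `model.evaluate` and the baseline cell, which suggests the matrix is being read with the intended row and column meaning.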

Quiz Question: How many predicted values in the test set are false positives?

1443

Computing the cost of mistakes

Imagine you are a manufacturer that sells a baby product on Amazon.com, and you want to monitor your product's reviews in order to respond to complaints. Even a few negative reviews may generate a lot of bad publicity about the product, so you don't want to miss any review with negative sentiment: you'd rather put up with false alarms about potentially negative reviews than miss negative reviews entirely. In other words, false positives (negative reviews predicted as positive, and therefore missed) cost more than false negatives (positive reviews flagged as negative). (It may be the other way around for other scenarios, but let's stick with the manufacturer's scenario for now.)

Suppose you know the costs involved in each kind of mistake:

\$100 for each false positive.
\$1 for each false negative.
Correctly classified reviews incur no cost.

Quiz Question: Given the stipulation, what is the cost associated with the logistic regression classifier's performance on the test set?



In [10]:

    
(1443 * 100) + (1406)



    Out[10]:



145706
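
The cost rule stated above (\$100 per false positive, \$1 per false negative, nothing for correct predictions) can also be written as a small function. This is only a sketch; `misclassification_cost` and its parameter names are hypothetical, and the counts are taken from the confusion matrix for the test set.

```python
def misclassification_cost(false_positives, false_negatives,
                           cost_per_fp=100, cost_per_fn=1):
    """Total dollar cost of mistakes; correct predictions cost nothing."""
    return false_positives * cost_per_fp + false_negatives * cost_per_fn

# 1443 false positives and 1406 false negatives on the test set.
total_cost = misclassification_cost(false_positives=1443, false_negatives=1406)
print(total_cost)
```

Keeping the two per-error costs as parameters makes it easy to rerun the calculation for a scenario where false negatives are the more expensive mistake.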

Precision and Recall

You may not have exact dollar amounts for each kind of mistake. Instead, you may simply prefer to reduce the percentage of false positives to be less than, say, 3.5% of all positive predictions. This is where precision comes in:

$$ [\text{precision}] = \frac{[\text{# positive data points with positive predictions}]}{[\text{# all data points with positive predictions}]} = \frac{[\text{# true positives}]}{[\text{# true positives}] + [\text{# false positives}]} $$

So to keep the percentage of false positives below 3.5% of positive predictions, we must raise the precision to 96.5% or higher.

First, let us compute the precision of the logistic regression classifier on the test_data.



In [11]:

    
precision = model.evaluate(test_data, metric='precision')['precision']
print "Precision on test data: %s" % precision









    



Precision on test data: 0.948706099815
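
The precision formula can be checked by hand against the confusion matrix counts. The sketch below uses plain Python rather than GraphLab Create, with illustrative variable names; it also shows that the share of positive predictions that are wrong is simply one minus the precision.

```python
# Counts from the confusion matrix on the test set.
true_positives = 26689
false_positives = 1443

positive_predictions = true_positives + false_positives

# Precision: fraction of positive predictions that are truly positive.
precision_from_counts = true_positives / float(positive_predictions)

# Fraction of positive predictions that are false positives.
false_positive_share = false_positives / float(positive_predictions)

print(round(precision_from_counts, 4))
print(round(false_positive_share, 4))
```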

Quiz Question: Out of all reviews in the test set that are predicted to be positive, what fraction of them are false positives? (Round to the second decimal place e.g. 0.25)



In [12]:

    
# False positives divided by all positive predictions (TP + FP), i.e. 1 - precision.
round(1443 / float(1443 + 26689), 2)



    Out[12]:



0.05

Quiz Question: Based on what we learned in lecture, if we wanted to reduce this fraction of false positives to be below 3.5%, we would: (see the quiz)

A complementary metric is recall, which measures the ratio between the number of true positives and that of (ground-truth) positive reviews:

$$ [\text{recall}] = \frac{[\text{# positive data points with positive predictions}]}{[\text{# all positive data points}]} = \frac{[\text{# true positives}]}{[\text{# true positives}] + [\text{# false negatives}]} $$

Let us compute the recall on the test_data.



In [13]:

    
recall = model.evaluate(test_data, metric='recall')['recall']
print "Recall on test data: %s" % recall









    



Recall on test data: 0.949955508098
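
Recall can be recomputed from the confusion matrix in the same way as precision. A minimal plain-Python sketch, with illustrative variable names:

```python
# Counts from the confusion matrix on the test set.
true_positives = 26689
false_negatives = 1406

# Recall: fraction of truly positive reviews that were predicted positive.
actual_positives = true_positives + false_negatives
recall_from_counts = true_positives / float(actual_positives)

print(round(recall_from_counts, 4))
```

Note that the denominator here is the number of truly positive reviews, whereas precision divides by the number of positive predictions.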

Quiz Question: What fraction of the positive reviews in the test_data were correctly predicted as positive by the classifier?

Quiz Question: What is the recall value for a classifier that predicts +1 for all data points in the test_data?



In [43]:

    
(26689 + 1406) / float(1406 + 26689)









    Out[43]:





1.0
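
To see why the recall is 1 for a classifier that always predicts +1, it may help to rebuild its confusion matrix from the counts we already have. The sketch below is plain Python with illustrative names; it also shows that the precision of such a classifier falls back to the share of positive reviews, which is the majority-class baseline.

```python
# Counts for the logistic regression model on the test set.
true_positives, false_negatives = 26689, 1406
false_positives, true_negatives = 1443, 3798

# Predicting +1 everywhere turns every true +1 into a true positive and
# every true -1 into a false positive; nothing is ever predicted -1.
all_pos_tp = true_positives + false_negatives
all_pos_fp = false_positives + true_negatives
all_pos_fn = 0

recall_all_positive = all_pos_tp / float(all_pos_tp + all_pos_fn)
precision_all_positive = all_pos_tp / float(all_pos_tp + all_pos_fp)

print(recall_all_positive)
print(round(precision_all_positive, 4))
```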

Precision-recall tradeoff

In this part, we will explore the trade-off between precision and recall discussed in the lecture. We first examine what happens when we use a different threshold value for making class predictions. We then explore a range of threshold values and plot the associated precision-recall curve.

Varying the threshold

False positives are costly in our example, so we may want to be more conservative about making positive predictions. To achieve this, instead of thresholding class probabilities at 0.5, we can choose a higher threshold.

Write a function called apply_threshold that accepts two things

probabilities (an SArray of probability values)
threshold (a float between 0 and 1).

The function should return an array, where each element is set to +1 if the corresponding probability is greater than or equal to threshold, and to -1 otherwise.



In [16]:

    
def apply_threshold(probabilities, threshold):
    # +1 if probability >= threshold and -1 otherwise.
    return probabilities.apply(lambda x: +1 if x >= threshold else -1)
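
For readers without GraphLab Create at hand, the same rule can be tried on an ordinary Python list. This is a sketch only: `apply_threshold_list` is a hypothetical stand-in for the SArray-based function above, and the probabilities are made up for illustration.

```python
def apply_threshold_list(probabilities, threshold):
    """Return +1 where probability >= threshold, else -1 (plain-list version)."""
    return [+1 if p >= threshold else -1 for p in probabilities]

# Made-up probabilities, not taken from the model.
example_probabilities = [0.97, 0.90, 0.62, 0.49]

default_predictions = apply_threshold_list(example_probabilities, 0.5)
strict_predictions = apply_threshold_list(example_probabilities, 0.9)

print(default_predictions)
print(strict_predictions)
```

A probability exactly equal to the threshold (0.90 here) is labeled +1, matching the `>=` convention used in the function above.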

Run prediction with output_type='probability' to get the list of probability values. Then use thresholds set at 0.5 (default) and 0.9 to make predictions from these probability values.



In [17]:

    
probabilities = model.predict(test_data, output_type='probability')
predictions_with_default_threshold = apply_threshold(probabilities, 0.5)
predictions_with_high_threshold = apply_threshold(probabilities, 0.9)



In [18]:

    
print "Number of positive predicted reviews (threshold = 0.5): %s" % (predictions_with_default_threshold == 1).sum()









    



Number of positive predicted reviews (threshold = 0.5): 28132



In [19]:

    
print "Number of positive predicted reviews (threshold = 0.9): %s" % (predictions_with_high_threshold == 1).sum()









    



Number of positive predicted reviews (threshold = 0.9): 25630

Quiz Question: What happens to the number of positive predicted reviews as the threshold increased from 0.5 to 0.9?

Exploring the associated precision and recall as the threshold varies

By changing the probability threshold, it is possible to influence precision and recall. We can explore this as follows:



In [20]:

    
# Threshold = 0.5
precision_with_default_threshold = graphlab.evaluation.precision(test_data['sentiment'],
                                        predictions_with_default_threshold)

recall_with_default_threshold = graphlab.evaluation.recall(test_data['sentiment'],
                                        predictions_with_default_threshold)

# Threshold = 0.9
precision_with_high_threshold = graphlab.evaluation.precision(test_data['sentiment'],
                                        predictions_with_high_threshold)
recall_with_high_threshold = graphlab.evaluation.recall(test_data['sentiment'],
                                        predictions_with_high_threshold)



In [21]:

    
print "Precision (threshold = 0.5): %s" % precision_with_default_threshold
print "Recall (threshold = 0.5)   : %s" % recall_with_default_threshold









    



Precision (threshold = 0.5): 0.948706099815
Recall (threshold = 0.5)   : 0.949955508098



In [22]:

    
print "Precision (threshold = 0.9): %s" % precision_with_high_threshold
print "Recall (threshold = 0.9)   : %s" % recall_with_high_threshold









    



Precision (threshold = 0.9): 0.969527896996
Recall (threshold = 0.9)   : 0.884463427656

Quiz Question (variant 1): Does the precision increase with a higher threshold?

Quiz Question (variant 2): Does the recall increase with a higher threshold?
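
The direction of the trade-off can be reproduced on a tiny invented example. The sketch below does not use the Amazon data or GraphLab Create; the labels, scores, and the helper `precision_recall` are all made up for illustration.

```python
def precision_recall(labels, predictions):
    """Compute (precision, recall) for +1/-1 labels; None if undefined."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == -1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == -1)
    precision = tp / float(tp + fp) if tp + fp else None
    recall = tp / float(tp + fn) if tp + fn else None
    return precision, recall

# Invented ground truth and predicted probabilities of the +1 class.
labels = [1, 1, 1, 1, -1, -1]
scores = [0.99, 0.95, 0.80, 0.60, 0.85, 0.30]

results = {}
for threshold in (0.5, 0.9):
    predictions = [1 if s >= threshold else -1 for s in scores]
    results[threshold] = precision_recall(labels, predictions)
    print(threshold, results[threshold])
```

On this toy data, raising the threshold from 0.5 to 0.9 removes the one false positive (precision rises) but also drops two true positives (recall falls), which is the same pattern seen in the test-set numbers above.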

Precision-recall curve

Now, we will explore a range of threshold values, compute the precision and recall scores for each, and then plot the precision-recall curve.



In [23]:

    
threshold_values = np.linspace(0.5, 1, num=100)
print threshold_values









    



[ 0.5         0.50505051  0.51010101  0.51515152  0.52020202  0.52525253
  0.53030303  0.53535354  0.54040404  0.54545455  0.55050505  0.55555556
  0.56060606  0.56565657  0.57070707  0.57575758  0.58080808  0.58585859
  0.59090909  0.5959596   0.6010101   0.60606061  0.61111111  0.61616162
  0.62121212  0.62626263  0.63131313  0.63636364  0.64141414  0.64646465
  0.65151515  0.65656566  0.66161616  0.66666667  0.67171717  0.67676768
  0.68181818  0.68686869  0.69191919  0.6969697   0.7020202   0.70707071
  0.71212121  0.71717172  0.72222222  0.72727273  0.73232323  0.73737374
  0.74242424  0.74747475  0.75252525  0.75757576  0.76262626  0.76767677
  0.77272727  0.77777778  0.78282828  0.78787879  0.79292929  0.7979798
  0.8030303   0.80808081  0.81313131  0.81818182  0.82323232  0.82828283
  0.83333333  0.83838384  0.84343434  0.84848485  0.85353535  0.85858586
  0.86363636  0.86868687  0.87373737  0.87878788  0.88383838  0.88888889
  0.89393939  0.8989899   0.9040404   0.90909091  0.91414141  0.91919192
  0.92424242  0.92929293  0.93434343  0.93939394  0.94444444  0.94949495
  0.95454545  0.95959596  0.96464646  0.96969697  0.97474747  0.97979798
  0.98484848  0.98989899  0.99494949  1.        ]
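
`np.linspace(0.5, 1, num=100)` returns 100 evenly spaced values that include both endpoints, so consecutive thresholds differ by 0.5/99 (about 0.00505). A rough pure-Python equivalent, shown only to make the spacing explicit (`linspace_list` is a hypothetical helper, not a NumPy function):

```python
def linspace_list(start, stop, num):
    """Evenly spaced values from start to stop inclusive (num >= 2)."""
    step = (stop - start) / float(num - 1)
    return [start + i * step for i in range(num)]

thresholds = linspace_list(0.5, 1.0, 100)

print(len(thresholds))
print(round(thresholds[1] - thresholds[0], 8))
```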

For each of the values of threshold, we compute the precision and recall scores.



In [29]:

    
precision_all = []
recall_all = []

probabilities = model.predict(test_data, output_type='probability')
for threshold in threshold_values:
    predictions = apply_threshold(probabilities, threshold)
    
    precision = graphlab.evaluation.precision(test_data['sentiment'], predictions)
    recall = graphlab.evaluation.recall(test_data['sentiment'], predictions)
    print "Precision (threshold = %s): %s" % (threshold, precision)
    print "Recall (threshold = %s)   : %s" % (threshold, recall)
    
    precision_all.append(precision)
    recall_all.append(recall)









    



Precision (threshold = 0.5): 0.948706099815
Recall (threshold = 0.5)   : 0.949955508098
Precision (threshold = 0.505050505051): 0.94905908719
Recall (threshold = 0.505050505051)   : 0.949599572878
Precision (threshold = 0.510101010101): 0.949288256228
Recall (threshold = 0.510101010101)   : 0.94945719879
Precision (threshold = 0.515151515152): 0.949506819072
Recall (threshold = 0.515151515152)   : 0.94910126357
Precision (threshold = 0.520202020202): 0.949624140511
Recall (threshold = 0.520202020202)   : 0.94874532835
Precision (threshold = 0.525252525253): 0.949805711026
Recall (threshold = 0.525252525253)   : 0.948318206086
Precision (threshold = 0.530303030303): 0.950203324534
Recall (threshold = 0.530303030303)   : 0.948140238477
Precision (threshold = 0.535353535354): 0.950417648319
Recall (threshold = 0.535353535354)   : 0.947677522691
Precision (threshold = 0.540404040404): 0.950696677385
Recall (threshold = 0.540404040404)   : 0.947143619861
Precision (threshold = 0.545454545455): 0.950877694755
Recall (threshold = 0.545454545455)   : 0.946680904075
Precision (threshold = 0.550505050505): 0.951062459755
Recall (threshold = 0.550505050505)   : 0.946289375334
Precision (threshold = 0.555555555556): 0.951424684994
Recall (threshold = 0.555555555556)   : 0.94604022068
Precision (threshold = 0.560606060606): 0.951534907046
Recall (threshold = 0.560606060606)   : 0.94550631785
Precision (threshold = 0.565656565657): 0.951761459341
Recall (threshold = 0.565656565657)   : 0.945257163196
Precision (threshold = 0.570707070707): 0.952177656598
Recall (threshold = 0.570707070707)   : 0.944687666845
Precision (threshold = 0.575757575758): 0.952541642734
Recall (threshold = 0.575757575758)   : 0.944438512191
Precision (threshold = 0.580808080808): 0.952825782345
Recall (threshold = 0.580808080808)   : 0.943940202883
Precision (threshold = 0.585858585859): 0.952950902164
Recall (threshold = 0.585858585859)   : 0.943691048229
Precision (threshold = 0.590909090909): 0.953033408854
Recall (threshold = 0.590909090909)   : 0.943263925965
Precision (threshold = 0.59595959596): 0.953081711222
Recall (threshold = 0.59595959596)   : 0.942836803702
Precision (threshold = 0.60101010101): 0.953231323132
Recall (threshold = 0.60101010101)   : 0.942374087916
Precision (threshold = 0.606060606061): 0.953525236877
Recall (threshold = 0.606060606061)   : 0.942053746218
Precision (threshold = 0.611111111111): 0.953680340278
Recall (threshold = 0.611111111111)   : 0.941697810998
Precision (threshold = 0.616161616162): 0.953691347784
Recall (threshold = 0.616161616162)   : 0.941199501691
Precision (threshold = 0.621212121212): 0.954012200845
Recall (threshold = 0.621212121212)   : 0.940701192383
Precision (threshold = 0.626262626263): 0.95415959253
Recall (threshold = 0.626262626263)   : 0.940167289553
Precision (threshold = 0.631313131313): 0.954481362305
Recall (threshold = 0.631313131313)   : 0.939668980246
Precision (threshold = 0.636363636364): 0.954630969609
Recall (threshold = 0.636363636364)   : 0.939170670938
Precision (threshold = 0.641414141414): 0.954956912159
Recall (threshold = 0.641414141414)   : 0.938743548674
Precision (threshold = 0.646464646465): 0.955217391304
Recall (threshold = 0.646464646465)   : 0.938387613454
Precision (threshold = 0.651515151515): 0.955425794284
Recall (threshold = 0.651515151515)   : 0.937640149493
Precision (threshold = 0.656565656566): 0.955603150978
Recall (threshold = 0.656565656566)   : 0.936963872575
Precision (threshold = 0.661616161616): 0.955716205907
Recall (threshold = 0.661616161616)   : 0.936394376224
Precision (threshold = 0.666666666667): 0.955933682373
Recall (threshold = 0.666666666667)   : 0.935824879872
Precision (threshold = 0.671717171717): 0.95600756859
Recall (threshold = 0.671717171717)   : 0.935148602954
Precision (threshold = 0.676767676768): 0.956162388494
Recall (threshold = 0.676767676768)   : 0.934721480691
Precision (threshold = 0.681818181818): 0.956453611253
Recall (threshold = 0.681818181818)   : 0.934223171383
Precision (threshold = 0.686868686869): 0.956670800204
Recall (threshold = 0.686868686869)   : 0.933618081509
Precision (threshold = 0.691919191919): 0.956951949759
Recall (threshold = 0.691919191919)   : 0.932870617548
Precision (threshold = 0.69696969697): 0.957200292398
Recall (threshold = 0.69696969697)   : 0.932158747108
Precision (threshold = 0.70202020202): 0.95730904302
Recall (threshold = 0.70202020202)   : 0.931446876668
Precision (threshold = 0.707070707071): 0.957558224696
Recall (threshold = 0.707070707071)   : 0.930735006229
Precision (threshold = 0.712121212121): 0.957740800469
Recall (threshold = 0.712121212121)   : 0.930094322833
Precision (threshold = 0.717171717172): 0.958172812328
Recall (threshold = 0.717171717172)   : 0.929524826482
Precision (threshold = 0.722222222222): 0.958434310054
Recall (threshold = 0.722222222222)   : 0.929062110696
Precision (threshold = 0.727272727273): 0.958762128786
Recall (threshold = 0.727272727273)   : 0.928492614344
Precision (threshold = 0.732323232323): 0.959152130713
Recall (threshold = 0.732323232323)   : 0.927709556861
Precision (threshold = 0.737373737374): 0.959266352387
Recall (threshold = 0.737373737374)   : 0.927068873465
Precision (threshold = 0.742424242424): 0.95958553044
Recall (threshold = 0.742424242424)   : 0.92625022246
Precision (threshold = 0.747474747475): 0.959906966441
Recall (threshold = 0.747474747475)   : 0.925467164976
Precision (threshold = 0.752525252525): 0.959957149717
Recall (threshold = 0.752525252525)   : 0.924968855668
Precision (threshold = 0.757575757576): 0.960170118343
Recall (threshold = 0.757575757576)   : 0.924114611141
Precision (threshold = 0.762626262626): 0.96034655115
Recall (threshold = 0.762626262626)   : 0.923224773091
Precision (threshold = 0.767676767677): 0.960716006374
Recall (threshold = 0.767676767677)   : 0.922690870262
Precision (threshold = 0.772727272727): 0.960870855278
Recall (threshold = 0.772727272727)   : 0.92212137391
Precision (threshold = 0.777777777778): 0.961087182534
Recall (threshold = 0.777777777778)   : 0.921302722904
Precision (threshold = 0.782828282828): 0.961366847624
Recall (threshold = 0.782828282828)   : 0.920270510767
Precision (threshold = 0.787878787879): 0.962202659674
Recall (threshold = 0.787878787879)   : 0.914255205553
Precision (threshold = 0.792929292929): 0.962415603901
Recall (threshold = 0.792929292929)   : 0.913258586937
Precision (threshold = 0.79797979798): 0.9624873268
Recall (threshold = 0.79797979798)   : 0.912333155366
Precision (threshold = 0.80303030303): 0.962727546261
Recall (threshold = 0.80303030303)   : 0.911087382096
Precision (threshold = 0.808080808081): 0.963204278397
Recall (threshold = 0.808080808081)   : 0.910304324613
Precision (threshold = 0.813131313131): 0.963492362814
Recall (threshold = 0.813131313131)   : 0.909307705998
Precision (threshold = 0.818181818182): 0.963922783423
Recall (threshold = 0.818181818182)   : 0.908204306816
Precision (threshold = 0.823232323232): 0.964218170815
Recall (threshold = 0.823232323232)   : 0.907350062289
Precision (threshold = 0.828282828283): 0.964581991742
Recall (threshold = 0.828282828283)   : 0.906353443673
Precision (threshold = 0.833333333333): 0.964945559391
Recall (threshold = 0.833333333333)   : 0.905321231536
Precision (threshold = 0.838383838384): 0.965311550152
Recall (threshold = 0.838383838384)   : 0.90432461292
Precision (threshold = 0.843434343434): 0.965662948723
Recall (threshold = 0.843434343434)   : 0.902900872041
Precision (threshold = 0.848484848485): 0.965982762566
Recall (threshold = 0.848484848485)   : 0.901583911728
Precision (threshold = 0.853535353535): 0.966381418093
Recall (threshold = 0.853535353535)   : 0.900373731981
Precision (threshold = 0.858585858586): 0.966780205901
Recall (threshold = 0.858585858586)   : 0.899127958712
Precision (threshold = 0.863636363636): 0.966996320147
Recall (threshold = 0.863636363636)   : 0.897917778964
Precision (threshold = 0.868686868687): 0.96737626806
Recall (threshold = 0.868686868687)   : 0.896066915821
Precision (threshold = 0.873737373737): 0.96765996766
Recall (threshold = 0.873737373737)   : 0.89460758142
Precision (threshold = 0.878787878788): 0.967978395062
Recall (threshold = 0.878787878788)   : 0.893041466453
Precision (threshold = 0.883838383838): 0.968586792526
Recall (threshold = 0.883838383838)   : 0.891155009788
Precision (threshold = 0.888888888889): 0.968960968418
Recall (threshold = 0.888888888889)   : 0.888912617904
Precision (threshold = 0.893939393939): 0.969313939017
Recall (threshold = 0.893939393939)   : 0.887097348283
Precision (threshold = 0.89898989899): 0.969468923029
Recall (threshold = 0.89898989899)   : 0.884961736964
Precision (threshold = 0.90404040404): 0.969731336279
Recall (threshold = 0.90404040404)   : 0.882612564513
Precision (threshold = 0.909090909091): 0.969926286073
Recall (threshold = 0.909090909091)   : 0.880476953195
Precision (threshold = 0.914141414141): 0.970296640176
Recall (threshold = 0.914141414141)   : 0.877843032568
Precision (threshold = 0.919191919192): 0.970813586098
Recall (threshold = 0.919191919192)   : 0.874924363766
Precision (threshold = 0.924242424242): 0.971404775125
Recall (threshold = 0.924242424242)   : 0.871792133832
Precision (threshold = 0.929292929293): 0.97203187251
Recall (threshold = 0.929292929293)   : 0.868410749244
Precision (threshold = 0.934343434343): 0.972883121045
Recall (threshold = 0.934343434343)   : 0.864531055348
Precision (threshold = 0.939393939394): 0.973425672411
Recall (threshold = 0.939393939394)   : 0.860508987364
Precision (threshold = 0.944444444444): 0.974041226258
Recall (threshold = 0.944444444444)   : 0.856095390639
Precision (threshold = 0.949494949495): 0.974463571837
Recall (threshold = 0.949494949495)   : 0.850258053034
Precision (threshold = 0.954545454545): 0.974766393611
Recall (threshold = 0.954545454545)   : 0.842854600463
Precision (threshold = 0.959595959596): 0.97549325026
Recall (threshold = 0.959595959596)   : 0.835913863677
Precision (threshold = 0.964646464646): 0.976197472818
Recall (threshold = 0.964646464646)   : 0.8276917601
Precision (threshold = 0.969696969697): 0.976871731644
Recall (threshold = 0.969696969697)   : 0.817832354511
Precision (threshold = 0.974747474747): 0.977337354589
Recall (threshold = 0.974747474747)   : 0.807403452572
Precision (threshold = 0.979797979798): 0.978530031612
Recall (threshold = 0.979797979798)   : 0.793272824346
Precision (threshold = 0.984848484848): 0.980131852253
Recall (threshold = 0.984848484848)   : 0.772592988076
Precision (threshold = 0.989898989899): 0.981307971185
Recall (threshold = 0.989898989899)   : 0.741840185086
Precision (threshold = 0.994949494949): 0.984238628196
Recall (threshold = 0.994949494949)   : 0.682363409859
Precision (threshold = 1.0): 0.991666666667
Recall (threshold = 1.0)   : 0.0042356291155

Now, let's plot the precision-recall curve to visualize the precision-recall tradeoff as we vary the threshold.



In [25]:

    
import matplotlib.pyplot as plt
%matplotlib inline

def plot_pr_curve(precision, recall, title):
    # Set figure and font defaults before drawing so they apply to this plot.
    plt.rcParams['figure.figsize'] = 7, 5
    plt.rcParams.update({'font.size': 16})
    plt.locator_params(axis='x', nbins=5)
    # Use only the line style here; the color is given by the keyword argument.
    plt.plot(precision, recall, '-', linewidth=4.0, color='#B0017F')
    plt.title(title)
    plt.xlabel('Precision')
    plt.ylabel('Recall')

plot_pr_curve(precision_all, recall_all, 'Precision recall curve (all)')

Quiz Question: Among all the threshold values tried, what is the smallest threshold value that achieves a precision of 96.5% or better? Round your answer to 3 decimal places.



In [30]:

    
0.838383838384









    Out[30]:





0.838383838384
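
Rather than scanning the printed list by eye, the smallest qualifying threshold can be found programmatically. The sketch below is plain Python; `smallest_threshold_reaching` is a hypothetical helper, and the four (threshold, precision) pairs are copied from the output above around the point where precision crosses 96.5%.

```python
def smallest_threshold_reaching(target_precision, thresholds, precisions):
    """Return the smallest threshold whose precision meets the target, or None."""
    for threshold, precision in sorted(zip(thresholds, precisions)):
        if precision >= target_precision:
            return threshold
    return None

# Excerpt of the values printed by the threshold sweep above.
excerpt_thresholds = [0.828282828283, 0.833333333333, 0.838383838384, 0.843434343434]
excerpt_precisions = [0.964581991742, 0.964945559391, 0.965311550152, 0.965662948723]

best = smallest_threshold_reaching(0.965, excerpt_thresholds, excerpt_precisions)
print(round(best, 3))
```

In the notebook itself, the same function could be called with `threshold_values` and `precision_all` instead of the excerpt.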

Quiz Question: Using threshold = 0.98, how many false negatives do we get on the test_data? (Hint: You may use the graphlab.evaluation.confusion_matrix function implemented in GraphLab Create.)



In [35]:

    
predictions_98threshold = apply_threshold(probabilities, 0.98)
graphlab.evaluation.confusion_matrix(test_data['sentiment'], predictions_98threshold)









    Out[35]:





    
        target_label
        predicted_label
        count
    
    
        -1
        1
        487
    
    
        1
        1
        22269
    
    
        1
        -1
        5826
    
    
        -1
        -1
        4754
    

[4 rows x 3 columns]

With threshold = 0.98 there are 5826 false negatives: positive reviews predicted as negative, i.e., reviews the manufacturer would look at unnecessarily when using this classifier.
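
The confusion matrix is returned in long format (one row per target/predicted pair), so the false-negative count is the row with target +1 and prediction -1. As a sketch, the lookup can be expressed over a list of dictionaries that mirrors the table above; `count_for` is a hypothetical helper, not a GraphLab Create function.

```python
# Rows copied from the confusion matrix at threshold = 0.98.
confusion_rows = [
    {'target_label': -1, 'predicted_label': 1, 'count': 487},
    {'target_label': 1, 'predicted_label': 1, 'count': 22269},
    {'target_label': 1, 'predicted_label': -1, 'count': 5826},
    {'target_label': -1, 'predicted_label': -1, 'count': 4754},
]

def count_for(rows, target, predicted):
    """Sum the counts of rows matching a (target, predicted) label pair."""
    return sum(row['count'] for row in rows
               if row['target_label'] == target
               and row['predicted_label'] == predicted)

false_negatives_98 = count_for(confusion_rows, target=1, predicted=-1)
false_positives_98 = count_for(confusion_rows, target=-1, predicted=1)

print(false_negatives_98, false_positives_98)
```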

Evaluating specific search terms

So far, we looked at the number of false positives for the entire test set. In this section, let's select reviews using a specific search term and optimize the precision on these reviews only. After all, a manufacturer would be interested in tuning the false positive rate just for their products (the reviews they want to read) rather than that of the entire set of products on Amazon.

From the test set, select the reviews for all products whose name contains the word 'baby' (the match ignores case).



In [36]:

    
baby_reviews = test_data[test_data['name'].apply(lambda x: 'baby' in x.lower())]
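
The same filter can be sketched on a plain Python list to see what the lambda does. The product names below are hypothetical and only serve as an illustration; `name_mentions_baby` is likewise an assumed helper name.

```python
# Hypothetical product names, for illustration only.
toy_reviews = [
    {'name': 'Baby Monitor', 'sentiment': 1},
    {'name': 'Wipe Pouch', 'sentiment': 1},
    {'name': 'BABY bottle set', 'sentiment': -1},
]

def name_mentions_baby(review):
    # Lowercase the name first so 'Baby' and 'BABY' both match.
    return 'baby' in review['name'].lower()

toy_baby_reviews = [r for r in toy_reviews if name_mentions_baby(r)]
```

Note that this is a substring test, so a name such as 'Babysitter Kit' would also match.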

Now, let's predict the probability of classifying these reviews as positive:



In [37]:

    
probabilities = model.predict(baby_reviews, output_type='probability')

Let's plot the precision-recall curve for the baby_reviews dataset.

First, let's consider the following threshold_values ranging from 0.5 to 1:



In [38]:

    
threshold_values = np.linspace(0.5, 1, num=100)

Second, as we did above, let's compute precision and recall for each value in threshold_values on the baby_reviews dataset. Complete the code block below.



In [40]:

    
precision_all = []
recall_all = []

for threshold in threshold_values:
    
    # Make predictions. Use the `apply_threshold` function.
    predictions = apply_threshold(probabilities, threshold)

    # Calculate the precision.
    precision = graphlab.evaluation.precision(baby_reviews['sentiment'], predictions)
    
    # Calculate the recall.
    recall = graphlab.evaluation.recall(baby_reviews['sentiment'], predictions)
    
    print "Precision (threshold = %s): %s" % (threshold, precision)
    print "Recall (threshold = %s)   : %s" % (threshold, recall)
    
    # Append the precision and recall scores.
    precision_all.append(precision)
    recall_all.append(recall)









    



Precision (threshold = 0.5): 0.947656392486
Recall (threshold = 0.5)   : 0.944555535357
Precision (threshold = 0.505050505051): 0.948165723672
Recall (threshold = 0.505050505051)   : 0.944373750227
Precision (threshold = 0.510101010101): 0.948319941563
Recall (threshold = 0.510101010101)   : 0.944010179967
Precision (threshold = 0.515151515152): 0.948474328522
Recall (threshold = 0.515151515152)   : 0.943646609707
Precision (threshold = 0.520202020202): 0.948638274538
Recall (threshold = 0.520202020202)   : 0.943464824577
Precision (threshold = 0.525252525253): 0.948792977323
Recall (threshold = 0.525252525253)   : 0.943101254317
Precision (threshold = 0.530303030303): 0.949487554905
Recall (threshold = 0.530303030303)   : 0.943101254317
Precision (threshold = 0.535353535354): 0.949459805896
Recall (threshold = 0.535353535354)   : 0.942555898927
Precision (threshold = 0.540404040404): 0.94998167827
Recall (threshold = 0.540404040404)   : 0.942555898927
Precision (threshold = 0.545454545455): 0.949954170486
Recall (threshold = 0.545454545455)   : 0.942010543538
Precision (threshold = 0.550505050505): 0.95011920044
Recall (threshold = 0.550505050505)   : 0.941828758408
Precision (threshold = 0.555555555556): 0.950816663608
Recall (threshold = 0.555555555556)   : 0.941828758408
Precision (threshold = 0.560606060606): 0.95080763583
Recall (threshold = 0.560606060606)   : 0.941646973278
Precision (threshold = 0.565656565657): 0.950964187328
Recall (threshold = 0.565656565657)   : 0.941283403018
Precision (threshold = 0.570707070707): 0.951793928243
Recall (threshold = 0.570707070707)   : 0.940374477368
Precision (threshold = 0.575757575758): 0.951951399116
Recall (threshold = 0.575757575758)   : 0.940010907108
Precision (threshold = 0.580808080808): 0.952082565426
Recall (threshold = 0.580808080808)   : 0.939101981458
Precision (threshold = 0.585858585859): 0.952407304925
Recall (threshold = 0.585858585859)   : 0.938556626068
Precision (threshold = 0.590909090909): 0.952363367799
Recall (threshold = 0.590909090909)   : 0.937647700418
Precision (threshold = 0.59595959596): 0.952345770225
Recall (threshold = 0.59595959596)   : 0.937284130158
Precision (threshold = 0.60101010101): 0.952336966562
Recall (threshold = 0.60101010101)   : 0.937102345028
Precision (threshold = 0.606060606061): 0.952856350527
Recall (threshold = 0.606060606061)   : 0.936920559898
Precision (threshold = 0.611111111111): 0.95282146161
Recall (threshold = 0.611111111111)   : 0.936193419378
Precision (threshold = 0.616161616162): 0.952795261014
Recall (threshold = 0.616161616162)   : 0.935648063988
Precision (threshold = 0.621212121212): 0.952901909883
Recall (threshold = 0.621212121212)   : 0.934193782949
Precision (threshold = 0.626262626263): 0.953035084463
Recall (threshold = 0.626262626263)   : 0.933284857299
Precision (threshold = 0.631313131313): 0.953212031192
Recall (threshold = 0.631313131313)   : 0.933284857299
Precision (threshold = 0.636363636364): 0.953354395094
Recall (threshold = 0.636363636364)   : 0.932557716779
Precision (threshold = 0.641414141414): 0.953683035714
Recall (threshold = 0.641414141414)   : 0.932012361389
Precision (threshold = 0.646464646465): 0.954020848846
Recall (threshold = 0.646464646465)   : 0.931648791129
Precision (threshold = 0.651515151515): 0.954172876304
Recall (threshold = 0.651515151515)   : 0.931103435739
Precision (threshold = 0.656565656566): 0.954164337619
Recall (threshold = 0.656565656566)   : 0.930921650609
Precision (threshold = 0.661616161616): 0.954291044776
Recall (threshold = 0.661616161616)   : 0.929830939829
Precision (threshold = 0.666666666667): 0.954248366013
Recall (threshold = 0.666666666667)   : 0.928922014179
Precision (threshold = 0.671717171717): 0.954214165577
Recall (threshold = 0.671717171717)   : 0.928194873659
Precision (threshold = 0.676767676768): 0.95436693473
Recall (threshold = 0.676767676768)   : 0.927649518269
Precision (threshold = 0.681818181818): 0.954715568862
Recall (threshold = 0.681818181818)   : 0.927467733139
Precision (threshold = 0.686868686869): 0.954852004496
Recall (threshold = 0.686868686869)   : 0.92655880749
Precision (threshold = 0.691919191919): 0.954997187324
Recall (threshold = 0.691919191919)   : 0.92583166697
Precision (threshold = 0.69696969697): 0.955355468017
Recall (threshold = 0.69696969697)   : 0.92583166697
Precision (threshold = 0.70202020202): 0.955492957746
Recall (threshold = 0.70202020202)   : 0.92492274132
Precision (threshold = 0.707070707071): 0.955622414442
Recall (threshold = 0.707070707071)   : 0.92383203054
Precision (threshold = 0.712121212121): 0.955768868812
Recall (threshold = 0.712121212121)   : 0.92310489002
Precision (threshold = 0.717171717172): 0.95627591406
Recall (threshold = 0.717171717172)   : 0.9223777495
Precision (threshold = 0.722222222222): 0.956259426848
Recall (threshold = 0.722222222222)   : 0.92201417924
Precision (threshold = 0.727272727273): 0.956579195771
Recall (threshold = 0.727272727273)   : 0.92110525359
Precision (threshold = 0.732323232323): 0.956924239562
Recall (threshold = 0.732323232323)   : 0.92074168333
Precision (threshold = 0.737373737374): 0.956883509834
Recall (threshold = 0.737373737374)   : 0.91983275768
Precision (threshold = 0.742424242424): 0.957188861527
Recall (threshold = 0.742424242424)   : 0.918560261771
Precision (threshold = 0.747474747475): 0.957321699545
Recall (threshold = 0.747474747475)   : 0.917469550991
Precision (threshold = 0.752525252525): 0.957471046136
Recall (threshold = 0.752525252525)   : 0.916742410471
Precision (threshold = 0.757575757576): 0.957596501236
Recall (threshold = 0.757575757576)   : 0.915469914561
Precision (threshold = 0.762626262626): 0.957912778518
Recall (threshold = 0.762626262626)   : 0.914379203781
Precision (threshold = 0.767676767677): 0.958245948522
Recall (threshold = 0.767676767677)   : 0.913652063261
Precision (threshold = 0.772727272727): 0.958778625954
Recall (threshold = 0.772727272727)   : 0.913288493001
Precision (threshold = 0.777777777778): 0.959105675521
Recall (threshold = 0.777777777778)   : 0.912379567351
Precision (threshold = 0.782828282828): 0.959586286152
Recall (threshold = 0.782828282828)   : 0.910743501182
Precision (threshold = 0.787878787879): 0.960655737705
Recall (threshold = 0.787878787879)   : 0.905471732412
Precision (threshold = 0.792929292929): 0.96098126328
Recall (threshold = 0.792929292929)   : 0.904381021632
Precision (threshold = 0.79797979798): 0.961121856867
Recall (threshold = 0.79797979798)   : 0.903290310853
Precision (threshold = 0.80303030303): 0.961084220716
Recall (threshold = 0.80303030303)   : 0.902381385203
Precision (threshold = 0.808080808081): 0.961419154711
Recall (threshold = 0.808080808081)   : 0.901472459553
Precision (threshold = 0.813131313131): 0.961545931249
Recall (threshold = 0.813131313131)   : 0.900018178513
Precision (threshold = 0.818181818182): 0.962062256809
Recall (threshold = 0.818181818182)   : 0.898927467733
Precision (threshold = 0.823232323232): 0.962378167641
Recall (threshold = 0.823232323232)   : 0.897473186693
Precision (threshold = 0.828282828283): 0.962724434036
Recall (threshold = 0.828282828283)   : 0.896746046173
Precision (threshold = 0.833333333333): 0.963064295486
Recall (threshold = 0.833333333333)   : 0.895837120524
Precision (threshold = 0.838383838384): 0.963194988254
Recall (threshold = 0.838383838384)   : 0.894382839484
Precision (threshold = 0.843434343434): 0.963711259317
Recall (threshold = 0.843434343434)   : 0.893110343574
Precision (threshold = 0.848484848485): 0.964194373402
Recall (threshold = 0.848484848485)   : 0.890928922014
Precision (threshold = 0.853535353535): 0.964722112732
Recall (threshold = 0.853535353535)   : 0.889838211234
Precision (threshold = 0.858585858586): 0.964863797868
Recall (threshold = 0.858585858586)   : 0.888565715324
Precision (threshold = 0.863636363636): 0.965019762846
Recall (threshold = 0.863636363636)   : 0.887656789675
Precision (threshold = 0.868686868687): 0.965510406343
Recall (threshold = 0.868686868687)   : 0.885475368115
Precision (threshold = 0.873737373737): 0.966037735849
Recall (threshold = 0.873737373737)   : 0.884202872205
Precision (threshold = 0.878787878788): 0.966155683854
Recall (threshold = 0.878787878788)   : 0.882203235775
Precision (threshold = 0.883838383838): 0.966839792249
Recall (threshold = 0.883838383838)   : 0.879840029086
Precision (threshold = 0.888888888889): 0.967935871743
Recall (threshold = 0.888888888889)   : 0.878022177786
Precision (threshold = 0.893939393939): 0.968072289157
Recall (threshold = 0.893939393939)   : 0.876386111616
Precision (threshold = 0.89898989899): 0.968014484007
Recall (threshold = 0.89898989899)   : 0.874750045446
Precision (threshold = 0.90404040404): 0.967911200807
Recall (threshold = 0.90404040404)   : 0.871841483367
Precision (threshold = 0.909090909091): 0.967859308672
Recall (threshold = 0.909090909091)   : 0.870387202327
Precision (threshold = 0.914141414141): 0.968154158215
Recall (threshold = 0.914141414141)   : 0.867660425377
Precision (threshold = 0.919191919192): 0.968470301058
Recall (threshold = 0.919191919192)   : 0.865479003817
Precision (threshold = 0.924242424242): 0.969139587165
Recall (threshold = 0.924242424242)   : 0.862025086348
Precision (threshold = 0.929292929293): 0.969771745836
Recall (threshold = 0.929292929293)   : 0.857298672969
Precision (threshold = 0.934343434343): 0.971050454921
Recall (threshold = 0.934343434343)   : 0.853662970369
Precision (threshold = 0.939393939394): 0.971226021685
Recall (threshold = 0.939393939394)   : 0.84675513543
Precision (threshold = 0.944444444444): 0.971476510067
Recall (threshold = 0.944444444444)   : 0.842028722051
Precision (threshold = 0.949494949495): 0.972245762712
Recall (threshold = 0.949494949495)   : 0.834211961462
Precision (threshold = 0.954545454545): 0.972418216806
Recall (threshold = 0.954545454545)   : 0.826758771133
Precision (threshold = 0.959595959596): 0.973411154345
Recall (threshold = 0.959595959596)   : 0.818578440284
Precision (threshold = 0.964646464646): 0.974202011369
Recall (threshold = 0.964646464646)   : 0.810034539175
Precision (threshold = 0.969696969697): 0.97500552975
Recall (threshold = 0.969696969697)   : 0.801308852936
Precision (threshold = 0.974747474747): 0.975100942127
Recall (threshold = 0.974747474747)   : 0.790219960007
Precision (threshold = 0.979797979798): 0.976659038902
Recall (threshold = 0.979797979798)   : 0.775858934739
Precision (threshold = 0.984848484848): 0.979048964218
Recall (threshold = 0.984848484848)   : 0.756044355572
Precision (threshold = 0.989898989899): 0.980103168755
Recall (threshold = 0.989898989899)   : 0.725322668606
Precision (threshold = 0.994949494949): 0.984425349087
Recall (threshold = 0.994949494949)   : 0.666424286493
Precision (threshold = 1.0): 1.0
Recall (threshold = 1.0)   : 0.00290856207962
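
The sweep above can be sketched without GraphLab Create, using NumPy only. The helpers `apply_threshold_np` and `precision_recall` are assumptions written for this illustration (they stand in for `apply_threshold` and the `graphlab.evaluation` functions), and the labels and probabilities are toy values, not Amazon review data.

```python
import numpy as np

def apply_threshold_np(probabilities, threshold):
    """Label +1 where probability >= threshold, else -1 (assumed convention)."""
    return np.where(np.asarray(probabilities) >= threshold, 1, -1)

def precision_recall(labels, predictions):
    labels = np.asarray(labels)
    predictions = np.asarray(predictions)
    tp = np.sum((labels == 1) & (predictions == 1))
    fp = np.sum((labels == -1) & (predictions == 1))
    fn = np.sum((labels == 1) & (predictions == -1))
    # Convention chosen here: precision is 1.0 when nothing is predicted positive.
    precision = tp / float(tp + fp) if (tp + fp) > 0 else 1.0
    recall = tp / float(tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall

# Toy data, not taken from the Amazon reviews.
toy_labels = [1, 1, 1, -1, 1, -1]
toy_probabilities = [0.99, 0.92, 0.72, 0.83, 0.55, 0.40]

toy_precision, toy_recall = [], []
for threshold in np.linspace(0.5, 1, num=6):
    predictions = apply_threshold_np(toy_probabilities, threshold)
    p, r = precision_recall(toy_labels, predictions)
    toy_precision.append(p)
    toy_recall.append(r)
```

As in the printed output, recall can only stay flat or fall as the threshold rises, while precision tends to rise but is not guaranteed to do so at every step (for example, it dips slightly between thresholds 0.893939393939 and 0.89898989899 above).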

Quiz Question: Among all the threshold values tried, what is the smallest threshold value that achieves a precision of 96.5% or better for the reviews in baby_reviews? Round your answer to 3 decimal places.

0.864 (threshold value 0.863636363636, where precision first reaches 0.965019762846)
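
Rather than scanning the printed output by eye, the answer can be looked up programmatically. The sketch below uses a hypothetical helper, `smallest_threshold_reaching`, applied to three consecutive rows copied from the output above; it takes the minimum over all qualifying thresholds because precision is not strictly increasing in the threshold.

```python
def smallest_threshold_reaching(thresholds, precisions, target):
    """Return the smallest threshold whose precision is >= target, or None."""
    qualifying = [t for t, p in zip(thresholds, precisions) if p >= target]
    return min(qualifying) if qualifying else None

# Three consecutive rows copied from the printed output above.
thresholds = [0.858585858586, 0.863636363636, 0.868686868687]
precisions = [0.964863797868, 0.965019762846, 0.965510406343]

best = smallest_threshold_reaching(thresholds, precisions, 0.965)
print(round(best, 3))  # prints 0.864
```

With the notebook's own variables, the equivalent call would be `smallest_threshold_reaching(threshold_values, precision_all, 0.965)`.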

Quiz Question: Is this threshold value smaller or larger than the threshold used for the entire dataset to achieve the same specified precision of 96.5%?

Larger: 0.864 for baby_reviews versus 0.838 for the entire test set.

Finally, let's plot the precision recall curve.



In [41]:

    
plot_pr_curve(precision_all, recall_all, "Precision-Recall (Baby)")



In [ ]:

name	review	rating	word_count	sentiment
Planetwise Wipe Pouch	it came early and was not disappointed. i love ...	5.0	{'and': 3, 'love': 1, 'it': 3, 'highly': 1, ...	1
Annas Dream Full Quilt with 2 Shams ...	Very soft and comfortable and warmer than it ...	5.0	{'and': 2, 'quilt': 1, 'it': 1, 'comfortable': ...	1
Stop Pacifier Sucking without tears with ...	This is a product well worth the purchase. I ...	5.0	{'and': 3, 'ingenious': 1, 'love': 2, 'what': 1, ...	1
Stop Pacifier Sucking without tears with ...	All of my kids have cried non-stop when I tried to ...	5.0	{'and': 2, 'all': 2, 'help': 1, 'cried': 1, ...	1
Stop Pacifier Sucking without tears with ...	When the Binky Fairy came to our house, we didn't ...	5.0	{'and': 2, 'this': 2, 'her': 1, 'help': 2, ...	1
A Tale of Baby's Days with Peter Rabbit ...	Lovely book, it's bound tightly so you may no ...	4.0	{'shop': 1, 'noble': 1, 'is': 1, 'it': 1, 'as': ...	1
Baby Tracker® - Daily Childcare Journal, ...	Perfect for new parents. We were able to keep ...	5.0	{'and': 2, 'all': 1, 'right': 1, 'had': 1, ...	1
Baby Tracker® - Daily Childcare Journal, ...	A friend of mine pinned this product on Pinte ...	5.0	{'and': 1, 'fantastic': 1, 'help': 1, 'give': 1, ...	1
Baby Tracker® - Daily Childcare Journal, ...	This has been an easy way for my nanny to record ...	4.0	{'all': 1, 'standarad': 1, 'another': 1, 'when': ...	1
Baby Tracker® - Daily Childcare Journal, ...	I love this journal and our nanny uses it ...	4.0	{'all': 2, 'nannys': 1, 'just': 1, 'food': 1, ...	1