In [1]:
%run evaluate_poi_identifier.py


/Users/sunnyamrat/anaconda/envs/python2/lib/python2.7/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)

In [2]:
len(features_test)


Out[2]:
29

In [3]:
pred = clf.predict(features_test)

Number of POIs predicted in the test set: pred holds 0.0/1.0 labels, so summing it counts the predicted POIs


In [16]:
s = sum(pred)
s


Out[16]:
4.0

Number of people in test set


In [17]:
l = len(pred)
l


Out[17]:
29

In [12]:
score = clf.score(features_test, labels_test)
score


Out[12]:
0.72413793103448276

If your identifier predicted 0.0 (not POI) for everyone in the test set, what would its accuracy be?

One way to get at this: pass an all-zero label vector to clf.score, which then reports the fraction of the classifier's predictions that are 0.0.


In [28]:
score1 = clf.score(features_test, [0.0] * l)  # fraction of predictions equal to 0.0
score1


Out[28]:
0.86206896551724133
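
Note that clf.score(features_test, zeros) actually measures how often the classifier itself predicts 0.0; it matches the all-zero predictor's accuracy here only because the classifier happens to predict as many POIs (4) as actually appear in labels_test. A more direct sketch compares an all-zero prediction vector against the true labels:


In [ ]:
from sklearn.metrics import accuracy_score

# A biased "everyone is a non-POI" predictor is right on the 25 non-POIs
# and wrong on the 4 POIs: 25/29 ~ 0.862
accuracy_score(labels_test, [0.0] * l)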

For a true positive (defined as 1.0 in both the test labels and the predictions), adding the two arrays element-wise should produce some 2.0s. However, as you can see below, there are no 2.0s, so there are no true positives.


In [36]:
labels_test + pred


Out[36]:
array([ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  1.,  1.,  0.,  1.,  1.,
        0.,  1.,  0.])
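
The same check can be made with sklearn's confusion_matrix (assuming the usual 0/1 label convention): rows are true classes, columns are predicted classes, so the true-positive count is the bottom-right entry.


In [ ]:
from sklearn.metrics import confusion_matrix

# From the arrays above this is [[21, 4], [4, 0]]:
# 21 true negatives, 4 false positives, 4 false negatives, 0 true positives
confusion_matrix(labels_test, pred)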

In [38]:
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score

In [39]:
precision_score(labels_test, pred)


Out[39]:
0.0

In [40]:
recall_score(labels_test, pred)


Out[40]:
0.0
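
Both scores are 0.0 because there are no true positives: precision = TP / (TP + FP) = 0 / (0 + 4) = 0.0 and recall = TP / (TP + FN) = 0 / (0 + 4) = 0.0.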

How many True Positives?

Creating NumPy arrays to allow vector-arithmetic syntactic sugar and to express the values as floats


In [44]:
import numpy as np  # likely already in the namespace via %run above

predictions = np.array([0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1], dtype=float)
true_labels = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0], dtype=float)

In [45]:
predictions + true_labels


Out[45]:
array([ 0.,  1.,  1.,  0.,  0.,  0.,  2.,  0.,  2.,  1.,  0.,  2.,  0.,
        1.,  2.,  2.,  0.,  2.,  0.,  1.])
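
Counting the 2.0s in the element-wise sum gives the true positives directly:


In [ ]:
# A 2.0 means both the prediction and the true label are 1.0
int(np.sum((predictions + true_labels) == 2.0))  # 6 true positives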

In the element-wise difference, a false positive shows up as +1.0 (predicted POI, actually not) and a false negative as -1.0 (an actual POI that was missed)


In [48]:
predictions - true_labels


Out[48]:
array([ 0.,  1.,  1.,  0.,  0.,  0.,  0.,  0.,  0., -1.,  0.,  0.,  0.,
       -1.,  0.,  0.,  0.,  0.,  0.,  1.])
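
Counting the +1.0s and -1.0s in the difference tallies them:


In [ ]:
diff = predictions - true_labels
fp = int(np.sum(diff == 1.0))   # 3 false positives (predicted POI, actually not)
fn = int(np.sum(diff == -1.0))  # 2 false negatives (actual POI that was missed)
fp, fn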

In [49]:
precision_score(true_labels, predictions)


Out[49]:
0.66666666666666663

In [50]:
recall_score(true_labels, predictions)


Out[50]:
0.75
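
As a sanity check, the scores match the hand counts: precision = TP / (TP + FP) = 6/9 ≈ 0.667 and recall = TP / (TP + FN) = 6/8 = 0.75.


In [ ]:
tp, fp, fn = 6, 3, 2
tp / float(tp + fp), tp / float(tp + fn)  # (0.666..., 0.75)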
