In [1]:
%run evaluate_poi_identifier.py


/Users/sunnyamrat/anaconda/envs/python2/lib/python2.7/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)

In [2]:
len(features_test)


Out[2]:
29

In [3]:
pred = clf.predict(features_test)

Number of POIs predicted in the test set: pred holds 0.0/1.0 labels, so summing it counts the predicted POIs


In [16]:
s = sum(pred)
s


Out[16]:
4.0

Number of people in test set


In [17]:
l = len(pred)
l


Out[17]:
29

In [12]:
score = clf.score(features_test, labels_test)
score


Out[12]:
0.72413793103448276

If your identifier predicted 0.0 (not POI) for everyone in the test set, what would its accuracy be?

One way to get at this: pass an all-zero label vector to clf.score, which then reports the fraction of the classifier's predictions that are 0.0.


In [28]:
score1 = clf.score(features_test, [0.0] * l)  # fraction of predictions equal to 0.0
score1


Out[28]:
0.86206896551724133
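
Note that clf.score(features_test, zeros) actually measures how often the classifier itself predicts 0.0; it matches the all-zero predictor's accuracy here only because the classifier happens to predict as many POIs (4) as actually appear in labels_test. A more direct sketch compares an all-zero prediction vector against the true labels:


In [ ]:
from sklearn.metrics import accuracy_score

# A biased "everyone is a non-POI" predictor is right on the 25 non-POIs
# and wrong on the 4 POIs: 25/29 ~ 0.862
accuracy_score(labels_test, [0.0] * l)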

For a true positive (defined as 1.0 in both the test labels and the predictions), adding the two arrays element-wise should produce some 2.0s. However, as you can see below, there are no 2.0s, so there are no true positives.


In [36]:
labels_test + pred


Out[36]:
array([ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  1.,  1.,  0.,  1.,  1.,
        0.,  1.,  0.])
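
The same check can be made with sklearn's confusion_matrix (assuming the usual 0/1 label convention): rows are true classes, columns are predicted classes, so the true-positive count is the bottom-right entry.


In [ ]:
from sklearn.metrics import confusion_matrix

# From the arrays above this is [[21, 4], [4, 0]]:
# 21 true negatives, 4 false positives, 4 false negatives, 0 true positives
confusion_matrix(labels_test, pred)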

In [38]:
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score

In [39]:
precision_score(labels_test, pred)


Out[39]:
0.0

In [40]:
recall_score(labels_test, pred)


Out[40]:
0.0
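
Both scores are 0.0 because there are no true positives: precision = TP / (TP + FP) = 0 / (0 + 4) = 0.0 and recall = TP / (TP + FN) = 0 / (0 + 4) = 0.0.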

How many True Positives?

Creating NumPy arrays to allow vector-arithmetic syntactic sugar and to express the values as floats


In [44]:
import numpy as np  # likely already in the namespace via %run above

predictions = np.array([0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1], dtype=float)
true_labels = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0], dtype=float)

In [45]:
predictions + true_labels


Out[45]:
array([ 0.,  1.,  1.,  0.,  0.,  0.,  2.,  0.,  2.,  1.,  0.,  2.,  0.,
        1.,  2.,  2.,  0.,  2.,  0.,  1.])
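
Counting the 2.0s in the element-wise sum gives the true positives directly:


In [ ]:
# A 2.0 means both the prediction and the true label are 1.0
int(np.sum((predictions + true_labels) == 2.0))  # 6 true positives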

In the element-wise difference, a false positive shows up as +1.0 (predicted POI, actually not) and a false negative as -1.0 (an actual POI that was missed)


In [48]:
predictions - true_labels


Out[48]:
array([ 0.,  1.,  1.,  0.,  0.,  0.,  0.,  0.,  0., -1.,  0.,  0.,  0.,
       -1.,  0.,  0.,  0.,  0.,  0.,  1.])
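
Counting the +1.0s and -1.0s in the difference tallies them:


In [ ]:
diff = predictions - true_labels
fp = int(np.sum(diff == 1.0))   # 3 false positives (predicted POI, actually not)
fn = int(np.sum(diff == -1.0))  # 2 false negatives (actual POI that was missed)
fp, fn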

In [49]:
precision_score(true_labels, predictions)


Out[49]:
0.66666666666666663

In [50]:
recall_score(true_labels, predictions)


Out[50]:
0.75
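
As a sanity check, the scores match the hand counts: precision = TP / (TP + FP) = 6/9 ≈ 0.667 and recall = TP / (TP + FN) = 6/8 = 0.75.


In [ ]:
tp, fp, fn = 6, 3, 2
tp / float(tp + fp), tp / float(tp + fn)  # (0.666..., 0.75)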
