In [1]:
%run evaluate_poi_identifier.py
In [2]:
len(features_test)
Out[2]:
In [3]:
pred = clf.predict(features_test)
Number of predicted POIs in the test set
In [16]:
s = sum(pred)
s
Out[16]:
Number of people in test set
In [17]:
l = len(pred)
l
Out[17]:
In [12]:
score = clf.score(features_test, labels_test)
score
Out[12]:
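As a cross-check, the same accuracy can be computed with sklearn's accuracy_score (a small sketch, using the pred array from above):
In [ ]:
from sklearn.metrics import accuracy_score
accuracy_score(labels_test, pred)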
If your identifier predicted 0.0 (not POI) for everyone in the test set, what would its accuracy be?
One way to get at this: assume all of the labels_test values are 0.0 and re-score the classifier against that.
In [28]:
score1 = clf.score(features_test, [0.0 for i in range(l)])
score1
Out[28]:
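For comparison, the accuracy of an all-zero predictor against the actual labels_test can also be computed directly; a minimal sketch, using l (the test-set size) from above:
In [ ]:
# fraction of 0.0 labels in the test set = accuracy of predicting 0.0 for everyone
1.0 - sum(labels_test) / float(l)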
For a true positive (defined as 1.0 in both the test labels and the prediction), adding the two arrays should produce some 2.0s. However, as you can see below, there are no 2.0s, so there are no true positives.
In [36]:
labels_test + pred
Out[36]:
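The same check can be made explicit by counting the positions where both the label and the prediction are 1.0; a minimal sketch:
In [ ]:
# count of true positives: label and prediction both 1.0
sum(1 for a, b in zip(labels_test, pred) if a == 1.0 and b == 1.0)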
In [38]:
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
In [39]:
precision_score(labels_test, pred)
Out[39]:
In [40]:
recall_score(labels_test, pred)
Out[40]:
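Precision and recall can also be cross-checked against the raw counts; a sketch using sklearn's confusion_matrix (assuming both classes appear, the 2x2 matrix unravels as TN, FP, FN, TP):
In [ ]:
from sklearn.metrics import confusion_matrix
tn, fp, fn, tp = confusion_matrix(labels_test, pred).ravel()
# precision = tp / (tp + fp), recall = tp / (tp + fn)
tp, fp, fn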
How many True Positives?
Creating numpy arrays to allow the syntactic sugar of vector arithmetic, and to express the values as floats
In [44]:
import numpy as np

predictions = np.array([0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1], dtype=float)
true_labels = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0], dtype=float)
In [45]:
predictions + true_labels
Out[45]:
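Counting the 2.0s in that sum gives the number of true positives directly (6 for these hand-made arrays); a short sketch:
In [ ]:
# each 2.0 in the element-wise sum is a true positive
int(np.sum(predictions + true_labels == 2.0))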
False positives and false negatives: in the difference below, a +1.0 marks a false positive (predicted POI, actually not) and a -1.0 marks a false negative (missed POI)
In [48]:
predictions - true_labels
Out[48]:
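Counting those markers gives the totals (3 false positives and 2 false negatives for these arrays); a short sketch:
In [ ]:
diff = predictions - true_labels
int(np.sum(diff == 1.0)), int(np.sum(diff == -1.0))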
In [49]:
precision_score(true_labels, predictions)
Out[49]:
In [50]:
recall_score(true_labels, predictions)
Out[50]:
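With TP = 6, FP = 3, and FN = 2 from the counts above, the scores can be verified by hand: precision = TP / (TP + FP) = 6/9 ≈ 0.667 and recall = TP / (TP + FN) = 6/8 = 0.75. A quick check:
In [ ]:
tp, fp, fn = 6.0, 3.0, 2.0
tp / (tp + fp), tp / (tp + fn)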
In [ ]: