The performance of a binary classifier is typically measured on a test set of data in which each example either is a member of the class (positive) or is not (negative). Four values can then be calculated:
- True positives (TP): positive examples that the classifier correctly assigns as positive
- True negatives (TN): negative examples that the classifier correctly assigns as negative
- False positives (FP): negative examples that the classifier incorrectly assigns as positive
- False negatives (FN): positive examples that the classifier incorrectly assigns as negative
These are often represented as a contingency table, or confusion matrix:

|                   | predicted positive | predicted negative |
|-------------------|--------------------|--------------------|
| actually positive | TP                 | FN                 |
| actually negative | FP                 | TN                 |
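As a minimal sketch (not part of the original notebook), the four counts can be tallied directly from paired lists of true and predicted labels:

In [ ]:
# Tally TP, TN, FP, FN from paired true/predicted labels
# (illustrative data; True means positive, False means negative)
y_true = [True, True, True, False, False, False, False, False]
y_pred = [True, True, False, True, False, False, False, False]

tp = sum(t and p for t, p in zip(y_true, y_pred))          # true positives
tn = sum(not t and not p for t, p in zip(y_true, y_pred))  # true negatives
fp = sum(not t and p for t, p in zip(y_true, y_pred))      # false positives
fn = sum(t and not p for t, p in zip(y_true, y_pred))      # false negatives

print(tp, tn, fp, fn)  # 2 4 1 1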
These values can be combined into a number of useful performance metrics that summarise the classifier's ability to perform particular tasks.
- Sensitivity (true positive rate; TPR, Sn): the proportion of positive examples that are correctly classified, $\frac{TP}{TP+FN}$
- Specificity (true negative rate): the proportion of negative examples that are correctly classified, $\frac{TN}{TN+FP}$
- False positive rate (FPR): the proportion of negative examples incorrectly classed as positive, $\frac{FP}{FP+TN}$
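These metrics follow directly from the four counts; a quick sketch with illustrative values (not from the original notebook):

In [ ]:
# Compute sensitivity, specificity and FPR from the four counts
# (illustrative counts, chosen to give round numbers)
tp, tn, fp, fn = 90, 95, 5, 10

sensitivity = tp / (tp + fn)  # TPR, Sn: proportion of positives recovered
specificity = tn / (tn + fp)  # proportion of negatives recovered
fpr = fp / (fp + tn)          # FPR = 1 - specificity

print(sensitivity, specificity, fpr)  # 0.9 0.95 0.05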
What these metrics cannot tell us directly is how much to trust an individual positive prediction; for that we need Bayes' theorem. With:

- $P(\textrm{positive})$: the probability that an example is positive (the baserate)
- $P(\textrm{negative})$: the probability that an example is negative
- $P(+|\textrm{positive})$: the probability of a positive classification, if the example is positive (TPR, Sn)
- $P(+|\textrm{negative})$: the probability of a positive classification, if the example is negative (FPR)

the probability that an example is positive, given that the classifier says it is positive, is $P(\textrm{positive}|+)$ and can be calculated:

$$P(\textrm{positive}|+) = \frac{P(+|\textrm{positive})\,P(\textrm{positive})}{P(+|\textrm{positive})\,P(\textrm{positive}) + P(+|\textrm{negative})\,P(\textrm{negative})}$$

Denoting the baseline occurrence, or baseline frequency, with which a protein may be an effector (the proportion of all proteins that are effectors) as $f_{x}$, the result depends strongly on that baserate:

$$f_{x} = 0.01 \implies P(\textrm{effector}|\textrm{+ve}) = 0.490 \approx 0.5$$

$$f_{x} = 0.8 \implies P(\textrm{effector}|\textrm{+ve}) = 0.997 \approx 1.0$$
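A minimal sketch of this calculation follows. The sensitivity and FPR values used below (Sn = 0.95, FPR = 0.01) are an assumption chosen because they reproduce the figures above; they are not taken from this text:

In [ ]:
# Posterior probability that a positive prediction is a true positive (Bayes' theorem)
def p_positive_given_plus(sn, fpr, baserate):
    """Return P(positive|+) for a classifier with the given
    sensitivity (sn), false positive rate (fpr) and baserate."""
    return (sn * baserate) / (sn * baserate + fpr * (1 - baserate))

# Assumed classifier performance: sn=0.95, fpr=0.01
for f_x in (0.01, 0.8):
    print(f_x, round(p_positive_given_plus(0.95, 0.01, f_x), 3))
# 0.01 0.49
# 0.8 0.997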
In [ ]:
# Import Python libraries
%matplotlib inline
import seaborn as sns  # produces prettier graphical output
import tools.classifier as tc  # local module with classifier-specific visualisations

# Define sensitivity and FPR
sn = 0.90  # sensitivity
fpr = 0.05  # false positive rate

# Define baserate (frequency of positive examples)
baserate = 0.3

# Static plot of P(effector|+) as a function of baserate
tc.plot_prob_effector(sn, fpr, baserate);
In the plot above, we see the effector classifier's response curve (red line) as a function of baserate, assuming a 90% sensitivity and a 5% false positive rate.
The black arrow marks the response when the baserate of positives in the population is 30%: at this point, any positive classification has about an 89% probability of really being a positive example.
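That 89% figure can be checked directly with the values used in the cell above (a quick standalone sketch):

In [ ]:
# Check the quoted posterior at baserate = 0.3, sn = 0.9, fpr = 0.05
sn, fpr, baserate = 0.9, 0.05, 0.3
posterior = (sn * baserate) / (sn * baserate + fpr * (1 - baserate))
print(round(posterior, 3))  # 0.885, i.e. about 89%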
In their paper, the authors identify hundreds of type III effectors in genomes that possess no annotated type III secretion system (over 10% of the complete protein complement, in some cases). They note:
> The surprisingly high number of (false) positives in genomes without TTSS exceeds the expected false positive rate (Table 1) and thus raised questions about their nature.
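This is the baserate fallacy at work: in a genome with no type III secretion system, the baserate of true effectors is at or near zero, so almost every positive call is a false positive. A quick sketch (the baserates below are assumptions for illustration):

In [ ]:
# As the baserate approaches zero, so does P(positive|+)
sn, fpr = 0.9, 0.05
for baserate in (0.3, 0.01, 0.001):  # illustrative baserates
    posterior = (sn * baserate) / (sn * baserate + fpr * (1 - baserate))
    print(baserate, round(posterior, 3))
# 0.3 0.885
# 0.01 0.154
# 0.001 0.018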
In [ ]:
# Import Python libraries
from ipywidgets import interact, FloatSlider  # for interactive widgets

# Define sensitivity and FPR
sn = 0.90  # sensitivity
fpr = 0.05  # false positive rate

# Define baserate (frequency of positive examples)
baserate = 0.3

# Create an interactive plot with a slider for each parameter
interact(tc.plot_prob_effector,
         sens=FloatSlider(min=0.01, max=0.99, step=0.01, value=sn),
         fpr=FloatSlider(min=0.01, max=0.99, step=0.01, value=fpr),
         baserate=FloatSlider(min=0.01, max=0.99, step=0.01, value=baserate),
         xmax=FloatSlider(min=0.1, max=1, step=0.1, value=1));
Predictors and classifiers identify groups of positive/negative examples, not individual members of the group. For example, if a test for smugglers at an airport has $P(\textrm{smuggler}|+) = 0.9$ and 100 potential smugglers are identified, how do we tell which 10 of those people are wrongly identified? We always need more evidence to distinguish true from false positives within the predicted group.
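A small simulation (an illustrative sketch, not from the original) makes the point: the flagged group is a mix of true and false positives, and nothing in the classification itself says which individuals are which:

In [ ]:
import random

random.seed(42)  # reproducible illustration
# Each of the 100 flagged individuals is a real smuggler with probability 0.9
flagged = [random.random() < 0.9 for _ in range(100)]
print(sum(flagged), "real smugglers,", 100 - sum(flagged), "wrongly identified")
# The classifier's output alone cannot tell us which individuals are which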
If there is a set of criteria that an example must meet in order to be a member of a class, then excluding all examples that fail those criteria reduces the scope for false positives and raises the baserate, increasing the probability that a positive classification implies a positive example, as the sketch below illustrates.
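For instance (a hedged sketch with assumed numbers): suppose a pre-filter discards 90% of the negatives and none of the positives. The baserate among the surviving examples rises, and with it the posterior probability of a positive call:

In [ ]:
# Effect of pre-filtering on baserate and on P(positive|+)
# Assumed numbers: baserate 0.01; filter keeps all positives, 10% of negatives
sn, fpr = 0.9, 0.05
baserate = 0.01
filtered_baserate = baserate / (baserate + (1 - baserate) * 0.1)

for f in (baserate, filtered_baserate):
    posterior = (sn * f) / (sn * f + fpr * (1 - f))
    print(round(f, 3), round(posterior, 3))
# 0.01 0.154
# 0.092 0.645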