prelim_month - confusion matrix
2017.09.20 - work log - prelim_month - confusion matrix
2017.09.20-work_log-prelim_month-confusion_matrix.ipynb
In [1]:
import datetime
import math
import pandas
import pandas_ml
import sklearn
import sklearn.metrics
import six
import statsmodels
import statsmodels.api
print( "packages imported at " + str( datetime.datetime.now() ) )
In [2]:
%pwd
Out[2]:
First, initialize my dev django project, so I can run code in this notebook that references my django models and can talk to the database using my project's settings.
You need to have installed your virtualenv (with django installed in it) as a Jupyter kernel, then select that kernel for this notebook.
In [3]:
%run ../django_init.py
Import any sourcenet or context_analysis models or classes.
In [4]:
# python_utilities
from python_utilities.analysis.statistics.confusion_matrix_helper import ConfusionMatrixHelper
from python_utilities.analysis.statistics.stats_helper import StatsHelper
from python_utilities.dictionaries.dict_helper import DictHelper
# context_analysis models.
from context_analysis.models import Reliability_Names
print( "sourcenet and context_analysis packages imported at " + str( datetime.datetime.now() ) )
Write functions here to do the math, so that we can reuse them below.
A basic confusion matrix ( https://en.wikipedia.org/wiki/Confusion_matrix ) contains counts of true positives, true negatives, false positives, and false negatives for a given binary or boolean (yes/no) classification decision you are asking someone or something to make.
To create a confusion matrix, you need two associated vectors of classification decisions (0s and 1s): one that contains ground truth, and one that contains the values predicted by whatever coder you are testing. For each associated pair of values: if both are 1, count a true positive; if both are 0, a true negative; if the ground truth is 0 but the prediction is 1, a false positive; and if the ground truth is 1 but the prediction is 0, a false negative.
Once you have your basic confusion matrix, the counts of true positives, true negatives, false positives, and false negatives can be used to calculate a range of scores for assessing the quality of a predictive model. These scores include precision and recall, accuracy, the F1 score (the harmonic mean of precision and recall), and the diagnostic odds ratio, among many others.
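As a quick toy illustration of the terms above (made-up lists, not project data; a minimal sketch using the already-imported sklearn.metrics):
# toy example: 1 = person detected, 0 = not detected.
toy_ground_truth = [ 1, 1, 1, 0, 0, 1, 0, 1 ]
toy_predicted = [ 1, 0, 1, 0, 1, 1, 0, 1 ]
# sklearn returns the flattened 2x2 matrix in the order TN, FP, FN, TP.
tn, fp, fn, tp = sklearn.metrics.confusion_matrix( toy_ground_truth, toy_predicted ).ravel()
print( "TP = " + str( tp ) + ", TN = " + str( tn ) + ", FP = " + str( fp ) + ", FN = " + str( fn ) )
# derived scores discussed above.
precision = tp / float( tp + fp )   # share of predicted positives that are correct
recall = tp / float( tp + fn )      # share of ground truth positives that were found
accuracy = ( tp + tn ) / float( tp + tn + fp + fn )
f1_score = 2.0 / ( ( 1.0 / recall ) + ( 1.0 / precision ) )   # harmonic mean of precision and recall
diagnostic_odds_ratio = ( tp * tn ) / float( fp * fn )        # equals LR+ / LR-
print( "precision = " + str( precision ) + ", recall = " + str( recall ) + ", accuracy = " + str( accuracy ) )
print( "F1 = " + str( f1_score ) + ", DOR = " + str( diagnostic_odds_ratio ) )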
For each person detected across the set of articles, look at whether the automated coder correctly detected the person, independent of eventual lookup or person type.
First, build lists of ground truth and predicted values per person.
In [5]:
# declare variables
reliability_names_label = None
label_in_list = []
reliability_names_qs = None
ground_truth_coder_index = 1
predicted_coder_index = 2
# processing
column_name = ""
predicted_value = -1
predicted_list = []
ground_truth_value = -1
ground_truth_list = []
reliability_names_instance = None
# set label
reliability_names_label = "prelim_month"
# lookup Reliability_Names for selected label
label_in_list.append( reliability_names_label )
reliability_names_qs = Reliability_Names.objects.filter( label__in = label_in_list )
print( "Found " + str( reliability_names_qs.count() ) + " rows with label in " + str( label_in_list ) )
# loop over records
predicted_value = -1
predicted_list = []
ground_truth_value = -1
ground_truth_list = []
ground_truth_positive_count = 0
predicted_positive_count = 0
true_positive_count = 0
false_positive_count = 0
ground_truth_negative_count = 0
predicted_negative_count = 0
true_negative_count = 0
false_negative_count = 0
for reliability_names_instance in reliability_names_qs:
# get detected flag from ground truth and predicted columns and add them to list.
# ==> ground truth
column_name = Reliability_Names.FIELD_NAME_PREFIX_CODER
column_name += str( ground_truth_coder_index )
column_name += "_" + Reliability_Names.FIELD_NAME_SUFFIX_DETECTED
ground_truth_value = getattr( reliability_names_instance, column_name )
ground_truth_list.append( ground_truth_value )
# ==> predicted
column_name = Reliability_Names.FIELD_NAME_PREFIX_CODER
column_name += str( predicted_coder_index )
column_name += "_" + Reliability_Names.FIELD_NAME_SUFFIX_DETECTED
predicted_value = getattr( reliability_names_instance, column_name )
predicted_list.append( predicted_value )
#-- END loop over Reliability_Names instances. --#
print( "==> population values count: " + str( len( ground_truth_list ) ) )
print( "==> predicted values count: " + str( len( predicted_list ) ) )
print( "==> percentage agreement = " + str( StatsHelper.percentage_agreement( ground_truth_list, predicted_list ) ) )
In [6]:
print( "==> population values: " + str( len( ground_truth_list ) ) )
list_name = "ACTUAL_VALUE_LIST"
string_list = map( str, ground_truth_list )
list_values = ", ".join( string_list )
print( list_name + " = [ " + list_values + " ]" )
print( "==> predicted values count: " + str( len( predicted_list ) ) )
list_name = "PREDICTED_VALUE_LIST"
string_list = map( str, predicted_list )
list_values = ", ".join( string_list )
print( list_name + " = [ " + list_values + " ]" )
In [14]:
confusion_matrix = pandas_ml.ConfusionMatrix( ground_truth_list, predicted_list )
print("Confusion matrix:\n%s" % confusion_matrix)
confusion_matrix.print_stats()
stats_dict = confusion_matrix.stats()
print( str( stats_dict ) )
print( str( stats_dict[ 'TPR' ] ) )
# get counts in variables
true_positive_count = confusion_matrix.TP
false_positive_count = confusion_matrix.FP
true_negative_count = confusion_matrix.TN
false_negative_count = confusion_matrix.FN
# and derive population and predicted counts
ground_truth_positive_count = true_positive_count + false_negative_count
predicted_positive_count = true_positive_count + false_positive_count
ground_truth_negative_count = true_negative_count + false_positive_count
predicted_negative_count = true_negative_count + false_negative_count
print( "==> Predicted positives: " + str( predicted_positive_count ) + " ( " + str( ( true_positive_count + false_positive_count ) ) + " )" )
print( "==> Ground truth positives: " + str( ground_truth_positive_count ) + " ( " + str( ( true_positive_count + false_negative_count ) ) + " )" )
print( "==> True positives: " + str( true_positive_count ) )
print( "==> False positives: " + str( false_positive_count ) )
print( "==> Predicted negatives: " + str( predicted_negative_count ) + " ( " + str( ( true_negative_count + false_negative_count ) ) + " )" )
print( "==> Ground truth negatives: " + str( ground_truth_negative_count ) + " ( " + str( ( true_negative_count + false_positive_count ) ) + " )" )
print( "==> True negatives: " + str( true_negative_count ) )
print( "==> False negatives: " + str( false_negative_count ) )
print( "==> Precision (true positive/predicted positive): " + str( ( true_positive_count / predicted_positive_count ) ) )
print( "==> Recall (true positive/ground truth positive): " + str( ( true_positive_count / ground_truth_positive_count ) ) )
In [40]:
confusion_helper = ConfusionMatrixHelper.populate_confusion_matrix( ground_truth_list, predicted_list )
print( str( confusion_helper ) )
For each person detected across the set of articles, look at whether the automated coder correctly looked up the person (so compare person IDs).
First, build lists of ground truth and predicted values per person.
In [41]:
# declare variables
reliability_names_label = None
label_in_list = []
reliability_names_qs = None
ground_truth_coder_index = 1
predicted_coder_index = 2
# processing
column_name = ""
predicted_value = -1
predicted_list = []
ground_truth_value = -1
ground_truth_list = []
reliability_names_instance = None
# set label
reliability_names_label = "prelim_month"
# lookup Reliability_Names for selected label
label_in_list.append( reliability_names_label )
reliability_names_qs = Reliability_Names.objects.filter( label__in = label_in_list )
print( "Found " + str( reliability_names_qs.count() ) + " rows with label in " + str( label_in_list ) )
# loop over records
predicted_value = -1
predicted_list = []
ground_truth_value = -1
ground_truth_list = []
ground_truth_positive_count = 0
predicted_positive_count = 0
true_positive_count = 0
false_positive_count = 0
ground_truth_negative_count = 0
predicted_negative_count = 0
true_negative_count = 0
false_negative_count = 0
for reliability_names_instance in reliability_names_qs:
# get person_id from ground truth and predicted columns and add them to list.
# ==> ground truth
column_name = Reliability_Names.FIELD_NAME_PREFIX_CODER
column_name += str( ground_truth_coder_index )
column_name += "_" + Reliability_Names.FIELD_NAME_SUFFIX_PERSON_ID
ground_truth_value = getattr( reliability_names_instance, column_name )
ground_truth_list.append( ground_truth_value )
# ==> predicted
column_name = Reliability_Names.FIELD_NAME_PREFIX_CODER
column_name += str( predicted_coder_index )
column_name += "_" + Reliability_Names.FIELD_NAME_SUFFIX_PERSON_ID
predicted_value = getattr( reliability_names_instance, column_name )
predicted_list.append( predicted_value )
#-- END loop over Reliability_Names instances. --#
print( "==> population values count: " + str( len( ground_truth_list ) ) )
print( "==> predicted values count: " + str( len( predicted_list ) ) )
print( "==> percentage agreement = " + str( StatsHelper.percentage_agreement( ground_truth_list, predicted_list ) ) )
In [ ]:
print( "==> population values: " + str( len( ground_truth_list ) ) )
list_name = "ACTUAL_VALUE_LIST"
string_list = map( str, ground_truth_list )
list_values = ", ".join( string_list )
print( list_name + " = [ " + list_values + " ]" )
print( "==> predicted values count: " + str( len( predicted_list ) ) )
list_name = "PREDICTED_VALUE_LIST"
string_list = map( str, predicted_list )
list_values = ", ".join( string_list )
print( list_name + " = [ " + list_values + " ]" )
In [42]:
confusion_helper = ConfusionMatrixHelper.populate_confusion_matrix( ground_truth_list, predicted_list )
print( str( confusion_helper ) )
In [22]:
def build_confusion_lists( column_name_suffix_IN,
desired_value_IN,
label_list_IN = [ "prelim_month", ],
ground_truth_coder_index_IN = 1,
predicted_coder_index_IN = 2,
debug_flag_IN = False ):
'''
Accepts the suffix of the column name of interest and the desired value. Also accepts an
optional label list, indexes of the ground_truth and predicted coder users, and a debug
flag. Uses these values to loop over records whose label matches one in the list passed
in. For each record, checks whether the ground_truth and predicted values in the
specified column match the desired value. If so, positive, so 1 is stored for the row;
if not, negative, so 0 is stored for the row.
Returns dictionary with value lists inside, ground truth values list mapped to key
"ground_truth" and predicted values list mapped to key "predicted".
'''
# return reference
lists_OUT = {}
# declare variables
reliability_names_label = None
label_in_list = []
reliability_names_qs = None
ground_truth_coder_index = -1
predicted_coder_index = -1
# processing
debug_flag = False
desired_column_suffix = None
desired_value = None
ground_truth_column_name = None
ground_truth_column_value = None
ground_truth_value = -1
ground_truth_list = []
predicted_column_name = None
predicted_column_value = None
predicted_value = -1
predicted_list = []
reliability_names_instance = None
# got required values?
# column name suffix?
if ( column_name_suffix_IN is not None ):
# desired value?
if ( desired_value_IN is not None ):
# ==> initialize
desired_column_suffix = column_name_suffix_IN
desired_value = desired_value_IN
label_in_list = label_list_IN
ground_truth_coder_index = ground_truth_coder_index_IN
predicted_coder_index = predicted_coder_index_IN
debug_flag = debug_flag_IN
# create ground truth column name
ground_truth_column_name = Reliability_Names.FIELD_NAME_PREFIX_CODER
ground_truth_column_name += str( ground_truth_coder_index )
ground_truth_column_name += "_" + desired_column_suffix
# create predicted column name.
predicted_column_name = Reliability_Names.FIELD_NAME_PREFIX_CODER
predicted_column_name += str( predicted_coder_index )
predicted_column_name += "_" + desired_column_suffix
# ==> processing
# lookup Reliability_Names for selected label(s)
reliability_names_qs = Reliability_Names.objects.filter( label__in = label_in_list )
print( "Found " + str( reliability_names_qs.count() ) + " rows with label in " + str( label_in_list ) )
# reset all lists and values.
ground_truth_column_value = ""
ground_truth_value = -1
ground_truth_list = []
predicted_column_value = ""
predicted_value = -1
predicted_list = []
# loop over records to build ground_truth and predicted value lists
# where 1 = value matching desired value in multi-value categorical
# variable and 0 = any value other than the desired value.
for reliability_names_instance in reliability_names_qs:
# get detected flag from ground truth and predicted columns and add them to list.
# ==> ground truth
# get column value.
ground_truth_column_value = getattr( reliability_names_instance, ground_truth_column_name )
# does it match desired value?
if ( ground_truth_column_value == desired_value ):
# it does - True (or positive or 1!)!
ground_truth_value = 1
else:
# it does not - False (or negative or 0!)!
ground_truth_value = 0
#-- END check to see if current value matches desired value. --#
# add value to list.
ground_truth_list.append( ground_truth_value )
# ==> predicted
# get column value.
predicted_column_value = getattr( reliability_names_instance, predicted_column_name )
# does it match desired value?
if ( predicted_column_value == desired_value ):
# it does - True (or positive or 1!)!
predicted_value = 1
else:
# it does not - False (or negative or 0!)!
predicted_value = 0
#-- END check to see if current value matches desired value. --#
# add to predicted list.
predicted_list.append( predicted_value )
if ( debug_flag == True ):
print( "----> gt: " + str( ground_truth_column_value ) + " ( " + str( ground_truth_value ) + " ) - p: " + str( predicted_column_value ) + " ( " + str( predicted_value ) + " )" )
#-- END DEBUG --#
#-- END loop over Reliability_Names instances. --#
else:
print( "ERROR - you must specify a desired value." )
#-- END check to see if desired value passed in. --#
else:
print( "ERROR - you must provide the suffix of the column you want to examine." )
#-- END check to see if column name suffix passed in. --#
# package up and return lists.
lists_OUT[ "ground_truth" ] = ground_truth_list
lists_OUT[ "predicted" ] = predicted_list
return lists_OUT
#-- END function build_confusion_lists() --#
print( "Function build_confusion_lists() defined at " + str( datetime.datetime.now() ) )
For each person detected across the set of articles, look at whether the automated coder assigned the correct type.
First, build lists of ground truth and predicted values per person.
In [23]:
confusion_lists = build_confusion_lists( Reliability_Names.FIELD_NAME_SUFFIX_PERSON_TYPE,
Reliability_Names.PERSON_TYPE_AUTHOR )
ground_truth_list = confusion_lists.get( "ground_truth", None )
predicted_list = confusion_lists.get( "predicted", None )
print( "==> population values count: " + str( len( ground_truth_list ) ) )
print( "==> predicted values count: " + str( len( predicted_list ) ) )
print( "==> percentage agreement = " + str( StatsHelper.percentage_agreement( ground_truth_list, predicted_list ) ) )
In [ ]:
print( "==> population values: " + str( len( ground_truth_list ) ) )
list_name = "ACTUAL_VALUE_LIST"
string_list = map( str, ground_truth_list )
list_values = ", ".join( string_list )
print( list_name + " = [ " + list_values + " ]" )
print( "==> predicted values count: " + str( len( predicted_list ) ) )
list_name = "PREDICTED_VALUE_LIST"
string_list = map( str, predicted_list )
list_values = ", ".join( string_list )
print( list_name + " = [ " + list_values + " ]" )
In [16]:
confusion_helper = ConfusionMatrixHelper.populate_confusion_matrix( ground_truth_list, predicted_list )
print( str( confusion_helper ) )
For each person detected across the set of articles and classified by ground truth as a subject, look at whether the automated coder assigned the correct person type.
First, build lists of ground truth and predicted values per person.
In [24]:
# subjects = "mentioned"
confusion_lists = build_confusion_lists( Reliability_Names.FIELD_NAME_SUFFIX_PERSON_TYPE,
Reliability_Names.SUBJECT_TYPE_MENTIONED )
ground_truth_list = confusion_lists.get( "ground_truth", None )
predicted_list = confusion_lists.get( "predicted", None )
print( "==> population values count: " + str( len( ground_truth_list ) ) )
print( "==> predicted values count: " + str( len( predicted_list ) ) )
print( "==> percentage agreement = " + str( StatsHelper.percentage_agreement( ground_truth_list, predicted_list ) ) )
In [ ]:
print( "==> population values: " + str( len( ground_truth_list ) ) )
list_name = "ACTUAL_VALUE_LIST"
string_list = map( str, ground_truth_list )
list_values = ", ".join( string_list )
print( list_name + " = [ " + list_values + " ]" )
print( "==> predicted values count: " + str( len( predicted_list ) ) )
list_name = "PREDICTED_VALUE_LIST"
string_list = map( str, predicted_list )
list_values = ", ".join( string_list )
print( list_name + " = [ " + list_values + " ]" )
In [18]:
confusion_helper = ConfusionMatrixHelper.populate_confusion_matrix( ground_truth_list, predicted_list )
print( str( confusion_helper ) )
For each person detected across the set of articles and classified by ground truth as a source, look at whether the automated coder assigned the correct person type.
First, build lists of ground truth and predicted values per person.
In [25]:
# sources = "quoted"
confusion_lists = build_confusion_lists( Reliability_Names.FIELD_NAME_SUFFIX_PERSON_TYPE,
Reliability_Names.SUBJECT_TYPE_QUOTED )
ground_truth_list = confusion_lists.get( "ground_truth", None )
predicted_list = confusion_lists.get( "predicted", None )
print( "==> population values count: " + str( len( ground_truth_list ) ) )
print( "==> predicted values count: " + str( len( predicted_list ) ) )
print( "==> percentage agreement = " + str( StatsHelper.percentage_agreement( ground_truth_list, predicted_list ) ) )
In [ ]:
print( "==> population values: " + str( len( ground_truth_list ) ) )
list_name = "ACTUAL_VALUE_LIST"
string_list = map( str, ground_truth_list )
list_values = ", ".join( string_list )
print( list_name + " = [ " + list_values + " ]" )
print( "==> predicted values count: " + str( len( predicted_list ) ) )
list_name = "PREDICTED_VALUE_LIST"
string_list = map( str, predicted_list )
list_values = ", ".join( string_list )
print( list_name + " = [ " + list_values + " ]" )
In [26]:
confusion_helper = ConfusionMatrixHelper.populate_confusion_matrix( ground_truth_list,
predicted_list,
calc_type_IN = ConfusionMatrixHelper.CALC_TYPE_PANDAS_ML )
print( str( confusion_helper ) )
TODO:
Want a way to limit to disagreements where quoted? Might not - this is a start to assessing erroneous agreement. If yes, 1 < coding time < 4 hours.
Reliability_Names.person_type only has three values - "author", "subject", "source" - might need a row-level measure of "has_mention", "has_quote" to more readily capture rows where disagreement is over quoted-or-not.
DONE:
Use lists of population and predicted values to derive confusion matrix counts:
In [7]:
# loop over lists to derive counts
predicted_value = -1
ground_truth_value = -1
ground_truth_positive_count = 0
predicted_positive_count = 0
true_positive_count = 0
false_positive_count = 0
ground_truth_negative_count = 0
predicted_negative_count = 0
true_negative_count = 0
false_negative_count = 0
list_index = -1
for predicted_value in predicted_list:
# increment index and get associated item from ground_truth_list
list_index += 1
ground_truth_value = ground_truth_list[ list_index ]
# add to counts
# ==> ground truth
if ( ground_truth_value == 0 ):
# ground truth negative
ground_truth_negative_count += 1
# not zero - so 1 (or supports other integer values)
else:
# ground truth positive
ground_truth_positive_count += 1
#-- END check to see if positive or negative --#
if ( predicted_value == 0 ):
# predicted negative
predicted_negative_count += 1
# equal to ground_truth?
if ( predicted_value == ground_truth_value ):
# true negative
true_negative_count += 1
else:
# false negative
false_negative_count += 1
#-- END check to see if true or false --#
# not zero - so 1 (or supports other integer values)
else:
# predicted positive
predicted_positive_count += 1
# equal to ground_truth?
if ( predicted_value == ground_truth_value ):
# true positive
true_positive_count += 1
else:
# false positive
false_positive_count += 1
#-- END check to see if true or false --#
#-- END check to see if positive or negative --#
#-- END loop over list items. --#
print( "==> Predicted positives: " + str( predicted_positive_count ) + " ( " + str( ( true_positive_count + false_positive_count ) ) + " )" )
print( "==> Ground truth positives: " + str( ground_truth_positive_count ) + " ( " + str( ( true_positive_count + false_negative_count ) ) + " )" )
print( "==> True positives: " + str( true_positive_count ) )
print( "==> False positives: " + str( false_positive_count ) )
print( "==> Predicted negatives: " + str( predicted_negative_count ) + " ( " + str( ( true_negative_count + false_negative_count ) ) + " )" )
print( "==> Ground truth negatives: " + str( ground_truth_negative_count ) + " ( " + str( ( true_negative_count + false_positive_count ) ) + " )" )
print( "==> True negatives: " + str( true_negative_count ) )
print( "==> False negatives: " + str( false_negative_count ) )
print( "==> Precision (true positive/predicted positive): " + str( ( true_positive_count / predicted_positive_count ) ) )
print( "==> Recall (true positive/ground truth positive): " + str( ( true_positive_count / ground_truth_positive_count ) ) )
In [8]:
# try scikit-learn: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_recall_fscore_support.html
sklearn.metrics.precision_recall_fscore_support( ground_truth_list, predicted_list )
Out[8]:
In [ ]:
# scikit-learn confusion matrix
# http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html
conf_matrix = sklearn.metrics.confusion_matrix( ground_truth_list, predicted_list )
print( str( conf_matrix ) )
# get counts in variables
true_positive_count = conf_matrix[ 1 ][ 1 ]
false_positive_count = conf_matrix[ 0 ][ 1 ]
true_negative_count = conf_matrix[ 0 ][ 0 ]
false_negative_count = conf_matrix[ 1 ][ 0 ]
# and derive population and predicted counts
ground_truth_positive_count = true_positive_count + false_negative_count
predicted_positive_count = true_positive_count + false_positive_count
ground_truth_negative_count = true_negative_count + false_positive_count
predicted_negative_count = true_negative_count + false_negative_count
print( "==> Predicted positives: " + str( predicted_positive_count ) + " ( " + str( ( true_positive_count + false_positive_count ) ) + " )" )
print( "==> Ground truth positives: " + str( ground_truth_positive_count ) + " ( " + str( ( true_positive_count + false_negative_count ) ) + " )" )
print( "==> True positives: " + str( true_positive_count ) )
print( "==> False positives: " + str( false_positive_count ) )
print( "==> Predicted negatives: " + str( predicted_negative_count ) + " ( " + str( ( true_negative_count + false_negative_count ) ) + " )" )
print( "==> Ground truth negatives: " + str( ground_truth_negative_count ) + " ( " + str( ( true_negative_count + false_positive_count ) ) + " )" )
print( "==> True negatives: " + str( true_negative_count ) )
print( "==> False negatives: " + str( false_negative_count ) )
print( "==> Precision (true positive/predicted positive): " + str( ( true_positive_count / predicted_positive_count ) ) )
print( "==> Recall (true positive/ground truth positive): " + str( ( true_positive_count / ground_truth_positive_count ) ) )
In [ ]:
# pandas
# https://stackoverflow.com/questions/2148543/how-to-write-a-confusion-matrix-in-python
y_actu = pandas.Series( ground_truth_list, name='Actual')
y_pred = pandas.Series( predicted_list, name='Predicted')
df_confusion = pandas.crosstab(y_actu, y_pred)
print( str( df_confusion ) )
# get counts in variables
true_positive_count = df_confusion[ 1 ][ 1 ]
false_positive_count = df_confusion[ 1 ][ 0 ]
true_negative_count = df_confusion[ 0 ][ 0 ]
false_negative_count = df_confusion[ 0 ][ 1 ]
# and derive population and predicted counts
ground_truth_positive_count = true_positive_count + false_negative_count
predicted_positive_count = true_positive_count + false_positive_count
ground_truth_negative_count = true_negative_count + false_positive_count
predicted_negative_count = true_negative_count + false_negative_count
print( "==> Predicted positives: " + str( predicted_positive_count ) + " ( " + str( ( true_positive_count + false_positive_count ) ) + " )" )
print( "==> Ground truth positives: " + str( ground_truth_positive_count ) + " ( " + str( ( true_positive_count + false_negative_count ) ) + " )" )
print( "==> True positives: " + str( true_positive_count ) )
print( "==> False positives: " + str( false_positive_count ) )
print( "==> Predicted negatives: " + str( predicted_negative_count ) + " ( " + str( ( true_negative_count + false_negative_count ) ) + " )" )
print( "==> Ground truth negatives: " + str( ground_truth_negative_count ) + " ( " + str( ( true_negative_count + false_positive_count ) ) + " )" )
print( "==> True negatives: " + str( true_negative_count ) )
print( "==> False negatives: " + str( false_negative_count ) )
print( "==> Precision (true positive/predicted positive): " + str( ( true_positive_count / predicted_positive_count ) ) )
print( "==> Recall (true positive/ground truth positive): " + str( ( true_positive_count / ground_truth_positive_count ) ) )
Use confusion matrix to derive related metrics:
In [9]:
# build up confusion outputs from confusion matrix.
# assume we have the following set one way or another above.
#ground_truth_positive_count = 0
#predicted_positive_count = 0
#ground_truth_negative_count = 0
#predicted_negative_count = 0
#true_positive_count = 0
#false_positive_count = 0
#true_negative_count = 0
#false_negative_count = 0
# initialize output dictionary, then add base measures to confusion_outputs.
confusion_outputs = {}
confusion_outputs[ "population_positive" ] = ground_truth_positive_count
confusion_outputs[ "predicted_positive" ] = predicted_positive_count
confusion_outputs[ "population_negative" ] = ground_truth_negative_count
confusion_outputs[ "predicted_negative" ] = predicted_negative_count
confusion_outputs[ "true_positive" ] = true_positive_count
confusion_outputs[ "false_positive" ] = false_positive_count
confusion_outputs[ "true_negative" ] = true_negative_count
confusion_outputs[ "false_negative" ] = false_negative_count
print( "==> Confusion outputs:" )
DictHelper.print_dict( confusion_outputs )
In [10]:
# declare variables
precision = None
# ==> Positive predictive value (PPV), Precision
try:
precision = ( true_positive_count / predicted_positive_count )
except:
# error - None
precision = None
#-- END check to see if Exception. --#
confusion_outputs[ "precision" ] = precision
confusion_outputs[ "PPV" ] = precision
print( "==> Positive predictive value (PPV), Precision = " + str( precision ) )
print( "==> Confusion outputs:" )
DictHelper.print_dict( confusion_outputs )
In [11]:
# declare variables
recall = None
# ==> True positive rate (TPR), Recall, Sensitivity, probability of detection
try:
recall = ( true_positive_count / ground_truth_positive_count )
except:
# error - None
recall = None
#-- END check to see if Exception. --#
confusion_outputs[ "recall" ] = recall
confusion_outputs[ "TPR" ] = recall
print( "==> True positive rate (TPR), Recall = " + str( recall ) )
print( "==> Confusion outputs:" )
DictHelper.print_dict( confusion_outputs )
In [12]:
# declare variables
false_negative_rate = None
# ==> False negative rate (FNR), Miss rate
try:
false_negative_rate = ( false_negative_count / ground_truth_positive_count )
except:
# error - None
false_negative_rate = None
#-- END check to see if Exception. --#
confusion_outputs[ "false_negative_rate" ] = false_negative_rate
confusion_outputs[ "FNR" ] = false_negative_rate
print( "==> false negative rate (FNR) = " + str( false_negative_rate ) )
print( "==> Confusion outputs:" )
DictHelper.print_dict( confusion_outputs )
In [13]:
# declare variables
false_positive_rate = None
# ==> False positive rate (FPR), Fall-out
try:
false_positive_rate = ( false_positive_count / ground_truth_negative_count )
except:
# error - None
false_positive_rate = None
#-- END check to see if Exception. --#
confusion_outputs[ "false_positive_rate" ] = false_positive_rate
confusion_outputs[ "FPR" ] = false_positive_rate
print( "==> False positive rate (FPR), Fall-out = " + str( false_positive_rate ) )
print( "==> Confusion outputs:" )
DictHelper.print_dict( confusion_outputs )
In [14]:
# declare variables
true_negative_rate = None
# ==> True negative rate (TNR), Specificity (SPC)
try:
true_negative_rate = ( true_negative_count / ground_truth_negative_count )
except:
# error - None
true_negative_rate = None
#-- END check to see if Exception. --#
confusion_outputs[ "true_negative_rate" ] = true_negative_rate
confusion_outputs[ "TNR" ] = true_negative_rate
confusion_outputs[ "specificity" ] = true_negative_rate
confusion_outputs[ "SPC" ] = true_negative_rate
print( "==> True negative rate (TNR), Specificity (SPC) = " + str( true_negative_rate ) )
print( "==> Confusion outputs:" )
DictHelper.print_dict( confusion_outputs )
In [15]:
# declare variables
false_omission_rate = None
# ==> False omission rate (FOR) = Σ False negative/Σ Predicted condition negative
try:
false_omission_rate = ( false_negative_count / predicted_negative_count )
except:
# error - None
false_omission_rate = None
#-- END check to see if Exception. --#
confusion_outputs[ "false_omission_rate" ] = false_omission_rate
confusion_outputs[ "FOR" ] = false_omission_rate
print( "==> False omission rate (FOR) = " + str( false_omission_rate ) )
print( "==> Confusion outputs:" )
DictHelper.print_dict( confusion_outputs )
In [16]:
# declare variables
positive_likelihood_ratio = None
tpr = None
fpr = None
# ==> Positive likelihood ratio (LR+) = TPR/FPR
tpr = confusion_outputs.get( "TPR", None )
fpr = confusion_outputs.get( "FPR", None )
try:
positive_likelihood_ratio = ( tpr / fpr )
except:
# error - None
positive_likelihood_ratio = None
#-- END check to see if Exception. --#
confusion_outputs[ "positive_likelihood_ratio" ] = positive_likelihood_ratio
confusion_outputs[ "LR+" ] = positive_likelihood_ratio
print( "==> Positive likelihood ratio (LR+) = " + str( positive_likelihood_ratio ) )
print( "==> Confusion outputs:" )
DictHelper.print_dict( confusion_outputs )
In [17]:
# declare variables
negative_likelihood_ratio = None
fnr = None
tnr = None
# ==> Negative likelihood ratio (LR-) = FNR/TNR
fnr = confusion_outputs.get( "FNR", None )
tnr = confusion_outputs.get( "TNR", None )
try:
negative_likelihood_ratio = ( fnr / tnr )
except:
# error - None
negative_likelihood_ratio = None
#-- END check to see if Exception. --#
confusion_outputs[ "negative_likelihood_ratio" ] = negative_likelihood_ratio
confusion_outputs[ "LR-" ] = negative_likelihood_ratio
print( "==> Negative likelihood ratio (LR-) = " + str( negative_likelihood_ratio ) )
print( "==> Confusion outputs:" )
DictHelper.print_dict( confusion_outputs )
In [18]:
# declare variables
accuracy = None
total_population = None
# ==> Accuracy (ACC) = Σ True positive + Σ True negative/Σ Total population
total_population = true_positive_count + true_negative_count + false_positive_count + false_negative_count
try:
accuracy = ( ( true_positive_count + true_negative_count ) / total_population )
except:
# error - None
accuracy = None
#-- END check to see if Exception. --#
confusion_outputs[ "accuracy" ] = accuracy
confusion_outputs[ "ACC" ] = accuracy
confusion_outputs[ "total_population" ] = total_population
print( "==> Accuracy (ACC) = " + str( accuracy ) )
print( "==> Confusion outputs:" )
DictHelper.print_dict( confusion_outputs )
In [19]:
# declare variables
false_discovery_rate = None
# ==> False discovery rate (FDR), probability of false alarm = Σ False positive/Σ Predicted condition positive
try:
false_discovery_rate = ( false_positive_count / predicted_positive_count )
except:
# error - None
false_discovery_rate = None
#-- END check to see if Exception. --#
confusion_outputs[ "false_discovery_rate" ] = false_discovery_rate
confusion_outputs[ "FDR" ] = false_discovery_rate
print( "==> False discovery rate (FDR), probability of false alarm = " + str( false_discovery_rate ) )
print( "==> Confusion outputs:" )
DictHelper.print_dict( confusion_outputs )
In [20]:
# declare variables
negative_predictive_value = None
# ==> Negative predictive value (NPV) = Σ True negative/Σ Predicted condition negative
try:
negative_predictive_value = ( true_negative_count / predicted_negative_count )
except:
# error - None
negative_predictive_value = None
#-- END check to see if Exception. --#
confusion_outputs[ "negative_predictive_value" ] = negative_predictive_value
confusion_outputs[ "NPV" ] = negative_predictive_value
print( "==> Negative predictive value (NPV) = " + str( negative_predictive_value ) )
print( "==> Confusion outputs:" )
DictHelper.print_dict( confusion_outputs )
In [21]:
# declare variables
diagnostic_odds_ratio = None
lr_plus = None
lr_minus = None
# ==> Diagnostic odds ratio (DOR) = LR+/LR−
lr_plus = confusion_outputs.get( "LR+", None )
lr_minus = confusion_outputs.get( "LR-", None )
try:
diagnostic_odds_ratio = ( lr_plus / lr_minus )
except:
# error - None
diagnostic_odds_ratio = None
#-- END check to see if Exception. --#
confusion_outputs[ "diagnostic_odds_ratio" ] = diagnostic_odds_ratio
confusion_outputs[ "DOR" ] = diagnostic_odds_ratio
print( "==> Diagnostic odds ratio (DOR) = " + str( diagnostic_odds_ratio ) )
print( "==> Confusion outputs:" )
DictHelper.print_dict( confusion_outputs )
In [22]:
# declare variables
f1_score = None
recall = None
precision = None
# ==> F1 score = 2 / ( ( 1 / Recall ) + ( 1 / Precision ) )
recall = confusion_outputs.get( "recall", None )
precision = confusion_outputs.get( "precision", None )
try:
f1_score = ( 2 / ( ( 1 / recall ) + ( 1 / precision ) ) )
except:
# error - None
f1_score = None
#-- END check to see if Exception. --#
confusion_outputs[ "f1_score" ] = f1_score
print( "==> F1 score = " + str( f1_score ) )
print( "==> Confusion outputs:" )
DictHelper.print_dict( confusion_outputs )
In [23]:
# declare variables
matthews_correlation_coefficient = None
numerator = None
temp_math = None
denominator = None
# ==> Matthews correlation coefficient (MCC) = ( ( T P × T N ) − ( F P × F N ) ) / sqrt( ( T P + F P ) * ( T P + F N ) * ( T N + F P ) * ( T N + F N ) )
numerator = ( ( true_positive_count * true_negative_count ) - ( false_positive_count * false_negative_count ) )
temp_math = ( ( true_positive_count + false_positive_count ) * ( true_positive_count + false_negative_count ) * ( true_negative_count + false_positive_count ) * ( true_negative_count + false_negative_count ) )
denominator = math.sqrt( temp_math )
try:
matthews_correlation_coefficient = numerator / denominator
except:
# error - None
matthews_correlation_coefficient = None
#-- END check to see if Exception. --#
confusion_outputs[ "matthews_correlation_coefficient" ] = matthews_correlation_coefficient
confusion_outputs[ "MCC" ] = matthews_correlation_coefficient
print( "==> Matthews correlation coefficient (MCC) = " + str( matthews_correlation_coefficient ) )
print( "==> Confusion outputs:" )
DictHelper.print_dict( confusion_outputs )
In [24]:
# declare variables
informedness = None
tpr = None
tnr = None
# ==> Informedness or Bookmaker Informedness (BM) = TPR + TNR − 1
tpr = confusion_outputs.get( "TPR", None )
tnr = confusion_outputs.get( "TNR", None )
try:
informedness = tpr + tnr - 1
except:
# error - None
informedness = None
#-- END check to see if Exception. --#
confusion_outputs[ "informedness" ] = informedness
confusion_outputs[ "BM" ] = informedness
print( "==> Informedness or Bookmaker Informedness (BM) = " + str( informedness ) )
print( "==> Confusion outputs:" )
DictHelper.print_dict( confusion_outputs )
In [25]:
# declare variables
markedness = None
ppv = None
npv = None
# ==> Markedness (MK) = PPV + NPV − 1
ppv = confusion_outputs.get( "PPV", None )
npv = confusion_outputs.get( "NPV", None )
try:
markedness = ppv + npv - 1
except:
# error - None
markedness = None
#-- END check to see if Exception. --#
confusion_outputs[ "markedness" ] = markedness
confusion_outputs[ "MK" ] = markedness
print( "==> Markedness (MK) = " + str( markedness ) )
print( "==> Confusion outputs:" )
DictHelper.print_dict( confusion_outputs,
prefix_IN = "EXPECTED_OUTPUT_MAP[ \"",
separator_IN = "\" ] = ",
suffix_IN = None )
In [27]:
print( "==> population values: " + str( len( ground_truth_list ) ) )
list_name = "ACTUAL_VALUE_LIST"
string_list = map( str, ground_truth_list )
list_values = ", ".join( string_list )
print( list_name + " = [ " + list_values + " ]" )
print( "==> predicted values count: " + str( len( predicted_list ) ) )
list_name = "PREDICTED_VALUE_LIST"
string_list = map( str, predicted_list )
list_values = ", ".join( string_list )
print( list_name + " = [ " + list_values + " ]" )
In [26]:
confusion_helper = ConfusionMatrixHelper.populate_confusion_matrix( ground_truth_list, predicted_list )
print( str( confusion_helper ) )