prelim_month_human - confusion matrix
old file name: 2017.10.21 - work log - prelim_month_human - confusion matrix
Confusion matrix for data where coder 1 is ground truth, coder 2 is uncorrected human coding.
In [2]:
# set the label we'll be looking at throughout
current_label = "prelim_month_human"
In [3]:
import datetime
import math
import pandas
import pandas_ml
import sklearn
import sklearn.metrics
import six
import statsmodels
import statsmodels.api
print( "packages imported at " + str( datetime.datetime.now() ) )
In [4]:
%pwd
Out[4]:
First, initialize my dev django project so I can run code in this notebook that references my django models and can talk to the database using my project's settings.
You need to have installed your virtualenv (the one that has django in it) as a Jupyter kernel, then selected that kernel for this notebook.
In [5]:
%run ../django_init.py
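The django_init.py script is not reproduced in this log. A minimal sketch of what such an init script typically contains, assuming the standard Django setup pattern (the settings module name below is a placeholder, not necessarily this project's real one):
# hypothetical sketch of a django_init.py-style script.
# point Django at the project's settings module, then initialize the app
# registry so models can be imported from notebook cells.
import os
import django
os.environ.setdefault( "DJANGO_SETTINGS_MODULE", "my_project.settings" )  # placeholder settings module name
django.setup()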
Import any sourcenet or context_analysis models or classes.
In [6]:
# python_utilities
from python_utilities.analysis.statistics.confusion_matrix_helper import ConfusionMatrixHelper
from python_utilities.analysis.statistics.stats_helper import StatsHelper
from python_utilities.dictionaries.dict_helper import DictHelper
# context_analysis models.
from context_analysis.models import Reliability_Names
print( "sourcenet and context_analysis packages imported at " + str( datetime.datetime.now() ) )
Write functions here to do the math, so that we can reuse those tools below.
A basic confusion matrix ( https://en.wikipedia.org/wiki/Confusion_matrix ) contains counts of true positives, true negatives, false positives, and false negatives for a given binary or boolean (yes/no) classification decision you are asking someone or something to make.
To create a confusion matrix, you need two associated vectors of classification decisions (0s and 1s): one that contains ground truth, and one that contains the values predicted by whatever coder you are testing. For each associated pair of values: if both are 1, count a true positive; if ground truth is 1 and predicted is 0, a false negative; if ground truth is 0 and predicted is 1, a false positive; and if both are 0, a true negative.
Once you have your basic confusion matrix, the counts of true positives, true negatives, false positives, and false negatives can then be used to calculate a set of scores for assessing the quality of a predictive model. These scores include "precision" and "recall", "accuracy", the "F1 score" (the harmonic mean of precision and recall), and the "diagnostic odds ratio", among many others.
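For concreteness, here is a minimal sketch of how those counts and a few of the scores can be computed directly from two such 0/1 vectors, using standard textbook formulas. This is independent of the pandas_ml and ConfusionMatrixHelper tools used below, and it does not guard against zero denominators.
# minimal sketch: derive confusion counts and standard scores from two
# parallel lists of 0/1 classification decisions.
def sketch_confusion_scores( ground_truth_IN, predicted_IN ):
    pair_list = list( zip( ground_truth_IN, predicted_IN ) )
    true_positives = sum( 1 for gt, p in pair_list if ( gt == 1 ) and ( p == 1 ) )
    true_negatives = sum( 1 for gt, p in pair_list if ( gt == 0 ) and ( p == 0 ) )
    false_positives = sum( 1 for gt, p in pair_list if ( gt == 0 ) and ( p == 1 ) )
    false_negatives = sum( 1 for gt, p in pair_list if ( gt == 1 ) and ( p == 0 ) )
    precision = float( true_positives ) / ( true_positives + false_positives )
    recall = float( true_positives ) / ( true_positives + false_negatives )
    accuracy = float( true_positives + true_negatives ) / len( pair_list )
    f1_score = 2 * precision * recall / ( precision + recall )
    # diagnostic odds ratio = ( TP / FN ) / ( FP / TN ) = ( TP * TN ) / ( FP * FN )
    diagnostic_odds_ratio = ( float( true_positives ) * true_negatives ) / ( float( false_positives ) * false_negatives )
    return { "TP" : true_positives, "TN" : true_negatives, "FP" : false_positives, "FN" : false_negatives, "precision" : precision, "recall" : recall, "accuracy" : accuracy, "F1" : f1_score, "DOR" : diagnostic_odds_ratio }
#-- END function sketch_confusion_scores() --#
# usage (once the lists below are built): sketch_confusion_scores( ground_truth_list, predicted_list )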
For each person detected across the set of articles, look at whether coder 2 (the uncorrected human coding) correctly detected the person, independent of eventual lookup or person type.
First, build lists of ground truth and predicted values per person.
In [7]:
# declare variables
reliability_names_label = None
label_in_list = []
reliability_names_qs = None
ground_truth_coder_index = 1
predicted_coder_index = 2
# processing
column_name = ""
predicted_value = -1
predicted_list = []
ground_truth_value = -1
ground_truth_list = []
reliability_names_instance = None
# set label
reliability_names_label = current_label # "prelim_month_human"
# lookup Reliability_Names for selected label
label_in_list.append( reliability_names_label )
reliability_names_qs = Reliability_Names.objects.filter( label__in = label_in_list )
print( "Found " + str( reliability_names_qs.count() ) + " rows with label in " + str( label_in_list ) )
# loop over records
predicted_value = -1
predicted_list = []
ground_truth_value = -1
ground_truth_list = []
ground_truth_positive_count = 0
predicted_positive_count = 0
true_positive_count = 0
false_positive_count = 0
ground_truth_negative_count = 0
predicted_negative_count = 0
true_negative_count = 0
false_negative_count = 0
for reliability_names_instance in reliability_names_qs:
    # get detected flag from ground truth and predicted columns and add them to list.
    # ==> ground truth
    column_name = Reliability_Names.FIELD_NAME_PREFIX_CODER
    column_name += str( ground_truth_coder_index )
    column_name += "_" + Reliability_Names.FIELD_NAME_SUFFIX_DETECTED
    ground_truth_value = getattr( reliability_names_instance, column_name )
    ground_truth_list.append( ground_truth_value )
    # ==> predicted
    column_name = Reliability_Names.FIELD_NAME_PREFIX_CODER
    column_name += str( predicted_coder_index )
    column_name += "_" + Reliability_Names.FIELD_NAME_SUFFIX_DETECTED
    predicted_value = getattr( reliability_names_instance, column_name )
    predicted_list.append( predicted_value )
#-- END loop over Reliability_Names instances. --#
print( "==> population values count: " + str( len( ground_truth_list ) ) )
print( "==> predicted values count: " + str( len( predicted_list ) ) )
print( "==> percentage agreement = " + str( StatsHelper.percentage_agreement( ground_truth_list, predicted_list ) ) )
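For reference, "percentage agreement" here is simple agreement: the share of rows where the two coders' values match. A minimal sketch of that calculation, assuming StatsHelper.percentage_agreement() implements the usual matches-divided-by-total formula (not verified here):
# minimal sketch of simple percentage agreement between two parallel value lists.
# assumes equal-length, non-empty lists.
def simple_percentage_agreement( list_1_IN, list_2_IN ):
    match_count = sum( 1 for value_1, value_2 in zip( list_1_IN, list_2_IN ) if value_1 == value_2 )
    return float( match_count ) / len( list_1_IN )
#-- END function simple_percentage_agreement() --#
print( "==> simple percentage agreement = " + str( simple_percentage_agreement( ground_truth_list, predicted_list ) ) )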
In [8]:
print( "==> population values count: " + str( len( ground_truth_list ) ) )
list_name = "ACTUAL_VALUE_LIST"
string_list = map( str, ground_truth_list )
list_values = ", ".join( string_list )
print( list_name + " = [ " + list_values + " ]" )
print( "==> predicted values count: " + str( len( predicted_list ) ) )
list_name = "PREDICTED_VALUE_LIST"
string_list = map( str, predicted_list )
list_values = ", ".join( string_list )
print( list_name + " = [ " + list_values + " ]" )
In [9]:
confusion_matrix = pandas_ml.ConfusionMatrix( ground_truth_list, predicted_list )
print("Confusion matrix:\n%s" % confusion_matrix)
confusion_matrix.print_stats()
stats_dict = confusion_matrix.stats()
print( str( stats_dict ) )
print( str( stats_dict[ 'TPR' ] ) )
# get counts in variables
true_positive_count = confusion_matrix.TP
false_positive_count = confusion_matrix.FP
true_negative_count = confusion_matrix.TN
false_negative_count = confusion_matrix.FN
# and derive population and predicted counts
ground_truth_positive_count = true_positive_count + false_negative_count
predicted_positive_count = true_positive_count + false_positive_count
ground_truth_negative_count = true_negative_count + false_positive_count
predicted_negative_count = true_negative_count + false_negative_count
print( "==> Predicted positives: " + str( predicted_positive_count ) + " ( " + str( ( true_positive_count + false_positive_count ) ) + " )" )
print( "==> Ground truth positives: " + str( ground_truth_positive_count ) + " ( " + str( ( true_positive_count + false_negative_count ) ) + " )" )
print( "==> True positives: " + str( true_positive_count ) )
print( "==> False positives: " + str( false_positive_count ) )
print( "==> Predicted negatives: " + str( predicted_negative_count ) + " ( " + str( ( true_negative_count + false_negative_count ) ) + " )" )
print( "==> Ground truth negatives: " + str( ground_truth_negative_count ) + " ( " + str( ( true_negative_count + false_positive_count ) ) + " )" )
print( "==> True negatives: " + str( true_negative_count ) )
print( "==> False negatives: " + str( false_negative_count ) )
print( "==> Precision (true positive/predicted positive): " + str( ( true_positive_count / predicted_positive_count ) ) )
print( "==> Recall (true positive/ground truth positive): " + str( ( true_positive_count / ground_truth_positive_count ) ) )
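As a cross-check, the same counts and scores can also be computed with sklearn.metrics, which is imported at the top of this notebook but not otherwise used here. A minimal sketch, assuming both lists hold binary 0/1 values:
# cross-check with sklearn.metrics (binary 0/1 labels assumed in both lists).
sklearn_matrix = sklearn.metrics.confusion_matrix( ground_truth_list, predicted_list )
print( "sklearn confusion matrix (rows = ground truth, columns = predicted):\n" + str( sklearn_matrix ) )
print( "==> sklearn precision: " + str( sklearn.metrics.precision_score( ground_truth_list, predicted_list ) ) )
print( "==> sklearn recall: " + str( sklearn.metrics.recall_score( ground_truth_list, predicted_list ) ) )
print( "==> sklearn accuracy: " + str( sklearn.metrics.accuracy_score( ground_truth_list, predicted_list ) ) )
print( "==> sklearn F1: " + str( sklearn.metrics.f1_score( ground_truth_list, predicted_list ) ) )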
In [10]:
confusion_helper = ConfusionMatrixHelper.populate_confusion_matrix( ground_truth_list, predicted_list )
print( str( confusion_helper ) )
For each person detected across the set of articles, look at whether coder 2 (the uncorrected human coding) correctly looked up the person (so, compare person IDs).
First, build lists of ground truth and predicted values per person.
In [11]:
# declare variables
reliability_names_label = None
label_in_list = []
reliability_names_qs = None
ground_truth_coder_index = 1
predicted_coder_index = 2
# processing
column_name = ""
predicted_value = -1
predicted_list = []
ground_truth_value = -1
ground_truth_list = []
reliability_names_instance = None
# set label
reliability_names_label = current_label # "prelim_month_human"
# lookup Reliability_Names for selected label
label_in_list.append( reliability_names_label )
reliability_names_qs = Reliability_Names.objects.filter( label__in = label_in_list )
print( "Found " + str( reliability_names_qs.count() ) + " rows with label in " + str( label_in_list ) )
# loop over records
predicted_value = -1
predicted_list = []
ground_truth_value = -1
ground_truth_list = []
ground_truth_positive_count = 0
predicted_positive_count = 0
true_positive_count = 0
false_positive_count = 0
ground_truth_negative_count = 0
predicted_negative_count = 0
true_negative_count = 0
false_negative_count = 0
for reliability_names_instance in reliability_names_qs:
    # get person_id from ground truth and predicted columns and add them to list.
    # ==> ground truth
    column_name = Reliability_Names.FIELD_NAME_PREFIX_CODER
    column_name += str( ground_truth_coder_index )
    column_name += "_" + Reliability_Names.FIELD_NAME_SUFFIX_PERSON_ID
    ground_truth_value = getattr( reliability_names_instance, column_name )
    ground_truth_list.append( ground_truth_value )
    # ==> predicted
    column_name = Reliability_Names.FIELD_NAME_PREFIX_CODER
    column_name += str( predicted_coder_index )
    column_name += "_" + Reliability_Names.FIELD_NAME_SUFFIX_PERSON_ID
    predicted_value = getattr( reliability_names_instance, column_name )
    predicted_list.append( predicted_value )
#-- END loop over Reliability_Names instances. --#
print( "==> population values count: " + str( len( ground_truth_list ) ) )
print( "==> predicted values count: " + str( len( predicted_list ) ) )
print( "==> percentage agreement = " + str( StatsHelper.percentage_agreement( ground_truth_list, predicted_list ) ) )
In [12]:
print( "==> population values count: " + str( len( ground_truth_list ) ) )
list_name = "ACTUAL_VALUE_LIST"
string_list = map( str, ground_truth_list )
list_values = ", ".join( string_list )
print( list_name + " = [ " + list_values + " ]" )
print( "==> predicted values count: " + str( len( predicted_list ) ) )
list_name = "PREDICTED_VALUE_LIST"
string_list = map( str, predicted_list )
list_values = ", ".join( string_list )
print( list_name + " = [ " + list_values + " ]" )
In [13]:
confusion_helper = ConfusionMatrixHelper.populate_confusion_matrix( ground_truth_list, predicted_list )
print( str( confusion_helper ) )
In [14]:
def build_confusion_lists( column_name_suffix_IN,
                           desired_value_IN,
                           label_list_IN = [ "prelim_month_human", ],
                           ground_truth_coder_index_IN = 1,
                           predicted_coder_index_IN = 2,
                           debug_flag_IN = False ):
    '''
    Accepts the suffix of the column name of interest and the desired value. Also accepts an
    optional label list, the indexes of the ground truth and predicted coder users, and a
    debug flag. Uses these values to loop over records whose label matches one of the labels
    in the list passed in. For each record, checks whether the ground truth and predicted
    values in the specified column match the desired value. If so (positive), 1 is stored
    for the row; if not (negative), 0 is stored for the row.
    Returns a dictionary with the value lists inside: the ground truth values list mapped to
    key "ground_truth" and the predicted values list mapped to key "predicted".
    '''
    # return reference
    lists_OUT = {}
    # declare variables
    reliability_names_label = None
    label_in_list = []
    reliability_names_qs = None
    ground_truth_coder_index = -1
    predicted_coder_index = -1
    # processing
    debug_flag = False
    desired_column_suffix = None
    desired_value = None
    ground_truth_column_name = None
    ground_truth_column_value = None
    ground_truth_value = -1
    ground_truth_list = []
    predicted_column_name = None
    predicted_column_value = None
    predicted_value = -1
    predicted_list = []
    reliability_names_instance = None
    # got required values?
    # column name suffix?
    if ( column_name_suffix_IN is not None ):
        # desired value?
        if ( desired_value_IN is not None ):
            # ==> initialize
            desired_column_suffix = column_name_suffix_IN
            desired_value = desired_value_IN
            label_in_list = label_list_IN
            ground_truth_coder_index = ground_truth_coder_index_IN
            predicted_coder_index = predicted_coder_index_IN
            debug_flag = debug_flag_IN
            # create ground truth column name
            ground_truth_column_name = Reliability_Names.FIELD_NAME_PREFIX_CODER
            ground_truth_column_name += str( ground_truth_coder_index )
            ground_truth_column_name += "_" + desired_column_suffix
            # create predicted column name.
            predicted_column_name = Reliability_Names.FIELD_NAME_PREFIX_CODER
            predicted_column_name += str( predicted_coder_index )
            predicted_column_name += "_" + desired_column_suffix
            # ==> processing
            # lookup Reliability_Names for selected label(s)
            reliability_names_qs = Reliability_Names.objects.filter( label__in = label_in_list )
            print( "Found " + str( reliability_names_qs.count() ) + " rows with label in " + str( label_in_list ) )
            # reset all lists and values.
            ground_truth_column_value = ""
            ground_truth_value = -1
            ground_truth_list = []
            predicted_column_value = ""
            predicted_value = -1
            predicted_list = []
            # loop over records to build ground_truth and predicted value lists
            # where 1 = value matching desired value in multi-value categorical
            # variable and 0 = any value other than the desired value.
            for reliability_names_instance in reliability_names_qs:
                # get values from ground truth and predicted columns, convert to 0/1, and add them to lists.
                # ==> ground truth
                # get column value.
                ground_truth_column_value = getattr( reliability_names_instance, ground_truth_column_name )
                # does it match desired value?
                if ( ground_truth_column_value == desired_value ):
                    # it does - True (or positive or 1!)!
                    ground_truth_value = 1
                else:
                    # it does not - False (or negative or 0!)!
                    ground_truth_value = 0
                #-- END check to see if current value matches desired value. --#
                # add value to list.
                ground_truth_list.append( ground_truth_value )
                # ==> predicted
                # get column value.
                predicted_column_value = getattr( reliability_names_instance, predicted_column_name )
                # does it match desired value?
                if ( predicted_column_value == desired_value ):
                    # it does - True (or positive or 1!)!
                    predicted_value = 1
                else:
                    # it does not - False (or negative or 0!)!
                    predicted_value = 0
                #-- END check to see if current value matches desired value. --#
                # add to predicted list.
                predicted_list.append( predicted_value )
                if ( debug_flag == True ):
                    print( "----> gt: " + str( ground_truth_column_value ) + " ( " + str( ground_truth_value ) + " ) - p: " + str( predicted_column_value ) + " ( " + str( predicted_value ) + " )" )
                #-- END DEBUG --#
            #-- END loop over Reliability_Names instances. --#
        else:
            print( "ERROR - you must specify a desired value." )
        #-- END check to see if desired value passed in. --#
    else:
        print( "ERROR - you must provide the suffix of the column you want to examine." )
    #-- END check to see if column name suffix passed in. --#
    # package up and return lists.
    lists_OUT[ "ground_truth" ] = ground_truth_list
    lists_OUT[ "predicted" ] = predicted_list
    return lists_OUT
#-- END function build_confusion_lists() --#
print( "Function build_confusion_lists() defined at " + str( datetime.datetime.now() ) )
For each person detected across the set of articles, look at whether coder 2 (the uncorrected human coding) assigned the correct person type, starting with authors.
First, build lists of ground truth and predicted values per person.
In [15]:
confusion_lists = build_confusion_lists( Reliability_Names.FIELD_NAME_SUFFIX_PERSON_TYPE,
Reliability_Names.PERSON_TYPE_AUTHOR )
ground_truth_list = confusion_lists.get( "ground_truth", None )
predicted_list = confusion_lists.get( "predicted", None )
print( "==> population values count: " + str( len( ground_truth_list ) ) )
print( "==> predicted values count: " + str( len( predicted_list ) ) )
print( "==> percentage agreement = " + str( StatsHelper.percentage_agreement( ground_truth_list, predicted_list ) ) )
In [16]:
print( "==> population values count: " + str( len( ground_truth_list ) ) )
list_name = "ACTUAL_VALUE_LIST"
string_list = map( str, ground_truth_list )
list_values = ", ".join( string_list )
print( list_name + " = [ " + list_values + " ]" )
print( "==> predicted values count: " + str( len( predicted_list ) ) )
list_name = "PREDICTED_VALUE_LIST"
string_list = map( str, predicted_list )
list_values = ", ".join( string_list )
print( list_name + " = [ " + list_values + " ]" )
In [17]:
confusion_helper = ConfusionMatrixHelper.populate_confusion_matrix( ground_truth_list, predicted_list )
print( str( confusion_helper ) )
For each person detected across the set of articles and classified by ground truth as a subject (mentioned), look at whether coder 2 (the uncorrected human coding) assigned the correct person type.
First, build lists of ground truth and predicted values per person.
In [18]:
# subjects = "mentioned"
confusion_lists = build_confusion_lists( Reliability_Names.FIELD_NAME_SUFFIX_PERSON_TYPE,
Reliability_Names.SUBJECT_TYPE_MENTIONED )
ground_truth_list = confusion_lists.get( "ground_truth", None )
predicted_list = confusion_lists.get( "predicted", None )
print( "==> population values count: " + str( len( ground_truth_list ) ) )
print( "==> predicted values count: " + str( len( predicted_list ) ) )
print( "==> percentage agreement = " + str( StatsHelper.percentage_agreement( ground_truth_list, predicted_list ) ) )
In [19]:
print( "==> population values count: " + str( len( ground_truth_list ) ) )
list_name = "ACTUAL_VALUE_LIST"
string_list = map( str, ground_truth_list )
list_values = ", ".join( string_list )
print( list_name + " = [ " + list_values + " ]" )
print( "==> predicted values count: " + str( len( predicted_list ) ) )
list_name = "PREDICTED_VALUE_LIST"
string_list = map( str, predicted_list )
list_values = ", ".join( string_list )
print( list_name + " = [ " + list_values + " ]" )
In [20]:
confusion_helper = ConfusionMatrixHelper.populate_confusion_matrix( ground_truth_list, predicted_list )
print( str( confusion_helper ) )
For each person detected across the set of articles and classified by ground truth as a source (quoted), look at whether coder 2 (the uncorrected human coding) assigned the correct person type.
First, build lists of ground truth and predicted values per person.
In [21]:
# sources = "quoted"
confusion_lists = build_confusion_lists( Reliability_Names.FIELD_NAME_SUFFIX_PERSON_TYPE,
Reliability_Names.SUBJECT_TYPE_QUOTED )
ground_truth_list = confusion_lists.get( "ground_truth", None )
predicted_list = confusion_lists.get( "predicted", None )
print( "==> population values count: " + str( len( ground_truth_list ) ) )
print( "==> predicted values count: " + str( len( predicted_list ) ) )
print( "==> percentage agreement = " + str( StatsHelper.percentage_agreement( ground_truth_list, predicted_list ) ) )
In [22]:
print( "==> population values count: " + str( len( ground_truth_list ) ) )
list_name = "ACTUAL_VALUE_LIST"
string_list = map( str, ground_truth_list )
list_values = ", ".join( string_list )
print( list_name + " = [ " + list_values + " ]" )
print( "==> predicted values count: " + str( len( predicted_list ) ) )
list_name = "PREDICTED_VALUE_LIST"
string_list = map( str, predicted_list )
list_values = ", ".join( string_list )
print( list_name + " = [ " + list_values + " ]" )
In [23]:
confusion_helper = ConfusionMatrixHelper.populate_confusion_matrix( ground_truth_list,
predicted_list,
calc_type_IN = ConfusionMatrixHelper.CALC_TYPE_PANDAS_ML )
print( str( confusion_helper ) )