2016.12.09 - work log - prelim_month - no single names
In [2]:
import datetime
print( "packages imported at " + str( datetime.datetime.now() ) )
In [3]:
%pwd
Out[3]:
First, initialize my dev Django project so I can run code in this notebook that references my Django models and talks to the database using my project's settings.
In [4]:
%run django_init.py
Next, use the "View reliability name information" screen to remove all reliability data that refers to a person by a single name part:
To start, enter the following in the fields there:
You should see lots of entries where coders detected people who were mentioned only by their first name.
I need to look at each instance where a person is identified by a single name part.
Most are probably instances where the computer correctly detected the name part, but there isn't enough of the name to match it to a person, so the human coding protocol directed coders not to capture the name fragment.
However, there might be some where a coder made a mistake and captured only a name part for a person whose full name was in the story. To check, click the link in the "Article ID" column. It takes you to a view of the article that includes the coding from everyone who coded it, with each detected mention or quotation displayed next to the paragraph where the person was first detected.
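As a rough illustration of what "a single name part" means for this cleanup, here is a minimal sketch of the filter. It assumes each reliability name record has been reduced to a plain dict with a hypothetical `"name"` key (the real records are Django model instances), and it treats a name as a single part when it has exactly one whitespace-separated token. The records in the usage lines are made-up examples, not real data.

```python
def has_single_name_part(name):
    """Return True if the name contains exactly one whitespace-separated part."""
    return len(name.split()) == 1


def single_name_records(records):
    """Keep only records whose (hypothetical) "name" value is a single name part."""
    return [record for record in records if has_single_name_part(record["name"])]


# usage with made-up example records
records = [
    {"id": 1, "name": "Kate"},
    {"id": 2, "name": "Kate Gosselin"},
    {"id": 3, "name": "  Gosselin "},
]
print([record["id"] for record in single_name_records(records)])  # → [1, 3]
```

Splitting on whitespace means stray leading or trailing spaces don't turn a lone name into a "multi-part" one.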
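The manual check above boils down to "does this fragment belong to a longer name somewhere in the story?" A crude way to pre-screen for that is sketched below. It assumes the article text is available as a plain string, and it uses a capitalization heuristic (runs of two or more capitalized words) as a stand-in for a real name detector; this is not how the article view itself works.

```python
import re

# two or more capitalized words separated by single spaces (rough full-name proxy)
FULL_NAME_PATTERN = re.compile(r"\b[A-Z][a-z'-]+(?: [A-Z][a-z'-]+)+\b")


def longer_names_containing(fragment, article_text):
    """Return capitalized word runs in the text that include the fragment as one part."""
    found = []
    for match in FULL_NAME_PATTERN.finditer(article_text):
        candidate = match.group(0)
        # keep the run only if the fragment is one of its parts, without duplicates
        if fragment in candidate.split() and candidate not in found:
            found.append(candidate)
    return found


text = "Kate Gosselin spoke Tuesday. Later, Kate said she was tired."
print(longer_names_containing("Kate", text))  # → ['Kate Gosselin']
```

Sentence-initial words ("Then Kate ...") will produce false positives, so this only narrows down which articles to open; the human check is still the deciding step.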
So for each instance of a single name part:
Click the article ID link in the row to go to the article, and check whether there is a person whose full name the fragment is part of ( https://research.local/research/context/text/article/article_data/view_with_text/ ).
If the fragment refers to a person whose full name is in the article, check whether the coder has data for that full person.
If not, merge:
Configure:
This will bring up all coding for the article whose ID you entered.
Remove the Reliability_Names row with the name fragment from the reliability data.
To delete rows from this list, click the checkbox in the "select" column next to each one you want to delete (sorry, no "select all" just yet), choose "Delete selected" from the "Reliability names action:" field at the top of the list, then click the "Do action" button.
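The per-row procedure above can be summarized as a small decision function. This is a sketch only: the function name and its three inputs (the fragment, the full names found in the article, and the names this coder captured for the article) are hypothetical stand-ins for what you read off the screens, and the return values just name the manual steps.

```python
def fragment_action(fragment, full_names_in_article, coder_names):
    """Suggest the manual step(s) for one single-name-part reliability row.

    fragment: the single name part the coder captured, e.g. "Kate".
    full_names_in_article: full names that appear in the article text.
    coder_names: names this coder captured for the same article.
    """
    # full names in the article that the fragment is one part of
    owners = [name for name in full_names_in_article if fragment in name.split()]
    if not owners:
        # no full name to attach it to: the protocol says not to capture it
        return "remove"
    if any(owner in coder_names for owner in owners):
        # coder already has the full person, so the fragment row is redundant
        return "remove"
    # coder captured only the fragment of a person whose full name is present
    return "merge, then remove"


print(fragment_action("Kate", ["Kate Gosselin"], ["Kate"]))  # → merge, then remove
```

Every path ends in removing the fragment row; the only branch is whether a merge has to happen first.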
Reliability_Names records removed:
First name "Kate" was matched to "Kate Gosselin", but "Gosselin" appears nowhere in the article.
Article Data 2980, article 20739 - 11003 (AS) - Gosselin, Kate ( id = 1608; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Kate
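The "Kate" case suggests a quick sanity check for a match: every part of the matched person's name should show up somewhere in the article. A minimal sketch follows, assuming the matched name and the article text are plain strings; commas are treated as separators so that a "Last, First" form like the one in the record above also works.

```python
import re


def missing_name_parts(matched_full_name, article_text):
    """Return the parts of a matched person's name that never appear in the article."""
    missing = []
    # treat commas as separators so "Gosselin, Kate" splits into two parts
    for part in matched_full_name.replace(",", " ").split():
        # whole-word search so "Kate" does not match inside a longer word
        if not re.search(r"\b" + re.escape(part) + r"\b", article_text):
            missing.append(part)
    return missing


print(missing_name_parts("Gosselin, Kate", "Kate talked about her show."))  # → ['Gosselin']
```

A non-empty result is a hint that the match was made on the fragment alone, like the record above.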
In [15]:
# imports
from context_text.article_coding.manual_coding.manual_article_coder import ManualArticleCoder
from context_text.models import Article_Subject
# declare variables
my_coder = None
subject = None
person_name = ""
person_instance = None
person_match_list = None
# create ManualArticleCoder and Article_Subject instance
my_coder = ManualArticleCoder()
subject = Article_Subject()
# set up look up of "Kate"
person_name = "Kate"
# lookup person - returns person and confidence score inside
# Article_Person descendant instance.
subject = my_coder.lookup_person( subject,
                                  person_name,
                                  create_if_no_match_IN = False,
                                  update_person_IN = False )
# retrieve information from Article_Person
person_instance = subject.person
person_match_list = subject.person_match_list # list of Person instances
if ( person_instance is not None ):
    # Found person for "Kate":
    print( "Found person for \"" + str( person_name ) + "\": " + str( person_instance ) )
else:
    # no person instance found.
    print( "No person instance found for \"" + str( person_name ) + "\"" )
#-- END check to see if person_instance --#
if ( ( person_match_list is not None ) and ( len( person_match_list ) > 0 ) ):
    print( "match list:" )
    for match_person in person_match_list:
        # output each person for now.
        print( "- " + str( match_person ) )
    #-- END loop over person_match_list --#
else:
    print( "match list is None or empty." )
#-- END check to see if there is a match list. --#
Is there only one person with the first name "Kate"?
In [17]:
# imports
from context_text.models import Person
# declare variables
name_string = ""
test_person_qs = None
test_person = None
test_person_count = -1
# do a lookup, filtering on first name of "Kate".
name_string = "Kate"
test_person_qs = Person.objects.filter( first_name = name_string )
# got anything at all?
if ( test_person_qs is not None ):
    # process results - count...
    test_person_count = test_person_qs.count()
    print( "Found " + str( test_person_count ) + " matches:" )
    # ...and loop.
    for test_person in test_person_qs:
        # output person
        print( "- " + str( test_person ) )
    #-- END loop over matching persons. --#
#-- END check to see if None --#
So, if there is a single match in the database for a single name part (first name or last name), but the matched person's name has more parts than that one, I don't want to call it a match unless some associated ID also matches.
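A minimal sketch of that rule, under stated assumptions: candidates are hypothetical `(full_name, external_id)` pairs for stored people who share the name part, and `source_id` is whatever associated ID (if any) came with the name in the text. The rule above only covers the single-match case; treating zero or several candidates as "no match" is an added assumption here.

```python
def accept_single_part_match(name_part, candidates, source_id=None):
    """Return the accepted full name, or None if the single name part should not match.

    name_part: the lone name part found in the text, e.g. "Kate".
    candidates: hypothetical (full_name, external_id) pairs for stored persons
        that share this name part.
    source_id: an associated ID that came with the name in the text, if any.
    """
    if len(candidates) != 1:
        # assumption: zero or several candidates is ambiguous, so no match
        return None
    full_name, external_id = candidates[0]
    if full_name.split() == [name_part]:
        # the stored person really is just this one name part
        return full_name
    # stored person has more name than we saw: require a matching associated ID
    if source_id is not None and source_id == external_id:
        return full_name
    return None


print(accept_single_part_match("Kate", [("Kate Gosselin", "example-id-1")]))  # → None
```

Requiring the ID only when the stored name is longer keeps genuine single-name people matchable while blocking the "Kate" → "Kate Gosselin" kind of match.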
For all subjects here:
No mentions are displayed, even though the count next to each subject shows that there are mentions.
In [4]:
from context_text.models import Article_Data
# lookup the article data in question.
article_data = Article_Data.objects.get( pk = 2980 )
# ha. So, I had a misnamed variable - didn't need to do any more debugging than this.