2016.12.09 - work log - prelim_month - no single names

Setup

Setup - Imports


In [2]:
import datetime

print( "packages imported at " + str( datetime.datetime.now() ) )


packages imported at 2016-12-10 16:09:59.139729

In [3]:
%pwd


Out[3]:
'/home/jonathanmorgan/work/django/research/work/phd_work'

Setup - Initialize Django

First, initialize my dev Django project so I can run code in this notebook that references my Django models and talks to the database using my project's settings.


In [4]:
%run django_init.py


django initialized at 2016-12-10 16:10:04.149805

Data cleanup

Remove single-name reliability data

Next, remove all reliability data that refers to a single name using the "View reliability name information" screen:

To start, enter the following in the fields there:

  • Label: - "prelim_month"
  • Coders to compare (1 through ==>): - 2
  • Reliability names filter type: - Select "Lookup"
  • [Lookup] - Person has first name, no other name parts. - CHECK the checkbox

You should see lots of entries where coders detected people who were mentioned only by their first name.
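The "Person has first name, no other name parts" lookup can be pictured as a simple predicate over a detected name's parts. The sketch below is a minimal, hypothetical illustration: the `NameParts` class and its field names are assumptions, not the actual `context_text` models, and the real filter may well be implemented differently (for example, as a database query).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NameParts:
    # Hypothetical container for one detected name; field names are assumed.
    first_name: Optional[str] = None
    middle_name: Optional[str] = None
    last_name: Optional[str] = None
    name_prefix: Optional[str] = None
    name_suffix: Optional[str] = None
    nickname: Optional[str] = None


def is_set(value: Optional[str]) -> bool:
    # A part counts as present only if it is non-empty after stripping whitespace.
    return value is not None and value.strip() != ""


def has_only_first_name(parts: NameParts) -> bool:
    """True when first_name is present and every other name part is empty."""
    other_parts = [parts.middle_name, parts.last_name, parts.name_prefix,
                   parts.name_suffix, parts.nickname]
    return is_set(parts.first_name) and not any(is_set(p) for p in other_parts)


# Illustrative data only.
names = [NameParts(first_name="Christopher"),
         NameParts(first_name="Kate", last_name="Gosselin"),
         NameParts(last_name="Gosselin")]
print([n.first_name for n in names if has_only_first_name(n)])  # ['Christopher']
```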

Single-name data assessment

Need to look at each instance where a person has a single name part.

Most are probably instances where the computer correctly detected the name part, but there isn't enough of the name to match it to a person, so the human coding protocol directed coders not to capture the name fragment.

However, there might be some where a coder made a mistake and captured only a name part for a person whose full name was in the story. To check, click the article ID link in the "Article ID" column. It will take you to a view of the article that includes all the people who coded it, with each detected mention or quotation displayed next to the paragraph where the person was first detected.

So for each instance of a single name part:

  • click on the article ID link in the row to go to the article and check to see if there is a person whose name the fragment is part of ( https://research.local/research/context/text/article/article_data/view_with_text/ ).

    • If there is a person with a full name to which the name fragment is a reference, check to see if the coder has data for the full person.

      • if not, merge:

        • go to the disagreement view page: https://research.local/research/context/analysis/reliability/names/disagreement/view
        • Configure:

          • Label: - "prelim_month"
          • Coders to compare (1 through ==>): - 2
          • Reliability names filter type: - Select "Lookup"
          • [Lookup] - Associated Article IDs (comma-delimited): - Enter the ID of the article the coding belonged to.
        • this will bring up all coding for the article whose ID you entered.

        • In the "select" column, click the checkbox in the row where there is a single name part that needs to be merged.
        • In the "merge INTO" column, click the checkbox in the row with the full name for that person.
        • In "Reliability Names Action", choose "Merge Coding --> FROM 1 SELECTED / INTO 1"
        • Click "Do Action" button.
    • Remove the Reliability_Names row with the name fragment from reliability data.
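The per-instance check above boils down to a small decision. A minimal sketch, assuming you already have the full names detected in the article and the set of full names the coder in question captured; `find_full_name_candidates()` and `plan_for_fragment()` are hypothetical helpers, the names in the example are made up, and the "review" outcome for a fragment that fits several full names is an assumption not covered by the steps above.

```python
from typing import Iterable, List, Set


def find_full_name_candidates(fragment: str, article_names: Iterable[str]) -> List[str]:
    """Names in the article with more than one part where fragment is one whole part."""
    target = fragment.strip().casefold()
    candidates = []
    for full_name in article_names:
        parts = [part.casefold() for part in full_name.split()]
        # Whole-token match only, so "Jan" does not match "Jane Doe".
        if len(parts) > 1 and target in parts:
            candidates.append(full_name)
    return candidates


def plan_for_fragment(fragment: str, article_names: Iterable[str],
                      coder_names: Set[str]) -> str:
    """Return 'merge', 'remove', or 'review' for one single-name row."""
    candidates = find_full_name_candidates(fragment, article_names)
    if not candidates:
        return "remove"   # no full name to attach it to: just remove the row
    if len(candidates) > 1:
        return "review"   # ambiguous (assumption): decide by reading the article
    if candidates[0] in coder_names:
        return "remove"   # coder already has data for the full person
    return "merge"        # merge fragment INTO the full-name row, then remove it


print(plan_for_fragment("Jane", ["Jane Doe", "John Roe"], {"John Roe"}))  # merge
```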

Delete single-name data

To get rid of all matching rows in this list, click the checkbox in the "select" column next to each one you want to delete (sorry, no "select all" just yet), choose "Delete selected" from the "Reliability names action:" field at the top of the list, then click the "Do action" button.

Reliability_Names records Removed:

| ID | Article | Article_Data | Article_Subject |
|----|---------|--------------|-----------------|
| 8618 | Article 20739 | Article_Data 2980 | 11006 (AS) - Christopher ( id = 2776; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Christopher |

| Article | Article_Data | <str( Article_Subject )> |

Coding to look into

Coding decisions to look at more closely:

Match for just first name? - TODO

First name "Kate" was matched to "Kate Gosselin" but "Gosselin" is nowhere in the article.


In [15]:
# imports
from context_text.article_coding.manual_coding.manual_article_coder import ManualArticleCoder
from context_text.models import Article_Subject

# declare variables
my_coder = None
subject = None
person_name = ""
person_instance = None
person_match_list = None

# create ManualArticleCoder and Article_Subject instance
my_coder = ManualArticleCoder()
subject = Article_Subject()

# set up look up of "Kate"
person_name = "Kate"

# lookup person - returns person and confidence score inside
#    Article_Person descendent instance.
subject = my_coder.lookup_person( subject, 
                                  person_name,
                                  create_if_no_match_IN = False,
                                  update_person_IN = False )

# retrieve information from Article_Person
person_instance = subject.person
person_match_list = subject.person_match_list  # list of Person instances

if ( person_instance is not None ):

    # Found person for "Kate":
    print( "Found person for \"" + str( person_name ) + "\": " + str( person_instance ) )
    
else:
    
    # no person instance found.
    print( "No person instance found for \"" + str( person_name ) + "\"" )
    
#-- END check to see if person_instance --#

if ( ( person_match_list is not None ) and ( len( person_match_list ) > 0 ) ):

    print( "match list:" )
    for match_person in person_match_list:
        
        # output each person for now.
        print( "- " + str( match_person ) )
        
    #-- END loop over person_match_list --#

else:
    
    print( "match list is None or empty." )

#-- END check to see if there is a match list.


Found person for "Kate": 1608 - Gosselin, Kate ( Zondervan )
match list is None or empty.

Is there only one person with first name Kate?


In [17]:
# imports
from context_text.models import Person

# declare variables
name_string = ""
test_person_qs = None
test_person = None
test_person_count = -1

# do a lookup, filtering on first name of "Kate".
name_string = "Kate"
test_person_qs = Person.objects.filter( first_name = name_string )

# got anything at all?
if ( test_person_qs is not None ):

    # process results - count...
    test_person_count = test_person_qs.count()
    print( "Found " + str( test_person_count ) + " matches:" )

    # ...and loop.
    for test_person in test_person_qs:

        # output person
        print( "- " + str( test_person ) )
        
    #-- END loop over matching persons. --#
    
#-- END check to see if None --#


Found 1 matches:
- 1608 - Gosselin, Kate ( Zondervan )

So, if there is a single match in the database for a single name part (first name or last name), but the matched person has more name parts than the one I looked up, I don't want to call that a match unless there is some sort of associated ID that also matches.
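A minimal sketch of that rule, assuming a hypothetical `Candidate` stand-in for a `Person` row with a set of associated IDs (`external_ids` is an illustrative name; the actual "associated ID" is not specified here). It only compares first and last names, and treating zero or several matches as no match is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Candidate:
    # Hypothetical stand-in for a Person row returned by the lookup.
    person_id: int
    first_name: Optional[str]
    last_name: Optional[str]
    external_ids: Set[str] = field(default_factory=set)  # assumed "associated IDs"


def accept_single_part_match(matches: List[Candidate],
                             subject_ids: Set[str]) -> Optional[Candidate]:
    """Decide whether a lookup on one name part should count as a match.

    Zero or several matches: no match (assumption). One match with only the
    looked-up name part: accept. One match with more name parts: accept only
    if it shares at least one associated ID with the subject.
    """
    if len(matches) != 1:
        return None
    match = matches[0]
    name_parts = [part for part in (match.first_name, match.last_name) if part]
    if len(name_parts) <= 1:
        return match
    if match.external_ids & subject_ids:
        return match
    return None


kate = Candidate(1608, "Kate", "Gosselin")
print(accept_single_part_match([kate], subject_ids=set()))  # None
```

Under this rule, the "Kate" lookup above would no longer resolve to person 1608 unless the subject and that person shared an ID.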

Debugging

No mentions in Article_Data view page? - FIXED

For all subjects here:

There are no mentions displayed, even though the counts next to each show there are mentions.


In [4]:
from context_text.models import Article_Data

# lookup the article data in question.
article_data = Article_Data.objects.get( pk = 2980 )

# ha.  So, I had a misnamed variable - didn't need to do any more debugging than this.