2017.06.01 - work log - prelim_month - remove single names

Setup

Setup - Imports


In [ ]:
import datetime
import json
import six

print( "packages imported at " + str( datetime.datetime.now() ) )

In [ ]:
%pwd

Setup - Initialize Django

First, initialize my dev django project, so I can run code in this notebook that references my django models and can talk to the database using my project's settings.

You need to have installed your django virtualenv as a Jupyter kernel and selected that kernel for this notebook.


In [ ]:
%run django_init.py

Import any sourcenet or context_analysis models or classes.


In [ ]:
# django imports
from django.contrib.auth.models import User

# sourcenet shared
from context_text.shared.person_details import PersonDetails

# sourcenet models.
from context_text.models import Article
from context_text.models import Article_Data
from context_text.models import Article_Subject
from context_text.models import Person
from context_text.shared.context_text_base import ContextTextBase
from context_text.tests.models.test_Article_Data_model import Article_Data_Copy_Tester

# sourcenet article_coding
from context_text.article_coding.article_coding import ArticleCoder
from context_text.article_coding.manual_coding.manual_article_coder import ManualArticleCoder

# context_analysis models.
from context_analysis.models import Reliability_Names
from context_analysis.reliability.reliability_names_builder import ReliabilityNamesBuilder

print( "sourcenet and context_analysis packages imported at " + str( datetime.datetime.now() ) )

Data cleanup

Remove single-name reliability data

Next, remove all reliability data that refers to a single name using the "View reliability name information" screen:

To start, enter the following in the fields there:

  • Label: "prelim_month"
  • Coders to compare (1 through ==>): 2
  • Reliability names filter type: Select "Lookup"
  • [Lookup] - Person has first name, no other name parts.: CHECK the checkbox

You should see lots of entries where coders detected people who were mentioned only by their first name.
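The "Person has first name, no other name parts" lookup can be pictured as a simple predicate over a person's name parts. This is a minimal sketch, not the actual context_analysis filter: the field names in NAME_PART_FIELDS are hypothetical stand-ins for whatever the Person model really uses.

```python
# Hypothetical name-part fields; the real Person model may name them differently.
NAME_PART_FIELDS = ("name_prefix", "first_name", "middle_name", "last_name", "name_suffix")


def has_first_name_only(name_parts: dict) -> bool:
    """Return True when first_name is the only non-blank name part."""
    present = {
        field
        for field in NAME_PART_FIELDS
        if (name_parts.get(field) or "").strip()
    }
    return present == {"first_name"}


print(has_first_name_only({"first_name": "Christopher"}))                   # True
print(has_first_name_only({"first_name": "Ivette", "last_name": "Reyes"}))  # False
```

Blank or None values are treated as missing, so a record with an empty last name still counts as single-name.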

Single-name data assessment

Need to look at each instance where a person has a single name part.

Most are probably instances where the computer correctly detected the name part, but there isn't enough of the name to match it to a person, so the human coding protocol directed coders not to capture the name fragment.

However, there might be some where a coder made a mistake and captured only a name part for a person whose full name was in the story. To check, click the link in the "Article ID" column. It will take you to a view of the article that includes all the coders who coded it, with each detected mention or quotation displayed next to the paragraph where the person was first detected.
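A rough way to pre-screen a fragment before opening the article is to check whether it appears as a whole word inside any full name coded for that article. This sketch assumes you already have those full names as plain strings; full_name_candidates() is a hypothetical helper, not part of sourcenet or context_analysis.

```python
import re


def full_name_candidates(fragment: str, full_names: list[str]) -> list[str]:
    """Return full names that contain `fragment` as a whole word, ignoring case.

    Names identical to the fragment are skipped, since they are not "fuller".
    """
    wanted = fragment.strip().lower()
    pattern = re.compile(r"\b" + re.escape(fragment.strip()) + r"\b", re.IGNORECASE)
    return [
        name
        for name in full_names
        if name.strip().lower() != wanted and pattern.search(name)
    ]


print(full_name_candidates("Tassell", ["Leslie E. Tassell", "Ivette Reyes"]))
# ['Leslie E. Tassell']
```

The word boundaries keep short fragments such as "Al" from matching inside longer names such as "Alyssa"; a match is only a hint, and the article view is still the place to confirm it.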

So for each instance of a single name part:

  • click the article ID link in the row to go to the article and check whether there is a person whose name the fragment is part of ( http://research.local/research/context/text/article/article_data/view_with_text/ ).

    • If the fragment refers to a person whose full name appears in the article, check whether the human coder has data for that person.

      • if the human coder has data for that person, merge:

        • go to the disagreement view page: http://research.local/research/context/analysis/reliability/names/disagreement/view
        • Configure:

          • Label: - "prelim_month"
          • Coders to compare (1 through ==>): - 2
          • Reliability names filter type: - Select "Lookup"
          • [Lookup] - Associated Article IDs (comma-delimited): - Enter the ID of the article the coding belonged to.
        • this will bring up all coding for the article whose ID you entered.

        • In the "select" column, click the checkbox in the row where there is a single name part that needs to be merged.
        • In the "merge INTO" column, click the checkbox in the row with the full name for that person.
        • In "Reliability Names Action", choose "Merge Coding --> FROM 1 SELECTED / INTO 1"
        • Click "Do Action" button.
      • if the human coder did not detect the person or made some other kind of error:

        • use the "Tool - copy Article_Data to user ground_truth" to create a copy of the human coder's Article_Data and assign it to coder "ground_truth".
        • if this is the first time you've used the "ground_truth" user, log into the django admin ( http://research.local/research/admin/ ) and:

          • set or reset the "ground_truth" user's password.
          • give it "staff status".
        • log in to the coding tool ( http://research.local/research/context/text/article/code/ ) as the "ground_truth" user and fix the coding for the article in question.

        • save.
        • rebuild Reliability_Names for just that article.

          • remove old Reliability_Names for that article ( Delete existing Reliability_Names ). Make sure to specify both label and Article ID, so you don't delete more than you intend.
          • re-run Reliability_Names creation for the article ( Make new Reliability_Names ). Specify:

            • Article ID list (just put the ID of the article you want to reprocess in the list).
            • label: make sure this is the same as the label of the rest of your Reliability_Names records ("prelim_month").
            • Tag list: If you want to make even more certain that you don't do something unexpected, also specify the article tags that make up your current data set, so if you accidentally specify the ID of an article not in your data set, it won't process. Current tag is "grp_month".
            • Coders to assign to which index in the Reliability_Names record, and in what priority. You can assign multiple coders to a given index, for example when multiple coders each coded a subset of a data set and you want their combined coding to be used as "coder 1" or "coder 2". See the cell for an example.
            • Automated coder type: You can specify the particular automated coding type you want for automated coder, to filter out coding done by other automated methods. See the cell for an example for "OpenCalais v2".
          • Then re-fix any other problems with the article. For example, any single-name records for the article that you already removed will pop back into the list of single-name records.

        • if needed, clean up/merge the two Reliability_Names records for the person.

    • Remove the Reliability_Names row with the name fragment from reliability data.
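The per-instance protocol above boils down to a small decision tree. The sketch below only restates it as an ordered checklist; steps_for_fragment() and its step wording are illustrative, and the actual work still happens in the web screens.

```python
def steps_for_fragment(full_name_in_article: bool, human_coded_full_person: bool) -> list[str]:
    """Return the ordered cleanup steps for one single-name Reliability_Names row."""
    steps = []
    if full_name_in_article:
        if human_coded_full_person:
            # Human coder got it right: fold the fragment into the full-name row.
            steps.append("merge the fragment row INTO the full-name row")
        else:
            # Human coder missed the person or made another error.
            steps += [
                "copy the human coder's Article_Data to user ground_truth",
                "fix the ground_truth coding in the coding tool",
                "delete and rebuild Reliability_Names for the article (label + article ID)",
                "merge the person's Reliability_Names rows if needed",
            ]
    # Every path ends by removing the fragment row from the reliability data.
    steps.append("remove the fragment's Reliability_Names row")
    return steps


for step in steps_for_fragment(full_name_in_article=True, human_coded_full_person=False):
    print("-", step)
```

When no full name is present in the article, the coder's answer is irrelevant and the only step is removal.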

Resolve single-name data

To remove all matching records in this list, click the checkbox in the "select" column next to each one you want to delete (sorry, no "select all" just yet), choose "Delete selected" from the "Reliability names action:" field at the top of the list, then click the "Do action" button.
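When deleting or rebuilding Reliability_Names outside the screens, the earlier caution about always specifying both a label and an article ID can be enforced in code. This is a sketch under assumptions: build_delete_filter() is hypothetical, and the filter keys label and article_id__in assume Reliability_Names has a label field and an article foreign key, which may not match the real model.

```python
def build_delete_filter(label: str, article_ids) -> dict:
    """Build filter kwargs for a targeted Reliability_Names delete.

    Refuses to build a filter without both a label and at least one article ID,
    so a typo cannot widen the delete to a whole data set.
    """
    label = (label or "").strip()
    if not label:
        raise ValueError("a label is required")
    ids = sorted({int(article_id) for article_id in (article_ids or [])})
    if not ids:
        raise ValueError("at least one article ID is required")
    # Assumed field names; adjust to the real Reliability_Names model.
    return {"label": label, "article_id__in": ids}


kwargs = build_delete_filter("prelim_month", ["23169"])
print(kwargs)  # {'label': 'prelim_month', 'article_id__in': [23169]}
# With Django loaded, this would then be used roughly as:
# Reliability_Names.objects.filter(**kwargs).delete()
```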

Single-name Reliability_Names records removed

Table of Reliability_Names records removed because of single names. The table is kept here for reference, but all of these records were moved to the Reliability_Names_Evaluation table in django:

ID Article Article_Data Article_Subject Type
8618 Article 20739 Article_Data 2980 11006 (AS) - Christopher ( id = 2776; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Christopher CORRECT
8705 Article 20843 Article_Data 3000 11102 (AS) - Brock ( id = 2798; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Brock CORRECT
9163 Article 20912 Article_Data 3015 11147 (AS) - Slate ( id = 2801; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Slate CORRECT
9243 Article 20936 Article_Data 3002 11110 (AS) - Christine ( id = 2800; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Christine CORRECT
9506 Article 21049 Article_Data 3034 11232 (AS) - Reyes ( id = 2809; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Reyes CORRECT
9584 Article 21080 Article_Data 3037 11244 (AS) - Ben ( id = 2811; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Ben CORRECT
9594 Article 21080 Article_Data 3037 11249 (AS) - Carman ( id = 2814; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Carman CORRECT
9583 Article 21080 Article_Data 3037 11252 (AS) - Culter ( id = 2816; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Culter CORRECT
9590 Article 21080 Article_Data 3037 11243 (AS) - Emma ( id = 2810; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Emma CORRECT
9595 Article 21080 Article_Data 3037 11250 (AS) - Isabel ( id = 2815; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Isabel CORRECT
9592 Article 21080 Article_Data 3037 11245 (AS) - Tarina ( id = 2812; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Tarina CORRECT
9671 Article 21109 Article_Data 3045 11289 (AS) - Pat ( id = 2818; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Pat CORRECT
9681 Article 21112 Article_Data 3038 11255 (AS) - Obama ( id = 842; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Obama CORRECT
9687 Article 21113 Article_Data 3033 11225 (AS) - Steve ( id = 2806; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Steve CORRECT
9688 Article 21113 Article_Data 3033 11227 (AS) - Jay ( id = 2807; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Jay CORRECT
9684 Article 21113 Article_Data 3033 11228 (AS) - Jesse ( id = 2808; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Jesse CORRECT
9696 Article 21117 Article_Data 3049 8511 (AS) - Mary ( id = 1912; capture_method = None ) (mentioned; individual) ==> name: Mary CORRECT
9707 Article 21121 Article_Data 3048 11306 (AS) - Jesus ( id = 1451; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Jesus CORRECT
9690 Article 21116 Article_Data 3044 11288 (AS) - More ( id = 2817; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: More ERROR
9823 Article 21190 Article_Data 1641 5423 (AS) - Bill ( id = 855; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Bill CORRECT
10076 Article 21287 Article_Data 1635 5396 (AS) - Vernon ( id = 847; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Vernon CORRECT
10422 Article 21435 Article_Data 1651 5460 (AS) - Joshua ( id = 869; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Joshua CORRECT
7956 Article 21509 Article_Data 1660 5498 (AS) - Jaidon ( id = 875; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Jaidon CORRECT
7958 Article 21509 Article_Data 1660 5500 (AS) - Kaidon ( id = 877; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Kaidon CORRECT
7959 Article 21509 Article_Data 1660 5502 (AS) - Rushing ( id = 878; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Rushing CORRECT
8064 Article 21569 Article_Data 1666 5534 (AS) - Betty ( id = 885; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Betty CORRECT
8662 Article 21719 Article_Data 1706 5692 (AS) - Al ( id = 934; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Al CORRECT
8689 Article 21781 Article_Data 1726 5779 (AS) - Benjamin ( id = 961; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Benjamin CORRECT
8769 Article 21813 Article_Data 1727 5783 (AS) - Jaidon ( id = 875; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Jaidon CORRECT
8771 Article 21813 Article_Data 1727 5786 (AS) - Kaidon ( id = 877; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Kaidon CORRECT
8767 Article 21813 Article_Data 1727 5784 (AS) - Kyanie ( id = 775; capture_method = OpenCalais_REST_API ) (quoted; individual) ==> name: Kyanie CORRECT
8278 Article 21827 Article_Data 1721 5753 (AS) - Schultz ( id = 752; capture_method = OpenCalais_REST_API ) (quoted; individual) ==> name: Schultz CORRECT
9013 Article 21886 Article_Data 3060 11386 (AS) - Dan ( id = 2824; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Dan CORRECT
9010 Article 21886 Article_Data 3060 11387 (AS) - Tom ( id = 2825; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Tom CORRECT
9043 Article 21898 Article_Data 2556 9006 (AS) - Dave ( id = 2178; capture_method = None ) (quoted; individual) ==> name: Dave CORRECT
9064 Article 21903 Article_Data 1746 5895 (AS) - Daniel ( id = 1000; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Daniel CORRECT
9067 Article 21903 Article_Data 1746 5892 (AS) - Patsy ( id = 998; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Patsy CORRECT
9150 Article 21931 Article_Data 1750 5912 (AS) - Christ ( id = 1006; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Christ CORRECT
9424 Article 22034 Article_Data 3076 11457 (AS) - Ken ( id = 2840; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Ken CORRECT
9573 Article 22099 Article_Data 3071 11440 (AS) - Abigail ( id = 2835; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Abigail CORRECT
9572 Article 22099 Article_Data 3071 11439 (AS) - Sonneveldt ( id = 2834; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Sonneveldt CORRECT
9578 Article 22100 Article_Data 3067 11424 (AS) - Don ( id = 2830; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Don CORRECT
9791 Article 22199 Article_Data 3080 11468 (AS) - Marcia ( id = 2842; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Marcia CORRECT
9799 Article 22200 Article_Data 3084 11486 (AS) - Bryan ( id = 2845; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Bryan CORRECT
9992 Article 22281 Article_Data 3083 11483 (AS) - Tassell ( id = 2844; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Tassell MERGED
10044 Article 22302 Article_Data 3086 11494 (AS) - Greg ( id = 2847; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Greg CORRECT
10082 Article 22313 Article_Data 3104 11569 (AS) - Noonday ( id = 2852; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Noonday CORRECT
10084 Article 22313 Article_Data 3104 11576 (AS) - Tecumseh ( id = 2853; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Tecumseh CORRECT
8546 Article 22566 Article_Data 3133 11705 (AS) - Obama ( id = 842; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Obama CORRECT
8160 Article 22625 Article_Data 3147 11757 (AS) - Laura ( id = 2864; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Laura CORRECT
8341 Article 22681 Article_Data 3151 11770 (AS) - Aaron ( id = 2865; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Aaron CORRECT
10464 Article 22690 Article_Data 2666 9493 (AS) - Obama ( id = 842; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Obama ==> organization: United States CORRECT
8435 Article 22714 Article_Data 3164 11831 (AS) - Caleb ( id = 2873; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Caleb CORRECT
8515 Article 22747 Article_Data 3161 11815 (AS) - Corey ( id = 2870; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Corey CORRECT
8566 Article 22765 Article_Data 3158 11806 (AS) - Coopersville ( id = 2869; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Coopersville ERROR
8565 Article 22765 Article_Data 3158 11803 (AS) - Dave ( id = 2868; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Dave CORRECT
9459 Article 22790 Article_Data 2685 9584 (AS) - Obama ( id = 842; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Obama ==> organization: United States CORRECT
8827 Article 22854 Article_Data 2672 9528 (AS) - Obama ( id = 842; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Obama ==> organization: United States CORRECT
8834 Article 22858 Article_Data 3177 11923 (AS) - Schwaraswak ( id = 2886; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Schwaraswak MISSPELLED
8882 Article 22869 Article_Data 3187 11980 (AS) - Olympian ( id = 2895; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Olympian SPORTS BRIEFS
8901 Article 22874 Article_Data 3182 11949 (AS) - Benthem, Amber ( id = 1929; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Amber LOOKUP ERROR
8903 Article 22874 Article_Data 3182 11950 (AS) - Alyssa ( id = 2889; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Alyssa CORRECT
8902 Article 22874 Article_Data 3182 11947 (AS) - Amanda ( id = 2888; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Amanda CORRECT
8944 Article 22887 Article_Data 3173 11888 (AS) - Ben ( id = 2811; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Ben CORRECT
8933 Article 22887 Article_Data 3173 11898 (AS) - Julie ( id = 2885; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Julie CORRECT
8932 Article 22887 Article_Data 3173 11890 (AS) - Alexis ( id = 2884; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Alexis CORRECT
9108 Article 22946 Article_Data 3179 11931 (AS) - Bob ( id = 2887; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Bob CORRECT
9210 Article 22970 Article_Data 3183 11955 (AS) - Bartholomew, Logan ( id = 2579; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Logan LOOKUP ERROR
9211 Article 22970 Article_Data 3183 11957 (AS) - Matt ( id = 2891; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Matt CORRECT
8143 Article 23055 Article_Data 3194 12014 (AS) - Lansing ( id = 2902; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Lansing ERROR
8323 Article 23065 Article_Data 3193 12006 (AS) - Eli ( id = 2899; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Eli CORRECT
8325 Article 23065 Article_Data 3193 12010 (AS) - Betty ( id = 885; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Betty PET( CHICKEN )
8324 Article 23065 Article_Data 3193 12009 (AS) - Mabel ( id = 2900; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Mabel PET( CHICKEN )
8329 Article 23065 Article_Data 3193 12011 (AS) - Violet ( id = 2901; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Violet PET( CHICKEN )
9617 Article 23139 Article_Data 3198 12037 (AS) - Bernice ( id = 2906; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Bernice CORRECT
9989 Article 23169 Article_Data 3195 MERGED - 12020 (AS) - Keller ( id = 2903; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Keller MERGED
9764 Article 23216 Article_Data 3211 12093 (AS) - Sue ( id = 2908; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Sue CORRECT
8109 Article 23223 Article_Data 2700 9639 (AS) - Satan ( id = 2518; capture_method = None ) (mentioned; individual) ==> name: Satan CORRECT
8112 Article 23223 Article_Data 3212 12096 (AS) - Linda ( id = 2911; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Linda QUOTED
8111 Article 23223 Article_Data 3212 12095 (AS) - Tristan ( id = 2910; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Tristan CORRECT
9841 Article 23243 Article_Data 3222 12149 (AS) - Ignacio ( id = 2916; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Ignacio CORRECT
9842 Article 23243 Article_Data 3222 12150 (AS) - Paulina ( id = 2917; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Paulina CORRECT
10017 Article 23313 Article_Data 3225 12163 (AS) - Jesus ( id = 1451; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Jesus CORRECT
10560 Article 23379 Article_Data 3232 12206 (AS) - Barbara ( id = 2924; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Barbara CORRECT
10561 Article 23379 Article_Data 3232 12205 (AS) - Van Tubbergen, Tyler ( id = 2089; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Tyler LOOKUP ERROR
10215 Article 23384 Article_Data 3231 12195 (AS) - Diana ( id = 2920; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Diana CHILD
10216 Article 23384 Article_Data 3231 12197 (AS) - Shakulu ( id = 2921; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Shakulu CHILD
10217 Article 23384 Article_Data 3231 12199 (AS) - Shabani ( id = 2922; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Shabani CHILD
10218 Article 23384 Article_Data 3231 12200 (AS) - Joana ( id = 2923; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Joana CHILD
10220 Article 23384 Article_Data 2726 9752 (AS) - Jesus ( id = 1451; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Jesus CORRECT
10220 Article 23384 Article_Data 3231 12196 (AS) - Jesus ( id = 1451; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Jesus CORRECT
10102 Article 23403 Article_Data 3237 12231 (AS) - Danielle ( id = 2928; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Danielle CORRECT
10103 Article 23403 Article_Data 3237 12233 (AS) - Jacob ( id = 2929; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Jacob CORRECT
10109 Article 23403 Article_Data 3237 12230 (AS) - Madyson ( id = 2927; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Madyson CORRECT
9078 Article 23449 Article_Data 3233 12213 (AS) - Howard ( id = 2763; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Howard CORRECT
9385 Article 23476 Article_Data 3240 12247 (AS) - Greg ( id = 2847; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Greg CORRECT
9376 Article 23476 Article_Data 3240 12248 (AS) - Cathy ( id = 2931; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Cathy CORRECT
9377 Article 23476 Article_Data 3240 12249 (AS) - Brandon ( id = 2932; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Brandon CORRECT
9373 Article 23476 Article_Data 3240 12243 (AS) - Chase ( id = 2863; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Chase CORRECT
10448 Article 23491 Article_Data 3249 12299 (AS) - Twinkle ( id = 2938; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Twinkle ERROR
10525 Article 23529 Article_Data 3247 12288 (AS) - Liz ( id = 2936; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Liz CORRECT
7951 Article 23555 Article_Data 3261 12351 (AS) - Samson ( id = 2941; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Samson SON
7961 Article 23559 Article_Data 3254 12315 (AS) - Saturn ( id = 2940; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Saturn ERROR
7968 Article 23562 Article_Data 3260 12347 (AS) - Samson ( id = 2941; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Samson SON
8120 Article 23631 Article_Data 3274 12404 (AS) - Madonna ( id = 2946; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Madonna ERROR
8119 Article 23631 Article_Data 3274 12405 (AS) - Davenport ( id = 2947; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Davenport ERROR
9730 Article 23663 Article_Data 3266 12375 (AS) - Stephanie ( id = 2942; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Stephanie SPOUSE
8312 Article 23699 Article_Data 3271 12390 (AS) - Ed ( id = 2943; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Ed FATHER
9323 Article 23804 Article_Data 3312 12611 (AS) - Marc ( id = 2969; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Marc HUSBAND
9321 Article 23804 Article_Data 3312 12612 (AS) - Anthony ( id = 2967; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Anthony GRANDCHILD
9319 Article 23804 Article_Data 3312 12609 (AS) - Angelina ( id = 2951; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Angelina GRANDCHILD
8981 Article 23921 Article_Data 3283 12444 (AS) - Saigon ( id = 2952; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Saigon ERROR
8185 Article 23974 Article_Data 3292 12492 (AS) - Smitty ( id = 2789; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Smitty ERROR
9153 Article 23982 Article_Data 3296 12515 (AS) - Matt ( id = 2891; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Matt NO LAST NAME (HOMELESS)
9183 Article 23988 Article_Data 3300 12529 (AS) - Barbara ( id = 2924; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Barbara WIFE
8683 Article 24082 Article_Data 3304 12552 (AS) - Kulmeet ( id = 2958; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Kulmeet WIFE
9470 Article 24111 Article_Data 3311 12591 (AS) - Jonathan ( id = 2964; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Jonathan SON
9469 Article 24111 Article_Data 3311 12588 (AS) - Stephen ( id = 2963; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Stephen SON
9908 Article 24132 Article_Data 3308 12566 (AS) - Erin ( id = 2961; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Erin CHILD - DAUGHTER
9911 Article 24132 Article_Data 3308 12563 (AS) - Robyn ( id = 2959; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Robyn CHILD - DAUGHTER
9910 Article 24132 Article_Data 3308 12567 (AS) - Straayer, Mason ( id = 2504; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Mason CHILD - SON, LOOKUP ERROR
8619 Article 20739 Article_Data 2980 11003 (AS) - Gosselin, Kate ( id = 1608; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Kate LOOKUP ERROR
9591 Article 21080 Article_Data 3037 11247 (AS) - O'Brien, Collin ( id = 2619; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Collin LOOKUP ERROR
10429 Article 21435 Article_Data 1651 5462 (AS) - Bielinski, Jamie ( id = 780; capture_method = OpenCalais_REST_API ) (mentioned; individual) ==> name: Jamie LOOKUP ERROR
8291 Article 21644 Article_Data 1681 5598 (AS) - Taylor, Helen ( id = 576; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Helen LOOKUP ERROR
8459 Article 21699 Article_Data 2537 8929 (AS) - Felske, Jon ( id = 188; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Felske ==> organization: Wyoming Public Schools; Godwin Heights Public Schools CORRECT
8666 Article 21719 Article_Data 1706 5695 (AS) - Vander Hart, Ginny ( id = 271; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Ginny LOOKUP ERROR
9770 Article 22194 Article_Data 3077 11462 (AS) - Bartholomew, Logan ( id = 2579; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Logan LOOKUP ERROR
9771 Article 22194 Article_Data 3077 11463 (AS) - O'Brien, Collin ( id = 2619; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Collin LOOKUP ERROR
9876 Article 23264 Article_Data 3216 12113 (AS) - Kowalczyk-Fulmer, Katie ( id = 704; capture_method = OpenCalais_REST_API ) (mentioned; individual) ==> name: Katie LOOKUP ERROR
9374 Article 23476 Article_Data 3240 12244 (AS) - Broaddus, Adrienne ( id = 785; capture_method = OpenCalais_REST_API ) (mentioned; individual) ==> name: Adrienne LOOKUP ERROR
8008 Article 23577 Article_Data 3270 12388 (AS) - Garcia, Juan ( id = 1627; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Juan LOOKUP ERROR
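To summarize the assessments in the table above, you could tally the tag at the end of each row. This sketch assumes rows are the plain-text lines as shown, that the tags in KNOWN_TAGS are the ones of interest (everything else, such as CHILD or WIFE, counts as OTHER), and that verbatim repeated rows should be counted once. The sample rows are abbreviated copies of rows from the table.

```python
from collections import Counter

# Tags checked in order, so "LOOKUP ERROR" wins over the plain "ERROR" suffix.
KNOWN_TAGS = ("LOOKUP ERROR", "ERROR", "CORRECT", "MERGED", "MISSPELLED")


def assessment_tag(row: str) -> str:
    """Return the known tag a row ends with, or "OTHER"."""
    row = row.rstrip()
    for tag in KNOWN_TAGS:
        if row.endswith(tag):
            return tag
    return "OTHER"


def tally_assessments(rows) -> Counter:
    """Count tags across rows, counting verbatim repeated rows only once."""
    unique_rows = dict.fromkeys(row.strip() for row in rows if row.strip())
    return Counter(assessment_tag(row) for row in unique_rows)


sample = """
8618 Article 20739 ... ==> name: Christopher CORRECT
8618 Article 20739 ... ==> name: Christopher CORRECT
9690 Article 21116 ... ==> name: More ERROR
8901 Article 22874 ... ==> name: Amber LOOKUP ERROR
8325 Article 23065 ... ==> name: Betty PET( CHICKEN )
""".splitlines()
print(tally_assessments(sample))
```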

In [ ]:
# folded this code into the Reliability_Names delete screen (context_analysis/views.py --> reliability_names_disagreement_view()).
'''
reliability_names_id = "7956"
article_id = "21509"
article_data_id = "1660"
article_subject = "5498 (AS) - Jaidon ( id = 875; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Jaidon"
    
markdown_string = "| "
markdown_string += reliability_names_id
markdown_string += " | Article ["
markdown_string += article_id
markdown_string += "](http://research.local/research/context/text/article/article_data/view_with_text/?article_id="
markdown_string += article_id
markdown_string += ") | Article_Data ["
markdown_string += article_data_id
markdown_string += "](http://research.local/research/context/text/article/article_data/view/?article_id="
markdown_string += article_id
markdown_string += "&article_data_id_select="
markdown_string += article_data_id
markdown_string += ") | "
markdown_string += article_subject
markdown_string += " |"

print( "Reliability_Names removal Markdown:\n" + markdown_string )
'''

Reliability_Names records merged

For some, needed to merge a single-name detection by OpenCalais with the full-name detection by ground_truth. These combine an OpenCalais error (did not detect the full name) with a lookup error (did not look up the right person because part of the name was missing). One or more duplicate rows were still deleted afterward.

ID FROM ID INTO Article Article_Data Article_Subject
9506 9507 Article 21049 FROM 3034 TO 2443 8494 (AS) - Reyes, Ivette ( id = 1899; capture_method = None ) (quoted; individual) ( quotes: 1; mentions: 1 ) ==> Name: Ivette Reyes
9992 9993 Article 22281 FROM 3083 TO 2635 9369 (AS) - Tassell, Leslie ( id = 2328; capture_method = None ) (mentioned; individual) ==> name: Leslie E. Tassell
9989 9988 Article 23169 FROM 3195 TO 2719 12020 (AS) - Keller ( id = 2903; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Keller

In [ ]:
reliability_names_id_from = "9989"
reliability_names_id_to = "9988"
article_id = "23169"
article_data_id_from = "3195"
article_data_id_to = "2719"
article_subject = "12020 (AS) - Keller ( id = 2903; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Keller"

markdown_string = "| "
markdown_string += reliability_names_id_from
markdown_string += " | "
markdown_string += reliability_names_id_to
markdown_string += " | Article ["
markdown_string += article_id
markdown_string += "](http://research.local/research/context/text/article/article_data/view_with_text/?article_id="
markdown_string += article_id
markdown_string += ") | FROM ["
markdown_string += article_data_id_from
markdown_string += "](http://research.local/research/context/text/article/article_data/view/?article_id="
markdown_string += article_id
markdown_string += "&article_data_id_select="
markdown_string += article_data_id_from
markdown_string += ") TO ["
markdown_string += article_data_id_to
markdown_string += "](http://research.local/research/context/text/article/article_data/view/?article_id="
markdown_string += article_id
markdown_string += "&article_data_id_select="
markdown_string += article_data_id_to
markdown_string += ") | "
markdown_string += article_subject
markdown_string += " |"

print( "Reliability_Names merge Markdown:\n" + markdown_string )

Ground truth coding fixed

For a few, the error will be on the part of the human coder. For human error, we create a new "ground_truth" record that we will correct, so we preserve original coding (and evidence of errors) in case we want or need that information later.

ID Article Article_Data Coder Article_Subject
9720 Article 21130 Article_Data 2489 Coder=9 8719 (AS) - Krueger ( id = 2015; capture_method = None ) (quoted; individual) ==> name: Krueger ==> organization: Ottawa County; Republican; Republican Party

In [ ]:
coder_id = 9
reliability_names_id = "9720"
article_id = "21130"
article_data_id = "2489"
article_subject = "8719 (AS) - Krueger ( id = 2015; capture_method = None ) (quoted; individual) ==> name: Krueger ==> organization: Ottawa County; Republican; Republican Party"
    
markdown_string = "| "
markdown_string += reliability_names_id
markdown_string += " | Article ["
markdown_string += article_id
markdown_string += "](http://research.local/research/context/text/article/article_data/view_with_text/?article_id="
markdown_string += article_id
markdown_string += ") | Article_Data ["
markdown_string += article_data_id
markdown_string += "](http://research.local/research/context/text/article/article_data/view/?article_id="
markdown_string += article_id
markdown_string += "&article_data_id_select="
markdown_string += article_data_id
markdown_string += ") | "
markdown_string += "Coder=" + str( coder_id )
markdown_string += " | "
markdown_string += article_subject
markdown_string += " |"

print( "Reliability_Names ground_truth fix Markdown:\n" + markdown_string )
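The three cells above build their Markdown rows by string concatenation and repeat the same two view URLs each time. A minimal sketch of helpers that produce the same rows is below; the function names are hypothetical (not part of context_text or context_analysis), the base URL is copied from the cells above, and `urllib.parse` makes this Python 3 only.

```python
from urllib.parse import urlencode

# Base URL copied from the notebook cells above; change it if your server differs.
BASE_URL = "http://research.local/research/context/text/article/article_data"


def article_link(article_id):
    """Markdown link to the article-with-text view, labelled 'Article <id>'."""
    query = urlencode({"article_id": article_id})
    return "Article [{0}]({1}/view_with_text/?{2})".format(article_id, BASE_URL, query)


def article_data_link(article_id, article_data_id, label="Article_Data"):
    """Markdown link to the Article_Data view with one Article_Data preselected."""
    query = urlencode({"article_id": article_id,
                       "article_data_id_select": article_data_id})
    return "{0} [{1}]({2}/view/?{3})".format(label, article_data_id, BASE_URL, query)


def markdown_row(*cells):
    """Join cells into one Markdown table row: '| a | b | c |'."""
    return "| " + " | ".join(str(cell) for cell in cells) + " |"
```

For a merge row, pass both Article_Data links as one cell, e.g. `article_data_link( "23169", "3195", "FROM" ) + " " + article_data_link( "23169", "2719", "TO" )`; for a ground_truth fix row, add a `"Coder=" + str( coder_id )` cell before the subject.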

Tools

Tool - copy Article_Data to user ground_truth

Retrieve the ground truth user, then make a deep copy of an Article_Data record, assigning it to the ground truth user.


In [ ]:
# declare variables
ground_truth_user = None
ground_truth_user_id = -1
id_of_article_data_to_copy = -1
new_article_data = None
new_article_data_id = -1
validation_error_list = None
validation_error_count = -1
validation_error = None

# set ID of article data we want to copy.
id_of_article_data_to_copy = 2489

# get the ground_truth user's ID.
ground_truth_user = ContextTextBase.get_ground_truth_coding_user()
ground_truth_user_id = ground_truth_user.id

# make the copy
new_article_data = Article_Data.make_deep_copy( id_of_article_data_to_copy,
                                                new_coder_user_id_IN = ground_truth_user_id )
new_article_data_id = new_article_data.id

# validate it.
validation_error_list = Article_Data_Copy_Tester.validate_article_data_deep_copy( original_article_data_id_IN = id_of_article_data_to_copy,
                                                                                  copy_article_data_id_IN = new_article_data_id,
                                                                                  copy_coder_user_id_IN = ground_truth_user_id )

# get error count:
validation_error_count = len( validation_error_list )
if ( validation_error_count > 0 ):
    
    # loop and output messages
    for validation_error in validation_error_list:
        
        print( "- Validation error: " + str( validation_error ) )
        
    #-- END loop over validation errors. --#
    
else:

    # no errors - success!
    print( "Record copy a success (as far as we know)!" )
    
#-- END check to see if validation errors --#

print( "copied Article_Data id " + str( id_of_article_data_to_copy ) + " INTO Article_Data id " + str( new_article_data_id ) + " at " + str( datetime.datetime.now() ) )

Tool - delete Article_Data

Delete the Article_Data whose ID you specify (intended only when you accidentally create a "ground_truth").


In [ ]:
# declare variables
article_data_id = -1
article_data = None
do_delete = False

# set ID.
article_data_id = 3314

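# get model instance (filter().first() returns None instead of raising DoesNotExist, so the check below works)
article_data = Article_Data.objects.filter( id = article_data_id ).first()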

# got something?
if ( article_data is not None ):
    
    # yes.  Delete?
    if ( do_delete == True ):
        
        # delete.
        print( "Deleting Article_Data: " + str( article_data ) )
        article_data.delete()
    
    else:
        
        # no delete.
        print( "Found Article_Data: " + str( article_data ) + ", but not deleting." )
        
    #-- END check to see if we delete --#
    
#-- END check to see if Article_Data match. --#

Tool - rebuild Reliability_Names for an article

Steps:

  • retrieve the Reliability_Names row(s) for the article with a particular ID, and filter on label if one is provided.
  • delete the selected Reliability_Names row(s).
  • set up a call to the Reliability_Names program that just generates data for:

    • the article in question
    • users in a desired order.
    • etc.

Delete existing Reliability_Names for article


In [ ]:
# declare variables
article_id = -1
label = ""
do_delete = False
row_string_list = None

# first, get existing Reliability_Names rows for article and label.
article_id = 21130
label = "prelim_month"
#do_delete = True

# Do the delete
row_string_list = Reliability_Names.delete_reliabilty_names_for_article( article_id,
                                                                         label_IN = label,
                                                                         do_delete_IN = do_delete )

# print the strings.
for row_string in row_string_list:
    
    # print it.
    print( row_string )
    
#-- END loop over row strings --#

Make new Reliability_Names


In [ ]:
# django imports
#from django.contrib.auth.models import User

# sourcenet imports
#from context_text.shared.context_text_base import ContextTextBase

# context_analysis imports
#from context_analysis.reliability.reliability_names_builder import ReliabilityNamesBuilder

# declare variables
my_reliability_instance = None
tag_in_list = []
article_id_in_list = []
label = ""

# declare variables - user setup
current_coder = None
current_coder_id = -1
current_index = -1

# declare variables - Article_Data filtering.
coder_type = ""

# make reliability instance
my_reliability_instance = ReliabilityNamesBuilder()

#===============================================================================
# configure
#===============================================================================

# list of tags of articles we want to process.
tag_in_list = [ "grp_month", ]

# list of IDs of articles we want to process:
article_id_in_list = [ 21130, ]

# label to associate with results, for subsequent lookup.
label = "prelim_month"

# ! ====> map coders to indices

# set it up so that...

# ...the ground truth user has highest priority (4) for index 1...
current_coder = ContextTextBase.get_ground_truth_coding_user()
current_coder_id = current_coder.id
current_index = 1
current_priority = 4
my_reliability_instance.add_coder_at_index( current_coder_id, current_index, priority_IN = current_priority )

# ...coder ID 8 is priority 3 for index 1...
current_coder_id = 8
current_index = 1
current_priority = 3
my_reliability_instance.add_coder_at_index( current_coder_id, current_index, priority_IN = current_priority )

# ...coder ID 9 is priority 2 for index 1...
current_coder_id = 9
current_index = 1
current_priority = 2
my_reliability_instance.add_coder_at_index( current_coder_id, current_index, priority_IN = current_priority )

# ...coder ID 10 is priority 1 for index 1...
current_coder_id = 10
current_index = 1
current_priority = 1
my_reliability_instance.add_coder_at_index( current_coder_id, current_index, priority_IN = current_priority )

# ...and automated coder (2) is index 2
current_coder = ContextTextBase.get_automated_coding_user()
current_coder_id = current_coder.id
current_index = 2
current_priority = 1
my_reliability_instance.add_coder_at_index( current_coder_id, current_index, priority_IN = current_priority )

# and only look at coding by those users.  And...

# configure so that it limits to automated coder_type of OpenCalais_REST_API_v2.
coder_type = "OpenCalais_REST_API_v2"
#my_reliability_instance.limit_to_automated_coder_type = "OpenCalais_REST_API_v2"
my_reliability_instance.automated_coder_type_include_list.append( coder_type )

# output debug JSON to file
#my_reliability_instance.debug_output_json_file_path = "/home/jonathanmorgan/" + label + ".json"

#===============================================================================
# process
#===============================================================================

# process articles
my_reliability_instance.process_articles( tag_in_list,
                                          article_id_in_list_IN = article_id_in_list )

# output to database.
my_reliability_instance.output_reliability_data( label )
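The priority comments in the cell above suggest a selection rule: for each index, use the highest-priority coder who actually coded the article. The sketch below is an assumption about that rule written in plain Python, not ReliabilityNamesBuilder's actual implementation; `select_coders()` and the ground_truth ID of 100 are hypothetical (the real ID comes from `ContextTextBase.get_ground_truth_coding_user()`).

```python
# Hypothetical coder IDs for illustration only. 2 is the automated coder,
# per the comment in the cell above; 100 stands in for the ground_truth user.
GROUND_TRUTH_ID = 100
AUTOMATED_ID = 2

# index -> {coder_id: priority}, mirroring the add_coder_at_index() calls above.
INDEX_TO_PRIORITIES = {
    1: {GROUND_TRUTH_ID: 4, 8: 3, 9: 2, 10: 1},
    2: {AUTOMATED_ID: 1},
}


def select_coders(index_to_priorities, coders_who_coded):
    """For each index, pick the highest-priority coder who coded this article.

    Indices where none of the configured coders coded the article are omitted.
    """
    selected = {}
    for index, priorities in index_to_priorities.items():
        present = [coder_id for coder_id in priorities if coder_id in coders_who_coded]
        if present:
            selected[index] = max(present, key=lambda coder_id: priorities[coder_id])
    return selected
```

Under this rule, an article coded by coders 9, 10, and the automated coder would compare coder 9 (index 1) against the automated coder (index 2); once a ground_truth copy exists, it takes over index 1.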

Notes

Notes and questions

Notes and questions:

  • What to do about a misspelled name within an article? These are single names, so removing all of them, but making a note:

    • In article 21080, Reliability_Names 9583, name = Culter, should have been Cutler - quoted, graf: 13, index: 1322

      • Single name, so removed it. But this cuts both ways: when both name parts are present, sometimes the lookup will work out and sometimes it will be a false positive.
    • Article 22858 - 8834 - Schwaraswak, should have been Scharaswak.

  • What to do about single last name that is the correct last name of a person where the other name parts were detected by a person? Leave it in and map it to the correct Article_Data?

  • Obama? One name, but it is a well-known one, and preceded by "President". Still, single name, removed it.
  • 22869 - Sports briefs - error city - doesn't do well in non-news articles (no surprise there).
  • Article 23223 | Article_Data 3212 | 12096 (AS) - Linda ( id = 2911; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Linda |

    • Actually was quoted, but just a one-word name, no explicit mention of last name. Need to keep track of relationship to others in story ("wife of X").
  • Portion of Song title: | 10448 | Article 23491 | Article_Data 3249 | 12299 (AS) - Twinkle ( id = 2938; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Twinkle |

Errors

Errors:

  • Article 21116

    • RANDOM - "More..."
    • Paragraph 12: More than 600 works of art were added to the museum's collection under her leadership, most notably Ellsworth Kelly's "Blue White," a 25-foot- tall wall sculpture that was commissioned in 2006 for the museum's entry pavilion.
    • User: 2 - automated (OpenCalais_REST_API_v2)
    • 11288 (AS) - More ( id = 2817; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: More
  • Article 22765

    • PLACE NAME
    • Paragraph 8: Gavin Orchards has started selling farm-direct apples to Grand Rapids and Fruitport schools. The biggest challenge is the time it takes to deliver low-volume orders, said Mike Gavin, who runs the 240-acre farm near Coopersville with his brother, Dave.
    • User: 2 - automated (OpenCalais_REST_API_v2)
    • 11806 (AS) - Coopersville ( id = 2869; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Coopersville
  • Article 23055

    • PLACE NAME
    • Paragraph 2: While they are not disputing the state DHS' recent decision to reassign longtime Kent County DHS Director Andy Zylstra from Grand Rapids to Lansing, legislators are asking state officials to improve their communications with local workers, state Rep. Robert Dean said.
    • User: 2 - automated (OpenCalais_REST_API_v2)
    • 12014 (AS) - Lansing ( id = 2902; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Lansing
  • Article 23491

    • SONG LYRIC
    • Paragraph 39: "As the program was wrapping up and the kids were leaving the stage, one of the 2-year-olds ran up to the microphone and started singing 'Twinkle, twinkle Christmas star ...' to the tune of 'Twinkle, Twinkle Little Star.' It was so funny and cute."
    • User: 2 - automated (OpenCalais_REST_API_v2)
    • Portion of Song title: | 10448 | Article 23491 | Article_Data 3249 | 12299 (AS) - Twinkle ( id = 2938; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Twinkle |
  • Article 23559

    • PLANET NAME
    • Paragraph 10: "Three appear: Saturn joins Mars and Venus in March so, through spring and most of summer, there will be three naked eye planets in the evening sky. They will be joined briefly by elusive Mercury in April."
    • User: 2 - automated (OpenCalais_REST_API_v2)
    • "Saturn joins..." - | 7961 | Article 23559 | Article_Data 3254 | 12315 (AS) - Saturn ( id = 2940; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Saturn |
  • Article 23631

    • SCHOOL NAME - "Madonna"
    • Paragraph 6: "The school is planning a tribute during halftime of the first night's Hope game Tuesday against Madonna. There will also be other activities open to former players and family members connected to DeVette."
    • User: 2 - automated (OpenCalais_REST_API_v2)
    • | 8120 | Article 23631 | Article_Data 3274 | 12404 (AS) - Madonna ( id = 2946; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Madonna |
  • Article 23631

    • SCHOOL NAME
    • Paragraph 7: "We have a dinner scheduled in his honor and memory during the first game of the tournament (between Davenport and Grace Bible)," Van Wieren said. "We had people that had a hard time getting to the funeral, so this will be a way that people attending can share memories of Russ."
    • User: 2 - automated (OpenCalais_REST_API_v2)
    • | 8119 | Article 23631 | Article_Data 3274 | 12405 (AS) - Davenport ( id = 2947; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Davenport |
  • Article 23921

    • PLACE
    • Paragraph 6: It was 1975 when he fled his native Saigon as it fell to the North Vietnamese Army.
    • User: 2 - automated (OpenCalais_REST_API_v2)
    • | 8981 | Article 23921 | Article_Data 3283 | 12444 (AS) - Saigon ( id = 2952; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Saigon |
  • Article 23974

    • BUSINESS NAME - Detected part of business name as person.
    • Paragraph 14: Trevor Ditmar, a two-year employee at Smitty's Specialty Beverage, 1489 Lake Drive SE, said customers are vowing to quit in increasing numbers due to the product change.
    • User: 2 - automated (OpenCalais_REST_API_v2)
    • | 8185 | Article 23974 | Article_Data 3292 | 12492 (AS) - Smitty ( id = 2789; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Smitty |
  • Article 21080

    • MISSPELLING
    • Paragraph 21: "Ben was in middle school when his father was in Desert Storm, and we'd watch the developments on TV," Patti Vab Syzkle said. "He'd say, 'It's OK, Mom. It's just a skirmish.'"
    • User: 2 - automated (OpenCalais_REST_API_v2)
    • Made new person: 11246 (AS) - Syzkle, Patti ( id = 2813; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Patti Vab Syzkle
    • Should have mapped to: 11248 (AS) - Van Syzkle, Patti ( id = 1750; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Patti Van Syzkle

TODO

TODO:

Coding to look into

Coding decisions to look at more closely:

Debugging

Issues to debug:

  • TK

DONE

Debugging: Only include subjects found by current coder set

Debugging:

  • Figure out why removing "Krueger" from the "ground_truth" coder's Article_Data doesn't cause it to be removed from the Reliability_Names output.

    • Something to do with there still being a coder who found "Krueger", even though that coder isn't one of the coders chosen based on priorities...
    • Resolution: only output a row if at least one selected coder detected the person in question. Added a flag to the methods that could have to deal with this, so they can decide whether or not they want rows where the selected coders did not detect the person but someone else did.
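A minimal sketch of the resolution rule as described, in plain Python rather than the actual context_analysis code. `persons_to_output()` and the `keep_rows_without_selected_coder` flag are hypothetical names, and the input shape (person ID mapped to the set of coder IDs that detected that person) is an assumption.

```python
def persons_to_output(person_to_coders, selected_coder_ids,
                      keep_rows_without_selected_coder=False):
    """Return the person IDs that should get a Reliability_Names row.

    person_to_coders maps a person ID to the set of coder IDs that detected
    that person in the article. By default a person is kept only if at least
    one selected coder detected them; the flag keeps every detected person.
    """
    selected = set(selected_coder_ids)
    output = []
    for person_id in sorted(person_to_coders):
        detected_by_selected = bool(selected & set(person_to_coders[person_id]))
        if detected_by_selected or keep_rows_without_selected_coder:
            output.append(person_id)
    return output
```

In the Krueger case (person id 2015), only the original human coder still detects the person once the name is removed from the ground_truth copy, so with ground_truth selected for index 1 the row is dropped unless the flag is set.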

Issue: Single name resolves to wrong person

Issue: Single name resolves to wrong person

  • This is/was the old lookup problem where it would take THE match if there was only one person with the first name in the article. Looks like I fixed this, based on code below. Need to look at unit tests for name, make one for this scenario, run it on current code.
  • Examples (try clearing automated coding, re-running):

In [ ]:
# initialization
test_manual_article_coder = None
test_person_details = None
test_article_subject = None
test_user = None

# create ManualArticleCoder
test_manual_article_coder = ManualArticleCoder()

# get test user
test_user = ContextTextBase.get_automated_coding_user()

12567 (AS) - Straayer, Mason ( id = 2504; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Mason

Example - Amber

Details:

  • Article 22874
  • Name "Amber" resolved to person "11949 (AS) - Benthem, Amber ( id = 1929; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual)"
  • not the same person.
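The `should_be = 0` checks in the cells below encode the intended behavior: a bare first name should not resolve to a person who has more name parts. A minimal sketch of such a guard over plain strings follows; `lookup_candidates()` is hypothetical and is not how `Person.find_person_from_name()` or `ManualArticleCoder.lookup_person()` are implemented.

```python
def lookup_candidates(name, candidate_full_names):
    """Return candidate full names matching `name`, with a single-name guard.

    A candidate matches when every word of `name` appears among its words
    (case-insensitive). A one-word query never matches a multi-word person,
    which is the old "only Amber in the article" failure mode.
    """
    query_parts = [part.lower() for part in name.split()]
    if not query_parts:
        return []
    matches = []
    for full_name in candidate_full_names:
        candidate_parts = [part.lower() for part in full_name.split()]
        # Guard: a single name is too ambiguous to map onto a fuller name.
        if len(query_parts) == 1 and len(candidate_parts) > 1:
            continue
        if all(part in candidate_parts for part in query_parts):
            matches.append(full_name)
    return matches
```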

In [ ]:
name_string = "Amber"
should_be = 0
test_qs = Person.find_person_from_name( name_string )
match_count = test_qs.count()
error_string = "==> Person.find_person_from_name( " + name_string + " ) --> " + str( match_count ) + " should = " + str( should_be )
print( error_string )

person_counter = 0
for person_instance in test_qs:
    
    person_counter += 1
    print( "Person " + str( person_counter ) + ": " + str( person_instance ) )
    
    
#-- END loop over person matches --#

In [ ]:
# test person values
lookup_person_id = -1
lookup_person_name = "Amber"
lookup_title = ""

# get ManualArticleCoder instance - in init above
#test_manual_article_coder = ManualArticleCoder()

# set up person details
test_person_details = PersonDetails()
test_person_details[ ArticleCoder.PARAM_PERSON_NAME ] = lookup_person_name
#test_person_details[ ArticleCoder.PARAM_PERSON_ID ] = lookup_person_id
#test_person_details[ ArticleCoder.PARAM_TITLE ] = lookup_title

# make test Article_Subject
test_article_subject = Article_Subject()

# lookup person - returns person and confidence score inside
#    Article_Subject instance.
test_article_subject = test_manual_article_coder.lookup_person( test_article_subject, 
                                                                lookup_person_name,
                                                                create_if_no_match_IN = False,
                                                                update_person_IN = False,
                                                                person_details_IN = test_person_details )

# get results from Article_Subject
test_person = test_article_subject.person
test_person_match_list = test_article_subject.person_match_list  # list of Person instances

# output results
print( "==> ManualArticleCoder.lookup_person() result for name \"" + lookup_person_name + "\": " + str( test_person ) )
print( "ManualArticleCoder.lookup_person() match list: " + str( test_person_match_list ) )

In [ ]:
# create ManualArticleCoder instance.
#test_manual_article_coder = ManualArticleCoder()

# get an article.
test_article = Article.objects.get( pk = 22874 )

# create bare-bones Article_Data
test_article_data = Article_Data()
test_article_data.coder = test_user
test_article_data.article = test_article
test_article_data.save()

#----------------------------------------------------------------------#
# !test 1 - with person ID.       
#----------------------------------------------------------------------#

# retrieve person information.
person_name = "Amber"
title = ""
person_id = -1

# set up person details
person_details = PersonDetails()
person_details[ ManualArticleCoder.PARAM_PERSON_NAME ] = person_name
person_details[ ManualArticleCoder.PARAM_NEWSPAPER_INSTANCE ] = test_article.newspaper

# got a title?
if ( ( title is not None ) and ( title != "" ) ):

    # we do.  store it in person_details.
    person_details[ ManualArticleCoder.PARAM_TITLE ] = title

#-- END check to see if title --#

# got a person ID?
if ( ( person_id is not None ) and ( person_id != "" ) and ( person_id > 0 ) ):

    # we do.  store it in person_details.
    person_details[ ManualArticleCoder.PARAM_PERSON_ID ] = person_id

#-- END check to see if person ID --#

# create an article_subject.
test_article_subject = test_manual_article_coder.process_subject_name( test_article_data, person_name, person_details_IN = person_details )

# check to make sure not None
status_message = "==> ArticleCoder.process_subject_name() for name \"" + str( person_name ) + "\": " + str( test_article_subject )
print( status_message )

Example - Logan

Details:

  • Article 22970
  • Name "Logan" resolved to person "11955 (AS) - Bartholomew, Logan ( id = 2579; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Logan"
  • not the same person

In [ ]:
name_string = "Logan"
should_be = 0
test_qs = Person.find_person_from_name( name_string )
match_count = test_qs.count()
error_string = "==> Person.find_person_from_name( " + name_string + " ) --> " + str( match_count ) + " should = " + str( should_be )
print( error_string )

person_counter = 0
for person_instance in test_qs:
    
    person_counter += 1
    print( "Person " + str( person_counter ) + ": " + str( person_instance ) )
    
    
#-- END loop over person matches --#

In [ ]:
# test person values
lookup_person_id = -1
lookup_person_name = "Logan"
lookup_title = ""

# get ManualArticleCoder instance
test_manual_article_coder = ManualArticleCoder()

# set up person details
test_person_details = PersonDetails()
test_person_details[ ArticleCoder.PARAM_PERSON_NAME ] = lookup_person_name
#test_person_details[ ArticleCoder.PARAM_PERSON_ID ] = lookup_person_id
#test_person_details[ ArticleCoder.PARAM_TITLE ] = lookup_title

# make test Article_Subject
test_article_subject = Article_Subject()

# lookup person - returns person and confidence score inside
#    Article_Subject instance.
test_article_subject = test_manual_article_coder.lookup_person( test_article_subject, 
                                                                lookup_person_name,
                                                                create_if_no_match_IN = False,
                                                                update_person_IN = False,
                                                                person_details_IN = test_person_details )

# get results from Article_Subject
test_person = test_article_subject.person
test_person_match_list = test_article_subject.person_match_list  # list of Person instances

# output results
print( "==> ManualArticleCoder.lookup_person() result for name \"" + lookup_person_name + "\": " + str( test_person ) )
print( "ManualArticleCoder.lookup_person() match list: " + str( test_person_match_list ) )

In [ ]:
# create ManualArticleCoder instance.
#test_manual_article_coder = ManualArticleCoder()

# get an article.
test_article = Article.objects.get( pk = 22970 )

# create bare-bones Article_Data
test_article_data = Article_Data()
test_article_data.coder = test_user
test_article_data.article = test_article
test_article_data.save()

#----------------------------------------------------------------------#
# !test 1 - with person ID.       
#----------------------------------------------------------------------#

# retrieve person information.
person_name = "Logan"
title = ""
person_id = -1

# set up person details
person_details = PersonDetails()
person_details[ ManualArticleCoder.PARAM_PERSON_NAME ] = person_name
person_details[ ManualArticleCoder.PARAM_NEWSPAPER_INSTANCE ] = test_article.newspaper

# got a title?
if ( ( title is not None ) and ( title != "" ) ):

    # we do.  store it in person_details.
    person_details[ ManualArticleCoder.PARAM_TITLE ] = title

#-- END check to see if title --#

# got a person ID?
if ( ( person_id is not None ) and ( person_id != "" ) and ( person_id > 0 ) ):

    # we do.  store it in person_details.
    person_details[ ManualArticleCoder.PARAM_PERSON_ID ] = person_id

#-- END check to see if person ID --#

# create an article_subject.
test_article_subject = test_manual_article_coder.process_subject_name( test_article_data, person_name, person_details_IN = person_details )

# check to make sure not None
status_message = "==> ArticleCoder.process_subject_name() for name \"" + str( person_name ) + "\": " + str( test_article_subject )
print( status_message )

Fix: Single name resolves to wrong person

Write a SQL query to find all Article_Subject rows where the name in Article_Subject is one word but the associated Person has multiple name parts.

Then, go through those in the disagreements screen.

Cleanup - Single name resolves to wrong person

Look for Article_Subjects with a single-word name (like "Logan" or "Amber") and an associated Person that has more than one name part (for the reason why, see "Issue: Single name resolves to wrong person" above).

  • One to test on: Article 23476 - 12244 (AS) - Broaddus, Adrienne ( id = 785; capture_method = OpenCalais_REST_API ) (mentioned; individual) ==> name: Adrienne

      SELECT sas.id AS sas_id, sas.article_data_id, sas.name, sas.person_id, sas.lookup_name, sp.id AS sp_id, sp.full_name_string, sad.id AS sad_id, sad.article_id
      FROM context_text_article_subject sas
          INNER JOIN context_text_person sp ON sas.person_id = sp.id
          INNER JOIN context_text_article_data sad ON sas.article_data_id = sad.id
      WHERE SPLIT_PART( sas.name, ' ', 2 ) = ''
          AND SPLIT_PART( sp.full_name_string, ' ', 2 ) != ''
      ORDER BY sas.id DESC;
  • 23 Article_Subject rows, in 21 articles, where a single-word name is linked to a Person with multiple name parts.

      SELECT DISTINCT( sad.article_id )
      FROM context_text_article_subject sas
          INNER JOIN context_text_person sp ON sas.person_id = sp.id
          INNER JOIN context_text_article_data sad ON sas.article_data_id = sad.id
      WHERE SPLIT_PART( sas.name, ' ', 2 ) = ''
          AND SPLIT_PART( sp.full_name_string, ' ', 2 ) != ''
      ORDER BY sad.article_id ASC;
    
    

    Article IDs:

    • // 11502 - not in article set.
    • // 20739
    • // 21080
    • // 21435
    • // 21644
    • // 21699
    • // 21719
    • // 22194
    • // 22874
    • // 22970
    • // 23264
    • // 23379
    • // 23476
    • // 23577
    • // 23745 - false alarm - human coder only selected and copied a single name - looked it up correctly, though.
    • // 24132
    • // 94326
    • // 158908
    • // 192047
    • // 337046
    • // 359001
  • If you limit to just articles with tag "grp_month", you are down to 15:

      SELECT DISTINCT( sad.article_id )
      FROM context_text_article_subject sas
          INNER JOIN context_text_person sp ON sas.person_id = sp.id
          INNER JOIN context_text_article_data sad ON sas.article_data_id = sad.id
          INNER JOIN taggit_taggeditem tti ON sad.article_id = tti.object_id
      WHERE SPLIT_PART( sas.name, ' ', 2 ) = ''
          AND SPLIT_PART( sp.full_name_string, ' ', 2 ) != ''
          AND tti.content_type_id = 13
          AND tti.tag_id = 14
      ORDER BY sad.article_id ASC;
    

    Article IDs:

    • // 20739
    • // 21080
    • // 21435
    • // 21644
    • // 21699
    • // 21719
    • // 22194
    • // 22874
    • // 22970
    • // 23264
    • // 23379
    • // 23476
    • // 23577
    • // 23745 - false alarm - human coder only selected and copied a single name - looked it up correctly, though.
    • // 24132
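The `SPLIT_PART( ..., ' ', 2 )` tests in the queries above are a way of asking "is there a second space-separated word?". A minimal sketch that mirrors that filter in Python, for checking individual rows outside the database; `split_part()` and `is_single_name_mismatch()` are hypothetical helpers, and `split_part()` only handles positive field numbers.

```python
def split_part(text, delimiter, field):
    """Mimic PostgreSQL SPLIT_PART(text, delimiter, field) for field >= 1.

    Returns '' when the string has fewer than `field` parts, which is what the
    `= ''` and `!= ''` tests in the queries rely on.
    """
    parts = text.split(delimiter)
    return parts[field - 1] if len(parts) >= field else ""


def is_single_name_mismatch(subject_name, person_full_name):
    """True when the Article_Subject name is one word but the Person has more.

    Comparisons with NULL are never true in SQL, so None values return False.
    """
    if subject_name is None or person_full_name is None:
        return False
    return (split_part(subject_name, " ", 2) == ""
            and split_part(person_full_name, " ", 2) != "")
```

Like the SQL, this splits on single spaces only, so a name with a trailing space (e.g. "Amber ") still counts as a single word.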

Debugging: Unit test for incorrect lookup of single name

Debugging - incorrect lookup of single name. Make sure there is a unit test created for this (will have to go look at the contents of the test database for a person where there is only one record with their first name).

  • In test data, look for first name that only occurs once, and that has a last name.
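A compact sketch of the same search using `collections.Counter`: keep person records whose first name occurs exactly once in the fixture and that also have a non-empty last name. It assumes the standard Django fixture layout (`model` and `fields` keys) and the `sourcenet.person` model label used in the cell below; `unique_first_name_people()` is a hypothetical helper.

```python
from collections import Counter

PERSON_MODEL = "sourcenet.person"  # model label used in the fixture cell below


def unique_first_name_people(fixture_records, model_name=PERSON_MODEL):
    """Return person records whose first name is unique and who have a last name."""
    people = [record for record in fixture_records
              if record.get("model") == model_name and record.get("fields")]
    first_names = [(person["fields"].get("first_name") or "").strip() for person in people]
    # Count only non-empty first names.
    counts = Counter(first for first in first_names if first)
    result = []
    for person, first in zip(people, first_names):
        last = (person["fields"].get("last_name") or "").strip()
        if first and counts[first] == 1 and last:
            result.append(person)
    return result
```

To use it, load the fixture with `json.load()` as in the cell below and pass the resulting list.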

In [ ]:
# load test data fixture JSON into memory.
fixture_directory_path = '/home/jonathanmorgan/work/django/research/context/text/fixtures'
test_data_fixture_file_name = 'sourcenet_unittest_data.json'
fixture_file_path = fixture_directory_path + "/" + test_data_fixture_file_name

# constants-ish
person_model_name = "sourcenet.person"
json_name_model = "model"
json_name_fields = "fields"
json_name_field_first_name = "first_name"
json_name_field_last_name = "last_name"

# tracking counts of names
model_name_to_count_map = {}
model_none_count = -1
record_count = -1
person_count = -1
first_name_to_info_map = {}
first_name_to_count_map = {}
data_record = None
person_info = None
person_fields = None
model = ""
first_name = ""
first_name_count = -1
first_name_none_count = -1
last_name = ""

# read file and use json to load it.
with open( fixture_file_path, 'r' ) as fixture_file:
    
    # load into memory and parse.
    fixture_data = json.load( fixture_file )

#-- END with open( fixture_file_path )... --#
    
# how many things?
print( "There are " + str( len( fixture_data ) ) + " items in fixture " + fixture_file_path + " - type: " + str( type( fixture_data ) ) )

# loop over records.
model_none_count = 0
record_count = 0
person_count = 0
person_info = None
first_name_none_count = 0
first_name_empty_count = 0
for data_record in fixture_data:

    record_count += 1

    # First, check to see if type is sourcenet.person
    model = data_record.get( json_name_model, None )
    if ( ( model is not None ) and ( model == person_model_name ) ):

        if model not in model_name_to_count_map:

            # add to count map with count set to 1.
            model_name_to_count_map[ model ] = 1

        else:

            # increment counter
            model_count = model_name_to_count_map[ model ]
            model_count += 1
            model_name_to_count_map[ model ] = model_count

        #-- END check to see if model in model_name_set. --#

        # it is a person.
        person_count += 1
        person_info = data_record
        person_fields = person_info.get( json_name_fields, None )

        # make sure we have fields
        if ( person_fields is not None ):

            # we have fields.

            # get first name
            first_name = person_fields.get( json_name_field_first_name, None )
            #print( "First name string = " + str( first_name ) )
            last_name = person_fields.get( json_name_field_last_name, None )

            # Got one?
            if ( first_name is not None ):

                # ...that is not empty
                first_name = first_name.strip()
                if ( first_name != "" ):

                    # yes.  Check for first name in count map.
                    if ( first_name in first_name_to_count_map ):

                        # it is there.  Increment count...
                        first_name_count = first_name_to_count_map[ first_name ]
                        first_name_count += 1
                        first_name_to_count_map[ first_name ] = first_name_count

                        # ...and clear out the person info, since this first name is no longer unique.
                        first_name_to_info_map[ first_name ] = None


                    else:

                        # it is not there.  Set count to 1 and store person_info.
                        first_name_count = 1
                        first_name_to_count_map[ first_name ] = first_name_count
                        first_name_to_info_map[ first_name ] = person_info

                    #-- END check to see if first name in count map. --#

                else:
                
                    # first name is empty.  Increment first_name_empty_count.
                    first_name_empty_count += 1
                
                #-- END check for first name not empty. --#

            else:
                
                # first name is None.
                first_name_none_count += 1
                
            #-- END check for first name. --#

        else:

            print( "ERROR - no fields in " + str( person_info ) )

        #-- END check for fields --#
        
    else:
        
        if ( model is not None ):
            
            if model not in model_name_to_count_map:
                
                model_name_to_count_map[ model ] = 1
                
            else:
                
                model_count = -1
                model_count = model_name_to_count_map[ model ]
                model_count += 1
                model_name_to_count_map[ model ] = model_count
                
            #-- END check to see if model in model_name_set. --#
            
        else:
            
            model_none_count += 1
            
        #-- END check to see if model is None --#

    #-- END check to see if sourcenet.person --#

#-- END loop over data records. --#

print( "Found " + str( person_count ) + " sourcenet.person records (" + str( len( first_name_to_count_map ) ) + " distinct first names) out of " + str( record_count ) + " records." )
print( "first names to counts: " + str( first_name_to_count_map ) )
print( "==> Model name set = " + str( model_name_to_count_map ) + "; model_none_count = " + str( model_none_count ) )
print( "==> first_name_none_count = " + str( first_name_none_count ) )
print( "==> first_name_empty_count = " + str( first_name_empty_count ) )
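For comparison, here is a minimal sketch of the same tally using collections.Counter. It assumes the fixture is a standard Django JSON fixture (a list of objects with "model", "pk", and "fields" keys) and that person records use the model name sourcenet.person. The function count_first_names() and all sample records except Anirban Basu (526) are hypothetical, for illustration only. Unlike the cell above, it skips None and empty first names instead of counting them.

```python
import collections


def count_first_names(fixture_data, person_model_name="sourcenet.person"):
    """Tally first names of person records in a Django JSON fixture (hypothetical helper).

    Returns a Counter of stripped first names and a dict mapping each
    first name that appears exactly once to its fixture record.
    """
    counts = collections.Counter()
    first_record = {}
    for record in fixture_data:
        # Only look at person records.
        if record.get("model") != person_model_name:
            continue
        fields = record.get("fields") or {}
        first_name = (fields.get("first_name") or "").strip()
        if first_name == "":
            continue  # skip None or empty first names
        counts[first_name] += 1
        # Remember the first record seen for each name.
        first_record.setdefault(first_name, record)
    # Keep records only for names that appeared exactly once.
    unique = {name: first_record[name] for name, n in counts.items() if n == 1}
    return counts, unique


# Illustrative sample records (not real fixture data, apart from 526).
sample = [
    {"model": "sourcenet.person", "pk": 526, "fields": {"first_name": "Anirban", "last_name": "Basu"}},
    {"model": "sourcenet.person", "pk": 1, "fields": {"first_name": "John", "last_name": "Smith"}},
    {"model": "sourcenet.person", "pk": 2, "fields": {"first_name": "John ", "last_name": "Doe"}},
    {"model": "sourcenet.article", "pk": 7, "fields": {}},
]
counts, unique = count_first_names(sample)
print(counts["John"])   # 2
print(sorted(unique))   # ['Anirban']
```

Names that appear exactly once are the candidates for a single-name, single-match test case.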
  • then, pick one of the first names that appears only once and write a unit test case that looks up that first name alone, and make sure the lookup creates a new person rather than returning the person who already exists.

In [ ]:
selected_person = first_name_to_info_map.get( 'Anirban', None )
print( "Selected person: " )
print( str( selected_person ) )

Selected person:

{
    'model': 'sourcenet.person',
    'pk': 526,
    'fields':
    {
        'create_date': '2015-03-09T10:54:55.115',
        'last_name': 'Basu',
        'full_name_string': 'Anirban Basu',
        'gender': 'male',
        'nameparser_pickled': None,
        'name_prefix': None,
        'name_suffix': None,
        'last_modified': '2015-03-09T10:55:06.865',
        'nickname': None,
        'original_name_string': None,
        'first_name': 'Anirban',
        'middle_name': '',
        'is_ambiguous': False,
        'notes': '',
        'title': 'chief economist, Associated Builders and Contractors Inc.',
        'capture_method': None
    }
}
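Picking a name by hand works for one test; a minimal sketch for listing every candidate at once follows. It assumes first_name_to_info_map has the shape built by the counting cell (first name mapped to its fixture record while the name has been seen once, or to None once it repeats). The helper single_match_candidates() and the sample map are hypothetical; only the Anirban Basu (526) entry comes from the fixture. It keeps only people who also have a last name, since those are the records a first-name-only lookup could wrongly return.

```python
def single_match_candidates(first_name_to_info_map):
    """List (first_name, pk, full_name) for unique first names whose person has a last name."""
    candidates = []
    for first_name, record in sorted(first_name_to_info_map.items()):
        if record is None:
            continue  # name appeared more than once, so its info was cleared
        fields = record.get("fields") or {}
        last_name = (fields.get("last_name") or "").strip()
        if last_name != "":
            candidates.append((first_name, record.get("pk"), fields.get("full_name_string")))
    return candidates


# Illustrative map in the same shape as first_name_to_info_map.
example_map = {
    "Anirban": {"model": "sourcenet.person", "pk": 526,
                "fields": {"first_name": "Anirban", "last_name": "Basu",
                           "full_name_string": "Anirban Basu"}},
    "John": None,
    "Cher": {"model": "sourcenet.person", "pk": 9,
             "fields": {"first_name": "Cher", "last_name": "",
                        "full_name_string": "Cher"}},
}
for candidate in single_match_candidates(example_map):
    print(candidate)  # ('Anirban', 526, 'Anirban Basu')
```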

Adding 2 test cases to context_text/tests/article_coder/test_article_coder.py, function test_lookup_person():

#----------------------------------------------------------------------#
# !test 5 - Single name, single match test - should not match.
# - "Anirban" matches one person with that first name ( 526 - "Anirban Basu" )
# - should not be counted as a match.
# - No match, do create.
#----------------------------------------------------------------------#

#----------------------------------------------------------------------#
# !test 6 - 526 - Anirban Basu - use name.       
#----------------------------------------------------------------------#
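A standalone sketch of what tests 5 and 6 are meant to check, written with unittest. It does not use the project's ArticleCoder lookup: lookup_or_create() and the in-memory person list are hypothetical stand-ins, and it assumes test 6 looks up the full name "Anirban Basu" and expects person 526 back. The rule it illustrates is that a name with a single part never matches an existing person, even when exactly one person shares that first name.

```python
import unittest


def lookup_or_create(name_string, people):
    """Hypothetical lookup: return (person, created).

    A name with only one part never matches; otherwise return the single
    person whose first and last names match, or create a new (unsaved) person.
    """
    parts = name_string.split()
    if len(parts) >= 2:
        matches = [p for p in people
                   if p["first_name"] == parts[0] and p["last_name"] == parts[-1]]
        if len(matches) == 1:
            return matches[0], False
    # Single names, and full names without exactly one match, get a new person.
    new_person = {"pk": None,
                  "first_name": parts[0] if parts else "",
                  "last_name": parts[-1] if len(parts) >= 2 else ""}
    return new_person, True


class LookupPersonSketchTest(unittest.TestCase):

    def setUp(self):
        # One existing person, mirroring record 526 from the fixture.
        self.people = [{"pk": 526, "first_name": "Anirban", "last_name": "Basu"}]

    def test_single_name_single_match_creates(self):
        # test 5: "Anirban" matches one first name but must not count as a match.
        person, created = lookup_or_create("Anirban", self.people)
        self.assertTrue(created)
        self.assertIsNone(person["pk"])

    def test_full_name_returns_existing(self):
        # test 6 (assumed): the full name returns person 526.
        person, created = lookup_or_create("Anirban Basu", self.people)
        self.assertFalse(created)
        self.assertEqual(person["pk"], 526)


if __name__ == "__main__":
    # exit=False keeps the notebook kernel (or script) running after the tests.
    unittest.main(argv=["lookup_sketch"], exit=False)
```

Returning a (person, created) pair keeps "matched" and "created" distinguishable in assertions, which is the distinction test 5 is about.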

NEXT