2017.06.01 - work log - prelim_month - remove single names
In [ ]:
import datetime
import json
import six
print( "packages imported at " + str( datetime.datetime.now() ) )
In [ ]:
%pwd
First, initialize my dev django project, so I can run code in this notebook that references my django models and can talk to the database using my project's settings.
You need to have installed your virtualenv with django as a kernel, then select that kernel for this notebook.
In [ ]:
%run django_init.py
Import any sourcenet or context_analysis models or classes.
In [ ]:
# django imports
from django.contrib.auth.models import User
# sourcenet shared
from context_text.shared.person_details import PersonDetails
# sourcenet models.
from context_text.models import Article
from context_text.models import Article_Data
from context_text.models import Article_Subject
from context_text.models import Person
from context_text.shared.context_text_base import ContextTextBase
from context_text.tests.models.test_Article_Data_model import Article_Data_Copy_Tester
# sourcenet article_coding
from context_text.article_coding.article_coding import ArticleCoder
from context_text.article_coding.manual_coding.manual_article_coder import ManualArticleCoder
# context_analysis models.
from context_analysis.models import Reliability_Names
from context_analysis.reliability.reliability_names_builder import ReliabilityNamesBuilder
print( "sourcenet and context_analysis packages imported at " + str( datetime.datetime.now() ) )
Next, remove all reliability data that refers to a single name using the "View reliability name information" screen:
To start, enter the following in fields there:
You should see lots of entries where coders detected people who were mentioned only by their first name.
Need to look at each instance where a person has a single name part.
Most are probably instances where the computer correctly detected the name part, but there is not enough of the name to match it to a person, so the human coding protocol directed coders not to capture the name fragment.
However, there might be some where a coder made a mistake and captured just a name part for a person whose full name was in the story. To check, click the link in the "Article ID" column. It will take you to a view of the article that includes everyone who coded the article, with each detection of a mention or quotation displayed next to the paragraph where the person was first detected.
So for each instance of a single name part:
click on the article ID link in the row to go to the article and check to see if there is a person whose name the fragment is a part of ( http://research.local/research/context/text/article/article_data/view_with_text/ ).
If there is a person with a full name to which the name fragment is a reference, check to see if the human coder has data for the full person.
if human coder has data for the full person, merge:
Configure:
this will bring up all coding for the article whose ID you entered.
if human coder did not detect person or made some other kind of error:
"ground_truth".
if this is the first time you've used the "ground_truth" user, log into the django admin ( http://research.local/research/admin/ ) and:
"ground_truth" user's password.
log in to the coding tool ( http://research.local/research/context/text/article/code/ ) as the "ground_truth" user and fix the coding for the article in question.
rebuild Reliability_Names for just that article.
Reliability_Names ). Make sure to specify both label and Article ID, so you don't delete more than you intend.
re-run Reliability_Names creation for the article ( Make new Reliability_Names ). Specify:
Then, you'll need to re-fix any other problems with the article. They'll pop into the list of single-name records again, for example.
if needed, clean up/merge the two Reliability_Names records for the person.
Remove the Reliability_Names row with the name fragment from reliability data.
To delete matching records in this list, click the checkbox in the "select" column next to each one you want to delete (sorry, no "select all" just yet), choose "Delete selected" from the "Reliability names action:" field at the top of the list, then click the "Do action" button.
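The records reviewed in the steps above are those whose captured name has only one part. As a rough illustration only, that check could be sketched as below. `is_single_name_part` is a hypothetical helper, and the "View reliability name information" screen may well apply a different rule.

```python
def is_single_name_part( name_IN ):

    """Return True if name_IN holds exactly one whitespace-delimited part."""

    # hypothetical helper - an assumption, not the project's own check.
    name_parts = ( name_IN or "" ).split()
    return len( name_parts ) == 1

#-- END function is_single_name_part() --#

# try it on names in the style of those in this log.
for name in [ "Christopher", "Ivette Reyes", "Leslie E. Tassell" ]:
    print( name + " ==> single name part? " + str( is_single_name_part( name ) ) )
```

Empty or missing names are treated as not single-name, so they would not be swept into the review list by this sketch.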
Table of Reliability_Names records removed because of single names. Table is here still, but all records were moved to Reliability_Names_Evaluation table in django:
"is_single_name" set to True in the Reliability_Names_Evaluation table in django: http://research.local/research/admin/context_analysis/reliability_names_evaluation/?is_single_name__exact=1&label=prelim_month&o=-1.7.8.3.5

ID | Article | Article_Data | Article_Subject | Type |
---|---|---|---|---|
8618 | Article 20739 | Article_Data 2980 | 11006 (AS) - Christopher ( id = 2776; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Christopher | CORRECT |
8705 | Article 20843 | Article_Data 3000 | 11102 (AS) - Brock ( id = 2798; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Brock | CORRECT |
9163 | Article 20912 | Article_Data 3015 | 11147 (AS) - Slate ( id = 2801; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Slate | CORRECT |
9243 | Article 20936 | Article_Data 3002 | 11110 (AS) - Christine ( id = 2800; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Christine | CORRECT |
9506 | Article 21049 | Article_Data 3034 | 11232 (AS) - Reyes ( id = 2809; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Reyes | CORRECT |
9584 | Article 21080 | Article_Data 3037 | 11244 (AS) - Ben ( id = 2811; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Ben | CORRECT |
9594 | Article 21080 | Article_Data 3037 | 11249 (AS) - Carman ( id = 2814; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Carman | CORRECT |
9583 | Article 21080 | Article_Data 3037 | 11252 (AS) - Culter ( id = 2816; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Culter | CORRECT |
9590 | Article 21080 | Article_Data 3037 | 11243 (AS) - Emma ( id = 2810; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Emma | CORRECT |
9595 | Article 21080 | Article_Data 3037 | 11250 (AS) - Isabel ( id = 2815; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Isabel | CORRECT |
9592 | Article 21080 | Article_Data 3037 | 11245 (AS) - Tarina ( id = 2812; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Tarina | CORRECT |
9671 | Article 21109 | Article_Data 3045 | 11289 (AS) - Pat ( id = 2818; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Pat | CORRECT |
9681 | Article 21112 | Article_Data 3038 | 11255 (AS) - Obama ( id = 842; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Obama | CORRECT |
9687 | Article 21113 | Article_Data 3033 | 11225 (AS) - Steve ( id = 2806; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Steve | CORRECT |
9688 | Article 21113 | Article_Data 3033 | 11227 (AS) - Jay ( id = 2807; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Jay | CORRECT |
9684 | Article 21113 | Article_Data 3033 | 11228 (AS) - Jesse ( id = 2808; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Jesse | CORRECT |
9696 | Article 21117 | Article_Data 3049 | 8511 (AS) - Mary ( id = 1912; capture_method = None ) (mentioned; individual) ==> name: Mary | CORRECT |
9707 | Article 21121 | Article_Data 3048 | 11306 (AS) - Jesus ( id = 1451; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Jesus | CORRECT |
9690 | Article 21116 | Article_Data 3044 | 11288 (AS) - More ( id = 2817; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: More | ERROR |
9823 | Article 21190 | Article_Data 1641 | 5423 (AS) - Bill ( id = 855; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Bill | CORRECT |
10076 | Article 21287 | Article_Data 1635 | 5396 (AS) - Vernon ( id = 847; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Vernon | CORRECT |
10422 | Article 21435 | Article_Data 1651 | 5460 (AS) - Joshua ( id = 869; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Joshua | CORRECT |
7956 | Article 21509 | Article_Data 1660 | 5498 (AS) - Jaidon ( id = 875; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Jaidon | CORRECT |
7958 | Article 21509 | Article_Data 1660 | 5500 (AS) - Kaidon ( id = 877; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Kaidon | CORRECT |
7959 | Article 21509 | Article_Data 1660 | 5502 (AS) - Rushing ( id = 878; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Rushing | CORRECT |
8064 | Article 21569 | Article_Data 1666 | 5534 (AS) - Betty ( id = 885; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Betty | CORRECT |
8662 | Article 21719 | Article_Data 1706 | 5692 (AS) - Al ( id = 934; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Al | CORRECT |
8689 | Article 21781 | Article_Data 1726 | 5779 (AS) - Benjamin ( id = 961; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Benjamin | CORRECT |
8769 | Article 21813 | Article_Data 1727 | 5783 (AS) - Jaidon ( id = 875; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Jaidon | CORRECT |
8771 | Article 21813 | Article_Data 1727 | 5786 (AS) - Kaidon ( id = 877; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Kaidon | CORRECT |
8767 | Article 21813 | Article_Data 1727 | 5784 (AS) - Kyanie ( id = 775; capture_method = OpenCalais_REST_API ) (quoted; individual) ==> name: Kyanie | CORRECT |
8278 | Article 21827 | Article_Data 1721 | 5753 (AS) - Schultz ( id = 752; capture_method = OpenCalais_REST_API ) (quoted; individual) ==> name: Schultz | CORRECT |
9013 | Article 21886 | Article_Data 3060 | 11386 (AS) - Dan ( id = 2824; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Dan | CORRECT |
9010 | Article 21886 | Article_Data 3060 | 11387 (AS) - Tom ( id = 2825; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Tom | CORRECT |
9043 | Article 21898 | Article_Data 2556 | 9006 (AS) - Dave ( id = 2178; capture_method = None ) (quoted; individual) ==> name: Dave | CORRECT |
9064 | Article 21903 | Article_Data 1746 | 5895 (AS) - Daniel ( id = 1000; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Daniel | CORRECT |
9067 | Article 21903 | Article_Data 1746 | 5892 (AS) - Patsy ( id = 998; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Patsy | CORRECT |
9150 | Article 21931 | Article_Data 1750 | 5912 (AS) - Christ ( id = 1006; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Christ | CORRECT |
9424 | Article 22034 | Article_Data 3076 | 11457 (AS) - Ken ( id = 2840; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Ken | CORRECT |
9573 | Article 22099 | Article_Data 3071 | 11440 (AS) - Abigail ( id = 2835; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Abigail | CORRECT |
9572 | Article 22099 | Article_Data 3071 | 11439 (AS) - Sonneveldt ( id = 2834; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Sonneveldt | CORRECT |
9578 | Article 22100 | Article_Data 3067 | 11424 (AS) - Don ( id = 2830; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Don | CORRECT |
9791 | Article 22199 | Article_Data 3080 | 11468 (AS) - Marcia ( id = 2842; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Marcia | CORRECT |
9799 | Article 22200 | Article_Data 3084 | 11486 (AS) - Bryan ( id = 2845; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Bryan | CORRECT |
9992 | Article 22281 | Article_Data 3083 | 11483 (AS) - Tassell ( id = 2844; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Tassell | MERGED |
10044 | Article 22302 | Article_Data 3086 | 11494 (AS) - Greg ( id = 2847; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Greg | CORRECT |
10082 | Article 22313 | Article_Data 3104 | 11569 (AS) - Noonday ( id = 2852; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Noonday | CORRECT |
10084 | Article 22313 | Article_Data 3104 | 11576 (AS) - Tecumseh ( id = 2853; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Tecumseh | CORRECT |
8546 | Article 22566 | Article_Data 3133 | 11705 (AS) - Obama ( id = 842; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Obama | CORRECT |
8160 | Article 22625 | Article_Data 3147 | 11757 (AS) - Laura ( id = 2864; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Laura | CORRECT |
8341 | Article 22681 | Article_Data 3151 | 11770 (AS) - Aaron ( id = 2865; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Aaron | CORRECT |
10464 | Article 22690 | Article_Data 2666 | 9493 (AS) - Obama ( id = 842; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Obama ==> organization: United States | CORRECT |
8435 | Article 22714 | Article_Data 3164 | 11831 (AS) - Caleb ( id = 2873; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Caleb | CORRECT |
8515 | Article 22747 | Article_Data 3161 | 11815 (AS) - Corey ( id = 2870; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Corey | CORRECT |
8566 | Article 22765 | Article_Data 3158 | 11806 (AS) - Coopersville ( id = 2869; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Coopersville | ERROR |
8565 | Article 22765 | Article_Data 3158 | 11803 (AS) - Dave ( id = 2868; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Dave | CORRECT |
9459 | Article 22790 | Article_Data 2685 | 9584 (AS) - Obama ( id = 842; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Obama ==> organization: United States | CORRECT |
8827 | Article 22854 | Article_Data 2672 | 9528 (AS) - Obama ( id = 842; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Obama ==> organization: United States | CORRECT |
8834 | Article 22858 | Article_Data 3177 | 11923 (AS) - Schwaraswak ( id = 2886; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Schwaraswak | MISSPELLED |
8882 | Article 22869 | Article_Data 3187 | 11980 (AS) - Olympian ( id = 2895; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Olympian | SPORTS BRIEFS |
8901 | Article 22874 | Article_Data 3182 | 11949 (AS) - Benthem, Amber ( id = 1929; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Amber | LOOKUP ERROR |
8903 | Article 22874 | Article_Data 3182 | 11950 (AS) - Alyssa ( id = 2889; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Alyssa | CORRECT |
8902 | Article 22874 | Article_Data 3182 | 11947 (AS) - Amanda ( id = 2888; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Amanda | CORRECT |
8944 | Article 22887 | Article_Data 3173 | 11888 (AS) - Ben ( id = 2811; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Ben | CORRECT |
8933 | Article 22887 | Article_Data 3173 | 11898 (AS) - Julie ( id = 2885; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Julie | CORRECT |
8932 | Article 22887 | Article_Data 3173 | 11890 (AS) - Alexis ( id = 2884; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Alexis | CORRECT |
9108 | Article 22946 | Article_Data 3179 | 11931 (AS) - Bob ( id = 2887; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Bob | CORRECT |
9210 | Article 22970 | Article_Data 3183 | 11955 (AS) - Bartholomew, Logan ( id = 2579; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Logan | LOOKUP ERROR |
9211 | Article 22970 | Article_Data 3183 | 11957 (AS) - Matt ( id = 2891; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Matt | CORRECT |
8143 | Article 23055 | Article_Data 3194 | 12014 (AS) - Lansing ( id = 2902; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Lansing | ERROR |
8323 | Article 23065 | Article_Data 3193 | 12006 (AS) - Eli ( id = 2899; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Eli | CORRECT |
8325 | Article 23065 | Article_Data 3193 | 12010 (AS) - Betty ( id = 885; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Betty | PET( CHICKEN ) |
8324 | Article 23065 | Article_Data 3193 | 12009 (AS) - Mabel ( id = 2900; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Mabel | PET( CHICKEN ) |
8329 | Article 23065 | Article_Data 3193 | 12011 (AS) - Violet ( id = 2901; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Violet | PET( CHICKEN ) |
9617 | Article 23139 | Article_Data 3198 | 12037 (AS) - Bernice ( id = 2906; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Bernice | CORRECT |
9989 | Article 23169 | Article_Data 3195 | MERGED - 12020 (AS) - Keller ( id = 2903; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Keller | MERGED |
9764 | Article 23216 | Article_Data 3211 | 12093 (AS) - Sue ( id = 2908; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Sue | CORRECT |
8109 | Article 23223 | Article_Data 2700 | 9639 (AS) - Satan ( id = 2518; capture_method = None ) (mentioned; individual) ==> name: Satan | CORRECT |
8112 | Article 23223 | Article_Data 3212 | 12096 (AS) - Linda ( id = 2911; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Linda | QUOTED |
8111 | Article 23223 | Article_Data 3212 | 12095 (AS) - Tristan ( id = 2910; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Tristan | CORRECT |
9841 | Article 23243 | Article_Data 3222 | 12149 (AS) - Ignacio ( id = 2916; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Ignacio | CORRECT |
9842 | Article 23243 | Article_Data 3222 | 12150 (AS) - Paulina ( id = 2917; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Paulina | CORRECT |
10017 | Article 23313 | Article_Data 3225 | 12163 (AS) - Jesus ( id = 1451; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Jesus | CORRECT |
10560 | Article 23379 | Article_Data 3232 | 12206 (AS) - Barbara ( id = 2924; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Barbara | CORRECT |
10561 | Article 23379 | Article_Data 3232 | 12205 (AS) - Van Tubbergen, Tyler ( id = 2089; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Tyler | LOOKUP ERROR |
10215 | Article 23384 | Article_Data 3231 | 12195 (AS) - Diana ( id = 2920; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Diana | CHILD |
10216 | Article 23384 | Article_Data 3231 | 12197 (AS) - Shakulu ( id = 2921; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Shakulu | CHILD |
10217 | Article 23384 | Article_Data 3231 | 12199 (AS) - Shabani ( id = 2922; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Shabani | CHILD |
10218 | Article 23384 | Article_Data 3231 | 12200 (AS) - Joana ( id = 2923; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Joana | CHILD |
10220 | Article 23384 | Article_Data 2726 | 9752 (AS) - Jesus ( id = 1451; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Jesus | CORRECT |
10220 | Article 23384 | Article_Data 3231 | 12196 (AS) - Jesus ( id = 1451; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Jesus | CORRECT |
10102 | Article 23403 | Article_Data 3237 | 12231 (AS) - Danielle ( id = 2928; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Danielle | CORRECT |
10103 | Article 23403 | Article_Data 3237 | 12233 (AS) - Jacob ( id = 2929; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Jacob | CORRECT |
10109 | Article 23403 | Article_Data 3237 | 12230 (AS) - Madyson ( id = 2927; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Madyson | CORRECT |
9078 | Article 23449 | Article_Data 3233 | 12213 (AS) - Howard ( id = 2763; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Howard | CORRECT |
9385 | Article 23476 | Article_Data 3240 | 12247 (AS) - Greg ( id = 2847; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Greg | CORRECT |
9376 | Article 23476 | Article_Data 3240 | 12248 (AS) - Cathy ( id = 2931; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Cathy | CORRECT |
9377 | Article 23476 | Article_Data 3240 | 12249 (AS) - Brandon ( id = 2932; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Brandon | CORRECT |
9373 | Article 23476 | Article_Data 3240 | 12243 (AS) - Chase ( id = 2863; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Chase | CORRECT |
10448 | Article 23491 | Article_Data 3249 | 12299 (AS) - Twinkle ( id = 2938; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Twinkle | ERROR |
10525 | Article 23529 | Article_Data 3247 | 12288 (AS) - Liz ( id = 2936; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Liz | CORRECT |
7951 | Article 23555 | Article_Data 3261 | 12351 (AS) - Samson ( id = 2941; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Samson | SON |
7961 | Article 23559 | Article_Data 3254 | 12315 (AS) - Saturn ( id = 2940; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Saturn | ERROR |
7968 | Article 23562 | Article_Data 3260 | 12347 (AS) - Samson ( id = 2941; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Samson | SON |
8120 | Article 23631 | Article_Data 3274 | 12404 (AS) - Madonna ( id = 2946; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Madonna | ERROR |
8119 | Article 23631 | Article_Data 3274 | 12405 (AS) - Davenport ( id = 2947; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Davenport | ERROR |
9730 | Article 23663 | Article_Data 3266 | 12375 (AS) - Stephanie ( id = 2942; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Stephanie | SPOUSE |
8312 | Article 23699 | Article_Data 3271 | 12390 (AS) - Ed ( id = 2943; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Ed | FATHER |
9323 | Article 23804 | Article_Data 3312 | 12611 (AS) - Marc ( id = 2969; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Marc | HUSBAND |
9321 | Article 23804 | Article_Data 3312 | 12612 (AS) - Anthony ( id = 2967; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Anthony | GRANDCHILD |
9319 | Article 23804 | Article_Data 3312 | 12609 (AS) - Angelina ( id = 2951; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Angelina | GRANDCHILD |
8981 | Article 23921 | Article_Data 3283 | 12444 (AS) - Saigon ( id = 2952; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Saigon | ERROR |
8185 | Article 23974 | Article_Data 3292 | 12492 (AS) - Smitty ( id = 2789; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Smitty | ERROR |
9153 | Article 23982 | Article_Data 3296 | 12515 (AS) - Matt ( id = 2891; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Matt | NO LAST NAME (HOMELESS) |
9183 | Article 23988 | Article_Data 3300 | 12529 (AS) - Barbara ( id = 2924; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Barbara | WIFE |
8683 | Article 24082 | Article_Data 3304 | 12552 (AS) - Kulmeet ( id = 2958; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Kulmeet | WIFE |
9470 | Article 24111 | Article_Data 3311 | 12591 (AS) - Jonathan ( id = 2964; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Jonathan | SON |
9469 | Article 24111 | Article_Data 3311 | 12588 (AS) - Stephen ( id = 2963; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Stephen | SON |
9908 | Article 24132 | Article_Data 3308 | 12566 (AS) - Erin ( id = 2961; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Erin | CHILD - DAUGHTER |
9911 | Article 24132 | Article_Data 3308 | 12563 (AS) - Robyn ( id = 2959; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Robyn | CHILD - DAUGHTER |
9910 | Article 24132 | Article_Data 3308 | 12567 (AS) - Straayer, Mason ( id = 2504; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Mason | CHILD - SON, LOOKUP ERROR |
8619 | Article 20739 | Article_Data 2980 | 11003 (AS) - Gosselin, Kate ( id = 1608; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Kate | LOOKUP ERROR |
9591 | Article 21080 | Article_Data 3037 | 11247 (AS) - O'Brien, Collin ( id = 2619; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Collin | LOOKUP ERROR |
10429 | Article 21435 | Article_Data 1651 | 5462 (AS) - Bielinski, Jamie ( id = 780; capture_method = OpenCalais_REST_API ) (mentioned; individual) ==> name: Jamie | LOOKUP ERROR |
8291 | Article 21644 | Article_Data 1681 | 5598 (AS) - Taylor, Helen ( id = 576; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Helen | LOOKUP ERROR |
8459 | Article 21699 | Article_Data 2537 | 8929 (AS) - Felske, Jon ( id = 188; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Felske ==> organization: Wyoming Public Schools; Godwin Heights Public Schools | CORRECT |
8666 | Article 21719 | Article_Data 1706 | 5695 (AS) - Vander Hart, Ginny ( id = 271; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Ginny | LOOKUP ERROR |
9770 | Article 22194 | Article_Data 3077 | 11462 (AS) - Bartholomew, Logan ( id = 2579; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Logan | LOOKUP ERROR |
9771 | Article 22194 | Article_Data 3077 | 11463 (AS) - O'Brien, Collin ( id = 2619; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Collin | LOOKUP ERROR |
9876 | Article 23264 | Article_Data 3216 | 12113 (AS) - Kowalczyk-Fulmer, Katie ( id = 704; capture_method = OpenCalais_REST_API ) (mentioned; individual) ==> name: Katie | LOOKUP ERROR |
9374 | Article 23476 | Article_Data 3240 | 12244 (AS) - Broaddus, Adrienne ( id = 785; capture_method = OpenCalais_REST_API ) (mentioned; individual) ==> name: Adrienne | LOOKUP ERROR |
8008 | Article 23577 | Article_Data 3270 | 12388 (AS) - Garcia, Juan ( id = 1627; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Juan | LOOKUP ERROR |
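To summarize a table like the one above without counting by hand, the rows can be parsed and the Type column tallied. The sketch below is an assumption-laden illustration: the regular expression is inferred from the Article_Subject text shown in the rows, and `parse_row` is a hypothetical helper, not part of sourcenet or context_analysis.

```python
import collections
import re

# pattern inferred from the Article_Subject column above (an assumption).
SUBJECT_PATTERN = re.compile(
    r"(?P<subject_id>\d+) \(AS\) - (?P<person>.+?) "
    r"\( id = (?P<person_id>\d+); capture_method = (?P<capture_method>\S+) \) "
    r"\((?P<subject_type>\w+); individual\) ==> name: (?P<name>.+?)"
    r"(?: ==> organization: (?P<organization>.+))?$"
)

def parse_row( row_IN ):

    """Split one "ID | Article | Article_Data | Article_Subject | Type |" row."""

    cells = [ cell.strip() for cell in row_IN.split( "|" ) ]

    # each row ends with a pipe, so drop the empty cell after it.
    if cells and cells[ -1 ] == "":
        cells.pop()

    row_OUT = { "id": cells[ 0 ], "article": cells[ 1 ], "article_data": cells[ 2 ], "type": cells[ 4 ] }

    # pull the pieces out of the Article_Subject text, if it matches.
    match = SUBJECT_PATTERN.search( cells[ 3 ] )
    if match is not None:
        row_OUT.update( match.groupdict() )

    return row_OUT

#-- END function parse_row() --#

# a few rows copied from the table above.
sample_rows = [
    "8618 | Article 20739 | Article_Data 2980 | 11006 (AS) - Christopher ( id = 2776; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Christopher | CORRECT |",
    "9690 | Article 21116 | Article_Data 3044 | 11288 (AS) - More ( id = 2817; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: More | ERROR |",
    "10464 | Article 22690 | Article_Data 2666 | 9493 (AS) - Obama ( id = 842; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Obama ==> organization: United States | CORRECT |",
    "8901 | Article 22874 | Article_Data 3182 | 11949 (AS) - Benthem, Amber ( id = 1929; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Amber | LOOKUP ERROR |",
]

parsed_rows = [ parse_row( row ) for row in sample_rows ]
type_counts = collections.Counter( row[ "type" ] for row in parsed_rows )
print( type_counts )
```

Pasting the full table into `sample_rows` would give the count for each Type value, and the `capture_method` field could be tallied the same way.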
In [ ]:
# folded this code into the Reliability_Names delete screen ( context_analysis/views.py --> reliability_names_disagreement_view() ).
'''
reliability_names_id = "7956"
article_id = "21509"
article_data_id = "1660"
article_subject = "5498 (AS) - Jaidon ( id = 875; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Jaidon"
markdown_string = "| "
markdown_string += reliability_names_id
markdown_string += " | Article ["
markdown_string += article_id
markdown_string += "](http://research.local/research/context/text/article/article_data/view_with_text/?article_id="
markdown_string += article_id
markdown_string += ") | Article_Data ["
markdown_string += article_data_id
markdown_string += "](http://research.local/research/context/text/article/article_data/view/?article_id="
markdown_string += article_id
markdown_string += "&article_data_id_select="
markdown_string += article_data_id
markdown_string += ") | "
markdown_string += article_subject
markdown_string += " |"
print( "Reliability_Names removal Markdown:\n" + markdown_string )
'''
For some, a single-name detection by Calais needs to be merged with a full-name detection by ground_truth. This happens when an OpenCalais error (it did not detect the full name) combines with a lookup error (it did not look up the right person because part of his or her name was missing). In these cases, one or more duplicate rows were still deleted after the merge.
"event_type" set to "merge" in the Reliability_Names_Evaluation table in django: http://research.local/research/admin/context_analysis/reliability_names_evaluation/?event_type__exact=merge&label=prelim_month&o=-1.7.8.3.5

ID FROM | ID INTO | Article | Article_Data | Article_Subject |
---|---|---|---|---|
9506 | 9507 | Article 21049 | FROM 3034 TO 2443 | 8494 (AS) - Reyes, Ivette ( id = 1899; capture_method = None ) (quoted; individual) ( quotes: 1; mentions: 1 ) ==> Name: Ivette Reyes |
9992 | 9993 | Article 22281 | FROM 3083 TO 2635 | 9369 (AS) - Tassell, Leslie ( id = 2328; capture_method = None ) (mentioned; individual) ==> name: Leslie E. Tassell |
9989 | 9988 | Article 23169 | FROM 3195 TO 2719 | 12020 (AS) - Keller ( id = 2903; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Keller |
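The django admin links in this log filter the Reliability_Names_Evaluation list through query-string parameters. A small sketch of building such a link with the standard library follows. `make_admin_filter_url` is a hypothetical helper, and the parameter names are simply copied from the URLs above.

```python
from urllib.parse import urlencode

# admin list URL copied from the links in this log.
ADMIN_LIST_URL = "http://research.local/research/admin/context_analysis/reliability_names_evaluation/"

def make_admin_filter_url( **filters_IN ):

    """Append the filters passed in to the admin list URL as a query string."""

    return ADMIN_LIST_URL + "?" + urlencode( filters_IN )

#-- END function make_admin_filter_url() --#

# same filters as the "merge" link above.
merge_url = make_admin_filter_url( event_type__exact = "merge", label = "prelim_month", o = "-1.7.8.3.5" )
print( merge_url )

# same filters as the single-name link earlier in the log.
single_name_url = make_admin_filter_url( is_single_name__exact = 1, label = "prelim_month", o = "-1.7.8.3.5" )
print( single_name_url )
```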
In [ ]:
reliability_names_id_from = "9989"
reliability_names_id_to = "9988"
article_id = "23169"
article_data_id_from = "3195"
article_data_id_to = "2719"
article_subject = "12020 (AS) - Keller ( id = 2903; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Keller"
markdown_string = "| "
markdown_string += reliability_names_id_from
markdown_string += " | "
markdown_string += reliability_names_id_to
markdown_string += " | Article ["
markdown_string += article_id
markdown_string += "](http://research.local/research/context/text/article/article_data/view_with_text/?article_id="
markdown_string += article_id
markdown_string += ") | FROM ["
markdown_string += article_data_id_from
markdown_string += "](http://research.local/research/context/text/article/article_data/view/?article_id="
markdown_string += article_id
markdown_string += "&article_data_id_select="
markdown_string += article_data_id_from
markdown_string += ") TO ["
markdown_string += article_data_id_to
markdown_string += "](http://research.local/research/context/text/article/article_data/view/?article_id="
markdown_string += article_id
markdown_string += "&article_data_id_select="
markdown_string += article_data_id_to
markdown_string += ") | "
markdown_string += article_subject
markdown_string += " |"
print( "Reliability_Names merge Markdown:\n" + markdown_string )
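The Markdown-row cells in this log repeat the same string concatenation with different values. A minimal sketch of a helper that builds the merge row is below. The function names (`article_link`, `article_data_link`, `build_merge_row`) are hypothetical, not part of context_text or context_analysis; the URLs are the ones used in the cell above.

```python
# Base URL taken from the cells above; helper names are illustrative only.
BASE_URL = "http://research.local/research/context/text/article/article_data"

def article_link( article_id ):
    # Markdown link to the "view_with_text" page for an article.
    return "[{0}]({1}/view_with_text/?article_id={0})".format( article_id, BASE_URL )

def article_data_link( article_id, article_data_id ):
    # Markdown link to the "view" page for one Article_Data of an article.
    return "[{0}]({1}/view/?article_id={2}&article_data_id_select={0})".format( article_data_id, BASE_URL, article_id )

def build_merge_row( id_from, id_to, article_id, article_data_id_from, article_data_id_to, article_subject ):
    # Same cell order as the merge table: ID FROM, ID INTO, Article, Article_Data, Article_Subject.
    cells = [
        str( id_from ),
        str( id_to ),
        "Article " + article_link( article_id ),
        "FROM " + article_data_link( article_id, article_data_id_from ) + " TO " + article_data_link( article_id, article_data_id_to ),
        article_subject,
    ]
    return "| " + " | ".join( cells ) + " |"

print( "Reliability_Names merge Markdown:\n" + build_merge_row( "9989", "9988", "23169", "3195", "2719", "12020 (AS) - Keller" ) )
```

The removal and ground_truth fix rows could reuse the two link helpers with a different list of cells.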
For a few, the error will be on the part of the human coder. For human error, we create a new "ground_truth" record that we will correct, so we preserve original coding (and evidence of errors) in case we want or need that information later.
In [ ]:
coder_id = 9
reliability_names_id = "9720"
article_id = "21130"
article_data_id = "2489"
article_subject = "8719 (AS) - Krueger ( id = 2015; capture_method = None ) (quoted; individual) ==> name: Krueger ==> organization: Ottawa County; Republican; Republican Party"
markdown_string = "| "
markdown_string += reliability_names_id
markdown_string += " | Article ["
markdown_string += article_id
markdown_string += "](http://research.local/research/context/text/article/article_data/view_with_text/?article_id="
markdown_string += article_id
markdown_string += ") | Article_Data ["
markdown_string += article_data_id
markdown_string += "](http://research.local/research/context/text/article/article_data/view/?article_id="
markdown_string += article_id
markdown_string += "&article_data_id_select="
markdown_string += article_data_id
markdown_string += ") | "
markdown_string += "Coder=" + str( coder_id )
markdown_string += " | "
markdown_string += article_subject
markdown_string += " |"
print( "Reliability_Names ground_truth fix Markdown:\n" + markdown_string )
Retrieve the ground truth user, then make a deep copy of an Article_Data record, assigning it to the ground truth user.
In [ ]:
# declare variables
ground_truth_user = None
ground_truth_user_id = -1
id_of_article_data_to_copy = -1
new_article_data = None
new_article_data_id = -1
validation_error_list = None
validation_error_count = -1
validation_error = None
# set ID of article data we want to copy.
id_of_article_data_to_copy = 2489
# get the ground_truth user's ID.
ground_truth_user = ContextTextBase.get_ground_truth_coding_user()
ground_truth_user_id = ground_truth_user.id
# make the copy
new_article_data = Article_Data.make_deep_copy( id_of_article_data_to_copy,
new_coder_user_id_IN = ground_truth_user_id )
new_article_data_id = new_article_data.id
# validate it.
validation_error_list = Article_Data_Copy_Tester.validate_article_data_deep_copy( original_article_data_id_IN = id_of_article_data_to_copy,
copy_article_data_id_IN = new_article_data_id,
copy_coder_user_id_IN = ground_truth_user_id )
# get error count:
validation_error_count = len( validation_error_list )
if ( validation_error_count > 0 ):
    # loop and output messages
    for validation_error in validation_error_list:
        print( "- Validation error: " + str( validation_error ) )
    #-- END loop over validation errors. --#
else:
    # no errors - success!
    print( "Record copy a success (as far as we know)!" )
#-- END check to see if validation errors --#
print( "copied Article_Data id " + str( id_of_article_data_to_copy ) + " INTO Article_Data id " + str( new_article_data_id ) + " at " + str( datetime.datetime.now() ) )
Delete the Article_Data whose ID you specify (intended only for when you accidentally create a "ground_truth" record).
In [ ]:
# declare variables
article_data_id = -1
article_data = None
do_delete = False
# set ID.
article_data_id = 3314
# get model instance - get() raises DoesNotExist instead of returning None.
try:
    article_data = Article_Data.objects.get( id = article_data_id )
except Article_Data.DoesNotExist:
    article_data = None
    print( "No Article_Data found for ID " + str( article_data_id ) )
#-- END try/except around lookup. --#
# got something?
if ( article_data is not None ):
    # yes. Delete?
    if ( do_delete == True ):
        # delete.
        print( "Deleting Article_Data: " + str( article_data ) )
        article_data.delete()
    else:
        # no delete.
        print( "Found Article_Data: " + str( article_data ) + ", but not deleting." )
    #-- END check to see if we delete --#
#-- END check to see if Article_Data match. --#
Reliability_Names for an article
Steps:
set up a call to the Reliability_Names program that just generates data for:
Reliability_Names for article
In [ ]:
# declare variables
article_id = -1
label = ""
do_delete = False
row_string_list = None
# first, get existing Reliability_Names rows for article and label.
article_id = 21130
label = "prelim_month"
#do_delete = True
# Do the delete
row_string_list = Reliability_Names.delete_reliabilty_names_for_article( article_id,
label_IN = label,
do_delete_IN = do_delete )
# print the strings.
for row_string in row_string_list:
    # print it.
    print( row_string )
#-- END loop over row strings --#
Reliability_Names
In [ ]:
# django imports
#from django.contrib.auth.models import User
# sourcenet imports
#from context_text.shared.context_text_base import ContextTextBase
# context_analysis imports
#from context_analysis.reliability.reliability_names_builder import ReliabilityNamesBuilder
# declare variables
my_reliability_instance = None
tag_in_list = []
article_id_in_list = []
label = ""
# declare variables - user setup
current_coder = None
current_coder_id = -1
current_index = -1
# declare variables - Article_Data filtering.
coder_type = ""
# make reliability instance
my_reliability_instance = ReliabilityNamesBuilder()
#===============================================================================
# configure
#===============================================================================
# list of tags of articles we want to process.
tag_in_list = [ "grp_month", ]
# list of IDs of articles we want to process:
article_id_in_list = [ 21130, ]
# label to associate with results, for subsequent lookup.
label = "prelim_month"
# ! ====> map coders to indices
# set it up so that...
# ...the ground truth user has highest priority (4) for index 1...
current_coder = ContextTextBase.get_ground_truth_coding_user()
current_coder_id = current_coder.id
current_index = 1
current_priority = 4
my_reliability_instance.add_coder_at_index( current_coder_id, current_index, priority_IN = current_priority )
# ...coder ID 8 is priority 3 for index 1...
current_coder_id = 8
current_index = 1
current_priority = 3
my_reliability_instance.add_coder_at_index( current_coder_id, current_index, priority_IN = current_priority )
# ...coder ID 9 is priority 2 for index 1...
current_coder_id = 9
current_index = 1
current_priority = 2
my_reliability_instance.add_coder_at_index( current_coder_id, current_index, priority_IN = current_priority )
# ...coder ID 10 is priority 1 for index 1...
current_coder_id = 10
current_index = 1
current_priority = 1
my_reliability_instance.add_coder_at_index( current_coder_id, current_index, priority_IN = current_priority )
# ...and automated coder (2) is index 2
current_coder = ContextTextBase.get_automated_coding_user()
current_coder_id = current_coder.id
current_index = 2
current_priority = 1
my_reliability_instance.add_coder_at_index( current_coder_id, current_index, priority_IN = current_priority )
# and only look at coding by those users. And...
# configure so that it limits to automated coder_type of OpenCalais_REST_API_v2.
coder_type = "OpenCalais_REST_API_v2"
#my_reliability_instance.limit_to_automated_coder_type = "OpenCalais_REST_API_v2"
my_reliability_instance.automated_coder_type_include_list.append( coder_type )
# output debug JSON to file
#my_reliability_instance.debug_output_json_file_path = "/home/jonathanmorgan/" + label + ".json"
#===============================================================================
# process
#===============================================================================
# process articles
my_reliability_instance.process_articles( tag_in_list,
article_id_in_list_IN = article_id_in_list )
# output to database.
my_reliability_instance.output_reliability_data( label )
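To make the index/priority setup above concrete, here is a minimal plain-Python sketch of one way the mapping could be resolved. It assumes that, for each index, the coder with the highest priority who actually coded the article is the one used. That is an assumption about ReliabilityNamesBuilder, not something confirmed in this log. `pick_coder_for_index` and `GROUND_TRUTH_ID` are hypothetical; the automated coder ID of 2 is read from the comment above and may not be the real user ID.

```python
def pick_coder_for_index( index_to_priorities, index, coders_with_data ):
    # index_to_priorities: { index: { coder_id: priority } }
    # coders_with_data: set of coder IDs that have Article_Data for the article.
    candidates = index_to_priorities.get( index, {} )
    available = [ coder_id for coder_id in candidates if coder_id in coders_with_data ]
    if ( len( available ) == 0 ):
        # nobody mapped to this index coded the article.
        return None
    # assumed rule: larger priority value wins.
    return max( available, key = lambda coder_id: candidates[ coder_id ] )

GROUND_TRUTH_ID = 100  # hypothetical ID for the ground truth user
AUTOMATED_ID = 2       # assumption, from the "automated coder (2)" comment

index_to_priorities = {
    1: { GROUND_TRUTH_ID: 4, 8: 3, 9: 2, 10: 1 },
    2: { AUTOMATED_ID: 1 },
}

# article coded by coders 9 and 10 plus the automated coder.
print( pick_coder_for_index( index_to_priorities, 1, { 9, 10, AUTOMATED_ID } ) )
```

Under this assumed rule, a corrected "ground_truth" copy always takes precedence over the original human coding for index 1, which is why the copy is assigned to the ground truth user.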
Notes and questions:
what to do about a misspelled name within an article? Single name - removing all. But making note:
In article 21080, Reliability_Names 9583, name = Culter, should have been Cutler - quoted, graf: 13, index: 1322
Article 22858 - 8834 - Schwaraswak, should have been Scharaswak.
What to do about single last name that is the correct last name of a person where the other name parts were detected by a person? Leave it in and map it to the correct Article_Data?
Article 23223 | Article_Data 3212 | 12096 (AS) - Linda ( id = 2911; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Linda |
Portion of Song title: | 10448 | Article 23491 | Article_Data 3249 | 12299 (AS) - Twinkle ( id = 2938; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Twinkle |
Errors:
Article 21116
Article 22765
Article 23055
Article 23491
Article 23559
Article 23631
Article 23631
Article 23921
Article 23974
Article 21080
Debugging:
figure out why removing "Krueger" from the "ground_truth" coder's Article_Data doesn't cause it to be removed from the Reliability_Names output.
Issue: Single name resolves to wrong person
In [ ]:
# initialization
test_manual_article_coder = None
test_person_details = None
test_article_subject = None
test_user = None
# create ManualArticleCoder
test_manual_article_coder = ManualArticleCoder()
# get test user
test_user = ContextTextBase.get_automated_coding_user()
12567 (AS) - Straayer, Mason ( id = 2504; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Mason
Details:
In [ ]:
name_string = "Amber"
should_be = 0
test_qs = Person.find_person_from_name( name_string )
match_count = test_qs.count()
error_string = "==> Person.find_person_from_name( " + name_string + " ) --> " + str( match_count ) + " should = " + str( should_be )
print( error_string )
person_counter = 0
for person_instance in test_qs:
    person_counter += 1
    print( "Person " + str( person_counter ) + ": " + str( person_instance ) )
#-- END loop over person matches --#
In [ ]:
# test person values
lookup_person_id = -1
lookup_person_name = "Amber"
lookup_title = ""
# get ManualArticleCoder instance - in init above
#test_manual_article_coder = ManualArticleCoder()
# set up person details
test_person_details = PersonDetails()
test_person_details[ ArticleCoder.PARAM_PERSON_NAME ] = lookup_person_name
#test_person_details[ ArticleCoder.PARAM_PERSON_ID ] = lookup_person_id
#test_person_details[ ArticleCoder.PARAM_TITLE ] = lookup_title
# make test Article_Subject
test_article_subject = Article_Subject()
# lookup person - returns person and confidence score inside
# Article_Subject instance.
test_article_subject = test_manual_article_coder.lookup_person( test_article_subject,
lookup_person_name,
create_if_no_match_IN = False,
update_person_IN = False,
person_details_IN = test_person_details )
# get results from Article_Subject
test_person = test_article_subject.person
test_person_match_list = test_article_subject.person_match_list # list of Person instances
# output results
print( "==> ManualArticleCoder.lookup_person() result for name \"" + lookup_person_name + "\": " + str( test_person ) )
print( "ManualArticleCoder.lookup_person() match list: " + str( test_person_match_list ) )
In [ ]:
# create ManualArticleCoder instance.
#test_manual_article_coder = ManualArticleCoder()
# get an article.
test_article = Article.objects.get( pk = 22874 )
# create bare-bones Article_Data
test_article_data = Article_Data()
test_article_data.coder = test_user
test_article_data.article = test_article
test_article_data.save()
#----------------------------------------------------------------------#
# !test 1 - with person ID.
#----------------------------------------------------------------------#
# retrieve person information.
person_name = "Amber"
title = ""
person_id = -1
# set up person details
person_details = PersonDetails()
person_details[ ManualArticleCoder.PARAM_PERSON_NAME ] = person_name
person_details[ ManualArticleCoder.PARAM_NEWSPAPER_INSTANCE ] = test_article.newspaper
# got a title?
if ( ( title is not None ) and ( title != "" ) ):
    # we do. store it in person_details.
    person_details[ ManualArticleCoder.PARAM_TITLE ] = title
#-- END check to see if title --#
# got a person ID?
if ( ( person_id is not None ) and ( person_id != "" ) and ( person_id > 0 ) ):
    # we do. store it in person_details.
    person_details[ ManualArticleCoder.PARAM_PERSON_ID ] = person_id
#-- END check to see if person ID --#
# create an article_subject.
test_article_subject = test_manual_article_coder.process_subject_name( test_article_data, person_name, person_details_IN = person_details )
# check to make sure not None
status_message = "==> ArticleCoder.process_subject_name() for name \"" + str( person_name ) + "\": " + str( test_article_subject )
print( status_message )
Details:
In [ ]:
name_string = "Logan"
should_be = 0
test_qs = Person.find_person_from_name( name_string )
match_count = test_qs.count()
error_string = "==> Person.find_person_from_name( " + name_string + " ) --> " + str( match_count ) + " should = " + str( should_be )
print( error_string )
person_counter = 0
for person_instance in test_qs:
    person_counter += 1
    print( "Person " + str( person_counter ) + ": " + str( person_instance ) )
#-- END loop over person matches --#
In [ ]:
# test person values
lookup_person_id = -1
lookup_person_name = "Logan"
lookup_title = ""
# get ManualArticleCoder instance
test_manual_article_coder = ManualArticleCoder()
# set up person details
test_person_details = PersonDetails()
test_person_details[ ArticleCoder.PARAM_PERSON_NAME ] = lookup_person_name
#test_person_details[ ArticleCoder.PARAM_PERSON_ID ] = lookup_person_id
#test_person_details[ ArticleCoder.PARAM_TITLE ] = lookup_title
# make test Article_Subject
test_article_subject = Article_Subject()
# lookup person - returns person and confidence score inside
# Article_Subject instance.
test_article_subject = test_manual_article_coder.lookup_person( test_article_subject,
lookup_person_name,
create_if_no_match_IN = False,
update_person_IN = False,
person_details_IN = test_person_details )
# get results from Article_Subject
test_person = test_article_subject.person
test_person_match_list = test_article_subject.person_match_list # list of Person instances
# output results
print( "==> ManualArticleCoder.lookup_person() result for name \"" + lookup_person_name + "\": " + str( test_person ) )
print( "ManualArticleCoder.lookup_person() match list: " + str( test_person_match_list ) )
In [ ]:
# create ManualArticleCoder instance.
#test_manual_article_coder = ManualArticleCoder()
# get an article.
test_article = Article.objects.get( pk = 22970 )
# create bare-bones Article_Data
test_article_data = Article_Data()
test_article_data.coder = test_user
test_article_data.article = test_article
test_article_data.save()
#----------------------------------------------------------------------#
# !test 1 - with person ID.
#----------------------------------------------------------------------#
# retrieve person information.
person_name = "Logan"
title = ""
person_id = -1
# set up person details
person_details = PersonDetails()
person_details[ ManualArticleCoder.PARAM_PERSON_NAME ] = person_name
person_details[ ManualArticleCoder.PARAM_NEWSPAPER_INSTANCE ] = test_article.newspaper
# got a title?
if ( ( title is not None ) and ( title != "" ) ):
    # we do. store it in person_details.
    person_details[ ManualArticleCoder.PARAM_TITLE ] = title
#-- END check to see if title --#
# got a person ID?
if ( ( person_id is not None ) and ( person_id != "" ) and ( person_id > 0 ) ):
    # we do. store it in person_details.
    person_details[ ManualArticleCoder.PARAM_PERSON_ID ] = person_id
#-- END check to see if person ID --#
# create an article_subject.
test_article_subject = test_manual_article_coder.process_subject_name( test_article_data, person_name, person_details_IN = person_details )
# check to make sure not None
status_message = "==> ArticleCoder.process_subject_name() for name \"" + str( person_name ) + "\": " + str( test_article_subject )
print( status_message )
Write SQL query to find all Article_Subject rows where the name in Article_Subject is one word but the associated Person has multiple name parts.
Then, go through those in the disagreements screen.
Look for Article_Subjects with a single name word (like "Logan" or "Amber") and an associated Person that has more than one name (for reason why, see Issue: Single name resolves to wrong person below).
One to test on: Article 23476 - 12244 (AS) - Broaddus, Adrienne ( id = 785; capture_method = OpenCalais_REST_API ) (mentioned; individual) ==> name: Adrienne
SELECT sas.id AS sas_id, sas.article_data_id, sas.name, sas.person_id, sas.lookup_name, sp.id AS sp_id, sp.full_name_string, sad.id AS sad_id, sad.article_id
FROM context_text_article_subject sas
INNER JOIN context_text_person sp ON sas.person_id = sp.id
INNER JOIN context_text_article_data sad ON sas.article_data_id = sad.id
WHERE SPLIT_PART( sas.name, ' ', 2 ) = ''
AND SPLIT_PART( sp.full_name_string, ' ', 2 ) != ''
ORDER BY sas.id DESC;
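The WHERE clause above relies on SPLIT_PART returning an empty string when the requested field does not exist. A minimal Python sketch of the same test, useful for checking edge cases before trusting the counts, is below. `split_part` here is a hand-written emulation for positive field numbers only, and `is_single_name_mismatch` is a hypothetical name; the sample full names are illustrative, not pulled from the database.

```python
def split_part( value, delimiter, n ):
    # Emulates PostgreSQL SPLIT_PART for positive n:
    # 1-based field number, empty string if there is no such field.
    parts = value.split( delimiter )
    if ( 1 <= n <= len( parts ) ):
        return parts[ n - 1 ]
    return ""

def is_single_name_mismatch( subject_name, person_full_name ):
    # Same condition as the WHERE clause: subject name has no second
    # space-delimited part, but the Person's full name does.
    subject_is_single = ( split_part( subject_name, " ", 2 ) == "" )
    person_is_multi = ( split_part( person_full_name, " ", 2 ) != "" )
    return subject_is_single and person_is_multi

print( is_single_name_mismatch( "Adrienne", "Adrienne Broaddus" ) )
print( is_single_name_mismatch( "Keller", "Keller" ) )
# Edge case: a doubled space makes the second part empty, so "A  B"
# is treated like a single name by this test.
print( repr( split_part( "A  B", " ", 2 ) ) )
```

Names with doubled or trailing spaces would be counted as single names by this condition, so the 23 rows are worth eyeballing rather than taking as an exact count.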
23 instances, across 21 articles, where a single name corresponds to a person with multiple name parts.
SELECT DISTINCT( sad.article_id )
FROM context_text_article_subject sas
INNER JOIN context_text_person sp ON sas.person_id = sp.id
INNER JOIN context_text_article_data sad ON sas.article_data_id = sad.id
WHERE SPLIT_PART( sas.name, ' ', 2 ) = ''
AND SPLIT_PART( sp.full_name_string, ' ', 2 ) != ''
ORDER BY sad.article_id ASC;
Article IDs:
If you limit to just articles with tag "grp_month", you are down to 15:
SELECT DISTINCT( sad.article_id )
FROM context_text_article_subject sas
INNER JOIN context_text_person sp ON sas.person_id = sp.id
INNER JOIN context_text_article_data sad ON sas.article_data_id = sad.id
INNER JOIN taggit_taggeditem tti ON sad.article_id = tti.object_id
WHERE SPLIT_PART( sas.name, ' ', 2 ) = ''
AND SPLIT_PART( sp.full_name_string, ' ', 2 ) != ''
AND tti.content_type_id = 13
AND tti.tag_id = 14
ORDER BY sad.article_id ASC;
Article IDs:
Debugging - incorrect lookup of single name. Make sure there is a unit test created for this (will have to go look at the contents of the test database for a person where there is only one record with their first name).
In [ ]:
# load test data fixture JSON into memory.
fixture_directory_path = '/home/jonathanmorgan/work/django/research/context/text/fixtures'
test_data_fixture_file_name = 'sourcenet_unittest_data.json'
fixture_file_path = fixture_directory_path + "/" + test_data_fixture_file_name
# constants-ish
person_model_name = "sourcenet.person"
json_name_model = "model"
json_name_fields = "fields"
json_name_field_first_name = "first_name"
json_name_field_last_name = "last_name"
# tracking counts of names
model_name_to_count_map = {}
model_none_count = -1
record_count = -1
person_count = -1
first_name_to_info_map = {}
first_name_to_count_map = {}
data_record = None
person_info = None
person_fields = None
model = ""
first_name = ""
first_name_count = -1
first_name_none_count = -1
last_name = ""
# read file and use json to load it.
with open( fixture_file_path, 'r' ) as fixture_file:
    # load into memory and parse.
    fixture_data = json.load( fixture_file )
#-- END with open( fixture_file_path )... --#
# how many things?
print( "There are " + str( len( fixture_data ) ) + " items in fixture " + fixture_file_path + " - type: " + str( type( fixture_data ) ) )
# loop over records.
model_none_count = 0
record_count = 0
person_count = 0
person_info = None
first_name_none_count = 0
first_name_empty_count = 0
for data_record in fixture_data:
    record_count += 1
    # First, check to see if type is sourcenet.person
    model = data_record.get( json_name_model, None )
    if ( ( model is not None ) and ( model == person_model_name ) ):
        if model not in model_name_to_count_map:
            # add to count map with count set to 1.
            model_name_to_count_map[ model ] = 1
        else:
            # increment counter
            model_count = model_name_to_count_map[ model ]
            model_count += 1
            model_name_to_count_map[ model ] = model_count
        #-- END check to see if model in model_name_set. --#
        # it is a person.
        person_count += 1
        person_info = data_record
        person_fields = person_info.get( json_name_fields, None )
        # make sure we have fields
        if ( person_fields is not None ):
            # we have fields.
            # get first name
            first_name = person_fields.get( json_name_field_first_name, None )
            #print( "First name string = " + str( first_name ) )
            last_name = person_fields.get( json_name_field_last_name, None )
            # Got one?
            if ( first_name is not None ):
                # ...that is not empty
                first_name = first_name.strip()
                if ( first_name != "" ):
                    # yes. Check for first name in count map.
                    if ( first_name in first_name_to_count_map ):
                        # it is there. Increment count and clear out person_info...
                        first_name_count = first_name_to_count_map[ first_name ]
                        first_name_count += 1
                        first_name_to_count_map[ first_name ] = first_name_count
                        # ...and clear out the person info.
                        first_name_to_info_map[ first_name ] = None
                    else:
                        # it is not there. Set count to 1 and store person_info.
                        first_name_count = 1
                        first_name_to_count_map[ first_name ] = first_name_count
                        first_name_to_info_map[ first_name ] = person_info
                    #-- END check to see if first name. --#
                else:
                    # increment first_name_empty_count --#
                    first_name_empty_count += 1
                #-- END check for first name not empty. --#
            else:
                # first name is None.
                first_name_none_count += 1
            #-- END check for first name. --#
        else:
            print( "ERROR - no fields in " + str( person_info ) )
        #-- END check for fields --#
    else:
        if ( model is not None ):
            if model not in model_name_to_count_map:
                model_name_to_count_map[ model ] = 1
            else:
                model_count = -1
                model_count = model_name_to_count_map[ model ]
                model_count += 1
                model_name_to_count_map[ model ] = model_count
            #-- END check to see if model in model_name_set. --#
        else:
            model_none_count += 1
        #-- END check to see if model is None --#
    #-- END check to see if sourcenet.person --#
#-- END loop over data records. --#
print( "Found " + str( person_count ) + " sourcenet.person records ( " + str( len( first_name_to_count_map ) ) + " distinct first names ) out of " + str( record_count ) + " records." )
print( "first names to counts: " + str( first_name_to_count_map ) )
print( "==> Model name set = " + str( model_name_to_count_map ) + "; model_none_count = " + str( model_none_count ) )
print( "==> first_name_none_count = " + str( first_name_none_count ) )
print( "==> first_name_empty_count = " + str( first_name_empty_count ) )
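The loop above boils down to "find people whose first name appears exactly once in the fixture". A compact sketch of that same idea using `collections.Counter` is below, run against a tiny inline list instead of the real fixture file. `people_with_unique_first_name` is a hypothetical helper name. The "Anirban Basu" record mirrors the one found in the fixture; the two "Pat" records and the article record are made up for illustration.

```python
from collections import Counter

def people_with_unique_first_name( fixture_data, person_model_name = "sourcenet.person" ):
    # keep only person records.
    people = [ record for record in fixture_data if record.get( "model" ) == person_model_name ]

    def first_name_of( record ):
        # tolerate missing "fields" and None or blank first names.
        fields = record.get( "fields" ) or {}
        return ( fields.get( "first_name" ) or "" ).strip()

    # count non-empty first names, then keep people whose name occurs once.
    counts = Counter( first_name_of( person ) for person in people if first_name_of( person ) )
    return { first_name_of( person ): person for person in people if counts.get( first_name_of( person ) ) == 1 }

sample_fixture = [
    { "model": "sourcenet.person", "pk": 526, "fields": { "first_name": "Anirban", "last_name": "Basu" } },
    { "model": "sourcenet.person", "pk": 9001, "fields": { "first_name": "Pat", "last_name": "Example" } },  # made up
    { "model": "sourcenet.person", "pk": 9002, "fields": { "first_name": "Pat", "last_name": "Sample" } },   # made up
    { "model": "sourcenet.article", "pk": 1, "fields": {} },                                                 # made up
]

unique_people = people_with_unique_first_name( sample_fixture )
print( sorted( unique_people.keys() ) )
```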
In [ ]:
selected_person = first_name_to_info_map.get( 'Anirban', None )
print( "Selected person: " )
print( str( selected_person ) )
Selected person:
{
'model': 'sourcenet.person',
'pk': 526,
'fields':
{
'create_date': '2015-03-09T10:54:55.115',
'last_name': 'Basu',
'full_name_string': 'Anirban Basu',
'gender': 'male',
'nameparser_pickled': None,
'name_prefix': None,
'name_suffix': None,
'last_modified': '2015-03-09T10:55:06.865',
'nickname': None,
'original_name_string': None,
'first_name': 'Anirban',
'middle_name': '',
'is_ambiguous': False,
'notes': '',
'title': 'chief economist, Associated Builders and Contractors Inc.',
'capture_method': None
}
}
Adding 2 test cases to context_text/tests/article_coder/test_article_coder.py, function test_lookup_person():
#----------------------------------------------------------------------#
# !test 5 - Single name, single match test - should not match.
# - "Anirban" matches one person with that first name ( 526 - "Anirban Basu" )
# - should not be counted as a match.
# - No match, do create.
#----------------------------------------------------------------------#
#----------------------------------------------------------------------#
# !test 6 - 526 - Anirban Basu - use name.
#----------------------------------------------------------------------#
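A minimal plain-Python sketch of the rule that test 5 checks: a one-word lookup name should not be accepted as a match for a person who has more than one name part. This is not the actual Person.find_person_from_name() or lookup_person() logic; `name_parts`, `reject_single_name_match` and `filter_matches` are hypothetical names, and people are plain dicts shaped like the fixture's "fields".

```python
def name_parts( person_fields ):
    # collect the non-empty first, middle and last name parts.
    raw_parts = (
        person_fields.get( "first_name" ),
        person_fields.get( "middle_name" ),
        person_fields.get( "last_name" ),
    )
    return [ part.strip() for part in raw_parts if part and part.strip() ]

def reject_single_name_match( lookup_name, person_fields ):
    # True when a one-word lookup name is being matched to a multi-part person.
    lookup_is_single = ( len( lookup_name.split() ) == 1 )
    person_is_multi = ( len( name_parts( person_fields ) ) > 1 )
    return lookup_is_single and person_is_multi

def filter_matches( lookup_name, candidate_list ):
    # drop candidates that the single-name rule rejects.
    return [ person for person in candidate_list if not reject_single_name_match( lookup_name, person ) ]

# fields for person 526, from the fixture record shown earlier.
anirban_basu = { "first_name": "Anirban", "middle_name": "", "last_name": "Basu" }

print( filter_matches( "Anirban", [ anirban_basu ] ) )        # test 5 case
print( filter_matches( "Anirban Basu", [ anirban_basu ] ) )   # test 6 case
```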