2016.12.11 - work log - prelim_month - remove single names
In [ ]:
import datetime
print( "packages imported at " + str( datetime.datetime.now() ) )
In [ ]:
%pwd
First, initialize my dev django project, so I can run code in this notebook that references my django models and can talk to the database using my project's settings.
You must first install the virtualenv that contains django as a Jupyter kernel, then select that kernel for this notebook.
In [ ]:
%run django_init.py
Next, remove all reliability data that refers to a single name using the "View reliability name information" screen:
To start, enter the following in fields there:
You should see lots of entries where coders detected people who were mentioned only by their first name.
Need to look at each instance where a person has a single name part.
Most are probably instances where the computer correctly detected the name part, but where there is not enough of the name to match it to a person, so the human coding protocol directed coders not to capture the name fragment.
However, there might be some where a coder made a mistake and captured only a name part for a person whose full name was in the story. To check, click the link in the "Article ID" column. It will take you to a view of the article that includes everyone who coded the article, with each detection of a mention or quotation displayed next to the paragraph where the person was first detected.
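The "single name part" test described above can be sketched in a few lines of Python. This is only an illustration: the function name `is_single_name_part` is hypothetical, and it assumes a name is a plain string whose parts are separated by whitespace. The actual screen works from the application's own Reliability_Names data.

```python
def is_single_name_part(name):
    """Return True when a captured name has exactly one whitespace-separated part."""
    return len(name.split()) == 1


# Names taken from the records in this log.
for candidate in ["Christopher", "Obama", "Ivette Reyes"]:
    print(candidate, "->", is_single_name_part(candidate))
```

A check like this only flags candidates; each one still needs the manual review against the article text described in this section.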
So for each instance of a single name part:
click on the article ID link in the row to go to the article and check to see if there is a person whose name the fragment is a part of ( http://research.local/research/context/text/article/article_data/view_with_text/ ).
If there is a person with a full name to which the name fragment is a reference, check to see if the human coder has data for the full person.
if human coder has data for the full person, merge:
Configure:
this will bring up all coding for the article whose ID you entered.
if human coder did not detect person:
regenerate Reliability_Names for just that article.
merge the two Reliability_Names records for the person.
Remove the Reliability_Names row with the name fragment from reliability data.
To remove all matching records in this list, click the checkbox in the "select" column next to each one you want to delete (sorry, no "select all" just yet), choose "Delete selected" from the "Reliability names action:" field at the top of the list, then click the "Do action" button.
Reliability_Names records Removed:
ID | Article | Article_Data | Article_Subject |
---|---|---|---|
8618 | Article 20739 | Article_Data 2980 | 11006 (AS) - Christopher ( id = 2776; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Christopher |
8705 | Article 20843 | Article_Data 3000 | 11102 (AS) - Brock ( id = 2798; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Brock |
9163 | Article 20912 | Article_Data 3015 | 11147 (AS) - Slate ( id = 2801; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Slate |
9243 | Article 20936 | Article_Data 3002 | 11110 (AS) - Christine ( id = 2800; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Christine |
9506 | Article 21049 | Article_Data 3034 | 11232 (AS) - Reyes ( id = 2809; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Reyes |
9584 | Article 21080 | Article_Data 3037 | 11244 (AS) - Ben ( id = 2811; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Ben |
9594 | Article 21080 | Article_Data 3037 | 11249 (AS) - Carman ( id = 2814; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Carman |
9583 | Article 21080 | Article_Data 3037 | 11252 (AS) - Culter ( id = 2816; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Culter |
9590 | Article 21080 | Article_Data 3037 | 11243 (AS) - Emma ( id = 2810; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Emma |
9595 | Article 21080 | Article_Data 3037 | 11250 (AS) - Isabel ( id = 2815; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Isabel |
9592 | Article 21080 | Article_Data 3037 | 11245 (AS) - Tarina ( id = 2812; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Tarina |
9671 | Article 21109 | Article_Data 3045 | 11289 (AS) - Pat ( id = 2818; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Pat |
9681 | Article 21112 | Article_Data 3038 | 11255 (AS) - Obama ( id = 842; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Obama |
9687 | Article 21113 | Article_Data 3033 | 11225 (AS) - Steve ( id = 2806; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Steve |
9688 | Article 21113 | Article_Data 3033 | 11227 (AS) - Jay ( id = 2807; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Jay |
9684 | Article 21113 | Article_Data 3033 | 11228 (AS) - Jesse ( id = 2808; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Jesse |
9696 | Article 21117 | Article_Data 3049 | 8511 (AS) - Mary ( id = 1912; capture_method = None ) (mentioned; individual) ==> name: Mary |
9707 | Article 21121 | Article_Data 3048 | 11306 (AS) - Jesus ( id = 1451; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Jesus |
9690 | Article 21116 | Article_Data 3044 | 11288 (AS) - More ( id = 2817; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: More |
In [ ]:
# Identifiers for the Reliability_Names row being logged as removed (edit per record).
reliability_names_id = "9720"
article_id = "21130"
article_data_id = "3052"
article_subject = "11288 (AS) - More ( id = 2817; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: More"
markdown_string = "| "
markdown_string += reliability_names_id
markdown_string += " | Article ["
markdown_string += article_id
markdown_string += "](http://research.local/research/context/text/article/article_data/view_with_text/?article_id="
markdown_string += article_id
markdown_string += ") | Article_Data ["
markdown_string += article_data_id
markdown_string += "](http://research.local/research/context/text/article/article_data/view/?article_id="
markdown_string += article_id
markdown_string += "&article_data_id_select="
markdown_string += article_data_id
markdown_string += ") | "
markdown_string += article_subject
markdown_string += " |"
print( "Reliability_Names removal Markdown:\n" + markdown_string )
Reliability_Names records merged:
In [ ]:
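# Identifiers for the two Reliability_Names rows being merged, FROM one row TO the other (edit per merge).
reliability_names_id_from = "9506"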
reliability_names_id_to = "9507"
article_id = "21049"
article_data_id_from = "3034"
article_data_id_to = "2443"
article_subject = "8494 (AS) - Reyes, Ivette ( id = 1899; capture_method = None ) (quoted; individual) ( quotes: 1; mentions: 1 ) ==> Name: Ivette Reyes"
markdown_string = "| "
markdown_string += reliability_names_id_from
markdown_string += " | "
markdown_string += reliability_names_id_to
markdown_string += " | Article ["
markdown_string += article_id
markdown_string += "](http://research.local/research/context/text/article/article_data/view_with_text/?article_id="
markdown_string += article_id
markdown_string += ") | FROM ["
markdown_string += article_data_id_from
markdown_string += "](http://research.local/research/context/text/article/article_data/view/?article_id="
markdown_string += article_id
markdown_string += "&article_data_id_select="
markdown_string += article_data_id_from
markdown_string += ") TO ["
markdown_string += article_data_id_to
markdown_string += "](http://research.local/research/context/text/article/article_data/view/?article_id="
markdown_string += article_id
markdown_string += "&article_data_id_select="
markdown_string += article_data_id_to
markdown_string += ") | "
markdown_string += article_subject
markdown_string += " |"
print( "Reliability_Names merge Markdown:\n" + markdown_string )
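The two cells above build their rows by repeated string concatenation. A more compact way to produce the same Markdown might look like the sketch below. The helper names (`article_link`, `article_data_link`, `removal_row`, `merge_row`) and the `BASE_URL` constant are hypothetical, introduced only for illustration; the URLs and row layout are copied from the cells above.

```python
# Assumed base of the article_data views, taken from the URLs used in this notebook.
BASE_URL = "http://research.local/research/context/text/article/article_data"


def article_link(article_id):
    """Markdown link to the article view that shows coding next to the text."""
    return f"[{article_id}]({BASE_URL}/view_with_text/?article_id={article_id})"


def article_data_link(article_id, article_data_id):
    """Markdown link to one Article_Data record within an article."""
    return (
        f"[{article_data_id}]({BASE_URL}/view/"
        f"?article_id={article_id}&article_data_id_select={article_data_id})"
    )


def removal_row(reliability_names_id, article_id, article_data_id, article_subject):
    """Table row for a removed Reliability_Names record."""
    return (
        f"| {reliability_names_id} | Article {article_link(article_id)}"
        f" | Article_Data {article_data_link(article_id, article_data_id)}"
        f" | {article_subject} |"
    )


def merge_row(id_from, id_to, article_id, data_id_from, data_id_to, article_subject):
    """Table row for a merge of one Reliability_Names record into another."""
    return (
        f"| {id_from} | {id_to} | Article {article_link(article_id)}"
        f" | FROM {article_data_link(article_id, data_id_from)}"
        f" TO {article_data_link(article_id, data_id_to)}"
        f" | {article_subject} |"
    )


# Example with the merge values from the cell above (subject text shortened).
print(merge_row("9506", "9507", "21049", "3034", "2443", "Ivette Reyes"))
```

Keeping the link construction in one place means a change to the view URLs only has to be made once.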
Notes:
What to do about a misspelled name within an article? It is a single name, so it is removed along with the rest, but making a note:
In article 21080, Reliability_Names 9583, name = Culter, should have been Cutler - quoted, graf: 13, index: 1322
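One possible way to flag misspellings like this one is to compare a single-name fragment against the last names of people already coded in the same article. The sketch below uses `difflib.get_close_matches` from the Python standard library. The list `known_last_names` is a hypothetical stand-in; in practice it would have to come from the article's coding data, and the cutoff of 0.8 is an arbitrary assumption.

```python
import difflib

# Single-name fragment captured from the article (see the note above).
fragment = "Culter"

# Hypothetical list of last names for people coded in the same article.
known_last_names = ["Cutler", "Carman", "Reyes"]

# Keep only candidates whose similarity ratio to the fragment is at least the cutoff.
matches = difflib.get_close_matches(fragment, known_last_names, n=3, cutoff=0.8)
print(matches)
```

Any match would still need a manual check against the article text before remapping the record.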
What to do about single last name that is the correct last name of a person where the other name parts were detected by a person? Leave it in and map it to the correct Article_Data?
Errors:
Article 21116