2016.12.11 - work log - prelim_month - remove single names

Setup

Setup - Imports


In [ ]:
import datetime

print( "packages imported at " + str( datetime.datetime.now() ) )

In [ ]:
%pwd

Setup - Initialize Django

First, initialize my dev Django project. This lets me run code in this notebook that references my Django models and talks to the database using my project's settings.

Your virtualenv with Django must be installed as a Jupyter kernel, and that kernel must be selected for this notebook.


In [ ]:
%run django_init.py

Data cleanup

Remove single-name reliability data

Next, remove all reliability data that refers to a single name using the "View reliability name information" screen:

To start, enter the following in fields there:

  • Label: "prelim_month"
  • Coders to compare (1 through ==>): 2
  • Reliability names filter type: Select "Lookup"
  • [Lookup] - Person has first name, no other name parts.: CHECK the checkbox

You should see lots of entries where coders detected people who were mentioned only by their first name.
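The "[Lookup] - Person has first name, no other name parts." filter selects people whose only populated name part is a first name. Below is a minimal sketch of that test, plus the broader "exactly one name part" check this section is about. It assumes a person is a plain dict of name-part fields. The field names in `NAME_PART_FIELDS` are hypothetical and are not taken from the project's Person model.

```python
# Hypothetical name-part field names; the real Person model may differ.
NAME_PART_FIELDS = ("name_prefix", "first_name", "middle_name",
                    "last_name", "name_suffix", "nickname")


def name_parts_present(person):
    """Return the name-part fields that hold a non-blank value."""
    return [field for field in NAME_PART_FIELDS
            if (person.get(field) or "").strip()]


def has_first_name_only(person):
    """True when the first name is the only populated name part."""
    return name_parts_present(person) == ["first_name"]


def has_single_name_part(person):
    """True when exactly one name part (first, last, or other) is populated."""
    return len(name_parts_present(person)) == 1


print(has_first_name_only({"first_name": "Christopher"}))
print(has_first_name_only({"first_name": "Ivette", "last_name": "Reyes"}))
```

The second check matters because some single-name entries logged below look like last names (for example "Reyes" and "Obama"). A first-name-only test alone may or may not catch those, depending on how the name was parsed.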

Single-name data assessment

Need to look at each instance where a person has a single name part.

Most are probably instances where the computer correctly detected the name part. In these cases there isn't enough of the name to match it to a person, so the human coding protocol told coders not to capture the name fragment.

However, in some cases a coder may have made a mistake and captured only a name part for a person whose full name was in the story. To check, click the link in the "Article ID" column. It takes you to a view of the article that includes the coding from every coder of that article. Each detected mention or quotation is displayed next to the paragraph where the person was first detected.

So for each instance of a single name part:

  • click on the article ID link in the row to go to the article, and check whether there is a person whose name the fragment is part of ( http://research.local/research/context/text/article/article_data/view_with_text/ ).

    • If there is a person with a full name to which the name fragment is a reference, check to see if the human coder has data for the full person.

      • if human coder has data for the full person, merge:

        • go to the disagreement view page: http://research.local/research/context/analysis/reliability/names/disagreement/view
        • Configure:

          • Label: "prelim_month"
          • Coders to compare (1 through ==>): 2
          • Reliability names filter type: Select "Lookup"
          • [Lookup] - Associated Article IDs (comma-delimited): Enter the ID of the article the coding belonged to.
        • this will bring up all coding for the article whose ID you entered.

        • In the "select" column, click the checkbox in the row where there is a single name part that needs to be merged.
        • In the "merge INTO" column, click the checkbox in the row with the full name for that person.
        • In "Reliability Names Action", choose "Merge Coding --> FROM 1 SELECTED / INTO 1"
        • Click "Do Action" button.
      • if human coder did not detect person:

        • create a copy of the person's Article_Data.
        • assign it to coder "ground_truth".
        • add person to the new record.
        • save
        • regenerate Reliability_Names for just that article.

          • (?) Remove old Reliability_Names for that article.
          • re-run names creation for the article.
          • TK
        • merge the two Reliability_Names records for the person.

    • Remove the Reliability_Names row with the name fragment from reliability data.
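The branching in the checklist above can be summarized as a small decision function. This is only a sketch. `CodedPerson`, `next_step`, and the step labels are hypothetical names, and in practice the check is done by eye on the article view. Matching here is exact and case-insensitive on whole name parts, so a misspelled fragment will not match.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CodedPerson:
    """Hypothetical stand-in for one person found in the article."""
    full_name: str         # e.g. "Reyes, Ivette" or "Ivette Reyes"
    human_has_data: bool   # does the human coder have data for this person?


def name_parts(full_name: str) -> List[str]:
    """Lower-cased name parts, ignoring the comma in "Last, First" form."""
    return [part.lower() for part in full_name.replace(",", " ").split()]


def find_full_name(fragment: str, people: List[CodedPerson]) -> Optional[CodedPerson]:
    """First person with two or more name parts, one of which equals the fragment."""
    target = fragment.strip().lower()
    for person in people:
        parts = name_parts(person.full_name)
        if len(parts) > 1 and target in parts:
            return person
    return None


def next_step(fragment: str, people: List[CodedPerson]) -> str:
    """Which branch of the checklist applies to one single-name fragment."""
    person = find_full_name(fragment, people)
    if person is None:
        return "remove fragment"               # no full name to attach it to
    if person.human_has_data:
        return "merge INTO full name"          # Merge Coding FROM fragment INTO full name
    return "create ground_truth, then merge"   # copy Article_Data to ground_truth first


print(next_step("Reyes", [CodedPerson("Reyes, Ivette", True)]))
```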

Delete single-name data

To remove all matching records in this list:

  • click the checkbox in the "select" column next to each one you want to delete (sorry, no "select all" just yet).
  • choose "Delete selected" from the "Reliability names action:" field at the top of the list.
  • click the "Do action" button.

Reliability_Names records Removed:

| ID | Article | Article_Data | Article_Subject |
| --- | --- | --- | --- |
| 8618 | Article 20739 | Article_Data 2980 | 11006 (AS) - Christopher ( id = 2776; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Christopher |
| 8705 | Article 20843 | Article_Data 3000 | 11102 (AS) - Brock ( id = 2798; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Brock |
| 9163 | Article 20912 | Article_Data 3015 | 11147 (AS) - Slate ( id = 2801; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Slate |
| 9243 | Article 20936 | Article_Data 3002 | 11110 (AS) - Christine ( id = 2800; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Christine |
| 9506 | Article 21049 | Article_Data 3034 | 11232 (AS) - Reyes ( id = 2809; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Reyes |
| 9584 | Article 21080 | Article_Data 3037 | 11244 (AS) - Ben ( id = 2811; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Ben |
| 9594 | Article 21080 | Article_Data 3037 | 11249 (AS) - Carman ( id = 2814; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Carman |
| 9583 | Article 21080 | Article_Data 3037 | 11252 (AS) - Culter ( id = 2816; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Culter |
| 9590 | Article 21080 | Article_Data 3037 | 11243 (AS) - Emma ( id = 2810; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Emma |
| 9595 | Article 21080 | Article_Data 3037 | 11250 (AS) - Isabel ( id = 2815; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Isabel |
| 9592 | Article 21080 | Article_Data 3037 | 11245 (AS) - Tarina ( id = 2812; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Tarina |
| 9671 | Article 21109 | Article_Data 3045 | 11289 (AS) - Pat ( id = 2818; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Pat |
| 9681 | Article 21112 | Article_Data 3038 | 11255 (AS) - Obama ( id = 842; capture_method = OpenCalais_REST_API_v1 ) (mentioned; individual) ==> name: Obama |
| 9687 | Article 21113 | Article_Data 3033 | 11225 (AS) - Steve ( id = 2806; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Steve |
| 9688 | Article 21113 | Article_Data 3033 | 11227 (AS) - Jay ( id = 2807; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Jay |
| 9684 | Article 21113 | Article_Data 3033 | 11228 (AS) - Jesse ( id = 2808; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Jesse |
| 9696 | Article 21117 | Article_Data 3049 | 8511 (AS) - Mary ( id = 1912; capture_method = None ) (mentioned; individual) ==> name: Mary |
| 9707 | Article 21121 | Article_Data 3048 | 11306 (AS) - Jesus ( id = 1451; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: Jesus |
| 9690 | Article 21116 | Article_Data 3044 | 11288 (AS) - More ( id = 2817; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: More |

In [ ]:
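# Values for the Reliability_Names record being logged, copied from the reliability screens.
reliability_names_id = "9720"
article_id = "21130"
article_data_id = "3052"
article_subject = "11288 (AS) - More ( id = 2817; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: More"

# Build one Markdown row for the "Reliability_Names records Removed" table.
markdown_string = "| "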
markdown_string += reliability_names_id
markdown_string += " | Article ["
markdown_string += article_id
markdown_string += "](http://research.local/research/context/text/article/article_data/view_with_text/?article_id="
markdown_string += article_id
markdown_string += ") | Article_Data ["
markdown_string += article_data_id
markdown_string += "](http://research.local/research/context/text/article/article_data/view/?article_id="
markdown_string += article_id
markdown_string += "&article_data_id_select="
markdown_string += article_data_id
markdown_string += ") | "
markdown_string += article_subject
markdown_string += " |"

print( "Reliability_Names removal Markdown:\n" + markdown_string )

Reliability_Names records merged:

| ID FROM | ID INTO | Article | Article_Data | Article_Subject |
| --- | --- | --- | --- | --- |
| 9506 | 9507 | Article 21049 | FROM 3034 TO 2443 | 8494 (AS) - Reyes, Ivette ( id = 1899; capture_method = None ) (quoted; individual) ( quotes: 1; mentions: 1 ) ==> Name: Ivette Reyes |

In [ ]:
reliability_names_id_from = "9506"
reliability_names_id_to = "9507"
article_id = "21049"
article_data_id_from = "3034"
article_data_id_to = "2443"
article_subject = "8494 (AS) - Reyes, Ivette ( id = 1899; capture_method = None ) (quoted; individual) ( quotes: 1; mentions: 1 ) ==> Name: Ivette Reyes"

markdown_string = "| "
markdown_string += reliability_names_id_from
markdown_string += " | "
markdown_string += reliability_names_id_to
markdown_string += " | Article ["
markdown_string += article_id
markdown_string += "](http://research.local/research/context/text/article/article_data/view_with_text/?article_id="
markdown_string += article_id
markdown_string += ") | FROM ["
markdown_string += article_data_id_from
markdown_string += "](http://research.local/research/context/text/article/article_data/view/?article_id="
markdown_string += article_id
markdown_string += "&article_data_id_select="
markdown_string += article_data_id_from
markdown_string += ") TO ["
markdown_string += article_data_id_to
markdown_string += "](http://research.local/research/context/text/article/article_data/view/?article_id="
markdown_string += article_id
markdown_string += "&article_data_id_select="
markdown_string += article_data_id_to
markdown_string += ") | "
markdown_string += article_subject
markdown_string += " |"

print( "Reliability_Names merge Markdown:\n" + markdown_string )
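The two cells above build each Markdown row by string concatenation, with variables edited by hand for every record. Below is a sketch of the same rows as reusable Python 3 functions that use `urllib.parse.urlencode` for the query strings. The function names (`removal_row`, `merge_row`, and the helpers) are hypothetical. The base URLs and query parameters are copied from the cells, so change them if the server differs.

```python
from urllib.parse import urlencode

# Base URLs copied from the notebook cells above.
ARTICLE_TEXT_URL = "http://research.local/research/context/text/article/article_data/view_with_text/"
ARTICLE_DATA_URL = "http://research.local/research/context/text/article/article_data/view/"


def article_link(article_id):
    """Markdown link to the article view that shows all coders' data next to the text."""
    query = urlencode({"article_id": article_id})
    return "Article [{0}]({1}?{2})".format(article_id, ARTICLE_TEXT_URL, query)


def article_data_link(label, article_id, article_data_id):
    """Markdown link to one Article_Data record, prefixed with a label."""
    query = urlencode({"article_id": article_id,
                       "article_data_id_select": article_data_id})
    return "{0} [{1}]({2}?{3})".format(label, article_data_id, ARTICLE_DATA_URL, query)


def table_row(cells):
    """Join cell values into one Markdown table row."""
    return "| " + " | ".join(str(cell) for cell in cells) + " |"


def removal_row(reliability_names_id, article_id, article_data_id, article_subject):
    """Row for the "Reliability_Names records Removed" table."""
    return table_row([reliability_names_id,
                      article_link(article_id),
                      article_data_link("Article_Data", article_id, article_data_id),
                      article_subject])


def merge_row(id_from, id_into, article_id, data_id_from, data_id_to, article_subject):
    """Row for the "Reliability_Names records merged" table."""
    data_cell = (article_data_link("FROM", article_id, data_id_from) + " "
                 + article_data_link("TO", article_id, data_id_to))
    return table_row([id_from, id_into, article_link(article_id), data_cell, article_subject])


print(merge_row("9506", "9507", "21049", "3034", "2443",
                "8494 (AS) - Reyes, Ivette ( id = 1899; capture_method = None ) "
                "(quoted; individual) ( quotes: 1; mentions: 1 ) ==> Name: Ivette Reyes"))
```

The output matches the concatenated strings character for character. With this in place, logging another record is a single call instead of editing six variables.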

Notes

Notes:

  • What to do about a misspelled name within an article? It is a single name, so I am removing all of them, but making a note:

    • In article 21080, Reliability_Names 9583, name = Culter, should have been Cutler - quoted, graf: 13, index: 1322

      • single name, so remove it. But this will cut both ways: when both name parts are present, the match will sometimes work out and sometimes be a false positive.
  • What to do about a single last name that is the correct last name of a person whose other name parts were detected by a human coder? Leave it in and map it to the correct Article_Data?

  • Obama? One name, but a well-known one, and preceded by "President". Still, it is a single name, so I removed it.
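The notes above raise two related questions: a misspelled fragment ("Culter" for "Cutler") and a single last name that belongs to a person coded elsewhere in the article. One possible way to flag such candidates for manual review is to compare each fragment against the parts of the article's full names with `difflib.SequenceMatcher`. This is a suggestion, not what the project currently does. The function name and the 0.8 threshold are arbitrary assumptions.

```python
from difflib import SequenceMatcher


def best_name_part_match(fragment, full_names, threshold=0.8):
    """Return (full_name, part, score) for the closest name part, or None if below threshold."""
    best = None
    for full_name in full_names:
        # Treat "Last, First" and "First Last" the same way.
        for part in full_name.replace(",", " ").split():
            score = SequenceMatcher(None, fragment.lower(), part.lower()).ratio()
            if best is None or score > best[2]:
                best = (full_name, part, score)
    if best is not None and best[2] >= threshold:
        return best
    return None


# "Jane Cutler" is a hypothetical full name used only for illustration.
print(best_name_part_match("Culter", ["Jane Cutler", "Reyes, Ivette"]))
```

A transposition like Culter/Cutler still scores high here. The threshold trades missed misspellings against false positives, which is the same "cuts both ways" problem noted above.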

Errors:

  • Article 21116

    • Paragraph 12: More than 600 works of art were added to the museum's collection under her leadership, most notably Ellsworth Kelly's "Blue White," a 25-foot- tall wall sculpture that was commissioned in 2006 for the museum's entry pavilion.
    • User: 2 - automated (OpenCalais_REST_API_v2)
    • 11288 (AS) - More ( id = 2817; capture_method = OpenCalais_REST_API_v2 ) (mentioned; individual) ==> name: More
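Here the automated coder appears to have treated "More", the first word of a sentence ("More than 600 works of art..."), as a person's name. One cheap screen for this kind of false positive is to flag a single-name detection that is an ordinary word and only ever appears at the start of a sentence. This is a sketch under stated assumptions, not an existing feature. The word list is a hypothetical and deliberately tiny example, and sentence starts are approximated by the preceding punctuation.

```python
import re

# Hypothetical, deliberately tiny list of ordinary words that often start sentences.
COMMON_SENTENCE_STARTERS = {"more", "most", "many", "some", "after", "before"}


def is_sentence_start(text, pos):
    """True if pos begins the text or follows sentence-ending punctuation or a closing quote."""
    before = text[:pos].rstrip()
    return before == "" or before[-1] in ".!?\"'"


def looks_like_false_positive(name, paragraph):
    """Flag a single-name detection that is a common word and only appears at sentence starts."""
    if name.lower() not in COMMON_SENTENCE_STARTERS:
        return False
    pattern = r"\b" + re.escape(name) + r"\b"
    positions = [match.start() for match in re.finditer(pattern, paragraph)]
    return bool(positions) and all(is_sentence_start(paragraph, pos) for pos in positions)


print(looks_like_false_positive("More", "More than 600 works of art were added to the museum's collection."))
```

A real surname that is also a common word (for example "Thomas More") is not flagged when it appears mid-sentence. A sentence-initial occurrence of such a surname would still be flagged, so this can only mark rows for review, not delete them automatically.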

TODO

TODO:

  • TK

Coding to look into

Coding decisions to look at more closely:

Debugging

Debugging: