methods paper planning
In [1]:
import datetime
import six
print( "packages imported at " + str( datetime.datetime.now() ) )
If you are using a virtualenv, make sure it is available inside this notebook. Since I use a virtualenv, I need to get it activated somehow. One option is to run ../dev/wsgi.py in this notebook, to configure the Python environment manually as if you had activated the sourcenet virtualenv. To do this, you'd make a code cell that contains:
%run ../dev/wsgi.py
This is sketchy, however, because of the changes it makes to your Python environment within the context of whatever your current kernel is. I'd worry about collisions with the actual Python 3 kernel. Better: install your virtualenv as a separate kernel. Steps:
activate your virtualenv:
workon sourcenet
in your virtualenv, install the package ipykernel:
pip install ipykernel
use the ipykernel python program to install the current environment as a kernel (for this project, the env_name is sourcenet):
python -m ipykernel install --user --name <env_name> --display-name "<display_name>"
example:
python -m ipykernel install --user --name sourcenet --display-name "sourcenet (Python 3)"
More details: http://ipython.readthedocs.io/en/stable/install/kernel_install.html
In [2]:
%pwd
Out[2]:
First, initialize my dev django project, so I can run code in this notebook that references my django models and can talk to the database using my project's settings.
In [3]:
%run django_init.py
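For reference, a django_init.py for this kind of setup is usually just a couple of lines; this is a hypothetical sketch (the settings module name "research.settings" is assumed here, not taken from the actual file):

# Hypothetical sketch of a django_init.py - the actual file may differ.
import os
import django

# point Django at the project's settings module (module name assumed), then initialize.
os.environ.setdefault( "DJANGO_SETTINGS_MODULE", "research.settings" )
django.setup()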
In [4]:
# django imports
from context_analysis.models import Reliability_Names_Evaluation
In the methods folder, here is the order the code was run for analysis:
- data_creation (results in data folder)
- reliability - for human reliability.
- evaluate_disagreements - correct human coding when they are wrong in a disagreement, to create "ground truth".
- precision_recall
- reliability - for comparing human and computer coding to baseline.
- network_analysis
- results
Below are the criteria used for each paper to filter down to just locally-implemented hard news articles.
For actual code, see ./data_creation/data_creation-filter_locally_implemented_hard_news.ipynb
Definition of local hard news and in-house implementor:
Grand Rapids Press
context_text/examples/articles/articles-GRP-local_news.py
local hard news sections (stored in Article.GRP_NEWS_SECTION_NAME_LIST):
excluding any publications with index term of "Column".
in-house implementor (based on byline patterns, stored in sourcenet.models.Article.Q_GRP_IN_HOUSE_AUTHOR):
Byline ends in "/ THE GRAND RAPIDS PRESS", ignore case.
Q( author_varchar__iregex = r'.* */ *THE GRAND RAPIDS PRESS$' )
Byline ends in "/ PRESS * EDITOR", ignore case.
Q( author_varchar__iregex = r'.* */ *PRESS .* EDITOR$' )
Byline ends in "/ GRAND RAPIDS PRESS * BUREAU", ignore case.
Q( author_varchar__iregex = r'.* */ *GRAND RAPIDS PRESS .* BUREAU$' )
Byline ends in "/ SPECIAL TO THE PRESS", ignore case.
Q( author_varchar__iregex = r'.* */ *SPECIAL TO THE PRESS$' )
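As a sketch of how these byline patterns might be combined in practice (the canonical logic lives in context_text/examples/articles/articles-GRP-local_news.py and Article.Q_GRP_IN_HOUSE_AUTHOR; the "Column" exclusion is omitted here, and the section field name is an assumption):

# Hypothetical sketch - not the canonical filter code.
# Assumes Article has "section" and "author_varchar" fields, and that the
# byline Q() patterns above are simply OR'ed together.
from django.db.models import Q
from context_text.models import Article

grp_in_house_q = Q( author_varchar__iregex = r'.* */ *THE GRAND RAPIDS PRESS$' )
grp_in_house_q = grp_in_house_q | Q( author_varchar__iregex = r'.* */ *PRESS .* EDITOR$' )
grp_in_house_q = grp_in_house_q | Q( author_varchar__iregex = r'.* */ *GRAND RAPIDS PRESS .* BUREAU$' )
grp_in_house_q = grp_in_house_q | Q( author_varchar__iregex = r'.* */ *SPECIAL TO THE PRESS$' )

# local hard news sections, in-house bylines only
grp_local_hard_news_qs = Article.objects.filter( section__in = Article.GRP_NEWS_SECTION_NAME_LIST )
grp_local_hard_news_qs = grp_local_hard_news_qs.filter( grp_in_house_q )
print( "GRP local hard news, in-house count = {}".format( grp_local_hard_news_qs.count() ) )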
In [5]:
from context_text.models import Article
In [ ]:
# how many articles in "grp_month"?
article_qs = Article.objects.filter( tags__name__in = [ "grp_month" ] )
grp_month_count = article_qs.count()
print( "grp_month count = {}".format( grp_month_count ) )
Definition of local hard news and in-house implementor:
Detroit News
context_text/examples/articles/articles-TDN-local_news.py
local hard news sections (stored in DTNB.NEWS_SECTION_NAME_LIST; DTNB is imported via from context_text.collectors.newsbank.newspapers.DTNB import DTNB):
in-house implementor (based on byline patterns, stored in DTNB.Q_IN_HOUSE_AUTHOR):
Byline ends in "/ The Detroit News", ignore case.
Q( author_varchar__iregex = r'.*\s*/\s*the\s*detroit\s*news$' )
Byline ends in "Special to The Detroit News", ignore case.
Q( author_varchar__iregex = r'.*\s*/\s*special\s*to\s*the\s*detroit\s*news$' )
Byline ends in "Detroit News * Bureau", ignore case.
Q( author_varchar__iregex = r'.*\s*/\s*detroit\s*news\s*.*\s*bureau$' )
TODO:
Outline in voodoopad - "Dropbox/academia/MSU/program_stuff/voodoopad/phd.vpdoc", note "Prelim - Notes".
Coders trained on 7 samples of 10 articles each. For each training set, the coders coded the articles, I reviewed their coding and updated the protocol, and then we reviewed problems and protocol changes together. After 7 sets, we ran the formal reliability test.
Article traits:
Sample size:
Equation: $$n = \frac {(N-1)(SE)^2 + PQN}{(N-1)(SE)^2 + PQ}$$
WHERE: n = minimum sample size; N = population size; SE = standard error (the confidence interval divided by the z-score, here 0.05 / 1.64); P = assumed level of agreement in the population (here 0.95); Q = 1 - P.
from:
Articles in the reliability test sample:
number of people detected?
In [3]:
# ==> prelim_month
# init variables
n = 441
p = 0.95
ci = 0.05
z = 1.64
se = ci / z
q = 1 - p
sample_size = None
# calculate sample_size
n_minus_1 = n - 1
se_squared = se ** 2
p_times_q = p * q
n_minus_1_times_se_squared = n_minus_1 * se_squared
numerator = n_minus_1_times_se_squared + ( p_times_q * n )
denominator = n_minus_1_times_se_squared + p_times_q
sample_size = numerator / denominator
print( "prelim_month reliability minimum sample size: {}".format( sample_size ) )
for "grp_month
":
In [8]:
# ==> grp_month
# init variables
n = 1000000000
p = 0.95
ci = 0.05
z = 1.64
se = ci / z
q = 1 - p
sample_size = None
# calculate sample_size
n_minus_1 = n - 1
se_squared = se ** 2
p_times_q = p * q
n_minus_1_times_se_squared = n_minus_1 * se_squared
numerator = n_minus_1_times_se_squared + ( p_times_q * n )
denominator = n_minus_1_times_se_squared + p_times_q
sample_size = numerator / denominator
print( "prelim_month reliability minimum sample size: {}".format( sample_size ) )
In [2]:
# ==> original sample
# init variables
n = 461
p = 0.95
ci = 0.05
z = 1.64
se = ci / z
q = 1 - p
sample_size = None
# calculate sample_size
n_minus_1 = n - 1
se_squared = se ** 2
p_times_q = p * q
n_minus_1_times_se_squared = n_minus_1 * se_squared
numerator = n_minus_1_times_se_squared + ( p_times_q * n )
denominator = n_minus_1_times_se_squared + p_times_q
sample_size = numerator / denominator
print( "original sample reliability minimum sample size: {}".format( sample_size ) )
for original sample:
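The three cells above repeat the same arithmetic, so a hypothetical helper to consolidate them might look like this (the grp_month population of 1,000,000,000 is just the effectively-unbounded case used above):

# Hypothetical helper consolidating the repeated sample-size cells above.
# Implements n = [ (N-1)(SE)^2 + PQN ] / [ (N-1)(SE)^2 + PQ ], with SE = ci / z.
def calculate_minimum_sample_size( population_size, p = 0.95, ci = 0.05, z = 1.64 ):
    se = ci / z
    q = 1 - p
    n_minus_1_times_se_squared = ( population_size - 1 ) * ( se ** 2 )
    p_times_q = p * q
    numerator = n_minus_1_times_se_squared + ( p_times_q * population_size )
    denominator = n_minus_1_times_se_squared + p_times_q
    return numerator / denominator

# reproduce the three cells above
for label, population in [ ( "prelim_month", 441 ), ( "grp_month", 1000000000 ), ( "original sample", 461 ) ]:
    print( "{} reliability minimum sample size: {}".format( label, calculate_minimum_sample_size( population ) ) )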
A little exploration to see what the tags below contain.
In [6]:
from context_text.models import Article
In [7]:
# how many articles in "prelim_reliability_test"?
article_qs = Article.objects.filter( tags__name__in = [ "prelim_reliability_test" ] )
reliability_sample_count = article_qs.count()
print( "prelim_reliability_test count = {}".format( reliability_sample_count ) )
In [8]:
# how many articles in "prelim_reliability_combined"?
article_qs = Article.objects.filter( tags__name__in = [ "prelim_reliability_combined" ] )
reliability_sample_count = article_qs.count()
print( "prelim_reliability_combined count = {}".format( reliability_sample_count ) )
So:
- prelim_reliability_test is just Grand Rapids Press, not what I'm reporting for the paper.
- prelim_reliability_combined is GRP plus The Detroit News, and is what I'm reporting in the paper.

The original code to generate data is in context_analysis/examples/reliability/reliability-build_name_data.py. It was used to create all the Reliability_Names data for the formal reliability test, including the initial run that only contained GRP articles, and some runs that included the automated coding alongside (no need).
The main label for reliability is prelim_reliability_combined_human, which would not have included the index 4 with the automated coder:
In [ ]:
from __future__ import unicode_literals
# django imports
from django.contrib.auth.models import User
# sourcenet imports
from context_text.shared.context_text_base import ContextTextBase
# context_analysis imports
from context_analysis.reliability.reliability_names_builder import ReliabilityNamesBuilder
# declare variables
my_reliability_instance = None
tag_list = None
label = ""
# declare variables - user setup
current_coder = None
current_coder_id = -1
current_index = -1
# declare variables - Article_Data filtering.
coder_type = ""
# make reliability instance
my_reliability_instance = ReliabilityNamesBuilder()
#===============================================================================
# configure
#===============================================================================
# list of tags of articles we want to process.
tag_list = [ "prelim_reliability_combined", ]
# label to associate with results, for subsequent lookup.
label = "prelim_reliability_combined_human"
# ! ====> map coder user IDs to indices within the reliability names table.
# set it up so that...
# ...coder ID 8 is index 1...
current_coder_id = 8
current_index = 1
my_reliability_instance.add_coder_at_index( current_coder_id, current_index )
# ...coder ID 9 is index 2...
current_coder_id = 9
current_index = 2
my_reliability_instance.add_coder_at_index( current_coder_id, current_index )
# ...coder ID 10 is index 3...
current_coder_id = 10
current_index = 3
my_reliability_instance.add_coder_at_index( current_coder_id, current_index )
# output debug JSON to file
#my_reliability_instance.debug_output_json_file_path = "/home/jonathanmorgan/" + label + ".json"
#===============================================================================
# process
#===============================================================================
# process articles
my_reliability_instance.process_articles( tag_list )
# output to database.
my_reliability_instance.output_reliability_data( label )
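A quick, hypothetical way to confirm the output landed (assuming Reliability_Names lives in context_analysis.models and has a label field, as implied by output_reliability_data( label ) above):

# Hypothetical follow-up check - model location and field name are assumptions.
from context_analysis.models import Reliability_Names

row_count = Reliability_Names.objects.filter( label = "prelim_reliability_combined_human" ).count()
print( "Reliability_Names rows for label \"prelim_reliability_combined_human\" = {}".format( row_count ) )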
Path to Dropbox folder that holds PDF and Excel file output of reliability numbers:
To view results: https://research.local/research/context/analysis/reliability/names/results/view
The human-only results (the ones I will write about) are results with labels:
- "prelim_reliability_combined_human_final" - this is the latest code, regenerated recently. It is identical to the results from the old code (numbers from 2016.08.27):
  - Dropbox/academia/MSU/program_stuff/prelim_paper/analysis/reliability/2016-data/2016.08.27-reliability-prelim_reliability_combined_human.pdf
  - Dropbox/academia/MSU/program_stuff/prelim_paper/analysis/reliability/2016-data/2016.08.27-reliability-prelim_reliability_combined_human.xlsx
- "prelim_reliability_combined_human"

The results for "prelim_reliability_combined_human", shown below, are not identical - but the results stored in Dropbox (see above) are identical to those for "prelim_reliability_combined_human_final". Very strange. Since they match the original numbers, and since they are lower, I'll just use "prelim_reliability_combined_human_final".
This is the code invoked by the page https://research.local/research/context/analysis/reliability/names/results/view
In [ ]:
# start to support python 3:
from __future__ import unicode_literals
from __future__ import division
#==============================================================================#
# ! imports
#==============================================================================#
# grouped by functional area, then alphabetical order by package, then
# alphabetical order by name of thing being imported.
# context_analysis imports
from context_analysis.reliability.reliability_names_analyzer import ReliabilityNamesAnalyzer
#==============================================================================#
# ! logic
#==============================================================================#
# declare variables
my_analysis_instance = None
label = ""
indices_to_process = -1
result_status = ""
# make reliability instance
my_analysis_instance = ReliabilityNamesAnalyzer()
# database connection information - 2 options... Enter it here:
#my_analysis_instance.db_username = ""
#my_analysis_instance.db_password = ""
#my_analysis_instance.db_host = "localhost"
#my_analysis_instance.db_name = "sourcenet"
# Or set up the following properties in Django_Config, inside the django admins.
# All have application of: "sourcenet-db-admin":
# - db_username
# - db_password
# - db_host
# - db_port
# - db_name
# run the analyze method, see what happens.
#label = "prelim_reliability_test"
#indices_to_process = 3
#label = "prelim_reliability_combined_human"
#indices_to_process = 3
#label = "name_data_test_combined_human"
#indices_to_process = 3
#label = "prelim_reliability_combined_human_final"
#indices_to_process = 3
#label = "prelim_reliability_combined_all"
#indices_to_process = 4
#label = "prelim_reliability_combined_all_final"
#indices_to_process = 4
#label = "prelim_reliability_test_human"
#indices_to_process = 3
#label = "prelim_reliability_test_all"
#indices_to_process = 4
label = "prelim_month"
indices_to_process = 2
result_status = my_analysis_instance.analyze_reliability_names( label, indices_to_process )
Notebooks with the original, pre-web-page code to calculate reliability:
Go to: https://research.local/research/context/analysis/reliability/names/results/view
Label: prelim_reliability_combined_human_final
Results:
label | results ID | coder1 index | coder2 index | coder1 ID | coder2 ID | count | detect % | detect A | detect pi | lookup % | lookup A | lookup-NZ % | lookup-NZ A | lookup-NZ N | type % | type A | type pi | type-NZ % | type-NZ A | type-NZ pi | type-NZ N |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
prelim_reliability_combined_human_final | 10 | 1 | 2 | 98 | 0.9795918367 | -0.0051546392 | 0.9727891156 | 0.9795918367 | 0.9791722296 | 1.0000000000 | 1.0000000000 | 96 | 0.9795918367 | -0.0051546392 | 0.9782312925 | 1.0000000000 | 1.0000000000 | 1.0000000000 | 96 | ||
prelim_reliability_combined_human_final | 11 | 1 | 3 | 10 | 98 | 0.9897959184 | 0.0000000000 | 0.9863945578 | 0.9897959184 | 0.9895861148 | 1.0000000000 | 1.0000000000 | 97 | 0.9897959184 | 0.0000000000 | 0.9891156463 | 1.0000000000 | 1.0000000000 | 1.0000000000 | 97 | |
prelim_reliability_combined_human_final | 12 | 2 | 3 | 10 | 98 | 0.9897959184 | 0.0000000000 | 0.9863945578 | 0.9897959184 | 0.9895816637 | 1.0000000000 | 1.0000000000 | 97 | 0.9897959184 | 0.0000000000 | 0.9891156463 | 1.0000000000 | 1.0000000000 | 1.0000000000 | 97 | |
Averages: | 98 | 0.9863945578333333333333333333 | -0.001718213066666666666666666667 | 0.9818594104 | 0.9863945578333333333333333333 | 0.9861133360333333333333333333 | 1.0000000000 | 1.0000000000 | 96.66666666666666666666666667 | 0.9863945578333333333333333333 | -0.001718213066666666666666666667 | 0.9854875283666666666666666667 | 1.0000000000 | 1.0000000000 | 1.0000000000 | 96.66666666666666666666666667 |
label | results ID | coder1 index | coder2 index | coder1 ID | coder2 ID | count | detect % | detect A | detect pi | lookup % | lookup A | lookup-NZ % | lookup-NZ A | lookup-NZ N | type % | type A | type pi | type-NZ % | type-NZ A | type-NZ pi | type-NZ N | 1st graf % | 1st graf A | 1st index % | 1st index A | org hash % | org hash A |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
prelim_reliability_combined_human_final | 10 | 1 | 2 | 399 | 0.9122807018 | 0.1407669798 | 0.8830409357 | 0.9122807018 | 0.9118944818 | 1.0000000000 | 1.0000000000 | 360 | 0.8922305764 | 0.7955934892 | 0.8850459482 | 0.9777777778 | 0.9523699116 | 0.9750000000 | 360 | 0.5363408521 | 0.9573273382 | 0.5087719298 | 0.9101064582 | 0.5839598997 | 0.5626481371 | ||
prelim_reliability_combined_human_final | 11 | 1 | 3 | 10 | 399 | 0.8972431078 | 0.2505447123 | 0.8629908104 | 0.8972431078 | 0.8965318523 | 1.0000000000 | 1.0000000000 | 349 | 0.8746867168 | 0.7694145966 | 0.8663324979 | 0.9742120344 | 0.9446517907 | 0.9709885387 | 349 | 0.4962406015 | 0.9117737368 | 0.4736842105 | 0.8747093023 | 0.5664160401 | 0.5380809123 | |
prelim_reliability_combined_human_final | 12 | 2 | 3 | 10 | 399 | 0.9147869674 | 0.1062664908 | 0.8863826232 | 0.9147869674 | 0.9144471807 | 1.0000000000 | 1.0000000000 | 362 | 0.8972431078 | 0.8055310893 | 0.8903926483 | 0.9806629834 | 0.9591258208 | 0.9782458564 | 362 | 0.5037593985 | 0.9086158161 | 0.4812030075 | 0.8724327241 | 0.5664160401 | 0.5514299936 | |
Averages: | 399 | 0.9081035923333333333333333333 | 0.1658593943 | 0.8774714564333333333333333333 | 0.9081035923333333333333333333 | 0.9076245049333333333333333333 | 1.0000000000 | 1.0000000000 | 357 | 0.8880534670 | 0.7901797250333333333333333333 | 0.8805903648 | 0.9775509318666666666666666667 | 0.9520491743666666666666666667 | 0.9747447983666666666666666667 | 357 | 0.5121136173666666666666666667 | 0.9259056303666666666666666667 | 0.4878863826 | 0.8857494948666666666666666667 | 0.5722639933 | 0.5507196810 |
This is not the latest code, so I am not reporting it, but I'm including it here for reference.
Go to: https://research.local/research/context/analysis/reliability/names/results/view
Label: prelim_reliability_combined_human
Results:
label | results ID | coder1 index | coder2 index | coder1 ID | coder2 ID | count | detect % | detect A | detect pi | lookup % | lookup A | lookup-NZ % | lookup-NZ A | lookup-NZ N | type % | type A | type pi | type-NZ % | type-NZ A | type-NZ pi | type-NZ N |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
prelim_reliability_combined_human | 37 | 1 | 2 | 9 | 98 | 0.9795918367 | -0.0051546392 | 0.9727891156 | 0.9795918367 | 0.9791722296 | 1.0000000000 | 1.0000000000 | 96 | 0.9795918367 | -0.0051546392 | 0.9782312925 | 1.0000000000 | 1.0000000000 | 1.0000000000 | 96 | |
prelim_reliability_combined_human | 38 | 1 | 3 | 98 | 0.9897959184 | 0.0000000000 | 0.9863945578 | 0.9897959184 | 0.9895861148 | 1.0000000000 | 1.0000000000 | 97 | 0.9897959184 | 0.0000000000 | 0.9891156463 | 1.0000000000 | 1.0000000000 | 1.0000000000 | 97 | ||
prelim_reliability_combined_human | 39 | 2 | 3 | 9 | 98 | 0.9897959184 | 0.0000000000 | 0.9863945578 | 0.9897959184 | 0.9895816637 | 1.0000000000 | 1.0000000000 | 97 | 0.9897959184 | 0.0000000000 | 0.9891156463 | 1.0000000000 | 1.0000000000 | 1.0000000000 | 97 | |
Averages: | 98 | 0.9863945578333333333333333333 | -0.001718213066666666666666666667 | 0.9818594104 | 0.9863945578333333333333333333 | 0.9861133360333333333333333333 | 1.0000000000 | 1.0000000000 | 96.66666666666666666666666667 | 0.9863945578333333333333333333 | -0.001718213066666666666666666667 | 0.9854875283666666666666666667 | 1.0000000000 | 1.0000000000 | 1.0000000000 | 96.66666666666666666666666667 |
label | results ID | coder1 index | coder2 index | coder1 ID | coder2 ID | count | detect % | detect A | detect pi | lookup % | lookup A | lookup-NZ % | lookup-NZ A | lookup-NZ N | type % | type A | type pi | type-NZ % | type-NZ A | type-NZ pi | type-NZ N | 1st graf % | 1st graf A | 1st index % | 1st index A | org hash % | org hash A |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
prelim_reliability_combined_human | 37 | 1 | 2 | 9 | 398 | 0.9170854271 | 0.1524794056 | 0.8894472362 | 0.9145728643 | 0.9142174364 | 0.9972299169 | 0.9972247563 | 361 | 0.8969849246 | 0.8038230285 | 0.8901172529 | 0.9778393352 | 0.9524469067 | 0.9750692521 | 361 | 0.5402010050 | 0.9575247587 | 0.5125628141 | 0.9105087189 | 0.5854271357 | 0.5645628699 | |
prelim_reliability_combined_human | 38 | 1 | 3 | 398 | 0.9020100503 | 0.2639413147 | 0.8693467337 | 0.8994974874 | 0.8988353338 | 0.9971428571 | 0.9971374163 | 350 | 0.8793969849 | 0.7772888300 | 0.8713567839 | 0.9742857143 | 0.9447435683 | 0.9710714286 | 350 | 0.5000000000 | 0.9121951220 | 0.4773869347 | 0.8752880184 | 0.5703517588 | 0.5427100012 | ||
prelim_reliability_combined_human | 39 | 2 | 3 | 9 | 398 | 0.9145728643 | 0.0615886682 | 0.8860971524 | 0.9145728643 | 0.9142514529 | 1.0000000000 | 1.0000000000 | 362 | 0.8969849246 | 0.8042530448 | 0.8901172529 | 0.9806629834 | 0.9591258208 | 0.9782458564 | 362 | 0.5050251256 | 0.9086158161 | 0.4824120603 | 0.8724327241 | 0.5653266332 | 0.5506229232 | |
Averages: | 398 | 0.9112227805666666666666666667 | 0.1593364628333333333333333333 | 0.8816303741 | 0.9095477386666666666666666667 | 0.9091014077 | 0.9981242580 | 0.9981207242 | 357.6666666666666666666666667 | 0.8911222780333333333333333333 | 0.7951216344333333333333333333 | 0.8838637632333333333333333333 | 0.9775960109666666666666666667 | 0.9521054319333333333333333333 | 0.9747955123666666666666666667 | 357.6666666666666666666666667 | 0.5150753768666666666666666667 | 0.9261118989333333333333333333 | 0.4907872697 | 0.8860764871333333333333333333 | 0.5737018425666666666666666667 | 0.5526319314333333333333333333 |
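For reference, here is a minimal, generic sketch of the two-coder agreement quantities reported in these tables - raw percent agreement and a chance-corrected coefficient - assuming the "pi" columns are Scott's pi over a nominal variable. This is not the ReliabilityNamesAnalyzer implementation, just an illustration of the arithmetic.

# Generic sketch of percent agreement and Scott's pi for two coders (illustration only).
from collections import Counter

def percent_agreement( coder1_values, coder2_values ):
    matches = sum( 1 for v1, v2 in zip( coder1_values, coder2_values ) if v1 == v2 )
    return matches / len( coder1_values )

def scotts_pi( coder1_values, coder2_values ):
    observed = percent_agreement( coder1_values, coder2_values )
    # expected agreement from pooled category proportions
    pooled_counts = Counter( coder1_values ) + Counter( coder2_values )
    total_count = len( coder1_values ) + len( coder2_values )
    expected = sum( ( category_count / total_count ) ** 2 for category_count in pooled_counts.values() )
    return ( observed - expected ) / ( 1 - expected )

# toy example: two coders' detect decisions (1 = detected, 0 = not detected)
coder_1 = [ 1, 1, 1, 0, 1, 1 ]
coder_2 = [ 1, 1, 1, 1, 1, 0 ]
print( "percent agreement = {}".format( percent_agreement( coder_1, coder_2 ) ) )
print( "Scott's pi = {}".format( scotts_pi( coder_1, coder_2 ) ) )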
In preparation for using the human coding as a standard against which data created with the automated tool is assessed, I performed a couple of cleaning steps:
In the CA protocol, we ignored people who were referred to only by a single name part, to avoid ambiguity when assigning a last name. So I removed all instances where a person was only ever referenced using a single-part name (mostly just a first name), to remove that potential source of ambiguity.
Example: "Joe Smith's wife Sandy" - we could assume her name is Sandy Smith, but it could be something else. For this study, I removed that potential ambiguity by discarding instances where a given person's full name is never used.
Exceptions:
Types of single-named entities:
Of 143 single names removed from analysis data, 15 instances were out-and-out errors (89.5% correct):
Only noted one instance where the single-name person was quoted ("Linda") - Article 23223 | Article_Data 3212 | 12096 (AS) - Linda ( id = 2911; capture_method = OpenCalais_REST_API_v2 ) (quoted; individual) ==> name: Linda |
Assessment - OpenCalais is actually quite good at identifying single name-part references to people, for the most part. It even sometimes tacked on a last name based on the context in the article. But, most of the time it did not. Appears to be built to know of this potential, but tuned to only take action when it is certain. Not built to assume name relationships implied by things like "survived by" or "Smith's children X, Y, and Z". This is something that could be leveraged in a post-processing step if single names were left in.
Errors:
Article 21116
Article 22765
Article 23055
Article 23491
Article 23559
Article 23631
Article 23631
Article 23921
Article 23974
Article 21080
Moved to 2018.02.09-prelim-disagreement_analysis.ipynb.
Details:
- 2018.02.09-prelim-disagreement_analysis.ipynb --> Deleted Reliability_Names records
- 2018.02.09-prelim-disagreement_analysis.ipynb --> disagreement reason summary
- 2018.02.09-prelim-disagreement_analysis.ipynb --> review tags

prelim_month vs. prelim_month_human:
- prelim_month - Reliability_Names data with label prelim_month, where coder 1 is "ground truth" (corrected human coding) and coder 2 is data created by OpenCalais.
- prelim_month_human - Reliability_Names data with label prelim_month_human, where coder 1 is "ground truth" (corrected human coding) and coder 2 is uncorrected human coding (for comparison).

prelim_month
Calculate precision and recall for automated versus baseline - set it up so that coder 1 is the human coding with the ground_truth user having precedence, and coder 2 is the automated coding output.
Jupyter notebooks:
Create Reliability_Names data where coder 1 is ground truth, and coder 2 is automated coder: https://research.local:8000/user/jonathanmorgan/notebooks/work/django/research/work/phd_work/methods/data_creation/prelim_month-create_Reliability_Names_data.ipynb
Calculate confusion matrices and precision/recall/F1: https://research.local:8000/user/jonathanmorgan/notebooks/work/django/research/work/phd_work/methods/precision_recall/prelim_month-confusion_matrix.ipynb
results are in Dropbox/academia/MSU/program_stuff/prelim_paper/analysis/precision_and_recall/.
score | detect | type "author" | type "subject" | type "source" |
---|---|---|---|---|
TP | 2315 | 454 | 631 | 1080 |
TN | 0 | 1990 | 1580 | 1172 |
FP | 68 | 0 | 152 | 66 |
FN | 63 | 2 | 83 | 128 |
precision | 0.97146 | 1 | 0.80587 | 0.94241 |
recall | 0.97351 | 0.99561 | 0.88375 | 0.89404 |
F1 | 0.97248 | 0.9978 | 0.84302 | 0.91759 |
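For reference, the precision/recall/F1 rows are just the standard confusion-matrix arithmetic; a minimal sketch using the "detect" column above (counts copied from the table):

# Sketch of the precision / recall / F1 arithmetic behind the table above.
def precision_recall_f1( tp, fp, fn ):
    precision = tp / ( tp + fp )
    recall = tp / ( tp + fn )
    f1 = 2 * ( precision * recall ) / ( precision + recall )
    return precision, recall, f1

detect_precision, detect_recall, detect_f1 = precision_recall_f1( tp = 2315, fp = 68, fn = 63 )
print( "detect: precision = {:.5f}, recall = {:.5f}, F1 = {:.5f}".format( detect_precision, detect_recall, detect_f1 ) )
# detect: precision = 0.97146, recall = 0.97351, F1 = 0.97248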
prelim_month_human
Calculate precision and recall for humans versus ground truth - set it up so that coder 1 is as it was for the computer comparison (ground_truth having precedence), then set coder 2 up the same way, but without ground_truth...
Jupyter notebooks:
results are in Dropbox/academia/MSU/program_stuff/prelim_paper/analysis/precision_and_recall/.
score | detect | type "author" | type "subject" | type "source" |
---|---|---|---|---|
TP | 2309 | 453 | 646 | 1188 |
TN | 0 | 1962 | 1669 | 1189 |
FP | 19 | 1 | 24 | 16 |
FN | 93 | 5 | 82 | 28 |
precision | 0.99184 | 0.9978 | 0.96418 | 0.98671 |
recall | 0.96128 | 0.98908 | 0.88736 | 0.97697 |
F1 | 0.97632 | 0.99342 | 0.92418 | 0.98182 |
prelim_month
Run the reliability calculations for prelim_month just to get the lookup assessment (since lookup is not classification, precision and recall make no sense for it).
results:
prelim_month - Dropbox/academia/MSU/program_stuff/prelim_paper/analysis/reliability/2016-data/prelim_month-reliability_results.pdf.
label | results ID | coder1 index | coder2 index | coder1 ID | coder2 ID | count | detect % | detect A | detect pi | lookup % | lookup A | lookup-NZ % | lookup-NZ A | lookup-NZ N | type % | type A | type pi | type-NZ % | type-NZ A | type-NZ pi | type-NZ N |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
prelim_month | 41 | 1 | 2 | 9 | 2 | 456 | 1.0000000000 | 1.0000000000 | 1.0000000000 | 1.0000000000 | 1.0000000000 | 1.0000000000 | 1.0000000000 | 456 | 0.9956140351 | -0.0010989011 | 0.9941520468 | 0.9956140351 | -0.0010989011 | 0.9934210526 | 456 |
Averages: | 456 | 1.0000000000 | 1.0000000000 | 1.0000000000 | 1.0000000000 | 1.0000000000 | 1.0000000000 | 1.0000000000 | 456 | 0.9956140351 | -0.0010989011 | 0.9941520468 | 0.9956140351 | -0.0010989011 | 0.9934210526 | 456 |
label | results ID | coder1 index | coder2 index | coder1 ID | coder2 ID | count | detect % | detect A | detect pi | lookup % | lookup A | lookup-NZ % | lookup-NZ A | lookup-NZ N | type % | type A | type pi | type-NZ % | type-NZ A | type-NZ pi | type-NZ N | 1st graf % | 1st graf A | 1st index % | 1st index A | org hash % | org hash A |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
prelim_month | 41 | 1 | 2 | 9 | 2 | 1990 | 0.9341708543 | -0.0337750065 | 0.8683417085 | 0.9271356784 | 0.9270088091 | 0.9924690694 | 0.9924634556 | 1859 | 0.8597989950 | 0.7240822488 | 0.8130653266 | 0.9203873050 | 0.8309561562 | 0.8805809575 | 1859 | 0.3437185930 | 0.6123922212 | 0.3366834171 | 0.6206739538 | 0.2412060302 | -0.2349657677 |
Averages: | 1990 | 0.9341708543 | -0.0337750065 | 0.8683417085 | 0.9271356784 | 0.9270088091 | 0.9924690694 | 0.9924634556 | 1859 | 0.8597989950 | 0.7240822488 | 0.8130653266 | 0.9203873050 | 0.8309561562 | 0.8805809575 | 1859 | 0.3437185930 | 0.6123922212 | 0.3366834171 | 0.6206739538 | 0.2412060302 | -0.2349657677 |
prelim_month_human
Run the reliability calculations for prelim_month_human just to get the lookup assessment (since lookup is not classification, precision and recall make no sense for it).
Jupyter notebook of agreement between corrected and uncorrected human coding: https://research.local:8000/user/jonathanmorgan/notebooks/work/django/research/work/phd_work/methods/reliability/prelim_month_human-reliability.ipynb
results are in Dropbox/academia/MSU/program_stuff/prelim_paper/analysis/reliability/2016-data/prelim_month_human-reliability_results.pdf.
label | results ID | coder1 index | coder2 index | coder1 ID | coder2 ID | count | detect % | detect A | detect pi | lookup % | lookup A | lookup-NZ % | lookup-NZ A | lookup-NZ N | type % | type A | type pi | type-NZ % | type-NZ A | type-NZ pi | type-NZ N |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
prelim_month_human | 42 | 1 | 2 | 9 | 9 | 459 | 0.9869281046 | -0.0054824561 | 0.9738562092 | 0.9869281046 | 0.9864845280 | 1.0000000000 | 1.0000000000 | 453 | 0.9869281046 | -0.0054824561 | 0.9825708061 | 1.0000000000 | 1.0000000000 | 1.0000000000 | 453 |
Averages: | 459 | 0.9869281046 | -0.0054824561 | 0.9738562092 | 0.9869281046 | 0.9864845280 | 1.0000000000 | 1.0000000000 | 453 | 0.9869281046 | -0.0054824561 | 0.9825708061 | 1.0000000000 | 1.0000000000 | 1.0000000000 | 453 |
label | results ID | coder1 index | coder2 index | coder1 ID | coder2 ID | count | detect % | detect A | detect pi | lookup % | lookup A | lookup-NZ % | lookup-NZ A | lookup-NZ N | type % | type A | type pi | type-NZ % | type-NZ A | type-NZ pi | type-NZ N | 1st graf % | 1st graf A | 1st index % | 1st index A | org hash % | org hash A |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
prelim_month_human | 42 | 1 | 2 | 9 | 9 | 1962 | 0.9459734964 | -0.0275013096 | 0.8919469929 | 0.9454638124 | 0.9453873170 | 0.9994612069 | 0.9994608121 | 1856 | 0.9347604485 | 0.8674336065 | 0.9130139314 | 0.9881465517 | 0.9740898999 | 0.9822198276 | 1856 | 0.6121304791 | 1.0000000000 | 0.6116207951 | 0.9991667040 | 0.9576962283 | 0.9550945519 |
Averages: | 1962 | 0.9459734964 | -0.0275013096 | 0.8919469929 | 0.9454638124 | 0.9453873170 | 0.9994612069 | 0.9994608121 | 1856 | 0.9347604485 | 0.8674336065 | 0.9130139314 | 0.9881465517 | 0.9740898999 | 0.9822198276 | 1856 | 0.6121304791 | 1.0000000000 | 0.6116207951 | 0.9991667040 | 0.9576962283 | 0.9550945519 |
Generate some basic network statistics from the ground truth and automated attribution data, characterize and compare using QAP (including explaining substantial limitations of this given sparseness of networks).
This notebook, methods_paper_planning.ipynb, is the master network analysis planning notebook now, not network_analysis/methods-network_analysis-create_network_data.ipynb.
ARCHIVE - Original master network analysis notebook: network_analysis/methods-network_analysis-create_network_data.ipynb (previously named 2017.11.14-work_log-prelim-network_analysis.ipynb).
examine traits of ground_truth and automated networks
1) create network data for each time period - network_analysis/methods-network_analysis-create_network_data.ipynb
2) network descriptives, for comparison across network slices.
3) QAP comparison of networks.
4) Question: do we care about author-info?
network_analysis/methods-network_analysis-create_network_data.ipynb
Overview:
Section 2 (2.1-2.3): Deriving network data - for all network analysis, contains the exact settings used to create the network data for each time period.
2.1 - for original week (12/06/2009-12/12/2009), original coders, networks output include:
2.2 - original week (12/06/2009-12/12/2009), new coders, networks output include:
2.3 - entire month (12/01/2009-12/31/2009).
includes all people for the entire month, networks for:
Section 3 - output from here is basis for all the R author info stuff below:
network_analysis/igraph - R igraph analysis
- new notebooks: broke out into one file per time period, and separate R data files:
  - igraph-grp_month-full_month.RData
  - igraph-grp_month-week_1.RData
  - igraph-grp_month-week_2.RData
  - igraph-grp_month-week_3.RData
- original notebook: network_analysis/igraph/2017.12.02-work_log-prelim-R-igraph-grp_month.ipynb - basic network analysis of new month and week (nodes for all people from entire month, ties for whole month, then just first week) using igraph.
network_analysis/statnet - R statnet analysis
- all data stored in a single RData file (statnet-grp_month.RData)
- context_analysis/r/sna/statnet/functions-statnet.r
network creation, descriptives, and QAP notebooks:
The notebooks below each create network data for a single time period, then analyze each period separately. Each includes:
notebooks:
network_analysis/statnet/R-statnet-grp_month-full_month.ipynb - full month of data.
network_analysis/statnet/R-statnet-grp_month-week_1.ipynb - full week 1 of three (2009-12-06 to 2009-12-12).
network_analysis/statnet/R-statnet-grp_month-week_2.ipynb - full week 2 of three (2009-12-13 to 2009-12-19).
network_analysis/statnet/R-statnet-grp_month-week_3.ipynb - full week 3 of three (2009-12-20 to 2009-12-26).
network comparison - then, the networks created above are compared week-to-week and each week to the month as a whole to start to look at what constitutes a network snapshot.
network_analysis/statnet/R-statnet-grp_month-compare_graphs.ipynb - QAP comparisons, both automated-to-automated and human-to-human, of:
network_analysis/statnet/R-statnet-grp_month-compare_graphs_cross_source.ipynb - To look at difference mixing and matching human and automated makes for analysis, includes QAP comparisons, baseline-to-automated (b2a) and automated-to-baseline (a2b), of:
ARCHIVE: original notebook: network_analysis/statnet/2017.12.02-work_log-prelim-R-statnet-grp_month.ipynb - basic network analysis of new month and week (nodes for all people from entire month, ties for whole month, then just first week) using statnet. Broke out into one notebook per time period, and one notebook for comparisons across time periods. All data still stored in a single RData file (statnet-grp_month.RData
).
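For reference, the QAP comparisons mentioned above are run in R with statnet; as a rough illustration of the idea only, a graph correlation plus QAP permutation test can be sketched in Python/numpy like this (this is not the code used for the paper):

# Illustrative QAP sketch (numpy), not the statnet code actually used.
import numpy as np

def qap_correlation( matrix_a, matrix_b, permutation_count = 1000, random_seed = 123 ):
    # correlation between off-diagonal entries, plus a permutation-based p-value
    rng = np.random.default_rng( random_seed )
    node_count = matrix_a.shape[ 0 ]
    off_diagonal = ~np.eye( node_count, dtype = bool )

    def graph_correlation( m1, m2 ):
        return np.corrcoef( m1[ off_diagonal ], m2[ off_diagonal ] )[ 0, 1 ]

    observed = graph_correlation( matrix_a, matrix_b )
    null_correlations = []
    for _ in range( permutation_count ):
        node_permutation = rng.permutation( node_count )
        permuted_b = matrix_b[ node_permutation ][ :, node_permutation ]
        null_correlations.append( graph_correlation( matrix_a, permuted_b ) )
    p_value = np.mean( np.abs( np.array( null_correlations ) ) >= abs( observed ) )
    return observed, p_value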
Analysis of these descriptives and QAP correlations:
- Comparison between automated and baseline networks (month, week1, week2, week3): Dropbox/academia/MSU/program_stuff/prelim_paper/paper/latest/network_snapshots-compare_automated_to_baseline.xlsx
- from network_analysis/statnet/R-statnet-grp_month-compare_graphs.ipynb and network_analysis/statnet/R-statnet-grp_month-compare_graphs_cross_source.ipynb above.

network_analysis/author_info - Information on the authors in the data set and their network characteristics.
NOTE: This still looks to be dependent on the Python author info code run in network_analysis/methods-network_analysis-create_network_data.ipynb, section 3.1 (vectors of person IDs and counts are hard-coded in the R code).
new notebooks:
original notebooks:
TODO: update all below so they include the two additional weeks.
DONE:
Updated ArticleSelectForm and PersonSelectForm to include a field for "coder_id_priority_list"/"person_coder_id_priority_list".
Created method NetworkOutput.get_coder_id_list() that:
- if prioritized list is present:
Updated NetworkOutput.create_query_set() to use the get_coder_id_list() method.
Need to update NetworkOutput.remove_duplicate_article_data() - it is where we choose which Article_Data to omit per article when there are duplicates. It needs to go in the order of the priority list (see the sketch after the test list below). Might already do this... Nope.
Need to test
person-coded articles:
look for differences in:
automated coder:
As long as the tests above check out, then try out the whole month with the prioritized coder list.
Need to update NetworkDataOutput and children? Looks like no - it all comes down to remove_duplicate_article_data().
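A minimal sketch of the priority-based selection described above (hypothetical - not the actual NetworkOutput.remove_duplicate_article_data() implementation; it assumes each Article_Data exposes its coder's ID as coder_id):

# Hypothetical illustration of picking one Article_Data per article by coder priority.
def choose_article_data_by_priority( article_data_list, coder_id_priority_list ):
    coder_id_to_data = {}
    for article_data in article_data_list:
        coder_id_to_data[ article_data.coder_id ] = article_data
    # walk the priority list in order, return the first match
    for coder_id in coder_id_priority_list:
        if coder_id in coder_id_to_data:
            return coder_id_to_data[ coder_id ]
    return None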
context_text/R/sna/sna_author_info.r.
Next step is to pull analysis together in an Excel spreadsheet like I did last time.
For old results and more detailed notes on implementation and interpretation, see Dropbox/academia/MSU/program_stuff/prelim_paper/analysis/archive/prelim_v1-2015/analysis_summary.xlsx.
New analysis file: Dropbox/academia/MSU/program_stuff/prelim_paper/analysis/analysis_summary-2017.12.24.xlsx
Analysis charts for paper (should take all tables, convert to markdown, and add to this notebook): Dropbox/academia/MSU/program_stuff/prelim_paper/paper/latest/methods-charts.xlsx
Notes:
In general, revised procedure:
- content analysis protocol to create testing data.
- have automated tool code same articles.
- so, won't look as much at comparing humans to computer in terms of agreement for content analysis:
Removed tabs:
- agree-prelim_reliability - old reliability coding between 2 human coders.
- agree-prelim_network-mentions - agreement between traits of network data derived from human and computer code - tie weights.
- values-detect_names - survey of name detection descriptives - counts across all names of how many were detected and not, per coder. Will see if we need to derive this again for new coders. Probably won't.
- values-count_ties - descriptives and comparison of tie weights between human and computer, to look at something like precision and recall (confusion matrix), but just comparing human and computer, not treating human as ground truth. No need for this with precision and recall stats.
- counts_per_person - not sure what this is...
- disagreements - similar to values-count_ties, but higher-level analysis. Will have to create new disagreement information from results of disagreement analysis in creating evaluation data.

Updated spreadsheet:
Dropbox/academia/MSU/program_stuff/prelim_paper/analysis/analysis_summary-2017.12.24.xlsx
Agreement results:
tabs "CA-reliability-author
" and "CA-reliability-subject
" are derived from work labeled "prelim_reliability_combined
" = articles from both Grand Rapids Press and Detroit News, to minimally test cross-paper use of protocol.
Dropbox/academia/MSU/program_stuff/prelim_paper/analysis/reliability/2016-data/2016.08.27-reliability-prelim_reliability_combined_human.xlsx
old results have "mentions". Mentions are weights from network data back when I derived it and stored it in a table of my own design ("context_analysis_reliability_ties
"), rather than outputting in formats readable by SNA packages. Omitting this in favor of precision and recall and network statistics.
Network results:
Main sources:
2017.12.02-work_log-prelim-R-igraph-grp_month.ipynb - basic network analysis of new month and week using igraph.
Other:
Path to paper: Dropbox/academia/MSU/program_stuff/prelim_paper/paper/latest/Morgan-Prelim.docx
TODO:
DONE:
update methods
update results
TODO:
DONE: