Notes:
Questions:
Do we want an Identifier Type separate from Entity and Relation identifiers? I think we do, so we can specify the entity type(s) a given identifier should be used on.
In [1]:
me = "render-context-networks-dev"
In [2]:
debug_flag = False
In [3]:
import datetime
from django.db.models import Avg, Max, Min, Q
from django.utils.text import slugify
import json
import logging
import six
In [4]:
%pwd
Out[4]:
In [5]:
# current working folder
current_working_folder = "/home/jonathanmorgan/work/django/research/work/phd_work/analysis"
current_datetime = datetime.datetime.now()
current_date_string = current_datetime.strftime( "%Y-%m-%d-%H-%M-%S" )
Configure logging for this notebook's kernel. (If you do not run this cell, you'll get the Django application's logging configuration.)
In [6]:
logging_file_name = "{}/logs/{}-{}.log.txt".format( current_working_folder, me, current_date_string )
logging.basicConfig(
level = logging.DEBUG,
format = '%(asctime)s - %(levelname)s - %(name)s - %(message)s',
filename = logging_file_name,
filemode = 'w' # set to 'a' if you want to append, rather than overwrite each time.
)
print( "Logging initialized, to {}".format( logging_file_name ) )
If you are using a virtualenv, you need to get it activated somehow inside this notebook. One option is to run ../dev/wsgi.py in this notebook, to configure the Python environment manually as if you had activated the sourcenet virtualenv. To do this, you'd make a code cell that contains:
%run ../dev/wsgi.py
This is sketchy, however, because of the changes it makes to your Python environment within the context of whatever your current kernel is. I'd worry about collisions with the actual Python 3 kernel. Better, one can install their virtualenv as a separate kernel. Steps:
activate your virtualenv:
workon research
in your virtualenv, install the package ipykernel.
pip install ipykernel
use the ipykernel python program to install the current environment as a kernel:
python -m ipykernel install --user --name <env_name> --display-name "<display_name>"
sourcenet example:
python -m ipykernel install --user --name sourcenet --display-name "research (Python 3)"
More details: http://ipython.readthedocs.io/en/stable/install/kernel_install.html
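Once the kernel is installed and selected, it can help to confirm from inside the notebook that the interpreter really is the virtualenv's. The cell below is a minimal sketch using only the standard library; the helper name in_virtualenv is made up for illustration and is not part of ipykernel or virtualenv.

```python
import sys

def in_virtualenv():
    """Return True if this interpreter appears to be running inside a virtualenv or venv."""
    # venv and recent virtualenv releases leave sys.base_prefix pointing at the
    # system install; older virtualenv releases set sys.real_prefix instead.
    base_prefix = getattr( sys, "real_prefix", None ) or getattr( sys, "base_prefix", sys.prefix )
    return base_prefix != sys.prefix

# show which python this kernel is running, and whether it looks like a virtualenv.
print( "python executable: {}".format( sys.executable ) )
print( "in virtualenv: {}".format( in_virtualenv() ) )
```

If the executable path is not inside the virtualenv you expected, the notebook is probably still attached to the default Python 3 kernel.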
First, initialize my dev django project, so I can run code in this notebook that references my django models and can talk to the database using my project's settings.
In [7]:
# init django
django_init_folder = "/home/jonathanmorgan/work/django/research/work/phd_work"
django_init_path = "django_init.py"
if( ( django_init_folder is not None ) and ( django_init_folder != "" ) ):
    # add folder to front of path.
    django_init_path = "{}/{}".format( django_init_folder, django_init_path )
#-- END check to see if django_init folder. --#
In [8]:
%run $django_init_path
In [9]:
# context imports
from context.export.network.filter_spec import FilterSpec
from context.export.network.network_data_request import NetworkDataRequest
from context.export.network.network_output import NetworkOutput
from context.models import Entity
from context.models import Entity_Identifier_Type
from context.models import Entity_Identifier
from context.models import Entity_Relation
from context.models import Entity_Relation_Type
from context.models import Entity_Type
from context.shared.context_base import ContextBase
from context.tests.export.network.test_helper import TestHelper
from context.tests.export.network.test_NetworkDataOutput_class import NetworkDataOutputTest
Create a LoggingHelper instance to use to log debug and also print at the same time.
Preconditions: Must be run after Django is initialized, since python_utilities is in the django path.
In [10]:
# python_utilities
from python_utilities.logging.logging_helper import LoggingHelper
# init
my_logging_helper = LoggingHelper()
my_logging_helper.set_logger_name( me )
log_message = None
Now, we need to render out our network data from context, so we can then test it out and make sure we are getting the same answers we got from the old way.
First step is to translate the filter criteria for nodes and ties from the existing admin for the querying context.
Configuration of Network Builder, from methods-network_analysis-create_network_data.ipynb:
Configuration to generate network files for prelim:
Config of "Select Articles" - fields in bold need to be changed from default values:
- Start date (YYYY-MM-DD): 2009-12-01
- End date (YYYY-MM-DD): 2009-12-31
- Fancy date range: Empty.
- Publications: "Grand Rapids Press, The"
- Coders: None selected.
- Coder IDs to include, in order of highest to lowest priority:
- if automated: Article_Data coder_type Filter Type and coder_type 'Value In' List (comma-delimited): use the coder_type filter fields to filter automatically coded Article_Data on coder type if you have tried different automated coder types:
  - Article_Data coder_type Filter Type: Just automated
  - coder_type 'Value In' List (comma-delimited): Enter the coder types you want included. Examples:
- Topics: None selected.
- Article Tag List (comma-delimited): "grp_month"
- Unique Identifier List (comma-delimited): Empty.
- Allow duplicate articles: "No"
Configure "Network Settings" - fields in bold need to be changed from default values:
- relations - Include source contact types: All selected.
- relations - Include source capacities: None selected.
- relations - Exclude source capacities: None selected.
- Download as File? "Yes"
- Include render details? "No"
- Data Format: "Tab-Delimited Matrix"
- Data Output Type: "Network + Attribute Columns"
- Network Label: Empty.
- Include Headers: "Yes"
Config of "Select People" - fields in bold need to be changed from default values:
- Person Query Type: "Custom, defined below"
- People from (YYYY-MM-DD): 2009-12-01
- People to (YYYY-MM-DD): 2009-12-31
- Fancy person date range: Empty.
- Person publications: "Grand Rapids Press, The"
- Person coders: "automated", "minnesota1", "minnesota2", "minnesota3", "ground_truth"
- Coder IDs to include, in order of highest to lowest priority: Empty.
- Article_Data coder_type Filter Type and coder_type 'Value In' List (comma-delimited): use the coder_type filter fields to filter automatically coded Article_Data on coder type if you have tried different automated coder types:
  - Article_Data coder_type Filter Type: Just automated
  - coder_type 'Value In' List (comma-delimited): Enter the coder types you want included. Examples:
- Person Topics: None
- Article Tag List (comma-delimited): "grp_month"
- Unique Identifier List (comma-delimited): Empty.
- Person allow duplicate articles: "Yes"
Below is a JSON file that is just the automated coding for the month from 2009-12-01 through 2009-12-31, articles in the Grand Rapids Press. Just about all of the complexity of the original screens is possible here, as long as you loaded the entities and ties and all of the needed traits, including some way of adding tags...
{
"output_specification": {
"output_type": "file",
"output_file_path": "./NetworkDataRequest_test_output.txt",
"output_format": "TSV_matrix",
"output_structure": "both_trait_columns",
"output_include_column_headers": true
},
"relation_selection": {
"relation_type_slug_filter_combine_type": "AND",
"relation_type_slug_filters": [
{
"comparison_type": "includes",
"value_list": [ "mentioned", "quoted", "shared_byline" ]
}
],
"relation_trait_filter_combine_type": "AND",
"relation_trait_filters": [
{
"name": "pub_date",
"data_type": "date",
"comparison_type": "in_range",
"value_from": "2009-12-01",
"value_to": "2009-12-31"
},
{
"name": "sourcenet-coder-User-username",
"data_type": "string",
"comparison_type": "includes",
"value_list": [ "automated" ]
},
{
"name": "coder_type",
"data_type": "string",
"comparison_type": "includes",
"value_list": [ "OpenCalais_REST_API_v2" ]
}
],
"entity_type_slug_filter_combine_type": "AND",
"entity_type_slug_filters": [
{
"comparison_type": "includes",
"value_list": [ "person" ],
"relation_roles_list": [ "FROM" ]
},
{
"comparison_type": "includes",
"value_list": [ "person" ],
"relation_roles_list": [ "TO" ]
},
{
"comparison_type": "includes",
"value_list": [ "article" ],
"relation_roles_list": [ "THROUGH" ]
}
],
"entity_trait_filter_combine_type": "AND",
"entity_trait_filters": [
{
"name": "sourcenet-Newspaper-ID",
"data_type": "int",
"comparison_type": "includes",
"value_list": [ 1 ],
"relation_roles_list": [ "THROUGH" ]
}
]
}
}
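Before handing a spec like this to NetworkDataRequest, it can be useful to sanity-check it with plain Python. The cell below is a rough sketch that uses only the standard library and an abbreviated copy of the spec; the function name summarize_spec is hypothetical and is not part of the context package. It pulls out the requested relation type slugs and checks that each date range filter runs forward in time.

```python
import datetime
import json

# abbreviated copy of the request spec, just enough to exercise the checks.
spec_json = """
{
    "relation_selection": {
        "relation_type_slug_filters": [
            { "comparison_type": "includes", "value_list": [ "mentioned", "shared_byline" ] }
        ],
        "relation_trait_filters": [
            { "name": "pub_date", "data_type": "date", "comparison_type": "in_range",
              "value_from": "2009-12-01", "value_to": "2009-12-31" }
        ]
    }
}
"""

def parse_date( date_string_IN ):
    # dates in the spec are formatted YYYY-MM-DD.
    return datetime.datetime.strptime( date_string_IN, "%Y-%m-%d" ).date()

def summarize_spec( spec_IN ):
    """Return ( slug list, list of ( trait name, from date, to date ) ) for a request spec."""
    selection = spec_IN.get( "relation_selection", {} )
    # collect every relation type slug named in the slug filters.
    slug_list = []
    for slug_filter in selection.get( "relation_type_slug_filters", [] ):
        slug_list.extend( slug_filter.get( "value_list", [] ) )
    # collect and validate date range filters on relation traits.
    date_range_list = []
    for trait_filter in selection.get( "relation_trait_filters", [] ):
        is_date = ( trait_filter.get( "data_type" ) == "date" )
        is_range = ( trait_filter.get( "comparison_type" ) == "in_range" )
        if is_date and is_range:
            date_from = parse_date( trait_filter[ "value_from" ] )
            date_to = parse_date( trait_filter[ "value_to" ] )
            if date_from > date_to:
                raise ValueError( "date range runs backwards for trait {}".format( trait_filter[ "name" ] ) )
            date_range_list.append( ( trait_filter[ "name" ], date_from, date_to ) )
    return slug_list, date_range_list

slug_list, date_range_list = summarize_spec( json.loads( spec_json ) )
print( "slugs: {}".format( slug_list ) )
print( "date ranges: {}".format( date_range_list ) )
```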
Steps:
- make sure settings.py is referring to the full database (research, not the testing database research_temp).
- load a request into a NetworkDataRequest instance (example below uses the unit test TestHelper class to load the basic configuration, which is the same as the above, for 12/1/2009 through 12/31/2009, automated coder, Grand Rapids Press only).
  data_request = TestHelper.load_basic()
  from file path:
  data_request = NetworkDataRequest()
  path_to_file = "./request.json"
  data_request.load_network_data_request_json_file( path_to_file )
- update details of the request post-load. Examples:
  - call "data_request.set_output_file_path( path )" to change where the file will be output.
  - call "data_request.set_output_format( format )" to change the output format. Values:
    - NetworkDataRequest.PROP_VALUE_OUTPUT_FORMAT_SIMPLE_MATRIX - format for UCINet where contents of tie file and attribute files are all concatenated into one text file.
    - NetworkDataRequest.PROP_VALUE_OUTPUT_FORMAT_CSV_MATRIX
    - NetworkDataRequest.PROP_VALUE_OUTPUT_FORMAT_TSV_MATRIX
  - call "data_request.set_output_type( type )" to change the output type.
- make an instance of NetworkOutput: network_output = NetworkOutput().
- call "network_output.set_network_data_request( data_request )", passing in the request you loaded.
- call "network_output.render_network_data()". The method returns the contents of the file it created, and outputs to file as directed inside the request.
In [14]:
# load basic NetworkDataRequest
#data_request = TestHelper.load_basic()
# or, load from file path
data_request = NetworkDataRequest()
path_to_file = "./grp_month_from_context.json"
data_request.load_network_data_request_json_file( path_to_file )
Out[14]:
In [12]:
# configure output
#overridden_output_path = "{}/automated_grp_month.tsv".format( current_working_folder )
#data_request.set_output_file_path( overridden_output_path )
Out[12]:
In [15]:
# make and initialize instance of NetworkOutput
network_output = NetworkOutput()
network_output.set_network_data_request( data_request )
Out[15]:
In [16]:
# call render, see what happens.
network_data = network_output.render_network_data()
In [14]:
ndo_instance = network_output.m_NDO_instance
test1 = ndo_instance.get_entity_relation_type_summary_dict()
print( test1 )
test2 = ndo_instance.m_relation_type_slug_list
print( "\n\nEntity_Relation_Type slug list: {}".format( test2 ) )
test3 = ndo_instance.m_relation_type_slug_to_instance_map
print( "\n\nEntity_Relation_Type slug to instance map: {}".format( test3 ) )
Convert code from using Article_Data to derive ties to just loop over Entity_Relation instances.
Pseudocode:
NetworkOutput.render_network_data( query_set_IN )
create person dictionary - self.create_person_dict()
retrieve Article_Data QuerySet to traverse to pull in all authors and subjects - self.create_person_query_set()
for each Article_Data:
for each, call self.add_people_to_dict() on the QuerySet.
return dictionary into "person_dictionary".
get NetworkDataOutput (NDO) instance - self.get_NDO_instance()
initialize NDO
render - NDO.render()
Pseudocode:
NetworkDataOutput.render()
create ties
loop over query_set:
if authors:
process author relations (shared byline and quoted): self.process_author_relations( author_qs, source_qs )
.keys()
loops over authors:
self.add_reciprocal_relation()
- adds bidirectional ties between authors in nested connection map (calls self.add_directed_relation( person_1_id_IN, person_2_id_IN ), then self.add_directed_relation( person_2_id_IN, person_1_id_IN )).
- self.add_directed_relation(): updates self.relation_map, map of FROM IDs to dictionary of TO IDs, where each TO maps to a count of the ties between the two. Solely ID-based, so methods to add relations can stay as-is! Yay!
- set person's type to "author": self.update_person_type( current_person_id, NetworkDataOutput.PERSON_TYPE_AUTHOR )
- accepts person ID and type.
- looks for person ID in self.person_type_dict.
- If not present, adds current type.
- If present but not same as what is passed in, sets to BOTH.
- add relations to sources: self.process_source_relations( current_person_id, source_qs_IN )
- accepts author ID and source QuerySet.
- if author ID and source QuerySet has something in it, proceeds.
- checks if source is connected: self.is_source_connected( current_source )
- calls Article_Subject.is_connected( self.inclusion_params )
- if connected, retrieves person ID for source, if source has ID adds reciprocal relation between author and source: self.add_reciprocal_relation().
update the types of the sources (from source_qs): self.update_source_person_types( source_qs )
self.update_person_type( current_person_id, NetworkDataOutput.PERSON_TYPE_SOURCE )
build master person list: self.generate_master_person_list()
actually render the network data: self.render_network_data()
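The ID-based relation map described in the pseudocode above (FROM ID to a dictionary of TO IDs, each mapped to a tie count) can be sketched in a few lines. This is a standalone illustration written as plain functions over a dictionary, not the actual NetworkDataOutput methods, which keep the map on the instance.

```python
def add_directed_relation( relation_map, from_id, to_id ):
    """Increment the count of ties from from_id to to_id in the nested map."""
    # get (or create) the dictionary of TO IDs for this FROM ID.
    to_map = relation_map.setdefault( from_id, {} )
    # bump the tie count for this FROM-TO pair.
    to_map[ to_id ] = to_map.get( to_id, 0 ) + 1
    return relation_map

def add_reciprocal_relation( relation_map, id_1, id_2 ):
    """Add one tie in each direction between the two IDs."""
    add_directed_relation( relation_map, id_1, id_2 )
    add_directed_relation( relation_map, id_2, id_1 )
    return relation_map

# example with made-up entity IDs 1 and 2.
relation_map = {}
add_reciprocal_relation( relation_map, 1, 2 )
add_directed_relation( relation_map, 1, 2 )
print( relation_map )  # {1: {2: 2}, 2: {1: 1}}
```

Because only IDs are stored, the same structure works whether the IDs refer to people or to generic entities.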
Pseudocode:
NetworkOutput.render_network_data( relation_qs_IN, network_data_request_IN )
create entity dictionary - self.create_entity_dict()
call self.add_entities_to_dict() to add entities from the relation QuerySet.
loops over relations, retrieves FROM, TO, and optionally THROUGH entities, calls self.add_entity_to_dict() on each entity to be added.
return dictionary into "entity_dictionary".
get NetworkDataOutput (NDO) instance - self.get_NDO_instance()
initialize NDO
render - NDO.render()
Pseudocode:
NetworkDataOutput.render()
create ties
loop over query_set:
// add relation - check if directed:
directed:
self.add_directed_relation( from_id, to_id )
not directed:
self.add_reciprocal_relation( from_id, to_id )
// update roles of entities...
replaces:
self.update_person_type( current_person_id, NetworkDataOutput.PERSON_TYPE_SOURCE ) with update_entity_relations_details( self, entity_id_IN, relation_type_slug_IN, relation_role_IN )
self.person_type_dict with something more nuanced (can no longer depend on a single type).
// build master entity list: self.generate_master_entity_list()
actually render the network data: self.render_network_data()
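One possible shape for the "something more nuanced" that replaces the single person type is a nested map of entity ID to relation type slug to role to count. The sketch below is an assumption about how that bookkeeping could look, written as a plain function with the hypothetical name update_entity_relation_roles; it is not the method in NetworkDataOutput.

```python
def update_entity_relation_roles( details_map, entity_id, relation_type_slug, relation_role ):
    """Record that entity_id appeared once more in relation_role for relation_type_slug."""
    # entity ID --> relation type slug --> role --> count.
    type_to_roles_map = details_map.setdefault( entity_id, {} )
    role_to_count_map = type_to_roles_map.setdefault( relation_type_slug, {} )
    role_to_count_map[ relation_role ] = role_to_count_map.get( relation_role, 0 ) + 1
    return details_map

details_map = {}

# one made-up "quoted" relation: entity 1 is FROM, entity 2 is TO, entity 3 (the article) is THROUGH.
for entity_id, role in [ ( 1, "FROM" ), ( 2, "TO" ), ( 3, "THROUGH" ) ]:
    update_entity_relation_roles( details_map, entity_id, "quoted", role )

# entity 1 is also TO in a made-up "shared_byline" relation.
update_entity_relation_roles( details_map, 1, "shared_byline", "TO" )

print( details_map[ 1 ] )
```

An entity can now hold any mix of roles across relation types, which "author", "source", or "both" could not express.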
Try loading the basic file and using it to render network data...
In [15]:
# load basic NetworkDataRequest
data_request_basic = TestHelper.load_basic()
In [16]:
# make and initialize instance of NetworkOutput
network_output = NetworkOutput()
network_output.set_network_data_request( data_request_basic )
Out[16]:
In [17]:
# call render, see what happens.
network_data = network_output.render_network_data()
In [14]:
ndo_instance = network_output.m_NDO_instance
test1 = ndo_instance.get_entity_relation_type_summary_dict()
print( test1 )
test2 = ndo_instance.m_relation_type_slug_list
print( "\n\nEntity_Relation_Type slug list: {}".format( test2 ) )
test3 = ndo_instance.m_relation_type_slug_to_instance_map
print( "\n\nEntity_Relation_Type slug to instance map: {}".format( test3 ) )
In [17]:
# randomly pick 5 or so relations that have THROUGH.
relation_qs = Entity_Relation.objects.all()
relation_qs = relation_qs.exclude( relation_through__isnull = True )
relation_qs = relation_qs.order_by('?')[:5]
relation_count = len( relation_qs )
print( "relation count: {}".format( relation_count ) )
In [21]:
# randomly pick 5 relations, excluding a list of relation IDs and limiting to a list of FROM entity IDs.
relation_qs = Entity_Relation.objects.all()
relation_qs = relation_qs.exclude( pk__in = [ 1101, 1245, 2264, 2687, 2973 ] )
relation_qs = relation_qs.filter( relation_from__in = [ 80, 128, 147, 151, 246, 248, 271, 279, 288, 291 ] )
relation_qs = relation_qs.order_by('?')[:5]
relation_count = len( relation_qs )
print( "relation count: {}".format( relation_count ) )
# loop
for relation in relation_qs:
    print( "Relation ID: {}".format( relation.id ) )
#-- END loop over relations --#
In [23]:
relation_id_list = [ 1101, 1245, 2264, 2271, 2288, 2595, 2638, 2687, 2973, 3086 ]
#relation_id_list = TestHelper.TEST_RELATION_IDS_WITH_THROUGH
relation_qs = Entity_Relation.objects.all()
relation_qs = relation_qs.filter( pk__in = relation_id_list )
# re-count, so we don't print the stale count from an earlier cell.
relation_count = relation_qs.count()
print( "relation count: {}".format( relation_count ) )
# loop
for relation in relation_qs:
    print( "Relation ID: {}".format( relation.id ) )
#-- END loop over relations --#
In [25]:
# declare variables
relation_id_list = None
relation = None
relation_id = None
relation_from = None
relation_from_id = None
relation_to = None
relation_to_id = None
relation_through = None
relation_through_id = None
from_to_entity_id_list = None
all_entity_id_list = None
relation_counter = None
# init
relation_id_list = []
from_to_entity_id_list = []
all_entity_id_list = []
# loop
relation_counter = 0
for relation in relation_qs:
    # increment counter
    relation_counter += 1
    # get relation ID
    relation_id = relation.id
    # add it to list?
    if ( relation_id not in relation_id_list ):
        # not already in list. Add it.
        relation_id_list.append( relation_id )
        relation_id_list.sort()
    #-- END check to see if ID already in list. --#
    # get entities
    relation_from = relation.relation_from
    relation_to = relation.relation_to
    relation_through = relation.relation_through
    # get IDs, check for each in entity ID lists
    relation_from_id = relation_from.id
    if ( relation_from_id not in from_to_entity_id_list ):
        # append it and sort.
        from_to_entity_id_list.append( relation_from_id )
        from_to_entity_id_list.sort()
    #-- END check to see if FROM in FROM-TO list --#
    if ( relation_from_id not in all_entity_id_list ):
        # append it and sort.
        all_entity_id_list.append( relation_from_id )
        all_entity_id_list.sort()
    #-- END check to see if FROM in all-entity list --#
    relation_to_id = relation_to.id
    if ( relation_to_id not in from_to_entity_id_list ):
        # append it and sort.
        from_to_entity_id_list.append( relation_to_id )
        from_to_entity_id_list.sort()
    #-- END check to see if TO in FROM-TO list --#
    if ( relation_to_id not in all_entity_id_list ):
        # append it and sort.
        all_entity_id_list.append( relation_to_id )
        all_entity_id_list.sort()
    #-- END check to see if TO in all-entity list --#
    relation_through_id = relation_through.id
    if ( relation_through_id not in all_entity_id_list ):
        # append it and sort.
        all_entity_id_list.append( relation_through_id )
        all_entity_id_list.sort()
    #-- END check to see if THROUGH in all-entity list --#
    print( "\n- after relation #{} - {}:".format( relation_counter, relation ) )
    print( "----> Relation ID List ( {} ): {}".format( len( relation_id_list ), relation_id_list ) )
    print( "----> from_to_entity_id_list ( {} ): {}".format( len( from_to_entity_id_list ), from_to_entity_id_list ) )
    print( "----> all_entity_id_list ( {} ): {}".format( len( all_entity_id_list ), all_entity_id_list ) )
#-- END loop over Relations --#
print( "Relation ID List ( {} ): {}".format( len( relation_id_list ), relation_id_list ) )
print( "from_to_entity_id_list ( {} ): {}".format( len( from_to_entity_id_list ), from_to_entity_id_list ) )
print( "all_entity_id_list ( {} ): {}".format( len( all_entity_id_list ), all_entity_id_list ) )
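The same three ID lists could also be built with sets, which handle the "already in list?" checks and leave the sorting for the end. The sketch below is standalone: it uses made-up ( relation ID, FROM ID, TO ID, THROUGH ID ) tuples as stand-ins for Entity_Relation rows, and the function name collect_ids is hypothetical.

```python
def collect_ids( relation_rows ):
    """Return sorted lists of relation IDs, FROM/TO entity IDs, and all entity IDs."""
    relation_id_set = set()
    from_to_entity_id_set = set()
    all_entity_id_set = set()
    for relation_id, from_id, to_id, through_id in relation_rows:
        relation_id_set.add( relation_id )
        # FROM and TO go in both entity sets.
        from_to_entity_id_set.update( [ from_id, to_id ] )
        all_entity_id_set.update( [ from_id, to_id ] )
        # THROUGH is optional, and only goes in the all-entity set.
        if through_id is not None:
            all_entity_id_set.add( through_id )
    return ( sorted( relation_id_set ), sorted( from_to_entity_id_set ), sorted( all_entity_id_set ) )

# made-up rows: two relations through entity 5, one with no THROUGH.
example_rows = [ ( 11, 1, 2, 5 ), ( 12, 2, 3, 5 ), ( 13, 1, 3, None ) ]
relation_ids, from_to_ids, all_ids = collect_ids( example_rows )
print( "Relation ID List ( {} ): {}".format( len( relation_ids ), relation_ids ) )
print( "from_to_entity_id_list ( {} ): {}".format( len( from_to_ids ), from_to_ids ) )
print( "all_entity_id_list ( {} ): {}".format( len( all_ids ), all_ids ) )
```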
In [12]:
# make test instance.
test_instance = NetworkDataOutputTest()
# set up basic
basic_instance = test_instance.set_up_basic_test_instance()
# render
basic_instance.render()
# get relation QuerySet
relation_qs = basic_instance.get_query_set()
# get master ID list
entity_id_list = basic_instance.get_master_entity_list()
In [16]:
test_entity = Entity.objects.get( id = 18 )
print( test_entity )
test_qs = relation_qs.filter( relation_through = test_entity )
print( "match count: {}".format( test_qs.count() ) )
test_qs = relation_qs.filter( relation_through_id = 18 )
print( "match count: {}".format( test_qs.count() ) )
In [21]:
mentioned_type = Entity_Relation_Type.objects.get( slug = "mentioned" )
quoted_type = Entity_Relation_Type.objects.get( slug = "quoted" )
shared_byline_type = Entity_Relation_Type.objects.get( slug = "shared_byline" )
type_list = [ mentioned_type, quoted_type, shared_byline_type ]
role_list = [ ContextBase.RELATION_ROLES_FROM, ContextBase.RELATION_ROLES_TO, ContextBase.RELATION_ROLES_THROUGH ]
# declare variables
current_type_slug = None
type_to_roles_map = {}
role_to_list_map = {}
# loop over types
for current_type in type_list:
    # add to type_to_roles_map
    current_type_slug = current_type.slug
    role_to_list_map = {}
    type_to_roles_map[ current_type_slug ] = role_to_list_map
    # loop over roles
    for current_role in role_list:
        # init list
        type_role_list = []
        # loop over IDs.
        for entity_id in entity_id_list:
            # do lookup in relations. relation type...
            test_qs = relation_qs.filter( relation_type = current_type )
            # and entity in requested role...
            if ( current_role == ContextBase.RELATION_ROLES_FROM ):
                test_qs = test_qs.filter( relation_from_id = entity_id )
            elif ( current_role == ContextBase.RELATION_ROLES_TO ):
                test_qs = test_qs.filter( relation_to_id = entity_id )
            elif ( current_role == ContextBase.RELATION_ROLES_THROUGH ):
                test_qs = test_qs.filter( relation_through_id = entity_id )
            #-- END check to see which role we are filtering on. --#
            # get count of matches for this entity, requested type and role
            entity_count = test_qs.count()
            # add count to list.
            type_role_list.append( entity_count )
        #-- END loop over entities. --#
        # output the list.
        print( "----> type: {}; role: {}; list: {}".format( current_type, current_role, type_role_list ) )
        # add it to role map for current type
        role_to_list_map[ current_role ] = type_role_list
    #-- END loop over roles --#
#-- END loop over relation types. --#
In [22]:
# make test instance.
test_instance = NetworkDataOutputTest()
# set up basic
ndo_instance = test_instance.set_up_entity_selection_test_instance()
# render
ndo_instance.render()
# get relation QuerySet
relation_qs = ndo_instance.get_query_set()
# get master ID list
entity_id_list = ndo_instance.get_master_entity_list()
In [23]:
test_entity = Entity.objects.get( id = 18 )
print( test_entity )
test_qs = relation_qs.filter( relation_through = test_entity )
print( "match count: {}".format( test_qs.count() ) )
test_qs = relation_qs.filter( relation_through_id = 18 )
print( "match count: {}".format( test_qs.count() ) )
In [24]:
mentioned_type = Entity_Relation_Type.objects.get( slug = "mentioned" )
quoted_type = Entity_Relation_Type.objects.get( slug = "quoted" )
shared_byline_type = Entity_Relation_Type.objects.get( slug = "shared_byline" )
type_list = [ mentioned_type, quoted_type, shared_byline_type ]
role_list = [ ContextBase.RELATION_ROLES_FROM, ContextBase.RELATION_ROLES_TO, ContextBase.RELATION_ROLES_THROUGH ]
# declare variables
current_type_slug = None
entity_selection_type_to_roles_map = {}
role_to_list_map = {}
# loop over types
for current_type in type_list:
    # add to entity_selection_type_to_roles_map
    current_type_slug = current_type.slug
    role_to_list_map = {}
    entity_selection_type_to_roles_map[ current_type_slug ] = role_to_list_map
    # loop over roles
    for current_role in role_list:
        # init list
        type_role_list = []
        # loop over IDs.
        for entity_id in entity_id_list:
            # do lookup in relations. relation type...
            test_qs = relation_qs.filter( relation_type = current_type )
            # and entity in requested role...
            if ( current_role == ContextBase.RELATION_ROLES_FROM ):
                test_qs = test_qs.filter( relation_from_id = entity_id )
            elif ( current_role == ContextBase.RELATION_ROLES_TO ):
                test_qs = test_qs.filter( relation_to_id = entity_id )
            elif ( current_role == ContextBase.RELATION_ROLES_THROUGH ):
                test_qs = test_qs.filter( relation_through_id = entity_id )
            #-- END check to see which role we are filtering on. --#
            # get count of matches for this entity, requested type and role
            entity_count = test_qs.count()
            # add count to list.
            type_role_list.append( entity_count )
        #-- END loop over entities. --#
        # output the list.
        print( "----> type: {}; role: {}; list: {}".format( current_type, current_role, type_role_list ) )
        # add it to role map for current type
        role_to_list_map[ current_role ] = type_role_list
    #-- END loop over roles --#
#-- END loop over relation types. --#
Framework todos:
build basic framework where each output type accepts these two QuerySets, renders and returns desired output format.
initialize_from_request: if needed, call parent, then init format-specific stuff.
add type checking to setters of dictionaries and lists, to make sure either None or correct type passed in.
Testing todos:
find a way to put method validate_string_against_file_contents() into python_utilities - new unittest_helper class, and call the stock unittest asserts instead of those on the instance?
unittest.TestCase
django.test.TestCase
unit testing:
NetworkDataRequest
do_output_entity_traits_or_ids
create_entity_id_header_label
create_entity_trait_header_label
create_entity_ids_and_traits_header_list
get/set_entity_ids_and_traits_header_list
get/set_entity_id_to_instance_map
get/set_entity_id_to_traits_map
// process_entities
- add to test to include entity traits and IDs
// spot-check a few individuals for values
build tests for the following:
load_entities_ids_and_traits
load_entity_identifiers
load_entity_traits
NOTE: Add process_entities as precondition for the following:
// create_ids_and_traits_values_for_entity
get_ids_and_traits_for_entity
// get/set_entity_id_list
generate_entity_id_list
create_entity_ids_and_traits_value_dict( entity_id_list_IN )
create_entity_ids_and_traits_value_list for each.
create_entity_ids_and_traits_value_list( self, header_label_IN, entity_id_list_IN = None ):
- make a "validate" method that accepts...? and:
load_ids_and_traits_for_entities( self, entity_id_list_IN, dictionary_IN )
process_entities_from_id_list
NetworkDataOutput
dependencies for child classes:
create_header_list()
create_label_list()
create_relation_type_roles_for_entity()
create_relation_type_roles_header_list()
do_output_attribute_columns()
do_output_attribute_rows()
do_output_network()
get_entity_label()
get_relation_roles_for_entity()
get_relations_for_entity()
notes
register_relation_type(), and the places that call it: render(), optionally also update_entity_relation_details()
NDO children
NDO_SimpleMatrix
render_network_data()
- at a high level, render the basic, compare to a pre-rendered file.
create_label_string()
create_network_string()
- the worker method, effectively, testing render_network_data() tests this.
create_entity_row_string()
- per row method.
create_entity_relation_types_attribute_string()
NDO_CSVMatrix
append_entity_ids_and_traits_rows
render_network_data()
- at a high level, render the basic, compare to a pre-rendered file.
create_csv_string()
init_csv_output()
create_csv_document()
create_header_list()
- append_row_to_csv()
- append_entity_row()
- create_relation_type_roles_for_entity()
- (duplicate) append_row_to_csv()
- append_entity_id_row()
- (duplicate) append_row_to_csv()
- append_entity_relation_type_rows()
- (duplicate) append_row_to_csv()
cleanup()
NDO_TabDelimitedMatrix
// look at relation filtering/tie creation/rendering - Entity 10 (person 872) has two ties in test output for "basic" (to entities 8 and 9), should only have 1. Article (21409) had two authors, so single article resulted in two ties to the subject, one from each author.
test by comparing to output from the original tool, including derived statistics.
Jupyter notebooks:
overview of network data creation and analysis: phd_work/methods/methods_paper_planning.ipynb#Network-Analysis
R code for analysis comparing human and automated networks from original output: phd_work/methods/network_analysis/statnet/R-statnet-grp_month-full_month.ipynb
row counts are not the same, so need to look into what is going on.
spec for just automated: ./grp_month_from_context.json
./grp_month_from_context.tsv
context_text_data-20200205-023708-just_automated.txt
to start, probably should create a program to compare the two:
for each relation in one, then the other:
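A first pass at such a comparison program could look like the sketch below. It assumes, and this is only an assumption about the file layout, that each file is a tab-delimited matrix whose first row holds column labels and whose first column holds row labels; attribute columns would simply be compared as extra columns. The function names are hypothetical, and the two small in-memory strings stand in for the real output files.

```python
import csv
import io

def read_matrix( file_IN ):
    """Read a tab-delimited matrix into a map of ( row label, column label ) to cell value."""
    reader = csv.reader( file_IN, delimiter = "\t" )
    header = next( reader )
    column_labels = header[ 1 : ]
    cell_map = {}
    for row in reader:
        if not row:
            continue
        row_label = row[ 0 ]
        for column_label, value in zip( column_labels, row[ 1 : ] ):
            cell_map[ ( row_label, column_label ) ] = value
    return cell_map

def compare_matrices( map_1, map_2 ):
    """Return ( cells only in first, cells only in second, shared cells whose values differ )."""
    only_in_1 = sorted( set( map_1 ) - set( map_2 ) )
    only_in_2 = sorted( set( map_2 ) - set( map_1 ) )
    different = sorted( key for key in set( map_1 ) & set( map_2 ) if map_1[ key ] != map_2[ key ] )
    return only_in_1, only_in_2, different

# made-up stand-ins for the old and new output files.
old_text = "id\t1\t2\n1\t0\t1\n2\t1\t0\n"
new_text = "id\t1\t2\t3\n1\t0\t2\t0\n2\t1\t0\t0\n3\t0\t0\t0\n"
old_map = read_matrix( io.StringIO( old_text ) )
new_map = read_matrix( io.StringIO( new_text ) )
only_old, only_new, different = compare_matrices( old_map, new_map )
print( "only in old: {}".format( only_old ) )
print( "only in new: {}".format( only_new ) )
print( "different values: {}".format( different ) )
```

Cells present in only one file point at row-count mismatches, while differing values in shared cells point at tie-counting differences. For real files, pass open file handles to read_matrix() instead of StringIO objects.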
general TODO:
methods to find relations, similar to filter_entities() and lookup_entities() in Entity model class. Include:
from_entity_traits
// to =
to_entity_traits
// through =
through_entity_traits
either FROM or TO (so undirected search - "I don't care which side")
any of FROM, TO, THROUGH
relation_traits
NOTE for trait matching types (and probably entity identifiers, also):
Tags on Entity and/or Entity_Relation.
make small dummy person and organization classes in context, so I can use them to test Abstract_Entity_Container without needing context_text.
come up with better way to seed entities and relations for sourcenet - store spec in context_text base, then method to create or update all.
This will need to build up a basic object mapping based on foreign keys before it creates anything, then create things in the right order (Entity_Types and related first, then Entity_Identifier_Types, then Relation_Types). Order:
for each type:
To actually load, go in order of types outlined above, creating as you go.
abstraction:
make an abstract parent for a type that has associated trait specs (parent to Entity_Type and Entity_Relation_Type).
get_trait_spec().
make an abstract parent for trait containers that have associated types with associated trait specs (parent to Entity and Entity_Relation).
2020.01.06
then, build basic framework where each output type accepts these two QuerySets, renders and returns desired output format.
refactoring from old NetworkOutput, NetworkDataOutput, and NDO objects:
// terms to search for and consider replacing "person" with "entity" (if they aren't in sections that will just be ripped out because they are no longer needed):
todo:
// remove "inclusion_params"
self.inclusion_params
inclusion_params
self.is_source_connected( current_source )
// remove network_label
generate_master_person_list() to reference new variables (generate_master_entity_list).
get_person_label()? - updated to get_entity_label, for now just uses Entity ID. Could make it also include more IF Entity instances are cached. We'll see if that is helpful.
get_person_type and get_person_type_id (functions themselves are removed, need to mop up around the other classes: grep -r -n "get_person_type" . ; grep -r -n "get_person_type_id" . )
get_relations_for_person to get_relations_for_entity
get_master_person_list to get_master_entity_list
create_person_id_list to create_entity_id_list
get_person_label to get_entity_label (and made it a lot simpler).
PERSON_QUERY_TYPE_CHOICES_LIST and related.
CODER_TYPE_FILTER_*
PERSON_TYPE_* variables (check in all files)
// search for "person", "people" in all NDO files in context.
ndo_simple_matrix.py
ndo_csv_matrix.py
ndo_tab_delimited_matrix.py
// Article_Subject
append_person_row
...
PARAM_*? Not for now - used in places, is a good signal for needing changes.
update person type stuff so it stores a list, rather than "author", "source", or "both".
register_relation_type( relation_type_IN ), called from render(), optionally also from update_entity_relation_details().
2020.01.08
build basic framework where each output type accepts these two QuerySets, renders and returns desired output format.
refactoring from old NetworkOutput, NetworkDataOutput, and NDO objects:
NetworkOutput.render_network_data()
: add outputting to file if file output path specified in request.// update person type stuff so it stores a list, rather than "author", "source", or "both".
update_person_type()
self.person_type_dict
) - removed all.relation_type_slug_to_instance_map
and relation_type_slug_list
.NDO.create_person_type_id_list
- figure out ramifications. Might need to create a set of columns for each relation type, one for each of FROM, TO, and THROUGH, then populate appropriately from the new relation type map. register_relation_type( relation_type_IN )
to update map and list created above. Updated from render()
, optionally also from update_entity_relation_details()
.then, when outputting:
// for tabular (ndo_csv_matrix
and children):
NetworkDataOutput.create_relation_type_roles_for_entity()
: for data rows (attribute columns at right), walk the entity's relation type data structure in the same order as the relation type list, and output FROM, TO, and THROUGH numbers for each, 0 if not found. Will result in many attribute columns.
for data columns (attribute rows at bottom), pull in all relation types, then for each entity-->type-->role, walk all entities and output their value for that relation type in the row. So, will result in many rows of attribute values.
NetworkDataOutput.create_relation_type_role_value_list()
: create method that accepts a relation type slug and a role, creates list of values for all entities in master entity list for that combination of slug and role. If not present for a given entity, sets to 0.
- // NetworkDataOutput.create_relation_type_value_dict()
: create a method that accepts a relation type slug, loops over all roles, calls NetworkDataOutput.create_relation_type_role_value_list()
to build the list of values for each, then makes and returns dictionary mapping roles to value lists.
- // NetworkDataOutput.create_all_relation_type_values_lists()
: create a method that loops over relation type slugs, then for each, calls NetworkDataOutput.create_relation_type_value_dict()
to create dictionary that maps roles to values lists. Creates a dictionary that maps relation type slugs to these dictionaries, then returns the new dictionary.
- // NDO_CSVMatrix.append_entity_relation_type_rows()
: implement logic in the ndo_csv_matrix class that retrieves the values lists and uses them appropriately.
// for ndo_simple_matrix.py
(UCINet native format), need to implement create_entity_relation_types_attribute_string()
- it assumed a single entity type value per person - need to re-do it so it pulls in all relation types, then outputs a list per person-->type-->role. So, will result in many lists.
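A rough sketch of how the three value-list methods described above could fit together. This is not the actual NetworkDataOutput code: the functions are standalone rather than methods, and the shape of the relation type summary (entity ID --> relation type slug --> role --> count) and the `ROLE_LIST` constant are assumptions made for illustration.

```python
# Assumed role names, taken from the FROM / TO / THROUGH columns in the notes.
ROLE_LIST = [ "FROM", "TO", "THROUGH" ]

def create_relation_type_role_value_list( master_entity_list, summary_dict, slug, role ):
    # One value per entity, in master entity list order; 0 if nothing is recorded.
    value_list = []
    for entity_id in master_entity_list:
        role_to_count = summary_dict.get( entity_id, {} ).get( slug, {} )
        value_list.append( role_to_count.get( role, 0 ) )
    return value_list

def create_relation_type_value_dict( master_entity_list, summary_dict, slug ):
    # Map each role to its list of values for this relation type slug.
    return {
        role: create_relation_type_role_value_list( master_entity_list, summary_dict, slug, role )
        for role in ROLE_LIST
    }

def create_all_relation_type_values_lists( master_entity_list, summary_dict, slug_list ):
    # Map each relation type slug to its role --> value list dictionary.
    return {
        slug: create_relation_type_value_dict( master_entity_list, summary_dict, slug )
        for slug in slug_list
    }

# Toy data (hypothetical): entity 3 has no relations at all.
master_entity_list = [ 1, 2, 3 ]
summary_dict = {
    1: { "quoted": { "FROM": 2 } },
    2: { "quoted": { "TO": 2, "THROUGH": 1 }, "mentioned": { "TO": 1 } },
}
all_values = create_all_relation_type_values_lists(
    master_entity_list, summary_dict, [ "quoted", "mentioned" ]
)
print( all_values[ "quoted" ][ "TO" ] )
```

Each value list lines up position-for-position with the master entity list, so a tabular output class can write one list per attribute row (or zip them into attribute columns) without looking entities up again.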
2020.01.23
add ability in JSON to tell which entity traits we want to include in output traits, then when loading entities, look for those traits in each as it is loaded, store in a separate entity ID to trait name-value map.
// add place in output spec for:
traits (output_entity_traits_list
?) - store the list of names/slugs of traits you want included if traits are output. Possible filter criteria:
identifiers (output_entity_identifiers_list
) - list of identifiers you want to include - to effectively target, might need an object with more than just name. Possible filter criteria:
includes updating NetworkDataRequest and its tests to know of and allow for easy retrieval of these lists.
// add to processing a step in generating the master entity list where you loop over the list of traits if one present in the request and make a map of the values for those traits for each entity (map entity ID --> trait dict).
move the logic for processing entities to NetworkDataRequest, so it can be used to pass the entity dictionary, master entity list, and entity traits to NetworkDataOutput and children.
process_entities
add_entities_to_dict
add_entity_to_dict
entity dictionary and trait map, and getters and setters.
self.m_entity_id_to_instance_map = {}
self.m_entity_id_to_traits_map = {}
it checks if traits specified (call to NetworkDataRequest.do_output_entity_traits_or_ids()
).
if so, calls load_entities_traits_and_ids
. Inside:
load_entity_traits
load_entity_identifiers
need to remove all that stuff from NetworkOutput, fix everything so it works again.
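A minimal sketch of the entity ID --> trait dict map described above, assuming each entity is a plain dictionary with a "traits" dictionary inside it. The real Entity model and the real load_entity_traits will differ; `build_entity_id_to_traits_map` is a hypothetical helper name, and the trait names in the example are made up.

```python
def build_entity_id_to_traits_map( entity_id_to_instance_map, trait_name_list ):
    # For each entity, keep only the requested traits; a missing trait maps to None.
    traits_map = {}
    for entity_id, entity in entity_id_to_instance_map.items():
        entity_traits = entity.get( "traits", {} )
        traits_map[ entity_id ] = {
            trait_name: entity_traits.get( trait_name )
            for trait_name in trait_name_list
        }
    return traits_map

# Toy stand-ins for Entity instances (hypothetical data).
entity_id_to_instance_map = {
    10: { "name": "A", "traits": { "first_name": "Ann", "gender": "f" } },
    11: { "name": "B", "traits": { "first_name": "Bob" } },
}
output_entity_traits_list = [ "first_name", "gender" ]

entity_id_to_traits_map = build_entity_id_to_traits_map(
    entity_id_to_instance_map, output_entity_traits_list
)
print( entity_id_to_traits_map[ 11 ] )
```

Storing None for a missing trait keeps every entity's trait dict the same shape, which should make it easier to render a fixed set of trait columns later.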
// in rendering output, if traits output, update to render:
Testing todos:
unit testing:
// NetworkOutput
create_ndo_instance()
get_NDO_instance()
get_network_data_request()
get_relation_query_set()
set_NDO_instance()
set_network_data_request()
set_relation_query_set()
NetworkDataOutput
// getters and setters
get_entity_dictionary()
/set_entity_dictionary()
get_entity_relation_type_summary_dict()
/set_entity_relation_type_summary_dict()
get_master_entity_list()
/set_master_entity_list()
get_network_data_request()
/set_network_data_request()
get_output_format()
/set_output_format()
get_output_structure()
/set_output_structure()
get_output_type()
/set_output_type()
get_query_set()
/set_query_set()
get_relation_map()
/set_relation_map()
get_relation_type_slug_list()
/set_relation_type_slug_list()
get_relation_type_slug_to_instance_map()
/set_relation_type_slug_to_instance_map()
// set_query_set()
set_entity_dictionary()
initialize_from_request()
// render()
get_query_set()
get_entity_dictionary()
// add_directed_relation()
get_relation_map()
// add_reciprocal_relation()
add_directed_relation()
// register_relation_type()
get_relation_type_slug_to_instance_map()
get_relation_type_slug_list()
// update_entity_relations_details()
register_relation_type()
get_entity_relation_type_summary_dict()
// generate_master_entity_list()
get_entity_dictionary()
get_entity_relation_type_summary_dict()
set_master_entity_list()
get_master_entity_list()
abstract render_network_data()
dependencies for child classes:
create_entity_id_list()
// create_all_relation_type_values_lists()
get_relation_type_slug_list()
// create_relation_type_value_dict()
// create_relation_type_role_value_list()
get_master_entity_list()
- // get_entity_relation_type_summary_dict()
- // create goal data:
- write program to, for basic, then entity_selection:
- setup and render.
- retrieve relation QS.
- retrieve master entity ID list.
- for each relation type
- for each role:
- loop over ID list, and filter to count all relations where the current ID is in the selected type and role.
// NetworkDataRequest
// process_entities()
// create_entity_dict()
// add_entities_to_dict()
add_entity_to_dict()
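For the "create goal data" item in the NetworkDataOutput testing todos above, the nested loop (relation type, then role, then entity ID) can be sketched independently of the code under test. This assumes relations are plain dictionaries with a "type" slug and FROM / TO / THROUGH entity IDs; in the real program the count would come from filtering the relation QuerySet. `count_relations_by_role` is a hypothetical name.

```python
def count_relations_by_role( relation_list, master_entity_id_list, slug_list, role_list ):
    # goal[ slug ][ role ] is a list of counts, one per entity ID, in master list order.
    goal = {}
    for slug in slug_list:
        goal[ slug ] = {}
        for role in role_list:
            goal[ slug ][ role ] = [
                sum(
                    1 for relation in relation_list
                    if relation[ "type" ] == slug and relation.get( role ) == entity_id
                )
                for entity_id in master_entity_id_list
            ]
    return goal

# Toy relations (hypothetical): THROUGH is optional.
relation_list = [
    { "type": "quoted", "FROM": 1, "TO": 2, "THROUGH": 3 },
    { "type": "quoted", "FROM": 1, "TO": 3, "THROUGH": None },
    { "type": "mentioned", "FROM": 2, "TO": 1 },
]
goal_data = count_relations_by_role(
    relation_list, [ 1, 2, 3 ], [ "quoted", "mentioned" ], [ "FROM", "TO", "THROUGH" ]
)
print( goal_data[ "quoted" ][ "FROM" ] )
```

Because this counts straight from the relations rather than from the relation type summary, its output can serve as the expected value when testing create_all_relation_type_values_lists().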