R - grp_month - sna-author_info - full_month
2017.12.07 - work log - prelim - R - grp month
From sna-author_info.r
Store important directories and file names in variables:
In [1]:
getwd()
In [2]:
# code files (in particular SNA function library, modest though it may be)
code_directory <- "/home/jonathanmorgan/work/django/research/context_analysis/R/sna"
sna_function_file_path <- paste( code_directory, "/", 'functions-sna.r', sep = "" )
# home directory - getwd() default, overridden by hard-coded path below
home_directory <- getwd()
home_directory <- "/home/jonathanmorgan/work/django/research/work/phd_work/methods"
# data directories
data_directory <- paste( home_directory, "/data", sep = "" )
workspace_file_name <- "statnet-grp_month.RData"
workspace_file_path <- paste( data_directory, "/", workspace_file_name, sep = "" )
In [3]:
# set working directory to data directory for now.
setwd( data_directory )
getwd()
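As an aside, the path-building above could also use `file.path()`, which joins components with the correct separator and avoids the repeated `"/"` literals and `sep = ""` arguments (a sketch, reusing the same directory names as above):

```r
# file.path() inserts the platform path separator between components.
code_directory <- "/home/jonathanmorgan/work/django/research/context_analysis/R/sna"
sna_function_file_path <- file.path( code_directory, "functions-sna.r" )

# data directory and workspace file, built the same way
data_directory <- file.path( getwd(), "data" )
workspace_file_path <- file.path( data_directory, "statnet-grp_month.RData" )
```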
The original file assumed it would be sourced after running other scripts. Here, that other work was done in a separate notebook, so we reload the workspace in which it was done:
In [4]:
# assumes that you've already set working directory above to the
# working directory.
setwd( data_directory )
message( paste( "Loading workspace : ", workspace_file_name, sep = "" ) )
load( workspace_file_name )
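After `load()`, it can be worth confirming that the objects this notebook depends on actually came in with the workspace. A minimal check (the object names are the ones this notebook uses below):

```r
# report whether each expected object is present in the current session
expected <- c( "gmHumanDataDF", "gmAutomatedDataDF" )
present <- sapply( expected, exists )
message( paste( expected, ":", ifelse( present, "loaded", "MISSING" ), collapse = "; " ) )
```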
In [5]:
# output workspace
output_workspace_file_name <- workspace_file_name
output_workspace_file_path <- paste( data_directory, "/", output_workspace_file_name, sep = "" )
Load original network data dataframes into humanNetworkDataDF and automatedNetworkDataDF.
In [6]:
# in the statnet files, original automated data was loaded into gmAutomatedDataDF
automatedNetworkDataDF <- gmAutomatedDataDF
# original human data was loaded into gmHumanDataDF
humanNetworkDataDF <- gmHumanDataDF
The original file on which this is based is: context_text/R/sna/sna-author_info.r
Notes:
- humanNetworkDataDF is the original human-coded data frame (gmHumanDataDF, etc.).
- automatedNetworkDataDF is the original automated data frame (gmAutomatedDataDF, etc.).
In [7]:
# For this to work, you'll need to have run either of the following, including
# all of the prerequisite files listed in each file:
# - context_text/R/igraph/sna-igraph-network_stats.r
# - context_text/R/statnet/sna-statnet-network_stats.r
# Also assumes that you haven't re-ordered the <type>NetworkData data frames.
#==============================================================================#
# information for all authors - person_type = 2 (reporter) or 4 (both source and reporter)
#==============================================================================#
# human - all authors
humanAuthorsNetworkData <- humanNetworkDataDF[ humanNetworkDataDF$person_type == 2 | humanNetworkDataDF$person_type == 4, ]
humanAuthorsCount <- nrow( humanAuthorsNetworkData )
humanAuthorsMeanDegree <- mean( humanAuthorsNetworkData$degree )
humanAuthorsMaxDegree <- max( humanAuthorsNetworkData$degree )
humanAuthorsMeanTieWeightGE0 <- mean( humanAuthorsNetworkData$meanTieWeightGE0 )
humanAuthorsMeanTieWeightGE1 <- mean( humanAuthorsNetworkData$meanTieWeightGE1 )
humanAuthorsMaxTieWeight <- max( humanAuthorsNetworkData$maxTieWeight )
# automated - all authors
automatedAuthorsNetworkData <- automatedNetworkDataDF[ automatedNetworkDataDF$person_type == 2 | automatedNetworkDataDF$person_type == 4, ]
automatedAuthorsCount <- nrow( automatedAuthorsNetworkData )
automatedAuthorsMeanDegree <- mean( automatedAuthorsNetworkData$degree )
automatedAuthorsMaxDegree <- max( automatedAuthorsNetworkData$degree )
automatedAuthorsMeanTieWeightGE0 <- mean( automatedAuthorsNetworkData$meanTieWeightGE0 )
automatedAuthorsMeanTieWeightGE1 <- mean( automatedAuthorsNetworkData$meanTieWeightGE1 )
automatedAuthorsMaxTieWeight <- max( automatedAuthorsNetworkData$maxTieWeight )
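The same six statistics are computed four times in this notebook (human/automated, all/shared). They could be factored into a small helper; `summarizeAuthors` below is a hypothetical name, not part of the original script, and it assumes the data frame has the `degree`, `meanTieWeightGE0`, `meanTieWeightGE1`, and `maxTieWeight` columns used above:

```r
# compute the per-subset summary statistics used throughout this notebook
summarizeAuthors <- function( df ) {
    list(
        count = nrow( df ),
        meanDegree = mean( df$degree ),
        maxDegree = max( df$degree ),
        meanTieWeightGE0 = mean( df$meanTieWeightGE0 ),
        meanTieWeightGE1 = mean( df$meanTieWeightGE1 ),
        maxTieWeight = max( df$maxTieWeight )
    )
}
```

Usage would look like `humanAuthorsStats <- summarizeAuthors( humanAuthorsNetworkData )`.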
#==============================================================================#
# Generate information on individual reporters who have shared sources (subset
# of all authors).
#==============================================================================#
# human - subsetting based on position of authors who had shared sources.
#humanAuthorsSharedNetworkData <- humanNetworkDataDF[ c( 3, 6, 9, 11, 12, 13, 14, 16, 21, 43, 44, 63, 169, 310 ), ]
# subsetting based on person IDs.
humanAuthorsSharedIDs <- c( 387, 2310, 394, 13, 3, 46, 23, 29, 30, 36, 425, 302, 178, 437, 1082, 443, 377, 66, 69, 161, 73, 74, 460, 591, 336, 84, 598, 599, 217, 223, 937, 1655, 332, 505 )
humanAuthorsSharedNetworkData <- humanNetworkDataDF[ humanNetworkDataDF$person_id %in% humanAuthorsSharedIDs , ]
# human - make data
humanAuthorsSharedCount <- nrow( humanAuthorsSharedNetworkData )
humanAuthorsSharedMeanDegree <- mean( humanAuthorsSharedNetworkData$degree )
humanAuthorsSharedMaxDegree <- max( humanAuthorsSharedNetworkData$degree )
humanAuthorsSharedMeanTieWeightGE0 <- mean( humanAuthorsSharedNetworkData$meanTieWeightGE0 )
humanAuthorsSharedMeanTieWeightGE1 <- mean( humanAuthorsSharedNetworkData$meanTieWeightGE1 )
humanAuthorsSharedMaxTieWeight <- max( humanAuthorsSharedNetworkData$maxTieWeight )
# automated - subsetting based on position of authors who had shared sources.
#automatedAuthorsSharedNetworkData <- automatedNetworkDataDF[ c( 3, 6, 9, 11, 12, 13, 16, 21, 44, 63, 169, 310 ), ]
# subsetting based on person IDs.
automatedAuthorsSharedIDs <- c( 387, 2310, 394, 13, 3, 46, 23, 30, 36, 425, 2614, 302, 178, 437, 1082, 443, 377, 66, 69, 161, 73, 74, 460, 591, 336, 84, 598, 599, 217, 223, 1655, 332, 505 )
automatedAuthorsSharedNetworkData <- automatedNetworkDataDF[ automatedNetworkDataDF$person_id %in% automatedAuthorsSharedIDs , ]
# automated - make data
automatedAuthorsSharedCount <- nrow( automatedAuthorsSharedNetworkData )
automatedAuthorsSharedMeanDegree <- mean( automatedAuthorsSharedNetworkData$degree )
automatedAuthorsSharedMaxDegree <- max( automatedAuthorsSharedNetworkData$degree )
automatedAuthorsSharedMeanTieWeightGE0 <- mean( automatedAuthorsSharedNetworkData$meanTieWeightGE0 )
automatedAuthorsSharedMeanTieWeightGE1 <- mean( automatedAuthorsSharedNetworkData$meanTieWeightGE1 )
automatedAuthorsSharedMaxTieWeight <- max( automatedAuthorsSharedNetworkData$maxTieWeight )
#==============================================================================#
# Do some regression to see if article or source count predict source sharing.
#==============================================================================#
#------------------------------------------------------------------------------#
# first, set up data frames (from results of running python script:
# context_text/examples/analysis/analysis-person_info.py)
#------------------------------------------------------------------------------#
# human coder (index 1), all authors.
humanIdVector <- c( 387, 2310, 2567, 394, 652, 13, 654, 3, 46, 23, 2004, 29, 30, 417, 36, 425, 2614, 302, 178, 437, 566, 1082, 443, 377, 66, 69, 161, 73, 74, 460, 482, 591, 336, 84, 598, 599, 217, 223, 736, 2018, 743, 937, 1782, 1655, 332, 505, 703, 637 )
humanSourceCountsVector <- c( 18, 2, 0, 33, 9, 36, 3, 27, 57, 31, 4, 50, 28, 4, 31, 30, 5, 31, 41, 45, 4, 13, 43, 36, 92, 43, 37, 30, 46, 3, 1, 76, 9, 64, 21, 50, 46, 18, 2, 5, 2, 7, 4, 6, 7, 18, 2, 13 )
humanSharedCountsVector <- c( 7, 2, 0, 2, 0, 1, 0, 9, 22, 12, 0, 2, 2, 0, 9, 2, 0, 3, 6, 13, 0, 5, 9, 10, 37, 19, 12, 10, 5, 1, 0, 6, 2, 19, 5, 4, 13, 9, 0, 0, 0, 7, 0, 6, 1, 1, 0, 0 )
humanArticleCountsVector <- c( 7, 1, 1, 8, 5, 17, 1, 13, 21, 15, 2, 18, 13, 4, 11, 10, 1, 12, 13, 15, 1, 8, 16, 17, 30, 15, 14, 12, 19, 4, 1, 25, 4, 27, 9, 17, 18, 6, 1, 1, 1, 1, 4, 1, 4, 8, 2, 4 )
humanAuthorsDF <- data.frame( humanIdVector, humanSourceCountsVector, humanSharedCountsVector, humanArticleCountsVector )
names( humanAuthorsDF ) <- c( "authorID", "sourceCount", "sharedCount", "articleCount" )
# human coder, only authors with shared sources.
humanSharedIdVector <- c( 387, 2310, 394, 13, 3, 46, 23, 29, 30, 36, 425, 302, 178, 437, 1082, 443, 377, 66, 69, 161, 73, 74, 460, 591, 336, 84, 598, 599, 217, 223, 937, 1655, 332, 505 )
humanSharedSourceCountsVector <- c( 18, 2, 33, 36, 27, 57, 31, 50, 28, 31, 30, 31, 41, 45, 13, 43, 36, 92, 43, 37, 30, 46, 3, 76, 9, 64, 21, 50, 46, 18, 7, 6, 7, 18 )
humanSharedSharedCountsVector <- c( 7, 2, 2, 1, 9, 22, 12, 2, 2, 9, 2, 3, 6, 13, 5, 9, 10, 37, 19, 12, 10, 5, 1, 6, 2, 19, 5, 4, 13, 9, 7, 6, 1, 1 )
humanSharedArticleCountsVector <- c( 7, 1, 8, 17, 13, 21, 15, 18, 13, 11, 10, 12, 13, 15, 8, 16, 17, 30, 15, 14, 12, 19, 4, 25, 4, 27, 9, 17, 18, 6, 1, 1, 4, 8 )
humanSharedDF <- data.frame( humanSharedIdVector, humanSharedSourceCountsVector, humanSharedSharedCountsVector, humanSharedArticleCountsVector )
names( humanSharedDF ) <- c( "authorID", "sourceCount", "sharedCount", "articleCount" )
# computer coder, all authors.
automatedIdVector <- c( 387, 2310, 2567, 394, 652, 13, 654, 3, 46, 23, 2004, 29, 30, 417, 36, 425, 2614, 302, 178, 437, 566, 1082, 443, 377, 66, 69, 161, 73, 74, 460, 482, 591, 336, 84, 598, 599, 217, 223, 736, 2018, 743, 1782, 1655, 332, 505, 703, 637 )
automatedSourceCountsVector <- c( 18, 2, 0, 27, 8, 39, 2, 29, 46, 33, 4, 50, 26, 4, 28, 31, 6, 31, 42, 49, 2, 15, 43, 34, 88, 45, 34, 28, 46, 4, 1, 72, 9, 69, 22, 46, 43, 13, 2, 5, 2, 4, 6, 7, 14, 2, 10 )
automatedSharedCountsVector <- c( 7, 2, 0, 2, 0, 1, 0, 12, 13, 11, 0, 0, 2, 0, 7, 3, 1, 4, 8, 10, 0, 7, 8, 9, 35, 19, 11, 10, 4, 1, 0, 6, 1, 20, 7, 3, 11, 8, 0, 0, 0, 0, 6, 1, 1, 0, 0 )
automatedArticleCountsVector <- c( 7, 1, 1, 8, 5, 17, 1, 13, 20, 15, 2, 18, 13, 4, 11, 10, 1, 12, 13, 15, 1, 8, 16, 17, 30, 15, 14, 12, 19, 4, 1, 25, 4, 27, 9, 17, 18, 6, 1, 1, 1, 4, 1, 4, 8, 2, 4 )
automatedAuthorsDF <- data.frame( automatedIdVector, automatedSourceCountsVector, automatedSharedCountsVector, automatedArticleCountsVector )
names( automatedAuthorsDF ) <- c( "authorID", "sourceCount", "sharedCount", "articleCount" )
# computer coder, only authors with shared sources.
automatedSharedIdVector <- c( 387, 2310, 394, 13, 3, 46, 23, 30, 36, 425, 2614, 302, 178, 437, 1082, 443, 377, 66, 69, 161, 73, 74, 460, 591, 336, 84, 598, 599, 217, 223, 1655, 332, 505 )
automatedSharedSourceCountsVector <- c( 18, 2, 27, 39, 29, 46, 33, 26, 28, 31, 6, 31, 42, 49, 15, 43, 34, 88, 45, 34, 28, 46, 4, 72, 9, 69, 22, 46, 43, 13, 6, 7, 14 )
automatedSharedSharedCountsVector <- c( 7, 2, 2, 1, 12, 13, 11, 2, 7, 3, 1, 4, 8, 10, 7, 8, 9, 35, 19, 11, 10, 4, 1, 6, 1, 20, 7, 3, 11, 8, 6, 1, 1 )
automatedSharedArticleCountsVector <- c( 7, 1, 8, 17, 13, 20, 15, 13, 11, 10, 1, 12, 13, 15, 8, 16, 17, 30, 15, 14, 12, 19, 4, 25, 4, 27, 9, 17, 18, 6, 1, 4, 8 )
automatedSharedDF <- data.frame( automatedSharedIdVector, automatedSharedSourceCountsVector, automatedSharedSharedCountsVector, automatedSharedArticleCountsVector )
names( automatedSharedDF ) <- c( "authorID", "sourceCount", "sharedCount", "articleCount" )
#------------------------------------------------------------------------------#
# regression
#------------------------------------------------------------------------------#
# all human-coded authors:
humanLmResults <- lm( sharedCount ~ sourceCount + articleCount, data = humanAuthorsDF )
humanLmResults
# all computer-coded authors:
automatedLmResults <- lm( sharedCount ~ sourceCount + articleCount, data = automatedAuthorsDF )
automatedLmResults
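Printing an `lm` object shows only the coefficient estimates. `summary()` adds standard errors, t-statistics, p-values, and R-squared, which are what we'd actually need to judge whether article or source count predicts sharing. A sketch on synthetic data (the toy data frame below is illustrative, not the study data):

```r
# toy data with the same column names as the author data frames above
set.seed( 42 )
toyDF <- data.frame( sourceCount = 1:20 )
toyDF$articleCount <- toyDF$sourceCount / 3 + rnorm( 20 )
toyDF$sharedCount <- 0.4 * toyDF$sourceCount + rnorm( 20 )

# same model formula as the human/automated regressions above
toyFit <- lm( sharedCount ~ sourceCount + articleCount, data = toyDF )
summary( toyFit )
```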
#------------------------------------------------------------------------------#
# means of counts from python file
#------------------------------------------------------------------------------#
# Article Count
humanAuthorsMeanArticleCount <- mean( humanAuthorsDF$articleCount )
humanAuthorsSharedMeanArticleCount <- mean( humanSharedDF$articleCount )
automatedAuthorsMeanArticleCount <- mean( automatedAuthorsDF$articleCount )
automatedAuthorsSharedMeanArticleCount <- mean( automatedSharedDF$articleCount )
# Source Count
humanAuthorsMeanSourceCount <- mean( humanAuthorsDF$sourceCount )
humanAuthorsSharedMeanSourceCount <- mean( humanSharedDF$sourceCount )
automatedAuthorsMeanSourceCount <- mean( automatedAuthorsDF$sourceCount )
automatedAuthorsSharedMeanSourceCount <- mean( automatedSharedDF$sourceCount )
# Shared Count
humanAuthorsMeanSharedCount <- mean( humanAuthorsDF$sharedCount )
humanAuthorsSharedMeanSharedCount <- mean( humanSharedDF$sharedCount )
automatedAuthorsMeanSharedCount <- mean( automatedAuthorsDF$sharedCount )
automatedAuthorsSharedMeanSharedCount <- mean( automatedSharedDF$sharedCount )
In [8]:
#------------------------------------------------------------------------------#
# output
#------------------------------------------------------------------------------#
message( "====> HUMAN - all authors" )
message( paste( "human author count = ", humanAuthorsCount, sep = "" ) )
message( paste( "human author mean degree = ", humanAuthorsMeanDegree, sep = "" ) )
message( paste( "human author max degree = ", humanAuthorsMaxDegree, sep = "" ) )
message( paste( "human author mean tie weight GE0 = ", humanAuthorsMeanTieWeightGE0, sep = "" ) )
message( paste( "human author mean tie weight GE1 = ", humanAuthorsMeanTieWeightGE1, sep = "" ) )
message( paste( "human author max tie weight = ", humanAuthorsMaxTieWeight, sep = "" ) )
message( paste( "human author mean article count = ", humanAuthorsMeanArticleCount, sep = "" ) )
message( paste( "human author mean source count = ", humanAuthorsMeanSourceCount, sep = "" ) )
message( paste( "human author mean shared count = ", humanAuthorsMeanSharedCount, sep = "" ) )
message( "" )
message( "" )
message( "====> HUMAN - authors with shared sources" )
message( paste( "human shared count = ", humanAuthorsSharedCount, sep = "" ) )
message( paste( "human shared mean degree = ", humanAuthorsSharedMeanDegree, sep = "" ) )
message( paste( "human shared max degree = ", humanAuthorsSharedMaxDegree, sep = "" ) )
message( paste( "human shared mean tie weight GE0 = ", humanAuthorsSharedMeanTieWeightGE0, sep = "" ) )
message( paste( "human shared mean tie weight GE1 = ", humanAuthorsSharedMeanTieWeightGE1, sep = "" ) )
message( paste( "human shared max tie weight = ", humanAuthorsSharedMaxTieWeight, sep = "" ) )
message( paste( "human shared mean article count = ", humanAuthorsSharedMeanArticleCount, sep = "" ) )
message( paste( "human shared mean source count = ", humanAuthorsSharedMeanSourceCount, sep = "" ) )
message( paste( "human shared mean shared count = ", humanAuthorsSharedMeanSharedCount, sep = "" ) )
message( "regression results:" )
print( humanLmResults )
message( "" )
message( "" )
message( "====> AUTOMATED - all authors" )
message( paste( "automated author count = ", automatedAuthorsCount, sep = "" ) )
message( paste( "automated author mean degree = ", automatedAuthorsMeanDegree, sep = "" ) )
message( paste( "automated author max degree = ", automatedAuthorsMaxDegree, sep = "" ) )
message( paste( "automated author mean tie weight GE0 = ", automatedAuthorsMeanTieWeightGE0, sep = "" ) )
message( paste( "automated author mean tie weight GE1 = ", automatedAuthorsMeanTieWeightGE1, sep = "" ) )
message( paste( "automated author max tie weight = ", automatedAuthorsMaxTieWeight, sep = "" ) )
message( paste( "automated author mean article count = ", automatedAuthorsMeanArticleCount, sep = "" ) )
message( paste( "automated author mean source count = ", automatedAuthorsMeanSourceCount, sep = "" ) )
message( paste( "automated author mean shared count = ", automatedAuthorsMeanSharedCount, sep = "" ) )
message( "" )
message( "" )
message( "====> AUTOMATED - authors with shared sources" )
message( paste( "automated shared count = ", automatedAuthorsSharedCount, sep = "" ) )
message( paste( "automated shared mean degree = ", automatedAuthorsSharedMeanDegree, sep = "" ) )
message( paste( "automated shared max degree = ", automatedAuthorsSharedMaxDegree, sep = "" ) )
message( paste( "automated shared mean tie weight GE0 = ", automatedAuthorsSharedMeanTieWeightGE0, sep = "" ) )
message( paste( "automated shared mean tie weight GE1 = ", automatedAuthorsSharedMeanTieWeightGE1, sep = "" ) )
message( paste( "automated shared max tie weight = ", automatedAuthorsSharedMaxTieWeight, sep = "" ) )
message( paste( "automated shared mean article count = ", automatedAuthorsSharedMeanArticleCount, sep = "" ) )
message( paste( "automated shared mean source count = ", automatedAuthorsSharedMeanSourceCount, sep = "" ) )
message( paste( "automated shared mean shared count = ", automatedAuthorsSharedMeanSharedCount, sep = "" ) )
message( "regression results:" )
print( automatedLmResults )
message( "" )
message( "" )
Save the full workspace image, in case we need any of this later.
In [9]:
# help( save.image )
message( paste( "Output workspace to: ", output_workspace_file_name, sep = "" ) )
save.image( file = output_workspace_file_name )
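Note that `save.image()` writes every object in the global environment. If only a single object were needed later (say, one of the data frames), `saveRDS()`/`readRDS()` is a lighter alternative; a sketch using a stand-in data frame and a temp file:

```r
# save and reload one object instead of the whole workspace
tmp_path <- file.path( tempdir(), "authorsDF.rds" )
x <- data.frame( authorID = 1:3, sharedCount = c( 0, 2, 5 ) )  # stand-in
saveRDS( x, file = tmp_path )
y <- readRDS( tmp_path )
```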