2017.12.02 - work log - prelim - R - statnet - grp_month - 0/1
grp_month analysis
grp_week analysis
grp_month and grp_week using QAP
Related files:
network descriptives
network-level files
R scripts:
context_text/R/db_connect.r
context_text/R/sna/functions-sna.r
context_text/R/sna/sna-load_data.r
context_text/R/sna/igraph/*
context_text/R/sna/statnet/*
statnet/sna
sna::gden() - graph density
R scripts:
context_text/R/sna/statnet/sna-statnet-init.r
context_text/R/sna/statnet/sna-statnet-network-stats.r
context_text/R/sna/statnet/sna-qap.r
igraph
igraph::transitivity() - vector of transitivity scores for each node in a graph, plus network-level transitivity score.
R scripts:
context_text/R/sna/statnet/sna-igraph-init.r
context_text/R/sna/statnet/sna-igraph-network-stats.r
Store important directories and file names in variables:
In [1]:
# code files (in particular SNA function library, modest though it may be)
code_directory <- "/home/jonathanmorgan/work/django/research/context_analysis/R/sna"
sna_function_file_path <- paste( code_directory, "/", 'functions-sna.r', sep = "" )
# home directory
home_directory <- getwd()
home_directory <- "/home/jonathanmorgan/work/django/research/work/phd_work/methods"
# data directories
data_directory <- paste( home_directory, "/data", sep = "" )
workspace_file_name <- "statnet-grp_month-01.RData"
workspace_file_path <- paste( data_directory, "/", workspace_file_name, sep = "" )
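A note on building paths: paste() separates its arguments with a single space unless sep is given, so a path assembled without sep = "" picks up stray spaces. A minimal sketch of the difference, with file.path() as a base-R alternative. The directory below is a made-up placeholder, not the real data directory.

```r
# hypothetical directory, for illustration only
example_directory <- "/tmp/example/data"
example_file_name <- "statnet-grp_month-01.RData"

# default separator is a single space, which lands inside the path
path_default_sep <- paste( example_directory, "/", example_file_name )

# explicit empty separator gives the intended path
path_empty_sep <- paste( example_directory, "/", example_file_name, sep = "" )

# file.path() joins its pieces with "/" and needs no sep argument
path_file_path <- file.path( example_directory, example_file_name )

print( path_default_sep )
print( path_empty_sep )
print( path_file_path )
```

Either of the last two forms should be safe to hand to load() or save.image().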
In [2]:
# set working directory
setwd( data_directory )
getwd()
In [3]:
source( sna_function_file_path )
First, you need to render network data and upload it to your server.
Directions for rendering network data are in 2017.11.14-work_log-prelim-network_analysis.ipynb. You want a tab-delimited matrix that includes both the network and attributes of nodes as columns, and you want it to include a header row.
Once you render your network data files, you should place them on the server.
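To make the expected layout concrete, here is a small sketch that writes a three-node file in that shape to a temporary location and reads it back the same way the cells below do. The values, the "id" header label, and the temporary path are all made up for illustration; the real files are far larger.

```r
# made-up rows: header, then one row per node (ties first, traits last)
toy_lines <- c(
    "id\t1\t2\t3\tperson_id\tperson_type",
    "1\t0\t2\t0\t101\t2",
    "2\t2\t0\t1\t102\t3",
    "3\t0\t1\t0\t103\t3"
)

# write to a temporary file so nothing on the server is touched
toy_path <- tempfile( fileext = ".tab" )
writeLines( toy_lines, toy_path )

# same read call as used for the real data: first column becomes row names,
# and check.names = FALSE keeps the numeric column headers as-is
toy_df <- read.delim( toy_path, header = TRUE, row.names = 1, check.names = FALSE )

print( dim( toy_df ) )
print( colnames( toy_df ) )
```

The tie block is square (one column per row), and anything to its right is a node trait.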
High level data file layout:
Tie columns (one per node), followed by the node attribute columns (person_id and person_type)
Files and their location on server:
This is data from the Grand Rapids Press articles from December of 2009, coded by both humans and OpenCalais.
Files:
sourcenet_data-20171205-022551-grp_month-automated.tab
sourcenet_data-20171206-031358-grp_month-automated-week_subset.tab
sourcenet_data-20171115-043102-grp_month-human.tab
sourcenet_data-20171206-031319-grp_month-human-week_subset.tab
Location in Dropbox: Dropbox/academia/MSU/program_stuff/prelim_paper/data/network_analysis/2017.11.14/network/new_coders/grp_month
Location on server: /home/jonathanmorgan/work/django/research/work/phd_work/data/network/grp_month
If you want, you can load this file's workspace, from a previous run:
In [4]:
# assumes that data_directory was set in the first cell above; switches to it
# before loading the saved workspace.
setwd( data_directory )
load( workspace_file_name )
grp_month (gm) - automated - OpenCalais
First, we'll analyze the month of data coded by OpenCalais. Set up some variables to store where data is located:
grp_month (gm) - automated - Read data
Read in the data from the tab-delimited data file, then get it into the right data structures for use in R SNA.
In [5]:
# initialize variables
gmAutomatedDataFolder <- "/home/jonathanmorgan/work/django/research/work/phd_work/data/network/grp_month"
gmAutomatedDataFile <- "sourcenet_data-20171205-022551-grp_month-automated.tab"
gmAutomatedDataPath <- paste( gmAutomatedDataFolder, "/", gmAutomatedDataFile, sep = "" )
In [6]:
gmAutomatedDataPath
Load the data file into memory
In [7]:
# tab-delimited:
gmAutomatedDataDF <- read.delim( gmAutomatedDataPath, header = TRUE, row.names = 1, check.names = FALSE )
In [8]:
# get count of rows...
gmAutomatedRowCount <- nrow( gmAutomatedDataDF )
paste( "grp_month automated row count = ", gmAutomatedRowCount, sep = "" )
# ...and columns
gmAutomatedColumnCount <- ncol( gmAutomatedDataDF )
paste( "grp_month automated column count = ", gmAutomatedColumnCount, sep = "" )
Get just the tie rows and columns for initializing network libraries.
In [9]:
# the below syntax returns only as many columns as there are rows, so
# omitting any trait columns that lie in columns on the right side
# of the file.
gmAutomatedNetworkDF <- gmAutomatedDataDF[ , 1 : gmAutomatedRowCount ]
#str( gmAutomatedNetworkDF )
In [10]:
# convert to a matrix
gmAutomatedNetworkMatrix <- as.matrix( gmAutomatedNetworkDF )
# str( gmAutomatedNetworkMatrix )
In [11]:
# for all values greater than 1, reset their values to 1
gmAutomatedNetworkMatrix[ gmAutomatedNetworkMatrix > 1 ] = 1
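The three steps above (take the square tie block, convert to a matrix, dichotomize) can be tried on a toy data frame. This is only a sketch with made-up weights and trait values; the variable names are hypothetical.

```r
# toy weighted, symmetric tie data for three nodes, plus two trait columns
toy_df <- data.frame(
    "1" = c( 0, 2, 0 ),
    "2" = c( 2, 0, 3 ),
    "3" = c( 0, 3, 0 ),
    person_id = c( 101, 102, 103 ),
    person_type = c( 2, 3, 3 ),
    check.names = FALSE
)
rownames( toy_df ) <- c( "1", "2", "3" )

# square tie block: as many columns as there are rows
toy_row_count <- nrow( toy_df )
toy_tie_matrix <- as.matrix( toy_df[ , 1 : toy_row_count ] )

# dichotomize: any tie weight above 1 becomes 1
toy_binary_matrix <- toy_tie_matrix
toy_binary_matrix[ toy_binary_matrix > 1 ] <- 1

print( toy_binary_matrix )
```

After this step the matrix only records whether a tie exists, not how often it occurred.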
grp_month (gm) - automated - initialize statnet
First, load the statnet package, then load the automated grp_month data into a statnet object and assign attributes to nodes.
Based on context_text/R/sna/statnet/sna-statnet-init.r.
In [12]:
# make sure you've loaded the statnet library
# install.packages( "statnet" )
library( statnet )
In [13]:
# If you have a data frame of attributes (each attribute is a column, with
# attribute name the column name), you can associate those attributes
# when you create the network.
# attribute help: http://www.inside-r.org/packages/cran/network/docs/loading.attributes
# load attributes from a file:
#tab_attribute_test1 <- read.delim( "tab-test1-attribute_data.txt", header = TRUE, row.names = 1, check.names = FALSE )
# or create DataFrame by just grabbing the attribute columns
gmAutomatedNetworkAttributeDF <- gmAutomatedDataDF[ , 1168:1169 ]
# convert matrix to statnet network object instance.
gmAutomatedNetworkStatnet <- network( gmAutomatedNetworkMatrix, matrix.type = "adjacency", directed = FALSE, vertex.attr = gmAutomatedNetworkAttributeDF )
# look at information now.
gmAutomatedNetworkStatnet
# Network attributes:
# vertices = 314
# directed = FALSE
# hyper = FALSE
# loops = FALSE
# multiple = FALSE
# bipartite = FALSE
# total edges= 309
# missing edges= 0
# non-missing edges= 309
#
# Vertex attribute names:
# person_type vertex.names
#
# No edge attributes
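The attribute column positions used above are specific to this data file. As a sketch of an alternative, the trait columns can be located from the row count or by name, assuming the layout described earlier (square tie block first, then person_id and person_type). The toy data and variable names here are hypothetical.

```r
# toy data frame in the same layout: three tie columns, then two traits
toy_df <- data.frame(
    "1" = c( 0, 1, 0 ),
    "2" = c( 1, 0, 1 ),
    "3" = c( 0, 1, 0 ),
    person_id = c( 101, 102, 103 ),
    person_type = c( 2, 3, 3 ),
    check.names = FALSE
)
rownames( toy_df ) <- c( "1", "2", "3" )

# everything to the right of the square tie block is treated as a trait
toy_row_count <- nrow( toy_df )
toy_attribute_columns <- ( toy_row_count + 1 ) : ncol( toy_df )
toy_attribute_df <- toy_df[ , toy_attribute_columns, drop = FALSE ]

# equivalent selection by column name
toy_attribute_by_name_df <- toy_df[ , c( "person_id", "person_type" ) ]

print( colnames( toy_attribute_df ) )
```

Selecting by name is more robust if the number of nodes changes between data files.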
In [5]:
# calais - include ties Greater than or equal to 0 (GE0)
gmAutomatedMeanTieWeightGE0Vector <- apply( gmAutomatedNetworkMatrix, 1, calculateListMean )
gmAutomatedDataDF$meanTieWeightGE0 <- gmAutomatedMeanTieWeightGE0Vector
# calais - include ties Greater than or equal to 1 (GE1)
gmAutomatedMeanTieWeightGE1Vector <- apply( gmAutomatedNetworkMatrix, 1, calculateListMean, minValueToIncludeIN = 1 )
gmAutomatedDataDF$meanTieWeightGE1 <- gmAutomatedMeanTieWeightGE1Vector
# automated - Max tie weight?
gmAutomatedMaxTieWeightVector <- apply( gmAutomatedNetworkMatrix, 1, calculateListMax )
gmAutomatedDataDF$maxTieWeight <- gmAutomatedMaxTieWeightVector
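calculateListMean() and calculateListMax() come from functions-sna.r, which is not reproduced in this log. The sketch below is a hypothetical stand-in that shows the general idea of a row-wise mean restricted to values at or above a threshold; the real functions may differ in details.

```r
# hypothetical stand-in for calculateListMean(); not the library's code
sketchListMean <- function( valuesIN, minValueToIncludeIN = 0 ) {
    # keep only values at or above the threshold
    includedValues <- valuesIN[ valuesIN >= minValueToIncludeIN ]
    if ( length( includedValues ) == 0 ) {
        return( NA_real_ )
    }
    return( mean( includedValues ) )
}

# made-up weighted tie matrix for three nodes
toy_matrix <- matrix( c( 0, 2, 0,
                         2, 0, 3,
                         0, 3, 0 ), nrow = 3, byrow = TRUE )

# per-row means, with and without zero cells, and per-row maximum
toy_mean_ge0 <- apply( toy_matrix, 1, sketchListMean )
toy_mean_ge1 <- apply( toy_matrix, 1, sketchListMean, minValueToIncludeIN = 1 )
toy_max <- apply( toy_matrix, 1, max )

print( toy_mean_ge0 )
print( toy_mean_ge1 )
print( toy_max )
```

Note that a statistic like this reflects whatever matrix it is given: on a matrix that has already been reduced to 0/1, every included tie has weight 1.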
grp_month (gm) - automated - Basic metrics
In [14]:
# our statnet network object is in gmAutomatedNetworkStatnet.
# Use the degree function in the sna package to create vector of degree values
# for each node. Make sure to pass the gmode parameter to tell it that the
# graph is not directed (gmode = "graph", instead of "digraph").
# Doc: http://www.inside-r.org/packages/cran/sna/docs/degree
#degree_vector <- degree( test1_statnet, gmode = "graph" )
# If you have other libraries loaded that also implement a degree function, you
# can also call this with package name:
gmAutomatedDegreeVector <- sna::degree( gmAutomatedNetworkStatnet, gmode = "graph" )
# output the vector
gmAutomatedDegreeVector
# want more info on the degree function? You can get to it eventually through
# the following:
#help( package = "sna" )
#??sna::degree
# what is the average (mean) degree?
gmAutomatedAvgDegree <- mean( gmAutomatedDegreeVector )
paste( "average degree = ", gmAutomatedAvgDegree, sep = "" )
# subset vector to get only those that are above mean
gmAutomatedAboveMeanVector <- gmAutomatedDegreeVector[ gmAutomatedDegreeVector > gmAutomatedAvgDegree ]
# Take the degree and associate it with each node as a node attribute.
# (assigning through %v% is a shortcut for the set.vertex.attribute command)
gmAutomatedNetworkStatnet %v% "degree" <- gmAutomatedDegreeVector
# also add degree vector to original data frame
gmAutomatedDataDF$degree <- gmAutomatedDegreeVector
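As a sanity check on what the degree vector means: for a symmetric 0/1 matrix with an empty diagonal, each node's degree is simply its row sum, and sna::degree() with gmode = "graph" should agree on such a matrix (that agreement is assumed here, not tested). A base-R sketch on a made-up four-node star:

```r
# made-up undirected star: node 2 is tied to nodes 1, 3 and 4
toy_binary_matrix <- matrix( c( 0, 1, 0, 0,
                                1, 0, 1, 1,
                                0, 1, 0, 0,
                                0, 1, 0, 0 ), nrow = 4, byrow = TRUE )

# degree of each node = number of ties in its row
toy_degree_vector <- rowSums( toy_binary_matrix )

# mean degree, and the nodes whose degree is above the mean
toy_avg_degree <- mean( toy_degree_vector )
toy_above_mean_vector <- toy_degree_vector[ toy_degree_vector > toy_avg_degree ]

print( toy_degree_vector )
print( toy_avg_degree )
print( toy_above_mean_vector )
```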
In [15]:
# average author degree (person types 2 and 4)
gmAutomatedAverageAuthorDegree2And4 <- calcAuthorMeanDegree( dataFrameIN = gmAutomatedDataDF, includeBothIN = TRUE )
paste( "average author degree (2 and 4) = ", gmAutomatedAverageAuthorDegree2And4, sep = "" )
# average author degree (person type 2 only)
gmAutomatedAverageAuthorDegreeOnly2 <- calcAuthorMeanDegree( dataFrameIN = gmAutomatedDataDF, includeBothIN = FALSE )
paste( "average author degree (only 2) = ", gmAutomatedAverageAuthorDegreeOnly2, sep = "" )
# average source degree (person types 3 and 4)
gmAutomatedAverageSourceDegree3And4 <- calcSourceMeanDegree( dataFrameIN = gmAutomatedDataDF, includeBothIN = TRUE )
paste( "average source degree (3 and 4) = ", gmAutomatedAverageSourceDegree3And4, sep = "" )
# average source degree (person type 3 only)
gmAutomatedAverageSourceDegreeOnly3 <- calcSourceMeanDegree( dataFrameIN = gmAutomatedDataDF, includeBothIN = FALSE )
paste( "average source degree (only 3) = ", gmAutomatedAverageSourceDegreeOnly3, sep = "" )
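calcAuthorMeanDegree() and calcSourceMeanDegree() are defined in functions-sna.r. Going by the comments above (authors are person types 2 and 4, sources are 3 and 4), the idea can be sketched as a mean of the degree column over rows with matching person_type. The helper below is a hypothetical stand-in, not the library's implementation.

```r
# hypothetical helper: mean degree over rows whose person_type is in typesIN
sketchMeanDegreeForTypes <- function( dataFrameIN, typesIN ) {
    matchingRows <- dataFrameIN[ dataFrameIN$person_type %in% typesIN, ]
    return( mean( matchingRows$degree ) )
}

# made-up nodes: one author (2), two sources (3), one who is both (4)
toy_df <- data.frame( person_type = c( 2, 3, 3, 4 ),
                      degree = c( 3, 1, 1, 5 ) )

toy_author_2_and_4 <- sketchMeanDegreeForTypes( toy_df, c( 2, 4 ) )
toy_author_only_2 <- sketchMeanDegreeForTypes( toy_df, c( 2 ) )
toy_source_3_and_4 <- sketchMeanDegreeForTypes( toy_df, c( 3, 4 ) )
toy_source_only_3 <- sketchMeanDegreeForTypes( toy_df, c( 3 ) )

print( c( toy_author_2_and_4, toy_author_only_2, toy_source_3_and_4, toy_source_only_3 ) )
```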
grp_month (gm) - automated - More metrics
Now that we have the data in a statnet object, run the code in the following for more in-depth information:
context_text/R/sna/statnet/sna-statnet-network-stats.r
In [16]:
# Links:
# - manual (PDF): http://cran.r-project.org/web/packages/sna/sna.pdf
# - good notes: http://www.shizukalab.com/toolkits/sna/node-level-calculations
# Also, be advised that statnet and igraph don't really play nice together.
# If you'll be using both, best idea is to have a workspace for each.
#==============================================================================#
# statnet
#==============================================================================#
# make sure you've loaded the statnet library (includes sna)
# install.packages( "statnet" )
#library( statnet )
#==============================================================================#
# NODE level
#==============================================================================#
# what is the standard deviation of the degrees?
gmAutomatedDegreeSd <- sd( gmAutomatedDegreeVector )
paste( "degree SD = ", gmAutomatedDegreeSd, sep = "" )
# what is the variance of the degrees?
gmAutomatedDegreeVar <- var( gmAutomatedDegreeVector )
paste( "degree variance = ", gmAutomatedDegreeVar, sep = "" )
# what is the max value among the degrees?
gmAutomatedDegreeMax <- max( gmAutomatedDegreeVector )
paste( "degree max = ", gmAutomatedDegreeMax, sep = "" )
# calculate degree frequency distribution (no plot is drawn here)
gmAutomatedDegreeFrequenciesTable <- table( gmAutomatedDegreeVector )
paste( "degree frequencies = ", gmAutomatedDegreeFrequenciesTable, sep = "" )
gmAutomatedDegreeFrequenciesTable
# node-level undirected betweenness
gmAutomatedBetweenness <- sna::betweenness( gmAutomatedNetworkStatnet, gmode = "graph", cmode = "undirected" )
#paste( "betweenness = ", gmAutomatedBetweenness, sep = "" )
# associate with each node as a node attribute.
# (assigning through %v% is a shortcut for the set.vertex.attribute command)
gmAutomatedNetworkStatnet %v% "betweenness" <- gmAutomatedBetweenness
# also add betweenness vector to original data frame
gmAutomatedDataDF$betweenness <- gmAutomatedBetweenness
#==============================================================================#
# NETWORK level
#==============================================================================#
# graph-level degree centrality
gmAutomatedDegreeCentrality <- sna::centralization( gmAutomatedNetworkStatnet, sna::degree, mode = "graph" )
paste( "degree centrality = ", gmAutomatedDegreeCentrality, sep = "" )
# graph-level betweenness centrality
gmAutomatedBetweennessCentrality <- sna::centralization( gmAutomatedNetworkStatnet, sna::betweenness, mode = "graph", cmode = "undirected" )
paste( "betweenness centrality = ", gmAutomatedBetweennessCentrality, sep = "" )
# graph-level connectedness
gmAutomatedConnectedness <- sna::connectedness( gmAutomatedNetworkStatnet )
paste( "connectedness = ", gmAutomatedConnectedness, sep = "" )
# graph-level transitivity
gmAutomatedTransitivity <- sna::gtrans( gmAutomatedNetworkStatnet, mode = "graph" )
paste( "transitivity = ", gmAutomatedTransitivity, sep = "" )
# graph-level density
gmAutomatedDensity <- sna::gden( gmAutomatedNetworkStatnet, mode = "graph" )
paste( "density = ", gmAutomatedDensity, sep = "" )
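For reference, density of an undirected graph without loops is the number of edges divided by the number of possible node pairs, n(n-1)/2. sna::gden() with mode = "graph" is expected to report the same quantity; the sketch below computes it by hand in base R on a made-up four-node star rather than calling sna.

```r
# made-up undirected star: node 2 is tied to nodes 1, 3 and 4
toy_binary_matrix <- matrix( c( 0, 1, 0, 0,
                                1, 0, 1, 1,
                                0, 1, 0, 0,
                                0, 1, 0, 0 ), nrow = 4, byrow = TRUE )

# each undirected edge appears twice in a symmetric matrix, so count
# only the cells above the diagonal
toy_edge_count <- sum( toy_binary_matrix[ upper.tri( toy_binary_matrix ) ] )

# number of distinct node pairs
toy_node_count <- nrow( toy_binary_matrix )
toy_possible_edges <- toy_node_count * ( toy_node_count - 1 ) / 2

toy_density <- toy_edge_count / toy_possible_edges
print( toy_density )
```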
grp_month (gm) - automated - create node attribute DataFrame
If you want to just work with the traits of the nodes/vertexes, you can combine the attribute vectors into a data frame.
In [17]:
#==============================================================================#
# output attributes to data frame
#==============================================================================#
# if you want to just work with the traits of the nodes/vertexes, you can
# combine the attribute vectors into a data frame.
# first, output network object to see what attributes you have
gmAutomatedNetworkStatnet
# then, combine them into a data frame.
gmAutomatedNodeAttrDF <- data.frame( id = gmAutomatedNetworkStatnet %v% "vertex.names",
person_id = gmAutomatedNetworkStatnet %v% "person_id",
person_type = gmAutomatedNetworkStatnet %v% "person_type",
degree = gmAutomatedNetworkStatnet %v% "degree",
betweenness = gmAutomatedNetworkStatnet %v% "betweenness" )
grp_month (gm) - human
Next, we'll analyze the month of data coded by human coders. Set up some variables to store where data is located:
grp_month (gm) - human - Read data
Read in the data from the tab-delimited data file, then get it into the right data structures for use in R SNA.
In [18]:
# initialize variables
gmHumanDataFolder <- "/home/jonathanmorgan/work/django/research/work/phd_work/data/network/grp_month"
gmHumanDataFile <- "sourcenet_data-20171115-043102-grp_month-human.tab"
gmHumanDataPath <- paste( gmHumanDataFolder, "/", gmHumanDataFile, sep = "" )
In [19]:
gmHumanDataPath
Load the data file into memory
In [20]:
# tab-delimited:
gmHumanDataDF <- read.delim( gmHumanDataPath, header = TRUE, row.names = 1, check.names = FALSE )
In [21]:
# get count of rows...
gmHumanRowCount <- nrow( gmHumanDataDF )
paste( "grp_month human row count = ", gmHumanRowCount, sep = "" )
# ...and columns
gmHumanColumnCount <- ncol( gmHumanDataDF )
paste( "grp_month human column count = ", gmHumanColumnCount, sep = "" )
Get just the tie rows and columns for initializing network libraries.
In [22]:
# the below syntax returns only as many columns as there are rows, so
# omitting any trait columns that lie in columns on the right side
# of the file.
gmHumanNetworkDF <- gmHumanDataDF[ , 1 : gmHumanRowCount ]
#str( gmHumanNetworkDF )
In [23]:
# convert to a matrix
gmHumanNetworkMatrix <- as.matrix( gmHumanNetworkDF )
# str( gmHumanNetworkMatrix )
In [24]:
# for all values greater than 1, reset their values to 1
gmHumanNetworkMatrix[ gmHumanNetworkMatrix > 1 ] = 1
grp_month (gm) - human - initialize statnet
First, load the statnet package, then load the human-coded grp_month data into a statnet object and assign attributes to nodes.
Based on context_text/R/sna/statnet/sna-statnet-init.r.
In [25]:
# make sure you've loaded the statnet library
# install.packages( "statnet" )
library( statnet )
In [26]:
# If you have a data frame of attributes (each attribute is a column, with
# attribute name the column name), you can associate those attributes
# when you create the network.
# attribute help: http://www.inside-r.org/packages/cran/network/docs/loading.attributes
# load attributes from a file:
#tab_attribute_test1 <- read.delim( "tab-test1-attribute_data.txt", header = TRUE, row.names = 1, check.names = FALSE )
# or create DataFrame by just grabbing the attribute columns
#gmHumanNetworkAttributeDF <- gmHumanDataDF[ , 1169:1170 ]
gmHumanNetworkAttributeDF <- gmHumanDataDF[ , 1168:1169 ]
# convert matrix to statnet network object instance.
gmHumanNetworkStatnet <- network( gmHumanNetworkMatrix, matrix.type = "adjacency", directed = FALSE, vertex.attr = gmHumanNetworkAttributeDF )
# look at information now.
gmHumanNetworkStatnet
# Network attributes:
# vertices = 314
# directed = FALSE
# hyper = FALSE
# loops = FALSE
# multiple = FALSE
# bipartite = FALSE
# total edges= 309
# missing edges= 0
# non-missing edges= 309
#
# Vertex attribute names:
# person_type vertex.names
#
# No edge attributes
In [6]:
# human - include ties Greater than or equal to 0 (GE0)
gmHumanMeanTieWeightGE0Vector <- apply( gmHumanNetworkMatrix, 1, calculateListMean )
gmHumanDataDF$meanTieWeightGE0 <- gmHumanMeanTieWeightGE0Vector
# human - include ties Greater than or equal to 1 (GE1)
gmHumanMeanTieWeightGE1Vector <- apply( gmHumanNetworkMatrix, 1, calculateListMean, minValueToIncludeIN = 1 )
gmHumanDataDF$meanTieWeightGE1 <- gmHumanMeanTieWeightGE1Vector
# human - Max tie weight?
gmHumanMaxTieWeightVector <- apply( gmHumanNetworkMatrix, 1, calculateListMax )
gmHumanDataDF$maxTieWeight <- gmHumanMaxTieWeightVector
grp_month (gm) - human - Basic metrics
In [27]:
# our statnet network object is in gmHumanNetworkStatnet.
# Use the degree function in the sna package to create vector of degree values
# for each node. Make sure to pass the gmode parameter to tell it that the
# graph is not directed (gmode = "graph", instead of "digraph").
# Doc: http://www.inside-r.org/packages/cran/sna/docs/degree
#degree_vector <- degree( test1_statnet, gmode = "graph" )
# If you have other libraries loaded that also implement a degree function, you
# can also call this with package name:
gmHumanDegreeVector <- sna::degree( gmHumanNetworkStatnet, gmode = "graph" )
# output the vector
gmHumanDegreeVector
# want more info on the degree function? You can get to it eventually through
# the following:
#help( package = "sna" )
#??sna::degree
# what is the average (mean) degree?
gmHumanAvgDegree <- mean( gmHumanDegreeVector )
paste( "average degree = ", gmHumanAvgDegree, sep = "" )
# subset vector to get only those that are above mean
gmHumanAboveMeanVector <- gmHumanDegreeVector[ gmHumanDegreeVector > gmHumanAvgDegree ]
# Take the degree and associate it with each node as a node attribute.
# (assigning through %v% is a shortcut for the set.vertex.attribute command)
gmHumanNetworkStatnet %v% "degree" <- gmHumanDegreeVector
# also add degree vector to original data frame
gmHumanDataDF$degree <- gmHumanDegreeVector
In [28]:
# average author degree (person types 2 and 4)
gmHumanAverageAuthorDegree2And4 <- calcAuthorMeanDegree( dataFrameIN = gmHumanDataDF, includeBothIN = TRUE )
paste( "average author degree (2 and 4) = ", gmHumanAverageAuthorDegree2And4, sep = "" )
# average author degree (person type 2 only)
gmHumanAverageAuthorDegreeOnly2 <- calcAuthorMeanDegree( dataFrameIN = gmHumanDataDF, includeBothIN = FALSE )
paste( "average author degree (only 2) = ", gmHumanAverageAuthorDegreeOnly2, sep = "" )
# average source degree (person types 3 and 4)
gmHumanAverageSourceDegree3And4 <- calcSourceMeanDegree( dataFrameIN = gmHumanDataDF, includeBothIN = TRUE )
paste( "average source degree (3 and 4) = ", gmHumanAverageSourceDegree3And4, sep = "" )
# average source degree (person type 3 only)
gmHumanAverageSourceDegreeOnly3 <- calcSourceMeanDegree( dataFrameIN = gmHumanDataDF, includeBothIN = FALSE )
paste( "average source degree (only 3) = ", gmHumanAverageSourceDegreeOnly3, sep = "" )
grp_month (gm) - human - More metrics
Now that we have the data in a statnet object, run the code in the following for more in-depth information:
context_text/R/sna/statnet/sna-statnet-network-stats.r
In [29]:
# Links:
# - manual (PDF): http://cran.r-project.org/web/packages/sna/sna.pdf
# - good notes: http://www.shizukalab.com/toolkits/sna/node-level-calculations
# Also, be advised that statnet and igraph don't really play nice together.
# If you'll be using both, best idea is to have a workspace for each.
#==============================================================================#
# statnet
#==============================================================================#
# make sure you've loaded the statnet library (includes sna)
# install.packages( "statnet" )
#library( statnet )
#==============================================================================#
# NODE level
#==============================================================================#
# what is the standard deviation of the degrees?
gmHumanDegreeSd <- sd( gmHumanDegreeVector )
paste( "degree SD = ", gmHumanDegreeSd, sep = "" )
# what is the variance of the degrees?
gmHumanDegreeVar <- var( gmHumanDegreeVector )
paste( "degree variance = ", gmHumanDegreeVar, sep = "" )
# what is the max value among the degrees?
gmHumanDegreeMax <- max( gmHumanDegreeVector )
paste( "degree max = ", gmHumanDegreeMax, sep = "" )
# calculate degree frequency distribution (no plot is drawn here)
gmHumanDegreeFrequenciesTable <- table( gmHumanDegreeVector )
paste( "degree frequencies = ", gmHumanDegreeFrequenciesTable, sep = "" )
gmHumanDegreeFrequenciesTable
# node-level undirected betweenness
gmHumanBetweenness <- sna::betweenness( gmHumanNetworkStatnet, gmode = "graph", cmode = "undirected" )
#paste( "betweenness = ", gmHumanBetweenness, sep = "" )
# associate with each node as a node attribute.
# (assigning through %v% is a shortcut for the set.vertex.attribute command)
gmHumanNetworkStatnet %v% "betweenness" <- gmHumanBetweenness
# also add betweenness vector to original data frame
gmHumanDataDF$betweenness <- gmHumanBetweenness
#==============================================================================#
# NETWORK level
#==============================================================================#
# graph-level degree centrality
gmHumanDegreeCentrality <- sna::centralization( gmHumanNetworkStatnet, sna::degree, mode = "graph" )
paste( "degree centrality = ", gmHumanDegreeCentrality, sep = "" )
# graph-level betweenness centrality
gmHumanBetweennessCentrality <- sna::centralization( gmHumanNetworkStatnet, sna::betweenness, mode = "graph", cmode = "undirected" )
paste( "betweenness centrality = ", gmHumanBetweennessCentrality, sep = "" )
# graph-level connectedness
gmHumanConnectedness <- sna::connectedness( gmHumanNetworkStatnet )
paste( "connectedness = ", gmHumanConnectedness, sep = "" )
# graph-level transitivity
gmHumanTransitivity <- sna::gtrans( gmHumanNetworkStatnet, mode = "graph" )
paste( "transitivity = ", gmHumanTransitivity, sep = "" )
# graph-level density
gmHumanDensity <- sna::gden( gmHumanNetworkStatnet, mode = "graph" )
paste( "density = ", gmHumanDensity, sep = "" )
grp_month (gm) - human - create node attribute DataFrame
If you want to just work with the traits of the nodes/vertexes, you can combine the attribute vectors into a data frame.
In [30]:
#==============================================================================#
# output attributes to data frame
#==============================================================================#
# if you want to just work with the traits of the nodes/vertexes, you can
# combine the attribute vectors into a data frame.
# first, output network object to see what attributes you have
gmHumanNetworkStatnet
# then, combine them into a data frame.
gmHumanNodeAttrDF <- data.frame( id = gmHumanNetworkStatnet %v% "vertex.names",
person_id = gmHumanNetworkStatnet %v% "person_id",
person_type = gmHumanNetworkStatnet %v% "person_type",
degree = gmHumanNetworkStatnet %v% "degree",
betweenness = gmHumanNetworkStatnet %v% "betweenness" )
grp_month - QAP graph correlation between automated and ground truth
Now, compare the automated and human-coded networks themselves using graph correlation in QAP.
Based on: context_text/R/sna/statnet/sna-qap.r
Note: QAP compares two networks, so will have to wait until both OpenCalais and human coding networks have been processed.
In [31]:
# link to good doc on qaptest(){sna} function: http://www.inside-r.org/packages/cran/sna/docs/qaptest
# First, need to load data - see (or just source() ) the file "sna-load_data.r".
# source( "sna-load_data.r" )
# does the following (among other things):
# Start with loading in tab-delimited files.
#humanNetworkData <- read.delim( "human-sourcenet_data-20150504-002453.tab", header = TRUE, row.names = 1, check.names = FALSE )
#calaisNetworkData <- read.delim( "puter-sourcenet_data-20150504-002507.tab", header = TRUE, row.names = 1, check.names = FALSE )
# remove the right-most column, which contains non-tie info on nodes.
#humanNetworkTies <- humanNetworkData[ , -ncol( humanNetworkData ) ]
#gmAutomatedNetworkDF <- calaisNetworkData[ , -ncol( calaisNetworkData )]
# convert each to a matrix
#gmHumanNetworkMatrix <- as.matrix( gmHumanNetworkTies )
#gmAutomatedNetworkMatrix <- as.matrix( gmAutomatedNetworkDF )
# imports
# install.packages( "sna" )
# install.packages( "statnet" )
library( "sna" )
# package up data for calling qaptest() - first make 3-dimensional array to hold
# our two matrices - this is known as a "graph set".
graphSetArray <- array( dim = c( 2, ncol( gmHumanNetworkMatrix ), nrow( gmHumanNetworkMatrix ) ) )
# then, place each matrix in one dimension of the array.
graphSetArray[ 1, , ] <- gmHumanNetworkMatrix
graphSetArray[ 2, , ] <- gmAutomatedNetworkMatrix
# first, try a graph correlation
graphCorrelation <- sna::gcor( gmHumanNetworkMatrix, gmAutomatedNetworkMatrix )
paste( "graph correlation = ", graphCorrelation, sep = "" )
# try a qaptest...
qapGcorResult <- sna::qaptest( graphSetArray, sna::gcor, g1 = 1, g2 = 2 )
summary( qapGcorResult )
plot( qapGcorResult )
# graph covariance...
graphCovariance <- sna::gcov( gmHumanNetworkMatrix, gmAutomatedNetworkMatrix )
graphCovariance
paste( "graph covariance = ", graphCovariance, sep = "" )
# try a qaptest...
qapGcovResult <- sna::qaptest( graphSetArray, sna::gcov, g1 = 1, g2 = 2 )
summary( qapGcovResult )
plot( qapGcovResult )
# Hamming Distance
graphHammingDist <- sna::hdist( gmHumanNetworkMatrix, gmAutomatedNetworkMatrix )
paste( "graph hamming distance = ", graphHammingDist, sep = "" )
# try a qaptest...
qapHdistResult <- sna::qaptest( graphSetArray, sna::hdist, g1 = 1, g2 = 2 )
summary( qapHdistResult )
plot( qapHdistResult )
# graph structural correlation?
#graphStructCorrelation <- gscor( gmHumanNetworkMatrix, gmAutomatedNetworkMatrix )
#graphStructCorrelation
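To show what the QAP test is doing conceptually, here is a simplified base-R sketch: compute a correlation between the off-diagonal cells of two matrices, then repeatedly relabel the nodes of one matrix (same permutation for rows and columns) to build a reference distribution. This is an illustration of the permutation idea only. sna::qaptest() and sna::gcor() have their own implementations and options, and the toy networks, helper name, seed, and permutation count below are all made up.

```r
set.seed( 20171202 )   # arbitrary seed so the sketch is repeatable

# two made-up symmetric 0/1 networks on the same five nodes
toy_human <- matrix( 0, nrow = 5, ncol = 5 )
toy_human[ 1, 2 ] <- 1
toy_human[ 2, 3 ] <- 1
toy_human[ 3, 4 ] <- 1
toy_human[ 4, 5 ] <- 1
toy_human <- toy_human + t( toy_human )

# automated version: drops the 4-5 tie and adds a 1-3 tie
toy_automated <- toy_human
toy_automated[ 4, 5 ] <- 0
toy_automated[ 5, 4 ] <- 0
toy_automated[ 1, 3 ] <- 1
toy_automated[ 3, 1 ] <- 1

# hypothetical helper: correlation over off-diagonal cells only
offDiagonalCor <- function( matrixA, matrixB ) {
    mask <- row( matrixA ) != col( matrixA )
    return( cor( matrixA[ mask ], matrixB[ mask ] ) )
}

observed_cor <- offDiagonalCor( toy_human, toy_automated )

# relabel nodes of one network at random and recompute each time
permutation_count <- 500
permuted_cor <- replicate( permutation_count, {
    node_order <- sample( nrow( toy_automated ) )
    offDiagonalCor( toy_human, toy_automated[ node_order, node_order ] )
} )

# share of permutations with a correlation at least as large as observed
p_greater_equal <- mean( permuted_cor >= observed_cor )

print( observed_cor )
print( p_greater_equal )
```

Permuting rows and columns together keeps each network's structure intact while breaking the node-to-node alignment between the two, which is why the resulting distribution is a reasonable baseline for "no relationship beyond structure".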
In [32]:
output_prefix <- "grp_week"
grp_week (gw) - automated - OpenCalais
First, we'll analyze the week subset of the data coded by OpenCalais. Set up some variables to store where data is located:
grp_week (gw) - automated - Read data
Read in the data from the tab-delimited data file, then get it into the right data structures for use in R SNA.
In [33]:
# initialize variables
gwAutomatedDataFolder <- "/home/jonathanmorgan/work/django/research/work/phd_work/data/network/grp_month"
gwAutomatedDataFile <- "sourcenet_data-20171206-031358-grp_month-automated-week_subset.tab"
gwAutomatedDataPath <- paste( gwAutomatedDataFolder, "/", gwAutomatedDataFile, sep = "" )
In [34]:
gwAutomatedDataPath
Load the data file into memory
In [35]:
# tab-delimited:
gwAutomatedDataDF <- read.delim( gwAutomatedDataPath, header = TRUE, row.names = 1, check.names = FALSE )
In [36]:
# get count of rows...
gwAutomatedRowCount <- nrow( gwAutomatedDataDF )
paste( output_prefix, "automated row count =", gwAutomatedRowCount, sep = " " )
# ...and columns
gwAutomatedColumnCount <- ncol( gwAutomatedDataDF )
paste( output_prefix, "automated column count =", gwAutomatedColumnCount, sep = " " )
Get just the tie rows and columns for initializing network libraries.
In [37]:
# the below syntax returns only as many columns as there are rows, so
# omitting any trait columns that lie in columns on the right side
# of the file.
gwAutomatedNetworkDF <- gwAutomatedDataDF[ , 1 : gwAutomatedRowCount ]
#str( gwAutomatedNetworkDF )
In [38]:
# convert to a matrix
gwAutomatedNetworkMatrix <- as.matrix( gwAutomatedNetworkDF )
# str( gwAutomatedNetworkMatrix )
In [39]:
# for all values greater than 1, reset their values to 1
gwAutomatedNetworkMatrix[ gwAutomatedNetworkMatrix > 1 ] = 1
grp_week (gw) - automated - initialize statnet
First, load the statnet package, then load the automated grp_month week subset data into a statnet object and assign attributes to nodes.
Based on context_text/R/sna/statnet/sna-statnet-init.r.
In [40]:
# make sure you've loaded the statnet library
# install.packages( "statnet" )
library( statnet )
In [41]:
# If you have a data frame of attributes (each attribute is a column, with
# attribute name the column name), you can associate those attributes
# when you create the network.
# attribute help: http://www.inside-r.org/packages/cran/network/docs/loading.attributes
# load attributes from a file:
#tab_attribute_test1 <- read.delim( "tab-test1-attribute_data.txt", header = TRUE, row.names = 1, check.names = FALSE )
# or create DataFrame by just grabbing the attribute columns
gwAutomatedNetworkAttributeDF <- gwAutomatedDataDF[ , 1168:1169 ]
# convert matrix to statnet network object instance.
gwAutomatedNetworkStatnet <- network( gwAutomatedNetworkMatrix, matrix.type = "adjacency", directed = FALSE, vertex.attr = gwAutomatedNetworkAttributeDF )
# look at information now.
gwAutomatedNetworkStatnet
# Network attributes:
# vertices = 314
# directed = FALSE
# hyper = FALSE
# loops = FALSE
# multiple = FALSE
# bipartite = FALSE
# total edges= 309
# missing edges= 0
# non-missing edges= 309
#
# Vertex attribute names:
# person_type vertex.names
#
# No edge attributes
In [7]:
# calais - include ties Greater than or equal to 0 (GE0)
gwAutomatedMeanTieWeightGE0Vector <- apply( gwAutomatedNetworkMatrix, 1, calculateListMean )
gwAutomatedDataDF$meanTieWeightGE0 <- gwAutomatedMeanTieWeightGE0Vector
# calais - include ties Greater than or equal to 1 (GE1)
gwAutomatedMeanTieWeightGE1Vector <- apply( gwAutomatedNetworkMatrix, 1, calculateListMean, minValueToIncludeIN = 1 )
gwAutomatedDataDF$meanTieWeightGE1 <- gwAutomatedMeanTieWeightGE1Vector
# automated - Max tie weight?
gwAutomatedMaxTieWeightVector <- apply( gwAutomatedNetworkMatrix, 1, calculateListMax )
gwAutomatedDataDF$maxTieWeight <- gwAutomatedMaxTieWeightVector
grp_week (gw) - automated - Basic metrics
In [42]:
# our statnet network object is in gwAutomatedNetworkStatnet.
# Use the degree function in the sna package to create vector of degree values
# for each node. Make sure to pass the gmode parameter to tell it that the
# graph is not directed (gmode = "graph", instead of "digraph").
# Doc: http://www.inside-r.org/packages/cran/sna/docs/degree
#degree_vector <- degree( test1_statnet, gmode = "graph" )
# If you have other libraries loaded that also implement a degree function, you
# can also call this with package name:
gwAutomatedDegreeVector <- sna::degree( gwAutomatedNetworkStatnet, gmode = "graph" )
# output the vector
gwAutomatedDegreeVector
# want more info on the degree function? You can get to it eventually through
# the following:
#help( package = "sna" )
#??sna::degree
# what is the average (mean) degree?
gwAutomatedAvgDegree <- mean( gwAutomatedDegreeVector )
paste( output_prefix, "average degree =", gwAutomatedAvgDegree, sep = " " )
# subset vector to get only those that are above mean
gwAutomatedAboveMeanVector <- gwAutomatedDegreeVector[ gwAutomatedDegreeVector > gwAutomatedAvgDegree ]
# Take the degree and associate it with each node as a node attribute.
# (assigning through %v% is a shortcut for the set.vertex.attribute command)
gwAutomatedNetworkStatnet %v% "degree" <- gwAutomatedDegreeVector
# also add degree vector to original data frame
gwAutomatedDataDF$degree <- gwAutomatedDegreeVector
In [43]:
# average author degree (person types 2 and 4)
gwAutomatedAverageAuthorDegree2And4 <- calcAuthorMeanDegree( dataFrameIN = gwAutomatedDataDF, includeBothIN = TRUE )
paste( output_prefix, "average author degree (2 and 4) =", gwAutomatedAverageAuthorDegree2And4, sep = " " )
# average author degree (person type 2 only)
gwAutomatedAverageAuthorDegreeOnly2 <- calcAuthorMeanDegree( dataFrameIN = gwAutomatedDataDF, includeBothIN = FALSE )
paste( output_prefix, "average author degree (only 2) =", gwAutomatedAverageAuthorDegreeOnly2, sep = " " )
# average source degree (person types 3 and 4)
gwAutomatedAverageSourceDegree3And4 <- calcSourceMeanDegree( dataFrameIN = gwAutomatedDataDF, includeBothIN = TRUE )
paste( output_prefix, "average source degree (3 and 4) =", gwAutomatedAverageSourceDegree3And4, sep = " " )
# average source degree (person type 3 only)
gwAutomatedAverageSourceDegreeOnly3 <- calcSourceMeanDegree( dataFrameIN = gwAutomatedDataDF, includeBothIN = FALSE )
paste( output_prefix, "average source degree (only 3) =", gwAutomatedAverageSourceDegreeOnly3, sep = " " )
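The two helpers above live in functions-sna.r and are not shown in this log. As a rough sketch of what they appear to do, the following assumes `person_type` holds numeric codes where 2 = author, 3 = source and 4 = both, and that the data frame has a `degree` column. The function name `calcTypeMeanDegreeSketch` and its `typeIN` parameter are hypothetical, and the real helpers may differ.

```r
# Hypothetical stand-in for calcAuthorMeanDegree() / calcSourceMeanDegree().
# Assumes numeric person_type codes: 2 = author, 3 = source, 4 = both.
calcTypeMeanDegreeSketch <- function( dataFrameIN, typeIN, includeBothIN = TRUE ) {

    # which person types count toward the mean?
    typesToInclude <- if ( includeBothIN ) c( typeIN, 4 ) else typeIN

    # mean degree over the matching rows only
    matchingRows <- dataFrameIN$person_type %in% typesToInclude
    mean( dataFrameIN$degree[ matchingRows ] )
}

# toy data (illustrative only)
toyDataDF <- data.frame( person_type = c( 2, 2, 3, 3, 4 ),
                         degree      = c( 1, 3, 2, 4, 6 ) )

toyAuthorMean2And4 <- calcTypeMeanDegreeSketch( toyDataDF, typeIN = 2, includeBothIN = TRUE )
toyAuthorMeanOnly2 <- calcTypeMeanDegreeSketch( toyDataDF, typeIN = 2, includeBothIN = FALSE )
toySourceMean3And4 <- calcTypeMeanDegreeSketch( toyDataDF, typeIN = 3, includeBothIN = TRUE )
```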
grp_week
(gw) - automated - More metrics
Now that we have the data in a statnet object, run the code in the following file for more in-depth information:
context_text/R/sna/statnet/sna-statnet-network-stats.r
In [44]:
# Links:
# - manual (PDF): http://cran.r-project.org/web/packages/sna/sna.pdf
# - good notes: http://www.shizukalab.com/toolkits/sna/node-level-calculations
# Also, be advised that statnet and igraph don't really play nice together.
# If you'll be using both, best idea is to have a workspace for each.
#==============================================================================#
# statnet
#==============================================================================#
# make sure you've loaded the statnet library (includes sna)
# install.packages( "statnet" )
#library( statnet )
#==============================================================================#
# NODE level
#==============================================================================#
# what is the standard deviation of the degrees?
gwAutomatedDegreeSd <- sd( gwAutomatedDegreeVector )
paste( output_prefix, "degree SD =", gwAutomatedDegreeSd, sep = " " )
# what is the variance of the degrees?
gwAutomatedDegreeVar <- var( gwAutomatedDegreeVector )
paste( output_prefix, "degree variance =", gwAutomatedDegreeVar, sep = " " )
# what is the max value among the degrees?
gwAutomatedDegreeMax <- max( gwAutomatedDegreeVector )
paste( output_prefix, "degree max =", gwAutomatedDegreeMax, sep = " " )
# calculate the degree frequency distribution (degree value -> number of nodes)
gwAutomatedDegreeFrequenciesTable <- table( gwAutomatedDegreeVector )
paste( output_prefix, "degree frequencies =", gwAutomatedDegreeFrequenciesTable, sep = " " )
gwAutomatedDegreeFrequenciesTable
# node-level undirected betweenness
gwAutomatedBetweenness <- sna::betweenness( gwAutomatedNetworkStatnet, gmode = "graph", cmode = "undirected" )
#paste( "betweenness = ", gwAutomatedBetweenness, sep = "" )
# associate with each node as a node attribute.
# (%v% <- is a shortcut for the set.vertex.attribute command)
gwAutomatedNetworkStatnet %v% "betweenness" <- gwAutomatedBetweenness
# also add betweenness vector to original data frame
gwAutomatedDataDF$betweenness <- gwAutomatedBetweenness
#==============================================================================#
# NETWORK level
#==============================================================================#
# graph-level degree centralization
gwAutomatedDegreeCentrality <- sna::centralization( gwAutomatedNetworkStatnet, sna::degree, mode = "graph" )
paste( output_prefix, "degree centrality =", gwAutomatedDegreeCentrality, sep = " " )
# graph-level betweenness centralization
gwAutomatedBetweennessCentrality <- sna::centralization( gwAutomatedNetworkStatnet, sna::betweenness, mode = "graph", cmode = "undirected" )
paste( output_prefix, "betweenness centrality =", gwAutomatedBetweennessCentrality, sep = " " )
# graph-level connectedness
gwAutomatedConnectedness <- sna::connectedness( gwAutomatedNetworkStatnet )
paste( output_prefix, "connectedness =", gwAutomatedConnectedness, sep = " " )
# graph-level transitivity
gwAutomatedTransitivity <- sna::gtrans( gwAutomatedNetworkStatnet, mode = "graph" )
paste( output_prefix, "transitivity =", gwAutomatedTransitivity, sep = " " )
# graph-level density
gwAutomatedDensity <- sna::gden( gwAutomatedNetworkStatnet, mode = "graph" )
paste( output_prefix, "density =", gwAutomatedDensity, sep = " " )
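For reference, two of the network-level numbers above can be worked by hand. This is a minimal base-R sketch on a made-up five-node star network (illustrative data, not the grp_week network), using the standard definitions of undirected density and Freeman degree centralization. It is meant as a sanity check on what the values mean, not as a replacement for the sna calls.

```r
# Toy star network: node 1 tied to nodes 2..5 (illustrative data only).
nodeCount <- 5
starMatrix <- matrix( 0, nrow = nodeCount, ncol = nodeCount )
starMatrix[ 1, 2 : nodeCount ] <- 1
starMatrix[ 2 : nodeCount, 1 ] <- 1

# density: observed ties / possible ties in an undirected graph
edgeCount <- sum( starMatrix[ upper.tri( starMatrix ) ] )
starDensity <- edgeCount / ( nodeCount * ( nodeCount - 1 ) / 2 )

# Freeman degree centralization: summed gap to the max degree, divided by
# the largest possible gap, ( n - 1 ) * ( n - 2 ), which a star attains.
starDegreeVector <- rowSums( starMatrix )
starCentralization <- sum( max( starDegreeVector ) - starDegreeVector ) /
    ( ( nodeCount - 1 ) * ( nodeCount - 2 ) )
```

The star has 4 of 10 possible ties (density 0.4) and is maximally centralized (centralization 1), so a real network with low density and centralization well below 1 has neither many ties nor a single dominant hub.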
grp_week
(gw) - automated - create node attribute DataFrame
If you want to work with just the traits of the nodes/vertexes, you can combine the attribute vectors into a data frame.
In [45]:
#==============================================================================#
# output attributes to data frame
#==============================================================================#
# if you want to just work with the traits of the nodes/vertexes, you can
# combine the attribute vectors into a data frame.
# first, output network object to see what attributes you have
gwAutomatedNetworkStatnet
# then, combine them into a data frame.
gwAutomatedNodeAttrDF <- data.frame( id = gwAutomatedNetworkStatnet %v% "vertex.names",
person_id = gwAutomatedNetworkStatnet %v% "person_id",
person_type = gwAutomatedNetworkStatnet %v% "person_type",
degree = gwAutomatedNetworkStatnet %v% "degree",
betweenness = gwAutomatedNetworkStatnet %v% "betweenness" )
grp_week
(gw) - human
Next, we'll analyze the same week from the month of data coded by human coders.
grp_week
(gw) - human - Read data
Set up some variables to store where the data is located, read in the data from the tab-delimited data file, then get it into the right data structures for use in R SNA.
In [46]:
# initialize variables
gwHumanDataFolder <- "/home/jonathanmorgan/work/django/research/work/phd_work/data/network/grp_month"
gwHumanDataFile <- "sourcenet_data-20171206-031319-grp_month-human-week_subset.tab"
gwHumanDataPath <- paste( gwHumanDataFolder, "/", gwHumanDataFile, sep = "" )
In [47]:
gwHumanDataPath
Load the data file into memory
In [48]:
# tab-delimited:
gwHumanDataDF <- read.delim( gwHumanDataPath, header = TRUE, row.names = 1, check.names = FALSE )
In [49]:
# get count of rows...
gwHumanRowCount <- nrow( gwHumanDataDF )
paste( output_prefix, "human row count =", gwHumanRowCount, sep = " " )
# ...and columns
gwHumanColumnCount <- ncol( gwHumanDataDF )
paste( output_prefix, "human column count =", gwHumanColumnCount, sep = " " )
Get just the tie rows and columns for initializing network libraries.
In [50]:
# the tie matrix is square, so the first nrow() columns hold the ties.
# the syntax below keeps only those columns, omitting any trait columns
# that lie on the right side of the file.
gwHumanNetworkDF <- gwHumanDataDF[ , 1 : gwHumanRowCount ]
#str( gwHumanNetworkDF )
In [51]:
# convert to a matrix
gwHumanNetworkMatrix <- as.matrix( gwHumanNetworkDF )
# str( gwHumanNetworkMatrix )
In [52]:
# for all values greater than 1, reset their values to 1 (binarize the ties)
gwHumanNetworkMatrix[ gwHumanNetworkMatrix > 1 ] <- 1
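The read, slice, convert and binarize steps above can be tried end to end on a tiny made-up file. This sketch assumes the layout described earlier in this log (square tie matrix first, then trait columns, with a header row); the file contents and the `toy*` names are illustrative only.

```r
# Toy tab-delimited contents: 3 x 3 tie matrix plus two trait columns.
# (Illustrative values only.)
toyFileText <- paste( "id\t1\t2\t3\tperson_id\tperson_type",
                      "1\t0\t2\t0\t101\t2",
                      "2\t2\t0\t1\t102\t3",
                      "3\t0\t1\t0\t103\t4",
                      sep = "\n" )

# read it the same way as the real file, but from a string
toyDataDF <- read.delim( text = toyFileText, header = TRUE, row.names = 1, check.names = FALSE )

# keep only the tie columns (as many columns as there are rows)
toyRowCount <- nrow( toyDataDF )
toyNetworkMatrix <- as.matrix( toyDataDF[ , 1 : toyRowCount ] )

# reset tie weights greater than 1 to 1
toyNetworkMatrix[ toyNetworkMatrix > 1 ] <- 1
toyNetworkMatrix
```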
grp_week
(gw) - human - initialize statnet
First, load the statnet package, then load the human-coded grp_month week of data into a statnet object and assign attributes to nodes.
Based on context_text/R/sna/statnet/sna-statnet-init.r.
In [53]:
# make sure you've loaded the statnet library
# install.packages( "statnet" )
library( statnet )
In [54]:
# If you have a data frame of attributes (each attribute is a column, with
# attribute name the column name), you can associate those attributes
# when you create the network.
# attribute help: http://www.inside-r.org/packages/cran/network/docs/loading.attributes
# load attributes from a file:
#tab_attribute_test1 <- read.delim( "tab-test1-attribute_data.txt", header = TRUE, row.names = 1, check.names = FALSE )
# or create DataFrame by just grabbing the attribute columns
#gwHumanNetworkAttributeDF <- gwHumanDataDF[ , 1169:1170 ]
gwHumanNetworkAttributeDF <- gwHumanDataDF[ , 1168:1169 ]
# convert matrix to statnet network object instance.
gwHumanNetworkStatnet <- network( gwHumanNetworkMatrix, matrix.type = "adjacency", directed = FALSE, vertex.attr = gwHumanNetworkAttributeDF )
# look at information now.
gwHumanNetworkStatnet
# Network attributes:
# vertices = 314
# directed = FALSE
# hyper = FALSE
# loops = FALSE
# multiple = FALSE
# bipartite = FALSE
# total edges= 309
# missing edges= 0
# non-missing edges= 309
#
# Vertex attribute names:
# person_type vertex.names
#
# No edge attributes
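The attribute column indexes above are hard-coded, so they have to change whenever the number of nodes changes. One possible alternative, sketched here on toy data, is to derive them from the dimensions of the data frame. This assumes every column to the right of the tie columns is a trait column, and that it runs before any derived columns (degree, betweenness, tie weights) are appended to the data frame; the `toy*` names are illustrative.

```r
# Toy data frame: 3 tie columns followed by 2 trait columns.
# (Illustrative values only.)
toyDataDF <- data.frame( "1" = c( 0, 1, 0 ),
                         "2" = c( 1, 0, 1 ),
                         "3" = c( 0, 1, 0 ),
                         person_id = c( 101, 102, 103 ),
                         person_type = c( 2, 3, 4 ),
                         check.names = FALSE )

# tie columns are 1..nrow, so traits start at nrow + 1
toyRowCount <- nrow( toyDataDF )
toyAttributeColumns <- ( toyRowCount + 1 ) : ncol( toyDataDF )
toyAttributeDF <- toyDataDF[ , toyAttributeColumns ]
names( toyAttributeDF )
```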
In [8]:
# human - include ties Greater than or equal to 0 (GE0)
gwHumanMeanTieWeightGE0Vector <- apply( gwHumanNetworkMatrix, 1, calculateListMean )
gwHumanDataDF$meanTieWeightGE0 <- gwHumanMeanTieWeightGE0Vector
# human - include ties Greater than or equal to 1 (GE1)
gwHumanMeanTieWeightGE1Vector <- apply( gwHumanNetworkMatrix, 1, calculateListMean, minValueToIncludeIN = 1 )
gwHumanDataDF$meanTieWeightGE1 <- gwHumanMeanTieWeightGE1Vector
# human - Max tie weight?
gwHumanMaxTieWeightVector <- apply( gwHumanNetworkMatrix, 1, calculateListMax )
gwHumanDataDF$maxTieWeight <- gwHumanMaxTieWeightVector
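calculateListMean and calculateListMax come from functions-sna.r and are not shown here. Going only by the comments above (GE0 = include values >= 0, GE1 = include values >= 1), a helper along the following lines would produce the same kind of per-row summaries. The function name `calculateListMeanSketch` is hypothetical and the real implementation may differ.

```r
# Hypothetical stand-in for calculateListMean(); the real helper may differ.
calculateListMeanSketch <- function( valuesIN, minValueToIncludeIN = 0 ) {

    # keep only values at or above the threshold, then average them
    valuesToInclude <- valuesIN[ valuesIN >= minValueToIncludeIN ]
    mean( valuesToInclude )
}

# toy weighted tie matrix (illustrative only)
toyWeightMatrix <- matrix( c( 0, 2, 0,
                              2, 0, 4,
                              0, 4, 0 ),
                           nrow = 3, byrow = TRUE )

# per-row mean over all cells, mean over actual ties only, and max weight
toyMeanGE0 <- apply( toyWeightMatrix, 1, calculateListMeanSketch )
toyMeanGE1 <- apply( toyWeightMatrix, 1, calculateListMeanSketch, minValueToIncludeIN = 1 )
toyMaxWeight <- apply( toyWeightMatrix, 1, max )
```

Note that under this reading, on a matrix whose values have already been reset to 0/1, every GE1 mean is 1 (or NaN for a node with no ties), so these summaries are only informative on the weighted matrix.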
grp_week
(gw) - human - Basic metrics
In [55]:
# assuming that our statnet network object is in reference test1_statnet.
# Use the degree function in the sna package to create vector of degree values
# for each node. Make sure to pass the gmode parameter to tell it that the
# graph is not directed (gmode = "graph", instead of "digraph").
# Doc: http://www.inside-r.org/packages/cran/sna/docs/degree
#degree_vector <- degree( test1_statnet, gmode = "graph" )
# If you have other libraries loaded that also implement a degree function, you
# can also call this with package name:
gwHumanDegreeVector <- sna::degree( gwHumanNetworkStatnet, gmode = "graph" )
# output the vector
gwHumanDegreeVector
# want more info on the degree function? You can get to it eventually through
# the following:
#help( package = "sna" )
#??sna::degree
# what is the average (mean) degree?
gwHumanAvgDegree <- mean( gwHumanDegreeVector )
paste( output_prefix, "average degree =", gwHumanAvgDegree, sep = " " )
# subset vector to get only those that are above mean
gwHumanAboveMeanVector <- gwHumanDegreeVector[ gwHumanDegreeVector > gwHumanAvgDegree ]
# Take the degree and associate it with each node as a node attribute.
# (%v% <- is a shortcut for the set.vertex.attribute command)
gwHumanNetworkStatnet %v% "degree" <- gwHumanDegreeVector
# also add degree vector to original data frame
gwHumanDataDF$degree <- gwHumanDegreeVector
In [56]:
# average author degree (person types 2 and 4)
gwHumanAverageAuthorDegree2And4 <- calcAuthorMeanDegree( dataFrameIN = gwHumanDataDF, includeBothIN = TRUE )
paste( output_prefix, "average author degree (2 and 4) =", gwHumanAverageAuthorDegree2And4, sep = " " )
# average author degree (person type 2 only)
gwHumanAverageAuthorDegreeOnly2 <- calcAuthorMeanDegree( dataFrameIN = gwHumanDataDF, includeBothIN = FALSE )
paste( output_prefix, "average author degree (only 2) =", gwHumanAverageAuthorDegreeOnly2, sep = " " )
# average source degree (person types 3 and 4)
gwHumanAverageSourceDegree3And4 <- calcSourceMeanDegree( dataFrameIN = gwHumanDataDF, includeBothIN = TRUE )
paste( output_prefix, "average source degree (3 and 4) =", gwHumanAverageSourceDegree3And4, sep = " " )
# average source degree (person type 3 only)
gwHumanAverageSourceDegreeOnly3 <- calcSourceMeanDegree( dataFrameIN = gwHumanDataDF, includeBothIN = FALSE )
paste( output_prefix, "average source degree (only 3) =", gwHumanAverageSourceDegreeOnly3, sep = " " )
grp_week
(gw) - human - More metrics
Now that we have the data in a statnet object, run the code in the following file for more in-depth information:
context_text/R/sna/statnet/sna-statnet-network-stats.r
In [57]:
# Links:
# - manual (PDF): http://cran.r-project.org/web/packages/sna/sna.pdf
# - good notes: http://www.shizukalab.com/toolkits/sna/node-level-calculations
# Also, be advised that statnet and igraph don't really play nice together.
# If you'll be using both, best idea is to have a workspace for each.
#==============================================================================#
# statnet
#==============================================================================#
# make sure you've loaded the statnet library (includes sna)
# install.packages( "statnet" )
#library( statnet )
#==============================================================================#
# NODE level
#==============================================================================#
# what is the standard deviation of the degrees?
gwHumanDegreeSd <- sd( gwHumanDegreeVector )
paste( output_prefix, "degree SD =", gwHumanDegreeSd, sep = " " )
# what is the variance of the degrees?
gwHumanDegreeVar <- var( gwHumanDegreeVector )
paste( output_prefix, "degree variance =", gwHumanDegreeVar, sep = " " )
# what is the max value among the degrees?
gwHumanDegreeMax <- max( gwHumanDegreeVector )
paste( output_prefix, "degree max =", gwHumanDegreeMax, sep = " " )
# calculate the degree frequency distribution (degree value -> number of nodes)
gwHumanDegreeFrequenciesTable <- table( gwHumanDegreeVector )
paste( output_prefix, "degree frequencies =", gwHumanDegreeFrequenciesTable, sep = " " )
gwHumanDegreeFrequenciesTable
# node-level undirected betweenness
gwHumanBetweenness <- sna::betweenness( gwHumanNetworkStatnet, gmode = "graph", cmode = "undirected" )
#paste( "betweenness = ", gwHumanBetweenness, sep = "" )
# associate with each node as a node attribute.
# (%v% <- is a shortcut for the set.vertex.attribute command)
gwHumanNetworkStatnet %v% "betweenness" <- gwHumanBetweenness
# also add betweenness vector to original data frame
gwHumanDataDF$betweenness <- gwHumanBetweenness
#==============================================================================#
# NETWORK level
#==============================================================================#
# graph-level degree centralization
gwHumanDegreeCentrality <- sna::centralization( gwHumanNetworkStatnet, sna::degree, mode = "graph" )
paste( output_prefix, "degree centrality =", gwHumanDegreeCentrality, sep = " " )
# graph-level betweenness centralization
gwHumanBetweennessCentrality <- sna::centralization( gwHumanNetworkStatnet, sna::betweenness, mode = "graph", cmode = "undirected" )
paste( output_prefix, "betweenness centrality =", gwHumanBetweennessCentrality, sep = " " )
# graph-level connectedness
gwHumanConnectedness <- sna::connectedness( gwHumanNetworkStatnet )
paste( output_prefix, "connectedness =", gwHumanConnectedness, sep = " " )
# graph-level transitivity
gwHumanTransitivity <- sna::gtrans( gwHumanNetworkStatnet, mode = "graph" )
paste( output_prefix, "transitivity =", gwHumanTransitivity, sep = " " )
# graph-level density
gwHumanDensity <- sna::gden( gwHumanNetworkStatnet, mode = "graph" )
paste( output_prefix, "density =", gwHumanDensity, sep = " " )
grp_week
(gw) - human - create node attribute DataFrame
If you want to work with just the traits of the nodes/vertexes, you can combine the attribute vectors into a data frame.
In [58]:
#==============================================================================#
# output attributes to data frame
#==============================================================================#
# if you want to just work with the traits of the nodes/vertexes, you can
# combine the attribute vectors into a data frame.
# first, output network object to see what attributes you have
gwHumanNetworkStatnet
# then, combine them into a data frame.
gwHumanNodeAttrDF <- data.frame( id = gwHumanNetworkStatnet %v% "vertex.names",
person_id = gwHumanNetworkStatnet %v% "person_id",
person_type = gwHumanNetworkStatnet %v% "person_type",
degree = gwHumanNetworkStatnet %v% "degree",
betweenness = gwHumanNetworkStatnet %v% "betweenness" )
grp_week
QAP graph correlation between automated and ground truth
Now, compare the automated and human-coded networks themselves using graph correlation in QAP.
Based on: context_text/R/sna/statnet/sna-qap.r
Note: QAP compares two networks, so will have to wait until both OpenCalais and human coding networks have been processed.
In [59]:
# link to good doc on qaptest(){sna} function: http://www.inside-r.org/packages/cran/sna/docs/qaptest
# First, need to load data - see (or just source() ) the file "sna-load_data.r".
# source( "sna-load_data.r" )
# does the following (among other things):
# Start with loading in tab-delimited files.
#humanNetworkData <- read.delim( "human-sourcenet_data-20150504-002453.tab", header = TRUE, row.names = 1, check.names = FALSE )
#calaisNetworkData <- read.delim( "puter-sourcenet_data-20150504-002507.tab", header = TRUE, row.names = 1, check.names = FALSE )
# remove the right-most column, which contains non-tie info on nodes.
#humanNetworkTies <- humanNetworkData[ , -ncol( humanNetworkData ) ]
#gwAutomatedNetworkDF <- calaisNetworkData[ , -ncol( calaisNetworkData )]
# convert each to a matrix
#gwHumanNetworkMatrix <- as.matrix( gwHumanNetworkTies )
#gwAutomatedNetworkMatrix <- as.matrix( gwAutomatedNetworkDF )
# imports
# install.packages( "sna" )
# install.packages( "statnet" )
library( "sna" )
# package up data for calling qaptest() - first make 3-dimensional array to hold
# our two matrices - this is known as a "graph set".
graphSetArray <- array( dim = c( 2, ncol( gwHumanNetworkMatrix ), nrow( gwHumanNetworkMatrix ) ) )
# then, place each matrix in one dimension of the array.
graphSetArray[ 1, , ] <- gwHumanNetworkMatrix
graphSetArray[ 2, , ] <- gwAutomatedNetworkMatrix
# first, try a graph correlation
graphCorrelation <- sna::gcor( gwHumanNetworkMatrix, gwAutomatedNetworkMatrix )
paste( output_prefix, "graph correlation =", graphCorrelation, sep = " " )
# try a qaptest...
qapGcorResult <- sna::qaptest( graphSetArray, sna::gcor, g1 = 1, g2 = 2 )
summary( qapGcorResult )
plot( qapGcorResult )
# graph covariance...
graphCovariance <- sna::gcov( gwHumanNetworkMatrix, gwAutomatedNetworkMatrix )
graphCovariance
paste( output_prefix, "graph covariance =", graphCovariance, sep = " " )
# try a qaptest...
qapGcovResult <- sna::qaptest( graphSetArray, sna::gcov, g1 = 1, g2 = 2 )
summary( qapGcovResult )
plot( qapGcovResult )
# Hamming Distance
graphHammingDist <- sna::hdist( gwHumanNetworkMatrix, gwAutomatedNetworkMatrix )
paste( output_prefix, "graph hamming distance =", graphHammingDist, sep = " " )
# try a qaptest...
qapHdistResult <- sna::qaptest( graphSetArray, sna::hdist, g1 = 1, g2 = 2 )
summary( qapHdistResult )
plot( qapHdistResult )
# graph structural correlation?
#graphStructCorrelation <- gscor( gwHumanNetworkMatrix, gwAutomatedNetworkMatrix )
#graphStructCorrelation
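For intuition about what the graph correlation and QAP test are doing, here is a base-R sketch of the general idea on toy data. It treats graph correlation as the Pearson correlation over the off-diagonal cells of the two adjacency matrices, and builds a null distribution by randomly relabeling the nodes of one graph (the same permutation applied to rows and columns). The function names, the `repetitions` parameter and the `pGreaterEq` field are made up for this sketch; it illustrates the technique and is not the sna implementation.

```r
# Graph correlation: Pearson correlation over off-diagonal cells.
graphCorrelationSketch <- function( matrixA, matrixB ) {
    offDiagonal <- row( matrixA ) != col( matrixA )
    cor( matrixA[ offDiagonal ], matrixB[ offDiagonal ] )
}

# QAP idea: relabel the nodes of one graph at random (same permutation on
# rows and columns), recompute the statistic, and see how often the
# permuted value is at least as large as the observed one.
qapSketch <- function( matrixA, matrixB, repetitions = 1000 ) {
    observed <- graphCorrelationSketch( matrixA, matrixB )
    nodeCount <- nrow( matrixA )
    permuted <- replicate( repetitions, {
        newOrder <- sample( nodeCount )
        graphCorrelationSketch( matrixA[ newOrder, newOrder ], matrixB )
    } )
    list( observed = observed, pGreaterEq = mean( permuted >= observed ) )
}

# toy networks (illustrative only): toyB is toyA with one tie removed
set.seed( 42 )
toyA <- matrix( c( 0, 1, 1, 0, 0,
                   1, 0, 1, 0, 0,
                   1, 1, 0, 1, 0,
                   0, 0, 1, 0, 1,
                   0, 0, 0, 1, 0 ),
                nrow = 5, byrow = TRUE )
toyB <- toyA
toyB[ 1, 2 ] <- 0
toyB[ 2, 1 ] <- 0

toyResult <- qapSketch( toyA, toyB, repetitions = 200 )
toyResult$observed
```

Because only the node labels are shuffled, each permuted graph keeps the same structure as the original, which is why QAP is preferred over an ordinary correlation test for network data, where the cells are not independent observations.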
grp_month
and grp_week
using QAP
Now, compare the month and week networks against each other, first for the automated coding and then for the human coding, to see what more time gets you.
Based on: context_text/R/sna/statnet/sna-qap.r
Note: QAP compares two networks, so will have to wait until both OpenCalais and human coding networks have been processed.
In [60]:
output_prefix <- "month-to-week automated"
In [61]:
# link to good doc on qaptest(){sna} function: http://www.inside-r.org/packages/cran/sna/docs/qaptest
# First, need to load data - see (or just source() ) the file "sna-load_data.r".
# source( "sna-load_data.r" )
# does the following (among other things):
# Start with loading in tab-delimited files.
#humanNetworkData <- read.delim( "human-sourcenet_data-20150504-002453.tab", header = TRUE, row.names = 1, check.names = FALSE )
#calaisNetworkData <- read.delim( "puter-sourcenet_data-20150504-002507.tab", header = TRUE, row.names = 1, check.names = FALSE )
# remove the right-most column, which contains non-tie info on nodes.
#humanNetworkTies <- humanNetworkData[ , -ncol( humanNetworkData ) ]
#gwAutomatedNetworkDF <- calaisNetworkData[ , -ncol( calaisNetworkData )]
# convert each to a matrix
#gwHumanNetworkMatrix <- as.matrix( gwHumanNetworkTies )
#gwAutomatedNetworkMatrix <- as.matrix( gwAutomatedNetworkDF )
# imports
# install.packages( "sna" )
# install.packages( "statnet" )
library( "sna" )
# package up data for calling qaptest() - first make 3-dimensional array to hold
# our two matrices - this is known as a "graph set".
graphSetArray <- array( dim = c( 2, ncol( gmAutomatedNetworkMatrix ), nrow( gmAutomatedNetworkMatrix ) ) )
# then, place each matrix in one dimension of the array.
graphSetArray[ 1, , ] <- gmAutomatedNetworkMatrix
graphSetArray[ 2, , ] <- gwAutomatedNetworkMatrix
# first, try a graph correlation
graphCorrelation <- sna::gcor( gmAutomatedNetworkMatrix, gwAutomatedNetworkMatrix )
paste( output_prefix, "graph correlation =", graphCorrelation, sep = " " )
# try a qaptest...
qapGcorResult <- sna::qaptest( graphSetArray, sna::gcor, g1 = 1, g2 = 2 )
summary( qapGcorResult )
plot( qapGcorResult )
# graph covariance...
graphCovariance <- sna::gcov( gmAutomatedNetworkMatrix, gwAutomatedNetworkMatrix )
graphCovariance
paste( output_prefix, "graph covariance =", graphCovariance, sep = " " )
# try a qaptest...
qapGcovResult <- sna::qaptest( graphSetArray, sna::gcov, g1 = 1, g2 = 2 )
summary( qapGcovResult )
plot( qapGcovResult )
# Hamming Distance
graphHammingDist <- sna::hdist( gmAutomatedNetworkMatrix, gwAutomatedNetworkMatrix )
paste( output_prefix, "graph hamming distance =", graphHammingDist, sep = " " )
# try a qaptest...
qapHdistResult <- sna::qaptest( graphSetArray, sna::hdist, g1 = 1, g2 = 2 )
summary( qapHdistResult )
plot( qapHdistResult )
# graph structural correlation?
#graphStructCorrelation <- gscor( gwHumanNetworkMatrix, gwAutomatedNetworkMatrix )
#graphStructCorrelation
In [62]:
output_prefix <- "month-to-week human"
In [63]:
# link to good doc on qaptest(){sna} function: http://www.inside-r.org/packages/cran/sna/docs/qaptest
# First, need to load data - see (or just source() ) the file "sna-load_data.r".
# source( "sna-load_data.r" )
# does the following (among other things):
# Start with loading in tab-delimited files.
#humanNetworkData <- read.delim( "human-sourcenet_data-20150504-002453.tab", header = TRUE, row.names = 1, check.names = FALSE )
#calaisNetworkData <- read.delim( "puter-sourcenet_data-20150504-002507.tab", header = TRUE, row.names = 1, check.names = FALSE )
# remove the right-most column, which contains non-tie info on nodes.
#humanNetworkTies <- humanNetworkData[ , -ncol( humanNetworkData ) ]
#gwHumanNetworkDF <- calaisNetworkData[ , -ncol( calaisNetworkData )]
# convert each to a matrix
#gwHumanNetworkMatrix <- as.matrix( gwHumanNetworkTies )
#gwHumanNetworkMatrix <- as.matrix( gwHumanNetworkDF )
# imports
# install.packages( "sna" )
# install.packages( "statnet" )
library( "sna" )
# package up data for calling qaptest() - first make 3-dimensional array to hold
# our two matrices - this is known as a "graph set".
graphSetArray <- array( dim = c( 2, ncol( gmHumanNetworkMatrix ), nrow( gmHumanNetworkMatrix ) ) )
# then, place each matrix in one dimension of the array.
graphSetArray[ 1, , ] <- gmHumanNetworkMatrix
graphSetArray[ 2, , ] <- gwHumanNetworkMatrix
# first, try a graph correlation
graphCorrelation <- sna::gcor( gmHumanNetworkMatrix, gwHumanNetworkMatrix )
paste( output_prefix, "graph correlation =", graphCorrelation, sep = " " )
# try a qaptest...
qapGcorResult <- sna::qaptest( graphSetArray, sna::gcor, g1 = 1, g2 = 2 )
summary( qapGcorResult )
plot( qapGcorResult )
# graph covariance...
graphCovariance <- sna::gcov( gmHumanNetworkMatrix, gwHumanNetworkMatrix )
graphCovariance
paste( output_prefix, "graph covariance =", graphCovariance, sep = " " )
# try a qaptest...
qapGcovResult <- sna::qaptest( graphSetArray, sna::gcov, g1 = 1, g2 = 2 )
summary( qapGcovResult )
plot( qapGcovResult )
# Hamming Distance
graphHammingDist <- sna::hdist( gmHumanNetworkMatrix, gwHumanNetworkMatrix )
paste( output_prefix, "graph hamming distance =", graphHammingDist, sep = " " )
# try a qaptest...
qapHdistResult <- sna::qaptest( graphSetArray, sna::hdist, g1 = 1, g2 = 2 )
summary( qapHdistResult )
plot( qapHdistResult )
# graph structural correlation?
#graphStructCorrelation <- gscor( gmHumanNetworkMatrix, gwHumanNetworkMatrix )
#graphStructCorrelation
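One thing to check when reading the Hamming distances above: the matrices here are symmetric (undirected), so a single differing tie shows up in two cells. The base-R sketch below, on toy data, shows the two ways of counting. Whether sna::hdist counts all off-diagonal cells when no mode is given is my understanding of its default rather than something verified in this log, so it is worth confirming against the sna manual.

```r
# Toy undirected networks (illustrative only): toyB lacks the 1-2 tie.
toyA <- matrix( c( 0, 1, 1,
                   1, 0, 0,
                   1, 0, 0 ),
                nrow = 3, byrow = TRUE )
toyB <- toyA
toyB[ 1, 2 ] <- 0
toyB[ 2, 1 ] <- 0

# count differing cells over every off-diagonal cell (directed-style count)
offDiagonal <- row( toyA ) != col( toyA )
hammingAllCells <- sum( toyA[ offDiagonal ] != toyB[ offDiagonal ] )

# count differing ties once each, using only the upper triangle
upperTriangle <- upper.tri( toyA )
hammingUpperTriangle <- sum( toyA[ upperTriangle ] != toyB[ upperTriangle ] )
```

In this toy case the two networks differ by one tie, which counts as 2 over all cells and 1 over the upper triangle.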
Save all the information in the current image, in case we need/want it later.
In [9]:
# help( save.image )
save.image( file = workspace_file_name )
TODO:
Human data for grp_month has one fewer vertex (1167) than automated (1168). The missing person is row 355, user ID 781 (source_3), who is in the automated data but not in the human data. QAP needs same-size matrices.
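One possible way to handle the size mismatch, sketched on toy data: restrict both matrices to the node IDs they share before building the graph set. This assumes the row names and column names of each matrix are both the node IDs, which should be checked against the real files. The function name `alignByNames` and the toy IDs are illustrative, not part of functions-sna.r.

```r
# Keep only the nodes present in both matrices, in the same order.
# Assumes rownames and colnames of each matrix are the node IDs.
alignByNames <- function( matrixA, matrixB ) {
    sharedNames <- intersect( rownames( matrixA ), rownames( matrixB ) )
    list( a = matrixA[ sharedNames, sharedNames, drop = FALSE ],
          b = matrixB[ sharedNames, sharedNames, drop = FALSE ],
          droppedFromA = setdiff( rownames( matrixA ), sharedNames ),
          droppedFromB = setdiff( rownames( matrixB ), sharedNames ) )
}

# toy matrices (illustrative only): the second one lacks node "781"
toyAutomatedNames <- c( "780", "781", "782" )
toyAutomated <- matrix( 0, nrow = 3, ncol = 3,
                        dimnames = list( toyAutomatedNames, toyAutomatedNames ) )
toyAutomated[ "780", "781" ] <- 1
toyAutomated[ "781", "780" ] <- 1

toyHumanNames <- c( "780", "782" )
toyHuman <- matrix( 0, nrow = 2, ncol = 2,
                    dimnames = list( toyHumanNames, toyHumanNames ) )

aligned <- alignByNames( toyAutomated, toyHuman )
aligned$droppedFromA
```

Dropping a node also drops its ties from the automated network, so the alternative of padding the human matrix with an all-zero row and column for the missing person may be the better fit, depending on whether that person should count as an isolate in the human coding.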