2017.12.02 - work log - prelim - R network analysis - igraph
Related files:
network descriptives
network-level
files
R scripts:
context_text/R/db_connect.r
context_text/R/sna/functions-sna.r
context_text/R/sna/sna-load_data.r
context_text/R/sna/igraph/*
context_text/R/sna/statnet/*
statnet/sna
sna::gden()
- graph density
R scripts:
context_text/R/sna/statnet/sna-statnet-init.r
context_text/R/sna/statnet/sna-statnet-network-stats.r
context_text/R/sna/statnet/sna-qap.r
igraph
igraph::transitivity()
- vector of transitivity scores for each node in a graph, plus network-level transitivity score.
R scripts:
context_text/R/sna/statnet/sna-igraph-init.r
context_text/R/sna/statnet/sna-igraph-network-stats.r
Store important directories and file names in variables:
In [1]:
getwd()
In [2]:
# code files (in particular SNA function library, modest though it may be)
code_directory <- "/home/jonathanmorgan/work/django/research/context_analysis/R/sna"
sna_function_file_path = paste( code_directory, "/", 'functions-sna.r', sep = "" )
# home directory
home_directory <- getwd()
home_directory <- "/home/jonathanmorgan/work/django/research/work/phd_work/methods"
# data directories
data_directory <- paste( home_directory, "/data", sep = "" )
workspace_file_name <- "igraph-grp_month.RData"
workspace_file_path <- paste( data_directory, "/", workspace_file_name, sep = "" )
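Note that paste() joins its arguments with a single space unless sep is given, so a path assembled without sep = "" ends up with stray spaces in it. A minimal sketch of the difference, with file.path() as one alternative. The directory below is a made-up placeholder, not a path from this project:

```r
# hypothetical directory, plus the workspace file name used above
exampleDirectory <- "/tmp/example/data"
exampleFileName <- "igraph-grp_month.RData"

# paste() with the default separator puts spaces around the "/"
pathWithSpaces <- paste( exampleDirectory, "/", exampleFileName )

# sep = "" gives the intended path
pathWithPaste <- paste( exampleDirectory, "/", exampleFileName, sep = "" )

# file.path() joins its components with "/" directly
pathWithFilePath <- file.path( exampleDirectory, exampleFileName )

message( pathWithSpaces )
message( pathWithPaste )
message( pathWithFilePath )
```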
In [21]:
# set working directory to data directory for now.
setwd( data_directory )
message( getwd() )
In [4]:
source( sna_function_file_path )
First, you need to render network data and upload it to your server.
Directions for rendering network data are in 2017.11.14-work_log-prelim-network_analysis.ipynb. You want a tab-delimited matrix that includes both the network and attributes of nodes as columns, and you want it to include a header row.
Once you render your network data files, you should place them on the server.
High level data file layout:
tab-delimited tie matrix with a header row, followed by node attribute columns (person_id and person_type)
Files and their location on server:
This is data from the Grand Rapids Press articles from December of 2009, coded by both humans and OpenCalais.
Files:
sourcenet_data-20171115-043151-grp_month-automated.tab
sourcenet_data-20171115-043246-grp_month-automated-week_subset.tab
sourcenet_data-20171115-043102-grp_month-human.tab
sourcenet_data-20171115-043404-grp_month-human-week_subset.tab
Location in Dropbox: Dropbox/academia/MSU/program_stuff/prelim_paper/data/network_analysis/2017.11.14/network/new_coders/grp_month
Location on server: /home/jonathanmorgan/work/django/research/work/phd_work/data/network/grp_month
If you want, you can load this file's workspace, from a previous run:
In [5]:
# assumes that you've already set data_directory above; makes it the
# working directory, then loads the workspace file from there.
setwd( data_directory )
load( workspace_file_name )
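For reference, a workspace file like this can be written with save.image() (everything in the global environment) or save() (selected objects only). A small round-trip sketch using a temporary file and a made-up object name, so it does not touch the real workspace file:

```r
# hypothetical object and a temporary file, for illustration only
exampleFile <- tempfile( fileext = ".RData" )
exampleDegreeVector <- c( 1, 2, 1 )

# save just this one object, then remove it from the session
save( exampleDegreeVector, file = exampleFile )
rm( exampleDegreeVector )

# load() restores the object and returns the names of what it restored
loadedNames <- load( exampleFile )
message( paste( "restored objects: ", paste( loadedNames, collapse = ", " ), sep = "" ) )
```

Checking the value returned by load() is a quick way to see what a workspace from a previous run actually contains.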
grp_month (gm) - automated - OpenCalais
First, we'll analyze the month of data coded by OpenCalais. Set up some variables to store where data is located:
In [6]:
# initialize variables
gmAutomatedDataFolder <- paste( data_directory, "/network/grp_month", sep = "" )
gmAutomatedDataFile <- "sourcenet_data-20171205-022551-grp_month-automated.tab"
gmAutomatedDataPath <- paste( gmAutomatedDataFolder, "/", gmAutomatedDataFile, sep = "" )
In [7]:
gmAutomatedDataPath
Load the data file into memory
In [8]:
# tab-delimited:
gmAutomatedDataDF <- read.delim( gmAutomatedDataPath, header = TRUE, row.names = 1, check.names = FALSE )
In [35]:
# get count of rows...
gmAutomatedRowCount <- nrow( gmAutomatedDataDF )
message( paste( "grp_month automated row count = ", gmAutomatedRowCount, sep = "" ) )
# ...and columns
gmAutomatedColumnCount <- ncol( gmAutomatedDataDF )
message( paste( "grp_month automated column count = ", gmAutomatedColumnCount, sep = "" ) )
Get just the tie rows and columns for initializing network libraries.
In [10]:
# the below syntax returns only as many columns as there are rows, so
# omitting any trait columns that lie in columns on the right side
# of the file.
gmAutomatedNetworkDF <- gmAutomatedDataDF[ , 1 : gmAutomatedRowCount ]
#str( gmAutomatedNetworkDF )
In [11]:
# convert to a matrix
gmAutomatedNetworkMatrix <- as.matrix( gmAutomatedNetworkDF )
# str( gmAutomatedNetworkMatrix )
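To make the load-and-slice steps above concrete, here is a sketch on a made-up 3-node file with the same layout: a square tie matrix first, then the person_id and person_type columns on the right. The node labels and values are invented for illustration.

```r
# made-up 3-node file: tie matrix first, then person_id and person_type
toyFileText <- paste(
    "id\t1\t2\t3\tperson_id\tperson_type",
    "1\t0\t2\t0\t101\t2",
    "2\t2\t0\t1\t102\t3",
    "3\t0\t1\t0\t103\t3",
    sep = "\n"
)

# same read.delim() arguments as used for the real file
toyDataDF <- read.delim( text = toyFileText, header = TRUE, row.names = 1, check.names = FALSE )

# keep only as many columns as there are rows, dropping the trait columns
toyRowCount <- nrow( toyDataDF )
toyNetworkDF <- toyDataDF[ , 1 : toyRowCount ]
toyNetworkMatrix <- as.matrix( toyNetworkDF )

# sanity checks before handing the matrix to a network library
toyIsSquare <- nrow( toyNetworkMatrix ) == ncol( toyNetworkMatrix )
toyIsSymmetric <- all( toyNetworkMatrix == t( toyNetworkMatrix ) )
message( paste( "square = ", toyIsSquare, "; symmetric = ", toyIsSymmetric, sep = "" ) )
```

The symmetry check is worth running on real data too, since the graph is later built as undirected.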
First, load the igraph package.
Once we get the data into an igraph object, run the code in the following for more in-depth information:
context_text/R/sna/statnet/sna-igraph-init.r
context_text/R/sna/statnet/sna-igraph-network-stats.r
In [12]:
#install.packages( "igraph" )
library( igraph )
Load our data matrix into an igraph object.
In [13]:
# load data into igraph instance.
gmAutomatedNetworkIgraph <- graph.adjacency( gmAutomatedNetworkMatrix, mode = "undirected", weighted = TRUE )
In [14]:
# add person_id (column 1168)
personIdColumnNumber <- 1168
# first, get just the data frame column with person ID:
personIdsColumn <- gmAutomatedDataDF[ , personIdColumnNumber ]
# populate list we will use to set node person_ID attribute
# Don't just do these - they don't convert to simple list/vector, contain remnants of data frame
#personIdsList <- personIdsColumn
#personIdsList <- c( personIdsColumn )
# Convert to a list of numbers.
personIdsList <- as.numeric( personIdsColumn )
# Try this if you have character attribute...
#personIdsList <- unname( unlist( personIdsColumn ) )
# set vertex/node attribute person_id
V( gmAutomatedNetworkIgraph )$person_id <- personIdsList
# OR use function:
#gmAutomatedNetworkIgraph <- set.vertex.attribute( gmAutomatedNetworkIgraph, "person_id", value = personIdsList )
# look at graph and person_id attribute values
gmAutomatedNetworkIgraph
V( gmAutomatedNetworkIgraph )$person_id
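One caveat with as.numeric(): if a column was read in as a factor (possible for text columns, especially in R versions before 4.0, where stringsAsFactors defaulted to TRUE), as.numeric() returns the internal level codes rather than the values you see printed. A sketch with made-up values:

```r
# made-up column that arrived as a factor
exampleFactorColumn <- factor( c( "10", "20", "10" ) )

# as.numeric() on a factor returns the level codes, not the labels
levelCodes <- as.numeric( exampleFactorColumn )

# going through as.character() first recovers the printed values
actualValues <- as.numeric( as.character( exampleFactorColumn ) )

message( paste( "level codes = ", paste( levelCodes, collapse = ", " ), sep = "" ) )
message( paste( "actual values = ", paste( actualValues, collapse = ", " ), sep = "" ) )
```

If the column is already integer or numeric, as.numeric() alone is fine; is.factor() tells you which case you are in.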
In [15]:
# add person_type (column 1169)
personTypeColumnNumber <- 1169
# first, get just the data frame column with person type:
personTypesColumn <- gmAutomatedDataDF[ , personTypeColumnNumber ]
# populate list we will use to set node person_type attribute
# Don't just do these - they don't convert to simple list/vector, contain remnants of data frame
#personTypesList <- personTypesColumn
#personTypesList <- c( personTypesColumn )
# Convert to a list of numbers.
personTypesList <- as.numeric( personTypesColumn )
# Try this if you have character attribute...
#personTypesList <- unname( unlist( personTypesColumn ) )
# set vertex/node attribute person_type
V( gmAutomatedNetworkIgraph )$person_type <- personTypesList
# OR use function:
#gmAutomatedNetworkIgraph <- set.vertex.attribute( gmAutomatedNetworkIgraph, "person_type", value = personTypesList )
# look at graph and person_type attribute values
gmAutomatedNetworkIgraph
V( gmAutomatedNetworkIgraph )$person_type
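The column numbers above are specific to one rendered file. Assuming the header row names the attribute columns person_id and person_type, the same values can be pulled by name instead, which keeps working if the node count changes. A sketch on a made-up 2-node data frame:

```r
# made-up data frame: two tie columns, then the two attribute columns
toyAttributeDF <- data.frame(
    "1" = c( 0, 2 ),
    "2" = c( 2, 0 ),
    person_id = c( 101, 102 ),
    person_type = c( 2, 3 ),
    check.names = FALSE
)

# find the column number from the header instead of hard-coding it
toyPersonIdColumnNumber <- which( names( toyAttributeDF ) == "person_id" )

# or skip the number entirely and pull the column by name
toyPersonIdsList <- as.numeric( toyAttributeDF[[ "person_id" ]] )
toyPersonTypesList <- as.numeric( toyAttributeDF[[ "person_type" ]] )

message( paste( "person_id is column ", toyPersonIdColumnNumber, sep = "" ) )
```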
In [16]:
# calais - include ties Greater than or equal to 0 (GE0)
gmAutomatedMeanTieWeightGE0Vector <- apply( gmAutomatedNetworkMatrix, 1, calculateListMean )
gmAutomatedDataDF$meanTieWeightGE0 <- gmAutomatedMeanTieWeightGE0Vector
# calais - include ties Greater than or equal to 1 (GE1)
gmAutomatedMeanTieWeightGE1Vector <- apply( gmAutomatedNetworkMatrix, 1, calculateListMean, minValueToIncludeIN = 1 )
gmAutomatedDataDF$meanTieWeightGE1 <- gmAutomatedMeanTieWeightGE1Vector
# automated - Max tie weight?
gmAutomatedMaxTieWeightVector <- apply( gmAutomatedNetworkMatrix, 1, calculateListMax )
gmAutomatedDataDF$maxTieWeight <- gmAutomatedMaxTieWeightVector
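calculateListMean() and calculateListMax() come from functions-sna.r and are not shown in this notebook. Going only by the comments above (GE0 = ties greater than or equal to 0, GE1 = ties greater than or equal to 1), the mean behaves roughly like the stand-in below. sketchListMean() is a hypothetical name and an assumption about the behavior, not the actual library function.

```r
# hypothetical stand-in: mean of the values at or above a minimum
sketchListMean <- function( valuesIN, minValueToIncludeIN = 0 ) {
    includedValues <- valuesIN[ valuesIN >= minValueToIncludeIN ]
    if ( length( includedValues ) == 0 ) {
        return( NA_real_ )
    }
    mean( includedValues )
}

# made-up 3-node weighted tie matrix
toyTieMatrix <- matrix( c( 0, 2, 0,
                           2, 0, 1,
                           0, 1, 0 ), nrow = 3, byrow = TRUE )

# per-row mean over all cells (GE0) and over actual ties only (GE1)
toyMeanGE0 <- apply( toyTieMatrix, 1, sketchListMean )
toyMeanGE1 <- apply( toyTieMatrix, 1, sketchListMean, minValueToIncludeIN = 1 )

# per-row maximum tie weight
toyMaxTieWeight <- apply( toyTieMatrix, 1, max )
```

The GE0 mean is diluted by all the zero cells in a sparse matrix, so GE1 is usually the more readable "average strength of the ties a node actually has".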
In [17]:
# to see count of nodes and edges, just type the object name:
gmAutomatedNetworkIgraph
# Will output something like:
#
# IGRAPH UNW- 314 309 --
# + attr: name (v/c), weight (e/n)
#
# in the first line, "UNW-" are traits of graph:
# - 1 - U = undirected ( directed would be "D" )
# - 2 - N = named or not ( "-" instead of "N" )
# - 3 - W = weighted
# - 4 - B = bipartite ( "-" = not bipartite )
# 314 is where node count goes, 309 is edge count.
# The second line gives you information about the 'attributes' associated with the graph. In this case, there are two attributes, name and weight. Next to each attribute name is a two-character construct that looks like "(v/c)". The first letter is the thing the attribute is associated with (g = graph, v = vertex or node, e = edge). The second is the type of the attribute (c = character data, n = numeric data). So, in this case:
# - name (v/c) - the name attribute is a vertex/node attribute - the "v" in "(v/c)" - where the values are character data - the "c" in "(v/c)".
# - weight (e/n) - the weight attribute is an edge attribute - the "e" in "(e/n)" - where the values are numeric data - the "n" in "(e/n)".
# - based on: http://www.shizukalab.com/toolkits/sna/sna_data
In [37]:
# try calling the degree() function on an igraph object. Returns a number vector with names.
gmAutomatedDegreeVector <- igraph::degree( gmAutomatedNetworkIgraph )
# For help with igraph::degree function:
#?igraph::degree
# calculate the mean of the degrees.
gmAutomatedAvgDegree <- mean( gmAutomatedDegreeVector )
message( paste( "grp_month automated average degree = ", gmAutomatedAvgDegree, sep = "" ) )
# append the degrees to the network as a vertex attribute.
V( gmAutomatedNetworkIgraph )$degree <- gmAutomatedDegreeVector
# also add degree vector to original data frame
gmAutomatedDataDF$degree <- gmAutomatedDegreeVector
# if you want to just work with the traits of the nodes/vertexes, you can
# combine the attribute vectors into a data frame.
# first, output igraph object to see what attributes you have
gmAutomatedNetworkIgraph
V( gmAutomatedNetworkIgraph )$degree
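Keep in mind that igraph::degree() counts the edges attached to each node and ignores the weight attribute; summed tie weights are what igraph::strength() reports. For an undirected network with no self-loops, both can be cross-checked from the adjacency matrix in base R. A sketch on a made-up 3-node matrix:

```r
# made-up 3-node weighted tie matrix (symmetric, zero diagonal)
toyTieMatrix <- matrix( c( 0, 2, 0,
                           2, 0, 1,
                           0, 1, 0 ), nrow = 3, byrow = TRUE )

# degree: number of non-zero ties per node
toyDegreeVector <- rowSums( toyTieMatrix != 0 )

# strength: sum of tie weights per node
toyStrengthVector <- rowSums( toyTieMatrix )

toyAvgDegree <- mean( toyDegreeVector )
message( paste( "toy average degree = ", toyAvgDegree, sep = "" ) )
```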
Calculate average source and author degree:
In [36]:
# average author degree (person types 2 and 4)
gmAutomatedAverageAuthorDegree2And4 <- calcAuthorMeanDegree( dataFrameIN = gmAutomatedDataDF, includeBothIN = TRUE )
message( paste( "grp_month automated average author degree (2 and 4) = ", gmAutomatedAverageAuthorDegree2And4, sep = "" ) )
# average author degree (person type 2 only)
gmAutomatedAverageAuthorDegreeOnly2 <- calcAuthorMeanDegree( dataFrameIN = gmAutomatedDataDF, includeBothIN = FALSE )
message( paste( "grp_month automated average author degree (only 2) = ", gmAutomatedAverageAuthorDegreeOnly2, sep = "" ) )
# average source degree (person types 3 and 4)
gmAutomatedAverageSourceDegree3And4 <- calcSourceMeanDegree( dataFrameIN = gmAutomatedDataDF, includeBothIN = TRUE )
message( paste( "grp_month automated average source degree (3 and 4) = ", gmAutomatedAverageSourceDegree3And4, sep = "" ) )
# average source degree (person type 3 only)
gmAutomatedAverageSourceDegreeOnly3 <- calcSourceMeanDegree( dataFrameIN = gmAutomatedDataDF, includeBothIN = FALSE )
message( paste( "grp_month automated average source degree (only 3) = ", gmAutomatedAverageSourceDegreeOnly3, sep = "" ) )
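calcAuthorMeanDegree() and calcSourceMeanDegree() also live in functions-sna.r. Based on the comments above (authors are person types 2 and 4, sources are types 3 and 4, with 4 apparently counting as both), the calculation amounts to averaging the degree column over the matching rows. sketchMeanDegreeByType() below is a hypothetical stand-in under that assumption, not the library code.

```r
# hypothetical stand-in: mean degree over rows whose person_type matches
sketchMeanDegreeByType <- function( dataFrameIN, personTypesIN ) {
    matchingRows <- dataFrameIN$person_type %in% personTypesIN
    mean( dataFrameIN$degree[ matchingRows ] )
}

# made-up attribute data: one author, two sources, one who is both
toyPeopleDF <- data.frame( person_type = c( 2, 3, 4, 3 ),
                           degree = c( 3, 1, 2, 2 ) )

toyAuthorDegree2And4 <- sketchMeanDegreeByType( toyPeopleDF, c( 2, 4 ) )
toyAuthorDegreeOnly2 <- sketchMeanDegreeByType( toyPeopleDF, 2 )
toySourceDegree3And4 <- sketchMeanDegreeByType( toyPeopleDF, c( 3, 4 ) )
toySourceDegreeOnly3 <- sketchMeanDegreeByType( toyPeopleDF, 3 )
```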
Once we get the data into an igraph object, run the code in the following for more in-depth information:
context_text/R/sna/statnet/sna-igraph-init.r
context_text/R/sna/statnet/sna-igraph-network-stats.r
In [54]:
# First, need to load SNA functions and load data into igraph network object.
# For more details on that, see the files "functions-sna.r",
# "sna-load_data.r" and "sna-igraph-init.r".
#
# assumes that working directory for igraph is context_text/R/sna/igraph
# setwd( ".." )
# source( "functions-sna.r" )
# source( "sna-load_data.r" )
# setwd( "igraph" )
# source( "sna-igraph-init.r" )
# results in (among other things):
# - humanNetworkData - data frame with human-generated network data matrix in it, including columns on the right side for any node-specific attributes.
# - calaisNetworkData - data frame with computer-generated network data matrix in it, including columns on the right side for any node-specific attributes.
# - humanNetworkTies - data frame with only human-generated network data matrix in it, no node-specific attributes.
# - calaisNetworkTies - data frame with only computer-generated network data matrix in it, no node-specific attributes.
# - humanNetworkMatrix - matrix with only human-generated network data matrix in it, no node-specific attributes.
# - calaisNetworkMatrix - matrix with only computer-generated network data matrix in it, no node-specific attributes.
# - humanNetworkIgraph - igraph network with human-coded network in it, including node-specific attributes.
# - calaisNetworkIgraph - igraph network with computer-coded network in it, including node-specific attributes.
# Links:
# - CRAN page: http://cran.r-project.org/web/packages/igraph/index.html
# - Manual (PDF): http://cran.r-project.org/web/packages/igraph/igraph.pdf
# - intro.: http://horicky.blogspot.com/2012/04/basic-graph-analytics-using-igraph.html
# - good notes: http://www.shizukalab.com/toolkits/sna/node-level-calculations
# Also, be advised that statnet and igraph don't really play nice together.
# If you'll be using both, best idea is to have a workspace for each.
#==============================================================================#
# igraph
#==============================================================================#
# Good notes:
# - http://assemblingnetwork.wordpress.com/2013/06/10/network-basics-with-r-and-igraph-part-ii-of-iii/
# make sure you've loaded the igraph library
# install.packages( "igraph" )
library( igraph )
#==============================================================================#
# NODE level
#==============================================================================#
# calculate the mean of the degrees.
gmAutomatedDegreeMean <- gmAutomatedAvgDegree
message( paste( "grp_month automated degree mean = ", gmAutomatedDegreeMean, sep = "" ) )
# what is the standard deviation of these degrees?
gmAutomatedDegreeSd <- sd( gmAutomatedDegreeVector )
message( paste( "grp_month automated degree SD = ", gmAutomatedDegreeSd, sep = "" ) )
# what is the variance of these degrees?
gmAutomatedDegreeVar <- var( gmAutomatedDegreeVector )
message( paste( "grp_month automated degree Variance = ", gmAutomatedDegreeVar, sep = "" ) )
# what is the max value among these degrees?
gmAutomatedDegreeMax <- max( gmAutomatedDegreeVector )
message( paste( "grp_month automated degree Max Value = ", gmAutomatedDegreeMax, sep = "" ) )
# calculate and plot degree distributions
gmAutomatedDegreeFrequenciesTable <- table( gmAutomatedDegreeVector )
gmAutomatedDegreeDistribution <- igraph::degree.distribution( gmAutomatedNetworkIgraph )
plot( gmAutomatedDegreeDistribution, xlab = "grp_month automated node degree" )
lines( gmAutomatedDegreeDistribution )
# subset vector to get only those that are above mean
gmAutomatedAboveMeanVector <- gmAutomatedDegreeVector[ gmAutomatedDegreeVector > gmAutomatedDegreeMean ]
# node-level transitivity
# create transitivity vectors.
gmAutomatedTransitivityVector <- igraph::transitivity( gmAutomatedNetworkIgraph, type = "local" )
# append the transitivity to the network as a vertex attribute.
V( gmAutomatedNetworkIgraph )$transitivity <- gmAutomatedTransitivityVector
# also add transitivity vector to original data frame
gmAutomatedDataDF$transitivity <- gmAutomatedTransitivityVector
# And, if you want averages of these:
gmAutomatedMeanTransitivity <- mean( gmAutomatedTransitivityVector, na.rm = TRUE )
message( paste( "grp_month automated mean transitivity = ", gmAutomatedMeanTransitivity, sep = "" ) )
#==============================================================================#
# NETWORK level
#==============================================================================#
#------------------------------------------------------------------------------#
# ==> graph-level degree centrality
#
# Returns a named list with the following components:
# res - The node-level centrality scores
# centralization - The graph level centrality index.
# theoretical_max - The maximum theoretical graph level centralization score
# for a graph with the given number of vertices, using the same parameters.
# If the normalized argument was TRUE (the default), then the result was
# divided by this number.
gmAutomatedDegreeCentralityOutput <- igraph::centralization.degree( gmAutomatedNetworkIgraph )
gmAutomatedDegreeCentrality <- gmAutomatedDegreeCentralityOutput$centralization
gmAutomatedDegreeCentralityMax <- gmAutomatedDegreeCentralityOutput$theoretical_max
message( paste( "grp_month automated degree centrality = ", gmAutomatedDegreeCentrality, " ( max = ", gmAutomatedDegreeCentralityMax, " )", sep = "" ) )
# node-level degree centrality
gmAutomatedDegreeCentralityVector <- gmAutomatedDegreeCentralityOutput$res
#message( paste( "grp_month automated betweenness = ", gmAutomatedBetweenness, sep = "" ) )
# append the degree centrality to the network as a vertex attribute.
V( gmAutomatedNetworkIgraph )$degreeCentrality <- gmAutomatedDegreeCentralityVector
# also add degree centrality vector to original data frame
gmAutomatedDataDF$degreeCentrality <- gmAutomatedDegreeCentralityVector
# And, if you want averages of these:
gmAutomatedMeanDegreeCentrality <- mean( gmAutomatedDegreeCentralityVector, na.rm = TRUE )
message( paste( "grp_month automated mean degree centrality = ", gmAutomatedMeanDegreeCentrality, sep = "" ) )
#------------------------------------------------------------------------------#
# ==> graph-level undirected betweenness
#
# Returns a named list with the following components:
# res - The node-level centrality scores
# centralization - The graph level centrality index.
# theoretical_max - The maximum theoretical graph level centralization score
# for a graph with the given number of vertices, using the same parameters.
# If the normalized argument was TRUE (the default), then the result was
# divided by this number.
gmAutomatedBetweennessCentralityOutput <- igraph::centralization.betweenness( gmAutomatedNetworkIgraph, directed = FALSE )
gmAutomatedBetweennessCentrality <- gmAutomatedBetweennessCentralityOutput$centralization
gmAutomatedBetweennessCentralityMax <- gmAutomatedBetweennessCentralityOutput$theoretical_max
message( paste( "grp_month automated betweenness centrality = ", gmAutomatedBetweennessCentrality, " ( max = ", gmAutomatedBetweennessCentralityMax, " )", sep = "" ) )
# node-level undirected betweenness
gmAutomatedBetweennessVector <- gmAutomatedBetweennessCentralityOutput$res
#message( paste( "grp_month automated betweenness = ", gmAutomatedBetweenness, sep = "" ) )
# append the betweenness to the network as a vertex attribute.
V( gmAutomatedNetworkIgraph )$betweenness <- gmAutomatedBetweennessVector
# also add betweenness vector to original data frame
gmAutomatedDataDF$betweenness <- gmAutomatedBetweennessVector
# And, if you want averages of these:
gmAutomatedMeanBetweenness <- mean( gmAutomatedBetweennessVector, na.rm = TRUE )
message( paste( "grp_month automated mean betweenness = ", gmAutomatedMeanBetweenness, sep = "" ) )
# graph-level transitivity
gmAutomatedTransitivity <- igraph::transitivity( gmAutomatedNetworkIgraph, type = "global" )
message( paste( "grp_month automated transitivity = ", gmAutomatedTransitivity, sep = "" ) )
# graph-level density
gmAutomatedDensity <- igraph::graph.density( gmAutomatedNetworkIgraph )
message( paste( "grp_month automated density = ", gmAutomatedDensity, sep = "" ) )
#==============================================================================#
# output attributes to data frame
#==============================================================================#
# if you want to just work with the traits of the nodes/vertexes, you can
# combine the attribute vectors into a data frame.
# first, output igraph object to see what attributes you have
gmAutomatedNetworkIgraph
# then, combine them into a data frame.
gmAutomatedAttributeDF <- data.frame( id = V( gmAutomatedNetworkIgraph )$name,
person_id = V( gmAutomatedNetworkIgraph )$person_id,
person_type = V( gmAutomatedNetworkIgraph )$person_type,
degree = V( gmAutomatedNetworkIgraph )$degree,
transitivity = V( gmAutomatedNetworkIgraph )$transitivity,
degreeCentrality = V( gmAutomatedNetworkIgraph )$degreeCentrality,
betweenness = V( gmAutomatedNetworkIgraph )$betweenness )
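The node-level transitivity and graph density computed in the cell above can be cross-checked by hand on a small example. The sketch below uses a made-up 4-node network (a triangle plus one pendant node) and treats ties as present/absent. Local transitivity is the share of a node's neighbor pairs that are themselves tied; it is undefined for nodes with fewer than two neighbors, which igraph reports as NaN, hence the na.rm = TRUE when averaging. sketchLocalTransitivity() is a hypothetical helper name.

```r
# made-up network: edges 1-2, 1-3, 2-3 (a triangle) and 3-4 (a pendant)
toyAdjacency <- matrix( c( 0, 1, 1, 0,
                           1, 0, 1, 0,
                           1, 1, 0, 1,
                           0, 0, 1, 0 ), nrow = 4, byrow = TRUE )

# hypothetical helper: local clustering coefficient for each node
sketchLocalTransitivity <- function( adjacencyIN ) {
    binary <- ( adjacencyIN != 0 ) * 1
    sapply( seq_len( nrow( binary ) ), function( i ) {
        neighbors <- which( binary[ i, ] == 1 )
        k <- length( neighbors )
        # undefined with fewer than two neighbors
        if ( k < 2 ) return( NaN )
        # each tie among neighbors appears twice in the sub-matrix
        tiesAmongNeighbors <- sum( binary[ neighbors, neighbors ] ) / 2
        tiesAmongNeighbors / ( k * ( k - 1 ) / 2 )
    } )
}

toyTransitivityVector <- sketchLocalTransitivity( toyAdjacency )
toyMeanTransitivity <- mean( toyTransitivityVector, na.rm = TRUE )

# density: ties present divided by ties possible in an undirected network
toyNodeCount <- nrow( toyAdjacency )
toyEdgeCount <- sum( toyAdjacency[ upper.tri( toyAdjacency ) ] != 0 )
toyDensity <- toyEdgeCount / ( toyNodeCount * ( toyNodeCount - 1 ) / 2 )
```

Because undefined nodes are dropped from the mean, a network with many degree-1 nodes can show a mean local transitivity that reflects only a small part of the graph.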
In [49]:
?igraph::centralization.betweenness
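As a rough guide to what a graph-level centralization score means: Freeman's degree centralization sums, over all nodes, the gap between the largest degree and each node's degree, then divides by the largest value that sum can take. The sketch below assumes an undirected network with no self-loops, where that maximum is (n - 1)(n - 2), reached by a star. igraph's own normalization depends on its arguments (for example whether loops are allowed), so its numbers may not match this hand calculation exactly. sketchDegreeCentralization() is a hypothetical name.

```r
# hypothetical helper: Freeman degree centralization, undirected, no loops
sketchDegreeCentralization <- function( degreesIN ) {
    nodeCount <- length( degreesIN )
    sumOfGaps <- sum( max( degreesIN ) - degreesIN )
    theoreticalMax <- ( nodeCount - 1 ) * ( nodeCount - 2 )
    sumOfGaps / theoreticalMax
}

# a 4-node star is maximally centralized
toyStarDegrees <- c( 3, 1, 1, 1 )
toyStarCentralization <- sketchDegreeCentralization( toyStarDegrees )

# a triangle with one pendant node is less centralized
toyMixedDegrees <- c( 2, 2, 3, 1 )
toyMixedCentralization <- sketchDegreeCentralization( toyMixedDegrees )

message( paste( "star = ", toyStarCentralization, "; mixed = ", toyMixedCentralization, sep = "" ) )
```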
grp_month (gm) - human
Next, we'll analyze the month of data coded by humans. Set up some variables to store where data is located:
In [24]:
# initialize variables
gmHumanDataFolder <- paste( data_directory, "/network/grp_month", sep = "" )
gmHumanDataFile <- "sourcenet_data-20171115-043102-grp_month-human.tab"
gmHumanDataPath <- paste( gmHumanDataFolder, "/", gmHumanDataFile, sep = "" )
In [25]:
gmHumanDataPath
Load the data file into memory
In [26]:
# tab-delimited:
gmHumanDataDF <- read.delim( gmHumanDataPath, header = TRUE, row.names = 1, check.names = FALSE )
In [29]:
# get count of rows...
gmHumanRowCount <- nrow( gmHumanDataDF )
message( paste( "grp_month human row count = ", gmHumanRowCount, sep = "" ) )
# ...and columns
gmHumanColumnCount <- ncol( gmHumanDataDF )
message( paste( "grp_month human column count = ", gmHumanColumnCount, sep = "" ) )
Get just the tie rows and columns for initializing network libraries.
In [30]:
# the below syntax returns only as many columns as there are rows, so
# omitting any trait columns that lie in columns on the right side
# of the file.
gmHumanNetworkDF <- gmHumanDataDF[ , 1 : gmHumanRowCount ]
#str( gmHumanNetworkDF )
In [31]:
# convert to a matrix
gmHumanNetworkMatrix <- as.matrix( gmHumanNetworkDF )
# str( gmHumanNetworkMatrix )
First, load the igraph package.
Once we get the data into an igraph object, run the code in the following for more in-depth information:
context_text/R/sna/statnet/sna-igraph-init.r
context_text/R/sna/statnet/sna-igraph-network-stats.r
In [32]:
#install.packages( "igraph" )
library( igraph )
Load our data matrix into an igraph object.
In [33]:
# load data into igraph instance.
gmHumanNetworkIgraph <- graph.adjacency( gmHumanNetworkMatrix, mode = "undirected", weighted = TRUE )
In [34]:
# add person_id (column 1168)
personIdColumnNumber <- 1168
# first, get just the data frame column with person ID:
personIdsColumn <- gmHumanDataDF[ , personIdColumnNumber ]
# populate list we will use to set node person_ID attribute
# Don't just do these - they don't convert to simple list/vector, contain remnants of data frame
#personIdsList <- personIdsColumn
#personIdsList <- c( personIdsColumn )
# Convert to a list of numbers.
personIdsList <- as.numeric( personIdsColumn )
# Try this if you have character attribute...
#personIdsList <- unname( unlist( personIdsColumn ) )
# set vertex/node attribute person_id
V( gmHumanNetworkIgraph )$person_id <- personIdsList
# OR use function:
#gmHumanNetworkIgraph <- set.vertex.attribute( gmHumanNetworkIgraph, "person_id", value = personIdsList )
# look at graph and person_id attribute values
gmHumanNetworkIgraph
V( gmHumanNetworkIgraph )$person_id
In [38]:
# add person_type (column 1169)
personTypeColumnNumber <- 1169
# first, get just the data frame column with person type:
personTypesColumn <- gmHumanDataDF[ , personTypeColumnNumber ]
# populate list we will use to set node person_type attribute
# Don't just do these - they don't convert to simple list/vector, contain remnants of data frame
#personTypesList <- personTypesColumn
#personTypesList <- c( personTypesColumn )
# Convert to a list of numbers.
personTypesList <- as.numeric( personTypesColumn )
# Try this if you have character attribute...
#personTypesList <- unname( unlist( personTypesColumn ) )
# set vertex/node attribute person_type
V( gmHumanNetworkIgraph )$person_type <- personTypesList
# OR use function:
#gmHumanNetworkIgraph <- set.vertex.attribute( gmHumanNetworkIgraph, "person_type", value = personTypesList )
# look at graph and person_type attribute values
gmHumanNetworkIgraph
V( gmHumanNetworkIgraph )$person_type
In [39]:
# human - include ties Greater than or equal to 0 (GE0)
gmHumanMeanTieWeightGE0Vector <- apply( gmHumanNetworkMatrix, 1, calculateListMean )
gmHumanDataDF$meanTieWeightGE0 <- gmHumanMeanTieWeightGE0Vector
# human - include ties Greater than or equal to 1 (GE1)
gmHumanMeanTieWeightGE1Vector <- apply( gmHumanNetworkMatrix, 1, calculateListMean, minValueToIncludeIN = 1 )
gmHumanDataDF$meanTieWeightGE1 <- gmHumanMeanTieWeightGE1Vector
# human - Max tie weight?
gmHumanMaxTieWeightVector <- apply( gmHumanNetworkMatrix, 1, calculateListMax )
gmHumanDataDF$maxTieWeight <- gmHumanMaxTieWeightVector
In [40]:
# to see count of nodes and edges, just type the object name:
gmHumanNetworkIgraph
# Will output something like:
#
# IGRAPH UNW- 314 309 --
# + attr: name (v/c), weight (e/n)
#
# in the first line, "UNW-" are traits of graph:
# - 1 - U = undirected ( directed would be "D" )
# - 2 - N = named or not ( "-" instead of "N" )
# - 3 - W = weighted
# - 4 - B = bipartite ( "-" = not bipartite )
# 314 is where node count goes, 309 is edge count.
# The second line gives you information about the 'attributes' associated with the graph. In this case, there are two attributes, name and weight. Next to each attribute name is a two-character construct that looks like "(v/c)". The first letter is the thing the attribute is associated with (g = graph, v = vertex or node, e = edge). The second is the type of the attribute (c = character data, n = numeric data). So, in this case:
# - name (v/c) - the name attribute is a vertex/node attribute - the "v" in "(v/c)" - where the values are character data - the "c" in "(v/c)".
# - weight (e/n) - the weight attribute is an edge attribute - the "e" in "(e/n)" - where the values are numeric data - the "n" in "(e/n)".
# - based on: http://www.shizukalab.com/toolkits/sna/sna_data
In [41]:
# try calling the degree() function on an igraph object. Returns a number vector with names.
gmHumanDegreeVector <- igraph::degree( gmHumanNetworkIgraph )
# For help with igraph::degree function:
#?igraph::degree
# calculate the mean of the degrees.
gmHumanAvgDegree <- mean( gmHumanDegreeVector )
message( paste( "grp_month human average degree = ", gmHumanAvgDegree, sep = "" ) )
# append the degrees to the network as a vertex attribute.
V( gmHumanNetworkIgraph )$degree <- gmHumanDegreeVector
# also add degree vector to original data frame
gmHumanDataDF$degree <- gmHumanDegreeVector
# if you want to just work with the traits of the nodes/vertexes, you can
# combine the attribute vectors into a data frame.
# first, output igraph object to see what attributes you have
gmHumanNetworkIgraph
V( gmHumanNetworkIgraph )$degree
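The degree vector can also be summarized as a relative frequency distribution in base R. The sketch below uses a made-up degree vector and counts how many nodes have each degree from 0 up to the maximum, so position 1 of the result corresponds to degree 0. igraph::degree.distribution() returns its frequencies in the same 0-based order, which matters when reading a plot of it by index.

```r
# made-up degree vector for six nodes
toyDegreeVector <- c( 1, 2, 1, 3, 0, 1 )
toyDegreeMax <- max( toyDegreeVector )

# count nodes at each degree 0..max, then convert to proportions
toyDegreeCounts <- tabulate( toyDegreeVector + 1, nbins = toyDegreeMax + 1 )
toyDegreeDistribution <- toyDegreeCounts / length( toyDegreeVector )
names( toyDegreeDistribution ) <- 0 : toyDegreeMax

# nodes whose degree is above the mean degree
toyDegreeMean <- mean( toyDegreeVector )
toyAboveMeanVector <- toyDegreeVector[ toyDegreeVector > toyDegreeMean ]

# plot against the actual degree values rather than the vector index
plot( 0 : toyDegreeMax, toyDegreeDistribution, type = "b",
      xlab = "toy node degree", ylab = "proportion of nodes" )
```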
Calculate average source and author degree:
In [43]:
# average author degree (person types 2 and 4)
gmHumanAverageAuthorDegree2And4 <- calcAuthorMeanDegree( dataFrameIN = gmHumanDataDF, includeBothIN = TRUE )
message( paste( "grp_month human average author degree (2 and 4) = ", gmHumanAverageAuthorDegree2And4, sep = "" ) )
# average author degree (person type 2 only)
gmHumanAverageAuthorDegreeOnly2 <- calcAuthorMeanDegree( dataFrameIN = gmHumanDataDF, includeBothIN = FALSE )
message( paste( "grp_month human average author degree (only 2) = ", gmHumanAverageAuthorDegreeOnly2, sep = "" ) )
# average source degree (person types 3 and 4)
gmHumanAverageSourceDegree3And4 <- calcSourceMeanDegree( dataFrameIN = gmHumanDataDF, includeBothIN = TRUE )
message( paste( "grp_month human average source degree (3 and 4) = ", gmHumanAverageSourceDegree3And4, sep = "" ) )
# average source degree (person type 3 only)
gmHumanAverageSourceDegreeOnly3 <- calcSourceMeanDegree( dataFrameIN = gmHumanDataDF, includeBothIN = FALSE )
message( paste( "grp_month human average source degree (only 3) = ", gmHumanAverageSourceDegreeOnly3, sep = "" ) )
Once we get the data into an igraph object, run the code in the following for more in-depth information:
context_text/R/sna/statnet/sna-igraph-init.r
context_text/R/sna/statnet/sna-igraph-network-stats.r
In [56]:
# First, need to load SNA functions and load data into igraph network object.
# For more details on that, see the files "functions-sna.r",
# "sna-load_data.r" and "sna-igraph-init.r".
#
# assumes that working directory for igraph is context_text/R/sna/igraph
# setwd( ".." )
# source( "functions-sna.r" )
# source( "sna-load_data.r" )
# setwd( "igraph" )
# source( "sna-igraph-init.r" )
# results in (among other things):
# - humanNetworkData - data frame with human-generated network data matrix in it, including columns on the right side for any node-specific attributes.
# - calaisNetworkData - data frame with computer-generated network data matrix in it, including columns on the right side for any node-specific attributes.
# - humanNetworkTies - data frame with only human-generated network data matrix in it, no node-specific attributes.
# - calaisNetworkTies - data frame with only computer-generated network data matrix in it, no node-specific attributes.
# - humanNetworkMatrix - matrix with only human-generated network data matrix in it, no node-specific attributes.
# - calaisNetworkMatrix - matrix with only computer-generated network data matrix in it, no node-specific attributes.
# - humanNetworkIgraph - igraph network with human-coded network in it, including node-specific attributes.
# - calaisNetworkIgraph - igraph network with computer-coded network in it, including node-specific attributes.
# Links:
# - CRAN page: http://cran.r-project.org/web/packages/igraph/index.html
# - Manual (PDF): http://cran.r-project.org/web/packages/igraph/igraph.pdf
# - intro.: http://horicky.blogspot.com/2012/04/basic-graph-analytics-using-igraph.html
# - good notes: http://www.shizukalab.com/toolkits/sna/node-level-calculations
# Also, be advised that statnet and igraph don't really play nice together.
# If you'll be using both, best idea is to have a workspace for each.
#==============================================================================#
# igraph
#==============================================================================#
# Good notes:
# - http://assemblingnetwork.wordpress.com/2013/06/10/network-basics-with-r-and-igraph-part-ii-of-iii/
# make sure you've loaded the igraph library
# install.packages( "igraph" )
library( igraph )
#==============================================================================#
# NODE level
#==============================================================================#
# calculate the mean of the degrees.
gmHumanDegreeMean <- gmHumanAvgDegree
message( paste( "grp_month human degree mean = ", gmHumanDegreeMean, sep = "" ) )
# what is the standard deviation of these degrees?
gmHumanDegreeSd <- sd( gmHumanDegreeVector )
message( paste( "grp_month human degree SD = ", gmHumanDegreeSd, sep = "" ) )
# what is the variance of these degrees?
gmHumanDegreeVar <- var( gmHumanDegreeVector )
message( paste( "grp_month human degree Variance = ", gmHumanDegreeVar, sep = "" ) )
# what is the max value among these degrees?
gmHumanDegreeMax <- max( gmHumanDegreeVector )
message( paste( "grp_month human degree Max Value = ", gmHumanDegreeMax, sep = "" ) )
# calculate and plot degree distributions
gmHumanDegreeFrequenciesTable <- table( gmHumanDegreeVector )
gmHumanDegreeDistribution <- igraph::degree.distribution( gmHumanNetworkIgraph )
plot( gmHumanDegreeDistribution, xlab = "grp_month human node degree" )
lines( gmHumanDegreeDistribution )
# subset vector to get only those that are above mean
gmHumanAboveMeanVector <- gmHumanDegreeVector[ gmHumanDegreeVector > gmHumanDegreeMean ]
# node-level transitivity
# create transitivity vectors.
gmHumanTransitivityVector <- igraph::transitivity( gmHumanNetworkIgraph, type = "local" )
# append the transitivity to the network as a vertex attribute.
V( gmHumanNetworkIgraph )$transitivity <- gmHumanTransitivityVector
# also add transitivity vector to original data frame
gmHumanDataDF$transitivity <- gmHumanTransitivityVector
# And, if you want averages of these:
gmHumanMeanTransitivity <- mean( gmHumanTransitivityVector, na.rm = TRUE )
message( paste( "grp_month human mean transitivity = ", gmHumanMeanTransitivity, sep = "" ) )
#==============================================================================#
# NETWORK level
#==============================================================================#
#------------------------------------------------------------------------------#
# ==> graph-level degree centrality
#
# Returns a named list with the following components:
# res - The node-level centrality scores
# centralization - The graph level centrality index.
# theoretical_max - The maximum theoretical graph level centralization score
# for a graph with the given number of vertices, using the same parameters.
# If the normalized argument was TRUE (the default), then the result was
# divided by this number.
gmHumanDegreeCentralityOutput <- igraph::centralization.degree( gmHumanNetworkIgraph )
gmHumanDegreeCentrality <- gmHumanDegreeCentralityOutput$centralization
gmHumanDegreeCentralityMax <- gmHumanDegreeCentralityOutput$theoretical_max
message( paste( "grp_month human degree centrality = ", gmHumanDegreeCentrality, " ( max = ", gmHumanDegreeCentralityMax, " )", sep = "" ) )
# node-level degree centrality
gmHumanDegreeCentralityVector <- gmHumanDegreeCentralityOutput$res
#message( paste( "grp_month human betweenness = ", gmHumanBetweenness, sep = "" ) )
# append the degree centrality to the network as a vertex attribute.
V( gmHumanNetworkIgraph )$degreeCentrality <- gmHumanDegreeCentralityVector
# also add degree centrality vector to original data frame
gmHumanDataDF$degreeCentrality <- gmHumanDegreeCentralityVector
# And, if you want averages of these:
gmHumanMeanDegreeCentrality <- mean( gmHumanDegreeCentralityVector, na.rm = TRUE )
message( paste( "grp_month human mean degree centrality = ", gmHumanMeanDegreeCentrality, sep = "" ) )
#------------------------------------------------------------------------------#
# ==> graph-level undirected betweenness
#
# Returns a named list with the following components:
# res - The node-level centrality scores
# centralization - The graph level centrality index.
# theoretical_max - The maximum theoretical graph level centralization score
# for a graph with the given number of vertices, using the same parameters.
# If the normalized argument was TRUE (the default), then the result was
# divided by this number.
gmHumanBetweennessCentralityOutput <- igraph::centralization.betweenness( gmHumanNetworkIgraph, directed = FALSE )
gmHumanBetweennessCentrality <- gmHumanBetweennessCentralityOutput$centralization
gmHumanBetweennessCentralityMax <- gmHumanBetweennessCentralityOutput$theoretical_max
message( paste( "grp_month human betweenness centrality = ", gmHumanBetweennessCentrality, " ( max = ", gmHumanBetweennessCentralityMax, " )", sep = "" ) )
# node-level undirected betweenness
gmHumanBetweennessVector <- gmHumanBetweennessCentralityOutput$res
#message( paste( "grp_month human betweenness = ", gmHumanBetweenness, sep = "" ) )
# append the betweenness to the network as a vertex attribute.
V( gmHumanNetworkIgraph )$betweenness <- gmHumanBetweennessVector
# also add betweenness vector to original data frame
gmHumanDataDF$betweenness <- gmHumanBetweennessVector
# And, if you want averages of these:
gmHumanMeanBetweenness <- mean( gmHumanBetweennessVector, na.rm = TRUE )
message( paste( "grp_month human mean betweenness = ", gmHumanMeanBetweenness, sep = "" ) )
# graph-level transitivity
gmHumanTransitivity <- igraph::transitivity( gmHumanNetworkIgraph, type = "global" )
message( paste( "grp_month human transitivity = ", gmHumanTransitivity, sep = "" ) )
# graph-level density
gmHumanDensity <- igraph::graph.density( gmHumanNetworkIgraph )
message( paste( "grp_month human density = ", gmHumanDensity, sep = "" ) )
#==============================================================================#
# output attributes to data frame
#==============================================================================#
# if you want to just work with the traits of the nodes/vertexes, you can
# combine the attribute vectors into a data frame.
# first, output igraph object to see what attributes you have
gmHumanNetworkIgraph
# then, combine them into a data frame.
gmHumanAttributeDF <- data.frame( id = V( gmHumanNetworkIgraph )$name,
person_id = V( gmHumanNetworkIgraph )$person_id,
person_type = V( gmHumanNetworkIgraph )$person_type,
degree = V( gmHumanNetworkIgraph )$degree,
transitivity = V( gmHumanNetworkIgraph )$transitivity,
degreeCentrality = V( gmHumanNetworkIgraph )$degreeCentrality,
betweenness = V( gmHumanNetworkIgraph )$betweenness )
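The transitivity averages above pass na.rm = TRUE because local transitivity is undefined for nodes with fewer than two neighbors. A minimal sketch on a made-up four-node graph (not the grp_month data; all "toy" names are assumptions for illustration) shows where the NaN comes from and why the mean of the local scores need not match the global score:

```r
library( igraph )

# toy graph (hypothetical): triangle A-B-C plus a pendant node D attached to C
toyGraph <- graph_from_literal( A-B, B-C, A-C, C-D )

# local transitivity: D has only one neighbor, so its score is NaN
toyLocal <- transitivity( toyGraph, type = "local" )
names( toyLocal ) <- V( toyGraph )$name
print( toyLocal )

# mean of the defined local scores vs. the single global score
toyMeanLocal <- mean( toyLocal, na.rm = TRUE )
toyGlobal <- transitivity( toyGraph, type = "global" )
message( paste( "toy mean local transitivity = ", toyMeanLocal, sep = "" ) )
message( paste( "toy global transitivity = ", toyGlobal, sep = "" ) )
```

The two numbers answer different questions: the mean weights every node equally, while the global score weights every connected triple equally.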
grp_week analysis
First, look at a single week out of the shiny new month of data.
grp_week (gw) - automated - OpenCalais
First, we'll analyze the week of data coded by OpenCalais. Set up some variables to store where data is located:
In [57]:
# initialize variables
gwAutomatedDataFolder <- paste( data_directory, "/network/grp_month", sep = "" )
gwAutomatedDataFile <- "sourcenet_data-20171206-031358-grp_month-automated-week_subset.tab"
gwAutomatedDataPath <- paste( gwAutomatedDataFolder, "/", gwAutomatedDataFile, sep = "" )
In [58]:
gwAutomatedDataPath
Load the data file into memory
In [59]:
# tab-delimited:
gwAutomatedDataDF <- read.delim( gwAutomatedDataPath, header = TRUE, row.names = 1, check.names = FALSE )
In [61]:
# get count of rows...
gwAutomatedRowCount <- nrow( gwAutomatedDataDF )
message( paste( "grp_week automated row count = ", gwAutomatedRowCount, sep = "" ) )
# ...and columns
gwAutomatedColumnCount <- ncol( gwAutomatedDataDF )
message( paste( "grp_week automated column count = ", gwAutomatedColumnCount, sep = "" ) )
Get just the tie rows and columns for initializing network libraries.
In [62]:
# the syntax below keeps only as many columns as there are rows, which
# drops any trait columns that sit to the right of the tie matrix
# in the file.
gwAutomatedNetworkDF <- gwAutomatedDataDF[ , 1 : gwAutomatedRowCount ]
#str( gwAutomatedNetworkDF )
In [63]:
# convert to a matrix
gwAutomatedNetworkMatrix <- as.matrix( gwAutomatedNetworkDF )
# str( gwAutomatedNetworkMatrix )
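A small sketch of the slicing step above, on a made-up data frame rather than the real file: a 3 x 3 tie matrix with two trait columns on the right. The toy names and values are assumptions for illustration only.

```r
# toy data frame (hypothetical): 3 x 3 tie matrix followed by two trait columns
toyDataDF <- data.frame(
    "1" = c( 0, 2, 0 ),
    "2" = c( 2, 0, 1 ),
    "3" = c( 0, 1, 0 ),
    person_id = c( 101, 102, 103 ),
    person_type = c( 2, 3, 3 ),
    row.names = c( "1", "2", "3" ),
    check.names = FALSE
)

# keep only as many columns as there are rows: the square tie matrix
toyRowCount <- nrow( toyDataDF )
toyNetworkMatrix <- as.matrix( toyDataDF[ , 1 : toyRowCount ] )

# sanity checks before handing the matrix to a network library
stopifnot( nrow( toyNetworkMatrix ) == ncol( toyNetworkMatrix ) )
stopifnot( identical( rownames( toyNetworkMatrix ), colnames( toyNetworkMatrix ) ) )
```

The two stopifnot() checks are cheap and would catch a file where the trait columns are not all on the right side.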
First, load the igraph package.
Once we get the data into an igraph object, run the code in the following for more in-depth information:
context_text/R/sna/statnet/sna-igraph-init.r
context_text/R/sna/statnet/sna-igraph-network-stats.r
In [64]:
#install.packages( "igraph" )
library( igraph )
Load our data matrix into an igraph object.
In [65]:
# load data into igraph instance.
gwAutomatedNetworkIgraph <- graph.adjacency( gwAutomatedNetworkMatrix, mode = "undirected", weighted = TRUE )
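A minimal sketch of what this load step does with the matrix, using a made-up 3 x 3 matrix instead of the grp_week data. It calls graph_from_adjacency_matrix(), which as far as I know is the newer igraph name for graph.adjacency(); the toy names and values are assumptions.

```r
library( igraph )

# toy symmetric tie matrix (hypothetical); cell values are tie weights
toyMatrix <- matrix( c( 0, 2, 0,
                        2, 0, 1,
                        0, 1, 0 ),
                     nrow = 3, byrow = TRUE,
                     dimnames = list( c( "a", "b", "c" ), c( "a", "b", "c" ) ) )

# same arguments as the call above, newer function name
toyGraph <- graph_from_adjacency_matrix( toyMatrix, mode = "undirected", weighted = TRUE )

# non-zero cells become edges; the cell value lands in the "weight" edge attribute
print( igraph::as_data_frame( toyGraph, what = "edges" ) )
```

With weighted = TRUE a cell value of 2 becomes one edge with weight 2, not two parallel edges.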
In [66]:
# add person_id (column 1168)
personIdColumnNumber <- 1168
# first, get just the data frame column with person ID:
personIdsColumn <- gwAutomatedDataDF[ , personIdColumnNumber ]
# populate list we will use to set node person_ID attribute
# Don't just do these - they don't convert to simple list/vector, contain remnants of data frame
#personIdsList <- personIdsColumn
#personIdsList <- c( personIdsColumn )
# Convert to a list of numbers.
personIdsList <- as.numeric( personIdsColumn )
# Try this if you have character attribute...
#personIdsList <- unname( unlist( personIdsColumn ) )
# set vertex/node attribute person_id
V( gwAutomatedNetworkIgraph )$person_id <- personIdsList
# OR use function:
#gwAutomatedNetworkIgraph <- set.vertex.attribute( gwAutomatedNetworkIgraph, "person_id", value = personIdsList )
# look at graph and person_id attribute values
gwAutomatedNetworkIgraph
V( gwAutomatedNetworkIgraph )$person_id
In [67]:
# add person_type (column 1169)
personTypeColumnNumber <- 1169
# first, get just the data frame column with person type:
personTypesColumn <- gwAutomatedDataDF[ , personTypeColumnNumber ]
# populate list we will use to set node person_type attribute
# Don't just do these - they don't convert to simple list/vector, contain remnants of data frame
#personTypesList <- personTypesColumn
#personTypesList <- c( personTypesColumn )
# Convert to a list of numbers.
personTypesList <- as.numeric( personTypesColumn )
# Try this if you have character attribute...
#personTypesList <- unname( unlist( personTypesColumn ) )
# set vertex/node attribute person_type
V( gwAutomatedNetworkIgraph )$person_type <- personTypesList
# OR use function:
#gwAutomatedNetworkIgraph <- set.vertex.attribute( gwAutomatedNetworkIgraph, "person_type", value = personTypesList )
# look at graph and person_type attribute values
gwAutomatedNetworkIgraph
V( gwAutomatedNetworkIgraph )$person_type
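The two cells above find the trait columns by hard-coded position (1168 and 1169). A hedged alternative sketch, assuming the header row names those columns person_id and person_type, is to look them up by name so the code survives a change in the number of nodes. The toy data frame is made up for illustration.

```r
# toy data frame (hypothetical) with trait columns to the right of the ties
toyDataDF <- data.frame(
    "1" = c( 0, 1 ),
    "2" = c( 1, 0 ),
    person_id = c( 101, 102 ),
    person_type = c( 2, 3 ),
    check.names = FALSE
)

# look up the column position from the header instead of hard-coding it
toyPersonTypeColumnNumber <- match( "person_type", names( toyDataDF ) )
stopifnot( !is.na( toyPersonTypeColumnNumber ) )

# [[ ]] returns a plain vector, with no data frame wrapper left over
toyPersonTypesList <- as.numeric( toyDataDF[[ "person_type" ]] )
print( toyPersonTypesList )
```

The stopifnot() fails loudly if the column is missing, instead of silently pulling a tie column.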
In [68]:
# calais - include ties Greater than or equal to 0 (GE0)
gwAutomatedMeanTieWeightGE0Vector <- apply( gwAutomatedNetworkMatrix, 1, calculateListMean )
gwAutomatedDataDF$meanTieWeightGE0 <- gwAutomatedMeanTieWeightGE0Vector
# calais - include ties Greater than or equal to 1 (GE1)
gwAutomatedMeanTieWeightGE1Vector <- apply( gwAutomatedNetworkMatrix, 1, calculateListMean, minValueToIncludeIN = 1 )
gwAutomatedDataDF$meanTieWeightGE1 <- gwAutomatedMeanTieWeightGE1Vector
# automated - Max tie weight?
gwAutomatedMaxTieWeightVector <- apply( gwAutomatedNetworkMatrix, 1, calculateListMax )
gwAutomatedDataDF$maxTieWeight <- gwAutomatedMaxTieWeightVector
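To make the GE0 / GE1 distinction concrete, here is a sketch with a hypothetical stand-in for calculateListMean(). The real function is defined in functions-sna.r and may behave differently; only the minValueToIncludeIN argument name is taken from the calls above, and the toy matrix is made up.

```r
# hypothetical stand-in for calculateListMean(); the real one lives in
# functions-sna.r and may differ
toyMeanAtLeast <- function( valuesIN, minValueToIncludeIN = 0 ) {
    keptValues <- valuesIN[ valuesIN >= minValueToIncludeIN ]
    # no qualifying values: return NA rather than NaN
    if ( length( keptValues ) == 0 ) {
        return( NA_real_ )
    }
    return( mean( keptValues ) )
}

toyMatrix <- matrix( c( 0, 2, 0,
                        2, 0, 1,
                        0, 1, 0 ), nrow = 3, byrow = TRUE )

# GE0 averages over every cell in the row, zeros included
toyMeanGE0 <- apply( toyMatrix, 1, toyMeanAtLeast )
# GE1 averages over existing ties only
toyMeanGE1 <- apply( toyMatrix, 1, toyMeanAtLeast, minValueToIncludeIN = 1 )
print( toyMeanGE0 )
print( toyMeanGE1 )
```

In a sparse network the GE0 mean is dominated by zeros, so the GE1 mean is usually the one that says something about tie strength.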
In [69]:
# to see count of nodes and edges, just type the object name:
gwAutomatedNetworkIgraph
# Will output something like:
#
# IGRAPH UNW- 314 309 --
# + attr: name (v/c), weight (e/n)
#
# in the first line, "UNW-" are traits of graph:
# - 1 - U = undirected ( directed would be "D" )
# - 2 - N = named or not ( "-" instead of "N" )
# - 3 - W = weighted
# - 4 - B = bipartite ( "-" = not bipartite )
# 314 is where node count goes, 309 is edge count.
# The second line gives you information about the 'attributes' associated with the graph. In this case, there are two attributes, name and weight. Next to each attribute name is a two-character construct that looks like "(v/c)". The first letter is the thing the attribute is associated with (g = graph, v = vertex or node, e = edge). The second is the type of the attribute (c = character data, n = numeric data). So, in this case:
# - name (v/c) - the name attribute is a vertex/node attribute - the "v" in "(v/c)" - where the values are character data - the "c" in "(v/c)".
# - weight (e/n) - the weight attribute is an edge attribute - the "e" in "(e/n)" - where the values are numeric data - the "n" in "(e/n)".
# - based on: http://www.shizukalab.com/toolkits/sna/sna_data
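The same facts that the printed summary encodes can also be read off programmatically. A minimal sketch on a made-up two-node graph (names and values are assumptions for illustration):

```r
library( igraph )

# toy named, weighted, undirected graph (hypothetical)
toyMatrix <- matrix( c( 0, 2, 2, 0 ), nrow = 2,
                     dimnames = list( c( "a", "b" ), c( "a", "b" ) ) )
toyGraph <- graph_from_adjacency_matrix( toyMatrix, mode = "undirected", weighted = TRUE )

# the first three letters in "UNW-" each have a matching predicate function
message( paste( "directed = ", is_directed( toyGraph ), sep = "" ) )
message( paste( "named = ", is_named( toyGraph ), sep = "" ) )
message( paste( "weighted = ", is_weighted( toyGraph ), sep = "" ) )

# node and edge counts from the summary line
message( paste( "nodes = ", vcount( toyGraph ), "; edges = ", ecount( toyGraph ), sep = "" ) )

# attribute names listed on the "+ attr:" line
print( vertex_attr_names( toyGraph ) )
print( edge_attr_names( toyGraph ) )
```

This is handy in scripts, where checking a value is safer than reading the printed summary.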
In [70]:
# try calling the degree() function on an igraph object. Returns a number vector with names.
gwAutomatedDegreeVector <- igraph::degree( gwAutomatedNetworkIgraph )
# For help with igraph::degree function:
#??igraph::degree
# calculate the mean of the degrees.
gwAutomatedAvgDegree <- mean( gwAutomatedDegreeVector )
message( paste( "grp_week automated average degree = ", gwAutomatedAvgDegree, sep = "" ) )
# append the degrees to the network as a vertex attribute.
V( gwAutomatedNetworkIgraph )$degree <- gwAutomatedDegreeVector
# also add degree vector to original data frame
gwAutomatedDataDF$degree <- gwAutomatedDegreeVector
# if you want to just work with the traits of the nodes/vertexes, you can
# combine the attribute vectors into a data frame.
# first, output igraph object to see what attributes you have
gwAutomatedNetworkIgraph
V( gwAutomatedNetworkIgraph )$degree
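One thing to keep in mind with a weighted graph: degree() counts ties and ignores their weights. If the weighted version is wanted, igraph's strength() sums the weights instead. A sketch on a made-up graph (toy names and values are assumptions):

```r
library( igraph )

# toy weighted graph (hypothetical): a-b has weight 2, b-c has weight 1
toyMatrix <- matrix( c( 0, 2, 0,
                        2, 0, 1,
                        0, 1, 0 ),
                     nrow = 3, byrow = TRUE,
                     dimnames = list( c( "a", "b", "c" ), c( "a", "b", "c" ) ) )
toyGraph <- graph_from_adjacency_matrix( toyMatrix, mode = "undirected", weighted = TRUE )

# number of ties per node
toyDegreeVector <- degree( toyGraph )
# sum of tie weights per node (uses the "weight" edge attribute by default)
toyStrengthVector <- strength( toyGraph )

print( toyDegreeVector )
print( toyStrengthVector )
```

Node "a" has degree 1 but strength 2, so the two measures can rank nodes differently.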
Calculate average source and author degree:
In [71]:
# average author degree (person types 2 and 4)
gwAutomatedAverageAuthorDegree2And4 <- calcAuthorMeanDegree( dataFrameIN = gwAutomatedDataDF, includeBothIN = TRUE )
message( paste( "grp_week automated average author degree (2 and 4) = ", gwAutomatedAverageAuthorDegree2And4, sep = "" ) )
# average author degree (person type 2 only)
gwAutomatedAverageAuthorDegreeOnly2 <- calcAuthorMeanDegree( dataFrameIN = gwAutomatedDataDF, includeBothIN = FALSE )
message( paste( "grp_week automated average author degree (only 2) = ", gwAutomatedAverageAuthorDegreeOnly2, sep = "" ) )
# average source degree (person types 3 and 4)
gwAutomatedAverageSourceDegree3And4 <- calcSourceMeanDegree( dataFrameIN = gwAutomatedDataDF, includeBothIN = TRUE )
message( paste( "grp_week automated average source degree (3 and 4) = ", gwAutomatedAverageSourceDegree3And4, sep = "" ) )
# average source degree (person type 3 only)
gwAutomatedAverageSourceDegreeOnly3 <- calcSourceMeanDegree( dataFrameIN = gwAutomatedDataDF, includeBothIN = FALSE )
message( paste( "grp_week automated average source degree (only 3) = ", gwAutomatedAverageSourceDegreeOnly3, sep = "" ) )
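For reference, a sketch of the kind of calculation these helpers presumably do. This is a hypothetical stand-in, not the code in functions-sna.r: it assumes the data frame has person_type and degree columns, and it takes the type codes from the comments above (2 = author, 3 = source, 4 = both).

```r
# hypothetical helper; the real calcAuthorMeanDegree() and
# calcSourceMeanDegree() are defined in functions-sna.r and may differ
toyMeanDegreeForTypes <- function( dataFrameIN, personTypesIN ) {
    matchingRows <- dataFrameIN$person_type %in% personTypesIN
    return( mean( dataFrameIN$degree[ matchingRows ] ) )
}

# toy node traits (hypothetical)
toyDataDF <- data.frame( person_type = c( 2, 3, 3, 4 ),
                         degree = c( 3, 1, 2, 4 ) )

# authors: types 2 and 4, then type 2 only
toyAuthorMean2And4 <- toyMeanDegreeForTypes( toyDataDF, c( 2, 4 ) )
toyAuthorMeanOnly2 <- toyMeanDegreeForTypes( toyDataDF, c( 2 ) )
# sources: types 3 and 4, then type 3 only
toySourceMean3And4 <- toyMeanDegreeForTypes( toyDataDF, c( 3, 4 ) )
toySourceMeanOnly3 <- toyMeanDegreeForTypes( toyDataDF, c( 3 ) )

message( paste( "toy average author degree (2 and 4) = ", toyAuthorMean2And4, sep = "" ) )
message( paste( "toy average source degree (3 and 4) = ", toySourceMean3And4, sep = "" ) )
```

Note that a type 4 node counts toward both the author and the source average when the "both" variants are used.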
Once we get the data into an igraph object, run the code in the following for more in-depth information:
context_text/R/sna/statnet/sna-igraph-init.r
context_text/R/sna/statnet/sna-igraph-network-stats.r
In [72]:
# First, need to load SNA functions and load data into igraph network object.
# For more details on that, see the files "functions-sna.r",
# "sna-load_data.r" and "sna-igraph-init.r".
#
# assumes that working directory for igraph is context_text/R/sna/igraph
# setwd( ".." )
# source( "functions-sna.r" )
# source( "sna-load_data.r" )
# setwd( "igraph" )
# source( "sna-igraph-init.r" )
# results in (among other things):
# - humanNetworkData - data frame with human-generated network data matrix in it, including columns on the right side for any node-specific attributes.
# - calaisNetworkData - data frame with computer-generated network data matrix in it, including columns on the right side for any node-specific attributes.
# - humanNetworkTies - data frame with only human-generated network data matrix in it, no node-specific attributes.
# - calaisNetworkTies - data frame with only computer-generated network data matrix in it, no node-specific attributes.
# - humanNetworkMatrix - matrix with only human-generated network data matrix in it, no node-specific attributes.
# - calaisNetworkMatrix - matrix with only computer-generated network data matrix in it, no node-specific attributes.
# - humanNetworkIgraph - igraph network with human-coded network in it, including node-specific attributes.
# - calaisNetworkIgraph - igraph network with computer-coded network in it, including node-specific attributes.
# Links:
# - CRAN page: http://cran.r-project.org/web/packages/igraph/index.html
# - Manual (PDF): http://cran.r-project.org/web/packages/igraph/igraph.pdf
# - intro.: http://horicky.blogspot.com/2012/04/basic-graph-analytics-using-igraph.html
# - good notes: http://www.shizukalab.com/toolkits/sna/node-level-calculations
# Also, be advised that statnet and igraph don't really play nice together.
# If you'll be using both, best idea is to have a workspace for each.
#==============================================================================#
# igraph
#==============================================================================#
# Good notes:
# - http://assemblingnetwork.wordpress.com/2013/06/10/network-basics-with-r-and-igraph-part-ii-of-iii/
# make sure you've loaded the igraph library
# install.packages( "igraph" )
library( igraph )
#==============================================================================#
# NODE level
#==============================================================================#
# calculate the mean of the degrees.
gwAutomatedDegreeMean <- gwAutomatedAvgDegree
message( paste( "grp_week automated degree mean = ", gwAutomatedDegreeMean, sep = "" ) )
# what is the standard deviation of these degrees?
gwAutomatedDegreeSd <- sd( gwAutomatedDegreeVector )
message( paste( "grp_week automated degree SD = ", gwAutomatedDegreeSd, sep = "" ) )
# what is the variance of these degrees?
gwAutomatedDegreeVar <- var( gwAutomatedDegreeVector )
message( paste( "grp_week automated degree Variance = ", gwAutomatedDegreeVar, sep = "" ) )
# what is the max value among these degrees?
gwAutomatedDegreeMax <- max( gwAutomatedDegreeVector )
message( paste( "grp_week automated degree Max Value = ", gwAutomatedDegreeMax, sep = "" ) )
# calculate and plot degree distributions
gwAutomatedDegreeFrequenciesTable <- table( gwAutomatedDegreeVector )
gwAutomatedDegreeDistribution <- igraph::degree.distribution( gwAutomatedNetworkIgraph )
plot( gwAutomatedDegreeDistribution, xlab = "grp_week automated node degree" )
lines( gwAutomatedDegreeDistribution )
# subset vector to get only those that are above mean
gwAutomatedAboveMeanVector <- gwAutomatedDegreeVector[ gwAutomatedDegreeVector > gwAutomatedDegreeMean ]
# node-level transitivity
# create transitivity vectors.
gwAutomatedTransitivityVector <- igraph::transitivity( gwAutomatedNetworkIgraph, type = "local" )
# append the transitivity to the network as a vertex attribute.
V( gwAutomatedNetworkIgraph )$transitivity <- gwAutomatedTransitivityVector
# also add transitivity vector to original data frame
gwAutomatedDataDF$transitivity <- gwAutomatedTransitivityVector
# And, if you want averages of these:
gwAutomatedMeanTransitivity <- mean( gwAutomatedTransitivityVector, na.rm = TRUE )
message( paste( "grp_week automated mean transitivity = ", gwAutomatedMeanTransitivity, sep = "" ) )
#==============================================================================#
# NETWORK level
#==============================================================================#
#------------------------------------------------------------------------------#
# ==> graph-level degree centrality
#
# Returns a named list with the following components:
# res - The node-level centrality scores
# centralization - The graph level centrality index.
# theoretical_max - The maximum theoretical graph level centralization score
# for a graph with the given number of vertices, using the same parameters.
# If the normalized argument was TRUE (the default), then the result was
# divided by this number.
gwAutomatedDegreeCentralityOutput <- igraph::centralization.degree( gwAutomatedNetworkIgraph )
gwAutomatedDegreeCentrality <- gwAutomatedDegreeCentralityOutput$centralization
gwAutomatedDegreeCentralityMax <- gwAutomatedDegreeCentralityOutput$theoretical_max
message( paste( "grp_week automated degree centrality = ", gwAutomatedDegreeCentrality, " ( max = ", gwAutomatedDegreeCentralityMax, " )", sep = "" ) )
# node-level degree centrality
gwAutomatedDegreeCentralityVector <- gwAutomatedDegreeCentralityOutput$res
#message( paste( "grp_week automated betweenness = ", gwAutomatedBetweenness, sep = "" ) )
# append the degree centrality to the network as a vertex attribute.
V( gwAutomatedNetworkIgraph )$degreeCentrality <- gwAutomatedDegreeCentralityVector
# also add degree centrality vector to original data frame
gwAutomatedDataDF$degreeCentrality <- gwAutomatedDegreeCentralityVector
# And, if you want averages of these:
gwAutomatedMeanDegreeCentrality <- mean( gwAutomatedDegreeCentralityVector, na.rm = TRUE )
message( paste( "grp_week automated mean degree centrality = ", gwAutomatedMeanDegreeCentrality, sep = "" ) )
#------------------------------------------------------------------------------#
# ==> graph-level undirected betweenness
#
# Returns a named list with the following components:
# res - The node-level centrality scores
# centralization - The graph level centrality index.
# theoretical_max - The maximum theoretical graph level centralization score
# for a graph with the given number of vertices, using the same parameters.
# If the normalized argument was TRUE (the default), then the result was
# divided by this number.
gwAutomatedBetweennessCentralityOutput <- igraph::centralization.betweenness( gwAutomatedNetworkIgraph, directed = FALSE )
gwAutomatedBetweennessCentrality <- gwAutomatedBetweennessCentralityOutput$centralization
gwAutomatedBetweennessCentralityMax <- gwAutomatedBetweennessCentralityOutput$theoretical_max
message( paste( "grp_week automated betweenness centrality = ", gwAutomatedBetweennessCentrality, " ( max = ", gwAutomatedBetweennessCentralityMax, " )", sep = "" ) )
# node-level undirected betweenness
gwAutomatedBetweennessVector <- gwAutomatedBetweennessCentralityOutput$res
#message( paste( "grp_week automated betweenness = ", gwAutomatedBetweenness, sep = "" ) )
# append the betweenness to the network as a vertex attribute.
V( gwAutomatedNetworkIgraph )$betweenness <- gwAutomatedBetweennessVector
# also add betweenness vector to original data frame
gwAutomatedDataDF$betweenness <- gwAutomatedBetweennessVector
# And, if you want averages of these:
gwAutomatedMeanBetweenness <- mean( gwAutomatedBetweennessVector, na.rm = TRUE )
message( paste( "grp_week automated mean betweenness = ", gwAutomatedMeanBetweenness, sep = "" ) )
# graph-level transitivity
gwAutomatedTransitivity <- igraph::transitivity( gwAutomatedNetworkIgraph, type = "global" )
message( paste( "grp_week automated transitivity = ", gwAutomatedTransitivity, sep = "" ) )
# graph-level density
gwAutomatedDensity <- igraph::graph.density( gwAutomatedNetworkIgraph )
message( paste( "grp_week automated density = ", gwAutomatedDensity, sep = "" ) )
#==============================================================================#
# output attributes to data frame
#==============================================================================#
# if you want to just work with the traits of the nodes/vertexes, you can
# combine the attribute vectors into a data frame.
# first, output igraph object to see what attributes you have
gwAutomatedNetworkIgraph
# then, combine them into a data frame.
gwAutomatedAttributeDF <- data.frame( id = V( gwAutomatedNetworkIgraph )$name,
person_id = V( gwAutomatedNetworkIgraph )$person_id,
person_type = V( gwAutomatedNetworkIgraph )$person_type,
degree = V( gwAutomatedNetworkIgraph )$degree,
transitivity = V( gwAutomatedNetworkIgraph )$transitivity,
degreeCentrality = V( gwAutomatedNetworkIgraph )$degreeCentrality,
betweenness = V( gwAutomatedNetworkIgraph )$betweenness )
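To see how the three components of the centralization output fit together, here is a sketch on a made-up star graph. It uses centr_degree() and edge_density(), which as far as I know are the newer igraph names for centralization.degree() and graph.density(). Setting loops = FALSE is my choice for the toy example; the calls above use the default.

```r
library( igraph )

# toy star graph (hypothetical): one hub tied to three leaves
toyStar <- make_star( 4, mode = "undirected" )

# graph-level degree centralization, ignoring self-loops
toyOutput <- centr_degree( toyStar, loops = FALSE, normalized = TRUE )

# node-level scores are the plain degrees
toyScores <- toyOutput$res

# recompute the graph-level index by hand: summed gap to the top score,
# divided by the theoretical maximum
toyByHand <- sum( max( toyScores ) - toyScores ) / toyOutput$theoretical_max
message( paste( "toy centralization = ", toyOutput$centralization, " ( by hand = ", toyByHand, " )", sep = "" ) )

# density: 3 ties out of 6 possible pairs of nodes
toyDensity <- edge_density( toyStar )
message( paste( "toy density = ", toyDensity, sep = "" ) )
```

A star is the most centralized shape for a given node count, which makes it a useful sanity check when comparing the automated and human networks.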
grp_week (gw) - human
Next, we'll analyze a week of the month of data coded by humans. Set up some variables to store where data is located:
In [76]:
# initialize variables
gwHumanDataFolder <- paste( data_directory, "/network/grp_month", sep = "" )
gwHumanDataFile <- "sourcenet_data-20171206-031319-grp_month-human-week_subset.tab"
gwHumanDataPath <- paste( gwHumanDataFolder, "/", gwHumanDataFile, sep = "" )
In [77]:
gwHumanDataPath
Load the data file into memory
In [78]:
# tab-delimited:
gwHumanDataDF <- read.delim( gwHumanDataPath, header = TRUE, row.names = 1, check.names = FALSE )
In [79]:
# get count of rows...
gwHumanRowCount <- nrow( gwHumanDataDF )
message( paste( "grp_week human row count = ", gwHumanRowCount, sep = "" ) )
# ...and columns
gwHumanColumnCount <- ncol( gwHumanDataDF )
message( paste( "grp_week human column count = ", gwHumanColumnCount, sep = "" ) )
Get just the tie rows and columns for initializing network libraries.
In [80]:
# the syntax below keeps only as many columns as there are rows, which
# drops any trait columns that sit to the right of the tie matrix
# in the file.
gwHumanNetworkDF <- gwHumanDataDF[ , 1 : gwHumanRowCount ]
#str( gwHumanNetworkDF )
In [81]:
# convert to a matrix
gwHumanNetworkMatrix <- as.matrix( gwHumanNetworkDF )
# str( gwHumanNetworkMatrix )
First, load the igraph package.
Once we get the data into an igraph object, run the code in the following for more in-depth information:
context_text/R/sna/statnet/sna-igraph-init.r
context_text/R/sna/statnet/sna-igraph-network-stats.r
In [82]:
#install.packages( "igraph" )
library( igraph )
Load our data matrix into an igraph object.
In [83]:
# load data into igraph instance.
gwHumanNetworkIgraph <- graph.adjacency( gwHumanNetworkMatrix, mode = "undirected", weighted = TRUE )
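Loading with mode = "undirected" only makes sense if the tie matrix is symmetric, and I am not certain every igraph version treats an asymmetric matrix the same way. A quick base R check before loading is cheap; the sketch below uses made-up 2 x 2 matrices.

```r
# toy matrices (hypothetical)
toySymmetricMatrix <- matrix( c( 0, 1, 1, 0 ), nrow = 2 )
toyAsymmetricMatrix <- matrix( c( 0, 1, 0, 0 ), nrow = 2 )

# isSymmetric() also compares dimnames, so strip them with unname() first
toySymmetricCheck <- isSymmetric( unname( toySymmetricMatrix ) )
toyAsymmetricCheck <- isSymmetric( unname( toyAsymmetricMatrix ) )

message( paste( "symmetric matrix check = ", toySymmetricCheck, sep = "" ) )
message( paste( "asymmetric matrix check = ", toyAsymmetricCheck, sep = "" ) )
```

For the real data, the equivalent would be stopifnot( isSymmetric( unname( gwHumanNetworkMatrix ) ) ) just before the load.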
In [84]:
# add person_id (column 1168)
personIdColumnNumber <- 1168
# first, get just the data frame column with person ID:
personIdsColumn <- gwHumanDataDF[ , personIdColumnNumber ]
# populate list we will use to set node person_ID attribute
# Don't just do these - they don't convert to simple list/vector, contain remnants of data frame
#personIdsList <- personIdsColumn
#personIdsList <- c( personIdsColumn )
# Convert to a list of numbers.
personIdsList <- as.numeric( personIdsColumn )
# Try this if you have character attribute...
#personIdsList <- unname( unlist( personIdsColumn ) )
# set vertex/node attribute person_id
V( gwHumanNetworkIgraph )$person_id <- personIdsList
# OR use function:
#gwHumanNetworkIgraph <- set.vertex.attribute( gwHumanNetworkIgraph, "person_id", value = personIdsList )
# look at graph and person_id attribute values
gwHumanNetworkIgraph
V( gwHumanNetworkIgraph )$person_id
In [85]:
# add person_type (column 1169)
personTypeColumnNumber <- 1169
# first, get just the data frame column with person type:
personTypesColumn <- gwHumanDataDF[ , personTypeColumnNumber ]
# populate list we will use to set node person_type attribute
# Don't just do these - they don't convert to simple list/vector, contain remnants of data frame
#personTypesList <- personTypesColumn
#personTypesList <- c( personTypesColumn )
# Convert to a list of numbers.
personTypesList <- as.numeric( personTypesColumn )
# Try this if you have character attribute...
#personTypesList <- unname( unlist( personTypesColumn ) )
# set vertex/node attribute person_type
V( gwHumanNetworkIgraph )$person_type <- personTypesList
# OR use function:
#gwHumanNetworkIgraph <- set.vertex.attribute( gwHumanNetworkIgraph, "person_type", value = personTypesList )
# look at graph and person_type attribute values
gwHumanNetworkIgraph
V( gwHumanNetworkIgraph )$person_type
In [86]:
# human - include ties Greater than or equal to 0 (GE0)
gwHumanMeanTieWeightGE0Vector <- apply( gwHumanNetworkMatrix, 1, calculateListMean )
gwHumanDataDF$meanTieWeightGE0 <- gwHumanMeanTieWeightGE0Vector
# human - include ties Greater than or equal to 1 (GE1)
gwHumanMeanTieWeightGE1Vector <- apply( gwHumanNetworkMatrix, 1, calculateListMean, minValueToIncludeIN = 1 )
gwHumanDataDF$meanTieWeightGE1 <- gwHumanMeanTieWeightGE1Vector
# human - Max tie weight?
gwHumanMaxTieWeightVector <- apply( gwHumanNetworkMatrix, 1, calculateListMax )
gwHumanDataDF$maxTieWeight <- gwHumanMaxTieWeightVector
In [87]:
# to see count of nodes and edges, just type the object name:
gwHumanNetworkIgraph
# Will output something like:
#
# IGRAPH UNW- 314 309 --
# + attr: name (v/c), weight (e/n)
#
# in the first line, "UNW-" are traits of graph:
# - 1 - U = undirected ( directed would be "D" )
# - 2 - N = named or not ( "-" instead of "N" )
# - 3 - W = weighted
# - 4 - B = bipartite ( "-" = not bipartite )
# 314 is where node count goes, 309 is edge count.
# The second line gives you information about the 'attributes' associated with the graph. In this case, there are two attributes, name and weight. Next to each attribute name is a two-character construct that looks like "(v/c)". The first letter is the thing the attribute is associated with (g = graph, v = vertex or node, e = edge). The second is the type of the attribute (c = character data, n = numeric data). So, in this case:
# - name (v/c) - the name attribute is a vertex/node attribute - the "v" in "(v/c)" - where the values are character data - the "c" in "(v/c)".
# - weight (e/n) - the weight attribute is an edge attribute - the "e" in "(e/n)" - where the values are numeric data - the "n" in "(e/n)".
# - based on: http://www.shizukalab.com/toolkits/sna/sna_data
In [88]:
# try calling the degree() function on an igraph object. Returns a number vector with names.
gwHumanDegreeVector <- igraph::degree( gwHumanNetworkIgraph )
# For help with igraph::degree function:
#??igraph::degree
# calculate the mean of the degrees.
gwHumanAvgDegree <- mean( gwHumanDegreeVector )
message( paste( "grp_week human average degree = ", gwHumanAvgDegree, sep = "" ) )
# append the degrees to the network as a vertex attribute.
V( gwHumanNetworkIgraph )$degree <- gwHumanDegreeVector
# also add degree vector to original data frame
gwHumanDataDF$degree <- gwHumanDegreeVector
# if you want to just work with the traits of the nodes/vertexes, you can
# combine the attribute vectors into a data frame.
# first, output igraph object to see what attributes you have
gwHumanNetworkIgraph
V( gwHumanNetworkIgraph )$degree
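The degree vector can also be summarized as a distribution. One detail worth knowing when plotting it: the vector igraph returns starts at degree 0, so element 1 is the share of nodes with degree 0, element 2 the share with degree 1, and so on. A sketch on a made-up path graph, using degree_distribution(), which as far as I know is the newer name for degree.distribution():

```r
library( igraph )

# toy path graph (hypothetical): 1 - 2 - 3 - 4, so degrees are 1, 2, 2, 1
toyGraph <- make_ring( 4, circular = FALSE )
toyDegreeVector <- degree( toyGraph )

# raw counts per observed degree value
toyDegreeFrequenciesTable <- table( toyDegreeVector )
print( toyDegreeFrequenciesTable )

# relative frequencies; element 1 is degree 0, element 2 is degree 1, ...
toyDegreeDistribution <- degree_distribution( toyGraph )

# plot against the actual degree values so the x axis starts at 0
toyDegreeValues <- 0 : ( length( toyDegreeDistribution ) - 1 )
plot( toyDegreeValues, toyDegreeDistribution, type = "b",
      xlab = "toy node degree", ylab = "share of nodes" )
```

Plotting the distribution vector on its own puts degree 0 at x = 1, which shifts every point one step to the right.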
Calculate average source and author degree:
In [89]:
# average author degree (person types 2 and 4)
gwHumanAverageAuthorDegree2And4 <- calcAuthorMeanDegree( dataFrameIN = gwHumanDataDF, includeBothIN = TRUE )
message( paste( "grp_week human average author degree (2 and 4) = ", gwHumanAverageAuthorDegree2And4, sep = "" ) )
# average author degree (person type 2 only)
gwHumanAverageAuthorDegreeOnly2 <- calcAuthorMeanDegree( dataFrameIN = gwHumanDataDF, includeBothIN = FALSE )
message( paste( "grp_week human average author degree (only 2) = ", gwHumanAverageAuthorDegreeOnly2, sep = "" ) )
# average source degree (person types 3 and 4)
gwHumanAverageSourceDegree3And4 <- calcSourceMeanDegree( dataFrameIN = gwHumanDataDF, includeBothIN = TRUE )
message( paste( "grp_week human average source degree (3 and 4) = ", gwHumanAverageSourceDegree3And4, sep = "" ) )
# average source degree (person type 3 only)
gwHumanAverageSourceDegreeOnly3 <- calcSourceMeanDegree( dataFrameIN = gwHumanDataDF, includeBothIN = FALSE )
message( paste( "grp_week human average source degree (only 3) = ", gwHumanAverageSourceDegreeOnly3, sep = "" ) )
Once we get the data into an igraph object, run the code in the following for more in-depth information:
context_text/R/sna/statnet/sna-igraph-init.r
context_text/R/sna/statnet/sna-igraph-network-stats.r
In [90]:
# First, need to load SNA functions and load data into igraph network object.
# For more details on that, see the files "functions-sna.r",
# "sna-load_data.r" and "sna-igraph-init.r".
#
# assumes that working directory for igraph is context_text/R/sna/igraph
# setwd( ".." )
# source( "functions-sna.r" )
# source( "sna-load_data.r" )
# setwd( "igraph" )
# source( "sna-igraph-init.r" )
# results in (among other things):
# - humanNetworkData - data frame with human-generated network data matrix in it, including columns on the right side for any node-specific attributes.
# - calaisNetworkData - data frame with computer-generated network data matrix in it, including columns on the right side for any node-specific attributes.
# - humanNetworkTies - data frame with only human-generated network data matrix in it, no node-specific attributes.
# - calaisNetworkTies - data frame with only computer-generated network data matrix in it, no node-specific attributes.
# - humanNetworkMatrix - matrix with only human-generated network data matrix in it, no node-specific attributes.
# - calaisNetworkMatrix - matrix with only computer-generated network data matrix in it, no node-specific attributes.
# - humanNetworkIgraph - igraph network with human-coded network in it, including node-specific attributes.
# - calaisNetworkIgraph - igraph network with computer-coded network in it, including node-specific attributes.
# Links:
# - CRAN page: http://cran.r-project.org/web/packages/igraph/index.html
# - Manual (PDF): http://cran.r-project.org/web/packages/igraph/igraph.pdf
# - intro.: http://horicky.blogspot.com/2012/04/basic-graph-analytics-using-igraph.html
# - good notes: http://www.shizukalab.com/toolkits/sna/node-level-calculations
# Also, be advised that statnet and igraph don't really play nice together.
# If you'll be using both, best idea is to have a workspace for each.
#==============================================================================#
# igraph
#==============================================================================#
# Good notes:
# - http://assemblingnetwork.wordpress.com/2013/06/10/network-basics-with-r-and-igraph-part-ii-of-iii/
# make sure you've loaded the igraph library
# install.packages( "igraph" )
library( igraph )
#==============================================================================#
# NODE level
#==============================================================================#
# calculate the mean of the degrees.
gwHumanDegreeMean <- gwHumanAvgDegree
message( paste( "grp_week human degree mean = ", gwHumanDegreeMean, sep = "" ) )
# what is the standard deviation of these degrees?
gwHumanDegreeSd <- sd( gwHumanDegreeVector )
message( paste( "grp_week human degree SD = ", gwHumanDegreeSd, sep = "" ) )
# what is the variance of these degrees?
gwHumanDegreeVar <- var( gwHumanDegreeVector )
message( paste( "grp_week human degree Variance = ", gwHumanDegreeVar, sep = "" ) )
# what is the max value among these degrees?
gwHumanDegreeMax <- max( gwHumanDegreeVector )
message( paste( "grp_week human degree Max Value = ", gwHumanDegreeMax, sep = "" ) )
# calculate and plot degree distributions
gwHumanDegreeFrequenciesTable <- table( gwHumanDegreeVector )
gwHumanDegreeDistribution <- igraph::degree.distribution( gwHumanNetworkIgraph )
plot( gwHumanDegreeDistribution, xlab = "grp_week human node degree" )
lines( gwHumanDegreeDistribution )
# subset vector to get only those that are above mean
gwHumanAboveMeanVector <- gwHumanDegreeVector[ gwHumanDegreeVector > gwHumanDegreeMean ]
# node-level transitivity
# create transitivity vectors.
gwHumanTransitivityVector <- igraph::transitivity( gwHumanNetworkIgraph, type = "local" )
# append the transitivity to the network as a vertex attribute.
V( gwHumanNetworkIgraph )$transitivity <- gwHumanTransitivityVector
# also add transitivity vector to original data frame
gwHumanDataDF$transitivity <- gwHumanTransitivityVector
# And, if you want averages of these:
gwHumanMeanTransitivity <- mean( gwHumanTransitivityVector, na.rm = TRUE )
message( paste( "grp_week human mean transitivity = ", gwHumanMeanTransitivity, sep = "" ) )
#==============================================================================#
# NETWORK level
#==============================================================================#
#------------------------------------------------------------------------------#
# ==> graph-level degree centrality
#
# Returns a named list with the following components:
# res - The node-level centrality scores
# centralization - The graph level centrality index.
# theoretical_max - The maximum theoretical graph level centralization score
# for a graph with the given number of vertices, using the same parameters.
# If the normalized argument was TRUE (the default), then the result was
# divided by this number.
gwHumanDegreeCentralityOutput <- igraph::centralization.degree( gwHumanNetworkIgraph )
gwHumanDegreeCentrality <- gwHumanDegreeCentralityOutput$centralization
gwHumanDegreeCentralityMax <- gwHumanDegreeCentralityOutput$theoretical_max
message( paste( "grp_week human degree centrality = ", gwHumanDegreeCentrality, " ( max = ", gwHumanDegreeCentralityMax, " )", sep = "" ) )
# node-level degree centrality
gwHumanDegreeCentralityVector <- gwHumanDegreeCentralityOutput$res
#message( paste( "grp_week human degree centrality = ", gwHumanDegreeCentrality, sep = "" ) )
# append the degree centrality to the network as a vertex attribute.
V( gwHumanNetworkIgraph )$degreeCentrality <- gwHumanDegreeCentralityVector
# also add degree centrality vector to original data frame
gwHumanDataDF$degreeCentrality <- gwHumanDegreeCentralityVector
# And, if you want averages of these:
gwHumanMeanDegreeCentrality <- mean( gwHumanDegreeCentralityVector, na.rm = TRUE )
message( paste( "grp_week human mean degree centrality = ", gwHumanMeanDegreeCentrality, sep = "" ) )
#------------------------------------------------------------------------------#
# ==> graph-level undirected betweenness
#
# Returns a named list with the following components:
# res - The node-level centrality scores
# centralization - The graph level centrality index.
# theoretical_max - The maximum theoretical graph level centralization score
# for a graph with the given number of vertices, using the same parameters.
# If the normalized argument was TRUE (the default), then the result was
# divided by this number.
gwHumanBetweennessCentralityOutput <- igraph::centralization.betweenness( gwHumanNetworkIgraph, directed = FALSE )
gwHumanBetweennessCentrality <- gwHumanBetweennessCentralityOutput$centralization
gwHumanBetweennessCentralityMax <- gwHumanBetweennessCentralityOutput$theoretical_max
message( paste( "grp_week human betweenness centrality = ", gwHumanBetweennessCentrality, " ( max = ", gwHumanBetweennessCentralityMax, " )", sep = "" ) )
# node-level undirected betweenness
gwHumanBetweennessVector <- gwHumanBetweennessCentralityOutput$res
#message( paste( "grp_week human betweenness = ", gwHumanBetweenness, sep = "" ) )
# append the betweenness to the network as a vertex attribute.
V( gwHumanNetworkIgraph )$betweenness <- gwHumanBetweennessVector
# also add betweenness vector to original data frame
gwHumanDataDF$betweenness <- gwHumanBetweennessVector
# And, if you want averages of these:
gwHumanMeanBetweenness <- mean( gwHumanBetweennessVector, na.rm = TRUE )
message( paste( "grp_week human mean betweenness = ", gwHumanMeanBetweenness, sep = "" ) )
# graph-level transitivity
gwHumanTransitivity <- igraph::transitivity( gwHumanNetworkIgraph, type = "global" )
message( paste( "grp_week human transitivity = ", gwHumanTransitivity, sep = "" ) )
# graph-level density
gwHumanDensity <- igraph::graph.density( gwHumanNetworkIgraph )
message( paste( "grp_week human density = ", gwHumanDensity, sep = "" ) )
#==============================================================================#
# output attributes to data frame
#==============================================================================#
# if you want to just work with the traits of the nodes/vertexes, you can
# combine the attribute vectors into a data frame.
# first, output igraph object to see what attributes you have
gwHumanNetworkIgraph
# then, combine them into a data frame.
gwHumanAttributeDF <- data.frame( id = V( gwHumanNetworkIgraph )$name,
                                  person_id = V( gwHumanNetworkIgraph )$person_id,
                                  person_type = V( gwHumanNetworkIgraph )$person_type,
                                  degree = V( gwHumanNetworkIgraph )$degree,
                                  transitivity = V( gwHumanNetworkIgraph )$transitivity,
                                  degreeCentrality = V( gwHumanNetworkIgraph )$degreeCentrality,
                                  betweenness = V( gwHumanNetworkIgraph )$betweenness )
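As a sanity check on the node- and network-level measures above, here is a minimal sketch on a four-node toy graph (a triangle plus one pendant node), small enough to verify by hand. The toy* names and the graph itself are made up for illustration and are not part of the grp_week data. edge_density() and centr_degree() are the newer igraph names for graph.density() and centralization.degree(); this assumes an igraph version (1.0 or later) that provides them.

```r
library( igraph )

# hypothetical toy graph: triangle 1-2-3, plus pendant node 4 attached to node 3.
toyNetworkIgraph <- igraph::make_graph( c( 1, 2, 2, 3, 1, 3, 3, 4 ), directed = FALSE )

# node level: degrees, their mean, and the nodes whose degree is above the mean.
toyDegreeVector <- igraph::degree( toyNetworkIgraph )
toyDegreeMean <- mean( toyDegreeVector )
toyAboveMeanVector <- toyDegreeVector[ toyDegreeVector > toyDegreeMean ]

# node level: local transitivity is NaN for node 4 (it has only one neighbor),
#     which is why the mean needs na.rm = TRUE.
toyTransitivityVector <- igraph::transitivity( toyNetworkIgraph, type = "local" )
toyMeanTransitivity <- mean( toyTransitivityVector, na.rm = TRUE )

# network level: global transitivity, density, and degree centralization.
toyTransitivity <- igraph::transitivity( toyNetworkIgraph, type = "global" )
toyDensity <- igraph::edge_density( toyNetworkIgraph )
toyDegreeCentralityOutput <- igraph::centr_degree( toyNetworkIgraph )

message( paste( "toy degrees = ", paste( toyDegreeVector, collapse = ", " ), sep = "" ) )
message( paste( "toy global transitivity = ", toyTransitivity, sep = "" ) )
message( paste( "toy density = ", toyDensity, sep = "" ) )
message( paste( "toy degree centralization = ", toyDegreeCentralityOutput$centralization, sep = "" ) )
```

Worked by hand: the degrees are 2, 2, 3, 1 (mean 2), so only node 3 is above the mean. There is one triangle and five connected triples, so global transitivity is 3 * 1 / 5 = 0.6, and four edges out of six possible gives a density of about 0.667. On this toy graph the `res` element returned by centr_degree() is the same as the raw degree vector.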
Save all the information in the current image, in case we need/want it later.
In [91]:
# help( save.image )
save.image( file = workspace_file_name )
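To confirm that a saved image can be read back, a minimal round-trip sketch follows. It is an illustration under assumptions, not part of the original workflow: it uses save() on two made-up toy objects and a temporary file rather than save.image() on the real workspace, so nothing in the data directory is touched.

```r
# hypothetical toy objects standing in for the real workspace contents.
toyDensity <- 0.25
toyDegreeVector <- c( 2, 2, 3, 1 )

# write to a temporary file so no real workspace file is overwritten.
toyWorkspacePath <- tempfile( fileext = ".RData" )
save( toyDensity, toyDegreeVector, file = toyWorkspacePath )

# load into a separate environment, so existing variables are not clobbered,
#     and capture the names of the objects that were restored.
toyRestoredEnv <- new.env()
toyLoadedNames <- load( toyWorkspacePath, envir = toyRestoredEnv )
message( paste( "restored objects: ", paste( toyLoadedNames, collapse = ", " ), sep = "" ) )

# clean up the temporary file.
unlink( toyWorkspacePath )
```

For the real workspace, the equivalent would be load( workspace_file_name ) while the working directory is still the data directory.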