R-igraph-grp_month-full_month
2017.12.02 - work log - prelim - R network analysis - igraph
Related files:
network descriptives
network-level
files
R scripts:
context_text/R/db_connect.r
context_text/R/sna/functions-sna.r
context_text/R/sna/sna-load_data.r
context_text/R/sna/igraph/*
context_text/R/sna/statnet/*
statnet/sna
sna::gden()
- graph density
R scripts:
context_text/R/sna/statnet/sna-statnet-init.r
context_text/R/sna/statnet/sna-statnet-network-stats.r
context_text/R/sna/statnet/sna-qap.r
igraph
igraph::transitivity()
- vector of transitivity scores for each node in a graph, plus network-level transitivity score.
R scripts:
context_text/R/sna/statnet/sna-igraph-init.r
context_text/R/sna/statnet/sna-igraph-network-stats.r
Store important directories and file names in variables:
In [1]:
getwd()
In [32]:
# code files (in particular SNA function library, modest though it may be)
code_directory <- "/home/jonathanmorgan/work/django/research/context_analysis/R/sna"
sna_function_file_path = paste( code_directory, "/", 'functions-sna.r', sep = "" )
# home directory
home_directory <- getwd()
home_directory <- "/home/jonathanmorgan/work/django/research/work/phd_work/methods"
# data directories
data_directory <- paste( home_directory, "/data", sep = "" )
workspace_file_name <- "igraph-grp_month-full_month.RData"
# sep = "" keeps paste() from inserting spaces around the "/".
workspace_file_path <- paste( data_directory, "/", workspace_file_name, sep = "" )
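The paths above are built with paste(). Base R's file.path() is an alternative that inserts the "/" separator for you. A minimal sketch, using a made-up directory (exampleDataDirectory is hypothetical, not a path from this log):

```r
# hypothetical directory and file name, for illustration only
exampleDataDirectory <- "/tmp/example/data"
exampleFileName <- "igraph-grp_month-full_month.RData"

# paste() needs sep = "" so that no spaces are added around the "/"
pathFromPaste <- paste( exampleDataDirectory, "/", exampleFileName, sep = "" )

# file.path() joins its arguments with "/"
pathFromFilePath <- file.path( exampleDataDirectory, exampleFileName )

# both approaches should give the same string
identical( pathFromPaste, pathFromFilePath )
```

Either form works; file.path() just removes the chance of forgetting sep = "".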
In [26]:
# set working directory to data directory for now.
setwd( data_directory )
message( getwd() )
If you want, you can load this file's workspace from a previous run. If you have changed any values above, re-run those cells afterward so the new values overwrite the ones stored in the workspace (and make sure to save the workspace at the end):
In [24]:
# assumes that data_directory and workspace_file_name have already been
# set in the cells above.
setwd( data_directory )
load( workspace_file_name )
In [27]:
source( sna_function_file_path )
First, you need to render the network data and upload it to your server.
Directions for rendering network data are in 2017.11.14-work_log-prelim-network_analysis.ipynb. You want a tab-delimited matrix that includes both the network and attributes of nodes as columns, and you want it to include a header row.
Once you render your network data files, you should place them on the server.
High-level data file layout:
- the tie matrix, followed on the right by the node attribute columns (person_id and person_type)
Files and their location on server:
This is data from the Grand Rapids Press articles from December of 2009, coded by both humans and OpenCalais.
Files:
sourcenet_data-20171115-043151-grp_month-automated.tab
sourcenet_data-20171115-043102-grp_month-human.tab
Location in Dropbox: Dropbox/academia/MSU/program_stuff/prelim_paper/data/network_analysis/2017.11.14/network/new_coders/grp_month
Location on server: /home/jonathanmorgan/work/django/research/work/phd_work/data/network/grp_month
grp_month (gm) - automated - OpenCalais
First, we'll analyze the month of data coded by OpenCalais. Set up some variables to store where data is located:
In [28]:
# initialize variables
gmAutomatedDataFolder <- paste( data_directory, "/network/grp_month", sep = "" )
gmAutomatedDataFile <- "sourcenet_data-20171205-022551-grp_month-automated.tab"
gmAutomatedDataPath <- paste( gmAutomatedDataFolder, "/", gmAutomatedDataFile, sep = "" )
In [29]:
gmAutomatedDataPath
Load the data file into memory
In [30]:
# tab-delimited:
gmAutomatedDataDF <- read.delim( gmAutomatedDataPath, header = TRUE, row.names = 1, check.names = FALSE )
In [31]:
# get count of rows...
gmAutomatedRowCount <- nrow( gmAutomatedDataDF )
message( paste( "grp_month automated row count = ", gmAutomatedRowCount, sep = "" ) )
# ...and columns
gmAutomatedColumnCount <- ncol( gmAutomatedDataDF )
message( paste( "grp_month automated column count = ", gmAutomatedColumnCount, sep = "" ) )
Get just the tie rows and columns for initializing network libraries.
In [33]:
# the tie matrix is square, so keeping only the first N columns (where N
# is the row count) omits the trait columns that sit on the right side
# of the file.
gmAutomatedNetworkDF <- gmAutomatedDataDF[ , 1 : gmAutomatedRowCount ]
#str( gmAutomatedNetworkDF )
In [34]:
# convert to a matrix
gmAutomatedNetworkMatrix <- as.matrix( gmAutomatedNetworkDF )
# str( gmAutomatedNetworkMatrix )
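To make the file layout concrete, here is a minimal sketch of the same load-and-split steps on a made-up three-person file. All names and values are invented for illustration, and the sketch assumes the attribute columns are named person_id and person_type in the header row:

```r
# toy tab-delimited file: header row, 3x3 tie matrix, then two
# attribute columns (person_id and person_type) on the right.
toyLines <- c(
    "label\tp1\tp2\tp3\tperson_id\tperson_type",
    "p1\t0\t2\t0\t101\t2",
    "p2\t2\t0\t1\t102\t3",
    "p3\t0\t1\t0\t103\t3"
)

# same read.delim() arguments as above, reading from text instead of a file
toyDataDF <- read.delim( text = toyLines, header = TRUE, row.names = 1, check.names = FALSE )

# keep only as many columns as there are rows: the square tie matrix
toyRowCount <- nrow( toyDataDF )
toyNetworkMatrix <- as.matrix( toyDataDF[ , 1 : toyRowCount ] )

# an undirected network should have a symmetric tie matrix
toyIsSymmetric <- all( toyNetworkMatrix == t( toyNetworkMatrix ) )
```

Checking symmetry before building an undirected graph is a cheap way to catch a file where the attribute columns were not where you expected.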
First, load the igraph package.
Once we get the data into an igraph object, run the code in the following for more in-depth information:
context_text/R/sna/statnet/sna-igraph-init.r
context_text/R/sna/statnet/sna-igraph-network-stats.r
In [35]:
#install.packages( "igraph" )
library( igraph )
Load our data matrix into an igraph object.
In [36]:
# load data into igraph instance.
gmAutomatedNetworkIgraph <- graph.adjacency( gmAutomatedNetworkMatrix, mode = "undirected", weighted = TRUE )
In [37]:
# add person_id (column 1168)
personIdColumnNumber <- 1168
# first, get just the data frame column with person ID:
personIdsColumn <- gmAutomatedDataDF[ , personIdColumnNumber ]
# populate list we will use to set node person_ID attribute
# Don't just do these - they don't convert to simple list/vector, contain remnants of data frame
#personIdsList <- personIdsColumn
#personIdsList <- c( personIdsColumn )
# Convert to a list of numbers.
personIdsList <- as.numeric( personIdsColumn )
# Try this if you have character attribute...
#personIdsList <- unname( unlist( personIdsColumn ) )
# set vertex/node attribute person_id
V( gmAutomatedNetworkIgraph )$person_id <- personIdsList
# OR use function:
#gmAutomatedNetworkIgraph <- set.vertex.attribute( gmAutomatedNetworkIgraph, "person_id", value = personIdsList )
# look at graph and person_id attribute values
gmAutomatedNetworkIgraph
V( gmAutomatedNetworkIgraph )$person_id
In [38]:
# add person_type (column 1169)
personTypeColumnNumber <- 1169
# first, get just the data frame column with person type:
personTypesColumn <- gmAutomatedDataDF[ , personTypeColumnNumber ]
# populate list we will use to set node person_type attribute
# Don't just do these - they don't convert to simple list/vector, contain remnants of data frame
#personTypesList <- personTypesColumn
#personTypesList <- c( personTypesColumn )
# Convert to a list of numbers.
personTypesList <- as.numeric( personTypesColumn )
# Try this if you have character attribute...
#personTypesList <- unname( unlist( personTypesColumn ) )
# set vertex/node attribute person_type
V( gmAutomatedNetworkIgraph )$person_type <- personTypesList
# OR use function:
#gmAutomatedNetworkIgraph <- set.vertex.attribute( gmAutomatedNetworkIgraph, "person_type", value = personTypesList )
# look at graph and person_type attribute values
gmAutomatedNetworkIgraph
V( gmAutomatedNetworkIgraph )$person_type
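The two cells above pick the attribute columns by hard-coded position (1168 and 1169), which breaks if the number of people in the file changes. If the header row names those columns person_id and person_type (an assumption here, based on the file layout described earlier), selecting by name is less fragile. A minimal sketch on a made-up data frame:

```r
# toy data frame: 2x2 tie matrix plus the two attribute columns
toyDataDF <- data.frame(
    p1 = c( 0, 1 ),
    p2 = c( 1, 0 ),
    person_id = c( 101, 102 ),
    person_type = c( 2, 3 ),
    check.names = FALSE
)

# pull each attribute out by column name; [[ ]] returns a plain vector
toyPersonIds <- as.numeric( toyDataDF[[ "person_id" ]] )
toyPersonTypes <- as.numeric( toyDataDF[[ "person_type" ]] )

# if you still need the position, look it up instead of typing it in
toyPersonIdColumnNumber <- which( names( toyDataDF ) == "person_id" )
```

The resulting vectors can be assigned to V( graph )$person_id and V( graph )$person_type exactly as in the cells above.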
In [39]:
# calais - include ties Greater than or equal to 0 (GE0)
gmAutomatedMeanTieWeightGE0Vector <- apply( gmAutomatedNetworkMatrix, 1, calculateListMean )
gmAutomatedDataDF$meanTieWeightGE0 <- gmAutomatedMeanTieWeightGE0Vector
# calais - include ties Greater than or equal to 1 (GE1)
gmAutomatedMeanTieWeightGE1Vector <- apply( gmAutomatedNetworkMatrix, 1, calculateListMean, minValueToIncludeIN = 1 )
gmAutomatedDataDF$meanTieWeightGE1 <- gmAutomatedMeanTieWeightGE1Vector
# automated - Max tie weight?
gmAutomatedMaxTieWeightVector <- apply( gmAutomatedNetworkMatrix, 1, calculateListMax )
gmAutomatedDataDF$maxTieWeight <- gmAutomatedMaxTieWeightVector
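calculateListMean and calculateListMax come from functions-sna.r and are not shown in this log. As a rough illustration of the row-wise pattern only, the sketch below uses a hypothetical helper, exampleRowMean, which averages the values at or above a threshold. It borrows the minValueToIncludeIN argument name from the call above, but the real functions may behave differently:

```r
# hypothetical stand-in for calculateListMean (the real one may differ):
# mean of the values that are >= minValueToIncludeIN.
exampleRowMean <- function( valuesIN, minValueToIncludeIN = 0 ) {
    keptValues <- valuesIN[ valuesIN >= minValueToIncludeIN ]
    if ( length( keptValues ) == 0 ) {
        return( NA_real_ )
    }
    return( mean( keptValues ) )
}

# toy 3x3 weighted tie matrix
toyMatrix <- matrix( c( 0, 2, 0,
                        2, 0, 1,
                        0, 1, 0 ), nrow = 3, byrow = TRUE )

# apply() with MARGIN = 1 calls the helper once per row;
# extra named arguments are passed through to the helper.
toyMeanGE0 <- apply( toyMatrix, 1, exampleRowMean )
toyMeanGE1 <- apply( toyMatrix, 1, exampleRowMean, minValueToIncludeIN = 1 )
toyMaxTie <- apply( toyMatrix, 1, max )
```

The GE0 mean includes the zeros (non-ties), so it is pulled down by network size; the GE1 mean describes only the ties a node actually has.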
In [40]:
# to see count of nodes and edges, just type the object name:
gmAutomatedNetworkIgraph
# Will output something like:
#
# IGRAPH UNW- 314 309 --
# + attr: name (v/c), weight (e/n)
#
# in the first line, "UNW-" are traits of graph:
# - 1 - U = undirected ( directed would be "D" )
# - 2 - N = named or not ( "-" instead of "N" )
# - 3 - W = weighted
# - 4 - B = bipartite ( "-" = not bipartite )
# 314 is where node count goes, 309 is edge count.
# The second line gives you information about the 'attributes' associated with the graph. In this case, there are two attributes, name and weight. Next to each attribute name is a two-character construct that looks like "(v/c)". The first letter is the thing the attribute is associated with (g = graph, v = vertex or node, e = edge). The second is the type of the attribute (c = character data, n = numeric data). So, in this case:
# - name (v/c) - the name attribute is a vertex/node attribute - the "v" in "(v/c)" - where the values are character data - the "c" in "(v/c)".
# - weight (e/n) - the weight attribute is an edge attribute - the "e" in "(e/n)" - where the values are numeric data - the "n" in "(e/n)".
# - based on: http://www.shizukalab.com/toolkits/sna/sna_data
In [41]:
# try calling the degree() function on an igraph object. Returns a number vector with names.
gmAutomatedDegreeVector <- igraph::degree( gmAutomatedNetworkIgraph )
# For help with igraph::degree function:
#??igraph::degree
# calculate the mean of the degrees.
gmAutomatedAvgDegree <- mean( gmAutomatedDegreeVector )
message( paste( "grp_month automated average degree = ", gmAutomatedAvgDegree, sep = "" ) )
# append the degrees to the network as a vertex attribute.
V( gmAutomatedNetworkIgraph )$degree <- gmAutomatedDegreeVector
# also add degree vector to original data frame
gmAutomatedDataDF$degree <- gmAutomatedDegreeVector
# if you want to just work with the traits of the nodes/vertexes, you can
# combine the attribute vectors into a data frame.
# first, output igraph object to see what attributes you have
gmAutomatedNetworkIgraph
V( gmAutomatedNetworkIgraph )$degree
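One thing to keep in mind with a weighted graph: igraph::degree() counts ties and ignores their weights, while igraph::strength() sums the weights. A minimal sketch on a made-up three-node network (this uses graph_from_adjacency_matrix(), the newer name for graph.adjacency() in igraph 1.0 and later):

```r
library( igraph )

# toy symmetric weighted tie matrix
toyNames <- c( "p1", "p2", "p3" )
toyMatrix <- matrix( c( 0, 2, 0,
                        2, 0, 1,
                        0, 1, 0 ),
                     nrow = 3, byrow = TRUE,
                     dimnames = list( toyNames, toyNames ) )

toyIgraph <- graph_from_adjacency_matrix( toyMatrix, mode = "undirected", weighted = TRUE )

# number of ties per node (weights ignored)
toyDegreeVector <- igraph::degree( toyIgraph )

# sum of tie weights per node
toyStrengthVector <- igraph::strength( toyIgraph )
```

If the tie weights matter for the comparison, strength (or the mean tie weights computed earlier) is the number to look at alongside degree.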
Calculate average source and author degree:
In [42]:
# average author degree (person types 2 and 4)
gmAutomatedAverageAuthorDegree2And4 <- calcAuthorMeanDegree( dataFrameIN = gmAutomatedDataDF, includeBothIN = TRUE )
message( paste( "grp_month automated average author degree (2 and 4) = ", gmAutomatedAverageAuthorDegree2And4, sep = "" ) )
# average author degree (person type 2 only)
gmAutomatedAverageAuthorDegreeOnly2 <- calcAuthorMeanDegree( dataFrameIN = gmAutomatedDataDF, includeBothIN = FALSE )
message( paste( "grp_month automated average author degree (only 2) = ", gmAutomatedAverageAuthorDegreeOnly2, sep = "" ) )
# average source degree (person types 3 and 4)
gmAutomatedAverageSourceDegree3And4 <- calcSourceMeanDegree( dataFrameIN = gmAutomatedDataDF, includeBothIN = TRUE )
message( paste( "grp_month automated average source degree (3 and 4) = ", gmAutomatedAverageSourceDegree3And4, sep = "" ) )
# average source degree (person type 3 only)
gmAutomatedAverageSourceDegreeOnly3 <- calcSourceMeanDegree( dataFrameIN = gmAutomatedDataDF, includeBothIN = FALSE )
message( paste( "grp_month automated average source degree (only 3) = ", gmAutomatedAverageSourceDegreeOnly3, sep = "" ) )
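calcAuthorMeanDegree and calcSourceMeanDegree also live in functions-sna.r and are not shown here. The sketch below is a hypothetical stand-in, exampleMeanDegreeByType, that shows the general idea: subset the degree column by person_type and take the mean. The type codes (2 and 4 for authors, 3 and 4 for sources) are taken from the comments above; everything else is an assumption:

```r
# hypothetical helper (not the real calcAuthorMeanDegree):
# mean degree of the rows whose person_type is in typesIN.
exampleMeanDegreeByType <- function( dataFrameIN, typesIN ) {
    matchingRows <- dataFrameIN$person_type %in% typesIN
    return( mean( dataFrameIN$degree[ matchingRows ] ) )
}

# toy node data: one type-2 node, two type-3 nodes, one type-4 node
toyDataDF <- data.frame( person_type = c( 2, 3, 3, 4 ),
                         degree = c( 3, 1, 2, 4 ) )

toyAuthorMean2And4 <- exampleMeanDegreeByType( toyDataDF, c( 2, 4 ) )
toyAuthorMeanOnly2 <- exampleMeanDegreeByType( toyDataDF, c( 2 ) )
toySourceMean3And4 <- exampleMeanDegreeByType( toyDataDF, c( 3, 4 ) )
toySourceMeanOnly3 <- exampleMeanDegreeByType( toyDataDF, c( 3 ) )
```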
Once we get the data into an igraph object, run the code in the following for more in-depth information:
context_text/R/sna/statnet/sna-igraph-init.r
context_text/R/sna/statnet/sna-igraph-network-stats.r
In [43]:
# First, need to load SNA functions and load data into igraph network object.
# For more details on that, see the files "functions-sna.r",
# "sna-load_data.r" and "sna-igraph_init.r".
#
# assumes that working directory for igraph is context_text/R/igraph
# setwd( ".." )
# source( "functions-sna.r" )
# source( "sna-load_data.r" )
# setwd( "igraph" )
# source( "sna-igraph-init.r" )
# results in (among other things):
# - humanNetworkData - data frame with human-generated network data matrix in it, including columns on the right side for any node-specific attributes.
# - calaisNetworkData - data frame with computer-generated network data matrix in it, including columns on the right side for any node-specific attributes.
# - humanNetworkTies - data frame with only human-generated network data matrix in it, no node-specific attributes.
# - calaisNetworkTies - data frame with only computer-generated network data matrix in it, no node-specific attributes.
# - humanNetworkMatrix - matrix with only human-generated network data matrix in it, no node-specific attributes.
# - calaisNetworkMatrix - matrix with only computer-generated network data matrix in it, no node-specific attributes.
# - humanNetworkIgraph - igraph network with human-coded network in it, including node-specific attributes.
# - calaisNetworkIgraph - igraph network with computer-coded network in it, including node-specific attributes.
# Links:
# - CRAN page: http://cran.r-project.org/web/packages/igraph/index.html
# - Manual (PDF): http://cran.r-project.org/web/packages/igraph/igraph.pdf
# - intro.: http://horicky.blogspot.com/2012/04/basic-graph-analytics-using-igraph.html
# - good notes: http://www.shizukalab.com/toolkits/sna/node-level-calculations
# Also, be advised that statnet and igraph don't really play nice together.
# If you'll be using both, best idea is to have a workspace for each.
#==============================================================================#
# igraph
#==============================================================================#
# Good notes:
# - http://assemblingnetwork.wordpress.com/2013/06/10/network-basics-with-r-and-igraph-part-ii-of-iii/
# make sure you've loaded the igraph library
# install.packages( "igraph" )
library( igraph )
#==============================================================================#
# NODE level
#==============================================================================#
# calculate the mean of the degrees.
gmAutomatedDegreeMean <- gmAutomatedAvgDegree
message( paste( "grp_month automated degree mean = ", gmAutomatedDegreeMean, sep = "" ) )
# what is the standard deviation of these degrees?
gmAutomatedDegreeSd <- sd( gmAutomatedDegreeVector )
message( paste( "grp_month automated degree SD = ", gmAutomatedDegreeSd, sep = "" ) )
# what is the variance of these degrees?
gmAutomatedDegreeVar <- var( gmAutomatedDegreeVector )
message( paste( "grp_month automated degree Variance = ", gmAutomatedDegreeVar, sep = "" ) )
# what is the max value among these degrees?
gmAutomatedDegreeMax <- max( gmAutomatedDegreeVector )
message( paste( "grp_month automated degree Max Value = ", gmAutomatedDegreeMax, sep = "" ) )
# calculate and plot degree distributions
gmAutomatedDegreeFrequenciesTable <- table( gmAutomatedDegreeVector )
gmAutomatedDegreeDistribution <- igraph::degree.distribution( gmAutomatedNetworkIgraph )
plot( gmAutomatedDegreeDistribution, xlab = "grp_month automated node degree" )
lines( gmAutomatedDegreeDistribution )
# subset vector to get only those that are above mean
# (compare against the mean, not the max: nothing can exceed the max).
gmAutomatedAboveMeanVector <- gmAutomatedDegreeVector[ gmAutomatedDegreeVector > gmAutomatedDegreeMean ]
# node-level transitivity
# create transitivity vectors.
gmAutomatedTransitivityVector <- igraph::transitivity( gmAutomatedNetworkIgraph, type = "local" )
# append the transitivity to the network as a vertex attribute.
V( gmAutomatedNetworkIgraph )$transitivity <- gmAutomatedTransitivityVector
# also add transitivity vector to original data frame
gmAutomatedDataDF$transitivity <- gmAutomatedTransitivityVector
# And, if you want averages of these:
gmAutomatedMeanTransitivity <- mean( gmAutomatedTransitivityVector, na.rm = TRUE )
message( paste( "grp_month automated mean transitivity = ", gmAutomatedMeanTransitivity, sep = "" ) )
#==============================================================================#
# NETWORK level
#==============================================================================#
#------------------------------------------------------------------------------#
# ==> graph-level degree centrality
#
# Returns a named list with the following components:
# res - The node-level centrality scores
# centralization - The graph level centrality index.
# theoretical_max - The maximum theoretical graph level centralization score
# for a graph with the given number of vertices, using the same parameters.
# If the normalized argument was TRUE (the default), then the result was
# divided by this number.
gmAutomatedDegreeCentralityOutput <- igraph::centralization.degree( gmAutomatedNetworkIgraph )
gmAutomatedDegreeCentrality <- gmAutomatedDegreeCentralityOutput$centralization
gmAutomatedDegreeCentralityMax <- gmAutomatedDegreeCentralityOutput$theoretical_max
message( paste( "grp_month automated degree centrality = ", gmAutomatedDegreeCentrality, " ( max = ", gmAutomatedDegreeCentralityMax, " )", sep = "" ) )
# node-level degree centrality
gmAutomatedDegreeCentralityVector <- gmAutomatedDegreeCentralityOutput$res
#message( paste( "grp_month automated betweenness = ", gmAutomatedBetweenness, sep = "" ) )
# append the degree centrality to the network as a vertex attribute.
V( gmAutomatedNetworkIgraph )$degreeCentrality <- gmAutomatedDegreeCentralityVector
# also add degree centrality vector to original data frame
gmAutomatedDataDF$degreeCentrality <- gmAutomatedDegreeCentralityVector
# And, if you want averages of these:
gmAutomatedMeanDegreeCentrality <- mean( gmAutomatedDegreeCentralityVector, na.rm = TRUE )
message( paste( "grp_month automated mean degree centrality = ", gmAutomatedMeanDegreeCentrality, sep = "" ) )
#------------------------------------------------------------------------------#
# ==> graph-level undirected betweenness
#
# Returns a named list with the following components:
# res - The node-level centrality scores
# centralization - The graph level centrality index.
# theoretical_max - The maximum theoretical graph level centralization score
# for a graph with the given number of vertices, using the same parameters.
# If the normalized argument was TRUE (the default), then the result was
# divided by this number.
gmAutomatedBetweennessCentralityOutput <- igraph::centralization.betweenness( gmAutomatedNetworkIgraph, directed = FALSE )
gmAutomatedBetweennessCentrality <- gmAutomatedBetweennessCentralityOutput$centralization
gmAutomatedBetweennessCentralityMax <- gmAutomatedBetweennessCentralityOutput$theoretical_max
message( paste( "grp_month automated betweenness centrality = ", gmAutomatedBetweennessCentrality, " ( max = ", gmAutomatedBetweennessCentralityMax, " )", sep = "" ) )
# node-level undirected betweenness
gmAutomatedBetweennessVector <- gmAutomatedBetweennessCentralityOutput$res
#message( paste( "grp_month automated betweenness = ", gmAutomatedBetweenness, sep = "" ) )
# append the betweenness to the network as a vertex attribute.
V( gmAutomatedNetworkIgraph )$betweenness <- gmAutomatedBetweennessVector
# also add betweenness vector to original data frame
gmAutomatedDataDF$betweenness <- gmAutomatedBetweennessVector
# And, if you want averages of these:
gmAutomatedMeanBetweenness <- mean( gmAutomatedBetweennessVector, na.rm = TRUE )
message( paste( "grp_month automated mean betweenness = ", gmAutomatedMeanBetweenness, sep = "" ) )
# graph-level transitivity
gmAutomatedTransitivity <- igraph::transitivity( gmAutomatedNetworkIgraph, type = "global" )
message( paste( "grp_month automated transitivity = ", gmAutomatedTransitivity, sep = "" ) )
# graph-level density
gmAutomatedDensity <- igraph::graph.density( gmAutomatedNetworkIgraph )
message( paste( "grp_month automated density = ", gmAutomatedDensity, sep = "" ) )
#==============================================================================#
# output attributes to data frame
#==============================================================================#
# if you want to just work with the traits of the nodes/vertexes, you can
# combine the attribute vectors into a data frame.
# first, output igraph object to see what attributes you have
gmAutomatedNetworkIgraph
# then, combine them into a data frame.
gmAutomatedAttributeDF <- data.frame( id = V( gmAutomatedNetworkIgraph )$name,
person_id = V( gmAutomatedNetworkIgraph )$person_id,
person_type = V( gmAutomatedNetworkIgraph )$person_type,
degree = V( gmAutomatedNetworkIgraph )$degree,
transitivity = V( gmAutomatedNetworkIgraph )$transitivity,
degreeCentrality = V( gmAutomatedNetworkIgraph )$degreeCentrality,
betweenness = V( gmAutomatedNetworkIgraph )$betweenness )
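A note on the na.rm = TRUE in the transitivity mean above: local transitivity is undefined for nodes with fewer than two neighbors, and igraph returns NaN for them by default. A minimal sketch on a made-up four-node graph (a triangle plus one pendant node; none of this is project data):

```r
library( igraph )

# toy graph: triangle A-B-C plus pendant node D attached to C
toyIgraph <- graph_from_literal( A-B, B-C, A-C, C-D )

# node-level (local) transitivity; D has only one neighbor
toyLocalTransitivity <- igraph::transitivity( toyIgraph, type = "local" )
names( toyLocalTransitivity ) <- V( toyIgraph )$name

# mean over the nodes where local transitivity is defined
toyMeanTransitivity <- mean( toyLocalTransitivity, na.rm = TRUE )

# graph-level (global) transitivity:
# 3 * number of triangles / number of connected triples
toyGlobalTransitivity <- igraph::transitivity( toyIgraph, type = "global" )
```

The mean of the local scores and the global score answer slightly different questions, so they generally do not match, and it is worth reporting which one you used.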
In [44]:
?igraph::centralization.betweenness
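The graph-level numbers can be sanity-checked by hand on a small example. The sketch below computes Freeman degree centralization and density in base R for a made-up undirected star graph. It assumes a binary, loop-free adjacency matrix and uses (n - 1)(n - 2) as the theoretical maximum; igraph's theoretical_max can differ depending on its loops argument, so treat this as an illustration of the idea rather than a reproduction of igraph's output:

```r
# toy undirected star: node 1 tied to nodes 2, 3 and 4
toyAdjacency <- matrix( 0, nrow = 4, ncol = 4 )
toyAdjacency[ 1, 2 : 4 ] <- 1
toyAdjacency[ 2 : 4, 1 ] <- 1

toyNodeCount <- nrow( toyAdjacency )
toyDegrees <- rowSums( toyAdjacency )

# Freeman degree centralization: summed gap between the most central
# node and every node, divided by the largest possible summed gap,
# which is (n - 1)(n - 2) for a loop-free undirected graph.
toyCentralization <- sum( max( toyDegrees ) - toyDegrees ) /
    ( ( toyNodeCount - 1 ) * ( toyNodeCount - 2 ) )

# density: ties present divided by ties possible, n(n - 1) / 2.
# each undirected tie appears twice in the matrix, hence the / 2.
toyEdgeCount <- sum( toyAdjacency ) / 2
toyDensity <- toyEdgeCount / ( toyNodeCount * ( toyNodeCount - 1 ) / 2 )
```

A star is the most centralized shape possible, so it makes a convenient reference point when reading the centralization values for the real networks.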
grp_month (gm) - human
Next, we'll analyze the month of data coded by humans. Set up some variables to store where data is located:
In [45]:
# initialize variables
gmHumanDataFolder <- paste( data_directory, "/network/grp_month", sep = "" )
gmHumanDataFile <- "sourcenet_data-20171115-043102-grp_month-human.tab"
gmHumanDataPath <- paste( gmHumanDataFolder, "/", gmHumanDataFile, sep = "" )
In [46]:
gmHumanDataPath
Load the data file into memory
In [47]:
# tab-delimited:
gmHumanDataDF <- read.delim( gmHumanDataPath, header = TRUE, row.names = 1, check.names = FALSE )
In [48]:
# get count of rows...
gmHumanRowCount <- nrow( gmHumanDataDF )
message( paste( "grp_month human row count = ", gmHumanRowCount, sep = "" ) )
# ...and columns
gmHumanColumnCount <- ncol( gmHumanDataDF )
message( paste( "grp_month human column count = ", gmHumanColumnCount, sep = "" ) )
Get just the tie rows and columns for initializing network libraries.
In [49]:
# the tie matrix is square, so keeping only the first N columns (where N
# is the row count) omits the trait columns that sit on the right side
# of the file.
gmHumanNetworkDF <- gmHumanDataDF[ , 1 : gmHumanRowCount ]
#str( gmHumanNetworkDF )
In [50]:
# convert to a matrix
gmHumanNetworkMatrix <- as.matrix( gmHumanNetworkDF )
# str( gmHumanNetworkMatrix )
First, load the igraph package.
Once we get the data into an igraph object, run the code in the following for more in-depth information:
context_text/R/sna/statnet/sna-igraph-init.r
context_text/R/sna/statnet/sna-igraph-network-stats.r
In [51]:
#install.packages( "igraph" )
library( igraph )
Load our data matrix into an igraph object.
In [52]:
# load data into igraph instance.
gmHumanNetworkIgraph <- graph.adjacency( gmHumanNetworkMatrix, mode = "undirected", weighted = TRUE )
In [53]:
# add person_id (column 1168)
personIdColumnNumber <- 1168
# first, get just the data frame column with person ID:
personIdsColumn <- gmHumanDataDF[ , personIdColumnNumber ]
# populate list we will use to set node person_ID attribute
# Don't just do these - they don't convert to simple list/vector, contain remnants of data frame
#personIdsList <- personIdsColumn
#personIdsList <- c( personIdsColumn )
# Convert to a list of numbers.
personIdsList <- as.numeric( personIdsColumn )
# Try this if you have character attribute...
#personIdsList <- unname( unlist( personIdsColumn ) )
# set vertex/node attribute person_id
V( gmHumanNetworkIgraph )$person_id <- personIdsList
# OR use function:
#gmHumanNetworkIgraph <- set.vertex.attribute( gmHumanNetworkIgraph, "person_id", value = personIdsList )
# look at graph and person_id attribute values
gmHumanNetworkIgraph
V( gmHumanNetworkIgraph )$person_id
In [54]:
# add person_type (column 1169)
personTypeColumnNumber <- 1169
# first, get just the data frame column with person type:
personTypesColumn <- gmHumanDataDF[ , personTypeColumnNumber ]
# populate list we will use to set node person_type attribute
# Don't just do these - they don't convert to simple list/vector, contain remnants of data frame
#personTypesList <- personTypesColumn
#personTypesList <- c( personTypesColumn )
# Convert to a list of numbers.
personTypesList <- as.numeric( personTypesColumn )
# Try this if you have character attribute...
#personTypesList <- unname( unlist( personTypesColumn ) )
# set vertex/node attribute person_type
V( gmHumanNetworkIgraph )$person_type <- personTypesList
# OR use function:
#gmHumanNetworkIgraph <- set.vertex.attribute( gmHumanNetworkIgraph, "person_type", value = personTypesList )
# look at graph and person_type attribute values
gmHumanNetworkIgraph
V( gmHumanNetworkIgraph )$person_type
In [55]:
# human - include ties Greater than or equal to 0 (GE0)
gmHumanMeanTieWeightGE0Vector <- apply( gmHumanNetworkMatrix, 1, calculateListMean )
gmHumanDataDF$meanTieWeightGE0 <- gmHumanMeanTieWeightGE0Vector
# human - include ties Greater than or equal to 1 (GE1)
gmHumanMeanTieWeightGE1Vector <- apply( gmHumanNetworkMatrix, 1, calculateListMean, minValueToIncludeIN = 1 )
gmHumanDataDF$meanTieWeightGE1 <- gmHumanMeanTieWeightGE1Vector
# human - Max tie weight?
gmHumanMaxTieWeightVector <- apply( gmHumanNetworkMatrix, 1, calculateListMax )
gmHumanDataDF$maxTieWeight <- gmHumanMaxTieWeightVector
In [56]:
# to see count of nodes and edges, just type the object name:
gmHumanNetworkIgraph
# Will output something like:
#
# IGRAPH UNW- 314 309 --
# + attr: name (v/c), weight (e/n)
#
# in the first line, "UNW-" are traits of graph:
# - 1 - U = undirected ( directed would be "D" )
# - 2 - N = named or not ( "-" instead of "N" )
# - 3 - W = weighted
# - 4 - B = bipartite ( "-" = not bipartite )
# 314 is where node count goes, 309 is edge count.
# The second line gives you information about the 'attributes' associated with the graph. In this case, there are two attributes, name and weight. Next to each attribute name is a two-character construct that looks like "(v/c)". The first letter is the thing the attribute is associated with (g = graph, v = vertex or node, e = edge). The second is the type of the attribute (c = character data, n = numeric data). So, in this case:
# - name (v/c) - the name attribute is a vertex/node attribute - the "v" in "(v/c)" - where the values are character data - the "c" in "(v/c)".
# - weight (e/n) - the weight attribute is an edge attribute - the "e" in "(e/n)" - where the values are numeric data - the "n" in "(e/n)".
# - based on: http://www.shizukalab.com/toolkits/sna/sna_data
In [57]:
# try calling the degree() function on an igraph object. Returns a number vector with names.
gmHumanDegreeVector <- igraph::degree( gmHumanNetworkIgraph )
# For help with igraph::degree function:
#??igraph::degree
# calculate the mean of the degrees.
gmHumanAvgDegree <- mean( gmHumanDegreeVector )
message( paste( "grp_month human average degree = ", gmHumanAvgDegree, sep = "" ) )
# append the degrees to the network as a vertex attribute.
V( gmHumanNetworkIgraph )$degree <- gmHumanDegreeVector
# also add degree vector to original data frame
gmHumanDataDF$degree <- gmHumanDegreeVector
# if you want to just work with the traits of the nodes/vertexes, you can
# combine the attribute vectors into a data frame.
# first, output igraph object to see what attributes you have
gmHumanNetworkIgraph
V( gmHumanNetworkIgraph )$degree
Calculate average source and author degree:
In [58]:
# average author degree (person types 2 and 4)
gmHumanAverageAuthorDegree2And4 <- calcAuthorMeanDegree( dataFrameIN = gmHumanDataDF, includeBothIN = TRUE )
message( paste( "grp_month human average author degree (2 and 4) = ", gmHumanAverageAuthorDegree2And4, sep = "" ) )
# average author degree (person type 2 only)
gmHumanAverageAuthorDegreeOnly2 <- calcAuthorMeanDegree( dataFrameIN = gmHumanDataDF, includeBothIN = FALSE )
message( paste( "grp_month human average author degree (only 2) = ", gmHumanAverageAuthorDegreeOnly2, sep = "" ) )
# average source degree (person types 3 and 4)
gmHumanAverageSourceDegree3And4 <- calcSourceMeanDegree( dataFrameIN = gmHumanDataDF, includeBothIN = TRUE )
message( paste( "grp_month human average source degree (3 and 4) = ", gmHumanAverageSourceDegree3And4, sep = "" ) )
# average source degree (person type 3 only)
gmHumanAverageSourceDegreeOnly3 <- calcSourceMeanDegree( dataFrameIN = gmHumanDataDF, includeBothIN = FALSE )
message( paste( "grp_month human average source degree (only 3) = ", gmHumanAverageSourceDegreeOnly3, sep = "" ) )
Once we get the data into an igraph object, run the code in the following for more in-depth information:
context_text/R/sna/statnet/sna-igraph-init.r
context_text/R/sna/statnet/sna-igraph-network-stats.r
In [59]:
# First, need to load SNA functions and load data into igraph network object.
# For more details on that, see the files "functions-sna.r",
# "sna-load_data.r" and "sna-igraph_init.r".
#
# assumes that working directory for igraph is context_text/R/igraph
# setwd( ".." )
# source( "functions-sna.r" )
# source( "sna-load_data.r" )
# setwd( "igraph" )
# source( "sna-igraph-init.r" )
# results in (among other things):
# - humanNetworkData - data frame with human-generated network data matrix in it, including columns on the right side for any node-specific attributes.
# - calaisNetworkData - data frame with computer-generated network data matrix in it, including columns on the right side for any node-specific attributes.
# - humanNetworkTies - data frame with only human-generated network data matrix in it, no node-specific attributes.
# - calaisNetworkTies - data frame with only computer-generated network data matrix in it, no node-specific attributes.
# - humanNetworkMatrix - matrix with only human-generated network data matrix in it, no node-specific attributes.
# - calaisNetworkMatrix - matrix with only computer-generated network data matrix in it, no node-specific attributes.
# - humanNetworkIgraph - igraph network with human-coded network in it, including node-specific attributes.
# - calaisNetworkIgraph - igraph network with computer-coded network in it, including node-specific attributes.
# Links:
# - CRAN page: http://cran.r-project.org/web/packages/igraph/index.html
# - Manual (PDF): http://cran.r-project.org/web/packages/igraph/igraph.pdf
# - intro.: http://horicky.blogspot.com/2012/04/basic-graph-analytics-using-igraph.html
# - good notes: http://www.shizukalab.com/toolkits/sna/node-level-calculations
# Also, be advised that statnet and igraph don't really play nice together.
# If you'll be using both, best idea is to have a workspace for each.
#==============================================================================#
# igraph
#==============================================================================#
# Good notes:
# - http://assemblingnetwork.wordpress.com/2013/06/10/network-basics-with-r-and-igraph-part-ii-of-iii/
# make sure you've loaded the igraph library
# install.packages( "igraph" )
library( igraph )
#==============================================================================#
# NODE level
#==============================================================================#
# calculate the mean of the degrees.
gmHumanDegreeMean <- gmHumanAvgDegree
message( paste( "grp_month human degree mean = ", gmHumanDegreeMean, sep = "" ) )
# what is the standard deviation of these degrees?
gmHumanDegreeSd <- sd( gmHumanDegreeVector )
message( paste( "grp_month human degree SD = ", gmHumanDegreeSd, sep = "" ) )
# what is the variance of these degrees?
gmHumanDegreeVar <- var( gmHumanDegreeVector )
message( paste( "grp_month human degree Variance = ", gmHumanDegreeVar, sep = "" ) )
# what is the max value among these degrees?
gmHumanDegreeMax <- max( gmHumanDegreeVector )
message( paste( "grp_month human degree Max Value = ", gmHumanDegreeMax, sep = "" ) )
# calculate and plot degree distributions
gmHumanDegreeFrequenciesTable <- table( gmHumanDegreeVector )
gmHumanDegreeDistribution <- igraph::degree.distribution( gmHumanNetworkIgraph )
plot( gmHumanDegreeDistribution, xlab = "grp_month human node degree" )
lines( gmHumanDegreeDistribution )
# subset vector to get only those that are above mean
# (compare against the mean, not the max: nothing can exceed the max).
gmHumanAboveMeanVector <- gmHumanDegreeVector[ gmHumanDegreeVector > gmHumanDegreeMean ]
# node-level transitivity
# create transitivity vectors.
gmHumanTransitivityVector <- igraph::transitivity( gmHumanNetworkIgraph, type = "local" )
# append the transitivity to the network as a vertex attribute.
V( gmHumanNetworkIgraph )$transitivity <- gmHumanTransitivityVector
# also add transitivity vector to original data frame
gmHumanDataDF$transitivity <- gmHumanTransitivityVector
# And, if you want averages of these:
gmHumanMeanTransitivity <- mean( gmHumanTransitivityVector, na.rm = TRUE )
message( paste( "grp_month human mean transitivity = ", gmHumanMeanTransitivity, sep = "" ) )
#==============================================================================#
# NETWORK level
#==============================================================================#
#------------------------------------------------------------------------------#
# ==> graph-level degree centralization
#
# igraph::centralization.degree() returns a named list with the following
# components:
# res - The node-level centrality scores (here, each node's degree).
# centralization - The graph-level centralization index.
# theoretical_max - The maximum theoretical graph-level centralization score
# for a graph with the given number of vertices, using the same parameters.
# If the normalized argument is TRUE (the default), centralization is the
# raw score divided by this number.
gmHumanDegreeCentralityOutput <- igraph::centralization.degree( gmHumanNetworkIgraph )
gmHumanDegreeCentrality <- gmHumanDegreeCentralityOutput$centralization
gmHumanDegreeCentralityMax <- gmHumanDegreeCentralityOutput$theoretical_max
message( paste( "grp_month human degree centrality = ", gmHumanDegreeCentrality, " ( max = ", gmHumanDegreeCentralityMax, " )", sep = "" ) )
# node-level degree centrality
gmHumanDegreeCentralityVector <- gmHumanDegreeCentralityOutput$res
# append the degree centrality to the network as a vertex attribute.
V( gmHumanNetworkIgraph )$degreeCentrality <- gmHumanDegreeCentralityVector
# also add degree centrality vector to original data frame
gmHumanDataDF$degreeCentrality <- gmHumanDegreeCentralityVector
# And, if you want averages of these:
gmHumanMeanDegreeCentrality <- mean( gmHumanDegreeCentralityVector, na.rm = TRUE )
message( paste( "grp_month human mean degree centrality = ", gmHumanMeanDegreeCentrality, sep = "" ) )
#------------------------------------------------------------------------------#
# ==> graph-level undirected betweenness centralization
#
# igraph::centralization.betweenness() returns a named list with the following
# components:
# res - The node-level centrality scores (here, each node's betweenness).
# centralization - The graph-level centralization index.
# theoretical_max - The maximum theoretical graph-level centralization score
# for a graph with the given number of vertices, using the same parameters.
# If the normalized argument is TRUE (the default), centralization is the
# raw score divided by this number.
gmHumanBetweennessCentralityOutput <- igraph::centralization.betweenness( gmHumanNetworkIgraph, directed = FALSE )
gmHumanBetweennessCentrality <- gmHumanBetweennessCentralityOutput$centralization
gmHumanBetweennessCentralityMax <- gmHumanBetweennessCentralityOutput$theoretical_max
message( paste( "grp_month human betweenness centrality = ", gmHumanBetweennessCentrality, " ( max = ", gmHumanBetweennessCentralityMax, " )", sep = "" ) )
# node-level undirected betweenness
gmHumanBetweennessVector <- gmHumanBetweennessCentralityOutput$res
#message( paste( "grp_month human betweenness = ", gmHumanBetweenness, sep = "" ) )
# append the betweenness to the network as a vertex attribute.
V( gmHumanNetworkIgraph )$betweenness <- gmHumanBetweennessVector
# also add betweenness vector to original data frame
gmHumanDataDF$betweenness <- gmHumanBetweennessVector
# And, if you want averages of these:
gmHumanMeanBetweenness <- mean( gmHumanBetweennessVector, na.rm = TRUE )
message( paste( "grp_month human mean betweenness = ", gmHumanMeanBetweenness, sep = "" ) )
# graph-level transitivity
gmHumanTransitivity <- igraph::transitivity( gmHumanNetworkIgraph, type = "global" )
message( paste( "grp_month human transitivity = ", gmHumanTransitivity, sep = "" ) )
# graph-level density
gmHumanDensity <- igraph::graph.density( gmHumanNetworkIgraph )
message( paste( "grp_month human density = ", gmHumanDensity, sep = "" ) )
#==============================================================================#
# output attributes to data frame
#==============================================================================#
# if you just want to work with the traits of the nodes/vertices, you can
# combine the attribute vectors into a data frame.
# first, output igraph object to see what attributes you have
gmHumanNetworkIgraph
# then, combine them into a data frame.
gmHumanAttributeDF <- data.frame(
    id = V( gmHumanNetworkIgraph )$name,
    person_id = V( gmHumanNetworkIgraph )$person_id,
    person_type = V( gmHumanNetworkIgraph )$person_type,
    degree = V( gmHumanNetworkIgraph )$degree,
    transitivity = V( gmHumanNetworkIgraph )$transitivity,
    degreeCentrality = V( gmHumanNetworkIgraph )$degreeCentrality,
    betweenness = V( gmHumanNetworkIgraph )$betweenness
)
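To make the relationship between the node-level scores (res), the graph-level centralization index, and theoretical_max more concrete, here is a minimal base R sketch of Freeman degree centralization on a toy five-vertex star graph. All of the toy* names and the data are hypothetical and not part of the grp_month network. The sketch assumes an undirected graph with no self-loops, for which the theoretical maximum is ( n - 1 ) * ( n - 2 ). igraph's own theoretical_max can differ depending on the arguments passed (for example, loops), so treat this as an illustration of the idea rather than a reproduction of igraph's output.

```r
# Toy star graph on 5 vertices: vertex 1 is tied to every other vertex.
# (Hypothetical data, not the grp_month network.)
toyAdjacency <- matrix( 0, nrow = 5, ncol = 5 )
toyAdjacency[ 1, 2:5 ] <- 1
toyAdjacency[ 2:5, 1 ] <- 1

# node-level degree = row sums of the symmetric adjacency matrix.
toyDegreeVector <- rowSums( toyAdjacency )

# raw centralization: summed gap between the largest score and each score.
toyCentralizationRaw <- sum( max( toyDegreeVector ) - toyDegreeVector )

# assumed theoretical maximum of that sum for an undirected graph without
# self-loops: ( n - 1 ) * ( n - 2 ), which a star graph attains.
toyNodeCount <- length( toyDegreeVector )
toyTheoreticalMax <- ( toyNodeCount - 1 ) * ( toyNodeCount - 2 )

# normalized centralization = raw score / theoretical maximum.
toyCentralization <- toyCentralizationRaw / toyTheoreticalMax
message( paste( "toy degree centralization = ", toyCentralization, sep = "" ) )
```

A star graph is the most centralized shape under this measure, so its normalized score is the upper bound. A network where every node has the same degree would score 0.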
Save all the information in the current image, in case we need/want it later.
In [ ]:
message( paste( "workspace_file_name = ", workspace_file_name, sep = "" ) )
In [60]:
# help( save.image )
save.image( file = workspace_file_name )
message( paste( "saved workspace_file_name = ", workspace_file_name, sep = "" ) )
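If you want to confirm what a saved workspace contains without overwriting values in the current session, one option is to load it into a separate environment. The sketch below is self-contained and uses a temporary file and hypothetical toy* names rather than the real workspace_file_name. It assumes it is run at the top level of an R session, since save.image() saves the global environment.

```r
# write a throwaway workspace to a temporary file (hypothetical path,
# not the real workspace file).
toyWorkspacePath <- tempfile( fileext = ".RData" )
toyValue <- 42
save.image( file = toyWorkspacePath )

# load into a fresh environment so nothing in the current session is replaced.
toyCheckEnvironment <- new.env()
toyLoadedNames <- load( toyWorkspacePath, envir = toyCheckEnvironment )

# load() returns the names of the objects it restored.
message( paste( "objects in saved workspace: ", paste( toyLoadedNames, collapse = ", " ), sep = "" ) )

# pull a single object back out of the check environment.
toyRestoredValue <- get( "toyValue", envir = toyCheckEnvironment )
message( paste( "restored toyValue = ", toyRestoredValue, sep = "" ) )
```

The same pattern should work with the real file by passing workspace_file_name to load() after setwd( data_directory ), then checking for objects such as gmHumanAttributeDF in the returned names.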