R-igraph-grp_month-week_1
2017.12.02 - work log - prelim - R network analysis - igraph
Related files:
network descriptives
network-level
files
R scripts:
context_text/R/db_connect.r
context_text/R/sna/functions-sna.r
context_text/R/sna/sna-load_data.r
context_text/R/sna/igraph/*
context_text/R/sna/statnet/*
statnet/sna
sna::gden()
- graph density
R scripts:
context_text/R/sna/statnet/sna-statnet-init.r
context_text/R/sna/statnet/sna-statnet-network-stats.r
context_text/R/sna/statnet/sna-qap.r
igraph
igraph::transitivity()
- vector of transitivity scores for each node in a graph, plus network-level transitivity score.
R scripts:
context_text/R/sna/statnet/sna-igraph-init.r
context_text/R/sna/statnet/sna-igraph-network-stats.r
Store important directories and file names in variables:
In [1]:
getwd()
In [2]:
# code files (in particular SNA function library, modest though it may be)
code_directory <- "/home/jonathanmorgan/work/django/research/context_analysis/R/sna"
sna_function_file_path = paste( code_directory, "/", 'functions-sna.r', sep = "" )
# home directory
home_directory <- getwd()
home_directory <- "/home/jonathanmorgan/work/django/research/work/phd_work/methods"
# data directories
data_directory <- paste( home_directory, "/data", sep = "" )
workspace_file_name <- "igraph-grp_month-week_1.RData"
# sep = "" is needed here; paste() otherwise puts spaces around the "/".
workspace_file_path <- paste( data_directory, "/", workspace_file_name, sep = "" )
In [3]:
# set working directory to data directory for now.
setwd( data_directory )
message( getwd() )
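A side note on the path-building cells above: paste() separates its arguments with a single space unless sep = "" is given, while base R's file.path() joins components with "/" on its own. A minimal sketch, using a made-up directory and file name (the example_* and path_* names are illustrative and not part of this notebook):

```r
# Hypothetical values, for illustration only.
example_directory <- "/tmp/example/data"
example_file_name <- "example.RData"

# Default sep is " ", so spaces end up inside the path.
path_with_spaces <- paste( example_directory, "/", example_file_name )

# sep = "" gives the intended path.
path_with_sep <- paste( example_directory, "/", example_file_name, sep = "" )

# file.path() inserts the "/" separators itself.
path_with_file_path <- file.path( example_directory, example_file_name )

message( path_with_spaces )
message( path_with_sep )
message( path_with_file_path )
```

Either of the last two forms yields a usable path; file.path() just removes the need to remember the sep argument.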
If you want, you can load this file's workspace, from a previous run. If you changed any directories above, you'll have to re-run the above cells after loading the workspace, and you'll want to save at the end, as well. Load the workspace:
In [ ]:
# assumes that you've already set data_directory in the cells above;
# the working directory is set to the data directory here.
setwd( data_directory )
load( workspace_file_name )
In [4]:
source( sna_function_file_path )
First, you need to render network data and upload it to your server.
Directions for rendering network data are in 2017.11.14-work_log-prelim-network_analysis.ipynb. You want a tab-delimited matrix that includes both the network and attributes of nodes as columns, and you want it to include a header row.
Once you render your network data files, you should place them on the server.
High level data file layout: the tie matrix, followed on the right by node attribute columns (person_id and person_type).
Files and their location on server:
This is data from the Grand Rapids Press articles from December of 2009, coded by both humans and OpenCalais.
Files:
sourcenet_data-20171115-043246-grp_month-automated-week1_subset.tab
sourcenet_data-20171115-043404-grp_month-human-week1_subset.tab
Location in Dropbox: Dropbox/academia/MSU/program_stuff/prelim_paper/data/network_analysis/2017.11.14/network/new_coders/grp_month
Location on server: /home/jonathanmorgan/work/django/research/work/phd_work/data/network/grp_month
grp_week_1 analysis
First, look at a single week out of the shiny new month of data.
grp_week_1 (gw1) - automated - OpenCalais
First, we'll analyze the week of data coded by OpenCalais. Set up some variables to store where data is located:
In [5]:
# initialize variables
gwAutomatedDataFolder <- paste( data_directory, "/network/grp_month", sep = "" )
gwAutomatedDataFile <- "sourcenet_data-20171206-031358-grp_month-automated-week1_subset.tab"
gwAutomatedDataPath <- paste( gwAutomatedDataFolder, "/", gwAutomatedDataFile, sep = "" )
In [6]:
gwAutomatedDataPath
Load the data file into memory
In [7]:
# tab-delimited:
gwAutomatedDataDF <- read.delim( gwAutomatedDataPath, header = TRUE, row.names = 1, check.names = FALSE )
In [8]:
# get count of rows...
gwAutomatedRowCount <- nrow( gwAutomatedDataDF )
message( paste( "grp_week automated row count = ", gwAutomatedRowCount, sep = "" ) )
# ...and columns
gwAutomatedColumnCount <- ncol( gwAutomatedDataDF )
message( paste( "grp_week automated column count = ", gwAutomatedColumnCount, sep = "" ) )
Get just the tie rows and columns for initializing network libraries.
In [9]:
# the below syntax returns only as many columns as there are rows, so
# omitting any trait columns that lie in columns on the right side
# of the file.
gwAutomatedNetworkDF <- gwAutomatedDataDF[ , 1 : gwAutomatedRowCount ]
#str( gwAutomatedNetworkDF )
In [10]:
# convert to a matrix
gwAutomatedNetworkMatrix <- as.matrix( gwAutomatedNetworkDF )
# str( gwAutomatedNetworkMatrix )
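To make the column-subsetting step concrete, here is a minimal sketch on a made-up three-person file with the same assumed layout (tie matrix first, then person_id and person_type). The toy values and the example* names are illustrative and are not taken from the real data.

```r
# Toy tab-delimited data: 3 people, a 3 x 3 tie matrix, then two attribute columns.
exampleText <- paste(
    "id\t1\t2\t3\tperson_id\tperson_type",
    "1\t0\t1\t0\t101\t2",
    "2\t1\t0\t2\t102\t3",
    "3\t0\t2\t0\t103\t3",
    sep = "\n"
)

# Same read.delim() options as the notebook, reading from a string instead of a file.
exampleDataDF <- read.delim( text = exampleText, header = TRUE, row.names = 1, check.names = FALSE )

# A square tie matrix has as many tie columns as there are rows.
exampleRowCount <- nrow( exampleDataDF )
exampleNetworkMatrix <- as.matrix( exampleDataDF[ , 1 : exampleRowCount ] )

# Everything to the right of the tie matrix is a node attribute.
exampleAttributeDF <- exampleDataDF[ , ( exampleRowCount + 1 ) : ncol( exampleDataDF ), drop = FALSE ]

print( dim( exampleNetworkMatrix ) )
print( names( exampleAttributeDF ) )
```

The subsetting relies on the tie matrix being square and leftmost; if attribute columns were ever written first, the 1 : rowCount slice would silently pick up the wrong columns.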
First, load the igraph package.
Once we get the data into an igraph object, run the code in the following for more in-depth information:
context_text/R/sna/statnet/sna-igraph-init.r
context_text/R/sna/statnet/sna-igraph-network-stats.r
In [11]:
#install.packages( "igraph" )
library( igraph )
Load our data matrix into an igraph object.
In [12]:
# load data into igraph instance.
gwAutomatedNetworkIgraph <- graph.adjacency( gwAutomatedNetworkMatrix, mode = "undirected", weighted = TRUE )
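A small sketch of what this conversion produces, on a made-up symmetric weight matrix. It assumes the igraph package is installed and uses graph_from_adjacency_matrix(), the newer name for graph.adjacency() in recent igraph releases; the toy* names are illustrative.

```r
library( igraph )

# Toy symmetric tie matrix: A-B tie of weight 1, B-C tie of weight 2.
toyMatrix <- matrix( c( 0, 1, 0,
                        1, 0, 2,
                        0, 2, 0 ),
                     nrow = 3, byrow = TRUE,
                     dimnames = list( c( "A", "B", "C" ), c( "A", "B", "C" ) ) )

# weighted = TRUE stores each non-zero cell value as the edge's "weight" attribute.
toyIgraph <- graph_from_adjacency_matrix( toyMatrix, mode = "undirected", weighted = TRUE )

message( "nodes = ", vcount( toyIgraph ), ", edges = ", ecount( toyIgraph ) )
print( V( toyIgraph )$name )
print( E( toyIgraph )$weight )
```

With weighted = TRUE, a cell value of 2 becomes one edge with weight 2 rather than two parallel edges, which is why edge counts and degree stay in terms of ties rather than tie strength.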
In [13]:
# add person_id (column 1168)
personIdColumnNumber <- 1168
# first, get just the data frame column with person ID:
personIdsColumn <- gwAutomatedDataDF[ , personIdColumnNumber ]
# populate list we will use to set node person_ID attribute
# Don't just do these - they don't convert to simple list/vector, contain remnants of data frame
#personIdsList <- personIdsColumn
#personIdsList <- c( personIdsColumn )
# Convert to a list of numbers.
personIdsList <- as.numeric( personIdsColumn )
# Try this if you have character attribute...
#personIdsList <- unname( unlist( personIdsColumn ) )
# set vertex/node attribute person_id
V( gwAutomatedNetworkIgraph )$person_id <- personIdsList
# OR use function:
#gwAutomatedNetworkIgraph <- set.vertex.attribute( gwAutomatedNetworkIgraph, "person_id", value = personIdsList )
# look at graph and person_id attribute values
gwAutomatedNetworkIgraph
V( gwAutomatedNetworkIgraph )$person_id
In [14]:
# add person_type (column 1169)
personTypeColumnNumber <- 1169
# first, get just the data frame column with person type:
personTypesColumn <- gwAutomatedDataDF[ , personTypeColumnNumber ]
# populate list we will use to set node person_type attribute
# Don't just do these - they don't convert to simple list/vector, contain remnants of data frame
#personTypesList <- personTypesColumn
#personTypesList <- c( personTypesColumn )
# Convert to a list of numbers.
personTypesList <- as.numeric( personTypesColumn )
# Try this if you have character attribute...
#personTypesList <- unname( unlist( personTypesColumn ) )
# set vertex/node attribute person_type
V( gwAutomatedNetworkIgraph )$person_type <- personTypesList
# OR use function:
#gwAutomatedNetworkIgraph <- set.vertex.attribute( gwAutomatedNetworkIgraph, "person_type", value = personTypesList )
# look at graph and person_type attribute values
gwAutomatedNetworkIgraph
V( gwAutomatedNetworkIgraph )$person_type
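The hard-coded column numbers (1168, 1169) depend on how many people are in a given file. As a hedged alternative, the same columns can be pulled by header name, assuming the attribute columns are labeled person_id and person_type in the header row. The toy data frame below is illustrative.

```r
# Toy data frame standing in for gwAutomatedDataDF (values are made up).
toyDataDF <- data.frame( "1" = c( 0, 1 ), "2" = c( 1, 0 ),
                         person_id = c( 101L, 102L ),
                         person_type = c( 2L, 3L ),
                         check.names = FALSE )

# Look up by name rather than position, so the code survives a change in node count.
toyPersonIds <- as.numeric( toyDataDF[[ "person_id" ]] )
toyPersonTypes <- as.numeric( toyDataDF[[ "person_type" ]] )

# If a column was read in as a factor, as.numeric() returns the internal
# level codes; convert through character first to get the original values.
toyFactor <- factor( c( "2", "4", "3" ) )
codes <- as.numeric( toyFactor )                   # level codes
values <- as.numeric( as.character( toyFactor ) )  # original numbers

print( toyPersonIds )
print( codes )
print( values )
```

The factor caveat matters mostly on older R versions, where read.delim() converted strings to factors by default.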
In [15]:
# calais - include ties Greater than or equal to 0 (GE0)
gwAutomatedMeanTieWeightGE0Vector <- apply( gwAutomatedNetworkMatrix, 1, calculateListMean )
gwAutomatedDataDF$meanTieWeightGE0 <- gwAutomatedMeanTieWeightGE0Vector
# calais - include ties Greater than or equal to 1 (GE1)
gwAutomatedMeanTieWeightGE1Vector <- apply( gwAutomatedNetworkMatrix, 1, calculateListMean, minValueToIncludeIN = 1 )
gwAutomatedDataDF$meanTieWeightGE1 <- gwAutomatedMeanTieWeightGE1Vector
# automated - Max tie weight?
gwAutomatedMaxTieWeightVector <- apply( gwAutomatedNetworkMatrix, 1, calculateListMax )
gwAutomatedDataDF$maxTieWeight <- gwAutomatedMaxTieWeightVector
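calculateListMean and calculateListMax are defined in functions-sna.r, which is not shown here. As a rough sketch of the idea only, assuming the mean is taken over values at or above a minimum threshold, a stand-in might look like the following (meanAtOrAbove is a hypothetical name, not the real function):

```r
# Hypothetical stand-in, NOT the real calculateListMean from functions-sna.r.
meanAtOrAbove <- function( valuesIN, minValueToIncludeIN = 0 ) {
    keptValues <- valuesIN[ valuesIN >= minValueToIncludeIN ]
    # mean() of an empty vector is NaN, which flags rows with no qualifying ties.
    return( mean( keptValues ) )
}

# Toy tie matrix (values are made up).
toyMatrix <- matrix( c( 0, 1, 0,
                        1, 0, 2,
                        0, 2, 0 ), nrow = 3, byrow = TRUE )

# GE0: every cell counts, including zeros (no tie).
meanGE0 <- apply( toyMatrix, 1, meanAtOrAbove )
# GE1: only cells where a tie exists.
meanGE1 <- apply( toyMatrix, 1, meanAtOrAbove, minValueToIncludeIN = 1 )
# Strongest tie per row.
maxWeight <- apply( toyMatrix, 1, max )

print( meanGE0 )
print( meanGE1 )
print( maxWeight )
```

The GE0 mean is pulled toward zero by every absent tie, so in a sparse network the GE1 mean is usually the more informative measure of tie strength.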
In [16]:
# to see count of nodes and edges, just type the object name:
gwAutomatedNetworkIgraph
# Will output something like:
#
# IGRAPH UNW- 314 309 --
# + attr: name (v/c), weight (e/n)
#
# in the first line, "UNW-" are traits of graph:
# - 1 - U = undirected ( directed would be "D" )
# - 2 - N = named or not ( "-" instead of "N" )
# - 3 - W = weighted
# - 4 - B = bipartite ( "-" = not bipartite )
# 314 is where node count goes, 309 is edge count.
# The second line gives you information about the 'attributes' associated with the graph. In this case, there are two attributes, name and weight. Next to each attribute name is a two-character construct that looks like "(v/c)". The first letter is the thing the attribute is associated with (g = graph, v = vertex or node, e = edge). The second is the type of the attribute (c = character data, n = numeric data). So, in this case:
# - name (v/c) - the name attribute is a vertex/node attribute - the "v" in "(v/c)" - where the values are character data - the "c" in "(v/c)".
# - weight (e/n) - the weight attribute is an edge attribute - the "e" in "(e/n)" - where the values are numeric data - the "n" in "(e/n)".
# - based on: http://www.shizukalab.com/toolkits/sna/sna_data
In [17]:
# try calling the degree() function on an igraph object. Returns a number vector with names.
gwAutomatedDegreeVector <- igraph::degree( gwAutomatedNetworkIgraph )
# For help with igraph::degree function:
#??igraph::degree
# calculate the mean of the degrees.
gwAutomatedAvgDegree <- mean( gwAutomatedDegreeVector )
message( paste( "grp_week automated average degree = ", gwAutomatedAvgDegree, sep = "" ) )
# append the degrees to the network as a vertex attribute.
V( gwAutomatedNetworkIgraph )$degree <- gwAutomatedDegreeVector
# also add degree vector to original data frame
gwAutomatedDataDF$degree <- gwAutomatedDegreeVector
# if you want to just work with the traits of the nodes/vertexes, you can
# combine the attribute vectors into a data frame.
# first, output igraph object to see what attributes you have
gwAutomatedNetworkIgraph
V( gwAutomatedNetworkIgraph )$degree
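For intuition about what igraph::degree() returns on an undirected graph without self-loops: a node's degree is the number of other nodes it has a tie with, regardless of tie weight. A base-R sketch on a made-up matrix (the toy* names are illustrative):

```r
# Toy undirected network: triangle A-B-C plus a pendant node D attached to C.
toyNames <- c( "A", "B", "C", "D" )
toyMatrix <- matrix( c( 0, 1, 3, 0,
                        1, 0, 1, 0,
                        3, 1, 0, 2,
                        0, 0, 2, 0 ),
                     nrow = 4, byrow = TRUE,
                     dimnames = list( toyNames, toyNames ) )

# Degree counts ties, not weights, so test for non-zero cells.
toyDegreeVector <- rowSums( toyMatrix > 0 )
toyAvgDegree <- mean( toyDegreeVector )

# Count each undirected edge once by looking only above the diagonal.
toyEdgeCount <- sum( toyMatrix[ upper.tri( toyMatrix ) ] > 0 )

print( toyDegreeVector )
message( "average degree = ", toyAvgDegree )
message( "2 * edges / nodes = ", 2 * toyEdgeCount / nrow( toyMatrix ) )
```

The last two lines print the same number: in an undirected graph every edge contributes to two nodes' degrees, so mean degree equals twice the edge count divided by the node count.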
Calculate average source and author degree:
In [18]:
# average author degree (person types 2 and 4)
gwAutomatedAverageAuthorDegree2And4 <- calcAuthorMeanDegree( dataFrameIN = gwAutomatedDataDF, includeBothIN = TRUE )
message( paste( "grp_week automated average author degree (2 and 4) = ", gwAutomatedAverageAuthorDegree2And4, sep = "" ) )
# average author degree (person type 2 only)
gwAutomatedAverageAuthorDegreeOnly2 <- calcAuthorMeanDegree( dataFrameIN = gwAutomatedDataDF, includeBothIN = FALSE )
message( paste( "grp_week automated average author degree (only 2) = ", gwAutomatedAverageAuthorDegreeOnly2, sep = "" ) )
# average source degree (person types 3 and 4)
gwAutomatedAverageSourceDegree3And4 <- calcSourceMeanDegree( dataFrameIN = gwAutomatedDataDF, includeBothIN = TRUE )
message( paste( "grp_week automated average source degree (3 and 4) = ", gwAutomatedAverageSourceDegree3And4, sep = "" ) )
# average source degree (person type 3 only)
gwAutomatedAverageSourceDegreeOnly3 <- calcSourceMeanDegree( dataFrameIN = gwAutomatedDataDF, includeBothIN = FALSE )
message( paste( "grp_week automated average source degree (only 3) = ", gwAutomatedAverageSourceDegreeOnly3, sep = "" ) )
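calcAuthorMeanDegree and calcSourceMeanDegree come from functions-sna.r and are not shown here. Assuming they average the degree column over rows whose person_type is in the listed set (2 and 4 for authors, 3 and 4 for sources, per the comments above), the computation could be sketched as follows; meanDegreeForTypes is a hypothetical helper, not the real function.

```r
# Hypothetical helper, NOT the real calcAuthorMeanDegree / calcSourceMeanDegree.
meanDegreeForTypes <- function( dataFrameIN, typesIN ) {
    matchingRows <- dataFrameIN$person_type %in% typesIN
    return( mean( dataFrameIN$degree[ matchingRows ] ) )
}

# Toy node table (values are made up).
toyDataDF <- data.frame( person_type = c( 2, 3, 3, 4, 2 ),
                         degree = c( 4, 1, 2, 6, 2 ) )

authorMean2And4 <- meanDegreeForTypes( toyDataDF, c( 2, 4 ) )
authorMeanOnly2 <- meanDegreeForTypes( toyDataDF, 2 )
sourceMean3And4 <- meanDegreeForTypes( toyDataDF, c( 3, 4 ) )
sourceMeanOnly3 <- meanDegreeForTypes( toyDataDF, 3 )

message( "author (2 and 4) = ", authorMean2And4 )
message( "author (only 2) = ", authorMeanOnly2 )
message( "source (3 and 4) = ", sourceMean3And4 )
message( "source (only 3) = ", sourceMeanOnly3 )
```

Because type 4 rows are counted in both the author and the source averages when they are included, the "2 and 4" and "3 and 4" figures overlap and should not be summed.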
Once we get the data into an igraph object, run the code in the following for more in-depth information:
context_text/R/sna/statnet/sna-igraph-init.r
context_text/R/sna/statnet/sna-igraph-network-stats.r
In [19]:
# First, need to load SNA functions and load data into igraph network object.
# For more details on that, see the files "functions-sna.r",
# "sna-load_data.r" and "sna-igraph-init.r".
#
# assumes that working directory for igraph is context_text/R/sna/igraph
# setwd( ".." )
# source( "functions-sna.r" )
# source( "sna-load_data.r" )
# setwd( "igraph" )
# source( "sna-igraph-init.r" )
# results in (among other things):
# - humanNetworkData - data frame with human-generated network data matrix in it, including columns on the right side for any node-specific attributes.
# - calaisNetworkData - data frame with computer-generated network data matrix in it, including columns on the right side for any node-specific attributes.
# - humanNetworkTies - data frame with only human-generated network data matrix in it, no node-specific attributes.
# - calaisNetworkTies - data frame with only computer-generated network data matrix in it, no node-specific attributes.
# - humanNetworkMatrix - matrix with only human-generated network data matrix in it, no node-specific attributes.
# - calaisNetworkMatrix - matrix with only computer-generated network data matrix in it, no node-specific attributes.
# - humanNetworkIgraph - igraph network with human-coded network in it, including node-specific attributes.
# - calaisNetworkIgraph - igraph network with computer-coded network in it, including node-specific attributes.
# Links:
# - CRAN page: http://cran.r-project.org/web/packages/igraph/index.html
# - Manual (PDF): http://cran.r-project.org/web/packages/igraph/igraph.pdf
# - intro.: http://horicky.blogspot.com/2012/04/basic-graph-analytics-using-igraph.html
# - good notes: http://www.shizukalab.com/toolkits/sna/node-level-calculations
# Also, be advised that statnet and igraph don't really play nice together.
# If you'll be using both, best idea is to have a workspace for each.
#==============================================================================#
# igraph
#==============================================================================#
# Good notes:
# - http://assemblingnetwork.wordpress.com/2013/06/10/network-basics-with-r-and-igraph-part-ii-of-iii/
# make sure you've loaded the igraph library
# install.packages( "igraph" )
library( igraph )
#==============================================================================#
# NODE level
#==============================================================================#
# calculate the mean of the degrees.
gwAutomatedDegreeMean <- gwAutomatedAvgDegree
message( paste( "grp_week automated degree mean = ", gwAutomatedDegreeMean, sep = "" ) )
# what is the standard deviation of these degrees?
gwAutomatedDegreeSd <- sd( gwAutomatedDegreeVector )
message( paste( "grp_week automated degree SD = ", gwAutomatedDegreeSd, sep = "" ) )
# what is the variance of these degrees?
gwAutomatedDegreeVar <- var( gwAutomatedDegreeVector )
message( paste( "grp_week automated degree Variance = ", gwAutomatedDegreeVar, sep = "" ) )
# what is the max value among these degrees?
gwAutomatedDegreeMax <- max( gwAutomatedDegreeVector )
message( paste( "grp_week automated degree Max Value = ", gwAutomatedDegreeMax, sep = "" ) )
# calculate and plot degree distributions
gwAutomatedDegreeFrequenciesTable <- table( gwAutomatedDegreeVector )
gwAutomatedDegreeDistribution <- igraph::degree.distribution( gwAutomatedNetworkIgraph )
plot( gwAutomatedDegreeDistribution, xlab = "grp_week automated node degree" )
lines( gwAutomatedDegreeDistribution )
# subset vector to get only those that are above mean
gwAutomatedAboveMeanVector <- gwAutomatedDegreeVector[ gwAutomatedDegreeVector > gwAutomatedDegreeMean ]
# node-level transitivity
# create transitivity vectors.
gwAutomatedTransitivityVector <- igraph::transitivity( gwAutomatedNetworkIgraph, type = "local" )
# append the transitivity to the network as a vertex attribute.
V( gwAutomatedNetworkIgraph )$transitivity <- gwAutomatedTransitivityVector
# also add transitivity vector to original data frame
gwAutomatedDataDF$transitivity <- gwAutomatedTransitivityVector
# And, if you want averages of these:
gwAutomatedMeanTransitivity <- mean( gwAutomatedTransitivityVector, na.rm = TRUE )
message( paste( "grp_week automated mean transitivity = ", gwAutomatedMeanTransitivity, sep = "" ) )
#==============================================================================#
# NETWORK level
#==============================================================================#
#------------------------------------------------------------------------------#
# ==> graph-level degree centrality
#
# Returns a named list with the following components:
# res - The node-level centrality scores
# centralization - The graph level centrality index.
# theoretical_max - The maximum theoretical graph level centralization score
# for a graph with the given number of vertices, using the same parameters.
# If the normalized argument was TRUE (the default), then the result was
# divided by this number.
gwAutomatedDegreeCentralityOutput <- igraph::centralization.degree( gwAutomatedNetworkIgraph )
gwAutomatedDegreeCentrality <- gwAutomatedDegreeCentralityOutput$centralization
gwAutomatedDegreeCentralityMax <- gwAutomatedDegreeCentralityOutput$theoretical_max
message( paste( "grp_week automated degree centrality = ", gwAutomatedDegreeCentrality, " ( max = ", gwAutomatedDegreeCentralityMax, " )", sep = "" ) )
# node-level degree centrality
gwAutomatedDegreeCentralityVector <- gwAutomatedDegreeCentralityOutput$res
#message( paste( "grp_week automated betweenness = ", gwAutomatedBetweenness, sep = "" ) )
# append the degree centrality to the network as a vertex attribute.
V( gwAutomatedNetworkIgraph )$degreeCentrality <- gwAutomatedDegreeCentralityVector
# also add degree centrality vector to original data frame
gwAutomatedDataDF$degreeCentrality <- gwAutomatedDegreeCentralityVector
# And, if you want averages of these:
gwAutomatedMeanDegreeCentrality <- mean( gwAutomatedDegreeCentralityVector, na.rm = TRUE )
message( paste( "grp_week automated mean degree centrality = ", gwAutomatedMeanDegreeCentrality, sep = "" ) )
#------------------------------------------------------------------------------#
# ==> graph-level undirected betweenness
#
# Returns a named list with the following components:
# res - The node-level centrality scores
# centralization - The graph level centrality index.
# theoretical_max - The maximum theoretical graph level centralization score
# for a graph with the given number of vertices, using the same parameters.
# If the normalized argument was TRUE (the default), then the result was
# divided by this number.
gwAutomatedBetweennessCentralityOutput <- igraph::centralization.betweenness( gwAutomatedNetworkIgraph, directed = FALSE )
gwAutomatedBetweennessCentrality <- gwAutomatedBetweennessCentralityOutput$centralization
gwAutomatedBetweennessCentralityMax <- gwAutomatedBetweennessCentralityOutput$theoretical_max
message( paste( "grp_week automated betweenness centrality = ", gwAutomatedBetweennessCentrality, " ( max = ", gwAutomatedBetweennessCentralityMax, " )", sep = "" ) )
# node-level undirected betweenness
gwAutomatedBetweennessVector <- gwAutomatedBetweennessCentralityOutput$res
#message( paste( "grp_week automated betweenness = ", gwAutomatedBetweenness, sep = "" ) )
# append the betweenness to the network as a vertex attribute.
V( gwAutomatedNetworkIgraph )$betweenness <- gwAutomatedBetweennessVector
# also add betweenness vector to original data frame
gwAutomatedDataDF$betweenness <- gwAutomatedBetweennessVector
# And, if you want averages of these:
gwAutomatedMeanBetweenness <- mean( gwAutomatedBetweennessVector, na.rm = TRUE )
message( paste( "grp_week automated mean betweenness = ", gwAutomatedMeanBetweenness, sep = "" ) )
# graph-level transitivity
gwAutomatedTransitivity <- igraph::transitivity( gwAutomatedNetworkIgraph, type = "global" )
message( paste( "grp_week automated transitivity = ", gwAutomatedTransitivity, sep = "" ) )
# graph-level density
gwAutomatedDensity <- igraph::graph.density( gwAutomatedNetworkIgraph )
message( paste( "grp_week automated density = ", gwAutomatedDensity, sep = "" ) )
#==============================================================================#
# output attributes to data frame
#==============================================================================#
# if you want to just work with the traits of the nodes/vertexes, you can
# combine the attribute vectors into a data frame.
# first, output igraph object to see what attributes you have
gwAutomatedNetworkIgraph
# then, combine them into a data frame.
gwAutomatedAttributeDF <- data.frame( id = V( gwAutomatedNetworkIgraph )$name,
person_id = V( gwAutomatedNetworkIgraph )$person_id,
person_type = V( gwAutomatedNetworkIgraph )$person_type,
degree = V( gwAutomatedNetworkIgraph )$degree,
transitivity = V( gwAutomatedNetworkIgraph )$transitivity,
degreeCentrality = V( gwAutomatedNetworkIgraph )$degreeCentrality,
betweenness = V( gwAutomatedNetworkIgraph )$betweenness )
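To show what the network-level numbers above measure, here is a base-R sketch of the textbook definitions on a made-up four-node graph. It is not a replacement for the igraph calls; in particular, igraph's centralization functions have their own normalization options (for example the loops argument), so their output may differ from the plain Freeman formula used here. All names in the block are illustrative.

```r
# Toy undirected graph: triangle A-B-C plus a pendant node D attached to C.
toyNames <- c( "A", "B", "C", "D" )
toyAdjacency <- matrix( c( 0, 1, 1, 0,
                           1, 0, 1, 0,
                           1, 1, 0, 1,
                           0, 0, 1, 0 ),
                        nrow = 4, byrow = TRUE,
                        dimnames = list( toyNames, toyNames ) )
nodeCount <- nrow( toyAdjacency )
degreeVector <- rowSums( toyAdjacency )
edgeCount <- sum( toyAdjacency ) / 2

# Density: observed edges over possible edges, n * ( n - 1 ) / 2.
toyDensity <- edgeCount / ( nodeCount * ( nodeCount - 1 ) / 2 )

# Global transitivity: 3 * triangles / connected triples.
# diag( A %*% A %*% A ) counts closed 3-step walks; each triangle adds 6.
triangleCount <- sum( diag( toyAdjacency %*% toyAdjacency %*% toyAdjacency ) ) / 6
tripleCount <- sum( choose( degreeVector, 2 ) )
toyTransitivity <- 3 * triangleCount / tripleCount

# Freeman degree centralization for a simple undirected graph:
# sum of ( max degree - degree ) over its maximum, ( n - 1 ) * ( n - 2 ).
toyCentralization <- sum( max( degreeVector ) - degreeVector ) /
    ( ( nodeCount - 1 ) * ( nodeCount - 2 ) )

message( "density = ", toyDensity )
message( "transitivity = ", toyTransitivity )
message( "degree centralization = ", toyCentralization )
```

Here one triangle and five connected triples give a transitivity of 3/5, and four of six possible edges give a density of 2/3.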
grp_week_1 (gw1) - human
Next, we'll analyze a week of the month of data coded by humans. Set up some variables to store where data is located:
In [23]:
# initialize variables
gwHumanDataFolder <- paste( data_directory, "/network/grp_month", sep = "" )
gwHumanDataFile <- "sourcenet_data-20171206-031319-grp_month-human-week1_subset.tab"
gwHumanDataPath <- paste( gwHumanDataFolder, "/", gwHumanDataFile, sep = "" )
In [24]:
gwHumanDataPath
Load the data file into memory
In [25]:
# tab-delimited:
gwHumanDataDF <- read.delim( gwHumanDataPath, header = TRUE, row.names = 1, check.names = FALSE )
In [26]:
# get count of rows...
gwHumanRowCount <- nrow( gwHumanDataDF )
message( paste( "grp_week human row count = ", gwHumanRowCount, sep = "" ) )
# ...and columns
gwHumanColumnCount <- ncol( gwHumanDataDF )
message( paste( "grp_week human column count = ", gwHumanColumnCount, sep = "" ) )
Get just the tie rows and columns for initializing network libraries.
In [27]:
# the below syntax returns only as many columns as there are rows, so
# omitting any trait columns that lie in columns on the right side
# of the file.
gwHumanNetworkDF <- gwHumanDataDF[ , 1 : gwHumanRowCount ]
#str( gwHumanNetworkDF )
In [28]:
# convert to a matrix
gwHumanNetworkMatrix <- as.matrix( gwHumanNetworkDF )
# str( gwHumanNetworkMatrix )
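The human cells repeat the automated ones step for step. One way to cut down on copy-and-paste drift between the two sections is a small loader that returns the tie matrix and the attribute columns together. This is a sketch only: loadNetworkFile is a hypothetical helper that is not part of functions-sna.r, and it assumes the tie matrix is square and sits in the leftmost columns.

```r
# Hypothetical helper; assumes a square tie matrix in the leftmost columns.
loadNetworkFile <- function( pathIN ) {
    dataDF <- read.delim( pathIN, header = TRUE, row.names = 1, check.names = FALSE )
    rowCount <- nrow( dataDF )
    stopifnot( ncol( dataDF ) >= rowCount )
    networkMatrix <- as.matrix( dataDF[ , 1 : rowCount ] )
    # Columns to the right of the tie matrix are node attributes (may be none).
    attributeColumns <- seq_len( ncol( dataDF ) )[ -( 1 : rowCount ) ]
    attributeDF <- dataDF[ , attributeColumns, drop = FALSE ]
    return( list( data = dataDF, ties = networkMatrix, attributes = attributeDF ) )
}

# Demonstrate on a temporary toy file (values are made up).
toyPath <- tempfile( fileext = ".tab" )
writeLines( c( "id\t1\t2\tperson_id\tperson_type",
               "1\t0\t3\t101\t2",
               "2\t3\t0\t102\t3" ), toyPath )

toyNetwork <- loadNetworkFile( toyPath )
print( dim( toyNetwork$ties ) )
print( names( toyNetwork$attributes ) )
unlink( toyPath )
```

With a loader like this, the automated and human sections would differ only in the path passed in, which makes it harder for a variable name from one section to leak into the other.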
First, load the igraph package.
Once we get the data into an igraph object, run the code in the following for more in-depth information:
context_text/R/sna/statnet/sna-igraph-init.r
context_text/R/sna/statnet/sna-igraph-network-stats.r
In [29]:
#install.packages( "igraph" )
library( igraph )
Load our data matrix into an igraph object.
In [30]:
# load data into igraph instance.
gwHumanNetworkIgraph <- graph.adjacency( gwHumanNetworkMatrix, mode = "undirected", weighted = TRUE )
In [31]:
# add person_id (column 1168)
personIdColumnNumber <- 1168
# first, get just the data frame column with person ID:
personIdsColumn <- gwHumanDataDF[ , personIdColumnNumber ]
# populate list we will use to set node person_ID attribute
# Don't just do these - they don't convert to simple list/vector, contain remnants of data frame
#personIdsList <- personIdsColumn
#personIdsList <- c( personIdsColumn )
# Convert to a list of numbers.
personIdsList <- as.numeric( personIdsColumn )
# Try this if you have character attribute...
#personIdsList <- unname( unlist( personIdsColumn ) )
# set vertex/node attribute person_id
V( gwHumanNetworkIgraph )$person_id <- personIdsList
# OR use function:
#gwHumanNetworkIgraph <- set.vertex.attribute( gwHumanNetworkIgraph, "person_id", value = personIdsList )
# look at graph and person_id attribute values
gwHumanNetworkIgraph
V( gwHumanNetworkIgraph )$person_id
In [32]:
# add person_type (column 1169)
personTypeColumnNumber <- 1169
# first, get just the data frame column with person type:
personTypesColumn <- gwHumanDataDF[ , personTypeColumnNumber ]
# populate list we will use to set node person_type attribute
# Don't just do these - they don't convert to simple list/vector, contain remnants of data frame
#personTypesList <- personTypesColumn
#personTypesList <- c( personTypesColumn )
# Convert to a list of numbers.
personTypesList <- as.numeric( personTypesColumn )
# Try this if you have character attribute...
#personTypesList <- unname( unlist( personTypesColumn ) )
# set vertex/node attribute person_type
V( gwHumanNetworkIgraph )$person_type <- personTypesList
# OR use function:
#gwHumanNetworkIgraph <- set.vertex.attribute( gwHumanNetworkIgraph, "person_type", value = personTypesList )
# look at graph and person_type attribute values
gwHumanNetworkIgraph
V( gwHumanNetworkIgraph )$person_type
In [33]:
# human - include ties Greater than or equal to 0 (GE0)
gwHumanMeanTieWeightGE0Vector <- apply( gwHumanNetworkMatrix, 1, calculateListMean )
gwHumanDataDF$meanTieWeightGE0 <- gwHumanMeanTieWeightGE0Vector
# human - include ties Greater than or equal to 1 (GE1)
gwHumanMeanTieWeightGE1Vector <- apply( gwHumanNetworkMatrix, 1, calculateListMean, minValueToIncludeIN = 1 )
gwHumanDataDF$meanTieWeightGE1 <- gwHumanMeanTieWeightGE1Vector
# human - Max tie weight?
gwHumanMaxTieWeightVector <- apply( gwHumanNetworkMatrix, 1, calculateListMax )
gwHumanDataDF$maxTieWeight <- gwHumanMaxTieWeightVector
In [34]:
# to see count of nodes and edges, just type the object name:
gwHumanNetworkIgraph
# Will output something like:
#
# IGRAPH UNW- 314 309 --
# + attr: name (v/c), weight (e/n)
#
# in the first line, "UNW-" are traits of graph:
# - 1 - U = undirected ( directed would be "D" )
# - 2 - N = named or not ( "-" instead of "N" )
# - 3 - W = weighted
# - 4 - B = bipartite ( "-" = not bipartite )
# 314 is where node count goes, 309 is edge count.
# The second line gives you information about the 'attributes' associated with the graph. In this case, there are two attributes, name and weight. Next to each attribute name is a two-character construct that looks like "(v/c)". The first letter is the thing the attribute is associated with (g = graph, v = vertex or node, e = edge). The second is the type of the attribute (c = character data, n = numeric data). So, in this case:
# - name (v/c) - the name attribute is a vertex/node attribute - the "v" in "(v/c)" - where the values are character data - the "c" in "(v/c)".
# - weight (e/n) - the weight attribute is an edge attribute - the "e" in "(e/n)" - where the values are numeric data - the "n" in "(e/n)".
# - based on: http://www.shizukalab.com/toolkits/sna/sna_data
In [35]:
# try calling the degree() function on an igraph object. Returns a number vector with names.
gwHumanDegreeVector <- igraph::degree( gwHumanNetworkIgraph )
# For help with igraph::degree function:
#??igraph::degree
# calculate the mean of the degrees.
gwHumanAvgDegree <- mean( gwHumanDegreeVector )
message( paste( "grp_week human average degree = ", gwHumanAvgDegree, sep = "" ) )
# append the degrees to the network as a vertex attribute.
V( gwHumanNetworkIgraph )$degree <- gwHumanDegreeVector
# also add degree vector to original data frame
gwHumanDataDF$degree <- gwHumanDegreeVector
# if you want to just work with the traits of the nodes/vertexes, you can
# combine the attribute vectors into a data frame.
# first, output igraph object to see what attributes you have
gwHumanNetworkIgraph
V( gwHumanNetworkIgraph )$degree
Calculate average source and author degree:
In [36]:
# average author degree (person types 2 and 4)
gwHumanAverageAuthorDegree2And4 <- calcAuthorMeanDegree( dataFrameIN = gwHumanDataDF, includeBothIN = TRUE )
message( paste( "grp_week human average author degree (2 and 4) = ", gwHumanAverageAuthorDegree2And4, sep = "" ) )
# average author degree (person type 2 only)
gwHumanAverageAuthorDegreeOnly2 <- calcAuthorMeanDegree( dataFrameIN = gwHumanDataDF, includeBothIN = FALSE )
message( paste( "grp_week human average author degree (only 2) = ", gwHumanAverageAuthorDegreeOnly2, sep = "" ) )
# average source degree (person types 3 and 4)
gwHumanAverageSourceDegree3And4 <- calcSourceMeanDegree( dataFrameIN = gwHumanDataDF, includeBothIN = TRUE )
message( paste( "grp_week human average source degree (3 and 4) = ", gwHumanAverageSourceDegree3And4, sep = "" ) )
# average source degree (person type 3 only)
gwHumanAverageSourceDegreeOnly3 <- calcSourceMeanDegree( dataFrameIN = gwHumanDataDF, includeBothIN = FALSE )
message( paste( "grp_week human average source degree (only 3) = ", gwHumanAverageSourceDegreeOnly3, sep = "" ) )
Once we get the data into an igraph object, run the code in the following for more in-depth information:
context_text/R/sna/statnet/sna-igraph-init.r
context_text/R/sna/statnet/sna-igraph-network-stats.r
In [37]:
# First, need to load SNA functions and load data into igraph network object.
# For more details on that, see the files "functions-sna.r",
# "sna-load_data.r" and "sna-igraph-init.r".
#
# assumes that working directory for igraph is context_text/R/sna/igraph
# setwd( ".." )
# source( "functions-sna.r" )
# source( "sna-load_data.r" )
# setwd( "igraph" )
# source( "sna-igraph-init.r" )
# results in (among other things):
# - humanNetworkData - data frame with human-generated network data matrix in it, including columns on the right side for any node-specific attributes.
# - calaisNetworkData - data frame with computer-generated network data matrix in it, including columns on the right side for any node-specific attributes.
# - humanNetworkTies - data frame with only human-generated network data matrix in it, no node-specific attributes.
# - calaisNetworkTies - data frame with only computer-generated network data matrix in it, no node-specific attributes.
# - humanNetworkMatrix - matrix with only human-generated network data matrix in it, no node-specific attributes.
# - calaisNetworkMatrix - matrix with only computer-generated network data matrix in it, no node-specific attributes.
# - humanNetworkIgraph - igraph network with human-coded network in it, including node-specific attributes.
# - calaisNetworkIgraph - igraph network with computer-coded network in it, including node-specific attributes.
# Links:
# - CRAN page: http://cran.r-project.org/web/packages/igraph/index.html
# - Manual (PDF): http://cran.r-project.org/web/packages/igraph/igraph.pdf
# - intro.: http://horicky.blogspot.com/2012/04/basic-graph-analytics-using-igraph.html
# - good notes: http://www.shizukalab.com/toolkits/sna/node-level-calculations
# Also, be advised that statnet and igraph don't really play nice together.
# If you'll be using both, best idea is to have a workspace for each.
#==============================================================================#
# igraph
#==============================================================================#
# Good notes:
# - http://assemblingnetwork.wordpress.com/2013/06/10/network-basics-with-r-and-igraph-part-ii-of-iii/
# make sure you've loaded the igraph library
# install.packages( "igraph" )
library( igraph )
#==============================================================================#
# NODE level
#==============================================================================#
# calculate the mean of the degrees.
gwHumanDegreeMean <- gwHumanAvgDegree
message( paste( "grp_week human degree mean = ", gwHumanDegreeMean, sep = "" ) )
# what is the standard deviation of these degrees?
gwHumanDegreeSd <- sd( gwHumanDegreeVector )
message( paste( "grp_week human degree SD = ", gwHumanDegreeSd, sep = "" ) )
# what is the variance of these degrees?
gwHumanDegreeVar <- var( gwHumanDegreeVector )
message( paste( "grp_week human degree Variance = ", gwHumanDegreeVar, sep = "" ) )
# what is the max value among these degrees?
gwHumanDegreeMax <- max( gwHumanDegreeVector )
message( paste( "grp_week human degree Max Value = ", gwHumanDegreeMax, sep = "" ) )
# calculate and plot degree distributions
gwHumanDegreeFrequenciesTable <- table( gwHumanDegreeVector )
gwHumanDegreeDistribution <- igraph::degree.distribution( gwHumanNetworkIgraph )
plot( gwHumanDegreeDistribution, xlab = "grp_week human node degree" )
lines( gwHumanDegreeDistribution )
# subset vector to get only those that are above mean
gwHumanAboveMeanVector <- gwHumanDegreeVector[ gwHumanDegreeVector > gwHumanDegreeMean ]
# node-level transitivity
# create transitivity vectors.
gwHumanTransitivityVector <- igraph::transitivity( gwHumanNetworkIgraph, type = "local" )
# append the transitivity to the network as a vertex attribute.
V( gwHumanNetworkIgraph )$transitivity <- gwHumanTransitivityVector
# also add transitivity vector to original data frame
gwHumanDataDF$transitivity <- gwHumanTransitivityVector
# And, if you want averages of these:
gwHumanMeanTransitivity <- mean( gwHumanTransitivityVector, na.rm = TRUE )
message( paste( "grp_week human mean transitivity = ", gwHumanMeanTransitivity, sep = "" ) )
#==============================================================================#
# NETWORK level
#==============================================================================#
#------------------------------------------------------------------------------#
# ==> graph-level degree centrality
#
# Returns a named list with the following components:
# res - The node-level centrality scores
# centralization - The graph level centrality index.
# theoretical_max - The maximum theoretical graph level centralization score
# for a graph with the given number of vertices, using the same parameters.
# If the normalized argument was TRUE (the default), then the result was
# divided by this number.
gwHumanDegreeCentralityOutput <- igraph::centralization.degree( gwHumanNetworkIgraph )
gwHumanDegreeCentrality <- gwHumanDegreeCentralityOutput$centralization
gwHumanDegreeCentralityMax <- gwHumanDegreeCentralityOutput$theoretical_max
message( paste( "grp_week human degree centrality = ", gwHumanDegreeCentrality, " ( max = ", gwHumanDegreeCentralityMax, " )", sep = "" ) )
# node-level degree centrality
gwHumanDegreeCentralityVector <- gwHumanDegreeCentralityOutput$res
# append the degree centrality to the network as a vertex attribute.
V( gwHumanNetworkIgraph )$degreeCentrality <- gwHumanDegreeCentralityVector
# also add degree centrality vector to original data frame
gwHumanDataDF$degreeCentrality <- gwHumanDegreeCentralityVector
# And, if you want averages of these:
gwHumanMeanDegreeCentrality <- mean( gwHumanDegreeCentralityVector, na.rm = TRUE )
message( paste( "grp_week human mean degree centrality = ", gwHumanMeanDegreeCentrality, sep = "" ) )
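# A minimal sketch of what the centralization index measures, using Freeman's
# formula on toy data (hypothetical "example" names): sum the gaps between the
# largest node score and every node's score, then divide by the largest sum
# possible for a graph of that size. For an undirected graph without loops
# that maximum is ( n - 1 ) * ( n - 2 ), which a star graph reaches.
# NOTE: centralization.degree() has a loops argument that changes the
# theoretical maximum, so check its documentation before expecting the
# function to reproduce this no-loops calculation exactly.
exampleStarDegrees <- c( 4, 1, 1, 1, 1 )
exampleStarNodeCount <- length( exampleStarDegrees )
exampleGapSum <- sum( max( exampleStarDegrees ) - exampleStarDegrees )
exampleTheoreticalMax <- ( exampleStarNodeCount - 1 ) * ( exampleStarNodeCount - 2 )
exampleCentralization <- exampleGapSum / exampleTheoreticalMax
message( paste( "example degree centralization = ", exampleCentralization, sep = "" ) )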
#------------------------------------------------------------------------------#
# ==> graph-level undirected betweenness
#
# Returns a named list with the following components:
# res - The node-level centrality scores
# centralization - The graph level centrality index.
# theoretical_max - The maximum theoretical graph level centralization score
# for a graph with the given number of vertices, using the same parameters.
# If the normalized argument was TRUE (the default), then the result was
# divided by this number.
gwHumanBetweennessCentralityOutput <- igraph::centralization.betweenness( gwHumanNetworkIgraph, directed = FALSE )
gwHumanBetweennessCentrality <- gwHumanBetweennessCentralityOutput$centralization
gwHumanBetweennessCentralityMax <- gwHumanBetweennessCentralityOutput$theoretical_max
message( paste( "grp_week human betweenness centrality = ", gwHumanBetweennessCentrality, " ( max = ", gwHumanBetweennessCentralityMax, " )", sep = "" ) )
# node-level undirected betweenness
gwHumanBetweennessVector <- gwHumanBetweennessCentralityOutput$res
#message( paste( "grp_week human betweenness = ", gwHumanBetweenness, sep = "" ) )
# append the betweenness to the network as a vertex attribute.
V( gwHumanNetworkIgraph )$betweenness <- gwHumanBetweennessVector
# also add betweenness vector to original data frame
gwHumanDataDF$betweenness <- gwHumanBetweennessVector
# And, if you want averages of these:
gwHumanMeanBetweenness <- mean( gwHumanBetweennessVector, na.rm = TRUE )
message( paste( "grp_week human mean betweenness = ", gwHumanMeanBetweenness, sep = "" ) )
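# A minimal sketch, on a hypothetical 5-node star, of how the node-level and
# graph-level betweenness values relate: the center lies on every shortest
# path between two leaves ( choose( 4, 2 ) = 6 pairs ) and the leaves lie on
# none, which is the most centralized arrangement possible.
exampleStarGraph <- igraph::make_star( 5, mode = "undirected" )
exampleStarBetweenness <- igraph::betweenness( exampleStarGraph, directed = FALSE )
exampleStarBetweennessOutput <- igraph::centralization.betweenness( exampleStarGraph, directed = FALSE )
message( paste( "example node betweenness = ", paste( exampleStarBetweenness, collapse = ", " ), sep = "" ) )
message( paste( "example betweenness centralization = ", exampleStarBetweennessOutput$centralization, sep = "" ) )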
# graph-level transitivity
gwHumanTransitivity <- igraph::transitivity( gwHumanNetworkIgraph, type = "global" )
message( paste( "grp_week human transitivity = ", gwHumanTransitivity, sep = "" ) )
# graph-level density
gwHumanDensity <- igraph::graph.density( gwHumanNetworkIgraph )
message( paste( "grp_week human density = ", gwHumanDensity, sep = "" ) )
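# A minimal sketch of the density calculation for an undirected graph without
# loops, on hypothetical toy counts: ties present divided by ties possible,
# where n nodes allow n * ( n - 1 ) / 2 ties.
exampleDensityNodes <- 5
exampleDensityTies <- 4
exampleDensityPossibleTies <- exampleDensityNodes * ( exampleDensityNodes - 1 ) / 2
exampleDensity <- exampleDensityTies / exampleDensityPossibleTies
message( paste( "example density = ", exampleDensity, sep = "" ) )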
#==============================================================================#
# output attributes to data frame
#==============================================================================#
# if you want to just work with the traits of the nodes/vertexes, you can
# combine the attribute vectors into a data frame.
# first, output igraph object to see what attributes you have
gwHumanNetworkIgraph
# then, combine them into a data frame.
gwHumanAttributeDF <- data.frame( id = V( gwHumanNetworkIgraph )$name,
person_id = V( gwHumanNetworkIgraph )$person_id,
person_type = V( gwHumanNetworkIgraph )$person_type,
degree = V( gwHumanNetworkIgraph )$degree,
transitivity = V( gwHumanNetworkIgraph )$transitivity,
degreeCentrality = V( gwHumanNetworkIgraph )$degreeCentrality,
betweenness = V( gwHumanNetworkIgraph )$betweenness )
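# A minimal sketch of an alternative to listing each attribute by hand,
# assuming the installed igraph provides as_data_frame(): it returns every
# vertex attribute as a column, one row per node. The toy graph and its
# attribute values are hypothetical.
exampleAttrGraph <- igraph::make_graph( c( 1, 2, 2, 3 ), directed = FALSE )
exampleAttrGraph <- igraph::set_vertex_attr( exampleAttrGraph, "person_type", value = c( "author", "source", "source" ) )
exampleAttrGraph <- igraph::set_vertex_attr( exampleAttrGraph, "degree", value = igraph::degree( exampleAttrGraph ) )
exampleAttrDF <- igraph::as_data_frame( exampleAttrGraph, what = "vertices" )
exampleAttrDF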
Save the entire workspace to an image file, in case we need it later.
In [38]:
message( paste( "workspace_file_name = ", workspace_file_name, sep = "" ) )
In [39]:
# help( save.image )
save.image( file = workspace_file_name )
message( paste( "saved workspace_file_name = ", workspace_file_name, sep = "" ) )
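# A minimal sketch of confirming that a saved workspace can be read back,
# using a temporary file and a hypothetical object so no real data is touched
# ( save() stands in for save.image() here ). Loading into a fresh environment
# lists the file's contents without overwriting objects in the session.
exampleSavePath <- tempfile( fileext = ".RData" )
exampleSavedValue <- 42
save( exampleSavedValue, file = exampleSavePath )
exampleCheckEnv <- new.env()
exampleLoadedNames <- load( exampleSavePath, envir = exampleCheckEnv )
message( paste( "objects in saved file: ", paste( exampleLoadedNames, collapse = ", " ), sep = "" ) )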