2017.12.02 - work log - prelim - R - statnet - grp_month - 0/1

1 R network analysis files
2 Setup
3 grp_month analysis
4 grp_week analysis
5 Compare grp_month and grp_week using QAP
- 5.1 month-to-week - automated
- 5.2 month-to-week - human
6 Save workspace image
7 TODO

R network analysis files


Related files:

network descriptives
- network-level
  - files
    - R scripts:
      - context_text/R/db_connect.r
      - context_text/R/sna/functions-sna.r
      - context_text/R/sna/sna-load_data.r
      - context_text/R/sna/igraph/*
      - context_text/R/sna/statnet/*
  - statnet/sna
    - sna::gden() - graph density
    - R scripts:
      - context_text/R/sna/statnet/sna-statnet-init.r
      - context_text/R/sna/statnet/sna-statnet-network-stats.r
      - context_text/R/sna/statnet/sna-qap.r
  - igraph
    - igraph::transitivity() - vector of transitivity scores for each node in a graph, plus network-level transitivity score.
      - Q - interpretation?
    - R scripts:
      - context_text/R/sna/statnet/sna-igraph-init.r
      - context_text/R/sna/statnet/sna-igraph-network-stats.r

Setup


Setup - working directories


Store important directories and file names in variables:



In [1]:

    
# code files (in particular SNA function library, modest though it may be)
code_directory <- "/home/jonathanmorgan/work/django/research/context_analysis/R/sna"
sna_function_file_path <- paste( code_directory, "/", 'functions-sna.r', sep = "" )

# home directory
home_directory <- getwd()
home_directory <- "/home/jonathanmorgan/work/django/research/work/phd_work/methods"

# data directories
data_directory <- paste( home_directory, "/data", sep = "" )
workspace_file_name <- "statnet-grp_month-01.RData"
# sep = "" is needed here; paste() otherwise puts spaces around the "/"
workspace_file_path <- paste( data_directory, "/", workspace_file_name, sep = "" )
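
As an aside, base R's file.path() joins path components with "/" and so sidesteps the fact that paste() inserts a space between its arguments unless sep = "" is given. A minimal sketch, using a made-up base directory (not the real one from this log):

```r
# hypothetical base directory, for illustration only
example_home_directory <- "/tmp/example_project"

# file.path() joins components with "/" (no sep argument needed)
example_data_directory <- file.path( example_home_directory, "data" )
example_workspace_path <- file.path( example_data_directory, "statnet-grp_month-01.RData" )

# for comparison: paste() without sep = "" puts spaces around the "/"
example_pasted_path <- paste( example_data_directory, "/", "statnet-grp_month-01.RData" )

print( example_workspace_path )
print( example_pasted_path )
```

Either approach works; file.path() just leaves less room for a stray separator.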



In [2]:

    
# set working directory
setwd( data_directory )
getwd()









    




'/home/jonathanmorgan/work/django/research/work/phd_work/data'

Setup - import SNA functions


Source the SNA function library, functions-sna.r.



In [3]:

    
source( sna_function_file_path )

Setup - network data - render and store network data


First, render the network data and upload it to your server.

Directions for rendering network data are in 2017.11.14-work_log-prelim-network_analysis.ipynb. You want a tab-delimited matrix that includes both the network and attributes of nodes as columns, and you want it to include a header row.

Once you render your network data files, you should place them on the server.

High-level data file layout:

- tab-delimited.
- first row and first column are labels.
- last 2 columns are traits of nodes (person_id and person_type).
- each row and column after the first, up to the trait columns, represents a person found in one of the articles.
- the people are in the same order from top to bottom and left to right.
- where the row and column of two people meet, and one of the people is an author, the number in the cell is the number of times the non-author was quoted in an article by the author. This does not include more basic two-mode co-location ties (appeared in the same article, even if not an author and/or not quoted).
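
The layout described above can be illustrated with a toy file. This is only a sketch: the labels (p1, p2, p3), ids and person_type values are made up, and the file is written to a temporary location rather than the server.

```r
# toy file in the layout described above (labels and values are made up)
toy_lines <- c(
    "id\tp1\tp2\tp3\tperson_id\tperson_type",
    "p1\t0\t2\t1\t101\t2",
    "p2\t2\t0\t0\t102\t3",
    "p3\t1\t0\t0\t103\t3"
)
toy_path <- tempfile( fileext = ".tab" )
writeLines( toy_lines, toy_path )

# first column becomes row names; check.names = FALSE keeps labels as-is
toy_df <- read.delim( toy_path, header = TRUE, row.names = 1, check.names = FALSE )

# the tie matrix is square, so it spans as many columns as there are rows
toy_n <- nrow( toy_df )
toy_ties <- as.matrix( toy_df[ , 1 : toy_n ] )

# whatever is left on the right side is the node traits
toy_traits <- toy_df[ , ( toy_n + 1 ) : ncol( toy_df ) ]

print( toy_ties )
print( toy_traits )
```

The same split (ties on the left, traits on the right) is what the cells below do with the real files.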

Files and their location on server:

data - grp_month


This is data from the Grand Rapids Press articles from December of 2009, coded by both humans and OpenCalais.

Files:

automated full month - sourcenet_data-20171205-022551-grp_month-automated.tab
automated week subset - sourcenet_data-20171206-031358-grp_month-automated-week_subset.tab
human full month - sourcenet_data-20171115-043102-grp_month-human.tab
human week subset - sourcenet_data-20171206-031319-grp_month-human-week_subset.tab

Location in Dropbox: Dropbox/academia/MSU/program_stuff/prelim_paper/data/network_analysis/2017.11.14/network/new_coders/grp_month

Location on server: /home/jonathanmorgan/work/django/research/work/phd_work/data/network/grp_month

Setup - load workspace (optional)


If you want, you can load this file's workspace, from a previous run:



In [4]:

    
# assumes that data_directory and workspace_file_name were set in the
#     working directories cell above.
setwd( data_directory )
load( workspace_file_name )
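
For reference, a small round trip with save() and load(), guarded by file.exists() so that a missing file does not stop the run. This is a sketch against a temporary file with a made-up object name; for the whole workspace, save.image() plays the role of save().

```r
# temporary file standing in for the real workspace file
example_workspace_path <- tempfile( fileext = ".RData" )

# save one object, remove it, then restore it only if the file exists
example_value <- c( 1, 2, 3 )
save( example_value, file = example_workspace_path )
rm( example_value )

if ( file.exists( example_workspace_path ) ) {
    # load() returns (invisibly) the names of the objects it restored
    loaded_names <- load( example_workspace_path )
} else {
    loaded_names <- character( 0 )
}

print( loaded_names )
```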

`grp_month` analysis


First, look at the shiny new month of data.

`grp_month` (gm) - automated - OpenCalais


First, we'll analyze the month of data coded by OpenCalais. Set up some variables to store where data is located:

`grp_month` (gm) - automated - Read data


Read in the data from the tab-delimited data file, then convert it into the data structures that the R SNA packages expect.



In [5]:

    
# initialize variables
gmAutomatedDataFolder <- "/home/jonathanmorgan/work/django/research/work/phd_work/data/network/grp_month"
gmAutomatedDataFile <- "sourcenet_data-20171205-022551-grp_month-automated.tab"
gmAutomatedDataPath <- paste( gmAutomatedDataFolder, "/", gmAutomatedDataFile, sep = "" )



In [6]:

    
gmAutomatedDataPath









    




'/home/jonathanmorgan/work/django/research/work/phd_work/data/network/grp_month/sourcenet_data-20171205-022551-grp_month-automated.tab'

Load the data file into memory



In [7]:

    
# tab-delimited:
gmAutomatedDataDF <- read.delim( gmAutomatedDataPath, header = TRUE, row.names = 1, check.names = FALSE )



In [8]:

    
# get count of rows...
gmAutomatedRowCount <- nrow( gmAutomatedDataDF )
paste( "grp_month automated row count = ", gmAutomatedRowCount, sep = "" )

# ...and columns
gmAutomatedColumnCount <- ncol( gmAutomatedDataDF )
paste( "grp_month automated column count = ", gmAutomatedColumnCount, sep = "" )









    




'grp_month automated row count = 1167'






    




'grp_month automated column count = 1169'

Get just the tie rows and columns for initializing network libraries.



In [9]:

    
# the below syntax returns only as many columns as there are rows, so
#     omitting any trait columns that lie in columns on the right side
#     of the file.
gmAutomatedNetworkDF <- gmAutomatedDataDF[ , 1 : gmAutomatedRowCount ]
#str( gmAutomatedNetworkDF )



In [10]:

    
# convert to a matrix
gmAutomatedNetworkMatrix <- as.matrix( gmAutomatedNetworkDF )
# str( gmAutomatedNetworkMatrix )



In [11]:

    
# for all values greater than 1, reset their values to 1
gmAutomatedNetworkMatrix[ gmAutomatedNetworkMatrix > 1 ] <- 1
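
To see what this step does, here is the same logical-index assignment on a made-up 3 x 3 weighted matrix: every count above 1 collapses to 1, zeros stay zero, so only the presence or absence of a tie survives.

```r
# toy weighted tie matrix (values are made up)
toy_weighted <- matrix( c( 0, 3, 1,
                           3, 0, 0,
                           1, 0, 0 ), nrow = 3, byrow = TRUE )

# copy, then reset every value greater than 1 to 1
toy_binary <- toy_weighted
toy_binary[ toy_binary > 1 ] <- 1

print( toy_binary )
```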

`grp_month` (gm) - automated - initialize statnet


First, load the statnet package, then load the automated grp_month data into a statnet network object and assign attributes to nodes.

Based on context_text/R/sna/statnet/sna-statnet-init.r.



In [12]:

    
# make sure you've loaded the statnet library
# install.packages( "statnet" )
library( statnet )









    



Loading required package: tergm, statnet.common, ergm, network, networkDynamic, ergm.count, sna

Attaching package: ‘statnet.common’
The following object is masked from ‘package:base’: order

Package versions reported at load:
- network 1.13.0 (2015-08-31)
- ergm 3.8.0 (2017-08-18)
- networkDynamic 0.9.0 (2016-01-12)
- tergm 3.4.1 (2017-09-12)
- ergm.count 3.2.2 (2016-03-29)
- sna 2.4 (2016-07-23)
- statnet 2016.9 (2016-08-29)



In [13]:

    
# If you have a data frame of attributes (each attribute is a column, with
#     attribute name the column name), you can associate those attributes
#     when you create the network.
# attribute help: http://www.inside-r.org/packages/cran/network/docs/loading.attributes

# load attributes from a file:
#tab_attribute_test1 <- read.delim( "tab-test1-attribute_data.txt", header = TRUE, row.names = 1, check.names = FALSE )

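# or create DataFrame by just grabbing the attribute columns (the columns
#     to the right of the square tie matrix: 1168 and 1169 here)
gmAutomatedNetworkAttributeDF <- gmAutomatedDataDF[ , ( gmAutomatedRowCount + 1 ) : gmAutomatedColumnCount ]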

# convert matrix to statnet network object instance.
gmAutomatedNetworkStatnet <- network( gmAutomatedNetworkMatrix, matrix.type = "adjacency", directed = FALSE, vertex.attr = gmAutomatedNetworkAttributeDF )

# look at information now.
gmAutomatedNetworkStatnet









    





 Network attributes:
  vertices = 1167 
  directed = FALSE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 1152 
    missing edges= 0 
    non-missing edges= 1152 

 Vertex attribute names: 
    person_id person_type vertex.names 

 Edge attribute names not shown



In [5]:

    
# NOTE: gmAutomatedNetworkMatrix was binarized above (values > 1 reset to 1),
#     so these statistics are computed on 0/1 ties.

# automated - include ties Greater than or equal to 0 (GE0)
gmAutomatedMeanTieWeightGE0Vector <- apply( gmAutomatedNetworkMatrix, 1, calculateListMean )
gmAutomatedDataDF$meanTieWeightGE0 <- gmAutomatedMeanTieWeightGE0Vector

# automated - include ties Greater than or equal to 1 (GE1)
gmAutomatedMeanTieWeightGE1Vector <- apply( gmAutomatedNetworkMatrix, 1, calculateListMean, minValueToIncludeIN = 1 )
gmAutomatedDataDF$meanTieWeightGE1 <- gmAutomatedMeanTieWeightGE1Vector

# automated - Max tie weight?
gmAutomatedMaxTieWeightVector <- apply( gmAutomatedNetworkMatrix, 1, calculateListMax )
gmAutomatedDataDF$maxTieWeight <- gmAutomatedMaxTieWeightVector
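
The helpers called here live in functions-sna.r and are not shown in this log. The sketch below is only a guess at what a thresholded row mean could look like, based on the parameter name minValueToIncludeIN; exampleListMean is a hypothetical stand-in, and the real calculateListMean() may behave differently.

```r
# hypothetical stand-in; the real calculateListMean() is defined in
#     functions-sna.r and may differ.
exampleListMean <- function( valuesIN, minValueToIncludeIN = 0 ) {

    # keep only values at or above the threshold
    keptValues <- valuesIN[ valuesIN >= minValueToIncludeIN ]

    # nothing left to average
    if ( length( keptValues ) == 0 ) {
        return( NA_real_ )
    }

    return( mean( keptValues ) )
}

# toy weighted tie matrix (values are made up)
toy_matrix <- matrix( c( 0, 3, 1,
                         3, 0, 0,
                         1, 0, 0 ), nrow = 3, byrow = TRUE )

# apply() with margin 1 calls the function once per row
toy_mean_ge0 <- apply( toy_matrix, 1, exampleListMean )
toy_mean_ge1 <- apply( toy_matrix, 1, exampleListMean, minValueToIncludeIN = 1 )
toy_max <- apply( toy_matrix, 1, max )

print( toy_mean_ge0 )
print( toy_mean_ge1 )
print( toy_max )
```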

`grp_month` (gm) - automated - Basic metrics




In [14]:

    
# assuming that our statnet network object is in gmAutomatedNetworkStatnet.

# Use the degree function in the sna package to create vector of degree values
#    for each node.  Make sure to pass the gmode parameter to tell it that the
#    graph is not directed (gmode = "graph", instead of "digraph").
# Doc: http://www.inside-r.org/packages/cran/sna/docs/degree
#degree_vector <- degree( gmAutomatedNetworkStatnet, gmode = "graph" )

# If you have other libraries loaded that also implement a degree function, you
#    can also call this with package name:
gmAutomatedDegreeVector <- sna::degree( gmAutomatedNetworkStatnet, gmode = "graph" )

# output the vector
gmAutomatedDegreeVector

# want more info on the degree function?  You can get to it eventually through
#    the following:
#help( package = "sna" )
#??sna::degree

# what is the average (mean) degree?
gmAutomatedAvgDegree <- mean( gmAutomatedDegreeVector )
paste( "average degree = ", gmAutomatedAvgDegree, sep = "" )

# subset vector to get only those that are above mean
gmAutomatedAboveMeanVector <- gmAutomatedDegreeVector[ gmAutomatedDegreeVector > gmAutomatedAvgDegree ]

# Take the degree and associate it with each node as a node attribute.
#    (%v% is a shortcut for get.vertex.attribute; in assignment form, as
#    here, it calls set.vertex.attribute)
gmAutomatedNetworkStatnet %v% "degree" <- gmAutomatedDegreeVector

# also add degree vector to original data frame
gmAutomatedDataDF$degree <- gmAutomatedDegreeVector









    





	30
	1
	39
	1
	34
	1
	1
	50
	26
	1
	29
	1
	47
	1
	1
	93
	46
	29
	47
	71
	2
	9
	1
	1
	2
	1
	35
	0
	2
	2
	2
	2
	1
	1
	1
	1
	1
	42
	1
	0
	1
	2
	2
	1
	5
	1
	0
	1
	43
	5
	1
	14
	1
	1
	2
	2
	1
	2
	1
	1
	1
	1
	1
	1
	0
	1
	0
	2
	4
	1
	1
	32
	1
	1
	3
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	7
	1
	1
	1
	9
	2
	1
	1
	1
	1
	1
	2
	5
	0
	0
	0
	0
	1
	1
	1
	0
	1
	1
	2
	0
	2
	2
	1
	1
	1
	1
	1
	2
	1
	1
	1
	1
	1
	0
	1
	1
	35
	1
	1
	1
	1
	1
	1
	0
	1
	1
	19
	1
	1
	1
	1
	1
	1
	27
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	4
	1
	1
	32
	49
	1
	2
	0
	1
	1
	44
	1
	1
	1
	1
	1
	1
	1
	2
	1
	1
	2
	1
	1
	4
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	2
	2
	2
	0
	2
	2
	1
	1
	1
	1
	1
	4
	1
	1
	1
	0
	1
	1
	1
	1
	2
	1
	15
	1
	0
	1
	1
	2
	2
	2
	2
	2
	2
	4
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	0
	2
	2
	2
	2
	1
	1
	2
	1
	1
	1
	1
	1
	1
	1
	2
	1
	0
	2
	0
	1
	2
	3
	1
	1
	1
	1
	0
	1
	1
	1
	0
	1
	72
	22
	46
	3
	1
	1
	1
	10
	8
	2
	1
	1
	2
	1
	1
	2
	1
	2
	2
	1
	1
	1
	1
	0
	0
	0
	1
	2
	1
	1
	1
	1
	1
	1
	2
	1
	0
	1
	1
	1
	1
	1
	1
	0
	1
	1
	0
	1
	1
	0
	0
	1
	1
	0
	0
	1
	1
	1
	3
	2
	15
	1
	1
	1
	1
	0
	2
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	2
	1
	2
	1
	1
	1
	0
	1
	0
	0
	1
	1
	1
	1
	1
	1
	2
	1
	0
	1
	7
	1
	1
	1
	1
	1
	1
	1
	0
	1
	0
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	3
	2
	4
	2
	1
	2
	2
	1
	1
	1
	1
	1
	0
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	0
	1
	1
	1
	1
	2
	1
	4
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	2
	1
	0
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	4
	1
	1
	1
	1
	1
	0
	1
	5
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	2
	2
	0
	2
	2
	0
	1
	1
	1
	0
	0
	1
	1
	1
	1
	1
	1
	1
	1
	4
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	0
	0
	1
	1
	1
	1
	1
	1
	1
	0
	0
	1
	0
	0
	1
	0
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	2
	1
	0
	1
	0
	1
	1
	1
	0
	1
	1
	1
	1
	1
	0
	2
	2
	2
	2
	2
	1
	1
	1
	2
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	2
	1
	1
	1
	1
	1
	1
	1
	2
	1
	1
	1
	0
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	0
	1
	1
	1
	1
	0
	1
	0
	0
	1
	1
	0
	1
	3
	2
	1
	1
	1
	2
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	2
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	0
	0
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	0
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	0
	0
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	0
	1
	1
	0
	1
	1
	1
	1
	0
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	6
	1
	1
	1
	1
	2
	1
	1
	1
	1
	1
	2
	2
	0
	2
	2
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	2
	0
	2
	2
	2
	1
	0
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	2
	2
	2
	2
	2
	2
	2
	1
	1
	1
	0
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	2
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	2
	1
	1
	2
	0
	1
	0
	1








    




'average degree = 1.97429305912596'
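
A quick base R cross-check of the number above. For an undirected 0/1 matrix, each node's degree is its row sum, and the mean degree equals twice the edge count divided by the node count. The sketch assumes that sna::degree() with gmode = "graph" counts each tie once per endpoint, which is consistent with the logged output; the toy matrix is made up.

```r
# toy symmetric 0/1 matrix (values are made up)
toy_binary <- matrix( c( 0, 1, 1,
                         1, 0, 0,
                         1, 0, 0 ), nrow = 3, byrow = TRUE )

# degree per node = row sum; each undirected tie appears once above the diagonal
toy_degree <- rowSums( toy_binary )
toy_edge_count <- sum( toy_binary[ upper.tri( toy_binary ) ] )
toy_mean_degree <- mean( toy_degree )

# the same identity applied to the counts reported in this log
logged_vertex_count <- 1167
logged_edge_count <- 1152
logged_mean_degree <- 2 * logged_edge_count / logged_vertex_count

print( toy_mean_degree )
print( logged_mean_degree )
```

With 1167 vertices and 1152 edges the identity gives the same mean degree that the cell reports.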



In [15]:

    
# average author degree (person types 2 and 4)
gmAutomatedAverageAuthorDegree2And4 <- calcAuthorMeanDegree( dataFrameIN = gmAutomatedDataDF, includeBothIN = TRUE )
paste( "average author degree (2 and 4) = ", gmAutomatedAverageAuthorDegree2And4, sep = "" )

# average author degree (person type 2 only)
gmAutomatedAverageAuthorDegreeOnly2 <- calcAuthorMeanDegree( dataFrameIN = gmAutomatedDataDF, includeBothIN = FALSE )
paste( "average author degree (only 2) = ", gmAutomatedAverageAuthorDegreeOnly2, sep = "" )

# average source degree (person types 3 and 4)
gmAutomatedAverageSourceDegree3And4 <- calcSourceMeanDegree( dataFrameIN = gmAutomatedDataDF, includeBothIN = TRUE )
paste( "average source degree (3 and 4) = ", gmAutomatedAverageSourceDegree3And4, sep = "" )

# average source degree (person type 3 only)
gmAutomatedAverageSourceDegreeOnly3 <- calcSourceMeanDegree( dataFrameIN = gmAutomatedDataDF, includeBothIN = FALSE )
paste( "average source degree (only 3) = ", gmAutomatedAverageSourceDegreeOnly3, sep = "" )









    




'average author degree (2 and 4) = 24.7872340425532'






    




'average author degree (only 2) = 24.8478260869565'






    




'average source degree (3 and 4) = 1.161'






    




'average source degree (only 3) = 1.14014014014014'
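
calcAuthorMeanDegree() and calcSourceMeanDegree() come from functions-sna.r and are not shown here. As a rough sketch of the idea only, the mean degree for a set of person types can be computed by filtering on person_type. The function name exampleMeanDegreeForTypes and the toy data are hypothetical, and the type codes (2 = author, 3 = source, 4 = both) are assumed from the comments in the cell above.

```r
# toy node table (ids, types and degrees are made up)
toy_nodes <- data.frame(
    person_id = c( 101, 102, 103, 104 ),
    person_type = c( 2, 3, 3, 4 ),
    degree = c( 3, 1, 1, 2 )
)

# hypothetical helper: mean degree over rows whose person_type is in typesIN
exampleMeanDegreeForTypes <- function( dataFrameIN, typesIN ) {
    matchingRows <- dataFrameIN[ dataFrameIN$person_type %in% typesIN, ]
    return( mean( matchingRows$degree ) )
}

toy_author_mean_2_and_4 <- exampleMeanDegreeForTypes( toy_nodes, c( 2, 4 ) )
toy_author_mean_only_2 <- exampleMeanDegreeForTypes( toy_nodes, c( 2 ) )
toy_source_mean_3_and_4 <- exampleMeanDegreeForTypes( toy_nodes, c( 3, 4 ) )

print( toy_author_mean_2_and_4 )
print( toy_author_mean_only_2 )
print( toy_source_mean_3_and_4 )
```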

`grp_month` (gm) - automated - More metrics


Now that we have the data in a statnet network object, run the code in the following file for more in-depth information:

context_text/R/sna/statnet/sna-statnet-network-stats.r



In [16]:

    
# Links:
# - manual (PDF): http://cran.r-project.org/web/packages/sna/sna.pdf
# - good notes: http://www.shizukalab.com/toolkits/sna/node-level-calculations

# Also, be advised that statnet and igraph don't really play nice together.
#    If you'll be using both, best idea is to have a workspace for each.

#==============================================================================#
# statnet
#==============================================================================#

# make sure you've loaded the statnet library (includes sna)
# install.packages( "statnet" )
#library( statnet )

#==============================================================================#
# NODE level
#==============================================================================#

# what is the standard deviation of the degrees?
gmAutomatedDegreeSd <- sd( gmAutomatedDegreeVector )
paste( "degree SD = ", gmAutomatedDegreeSd, sep = "" )

# what is the variance of the degrees?
gmAutomatedDegreeVar <- var( gmAutomatedDegreeVector )
paste( "degree variance = ", gmAutomatedDegreeVar, sep = "" )

# what is the max value among the degrees?
gmAutomatedDegreeMax <- max( gmAutomatedDegreeVector )
paste( "degree max = ", gmAutomatedDegreeMax, sep = "" )

# calculate and plot degree distributions
gmAutomatedDegreeFrequenciesTable <- table( gmAutomatedDegreeVector )
paste( "degree frequencies = ", gmAutomatedDegreeFrequenciesTable, sep = "" )
gmAutomatedDegreeFrequenciesTable

# node-level undirected betweenness
gmAutomatedBetweenness <- sna::betweenness( gmAutomatedNetworkStatnet, gmode = "graph", cmode = "undirected" )

#paste( "betweenness = ", gmAutomatedBetweenness, sep = "" )
# associate with each node as a node attribute.
#    (%v% in assignment form is a shortcut for set.vertex.attribute)
gmAutomatedNetworkStatnet %v% "betweenness" <- gmAutomatedBetweenness

# also add betweenness vector to original data frame
gmAutomatedDataDF$betweenness <- gmAutomatedBetweenness

#==============================================================================#
# NETWORK level
#==============================================================================#

# graph-level degree centralization
gmAutomatedDegreeCentrality <- sna::centralization( gmAutomatedNetworkStatnet, sna::degree, mode = "graph" )
paste( "degree centrality = ", gmAutomatedDegreeCentrality, sep = "" )

# graph-level betweenness centralization
gmAutomatedBetweennessCentrality <- sna::centralization( gmAutomatedNetworkStatnet, sna::betweenness, mode = "graph", cmode = "undirected" )
paste( "betweenness centrality = ", gmAutomatedBetweennessCentrality, sep = "" )

# graph-level connectedness
gmAutomatedConnectedness <- sna::connectedness( gmAutomatedNetworkStatnet )
paste( "connectedness = ", gmAutomatedConnectedness, sep = "" )

# graph-level transitivity
gmAutomatedTransitivity <- sna::gtrans( gmAutomatedNetworkStatnet, mode = "graph" )
paste( "transitivity = ", gmAutomatedTransitivity, sep = "" )

# graph-level density
gmAutomatedDensity <- sna::gden( gmAutomatedNetworkStatnet, mode = "graph" )
paste( "density = ", gmAutomatedDensity, sep = "" )









    




'degree SD = 6.42460087405331'






    




'degree variance = 41.2754963908866'






    




'degree max = 93'






    





	'degree frequencies = 122'
	'degree frequencies = 891'
	'degree frequencies = 100'
	'degree frequencies = 6'
	'degree frequencies = 9'
	'degree frequencies = 4'
	'degree frequencies = 1'
	'degree frequencies = 2'
	'degree frequencies = 1'
	'degree frequencies = 2'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 2'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 2'
	'degree frequencies = 1'
	'degree frequencies = 2'
	'degree frequencies = 1'
	'degree frequencies = 2'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 2'
	'degree frequencies = 2'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 1'








    





gmAutomatedDegreeVector
  0   1   2   3   4   5   6   7   8   9  10  14  15  19  22  26  27  29  30  32 
122 891 100   6   9   4   1   2   1   2   1   1   2   1   1   1   1   2   1   2 
 34  35  39  42  43  44  46  47  49  50  71  72  93 
  1   2   1   1   1   1   2   2   1   1   1   1   1 






    




'degree centrality = 0.0782006640213782'






    




'betweenness centrality = 0.206660606881935'






    




'connectedness = 0.588270050752468'






    



Warning message in sna::gtrans(gmAutomatedNetworkStatnet, mode = "graph"):
“gtrans called with use.adjacency=TRUE, but your data looks too large for that to work well.  Overriding to edgelist method.”





    




'transitivity = 0.00893353450329548'






    




'density = 0.00169321874710632'
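
Several of the values above can be cross-checked in base R from the degree frequency table alone. The sketch below rebuilds the degree vector from the printed table (node order is lost, but these statistics do not depend on it). It assumes the standard undirected density, edges / choose( n, 2 ), and Freeman's degree centralization for an undirected graph, sum( max - degree ) / ( ( n - 1 )( n - 2 ) ); both reproduce the logged values.

```r
# degree values and their frequencies, copied from the table printed above
logged_degree_values <- c( 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 14, 15, 19, 22,
                           26, 27, 29, 30, 32, 34, 35, 39, 42, 43, 44, 46,
                           47, 49, 50, 71, 72, 93 )
logged_degree_counts <- c( 122, 891, 100, 6, 9, 4, 1, 2, 1, 2, 1, 1, 2, 1, 1,
                           1, 1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 2,
                           2, 1, 1, 1, 1, 1 )

# one entry per node (order of nodes is not recoverable from the table)
rebuilt_degrees <- rep( logged_degree_values, times = logged_degree_counts )

node_count <- length( rebuilt_degrees )
edge_count <- sum( rebuilt_degrees ) / 2

# spread of the degree distribution
rebuilt_variance <- var( rebuilt_degrees )
rebuilt_sd <- sd( rebuilt_degrees )

# undirected density: observed edges over possible edges
rebuilt_density <- edge_count / choose( node_count, 2 )

# Freeman degree centralization for an undirected graph
rebuilt_centralization <- sum( max( rebuilt_degrees ) - rebuilt_degrees ) /
    ( ( node_count - 1 ) * ( node_count - 2 ) )

print( c( node_count, edge_count ) )
print( c( rebuilt_sd, rebuilt_variance ) )
print( rebuilt_density )
print( rebuilt_centralization )
```

Betweenness, connectedness and transitivity depend on the actual tie structure, so they cannot be checked this way.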

`grp_month` (gm) - automated - create node attribute DataFrame


If you want to just work with the traits of the nodes/vertexes, you can combine the attribute vectors into a data frame.



In [17]:

    
#==============================================================================#
# output attributes to data frame
#==============================================================================#

# if you want to just work with the traits of the nodes/vertexes, you can
#    combine the attribute vectors into a data frame.

# first, output network object to see what attributes you have
gmAutomatedNetworkStatnet

# then, combine them into a data frame.
gmAutomatedNodeAttrDF <- data.frame( id = gmAutomatedNetworkStatnet %v% "vertex.names",
                                     person_id = gmAutomatedNetworkStatnet %v% "person_id",
                                     person_type = gmAutomatedNetworkStatnet %v% "person_type",
                                     degree = gmAutomatedNetworkStatnet %v% "degree",
                                     betweenness = gmAutomatedNetworkStatnet %v% "betweenness" )









    





 Network attributes:
  vertices = 1167 
  directed = FALSE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 1152 
    missing edges= 0 
    non-missing edges= 1152 

 Vertex attribute names: 
    betweenness degree person_id person_type vertex.names 

 Edge attribute names not shown
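
Once the attributes are in a data frame, ordinary data frame tools apply. A small sketch on made-up data with the same column names: sort by degree to find the best-connected nodes, and summarize mean degree per person_type. base::order is spelled out because statnet.common masks order when statnet is attached.

```r
# toy node attribute table (all values are made up)
toy_node_attr_df <- data.frame(
    id = c( "a", "b", "c", "d" ),
    person_type = c( 2, 3, 3, 4 ),
    degree = c( 3, 1, 1, 2 ),
    betweenness = c( 2, 0, 0, 0.5 ),
    stringsAsFactors = FALSE
)

# highest-degree nodes first
toy_sorted <- toy_node_attr_df[ base::order( -toy_node_attr_df$degree ), ]
toy_top_two <- head( toy_sorted, 2 )

# mean degree per person_type
toy_type_summary <- aggregate( degree ~ person_type, data = toy_node_attr_df, FUN = mean )

print( toy_top_two )
print( toy_type_summary )
```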

`grp_month` (gm) - human


Next, we'll analyze the month of data coded by human coders. Set up some variables to store where data is located:

`grp_month` (gm) - human - Read data


Read in the data from the tab-delimited data file, then convert it into the data structures that the R SNA packages expect.



In [18]:

    
# initialize variables
gmHumanDataFolder <- "/home/jonathanmorgan/work/django/research/work/phd_work/data/network/grp_month"
gmHumanDataFile <- "sourcenet_data-20171115-043102-grp_month-human.tab"
gmHumanDataPath <- paste( gmHumanDataFolder, "/", gmHumanDataFile, sep = "" )



In [19]:

    
gmHumanDataPath









    




'/home/jonathanmorgan/work/django/research/work/phd_work/data/network/grp_month/sourcenet_data-20171115-043102-grp_month-human.tab'

Load the data file into memory



In [20]:

    
# tab-delimited:
gmHumanDataDF <- read.delim( gmHumanDataPath, header = TRUE, row.names = 1, check.names = FALSE )



In [21]:

    
# get count of rows...
gmHumanRowCount <- nrow( gmHumanDataDF )
paste( "grp_month human row count = ", gmHumanRowCount, sep = "" )

# ...and columns
gmHumanColumnCount <- ncol( gmHumanDataDF )
paste( "grp_month human column count = ", gmHumanColumnCount, sep = "" )



'grp_month human row count = 1167'

'grp_month human column count = 1169'

Get just the tie rows and columns for initializing network libraries.



In [22]:

    
# the below syntax returns only as many columns as there are rows, so
#     omitting any trait columns that lie in columns on the right side
#     of the file.
gmHumanNetworkDF <- gmHumanDataDF[ , 1 : gmHumanRowCount ]
#str( gmHumanNetworkDF )



In [23]:

    
# convert to a matrix
gmHumanNetworkMatrix <- as.matrix( gmHumanNetworkDF )
# str( gmHumanNetworkMatrix )



In [24]:

    
# for all values greater than 1, reset their values to 1
gmHumanNetworkMatrix[ gmHumanNetworkMatrix > 1 ] <- 1

`grp_month` (gm) - human - initialize statnet


First, load the statnet package, then load the human-coded grp_month data into a statnet network object and assign attributes to nodes.

Based on context_text/R/sna/statnet/sna-statnet-init.r.



In [25]:

    
# make sure you've loaded the statnet library
# install.packages( "statnet" )
library( statnet )



In [26]:

    
# If you have a data frame of attributes (each attribute is a column, with
#     attribute name the column name), you can associate those attributes
#     when you create the network.
# attribute help: http://www.inside-r.org/packages/cran/network/docs/loading.attributes

# load attributes from a file:
#tab_attribute_test1 <- read.delim( "tab-test1-attribute_data.txt", header = TRUE, row.names = 1, check.names = FALSE )

# or create DataFrame by just grabbing the attribute columns (the columns
#     to the right of the square tie matrix: 1168 and 1169 here)
gmHumanNetworkAttributeDF <- gmHumanDataDF[ , ( gmHumanRowCount + 1 ) : gmHumanColumnCount ]

# convert matrix to statnet network object instance.
gmHumanNetworkStatnet <- network( gmHumanNetworkMatrix, matrix.type = "adjacency", directed = FALSE, vertex.attr = gmHumanNetworkAttributeDF )

# look at information now.
gmHumanNetworkStatnet









    





 Network attributes:
  vertices = 1167 
  directed = FALSE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 1201 
    missing edges= 0 
    non-missing edges= 1201 

 Vertex attribute names: 
    person_id person_type vertex.names 

 Edge attribute names not shown



In [6]:

    
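# NOTE: gmHumanNetworkMatrix was binarized above (values > 1 reset to 1), so
#     these statistics are computed on 0/1 ties.

# human - include ties Greater than or equal to 0 (GE0)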
gmHumanMeanTieWeightGE0Vector <- apply( gmHumanNetworkMatrix, 1, calculateListMean )
gmHumanDataDF$meanTieWeightGE0 <- gmHumanMeanTieWeightGE0Vector

# human - include ties Greater than or equal to 1 (GE1)
gmHumanMeanTieWeightGE1Vector <- apply( gmHumanNetworkMatrix, 1, calculateListMean, minValueToIncludeIN = 1 )
gmHumanDataDF$meanTieWeightGE1 <- gmHumanMeanTieWeightGE1Vector

# human - Max tie weight?
gmHumanMaxTieWeightVector <- apply( gmHumanNetworkMatrix, 1, calculateListMax )
gmHumanDataDF$maxTieWeight <- gmHumanMaxTieWeightVector

`grp_month` (gm) - human - Basic metrics




In [27]:

    
# assuming that our statnet network object is in gmHumanNetworkStatnet.

# Use the degree function in the sna package to create vector of degree values
#    for each node.  Make sure to pass the gmode parameter to tell it that the
#    graph is not directed (gmode = "graph", instead of "digraph").
# Doc: http://www.inside-r.org/packages/cran/sna/docs/degree
#degree_vector <- degree( gmHumanNetworkStatnet, gmode = "graph" )

# If you have other libraries loaded that also implement a degree function, you
#    can also call this with package name:
gmHumanDegreeVector <- sna::degree( gmHumanNetworkStatnet, gmode = "graph" )

# output the vector
gmHumanDegreeVector

# want more info on the degree function?  You can get to it eventually through
#    the following:
#help( package = "sna" )
#??sna::degree

# what is the average (mean) degree?
gmHumanAvgDegree <- mean( gmHumanDegreeVector )
paste( "average degree = ", gmHumanAvgDegree, sep = "" )

# subset vector to get only those that are above mean
gmHumanAboveMeanVector <- gmHumanDegreeVector[ gmHumanDegreeVector > gmHumanAvgDegree ]

# Take the degree and associate it with each node as a node attribute.
#    (%v% reads a vertex attribute, like get.vertex.attribute; the assignment
#    form used here sets it, like set.vertex.attribute)
gmHumanNetworkStatnet %v% "degree" <- gmHumanDegreeVector

# also add degree vector to original data frame
gmHumanDataDF$degree <- gmHumanDegreeVector









    





	28
	1
	36
	1
	34
	0
	1
	50
	28
	1
	32
	1
	61
	1
	1
	99
	44
	31
	47
	66
	2
	6
	1
	1
	2
	1
	38
	2
	2
	2
	2
	2
	1
	1
	1
	1
	1
	41
	1
	1
	1
	2
	1
	1
	7
	1
	0
	1
	46
	4
	1
	19
	3
	1
	2
	2
	1
	1
	1
	1
	0
	1
	1
	1
	1
	0
	1
	2
	3
	1
	1
	32
	1
	1
	3
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	7
	1
	1
	1
	9
	2
	1
	1
	1
	1
	1
	2
	7
	1
	0
	0
	0
	1
	1
	1
	1
	1
	1
	2
	2
	2
	2
	1
	1
	1
	1
	1
	2
	1
	1
	1
	1
	1
	1
	1
	1
	37
	1
	1
	1
	1
	1
	2
	1
	1
	1
	19
	1
	1
	1
	1
	1
	1
	33
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	4
	1
	1
	31
	45
	1
	2
	1
	1
	1
	44
	1
	1
	1
	1
	1
	1
	1
	2
	1
	1
	2
	1
	1
	3
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	2
	4
	4
	4
	4
	4
	1
	1
	1
	1
	1
	4
	1
	1
	1
	1
	1
	1
	1
	1
	2
	1
	19
	1
	1
	1
	1
	2
	2
	2
	3
	2
	2
	4
	1
	0
	1
	1
	2
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	3
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	2
	2
	2
	3
	1
	1
	2
	1
	1
	1
	1
	1
	1
	1
	4
	1
	1
	2
	1
	1
	2
	4
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	76
	22
	50
	3
	1
	1
	1
	13
	9
	3
	1
	2
	3
	1
	1
	2
	1
	2
	2
	1
	0
	1
	0
	0
	0
	0
	1
	1
	0
	1
	0
	0
	1
	0
	2
	0
	1
	1
	1
	0
	0
	1
	1
	0
	0
	1
	0
	1
	1
	0
	10
	0
	1
	1
	1
	1
	1
	1
	3
	2
	14
	2
	1
	0
	1
	1
	2
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	1
	2
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	2
	1
	1
	1
	7
	1
	1
	0
	1
	1
	1
	0
	0
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	3
	2
	4
	2
	1
	2
	2
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	1
	4
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	4
	0
	1
	1
	1
	1
	0
	1
	5
	1
	1
	1
	1
	1
	1
	3
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	2
	2
	2
	2
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	4
	1
	2
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	2
	2
	2
	2
	2
	1
	1
	1
	2
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	1
	1
	1
	0
	1
	1
	1
	2
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	3
	2
	1
	1
	1
	2
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	0
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	0
	0
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	5
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	2
	0
	2
	2
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	2
	2
	2
	2
	1
	1
	1
	1
	1
	1
	1
	1
	0
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	0
	2
	2
	2
	2
	2
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	0
	0
	0
	0
	1
	0
	1
	0
	0
	0
	1
	0
	0
	0
	0
	0
	0
	0
	0
	0
	1
	0
	0
	0
	0
	0
	0
	0
	0
	2
	0
	1
	0
	0
	1
	1
	1
	1








    




'average degree = 2.05826906598115'
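
As a quick sanity check on the value above: in an undirected network without loops, every edge adds one to the degree of each of its two endpoints, so the mean degree should equal 2 * edges / vertices. The sketch below shows this on a toy matrix (all `toy*` names are made up for the example) and then applies the same identity to the 1201 edges and 1167 vertices printed for the human-coded network.

```r
# toy undirected adjacency matrix (symmetric, zero diagonal)
toyAdjacency <- matrix( c( 0, 1, 1, 0,
                           1, 0, 0, 0,
                           1, 0, 0, 0,
                           0, 0, 0, 0 ), nrow = 4, byrow = TRUE )

# in a binary undirected network, the degree of each node is its row sum
toyDegreeVector <- rowSums( toyAdjacency )

# count each edge once by looking only above the diagonal
toyEdgeCount <- sum( toyAdjacency[ upper.tri( toyAdjacency ) ] )

# the two ways of getting the mean degree should agree
toyMeanDegree <- mean( toyDegreeVector )
toyMeanDegreeFromEdges <- 2 * toyEdgeCount / nrow( toyAdjacency )

# same identity applied to the counts printed for the human-coded network
reportedMeanDegree <- 2 * 1201 / 1167
```

`reportedMeanDegree` should agree with the average degree printed above to within floating-point precision.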



In [28]:

    
# average author degree (person types 2 and 4)
gmHumanAverageAuthorDegree2And4 <- calcAuthorMeanDegree( dataFrameIN = gmHumanDataDF, includeBothIN = TRUE )
paste( "average author degree (2 and 4) = ", gmHumanAverageAuthorDegree2And4, sep = "" )

# average author degree (person type 2 only)
gmHumanAverageAuthorDegreeOnly2 <- calcAuthorMeanDegree( dataFrameIN = gmHumanDataDF, includeBothIN = FALSE )
paste( "average author degree (only 2) = ", gmHumanAverageAuthorDegreeOnly2, sep = "" )

# average source degree (person types 3 and 4)
gmHumanAverageSourceDegree3And4 <- calcSourceMeanDegree( dataFrameIN = gmHumanDataDF, includeBothIN = TRUE )
paste( "average source degree (3 and 4) = ", gmHumanAverageSourceDegree3And4, sep = "" )

# average source degree (person type 3 only)
gmHumanAverageSourceDegreeOnly3 <- calcSourceMeanDegree( dataFrameIN = gmHumanDataDF, includeBothIN = FALSE )
paste( "average source degree (only 3) = ", gmHumanAverageSourceDegreeOnly3, sep = "" )









    




'average author degree (2 and 4) = 25.3958333333333'






    




'average author degree (only 2) = 25.3958333333333'






    




'average source degree (3 and 4) = 1.1564027370479'






    




'average source degree (only 3) = 1.1564027370479'
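
`calcAuthorMeanDegree` and `calcSourceMeanDegree` are defined in functions-sna.r and not shown here. A minimal sketch of the idea, assuming person types are stored as numeric codes in a `person_type` column next to a `degree` column, could look like this. `meanDegreeForTypes` and the toy data frame are hypothetical, not the library's code.

```r
# Hypothetical helper (not the code in functions-sna.r): mean degree over the
#    rows of dataFrameIN whose person_type is one of typesIN.
meanDegreeForTypes <- function( dataFrameIN, typesIN ) {
    matchingRows <- dataFrameIN[ dataFrameIN$person_type %in% typesIN, ]
    mean( matchingRows$degree )
}

# toy data: one type-2 person, three type-3 people, one type-4 person
toyPeopleDF <- data.frame( person_type = c( 2, 3, 3, 4, 3 ),
                           degree = c( 10, 1, 2, 6, 1 ) )

toyAuthorMean2And4 <- meanDegreeForTypes( toyPeopleDF, c( 2, 4 ) )  # ( 10 + 6 ) / 2
toyAuthorMeanOnly2 <- meanDegreeForTypes( toyPeopleDF, 2 )          # 10
toySourceMean3And4 <- meanDegreeForTypes( toyPeopleDF, c( 3, 4 ) )  # ( 1 + 2 + 6 + 1 ) / 4
```

In the real output above, the "both" and "only" variants are identical for authors and for sources. That would be consistent with no person carrying type 4 in this data, though that is an inference from the numbers, not something the log states.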

`grp_month` (gm) - human - More metrics

Back to Table of Contents

Now that we have the data in a statnet object, run the code from the following file for more in-depth information:

context_text/R/sna/statnet/sna-statnet-network-stats.r



In [29]:

    
# Links:
# - manual (PDF): http://cran.r-project.org/web/packages/sna/sna.pdf
# - good notes: http://www.shizukalab.com/toolkits/sna/node-level-calculations

# Also, be advised that statnet and igraph don't really play nice together.
#    If you'll be using both, best idea is to have a workspace for each.

#==============================================================================#
# statnet
#==============================================================================#

# make sure you've loaded the statnet library (includes sna)
# install.packages( "statnet" )
#library( statnet )

#==============================================================================#
# NODE level
#==============================================================================#

# what is the standard deviation of the degrees?
gmHumanDegreeSd <- sd( gmHumanDegreeVector )
paste( "degree SD = ", gmHumanDegreeSd, sep = "" )

# what is the variance of the degrees?
gmHumanDegreeVar <- var( gmHumanDegreeVector )
paste( "degree variance = ", gmHumanDegreeVar, sep = "" )

# what is the max value among the degrees?
gmHumanDegreeMax <- max( gmHumanDegreeVector )
paste( "degree max = ", gmHumanDegreeMax, sep = "" )

# calculate the degree frequency distribution (paste() is vectorized, so the
#    paste() call below prints one string per table cell)
gmHumanDegreeFrequenciesTable <- table( gmHumanDegreeVector )
paste( "degree frequencies = ", gmHumanDegreeFrequenciesTable, sep = "" )
gmHumanDegreeFrequenciesTable

# node-level undirected betweenness
gmHumanBetweenness <- sna::betweenness( gmHumanNetworkStatnet, gmode = "graph", cmode = "undirected" )

#paste( "betweenness = ", gmHumanBetweenness, sep = "" )
# associate with each node as a node attribute.
#    (%v% reads a vertex attribute, like get.vertex.attribute; the assignment
#    form used here sets it, like set.vertex.attribute)
gmHumanNetworkStatnet %v% "betweenness" <- gmHumanBetweenness

# also add betweenness vector to original data frame
gmHumanDataDF$betweenness <- gmHumanBetweenness

#==============================================================================#
# NETWORK level
#==============================================================================#

# graph-level degree centralization (printed with the label "degree centrality")
gmHumanDegreeCentrality <- sna::centralization( gmHumanNetworkStatnet, sna::degree, mode = "graph" )
paste( "degree centrality = ", gmHumanDegreeCentrality, sep = "" )

# graph-level betweenness centralization (printed with the label "betweenness centrality")
gmHumanBetweennessCentrality <- sna::centralization( gmHumanNetworkStatnet, sna::betweenness, mode = "graph", cmode = "undirected" )
paste( "betweenness centrality = ", gmHumanBetweennessCentrality, sep = "" )

# graph-level connectedness
gmHumanConnectedness <- sna::connectedness( gmHumanNetworkStatnet )
paste( "connectedness = ", gmHumanConnectedness, sep = "" )

# graph-level transitivity
gmHumanTransitivity <- sna::gtrans( gmHumanNetworkStatnet, mode = "graph" )
paste( "transitivity = ", gmHumanTransitivity, sep = "" )

# graph-level density
gmHumanDensity <- sna::gden( gmHumanNetworkStatnet, mode = "graph" )
paste( "density = ", gmHumanDensity, sep = "" )









    




'degree SD = 6.65377784484138'






    




'degree variance = 44.272759608502'






    




'degree max = 99'






    





	'degree frequencies = 97'
	'degree frequencies = 911'
	'degree frequencies = 91'
	'degree frequencies = 14'
	'degree frequencies = 15'
	'degree frequencies = 2'
	'degree frequencies = 1'
	'degree frequencies = 4'
	'degree frequencies = 2'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 3'
	'degree frequencies = 1'
	'degree frequencies = 2'
	'degree frequencies = 2'
	'degree frequencies = 2'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 2'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 2'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 1'








    





gmHumanDegreeVector
  0   1   2   3   4   5   6   7   9  10  13  14  19  22  28  31  32  33  34  36 
 97 911  91  14  15   2   1   4   2   1   1   1   3   1   2   2   2   1   1   1 
 37  38  41  44  45  46  47  50  61  66  76  99 
  1   1   1   2   1   1   1   2   1   1   1   1 






    




'degree centrality = 0.0832831513777339'






    




'betweenness centrality = 0.220493193695819'






    




'connectedness = 0.673448360502733'






    



Warning message in sna::gtrans(gmHumanNetworkStatnet, mode = "graph"):
“gtrans called with use.adjacency=TRUE, but your data looks too large for that to work well.  Overriding to edgelist method.”





    




'transitivity = 0.0131821874307658'






    




'density = 0.00176523933617594'
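
Two of the values above can be cross-checked by hand. For an undirected network without loops, density is the number of observed edges divided by the number of possible edges, n * ( n - 1 ) / 2, and the variance is the square of the standard deviation. The sketch below does both checks using the counts printed earlier (1201 edges, 1167 vertices). `undirectedDensity` is a made-up helper name for this example.

```r
# density of an undirected network without loops:
#     observed edges / possible edges, where possible = n * ( n - 1 ) / 2
undirectedDensity <- function( edgeCountIN, vertexCountIN ) {
    edgeCountIN / choose( vertexCountIN, 2 )
}

# counts printed for the human-coded grp_month network
reportedDensity <- undirectedDensity( 1201, 1167 )

# variance is the square of the standard deviation
reportedVariance <- 6.65377784484138 ^ 2
```

Both should agree with the printed density and degree variance to within floating-point precision.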

`grp_month` (gm) - human - create node attribute DataFrame

Back to Table of Contents

If you want to just work with the traits of the nodes/vertexes, you can combine the attribute vectors into a data frame.



In [30]:

    
#==============================================================================#
# output attributes to data frame
#==============================================================================#

# if you want to just work with the traits of the nodes/vertexes, you can
#    combine the attribute vectors into a data frame.

# first, output network object to see what attributes you have
gmHumanNetworkStatnet

# then, combine them into a data frame.
gmHumanNodeAttrDF <- data.frame( id = gmHumanNetworkStatnet %v% "vertex.names",
                                 person_id = gmHumanNetworkStatnet %v% "person_id",
                                 person_type = gmHumanNetworkStatnet %v% "person_type",
                                 degree = gmHumanNetworkStatnet %v% "degree",
                                 betweenness = gmHumanNetworkStatnet %v% "betweenness" )









    





 Network attributes:
  vertices = 1167 
  directed = FALSE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 1201 
    missing edges= 0 
    non-missing edges= 1201 

 Vertex attribute names: 
    betweenness degree person_id person_type vertex.names 

 Edge attribute names not shown
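
Once the attributes are in a plain data frame, ordinary base R tools apply. As an illustration only, the sketch below uses a toy data frame with the same column names (the values are invented) to rank nodes by degree and to summarize degree and betweenness per person type.

```r
# toy stand-in for a node attribute data frame (values are invented)
toyNodeAttrDF <- data.frame( id = c( "a", "b", "c", "d" ),
                             person_type = c( 2, 3, 3, 3 ),
                             degree = c( 3, 1, 1, 1 ),
                             betweenness = c( 3, 0, 0, 0 ) )

# nodes ordered from highest to lowest degree
toyByDegreeDF <- toyNodeAttrDF[ order( -toyNodeAttrDF$degree ), ]

# mean degree and betweenness per person type
toyTypeSummaryDF <- aggregate( cbind( degree, betweenness ) ~ person_type,
                               data = toyNodeAttrDF, FUN = mean )
```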

`grp_month` QAP graph correlation between automated and ground truth

Back to Table of Contents

Now, compare the automated and human-coded networks themselves using graph correlation in QAP.

Based on: context_text/R/sna/statnet/sna-qap.r

Note: QAP compares two networks, so this step has to wait until both the OpenCalais (automated) and human-coded networks have been processed.



In [31]:

    
# link to good doc on qaptest(){sna} function: http://www.inside-r.org/packages/cran/sna/docs/qaptest

# First, need to load data - see (or just source() ) the file "sna-load_data.r".
# source( "sna-load_data.r" )
# does the following (among other things):
# Start with loading in tab-delimited files.
#humanNetworkData <- read.delim( "human-sourcenet_data-20150504-002453.tab", header = TRUE, row.names = 1, check.names = FALSE )
#calaisNetworkData <- read.delim( "puter-sourcenet_data-20150504-002507.tab", header = TRUE, row.names = 1, check.names = FALSE )

# remove the right-most column, which contains non-tie info on nodes.
#gmHumanNetworkTies <- humanNetworkData[ , -ncol( humanNetworkData ) ]
#gmAutomatedNetworkDF <- calaisNetworkData[ , -ncol( calaisNetworkData )]

# convert each to a matrix
#gmHumanNetworkMatrix <- as.matrix( gmHumanNetworkTies )
#gmAutomatedNetworkMatrix <- as.matrix( gmAutomatedNetworkDF )

# imports
# install.packages( "sna" )
# install.packages( "statnet" )
library( "sna" )

# package up data for calling qaptest() - first make 3-dimensional array to hold
#    our two matrices - this is known as a "graph set".
graphSetArray <- array( dim = c( 2, nrow( gmHumanNetworkMatrix ), ncol( gmHumanNetworkMatrix ) ) )

# then, place each matrix in its own slice along the first dimension.
graphSetArray[ 1, , ] <- gmHumanNetworkMatrix
graphSetArray[ 2, , ] <- gmAutomatedNetworkMatrix

# first, try a graph correlation
graphCorrelation <- sna::gcor( gmHumanNetworkMatrix, gmAutomatedNetworkMatrix )
paste( "graph correlation = ", graphCorrelation, sep = "" )

# try a qaptest...
qapGcorResult <- sna::qaptest( graphSetArray, sna::gcor, g1 = 1, g2 = 2 )
summary( qapGcorResult )
plot( qapGcorResult )

# graph covariance...
graphCovariance <- sna::gcov( gmHumanNetworkMatrix, gmAutomatedNetworkMatrix )
graphCovariance
paste( "graph covariance = ", graphCovariance, sep = "" )

# try a qaptest...
qapGcovResult <- sna::qaptest( graphSetArray, sna::gcov, g1 = 1, g2 = 2 )
summary( qapGcovResult )
plot( qapGcovResult )

# Hamming Distance
graphHammingDist <- sna::hdist( gmHumanNetworkMatrix, gmAutomatedNetworkMatrix )
paste( "graph hamming distance = ", graphHammingDist, sep = "" )

# try a qaptest...
qapHdistResult <- sna::qaptest( graphSetArray, sna::hdist, g1 = 1, g2 = 2 )
summary( qapHdistResult )
plot( qapHdistResult )

# graph structural correlation?
#graphStructCorrelation <- gscor( gmHumanNetworkMatrix, gmAutomatedNetworkMatrix )
#graphStructCorrelation









    




'graph correlation = 0.902705354664938'






    





QAP Test Results

Estimated p-values:
	p(f(perm) >= f(d)): 0 
	p(f(perm) <= f(d)): 1 

Test Diagnostics:
	Test Value (f(d)): 0.9027054 
	Replications: 1000 
	Distribution Summary:
		Min:	 -0.001731849 
		1stQ:	 -0.000880213 
		Med:	 -2.857725e-05 
		Mean:	 2.08164e-06 
		3rdQ:	 0.0008230585 
		Max:	 0.01104269 







    




0.00155794824109381






    




'graph covariance = 0.00155794824109381'






    





QAP Test Results

Estimated p-values:
	p(f(perm) >= f(d)): 0 
	p(f(perm) <= f(d)): 1 

Test Diagnostics:
	Test Value (f(d)): 0.001557948 
	Replications: 1000 
	Distribution Summary:
		Min:	 -2.988939e-06 
		1stQ:	 -1.51913e-06 
		Med:	 -4.932049e-08 
		Mean:	 1.520433e-07 
		3rdQ:	 1.420489e-06 
		Max:	 1.317896e-05 







    












    




'graph hamming distance = 458'






    





QAP Test Results

Estimated p-values:
	p(f(perm) >= f(d)): 1 
	p(f(perm) <= f(d)): 0 

Test Diagnostics:
	Test Value (f(d)): 458 
	Replications: 1000 
	Distribution Summary:
		Min:	 4634 
		1stQ:	 4694 
		Med:	 4698 
		Mean:	 4697.728 
		3rdQ:	 4702 
		Max:	 4706
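
To make the logic of the QAP tests above more concrete, here is a base R sketch of the general idea on toy data: calculate a statistic on the two observed matrices, then repeatedly relabel the nodes of one matrix (the same permutation applied to rows and columns), recalculate, and see where the observed value falls in that distribution. This is a simplified illustration under my own assumptions, not the internals of `sna::qaptest`. The helper names and the toy networks are invented for the example.

```r
# correlation between two square matrices over their off-diagonal cells
offDiagonalCor <- function( matrix1IN, matrix2IN ) {
    offDiagonal <- row( matrix1IN ) != col( matrix1IN )
    cor( matrix1IN[ offDiagonal ], matrix2IN[ offDiagonal ] )
}

# number of off-diagonal cells on which two matrices differ
offDiagonalHamming <- function( matrix1IN, matrix2IN ) {
    offDiagonal <- row( matrix1IN ) != col( matrix1IN )
    sum( matrix1IN[ offDiagonal ] != matrix2IN[ offDiagonal ] )
}

# build two small undirected toy networks that differ in a single tie
set.seed( 42 )
nodeCount <- 12
toyNet1 <- matrix( 0, nodeCount, nodeCount )
toyNet1[ upper.tri( toyNet1 ) ] <- rbinom( choose( nodeCount, 2 ), 1, 0.3 )
toyNet1 <- toyNet1 + t( toyNet1 )
toyNet2 <- toyNet1
toyNet2[ 1, 2 ] <- toyNet2[ 2, 1 ] <- 1 - toyNet2[ 1, 2 ]  # flip one tie

observedCor <- offDiagonalCor( toyNet1, toyNet2 )
observedHamming <- offDiagonalHamming( toyNet1, toyNet2 )

# QAP: relabel the nodes of one network at random (same permutation applied
#    to rows and columns), recompute the statistic, and repeat.
permutedCors <- replicate( 1000, {
    shuffle <- sample( nodeCount )
    offDiagonalCor( toyNet1, toyNet2[ shuffle, shuffle ] )
} )

# share of permutations with a correlation at least as large as observed
pGreaterOrEqual <- mean( permutedCors >= observedCor )
```

Because the matrices here are symmetric and every off-diagonal cell is counted, flipping one undirected tie changes two cells. The same reading applies to the results above: a correlation far above anything in the permutation distribution, and a Hamming distance far below it, both indicate that the two networks are much more alike than relabeled versions of themselves.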

`grp_week` analysis

Back to Table of Contents

Look at a single week from the shiny new month of data.



In [32]:

    
output_prefix <- "grp_week"

`grp_week` (gw) - automated - OpenCalais

Return to Table of Contents

First, we'll analyze the week subset of the month of data coded by OpenCalais. Set up some variables to store where the data is located:

`grp_week` (gw) - automated - Read data

Return to Table of Contents

Read in the data from the tab-delimited data file, then get it into the right data structures for use in R SNA.



In [33]:

    
# initialize variables
gwAutomatedDataFolder <- "/home/jonathanmorgan/work/django/research/work/phd_work/data/network/grp_month"
gwAutomatedDataFile <- "sourcenet_data-20171206-031358-grp_month-automated-week_subset.tab"
gwAutomatedDataPath <- paste( gwAutomatedDataFolder, "/", gwAutomatedDataFile, sep = "" )



In [34]:

    
gwAutomatedDataPath









    




'/home/jonathanmorgan/work/django/research/work/phd_work/data/network/grp_month/sourcenet_data-20171206-031358-grp_month-automated-week_subset.tab'

Load the data file into memory



In [35]:

    
# tab-delimited:
gwAutomatedDataDF <- read.delim( gwAutomatedDataPath, header = TRUE, row.names = 1, check.names = FALSE )



In [36]:

    
# get count of rows...
gwAutomatedRowCount <- nrow( gwAutomatedDataDF )
paste( output_prefix, "automated row count =", gwAutomatedRowCount, sep = " " )

# ...and columns
gwAutomatedColumnCount <- ncol( gwAutomatedDataDF )
paste( output_prefix, "automated column count =", gwAutomatedColumnCount, sep = " " )









    




'grp_week automated row count = 1167'






    




'grp_week automated column count = 1169'

Get just the tie rows and columns for initializing network libraries.



In [37]:

    
# the syntax below keeps only as many columns as there are rows, which
#     drops any trait columns that sit to the right of the tie columns
#     in the file.
gwAutomatedNetworkDF <- gwAutomatedDataDF[ , 1 : gwAutomatedRowCount ]
#str( gwAutomatedNetworkDF )
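
The split between tie columns and trait columns can also be derived from the row count instead of hard-coded positions. The sketch below shows the idea on a toy data frame laid out like the data file (a square block of tie columns followed by attribute columns). The toy values are invented, and the `person_id` and `person_type` column names are taken from the vertex attribute names printed elsewhere in this log.

```r
# toy stand-in for the data file: 3 tie columns followed by 2 attribute columns
toyDataDF <- data.frame( `1` = c( 0, 1, 0 ),
                         `2` = c( 1, 0, 2 ),
                         `3` = c( 0, 2, 0 ),
                         person_id = c( 101, 102, 103 ),
                         person_type = c( 2, 3, 3 ),
                         check.names = FALSE )
rownames( toyDataDF ) <- c( "1", "2", "3" )

toyRowCount <- nrow( toyDataDF )

# tie columns: the first nrow() columns (square adjacency block)
toyNetworkDF <- toyDataDF[ , 1 : toyRowCount ]

# attribute columns: everything to the right of the tie columns
toyAttributeDF <- toyDataDF[ , ( toyRowCount + 1 ) : ncol( toyDataDF ), drop = FALSE ]

toyNetworkMatrix <- as.matrix( toyNetworkDF )
```

With 1167 rows and 1169 columns, `( rowCount + 1 ) : ncol( ... )` works out to `1168:1169`, the same range used for the attribute columns in this log.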



In [38]:

    
# convert to a matrix
gwAutomatedNetworkMatrix <- as.matrix( gwAutomatedNetworkDF )
# str( gwAutomatedNetworkMatrix )



In [39]:

    
# for all values greater than 1, reset their values to 1
gwAutomatedNetworkMatrix[ gwAutomatedNetworkMatrix > 1 ] <- 1
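
A small sketch of the same binarizing step on a toy matrix, with two cheap checks that can be worth running before handing a matrix to statnet as an undirected network. Working on a copy is my own suggestion here so that the original weights stay available; the log itself overwrites the matrix in place. All names are made up for the example.

```r
# toy weighted adjacency matrix
toyWeightedMatrix <- matrix( c( 0, 3, 0,
                                3, 0, 1,
                                0, 1, 0 ), nrow = 3, byrow = TRUE )

# binarize a copy, so the tie weights are still available afterward
toyBinaryMatrix <- toyWeightedMatrix
toyBinaryMatrix[ toyBinaryMatrix > 1 ] <- 1

# an undirected network should have a symmetric adjacency matrix
toyIsSymmetric <- isSymmetric( toyBinaryMatrix )

# number of undirected ties = number of 1s above the diagonal
toyTieCount <- sum( toyBinaryMatrix[ upper.tri( toyBinaryMatrix ) ] )
```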

`grp_week` (gw) - automated - initialize statnet

Back to Table of Contents

First, load the statnet package, then load the automated grp_month week subset data into a statnet object and assign attributes to nodes.

Based on context_text/R/sna/statnet/sna-statnet-init.r.



In [40]:

    
# make sure you've loaded the statnet library
# install.packages( "statnet" )
library( statnet )



In [41]:

    
# If you have a data frame of attributes (each attribute is a column, with
#     attribute name the column name), you can associate those attributes
#     when you create the network.
# attribute help: http://www.inside-r.org/packages/cran/network/docs/loading.attributes

# load attributes from a file:
#tab_attribute_test1 <- read.delim( "tab-test1-attribute_data.txt", header = TRUE, row.names = 1, check.names = FALSE )

# or create DataFrame by just grabbing the attribute columns
gwAutomatedNetworkAttributeDF <- gwAutomatedDataDF[ , 1168:1169 ]

# convert matrix to statnet network object instance.
gwAutomatedNetworkStatnet <- network( gwAutomatedNetworkMatrix, matrix.type = "adjacency", directed = FALSE, vertex.attr = gwAutomatedNetworkAttributeDF )

# look at information now.
gwAutomatedNetworkStatnet

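# Sample of the printed summary (these numbers are not from this data set;
#    the actual output follows the cell):
# Network attributes: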
#  vertices = 314
#  directed = FALSE
#  hyper = FALSE
#  loops = FALSE
#  multiple = FALSE
#  bipartite = FALSE
#  total edges= 309
#    missing edges= 0
#    non-missing edges= 309
#
# Vertex attribute names:
#    person_type vertex.names
#
# No edge attributes









    





 Network attributes:
  vertices = 1167 
  directed = FALSE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 298 
    missing edges= 0 
    non-missing edges= 298 

 Vertex attribute names: 
    person_id person_type vertex.names 

No edge attributes



In [7]:

    
# calais - include ties Greater than or equal to 0 (GE0)
gwAutomatedMeanTieWeightGE0Vector <- apply( gwAutomatedNetworkMatrix, 1, calculateListMean )
gwAutomatedDataDF$meanTieWeightGE0 <- gwAutomatedMeanTieWeightGE0Vector

# calais - include ties Greater than or equal to 1 (GE1)
gwAutomatedMeanTieWeightGE1Vector <- apply( gwAutomatedNetworkMatrix, 1, calculateListMean, minValueToIncludeIN = 1 )
gwAutomatedDataDF$meanTieWeightGE1 <- gwAutomatedMeanTieWeightGE1Vector

# automated - Max tie weight?
gwAutomatedMaxTieWeightVector <- apply( gwAutomatedNetworkMatrix, 1, calculateListMax )
gwAutomatedDataDF$maxTieWeight <- gwAutomatedMaxTieWeightVector

`grp_week` (gw) - automated - Basic metrics

Back to Table of Contents



In [42]:

    
# our statnet network object is in gwAutomatedNetworkStatnet (the commented-out
#    example below uses a placeholder named test1_statnet).

# Use the degree function in the sna package to create vector of degree values
#    for each node.  Make sure to pass the gmode parameter to tell it that the
#    graph is not directed (gmode = "graph", instead of "digraph").
# Doc: http://www.inside-r.org/packages/cran/sna/docs/degree
#degree_vector <- degree( test1_statnet, gmode = "graph" )

# If you have other libraries loaded that also implement a degree function, you
#    can also call this with package name:
gwAutomatedDegreeVector <- sna::degree( gwAutomatedNetworkStatnet, gmode = "graph" )

# output the vector
gwAutomatedDegreeVector

# want more info on the degree function?  You can get to it eventually through
#    the following:
#help( package = "sna" )
#??sna::degree

# what is the average (mean) degree?
gwAutomatedAvgDegree <- mean( gwAutomatedDegreeVector )
paste( output_prefix, "average degree =", gwAutomatedAvgDegree, sep = " " )

# subset vector to get only those that are above mean
gwAutomatedAboveMeanVector <- gwAutomatedDegreeVector[ gwAutomatedDegreeVector > gwAutomatedAvgDegree ]

# Take the degree and associate it with each node as a node attribute.
#    (%v% reads a vertex attribute, like get.vertex.attribute; the assignment
#    form used here sets it, like set.vertex.attribute)
gwAutomatedNetworkStatnet %v% "degree" <- gwAutomatedDegreeVector

# also add degree vector to original data frame
gwAutomatedDataDF$degree <- gwAutomatedDegreeVector









    





	5
	0
	14
	0
	11
	1
	1
	10
	9
	0
	15
	1
	24
	0
	0
	17
	23
	6
	15
	18
	1
	3
	0
	1
	1
	0
	15
	0
	2
	2
	2
	2
	1
	1
	1
	1
	1
	12
	1
	0
	1
	1
	0
	1
	3
	0
	0
	1
	10
	0
	0
	6
	0
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	0
	0
	0
	1
	1
	1
	1
	17
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	3
	1
	1
	1
	9
	1
	1
	1
	1
	1
	1
	2
	4
	0
	0
	0
	0
	1
	1
	1
	0
	1
	1
	2
	0
	2
	2
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	2
	1
	1
	1
	1
	1
	1
	0
	1
	1
	5
	1
	1
	1
	1
	1
	1
	6
	1
	1
	1
	0
	1
	1
	1
	1
	1
	2
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	6
	0
	0
	0
	0
	1
	1
	4
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	2
	0
	2
	2
	1
	0
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	3
	1
	0
	1
	1
	2
	2
	2
	0
	2
	2
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	0
	0
	2
	2
	2
	2
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	1
	0
	1
	0
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	0
	1
	21
	0
	8
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	1
	1
	1
	1
	0
	0
	0
	1
	1
	1
	1
	1
	1
	1
	1
	2
	1
	0
	1
	1
	1
	1
	1
	1
	0
	1
	1
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	3
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	1
	1
	1
	1
	1
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	1
	1
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	2
	1
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	1
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0








    




'grp_week average degree = 0.510711225364182'



In [43]:

    
# average author degree (person types 2 and 4)
gwAutomatedAverageAuthorDegree2And4 <- calcAuthorMeanDegree( dataFrameIN = gwAutomatedDataDF, includeBothIN = TRUE )
paste( output_prefix, "average author degree (2 and 4) =", gwAutomatedAverageAuthorDegree2And4, sep = " " )

# average author degree (person type 2 only)
gwAutomatedAverageAuthorDegreeOnly2 <- calcAuthorMeanDegree( dataFrameIN = gwAutomatedDataDF, includeBothIN = FALSE )
paste( output_prefix, "average author degree (only 2) =", gwAutomatedAverageAuthorDegreeOnly2, sep = " " )

# average source degree (person types 3 and 4)
gwAutomatedAverageSourceDegree3And4 <- calcSourceMeanDegree( dataFrameIN = gwAutomatedDataDF, includeBothIN = TRUE )
paste( output_prefix, "average source degree (3 and 4) =", gwAutomatedAverageSourceDegree3And4, sep = " " )

# average source degree (person type 3 only)
gwAutomatedAverageSourceDegreeOnly3 <- calcSourceMeanDegree( dataFrameIN = gwAutomatedDataDF, includeBothIN = FALSE )
paste( output_prefix, "average source degree (only 3) =", gwAutomatedAverageSourceDegreeOnly3, sep = " " )









    




'grp_week average author degree (2 and 4) = 9.46875'






    




'grp_week average author degree (only 2) = 9.46875'






    




'grp_week average source degree (3 and 4) = 1.11406844106464'






    




'grp_week average source degree (only 3) = 1.11406844106464'

`grp_week` (gw) - automated - More metrics

Back to Table of Contents

Now that we have the data in a statnet object, run the code from the following file for more in-depth information:

context_text/R/sna/statnet/sna-statnet-network-stats.r



In [44]:

    
# Links:
# - manual (PDF): http://cran.r-project.org/web/packages/sna/sna.pdf
# - good notes: http://www.shizukalab.com/toolkits/sna/node-level-calculations

# Also, be advised that statnet and igraph don't really play nice together.
#    If you'll be using both, best idea is to have a workspace for each.

#==============================================================================#
# statnet
#==============================================================================#

# make sure you've loaded the statnet library (includes sna)
# install.packages( "statnet" )
#library( statnet )

#==============================================================================#
# NODE level
#==============================================================================#

# what is the standard deviation of the degrees?
gwAutomatedDegreeSd <- sd( gwAutomatedDegreeVector )
paste( output_prefix, "degree SD =", gwAutomatedDegreeSd, sep = " " )

# what is the variance of the degrees?
gwAutomatedDegreeVar <- var( gwAutomatedDegreeVector )
paste( output_prefix, "degree variance =", gwAutomatedDegreeVar, sep = " " )

# what is the max value among the degrees?
gwAutomatedDegreeMax <- max( gwAutomatedDegreeVector )
paste( output_prefix, "degree max =", gwAutomatedDegreeMax, sep = " " )

# calculate the degree frequency distribution (paste() is vectorized, so the
#    paste() call below prints one string per table cell)
gwAutomatedDegreeFrequenciesTable <- table( gwAutomatedDegreeVector )
paste( output_prefix, "degree frequencies =", gwAutomatedDegreeFrequenciesTable, sep = " " )
gwAutomatedDegreeFrequenciesTable

# node-level undirected betweenness
gwAutomatedBetweenness <- sna::betweenness( gwAutomatedNetworkStatnet, gmode = "graph", cmode = "undirected" )

#paste( "betweenness = ", gwAutomatedBetweenness, sep = "" )
# associate with each node as a node attribute.
#    (%v% reads a vertex attribute, like get.vertex.attribute; the assignment
#    form used here sets it, like set.vertex.attribute)
gwAutomatedNetworkStatnet %v% "betweenness" <- gwAutomatedBetweenness

# also add betweenness vector to original data frame
gwAutomatedDataDF$betweenness <- gwAutomatedBetweenness

#==============================================================================#
# NETWORK level
#==============================================================================#

# graph-level degree centralization (printed with the label "degree centrality")
gwAutomatedDegreeCentrality <- sna::centralization( gwAutomatedNetworkStatnet, sna::degree, mode = "graph" )
paste( output_prefix, "degree centrality =", gwAutomatedDegreeCentrality, sep = " " )

# graph-level betweenness centralization (printed with the label "betweenness centrality")
gwAutomatedBetweennessCentrality <- sna::centralization( gwAutomatedNetworkStatnet, sna::betweenness, mode = "graph", cmode = "undirected" )
paste( output_prefix, "betweenness centrality =", gwAutomatedBetweennessCentrality, sep = " " )

# graph-level connectedness
gwAutomatedConnectedness <- sna::connectedness( gwAutomatedNetworkStatnet )
paste( output_prefix, "connectedness =", gwAutomatedConnectedness, sep = " " )

# graph-level transitivity
gwAutomatedTransitivity <- sna::gtrans( gwAutomatedNetworkStatnet, mode = "graph" )
paste( output_prefix, "transitivity =", gwAutomatedTransitivity, sep = " " )

# graph-level density
gwAutomatedDensity <- sna::gden( gwAutomatedNetworkStatnet, mode = "graph" )
paste( output_prefix, "density =", gwAutomatedDensity, sep = " " )









    




'grp_week degree SD = 1.92474544655479'






    




'grp_week degree variance = 3.7046450340334'






    




'grp_week degree max = 24'






    





	'grp_week degree frequencies = 872'
	'grp_week degree frequencies = 239'
	'grp_week degree frequencies = 26'
	'grp_week degree frequencies = 5'
	'grp_week degree frequencies = 2'
	'grp_week degree frequencies = 2'
	'grp_week degree frequencies = 4'
	'grp_week degree frequencies = 1'
	'grp_week degree frequencies = 2'
	'grp_week degree frequencies = 2'
	'grp_week degree frequencies = 1'
	'grp_week degree frequencies = 1'
	'grp_week degree frequencies = 1'
	'grp_week degree frequencies = 3'
	'grp_week degree frequencies = 2'
	'grp_week degree frequencies = 1'
	'grp_week degree frequencies = 1'
	'grp_week degree frequencies = 1'
	'grp_week degree frequencies = 1'








    





gwAutomatedDegreeVector
  0   1   2   3   4   5   6   8   9  10  11  12  14  15  17  18  21  23  24 
872 239  26   5   2   2   4   1   2   2   1   1   1   3   2   1   1   1   1 






    




'grp_week degree centrality = 0.0201797716414285'






    




'grp_week betweenness centrality = 0.00454678734613902'






    




'grp_week connectedness = 0.00801192308201087'






    



Warning message in sna::gtrans(gwAutomatedNetworkStatnet, mode = "graph"):
“gtrans called with use.adjacency=TRUE, but your data looks too large for that to work well.  Overriding to edgelist method.”





    




'grp_week transitivity = 0.0372393247269116'






    




'grp_week density = 0.000438002766178543'
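As a cross-check on the density value above: for an undirected graph without loops, density is 2E / (n(n - 1)). The sketch below plugs in the 1167 vertices and 298 edges that the network summary reports for the automated `grp_week` graph. The variable names are illustrative and not part of the notebook.

```r
# Cross-check graph density by hand (base R only).
# Assumption: undirected graph, no loops, so density = 2E / ( n * ( n - 1 ) ).
vertexCount <- 1167   # vertices reported for the automated grp_week network
edgeCount <- 298      # edges reported for the automated grp_week network

densityByHand <- ( 2 * edgeCount ) / ( vertexCount * ( vertexCount - 1 ) )
print( densityByHand )
```

This should agree with the `sna::gden()` value to within floating-point rounding.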

`grp_week` (gw) - automated - create node attribute DataFrame

If you want to just work with the traits of the nodes/vertexes, you can combine the attribute vectors into a data frame.



In [45]:

    
#==============================================================================#
# output attributes to data frame
#==============================================================================#

# if you want to just work with the traits of the nodes/vertexes, you can
#    combine the attribute vectors into a data frame.

# first, output network object to see what attributes you have
gwAutomatedNetworkStatnet

# then, combine them into a data frame.
gwAutomatedNodeAttrDF <- data.frame( id = gwAutomatedNetworkStatnet %v% "vertex.names",
                                     person_id = gwAutomatedNetworkStatnet %v% "person_id",
                                     person_type = gwAutomatedNetworkStatnet %v% "person_type",
                                     degree = gwAutomatedNetworkStatnet %v% "degree",
                                     betweenness = gwAutomatedNetworkStatnet %v% "betweenness" )









    





 Network attributes:
  vertices = 1167 
  directed = FALSE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 298 
    missing edges= 0 
    non-missing edges= 298 

 Vertex attribute names: 
    betweenness degree person_id person_type vertex.names 

No edge attributes
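Once the attributes are in a data frame, ordinary data-frame tools apply. A minimal sketch with a made-up data frame (the rows below are invented, and the numeric coding of `person_type` is an assumption), showing mean degree per person type and the nodes sorted by degree:

```r
# Hypothetical stand-in for a node attribute data frame; all values are invented.
toyNodeAttrDF <- data.frame( id = c( "a", "b", "c", "d", "e" ),
                             person_type = c( 2, 3, 3, 3, 2 ),
                             degree = c( 10, 1, 1, 2, 6 ),
                             stringsAsFactors = FALSE )

# mean degree for each person_type
meanDegreeByType <- aggregate( degree ~ person_type, data = toyNodeAttrDF, FUN = mean )
print( meanDegreeByType )

# nodes sorted by degree, highest first
toyNodeAttrDF[ order( -toyNodeAttrDF$degree ), ]
```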

`grp_week` (gw) - human

Next, we'll analyze the same week of data as coded by human coders (a subset of the human-coded month). The variables that store where the data is located are set up in the next section.

`grp_week` (gw) - human - Read data

Read in the data from the tab-delimited data file, then get it into the right data structures for use in R SNA.



In [46]:

    
# initialize variables
gwHumanDataFolder <- "/home/jonathanmorgan/work/django/research/work/phd_work/data/network/grp_month"
gwHumanDataFile <- "sourcenet_data-20171206-031319-grp_month-human-week_subset.tab"
gwHumanDataPath <- paste( gwHumanDataFolder, "/", gwHumanDataFile, sep = "" )



In [47]:

    
gwHumanDataPath









    




'/home/jonathanmorgan/work/django/research/work/phd_work/data/network/grp_month/sourcenet_data-20171206-031319-grp_month-human-week_subset.tab'

Load the data file into memory



In [48]:

    
# tab-delimited:
gwHumanDataDF <- read.delim( gwHumanDataPath, header = TRUE, row.names = 1, check.names = FALSE )



In [49]:

    
# get count of rows...
gwHumanRowCount <- nrow( gwHumanDataDF )
paste( output_prefix, "human row count =", gwHumanRowCount, sep = " " )

# ...and columns
gwHumanColumnCount <- ncol( gwHumanDataDF )
paste( output_prefix, "human column count =", gwHumanColumnCount, sep = " " )




'grp_week human row count = 1167'




'grp_week human column count = 1169'

Get just the tie rows and columns for initializing network libraries.



In [50]:

    
# the below syntax returns only as many columns as there are rows, so
#     omitting any trait columns that lie in columns on the right side
#     of the file.
gwHumanNetworkDF <- gwHumanDataDF[ , 1 : gwHumanRowCount ]
#str( gwHumanNetworkDF )
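The slicing above relies on the file layout: an n x n block of tie columns first, then trait columns to the right. A toy illustration of the same split, with invented names and values (nothing here comes from the real data file):

```r
# Hypothetical 3-node file: three tie columns, then two trait columns.
toyDataDF <- data.frame( a = c( 0, 2, 0 ),
                         b = c( 2, 0, 1 ),
                         c = c( 0, 1, 0 ),
                         person_id = c( 11, 12, 13 ),
                         person_type = c( 2, 3, 3 ),
                         row.names = c( "a", "b", "c" ) )

toyRowCount <- nrow( toyDataDF )

# tie columns: as many columns as there are rows
toyNetworkDF <- toyDataDF[ , 1 : toyRowCount ]

# trait columns: everything to the right of the tie block
toyAttributeDF <- toyDataDF[ , ( toyRowCount + 1 ) : ncol( toyDataDF ) ]

# the tie block should be square
stopifnot( nrow( toyNetworkDF ) == ncol( toyNetworkDF ) )
```

Deriving the trait column range from the row count, rather than hard-coding column numbers, keeps the split correct if the number of nodes changes.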



In [51]:

    
# convert to a matrix
gwHumanNetworkMatrix <- as.matrix( gwHumanNetworkDF )
# str( gwHumanNetworkMatrix )



In [52]:

    
# for all values greater than 1, reset their values to 1
gwHumanNetworkMatrix[ gwHumanNetworkMatrix > 1 ] <- 1
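A small illustration of what this dichotomizing step does, on an invented weighted matrix. Note that it overwrites tie weights in place, so any tie-weight statistics need to be computed before this step or from a copy.

```r
# Hypothetical weighted adjacency matrix (counts of ties between 3 nodes).
toyWeightedMatrix <- matrix( c( 0, 3, 0,
                                3, 0, 1,
                                0, 1, 0 ), nrow = 3, byrow = TRUE )

# work on a copy so the original weights stay available
toyBinaryMatrix <- toyWeightedMatrix
toyBinaryMatrix[ toyBinaryMatrix > 1 ] <- 1

print( toyBinaryMatrix )
```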

`grp_week` (gw) - human - initialize statnet

First, load the statnet package, then load the human-coded week of grp_month data into a statnet network object and assign attributes to nodes.

Based on context_text/R/sna/statnet/sna-statnet-init.r.



In [53]:

    
# make sure you've loaded the statnet library
# install.packages( "statnet" )
library( statnet )



In [54]:

    
# If you have a data frame of attributes (each attribute is a column, with
#     attribute name the column name), you can associate those attributes
#     when you create the network.
# attribute help: http://www.inside-r.org/packages/cran/network/docs/loading.attributes

# load attributes from a file:
#tab_attribute_test1 <- read.delim( "tab-test1-attribute_data.txt", header = TRUE, row.names = 1, check.names = FALSE )

# or create DataFrame by just grabbing the attribute columns
#gwHumanNetworkAttributeDF <- gwHumanDataDF[ , 1169:1170 ]
gwHumanNetworkAttributeDF <- gwHumanDataDF[ , 1168:1169 ]

# convert matrix to statnet network object instance.
gwHumanNetworkStatnet <- network( gwHumanNetworkMatrix, matrix.type = "adjacency", directed = FALSE, vertex.attr = gwHumanNetworkAttributeDF )

# look at information now.
gwHumanNetworkStatnet

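# example of the printed summary (these values are from a different, smaller
#    network and do not match this run; see the actual output below):
# Network attributes: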
#  vertices = 314
#  directed = FALSE
#  hyper = FALSE
#  loops = FALSE
#  multiple = FALSE
#  bipartite = FALSE
#  total edges= 309
#    missing edges= 0
#    non-missing edges= 309
#
# Vertex attribute names:
#    person_type vertex.names
#
# No edge attributes









    





 Network attributes:
  vertices = 1167 
  directed = FALSE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 340 
    missing edges= 0 
    non-missing edges= 340 

 Vertex attribute names: 
    person_id person_type vertex.names 

No edge attributes
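The edge count in a summary like the one above can be cross-checked straight from the adjacency matrix: for a symmetric 0/1 matrix, each undirected edge appears once in the upper triangle. A toy sketch with an invented 4-node matrix:

```r
# Hypothetical symmetric adjacency matrix with edges 1-2, 1-3 and 3-4.
toyMatrix <- matrix( c( 0, 1, 1, 0,
                        1, 0, 0, 0,
                        1, 0, 0, 1,
                        0, 0, 1, 0 ), nrow = 4, byrow = TRUE )

# an undirected network should have a symmetric matrix
stopifnot( isSymmetric( toyMatrix ) )

# count non-zero cells above the diagonal
toyEdgeCount <- sum( toyMatrix[ upper.tri( toyMatrix ) ] != 0 )
print( toyEdgeCount )
```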



In [8]:

    
# human - include ties Greater than or equal to 0 (GE0)
gwHumanMeanTieWeightGE0Vector <- apply( gwHumanNetworkMatrix, 1, calculateListMean )
gwHumanDataDF$meanTieWeightGE0 <- gwHumanMeanTieWeightGE0Vector

# human - include ties Greater than or equal to 1 (GE1)
gwHumanMeanTieWeightGE1Vector <- apply( gwHumanNetworkMatrix, 1, calculateListMean, minValueToIncludeIN = 1 )
gwHumanDataDF$meanTieWeightGE1 <- gwHumanMeanTieWeightGE1Vector

# human - Max tie weight?
gwHumanMaxTieWeightVector <- apply( gwHumanNetworkMatrix, 1, calculateListMax )
gwHumanDataDF$maxTieWeight <- gwHumanMaxTieWeightVector

`grp_week` (gw) - human - Basic metrics



In [55]:

    
# our statnet network object is in gwHumanNetworkStatnet (the commented-out
#    example below uses the generic name test1_statnet).

# Use the degree function in the sna package to create vector of degree values
#    for each node.  Make sure to pass the gmode parameter to tell it that the
#    graph is not directed (gmode = "graph", instead of "digraph").
# Doc: http://www.inside-r.org/packages/cran/sna/docs/degree
#degree_vector <- degree( test1_statnet, gmode = "graph" )

# If you have other libraries loaded that also implement a degree function, you
#    can also call this with package name:
gwHumanDegreeVector <- sna::degree( gwHumanNetworkStatnet, gmode = "graph" )

# output the vector
gwHumanDegreeVector

# want more info on the degree function?  You can get to it eventually through
#    the following:
#help( package = "sna" )
#??sna::degree

# what is the average (mean) degree?
gwHumanAvgDegree <- mean( gwHumanDegreeVector )
paste( output_prefix, "average degree =", gwHumanAvgDegree, sep = " " )

# subset vector to get only those that are above mean
gwHumanAboveMeanVector <- gwHumanDegreeVector[ gwHumanDegreeVector > gwHumanAvgDegree ]

# Take the degree and associate it with each node as a node attribute.
#    (%v% is a shortcut for get.vertex.attribute; the assignment form used
#    here, %v%<-, is a shortcut for set.vertex.attribute)
gwHumanNetworkStatnet %v% "degree" <- gwHumanDegreeVector

# also add degree vector to original data frame
gwHumanDataDF$degree <- gwHumanDegreeVector









    





	6
	0
	14
	0
	13
	0
	1
	11
	10
	1
	18
	1
	36
	0
	0
	21
	23
	6
	16
	20
	1
	3
	0
	1
	1
	0
	17
	2
	2
	2
	2
	2
	1
	1
	1
	1
	1
	13
	1
	1
	1
	1
	0
	1
	5
	0
	0
	1
	11
	0
	0
	8
	2
	1
	1
	1
	1
	0
	1
	1
	0
	1
	1
	1
	1
	0
	0
	1
	1
	1
	1
	16
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	3
	1
	1
	1
	9
	1
	1
	1
	1
	1
	1
	2
	6
	1
	0
	0
	0
	1
	1
	1
	1
	1
	1
	2
	2
	2
	2
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	1
	1
	1
	1
	1
	2
	1
	1
	1
	5
	1
	1
	1
	1
	1
	1
	7
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	6
	0
	0
	0
	0
	1
	1
	4
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	4
	4
	4
	4
	4
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	4
	1
	1
	1
	1
	2
	2
	2
	2
	2
	2
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	2
	2
	2
	3
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	4
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	23
	0
	8
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	1
	0
	1
	0
	0
	0
	0
	1
	0
	0
	1
	0
	0
	1
	0
	2
	0
	1
	1
	1
	0
	0
	1
	1
	0
	0
	1
	0
	0
	0
	0
	10
	0
	0
	1
	0
	0
	0
	0
	0
	0
	3
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	1
	1
	1
	1
	1
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	1
	0
	1
	1
	1
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	2
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	1
	1
	1
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0








    




'grp_week average degree = 0.582690659811482'
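For an undirected 0/1 adjacency matrix, node degree is just the row sum, and mean degree equals 2E / n. The sketch below shows both on an invented 4-node matrix, then applies the identity to the 1167 vertices and 340 edges reported for the human-coded `grp_week` network. Variable names are illustrative.

```r
# Hypothetical symmetric adjacency matrix with edges 1-2, 1-3 and 3-4.
toyMatrix <- matrix( c( 0, 1, 1, 0,
                        1, 0, 0, 0,
                        1, 0, 0, 1,
                        0, 0, 1, 0 ), nrow = 4, byrow = TRUE )

# degree of each node is its row sum
toyDegreeVector <- rowSums( toyMatrix )
toyMeanDegree <- mean( toyDegreeVector )

# same identity with the counts reported above: mean degree = 2E / n
meanDegreeFromCounts <- ( 2 * 340 ) / 1167
print( meanDegreeFromCounts )
```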



In [56]:

    
# average author degree (person types 2 and 4)
gwHumanAverageAuthorDegree2And4 <- calcAuthorMeanDegree( dataFrameIN = gwHumanDataDF, includeBothIN = TRUE )
paste( output_prefix, "average author degree (2 and 4) = ", gwHumanAverageAuthorDegree2And4, sep = " " )

# average author degree (person type 2 only)
gwHumanAverageAuthorDegreeOnly2 <- calcAuthorMeanDegree( dataFrameIN = gwHumanDataDF, includeBothIN = FALSE )
paste( output_prefix, "average author degree (only 2) = ", gwHumanAverageAuthorDegreeOnly2, sep = " " )

# average source degree (person types 3 and 4)
gwHumanAverageSourceDegree3And4 <- calcSourceMeanDegree( dataFrameIN = gwHumanDataDF, includeBothIN = TRUE )
paste( output_prefix, "average source degree (3 and 4) = ", gwHumanAverageSourceDegree3And4, sep = " " )

# average source degree (person type 3 only)
gwHumanAverageSourceDegreeOnly3 <- calcSourceMeanDegree( dataFrameIN = gwHumanDataDF, includeBothIN = FALSE )
paste( output_prefix, "average source degree (only 3) = ", gwHumanAverageSourceDegreeOnly3, sep = " " )









    




'grp_week average author degree (2 and 4) =  10.6060606060606'






    




'grp_week average author degree (only 2) =  10.6060606060606'






    




'grp_week average source degree (3 and 4) =  1.1913357400722'






    




'grp_week average source degree (only 3) =  1.1913357400722'
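`calcAuthorMeanDegree()` and `calcSourceMeanDegree()` come from `functions-sna.r`, which is not shown here. Judging only by the comments above (authors are person types 2 and 4, sources are types 3 and 4), a function of this kind might look like the following sketch. The function name, the numeric coding of `person_type`, and the column names are all assumptions, not the actual implementation.

```r
# Hypothetical helper: mean degree over rows whose person_type is in typesIN.
# Assumes the data frame has numeric "person_type" and "degree" columns.
sketchMeanDegreeByType <- function( dataFrameIN, typesIN ) {
    matchingRows <- dataFrameIN$person_type %in% typesIN
    mean( dataFrameIN$degree[ matchingRows ] )
}

# invented data: two type-2 nodes, two type-3 nodes, one type-4 node
toyDF <- data.frame( person_type = c( 2, 3, 3, 4, 2 ),
                     degree = c( 10, 1, 2, 5, 6 ) )

authorMean2And4 <- sketchMeanDegreeByType( toyDF, c( 2, 4 ) )
authorMeanOnly2 <- sketchMeanDegreeByType( toyDF, 2 )
print( c( authorMean2And4, authorMeanOnly2 ) )
```

In the real output above, the "2 and 4" and "only 2" means are identical, as are the "3 and 4" and "only 3" means. Under the assumptions of this sketch, that would be consistent with no node in this week being coded as type 4.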

`grp_week` (gw) - human - More metrics

Now that we have the data in a statnet object, run the code from the following file for more in-depth information:

context_text/R/sna/statnet/sna-statnet-network-stats.r



In [57]:

    
# Links:
# - manual (PDF): http://cran.r-project.org/web/packages/sna/sna.pdf
# - good notes: http://www.shizukalab.com/toolkits/sna/node-level-calculations

# Also, be advised that statnet and igraph don't really play nice together.
#    If you'll be using both, best idea is to have a workspace for each.

#==============================================================================#
# statnet
#==============================================================================#

# make sure you've loaded the statnet library (includes sna)
# install.packages( "statnet" )
#library( statnet )

#==============================================================================#
# NODE level
#==============================================================================#

# what is the standard deviation of the degrees?
gwHumanDegreeSd <- sd( gwHumanDegreeVector )
paste( output_prefix, "degree SD =", gwHumanDegreeSd, sep = " " )

# what is the variance of the degrees?
gwHumanDegreeVar <- var( gwHumanDegreeVector )
paste( output_prefix, "degree variance =", gwHumanDegreeVar, sep = " " )

# what is the max value among the degrees?
gwHumanDegreeMax <- max( gwHumanDegreeVector )
paste( output_prefix, "degree max =", gwHumanDegreeMax, sep = " " )

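# calculate the degree distribution (frequency table); note that paste() on a
#    table emits one string per frequency, so the table itself is also printed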
gwHumanDegreeFrequenciesTable <- table( gwHumanDegreeVector )
paste( output_prefix, "degree frequencies =", gwHumanDegreeFrequenciesTable, sep = " " )
gwHumanDegreeFrequenciesTable

# node-level undirected betweenness
gwHumanBetweenness <- sna::betweenness( gwHumanNetworkStatnet, gmode = "graph", cmode = "undirected" )

#paste( "betweenness = ", gwHumanBetweenness, sep = "" )
# associate with each node as a node attribute.
#    (%v%<- is a shortcut for the set.vertex.attribute command)
gwHumanNetworkStatnet %v% "betweenness" <- gwHumanBetweenness

# also add betweenness vector to original data frame
gwHumanDataDF$betweenness <- gwHumanBetweenness

#==============================================================================#
# NETWORK level
#==============================================================================#

# graph-level degree centralization (printed below under the label "degree centrality")
gwHumanDegreeCentrality <- sna::centralization( gwHumanNetworkStatnet, sna::degree, mode = "graph" )
paste( output_prefix, "degree centrality =", gwHumanDegreeCentrality, sep = " " )

# graph-level betweenness centralization (printed below under the label "betweenness centrality")
gwHumanBetweennessCentrality <- sna::centralization( gwHumanNetworkStatnet, sna::betweenness, mode = "graph", cmode = "undirected" )
paste( output_prefix, "betweenness centrality =", gwHumanBetweennessCentrality, sep = " " )

# graph-level connectedness
gwHumanConnectedness <- sna::connectedness( gwHumanNetworkStatnet )
paste( output_prefix, "connectedness =", gwHumanConnectedness, sep = " " )

# graph-level transitivity
gwHumanTransitivity <- sna::gtrans( gwHumanNetworkStatnet, mode = "graph" )
paste( output_prefix, "transitivity =", gwHumanTransitivity, sep = " " )

# graph-level density
gwHumanDensity <- sna::gden( gwHumanNetworkStatnet, mode = "graph" )
paste( output_prefix, "density =", gwHumanDensity, sep = " " )









    




'grp_week degree SD = 2.24329960040874'






    




'grp_week degree variance = 5.03239309719399'






    




'grp_week degree max = 36'






    





	'grp_week degree frequencies = 858'
	'grp_week degree frequencies = 244'
	'grp_week degree frequencies = 27'
	'grp_week degree frequencies = 4'
	'grp_week degree frequencies = 8'
	'grp_week degree frequencies = 2'
	'grp_week degree frequencies = 4'
	'grp_week degree frequencies = 1'
	'grp_week degree frequencies = 2'
	'grp_week degree frequencies = 1'
	'grp_week degree frequencies = 2'
	'grp_week degree frequencies = 2'
	'grp_week degree frequencies = 2'
	'grp_week degree frequencies = 1'
	'grp_week degree frequencies = 2'
	'grp_week degree frequencies = 1'
	'grp_week degree frequencies = 1'
	'grp_week degree frequencies = 1'
	'grp_week degree frequencies = 1'
	'grp_week degree frequencies = 2'
	'grp_week degree frequencies = 1'








    





gwHumanDegreeVector
  0   1   2   3   4   5   6   7   8   9  10  11  13  14  16  17  18  20  21  23 
858 244  27   4   8   2   4   1   2   1   2   2   2   1   2   1   1   1   1   2 
 36 
  1 
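The frequency table above is enough to rebuild the degree vector (up to node order) and re-derive the summary statistics. The sketch below copies the degree values and counts from the table for the human-coded network and recomputes the node count, the degree sum (which should be twice the edge count), the variance and the SD. Variable names are illustrative.

```r
# degree values and their frequencies, copied from the table above
degreeValues <- c( 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 16, 17, 18, 20, 21, 23, 36 )
degreeCounts <- c( 858, 244, 27, 4, 8, 2, 4, 1, 2, 1, 2, 2, 2, 1, 2, 1, 1, 1, 1, 2, 1 )

# rebuild a degree vector with each value repeated as often as it occurs
rebuiltDegreeVector <- rep( degreeValues, times = degreeCounts )

rebuiltNodeCount <- length( rebuiltDegreeVector )   # number of vertices
rebuiltDegreeSum <- sum( rebuiltDegreeVector )      # twice the number of edges
rebuiltDegreeVar <- var( rebuiltDegreeVector )
rebuiltDegreeSd <- sd( rebuiltDegreeVector )

print( c( rebuiltNodeCount, rebuiltDegreeSum, rebuiltDegreeVar, rebuiltDegreeSd ) )
```

The node count and degree sum should match the 1167 vertices and 340 edges in the network summary, and the variance and SD should match the values printed above.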






    




'grp_week degree centrality = 0.0304271969022151'






    




'grp_week betweenness centrality = 0.0134361252020462'






    




'grp_week connectedness = 0.0198541656561737'






    



Warning message in sna::gtrans(gwHumanNetworkStatnet, mode = "graph"):
“gtrans called with use.adjacency=TRUE, but your data looks too large for that to work well.  Overriding to edgelist method.”





    




'grp_week transitivity = 0.0752148997134671'






    




'grp_week density = 0.000499734699666794'
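The degree centralization value above can be reproduced from numbers already printed in this section. The formula used here (Freeman's degree centralization for an undirected graph: the sum of differences between the maximum degree and each node's degree, divided by (n - 1)(n - 2)) is an assumption about what `sna::centralization()` computes with `sna::degree` and `mode = "graph"`; it does reproduce the reported value.

```r
# numbers reported above for the human-coded grp_week network
vertexCount <- 1167
edgeCount <- 340
maxDegree <- 36

# sum over nodes of ( maxDegree - degree ); the degrees sum to 2E
sumOfDifferences <- maxDegree * vertexCount - 2 * edgeCount

# largest possible value of that sum for an undirected graph (a star)
maxPossibleSum <- ( vertexCount - 1 ) * ( vertexCount - 2 )

degreeCentralizationByHand <- sumOfDifferences / maxPossibleSum
print( degreeCentralizationByHand )
```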

`grp_week` (gw) - human - create node attribute DataFrame

If you want to just work with the traits of the nodes/vertexes, you can combine the attribute vectors into a data frame.



In [58]:

    
#==============================================================================#
# output attributes to data frame
#==============================================================================#

# if you want to just work with the traits of the nodes/vertexes, you can
#    combine the attribute vectors into a data frame.

# first, output network object to see what attributes you have
gwHumanNetworkStatnet

# then, combine them into a data frame.
gwHumanNodeAttrDF <- data.frame( id = gwHumanNetworkStatnet %v% "vertex.names",
                                 person_id = gwHumanNetworkStatnet %v% "person_id",
                                 person_type = gwHumanNetworkStatnet %v% "person_type",
                                 degree = gwHumanNetworkStatnet %v% "degree",
                                 betweenness = gwHumanNetworkStatnet %v% "betweenness" )









    





 Network attributes:
  vertices = 1167 
  directed = FALSE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 340 
    missing edges= 0 
    non-missing edges= 340 

 Vertex attribute names: 
    betweenness degree person_id person_type vertex.names 

No edge attributes

`grp_week` QAP graph correlation between automated and ground truth

Now, compare the automated and human-coded networks themselves using graph correlation in QAP.

Based on: context_text/R/sna/statnet/sna-qap.r

Note: QAP compares two networks, so will have to wait until both OpenCalais and human coding networks have been processed.
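To make the idea behind the QAP test concrete, here is a minimal base-R sketch on two invented 8-node graphs. It is not the `sna::qaptest()` implementation: it computes a correlation over the off-diagonal cells, then repeatedly relabels the nodes of one graph (permuting rows and columns together) and recomputes the correlation to build a reference distribution. All names and data are illustrative.

```r
set.seed( 1 )
nodeCount <- 8

# graph 1: a ring on 8 nodes
graph1 <- matrix( 0, nodeCount, nodeCount )
for ( i in 1 : nodeCount ) {
    j <- ( i %% nodeCount ) + 1
    graph1[ i, j ] <- 1
    graph1[ j, i ] <- 1
}

# graph 2: same ring with tie 1-2 removed and tie 1-5 added
graph2 <- graph1
graph2[ 1, 2 ] <- 0
graph2[ 2, 1 ] <- 0
graph2[ 1, 5 ] <- 1
graph2[ 5, 1 ] <- 1

# correlation between two adjacency matrices, ignoring the diagonal
offDiagonalCor <- function( a, b ) {
    offDiagonal <- row( a ) != col( a )
    cor( a[ offDiagonal ], b[ offDiagonal ] )
}

observedCor <- offDiagonalCor( graph1, graph2 )

# reference distribution: relabel graph1's nodes at random, 1000 times
permutedCors <- replicate( 1000, {
    shuffle <- sample( nodeCount )
    offDiagonalCor( graph1[ shuffle, shuffle ], graph2 )
} )

# share of permutations at least as large / at least as small as observed
pGreaterEqual <- mean( permutedCors >= observedCor )
pLessEqual <- mean( permutedCors <= observedCor )
print( c( observedCor, pGreaterEqual, pLessEqual ) )
```

The two proportions at the end play the same role as the `p(f(perm) >= f(d))` and `p(f(perm) <= f(d))` lines in the QAP test summaries.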



In [59]:

    
# link to good doc on qaptest(){sna} function: http://www.inside-r.org/packages/cran/sna/docs/qaptest

# First, need to load data - see (or just source() ) the file "sna-load_data.r".
# source( "sna-load_data.r" )
# does the following (among other things):
# Start with loading in tab-delimited files.
#humanNetworkData <- read.delim( "human-sourcenet_data-20150504-002453.tab", header = TRUE, row.names = 1, check.names = FALSE )
#calaisNetworkData <- read.delim( "puter-sourcenet_data-20150504-002507.tab", header = TRUE, row.names = 1, check.names = FALSE )

# remove the right-most column, which contains non-tie info on nodes.
#humanNetworkTies <- humanNetworkData[ , -ncol( humanNetworkData ) ]
#gwAutomatedNetworkDF <- calaisNetworkData[ , -ncol( calaisNetworkData )]

# convert each to a matrix
#gwHumanNetworkMatrix <- as.matrix( gwHumanNetworkTies )
#gwAutomatedNetworkMatrix <- as.matrix( gwAutomatedNetworkDF )

# imports
# install.packages( "sna" )
# install.packages( "statnet" )
library( "sna" )

# package up data for calling qaptest() - first make 3-dimensional array to hold
#    our two matrices - this is known as a "graph set".
graphSetArray <- array( dim = c( 2, ncol( gwHumanNetworkMatrix ), nrow( gwHumanNetworkMatrix ) ) )

# then, place each matrix in its own slice along the first dimension of the array.
graphSetArray[ 1, , ] <- gwHumanNetworkMatrix
graphSetArray[ 2, , ] <- gwAutomatedNetworkMatrix

# first, try a graph correlation
graphCorrelation <- sna::gcor( gwHumanNetworkMatrix, gwAutomatedNetworkMatrix )
paste( output_prefix, "graph correlation =", graphCorrelation, sep = " " )

# try a qaptest...
qapGcorResult <- sna::qaptest( graphSetArray, sna::gcor, g1 = 1, g2 = 2 )
summary( qapGcorResult )
plot( qapGcorResult )

# graph covariance...
graphCovariance <- sna::gcov( gwHumanNetworkMatrix, gwAutomatedNetworkMatrix )
graphCovariance
paste( output_prefix, "graph covariance =", graphCovariance, sep = " " )

# try a qaptest...
qapGcovResult <- sna::qaptest( graphSetArray, sna::gcov, g1 = 1, g2 = 2 )
summary( qapGcovResult )
plot( qapGcovResult )

# Hamming Distance
graphHammingDist <- sna::hdist( gwHumanNetworkMatrix, gwAutomatedNetworkMatrix )
paste( output_prefix, "graph hamming distance =", graphHammingDist, sep = " " )

# try a qaptest...
qapHdistResult <- sna::qaptest( graphSetArray, sna::hdist, g1 = 1, g2 = 2 )
summary( qapHdistResult )
plot( qapHdistResult )

# graph structural correlation?
#graphStructCorrelation <- gscor( gwHumanNetworkMatrix, gwAutomatedNetworkMatrix )
#graphStructCorrelation









    




'grp_week graph correlation = 0.892167983545507'






    





QAP Test Results

Estimated p-values:
	p(f(perm) >= f(d)): 0 
	p(f(perm) <= f(d)): 1 

Test Diagnostics:
	Test Value (f(d)): 0.892168 
	Replications: 1000 
	Distribution Summary:
		Min:	 -0.0004680711 
		1stQ:	 -0.0004680711 
		Med:	 -0.0004680711 
		Mean:	 2.853625e-05 
		3rdQ:	 -0.0004680711 
		Max:	 0.008961183 







    




0.000417206876441978






    




'grp_week graph covariance = 0.000417206876441978'






    





QAP Test Results

Estimated p-values:
	p(f(perm) >= f(d)): 0 
	p(f(perm) <= f(d)): 1 

Test Diagnostics:
	Test Value (f(d)): 0.0004172069 
	Replications: 1000 
	Distribution Summary:
		Min:	 -2.188853e-07 
		1stQ:	 -2.188853e-07 
		Med:	 -2.188853e-07 
		Mean:	 1.481429e-08 
		3rdQ:	 -2.188853e-07 
		Max:	 4.190542e-06 







    












    




'grp_week graph hamming distance = 140'






    





QAP Test Results

Estimated p-values:
	p(f(perm) >= f(d)): 1 
	p(f(perm) <= f(d)): 0 

Test Diagnostics:
	Test Value (f(d)): 140 
	Replications: 1000 
	Distribution Summary:
		Min:	 1260 
		1stQ:	 1276 
		Med:	 1276 
		Mean:	 1275.344 
		3rdQ:	 1276 
		Max:	 1276
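The graph correlation and the Hamming distance reported in this section can be checked against each other using only the edge counts. The sketch assumes both matrices are symmetric and 0/1, and that the Hamming distance of 140 counts each differing undirected tie twice (once per direction); that second point is an assumption about `sna::hdist()` with its default settings. Under those assumptions the correlation recomputed from counts should land on the value printed above.

```r
# counts reported in this notebook for the grp_week networks
vertexCount <- 1167
humanEdges <- 340
automatedEdges <- 298
hammingDistance <- 140

# assumption: each differing undirected tie is counted twice
differingTies <- hammingDistance / 2
sharedTies <- ( humanEdges + automatedEdges - differingTies ) / 2

# off-diagonal cells, and the number of 1s in each matrix and in both
cellCount <- vertexCount * ( vertexCount - 1 )
humanOnes <- 2 * humanEdges
automatedOnes <- 2 * automatedEdges
sharedOnes <- 2 * sharedTies

# Pearson correlation of two 0/1 vectors, written in terms of counts
numerator <- cellCount * sharedOnes - humanOnes * automatedOnes
denominator <- sqrt( ( cellCount * humanOnes - humanOnes ^ 2 ) *
                     ( cellCount * automatedOnes - automatedOnes ^ 2 ) )
correlationFromCounts <- numerator / denominator

print( c( sharedTies, correlationFromCounts ) )
```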

Compare `grp_month` and `grp_week` using QAP

Now, compare the month network against the week network, once for the automated data and once for the human-coded data, to see what the additional time gets you.

Based on: context_text/R/sna/statnet/sna-qap.r

Note: QAP compares two networks, so will have to wait until both OpenCalais and human coding networks have been processed.

month-to-week - automated



In [60]:

    
output_prefix <- "month-to-week automated"



In [61]:

    
# link to good doc on qaptest(){sna} function: http://www.inside-r.org/packages/cran/sna/docs/qaptest

# First, need to load data - see (or just source() ) the file "sna-load_data.r".
# source( "sna-load_data.r" )
# does the following (among other things):
# Start with loading in tab-delimited files.
#humanNetworkData <- read.delim( "human-sourcenet_data-20150504-002453.tab", header = TRUE, row.names = 1, check.names = FALSE )
#calaisNetworkData <- read.delim( "puter-sourcenet_data-20150504-002507.tab", header = TRUE, row.names = 1, check.names = FALSE )

# remove the right-most column, which contains non-tie info on nodes.
#humanNetworkTies <- humanNetworkData[ , -ncol( humanNetworkData ) ]
#gwAutomatedNetworkDF <- calaisNetworkData[ , -ncol( calaisNetworkData )]

# convert each to a matrix
#gwHumanNetworkMatrix <- as.matrix( gwHumanNetworkTies )
#gwAutomatedNetworkMatrix <- as.matrix( gwAutomatedNetworkDF )

# imports
# install.packages( "sna" )
# install.packages( "statnet" )
library( "sna" )

# package up data for calling qaptest() - first make 3-dimensional array to hold
#    our two matrices - this is known as a "graph set".
graphSetArray <- array( dim = c( 2, ncol( gmAutomatedNetworkMatrix ), nrow( gmAutomatedNetworkMatrix ) ) )

# then, place each matrix in its own slice along the first dimension of the array.
graphSetArray[ 1, , ] <- gmAutomatedNetworkMatrix
graphSetArray[ 2, , ] <- gwAutomatedNetworkMatrix

# first, try a graph correlation
graphCorrelation <- sna::gcor( gmAutomatedNetworkMatrix, gwAutomatedNetworkMatrix )
paste( output_prefix, "graph correlation =", graphCorrelation, sep = " " )

# try a qaptest...
qapGcorResult <- sna::qaptest( graphSetArray, sna::gcor, g1 = 1, g2 = 2 )
summary( qapGcorResult )
plot( qapGcorResult )

# graph covariance...
graphCovariance <- sna::gcov( gmAutomatedNetworkMatrix, gwAutomatedNetworkMatrix )
graphCovariance
paste( output_prefix, "graph covariance =", graphCovariance, sep = " " )

# try a qaptest...
qapGcovResult <- sna::qaptest( graphSetArray, sna::gcov, g1 = 1, g2 = 2 )
summary( qapGcovResult )
plot( qapGcovResult )

# Hamming Distance
graphHammingDist <- sna::hdist( gmAutomatedNetworkMatrix, gwAutomatedNetworkMatrix )
paste( output_prefix, "graph hamming distance =", graphHammingDist, sep = " " )

# try a qaptest...
qapHdistResult <- sna::qaptest( graphSetArray, sna::hdist, g1 = 1, g2 = 2 )
summary( qapHdistResult )
plot( qapHdistResult )

# graph structural correlation?
#graphStructCorrelation <- gscor( gmAutomatedNetworkMatrix, gwAutomatedNetworkMatrix )
#graphStructCorrelation









    




'month-to-week automated graph correlation = 0.508287038302634'






    





QAP Test Results

Estimated p-values:
	p(f(perm) >= f(d)): 0 
	p(f(perm) <= f(d)): 1 

Test Diagnostics:
	Test Value (f(d)): 0.508287 
	Replications: 1000 
	Distribution Summary:
		Min:	 -0.0008621009 
		1stQ:	 -0.0008621009 
		Med:	 -0.0008621009 
		Mean:	 3.830717e-05 
		3rdQ:	 0.0008464533 
		Max:	 0.00768067 







    




0.000437261453028748






    




'month-to-week automated graph covariance = 0.000437261453028748'






    





QAP Test Results

Estimated p-values:
	p(f(perm) >= f(d)): 0 
	p(f(perm) <= f(d)): 1 

Test Diagnostics:
	Test Value (f(d)): 0.0004372615 
	Replications: 1000 
	Distribution Summary:
		Min:	 -7.41635e-07 
		1stQ:	 -7.41635e-07 
		Med:	 -7.41635e-07 
		Mean:	 3.295431e-08 
		3rdQ:	 7.28174e-07 
		Max:	 8.077219e-06 







    












    




'month-to-week automated graph hamming distance = 1708'






    





QAP Test Results

Estimated p-values:
	p(f(perm) >= f(d)): 1 
	p(f(perm) <= f(d)): 0 

Test Diagnostics:
	Test Value (f(d)): 1708 
	Replications: 1000 
	Distribution Summary:
		Min:	 2880 
		1stQ:	 2896 
		Med:	 2900 
		Mean:	 2897.816 
		3rdQ:	 2900 
		Max:	 2900

month-to-week - human



In [62]:

    
output_prefix <- "month-to-week human"



In [63]:

    
# link to good doc on qaptest(){sna} function: http://www.inside-r.org/packages/cran/sna/docs/qaptest

# First, need to load data - see (or just source() ) the file "sna-load_data.r".
# source( "sna-load_data.r" )
# does the following (among other things):
# Start with loading in tab-delimited files.
#humanNetworkData <- read.delim( "human-sourcenet_data-20150504-002453.tab", header = TRUE, row.names = 1, check.names = FALSE )
#calaisNetworkData <- read.delim( "puter-sourcenet_data-20150504-002507.tab", header = TRUE, row.names = 1, check.names = FALSE )

# remove the right-most column, which contains non-tie info on nodes.
#humanNetworkTies <- humanNetworkData[ , -ncol( humanNetworkData ) ]
#gwHumanNetworkDF <- calaisNetworkData[ , -ncol( calaisNetworkData )]

# convert each to a matrix
#gwHumanNetworkMatrix <- as.matrix( gwHumanNetworkTies )
#gwHumanNetworkMatrix <- as.matrix( gwHumanNetworkDF )

# imports
# install.packages( "sna" )
# install.packages( "statnet" )
library( "sna" )

# package up data for calling qaptest() - first make 3-dimensional array to hold
#    our two matrices - this is known as a "graph set".
graphSetArray <- array( dim = c( 2, ncol( gmHumanNetworkMatrix ), nrow( gmHumanNetworkMatrix ) ) )

# then, place each matrix in its own slice along the first dimension of the array.
graphSetArray[ 1, , ] <- gmHumanNetworkMatrix
graphSetArray[ 2, , ] <- gwHumanNetworkMatrix

# first, try a graph correlation
graphCorrelation <- sna::gcor( gmHumanNetworkMatrix, gwHumanNetworkMatrix )
paste( output_prefix, "graph correlation =", graphCorrelation, sep = " " )

# try a qaptest...
qapGcorResult <- sna::qaptest( graphSetArray, sna::gcor, g1 = 1, g2 = 2 )
summary( qapGcorResult )
plot( qapGcorResult )

# graph covariance...
graphCovariance <- sna::gcov( gmHumanNetworkMatrix, gwHumanNetworkMatrix )
graphCovariance
paste( output_prefix, "graph covariance =", graphCovariance, sep = " " )

# try a qaptest...
qapGcovResult <- sna::qaptest( graphSetArray, sna::gcov, g1 = 1, g2 = 2 )
summary( qapGcovResult )
plot( qapGcovResult )

# Hamming Distance
graphHammingDist <- sna::hdist( gmHumanNetworkMatrix, gwHumanNetworkMatrix )
paste( output_prefix, "graph hamming distance =", graphHammingDist, sep = " " )

# try a qaptest...
qapHdistResult <- sna::qaptest( graphSetArray, sna::hdist, g1 = 1, g2 = 2 )
summary( qapHdistResult )
plot( qapHdistResult )

# graph structural correlation?
#graphStructCorrelation <- gscor( gmHumanNetworkMatrix, gwHumanNetworkMatrix )
#graphStructCorrelation









    




'month-to-week human graph correlation = 0.531732055265214'






    





QAP Test Results

Estimated p-values:
	p(f(perm) >= f(d)): 0 
	p(f(perm) <= f(d)): 1 

Test Diagnostics:
	Test Value (f(d)): 0.5317321 
	Replications: 1000 
	Distribution Summary:
		Min:	 -0.0009402942 
		1stQ:	 -0.0009402942 
		Med:	 -0.0009402942 
		Mean:	 8.118338e-05 
		3rdQ:	 0.0006263892 
		Max:	 0.008459806 







    




0.000498852914926293






    




'month-to-week human graph covariance = 0.000498852914926293'






    





QAP Test Results

Estimated p-values:
	p(f(perm) >= f(d)): 0 
	p(f(perm) <= f(d)): 1 

Test Diagnostics:
	Test Value (f(d)): 0.0004988529 
	Replications: 1000 
	Distribution Summary:
		Min:	 -8.82152e-07 
		1stQ:	 -8.82152e-07 
		Med:	 -8.82152e-07 
		Mean:	 -6.145822e-09 
		3rdQ:	 5.87657e-07 
		Max:	 6.466893e-06 







    












    




'month-to-week human graph hamming distance = 1722'






    





QAP Test Results

Estimated p-values:
	p(f(perm) >= f(d)): 1 
	p(f(perm) <= f(d)): 0 

Test Diagnostics:
	Test Value (f(d)): 1722 
	Replications: 1000 
	Distribution Summary:
		Min:	 3066 
		1stQ:	 3078 
		Med:	 3082 
		Mean:	 3079.732 
		3rdQ:	 3082 
		Max:	 3082

Save workspace image

Save all the information in the current image, in case we need/want it later.



In [9]:

    
# help( save.image )
save.image( file = workspace_file_name )
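To get the objects back later, `load()` reads a saved file into an environment. A small round-trip sketch using a temporary file and `save()` on a single made-up object, so it does not touch the real workspace file. `file.path()` is used to build the path, which avoids separator problems when combining a directory and a file name.

```r
# made-up object standing in for the workspace contents
demoValue <- 42

# build the path with file.path() and save just this one object
demoFilePath <- file.path( tempdir(), "demo-workspace.RData" )
save( demoValue, file = demoFilePath )

# load into a fresh environment so nothing in the current session is overwritten
restoredEnv <- new.env()
load( demoFilePath, envir = restoredEnv )
restoredNames <- ls( restoredEnv )
print( restoredNames )
```

Calling `load()` without `envir` restores the objects into the calling environment instead, which is the usual way to resume from a file written by `save.image()`.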

TODO

Back to Table of Contents

TODO:

The human-coded data for grp_month has one fewer vertex (1167) than the automated data (1168). The missing person is row 355, user ID 781 (source_3), who appears in the automated data but not in the human data. QAP requires matrices of the same size.
- 781 - Cook, Matthew ( Wayland Fire Department )
- First, try to regenerate the data.
- Then, if that does not fix the mismatch, look into the article(s) where 781 - Cook, Matthew ( Wayland Fire Department ) is mentioned.
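
If regenerating the data does not resolve the mismatch, one possible stopgap is to restrict both matrices to the people they share before running QAP. The sketch below uses toy matrices and assumes the row and column names carry the person IDs, which may not hold for the real matrices. All object names and IDs other than 781 are hypothetical. Dropping a vertex changes the network, so this is a workaround and not a fix for the underlying data problem.

```r
# Toy example: find vertices present in one matrix but not the other, then
# drop them so both matrices cover the same people in the same order.
# All names and IDs below are hypothetical stand-ins for the real data.
automated_ids <- c( "12", "781", "903", "1044" )
human_ids <- c( "12", "903", "1044" )

toy_automated <- matrix( 0, nrow = 4, ncol = 4,
    dimnames = list( automated_ids, automated_ids ) )
toy_automated[ "12", "781" ] <- 1
toy_automated[ "781", "12" ] <- 1
toy_automated[ "903", "1044" ] <- 1
toy_automated[ "1044", "903" ] <- 1

toy_human <- matrix( 0, nrow = 3, ncol = 3,
    dimnames = list( human_ids, human_ids ) )
toy_human[ "903", "1044" ] <- 1
toy_human[ "1044", "903" ] <- 1

# IDs in automated but not in human, and the reverse.
only_in_automated <- setdiff( rownames( toy_automated ), rownames( toy_human ) )
only_in_human <- setdiff( rownames( toy_human ), rownames( toy_automated ) )

# Keep only the shared IDs, in one common order, for both rows and columns.
shared_ids <- intersect( rownames( toy_automated ), rownames( toy_human ) )
aligned_automated <- toy_automated[ shared_ids, shared_ids ]
aligned_human <- toy_human[ shared_ids, shared_ids ]

only_in_automated
dim( aligned_automated )
dim( aligned_human )
```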

Table of Contents

R network analysis files

Setup

Setup - working directories

Setup - import SNA functions

Setup - network data - render and store network data

data - grp_month

Setup - load workspace (optional)

grp_month analysis

grp_month (gm) - automated - OpenCalais

grp_month (gm) - automated - Read data

grp_month (gm) - automated - initialize statnet

grp_month (gm) - automated - Basic metrics

grp_month (gm) - automated - More metrics

grp_month (gm) - automated - create node attribute DataFrame

grp_month (gm) - human

grp_month (gm) - human - Read data

grp_month (gm) - human - initialize statnet

grp_month (gm) - human - Basic metrics

grp_month (gm) - human - More metrics

grp_month (gm) - human - create node attribute DataFrame

grp_month QAP graph correlation between automated and ground truth

grp_week analysis

grp_week (gw) - automated - OpenCalais

grp_week (gw) - automated - Read data

grp_week (gw) - automated - initialize statnet

grp_week (gw) - automated - Basic metrics

grp_week (gw) - automated - More metrics

grp_week (gw) - automated - create node attribute DataFrame

grp_week (gw) - human

grp_week (gw) - human - Read data

grp_week (gw) - human - initialize statnet

grp_week (gw) - human - Basic metrics

grp_week (gw) - human - More metrics

grp_week (gw) - human - create node attribute DataFrame

grp_week QAP graph correlation between automated and ground truth

Compare grp_month and grp_week using QAP

month-to-week - automated

month-to-week - human

Save workspace image

TODO
