R - statnet - grp month - full month

2017.12.02 - work log - prelim - R - statnet - grp month

1 R network analysis files
2 Setup
3 grp_month analysis
4 Save workspace image
5 TODO

R network analysis files


Related files:

network descriptives
- network-level
  - files
    - R scripts:
      - context_text/R/db_connect.r
      - context_text/R/sna/functions-sna.r
      - context_text/R/sna/sna-load_data.r
      - context_text/R/sna/igraph/*
      - context_text/R/sna/statnet/*
  - statnet/sna
    - sna::gden() - graph density
    - R scripts:
      - context_text/R/sna/statnet/sna-statnet-init.r
      - context_text/R/sna/statnet/sna-statnet-network-stats.r
      - context_text/R/sna/statnet/sna-qap.r
  - igraph
    - igraph::transitivity() - vector of transitivity scores for each node in a graph, plus network-level transitivity score.
      - Q - interpretation?
    - R scripts:
      - context_text/R/sna/statnet/sna-igraph-init.r
      - context_text/R/sna/statnet/sna-igraph-network-stats.r
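On the interpretation question for transitivity: the network-level score is commonly read as the share of connected triples (paths of length two) that are closed into triangles. A minimal base-R sketch on a made-up four-node graph (a triangle plus one pendant node; the matrix and variable names are illustrative only, not taken from the grp_month data):

```r
# Hypothetical toy graph: a, b, c form a triangle; d is tied only to c.
toyNames <- c( "a", "b", "c", "d" )
toyMatrix <- matrix( c( 0, 1, 1, 0,
                        1, 0, 1, 0,
                        1, 1, 0, 1,
                        0, 0, 1, 0 ),
                     nrow = 4, byrow = TRUE,
                     dimnames = list( toyNames, toyNames ) )

# cell [i, k] of the squared matrix counts two-step paths from i to k.
twoPathMatrix <- toyMatrix %*% toyMatrix

# two-step paths whose end points are also directly tied (closed triples).
closedTripleCount <- sum( twoPathMatrix * toyMatrix )

# all two-step paths between distinct nodes (drop the i -> j -> i round trips).
allTripleCount <- sum( twoPathMatrix ) - sum( diag( twoPathMatrix ) )

toyTransitivity <- closedTripleCount / allTripleCount
toyTransitivity
```

Here 6 of the 10 ordered two-step paths are closed, so the score is 0.6; a score near 0 means that people who share a contact are rarely tied to each other directly.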

Setup


Setup - working directories


Store important directories and file names in variables:



In [1]:

    
getwd()









    




'/home/jonathanmorgan/work/django/research/work/phd_work/methods/network_analysis/statnet'



In [2]:

    
# code files (in particular SNA function library, modest though it may be)
code_directory <- "/home/jonathanmorgan/work/django/research/context_analysis/R/sna"
sna_function_file_path <- paste( code_directory, "/", 'functions-sna.r', sep = "" )

# home directory
home_directory <- getwd()
home_directory <- "/home/jonathanmorgan/work/django/research/work/phd_work/methods"

# data directories
data_directory <- paste( home_directory, "/data", sep = "" )
workspace_file_name <- "statnet-grp_month.RData"
workspace_file_path <- paste( data_directory, "/", workspace_file_name, sep = "" )
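A note on building paths: paste() joins its arguments with a space unless sep = "" is passed, so a path assembled without it ends up with spaces around the slash. A small sketch of the difference, using a hypothetical directory (not one of the real project paths), with file.path() as an alternative:

```r
# Hypothetical directory and file name, for illustration only.
exampleDirectory <- "/tmp/example_data"
exampleFileName <- "statnet-grp_month.RData"

# default separator is a single space, so the path is broken.
spacedPath <- paste( exampleDirectory, "/", exampleFileName )

# explicit empty separator gives the intended path.
joinedPath <- paste( exampleDirectory, "/", exampleFileName, sep = "" )

# file.path() inserts the "/" itself.
filePathResult <- file.path( exampleDirectory, exampleFileName )

spacedPath
joinedPath
filePathResult
```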



In [3]:

    
# set working directory to data directory for now.
setwd( data_directory )
getwd()









    




'/home/jonathanmorgan/work/django/research/work/phd_work/methods/data'

Setup - import SNA functions


Source the file functions-sna.r.



In [4]:

    
source( sna_function_file_path )

Setup - import statnet functions


Source the file statnet/functions-statnet.r.



In [5]:

    
# statnet/sna functions
# - /home/jonathanmorgan/work/django/research/context_analysis/R/sna/statnet/functions-statnet.r
statnetFunctionFilePath <- paste( code_directory, "/statnet/", 'functions-statnet.r', sep = "" )



In [6]:

    
source( statnetFunctionFilePath )









    



Loading required package: statnet.common

Attaching package: ‘statnet.common’

The following object is masked from ‘package:base’:

    order

Loading required package: network
network: Classes for Relational Data
Version 1.13.0.1 created on 2015-08-31.
copyright (c) 2005, Carter T. Butts, University of California-Irvine
                    Mark S. Handcock, University of California -- Los Angeles
                    David R. Hunter, Penn State University
                    Martina Morris, University of Washington
                    Skye Bender-deMoll, University of Washington
 For citation information, type citation("network").
 Type help("network-package") to get started.

sna: Tools for Social Network Analysis
Version 2.4 created on 2016-07-23.
copyright (c) 2005, Carter T. Butts, University of California-Irvine
 For citation information, type citation("sna").
 Type help(package="sna") to get started.

Setup - network data - render and store network data


First, you need to render network data and upload it to your server.

Directions for rendering network data are in methods-network_analysis-create_network_data.ipynb. You want a tab-delimited matrix that includes both the network and attributes of nodes as columns, and you want it to include a header row.

Once you render your network data files, you should place them on the server.

High-level data file layout:

- tab-delimited.
- the first row and first column are labels.
- the last 2 columns are traits of nodes (person_id and person_type).
- each row and column after the first, up to the trait columns, represents a person found in one of the articles.
- the people are in the same order from top to bottom and left to right.
- where the row and column of two people meet, and one of the people is an author, the number in the cell is the number of times the non-author was quoted in an article by the author. More basic two-mode co-location ties (appeared in the same article, even if not an author and/or not quoted) are not included.
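To make the layout concrete, here is a small sketch that writes a made-up three-person file in this shape to a temporary location and reads it back the same way the real files are read below. The names, IDs, tie counts, and the "label" corner cell are all invented for illustration; the real files may differ in details such as the corner cell.

```r
# Made-up file: alice is an author who quoted bob twice and carol once.
exampleLines <- c(
    "label\talice\tbob\tcarol\tperson_id\tperson_type",
    "alice\t0\t2\t1\t101\t2",
    "bob\t2\t0\t0\t102\t3",
    "carol\t1\t0\t0\t103\t3" )
examplePath <- tempfile( fileext = ".tab" )
writeLines( exampleLines, examplePath )

# first row is the header, first column holds the row labels.
exampleDF <- read.delim( examplePath, header = TRUE, row.names = 1, check.names = FALSE )

# one tie column per row; everything to the right of those is a trait column.
examplePersonCount <- nrow( exampleDF )
exampleTieMatrix <- as.matrix( exampleDF[ , 1 : examplePersonCount ] )
exampleTraitDF <- exampleDF[ , ( examplePersonCount + 1 ) : ncol( exampleDF ) ]

exampleTieMatrix
exampleTraitDF
```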

Files and their location on server:

data - grp_month


This is data from the Grand Rapids Press articles from December of 2009, coded by both humans and OpenCalais.

Files:

automated full month - sourcenet_data-20171205-022551-grp_month-automated.tab
human full month (baseline has priority) - sourcenet_data-20171115-043102-grp_month-human.tab

Location in Dropbox: Dropbox/academia/MSU/program_stuff/prelim_paper/data/network_analysis/2017.11.14/network/new_coders/grp_month

Location on server: /home/jonathanmorgan/work/django/research/work/phd_work/data/network/grp_month

Setup - load workspace


If a workspace from a previous run of this file exists, load it:



In [7]:

    
# assumes that data_directory and workspace_file_name were set in the
#     setup cell above.
setwd( data_directory )
if ( file.exists( workspace_file_name ) ) {
    load( workspace_file_name )
}
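For reference, a minimal sketch of the save-then-load round trip, using a temporary file and a throwaway variable (both hypothetical) rather than the real workspace. It uses save() on a single object; save.image() would write the whole workspace in the same format.

```r
# Hypothetical workspace file in the session's temp directory.
exampleWorkspacePath <- file.path( tempdir(), "example-workspace.RData" )

# store one object, then remove it from the session.
exampleValue <- 42
save( exampleValue, file = exampleWorkspacePath )
rm( exampleValue )

# only load if the file is actually there, so a first run does not fail.
if ( file.exists( exampleWorkspacePath ) ) {
    load( exampleWorkspacePath )
}
exampleValue
```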

`grp_month` analysis


First, look at the shiny new month of data.

`grp_month` (gm) - automated - OpenCalais


First, we'll analyze the month of data coded by OpenCalais. Set up some variables to store where data is located:

`grp_month` (gm) - automated - Read data


Read the data from the tab-delimited data file, then convert it into the data structures the R SNA packages expect.



In [5]:

    
# initialize variables
gmAutomatedDataFolder <- paste( data_directory, "/network/grp_month", sep = "" )
gmAutomatedDataFile <- "sourcenet_data-20171205-022551-grp_month-automated.tab"
gmAutomatedDataPath <- paste( gmAutomatedDataFolder, "/", gmAutomatedDataFile, sep = "" )



In [6]:

    
gmAutomatedDataPath









    




'/home/jonathanmorgan/work/django/research/work/phd_work/methods/data/network/grp_month/sourcenet_data-20171205-022551-grp_month-automated.tab'

Load the data file into memory



In [7]:

    
# tab-delimited:
gmAutomatedDataDF <- read.delim( gmAutomatedDataPath, header = TRUE, row.names = 1, check.names = FALSE )



In [8]:

    
# get count of rows...
gmAutomatedRowCount <- nrow( gmAutomatedDataDF )
paste( "grp_month automated row count = ", gmAutomatedRowCount, sep = "" )

# ...and columns
gmAutomatedColumnCount <- ncol( gmAutomatedDataDF )
paste( "grp_month automated column count = ", gmAutomatedColumnCount, sep = "" )









    




'grp_month automated row count = 1167'






    




'grp_month automated column count = 1169'

Get just the tie rows and columns for initializing network libraries.



In [9]:

    
# the below syntax returns only as many columns as there are rows, so
#     omitting any trait columns that lie in columns on the right side
#     of the file.
gmAutomatedNetworkDF <- gmAutomatedDataDF[ , 1 : gmAutomatedRowCount ]
#str( gmAutomatedNetworkDF )



In [10]:

    
# convert to a matrix
gmAutomatedNetworkMatrix <- as.matrix( gmAutomatedNetworkDF )
# str( gmAutomatedNetworkMatrix )
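Before handing a tie matrix to a network library, it can help to confirm that it is square, numeric, symmetric, and labeled identically on rows and columns. A sketch of such a check follows; checkTieMatrix() is a hypothetical helper written for this note, not part of functions-sna.r, and the toy matrix is made up.

```r
# Hypothetical helper: basic shape checks for an undirected tie matrix.
checkTieMatrix <- function( matrixIN ) {
    c( isSquare = nrow( matrixIN ) == ncol( matrixIN ),
       namesMatch = identical( rownames( matrixIN ), colnames( matrixIN ) ),
       isNumeric = is.numeric( matrixIN ),
       # unname() so only the values are compared, not the dimnames.
       isSymmetric = isSymmetric( unname( matrixIN ) ) )
}

# Made-up 3 x 3 matrix of tie weights.
toyNames <- c( "alice", "bob", "carol" )
toyTieMatrix <- matrix( c( 0, 2, 1,
                           2, 0, 0,
                           1, 0, 0 ),
                        nrow = 3, byrow = TRUE,
                        dimnames = list( toyNames, toyNames ) )

toyCheckResult <- checkTieMatrix( toyTieMatrix )
toyCheckResult
```

If any entry comes back FALSE, the column subset used to drop the trait columns is the first thing to recheck.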

`grp_month` (gm) - automated - initialize statnet


First, load the statnet package, then load the automated grp_month data into a statnet object and assign attributes to nodes.

Based on context_text/R/sna/statnet/sna-statnet-init.r.



In [11]:

    
# make sure you've loaded the statnet library
# install.packages( "statnet" )
library( statnet )









    



Loading required package: tergm
Loading required package: statnet.common

Attaching package: ‘statnet.common’

The following object is masked from ‘package:base’:

    order

Loading required package: ergm
Loading required package: network
network: Classes for Relational Data
Version 1.13.0 created on 2015-08-31.
copyright (c) 2005, Carter T. Butts, University of California-Irvine
                    Mark S. Handcock, University of California -- Los Angeles
                    David R. Hunter, Penn State University
                    Martina Morris, University of Washington
                    Skye Bender-deMoll, University of Washington
 For citation information, type citation("network").
 Type help("network-package") to get started.


ergm: version 3.8.0, created on 2017-08-18
Copyright (c) 2017, Mark S. Handcock, University of California -- Los Angeles
                    David R. Hunter, Penn State University
                    Carter T. Butts, University of California -- Irvine
                    Steven M. Goodreau, University of Washington
                    Pavel N. Krivitsky, University of Wollongong
                    Martina Morris, University of Washington
                    with contributions from
                    Li Wang
                    Kirk Li, University of Washington
                    Skye Bender-deMoll, University of Washington
Based on "statnet" project software (statnet.org).
For license and citation information see statnet.org/attribution
or type citation("ergm").

NOTE: Versions before 3.6.1 had a bug in the implementation of the bd()
constriant which distorted the sampled distribution somewhat. In
addition, Sampson's Monks datasets had mislabeled vertices. See the
NEWS and the documentation for more details.

Loading required package: networkDynamic

networkDynamic: version 0.9.0, created on 2016-01-12
Copyright (c) 2016, Carter T. Butts, University of California -- Irvine
                    Ayn Leslie-Cook, University of Washington
                    Pavel N. Krivitsky, University of Wollongong
                    Skye Bender-deMoll, University of Washington
                    with contributions from
                    Zack Almquist, University of California -- Irvine
                    David R. Hunter, Penn State University
                    Li Wang
                    Kirk Li, University of Washington
                    Steven M. Goodreau, University of Washington
                    Jeffrey Horner
                    Martina Morris, University of Washington
Based on "statnet" project software (statnet.org).
For license and citation information see statnet.org/attribution
or type citation("networkDynamic").


tergm: version 3.4.1, created on 2017-09-12
Copyright (c) 2017, Pavel N. Krivitsky, University of Wollongong
                    Mark S. Handcock, University of California -- Los Angeles
                    with contributions from
                    David R. Hunter, Penn State University
                    Steven M. Goodreau, University of Washington
                    Martina Morris, University of Washington
                    Nicole Bohme Carnegie, New York University
                    Carter T. Butts, University of California -- Irvine
                    Ayn Leslie-Cook, University of Washington
                    Skye Bender-deMoll
                    Li Wang
                    Kirk Li, University of Washington
Based on "statnet" project software (statnet.org).
For license and citation information see statnet.org/attribution
or type citation("tergm").

Loading required package: ergm.count

ergm.count: version 3.2.2, created on 2016-03-29
Copyright (c) 2016, Pavel N. Krivitsky, University of Wollongong
                    with contributions from
                    Mark S. Handcock, University of California -- Los Angeles
                    David R. Hunter, Penn State University
Based on "statnet" project software (statnet.org).
For license and citation information see statnet.org/attribution
or type citation("ergm.count").

NOTE: The form of the term ‘CMP’ has been changed in version 3.2 of
‘ergm.count’. See the news or help('CMP') for more information.

Loading required package: sna
sna: Tools for Social Network Analysis
Version 2.4 created on 2016-07-23.
copyright (c) 2005, Carter T. Butts, University of California-Irvine
 For citation information, type citation("sna").
 Type help(package="sna") to get started.


statnet: version 2016.9, created on 2016-08-29
Copyright (c) 2016, Mark S. Handcock, University of California -- Los Angeles
                    David R. Hunter, Penn State University
                    Carter T. Butts, University of California -- Irvine
                    Steven M. Goodreau, University of Washington
                    Pavel N. Krivitsky, University of Wollongong
                    Skye Bender-deMoll
                    Martina Morris, University of Washington
Based on "statnet" project software (statnet.org).
For license and citation information see statnet.org/attribution
or type citation("statnet").


There are updates for the following statnet packages on CRAN:






    



        Installed ReposVer   Built  
network "1.13.0"  "1.13.0.1" "3.4.2"






    



Restart R and use "statnet::update_statnet()" to get the updates.



In [12]:

    
# If you have a data frame of attributes (each attribute is a column, with
#     attribute name the column name), you can associate those attributes
#     when you create the network.
# attribute help: http://www.inside-r.org/packages/cran/network/docs/loading.attributes

# load attributes from a file:
#tab_attribute_test1 <- read.delim( "tab-test1-attribute_data.txt", header = TRUE, row.names = 1, check.names = FALSE )

# or create DataFrame by just grabbing the attribute columns
gmAutomatedNetworkAttributeDF <- gmAutomatedDataDF[ , 1168:1169 ]

# convert matrix to statnet network object instance.
gmAutomatedNetworkStatnet <- network( gmAutomatedNetworkMatrix, matrix.type = "adjacency", directed = FALSE, vertex.attr = gmAutomatedNetworkAttributeDF )

# look at information now.
gmAutomatedNetworkStatnet

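# Example of the printed summary (kept from an earlier, smaller network; the
#     actual output for this data follows the cell):
# Network attributes: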
#  vertices = 314
#  directed = FALSE
#  hyper = FALSE
#  loops = FALSE
#  multiple = FALSE
#  bipartite = FALSE
#  total edges= 309
#    missing edges= 0
#    non-missing edges= 309
#
# Vertex attribute names:
#    person_type vertex.names
#
# No edge attributes









    





 Network attributes:
  vertices = 1167 
  directed = FALSE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 1152 
    missing edges= 0 
    non-missing edges= 1152 

 Vertex attribute names: 
    person_id person_type vertex.names 

 Edge attribute names not shown
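A small sketch of the same construction on a made-up three-person matrix, assuming the network package is installed. One thing worth knowing: as far as I understand network(), cell values from an adjacency matrix are only kept as an edge attribute when ignore.eval = FALSE and names.eval are passed; otherwise each nonzero cell simply becomes an unweighted edge. The attribute name "weight" and the toy values are my own choices.

```r
library( network )

# Made-up tie matrix and traits (not from the grp_month files).
toyNames <- c( "alice", "bob", "carol" )
toyTieMatrix <- matrix( c( 0, 2, 1,
                           2, 0, 0,
                           1, 0, 0 ),
                        nrow = 3, byrow = TRUE,
                        dimnames = list( toyNames, toyNames ) )
toyAttributeList <- list( person_type = c( 2, 3, 3 ) )

# undirected network; keep the cell values as an edge attribute named "weight".
toyNetwork <- network( toyTieMatrix,
                       matrix.type = "adjacency",
                       directed = FALSE,
                       vertex.attr = toyAttributeList,
                       ignore.eval = FALSE,
                       names.eval = "weight" )

network.size( toyNetwork )       # number of vertices
network.edgecount( toyNetwork )  # number of edges
toyNetwork %v% "person_type"     # vertex attribute values
toyNetwork %e% "weight"          # edge attribute values
```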



In [13]:

    
# calais - include ties Greater than or equal to 0 (GE0)
gmAutomatedMeanTieWeightGE0Vector <- apply( gmAutomatedNetworkMatrix, 1, calculateListMean )
gmAutomatedDataDF$meanTieWeightGE0 <- gmAutomatedMeanTieWeightGE0Vector

# calais - include ties Greater than or equal to 1 (GE1)
gmAutomatedMeanTieWeightGE1Vector <- apply( gmAutomatedNetworkMatrix, 1, calculateListMean, minValueToIncludeIN = 1 )
gmAutomatedDataDF$meanTieWeightGE1 <- gmAutomatedMeanTieWeightGE1Vector

# automated - Max tie weight?
gmAutomatedMaxTieWeightVector <- apply( gmAutomatedNetworkMatrix, 1, calculateListMax )
gmAutomatedDataDF$maxTieWeight <- gmAutomatedMaxTieWeightVector
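calculateListMean() and calculateListMax() come from functions-sna.r and are not shown in this log. As a rough illustration of what the GE0 and GE1 variants compute, here is a hypothetical stand-in (meanAtOrAbove() is my name and my guess at the behavior, not the real implementation) applied row by row to a made-up matrix:

```r
# Hypothetical stand-in: mean of the values at or above a minimum.
meanAtOrAbove <- function( valuesIN, minValueToIncludeIN = 0 ) {
    keptValues <- valuesIN[ valuesIN >= minValueToIncludeIN ]
    if ( length( keptValues ) == 0 ) {
        return( NA_real_ )
    }
    mean( keptValues )
}

# Made-up tie weights.
toyNames <- c( "alice", "bob", "carol" )
toyTieMatrix <- matrix( c( 0, 2, 1,
                           2, 0, 0,
                           1, 0, 0 ),
                        nrow = 3, byrow = TRUE,
                        dimnames = list( toyNames, toyNames ) )

# GE0: every cell in the row counts, including the zeros.
toyMeanGE0 <- apply( toyTieMatrix, 1, meanAtOrAbove )

# GE1: only cells where a tie exists count.
toyMeanGE1 <- apply( toyTieMatrix, 1, meanAtOrAbove, minValueToIncludeIN = 1 )

# largest tie weight per person.
toyMaxWeight <- apply( toyTieMatrix, 1, max )
```

In this toy case alice averages 1 across all cells but 1.5 across her actual ties, which is the distinction the GE0 and GE1 columns are meant to capture.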

`grp_month` (gm) - automated - Basic metrics




In [14]:

    
# assuming that our statnet network object is in gmAutomatedNetworkStatnet.

# Use the degree function in the sna package to create vector of degree values
#    for each node.  Make sure to pass the gmode parameter to tell it that the
#    graph is not directed (gmode = "graph", instead of "digraph").
# Doc: http://www.inside-r.org/packages/cran/sna/docs/degree
#degree_vector <- degree( test1_statnet, gmode = "graph" )

# If you have other libraries loaded that also implement a degree function, you
#    can also call this with package name:
gmAutomatedDegreeVector <- sna::degree( gmAutomatedNetworkStatnet, gmode = "graph" )

# output the vector
gmAutomatedDegreeVector

# want more info on the degree function?  You can get to it eventually through
#    the following:
#help( package = "sna" )
#??sna::degree

# what is the average (mean) degree?
gmAutomatedAvgDegree <- mean( gmAutomatedDegreeVector )
paste( "average degree = ", gmAutomatedAvgDegree, sep = "" )

# subset vector to get only those that are above mean
gmAutomatedAboveMeanVector <- gmAutomatedDegreeVector[ gmAutomatedDegreeVector > gmAutomatedAvgDegree ]

# Take the degree and associate it with each node as a node attribute.
#    (%v% reads a vertex attribute via get.vertex.attribute; the assignment
#    form used here sets one via set.vertex.attribute)
gmAutomatedNetworkStatnet %v% "degree" <- gmAutomatedDegreeVector

# also add degree vector to original data frame
gmAutomatedDataDF$degree <- gmAutomatedDegreeVector









    





	30
	1
	39
	1
	34
	1
	1
	50
	26
	1
	29
	1
	47
	1
	1
	93
	46
	29
	47
	71
	2
	9
	1
	1
	2
	1
	35
	0
	2
	2
	2
	2
	1
	1
	1
	1
	1
	42
	1
	0
	1
	2
	2
	1
	5
	1
	0
	1
	43
	5
	1
	14
	1
	1
	2
	2
	1
	2
	1
	1
	1
	1
	1
	1
	0
	1
	0
	2
	4
	1
	1
	32
	1
	1
	3
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	7
	1
	1
	1
	9
	2
	1
	1
	1
	1
	1
	2
	5
	0
	0
	0
	0
	1
	1
	1
	0
	1
	1
	2
	0
	2
	2
	1
	1
	1
	1
	1
	2
	1
	1
	1
	1
	1
	0
	1
	1
	35
	1
	1
	1
	1
	1
	1
	0
	1
	1
	19
	1
	1
	1
	1
	1
	1
	27
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	4
	1
	1
	32
	49
	1
	2
	0
	1
	1
	44
	1
	1
	1
	1
	1
	1
	1
	2
	1
	1
	2
	1
	1
	4
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	2
	2
	2
	0
	2
	2
	1
	1
	1
	1
	1
	4
	1
	1
	1
	0
	1
	1
	1
	1
	2
	1
	15
	1
	0
	1
	1
	2
	2
	2
	2
	2
	2
	4
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	0
	2
	2
	2
	2
	1
	1
	2
	1
	1
	1
	1
	1
	1
	1
	2
	1
	0
	2
	0
	1
	2
	3
	1
	1
	1
	1
	0
	1
	1
	1
	0
	1
	72
	22
	46
	3
	1
	1
	1
	10
	8
	2
	1
	1
	2
	1
	1
	2
	1
	2
	2
	1
	1
	1
	1
	0
	0
	0
	1
	2
	1
	1
	1
	1
	1
	1
	2
	1
	0
	1
	1
	1
	1
	1
	1
	0
	1
	1
	0
	1
	1
	0
	0
	1
	1
	0
	0
	1
	1
	1
	3
	2
	15
	1
	1
	1
	1
	0
	2
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	2
	1
	2
	1
	1
	1
	0
	1
	0
	0
	1
	1
	1
	1
	1
	1
	2
	1
	0
	1
	7
	1
	1
	1
	1
	1
	1
	1
	0
	1
	0
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	3
	2
	4
	2
	1
	2
	2
	1
	1
	1
	1
	1
	0
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	0
	1
	1
	1
	1
	2
	1
	4
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	2
	1
	0
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	4
	1
	1
	1
	1
	1
	0
	1
	5
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	2
	2
	0
	2
	2
	0
	1
	1
	1
	0
	0
	1
	1
	1
	1
	1
	1
	1
	1
	4
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	0
	0
	1
	1
	1
	1
	1
	1
	1
	0
	0
	1
	0
	0
	1
	0
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	2
	1
	0
	1
	0
	1
	1
	1
	0
	1
	1
	1
	1
	1
	0
	2
	2
	2
	2
	2
	1
	1
	1
	2
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	2
	1
	1
	1
	1
	1
	1
	1
	2
	1
	1
	1
	0
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	0
	1
	1
	1
	1
	0
	1
	0
	0
	1
	1
	0
	1
	3
	2
	1
	1
	1
	2
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	2
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	0
	0
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	0
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	0
	0
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	0
	1
	1
	0
	1
	1
	1
	1
	0
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	6
	1
	1
	1
	1
	2
	1
	1
	1
	1
	1
	2
	2
	0
	2
	2
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	2
	0
	2
	2
	2
	1
	0
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	2
	2
	2
	2
	2
	2
	2
	1
	1
	1
	0
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	2
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	2
	1
	1
	2
	0
	1
	0
	1








    




'average degree = 1.97429305912596'
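As a quick sanity check on this number: in an undirected network every edge adds one to the degree of each of its two end points, so the mean degree should equal twice the edge count divided by the vertex count. Using the counts printed in the network summary above (1167 vertices, 1152 edges):

```r
# counts taken from the printed network summary for the automated data.
automatedVertexCount <- 1167
automatedEdgeCount <- 1152

# each edge contributes to two degrees.
expectedMeanDegree <- 2 * automatedEdgeCount / automatedVertexCount
expectedMeanDegree
```

This agrees with the value from mean( gmAutomatedDegreeVector ), which suggests the degree vector and the edge count are consistent with each other.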



In [15]:

    
# average author degree (person types 2 and 4)
gmAutomatedAverageAuthorDegree2And4 <- calcAuthorMeanDegree( dataFrameIN = gmAutomatedDataDF, includeBothIN = TRUE )
paste( "average author degree (2 and 4) = ", gmAutomatedAverageAuthorDegree2And4, sep = "" )

# average author degree (person type 2 only)
gmAutomatedAverageAuthorDegreeOnly2 <- calcAuthorMeanDegree( dataFrameIN = gmAutomatedDataDF, includeBothIN = FALSE )
paste( "average author degree (only 2) = ", gmAutomatedAverageAuthorDegreeOnly2, sep = "" )

# average source degree (person types 3 and 4)
gmAutomatedAverageSourceDegree3And4 <- calcSourceMeanDegree( dataFrameIN = gmAutomatedDataDF, includeBothIN = TRUE )
paste( "average source degree (3 and 4) = ", gmAutomatedAverageSourceDegree3And4, sep = "" )

# average source degree (person type 3 only)
gmAutomatedAverageSourceDegreeOnly3 <- calcSourceMeanDegree( dataFrameIN = gmAutomatedDataDF, includeBothIN = FALSE )
paste( "average source degree (only 3) = ", gmAutomatedAverageSourceDegreeOnly3, sep = "" )









    




'average author degree (2 and 4) = 24.7872340425532'






    




'average author degree (only 2) = 24.8478260869565'






    




'average source degree (3 and 4) = 1.161'






    




'average source degree (only 3) = 1.14014014014014'
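calcAuthorMeanDegree() and calcSourceMeanDegree() are defined in the sourced function files and are not shown here. Judging only from the comments above (types 2 and 4 for authors, 3 and 4 for sources), they appear to average the degree column over a subset of person types. A hypothetical stand-in on made-up data, where meanDegreeForTypes() is my own name and not the real function:

```r
# Hypothetical stand-in: mean degree over rows whose person_type is in typesIN.
meanDegreeForTypes <- function( dataFrameIN, typesIN ) {
    mean( dataFrameIN$degree[ dataFrameIN$person_type %in% typesIN ] )
}

# Made-up people; assumed coding: 2 = author, 3 = source, 4 = both.
toyPersonDF <- data.frame( person_type = c( 2, 3, 3, 4, 3 ),
                           degree = c( 4, 1, 1, 3, 2 ) )

toyAuthorMean2And4 <- meanDegreeForTypes( toyPersonDF, c( 2, 4 ) )
toyAuthorMeanOnly2 <- meanDegreeForTypes( toyPersonDF, 2 )
toySourceMean3And4 <- meanDegreeForTypes( toyPersonDF, c( 3, 4 ) )
toySourceMeanOnly3 <- meanDegreeForTypes( toyPersonDF, 3 )
```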

`grp_month` (gm) - automated - More metrics


Now that the data is in a statnet object, run the code in the following file for more in-depth information:

context_text/R/sna/statnet/sna-statnet-network-stats.r



In [16]:

    
# Links:
# - manual (PDF): http://cran.r-project.org/web/packages/sna/sna.pdf
# - good notes: http://www.shizukalab.com/toolkits/sna/node-level-calculations

# Also, be advised that statnet and igraph don't really play nice together.
#    If you'll be using both, best idea is to have a workspace for each.

#==============================================================================#
# statnet
#==============================================================================#

# make sure you've loaded the statnet library (includes sna)
# install.packages( "statnet" )
#library( statnet )

#==============================================================================#
# NODE level
#==============================================================================#

# what is the standard deviation of the degrees?
gmAutomatedDegreeSd <- sd( gmAutomatedDegreeVector )
paste( "degree SD = ", gmAutomatedDegreeSd, sep = "" )

# what is the variance of the degrees?
gmAutomatedDegreeVar <- var( gmAutomatedDegreeVector )
paste( "degree variance = ", gmAutomatedDegreeVar, sep = "" )

# what is the max value among the degrees?
gmAutomatedDegreeMax <- max( gmAutomatedDegreeVector )
paste( "degree max = ", gmAutomatedDegreeMax, sep = "" )

# tabulate the degree distribution (how many nodes have each degree value)
gmAutomatedDegreeFrequenciesTable <- table( gmAutomatedDegreeVector )
paste( "degree frequencies = ", gmAutomatedDegreeFrequenciesTable, sep = "" )
gmAutomatedDegreeFrequenciesTable

# node-level undirected betweenness
gmAutomatedBetweenness <- sna::betweenness( gmAutomatedNetworkStatnet, gmode = "graph", cmode = "undirected" )

#paste( "betweenness = ", gmAutomatedBetweenness, sep = "" )
# associate with each node as a node attribute.
#    (%v% reads a vertex attribute; the assignment form used here sets one)
gmAutomatedNetworkStatnet %v% "betweenness" <- gmAutomatedBetweenness

# also add betweenness vector to original data frame
gmAutomatedDataDF$betweenness <- gmAutomatedBetweenness

#==============================================================================#
# NETWORK level
#==============================================================================#

# graph-level degree centralization (printed with the label "degree centrality")
gmAutomatedDegreeCentrality <- sna::centralization( gmAutomatedNetworkStatnet, sna::degree, mode = "graph" )
paste( "degree centrality = ", gmAutomatedDegreeCentrality, sep = "" )

# graph-level betweenness centralization (printed with the label "betweenness centrality")
gmAutomatedBetweennessCentrality <- sna::centralization( gmAutomatedNetworkStatnet, sna::betweenness, mode = "graph", cmode = "undirected" )
paste( "betweenness centrality = ", gmAutomatedBetweennessCentrality, sep = "" )

# graph-level connectedness
gmAutomatedConnectedness <- sna::connectedness( gmAutomatedNetworkStatnet )
paste( "connectedness = ", gmAutomatedConnectedness, sep = "" )

# graph-level transitivity
gmAutomatedTransitivity <- sna::gtrans( gmAutomatedNetworkStatnet, mode = "graph" )
paste( "transitivity = ", gmAutomatedTransitivity, sep = "" )

# graph-level density
gmAutomatedDensity <- sna::gden( gmAutomatedNetworkStatnet, mode = "graph" )
paste( "density = ", gmAutomatedDensity, sep = "" )









    




'degree SD = 6.42460087405331'






    




'degree variance = 41.2754963908866'






    




'degree max = 93'






    





	'degree frequencies = 122'
	'degree frequencies = 891'
	'degree frequencies = 100'
	'degree frequencies = 6'
	'degree frequencies = 9'
	'degree frequencies = 4'
	'degree frequencies = 1'
	'degree frequencies = 2'
	'degree frequencies = 1'
	'degree frequencies = 2'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 2'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 2'
	'degree frequencies = 1'
	'degree frequencies = 2'
	'degree frequencies = 1'
	'degree frequencies = 2'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 2'
	'degree frequencies = 2'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 1'








    





gmAutomatedDegreeVector
  0   1   2   3   4   5   6   7   8   9  10  14  15  19  22  26  27  29  30  32 
122 891 100   6   9   4   1   2   1   2   1   1   2   1   1   1   1   2   1   2 
 34  35  39  42  43  44  46  47  49  50  71  72  93 
  1   2   1   1   1   1   2   2   1   1   1   1   1 






    




'degree centrality = 0.0782006640213782'






    




'betweenness centrality = 0.206660606881935'






    




'connectedness = 0.588270050752468'






    



Warning message in sna::gtrans(gmAutomatedNetworkStatnet, mode = "graph"):
“gtrans called with use.adjacency=TRUE, but your data looks too large for that to work well.  Overriding to edgelist method.”





    




'transitivity = 0.00893353450329548'






    




'density = 0.00169321874710632'
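Two of the network-level numbers above can be cross-checked by hand from values already printed in this log (1167 vertices, 1152 edges, maximum degree 93). Density is the edge count over the number of possible undirected ties. For degree centralization, the sketch below assumes the Freeman formula for an undirected graph, which reproduces the value reported by sna::centralization here; I have not checked the package source to confirm that this is exactly what it computes.

```r
# values taken from the output above.
automatedVertexCount <- 1167
automatedEdgeCount <- 1152
automatedMaxDegree <- 93

# density: observed ties / possible ties among n nodes.
possibleTieCount <- automatedVertexCount * ( automatedVertexCount - 1 ) / 2
densityCheck <- automatedEdgeCount / possibleTieCount

# Freeman degree centralization (assumed formula):
#     sum( max degree - each degree ) / ( ( n - 1 ) * ( n - 2 ) )
# the sum of all degrees is twice the edge count.
degreeSum <- 2 * automatedEdgeCount
deviationSum <- automatedVertexCount * automatedMaxDegree - degreeSum
centralizationCheck <- deviationSum / ( ( automatedVertexCount - 1 ) * ( automatedVertexCount - 2 ) )

densityCheck
centralizationCheck
```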

`grp_month` (gm) - automated - create node attribute DataFrame


If you want to just work with the traits of the nodes/vertexes, you can combine the attribute vectors into a data frame.



In [17]:

    
#==============================================================================#
# output attributes to data frame
#==============================================================================#

# if you want to just work with the traits of the nodes/vertexes, you can
#    combine the attribute vectors into a data frame.

# first, output network object to see what attributes you have
gmAutomatedNetworkStatnet

# then, combine them into a data frame.
gmAutomatedNodeAttrDF <- data.frame( id = gmAutomatedNetworkStatnet %v% "vertex.names",
                                     person_id = gmAutomatedNetworkStatnet %v% "person_id",
                                     person_type = gmAutomatedNetworkStatnet %v% "person_type",
                                     degree = gmAutomatedNetworkStatnet %v% "degree",
                                     betweenness = gmAutomatedNetworkStatnet %v% "betweenness" )









    





 Network attributes:
  vertices = 1167 
  directed = FALSE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 1152 
    missing edges= 0 
    non-missing edges= 1152 

 Vertex attribute names: 
    betweenness degree person_id person_type vertex.names 

 Edge attribute names not shown
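Once the attributes are in a data frame, ordinary data frame operations apply. A short sketch on made-up rows (the IDs, types, and degrees are invented) showing how to pull out the highest-degree nodes and the nodes above the mean degree:

```r
# Made-up node attribute rows, shaped like the data frame built above.
toyNodeAttrDF <- data.frame( id = c( "a", "b", "c", "d" ),
                             person_type = c( 2, 3, 3, 2 ),
                             degree = c( 30, 1, 2, 39 ),
                             stringsAsFactors = FALSE )

# sort by degree, largest first, and keep the top two.
toySortedDF <- toyNodeAttrDF[ order( -toyNodeAttrDF$degree ), ]
toyTopTwoDF <- head( toySortedDF, 2 )

# keep only nodes whose degree is above the mean degree.
toyAboveMeanDF <- toyNodeAttrDF[ toyNodeAttrDF$degree > mean( toyNodeAttrDF$degree ), ]

toyTopTwoDF
toyAboveMeanDF
```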

`grp_month` (gm) - human


Next, we'll analyze the month of data coded by human coders. Set up some variables to store where data is located:

`grp_month` (gm) - human - Read data


Read the data from the tab-delimited data file, then convert it into the data structures the R SNA packages expect.



In [18]:

    
# initialize variables
gmHumanDataFolder <- paste( data_directory, "/network/grp_month", sep = "" )
gmHumanDataFile <- "sourcenet_data-20171115-043102-grp_month-human.tab"
gmHumanDataPath <- paste( gmHumanDataFolder, "/", gmHumanDataFile, sep = "" )



In [19]:

    
gmHumanDataPath









    




'/home/jonathanmorgan/work/django/research/work/phd_work/methods/data/network/grp_month/sourcenet_data-20171115-043102-grp_month-human.tab'

Load the data file into memory



In [20]:

    
# tab-delimited:
gmHumanDataDF <- read.delim( gmHumanDataPath, header = TRUE, row.names = 1, check.names = FALSE )



In [21]:

    
# get count of rows...
gmHumanRowCount <- nrow( gmHumanDataDF )
paste( "grp_month human row count = ", gmHumanRowCount, sep = "" )

# ...and columns
gmHumanColumnCount <- ncol( gmHumanDataDF )
paste( "grp_month human column count = ", gmHumanColumnCount, sep = "" )



'grp_month human row count = 1167'



'grp_month human column count = 1169'

Get just the tie rows and columns for initializing network libraries.



In [22]:

    
# the below syntax returns only as many columns as there are rows, so
#     omitting any trait columns that lie in columns on the right side
#     of the file.
gmHumanNetworkDF <- gmHumanDataDF[ , 1 : gmHumanRowCount ]
#str( gmHumanNetworkDF )



In [23]:

    
# convert to a matrix
gmHumanNetworkMatrix <- as.matrix( gmHumanNetworkDF )
# str( gmHumanNetworkMatrix )

`grp_month` (gm) - human - initialize statnet


First, load the statnet package, then load the human-coded grp_month data into a statnet object and assign attributes to nodes.

Based on context_text/R/sna/statnet/sna-statnet-init.r.



In [24]:

    
# make sure you've loaded the statnet library
# install.packages( "statnet" )
library( statnet )



In [25]:

    
# If you have a data frame of attributes (each attribute is a column, with
#     attribute name the column name), you can associate those attributes
#     when you create the network.
# attribute help: http://www.inside-r.org/packages/cran/network/docs/loading.attributes

# load attributes from a file:
#tab_attribute_test1 <- read.delim( "tab-test1-attribute_data.txt", header = TRUE, row.names = 1, check.names = FALSE )

# or create DataFrame by just grabbing the attribute columns
#gmHumanNetworkAttributeDF <- gmHumanDataDF[ , 1169:1170 ]
gmHumanNetworkAttributeDF <- gmHumanDataDF[ , 1168:1169 ]

# convert matrix to statnet network object instance.
gmHumanNetworkStatnet <- network( gmHumanNetworkMatrix, matrix.type = "adjacency", directed = FALSE, vertex.attr = gmHumanNetworkAttributeDF )

# look at information now.
gmHumanNetworkStatnet

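# Example of the printed summary (kept from an earlier, smaller network; the
#     actual output for this data follows the cell):
# Network attributes: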
#  vertices = 314
#  directed = FALSE
#  hyper = FALSE
#  loops = FALSE
#  multiple = FALSE
#  bipartite = FALSE
#  total edges= 309
#    missing edges= 0
#    non-missing edges= 309
#
# Vertex attribute names:
#    person_type vertex.names
#
# No edge attributes









    





 Network attributes:
  vertices = 1167 
  directed = FALSE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 1201 
    missing edges= 0 
    non-missing edges= 1201 

 Vertex attribute names: 
    person_id person_type vertex.names 

 Edge attribute names not shown



In [26]:

    
# human - include ties Greater than or equal to 0 (GE0)
gmHumanMeanTieWeightGE0Vector <- apply( gmHumanNetworkMatrix, 1, calculateListMean )
gmHumanDataDF$meanTieWeightGE0 <- gmHumanMeanTieWeightGE0Vector

# human - include ties Greater than or equal to 1 (GE1)
gmHumanMeanTieWeightGE1Vector <- apply( gmHumanNetworkMatrix, 1, calculateListMean, minValueToIncludeIN = 1 )
gmHumanDataDF$meanTieWeightGE1 <- gmHumanMeanTieWeightGE1Vector

# human - Max tie weight?
gmHumanMaxTieWeightVector <- apply( gmHumanNetworkMatrix, 1, calculateListMax )
gmHumanDataDF$maxTieWeight <- gmHumanMaxTieWeightVector
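
`calculateListMean` and `calculateListMax` come from functions-sna.r and are not shown in this log. As a rough illustration of the GE0 versus GE1 distinction only, the hypothetical stand-in below (`meanAtOrAbove` is an invented name, and the real function may handle empty rows or the diagonal differently) averages the values in a row that meet a minimum:

```r
# hypothetical stand-in, NOT the function from functions-sna.r:
#     mean of the values that are >= minValueToIncludeIN.
meanAtOrAbove <- function( valuesIN, minValueToIncludeIN = 0 ) {
    keptValues <- valuesIN[ valuesIN >= minValueToIncludeIN ]
    if ( length( keptValues ) == 0 ) {
        # no qualifying values in this row
        return( NA_real_ )
    }
    mean( keptValues )
}

# toy weighted tie matrix (made up for illustration)
toyTieMatrix <- matrix( c( 0, 2, 0,
                           2, 0, 1,
                           0, 1, 0 ), nrow = 3, byrow = TRUE )

# per-row mean including zeros (GE0) and excluding them (GE1)
toyMeanGE0 <- apply( toyTieMatrix, 1, meanAtOrAbove )
toyMeanGE1 <- apply( toyTieMatrix, 1, meanAtOrAbove, minValueToIncludeIN = 1 )
```

With a minimum of 1, zeros (absent ties) drop out, so the result is the mean weight of the ties a person actually has.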

`grp_month` (gm) - human - Basic metrics

Back to Table of Contents



In [27]:

    
# assuming that our statnet network object is in gmHumanNetworkStatnet.

# Use the degree function in the sna package to create vector of degree values
#    for each node.  Make sure to pass the gmode parameter to tell it that the
#    graph is not directed (gmode = "graph", instead of "digraph").
# Doc: http://www.inside-r.org/packages/cran/sna/docs/degree
#degree_vector <- degree( gmHumanNetworkStatnet, gmode = "graph" )

# If you have other libraries loaded that also implement a degree function, you
#    can also call this with package name:
gmHumanDegreeVector <- sna::degree( gmHumanNetworkStatnet, gmode = "graph" )

# output the vector
gmHumanDegreeVector

# want more info on the degree function?  You can get to it eventually through
#    the following:
#help( package = "sna" )
#??sna::degree

# what is the average (mean) degree?
gmHumanAvgDegree <- mean( gmHumanDegreeVector )
paste( "average degree = ", gmHumanAvgDegree, sep = "" )

# subset vector to get only those that are above mean
gmHumanAboveMeanVector <- gmHumanDegreeVector[ gmHumanDegreeVector > gmHumanAvgDegree ]

# Take the degree and associate it with each node as a node attribute.
#    (%v% with <- is a shortcut for set.vertex.attribute; without the
#    assignment it is a shortcut for get.vertex.attribute)
gmHumanNetworkStatnet %v% "degree" <- gmHumanDegreeVector

# also add degree vector to original data frame
gmHumanDataDF$degree <- gmHumanDegreeVector









    





	28
	1
	36
	1
	34
	0
	1
	50
	28
	1
	32
	1
	61
	1
	1
	99
	44
	31
	47
	66
	2
	6
	1
	1
	2
	1
	38
	2
	2
	2
	2
	2
	1
	1
	1
	1
	1
	41
	1
	1
	1
	2
	1
	1
	7
	1
	0
	1
	46
	4
	1
	19
	3
	1
	2
	2
	1
	1
	1
	1
	0
	1
	1
	1
	1
	0
	1
	2
	3
	1
	1
	32
	1
	1
	3
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	7
	1
	1
	1
	9
	2
	1
	1
	1
	1
	1
	2
	7
	1
	0
	0
	0
	1
	1
	1
	1
	1
	1
	2
	2
	2
	2
	1
	1
	1
	1
	1
	2
	1
	1
	1
	1
	1
	1
	1
	1
	37
	1
	1
	1
	1
	1
	2
	1
	1
	1
	19
	1
	1
	1
	1
	1
	1
	33
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	4
	1
	1
	31
	45
	1
	2
	1
	1
	1
	44
	1
	1
	1
	1
	1
	1
	1
	2
	1
	1
	2
	1
	1
	3
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	2
	4
	4
	4
	4
	4
	1
	1
	1
	1
	1
	4
	1
	1
	1
	1
	1
	1
	1
	1
	2
	1
	19
	1
	1
	1
	1
	2
	2
	2
	3
	2
	2
	4
	1
	0
	1
	1
	2
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	3
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	2
	2
	2
	3
	1
	1
	2
	1
	1
	1
	1
	1
	1
	1
	4
	1
	1
	2
	1
	1
	2
	4
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	76
	22
	50
	3
	1
	1
	1
	13
	9
	3
	1
	2
	3
	1
	1
	2
	1
	2
	2
	1
	0
	1
	0
	0
	0
	0
	1
	1
	0
	1
	0
	0
	1
	0
	2
	0
	1
	1
	1
	0
	0
	1
	1
	0
	0
	1
	0
	1
	1
	0
	10
	0
	1
	1
	1
	1
	1
	1
	3
	2
	14
	2
	1
	0
	1
	1
	2
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	1
	2
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	2
	1
	1
	1
	7
	1
	1
	0
	1
	1
	1
	0
	0
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	3
	2
	4
	2
	1
	2
	2
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	1
	4
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	4
	0
	1
	1
	1
	1
	0
	1
	5
	1
	1
	1
	1
	1
	1
	3
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	2
	2
	2
	2
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	4
	1
	2
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	2
	2
	2
	2
	2
	1
	1
	1
	2
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	1
	1
	1
	0
	1
	1
	1
	2
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	3
	2
	1
	1
	1
	2
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	0
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	0
	0
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	5
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	2
	0
	2
	2
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	2
	2
	2
	2
	2
	1
	1
	1
	1
	1
	1
	1
	1
	0
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	0
	2
	2
	2
	2
	2
	1
	1
	1
	1
	1
	1
	1
	1
	1
	0
	0
	0
	0
	0
	1
	0
	1
	0
	0
	0
	1
	0
	0
	0
	0
	0
	0
	0
	0
	0
	1
	0
	0
	0
	0
	0
	0
	0
	0
	2
	0
	1
	0
	0
	1
	1
	1
	1








    




'average degree = 2.05826906598115'



In [28]:

    
# average author degree (person types 2 and 4)
gmHumanAverageAuthorDegree2And4 <- calcAuthorMeanDegree( dataFrameIN = gmHumanDataDF, includeBothIN = TRUE )
paste( "average author degree (2 and 4) = ", gmHumanAverageAuthorDegree2And4, sep = "" )

# average author degree (person type 2 only)
gmHumanAverageAuthorDegreeOnly2 <- calcAuthorMeanDegree( dataFrameIN = gmHumanDataDF, includeBothIN = FALSE )
paste( "average author degree (only 2) = ", gmHumanAverageAuthorDegreeOnly2, sep = "" )

# average source degree (person types 3 and 4)
gmHumanAverageSourceDegree3And4 <- calcSourceMeanDegree( dataFrameIN = gmHumanDataDF, includeBothIN = TRUE )
paste( "average source degree (3 and 4) = ", gmHumanAverageSourceDegree3And4, sep = "" )

# average source degree (person type 3 only)
gmHumanAverageSourceDegreeOnly3 <- calcSourceMeanDegree( dataFrameIN = gmHumanDataDF, includeBothIN = FALSE )
paste( "average source degree (only 3) = ", gmHumanAverageSourceDegreeOnly3, sep = "" )









    




'average author degree (2 and 4) = 25.3958333333333'






    




'average author degree (only 2) = 25.3958333333333'






    




'average source degree (3 and 4) = 1.1564027370479'






    




'average source degree (only 3) = 1.1564027370479'

`grp_month` (gm) - human - More metrics

Back to Table of Contents

Now that the data is in a statnet network object, run the code from the following script for more in-depth metrics:

context_text/R/sna/statnet/sna-statnet-network-stats.r



In [29]:

    
# Links:
# - manual (PDF): http://cran.r-project.org/web/packages/sna/sna.pdf
# - good notes: http://www.shizukalab.com/toolkits/sna/node-level-calculations

# Also, be advised that statnet and igraph don't really play nice together.
#    If you'll be using both, best idea is to have a workspace for each.

#==============================================================================#
# statnet
#==============================================================================#

# make sure you've loaded the statnet library (includes sna)
# install.packages( "statnet" )
#library( statnet )

#==============================================================================#
# NODE level
#==============================================================================#

# what is the standard deviation of the degrees?
gmHumanDegreeSd <- sd( gmHumanDegreeVector )
paste( "degree SD = ", gmHumanDegreeSd, sep = "" )

# what is the variance of the degrees?
gmHumanDegreeVar <- var( gmHumanDegreeVector )
paste( "degree variance = ", gmHumanDegreeVar, sep = "" )

# what is the max value among the degrees?
gmHumanDegreeMax <- max( gmHumanDegreeVector )
paste( "degree max = ", gmHumanDegreeMax, sep = "" )

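# calculate the degree frequency distribution (no plot is drawn here)
gmHumanDegreeFrequenciesTable <- table( gmHumanDegreeVector )
# note: paste() is vectorized, so this emits one string per table cell, without
#    the degree each count belongs to; the table printed after it is easier to
#    read.
paste( "degree frequencies = ", gmHumanDegreeFrequenciesTable, sep = "" )
gmHumanDegreeFrequenciesTable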

# node-level undirected betweenness
gmHumanBetweenness <- sna::betweenness( gmHumanNetworkStatnet, gmode = "graph", cmode = "undirected" )

#paste( "betweenness = ", gmHumanBetweenness, sep = "" )
# associate with each node as a node attribute.
#    (%v% with <- is a shortcut for set.vertex.attribute; without the
#    assignment it is a shortcut for get.vertex.attribute)
gmHumanNetworkStatnet %v% "betweenness" <- gmHumanBetweenness

# also add betweenness vector to original data frame
gmHumanDataDF$betweenness <- gmHumanBetweenness

#==============================================================================#
# NETWORK level
#==============================================================================#

# graph-level degree centralization (labeled "degree centrality" in the output)
gmHumanDegreeCentrality <- sna::centralization( gmHumanNetworkStatnet, sna::degree, mode = "graph" )
paste( "degree centrality = ", gmHumanDegreeCentrality, sep = "" )

# graph-level betweenness centralization (labeled "betweenness centrality" in
#    the output)
gmHumanBetweennessCentrality <- sna::centralization( gmHumanNetworkStatnet, sna::betweenness, mode = "graph", cmode = "undirected" )
paste( "betweenness centrality = ", gmHumanBetweennessCentrality, sep = "" )

# graph-level connectedness
gmHumanConnectedness <- sna::connectedness( gmHumanNetworkStatnet )
paste( "connectedness = ", gmHumanConnectedness, sep = "" )

# graph-level transitivity
gmHumanTransitivity <- sna::gtrans( gmHumanNetworkStatnet, mode = "graph" )
paste( "transitivity = ", gmHumanTransitivity, sep = "" )

# graph-level density
gmHumanDensity <- sna::gden( gmHumanNetworkStatnet, mode = "graph" )
paste( "density = ", gmHumanDensity, sep = "" )









    




'degree SD = 6.65377784484138'






    




'degree variance = 44.272759608502'






    




'degree max = 99'






    





	'degree frequencies = 97'
	'degree frequencies = 911'
	'degree frequencies = 91'
	'degree frequencies = 14'
	'degree frequencies = 15'
	'degree frequencies = 2'
	'degree frequencies = 1'
	'degree frequencies = 4'
	'degree frequencies = 2'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 3'
	'degree frequencies = 1'
	'degree frequencies = 2'
	'degree frequencies = 2'
	'degree frequencies = 2'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 2'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 2'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 1'
	'degree frequencies = 1'








    





gmHumanDegreeVector
  0   1   2   3   4   5   6   7   9  10  13  14  19  22  28  31  32  33  34  36 
 97 911  91  14  15   2   1   4   2   1   1   1   3   1   2   2   2   1   1   1 
 37  38  41  44  45  46  47  50  61  66  76  99 
  1   1   1   2   1   1   1   2   1   1   1   1 
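
If a labeled listing is preferred over the pasted strings, a frequency table can be turned into a two-column data frame. A small sketch on a made-up degree vector (`toyDegreeVector` and the column names are illustrative assumptions):

```r
# made-up degree values for illustration
toyDegreeVector <- c( 0, 1, 1, 1, 2, 2, 5 )
toyFrequencyTable <- table( toyDegreeVector )

# one row per distinct degree, with the number of nodes that have it
toyFrequencyDF <- data.frame(
    degree = as.integer( names( toyFrequencyTable ) ),
    count = as.vector( toyFrequencyTable )
)
toyFrequencyDF

# sanity checks: counts add up to the number of nodes, and degree * count
#    adds up to the sum of all degrees.
toyCountTotal <- sum( toyFrequencyDF$count )
toyDegreeTotal <- sum( toyFrequencyDF$degree * toyFrequencyDF$count )
```

Applied to the table above, the counts sum to 1167 (the vertex count) and degree times count sums to 2402, which is twice the 1201 edges.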






    




'degree centrality = 0.0832831513777339'






    




'betweenness centrality = 0.220493193695819'






    




'connectedness = 0.673448360502733'






    



Warning message in sna::gtrans(gmHumanNetworkStatnet, mode = "graph"):
“gtrans called with use.adjacency=TRUE, but your data looks too large for that to work well.  Overriding to edgelist method.”





    




'transitivity = 0.0131821874307658'






    




'density = 0.00176523933617594'
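
For an undirected network without loops, mean degree and density follow directly from the vertex and edge counts printed earlier: mean degree is 2E / N and density is 2E / ( N ( N - 1 ) ). A quick cross-check in base R (variable names are illustrative):

```r
# counts from the network summary printed above
vertexCount <- 1167
edgeCount <- 1201

# each undirected edge adds one to the degree of two nodes
meanDegreeCheck <- 2 * edgeCount / vertexCount

# edges present divided by the N * ( N - 1 ) / 2 edges possible
densityCheck <- 2 * edgeCount / ( vertexCount * ( vertexCount - 1 ) )

paste( "mean degree check = ", meanDegreeCheck, sep = "" )
paste( "density check = ", densityCheck, sep = "" )
```

Both values agree with the reported average degree (about 2.0583) and density (about 0.0017652).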

`grp_month` (gm) - human - create node attribute DataFrame

Back to Table of Contents

If you want to just work with the traits of the nodes/vertexes, you can combine the attribute vectors into a data frame.



In [30]:

    
#==============================================================================#
# output attributes to data frame
#==============================================================================#

# if you want to just work with the traits of the nodes/vertexes, you can
#    combine the attribute vectors into a data frame.

# first, output network object to see what attributes you have
gmHumanNetworkStatnet

# then, combine them into a data frame.
gmHumanNodeAttrDF <- data.frame( id = gmHumanNetworkStatnet %v% "vertex.names",
                                 person_id = gmHumanNetworkStatnet %v% "person_id",
                                 person_type = gmHumanNetworkStatnet %v% "person_type",
                                 degree = gmHumanNetworkStatnet %v% "degree",
                                 betweenness = gmHumanNetworkStatnet %v% "betweenness" )









    





 Network attributes:
  vertices = 1167 
  directed = FALSE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 1201 
    missing edges= 0 
    non-missing edges= 1201 

 Vertex attribute names: 
    betweenness degree person_id person_type vertex.names 

 Edge attribute names not shown

`grp_month` QAP graph correlation between automated and ground truth

Back to Table of Contents

Now compare the automated and human-coded networks directly, using QAP tests of graph correlation, covariance, and Hamming distance.

Based on: context_text/R/sna/statnet/sna-qap.r



In [8]:

    
outputPrefix <- "grp_month a2b"



In [9]:

    
grpMontha2bOutput <- compareMatricesQAP( gmHumanNetworkMatrix, gmAutomatedNetworkMatrix, outputPrefix )









    



==> Start of  compareMatricesQAP  at  2019-07-25 22:22:48
----> grp_month a2b graph correlation = 0.914011398376571 ( @ 2019-07-25 22:22:49 )
----> grp_month a2b QAP correlation analysis complete at 2019-07-25 22:26:16.  Summary:






    



QAP Test Results

Estimated p-values:
	p(f(perm) >= f(d)): 0 
	p(f(perm) <= f(d)): 1 

Test Diagnostics:
	Test Value (f(d)): 0.9140114 
	Replications: 1000 
	Distribution Summary:
		Min:	 -0.00153729 
		1stQ:	 -0.0009019339 
		Med:	 -0.0002665774 
		Mean:	 -5.881582e-05 
		3rdQ:	 0.0003687791 
		Max:	 0.00926377 







    



----> grp_month a2b graph covariance = 0.0021144384905141 ( @ 2019-07-25 22:26:16 )
----> grp_month a2b QAP covariance analysis complete at 2019-07-25 22:29:38.  Summary:






    



QAP Test Results

Estimated p-values:
	p(f(perm) >= f(d)): 0 
	p(f(perm) <= f(d)): 1 

Test Diagnostics:
	Test Value (f(d)): 0.002114438 
	Replications: 1000 
	Distribution Summary:
		Min:	 -3.556308e-06 
		1stQ:	 -2.086499e-06 
		Med:	 -6.166898e-07 
		Mean:	 1.182147e-07 
		3rdQ:	 2.322928e-06 
		Max:	 2.290025e-05 







    



----> grp_month a2b graph hamming distance = 514 ( @ 2019-07-25 22:29:38 )
----> grp_month a2b QAP hamming distance analysis complete at 2019-07-25 22:32:35.  Summary:






    



QAP Test Results

Estimated p-values:
	p(f(perm) >= f(d)): 1 
	p(f(perm) <= f(d)): 0 

Test Diagnostics:
	Test Value (f(d)): 514 
	Replications: 1000 
	Distribution Summary:
		Min:	 5086 
		1stQ:	 5122 
		Med:	 5126 
		Mean:	 5125.728 
		3rdQ:	 5130 
		Max:	 5134 







    



==> End of  compareMatricesQAP  at  2019-07-25 22:32:35
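
To make the "p(f(perm) >= f(d))" lines above more concrete, here is a simplified base R sketch of the idea behind a QAP correlation test: relabel the nodes of one matrix at random (rows and columns together), recompute the correlation each time, and see how often the permuted value reaches the observed one. It is an illustration on made-up 8-node networks, not the code behind `compareMatricesQAP` or `sna::qaptest`, and every name in it is hypothetical.

```r
# Simplified QAP sketch in base R; illustration only.
set.seed( 42 )

# correlation between the off-diagonal cells of two square matrices
offDiagonalCor <- function( matrix1IN, matrix2IN ) {
    isOffDiagonal <- row( matrix1IN ) != col( matrix1IN )
    cor( matrix1IN[ isOffDiagonal ], matrix2IN[ isOffDiagonal ] )
}

# build a symmetric 0/1 matrix from a two-column edge list
makeToyMatrix <- function( edgesIN, nodeCountIN ) {
    toyMatrix <- matrix( 0, nrow = nodeCountIN, ncol = nodeCountIN )
    toyMatrix[ edgesIN ] <- 1
    toyMatrix + t( toyMatrix )
}

toyNodeCount <- 8
toyEdgesA <- matrix( c( 1,2, 1,3, 2,3, 3,4, 4,5, 5,6, 6,7, 7,8, 1,8, 2,6 ),
                     ncol = 2, byrow = TRUE )
toyEdgesB <- toyEdgesA
toyEdgesB[ 10, ] <- c( 4, 7 )   # second network differs by one tie
toyA <- makeToyMatrix( toyEdgesA, toyNodeCount )
toyB <- makeToyMatrix( toyEdgesB, toyNodeCount )

# observed statistic, f(d)
observedCor <- offDiagonalCor( toyA, toyB )

# permutation distribution, f(perm): shuffle node labels of one matrix
permutedCors <- replicate( 1000, {
    newOrder <- sample( toyNodeCount )
    offDiagonalCor( toyA[ newOrder, newOrder ], toyB )
} )

# share of permutations at least as correlated as the observed pairing
pGreaterEqual <- mean( permutedCors >= observedCor )
```

Permuting rows and columns together keeps each network's structure intact while breaking the link between who is who in the two matrices, which is why a small `pGreaterEqual` suggests the observed agreement is unlikely to come from chance labeling.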



In [ ]:

    
# also output plots of distributions of QAP values?
displayCompareMatricesQAPOutput( grpMontha2bOutput, outputPrefix, TRUE )

Save workspace image

Back to Table of Contents

Save all the information in the current image, in case we need/want it later.



In [10]:

    
# help( save.image )
message( paste( "Output workspace to: ", workspace_file_name, sep = "" ) )
save.image( file = workspace_file_name )









    



Output workspace to: statnet-grp_month.RData
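
To pick the work up later, `load()` restores the saved objects. The sketch below uses `save()` on a single made-up object and a temporary file rather than the real workspace, and loads into a separate environment so nothing in the current session is overwritten (all names are illustrative):

```r
# save one made-up object to a temporary .RData file
toyWorkspacePath <- tempfile( fileext = ".RData" )
toyResult <- c( density = 0.5, meanDegree = 2 )
save( toyResult, file = toyWorkspacePath )

# load into a fresh environment; load() returns the names it restored
restoredEnvironment <- new.env()
restoredNames <- load( toyWorkspacePath, envir = restoredEnvironment )
restoredNames

# the restored copy matches the original
identical( restoredEnvironment$toyResult, toyResult )
```

For the real image, calling `load( workspace_file_name )` from the data directory restores everything into the global environment.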

TODO

Back to Table of Contents

DONE:

Human data for grp_month had one fewer vertex (1167) than automated (1168). The missing person was row 355, user ID 781 (source_3), who was in automated but not in human. QAP needs same-size matrices.
- 781 - Cook, Matthew ( Wayland Fire Department )
- First, try to regenerate the data.
- Then, if it doesn't get better, look into the article(s) where 781 - Cook, Matthew ( Wayland Fire Department ) is mentioned.
- Resolution: not sure what the problem was, but it is fixed. It might have been the first-name lookup bug: if only a first name was given, and one and only one person in the database had that first name, the lookup used to match them, even though there was no way to know whether the last name matched.

Table of Contents

R network analysis files

Setup

Setup - working directories

Setup - import SNA functions

Setup - import statnet functions

Setup - network data - render and store network data

data - grp_month

Setup - load workspace

grp_month analysis

grp_month (gm) - automated - OpenCalais

grp_month (gm) - automated - Read data

grp_month (gm) - automated - initialize statnet

grp_month (gm) - automated - Basic metrics

grp_month (gm) - automated - More metrics

grp_month (gm) - automated - create node attribute DataFrame

grp_month (gm) - human

grp_month (gm) - human - Read data

grp_month (gm) - human - initialize statnet

grp_month (gm) - human - Basic metrics

grp_month (gm) - human - More metrics

grp_month (gm) - human - create node attribute DataFrame

grp_month QAP graph correlation between automated and ground truth