2017.12.02 - work log - prelim - R - statnet - grp_month

R network analysis files

Related files:

  • network descriptives

    • network-level

      • files

        • R scripts:

          • context_text/R/db_connect.r
          • context_text/R/sna/functions-sna.r
          • context_text/R/sna/sna-load_data.r
          • context_text/R/sna/igraph/*
          • context_text/R/sna/statnet/*
      • statnet/sna

        • sna::gden() - graph density
        • R scripts:

          • context_text/R/sna/statnet/sna-statnet-init.r
          • context_text/R/sna/statnet/sna-statnet-network-stats.r
          • context_text/R/sna/statnet/sna-qap.r
      • igraph

        • igraph::transitivity() - vector of transitivity scores for each node in a graph, plus network-level transitivity score.

          • Q - interpretation?
        • R scripts:

          • context_text/R/sna/statnet/sna-igraph-init.r
          • context_text/R/sna/statnet/sna-igraph-network-stats.r

Setup

Setup - working directories

Store important directories and file names in variables:


In [1]:
getwd()


'/home/jonathanmorgan/work/django/research/work/phd_work'

In [2]:
# code files (in particular SNA function library, modest though it may be)
code_directory <- "/home/jonathanmorgan/work/django/research/context_analysis/R/sna"
sna_function_file_path <- paste( code_directory, "/", 'functions-sna.r', sep = "" )

# home directory
home_directory <- getwd()
home_directory <- "/home/jonathanmorgan/work/django/research/work/phd_work/methods"

# data directories
data_directory <- paste( home_directory, "/data", sep = "" )
workspace_file_name <- "statnet-grp_month.RData"
workspace_file_path <- paste( data_directory, "/", workspace_file_name, sep = "" )
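
Because paste() separates its arguments with a single space unless sep = "" is passed, building paths with it is easy to get subtly wrong. A minimal sketch of the difference, alongside base R's file.path() as an alternative; the directory below is a made-up placeholder, not one of the real project paths:

```r
# Hypothetical directory, used only for illustration.
example_base_directory <- "/tmp/example_project"
example_file_name <- "statnet-grp_month.RData"

# file.path() joins its pieces with "/" and needs no sep argument.
example_data_directory <- file.path( example_base_directory, "data" )
example_workspace_path <- file.path( example_data_directory, example_file_name )

# paste() without sep = "" inserts spaces around the "/".
path_with_spaces <- paste( example_data_directory, "/", example_file_name )

# paste() with sep = "" matches the file.path() result.
path_without_spaces <- paste( example_data_directory, "/", example_file_name, sep = "" )

print( example_workspace_path )
print( path_with_spaces )
```

Either form works as long as every paste() call that builds a path passes sep = "".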

In [3]:
# set working directory to data directory for now.
setwd( data_directory )
getwd()


'/home/jonathanmorgan/work/django/research/work/phd_work/data'

Setup - import SNA functions

Source the file functions-sna.r, which holds the SNA helper functions.


In [4]:
source( sna_function_file_path )

Setup - network data - render and store network data

First, you need to render the network data and upload it to your server.

Directions for rendering network data are in 2017.11.14-work_log-prelim-network_analysis.ipynb. You want a tab-delimited matrix that includes both the network and attributes of nodes as columns, and you want it to include a header row.

Once you render your network data files, you should place them on the server.

High level data file layout:

  • tab-delimited.
  • first row and first column are labels
  • last 2 columns are traits of nodes (person_id and person_type)
  • each row and column between the label row/column and the trait columns represents a person found in one of the articles.
  • The people are in the same order from top to bottom and left to right.
  • Where the row and column of two people meet, and one of the people is an author, the number in the cell is the number of times the non-author was quoted in an article by the author. This does not include more basic two-mode co-location ties (appeared in the same article, even if not an author and/or not quoted).
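
As a rough illustration of this layout, the sketch below reads a three-person toy file with the same read.delim() arguments used later in this log. The labels, tie counts, and trait values are all invented for the example; the real files may format labels and person_type differently.

```r
# Toy file in the layout described above (every name and value is made up).
# First row and first column are labels; last two columns are node traits.
toy_lines <- c(
    "id\tauthor_a\tsource_b\tsource_c\tperson_id\tperson_type",
    "author_a\t0\t2\t1\t11\tauthor",
    "source_b\t2\t0\t0\t12\tsource",
    "source_c\t1\t0\t0\t13\tsource"
)

# read.delim() accepts text = ... in place of a file path.
toy_df <- read.delim( text = toy_lines, header = TRUE, row.names = 1, check.names = FALSE )

# 3 people, so 3 rows and 3 tie columns plus 2 trait columns.
print( dim( toy_df ) )

# author_a quoted source_b in 2 toy articles.
print( toy_df[ "author_a", "source_b" ] )
```

Passing row.names = 1 turns the first column into row labels, so the data frame ends up with one column per person plus the two trait columns.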

Files and their location on server:

data - grp_month

This is data from the Grand Rapids Press articles from December of 2009, coded by both humans and OpenCalais.

Files:

  • automated full month - sourcenet_data-20171205-022551-grp_month-automated.tab
  • automated week subset - sourcenet_data-20171206-031358-grp_month-automated-week_subset.tab
  • human full month - sourcenet_data-20171115-043102-grp_month-human.tab
  • human week subset - sourcenet_data-20171206-031319-grp_month-human-week_subset.tab

Location in Dropbox: Dropbox/academia/MSU/program_stuff/prelim_paper/data/network_analysis/2017.11.14/network/new_coders/grp_month

Location on server: /home/jonathanmorgan/work/django/research/work/phd_work/data/network/grp_month

Setup - load workspace (optional)

If you want, you can load this file's workspace, from a previous run:


In [5]:
# assumes that data_directory and workspace_file_name have already been
#     set in the setup cells above.
setwd( data_directory )
load( workspace_file_name )
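
For reference, a small sketch of the save-then-load round trip using base R. It writes to a temporary file with a made-up name so it cannot clobber the real workspace; save.image( file = ... ) is the whole-workspace counterpart of the save() call shown here.

```r
# Hypothetical file name in a temp directory, for illustration only.
example_workspace_path <- file.path( tempdir(), "example-workspace.RData" )

# store one object, then remove it from the session.
example_value <- c( 1, 2, 3 )
save( example_value, file = example_workspace_path )
rm( example_value )

# only load when the file is actually there.
if ( file.exists( example_workspace_path ) ) {
    load( example_workspace_path )
}

print( example_value )
```

Guarding load() with file.exists() keeps the optional step from stopping a fresh run that has no saved workspace yet.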

grp_month analysis

First, look at the shiny new month of data.

grp_month (gm) - automated - OpenCalais

First, we'll analyze the month of data coded by OpenCalais. Set up some variables to store where data is located:

grp_month (gm) - automated - Read data

Read in the data from the tab-delimited data file, then convert it into the data structures needed for SNA in R.


In [8]:
# initialize variables
gmAutomatedDataFolder <- paste( data_directory, "/network/grp_month", sep = "" )
gmAutomatedDataFile <- "sourcenet_data-20171205-022551-grp_month-automated.tab"
gmAutomatedDataPath <- paste( gmAutomatedDataFolder, "/", gmAutomatedDataFile, sep = "" )

In [9]:
gmAutomatedDataPath


'/home/jonathanmorgan/work/django/research/work/phd_work/data/network/grp_month/sourcenet_data-20171205-022551-grp_month-automated.tab'

Load the data file into memory


In [10]:
# tab-delimited:
gmAutomatedDataDF <- read.delim( gmAutomatedDataPath, header = TRUE, row.names = 1, check.names = FALSE )

In [11]:
# get count of rows...
gmAutomatedRowCount <- nrow( gmAutomatedDataDF )
paste( "grp_month automated row count = ", gmAutomatedRowCount, sep = "" )

# ...and columns
gmAutomatedColumnCount <- ncol( gmAutomatedDataDF )
paste( "grp_month automated column count = ", gmAutomatedColumnCount, sep = "" )


'grp_month automated row count = 1167'
'grp_month automated column count = 1169'

Get just the tie rows and columns for initializing network libraries.


In [12]:
# the below syntax returns only as many columns as there are rows, so
#     omitting any trait columns that lie in columns on the right side
#     of the file.
gmAutomatedNetworkDF <- gmAutomatedDataDF[ , 1 : gmAutomatedRowCount ]
#str( gmAutomatedNetworkDF )

In [13]:
# convert to a matrix
gmAutomatedNetworkMatrix <- as.matrix( gmAutomatedNetworkDF )
# str( gmAutomatedNetworkMatrix )
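
The same split can be sketched on a toy data frame, locating the trait columns relative to the row count rather than by fixed column numbers. All names and values below are invented for illustration.

```r
# Toy data: 3 people, 3 tie columns, then 2 trait columns (values made up).
toy_df <- data.frame(
    a = c( 0, 2, 1 ),
    b = c( 2, 0, 0 ),
    c = c( 1, 0, 0 ),
    person_id = c( 11, 12, 13 ),
    person_type = c( "author", "source", "source" ),
    row.names = c( "a", "b", "c" )
)

toy_row_count <- nrow( toy_df )

# tie columns: as many columns as there are rows.
toy_tie_df <- toy_df[ , 1 : toy_row_count ]

# trait columns: everything to the right of the tie columns.
toy_trait_df <- toy_df[ , ( toy_row_count + 1 ) : ncol( toy_df ), drop = FALSE ]

# convert ties to a matrix and confirm it is square with matching labels.
toy_tie_matrix <- as.matrix( toy_tie_df )
stopifnot( nrow( toy_tie_matrix ) == ncol( toy_tie_matrix ) )
stopifnot( identical( rownames( toy_tie_matrix ), colnames( toy_tie_matrix ) ) )

print( colnames( toy_trait_df ) )
```

Deriving both ranges from nrow() and ncol() means the code keeps working if the number of people or trait columns changes between data files.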

grp_month (gm) - automated - initialize statnet

First, load the statnet package, then load the automated grp_month data into a statnet network object and assign attributes to nodes.

Based on context_text/R/sna/statnet/sna-statnet-init.r.


In [14]:
# make sure you've loaded the statnet library
# install.packages( "statnet" )
library( statnet )


Loading required package: tergm
Loading required package: statnet.common

Attaching package: ‘statnet.common’

The following object is masked from ‘package:base’:

    order

Loading required package: ergm
Loading required package: network
network: Classes for Relational Data
Version 1.13.0 created on 2015-08-31.
copyright (c) 2005, Carter T. Butts, University of California-Irvine
                    Mark S. Handcock, University of California -- Los Angeles
                    David R. Hunter, Penn State University
                    Martina Morris, University of Washington
                    Skye Bender-deMoll, University of Washington
 For citation information, type citation("network").
 Type help("network-package") to get started.


ergm: version 3.8.0, created on 2017-08-18
Copyright (c) 2017, Mark S. Handcock, University of California -- Los Angeles
                    David R. Hunter, Penn State University
                    Carter T. Butts, University of California -- Irvine
                    Steven M. Goodreau, University of Washington
                    Pavel N. Krivitsky, University of Wollongong
                    Martina Morris, University of Washington
                    with contributions from
                    Li Wang
                    Kirk Li, University of Washington
                    Skye Bender-deMoll, University of Washington
Based on "statnet" project software (statnet.org).
For license and citation information see statnet.org/attribution
or type citation("ergm").

NOTE: Versions before 3.6.1 had a bug in the implementation of the bd()
constriant which distorted the sampled distribution somewhat. In
addition, Sampson's Monks datasets had mislabeled vertices. See the
NEWS and the documentation for more details.

Loading required package: networkDynamic

networkDynamic: version 0.9.0, created on 2016-01-12
Copyright (c) 2016, Carter T. Butts, University of California -- Irvine
                    Ayn Leslie-Cook, University of Washington
                    Pavel N. Krivitsky, University of Wollongong
                    Skye Bender-deMoll, University of Washington
                    with contributions from
                    Zack Almquist, University of California -- Irvine
                    David R. Hunter, Penn State University
                    Li Wang
                    Kirk Li, University of Washington
                    Steven M. Goodreau, University of Washington
                    Jeffrey Horner
                    Martina Morris, University of Washington
Based on "statnet" project software (statnet.org).
For license and citation information see statnet.org/attribution
or type citation("networkDynamic").


tergm: version 3.4.1, created on 2017-09-12
Copyright (c) 2017, Pavel N. Krivitsky, University of Wollongong
                    Mark S. Handcock, University of California -- Los Angeles
                    with contributions from
                    David R. Hunter, Penn State University
                    Steven M. Goodreau, University of Washington
                    Martina Morris, University of Washington
                    Nicole Bohme Carnegie, New York University
                    Carter T. Butts, University of California -- Irvine
                    Ayn Leslie-Cook, University of Washington
                    Skye Bender-deMoll
                    Li Wang
                    Kirk Li, University of Washington
Based on "statnet" project software (statnet.org).
For license and citation information see statnet.org/attribution
or type citation("tergm").

Loading required package: ergm.count

ergm.count: version 3.2.2, created on 2016-03-29
Copyright (c) 2016, Pavel N. Krivitsky, University of Wollongong
                    with contributions from
                    Mark S. Handcock, University of California -- Los Angeles
                    David R. Hunter, Penn State University
Based on "statnet" project software (statnet.org).
For license and citation information see statnet.org/attribution
or type citation("ergm.count").

NOTE: The form of the term ‘CMP’ has been changed in version 3.2 of
‘ergm.count’. See the news or help('CMP') for more information.

Loading required package: sna
sna: Tools for Social Network Analysis
Version 2.4 created on 2016-07-23.
copyright (c) 2005, Carter T. Butts, University of California-Irvine
 For citation information, type citation("sna").
 Type help(package="sna") to get started.


statnet: version 2016.9, created on 2016-08-29
Copyright (c) 2016, Mark S. Handcock, University of California -- Los Angeles
                    David R. Hunter, Penn State University
                    Carter T. Butts, University of California -- Irvine
                    Steven M. Goodreau, University of Washington
                    Pavel N. Krivitsky, University of Wollongong
                    Skye Bender-deMoll
                    Martina Morris, University of Washington
Based on "statnet" project software (statnet.org).
For license and citation information see statnet.org/attribution
or type citation("statnet").


In [15]:
# If you have a data frame of attributes (each attribute is a column, with
#     attribute name the column name), you can associate those attributes
#     when you create the network.
# attribute help: http://www.inside-r.org/packages/cran/network/docs/loading.attributes

# load attributes from a file:
#tab_attribute_test1 <- read.delim( "tab-test1-attribute_data.txt", header = TRUE, row.names = 1, check.names = FALSE )

# or create DataFrame by just grabbing the attribute columns
gmAutomatedNetworkAttributeDF <- gmAutomatedDataDF[ , 1168:1169 ]

# convert matrix to statnet network object instance.
gmAutomatedNetworkStatnet <- network( gmAutomatedNetworkMatrix, matrix.type = "adjacency", directed = FALSE, vertex.attr = gmAutomatedNetworkAttributeDF )

# look at information now.
gmAutomatedNetworkStatnet

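# Sample output from an earlier, smaller test network (kept for reference;
#     the actual output for this data is printed after the cell):
# Network attributes: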
#  vertices = 314
#  directed = FALSE
#  hyper = FALSE
#  loops = FALSE
#  multiple = FALSE
#  bipartite = FALSE
#  total edges= 309
#    missing edges= 0
#    non-missing edges= 309
#
# Vertex attribute names:
#    person_type vertex.names
#
# No edge attributes


 Network attributes:
  vertices = 1167 
  directed = FALSE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 1152 
    missing edges= 0 
    non-missing edges= 1152 

 Vertex attribute names: 
    person_id person_type vertex.names 

 Edge attribute names not shown 

In [6]:
# calais - include ties Greater than or equal to 0 (GE0)
gmAutomatedMeanTieWeightGE0Vector <- apply( gmAutomatedNetworkMatrix, 1, calculateListMean )
gmAutomatedDataDF$meanTieWeightGE0 <- gmAutomatedMeanTieWeightGE0Vector

# calais - include ties Greater than or equal to 1 (GE1)
gmAutomatedMeanTieWeightGE1Vector <- apply( gmAutomatedNetworkMatrix, 1, calculateListMean, minValueToIncludeIN = 1 )
gmAutomatedDataDF$meanTieWeightGE1 <- gmAutomatedMeanTieWeightGE1Vector

# automated - Max tie weight?
gmAutomatedMaxTieWeightVector <- apply( gmAutomatedNetworkMatrix, 1, calculateListMax )
gmAutomatedDataDF$maxTieWeight <- gmAutomatedMaxTieWeightVector
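
The helpers calculateListMean and calculateListMax are defined in functions-sna.r and are not shown in this log. As a rough idea of what the per-row calculation does, here is a hypothetical stand-in written from the comments above; it assumes the threshold defaults to 0 and that values below the threshold are dropped before averaging, which may not match the real implementation.

```r
# Hypothetical stand-in, NOT the real calculateListMean from functions-sna.r.
exampleMeanAtOrAbove <- function( valuesIN, minValueToIncludeIN = 0 ) {

    # keep only values at or above the threshold.
    includedValues <- valuesIN[ valuesIN >= minValueToIncludeIN ]

    # nothing left to average.
    if ( length( includedValues ) == 0 ) {
        return( NA_real_ )
    }

    return( mean( includedValues ) )
}

# toy weighted tie matrix (values made up).
toy_matrix <- matrix( c( 0, 2, 1,
                         2, 0, 0,
                         1, 0, 0 ), nrow = 3, byrow = TRUE )

# one value per row, as in the apply() calls above.
toy_mean_ge0 <- apply( toy_matrix, 1, exampleMeanAtOrAbove )
toy_mean_ge1 <- apply( toy_matrix, 1, exampleMeanAtOrAbove, minValueToIncludeIN = 1 )
toy_max <- apply( toy_matrix, 1, max )

print( toy_mean_ge0 )
print( toy_mean_ge1 )
print( toy_max )
```

With a threshold of 1, zero cells (no tie) are excluded, so the mean reflects only the ties a person actually has.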

grp_month (gm) - automated - Basic metrics


In [16]:
# the statnet network object is in gmAutomatedNetworkStatnet.

# Use the degree function in the sna package to create vector of degree values
#    for each node.  Make sure to pass the gmode parameter to tell it that the
#    graph is not directed (gmode = "graph", instead of "digraph").
# Doc: http://www.inside-r.org/packages/cran/sna/docs/degree
#degree_vector <- degree( test1_statnet, gmode = "graph" )

# If you have other libraries loaded that also implement a degree function, you
#    can also call this with package name:
gmAutomatedDegreeVector <- sna::degree( gmAutomatedNetworkStatnet, gmode = "graph" )

# output the vector
gmAutomatedDegreeVector

# want more info on the degree function?  You can get to it eventually through
#    the following:
#help( package = "sna" )
#??sna::degree

# what is the average (mean) degree?
gmAutomatedAvgDegree <- mean( gmAutomatedDegreeVector )
paste( "average degree = ", gmAutomatedAvgDegree, sep = "" )

# subset vector to get only those that are above mean
gmAutomatedAboveMeanVector <- gmAutomatedDegreeVector[ gmAutomatedDegreeVector > gmAutomatedAvgDegree ]

# Take the degree and associate it with each node as a node attribute.
#    (%v%<- is a shortcut for the set.vertex.attribute command)
gmAutomatedNetworkStatnet %v% "degree" <- gmAutomatedDegreeVector

# also add degree vector to original data frame
gmAutomatedDataDF$degree <- gmAutomatedDegreeVector


  1. 30
  2. 1
  3. 39
  4. 1
  5. 34
  6. 1
  7. 1
  8. 50
  9. 26
  10. 1
  11. 29
  12. 1
  13. 47
  14. 1
  15. 1
  16. 93
  17. 46
  18. 29
  19. 47
  20. 71
  21. 2
  22. 9
  23. 1
  24. 1
  25. 2
  26. 1
  27. 35
  28. 0
  29. 2
  30. 2
  31. 2
  32. 2
  33. 1
  34. 1
  35. 1
  36. 1
  37. 1
  38. 42
  39. 1
  40. 0
  41. 1
  42. 2
  43. 2
  44. 1
  45. 5
  46. 1
  47. 0
  48. 1
  49. 43
  50. 5
  51. 1
  52. 14
  53. 1
  54. 1
  55. 2
  56. 2
  57. 1
  58. 2
  59. 1
  60. 1
  61. 1
  62. 1
  63. 1
  64. 1
  65. 0
  66. 1
  67. 0
  68. 2
  69. 4
  70. 1
  71. 1
  72. 32
  73. 1
  74. 1
  75. 3
  76. 1
  77. 1
  78. 1
  79. 1
  80. 1
  81. 1
  82. 1
  83. 1
  84. 1
  85. 1
  86. 1
  87. 1
  88. 1
  89. 1
  90. 1
  91. 1
  92. 1
  93. 7
  94. 1
  95. 1
  96. 1
  97. 9
  98. 2
  99. 1
  100. 1
  101. 1
  102. 1
  103. 1
  104. 2
  105. 5
  106. 0
  107. 0
  108. 0
  109. 0
  110. 1
  111. 1
  112. 1
  113. 0
  114. 1
  115. 1
  116. 2
  117. 0
  118. 2
  119. 2
  120. 1
  121. 1
  122. 1
  123. 1
  124. 1
  125. 2
  126. 1
  127. 1
  128. 1
  129. 1
  130. 1
  131. 0
  132. 1
  133. 1
  134. 35
  135. 1
  136. 1
  137. 1
  138. 1
  139. 1
  140. 1
  141. 0
  142. 1
  143. 1
  144. 19
  145. 1
  146. 1
  147. 1
  148. 1
  149. 1
  150. 1
  151. 27
  152. 1
  153. 1
  154. 1
  155. 1
  156. 1
  157. 1
  158. 1
  159. 1
  160. 1
  161. 2
  162. 1
  163. 1
  164. 1
  165. 1
  166. 1
  167. 1
  168. 1
  169. 1
  170. 1
  171. 1
  172. 1
  173. 4
  174. 1
  175. 1
  176. 32
  177. 49
  178. 1
  179. 2
  180. 0
  181. 1
  182. 1
  183. 44
  184. 1
  185. 1
  186. 1
  187. 1
  188. 1
  189. 1
  190. 1
  191. 2
  192. 1
  193. 1
  194. 2
  195. 1
  196. 1
  197. 4
  198. 0
  199. 1
  200. 1
  201. 1
  202. 1
  203. 1
  204. 1
  205. 1
  206. 1
  207. 1
  208. 1
  209. 1
  210. 2
  211. 2
  212. 2
  213. 2
  214. 0
  215. 2
  216. 2
  217. 1
  218. 1
  219. 1
  220. 1
  221. 1
  222. 4
  223. 1
  224. 1
  225. 1
  226. 0
  227. 1
  228. 1
  229. 1
  230. 1
  231. 2
  232. 1
  233. 15
  234. 1
  235. 0
  236. 1
  237. 1
  238. 2
  239. 2
  240. 2
  241. 2
  242. 2
  243. 2
  244. 4
  245. 1
  246. 0
  247. 1
  248. 1
  249. 1
  250. 1
  251. 1
  252. 1
  253. 1
  254. 1
  255. 1
  256. 1
  257. 1
  258. 1
  259. 1
  260. 1
  261. 2
  262. 1
  263. 1
  264. 1
  265. 1
  266. 1
  267. 1
  268. 1
  269. 1
  270. 1
  271. 1
  272. 1
  273. 1
  274. 1
  275. 0
  276. 0
  277. 2
  278. 2
  279. 2
  280. 2
  281. 1
  282. 1
  283. 2
  284. 1
  285. 1
  286. 1
  287. 1
  288. 1
  289. 1
  290. 1
  291. 2
  292. 1
  293. 0
  294. 2
  295. 0
  296. 1
  297. 2
  298. 3
  299. 1
  300. 1
  301. 1
  302. 1
  303. 0
  304. 1
  305. 1
  306. 1
  307. 0
  308. 1
  309. 72
  310. 22
  311. 46
  312. 3
  313. 1
  314. 1
  315. 1
  316. 10
  317. 8
  318. 2
  319. 1
  320. 1
  321. 2
  322. 1
  323. 1
  324. 2
  325. 1
  326. 2
  327. 2
  328. 1
  329. 1
  330. 1
  331. 1
  332. 0
  333. 0
  334. 0
  335. 1
  336. 2
  337. 1
  338. 1
  339. 1
  340. 1
  341. 1
  342. 1
  343. 2
  344. 1
  345. 0
  346. 1
  347. 1
  348. 1
  349. 1
  350. 1
  351. 1
  352. 0
  353. 1
  354. 1
  355. 0
  356. 1
  357. 1
  358. 0
  359. 0
  360. 1
  361. 1
  362. 0
  363. 0
  364. 1
  365. 1
  366. 1
  367. 3
  368. 2
  369. 15
  370. 1
  371. 1
  372. 1
  373. 1
  374. 0
  375. 2
  376. 1
  377. 0
  378. 1
  379. 1
  380. 1
  381. 1
  382. 1
  383. 1
  384. 1
  385. 1
  386. 2
  387. 1
  388. 2
  389. 1
  390. 1
  391. 1
  392. 0
  393. 1
  394. 0
  395. 0
  396. 1
  397. 1
  398. 1
  399. 1
  400. 1
  401. 1
  402. 2
  403. 1
  404. 0
  405. 1
  406. 7
  407. 1
  408. 1
  409. 1
  410. 1
  411. 1
  412. 1
  413. 1
  414. 0
  415. 1
  416. 0
  417. 1
  418. 0
  419. 1
  420. 1
  421. 1
  422. 1
  423. 1
  424. 1
  425. 1
  426. 1
  427. 1
  428. 1
  429. 1
  430. 1
  431. 1
  432. 1
  433. 1
  434. 1
  435. 1
  436. 0
  437. 1
  438. 1
  439. 1
  440. 1
  441. 1
  442. 3
  443. 2
  444. 4
  445. 2
  446. 1
  447. 2
  448. 2
  449. 1
  450. 1
  451. 1
  452. 1
  453. 1
  454. 0
  455. 1
  456. 1
  457. 0
  458. 1
  459. 1
  460. 1
  461. 1
  462. 1
  463. 1
  464. 1
  465. 1
  466. 1
  467. 1
  468. 1
  469. 0
  470. 1
  471. 0
  472. 1
  473. 1
  474. 1
  475. 1
  476. 2
  477. 1
  478. 4
  479. 1
  480. 1
  481. 1
  482. 1
  483. 1
  484. 1
  485. 1
  486. 1
  487. 1
  488. 1
  489. 1
  490. 1
  491. 1
  492. 0
  493. 1
  494. 1
  495. 1
  496. 1
  497. 1
  498. 1
  499. 1
  500. 1
  501. 1
  502. 1
  503. 1
  504. 1
  505. 1
  506. 1
  507. 1
  508. 1
  509. 1
  510. 1
  511. 1
  512. 1
  513. 1
  514. 1
  515. 1
  516. 1
  517. 1
  518. 1
  519. 1
  520. 1
  521. 1
  522. 1
  523. 1
  524. 1
  525. 1
  526. 1
  527. 1
  528. 1
  529. 1
  530. 1
  531. 1
  532. 1
  533. 1
  534. 1
  535. 1
  536. 1
  537. 1
  538. 1
  539. 1
  540. 1
  541. 1
  542. 1
  543. 1
  544. 1
  545. 1
  546. 1
  547. 1
  548. 1
  549. 1
  550. 1
  551. 1
  552. 1
  553. 1
  554. 1
  555. 1
  556. 1
  557. 1
  558. 1
  559. 1
  560. 1
  561. 1
  562. 1
  563. 2
  564. 0
  565. 1
  566. 1
  567. 1
  568. 1
  569. 1
  570. 1
  571. 1
  572. 1
  573. 1
  574. 1
  575. 1
  576. 1
  577. 1
  578. 0
  579. 1
  580. 1
  581. 1
  582. 0
  583. 1
  584. 1
  585. 1
  586. 1
  587. 1
  588. 1
  589. 1
  590. 1
  591. 1
  592. 1
  593. 0
  594. 2
  595. 1
  596. 0
  597. 1
  598. 1
  599. 1
  600. 0
  601. 1
  602. 1
  603. 1
  604. 1
  605. 1
  606. 1
  607. 1
  608. 1
  609. 1
  610. 1
  611. 4
  612. 1
  613. 1
  614. 1
  615. 1
  616. 1
  617. 0
  618. 1
  619. 5
  620. 1
  621. 1
  622. 1
  623. 1
  624. 1
  625. 1
  626. 1
  627. 1
  628. 1
  629. 1
  630. 1
  631. 1
  632. 1
  633. 1
  634. 1
  635. 1
  636. 1
  637. 1
  638. 1
  639. 1
  640. 1
  641. 1
  642. 1
  643. 1
  644. 1
  645. 1
  646. 0
  647. 2
  648. 2
  649. 0
  650. 2
  651. 2
  652. 0
  653. 1
  654. 1
  655. 1
  656. 0
  657. 0
  658. 1
  659. 1
  660. 1
  661. 1
  662. 1
  663. 1
  664. 1
  665. 1
  666. 4
  667. 1
  668. 1
  669. 1
  670. 1
  671. 1
  672. 1
  673. 1
  674. 1
  675. 1
  676. 1
  677. 1
  678. 1
  679. 0
  680. 1
  681. 1
  682. 1
  683. 1
  684. 1
  685. 1
  686. 1
  687. 1
  688. 1
  689. 1
  690. 1
  691. 1
  692. 1
  693. 0
  694. 1
  695. 1
  696. 1
  697. 0
  698. 1
  699. 1
  700. 1
  701. 1
  702. 1
  703. 1
  704. 1
  705. 1
  706. 1
  707. 0
  708. 1
  709. 1
  710. 1
  711. 1
  712. 1
  713. 1
  714. 1
  715. 1
  716. 1
  717. 1
  718. 1
  719. 1
  720. 1
  721. 1
  722. 1
  723. 0
  724. 0
  725. 0
  726. 1
  727. 1
  728. 1
  729. 1
  730. 1
  731. 1
  732. 1
  733. 0
  734. 0
  735. 1
  736. 0
  737. 0
  738. 1
  739. 0
  740. 0
  741. 1
  742. 1
  743. 1
  744. 1
  745. 1
  746. 1
  747. 1
  748. 1
  749. 1
  750. 1
  751. 1
  752. 1
  753. 1
  754. 1
  755. 0
  756. 1
  757. 1
  758. 2
  759. 1
  760. 0
  761. 1
  762. 0
  763. 1
  764. 1
  765. 1
  766. 0
  767. 1
  768. 1
  769. 1
  770. 1
  771. 1
  772. 0
  773. 2
  774. 2
  775. 2
  776. 2
  777. 2
  778. 1
  779. 1
  780. 1
  781. 2
  782. 1
  783. 1
  784. 1
  785. 1
  786. 1
  787. 1
  788. 1
  789. 1
  790. 1
  791. 1
  792. 1
  793. 1
  794. 0
  795. 2
  796. 1
  797. 1
  798. 1
  799. 1
  800. 1
  801. 1
  802. 1
  803. 2
  804. 1
  805. 1
  806. 1
  807. 0
  808. 1
  809. 1
  810. 1
  811. 1
  812. 1
  813. 0
  814. 1
  815. 1
  816. 1
  817. 1
  818. 0
  819. 1
  820. 1
  821. 1
  822. 1
  823. 0
  824. 1
  825. 0
  826. 0
  827. 1
  828. 1
  829. 0
  830. 1
  831. 3
  832. 2
  833. 1
  834. 1
  835. 1
  836. 2
  837. 1
  838. 0
  839. 1
  840. 1
  841. 1
  842. 1
  843. 1
  844. 1
  845. 1
  846. 1
  847. 1
  848. 1
  849. 1
  850. 1
  851. 1
  852. 1
  853. 1
  854. 1
  855. 1
  856. 1
  857. 0
  858. 1
  859. 1
  860. 0
  861. 1
  862. 1
  863. 1
  864. 1
  865. 1
  866. 1
  867. 1
  868. 1
  869. 1
  870. 1
  871. 1
  872. 1
  873. 1
  874. 1
  875. 1
  876. 1
  877. 1
  878. 1
  879. 1
  880. 1
  881. 1
  882. 1
  883. 1
  884. 1
  885. 1
  886. 1
  887. 1
  888. 1
  889. 1
  890. 1
  891. 1
  892. 1
  893. 1
  894. 0
  895. 1
  896. 1
  897. 2
  898. 1
  899. 1
  900. 1
  901. 1
  902. 1
  903. 1
  904. 1
  905. 1
  906. 1
  907. 1
  908. 1
  909. 1
  910. 1
  911. 0
  912. 1
  913. 1
  914. 1
  915. 1
  916. 1
  917. 1
  918. 1
  919. 1
  920. 0
  921. 1
  922. 1
  923. 1
  924. 1
  925. 1
  926. 0
  927. 0
  928. 1
  929. 1
  930. 1
  931. 1
  932. 1
  933. 1
  934. 1
  935. 0
  936. 1
  937. 1
  938. 1
  939. 1
  940. 1
  941. 1
  942. 1
  943. 1
  944. 0
  945. 0
  946. 1
  947. 1
  948. 1
  949. 1
  950. 1
  951. 1
  952. 1
  953. 1
  954. 1
  955. 1
  956. 1
  957. 1
  958. 1
  959. 1
  960. 1
  961. 0
  962. 0
  963. 1
  964. 1
  965. 1
  966. 1
  967. 1
  968. 1
  969. 1
  970. 1
  971. 1
  972. 1
  973. 1
  974. 1
  975. 1
  976. 1
  977. 1
  978. 1
  979. 1
  980. 0
  981. 0
  982. 0
  983. 1
  984. 0
  985. 1
  986. 1
  987. 1
  988. 1
  989. 1
  990. 1
  991. 1
  992. 1
  993. 1
  994. 1
  995. 0
  996. 1
  997. 1
  998. 1
  999. 1
  1000. 1
  1001. 1
  1002. 1
  1003. 1
  1004. 1
  1005. 1
  1006. 1
  1007. 1
  1008. 0
  1009. 1
  1010. 1
  1011. 1
  1012. 0
  1013. 1
  1014. 1
  1015. 0
  1016. 1
  1017. 1
  1018. 1
  1019. 1
  1020. 0
  1021. 1
  1022. 1
  1023. 0
  1024. 1
  1025. 1
  1026. 1
  1027. 1
  1028. 1
  1029. 1
  1030. 1
  1031. 1
  1032. 0
  1033. 1
  1034. 1
  1035. 6
  1036. 1
  1037. 1
  1038. 1
  1039. 1
  1040. 2
  1041. 1
  1042. 1
  1043. 1
  1044. 1
  1045. 1
  1046. 2
  1047. 2
  1048. 0
  1049. 2
  1050. 2
  1051. 1
  1052. 1
  1053. 1
  1054. 1
  1055. 1
  1056. 1
  1057. 1
  1058. 0
  1059. 1
  1060. 1
  1061. 1
  1062. 1
  1063. 1
  1064. 1
  1065. 1
  1066. 1
  1067. 2
  1068. 0
  1069. 2
  1070. 2
  1071. 2
  1072. 1
  1073. 0
  1074. 1
  1075. 1
  1076. 1
  1077. 1
  1078. 1
  1079. 0
  1080. 1
  1081. 1
  1082. 1
  1083. 1
  1084. 1
  1085. 1
  1086. 1
  1087. 1
  1088. 1
  1089. 1
  1090. 1
  1091. 1
  1092. 1
  1093. 1
  1094. 1
  1095. 1
  1096. 1
  1097. 1
  1098. 1
  1099. 1
  1100. 1
  1101. 1
  1102. 1
  1103. 1
  1104. 1
  1105. 1
  1106. 1
  1107. 1
  1108. 1
  1109. 1
  1110. 1
  1111. 0
  1112. 1
  1113. 2
  1114. 2
  1115. 2
  1116. 2
  1117. 2
  1118. 2
  1119. 2
  1120. 1
  1121. 1
  1122. 1
  1123. 0
  1124. 0
  1125. 1
  1126. 1
  1127. 1
  1128. 1
  1129. 1
  1130. 1
  1131. 1
  1132. 1
  1133. 1
  1134. 1
  1135. 1
  1136. 1
  1137. 1
  1138. 1
  1139. 1
  1140. 1
  1141. 1
  1142. 1
  1143. 1
  1144. 1
  1145. 1
  1146. 1
  1147. 2
  1148. 2
  1149. 1
  1150. 1
  1151. 1
  1152. 1
  1153. 1
  1154. 1
  1155. 1
  1156. 1
  1157. 1
  1158. 1
  1159. 2
  1160. 2
  1161. 1
  1162. 1
  1163. 2
  1164. 0
  1165. 1
  1166. 0
  1167. 1
'average degree = 1.97429305912596'
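
To make the degree calculation concrete, here is a base-R sketch on a toy matrix. It assumes ties are counted as present or absent regardless of weight, which is consistent with the numbers above (the mean degree equals twice the edge count divided by the vertex count). The matrix values are invented.

```r
# Toy symmetric tie matrix with one isolate (values made up).
toy_matrix <- matrix( c( 0, 2, 1, 0,
                         2, 0, 0, 0,
                         1, 0, 0, 0,
                         0, 0, 0, 0 ), nrow = 4, byrow = TRUE )

# treat any positive weight as a single tie.
toy_binary <- ( toy_matrix > 0 ) * 1

# undirected degree: number of ties per row.
toy_degree <- rowSums( toy_binary )

# mean degree, and the subset of nodes above the mean.
toy_mean_degree <- mean( toy_degree )
toy_above_mean <- toy_degree[ toy_degree > toy_mean_degree ]

print( toy_degree )
print( toy_mean_degree )
print( toy_above_mean )
```

Here the first node has degree 2, the next two have degree 1, and the isolate has degree 0, so only the first node is above the mean of 1.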

In [17]:
# average author degree (person types 2 and 4)
gmAutomatedAverageAuthorDegree2And4 <- calcAuthorMeanDegree( dataFrameIN = gmAutomatedDataDF, includeBothIN = TRUE )
paste( "average author degree (2 and 4) = ", gmAutomatedAverageAuthorDegree2And4, sep = "" )

# average author degree (person type 2 only)
gmAutomatedAverageAuthorDegreeOnly2 <- calcAuthorMeanDegree( dataFrameIN = gmAutomatedDataDF, includeBothIN = FALSE )
paste( "average author degree (only 2) = ", gmAutomatedAverageAuthorDegreeOnly2, sep = "" )

# average source degree (person types 3 and 4)
gmAutomatedAverageSourceDegree3And4 <- calcSourceMeanDegree( dataFrameIN = gmAutomatedDataDF, includeBothIN = TRUE )
paste( "average source degree (3 and 4) = ", gmAutomatedAverageSourceDegree3And4, sep = "" )

# average source degree (person type 3 only)
gmAutomatedAverageSourceDegreeOnly3 <- calcSourceMeanDegree( dataFrameIN = gmAutomatedDataDF, includeBothIN = FALSE )
paste( "average source degree (only 3) = ", gmAutomatedAverageSourceDegreeOnly3, sep = "" )


'average author degree (2 and 4) = 24.7872340425532'
'average author degree (only 2) = 24.8478260869565'
'average source degree (3 and 4) = 1.161'
'average source degree (only 3) = 1.14014014014014'
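
calcAuthorMeanDegree and calcSourceMeanDegree come from functions-sna.r and are not reproduced here. The sketch below is a hypothetical equivalent inferred only from the comments above: it assumes person_type holds the numeric codes 2 (author), 3 (source), and 4 (both), and the real helpers may store or filter types differently.

```r
# Hypothetical helper, NOT the real calcAuthorMeanDegree/calcSourceMeanDegree.
exampleMeanDegreeByType <- function( dataFrameIN, typesIN ) {

    # rows whose person_type is one of the requested codes.
    isMatch <- dataFrameIN$person_type %in% typesIN

    return( mean( dataFrameIN$degree[ isMatch ] ) )
}

# toy data (values made up): one author, three sources, one person who is both.
toy_df <- data.frame( person_type = c( 2, 3, 3, 4, 3 ),
                      degree = c( 10, 1, 2, 6, 1 ) )

toy_author_2_and_4 <- exampleMeanDegreeByType( toy_df, c( 2, 4 ) )
toy_author_only_2 <- exampleMeanDegreeByType( toy_df, c( 2 ) )
toy_source_3_and_4 <- exampleMeanDegreeByType( toy_df, c( 3, 4 ) )
toy_source_only_3 <- exampleMeanDegreeByType( toy_df, c( 3 ) )

print( c( toy_author_2_and_4, toy_author_only_2, toy_source_3_and_4, toy_source_only_3 ) )
```

Comparing the "both" and "only" variants shows how much people who are both author and source shift each group's mean.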

grp_month (gm) - automated - More metrics

Now that the data is in a statnet network object, run code based on the following script for more in-depth metrics:

  • context_text/R/sna/statnet/sna-statnet-network-stats.r

In [18]:
# Links:
# - manual (PDF): http://cran.r-project.org/web/packages/sna/sna.pdf
# - good notes: http://www.shizukalab.com/toolkits/sna/node-level-calculations

# Also, be advised that statnet and igraph don't really play nice together.
#    If you'll be using both, best idea is to have a workspace for each.

#==============================================================================#
# statnet
#==============================================================================#

# make sure you've loaded the statnet library (includes sna)
# install.packages( "statnet" )
#library( statnet )

#==============================================================================#
# NODE level
#==============================================================================#

# what is the standard deviation of the degrees?
gmAutomatedDegreeSd <- sd( gmAutomatedDegreeVector )
paste( "degree SD = ", gmAutomatedDegreeSd, sep = "" )

# what is the variance of the degrees?
gmAutomatedDegreeVar <- var( gmAutomatedDegreeVector )
paste( "degree variance = ", gmAutomatedDegreeVar, sep = "" )

# what is the max value among the degrees?
gmAutomatedDegreeMax <- max( gmAutomatedDegreeVector )
paste( "degree max = ", gmAutomatedDegreeMax, sep = "" )

# calculate degree frequency distribution (no plot is drawn here)
gmAutomatedDegreeFrequenciesTable <- table( gmAutomatedDegreeVector )
paste( "degree frequencies = ", gmAutomatedDegreeFrequenciesTable, sep = "" )
gmAutomatedDegreeFrequenciesTable

# node-level undirected betweenness
gmAutomatedBetweenness <- sna::betweenness( gmAutomatedNetworkStatnet, gmode = "graph", cmode = "undirected" )

#paste( "betweenness = ", gmAutomatedBetweenness, sep = "" )
# associate with each node as a node attribute.
#    (%v%<- is a shortcut for the set.vertex.attribute command)
gmAutomatedNetworkStatnet %v% "betweenness" <- gmAutomatedBetweenness

# also add betweenness vector to original data frame
gmAutomatedDataDF$betweenness <- gmAutomatedBetweenness

#==============================================================================#
# NETWORK level
#==============================================================================#

# graph-level degree centralization (labeled "degree centrality" in the output)
gmAutomatedDegreeCentrality <- sna::centralization( gmAutomatedNetworkStatnet, sna::degree, mode = "graph" )
paste( "degree centrality = ", gmAutomatedDegreeCentrality, sep = "" )

# graph-level betweenness centralization (labeled "betweenness centrality" in the output)
gmAutomatedBetweennessCentrality <- sna::centralization( gmAutomatedNetworkStatnet, sna::betweenness, mode = "graph", cmode = "undirected" )
paste( "betweenness centrality = ", gmAutomatedBetweennessCentrality, sep = "" )

# graph-level connectedness
gmAutomatedConnectedness <- sna::connectedness( gmAutomatedNetworkStatnet )
paste( "connectedness = ", gmAutomatedConnectedness, sep = "" )

# graph-level transitivity
gmAutomatedTransitivity <- sna::gtrans( gmAutomatedNetworkStatnet, mode = "graph" )
paste( "transitivity = ", gmAutomatedTransitivity, sep = "" )

# graph-level density
gmAutomatedDensity <- sna::gden( gmAutomatedNetworkStatnet, mode = "graph" )
paste( "density = ", gmAutomatedDensity, sep = "" )


'degree SD = 6.42460087405331'
'degree variance = 41.2754963908866'
'degree max = 93'
  1. 'degree frequencies = 122'
  2. 'degree frequencies = 891'
  3. 'degree frequencies = 100'
  4. 'degree frequencies = 6'
  5. 'degree frequencies = 9'
  6. 'degree frequencies = 4'
  7. 'degree frequencies = 1'
  8. 'degree frequencies = 2'
  9. 'degree frequencies = 1'
  10. 'degree frequencies = 2'
  11. 'degree frequencies = 1'
  12. 'degree frequencies = 1'
  13. 'degree frequencies = 2'
  14. 'degree frequencies = 1'
  15. 'degree frequencies = 1'
  16. 'degree frequencies = 1'
  17. 'degree frequencies = 1'
  18. 'degree frequencies = 2'
  19. 'degree frequencies = 1'
  20. 'degree frequencies = 2'
  21. 'degree frequencies = 1'
  22. 'degree frequencies = 2'
  23. 'degree frequencies = 1'
  24. 'degree frequencies = 1'
  25. 'degree frequencies = 1'
  26. 'degree frequencies = 1'
  27. 'degree frequencies = 2'
  28. 'degree frequencies = 2'
  29. 'degree frequencies = 1'
  30. 'degree frequencies = 1'
  31. 'degree frequencies = 1'
  32. 'degree frequencies = 1'
  33. 'degree frequencies = 1'
gmAutomatedDegreeVector
  0   1   2   3   4   5   6   7   8   9  10  14  15  19  22  26  27  29  30  32 
122 891 100   6   9   4   1   2   1   2   1   1   2   1   1   1   1   2   1   2 
 34  35  39  42  43  44  46  47  49  50  71  72  93 
  1   2   1   1   1   1   2   2   1   1   1   1   1 
'degree centrality = 0.0782006640213782'
'betweenness centrality = 0.206660606881935'
'connectedness = 0.588270050752468'
Warning message in sna::gtrans(gmAutomatedNetworkStatnet, mode = "graph"):
“gtrans called with use.adjacency=TRUE, but your data looks too large for that to work well.  Overriding to edgelist method.”
'transitivity = 0.00893353450329548'
'density = 0.00169321874710632'
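
As a sanity check, several of the reported values can be re-derived in base R from the degree frequency table alone. The sketch below rebuilds the degree vector from that table and recomputes the vertex count, edge count, mean, variance, density, and degree centralization. The centralization line uses the standard Freeman formula for an undirected graph, which is an assumption about what sna::centralization computes here, though it does reproduce the reported figure.

```r
# degree values and their frequencies, copied from the table above.
degree_values <- c( 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 14, 15, 19, 22, 26, 27, 29, 30, 32,
                    34, 35, 39, 42, 43, 44, 46, 47, 49, 50, 71, 72, 93 )
degree_counts <- c( 122, 891, 100, 6, 9, 4, 1, 2, 1, 2, 1, 1, 2, 1, 1, 1, 1, 2, 1, 2,
                    1, 2, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1 )

# rebuild a degree vector with one entry per vertex (order does not matter).
check_degree_vector <- rep( degree_values, times = degree_counts )

# vertex count, and edge count (each undirected edge adds 2 to the degree sum).
check_node_count <- length( check_degree_vector )
check_edge_count <- sum( check_degree_vector ) / 2

# node-level summaries.
check_mean_degree <- mean( check_degree_vector )
check_degree_var <- var( check_degree_vector )

# density: edges present over possible undirected pairs.
check_density <- check_edge_count / choose( check_node_count, 2 )

# Freeman degree centralization for an undirected graph.
check_degree_centralization <- sum( max( check_degree_vector ) - check_degree_vector ) /
    ( ( check_node_count - 1 ) * ( check_node_count - 2 ) )

print( c( check_node_count, check_edge_count ) )
print( c( check_mean_degree, check_degree_var ) )
print( c( check_density, check_degree_centralization ) )
```

The rebuilt vector has 1167 entries and implies 1152 edges, matching the network summary printed when the statnet object was created.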

grp_month (gm) - automated - create node attribute DataFrame

If you want to just work with the traits of the nodes/vertexes, you can combine the attribute vectors into a data frame.


In [19]:
#==============================================================================#
# output attributes to data frame
#==============================================================================#

# if you want to just work with the traits of the nodes/vertexes, you can
#    combine the attribute vectors into a data frame.

# first, output network object to see what attributes you have
gmAutomatedNetworkStatnet

# then, combine them into a data frame.
gmAutomatedNodeAttrDF <- data.frame( id = gmAutomatedNetworkStatnet %v% "vertex.names",
                                     person_id = gmAutomatedNetworkStatnet %v% "person_id",
                                     person_type = gmAutomatedNetworkStatnet %v% "person_type",
                                     degree = gmAutomatedNetworkStatnet %v% "degree",
                                     betweenness = gmAutomatedNetworkStatnet %v% "betweenness" )


 Network attributes:
  vertices = 1167 
  directed = FALSE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 1152 
    missing edges= 0 
    non-missing edges= 1152 

 Vertex attribute names: 
    betweenness degree person_id person_type vertex.names 

 Edge attribute names not shown 

grp_month (gm) - human

Next, we'll analyze the month of data coded by human coders. Set up some variables to store where data is located:

grp_month (gm) - human - Read data

Read in the data from the tab-delimited data file, then get it into the right data structures for use in R SNA.


In [20]:
# initialize variables
gmHumanDataFolder <- paste( data_directory, "/network/grp_month", sep = "" )
gmHumanDataFile <- "sourcenet_data-20171115-043102-grp_month-human.tab"
gmHumanDataPath <- paste( gmHumanDataFolder, "/", gmHumanDataFile, sep = "" )

In [21]:
gmHumanDataPath


'/home/jonathanmorgan/work/django/research/work/phd_work/data/network/grp_month/sourcenet_data-20171115-043102-grp_month-human.tab'

Load the data file into memory


In [22]:
# tab-delimited:
gmHumanDataDF <- read.delim( gmHumanDataPath, header = TRUE, row.names = 1, check.names = FALSE )

In [23]:
# get count of rows...
gmHumanRowCount <- nrow( gmHumanDataDF )
paste( "grp_month human row count = ", gmHumanRowCount, sep = "" )

# ...and columns
gmHumanColumnCount <- ncol( gmHumanDataDF )
paste( "grp_month human column count = ", gmHumanColumnCount, sep = "" )


'grp_month human row count = 1167'
'grp_month human column count = 1169'

Get just the tie rows and columns for initializing network libraries.


In [24]:
# the below syntax returns only as many columns as there are rows, so
#     omitting any trait columns that lie in columns on the right side
#     of the file.
gmHumanNetworkDF <- gmHumanDataDF[ , 1 : gmHumanRowCount ]
#str( gmHumanNetworkDF )

In [25]:
# convert to a matrix
gmHumanNetworkMatrix <- as.matrix( gmHumanNetworkDF )
# str( gmHumanNetworkMatrix )
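
The split between tie columns and trait columns can be sketched on toy data. This is only an illustration under assumptions: the toy data frame, its column names and the toy* variable names are invented here, and the real file layout (n tie columns followed by the trait columns) is inferred from the cells above. Deriving the trait column positions from the row count avoids hard-coding indexes such as 1168:1169.

```r
# toy stand-in for the data file: 3 nodes, 3 tie columns, 2 trait columns
toyDataDF <- data.frame(
    "1" = c( 0, 2, 0 ),
    "2" = c( 2, 0, 1 ),
    "3" = c( 0, 1, 0 ),
    person_id = c( 101, 102, 103 ),
    person_type = c( 2, 3, 3 ),
    check.names = FALSE
)
rownames( toyDataDF ) <- c( "1", "2", "3" )

toyRowCount <- nrow( toyDataDF )

# tie columns: the first n columns, where n is the number of rows
toyNetworkMatrix <- as.matrix( toyDataDF[ , 1 : toyRowCount ] )

# trait columns: everything to the right of the ties
toyAttributeDF <- toyDataDF[ , ( toyRowCount + 1 ) : ncol( toyDataDF ), drop = FALSE ]

# sanity checks: square and symmetric, as an undirected tie matrix should be
stopifnot( nrow( toyNetworkMatrix ) == ncol( toyNetworkMatrix ) )
stopifnot( isSymmetric( unname( toyNetworkMatrix ) ) )
```

With 1167 rows and 1169 columns, the same expression yields columns 1168 and 1169, matching the indexes used in this notebook.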

grp_month (gm) - human - initialize statnet

First, load the statnet package, then load the human-coded grp_month data into a statnet network object and assign attributes to nodes.

Based on context_text/R/sna/statnet/sna-statnet-init.r.


In [26]:
# make sure you've loaded the statnet library
# install.packages( "statnet" )
library( statnet )

In [27]:
# If you have a data frame of attributes (each attribute is a column, with
#     attribute name the column name), you can associate those attributes
#     when you create the network.
# attribute help: http://www.inside-r.org/packages/cran/network/docs/loading.attributes

# load attributes from a file:
#tab_attribute_test1 <- read.delim( "tab-test1-attribute_data.txt", header = TRUE, row.names = 1, check.names = FALSE )

# or create DataFrame by just grabbing the attribute columns
#gmHumanNetworkAttributeDF <- gmHumanDataDF[ , 1169:1170 ]
gmHumanNetworkAttributeDF <- gmHumanDataDF[ , 1168:1169 ]

# convert matrix to statnet network object instance.
gmHumanNetworkStatnet <- network( gmHumanNetworkMatrix, matrix.type = "adjacency", directed = FALSE, vertex.attr = gmHumanNetworkAttributeDF )

# look at information now.
gmHumanNetworkStatnet

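# Sample output below appears to be left over from an earlier, smaller data set;
#    the actual output for this network follows the cell.
# Network attributes: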
#  vertices = 314
#  directed = FALSE
#  hyper = FALSE
#  loops = FALSE
#  multiple = FALSE
#  bipartite = FALSE
#  total edges= 309
#    missing edges= 0
#    non-missing edges= 309
#
# Vertex attribute names:
#    person_type vertex.names
#
# No edge attributes


 Network attributes:
  vertices = 1167 
  directed = FALSE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 1201 
    missing edges= 0 
    non-missing edges= 1201 

 Vertex attribute names: 
    person_id person_type vertex.names 

 Edge attribute names not shown 

In [7]:
# human - include ties Greater than or equal to 0 (GE0)
gmHumanMeanTieWeightGE0Vector <- apply( gmHumanNetworkMatrix, 1, calculateListMean )
gmHumanDataDF$meanTieWeightGE0 <- gmHumanMeanTieWeightGE0Vector

# human - include ties Greater than or equal to 1 (GE1)
gmHumanMeanTieWeightGE1Vector <- apply( gmHumanNetworkMatrix, 1, calculateListMean, minValueToIncludeIN = 1 )
gmHumanDataDF$meanTieWeightGE1 <- gmHumanMeanTieWeightGE1Vector

# human - Max tie weight?
gmHumanMaxTieWeightVector <- apply( gmHumanNetworkMatrix, 1, calculateListMax )
gmHumanDataDF$maxTieWeight <- gmHumanMaxTieWeightVector
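
The helpers calculateListMean() and calculateListMax() come from functions-sna.r, which is not shown here. As a rough sketch of what a per-row mean with a minimum-value filter might look like, the function meanAtOrAbove() below is a hypothetical stand-in, not the project's implementation; only the parameter name minValueToIncludeIN is taken from the call above.

```r
# hypothetical stand-in for calculateListMean(); the real function is
#    defined in functions-sna.r and may behave differently.
meanAtOrAbove <- function( valuesIN, minValueToIncludeIN = 0 ) {

    # keep only values at or above the threshold
    keptValues <- valuesIN[ valuesIN >= minValueToIncludeIN ]

    # nothing left to average (e.g. an isolate when the threshold is 1)
    if ( length( keptValues ) == 0 ) {
        return( NA_real_ )
    }

    mean( keptValues )
}

# toy row of tie weights: two absent ties, one of weight 2, one of weight 4
toyTieRow <- c( 0, 0, 2, 4 )
toyMeanGE0 <- meanAtOrAbove( toyTieRow )                           # 1.5
toyMeanGE1 <- meanAtOrAbove( toyTieRow, minValueToIncludeIN = 1 )  # 3
toyMax <- max( toyTieRow )                                         # 4
```

The GE0 variant averages over every potential tie, so it shrinks as the network grows; the GE1 variant averages only over ties that exist.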

grp_month (gm) - human - Basic metrics


In [28]:
# our statnet network object is in gmHumanNetworkStatnet.

# Use the degree function in the sna package to create a vector of degree values
#    for each node.  Make sure to pass the gmode parameter to tell it that the
#    graph is not directed (gmode = "graph", instead of "digraph").
# Doc: http://www.inside-r.org/packages/cran/sna/docs/degree
#degree_vector <- degree( gmHumanNetworkStatnet, gmode = "graph" )

# If you have other libraries loaded that also implement a degree function, you
#    can also call this with package name:
gmHumanDegreeVector <- sna::degree( gmHumanNetworkStatnet, gmode = "graph" )

# output the vector
gmHumanDegreeVector

# want more info on the degree function?  You can get to it eventually through
#    the following:
#help( package = "sna" )
#??sna::degree

# what is the average (mean) degree?
gmHumanAvgDegree <- mean( gmHumanDegreeVector )
paste( "average degree = ", gmHumanAvgDegree, sep = "" )

# subset vector to get only those that are above mean
gmHumanAboveMeanVector <- gmHumanDegreeVector[ gmHumanDegreeVector > gmHumanAvgDegree ]

# Take the degree and associate it with each node as a node attribute.
#    (%v% with assignment is a shortcut for the set.vertex.attribute command)
gmHumanNetworkStatnet %v% "degree" <- gmHumanDegreeVector

# also add degree vector to original data frame
gmHumanDataDF$degree <- gmHumanDegreeVector


  1. 28
  2. 1
  3. 36
  4. 1
  5. 34
  6. 0
  7. 1
  8. 50
  9. 28
  10. 1
  11. 32
  12. 1
  13. 61
  14. 1
  15. 1
  16. 99
  17. 44
  18. 31
  19. 47
  20. 66
  21. 2
  22. 6
  23. 1
  24. 1
  25. 2
  26. 1
  27. 38
  28. 2
  29. 2
  30. 2
  31. 2
  32. 2
  33. 1
  34. 1
  35. 1
  36. 1
  37. 1
  38. 41
  39. 1
  40. 1
  41. 1
  42. 2
  43. 1
  44. 1
  45. 7
  46. 1
  47. 0
  48. 1
  49. 46
  50. 4
  51. 1
  52. 19
  53. 3
  54. 1
  55. 2
  56. 2
  57. 1
  58. 1
  59. 1
  60. 1
  61. 0
  62. 1
  63. 1
  64. 1
  65. 1
  66. 0
  67. 1
  68. 2
  69. 3
  70. 1
  71. 1
  72. 32
  73. 1
  74. 1
  75. 3
  76. 1
  77. 1
  78. 1
  79. 1
  80. 1
  81. 1
  82. 1
  83. 1
  84. 1
  85. 1
  86. 1
  87. 1
  88. 1
  89. 1
  90. 1
  91. 1
  92. 1
  93. 7
  94. 1
  95. 1
  96. 1
  97. 9
  98. 2
  99. 1
  100. 1
  101. 1
  102. 1
  103. 1
  104. 2
  105. 7
  106. 1
  107. 0
  108. 0
  109. 0
  110. 1
  111. 1
  112. 1
  113. 1
  114. 1
  115. 1
  116. 2
  117. 2
  118. 2
  119. 2
  120. 1
  121. 1
  122. 1
  123. 1
  124. 1
  125. 2
  126. 1
  127. 1
  128. 1
  129. 1
  130. 1
  131. 1
  132. 1
  133. 1
  134. 37
  135. 1
  136. 1
  137. 1
  138. 1
  139. 1
  140. 2
  141. 1
  142. 1
  143. 1
  144. 19
  145. 1
  146. 1
  147. 1
  148. 1
  149. 1
  150. 1
  151. 33
  152. 1
  153. 1
  154. 1
  155. 1
  156. 1
  157. 1
  158. 1
  159. 1
  160. 1
  161. 2
  162. 1
  163. 1
  164. 1
  165. 1
  166. 1
  167. 1
  168. 1
  169. 1
  170. 1
  171. 1
  172. 1
  173. 4
  174. 1
  175. 1
  176. 31
  177. 45
  178. 1
  179. 2
  180. 1
  181. 1
  182. 1
  183. 44
  184. 1
  185. 1
  186. 1
  187. 1
  188. 1
  189. 1
  190. 1
  191. 2
  192. 1
  193. 1
  194. 2
  195. 1
  196. 1
  197. 3
  198. 0
  199. 1
  200. 1
  201. 1
  202. 1
  203. 1
  204. 1
  205. 1
  206. 1
  207. 1
  208. 1
  209. 1
  210. 2
  211. 2
  212. 4
  213. 4
  214. 4
  215. 4
  216. 4
  217. 1
  218. 1
  219. 1
  220. 1
  221. 1
  222. 4
  223. 1
  224. 1
  225. 1
  226. 1
  227. 1
  228. 1
  229. 1
  230. 1
  231. 2
  232. 1
  233. 19
  234. 1
  235. 1
  236. 1
  237. 1
  238. 2
  239. 2
  240. 2
  241. 3
  242. 2
  243. 2
  244. 4
  245. 1
  246. 0
  247. 1
  248. 1
  249. 2
  250. 1
  251. 1
  252. 1
  253. 1
  254. 1
  255. 1
  256. 1
  257. 1
  258. 1
  259. 1
  260. 1
  261. 3
  262. 1
  263. 1
  264. 1
  265. 1
  266. 1
  267. 1
  268. 1
  269. 1
  270. 1
  271. 1
  272. 1
  273. 1
  274. 1
  275. 1
  276. 2
  277. 2
  278. 2
  279. 2
  280. 3
  281. 1
  282. 1
  283. 2
  284. 1
  285. 1
  286. 1
  287. 1
  288. 1
  289. 1
  290. 1
  291. 4
  292. 1
  293. 1
  294. 2
  295. 1
  296. 1
  297. 2
  298. 4
  299. 1
  300. 1
  301. 1
  302. 1
  303. 1
  304. 1
  305. 1
  306. 1
  307. 1
  308. 1
  309. 76
  310. 22
  311. 50
  312. 3
  313. 1
  314. 1
  315. 1
  316. 13
  317. 9
  318. 3
  319. 1
  320. 2
  321. 3
  322. 1
  323. 1
  324. 2
  325. 1
  326. 2
  327. 2
  328. 1
  329. 0
  330. 1
  331. 0
  332. 0
  333. 0
  334. 0
  335. 1
  336. 1
  337. 0
  338. 1
  339. 0
  340. 0
  341. 1
  342. 0
  343. 2
  344. 0
  345. 1
  346. 1
  347. 1
  348. 0
  349. 0
  350. 1
  351. 1
  352. 0
  353. 0
  354. 1
  355. 0
  356. 1
  357. 1
  358. 0
  359. 10
  360. 0
  361. 1
  362. 1
  363. 1
  364. 1
  365. 1
  366. 1
  367. 3
  368. 2
  369. 14
  370. 2
  371. 1
  372. 0
  373. 1
  374. 1
  375. 2
  376. 1
  377. 1
  378. 1
  379. 1
  380. 1
  381. 1
  382. 1
  383. 1
  384. 1
  385. 1
  386. 2
  387. 1
  388. 2
  389. 1
  390. 1
  391. 1
  392. 1
  393. 0
  394. 1
  395. 1
  396. 1
  397. 1
  398. 1
  399. 1
  400. 1
  401. 1
  402. 2
  403. 1
  404. 1
  405. 1
  406. 7
  407. 1
  408. 1
  409. 0
  410. 1
  411. 1
  412. 1
  413. 0
  414. 0
  415. 1
  416. 1
  417. 1
  418. 1
  419. 1
  420. 1
  421. 1
  422. 0
  423. 1
  424. 1
  425. 1
  426. 1
  427. 1
  428. 1
  429. 1
  430. 1
  431. 1
  432. 1
  433. 1
  434. 1
  435. 1
  436. 1
  437. 1
  438. 1
  439. 1
  440. 1
  441. 1
  442. 3
  443. 2
  444. 4
  445. 2
  446. 1
  447. 2
  448. 2
  449. 1
  450. 1
  451. 1
  452. 1
  453. 1
  454. 0
  455. 1
  456. 1
  457. 1
  458. 1
  459. 1
  460. 1
  461. 1
  462. 1
  463. 1
  464. 1
  465. 1
  466. 1
  467. 1
  468. 1
  469. 1
  470. 1
  471. 1
  472. 1
  473. 1
  474. 1
  475. 1
  476. 2
  477. 1
  478. 4
  479. 1
  480. 1
  481. 1
  482. 1
  483. 1
  484. 1
  485. 1
  486. 1
  487. 1
  488. 1
  489. 1
  490. 1
  491. 1
  492. 1
  493. 1
  494. 1
  495. 1
  496. 1
  497. 1
  498. 1
  499. 1
  500. 1
  501. 1
  502. 1
  503. 1
  504. 1
  505. 1
  506. 1
  507. 1
  508. 1
  509. 1
  510. 1
  511. 1
  512. 1
  513. 1
  514. 1
  515. 1
  516. 1
  517. 1
  518. 1
  519. 1
  520. 1
  521. 0
  522. 1
  523. 1
  524. 1
  525. 1
  526. 1
  527. 1
  528. 1
  529. 1
  530. 1
  531. 1
  532. 1
  533. 1
  534. 1
  535. 1
  536. 1
  537. 1
  538. 1
  539. 1
  540. 1
  541. 1
  542. 2
  543. 1
  544. 1
  545. 1
  546. 1
  547. 1
  548. 1
  549. 1
  550. 0
  551. 1
  552. 1
  553. 1
  554. 1
  555. 1
  556. 1
  557. 1
  558. 1
  559. 1
  560. 1
  561. 1
  562. 1
  563. 2
  564. 1
  565. 1
  566. 1
  567. 1
  568. 1
  569. 1
  570. 1
  571. 1
  572. 1
  573. 1
  574. 1
  575. 1
  576. 1
  577. 1
  578. 1
  579. 1
  580. 1
  581. 1
  582. 1
  583. 1
  584. 1
  585. 1
  586. 1
  587. 1
  588. 1
  589. 1
  590. 1
  591. 1
  592. 1
  593. 1
  594. 2
  595. 1
  596. 1
  597. 1
  598. 1
  599. 0
  600. 1
  601. 1
  602. 1
  603. 1
  604. 1
  605. 1
  606. 1
  607. 1
  608. 1
  609. 1
  610. 1
  611. 4
  612. 0
  613. 1
  614. 1
  615. 1
  616. 1
  617. 0
  618. 1
  619. 5
  620. 1
  621. 1
  622. 1
  623. 1
  624. 1
  625. 1
  626. 3
  627. 1
  628. 1
  629. 1
  630. 1
  631. 1
  632. 1
  633. 1
  634. 1
  635. 1
  636. 1
  637. 1
  638. 1
  639. 1
  640. 1
  641. 1
  642. 1
  643. 1
  644. 1
  645. 1
  646. 1
  647. 2
  648. 2
  649. 2
  650. 2
  651. 2
  652. 0
  653. 1
  654. 1
  655. 1
  656. 1
  657. 1
  658. 1
  659. 1
  660. 1
  661. 1
  662. 1
  663. 1
  664. 1
  665. 1
  666. 4
  667. 1
  668. 2
  669. 1
  670. 1
  671. 1
  672. 1
  673. 1
  674. 1
  675. 1
  676. 1
  677. 1
  678. 1
  679. 1
  680. 1
  681. 1
  682. 1
  683. 1
  684. 1
  685. 1
  686. 1
  687. 1
  688. 1
  689. 1
  690. 1
  691. 1
  692. 1
  693. 1
  694. 1
  695. 1
  696. 1
  697. 1
  698. 1
  699. 1
  700. 1
  701. 1
  702. 1
  703. 1
  704. 1
  705. 1
  706. 1
  707. 1
  708. 1
  709. 1
  710. 1
  711. 1
  712. 1
  713. 1
  714. 1
  715. 1
  716. 1
  717. 1
  718. 1
  719. 1
  720. 1
  721. 1
  722. 1
  723. 1
  724. 1
  725. 1
  726. 1
  727. 1
  728. 1
  729. 1
  730. 1
  731. 1
  732. 1
  733. 0
  734. 1
  735. 1
  736. 1
  737. 1
  738. 1
  739. 1
  740. 1
  741. 1
  742. 1
  743. 0
  744. 1
  745. 1
  746. 1
  747. 1
  748. 0
  749. 1
  750. 1
  751. 1
  752. 1
  753. 1
  754. 1
  755. 1
  756. 1
  757. 1
  758. 2
  759. 1
  760. 1
  761. 1
  762. 1
  763. 1
  764. 1
  765. 1
  766. 1
  767. 1
  768. 1
  769. 0
  770. 1
  771. 1
  772. 1
  773. 2
  774. 2
  775. 2
  776. 2
  777. 2
  778. 1
  779. 1
  780. 1
  781. 2
  782. 1
  783. 1
  784. 0
  785. 1
  786. 1
  787. 1
  788. 1
  789. 1
  790. 1
  791. 1
  792. 1
  793. 1
  794. 1
  795. 2
  796. 1
  797. 1
  798. 1
  799. 0
  800. 1
  801. 1
  802. 1
  803. 2
  804. 1
  805. 1
  806. 1
  807. 0
  808. 1
  809. 1
  810. 1
  811. 1
  812. 1
  813. 1
  814. 1
  815. 1
  816. 1
  817. 1
  818. 1
  819. 1
  820. 1
  821. 1
  822. 1
  823. 1
  824. 1
  825. 1
  826. 1
  827. 1
  828. 0
  829. 1
  830. 1
  831. 3
  832. 2
  833. 1
  834. 1
  835. 1
  836. 2
  837. 1
  838. 1
  839. 1
  840. 1
  841. 1
  842. 1
  843. 1
  844. 1
  845. 1
  846. 0
  847. 1
  848. 0
  849. 1
  850. 1
  851. 1
  852. 1
  853. 0
  854. 1
  855. 1
  856. 1
  857. 1
  858. 1
  859. 1
  860. 1
  861. 1
  862. 1
  863. 1
  864. 1
  865. 0
  866. 1
  867. 1
  868. 1
  869. 0
  870. 1
  871. 1
  872. 1
  873. 1
  874. 1
  875. 1
  876. 1
  877. 1
  878. 1
  879. 1
  880. 1
  881. 1
  882. 1
  883. 1
  884. 1
  885. 1
  886. 1
  887. 1
  888. 1
  889. 1
  890. 1
  891. 1
  892. 1
  893. 1
  894. 1
  895. 1
  896. 1
  897. 2
  898. 0
  899. 1
  900. 1
  901. 1
  902. 1
  903. 1
  904. 1
  905. 1
  906. 1
  907. 1
  908. 1
  909. 1
  910. 1
  911. 0
  912. 1
  913. 1
  914. 1
  915. 1
  916. 1
  917. 1
  918. 1
  919. 1
  920. 1
  921. 1
  922. 1
  923. 1
  924. 1
  925. 1
  926. 1
  927. 1
  928. 0
  929. 1
  930. 1
  931. 1
  932. 1
  933. 1
  934. 0
  935. 1
  936. 1
  937. 1
  938. 1
  939. 1
  940. 1
  941. 1
  942. 1
  943. 1
  944. 1
  945. 1
  946. 1
  947. 1
  948. 1
  949. 1
  950. 1
  951. 1
  952. 1
  953. 1
  954. 1
  955. 1
  956. 1
  957. 1
  958. 1
  959. 1
  960. 1
  961. 1
  962. 1
  963. 1
  964. 1
  965. 1
  966. 1
  967. 1
  968. 1
  969. 1
  970. 1
  971. 1
  972. 1
  973. 1
  974. 1
  975. 1
  976. 1
  977. 1
  978. 1
  979. 1
  980. 1
  981. 1
  982. 1
  983. 1
  984. 1
  985. 1
  986. 1
  987. 1
  988. 1
  989. 1
  990. 1
  991. 1
  992. 1
  993. 1
  994. 1
  995. 1
  996. 1
  997. 1
  998. 1
  999. 1
  1000. 1
  1001. 1
  1002. 1
  1003. 1
  1004. 1
  1005. 1
  1006. 0
  1007. 0
  1008. 0
  1009. 1
  1010. 1
  1011. 1
  1012. 1
  1013. 0
  1014. 1
  1015. 1
  1016. 1
  1017. 1
  1018. 1
  1019. 1
  1020. 0
  1021. 1
  1022. 1
  1023. 1
  1024. 1
  1025. 1
  1026. 1
  1027. 1
  1028. 1
  1029. 1
  1030. 1
  1031. 1
  1032. 1
  1033. 1
  1034. 1
  1035. 5
  1036. 1
  1037. 1
  1038. 1
  1039. 1
  1040. 1
  1041. 1
  1042. 1
  1043. 1
  1044. 1
  1045. 1
  1046. 2
  1047. 2
  1048. 0
  1049. 2
  1050. 2
  1051. 1
  1052. 1
  1053. 0
  1054. 1
  1055. 1
  1056. 1
  1057. 1
  1058. 1
  1059. 1
  1060. 1
  1061. 1
  1062. 1
  1063. 1
  1064. 1
  1065. 1
  1066. 1
  1067. 2
  1068. 2
  1069. 2
  1070. 2
  1071. 2
  1072. 1
  1073. 1
  1074. 1
  1075. 1
  1076. 1
  1077. 1
  1078. 1
  1079. 1
  1080. 0
  1081. 0
  1082. 1
  1083. 1
  1084. 1
  1085. 1
  1086. 1
  1087. 1
  1088. 1
  1089. 1
  1090. 1
  1091. 1
  1092. 1
  1093. 1
  1094. 1
  1095. 1
  1096. 0
  1097. 1
  1098. 1
  1099. 1
  1100. 1
  1101. 1
  1102. 1
  1103. 1
  1104. 1
  1105. 1
  1106. 1
  1107. 1
  1108. 1
  1109. 1
  1110. 1
  1111. 1
  1112. 1
  1113. 0
  1114. 0
  1115. 2
  1116. 2
  1117. 2
  1118. 2
  1119. 2
  1120. 1
  1121. 1
  1122. 1
  1123. 1
  1124. 1
  1125. 1
  1126. 1
  1127. 1
  1128. 1
  1129. 0
  1130. 0
  1131. 0
  1132. 0
  1133. 0
  1134. 1
  1135. 0
  1136. 1
  1137. 0
  1138. 0
  1139. 0
  1140. 1
  1141. 0
  1142. 0
  1143. 0
  1144. 0
  1145. 0
  1146. 0
  1147. 0
  1148. 0
  1149. 0
  1150. 1
  1151. 0
  1152. 0
  1153. 0
  1154. 0
  1155. 0
  1156. 0
  1157. 0
  1158. 0
  1159. 2
  1160. 0
  1161. 1
  1162. 0
  1163. 0
  1164. 1
  1165. 1
  1166. 1
  1167. 1
'average degree = 2.05826906598115'
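
As a cross-check, the mean degree of an undirected network is twice the edge count divided by the vertex count, since every tie adds one to the degree of each of its two endpoints. The sketch below uses the counts printed for the human network above (1167 vertices, 1201 edges); the variable names and the toy matrix are illustrative only.

```r
# counts taken from the network summary printed above
vertexCount <- 1167
edgeCount <- 1201

# each undirected tie contributes to two degrees
meanDegreeFromCounts <- 2 * edgeCount / vertexCount
meanDegreeFromCounts

# same identity on a toy undirected adjacency matrix (a 3-node star)
toyMatrix <- matrix( c( 0, 1, 1,
                        1, 0, 0,
                        1, 0, 0 ), nrow = 3, byrow = TRUE )
toyDegree <- rowSums( toyMatrix > 0 )
toyEdgeCount <- sum( toyMatrix > 0 ) / 2
stopifnot( isTRUE( all.equal( mean( toyDegree ), 2 * toyEdgeCount / nrow( toyMatrix ) ) ) )
```

The value from the counts agrees with the average degree reported by the cell.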

In [29]:
# average author degree (person types 2 and 4)
gmHumanAverageAuthorDegree2And4 <- calcAuthorMeanDegree( dataFrameIN = gmHumanDataDF, includeBothIN = TRUE )
paste( "average author degree (2 and 4) = ", gmHumanAverageAuthorDegree2And4, sep = "" )

# average author degree (person type 2 only)
gmHumanAverageAuthorDegreeOnly2 <- calcAuthorMeanDegree( dataFrameIN = gmHumanDataDF, includeBothIN = FALSE )
paste( "average author degree (only 2) = ", gmHumanAverageAuthorDegreeOnly2, sep = "" )

# average source degree (person types 3 and 4)
gmHumanAverageSourceDegree3And4 <- calcSourceMeanDegree( dataFrameIN = gmHumanDataDF, includeBothIN = TRUE )
paste( "average source degree (3 and 4) = ", gmHumanAverageSourceDegree3And4, sep = "" )

# average source degree (person type 3 only)
gmHumanAverageSourceDegreeOnly3 <- calcSourceMeanDegree( dataFrameIN = gmHumanDataDF, includeBothIN = FALSE )
paste( "average source degree (only 3) = ", gmHumanAverageSourceDegreeOnly3, sep = "" )


'average author degree (2 and 4) = 25.3958333333333'
'average author degree (only 2) = 25.3958333333333'
'average source degree (3 and 4) = 1.1564027370479'
'average source degree (only 3) = 1.1564027370479'
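
calcAuthorMeanDegree() and calcSourceMeanDegree() are also defined in functions-sna.r and are not shown here. A minimal sketch of the idea, assuming person_type holds the numeric codes named in the comments above (2 and 4 for authors, 3 and 4 for sources) and that the data frame has a degree column; meanDegreeForTypes() and the toy data are hypothetical.

```r
# hypothetical helper: mean degree over rows whose person_type is in a set
meanDegreeForTypes <- function( dataFrameIN, personTypesIN ) {

    matchingRows <- dataFrameIN$person_type %in% personTypesIN

    # no rows of the requested types
    if ( !any( matchingRows ) ) {
        return( NA_real_ )
    }

    mean( dataFrameIN$degree[ matchingRows ] )
}

# toy data: two authors (type 2), three sources (type 3), nobody of type 4
toyDF <- data.frame( person_type = c( 2, 2, 3, 3, 3 ),
                     degree = c( 10, 20, 1, 1, 2 ) )

toyAuthorMeanBoth <- meanDegreeForTypes( toyDF, c( 2, 4 ) )
toyAuthorMeanOnly2 <- meanDegreeForTypes( toyDF, 2 )
toySourceMeanBoth <- meanDegreeForTypes( toyDF, c( 3, 4 ) )
```

In the toy data the "2 and 4" and "only 2" means coincide because no row has type 4. The identical pairs of values in the output above would be consistent with the same situation in this month of data, though that is only one possible reading.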

grp_month (gm) - human - More metrics

Now that we have the data in a statnet object, run the code from the following script for more in-depth information:

  • context_text/R/sna/statnet/sna-statnet-network-stats.r

In [30]:
# Links:
# - manual (PDF): http://cran.r-project.org/web/packages/sna/sna.pdf
# - good notes: http://www.shizukalab.com/toolkits/sna/node-level-calculations

# Also, be advised that statnet and igraph don't really play nice together.
#    If you'll be using both, best idea is to have a workspace for each.

#==============================================================================#
# statnet
#==============================================================================#

# make sure you've loaded the statnet library (includes sna)
# install.packages( "statnet" )
#library( statnet )

#==============================================================================#
# NODE level
#==============================================================================#

# what is the standard deviation of the degrees?
gmHumanDegreeSd <- sd( gmHumanDegreeVector )
paste( "degree SD = ", gmHumanDegreeSd, sep = "" )

# what is the variance of the degrees?
gmHumanDegreeVar <- var( gmHumanDegreeVector )
paste( "degree variance = ", gmHumanDegreeVar, sep = "" )

# what is the max value among the degrees?
gmHumanDegreeMax <- max( gmHumanDegreeVector )
paste( "degree max = ", gmHumanDegreeMax, sep = "" )

# calculate the degree distribution (frequency table)
gmHumanDegreeFrequenciesTable <- table( gmHumanDegreeVector )
paste( "degree frequencies = ", gmHumanDegreeFrequenciesTable, sep = "" )
gmHumanDegreeFrequenciesTable

# node-level undirected betweenness
gmHumanBetweenness <- sna::betweenness( gmHumanNetworkStatnet, gmode = "graph", cmode = "undirected" )

#paste( "betweenness = ", gmHumanBetweenness, sep = "" )
# associate with each node as a node attribute.
#    (%v% with assignment is a shortcut for the set.vertex.attribute command)
gmHumanNetworkStatnet %v% "betweenness" <- gmHumanBetweenness

# also add betweenness vector to original data frame
gmHumanDataDF$betweenness <- gmHumanBetweenness

#==============================================================================#
# NETWORK level
#==============================================================================#

# graph-level degree centralization (labeled "degree centrality" in the output)
gmHumanDegreeCentrality <- sna::centralization( gmHumanNetworkStatnet, sna::degree, mode = "graph" )
paste( "degree centrality = ", gmHumanDegreeCentrality, sep = "" )

# graph-level betweenness centralization (labeled "betweenness centrality" in the output)
gmHumanBetweennessCentrality <- sna::centralization( gmHumanNetworkStatnet, sna::betweenness, mode = "graph", cmode = "undirected" )
paste( "betweenness centrality = ", gmHumanBetweennessCentrality, sep = "" )

# graph-level connectedness
gmHumanConnectedness <- sna::connectedness( gmHumanNetworkStatnet )
paste( "connectedness = ", gmHumanConnectedness, sep = "" )

# graph-level transitivity
gmHumanTransitivity <- sna::gtrans( gmHumanNetworkStatnet, mode = "graph" )
paste( "transitivity = ", gmHumanTransitivity, sep = "" )

# graph-level density
gmHumanDensity <- sna::gden( gmHumanNetworkStatnet, mode = "graph" )
paste( "density = ", gmHumanDensity, sep = "" )


'degree SD = 6.65377784484138'
'degree variance = 44.272759608502'
'degree max = 99'
  1. 'degree frequencies = 97'
  2. 'degree frequencies = 911'
  3. 'degree frequencies = 91'
  4. 'degree frequencies = 14'
  5. 'degree frequencies = 15'
  6. 'degree frequencies = 2'
  7. 'degree frequencies = 1'
  8. 'degree frequencies = 4'
  9. 'degree frequencies = 2'
  10. 'degree frequencies = 1'
  11. 'degree frequencies = 1'
  12. 'degree frequencies = 1'
  13. 'degree frequencies = 3'
  14. 'degree frequencies = 1'
  15. 'degree frequencies = 2'
  16. 'degree frequencies = 2'
  17. 'degree frequencies = 2'
  18. 'degree frequencies = 1'
  19. 'degree frequencies = 1'
  20. 'degree frequencies = 1'
  21. 'degree frequencies = 1'
  22. 'degree frequencies = 1'
  23. 'degree frequencies = 1'
  24. 'degree frequencies = 2'
  25. 'degree frequencies = 1'
  26. 'degree frequencies = 1'
  27. 'degree frequencies = 1'
  28. 'degree frequencies = 2'
  29. 'degree frequencies = 1'
  30. 'degree frequencies = 1'
  31. 'degree frequencies = 1'
  32. 'degree frequencies = 1'
gmHumanDegreeVector
  0   1   2   3   4   5   6   7   9  10  13  14  19  22  28  31  32  33  34  36 
 97 911  91  14  15   2   1   4   2   1   1   1   3   1   2   2   2   1   1   1 
 37  38  41  44  45  46  47  50  61  66  76  99 
  1   1   1   2   1   1   1   2   1   1   1   1 
'degree centrality = 0.0832831513777339'
'betweenness centrality = 0.220493193695819'
'connectedness = 0.673448360502733'
Warning message in sna::gtrans(gmHumanNetworkStatnet, mode = "graph"):
“gtrans called with use.adjacency=TRUE, but your data looks too large for that to work well.  Overriding to edgelist method.”
'transitivity = 0.0131821874307658'
'density = 0.00176523933617594'
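
Two of the network-level values above can be reproduced by hand from the counts already printed for the human network (1167 vertices, 1201 edges, maximum degree 99). The sketch assumes the usual definitions for an undirected graph without loops: density is ties divided by possible ties, and Freeman degree centralization is the summed gap between the maximum degree and each node's degree, divided by the largest value that sum can take, (n - 1)(n - 2). Variable names are illustrative.

```r
# counts taken from the output above
vertexCount <- 1167
edgeCount <- 1201
degreeMax <- 99

# density: observed ties over possible undirected ties
possibleTies <- vertexCount * ( vertexCount - 1 ) / 2
densityCheck <- edgeCount / possibleTies

# degree centralization: sum over nodes of ( max degree - degree ),
#    where the degrees sum to twice the edge count
degreeSum <- 2 * edgeCount
centralizationCheck <- ( vertexCount * degreeMax - degreeSum ) /
    ( ( vertexCount - 1 ) * ( vertexCount - 2 ) )

densityCheck
centralizationCheck
```

Both values agree with the 'density' and 'degree centrality' lines reported above, which suggests these are the formulas in play for this call, at least for these settings.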

grp_month (gm) - human - create node attribute DataFrame

If you want to just work with the traits of the nodes/vertexes, you can combine the attribute vectors into a data frame.


In [31]:
#==============================================================================#
# output attributes to data frame
#==============================================================================#

# if you want to just work with the traits of the nodes/vertexes, you can
#    combine the attribute vectors into a data frame.

# first, output network object to see what attributes you have
gmHumanNetworkStatnet

# then, combine them into a data frame.
gmHumanNodeAttrDF <- data.frame( id = gmHumanNetworkStatnet %v% "vertex.names",
                                 person_id = gmHumanNetworkStatnet %v% "person_id",
                                 person_type = gmHumanNetworkStatnet %v% "person_type",
                                 degree = gmHumanNetworkStatnet %v% "degree",
                                 betweenness = gmHumanNetworkStatnet %v% "betweenness" )


 Network attributes:
  vertices = 1167 
  directed = FALSE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 1201 
    missing edges= 0 
    non-missing edges= 1201 

 Vertex attribute names: 
    betweenness degree person_id person_type vertex.names 

 Edge attribute names not shown 
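
Once the attributes are in a data frame, ordinary data-frame operations apply. A small sketch on invented toy data (the column names mirror the data frame built above, the values do not come from the study) that ranks nodes by betweenness:

```r
# toy node attribute data frame with the same column names as above
toyNodeAttrDF <- data.frame(
    id = c( "a", "b", "c", "d" ),
    person_type = c( 2, 3, 3, 2 ),
    degree = c( 12, 1, 2, 30 ),
    betweenness = c( 40.5, 0, 1.5, 310 )
)

# sort rows from highest to lowest betweenness
toyByBetweenness <- toyNodeAttrDF[ order( toyNodeAttrDF$betweenness, decreasing = TRUE ), ]

# look at the two most central nodes
head( toyByBetweenness, 2 )
```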

grp_month QAP graph correlation between automated and ground truth

Now, compare the automated and human-coded networks themselves using graph correlation in QAP.

Based on: context_text/R/sna/statnet/sna-qap.r

Note: QAP compares two networks, so this step has to wait until both the OpenCalais (automated) and human-coded networks have been processed.


In [32]:
# link to good doc on qaptest(){sna} function: http://www.inside-r.org/packages/cran/sna/docs/qaptest

# First, need to load data - see (or just source() ) the file "sna-load_data.r".
# source( "sna-load_data.r" )
# does the following (among other things):
# Start with loading in tab-delimited files.
#humanNetworkData <- read.delim( "human-sourcenet_data-20150504-002453.tab", header = TRUE, row.names = 1, check.names = FALSE )
#calaisNetworkData <- read.delim( "puter-sourcenet_data-20150504-002507.tab", header = TRUE, row.names = 1, check.names = FALSE )

# remove the right-most column, which contains non-tie info on nodes.
#humanNetworkTies <- humanNetworkData[ , -ncol( humanNetworkData ) ]
#calaisNetworkTies <- calaisNetworkData[ , -ncol( calaisNetworkData ) ]

# convert each to a matrix
#gmHumanNetworkMatrix <- as.matrix( humanNetworkTies )
#gmAutomatedNetworkMatrix <- as.matrix( calaisNetworkTies )

# imports
# install.packages( "sna" )
# install.packages( "statnet" )
library( "sna" )

# package up data for calling qaptest() - first make 3-dimensional array to hold
#    our two matrices - this is known as a "graph set".
graphSetArray <- array( dim = c( 2, nrow( gmHumanNetworkMatrix ), ncol( gmHumanNetworkMatrix ) ) )

# then, place each matrix in one slice along the array's first dimension.
graphSetArray[ 1, , ] <- gmHumanNetworkMatrix
graphSetArray[ 2, , ] <- gmAutomatedNetworkMatrix

# first, try a graph correlation
graphCorrelation <- sna::gcor( gmHumanNetworkMatrix, gmAutomatedNetworkMatrix )
paste( "graph correlation = ", graphCorrelation, sep = "" )

# try a qaptest...
qapGcorResult <- sna::qaptest( graphSetArray, sna::gcor, g1 = 1, g2 = 2 )
summary( qapGcorResult )
plot( qapGcorResult )

# graph covariance...
graphCovariance <- sna::gcov( gmHumanNetworkMatrix, gmAutomatedNetworkMatrix )
graphCovariance
paste( "graph covariance = ", graphCovariance, sep = "" )

# try a qaptest...
qapGcovResult <- sna::qaptest( graphSetArray, sna::gcov, g1 = 1, g2 = 2 )
summary( qapGcovResult )
plot( qapGcovResult )

# Hamming Distance
graphHammingDist <- sna::hdist( gmHumanNetworkMatrix, gmAutomatedNetworkMatrix )
paste( "graph hamming distance = ", graphHammingDist, sep = "" )

# try a qaptest...
qapHdistResult <- sna::qaptest( graphSetArray, sna::hdist, g1 = 1, g2 = 2 )
summary( qapHdistResult )
plot( qapHdistResult )

# graph structural correlation?
#graphStructCorrelation <- gscor( gmHumanNetworkMatrix, gmAutomatedNetworkMatrix )
#graphStructCorrelation


'graph correlation = 0.914011398376571'
QAP Test Results

Estimated p-values:
	p(f(perm) >= f(d)): 0 
	p(f(perm) <= f(d)): 1 

Test Diagnostics:
	Test Value (f(d)): 0.9140114 
	Replications: 1000 
	Distribution Summary:
		Min:	 -0.00153729 
		1stQ:	 -0.0009019339 
		Med:	 -0.0002665774 
		Mean:	 -1.942372e-05 
		3rdQ:	 0.0003687791 
		Max:	 0.01116984 
0.0021144384905141
'graph covariance = 0.0021144384905141'
QAP Test Results

Estimated p-values:
	p(f(perm) >= f(d)): 0 
	p(f(perm) <= f(d)): 1 

Test Diagnostics:
	Test Value (f(d)): 0.002114438 
	Replications: 1000 
	Distribution Summary:
		Min:	 -3.556308e-06 
		1stQ:	 -2.086499e-06 
		Med:	 -6.166898e-07 
		Mean:	 4.472427e-08 
		3rdQ:	 8.531192e-07 
		Max:	 1.996064e-05 
'graph hamming distance = 514'
QAP Test Results

Estimated p-values:
	p(f(perm) >= f(d)): 1 
	p(f(perm) <= f(d)): 0 

Test Diagnostics:
	Test Value (f(d)): 514 
	Replications: 1000 
	Distribution Summary:
		Min:	 5058 
		1stQ:	 5122 
		Med:	 5126 
		Mean:	 5125.604 
		3rdQ:	 5130 
		Max:	 5134 
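
To make the QAP output easier to read, here is a rough base-R sketch of the permutation idea: compute the statistic on the observed pair of graphs, then repeatedly relabel the nodes of one graph (the same shuffle applied to rows and columns) and recompute. It is a simplified illustration, not the sna implementation; offDiagonalCor(), qapSketch() and the toy matrices are invented for this example.

```r
set.seed( 42 )

# correlation between the off-diagonal cells of two same-sized square matrices
offDiagonalCor <- function( matrixA, matrixB ) {
    keep <- row( matrixA ) != col( matrixA )
    cor( matrixA[ keep ], matrixB[ keep ] )
}

# permutation test: shuffle node labels of matrixA, keep matrixB fixed
qapSketch <- function( matrixA, matrixB, repsIN = 200 ) {
    observed <- offDiagonalCor( matrixA, matrixB )
    nodeCount <- nrow( matrixA )
    permuted <- replicate( repsIN, {
        shuffle <- sample( nodeCount )
        offDiagonalCor( matrixA[ shuffle, shuffle ], matrixB )
    } )
    list( observed = observed,
          pGreaterEqual = mean( permuted >= observed ),
          pLessEqual = mean( permuted <= observed ),
          permuted = permuted )
}

# toy undirected graphs: toyB is toyA with one tie removed
toyA <- matrix( 0, nrow = 6, ncol = 6 )
toyA[ cbind( c( 1, 1, 2, 3, 4 ), c( 2, 3, 3, 4, 5 ) ) ] <- 1
toyA <- toyA + t( toyA )
toyB <- toyA
toyB[ 4, 5 ] <- 0
toyB[ 5, 4 ] <- 0

toyResult <- qapSketch( toyA, toyB )
toyResult$observed
toyResult$pGreaterEqual
```

Read this way, the results above say that none of the 1000 relabeled networks correlated with the other network as strongly as the observed pair did, and none came as close in Hamming distance.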

grp_week analysis

Look at a single week from the shiny new month of data.


In [33]:
output_prefix <- "grp_week"

grp_week (gw) - automated - OpenCalais

First, we'll analyze the week of data coded by OpenCalais. Set up some variables to store where data is located:

grp_week (gw) - automated - Read data

Read in the data from the tab-delimited data file, then get it into the right data structures for use in R SNA.


In [34]:
# initialize variables
gwAutomatedDataFolder <- paste( data_directory, "/network/grp_month", sep = "" )
gwAutomatedDataFile <- "sourcenet_data-20171206-031358-grp_month-automated-week_subset.tab"
gwAutomatedDataPath <- paste( gwAutomatedDataFolder, "/", gwAutomatedDataFile, sep = "" )

In [35]:
gwAutomatedDataPath


'/home/jonathanmorgan/work/django/research/work/phd_work/data/network/grp_month/sourcenet_data-20171206-031358-grp_month-automated-week_subset.tab'

Load the data file into memory


In [36]:
# tab-delimited:
gwAutomatedDataDF <- read.delim( gwAutomatedDataPath, header = TRUE, row.names = 1, check.names = FALSE )

In [37]:
# get count of rows...
gwAutomatedRowCount <- nrow( gwAutomatedDataDF )
paste( output_prefix, "automated row count =", gwAutomatedRowCount, sep = " " )

# ...and columns
gwAutomatedColumnCount <- ncol( gwAutomatedDataDF )
paste( output_prefix, "automated column count =", gwAutomatedColumnCount, sep = " " )


'grp_week automated row count = 1167'
'grp_week automated column count = 1169'

Get just the tie rows and columns for initializing network libraries.


In [38]:
# the below syntax returns only as many columns as there are rows, so
#     omitting any trait columns that lie in columns on the right side
#     of the file.
gwAutomatedNetworkDF <- gwAutomatedDataDF[ , 1 : gwAutomatedRowCount ]
#str( gwAutomatedNetworkDF )

In [39]:
# convert to a matrix
gwAutomatedNetworkMatrix <- as.matrix( gwAutomatedNetworkDF )
# str( gwAutomatedNetworkMatrix )

grp_week (gw) - automated - initialize statnet

First, load the statnet package, then load the automated grp_month week subset data into a statnet network object and assign attributes to nodes.

Based on context_text/R/sna/statnet/sna-statnet-init.r.


In [40]:
# make sure you've loaded the statnet library
# install.packages( "statnet" )
library( statnet )

In [41]:
# If you have a data frame of attributes (each attribute is a column, with
#     attribute name the column name), you can associate those attributes
#     when you create the network.
# attribute help: http://www.inside-r.org/packages/cran/network/docs/loading.attributes

# load attributes from a file:
#tab_attribute_test1 <- read.delim( "tab-test1-attribute_data.txt", header = TRUE, row.names = 1, check.names = FALSE )

# or create DataFrame by just grabbing the attribute columns
gwAutomatedNetworkAttributeDF <- gwAutomatedDataDF[ , 1168:1169 ]

# convert matrix to statnet network object instance.
gwAutomatedNetworkStatnet <- network( gwAutomatedNetworkMatrix, matrix.type = "adjacency", directed = FALSE, vertex.attr = gwAutomatedNetworkAttributeDF )

# look at information now.
gwAutomatedNetworkStatnet

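# Sample output below appears to be left over from an earlier, smaller data set;
#    the actual output for this network follows the cell.
# Network attributes: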
#  vertices = 314
#  directed = FALSE
#  hyper = FALSE
#  loops = FALSE
#  multiple = FALSE
#  bipartite = FALSE
#  total edges= 309
#    missing edges= 0
#    non-missing edges= 309
#
# Vertex attribute names:
#    person_type vertex.names
#
# No edge attributes


 Network attributes:
  vertices = 1167 
  directed = FALSE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 298 
    missing edges= 0 
    non-missing edges= 298 

 Vertex attribute names: 
    person_id person_type vertex.names 

No edge attributes
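
Because the network is created with `directed = FALSE`, it can be worth checking that the source matrix is symmetric and that its tie count matches the "total edges" line above. A minimal base-R sketch on a toy matrix, assuming any non-zero cell counts as a tie (the `toy*` names are hypothetical):

```r
# Toy symmetric tie matrix: node 4 is an isolate, and the tie between
#     nodes 2 and 3 has weight 2.
toyMatrix <- matrix( c( 0, 1, 0, 0,
                        1, 0, 2, 0,
                        0, 2, 0, 0,
                        0, 0, 0, 0 ),
                     nrow = 4, byrow = TRUE )

# an undirected network should come from a symmetric matrix.
toyIsSymmetric <- isSymmetric( toyMatrix )

# each undirected tie appears twice in a symmetric matrix, so count the
#     non-zero cells above the diagonal only; tie weights are ignored.
toyEdgeCount <- sum( toyMatrix[ upper.tri( toyMatrix ) ] != 0 )

toyIsSymmetric
toyEdgeCount
```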

In [8]:
# calais - include ties Greater than or equal to 0 (GE0)
gwAutomatedMeanTieWeightGE0Vector <- apply( gwAutomatedNetworkMatrix, 1, calculateListMean )
gwAutomatedDataDF$meanTieWeightGE0 <- gwAutomatedMeanTieWeightGE0Vector

# calais - include ties Greater than or equal to 1 (GE1)
gwAutomatedMeanTieWeightGE1Vector <- apply( gwAutomatedNetworkMatrix, 1, calculateListMean, minValueToIncludeIN = 1 )
gwAutomatedDataDF$meanTieWeightGE1 <- gwAutomatedMeanTieWeightGE1Vector

# automated - Max tie weight?
gwAutomatedMaxTieWeightVector <- apply( gwAutomatedNetworkMatrix, 1, calculateListMax )
gwAutomatedDataDF$maxTieWeight <- gwAutomatedMaxTieWeightVector
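
The helpers `calculateListMean` and `calculateListMax` live in functions-sna.r and are not shown in this log. The sketch below is a guess at their behavior, inferred only from the call signatures and the GE0/GE1 comments above; `sketchListMean` is a hypothetical stand-in, not the real function.

```r
# HYPOTHETICAL stand-in for calculateListMean() in functions-sna.r; the real
#     function is not shown in this log and may behave differently.
sketchListMean <- function( valuesIN, minValueToIncludeIN = 0 ) {

    # keep only values at or above the threshold, then average them.
    keptValues <- valuesIN[ valuesIN >= minValueToIncludeIN ]
    if ( length( keptValues ) == 0 ) {
        return( NA_real_ )
    }
    return( mean( keptValues ) )
}

# toy weighted tie matrix, one row per node.
toyMatrix <- matrix( c( 0, 1, 3,
                        1, 0, 0,
                        3, 0, 0 ),
                     nrow = 3, byrow = TRUE )

# row-wise mean over all cells (GE0) and over cells with weight >= 1 (GE1).
toyMeanGE0 <- apply( toyMatrix, 1, sketchListMean )
toyMeanGE1 <- apply( toyMatrix, 1, sketchListMean, minValueToIncludeIN = 1 )

# row-wise maximum tie weight, using base max() in place of calculateListMax.
toyMax <- apply( toyMatrix, 1, max )
```

Under this reading, the GE0 mean is diluted by the many zero cells in a sparse row, while the GE1 mean describes only the ties that actually exist.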

grp_week (gw) - automated - Basic metrics


In [42]:
# assuming that our statnet network object is in gwAutomatedNetworkStatnet.

# Use the degree function in the sna package to create vector of degree values
#    for each node.  Make sure to pass the gmode parameter to tell it that the
#    graph is not directed (gmode = "graph", instead of "digraph").
# Doc: http://www.inside-r.org/packages/cran/sna/docs/degree
#degree_vector <- degree( gwAutomatedNetworkStatnet, gmode = "graph" )

# If you have other libraries loaded that also implement a degree function, you
#    can also call this with package name:
gwAutomatedDegreeVector <- sna::degree( gwAutomatedNetworkStatnet, gmode = "graph" )

# output the vector
gwAutomatedDegreeVector

# want more info on the degree function?  You can get to it eventually through
#    the following:
#help( package = "sna" )
#??sna::degree

# what is the average (mean) degree?
gwAutomatedAvgDegree <- mean( gwAutomatedDegreeVector )
paste( output_prefix, "average degree =", gwAutomatedAvgDegree, sep = " " )

# subset vector to get only those that are above mean
gwAutomatedAboveMeanVector <- gwAutomatedDegreeVector[ gwAutomatedDegreeVector > gwAutomatedAvgDegree ]

# Take the degree and associate it with each node as a node attribute.
#    (%v% reads a vertex attribute, like get.vertex.attribute; the assignment
#    form used here sets one, like set.vertex.attribute)
gwAutomatedNetworkStatnet %v% "degree" <- gwAutomatedDegreeVector

# also add degree vector to original data frame
gwAutomatedDataDF$degree <- gwAutomatedDegreeVector


  1. 5
  2. 0
  3. 14
  4. 0
  5. 11
  6. 1
  7. 1
  8. 10
  9. 9
  10. 0
  11. 15
  12. 1
  13. 24
  14. 0
  15. 0
  16. 17
  17. 23
  18. 6
  19. 15
  20. 18
  21. 1
  22. 3
  23. 0
  24. 1
  25. 1
  26. 0
  27. 15
  28. 0
  29. 2
  30. 2
  31. 2
  32. 2
  33. 1
  34. 1
  35. 1
  36. 1
  37. 1
  38. 12
  39. 1
  40. 0
  41. 1
  42. 1
  43. 0
  44. 1
  45. 3
  46. 0
  47. 0
  48. 1
  49. 10
  50. 0
  51. 0
  52. 6
  53. 0
  54. 1
  55. 1
  56. 1
  57. 1
  58. 0
  59. 1
  60. 1
  61. 1
  62. 1
  63. 1
  64. 1
  65. 0
  66. 0
  67. 0
  68. 1
  69. 1
  70. 1
  71. 1
  72. 17
  73. 0
  74. 1
  75. 1
  76. 1
  77. 1
  78. 1
  79. 1
  80. 1
  81. 1
  82. 1
  83. 1
  84. 1
  85. 1
  86. 1
  87. 1
  88. 1
  89. 1
  90. 1
  91. 1
  92. 1
  93. 3
  94. 1
  95. 1
  96. 1
  97. 9
  98. 1
  99. 1
  100. 1
  101. 1
  102. 1
  103. 1
  104. 2
  105. 4
  106. 0
  107. 0
  108. 0
  109. 0
  110. 1
  111. 1
  112. 1
  113. 0
  114. 1
  115. 1
  116. 2
  117. 0
  118. 2
  119. 2
  120. 1
  121. 1
  122. 1
  123. 1
  124. 1
  125. 1
  126. 1
  127. 1
  128. 1
  129. 1
  130. 1
  131. 0
  132. 1
  133. 1
  134. 2
  135. 1
  136. 1
  137. 1
  138. 1
  139. 1
  140. 1
  141. 0
  142. 1
  143. 1
  144. 5
  145. 1
  146. 1
  147. 1
  148. 1
  149. 1
  150. 1
  151. 6
  152. 1
  153. 1
  154. 1
  155. 0
  156. 1
  157. 1
  158. 1
  159. 1
  160. 1
  161. 2
  162. 1
  163. 1
  164. 1
  165. 1
  166. 1
  167. 1
  168. 1
  169. 1
  170. 1
  171. 1
  172. 1
  173. 0
  174. 1
  175. 1
  176. 6
  177. 0
  178. 0
  179. 0
  180. 0
  181. 1
  182. 1
  183. 4
  184. 1
  185. 1
  186. 1
  187. 1
  188. 1
  189. 1
  190. 1
  191. 1
  192. 1
  193. 1
  194. 1
  195. 1
  196. 1
  197. 1
  198. 0
  199. 1
  200. 1
  201. 1
  202. 1
  203. 1
  204. 1
  205. 1
  206. 1
  207. 1
  208. 1
  209. 1
  210. 1
  211. 1
  212. 2
  213. 2
  214. 0
  215. 2
  216. 2
  217. 1
  218. 0
  219. 1
  220. 1
  221. 1
  222. 1
  223. 1
  224. 1
  225. 1
  226. 0
  227. 1
  228. 1
  229. 1
  230. 1
  231. 1
  232. 1
  233. 3
  234. 1
  235. 0
  236. 1
  237. 1
  238. 2
  239. 2
  240. 2
  241. 0
  242. 2
  243. 2
  244. 1
  245. 1
  246. 0
  247. 1
  248. 1
  249. 1
  250. 1
  251. 1
  252. 1
  253. 1
  254. 1
  255. 1
  256. 1
  257. 1
  258. 1
  259. 1
  260. 1
  261. 1
  262. 1
  263. 1
  264. 1
  265. 1
  266. 1
  267. 1
  268. 1
  269. 1
  270. 1
  271. 1
  272. 1
  273. 1
  274. 0
  275. 0
  276. 0
  277. 2
  278. 2
  279. 2
  280. 2
  281. 1
  282. 1
  283. 1
  284. 1
  285. 1
  286. 1
  287. 1
  288. 1
  289. 1
  290. 1
  291. 2
  292. 1
  293. 0
  294. 1
  295. 0
  296. 1
  297. 1
  298. 1
  299. 1
  300. 1
  301. 1
  302. 1
  303. 0
  304. 1
  305. 1
  306. 1
  307. 0
  308. 1
  309. 21
  310. 0
  311. 8
  312. 0
  313. 0
  314. 0
  315. 0
  316. 0
  317. 0
  318. 0
  319. 0
  320. 0
  321. 0
  322. 0
  323. 0
  324. 0
  325. 0
  326. 0
  327. 0
  328. 1
  329. 1
  330. 1
  331. 1
  332. 0
  333. 0
  334. 0
  335. 1
  336. 1
  337. 1
  338. 1
  339. 1
  340. 1
  341. 1
  342. 1
  343. 2
  344. 1
  345. 0
  346. 1
  347. 1
  348. 1
  349. 1
  350. 1
  351. 1
  352. 0
  353. 1
  354. 1
  355. 0
  356. 0
  357. 0
  358. 0
  359. 0
  360. 0
  361. 0
  362. 0
  363. 0
  364. 0
  365. 0
  366. 0
  367. 0
  368. 0
  369. 3
  370. 0
  371. 0
  372. 0
  373. 0
  374. 0
  375. 0
  376. 0
  377. 0
  378. 0
  379. 0
  380. 0
  381. 0
  382. 0
  383. 0
  384. 0
  385. 0
  386. 0
  387. 0
  388. 0
  389. 0
  390. 0
  391. 0
  392. 0
  393. 0
  394. 0
  395. 0
  396. 0
  397. 0
  398. 0
  399. 0
  400. 0
  401. 0
  402. 0
  403. 0
  404. 0
  405. 0
  406. 0
  407. 0
  408. 0
  409. 0
  410. 0
  411. 0
  412. 0
  413. 0
  414. 0
  415. 0
  416. 0
  417. 0
  418. 0
  419. 0
  420. 0
  421. 0
  422. 0
  423. 0
  424. 0
  425. 0
  426. 0
  427. 0
  428. 0
  429. 0
  430. 0
  431. 0
  432. 0
  433. 0
  434. 0
  435. 0
  436. 0
  437. 0
  438. 0
  439. 0
  440. 0
  441. 0
  442. 0
  443. 0
  444. 0
  445. 0
  446. 0
  447. 0
  448. 0
  449. 0
  450. 0
  451. 0
  452. 0
  453. 0
  454. 0
  455. 0
  456. 0
  457. 0
  458. 0
  459. 0
  460. 0
  461. 0
  462. 0
  463. 0
  464. 0
  465. 0
  466. 0
  467. 0
  468. 0
  469. 0
  470. 0
  471. 0
  472. 0
  473. 0
  474. 0
  475. 0
  476. 0
  477. 0
  478. 1
  479. 1
  480. 1
  481. 1
  482. 1
  483. 0
  484. 0
  485. 0
  486. 0
  487. 0
  488. 0
  489. 0
  490. 0
  491. 0
  492. 0
  493. 0
  494. 0
  495. 0
  496. 0
  497. 0
  498. 0
  499. 0
  500. 0
  501. 0
  502. 0
  503. 0
  504. 0
  505. 0
  506. 0
  507. 0
  508. 0
  509. 0
  510. 0
  511. 0
  512. 0
  513. 0
  514. 0
  515. 0
  516. 0
  517. 0
  518. 0
  519. 0
  520. 0
  521. 0
  522. 0
  523. 0
  524. 0
  525. 0
  526. 0
  527. 0
  528. 0
  529. 0
  530. 0
  531. 0
  532. 0
  533. 0
  534. 0
  535. 0
  536. 0
  537. 0
  538. 0
  539. 0
  540. 0
  541. 0
  542. 0
  543. 0
  544. 0
  545. 0
  546. 0
  547. 0
  548. 0
  549. 0
  550. 0
  551. 0
  552. 0
  553. 0
  554. 0
  555. 0
  556. 0
  557. 0
  558. 0
  559. 0
  560. 0
  561. 0
  562. 0
  563. 0
  564. 0
  565. 0
  566. 0
  567. 0
  568. 0
  569. 0
  570. 0
  571. 0
  572. 0
  573. 0
  574. 0
  575. 0
  576. 0
  577. 0
  578. 0
  579. 0
  580. 1
  581. 1
  582. 0
  583. 0
  584. 0
  585. 0
  586. 0
  587. 0
  588. 0
  589. 0
  590. 0
  591. 0
  592. 0
  593. 0
  594. 0
  595. 0
  596. 0
  597. 0
  598. 0
  599. 0
  600. 0
  601. 0
  602. 0
  603. 0
  604. 0
  605. 0
  606. 0
  607. 0
  608. 0
  609. 0
  610. 0
  611. 2
  612. 1
  613. 0
  614. 0
  615. 0
  616. 0
  617. 0
  618. 0
  619. 0
  620. 0
  621. 0
  622. 0
  623. 0
  624. 0
  625. 0
  626. 0
  627. 0
  628. 0
  629. 0
  630. 0
  631. 0
  632. 0
  633. 0
  634. 0
  635. 0
  636. 0
  637. 0
  638. 0
  639. 0
  640. 0
  641. 0
  642. 0
  643. 0
  644. 0
  645. 0
  646. 0
  647. 0
  648. 0
  649. 0
  650. 0
  651. 0
  652. 0
  653. 0
  654. 0
  655. 0
  656. 0
  657. 0
  658. 0
  659. 0
  660. 0
  661. 0
  662. 0
  663. 0
  664. 0
  665. 0
  666. 0
  667. 0
  668. 0
  669. 0
  670. 0
  671. 0
  672. 0
  673. 0
  674. 0
  675. 0
  676. 0
  677. 0
  678. 0
  679. 0
  680. 0
  681. 0
  682. 0
  683. 0
  684. 0
  685. 0
  686. 0
  687. 0
  688. 0
  689. 0
  690. 0
  691. 0
  692. 0
  693. 0
  694. 0
  695. 0
  696. 0
  697. 0
  698. 0
  699. 0
  700. 0
  701. 0
  702. 0
  703. 0
  704. 0
  705. 0
  706. 0
  707. 0
  708. 0
  709. 0
  710. 0
  711. 0
  712. 0
  713. 0
  714. 0
  715. 0
  716. 0
  717. 0
  718. 0
  719. 0
  720. 0
  721. 0
  722. 0
  723. 0
  724. 0
  725. 0
  726. 0
  727. 0
  728. 0
  729. 0
  730. 0
  731. 0
  732. 0
  733. 0
  734. 0
  735. 0
  736. 0
  737. 0
  738. 1
  739. 0
  740. 0
  741. 0
  742. 0
  743. 0
  744. 0
  745. 0
  746. 0
  747. 0
  748. 0
  749. 0
  750. 0
  751. 0
  752. 0
  753. 0
  754. 0
  755. 0
  756. 0
  757. 0
  758. 0
  759. 0
  760. 0
  761. 0
  762. 0
  763. 0
  764. 0
  765. 0
  766. 0
  767. 0
  768. 0
  769. 0
  770. 0
  771. 0
  772. 0
  773. 0
  774. 0
  775. 0
  776. 0
  777. 0
  778. 0
  779. 0
  780. 0
  781. 0
  782. 0
  783. 0
  784. 0
  785. 0
  786. 0
  787. 0
  788. 0
  789. 0
  790. 0
  791. 0
  792. 0
  793. 0
  794. 0
  795. 0
  796. 0
  797. 0
  798. 0
  799. 0
  800. 0
  801. 0
  802. 0
  803. 0
  804. 0
  805. 0
  806. 0
  807. 0
  808. 0
  809. 0
  810. 0
  811. 0
  812. 0
  813. 0
  814. 0
  815. 0
  816. 0
  817. 0
  818. 0
  819. 0
  820. 0
  821. 0
  822. 0
  823. 0
  824. 0
  825. 0
  826. 0
  827. 0
  828. 0
  829. 0
  830. 0
  831. 0
  832. 0
  833. 0
  834. 0
  835. 0
  836. 0
  837. 0
  838. 0
  839. 0
  840. 0
  841. 0
  842. 0
  843. 0
  844. 0
  845. 0
  846. 0
  847. 0
  848. 0
  849. 0
  850. 0
  851. 0
  852. 0
  853. 0
  854. 0
  855. 0
  856. 0
  857. 0
  858. 0
  859. 0
  860. 0
  861. 0
  862. 0
  863. 0
  864. 0
  865. 0
  866. 0
  867. 0
  868. 0
  869. 0
  870. 0
  871. 0
  872. 0
  873. 0
  874. 0
  875. 0
  876. 0
  877. 0
  878. 0
  879. 0
  880. 0
  881. 0
  882. 0
  883. 0
  884. 0
  885. 0
  886. 0
  887. 0
  888. 0
  889. 0
  890. 0
  891. 0
  892. 0
  893. 0
  894. 0
  895. 0
  896. 0
  897. 0
  898. 0
  899. 0
  900. 0
  901. 0
  902. 0
  903. 0
  904. 0
  905. 0
  906. 0
  907. 0
  908. 0
  909. 0
  910. 0
  911. 0
  912. 0
  913. 0
  914. 0
  915. 0
  916. 0
  917. 0
  918. 0
  919. 0
  920. 0
  921. 0
  922. 0
  923. 0
  924. 0
  925. 0
  926. 0
  927. 0
  928. 0
  929. 0
  930. 0
  931. 0
  932. 0
  933. 0
  934. 0
  935. 0
  936. 0
  937. 0
  938. 0
  939. 0
  940. 0
  941. 0
  942. 0
  943. 0
  944. 0
  945. 0
  946. 0
  947. 0
  948. 0
  949. 0
  950. 0
  951. 0
  952. 0
  953. 0
  954. 0
  955. 0
  956. 0
  957. 0
  958. 0
  959. 0
  960. 0
  961. 0
  962. 0
  963. 0
  964. 0
  965. 0
  966. 0
  967. 0
  968. 0
  969. 0
  970. 0
  971. 0
  972. 0
  973. 0
  974. 0
  975. 0
  976. 0
  977. 0
  978. 0
  979. 0
  980. 0
  981. 0
  982. 0
  983. 0
  984. 0
  985. 0
  986. 0
  987. 0
  988. 0
  989. 0
  990. 0
  991. 0
  992. 0
  993. 0
  994. 0
  995. 0
  996. 0
  997. 0
  998. 0
  999. 0
  1000. 0
  1001. 0
  1002. 0
  1003. 0
  1004. 0
  1005. 0
  1006. 0
  1007. 0
  1008. 0
  1009. 0
  1010. 0
  1011. 0
  1012. 0
  1013. 0
  1014. 0
  1015. 0
  1016. 0
  1017. 0
  1018. 0
  1019. 0
  1020. 0
  1021. 0
  1022. 0
  1023. 0
  1024. 0
  1025. 0
  1026. 0
  1027. 0
  1028. 0
  1029. 0
  1030. 0
  1031. 0
  1032. 0
  1033. 0
  1034. 0
  1035. 0
  1036. 0
  1037. 0
  1038. 0
  1039. 0
  1040. 0
  1041. 0
  1042. 0
  1043. 0
  1044. 0
  1045. 0
  1046. 0
  1047. 0
  1048. 0
  1049. 0
  1050. 0
  1051. 0
  1052. 0
  1053. 0
  1054. 0
  1055. 0
  1056. 0
  1057. 0
  1058. 0
  1059. 0
  1060. 0
  1061. 0
  1062. 0
  1063. 0
  1064. 0
  1065. 0
  1066. 0
  1067. 0
  1068. 0
  1069. 0
  1070. 0
  1071. 0
  1072. 0
  1073. 0
  1074. 0
  1075. 0
  1076. 0
  1077. 0
  1078. 0
  1079. 0
  1080. 0
  1081. 0
  1082. 0
  1083. 0
  1084. 0
  1085. 0
  1086. 0
  1087. 0
  1088. 0
  1089. 0
  1090. 0
  1091. 0
  1092. 0
  1093. 0
  1094. 0
  1095. 0
  1096. 0
  1097. 0
  1098. 0
  1099. 0
  1100. 0
  1101. 0
  1102. 0
  1103. 0
  1104. 0
  1105. 0
  1106. 0
  1107. 0
  1108. 0
  1109. 0
  1110. 0
  1111. 0
  1112. 0
  1113. 0
  1114. 0
  1115. 0
  1116. 0
  1117. 0
  1118. 0
  1119. 0
  1120. 0
  1121. 0
  1122. 0
  1123. 0
  1124. 0
  1125. 0
  1126. 0
  1127. 0
  1128. 0
  1129. 0
  1130. 0
  1131. 0
  1132. 0
  1133. 0
  1134. 0
  1135. 0
  1136. 0
  1137. 0
  1138. 0
  1139. 0
  1140. 0
  1141. 0
  1142. 0
  1143. 0
  1144. 0
  1145. 0
  1146. 0
  1147. 0
  1148. 0
  1149. 0
  1150. 0
  1151. 0
  1152. 0
  1153. 0
  1154. 0
  1155. 0
  1156. 0
  1157. 0
  1158. 0
  1159. 0
  1160. 0
  1161. 0
  1162. 0
  1163. 0
  1164. 0
  1165. 0
  1166. 0
  1167. 0
'grp_week average degree = 0.510711225364182'
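
As a cross-check, the mean degree of an undirected network equals twice the edge count divided by the node count, since each tie adds one to the degree of both of its endpoints. A base-R sketch on a toy matrix (the `toy*` names are hypothetical, and degree is taken as the count of non-zero cells in a row), followed by the same identity applied to the edge and vertex counts printed earlier:

```r
# toy symmetric tie matrix with two ties and one isolate.
toyMatrix <- matrix( c( 0, 1, 0, 0,
                        1, 0, 2, 0,
                        0, 2, 0, 0,
                        0, 0, 0, 0 ),
                     nrow = 4, byrow = TRUE )

# degree as the number of non-zero cells in each row (weights ignored).
toyDegree <- rowSums( toyMatrix != 0 )
toyMeanDegree <- mean( toyDegree )

# same value from the edge count: 2 * edges / nodes.
toyEdgeCount <- sum( toyMatrix[ upper.tri( toyMatrix ) ] != 0 )
toyMeanFromEdges <- 2 * toyEdgeCount / nrow( toyMatrix )

# applied to the automated network summary: 298 edges, 1167 vertices.
reportedMeanDegree <- 2 * 298 / 1167
reportedMeanDegree
```

The last value agrees with the average degree printed above.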

In [43]:
# average author degree (person types 2 and 4)
gwAutomatedAverageAuthorDegree2And4 <- calcAuthorMeanDegree( dataFrameIN = gwAutomatedDataDF, includeBothIN = TRUE )
paste( output_prefix, "average author degree (2 and 4) =", gwAutomatedAverageAuthorDegree2And4, sep = " " )

# average author degree (person type 2 only)
gwAutomatedAverageAuthorDegreeOnly2 <- calcAuthorMeanDegree( dataFrameIN = gwAutomatedDataDF, includeBothIN = FALSE )
paste( output_prefix, "average author degree (only 2) =", gwAutomatedAverageAuthorDegreeOnly2, sep = " " )

# average source degree (person types 3 and 4)
gwAutomatedAverageSourceDegree3And4 <- calcSourceMeanDegree( dataFrameIN = gwAutomatedDataDF, includeBothIN = TRUE )
paste( output_prefix, "average source degree (3 and 4) =", gwAutomatedAverageSourceDegree3And4, sep = " " )

# average source degree (person type 3 only)
gwAutomatedAverageSourceDegreeOnly3 <- calcSourceMeanDegree( dataFrameIN = gwAutomatedDataDF, includeBothIN = FALSE )
paste( output_prefix, "average source degree (only 3) =", gwAutomatedAverageSourceDegreeOnly3, sep = " " )


'grp_week average author degree (2 and 4) = 9.46875'
'grp_week average author degree (only 2) = 9.46875'
'grp_week average source degree (3 and 4) = 1.11406844106464'
'grp_week average source degree (only 3) = 1.11406844106464'
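
`calcAuthorMeanDegree` and `calcSourceMeanDegree` are defined in functions-sna.r, which is not reproduced here. Going only by the comments above (authors are person types 2 and 4, sources are types 3 and 4), a filter-then-average helper along these lines would give the same kind of result. `sketchMeanDegreeByType` and the toy data are hypothetical, and the real `person_type` coding may differ.

```r
# HYPOTHETICAL helper: mean degree over rows whose person_type is in the
#     requested set.  Not the implementation in functions-sna.r.
sketchMeanDegreeByType <- function( dataFrameIN, personTypesIN ) {
    isSelected <- dataFrameIN$person_type %in% personTypesIN
    return( mean( dataFrameIN$degree[ isSelected ] ) )
}

# toy data: one author (2), three sources (3), one person who is both (4).
toyDF <- data.frame( person_type = c( 2, 3, 3, 4, 3 ),
                     degree = c( 4, 1, 1, 3, 2 ) )

toyAuthor2And4 <- sketchMeanDegreeByType( toyDF, c( 2, 4 ) )
toyAuthorOnly2 <- sketchMeanDegreeByType( toyDF, c( 2 ) )
toySource3And4 <- sketchMeanDegreeByType( toyDF, c( 3, 4 ) )
toySourceOnly3 <- sketchMeanDegreeByType( toyDF, c( 3 ) )
```

In the toy data the "both" and "only" variants differ because a type 4 person is present. The identical pairs in the output above would be consistent with this week containing no type 4 persons, though the log does not state that.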

grp_week (gw) - automated - More metrics

Now that the data is in a statnet network object, run code based on the following script for more in-depth metrics:

  • context_text/R/sna/statnet/sna-statnet-network-stats.r

In [44]:
# Links:
# - manual (PDF): http://cran.r-project.org/web/packages/sna/sna.pdf
# - good notes: http://www.shizukalab.com/toolkits/sna/node-level-calculations

# Also, be advised that statnet and igraph don't really play nice together.
#    If you'll be using both, best idea is to have a workspace for each.

#==============================================================================#
# statnet
#==============================================================================#

# make sure you've loaded the statnet library (includes sna)
# install.packages( "statnet" )
#library( statnet )

#==============================================================================#
# NODE level
#==============================================================================#

# what is the standard deviation of the degrees?
gwAutomatedDegreeSd <- sd( gwAutomatedDegreeVector )
paste( output_prefix, "degree SD =", gwAutomatedDegreeSd, sep = " " )

# what is the variance of the degrees?
gwAutomatedDegreeVar <- var( gwAutomatedDegreeVector )
paste( output_prefix, "degree variance =", gwAutomatedDegreeVar, sep = " " )

# what is the max value among the degrees?
gwAutomatedDegreeMax <- max( gwAutomatedDegreeVector )
paste( output_prefix, "degree max =", gwAutomatedDegreeMax, sep = " " )

# calculate and plot degree distributions
gwAutomatedDegreeFrequenciesTable <- table( gwAutomatedDegreeVector )
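# note: paste() is vectorized over the table, so this returns one string per
#    distinct degree value (the counts only); the table itself, with degree
#    values as headers, is output by the line after it.
paste( output_prefix, "degree frequencies =", gwAutomatedDegreeFrequenciesTable, sep = " " )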
gwAutomatedDegreeFrequenciesTable

# node-level undirected betweenness
gwAutomatedBetweenness <- sna::betweenness( gwAutomatedNetworkStatnet, gmode = "graph", cmode = "undirected" )

#paste( "betweenness = ", gwAutomatedBetweenness, sep = "" )
# associate with each node as a node attribute.
#    (the assignment form of %v% sets a vertex attribute, like
#    set.vertex.attribute)
gwAutomatedNetworkStatnet %v% "betweenness" <- gwAutomatedBetweenness

# also add betweenness vector to original data frame
gwAutomatedDataDF$betweenness <- gwAutomatedBetweenness

#==============================================================================#
# NETWORK level
#==============================================================================#

# graph-level degree centralization (labeled "degree centrality" in output)
gwAutomatedDegreeCentrality <- sna::centralization( gwAutomatedNetworkStatnet, sna::degree, mode = "graph" )
paste( output_prefix, "degree centrality =", gwAutomatedDegreeCentrality, sep = " " )

# graph-level betweenness centralization (labeled "betweenness centrality" in output)
gwAutomatedBetweennessCentrality <- sna::centralization( gwAutomatedNetworkStatnet, sna::betweenness, mode = "graph", cmode = "undirected" )
paste( output_prefix, "betweenness centrality =", gwAutomatedBetweennessCentrality, sep = " " )

# graph-level connectedness
gwAutomatedConnectedness <- sna::connectedness( gwAutomatedNetworkStatnet )
paste( output_prefix, "connectedness =", gwAutomatedConnectedness, sep = " " )

# graph-level transitivity
gwAutomatedTransitivity <- sna::gtrans( gwAutomatedNetworkStatnet, mode = "graph" )
paste( output_prefix, "transitivity =", gwAutomatedTransitivity, sep = " " )

# graph-level density
gwAutomatedDensity <- sna::gden( gwAutomatedNetworkStatnet, mode = "graph" )
paste( output_prefix, "density =", gwAutomatedDensity, sep = " " )


'grp_week degree SD = 1.92474544655479'
'grp_week degree variance = 3.7046450340334'
'grp_week degree max = 24'
  1. 'grp_week degree frequencies = 872'
  2. 'grp_week degree frequencies = 239'
  3. 'grp_week degree frequencies = 26'
  4. 'grp_week degree frequencies = 5'
  5. 'grp_week degree frequencies = 2'
  6. 'grp_week degree frequencies = 2'
  7. 'grp_week degree frequencies = 4'
  8. 'grp_week degree frequencies = 1'
  9. 'grp_week degree frequencies = 2'
  10. 'grp_week degree frequencies = 2'
  11. 'grp_week degree frequencies = 1'
  12. 'grp_week degree frequencies = 1'
  13. 'grp_week degree frequencies = 1'
  14. 'grp_week degree frequencies = 3'
  15. 'grp_week degree frequencies = 2'
  16. 'grp_week degree frequencies = 1'
  17. 'grp_week degree frequencies = 1'
  18. 'grp_week degree frequencies = 1'
  19. 'grp_week degree frequencies = 1'
gwAutomatedDegreeVector
  0   1   2   3   4   5   6   8   9  10  11  12  14  15  17  18  21  23  24 
872 239  26   5   2   2   4   1   2   2   1   1   1   3   2   1   1   1   1 
'grp_week degree centrality = 0.0201797716414285'
'grp_week betweenness centrality = 0.00454678734613902'
'grp_week connectedness = 0.00801192308201087'
Warning message in sna::gtrans(gwAutomatedNetworkStatnet, mode = "graph"):
“gtrans called with use.adjacency=TRUE, but your data looks too large for that to work well.  Overriding to edgelist method.”
'grp_week transitivity = 0.0372393247269116'
'grp_week density = 0.000438002766178543'
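
Several of the values above can be reproduced from the degree frequency table alone, which makes a useful sanity check. The sketch below rebuilds a degree vector from the printed table (node order is lost, which does not matter for these statistics). The density and degree centralization formulas are the standard undirected-graph forms and are an assumption about what `sna::gden` and `sna::centralization` compute here; variable names are hypothetical.

```r
# degree values and their counts, copied from the frequency table above.
degreeValues <- c( 0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 14, 15, 17, 18, 21, 23, 24 )
degreeCounts <- c( 872, 239, 26, 5, 2, 2, 4, 1, 2, 2, 1, 1, 1, 3, 2, 1, 1, 1, 1 )

# rebuild a degree vector with the same distribution.
rebuiltDegreeVector <- rep( degreeValues, times = degreeCounts )
nodeCount <- length( rebuiltDegreeVector )
edgeCount <- sum( rebuiltDegreeVector ) / 2

# node-level summaries.
rebuiltMean <- mean( rebuiltDegreeVector )
rebuiltSd <- sd( rebuiltDegreeVector )
rebuiltVar <- var( rebuiltDegreeVector )
rebuiltMax <- max( rebuiltDegreeVector )

# density of an undirected graph: edges / possible pairs of nodes.
rebuiltDensity <- edgeCount / choose( nodeCount, 2 )

# Freeman degree centralization for an undirected graph: summed gap to the
#     maximum degree, divided by the maximum possible gap, (n-1)(n-2).
rebuiltCentralization <- sum( rebuiltMax - rebuiltDegreeVector ) /
    ( ( nodeCount - 1 ) * ( nodeCount - 2 ) )

c( nodeCount, edgeCount, rebuiltMean, rebuiltVar, rebuiltDensity, rebuiltCentralization )
```

The node and edge counts come out to 1167 and 298, and the mean, variance, density, and degree centralization agree with the values printed above.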

grp_week (gw) - automated - create node attribute DataFrame

If you want to just work with the traits of the nodes/vertexes, you can combine the attribute vectors into a data frame.


In [45]:
#==============================================================================#
# output attributes to data frame
#==============================================================================#

# if you want to just work with the traits of the nodes/vertexes, you can
#    combine the attribute vectors into a data frame.

# first, output network object to see what attributes you have
gwAutomatedNetworkStatnet

# then, combine them into a data frame.
gwAutomatedNodeAttrDF <- data.frame( id = gwAutomatedNetworkStatnet %v% "vertex.names",
                                     person_id = gwAutomatedNetworkStatnet %v% "person_id",
                                     person_type = gwAutomatedNetworkStatnet %v% "person_type",
                                     degree = gwAutomatedNetworkStatnet %v% "degree",
                                     betweenness = gwAutomatedNetworkStatnet %v% "betweenness" )


 Network attributes:
  vertices = 1167 
  directed = FALSE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 298 
    missing edges= 0 
    non-missing edges= 298 

 Vertex attribute names: 
    betweenness degree person_id person_type vertex.names 

No edge attributes
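
Once the attributes are in a data frame, ordinary data frame tools apply. A small sketch with invented values, using the same column names as the data frame built above (the `toy*` names are hypothetical):

```r
# toy node attribute data frame with the same columns as above.
toyNodeAttrDF <- data.frame( id = c( "a", "b", "c", "d" ),
                             person_id = c( 11, 12, 13, 14 ),
                             person_type = c( 2, 3, 3, 2 ),
                             degree = c( 4, 1, 2, 6 ),
                             betweenness = c( 3.5, 0, 0, 8 ) )

# mean degree for each person type.
toyMeanByType <- aggregate( degree ~ person_type, data = toyNodeAttrDF, FUN = mean )

# nodes ordered from highest to lowest betweenness.
toyRanked <- toyNodeAttrDF[ order( -toyNodeAttrDF$betweenness ), ]

toyMeanByType
toyRanked
```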

grp_week (gw) - human

Next, we'll analyze the same week from the month of data coded by human coders. Set up some variables to store where data is located:

grp_week (gw) - human - Read data

Read in the data from the tab-delimited data file, then get it into the right data structures for use with the R SNA libraries.


In [46]:
# initialize variables
gwHumanDataFolder <- paste( data_directory, "/network/grp_month", sep = "" )
gwHumanDataFile <- "sourcenet_data-20171206-031319-grp_month-human-week_subset.tab"
gwHumanDataPath <- paste( gwHumanDataFolder, "/", gwHumanDataFile, sep = "" )

In [47]:
gwHumanDataPath


'/home/jonathanmorgan/work/django/research/work/phd_work/data/network/grp_month/sourcenet_data-20171206-031319-grp_month-human-week_subset.tab'

Load the data file into memory


In [48]:
# tab-delimited:
gwHumanDataDF <- read.delim( gwHumanDataPath, header = TRUE, row.names = 1, check.names = FALSE )

In [49]:
# get count of rows...
gwHumanRowCount <- nrow( gwHumanDataDF )
paste( output_prefix, "human row count =", gwHumanRowCount, sep = " " )

# ...and columns
gwHumanColumnCount <- ncol( gwHumanDataDF )
paste( output_prefix, "human column count =", gwHumanColumnCount, sep = " " )


'grp_week human row count = 1167'
'grp_week human column count = 1169'

Get just the tie rows and columns for initializing network libraries.


In [50]:
# keep only the first nrow() columns (the square tie matrix), omitting
#     the trait columns that sit to the right of the tie columns in
#     the file.
gwHumanNetworkDF <- gwHumanDataDF[ , 1 : gwHumanRowCount ]
#str( gwHumanNetworkDF )

In [51]:
# convert to a matrix
gwHumanNetworkMatrix <- as.matrix( gwHumanNetworkDF )
# str( gwHumanNetworkMatrix )
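
The automated and human files both report 1167 rows, which suggests they cover the same set of people. Before comparing the two matrices cell by cell, it may be worth confirming that the rows line up. The sketch below is not part of the original analysis: it uses toy matrices and hypothetical names, checks that the row names match, and then counts ties found by both codings or by only one.

```r
toyIds <- c( "a", "b", "c", "d" )

# toy automated coding: ties a-b and b-c.
toyAutomated <- matrix( c( 0, 1, 0, 0,
                           1, 0, 1, 0,
                           0, 1, 0, 0,
                           0, 0, 0, 0 ),
                        nrow = 4, byrow = TRUE,
                        dimnames = list( toyIds, toyIds ) )

# toy human coding: ties a-b, b-d and c-d.
toyHuman <- matrix( c( 0, 1, 0, 0,
                       1, 0, 0, 1,
                       0, 0, 0, 1,
                       0, 1, 1, 0 ),
                    nrow = 4, byrow = TRUE,
                    dimnames = list( toyIds, toyIds ) )

# rows must refer to the same nodes, in the same order.
toySameNodes <- identical( rownames( toyAutomated ), rownames( toyHuman ) )

# compare presence of ties above the diagonal (each undirected tie once).
toyUpper <- upper.tri( toyAutomated )
toyAutomatedTies <- toyAutomated[ toyUpper ] != 0
toyHumanTies <- toyHuman[ toyUpper ] != 0

toyTiesInBoth <- sum( toyAutomatedTies & toyHumanTies )
toyOnlyAutomated <- sum( toyAutomatedTies & !toyHumanTies )
toyOnlyHuman <- sum( !toyAutomatedTies & toyHumanTies )
```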

grp_week (gw) - human - initialize statnet

First, load the statnet package, then load the human-coded grp_month week of data into a statnet network object and assign attributes to nodes.

Based on context_text/R/sna/statnet/sna-statnet-init.r.


In [52]:
# make sure you've loaded the statnet library
# install.packages( "statnet" )
library( statnet )

In [53]:
# If you have a data frame of attributes (each attribute is a column, with
#     attribute name the column name), you can associate those attributes
#     when you create the network.
# attribute help: http://www.inside-r.org/packages/cran/network/docs/loading.attributes

# load attributes from a file:
#tab_attribute_test1 <- read.delim( "tab-test1-attribute_data.txt", header = TRUE, row.names = 1, check.names = FALSE )

# or create DataFrame by just grabbing the attribute columns
#gwHumanNetworkAttributeDF <- gwHumanDataDF[ , 1169:1170 ]
gwHumanNetworkAttributeDF <- gwHumanDataDF[ , 1168:1169 ]

# convert matrix to statnet network object instance.
gwHumanNetworkStatnet <- network( gwHumanNetworkMatrix, matrix.type = "adjacency", directed = FALSE, vertex.attr = gwHumanNetworkAttributeDF )

# look at information now.
gwHumanNetworkStatnet

# sample of the output format (from a different, smaller network):
# Network attributes:
#  vertices = 314
#  directed = FALSE
#  hyper = FALSE
#  loops = FALSE
#  multiple = FALSE
#  bipartite = FALSE
#  total edges= 309
#    missing edges= 0
#    non-missing edges= 309
#
# Vertex attribute names:
#    person_type vertex.names
#
# No edge attributes


 Network attributes:
  vertices = 1167 
  directed = FALSE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 340 
    missing edges= 0 
    non-missing edges= 340 

 Vertex attribute names: 
    person_id person_type vertex.names 

No edge attributes

In [9]:
# human - include ties Greater than or equal to 0 (GE0)
gwHumanMeanTieWeightGE0Vector <- apply( gwHumanNetworkMatrix, 1, calculateListMean )
gwHumanDataDF$meanTieWeightGE0 <- gwHumanMeanTieWeightGE0Vector

# human - include ties Greater than or equal to 1 (GE1)
gwHumanMeanTieWeightGE1Vector <- apply( gwHumanNetworkMatrix, 1, calculateListMean, minValueToIncludeIN = 1 )
gwHumanDataDF$meanTieWeightGE1 <- gwHumanMeanTieWeightGE1Vector

# human - Max tie weight?
gwHumanMaxTieWeightVector <- apply( gwHumanNetworkMatrix, 1, calculateListMax )
gwHumanDataDF$maxTieWeight <- gwHumanMaxTieWeightVector

grp_week (gw) - human - Basic metrics


In [54]:
# assuming that our statnet network object is in gwHumanNetworkStatnet.

# Use the degree function in the sna package to create vector of degree values
#    for each node.  Make sure to pass the gmode parameter to tell it that the
#    graph is not directed (gmode = "graph", instead of "digraph").
# Doc: http://www.inside-r.org/packages/cran/sna/docs/degree
#degree_vector <- degree( gwHumanNetworkStatnet, gmode = "graph" )

# If you have other libraries loaded that also implement a degree function, you
#    can also call this with package name:
gwHumanDegreeVector <- sna::degree( gwHumanNetworkStatnet, gmode = "graph" )

# output the vector
gwHumanDegreeVector

# want more info on the degree function?  You can get to it eventually through
#    the following:
#help( package = "sna" )
#??sna::degree

# what is the average (mean) degree?
gwHumanAvgDegree <- mean( gwHumanDegreeVector )
paste( output_prefix, "average degree =", gwHumanAvgDegree, sep = " " )

# subset vector to get only those that are above mean
gwHumanAboveMeanVector <- gwHumanDegreeVector[ gwHumanDegreeVector > gwHumanAvgDegree ]

# Take the degree and associate it with each node as a node attribute.
#    (%v% reads a vertex attribute, like get.vertex.attribute; the assignment
#    form used here sets one, like set.vertex.attribute)
gwHumanNetworkStatnet %v% "degree" <- gwHumanDegreeVector

# also add degree vector to original data frame
gwHumanDataDF$degree <- gwHumanDegreeVector


  1. 6
  2. 0
  3. 14
  4. 0
  5. 13
  6. 0
  7. 1
  8. 11
  9. 10
  10. 1
  11. 18
  12. 1
  13. 36
  14. 0
  15. 0
  16. 21
  17. 23
  18. 6
  19. 16
  20. 20
  21. 1
  22. 3
  23. 0
  24. 1
  25. 1
  26. 0
  27. 17
  28. 2
  29. 2
  30. 2
  31. 2
  32. 2
  33. 1
  34. 1
  35. 1
  36. 1
  37. 1
  38. 13
  39. 1
  40. 1
  41. 1
  42. 1
  43. 0
  44. 1
  45. 5
  46. 0
  47. 0
  48. 1
  49. 11
  50. 0
  51. 0
  52. 8
  53. 2
  54. 1
  55. 1
  56. 1
  57. 1
  58. 0
  59. 1
  60. 1
  61. 0
  62. 1
  63. 1
  64. 1
  65. 1
  66. 0
  67. 0
  68. 1
  69. 1
  70. 1
  71. 1
  72. 16
  73. 0
  74. 1
  75. 1
  76. 1
  77. 1
  78. 1
  79. 1
  80. 1
  81. 1
  82. 1
  83. 1
  84. 1
  85. 1
  86. 1
  87. 1
  88. 1
  89. 1
  90. 1
  91. 1
  92. 1
  93. 3
  94. 1
  95. 1
  96. 1
  97. 9
  98. 1
  99. 1
  100. 1
  101. 1
  102. 1
  103. 1
  104. 2
  105. 6
  106. 1
  107. 0
  108. 0
  109. 0
  110. 1
  111. 1
  112. 1
  113. 1
  114. 1
  115. 1
  116. 2
  117. 2
  118. 2
  119. 2
  120. 1
  121. 1
  122. 1
  123. 1
  124. 1
  125. 1
  126. 1
  127. 1
  128. 1
  129. 1
  130. 1
  131. 1
  132. 1
  133. 1
  134. 2
  135. 1
  136. 1
  137. 1
  138. 1
  139. 1
  140. 2
  141. 1
  142. 1
  143. 1
  144. 5
  145. 1
  146. 1
  147. 1
  148. 1
  149. 1
  150. 1
  151. 7
  152. 1
  153. 1
  154. 1
  155. 1
  156. 1
  157. 1
  158. 1
  159. 1
  160. 1
  161. 2
  162. 1
  163. 1
  164. 1
  165. 1
  166. 1
  167. 1
  168. 1
  169. 1
  170. 1
  171. 1
  172. 1
  173. 0
  174. 1
  175. 1
  176. 6
  177. 0
  178. 0
  179. 0
  180. 0
  181. 1
  182. 1
  183. 4
  184. 1
  185. 1
  186. 1
  187. 1
  188. 1
  189. 1
  190. 1
  191. 1
  192. 1
  193. 1
  194. 1
  195. 1
  196. 1
  197. 0
  198. 0
  199. 1
  200. 1
  201. 1
  202. 1
  203. 1
  204. 1
  205. 1
  206. 1
  207. 1
  208. 1
  209. 1
  210. 1
  211. 1
  212. 4
  213. 4
  214. 4
  215. 4
  216. 4
  217. 1
  218. 0
  219. 1
  220. 1
  221. 1
  222. 1
  223. 1
  224. 1
  225. 1
  226. 1
  227. 1
  228. 1
  229. 1
  230. 1
  231. 1
  232. 1
  233. 4
  234. 1
  235. 1
  236. 1
  237. 1
  238. 2
  239. 2
  240. 2
  241. 2
  242. 2
  243. 2
  244. 1
  245. 1
  246. 0
  247. 1
  248. 1
  249. 1
  250. 1
  251. 1
  252. 1
  253. 1
  254. 1
  255. 1
  256. 1
  257. 1
  258. 1
  259. 1
  260. 1
  261. 2
  262. 1
  263. 1
  264. 1
  265. 1
  266. 1
  267. 1
  268. 1
  269. 1
  270. 1
  271. 1
  272. 1
  273. 1
  274. 1
  275. 1
  276. 2
  277. 2
  278. 2
  279. 2
  280. 3
  281. 1
  282. 1
  283. 1
  284. 1
  285. 1
  286. 1
  287. 1
  288. 1
  289. 1
  290. 1
  291. 4
  292. 1
  293. 1
  294. 1
  295. 1
  296. 1
  297. 1
  298. 1
  299. 1
  300. 1
  301. 1
  302. 1
  303. 1
  304. 1
  305. 1
  306. 1
  307. 1
  308. 1
  309. 23
  310. 0
  311. 8
  312. 0
  313. 0
  314. 0
  315. 0
  316. 0
  317. 0
  318. 0
  319. 0
  320. 0
  321. 0
  322. 0
  323. 0
  324. 0
  325. 0
  326. 0
  327. 0
  328. 1
  329. 0
  330. 1
  331. 0
  332. 0
  333. 0
  334. 0
  335. 1
  336. 0
  337. 0
  338. 1
  339. 0
  340. 0
  341. 1
  342. 0
  343. 2
  344. 0
  345. 1
  346. 1
  347. 1
  348. 0
  349. 0
  350. 1
  351. 1
  352. 0
  353. 0
  354. 1
  355. 0
  356. 0
  357. 0
  358. 0
  359. 10
  360. 0
  361. 0
  362. 1
  363. 0
  364. 0
  365. 0
  366. 0
  367. 0
  368. 0
  369. 3
  370. 0
  371. 0
  372. 0
  373. 0
  374. 0
  375. 0
  376. 0
  377. 0
  378. 0
  379. 0
  380. 0
  381. 0
  382. 0
  383. 0
  384. 0
  385. 0
  386. 0
  387. 0
  388. 0
  389. 0
  390. 0
  391. 0
  392. 0
  393. 0
  394. 0
  395. 0
  396. 0
  397. 0
  398. 0
  399. 0
  400. 0
  401. 0
  402. 0
  403. 0
  404. 0
  405. 0
  406. 0
  407. 0
  408. 0
  409. 0
  410. 0
  411. 0
  412. 0
  413. 0
  414. 0
  415. 0
  416. 0
  417. 0
  418. 0
  419. 0
  420. 0
  421. 0
  422. 0
  423. 0
  424. 0
  425. 0
  426. 0
  427. 0
  428. 0
  429. 0
  430. 0
  431. 0
  432. 0
  433. 0
  434. 0
  435. 0
  436. 0
  437. 0
  438. 0
  439. 0
  440. 0
  441. 0
  442. 0
  443. 0
  444. 0
  445. 0
  446. 0
  447. 0
  448. 0
  449. 0
  450. 0
  451. 0
  452. 0
  453. 0
  454. 0
  455. 0
  456. 0
  457. 0
  458. 0
  459. 0
  460. 0
  461. 0
  462. 0
  463. 0
  464. 0
  465. 0
  466. 0
  467. 0
  468. 0
  469. 0
  470. 0
  471. 0
  472. 0
  473. 0
  474. 0
  475. 0
  476. 0
  477. 0
  478. 1
  479. 1
  480. 1
  481. 1
  482. 1
  483. 0
  484. 0
  485. 0
  486. 0
  487. 0
  488. 0
  489. 0
  490. 0
  491. 0
  492. 0
  493. 0
  494. 0
  495. 0
  496. 0
  497. 0
  498. 0
  499. 0
  500. 0
  501. 0
  502. 0
  503. 0
  504. 0
  505. 0
  506. 0
  507. 0
  508. 0
  509. 0
  510. 0
  511. 0
  512. 0
  513. 0
  514. 0
  515. 0
  516. 0
  517. 0
  518. 0
  519. 0
  520. 0
  521. 0
  522. 0
  523. 0
  524. 0
  525. 0
  526. 0
  527. 0
  528. 0
  529. 0
  530. 0
  531. 0
  532. 0
  533. 0
  534. 0
  535. 0
  536. 0
  537. 0
  538. 0
  539. 0
  540. 0
  541. 0
  542. 0
  543. 0
  544. 0
  545. 0
  546. 0
  547. 0
  548. 0
  549. 0
  550. 0
  551. 0
  552. 0
  553. 0
  554. 0
  555. 0
  556. 0
  557. 0
  558. 0
  559. 0
  560. 0
  561. 0
  562. 0
  563. 0
  564. 0
  565. 0
  566. 0
  567. 0
  568. 0
  569. 0
  570. 0
  571. 0
  572. 0
  573. 0
  574. 0
  575. 0
  576. 0
  577. 0
  578. 1
  579. 0
  580. 1
  581. 1
  582. 1
  583. 0
  584. 0
  585. 0
  586. 0
  587. 0
  588. 0
  589. 0
  590. 0
  591. 0
  592. 0
  593. 0
  594. 0
  595. 0
  596. 0
  597. 0
  598. 0
  599. 0
  600. 0
  601. 0
  602. 0
  603. 0
  604. 0
  605. 0
  606. 0
  607. 0
  608. 0
  609. 0
  610. 0
  611. 2
  612. 0
  613. 0
  614. 0
  615. 0
  616. 0
  617. 0
  618. 0
  619. 0
  620. 0
  621. 0
  622. 0
  623. 0
  624. 0
  625. 0
  626. 0
  627. 0
  628. 0
  629. 0
  630. 0
  631. 0
  632. 0
  633. 0
  634. 0
  635. 0
  636. 0
  637. 0
  638. 0
  639. 0
  640. 0
  641. 0
  642. 0
  643. 0
  644. 0
  645. 0
  646. 0
  647. 0
  648. 0
  649. 0
  650. 0
  651. 0
  652. 0
  653. 0
  654. 0
  655. 0
  656. 0
  657. 0
  658. 0
  659. 0
  660. 0
  661. 0
  662. 0
  663. 0
  664. 0
  665. 0
  666. 0
  667. 0
  668. 0
  669. 0
  670. 0
  671. 0
  672. 0
  673. 0
  674. 0
  675. 0
  676. 0
  677. 0
  678. 0
  679. 0
  680. 0
  681. 0
  682. 0
  683. 0
  684. 0
  685. 0
  686. 0
  687. 0
  688. 0
  689. 0
  690. 0
  691. 0
  692. 0
  693. 0
  694. 0
  695. 0
  696. 0
  697. 0
  698. 0
  699. 0
  700. 0
  701. 0
  702. 0
  703. 0
  704. 0
  705. 0
  706. 0
  707. 0
  708. 0
  709. 0
  710. 0
  711. 0
  712. 0
  713. 0
  714. 0
  715. 0
  716. 0
  717. 0
  718. 0
  719. 0
  720. 0
  721. 0
  722. 0
  723. 0
  724. 0
  725. 0
  726. 0
  727. 0
  728. 0
  729. 0
  730. 0
  731. 0
  732. 0
  733. 0
  734. 0
  735. 0
  736. 0
  737. 1
  738. 1
  739. 1
  740. 0
  741. 0
  742. 0
  743. 0
  744. 0
  745. 0
  746. 0
  747. 0
  748. 0
  749. 0
  750. 0
  751. 0
  752. 0
  753. 0
  754. 0
  755. 0
  756. 0
  757. 0
  758. 0
  759. 0
  760. 0
  761. 0
  762. 0
  763. 0
  764. 0
  765. 0
  766. 0
  767. 0
  768. 0
  769. 0
  770. 0
  771. 0
  772. 0
  773. 0
  774. 0
  775. 0
  776. 0
  777. 0
  778. 0
  779. 0
  780. 0
  781. 0
  782. 0
  783. 0
  784. 0
  785. 0
  786. 0
  787. 0
  788. 0
  789. 0
  790. 0
  791. 0
  792. 0
  793. 0
  794. 0
  795. 0
  796. 0
  797. 0
  798. 0
  799. 0
  800. 0
  801. 0
  802. 0
  803. 0
  804. 0
  805. 0
  806. 0
  807. 0
  808. 0
  809. 0
  810. 0
  811. 0
  812. 0
  813. 0
  814. 0
  815. 0
  816. 0
  817. 0
  818. 0
  819. 0
  820. 0
  821. 0
  822. 0
  823. 0
  824. 0
  825. 0
  826. 0
  827. 0
  828. 0
  829. 0
  830. 0
  831. 0
  832. 0
  833. 0
  834. 0
  835. 0
  836. 0
  837. 0
  838. 0
  839. 0
  840. 0
  841. 0
  842. 0
  843. 0
  844. 0
  845. 0
  846. 0
  847. 0
  848. 0
  849. 0
  850. 0
  851. 0
  852. 0
  853. 0
  854. 0
  855. 0
  856. 0
  857. 0
  858. 0
  859. 0
  860. 0
  861. 0
  862. 0
  863. 0
  864. 0
  865. 0
  866. 0
  867. 0
  868. 0
  869. 0
  870. 0
  871. 0
  872. 0
  873. 0
  874. 0
  875. 0
  876. 0
  877. 0
  878. 0
  879. 0
  880. 0
  881. 0
  882. 0
  883. 0
  884. 0
  885. 0
  886. 0
  887. 0
  888. 0
  889. 0
  890. 0
  891. 0
  892. 0
  893. 0
  894. 0
  895. 0
  896. 0
  897. 0
  898. 0
  899. 0
  900. 0
  901. 0
  902. 0
  903. 0
  904. 0
  905. 0
  906. 0
  907. 0
  908. 0
  909. 0
  910. 0
  911. 0
  912. 0
  913. 0
  914. 0
  915. 0
  916. 0
  917. 0
  918. 0
  919. 0
  920. 0
  921. 0
  922. 0
  923. 0
  924. 0
  925. 0
  926. 0
  927. 0
  928. 0
  929. 0
  930. 0
  931. 0
  932. 0
  933. 0
  934. 0
  935. 0
  936. 0
  937. 0
  938. 0
  939. 0
  940. 0
  941. 0
  942. 0
  943. 0
  944. 0
  945. 0
  946. 0
  947. 0
  948. 0
  949. 0
  950. 0
  951. 0
  952. 0
  953. 0
  954. 0
  955. 0
  956. 0
  957. 0
  958. 0
  959. 0
  960. 0
  961. 0
  962. 0
  963. 0
  964. 0
  965. 0
  966. 0
  967. 0
  968. 0
  969. 0
  970. 0
  971. 0
  972. 0
  973. 0
  974. 0
  975. 0
  976. 0
  977. 0
  978. 0
  979. 0
  980. 0
  981. 0
  982. 0
  983. 0
  984. 0
  985. 0
  986. 0
  987. 0
  988. 0
  989. 0
  990. 0
  991. 0
  992. 0
  993. 0
  994. 0
  995. 0
  996. 0
  997. 0
  998. 0
  999. 0
  1000. 0
  1001. 0
  1002. 0
  1003. 0
  1004. 0
  1005. 0
  1006. 0
  1007. 0
  1008. 0
  1009. 0
  1010. 0
  1011. 0
  1012. 0
  1013. 0
  1014. 0
  1015. 0
  1016. 0
  1017. 0
  1018. 0
  1019. 0
  1020. 0
  1021. 0
  1022. 0
  1023. 0
  1024. 0
  1025. 0
  1026. 0
  1027. 0
  1028. 0
  1029. 0
  1030. 0
  1031. 0
  1032. 0
  1033. 0
  1034. 0
  1035. 0
  1036. 0
  1037. 0
  1038. 0
  1039. 0
  1040. 0
  1041. 0
  1042. 0
  1043. 0
  1044. 0
  1045. 0
  1046. 0
  1047. 0
  1048. 0
  1049. 0
  1050. 0
  1051. 0
  1052. 0
  1053. 0
  1054. 0
  1055. 0
  1056. 0
  1057. 0
  1058. 0
  1059. 0
  1060. 0
  1061. 0
  1062. 0
  1063. 0
  1064. 0
  1065. 0
  1066. 0
  1067. 0
  1068. 0
  1069. 0
  1070. 0
  1071. 0
  1072. 0
  1073. 0
  1074. 0
  1075. 0
  1076. 0
  1077. 0
  1078. 0
  1079. 0
  1080. 0
  1081. 0
  1082. 0
  1083. 0
  1084. 0
  1085. 0
  1086. 0
  1087. 0
  1088. 0
  1089. 0
  1090. 0
  1091. 0
  1092. 0
  1093. 0
  1094. 0
  1095. 0
  1096. 0
  1097. 0
  1098. 0
  1099. 0
  1100. 0
  1101. 0
  1102. 0
  1103. 0
  1104. 0
  1105. 0
  1106. 0
  1107. 0
  1108. 0
  1109. 0
  1110. 0
  1111. 0
  1112. 0
  1113. 0
  1114. 0
  1115. 0
  1116. 0
  1117. 0
  1118. 0
  1119. 0
  1120. 0
  1121. 0
  1122. 0
  1123. 0
  1124. 0
  1125. 0
  1126. 0
  1127. 0
  1128. 0
  1129. 0
  1130. 0
  1131. 0
  1132. 0
  1133. 0
  1134. 0
  1135. 0
  1136. 0
  1137. 0
  1138. 0
  1139. 0
  1140. 0
  1141. 0
  1142. 0
  1143. 0
  1144. 0
  1145. 0
  1146. 0
  1147. 0
  1148. 0
  1149. 0
  1150. 0
  1151. 0
  1152. 0
  1153. 0
  1154. 0
  1155. 0
  1156. 0
  1157. 0
  1158. 0
  1159. 0
  1160. 0
  1161. 0
  1162. 0
  1163. 0
  1164. 0
  1165. 0
  1166. 0
  1167. 0
'grp_week average degree = 0.582690659811482'

In [55]:
# average author degree (person types 2 and 4)
gwHumanAverageAuthorDegree2And4 <- calcAuthorMeanDegree( dataFrameIN = gwHumanDataDF, includeBothIN = TRUE )
paste( output_prefix, "average author degree (2 and 4) = ", gwHumanAverageAuthorDegree2And4, sep = " " )

# average author degree (person type 2 only)
gwHumanAverageAuthorDegreeOnly2 <- calcAuthorMeanDegree( dataFrameIN = gwHumanDataDF, includeBothIN = FALSE )
paste( output_prefix, "average author degree (only 2) = ", gwHumanAverageAuthorDegreeOnly2, sep = " " )

# average source degree (person types 3 and 4)
gwHumanAverageSourceDegree3And4 <- calcSourceMeanDegree( dataFrameIN = gwHumanDataDF, includeBothIN = TRUE )
paste( output_prefix, "average source degree (3 and 4) = ", gwHumanAverageSourceDegree3And4, sep = " " )

# average source degree (person type 3 only)
gwHumanAverageSourceDegreeOnly3 <- calcSourceMeanDegree( dataFrameIN = gwHumanDataDF, includeBothIN = FALSE )
paste( output_prefix, "average source degree (only 3) = ", gwHumanAverageSourceDegreeOnly3, sep = " " )


'grp_week average author degree (2 and 4) = 10.6060606060606'
'grp_week average author degree (only 2) = 10.6060606060606'
'grp_week average source degree (3 and 4) = 1.1913357400722'
'grp_week average source degree (only 3) = 1.1913357400722'
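
calcAuthorMeanDegree() and calcSourceMeanDegree() are defined in functions-sna.r and are not shown in this log. As a rough sketch of the kind of calculation the comments above describe, assuming a data frame with `person_type` and `degree` columns and assuming the type codes mean 2 = author, 3 = source, 4 = both, the mean degree for a set of person types could be computed as below. The helper name `meanDegreeForTypes` and the toy data are hypothetical, not taken from the library.

```r
# Hypothetical helper (not from functions-sna.r): mean degree over the rows
#    whose person_type is one of the values in typesIN.
meanDegreeForTypes <- function( dataFrameIN, typesIN ) {
    selectedRows <- dataFrameIN$person_type %in% typesIN
    mean( dataFrameIN$degree[ selectedRows ] )
}

# toy data, NOT the real network
toyDF <- data.frame(
    person_type = c( 2, 2, 3, 3, 3, 4 ),
    degree = c( 10, 12, 1, 0, 2, 5 )
)

# authors, with and without people who are both author and source
authorMean2And4 <- meanDegreeForTypes( toyDF, c( 2, 4 ) )
authorMeanOnly2 <- meanDegreeForTypes( toyDF, c( 2 ) )

# sources, including people who are both
sourceMean3And4 <- meanDegreeForTypes( toyDF, c( 3, 4 ) )
```

In the recorded output above, the "2 and 4" and "only 2" values are identical (and likewise for sources), which is what one would expect if no person in the grp_week human data has type 4.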

grp_week (gw) - human - More metrics

Now that we have the data in a statnet object, run the code from the following file for more in-depth information:

  • context_text/R/sna/statnet/sna-statnet-network-stats.r

In [56]:
# Links:
# - manual (PDF): http://cran.r-project.org/web/packages/sna/sna.pdf
# - good notes: http://www.shizukalab.com/toolkits/sna/node-level-calculations

# Also, be advised that statnet and igraph don't really play nice together.
#    If you'll be using both, best idea is to have a workspace for each.

#==============================================================================#
# statnet
#==============================================================================#

# make sure you've loaded the statnet library (includes sna)
# install.packages( "statnet" )
#library( statnet )

#==============================================================================#
# NODE level
#==============================================================================#

# what is the standard deviation of the degrees?
gwHumanDegreeSd <- sd( gwHumanDegreeVector )
paste( output_prefix, "degree SD =", gwHumanDegreeSd, sep = " " )

# what is the variance of the degrees?
gwHumanDegreeVar <- var( gwHumanDegreeVector )
paste( output_prefix, "degree variance =", gwHumanDegreeVar, sep = " " )

# what is the max value among the degrees?
gwHumanDegreeMax <- max( gwHumanDegreeVector )
paste( output_prefix, "degree max =", gwHumanDegreeMax, sep = " " )

# calculate degree distribution (frequency of each degree value)
gwHumanDegreeFrequenciesTable <- table( gwHumanDegreeVector )
# note: paste() is vectorized, so this outputs one string per table cell.
paste( output_prefix, "degree frequencies =", gwHumanDegreeFrequenciesTable, sep = " " )
gwHumanDegreeFrequenciesTable

# node-level undirected betweenness
gwHumanBetweenness <- sna::betweenness( gwHumanNetworkStatnet, gmode = "graph", cmode = "undirected" )

#paste( "betweenness = ", gwHumanBetweenness, sep = "" )
# associate with each node as a node attribute.
#    (%v% is a shortcut for get.vertex.attribute; on the left-hand side of
#    an assignment it sets the attribute instead)
gwHumanNetworkStatnet %v% "betweenness" <- gwHumanBetweenness

# also add betweenness vector to original data frame
gwHumanDataDF$betweenness <- gwHumanBetweenness

#==============================================================================#
# NETWORK level
#==============================================================================#

# graph-level degree centralization
gwHumanDegreeCentrality <- sna::centralization( gwHumanNetworkStatnet, sna::degree, mode = "graph" )
paste( output_prefix, "degree centrality =", gwHumanDegreeCentrality, sep = " " )

# graph-level betweenness centralization
gwHumanBetweennessCentrality <- sna::centralization( gwHumanNetworkStatnet, sna::betweenness, mode = "graph", cmode = "undirected" )
paste( output_prefix, "betweenness centrality =", gwHumanBetweennessCentrality, sep = " " )

# graph-level connectedness
gwHumanConnectedness <- sna::connectedness( gwHumanNetworkStatnet )
paste( output_prefix, "connectedness =", gwHumanConnectedness, sep = " " )

# graph-level transitivity
gwHumanTransitivity <- sna::gtrans( gwHumanNetworkStatnet, mode = "graph" )
paste( output_prefix, "transitivity =", gwHumanTransitivity, sep = " " )

# graph-level density
gwHumanDensity <- sna::gden( gwHumanNetworkStatnet, mode = "graph" )
paste( output_prefix, "density =", gwHumanDensity, sep = " " )


'grp_week degree SD = 2.24329960040874'
'grp_week degree variance = 5.03239309719399'
'grp_week degree max = 36'
  1. 'grp_week degree frequencies = 858'
  2. 'grp_week degree frequencies = 244'
  3. 'grp_week degree frequencies = 27'
  4. 'grp_week degree frequencies = 4'
  5. 'grp_week degree frequencies = 8'
  6. 'grp_week degree frequencies = 2'
  7. 'grp_week degree frequencies = 4'
  8. 'grp_week degree frequencies = 1'
  9. 'grp_week degree frequencies = 2'
  10. 'grp_week degree frequencies = 1'
  11. 'grp_week degree frequencies = 2'
  12. 'grp_week degree frequencies = 2'
  13. 'grp_week degree frequencies = 2'
  14. 'grp_week degree frequencies = 1'
  15. 'grp_week degree frequencies = 2'
  16. 'grp_week degree frequencies = 1'
  17. 'grp_week degree frequencies = 1'
  18. 'grp_week degree frequencies = 1'
  19. 'grp_week degree frequencies = 1'
  20. 'grp_week degree frequencies = 2'
  21. 'grp_week degree frequencies = 1'
gwHumanDegreeVector
  0   1   2   3   4   5   6   7   8   9  10  11  13  14  16  17  18  20  21  23 
858 244  27   4   8   2   4   1   2   1   2   2   2   1   2   1   1   1   1   2 
 36 
  1 
'grp_week degree centrality = 0.0304271969022151'
'grp_week betweenness centrality = 0.0134361252020462'
'grp_week connectedness = 0.0198541656561737'
Warning message in sna::gtrans(gwHumanNetworkStatnet, mode = "graph"):
“gtrans called with use.adjacency=TRUE, but your data looks too large for that to work well.  Overriding to edgelist method.”
'grp_week transitivity = 0.0752148997134671'
'grp_week density = 0.000499734699666794'
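
The 21 separate "degree frequencies" strings in the output above come from paste() being vectorized: the prefix is recycled once for every cell of the frequency table. A minimal sketch of that behavior on a toy degree vector (not the real data), along with one possible way to get a single line instead; the variable names here are made up for illustration.

```r
# toy degree vector, NOT the real data
toyDegreeVector <- c( 0, 0, 0, 1, 1, 2 )
toyTable <- table( toyDegreeVector )

# paste() recycles the scalar arguments across the table's 3 cells,
#    so this yields 3 strings, one per distinct degree value.
perCell <- paste( "toy", "degree frequencies =", toyTable, sep = " " )

# one string instead: build "degree:count" pairs, then collapse them.
degreeCountPairs <- paste( names( toyTable ), toyTable, sep = ":" )
oneLine <- paste( "toy degree frequencies =", paste( degreeCountPairs, collapse = ", " ) )
```

Printing the table itself, as the cell above also does, remains the most readable view; the collapsed string is only handy when a single log line is wanted.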

grp_week (gw) - human - create node attribute DataFrame

If you just want to work with the traits of the nodes (vertices), you can combine the attribute vectors into a data frame.


In [57]:
#==============================================================================#
# output attributes to data frame
#==============================================================================#

# if you want to just work with the traits of the nodes/vertexes, you can
#    combine the attribute vectors into a data frame.

# first, output network object to see what attributes you have
gwHumanNetworkStatnet

# then, combine them into a data frame.
gwHumanNodeAttrDF <- data.frame( id = gwHumanNetworkStatnet %v% "vertex.names",
                                 person_id = gwHumanNetworkStatnet %v% "person_id",
                                 person_type = gwHumanNetworkStatnet %v% "person_type",
                                 degree = gwHumanNetworkStatnet %v% "degree",
                                 betweenness = gwHumanNetworkStatnet %v% "betweenness" )


 Network attributes:
  vertices = 1167 
  directed = FALSE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 340 
    missing edges= 0 
    non-missing edges= 340 

 Vertex attribute names: 
    betweenness degree person_id person_type vertex.names 

No edge attributes
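
Once the attributes are in a plain data frame, ordinary base R tools apply. A small sketch of a few things one might do with it, using a toy data frame with some of the same column names as gwHumanNodeAttrDF (the values and the variable names below are invented for illustration):

```r
# toy stand-in for the node attribute data frame, NOT the real data
toyNodeAttrDF <- data.frame(
    id = c( "a", "b", "c", "d" ),
    person_type = c( 2, 3, 3, 2 ),
    degree = c( 5, 1, 0, 3 ),
    betweenness = c( 4.5, 0, 0, 1.5 ),
    stringsAsFactors = FALSE
)

# nodes ordered from highest to lowest betweenness
byBetweenness <- toyNodeAttrDF[ order( -toyNodeAttrDF$betweenness ), ]

# mean degree per person type
meanDegreeByType <- aggregate( degree ~ person_type, data = toyNodeAttrDF, FUN = mean )

# IDs of isolates (nodes with no ties)
isolateIds <- toyNodeAttrDF$id[ toyNodeAttrDF$degree == 0 ]
```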

grp_week QAP graph correlation between automated and ground truth

Now, compare the automated and human-coded networks themselves using graph correlation in QAP.

Based on: context_text/R/sna/statnet/sna-qap.r

Note: QAP compares two networks, so this has to wait until both the OpenCalais and human-coded networks have been processed.


In [58]:
# link to good doc on qaptest(){sna} function: http://www.inside-r.org/packages/cran/sna/docs/qaptest

# First, need to load data - see (or just source() ) the file "sna-load_data.r".
# source( "sna-load_data.r" )
# does the following (among other things):
# Start with loading in tab-delimited files.
#humanNetworkData <- read.delim( "human-sourcenet_data-20150504-002453.tab", header = TRUE, row.names = 1, check.names = FALSE )
#calaisNetworkData <- read.delim( "puter-sourcenet_data-20150504-002507.tab", header = TRUE, row.names = 1, check.names = FALSE )

# remove the right-most column, which contains non-tie info on nodes.
#humanNetworkTies <- humanNetworkData[ , -ncol( humanNetworkData ) ]
#gwAutomatedNetworkDF <- calaisNetworkData[ , -ncol( calaisNetworkData )]

# convert each to a matrix
#gwHumanNetworkMatrix <- as.matrix( gwHumanNetworkTies )
#gwAutomatedNetworkMatrix <- as.matrix( gwAutomatedNetworkDF )

# imports
# install.packages( "sna" )
# install.packages( "statnet" )
library( "sna" )

# package up data for calling qaptest() - first make 3-dimensional array to hold
#    our two matrices - this is known as a "graph set".
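#    dimensions are ( graph, row, column ); the matrices are square, so rows and
#    columns have the same length.
graphSetArray <- array( dim = c( 2, nrow( gwHumanNetworkMatrix ), ncol( gwHumanNetworkMatrix ) ) )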

# then, place each matrix in one dimension of the array.
graphSetArray[ 1, , ] <- gwHumanNetworkMatrix
graphSetArray[ 2, , ] <- gwAutomatedNetworkMatrix

# first, try a graph correlation
graphCorrelation <- sna::gcor( gwHumanNetworkMatrix, gwAutomatedNetworkMatrix )
paste( output_prefix, "graph correlation =", graphCorrelation, sep = " " )

# try a qaptest...
qapGcorResult <- sna::qaptest( graphSetArray, sna::gcor, g1 = 1, g2 = 2 )
summary( qapGcorResult )
plot( qapGcorResult )

# graph covariance...
graphCovariance <- sna::gcov( gwHumanNetworkMatrix, gwAutomatedNetworkMatrix )
graphCovariance
paste( output_prefix, "graph covariance =", graphCovariance, sep = " " )

# try a qaptest...
qapGcovResult <- sna::qaptest( graphSetArray, sna::gcov, g1 = 1, g2 = 2 )
summary( qapGcovResult )
plot( qapGcovResult )

# Hamming Distance
graphHammingDist <- sna::hdist( gwHumanNetworkMatrix, gwAutomatedNetworkMatrix )
paste( output_prefix, "graph hamming distance =", graphHammingDist, sep = " " )

# try a qaptest...
qapHdistResult <- sna::qaptest( graphSetArray, sna::hdist, g1 = 1, g2 = 2 )
summary( qapHdistResult )
plot( qapHdistResult )

# graph structural correlation?
#graphStructCorrelation <- gscor( gwHumanNetworkMatrix, gwAutomatedNetworkMatrix )
#graphStructCorrelation


'grp_week graph correlation = 0.90223000784894'
QAP Test Results

Estimated p-values:
	p(f(perm) >= f(d)): 0 
	p(f(perm) <= f(d)): 1 

Test Diagnostics:
	Test Value (f(d)): 0.90223 
	Replications: 1000 
	Distribution Summary:
		Min:	 -0.0004507131 
		1stQ:	 -0.0004507131 
		Med:	 -0.0004507131 
		Mean:	 1.868092e-05 
		3rdQ:	 -0.0004507131 
		Max:	 0.007881724 
0.000477449419615562
'grp_week graph covariance = 0.000477449419615562'
QAP Test Results

Estimated p-values:
	p(f(perm) >= f(d)): 0 
	p(f(perm) <= f(d)): 1 

Test Diagnostics:
	Test Value (f(d)): 0.0004774494 
	Replications: 1000 
	Distribution Summary:
		Min:	 -2.38512e-07 
		1stQ:	 -2.38512e-07 
		Med:	 -2.38512e-07 
		Mean:	 3.046305e-08 
		3rdQ:	 -2.38512e-07 
		Max:	 4.170915e-06 
'grp_week graph hamming distance = 144'
QAP Test Results

Estimated p-values:
	p(f(perm) >= f(d)): 1 
	p(f(perm) <= f(d)): 0 

Test Diagnostics:
	Test Value (f(d)): 144 
	Replications: 1000 
	Distribution Summary:
		Min:	 1320 
		1stQ:	 1332 
		Med:	 1332 
		Mean:	 1331.344 
		3rdQ:	 1332 
		Max:	 1332 
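
To make the QAP output easier to read, here is a base-R sketch of the general idea: compute a statistic on the two observed matrices, then repeatedly relabel the vertices of one matrix (permuting its rows and columns together) and recompute the statistic to get a reference distribution. This is an illustration under stated assumptions, not sna's implementation: the function names `offDiagCor`, `offDiagHamming`, and `qapSketch` are hypothetical, the toy networks are made up, and counting every off-diagonal cell (so each undirected tie counts twice) is a choice made for this sketch.

```r
set.seed( 42 )

# correlation between two adjacency matrices over their off-diagonal cells
offDiagCor <- function( m1, m2 ) {
    keep <- row( m1 ) != col( m1 )
    cor( m1[ keep ], m2[ keep ] )
}

# Hamming distance: number of off-diagonal cells where the matrices differ
offDiagHamming <- function( m1, m2 ) {
    keep <- row( m1 ) != col( m1 )
    sum( m1[ keep ] != m2[ keep ] )
}

# QAP-style permutation test: relabel the vertices of m1 at random and
#    recompute the statistic against m2, reps times.
qapSketch <- function( m1, m2, statFUN, reps = 200 ) {
    observed <- statFUN( m1, m2 )
    permuted <- replicate( reps, {
        newOrder <- sample( nrow( m1 ) )
        statFUN( m1[ newOrder, newOrder ], m2 )
    } )
    list( observed = observed,
          pGreaterEq = mean( permuted >= observed ),
          pLessEq = mean( permuted <= observed ) )
}

# toy undirected network A: 8 nodes, 5 ties
toyEdges <- rbind( c( 1, 2 ), c( 2, 3 ), c( 3, 4 ), c( 1, 5 ), c( 5, 6 ) )
toyA <- matrix( 0, nrow = 8, ncol = 8 )
toyA[ toyEdges ] <- 1
toyA[ toyEdges[ , 2:1 ] ] <- 1

# toy network B: same as A plus one extra tie between nodes 7 and 8
toyB <- toyA
toyB[ 7, 8 ] <- 1
toyB[ 8, 7 ] <- 1

corResult <- qapSketch( toyA, toyB, offDiagCor )
hammingResult <- qapSketch( toyA, toyB, offDiagHamming )
```

Read against the recorded results above: for graph correlation and covariance, p(f(perm) >= f(d)) = 0 means none of the 1000 permuted values reached the observed value. For Hamming distance, smaller means more similar, so the tail of interest is p(f(perm) <= f(d)), which is also 0 here: the observed distance of 144 is far below the smallest permuted distance of 1320.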

Compare grp_month and grp_week using QAP

Now compare the grp_month and grp_week networks against each other, separately for the automated and the human-coded data, to see how much a longer time window changes the network.

Based on: context_text/R/sna/statnet/sna-qap.r

Note: QAP compares two networks, so this has to wait until both the grp_month and grp_week networks have been processed.

month-to-week - automated


In [59]:
output_prefix <- "month-to-week automated"

In [60]:
# link to good doc on qaptest(){sna} function: http://www.inside-r.org/packages/cran/sna/docs/qaptest

# First, need to load data - see (or just source() ) the file "sna-load_data.r".
# source( "sna-load_data.r" )
# does the following (among other things):
# Start with loading in tab-delimited files.
#humanNetworkData <- read.delim( "human-sourcenet_data-20150504-002453.tab", header = TRUE, row.names = 1, check.names = FALSE )
#calaisNetworkData <- read.delim( "puter-sourcenet_data-20150504-002507.tab", header = TRUE, row.names = 1, check.names = FALSE )

# remove the right-most column, which contains non-tie info on nodes.
#humanNetworkTies <- humanNetworkData[ , -ncol( humanNetworkData ) ]
#gwAutomatedNetworkDF <- calaisNetworkData[ , -ncol( calaisNetworkData )]

# convert each to a matrix
#gwHumanNetworkMatrix <- as.matrix( gwHumanNetworkTies )
#gwAutomatedNetworkMatrix <- as.matrix( gwAutomatedNetworkDF )

# imports
# install.packages( "sna" )
# install.packages( "statnet" )
library( "sna" )

# package up data for calling qaptest() - first make 3-dimensional array to hold
#    our two matrices - this is known as a "graph set".
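#    dimensions are ( graph, row, column ); the matrices are square, so rows and
#    columns have the same length.
graphSetArray <- array( dim = c( 2, nrow( gmAutomatedNetworkMatrix ), ncol( gmAutomatedNetworkMatrix ) ) )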

# then, place each matrix in one dimension of the array.
graphSetArray[ 1, , ] <- gmAutomatedNetworkMatrix
graphSetArray[ 2, , ] <- gwAutomatedNetworkMatrix

# first, try a graph correlation
graphCorrelation <- sna::gcor( gmAutomatedNetworkMatrix, gwAutomatedNetworkMatrix )
paste( output_prefix, "graph correlation =", graphCorrelation, sep = " " )

# try a qaptest...
qapGcorResult <- sna::qaptest( graphSetArray, sna::gcor, g1 = 1, g2 = 2 )
summary( qapGcorResult )
plot( qapGcorResult )

# graph covariance...
graphCovariance <- sna::gcov( gmAutomatedNetworkMatrix, gwAutomatedNetworkMatrix )
graphCovariance
paste( output_prefix, "graph covariance =", graphCovariance, sep = " " )

# try a qaptest...
qapGcovResult <- sna::qaptest( graphSetArray, sna::gcov, g1 = 1, g2 = 2 )
summary( qapGcovResult )
plot( qapGcovResult )

# Hamming Distance
graphHammingDist <- sna::hdist( gmAutomatedNetworkMatrix, gwAutomatedNetworkMatrix )
paste( output_prefix, "graph hamming distance =", graphHammingDist, sep = " " )

# try a qaptest...
qapHdistResult <- sna::qaptest( graphSetArray, sna::hdist, g1 = 1, g2 = 2 )
summary( qapHdistResult )
plot( qapHdistResult )

# graph structural correlation?
#graphStructCorrelation <- gscor( gmAutomatedNetworkMatrix, gwAutomatedNetworkMatrix )
#graphStructCorrelation


'month-to-week automated graph correlation = 0.521381854496348'
QAP Test Results

Estimated p-values:
	p(f(perm) >= f(d)): 0 
	p(f(perm) <= f(d)): 1 

Test Diagnostics:
	Test Value (f(d)): 0.5213819 
	Replications: 1000 
	Distribution Summary:
		Min:	 -0.0007992741 
		1stQ:	 -0.0007992741 
		Med:	 -0.0007992741 
		Mean:	 -2.370216e-05 
		3rdQ:	 0.0006006753 
		Max:	 0.01320022 
0.000547399605464319
'month-to-week automated graph covariance = 0.000547399605464319'
QAP Test Results

Estimated p-values:
	p(f(perm) >= f(d)): 0 
	p(f(perm) <= f(d)): 1 

Test Diagnostics:
	Test Value (f(d)): 0.0005473996 
	Replications: 1000 
	Distribution Summary:
		Min:	 -8.391591e-07 
		1stQ:	 -8.391591e-07 
		Med:	 -8.391591e-07 
		Mean:	 8.920673e-09 
		3rdQ:	 6.306499e-07 
		Max:	 9.449504e-06 
'month-to-week automated graph hamming distance = 1876'
QAP Test Results

Estimated p-values:
	p(f(perm) >= f(d)): 1 
	p(f(perm) <= f(d)): 0 

Test Diagnostics:
	Test Value (f(d)): 1876 
	Replications: 1000 
	Distribution Summary:
		Min:	 3100 
		1stQ:	 3116 
		Med:	 3120 
		Mean:	 3118.104 
		3rdQ:	 3120 
		Max:	 3120 

month-to-week - human


In [61]:
output_prefix <- "month-to-week human"

In [62]:
# link to good doc on qaptest(){sna} function: http://www.inside-r.org/packages/cran/sna/docs/qaptest

# First, need to load data - see (or just source() ) the file "sna-load_data.r".
# source( "sna-load_data.r" )
# does the following (among other things):
# Start with loading in tab-delimited files.
#humanNetworkData <- read.delim( "human-sourcenet_data-20150504-002453.tab", header = TRUE, row.names = 1, check.names = FALSE )
#calaisNetworkData <- read.delim( "puter-sourcenet_data-20150504-002507.tab", header = TRUE, row.names = 1, check.names = FALSE )

# remove the right-most column, which contains non-tie info on nodes.
#humanNetworkTies <- humanNetworkData[ , -ncol( humanNetworkData ) ]
#gwHumanNetworkDF <- calaisNetworkData[ , -ncol( calaisNetworkData )]

# convert each to a matrix
#gwHumanNetworkMatrix <- as.matrix( gwHumanNetworkTies )
#gwHumanNetworkMatrix <- as.matrix( gwHumanNetworkDF )

# imports
# install.packages( "sna" )
# install.packages( "statnet" )
library( "sna" )

# package up data for calling qaptest() - first make 3-dimensional array to hold
#    our two matrices - this is known as a "graph set".
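#    dimensions are ( graph, row, column ); the matrices are square, so rows and
#    columns have the same length.
graphSetArray <- array( dim = c( 2, nrow( gmHumanNetworkMatrix ), ncol( gmHumanNetworkMatrix ) ) )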

# then, place each matrix in one dimension of the array.
graphSetArray[ 1, , ] <- gmHumanNetworkMatrix
graphSetArray[ 2, , ] <- gwHumanNetworkMatrix

# first, try a graph correlation
graphCorrelation <- sna::gcor( gmHumanNetworkMatrix, gwHumanNetworkMatrix )
paste( output_prefix, "graph correlation =", graphCorrelation, sep = " " )

# try a qaptest...
qapGcorResult <- sna::qaptest( graphSetArray, sna::gcor, g1 = 1, g2 = 2 )
summary( qapGcorResult )
plot( qapGcorResult )

# graph covariance...
graphCovariance <- sna::gcov( gmHumanNetworkMatrix, gwHumanNetworkMatrix )
graphCovariance
paste( output_prefix, "graph covariance =", graphCovariance, sep = " " )

# try a qaptest...
qapGcovResult <- sna::qaptest( graphSetArray, sna::gcov, g1 = 1, g2 = 2 )
summary( qapGcovResult )
plot( qapGcovResult )

# Hamming Distance
graphHammingDist <- sna::hdist( gmHumanNetworkMatrix, gwHumanNetworkMatrix )
paste( output_prefix, "graph hamming distance =", graphHammingDist, sep = " " )

# try a qaptest...
qapHdistResult <- sna::qaptest( graphSetArray, sna::hdist, g1 = 1, g2 = 2 )
summary( qapHdistResult )
plot( qapHdistResult )

# graph structural correlation?
#graphStructCorrelation <- gscor( gmHumanNetworkMatrix, gwHumanNetworkMatrix )
#graphStructCorrelation


'month-to-week human graph correlation = 0.538643365355464'
QAP Test Results

Estimated p-values:
	p(f(perm) >= f(d)): 0 
	p(f(perm) <= f(d)): 1 

Test Diagnostics:
	Test Value (f(d)): 0.5386434 
	Replications: 1000 
	Distribution Summary:
		Min:	 -0.0008668826 
		1stQ:	 -0.0008668826 
		Med:	 -0.0008668826 
		Mean:	 6.670216e-06 
		3rdQ:	 0.0003936554 
		Max:	 0.007956883 
0.000628067460651923
'month-to-week human graph covariance = 0.000628067460651923'
QAP Test Results

Estimated p-values:
	p(f(perm) >= f(d)): 0 
	p(f(perm) <= f(d)): 1 

Test Diagnostics:
	Test Value (f(d)): 0.0006280675 
	Replications: 1000 
	Distribution Summary:
		Min:	 -1.0108e-06 
		1stQ:	 -1.0108e-06 
		Med:	 -1.0108e-06 
		Mean:	 1.365682e-08 
		3rdQ:	 4.59009e-07 
		Max:	 9.277863e-06 
'month-to-week human graph hamming distance = 1926'
QAP Test Results

Estimated p-values:
	p(f(perm) >= f(d)): 1 
	p(f(perm) <= f(d)): 0 

Test Diagnostics:
	Test Value (f(d)): 1926 
	Replications: 1000 
	Distribution Summary:
		Min:	 3322 
		1stQ:	 3342 
		Med:	 3346 
		Mean:	 3343.616 
		3rdQ:	 3346 
		Max:	 3346 

Save workspace image

Save all the information in the current image, in case we need/want it later.


In [10]:
# help( save.image )
save.image( file = workspace_file_name )
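
save.image() writes every object in the global environment to the named file, and load() restores them. A minimal round-trip sketch, assuming a throwaway file in the temporary directory and saving a single toy object with save() so as not to write out a whole workspace; the object and file names are made up for illustration.

```r
# toy object standing in for the real results
toyResult <- c( density = 0.0005, transitivity = 0.075 )

# build the path with file.path() so the separator is handled for us
toyFilePath <- file.path( tempdir(), "toy-workspace.RData" )

# save just this object (save.image() would save everything instead)
save( toyResult, file = toyFilePath )

# load into a fresh environment to avoid overwriting current objects;
#    load() returns the names of the objects it restored.
restoredEnv <- new.env()
restoredNames <- load( toyFilePath, envir = restoredEnv )
```

Because the cell above passes a bare file name, the image is written to the current working directory, which the setup section set to the data directory.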

TODO

DONE:

  • Not sure what the problem was, but it is fixed. It might have been the first-name lookup bug: when a mention had only a first name and exactly one person in the database had that first name, the lookup used to match them, even though there was no last name to confirm the match.
  • human data for grp_month has one fewer vertex (1167) than automated (1168). The missing person is row 355, user ID 781 (source_3), who is in automated, not in human. QAP needs same-size matrices.

    • 781 - Cook, Matthew ( Wayland Fire Department )
    • First, try to regenerate the data.
    • Then, if it doesn't get better, look into the article(s) where 781 - Cook, Matthew ( Wayland Fire Department ) is mentioned.
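
Since QAP needs same-size matrices, a quick check like the following can show which vertex is missing from one matrix. This is a sketch on toy matrices keyed by made-up IDs, and it assumes the row and column names hold the person identifiers. The helper `padToVertexSet` is hypothetical; padding the smaller matrix with an isolate is shown only as a stopgap, since it changes what is being compared, and regenerating the data is the cleaner fix.

```r
# Hypothetical helper: expand a square, named adjacency matrix to the vertex
#    list in allNamesIN, adding any missing vertices as isolates (all zeros).
padToVertexSet <- function( matrixIN, allNamesIN ) {
    padded <- matrix( 0, nrow = length( allNamesIN ), ncol = length( allNamesIN ),
                      dimnames = list( allNamesIN, allNamesIN ) )
    existing <- rownames( matrixIN )
    padded[ existing, existing ] <- matrixIN
    padded
}

# toy "automated" matrix with three vertices, NOT the real data
toyIds <- c( "780", "781", "782" )
toyAutomated <- matrix( c( 0, 1, 0,
                           1, 0, 1,
                           0, 1, 0 ),
                        nrow = 3, byrow = TRUE,
                        dimnames = list( toyIds, toyIds ) )

# toy "human" matrix that lacks vertex 781
toyHuman <- toyAutomated[ c( "780", "782" ), c( "780", "782" ) ]

# which vertices are in automated but not in human?
missingFromHuman <- setdiff( rownames( toyAutomated ), rownames( toyHuman ) )

# pad the human matrix so both cover the same vertices in the same order
allNames <- union( rownames( toyAutomated ), rownames( toyHuman ) )
toyHumanPadded <- padToVertexSet( toyHuman, allNames )
```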