In [1]:
#libraries
library(GO.db)
library(topGO)
# library(org.Hs.eg.db)
library(org.Sc.sgd.db)
library(GOSemSim)


Loading required package: AnnotationDbi
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, cbind, colnames, do.call,
    duplicated, eval, evalq, Filter, Find, get, grep, grepl, intersect,
    is.unsorted, lapply, lengths, Map, mapply, match, mget, order,
    paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind,
    Reduce, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit, which, which.max, which.min

Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: IRanges
Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following objects are masked from ‘package:base’:

    colMeans, colSums, expand.grid, rowMeans, rowSums


Loading required package: graph
Loading required package: SparseM

Attaching package: ‘SparseM’

The following object is masked from ‘package:base’:

    backsolve


groupGOTerms: 	GOBPTerm, GOMFTerm, GOCCTerm environments built.

Attaching package: ‘topGO’

The following object is masked from ‘package:IRanges’:

    members


GOSemSim v2.0.4  For help: https://guangchuangyu.github.io/GOSemSim

If you use GOSemSim in published research, please cite:
Guangchuang Yu, Fei Li, Yide Qin, Xiaochen Bo, Yibo Wu, Shengqi Wang. GOSemSim: an R package for measuring semantic similarity among GO terms and gene products Bioinformatics 2010, 26(7):976-978

In [2]:
file <- "yeast_uetz"

ont <- "BP"
p <- 0.8
init <- 1

db <- org.Sc.sgd.db
mapping <- "org.Sc.sgd.db"
ID <- "ENSEMBL"
# db <- org.Hs.eg.db
# mapping <- "org.Hs.eg.db"
# ID <- "ENTREZ"

##load all community gene lists
setwd(sprintf("/home/david/Documents/ghsom/%s_communities_%s_%s", file, p, init))

#background gene list
backgroundFilename <- "all_genes.txt"
allGenes <- scan(backgroundFilename, character())

#load communities from file
g <- list()
numCom <- 0
filename <- sprintf("community_%s.txt", numCom)
while (file.exists(filename)) {
    numCom <- numCom + 1
    g[[numCom]] <- scan(filename, character())
    filename <- sprintf("community_%s.txt", numCom)
}

#distances between neurons
shortest.path <- read.csv("shortest_path.csv", sep=",", header=FALSE)

In [3]:
numCom


6
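
The loading loop above relies on the community files being numbered from zero with no gaps. The following is a minimal sketch of the same pattern on throwaway files, assuming hypothetical ORF-style identifiers and a temporary directory in place of the real community folder; it also avoids `setwd()` by building full paths.

```r
# Hypothetical stand-in for the community directory: three tiny gene lists
# written to a temporary folder as community_0.txt, community_1.txt, ...
dir <- tempfile("communities_")
dir.create(dir)
toy <- list(c("YAL001C", "YAL002W"),
            c("YBR001C"),
            c("YCL001W", "YCL002C", "YCL004W"))
for (k in seq_along(toy)) {
    writeLines(toy[[k]], file.path(dir, sprintf("community_%d.txt", k - 1)))
}

# Same idea as the notebook loop: keep reading until the next file is missing
communities <- list()
k <- 0
path <- file.path(dir, sprintf("community_%d.txt", k))
while (file.exists(path)) {
    communities[[k + 1]] <- scan(path, what = character(), quiet = TRUE)
    k <- k + 1
    path <- file.path(dir, sprintf("community_%d.txt", k))
}

length(communities)
```

Because the loop stops at the first missing index, a gap in the numbering would silently truncate the list, so the count is worth checking as the notebook does with `numCom`.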

In [4]:
##SEMANTIC SIMILARITY
#construct gosemsim object

scGO <- godata(mapping, ont=ont, keytype=ID)
print("DONE")


[1] "preparing gene to GO mapping data..."
[1] "preparing IC data..."
[1] "DONE"

In [64]:
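# translate the integer indices stored in the community files into gene names
allGeneNames <- scan("../yeast_uetz_communities_0.5_1/all_genes.txt", what=character())

# lapply keeps g a list even when several communities have the same size
g <- lapply(g, function(i) allGeneNames[as.integer(i)])
allGenes <- allGeneNames[as.integer(allGenes)]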

In [5]:
interestingGenes <- factor(as.integer(allGenes %in% g[[1]]))
names(interestingGenes) <- allGenes

GOdata <- new("topGOdata", description=sprintf("topGO object"),
          ontology = ont, allGenes = interestingGenes,
          annotationFun = annFUN.org, mapping = mapping, 
          ID = ID, nodeSize = 10)


Building most specific GOs .....
	( 571 GO terms found. )

Build GO DAG topology ..........
	( 1846 GO terms and 4079 relations. )

Annotating nodes ...............
	( 250 genes annotated to the GO terms. )
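
The `allGenes` argument passed to `topGOdata` above is a named two-level factor over the whole background. Below is a small base-R sketch of that encoding, using hypothetical gene names; fixing the levels explicitly is an addition of this sketch, not something the notebook does.

```r
# Hypothetical gene universe and one hypothetical community
universe  <- c("geneA", "geneB", "geneC", "geneD", "geneE")
community <- c("geneB", "geneE")

# 1 marks genes in the community, 0 marks the rest of the background.
# Setting levels explicitly keeps both levels present even if a community
# happened to be empty or to cover the whole universe.
membership <- factor(as.integer(universe %in% community), levels = c(0, 1))
names(membership) <- universe

membership
table(membership)
```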

In [9]:
usedGO(GOdata)


  1. 'GO:0000003'
  2. 'GO:0000070'
  3. 'GO:0000075'
  4. 'GO:0000278'
  5. 'GO:0000280'
  6. 'GO:0000288'
  7. 'GO:0000723'
  8. 'GO:0000819'
  9. 'GO:0000956'
  10. 'GO:0005975'
  11. 'GO:0006082'
  12. 'GO:0006091'
  13. 'GO:0006139'
  14. 'GO:0006259'
  15. 'GO:0006281'
  16. 'GO:0006302'
  17. 'GO:0006310'
  18. 'GO:0006325'
  19. 'GO:0006351'
  20. 'GO:0006352'
  21. 'GO:0006355'
  22. 'GO:0006357'
  23. 'GO:0006366'
  24. 'GO:0006367'
  25. 'GO:0006396'
  26. 'GO:0006397'
  27. 'GO:0006401'
  28. 'GO:0006402'
  29. 'GO:0006403'
  30. 'GO:0006412'
  31. 'GO:0006417'
  32. 'GO:0006461'
  33. 'GO:0006464'
  34. 'GO:0006468'
  35. 'GO:0006508'
  36. 'GO:0006518'
  37. 'GO:0006520'
  38. 'GO:0006605'
  39. 'GO:0006606'
  40. 'GO:0006725'
  41. 'GO:0006753'
  42. 'GO:0006793'
  43. 'GO:0006796'
  44. 'GO:0006807'
  45. 'GO:0006810'
  46. 'GO:0006886'
  47. 'GO:0006913'
  48. 'GO:0006914'
  49. 'GO:0006950'
  50. 'GO:0006974'
  51. 'GO:0006996'
  52. 'GO:0006997'
  53. 'GO:0007010'
  54. 'GO:0007033'
  55. 'GO:0007034'
  56. 'GO:0007049'
  57. 'GO:0007059'
  58. 'GO:0007067'
  59. 'GO:0007126'
  60. 'GO:0007154'
  61. 'GO:0007165'
  62. 'GO:0007346'
  63. 'GO:0008104'
  64. 'GO:0008150'
  65. 'GO:0008152'
  66. 'GO:0009056'
  67. 'GO:0009057'
  68. 'GO:0009058'
  69. 'GO:0009059'
  70. 'GO:0009116'
  71. 'GO:0009117'
  72. 'GO:0009123'
  73. 'GO:0009161'
  74. 'GO:0009259'
  75. 'GO:0009628'
  76. 'GO:0009653'
  77. 'GO:0009889'
  78. 'GO:0009890'
  79. 'GO:0009891'
  80. 'GO:0009892'
  81. 'GO:0009893'
  82. 'GO:0009987'
  83. 'GO:0010033'
  84. 'GO:0010467'
  85. 'GO:0010468'
  86. 'GO:0010556'
  87. 'GO:0010557'
  88. 'GO:0010558'
  89. 'GO:0010564'
  90. 'GO:0010604'
  91. 'GO:0010605'
  92. 'GO:0010608'
  93. 'GO:0010628'
  94. 'GO:0010629'
  95. 'GO:0010639'
  96. 'GO:0015031'
  97. 'GO:0015931'
  98. 'GO:0016043'
  99. 'GO:0016070'
  100. 'GO:0016071'
  101. 'GO:0016192'
  102. 'GO:0016236'
  103. 'GO:0016310'
  104. 'GO:0016569'
  105. 'GO:0017038'
  106. 'GO:0018130'
  107. 'GO:0018193'
  108. 'GO:0019219'
  109. 'GO:0019222'
  110. 'GO:0019438'
  111. 'GO:0019439'
  112. 'GO:0019538'
  113. 'GO:0019637'
  114. 'GO:0019693'
  115. 'GO:0019752'
  116. 'GO:0022402'
  117. 'GO:0022414'
  118. 'GO:0022607'
  119. 'GO:0022613'
  120. 'GO:0023052'
  121. 'GO:0030154'
  122. 'GO:0030163'
  123. 'GO:0031323'
  124. 'GO:0031324'
  125. 'GO:0031325'
  126. 'GO:0031326'
  127. 'GO:0031327'
  128. 'GO:0031328'
  129. 'GO:0032200'
  130. 'GO:0032268'
  131. 'GO:0032502'
  132. 'GO:0032774'
  133. 'GO:0033036'
  134. 'GO:0033043'
  135. 'GO:0033365'
  136. 'GO:0033554'
  137. 'GO:0034248'
  138. 'GO:0034504'
  139. 'GO:0034613'
  140. 'GO:0034622'
  141. 'GO:0034641'
  142. 'GO:0034645'
  143. 'GO:0034654'
  144. 'GO:0034655'
  145. 'GO:0034660'
  146. 'GO:0035556'
  147. 'GO:0036211'
  148. 'GO:0040007'
  149. 'GO:0042221'
  150. 'GO:0042254'
  151. 'GO:0042592'
  152. 'GO:0043043'
  153. 'GO:0043170'
  154. 'GO:0043412'
  155. 'GO:0043436'
  156. 'GO:0043603'
  157. 'GO:0043604'
  158. 'GO:0043623'
  159. 'GO:0043632'
  160. 'GO:0043933'
  161. 'GO:0044085'
  162. 'GO:0044087'
  163. 'GO:0044237'
  164. 'GO:0044238'
  165. 'GO:0044248'
  166. 'GO:0044249'
  167. 'GO:0044257'
  168. 'GO:0044260'
  169. 'GO:0044265'
  170. 'GO:0044267'
  171. 'GO:0044270'
  172. 'GO:0044271'
  173. 'GO:0044281'
  174. 'GO:0044283'
  175. 'GO:0044699'
  176. 'GO:0044700'
  177. 'GO:0044702'
  178. 'GO:0044710'
  179. 'GO:0044711'
  180. 'GO:0044712'
  181. 'GO:0044723'
  182. 'GO:0044744'
  183. 'GO:0044763'
  184. 'GO:0044765'
  185. 'GO:0044767'
  186. 'GO:0044770'
  187. 'GO:0044772'
  188. 'GO:0044802'
  189. 'GO:0045184'
  190. 'GO:0045786'
  191. 'GO:0045892'
  192. 'GO:0045893'
  193. 'GO:0045934'
  194. 'GO:0045935'
  195. 'GO:0045944'
  196. 'GO:0046483'
  197. 'GO:0046700'
  198. 'GO:0046907'
  199. 'GO:0048285'
  200. 'GO:0048518'
  201. 'GO:0048519'
  202. 'GO:0048522'
  203. 'GO:0048523'
  204. 'GO:0048583'
  205. 'GO:0048856'
  206. 'GO:0048869'
  207. 'GO:0050657'
  208. 'GO:0050658'
  209. 'GO:0050789'
  210. 'GO:0050794'
  211. 'GO:0050896'
  212. 'GO:0051028'
  213. 'GO:0051128'
  214. 'GO:0051129'
  215. 'GO:0051169'
  216. 'GO:0051170'
  217. 'GO:0051171'
  218. 'GO:0051172'
  219. 'GO:0051173'
  220. 'GO:0051179'
  221. 'GO:0051186'
  222. 'GO:0051234'
  223. 'GO:0051236'
  224. 'GO:0051246'
  225. 'GO:0051252'
  226. 'GO:0051253'
  227. 'GO:0051254'
  228. 'GO:0051276'
  229. 'GO:0051301'
  230. 'GO:0051321'
  231. 'GO:0051641'
  232. 'GO:0051649'
  233. 'GO:0051716'
  234. 'GO:0051726'
  235. 'GO:0051783'
  236. 'GO:0055086'
  237. 'GO:0055114'
  238. 'GO:0060249'
  239. 'GO:0060255'
  240. 'GO:0061024'
  241. 'GO:0065003'
  242. 'GO:0065004'
  243. 'GO:0065007'
  244. 'GO:0065008'
  245. 'GO:0065009'
  246. 'GO:0070271'
  247. 'GO:0070727'
  248. 'GO:0070887'
  249. 'GO:0071702'
  250. 'GO:0071704'
  251. 'GO:0071705'
  252. 'GO:0071822'
  253. 'GO:0071824'
  254. 'GO:0071840'
  255. 'GO:0072594'
  256. 'GO:0080090'
  257. 'GO:0090304'
  258. 'GO:0090305'
  259. 'GO:0090407'
  260. 'GO:0097659'
  261. 'GO:0098813'
  262. 'GO:1901135'
  263. 'GO:1901360'
  264. 'GO:1901361'
  265. 'GO:1901362'
  266. 'GO:1901564'
  267. 'GO:1901566'
  268. 'GO:1901575'
  269. 'GO:1901576'
  270. 'GO:1901605'
  271. 'GO:1901657'
  272. 'GO:1901987'
  273. 'GO:1902578'
  274. 'GO:1902580'
  275. 'GO:1902582'
  276. 'GO:1902589'
  277. 'GO:1902593'
  278. 'GO:1902679'
  279. 'GO:1902680'
  280. 'GO:1903046'
  281. 'GO:1903047'
  282. 'GO:1903506'
  283. 'GO:1903507'
  284. 'GO:1903508'
  285. 'GO:2000112'
  286. 'GO:2000113'
  287. 'GO:2001141'

In [16]:
goID <- "GO:0006914"
gene.universe <- genes(GOdata)
go.genes <- genesInTerm(GOdata, goID)[[1]]
sig.genes <- sigGenes(GOdata)

my.group <- new("classicCount", testStatistic = GOFisherTest, name = "fisher",
                 allMembers = gene.universe, groupMembers = go.genes,
                 sigMembers = sig.genes)
cont.table <- contTable(my.group)  # named so that base::t() is not masked

library(gridExtra)
grid.table(cont.table)

runTest(my.group)


0.000111024375808973
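
The single-term test above boils down to a one-sided Fisher exact test on a 2 x 2 contingency table. The sketch below reproduces that calculation in base R with made-up counts (the values of `N`, `m`, `k` and `x` are assumptions for illustration, not the counts behind the p-value printed above) and checks it against the equivalent hypergeometric tail.

```r
# Hypothetical counts
N <- 250  # genes in the universe
m <- 20   # genes annotated to the GO term
k <- 34   # genes in the community (the "significant" genes)
x <- 9    # genes that are both in the community and annotated to the term

# Rows: in community / not in community; columns: annotated / not annotated
tab <- matrix(c(x, m - x, k - x, N - m - k + x), nrow = 2,
              dimnames = list(community = c("yes", "no"),
                              annotated = c("yes", "no")))

# One-sided test for over-representation of the term in the community
p.fisher <- fisher.test(tab, alternative = "greater")$p.value

# Same quantity as a hypergeometric upper tail: P(X >= x)
p.hyper <- phyper(x - 1, m, N - m, k, lower.tail = FALSE)

c(fisher = p.fisher, hypergeometric = p.hyper)
```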

In [13]:
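# Test one community for GO term enrichment against the background gene list.
# Returns list(topGOdata object, named p-values passing the cutoff, significance subgraph).
enrichedGOTerms <- function(genes, allGenes, cutoff, correction, ont, mapping, ID, algorithm){
    # 1 = gene belongs to the community, 0 = background only
    interestingGenes <- factor(as.integer(allGenes %in% genes))
    names(interestingGenes) <- allGenes
    
    GOdata <- new("topGOdata", description="topGO object",
              ontology = ont, allGenes = interestingGenes,
              annotationFun = annFUN.org, mapping = mapping, 
              ID = ID, nodeSize = 10)
    
    result <- runTest(GOdata, algorithm = algorithm, statistic = "fisher")
    pvalues <- score(result)
    if (correction){
        # keep raw p-values of terms whose Benjamini-Hochberg adjusted p-value passes the cutoff
        GOs <- pvalues[p.adjust(pvalues, method="BH") <= cutoff]
    } else {
        GOs <- pvalues[pvalues <= cutoff]
    }
    
    # subgraph around the 10 most significant terms (named so that base plot() is not masked)
    sigGraph <- showSigOfNodes(GOdata, pvalues, firstSigNodes = 10, useInfo ='all', swPlot = FALSE)
    
    return(list(GOdata, GOs, sigGraph))
}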
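
The `correction` flag in the function above switches between thresholding raw p-values and thresholding Benjamini-Hochberg adjusted ones. A small sketch of the difference, using hypothetical term labels and p-values:

```r
# Hypothetical raw p-values for five GO terms
p.raw <- c("GO:A" = 0.0004, "GO:B" = 0.003, "GO:C" = 0.009,
           "GO:D" = 0.02,   "GO:E" = 0.4)
cutoff <- 0.01

# Without correction: threshold the raw p-values directly
raw.hits <- names(p.raw)[p.raw <= cutoff]

# With correction: threshold the Benjamini-Hochberg adjusted p-values
p.bh    <- p.adjust(p.raw, method = "BH")
bh.hits <- names(p.bh)[p.bh <= cutoff]

raw.hits
bh.hits
```

In this toy case the adjustment removes one borderline term, which is the kind of difference to expect between the two settings.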

In [14]:
enrichedGOs  <- sapply(g, enrichedGOTerms, allGenes=allGenes, 
                      cutoff=0.01, correction=FALSE, ont=ont, mapping=mapping, ID=ID, algorithm="elim")


Building most specific GOs .....
	( 571 GO terms found. )

Build GO DAG topology ..........
	( 1846 GO terms and 4079 relations. )

Annotating nodes ...............
	( 250 genes annotated to the GO terms. )

			 -- Elim Algorithm -- 

		 the algorithm is scoring 252 nontrivial nodes
		 parameters: 
			 test statistic: fisher
			 cutOff: 0.01

	 Level 13:	1 nodes to be scored	(0 eliminated genes)

	 Level 12:	4 nodes to be scored	(0 eliminated genes)

	 Level 11:	5 nodes to be scored	(0 eliminated genes)

	 Level 10:	5 nodes to be scored	(0 eliminated genes)

	 Level 9:	10 nodes to be scored	(0 eliminated genes)

	 Level 8:	13 nodes to be scored	(0 eliminated genes)

	 Level 7:	33 nodes to be scored	(20 eliminated genes)

	 Level 6:	46 nodes to be scored	(21 eliminated genes)

	 Level 5:	53 nodes to be scored	(28 eliminated genes)

	 Level 4:	44 nodes to be scored	(46 eliminated genes)

	 Level 3:	26 nodes to be scored	(46 eliminated genes)

	 Level 2:	11 nodes to be scored	(50 eliminated genes)

	 Level 1:	1 nodes to be scored	(50 eliminated genes)
Loading required package: Rgraphviz
Loading required package: grid

Attaching package: ‘grid’

The following object is masked from ‘package:topGO’:

    depth


Attaching package: ‘Rgraphviz’

The following objects are masked from ‘package:IRanges’:

    from, to

The following objects are masked from ‘package:S4Vectors’:

    from, to


Building most specific GOs .....
	( 571 GO terms found. )

Build GO DAG topology ..........
	( 1846 GO terms and 4079 relations. )

Annotating nodes ...............
	( 250 genes annotated to the GO terms. )

			 -- Elim Algorithm -- 

		 the algorithm is scoring 275 nontrivial nodes
		 parameters: 
			 test statistic: fisher
			 cutOff: 0.01

	 Level 13:	1 nodes to be scored	(0 eliminated genes)

	 Level 12:	4 nodes to be scored	(0 eliminated genes)

	 Level 11:	6 nodes to be scored	(0 eliminated genes)

	 Level 10:	6 nodes to be scored	(0 eliminated genes)

	 Level 9:	12 nodes to be scored	(0 eliminated genes)

	 Level 8:	19 nodes to be scored	(0 eliminated genes)

	 Level 7:	34 nodes to be scored	(0 eliminated genes)

	 Level 6:	51 nodes to be scored	(0 eliminated genes)

	 Level 5:	57 nodes to be scored	(26 eliminated genes)

	 Level 4:	47 nodes to be scored	(26 eliminated genes)

	 Level 3:	26 nodes to be scored	(26 eliminated genes)

	 Level 2:	11 nodes to be scored	(26 eliminated genes)

	 Level 1:	1 nodes to be scored	(26 eliminated genes)

Building most specific GOs .....
	( 571 GO terms found. )

Build GO DAG topology ..........
	( 1846 GO terms and 4079 relations. )

Annotating nodes ...............
	( 250 genes annotated to the GO terms. )

			 -- Elim Algorithm -- 

		 the algorithm is scoring 225 nontrivial nodes
		 parameters: 
			 test statistic: fisher
			 cutOff: 0.01

	 Level 13:	1 nodes to be scored	(0 eliminated genes)

	 Level 12:	2 nodes to be scored	(0 eliminated genes)

	 Level 11:	4 nodes to be scored	(0 eliminated genes)

	 Level 10:	4 nodes to be scored	(0 eliminated genes)

	 Level 9:	8 nodes to be scored	(0 eliminated genes)

	 Level 8:	14 nodes to be scored	(0 eliminated genes)

	 Level 7:	25 nodes to be scored	(0 eliminated genes)

	 Level 6:	37 nodes to be scored	(0 eliminated genes)

	 Level 5:	53 nodes to be scored	(36 eliminated genes)

	 Level 4:	44 nodes to be scored	(45 eliminated genes)

	 Level 3:	23 nodes to be scored	(45 eliminated genes)

	 Level 2:	9 nodes to be scored	(45 eliminated genes)

	 Level 1:	1 nodes to be scored	(45 eliminated genes)

Building most specific GOs .....
	( 571 GO terms found. )

Build GO DAG topology ..........
	( 1846 GO terms and 4079 relations. )

Annotating nodes ...............
	( 250 genes annotated to the GO terms. )

			 -- Elim Algorithm -- 

		 the algorithm is scoring 250 nontrivial nodes
		 parameters: 
			 test statistic: fisher
			 cutOff: 0.01

	 Level 12:	4 nodes to be scored	(0 eliminated genes)

	 Level 11:	6 nodes to be scored	(11 eliminated genes)

	 Level 10:	6 nodes to be scored	(11 eliminated genes)

	 Level 9:	12 nodes to be scored	(22 eliminated genes)

	 Level 8:	17 nodes to be scored	(22 eliminated genes)

	 Level 7:	29 nodes to be scored	(31 eliminated genes)

	 Level 6:	45 nodes to be scored	(73 eliminated genes)

	 Level 5:	58 nodes to be scored	(73 eliminated genes)

	 Level 4:	42 nodes to be scored	(92 eliminated genes)

	 Level 3:	22 nodes to be scored	(92 eliminated genes)

	 Level 2:	8 nodes to be scored	(92 eliminated genes)

	 Level 1:	1 nodes to be scored	(92 eliminated genes)

Building most specific GOs .....
	( 571 GO terms found. )

Build GO DAG topology ..........
	( 1846 GO terms and 4079 relations. )

Annotating nodes ...............
	( 250 genes annotated to the GO terms. )

			 -- Elim Algorithm -- 

		 the algorithm is scoring 276 nontrivial nodes
		 parameters: 
			 test statistic: fisher
			 cutOff: 0.01

	 Level 13:	1 nodes to be scored	(0 eliminated genes)

	 Level 12:	4 nodes to be scored	(0 eliminated genes)

	 Level 11:	6 nodes to be scored	(0 eliminated genes)

	 Level 10:	5 nodes to be scored	(12 eliminated genes)

	 Level 9:	11 nodes to be scored	(12 eliminated genes)

	 Level 8:	19 nodes to be scored	(12 eliminated genes)

	 Level 7:	33 nodes to be scored	(24 eliminated genes)

	 Level 6:	51 nodes to be scored	(28 eliminated genes)

	 Level 5:	60 nodes to be scored	(28 eliminated genes)

	 Level 4:	48 nodes to be scored	(28 eliminated genes)

	 Level 3:	26 nodes to be scored	(38 eliminated genes)

	 Level 2:	11 nodes to be scored	(38 eliminated genes)

	 Level 1:	1 nodes to be scored	(38 eliminated genes)

Building most specific GOs .....
	( 571 GO terms found. )

Build GO DAG topology ..........
	( 1846 GO terms and 4079 relations. )

Annotating nodes ...............
	( 250 genes annotated to the GO terms. )

			 -- Elim Algorithm -- 

		 the algorithm is scoring 271 nontrivial nodes
		 parameters: 
			 test statistic: fisher
			 cutOff: 0.01

	 Level 13:	1 nodes to be scored	(0 eliminated genes)

	 Level 12:	3 nodes to be scored	(0 eliminated genes)

	 Level 11:	4 nodes to be scored	(0 eliminated genes)

	 Level 10:	5 nodes to be scored	(0 eliminated genes)

	 Level 9:	11 nodes to be scored	(0 eliminated genes)

	 Level 8:	16 nodes to be scored	(0 eliminated genes)

	 Level 7:	34 nodes to be scored	(0 eliminated genes)

	 Level 6:	51 nodes to be scored	(0 eliminated genes)

	 Level 5:	60 nodes to be scored	(0 eliminated genes)

	 Level 4:	49 nodes to be scored	(10 eliminated genes)

	 Level 3:	25 nodes to be scored	(10 eliminated genes)

	 Level 2:	11 nodes to be scored	(10 eliminated genes)

	 Level 1:	1 nodes to be scored	(10 eliminated genes)
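
The "eliminated genes" counts in the log above come from the elim strategy: terms are scored from the most specific level upwards, and genes annotated to a significant term are removed from its ancestors before those are scored. The sketch below is a rough illustration of that idea on a two-term chain, not topGO's implementation; the gene names, annotations and the choice to leave the universe and significant list unchanged are all assumptions of this example.

```r
# One-sided Fisher / hypergeometric p-value for a single term
fisher.p <- function(term.genes, sig.genes, universe) {
    x <- length(intersect(term.genes, sig.genes))
    m <- length(term.genes)
    k <- length(sig.genes)
    N <- length(universe)
    phyper(x - 1, m, N - m, k, lower.tail = FALSE)
}

# Hypothetical data: 20 genes, 5 of them in the community
universe <- paste0("g", 1:20)
sig      <- paste0("g", 1:5)
annotation <- list(
    "GO:child"  = paste0("g", c(1:4, 6)),  # more specific term
    "GO:parent" = paste0("g", 1:10)        # inherits the child's genes
)
cutoff <- 0.01

# Classic: every term is tested independently
classic <- sapply(annotation, fisher.p, sig.genes = sig, universe = universe)

# Elim-style: most specific term first; in this chain every later term is an
# ancestor of the earlier one, so eliminated genes are removed from all of them
eliminated <- character()
elim <- numeric()
for (term in c("GO:child", "GO:parent")) {
    genes <- setdiff(annotation[[term]], eliminated)
    elim[term] <- fisher.p(genes, sig, universe)
    if (elim[term] <= cutoff) {
        eliminated <- union(eliminated, annotation[[term]])
    }
}

rbind(classic = classic, elim = elim[names(classic)])
```

Here the parent term looks moderately enriched under the classic test only because it inherits the child's genes; once those are eliminated, its p-value becomes large.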

In [23]:
lengths(enrichedGOs)


  1. 29
  2. 50
  3. 68
  4. 28
  5. 32
  6. 21
  7. 41
  8. 60
  9. 47
  10. 34
  11. 26

In [15]:
enrichedGOs[[2,1]]


GO:0006914  0.000111024375808973
GO:0006605  4.74925119630649e-05
GO:0016192  0.00087861623591962
GO:0007033  0.00200880989377714
GO:0007034  0.000101229244111927
GO:0015031  0.00717492505664814
GO:0072594  8.75731737522821e-05

In [17]:
lengths(g)


  1. 34
  2. 58
  3. 146
  4. 88
  5. 146
  6. 72
  7. 188
  8. 102
  9. 194
  10. 27
  11. 65
  12. 75
  13. 103
  14. 51
  15. 50
  16. 67
  17. 43
  18. 37
  19. 52
  20. 49

In [72]:
head(shortest.path)


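V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 ... V25 V26 V27 V28 V29 V30 V31 V32 V33 V34
 0  1  2  2  2  2  2  2  2   3 ...   3   3   3   4   5   5   3   4   2   3
 1  0  1  1  1  1  1  1  1   2 ...   2   2   2   3   4   4   2   3   1   2
 2  1  0  1  2  2  2  2  2   3 ...   3   3   3   4   5   5   3   4   2   1
 2  1  1  0  2  2  2  2  2   3 ...   3   3   3   4   5   5   3   4   2   1
 2  1  2  2  0  1  2  2  2   3 ...   3   3   3   4   5   5   3   4   2   3
 2  1  2  2  1  0  1  2  2   3 ...   2   3   3   3   4   4   3   4   2   3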

In [ ]:
mgeneSim(g[[1]], semData=scGO, measure="Wang")

In [78]:
mgoSim(names(enrichedGOs[[1]]), names(enrichedGOs[[2]]), semData=scGO, measure="Resnik", combine="BMA")


0.031

In [79]:
mgoSim(names(enrichedGOs[[1]]), names(enrichedGOs[[32]]), semData=scGO, measure="Resnik", combine="BMA")


0.075

In [55]:
clusterSim(g[[1]], g[[2]], semData=scGO, measure="Wang", combine=NULL)


           GO:0032049 GO:0006809 GO:0016226 GO:0045429 GO:0055114 GO:1901300 GO:0001402 GO:0006970 GO:0006972 GO:0007232 GO:0015918 GO:0035376 GO:0035690 GO:0033617 GO:0016050 GO:0016485 GO:0030433 GO:0031503 GO:0007097 GO:0031578
GO:0000077      0.121      0.094      0.121      0.050      0.161      0.191      0.311      0.282      0.242      0.371      0.094      0.087      0.289      0.072      0.144      0.049      0.280      0.067      0.049      0.112
GO:0001302      0.170      0.121      0.160      0.062      0.230      0.114      0.274      0.100      0.087      0.220      0.131      0.124      0.171      0.094      0.198      0.066      0.175      0.093      0.065      0.069
GO:0006457      0.129      0.198      0.276      0.093      0.191      0.079      0.211      0.186      0.163      0.172      0.100      0.098      0.305      0.156      0.374      0.119      0.120      0.181      0.117      0.045
GO:0033194      0.036      0.060      0.084      0.028      0.110      0.211      0.129      0.411      0.355      0.177      0.061      0.058      0.309      0.048      0.094      0.070      0.174      0.100      0.069      0.014
GO:0034599      0.073      0.111      0.143      0.059      0.089      0.326      0.250      0.334      0.290      0.351      0.053      0.050      0.582      0.087      0.173      0.059      0.296      0.081      0.058      0.030
GO:0042262      0.249      0.229      0.141      0.143      0.254      0.178      0.278      0.176      0.155      0.326      0.079      0.074      0.192      0.058      0.094      0.197      0.422      0.046      0.037      0.126
GO:0045454      0.160      0.114      0.147      0.176      0.198      0.206      0.478      0.089      0.078      0.395      0.119      0.112      0.156      0.090      0.176      0.061      0.166      0.082      0.060      0.125
GO:0061077      0.106      0.166      0.229      0.079      0.156      0.067      0.176      0.153      0.133      0.142      0.083      0.080      0.251      0.128      0.304      0.097      0.101      0.146      0.095      0.038
GO:0006378      0.151      0.243      0.143      0.137      0.117      0.035      0.073      0.052      0.044      0.059      0.032      0.029      0.091      0.054      0.100      0.259      0.197      0.046      0.034      0.082
GO:0016973      0.081      0.062      0.081      0.035      0.147      0.033      0.068      0.044      0.038      0.053      0.348      0.317      0.036      0.025      0.039      0.175      0.105      0.134      0.201      0.046
GO:0043488      0.085      0.100      0.141      0.137      0.175      0.067      0.166      0.089      0.077      0.135      0.053      0.050      0.069      0.043      0.078      0.279      0.106      0.082      0.059      0.085
GO:0045945      0.138      0.223      0.113      0.432      0.060      0.091      0.111      0.027      0.023      0.092      0.018      0.016      0.047      0.030      0.050      0.133      0.112      0.023      0.019      0.102
GO:1900152      0.085      0.138      0.076      0.195      0.059      0.061      0.116      0.026      0.022      0.096      0.017      0.015      0.048      0.030      0.050      0.109      0.196      0.023      0.018      0.226
GO:1900364      0.095      0.155      0.088      0.198      0.069      0.060      0.122      0.030      0.026      0.099      0.019      0.017      0.055      0.033      0.058      0.159      0.127      0.026      0.020      0.145
GO:0009228      0.265      0.322      0.158      0.198      0.180      0.070      0.127      0.038      0.033      0.105      0.058      0.053      0.071      0.046      0.075      0.085      0.170      0.034      0.027      0.071
GO:0000725      0.258      0.245      0.155      0.144      0.277      0.167      0.249      0.182      0.157      0.288      0.081      0.075      0.201      0.063      0.108      0.208      0.412      0.049      0.039      0.121
GO:0000730      0.171      0.196      0.309      0.109      0.220      0.115      0.188      0.156      0.129      0.206      0.061      0.055      0.167      0.309      0.210      0.157      0.281      0.043      0.033      0.096
GO:0043007      0.257      0.249      0.245      0.144      0.291      0.086      0.170      0.057      0.050      0.140      0.084      0.079      0.098      0.163      0.303      0.217      0.308      0.052      0.040      0.190
GO:0007533      0.201      0.137      0.172      0.079      0.342      0.089      0.180      0.067      0.059      0.148      0.100      0.095      0.104      0.064      0.114      0.110      0.207      0.062      0.048      0.072
GO:0008298      0.036      0.060      0.084      0.028      0.110      0.024      0.060      0.108      0.093      0.048      0.215      0.205      0.081      0.048      0.094      0.070      0.036      0.383      0.252      0.014
GO:0055085      0.186      0.126      0.162      0.067      0.224      0.127      0.285      0.098      0.086      0.235      0.530      0.501      0.173      0.100      0.196      0.068      0.188      0.193      0.222      0.078
GO:0000147      0.133      0.090      0.378      0.052      0.149      0.098      0.202      0.065      0.057      0.166      0.095      0.088      0.118      0.302      0.377      0.046      0.142      0.060      0.046      0.124
GO:0006897      0.034      0.058      0.086      0.026      0.115      0.022      0.059      0.113      0.096      0.046      0.316      0.299      0.082      0.046      0.097      0.070      0.033      0.235      0.262      0.012
GO:0030041      0.083      0.068      0.277      0.037      0.113      0.065      0.141      0.052      0.044      0.109      0.066      0.059      0.090      0.540      0.298      0.034      0.097      0.046      0.034      0.082
GO:0000288      0.125      0.216      0.126      0.119      0.105      0.029      0.062      0.046      0.038      0.049      0.027      0.024      0.080      0.044      0.087      0.183      0.285      0.040      0.029      0.109
GO:0034629      0.030      0.051      0.071      0.024      0.091      0.021      0.051      0.089      0.076      0.040      0.176      0.166      0.067      0.041      0.077      0.058      0.030      0.672      0.206      0.012
GO:0051654      0.034      0.054      0.076      0.027      0.095      0.023      0.055      0.094      0.081      0.044      0.196      0.186      0.072      0.045      0.082      0.063      0.034      0.195      0.645      0.013
GO:0006890      0.052      0.042      0.059      0.020      0.164      0.038      0.095      0.072      0.061      0.071      0.413      0.381      0.055      0.034      0.062      0.047      0.057      0.146      0.305      0.022
GO:0000086      0.115      0.088      0.115      0.045      0.165      0.083      0.195      0.072      0.061      0.152      0.093      0.085      0.123      0.065      0.139      0.046      0.127      0.065      0.046      0.186
GO:0006355      0.170      0.267      0.136      0.354      0.074      0.075      0.142      0.033      0.028      0.119      0.022      0.020      0.058      0.037      0.061      0.164      0.136      0.029      0.023      0.135
GO:0019918      0.132      0.156      0.153      0.080      0.136      0.031      0.073      0.060      0.050      0.056      0.033      0.030      0.097      0.050      0.109      0.273      0.212      0.052      0.037      0.074
GO:0034969      0.198      0.131      0.189      0.080      0.220      0.070      0.132      0.043      0.037      0.107      0.064      0.059      0.075      0.149      0.231      0.210      0.288      0.039      0.030      0.167
GO:0044257      0.186      0.184      0.167      0.104      0.138      0.041      0.085      0.061      0.052      0.070      0.038      0.035      0.108      0.064      0.119      0.343      0.610      0.055      0.041      0.259
GO:0051726      0.164      0.125      0.164      0.186      0.221      0.211      0.516      0.102      0.089      0.421      0.127      0.119      0.176      0.098      0.202      0.068      0.170      0.095      0.067      0.179
GO:0034605      0.074      0.113      0.147      0.059      0.092      0.176      0.255      0.472      0.408      0.414      0.054      0.051      0.352      0.088      0.178      0.061      0.253      0.083      0.060      0.030
GO:0008150      0.091      0.142      0.213      0.061      0.329      0.052      0.148      0.318      0.282      0.120      0.161      0.162      0.223      0.116      0.289      0.199      0.080      0.323      0.193      0.030
GO:0006446      0.184      0.302      0.155      0.375      0.086      0.080      0.153      0.038      0.033      0.129      0.026      0.024      0.071      0.047      0.075      0.201      0.181      0.034      0.028      0.175
GO:0019509      0.266      0.242      0.165      0.144      0.187      0.055      0.111      0.039      0.033      0.087      0.051      0.045      0.070      0.042      0.075      0.129      0.174      0.034      0.026      0.066
GO:0016579      0.168      0.174      0.170      0.091      0.153      0.035      0.081      0.067      0.057      0.064      0.039      0.036      0.106      0.057      0.120      0.466      0.323      0.060      0.043      0.124
GO:1902499      0.099      0.111      0.101      0.247      0.082      0.101      0.142      0.036      0.030      0.113      0.022      0.019      0.064      0.036      0.069      0.182      0.167      0.031      0.023      0.131
GO:0000055      0.046      0.033      0.114      0.019      0.113      0.035      0.075      0.050      0.043      0.058      0.285      0.260      0.040      0.090      0.088      0.035      0.050      0.101      0.344      0.031
GO:0006606      0.042      0.029      0.036      0.018      0.091      0.035      0.065      0.040      0.035      0.052      0.332      0.306      0.033      0.025      0.035      0.030      0.047      0.185      0.193      0.023
GO:0006999      0.068      0.107      0.301      0.054      0.092      0.047      0.113      0.090      0.077      0.091      0.051      0.048      0.153      0.384      0.550      0.059      0.069      0.083      0.057      0.077
GO:0051292      0.048      0.075      0.306      0.041      0.057      0.036      0.078      0.057      0.048      0.063      0.034      0.031      0.099      0.620      0.342      0.038      0.051      0.051      0.037      0.061
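
With `combine=NULL` the call above returns the full term-by-term matrix; with `combine="BMA"` (best-match average) the matrix is collapsed to one number. As I understand the definition, BMA averages the best match of every row and every column. The sketch below applies that formula by hand to the top-left 3 x 3 corner of the matrix printed above; it illustrates the formula only and does not call GOSemSim.

```r
# First three rows and columns of the pairwise Wang similarities printed above
sims <- matrix(c(0.121, 0.094, 0.121,
                 0.170, 0.121, 0.160,
                 0.129, 0.198, 0.276),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("GO:0000077", "GO:0001302", "GO:0006457"),
                               c("GO:0032049", "GO:0006809", "GO:0016226")))

# Best match for every term of the first set (rows) and of the second set (columns)
row.best <- apply(sims, 1, max)
col.best <- apply(sims, 2, max)

# Best-match average: mean over all of those best matches
bma <- (sum(row.best) + sum(col.best)) / (nrow(sims) + ncol(sims))
bma
```

Because rows and columns are treated alike, the score is symmetric in the two term sets.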

In [52]:
# stored under a new name so that the GOSemSim function clusterSim() is not masked
clusterSims <- mclusterSim(g, semData=scGO, measure="Wang", combine="BMA")

In [53]:
head(clusterSims)


1.000 0.603 0.638 0.574 0.654 0.627
0.603 1.000 0.715 0.641 0.701 0.748
0.638 0.715 1.000 0.660 0.726 0.718
0.574 0.641 0.660 1.000 0.642 0.626
0.654 0.701 0.726 0.642 1.000 0.676
0.627 0.748 0.718 0.626 0.676 1.000

In [54]:
head(shortest.path)


V1 V2 V3 V4 V5 V6
 0  1  1  1  1  1
 1  0  1  2  2  2
 1  1  0  1  1  1
 1  2  1  0  2  2
 1  2  1  2  0  2
 1  2  1  2  2  0
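
The notebook prints the community similarity matrix and the shortest-path matrix side by side but does not show how they are compared. One plausible option, offered here only as an assumption, is a rank correlation over the off-diagonal pairs. The sketch uses the two 6 x 6 matrices printed above.

```r
# Semantic similarity between communities (values from the output above)
sim <- matrix(c(1.000, 0.603, 0.638, 0.574, 0.654, 0.627,
                0.603, 1.000, 0.715, 0.641, 0.701, 0.748,
                0.638, 0.715, 1.000, 0.660, 0.726, 0.718,
                0.574, 0.641, 0.660, 1.000, 0.642, 0.626,
                0.654, 0.701, 0.726, 0.642, 1.000, 0.676,
                0.627, 0.748, 0.718, 0.626, 0.676, 1.000),
              nrow = 6, byrow = TRUE)

# Shortest-path distances between the corresponding neurons
d <- matrix(c(0, 1, 1, 1, 1, 1,
              1, 0, 1, 2, 2, 2,
              1, 1, 0, 1, 1, 1,
              1, 2, 1, 0, 2, 2,
              1, 2, 1, 2, 0, 2,
              1, 2, 1, 2, 2, 0),
            nrow = 6, byrow = TRUE)

# Each unordered pair once: the upper triangle without the diagonal
upper <- upper.tri(sim)

# Spearman correlation tolerates the many ties in the integer distances
rho <- cor(sim[upper], d[upper], method = "spearman")
rho
```

A negative value would indicate that communities mapped to nearby neurons tend to be more similar; with only 15 pairs the estimate is noisy either way.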

In [5]:
pathways <- read.table("../biochemical_pathways.tab", sep="\t")
cols <- c("pathway_name", "enzyme_name", "E.C._reaction_number", "gene_name", "reference")
colnames(pathways) <- cols

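# Map ORF identifiers to gene names; identifiers that cannot be mapped are skipped
toGene <- function(ORFIdentifiers){
    genes <- character()
    for (identifier in ORFIdentifiers){
        gene <- character()
        # try() guards against identifiers the annotation map rejects
        try(
            gene <- as.character(org.Sc.sgdGENENAME[identifier]),
            silent = TRUE
        )
        genes <- c(genes, gene)
    }
    return(genes)
}

# Map ORF identifiers to pathway identifiers; unmapped identifiers are skipped
toPath <- function(ORFIdentifiers){
    paths <- character()
    for (identifier in ORFIdentifiers){
        path <- character()
        try(
            path <- as.character(org.Sc.sgdPATH[identifier]),
            silent = TRUE
        )
        paths <- c(paths, path)
    }
    return(paths)
}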

get_pathways <- function(ORFIdentifiers, pathways) {
    genes  <- toGene(ORFIdentifiers)
    return(subset(pathways, gene_name %in% genes)$pathway_name)
}

get_pathway_genes <- function(ORFIdentifiers, pathways) {
    genes  <- toGene(ORFIdentifiers)
    return(subset(pathways, gene_name %in% genes)$gene_name)
}
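
The two lookup helpers above filter the pathway table by gene name. A toy version of that filter, assuming a hypothetical three-row pathway table and an already translated list of gene names:

```r
# Hypothetical miniature of the pathway table (only the two columns used)
pathways.toy <- data.frame(
    pathway_name = c("glycolysis", "glycolysis", "TCA cycle"),
    gene_name    = c("PGK1", "ENO1", "CIT1"),
    stringsAsFactors = FALSE
)

# Hypothetical community, already mapped from ORF identifiers to gene names
community.genes <- c("PGK1", "CIT1", "ACT1")

# Rows of the table whose gene belongs to the community
hits <- subset(pathways.toy, gene_name %in% community.genes)

hits$gene_name             # what get_pathway_genes() would return
table(hits$pathway_name)   # how often each pathway is hit
```

Genes with no row in the table (here `ACT1`) simply drop out, which is why some communities can end up with an empty pathway gene list.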

In [ ]:
pathway_list <- sapply(g, get_pathways, pathways)
pathway_genes <- sapply(g, get_pathway_genes, pathways)

In [140]:
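# cutOff and correction must be defined before this cell is run;
# algorithm="classic" matches the "Classic Algorithm" banner in the output below
enrichedGOsPathway <- sapply(pathway_genes[lengths(pathway_genes) > 0], enrichedGOTerms, allGenes=allGeneNames, 
                      cutoff=cutOff, correction=correction, ont=ont, mapping=mapping, ID=ID, algorithm="classic")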


Building most specific GOs .....
	( 1679 GO terms found. )

Build GO DAG topology ..........
	( 3637 GO terms and 8228 relations. )

Annotating nodes ...............
	( 1439 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 92 nontrivial nodes
		 parameters: 
			 test statistic: fisher

Building most specific GOs .....
	( 1679 GO terms found. )

Build GO DAG topology ..........
	( 3637 GO terms and 8228 relations. )

Annotating nodes ...............
	( 1439 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 67 nontrivial nodes
		 parameters: 
			 test statistic: fisher

Building most specific GOs .....
	( 1679 GO terms found. )

Build GO DAG topology ..........
	( 3637 GO terms and 8228 relations. )

Annotating nodes ...............
	( 1439 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 264 nontrivial nodes
		 parameters: 
			 test statistic: fisher

Building most specific GOs .....
	( 1679 GO terms found. )

Build GO DAG topology ..........
	( 3637 GO terms and 8228 relations. )

Annotating nodes ...............
	( 1439 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 209 nontrivial nodes
		 parameters: 
			 test statistic: fisher

Building most specific GOs .....
	( 1679 GO terms found. )

Build GO DAG topology ..........
	( 3637 GO terms and 8228 relations. )

Annotating nodes ...............
	( 1439 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 126 nontrivial nodes
		 parameters: 
			 test statistic: fisher

Building most specific GOs .....
	( 1679 GO terms found. )

Build GO DAG topology ..........
	( 3637 GO terms and 8228 relations. )

Annotating nodes ...............
	( 1439 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 61 nontrivial nodes
		 parameters: 
			 test statistic: fisher

Building most specific GOs .....
	( 1679 GO terms found. )

Build GO DAG topology ..........
	( 3637 GO terms and 8228 relations. )

Annotating nodes ...............
	( 1439 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 221 nontrivial nodes
		 parameters: 
			 test statistic: fisher

Building most specific GOs .....
	( 1679 GO terms found. )

Build GO DAG topology ..........
	( 3637 GO terms and 8228 relations. )

Annotating nodes ...............
	( 1439 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 96 nontrivial nodes
		 parameters: 
			 test statistic: fisher

Building most specific GOs .....
	( 1679 GO terms found. )

Build GO DAG topology ..........
	( 3637 GO terms and 8228 relations. )

Annotating nodes ...............
	( 1439 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 168 nontrivial nodes
		 parameters: 
			 test statistic: fisher

Building most specific GOs .....
	( 1679 GO terms found. )

Build GO DAG topology ..........
	( 3637 GO terms and 8228 relations. )

Annotating nodes ...............
	( 1439 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 44 nontrivial nodes
		 parameters: 
			 test statistic: fisher

Building most specific GOs .....
	( 1679 GO terms found. )

Build GO DAG topology ..........
	( 3637 GO terms and 8228 relations. )

Annotating nodes ...............
	( 1439 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 30 nontrivial nodes
		 parameters: 
			 test statistic: fisher

Building most specific GOs .....
	( 1679 GO terms found. )

Build GO DAG topology ..........
	( 3637 GO terms and 8228 relations. )

Annotating nodes ...............
	( 1439 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 95 nontrivial nodes
		 parameters: 
			 test statistic: fisher

Building most specific GOs .....
	( 1679 GO terms found. )

Build GO DAG topology ..........
	( 3637 GO terms and 8228 relations. )

Annotating nodes ...............
	( 1439 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 178 nontrivial nodes
		 parameters: 
			 test statistic: fisher

Building most specific GOs .....
	( 1679 GO terms found. )

Build GO DAG topology ..........
	( 3637 GO terms and 8228 relations. )

Annotating nodes ...............
	( 1439 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 129 nontrivial nodes
		 parameters: 
			 test statistic: fisher

Building most specific GOs .....
	( 1679 GO terms found. )

Build GO DAG topology ..........
	( 3637 GO terms and 8228 relations. )

Annotating nodes ...............
	( 1439 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 82 nontrivial nodes
		 parameters: 
			 test statistic: fisher

Building most specific GOs .....
	( 1679 GO terms found. )

Build GO DAG topology ..........
	( 3637 GO terms and 8228 relations. )

Annotating nodes ...............
	( 1439 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 191 nontrivial nodes
		 parameters: 
			 test statistic: fisher

Building most specific GOs .....
	( 1679 GO terms found. )

Build GO DAG topology ..........
	( 3637 GO terms and 8228 relations. )

Annotating nodes ...............
	( 1439 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 105 nontrivial nodes
		 parameters: 
			 test statistic: fisher

Building most specific GOs .....
	( 1679 GO terms found. )

Build GO DAG topology ..........
	( 3637 GO terms and 8228 relations. )

Annotating nodes ...............
	( 1439 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 41 nontrivial nodes
		 parameters: 
			 test statistic: fisher

Building most specific GOs .....
	( 1679 GO terms found. )

Build GO DAG topology ..........
	( 3637 GO terms and 8228 relations. )

Annotating nodes ...............
	( 1439 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 80 nontrivial nodes
		 parameters: 
			 test statistic: fisher

Building most specific GOs .....
	( 1679 GO terms found. )

Build GO DAG topology ..........
	( 3637 GO terms and 8228 relations. )

Annotating nodes ...............
	( 1439 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 77 nontrivial nodes
		 parameters: 
			 test statistic: fisher

In [141]:
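# Index over the pathway-level enrichment results; seq_along() is safe for
# empty lists and avoids masking base::range()
idx <- seq_along(enrichedGOsPathway)

# Pairwise Wang semantic similarity between enriched GO term sets (best-match average)
simsPathway <- sapply(idx, function(i) sapply(idx, function(j) 
                    mgoSim(names(enrichedGOsPathway[[i]]),
                        names(enrichedGOsPathway[[j]]),
                        semData=scGO, measure="Wang", combine="BMA")))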

In [142]:
head(simsPathway)


1.000 0.763 0.843 0.824 0.859 0.763 0.836 0.772 0.843 0.524 0.579 0.743 0.801 0.727 0.790 0.791 0.788 0.628 0.871 0.665
0.763 1.000 0.769 0.707 0.719 0.742 0.718 0.745 0.822 0.584 0.727 0.804 0.856 0.649 0.631 0.653 0.707 0.466 0.723 0.565
0.843 0.769 1.000 0.961 0.765 0.801 0.930 0.788 0.844 0.534 0.570 0.713 0.862 0.669 0.827 0.829 0.898 0.450 0.886 0.678
0.824 0.707 0.961 1.000 0.741 0.763 0.926 0.760 0.823 0.509 0.533 0.664 0.828 0.659 0.857 0.828 0.902 0.448 0.872 0.675
0.859 0.719 0.765 0.741 1.000 0.595 0.740 0.618 0.772 0.453 0.481 0.657 0.775 0.700 0.710 0.755 0.686 0.590 0.724 0.558
0.763 0.742 0.801 0.763 0.595 1.000 0.783 0.746 0.773 0.550 0.662 0.667 0.722 0.558 0.671 0.619 0.776 0.406 0.785 0.697
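
The combine="BMA" option used above refers to the best-match average strategy: every term in one set is paired with its most similar term in the other set, and the best-match scores from both directions are averaged. The following is a minimal base-R sketch of that idea on a toy matrix. The function name `bma` and the matrix `toy` are illustrative assumptions, not part of GOSemSim, and the package's own implementation may differ in details such as NA handling.

```r
# Best-match average (BMA) over a term-by-term similarity matrix.
# `sim` is a hypothetical matrix: rows are terms of set A, columns are terms of set B.
bma <- function(sim) {
  row_best <- apply(sim, 1, max)  # best partner in B for each term in A
  col_best <- apply(sim, 2, max)  # best partner in A for each term in B
  (sum(row_best) + sum(col_best)) / (nrow(sim) + ncol(sim))
}

# Toy similarities (filled column-wise): a1/b1 = 0.5, a2/b1 = 0.1, a1/b2 = 0.3, a2/b2 = 0.9
toy <- matrix(c(0.5, 0.1,
                0.3, 0.9), nrow = 2,
              dimnames = list(c("a1", "a2"), c("b1", "b2")))
bma_toy <- bma(toy)
print(bma_toy)
```

Because only the best matches count, two term sets can score highly even when most individual term pairs are dissimilar, which is worth keeping in mind when reading the matrix above.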

In [118]:
head(shortest.path)


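 V1  V2  V3  V4  V5  V6  V7  V8  V9 V10 V22 V23 V24 V25 V26 V27 V28 V29 V30 V31
  0   1   1   1   1   1   2   2   2   3   4   4   5   4   5   5   6   6   7   7
  1   0   1   2   2   2   3   3   2   4   5   5   6   5   6   6   7   7   8   8
  1   1   0   1   1   2   2   3   1   3   5   5   5   4   5   5   6   6   7   7
  1   2   1   0   2   1   1   2   1   2   4   4   4   3   4   4   5   5   6   6
  1   2   1   2   0   2   3   3   2   4   5   5   6   5   6   6   7   7   8   8
  1   2   2   1   2   0   1   1   2   2   3   3   4   3   4   4   5   5   6   6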

In [151]:
enrichedGOs[[1]]


GO:0006508  0.0305666626725701
GO:0009057  0.0268263523123756
GO:0009896  0.0275243318688756
GO:0019538  0.037645201260442
GO:0030163  0.038000496959972
GO:0035966  0.0369152752689263
GO:0035967  0.0232568990511012
GO:0042176  0.0192941639335425
GO:0043632  0.0317229083283707
GO:0070646  0.00379354097137652
GO:0070647  0.0151356737247353

In [ ]:
geneSimilarities <- sapply(allGenes, function(i) sapply(allGenes, function(j) geneSim(i, j, semData=scGO, combine="BMA")))

In [ ]:
geneSimilarities

In [6]:
cutOff <- 0.05

filename <- sprintf("%s-%s-%s-%s.rda", file, p, cutOff, ont)

if (file.exists(filename)){
    
    print(sprintf("loading: %s", filename))
    load(filename)
    print("loaded")
    
} else {
    
    print("creating topGO objects")

    geneLists <- vector("list", numCom) 
    GOdataObjects <- vector("list", numCom) 
    resultFishers <- vector("list", numCom) 
    results <- vector("list", numCom) 
    gos <- vector("list", numCom) 

    #perform enrichment analyses
    for (c in 1:numCom){

        #factor of interesting genes
        geneList <- factor(as.integer(allGenes %in% g[[c]]))
        names(geneList) <- allGenes
        geneLists[[c]] <- geneList

        #construct topGO object
        GOdata <- new("topGOdata", description=sprintf("topGO object for community %s", c),
                      ontology = ont, allGenes = geneList,
                      annotationFun = annFUN.org, mapping = mapping, 
                      ID = ID, nodeSize = 10)
        GOdataObjects[[c]] <- GOdata

        #fishers exact test classic
        resultFisher <- runTest(GOdata, algorithm = "classic", statistic = "fisher")
        resultFishers[[c]] <- resultFisher

        #tabulate results
        allRes <- GenTable(GOdata, classicFisher = resultFisher,
                      orderBy = "classicFisher")
        results[[c]] <- allRes
        
        #keep GO terms whose Benjamini-Hochberg adjusted p-value is <= cutOff
        #(the values stored in gos are the raw, unadjusted p-values)
        gos[[c]] <- score(resultFisher)[which(p.adjust(score(resultFisher), method="BH") <= cutOff)]

        print(sprintf("community %s complete", c))
    }
    
    print(sprintf("Saving data: %s", filename))
    save(geneLists, GOdataObjects, resultFishers, results, gos, file=filename)
    print("saved")
}


[1] "creating topGO objects"
Building most specific GOs .....
	( 1689 GO terms found. )

Build GO DAG topology ..........
	( 3643 GO terms and 8240 relations. )

Annotating nodes ...............
	( 1558 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 355 nontrivial nodes
		 parameters: 
			 test statistic: fisher
[1] "community 1 complete"
Building most specific GOs .....
	( 1689 GO terms found. )

Build GO DAG topology ..........
	( 3643 GO terms and 8240 relations. )

Annotating nodes ...............
	( 1558 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 581 nontrivial nodes
		 parameters: 
			 test statistic: fisher
[1] "community 2 complete"
Building most specific GOs .....
	( 1689 GO terms found. )

Build GO DAG topology ..........
	( 3643 GO terms and 8240 relations. )

Annotating nodes ...............
	( 1558 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 835 nontrivial nodes
		 parameters: 
			 test statistic: fisher
[1] "community 3 complete"
Building most specific GOs .....
	( 1689 GO terms found. )

Build GO DAG topology ..........
	( 3643 GO terms and 8240 relations. )

Annotating nodes ...............
	( 1558 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 681 nontrivial nodes
		 parameters: 
			 test statistic: fisher
[1] "community 4 complete"
Building most specific GOs .....
	( 1689 GO terms found. )

Build GO DAG topology ..........
	( 3643 GO terms and 8240 relations. )

Annotating nodes ...............
	( 1558 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 586 nontrivial nodes
		 parameters: 
			 test statistic: fisher
[1] "community 5 complete"
Building most specific GOs .....
	( 1689 GO terms found. )

Build GO DAG topology ..........
	( 3643 GO terms and 8240 relations. )

Annotating nodes ...............
	( 1558 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 644 nontrivial nodes
		 parameters: 
			 test statistic: fisher
[1] "community 6 complete"
Building most specific GOs .....
	( 1689 GO terms found. )

Build GO DAG topology ..........
	( 3643 GO terms and 8240 relations. )

Annotating nodes ...............
	( 1558 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 877 nontrivial nodes
		 parameters: 
			 test statistic: fisher
[1] "community 7 complete"
Building most specific GOs .....
	( 1689 GO terms found. )

Build GO DAG topology ..........
	( 3643 GO terms and 8240 relations. )

Annotating nodes ...............
	( 1558 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 706 nontrivial nodes
		 parameters: 
			 test statistic: fisher
[1] "community 8 complete"
Building most specific GOs .....
	( 1689 GO terms found. )

Build GO DAG topology ..........
	( 3643 GO terms and 8240 relations. )

Annotating nodes ...............
	( 1558 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 831 nontrivial nodes
		 parameters: 
			 test statistic: fisher
[1] "community 9 complete"
Building most specific GOs .....
	( 1689 GO terms found. )

Build GO DAG topology ..........
	( 3643 GO terms and 8240 relations. )

Annotating nodes ...............
	( 1558 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 408 nontrivial nodes
		 parameters: 
			 test statistic: fisher
[1] "community 10 complete"
Building most specific GOs .....
	( 1689 GO terms found. )

Build GO DAG topology ..........
	( 3643 GO terms and 8240 relations. )

Annotating nodes ...............
	( 1558 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 516 nontrivial nodes
		 parameters: 
			 test statistic: fisher
[1] "community 11 complete"
Building most specific GOs .....
	( 1689 GO terms found. )

Build GO DAG topology ..........
	( 3643 GO terms and 8240 relations. )

Annotating nodes ...............
	( 1558 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 567 nontrivial nodes
		 parameters: 
			 test statistic: fisher
[1] "community 12 complete"
Building most specific GOs .....
	( 1689 GO terms found. )

Build GO DAG topology ..........
	( 3643 GO terms and 8240 relations. )

Annotating nodes ...............
	( 1558 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 706 nontrivial nodes
		 parameters: 
			 test statistic: fisher
[1] "community 13 complete"
Building most specific GOs .....
	( 1689 GO terms found. )

Build GO DAG topology ..........
	( 3643 GO terms and 8240 relations. )

Annotating nodes ...............
	( 1558 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 531 nontrivial nodes
		 parameters: 
			 test statistic: fisher
[1] "community 14 complete"
Building most specific GOs .....
	( 1689 GO terms found. )

Build GO DAG topology ..........
	( 3643 GO terms and 8240 relations. )

Annotating nodes ...............
	( 1558 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 473 nontrivial nodes
		 parameters: 
			 test statistic: fisher
[1] "community 15 complete"
Building most specific GOs .....
	( 1689 GO terms found. )

Build GO DAG topology ..........
	( 3643 GO terms and 8240 relations. )

Annotating nodes ...............
	( 1558 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 612 nontrivial nodes
		 parameters: 
			 test statistic: fisher
[1] "community 16 complete"
Building most specific GOs .....
	( 1689 GO terms found. )

Build GO DAG topology ..........
	( 3643 GO terms and 8240 relations. )

Annotating nodes ...............
	( 1558 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 577 nontrivial nodes
		 parameters: 
			 test statistic: fisher
[1] "community 17 complete"
Building most specific GOs .....
	( 1689 GO terms found. )

Build GO DAG topology ..........
	( 3643 GO terms and 8240 relations. )

Annotating nodes ...............
	( 1558 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 501 nontrivial nodes
		 parameters: 
			 test statistic: fisher
[1] "community 18 complete"
Building most specific GOs .....
	( 1689 GO terms found. )

Build GO DAG topology ..........
	( 3643 GO terms and 8240 relations. )

Annotating nodes ...............
	( 1558 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 463 nontrivial nodes
		 parameters: 
			 test statistic: fisher
[1] "community 19 complete"
Building most specific GOs .....
	( 1689 GO terms found. )

Build GO DAG topology ..........
	( 3643 GO terms and 8240 relations. )

Annotating nodes ...............
	( 1558 genes annotated to the GO terms. )

			 -- Classic Algorithm -- 

		 the algorithm is scoring 602 nontrivial nodes
		 parameters: 
			 test statistic: fisher
[1] "community 20 complete"
[1] "Saving data: yeast_union-0.7-0.05-BP.rda"
[1] "saved"
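
The filtering step in the loop above keeps a term only if its Benjamini-Hochberg adjusted p-value passes the cut-off, while storing the raw p-value. The sketch below reproduces that pattern with base R on made-up numbers. The term labels and p-values are hypothetical and are not taken from the enrichment results.

```r
# Hypothetical raw p-values keyed by placeholder term labels
raw_p <- c(termA = 0.001, termB = 0.004, termC = 0.04, termD = 0.2, termE = 0.6)
cut_off <- 0.05

# Benjamini-Hochberg adjustment across all tested terms
adjusted <- p.adjust(raw_p, method = "BH")

# Select on the adjusted values but keep the raw p-values, as the notebook does
significant <- raw_p[adjusted <= cut_off]
print(round(adjusted, 4))
print(names(significant))
```

In this toy case termC has a raw p-value below 0.05 but is dropped after adjustment, which is the situation the correction is meant to handle.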

In [7]:
print_accession_number <- function(terms, file){
    for (s in strsplit(names(terms), ":")){
        write(s[2], file=file, append=TRUE)
    }
}

In [8]:
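#write the accession numbers of each community's enriched GO terms to its own file
#(print_accession_number appends, so delete old files before re-running)
for (i in seq_along(gos)){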
    accessionFile <- sprintf("accession_numbers-%s-%s-%s", cutOff, ont, i)
    print_accession_number(gos[[i]], file=accessionFile)
}
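
As an alternative to splitting each identifier and appending line by line, the "GO:" prefix can be stripped in one vectorised call and the file written in one go. This is a sketch only: the p-values are illustrative, and a temporary file stands in for the accession_numbers-* files used above.

```r
# Named vector shaped like one element of `gos`; the values are illustrative only
terms <- c("GO:0006508" = 0.03, "GO:0009057" = 0.027)

# Strip the "GO:" prefix from every identifier at once
accession <- sub("^GO:", "", names(terms))

# writeLines() overwrites the target, so re-running does not duplicate entries
out_file <- tempfile("accession_numbers-")
writeLines(accession, out_file)
written <- readLines(out_file)
print(written)
```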

In [10]:
wangAllGeneSim <- mgeneSim(allGenes, semData=scGO, measure="Wang", combine="BMA", verbose=TRUE)


  |======================================================================| 100%

In [11]:
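# Turn similarities in (0, 1] into distances with -log(); a similarity of exactly 0 would give Inf
clusters <- hclust(as.dist(-log(wangAllGeneSim)))
# Cut the dendrogram into as many clusters as there are communities
clusterCut <- cutree(clusters, numCom)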
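
The clustering step converts a similarity matrix into distances before calling hclust(). The toy example below, using made-up gene labels and similarities, shows the -log() transform next to the simpler 1 - similarity transform. Which transform suits the real data is a judgment call; this is only a sketch of the mechanics.

```r
# Toy symmetric similarity matrix for four hypothetical genes
genes <- c("A", "B", "C", "D")
sim <- matrix(0.1, nrow = 4, ncol = 4, dimnames = list(genes, genes))
sim["A", "B"] <- sim["B", "A"] <- 0.9
sim["C", "D"] <- sim["D", "C"] <- 0.8
diag(sim) <- 1

# Two ways to turn similarity into distance
dist_log <- as.dist(-log(sim))  # unbounded; Inf if any similarity is 0
dist_lin <- as.dist(1 - sim)    # bounded in [0, 1]

# Complete-linkage clustering (the hclust default), cut into two groups
cut_log <- cutree(hclust(dist_log), k = 2)
cut_lin <- cutree(hclust(dist_lin), k = 2)
print(cut_log)
print(cut_lin)
```

Both transforms are monotone decreasing in the similarity, so they preserve the ordering of pairs. They can still give different trees on real data, because -log() stretches the distances between weakly similar pairs.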

In [12]:
plot(clusters)



In [13]:
assignedCommunities <- numeric(length(allGenes))
names(assignedCommunities) <- allGenes

for (i in 1:numCom){
    for (geneName in g[[i]]){
        assignedCommunities[geneName] <- i
    }
}

In [14]:
library(NMI)

In [15]:
assignedCommunities <- assignedCommunities[names(assignedCommunities) %in% names(clusterCut)]

In [16]:
assignedCommunitiesDF <- data.frame(assignedCommunities)
assignedCommunitiesDF <- cbind(Row.Names = rownames(assignedCommunitiesDF), assignedCommunitiesDF)

In [17]:
clusterCutDF <- data.frame(clusterCut)
clusterCutDF <- cbind(Row.Names = rownames(clusterCutDF), clusterCutDF)

In [18]:
NMI(assignedCommunitiesDF, clusterCutDF)


$value = 0.0711074542082947
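
For orientation, normalised mutual information compares two labelings of the same items: values near 1 mean the partitions agree up to relabelling, and values near 0 mean they are close to independent. The sketch below computes one common variant in base R, normalising by the mean of the two entropies. That normalisation is an assumption here; the NMI package may use a different convention, so the numbers are not guaranteed to match its output.

```r
# Normalised mutual information between two labelings of the same items.
# Normalisation by the mean entropy is an assumed convention.
nmi <- function(x, y) {
  stopifnot(length(x) == length(y))
  joint <- unclass(table(x, y)) / length(x)  # joint label distribution
  px <- rowSums(joint)
  py <- colSums(joint)
  expected <- outer(px, py)                  # distribution under independence
  nz <- joint > 0
  mi <- sum(joint[nz] * log(joint[nz] / expected[nz]))
  entropy <- function(p) -sum(p[p > 0] * log(p[p > 0]))
  2 * mi / (entropy(px) + entropy(py))
}

same <- nmi(c(1, 1, 2, 2), c("b", "b", "a", "a"))       # same partition, relabelled
unrelated <- nmi(c(1, 1, 2, 2), c("a", "b", "a", "b"))  # independent partitions
print(c(same = same, unrelated = unrelated))
```

The function returns NaN when both labelings place every item in a single group, since both entropies are then zero.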

In [98]:
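#For each enriched term, count how many other enriched terms have it as an ancestor,
#then scale by the largest count so the best-supported term scores 1.
#Note: GOBPANCESTOR is hard-coded, so this assumes the BP ontology.
most_representative_term_weighted <- function(namedTerms){
    
    counts <- numeric(length(namedTerms))
    names(counts) <- names(namedTerms)

    for (term in names(namedTerms)) {
        ancestors <- as.list(GOBPANCESTOR[term])
        for (ancestor in ancestors[[term]]) {
            #only ancestors that are themselves enriched are counted
            if (ancestor %in% names(counts)) {
                counts[ancestor] <- counts[ancestor] + 1
            }
        }

    }
    #alternative normalisation: counts / sum(counts)
    return (sort(counts / max(counts), decreasing=TRUE))
}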

In [33]:
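#Return the enriched term that is an ancestor of the largest number of other enriched terms.
#Note: GOBPANCESTOR is hard-coded, so this assumes the BP ontology.
most_representative_term_ancestor <- function(namedTerms){
    
    counts <- numeric(length(namedTerms))
    names(counts) <- names(namedTerms)

    for (term in names(namedTerms)) {
        ancestors <- as.list(GOBPANCESTOR[term])
        for (ancestor in ancestors[[term]]) {
            #only ancestors that are themselves enriched are counted
            if (ancestor %in% names(counts)) {
                counts[ancestor] <- counts[ancestor] + 1
            }
        }

    }
    #dividing by sum(counts) does not change the ordering; ties resolve to the first term
    return (names(sort(counts / sum(counts), decreasing=TRUE)[1]))
}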
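
The counting logic in the two functions above can be tried without GO.db by replacing GOBPANCESTOR with a small hand-made ancestor list. Everything in this sketch is hypothetical: the term labels, the toy hierarchy, and the helper name `count_enriched_descendants`.

```r
# Hypothetical ancestor lists standing in for GOBPANCESTOR
toy_ancestors <- list(
  "T:A" = c("T:root"),
  "T:B" = c("T:A", "T:root"),
  "T:C" = c("T:A", "T:root"),
  "T:D" = c("T:root")
)

# For each enriched term, count the enriched terms that descend from it
count_enriched_descendants <- function(terms, ancestors) {
  counts <- setNames(numeric(length(terms)), terms)
  for (term in terms) {
    hits <- intersect(ancestors[[term]], terms)  # enriched ancestors only
    counts[hits] <- counts[hits] + 1
  }
  counts
}

# "T:root" is deliberately not enriched, so it is never counted
enriched <- c("T:A", "T:B", "T:C", "T:D")
counts <- count_enriched_descendants(enriched, toy_ancestors)
representative <- names(which.max(counts))
print(counts)
print(representative)
```

Because a term gains a count from every enriched descendant, this heuristic tends to favour general terms high in the hierarchy.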

In [34]:
representativeTermsAncestor <- sapply(Filter(length, gos), most_representative_term_ancestor)

In [35]:
select(GO.db, keys=representativeTermsAncestor, columns=c("TERM", "DEFINITION"))


'select()' returned many:1 mapping between keys and columns
GOID TERM DEFINITION
GO:0006644 phospholipid metabolic process The chemical reactions and pathways involving phospholipids, any lipid containing phosphoric acid as a mono- or diester.
GO:0007049 cell cycle The progression of biochemical and morphological phases and events that occur in a cell during successive cell replication or nuclear replication events. Canonically, the cell cycle comprises the replication and segregation of genetic material followed by the division of the cell, but in endocycles or syncytial cells nuclear replication or nuclear division may not be followed by cell division.
GO:0044699 single-organism process A biological process that involves only one organism.
GO:0044710 single-organism metabolic process A metabolic process - chemical reactions and pathways, including anabolism and catabolism, by which living organisms transform chemical substances - which involves a single organism.
GO:0051128 regulation of cellular component organization Any process that modulates the frequency, rate or extent of a process involved in the formation, arrangement of constituent parts, or disassembly of cell structures, including the plasma membrane and any external encapsulating structures such as the cell wall and cell envelope.
GO:0034641 cellular nitrogen compound metabolic process The chemical reactions and pathways involving various organic and inorganic nitrogenous compounds, as carried out by individual cells.
GO:0006725 cellular aromatic compound metabolic process The chemical reactions and pathways involving aromatic compounds, any organic compound characterized by one or more planar rings, each of which contains conjugated double bonds and delocalized pi electrons, as carried out by individual cells.
GO:0044710 single-organism metabolic process A metabolic process - chemical reactions and pathways, including anabolism and catabolism, by which living organisms transform chemical substances - which involves a single organism.
GO:0044238 primary metabolic process The chemical reactions and pathways involving those compounds which are formed as a part of the normal anabolic and catabolic processes. These processes take place in most, if not all, cells of the organism.
GO:0044699 single-organism process A biological process that involves only one organism.
GO:0005975 carbohydrate metabolic process The chemical reactions and pathways involving carbohydrates, any of a group of organic compounds based of the general formula Cx(H2O)y. Includes the formation of carbohydrate derivatives by the addition of a carbohydrate residue to another molecule.
GO:0044710 single-organism metabolic process A metabolic process - chemical reactions and pathways, including anabolism and catabolism, by which living organisms transform chemical substances - which involves a single organism.
GO:0030029 actin filament-based process Any cellular process that depends upon or alters the actin cytoskeleton, that part of the cytoskeleton comprising actin filaments and their associated proteins.
GO:0051726 regulation of cell cycle Any process that modulates the rate or extent of progression through the cell cycle.
GO:0016070 RNA metabolic process The cellular chemical reactions and pathways involving RNA, ribonucleic acid, one of the two main type of nucleic acid, consisting of a long, unbranched macromolecule formed from ribonucleotides joined in 3',5'-phosphodiester linkage.
GO:0044085 cellular component biogenesis A process that results in the biosynthesis of constituent macromolecules, assembly, and arrangement of constituent parts of a cellular component. Includes biosynthesis of constituent macromolecules, and those macromolecular modifications that are involved in synthesis or assembly of the cellular component.
GO:0044763 single-organism cellular process Any process that is carried out at the cellular level, occurring within a single organism.
GO:0048518 positive regulation of biological process Any process that activates or increases the frequency, rate or extent of a biological process. Biological processes are regulated by many means; examples include the control of gene expression, protein modification or interaction with a protein or substrate molecule.
GO:0016070 RNA metabolic process The cellular chemical reactions and pathways involving RNA, ribonucleic acid, one of the two main type of nucleic acid, consisting of a long, unbranched macromolecule formed from ribonucleotides joined in 3',5'-phosphodiester linkage.
GO:0051716 cellular response to stimulus Any process that results in a change in state or activity of a cell (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of a stimulus. The process begins with detection of the stimulus by a cell and ends with a change in state or activity or the cell.
GO:0009893 positive regulation of metabolic process Any process that activates or increases the frequency, rate or extent of the chemical reactions and pathways within a cell or an organism.
GO:0009056 catabolic process The chemical reactions and pathways resulting in the breakdown of substances, including the breakdown of carbon compounds with the liberation of energy for use by the cell or organism.
GO:0042221 response to chemical Any process that results in a change in state or activity of a cell or an organism (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of a chemical stimulus.
GO:0051179 localization Any process in which a cell, a substance, or a cellular entity, such as a protein complex or organelle, is transported, tethered to or otherwise maintained in a specific location. In the case of substances, localization may also be achieved via selective degradation.
GO:0061024 membrane organization A process which results in the assembly, arrangement of constituent parts, or disassembly of a membrane. A membrane is a double layer of lipid molecules that encloses all cells, and, in eukaryotes, many organelles; may be a single or double lipid bilayer; also includes associated proteins.
GO:0050789 regulation of biological process Any process that modulates the frequency, rate or extent of a biological process. Biological processes are regulated by many means; examples include the control of gene expression, protein modification or interaction with a protein or substrate molecule.
GO:0006629 lipid metabolic process The chemical reactions and pathways involving lipids, compounds soluble in an organic solvent but not, or sparingly, in an aqueous solvent. Includes fatty acids; neutral fats, other fatty-acid esters, and soaps; long-chain (fatty) alcohols and waxes; sphingoids and other long-chain bases; glycolipids, phospholipids and sphingolipids; and carotenes, polyprenols, sterols, terpenes and other isoprenoids.
GO:0065007 biological regulation Any process that modulates a measurable attribute of any biological process, quality or function.
GO:0018130 heterocycle biosynthetic process The chemical reactions and pathways resulting in the formation of heterocyclic compounds, those with a cyclic molecular structure and at least two different atoms in the ring (or rings).
GO:0008152 metabolic process The chemical reactions and pathways, including anabolism and catabolism, by which living organisms transform chemical substances. Metabolic processes typically transform small molecules, but also include macromolecular processes such as DNA repair and replication, and protein synthesis and degradation.
GO:0007031 peroxisome organization A process that is carried out at the cellular level which results in the assembly, arrangement of constituent parts, or disassembly of a peroxisome. A peroxisome is a small, membrane-bounded organelle that uses dioxygen (O2) to oxidize organic molecules.
GO:0065007 biological regulation Any process that modulates a measurable attribute of any biological process, quality or function.
GO:0051179 localization Any process in which a cell, a substance, or a cellular entity, such as a protein complex or organelle, is transported, tethered to or otherwise maintained in a specific location. In the case of substances, localization may also be achieved via selective degradation.
GO:0071840 cellular component organization or biogenesis A process that results in the biosynthesis of constituent macromolecules, assembly, arrangement of constituent parts, or disassembly of a cellular component.
GO:0071840 cellular component organization or biogenesis A process that results in the biosynthesis of constituent macromolecules, assembly, arrangement of constituent parts, or disassembly of a cellular component.
GO:0008152 metabolic process The chemical reactions and pathways, including anabolism and catabolism, by which living organisms transform chemical substances. Metabolic processes typically transform small molecules, but also include macromolecular processes such as DNA repair and replication, and protein synthesis and degradation.
GO:0008152 metabolic process The chemical reactions and pathways, including anabolism and catabolism, by which living organisms transform chemical substances. Metabolic processes typically transform small molecules, but also include macromolecular processes such as DNA repair and replication, and protein synthesis and degradation.
GO:0050789 regulation of biological process Any process that modulates the frequency, rate or extent of a biological process. Biological processes are regulated by many means; examples include the control of gene expression, protein modification or interaction with a protein or substrate molecule.
GO:0043603 cellular amide metabolic process The chemical reactions and pathways involving an amide, any derivative of an oxoacid in which an acidic hydroxy group has been replaced by an amino or substituted amino group, as carried out by individual cells.
GO:0009132 nucleoside diphosphate metabolic process The chemical reactions and pathways involving a nucleoside diphosphate, a compound consisting of a nucleobase linked to a deoxyribose or ribose sugar esterified with diphosphate on the sugar.
GO:0009058 biosynthetic process The chemical reactions and pathways resulting in the formation of substances; typically the energy-requiring part of metabolism in which simpler substances are transformed into more complex ones.
GO:0051179 localization Any process in which a cell, a substance, or a cellular entity, such as a protein complex or organelle, is transported, tethered to or otherwise maintained in a specific location. In the case of substances, localization may also be achieved via selective degradation.
GO:0048519 negative regulation of biological process Any process that stops, prevents, or reduces the frequency, rate or extent of a biological process. Biological processes are regulated by many means; examples include the control of gene expression, protein modification or interaction with a protein or substrate molecule.
GO:1901360 organic cyclic compound metabolic process The chemical reactions and pathways involving organic cyclic compound.
GO:0022402 cell cycle process The cellular process that ensures successive accurate and complete genome replication and chromosome segregation.
GO:0007163 establishment or maintenance of cell polarity Any cellular process that results in the specification, formation or maintenance of anisotropic intracellular organization or cell growth patterns.
GO:0043170 macromolecule metabolic process The chemical reactions and pathways involving macromolecules, any molecule of high relative molecular mass, the structure of which essentially comprises the multiple repetition of units derived, actually or conceptually, from molecules of low relative molecular mass.
GO:0009056 catabolic process The chemical reactions and pathways resulting in the breakdown of substances, including the breakdown of carbon compounds with the liberation of energy for use by the cell or organism.
GO:0043170 macromolecule metabolic process The chemical reactions and pathways involving macromolecules, any molecule of high relative molecular mass, the structure of which essentially comprises the multiple repetition of units derived, actually or conceptually, from molecules of low relative molecular mass.
GO:0051716 cellular response to stimulus Any process that results in a change in state or activity of a cell (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of a stimulus. The process begins with detection of the stimulus by a cell and ends with a change in state or activity or the cell.
GO:0016043 cellular component organization A process that results in the assembly, arrangement of constituent parts, or disassembly of a cellular component.
GO:0009058 biosynthetic process The chemical reactions and pathways resulting in the formation of substances; typically the energy-requiring part of metabolism in which simpler substances are transformed into more complex ones.

In [36]:
simsGOAncestor <- mgoSim(representativeTermsAncestor, representativeTermsAncestor, semData=scGO, measure="Wang", combine=NULL)

In [37]:
head(simsGOAncestor)


           GO:0006644 GO:0007049 GO:0044699 GO:0044710 GO:0051128 GO:0034641 GO:0006725 GO:0044238 GO:0005975 GO:0030029 GO:0071840 GO:0043603 GO:0009132 GO:0009058 GO:0048519 GO:1901360 GO:0022402 GO:0007163 GO:0043170 GO:0016043
GO:0006644      1.000      0.354      0.239      0.405      0.130      0.334      0.354      0.323      0.362      0.354      0.106      0.294      0.618      0.197      0.087      0.281      0.339      0.354      0.281      0.184
GO:0007049      0.354      1.000      0.547      0.379      0.245      0.289      0.321      0.191      0.139      0.722      0.243      0.256      0.291      0.191      0.180      0.156      0.872      0.722      0.156      0.379
GO:0044699      0.239      0.547      1.000      0.643      0.178      0.212      0.243      0.340      0.236      0.547      0.444      0.192      0.198      0.340      0.304      0.276      0.493      0.547      0.276      0.286
GO:0044710      0.405      0.379      0.643      1.000      0.129      0.340      0.379      0.507      0.371      0.379      0.286      0.305      0.336      0.507      0.210      0.419      0.349      0.379      0.419      0.198
GO:0051128      0.130      0.245      0.178      0.129      1.000      0.225      0.245      0.142      0.107      0.245      0.400      0.198      0.107      0.142      0.447      0.117      0.229      0.245      0.117      0.651
GO:0034641      0.334      0.289      0.212      0.340      0.225      1.000      0.649      0.379      0.283      0.289      0.212      0.888      0.433      0.379      0.161      0.314      0.268      0.289      0.314      0.340

In [38]:
head(shortest.path)


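 V1  V2  V3  V4  V5  V6  V7  V8  V9 V10 V43 V44 V45 V46 V47 V48 V49 V50 V51 V52
  0   1   1   1   1   2   2   1   2   2   5   3   5   2   4   3   3   3   3   3
  1   0   1   1   1   1   2   2   1   2   5   3   5   2   4   3   3   3   2   2
  1   1   0   2   2   2   3   2   2   3   6   2   6   3   5   4   2   2   3   3
  1   1   2   0   2   2   3   2   2   3   6   4   6   3   5   4   4   4   3   3
  1   1   2   2   0   1   1   1   1   1   4   4   4   1   3   2   4   4   2   2
  2   1   2   2   1   0   1   2   2   2   5   4   5   2   4   3   4   4   1   1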

In [15]:
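#Resnik similarity of a term with itself is proportional to the term's information content;
#this requires scGO to have been built with computeIC=TRUE
information_content <- function(term){
    return (goSim(term, term, semData=scGO, measure="Resnik"))
}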

most_representative_term_ic <- function(namedTerms){
    ics <- sapply(names(namedTerms), information_content)
    names(ics) <- names(namedTerms)
    return(names(sort(ics, decreasing=TRUE)[1]))
}
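
Information content is conventionally defined as IC(t) = -log(p(t)), where p(t) is the fraction of annotated genes associated with term t or any of its descendants. Rare, specific terms therefore score high and the root scores zero. The sketch below applies that definition to made-up frequencies. The term labels and numbers are hypothetical, and GOSemSim may apply additional normalisation to its Resnik scores.

```r
# Hypothetical fraction of annotated genes associated with each term (or its descendants)
annotation_freq <- c(root = 1.00, mid = 0.25, leaf = 0.01)

# Information content: the rarer the term, the higher the score
ic <- -log(annotation_freq)

# The selection rule used above: pick the term with the highest IC
most_specific <- names(which.max(ic))
print(round(ic, 3))
print(most_specific)
```

Selecting by maximum IC picks the most specific enriched term, the opposite tendency to the ancestor-count heuristic.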

In [16]:
representativeTermsIC <- sapply(Filter(length, gos), most_representative_term_ic)

In [17]:
select(GO.db, keys=representativeTermsIC, columns=c("TERM", "DEFINITION"))


'select()' returned 1:1 mapping between keys and columns
GOID TERM DEFINITION
GO:0090114 COPII-coated vesicle budding The evagination of an endoplasmic reticulum membrane, resulting in formation of a COPII-coated vesicle.
GO:0031146 SCF-dependent proteasomal ubiquitin-dependent protein catabolic process The chemical reactions and pathways resulting in the breakdown of a protein or peptide by hydrolysis of its peptide bonds, initiated by the covalent attachment of ubiquitin, with ubiquitin-protein ligation catalyzed by an SCF (Skp1/Cul1/F-box protein) complex, and mediated by the proteasome.
GO:0009132 nucleoside diphosphate metabolic process The chemical reactions and pathways involving a nucleoside diphosphate, a compound consisting of a nucleobase linked to a deoxyribose or ribose sugar esterified with diphosphate on the sugar.
GO:0044038 cell wall macromolecule biosynthetic process The chemical reactions and pathways resulting in the formation of a macromolecule destined to form part of a cell wall.
GO:0043467 regulation of generation of precursor metabolites and energy Any process that modulates the frequency, rate or extent of the chemical reactions and pathways resulting in the formation of precursor metabolites, substances from which energy is derived, and the processes involved in the liberation of energy from these substances.
GO:0008033 tRNA processing The process in which a pre-tRNA molecule is converted to a mature tRNA, ready for addition of an aminoacyl group.
GO:0031113 regulation of microtubule polymerization Any process that modulates the frequency, rate or extent of microtubule polymerization.
GO:0051123 RNA polymerase II transcriptional preinitiation complex assembly The aggregation, arrangement and bonding together of proteins on an RNA polymerase II promoter DNA to form the transcriptional preinitiation complex (PIC), the formation of which is a prerequisite for transcription by RNA polymerase.
GO:0006896 Golgi to vacuole transport The directed movement of substances from the Golgi to the vacuole.
GO:0016973 poly(A)+ mRNA export from nucleus The directed movement of poly(A)+ mRNA out of the nucleus into the cytoplasm.
GO:0006576 cellular biogenic amine metabolic process The chemical reactions and pathways occurring at the level of individual cells involving any of a group of naturally occurring, biologically active amines, such as norepinephrine, histamine, and serotonin, many of which act as neurotransmitters.
GO:0002181 cytoplasmic translation The chemical reactions and pathways resulting in the formation of a protein in the cytoplasm. This is a ribosome-mediated process in which the information in messenger RNA (mRNA) is used to specify the sequence of amino acids in the protein.
GO:0040001 establishment of mitotic spindle localization The cell cycle process in which the directed movement of the mitotic spindle to a specific location in the cell occurs.
GO:0016558 protein import into peroxisome matrix The import of proteins into the peroxisomal matrix. A peroxisome targeting signal (PTS) binds to a soluble receptor protein in the cytosol, and the resulting complex then binds to a receptor protein in the peroxisome membrane and is imported. The cargo protein is then released into the peroxisome matrix.
GO:0006999 nuclear pore organization A process that is carried out at the cellular level which results in the assembly, arrangement of constituent parts, or disassembly of the nuclear pore.
GO:0007096 regulation of exit from mitosis Any process involved in the progression from anaphase/telophase to G1 that is associated with a conversion from high to low mitotic CDK activity.
GO:1904669 ATP export The directed movement of ATP out of a cell or organelle.

In [18]:
simsGOIC <- mgoSim(representativeTermsIC, representativeTermsIC, semData=scGO, measure="Wang", combine=NULL)

In [19]:
head(simsGOIC)


           GO:0090114 GO:0031146 GO:0009132 GO:0044038 GO:0043467 GO:0008033 GO:0031113 GO:0051123 GO:0006896 GO:0016973 GO:0006576 GO:0002181 GO:0040001 GO:0016558 GO:0006999 GO:0007096 GO:1904669
GO:0090114      1.000      0.045      0.126      0.083      0.070      0.050      0.187      0.086      0.551      0.392      0.068      0.048      0.428      0.496      0.197      0.193      0.214
GO:0031146      0.045      1.000      0.186      0.241      0.161      0.226      0.037      0.154      0.024      0.092      0.196      0.296      0.040      0.032      0.072      0.035      0.020
GO:0009132      0.126      0.186      1.000      0.193      0.192      0.404      0.102      0.294      0.065      0.112      0.350      0.247      0.119      0.097      0.089      0.106      0.056
GO:0044038      0.083      0.241      0.193      1.000      0.205      0.270      0.106      0.399      0.029      0.114      0.250      0.462      0.080      0.064      0.141      0.071      0.025
GO:0043467      0.070      0.161      0.192      0.205      1.000      0.185      0.181      0.126      0.039      0.063      0.274      0.162      0.068      0.053      0.128      0.176      0.033
GO:0008033      0.050      0.226      0.404      0.270      0.185      1.000      0.042      0.383      0.026      0.130      0.328      0.328      0.046      0.037      0.083      0.040      0.021

In [20]:
head(shortest.path)


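 V1  V2  V3  V4  V5  V6  V7  V8  V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
  0   1   1   1   1   1   2   1   2   2   2   2   3   3   2   3   2   3   3   4
  1   0   1   1   2   2   1   1   1   1   1   1   2   2   1   2   1   2   2   3
  1   1   0   2   2   1   1   2   2   2   2   2   3   3   2   2   2   3   3   4
  1   1   2   0   1   2   2   2   2   2   2   2   3   3   2   3   2   3   3   4
  1   2   2   1   0   2   3   2   3   3   3   3   4   4   3   4   3   4   4   5
  1   2   1   2   2   0   2   2   3   3   3   3   4   4   3   3   3   4   4   5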

In [148]:
wangClusterSim <- mclusterSim(g, semData=scGO, measure="Wang", combine="BMA")

In [149]:
head(wangClusterSim)


1.000 0.527 0.544 0.491 0.534 0.491 0.513 0.495 0.448 0.505 0.375 0.632 0.531
0.527 1.000 0.610 0.568 0.520 0.534 0.523 0.507 0.548 0.540 0.455 0.584 0.494
0.544 0.610 1.000 0.625 0.523 0.608 0.651 0.606 0.634 0.602 0.358 0.604 0.391
0.491 0.568 0.625 1.000 0.514 0.558 0.609 0.552 0.623 0.592 0.416 0.521 0.383
0.534 0.520 0.523 0.514 1.000 0.496 0.514 0.522 0.464 0.585 0.409 0.554 0.453
0.491 0.534 0.608 0.558 0.496 1.000 0.557 0.540 0.539 0.549 0.354 0.591 0.453

In [150]:
head(shortest.path)


 V1  V2  V3  V4  V5  V6  V7  V8  V9 V10 V11 V12 V13
  0   1   1   1   2   2   1   1   2   2   1   1   1
  1   0   1   2   1   1   2   2   3   1   2   2   2
  1   1   0   1   1   2   2   2   3   2   2   2   2
  1   2   1   0   2   3   1   2   2   3   2   2   2
  2   1   1   2   0   1   3   3   4   2   3   3   3
  2   1   2   3   1   0   3   3   4   1   3   3   3

In [156]:
# Pairwise BMA-combined Wang similarity between the GO term sets of all communities.
goSims <- matrix(NA_real_, nrow=numCom, ncol=numCom)

for (i in seq_len(numCom)){
    for (j in seq_len(numCom)){
        goSims[i, j] <- mgoSim(names(gos[[i]]), names(gos[[j]]), measure="Wang", semData=scGO, combine="BMA")
    }
}

In [157]:
head(goSims)


1.000 0.164 0.093 0.140 0.602 0.145 0.221 0.338 0.134 0.276    NA 0.525 0.626
0.164 1.000 0.633 0.475 0.171 0.348 0.363 0.322 0.672 0.234    NA 0.362 0.082
0.093 0.633 1.000 0.343 0.085 0.286 0.411 0.201 0.563 0.200    NA 0.246 0.073
0.140 0.475 0.343 1.000 0.132 0.321 0.236 0.226 1.000 0.276    NA 0.331 0.094
0.602 0.171 0.085 0.132 1.000 0.182 0.298 0.511 0.134 0.234    NA 0.447 0.537
0.145 0.348 0.286 0.321 0.182 1.000 0.285 0.278 0.385 0.286    NA 0.368 0.137
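
In the goSims output above, column 11 is NA in every displayed row. One plausible cause (an assumption, not verified here) is that the corresponding community has no GO terms to compare. The following is a minimal base-R sketch, on a toy matrix with hypothetical values, of locating all-NA columns and dropping them before any downstream step such as a correlation. The names sim_toy, all_na_cols, keep and sim_clean are illustrative only.

```r
# Toy similarity matrix; the third row/column is entirely NA (hypothetical values).
sim_toy <- matrix(c(1.0, 0.2, NA,
                    0.2, 1.0, NA,
                    NA,  NA,  NA), nrow = 3, byrow = TRUE)

# A column is "all NA" when it contains zero non-missing entries.
all_na_cols <- which(colSums(!is.na(sim_toy)) == 0)

# Drop those indices from both rows and columns so the matrix stays square.
keep <- setdiff(seq_len(ncol(sim_toy)), all_na_cols)
sim_clean <- sim_toy[keep, keep, drop = FALSE]

print(all_na_cols)
print(sim_clean)
```

If the same indices are dropped from shortest.path, the two matrices stay aligned pair for pair.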

In [18]:
wangGoSims <- sapply(names(enrichedGOs), 
                     function(i) sapply(names(enrichedGOs), 
                                        function(j) mgoSim(i, j, semData=scGO, measure="Wang", combine="BMA")))

In [19]:
wangGoSims



In [22]:
mgeneSim(allGeneNames[as.integer(g[[1]])], semData=scGO, measure="Wang", combine="BMA")


  |======================================================================| 100%
        YML028W YGL122C YPL214C YHL006C YKL130C YCR011C YBL007C YJR091C YGL145W YBR133C YDR214W YGR268C YLR291C YOR138C YEL023C YFR002W
YML028W   1.000   0.218   0.330   0.389   0.193   0.385   0.183   0.150   0.121   0.324   0.547   0.477   0.146   0.157   0.477   0.152
YGL122C   0.218   1.000   0.270   0.318   0.119   0.340   0.085   0.332   0.451   0.351   0.098   0.185   0.295   0.223   0.185   0.326
YPL214C   0.330   0.270   1.000   0.338   0.188   0.171   0.131   0.212   0.053   0.319   0.106   0.069   0.396   0.123   0.069   0.080
YHL006C   0.389   0.318   0.338   1.000   0.208   0.234   0.264   0.263   0.078   0.310   0.235   0.105   0.230   0.249   0.105   0.196
YKL130C   0.193   0.119   0.188   0.208   1.000   0.245   0.176   0.298   0.203   0.152   0.142   0.243   0.118   0.091   0.243   0.156
YCR011C   0.385   0.340   0.171   0.234   0.245   1.000   0.391   0.242   0.512   0.373   0.284   0.196   0.161   0.112   0.196   0.350
YBL007C   0.183   0.085   0.131   0.264   0.176   0.391   1.000   0.149   0.453   0.167   0.158   0.283   0.079   0.062   0.283   0.328
YJR091C   0.150   0.332   0.212   0.263   0.298   0.242   0.149   1.000   0.344   0.210   0.118   0.200   0.115   0.140   0.200   0.221
YGL145W   0.121   0.451   0.053   0.078   0.203   0.512   0.453   0.344   1.000   0.117   0.095   0.158   0.045   0.035   0.158   0.451
YBR133C   0.324   0.351   0.319   0.310   0.152   0.373   0.167   0.210   0.117   1.000   0.192   0.217   0.294   0.330   0.217   0.132
YDR214W   0.547   0.098   0.106   0.235   0.142   0.284   0.158   0.118   0.095   0.192   1.000   0.477   0.099   0.144   0.477   0.145
YGR268C   0.477   0.185   0.069   0.105   0.243   0.196   0.283   0.200   0.158   0.217   0.477   1.000   0.077   0.144   1.000   0.203
YLR291C   0.146   0.295   0.396   0.230   0.118   0.161   0.079   0.115   0.045   0.294   0.099   0.077   1.000   0.268   0.077   0.064
YOR138C   0.157   0.223   0.123   0.249   0.091   0.112   0.062   0.140   0.035   0.330   0.144   0.144   0.268   1.000   0.144   0.070
YEL023C   0.477   0.185   0.069   0.105   0.243   0.196   0.283   0.200   0.158   0.217   0.477   1.000   0.077   0.144   1.000   0.203
YFR002W   0.152   0.326   0.080   0.196   0.156   0.350   0.328   0.221   0.451   0.132   0.145   0.203   0.064   0.070   0.203   1.000

In [21]:
mgoSim(names(enrichedGOs[[1]]), names(enrichedGOs[[2]]), semData=scGO, measure="Wang", combine="BMA")


0.106
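
The scalar above is produced with combine="BMA". As far as I understand GOSemSim's best-match average, it takes the maximum of every row and every column of the term-by-term similarity matrix and averages those maxima. The sketch below reproduces that arithmetic in base R on a toy matrix. The helper name bma_toy and all values are hypothetical, and this is a hand-rolled illustration rather than the package's own code.

```r
# Toy term-by-term similarity matrix: 3 terms in set A (rows), 2 in set B (columns).
s_toy <- matrix(c(0.9, 0.1,
                  0.2, 0.4,
                  0.3, 0.8), nrow = 3, byrow = TRUE)

# Best-match average: each term is scored by its best match in the other set,
# and the scores of all terms from both sets are averaged together.
bma_toy <- function(s) {
  row_best <- apply(s, 1, max)   # best match in B for each term of A
  col_best <- apply(s, 2, max)   # best match in A for each term of B
  (sum(row_best) + sum(col_best)) / (nrow(s) + ncol(s))
}

bma_value <- bma_toy(s_toy)
print(bma_value)   # (0.9 + 0.4 + 0.8 + 0.9 + 0.8) / 5 = 0.76
```

Because every term contributes its best match, a pair of sets can score low (as with 0.106 above) even when a few individual term pairs are similar.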

In [15]:
head(shortest.path)


V1 V2 V3 V4 V5 V6
 0  1  1  1  1  1
 1  0  1  2  2  2
 1  1  0  1  1  1
 1  2  1  0  2  2
 1  2  1  2  0  2
 1  2  1  2  2  0

In [116]:
# One entry per unordered pair of communities (c1 < c2).
numPairs <- (numCom * (numCom - 1)) / 2
distances <- numeric(length = numPairs)
semSims <- numeric(length = numPairs)

completed <- 0

for (c1 in seq_along(enrichedGOsPathway)) {
    for (c2 in c1:length(enrichedGOsPathway)) {
        
        if (c1 == c2) next   # skip the diagonal (self-comparison)
        
        completed <- completed + 1  
        semSims[completed] <- simsPathway[c1, c2]
        distances[completed] <- shortest.path[c1, c2]
    }
}

# Report the pair count once, after the loop, instead of once per pair.
print(sprintf("Completed: %s", completed))


[1] "Completed: 253"

In [117]:
plot(distances, semSims, xlab="Distance on Map", ylab="Shared Paths")



In [102]:
cor(distances, semSims, method="spearman")


-0.05678168738453
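
The pairwise vectors fed to cor() can also be built without an explicit loop. The sketch below, on toy matrices with hypothetical values standing in for simsPathway and shortest.path, uses upper.tri() to pick each unordered pair once and then computes the Spearman correlation. The pairs come out in column-major order rather than the loop's row-major order, which does not affect the correlation as long as both vectors use the same mask.

```r
# Toy symmetric matrices (hypothetical values, not taken from the analysis above).
sim_toy <- matrix(c(1.0, 0.8, 0.3,
                    0.8, 1.0, 0.5,
                    0.3, 0.5, 1.0), nrow = 3, byrow = TRUE)
dist_toy <- matrix(c(0, 1, 3,
                     1, 0, 2,
                     3, 2, 0), nrow = 3, byrow = TRUE)

# upper.tri() is TRUE strictly above the diagonal, so each pair (i < j) appears once.
pair_mask <- upper.tri(sim_toy)
sem_vec <- sim_toy[pair_mask]
dist_vec <- dist_toy[pair_mask]

# Number of pairs is n * (n - 1) / 2, matching the preallocation in the loop version.
stopifnot(length(sem_vec) == choose(nrow(sim_toy), 2))

rho <- cor(dist_vec, sem_vec, method = "spearman")
print(rho)   # -1 here: in the toy data, similarity falls as distance grows
```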

In [42]:
library(GOSim)
setOntology(ont, loadIC=FALSE)
setEvidenceLevel(evidences="all",organism=org.Sc.sgdORGANISM, gomap=org.Sc.sgdGO)
e <- GOenrichment(g[[46]], allGenes)


-> retrieving GO information for all available genes for organism 'Saccharomyces cerevisiae' in GO database
-> filtering GO terms according to evidence levels 'all'

Building most specific GOs .....
	( 1690 GO terms found. )

Build GO DAG topology ..........
	( 3645 GO terms and 8243 relations. )

Annotating nodes ...............
	( 1567 genes annotated to the GO terms. )

			 -- Elim Algorithm -- 

		 the algorithm is scoring 172 nontrivial nodes
		 parameters: 
			 test statistic: fisher
			 cutOff: 0.01

	 Level 14:	1 nodes to be scored	(0 eliminated genes)

	 Level 13:	3 nodes to be scored	(0 eliminated genes)

	 Level 12:	7 nodes to be scored	(0 eliminated genes)

	 Level 11:	9 nodes to be scored	(0 eliminated genes)

	 Level 10:	8 nodes to be scored	(1 eliminated genes)

	 Level 9:	17 nodes to be scored	(3 eliminated genes)

	 Level 8:	14 nodes to be scored	(8 eliminated genes)

	 Level 7:	20 nodes to be scored	(8 eliminated genes)

	 Level 6:	25 nodes to be scored	(8 eliminated genes)

	 Level 5:	28 nodes to be scored	(8 eliminated genes)

	 Level 4:	20 nodes to be scored	(28 eliminated genes)

	 Level 3:	12 nodes to be scored	(28 eliminated genes)

	 Level 2:	7 nodes to be scored	(28 eliminated genes)

	 Level 1:	1 nodes to be scored	(28 eliminated genes)

In [49]:
e


$GOTerms
       go_id       Term                                              Definition
15591  GO:0018343  protein farnesylation                             The covalent attachment of a farnesyl group to a protein.
15594  GO:0018344  protein geranylgeranylation                       The covalent attachment of a geranylgeranyl group to a protein.
16626  GO:0006874  cellular calcium ion homeostasis                  Any process involved in the maintenance of an internal steady state of calcium ions at the level of a cell.
17047  GO:0030010  establishment of cell polarity                    The specification and formation of anisotropic intracellular organization or cell growth patterns.
48636  GO:0042127  regulation of cell proliferation                  Any process that modulates the frequency, rate or extent of cell proliferation.
79331  GO:0070884  regulation of calcineurin-NFAT signaling cascade  Any process that modulates the frequency, rate or extent of the calcineurin-NFAT signaling cascade.
$p.values
GO:0006874
0.00891715384596614
GO:0042127
0.00446713465220172
GO:0070884
0.00446713465220172
GO:0018343
1.71154584375543e-05
GO:0018344
0.000170063041559058
GO:0030010
0.00345128149568176
$genes
$`GO:0006874`
  1. 'YBR187W'
  2. 'YGL155W'
$`GO:0042127`
'YDL090C'
$`GO:0070884`
'YKL159C'
$`GO:0018343`
  1. 'YDL090C'
  2. 'YKL019W'
$`GO:0018344`
  1. 'YGL155W'
  2. 'YJL031C'
  3. 'YKL019W'
  4. 'YOR370C'
  5. 'YPR176C'
$`GO:0030010`
  1. 'YCR063W'
  2. 'YER093C'
  3. 'YER118C'
  4. 'YER149C'
  5. 'YFL039C'
  6. 'YGL054C'
  7. 'YGL155W'
  8. 'YGR014W'
  9. 'YGR058W'
  10. 'YGR262C'
  11. 'YHR115C'
  12. 'YHR129C'
  13. 'YIL144W'
  14. 'YLL049W'
  15. 'YLR319C'
  16. 'YMR294W'
  17. 'YNL116W'
  18. 'YOR127W'
  19. 'YOR301W'
  20. 'YPL161C'
  21. 'YPL174C'

In [88]:
goTerms <- e$GOTerms
p.values <- e$p.values

In [89]:
p.values.df <- data.frame(p.values)
p.values.df["go_id"] <- names(p.values)
p.values.df


            p.values      go_id
GO:0006874  8.917154e-03  GO:0006874
GO:0042127  4.467135e-03  GO:0042127
GO:0070884  4.467135e-03  GO:0070884
GO:0018343  1.711546e-05  GO:0018343
GO:0018344  1.700630e-04  GO:0018344
GO:0030010  3.451281e-03  GO:0030010

In [90]:
goTerms <- merge(goTerms, p.values.df, by="go_id")

In [93]:
colnames(goTerms) <- c("GO_ID", "TERM", "DEFINITION", "P_VALUE")
head(goTerms)


GO_ID       TERM                                              DEFINITION  P_VALUE
GO:0006874  cellular calcium ion homeostasis                  Any process involved in the maintenance of an internal steady state of calcium ions at the level of a cell.  8.917154e-03
GO:0018343  protein farnesylation                             The covalent attachment of a farnesyl group to a protein.  1.711546e-05
GO:0018344  protein geranylgeranylation                       The covalent attachment of a geranylgeranyl group to a protein.  1.700630e-04
GO:0030010  establishment of cell polarity                    The specification and formation of anisotropic intracellular organization or cell growth patterns.  3.451281e-03
GO:0042127  regulation of cell proliferation                  Any process that modulates the frequency, rate or extent of cell proliferation.  4.467135e-03
GO:0070884  regulation of calcineurin-NFAT signaling cascade  Any process that modulates the frequency, rate or extent of the calcineurin-NFAT signaling cascade.  4.467135e-03
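
Note that merge() sorts its result by the key column by default, which is why the rows above are ordered by GO ID rather than by significance. For a results table it may be more useful to order by p-value. A minimal base-R sketch follows, using three of the GO IDs and p-values shown above; the names go_toy and go_sorted are illustrative.

```r
# Three rows copied from the merged table above (TERM and DEFINITION omitted for brevity).
go_toy <- data.frame(
  GO_ID   = c("GO:0006874", "GO:0018343", "GO:0018344"),
  P_VALUE = c(8.917154e-03, 1.711546e-05, 1.700630e-04),
  stringsAsFactors = FALSE
)

# order() returns the row permutation that sorts P_VALUE ascending (most significant first).
go_sorted <- go_toy[order(go_toy$P_VALUE), ]

# Round to three significant digits for display only.
go_sorted$P_VALUE <- signif(go_sorted$P_VALUE, 3)

print(go_sorted)
```

The sorted data frame can be passed to grid.table() in the same way as the unsorted one.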

In [94]:
library(gridExtra)
grid.table(goTerms[,c("GO_ID", "TERM", "P_VALUE")])



In [46]:
g[[46]]


  1. 'YKL019W'
  2. 'YGL155W'
  3. 'YKL159C'
  4. 'YCR063W'
  5. 'YBR247C'
  6. 'YDL090C'
  7. 'YOL135C'

In [103]:
l <- as.list(org.Sc.sgdGO[["YKL019W"]])

In [116]:
gos <- sapply(l, function(i) i[["GOID"]])

In [122]:
t <- select(GO.db, keys=gos, columns=c("GOID","TERM","ONTOLOGY"))


'select()' returned many:1 mapping between keys and columns

In [123]:
grid.table(t)


