In [1]:
library(GO.db)
library(topGO)
library(GOSim)
library(org.Sc.sgd.db)
library(igraph)
Loading required package: AnnotationDbi
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:parallel’:
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from ‘package:stats’:
IQR, mad, xtabs
The following objects are masked from ‘package:base’:
anyDuplicated, append, as.data.frame, cbind, colnames, do.call,
duplicated, eval, evalq, Filter, Find, get, grep, grepl, intersect,
is.unsorted, lapply, lengths, Map, mapply, match, mget, order,
paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind,
Reduce, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit, which, which.max, which.min
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Loading required package: IRanges
Loading required package: S4Vectors
Attaching package: ‘S4Vectors’
The following objects are masked from ‘package:base’:
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: graph
Loading required package: SparseM
Attaching package: ‘SparseM’
The following object is masked from ‘package:base’:
backsolve
groupGOTerms: GOBPTerm, GOMFTerm, GOCCTerm environments built.
Attaching package: ‘topGO’
The following object is masked from ‘package:IRanges’:
members
Loading required package: annotate
Loading required package: XML
Attaching package: ‘XML’
The following object is masked from ‘package:graph’:
addNode
Attaching package: ‘igraph’
The following objects are masked from ‘package:topGO’:
algorithm, graph
The following objects are masked from ‘package:graph’:
degree, edges, intersection, union
The following objects are masked from ‘package:IRanges’:
simplify, union
The following objects are masked from ‘package:S4Vectors’:
compare, union
The following objects are masked from ‘package:BiocGenerics’:
normalize, union
The following objects are masked from ‘package:stats’:
decompose, spectrum
The following object is masked from ‘package:base’:
union
In [2]:
file <- "yeast_union"
ont <- "MF"
p <- 0.8
init <- 1
eta <- 0.0001
##load all community gene lists
setwd(sprintf("/home/david/Documents/ghsom/%s_communities_%s_%s", file, p, init, eta))
setOntology(ont, loadIC=TRUE)
setEvidenceLevel(evidences="all",organism=org.Sc.sgdORGANISM, gomap=org.Sc.sgdGO)
# calcICs()
db <- org.Sc.sgd.db
mapping <- "org.Sc.sgd.db"
ID <- "ENSEMBL"
#background gene list
backgroundFilename <- "all_genes.txt"
allGenes <- scan(backgroundFilename, character())
#load communities from file
g <- list()
numCom <- 0
filename <- sprintf("community_%s.txt", numCom)
while (file.exists(filename)) {
numCom <- numCom + 1
g[[numCom]] <- scan(filename, character())
filename <- sprintf("community_%s.txt", numCom)
}
#distances between neurons
shortest.path <- read.csv("shortest_path.csv", sep=",", header=FALSE)
initializing GOSim package ...
-> retrieving GO information for all available genes for organism 'human' in GO database
-> filtering GO terms according to evidence levels 'all'
-> loading files with information content for corresponding GO category (human)
finished.
-> loading files with information content for corresponding GO category (human)
-> retrieving GO information for all available genes for organism 'Saccharomyces cerevisiae' in GO database
-> filtering GO terms according to evidence levels 'all'
In [3]:
length(g)
11
In [22]:
calcICs()
calculating information contents for ontology MF using evidence codes 'all' (Saccharomyces cerevisiae) ...
done
In [4]:
allGeneNames <- scan(character(), file="../yeast_uetz_all_genes.txt")
allGenes <- allGeneNames[as.integer(allGenes)]
g <- sapply(g, function(i) allGeneNames[as.integer(i)])
Warning message in eval(expr, envir, enclos):
“NAs introduced by coercion”Warning message in FUN(X[[i]], ...):
“NAs introduced by coercion”Warning message in FUN(X[[i]], ...):
“NAs introduced by coercion”Warning message in FUN(X[[i]], ...):
“NAs introduced by coercion”Warning message in FUN(X[[i]], ...):
“NAs introduced by coercion”Warning message in FUN(X[[i]], ...):
“NAs introduced by coercion”Warning message in FUN(X[[i]], ...):
“NAs introduced by coercion”Warning message in FUN(X[[i]], ...):
“NAs introduced by coercion”Warning message in FUN(X[[i]], ...):
“NAs introduced by coercion”Warning message in FUN(X[[i]], ...):
“NAs introduced by coercion”Warning message in FUN(X[[i]], ...):
“NAs introduced by coercion”Warning message in FUN(X[[i]], ...):
“NAs introduced by coercion”
In [4]:
enrichments <- sapply(g, function(i) GOenrichment(i, allGenes, cutoff=0.05, method="weight01"))
Building most specific GOs .....
( 828 GO terms found. )
Build GO DAG topology ..........
( 1179 GO terms and 1446 relations. )
Annotating nodes ...............
( 1567 genes annotated to the GO terms. )
-- Weight01 Algorithm --
the algorithm is scoring 341 nontrivial nodes
parameters:
test statistic: fisher
Level 14: 1 nodes to be scored (0 eliminated genes)
Level 13: 2 nodes to be scored (0 eliminated genes)
Level 12: 2 nodes to be scored (4 eliminated genes)
Level 11: 5 nodes to be scored (9 eliminated genes)
Level 10: 7 nodes to be scored (10 eliminated genes)
Level 9: 14 nodes to be scored (42 eliminated genes)
Level 8: 30 nodes to be scored (80 eliminated genes)
Level 7: 49 nodes to be scored (267 eliminated genes)
Level 6: 71 nodes to be scored (327 eliminated genes)
Level 5: 69 nodes to be scored (491 eliminated genes)
Level 4: 53 nodes to be scored (698 eliminated genes)
Level 3: 29 nodes to be scored (894 eliminated genes)
Level 2: 8 nodes to be scored (1026 eliminated genes)
Level 1: 1 nodes to be scored (1197 eliminated genes)
Building most specific GOs .....
( 828 GO terms found. )
Build GO DAG topology ..........
( 1179 GO terms and 1446 relations. )
Annotating nodes ...............
( 1567 genes annotated to the GO terms. )
-- Weight01 Algorithm --
the algorithm is scoring 314 nontrivial nodes
parameters:
test statistic: fisher
Level 11: 3 nodes to be scored (0 eliminated genes)
Level 10: 6 nodes to be scored (0 eliminated genes)
Level 9: 13 nodes to be scored (20 eliminated genes)
Level 8: 27 nodes to be scored (57 eliminated genes)
Level 7: 44 nodes to be scored (270 eliminated genes)
Level 6: 62 nodes to be scored (327 eliminated genes)
Level 5: 63 nodes to be scored (477 eliminated genes)
Level 4: 55 nodes to be scored (654 eliminated genes)
Level 3: 31 nodes to be scored (874 eliminated genes)
Level 2: 9 nodes to be scored (1022 eliminated genes)
Level 1: 1 nodes to be scored (1206 eliminated genes)
Building most specific GOs .....
( 828 GO terms found. )
Build GO DAG topology ..........
( 1179 GO terms and 1446 relations. )
Annotating nodes ...............
( 1567 genes annotated to the GO terms. )
-- Weight01 Algorithm --
the algorithm is scoring 540 nontrivial nodes
parameters:
test statistic: fisher
Level 15: 1 nodes to be scored (0 eliminated genes)
Level 14: 2 nodes to be scored (0 eliminated genes)
Level 13: 2 nodes to be scored (6 eliminated genes)
Level 12: 4 nodes to be scored (9 eliminated genes)
Level 11: 5 nodes to be scored (9 eliminated genes)
Level 10: 9 nodes to be scored (15 eliminated genes)
Level 9: 21 nodes to be scored (44 eliminated genes)
Level 8: 49 nodes to be scored (86 eliminated genes)
Level 7: 85 nodes to be scored (275 eliminated genes)
Level 6: 129 nodes to be scored (363 eliminated genes)
Level 5: 111 nodes to be scored (560 eliminated genes)
Level 4: 77 nodes to be scored (795 eliminated genes)
Level 3: 34 nodes to be scored (1008 eliminated genes)
Level 2: 10 nodes to be scored (1125 eliminated genes)
Level 1: 1 nodes to be scored (1210 eliminated genes)
Building most specific GOs .....
( 828 GO terms found. )
Build GO DAG topology ..........
( 1179 GO terms and 1446 relations. )
Annotating nodes ...............
( 1567 genes annotated to the GO terms. )
-- Weight01 Algorithm --
the algorithm is scoring 322 nontrivial nodes
parameters:
test statistic: fisher
Level 10: 2 nodes to be scored (0 eliminated genes)
Level 9: 16 nodes to be scored (0 eliminated genes)
Level 8: 36 nodes to be scored (36 eliminated genes)
Level 7: 50 nodes to be scored (239 eliminated genes)
Level 6: 72 nodes to be scored (280 eliminated genes)
Level 5: 58 nodes to be scored (462 eliminated genes)
Level 4: 52 nodes to be scored (634 eliminated genes)
Level 3: 25 nodes to be scored (892 eliminated genes)
Level 2: 10 nodes to be scored (1032 eliminated genes)
Level 1: 1 nodes to be scored (1186 eliminated genes)
Building most specific GOs .....
( 828 GO terms found. )
Build GO DAG topology ..........
( 1179 GO terms and 1446 relations. )
Annotating nodes ...............
( 1567 genes annotated to the GO terms. )
-- Weight01 Algorithm --
the algorithm is scoring 244 nontrivial nodes
parameters:
test statistic: fisher
Level 12: 2 nodes to be scored (0 eliminated genes)
Level 11: 3 nodes to be scored (0 eliminated genes)
Level 10: 7 nodes to be scored (6 eliminated genes)
Level 9: 12 nodes to be scored (31 eliminated genes)
Level 8: 26 nodes to be scored (78 eliminated genes)
Level 7: 28 nodes to be scored (262 eliminated genes)
Level 6: 48 nodes to be scored (310 eliminated genes)
Level 5: 47 nodes to be scored (463 eliminated genes)
Level 4: 41 nodes to be scored (620 eliminated genes)
Level 3: 21 nodes to be scored (806 eliminated genes)
Level 2: 8 nodes to be scored (954 eliminated genes)
Level 1: 1 nodes to be scored (1159 eliminated genes)
Building most specific GOs .....
( 828 GO terms found. )
Build GO DAG topology ..........
( 1179 GO terms and 1446 relations. )
Annotating nodes ...............
( 1567 genes annotated to the GO terms. )
-- Weight01 Algorithm --
the algorithm is scoring 159 nontrivial nodes
parameters:
test statistic: fisher
Level 11: 1 nodes to be scored (0 eliminated genes)
Level 10: 2 nodes to be scored (0 eliminated genes)
Level 9: 5 nodes to be scored (6 eliminated genes)
Level 8: 10 nodes to be scored (40 eliminated genes)
Level 7: 10 nodes to be scored (218 eliminated genes)
Level 6: 27 nodes to be scored (258 eliminated genes)
Level 5: 37 nodes to be scored (336 eliminated genes)
Level 4: 36 nodes to be scored (481 eliminated genes)
Level 3: 22 nodes to be scored (813 eliminated genes)
Level 2: 8 nodes to be scored (957 eliminated genes)
Level 1: 1 nodes to be scored (1170 eliminated genes)
Building most specific GOs .....
( 828 GO terms found. )
Build GO DAG topology ..........
( 1179 GO terms and 1446 relations. )
Annotating nodes ...............
( 1567 genes annotated to the GO terms. )
-- Weight01 Algorithm --
the algorithm is scoring 240 nontrivial nodes
parameters:
test statistic: fisher
Level 12: 3 nodes to be scored (0 eliminated genes)
Level 11: 3 nodes to be scored (0 eliminated genes)
Level 10: 7 nodes to be scored (10 eliminated genes)
Level 9: 9 nodes to be scored (23 eliminated genes)
Level 8: 13 nodes to be scored (56 eliminated genes)
Level 7: 25 nodes to be scored (203 eliminated genes)
Level 6: 48 nodes to be scored (236 eliminated genes)
Level 5: 50 nodes to be scored (424 eliminated genes)
Level 4: 44 nodes to be scored (599 eliminated genes)
Level 3: 27 nodes to be scored (794 eliminated genes)
Level 2: 10 nodes to be scored (1004 eliminated genes)
Level 1: 1 nodes to be scored (1194 eliminated genes)
Building most specific GOs .....
( 828 GO terms found. )
Build GO DAG topology ..........
( 1179 GO terms and 1446 relations. )
Annotating nodes ...............
( 1567 genes annotated to the GO terms. )
-- Weight01 Algorithm --
the algorithm is scoring 419 nontrivial nodes
parameters:
test statistic: fisher
Level 15: 1 nodes to be scored (0 eliminated genes)
Level 14: 1 nodes to be scored (0 eliminated genes)
Level 13: 2 nodes to be scored (6 eliminated genes)
Level 12: 1 nodes to be scored (6 eliminated genes)
Level 11: 4 nodes to be scored (9 eliminated genes)
Level 10: 15 nodes to be scored (9 eliminated genes)
Level 9: 24 nodes to be scored (34 eliminated genes)
Level 8: 37 nodes to be scored (100 eliminated genes)
Level 7: 62 nodes to be scored (281 eliminated genes)
Level 6: 74 nodes to be scored (337 eliminated genes)
Level 5: 86 nodes to be scored (518 eliminated genes)
Level 4: 66 nodes to be scored (671 eliminated genes)
Level 3: 34 nodes to be scored (901 eliminated genes)
Level 2: 11 nodes to be scored (1049 eliminated genes)
Level 1: 1 nodes to be scored (1213 eliminated genes)
Building most specific GOs .....
( 828 GO terms found. )
Build GO DAG topology ..........
( 1179 GO terms and 1446 relations. )
Annotating nodes ...............
( 1567 genes annotated to the GO terms. )
-- Weight01 Algorithm --
the algorithm is scoring 502 nontrivial nodes
parameters:
test statistic: fisher
Level 15: 1 nodes to be scored (0 eliminated genes)
Level 14: 1 nodes to be scored (0 eliminated genes)
Level 13: 2 nodes to be scored (6 eliminated genes)
Level 12: 5 nodes to be scored (6 eliminated genes)
Level 11: 7 nodes to be scored (9 eliminated genes)
Level 10: 15 nodes to be scored (16 eliminated genes)
Level 9: 22 nodes to be scored (48 eliminated genes)
Level 8: 44 nodes to be scored (96 eliminated genes)
Level 7: 69 nodes to be scored (285 eliminated genes)
Level 6: 116 nodes to be scored (347 eliminated genes)
Level 5: 99 nodes to be scored (531 eliminated genes)
Level 4: 74 nodes to be scored (750 eliminated genes)
Level 3: 33 nodes to be scored (1003 eliminated genes)
Level 2: 13 nodes to be scored (1112 eliminated genes)
Level 1: 1 nodes to be scored (1212 eliminated genes)
Building most specific GOs .....
( 828 GO terms found. )
Build GO DAG topology ..........
( 1179 GO terms and 1446 relations. )
Annotating nodes ...............
( 1567 genes annotated to the GO terms. )
-- Weight01 Algorithm --
the algorithm is scoring 197 nontrivial nodes
parameters:
test statistic: fisher
Level 15: 1 nodes to be scored (0 eliminated genes)
Level 14: 1 nodes to be scored (0 eliminated genes)
Level 13: 2 nodes to be scored (6 eliminated genes)
Level 12: 1 nodes to be scored (6 eliminated genes)
Level 11: 2 nodes to be scored (9 eliminated genes)
Level 10: 4 nodes to be scored (9 eliminated genes)
Level 9: 7 nodes to be scored (18 eliminated genes)
Level 8: 11 nodes to be scored (71 eliminated genes)
Level 7: 17 nodes to be scored (261 eliminated genes)
Level 6: 42 nodes to be scored (278 eliminated genes)
Level 5: 45 nodes to be scored (381 eliminated genes)
Level 4: 33 nodes to be scored (524 eliminated genes)
Level 3: 22 nodes to be scored (801 eliminated genes)
Level 2: 8 nodes to be scored (907 eliminated genes)
Level 1: 1 nodes to be scored (1180 eliminated genes)
Building most specific GOs .....
( 828 GO terms found. )
Build GO DAG topology ..........
( 1179 GO terms and 1446 relations. )
Annotating nodes ...............
( 1567 genes annotated to the GO terms. )
-- Weight01 Algorithm --
the algorithm is scoring 210 nontrivial nodes
parameters:
test statistic: fisher
Level 15: 1 nodes to be scored (0 eliminated genes)
Level 14: 2 nodes to be scored (0 eliminated genes)
Level 13: 2 nodes to be scored (6 eliminated genes)
Level 12: 1 nodes to be scored (9 eliminated genes)
Level 11: 1 nodes to be scored (9 eliminated genes)
Level 10: 2 nodes to be scored (9 eliminated genes)
Level 9: 6 nodes to be scored (12 eliminated genes)
Level 8: 12 nodes to be scored (15 eliminated genes)
Level 7: 22 nodes to be scored (236 eliminated genes)
Level 6: 44 nodes to be scored (254 eliminated genes)
Level 5: 41 nodes to be scored (441 eliminated genes)
Level 4: 44 nodes to be scored (618 eliminated genes)
Level 3: 22 nodes to be scored (843 eliminated genes)
Level 2: 9 nodes to be scored (1021 eliminated genes)
Level 1: 1 nodes to be scored (1179 eliminated genes)
In [12]:
enrichedGOTerms <- function(genes, allGenes, cutoff, correction, ont, mapping, ID, algorithm){
interestingGenes <- factor(as.integer(allGenes %in% genes))
names(interestingGenes) <- allGenes
GOdata <- new("topGOdata", description=sprintf("topGO object"),
ontology = ont, allGenes = interestingGenes,
annotationFun = annFUN.org, mapping = mapping,
ID = ID, nodeSize = 1)
result <- runTest(GOdata, algorithm = algorithm, statistic = "fisher")
if (correction){
GOs <- score(result)[which(p.adjust(score(result), method="BH") <= cutoff)]
} else {
GOs <- score(result)[score(result) <= cutoff]
}
plot <- showSigOfNodes(GOdata, score(result), firstSigNodes = 10, useInfo ='all', swPlot = FALSE)
return(list(GOdata, GOs, plot))
}
In [13]:
enrichedGOs <- sapply(g, enrichedGOTerms, allGenes=allGenes,
cutoff=0.05, correction=FALSE, ont=ont, mapping=mapping, ID=ID, algorithm="weight01")
Building most specific GOs .....
( 275 GO terms found. )
Build GO DAG topology ..........
( 484 GO terms and 597 relations. )
Annotating nodes ...............
( 250 genes annotated to the GO terms. )
-- Weight01 Algorithm --
the algorithm is scoring 158 nontrivial nodes
parameters:
test statistic: fisher
Level 15: 1 nodes to be scored (0 eliminated genes)
Level 14: 1 nodes to be scored (0 eliminated genes)
Level 13: 2 nodes to be scored (1 eliminated genes)
Level 12: 1 nodes to be scored (1 eliminated genes)
Level 11: 1 nodes to be scored (2 eliminated genes)
Level 10: 2 nodes to be scored (2 eliminated genes)
Level 9: 7 nodes to be scored (4 eliminated genes)
Level 8: 10 nodes to be scored (5 eliminated genes)
Level 7: 17 nodes to be scored (45 eliminated genes)
Level 6: 27 nodes to be scored (46 eliminated genes)
Level 5: 29 nodes to be scored (69 eliminated genes)
Level 4: 29 nodes to be scored (92 eliminated genes)
Level 3: 22 nodes to be scored (122 eliminated genes)
Level 2: 8 nodes to be scored (141 eliminated genes)
Level 1: 1 nodes to be scored (178 eliminated genes)
Building most specific GOs .....
( 275 GO terms found. )
Build GO DAG topology ..........
( 484 GO terms and 597 relations. )
Annotating nodes ...............
( 250 genes annotated to the GO terms. )
-- Weight01 Algorithm --
the algorithm is scoring 214 nontrivial nodes
parameters:
test statistic: fisher
Level 12: 1 nodes to be scored (0 eliminated genes)
Level 11: 2 nodes to be scored (0 eliminated genes)
Level 10: 6 nodes to be scored (1 eliminated genes)
Level 9: 10 nodes to be scored (6 eliminated genes)
Level 8: 16 nodes to be scored (16 eliminated genes)
Level 7: 23 nodes to be scored (41 eliminated genes)
Level 6: 42 nodes to be scored (47 eliminated genes)
Level 5: 48 nodes to be scored (81 eliminated genes)
Level 4: 38 nodes to be scored (109 eliminated genes)
Level 3: 20 nodes to be scored (138 eliminated genes)
Level 2: 7 nodes to be scored (147 eliminated genes)
Level 1: 1 nodes to be scored (172 eliminated genes)
Building most specific GOs .....
( 275 GO terms found. )
Build GO DAG topology ..........
( 484 GO terms and 597 relations. )
Annotating nodes ...............
( 250 genes annotated to the GO terms. )
-- Weight01 Algorithm --
the algorithm is scoring 96 nontrivial nodes
parameters:
test statistic: fisher
Level 11: 1 nodes to be scored (0 eliminated genes)
Level 10: 1 nodes to be scored (0 eliminated genes)
Level 9: 6 nodes to be scored (1 eliminated genes)
Level 8: 7 nodes to be scored (7 eliminated genes)
Level 7: 13 nodes to be scored (33 eliminated genes)
Level 6: 19 nodes to be scored (38 eliminated genes)
Level 5: 16 nodes to be scored (62 eliminated genes)
Level 4: 14 nodes to be scored (84 eliminated genes)
Level 3: 15 nodes to be scored (101 eliminated genes)
Level 2: 3 nodes to be scored (112 eliminated genes)
Level 1: 1 nodes to be scored (160 eliminated genes)
Building most specific GOs .....
( 275 GO terms found. )
Build GO DAG topology ..........
( 484 GO terms and 597 relations. )
Annotating nodes ...............
( 250 genes annotated to the GO terms. )
-- Weight01 Algorithm --
the algorithm is scoring 144 nontrivial nodes
parameters:
test statistic: fisher
Level 12: 1 nodes to be scored (0 eliminated genes)
Level 11: 1 nodes to be scored (0 eliminated genes)
Level 10: 2 nodes to be scored (1 eliminated genes)
Level 9: 4 nodes to be scored (4 eliminated genes)
Level 8: 8 nodes to be scored (6 eliminated genes)
Level 7: 16 nodes to be scored (39 eliminated genes)
Level 6: 27 nodes to be scored (42 eliminated genes)
Level 5: 28 nodes to be scored (71 eliminated genes)
Level 4: 26 nodes to be scored (89 eliminated genes)
Level 3: 23 nodes to be scored (125 eliminated genes)
Level 2: 7 nodes to be scored (143 eliminated genes)
Level 1: 1 nodes to be scored (179 eliminated genes)
Building most specific GOs .....
( 275 GO terms found. )
Build GO DAG topology ..........
( 484 GO terms and 597 relations. )
Annotating nodes ...............
( 250 genes annotated to the GO terms. )
-- Weight01 Algorithm --
the algorithm is scoring 150 nontrivial nodes
parameters:
test statistic: fisher
Level 11: 2 nodes to be scored (0 eliminated genes)
Level 10: 4 nodes to be scored (0 eliminated genes)
Level 9: 6 nodes to be scored (2 eliminated genes)
Level 8: 10 nodes to be scored (11 eliminated genes)
Level 7: 16 nodes to be scored (38 eliminated genes)
Level 6: 28 nodes to be scored (44 eliminated genes)
Level 5: 30 nodes to be scored (78 eliminated genes)
Level 4: 26 nodes to be scored (107 eliminated genes)
Level 3: 19 nodes to be scored (129 eliminated genes)
Level 2: 8 nodes to be scored (142 eliminated genes)
Level 1: 1 nodes to be scored (172 eliminated genes)
Building most specific GOs .....
( 275 GO terms found. )
Build GO DAG topology ..........
( 484 GO terms and 597 relations. )
Annotating nodes ...............
( 250 genes annotated to the GO terms. )
-- Weight01 Algorithm --
the algorithm is scoring 241 nontrivial nodes
parameters:
test statistic: fisher
Level 14: 1 nodes to be scored (0 eliminated genes)
Level 13: 2 nodes to be scored (0 eliminated genes)
Level 12: 1 nodes to be scored (1 eliminated genes)
Level 11: 1 nodes to be scored (2 eliminated genes)
Level 10: 4 nodes to be scored (2 eliminated genes)
Level 9: 9 nodes to be scored (4 eliminated genes)
Level 8: 22 nodes to be scored (15 eliminated genes)
Level 7: 32 nodes to be scored (46 eliminated genes)
Level 6: 47 nodes to be scored (54 eliminated genes)
Level 5: 48 nodes to be scored (91 eliminated genes)
Level 4: 39 nodes to be scored (111 eliminated genes)
Level 3: 24 nodes to be scored (129 eliminated genes)
Level 2: 10 nodes to be scored (146 eliminated genes)
Level 1: 1 nodes to be scored (175 eliminated genes)
In [5]:
p.values <- enrichments[2,]
In [6]:
lengths(p.values)
- 11
- 10
- 28
- 17
- 9
- 12
- 12
- 21
- 25
- 22
- 16
In [7]:
lengths(g)
- 137
- 114
- 296
- 166
- 84
- 58
- 91
- 279
- 275
- 59
- 88
In [8]:
shortest.path <- shortest.path[lengths(p.values) > 0, lengths(p.values) > 0]
In [9]:
g <- g[sapply(p.values, function(i) length(i) > 0)]
p.values <- p.values[sapply(p.values, function(i) length(i) > 0)]
In [10]:
minimumSubsumers <- function(gos) sapply(gos, function(i) sapply(gos, function (j) getMinimumSubsumer(i, j)))
allMinimumSubsumers <- sapply(p.values, function(i)
sort(summary(as.factor(minimumSubsumers(names(i)))), decreasing = TRUE))
In [11]:
allMinimumSubsumers
- GO:0003674
- 56
- GO:0003824
- 22
- GO:0016740
- 16
- GO:0005488
- 12
- GO:0004672
- 2
- GO:0019787
- 2
- GO:0003677
- 1
- GO:0004049
- 1
- GO:0004596
- 1
- GO:0004712
- 1
- GO:0004713
- 1
- GO:0015616
- 1
- GO:0019789
- 1
- GO:0031625
- 1
- GO:0061630
- 1
- GO:0070273
- 1
- GO:0070300
- 1
- GO:0003674
- 64
- GO:0003824
- 18
- GO:0005488
- 4
- GO:0016772
- 2
- GO:0097159
- 2
- GO:0000049
- 1
- GO:0000384
- 1
- GO:0000386
- 1
- GO:0001671
- 1
- GO:0003677
- 1
- GO:0003887
- 1
- GO:0005545
- 1
- GO:0009927
- 1
- GO:0009982
- 1
- GO:0015248
- 1
- GO:0003674
- 440
- GO:0003824
- 172
- GO:0016740
- 32
- GO:0005488
- 30
- GO:0016772
- 30
- GO:0016773
- 12
- GO:0005515
- 10
- GO:0004396
- 6
- GO:0016491
- 4
- GO:0016779
- 4
- GO:0004674
- 2
- GO:0016616
- 2
- GO:0016765
- 2
- GO:0016787
- 2
- GO:0016829
- 2
- GO:0019899
- 2
- GO:0030234
- 2
- GO:0070566
- 2
- GO:0000121
- 1
- GO:0000309
- 1
- GO:0001054
- 1
- GO:0001094
- 1
- GO:0003939
- 1
- GO:0004340
- 1
- GO:0004359
- 1
- GO:0004449
- 1
- GO:0004478
- 1
- GO:0004515
- 1
- GO:0004662
- 1
- GO:0004693
- 1
- GO:0004707
- 1
- GO:0005536
- 1
- GO:0008139
- 1
- GO:0008270
- 1
- GO:0008379
- 1
- GO:0008536
- 1
- GO:0008565
- 1
- GO:0008599
- 1
- GO:0008865
- 1
- GO:0016538
- 1
- GO:0017112
- 1
- GO:0019158
- 1
- GO:0019172
- 1
- GO:0019901
- 1
- GO:0036381
- 1
- GO:0042393
- 1
- GO:0003674
- 150
- GO:0003824
- 70
- GO:0016740
- 12
- GO:0016788
- 8
- GO:0052866
- 8
- GO:0000030
- 6
- GO:0005515
- 6
- GO:0015144
- 2
- GO:0015291
- 2
- GO:0016747
- 2
- GO:0022857
- 2
- GO:0034595
- 2
- GO:0052744
- 2
- GO:0000009
- 1
- GO:0000026
- 1
- GO:0000149
- 1
- GO:0003756
- 1
- GO:0004026
- 1
- GO:0004169
- 1
- GO:0004438
- 1
- GO:0004439
- 1
- GO:0004806
- 1
- GO:0005078
- 1
- GO:0005351
- 1
- GO:0005355
- 1
- GO:0005484
- 1
- GO:0015297
- 1
- GO:0043812
- 1
- GO:0043813
- 1
- GO:0050291
- 1
- GO:0003674
- 54
- GO:0003824
- 8
- GO:0005488
- 4
- GO:0016740
- 2
- GO:0016787
- 2
- GO:0097159
- 2
- GO:0000978
- 1
- GO:0001077
- 1
- GO:0001135
- 1
- GO:0003887
- 1
- GO:0003924
- 1
- GO:0004663
- 1
- GO:0004725
- 1
- GO:0005525
- 1
- GO:0030276
- 1
- GO:0003674
- 90
- GO:0003824
- 26
- GO:0005488
- 10
- GO:0005515
- 2
- GO:0016740
- 2
- GO:0016787
- 2
- GO:0003962
- 1
- GO:0004014
- 1
- GO:0004067
- 1
- GO:0004347
- 1
- GO:0004722
- 1
- GO:0005543
- 1
- GO:0010997
- 1
- GO:0015085
- 1
- GO:0017056
- 1
- GO:0030170
- 1
- GO:0051087
- 1
- GO:0061630
- 1
- GO:0003674
- 82
- GO:0003824
- 18
- GO:0005488
- 14
- GO:0016787
- 6
- GO:0097159
- 6
- GO:0008094
- 4
- GO:0004003
- 2
- GO:0000150
- 1
- GO:0000400
- 1
- GO:0003689
- 1
- GO:0003729
- 1
- GO:0004520
- 1
- GO:0004791
- 1
- GO:0005048
- 1
- GO:0005487
- 1
- GO:0008270
- 1
- GO:0017056
- 1
- GO:0043140
- 1
- GO:0043141
- 1
- GO:0003674
- 298
- GO:0005488
- 54
- GO:0003824
- 34
- GO:0005515
- 10
- GO:0005198
- 6
- GO:0016740
- 4
- GO:0097159
- 4
- GO:0001167
- 2
- GO:0003690
- 2
- GO:0008134
- 2
- GO:0016772
- 2
- GO:0016787
- 2
- GO:0000384
- 1
- GO:0001097
- 1
- GO:0001164
- 1
- GO:0001168
- 1
- GO:0001187
- 1
- GO:0003735
- 1
- GO:0004691
- 1
- GO:0004723
- 1
- GO:0004749
- 1
- GO:0005200
- 1
- GO:0008121
- 1
- GO:0017025
- 1
- GO:0017056
- 1
- GO:0019776
- 1
- GO:0030943
- 1
- GO:0030983
- 1
- GO:0032093
- 1
- GO:0036402
- 1
- GO:0051010
- 1
- GO:0051537
- 1
- GO:0051880
- 1
- GO:0003674
- 318
- GO:0003824
- 182
- GO:0005488
- 30
- GO:0016740
- 24
- GO:0016787
- 16
- GO:0005515
- 6
- GO:0016301
- 4
- GO:0016491
- 4
- GO:0097159
- 4
- GO:0008135
- 2
- GO:0016684
- 2
- GO:0016773
- 2
- GO:0016788
- 2
- GO:0016860
- 2
- GO:0016887
- 2
- GO:0003743
- 1
- GO:0003746
- 1
- GO:0003873
- 1
- GO:0003876
- 1
- GO:0003955
- 1
- GO:0004165
- 1
- GO:0004331
- 1
- GO:0004373
- 1
- GO:0004535
- 1
- GO:0004602
- 1
- GO:0004679
- 1
- GO:0004849
- 1
- GO:0005085
- 1
- GO:0008092
- 1
- GO:0008574
- 1
- GO:0010181
- 1
- GO:0015616
- 1
- GO:0030234
- 1
- GO:0031683
- 1
- GO:0034450
- 1
- GO:0042800
- 1
- GO:0043022
- 1
- GO:0043130
- 1
- GO:0046523
- 1
- GO:0047066
- 1
- GO:0003674
- 366
- GO:0003824
- 42
- GO:0005488
- 22
- GO:0016740
- 10
- GO:0097159
- 4
- GO:0001104
- 3
- GO:0001191
- 3
- GO:0000977
- 2
- GO:0000981
- 2
- GO:0001076
- 2
- GO:0005515
- 2
- GO:0016772
- 2
- GO:0030234
- 2
- GO:0042623
- 2
- GO:0000978
- 1
- GO:0000979
- 1
- GO:0001075
- 1
- GO:0001077
- 1
- GO:0001106
- 1
- GO:0004638
- 1
- GO:0004639
- 1
- GO:0004644
- 1
- GO:0005096
- 1
- GO:0008270
- 1
- GO:0008897
- 1
- GO:0016247
- 1
- GO:0030362
- 1
- GO:0032947
- 1
- GO:0043142
- 1
- GO:0046961
- 1
- GO:0046982
- 1
- GO:0061630
- 1
- GO:0070180
- 1
- GO:1901916
- 1
- GO:0003674
- 154
- GO:0016787
- 30
- GO:0003824
- 28
- GO:0004553
- 10
- GO:0005488
- 10
- GO:0005515
- 2
- GO:0016853
- 2
- GO:0052736
- 2
- GO:0055106
- 2
- GO:0001075
- 1
- GO:0003755
- 1
- GO:0004151
- 1
- GO:0004512
- 1
- GO:0005047
- 1
- GO:0016887
- 1
- GO:0017108
- 1
- GO:0042277
- 1
- GO:0042973
- 1
- GO:0043495
- 1
- GO:0048487
- 1
- GO:0051670
- 1
- GO:0052861
- 1
- GO:0052862
- 1
- GO:0055105
- 1
- GO:0097027
- 1
In [12]:
minGO <- sapply(p.values, function(i) names(i)[which.min(i)])
maxGO <- sapply(p.values, function(i) names(i)[which.max(i)])
In [13]:
select(GO.db, keys=minGO, columns=c("TERM","DEFINITION"))
'select()' returned 1:1 mapping between keys and columns
GOID TERM DEFINITION
GO:0061630 ubiquitin protein ligase activity Catalysis of the transfer of ubiquitin to a substrate protein via the reaction X-ubiquitin + S --> X + S-ubiquitin, where X is either an E2 or E3 enzyme, the X-ubiquitin linkage is a thioester bond, and the S-ubiquitin linkage is an isopeptide bond between the C-terminal glycine of ubiquitin and the epsilon-amino group of lysine residues in the substrate. Note that this may include the extension of ubiquitin chains.
GO:0003677 DNA binding Any molecular function by which a gene product interacts selectively and non-covalently with DNA (deoxyribonucleic acid).
GO:0008139 nuclear localization sequence binding Interacting selectively and non-covalently with a nuclear localization sequence, a specific peptide sequence that acts as a signal to localize the protein within the nucleus.
GO:0000026 alpha-1,2-mannosyltransferase activity Catalysis of the transfer of a mannose residue to an oligosaccharide, forming an alpha-(1->2) linkage.
GO:0030276 clathrin binding Interacting selectively and non-covalently with a clathrin heavy or light chain, the main components of the coat of coated vesicles and coated pits, and which also occurs in synaptic vesicles.
GO:0004722 protein serine/threonine phosphatase activity Catalysis of the reaction: protein serine phosphate + H2O = protein serine + phosphate, and protein threonine phosphate + H2O = protein threonine + phosphate.
GO:0003689 DNA clamp loader activity Catalysis of the reaction: ATP + H2O = ADP + phosphate, to drive the opening of the ring structure of the PCNA complex, or any of the related sliding clamp complexes, and their closing around the DNA duplex.
GO:0051010 microtubule plus-end binding Interacting selectively and non-covalently with the plus end of a microtubule.
GO:0042800 histone methyltransferase activity (H3-K4 specific) Catalysis of the reaction: S-adenosyl-L-methionine + histone H3 L-lysine (position 4) = S-adenosyl-L-homocysteine + histone H3 N6-methyl-L-lysine (position 4). This reaction is the addition of a methyl group onto lysine at position 4 of the histone H3 protein.
GO:0001104 RNA polymerase II transcription cofactor activity Interacting selectively and non-covalently with an RNA polymerase II (RNAP II) regulatory transcription factor and also with the RNAP II basal transcription machinery in order to modulate transcription. Cofactors generally do not bind DNA, but rather mediate protein-protein interactions between regulatory transcription factors and the basal RNAP II transcription machinery.
GO:0005047 signal recognition particle binding Interacting selectively and non-covalently with the signal recognition particle.
In [14]:
select(GO.db, keys=maxGO, columns=c("TERM","DEFINITION"))
'select()' returned many:1 mapping between keys and columns
GOID TERM DEFINITION
GO:0015616 DNA translocase activity Catalysis of the reaction: ATP + H2O = ADP + phosphate, to drive movement along a single- or double-stranded DNA molecule.
GO:0003887 DNA-directed DNA polymerase activity Catalysis of the reaction: deoxynucleoside triphosphate + DNA(n) = diphosphate + DNA(n+1); the synthesis of DNA from deoxyribonucleotide triphosphates in the presence of a DNA template and a 3'hydroxyl group.
GO:0008599 protein phosphatase type 1 regulator activity Modulation of the activity of the enzyme protein phosphatase type 1.
GO:0015297 antiporter activity Enables the active transport of a solute across a membrane by a mechanism whereby two or more species are transported in opposite directions in a tightly coupled process not directly linked to a form of energy other than chemiosmotic energy. The reaction is: solute A(out) + solute B(in) = solute A(in) + solute B(out).
GO:0004725 protein tyrosine phosphatase activity Catalysis of the reaction: protein tyrosine phosphate + H2O = protein tyrosine + phosphate.
GO:0030170 pyridoxal phosphate binding Interacting selectively and non-covalently with pyridoxal 5' phosphate, 3-hydroxy-5-(hydroxymethyl)-2-methyl4-pyridine carboxaldehyde 5' phosphate, the biologically active form of vitamin B6.
GO:0008270 zinc ion binding Interacting selectively and non-covalently with zinc (Zn) ions.
GO:0017056 structural constituent of nuclear pore The action of a molecule that contributes to the structural integrity of the nuclear pore complex.
GO:0043130 ubiquitin binding Interacting selectively and non-covalently with ubiquitin, a protein that when covalently bound to other cellular proteins marks them for proteolytic degradation.
GO:0008270 zinc ion binding Interacting selectively and non-covalently with zinc (Zn) ions.
GO:0042973 glucan endo-1,3-beta-D-glucosidase activity Catalysis of the hydrolysis of (1->3)-beta-D-glucosidic linkages in (1->3)-beta-D-glucans.
In [15]:
head(shortest.path)
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
0 1 1 1 1 2 1 2 2 2 3
1 0 1 2 2 2 1 1 2 2 3
1 1 0 1 1 1 2 2 1 3 2
1 2 1 0 2 2 2 3 2 3 3
1 2 1 2 0 1 2 3 2 3 2
2 2 1 2 1 0 3 3 1 4 1
In [ ]:
fall <- function(i) !all(is.na(i))
fany <- function(i) !any(is.na(i))
l <- length(g)
geneSims <- sapply(1:l, function(i) {
sapply(i:l, function(j){
if (i == j){
return(1)
} else {
#gene sims as dataframe
t <- getGeneSim(g[[i]], g[[j]], similarity="max", similarityTerm="Resnik", normalization=TRUE)
##remove na columns and rows
t <- t[apply(t, 1, fall), apply(t, 2, fall)]
##remove any remaining rows with nan
t <- t[apply(t, 1, fany),]
##BMA
return((sum(apply(t, 1, max)) + sum(apply(t, 2, max))) / (nrow(t) + ncol(t)))
}
})
})
filtering out genes not mapping to the currently set GO category ... ===> list of 137 genes reduced to 126
filtering out genes not mapping to the currently set GO category ... ===> list of 114 genes reduced to 111
Warning message in getGeneSim(g[[i]], g[[j]], similarity = "max", similarityTerm = "Resnik", :
“Similarity matrix contains values > 1! This may happen with simlarity='funSimMax', if one gene's GO annotation is a complete subset of another gene's GO annotation.”
filtering out genes not mapping to the currently set GO category ... ===> list of 137 genes reduced to 126
filtering out genes not mapping to the currently set GO category ... ===> list of 296 genes reduced to 286
Warning message in getGeneSim(g[[i]], g[[j]], similarity = "max", similarityTerm = "Resnik", :
“Similarity matrix contains values > 1! This may happen with simlarity='funSimMax', if one gene's GO annotation is a complete subset of another gene's GO annotation.”
filtering out genes not mapping to the currently set GO category ... ===> list of 137 genes reduced to 126
filtering out genes not mapping to the currently set GO category ... ===> list of 166 genes reduced to 154
Warning message in getGeneSim(g[[i]], g[[j]], similarity = "max", similarityTerm = "Resnik", :
“Similarity matrix contains values > 1! This may happen with simlarity='funSimMax', if one gene's GO annotation is a complete subset of another gene's GO annotation.”
filtering out genes not mapping to the currently set GO category ... ===> list of 137 genes reduced to 126
filtering out genes not mapping to the currently set GO category ... ===> list of 84 genes reduced to 83
Warning message in getGeneSim(g[[i]], g[[j]], similarity = "max", similarityTerm = "Resnik", :
“Similarity matrix contains values > 1! This may happen with simlarity='funSimMax', if one gene's GO annotation is a complete subset of another gene's GO annotation.”
filtering out genes not mapping to the currently set GO category ... ===> list of 137 genes reduced to 126
filtering out genes not mapping to the currently set GO category ... ===> list of 58 genes reduced to 55
Warning message in getGeneSim(g[[i]], g[[j]], similarity = "max", similarityTerm = "Resnik", :
“Similarity matrix contains values > 1! This may happen with simlarity='funSimMax', if one gene's GO annotation is a complete subset of another gene's GO annotation.”
filtering out genes not mapping to the currently set GO category ... ===> list of 137 genes reduced to 126
filtering out genes not mapping to the currently set GO category ... ===> list of 91 genes reduced to 85
Warning message in getGeneSim(g[[i]], g[[j]], similarity = "max", similarityTerm = "Resnik", :
“Similarity matrix contains values > 1! This may happen with simlarity='funSimMax', if one gene's GO annotation is a complete subset of another gene's GO annotation.”
filtering out genes not mapping to the currently set GO category ... ===> list of 137 genes reduced to 126
filtering out genes not mapping to the currently set GO category ... ===> list of 279 genes reduced to 266
Warning message in getGeneSim(g[[i]], g[[j]], similarity = "max", similarityTerm = "Resnik", :
“Similarity matrix contains values > 1! This may happen with simlarity='funSimMax', if one gene's GO annotation is a complete subset of another gene's GO annotation.”
In [ ]:
head(geneSims)
In [ ]:
geneSimsDF <- sapply(1:length(g), function(i)
sapply(1:length(g), function(j) {
if (i <= j){
x <- i
y <- j - i + 1
} else {
x <- j
y <- i - j + 1
}
return(geneSims[[x]] [[y]]) }))
In [ ]:
names <- character()
for (i in 1:length(g)){
names <- c(names, sprintf("Com %s", i))
}
In [ ]:
rownames(geneSimsDF) <- names
colnames(geneSimsDF) <- names
geneSimsDF <- round(geneSimsDF, 3)
geneSimsDF
In [ ]:
library(gridExtra)
grid.table(geneSimsDF)
In [ ]:
rownames(shortest.path) <- names
colnames(shortest.path) <- names
head(shortest.path)
In [ ]:
shortest.path
In [ ]:
library(gridExtra)
grid.table(shortest.path)
In [58]:
library(GOSemSim)
GOSemSim v2.0.4 For help: https://guangchuangyu.github.io/GOSemSim
If you use GOSemSim in published research, please cite:
Guangchuang Yu, Fei Li, Yide Qin, Xiaochen Bo, Yibo Wu, Shengqi Wang. GOSemSim: an R package for measuring semantic similarity among GO terms and gene products Bioinformatics 2010, 26(7):976-978
In [59]:
scGO <- godata(OrgDb = mapping, keytype = ID, ont = ont)
[1] "preparing gene to GO mapping data..."
[1] "preparing IC data..."
In [61]:
wangClusters <- mclusterSim(clusters = g, semData = scGO, measure = "Wang", combine = "BMA")
In [62]:
rownames(wangClusters) <- names
colnames(wangClusters) <- names
In [63]:
wangClusters
Com 1 Com 2 Com 3 Com 4 Com 5 Com 6
Com 1 1.000 0.607 0.511 0.705 0.692 0.559
Com 2 0.607 1.000 0.530 0.673 0.684 0.622
Com 3 0.511 0.530 1.000 0.642 0.605 0.540
Com 4 0.705 0.673 0.642 1.000 0.720 0.618
Com 5 0.692 0.684 0.605 0.720 1.000 0.637
Com 6 0.559 0.622 0.540 0.618 0.637 1.000
In [64]:
library(gridExtra)
grid.table(wangClusters)
In [65]:
resnikClusters <- mclusterSim(clusters = g, semData = scGO, measure = "Resnik", combine = "BMA")
In [66]:
rownames(resnikClusters) <- names
colnames(resnikClusters) <- names
In [67]:
library(gridExtra)
grid.table(resnikClusters)
In [68]:
distances <- numeric(length = (numCom * (numCom - 1)) / 2)
semSims <- numeric(length = (numCom * (numCom - 1)) / 2)
completed <- 0
for (c1 in 1:length(g)) {
for (c2 in c1:length(g)) {
if (c1 == c2) next
completed <- completed + 1
semSims[completed] <- wangClusters[c1, c2]
distances[completed] <- shortest.path[c1, c2]
print(sprintf("Completed: %s", completed))
}
}
[1] "Completed: 1"
[1] "Completed: 2"
[1] "Completed: 3"
[1] "Completed: 4"
[1] "Completed: 5"
[1] "Completed: 6"
[1] "Completed: 7"
[1] "Completed: 8"
[1] "Completed: 9"
[1] "Completed: 10"
[1] "Completed: 11"
[1] "Completed: 12"
[1] "Completed: 13"
[1] "Completed: 14"
[1] "Completed: 15"
In [69]:
plot(distances, semSims, xlab="Distance on Map", ylab="Wang semantic similarity")
In [ ]:
Content source: DavidMcDonald1993/ghsom
Similar notebooks: