In [1]:
require("topGO")
library(genefilter)
setwd("/home/gstupp/projects/Wolan/cmoon/CM7_CM1E2d56col_unenr123_rawextract_2017/")
Loading required package: topGO
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:parallel’:
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from ‘package:stats’:
IQR, mad, xtabs
The following objects are masked from ‘package:base’:
anyDuplicated, append, as.data.frame, as.vector, cbind, colnames,
do.call, duplicated, eval, evalq, Filter, Find, get, grep, grepl,
intersect, is.unsorted, lapply, lengths, Map, mapply, match, mget,
order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
union, unique, unlist, unsplit
Loading required package: graph
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Loading required package: GO.db
Loading required package: AnnotationDbi
Loading required package: stats4
Loading required package: IRanges
Loading required package: S4Vectors
Loading required package: SparseM
Attaching package: ‘SparseM’
The following object is masked from ‘package:base’:
backsolve
Warning message:
“replacing previous import by ‘graph::.__C__dist’ when loading ‘topGO’”
groupGOTerms: GOBPTerm, GOMFTerm, GOCCTerm environments built.
Attaching package: ‘topGO’
The following object is masked from ‘package:IRanges’:
members
Attaching package: ‘genefilter’
The following object is masked from ‘package:base’:
anyNA
In [2]:
# load in the deseq results
load("out/res_rag_wt")
In [3]:
# load in protein cluster ID -> GO terms mapping
geneID2GO <- readMappings("out/geneid2go.map")
head(geneID2GO)
- $`22855681`
- 'GO:0006090'
- 'GO:0016301'
- 'GO:0003824'
- 'GO:0016310'
- 'GO:0016772'
- 'GO:0050242'
- 'GO:0005524'
- $`49973931`
- 'GO:0050485'
- 'GO:0055114'
- $`61502806`
- 'GO:0005509'
- 'GO:0005783'
- 'GO:0005515'
- 'GO:0051082'
- 'GO:0006457'
- $`17629190`
- 'GO:0006090'
- 'GO:0016301'
- 'GO:0003824'
- 'GO:0016310'
- 'GO:0016772'
- 'GO:0050242'
- 'GO:0005524'
- $`27803655`
- 'GO:0016616'
- 'GO:0016491'
- 'GO:0006631'
- 'GO:0055114'
- 'GO:0003857'
- 'GO:0070403'
- 'GO:0050662'
- $`52783963`
- 'GO:0016620'
- 'GO:0055114'
In [4]:
# load in the count data
load("out/dds_RTcontrol")
# filter out genes with low expression value as well as with very small variability across samples.
selProt <- genefilter(counts(dds), filterfun(kOverA(3, 6), function(x) (IQR(x) > 0.25)))
sum(selProt)
dim(counts(dds))
res=res[selProt,]
4615
- 5610
- 9
In [5]:
# human_mouse proteins only
res_hm = res[res$human_mouse=="True",]
geneList = res_hm$padj
names(geneList)<-rownames(res_hm)
In [6]:
# Tests based on gene counts
# universe is all proteins observed
# "sig" proteins are ones with a (adjusted) p-value < 0.01
topDiffGenes <- function(allScore) { return(allScore < 0.01)}
GOdata <- new("topGOdata", ontology = "BP", allGenes = geneList, geneSel = topDiffGenes, annot = annFUN.gene2GO, gene2GO = geneID2GO)
resultWeight01 <- runTest(GOdata, statistic = "fisher")
# "sig" proteins are determined by rank
resultWeight01ks <- runTest(GOdata, statistic = "ks")
Building most specific GOs ..... ( 50 GO terms found. )
Build GO DAG topology .......... ( 292 GO terms and 557 relations. )
Annotating nodes ............... ( 73 genes annotated to the GO terms. )
-- Weight01 Algorithm --
the algorithm is scoring 16 nontrivial nodes
parameters:
test statistic: fisher
Level 7: 1 nodes to be scored (0 eliminated genes)
Level 6: 2 nodes to be scored (0 eliminated genes)
Level 5: 2 nodes to be scored (2 eliminated genes)
Level 4: 3 nodes to be scored (25 eliminated genes)
Level 3: 4 nodes to be scored (39 eliminated genes)
Level 2: 3 nodes to be scored (49 eliminated genes)
Level 1: 1 nodes to be scored (57 eliminated genes)
-- Weight01 Algorithm --
the algorithm is scoring 292 nontrivial nodes
parameters:
test statistic: ks
score order: increasing
Level 13: 1 nodes to be scored (0 eliminated genes)
Level 12: 4 nodes to be scored (0 eliminated genes)
Level 11: 12 nodes to be scored (1 eliminated genes)
Level 10: 23 nodes to be scored (2 eliminated genes)
Level 9: 31 nodes to be scored (8 eliminated genes)
Level 8: 27 nodes to be scored (10 eliminated genes)
Level 7: 26 nodes to be scored (15 eliminated genes)
Level 6: 39 nodes to be scored (22 eliminated genes)
Level 5: 53 nodes to be scored (28 eliminated genes)
Level 4: 41 nodes to be scored (56 eliminated genes)
Level 3: 23 nodes to be scored (60 eliminated genes)
Level 2: 11 nodes to be scored (69 eliminated genes)
Level 1: 1 nodes to be scored (71 eliminated genes)
In [7]:
GenTable(GOdata, resultWeight01, topNodes = 10)
GO.ID Term Annotated Significant Expected result1
GO:0005975 carbohydrate metabolic process 9 2 0.49 0.072
GO:0015671 oxygen transport 2 1 0.11 0.107
GO:0006508 proteolysis 23 1 1.26 0.788
GO:0015669 gas transport 2 1 0.11 1.000
GO:0019538 protein metabolic process 25 1 1.37 1.000
GO:0006810 transport 15 1 0.82 1.000
GO:0043170 macromolecule metabolic process 26 1 1.42 1.000
GO:0044699 single-organism process 34 1 1.86 1.000
GO:0044765 single-organism transport 14 1 0.77 1.000
GO:0051234 establishment of localization 15 1 0.82 1.000
In [8]:
GenTable(GOdata, resultWeight01ks, topNodes = 10)
GO.ID Term Annotated Significant Expected result1
GO:0006813 potassium ion transport 2 0 0.11 0.047
GO:0006814 sodium ion transport 2 0 0.11 0.047
GO:0005975 carbohydrate metabolic process 9 2 0.49 0.135
GO:0006810 transport 15 1 0.82 0.171
GO:0006508 proteolysis 23 1 1.26 0.174
GO:0015671 oxygen transport 2 1 0.11 0.195
GO:0006955 immune response 4 0 0.22 0.271
GO:0007264 small GTPase mediated signal transductio... 1 0 0.05 0.290
GO:0006006 glucose metabolic process 2 0 0.11 0.368
GO:0007156 homophilic cell adhesion via plasma memb... 2 0 0.11 0.389
In [9]:
# same thing using GO MF
GOdataMF <- new("topGOdata", ontology = "MF", allGenes = geneList, geneSel = topDiffGenes, annot = annFUN.gene2GO, gene2GO = geneID2GO)
resultWeight01MF <- runTest(GOdataMF, statistic = "fisher")
resultWeight01ksMF <- runTest(GOdataMF, statistic = "ks")
Building most specific GOs ..... ( 82 GO terms found. )
Build GO DAG topology .......... ( 205 GO terms and 267 relations. )
Annotating nodes ............... ( 101 genes annotated to the GO terms. )
-- Weight01 Algorithm --
the algorithm is scoring 21 nontrivial nodes
parameters:
test statistic: fisher
Level 7: 2 nodes to be scored (0 eliminated genes)
Level 6: 3 nodes to be scored (0 eliminated genes)
Level 5: 3 nodes to be scored (17 eliminated genes)
Level 4: 4 nodes to be scored (32 eliminated genes)
Level 3: 6 nodes to be scored (48 eliminated genes)
Level 2: 2 nodes to be scored (51 eliminated genes)
Level 1: 1 nodes to be scored (77 eliminated genes)
-- Weight01 Algorithm --
the algorithm is scoring 205 nontrivial nodes
parameters:
test statistic: ks
score order: increasing
Level 14: 1 nodes to be scored (0 eliminated genes)
Level 13: 2 nodes to be scored (0 eliminated genes)
Level 12: 1 nodes to be scored (1 eliminated genes)
Level 11: 1 nodes to be scored (1 eliminated genes)
Level 10: 2 nodes to be scored (1 eliminated genes)
Level 9: 6 nodes to be scored (1 eliminated genes)
Level 8: 15 nodes to be scored (3 eliminated genes)
Level 7: 26 nodes to be scored (8 eliminated genes)
Level 6: 39 nodes to be scored (17 eliminated genes)
Level 5: 43 nodes to be scored (48 eliminated genes)
Level 4: 37 nodes to be scored (67 eliminated genes)
Level 3: 25 nodes to be scored (75 eliminated genes)
Level 2: 6 nodes to be scored (81 eliminated genes)
Level 1: 1 nodes to be scored (99 eliminated genes)
In [10]:
GenTable(GOdataMF, resultWeight01MF, topNodes = 10)
GO.ID Term Annotated Significant Expected result1
GO:0043169 cation binding 36 3 2.85 0.026
GO:0005515 protein binding 21 4 1.66 0.056
GO:0019825 oxygen binding 2 1 0.16 0.153
GO:0020037 heme binding 3 1 0.24 0.221
GO:0005506 iron ion binding 7 1 0.55 0.449
GO:0004252 serine-type endopeptidase activity 10 1 0.79 0.579
GO:0003824 catalytic activity 55 3 4.36 0.667
GO:0003674 molecular_function 101 8 8.00 1.000
GO:0008233 peptidase activity 25 1 1.98 1.000
GO:0004175 endopeptidase activity 12 1 0.95 1.000
In [11]:
GenTable(GOdataMF, resultWeight01ksMF, topNodes = 10)
GO.ID Term Annotated Significant Expected result1
GO:0016787 hydrolase activity 36 1 2.85 0.016
GO:0043169 cation binding 36 3 2.85 0.038
GO:0030145 manganese ion binding 2 0 0.16 0.064
GO:0005515 protein binding 21 4 1.66 0.148
GO:0003676 nucleic acid binding 4 0 0.32 0.163
GO:0004181 metallocarboxypeptidase activity 2 0 0.16 0.166
GO:0005215 transporter activity 4 0 0.32 0.189
GO:0003824 catalytic activity 55 3 4.36 0.200
GO:0008235 metalloexopeptidase activity 4 0 0.32 0.212
GO:0004867 serine-type endopeptidase inhibitor acti... 3 0 0.24 0.215
In [12]:
res_nh = res[res$human_mouse=="False",]
geneList = res_nh$padj
names(geneList)<-rownames(res_nh)
length(geneList)
4463
In [13]:
GOdataBP <- new("topGOdata", ontology = "BP", allGenes = geneList, geneSel = topDiffGenes, annot = annFUN.gene2GO, gene2GO = geneID2GO)
resultWeight01BP <- runTest(GOdataBP, statistic = "fisher")
resultWeight01ksBP <- runTest(GOdataBP, statistic = "ks")
Building most specific GOs ..... ( 170 GO terms found. )
Build GO DAG topology .......... ( 546 GO terms and 1107 relations. )
Annotating nodes ............... ( 3558 genes annotated to the GO terms. )
-- Weight01 Algorithm --
the algorithm is scoring 226 nontrivial nodes
parameters:
test statistic: fisher
Level 13: 1 nodes to be scored (0 eliminated genes)
Level 12: 2 nodes to be scored (0 eliminated genes)
Level 11: 6 nodes to be scored (259 eliminated genes)
Level 10: 14 nodes to be scored (262 eliminated genes)
Level 9: 23 nodes to be scored (381 eliminated genes)
Level 8: 23 nodes to be scored (904 eliminated genes)
Level 7: 21 nodes to be scored (1398 eliminated genes)
Level 6: 37 nodes to be scored (2317 eliminated genes)
Level 5: 44 nodes to be scored (2437 eliminated genes)
Level 4: 29 nodes to be scored (2571 eliminated genes)
Level 3: 17 nodes to be scored (2663 eliminated genes)
Level 2: 8 nodes to be scored (3348 eliminated genes)
Level 1: 1 nodes to be scored (3414 eliminated genes)
-- Weight01 Algorithm --
the algorithm is scoring 546 nontrivial nodes
parameters:
test statistic: ks
score order: increasing
Level 13: 6 nodes to be scored (0 eliminated genes)
Level 12: 31 nodes to be scored (0 eliminated genes)
Level 11: 41 nodes to be scored (337 eliminated genes)
Level 10: 48 nodes to be scored (488 eliminated genes)
Level 9: 61 nodes to be scored (626 eliminated genes)
Level 8: 63 nodes to be scored (1134 eliminated genes)
Level 7: 58 nodes to be scored (1679 eliminated genes)
Level 6: 79 nodes to be scored (2479 eliminated genes)
Level 5: 74 nodes to be scored (2542 eliminated genes)
Level 4: 45 nodes to be scored (2666 eliminated genes)
Level 3: 28 nodes to be scored (2757 eliminated genes)
Level 2: 11 nodes to be scored (3378 eliminated genes)
Level 1: 1 nodes to be scored (3418 eliminated genes)
In [14]:
GenTable(GOdataBP, resultWeight01BP, topNodes = 10)
GO.ID Term Annotated Significant Expected result1
GO:0071973 bacterial-type flagellum-dependent cell ... 238 32 9.03 8.0e-11
GO:0006633 fatty acid biosynthetic process 14 6 0.53 6.2e-06
GO:0032784 regulation of DNA-templated transcriptio... 3 3 0.11 5.3e-05
GO:0006629 lipid metabolic process 61 8 2.31 0.0013
GO:0035556 intracellular signal transduction 4 3 0.15 0.0014
GO:0006084 acetyl-CoA metabolic process 9 3 0.34 0.0038
GO:0006066 alcohol metabolic process 20 4 0.76 0.0080
GO:0015976 carbon utilization 4 2 0.15 0.0082
GO:0006096 glycolytic process 259 16 9.83 0.0341
GO:0006284 base-excision repair 1 1 0.04 0.0379
In [15]:
GenTable(GOdataBP, resultWeight01ksBP, topNodes = 10)
GO.ID Term Annotated Significant Expected result1
GO:0006090 pyruvate metabolic process 551 24 20.91 < 1e-30
GO:0016310 phosphorylation 585 24 22.20 < 1e-30
GO:0071973 bacterial-type flagellum-dependent cell ... 238 32 9.03 7.4e-14
GO:0006102 isocitrate metabolic process 20 0 0.76 2.3e-11
GO:0006351 transcription, DNA-templated 350 5 13.28 1.6e-08
GO:0006810 transport 243 5 9.22 8.8e-08
GO:0006164 purine nucleotide biosynthetic process 124 1 4.70 4.8e-05
GO:0009396 folic acid-containing compound biosynthe... 20 2 0.76 0.00013
GO:0017038 protein import 21 0 0.80 0.00016
GO:0006633 fatty acid biosynthetic process 14 6 0.53 0.00056
In [16]:
# Molecular function
GOdataMF <- new("topGOdata", ontology = "MF", allGenes = geneList, geneSel = topDiffGenes, annot = annFUN.gene2GO, gene2GO = geneID2GO)
resultWeight01MF <- runTest(GOdataMF, statistic = "fisher")
resultWeight01ksMF <- runTest(GOdataMF, statistic = "ks")
Building most specific GOs ..... ( 280 GO terms found. )
Build GO DAG topology .......... ( 447 GO terms and 535 relations. )
Annotating nodes ............... ( 3889 genes annotated to the GO terms. )
-- Weight01 Algorithm --
the algorithm is scoring 160 nontrivial nodes
parameters:
test statistic: fisher
Level 9: 2 nodes to be scored (0 eliminated genes)
Level 8: 7 nodes to be scored (0 eliminated genes)
Level 7: 19 nodes to be scored (971 eliminated genes)
Level 6: 35 nodes to be scored (977 eliminated genes)
Level 5: 41 nodes to be scored (1632 eliminated genes)
Level 4: 30 nodes to be scored (1996 eliminated genes)
Level 3: 19 nodes to be scored (2890 eliminated genes)
Level 2: 6 nodes to be scored (3022 eliminated genes)
Level 1: 1 nodes to be scored (3515 eliminated genes)
-- Weight01 Algorithm --
the algorithm is scoring 447 nontrivial nodes
parameters:
test statistic: ks
score order: increasing
Level 15: 1 nodes to be scored (0 eliminated genes)
Level 14: 2 nodes to be scored (0 eliminated genes)
Level 13: 2 nodes to be scored (3 eliminated genes)
Level 12: 1 nodes to be scored (62 eliminated genes)
Level 11: 2 nodes to be scored (62 eliminated genes)
Level 10: 2 nodes to be scored (62 eliminated genes)
Level 9: 6 nodes to be scored (63 eliminated genes)
Level 8: 19 nodes to be scored (63 eliminated genes)
Level 7: 70 nodes to be scored (992 eliminated genes)
Level 6: 147 nodes to be scored (1075 eliminated genes)
Level 5: 98 nodes to be scored (1874 eliminated genes)
Level 4: 56 nodes to be scored (2495 eliminated genes)
Level 3: 31 nodes to be scored (3092 eliminated genes)
Level 2: 9 nodes to be scored (3193 eliminated genes)
Level 1: 1 nodes to be scored (3523 eliminated genes)
In [17]:
GenTable(GOdataMF, resultWeight01MF, topNodes=10)
GO.ID Term Annotated Significant Expected result1
GO:0004634 phosphopyruvate hydratase activity 36 13 1.35 1.9e-10
GO:0005198 structural molecule activity 535 36 20.08 2.9e-10
GO:0004315 3-oxoacyl-[acyl-carrier-protein] synthas... 7 6 0.26 1.7e-08
GO:0000287 magnesium ion binding 90 15 3.38 8.2e-07
GO:0008762 UDP-N-acetylmuramate dehydrogenase activ... 2 2 0.08 0.0014
GO:0004435 phosphatidylinositol phospholipase C act... 2 2 0.08 0.0014
GO:0004022 alcohol dehydrogenase (NAD) activity 6 3 0.23 0.0079
GO:0008774 acetaldehyde dehydrogenase (acetylating)... 4 2 0.15 0.0080
GO:0018492 carbon-monoxide dehydrogenase (acceptor)... 16 3 0.60 0.0203
GO:0004853 uroporphyrinogen decarboxylase activity 1 1 0.04 0.0375
In [18]:
GenTable(GOdataMF, resultWeight01ksMF, topNodes = 10)
GO.ID Term Annotated Significant Expected result1
GO:0050242 pyruvate, phosphate dikinase activity 286 8 10.74 < 1e-30
GO:0016301 kinase activity 407 6 15.28 < 1e-30
GO:0016887 ATPase activity 166 5 6.23 2.4e-15
GO:0005198 structural molecule activity 535 36 20.08 9.9e-14
GO:0004450 isocitrate dehydrogenase (NADP+) activit... 20 0 0.75 1.3e-11
GO:0003677 DNA binding 349 5 13.10 1.2e-10
GO:0005524 ATP binding 756 7 28.38 1.8e-10
GO:0003899 DNA-directed RNA polymerase activity 338 2 12.69 1.4e-09
GO:0016820 hydrolase activity, acting on acid anhyd... 138 2 5.18 3.2e-08
GO:0004315 3-oxoacyl-[acyl-carrier-protein] synthas... 7 6 0.26 3.5e-06
In [19]:
print(showGroupDensity(GOdataMF, "GO:0050242", ranks = TRUE))
Content source: stuppie/CM7_CM1E2d56col_unenr123_rawextract_2017
Similar notebooks: