Running topGO


In [1]:
require("topGO")
library(genefilter)
setwd("/home/gstupp/projects/Wolan/cmoon/CM7_CM1E2d56col_unenr123_rawextract_2017/")


Loading required package: topGO
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, as.vector, cbind, colnames,
    do.call, duplicated, eval, evalq, Filter, Find, get, grep, grepl,
    intersect, is.unsorted, lapply, lengths, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unlist, unsplit

Loading required package: graph
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: GO.db
Loading required package: AnnotationDbi
Loading required package: stats4
Loading required package: IRanges
Loading required package: S4Vectors

Loading required package: SparseM

Attaching package: ‘SparseM’

The following object is masked from ‘package:base’:

    backsolve

Warning message:
“replacing previous import by ‘graph::.__C__dist’ when loading ‘topGO’”
groupGOTerms: 	GOBPTerm, GOMFTerm, GOCCTerm environments built.
Attaching package: ‘topGO’

The following object is masked from ‘package:IRanges’:

    members


Attaching package: ‘genefilter’

The following object is masked from ‘package:base’:

    anyNA


In [2]:
# load in the deseq results
load("out/res_rag_wt")

In [3]:
# load in protein cluster ID -> GO terms mapping
geneID2GO <- readMappings("out/geneid2go.map")
head(geneID2GO)


$`22855681`
  1. 'GO:0006090'
  2. 'GO:0016301'
  3. 'GO:0003824'
  4. 'GO:0016310'
  5. 'GO:0016772'
  6. 'GO:0050242'
  7. 'GO:0005524'
$`49973931`
  1. 'GO:0050485'
  2. 'GO:0055114'
$`61502806`
  1. 'GO:0005509'
  2. 'GO:0005783'
  3. 'GO:0005515'
  4. 'GO:0051082'
  5. 'GO:0006457'
$`17629190`
  1. 'GO:0006090'
  2. 'GO:0016301'
  3. 'GO:0003824'
  4. 'GO:0016310'
  5. 'GO:0016772'
  6. 'GO:0050242'
  7. 'GO:0005524'
$`27803655`
  1. 'GO:0016616'
  2. 'GO:0016491'
  3. 'GO:0006631'
  4. 'GO:0055114'
  5. 'GO:0003857'
  6. 'GO:0070403'
  7. 'GO:0050662'
$`52783963`
  1. 'GO:0016620'
  2. 'GO:0055114'

In [4]:
# load in the count data
load("out/dds_RTcontrol")
# filter out genes with low expression value as well as with very small variability across samples.
selProt <- genefilter(counts(dds), filterfun(kOverA(3, 6), function(x) (IQR(x) > 0.25)))
sum(selProt)
dim(counts(dds))
res=res[selProt,]


4615
  1. 5610
  2. 9

human_mouse proteins only


In [5]:
# human_mouse proteins only
res_hm = res[res$human_mouse=="True",]

geneList = res_hm$padj
names(geneList)<-rownames(res_hm)

In [6]:
# Tests based on gene counts
# universe is all proteins observed
# "sig" proteins are ones with a (adjusted) p-value < 0.01
topDiffGenes <- function(allScore) { return(allScore < 0.01)}
GOdata <- new("topGOdata", ontology = "BP", allGenes = geneList, geneSel = topDiffGenes, annot = annFUN.gene2GO, gene2GO = geneID2GO)
resultWeight01 <- runTest(GOdata, statistic = "fisher")
# "sig" proteins are determined by rank
resultWeight01ks <- runTest(GOdata, statistic = "ks")


Building most specific GOs .....	( 50 GO terms found. )

Build GO DAG topology ..........	( 292 GO terms and 557 relations. )

Annotating nodes ...............	( 73 genes annotated to the GO terms. )

			 -- Weight01 Algorithm -- 

		 the algorithm is scoring 16 nontrivial nodes
		 parameters: 
			 test statistic:  fisher 

	 Level 7:	1 nodes to be scored	(0 eliminated genes)

	 Level 6:	2 nodes to be scored	(0 eliminated genes)

	 Level 5:	2 nodes to be scored	(2 eliminated genes)

	 Level 4:	3 nodes to be scored	(25 eliminated genes)

	 Level 3:	4 nodes to be scored	(39 eliminated genes)

	 Level 2:	3 nodes to be scored	(49 eliminated genes)

	 Level 1:	1 nodes to be scored	(57 eliminated genes)

			 -- Weight01 Algorithm -- 

		 the algorithm is scoring 292 nontrivial nodes
		 parameters: 
			 test statistic:  ks 
			 score order:  increasing 

	 Level 13:	1 nodes to be scored	(0 eliminated genes)

	 Level 12:	4 nodes to be scored	(0 eliminated genes)

	 Level 11:	12 nodes to be scored	(1 eliminated genes)

	 Level 10:	23 nodes to be scored	(2 eliminated genes)

	 Level 9:	31 nodes to be scored	(8 eliminated genes)

	 Level 8:	27 nodes to be scored	(10 eliminated genes)

	 Level 7:	26 nodes to be scored	(15 eliminated genes)

	 Level 6:	39 nodes to be scored	(22 eliminated genes)

	 Level 5:	53 nodes to be scored	(28 eliminated genes)

	 Level 4:	41 nodes to be scored	(56 eliminated genes)

	 Level 3:	23 nodes to be scored	(60 eliminated genes)

	 Level 2:	11 nodes to be scored	(69 eliminated genes)

	 Level 1:	1 nodes to be scored	(71 eliminated genes)

In [7]:
GenTable(GOdata, resultWeight01, topNodes = 10)


GO.IDTermAnnotatedSignificantExpectedresult1
GO:0005975 carbohydrate metabolic process 9 2 0.49 0.072
GO:0015671 oxygen transport 2 1 0.11 0.107
GO:0006508 proteolysis 23 1 1.26 0.788
GO:0015669 gas transport 2 1 0.11 1.000
GO:0019538 protein metabolic process 25 1 1.37 1.000
GO:0006810 transport 15 1 0.82 1.000
GO:0043170 macromolecule metabolic process26 1 1.42 1.000
GO:0044699 single-organism process 34 1 1.86 1.000
GO:0044765 single-organism transport 14 1 0.77 1.000
GO:0051234 establishment of localization 15 1 0.82 1.000

In [8]:
GenTable(GOdata, resultWeight01ks, topNodes = 10)


GO.IDTermAnnotatedSignificantExpectedresult1
GO:0006813 potassium ion transport 2 0 0.11 0.047
GO:0006814 sodium ion transport 2 0 0.11 0.047
GO:0005975 carbohydrate metabolic process 9 2 0.49 0.135
GO:0006810 transport 15 1 0.82 0.171
GO:0006508 proteolysis 23 1 1.26 0.174
GO:0015671 oxygen transport 2 1 0.11 0.195
GO:0006955 immune response 4 0 0.22 0.271
GO:0007264 small GTPase mediated signal transductio... 1 0 0.05 0.290
GO:0006006 glucose metabolic process 2 0 0.11 0.368
GO:0007156 homophilic cell adhesion via plasma memb... 2 0 0.11 0.389

In [9]:
# same thing using GO MF
GOdataMF <- new("topGOdata", ontology = "MF", allGenes = geneList, geneSel = topDiffGenes, annot = annFUN.gene2GO, gene2GO = geneID2GO)
resultWeight01MF <- runTest(GOdataMF, statistic = "fisher")
resultWeight01ksMF <- runTest(GOdataMF, statistic = "ks")


Building most specific GOs .....	( 82 GO terms found. )

Build GO DAG topology ..........	( 205 GO terms and 267 relations. )

Annotating nodes ...............	( 101 genes annotated to the GO terms. )

			 -- Weight01 Algorithm -- 

		 the algorithm is scoring 21 nontrivial nodes
		 parameters: 
			 test statistic:  fisher 

	 Level 7:	2 nodes to be scored	(0 eliminated genes)

	 Level 6:	3 nodes to be scored	(0 eliminated genes)

	 Level 5:	3 nodes to be scored	(17 eliminated genes)

	 Level 4:	4 nodes to be scored	(32 eliminated genes)

	 Level 3:	6 nodes to be scored	(48 eliminated genes)

	 Level 2:	2 nodes to be scored	(51 eliminated genes)

	 Level 1:	1 nodes to be scored	(77 eliminated genes)

			 -- Weight01 Algorithm -- 

		 the algorithm is scoring 205 nontrivial nodes
		 parameters: 
			 test statistic:  ks 
			 score order:  increasing 

	 Level 14:	1 nodes to be scored	(0 eliminated genes)

	 Level 13:	2 nodes to be scored	(0 eliminated genes)

	 Level 12:	1 nodes to be scored	(1 eliminated genes)

	 Level 11:	1 nodes to be scored	(1 eliminated genes)

	 Level 10:	2 nodes to be scored	(1 eliminated genes)

	 Level 9:	6 nodes to be scored	(1 eliminated genes)

	 Level 8:	15 nodes to be scored	(3 eliminated genes)

	 Level 7:	26 nodes to be scored	(8 eliminated genes)

	 Level 6:	39 nodes to be scored	(17 eliminated genes)

	 Level 5:	43 nodes to be scored	(48 eliminated genes)

	 Level 4:	37 nodes to be scored	(67 eliminated genes)

	 Level 3:	25 nodes to be scored	(75 eliminated genes)

	 Level 2:	6 nodes to be scored	(81 eliminated genes)

	 Level 1:	1 nodes to be scored	(99 eliminated genes)

In [10]:
GenTable(GOdataMF, resultWeight01MF, topNodes = 10)


GO.IDTermAnnotatedSignificantExpectedresult1
GO:0043169 cation binding 36 3 2.85 0.026
GO:0005515 protein binding 21 4 1.66 0.056
GO:0019825 oxygen binding 2 1 0.16 0.153
GO:0020037 heme binding 3 1 0.24 0.221
GO:0005506 iron ion binding 7 1 0.55 0.449
GO:0004252 serine-type endopeptidase activity 10 1 0.79 0.579
GO:0003824 catalytic activity 55 3 4.36 0.667
GO:0003674 molecular_function 101 8 8.00 1.000
GO:0008233 peptidase activity 25 1 1.98 1.000
GO:0004175 endopeptidase activity 12 1 0.95 1.000

In [11]:
GenTable(GOdataMF, resultWeight01ksMF, topNodes = 10)


GO.IDTermAnnotatedSignificantExpectedresult1
GO:0016787 hydrolase activity 36 1 2.85 0.016
GO:0043169 cation binding 36 3 2.85 0.038
GO:0030145 manganese ion binding 2 0 0.16 0.064
GO:0005515 protein binding 21 4 1.66 0.148
GO:0003676 nucleic acid binding 4 0 0.32 0.163
GO:0004181 metallocarboxypeptidase activity 2 0 0.16 0.166
GO:0005215 transporter activity 4 0 0.32 0.189
GO:0003824 catalytic activity 55 3 4.36 0.200
GO:0008235 metalloexopeptidase activity 4 0 0.32 0.212
GO:0004867 serine-type endopeptidase inhibitor acti... 3 0 0.24 0.215

Non -human/mouse


In [12]:
res_nh = res[res$human_mouse=="False",]

geneList = res_nh$padj
names(geneList)<-rownames(res_nh)
length(geneList)


4463

In [13]:
GOdataBP <- new("topGOdata", ontology = "BP", allGenes = geneList, geneSel = topDiffGenes, annot = annFUN.gene2GO, gene2GO = geneID2GO)
resultWeight01BP <- runTest(GOdataBP, statistic = "fisher")
resultWeight01ksBP <- runTest(GOdataBP, statistic = "ks")


Building most specific GOs .....	( 170 GO terms found. )

Build GO DAG topology ..........	( 546 GO terms and 1107 relations. )

Annotating nodes ...............	( 3558 genes annotated to the GO terms. )

			 -- Weight01 Algorithm -- 

		 the algorithm is scoring 226 nontrivial nodes
		 parameters: 
			 test statistic:  fisher 

	 Level 13:	1 nodes to be scored	(0 eliminated genes)

	 Level 12:	2 nodes to be scored	(0 eliminated genes)

	 Level 11:	6 nodes to be scored	(259 eliminated genes)

	 Level 10:	14 nodes to be scored	(262 eliminated genes)

	 Level 9:	23 nodes to be scored	(381 eliminated genes)

	 Level 8:	23 nodes to be scored	(904 eliminated genes)

	 Level 7:	21 nodes to be scored	(1398 eliminated genes)

	 Level 6:	37 nodes to be scored	(2317 eliminated genes)

	 Level 5:	44 nodes to be scored	(2437 eliminated genes)

	 Level 4:	29 nodes to be scored	(2571 eliminated genes)

	 Level 3:	17 nodes to be scored	(2663 eliminated genes)

	 Level 2:	8 nodes to be scored	(3348 eliminated genes)

	 Level 1:	1 nodes to be scored	(3414 eliminated genes)

			 -- Weight01 Algorithm -- 

		 the algorithm is scoring 546 nontrivial nodes
		 parameters: 
			 test statistic:  ks 
			 score order:  increasing 

	 Level 13:	6 nodes to be scored	(0 eliminated genes)

	 Level 12:	31 nodes to be scored	(0 eliminated genes)

	 Level 11:	41 nodes to be scored	(337 eliminated genes)

	 Level 10:	48 nodes to be scored	(488 eliminated genes)

	 Level 9:	61 nodes to be scored	(626 eliminated genes)

	 Level 8:	63 nodes to be scored	(1134 eliminated genes)

	 Level 7:	58 nodes to be scored	(1679 eliminated genes)

	 Level 6:	79 nodes to be scored	(2479 eliminated genes)

	 Level 5:	74 nodes to be scored	(2542 eliminated genes)

	 Level 4:	45 nodes to be scored	(2666 eliminated genes)

	 Level 3:	28 nodes to be scored	(2757 eliminated genes)

	 Level 2:	11 nodes to be scored	(3378 eliminated genes)

	 Level 1:	1 nodes to be scored	(3418 eliminated genes)

In [14]:
GenTable(GOdataBP, resultWeight01BP, topNodes = 10)


GO.IDTermAnnotatedSignificantExpectedresult1
GO:0071973 bacterial-type flagellum-dependent cell ...238 32 9.03 8.0e-11
GO:0006633 fatty acid biosynthetic process 14 6 0.53 6.2e-06
GO:0032784 regulation of DNA-templated transcriptio... 3 3 0.11 5.3e-05
GO:0006629 lipid metabolic process 61 8 2.31 0.0013
GO:0035556 intracellular signal transduction 4 3 0.15 0.0014
GO:0006084 acetyl-CoA metabolic process 9 3 0.34 0.0038
GO:0006066 alcohol metabolic process 20 4 0.76 0.0080
GO:0015976 carbon utilization 4 2 0.15 0.0082
GO:0006096 glycolytic process 259 16 9.83 0.0341
GO:0006284 base-excision repair 1 1 0.04 0.0379

In [15]:
GenTable(GOdataBP, resultWeight01ksBP, topNodes = 10)


GO.IDTermAnnotatedSignificantExpectedresult1
GO:0006090 pyruvate metabolic process 551 24 20.91 < 1e-30
GO:0016310 phosphorylation 585 24 22.20 < 1e-30
GO:0071973 bacterial-type flagellum-dependent cell ...238 32 9.03 7.4e-14
GO:0006102 isocitrate metabolic process 20 0 0.76 2.3e-11
GO:0006351 transcription, DNA-templated 350 5 13.28 1.6e-08
GO:0006810 transport 243 5 9.22 8.8e-08
GO:0006164 purine nucleotide biosynthetic process 124 1 4.70 4.8e-05
GO:0009396 folic acid-containing compound biosynthe... 20 2 0.76 0.00013
GO:0017038 protein import 21 0 0.80 0.00016
GO:0006633 fatty acid biosynthetic process 14 6 0.53 0.00056

In [16]:
# Molecular function
GOdataMF <- new("topGOdata", ontology = "MF", allGenes = geneList, geneSel = topDiffGenes, annot = annFUN.gene2GO, gene2GO = geneID2GO)
resultWeight01MF <- runTest(GOdataMF, statistic = "fisher")
resultWeight01ksMF <- runTest(GOdataMF, statistic = "ks")


Building most specific GOs .....	( 280 GO terms found. )

Build GO DAG topology ..........	( 447 GO terms and 535 relations. )

Annotating nodes ...............	( 3889 genes annotated to the GO terms. )

			 -- Weight01 Algorithm -- 

		 the algorithm is scoring 160 nontrivial nodes
		 parameters: 
			 test statistic:  fisher 

	 Level 9:	2 nodes to be scored	(0 eliminated genes)

	 Level 8:	7 nodes to be scored	(0 eliminated genes)

	 Level 7:	19 nodes to be scored	(971 eliminated genes)

	 Level 6:	35 nodes to be scored	(977 eliminated genes)

	 Level 5:	41 nodes to be scored	(1632 eliminated genes)

	 Level 4:	30 nodes to be scored	(1996 eliminated genes)

	 Level 3:	19 nodes to be scored	(2890 eliminated genes)

	 Level 2:	6 nodes to be scored	(3022 eliminated genes)

	 Level 1:	1 nodes to be scored	(3515 eliminated genes)

			 -- Weight01 Algorithm -- 

		 the algorithm is scoring 447 nontrivial nodes
		 parameters: 
			 test statistic:  ks 
			 score order:  increasing 

	 Level 15:	1 nodes to be scored	(0 eliminated genes)

	 Level 14:	2 nodes to be scored	(0 eliminated genes)

	 Level 13:	2 nodes to be scored	(3 eliminated genes)

	 Level 12:	1 nodes to be scored	(62 eliminated genes)

	 Level 11:	2 nodes to be scored	(62 eliminated genes)

	 Level 10:	2 nodes to be scored	(62 eliminated genes)

	 Level 9:	6 nodes to be scored	(63 eliminated genes)

	 Level 8:	19 nodes to be scored	(63 eliminated genes)

	 Level 7:	70 nodes to be scored	(992 eliminated genes)

	 Level 6:	147 nodes to be scored	(1075 eliminated genes)

	 Level 5:	98 nodes to be scored	(1874 eliminated genes)

	 Level 4:	56 nodes to be scored	(2495 eliminated genes)

	 Level 3:	31 nodes to be scored	(3092 eliminated genes)

	 Level 2:	9 nodes to be scored	(3193 eliminated genes)

	 Level 1:	1 nodes to be scored	(3523 eliminated genes)

In [17]:
GenTable(GOdataMF, resultWeight01MF, topNodes=10)


GO.IDTermAnnotatedSignificantExpectedresult1
GO:0004634 phosphopyruvate hydratase activity 36 13 1.35 1.9e-10
GO:0005198 structural molecule activity 535 36 20.08 2.9e-10
GO:0004315 3-oxoacyl-[acyl-carrier-protein] synthas... 7 6 0.26 1.7e-08
GO:0000287 magnesium ion binding 90 15 3.38 8.2e-07
GO:0008762 UDP-N-acetylmuramate dehydrogenase activ... 2 2 0.08 0.0014
GO:0004435 phosphatidylinositol phospholipase C act... 2 2 0.08 0.0014
GO:0004022 alcohol dehydrogenase (NAD) activity 6 3 0.23 0.0079
GO:0008774 acetaldehyde dehydrogenase (acetylating)... 4 2 0.15 0.0080
GO:0018492 carbon-monoxide dehydrogenase (acceptor)... 16 3 0.60 0.0203
GO:0004853 uroporphyrinogen decarboxylase activity 1 1 0.04 0.0375

In [18]:
GenTable(GOdataMF, resultWeight01ksMF, topNodes = 10)


GO.IDTermAnnotatedSignificantExpectedresult1
GO:0050242 pyruvate, phosphate dikinase activity 286 8 10.74 < 1e-30
GO:0016301 kinase activity 407 6 15.28 < 1e-30
GO:0016887 ATPase activity 166 5 6.23 2.4e-15
GO:0005198 structural molecule activity 535 36 20.08 9.9e-14
GO:0004450 isocitrate dehydrogenase (NADP+) activit... 20 0 0.75 1.3e-11
GO:0003677 DNA binding 349 5 13.10 1.2e-10
GO:0005524 ATP binding 756 7 28.38 1.8e-10
GO:0003899 DNA-directed RNA polymerase activity 338 2 12.69 1.4e-09
GO:0016820 hydrolase activity, acting on acid anhyd...138 2 5.18 3.2e-08
GO:0004315 3-oxoacyl-[acyl-carrier-protein] synthas... 7 6 0.26 3.5e-06

In [19]:
print(showGroupDensity(GOdataMF, "GO:0050242", ranks = TRUE))