In [3]:
#libraries
library(GO.db)
library(topGO)
library(org.Hs.eg.db)
library(org.Sc.sgd.db)
library(GOSemSim)


Loading required package: AnnotationDbi
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, cbind, colnames, do.call,
    duplicated, eval, evalq, Filter, Find, get, grep, grepl, intersect,
    is.unsorted, lapply, lengths, Map, mapply, match, mget, order,
    paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind,
    Reduce, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit, which, which.max, which.min

Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: IRanges
Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following objects are masked from ‘package:base’:

    colMeans, colSums, expand.grid, rowMeans, rowSums


Loading required package: graph
Loading required package: SparseM

Attaching package: ‘SparseM’

The following object is masked from ‘package:base’:

    backsolve


groupGOTerms: 	GOBPTerm, GOMFTerm, GOCCTerm environments built.

Attaching package: ‘topGO’

The following object is masked from ‘package:IRanges’:

    members



GOSemSim v2.0.4  For help: https://guangchuangyu.github.io/GOSemSim

If you use GOSemSim in published research, please cite:
Guangchuang Yu, Fei Li, Yide Qin, Xiaochen Bo, Yibo Wu, Shengqi Wang. GOSemSim: an R package for measuring semantic similarity among GO terms and gene products Bioinformatics 2010, 26(7):976-978

In [30]:
file <- "yeast_union"

ont="BP"
db <- org.Sc.sgd.db
mapping <- "org.Sc.sgd.db"
ID <- "ENSEMBL"

In [31]:
##load all community gene lists
setwd(sprintf("/home/david/Documents/ghsom/"))

#background gene list
backgroundFilename <- sprintf("%s_all_genes.txt", file)
allGenes <- scan(backgroundFilename, character())

In [7]:
scGO <- godata(mapping, ont=ont, keytype=ID)


[1] "preparing gene to GO mapping data..."
[1] "preparing IC data..."

In [12]:
semanticDistances <- mgeneSim(allGenes, semData=scGO, measure="Resnik", combine="BMA", verbose=FALSE)

In [14]:
ncol(semanticDistances)


224

In [10]:
semanticDistances <- 1 - semanticDistances

In [11]:
head(semanticDistances)


YGR046WYBR057CYJR122WYDR503CYGR136WYPR048WYGL175CYOR262WYDL246CYNL122CYGR024CYGL116WYJL057CYOL082WYJL110CYFR002WYGL025CYHR129CYOR078WYGL153W
YGR046W0.0000.7040.8140.3670.9490.7690.7510.8820.7860.6720.8500.8650.7480.8070.8260.9190.8320.9160.8780.906
YBR057C0.7040.0000.7840.7190.7820.7880.4870.7710.7880.5540.3810.6850.6520.7870.7690.8450.7430.8050.7200.736
YJR122W0.8140.7840.0000.8090.8450.0000.7840.8090.8450.7540.8640.8620.8360.6080.8340.6940.6890.8560.7790.862
YDR503C0.3670.7190.8090.0000.8120.7380.6670.8510.6430.6810.7560.8010.5770.7870.7460.8880.8430.8760.8170.842
YGR136W0.9490.7820.8450.8120.0000.8450.7950.8470.9600.8430.9690.7600.9660.8290.8900.6760.8600.8450.9420.877
YPR048W0.7690.7880.0000.7380.8450.0000.7780.8890.5600.6780.7660.8410.8240.7990.7860.8380.7200.9030.8400.862

In [ ]:
semanticDistances2 <- sapply(allGenes, function(i) {
    sapply(allGenes, function(j) {
        geneSim(i, j, semData=scGO, measure="Wang", combine="BMA")["geneSim"]
    })
})

In [ ]:
length(allGenes)

In [ ]:
nrow(semanticDistances2)

In [ ]:
semanticDistances2[is.na(semanticDistances2)] <- 0

In [ ]:
write.table(semanticDistances, sep=",", file = sprintf("%s_wang_similarity.csv", file), row.names=FALSE, col.names=FALSE)

In [15]:
library(GOSim)

setOntology(ont, loadIC=FALSE)
setEvidenceLevel(evidences="all",organism=org.Sc.sgdORGANISM, gomap=org.Sc.sgdGO)

head(allGenes)


Loading required package: annotate
Loading required package: XML

Attaching package: ‘XML’

The following object is masked from ‘package:graph’:

    addNode


In [33]:
allGeneSims <- 1 - getGeneSim(allGenes, similarity="funSimMax", 
                          similarityTerm="relevance", normalization = TRUE)

allGeneSims[is.na(allGeneSims)] <- 1


filtering out genes not mapping to the currently set GO category ... ===> list of  1647 genes reduced to  1529 
Warning message in getGeneSim(allGenes, similarity = "funSimMax", similarityTerm = "relevance", :
“Similarity matrix contains values > 1! This may happen with simlarity='funSimMax', if one gene's GO annotation is a complete subset of another gene's GO annotation.”

In [36]:
head(allGeneSims)


YLR268WYJL155CYBR255WYFR027WYJR122WYLR244CYPL144WYBR135WYBR160WYDL238CYDR076WYDR508CYPR028WYMR117CYLR096WYCR067CYMR047CYML055WYGL015CYPL013C
YLR268W0.0000000 0.9938983 1 0.8118516 0.7318703 1.0000000 0.7339676 0.972578570.278222620.9833591 0.8929767 0.175877 0.2725629 0.8150920 0.326100540.1846623 0.2633827 0.5411890 1 0.79901481
YJL155C0.9938983 0.0000000 1 0.4416787 0.8477656 0.1645404 0.9849700 0.403009850.409314410.5005958 0.4386883 1.000000 0.9874740 0.9814483 0.382358451.0000000 0.5504462 0.5971853 1 0.28307242
YBR255W1.0000000 1.0000000 0 1.0000000 1.0000000 1.0000000 1.0000000 1.000000001.000000001.0000000 1.0000000 1.000000 1.0000000 1.0000000 1.000000001.0000000 1.0000000 1.0000000 1 1.00000000
YFR027W0.8118516 0.4416787 1 0.0000000 0.6558710 0.1292241 0.6593126 0.150986050.119722700.3179332 0.2183519 1.000000 0.5952518 0.3655197 0.534205150.9757161 0.5486758 0.7136951 1 0.26000288
YJR122W0.7318703 0.8477656 1 0.6558710 0.0000000 0.8896328 0.5773167 0.714185070.596204070.9044549 0.5980483 1.000000 0.6167872 0.5772192 0.850767961.0000000 0.7613731 0.8959624 1 0.71567447
YLR244C1.0000000 0.1645404 1 0.1292241 0.8896328 0.0000000 1.0000000 0.059093640.061333580.4717284 0.1245602 1.000000 1.0000000 1.0000000 0.024491470.9400956 0.3237669 0.3793700 1 0.06517584

In [37]:
write.table(allGeneSims, sep=",", file = sprintf("%s_rel_similarity_GOSim.csv", file), row.names=TRUE, col.names=FALSE)