Objective: To identify genotype-phenotype trait association in yeast

Develop a workflow that uses EKP to identify genes indirectly associated with a given yeast phenotype (butanol tolerance) and visualizes them in an interactive knowledge graph using SPOT.

Load the required R packages and set the working directory


In [1]:
library(dplyr)
library(tidyr)
library(sqldf)
library(splitstackshape)
library(stringr)
library(compare)
setwd("../src")


Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Loading required package: gsubfn
Loading required package: proto
Loading required package: RSQLite
Loading required package: DBI
Loading required package: data.table
------------------------------------------------------------------------------
data.table + dplyr code now lives in dtplyr.
Please library(dtplyr)!
------------------------------------------------------------------------------

Attaching package: ‘data.table’

The following objects are masked from ‘package:dplyr’:

    between, first, last


Attaching package: ‘compare’

The following object is masked from ‘package:base’:

    isTRUE

Load the API scripts with login credentials


In [2]:
source("EuretosInfrastructure.R")
options(warn=-1)


Attaching package: ‘magrittr’

The following object is masked from ‘package:tidyr’:

    extract

Loading required package: qdapDictionaries
Loading required package: qdapRegex

Attaching package: ‘qdapRegex’

The following object is masked from ‘package:jsonlite’:

    validate

The following objects are masked from ‘package:dplyr’:

    escape, explain

Loading required package: qdapTools

Attaching package: ‘qdapTools’

The following object is masked from ‘package:data.table’:

    shift

The following object is masked from ‘package:dplyr’:

    id

Loading required package: RColorBrewer

Attaching package: ‘qdap’

The following object is masked from ‘package:magrittr’:

    %>%

The following object is masked from ‘package:stringr’:

    %>%

The following object is masked from ‘package:tidyr’:

    %>%

The following object is masked from ‘package:dplyr’:

    %>%

The following object is masked from ‘package:base’:

    Filter


Retrieving page 0
Retrieving page 1
Retrieving page 2
Retrieving page 3
Retrieving page 4
Retrieving page 5
Retrieving page 6
Retrieving page 7
Retrieving page 8
Retrieving page 9
Retrieving page 10
Retrieving page 11

DSM workflow starts here:

Load the input data provided by DSM. The data consist of a list of yeast genes and a list of terms that represent butanol tolerance.


In [3]:
yeast_genes<-read.csv("20170119_GeneList_DSM.txt",header=TRUE,sep="\t")
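Before the identifiers are sent to EKP, it can help to confirm that the input table has the expected shape. The sketch below uses a small inline table in place of the DSM file. The `Gene` column and all row values are made up for illustration; only the `SGD_ID` column name comes from the workflow, and dropping duplicates with `unique()` is an assumption about what is wanted.

```r
# Toy stand-in for the tab-separated DSM gene list (values are hypothetical).
toy_input <- "SGD_ID\tGene\nS000000001\tgeneA\nS000000002\tgeneB\nS000000002\tgeneB\n"
toy_genes <- read.csv(text = toy_input, header = TRUE, sep = "\t",
                      stringsAsFactors = FALSE)

# The workflow relies on an SGD_ID column, so fail early if it is missing.
stopifnot("SGD_ID" %in% colnames(toy_genes))

# Lower-case the identifiers (as the next step does) and drop duplicates.
toy_ids <- unique(tolower(as.character(toy_genes[, "SGD_ID"])))
print(toy_ids)
```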

Step 1a: Get the starting concept identifiers


In [4]:
start<-getConceptID(tolower(as.character(yeast_genes[,"SGD_ID"])))

start<-start[,"EKP_Concept_Id"]

In [5]:
head(start)


  1. '3885475'
  2. '3885366'
  3. '3877933'
  4. '2480408'
  5. '3888043'
  6. '3878612'

Step 1b: Get the ending concept identifiers for "resistance to chemicals"


In [6]:
end <- unlist(getResistanceEKPID())
end<-end["content.id"] #EKP ID of resistance to chemicals
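The name `content.id` comes from how `unlist()` flattens a nested list: the names of each nesting level are joined with a dot. The sketch below illustrates this with a hypothetical response; the real structure returned by the EKP helper functions may differ.

```r
# Hypothetical nested response (the structure is an assumption for illustration).
toy_response <- list(content = list(id = "5886311",
                                    name = "resistance to chemicals"))

# unlist() flattens the list into a named character vector.
flat <- unlist(toy_response)
print(names(flat))

# Select a single element by its flattened name.
toy_end <- flat["content.id"]
print(unname(toy_end))
```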




In [7]:
head(end)


content.id: '5886311'

Note: The concept representing "resistance to chemicals" within EKP is indicated by its content.id

Step 1c: Get the ending concept identifiers for "butanol tolerance"


In [8]:
end2<- unlist(getButanolID())
end2<-end2["content.id"] # EKP ID of butanol




In [9]:
head(end2)


content.id: '814946'

Note: The concept representing "butanol tolerance" within EKP is indicated by its content.id

Step 2a: Get indirect relationships between "yeast genes" (start) and "resistance to chemicals" (end)


In [ ]:
resistance2Chemicals<-getIndirectRelation(start,end)

These calculations take time, so it is wise to save the intermediate results.


In [ ]:
save(resistance2Chemicals, file = "resistance2Chemicals.rda")

In [10]:
load(file="resistance2Chemicals.rda")
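The save-then-load pattern can be wrapped in a small caching helper so that the slow query only runs when no stored result exists. This is a minimal sketch, not part of the original workflow: `cached()` and `slow_query()` are hypothetical names, and it uses `saveRDS()`/`readRDS()` instead of `save()`/`load()` so that the result can be assigned to any variable.

```r
# Hypothetical helper: run compute() only if no cached file exists yet.
cached <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))      # reuse the stored result
  }
  result <- compute()
  saveRDS(result, path)        # store it for the next run
  result
}

# Stand-in for a slow query; the counter shows how often it really runs.
calls <- 0
slow_query <- function() {
  calls <<- calls + 1
  data.frame(Subject = c("1", "2"), Score = c(8.8, 9.9))
}

cache_file <- tempfile(fileext = ".rds")
first  <- cached(cache_file, slow_query)   # computes and stores
second <- cached(cache_file, slow_query)   # reads from the cache
print(calls)
```

In the workflow, the second argument would be a function wrapping the indirect-relationship query.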

Step 2b: Get indirect relationships between "yeast genes" (start) and "resistance to butanol" (end2)


In [ ]:
resistance2Butanol<-getIndirectRelation(start,end2)

These calculations take time, so it is wise to save the intermediate results.


In [ ]:
save(resistance2Butanol, file = "resistance2Butanol.rda")

In [12]:
load(file="resistance2Butanol.rda")

Formatting and data cleaning for the "resistance to chemicals" results


In [13]:
dfs1<-as.matrix(getTableFromJson(resistance2Chemicals))
dfs1<- data.frame(dfs1, stringsAsFactors=FALSE)

Formatting and data cleaning for the "resistance to butanol" results


In [14]:
dfs2<-as.matrix(getTableFromJson(resistance2Butanol))
dfs2<- data.frame(dfs2, stringsAsFactors=FALSE)
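One side effect of passing a table through `as.matrix()` is worth keeping in mind: if any column is non-numeric, every column of the rebuilt data frame becomes character, including scores. The sketch below shows this on toy data (all values hypothetical) and converts the score back before ranking.

```r
# Toy table with a numeric Score column (values are hypothetical).
toy <- data.frame(Subject = c("a", "b", "c"),
                  Score   = c(9.5, 10.2, 8.1),
                  stringsAsFactors = FALSE)

# Round-tripping through a matrix turns every column into character.
toy_chr <- data.frame(as.matrix(toy), stringsAsFactors = FALSE)
classes_after <- sapply(toy_chr, class)
print(classes_after)

# The Score column is now text, so convert it before numeric ranking.
toy_chr$Score <- as.numeric(toy_chr$Score)
ranked <- toy_chr[order(toy_chr$Score, decreasing = TRUE), ]
print(ranked$Subject)
```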

Step 3: Intersect "resistance to chemicals" and "1-butanol" concepts


In [15]:
comparison <- compare(dfs1,dfs2,allowAll=TRUE)
dfs<-comparison$tM
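The `compare` package appears to document the `tM` element as the transformed "model" object, that is, the first argument after any allowed transformations. If that holds here, `dfs` contains the rows of `dfs1` rather than a set intersection. An explicit intersection on the shared subjects could be sketched as follows; the subject IDs are made up, and restricting the match to the `Subject` column is an assumption about what "intersect" should mean in this step.

```r
# Toy result tables (subject IDs are hypothetical).
toy1 <- data.frame(Subject = c("11", "12", "13"),
                   Object  = "5886311", stringsAsFactors = FALSE)
toy2 <- data.frame(Subject = c("12", "13", "14"),
                   Object  = "814946", stringsAsFactors = FALSE)

# Subjects that appear in both tables.
shared <- intersect(toy1$Subject, toy2$Subject)

# Keep only the rows of each table whose Subject is shared.
toy1_shared <- toy1[toy1$Subject %in% shared, ]
toy2_shared <- toy2[toy2$Subject %in% shared, ]
print(shared)
print(nrow(toy1_shared))
```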

In [16]:
head(dfs)


Subject  Predicate  Object   Publications  Score
1128245  10773577   5886311  231466695     8.8426
1128245  10773577   5886311  231489494     9.8632
1128245  10773616   5886311  231486111     8.8426
1128245  10773740   5886311  215447959     9.8632
1435241  10773577   5886311  212745694     10.1289
1435241  10773577   5886311  231372706     10.4035

In [17]:
dim(dfs)


  1. 1333
  2. 5

Step 4a: Map the triples to human-readable names using the reference database

The predicate reference list is collected from EKP.


In [18]:
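# Reference table of predicates: keep the ID and name columns
pred<-read.csv("Reference_Predicate_List.csv",header=TRUE)
pred<-pred[,c(2,3)]
colnames(pred)<-c("pred","names")

# Look up human-readable names for the subject and object concepts
subject_name<-getConceptName(dfs[,"Subject"])
dfs<-cbind(dfs,subject_name[,2])

object_name<-getConceptName(dfs[,"Object"])
dfs<-cbind(dfs,object_name[,2])

# Attach predicate names via a left join on the predicate ID
predicate_name<-sqldf('select * from dfs left join pred on pred.pred=dfs.Predicate')

# Retrieve provenance (e.g. PubMed IDs) for the publication identifiers
pbs<-getPubMedId(dfs$Publications)

# Assemble the named triples; keep subject, predicate, object, provenance and score
tripleName<-cbind(subject_name[,"name"],as.character(predicate_name[,"names"]),object_name[,"name"],pbs,dfs[,"Score"])
tripleName<-tripleName[,c(1,2,3,5,6)]
colnames(tripleName)<-c("Subject","Predicate","Object","Provenance","Score")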
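Because the name columns are later combined side by side with `cbind()`, the lookup has to return exactly one row per triple, in the original order. A lookup with base R `match()` guarantees this; the sketch below uses toy data (all IDs and names hypothetical) and is an alternative to the SQL join, not the method used above.

```r
# Toy triples and a toy predicate reference table (all values hypothetical).
toy_triples <- data.frame(Subject   = c("s1", "s2", "s3"),
                          Predicate = c("p2", "p1", "p9"),
                          stringsAsFactors = FALSE)
toy_pred <- data.frame(pred  = c("p1", "p2"),
                       names = c("binds with", "increases"),
                       stringsAsFactors = FALSE)

# match() returns, for each triple, the row of the reference table (or NA).
idx <- match(toy_triples$Predicate, toy_pred$pred)
toy_triples$PredicateName <- toy_pred$names[idx]
print(toy_triples)
```

Predicates missing from the reference table show up as `NA` instead of silently changing the number of rows.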

Loading required package: tcltk

In [20]:
head(tripleName)


Subject           Predicate                                         Object                   Provenance                             Score
tavaborole        binds with                                        resistance to chemicals  S000155945|YOL031C|toxaphene (640 uM)  8.8426
tavaborole        binds with                                        resistance to chemicals  STRING-db4932.YOL033W                  9.8632
tavaborole        gene product variant increases                    resistance to chemicals  STRING-db4932.YBR173C                  8.8426
tavaborole        gene product variant does not result in abnormal  resistance to chemicals  15883373                               9.8632
neothyonidioside  binds with                                        resistance to chemicals  27711131                               10.1289
neothyonidioside  binds with                                        resistance to chemicals  S000156575|YHR167W|radicicol           10.4035

In [21]:
dim(tripleName)


  1. 1333
  2. 5

Step 4b: Write the output to a file and visualize it in Triple Viewer/SPOT


In [22]:
write.table(tripleName,file="./triple.csv",sep=";",row.names=FALSE)
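Since provenance entries can contain characters such as `|`, spaces and parentheses, it may be worth reading the file back once to confirm that the semicolon-separated output parses into the same table. A minimal round-trip check on toy data (all values hypothetical, written to a temporary file):

```r
# Toy triple table (values are hypothetical).
toy_out <- data.frame(Subject    = c("geneA", "geneB"),
                      Predicate  = c("binds with", "increases"),
                      Object     = "resistance to chemicals",
                      Provenance = c("12345", "X|Y|z (1 uM)"),
                      Score      = c(8.8426, 10.1289),
                      stringsAsFactors = FALSE)

out_file <- tempfile(fileext = ".csv")
write.table(toy_out, file = out_file, sep = ";", row.names = FALSE)

# Read the file back with the same separator and compare with the original.
toy_back <- read.table(out_file, header = TRUE, sep = ";",
                       stringsAsFactors = FALSE)
round_trip_ok <- isTRUE(all.equal(toy_out, toy_back))
print(round_trip_ok)
```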

Step 5: Summarize the results (post-processing)


In [23]:
gr2c<-filter(dfs1,Subject %in% start)  ## genes involved in resistance to chemicals

gr2b<-filter(dfs2,Subject %in% start)  ## genes involved in resistance to butanol

interRC_RB<-intersect(gr2c[,1],gr2b[,1])  ## genes present in both result sets

### Genes pre9,nkp2,snt307,rtg1

## Build a summary table for the butanol results: gene name, score, relationship, publications
DSM_Genes <- getConceptName(gr2b[,"Subject"])
relationship <- sqldf('select * from gr2b left join pred on pred.pred=gr2b.Predicate')
relationship<- relationship$names
pubmedID<-getPubMedId(gr2b$Publications)
represent<-cbind(DSM_Genes,gr2b$Score,relationship,pubmedID)
represent<-represent[,c("name","gr2b$Score","relationship","V2")]
names(represent)<-c("DSMGenes","AssociationScoreButanol","RelationshipBtwGenesButanol","Publications")
write.csv(represent,file="RepresentationSummary.csv")
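The summary step filters rows whose `Subject` is one of the starting concept IDs. A small sketch with toy data (all values hypothetical) shows why such membership tests are usually written with `%in%`: `==` compares position by position and recycles the shorter vector, so it can miss IDs that are present.

```r
# Toy data: Subject IDs in a result table and a vector of starting IDs (hypothetical).
toy_subject <- c("a", "b", "c", "d")
toy_start   <- c("b", "a")

# == compares position by position, recycling toy_start as b, a, b, a.
by_position <- toy_subject == toy_start

# %in% asks, for each Subject, whether it occurs anywhere in toy_start.
by_membership <- toy_subject %in% toy_start

print(by_position)
print(by_membership)
```

Here the positional comparison finds no matches at all, while the membership test correctly flags the first two subjects.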
