Objective: Identify genotype-phenotype trait associations in yeast

Develop a workflow that uses EKP to identify genes indirectly associated with a given yeast phenotype (butanol tolerance), and visualize them in an interactive knowledge graph using SPOT.



In [1]:

    
library(dplyr)
library(tidyr)
library(sqldf)
library(splitstackshape)
library(stringr)
library(compare)
setwd("../src")

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Loading required package: gsubfn
Loading required package: proto
Loading required package: RSQLite
Loading required package: DBI
Loading required package: data.table
------------------------------------------------------------------------------
data.table + dplyr code now lives in dtplyr.
Please library(dtplyr)!
------------------------------------------------------------------------------

Attaching package: ‘data.table’

The following objects are masked from ‘package:dplyr’:

    between, first, last


Attaching package: ‘compare’

The following object is masked from ‘package:base’:

    isTRUE



Load the API scripts with login credentials



In [2]:

    
source("EuretosInfrastructure.R")
options(warn=-1)

Attaching package: ‘magrittr’

The following object is masked from ‘package:tidyr’:

    extract

Loading required package: qdapDictionaries
Loading required package: qdapRegex

Attaching package: ‘qdapRegex’

The following object is masked from ‘package:jsonlite’:

    validate

The following objects are masked from ‘package:dplyr’:

    escape, explain

Loading required package: qdapTools

Attaching package: ‘qdapTools’

The following object is masked from ‘package:data.table’:

    shift

The following object is masked from ‘package:dplyr’:

    id

Loading required package: RColorBrewer

Attaching package: ‘qdap’

The following object is masked from ‘package:magrittr’:

    %>%

The following object is masked from ‘package:stringr’:

    %>%

The following object is masked from ‘package:tidyr’:

    %>%

The following object is masked from ‘package:dplyr’:

    %>%

The following object is masked from ‘package:base’:

    Filter

Retrieving page 0
Retrieving page 1
Retrieving page 2
Retrieving page 3
Retrieving page 4
Retrieving page 5
Retrieving page 6
Retrieving page 7
Retrieving page 8
Retrieving page 9
Retrieving page 10
Retrieving page 11

DSM workflow starts here:

Load the input data provided by DSM. This data consists of a list of yeast genes and a list of terms that represent butanol tolerance.



In [3]:

    
yeast_genes<-read.csv("20170119_GeneList_DSM.txt",header=TRUE,sep="\t")
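Before querying EKP it can help to check that the gene list loaded as expected. The sketch below is only an illustration: the inline rows are made-up placeholders rather than real SGD identifiers, the `Gene` column is hypothetical, and dropping duplicates is an added assumption that the workflow itself does not make.

```r
# Placeholder rows: the identifiers below are made up for illustration.
input_text <- "SGD_ID\tGene\nS000000001\tgeneA\nS000000002\tgeneB\nS000000001\tgeneA\n"

# read.delim() reads tab-separated text with a header row by default.
genes <- read.delim(text = input_text, stringsAsFactors = FALSE)

# Fail early if the identifier column is missing.
stopifnot("SGD_ID" %in% names(genes))

# Lower-case the identifiers (as the workflow does) and drop duplicates.
query_ids <- unique(tolower(genes$SGD_ID))
```

With `stringsAsFactors = FALSE` the identifiers are already character strings, so no `as.character()` conversion is needed before `tolower()`.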

Step 1a: Get the starting concept identifiers



In [4]:

    
start<-getConceptID(tolower(as.character(yeast_genes[,"SGD_ID"])))

start<-start[,"EKP_Concept_Id"]



In [5]:

    
head(start)

Step 1b: Get the ending concept identifiers for "resistance to chemicals"



In [6]:

    
end <- unlist(getResistanceEKPID())
end<-end["content.id"] #EKP ID of resistance to chemicals



In [7]:

    
head(end)

content.id: '5886311'

Note: The concept representing "resistance to chemicals" within EKP is indicated by its content.id
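The name `content.id` comes from how `unlist()` flattens a nested list: it joins the nested element names with a dot. A minimal sketch, assuming a response shaped like the hypothetical list below (the real structure returned by the API scripts may differ):

```r
# Hypothetical response shape; only the id and concept name come from this notebook.
response <- list(content = list(id = "5886311", name = "resistance to chemicals"))

# unlist() flattens the nested list into a named character vector.
flat <- unlist(response)
names(flat)   # "content.id" "content.name"

# [[ ]] returns the bare string without the name attached.
end_id <- flat[["content.id"]]
```

Using `[[ ]]` rather than `[ ]` gives a plain string, which can be convenient when the identifier is passed on to other functions.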

Step 1c: Get the ending concept identifiers for "butanol tolerance"



In [8]:

    
end2<- unlist(getButanolID())
end2<-end2["content.id"] # EKP ID of butanol



In [9]:

    
head(end2)

content.id: '814946'

Note: The concept representing "butanol tolerance" within EKP is indicated by its content.id

Step 2a: Get indirect relationships between "yeast genes" (start) and "resistance to chemicals" (end)



In [ ]:

    
resistance2Chemicals<-getIndirectRelation(start,end)

These calculations take time, so it is wise to save intermediate results.



In [ ]:

    
save(resistance2Chemicals, file = "resistance2Chemicals.rda")



In [10]:

    
load(file="resistance2Chemicals.rda")
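The save-then-load pattern above can be wrapped in a small helper that runs the slow step only when no stored result exists. This is a sketch under stated assumptions: `cached()` and `slow_query()` are hypothetical names, the toy query stands in for a call such as `getIndirectRelation(start, end)`, and it uses `saveRDS()`/`readRDS()` instead of the `save()`/`load()` calls used in this notebook.

```r
# Hypothetical helper: compute a result once and reuse it from disk afterwards.
cached <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))   # reuse the stored result
  }
  result <- compute()       # run the slow step only when needed
  saveRDS(result, path)
  result
}

# Toy stand-in for a slow query.
slow_query <- function() {
  Sys.sleep(0.1)
  data.frame(Subject = c("1128245", "1435241"), Score = c(8.8426, 10.1289))
}

cache_file <- file.path(tempdir(), "toy_result.rds")
first  <- cached(cache_file, slow_query)   # computes and writes the file
second <- cached(cache_file, slow_query)   # reads the file, no recomputation
```

Unlike `load()`, `readRDS()` returns the object instead of restoring it under its original name, so the caller chooses the variable name.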

Step 2b: Get indirect relationships between "yeast genes" (start) and "resistance to butanol" (end)



In [ ]:

    
resistance2Butanol<-getIndirectRelation(start,end2)

These calculations take time, so it is wise to save intermediate results.



In [ ]:

    
save(resistance2Butanol, file = "resistance2Butanol.rda")



In [12]:

    
load(file="resistance2Butanol.rda")

Formatting and data cleaning ("resistance to chemicals" results)



In [13]:

    
dfs1<-as.matrix(getTableFromJson(resistance2Chemicals))
dfs1<- data.frame(dfs1, stringsAsFactors=FALSE)

Formatting and data cleaning (butanol results)



In [14]:

    
dfs2<-as.matrix(getTableFromJson(resistance2Butanol))
dfs2<- data.frame(dfs2, stringsAsFactors=FALSE)
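One side effect of going through `as.matrix()` is that a table with mixed column types ends up with every column stored as character, including the scores. A minimal sketch on a toy table (the values are copied from the output shown in this notebook; the variable names are illustrative) of restoring numeric scores before ranking:

```r
# Toy table with one character and one numeric column.
raw <- data.frame(Subject = c("1128245", "1435241", "1435241"),
                  Score = c(8.8426, 10.1289, 10.4035),
                  stringsAsFactors = FALSE)

# Same conversion as in the cell above: matrix first, then data frame.
tab <- data.frame(as.matrix(raw), stringsAsFactors = FALSE)
class(tab$Score)   # "character"

# Convert the scores back to numbers, then rank from highest to lowest.
tab$Score <- as.numeric(tab$Score)
ranked <- tab[order(tab$Score, decreasing = TRUE), ]
```

Whether the real tables need this step depends on what `getTableFromJson()` returns, which is not shown here.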

Step 3: Intersect "resistance to chemicals" and "1-butanol" concepts



In [15]:

    
comparison <- compare(dfs1,dfs2,allowAll=TRUE)
dfs<-comparison$tM
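For comparison, an intersection can also be written out explicitly in base R. The sketch below assumes the goal is to keep the rows whose `Subject` occurs in both result tables; that reading of "intersect" is an assumption, and the toy tables only mimic the column layout of `dfs1` and `dfs2` (some identifiers are made up).

```r
# Toy tables with the same leading columns as dfs1 and dfs2.
chem <- data.frame(Subject = c("1128245", "1128245", "1435241", "2000001"),
                   Predicate = c("10773577", "10773616", "10773577", "10773740"),
                   Object = "5886311",
                   stringsAsFactors = FALSE)
but <- data.frame(Subject = c("1128245", "1435241", "3000001"),
                  Predicate = c("10773577", "10773577", "10773616"),
                  Object = "814946",
                  stringsAsFactors = FALSE)

# Subjects that are linked to both end concepts.
shared_subjects <- intersect(chem$Subject, but$Subject)

# Keep only the rows of each table whose Subject is shared, then stack them.
shared_rows <- rbind(chem[chem$Subject %in% shared_subjects, ],
                     but[but$Subject %in% shared_subjects, ])
```

Here `shared_rows` keeps the triples for both end concepts, so the `Object` column still shows which end concept each row belongs to.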



In [16]:

    
head(dfs)

Subject  Predicate  Object   Publications  Score
1128245  10773577   5886311  231466695     8.8426
1128245  10773577   5886311  231489494     9.8632
1128245  10773616   5886311  231486111     8.8426
1128245  10773740   5886311  215447959     9.8632
1435241  10773577   5886311  212745694     10.1289
1435241  10773577   5886311  231372706     10.4035



In [17]:

    
dim(dfs)

Step 4a: Map the identifiers to human-readable triples using the reference database

The reference predicate list is collected from EKP.



In [18]:

    
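# Reference predicate list: columns 2 and 3 are used as identifier and name
pred<-read.csv("Reference_Predicate_List.csv",header=TRUE)
pred<-pred[,c(2,3)]
colnames(pred)<-c("pred","names")

# Look up concept names for the subjects and objects
subject_name<-getConceptName(dfs[,"Subject"])
dfs<-cbind(dfs,subject_name[,2])

object_name<-getConceptName(dfs[,"Object"])
dfs<-cbind(dfs,object_name[,2])

# Attach predicate names with a left join on the predicate identifier
predicate_name<-sqldf('select * from dfs left join pred on pred.pred=dfs.Predicate')

# Resolve the publication identifiers to their provenance
pbs<-getPubMedId(dfs$Publications)

# Assemble the human-readable triples and keep the relevant columns
tripleName<-cbind(subject_name[,"name"],as.character(predicate_name[,"names"]),object_name[,"name"],pbs,dfs[,"Score"])
tripleName<-tripleName[,c(1,2,3,5,6)]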
colnames(tripleName)<-c("Subject","Predicate","Object","Provenance","Score")

Loading required package: tcltk
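The mapping step combines separately built columns with `cbind()`, which only works if every lookup returns its rows in the same order as the triples; SQL does not guarantee result order without an explicit ORDER BY. A minimal sketch of an order-preserving lookup with base R `match()`, using toy tables whose identifier-to-name pairs are read off the outputs in this notebook (the variable names are illustrative):

```r
# Toy reference table of predicate identifiers and names.
pred <- data.frame(pred = c("10773577", "10773616", "10773740"),
                   names = c("binds with",
                             "gene product variant increases",
                             "gene product variant does not result in abnormal"),
                   stringsAsFactors = FALSE)

# Toy triples, deliberately not sorted by predicate.
triples <- data.frame(Subject = c("1128245", "1128245", "1435241"),
                      Predicate = c("10773616", "10773577", "10773577"),
                      stringsAsFactors = FALSE)

# match() gives, for each triple, the row of pred that holds its predicate,
# so the looked-up names stay aligned with the rows of triples.
idx <- match(triples$Predicate, pred$pred)
triples$PredicateName <- pred$names[idx]   # NA where no reference entry exists
```

Predicates that are missing from the reference list show up as `NA`, which makes gaps in the reference data easy to spot.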



In [20]:

    
head(tripleName)

Subject           Predicate                                         Object                   Provenance                             Score
tavaborole        binds with                                        resistance to chemicals  S000155945|YOL031C|toxaphene (640 uM)  8.8426
tavaborole        binds with                                        resistance to chemicals  STRING-db4932.YOL033W                  9.8632
tavaborole        gene product variant increases                    resistance to chemicals  STRING-db4932.YBR173C                  8.8426
tavaborole        gene product variant does not result in abnormal  resistance to chemicals  15883373                               9.8632
neothyonidioside  binds with                                        resistance to chemicals  27711131                               10.1289
neothyonidioside  binds with                                        resistance to chemicals  S000156575|YHR167W|radicicol           10.4035



In [21]:

    
dim(tripleName)

Step 4b: Write the output to a file and visualize the triples in Triple Viewer/SPOT



In [22]:

    
write.table(tripleName,file="./triple.csv",sep=";",row.names=FALSE)
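Because the provenance field can contain characters such as `|`, spaces, and parentheses, it is worth confirming that a file written with `sep=";"` reads back with the same columns. A minimal round-trip sketch on a one-row toy table (the row is copied from the output above; the file name and location are illustrative):

```r
# One-row toy table with a provenance value that contains "|" and spaces.
triples <- data.frame(Subject = "tavaborole",
                      Predicate = "binds with",
                      Object = "resistance to chemicals",
                      Provenance = "S000155945|YOL031C|toxaphene (640 uM)",
                      Score = 8.8426,
                      stringsAsFactors = FALSE)

out_file <- file.path(tempdir(), "triple_demo.csv")
write.table(triples, file = out_file, sep = ";", row.names = FALSE)

# write.table() quotes character fields by default, so reading with the
# same separator recovers the original columns.
back <- read.table(out_file, sep = ";", header = TRUE, stringsAsFactors = FALSE)
```

Any tool that imports the file needs to be told that the separator is a semicolon rather than a comma.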

Step 5: Summarize the results (post-processing)



In [23]:

    
# Keep rows whose Subject is one of the input genes (%in% tests set membership)
gr2c<-filter(dfs1,Subject %in% start)  ## genes involving resistance to chemicals

gr2b<-filter(dfs2,Subject %in% start)  ## genes involving resistance to butanol

# Genes linked to both end concepts
interRC_RB<-intersect(gr2c[,1],gr2b[,1])

### Genes pre9,nkp2,snt307,rtg1

# Summary of the butanol results: gene name, score, relationship, provenance
DSM_Genes <- getConceptName(gr2b[,"Subject"])
relationship <- sqldf('select * from gr2b left join pred on pred.pred=gr2b.Predicate')
relationship<- relationship$names
represent<-cbind(DSM_Genes,gr2b$Score,relationship)
pubmedID<-getPubMedId(gr2b$Publications)
represent<-cbind(represent,pubmedID)
represent<-represent[,c("name","gr2b$Score","relationship","V2")]
names(represent)<-c("DSMGenes","AssociationScoreButanol","RelationshipBtwGenesButanol","Publications")
write.csv(represent,file="RepresentationSummary.csv")
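When rows are selected by a set of identifiers, the choice of operator matters: `==` compares two vectors position by position and recycles the shorter one, while `%in%` tests whether each value is a member of the set. A small sketch with toy identifiers (two taken from the tables above, two made up):

```r
ids    <- c("1128245", "1435241", "2000001", "3000001")   # Subject column of a toy table
wanted <- c("1435241", "1128245")                         # identifiers to keep

# Position-wise comparison: wanted is recycled to the length of ids.
pairwise <- ids == wanted      # FALSE FALSE FALSE FALSE

# Set membership: TRUE wherever the id occurs anywhere in wanted.
member <- ids %in% wanted      # TRUE TRUE FALSE FALSE

kept <- ids[member]
```

With `==` a row is kept only if its identifier happens to sit at a matching position in the recycled vector, so matches can be silently missed.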


