Presenters: Aditya Bharadwaj, Jeff Law and T. M. Murali
Helpful links:
About Me:
In this session we will learn how to post more complex graphs to GraphSpace programmatically.
One of the main goals of this tutorial is to help you share and interact with your own networks online at GraphSpace (www.graphspace.org). We suggest you bring your network(s) as a tab-delimited file of edges, along with an idea of the visual properties (colors, edge styles, node shapes) you want to use, and in this last session we'll help you make them interactive and shareable online.
I have an example workflow in which we will repeat the analysis done by Ritz et al. [1] to reconstruct the WNT signaling pathway using PathLinker. In their analysis, they reconstructed signaling pathways using the set of receptors and transcription factors (TFs) of a pathway as the sources and targets for PathLinker. I have provided the receptors and TFs of the WNT pathway as well as an updated version of the human protein-protein interaction network (background interactome) so that we can repeat their reconstruction of the WNT pathway. The figure below shows an overview of their analysis.
In the example workflow, you will learn how to take a list of edges and create a high-quality graph that is shareable online. More specifically, you'll learn to programmatically:
After posting those results to GraphSpace you should have a pretty good idea how to get your own networks onto GraphSpace.
Below I have a little showcase of public graphs I and other users have created.
We'll be using the following packages in this tutorial.
In [ ]:
# If you're using anaconda, you can install pip using the following command
!conda install -y pip
# Install these packages using pip.
!pip install graphspace_python==0.8.2 pandas
# If you're not using anaconda and you're running python3, then use pip3 instead.
# --user tells pip to install locally without needing root privileges
#!pip3 install --user graphspace_python==0.8.2 pandas
# Now download PathLinker
!conda install -y git
!git clone "https://github.com/Murali-group/PathLinker"
PathLinker computes the k shortest paths from a set of sources to a set of targets in a background graph/network. PathLinker uses Yen's Algorithm to find the k shortest paths.
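To make the idea of "k shortest paths from a set of sources to a set of targets" concrete, here is a minimal sketch on a toy graph. It is not PathLinker's code and it is not Yen's algorithm: it is a simple best-first enumeration of loopless paths, which returns paths in order of increasing cost when all costs are non-negative but can be much slower on large networks. The function name, the toy graph and its costs are all made up for illustration.

```python
import heapq

def k_shortest_simple_paths(graph, sources, targets, k):
    """Return up to k loopless paths from any source to any target, cheapest first.

    graph: dict mapping node -> {neighbor: non-negative cost}
    """
    # Each heap entry is (cost so far, path so far); start one partial path per source.
    heap = [(0.0, [s]) for s in sorted(sources)]
    heapq.heapify(heap)
    found = []
    while heap and len(found) < k:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node in targets:
            found.append((cost, path))
            continue  # do not extend a path beyond a target
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in path:  # keep paths loopless
                heapq.heappush(heap, (cost + weight, path + [neighbor]))
    return found

# Toy network: one receptor (R1), two intermediate proteins, one TF (TF1).
toy_graph = {
    "R1": {"A": 1.0, "B": 2.0},
    "A": {"TF1": 1.0, "B": 0.5},
    "B": {"TF1": 1.0},
}
paths = k_shortest_simple_paths(toy_graph, {"R1"}, {"TF1"}, k=3)
for rank, (cost, path) in enumerate(paths, start=1):
    print(rank, cost, "->".join(path))
```

The rank printed for each path plays the same role as the path index k that we attach to nodes and edges later in this tutorial.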
PathLinker inputs:
PathLinker output:
Protein-protein interactions (PPIs) are available from multiple databases for many species. We work with human and yeast.
Sources: BioGrid, DIP, InnateDB, IntAct, KEGG, MatrixDB, MINT, NetPath, PhosphositePlus, Reactome, SPIKE
Edges are weighted using an evidence-based probabilistic approach. I also provided a file containing the evidence for each interaction, including the interaction type, experimental detection method, PubMed IDs and source database, as well as a file mapping UniProt IDs to gene names.
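Probability-like weights combine along a path by multiplication, whereas shortest-path algorithms add costs. A common way to bridge the two, shown here as a general technique and not as a description of what PathLinker does internally, is to use the negative logarithm of each weight as its cost. The function name and the example weights below are made up.

```python
import math

def probability_to_cost(weight):
    """Map a weight in (0, 1] to a non-negative additive cost."""
    return -math.log10(weight)

# Made-up weights for the three edges of a single path.
path_weights = [0.75, 0.5, 0.6]

# Summing the costs is equivalent to multiplying the weights.
path_cost = sum(probability_to_cost(w) for w in path_weights)
path_probability = math.prod(path_weights)

print("cost of path:", path_cost)
print("product of weights:", path_probability)
print("recovered from cost:", 10 ** (-path_cost))
```

Under this transformation, the path with the lowest total cost is the path with the highest product of edge weights.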
In [ ]:
# we can use pandas to look at the files in a nice table form
import pandas as pd
df = pd.read_csv("data/2017-06-16-human-ppi-weighted-cap0_75.tsv", sep='\t')
print("\n\tdata/2017-06-16-human-ppi-weighted-cap0_75.tsv")
print(df.head(3))
df = pd.read_csv("data/2017_06-human-interactome-evidence.tsv", sep='\t')
print("\n\tdata/2017_06-human-interactome-evidence.tsv")
print(df.head(3))
df = pd.read_csv('data/2017_06-human-interactome-mapping.tsv', sep='\t', header=None, names=["uniprotID", "gene_name"])
print("\n\tdata/2017_06-human-interactome-mapping.tsv")
print(df.head(3))
In [ ]:
df = pd.read_csv("data/Wnt-sources-targets.tsv", sep='\t')
print("\n\tdata/Wnt-sources-targets.tsv")
print(df)
In [ ]:
!python PathLinker/PathLinker.py --help
In [ ]:
!python PathLinker/PathLinker.py \
data/2017-06-16-human-ppi-weighted-cap0_75.tsv \
data/Wnt-sources-targets.tsv \
-k 100 \
-o output/Wnt- \
--write-paths
In [ ]:
!head output/Wnt-k_100-paths.txt output/Wnt-k_100-ranked-edges.txt
This step is not too difficult if we just want to view the nodes, edges and paths on GraphSpace. However, we have much more information that we would like to be able to access quickly on GraphSpace. For example:
If you want to see the main code where we build the GraphSpace graph and post the graph, skip to the bottom.
In [ ]:
# now import the packages we're going to use below
from graphspace_python.api.client import GraphSpace
from graphspace_python.graphs.classes.gsgraph import GSGraph
import networkx as nx
In [ ]:
# Read in the PathLinker results and keep track of the k for each of the edges
ranked_edges = {}
with open('output/Wnt-k_100-ranked-edges.txt', 'r') as file_handle:
    for line in file_handle:
        if line[0] == '#':
            continue
        line = line.rstrip().split('\t')
        edge = (line[0], line[1])
        k = line[2]
        ranked_edges[edge] = k
# Read in the sources and targets file so we can change the shape and color of the sources and targets
sources = set()
targets = set()
with open('data/Wnt-sources-targets.tsv', 'r') as file_handle:
    for line in file_handle:
        if line[0] == '#':
            continue
        prot, prot_type = line.rstrip().split('\t')[0:2]
        if prot_type.lower() in ["receptor", "source"]:
            sources.add(prot)
        elif prot_type.lower() in ["tf", "target"]:
            targets.add(prot)
print("%d edges in PathLinker output, %d sources, %d targets" % (len(ranked_edges), len(sources), len(targets)))
In [ ]:
# before adding these edges to a GraphSpace object,
# we need to get the node, edge, and pathway information we want to post
# we also need to be able to map the uniprot IDs to their more friendly gene names
uniprot_to_gene = {}
with open('data/2017_06-human-interactome-mapping.tsv', 'r') as file_handle:
    for line in file_handle:
        if line[0] == '#':
            continue
        uniprotID, gene_name = line.rstrip().split('\t')
        uniprot_to_gene[uniprotID] = gene_name
# get the edge weights from the interactome. We will add these to the popups later
edge_weights = {}
with open('data/2017-06-16-human-ppi-weighted-cap0_75.tsv', 'r') as file_handle:
    for line in file_handle:
        if line[0] == '#':
            continue
        u, v, w, evidence = line.rstrip().split('\t')
        edge_weights[(u,v)] = float(w)
print("%s interactome edges" % (len(edge_weights)))
# get the netpath and kegg edges and nodes from the evidence file
# KEGG WNT pathway: http://www.genome.jp/kegg-bin/show_pathway?hsa04310
# NetPath WNT pathway: http://www.netpath.org/pathways?path_id=NetPath_8
kegg_wnt_edges = set()
netpath_wnt_edges = set()
evidence_file = 'data/2017_06-human-interactome-evidence.tsv'
with open(evidence_file, 'r') as file_handle:
    for line in file_handle:
        if line[0] == "#":
            continue
        u,v, directed, interactiontype, detectionmethod, pubid, source = line.rstrip().split('\t')
        if source.lower() == 'kegg' and pubid.lower() == 'kegg:hsa04310':
            kegg_wnt_edges.add((u,v))
            if directed == "False":
                kegg_wnt_edges.add((v,u))
        elif source.lower() == 'netpath' and pubid.lower() == "netpath:netpath_8":
            netpath_wnt_edges.add((u,v))
            if directed == "False":
                netpath_wnt_edges.add((v,u))
kegg_wnt_nodes = set([n for u,v in kegg_wnt_edges for n in (u,v)])
netpath_wnt_nodes = set([n for u,v in netpath_wnt_edges for n in (u,v)])
print("%s kegg wnt nodes, %s kegg wnt edges" % (len(kegg_wnt_nodes), len(kegg_wnt_edges)))
print("%s netpath wnt nodes, %s netpath wnt edges" % (len(netpath_wnt_nodes), len(netpath_wnt_edges)))
In [ ]:
# this next section is to get the sources of evidence, interaction type, and pubmed IDs for each interaction
# I chose to load it into a multi-level dictionary organized by source database
def getEvidence(edges, evidence_file):
    """
    *edges*: a set of edges for which to get the evidence
    *evidence_file*: tab-delimited file containing the evidence for each interaction
    returns the tuple (evidence, edge_types, edge_dir).
    edge_types maps each edge to its set of edge types and
    edge_dir maps each edge to True if it is directed.
    evidence is a multi-level dictionary with the following structure
    edge:
        db/source:
            interaction_type:
                detection_method:
                    publication / database ids
    for NetPath, KEGG and SPIKE, the detection method is the pathway name
    NetPath, KEGG, and Phosphosite also have a pathway ID in the database/id set
    which follows convention of id_type:id (for example: kegg:hsa04611)
    """
    # initialize the evidence, edge_types, and edge dir/undir dictionaries
    evidence = {}
    edge_types = {}
    edge_dir = {}
    print("Reading evidence file %s" % (evidence_file))
    # create a graph of the passed in edges
    G = nx.Graph()
    G.add_edges_from(edges)
    # initialize the dictionaries
    for t,h in G.edges():
        evidence[(t,h)] = {}
        evidence[(h,t)] = {}
        edge_types[(t,h)] = set()
        edge_types[(h,t)] = set()
        edge_dir[(t,h)] = False
        edge_dir[(h,t)] = False
    # use a with block so the file is closed when we are done reading it
    with open(evidence_file, 'r') as file_handle:
        for line in file_handle:
            if line[0] == "#":
                continue
            u,v, directed, interactiontype, detectionmethod, pubid, source = line.rstrip().split('\t')
            # header line of file
            #uniprot_a uniprot_b directed interaction_type detection_method publication_id source
            directed = (directed == "True")
            # We only need to get the evidence for the edges passed in, so if this edge is not in the list of edges, skip it
            # G is a Graph so it handles undirected edges correctly
            if not G.has_edge(u,v):
                continue
            evidence = addToEvidenceDict(evidence, (u,v), directed, source, interactiontype, detectionmethod, pubid)
            edge_types = addEdgeType(edge_types, (u,v), directed, source, interactiontype)
            if directed is True:
                edge_dir[(u,v)] = True
            else:
                # if they are not already directed, then set them as undirected
                if (u,v) not in edge_dir:
                    edge_dir[(u,v)] = False
                    edge_dir[(v,u)] = False
    return evidence, edge_types, edge_dir
def addToEvidenceDict(evidence, e, directed, source, interactiontype, detectionmethod, pubid):
    """ add the evidence of the edge to the evidence dictionary
    *pubid*: publication id to add to this edge.
    """
    if source not in evidence[e]:
        evidence[e][source] = {}
    if interactiontype not in evidence[e][source]:
        evidence[e][source][interactiontype] = {}
    if detectionmethod not in evidence[e][source][interactiontype]:
        evidence[e][source][interactiontype][detectionmethod] = set()
    evidence[e][source][interactiontype][detectionmethod].add(pubid)
    if not directed:
        # add the evidence for both directions
        evidence = addToEvidenceDict(evidence, (e[1],e[0]), True, source, interactiontype, detectionmethod, pubid)
    return evidence
def addEdgeType(edge_types, e, directed, source, interactiontype):
    if e not in edge_types:
        edge_types[e] = set()
    # add the edge type as well
    # direction was determined using the csbdb_interface.psimi_interaction_direction dictionary in CSBDB.
    if not directed:
        # it would be awesome if we knew which edges are part of complex formation vs other physical interactions
        # is there an mi term for that?
        t,h = e
        edge_types[(t,h)].add('physical')
        edge_types[(h,t)].add('physical')
    elif source == "SPIKE":
        edge_types[e].add('spike_regulation')
    elif source == "KEGG":
        if "phosphorylation" in interactiontype and "dephosphorylation" not in interactiontype:
            # Most of them are phosphorylation
            edge_types[e].add('phosphorylation')
        if "activation" in interactiontype:
            edge_types[e].add('activation')
        if "inhibition" in interactiontype:
            edge_types[e].add('inhibition')
        else:
            edge_types[e].add('enzymatic')
    else:
        # the rest is psi mi tags
        # MI:0217 is for phosphorylation
        if "MI:0217" in interactiontype:
            # Most of the directed edges are phosphorylation
            edge_types[e].add('phosphorylation')
        else:
            # TODO for now, just call the rest of the directed edges enzymatic.
            edge_types[e].add('enzymatic')
    return edge_types
Here is a small example of what the evidence dictionary looks like. I chose this edge because it is part of the NetPath WNT pathway and has a lot of support from multiple databases.
In [ ]:
# example edge: GSK3B-APC
edges = [("P49841", "P25054")]
evidence_file = 'data/2017_06-human-interactome-evidence.tsv'
evidence, edge_types, edge_dir = getEvidence(edges, evidence_file)
e = edges[0]
print("Evidence for edge %s-%s:" % (e[0], e[1]))
for source in evidence[e]:
    print(source)
    # for databases that don't provide an interaction type, I left it empty
    for interactiontype in evidence[e][source]:
        print("\t%s" % interactiontype)
        # for curated databases or databases that don't provide a detection method, I put the name of the pathway or DB for the detection method
        for detectionmethod in evidence[e][source][interactiontype]:
            print("\t\t%s" % detectionmethod)
            for pubid in evidence[e][source][interactiontype][detectionmethod]:
                print("\t\t\t%s" % pubid)
print("Edge types:", edge_types[e])
print("Edge is directed:", edge_dir[e])
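The four-level nesting (source, interaction type, detection method, set of IDs) can also be built without the explicit "if key not in dict" checks. The sketch below uses chained dict.setdefault calls as one possible alternative; it is not the code used in this tutorial. The function name is hypothetical and the rows are made-up placeholders, not entries from the evidence file.

```python
def add_evidence(evidence, edge, source, interaction_type, detection_method, pub_id):
    """Insert one evidence row into the nested dictionary, creating levels as needed."""
    (evidence.setdefault(edge, {})
             .setdefault(source, {})
             .setdefault(interaction_type, {})
             .setdefault(detection_method, set())
             .add(pub_id))
    return evidence

# Placeholder rows: (edge, source, interaction type, detection method, id)
example_edge = ("P49841", "P25054")
rows = [
    (example_edge, "NetPath", "", "Wnt", "netpath:placeholder_id"),
    (example_edge, "ExampleDB", "physical association", "example method", "pubmed:0000001"),
    (example_edge, "ExampleDB", "physical association", "example method", "pubmed:0000002"),
]

toy_evidence = {}
for row in rows:
    add_evidence(toy_evidence, *row)

for source, by_type in toy_evidence[example_edge].items():
    print(source, by_type)
```

collections.defaultdict would work as well, but plain dictionaries keep the "key in dict" checks elsewhere in the notebook behaving as expected.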
In [ ]:
# converts the evidence dictionary to html for the popup
def evidenceToHTML(u,v,evidence):
    annotation = '<dl>'
    sources = sorted(evidence.keys())
    for source in sources:
        annotation += '<dt>%s</dt>' % (source)
        # TODO add interaction type color
        for interactiontype in evidence[source]:
            if interactiontype != '' and interactiontype != "None":
                # use bull instead of li to save on white space
                # nbsp stands for non-breaking space
                annotation += '&bull;&nbsp;&nbsp;%s <br>' % interactiontype
            for detectionmethod in evidence[source][interactiontype]:
                # add a bullet point here after 4 spaces
                annotation += '&nbsp;&nbsp;&nbsp;&nbsp;&bull;&nbsp;&nbsp;'
                annotation += '%s&nbsp;&nbsp;' % detectionmethod
                # now add the pubmed IDs. &nbsp; is the html for a non-breaking space
                pub_ids = evidence[source][interactiontype][detectionmethod]
                #KEGG doesn't have pub ids. It has a pathway map and entry (evidence)
                # now get the html for each of the links
                pub_ids = [parsePubID(pub_id) for pub_id in pub_ids if parsePubID(pub_id) != '']
                # use a non-breaking space with a comma so they all stay on the same line
                annotation += ',&nbsp;'.join(pub_ids)
                annotation += "<br>"
        annotation += '<br>'
    return annotation
# adds the links to the pubmed and other IDs
def parsePubID(publication_id):
    id_type, pubid = publication_id.split(':')
    if id_type == 'pubmed':
        pubmedurl = 'http://www.ncbi.nlm.nih.gov/pubmed/%s' % (pubid)
        desc = '<a style="color:blue" href="%s" target="PubMed">pmid:%s</a>' % (pubmedurl,pubid)
    elif id_type == 'phosphosite':
        phosphourl = 'http://www.phosphosite.org/siteAction.action?id=%s' % (pubid)
        desc = '<a style="color:blue" href="%s" target="PSP">phosphosite:%s</a>' % (phosphourl,pubid)
    elif id_type == 'kegg':
        # links to KEGG pathway map
        kegg_map_link = 'http://www.kegg.jp/kegg-bin/show_pathway?'
        # links to KEGG pathway entry (evidence)
        kegg_entry_link = 'http://www.kegg.jp/dbget-bin/www_bget?pathway+'
        pathway_map_link = '<a style="color:blue" href="%s%s" target="KEGG">map</a>' % (kegg_map_link,pubid)
        pathway_entry_link = '<a style="color:blue" href="%s%s" target="KEGG">evidence</a>' % (kegg_entry_link,pubid)
        desc = "%s, %s"%(pathway_map_link,pathway_entry_link)
    elif id_type == 'netpath':
        # links to the NetPath reactions page for the pathway
        netpath_url = "http://www.netpath.org/reactions?path_id=%s" % (pubid)
        desc = '<a style="color:blue" href="%s" target="NetPath">netpath:%s</a>' % (netpath_url, pubid)
    else:
        # skip the rest for now
        desc = ''
    return desc
def getMainEdgeType(u,v, edge_types):
    """ a single edge can have multiple edge types according to the different sources or databases
    Choose a main edge type here
    *edge_types* the set of edge types for a given edge
    """
    main_edge_type = None
    edge_type_order = ['phosphorylation', 'enzymatic', 'spike_regulation', 'activation', 'inhibition', 'physical']
    for edge_type in edge_type_order:
        if edge_type in edge_types:
            main_edge_type = edge_type
            break
    if main_edge_type is None:
        print("Warning: edge type of %s->%s not found. Setting to activation" % (u,v))
        main_edge_type = 'activation'
    return main_edge_type
In [ ]:
# takes in a node ID and the list of paths its in, and returns the html popup
def makeNodePopup(n, pathswithnode):
    htmlstring = ''
    uniproturl = 'http://www.uniprot.org/uniprot/%s'
    htmlstring += '<b>Uniprot ID</b>: <a style="color:blue" href="%s" target="UniProtKB">%s</a><br>' % (uniproturl%n, n)
    htmlstring += '<hr />'
    htmlstring += '<b>Paths</b>: %s<br>' %(','.join(str(k) for k in sorted(pathswithnode)))
    return htmlstring
# takes in the u->v edge, first path index k its in, and sources of evidence
# returns the html edge popup
def makeEdgePopup(u, v, k, evidence):
    annotation = ''
    # <br> is the html line break (</br> is not a valid tag)
    annotation += '<b>%s - %s</b><br>'%(uniprot_to_gene[u], uniprot_to_gene[v])
    annotation += '<b>%s - %s</b><br>'%(u,v)
    annotation += '<b>Weight</b>: %.3f<br>' % (edge_weights[(u,v)])
    annotation += '<b>Edge Ranking</b>: %s' % (k)
    annotation += '<hr /><h><b>Sources of Evidence</b></h>'
    annotation += evidenceToHTML(u,v,evidence[(u,v)])
    return annotation
Here is an example of what the node and edge popups will look like for the node APC and the edge GSK3B-APC:
APC popup | GSK3B-APC popup |
---|---|
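To see the raw HTML behind a node popup without posting anything, the small standalone sketch below mirrors the logic of makeNodePopup under a different, hypothetical function name. P25054 is the UniProt ID used for APC in the example edge earlier; the path ranks are made up.

```python
def node_popup_html(uniprot_id, path_ranks):
    """Build the HTML shown when a node is clicked: a UniProt link plus the paths it is in."""
    url = "http://www.uniprot.org/uniprot/%s" % uniprot_id
    html = '<b>Uniprot ID</b>: <a style="color:blue" href="%s" target="UniProtKB">%s</a><br>' % (url, uniprot_id)
    html += "<hr />"
    html += "<b>Paths</b>: %s<br>" % ",".join(str(k) for k in sorted(path_ranks))
    return html

# Made-up path ranks for illustration only.
apc_popup = node_popup_html("P25054", {12, 3, 7})
print(apc_popup)
```

Pasting the printed string into an HTML file and opening it in a browser is a quick way to preview a popup before posting the graph.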
In [ ]:
# Dictionaries of node and edge properties
NODE_COLORS = {
    #'target' : '#FFFF60',  # yellow
    #'source' : '#8CE1DC',  # light blue
    'target' : '#4286f4',  # blue
    'source' : '#4286f4',  # blue
    'default' : '#D8D8D8',  # gray
    'kegg' : '#ad6cfc',  # purple
    'netpath' : '#4286f4',  # blue
}
NODE_SHAPES = {
    'source' : 'triangle',
    'target' : 'rectangle',
    'default' : 'ellipse',
}
EDGE_COLORS = {
    'physical' : '#27AF47',  # green
    'phosphorylation' : '#F07406',  # orange
    #'enzymatic' : '#2A69DC',  # blue
    'enzymatic' : '#DD4B4B',  # red
    'activation' : 'grey',
    'inhibition' : 'grey',
    'spike_regulation': 'brown',
    'kegg' : '#ad6cfc',  # purple
    'netpath' : '#4286f4',  # blue
}
# Adds all of the nodes and edges to the GraphSpace object with the specified shapes and colors
def constructGraph(ranked_edges, sources, targets):
    '''
    Builds the GraphSpace graph for the pathlinker result
    :param ranked_edges: dictionary of each edge and the index k of the first path it is in
    :param sources: set of source nodes
    :param targets: set of target nodes
    '''
    # NetworkX object
    #G = nx.DiGraph()
    G = GSGraph()
    # first get the evidence, edge types, and edge directionality for all of the edges we will be posting
    evidence, edge_types, edge_dir = getEvidence(ranked_edges.keys(), evidence_file)
    # get the nodes from the set of edges. we'll add those first
    nodes = set([n for u,v in ranked_edges for n in (u,v)])
    # add GraphSpace/Cytoscape.js attributes to all nodes.
    for n in nodes:
        #default is gray circle
        node_type = 'default'
        if n in sources:
            # if n is the source, make it a triangle
            node_type = 'source'
        elif n in targets:
            # if n is a target, make it a rectangle
            node_type = 'target'
        # find the path ranks of all of the paths the node is in so we can set the k value of this node
        pathswithnode = set([int(ranked_edges[(t,h)]) for t,h in ranked_edges if t==n or h==n])
        # The k value is used by the rank filter on GraphSpace
        # All nodes and edges need a value for the filter to work
        # Here we are filtering by the first path the node appears in which will allow us to step through the paths on GraphSpace
        k_value = min(pathswithnode)
        # set the name of the node to be the gene name and add the k to the label
        gene_name = uniprot_to_gene[n]
        node_popup = makeNodePopup(n, pathswithnode)
        label = "%s\n%d"%(gene_name,k_value)
        G.add_node(gene_name, popup=node_popup, label=label, k=k_value)
        # now add the style for the node
        shape = NODE_SHAPES[node_type]
        color = NODE_COLORS[node_type]
        if n in netpath_wnt_nodes:
            color = NODE_COLORS['netpath']
        elif n in kegg_wnt_nodes:
            color = NODE_COLORS['kegg']
        # some attributes (such as opacity) are not implemented in graphspace-python.
        # For those, we need to make our own dictionary of the attribute and value
        # for more style settings, see http://js.cytoscape.org/#style
        attr_dict = {}
        #attr_dict['background-opacity'] = 0.8
        G.add_node_style(gene_name, attr_dict=attr_dict, shape=shape, color=color, width=45, height=45,
                         style='solid', border_color=color, border_width=2, bubble=color)
    # Add all of the edges and their Graphspace/Cytoscape.js attributes
    for (u,v) in ranked_edges:
        # get the main edge type so we can style and color edges accordingly
        main_edge_type = getMainEdgeType(u,v,edge_types[(u,v)])
        gene_name_u = uniprot_to_gene[u]
        gene_name_v = uniprot_to_gene[v]
        k_value = ranked_edges[(u,v)]
        edge_popup = makeEdgePopup(u,v,k_value, evidence)
        G.add_edge(gene_name_u, gene_name_v, directed=edge_dir[(u,v)], popup=edge_popup, k=k_value)
        color = EDGE_COLORS[main_edge_type]
        if (u,v) in netpath_wnt_edges:
            color = EDGE_COLORS['netpath']
        elif (u,v) in kegg_wnt_edges:
            color = EDGE_COLORS['kegg']
        # TODO use the edge weight to set the width of the edge
        width = 2
        arrow_shape = None
        if edge_dir[(u,v)] is True:
            arrow_shape = "triangle"
        # if this is an inhibition edge, make the arrowhead be a T shape
        if 'activation' not in edge_types[(u,v)] and 'inhibition' in edge_types[(u,v)]:
            arrow_shape = 'tee'
        # some attributes (such as opacity) are not implemented in graphspace-python.
        # For those, we need to make our own dictionary of the attribute and value
        # for more style settings, see http://js.cytoscape.org/#style
        attr_dict = {}
        #attr_dict['opacity'] = 0.8
        G.add_edge_style(gene_name_u, gene_name_v, attr_dict=attr_dict,
                         directed=edge_dir[(u,v)], color=color, width=width,
                         arrow_shape=arrow_shape, edge_style='solid')
    return G
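The k value assigned to each node in constructGraph is the rank of the first path the node appears in. The sketch below isolates that step so it can be checked on a toy input; it computes the same minimum in a single pass over the edges instead of rescanning all edges for every node. The function name and the toy edges are made up, and the ranks are strings to match how they are read from the ranked-edges file above.

```python
def first_path_rank_per_node(ranked_edges):
    """Map each node to the smallest path rank among the edges that touch it."""
    first_rank = {}
    for (tail, head), k in ranked_edges.items():
        k = int(k)
        for node in (tail, head):
            first_rank[node] = min(k, first_rank.get(node, k))
    return first_rank

# Toy ranked edges: (tail, head) -> rank of the first path containing the edge.
toy_ranked_edges = {
    ("R1", "A"): "1",
    ("A", "TF1"): "1",
    ("R1", "B"): "2",
    ("B", "TF1"): "2",
    ("A", "B"): "3",
}
node_rank = first_path_rank_per_node(toy_ranked_edges)

# Nodes that would remain visible if the rank filter were set to 1.
visible_at_1 = sorted(n for n, k in node_rank.items() if k <= 1)
print(node_rank)
print(visible_at_1)
```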
Now that we have all of the functions for getting the evidence and popups setup, we're ready to build our GraphSpace graph and then post it to GraphSpace.
For more info about using the GraphSpace Python library, see the documentation.
In [ ]:
# now finally construct the GraphSpace graph and post it to GraphSpace!
G = constructGraph(ranked_edges, sources, targets)
# TODO add a description of the graph with a legend describing shapes and colors
desc = ''
title = 'NetPath WNT Pathway PathLinker Reconstruction'
metadata = {'description':desc, 'title':title}
G.set_data(metadata)
tags = ['pathlinker', 'netpath', '2017icsb', 'tutorial']
G.set_tags(tags)
graph_name = "netpath-wnt-pathlinker-k100"
G.set_name(graph_name)
# post to graphspace
# set your username and password here
username = 'user6@example.com'
password = 'user6'
gs = GraphSpace(username, password)
gs_graph = gs.get_graph(graph_name, owner_email=username)
if gs_graph is None:
    print("\nPosting graph '%s' to graphspace\n" % (graph_name))
    gs_graph = gs.post_graph(G)
else:
    print("\nGraph '%s' already exists. Updating it\n" % (graph_name))
    # this can take a while if your graph has a lot of nodes and edges
    gs_graph = gs.update_graph(G, graph_name=graph_name, owner_email=username)
print("Done")
print(gs_graph.url)
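If you plan to share your notebook, you may prefer not to type your GraphSpace password directly into a cell. One option, sketched below, is to read the credentials from environment variables and fall back to a prompt. The variable names GRAPHSPACE_USERNAME and GRAPHSPACE_PASSWORD and the function name are assumptions made for this example; they are not something GraphSpace or graphspace_python defines.

```python
import os
from getpass import getpass

def load_credentials(env=None, ask_user=input, ask_password=getpass):
    """Return (username, password) from the environment, prompting for anything missing."""
    env = os.environ if env is None else env
    # Hypothetical variable names chosen for this sketch.
    user = env.get("GRAPHSPACE_USERNAME") or ask_user("GraphSpace email: ")
    pwd = env.get("GRAPHSPACE_PASSWORD") or ask_password("GraphSpace password: ")
    return user, pwd

# Demonstration with a stand-in environment so that nothing is prompted here.
demo_env = {
    "GRAPHSPACE_USERNAME": "user6@example.com",
    "GRAPHSPACE_PASSWORD": "not-a-real-password",
}
demo_username, demo_password = load_credentials(env=demo_env)
print(demo_username)
```

In the notebook you would call load_credentials() with no arguments and pass the two returned values to GraphSpace(username, password) as in the cell above.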
In [ ]:
from graphspace_python.graphs.classes.gsgroup import GSGroup
# you can also share your graph with a group
# create the group if it doesn't exist
#group = gs.post_group(GSGroup(name='icsb2017', description='sample group'))
# or get the group you already created
#group = gs.get_group(name='icsb2017')
#print(group.url)
#gs.share_graph(graph=G, group=group)
gs_graph = gs.publish_graph(graph=G)
print(gs_graph.url)
Here is the layout I made for our graph:
In [ ]:
# Currently layouts store style attributes as well as x and y positions
# so if you laid out the graph how you wanted it but now want to update the style (such as edge width),
# you will need to copy the x and y positions of the layout you made to the updated graph
# I created the layout 'layout1', so I'll use that
layout_name = 'layout1'
layout = gs.get_graph_layout(graph=gs_graph,layout_name=layout_name)
# set the x and y position of each node in the updated graph to the x and y positions of the layout you created
print("Setting the x and y coordinates of each node to the positions in %s" % layout_name)
for node, positions in layout.positions_json.items():
    G.set_node_position(node_name=node, x=positions['x'], y=positions['y'])
# now re-post the graph and the positions should be set
print("Updating graph", graph_name)
gs_graph = gs.update_graph(G, graph_name=graph_name, owner_email=username)
# have to publish/share again after updating
gs_graph = gs.publish_graph(graph=G)
print("Done")
print(gs_graph.url)
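If the graph has changed since the layout was saved, some saved positions may refer to nodes that no longer exist, and some new nodes will have no saved position. The sketch below checks for both cases using plain dictionaries shaped like the positions used in the loop above (node name mapped to a dictionary with 'x' and 'y'). That shape is an assumption taken from that loop, and the function name and example data are made up.

```python
def split_layout_positions(positions, graph_node_names):
    """Separate saved positions that match current nodes from nodes left without one."""
    usable = {name: pos for name, pos in positions.items() if name in graph_node_names}
    unplaced = sorted(set(graph_node_names) - set(positions))
    return usable, unplaced

# Made-up saved layout and current node names.
saved_positions = {
    "APC": {"x": 10.0, "y": 20.0},
    "OLD_NODE": {"x": 0.0, "y": 0.0},
}
current_nodes = {"APC", "GSK3B"}

usable, unplaced = split_layout_positions(saved_positions, current_nodes)
print("positions to copy:", usable)
print("nodes without a saved position:", unplaced)
```

Only the entries in usable would then be passed to G.set_node_position; the nodes in unplaced would need to be positioned by hand on GraphSpace.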
You should have all the tools you need to post your own networks to GraphSpace.
If you run into any problems, we'd be happy to help! Send us an email or if there's a bug in the code, please open an issue in the appropriate GitHub repository describing the error and the steps to reproduce it.
Happy posting!