Tuesday, 3/24/2015
Cytoscape is an all-in-one package for network data integration, analysis, and visualization. However, there are several powerful graph analysis tools designed for advanced users. They usually have command-line interfaces and are not easy to use for interactive data visualization. By using cyREST, you can get the best of both worlds!
In this section, you will learn the following:
NetworkX: One of the most popular graph analysis libraries in the Python community. It is easy to install, but a bit slower than the other two.
igraph: A very popular library among R programmers that is also available for Python. It has lots of graph analysis functions and is popular in the social network analysis community.
graph-tool: In most cases, the fastest of the three. Its performance comes from calling the external Boost C++ libraries.
In this container, all three are installed and ready to use!
In [1]:
import requests
import json
import networkx as nx
from IPython.display import Image
from py2cytoscape import util as cy
from collections import OrderedDict
import numpy as np
from bokeh.charts import Bar
from bokeh.plotting import *
import matplotlib.pyplot as plt
%matplotlib inline
output_notebook()
import pandas as pd
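# cyREST connection settings: IP address and port of the machine running Cytoscape
# (use 'localhost' if Cytoscape runs on this machine)
PORT_NUMBER = 1234
IP = '137.110.137.158'
BASE = 'http://' + IP + ':' + str(PORT_NUMBER) + '/v1/'
HEADERS = {'Content-Type': 'application/json'}
# Start from a clean slate: replace the current Cytoscape session with a new, empty one
requests.delete(BASE + 'session')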
!pip show py2cytoscape
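Every cyREST call in this notebook builds its URL by concatenating `BASE` with path segments and SUIDs. As a minimal sketch, the same URLs can be assembled in one place, which avoids a missing `/` or a forgotten `str()` around a SUID. The helper name `cyrest_url` is hypothetical and is not part of cyREST or py2cytoscape.

```python
def cyrest_url(base, *parts):
    """Join a cyREST base URL (e.g. 'http://localhost:1234/v1/') with path parts.

    Hypothetical helper for illustration; integer SUIDs are converted with str().
    """
    path = '/'.join(str(p).strip('/') for p in parts)
    return base.rstrip('/') + '/' + path


# Example: the image URL for a network whose SUID is 52 (a made-up SUID)
print(cyrest_url('http://localhost:1234/v1/', 'networks', 52, 'views', 'first.png'))
```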
In [2]:
# Create dictionary object from JSON file (the file is closed automatically)
with open('data/yeast.json', 'r') as f:
    cyjs_network = json.load(f)
# Set network name
cyjs_network['data']['name'] = 'Yeast Sample 1'
res = requests.post(BASE + 'networks', data=json.dumps(cyjs_network), headers=HEADERS)
new_suid = res.json()['networkSUID']
# Apply style and layout
requests.get(BASE + 'apply/layouts/force-directed/' + str(new_suid))
requests.get(BASE + 'apply/styles/default/' + str(new_suid))
Image(BASE+'networks/' + str(new_suid) + '/views/first.png')
The basic data exchange format for cyREST is Cytoscape.js JSON. To use objects in Cytoscape.js JSON format with other network libraries, we need converter code for the data roundtrip.
This is still an ongoing project, but we are developing a utility library named py2cytoscape, which includes data conversion utilities and other useful functions for Python users. As of today (March 2015), data roundtrip between Cytoscape.js JSON and NetworkX is supported.
Here is the simplest example:
In [3]:
# Convert Python dictionary in Cytoscape.js format into NetworkX object
nx_network = cy.to_networkx(cyjs_network)
# Draw it with NetworkX renderer
plt.figure(figsize=(12,8));
nx.draw_spring(nx_network)
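`cy.to_networkx` hides the details of the format. To show what such a converter has to handle, here is a minimal, dependency-free sketch. It assumes the usual Cytoscape.js layout, in which nodes and edges sit under `elements` and each element keeps its attributes in a `data` dictionary with an `id` (nodes) or `source`/`target` (edges). The function names are hypothetical and are not py2cytoscape's API; a real converter also copies all other `data` attributes.

```python
import json


def cyjs_to_edges(cyjs):
    """Return (node_ids, edge_pairs) from a Cytoscape.js-style dictionary."""
    elements = cyjs.get('elements', {})
    node_ids = [n['data']['id'] for n in elements.get('nodes', [])]
    edge_pairs = [(e['data']['source'], e['data']['target'])
                  for e in elements.get('edges', [])]
    return node_ids, edge_pairs


def edges_to_cyjs(node_ids, edge_pairs, name='Untitled'):
    """Build a Cytoscape.js-style dictionary from node IDs and (source, target) pairs."""
    return {
        'data': {'name': name},
        'elements': {
            'nodes': [{'data': {'id': str(n)}} for n in node_ids],
            'edges': [{'data': {'id': 'e%d' % i, 'source': str(s), 'target': str(t)}}
                      for i, (s, t) in enumerate(edge_pairs)],
        },
    }


# Round trip: dictionary -> JSON text -> dictionary -> node and edge lists
doc = edges_to_cyjs(['a', 'b', 'c'], [('a', 'b'), ('b', 'c')], name='Toy')
restored = cyjs_to_edges(json.loads(json.dumps(doc)))
print(restored)  # (['a', 'b', 'c'], [('a', 'b'), ('b', 'c')])
```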
In [4]:
# Generate graphs with NetworkX
# Scale-free graph with 100 nodes (directed)
scale_free_graph = nx.scale_free_graph(100)
scale_free_graph.graph['name'] = 'Scale-Free Graph'
# Minimum spanning tree of its undirected version
mst = nx.minimum_spanning_tree(scale_free_graph.to_undirected())
# POST a NetworkX graph to Cytoscape, apply a layout, and return an image of the result
def post_nx(nx_graph, layout='force-directed'):
    # Convert into Cytoscape.js JSON
    cyjs_network = cy.from_networkx(nx_graph)
    # POST it!
    res = requests.post(BASE + 'networks', data=json.dumps(cyjs_network), headers=HEADERS)
    suid = res.json()['networkSUID']
    requests.get(BASE + 'apply/layouts/' + layout + '/' + str(suid))
    return Image(url=BASE + 'networks/' + str(suid) + '/views/first.png', embed=True)
network_images = []
network_images.append(post_nx(scale_free_graph))
network_images[0]
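`nx.scale_free_graph` implements a directed scale-free model (Bollobás et al.). To illustrate where the few highly connected hubs come from, here is a simpler, stdlib-only sketch of undirected Barabási–Albert preferential attachment. It is a different model from the one NetworkX uses here, and `barabasi_albert_edges` is a hypothetical name.

```python
import random
from collections import Counter


def barabasi_albert_edges(n, m, seed=None):
    """Undirected preferential-attachment graph with n nodes, as an edge list.

    Starts from m seed nodes; each new node links to m distinct existing nodes,
    chosen with probability proportional to their current degree.
    """
    if not 1 <= m < n:
        raise ValueError('need 1 <= m < n')
    rng = random.Random(seed)
    edges = []
    # Every node appears here once per incident edge, so a uniform draw
    # from this list is a degree-proportional draw.
    endpoints = []
    targets = list(range(m))  # the first new node attaches to all seed nodes
    for new in range(m, n):
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(endpoints))
        targets = list(chosen)
    return edges


ba_edges = barabasi_albert_edges(100, 2, seed=42)
degree = Counter(v for edge in ba_edges for v in edge)
print(len(ba_edges), 'edges; largest degree:', max(degree.values()))
```

Each new node adds exactly `m` edges, so the graph always has `(n - m) * m` edges; only their placement is random.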
In [5]:
post_nx(mst)
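`nx.minimum_spanning_tree` uses Kruskal's algorithm by default. Because the scale-free graph has no edge weights (NetworkX treats each edge as weight 1), every spanning tree is minimal, and for a disconnected graph the result is a spanning forest. A minimal sketch of Kruskal's algorithm with a union-find structure, with hypothetical names and toy weights:

```python
def kruskal_mst(nodes, weighted_edges):
    """Return a minimum spanning forest as a list of (u, v, w) edges.

    weighted_edges: iterable of (u, v, w); parallel edges and self-loops are allowed.
    """
    parent = {v: v for v in nodes}

    def find(v):
        # Path halving keeps the union-find trees shallow
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    forest = []
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:  # skip edges that would close a cycle
            parent[ru] = rv
            forest.append((u, v, w))
    return forest


toy_edges = [('a', 'b', 1), ('b', 'c', 2), ('a', 'c', 3), ('c', 'd', 1), ('c', 'c', 5)]
print(kruskal_mst('abcd', toy_edges))  # [('a', 'b', 1), ('c', 'd', 1), ('b', 'c', 2)]
```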
In [6]:
# ...And other graph generators
graphs = {}
NUMBER_OF_NODES = 100
# Add Converted Cytoscape network
graphs['yeast'] = nx_network
# Complete
graphs['complete'] = nx.complete_graph(NUMBER_OF_NODES)
# Circular Ladder
graphs['circular ladder'] = nx.circular_ladder_graph(NUMBER_OF_NODES)
# Binomial (Erdős–Rényi G(n, p)) random graph
graphs['binomial'] = nx.binomial_graph(NUMBER_OF_NODES, 0.3)
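Two of these generators have short definitions. `nx.binomial_graph(n, p)` is the Erdős–Rényi G(n, p) model: each of the n(n - 1)/2 possible edges is kept independently with probability p, so with n = 100 and p = 0.3 you can expect about 0.3 × 4950 ≈ 1485 edges. `nx.circular_ladder_graph(n)` joins two concentric n-cycles with n rungs, so it has 2n nodes (200 here, not 100) and 3n edges. A stdlib sketch of both, with hypothetical function names:

```python
import itertools
import random


def gnp_edges(n, p, seed=None):
    """Erdős–Rényi G(n, p): keep each unordered node pair with probability p."""
    rng = random.Random(seed)
    return [(u, v) for u, v in itertools.combinations(range(n), 2)
            if rng.random() < p]


def circular_ladder_edges(n):
    """Two n-cycles (nodes 0..n-1 and n..2n-1) joined by n rungs."""
    edges = []
    for i in range(n):
        j = (i + 1) % n
        edges.append((i, j))          # outer cycle
        edges.append((n + i, n + j))  # inner cycle
        edges.append((i, n + i))      # rung
    return edges


print(len(gnp_edges(100, 0.3, seed=0)), 'edges in one G(100, 0.3) sample')
print(len(circular_ladder_edges(100)), 'edges in the circular ladder')
```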
In [7]:
for g in graphs.values():
    # Perform simple graph analysis
    # Node statistics
    bc = nx.betweenness_centrality(g)
    degree = dict(g.degree())
    cc = nx.closeness_centrality(g)
    # mst = nx.minimum_spanning_tree(g.to_undirected())
    # Keyword arguments work with both the NetworkX 1.x and 2.x argument orders
    nx.set_node_attributes(g, name='betweenness', values=bc)
    nx.set_node_attributes(g, name='closeness', values=cc)
    nx.set_node_attributes(g, name='degree', values=degree)
    # Network statistics (average shortest path length requires a connected graph)
    g.graph['avg_shortest_path_len'] = nx.average_shortest_path_length(g)
    g.graph['density'] = nx.density(g)
    network_images.append(post_nx(g, 'circular'))
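For an unweighted, undirected graph the network statistics above have short definitions: density is 2m / (n(n - 1)), the average shortest path length is the mean BFS distance over all ordered pairs of distinct nodes, and closeness centrality of v is (n - 1) divided by the sum of v's distances to all other nodes. The stdlib sketch below assumes a connected graph given as an adjacency dictionary; NetworkX handles disconnected and directed graphs in ways this sketch does not, and the names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from source to every reachable node (adj: node -> set of neighbours)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def graph_stats(adj):
    """Density, average shortest path length and closeness of a connected undirected graph."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # each edge is listed twice
    density = 2 * m / (n * (n - 1))
    closeness = {}
    total = 0
    for v in adj:
        d = bfs_distances(adj, v)
        if len(d) != n:
            raise ValueError('graph is not connected')
        s = sum(d.values())
        total += s
        closeness[v] = (n - 1) / s
    avg_path = total / (n * (n - 1))
    return density, avg_path, closeness


path = {0: {1}, 1: {0, 2}, 2: {1}}  # path graph 0-1-2
print(graph_stats(path))
```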
In [8]:
network_images[4]
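`nx.betweenness_centrality` is based on Brandes' algorithm: one BFS per source node counts shortest paths, and a reverse sweep accumulates each node's share of them. A compact sketch for unweighted, undirected graphs follows, with a hypothetical function name. Since every pair is reached from both endpoints, the raw sum is divided by (n - 1)(n - 2), which matches how NetworkX normalizes undirected betweenness by default.

```python
from collections import deque


def betweenness(adj, normalized=True):
    """Brandes' betweenness centrality for an unweighted, undirected graph.

    adj maps each node to an iterable of its neighbours.
    """
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Forward phase: BFS from s, counting shortest paths (sigma) and predecessors
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward phase: accumulate dependencies in order of decreasing distance
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    # Each pair was counted from both endpoints, hence 0.5 when not normalizing
    scale = 1 / ((n - 1) * (n - 2)) if normalized and n > 2 else 0.5
    return {v: c * scale for v, c in bc.items()}


star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(betweenness(star))  # the hub lies on every leaf-to-leaf path
```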
In [9]:
import os
import graph_tool as gt
import graph_tool.collection as collection
# Convert a DOT file into GraphML with graph-tool
dot_graph = gt.load_graph('data/sample.dot')
dot_graph.save('data/sample.graphml', fmt='graphml')
# Ask cyREST to import networks from a list of URLs
def create_from_list(network_list):
    payload = {'source': 'url', 'collection': 'Dot Sample'}
    server_res = requests.post(BASE + 'networks', data=json.dumps(network_list), headers=HEADERS, params=payload)
    return server_res.json()
# cyREST needs an absolute file URL; build it from the local path
network_file = 'file://' + os.path.abspath('data/sample.graphml')
id_json = create_from_list([network_file])
# Apply layout
suid = id_json[0]['networkSUID'][0]
requests.get(BASE + 'apply/layouts/force-directed/' + str(suid))
Image(url=BASE + 'networks/' + str(suid) + '/views/first.png', embed=True)
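cyREST's URL import needs an absolute `file://` URL for local files. A portable way to build one is `pathlib`, which also produces correct URLs on Windows. The sketch below builds the same query parameters and JSON body that `create_from_list` sends; the helper name `url_import_request` is hypothetical.

```python
import json
from pathlib import Path


def url_import_request(paths, collection):
    """Return (query parameters, JSON body) for importing local files by URL.

    Hypothetical helper mirroring the parameters used by create_from_list().
    """
    urls = [Path(p).resolve().as_uri() for p in paths]  # absolute file:// URLs
    params = {'source': 'url', 'collection': collection}
    return params, json.dumps(urls)


params, body = url_import_request(['data/sample.graphml'], 'Dot Sample')
print(params)
print(body)  # one absolute file:// URL ending in data/sample.graphml
```

The two return values can then be passed as `params=` and `data=` to `requests.post(BASE + 'networks', ...)`.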
In [10]:
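import igraph
# Detect communities with the edge betweenness (Girvan–Newman) method
gal = igraph.Graph.Read_GML('data/galFiltered.gml')
dend = gal.community_edge_betweenness()
clusters = dend.as_clustering()
gal.vs['clusters'] = clusters.membership
# One row per node: node label as the index, cluster ID as the only column
df = pd.DataFrame(gal.vs['clusters'], index=gal.vs['label'], columns=['cluster'])
df.to_csv('data/clusters.txt')
df.head(10)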
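`community_edge_betweenness()` follows the Girvan–Newman idea: edges between communities carry many shortest paths, so repeatedly removing the edge with the highest edge betweenness splits the graph along community boundaries. igraph records the full dendrogram, and `as_clustering()` cuts it at the cluster count the algorithm suggests. The stdlib sketch below shows only the divisive part and simply stops at a requested number of components `k`; it does not do igraph's modularity-based cut, and all names are hypothetical.

```python
from collections import deque


def edge_betweenness(adj):
    """Shortest-path betweenness of each undirected edge (Brandes-style accumulation)."""
    eb = {}
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma, dist = dict.fromkeys(adj, 0), {s: 0}
        sigma[s] = 1
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1 + delta[w])
                edge = frozenset((v, w))
                eb[edge] = eb.get(edge, 0.0) + c
                delta[v] += c
    return eb


def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            for w in adj[queue.popleft()]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps


def girvan_newman(adj, k):
    """Remove highest-betweenness edges until there are at least k components."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    while len(components(adj)) < k:
        eb = edge_betweenness(adj)
        if not eb:
            break
        u, v = max(eb, key=eb.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Two triangles joined by the bridge 2-3
toy = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
membership = {v: i for i, comp in enumerate(girvan_newman(toy, 2)) for v in comp}
print(membership)
```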
Pandas is a standard library in the Python data science community. It provides several data structures and functions for data preparation and analysis. It is a fairly large library with a lot of features, and we do not have enough time to cover all of them. There is a great book written by the author of the library, Wes McKinney. If you are interested in Pandas, please read his book, Python for Data Analysis (O'Reilly).
In [11]:
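# Query IntAct through its PSICQUIC web service for interactions matching brca*
# (format=tab27 returns tab-separated PSI-MITAB 2.7)
df1 = pd.read_csv('http://www.ebi.ac.uk/Tools/webservices/psicquic/intact/webservices/current/search/query/brca*?format=tab27',
                  delimiter='\t', header=None)
df1.head()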
In [12]:
brca_df = pd.DataFrame()
brca_df['source'] = df1[0]
brca_df['target'] = df1[1]
brca_df.head()
In [13]:
# Strip the database prefix, e.g. 'uniprotkb:P38398' -> 'P38398'
brca_df['source_uniprot'] = brca_df['source'].apply(lambda x: x.split(':', 1)[1])
brca_df['target_uniprot'] = brca_df['target'].apply(lambda x: x.split(':', 1)[1])
brca_df.head(10)
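PSICQUIC's `tab27` format is PSI-MITAB 2.7: one interaction per tab-separated line, with the identifiers of interactors A and B in the first two columns written as `database:accession`. As a sketch of what the last two pandas cells do, here is the same extraction with the standard `csv` module. The function name is hypothetical, and the sample line is made up and truncated to three columns.

```python
import csv
import io


def mitab_pairs(text):
    """Yield (id_a, id_b) accessions from PSI-MITAB text, dropping the 'db:' prefix."""
    # QUOTE_NONE: MITAB fields may contain literal double quotes
    reader = csv.reader(io.StringIO(text), delimiter='\t', quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or row[0].startswith('#'):
            continue  # skip blank lines and any header comment
        # split(':', 1) keeps any later colons that belong to the accession
        yield row[0].split(':', 1)[1], row[1].split(':', 1)[1]


sample = 'uniprotkb:P38398\tuniprotkb:XXXXXX\t-\n'  # made-up, truncated MITAB line
print(list(mitab_pairs(sample)))  # [('P38398', 'XXXXXX')]
```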
In [14]:
# Build a NetworkX multigraph from the DataFrame (one edge per interaction row)
net = nx.MultiGraph()
for source, target in zip(brca_df['source_uniprot'], brca_df['target_uniprot']):
    # add_edge() also adds both end nodes if they are not in the graph yet
    net.add_edge(source, target)
res = requests.post(BASE + 'networks', data=json.dumps(cy.from_networkx(net)), headers=HEADERS)
suid = res.json()['networkSUID']
requests.get(BASE + 'apply/layouts/force-directed/' + str(suid))
Image(url=BASE+'networks/' + str(suid) + '/views/first.png', embed=True)
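`nx.MultiGraph` keeps one edge per MITAB row, so a protein pair reported by several experiments ends up with parallel edges. If you would rather have one edge per pair plus an evidence count (for example, to map it to edge width later), here is a stdlib sketch that collapses parallel edges and treats (A, B) and (B, A) as the same undirected pair. The function name and accessions are hypothetical.

```python
from collections import Counter


def collapse_parallel_edges(pairs):
    """Count undirected interactions; (a, b) and (b, a) share the same key."""
    return Counter(tuple(sorted(p)) for p in pairs)


pairs = [('P1', 'P2'), ('P2', 'P1'), ('P1', 'P3')]  # hypothetical accessions
counts = collapse_parallel_edges(pairs)
print(counts)
```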
In [15]:
# Generate a unique list of nodes (list() also works with NetworkX 2.x node views)
node_table = pd.DataFrame(list(net.nodes()), columns=['name'])
# Assign type for CHEBI nodes
node_table['type'] = node_table['name'].apply(lambda x: 'compound' if 'CHEBI' in x else 'protein')
node_table.head(10)
In [16]:
nodes_url = BASE + 'networks/' + str(suid) + '/tables/defaultnode'
print(nodes_url)
df_nodes = pd.DataFrame(requests.get(nodes_url).json()['rows'])
df_nodes.head()
In [17]:
merged = df_nodes.merge(node_table, on='name')
merged.head(10)
In [ ]:
# First, create a new String column named 'type' in the default node table
type_column = {
    'name': 'type',
    'type': 'String'
}
requests.post(BASE + 'networks/' + str(suid) + '/tables/defaultnode/columns', data=json.dumps(type_column), headers=HEADERS)
# Build one {'SUID': ..., 'value': ...} entry per node
# (int() converts NumPy integers, which json.dumps cannot serialize)
values = [{'SUID': int(node_suid), 'value': node_type}
          for node_suid, node_type in zip(merged['SUID'], merged['type'])]
# Now update it!
requests.put(BASE + 'networks/' + str(suid) + '/tables/defaultnode/columns/type', data=json.dumps(values), headers=HEADERS)
Now you can use the new values in the type column to create a new visualization. Actual visualization code examples will be discussed in the next lesson, but here is an example:
In [ ]:
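import graph_tool as gt
import graph_tool.draw as draw
# Load the yeast network and compute an ARF (attractive and repulsive forces) layout
g = gt.load_graph('data/galFiltered.gml')
# max_iter=0 means no iteration limit: run until the layout converges
pos = draw.arf_layout(g, max_iter=0)
draw.graph_draw(g, pos=pos, output_size=(950, 1000), inline=True)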
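`arf_layout` is graph-tool's spring-block layout based on attractive and repulsive forces. As a rough illustration of the force-directed idea only, here is a plain Fruchterman–Reingold-style sketch using the standard library. It is a different, simpler algorithm than ARF, the function name is hypothetical, and its parameters (iteration count, starting temperature, cooling factor) are arbitrary choices.

```python
import math
import random


def spring_layout(adj, iterations=200, seed=0):
    """Fruchterman-Reingold-style layout in the unit square.

    adj: node -> iterable of neighbours (undirected, listed in both directions).
    Returns node -> (x, y).
    """
    rng = random.Random(seed)
    nodes = list(adj)
    pos = {v: [rng.random(), rng.random()] for v in nodes}
    k = math.sqrt(1.0 / len(nodes))  # ideal edge length for unit area
    temperature = 0.1
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsion between every pair of nodes, with magnitude k^2 / d
        for i, v in enumerate(nodes):
            for u in nodes[i + 1:]:
                dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[v][0] += dx / d * f
                disp[v][1] += dy / d * f
                disp[u][0] -= dx / d * f
                disp[u][1] -= dy / d * f
        # Attraction along edges, magnitude d^2 / k; symmetric adjacency
        # means each endpoint is pulled toward the other
        for v in nodes:
            for u in adj[v]:
                dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = d * d / k
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Move each node by at most `temperature`, stay inside the unit square, cool down
        for v in nodes:
            dx, dy = disp[v]
            d = max(math.hypot(dx, dy), 1e-9)
            step = min(d, temperature)
            pos[v][0] = min(1.0, max(0.0, pos[v][0] + dx / d * step))
            pos[v][1] = min(1.0, max(0.0, pos[v][1] + dy / d * step))
        temperature *= 0.97
    return {v: tuple(p) for v, p in pos.items()}


layout = spring_layout({0: [1], 1: [0, 2], 2: [1]})
print(layout)
```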