Friday, 4/17/2015 at Sanford
This is an introduction to cyREST and its basic API. In this section, you will learn how to access Cytoscape through its RESTful API.
This tutorial is tested on the following platform:
In this tutorial, we will use several popular Python libraries to make this workflow more realistic.
No, because this notebook server runs in a Docker container that already includes all dependencies.
Since you access Cytoscape via a RESTful API, an HTTP client library is the most important tool you need to understand. In this example, we use the Requests library to simplify the API call code.
Data will be exchanged as JSON between Cytoscape and Python code. Python has built-in support for JSON and we will use it in this workflow.
At this point, there is only one option for the cy-rest module: port number.
By default, the port number used by the cy-rest module is 1234. To change this, set a global Cytoscape property from Edit → Preferences → Properties... and add a new property rest.port.
The actual Docker runtime is only available on Linux. If you use the Mac or Windows version of Docker, it runs in a Linux virtual machine (called boot2docker).
We assume you are running the Cytoscape desktop application, and the IPython Notebook server in the Docker container we provide. To access the Cytoscape REST API, use the following URL:
url
http://IP_of_your_machine:PORT_NUMBER/v1/
where v1 is the current version number of the API. Once the final release is ready, we guarantee compatibility of your scripts as long as the major version number is the same.
ifconfig
ipconfig
All data exchanged between Cytoscape and other applications is in JSON. You can make the JSON data more human-readable by using browser extensions:
If you prefer command-line tools, jq is the best choice.
In [1]:
# HTTP Client for Python
import requests
# Standard JSON library
import json
# Basic Setup
PORT_NUMBER = 1234 # This is the default port number of CyREST
In [2]:
# IP address of your PHYSICAL MACHINE (NOT VM)
IP = '137.110.137.158'
In [3]:
BASE = 'http://' + IP + ':' + str(PORT_NUMBER) + '/v1/'
# Header for posting data to the server as JSON
HEADERS = {'Content-Type': 'application/json'}
# Clean-up
requests.delete(BASE + 'session')
# Utility function to display JSON (Pretty-printer)
def pp(json_data):
    print(json.dumps(json_data, indent=4))
First, send a simple request and check the server status.
The object returned by Requests contains the return value of the API as JSON. Let's convert it into a Python object. The JSON library in Python converts a JSON string into simple Python objects.
In [4]:
# Get server status
res = requests.get(BASE)
status_object = res.json()
print(json.dumps(status_object, indent=4))
And of course, you can access this API from other tools, including web browsers.
Click the following URL:
In [5]:
print(BASE)
The basic mechanism of cyREST is very simple. It accesses resources in Cytoscape with standard HTTP verbs: POST, GET, PUT, and DELETE. The URL above means "give me the status of the cyREST server."
Once you store the return values in a Python object, you can access them through standard Python code:
In [6]:
print(status_object['apiVersion'])
print(status_object['memoryStatus']['usedMemory'])
If you are comfortable with these, you are ready to go!
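If you do not have a Cytoscape instance at hand, the same idea can be tried on a plain JSON string. The following is a minimal sketch: the field names apiVersion and memoryStatus/usedMemory come from the cell above, but the values in sample_status_text are made up for illustration.

```python
import json

# Hypothetical status payload. The field names follow the cell above;
# the values are invented for this example.
sample_status_text = '{"apiVersion": "v1", "memoryStatus": {"usedMemory": 100}}'

# json.loads turns a JSON string into nested Python dicts, lists, and scalars
status = json.loads(sample_status_text)

api_version = status['apiVersion']
used_memory = status['memoryStatus']['usedMemory']

# dict.get returns a default instead of raising KeyError for a missing field
missing_field = status.get('noSuchField', 'n/a')

print(api_version, used_memory, missing_field)
```

Using dict.get for optional fields keeps a workflow from failing when a field is absent.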
There are many ways to load networks into Cytoscape from REST API:
Let's start with a simple file-loading example. The POST method is used to create new Cytoscape objects. For example,
POST http://localhost:1234/v1/networks
means "create new network(s) by the specified method." If you want to create networks from files on your machine or on remote servers, all you need to do is create a list of file locations and post it to Cytoscape.
In [7]:
# Small utility function to create networks from list of URLs
def create_from_list(network_list, collection_name='Yeast Collection'):
    payload = {'source': 'url', 'collection': collection_name}
    server_res = requests.post(BASE + 'networks', data=json.dumps(network_list), headers=HEADERS, params=payload)
    return server_res.json()
# Array of data source.
network_files = [
    # This should be path in the LOCAL file system!
    'file:////Users/kono/prog/git/sdcsb-advanced-tutorial/tutorials/data/yeast.json',
    # SIF file on a web server
    'http://chianti.ucsd.edu/cytoscape-data/galFiltered.sif'
    # And of course, you can add as many files as you need...
]
# Create!
res_json = create_from_list(network_files)
print(json.dumps(res_json, indent=4))
SUID is the unique identifier for all graph objects in Cytoscape. You can access any object in the current session as long as you have its SUID. For the example above, you can access the new network SUIDs by:
In [8]:
sample_network_suids = []
for new_network in res_json:
    sample_network_suids.append(new_network['networkSUID'][0])
sample_network_suids
Out[8]:
Note that Cytoscape may create multiple networks from a single network resource. This is why you need an index number after networkSUID. In this tutorial, every network resource (file) contains only one network, so you can just use 0 to access the result.
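When a resource does contain more than one network, taking index 0 silently drops the rest. The sketch below collects every SUID instead. The sample_response list is hypothetical: the networkSUID key comes from the cell above, while the source key and all values are assumptions made for illustration.

```python
def collect_suids(response):
    """Return every network SUID in a create-network response, in order."""
    suids = []
    for entry in response:
        # Each entry carries a list, because one resource may yield several networks
        suids.extend(entry['networkSUID'])
    return suids

# Hypothetical response: the second resource produced two networks
sample_response = [
    {'source': 'file:///path/to/a.json', 'networkSUID': [52]},
    {'source': 'http://example.org/b.sif', 'networkSUID': [60, 68]},
]

all_suids = collect_suids(sample_response)
print(all_suids)
```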
This part is a bit tricky. When you specify a local file, you need to use its absolute path.
In the Docker container, your data file is mounted on:
/notebooks/data
However, the actual file is in:
PATH_TO_YOUR_WORKSPACE/vizbi-2015-cytoscape-tutorial/notebooks/data
Although you can see the data directory at /notebooks/data inside the container, Cytoscape runs on your physical machine, so you need to use the absolute path on that machine to access the actual data. You may think this is a bit annoying, but this is the power of container technology: you can use a completely isolated environment to run your workflow.
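One way to avoid typing long paths by hand is to build the file URL from the host-side directory. This is only a sketch under assumptions: HOST_DATA_DIR is a hypothetical location that you must replace with your own workspace, the host is assumed to use POSIX-style paths, and the result uses the three-slash file:/// form rather than the four-slash form used in the cell above.

```python
from pathlib import PurePosixPath
from urllib.parse import quote

def host_file_url(host_data_dir, file_name):
    """Build a file URL that points at a file on the host machine."""
    path = PurePosixPath(host_data_dir) / file_name
    if not path.is_absolute():
        raise ValueError('host_data_dir must be an absolute path')
    # quote() escapes spaces and other special characters but keeps the slashes
    return 'file://' + quote(str(path))

# Hypothetical host directory; replace it with the real location on your machine
HOST_DATA_DIR = '/home/me/workspace/vizbi-2015-cytoscape-tutorial/notebooks/data'

yeast_url = host_file_url(HOST_DATA_DIR, 'yeast.json')
print(yeast_url)
```

Because the path is built from the host directory, the same helper works whether the notebook itself runs in the container or not.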
There are many public data sources and web services for biology. If the service supports Cytoscape-readable file formats, you can directly specify the query URL as the network resource location. For example, the following URL calls the KEGG REST API and loads the TCA Cycle pathway diagram for human.
Hand-drawn pathway diagram in KEGG:
You can just click the link above and press Install to directly install the app from the web.
If the data format is supported in Cytoscape, you can import it programmatically by passing the resource location (URL) to Cytoscape:
In [9]:
# Resource location as URL
tca_cycle_human = 'http://rest.kegg.jp/get/hsa00020/kgml'
# Pass it to Cytoscape
pp(create_from_list([tca_cycle_human], 'KEGG Metabolic Pathways'))
and now your Cytoscape window should look like this:
OK, this is not so interesting because it can be done manually from the GUI if we want. But what happens if you need to check hundreds of resources and filter the results? You can easily handle such problems if you know how to write your workflow as a notebook (code).
In this example, we will do the following:
In [10]:
# Find pathways involving cancer
res = requests.get('http://togows.org/search/kegg-pathway/cancer/1,50.json')
pp(res.json())
This raw result needs some work to make it usable in other services. In the following cell, Python creates URLs from the list of pathway ID/pathway name pairs:
In [11]:
# Convert to URLs. This can be done with for loop, but for simplicity, we use map function.
# Extract ID portion of entries
path_ids = list(map(lambda x: x.split('\t')[0], res.json()))
# Make it consumable by KEGG API (Convert to list of URLs)
path_url_human = list(map(lambda x: 'http://rest.kegg.jp/get/' + x.replace('path:map', 'hsa') + '/kgml', path_ids))
pp(path_url_human)
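The same conversion can be checked offline on a single entry. The sketch below assumes each entry is a tab-separated "ID, name" string, as the split call above implies. The helper name kegg_kgml_url and the sample entry are introduced here for illustration only.

```python
def kegg_kgml_url(entry, organism='hsa'):
    """Convert one 'path:mapNNNNN<TAB>name' entry into a KEGG KGML URL."""
    # Keep only the ID portion before the tab
    pathway_id = entry.split('\t')[0]
    # Swap the generic 'path:map' prefix for an organism code (hsa = human)
    return 'http://rest.kegg.jp/get/' + pathway_id.replace('path:map', organism) + '/kgml'

# Sample entry, assumed to match the format returned by the search above
sample_entries = ['path:map05200\tPathways in cancer']

sample_urls = [kegg_kgml_url(entry) for entry in sample_entries]
print(sample_urls)
```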
In [12]:
# This may take a while...
pp(create_from_list(path_url_human[0:3], 'KEGG Metabolic Pathways'))
We used modern best practices to design cyREST API. All HTTP verbs are mapped to Cytoscape resources:
HTTP Verb | Description |
---|---|
GET | Retrieving resources (in most cases, it is Cytoscape data objects, such as networks or tables) |
POST | Creating resources |
PUT | Changing/replacing resources or collections |
DELETE | Deleting resources |
This design style is called Resource Oriented Architecture (ROA).
The basic idea is very simple: map all operations to existing HTTP verbs. It is easy to understand once you try actual examples.
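To see the verb-to-resource mapping without touching a running server, you can build request objects and inspect them before sending anything. This sketch uses urllib.request from the standard library instead of Requests. BASE_URL, the SUID 52, and the helper build_request are placeholders chosen for this example.

```python
import json
from urllib.request import Request

# Placeholder base URL; nothing is sent over the network in this example
BASE_URL = 'http://localhost:1234/v1/'

def build_request(verb, resource, body=None):
    """Build (but do not send) an HTTP request for a cyREST-style resource."""
    data = None
    headers = {}
    if body is not None:
        # Request bodies are sent as JSON
        data = json.dumps(body).encode('utf-8')
        headers['Content-Type'] = 'application/json'
    return Request(BASE_URL + resource, data=data, headers=headers, method=verb)

# 52 is a hypothetical network SUID
prepared = [
    build_request('GET', 'networks'),                          # retrieve
    build_request('POST', 'networks/52/nodes', ['new node']),  # create
    build_request('DELETE', 'networks/52'),                    # delete
]

for req in prepared:
    print(req.get_method(), req.full_url)
```

The resource URL stays the same across verbs. Only the verb decides whether you read, create, or delete.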
In [13]:
# Get a list of network IDs
get_all_networks_url = BASE + 'networks'
print(get_all_networks_url)
res = requests.get(get_all_networks_url)
pp(res.json())
In [14]:
# Pick the first network from the list above:
network_suid = res.json()[0]
get_network_url = BASE + 'networks/' + str(network_suid)
print(get_network_url)
# Get number of nodes in the network
get_nodes_count_url = BASE + 'networks/' + str(network_suid) + '/nodes/count'
print(get_nodes_count_url)
# Get all nodes
get_nodes_url = BASE + 'networks/' + str(network_suid) + '/nodes'
print(get_nodes_url)
# Get Node data table as CSV
get_node_table_url = BASE + 'networks/' + str(network_suid) + '/tables/defaultnode.csv'
print(get_node_table_url)
If a system's RESTful API is well-designed based on ROA best practices, it should be easy to guess similar functions as URLs.
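Since the URLs follow a regular pattern, a tiny helper can assemble them from path segments. This is a sketch, not part of cyREST: network_url is a hypothetical helper, 52 is a made-up SUID, and only the resource paths already shown above are used.

```python
def network_url(base, suid, *parts):
    """Join a base URL, a network SUID, and further path segments."""
    segments = ['networks', str(suid)] + [str(part) for part in parts]
    return base + '/'.join(segments)

sample_base = 'http://localhost:1234/v1/'

node_count_url = network_url(sample_base, 52, 'nodes', 'count')
node_table_url = network_url(sample_base, 52, 'tables', 'defaultnode.csv')

print(node_count_url)
print(node_table_url)
```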
Display clickable URLs for the following functions:
In [15]:
# Write your answers here...
In [16]:
# Add new nodes to an existing network (with time stamps)
import datetime
new_nodes =[
'Node created at ' + str(datetime.datetime.now()),
'Node created at ' + str(datetime.datetime.now())
]
res = requests.post(get_nodes_url, data=json.dumps(new_nodes), headers=HEADERS)
new_node_ids = res.json()
pp(new_node_ids)
In [17]:
# Delete all nodes
requests.delete(BASE + 'networks/' + str(sample_network_suids[0]) + '/nodes')
Out[17]:
In [18]:
# Delete a network
requests.delete(BASE + 'networks/' + str(sample_network_suids[0]))
Out[18]:
In [19]:
# Update a node name
new_values = [
{
'SUID': new_node_ids[0]['SUID'],
'value' : 'updated 1'
},
{
'SUID': new_node_ids[1]['SUID'],
'value' : 'updated 2'
}
]
requests.put(BASE + 'networks/' + str(network_suid) + '/tables/defaultnode/columns/name', data=json.dumps(new_values), headers=HEADERS)
Out[19]:
This is the most powerful feature of the Cytoscape REST API. You can easily convert Python objects into Cytoscape networks, tables, or Visual Styles.
The Cytoscape REST API sends and receives data as JSON. For networks, it uses Cytoscape.js style JSON (support for more file formats is coming!). You can programmatically generate networks by converting a Python dictionary into JSON.
Let's start with the simplest network JSON:
In [20]:
# Manually generate JSON as a Python dictionary
def create_network():
    network = {
        'data': {
            'name': 'I\'m empty!'
        },
        'elements': {
            'nodes':[],
            'edges':[]
        }
    }
    return network
# Define a simple utility function
def postNetwork(data):
    url_params = {
        'collection': 'My Network Collection'
    }
    res = requests.post(BASE + 'networks', params=url_params, data=json.dumps(data), headers=HEADERS)
    return res.json()['networkSUID']
# POST data to Cytoscape
empty_net_1 = create_network()
empty_net_1_suid = postNetwork(empty_net_1)
print('Empty network has SUID ' + str(empty_net_1_suid))
In [21]:
# Create sequence of letters (A-Z)
seq_letters = list(map(chr, range(ord('A'), ord('Z')+1)))
print(seq_letters)
In [22]:
# Option 1: Add nodes and edges with for loops
def add_nodes_edges(network):
    nodes = []
    edges = []
    for lt in seq_letters:
        node = {
            'data': {
                'id': lt
            }
        }
        nodes.append(node)
    for lt in seq_letters:
        edge = {
            'data': {
                'source': lt,
                'target': 'A'
            }
        }
        edges.append(edge)
    network['elements']['nodes'] = nodes
    network['elements']['edges'] = edges
    network['data']['name'] = 'A is the hub.'
# Option 2: Add nodes and edges in functional way
def add_nodes_edges_functional(network):
    network['elements']['nodes'] = list(map(lambda x: {'data': { 'id': x }}, seq_letters))
    network['elements']['edges'] = list(map(lambda x: {'data': { 'source': x, 'target': 'A' }}, seq_letters))
    network['data']['name'] = 'A is the hub (Functional Way)'
# Uncomment this if you want to see the actual JSON object
# print(json.dumps(empty_net_1, indent=4))
net1 = create_network()
net2 = create_network()
add_nodes_edges_functional(net1)
add_nodes_edges(net2)
networks = [net1, net2]
def visualize(net):
    suid = postNetwork(net)
    net['data']['SUID'] = suid
    # Apply layout and Visual Style
    requests.get(BASE + 'apply/layouts/force-directed/' + str(suid))
    requests.get(BASE + 'apply/styles/Directed/' + str(suid))
for net in networks:
    visualize(net)
In [23]:
from IPython.display import Image
Image(url=BASE+'networks/' + str(net1['data']['SUID'])+ '/views/first.png', embed=True)
Out[23]:
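In practice you often start from a list of edges rather than a list of nodes. The following sketch derives the node list from the edges and builds the same Cytoscape.js style dictionary used above. The helper network_from_edges and the triangle data are illustrative only; posting the result to Cytoscape works as in the earlier cells.

```python
import json

def network_from_edges(name, edges):
    """Build a Cytoscape.js style network dict from (source, target) pairs."""
    # Every node that appears in any edge, without duplicates, in sorted order
    node_ids = sorted({node for edge in edges for node in edge})
    return {
        'data': {'name': name},
        'elements': {
            'nodes': [{'data': {'id': node_id}} for node_id in node_ids],
            'edges': [{'data': {'source': s, 'target': t}} for s, t in edges],
        },
    }

triangle = network_from_edges('Triangle', [('a', 'b'), ('b', 'c'), ('c', 'a')])
print(json.dumps(triangle, indent=4))
```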
Essentially, writing your workflow as a notebook is programming. To control Cytoscape efficiently from notebooks, you need to understand the basic data model of Cytoscape. Let me explain it as a notebook...
First, let's create a data file to visualize the Cytoscape data model.
In [24]:
%%writefile data/model.sif
Model parent_of ViewModel_1
Model parent_of ViewModel_2
Model parent_of ViewModel_3
ViewModel_1 parent_of Presentation_A
ViewModel_1 parent_of Presentation_B
ViewModel_2 parent_of Presentation_C
ViewModel_3 parent_of Presentation_D
ViewModel_3 parent_of Presentation_E
ViewModel_3 parent_of Presentation_F
In [25]:
model = [
'file:////Users/kono/prog/git/sdcsb-advanced-tutorial/tutorials/data/model.sif'
]
# Create!
res = create_from_list(model)
model_suid = res[0]['networkSUID'][0]
requests.get(BASE + 'apply/layouts/force-directed/' + str(model_suid))
Image(url=BASE+'networks/' + str(model_suid)+ '/views/first.png', embed=True)
Out[25]:
Essentially, Model in Cytoscape means networks and tables. Internally, Model can have multiple View Models.
State of the view.
This is why you need to use views instead of view in the API:
/v1/networks/SUID/views
However, Cytoscape 3.2.x has only one rendering engine for now, and end-users do not have access to this feature. Until Cytoscape Desktop supports multiple renderers, the best practice is to use just one view per model. To access the default view, there is a utility method first:
In [26]:
view_url = BASE + 'networks/' + str(model_suid) + '/views/first'
print('You can access (default) network view from this URL: ' + view_url)
A Presentation is the stateless, actual graphics you see in the window. A View Model can have multiple Presentations. For now, you can assume there is always one Presentation per View Model.
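The hierarchy can also be explored as plain Python data. The sketch below parses the same SIF lines written to data/model.sif above into a parent-to-children dictionary. The parser parse_sif is a simplification written for this example: it assumes whitespace-separated "source interaction target" lines and is not a full SIF reader.

```python
from collections import defaultdict

# Same content as data/model.sif above
sif_text = """Model parent_of ViewModel_1
Model parent_of ViewModel_2
Model parent_of ViewModel_3
ViewModel_1 parent_of Presentation_A
ViewModel_1 parent_of Presentation_B
ViewModel_2 parent_of Presentation_C
ViewModel_3 parent_of Presentation_D
ViewModel_3 parent_of Presentation_E
ViewModel_3 parent_of Presentation_F
"""

def parse_sif(text):
    """Map each source node to the list of its targets."""
    children = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        source, _interaction, target = line.split()
        children[source].append(target)
    return dict(children)

tree = parse_sif(sif_text)
for view_model in tree['Model']:
    print(view_model, '->', tree.get(view_model, []))
```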
The cyREST API is fairly low level, and you can access all levels of Cytoscape data structures. But if you want to use Cytoscape as a simple network visualization engine for IPython Notebook, here are some tips:
All Cytoscape objects (networks, nodes, edges, and tables) have a session-unique ID called SUID. When you create any new data objects in Cytoscape, it returns their SUIDs. You need to keep them in Python data objects (list, dict, map, etc.) to access them later.
Until Cytoscape Desktop fully supports the multiple view/presentation feature, keep it simple: one view per model.
Of course, there is an API to add / remove / update one data object per API call, but it is extremely inefficient!
An edgelist is a minimalistic data format for networks, and it is widely used in popular libraries including NetworkX and igraph. Preparing an edgelist in Python is straightforward. You just need to prepare a list of edges as a string like:
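The first and last tips combine naturally: keep the returned SUIDs in a dictionary, then build one batch payload instead of issuing one call per object. In this sketch the sample_created list is hypothetical. The SUID key and the SUID/value payload shape come from the cells above, while the name key in the response is an assumption.

```python
def index_by_name(created_nodes):
    """Map node names to the SUIDs returned when the nodes were created."""
    return {node['name']: node['SUID'] for node in created_nodes}

# Hypothetical response from a node-creation call
sample_created = [
    {'name': 'node A', 'SUID': 101},
    {'name': 'node B', 'SUID': 102},
]
suid_of = index_by_name(sample_created)

# New values keyed by node name
new_names = {'node A': 'updated 1', 'node B': 'updated 2'}

# One payload for a single PUT, instead of one request per node
batch_payload = [
    {'SUID': suid_of[name], 'value': value}
    for name, value in new_names.items()
]
print(batch_payload)
```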
a b
b c
a c
c d
d f
b f
f g
f h
In Python, there are many ways to generate a string like this. Here is a naive approach:
In [29]:
data_str = ''
n = 0
while n < 100:
    data_str = data_str + str(n) + '\t' + str(n+1) + '\n'
    n = n + 1
# Join the first and last nodes
data_str = data_str + '100\t0\n'
# print(data_str)
# You can create multiple networks by running simple for loop:
for i in range(3):
    res = requests.post(BASE + 'networks?format=edgelist&collection=Rings!', data=data_str, headers=HEADERS)
    circle_suid = res.json()['networkSUID']
    requests.get(BASE + 'apply/layouts/circular/' + str(circle_suid))
Image(url=BASE+'networks/' + str(circle_suid) + '/views/first.png', embed=True)
Out[29]:
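The while loop above can be written more compactly with a list comprehension and str.join. This sketch produces the same tab-separated ring for 101 nodes (0 to 100). The function name ring_edgelist is introduced here for illustration.

```python
def ring_edgelist(size):
    """Return a tab-separated edgelist for a ring of `size` nodes."""
    # The modulo wraps the last node back to node 0, which closes the ring
    lines = ['{}\t{}'.format(i, (i + 1) % size) for i in range(size)]
    return '\n'.join(lines) + '\n'

ring = ring_edgelist(101)
print(ring[:20])
```

Building the lines first and joining them once avoids repeated string concatenation, which matters for larger networks.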
An edge list is a human-editable text file that represents a graph structure. Using the sample data above (edge list example in 3.3.2), create a new network in Cytoscape from the edge list and visualize it just like the ring network above.
Hint: Use Magic!
In [28]:
# Write your code here...
In this section, we've learned how to generate networks programmatically in Python. But for real world problems, it is not a good idea to use low level Python code to generate networks because there are lots of cool graph libraries such as NetworkX or igraph which provide high level graph APIs. In the next session, let's use them to analyze real network data sets.