VIZBI Tutorial Session

Part 2: Cytoscape, IPython, Docker, and reproducible network data visualization workflows

Tuesday, 3/24/2015

Lesson 1: Introduction to cyREST

by Keiichiro Ono


Welcome!

This is an introduction to cyREST and its basic API. In this section, you will learn how to access Cytoscape through its RESTful API.

Prerequisites

  • Basic knowledge of RESTful APIs
  • Basic Python skills - only the basics, such as conditional statements, loops, and basic data types.
  • Basic knowledge of Cytoscape
    • Cytoscape data types - Networks, Tables, and Styles.

System Requirements

This tutorial is tested on the following platform:

Client machine running Cytoscape

Server Running IPython Notebook


1. Import Python Libraries and Basic Setup

Libraries

In this tutorial, we will use several popular Python libraries to make this workflow more realistic.

Do I need to install all of them?

No. This notebook server runs in a Docker container, so the libraries are already installed there.

HTTP Client

Since you access Cytoscape via its RESTful API, the HTTP client library is the most important tool to understand. In this example, we use the Requests library to simplify the API call code.

JSON Encoding and Decoding

Data is exchanged as JSON between Cytoscape and Python code. Python has built-in support for JSON, and we will use it in this workflow.

Basic Setup for the API

At this point, there is only one option for the cy-rest module: the port number.

Change Port Number

By default, the cy-rest module uses port 1234. To change this, set a global Cytoscape property: open Edit → Preferences → Properties... and add a new property named rest.port.

What is happening on your machine?

Mac / Windows

Linux

The Docker runtime itself is available only on Linux. If you use the Mac or Windows version of Docker, it runs inside a Linux virtual machine (called boot2docker).

URL to Access Cytoscape REST API

We assume you are running the Cytoscape desktop application and the IPython Notebook server in the Docker container we provide. To access the Cytoscape REST API, use the following URL:

http://IP_of_your_machine:PORT_NUMBER/v1/

where v1 is the current version number of the API. Once the final release is ready, we guarantee compatibility of your scripts as long as the major version number stays the same.
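
As a minimal sketch, the URL pattern above can be assembled with a small helper. The function name build_base_url and its parameters are illustrative only and are not part of cyREST; the IP address is the example value used later in this tutorial.

```python
def build_base_url(ip, port=1234, version='v1'):
    """Return the cyREST base URL for the given host, port, and API version."""
    return 'http://{}:{}/{}/'.format(ip, port, version)


# 1234 is the default cy-rest port; pass another value if you changed rest.port
base = build_base_url('192.168.100.172')
print(base)
```

Keeping the port and version as parameters means a script only needs to change in one place if the port property is changed.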

Check your machine's IP

  • For Linux and Mac:
    ifconfig
    
  • For Windows:
    ipconfig

Viewing JSON

All data exchanged between Cytoscape and other applications is in JSON. You can view the JSON data by using browser extensions:

If you prefer the command line, jq is the best choice.


In [1]:
# HTTP Client for Python
import requests

# Standard JSON library
import json

# Basic Setup
PORT_NUMBER = 1234 # This is the default port number of CyREST

# IP address of your PHYSICAL MACHINE (NOT VM)
IP = '192.168.100.172'

BASE = 'http://' + IP +  ':' + str(PORT_NUMBER) + '/v1/'

# Header for posting data to the server as JSON
HEADERS = {'Content-Type': 'application/json'}

# Clean-up
requests.delete(BASE + 'session')


Out[1]:
<Response [200]>

2. Test Cytoscape REST API

Check the status of server

First, send a simple request and check the server status.

Roundtrip between JSON and Python Object

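The response object returned by Requests holds the API's return value as JSON. Let's convert it into a Python object: res.json() parses the JSON string into simple Python objects (dictionaries, lists, strings, and numbers), and json.dumps() converts them back into a JSON string.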


In [2]:
# Get server status
res = requests.get(BASE)
status_object = res.json()
print(json.dumps(status_object, indent=4))


{
    "apiVersion": "v1",
    "memoryStatus": {
        "freeMemory": 3294,
        "maxMemory": 13653,
        "usedMemory": 3021,
        "totalMemory": 6316
    },
    "numberOfCores": 8
}

In [3]:
print(status_object['apiVersion'])
print(status_object['memoryStatus']['usedMemory'])


v1
3021

If you are comfortable with this data type conversion, you are ready to go!
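
If you want to practice the conversion without a running Cytoscape instance, here is a small offline sketch. The JSON string is a shortened, hand-written copy of the status response shown above, not live data.

```python
import json

# A JSON string shaped like the status response shown above (shortened copy)
status_json = '{"apiVersion": "v1", "memoryStatus": {"usedMemory": 3021}, "numberOfCores": 8}'

# JSON string -> Python dict
status = json.loads(status_json)
api_version = status['apiVersion']
used_memory = status['memoryStatus']['usedMemory']

# Python dict -> JSON string -> Python dict again
round_tripped = json.loads(json.dumps(status))

print(api_version, used_memory, round_tripped == status)
```

JSON objects become dictionaries, JSON arrays become lists, and numbers and strings map to their Python counterparts, so the round trip gives back an equal object.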


3. Import Networks from various data sources

There are many ways to load networks into Cytoscape through the REST API:

  • Load from files
  • Load from web services
  • Send Cytoscape.js style JSON directly to Cytoscape
  • Send edgelist

3.1 Create networks from local files and URLs

Let's start with a simple file loading example. The POST method is used to create new Cytoscape objects. For example,

POST http://localhost:1234/v1/networks

means create new network(s) by the specified method. If you want to create networks from files on your machine or on remote servers, all you need to do is create a list of file locations and post it to Cytoscape.


In [4]:
# Small utility function to create networks from list of URLs
def create_from_list(network_list, collection_name='Yeast Collection'):
    payload = {'source': 'url', 'collection': collection_name}
    server_res = requests.post(BASE + 'networks', data=json.dumps(network_list), headers=HEADERS, params=payload)
    return server_res.json()


# List of data sources
network_files = [
    # This must be a path in the LOCAL file system!
    'file:////Users/kono/git/vizbi-2015/tutorials/data/yeast.json',
    # SIF file on a web server
    'http://chianti.ucsd.edu/cytoscape-data/galFiltered.sif'
    
    # And of course, you can add as many files as you need...
]

# Create!
print(json.dumps(create_from_list(network_files), indent=4))


[
    {
        "networkSUID": [
            69988
        ],
        "source": "http://chianti.ucsd.edu/cytoscape-data/galFiltered.sif"
    },
    {
        "networkSUID": [
            68880
        ],
        "source": "file:////Users/kono/git/vizbi-2015/tutorials/data/yeast.json"
    }
]

What is SUID?

A SUID is a unique identifier for every graph object in Cytoscape. You can access any object in the current session as long as you have its SUID.
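
Note that in the output above the entries come back in a different order from the request list. One way to avoid relying on the order is to index the SUIDs by their source. The following is a sketch only; the helper name index_by_source is hypothetical, and the sample data is copied from the output above.

```python
def index_by_source(create_response):
    """Map each source location to the list of network SUIDs created from it."""
    return {entry['source']: entry['networkSUID'] for entry in create_response}


# Response copied from the output above
sample_response = [
    {
        'networkSUID': [69988],
        'source': 'http://chianti.ucsd.edu/cytoscape-data/galFiltered.sif'
    },
    {
        'networkSUID': [68880],
        'source': 'file:////Users/kono/git/vizbi-2015/tutorials/data/yeast.json'
    }
]

suids = index_by_source(sample_response)
gal_suid = suids['http://chianti.ucsd.edu/cytoscape-data/galFiltered.sif'][0]
print(gal_suid)
```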

Where is my local data file?

This part is a bit tricky. When you specify a local file, you need to use its absolute path.

In the Docker container, your data directory is mounted at:

/notebooks/data

However, the actual files are in:

PATH_TO_YOUR_WORKSPACE/vizbi-2015-cytoscape-tutorial/notebooks/data

Although you can see the data directory at /notebooks/data inside the container, Cytoscape runs outside it, so you need to use the absolute path on your machine to access the data from Cytoscape. This may seem a bit annoying, but it is actually the strength of container technology: your workflow runs in a completely isolated environment.
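
The path translation can be sketched as a small helper. Everything here is an assumption for illustration: the function name to_host_file_url is hypothetical, the host directory is a placeholder you must replace with your own workspace path, and POSIX-style paths (Mac or Linux host) are assumed. The examples in this notebook write local files with a file://// prefix, so adjust the prefix if your setup needs it.

```python
import posixpath

# Mount point inside the container (from the text above)
CONTAINER_DATA_DIR = '/notebooks/data'


def to_host_file_url(container_path, host_data_dir):
    """Translate a path under /notebooks/data into a file URL on the host."""
    relative = posixpath.relpath(container_path, CONTAINER_DATA_DIR)
    if relative.startswith('..'):
        raise ValueError('path is not inside ' + CONTAINER_DATA_DIR)
    return 'file://' + posixpath.join(host_data_dir, relative)


# Placeholder: replace PATH_TO_YOUR_WORKSPACE with your actual workspace path
host_dir = '/PATH_TO_YOUR_WORKSPACE/vizbi-2015-cytoscape-tutorial/notebooks/data'
print(to_host_file_url('/notebooks/data/yeast.json', host_dir))
```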

3.2 Create networks from public RESTful web services

There are many public network data services. If a service supports Cytoscape-readable file formats, you can specify the query URL as a network location. For example, the following URL calls the KEGG REST API and loads the TCA cycle pathway diagram for human.


In [5]:
# Utility function to display JSON (Pretty-printer)
def pp(json_data):
    print(json.dumps(json_data, indent=4))

# You need KEGGScape App to load this file!
queries = [ 'http://rest.kegg.jp/get/hsa00020/kgml' ]
pp(create_from_list(queries, 'KEGG Metabolic Pathways'))


[
    {
        "networkSUID": [
            71153
        ],
        "source": "http://rest.kegg.jp/get/hsa00020/kgml"
    }
]

And of course, you can mix local files, URLs, and web service queries in the same list:


In [6]:
mixed = [
    'http://chianti.ucsd.edu/cytoscape-data/galFiltered.sif',
    'http://www.ebi.ac.uk/Tools/webservices/psicquic/intact/webservices/current/search/query/brca1?format=xml25'
]
result = create_from_list(mixed, 'Mixed Collection')
pp(result)


[
    {
        "networkSUID": [
            72863
        ],
        "source": "http://www.ebi.ac.uk/Tools/webservices/psicquic/intact/webservices/current/search/query/brca1?format=xml25"
    },
    {
        "networkSUID": [
            71449
        ],
        "source": "http://chianti.ucsd.edu/cytoscape-data/galFiltered.sif"
    }
]

In [7]:
mixed1 = result[0]['networkSUID'][0]
mixed1


Out[7]:
72863

Understand REST Principles

We used modern best practices to design the cyREST API. All HTTP verbs are mapped to operations on Cytoscape resources:

HTTP Verb   Description
GET         Retrieving resources (in most cases, Cytoscape data objects such as networks or tables)
POST        Creating resources
PUT         Changing/replacing resources or collections
DELETE      Deleting resources

This design style is called Resource Oriented Architecture (ROA).

Actually, the basic idea is very simple: map all operations to existing HTTP verbs. It is easy to understand once you try actual examples.
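
Because every resource is addressed by a path, the same URL can be combined with different verbs. Here is a minimal sketch of composing resource URLs from path segments; resource_url is a hypothetical helper name, and localhost is used only as an example host. The paths themselves are the ones used in this section.

```python
def resource_url(base, *parts):
    """Join path segments (names or SUIDs) onto the API base URL."""
    return base.rstrip('/') + '/' + '/'.join(str(part) for part in parts)


EXAMPLE_BASE = 'http://localhost:1234/v1/'

networks_url = resource_url(EXAMPLE_BASE, 'networks')
nodes_url = resource_url(EXAMPLE_BASE, 'networks', 71153, 'nodes')

# The same nodes URL is used to list (GET), add (POST), or remove (DELETE) nodes
for verb in ('GET', 'POST', 'DELETE'):
    print(verb, nodes_url)
```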

GET (Get a resource)


In [8]:
# Get a list of network IDs
get_all_networks_url = BASE + 'networks'
print(get_all_networks_url)
res = requests.get(get_all_networks_url)
pp(res.json())


http://192.168.100.172:1234/v1/networks
[
    71153,
    68880,
    69988,
    71449,
    72863
]

In [9]:
# Pick the first network from the list above:
network_suid = res.json()[0]
get_network_url = BASE + 'networks/' + str(network_suid)
print(get_network_url)

# Get number of nodes in the network
get_nodes_count_url = BASE + 'networks/' + str(network_suid) + '/nodes/count'
print(get_nodes_count_url)

# Get all nodes
get_nodes_url = BASE + 'networks/' + str(network_suid) + '/nodes'
print(get_nodes_url)

# Get Node data table as CSV
get_node_table_url = BASE + 'networks/' + str(network_suid) + '/tables/defaultnode.csv'
print(get_node_table_url)


http://192.168.100.172:1234/v1/networks/71153
http://192.168.100.172:1234/v1/networks/71153/nodes/count
http://192.168.100.172:1234/v1/networks/71153/nodes
http://192.168.100.172:1234/v1/networks/71153/tables/defaultnode.csv

Exercise 1: Guess URLs

If a system's RESTful API is well designed and follows ROA best practices, it should be easy to guess the URLs of similar functions.

Display clickable URLs for the following functions:

  1. Show number of networks in current session
  2. Show all edges in a network
  3. Show full information for a node (can be any node)
  4. Show information for all columns in the default node table
  5. Show all values in default node table under "name" column

In [10]:
# Write your answers here...

# 1
print(BASE + 'networks/count')

#2
print(BASE + 'networks/' + str(network_suid) + '/edges')

#3
# First, get available node SUID.  Let's use the URL in the last section:
res = requests.get(get_nodes_url)
first_node_id = res.json()[0]
print(BASE + 'networks/' + str(network_suid) + '/nodes/' + str(first_node_id))

#4
print(BASE + 'networks/' + str(network_suid) + '/tables/defaultnode/columns')

#5
print(BASE + 'networks/' + str(network_suid) + '/tables/defaultnode/columns/name')


http://192.168.100.172:1234/v1/networks/count
http://192.168.100.172:1234/v1/networks/71153/edges
http://192.168.100.172:1234/v1/networks/71153/nodes/71163
http://192.168.100.172:1234/v1/networks/71153/tables/defaultnode/columns
http://192.168.100.172:1234/v1/networks/71153/tables/defaultnode/columns/name

POST (Create a new resource)

To create new resources (objects), use the POST method. URLs follow ROA conventions, but you need to read the API documentation to understand the data format for each object.


In [11]:
# Add new nodes to an existing network (with time stamps)
import datetime

new_nodes =[
    'Node created at ' + str(datetime.datetime.now()),
    'Node created at ' + str(datetime.datetime.now())
]

res = requests.post(get_nodes_url, data=json.dumps(new_nodes), headers=HEADERS)
new_node_ids = res.json()
pp(new_node_ids)


[
    {
        "name": "Node created at 2015-03-24 06:00:11.019778",
        "SUID": 74417
    },
    {
        "name": "Node created at 2015-03-24 06:00:11.019833",
        "SUID": 74418
    }
]

DELETE (Delete a resource)


In [12]:
# Delete all nodes
requests.delete(BASE + 'networks/' + str(mixed1) + '/nodes')


Out[12]:
<Response [200]>

In [13]:
# Delete a network
requests.delete(BASE + 'networks/' + str(mixed1))


Out[13]:
<Response [200]>

PUT (Update a resource)

The PUT method is used to update information for existing resources. Just like the POST method, you need to know the data format to be sent.
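
For the column update in the next cell, the body is a list of SUID/value pairs. As a sketch, such a list can be generated from a dictionary; the helper name build_column_update is hypothetical, and the SUIDs are the example values returned by the POST call above.

```python
import json


def build_column_update(values_by_suid):
    """Convert {SUID: new_value} into a list of {'SUID': ..., 'value': ...} dicts."""
    return [
        {'SUID': suid, 'value': value}
        for suid, value in sorted(values_by_suid.items())
    ]


payload = build_column_update({74417: 'updated 1', 74418: 'updated 2'})
print(json.dumps(payload, indent=4))
```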


In [14]:
# Update a node name
new_values = [
    {
        'SUID': new_node_ids[0]['SUID'],
        'value' : 'updated 1'
    },
    {
        'SUID': new_node_ids[1]['SUID'],
        'value' : 'updated 2'
    }
]
requests.put(BASE + 'networks/' + str(network_suid) + '/tables/defaultnode/columns/name', data=json.dumps(new_values), headers=HEADERS)


Out[14]:
<Response [200]>

3.3 Create networks from Python objects

This is the most powerful feature of the Cytoscape REST API. You can easily convert Python objects into Cytoscape networks, tables, or Visual Styles.

How does this work?

The Cytoscape REST API sends and receives data as JSON. For networks, it uses Cytoscape.js-style JSON (support for more file formats is coming!). You can programmatically generate networks by converting Python dictionaries into JSON.
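
As a rough sketch of that conversion, the helper below builds a network dictionary from a list of (source, target) pairs. The name edges_to_cyjs is hypothetical; the layout (a top-level data dictionary plus elements with nodes and edges) is the same one used by the cells in section 3.3.1.

```python
import json


def edges_to_cyjs(edges, name):
    """Build a Cytoscape.js-style dictionary from (source, target) pairs."""
    node_ids = sorted({node for edge in edges for node in edge})
    return {
        'data': {'name': name},
        'elements': {
            'nodes': [{'data': {'id': node_id}} for node_id in node_ids],
            'edges': [{'data': {'source': s, 'target': t}} for s, t in edges]
        }
    }


network = edges_to_cyjs([('a', 'b'), ('b', 'c'), ('a', 'c')], 'Triangle')

# This JSON string is what would be sent as the request body
network_json = json.dumps(network)
print(len(network['elements']['nodes']), len(network['elements']['edges']))
```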

3.3.1 Prepare Network as Cytoscape.js JSON

Let's start with the simplest network JSON:


In [15]:
# Start from a clean slate: remove all networks from current session
# requests.delete(BASE + 'networks')

In [16]:
# Manually generates JSON as a Python dictionary
def create_network():
    network = { 
            'data': {
                'name': 'I\'m empty!'
            },
            'elements': {
                'nodes':[],
                'edges':[]
            }
    }
    return network


# Define a simple utility function
def postNetwork(data):
    url_params = {
        'collection': 'My Network Collection'
    }
    res = requests.post(BASE + 'networks', params=url_params, data=json.dumps(data), headers=HEADERS)
    return res.json()['networkSUID']


# POST data to Cytoscape
empty_net_1 = create_network()
empty_net_1_suid = postNetwork(empty_net_1)
print('Empty network has SUID ' + str(empty_net_1_suid))


Empty network has SUID 74437

Modify network data programmatically

Since it's a simple Python dictionary, it is easy to add data to the network. Let's add some nodes and edges:


In [17]:
# Create sequence of letters (A-Z)
seq_letters = list(map(chr, range(ord('A'), ord('Z')+1)))
print(seq_letters)


['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']

In [18]:
# Option 1: Add nodes and edges with for loops
def add_nodes_edges(network):
    nodes = []
    edges = []
    
    for lt in seq_letters:
        node = {
            'data': {
                'id': lt
            }
        }
        nodes.append(node)
    for lt in seq_letters:
        edge = {
            'data': { 
                'source': lt, 
                'target': 'A' 
            }
        }
        edges.append(edge)
    network['elements']['nodes'] = nodes
    network['elements']['edges'] = edges
    network['data']['name'] = 'A is the hub.'

# Option 2: Add nodes and edges in functional way
def add_nodes_edges_functional(network):
    network['elements']['nodes'] =  list(map(lambda x: {'data': { 'id': x }}, seq_letters))
    network['elements']['edges'] =  list(map(lambda x: {'data': { 'source': x, 'target': 'A' }}, seq_letters))
    network['data']['name'] = 'A is the hub (Functional Way)'

# Uncomment this if you want to see the actual JSON object
# print(json.dumps(empty_net_1, indent=4))

net1 = create_network()
net2 = create_network()

add_nodes_edges_functional(net1)
add_nodes_edges(net2)

networks = [net1, net2]

def visualize(net):
    suid = postNetwork(net)
    net['data']['SUID'] = suid
    # Apply layout and Visual Style
    requests.get(BASE + 'apply/layouts/force-directed/' + str(suid))
    requests.get(BASE + 'apply/styles/Directed/' + str(suid))

for net in networks:
    visualize(net)

Now, your Cytoscape window should look like this:

Embed images in IPython Notebook

cyREST has a function to generate a PNG image directly from the current network view. Let's try to see the result in this notebook.


In [19]:
from IPython.display import Image

Image(url=BASE+'networks/' + str(net1['data']['SUID'])+ '/views/first.png', embed=True)


Out[19]:

Introduction to Cytoscape Data Model

Essentially, writing your workflow as a notebook is programming. To control Cytoscape efficiently from notebooks, you need to understand the basic data model of Cytoscape. Let me explain it as a notebook...

First, let's create a data file to visualize the Cytoscape data model.


In [20]:
%%writefile ../data/model.sif
Model parent_of ViewModel_1
Model parent_of ViewModel_2
Model parent_of ViewModel_3
ViewModel_1 parent_of Presentation_A
ViewModel_1 parent_of Presentation_B
ViewModel_2 parent_of Presentation_C
ViewModel_3 parent_of Presentation_D
ViewModel_3 parent_of Presentation_E
ViewModel_3 parent_of Presentation_F


Overwriting ../data/model.sif

In [21]:
model = [
    'file:////Users/kono/git/vizbi-2015/tutorials/data/model.sif'
]

# Create!
res = create_from_list(model)
model_suid = res[0]['networkSUID'][0]

requests.get(BASE + 'apply/layouts/force-directed/' + str(model_suid))
Image(url=BASE+'networks/' + str(model_suid)+ '/views/first.png', embed=True)


Out[21]:

Model, View Model, and Presentation

Model

Essentially, a Model in Cytoscape means networks and tables. Internally, a Model can have multiple View Models.

View Model

A View Model holds the state of a view.

Because a Model can have multiple View Models, the API uses views (plural) instead of view:

/v1/networks/SUID/views

However, Cytoscape 3.2.x has only one rendering engine for now, and end users do not have access to this feature. Until Cytoscape Desktop supports multiple renderers, the best practice is to use just one view per model. To access the default view, there is a utility method, first:


In [22]:
view_url = BASE + 'networks/' + str(model_suid) + '/views/first'
print('You can access (default) network view from this URL: ' + view_url)


You can access (default) network view from this URL: http://192.168.100.172:1234/v1/networks/74681/views/first

Presentation

A Presentation is the stateless, actual graphics you see in the window. A View Model can have multiple Presentations. For now, you can assume there is always one Presentation per View Model.
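
The hierarchy described in this section is the one encoded in the model.sif file written earlier. As a sketch, the same text can be parsed in plain Python to list each parent's children. The parser below is a simplification that only handles lines of the form "source interaction target"; it is not a full SIF reader.

```python
from collections import defaultdict

# Same content as the model.sif file written earlier in this section
SIF_TEXT = """\
Model parent_of ViewModel_1
Model parent_of ViewModel_2
Model parent_of ViewModel_3
ViewModel_1 parent_of Presentation_A
ViewModel_1 parent_of Presentation_B
ViewModel_2 parent_of Presentation_C
ViewModel_3 parent_of Presentation_D
ViewModel_3 parent_of Presentation_E
ViewModel_3 parent_of Presentation_F
"""


def parse_sif(text):
    """Return {source: [targets]} for three-field SIF lines; other lines are skipped."""
    children = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            source, _interaction, target = fields
            children[source].append(target)
    return dict(children)


hierarchy = parse_sif(SIF_TEXT)
for view_model in hierarchy['Model']:
    print(view_model, '->', hierarchy[view_model])
```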


What do you need to know as a cyREST user?

The cyREST API is fairly low level, and you can access all levels of Cytoscape data structures. But if you want to use Cytoscape as a simple network visualization engine for IPython Notebook, here are some tips:

Tip 1: Always keep SUID when you create any new object

All Cytoscape objects (networks, nodes, edges, and tables) have a session-unique ID called the SUID. When you create new data objects in Cytoscape, it returns their SUIDs. Keep them in Python data objects (list, dict, map, etc.) so you can access the objects later.
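
For example, the list returned when nodes were created earlier can be turned into a name-to-SUID dictionary. This is only a sketch; the helper name suids_by_name is hypothetical, and the sample data is copied from the POST example output.

```python
def suids_by_name(created_nodes):
    """Map node names to SUIDs from a list of {'name': ..., 'SUID': ...} dicts."""
    return {node['name']: node['SUID'] for node in created_nodes}


# Copied from the output of the POST example earlier in this lesson
created = [
    {'name': 'Node created at 2015-03-24 06:00:11.019778', 'SUID': 74417},
    {'name': 'Node created at 2015-03-24 06:00:11.019833', 'SUID': 74418}
]

node_suids = suids_by_name(created)
print(node_suids['Node created at 2015-03-24 06:00:11.019778'])
```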

Tip 2: Create one view per model

Until Cytoscape Desktop fully supports the multiple view/presentation feature, keep it simple: one view per model.

Tip 3: Minimize number of API calls

Of course, there is an API to add, remove, or update one data object per call, but using it that way is extremely inefficient! Send many objects in a single call whenever possible.
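
To illustrate the difference, here is a sketch that groups node names into a few request bodies instead of one body per node. The helper name batched_payloads and the batch size of 100 are arbitrary choices for illustration, not cyREST requirements; each body is a JSON list of names, the same shape used in the POST example earlier.

```python
import json


def batched_payloads(node_names, batch_size):
    """Yield JSON bodies, each holding up to batch_size node names."""
    for start in range(0, len(node_names), batch_size):
        yield json.dumps(node_names[start:start + batch_size])


names = ['node-{}'.format(i) for i in range(250)]

calls_one_by_one = len(names)                     # one request per node
bodies = list(batched_payloads(names, 100))       # one request per batch
print(calls_one_by_one, len(bodies))
```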

3.3.2 Prepare Network as edgelist

An edge list is a minimalistic data format for networks, and it is widely used in popular libraries including NetworkX and igraph. Preparing an edge list in Python is straightforward: you just need a list of edges as a string like:

a b
b c
a c
c d
d f
b f
f g
f h

In Python, there are many ways to generate a string like this. Here is a naive approach:


In [23]:
data_str = ''
n = 0
while n <100:
    data_str = data_str + str(n) + '\t' + str(n+1) + '\n'
    n = n + 1

# Join the first and last nodes
data_str = data_str + '100\t0\n'

# print(data_str)

res = requests.post(BASE + 'networks?format=edgelist&collection=Ring', data=data_str, headers=HEADERS)
circle_suid = res.json()['networkSUID']
requests.get(BASE + 'apply/layouts/circular/' + str(circle_suid))

Image(url=BASE+'networks/' + str(circle_suid) + '/views/first.png', embed=True)


Out[23]:
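
As an alternative to the while loop above, the same ring edge list can be built with a generator expression and str.join. This is a sketch; the function name ring_edgelist is hypothetical. With 101 nodes (0 to 100) it produces the same tab-separated text as the naive version.

```python
def ring_edgelist(num_nodes):
    """Return a tab-separated edge list joining nodes 0..num_nodes-1 in a ring."""
    lines = (
        '{}\t{}'.format(i, (i + 1) % num_nodes)
        for i in range(num_nodes)
    )
    return '\n'.join(lines) + '\n'


ring_str = ring_edgelist(101)
print(ring_str[:12])
```

The modulo operator closes the ring, so the last node connects back to node 0 without a separate statement.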

Exercise 2: Create a network from a simple edge list file

An edge list is a human-editable text file that represents a graph structure. Using the sample data above (the edge list example in 3.3.2), create a new network in Cytoscape from the edge list and visualize it just like the ring network above.

Hint: Use Magic!


In [24]:
%%writefile ../data/small1.txt
a b
b c
a c
c d
d f
b f
f g
f h


Overwriting ../data/small1.txt

In [25]:
# Write your code here...
with open('../data/small1.txt', 'r') as f:
    data = f.read()

res = requests.post(BASE + 'networks?format=edgelist&collection=Edge List Sample', data=data, headers=HEADERS)
small_network_suid = res.json()['networkSUID']

requests.get(BASE + 'apply/layouts/force-directed/' + str(small_network_suid))
requests.get(BASE + 'apply/styles/Directed/' + str(small_network_suid))

Image(url=BASE+'networks/' + str(small_network_suid) + '/views/first.png', embed=True)


Out[25]:

Discussion

In this section, we've learned how to generate networks programmatically in Python. But for real-world problems, it is not a good idea to use low-level Python code to generate networks, because there are lots of cool graph libraries such as NetworkX and igraph that provide high-level graph APIs. In the next session, let's use them to analyze real network data sets.

Continue to Lesson 2: Working with Graph Libraries