Final - DATA 661

IPySigma Lives: Presenting an Extensible Prototype Network Visualization Frontend for Jupyter Notebooks

John DeBlase, Daina Bouquin

Introduction

The potential uses of network analytics and visualizations are extensive, with applications ranging from social network analysis to environmental science to better understanding how political revolutions spread. However, many of the tools most commonly used for these types of analysis, particularly Python modules like NetworkX, are not designed to produce aesthetically pleasing, interactive visualizations that support the development of theories and inferences. Much of the time, data scientists using tools like Jupyter are left trying to work with network visualizations that look like hairballs or spending a great deal of time trying to use unfamiliar tools like JavaScript or Gephi to produce useful graphs. These people do not currently have access to simple interfaces that integrate with tools like Jupyter notebooks (formerly IPython) that the rest of their workflows rely upon.

Throughout this semester, we thought through this problem and began prototyping a flexible architecture to help people create SigmaJS networks from NetworkX objects without abandoning their Jupyter Notebooks. The result is an extensible proof-of-concept for a side-by-side Jupyter network visualization GUI that will allow users to quickly create clear visualizations that can help drive research processes.

The below sections aim to justify the use of Jupyter as a platform, the focus of the application, the components selected to create the application, and to outline steps that can be taken moving forward to improve and expand on the current infrastructure.

Why Jupyter?

Jupyter is becoming increasingly important to the data community for sharing and reproducibility, therefore tools that integrate with this environment are highly valuable. Reproducibility is also foundationally important to computational science endeavors more broadly, both in academia and in industry. Why Focus on Network Visualization? Visualization is integral to the data scientist’s ability to use network analytics to effectively derive theories and inferences. Research in social network analysis has shown that dynamic and interactive graph visualizations foster “theoretical insight” thus creating a real need for “dynamic network visualizations” [1]. Moreover, many scientific domains are “now convinced that network visualization is essential to improve their work since it allows them to see complex structures that statistics and modeling alone cannot reveal” [2].

Why SigmaJS?

SigmaJS is a JavaScript library dedicated to graph drawing. Unlike libraries like D3, Sigma is optimized for our usecase.

Evaluation

Over the 15 weeks of the CUNY SPS spring semester, an iterative, stepped approach was taken with the goal being to build a functional prototype to with the above described functionality. We were able to achive this goal, and also able to achieve our stretch goal by being accepted as speakers at JupyterCon 2017, the Jupyter Project’s first international conference.

Running the IPySigma prototype

Manual Install:

git clone this repo and install both the python and node components.

Python

The prototype python package is contained in the "ipysig" folder.

From the root directory: Build and activate a clean python environment>=2.7.10 with requirements.txt using virtualenv.
pip install -r requirements.txt to get the required packages.

Node.js

The node-express application is contained in the app folder.

Make sure your node version is >= v6.9.4 and that both npm and bower are installed globally.

From the root directory: cd ./app
type npm install to install the node modules locally in the app top-level folder
From app: cd ./browser
type bower install to install the bower_components folder (note: these steps might change in future versions with browserify)

Run the Demo

At the root directory launch a jupyter notebook server and run the notebook ipysig_test.ipynb and then follow the instructions for each cell.

Overview of components

Overview of IPySigma Application System

Jupyter Notebook Frontend/ Jupyter Notebook Server

To boot up the app using a running notebook server, first the IPySig controller class is imported and instantiated within the user’s notebook session. This is the core Python class of the application and is implemented as a Singleton. An IPySig object does a few key things on instantiation: If run for the first time, IPySig injects the API found in the ipysig.sigma_addon_methods into the nx.Graph class. This adds custom functionality to any existing or newly created graphs in the current session for exporting node and edge data in JSON format suitable for SigmaJS. Next, using a system call it fetches and stores the url and token (if any) of the running notebook server. Using the url and token, an express server process is spun up and stored in a class variable. This is run through several sets of protected method calls.

Once the express server is running, a “session” can be created by calling the connect() method passing in the graph object and a key name. The graph object reference gets stored in IPySig under this key name. The key name is also emitted via a socket.io client to a listener on the express server which stores the reference in node.js and sends a callback response that tells IPySig to spin up a webbrowser tab. When the browser tab gets opened an event is emitted to fetch the key name to the graph reference, thus completing the connection between the new browser tab and the running graph object in the IPython kernel.

A session has now been created and the user can use the frontend to display the graph object using the Load Graph button and giving the graph a Title. The title will serve as a means of saving a graph in BrowserDB via a LokiJS adaptor, allowing persistence of multiple graph objects in a single browser session. This functionality has not been implemented yet as of version 0.1.

Jupyterlab Services/Express/SigmaJS

Once a browser tab session is bound, pressing Load Graph and submitting a Title will emit to the ‘main-room’ socket.io listener on the express server. The jupyterlab services API connects to the running kernel and calls IPySig.export_graph_instance() with the correct graph_name as a promise.

Once the promise is fulfilled, the JSON graph data is emitted back up to the browser via the correct socket.id. The graph data is then rendered in the browser main_room listener via a call to make_graph which returns the sigma graph.

In future releases, each new graph will be saved to BrowserDB via a Loki.js adaptor. The user will be able to select from a list of past graphs rendered in that session for easy comparison. The option will also be given for the contents of the session to be exported to JSON.

The Future

There are a number of avenues that could be explored using the protocol and platform defined above. Over the next few months, Daina and John will define an applied usecase to be presented at JupyterCon along with a presentation on IPySigma's development and logic. Improvements will also be made to the code to address a number of issues currently documented on GitHub. Additionally, documentation will be automated using Sphinx to improve the seamlessness of updates and allow us to create more professional web presence. We have also begun creating a website that will act as a splash page for the tool.