matta - view and scaffold d3.js visualizations in IPython notebooks

basic examples

By @carnby.

This notebook showcases the basic matta visualizations, as well as their usage.

Note that the init_javascript call is not needed when running on a local server, provided the JavaScript code has already been added to your IPython profile.


In [1]:
import matta
matta.init_javascript(path='https://rawgit.com/carnby/matta/master/matta/libs/')


Out[1]:
matta Javascript code added.

Wordclouds

Wordclouds are implemented using the d3.layout.cloud layout by Jason Davies. They work with bags of words, for which Python's collections.Counter class is a good fit.
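As a minimal sketch of what a bag of words looks like, the snippet below counts the tokens of a short made-up sentence. The sample text and variable names are illustrative only and are not part of matta. `Counter.most_common` returns (word, count) pairs ordered by frequency, which is the shape passed as `items` to `wordcloud` in the cells that follow.

```python
import re
from collections import Counter

# Illustrative sample text; any string works here.
sample = "To be, or not to be, that is the question."

# \w+ keeps runs of word characters, so punctuation never yields empty tokens.
tokens = re.findall(r'\w+', sample.lower())
bag = Counter(tokens)

# most_common returns a list of (word, count) pairs, most frequent first.
top = bag.most_common(2)
print(top)
```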


In [2]:
import requests
hamlet = requests.get('http://www.gutenberg.org/cache/epub/2265/pg2265.txt').text
hamlet[0:100]


Out[2]:
u"\ufeff***The Project Gutenberg's Etext of Shakespeare's First Folio***\r\n*********************The Tragedie"

In [3]:
import re
from collections import Counter

# findall avoids the empty-string tokens that re.split leaves at the text boundaries
words = re.findall(r'\w+', hamlet.lower())
counts = Counter(words)

In [4]:
from matta import wordcloud

wordcloud(items=counts.most_common(n=1000), typeface='Helvetica', font_scale=0.33, rotation=-7)


Treemaps

Treemaps use the Treemap Layout from d3.js. They work with trees, which we construct through networkx.DiGraph.
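To make the input shape concrete, here is a small sketch of a flare-style nested dictionary in which only the leaves carry a `size`. The data and the helper `subtree_size` are hypothetical (neither comes from matta or networkx). The helper sums leaf sizes, which mirrors how a treemap allots area to an internal node from the values of its descendants.

```python
# Hypothetical miniature of a flare-style hierarchy: only leaves carry 'size'.
mini = {
    'name': 'root',
    'children': [
        {'name': 'a', 'size': 3},
        {'name': 'b', 'children': [
            {'name': 'b1', 'size': 5},
            {'name': 'b2', 'size': 2},
        ]},
    ],
}

def subtree_size(node):
    """Return the total 'size' of all leaves under node."""
    if 'children' in node:
        return sum(subtree_size(child) for child in node['children'])
    return node.get('size', 0)

print(subtree_size(mini))
```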


In [5]:
import requests
flare_data = requests.get('https://gist.githubusercontent.com/mbostock/4063582/raw/a05a94858375bd0ae023f6950a2b13fac5127637/flare.json').json()

In [6]:
flare_data['name']


Out[6]:
u'flare'

In [7]:
import networkx as nx

tree = nx.DiGraph()

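def add_node(node):
    # Number nodes consecutively, starting at 1 for the root.
    node_id = tree.number_of_nodes() + 1

    # Only leaves carry a 'size'; set all attributes when the node is created.
    attrs = {'name': node['name']}
    if 'size' in node:
        attrs['size'] = node['size']
    tree.add_node(node_id, **attrs)

    # Recurse into the children and link each one to its parent.
    for child in node.get('children', []):
        child_id = add_node(child)
        tree.add_edge(node_id, child_id)

    return node_id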

add_node(flare_data)


Out[7]:
1

In [8]:
nx.is_arborescence(tree)


Out[8]:
True

In [9]:
from matta import treemap

treemap(tree=tree, node_value='size', node_label='name', font_size=9, node_border=1, node_padding=0)


Sankey

Sankey (flow) diagrams use the Sankey plugin by Mike Bostock. Like treemaps, they work with directed graphs. Note that graphs with loops are not supported.
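Since graphs with loops are not supported, it can help to check the data before plotting. The sketch below assumes that "loops" means directed cycles (self-loops included) and uses a hypothetical helper, `has_cycle`, which is not part of matta. It applies Kahn's algorithm to a plain list of (source, target) pairs.

```python
from collections import defaultdict, deque

def has_cycle(edges):
    """Return True if the directed edge list contains a cycle (Kahn's algorithm)."""
    successors = defaultdict(list)
    in_degree = defaultdict(int)
    nodes = set()
    for source, target in edges:
        successors[source].append(target)
        in_degree[target] += 1
        nodes.update((source, target))

    # Start from nodes with no incoming edges and peel them off one by one.
    queue = deque(n for n in nodes if in_degree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for nxt in successors[node]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                queue.append(nxt)

    # Any node never peeled off sits on, or downstream of, a cycle.
    return visited != len(nodes)

print(has_cycle([(0, 1), (1, 2)]))
print(has_cycle([(0, 1), (1, 2), (2, 0)]))
```

With a networkx graph, `has_cycle(graph.edges())` would run the same check; for directed graphs networkx also provides `nx.is_directed_acyclic_graph`.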


In [10]:
sankey_data = requests.get('http://bost.ocks.org/mike/sankey/energy.json')

In [11]:
from networkx.readwrite import json_graph

# requests can decode the JSON body directly, as in the treemap example
sankey_graph = json_graph.node_link_graph(sankey_data.json())

In [12]:
# next(iter(...)) peeks at the first node and edge on both old and new networkx versions
next(iter(sankey_graph.nodes(data=True))), next(iter(sankey_graph.edges(data=True)))


Out[12]:
((0, {u'name': u"Agricultural 'waste'"}), (0, 1, {u'value': 124.729}))

In [13]:
from matta import sankey

sankey(graph=sankey_graph, background_color='#efefef', node_label='name', link_weight='value', node_color='indigo', node_width=8, node_padding=13,
       link_color='#aaa', link_opacity=0.75)


Parallel Coordinates

Parallel Coordinates are based on the code by Jason Davies. They work with pandas.DataFrame.


In [14]:
import pandas as pd
df = pd.read_csv('http://bl.ocks.org/jasondavies/raw/1341281/cars.csv', index_col='name')
df.head()


Out[14]:
                         economy (mpg)  cylinders  displacement (cc)  power (hp)  weight (lb)  0-60 mph (s)  year
name
AMC Ambassador Brougham           13.0          8                360         175         3821          11.0    73
AMC Ambassador DPL                15.0          8                390         190         3850           8.5    70
AMC Ambassador SST                17.0          8                304         150         3672          11.5    72
AMC Concord DL 6                  20.2          6                232          90         3265          18.2    79
AMC Concord DL                    18.1          6                258         120         3410          15.1    78

In [15]:
from matta import parallel_coordinates
parallel_coordinates(dataframe=df)


Graph

Graphs built with networkx are visualized using the Force Layout in d3.js. The example below uses the undirected graph returned by nx.davis_southern_women_graph().


In [16]:
graph = nx.davis_southern_women_graph()

In [17]:
# Color nodes by bipartite set and size them by degree.
for node_id, attrs in graph.nodes(data=True):
    attrs['color'] = 'purple' if attrs['bipartite'] else 'green'
    attrs['size'] = graph.degree(node_id)

In [18]:
from matta import force_directed
force_directed(graph=graph, link_distance=200, avoid_collisions=True, clamp_to_viewport=True, 
               background_color='#efefef', node_value='size', node_min_ratio=8, node_max_ratio=36)


