matta - view and scaffold d3.js visualizations in IPython notebooks

Let's ~~Make~~ Scaffold a Barchart

You have probably seen the barchart example by Mike Bostock. It is embedded below so you can see it.

In this notebook I explain how to use matta to implement this barchart.

Why use matta? Because having an example of a visualization is one thing, and having a reusable implementation is another. A reusable implementation is not just a function: it is an entire context in which you can easily use your visualization with other datasets.

How do we do it? In this notebook we see the basic scaffolding done by matta to reproduce the example chart and visualize a pandas DataFrame. By being able to use a DataFrame, we can forget about converting the dataset to the specific layout the visualization designer had in mind, and instead focus on converting it to a DataFrame (which will probably be very, very easy).

Let's begin.



In [1]:

    
from IPython.display import IFrame
IFrame('http://bl.ocks.org/mbostock/raw/3885304', 1000, 550)









    Out[1]:

Initial Setup

Here we load matta.

If you read the README, you will notice that you can install matta's javascript and css into your IPython profile, so that you do not need to issue an init_javascript call. The call is here just for demonstration. However, if you use a core matta visualization and export the notebook to NBViewer, you will need to execute it so that your visitor's browser can load the required js/css files.

If you installed matta into your profile, then using the function will do no harm - it detects that matta was loaded and does nothing.



In [2]:

    
import matta
matta.init_javascript(path='https://rawgit.com/carnby/matta/master/matta/libs/')









    Out[2]:






matta Javascript code added.

Data

Mike's example loads a TSV (Tab Separated Values) file with letter frequencies. We can load it directly into a pandas DataFrame.



In [3]:

    
import pandas as pd

df = pd.read_csv('http://bl.ocks.org/mbostock/raw/3885304/964f9100166627a89c7e6c23ce8128f5aefd5510/data.tsv', delimiter='\t')
df.head()
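
If the URL is unreachable, the same two-column layout can be tried offline. The sketch below parses a small hand-written sample with the standard library only; the three rows use values from the letter-frequency data, and `parse_tsv` is a hypothetical helper name, not part of matta or pandas.

```python
import csv
import io

# Three rows in the same two-column layout as the letter-frequency file.
SAMPLE_TSV = "letter\tfrequency\nA\t0.08167\nB\t0.01492\nC\t0.02782\n"


def parse_tsv(text):
    """Parse tab-separated text into a list of records (one dict per row)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    # The csv module reads everything as text, so convert frequency to float.
    return [{"letter": row["letter"], "frequency": float(row["frequency"])}
            for row in reader]


records = parse_tsv(SAMPLE_TSV)
print(records[0])
```

With pandas installed, `pd.read_csv(io.StringIO(SAMPLE_TSV), sep='\t')` should give the equivalent DataFrame for the same sample.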

Sketching the Visualization

First, let's sketch the visualization by defining its options and code.

The visualization options or arguments are contained in a dictionary. Note that the dictionary contains a subdictionary named variables. Those variables will be exposed as methods of the scaffolded visualization, and are available in code as _variable_name.

Note also the data dictionary. It indicates that the visualization receives a pandas DataFrame. This dataframe is available internally as the _data_dataframe variable.



In [4]:

    
# the options
barchart_args = {
    'requirements': ['d3'],
    'visualization_name': 'barchart',
    'visualization_js': './barchart.js',
    'figure_id': None,
    'container_type': 'svg',
    'data': {
        'dataframe': None,
    },
    'options': {
        'background_color': None,
        'x_axis': True,
        'y_axis': True,
    },
    'variables': {
        'width': 960,
        'height': 500,
        'padding': {'left': 30, 'top': 20, 'right': 30, 'bottom': 30},
        'x': 'x',
        'y': 'y',
        'y_axis_ticks': 10,
        'color': 'steelblue',
        'y_label': None,
        'rotate_label': True,
    },
}
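
To make the naming convention concrete, here is a small sketch that lists the names a template could refer to, given an arguments dictionary. It only follows the convention described above (`_name` for variables, `_data_name` for data); `template_names` is a hypothetical helper and says nothing about how matta builds these names internally.

```python
def template_names(args):
    """Return the template-side names implied by a sketch arguments dict."""
    # Each entry of 'variables' is addressed as _<name> in the template.
    names = ["_" + key for key in args.get("variables", {})]
    # Each entry of 'data' is addressed as _data_<name>.
    names += ["_data_" + key for key in args.get("data", {})]
    return sorted(names)


example_args = {
    "data": {"dataframe": None},
    "variables": {"width": 960, "height": 500, "x": "x", "y": "y"},
}

print(template_names(example_args))
```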

This is the visualization code. Note that it is almost a copy-and-paste version of the original example. We just renamed the variables to _variable_name and used other auxiliary variables like _vis_width, which are exposed by matta.

Note that the code is not strictly javascript. Actually, the file is expected to be a jinja2 template.
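
To see what being a jinja2 template means in practice, here is a minimal sketch that renders a cut-down template twice, once with the x axis enabled and once without. It assumes jinja2 is installed. How matta itself calls jinja2 is not shown in this notebook, so `render_js` and the template text are only illustrations.

```python
from jinja2 import Template

# A cut-down template in the style of barchart.js: the axis code is kept
# only when options.x_axis is true at render time.
TEMPLATE_SOURCE = """var bar = container.selectAll(".bar");
{% if options.x_axis %}
var xAxis = d3.svg.axis().scale(x).orient("bottom");
{% endif %}
"""


def render_js(x_axis):
    """Render the template with a given value for options.x_axis."""
    return Template(TEMPLATE_SOURCE).render(options={"x_axis": x_axis})


with_axis = render_js(True)
without_axis = render_js(False)

# The axis code is present in the first rendering and absent in the second.
print("xAxis" in with_axis, "xAxis" in without_axis)
```

The point is that the `{% if %}` blocks are resolved in Python, before the browser sees any code, so a disabled option leaves no trace in the resulting javascript.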

We save this template as barchart.js, as barchart_args['visualization_js'] points to it.



In [5]:

    
barchart_code = '''
var x = d3.scale.ordinal()
    .rangeRoundBands([0, _vis_width], .1);

var y = d3.scale.linear()
    .range([_vis_height, 0]);

if (_y_label == null) {
    _y_label = _y;
}

x.domain(_data_dataframe.map(function(d) { return d[_x]; }));
y.domain([0, d3.max(_data_dataframe, function(d) { return d[_y]; })]);

{% if options.x_axis %}
    var xAxis = d3.svg.axis()
        .scale(x)
        .orient("bottom");

    container.append("g")
        .attr("class", "x axis")
        .attr("transform", "translate(0," + _vis_height + ")")
        .call(xAxis);
{% endif %}

{% if options.y_axis %}
    var yAxis = d3.svg.axis()
        .scale(y)
        .orient("left");

    if (_y_axis_ticks != null) {
        yAxis.ticks(_y_axis_ticks);
    }

    var y_label = container.append("g")
        .attr("class", "y axis")
        .call(yAxis)
        .append("text");

    if (_rotate_label) {
        y_label.attr("transform", "rotate(-90)")
        .attr("y", 6)
        .attr("dy", ".71em")
        .style("text-anchor", "end");
    } else {
        y_label
        .attr("y", 6)
            .attr('x', 12)
        .attr("dy", ".71em")
        .style("text-anchor", "start");
    }

    y_label.text(_y_label);
{% endif %}

var bar = container.selectAll(".bar")
    .data(_data_dataframe);

bar.enter().append('rect').classed('bar', true);

bar.exit().remove();

bar.attr("x", function(d) { return x(d[_x]); })
    .attr("width", x.rangeBand())
    .attr("y", function(d) { return y(d[_y]); })
    .attr("height", function(d) { return _vis_height - y(d[_y]); })
    .attr('fill', _color);
'''

with open('./barchart.js', 'w') as f:
    f.write(barchart_code)
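
Since the template refers to its arguments by name, a typo such as `_colour` would only show up in the browser. A quick check in Python can catch it earlier. The sketch below is a hypothetical helper (`undeclared_names` is not part of matta): it collects identifiers that start with an underscore and reports those that are neither declared in the arguments dictionary nor one of the two auxiliary names used in this notebook.

```python
import re

# Identifiers that start with an underscore and are not part of a longer
# name (so the "_label" inside "y_label" is ignored).
IDENTIFIER = re.compile(r"(?<![\w.])_[a-z]\w*")


def undeclared_names(template_source, args):
    """Return underscore names used in the template but not declared."""
    declared = {"_" + key for key in args.get("variables", {})}
    declared |= {"_data_" + key for key in args.get("data", {})}
    # Auxiliary names that appear in this notebook's template.
    declared |= {"_vis_width", "_vis_height"}
    used = set(IDENTIFIER.findall(template_source))
    return sorted(used - declared)


sample_args = {
    "data": {"dataframe": None},
    "variables": {"x": "x", "y": "y", "color": "steelblue"},
}
sample_source = """
bar.attr("x", function(d) { return x(d[_x]); })
    .attr("height", function(d) { return _vis_height - y(d[_y]); })
    .attr("fill", _colour);
"""

print(undeclared_names(sample_source, sample_args))
```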

This is the actual matta code to display the visualization in the notebook. Note that the keyword arguments are keys from the barchart_args dictionary. If you use a keyword argument not present in the dictionary, an Exception will be raised.



In [6]:

    
from matta.sketch import build_sketch
barchart = build_sketch(barchart_args)
barchart(dataframe=df, x='letter', y='frequency', rotate_label=False)
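
The rule about keyword arguments can be pictured with a small sketch. This is not matta's code: `merge_keyword_arguments` is a hypothetical helper, and raising `ValueError` is my choice for the illustration, since the text above only says that an exception is raised.

```python
import copy

# Sections of the arguments dictionary whose keys can be overridden.
SECTIONS = ("data", "options", "variables")


def merge_keyword_arguments(defaults, **kwargs):
    """Return a copy of defaults with kwargs applied; reject unknown keys."""
    merged = copy.deepcopy(defaults)
    for name, value in kwargs.items():
        for section in SECTIONS:
            if name in merged.get(section, {}):
                merged[section][name] = value
                break
        else:
            # No section declares this name, so refuse it.
            raise ValueError("unknown visualization argument: " + name)
    return merged


defaults = {
    "data": {"dataframe": None},
    "options": {"x_axis": True},
    "variables": {"x": "x", "y": "y", "rotate_label": True},
}

merged = merge_keyword_arguments(defaults, x="letter", y="frequency",
                                 rotate_label=False)
print(merged["variables"])
```

Copying the defaults first means that every call starts from the same dictionary, which is why the values in the arguments dictionary can act as defaults.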

That's it! :)

We ~~copied-and-pasted~~ implemented a barchart. The cool thing is that we didn't have to worry about data formats, since we knew the data was a DataFrame. We also didn't have to worry about dependencies like loading d3.js, or about making the visualization reusable, because matta does all that.

The next step is to scaffold a reusable visualization. Actually, the code is very similar:



In [7]:

    
barchart(x='letter', y='frequency').scaffold(filename='./scaffolded_barchart.js', define_js_module=False)

This creates a file named scaffolded_barchart.js which contains a reusable visualization. All variables declared in the arguments dictionary are available as property methods. The values specified when defining the arguments or when scaffolding will serve as defaults, but everything is changeable. Note that we did not specify a DataFrame this time!



In [8]:

    
!cat ./scaffolded_barchart.js

/**
 * mod_barchart was scaffolded using matta
 * Variables that start with an underscore (_) are passed as arguments in Python.
 * Variables that start with _data are data parameters of the visualization, and expected to be given as datum.
 *
 * For instance, d3.select('#figure').datum({'graph': a_json_graph, 'dataframe': a_json_dataframe}).call(visualization)
 * will fill the variables _data_graph and _data_dataframe.
 */

var matta_barchart = function() {
    var __fill_data__ = function(__data__) {
        
            func_barchart.dataframe(__data__.dataframe);
        
    };

    var func_barchart = function (selection) {
        console.log('selection', selection);

        var _vis_width = _width - _padding.left - _padding.right;
        var _vis_height = _height - _padding.top - _padding.bottom;

        selection.each(function(__data__) {
            __fill_data__(__data__);

            var container = null;

            if (d3.select(this).select("svg.barchart-container").empty()) {
                
                    var svg = d3.select(this).append("svg")
                        .attr("width", _width)
                        .attr("height", _height)
                        .attr('class', 'barchart-container');

                    

                    container = svg.append("g")
                        .classed('barchart-container', true)
                        .attr('transform', 'translate(' + _padding.left + ',' + _padding.top + ')');

                
            } else {
                container = d3.select(this).select("svg.barchart-container");
            }

            console.log('container', container.node());

            
                
var x = d3.scale.ordinal()
    .rangeRoundBands([0, _vis_width], .1);

var y = d3.scale.linear()
    .range([_vis_height, 0]);

if (_y_label == null) {
    _y_label = _y;
}

x.domain(_data_dataframe.map(function(d) { return d[_x]; }));
y.domain([0, d3.max(_data_dataframe, function(d) { return d[_y]; })]);


    var xAxis = d3.svg.axis()
        .scale(x)
        .orient("bottom");

    container.append("g")
        .attr("class", "x axis")
        .attr("transform", "translate(0," + _vis_height + ")")
        .call(xAxis);



    var yAxis = d3.svg.axis()
        .scale(y)
        .orient("left");

    if (_y_axis_ticks != null) {
        yAxis.ticks(_y_axis_ticks);
    }

    var y_label = container.append("g")
        .attr("class", "y axis")
        .call(yAxis)
        .append("text");

    if (_rotate_label) {
        y_label.attr("transform", "rotate(-90)")
        .attr("y", 6)
        .attr("dy", ".71em")
        .style("text-anchor", "end");
    } else {
        y_label
        .attr("y", 6)
            .attr('x', 12)
        .attr("dy", ".71em")
        .style("text-anchor", "start");
    }

    y_label.text(_y_label);


var bar = container.selectAll(".bar")
    .data(_data_dataframe);

bar.enter().append('rect').classed('bar', true);

bar.exit().remove();

bar.attr("x", function(d) { return x(d[_x]); })
    .attr("width", x.rangeBand())
    .attr("y", function(d) { return y(d[_y]); })
    .attr("height", function(d) { return _vis_height - y(d[_y]); })
    .attr('fill', _color);
            

        });
    };

    
        var _data_dataframe = null;
        func_barchart.dataframe = function(__) {
            if (arguments.length) {
                _data_dataframe = __;
                console.log('DATA dataframe', _data_dataframe);
                return func_barchart;
            }
            return _data_dataframe;
        };
    

    
    
        var _color = "steelblue";
        func_barchart.color = function(__) {
            if (arguments.length) {
                _color = __;
                console.log('setted color', _color);
                return func_barchart;
            }
            return _color;
        };
    
        var _y_label = null;
        func_barchart.y_label = function(__) {
            if (arguments.length) {
                _y_label = __;
                console.log('setted y_label', _y_label);
                return func_barchart;
            }
            return _y_label;
        };
    
        var _y_axis_ticks = 10;
        func_barchart.y_axis_ticks = function(__) {
            if (arguments.length) {
                _y_axis_ticks = __;
                console.log('setted y_axis_ticks', _y_axis_ticks);
                return func_barchart;
            }
            return _y_axis_ticks;
        };
    
        var _height = 500;
        func_barchart.height = function(__) {
            if (arguments.length) {
                _height = __;
                console.log('setted height', _height);
                return func_barchart;
            }
            return _height;
        };
    
        var _padding = {"top": 20, "right": 30, "left": 30, "bottom": 30};
        func_barchart.padding = function(__) {
            if (arguments.length) {
                _padding = __;
                console.log('setted padding', _padding);
                return func_barchart;
            }
            return _padding;
        };
    
        var _width = 960;
        func_barchart.width = function(__) {
            if (arguments.length) {
                _width = __;
                console.log('setted width', _width);
                return func_barchart;
            }
            return _width;
        };
    
        var _rotate_label = true;
        func_barchart.rotate_label = function(__) {
            if (arguments.length) {
                _rotate_label = __;
                console.log('setted rotate_label', _rotate_label);
                return func_barchart;
            }
            return _rotate_label;
        };
    
        var _y = "frequency";
        func_barchart.y = function(__) {
            if (arguments.length) {
                _y = __;
                console.log('setted y', _y);
                return func_barchart;
            }
            return _y;
        };
    
        var _x = "letter";
        func_barchart.x = function(__) {
            if (arguments.length) {
                _x = __;
                console.log('setted x', _x);
                return func_barchart;
            }
            return _x;
        };
    
    

    
    return func_barchart;
};
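
All the accessor blocks in the file above follow one pattern: a private variable with a default value, plus a method that returns the value when called without arguments and sets it (returning the chart, so calls can be chained) otherwise. As an illustration only, here is a sketch of how such a block could be produced from Python. `accessor_js` is a hypothetical helper, not matta's generator, and it leaves out the console.log calls.

```python
import json

# Doubled braces are literal braces for str.format.
ACCESSOR_TEMPLATE = """var _{name} = {default};
func_{vis}.{name} = function(__) {{
    if (arguments.length) {{
        _{name} = __;
        return func_{vis};
    }}
    return _{name};
}};
"""


def accessor_js(vis, name, default):
    """Return JavaScript for one chainable getter/setter with a default."""
    # json.dumps turns Python values into JavaScript literals
    # (None -> null, True -> true, dict -> object literal).
    return ACCESSOR_TEMPLATE.format(vis=vis, name=name,
                                    default=json.dumps(default))


print(accessor_js("barchart", "width", 960))
```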

Testing the Visualization

To test the visualization, we will serialize the DataFrame and then display an IFrame with the visualization using a very simple template (which we, again, copied from the original source by Mike).

matta includes a dump_data function that calls a JSON serializer under the hood. This serializer is able to handle DataFrames and other typical python data structures.



In [9]:

    
from matta import dump_data
dump_data(df, './data.json')



In [10]:

    
!head ./data.json
[{"frequency": 0.08167, "letter": "A"}, {"frequency": 0.01492, "letter": "B"}, {"frequency": 0.027819999999999998, "letter": "C"}, {"frequency": 0.04253, "letter": "D"}, {"frequency": 0.12702, "letter": "E"}, {"frequency": 0.02288, "letter": "F"}, {"frequency": 0.02015, "letter": "G"}, {"frequency": 0.06094, "letter": "H"}, {"frequency": 0.06966, "letter": "I"}, {"frequency": 0.0015300000000000001, "letter": "J"}, {"frequency": 0.00772, "letter": "K"}, {"frequency": 0.04025, "letter": "L"}, {"frequency": 0.024059999999999998, "letter": "M"}, {"frequency": 0.06749, "letter": "N"}, {"frequency": 0.07507, "letter": "O"}, {"frequency": 0.01929, "letter": "P"}, {"frequency": 0.00095, "letter": "Q"}, {"frequency": 0.05987000000000001, "letter": "R"}, {"frequency": 0.06327, "letter": "S"}, {"frequency": 0.09056, "letter": "T"}, {"frequency": 0.02758, "letter": "U"}, {"frequency": 0.00978, "letter": "V"}, {"frequency": 0.0236, "letter": "W"}, {"frequency": 0.0015, "letter": "X"}, {"frequency": 0.01974, "letter": "Y"}, {"frequency": 0.00074, "letter": "Z"}]
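
The file is a JSON array with one object per row. For readers without matta at hand, the sketch below writes and reads the same layout with the standard library. `write_records` and `read_records` are hypothetical helpers and make no claim about how dump_data is implemented.

```python
import json
import os
import tempfile

# Two rows in the same layout as data.json: one object per DataFrame row.
records = [
    {"letter": "A", "frequency": 0.08167},
    {"letter": "B", "frequency": 0.01492},
]


def write_records(rows, path):
    """Write a list of row dictionaries as a JSON array."""
    with open(path, "w") as f:
        json.dump(rows, f)


def read_records(path):
    """Read the JSON array back into a list of dictionaries."""
    with open(path) as f:
        return json.load(f)


# Use a temporary directory so the sketch does not overwrite ./data.json.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.json")
    write_records(records, path)
    round_trip = read_records(path)

print(round_trip == records)
```

With a DataFrame, `df.to_dict(orient='records')` should give a list of dictionaries in this layout, which can then be passed to `write_records`.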

Now let's write the HTML file. Note the following code:

d3.json('./data.json', function(json) {
    var barchart = matta_barchart();
    d3.select('body').datum({dataframe: json}).call(barchart);
});

If you would like to change the width and the x attribute of the visualization, you would write instead:

var barchart = matta_barchart().width(700).x('other_column_in_the_dataframe');



In [11]:

    
with open('./test_barchart.html', 'w') as f:
    f.write('''
<!DOCTYPE html>
<meta charset="utf-8">
<style>
.bar { fill: steelblue; }
.bar:hover { fill: brown; }
.axis { font: 10px sans-serif; }
.axis path, .axis line { fill: none; stroke: #000; shape-rendering: crispEdges; }
.x.axis path { display: none; }
</style>
<body>
<script src="http://d3js.org/d3.v3.min.js"></script>
<script src="./scaffolded_barchart.js"></script>
<script>
d3.json('./data.json', function(json) {
    var barchart = matta_barchart();
    d3.select('body').datum({dataframe: json}).call(barchart);
});
</script>
''')

If you are viewing this in NBViewer then you will not see the IFrame. But trust me, it works ;)



In [13]:

    
from IPython.display import IFrame
IFrame('http://localhost:8888/files/test_barchart.html', 1000, 600)









    Out[13]:

I hope you found this useful, and that you start creating visualizations using matta. I do! :)