By @carnby
You have probably seen the barchart example by Mike Bostock. It is embedded below so you can see it.
In this notebook I explain how to use matta to implement this barchart.
Why use matta? Because having an example of a visualization is one thing, and having a reusable implementation is another. A reusable implementation is not just a function; it is an entire context in which you can easily use your visualization with other datasets.
How do we do it? In this notebook we walk through the basic scaffolding matta provides to reproduce the example chart as a visualization of a pandas DataFrame. Because the visualization accepts a DataFrame, we can forget about converting the dataset to the specific layout the visualization designer had in mind and instead focus on converting it to a DataFrame (which will probably be very, very easy).
Let's begin.
In [1]:
from IPython.display import IFrame
IFrame('http://bl.ocks.org/mbostock/raw/3885304', 1000, 550)
Out[1]:
Here we load matta.
If you read the README, you will notice that you can install matta's javascript and css into your IPython profile. That way you do not need to issue an init_javascript call. It is here just for demonstration. However, if you use a core matta visualization and export the notebook to NBViewer, you will need to execute it so that your visitor's browser can load the required js/css files.
If you installed matta into your profile, then using the function will do no harm - it detects that matta was loaded and does nothing.
In [2]:
import matta
matta.init_javascript(path='https://rawgit.com/carnby/matta/master/matta/libs/')
Out[2]:
In [3]:
import pandas as pd
df = pd.read_csv('http://bl.ocks.org/mbostock/raw/3885304/964f9100166627a89c7e6c23ce8128f5aefd5510/data.tsv', delimiter='\t')
df.head()
Out[3]:
First, let's sketch the visualization by defining its options and code.
The visualization options or arguments are contained in a dictionary. Note that the dictionary contains a subdictionary named variables. Those variables will be exposed as methods of the scaffolded visualization, and are available in code as _variable_name.
Note also the data dictionary. It indicates that the visualization receives a pandas DataFrame. This dataframe is available internally as the _data_dataframe variable.
In [4]:
# the options
barchart_args = {
    'requirements': ['d3'],
    'visualization_name': 'barchart',
    'visualization_js': './barchart.js',
    'figure_id': None,
    'container_type': 'svg',
    'data': {
        'dataframe': None,
    },
    'options': {
        'background_color': None,
        'x_axis': True,
        'y_axis': True,
    },
    'variables': {
        'width': 960,
        'height': 500,
        'padding': {'left': 30, 'top': 20, 'right': 30, 'bottom': 30},
        'x': 'x',
        'y': 'y',
        'y_axis_ticks': 10,
        'color': 'steelblue',
        'y_label': None,
        'rotate_label': True,
    },
}
This is the visualization code. Note that it is almost a copy-and-paste version of the original example. We just renamed the variables to _variable_name and used other auxiliary variables, like _vis_width, which are exposed by matta.
Note that the code is not strictly javascript. Actually, the file is expected to be a jinja2 template.
We save this template as barchart.js, as barchart_args['visualization_js'] points to it.
In [5]:
barchart_code = '''
var x = d3.scale.ordinal()
    .rangeRoundBands([0, _vis_width], .1);
var y = d3.scale.linear()
    .range([_vis_height, 0]);
if (_y_label == null) {
    _y_label = _y;
}
x.domain(_data_dataframe.map(function(d) { return d[_x]; }));
y.domain([0, d3.max(_data_dataframe, function(d) { return d[_y]; })]);
{% if options.x_axis %}
var xAxis = d3.svg.axis()
    .scale(x)
    .orient("bottom");
container.append("g")
    .attr("class", "x axis")
    .attr("transform", "translate(0," + _vis_height + ")")
    .call(xAxis);
{% endif %}
{% if options.y_axis %}
var yAxis = d3.svg.axis()
    .scale(y)
    .orient("left");
if (_y_axis_ticks != null) {
    yAxis.ticks(_y_axis_ticks);
}
var y_label = container.append("g")
    .attr("class", "y axis")
    .call(yAxis)
    .append("text");
if (_rotate_label) {
    y_label.attr("transform", "rotate(-90)")
        .attr("y", 6)
        .attr("dy", ".71em")
        .style("text-anchor", "end");
} else {
    y_label
        .attr("y", 6)
        .attr('x', 12)
        .attr("dy", ".71em")
        .style("text-anchor", "start");
}
y_label.text(_y_label);
{% endif %}
var bar = container.selectAll(".bar")
    .data(_data_dataframe);
bar.enter().append('rect').classed('bar', true);
bar.exit().remove();
bar.attr("x", function(d) { return x(d[_x]); })
    .attr("width", x.rangeBand())
    .attr("y", function(d) { return y(d[_y]); })
    .attr("height", function(d) { return _vis_height - y(d[_y]); })
    .attr('fill', _color);
'''
with open('./barchart.js', 'w') as f:
    f.write(barchart_code)
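Since barchart.js is a jinja2 template, the `{% if options.x_axis %}` blocks are resolved in Python before any JavaScript reaches the browser. The sketch below renders a small fragment with jinja2 directly to show the effect. How matta itself invokes jinja2 is not shown in this notebook, so the `render_fragment` helper and the `FRAGMENT` string are illustrative assumptions.

```python
from jinja2 import Template

# A tiny fragment in the same style as barchart.js (illustrative only).
FRAGMENT = """var y = 1;
{% if options.x_axis %}var xAxis = d3.svg.axis();
{% endif %}"""


def render_fragment(x_axis):
    """Render FRAGMENT with a hypothetical options dict."""
    return Template(FRAGMENT).render(options={'x_axis': x_axis})


with_axis = render_fragment(True)      # the xAxis line is kept
without_axis = render_fragment(False)  # the xAxis line is dropped

print(with_axis)
print(without_axis)
```

When the option is off, the axis code is simply never emitted, so the generated JavaScript carries no trace of it.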
This is the actual matta code to display the visualization in the notebook. Note that the keyword arguments are keys from the barchart_args dictionary. If you use a keyword argument not present in the dictionary, an Exception will be raised.
In [6]:
from matta.sketch import build_sketch
barchart = build_sketch(barchart_args)
barchart(dataframe=df, x='letter', y='frequency', rotate_label=False)
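The keyword check mentioned above can be pictured with a few lines of plain Python. This is a toy sketch, not matta's implementation: `check_kwargs` is a hypothetical name, and the assumption that keys from `data`, `options` and `variables` are all accepted is inferred from the call above.

```python
# Hypothetical, trimmed-down copy of the argument dictionary.
args = {
    'data': {'dataframe': None},
    'options': {'x_axis': True, 'y_axis': True},
    'variables': {'x': 'x', 'y': 'y', 'rotate_label': True},
}


def check_kwargs(args, **kwargs):
    """Return the accepted overrides, or raise if a keyword is unknown."""
    allowed = set()
    for section in ('data', 'options', 'variables'):
        allowed.update(args[section])
    unknown = sorted(set(kwargs) - allowed)
    if unknown:
        raise KeyError('unknown arguments: {}'.format(', '.join(unknown)))
    return kwargs


accepted = check_kwargs(args, x='letter', y='frequency', rotate_label=False)

try:
    check_kwargs(args, colour='red')  # misspelled key, not in the dictionary
    rejected = False
except KeyError:
    rejected = True
```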
That's it! :)
We implemented a barchart, mostly by copying and pasting. The cool thing is that we didn't have to worry about data formats, since we knew the data was a DataFrame. We also didn't have to worry about dependencies like loading d3.js, or about making the visualization reusable, because matta does all that.
The next step is to scaffold a reusable visualization. Actually, the code is very similar:
In [7]:
barchart(x='letter', y='frequency').scaffold(filename='./scaffolded_barchart.js', define_js_module=False)
This creates a file named scaffolded_barchart.js, which contains a reusable visualization. All variables declared in the arguments dictionary are available as property methods. The values specified when defining the arguments or when scaffolding serve as defaults, but everything is changeable. Note that we did not specify a DataFrame this time!
In [8]:
!cat ./scaffolded_barchart.js
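Property methods of this kind usually follow the getter/setter convention common in reusable d3 charts: called without an argument, a method returns the current value; called with one, it stores the value and returns the chart so that calls can be chained. Below is a rough Python analogue of that idea. The `Chart` class is hypothetical, and the getter behaviour is an assumption about the scaffolded file rather than something this notebook demonstrates.

```python
class Chart(object):
    """Hypothetical stand-in for a scaffolded chart with chainable accessors."""

    def __init__(self, **defaults):
        self._values = dict(defaults)

    def _accessor(self, name, args):
        if not args:
            # No argument: act as a getter.
            return self._values[name]
        # One argument: store it and return self to allow chaining.
        self._values[name] = args[0]
        return self

    def width(self, *args):
        return self._accessor('width', args)

    def x(self, *args):
        return self._accessor('x', args)


# Defaults come from the constructor; any of them can be overridden later.
chart = Chart(width=960, x='letter')
chart.width(700).x('other_column')
```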
To test the visualization, we will serialize the DataFrame and then display an IFrame with the visualization using a very simple template (which we, again, copied from the original source by Mike).
matta includes a dump_data function that calls a JSON serializer under the hood. This serializer is able to handle DataFrames and other typical Python data structures.
In [9]:
from matta import dump_data
dump_data(df, './data.json')
In [10]:
!head ./data.json
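The template iterates over `_data_dataframe` row by row and reads `d[_x]`, which suggests the DataFrame reaches the browser as a list of row objects. Here is a minimal sketch of a serializer with that behaviour, using only the standard library. `FrameLike` is a hypothetical stand-in for a DataFrame, and the records layout is an assumption, not a description of what `dump_data` actually writes.

```python
import json


class FrameLike(object):
    """Hypothetical stand-in exposing a DataFrame-like to_dict method."""

    def __init__(self, rows):
        self._rows = rows

    def to_dict(self, orient='records'):
        return list(self._rows)


def to_serializable(obj):
    """Fallback for json.dumps: turn frame-like objects and sets into lists."""
    if hasattr(obj, 'to_dict'):
        return obj.to_dict(orient='records')
    if isinstance(obj, (set, frozenset)):
        return sorted(obj)
    raise TypeError('cannot serialize {!r}'.format(type(obj)))


# Two sample rows with made-up values.
frame = FrameLike([{'letter': 'A', 'frequency': 0.1},
                   {'letter': 'B', 'frequency': 0.2}])
payload = json.dumps({'dataframe': frame}, default=to_serializable)
rows = json.loads(payload)['dataframe']
```

The `default` hook is only consulted for objects the standard encoder does not know, so plain dicts, lists, strings and numbers pass through untouched.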
Now let's write the HTML file. Note the following code:
d3.json('./data.json', function(json) {
var barchart = matta_barchart();
d3.select('body').datum({dataframe: json}).call(barchart)
});
If you would like to change the width and the x attribute of the visualization, you would say instead:
var barchart = matta_barchart().width(700).x('other_column_in_the_dataframe');
In [11]:
with open('./test_barchart.html', 'w') as f:
    f.write('''
<!DOCTYPE html>
<meta charset="utf-8">
<style>
.bar { fill: steelblue; }
.bar:hover { fill: brown; }
.axis { font: 10px sans-serif; }
.axis path, .axis line { fill: none; stroke: #000; shape-rendering: crispEdges; }
.x.axis path { display: none; }
</style>
<body>
<script src="http://d3js.org/d3.v3.min.js"></script>
<script src="./scaffolded_barchart.js"></script>
<script>
d3.json('./data.json', function(json) {
var barchart = matta_barchart();
d3.select('body').datum({dataframe: json}).call(barchart)
});
</script>
''')
If you are viewing this in NBViewer then you will not see the IFrame. But trust me, it works ;)
In [13]:
from IPython.display import IFrame
IFrame('http://localhost:8888/files/test_barchart.html', 1000, 600)
Out[13]:
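Outside a running notebook server, the test page still has to be fetched over HTTP so that `d3.json('./data.json', ...)` can load its data; browsers commonly block such requests for pages opened straight from the local filesystem. One option, sketched here with the standard library only, is to serve the working directory on a local port. The `serve_directory` helper is a hypothetical name, and the sketch assumes Python 3.7 or later.

```python
import functools
import http.server
import threading


def serve_directory(directory='.', port=0):
    """Serve `directory` over HTTP in a background thread.

    port=0 lets the operating system pick a free port.
    Returns the server object and the base URL.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory)
    server = http.server.HTTPServer(('127.0.0.1', port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    base_url = 'http://127.0.0.1:{}/'.format(server.server_address[1])
    return server, base_url


server, base_url = serve_directory('.')
print(base_url + 'test_barchart.html')
# Call server.shutdown() when you are done.
```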
I hope you found this useful, and that you start creating visualizations using matta. I do! :)