As we have already discovered, elements are simple wrappers around your data that provide a semantically meaningful representation. Tabular data (also called columnar data) is one of the most common, general, and versatile data formats, corresponding to how data is laid out in a spreadsheet. There are many different ways to put data into a tabular format, but for interactive analysis having tidy data provides flexibility and simplicity.
In this tutorial all the information you have learned in the previous sections will finally really pay off. We will discover how to facet data and use different element types to explore and visualize the data contained in a real dataset.
In [ ]:
import numpy as np
import scipy.stats as ss
import pandas as pd
import holoviews as hv
hv.extension('bokeh')
%opts Curve Scatter [tools=['hover']]
In [ ]:
macro_df = pd.read_csv('../data/macro.csv')
macro_df.head()
For tidy data, the columns of the table represent variables or dimensions and the rows represent observations.
Mathematical variables can usually be described as dependent or independent. In HoloViews these correspond to value dimensions and key dimensions (respectively).
In this dataset 'country'
and 'year'
are independent variables or key dimensions, while the remainder are automatically inferred as value dimensions:
In [ ]:
macro = hv.Dataset(macro_df, kdims=['country', 'year'])
macro
We will also give the dimensions more sensible labels using redim.label
:
In [ ]:
macro = macro.redim.label(growth='GDP Growth', unem='Unemployment', year='Year', country='Country')
Once we have a Dataset
with multiple dimensions we can map these dimensions onto elements onto the .to
method. The method takes four main arguments:
As a first simple example let's go through such a declaration:
Curve
groupby
the 'country'.
In [ ]:
curves = macro.to(hv.Curve, kdims='year', vdims='unem', groupby='country')
curves
Alternatively we could also group by the year and view the unemployment rate by country as Bars instead:
In [ ]:
%%opts Bars [width=600 xrotation=45]
bars = macro.sort('country').to(hv.Bars, kdims='country', vdims='unem', groupby='year')
bars
In [ ]:
# Exercise: Create a HeatMap using ``macro.to``, declaring vdims 'year' and 'country', and kdims 'growth'
# You'll need to declare ``width`` and ``xrotation`` plot options for HeatMap to make the plot readable
# You can also add ``tools=['hover']`` to get more info
In [ ]:
Often we want to summarize the distribution of values, e.g. to reveal the distribution of unemployment rates for each OECD country across time. This means we want to ignore the 'year' dimension in our dataset, letting it be summarized instead. To stop HoloViews from grouping by the extra variable, we pass an empty list to the groupby argument.
In [ ]:
%%opts BoxWhisker [width=800 xrotation=30] (box_fill_color=Palette('Category20'))
macro.to(hv.BoxWhisker, 'country', 'growth', groupby=[])
In [ ]:
# Exercise: Display the distribution of GDP growth by year using the BoxWhisker element
In [ ]:
%%opts Scatter [width=800 height=400 size_index='growth'] (color=Palette('Category20') size=5)
%%opts NdOverlay [legend_position='left']
macro.to(hv.Scatter, 'year', ['unem', 'growth']).overlay().relabel('OECD Unemployment 1960 - 1990')
In [ ]:
# Exercise: Instead of faceting using an .overlay() of Scatter elements, facet the data using a .grid()
# of Curve or Area elements
In [ ]:
In [ ]:
# Exercise: You'll notice that you get quite a lot of countries in the grid.
# You can try supplying a short list of countries to the 'macro.select` method to get a more-practical subset.
# Hint: You may want to pass the shared_yaxis=True plot option to GridSpace, to get a numeric axis
In [ ]:
Another common operation is computing aggregates. We can also compute and visualize these easily using the aggregate
method. Simply supply the dimension(s) to aggregate by and supply a function and optionally a secondary function to compute the spread. Once we have computed the aggregate we can simply pass it to the Curve
and ErrorBars
:
In [ ]:
%%opts Curve [width=600]
agg = macro.aggregate('year', function=np.mean, spreadfn=np.std)
(hv.Curve(agg) * hv.ErrorBars(agg, kdims=['year'], vdims=['growth', 'growth_std']))
In [ ]:
# Exercise: Display aggregate GDP growth by country, building it up in a series of steps
# Step 1. First, aggregate the data by country rather than by year, using
# np.mean and ss.sem as the function and spreadfn, respectively, then
# make a `Bars` element from the resulting ``agg``
In [ ]:
agg = macro.aggregate('country', function=np.mean, spreadfn=ss.sem)
hv.Bars(agg)
In [ ]:
# Step 2: You should now have a bars plot, but with no error bars. To add the error bars,
# print the 'agg' as text to see which vdims are available (which will be different for
# different spreadfns), then overlay ErrorBars as above but for the new kdims and
# the appropriate vdims
# Hint: You'll want to make the plot wider and use an xrotation to see the labels clearly
In [ ]:
The next section shows a similar approach, but for working with gridded data, in multidimensional array formats.