Pandas is one of the most popular Python libraries, providing high-performance, easy-to-use data structures and data analysis tools. It also provides IO interfaces to store and load your data in a variety of formats, including CSV files, JSON, pickles and even databases. In other words, it makes loading data, munging data and even complex data analysis tasks a breeze.

Combining the high-performance data analysis tools and IO capabilities that Pandas provides with the interactivity and ease of generating complex visualizations in HoloViews makes the two libraries a perfect match.

In this tutorial we will explore how you can easily convert between Pandas dataframes and HoloViews components. The tutorial assumes you are already familiar with some of the core concepts of both libraries, so if you need a refresher on HoloViews have a look at the Introduction and Exploring Data.

Basic conversions


In [ ]:
import numpy as np
import pandas as pd
import holoviews as hv
from IPython.display import HTML
%reload_ext holoviews.ipython

In [ ]:
%output holomap='widgets'

The first thing to understand when working with pandas dataframes in HoloViews is how data is indexed. Pandas dataframes are structured as tables with any number of columns and indexes. HoloViews on the other hand deals with Dimensions. HoloViews container objects such as the HoloMap, NdLayout, GridSpace and NdOverlay have kdims, which provide metadata about the data along that dimension and how they can be sliced. Element objects on the other hand have both key dimensions (kdims) and value dimensions (vdims). The difference between kdims and vdims in HoloViews is that the former may be sliced and indexed while the latter merely provide a description about the values along that Dimension.
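As a rough analogy in plain pandas (this is not the HoloViews API), key dimensions behave like index levels you can look up and slice by, while value dimensions behave like the remaining data columns. The sketch below uses a hypothetical `readings` dataframe with invented column names and values:

```python
import pandas as pd

# Hypothetical data: 'x' and 'y' act as keys, 'temperature' as the value.
readings = pd.DataFrame({
    'x': [0, 1, 2, 3],
    'y': [0, 0, 1, 1],
    'temperature': [20.5, 21.0, 19.5, 22.0],
})

# Move the key columns into a sorted index so they can be indexed and sliced.
keyed = readings.set_index(['x', 'y']).sort_index()

# Index by a full key to get a single value...
single = keyed.loc[(2, 1), 'temperature']

# ...or slice along the first key (label slices in pandas include both ends).
row_slice = keyed.loc[1:2]

print(single)
print(row_slice)
```

The value column is never used for lookup here; it only describes what was recorded at each key, which mirrors the role of vdims described above.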

Let's start by constructing a Pandas dataframe with a few columns and displaying it in its HTML format. Throughout this notebook we will visualize the DFrames using the IPython HTML display function so that this notebook can be tested; you can of course visualize dataframes directly.


In [ ]:
df = pd.DataFrame({'a':[1,2,3,4], 'b':[4,5,6,7], 'c':[8, 9, 10, 11]})
HTML(df.to_html())

Now that we have a basic dataframe we can wrap it in the HoloViews DFrame wrapper element.


In [ ]:
example = hv.DFrame(df)

The HoloViews DFrame wrapper element can either be displayed directly using some of the specialized plot types that Pandas supplies or be used as a conversion interface to HoloViews objects. This tutorial focuses only on the conversion interface. For the specialized Pandas and Seaborn plot types, have a look at the Pandas and Seaborn tutorial.

The data on the DFrame Element is accessible via the .data attribute like on all other Elements.


In [ ]:
list(example.data.columns)

Having wrapped the dataframe in the DFrame wrapper, we can now begin interacting with it. The simplest thing we can do is to convert it to a HoloViews Table object. The conversion interface has a simple signature: after selecting the Element type you want to convert to, in this case a Table, you pass the desired kdims and vdims to the corresponding conversion method, either as a list of column name strings or as a single string.


In [ ]:
example_table = example.table(['a', 'b'], 'c')
example_table

As you can see, we now have a Table, which has a and b as its kdims and c as its value dimension (vdim). The index of the original dataframe was dropped, however. So if your data has complex indices set, be sure to convert them to simple columns using the .reset_index method on the pandas dataframe:


In [ ]:
HTML(df.reset_index().to_html())
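The example dataframe above only has the default integer index, so the effect of .reset_index is easier to see on a dataframe with a multi-level index. The following is a small sketch in plain pandas; the `indexed` dataframe and its level names are made up for illustration:

```python
import pandas as pd

# Hypothetical dataframe with a two-level index ('group', 'step').
indexed = pd.DataFrame(
    {'value': [0.1, 0.2, 0.3, 0.4]},
    index=pd.MultiIndex.from_product([['x', 'y'], [1, 2]],
                                     names=['group', 'step']),
)

# reset_index turns every index level into an ordinary column,
# leaving a plain RangeIndex behind.
flat = indexed.reset_index()

print(list(flat.columns))
```

After this step the former index levels are regular columns, so they can be named as kdims or vdims like any other column.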

Now we can employ the HoloViews slicing semantics to select the desired subset of the data and use the usual compositing + operator to lay the data out side by side:


In [ ]:
example_table[:, 4:8:2] + example_table[2:5:2, :]

Dropping and reducing columns

This was the simple case: we converted all the dataframe columns to a Table object. This time let's select only a subset of the Dimensions.


In [ ]:
example.scatter('a', 'b')

As you can see, HoloViews simply ignored the remaining Dimension. By default the conversion functions ignore any numeric unselected Dimensions. All non-numeric Dimensions, however, are converted to dimensions on the returned HoloMap. Both of these behaviors can be overridden by supplying explicit map dimensions and/or a reduce_fn.
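To build some intuition for what reducing over an unselected column could mean, here is a plain pandas sketch. It assumes the reduction behaves like an aggregation such as a mean over rows that share the same key; that is an assumption made for illustration, not a description of how HoloViews implements reduce_fn. The `records` dataframe is invented:

```python
import pandas as pd

# Hypothetical data: 'a' is the selected key, 'b' the selected value,
# and 'c' is a numeric column we do not select.
records = pd.DataFrame({
    'a': [1, 1, 2, 2],
    'b': [10.0, 20.0, 30.0, 50.0],
    'c': [0, 1, 0, 1],
})

# Dropping 'c' leaves two rows per value of 'a'; averaging them
# collapses the data to a single 'b' per key.
reduced = records.groupby('a')['b'].mean().reset_index()

print(reduced)
```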

You can perform this conversion with any type and lay your results out side-by-side making it easy to look at the same dataset in any number of ways.


In [ ]:
%%opts Curve [xticks=3 yticks=3]
example.curve('a', 'b') + example_table

Finally, we can convert all homogeneous HoloViews types (i.e. anything except Layout and Overlay) back to a pandas dataframe using the dframe method.


In [ ]:
HTML(example_table.dframe().to_html())

Working with higher-dimensional data

The last section only scratched the surface. Where HoloViews really comes into its own is with very high-dimensional datasets. Let's load a dataset of some macro-economic indicators for OECD countries from 1964-1990 from the HoloViews website.


In [ ]:
macro_df = pd.read_csv('http://ioam.github.com/holoviews/Tutorials/macro.csv', sep='\t')

Now we can display the first ten rows:


In [ ]:
HTML(macro_df[0:10].to_html())
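The read_csv call above relies on the tab separator to split the file into columns. The same call can be tried offline on an in-memory string, as in the sketch below. The rows are made up for illustration and are not values from macro.csv; only the column names are borrowed from this tutorial:

```python
import io
import pandas as pd

# Made-up tab-separated rows; NOT taken from the real macro.csv.
tsv_text = (
    "country\tyear\tgdp\tunem\n"
    "Atlantis\t1966\t5.1\t2.8\n"
    "Atlantis\t1967\t2.3\t3.6\n"
)

# sep='\t' tells pandas that columns are separated by tabs, not commas.
sample_df = pd.read_csv(io.StringIO(tsv_text), sep='\t')

print(sample_df.dtypes)
```

Passing the separator by keyword keeps the call explicit about which argument is meant.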

As the first rows of macro_df show, some of the columns are poorly named and carry no information about the units of each quantity. The DFrame element accepts either an explicit list of kdims, which must match the number of columns, or a dimensions dictionary, whose keys should match the column names and whose values must be either strings or HoloViews Dimension objects.


In [ ]:
dimensions = {'unem': hv.Dimension('Unemployment', unit='%'),
              'capmob': 'Capital Mobility',
              'gdp': hv.Dimension('GDP Growth', unit='%')}
macro = hv.DFrame(macro_df, dimensions=dimensions)
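For comparison, the closest plain pandas equivalent is renaming columns with a mapping and keeping the units in a separate lookup. The sketch below is only an analogy to the dimensions dictionary, not part of the HoloViews API; the `raw` values and the `units` dictionary are made up:

```python
import pandas as pd

# Made-up values; only the column names mirror the ones used above.
raw = pd.DataFrame({'unem': [2.8, 3.6], 'capmob': [0, 0], 'gdp': [5.1, 2.3]})

# Map terse column names to descriptive labels.
labels = {'unem': 'Unemployment',
          'capmob': 'Capital Mobility',
          'gdp': 'GDP Growth'}
# pandas has no notion of units, so we track them separately (hypothetical).
units = {'Unemployment': '%', 'GDP Growth': '%'}

renamed = raw.rename(columns=labels)

# Build display headers that append the unit where one is known.
headers = [f"{col} ({units[col]})" if col in units else col
           for col in renamed.columns]
print(headers)
```

A Dimension object bundles the label and unit together, which is what saves you from this kind of manual bookkeeping.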

Let's list the conversion methods supported by the standard DFrame element. If you have the Seaborn extension, the DFrame object that is imported by default will support additional conversions:


In [ ]:
from holoviews.interface.pandas import DFrame as PDFrame
sorted([k for k in PDFrame.__dict__ if not k.startswith('_') and k != 'name'])

All these methods share a common signature: first the kdims, then the vdims, then the HoloMap dimensions and finally a reduce_fn. We'll see what that means in practice for some of the complex Element types in a minute.

Conversion to complex HoloViews components

We'll begin by setting a few default plot options, which will apply to all the objects. You can do this by setting the appropriate options directly on Store.options with the desired {type}.{group}.{label} path or by using the %opts line magic; see the Options Tutorial for more details.

Here we define some default options on Store.options directly, using the %output magic only to set the dpi of the following figures.


In [ ]:
%output dpi=100
options = hv.Store.options()
opts = hv.Options('plot', aspect=2, fig_size=250, show_grid=True, legend_position='right')
options.NdOverlay = opts
options.Overlay = opts

Overlaying

Above we looked at converting a DFrame to simple Element types. However, HoloViews also provides powerful container objects for exploring high-dimensional data; currently these are HoloMap, NdOverlay, NdLayout and GridSpace. HoloMaps provide the basic conversion type, from which you can conveniently convert to the other container types using the .overlay, .layout and .grid methods. This way we can easily create an overlay of GDP Growth curves by year for each country. Here 'year' is a key dimension and GDP Growth a value dimension. As discussed before, all non-numeric Dimensions become HoloMap kdims; in this case 'country' is the only non-numeric Dimension, which we then overlay by calling the .overlay method.


In [ ]:
%%opts Curve (color=Palette('Set3'))
gdp_curves = macro.curve('year', 'GDP Growth')
gdp_curves.overlay('country')

Collapsing

Now that we've extracted the gdp_curves we can apply some operations to them. The collapse method applies some function across the data along the supplied dimensions. This lets us quickly compute the mean GDP Growth by year, for example, but it also allows us to map a function with parameters to the data and visualize the resulting samples. A simple example is computing a curve for each percentile and embedding it in an NdOverlay.

Additionally we can apply a Palette to visualize the range of percentiles.


In [ ]:
%%opts NdOverlay [show_legend=False] Curve (color=Palette('Blues'))
hv.NdOverlay({i: gdp_curves.collapse('country', np.percentile, q=i) for i in range(0,101)})
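The numerical idea behind the cell above can be sketched in plain pandas and NumPy: for each year, take a percentile of the values across all countries. The `toy` data and the `collapse_percentile` helper below are hypothetical and only illustrate the computation; they are not how HoloViews implements collapse:

```python
import numpy as np
import pandas as pd

# Made-up growth figures for three hypothetical countries.
toy = pd.DataFrame({
    'country': ['A', 'A', 'B', 'B', 'C', 'C'],
    'year':    [1970, 1971, 1970, 1971, 1970, 1971],
    'gdp':     [1.0, 4.0, 2.0, 5.0, 3.0, 6.0],
})

def collapse_percentile(frame, q):
    """Hypothetical helper: q-th percentile of 'gdp' across countries, per year."""
    return frame.groupby('year')['gdp'].agg(lambda s: np.percentile(s, q))

# One series (indexed by year) per requested percentile.
curves = {q: collapse_percentile(toy, q) for q in (0, 50, 100)}

print(curves[50])
```

Each entry in `curves` plays the role of one Curve in the NdOverlay: the dictionary key is the percentile and the series holds one value per year.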

Multiple key dimensions

Many HoloViews Element types support multiple kdims, including HeatMaps, Points, Scatter, Scatter3D, and Bars. Bars in particular allows you to lay out your data in groups, categories and stacks. By supplying the index of that dimension as a plotting option you can choose to lay out your data as groups of bars, categories in each group and stacks. Here we choose to lay out the trade surplus of each country with groups for each year, no categories, and stacked by country. Finally we choose to color the Bars for each item in the stack.


In [ ]:
%opts Bars [bgcolor='w' aspect=3 figure_size=450 show_frame=False]

In [ ]:
%%opts Bars [category_index=2 stack_index=0 group_index=1 legend_position='top' legend_cols=7 color_by=['stack']] (color=Palette('Dark2'))
macro.bars(['country', 'year'], 'trade')

Using the .select method we can pull out the data for just a few countries and specific years. We can also make more advanced use of Palettes.

Palettes can be customized by selecting only a subrange of the underlying cmap to draw the colors from. The Palette draws samples from the colormap using the supplied sample_fn, which by default just draws linear samples but may be overridden with any function that draws samples in the supplied ranges. By slicing the Set1 colormap we draw colors only from the upper half of the palette and then reverse it.


In [ ]:
%%opts Bars [padding=0.02 color_by=['group']] (alpha=0.6, color=Palette('Set1', reverse=True)[0.:.2])
countries = {'Belgium', 'Netherlands', 'Sweden', 'Norway'}
macro.bars(['country', 'year'], 'Unemployment').select(year=(1978, 1985), country=countries)
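A comparable selection in plain pandas combines a membership test with a range test. The sketch below uses made-up rows and treats both ends of the year range as inclusive, which is the pandas default for `between`; how .select treats the upper bound is not stated in this tutorial, so do not read the sketch as a statement about HoloViews:

```python
import pandas as pd

# Made-up unemployment figures for illustration only.
toy = pd.DataFrame({
    'country': ['Belgium', 'Belgium', 'Norway', 'Norway', 'Japan'],
    'year':    [1977, 1980, 1980, 1986, 1980],
    'unem':    [7.0, 8.0, 2.0, 2.5, 2.1],
})

countries = {'Belgium', 'Norway'}

# Keep rows whose country is in the set AND whose year lies in [1978, 1985].
mask = toy['country'].isin(countries) & toy['year'].between(1978, 1985)
selected = toy[mask]

print(selected)
```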

Combining heterogeneous data

Many HoloViews Elements support multiple key and value dimensions. A HeatMap may be indexed by two kdims, so we can visualize each of the economic indicators by year and country in a Layout. Layouts are useful for heterogeneous data you want to lay out next to each other. Because all HoloViews objects support the + operator, we can use np.sum to compose them into a Layout.

Before we display the Layout let's apply some styling. We'll suppress the value labels applied to a HeatMap by default and substitute a colorbar for them. Additionally we increase the number of xticks that are drawn and rotate them by 90 degrees to avoid overlapping. Flipping the y-axis ensures that the countries appear in alphabetical order. Finally we reduce some of the margins of the Layout and increase the size.


In [ ]:
%opts HeatMap [show_values=False xticks=40 xrotation=90 invert_yaxis=True]
%opts Layout [figure_size=150]

In [ ]:
hv.Layout([macro.heatmap(['year', 'country'], value)
           for value in macro.data.columns[2:]]).cols(2)
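Conceptually, a heat map indexed by two key dimensions is a grid with one key along each axis and the value in each cell. In plain pandas that grid can be sketched with a pivot; the `toy` data below is invented, and this is only an illustration of the data layout rather than of what HoloViews does internally:

```python
import pandas as pd

# Made-up trade figures for two hypothetical countries.
toy = pd.DataFrame({
    'year':    [1970, 1970, 1971, 1971],
    'country': ['A', 'B', 'A', 'B'],
    'trade':   [10, 20, 30, 40],
})

# Rows are countries, columns are years, cells hold the 'trade' value.
grid = toy.pivot(index='country', columns='year', values='trade')

print(grid)
```

Repeating the pivot with a different `values` column gives the grid for another indicator, which is the same loop the Layout above performs over the value columns.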

Another way of combining heterogeneous data dimensions is to map them to a multi-dimensional plot type. Scatter Elements for example support multiple vdims, which may be mapped onto the color and size of the drawn points in addition to the y-axis position.

As for the Curves above, we supply 'year' as the sole key dimension and rely on the DFrame to automatically convert the country to a map dimension, which we'll overlay. However, this time we select both GDP Growth and Unemployment, to be plotted as points. To get a sensible chart, we adjust the scaling_factor for the points to get a reasonable distribution in sizes and apply a categorical Palette so we can distinguish each country.


In [ ]:
%%opts Scatter [scaling_factor=1.4] (color=Palette('Set3') edgecolors='k')
gdp_unem_scatter = macro.scatter('year', ['GDP Growth', 'Unemployment'])
gdp_unem_scatter.overlay('country')

Since the DFrame treats all columns in the dataframe as kdims we can map any dimension against any other, allowing us to explore relationships between economic indicators, for example the relationship between GDP Growth and Unemployment, again colored by country.


In [ ]:
%%opts Scatter [size_index=1 scaling_factor=1.3] (color=Palette('Dark2'))
macro.scatter('GDP Growth', 'Unemployment').overlay('country')

Combining heterogeneous Elements

Since all HoloViews Elements are composable, we can generate complex figures just by applying the * operator. We'll simply reuse the GDP curves we generated earlier, combine them with the scatter points, which indicate the unemployment rate by size, and annotate the data with some descriptions of what happened economically in these years.


In [ ]:
%%opts Curve (color='k') Scatter [color_index=2 size_index=2 scaling_factor=1.4] (cmap='Blues' edgecolors='k')
macro_overlay = gdp_curves * gdp_unem_scatter
annotations = hv.Arrow(1973, 8, 'Oil Crisis', 'v') * hv.Arrow(1975, 6, 'Stagflation', 'v') *\
hv.Arrow(1979, 8, 'Energy Crisis', 'v') * hv.Arrow(1981.9, 5, 'Early Eighties\n Recession', 'v')
macro_overlay * annotations

Since we didn't map the country to some other container type, we get a widget allowing us to view the plot separately for each country, reducing the forest of curves we encountered before to manageable chunks.

While looking at the plots individually like this allows us to study trends for each country, we may want to lay out a subset of the countries side by side. We can easily achieve this by selecting the countries we want to view and then applying the .layout method. We'll also want to restore the aspect so the plots compose nicely.


In [ ]:
%opts Overlay [aspect=1]

In [ ]:
%%opts NdLayout [figure_size=100] Scatter [color_index=2] (cmap='Reds')
countries = {'United States', 'Canada', 'United Kingdom'}
(gdp_curves * gdp_unem_scatter).select(country=countries).layout('country')

Finally let's combine some plots for each country into a Layout, giving us a quick overview of each economic indicator for each country:


In [ ]:
%%opts Layout [fig_size=100] Scatter [color_index=2] (cmap='Reds')
(macro_overlay.relabel('GDP Growth', depth=1) +\
macro.curve('year', 'Unemployment', group='Unemployment') +\
macro.curve('year', 'trade', ['country'], group='Trade') +\
macro.points(['GDP Growth', 'Unemployment'], [])).cols(2)

That's it for this tutorial. If you want to see some more examples of using HoloViews with Pandas, look at the Pandas and Seaborn Tutorial.