If the hvtutorial
environment has been correctly created and activated using the instructions listed on the welcome page, the following imports should run and hv.extension('bokeh')
should present a small HoloViews logo:
In [ ]:
import numpy as np
import pandas as pd
import holoviews as hv
hv.extension('bokeh')
Here we import the NumPy and pandas data libraries with their standard abbreviations, plus HoloViews with its standard abbreviation hv
. The line reading hv.extension('bokeh')
loads and activates the bokeh plotting backend, so all visualizations will be generated using Bokeh. We will see how to use matplotlib instead of bokeh later in the tutorial Customizing Visual Appearance.
In short, elements are HoloViews' most basic, core primitives. All the various types of hv.Element
accept semantic metadata that allows their input data to be given an automatic, visual representation. Most importantly, element objects always preserve the raw data they are supplied.
In this notebook we will explore a number of different element types and examine some of the ways that elements can supplement the supplied data with useful semantic data. To choose your own types to use in the exercises, you can browse them all in the reference gallery.
All basic elements accept their data as a single, mandatory positional argument which may be supplied in a number of different formats, some of which we will now examine. A handful of annotation elements are exceptions to this rule, namely Arrow
, Text
, Bounds
, Box
and Ellipse
, as they require additional positional arguments.
To start with a simple example, we will sample a quadratic function $y=100-x^2$ at 21 different values of $x$ and wrap that data in a HoloViews element:
In [ ]:
xs = [i for i in range(-10,11)]
ys = [100-(x**2) for x in xs]
simple_curve = hv.Curve((xs,ys))
simple_curve
Here we supplied two lists of values as a tuple to hv.Curve
, assigned the result to the attribute simple_curve
, and let Jupyter display the object using its default visual representation. As you can see, that default visual representation is a Bokeh plot, which is automatically generated by HoloViews when Jupyter requests it. But simple_curve
itself is just a wrapper around your data, not a plot, and you can choose other representations that are not plots. For instance, printing the object will give you a purely textual representation instead:
In [ ]:
print(simple_curve)
The textual representation indicates that this object is a continuous mapping from x
to y
, which is how HoloViews knew to render it as a continuous curve. You can also access the full original data if you wish:
In [ ]:
#simple_curve.data
If you uncomment that line, you should see the original data values, though in some cases like this one the data has been converted to a better format (a Pandas dataframe instead of Python lists).
There are a number of similar elements to Curve
such as Area
and Scatter
, which you can try out for yourself in the exercises.
In [ ]:
# Exercise: Try switching hv.Curve with hv.Area and hv.Scatter
In [ ]:
# Optional:
# Look at the .data attribute of the elements you created to see the raw data (as a pandas DataFrame)
Wrapping your data (xs
and ys
) here as a HoloViews element is sufficient to make it visualizable, but there are many other aspects of the data that we can capture to convey more about its meaning to HoloViews. For instance, we might want to specify what the x-axis and y-axis actually correspond to, in the real world. Perhaps this parabola is the trajectory of a ball thrown into the air, in which case we could declare the object as:
In [ ]:
trajectory = hv.Curve((xs,ys), kdims=['distance'], vdims=['height'])
trajectory
Here we have added semantic information about our data to the Curve
element. Specifically, we told HoloViews that the kdim
or key dimension of our data corresponds to the real-world independent variable ('distance'), and the vdim
or value dimension 'height' is the real-world dependent variable. Even though the additional information we provided is about the data, not directly about the plot, HoloViews is designed to reveal the properties of your data accurately, and so the axes now update to show what these dimensions represent.
In [ ]:
# Exercise: Take a look at trajectory.vdims
The type of an element is a declaration of important facts about your data, which gives HoloViews the appropriate hint required to generate a suitable visual representation from it. For instance, calling it a Curve
is a declaration from the user that the data consists of samples from an underlying continuous function, which is why HoloViews plots it as a connected object. If we convert to an hv.Scatter
object instead, the same set of data will show up as separated points, because "Scatter" does not make an assumption that the data is meant to be continuous:
In [ ]:
hv.Scatter(simple_curve)
Casting the same data between different Element types in this way is often useful as a way to see your data differently, particularly if you are not certain of a single best way to interpret the data. Casting preserves your declared metadata as much as possible, propagating your declarations from the original object to the new one.
In [ ]:
# How do you predict the representation for hv.Scatter(trajectory) will differ from
# hv.Scatter(simple_curve) above? Try it!
In [ ]:
# Also try casting the trajectory to an area then back to a curve.
The curve above was constructed from a list of x-values and a list of y-values. Next we will create an element using an entirely different datatype, namely a NumPy array:
In [ ]:
x = np.linspace(0, 10, 500)
y = np.linspace(0, 10, 500)
xx, yy = np.meshgrid(x, y)
arr = np.sin(xx)*np.cos(yy)
image = hv.Image(arr)
As above, we know that this data was sampled from a continuous function, but this time the data is mapping from two key dimensions, so we declare it as an hv.Image
object. As you might expect, an Image
object is visualized as an image by default:
In [ ]:
image
In [ ]:
# Exercise: Try visualizing different two-dimensional arrays.
# You can try a new function entirely or simple modifications of the existing one
# E.g., explore the effect of squaring and cubing the sine and cosine terms
In [ ]:
# Optional: Try supplying appropriate labels for the x- and y- axes
# Hint: The x,y positions are how you *index* (or key) the array *values* (so x and y are both kdims)
In addition to basic Python datatypes and xarray and NumPy array types, HoloViews elements can be passed tabular data in the form of pandas DataFrames:
In [ ]:
economic_data = pd.read_csv('../data/macro.csv')
economic_data.tail()
Let's build an element that helps us understand how the percentage growth in US GDP varies over time. As our dataframe contains GDP growth data for lots of countries, let us select the United States from the table and create a Curve
element from it:
In [ ]:
US_data = economic_data[economic_data['country'] == 'United States'] # Select data for the US only
US_data.tail()
In [ ]:
growth_curve = hv.Curve(US_data, kdims=['year'], vdims=['growth'])
growth_curve
In this case, declaring the kdims
and vdims
does not simply declare the axis labels, it allows HoloViews to discover which columns of the data should be used from the dataframe for each of the axes.
In [ ]:
# Exercise: Plot the unemployment (unem) over year
In this example, the simplistic axis labels are starting to get rather limiting. Changing the kdims
and vdims
is no longer trivial either, as they need to match the column names in the dataframe. Is the only solution to rename the columns in our dataframe to something more descriptive but more awkward to type?
Luckily, no. The recommendation is that you continue to use short, programmer and pandas-friendly, tab-completeable column names as these are also the most convenient dimension names to use with HoloViews.
What you should do instead is set the dimension labels, using the fact that dimensions are full, rich objects behind the scenes:
In [ ]:
gdp_growth = growth_curve.redim.label(growth='GDP growth')
gdp_growth
With the redim
method, we have associated a dimension label with the growth
dimension, resulting in a new element called gdp_growth
(you can check for yourself that growth_curve
is unchanged). Let's look at what the new dimension contains:
In [ ]:
gdp_growth.vdims
In [ ]:
# Exercise: Use redim.label to give the year dimension a better label
The redim
utility lets you easily change other dimension parameters, and as an example let's give our GDP growth dimension the appropriate unit:
In [ ]:
gdp_growth.redim.unit(growth='%')
In [ ]:
# Exercise: Use redim.unit to give the year dimension a better unit
# For instance, relabel to 'Time' then give the unit as 'year'
Viewing a single element at a time often conveys very little information for the space used. In this section, we introduce the two composition operators +
and *
to build Layout
and Overlay
objects.
Earlier on we were casting a parabola to different element types. Viewing the different types was awkward, wasting lots of vertical space in the notebook. What we will often want to do is view these elements side by side:
In [ ]:
layout = trajectory + hv.Scatter(trajectory) + hv.Area(trajectory) + hv.Spikes(trajectory)
layout.cols(2)
What we have created with the +
operator is an hv.Layout
object (with a hint that a two-column layout is desired):
In [ ]:
print(layout)
Now let us build a new layout by selecting elements from layout
:
In [ ]:
layout.Curve.I + layout.Spikes.I
We see that a Layout
lets us pick component elements via two levels of tab-completable attribute access. Note that by default the type of the element defines the first level of access and the second level of access automatically uses Roman numerals (because Python identifiers cannot start with numbers).
These two levels correspond to another type of semantic declaration that applies to the elements directly (rather than their dimensions), called group
and label
. Specifically, group
allows you to declare what kind of thing this object is, while label
allows you to label which specific object it is. What you put in those declarations, if anything, will form the title of the plot:
In [ ]:
cannonball = trajectory.relabel('Cannonball', group='Trajectory')
integral = hv.Area(trajectory).relabel('Filled', group='Trajectory')
labelled_layout = cannonball + integral
labelled_layout
In [ ]:
# Exercise: Try out the tab-completion of labelled_layout to build a new layout swapping the position of these elements
In [ ]:
# Optional: Try using two levels of dictionary-style access to grab the cannonball trajectory
Layout places objects side by side, allowing it to collect (almost!) any HoloViews objects that you want to indicate are related. Another operator *
allows you to overlay elements into a single plot, if they live in the same space (with matching dimensions and similar ranges over those dimensions). The result of *
is an Overlay
:
In [ ]:
trajectory * hv.Spikes(trajectory)
In [ ]:
# Exercise: Make an overlay of the Spikes object from layout on top of the filled trajectory area of labelled_layout
One thing that is specific to Overlays is the use of color cycles to automatically differentiate between elements of the same type and group
:
In [ ]:
tennis_ball = cannonball.clone((xs, 0.5*np.array(ys)), label='Tennis Ball')
cannonball + tennis_ball + (cannonball * tennis_ball)
Here we use the clone
method to make a shallower tennis-ball trajectory: the clone
method create a new object that preserves semantic metadata while allowing overrides (in this case we override the input data and the label
).
As you can see, HoloViews can determine that the two overlaid curves will be distinguished by color, and so it also provides a legend so that the mapping from color to data is clear.
In [ ]:
# Optional Exercise:
# 1. Create a thrown_ball curve with half the height of tennis_ball by cloning it and assigning the label 'Thrown ball'
# 2. Add thrown_ball to the overlay
In [ ]:
full_trajectory = cannonball.redim.label(distance='Horizontal distance', height='Vertical height')
ascending = full_trajectory[-10:1].relabel('ascending')
descending = cannonball.select(distance=(0,11.)).relabel('descending')
ascending * descending
Note that the slicing in HoloViews is done in the continuous space of the dimension and not in the integer space of individual data samples. In this instance, the slice is over the distance
dimension and we can see that the slicing semantics follow the usual Python convention of an inclusive lower bound and an exclusive upper bound.
This example also illustrates why we keep simple identifiers for dimension names and reserve longer descriptions for the dimension labels: certain methods such as the select
method shown above accept dimension names as keywords.
Later in the tutorial, we will see how elements and the principles of composition extend to containers (such as ) which make data exploration quick, easy and interactive. Before we examine the container types, we will look at how to customize the appearance of elements, change the plotting extension and specify output formats.
For a quick demonstration related to what we will be covering, hit the kernel restart button (⟳) in the toolbar for this notebook, change hv.extension('bokeh')
to hv.extension('matplotlib')
in the first cell and rerun the notebook!