In the Introductory Tutorial and the Element and Container overviews you can see how HoloViews allows you to wrap your data into annotated Elements that can be composed easily into complex visualizations.
In this tutorial, we will see how all of the data you want to examine can be embedded as Elements into a nested, sparsely populated, multi-dimensional data structure that gives you maximum flexibility to slice, select, and combine your data for visualization and analysis. With HoloViews objects, you can visualize your multi-dimensional data as animations, images, charts, and parameter spaces with ease, allowing you to quickly discover the important features interactively and then prepare corresponding plots for reports, publications, or web pages.
We will start with the very powerful HoloMap container, and then show how HoloMap objects can be nested inside the other Container objects to make all of your data easily available.
In [ ]:
import numpy as np
import holoviews as hv
hv.notebook_extension()
%output holomap='auto'
%timer start
To start, here are some general imports we will be using, mainly from the Python standard library:
In [ ]:
import json
import matplotlib.dates as md
try:
    from urllib2 import urlopen          # Python 2
except ImportError:
    from urllib.request import urlopen   # Python 3
from io import BytesIO
Python users will be familiar with dictionaries as a way to collect data together in a conveniently accessible manner. Unlike NumPy arrays, dictionaries are sparse and heterogeneous and do not have to be declared with a fixed size.
HoloMaps are a core part of HoloViews and are essential for generating animated visualizations. They also provide highly useful ways to manipulate your data for display and have several useful properties:
- HoloMaps are ordered (internally they use an OrderedDict, or, if installed, the optimized cyordereddict).
- HoloMaps let you index your data with an arbitrary number of dimensions (e.g. date and batch-number), not just one as with a Python dictionary.
- HoloMaps let you select portions of your data by slicing each available dimension independently.
- HoloMaps also provide ways to transform your data by sampling, reducing, and collapsing the data Elements.
- Dimensions in a HoloMap may be mapped onto parameter spaces for easy visualization of a portion of your multidimensional data space.
In this notebook we will be exploring weather data from Hurricane Sandy, which swept across the Caribbean and the Eastern US seaboard in late October 2012. We will scrape our data from various online sources, exploring not only how we can quickly generate animations using HoloMaps, but also how we can deal with very high-dimensional data.
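A plain Python dictionary with tuple keys can mimic several of the HoloMap properties listed above, which may help make them concrete. The following is a minimal pure-Python sketch, not HoloViews code: the ToyMap class, its select_dim method, and the sample keys are all hypothetical and only illustrate ordering, multi-dimensional keys, and selecting along one dimension independently.

```python
from collections import OrderedDict

class ToyMap:
    """Hypothetical stand-in for a HoloMap (NOT the HoloViews implementation):
    an ordered mapping from n-dimensional tuple keys to values."""

    def __init__(self, items, kdims):
        self.kdims = list(kdims)
        # Keep entries sorted by key, mirroring the ordered behaviour of HoloMaps
        self.data = OrderedDict(sorted(items))

    def select_dim(self, dim, values):
        """Keep only entries whose value along `dim` is in `values`;
        every other dimension is left untouched."""
        idx = self.kdims.index(dim)
        kept = [(k, v) for k, v in self.data.items() if k[idx] in values]
        return ToyMap(kept, self.kdims)

# Illustrative keys indexed by two dimensions, e.g. date and batch number
m = ToyMap([(('2012-10-29', 2), 'c'), (('2012-10-28', 1), 'a'),
            (('2012-10-28', 2), 'b')], kdims=['date', 'batch'])
print(list(m.data))                            # keys come back in sorted order
print(list(m.select_dim('batch', {2}).data))   # filter along one dimension only
```

Note that the mapping stays sparse: there is no entry for ('2012-10-29', 1), and nothing requires one.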
We've already downloaded and cropped a number of frames of the satellite-imagery-based wind speed models from NASA and cached them on the HoloViews website. If you want to select a different cropping region or sample more frames you can find out how to get the raw data directly from NASA in this Wiki entry. For now, we'll just get the preprocessed data:
In [ ]:
iobuffer = BytesIO(urlopen('http://assets.holoviews.org/hurricane.npz').read())
data = np.load(BytesIO(iobuffer.getvalue()))
dates = data['dates']
surface_data, nearsrfc_data = data['surface'], data['near_surface']
Now that we have loaded the data we can store the raw image arrays as RGB Elements and create a HoloMap. We begin by declaring the key dimensions (kdims) of the HoloMap, which determine how the data will be stored and thus how you will be able to index and select it most easily. In this case we will index our HoloMap both by the frame number and the date:
In [ ]:
date_dim = hv.Dimension("Date", value_format=md.DateFormatter('%b %d %Y %H:%M UTC'), type=float)
kdims = ['Frame', date_dim]
Dimensions can be specified as a simple string, or as a Dimension object with additional information that gives HoloViews hints about how to format and display values along that Dimension.
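To illustrate what "additional information" on a dimension buys you, here is a small pure-Python sketch of a dimension that carries a unit and an optional value formatter. ToyDimension and its field names are hypothetical and are not hv.Dimension; the "name (unit)" label convention is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ToyDimension:
    """Hypothetical sketch of a dimension with display metadata
    (NOT hv.Dimension; field names are assumptions)."""
    name: str
    unit: Optional[str] = None
    value_format: Optional[Callable[[float], str]] = None

    def label(self):
        # Append the unit in parentheses when one is given
        return f"{self.name} ({self.unit})" if self.unit else self.name

    def format_value(self, value):
        # Fall back to str() when no formatter was supplied
        return self.value_format(value) if self.value_format else str(value)

height = ToyDimension('Layer Height', unit='hPa')
frame = ToyDimension('Frame')  # bare name, like passing the plain string 'Frame'
pct = ToyDimension('Humidity', unit='%', value_format=lambda v: f"{v:.1f}")
print(height.label(), frame.label(), pct.format_value(61.54))
```

A plain string only supplies the name; everything else falls back to defaults, which is why the date dimension in the cell above is declared as a full object with a formatter.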
Creating a HoloMap is just like creating a Python dictionary: you can pass either a dictionary object or a list of (key, value) pairs. Each key can be a single value for a one-dimensional HoloMap, or a tuple for multiple Dimensions.
In [ ]:
srfc = [((frame, date), hv.RGB(surface_data[..., frame],
                               bounds=(0, 0) + surface_data.shape[0:2][::-1], xdensity=1,
                               label='Hurricane Sandy', group='Surface Wind Speed'))
        for frame, date in enumerate(dates)]
nsrfc = [((frame, date), hv.RGB(nearsrfc_data[..., frame],
                                bounds=(0, 0) + nearsrfc_data.shape[0:2][::-1], xdensity=1,
                                label='Hurricane Sandy', group='Near Surface Wind Speed'))
         for frame, date in enumerate(dates)]
surface_wind = hv.HoloMap(srfc, kdims=kdims)
nearsurface_wind = hv.HoloMap(nsrfc, kdims=kdims)
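The bounds argument in the cell above is built with a small tuple trick: shape[0:2] gives (rows, columns), reversing it yields (width, height), and prefixing (0, 0) produces (left, bottom, right, top) extents in which one unit corresponds to one pixel. The sketch below reproduces only that arithmetic in plain Python; pixel_bounds is a hypothetical helper and the shape is a made-up example, not the real size of the hurricane arrays.

```python
def pixel_bounds(shape):
    """Return (left, bottom, right, top) extents covering one unit per pixel,
    given an array shape whose first two entries are (rows, columns).
    Hypothetical helper that mirrors the expression used in the cell above."""
    rows_cols = tuple(shape[0:2])   # (rows, columns) == (height, width)
    width_height = rows_cols[::-1]  # reverse to (width, height)
    return (0, 0) + width_height

# Made-up shape: 300 rows x 400 columns x 3 colour channels x 14 frames
print(pixel_bounds((300, 400, 3, 14)))  # (0, 0, 400, 300)
```

Because the bounds are in pixel units, the x and y ranges used later in this tutorial (such as 150:350) refer to pixel coordinates of the images.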
Not only is the HoloMap constructor similar to that of a Python dictionary, HoloMaps also provide __getitem__, __setitem__, update, get, pop, keys, values and items just as normal dictionaries do. In addition, HoloMap provides a .clone method that returns a copy of the HoloMap containing the same data, in which the data and any of the parameters can be overridden.
A HoloMap must be uniform in the type, group, label, and key dimensions of its Elements, because it defines a parameter space of Elements varying only in their n-dimensional index and data. This also allows a HoloMap to inherit the group and label of its Elements, which we can see by printing surface_wind:
In [ ]:
print(surface_wind)
Since the RGB elements we have created are not square, we can use the %opts line magic to declare that RGB Elements should be displayed with an aspect ratio of 1.0; this setting applies to all subsequent cells:
In [ ]:
%opts RGB [aspect=1]
To get a quick glimpse at the data we have collected, you can access the .last property, which returns the last Element in the HoloMap:
In [ ]:
surface_wind.last
If you are unsure how large the HoloMap is or want to know a bit more about the Dimension ranges, you can use the .info property. For a HoloMap, .info lists the dimensions, the ranges of the key dimensions on the HoloMap, and even the deep_dimensions, i.e. any Dimensions contained within the Elements of the HoloMap.
In [ ]:
surface_wind.info
Having found out a bit about the HoloMap, we can look at a few frames, starting by selecting just the first three:
In [ ]:
surface_wind[0:3]
Because HoloMaps support the full slicing semantics, including steps, we can do things like select every second frame in the second half of the animation:
In [ ]:
surface_wind[7:14:2]
Although the Frame values here happen to be whole numbers, these slices are not by positional index as they would be for a NumPy array. A HoloMap, like all other Dimensioned objects (i.e., most HoloViews components), is always sliceable by the values along its key dimensions, in whatever units they are expressed.
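The difference between positional and value-based slicing shows up as soon as the keys are not consecutive integers. This pure-Python sketch assumes half-open [start, stop) bounds like Python slices, and treats step as taking every step-th matching key; both are assumptions of the sketch, and slice_by_value is a hypothetical helper, not part of HoloViews.

```python
def slice_by_value(mapping, start=None, stop=None, step=None):
    """Select entries whose sorted keys satisfy start <= key < stop,
    then keep every `step`-th of them. A None bound is unbounded.
    Hypothetical sketch of value-based slicing, not HoloViews internals."""
    keys = sorted(k for k in mapping
                  if (start is None or k >= start) and (stop is None or k < stop))
    return {k: mapping[k] for k in keys[::step]}

# Hypothetical keys expressed in hours since some reference time
frames = {0: 'a', 6: 'b', 12: 'c', 18: 'd', 24: 'e'}
values = list(frames.values())
print(values[0:3])                        # positional: the first three entries
print(slice_by_value(frames, 0, 3))       # by value: only key 0 lies in [0, 3)
print(slice_by_value(frames, 6, 25, 2))   # keys 6..24, every second one
```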
Apart from simple slicing, you can also select Elements by passing the Dimension values as a set. Since our Elements are guaranteed to be uniform, a HoloMap also allows deep indexing into the key dimensions of its Elements, which lets us easily select a subregion of each satellite frame (where : alone selects the entire range of that dimension):
In [ ]:
surface_wind[{0, 2, 3, 5}, :, 150:350, 50:250]
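Conceptually, the indexing in the cell above does two things: it keeps only the frames whose key is in the set, and it crops every remaining frame to the same region. A rough pure-Python sketch of those two steps follows, with nested lists standing in for images; crop_frames and the sample data are hypothetical. Unlike HoloViews, which slices x and y in the Element's own coordinates, this sketch simply crops by row and column index.

```python
def crop_frames(frames, keys, rows, cols):
    """Keep frames whose key is in `keys` and crop each one identically.
    Hypothetical sketch, not the HoloViews deep-indexing implementation.

    frames: dict mapping key -> 2D nested list (rows of pixel values)
    rows, cols: (start, stop) index ranges applied to every kept frame
    """
    r0, r1 = rows
    c0, c1 = cols
    return {k: [row[c0:c1] for row in img[r0:r1]]
            for k, img in sorted(frames.items()) if k in keys}

# Six tiny 4x4 "images" whose pixel values encode frame, row and column
frames = {f: [[100 * f + 10 * r + c for c in range(4)] for r in range(4)]
          for f in range(6)}
subset = crop_frames(frames, {0, 2, 3, 5}, rows=(1, 3), cols=(0, 2))
print(sorted(subset))   # the selected frame keys
print(subset[2])        # the cropped 2x2 region of frame 2
```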
Finally, let's put together everything we've learned about indexing and go one step further. So far we've been looking at just the surface wind speed plots, but now let's combine them with the near-surface plots into a Layout. Just like Elements, HoloMaps can be grouped into a Layout using the + operator. Since the Layout is a Tree-based data structure, it doesn't have any Dimensions of its own and we can't use __getitem__. Instead we can use .select, which is available on all HoloViews components. The .select method may be supplied with any number of dimension and value-slice pairs as keyword arguments. Slices may be supplied either as explicit slice objects or as tuples.
In [ ]:
(surface_wind + nearsurface_wind).select(Frame=slice(0, 10, 2), x=(150,350), y=(50, 250))
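The keyword form of .select can be imitated on a dictionary with named tuple keys. In this hypothetical sketch, select_named accepts either a slice object or a (start, stop[, step]) tuple per dimension, mirroring the two forms described above; it simply filters keys by value, raises ValueError for unknown dimension names, and is not how HoloViews implements .select.

```python
def _to_slice(spec):
    # Accept either an explicit slice or a (start, stop[, step]) tuple
    return spec if isinstance(spec, slice) else slice(*spec)

def select_named(mapping, kdims, **constraints):
    """Filter a dict with tuple keys by per-dimension value ranges.
    Hypothetical sketch; unknown names raise ValueError via list.index."""
    selected = dict(mapping)
    for dim, spec in constraints.items():
        s = _to_slice(spec)
        idx = kdims.index(dim)
        # Distinct values along this dimension that fall in [start, stop)
        in_range = sorted({k[idx] for k in selected
                           if (s.start is None or k[idx] >= s.start)
                           and (s.stop is None or k[idx] < s.stop)})
        allowed = set(in_range[::s.step])
        selected = {k: v for k, v in selected.items() if k[idx] in allowed}
    return selected

# Hypothetical keys: (Frame, Height)
data = {(f, h): f'{f}@{h}' for f in range(6) for h in (850, 1000)}
print(sorted(select_named(data, ['Frame', 'Height'], Frame=slice(0, 5, 2))))
print(sorted(select_named(data, ['Frame', 'Height'],
                          Frame=(1, None), Height=(900, None))))
```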
HoloMaps provide the starting point to display your data in any number of ways. While HoloMap dimensions are displayed as frames of an animation by default, you can easily transform a HoloMap into another n-D component type, such as an NdLayout, GridSpace, or NdOverlay, via the .layout, .grid, and .overlay methods.
Each of these methods groups the data along the values of the dimensions you specify and returns the newly grouped object. These methods are all convenience wrappers around the .groupby method, which can split a HoloMap into whatever container and group types you specify.
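The idea behind .groupby can be sketched with plain dictionaries: the chosen dimensions form the outer container, and the remaining dimensions are collected into an inner mapping for each outer key. group_by_dims is a hypothetical helper that illustrates only this regrouping; it does not create NdLayout or GridSpace objects, and the day numbers stand in for real dates.

```python
from collections import OrderedDict

def group_by_dims(mapping, kdims, outer):
    """Split {key_tuple: value} into {outer_key: {inner_key: value}}.
    Hypothetical sketch of the regrouping behind .groupby."""
    outer_idx = [kdims.index(d) for d in outer]
    inner_idx = [i for i in range(len(kdims)) if i not in outer_idx]
    groups = OrderedDict()
    for key in sorted(mapping):
        okey = tuple(key[i] for i in outer_idx)
        ikey = tuple(key[i] for i in inner_idx)
        groups.setdefault(okey, OrderedDict())[ikey] = mapping[key]
    return groups

# Hypothetical (Date, Layer Height) keys; dates shortened to day numbers
data = {(d, h): f'img {d}/{h}' for d in (28, 29) for h in (850, 1000)}
grouped = group_by_dims(data, ['Date', 'Layer Height'], outer=['Date'])
print(list(grouped))          # outer keys, one per date
print(dict(grouped[(29,)]))   # inner mapping indexed by Layer Height
```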
Before we can start grouping, however, we hit a snag in our indexing: the Frame and Date dimensions we specified above are redundant, because for each frame there is only one corresponding date. As a result, any groupby operation will fail. We can easily solve this problem by reindexing the HoloMap:
In [ ]:
print("Dimensions before reindex: %s" % surface_wind.dimensions('key', label=True))
surface_reindexed = surface_wind.reindex(['Date'])
print("Dimensions after reindex: %s" % surface_reindexed.dimensions('key', label=True))
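Dropping a key dimension is only safe when the remaining dimensions still identify every entry uniquely, which holds here because each Frame maps to a single Date. The hypothetical reindex_keys helper below sketches that check in plain Python. It illustrates the idea rather than the HoloViews reindex implementation, and its choice to raise on collisions is a design decision of the sketch. The dates are placeholder numbers.

```python
def reindex_keys(mapping, kdims, keep):
    """Project tuple keys onto the dimensions in `keep`, refusing to
    silently merge entries that would end up sharing the same key.
    Hypothetical sketch, not HoloViews' reindex."""
    idx = [kdims.index(d) for d in keep]
    result = {}
    for key, value in mapping.items():
        new_key = tuple(key[i] for i in idx)
        if new_key in result:
            raise ValueError(f'dropping dimensions would merge entries at {new_key}')
        result[new_key] = value
    return result

# Frame and Date are redundant: each frame has exactly one (placeholder) date
frames = {(0, 100.0): 'a', (1, 100.25): 'b', (2, 100.5): 'c'}
print(reindex_keys(frames, ['Frame', 'Date'], ['Date']))
```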
Now that we have removed the redundant Frame Dimension, we can create an NdLayout indexed just by the date:
In [ ]:
surface_reindexed[::4].layout('Date')
In [ ]:
%output size=250
For a more compact representation, you may also create a GridSpace using the .grid method. In a GridSpace, each dimension maps onto an axis, which limits it to a maximum of two Dimensions, but redundant information such as shared axes and axis labels is suppressed. To avoid overlapping tick labels, we will also rotate the x-axis tick labels by a few degrees.
In [ ]:
%opts GridSpace [xrotation=10]
surface_reindexed[::2].grid('Date')
Now how do we go about combining the two HoloMaps into a single GridSpace? First, let us reindex the near-surface data as well.
In [ ]:
nearsurface_reindexed = nearsurface_wind.reindex(['Date'])
The two HoloMaps we have represent wind speed at different heights. Meteorologists state the height of different air masses by their pressure. The near-surface imagery is at 850 hPa, while the surface level images are at 1000 hPa.
In [ ]:
height = hv.Dimension('Layer Height', unit='hPa')
We can add this Dimension to the HoloMaps via the add_dimension method, which accepts the new dimension, the index position at which to insert that dimension, and the dimension value as arguments:
In [ ]:
surface = surface_reindexed.add_dimension(height, 1, 1000)
near_surface = nearsurface_reindexed.add_dimension(height, 1, 850)
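In terms of keys, add_dimension inserts the same constant value into every key at the requested position. The following pure-Python sketch shows that transformation on tuple keys; add_key_dim is a hypothetical helper, not the HoloViews method, and the dates and values are placeholders.

```python
def add_key_dim(mapping, kdims, dim, index, value):
    """Return new (kdims, mapping) with `dim` inserted at position `index`
    of every key, taking the same constant `value` for all entries.
    Hypothetical sketch, not HoloViews' add_dimension."""
    new_kdims = kdims[:index] + [dim] + kdims[index:]
    new_mapping = {key[:index] + (value,) + key[index:]: v
                   for key, v in mapping.items()}
    return new_kdims, new_mapping

# Placeholder single-dimension keys (Date,) after reindexing
surface = {(100.0,): 'srfc0', (100.25,): 'srfc1'}
kdims, surface_h = add_key_dim(surface, ['Date'], 'Layer Height', 1, 1000)
print(kdims)
print(surface_h)
```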
Now we can combine the two HoloMaps by cloning one of them and updating the clone with the other:
In [ ]:
combined_hurricane = surface.clone()
combined_hurricane.update(near_surface)
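Cloning and updating is analogous to copying and updating a dictionary, and the merge only keeps both layers because the two HoloMaps now differ in their Layer Height values. The plain-dict sketch below, with placeholder keys and values, shows what happens otherwise: with identical keys, dict.update overwrites the earlier entries. Treating HoloMap.update as behaving the same way is an assumption of this analogy.

```python
# Placeholder (Date, Layer Height) keys; strings stand in for RGB frames
surface = {(100.0, 1000): 'srfc0', (100.25, 1000): 'srfc1'}
near_surface = {(100.0, 850): 'near0', (100.25, 850): 'near1'}

combined = dict(surface)       # analogous to surface.clone()
combined.update(near_surface)  # analogous to combined_hurricane.update(...)
print(len(combined))           # both layers survive: 4 entries

# Without the Layer Height dimension the keys collide and update overwrites
flat = {(100.0,): 'srfc0', (100.25,): 'srfc1'}
flat.update({(100.0,): 'near0', (100.25,): 'near1'})
print(len(flat), flat[(100.0,)])  # only 2 entries; the surface data is gone
```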
Using .info we can confirm that the two HoloMaps have been successfully merged.
In [ ]:
combined_hurricane.info
Merging multiple HoloMaps in this step-by-step way would be cumbersome, which is why the Collator object (another instance of Dimensioned) has been provided. Collator will be described in the Columnar Data tutorial.
Now that both Date and Layer Height are Dimensions on the HoloMap, we have various options for laying out our data. We can simply map each Dimension to an axis of a GridSpace:
In [ ]:
combined_hurricane.select(Date=(None, None, 2)).grid(['Date', 'Layer Height'])
Or we can choose to animate one Dimension but not the other:
In [ ]:
%output size=300
combined_hurricane.grid(['Date'])[::2]
Another powerful property of HoloMaps is that when they are combined into a Layout via the + operator, their Dimensions are coordinated across each frame. This allows you to handle missing values, because HoloViews will blank out any frames without matching dimension values when combining overlapping dimensions:
In [ ]:
%output size=100
surface_wind[0:4] + nearsurface_wind[3:6]
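The blanking behaviour can be pictured as aligning both maps on the union of their keys and leaving a gap wherever one side has no entry. This plain-Python sketch marks those blank panels with None for frame ranges like the ones in the cell above, treating the slices [0:4] and [3:6] as frames 0-3 and 3-5. align_frames is a hypothetical helper, not part of HoloViews.

```python
def align_frames(*maps):
    """Align several {key: value} dicts on the sorted union of their keys,
    filling in None where a map has no entry for a key.
    Hypothetical sketch of how a Layout lines up frames."""
    all_keys = sorted(set().union(*maps))
    return [(k, tuple(m.get(k) for m in maps)) for k in all_keys]

left = {f: f'surface {f}' for f in range(0, 4)}   # frames 0-3
right = {f: f'near {f}' for f in range(3, 6)}     # frames 3-5
aligned = align_frames(left, right)
for key, panels in aligned:
    print(key, panels)
```

Only frame 3 has data on both sides; every other frame shows one blank panel.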
This feature becomes particularly important when combining data from different sources that share common dimensions but may not be sampled at precisely the same values. To demonstrate this, let's load some additional data.
Using the timestamps, we can look up weather data about different cities via the REST API provided by openweathermap.org. Here we'll actually use a cached copy of this data from the HoloViews website, so that it loads more quickly and more reliably for the purposes of this tutorial.
First we define the new dimensions we will be adding:
In [ ]:
temp_dim = hv.Dimension('Temperature', unit="$^o$C")
humidity = hv.Dimension('Humidity', unit='%')
pressure = hv.Dimension('Pressure', unit='hPa')
wind = hv.Dimension('Wind Speed', unit='km/h')
Now we can load the data into a HoloMap of ItemTables. ItemTables simply associate a value with each of the value dimensions we defined above. We will collect data for a few cities along the storm's path (two on the US East Coast and one in Cuba) at all the timestamps associated with the satellite wind imagery. As with the satellite imagery, we've prefetched this data; to find out how to do that yourself, just go here.
In [ ]:
vdims = [temp_dim, humidity, pressure, wind]
cities = ['New York', 'Washington DC', 'Santiago de Cuba']
main_cols = ['temp', 'humidity', 'pressure']
tables = hv.HoloMap(kdims=['City', date_dim])
iobuffer = BytesIO(urlopen('http://assets.holoviews.org/weather.json').read())
weather_json = json.loads(iobuffer.read().decode())
for entry in weather_json:
    city, date = entry['key']
    # Pair each value dimension with the corresponding recorded value
    tables[str(city), date] = hv.ItemTable(list(zip(vdims, entry['value'])))
Since the two datasets share the same timestamps we can now combine them into a Layout.
In [ ]:
heterogenous = (surface_reindexed + tables).select(Date=(None,None,2))
heterogenous
The Date dimension in the satellite data and the City and Date dimensions on the weather data combine seamlessly to give us this multi-dimensional selection widget. Since the satellite data is independent of the City, it stays fixed when you select a different city, while the Date, which is present on both, controls both components of the plot. You can play with the sliders a little to explore the data; once you've selected a slider, you can also press R and P to play the animation backward and forward, respectively.
Now let's put together some of what we've learned. Using slicing, we can zoom in on each city in the satellite imagery and place the ItemTable containing its weather data next to it. Then we'll arrange the layout in three columns by calling the .cols method:
In [ ]:
%%output size=120
(surface_reindexed[:, 170:270, 200:300] +\
nearsurface_reindexed[:, 170:270, 200:300] +\
tables.select(City='New York').reindex(['Date']).relabel(label='New York', depth=1) +\
surface_reindexed[:, 150:250, 150:250] +\
nearsurface_reindexed[:, 150:250, 150:250] +\
tables.select(City='Washington DC').reindex(['Date']).relabel(label='Washington DC', depth=1) +\
surface_reindexed[:, 140:240, 50:150] +\
nearsurface_reindexed[:, 140:240, 50:150] +\
tables.select(City='Santiago de Cuba').reindex(['Date']).relabel(label='Santiago de Cuba', depth=1)).cols(3)
Now that you see how to assemble your data into an organization that lets you explore and analyze it, you can study the various Container types that make this possible, especially the section on nested containers. And then just try it out!
In [ ]:
%timer