To explain how to select and transform HoloViews Elements to summarize them or to rearrange the data, we first have to explore what kind of data each Element can hold. Different Element types represent discrete and continuous spaces of different dimensionality.

Discrete samples in continuous spaces

1D: Curve, Scatter, ErrorBars, Spread

These Elements usually represent a discretely sampled, continuous independent variable plotted against a discrete or continuously sampled dependent variable.

2D: Raster, Image, RGB, HSV, Surface

These Elements represent discrete samples in a 2D continuous space, allowing slicing, indexing and sampling.

Binned or Categorical data:

These types usually represent bins or categorical data in a one- or two-dimensional space.

1D: Histogram, Bars

2D: HeatMap, QuadMesh

Raw coordinates in continuous space:

These Elements contain data that has not been discretely sampled or binned; instead they merely represent coordinates in a 1D, 2D or 3D space.

1D: Distribution

2D: Points, Path, Contours, Polygons

3D: Scatter3D

Finally, there is the Table Element, which supports n-dimensional data of any kind.

Basic operations:

Based on this rough grouping we can define which operations are valid on the data. In this Tutorial we will look at four types of operation:

  • slice : Selects a contiguous portion of the data.
  • index : Selects a single data value.
  • table/dframe : Converts any Element or UniformNdMapping type into a Table or pandas dataframe.
  • sample : Allows sampling of sampled, binned and categorical data. Can also generate regular subsamples in 1D and 2D.

These operations are all concerned with selecting, sampling or reshaping your data. In the second Transforming Data Tutorial we will look at operations on the data that reduce the dimensionality and transform the data in other ways.

We'll be going through each operation in detail and provide a visual illustration to help make the semantics of each operation clear. This Tutorial does, however, assume you are familiar with continuous and discrete coordinate systems, so please review our Continuous Coordinates Tutorial if you haven't done so already.


In [ ]:
from itertools import product
import numpy as np
import holoviews as hv
from IPython.display import HTML
%reload_ext holoviews.ipython
%opts Layout [fig_size=125] Points (s=50)
%opts Bounds (linewidth=2 color='k') {+axiswise} Text (fontsize=16 color='k') Image (cmap='Reds')

Slicing and indexing Elements

In the Exploring Data Tutorial we saw how to select individual elements embedded in a multi-dimensional space and even explored deep slicing of the RGB elements to select a subregion of the images. In addition, the Continuous Coordinates Tutorial covered slicing and indexing in Elements representing a continuous coordinate system, such as Image types.

How the Element may be indexed depends on the key dimensions (or kdims) of the Element. The choice of the right Element type therefore depends on the nature and dimensionality of your data.

Regularly sampled or binned data in 1D

Certain Chart elements support indexing along a single dimension; these include Scatter, Curve, Histogram and ErrorBars. Here we'll look at how we can easily slice a Histogram:


In [ ]:
np.random.seed(42)
# np.histogram returns the bin counts first and the bin edges second
frequencies, edges = np.histogram(np.random.randn(100))
hist = hv.Histogram(frequencies, edges)
hist * hist[0:1]

We can also access the value for a specific bin in the Histogram. Any index that falls inside a particular bin will return the corresponding value or frequency.


In [ ]:
hist[0.5]
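
The bin lookup described above can be illustrated without HoloViews. The following is only a sketch in plain Python, not the library's implementation: the helper `bin_value` and the example edges and counts are made up for illustration, and the bins are assumed to be half-open, i.e. `[left, right)`.

```python
from bisect import bisect_right

def bin_value(edges, values, x):
    """Return the value of the bin containing x (bins are [left, right))."""
    if not edges[0] <= x < edges[-1]:
        raise KeyError("%r lies outside the binned range" % (x,))
    # bisect_right counts the edges <= x, so subtracting 1 gives the bin index
    return values[bisect_right(edges, x) - 1]

edges = [-2.0, -1.0, 0.0, 1.0, 2.0]   # five edges delimit four bins
counts = [3, 12, 9, 4]

# Any coordinate inside the bin [0.0, 1.0) returns the same count
inside = [bin_value(edges, counts, x) for x in (0.0, 0.5, 0.99)]
print(inside)
```

Every coordinate between two neighbouring edges maps to the same bin, which is why the exact index supplied does not matter as long as it falls inside the bin.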

Similarly we can slice a simple Curve in this way:


In [ ]:
xs = np.linspace(0, np.pi*2, 21)
curve = hv.Curve((xs, np.sin(xs)))
curve * curve[np.pi/2:np.pi*1.5] * hv.Scatter(curve)

As before we can also get the value for a specific sample point. Whatever x-index is provided will snap to the closest sample point:


In [ ]:
curve[4.1]
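
To make the snapping behaviour concrete, here is a small sketch in plain Python. It is an illustration of the idea only and not how HoloViews implements indexing; `closest_sample` is a hypothetical helper, and the sample positions mirror the 21 evenly spaced points used for the Curve above.

```python
import math

def closest_sample(xs, ys, x):
    """Return the (x, y) sample whose x-coordinate is nearest to the request."""
    idx = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
    return xs[idx], ys[idx]

# 21 evenly spaced samples of a sine wave between 0 and 2*pi
xs = [i * 2 * math.pi / 20 for i in range(21)]
ys = [math.sin(v) for v in xs]

# 4.1 is not a sample position, so the lookup snaps to the nearest one
snapped_x, snapped_y = closest_sample(xs, ys, 4.1)
print(snapped_x, snapped_y)
```

The requested coordinate 4.1 lies between two samples, and the lookup returns the value stored at the nearer of the two.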

It is important to note that indices will always return the raw indexed value, while a slice will retain the Element type even if there is only a single value:


In [ ]:
curve[4:4.5]
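
The difference in return type between an index and a slice can be sketched with a toy class. `SampledCurve` below is a hypothetical stand-in written for this illustration, not a HoloViews type, and it assumes half-open slices (`start <= x < stop`).

```python
class SampledCurve:
    """Toy stand-in for a 1D sampled Element (hypothetical, not HoloViews)."""

    def __init__(self, samples):
        self.samples = sorted(samples)   # list of (x, y) pairs

    def __getitem__(self, key):
        if isinstance(key, slice):
            lo = float("-inf") if key.start is None else key.start
            hi = float("inf") if key.stop is None else key.stop
            # A slice keeps the container type, even for a single sample
            return SampledCurve([(x, y) for x, y in self.samples if lo <= x < hi])
        # A scalar index snaps to the nearest sample and returns the raw y-value
        return min(self.samples, key=lambda s: abs(s[0] - key))[1]

toy = SampledCurve([(x, x ** 2) for x in range(10)])
sliced = toy[4:4.5]    # still a SampledCurve, holding one sample
value = toy[4.1]       # a plain number
print(type(sliced).__name__, sliced.samples, value)
```

Keeping the container type for slices means the result can be displayed or sliced again, while a scalar index hands back a bare value.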

Slicing and indexing in 2D

Often data is defined in a 2D space. For that purpose there are equivalents to the 1D Curve and Scatter types. A Path, for example, can be thought of as a line in a 2D space. It may therefore be sliced along both dimensions:


In [ ]:
r = np.arange(0, 1, 0.005)
xs, ys = (r * fn(85*np.pi*r) for fn in (np.cos, np.sin))
paths = hv.Path((xs, ys))
paths + paths[0:1, 0:1]

However, indexing is not allowed in this space, as it represents raw 2D coordinates rather than regularly sampled values.

Slicing in 3D

Slicing in 3D works much like slicing in 2D and, just as in the 2D case, indexing is not supported.


In [ ]:
xs = np.linspace(0, np.pi*8, 201)
scatter = hv.Scatter3D((xs, np.sin(xs), np.cos(xs)))
scatter + scatter[5:10, :, 0:]
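
One way to think about a multi-dimensional slice over raw coordinates is as a filter with one range per dimension, where an omitted bound leaves that side open. The sketch below illustrates that reading in plain Python. `in_ranges` is a hypothetical helper, the half-open ranges are an assumption, and the library may treat the boundaries differently.

```python
import math

def in_ranges(point, ranges):
    """True if every coordinate lies in its (low, high) range; None means open."""
    for value, (low, high) in zip(point, ranges):
        if low is not None and value < low:
            return False
        if high is not None and value >= high:
            return False
    return True

# Raw 3D coordinates along a helix, sampled every 0.5 units of x
points = [(x, math.sin(x), math.cos(x)) for x in (i * 0.5 for i in range(40))]

# Mirrors the slice [5:10, :, 0:]: x in [5, 10), any y, z >= 0
ranges = [(5, 10), (None, None), (0, None)]
selected = [p for p in points if in_ranges(p, ranges)]
print(len(selected))
```

Because the coordinates are raw rather than regularly sampled, there is no single "nearest sample" to return, which is consistent with indexing not being supported here.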

The .table and .dframe methods

All core Element types can be tabularized into a Table Element. The .table() method is the easiest way to achieve this. Alternatively the .dframe() method does the equivalent but converts the data to a pandas dataframe. These methods are very useful if you want to transform the data into a different Element type or merge different analyses.

Tabularizing simple Elements

Raster

Let's start with a simple example. We'll create a Raster Element from a simple 3x3 array and convert it to a Table with the .table method.


In [ ]:
raster = hv.Raster(np.random.rand(3, 3))
raster + hv.Points(raster)[-1:3, -1:3] + raster.table()
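
Conceptually, tabularizing a Raster turns each array cell into one row of coordinates plus a value. The following plain-Python sketch shows that reshaping. It assumes that x is the column index and y the row index, and the row order of a real Table may differ; `grid_to_rows` is a name invented for this example.

```python
def grid_to_rows(grid):
    """Flatten a 2D list into (x, y, z) rows, using array indices as coordinates."""
    return [(x, y, z) for y, row in enumerate(grid) for x, z in enumerate(row)]

grid = [[0.1, 0.2, 0.3],
        [0.4, 0.5, 0.6],
        [0.7, 0.8, 0.9]]

rows = grid_to_rows(grid)
for row in rows[:3]:
    print(row)
```

A 3x3 array therefore yields nine rows with two key columns and one value column.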

Equivalently, we can get a pandas dataframe of the Raster. Note that we are only using the to_html method here to allow testing; you can display pandas dataframes directly:


In [ ]:
HTML(raster.dframe().to_html())

From now on we'll focus on transforming data within HoloViews. For further examples and explanations of our pandas interface, have a look at the Pandas Conversion and Pandas/Seaborn Tutorials.

Image

As shown in the Continuous Coordinates Tutorial, Images, unlike Rasters, represent a continuous coordinate system. If we supply the same data as in the Raster example above together with bounds, we get the center of each pixel as the x/y-coordinate instead of the array index:


In [ ]:
extents = (0, 0, 3, 3)
img = hv.Image(np.random.rand(3, 3), bounds=extents)
img + hv.Points(img, extents=extents) + img.table()
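
The pixel-center coordinates follow directly from the bounds and the array shape. As a minimal sketch (plain Python, with `pixel_centers` as a hypothetical helper), dividing each axis into equal-width pixels and taking the midpoints gives:

```python
def pixel_centers(low, high, count):
    """Centers of `count` equal-width pixels spanning the interval [low, high]."""
    width = (high - low) / float(count)
    return [low + (i + 0.5) * width for i in range(count)]

# Same bounds as above: (left, bottom, right, top)
left, bottom, right, top = (0, 0, 3, 3)
x_centers = pixel_centers(left, right, 3)
y_centers = pixel_centers(bottom, top, 3)
print(x_centers, y_centers)
```

With bounds of (0, 0, 3, 3) and a 3x3 array, each pixel is one unit wide, so the coordinates land on 0.5, 1.5 and 2.5 rather than on the array indices 0, 1 and 2.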

Curves

All Element types except for Annotations can be tabularized in this way. Let's take a Curve of a sine wave:


In [ ]:
xs = np.arange(10)
curve = hv.Curve(zip(xs, np.sin(xs)))
curve + hv.Scatter(zip(xs, np.zeros(10))) + curve.table()

Tabularizing space containers

Nested objects can also be deconstructed in this way, providing an easy way to get your raw data out of your specialized Element types. Let's say we want to make multiple observations of a noisy signal. We can collect the data into a HoloMap to visualize it and then call .table to access the data in tabular format, which makes it easy to perform operations on it or transform it to other Element types. Deconstructing nested data in this way only works if the data is homogeneous. In practice this means that your data structure may contain any of the following types: Element, NdLayout, GridSpace, HoloMap and NdOverlay, but their dimensions must be consistent throughout.

Let's now go back to the Image example. We will now collect a number of observations of some noisy data into a HoloMap and display it:


In [ ]:
obs_hmap = hv.HoloMap({i: hv.Image(np.random.randn(10, 10), bounds=extents)
                       for i in range(3)}, key_dimensions=['Observation'])
obs_hmap

Now we can serialize this data just as before, but this time we get a 4D table. The key dimensions of the HoloMap and of the Images, as well as the z-values of each Image, are merged into a single table. We can visualize the samples we have collected by converting the table to a Scatter3D object.


In [ ]:
%%opts Layout [fig_size=150] Scatter3D [color_index=3] (cmap='Reds' edgecolor='k')
obs_hmap.table().to.scatter3d(['Observation', 'x', 'y'], ['z']) + obs_hmap.table()
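
The merging of container keys with element data can be pictured as prepending the key to every row of each element's table. The sketch below shows this in plain Python for a dictionary of small grids. It is an illustration only: `flatten_observations` is hypothetical, and array indices are used as x/y coordinates for brevity, whereas an Image would report pixel centers.

```python
def flatten_observations(observations):
    """Merge the container key with each grid cell into (key, x, y, z) rows."""
    rows = []
    for key in sorted(observations):
        for y, row in enumerate(observations[key]):
            for x, z in enumerate(row):
                rows.append((key, x, y, z))
    return rows

# Three observations, each a 2x2 grid of values
observations = {i: [[i + 0.1, i + 0.2], [i + 0.3, i + 0.4]] for i in range(3)}
table = flatten_observations(observations)
print(len(table), table[0], table[-1])
```

One container dimension plus two element key dimensions plus one value dimension gives the four columns of the resulting table.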

This way of deconstructing will work for any data structure that satisfies the conditions described above, no matter how nested. If we vary the amount of noise in addition to performing multiple observations, we can create an NdLayout of HoloMaps, one for each level of noise, each animated by the observation number.


In [ ]:
error_hmap = hv.HoloMap({(i, j): hv.Image(j*np.random.randn(3, 3), bounds=extents)
                         for i, j in product(range(3), np.linspace(0, 1, 3))},
                        key_dimensions=['Observation', 'noise'])
noise_layout = error_hmap.layout('noise')
noise_layout

And again, we can easily convert the object to a Table:


In [ ]:
%%opts Table [fig_size=150]
noise_layout.table()

Sampling

Sampling is very similar to indexing specific coordinates in an Element. It therefore requires that the sampled Element has discrete samples, like the discrete 1D Element types and the Image types that we looked at above. Unlike regular indexing, multiple indices may be supplied at the same time, and the return type is another Element type, usually either a Table or a Curve.

Sampling Elements

In general, sampling on Elements can be performed via an explicit list of samples or by passing the samples for each dimension as keyword arguments.

We'll start by providing a single sample to an Image object.


In [ ]:
%opts Image (cmap='Blues')

In [ ]:
extents = (0, 0, 10, 10)
img = hv.Image(np.random.rand(10, 10), bounds=extents)
img_coords = hv.Points(img.table(), extents=extents)
img + img * img_coords * hv.Points([img.closest([(5,5)])])(style=dict(color='r')) + img.sample([(5, 5)])

Next we can try sampling along only one Dimension on our 2D Image, leaving us with a 1D Element, in this case a Curve:


In [ ]:
sampled = img.sample(y=5)
img + img * img_coords * hv.Points(zip(sampled['x'], [img.closest(y=5)]*10)) + sampled
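
Sampling a 2D grid along one dimension amounts to picking the row nearest to the requested coordinate and returning it as a 1D profile. Here is a rough plain-Python sketch of that idea. `sample_row` is a hypothetical helper, rows are assumed to be stored from bottom to top, and tie-breaking for coordinates that fall exactly on a pixel boundary is not modelled.

```python
def sample_row(grid, y_centers, x_centers, y):
    """Return (x, z) pairs from the grid row whose center is closest to y."""
    row_idx = min(range(len(y_centers)), key=lambda i: abs(y_centers[i] - y))
    return list(zip(x_centers, grid[row_idx]))

# A 4x4 grid over bounds (0, 0, 4, 4); pixel centers sit at 0.5, 1.5, 2.5, 3.5
centers = [i + 0.5 for i in range(4)]
grid = [[10 * r + c for c in range(4)] for r in range(4)]   # row r, column c

# y=2.2 is closest to the row centered on 2.5
profile = sample_row(grid, centers, centers, y=2.2)
print(profile)
```

The result has one key dimension left (x) and one value dimension (z), which is why it can be displayed as a Curve.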

Sampling works on any regularly sampled Element type. For example, we can select multiple samples along the x-axis of a Curve.


In [ ]:
xs = np.arange(10)
samples = [2, 4, 6, 8]
curve = hv.Curve(zip(xs, np.sin(xs)))
curve_samples = hv.Scatter(zip(xs, [0] * 10)) * hv.Scatter(zip(samples, [0]*len(samples))) 
curve + curve_samples + curve.sample(samples)

Sampling HoloMaps

Sampling is often useful when you have more data than you wish to visualize or analyze at one time. Just like in the .table section, we'll create a HoloMap containing a number of observations of some noisy data.


In [ ]:
obs_hmap = hv.HoloMap({i: hv.Image(np.random.randn(10, 10), bounds=extents)
                       for i in range(3)}, key_dimensions=['Observation'])

HoloMaps also provide additional functionality to perform regular sampling on your data. In this case we'll take 3x3 subsamples of each of the Images.


In [ ]:
sample_style = dict(facecolors='r', edgecolors='k', alpha=1)
all_samples = obs_hmap.table().to.scatter3d(['Observation', 'x', 'y'], ['z'])(style=dict(alpha=0.15))
sampled = obs_hmap.sample((3,3))
subsamples = sampled.to.scatter3d(['Observation', 'x', 'y'], ['z'])(style=sample_style)
all_samples * subsamples + sampled

By supplying bounds as a (left, bottom, right, top) tuple we can also sample a subregion of our images:


In [ ]:
sampled = obs_hmap.sample((3,3), bounds=(2,5,5,10))
subsamples = sampled.to.scatter3d(['Observation', 'x', 'y'], ['z'])(style=sample_style)
all_samples * subsamples + sampled
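
To give an intuition for where regular subsamples land, the sketch below divides a bounded region into an evenly spaced grid of cells and takes the center of each cell. This placement rule is an assumption made for illustration, and the step of snapping each position to the nearest pixel is omitted. `cell_centers` and `regular_samples` are hypothetical names.

```python
from itertools import product

def cell_centers(low, high, count):
    """Midpoints of `count` equal intervals between low and high."""
    step = (high - low) / float(count)
    return [low + (i + 0.5) * step for i in range(count)]

def regular_samples(shape, bounds):
    """Return rows*cols evenly spaced (x, y) positions inside the bounds."""
    rows, cols = shape
    left, bottom, right, top = bounds
    return list(product(cell_centers(left, right, cols),
                        cell_centers(bottom, top, rows)))

whole = regular_samples((3, 3), (0, 0, 10, 10))    # the full image
region = regular_samples((3, 3), (2, 5, 5, 10))    # a subregion only
print(region[0], region[-1])
```

Restricting the bounds keeps the number of samples the same but packs them into the smaller region.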

Since this kind of sampling is only well supported for continuous coordinate systems we can only apply this kind of sampling to Image types for now.

Sampling Charts

Sampling Chart type Elements like Curve, Scatter and Histogram is only supported by providing an explicit list of samples.


In [ ]:
xs = np.arange(10)
extents = (0, 0, 2, 10)
curve = hv.HoloMap({(i) : hv.Curve(zip(xs, np.sin(xs)*i))
                    for i in np.linspace(0.5, 1.5, 3)},
                   key_dimensions=['Observation'])
all_samples = curve.table().to.points(['Observation', 'x'], ['y'])
sampled = curve.sample([0, 2, 4, 6, 8])
sampling = all_samples * sampled.to.points(['Observation', 'x'], ['y'], extents=extents)(style=dict(color='r'))
sampling + sampled

Alternatively you can always deconstruct your data into a Table and perform select operations instead. This is also the easiest way to sample NdElement types like Bars. Individual samples should be supplied as a set, while ranges can be specified as a two-tuple.


In [ ]:
sampled = curve.table().select(Observation=(0, 1.1), x={0, 2, 4, 6, 8})
sampling = all_samples * sampled.to.points(['Observation', 'x'], ['y'], extents=extents)(style=dict(color='r'))
sampling + sampled
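
The distinction between a set of individual samples and a two-tuple range can be sketched as a simple row filter. The code below is an illustration in plain Python rather than the Table implementation. The `select` function here is a hypothetical stand-in, and it assumes that a range includes its lower bound and excludes its upper bound.

```python
def select(rows, **criteria):
    """Filter dict rows: a set means membership, a 2-tuple means low <= v < high."""
    def matches(value, criterion):
        if isinstance(criterion, (set, frozenset)):
            return value in criterion
        if isinstance(criterion, tuple):
            low, high = criterion
            return low <= value < high
        return value == criterion
    return [row for row in rows
            if all(matches(row[dim], crit) for dim, crit in criteria.items())]

# Three observations with ten x-positions each
rows = [{"Observation": obs, "x": x, "y": obs * x}
        for obs in (0.5, 1.0, 1.5) for x in range(10)]

# Range on Observation, explicit samples on x
subset = select(rows, Observation=(0, 1.1), x={0, 2, 4, 6, 8})
print(len(subset))
```

Two of the three observations fall inside the range and five x-positions are requested, so ten rows remain.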

That is all for now. In this Tutorial we have discovered how to select, slice and sample our data and export it to HoloViews Table Elements or pandas dataframes. In the next Tutorial we will discover how to reduce our data along specific dimensions and how to apply generic operations on the data.