To explain how to select and transform HoloViews Elements to summarize them or to rearrange the data, we first have to explore what kind of data each Element can hold. Different Element types represent discrete and continuous spaces of different dimensionality.
1D: Curve
, Scatter
, ErrorBars
, Spread
These Elements usually represented a discretely sampled, continuous, indepent variable plotted against a discrete or continuously sampled dependent variable.
2D: Raster
, Image
, RGB
, HSV
, Surface
These Elements represent discrete samples in a 2D continuous space, allowing slicing, indexing and sampling.
These types usually represents bins or categorical data in a one or two-dimensional space.
1D: Histogram
, Bars
2D: HeatMap
, QuadMesh
These Elements contain data that has not been discretely sampled or binned instead merely representing coordinates in a 1D, 2D or 3D space.
1D: Distribution
2D: Points
, Path
, Contours
, Polygons
3D: Scatter3D
And finally the Table
element, which supports n-dimensional data of any kind.
Based on this rough grouping we can define which operations are valid on the data. In this Tutorial we will look at three types of operation:
UniformNdMapping
type into a Table
or pandas dataframe.These operations are all concerned with selecting, sampling or reshaping your data. In the second Transforming Data Tutorial we will look at operations on the data that reduce the dimensionality and transform the data in other ways.
We'll be going through each operation in detail and provide a visual illustration to help make the semantics of each operation clear. This Tutorial does however assume you are familiar with continuous and discrete coordinate systems so please review our Continuous Coordinates Tutorial if you haven't done so already.
In [ ]:
from itertools import product
import numpy as np
import holoviews as hv
from IPython.display import HTML
%reload_ext holoviews.ipython
%opts Layout [fig_size=125] Points (s=50)
%opts Bounds (linewidth=2 color='k') {+axiswise} Text (fontsize=16 color='k') Image (cmap='Reds')
In the Exploring Data Tutorial we saw how to select individual elements embedded in a multi-dimensional space and even explored deep slicing of the RGB
elements to select a subregion of the images. In addition, the Continuous Coordinates Tutorial covered slicing and indexing in Elements representing continuous coordinate coordinate system such as Image
types. We'll be going through each operation in detail and provide a visual illustration to help make the semantics of each operation clear
How the Element may be indexed depends on the key dimensions (or kdims
) of the Element. The choice of the right Element type therefore depends on the nature and dimensionality of your data.
Certain Chart elements support single dimensional indexing, these include Scatter
, Curve
, Histogram
and ErrorBars
. Here we'll look at how we can easily slice a Histogram
:
In [ ]:
np.random.seed(42)
edges, data = np.histogram(np.random.randn(100))
hist = hv.Histogram(edges, data)
hist * hist[0:1]
We can also access the value for a specific bin in the Histogram
, any index inside a particular bin will return the corresponding value or frequency.
In [ ]:
hist[0.5]
Similarly we can slice a simple Curve
in this way:
In [ ]:
xs = np.linspace(0, np.pi*2, 21)
curve = hv.Curve((xs, np.sin(xs)))
curve * curve[np.pi/2:np.pi*1.5] * hv.Scatter(curve)
As before we can also get the value for a specific sample point, whatever x-index is provided will snap to the closest sample point:
In [ ]:
curve[4.1]
It is important to note that indices will always return the raw indexed value, while a slice will retain the Element type even if there is only a single value:
In [ ]:
curve[4:4.5]
Often data is defined in a 2D space, however, for that purpose there are equivalent types to the 1D Curve and Scatter types. A Path
for example can be thought of as a line in a 2D space. It may therefore be sliced along both dimensions:
In [ ]:
r = np.arange(0, 1, 0.005)
xs, ys = (r * fn(85*np.pi*r) for fn in (np.cos, np.sin))
paths = hv.Path((xs, ys))
paths + paths[0:1, 0:1]
However indexing is not allowed in this space as it represents raw 2D coordinates not regularly sampled values.
Slicing in 3D works much like slicing in 2D but just as in the 2D case indexing is not supported.
In [ ]:
xs = np.linspace(0, np.pi*8, 201)
scatter = hv.Scatter3D((xs, np.sin(xs), np.cos(xs)))
scatter + scatter[5:10, :, 0:]
All core Element types can be tabularized into a Table
Element. The .table()
method is the easiest way to achieve this. Alternatively the .dframe()
method does the equivalent but converts the data to a pandas dataframe. These methods are very useful if you want to transform the data into a different Element type or merge different analyses.
Let's start with a simple example, we'll create a Raster
Element a simple 3x3 array and convert it to a Table
with the .table
method.
In [ ]:
raster = hv.Raster(np.random.rand(3, 3))
raster + hv.Points(raster)[-1:3, -1:3] + raster.table()
And equivalently we can get a pandas dataframe of the Image (note that we are only using the to_html
method here to allow testing, you can display pandas dataframes directly):
In [ ]:
HTML(raster.dframe().to_html())
From now on we'll focus on transforming data within HoloViews for further examples and explanations of our pandas interface have a look at the Pandas Conversion and Pandas/Seaborn Tutorials.
As shown in the Continous Coordinates Tutorial Images unlike Raster represent a continuous coordinate system. If we supply the equivalent data and bounds as the Raster example above we get the center of each pixel as the x/y-coordinate instead of the array index:
In [ ]:
extents = (0, 0, 3, 3)
img = hv.Image(np.random.rand(3, 3), bounds=extents)
img + hv.Points(img, extents=extents) + img.table()
In [ ]:
xs = np.arange(10)
curve = hv.Curve(zip(xs, np.sin(xs)))
curve + hv.Scatter(zip(xs, np.zeros(10))) + curve.table()
Nested objects can also be deconstructed in this way providing an easy way to get your raw data out of your specialized Element types. Let's say we want to make multiple observations of a noisy signal, we can collect the data into a HoloMap to visualize it and then call .table
, allowing access to the data in tabular format making it easy to perform operations on it or transform it to other Element types. Deconstructing nested data in this way only works if the data is homogenous. Practically this means that your data structure may contain any of the following types Element, NdLayout, GridSpace, HoloMap and NdOverlay, but their dimensions should be consistent throughout.
Let's now go back to the Image example. We will now collect a number of observations of some noisy data into a HoloMap and display it:
In [ ]:
obs_hmap = hv.HoloMap({i: hv.Image(np.random.randn(10, 10), bounds=extents)
for i in range(3)}, key_dimensions=['Observation'])
obs_hmap
Now we can serialize this data just as before, this time we get a 4D table. The key dimensions of both the HoloMap, the Images as well as the z-values of Image are merged into a table. We can visualize the samples we have collected by converting it to a Scatter3D object.
In [ ]:
%%opts Layout [fig_size=150] Scatter3D [color_index=3] (cmap='Reds' edgecolor='k')
obs_hmap.table().to.scatter3d(['Observation', 'x', 'y'], ['z']) + obs_hmap.table()
This way of deconstructing will work for any data structure that satisfies the conditions described above, no matter how nested. If we vary the amount of noise in addition to performing multiple observations we can create a NdLayout
of HoloMaps, one for each level of noise, and animated by the observation number.
In [ ]:
error_hmap = hv.HoloMap({(i, j): hv.Image(j*np.random.randn(3, 3), bounds=extents)
for i, j in product(range(3), np.linspace(0, 1, 3))},
key_dimensions=['Observation', 'noise'])
noise_layout = error_hmap.layout('noise')
noise_layout
And again, we can easily convert the object to a Table
:
In [ ]:
%%opts Table [fig_size=150]
noise_layout.table()
Sampling is a very similar operation to indexing specific coordinates in an Element
, it is therefore necessary that the sampled Element
has discrete samples, such as the discrete 1D Element types and Image types that we looked at above. The difference to regular indexing is that multiple indices may be supplied at the same time and that the return type is another Element
type, usually either a Table
or a Curve
.
In general sampling on Elements can be performed via an explicit list of samples or by passing the samples for each dimension keyword arguments.
We'll start by providing a single sample to an Image object.
In [ ]:
%opts Image (cmap='Blues')
In [ ]:
extents = (0, 0, 10, 10)
img = hv.Image(np.random.rand(10, 10), bounds=extents)
img_coords = hv.Points(img.table(), extents=extents)
img + img * img_coords * hv.Points([img.closest([(5,5)])])(style=dict(color='r')) + img.sample([(5, 5)])
Next we can try sampling along only one Dimension on our 2D Image, leaving us with a 1D Element, in this case a Curve
:
In [ ]:
sampled = img.sample(y=5)
img + img * img_coords * hv.Points(zip(sampled['x'], [img.closest(y=5)]*10)) + sampled
Sampling works on any regularly sampled Element type, for example we can select multiple samples along the x-axis of a Curve.
In [ ]:
xs = np.arange(10)
samples = [2, 4, 6, 8]
curve = hv.Curve(zip(xs, np.sin(xs)))
curve_samples = hv.Scatter(zip(xs, [0] * 10)) * hv.Scatter(zip(samples, [0]*len(samples)))
curve + curve_samples + curve.sample(samples)
In [ ]:
obs_hmap = hv.HoloMap({i: hv.Image(np.random.randn(10, 10), bounds=extents)
for i in range(3)}, key_dimensions=['Observation'])
HoloMaps also provide additional functionality to perform regular sampling on your data. In this case we'll take 3x3 subsamples of each of the Images.
In [ ]:
sample_style = dict(facecolors='r', edgecolors='k', alpha=1)
all_samples = obs_hmap.table().to.scatter3d(['Observation', 'x', 'y'], ['z'])(style=dict(alpha=0.15))
sampled = obs_hmap.sample((3,3))
subsamples = sampled.to.scatter3d(['Observation', 'x', 'y'], ['z'])(style=sample_style)
all_samples * subsamples + sampled
By supplying bounds in as a (left, bottom, right, top) tuple we can also sample a subregion of our images:
In [ ]:
sampled = obs_hmap.sample((3,3), bounds=(2,5,5,10))
subsamples = sampled.to.scatter3d(['Observation', 'x', 'y'], ['z'])(style=sample_style)
all_samples * subsamples + sampled
In [ ]:
xs = np.arange(10)
extents = (0, 0, 2, 10)
curve = hv.HoloMap({(i) : hv.Curve(zip(xs, np.sin(xs)*i))
for i in np.linspace(0.5, 1.5, 3)},
key_dimensions=['Observation'])
all_samples = curve.table().to.points(['Observation', 'x'], ['y'])
sampled = curve.sample([0, 2, 4, 6, 8])
sampling = all_samples * sampled.to.points(['Observation', 'x'], ['y'], extents=extents)(style=dict(color='r'))
sampling + sampled
Alternatively you can always deconstruct your data into a Table and perform select operations instead. This is also the easiest way to sample NdElement
types like Bars. Individual samples should be supplied as a set, while ranges can be specified as a two-tuple.
In [ ]:
sampled = curve.table().select(Observation=(0, 1.1), x={0, 2, 4, 6, 8})
sampling = all_samples * sampled.to.points(['Observation', 'x'], ['y'], extents=extents)(style=dict(color='r'))
sampling + sampled
That is all for now, in this Tutorial we have discovered how to select, slice and sample our data and export it to HoloViews Table Elements or pandas dataframes. In the next Tutorial we will discover how to reduce our data along specific dimensions and how to apply generic operations on the data.