8/8/2017
This set of notebooks will serve as examples for developing end-to-end workflows for and using pycroscopy.
In this example, we will learn how to slice multidimensional datasets.
In pycroscopy, all position dimensions of a dataset are collapsed into the first dimension and all other (spectroscopic) dimensions are collapsed to the second dimension to form a two dimensional matrix. The ancillary matricies, namely the spectroscopic indices and values matrix as well as the position indicies and values matrices will be essential for reshaping the data back to its original N dimensional form and for slicing multidimensional datasets
In [ ]:
# downloading packages:
# !pip install -U wget numpy h5py matplotlib ipywidgets pycroscopy
# Ensure python 3 compatibility:
from __future__ import division, print_function, absolute_import, unicode_literals
# In case some of these packages are not installed, install them
#!pip install -U os wget numpy h5py matplotlib pycroscopy
# The package for accessing files in directories, etc.:
import os
import wget
# The mathematical computation package:
import numpy as np
# The package used for creating and manipulating HDF5 files:
import h5py
# Packages for plotting:
import matplotlib.pyplot as plt
# basic interactive widgets:
from ipywidgets import interact
# Finally import pycroscopy for certain scientific analysis:
import pycroscopy as px
# set up notebook to show plots within the notebook
% matplotlib inline
For this example, we will be working with a Band Excitation Polarization Switching (BEPS) First Order Reversal Curve (FORC) dataset acquired from advanced atomic force microscopes. In the much simpler Band Excitation (BE) imaging datasets, a single spectra is acquired at each location in a two dimensional grid of spatial locations. Thus, BE imaging datasets have two position dimensions (X, Y) and one spectroscopic dimension (frequency - against which the spectra is recorded). The BEPS-FORC dataset used in this example has a spectra for each combination of three other paramaters (DC offset, Field, bias waveform type {FORC}). Thus, this dataset has three new spectral dimensions in addition to the spectra itself. Hence, this dataet becomes a 2+4 = 6 dimensional dataset
In [ ]:
# download the raw data file from Github:
h5_path = 'temp.h5'
url = 'https://raw.githubusercontent.com/pycroscopy/pycroscopy/master/data/FORC_BEPS.h5'
if os.path.exists(h5_path):
os.remove(h5_path)
_ = wget.download(url, h5_path)
In [ ]:
# Open the file in read-only mode
h5_file = h5py.File(h5_path, mode='r')
print('Datasets and datagroups within the file:\n------------------------------------')
px.hdf_utils.print_tree(h5_file)
In [ ]:
h5_meas_grp = h5_file['Measurement_000']
h5_main = h5_meas_grp['Channel_000/Raw_Data']
print('\nThe main dataset:\n------------------------------------')
print(h5_main)
The main dataset clearly does not provide the multidimensional information about the data that will be necessary to slice the data. For that we need the ancillary datasets that support this main dataset
In [ ]:
# pycroscopy has a convenient function to access datasets linked to a given dataset:
h5_spec_ind = px.hdf_utils.getAuxData(h5_main, 'Spectroscopic_Indices')[0]
h5_spec_val = px.hdf_utils.getAuxData(h5_main, 'Spectroscopic_Values')[0]
h5_pos_ind = px.hdf_utils.getAuxData(h5_main, 'Position_Indices')[0]
h5_pos_val = px.hdf_utils.getAuxData(h5_main, 'Position_Values')[0]
The position datasets are shaped as [spatial points, dimension] while the spectroscopic datasets are shaped as [dimension, spectral points]. Clearly the first axis of the position dataset and the second axis of the spectroscopic datasets match the correponding sizes of the main dataset.
Again, the sum of the position and spectroscopic dimensions results in the 6 dimensions originally described above.
Essentially, there is a unique combination of position and spectroscopic parameters for each cell in the two dimensionam main dataset. The interactive widgets below illustrate this point. The first slider represents the position dimension while the second represents the spectroscopic dimension. Each position index can be decoded to a set of X and Y indices and values while each spectroscopic index can be decoded into a set of frequency, dc offset, field, and forc parameters
In [ ]:
print('Main Datasets of shape:', h5_main.shape)
print('Position Datasets of shape:', h5_pos_ind.shape)
print('Spectroscopic Datasets of shape:', h5_spec_ind.shape)
spec_labels = px.hdf_utils.get_formatted_labels(h5_spec_ind)
pos_labels = px.hdf_utils.get_formatted_labels(h5_pos_ind)
def myfun(pos_index, spec_index):
for dim_ind, dim_name in enumerate(pos_labels):
print(dim_name,':',h5_pos_ind[pos_index, dim_ind])
for dim_ind, dim_name in enumerate(spec_labels):
print(dim_name,':',h5_spec_ind[dim_ind, spec_index])
interact(myfun, pos_index=(0,h5_main.shape[0]-1, 1), spec_index=(0,h5_main.shape[1]-1, 1));
The plots below show how the position and spectrocopic dimensions vary. Due to the high dimensionality of the spectroscopic dimensions, the variation of each dimension has been plotted separately.
Positions: For each Y index, the X index ramps up from 0 to 4 and repeats. Essentially, this means that for a given Y index, there were multiple measurments (different values of X)
Spectroscopic: The plot for FORC
shows that the next fastest dimension - DC offset
was varied 6 times. Correspondingly, the plot for DC offset
plot shows that this dimension ramps up from 0 to a little less than 40 for each FORC
index. This trend is the same for the faster varying dimensions - Frequency
and Field
.
In [ ]:
fig_1, axes = plt.subplots(ncols=2, figsize=(10,5))
px.plot_utils.plot_line_family(axes[0], np.arange(h5_pos_ind.shape[0]), h5_pos_ind[()].T,
line_names=pos_labels)
axes[0].set_xlabel('Position points')
axes[0].set_ylabel('Index')
axes[0].set_title('Position Indices')
axes[0].legend()
px.plot_utils.plot_line_family(axes[1], np.arange(h5_spec_ind.shape[1]), h5_spec_ind,
line_names=spec_labels)
axes[1].set_xlabel('Spectroscopic points')
axes[1].set_title('Spectroscopic Indices')
axes[1].legend()
fig_2, axes = plt.subplots(ncols=3, nrows=2, figsize=(15,10))
for dim_ind, axis, dim_label, dim_array in zip([0,2], [axes.flat[0], axes.flat[3]],
[spec_labels[0], spec_labels[2]],
[h5_spec_ind[0,:500], h5_spec_ind[2,:500]]):
axis.plot(np.arange(dim_array.size), dim_array)
axis.set_xlabel('Spectroscopic points')
axis.set_ylabel('Index')
axis.set_title('Dim ' + str(dim_ind) + ' - ' + dim_label)
rhs_axes = [axes.flat[ind] for ind in [1,2,4,5]]
for dim_ind, axis, dim_label, dim_array in zip(range(h5_spec_ind.shape[0]), rhs_axes, spec_labels, h5_spec_ind):
axis.plot(np.arange(dim_array.size), dim_array)
axis.set_xlabel('Spectroscopic points')
axis.set_ylabel('Index')
axis.set_title('Dim ' + str(dim_ind) + ' - ' + dim_label)
In [ ]:
# A similar version of this function is available in pycroscopy.io.hdf_utils.get_formatted_labels
def describe_dimensions(h5_aux):
for name, unit in zip(px.hdf_utils.get_attr(h5_aux, 'labels'),
px.hdf_utils.get_attr(h5_aux, 'units')):
print(name, '[', unit, ']')
print('Position dimension names and units:')
describe_dimensions(h5_pos_ind)
print('\nSpectrocopic dimension names and units:')
describe_dimensions(h5_spec_ind)
In [ ]:
# The function below has been implemented as pycroscopy.io.hdf_utils.get_dimensionality
def get_dim_sizes(ind_dset, is_position=False):
# ind_dset here is expected to be of the shape [dimension, points] like the spectroscopic indices
if is_position:
# Position dimensions will be turned from [points, dimension] to [dimension, points]
ind_dset = np.transpose(ind_dset[()])
dim_size = []
for col in ind_dset:
# For each dimension array, we will find the number of unique elements in it
dim_size.append(len(np.unique(col)))
return dim_size
pos_dim_sizes = get_dim_sizes(h5_pos_ind, is_position=True)
spec_dim_sizes = get_dim_sizes(h5_spec_ind)
print('Positions:',pos_dim_sizes, '\nSpectroscopic:', spec_dim_sizes)
In [ ]: