The Center for Nanophase Materials Science and The Institute for Functional Imaging for Materials
Oak Ridge National Laboratory
8/01/2017
Here, we will demonstrate how to load, reshape, and visualize multidimensional imaging datasets. For this example, we will load a three dimensional Band Excitation imaging dataset acquired from an atomic force microscope.
In [ ]:
# Make sure pycroscopy and wget are installed
!pip install pycroscopy
!pip install -U wget
In [ ]:
# Ensure python 3 compatibility
from __future__ import division, print_function, absolute_import
# Import necessary libraries:
import h5py
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from IPython.display import display
from os import remove
import pycroscopy as px
# set up notebook to show plots within the notebook
% matplotlib inline
For simplicity we will use a dataset that has already been transalated form its original data format into a pycroscopy compatible hierarchical data format (HDF5 or H5) file
Python uses the h5py libaray to read, write, and access HDF5 files
In [ ]:
# Downloading the example file from the pycroscopy Github project
url = 'https://raw.githubusercontent.com/pycroscopy/pycroscopy/master/data/BELine_0004.h5'
h5_path = 'temp.h5'
_ = wget.download(url, h5_path)
print('Working on:\n' + h5_path)
# Open the file in read-only mode
h5_file = h5py.File(h5_path, mode='r')
# Here, h5_file is an active handle to the open file
The file contents are stored in a tree structure, just like files on a contemporary computer. The file contains datagroups (similar to file folders) and datasets (similar to spreadsheets).
There are several datasets in the file and these store:
In [ ]:
print('Datasets and datagroups within the file:\n------------------------------------')
px.hdf_utils.print_tree(h5_file)
In [ ]:
print('Datagroup corresponding to a channel of information:')
print(h5_file['/Measurement_000/Channel_000/'])
print('\nDataset containing the raw data collected from the microscope:')
print(h5_file['/Measurement_000/Channel_000/Raw_Data'])
The output above shows that the "Raw_Data" dataset is a two dimensional dataset, and has complex value (a +bi) entries at each element in the 2D matrix.
This dataset is contained in a datagroup called "Channel_000" which itself is contained in a datagroup called "Measurement_000"
The datagroup "Channel_000" contains several "members", where these members could be datasets like "Raw_Data" or datagroups like "Channel_000"
HDF5 datasets and datagroups can also store metadata such as experimental parameters. These metadata can be text, numbers, small lists of numbers or text etc. These metadata can be very important for understanding the datasets and guide the analysis routines
In [ ]:
print('\nMetadata or attributes in a datagroup\n------------------------------------')
for key in h5_file['/Measurement_000'].attrs:
print('{} : {}'.format(key, px.hdf_utils.get_attr(h5_file['/Measurement_000'], key)))
In the case of the spectral dataset under investigation, a spectra with a single peak was collected at each spatial location on a two dimensional grid of points. Thus, this dataset has two position dimensions and one spectroscopic dimension (spectra).
In pycroscopy, all spatial dimensions are collapsed to a single dimension and similarly, all spectroscopic dimensions are also collapsed to a single dimension. Thus, the data is stored as a two-dimensional (N x P) matrix with N spatial locations each with P spectroscopic datapoints.
This general and intuitive format allows imaging data from any instrument, measurement scheme, size, or dimensionality to be represented in the same way.
Such an instrument independent data format enables a single set of anaysis and processing functions to be reused for multiple image formats or modalities.
In [ ]:
h5_chan_grp = h5_file['/Measurement_000/']
h5_main = h5_chan_grp['Channel_000/Raw_Data']
print('\nThe main dataset:\n------------------------------------')
print(h5_main)
print('Original three dimensional matrix had {} rows and {} columns \
each having {} spectral points'.format(h5_chan_grp.attrs['grid_num_rows'],
h5_chan_grp.attrs['grid_num_cols'],
h5_chan_grp.attrs['num_bins']))
print('Collapsing the position dimensions: ({}x{}, {}) -> ({}, {})'.format(
h5_chan_grp.attrs['grid_num_rows'],
h5_chan_grp.attrs['grid_num_cols'],
h5_chan_grp.attrs['num_bins'],
h5_chan_grp.attrs['grid_num_rows'] * h5_chan_grp.attrs['grid_num_cols'],
h5_chan_grp.attrs['num_bins']))
Each main dataset is always accompanied by four ancillary datasets that explain:
In the case of the 3d dataset under investigation, the positions will be arranged as row0-col0, row0-col1.... row0-colN, row1-col0.... The spectroscopic information remains unchanged.
In [ ]:
print('\nThe ancillary datasets:\n------------------------------------')
print(h5_file['/Measurement_000/Channel_000/Position_Indices'])
print(h5_file['/Measurement_000/Channel_000/Position_Values'])
print(h5_file['/Measurement_000/Channel_000/Spectroscopic_Indices'])
print(h5_file['/Measurement_000/Channel_000/Spectroscopic_Values'])
print('\nSpatial dimensions:', px.hdf_utils.get_attr(
h5_file['/Measurement_000/Channel_000/Position_Values'], 'labels'))
print('Spectroscopic dimensions:', px.hdf_utils.get_attr(
h5_file['/Measurement_000/Channel_000/Spectroscopic_Values'], 'labels'))
In [ ]:
fig, axis = plt.subplots(figsize=(6,6))
axis.plot(h5_file['/Measurement_000/Channel_000/Position_Indices'][:, 0],
'orange', label='column')
axis.plot(h5_file['/Measurement_000/Channel_000/Position_Indices'][:, 1],
'black', label='row')
axis.legend()
axis.set_title('Position Indices');
In [ ]:
# specify a pixel index of interest
pixel_ind = 128
# ensuring that this index is within the bounds of the dataset
pixel_ind = max(0, min(int(pixel_ind), h5_main.shape[0]))
# Extracting the frequency vector (x-axis) to plot the spectra against
freq_vec = h5_file['/Measurement_000/Channel_000/Bin_Frequencies'][()] * 1E-3
fig, axis = plt.subplots(figsize=(6,6))
axis.plot(freq_vec, np.abs(h5_main[pixel_ind]))
axis.set_xlabel('Frequency (kHz)')
axis.set_ylabel('Amplitude (a. u.)')
axis.set_title('Spectra at position {}'.format(pixel_ind));
If the frequency is fixed, the spatial distribution would result in a 2D spatial map.
Note that the spatial dimensions are collapsed to a single dimension in all pycroscopy datasets. Thus, the 1D vector at the specified frequency needs to be reshaped back to a 2D map
In [ ]:
# specify a pixel index of interest
freq_ind = 40
# ensuring that this index is within the bounds of the dataset
freq_ind = max(0, min(int(freq_ind), h5_main.shape[1]))
# extracting the position data (1D) at the spcified frequency index
data_vec = np.abs(h5_main[:, freq_ind])
# Constructing the 2D spatial map from the 1D vector:
spat_map = np.reshape(data_vec, (h5_chan_grp.attrs['grid_num_rows'],
h5_chan_grp.attrs['grid_num_cols']))
fig, axis = plt.subplots(figsize=(6,6))
axis.imshow(spat_map, cmap='inferno')
axis.set_xlabel('Columns')
axis.set_ylabel('Rows')
freq_vec = h5_file['/Measurement_000/Channel_000/Bin_Frequencies'][()] * 1E-3
axis.set_title('Amplitude at frequency {} kHz '.format(np.round(freq_vec[freq_ind], 2)));
There are several utility functions in pycroscopy that make it easy to access and reshape datasets. Here we show you how to return your dat to the N dimensional form in one easy step.
While this data is a simple example and can be reshaped manually, such reshape operations become especially useful for 5,6,7 or larger dimensional datasets.
In [ ]:
ndim_data, success = px.hdf_utils.reshape_to_Ndims(h5_main)
if not success:
print('There was a problem automatically reshaping the dataset. \
Attempting to reshape manually')
ndim_data = np.reshape(h5_main[()], (h5_chan_grp.attrs['grid_num_rows'],
h5_chan_grp.attrs['grid_num_cols'],
h5_chan_grp.attrs['num_bins']))
else:
print('Collapsed dataset originally of shape: ', h5_main.shape)
print('Reshaped dataset of shape: ', ndim_data.shape)
In [ ]:
# specify a pixel index of interest
freq_ind = 40
# ensuring that this index is within the bounds of the dataset
freq_ind = max(0, min(int(freq_ind), h5_main.shape[1]))
# Constructing the 2D spatial map from the 3D dataset
spat_map = np.abs(ndim_data[:, :, freq_ind])
fig, axis = plt.subplots(figsize=(6,6))
axis.imshow(spat_map, cmap='inferno')
axis.set_xlabel('Columns')
axis.set_ylabel('Rows')
freq_vec = h5_file['/Measurement_000/Channel_000/Bin_Frequencies'][()] * 1E-3
axis.set_title('Amplitude at frequency {} kHz '.format(np.round(freq_vec[freq_ind], 2)));
In [ ]:
h5_file.close()
In [ ]:
# Removing the temporary data file:
remove(h5_path)
In [ ]: