This notebook runs through an example of spectral unmixing to carry out unsupervised classification of a SERC hyperspectral data file using the PySpTools
package to carry out endmember extraction, plot abundance maps of the spectral endmembers, and use Spectral Angle Mapping and Spectral Information Divergence to classify the SERC tile.
Since spectral data is so large in size, it is often useful to remove any unncessary or redundant data in order to save computational time.
In this tutorial, we use the following user-defined functions:
read_neon_reflh5
: function to read in NEON AOP Hyperspectral Data file (in hdf5 format)clean_neon_refl_data
: function to clean NEON hyperspectral data, including applying the data ignore value and reflectance scale factor, and removing water vapor bandsplot_aop_refl
: function to plot a band of NEON hyperspectral data for referenceTo run this notebook, you the following python packages need to be installed. You can install from the notebook following the commands below, or you can install required packages from command line after activating the p35 environment.
PySpTools: Download pysptools-0.14.2.tar.gz from https://pypi.python.org/pypi/pysptools
import sys
!{sys.executable} -m pip install spectral
!{sys.executable} -m pip install scikit-image
!{sys.executable} -m pip install "C:\Users\bhass\Downloads\pysptools-0.14.2.tar.gz
!conda install --yes --prefix {sys.prefix} h5py
!conda install --yes --prefix {sys.prefix} scikit-learn
!conda install --yes --prefix {sys.prefix} cvxopt
See Also: https://stackoverflow.com/questions/45156080/installing-modules-to-anaconda-from-tar-gz
In [9]:
import h5py, os, osr, copy, time
import matplotlib.pyplot as plt
import numpy as np
import pysptools.util as util
import pysptools.eea as eea #endmembers extraction algorithms
import pysptools.abundance_maps as amap
import pysptools.classification as cls
import pysptools.material_count as cnt
from skimage import exposure
from ipywidgets import *
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
In [2]:
def read_neon_reflh5(refl_filename):
"""h5refl2array reads in a NEON AOP reflectance hdf5 file and returns
reflectance array, and header containing metadata in envi format.
--------
Parameters
refl_filename -- full or relative path and name of reflectance hdf5 file
--------
Returns
--------
reflArray:
array of reflectance values
header:
dictionary containing the following metadata (all strings):
bad_band_window1: min and max wavelenths of first water vapor window (tuple)
bad_band_window2: min and max wavelenths of second water vapor window (tuple)
bands: # of bands (float)
coordinate system string: coordinate system information (string)
data ignore value: value corresponding to no data (float)
interleave: 'BSQ' (string)
reflectance scale factor: factor by which reflectance is scaled (float)
wavelength: wavelength values (float)
wavelength unit: 'm' (string)
spatial extent meters:
--------
This function applies to the NEON hdf5 format implemented in 2016, and should be used for
data acquired during and after 2016 as of July 2018. Data in earlier NEON hdf5 format
(collected prior to 2016) is expected to be re-processed after the 2018 flight season.
--------
Example Execution:
--------
sercRefl, sercRefl_header = h5refl2array('NEON_D02_SERC_DP1_20160807_160559_reflectance.h5') """
#Read in reflectance hdf5 file (include full or relative path if data is located in a different directory)
hdf5_file = h5py.File(refl_filename,'r')
#Get the site name
file_attrs_string = str(list(hdf5_file.items()))
file_attrs_string_split = file_attrs_string.split("'")
sitename = file_attrs_string_split[1]
#Extract the reflectance & wavelength datasets
refl = hdf5_file[sitename]['Reflectance']
reflData = refl['Reflectance_Data']
reflArray = refl['Reflectance_Data'].value
#Create dictionary containing relevant metadata information
header = {}
header['map info'] = refl['Metadata']['Coordinate_System']['Map_Info'].value
header['wavelength'] = refl['Metadata']['Spectral_Data']['Wavelength'].value
#Extract no data value & set no data value to NaN
header['data ignore value'] = float(reflData.attrs['Data_Ignore_Value'])
header['reflectance scale factor'] = float(reflData.attrs['Scale_Factor'])
header['interleave'] = reflData.attrs['Interleave']
#Extract spatial extent from attributes
header['spatial extent'] = reflData.attrs['Spatial_Extent_meters']
#Extract bad band windows
header['bad_band_window1'] = (refl.attrs['Band_Window_1_Nanometers'])
header['bad_band_window2'] = (refl.attrs['Band_Window_2_Nanometers'])
#Extract projection information
header['projection'] = refl['Metadata']['Coordinate_System']['Proj4'].value
header['epsg'] = int(refl['Metadata']['Coordinate_System']['EPSG Code'].value)
#Extract map information: spatial extent & resolution (pixel size)
mapInfo = refl['Metadata']['Coordinate_System']['Map_Info'].value
hdf5_file.close
return reflArray, header
In [3]:
help(read_neon_reflh5)
In [4]:
load_start_time = time.time()
data_path = os.getcwd()
h5refl_filename = 'NEON_D02_SERC_DP3_368000_4306000_reflectance.h5'
print('h5 file name:',h5refl_filename)
data,header = read_neon_reflh5(h5refl_filename)
print('Data took ', round(time.time() - load_start_time,1), ' seconds to load.')
In [5]:
#Display information stored in header
for key in sorted(header.keys()):
print(key)
In [6]:
def clean_neon_refl_data(data,header):
"""Clean h5 reflectance data and header
1. set data ignore value (-9999) to NaN
2. apply reflectance scale factor (10000)
3. remove bad bands (water vapor bands + last 10 bands):
Band_Window_1_Nanometers = 1340,1445
Band_Window_2_Nanometers = 1790,1955
4. mask reflectance data > 1.0 ?? ##TO DO ?
"""
#reflSubCleaned = reflArray[clipIndex['yMin']:clipIndex['yMax'],clipIndex['xMin']:clipIndex['xMax'],:].astype(np.float)
#reflSubCleaned[reflSubCleaned==int(reflArray_metadata['noDataVal'])]=np.nan
#reflSubCleaned = reflSubCleaned/reflArray_metadata['scaleFactor']
data_clean = copy.copy(data)
header_clean = copy.copy(header)
#set data ignore value (-9999) to NaN:
## data_clean[data_clean==header['data ignore value']]=np.nan ??? get message ValueError: cannot convert float NaN to integer
if header['data ignore value'] in data:
print('data ignore values exist in data')
#apply reflectance scale factor (divide by 10000)
data_clean = data_clean/header['reflectance scale factor']
#remove bad bands
#1. determine indices corresponding to min/max center wavelength for each bad band window:
bb1_ind0 = np.max(np.where((np.asarray(header['wavelength'])<float(header['bad_band_window1'][0]))))
bb1_ind1 = np.min(np.where((np.asarray(header['wavelength'])>float(header['bad_band_window1'][1]))))
bb2_ind0 = np.max(np.where((np.asarray(header['wavelength'])<float(header['bad_band_window2'][0]))))
bb2_ind1 = np.min(np.where((np.asarray(header['wavelength'])>float(header['bad_band_window2'][1]))))
bb3_ind0 = len(header['wavelength'])-10
#define valid band ranges from indices:
vb1 = list(range(0,bb1_ind0)); #TO DO - don't hard code bands in, fi
vb2 = list(range(bb1_ind1,bb2_ind0))
vb3 = list(range(bb2_ind1,bb3_ind0))
data_clean = data_clean[:,:,vb1+vb2+vb3]
valid_band_range = [i for j in (range(0,bb1_ind0), range(bb1_ind1,bb2_ind0), range(bb2_ind1,bb3_ind0)) for i in j]
header_clean['wavelength'] = [header['wavelength'][i] for i in valid_band_range]
#Display wavelengths corresponding to bands removed:
#print('Wavelengths Removed:')
#print(header['wavelength'][191],'-',header['wavelength'][212],'nm')
#print(header['wavelength'][281],'-',header['wavelength'][314],'nm')
#print(header['wavelength'][416],'-',header['wavelength'][425],'nm')
print('Bad Band Windows (nm):')
print(header['bad_band_window1'])
print(header['bad_band_window2'])
#Display wavelengths corresponding to bands kept:
print('\nCenter Wavelengths of Bands Kept:')
print(header['wavelength'][0],'-',header['wavelength'][bb1_ind0],'nm')
print(header['wavelength'][bb1_ind1],'-',header['wavelength'][bb2_ind0],'nm')
print(header['wavelength'][bb2_ind1],'-',header['wavelength'][bb3_ind0],'nm')
#header_clean['wavelength'] = header
return data_clean, header_clean
In [7]:
print('Raw Data Dimensions:\n',data.shape)
#remove bad bands
clean_start_time = time.time()
data_clean,header_clean = clean_neon_refl_data(data,header)
print('\nData took', round(time.time() - clean_start_time,1), 'seconds to clean.')
print('Cleaned Data Dimensions:',data_clean.shape)
In [10]:
def plot_aop_refl(band_array,refl_extent,colorlimit=(0,1),ax=plt.gca(),title='',cbar ='on',cmap_title='',colormap='Greys'):
'''read in and plot a single band or 3 stacked bands of a reflectance array
--------
Parameters
--------
band_array: ndarray
Array of reflectance values, created from aop_h5refl2array
If 'band_array' is a 2-D array, plots intensity of values
If 'band_array' is a 3-D array (3 bands), plots RGB image, set cbar to 'off' and don't need to specify colormap
refl_extent: tuple
Extent of reflectance data to be plotted (xMin, xMax, yMin, yMax)
Stored in metadata['spatial extent'] from aop_h5refl2array function
colorlimit: tuple, optional
Range of values to plot (min,max).
Look at the histogram of reflectance values before plotting to determine colorlimit.
ax: axis handle, optional
Axis to plot on; specify if making figure with subplots. Default is current axis, plt.gca()
title: string, optional
plot title
cbar: string, optional
Use cbar = 'on' (default), shows colorbar; use if plotting 1-band array
If cbar = 'off' (or not 'on'), does no
cmap_title: string, optional
colorbar title (eg. 'reflectance', etc.)
colormap: string, optional
Matplotlib colormap to plot
see https://matplotlib.org/examples/color/colormaps_reference.html
Returns
--------
plots flightline array of single band of reflectance data
--------
Examples:
--------
>>> plot_aop_refl(sercb56,
sercMetadata['spatial extent'],
colorlimit=(0,0.3),
title='SERC Band 56 Reflectance',
cmap_title='Reflectance',
colormap='Greys_r')'''
import matplotlib.pyplot as plt
plot = plt.imshow(band_array,extent=refl_extent,clim=colorlimit);
if cbar == 'on':
cbar = plt.colorbar(plot,aspect=40); plt.set_cmap(colormap);
cbar.set_label(cmap_title,rotation=90,labelpad=20)
plt.title(title); ax = plt.gca();
ax.ticklabel_format(useOffset=False, style='plain'); #do not use scientific notation for ticklabels
rotatexlabels = plt.setp(ax.get_xticklabels(),rotation=90); #rotate x tick labels 90 degrees
In [12]:
plot_aop_refl(data_clean[:,:,0],header_clean['spatial extent'],(0,0.3))
Spectral Unmixing allows pixels to be composed of fractions or abundances of each class.
Endmembers can be thought of as the basis spectra of an image. Once these endmember spectra are determined, the image cube can be 'unmixed' into the fractional abundance of each material in each pixel (Winter, 1999).
Spectral Angle Mapper (SAM): is a physically-based spectral classification that uses an n-D angle to match pixels to reference spectra. The algorithm determines the spectral similarity between two spectra by calculating the angle between the spectra and treating them as vectors in a space with dimensionality equal to the number of bands. This technique, when used on calibrated reflectance data, is relatively insensitive to illumination and albedo effects. Endmember spectra used by SAM are extracted from the NFINDR algorithm. SAM compares the angle between the endmember spectrum vector and each pixel vector in n-D space. Smaller angles represent closer matches to the reference spectrum. Pixels further away than the specified maximum angle threshold in radians are not classified.
http://www.harrisgeospatial.com/docs/SpectralAngleMapper.html
Spectral Information Divergence (SID): is a spectral classification method that uses a divergence measure to match pixels to reference spectra. The smaller the divergence, the more likely the pixels are similar. Pixels with a measurement greater than the specified maximum divergence threshold are not classified. Endmember spectra used by SID in this example are extracted from the NFINDR endmembor extraction algorithm.
http://www.harrisgeospatial.com/docs/SpectralInformationDivergence.html
In [13]:
#ee.display?
wavelength_float = [float(i) for i in header_clean['wavelength']]
ee_axes = {}
ee_axes['wavelength'] = wavelength_float
ee_axes['x']='Wavelength, nm'
ee_axes['y']='Reflectance'
In [14]:
#Endmember Extraction (Unmixing) - NFINDR Algorithm (Winter, 1999)
ee_start_time = time.time()
ee = eea.NFINDR()
U = ee.extract(data_clean,4,maxit=5,normalize=False,ATGP_init=True)
ee.display(axes=ee_axes,suffix='SERC')
print('Endmember extraction took ', round(time.time() - ee_start_time,1), ' seconds to run.')
In [15]:
#Abundance Maps
amap_start_time = time.time()
am = amap.FCLS()
amaps = am.map(data_clean,U,normalize=False)
am.display(colorMap='jet',columns=4,suffix='SERC')
print('Abundance maps took ', round(time.time() - amap_start_time,1), ' seconds to generate.')
In [16]:
#Print mean values of each abundance map to better estimate thresholds
print('Abundance Map Mean Values:')
print('EM1:',np.mean(amaps[:,:,0]))
print('EM2:',np.mean(amaps[:,:,1]))
print('EM3:',np.mean(amaps[:,:,2]))
print('EM4:',np.mean(amaps[:,:,3]))
In [18]:
#Look at histogram of each abundance map to determine ballpark for thresholds to use
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(18,8))
ax1 = fig.add_subplot(2,4,1); plt.title('EM1')
amap1_hist = plt.hist(np.ndarray.flatten(amaps[:,:,0]),bins=50,range=[0,1.0])
ax2 = fig.add_subplot(2,4,2); plt.title('EM2')
amap1_hist = plt.hist(np.ndarray.flatten(amaps[:,:,1]),bins=50,range=[0,0.001])
ax3 = fig.add_subplot(2,4,3); plt.title('EM3')
amap1_hist = plt.hist(np.ndarray.flatten(amaps[:,:,2]),bins=50,range=[0,0.5])
ax4 = fig.add_subplot(2,4,4); plt.title('EM4')
amap1_hist = plt.hist(np.ndarray.flatten(amaps[:,:,3]),bins=50,range=[0,0.05])
Below we define a functions to compute and display SID
In [19]:
#Spectral Information Divergence
def SID(data,E,thrs=None):
sid = cls.SID()
cmap = sid.classify(data,E,threshold=thrs)
sid.display(colorMap='tab20b',suffix='SERC')
Now we can call this function using the three endmembers (classes) that contain the most information:
In [20]:
U2 = U[[0,2,3],:]
SID(data_clean, U2, [0.8,0.3,0.03])
From this map we can see that SID did a pretty good job of identifying the water (dark blue), roads/buildings (orange), and vegetation (blue). We can compare it to the USA Topo Base map to see how well it did:
In [21]:
#Display USA Topo Base Map
#Reference: https://viewer.nationalmap.gov/
from IPython.core.display import Image
UStopo_filename = 'SERC_368000_4307000_UStopo.PNG'
UStopo_image = os.path.join(data_path,UStopo_filename)
print('Base Map')
Image(filename=UStopo_image,width=225,height=225)
Out[21]:
On your own, try the Spectral Angle Mapper. You will use similar syntax to the SID. and refer to SAM documentation on pysptools and the Pine Creek example.
https://pysptools.sourceforge.io/classification.html#spectral-angle-mapper-sam https://pysptools.sourceforge.io/examples_front.html#examples-using-the-ipython-notebook
Which algorithm does a better job with classification? On your own, play around with different settings (eg. use more / less endmembers, etc.)
In [ ]:
#Spectral Angle Mapper
def SAM(data,E,thrs=None):
sam = cls.SAM()
cmap = sam.classify(data,E,threshold=thrs)
sam.display(colorMap='Paired',suffix='SERC')
PySpTools
has an alpha interface with the Python machine learning package skikit-learn
. To apply more advanced machine learning techniques, you may wish to explore some of these algorithms.
https://pysptools.sourceforge.io/skl.html http://scikit-learn.org/stable/
In [ ]:
In [ ]:
def subset_clean_neon_refl_data(refl_data,refl_header,subset_band_inds):
"""Subset and clean h5 reflectance data and header
1. subset bands
2. set data ignore value (-9999) to NaN
3. apply reflectance scale factor (10000)
"""
refl_subset = refl_data[:,:,subset_band_inds]
refl_header_subset = copy.copy(refl_header)
#set data ignore value (-9999) to NaN:
if refl_header['data ignore value'] in refl_data:
print('data ignore values exist in data')
refl_data_subset[refl_data_subset==refl_header['data ignore value']]=np.nan
#apply reflectance scale factor (divide by 10000)
refl_subset_clean = refl_subset/refl_header['reflectance scale factor']
#valid_band_range = [i for j in (range(0,bb1_ind0), range(bb1_ind1,bb2_ind0), range(bb2_ind1,bb3_ind0)) for i in j]
refl_header_subset['wavelength'] = [refl_header['wavelength'][i] for i in subset_band_inds]
refl_header_subset['fwhm'] = [refl_header['fwhm'][i] for i in subset_band_inds]
return refl_subset_clean, refl_header_subset
In [ ]:
LS8_OLI_bandcenters = [442.96,482.04,561.41,654.59,864.67,1608.86,2200.73]
LS8_OLI_bandwidths = [15.98,60.04,57.33,37.47,28.25,84.72,186.66]