This notebook runs through an example of spectral unmixing to carry out unsupervised classification of a SERC hyperspectral data file.
In this tutorial, we use the following functions:
read_neon_reflh5
: function to read in NEON AOP Hyperspectral Data file (in hdf5 format)clean_neon_refl_data
: function to clean NEON hyperspectral data, including applying the data ignore value and reflectance scale factor, and removing water vapor bandsRGBplot_widget
: function to interactively plot different 3-band combinations of NEON hyperspectral dataWe then use the PySpTools
package to carry out endmember extraction, plot abundance maps of the spectral endmembers, and use Spectral Information Divergence to classify the SERC tile.
To run this notebook, you the following python packages need to be installed. You can install from the notebook following the commands below, or you can install required packages from command line, making sure you have activated the py35 environment first.
PySpTools: Download pysptools-0.14.2.tar.gz from https://pypi.python.org/pypi/pysptools
import sys
!{sys.executable} -m pip install spectral
!{sys.executable} -m pip install scikit-image
!{sys.executable} -m pip install "C:\Users\bhass\Downloads\pysptools-0.14.2.tar.gz
!conda install --yes --prefix {sys.prefix} h5py
!conda install --yes --prefix {sys.prefix} scikit-learn
!conda install --yes --prefix {sys.prefix} cvxopt
See Also: https://stackoverflow.com/questions/45156080/installing-modules-to-anaconda-from-tar-gz
In [1]:
import os, h5py, time
import numpy as np
import pysptools.util as util
import pysptools.eea as eea #endmembers extraction algorithms
import pysptools.abundance_maps as amap
import pysptools.classification as cls
import pysptools.material_count as cnt
from skimage import exposure
from ipywidgets import *
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
In [2]:
import matplotlib.pyplot as plt
import h5py, osr, copy
def read_neon_reflh5(refl_filename):
"""h5refl2array reads in a NEON AOP reflectance hdf5 file and returns
reflectance array, and header containing metadata in envi format.
--------
Parameters
refl_filename -- full or relative path and name of reflectance hdf5 file
--------
Returns
--------
reflArray:
array of reflectance values
header:
dictionary containing the following metadata (all strings):
bad_band_window1
bad_band_window2
bands: # of bands (float)
coordinate system string: coordinate system information (string)
data ignore value: value corresponding to no data (float)
interleave: 'BSQ' (string)
map info: ?
reflectance scale factor: factor by which reflectance is scaled (float)
wavelength: wavelength values (float)
wavelength unit: 'm' (string)
spatial extent meters:
--------
This function applies to the NEON hdf5 format implemented in 2016, and should be used for
data acquired during and after 2016 as of July 2018. Data in earlier NEON hdf5 format
(collected prior to 2016) is expected to be re-processed after the 2018 flight season.
--------
Example Execution:
--------
sercRefl, sercRefl_header = h5refl2array('NEON_D02_SERC_DP1_20160807_160559_reflectance.h5') """
#Read in reflectance hdf5 file (include full or relative path if data is located in a different directory)
hdf5_file = h5py.File(refl_filename,'r')
#Get the site name
file_attrs_string = str(list(hdf5_file.items()))
file_attrs_string_split = file_attrs_string.split("'")
sitename = file_attrs_string_split[1]
#Extract the reflectance & wavelength datasets
refl = hdf5_file[sitename]['Reflectance']
reflData = refl['Reflectance_Data']
reflArray = refl['Reflectance_Data'].value
#Create dictionary containing relevant metadata information
header = {}
header['map info'] = refl['Metadata']['Coordinate_System']['Map_Info'].value
header['wavelength'] = refl['Metadata']['Spectral_Data']['Wavelength'].value
#Extract no data value & set no data value to NaN
header['data ignore value'] = float(reflData.attrs['Data_Ignore_Value'])
header['reflectance scale factor'] = float(reflData.attrs['Scale_Factor'])
header['interleave'] = reflData.attrs['Interleave']
#Extract spatial extent from attributes
header['spatial extent'] = reflData.attrs['Spatial_Extent_meters']
#Extract bad band windows
header['bad band window1'] = (refl.attrs['Band_Window_1_Nanometers'])
header['bad band window2'] = (refl.attrs['Band_Window_2_Nanometers'])
#Extract projection information
header['projection'] = refl['Metadata']['Coordinate_System']['Proj4'].value
header['epsg'] = int(refl['Metadata']['Coordinate_System']['EPSG Code'].value)
#Extract map information: spatial extent & resolution (pixel size)
mapInfo = refl['Metadata']['Coordinate_System']['Map_Info'].value
hdf5_file.close
return reflArray, header
In [3]:
help(read_neon_reflh5)
In [4]:
load_start_time = time.time()
data_path = os.getcwd()
#header_file = 'NEON_D02_SERC_DP3_368000_4306000_reflectance.hdr'
#data_file = os.path.join(data_path,header_file)
h5refl_filename = 'NEON_D02_SERC_DP3_368000_4306000_reflectance.h5'
print('h5 file name:',h5refl_filename)
data,header = read_neon_reflh5(h5refl_filename)
print('Data took ', round(time.time() - load_start_time,1), ' seconds to load.')
In [5]:
#Display information stored in header
for key in sorted(header.keys()):
print(key)
In [6]:
def clean_neon_refl_data(data,header):
"""Clean h5 reflectance data and header
1. set data ignore value (-9999) to NaN
2. apply reflectance scale factor (10000)
3. remove bad bands (water vapor bands + last 10 bands):
Band_Window_1_Nanometers = 1340,1445
Band_Window_2_Nanometers = 1790,1955
4. mask reflectance data > 1.0 ?? ##TO DO ?
"""
#reflSubCleaned = reflArray[clipIndex['yMin']:clipIndex['yMax'],clipIndex['xMin']:clipIndex['xMax'],:].astype(np.float)
#reflSubCleaned[reflSubCleaned==int(reflArray_metadata['noDataVal'])]=np.nan
#reflSubCleaned = reflSubCleaned/reflArray_metadata['scaleFactor']
data_clean = copy.copy(data)
header_clean = copy.copy(header)
#set data ignore value (-9999) to NaN:
## data_clean[data_clean==header['data ignore value']]=np.nan ??? get message ValueError: cannot convert float NaN to integer
if header['data ignore value'] in data:
print('data ignore values exist in data')
#apply reflectance scale factor (divide by 10000)
data_clean = data_clean/header['reflectance scale factor']
#remove bad bands
#1. determine indices corresponding to min/max center wavelength for each bad band window:
bb1_ind0 = np.max(np.where((np.asarray(header['wavelength'])<float(header['bad band window1'][0]))))
bb1_ind1 = np.min(np.where((np.asarray(header['wavelength'])>float(header['bad band window1'][1]))))
bb2_ind0 = np.max(np.where((np.asarray(header['wavelength'])<float(header['bad band window2'][0]))))
bb2_ind1 = np.min(np.where((np.asarray(header['wavelength'])>float(header['bad band window2'][1]))))
bb3_ind0 = len(header['wavelength'])-10
#define valid band ranges from indices:
vb1 = list(range(0,bb1_ind0)); #TO DO - don't hard code bands in, fi
vb2 = list(range(bb1_ind1,bb2_ind0))
vb3 = list(range(bb2_ind1,bb3_ind0))
data_clean = data_clean[:,:,vb1+vb2+vb3]
valid_band_range = [i for j in (range(0,bb1_ind0), range(bb1_ind1,bb2_ind0), range(bb2_ind1,bb3_ind0)) for i in j]
header_clean['wavelength'] = [header['wavelength'][i] for i in valid_band_range]
#Display wavelengths corresponding to bands removed:
#print('Wavelengths Removed:')
#print(header['wavelength'][191],'-',header['wavelength'][212],'nm')
#print(header['wavelength'][281],'-',header['wavelength'][314],'nm')
#print(header['wavelength'][416],'-',header['wavelength'][425],'nm')
print('Bad Band Windows (nm):')
print(header['bad band window1'])
print(header['bad band window2'])
#Display wavelengths corresponding to bands kept:
print('\nCenter Wavelengths of Bands Kept:')
print(header['wavelength'][0],'-',header['wavelength'][bb1_ind0],'nm')
print(header['wavelength'][bb1_ind1],'-',header['wavelength'][bb2_ind0],'nm')
print(header['wavelength'][bb2_ind1],'-',header['wavelength'][bb3_ind0],'nm')
#header_clean['wavelength'] = header
return data_clean, header_clean
In [7]:
print('Raw Data Dimensions:\n',data.shape)
#remove bad bands
clean_start_time = time.time()
data_clean,header_clean = clean_neon_refl_data(data,header)
print('\nData took', round(time.time() - clean_start_time,1), 'seconds to clean.')
print('Cleaned Data Dimensions:',data_clean.shape)
In [8]:
array = data_clean;
wavelengths = header['wavelength']
xmin = header['spatial extent'][0];
xmax = header['spatial extent'][1];
ymin = header['spatial extent'][2];
ymax = header['spatial extent'][3];
def RGBplot_widget(R,G,B):
#Pre-allocate array size
rgbArray = np.zeros((array.shape[0],array.shape[1],3), 'uint8')
Rband = array[:,:,R-1].astype(np.float)
Gband = array[:,:,G-1].astype(np.float)
Bband = array[:,:,B-1].astype(np.float)
rgbArray[..., 0] = Rband*256
rgbArray[..., 1] = Gband*256
rgbArray[..., 2] = Bband*256
# Apply Adaptive Histogram Equalization to Improve Contrast:
img_nonan = np.ma.masked_invalid(rgbArray) #first mask the image
img_adapteq = exposure.equalize_adapthist(img_nonan, clip_limit=0.10)
plot = plt.imshow(img_adapteq,extent=[xmin,xmax,ymin,ymax]);
plt.title('Bands: \nR:' + str(R) + ' (' + str(round(wavelengths[R-1])) +'nm)'
+ '\n G:' + str(G) + ' (' + str(round(wavelengths[G-1])) + 'nm)'
+ '\n B:' + str(B) + ' (' + str(round(wavelengths[B-1])) + 'nm)');
ax = plt.gca(); ax.ticklabel_format(useOffset=False, style='plain')
rotatexlabels = plt.setp(ax.get_xticklabels(),rotation=90)
interact(RGBplot_widget, R=(1,426,1), G=(1,426,1), B=(1,426,1))
Out[8]:
Spectral Unmixing allows pixels to be composed of fractions or abundances of each class.
Endmembers can be thought of as the basis spectra of an image. Once these endmember spectra are determined, the image cube can be 'unmixed' into the fractional abundance of each material in each pixel (Winter, 1999).
Spectral Angle Mapper (SAM):
Spectral Information Divergence (SID):
In [9]:
#ee.display?
wavelength_float = [float(i) for i in header_clean['wavelength']]
ee_axes = {}
ee_axes['wavelength'] = wavelength_float
ee_axes['x']='Wavelength, nm'
ee_axes['y']='Reflectance'
In [10]:
#Endmember Extraction (Unmixing) - NFINDR Algorithm (Winter, 1999)
ee_start_time = time.time()
ee = eea.NFINDR()
U = ee.extract(data_clean,4,maxit=5,normalize=False,ATGP_init=True)
ee.display(axes=ee_axes,suffix='SERC')
print('Endmember extraction took ', round(time.time() - ee_start_time,1), ' seconds to run.')
In [11]:
#Abundance Maps
amap_start_time = time.time()
am = amap.FCLS()
amaps = am.map(data_clean,U,normalize=False)
am.display(colorMap='jet',columns=4,suffix='SERC')
print('Abundance maps took ', round(time.time() - amap_start_time,1), ' seconds to generate.')
In [12]:
#Print mean values of each abundance map to better estimate thresholds
print('Abundance Map Mean Values:')
print('EM1:',np.mean(amaps[:,:,0]))
print('EM2:',np.mean(amaps[:,:,1]))
print('EM3:',np.mean(amaps[:,:,2]))
print('EM4:',np.mean(amaps[:,:,3]))
#Look at histogram of each abundance map to determine ballpark for thresholds to use
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(18,8))
ax1 = fig.add_subplot(2,4,1); plt.title('EM1')
amap1_hist = plt.hist(np.ndarray.flatten(amaps[:,:,0]),bins=50,range=[0,1.0])
ax2 = fig.add_subplot(2,4,2); plt.title('EM2')
amap1_hist = plt.hist(np.ndarray.flatten(amaps[:,:,1]),bins=50,range=[0,0.0001])
ax3 = fig.add_subplot(2,4,3); plt.title('EM3')
amap1_hist = plt.hist(np.ndarray.flatten(amaps[:,:,2]),bins=50,range=[0,0.5])
ax4 = fig.add_subplot(2,4,4); plt.title('EM4')
amap1_hist = plt.hist(np.ndarray.flatten(amaps[:,:,3]),bins=50,range=[0,0.05])
In [13]:
#Create functions to compute and display SAM and SID
#Spectral Angle Mapper
def SAM(data,E,thrs=None):
sam = cls.SAM()
cmap = sam.classify(data,E,threshold=thrs)
sam.display(colorMap='Paired',suffix='SERC')
#Spectral Information Divergence
def SID(data,E,thrs=None):
sid = cls.SID()
cmap = sid.classify(data,E,threshold=thrs)
sid.display(colorMap='tab20b',suffix='SERC')
In [14]:
U2 = U[[0,2,3],:]
SID(data_clean, U2, [0.8,0.3,0.03])
In [15]:
#Display USA Topo Base Map
#Reference: https://viewer.nationalmap.gov/
from IPython.core.display import Image
UStopo_filename = 'SERC_368000_4307000_UStopo.PNG'
UStopo_image = os.path.join(data_path,UStopo_filename)
print('Base Map')
Image(filename=UStopo_image,width=225,height=225)
Out[15]:
PySpTools
has an alpha interface with the Python machine learning package skikit-learn
. To apply more advanced machine learning techniques, you may wish to explore some of these algorithms.
https://pysptools.sourceforge.io/skl.html http://scikit-learn.org/stable/