Water Across Synthetic Aperature Radar (WASARD)

WASARD is a general purpose transfer model between optical and SAR imagery for water classification.
A trained WASARD model can be executed on SAR imagery to create water classification maps over a region.

This notebook is inspired by an IGARSS publication titled Water Across Synthetic Aperture Radar Data (WASARD): SAR Water Body Classification for the Open Data Cube authored by Zachary Kreiser, Brian Killough, Syed R Rizvi.

WASARD is trained using water classifications on optical imagery as a point of reference. A machine learning model is used to approximate a transfer function between SAR data and Optical Classifications.

Details regarding WASARD in this notebook
Transfer Model: Linear SVM
Optical water classifier: WOFS
Optical Source: Landsat 8
SAR Target: Sentinel 1a

In this notebook

Existing coordinates are loaded for Sentinel1 and Landsat8 Imagery
Upon comparison,coordinates are used to inform how data should be loaded.
Sentinel1 imagery is loaded.
Landsat8 imagery is loaded, reprojected, and upsampled to match the resolution of Sentine1
an existing water classifier is run on optical data.
optical water classifications are averaged across time and visualized
a wasard model is created and trained
a wasard model is run on sentinel data
the output of wasard is averaged across time and visualized

Retreiving/Cleaning Data

Connect to the data cube



In [ ]:

    
import datacube
dc = datacube.Datacube(app = "[notebook][wasard][samoa]")

Define Extents of Analysis



In [ ]:

    
sar_product_name = "s1g_gamma0"
optical_product_name = "ls8_usgs_sr_scene"

# #Sa'anapu
# longitude_extent = (-171.904492, -171.790327)
# latitude_extent = (-14.0  , -13.962341)

#Apia
latitude_extent = -13.853425,-13.815715
longitude_extent =-171.787842, -171.681356

date_range = ('2016-8-1','2018-3-1')



In [ ]:

    
from utils.data_cube_utilities.dc_display_map import display_map
display_map(latitude = latitude_extent, longitude = longitude_extent)

Examine Coordinate Extents



In [ ]:

    
sentinel_coordinates =  dc.load(product = sar_product_name,
                                latitude = latitude_extent,
                                longitude = longitude_extent,
                                time = date_range,
                                measurements = [])
landsat_coordinates  =  dc.load(product = optical_product_name,
                                latitude = latitude_extent,
                                longitude = longitude_extent,
                                time = date_range,
                                measurements = [])



In [ ]:

    
import utils.data_cube_utilities.xarray_bokeh_plotting as xr_bokeh
xr_bokeh.init_notebook()



In [ ]:

    
xr_bokeh.dim_alignement( sentinel_coordinates.isel(latitude = slice(0,100), longitude = slice(0,100)), " Sentinel 1",
                          landsat_coordinates.isel(latitude = slice(0,100), longitude = slice(0,100)),  "Landsat 8")

Load and reproject

Add bit about reprojecting landsat the way I do with sentinel.

Sentinel CRS



In [ ]:

    
sentinel_details = dc.list_products()[dc.list_products()["name"].str.contains(sar_product_name)]
sentinel_details



In [ ]:

    
sentinel_crs        = str(sentinel_details['crs'].values[0])
sentinel_resolution = tuple(sentinel_details['resolution'].values[0])



In [ ]:

    
print (sentinel_crs)

Load both SAR and Optical imagery



In [ ]:

    
sentinel_dataset     =  dc.load(product = sar_product_name,
                                latitude = latitude_extent,
                                longitude = longitude_extent,
                                time = date_range,
                                )



In [ ]:

    
landsat_dataset     =    dc.load(product = optical_product_name,
                                latitude = latitude_extent,
                                longitude = longitude_extent,
                                time = date_range,
                                output_crs = sentinel_crs,
                                resolution = sentinel_resolution
                                )

compare reprojected coordinates for alignment.



In [ ]:

    
subset_of_landsat_coords = landsat_dataset.isel(latitude = slice(0,100),
                                        longitude = slice(0,100),
                                        time = 0).coords



In [ ]:

    
subset_of_sentinel_coords = sentinel_dataset.isel(latitude = slice(0,100),
                                        longitude = slice(0,100),
                                        time = 0).coords



In [ ]:

    
xr_bokeh.dim_alignement(subset_of_sentinel_coords, "S1",
                        subset_of_landsat_coords, "LS")

Clean SAR Data



In [ ]:

    
import xarray as xr  
import numpy as np
import matplotlib.pyplot as plt

def remove_all_zero(dataset):
    return dataset.drop([c[0].values 
        for c in [(t,np.count_nonzero(dataset.sel(time=t).vv)) 
                  for t in dataset.time] if c[1]==0],dim='time')



In [ ]:

    
from typing import List  
import itertools

has_time_dimension = lambda x: "time" in dict(x.dims).keys()
get_first = lambda x: x[0]
get_last =  lambda x: x[-1]


def group_dates_by_day( dates: List[np.datetime64]) -> List[List[np.datetime64]]:
    generate_key = lambda b: ((b - np.datetime64('1970-01-01T00:00:00Z')) / (np.timedelta64(1, 'h')*24)).astype(int)
    return [list(group) for key, group in itertools.groupby(dates, key=generate_key)]

def reduce_on_day(ds: xr.Dataset,
                  reduction_func: np.ufunc = np.nanmean) -> xr.Dataset:
    #Group dates by day into date_groups
    day_groups = group_dates_by_day(ds.time.values)
    
    #slice large dataset into many smaller datasets by date_group
    group_chunks = (ds.sel(time = t) for t in day_groups)
    
    #reduce each dataset using something like "average" or "median" such that many values for a day become one value   
    group_slices = (_ds.reduce(reduction_func, dim = "time") for _ds in group_chunks if has_time_dimension(_ds))

    #recombine slices into larger dataset
    new_dataset  = xr.concat(group_slices, dim = "time") 
    
    #rename times values using the first time in each date_group  
    new_times = list(map(get_first, day_groups))    
    new_dataset = new_dataset.reindex(dict(time = np.array(new_times)))
    
    return new_dataset



In [ ]:

    
sentinel_dataset = remove_all_zero(sentinel_dataset)
sentinel_dataset = reduce_on_day(sentinel_dataset)



In [ ]:

    
subset_of_landsat_coords = landsat_dataset.isel(latitude = 0, longitude = 0).coords
subset_of_sentinel_coords = sentinel_dataset.isel(latitude = 0, longitude = 0).coords
xr_bokeh.dim_alignement(subset_of_sentinel_coords, "S1",
                        subset_of_landsat_coords, "LS")

Examine Optical Data



In [ ]:

    
from utils.data_cube_utilities.dc_mosaic import ls8_unpack_qa



In [ ]:

    
from utils.data_cube_utilities.dc_mosaic import ls8_unpack_qa
from utils.data_cube_utilities.dc_mosaic import create_median_mosaic
import xarray as xr  
import numpy as np

def clean_mask_ls8(ds:xr.Dataset) -> np.array:
    clear_xarray  = ls8_unpack_qa(ds.pixel_qa, "clear")  
    water_xarray  = ls8_unpack_qa(ds.pixel_qa, "water")

    cloud_free_boolean_mask = np.logical_or(clear_xarray, water_xarray)
    return cloud_free_boolean_mask

def median_mosaic_ls8(dataset):
    # The mask here is based on pixel_qa products. It comes bundled in with most Landsat Products.
        return create_median_mosaic(dataset, clean_mask = clean_mask_ls8(dataset))



In [ ]:

    
mosaic = median_mosaic_ls8(landsat_dataset.isel(time = slice(0,20)))



In [ ]:

    
from utils.data_cube_utilities.dc_rgb import rgb
rgb(mosaic, bands=['red', 'green', 'blue'], width= 15)

Water Classification: Optical Imagery

Import water classifier
This notebooks uses WOFS. A water classification algorithm for landsat imagery, developed by Geoscience Australia.



In [ ]:

    
from utils.data_cube_utilities.dc_water_classifier import wofs_classify

Classify
wofs_classify is used. An additional mask is passed in that indicates the presence of clouds



In [ ]:

    
water_classifications = wofs_classify(landsat_dataset, clean_mask= clean_mask_ls8(landsat_dataset), no_data= np.nan)

Visualize optical water classifciaton
A percentage (between the range of [0-1]) denotes the frequency of water in any given area,



In [ ]:

    
%matplotlib inline
def aspect_ratio_helper(x,y, fixed_width = 20):
    width = fixed_width
    height = y * (fixed_width / x)
    return (width, height)



In [ ]:

    
import matplotlib.pyplot as plt  

plt.figure(figsize = aspect_ratio_helper(*reversed(list(water_classifications.wofs.shape)[1:])))
water_classifications.mean(dim = 'time').wofs.plot(cmap = "jet_r")

Train WASARD water classifier using optical data

Import wasard



In [ ]:

    
from utils.data_cube_utilities import wasard

Train WASARD classifier
In this case only data from samoa will be used. In the general case, It is suggested that a geographicall varied dataset is used to train the model.



In [ ]:

    
samoa_classifier = wasard.wasard_classifier(sar_dataset=sentinel_dataset,
                                              landsat_dataset=landsat_dataset)

Water Classifcation: WASARD

Classify



In [ ]:

    
samoa_classified = samoa_classifier.wasard_classify(sentinel_dataset)

Visualize WASARD SAR water classifciaton
A percentage (between the range of [0-1]) denotes the frequency of water in any given area,



In [ ]:

    
plt.figure(figsize = aspect_ratio_helper(*reversed(list(water_classifications.wofs.shape)[1:])))
samoa_classified.wasard.mean(dim = "time").plot(cmap = "jet_r")