UN SDG Indicator 6.6.1:
Change in the Extent of Water-related Ecosystems

Notebook Summary

The United Nations have prescribed 17 "Sustainable Development Goals" (SDGs). This notebook attempts to monitor SDG Indicator 6.6.1 - change in the extent of water-related ecosystems. Indicator 6.6.1 has 4 sub-indicators:

i. The spatial extent of water-related ecosystems
ii. The quantity of water contained within these ecosystems
iii. The quality of water within these ecosystems
iv. The health or state of these ecosystems

This notebook primarily focuses on the first sub-indicator - spatial extents.

In the first section, one time period is analyzed. The mean of water classifications over time is visualized, the minimum and maximum water extents are visualized, and the frequency of water classification for each pixel is visualized.

In the second section, two time periods are analyzed - a baseline time period and an analysis time period. The change in pixels that are water at any time during their respective time periods is calculated, the change in water from the baseline time period to the analysis time period is visualized, and the area is classified by degree of change using classification labels provided by UN SDG 6.6.1.

Index

Import Dependencies and Connect to the Data Cube
Choose Platforms and Products
Get the Extents of the Cube
Analyze a Single Time Period
Compare Two Time Periods - a Baseline and an Analysis

Import Dependencies and Connect to the Data Cube ▴



In [1]:

    
# Supress Warning 
import warnings
warnings.filterwarnings('ignore')



In [2]:

    
%matplotlib inline
import warnings
import matplotlib.pyplot as plt

# Allow importing of our utilities.
import sys
sys.path.append("..")

# Import the datacube and the API
import datacube
from utils.data_cube_utilities.data_access_api import DataAccessApi
import datetime as dt
import numpy as np
import xarray as xr
import pandas as pd

from matplotlib.colors import LinearSegmentedColormap



In [3]:

    
# Create an instance of the datacube and API.
api = DataAccessApi()
dc = api.dc

Choose Platforms and Products ▴

List available products for each platform



In [4]:

    
# Get available products
products_info = dc.list_products()

# List LANDSAT 7 products
print("LANDSAT 7 Products:")
products_info[["platform", "name"]][products_info.platform == "LANDSAT_7"]









    



LANDSAT 7 Products:






    Out[4]:







  
    
      
      platform
      name
    
    
      id
      
      
    
  
  
    
      2
      LANDSAT_7
      ls7_usgs_sr_scene



In [5]:

    
# List LANDSAT 8 products
print("LANDSAT 8 Products:")
products_info[["platform", "name"]][products_info.platform == "LANDSAT_8"]









    



LANDSAT 8 Products:






    Out[5]:







  
    
      
      platform
      name
    
    
      id
      
      
    
  
  
    
      1
      LANDSAT_8
      ls8_usgs_sr_scene

Choose products



In [6]:

    
# These are the platforms (satelltes) and products (datacube sets) 
# used for this demonstration.
# Single time period analysis
platforms_single = ['LANDSAT_8']
products_single = ['ls8_usgs_sr_scene']
# Two time period analysis
platforms_two = ['LANDSAT_7', 'LANDSAT_8']
products_two = ['ls7_usgs_sr_scene', 'ls8_usgs_sr_scene']

all_platforms = []
for x in platforms_single + platforms_two:
    all_platforms.append(x) if x not in all_platforms else None
all_products = []
for x in products_single + products_two:
    all_products.append(x) if x not in all_products else None

Get the Extents of the Cube ▴



In [7]:

    
from utils.data_cube_utilities.dc_load import get_overlapping_area
from utils.data_cube_utilities.dc_time import dt_to_str

full_lat, full_lon, min_max_dates = \
    get_overlapping_area(api, all_platforms, all_products)

# Print the extents of each product.
str_min_max_dates = np.vectorize(dt_to_str)(min_max_dates)
for i, (platform, product) in enumerate(zip(all_platforms, all_products)):
    print("For platform {} and product {}:".format(platform, product))
    print("Time Extents:", str_min_max_dates[i])
    print()

# Print the extents of the combined data.
min_start_date_mutual = np.max(min_max_dates[:,0])
max_end_date_mutual = np.min(min_max_dates[:,1])
print("Overlapping Extents:")
print("Latitude Extents:", full_lat)
print("Longitude Extents:", full_lon)
print("Time Extents:", list(map(dt_to_str, (min_start_date_mutual, max_end_date_mutual))))









    



For platform LANDSAT_8 and product ls8_usgs_sr_scene:
Time Extents: ['2013-03-21' '2020-01-27']

For platform LANDSAT_7 and product ls7_usgs_sr_scene:
Time Extents: ['1999-07-08' '2020-01-10']

Overlapping Extents:
Latitude Extents: (-12.57305555565614, 18.32305555570214)
Longitude Extents: (-25.47250000020378, 44.01000000035208)
Time Extents: ['2013-03-21', '2020-01-10']

Visualize the full area



In [8]:

    
from utils.data_cube_utilities.dc_display_map import display_map
display_map(full_lat, full_lon)









    Out[8]:

Analyze a Single Time Period ▴

Define the Extents of the Analysis ▴

Specify start and end dates



In [9]:

    
# Select a subset of the time available (year, month, day).
time_extents_single = [dt.datetime(2014,1,1), dt.datetime(2014,12,31)]

Specify an area to analyze



In [10]:

    
# Specify latitude and longitude bounds of an interesting area within the full extents.

# Ghana
# lat = (6.5016, 6.5221) # Lake Volta (very small)
# lon = (-0.1618, -0.100) # Lake Volta (very small)
# lat = (6.5016, 6.5821) # Lake Volta (small)
# lon = (-0.1618, -0.055) # Lake Volta (small)
# lat = (6.5138, 6.5292) # Lake Volta (medium)
# lon = (-0.1669, -0.1493) # Lake Volta (medium)
# lat = (6.2989, 7.9287) # Lake Volta (large)
# lon = (-0.4559, 0.2637) # Lake Volta (large)
lat =  (10.8600, 10.9350) # Tono Dam
lon = (-1.1934, -1.1423) # Tono Dam

# Lake Sulunga, Tanzania
# lat = (-6.3605, -5.8252) # Large 
# lon = (34.9756, 35.4535) # Large
# lat = (-6.2593, -5.8701) # Small (close fit)
# lon = (34.9901, 35.3641) #  Small (close fit)

# Lake Manyara, Tanzania
# lat = (-3.8505, -3.3886) 
# lon = (35.7184, 35.9271)

# Vietnam
# lat = (10.037934, 10.237934) 
# lon = (104.992264, 105.192264)
# lat = (11.0645, 11.2845) # Tri An Lake
# lon = (106.9567, 107.2967) # Tri An Lake

Visualize the selected area



In [11]:

    
display_map(lat, lon)









    Out[11]:

Retrieve the Data from the Datacube ▴



In [12]:

    
from utils.data_cube_utilities.dc_load import match_dim_sizes
from utils.data_cube_utilities.clean_mask import landsat_qa_clean_mask, landsat_clean_mask_invalid
from utils.data_cube_utilities.aggregate import xr_scale_res
from utils.data_cube_utilities.sort import xarray_sortby_coord

def load_for_time_range(platforms, products, time_extents):
    measurements = ['red', 'blue', 'green', 'nir', 'swir1', 'swir2', 'pixel_qa']
    matching_abs_res, same_dim_sizes = match_dim_sizes(dc, products, lon, lat)
    datasets = {}
    clean_masks = {}
    for platform, product in zip(platforms, products):
        # Load the dataset.
        dataset = dc.load(platform=platform, product=product, lat=lat, lon=lon, 
                          time=time_extents, measurements=measurements,
                          group_by='solar_day', 
                          dask_chunks={'latitude':1000, 'longitude':1000, 'time':1}).persist()
        if len(dataset.dims) == 0: # The dataset is empty.
            continue
        # Get the clean mask.
        clean_mask = (landsat_qa_clean_mask(dataset, platform) &
                      ((dataset != -9999).to_array().all('variable')) &
                      landsat_clean_mask_invalid(dataset)).persist()
        dataset = dataset.drop('pixel_qa')
        # Discard acquisitions with no clean data.
        acq_times_to_keep = dataset.time.isel(time=(clean_mask.mean(['latitude', 'longitude']) > 0.001))
        dataset = dataset.sel(time=acq_times_to_keep)
        clean_mask = clean_mask.sel(time=acq_times_to_keep)
        # If needed, scale the datasets and clean masks to the same size in the x and y dimensions.
        if not same_dim_sizes:    
            dataset = xr_scale_res(dataset, abs_res=matching_abs_res)
            clean_mask = xr_scale_res(clean_mask.astype(np.uint8), abs_res=matching_abs_res).astype(np.bool)
        # Clean the data.
        dataset = dataset.astype(np.float16).where(clean_mask)
        datasets[product], clean_masks[product] = dataset, clean_mask
    # Combine everything.
    if len(datasets) > 0:
        dataset = xarray_sortby_coord(xr.concat(list(datasets.values()), dim='time'), coord='time')
        clean_mask = xarray_sortby_coord(xr.concat(list(clean_masks.values()), dim='time'), coord='time')
    else:
        dataset = xr.Dataset()
        clean_mask = xr.DataArray(np.empty((0,), dtype=np.bool))
    return dataset, clean_mask



In [13]:

    
dataset, clean_mask = load_for_time_range(platforms_single, products_single, time_extents_single)
dataset = dataset.persist()
clean_mask = clean_mask.persist()

Get Water Classifications Using the WOFS Algorithm ▴



In [14]:

    
# Whether or not to denoise the WOFS classifications.
denoise = False

# Denoising parameters may need to be tuned depending on the region and resolution scaling.
# *min_size*: Minimum size of connected pixels 
# that will not be filtered (minimum=2).
# *kernel_size*: Diameter of the modal kernal (minimum=3). 
# Larger values will run slower as it calculates the mode 
# of the neighborhood for each pixel inside this diameter.
# *connectivity*: Maximum distance between any two pixels. 
# A value of 1 allows only contiguous regions of pixels.
# *max_num_filter_runs*: The filter is run until the output stops changing 
# or this number of runs has been performed. 
std_denoise_params = dict(min_size=40, kernel_size=7, connectivity=10)
max_num_filter_runs = 4



In [15]:

    
from utils.data_cube_utilities.dc_water_classifier import wofs_classify
from utils.data_cube_utilities.dc_utilities import ignore_warnings
from utils.data_cube_utilities.raster_filter import lone_object_filter

def get_water_classifications(dataset, clean_mask, max_num_filter_runs, 
                              denoise=False, denoise_params=None):
    # Get WOFS classifications.
    water_da = ignore_warnings(wofs_classify, dataset).wofs
    water_da = water_da.where(clean_mask)

    # Denoise the classifications.
    if denoise:
        denoise_params = std_denoise_params if denoise_params is None else denoise_params
        
        # Encode NaN as -1 for `lone_object_filter()`.
        water_da = water_da.where(~xr_nan(water_da), -1)
        for time in water_da.time:
            water_slice_prev = water_da.sel(time=time).values
            water_slice_new = lone_object_filter(water_slice_prev, **std_denoise_params)
            # While the filter output is still changing or until the filter has
            # been run `max_num_filter_runs` times, keep rerunning it.
            filter_run_count = 0
            while (water_slice_new != water_slice_prev).any() and \
                  filter_run_count < max_num_filter_runs:
                water_slice_prev = water_slice_new
                water_slice_new = lone_object_filter(water_slice_prev, **std_denoise_params)
                filter_run_count += 1
            water_da.sel(time=time).values[:] = water_slice_new
        water_da = water_da.where(water_da!=-1) # Reencode -1 as NaN.  
    return water_da

water_da = get_water_classifications(dataset, clean_mask, max_num_filter_runs,
                                     denoise, std_denoise_params)



In [16]:

    
# Resample the data in time to avoid cluttering the x axes of plots 
# with tick labels (dates) and also smooth the plots.
dataset = dataset.resample(time='2d').mean('time').dropna('time', how='all').persist()
clean_mask = (clean_mask.resample(time='2d').mean('time').dropna('time', how='all') >= 0.5).persist()
water_da = water_da.resample(time='2d').mean('time').dropna('time', how='all').persist()

Prepare for Visualization ▴



In [17]:

    
# Define a standard dpi partly to tune font sizes in figures.
# A higher dpi and a lower figure size will result in larger 
# font sizes relative to the figure size.
std_dpi = 100

Find the Minimum and Maximum Water Extents ▴

We are finding the minimum and maximum water extents based on a max-composite of water data. We do this to reduce the appearance of missing data due to cloud cover and to avoid cluttering a plot of the mean of water over time later.



In [18]:

    
## Settings ##

# Formatting for full-width plots.
title_fontdict = dict(fontsize=16) # Title formatting
tick_label_fmt_dict = dict(axis='both', labelsize=12) # Tick label formatting
axis_label_fmt_dict = dict(fontsize=16) # Axis label formatting
legend_kwargs = dict(fontsize=12)

# Can be any of ['time-slice', 'per-pixel'].
# The value 'time-slice' obtains the minimum and maximum water extents on an time-slice basis.
# The value 'per-pixel' obtains the minumum and maximum water extents on a per-pixel basis.
water_extent_method = 'time-slice'
assert water_extent_method in ['time-slice', 'per-pixel'], \
    "The setting `water_extent_method` must be one of ['time-slice', 'per-pixel']."

# This setting is only relevant if `water_extent_method` is set to 'time-slice'.
# Water classifiers can sometimes output erroneous classifications and sometimes excessive clouds
# make compositing an impractical solution to accounting for noisy classifications. 
# And SAR data is sometimes unavailable for WASARD water classification over cloudy regions,
# so percentiles can be specified for the minimum and maximum time slices. These can be between
# 0 and 100. The minimum water time-slice selected will be the time slice with non-water extents
# at or below `percentile_min`. The maximum water time-slice selected will be the time slice 
# with water extents at or below `percentile_max`.
percentile_min, percentile_max = 100, 100

## End Settings ##

# Option 1: Obtain the acquisitions with the minimum and maximum water extents.
if water_extent_method == 'time-slice':
    count_non_water = water_da.where(water_da == 0).count(dim=['latitude', 'longitude'])
    count_water = water_da.where(water_da == 1).count(dim=['latitude', 'longitude'])
    
    # Find the acquisition with the least water (the one with the most non-water pixels).
    percentile_value_min = np.percentile(count_non_water, percentile_min, interpolation='lower')
    # Handle the case of multiple acquisitions with the same amount of non-water (notably 0).
    acqs_with_value_min = count_non_water == percentile_value_min
    if acqs_with_value_min.sum() > 1: # Find the acquisition with the least water (and most non-water).
        min_extent_time_ind = np.argmin(count_water.isel(time=acqs_with_value_min).values)
    else:
        min_extent_time_ind = (count_non_water == percentile_value_min).argmax().values
    min_extent_time = water_da.time.values[min_extent_time_ind]
    min_water_extent = water_da.isel(time=min_extent_time_ind)

    # Find the acquisition with the most water (the one with the most water pixels).
    percentile_value_max = np.percentile(count_water, percentile_max, interpolation='lower')
    # Handle the case of multiple acquisitions with the same amount of water (notably 0).
    acqs_with_value_max = count_water == percentile_value_max
    if acqs_with_value_max.sum() > 1: # Find the acquisition with the least non-water (and most water).
        max_extent_time_ind = np.argmin(count_non_water.isel(time=acqs_with_value_max).values)
    else:
        max_extent_time_ind = (count_water == percentile_value_max).argmax().values
    max_extent_time = water_da.time.values[max_extent_time_ind]
    max_water_extent = water_da.isel(time=max_extent_time_ind)
    
    dt64_to_date_str = lambda dt64 : pd.to_datetime(str(dt64)).strftime('%Y/%m/%d')
    max_water_date_str = dt64_to_date_str(max_extent_time)
    min_water_date_str = dt64_to_date_str(min_extent_time)
    
# Option 2: Obtain the minimum and maximum water extents on a per-pixel basis.
else: 
    min_water_extent = water_da.min('time')
    max_water_extent = water_da.max('time')
    max_water_date_str = "N/A"
    min_water_date_str = "N/A"



In [19]:

    
from utils.data_cube_utilities.plotter_utils import create_discrete_color_map, xarray_imshow
from utils.data_cube_utilities.plotter_utils import figure_ratio

# Begin plotting the data
fig_width = 12 # Use this to change the size of this figure.
fig = plt.figure(figsize=figure_ratio(water_da, fixed_width=fig_width), dpi=std_dpi)

# Create common colormap.
cmap = create_discrete_color_map([0,1], ['black', 'cyan'])

imshow_kwargs = dict(vmin=0, vmax=1, cmap=cmap)

legend_labels = {0:'Not Water', 1:'Water'}

# Minimum water extent
ax = plt.subplot(1,2,1)
fig, ax, im, cbar = \
    xarray_imshow(min_water_extent, fig=fig, ax=ax, use_colorbar=False, use_legend=True, 
                  legend_labels=legend_labels, imshow_kwargs=imshow_kwargs, 
                  x_label_kwargs=axis_label_fmt_dict, y_label_kwargs=axis_label_fmt_dict,
                  legend_kwargs=legend_kwargs)
ax.set_title("Minimum Water Extent \nTime (YYYY/MM/DD): {}".format(min_water_date_str), fontdict=title_fontdict)
ax.tick_params(**tick_label_fmt_dict)

# Maximum water extent
ax = plt.subplot(1,2,2)
fig, ax, im, cbar = \
    xarray_imshow(max_water_extent, fig=fig, ax=ax, use_colorbar=False, use_legend=True, 
                  legend_labels=legend_labels, imshow_kwargs=imshow_kwargs,
                  x_label_kwargs=axis_label_fmt_dict, y_label_kwargs=axis_label_fmt_dict,
                  legend_kwargs=legend_kwargs)
ax.set_title("Maximum Water Extent \nTime (YYYY/MM/DD): {}".format(max_water_date_str), fontdict=title_fontdict)
ax.tick_params(**tick_label_fmt_dict)

plt.tight_layout()
plt.show()

Create Water Extent Image ▴



In [20]:

    
from utils.data_cube_utilities.plotter_utils import binary_class_change_plot

fig_width = 6 # Use this to change the size of this figure.
fig = plt.figure(figsize=figure_ratio(water_da, fixed_width=fig_width), dpi=std_dpi)

# Shade 3 regions - never, sometimes, and always water.
(fig,ax), [never_sometimes_always_stats] = \
    binary_class_change_plot(
        [water_da==1], [clean_mask], colors=['black', 'yellow', 'cyan'], 
        class_legend_label='Water', fig=fig)
plt.show()



In [21]:

    
index = list(map(lambda str: str.format('Water'),
                  ['Never {}', 'Sometimes {}', 'Always {}', 
                   'Max {}', 'Min {}', 'No Data']))

num_table_rows = len(index)

stats_table_single = pd.DataFrame(data=np.zeros((num_table_rows, 4)),
                                  index=index, columns=['Date (YYYY/MM/DD)', 'Number', 'Percent', 'Area (km^2)'])

## Date
# There are the dates for the rows.
dates = np.array(['', '', '', max_water_date_str, min_water_date_str, ''])
stats_table_single.loc[:,'Date (YYYY/MM/DD)'] = dates

## Number
class_sums = np.array([never_sometimes_always_stats.loc['Never Water'].Number, 
                       never_sometimes_always_stats.loc['Sometimes Water'].Number,
                       never_sometimes_always_stats.loc['Always Water'].Number, 
                       max_water_extent.sum().values, 
                       min_water_extent.sum().values, 
                       never_sometimes_always_stats.loc['Unknown'].Number])
stats_table_single.loc[:, 'Number'] = class_sums

## Area
# Show pixel changes and calculate area using pixel resolution
prod_info = dc.list_products()
deg_per_px = prod_info[prod_info['name'] == products_single[0]]['resolution'].values[0]
deg_per_px = np.abs(deg_per_px)
# Roughly 111km per degree of latitude and longitude.
km_per_px = 111 * deg_per_px
# Calculate the total area.
sq_km_per_px = np.prod(km_per_px)
stats_table_single.loc[:, 'Area (km^2)'] = stats_table_single.loc[:,'Number'] * sq_km_per_px

## Percent
stats_table_single.loc[:, 'Percent'] = \
    [never_sometimes_always_stats.loc['Never Water'].Percent, 
     never_sometimes_always_stats.loc['Sometimes Water'].Percent,
     never_sometimes_always_stats.loc['Always Water'].Percent, 
     max_water_extent.mean().values, 
     min_water_extent.mean().values, 
     never_sometimes_always_stats.loc['Unknown'].Percent]
stats_table_single_fmt = stats_table_single.copy()
stats_table_single_fmt.loc[:, 'Percent'] = [f"{pct:0.2%}" for pct in stats_table_single.loc[:, 'Percent']]
stats_table_single_fmt









    Out[21]:







  
    
      
      Date (YYYY/MM/DD)
      Number
      Percent
      Area (km^2)
    
  
  
    
      Never Water
      
      37343.0
      74.76%
      35.501783
    
    
      Sometimes Water
      
      7359.0
      14.73%
      6.996160
    
    
      Always Water
      
      5248.0
      10.51%
      4.989244
    
    
      Max Water
      2014/01/14
      12554.0
      25.13%
      11.935018
    
    
      Min Water
      2014/12/26
      6930.0
      13.87%
      6.588313
    
    
      No Data
      
      0.0
      0.00%
      0.000000

Create a Time Series Plot of the Water ▴



In [22]:

    
from utils.data_cube_utilities.plotter_utils import xarray_time_series_plot
from utils.data_cube_utilities.plotter_utils import np_dt64_to_str

total_area = len(water_da.latitude) * len(water_da.longitude) * sq_km_per_px

figsize = (8, 4) # The width and height of the figure, respectively.
fig = plt.figure(figsize=figsize, dpi=std_dpi)

(water_da.sum(['latitude', 'longitude']) * sq_km_per_px).plot.line(marker='o')
plt.title('Water Area Over Time')
plt.ylabel('Area (km^2)')
plt.xlabel('Time')
plt.show()

Compare Two Time Periods - a Baseline and an Analysis ▴

Define the Extents of the Analysis ▴

Specify start and end dates



In [23]:

    
# Select a subset of the time available.
time_extents_two = {'baseline': [dt.datetime(2002,1,1), dt.datetime(2002,12,31)],
                    'analysis': [dt.datetime(2014,1,1), dt.datetime(2014,12,31)]}

Visualize the selected area



In [24]:

    
display_map(lat, lon)









    Out[24]:

Retrieve the Data from the Datacube ▴



In [25]:

    
from utils.data_cube_utilities.dc_mosaic import create_median_mosaic

datasets = {} # Maps categories to cleaned datasets.
clean_masks = {} # Maps categories to clean masks.
datasets_median_composites = {}
for category, time_extents in list(time_extents_two.items()):
    # Use LANDSAT_7 for the baseline because its SLC was broken afterward (2003).
    if category == 'baseline':
        platform = platforms_two[0]
        product = products_two[0]
    else: 
        platform = platforms_two[1]
        product = products_two[1]
    datasets[category], clean_masks[category] = \
        load_for_time_range([platform], [product], time_extents)
    datasets[category] = datasets[category].persist()
    clean_masks[category] = clean_masks[category].persist()
    datasets_median_composites[category] = \
        create_median_mosaic(datasets[category], clean_masks[category]).persist()

Get Water Classifications Using the WOFS Algorithm ▴



In [26]:

    
# Get WOFS classifications.
water_das = {}
for category in datasets:
    water_da = \
        get_water_classifications(datasets[category], clean_masks[category], 
                                  max_num_filter_runs, 
                                  denoise, std_denoise_params)
    water_das[category] = water_da



In [27]:

    
# Resample the data in time to avoid cluttering the x axes of plots 
# with tick labels (dates) and also smooth the plots.
for category in datasets:
    datasets[category] = datasets[category].resample(time='2d').mean('time').dropna('time', how='all').persist()
    clean_masks[category] = (clean_masks[category].resample(time='2d').mean('time').dropna('time', how='all') >= 0.5).persist()
    water_das[category] = water_das[category].resample(time='2d').mean('time').dropna('time', how='all').persist()

Show Water Extents of the Baseline and Analysis Periods ▴

Show the median composite for each period.
The results are displayed using a false color RGB where water is dark blue



In [28]:

    
from utils.data_cube_utilities.plotter_utils import figure_ratio

## Settings ##

# Formatting for full-width plots.
title_fontdict = dict(fontsize=16) # Title formatting
tick_label_fmt_dict = dict(axis='both', labelsize=12) # Tick label formatting
axis_label_fmt_dict = dict(fontsize=16) # Axis label formatting
legend_kwargs = dict(fontsize=12)
std_dpi = 200 # Standard image dpi.

def std_figsize(fig_width): 
    return figure_ratio(datasets['baseline'].isel(time=0), fixed_width=fig_width)

# This is the name of the band used as the 
# graysale background when showing water extents.
background_band = 'swir1'

## End Settings ##



In [29]:

    
from utils.data_cube_utilities.dc_rgb import rgb

fig_width = 12 # Use this to change the size of this figure.
fig, ax = plt.subplots(1, 2, figsize=std_figsize(fig_width))

for i, category in enumerate(water_das):
    rgb(datasets_median_composites[category], fig=fig, ax=ax[i], 
        bands=['swir2', 'nir', 'green'], min_possible=0, max_possible=3000)
    ax[i].set_title(category.capitalize())
plt.tight_layout()
plt.show()

Show which pixels are sometime water or always water for each period, with a gray-scale background.
Warning: These results can be impacted by cloud contamination. The identification of clouds in Landsat
scenes is not perfect, so errors in cloud identification can impact time series results



In [30]:

    
from utils.data_cube_utilities.plotter_utils import binary_class_change_plot

fig_width = 12 # Use this to change the size of this figure.
fig, ax = plt.subplots(1, 2, figsize=std_figsize(fig_width))

# Collect stats tables for each period so we can later create a change matrix.
stats_tables = {}

for i, category in enumerate(water_das):
    # Show water in blue over a grey-scale background.
    rgb(datasets_median_composites[category], bands=[background_band]*3, 
        ax=ax[i], use_data_min=True, use_data_max=True)
    water_composite = water_das[category].where(clean_masks[category], 0).mean('time')
    [fig, ax[i]], [stats_tables[category]] = \
    binary_class_change_plot(
        [water_das[category]==1],
        [clean_masks[category]],
        colors=['gray', 'yellow', 'blue'], 
        neg_trans=True,
        class_legend_label='Water', fig=fig, ax=ax[i], 
        title_kwargs=dict(label='Water ' + category.capitalize(), fontdict=title_fontdict),
        x_label_kwargs=axis_label_fmt_dict, y_label_kwargs=axis_label_fmt_dict, 
        legend_kwargs=legend_kwargs, 
        denoise=denoise, denoise_params=std_denoise_params)
plt.tight_layout()
plt.show()

Show and Analyze the Differences Between the Two Time Periods ▴

Compare the water extents of the cloud-filtered mosaics of the baseline and analysis time periods



In [31]:

    
fig_width = 10 # Use this to change the size of this figure.
fig, ax = plt.subplots(figsize=std_figsize(fig_width))

# Show water change over a grey-scale background.
rgb(datasets_median_composites['baseline'], bands=[background_band]*3, 
    ax=ax, use_data_min=True, use_data_max=True)
# Show regions based on changes between having zero or more than zero times 
# in which they are classified as water between the two time periods.
[fig, ax], [stats_table_two, change_matrix] = \
    binary_class_change_plot(
        [water_das['baseline']==1, water_das['analysis']==1],
        [clean_masks['baseline'], clean_masks['analysis']],
        colors=['gray', 'green', 'red', 'blue'], 
        neg_trans=True,
        class_legend_label='Water', fig=fig, ax=ax, title_kwargs=dict(fontdict=title_fontdict),
        x_label_kwargs=axis_label_fmt_dict, y_label_kwargs=axis_label_fmt_dict, 
        legend_kwargs=legend_kwargs, 
        denoise=denoise, denoise_params=std_denoise_params)
ax.tick_params(**tick_label_fmt_dict)
plt.show()

Change Statistics



In [32]:

    
# Add area to the stats table.
# 1. Get resolution. In this case, we know the x and y coordinates are longitude and latitude
#    and that the products all have the same resolution.
prod_info = dc.list_products()
deg_per_px = prod_info[prod_info['name'] == all_products[0]]['resolution'].values[0]
deg_per_px = np.abs(deg_per_px)
# Roughly 111km per degree of latitude and longitude.
km_per_px = 111 * deg_per_px
# 2. Calculate the total area
length_in_km = km_per_px * np.array([len(water_das['baseline'].latitude),
                                         len(water_das['baseline'].longitude)])
total_area_in_km = np.prod(length_in_km)
# 3. Compute the area for each row of the stats table and display the table.
stats_table_two.loc[:, 'Area (km^2)'] = stats_table_two.loc[:,'Percent'] * total_area_in_km

# Format percent as strings.
stats_table_two_fmt = stats_table_two.copy()
stats_table_two_fmt.loc[:, 'Percent'] = [f"{pct:0.2%}" for pct in stats_table_two.loc[:, 'Percent']]
stats_table_two_fmt.loc[:, 'Area (km^2)'] = [f"{area:.3f}" for area in stats_table_two.loc[:, 'Area (km^2)']]
stats_table_two_fmt









    Out[32]:







  
    
      
      Number
      Percent
      Area (km^2)
    
  
  
    
      No Water to No Water
      36258
      72.59%
      34.470
    
    
      No Water to Water
      56
      0.11%
      0.053
    
    
      Water to No Water
      1085
      2.17%
      1.032
    
    
      Water to Water
      12551
      25.13%
      11.932
    
    
      Net Change
      1141
      2.28%
      1.085
    
    
      Unknown
      0
      0.00%
      0.000

Change Matrix



In [33]:

    
from utils.data_cube_utilities.dc_utilities import reverse_array_dict
from utils.data_cube_utilities.plotter_utils import create_discrete_color_map
from utils.data_cube_utilities.plotter_utils import print_matrix

# Create a colormap for coloring the cells in the change matrix.
# The colors are, in order, gray, red, green, yellow, and blue. Many are manually specified
# as rgb lists because their matplotlib named colors are fairly bright and saturated.
change_matrix_cmap = \
    create_discrete_color_map([0,4], colors=['gray', [225,16,16], [16,128,16], [235,235,0], [32,32,192]])

# This is a matrix specifying each class transition based on the color it should recieve.
# Original classes are indexed by row and final classes are indexed by column.
cng_vals = [0, 1, 2, 3, 4]
cng_mat = np.array([
    # Always Sometimes Never
    [     4,      1,      1], # Always
    [     2,      3,      1], # Sometimes 
    [     2,      2,      0], # Never   
])

# Create mappings of transitions to values for coloring based on a colormap.
cls_trans_for_value = {cng_val:[] for cng_val in cng_vals}
for i, orig_class in enumerate(change_matrix.baseline.values):
    for j, final_class in enumerate(change_matrix.analysis.values):
        cng_val = cng_mat[i,j]
        cls_trans_for_value[cng_val].append((orig_class, final_class))
value_for_cls_trans = reverse_array_dict(cls_trans_for_value)

# Create the cell value matrix (used to color cells).
cell_value_mtx = np.empty_like(change_matrix.Percent.values)
for i, cls_label1 in enumerate(change_matrix.baseline.values):
    for j, cls_label2 in enumerate(change_matrix.analysis.values):
        cell_value_mtx[i,j] = value_for_cls_trans[(cls_label1, cls_label2)]

cell_label_mtx = np.full_like(cell_value_mtx, '', dtype=object)
for i, baseline_category in enumerate(change_matrix['baseline']):
    for j, analysis_category in enumerate(change_matrix['analysis']):
        cell_label_mtx[i,j] = "{0:.2%}".format(change_matrix.Percent.values[i,j])

row_labels = [label.capitalize() for label in change_matrix.baseline.values]
col_labels = [label.capitalize() for label in change_matrix.analysis.values]

fig = plt.figure(figsize=(6,6))
fig, ax = print_matrix(cell_value_mtx, cell_val_fmt='s',
                       cell_label_mtx=cell_label_mtx, cmap=change_matrix_cmap,
                       row_labels=row_labels, col_labels=col_labels, 
                       x_axis_ticks_position='top', x_axis_tick_kwargs={},
                       annot_kwargs=dict(size=14), fig=fig)
ax.yaxis.set_label_position('left')
plt.ylabel('Baseline Class', fontsize=18)
ax.xaxis.set_label_position('top')
plt.xlabel('Analysis Class', fontsize=18)
plt.show()

Calculate the area of each of the 9 transition classes in the change matrix.



In [34]:

    
# 1. Get resolution. In this case, we know the x and y coordinates are longitude and latitude
#    and that the products all have the same resolution.
prod_info = dc.list_products()
deg_per_px = prod_info[prod_info['name'] == all_products[0]]['resolution'].values[0]
deg_per_px = np.abs(deg_per_px)
# Roughly 111km per degree of latitude and longitude.
meters_per_px = 111000 * deg_per_px

change_matrix['Area (m^2)'] = change_matrix.Number * np.prod(meters_per_px)
cng_mtx_as_table = change_matrix.to_dataframe().reset_index()
baseline_analysis_cls_strs = cng_mtx_as_table[['baseline', 'analysis']].values
cng_mtx_as_table.drop(columns=['analysis', 'baseline'], inplace=True)
cng_mtx_as_table.index = ["{} to {}".format(base_cls, analysis_cls) for base_cls, analysis_cls 
                            in baseline_analysis_cls_strs]
cng_mtx_as_table.style.format({'percent': "{:.2%}"})









    Out[34]:




                    Number         Percent         Area (m^2)     

                
                        always to always
                        5107
                        0.102242
                        4855196.527855
            
            
                        sometimes to always
                        141
                        0.002823
                        134047.916669
            
            
                        never to always
                        0
                        0.000000
                        0.000000
            
            
                        always to sometimes
                        4598
                        0.092052
                        4371293.055625
            
            
                        sometimes to sometimes
                        2705
                        0.054154
                        2571628.472263
            
            
                        never to sometimes
                        56
                        0.001121
                        53238.888890
            
            
                        always to never
                        0
                        0.000000
                        0.000000
            
            
                        sometimes to never
                        1085
                        0.021722
                        1031503.472239
            
            
                        never to never
                        36258
                        0.725886
                        34470279.167218

Classify the Area by Degree of Change ▴



In [35]:

    
# Calculate the net change of water from the baseline to the analysis time periods.
starting_water_num = stats_table_two.loc[['Water to No Water', 'Water to Water'], ['Number']].sum()[0]
net_change_water_num = (stats_table_two.loc['No Water to Water', 'Number'] - \
                        stats_table_two.loc['Water to No Water', 'Number'])
net_change_water_pct = net_change_water_num / starting_water_num
print("The percent of change in the water extent from the baseline time period " \
      "to the analysis time period (relative change from the baseline) "\
      "is {0:.2%}.\n".format(net_change_water_pct))

# Classify the area by degree of change.
# Transitions from never water to some/always water (and vice versa) are considered changes.
frac_change_cls = stats_table_two.loc[['No Water to Water', 'Water to No Water'], ['Percent']].sum()[0]

region_class = None
frac_change_cls = abs(frac_change_cls)
if frac_change_cls <= 0.1:
    region_class = "Unmodified Natural (class A)"
if 0.1 < frac_change_cls <= 0.2:
    region_class = "Largely Natural (class B)"
if 0.2 < frac_change_cls <= 0.4:
    region_class = "Moderately Modified (class C)"
if 0.4 < frac_change_cls <= 0.6:
    region_class = "Largely Modified (class D)"
if 0.6 < frac_change_cls:
    region_class = "Seriously Modified (class E)"
print("The percent of pixels that are different water classes between the baseline and analysis time periods " \
      "is {0:.2%}, so this region can be classified as {1}".format(frac_change_cls, region_class))









    



The percent of change in the water extent from the baseline time period to the analysis time period (relative change from the baseline) is -7.55%.

The percent of pixels that are different water classes between the baseline and analysis time periods is 2.28%, so this region can be classified as Unmodified Natural (class A)

	Date (YYYY/MM/DD)	Number	Percent	Area (km^2)
Never Water		37343.0	74.76%	35.501783
Sometimes Water		7359.0	14.73%	6.996160
Always Water		5248.0	10.51%	4.989244
Max Water	2014/01/14	12554.0	25.13%	11.935018
Min Water	2014/12/26	6930.0	13.87%	6.588313
No Data		0.0	0.00%	0.000000

	Number	Percent	Area (km^2)
No Water to No Water	36258	72.59%	34.470
No Water to Water	56	0.11%	0.053
Water to No Water	1085	2.17%	1.032
Water to Water	12551	25.13%	11.932
Net Change	1141	2.28%	1.085
Unknown	0	0.00%	0.000

	Number	Percent	Area (m^2)
always to always	5107	0.102242	4855196.527855
sometimes to always	141	0.002823	134047.916669
never to always	0	0.000000	0.000000
always to sometimes	4598	0.092052	4371293.055625
sometimes to sometimes	2705	0.054154	2571628.472263
never to sometimes	56	0.001121	53238.888890
always to never	0	0.000000	0.000000
sometimes to never	1085	0.021722	1031503.472239
never to never	36258	0.725886	34470279.167218

UN SDG Indicator 6.6.1: Change in the Extent of Water-related Ecosystems

Notebook Summary

Index

Import Dependencies and Connect to the Data Cube ▴

Choose Platforms and Products ▴

Get the Extents of the Cube ▴

Analyze a Single Time Period ▴

Define the Extents of the Analysis ▴

Retrieve the Data from the Datacube ▴

Get Water Classifications Using the WOFS Algorithm ▴

Prepare for Visualization ▴

Find the Minimum and Maximum Water Extents ▴

Create Water Extent Image ▴

Create a Time Series Plot of the Water ▴

Compare Two Time Periods - a Baseline and an Analysis ▴

Define the Extents of the Analysis ▴

Retrieve the Data from the Datacube ▴

Get Water Classifications Using the WOFS Algorithm ▴

Show Water Extents of the Baseline and Analysis Periods ▴

Show and Analyze the Differences Between the Two Time Periods ▴

Classify the Area by Degree of Change ▴

UN SDG Indicator 6.6.1:
Change in the Extent of Water-related Ecosystems