CEOS Data Cube - WOFS Classification on Combined Data Sources


Description: This Python notebook allows users to directly interact with a CEOS-formatted data cube to perform analyses for water management. The following steps will allow users to connect to a data cube, define the analysis location and time period (extent of latitude/longitude and dates), and then run the Australian Water Observations from Space (WOFS) algorithm. The outputs of the WOFS algorithm include static and time series pixel-level water observations for any pixel. These results provide critical information for water management that will allow users to assess water cycle dynamics, historical water extent and the risk of floods and droughts. In this notebook, Landsat 5, 7, and 8 data will be used to create the output product. Future versions may consider the addition of water quality parameters (e.g. Total Suspended Matter, Chlorophyll-A, CDOM), coastal erosion analyses and in-situ precipitation and surface temperature data.


Import necessary Data Cube libraries and dependencies.


In [1]:
%matplotlib inline

import matplotlib.pyplot as plt

from datetime import datetime, timedelta
import numpy as np

import datacube
from utils.data_cube_utilities.dc_water_classifier import wofs_classify
from utils.data_cube_utilities.dc_utilities import perform_timeseries_analysis
from utils.data_cube_utilities.dc_display_map import display_map
import dc_au_colormaps

import xarray as xr

from dc_notebook_utilities import create_acq_date_gui

# supress warnings
import warnings
warnings.filterwarnings('ignore')

First, we must connect to our data cube. We can then query the contents of the data cube we have connected to, including both the metadata and the actual data.


In [2]:
from utils.data_cube_utilities.data_access_api import DataAccessApi
api = DataAccessApi()
dc = api.dc

Obtain the metadata of our cube... Initially, we need to get the platforms and products in the cube. The rest of the metadata will be dependent on these two options. We currently only have multisensor data for our Lake Chad region, so the metadata is generated below.

List available products for each platform


In [3]:
# Get available products
products_info = dc.list_products()

In [4]:
# List LANDSAT 7 products
print("LANDSAT 7 Products:")
products_info[["platform", "name"]][products_info.platform == "LANDSAT_7"]


LANDSAT 7 Products:
Out[4]:
platform name
id
2 LANDSAT_7 ls7_usgs_sr_scene

In [5]:
# List LANDSAT 8 products
print("LANDSAT 8 Products:")
products_info[["platform", "name"]][products_info.platform == "LANDSAT_8"]


LANDSAT 8 Products:
Out[5]:
platform name
id
1 LANDSAT_8 ls8_usgs_sr_scene

In [6]:
platforms = ['LANDSAT_7', 'LANDSAT_8']
products = ['ls7_usgs_sr_scene', 'ls8_usgs_sr_scene']

With the platform and product, we can get the rest of the metadata. This includes the resolution of a pixel, the latitude/longitude extents, and the minimum and maximum dates available of the chosen platform/product combination.


In [7]:
from utils.data_cube_utilities.dc_load import get_overlapping_area

full_lat, full_lon, min_max_dates = get_overlapping_area(api, platforms, products)

In [8]:
from utils.data_cube_utilities.dc_time import dt_to_str
print("Available Latitude Extents:", full_lat)
print("Available Longitude Extents:", full_lon)
print("Available Time Extents:", np.vectorize(dt_to_str)(min_max_dates))


Available Latitude Extents: (-12.57305555565614, 18.32305555570214)
Available Longitude Extents: (-25.47250000020378, 44.01000000035208)
Available Time Extents: [['1999-07-08' '2020-01-10']
 ['2013-03-21' '2020-01-27']]

In [9]:
display_map(latitude=full_lat, longitude=full_lon)


Out[9]:

In [10]:
start_dates = ["2000-01-01","2015-01-01"]
end_dates = ["2000-12-31", "2015-12-31"]

# Weija Reservoir, Ghana
lon_small = (-0.410, -0.330)
lat_small = (5.545, 5.620)

# Part of Lake Volta
lon_small = (-0.0847, -0.0545)
lat_small = (6.7616, 6.8312)

display_map(lat_small, lon_small)


Out[10]:

The entire dataset for all of the defined products and platforms are loaded below.

They are loaded seperately and then combined over the time axis of the data. Using the full time extent, you will see that there are over 400 seperate acquisitions.


In [11]:
dc.list_products()


Out[11]:
name description lon platform instrument time product_type creation_time format label lat crs resolution tile_size spatial_dimensions
id
2 ls7_usgs_sr_scene Landsat 7 USGS Collection 1 Level2 Surface Ref... None LANDSAT_7 ETM None LEDAPS None GeoTiff None None EPSG:4326 (-0.00027777777778, 0.00027777777778) None (latitude, longitude)
1 ls8_usgs_sr_scene Landsat 8 USGS Collection 1 Level2 Surface Ref... None LANDSAT_8 OLI_TIRS None LaSRC None GeoTiff None None EPSG:4326 (-0.00027777777778, 0.00027777777778) None (latitude, longitude)

At this point, we must access our data cube and analyze our data. In this example, we will run the WOFS algorithm. The wofs_classify function, seen below, will return a modified dataset, where a value of 1 indicates the pixel has been classified as water by the WoFS algorithm and 0 represents the pixel is non-water. You will see that this output dataset is of the same dimensions as the input dataset.


For more information on the WOFS algorithm, refer to:

Mueller, et al. (2015) "Water observations from space: Mapping surface water from 25 years of Landsat imagery across Australia." Remote Sensing of Environment.


In [12]:
from utils.data_cube_utilities.clean_mask import landsat_qa_clean_mask
wofs_classifications = []
measurements = ['blue', 'green', 'red', 'nir', 'swir1', 'swir2', 'pixel_qa']

for i, (product,platform) in enumerate(zip(products, platforms)):
    start_date = start_dates[i]
    end_date = end_dates[i]
    # Query the Data Cube
    dataset_in = dc.load(platform  = platform,
                         product   = product,
                         time      = (start_date, end_date),
                         lon       = lon_small, 
                         lat       = lat_small,
                         measurements=measurements)
    if len(dataset_in.dims) == 0: # The dataset is empty.
        continue
    # WOFS classification
    clean_mask = landsat_qa_clean_mask(dataset_in, platform)
    water_class = wofs_classify(dataset_in, clean_mask=clean_mask.values)
    wofs_classifications.append(water_class.copy(deep=True))
complete_dataset = None
if len(wofs_classifications) != 0:
    complete_dataset = xr.concat(wofs_classifications, 'time')
else:
    complete_dataset = xr.Dataset()

Execute the following code and then use the generated form to choose your desired acquisition date. The following two code blocks are only necessary if you would like to see the water mask of a single acquisition date.


In [13]:
acq_dates = list(water_class.time.values.astype(str))
acq_dates = list(map(lambda x:x.split("T")[0],acq_dates))
acq_date_input = create_acq_date_gui(acq_dates)



In [14]:
# Save form value
acq_date = acq_date_input.value
acq_date_index = acq_dates.index(acq_date)

# Get water class for selected acquisition date and mask no data values
water_class_for_acq_date = water_class.wofs[acq_date_index]
water_class_for_acq_date.values = water_class_for_acq_date.values.astype('float')
water_class_for_acq_date.values[water_class_for_acq_date.values == -9999] = np.nan

if len(water_class_for_acq_date.values[water_class_for_acq_date.values > -9999]) > 0:
    water_observations_for_acq_date_plot = water_class_for_acq_date.plot(cmap='BuPu')


With all of the pixels classified as either water/non-water, let's perform a time series analysis over our derived water class. The function, perform_timeseries_analysis, takes in a dataset of 3 dimensions (time, latitude, and longitude), then sums the values of each pixel over time. It also keeps track of the number of clear observations we have at each pixel. We can then normalize each pixel to determine areas at risk of flooding. The normalization calculation is simply:

$$normalized\_water\_observations = \dfrac{total\_water\_observations}{total\_clear\_observations}$$

.

The output each of the three calculations can be seen below.


In [15]:
time_series = perform_timeseries_analysis(water_class, 'wofs')
time_series


Out[15]:
<xarray.Dataset>
Dimensions:          (latitude: 252, longitude: 109)
Coordinates:
  * latitude         (latitude) float64 6.831 6.831 6.831 ... 6.762 6.762 6.762
  * longitude        (longitude) float64 -0.08458 -0.08431 ... -0.05486 -0.05458
Data variables:
    normalized_data  (latitude, longitude) float64 0.0 0.0 0.0 ... 0.0 0.0 0.0
    min              (latitude, longitude) float64 0.0 0.0 0.0 ... 0.0 0.0 0.0
    max              (latitude, longitude) float64 0.0 0.0 0.0 ... 0.0 0.0 0.0
    total_data       (latitude, longitude) float64 0.0 0.0 0.0 ... 0.0 0.0 0.0
    total_clean      (latitude, longitude) int64 13 13 12 12 12 ... 11 11 11 11

In [16]:
normalized_water_observations_plot = time_series.normalized_data.plot(cmap='dc_au_WaterSummary')



In [17]:
total_water_observations_plot = time_series.total_data.plot(cmap='dc_au_WaterObservations')



In [18]:
total_clear_observations_plot = time_series.total_clean.plot(cmap='dc_au_ClearObservations')