CEOS Data Cube - Water Analysis Notebook

Description: This Python notebook allows users to directly interact with a CEOS-formatted data cube to perform analyses for water management. The following steps will allow users to connect to a data cube, define the analysis location and time period (extent of latitude/longitude and dates), and then run the Australian Water Observations from Space (WOFS) algorithm. The outputs of the WOFS algorithm include static and time series pixel-level water observations for any pixel. These results provide critical information for water management that will allow users to assess water cycle dynamics, historical water extent and the risk of floods and droughts. Future versions may consider the addition of water quality parameters (e.g. Total Suspended Matter, Chlorophyll-A, CDOM), coastal erosion analyses and in-situ precipitation and surface temperature data.

Sensors: Landsat 8, Landsat 7 (lasrc)

Import necessary Data Cube libraries and dependencies.



In [1]:

    
%matplotlib inline

from datetime import datetime
import numpy as np

import datacube
from utils.data_cube_utilities.dc_water_classifier import wofs_classify
from utils.data_cube_utilities.dc_utilities import perform_timeseries_analysis, create_cfmask_clean_mask
import dc_au_colormaps
import math

from utils.data_cube_utilities.dc_display_map import display_map
from dc_notebook_utilities import create_platform_product_gui, generate_metadata_report, show_map_extents, create_extents_gui, create_acq_date_gui

# supress warnings
import warnings
warnings.filterwarnings('ignore')

First, we must connect to our data cube. We can then query the contents of the data cube we have connected to, including both the metadata and the actual data.



In [2]:

    
# Create an instance of the datacube and API.
from utils.data_cube_utilities.data_access_api import DataAccessApi
api = DataAccessApi()
dc = api.dc

Obtain the metadata of our cube... Initially, we need to get the platforms and products in the cube. The rest of the metadata will be dependent on these two options.

List available products for each platform



In [3]:

    
# Get available products
products_info = dc.list_products()



In [4]:

    
# List LANDSAT 7 products
print("LANDSAT 7 Products:")
products_info[["platform", "name"]][products_info.platform == "LANDSAT_7"]









    



LANDSAT 7 Products:






    Out[4]:







  
    
      
      platform
      name
    
    
      id
      
      
    
  
  
    
      2
      LANDSAT_7
      ls7_usgs_sr_scene



In [5]:

    
# List LANDSAT 8 products
print("LANDSAT 8 Products:")
products_info[["platform", "name"]][products_info.platform == "LANDSAT_8"]









    



LANDSAT 8 Products:






    Out[5]:







  
    
      
      platform
      name
    
    
      id
      
      
    
  
  
    
      1
      LANDSAT_8
      ls8_usgs_sr_scene

Choose a platform and product from those listed above.



In [6]:

    
platform = 'LANDSAT_7'
product = 'ls7_usgs_sr_scene'

With the platform and product, we can get the rest of the metadata. This includes the resolution of a pixel, the latitude/longitude extents, and the minimum and maximum dates available of the chosen platform/product combination.



In [7]:

    
extents = api.get_query_metadata(product=product, platform=platform, measurements=[])



In [8]:

    
from utils.data_cube_utilities.dc_time import dt_to_str

latitude_extents = extents['lat_extents']
longitude_extents = extents['lon_extents']
time_extents = extents['time_extents']
print("Available Latitude Extents:", latitude_extents)
print("Available Longitude Extents:", longitude_extents)
print("Available Time Extents:", np.vectorize(dt_to_str)(time_extents))









    



Available Latitude Extents: (-12.57305555565614, 18.32305555570214)
Available Longitude Extents: (-25.66583333353866, 44.05861111146359)
Available Time Extents: ['1999-07-08' '2020-01-10']



In [9]:

    
display_map(latitude=latitude_extents,longitude=longitude_extents)









    Out[9]:

Now choose extents that fall within the above bounding box.



In [10]:

    
# Lake Chad
# start_date = "1990-01-01"
# end_date = "1995-12-31"
# min_lon_small, max_lon_small = (14.244, 14.493)
# min_lat_small, max_lat_small = (12.737, 12.987)

# Southern Ghana (Land Change - #3 - Accra - small)
start_date = "2000-01-01"
end_date = "2000-12-31"
min_lat_small, max_lat_small = (5.3496, 5.5496) 
min_lon_small, max_lon_small = (-0.5622, -0.3221)

lon_small = (min_lon_small, max_lon_small)
lat_small = (min_lat_small, max_lat_small)
display_map(lat_small, lon_small)









    Out[10]:

Now that we have filled out the above two forms, we have enough information to query our data cube. The following code snippet ends with the actual Data Cube query, which will return the dataset with all the data matching our query.



In [11]:

    
params = dict(platform=platform,
              product=product,
              time=(start_date, end_date),
              lon=lon_small, 
              lat=lat_small,
              measurements=['blue', 'green', 'red', 'nir', 'swir1', 'swir2', 'pixel_qa'])
# Query the Data Cube
dataset_in = dc.load(**params)

At this point, we have finished accessing our data cube and we can turn to analyzing our data. In this example, we will run the WOfS algorithm. The wofs_classify function, seen below, will return a modified dataset, where a value of 1 indicates the pixel has been classified as water by the WoFS algorithm and 0 represents the pixel is non-water.

For more information on the WOfS algorithm, refer to:

Mueller, et al. (2015) "Water observations from space: Mapping surface water from 25 years of Landsat imagery across Australia." Remote Sensing of Environment.



In [12]:

    
from utils.data_cube_utilities.clean_mask import landsat_qa_clean_mask
clean_mask = landsat_qa_clean_mask(dataset_in, platform)
water_class = wofs_classify(dataset_in, clean_mask=clean_mask.values)

Execute the following code and then use the generated form to choose your desired acquisition date. The following two code blocks are only necessary if you would like to see the water mask of a single acquisition date.



In [13]:

    
acq_dates = list(water_class.time.values.astype(str))
acq_dates = list(map(lambda x:x.split("T")[0],acq_dates))
acq_date_input = create_acq_date_gui(acq_dates)



In [14]:

    
# Save form value
acq_date = acq_date_input.value
acq_date_index = acq_dates.index(acq_date)

# Get water class for selected acquisition date and mask no data values
water_class_for_acq_date = water_class.wofs[acq_date_index]
water_class_for_acq_date.values = water_class_for_acq_date.values.astype('float')
water_class_for_acq_date.values[water_class_for_acq_date.values == -9999] = np.nan

water_observations_for_acq_date_plot = water_class_for_acq_date.plot(cmap='BuPu')

With all of the pixels classified as either water/non-water, let's perform a time series analysis over our derived water class. The function, perform_timeseries_analysis, takes in a dataset of 3 dimensions (time, latitude, and longitude), then sums the values of each pixel over time. It also keeps track of the number of clear observations we have at each pixel. We can then normalize each pixel to determine areas at risk of flooding. The normalization calculation is simply:

$$normalized\_water\_observations = \dfrac{total\_water\_observations}{total\_clear\_observations}$$

The output each of the three calculations can be seen below.



In [15]:

    
time_series = perform_timeseries_analysis(water_class, 'wofs')

The following plots visualize the results of our timeseries analysis. You may change the color scales with the cmap option. For color scales available for use by cmap, see http://matplotlib.org/examples/color/colormaps_reference.html. You can also define discrete color scales by using the levels and colors. For example:

normalized_water_observations_plot = normalized_water_observations.plot(levels=3, colors=['#E5E5FF', '#4C4CFF', '#0000FF'])
normalized_water_observations_plot = normalized_water_observations.plot(levels=[0.00, 0.50, 1.01], colors=['#E5E5FF', '#0000FF'])

For more examples on how you can modify plots, see http://xarray.pydata.org/en/stable/plotting.html.



In [16]:

    
normalized_water_observations_plot = time_series.normalized_data.plot(cmap='dc_au_WaterSummary')



In [17]:

    
total_water_observations_plot = time_series.total_data.plot(cmap='dc_au_WaterObservations')



In [18]:

    
total_clear_observations_plot = time_series.total_clean.plot(cmap='dc_au_ClearObservations')