CEOS Data Cube - Water Analysis Notebook


Description: This Python notebook allows users to directly interact with a CEOS-formatted data cube to perform analyses for water management. The following steps will allow users to connect to a data cube, define the analysis location and time period (extent of latitude/longitude and dates), and then run the Australian Water Observations from Space (WOFS) algorithm. The outputs of the WOFS algorithm include static and time series pixel-level water observations for any pixel. These results provide critical information for water management that will allow users to assess water cycle dynamics, historical water extent and the risk of floods and droughts. Future versions may consider the addition of water quality parameters (e.g. Total Suspended Matter, Chlorophyll-A, CDOM), coastal erosion analyses and in-situ precipitation and surface temperature data.


Import necessary Data Cube libraries and dependencies.


In [1]:
%matplotlib inline

from datetime import datetime
import numpy as np

import datacube
from dc_water_classifier import wofs_classify
from dc_utilities import perform_timeseries_analysis
import dc_au_colormaps

from dc_notebook_utilities import *

First, we must connect to our data cube. We can then query the contents of the data cube we have connected to, including both the metadata and the actual data.


In [2]:
dc = datacube.Datacube(app='dc-water-analysis')
api = datacube.api.API(datacube=dc)

Obtain the metadata of our cube... Initially, we need to get the platforms and products in the cube. The rest of the metadata will be dependent on these two options.


In [3]:
# Get available products
products = dc.list_products()
platform_names = list(set(products.platform))
product_names = list(products.name)

Execute the following code and then use the generated form to choose your desired platfrom and product.


In [4]:
product_values = create_platform_product_gui(platform_names, product_names)

With the platform and product, we can get the rest of the metadata. This includes the resolution of a pixel, the latitude/longitude extents, and the minimum and maximum dates available of the chosen platform/product combination.


In [5]:
# Save the form values
platform = product_values[0].value
product = product_values[1].value

# Get the pixel resolution of the selected product
resolution = products.resolution[products.name == product]
lat_dist = resolution.values[0][0]
lon_dist = resolution.values[0][1]

# Get the extents of the cube
descriptor = api.get_descriptor({'platform': platform})[product]

min_date = descriptor['result_min'][0]
min_lat = descriptor['result_min'][1]
min_lon = descriptor['result_min'][2]

min_date_str = str(min_date.year) + '-' + str(min_date.month) + '-' + str(min_date.day)

min_lat_rounded = round(min_lat, 3)
min_lon_rounded =  round(min_lon, 3)

max_date = descriptor['result_max'][0]
max_lat = descriptor['result_max'][1] 
max_lon = descriptor['result_max'][2] 

max_date_str = str(max_date.year) + '-' + str(max_date.month) + '-' + str(max_date.day)

max_lat_rounded = round(max_lat, 3) #calculates latitude of the pixel's center
max_lon_rounded = round(max_lon, 3) #calculates longitude of the pixel's center

# Display metadata
generate_metadata_report(min_date_str, max_date_str, 
                         min_lon_rounded, max_lon_rounded, lon_dist,
                         min_lat_rounded, max_lat_rounded, lat_dist)

show_map_extents(min_lon_rounded, max_lon_rounded, min_lat_rounded, max_lat_rounded)


Metadata Report:

MinMaxResolution
Date: 2005-1-82016-4-12
Longitude: 34.037.00.000269493
Latitude: -1.01.0-0.000269493

Execute the following code and then use the generated form to choose the extents of your desired data.


In [7]:
extent_values = create_extents_gui(min_date_str, max_date_str,
                                   min_lon_rounded, max_lon_rounded,
                                   min_lat_rounded, max_lat_rounded)

Now that we have filled out the above two forms, we have enough information to query our data cube. The following code snippet ends with the actual Data Cube query, which will return the dataset with all the data matching our query.


In [8]:
# Save form values
start_date = datetime.strptime(extent_values[0].value, '%Y-%m-%d')
end_date = datetime.strptime(extent_values[1].value, '%Y-%m-%d')
min_lon = extent_values[2].value
max_lon = extent_values[3].value
min_lat = extent_values[4].value
max_lat = extent_values[5].value

# Query the Data Cube
dataset_in = dc.load(platform=platform,
                     product=product,
                     time=(start_date, end_date),
                     lon=(min_lon, max_lon), 
                     lat=(min_lat, max_lat))

At this point, we have finished accessing our data cube and we can turn to analyzing our data. In this example, we will run the WOfS algorithm. The wofs_classify function, seen below, will return a modified dataset, where a value of 1 indicates the pixel has been classified as water by the WoFS algorithm and 0 represents the pixel is non-water.


For more information on the WOfS algorithm, refer to:

Mueller, et al. (2015) "Water observations from space: Mapping surface water from 25 years of Landsat imagery across Australia." Remote Sensing of Environment.


In [14]:
water_class = wofs_classify(dataset_in)

Execute the following code and then use the generated form to choose your desired acquisition date. The following two code blocks are only necessary if you would like to see the water mask of a single acquisition date.


In [15]:
acq_dates = list(water_class.time.values.astype(str))
acq_date_input = create_acq_date_gui(acq_dates)

In [16]:
# Save form value
acq_date = acq_date_input.value
acq_date_index = acq_dates.index(acq_date)

# Get water class for selected acquisition date and mask no data values
water_class_for_acq_date = water_class.wofs[acq_date_index]
water_class_for_acq_date.values = water_class_for_acq_date.values.astype('float')
water_class_for_acq_date.values[water_class_for_acq_date.values == -9999] = np.nan

water_observations_for_acq_date_plot = water_class_for_acq_date.plot(cmap='BuPu')


With all of the pixels classified as either water/non-water, let's perform a time series analysis over our derived water class. The function, perform_timeseries_analysis, takes in a dataset of 3 dimensions (time, latitude, and longitude), then sums the values of each pixel over time. It also keeps track of the number of clear observations we have at each pixel. We can then normalize each pixel to determine areas at risk of flooding. The normalization calculation is simply:

$$normalized\_water\_observations = \dfrac{total\_water\_observations}{total\_clear\_observations}$$

.

The output each of the three calculations can be seen below.


In [17]:
time_series = perform_timeseries_analysis(water_class)

The following plots visualize the results of our timeseries analysis. You may change the color scales with the cmap option. For color scales available for use by cmap, see http://matplotlib.org/examples/color/colormaps_reference.html. You can also define discrete color scales by using the levels and colors. For example:

  • normalized_water_observations_plot = normalized_water_observations.plot(levels=3, colors=['#E5E5FF', '#4C4CFF', '#0000FF'])
  • normalized_water_observations_plot = normalized_water_observations.plot(levels=[0.00, 0.50, 1.01], colors=['#E5E5FF', '#0000FF'])

For more examples on how you can modify plots, see http://xarray.pydata.org/en/stable/plotting.html.


In [18]:
normalized_water_observations_plot = time_series.normalized_data.plot(cmap='dc_au_WaterSummary')



In [19]:
total_water_observations_plot = time_series.total_data.plot(cmap='dc_au_WaterObservations')



In [20]:
total_clear_observations_plot = time_series.total_clean.plot(cmap='dc_au_ClearObservations')



In [ ]: