Cloud-Filtered Custom Mosaics


Notebook Summary

This notebook can be used to create custom Landsat cloud-filtered mosaics for any time period and location. The mosaics can be output as GeoTIFF products for analysis in external GIS tools. The following mosaics are possible:

Median = per-band median of the cloud-free observations through time
Geomedian = Geoscience Australia's geometric median product, with improved spectral consistency between bands
Most-Recent = most recent clear (cloud-free) pixel
Max-NDVI = pixel from the acquisition with the maximum vegetation response (NDVI)

Users should review the DCAL Cloud Statistics notebook to understand the cloud coverage for any given temporal and spatial combination. An understanding of the underlying data is important for creating a valid mosaic for further analyses. In many cases, cloud contamination will produce poor mosaics. With a careful review of the cloud coverage over a given region and time period, it is possible to improve the mosaics and avoid misleading outputs.


Index

  • Import Dependencies and Connect to the Data Cube
  • Choose Platforms and Products
  • Get the Extents of the Cube
  • Define the Analysis Parameters
  • Load and Clean Data from the Data Cube
  • Create Mosaics
  • Create GeoTIFF Output Products

Import Dependencies and Connect to the Data Cube


In [1]:
# Enable importing of utilities.
import sys
sys.path.append('..')

# Suppress warnings.
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import xarray as xr
import pandas as pd
import matplotlib.pyplot as plt
import bokeh
import holoviews as hv
import panel as pn

# Load Data Cube configuration.
import datacube
import utils.data_access_api as dc_api  
api = dc_api.DataAccessApi()
dc = api.dc

In [2]:
# Configure visualization libraries.
pn.extension()
# Use the bokeh backend for HoloViews.
hv.extension('bokeh')
bokeh.io.output_notebook()


Loading BokehJS ...

In [3]:
from utils.dask import create_local_dask_cluster

client = create_local_dask_cluster(spare_mem='3Gb')
client


Out[3]:

Client

Cluster

  • Workers: 3
  • Cores: 6
  • Memory: 13.64 GB

Choose Platforms and Products

<p style="color:red">CHANGE INPUTS BELOW</p>


In [4]:
def get_prods_for_platform(platform):
    prod_info = dc.list_products()
    return list(prod_info.loc[prod_info.platform == platform].name)

platforms = ["LANDSAT_7", "LANDSAT_8"]
platform_wgt = pn.widgets.Select(name='platform', value='LANDSAT_8', options=platforms)
prod_wgt = pn.widgets.Select(name='product', options=get_prods_for_platform(platform_wgt.value))

@pn.depends(platform_wgt.param.value, watch=True)
def update_prod_list(platform):
    prods = get_prods_for_platform(platform)
    prod_wgt.options = prods
    prod_wgt.value = prods[0] if len(prods) > 0 else None

pn.Row(platform_wgt, prod_wgt)


Out[4]:

Get the Extents of the Cube


In [5]:
from utils.dc_load import get_product_extents
from utils.dc_time import dt_to_str

full_lat, full_lon, min_max_dates = get_product_extents(api, platform_wgt.value, prod_wgt.value)

# Print the extents of the combined data.
print("Latitude Extents:", full_lat)
print("Longitude Extents:", full_lon)
print("Time Extents:", list(map(dt_to_str, min_max_dates)))


Latitude Extents: (-12.63361111121218, 18.40166666681388)
Longitude Extents: (-25.47250000020378, 44.01000000035208)
Time Extents: ['2013-03-21', '2020-01-27']

Visualize the available area


In [6]:
from utils.dc_display_map import display_map
display_map(full_lat, full_lon)


Out[6]:

Define the Analysis Parameters

<p style="color:red">CHANGE INPUTS BELOW</p>


In [7]:
from time import sleep
import datetime

# Select an analysis region (Lat-Lon) within the extents listed above. 
# HINT: Keep your region small (<0.5 deg square) to avoid memory overload issues
# Select a time period (Min-Max) within the extents

## Location, date, and spatial slider widgets ##

locations = {'Mombasa, Kenya': {'lat': (-4.1131, -3.9853), 'lon': (39.5445, 39.7420)},
             'Freetown, Sierra Leone': {'lat': (8.3267, 8.5123), 'lon': (-13.3109, -13.1197)},
             'Tano-Offin Forest - Ghana': {'lat': (6.5814, 6.8978), 'lon': (-2.2955, -1.9395)},
             'Custom': {}}
location_wgt = pn.widgets.Select(name='Location', value='Mombasa, Kenya', options=list(locations.keys()))
date_wgt = pn.widgets.DateRangeSlider(name='Time Range', start=min_max_dates[0], end=min_max_dates[1], 
                                      value=(datetime.datetime(2016,1,1), datetime.datetime(2016,12,31)))
lat_wgt = pn.widgets.RangeSlider(name='Latitude', start=full_lat[0], end=full_lat[1], step=0.1,
                                 value=locations[location_wgt.value]['lat'])
lon_wgt = pn.widgets.RangeSlider(name='Longitude', start=full_lon[0], end=full_lon[1], step=0.1,
                                 value=locations[location_wgt.value]['lon'])

# If true, denotes that changes in lat/lon are caused by a location widget change,
# not a change in one of the 4 float widgets (lat min, lat max, lon min, lon max).
location_changed = [False]
@pn.depends(location_wgt.param.value, watch=True)
def location_handler(location, location_changed=location_changed):
    # Update the lat/lon sliders with the values for the selected location.
    if location != 'Custom':
        location_changed[0] = True
        lat_wgt.value = locations[location].get('lat', lat_wgt.value)
        lon_wgt.value = locations[location].get('lon', lon_wgt.value)
        location_changed[0] = False

@pn.depends(lat_wgt.param.value, lon_wgt.param.value, watch=True)
def lat_lon_sliders_handler(lat, lon, location_changed=location_changed):
    sleep(0.01)
    # If the latitude or longitude are changed other than due to a newly
    # selected location (location widget), note that this is a custom location.
    if not location_changed[0]:
        location_wgt.value = 'Custom'

## End location, date, and spatial slider widgets ##

## Spatial extents float widgets ##

# The `panel.depends` decorator does not work with these widgets in Panel 0.8.3, so `param.watch` is used instead.
def set_lat_min_by_flt(evt):
    try:
        lat_min = float(evt.new)
        lat_wgt.value = (lat_min, lat_wgt.value[1])
    except (TypeError, ValueError):
        return
lat_min_flt_wgt = pn.widgets.LiteralInput(name='Latitude Min', value=lat_wgt.value[0], type=float)
lat_min_flt_wgt.param.watch(set_lat_min_by_flt, 'value')

def set_lat_max_by_flt(evt):
    try:
        lat_max = float(evt.new)
        lat_wgt.value = (lat_wgt.value[0], lat_max)
    except (TypeError, ValueError):
        return
lat_max_flt_wgt = pn.widgets.LiteralInput(name='Latitude Max', value=lat_wgt.value[1], type=float)
lat_max_flt_wgt.param.watch(set_lat_max_by_flt, 'value')

def set_lon_min_by_flt(evt):
    try:
        lon_min = float(evt.new)
        lon_wgt.value = (lon_min, lon_wgt.value[1])
    except (TypeError, ValueError):
        return
lon_min_flt_wgt = pn.widgets.LiteralInput(name='Longitude Min', value=lon_wgt.value[0], type=float)
lon_min_flt_wgt.param.watch(set_lon_min_by_flt, 'value')

def set_lon_max_by_flt(evt):
    try:
        lon_max = float(evt.new)
        lon_wgt.value = (lon_wgt.value[0], lon_max)
    except (TypeError, ValueError):
        return
lon_max_flt_wgt = pn.widgets.LiteralInput(name='Longitude Max', value=lon_wgt.value[1], type=float)
lon_max_flt_wgt.param.watch(set_lon_max_by_flt, 'value')

## End spatial extents float widgets ##
std_height = 50
std_width = 50
widgets_row_fmt = dict(width=6*std_width)
pn.Row(
    pn.WidgetBox(
        pn.Row(location_wgt, 
               **widgets_row_fmt, height=std_height), 
        pn.Row(date_wgt, 
               **widgets_row_fmt, height=std_height),
        pn.Row(lat_min_flt_wgt, lat_max_flt_wgt, 
               **widgets_row_fmt, height=std_height),
        pn.Row(lat_wgt, 
               **widgets_row_fmt, height=std_height), 
        pn.Row(lon_min_flt_wgt, lon_max_flt_wgt, 
               **widgets_row_fmt, height=std_height),
        pn.Row(lon_wgt, 
               **widgets_row_fmt, height=std_height)),
    pn.WidgetBox("""
       ## Information
       Select a location to set the latitude and longitude sliders. <br><br>
       You can set the area with the numeric widgets. <br><br>
       You can also drag the lower (left) and upper (right) values in the sliders to set the time range and area.""", 
                 **widgets_row_fmt))


Out[7]:

Visualize the selected area


In [8]:
display_map(lat_wgt.value, lon_wgt.value)


Out[8]:

Load and Clean Data from the Data Cube


In [9]:
measurements = ['red', 'green', 'blue', 'nir', 'swir1', 'swir2', 'pixel_qa']
landsat_dataset = dc.load(latitude = lat_wgt.value,
                          longitude = lon_wgt.value,
                          platform = platform_wgt.value,
                          time = date_wgt.value,
                          product = prod_wgt.value,
                          measurements = measurements,
                          group_by='solar_day', 
                          dask_chunks={'latitude':500, 'longitude':500, 'time':1})
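
The load above is lazy (Dask-backed), so no pixels are read yet. As an optional sanity check (not part of the original workflow), the size of the requested dataset can be estimated before anything is computed, which helps respect the memory hint in the parameters cell:

# Rough size of the lazily constructed dataset if it were fully loaded (no data is read here).
print(f"Requested dataset size: {landsat_dataset.nbytes / 2**20:.1f} MiB")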

In [10]:
# Display the dataset to check its contents.
# The latitude and longitude sizes are the numbers of pixels in those dimensions;
# the time size is the number of time slices in the dataset.

landsat_dataset


Out[10]:
<xarray.Dataset>
Dimensions:    (latitude: 461, longitude: 712, time: 19)
Coordinates:
  * time       (time) datetime64[ns] 2016-01-10T07:31:41.510529 ... 2016-12-27T07:31:51.423561
  * latitude   (latitude) float64 -3.985 -3.986 ... -4.113 -4.113
  * longitude  (longitude) float64 39.54 39.54 39.55 ... 39.74 39.74
Data variables:
    red        (time, latitude, longitude) int16 dask.array<chunksize=(1, 461, 500), meta=np.ndarray>
    green      (time, latitude, longitude) int16 dask.array<chunksize=(1, 461, 500), meta=np.ndarray>
    blue       (time, latitude, longitude) int16 dask.array<chunksize=(1, 461, 500), meta=np.ndarray>
    nir        (time, latitude, longitude) int16 dask.array<chunksize=(1, 461, 500), meta=np.ndarray>
    swir1      (time, latitude, longitude) int16 dask.array<chunksize=(1, 461, 500), meta=np.ndarray>
    swir2      (time, latitude, longitude) int16 dask.array<chunksize=(1, 461, 500), meta=np.ndarray>
    pixel_qa   (time, latitude, longitude) uint16 dask.array<chunksize=(1, 461, 500), meta=np.ndarray>
Attributes:
    crs:      EPSG:4326

(Each surface reflectance band uses the nodata value -9999; pixel_qa carries the Level 2 pixel quality bit flags.)

Mask out clouds


In [11]:
from utils.clean_mask import landsat_qa_clean_mask, landsat_clean_mask_invalid

cloud_mask = (landsat_qa_clean_mask(landsat_dataset, platform=platform_wgt.value) & \
             (landsat_dataset != -9999).to_array().any('variable') & \
             landsat_clean_mask_invalid(landsat_dataset))
cleaned_dataset = landsat_dataset.where(cloud_mask).drop('pixel_qa')
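
Before compositing, it can be useful to check how many clear observations each acquisition actually contributes, since heavily clouded slices add little signal. A minimal, optional check using the cloud_mask just built (not part of the original workflow):

# Fraction of clear, valid pixels per acquisition (1.0 = fully clear).
clear_fraction = cloud_mask.mean(dim=['latitude', 'longitude']).compute()
print(clear_fraction.to_series().round(3))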

In [12]:
cloud_mask = cloud_mask.persist()
cleaned_dataset = cleaned_dataset.persist()

Create Mosaics

Median Mosaic
Masks clouds from imagery and assigns each pixel the median of its cloud-free observations in the time series. More specifically, each band (e.g. red) of each pixel is reduced to its median across time. Because each band is reduced independently, a median mosaic can contain band combinations that were never observed together in any single acquisition.
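
As a rough illustration only (the notebook itself uses utils.dc_mosaic.create_median_mosaic below), a per-band median through time can be written directly in xarray, assuming cloudy and invalid pixels have already been masked to NaN:

# Conceptual sketch of a median mosaic: reduce each band independently over time.
median_sketch = cleaned_dataset.median(dim='time', skipna=True)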

Geomedian Mosaic
Masks clouds from imagery and computes the geometric median (geomedian) of the cloud-free pixels in the time series, which maintains the relationships between spectral bands. That is, the median is taken through time over all bands considered collectively, rather than band by band as in a median mosaic. This algorithm was developed by Geoscience Australia and produces excellent cloud-filtered mosaic products for further analysis.

For more information, see the following paper: High-Dimensional Pixel Composites from Earth Observation Time Series, by Dale Roberts, Norman Mueller, and Alexis McIntyre. IEEE Transactions on Geoscience and Remote Sensing, Vol. 55, No. 11, November 2017.
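
Formally, for a pixel with cloud-free spectral vectors $x_1, \dots, x_T$ (one $b$-band vector per clear acquisition), the geomedian is the vector minimizing the total Euclidean distance to the observations in band space:

$$\tilde{x} \;=\; \operatorname*{arg\,min}_{v \in \mathbb{R}^{b}} \; \sum_{t=1}^{T} \lVert x_t - v \rVert_2$$

Because the minimization is over the full band vector rather than each band separately, the result preserves the relationships between bands, unlike the per-band median above.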

Most Recent and Least Recent Mosaic
Masks clouds from imagery using the most or least recent cloud-free pixels in the time series.
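
A rough sketch of the most-recent idea in plain xarray (the notebook uses utils.dc_mosaic.create_mosaic; this is only illustrative), again assuming masked pixels are NaN: start from the latest scene and fill any remaining gaps from progressively older scenes. Reversing the iteration order gives the least-recent variant.

# Conceptual sketch of a most-recent mosaic.
most_recent_sketch = cleaned_dataset.isel(time=-1)
for t in range(cleaned_dataset.sizes['time'] - 2, -1, -1):
    # combine_first keeps existing values and fills NaN gaps from the older scene.
    most_recent_sketch = most_recent_sketch.combine_first(cleaned_dataset.isel(time=t))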

Max NDVI Mosaic
Masks clouds from imagery and, for each pixel, keeps the cloud-free acquisition with the maximum NDVI in the time series. The Max NDVI mosaic therefore represents the greenest observed state of each pixel.
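
Conceptually (a sketch only, not the utility implementation used below), the per-pixel time index with the highest NDVI is found and every band is taken from that acquisition:

# Conceptual sketch of a Max-NDVI mosaic.
ndvi = (cleaned_dataset.nir - cleaned_dataset.red) / (cleaned_dataset.nir + cleaned_dataset.red)
# Push masked (NaN) values very low so they are never chosen by argmax.
best_time = ndvi.fillna(-9999).argmax(dim='time').compute()
max_ndvi_sketch = cleaned_dataset.isel(time=best_time)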


In [13]:
from functools import partial
from dask.diagnostics import Callback
from xarray.ufuncs import isnan as xr_nan
from holoviews.operation.datashader import datashade, rasterize

from utils.dc_mosaic import \
    create_median_mosaic, create_hdmedians_multiple_band_mosaic, \
    create_mosaic, create_min_max_var_mosaic
from utils.vegetation import NDVI
from utils.plotter_utils import figure_ratio

mosaic_methods = {'Median': create_median_mosaic, 
                  'Geomedian': create_hdmedians_multiple_band_mosaic, 
                  'Most Recent': create_mosaic, 
                  'Least Recent': partial(create_mosaic, reverse_time=True), 
                  'Max NDVI': partial(create_min_max_var_mosaic, var='NDVI', min_max='max')}
mosaic_methods_wgt = pn.widgets.RadioButtonGroup(
    value='Median', options=list(mosaic_methods.keys()))

computing_composite = [False]
composites = {}
composite_rgbs = {}
selected_composite_type = [None]
prev_plat = [platform_wgt.value]
prev_prod = [prod_wgt.value]
prev_time = [date_wgt.value]
prev_lat = [lat_wgt.value]
prev_lon = [lon_wgt.value]
@pn.depends(mosaic_method_name=mosaic_methods_wgt.param.value)
def mosaic(mosaic_method_name, **kwargs):
    computing_composite[0] = True
    # Create the composite if we have not already.
    selected_composite_type[0] = mosaic_method_name.lower().replace(' ', '_')
    # If the time or lat/lon extents have changed, clear the composites.
    if prev_plat[0] != platform_wgt.value or prev_prod[0] != prod_wgt.value or \
       prev_time[0] != date_wgt.value or prev_lat[0] != lat_wgt.value or prev_lon[0] != lon_wgt.value:
        for composite_type in list(composites.keys()):
            del composites[composite_type]
    prev_plat[0] = platform_wgt.value
    prev_prod[0] = prod_wgt.value
    prev_time[0] = date_wgt.value
    prev_lat[0] = lat_wgt.value
    prev_lon[0] = lon_wgt.value
    
    if selected_composite_type[0] not in composites:
        if mosaic_method_name == 'Max NDVI':
            cleaned_dataset['NDVI'] = NDVI(cleaned_dataset)
        composites[selected_composite_type[0]] = mosaic_methods[mosaic_method_name](cleaned_dataset, cloud_mask).persist()
    computing_composite[0] = False
    # Create the RGB image.
    if selected_composite_type[0] not in composite_rgbs:
        rgb = composites[selected_composite_type[0]][['red', 'green', 'blue']].to_array().transpose('latitude', 'longitude', 'variable')
        rgb_scaled = (rgb - rgb.min()) / (rgb.max() - rgb.min())
        composite_rgbs[selected_composite_type[0]] = rgb_scaled.where(~xr_nan(rgb_scaled), 0).persist()
    
    return hv.RGB(composite_rgbs[selected_composite_type[0]], kdims=['longitude', 'latitude'])

width, height = figure_ratio(cleaned_dataset, fixed_width=800)
width, height = int(width), int(height)
dmap = hv.DynamicMap(mosaic)
pn.Column(pn.WidgetBox("## Mosaics to Compare", mosaic_methods_wgt, width=width), 
          rasterize(dmap).opts(height=height, width=width))


Out[13]:

Create GeoTIFF Output Products

Only the composites that were selected at least once above for the currently selected area will be exported.


In [14]:
import os
from time import sleep
from utils.import_export import export_slice_to_geotiff

# Wait for the composite to be obtained.
while computing_composite[0]:
    sleep(1)

output_dir = 'output/geotiffs/custom_mosaics'
if not os.path.exists(output_dir):
    os.makedirs(output_dir)
for composite_type in composites:
    export_slice_to_geotiff(composites[composite_type], f'{output_dir}/{composite_type}_composite.tif')
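
To sanity-check an exported file from Python, one could open it with rasterio (a hypothetical verification step, assuming rasterio is installed and that the Median composite was generated above):

import rasterio

# Report band count, raster dimensions, and CRS of one exported composite.
with rasterio.open(f'{output_dir}/median_composite.tif') as src:
    print(src.count, 'bands,', src.width, 'x', src.height, 'pixels,', src.crs)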

See the exported GeoTIFF files


In [15]:
!ls -lah output/geotiffs/custom_mosaics


total 133M
drwxrwsr-x 2 jovyan users 4.0K Jul  7 16:55 .
drwxrwsr-x 3 jovyan users 4.0K Jul  6 15:15 ..
-rw-rw-r-- 1 jovyan users 7.6M Jul  6 21:41 geomedian_composite.tif
-rw-rw-r-- 1 jovyan users  40M Jul  6 15:15 least_recent_composite.tif
-rw-rw-r-- 1 jovyan users  40M Jul  6 15:15 max_ndvi_composite.tif
-rw-r--r-- 1 jovyan users 7.6M Jul  7 16:55 median_composite.tif
-rw-rw-r-- 1 jovyan users  40M Jul  6 15:15 most_recent_composite.tif