Cloud-Filtered Custom Mosaics


Notebook Summary

This notebook can be used to create custom Landsat cloud-filtered mosaics for any time period and location. The mosaics can be output as GeoTIFF products for analysis in external GIS tools. The following mosaics are possible:

Median = per-band median of the cloud-free observations through time
Geomedian = Geoscience Australia's geometric median product, with improved spectral consistency between bands
Most-Recent = most recent clear (cloud-free) pixel
Max-NDVI = pixel from the acquisition with the maximum vegetation response (NDVI)

Users should review the DCAL Cloud Statistics notebook to understand the cloud coverage for any given temporal and spatial combination. An understanding of the underlying data is important for creating a valid mosaic for further analyses. In many cases, cloud contamination will produce poor mosaics. With a careful review of the cloud coverage over a given region and time period, it is possible to improve the mosaics and avoid misleading outputs.


Index

  • Import Dependencies and Connect to the Data Cube
  • Choose Platforms and Products
  • Get the Extents of the Cube
  • Define the Analysis Parameters
  • Load and Clean Data from the Data Cube
  • Create Mosaics
  • Create GeoTIFF Output Products

Import Dependencies and Connect to the Data Cube


In [1]:
# Enable importing of utilities.
import sys
sys.path.append('..')

# Suppress warnings.
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import xarray as xr
import pandas as pd
import matplotlib.pyplot as plt
import bokeh
import holoviews as hv
import panel as pn

# Load Data Cube configuration.
import datacube
import utils.data_access_api as dc_api  
api = dc_api.DataAccessApi()
dc = api.dc

In [2]:
# Configure visualization libraries.
pn.extension()
# Use the bokeh backend for HoloViews.
hv.extension('bokeh')
bokeh.io.output_notebook()


Loading BokehJS ...

In [3]:
from utils.dask import create_local_dask_cluster

client = create_local_dask_cluster(spare_mem='3Gb')
client


Out[3]:

Client

Cluster

  • Workers: 3
  • Cores: 6
  • Memory: 13.64 GB

Choose Platforms and Products

<p style="color:red">CHANGE INPUTS BELOW</p>


In [4]:
def get_prods_for_platform(platform):
    prod_info = dc.list_products()
    return list(prod_info.loc[prod_info.platform == platform].name)

platforms = ["LANDSAT_7", "LANDSAT_8"]
platform_wgt = pn.widgets.Select(name='platform', value='LANDSAT_8', options=platforms)
prod_wgt = pn.widgets.Select(name='product', options=get_prods_for_platform(platform_wgt.value))

@pn.depends(platform_wgt.param.value, watch=True)
def update_prod_list(platform):
    prods = get_prods_for_platform(platform)
    prod_wgt.options = prods
    prod_wgt.value = prods[0] if len(prods) > 0 else None

pn.Row(platform_wgt, prod_wgt)


Out[4]:

Get the Extents of the Cube


In [5]:
from utils.dc_load import get_product_extents
from utils.dc_time import dt_to_str

full_lat, full_lon, min_max_dates = get_product_extents(api, platform_wgt.value, prod_wgt.value)

# Print the extents of the combined data.
print("Latitude Extents:", full_lat)
print("Longitude Extents:", full_lon)
print("Time Extents:", list(map(dt_to_str, min_max_dates)))


Latitude Extents: (-12.63361111121218, 18.40166666681388)
Longitude Extents: (-25.47250000020378, 44.01000000035208)
Time Extents: ['2013-03-21', '2020-01-27']

Visualize the available area


In [6]:
from utils.dc_display_map import display_map
display_map(full_lat, full_lon)


Out[6]:

Define the Analysis Parameters

<p style="color:red">CHANGE INPUTS BELOW</p>


In [7]:
from time import sleep
import datetime

# Select an analysis region (Lat-Lon) within the extents listed above. 
# HINT: Keep your region small (<0.5 deg square) to avoid memory overload issues
# Select a time period (Min-Max) within the extents

## Location, date, and spatial slider widgets ##

locations = {'Mombasa, Kenya': {'lat': (-4.1131, -3.9853), 'lon': (39.5445, 39.7420)},
             'Freetown, Sierra Leone': {'lat': (8.3267, 8.5123), 'lon': (-13.3109, -13.1197)},
             'Tano-Offin Forest - Ghana': {'lat': (6.5814, 6.8978), 'lon': (-2.2955, -1.9395)},
             'Custom': {}}
location_wgt = pn.widgets.Select(name='Location', value='Mombasa, Kenya', options=list(locations.keys()))
date_wgt = pn.widgets.DateRangeSlider(name='Time Range', start=min_max_dates[0], end=min_max_dates[1], 
                                      value=(datetime.datetime(2016,1,1), datetime.datetime(2016,12,31)))
lat_wgt = pn.widgets.RangeSlider(name='Latitude', start=full_lat[0], end=full_lat[1], step=0.1,
                                 value=locations[location_wgt.value]['lat'])
lon_wgt = pn.widgets.RangeSlider(name='Longitude', start=full_lon[0], end=full_lon[1], step=0.1,
                                 value=locations[location_wgt.value]['lon'])

# If true, denotes that changes in lat/lon are caused by a location widget change,
# not a change in one of the 4 float widgets (lat min, lat max, lon min, lon max).
location_changed = [False]
@pn.depends(location_wgt.param.value, watch=True)
def location_handler(location, location_changed=location_changed):
    # Update the lat/lon sliders with the values for the selected location.
    if location != 'Custom':
        location_changed[0] = True
        lat_wgt.value = locations[location].get('lat', lat_wgt.value)
        lon_wgt.value = locations[location].get('lon', lon_wgt.value)
        location_changed[0] = False

@pn.depends(lat_wgt.param.value, lon_wgt.param.value, watch=True)
def lat_lon_sliders_handler(lat, lon, location_changed=location_changed):
    sleep(0.01)
    # If the latitude or longitude are changed other than due to a newly
    # selected location (location widget), note that this is a custom location.
    if not location_changed[0]:
        location_wgt.value = 'Custom'

## End location, date, and spatial slider widgets ##

## Spatial extents float widgets ##

# The `panel.depends` decorator does not work with these widgets in Panel 0.8.3, so `param.watch` is used instead.
def set_lat_min_by_flt(evt):
    try:
        lat_min = float(evt.new)
        lat_wgt.value = (lat_min, lat_wgt.value[1])
    except (TypeError, ValueError):
        return
lat_min_flt_wgt = pn.widgets.LiteralInput(name='Latitude Min', value=lat_wgt.value[0], type=float)
lat_min_flt_wgt.param.watch(set_lat_min_by_flt, 'value')

def set_lat_max_by_flt(evt):
    try:
        lat_max = float(evt.new)
        lat_wgt.value = (lat_wgt.value[0], lat_max)
    except (TypeError, ValueError):
        return
lat_max_flt_wgt = pn.widgets.LiteralInput(name='Latitude Max', value=lat_wgt.value[1], type=float)
lat_max_flt_wgt.param.watch(set_lat_max_by_flt, 'value')

def set_lon_min_by_flt(evt):
    try:
        lon_min = float(evt.new)
        lon_wgt.value = (lon_min, lon_wgt.value[1])
    except (TypeError, ValueError):
        return
lon_min_flt_wgt = pn.widgets.LiteralInput(name='Longitude Min', value=lon_wgt.value[0], type=float)
lon_min_flt_wgt.param.watch(set_lon_min_by_flt, 'value')

def set_lon_max_by_flt(evt):
    try:
        lon_max = float(evt.new)
        lon_wgt.value = (lon_wgt.value[0], lon_max)
    except (TypeError, ValueError):
        return
lon_max_flt_wgt = pn.widgets.LiteralInput(name='Longitude Max', value=lon_wgt.value[1], type=float)
lon_max_flt_wgt.param.watch(set_lon_max_by_flt, 'value')

## End spatial extents float widgets ##
std_height = 50
std_width = 50
widgets_row_fmt = dict(width=6*std_width)
pn.Row(
    pn.WidgetBox(
        pn.Row(location_wgt, 
               **widgets_row_fmt, height=std_height), 
        pn.Row(date_wgt, 
               **widgets_row_fmt, height=std_height),
        pn.Row(lat_min_flt_wgt, lat_max_flt_wgt, 
               **widgets_row_fmt, height=std_height),
        pn.Row(lat_wgt, 
               **widgets_row_fmt, height=std_height), 
        pn.Row(lon_min_flt_wgt, lon_max_flt_wgt, 
               **widgets_row_fmt, height=std_height),
        pn.Row(lon_wgt, 
               **widgets_row_fmt, height=std_height)),
    pn.WidgetBox("""
       ## Information
       Select a location to set the latitude and longitude sliders. <br><br>
       You can set the area with the numeric widgets. <br><br>
       You can also drag the lower (left) and upper (right) values in the sliders to set the time range and area.""", 
                 **widgets_row_fmt))


Out[7]:

Visualize the selected area


In [8]:
display_map(lat_wgt.value, lon_wgt.value)


Out[8]:

Load and Clean Data from the Data Cube


In [9]:
measurements = ['red', 'green', 'blue', 'nir', 'swir1', 'swir2', 'pixel_qa']
landsat_dataset = dc.load(latitude = lat_wgt.value,
                          longitude = lon_wgt.value,
                          platform = platform_wgt.value,
                          time = date_wgt.value,
                          product = prod_wgt.value,
                          measurements = measurements,
                          group_by='solar_day', 
                          dask_chunks={'latitude':500, 'longitude':500, 'time':1})
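
The load above is lazy (Dask-backed), so no pixels are read yet. As an optional sanity check (not part of the original workflow), the size of the requested dataset can be estimated before anything is computed, which helps respect the memory hint in the parameters cell:

# Rough size of the lazily constructed dataset if it were fully loaded (no data is read here).
print(f"Requested dataset size: {landsat_dataset.nbytes / 2**20:.1f} MiB")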

In [10]:
# Display the dataset to check its contents.
# The latitude and longitude sizes are the numbers of pixels in those dimensions;
# the time size is the number of time slices in the dataset.

landsat_dataset


Out[10]:
<xarray.Dataset>
Dimensions:    (latitude: 461, longitude: 712, time: 19)
Coordinates:
  * time       (time) datetime64[ns] 2016-01-10T07:31:41.510529 ... 2016-12-27T07:31:51.423561
  * latitude   (latitude) float64 -3.985 -3.986 ... -4.113 -4.113
  * longitude  (longitude) float64 39.54 39.54 39.55 ... 39.74 39.74
Data variables:
    red        (time, latitude, longitude) int16 dask.array<chunksize=(1, 461, 500), meta=np.ndarray>
    green      (time, latitude, longitude) int16 dask.array<chunksize=(1, 461, 500), meta=np.ndarray>
    blue       (time, latitude, longitude) int16 dask.array<chunksize=(1, 461, 500), meta=np.ndarray>
    nir        (time, latitude, longitude) int16 dask.array<chunksize=(1, 461, 500), meta=np.ndarray>
    swir1      (time, latitude, longitude) int16 dask.array<chunksize=(1, 461, 500), meta=np.ndarray>
    swir2      (time, latitude, longitude) int16 dask.array<chunksize=(1, 461, 500), meta=np.ndarray>
    pixel_qa   (time, latitude, longitude) uint16 dask.array<chunksize=(1, 461, 500), meta=np.ndarray>
Attributes:
    crs:      EPSG:4326

(Each surface reflectance band uses the nodata value -9999; pixel_qa carries the Level 2 pixel quality bit flags.)

Mask out clouds


In [11]:
from utils.clean_mask import landsat_qa_clean_mask, landsat_clean_mask_invalid

cloud_mask = (landsat_qa_clean_mask(landsat_dataset, platform=platform_wgt.value) & \
             (landsat_dataset != -9999).to_array().any('variable') & \
             landsat_clean_mask_invalid(landsat_dataset))
cleaned_dataset = landsat_dataset.where(cloud_mask).drop('pixel_qa')
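
Before compositing, it can be useful to check how many clear observations each acquisition actually contributes, since heavily clouded slices add little signal. A minimal, optional check using the cloud_mask just built (not part of the original workflow):

# Fraction of clear, valid pixels per acquisition (1.0 = fully clear).
clear_fraction = cloud_mask.mean(dim=['latitude', 'longitude']).compute()
print(clear_fraction.to_series().round(3))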

In [12]:
cloud_mask = cloud_mask.persist()
cleaned_dataset = cleaned_dataset.persist()

Create Mosaics

Median Mosaic
Masks clouds from imagery and assigns each pixel the median of its cloud-free observations in the time series. More specifically, each band (e.g. red) of each pixel is reduced to its median across time. Because each band is reduced independently, a median mosaic can contain band combinations that were never observed together in any single acquisition.
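
As a rough illustration only (the notebook itself uses utils.dc_mosaic.create_median_mosaic below), a per-band median through time can be written directly in xarray, assuming cloudy and invalid pixels have already been masked to NaN:

# Conceptual sketch of a median mosaic: reduce each band independently over time.
median_sketch = cleaned_dataset.median(dim='time', skipna=True)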

Geomedian Mosaic
Masks clouds from imagery and computes the geometric median (geomedian) of the cloud-free pixels in the time series, which maintains the relationships between spectral bands. That is, the median is taken through time over all bands considered collectively, rather than band by band as in a median mosaic. This algorithm was developed by Geoscience Australia and produces excellent cloud-filtered mosaic products for further analysis.

For more information, see the following paper: High-Dimensional Pixel Composites from Earth Observation Time Series, by Dale Roberts, Norman Mueller, and Alexis McIntyre. IEEE Transactions on Geoscience and Remote Sensing, Vol. 55, No. 11, November 2017.
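
Formally, for a pixel with cloud-free spectral vectors $x_1, \dots, x_T$ (one $b$-band vector per clear acquisition), the geomedian is the vector minimizing the total Euclidean distance to the observations in band space:

$$\tilde{x} \;=\; \operatorname*{arg\,min}_{v \in \mathbb{R}^{b}} \; \sum_{t=1}^{T} \lVert x_t - v \rVert_2$$

Because the minimization is over the full band vector rather than each band separately, the result preserves the relationships between bands, unlike the per-band median above.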

Most Recent and Least Recent Mosaic
Masks clouds from imagery using the most or least recent cloud-free pixels in the time series.
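
A rough sketch of the most-recent idea in plain xarray (the notebook uses utils.dc_mosaic.create_mosaic; this is only illustrative), again assuming masked pixels are NaN: start from the latest scene and fill any remaining gaps from progressively older scenes. Reversing the iteration order gives the least-recent variant.

# Conceptual sketch of a most-recent mosaic.
most_recent_sketch = cleaned_dataset.isel(time=-1)
for t in range(cleaned_dataset.sizes['time'] - 2, -1, -1):
    # combine_first keeps existing values and fills NaN gaps from the older scene.
    most_recent_sketch = most_recent_sketch.combine_first(cleaned_dataset.isel(time=t))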

Max NDVI Mosaic
Masks clouds from imagery and, for each pixel, keeps the cloud-free acquisition with the maximum NDVI in the time series. The Max NDVI mosaic therefore represents the greenest observed state of each pixel.
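
Conceptually (a sketch only, not the utility implementation used below), the per-pixel time index with the highest NDVI is found and every band is taken from that acquisition:

# Conceptual sketch of a Max-NDVI mosaic.
ndvi = (cleaned_dataset.nir - cleaned_dataset.red) / (cleaned_dataset.nir + cleaned_dataset.red)
# Push masked (NaN) values very low so they are never chosen by argmax.
best_time = ndvi.fillna(-9999).argmax(dim='time').compute()
max_ndvi_sketch = cleaned_dataset.isel(time=best_time)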


In [13]:
from functools import partial
from dask.diagnostics import Callback
from xarray.ufuncs import isnan as xr_nan
from holoviews.operation.datashader import datashade, rasterize

from utils.dc_mosaic import \
    create_median_mosaic, create_hdmedians_multiple_band_mosaic, \
    create_mosaic, create_min_max_var_mosaic
from utils.vegetation import NDVI
from utils.plotter_utils import figure_ratio

mosaic_methods = {'Median': create_median_mosaic, 
                  'Geomedian': create_hdmedians_multiple_band_mosaic, 
                  'Most Recent': create_mosaic, 
                  'Least Recent': partial(create_mosaic, reverse_time=True), 
                  'Max NDVI': partial(create_min_max_var_mosaic, var='NDVI', min_max='max')}
mosaic_methods_wgt = pn.widgets.RadioButtonGroup(
    value='Median', options=list(mosaic_methods.keys()))

computing_composite = [False]
composites = {}
composite_rgbs = {}
selected_composite_type = [None]
prev_plat = [platform_wgt.value]
prev_prod = [prod_wgt.value]
prev_time = [date_wgt.value]
prev_lat = [lat_wgt.value]
prev_lon = [lon_wgt.value]
@pn.depends(mosaic_method_name=mosaic_methods_wgt.param.value)
def mosaic(mosaic_method_name, **kwargs):
    computing_composite[0] = True
    # Create the composite if we have not already.
    selected_composite_type[0] = mosaic_method_name.lower().replace(' ', '_')
    # If the time or lat/lon extents have changed, clear the composites.
    if prev_plat[0] != platform_wgt.value or prev_prod[0] != prod_wgt.value or \
       prev_time[0] != date_wgt.value or prev_lat[0] != lat_wgt.value or prev_lon[0] != lon_wgt.value:
        for composite_type in list(composites.keys()):
            del composites[composite_type]
    prev_plat[0] = platform_wgt.value
    prev_prod[0] = prod_wgt.value
    prev_time[0] = date_wgt.value
    prev_lat[0] = lat_wgt.value
    prev_lon[0] = lon_wgt.value
    
    if selected_composite_type[0] not in composites:
        if mosaic_method_name == 'Max NDVI':
            cleaned_dataset['NDVI'] = NDVI(cleaned_dataset)
        composites[selected_composite_type[0]] = mosaic_methods[mosaic_method_name](cleaned_dataset, cloud_mask).persist()
    computing_composite[0] = False
    # Create the RGB image.
    if selected_composite_type[0] not in composite_rgbs:
        rgb = composites[selected_composite_type[0]][['red', 'green', 'blue']].to_array().transpose('latitude', 'longitude', 'variable')
        rgb_scaled = (rgb - rgb.min()) / (rgb.max() - rgb.min())
        composite_rgbs[selected_composite_type[0]] = rgb_scaled.where(~xr_nan(rgb_scaled), 0).persist()
    
    return hv.RGB(composite_rgbs[selected_composite_type[0]], kdims=['longitude', 'latitude'])

width, height = figure_ratio(cleaned_dataset, fixed_width=800)
width, height = int(width), int(height)
dmap = hv.DynamicMap(mosaic)
pn.Column(pn.WidgetBox("## Mosaics to Compare", mosaic_methods_wgt, width=width), 
          rasterize(dmap).opts(height=height, width=width))


Out[13]:

Create GeoTIFF Output Products

Only the composites that were selected at least once above for the currently selected area will be exported.


In [14]:
import os
from time import sleep
from utils.import_export import export_slice_to_geotiff

# Wait for the composite to be obtained.
while computing_composite[0]:
    sleep(1)

output_dir = 'output/geotiffs/custom_mosaics'
if not os.path.exists(output_dir):
    os.makedirs(output_dir)
for composite_type in composites:
    export_slice_to_geotiff(composites[composite_type], f'{output_dir}/{composite_type}_composite.tif')
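
To sanity-check an exported file from Python, one could open it with rasterio (a hypothetical verification step, assuming rasterio is installed and that the Median composite was generated above):

import rasterio

# Report band count, raster dimensions, and CRS of one exported composite.
with rasterio.open(f'{output_dir}/median_composite.tif') as src:
    print(src.count, 'bands,', src.width, 'x', src.height, 'pixels,', src.crs)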

See the exported GeoTIFF files


In [15]:
!ls -lah output/geotiffs/custom_mosaics


total 133M
drwxrwsr-x 2 jovyan users 4.0K Jul  7 16:55 .
drwxrwsr-x 3 jovyan users 4.0K Jul  6 15:15 ..
-rw-rw-r-- 1 jovyan users 7.6M Jul  6 21:41 geomedian_composite.tif
-rw-rw-r-- 1 jovyan users  40M Jul  6 15:15 least_recent_composite.tif
-rw-rw-r-- 1 jovyan users  40M Jul  6 15:15 max_ndvi_composite.tif
-rw-r--r-- 1 jovyan users 7.6M Jul  7 16:55 median_composite.tif
-rw-rw-r-- 1 jovyan users  40M Jul  6 15:15 most_recent_composite.tif