Notebook Title (Set to title of notebook)


Notebook Summary

This section of the introduction should provide an overview of the notebook's context and intent. It should explain why the notebook exists, what it does, and how it works at a high level.


Index

As we will see below, every section in the index must have a matching heading in the body, and primary sections (level 1 - e.g. "Choose Platforms and Products") must be hyperlinked to their headings in the body.

Remember to check that (1) the section headings in the index successfully link to their headings in the body and (2) the arrows beside the headings in the body successfully link back to the top of the notebook (see the <a> tag at the top of this cell).

Import Dependencies and Connect to the Data Cube

Option 1: Simpler - typically intended for internal use, to quickly create a first version of a notebook without widgets


In [ ]:
# Enable importing of our utilities.
import sys
sys.path.append('..')

# Import the most commonly used packages in our notebooks.
import datacube # Facilitates loading data from the Data Cube
import numpy as np # Numerical processing, including time
import pandas as pd # Tabular data structures - used for data analysis
import xarray as xr # Coordinate indexed arrays
import matplotlib.pyplot as plt

# Connect to the Data Cube.
dc = datacube.Datacube(app='notebook_name')

Option 2: User-friendly - allowing users to set notebook parameters with widgets


In [ ]:
# Enable importing of our utilities.
import sys
sys.path.append('..')

# Import the most commonly used packages in our notebooks.
import datacube
import numpy as np
import pandas as pd
import xarray as xr
import matplotlib.pyplot as plt

# Import extra widget and plotting libraries.
import bokeh
import holoviews as hv
import panel as pn

# Connect to the Data Cube.
dc = datacube.Datacube(app='notebook_name')

In [ ]:
# Configure visualization libraries.
pn.extension()
# Use the bokeh backend for HoloViews.
hv.extension('bokeh')
bokeh.io.output_notebook()

Choose Platforms and Products

Option 1: Simpler - Show available platforms and products, choose with text


In [ ]:
# Get available products
products_info = dc.list_products()

# List LANDSAT 7 products
print("LANDSAT 7 Products:")
products_info.loc[products_info.platform == "LANDSAT_7", ["platform", "name"]]

In [ ]:
# List LANDSAT 8 products
print("LANDSAT 8 Products:")
products_info.loc[products_info.platform == "LANDSAT_8", ["platform", "name"]]

In [ ]:
platform = "LANDSAT_8"
product = "ls8_usgs_sr_scene"

Option 2: User-friendly - Allow selection of platforms and products with widgets


In [ ]:
def get_prods_for_platform(platform):
    prod_info = dc.list_products()
    return list(prod_info.loc[prod_info.platform == platform].name)

platforms = ["LANDSAT_7", "LANDSAT_8"]
platform_wgt = pn.widgets.Select(name='platform', value='LANDSAT_8', options=platforms)
prod_wgt = pn.widgets.Select(name='product', options=get_prods_for_platform(platform_wgt.value))

@pn.depends(platform_wgt.param.value, watch=True)
def update_prod_list(platform):
    prods = get_prods_for_platform(platform)
    prod_wgt.options = prods
    prod_wgt.value = prods[0] if len(prods) > 0 else None

pn.Row(platform_wgt, prod_wgt)

Get the Extents of the Cube

Option 1: Simpler - if no widgets were used above


In [ ]:
from utils.data_cube_utilities.dc_time import dt_to_str

metadata = dc.load(platform=platform, product=product, measurements=[])

full_lat = metadata.latitude.values[[-1,0]]
full_lon = metadata.longitude.values[[0,-1]]
min_max_dates = list(map(dt_to_str, map(pd.to_datetime, metadata.time.values[[0,-1]])))

# Print the extents of the combined data.
print("Latitude Extents:", full_lat)
print("Longitude Extents:", full_lon)
print("Time Extents:", min_max_dates)

Option 2: User-friendly - if widgets were used above


In [ ]:
from utils.data_cube_utilities.dc_time import dt_to_str

metadata = dc.load(platform=platform_wgt.value, product=prod_wgt.value, measurements=[])

full_lat = metadata.latitude.values[[-1,0]]
full_lon = metadata.longitude.values[[0,-1]]
min_max_dates = list(map(dt_to_str, map(pd.to_datetime, metadata.time.values[[0,-1]])))

# Print the extents of the combined data.
print("Latitude Extents:", full_lat)
print("Longitude Extents:", full_lon)
print("Time Extents:", min_max_dates)

Visualize the available area


In [ ]:
from utils.data_cube_utilities.dc_display_map import display_map
display_map(full_lat, full_lon)

Define the Analysis Parameters

Analysis parameters

An optional section describing any parameters the user will need to configure to run the notebook. Spatial and temporal extents (latitude, longitude, easting, northing, time) may be considered simple and common enough to omit here:

  • param_name_1: Simple description (e.g. example_value). Advice about appropriate values to choose for this parameter.
  • param_name_2: Simple description (e.g. example_value). Advice about appropriate values to choose for this parameter.

Option 1: Simpler - Select a location and time by uncommenting one location's lat, lon, and time_extents variables and commenting out those of the other locations


In [ ]:
# Mombasa, Kenya
lat = (-4.1131, -3.9853)
lon = (39.5445, 39.7420)
time_extents = ("2014-01-01", "2014-12-31")

# Freetown, Sierra Leone
# lat = (8.3267, 8.5123)
# lon = (-13.3109, -13.1197)
# time_extents = ("2014-01-01", "2014-12-31")

# Tano-Offin Forest - Ghana
# lat = (6.5814, 6.8978)
# lon = (-2.2955, -1.9395)
# time_extents = ("2014-01-01", "2014-12-31")

Option 2: User-friendly - Allow selection of location and time with widgets


In [ ]:
from time import sleep
import datetime

def str_date_to_dt(date):
    return datetime.datetime.strptime(date, '%Y-%m-%d')

# Select an analysis region (Lat-Lon) within the extents listed above. 
# HINT: Keep your region small (<0.5 deg square) to avoid memory overload issues
# Select a time period (Min-Max) within the extents

## Location, date, and spatial slider widgets ##

locations = {'Mombasa, Kenya': {'lat': (-4.1131, -3.9853), 'lon': (39.5445, 39.7420)},
             'Freetown, Sierra Leone': {'lat': (8.3267, 8.5123), 'lon': (-13.3109, -13.1197)},
             'Tano-Offin Forest - Ghana': {'lat': (6.5814, 6.8978), 'lon': (-2.2955, -1.9395)},
             'Custom': {}}
location_wgt = pn.widgets.Select(name='Location', value='Mombasa, Kenya', options=list(locations.keys()))
date_wgt = pn.widgets.DateRangeSlider(name='Time Range', 
                                      start=str_date_to_dt(min_max_dates[0]), 
                                      end=str_date_to_dt(min_max_dates[1]), 
                                      value=(datetime.datetime(2016,1,1), datetime.datetime(2016,12,31)))
lat_wgt = pn.widgets.RangeSlider(name='Latitude', start=full_lat[0], end=full_lat[1], step=0.1,
                                 value=locations[location_wgt.value]['lat'])
lon_wgt = pn.widgets.RangeSlider(name='Longitude', start=full_lon[0], end=full_lon[1], step=0.1,
                                 value=locations[location_wgt.value]['lon'])

# If true, denotes that changes in lat/lon are caused by a location widget change,
# not a change in one of the 4 float widgets (lat min, lat max, lon min, lon max).
location_changed = [False]
@pn.depends(location_wgt.param.value, watch=True)
def location_handler(location, location_changed=location_changed):
    # Update the lat/lon sliders with the values for the selected location.
    if location != 'Custom':
        location_changed[0] = True
        lat_wgt.value = locations[location].get('lat', lat_wgt.value)
        lon_wgt.value = locations[location].get('lon', lon_wgt.value)
        location_changed[0] = False

@pn.depends(lat_wgt.param.value, lon_wgt.param.value, watch=True)
def lat_lon_sliders_handler(lat, lon, location_changed=location_changed):
    sleep(0.01)
    # If the latitude or longitude are changed other than due to a newly
    # selected location (location widget), note that this is a custom location.
    if not location_changed[0]:
        location_wgt.value = 'Custom'

## End location, date, and spatial slider widgets ##

## Spatial extents float widgets ##

# Using the `panel.depends` decorator with these widgets in Panel version 0.8.3 does not work.
def set_lat_min_by_flt(evt):
    try:
        lat_min = float(evt.new)
        lat_wgt.value = (lat_min, lat_wgt.value[1])
    except (TypeError, ValueError):
        return
lat_min_flt_wgt = pn.widgets.LiteralInput(name='Latitude Min', value=lat_wgt.value[0], type=float)
lat_min_flt_wgt.param.watch(set_lat_min_by_flt, 'value')

def set_lat_max_by_flt(evt):
    try:
        lat_max = float(evt.new)
        lat_wgt.value = (lat_wgt.value[0], lat_max)
    except (TypeError, ValueError):
        return
lat_max_flt_wgt = pn.widgets.LiteralInput(name='Latitude Max', value=lat_wgt.value[1], type=float)
lat_max_flt_wgt.param.watch(set_lat_max_by_flt, 'value')

def set_lon_min_by_flt(evt):
    try:
        lon_min = float(evt.new)
        lon_wgt.value = (lon_min, lon_wgt.value[1])
    except (TypeError, ValueError):
        return
lon_min_flt_wgt = pn.widgets.LiteralInput(name='Longitude Min', value=lon_wgt.value[0], type=float)
lon_min_flt_wgt.param.watch(set_lon_min_by_flt, 'value')

def set_lon_max_by_flt(evt):
    try:
        lon_max = float(evt.new)
        lon_wgt.value = (lon_wgt.value[0], lon_max)
    except (TypeError, ValueError):
        return
lon_max_flt_wgt = pn.widgets.LiteralInput(name='Longitude Max', value=lon_wgt.value[1], type=float)
lon_max_flt_wgt.param.watch(set_lon_max_by_flt, 'value')

## End spatial extents float widgets ##
std_height = 50
std_width = 50
widgets_row_fmt = dict(width=6*std_width)
pn.Row(
    pn.WidgetBox(
        pn.Row(location_wgt, 
               **widgets_row_fmt, height=std_height), 
        pn.Row(date_wgt, 
               **widgets_row_fmt, height=std_height),
        pn.Row(lat_min_flt_wgt, lat_max_flt_wgt, 
               **widgets_row_fmt, height=std_height),
        pn.Row(lat_wgt, 
               **widgets_row_fmt, height=std_height), 
        pn.Row(lon_min_flt_wgt, lon_max_flt_wgt, 
               **widgets_row_fmt, height=std_height),
        pn.Row(lon_wgt, 
               **widgets_row_fmt, height=std_height)),
    pn.WidgetBox("""
       ## Information
       Select a location to set the latitude and longitude sliders. <br><br>
       You can set the area with the numeric widgets. <br><br>
       You can also drag the lower (left) and upper (right) values in the sliders to set the time range and area.""", 
                 **widgets_row_fmt))

Visualize the selected area

Option 1: Simpler - if no widgets were used above


In [ ]:
display_map(lat, lon)

Option 2: User-friendly - if widgets were used above


In [ ]:
display_map(lat_wgt.value, lon_wgt.value)

Load and Clean Data from the Data Cube

Option 1: Simple - load normally, using NumPy

This is a fairly fast and reliable way to process the data - once it is loaded. All of the data is loaded together, so you will encounter memory limits for large queries, but NumPy is more widely supported and generally easier to work with than the alternative in option 2. Try getting a notebook working with this option first.
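Before loading, it can help to estimate whether a query will fit in memory. A minimal sketch of such an estimate - the function and the example numbers are illustrative assumptions, not part of the notebook's utilities:

```python
# Back-of-the-envelope estimate of how much memory a Data Cube load
# will need, so you know whether the in-memory (NumPy) option is viable.
def estimate_load_mib(n_times, n_lat, n_lon, n_bands, bytes_per_value=2):
    """Approximate in-memory size of a load, in MiB.

    Landsat surface-reflectance bands are typically int16
    (2 bytes per value); adjust bytes_per_value if needed.
    """
    return n_times * n_lat * n_lon * n_bands * bytes_per_value / 2**20

# Example: ~23 acquisitions over a ~0.2 degree square
# (~750 x 750 pixels at 30 m resolution), 7 measurements.
print(f"{estimate_load_mib(23, 750, 750, 7):.0f} MiB")  # ~173 MiB
```

Note that masking or arithmetic often promotes int16 data to float64, quadrupling this estimate, so leave generous headroom.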


In [ ]:
measurements = ['red', 'green', 'blue', 'nir', 'swir1', 'swir2', 'pixel_qa']
# The `dask_chunks={'time':1}` argument serves to speed up queries from S3 through parallel downloads.
landsat_dataset = dc.load(latitude = lat_wgt.value,
                          longitude = lon_wgt.value,
                          platform = platform_wgt.value,
                          time = date_wgt.value,
                          product = prod_wgt.value,
                          measurements = measurements,
                          group_by='solar_day', 
                          dask_chunks={'time':1}).compute()

Mask out clouds


In [ ]:
from utils.data_cube_utilities.clean_mask import landsat_qa_clean_mask, landsat_clean_mask_invalid

cloud_mask = (landsat_qa_clean_mask(landsat_dataset, platform=platform_wgt.value) &
              (landsat_dataset != -9999).to_array().any('variable') &
              landsat_clean_mask_invalid(landsat_dataset))
cleaned_dataset = landsat_dataset.where(cloud_mask).drop_vars('pixel_qa')
del landsat_dataset

Option 2: Scalable - load with Dask

This is an even faster and much more scalable way to process the data. Chunks of the data are loaded and processed in a stream of tasks in parallel, and Dask will avoid running out of memory where it can. Simply provide a reasonable chunk size - roughly 200 MiB per chunk, though the ideal size varies with the performance characteristics of the functions being applied to the data. Note that not all code that works with option 1 works with option 2.
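To sanity-check a dask_chunks setting against the ~200 MiB guideline, you can compute the size of one chunk directly. A minimal sketch - chunk_mib is a hypothetical helper, and int16 raw bands are assumed:

```python
# Estimate the in-memory size of a single Dask chunk from a
# dask_chunks mapping. Raw Landsat SR bands are typically int16
# (2 bytes per value); masked/float data is often float64 (8 bytes).
def chunk_mib(chunks, bytes_per_value=2):
    n_values = 1
    for size in chunks.values():
        n_values *= size
    return n_values * bytes_per_value / 2**20

chunks = {'latitude': 1500, 'longitude': 1500, 'time': 1}
print(f"int16 chunk:   {chunk_mib(chunks):.1f} MiB")     # ~4.3 MiB
print(f"float64 chunk: {chunk_mib(chunks, 8):.1f} MiB")  # ~17.2 MiB
```

With the chunking used below, each chunk is well under 200 MiB even as float64, which leaves headroom for the intermediate results Dask materializes while computing.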


In [ ]:
measurements = ['red', 'green', 'blue', 'nir', 'swir1', 'swir2', 'pixel_qa']
landsat_dataset = dc.load(latitude = lat_wgt.value,
                          longitude = lon_wgt.value,
                          platform = platform_wgt.value,
                          time = date_wgt.value,
                          product = prod_wgt.value,
                          measurements = measurements,
                          group_by='solar_day', 
                          dask_chunks={'latitude':1500, 'longitude':1500, 'time':1})

Mask out clouds


In [ ]:
from utils.data_cube_utilities.clean_mask import landsat_qa_clean_mask, landsat_clean_mask_invalid

cloud_mask = (landsat_qa_clean_mask(landsat_dataset, platform=platform_wgt.value) &
              (landsat_dataset != -9999).to_array().any('variable') &
              landsat_clean_mask_invalid(landsat_dataset))
cleaned_dataset = landsat_dataset.where(cloud_mask).drop_vars('pixel_qa')
del landsat_dataset

In [ ]:
cloud_mask = cloud_mask.persist()
cleaned_dataset = cleaned_dataset.persist()

Visualization

Create a mosaic for the sake of demonstration


In [ ]:
from utils.data_cube_utilities.dc_mosaic import create_hdmedians_multiple_band_mosaic
from utils.data_cube_utilities.plotter_utils import figure_ratio

mosaic = create_hdmedians_multiple_band_mosaic(cleaned_dataset[['red', 'green', 'blue']], cloud_mask)
rgb = mosaic.to_array().transpose('latitude', 'longitude', 'variable')
rgb_scaled = ((rgb - rgb.min()) / (rgb.max() - rgb.min())).persist()

# Define the size of the plots.
width, height = figure_ratio(cleaned_dataset, fixed_width=12)
width, height = int(width), int(height)

Option 1: Simple - Matplotlib, Seaborn, Plotly

  • Matplotlib is most useful for quick, simple plotting
  • Seaborn is most useful for plotting pandas data (tabular datasets)
  • Plotly is most useful for interactive visualizations

In [ ]:
# Plot a particular time using the xarray object (which uses matplotlib).
rgb_scaled.compute().plot.imshow(figsize=(width, height))
plt.title('Plotting with Xarray (matplotlib)')
plt.show()

Option 2: Scalable, Beautiful - HoloViews, Datashader - preferred if option 2 (Dask) was used for loading

  • HoloViews is a scalable, interactive plotting library that has a different style of code than matplotlib and a different set of capabilities
  • Datashader allows efficient visualization of large datasets (pixel binning)

These libraries are part of the PyViz ecosystem, which has become increasingly popular. They are needed to plot very large rasters. If the raster below is large enough, you will notice that the resolution increases as you zoom in and decreases as you zoom out. You can zoom in by clicking the box-zoom icon (a magnifying glass in a box) on the right and then dragging a rectangle over the map with the left mouse button. To zoom out, click the reset button (two arrows forming a loop).


In [ ]:
from holoviews.operation.datashader import rasterize

# Format the RGB image. `xarray.ufuncs` was removed in newer xarray
# versions; `fillna` is the portable way to replace NaN values.
rgb_scaled = rgb_scaled.fillna(0) # Set NaN to 0.

pn.Column(rasterize(hv.RGB(rgb_scaled, kdims=['longitude', 'latitude'])).opts(height=65*height, width=65*width))