by Brian Killough (NASA SEO)
In [1]:
import numpy as np
import datacube
import utils.data_cube_utilities.data_access_api as dc_api
# Open a Data Cube access-API session; `api.dc` exposes the underlying
# datacube.Datacube handle used for listing and loading products below.
api = dc_api.DataAccessApi()
dc = api.dc
You might want to learn more about what data is available and how it is stored.
In [2]:
# List every product indexed in this Data Cube; the bare expression on the
# last line renders the table as the cell output.
list_of_products = dc.list_products()
list_of_products
Out[2]:
In [3]:
# Platform / product pair to analyse; must match a row of the product
# listing above.
platform = "LANDSAT_7"
product = "ls7_usgs_sr_scene"
In [4]:
# Get product extents
# Query the full spatial and temporal extents of the chosen product
# (empty `measurements` list: metadata only, no pixel data is loaded).
prod_extents = api.get_query_metadata(platform=platform, product=product, measurements=[])
full_lat = prod_extents['lat_extents']
print("Lat bounds:", full_lat)
full_lon = prod_extents['lon_extents']
print("Lon bounds:", full_lon)
# Format the (start, end) datetimes as YYYY-MM-DD strings for printing.
time_extents = list(map(lambda time: time.strftime('%Y-%m-%d'), prod_extents['time_extents']))
print("Time bounds:", time_extents)
Visualize the full area
In [5]:
from utils.data_cube_utilities.dc_display_map import display_map
# Render an interactive map of the product's full footprint.
display_map(full_lat, full_lon)
Out[5]:
In [6]:
# ######### Colombia - Cartagena ##################
#lon = (-74.863, -74.823)
#lat = (1.326, 1.357)
######### Vietnam - Buan Tua Srah Lake ##################
#lon = (108.02, 108.15)
#lat = (12.18 , 12.30)
######## Kenya - Lake Nakuru ##################
lon = (36.02, 36.13)
lat = (-0.42, -0.28)
# Analysis window (one calendar year).
# NOTE(review): this rebinds `time_extents`, shadowing the product-wide
# date strings computed above -- intentional, but easy to miss.
time_extents = ('2015-01-01', '2015-12-31')
Visualize the full area
In [7]:
display_map(lat, lon)
Out[7]:
In [8]:
# Load the surface-reflectance bands plus the pixel_qa QA band for the
# selected region and time window into an in-memory dataset.
dataset = dc.load(latitude = lat,
longitude = lon,
platform = platform,
time = time_extents,
product = product,
measurements = ['red', 'green', 'blue', 'nir', 'swir1', 'swir2', 'pixel_qa'])
Single band visualization
For a quick inspection, let's look at two images. The first image will allow the selection of any band (red, blue, green, nir, swir1, swir2). The second image will mask clouds with bright red on a NIR image.
Select the desired acquisition in the block below. You can select a value from 1 up to the number of time slices noted in the block above. Change the comment statements below to select the bands for the first image.
In [9]:
acquisition_number = 20
# select an acquisition number from 1 to "time" using the array limits above
# NOTE(review): this value is passed to .isel(time=...), which is 0-based,
# so valid values are 0 .. len(dataset.time) - 1, not 1 .. "time".
In [10]:
%matplotlib inline
# Grey-scale plot of the NIR band for the chosen time slice.
dataset.nir.isel(time = acquisition_number).plot(cmap = "Greys")
Out[10]:
In [11]:
from utils.data_cube_utilities.clean_mask import \
landsat_qa_clean_mask, landsat_clean_mask_invalid
# Combine three masks: the QA-based cloud mask, a no-data (-9999) screen
# across all bands, and the library's invalid-value mask. `persist()`
# keeps the (presumably dask-backed) result in memory for reuse below.
clean_mask = (landsat_qa_clean_mask(dataset, platform) &
((dataset != -9999).to_array().all('variable')) &
landsat_clean_mask_invalid(dataset)).persist()
# Pixels failing the combined mask become NaN.
cleaned_dataset = dataset.where(clean_mask)
In [12]:
from matplotlib.cm import Greys as red_on_grey
# Mark masked (NaN) pixels in semi-transparent red on the grey colormap.
# NOTE(review): set_bad mutates the shared matplotlib `Greys` colormap
# object, which can affect other plots using it -- confirm acceptable.
red_on_grey.set_bad('red',.5)
cleaned_dataset.isel(time = acquisition_number).nir.plot(cmap = red_on_grey)
Out[12]:
In [13]:
from utils.data_cube_utilities.dc_rgb import rgb
rgb_da = dataset[['red', 'green', 'blue']].to_array()
# Substitute black pixels for ones with no_data (-9999) values.
rgb_da = rgb_da.where(rgb_da!=-9999, 0)
# Use values fairly close but not equal to the minimum and maximum
# values as the values of minimum and maximum intensity in the image.
# The goal is to provide fairly consistent image brightness.
vmin = np.quantile(rgb_da.values, 0.10)
vmax = np.quantile(rgb_da.values, 0.90)
In [14]:
# True-colour RGB of the chosen time slice, stretched to the 10th/90th
# percentile intensities computed above.
rgb(dataset.isel(time=acquisition_number),
min_possible=vmin, max_possible=vmax)
Out[14]:
In [15]:
red = [255,0,0]
In [16]:
# Same RGB image, with masked (non-clean) pixels painted red.
rgb(dataset.isel(time=acquisition_number),
# index value is the time slice number
paint_on_mask = [(~clean_mask[acquisition_number], red)],
min_possible=vmin, max_possible=vmax)
Out[16]:
Most Recent Pixel Mosaic
Masks clouds from imagery and uses the most recent cloud-free pixels.
In [17]:
from utils.data_cube_utilities.dc_mosaic import create_mosaic
from utils.data_cube_utilities.dc_utilities import ignore_warnings
# Most-recent-pixel composite: each output pixel takes its latest
# cloud-free observation. ignore_warnings suppresses library warnings.
most_recent_composite = ignore_warnings(create_mosaic, cleaned_dataset, clean_mask)
In [18]:
most_recent_composite.nir.plot(cmap = "Greys")
Out[18]:
In [19]:
rgb(most_recent_composite, min_possible=vmin, max_possible=vmax)
Out[19]:
Median Mosaic
Masks clouds from imagery using the median valued cloud-free pixels in the time series
In [20]:
from utils.data_cube_utilities.dc_mosaic import create_median_mosaic
# Median composite: each pixel is the per-band median of its cloud-free
# observations over the time series.
median_composite = ignore_warnings(create_median_mosaic, cleaned_dataset, clean_mask)
In [21]:
median_composite.nir.plot(cmap = "Greys")
Out[21]:
In [22]:
rgb(median_composite, min_possible=vmin, max_possible=vmax)
Out[22]:
In [23]:
from utils.data_cube_utilities.dc_water_classifier import wofs_classify
# WOfS water classification of the median composite; mosaic=True
# presumably tells the classifier the input has no time dimension --
# confirm against wofs_classify's signature.
water_classification = wofs_classify(median_composite, mosaic = True)
In [24]:
water_classification.wofs.plot()
Out[24]:
Fractional Cover (FC) is used for landcover type estimation of vegetation, non-green vegetation, and bare soil of each pixel. We use a model from CSIRO (Juan Gerschmann) and apply it to a median mosaic. See the CSIRO Fractional Cover documentation for details.
The PV component is the estimated probability of a pixel being photosynthetic vegetation. The BS component is the estimated probability of a pixel being bare soil. The NPV component is the estimated probability of a pixel being non-photosynthetic vegetation.
In [25]:
from utils.data_cube_utilities.dc_fractional_coverage_classifier import frac_coverage_classify
# Fractional cover on the median composite: per-pixel estimates for
# photosynthetic vegetation (pv), non-photosynthetic vegetation (npv)
# and bare soil (bs).
frac_classes = frac_coverage_classify(median_composite)
Show the Probabilities of Pixels Being Photosynthetic Vegetation
In [26]:
frac_classes.pv.plot()
Out[26]:
Show the Probabilities of Pixels Being Bare Soil
In [27]:
frac_classes.bs.plot(cmap = "Greys")
Out[27]:
Plot a False Color RGB result where RGB = bs/pv/npv.
In [28]:
# False-colour composite: R = bare soil, G = photosynthetic vegetation,
# B = non-photosynthetic vegetation, each stretched over [0, 100].
rgb(frac_classes, bands = ['bs', 'pv', 'npv'],
min_possible=0, max_possible=100)
Out[28]:
$$ NDVI = \frac{NIR - RED}{NIR + RED} $$
NDVI (Normalized Difference Vegetation Index)
A derived index that correlates well with the existence of vegetation.
In [29]:
def NDVI(dataset):
    """Normalized Difference Vegetation Index: (NIR - red) / (NIR + red).

    `dataset` must expose `nir` and `red` band attributes (e.g. an
    xarray Dataset); the result has the same shape as the inputs.
    """
    nir_band = dataset.nir
    red_band = dataset.red
    return (nir_band - red_band) / (nir_band + red_band)
$$ NDWI = \frac{GREEN - NIR}{GREEN + NIR} $$
NDWI (Normalized Difference Water Index)
A derived index that correlates well with the existence of water.
In [30]:
def NDWI(dataset):
    """Normalized Difference Water Index: (green - NIR) / (green + NIR).

    `dataset` must expose `green` and `nir` band attributes; high values
    correlate with open water.
    """
    green_band = dataset.green
    nir_band = dataset.nir
    return (green_band - nir_band) / (green_band + nir_band)
$$ NDBI = \frac{SWIR - NIR}{SWIR + NIR} $$
NDBI (Normalized Difference Built-up Index)
A derived index that correlates well with the existence of urbanization.
In [31]:
def NDBI(dataset):
    """Normalized Difference Built-up Index: (SWIR2 - NIR) / (SWIR2 + NIR).

    `dataset` must expose `swir2` and `nir` band attributes; high values
    correlate with built-up / urban surfaces.
    """
    swir_band = dataset.swir2
    nir_band = dataset.nir
    return (swir_band - nir_band) / (swir_band + nir_band)
In [32]:
# Evaluate all three indices on the cloud-free median composite.
ndbi = NDBI(median_composite) # Urbanization - Reds
ndvi = NDVI(median_composite) # Dense Vegetation - Greens
ndwi = NDWI(median_composite) # High Concentrations of Water - Blues
In [33]:
(ndbi).plot(cmap = "Reds", vmin=-1, vmax=1)
Out[33]:
In [34]:
(ndvi).plot(cmap = "Greens", vmin=-1, vmax=1)
Out[34]:
In [35]:
(ndwi).plot(cmap = "Blues", vmin=-1, vmax=1)
Out[35]:
Merge into one large Dataset
If your data-arrays share the same set of coordinates, or if you feel that you'll be using these values together in the future, you should consider merging them into a dataset
In [36]:
# Promote each index DataArray to a single-variable Dataset so they can
# be merged on their shared coordinates into one Dataset.
ds_ndvi = ndvi.to_dataset(name = "NDVI")
ds_ndwi = ndwi.to_dataset(name= "NDWI")
ds_ndbi = ndbi.to_dataset(name = "NDBI")
urbanization_dataset = ds_ndvi.merge(ds_ndwi).merge(ds_ndbi)
In [37]:
# urbanization_dataset.NDBI.values *= 1 # Scale NDBI for visual purposes
# False-colour composite: R = NDBI (urban), G = NDVI (vegetation),
# B = NDWI (water), each stretched over [-1, 1].
rgb(urbanization_dataset, bands = ['NDBI', 'NDVI', 'NDWI'],
min_possible=-1, max_possible=1)
Out[37]:
In [38]:
from utils.data_cube_utilities.dc_water_quality import tsm
In [39]:
# Restrict TSM (total suspended matter) to pixels classified as water
# (wofs == 1) by the earlier classification.
mask_that_only_includes_water_pixels = water_classification.wofs == 1
tsm_dataset = tsm(median_composite, clean_mask = mask_that_only_includes_water_pixels )
In [40]:
tsm_dataset.tsm.plot(cmap = "jet")
Out[40]:
In [41]:
import numpy as np
def dataset_cloud_percentage_time_series(dataset):
    """Return, for each time slice, the percentage of pixels whose
    pixel_qa value is NOT one of the accepted codes (64+2, 64+4, 64+8).

    NOTE(review): these presumably correspond to Landsat pixel_qa
    "clear"-type codes -- confirm against the pixel_qa attribute table.
    The result is a 1-D series over time named
    "cloud_coverage_percentage".
    """
    qa = dataset.pixel_qa
    # True where the pixel carries one of the accepted QA values.
    accepted = (qa == 64 + 2) | (qa == 64 + 4) | (qa == 64 + 8)
    # Flip it: 1 marks an obscured pixel, 0 an accepted one.
    obscured = np.invert(accepted).astype(np.int16)
    # Spatial mean per time slice = fraction of obscured pixels (0..1).
    fraction_per_slice = obscured.mean(dim = ['latitude', 'longitude'])
    return (fraction_per_slice * 100).rename("cloud_coverage_percentage")
In [42]:
time_series = dataset_cloud_percentage_time_series(dataset)
In [43]:
time_series.plot.hist()
Out[43]:
In [44]:
import matplotlib.pyplot as plt
# Scatter cloud-coverage percentage against acquisition date.
times = time_series.time.values
values = time_series.values
plt.title('Cloud Coverage')
plt.scatter(times, values)
Out[44]:
In [45]:
ts_water_classification = wofs_classify(dataset, clean_mask)
In [46]:
# Apply nan to no_data values
ts_water_classification = ts_water_classification.where(ts_water_classification != -9999)
##Time series aggregation that ignores nan values.
# Mean of the 0/1 wofs flag over time, times 100 = percentage of valid
# observations in which each pixel was classified as water.
water_classification_percentages = (ts_water_classification.mean(dim = ['time']) * 100).wofs.rename('water_classification_percentages')
In [47]:
## import color-scheme and set nans to black
from matplotlib.cm import jet_r as jet_r
# NOTE(review): set_bad mutates the shared matplotlib `jet_r` colormap
# object -- confirm no other plot depends on its default bad-color.
jet_r.set_bad('black',1)
## apply nan to percentage values that aren't greater than 0, then plot
water_classification_percentages\
.where(water_classification_percentages > 0)\
.plot(cmap = jet_r)
Out[47]: