Export Notebook


Notebook Summary

The code in this notebook subsets a data cube, selects a specific set of variables, generates some additional data from those and then outputs that data into a GeoTIFF file. The goal is to be able to do external analyses of this data using other data analysis tools or GIS tools. The files would be reasonable in size, since we would restrict the region and parameters in the output.


Index


Import Dependencies and Connect to the Data Cube [▴](#top)


In [1]:
import xarray as xr  
import numpy as np
import datacube
from utils.data_cube_utilities.data_access_api import DataAccessApi

In [2]:
api = DataAccessApi()
dc = api.dc

Choose Platforms and Products [▴](#top)

List available products for each platform


In [3]:
list_of_products = dc.list_products()
list_of_products


Out[3]:
name description label lat instrument time format lon product_type platform creation_time crs resolution tile_size spatial_dimensions
id
2 ls7_usgs_sr_scene Landsat 7 USGS Collection 1 Level2 Surface Ref... None None ETM None GeoTiff None LEDAPS LANDSAT_7 None EPSG:4326 (-0.00027777777778, 0.00027777777778) None (latitude, longitude)
1 ls8_usgs_sr_scene Landsat 8 USGS Collection 1 Level2 Surface Ref... None None OLI_TIRS None GeoTiff None LaSRC LANDSAT_8 None EPSG:4326 (-0.00027777777778, 0.00027777777778) None (latitude, longitude)

Choose product


In [4]:
platform = "LANDSAT_7"
product = "ls7_usgs_sr_scene"

# platform = "LANDSAT_8"
# product = "ls8_usgs_sr_scene"

Get the Extents of the Cube [▴](#top)


In [5]:
from utils.data_cube_utilities.dc_load import get_product_extents
from utils.data_cube_utilities.dc_time import dt_to_str

full_lat, full_lon, min_max_dates = get_product_extents(api, platform, product)

# Print the extents of the combined data.
print("Latitude Extents:", full_lat)
print("Longitude Extents:", full_lon)
print("Time Extents:", list(map(dt_to_str, min_max_dates)))


Latitude Extents: (-12.57305555565614, 18.32305555570214)
Longitude Extents: (-25.66583333353866, 44.05861111146359)
Time Extents: ['1999-07-08', '2020-01-10']

In [6]:
## The code below renders a map that can be used to orient yourself with the region.
from utils.data_cube_utilities.dc_display_map import display_map
display_map(full_lat, full_lon)


Out[6]:

Define the Extents of the Analysis [▴](#top)


In [7]:
######### Ghana - Pambros Salt Ponds ################## 
lon = (-0.3013, -0.2671)
lat = (5.5155, 5.5617)

time_extents = ('2015-01-01', '2015-12-31')

In [8]:
from utils.data_cube_utilities.dc_display_map import display_map
display_map(lat, lon)


Out[8]:

Load Data from the Data Cube [▴](#top)


In [9]:
landsat_dataset = dc.load(latitude = lat,
                          longitude = lon,
                          platform = platform,
                          time = time_extents,
                          product = product,
                          measurements = ['red', 'green', 'blue', 'nir', 'swir1', 'swir2', 'pixel_qa'])

In [10]:
landsat_dataset


Out[10]:
<xarray.Dataset>
Dimensions:    (latitude: 168, longitude: 124, time: 11)
Coordinates:
  * time       (time) datetime64[ns] 2015-01-12T10:14:16.197501 ... 2015-12-30T10:17:04.234884
  * latitude   (latitude) float64 5.562 5.562 5.561 5.561 ... 5.516 5.516 5.515
  * longitude  (longitude) float64 -0.3013 -0.301 -0.3007 ... -0.2674 -0.2671
Data variables:
    red        (time, latitude, longitude) int16 1867 1809 1838 ... 1267 1296
    green      (time, latitude, longitude) int16 1612 1612 1612 ... 1374 1340
    blue       (time, latitude, longitude) int16 1504 1440 1440 ... 1285 1316
    nir        (time, latitude, longitude) int16 2672 2715 2757 ... 1198 1198
    swir1      (time, latitude, longitude) int16 2790 2790 2866 ... 860 860 823
    swir2      (time, latitude, longitude) int16 2236 2197 2119 ... 604 644 604
    pixel_qa   (time, latitude, longitude) uint16 66 66 66 66 ... 224 224 224
Attributes:
    crs:      EPSG:4326

Derive Products [▴](#top)

Masks


In [11]:
from utils.data_cube_utilities.clean_mask import landsat_qa_clean_mask

clear_xarray = landsat_qa_clean_mask(landsat_dataset, platform, cover_types=['clear'])
water_xarray = landsat_qa_clean_mask(landsat_dataset, platform, cover_types=['water'])
shadow_xarray = landsat_qa_clean_mask(landsat_dataset, platform, cover_types=['shadow'])

clean_xarray = xr.ufuncs.logical_or(clear_xarray , water_xarray).rename("clean_mask")

Water Classification


In [12]:
from utils.data_cube_utilities.dc_water_classifier import wofs_classify

water_classification = wofs_classify(landsat_dataset,
                                     clean_mask = clean_xarray.values, 
                                     mosaic = False)

In [13]:
wofs_xarray = water_classification.wofs

Normalized Indices


In [14]:
def NDVI(dataset):
    return ((dataset.nir - dataset.red)/(dataset.nir + dataset.red)).rename("NDVI")

In [15]:
def NDWI(dataset):
    return ((dataset.green - dataset.nir)/(dataset.green + dataset.nir)).rename("NDWI")

In [16]:
def NDBI(dataset):
        return ((dataset.swir2 - dataset.nir)/(dataset.swir2 + dataset.nir)).rename("NDBI")

In [17]:
ndbi_xarray = NDBI(landsat_dataset)  # Urbanization - Reds
ndvi_xarray = NDVI(landsat_dataset)  # Dense Vegetation - Greens
ndwi_xarray = NDWI(landsat_dataset)  # High Concentrations of Water - Blues

TSM


In [18]:
from utils.data_cube_utilities.dc_water_quality import tsm

tsm_xarray = tsm(landsat_dataset, clean_mask = wofs_xarray.values.astype(bool) ).tsm

EVI


In [19]:
def EVI(dataset, c1 = None, c2 = None, L = None):
        return ((dataset.nir - dataset.red)/((dataset.nir  + (c1 * dataset.red) - (c2 *dataset.blue) + L))).rename("EVI")

In [20]:
evi_xarray = EVI(landsat_dataset, c1 = 6, c2 = 7.5, L = 1 )

Combine Data [▴](#top)


In [21]:
combined_dataset = xr.merge([landsat_dataset,
          ## <span id="combine_data">Combine Data [&#9652;](#top)</span>  clean_xarray,
          clear_xarray,
          water_xarray,
          shadow_xarray,
          evi_xarray,
          ndbi_xarray,
          ndvi_xarray,
          ndwi_xarray,
          wofs_xarray,
          tsm_xarray])

# Copy original crs to merged dataset 
combined_dataset = combined_dataset.assign_attrs(landsat_dataset.attrs)

combined_dataset


Out[21]:
<xarray.Dataset>
Dimensions:      (latitude: 168, longitude: 124, time: 11)
Coordinates:
  * time         (time) datetime64[ns] 2015-01-12T10:14:16.197501 ... 2015-12-30T10:17:04.234884
  * latitude     (latitude) float64 5.562 5.562 5.561 ... 5.516 5.516 5.515
  * longitude    (longitude) float64 -0.3013 -0.301 -0.3007 ... -0.2674 -0.2671
Data variables:
    red          (time, latitude, longitude) int16 1867 1809 1838 ... 1267 1296
    green        (time, latitude, longitude) int16 1612 1612 1612 ... 1374 1340
    blue         (time, latitude, longitude) int16 1504 1440 1440 ... 1285 1316
    nir          (time, latitude, longitude) int16 2672 2715 2757 ... 1198 1198
    swir1        (time, latitude, longitude) int16 2790 2790 2866 ... 860 823
    swir2        (time, latitude, longitude) int16 2236 2197 2119 ... 644 604
    pixel_qa     (time, latitude, longitude) uint16 66 66 66 66 ... 224 224 224
    clear_mask   (time, latitude, longitude) bool True True True ... False False
    water_mask   (time, latitude, longitude) bool False False ... False False
    shadow_mask  (time, latitude, longitude) bool False False ... False False
    EVI          (time, latitude, longitude) float64 0.3102 0.3271 ... 0.1095
    NDBI         (time, latitude, longitude) float64 -0.08883 ... -0.3296
    NDVI         (time, latitude, longitude) float64 0.1774 0.2003 ... -0.03929
    NDWI         (time, latitude, longitude) float64 -0.2474 -0.2549 ... 0.05595
    wofs         (time, latitude, longitude) float64 0.0 0.0 ... -9.999e+03
    tsm          (time, latitude, longitude) float64 0.0 0.0 0.0 ... 148.5 148.1
Attributes:
    crs:      EPSG:4326

Export Data [▴](#top)

Export to GeoTIFF [▴](#top)

Export each acquisition as a GeoTIFF.


In [22]:
from utils.data_cube_utilities.import_export import export_xarray_to_multiple_geotiffs

# Ensure the output directory exists before writing to it.
if platform == 'LANDSAT_7':
    !mkdir -p output/geotiffs/landsat7
else:
    !mkdir -p output/geotiffs/landsat8

output_path = "output/geotiffs/landsat{0}/landsat{0}".format(7 if platform=='LANDSAT_7' else 8)

export_xarray_to_multiple_geotiffs(combined_dataset, output_path)

Check to see what files were exported. The size of these files is also shown.


In [23]:
if platform == 'LANDSAT_7':
    !ls -lah output/geotiffs/landsat7/*.tif
else:
    !ls -lah output/geotiffs/landsat8/*.tif


-rw-r--r-- 1 jovyan users 2.1M May 21 17:01 output/geotiffs/landsat7/geo_medians_ls7.tif
-rw-r--r-- 1 jovyan users 2.1M May 21 17:01 output/geotiffs/landsat7/geo_medoids_ls7.tif
-rw-r--r-- 1 jovyan users 1.3M May 21 17:01 output/geotiffs/landsat7/landsat7_2015_01_12_10_14_16.tif
-rw-r--r-- 1 jovyan users 1.3M May 21 17:01 output/geotiffs/landsat7/landsat7_2015_01_28_10_14_20.tif
-rw-r--r-- 1 jovyan users 1.3M May 21 17:01 output/geotiffs/landsat7/landsat7_2015_03_01_10_14_33.tif
-rw-r--r-- 1 jovyan users 1.3M May 21 17:01 output/geotiffs/landsat7/landsat7_2015_03_17_10_14_41.tif
-rw-r--r-- 1 jovyan users 1.3M May 21 17:01 output/geotiffs/landsat7/landsat7_2015_04_18_10_14_55.tif
-rw-r--r-- 1 jovyan users 1.3M May 21 17:01 output/geotiffs/landsat7/landsat7_2015_07_23_10_15_27.tif
-rw-r--r-- 1 jovyan users 1.3M May 21 17:01 output/geotiffs/landsat7/landsat7_2015_09_09_10_15_35.tif
-rw-r--r-- 1 jovyan users 1.3M May 21 17:01 output/geotiffs/landsat7/landsat7_2015_11_12_10_16_21.tif
-rw-r--r-- 1 jovyan users 1.3M May 21 17:01 output/geotiffs/landsat7/landsat7_2015_11_28_10_16_35.tif
-rw-r--r-- 1 jovyan users 1.3M May 21 17:01 output/geotiffs/landsat7/landsat7_2015_12_14_10_16_51.tif
-rw-r--r-- 1 jovyan users 1.3M May 21 17:01 output/geotiffs/landsat7/landsat7_2015_12_30_10_17_04.tif

Sanity check using gdalinfo to make sure that all of our bands exist .


In [24]:
if platform == 'LANDSAT_7':
    !gdalinfo output/geotiffs/landsat7/landsat7_2015_01_09_03_06_13.tif
else:
    !gdalinfo output/geotiffs/landsat8/landsat8_2015_01_01_03_07_41.tif


ERROR 4: output/geotiffs/landsat7/landsat7_2015_01_09_03_06_13.tif: No such file or directory
gdalinfo failed - unable to open 'output/geotiffs/landsat7/landsat7_2015_01_09_03_06_13.tif'.

Zip all GeoTIFFs.


In [25]:
if platform == 'LANDSAT_7':
    !tar -cvzf output/geotiffs/landsat7/landsat_7.tar.gz output/geotiffs/landsat7/*.tif
else:
    !tar -cvzf output/geotiffs/landsat8/landsat_8.tar.gz output/geotiffs/landsat8/*.tif


output/geotiffs/landsat7/geo_medians_ls7.tif
output/geotiffs/landsat7/geo_medoids_ls7.tif
output/geotiffs/landsat7/landsat7_2015_01_12_10_14_16.tif
output/geotiffs/landsat7/landsat7_2015_01_28_10_14_20.tif
output/geotiffs/landsat7/landsat7_2015_03_01_10_14_33.tif
output/geotiffs/landsat7/landsat7_2015_03_17_10_14_41.tif
output/geotiffs/landsat7/landsat7_2015_04_18_10_14_55.tif
output/geotiffs/landsat7/landsat7_2015_07_23_10_15_27.tif
output/geotiffs/landsat7/landsat7_2015_09_09_10_15_35.tif
output/geotiffs/landsat7/landsat7_2015_11_12_10_16_21.tif
output/geotiffs/landsat7/landsat7_2015_11_28_10_16_35.tif
output/geotiffs/landsat7/landsat7_2015_12_14_10_16_51.tif
output/geotiffs/landsat7/landsat7_2015_12_30_10_17_04.tif

Export to NetCDF [▴](#top)

Export all acquisitions together as a single NetCDF.


In [26]:
combined_dataset


Out[26]:
<xarray.Dataset>
Dimensions:      (latitude: 168, longitude: 124, time: 11)
Coordinates:
  * time         (time) datetime64[ns] 2015-01-12T10:14:16.197501 ... 2015-12-30T10:17:04.234884
  * latitude     (latitude) float64 5.562 5.562 5.561 ... 5.516 5.516 5.515
  * longitude    (longitude) float64 -0.3013 -0.301 -0.3007 ... -0.2674 -0.2671
Data variables:
    red          (time, latitude, longitude) int16 1867 1809 1838 ... 1267 1296
    green        (time, latitude, longitude) int16 1612 1612 1612 ... 1374 1340
    blue         (time, latitude, longitude) int16 1504 1440 1440 ... 1285 1316
    nir          (time, latitude, longitude) int16 2672 2715 2757 ... 1198 1198
    swir1        (time, latitude, longitude) int16 2790 2790 2866 ... 860 823
    swir2        (time, latitude, longitude) int16 2236 2197 2119 ... 644 604
    pixel_qa     (time, latitude, longitude) uint16 66 66 66 66 ... 224 224 224
    clear_mask   (time, latitude, longitude) bool True True True ... False False
    water_mask   (time, latitude, longitude) bool False False ... False False
    shadow_mask  (time, latitude, longitude) bool False False ... False False
    EVI          (time, latitude, longitude) float64 0.3102 0.3271 ... 0.1095
    NDBI         (time, latitude, longitude) float64 -0.08883 ... -0.3296
    NDVI         (time, latitude, longitude) float64 0.1774 0.2003 ... -0.03929
    NDWI         (time, latitude, longitude) float64 -0.2474 -0.2549 ... 0.05595
    wofs         (time, latitude, longitude) float64 0.0 0.0 ... -9.999e+03
    tsm          (time, latitude, longitude) float64 0.0 0.0 0.0 ... 148.5 148.1
Attributes:
    crs:      EPSG:4326

In [27]:
import os
import pathlib
from utils.data_cube_utilities.import_export import export_xarray_to_netcdf

# Ensure the output directory exists before writing to it.
ls_num = 7 if platform=='LANDSAT_7' else 8
output_dir = f"output/netcdfs/landsat{ls_num}"
pathlib.Path(output_dir).mkdir(parents=True, exist_ok=True)

output_file_path = output_dir + f"/ls{ls_num}_netcdf_example.nc"
export_xarray_to_netcdf(combined_dataset.red, output_file_path)