In [1]:
# Enable importing of utilities.
from sys import path
path.append("..")

ARDC Training: Python Notebooks

Task-F: Data Export

The code in this notebook subsets a data cube, selects a specific set of variables, creates a new XARRAY, and then exports that data to a GeoTIFF file and a CSV file. These output files can be used in other software programs (e.g. QGIS, ArcGIS, Excel) for more specific analyses. We will keep the region small so that we can control file sizes.

Import the Datacube Configuration


In [2]:
import datacube
import utils.data_cube_utilities.data_access_api as dc_api  
api = dc_api.DataAccessApi()
dc = datacube.Datacube(app = 'ardc_task_f')
api.dc = dc

Browse the available Data Cubes on the storage platform

You might want to learn more about what data is stored and how it is stored.


In [3]:
list_of_products = dc.list_products()
netCDF_products = list_of_products[list_of_products['format'] == 'NetCDF']
netCDF_products


Out[3]:
id name description lon creation_time label platform lat time instrument product_type format crs resolution tile_size spatial_dimensions
13 ls7_ledaps_ghana Landsat 7 USGS Collection 1 Higher Level SR sc... None None None LANDSAT_7 None None ETM LEDAPS NetCDF EPSG:4326 (-0.000269494585236, 0.000269494585236) (0.943231048326, 0.943231048326) (latitude, longitude)
17 ls7_ledaps_kenya Landsat 7 USGS Collection 1 Higher Level SR sc... None None None LANDSAT_7 None None ETM LEDAPS NetCDF EPSG:4326 (-0.000269493, 0.000269493) (0.99981903, 0.99981903) (latitude, longitude)
18 ls7_ledaps_senegal Landsat 7 USGS Collection 1 Higher Level SR sc... None None None LANDSAT_7 None None ETM LEDAPS NetCDF EPSG:4326 (-0.000271152, 0.00027769) (0.813456, 0.83307) (latitude, longitude)
16 ls7_ledaps_sierra_leone Landsat 7 USGS Collection 1 Higher Level SR sc... None None None LANDSAT_7 None None ETM LEDAPS NetCDF EPSG:4326 (-0.000269494585236, 0.000269494585236) (0.943231048326, 0.943231048326) (latitude, longitude)
19 ls7_ledaps_tanzania Landsat 7 USGS Collection 1 Higher Level SR sc... None None None LANDSAT_7 None None ETM LEDAPS NetCDF EPSG:4326 (-0.000271277688070265, 0.000271139577954979) (0.999929558226998, 0.999962763497961) (latitude, longitude)
31 ls7_ledaps_vietnam Landsat 7 USGS Collection 1 Higher Level SR sc... None None None LANDSAT_7 None None ETM LEDAPS NetCDF EPSG:4326 (-0.000269494585236, 0.000269494585236) (0.943231048326, 0.943231048326) (latitude, longitude)
9 ls8_lasrc_ghana Landsat 8 USGS Collection 1 Higher Level SR sc... None None None LANDSAT_8 None None OLI_TIRS LaSRC NetCDF EPSG:4326 (-0.000269494585236, 0.000269494585236) (0.943231048326, 0.943231048326) (latitude, longitude)
10 ls8_lasrc_kenya Landsat 8 USGS Collection 1 Higher Level SR sc... None None None LANDSAT_8 None None OLI_TIRS LaSRC NetCDF EPSG:4326 (-0.000271309115317046, 0.00026957992707863) (0.999502780827996, 0.999602369607559) (latitude, longitude)
11 ls8_lasrc_senegal Landsat 8 USGS Collection 1 Higher Level SR sc... None None None LANDSAT_8 None None OLI_TIRS LaSRC NetCDF EPSG:4326 (-0.000271152, 0.00027769) (0.813456, 0.83307) (latitude, longitude)
8 ls8_lasrc_sierra_leone Landsat 8 USGS Collection 1 Higher Level SR sc... None None None LANDSAT_8 None None OLI_TIRS LaSRC NetCDF EPSG:4326 (-0.000269494585236, 0.000269494585236) (0.943231048326, 0.943231048326) (latitude, longitude)
15 ls8_lasrc_tanzania Landsat 8 USGS Collection 1 Higher Level SR sc... None None None LANDSAT_8 None None OLI_TIRS LaSRC NetCDF EPSG:4326 (-0.000271277688070265, 0.000271139577954979) (0.999929558226998, 0.999962763497961) (latitude, longitude)
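
If you also want to see which measurements (bands) each product provides, the Datacube API offers list_measurements(). A minimal optional sketch, using one of the product names from the table above:

# Optional: list the bands, dtypes, and nodata values for a single product.
measurement_info = dc.list_measurements()
measurement_info.loc['ls7_ledaps_vietnam']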

Pick a product

Use the platform and product names from the previous block to select a Data Cube.


In [5]:
# Change the data platform and data cube here
# This data export notebook will only work for Landsat-7 datasets

platform = "LANDSAT_7"

# product = "ls7_ledaps_ghana"
# product = "ls7_ledaps_kenya"
# product = "ls7_ledaps_senegal"
# product = "ls7_ledaps_sierra_leone"
# product = "ls7_ledaps_tanzania"
product = "ls7_ledaps_vietnam"

In [6]:
from utils.data_cube_utilities.dc_time import _n64_to_datetime, dt_to_str

extents = api.get_full_dataset_extent(platform = platform, product = product, measurements=[])

latitude_extents = (min(extents['latitude'].values),max(extents['latitude'].values))
longitude_extents = (min(extents['longitude'].values),max(extents['longitude'].values))
time_extents = (min(extents['time'].values),max(extents['time'].values))

print("Latitude Extents:", latitude_extents)
print("Longitude Extents:", longitude_extents)
print("Time Extents:", list(map(dt_to_str, map(_n64_to_datetime, time_extents))))


Latitude Extents: (9.176425374578418, 13.964805165051667)
Longitude Extents: (102.40430421277932, 108.93092407802477)
Time Extents: ['1999-09-08', '2016-12-29']

Visualize Data Cube Region


In [7]:
## The code below renders a map that can be used to orient yourself with the region.
from utils.data_cube_utilities.dc_display_map import display_map
display_map(latitude = latitude_extents, longitude = longitude_extents)


Out[7]:

Pick a smaller analysis region and display that region

Try to keep your region smaller than about 0.2 degrees x 0.2 degrees for rapid processing. You can click on the map above to find the Lat-Lon coordinates of any location. Pick a time window of about 1 year to keep the output files small.


In [10]:
## Vietnam - Central Lam Dong Province ##
longitude_extents = (105.2, 105.5)
latitude_extents  = (11.25, 11.55)

time_extents = ('2010-01-01', '2010-12-31')
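
As a quick optional check on the size guidance above, you can print the span of the selected region before loading anything (a minimal sketch using the variables defined in the cell above):

# Optional: report the size of the selected region in degrees.
lat_span = abs(latitude_extents[1] - latitude_extents[0])
lon_span = abs(longitude_extents[1] - longitude_extents[0])
print("Selected region: {:.2f} deg latitude x {:.2f} deg longitude".format(lat_span, lon_span))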

In [11]:
from utils.data_cube_utilities.dc_display_map import display_map
display_map(latitude = latitude_extents, longitude = longitude_extents)


Out[11]:

In [12]:
landsat_dataset = dc.load(latitude = latitude_extents,
                          longitude = longitude_extents,
                          platform = platform,
                          time = time_extents,
                          product = product,
                          measurements = ['red', 'green', 'blue', 'nir', 'swir1', 'swir2', 'pixel_qa'])

In [13]:
landsat_dataset # this is a printout of the first few values of each parameter in the XARRAY


Out[13]:
<xarray.Dataset>
Dimensions:    (latitude: 1115, longitude: 1114, time: 10)
Coordinates:
  * time       (time) datetime64[ns] 2010-01-09T03:11:26 ... 2010-12-27T03:13:08
  * latitude   (latitude) float64 11.55 11.55 11.55 11.55 ... 11.25 11.25 11.25
  * longitude  (longitude) float64 105.2 105.2 105.2 105.2 ... 105.5 105.5 105.5
Data variables:
    red        (time, latitude, longitude) int16 757 881 922 ... 908 1000 1074
    green      (time, latitude, longitude) int16 806 828 874 851 ... 699 759 820
    blue       (time, latitude, longitude) int16 601 600 621 642 ... 552 590 629
    nir        (time, latitude, longitude) int16 2038 1852 1620 ... 2012 2012
    swir1      (time, latitude, longitude) int16 1136 1001 565 ... 2454 2685
    swir2      (time, latitude, longitude) int16 529 500 327 ... 1128 1450 1611
    pixel_qa   (time, latitude, longitude) int32 66 66 66 66 66 ... 66 66 66 66
Attributes:
    crs:      EPSG:4326

Create Several Common Application Products

Unpack pixel_qa

This is the Landsat-7 pixel quality data that is used to screen for clouds, shadows, snow, etc. These values can be quite valuable when doing an analysis in a GIS tool, but you will not need all of them.
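
Before unpacking, it can be helpful to see which pixel_qa values actually occur in the loaded subset, so you can relate them to the encoding table in the function below. A minimal optional sketch:

# Optional: distinct pixel_qa values present in this subset.
import numpy as np
np.unique(landsat_dataset.pixel_qa.values)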


In [14]:
import xarray as xr  
import numpy as np

def ls7_unpack_qa(data_array, cover_type):
    """Return a mask (1 = match, 0 = no match) for the requested cover type, decoded from Landsat-7 Collection 1 pixel_qa values."""
    land_cover_encoding = dict( fill         =[1],
                                clear        =[322, 386, 834, 898, 1346],
                                water        =[324, 388, 836, 900, 1348],
                                shadow       =[328, 392, 840, 904, 1350],
                                snow         =[336, 368, 400, 432, 848, 880, 912, 944, 1352],
                                cloud        =[352, 368, 416, 432, 848, 880, 912, 944, 1352],
                                low_conf_cl  =[322, 324, 328, 336, 352, 368, 834, 836, 840, 848, 864, 880],
                                med_conf_cl  =[386, 388, 392, 400, 416, 432, 898, 900, 904, 928, 944],
                                high_conf_cl =[480, 992],
                                low_conf_cir =[322, 324, 328, 336, 352, 368, 386, 388, 392, 400, 416, 432, 480],
                                high_conf_cir=[834, 836, 840, 848, 864, 880, 898, 900, 904, 912, 928, 944],
                                terrain_occ  =[1346, 1348, 1350, 1352]
                              )
    # Flag every pixel whose pixel_qa value matches one of the codes for this cover type.
    boolean_mask = np.isin(data_array, land_cover_encoding[cover_type])
    return xr.DataArray(boolean_mask.astype(np.int8),
                        coords = data_array.coords,
                        dims = data_array.dims,
                        name = cover_type + "_mask",
                        attrs = data_array.attrs)

In [15]:
clear_xarray  = ls7_unpack_qa(landsat_dataset.pixel_qa, "clear")  
water_xarray  = ls7_unpack_qa(landsat_dataset.pixel_qa, "water")
shadow_xarray = ls7_unpack_qa(landsat_dataset.pixel_qa, "shadow") 
cloud_xarray = ls7_unpack_qa(landsat_dataset.pixel_qa, "cloud")

In [16]:
clean_xarray = xr.ufuncs.logical_or(clear_xarray , water_xarray).astype(np.int8).rename("clean_mask")

clean_mask = np.logical_or(clear_xarray.values.astype(bool),
                           water_xarray.values.astype(bool))
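
As an optional follow-up, the clean mask can be summarized per acquisition to spot heavily clouded scenes before export (a minimal sketch using the arrays defined above):

# Optional: fraction of clean (clear or water) pixels in each time slice.
clean_fraction = clean_xarray.mean(dim=['latitude', 'longitude'])
print(clean_fraction.values)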

Spectral Indices

Below are a number of common spectral indices.


In [17]:
def NDVI(dataset):
    return ((dataset.nir - dataset.red)/(dataset.nir + dataset.red)).rename("NDVI")

In [18]:
def NDWI(dataset):
    return ((dataset.green - dataset.nir)/(dataset.green + dataset.nir)).rename("NDWI")

In [19]:
def NDBI(dataset):
    return ((dataset.swir2 - dataset.nir)/(dataset.swir2 + dataset.nir)).rename("NDBI")

In [20]:
def EVI(dataset, c1 = None, c2 = None, L = None):
    return ((dataset.nir - dataset.red)/(dataset.nir + (c1 * dataset.red) - (c2 * dataset.blue) + L)).rename("EVI")

In [21]:
ndbi_xarray = NDBI(landsat_dataset)  # Urbanization - Reds
ndvi_xarray = NDVI(landsat_dataset)  # Dense Vegetation - Greens
ndwi_xarray = NDWI(landsat_dataset)  # High Concentrations of Water - Blues  
evi_xarray = EVI(landsat_dataset, c1 = 6, c2 = 7.5, L = 1 ) # Enhanced Vegetation Index

TSM

This is the Total Suspended Matter (TSM) index, which estimates water quality using a simple equation based on one of the Landsat bands. For the analysis below we will use the water pixels from the Landsat "pixel_qa" mask so that the code runs faster than it would using the WOFS water analysis.


In [22]:
from utils.data_cube_utilities.dc_water_quality import tsm

tsm_xarray = tsm(landsat_dataset, clean_mask = water_xarray.values.astype(bool) ).tsm

Combine Everything


In [23]:
combined_dataset = xr.merge([landsat_dataset,
          clean_xarray,
          clear_xarray,
          water_xarray,
          shadow_xarray,
          cloud_xarray,                  
          evi_xarray,
          ndbi_xarray,
          ndvi_xarray,
          ndwi_xarray,
          tsm_xarray])

# Copy original crs to merged dataset 
combined_dataset = combined_dataset.assign_attrs(landsat_dataset.attrs)

combined_dataset  # this is a printout of the first few values of each parameter in the XARRAY


Out[23]:
<xarray.Dataset>
Dimensions:      (latitude: 1115, longitude: 1114, time: 10)
Coordinates:
  * time         (time) datetime64[ns] 2010-01-09T03:11:26 ... 2010-12-27T03:13:08
  * latitude     (latitude) float64 11.55 11.55 11.55 ... 11.25 11.25 11.25
  * longitude    (longitude) float64 105.2 105.2 105.2 ... 105.5 105.5 105.5
Data variables:
    red          (time, latitude, longitude) int16 757 881 922 ... 908 1000 1074
    green        (time, latitude, longitude) int16 806 828 874 ... 699 759 820
    blue         (time, latitude, longitude) int16 601 600 621 ... 552 590 629
    nir          (time, latitude, longitude) int16 2038 1852 1620 ... 2012 2012
    swir1        (time, latitude, longitude) int16 1136 1001 565 ... 2454 2685
    swir2        (time, latitude, longitude) int16 529 500 327 ... 1450 1611
    pixel_qa     (time, latitude, longitude) int32 66 66 66 66 ... 66 66 66 66
    clean_mask   (time, latitude, longitude) int8 0 0 0 0 0 0 0 ... 0 0 0 0 0 0
    clear_mask   (time, latitude, longitude) int8 0 0 0 0 0 0 0 ... 0 0 0 0 0 0
    water_mask   (time, latitude, longitude) int8 0 0 0 0 0 0 0 ... 0 0 0 0 0 0
    shadow_mask  (time, latitude, longitude) int8 0 0 0 0 0 0 0 ... 0 0 0 0 0 0
    cloud_mask   (time, latitude, longitude) int8 0 0 0 0 0 0 0 ... 0 0 0 0 0 0
    EVI          (time, latitude, longitude) float64 0.6178 0.3679 ... 0.2508
    NDBI         (time, latitude, longitude) float64 -0.5878 -0.5748 ... -0.1107
    NDVI         (time, latitude, longitude) float64 0.4583 0.3553 ... 0.304
    NDWI         (time, latitude, longitude) float64 -0.4332 -0.3821 ... -0.4209
    tsm          (time, latitude, longitude) float64 0.0 0.0 0.0 ... 0.0 0.0 0.0
Attributes:
    crs:      EPSG:4326

Export CSV

This section will be used to create a CSV export file for a given pixel. You will identify the pixel by selecting a specific Lat-Lon position and then the code will find the closest pixel to that point (nearest neighbor). Use the map at the top of this notebook to view your region and pick a Lat-Lon location. You can find an exact location by clicking on the map. The CSV file will contain the time series data for each XARRAY parameter.


In [24]:
# Lat and Lon coordinates extracted from the map above 
pixel_lat = 11.3972
pixel_lon = 105.3528

In [25]:
pixel = combined_dataset.sel(latitude  = pixel_lat,
                             longitude = pixel_lon,
                              method = 'nearest') # nearest neighbor selection
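
Before writing the CSV, you can optionally plot one of the time series for this pixel to confirm the selection looks reasonable (a minimal sketch, assuming matplotlib is available in the environment):

# Optional: quick look at the NDVI time series for the selected pixel.
import matplotlib.pyplot as plt
pixel.NDVI.plot(marker='o')
plt.show()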

In [26]:
import xarray as xr
import csv

def ts_pixel_to_csv(pixel: xr.Dataset,
                    csv_file_name: str):
    def __yield_from_time_axis(px):
        # Yield the pixel one time slice at a time.
        for i in range(len(px.time)):
            yield px.isel(time = i)
    def __format_time(t):
        # Placeholder for custom time formatting; currently returns the value unchanged.
        return t

    with open(csv_file_name, 'w') as out:
        csv_out = csv.writer(out)
        # Header row: time followed by every data variable in the pixel dataset.
        column_names = ['time'] + list(pixel.data_vars)
        csv_out.writerow(column_names)

        # One row per time step, holding the value of every variable at that time.
        for row in __yield_from_time_axis(pixel):
            csv_out.writerow([__format_time(row.time.values)] + [row[key].values for key in list(pixel.data_vars)])

In [27]:
csv_name = 'test.csv'

In [28]:
ts_pixel_to_csv(pixel, csv_name)
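
To confirm the export worked, you can optionally preview the file with pandas (a minimal sketch, assuming pandas is installed):

# Optional: preview the first few rows of the exported CSV.
import pandas as pd
pd.read_csv(csv_name).head()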

Export GeoTIFF

This section will be used to create a GeoTIFF export.


File formatting


In [29]:
import time
def time_to_string(t):
    return time.strftime("%Y_%m_%d_%H_%M_%S", time.gmtime(t.astype(int)/1000000000))
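
For example, the first acquisition time of the combined dataset can be formatted like this (an optional illustration):

# Example: format the first acquisition time for use in an output file name.
print(time_to_string(combined_dataset.time.values[0]))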

The function below can be used to write a single time slice of an xarray dataset to a GeoTIFF file.


In [30]:
from utils.data_cube_utilities import dc_utilities
def export_slice_to_geotiff(ds, path):
    # Write every data variable of this time slice to a single multi-band GeoTIFF.
    dc_utilities.write_geotiff_from_xr(path,
                                       ds.astype(np.float32),
                                       list(ds.data_vars.keys()),
                                       crs="EPSG:4326")

For each time slice in the dataset, we call export_slice_to_geotiff


In [31]:
def export_xarray_to_geotiff(ds, path):
    for t in ds.time:
        time_slice_xarray = ds.sel(time = t)
        export_slice_to_geotiff(time_slice_xarray,
                                path + "_" + time_to_string(t) + ".tif")

Start Export

This is where we will start the GeoTIFF export and review the final product. The lines after this text box have been commented out so that the export does not run automatically; uncomment and run them at the end, after you have finished creating the XARRAY above and reviewed the data.
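
One practical note: if the "geotiffs" directory used in the commented-out call below does not already exist, you may need to create it before exporting (a minimal sketch):

# Optional: make sure the output directory for the GeoTIFF files exists.
import os
os.makedirs("geotiffs", exist_ok=True)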


In [32]:
# export_xarray_to_geotiff(combined_dataset, "geotiffs/landsat7")

Check to see what files exist in the geotiffs directory


In [33]:
# !ls -lah geotiffs/*.tif