Vegetation Phenology


Notebook Summary

  • LANDSAT 7, LANDSAT 8, or both are used to detect changes in plant life over time.
  • Either NDVI or EVI may be used as an indicator of vegetation.
  • The outputs of this notebook can be used to assess differences in agricultural fields over time or space, and also to assess growing stages such as planting and harvesting.

Algorithmic process


How It Works

To detect changes in plant life, we use a measure called NDVI.

  • NDVI is the difference between near-infrared (NIR) and red (RED) reflectance, divided by their sum.
$$ NDVI = \frac{(NIR - RED)}{(NIR + RED)}$$


The intention is to observe how much red light is being absorbed versus reflected. Healthy photosynthetic plants absorb most of the red light that reaches them while strongly reflecting near-infrared light. When they are unhealthy, more red light is reflected, making the difference between NIR and RED much smaller and lowering the NDVI value. So a higher NDVI indicates a higher degree of healthy vegetation.
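As a minimal sketch of the formula above, it can be applied directly to band arrays with NumPy. The reflectance values below are made up for illustration; the notebook itself uses the `NDVI` helper imported later from `utils.data_cube_utilities.dc_ndvi_anomaly`.

```python
import numpy as np

# Hypothetical surface reflectance values (0-1 scale) for three pixels:
# dense healthy vegetation, cropland, and bare soil.
nir = np.array([0.45, 0.40, 0.30])
red = np.array([0.05, 0.10, 0.25])

# NDVI = (NIR - RED) / (NIR + RED), bounded to [-1, 1].
ndvi = (nir - red) / (nir + red)
```

Dense vegetation yields values near 0.8 here, while the bare-soil pixel falls below 0.1, matching the interpretation above.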

However, we can also use a measure called EVI.

$$ EVI = G * \frac{(NIR - RED)}{(NIR + C1 * RED - C2 * BLUE + L)}$$


EVI is generally more accurate than NDVI because it is less sensitive to the solar incidence angle, atmospheric conditions (e.g. aerosols and clouds), shadows, and soil background brightness. Normally, $G=2.5$, $C1=6$, $C2=7.5$, and $L=1$.
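Plugging the standard coefficients into the formula for a single hypothetical pixel (the reflectance values are invented for illustration; the notebook itself uses the `EVI` helper imported later):

```python
# Standard EVI coefficients given above.
G, C1, C2, L = 2.5, 6.0, 7.5, 1.0

# Hypothetical surface reflectance values (0-1 scale).
nir, red, blue = 0.45, 0.08, 0.05

# EVI = G * (NIR - RED) / (NIR + C1*RED - C2*BLUE + L)
evi = G * (nir - red) / (nir + C1 * red - C2 * blue + L)
```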

Additionally, there is a 2-band version of EVI called EVI2.

$$ EVI2 = G * \frac{(NIR - RED)}{(NIR + C * RED + L)}$$


EVI2 does not require a blue band like EVI does, so it can be computed from less data. This also sidesteps the blue band itself, which often has a low signal-to-noise ratio in Earth observation imagery. When atmospheric effects are insignificant (e.g. on clear days), EVI2 should closely match EVI. Normally, $G=2.5$, $C=2.4$, and $L=1$.
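A quick sketch comparing the two indices on a single hypothetical clear-sky pixel (reflectance values invented for illustration) shows EVI2 tracking EVI closely when the blue-band correction term is small:

```python
# Hypothetical surface reflectance values (0-1 scale) for a clear-sky pixel.
nir, red, blue = 0.45, 0.08, 0.04

# 3-band EVI with G=2.5, C1=6, C2=7.5, L=1.
evi = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)

# 2-band EVI2 with G=2.5, C=2.4, L=1 -- no blue band needed.
evi2 = 2.5 * (nir - red) / (nir + 2.4 * red + 1.0)
```

For this pixel the two values differ by less than 0.01, consistent with the claim that EVI2 approximates EVI under negligible atmospheric effects.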

Import Dependencies and Connect to the Data Cube [▴](#top)


In [1]:
%matplotlib inline
import warnings
# Suppress warnings
warnings.filterwarnings('ignore')
import matplotlib.pyplot as plt

# Import the datacube and the API
import datacube
from utils.data_cube_utilities.data_access_api import DataAccessApi
import numpy as np
import pandas as pd
import xarray as xr
import datetime as dt

In [2]:
# Create an instance of the datacube and API.
api = DataAccessApi()
dc = api.dc

Choose Platform and Product [▴](#top)


In [3]:
# Get available products
products_info = dc.list_products()

In [4]:
# List LANDSAT 7 products
print("LANDSAT 7 Products:")
products_info[["platform", "name"]][products_info.platform == "LANDSAT_7"]


LANDSAT 7 Products:
Out[4]:
platform name
id
2 LANDSAT_7 ls7_usgs_sr_scene

In [5]:
# List LANDSAT 8 products
print("LANDSAT 8 Products:")
products_info[["platform", "name"]][products_info.platform == "LANDSAT_8"]


LANDSAT 8 Products:
Out[5]:
platform name
id
1 LANDSAT_8 ls8_usgs_sr_scene

In [6]:
# These are the platforms (satellites) and products (datacube sets) 
# used for this demonstration.
use_Landsat7 = True
use_Landsat8 = False
platforms = []
products = []
if use_Landsat7:
    platforms.append('LANDSAT_7')
    products.append('ls7_usgs_sr_scene')
if use_Landsat8:
    platforms.append('LANDSAT_8')
    products.append('ls8_usgs_sr_scene')

Get the Maximum Extents of the Cube [▴](#top)


In [7]:
for product, platform in zip(products, platforms):
    prod_extents = api.get_query_metadata(platform=platform, product=product, measurements=[])

    latitude_extents = prod_extents['lat_extents']
    longitude_extents = prod_extents['lon_extents']
    time_extents = list(map(lambda time: time.strftime('%Y-%m-%d'), prod_extents['time_extents']))

    print("{}:".format(platform))
    print("Lat bounds:", latitude_extents)
    print("Lon bounds:", longitude_extents)
    print("Time bounds:", time_extents)


LANDSAT_7:
Lat bounds: (-12.57305555565614, 18.32305555570214)
Lon bounds: (-25.66583333353866, 44.05861111146359)
Time bounds: ['1999-07-08', '2020-01-10']

In [8]:
from utils.data_cube_utilities.dc_load import get_overlapping_area
from utils.data_cube_utilities.dc_display_map import display_map

full_lat, full_lon, min_max_dates = get_overlapping_area(api, platforms, products)

# Display the total shared area available for these datacube products.
display_map(latitude = full_lat,longitude = full_lon)


Out[8]:

Define the Extents of the Analysis [▴](#top)

Specify start and end dates in the same order as platforms and products


In [9]:
# Use these four lines to select the time slice common to all products.
# min_start_date_mutual = np.max(min_max_dates[:,0])
# max_end_date_mutual = np.min(min_max_dates[:,1])
# start_dates = [min_start_date_mutual, min_start_date_mutual]
# end_dates = [max_end_date_mutual, max_end_date_mutual]
# Use these two lines to select all data available to each product.
# start_dates = min_max_dates[:,0]
# end_dates = min_max_dates[:,1]

# Select a subset of the time available.
# start_date, end_date = "2016-01-01", "2017-12-31"
start_date, end_date = "2014-09-01", "2015-01-31"

Specify an area to analyze


In [10]:
# Specify latitude and longitude bounds of an interesting area within the full extents

# Ghana
# min_lat_small, max_lat_small = (5.7261, 5.7390) # Crops NW of Accra (small)
# min_lon_small, max_lon_small = (-0.4960, -0.4889) # Crops NW of Accra (small)
# min_lat_small, max_lat_small = (5.7259, 5.7517) # Crops NW of Accra (medium)
# min_lon_small, max_lon_small = (-0.5308, -0.5143) # Crops NW of Accra (medium)
min_lat_small, max_lat_small = (8.0074, 8.0203) # Central Ghana - West of Kintampo
min_lon_small, max_lon_small = (-2.0486, -2.0332) # Central Ghana - West of Kintampo
    
# Vietnam
# min_lat_small, max_lat_small = (10.95, 11.00) # Area #1
# min_lon_small, max_lon_small = (107.15, 107.20) # Area #1
# min_lat_small, max_lat_small = (11.10, 11.39) # Area #2
# min_lon_small, max_lon_small = (106.8, 106.92) # Area #2
# min_lat_small, max_lat_small = (9.8, 10.0) # Area #3
# min_lon_small, max_lon_small = (105.2, 105.4) # Area #3
# min_lat_small, max_lat_small = (9.9, 10.1) # Area #4
# min_lon_small, max_lon_small = (105.0, 105.2) # Area #4

Visualize the selected area


In [11]:
lon_small = (min_lon_small, max_lon_small)
lat_small = (min_lat_small, max_lat_small)
display_map(lat_small, lon_small)


Out[11]:

Load Data from the Data Cube [▴](#top)


In [12]:
from utils.data_cube_utilities.dc_load import load_multiplatform

# Select which spectral index to use to track changes in vegetation.
# Can be any of ['NDVI', 'EVI', 'EVI2'].
spectral_index = 'EVI2'

if spectral_index == 'NDVI':
    measurements = ['red', 'nir']
elif spectral_index == 'EVI':
    measurements = ['red', 'nir', 'blue']
else: # EVI2
    measurements = ['red', 'nir']
# Include pixel_qa band to obtain the clean mask in load_multiplatform.
# Include red, green, and blue bands to be able to show RGB images of the data.
measurements = list(set(measurements + ['pixel_qa', 'red', 'green', 'blue']))
dataset, clean_mask, _ = \
    load_multiplatform(dc, platforms, products, 
                       load_params=dict(lat=lat_small, lon=lon_small, time=(start_date, end_date),
                                        measurements=measurements))
cleaned_dataset = dataset.where(clean_mask)
del dataset # Save memory

Calculate Vegetation Index [▴](#top)


In [13]:
from utils.data_cube_utilities.dc_ndvi_anomaly import NDVI, EVI, EVI2

if spectral_index == 'NDVI':
    vegetation_arr = NDVI(cleaned_dataset)
elif spectral_index == 'EVI':
    vegetation_arr = EVI(cleaned_dataset)
else: # 'EVI2'
    vegetation_arr = EVI2(cleaned_dataset)
cleaned_dataset[spectral_index] = vegetation_arr

Examine the Selected Area [▴](#top)

If no plots appear in the figures below, there is no data available for the region selected

Box-and-Whisker Plot by Full Time Period, Week, Month, Week of Year, or Month of Year.


In [14]:
from utils.data_cube_utilities.plotter_utils import xarray_time_series_plot

## Settings ##

# Specify the type of curve fit to use for the vegetation index along time.
# Can be one of [None, 'gaussian', 'gaussian_filter', 'poly', 'cubic_spline'].
# Using None will not plot a curve fit.
curve_fit_type = 'gaussian_filter'
assert curve_fit_type in [None, 'gaussian', 'gaussian_filter', 'poly', 'cubic_spline'], \
    "The variable `curve_fit_type` must be in " \
    "[None, 'gaussian', 'gaussian_filter', 'poly', 'cubic_spline']"

# Specify the target aggregation type of the curve fit. Input can be either 'mean' or 'median'.
curve_fit_target = 'median'
assert curve_fit_target in ['mean', 'median'], "The variable 'curve_fit_target' must be either "\
                                               "'mean' or 'median'."

# The maximum number of data points that appear along time in each plot.
# If more than this number of data points need to be plotted, a grid of plots will be created.
max_times_per_plot = 10

## End Settings ##

# Can be one of [None, 'week', 'month', 'weekofyear', 'monthofyear'].
for bin_by in ['monthofyear']:
    aggregated_by_str = None
    if bin_by is None:
        plotting_data = cleaned_dataset
    elif bin_by == 'week':
        plotting_data = cleaned_dataset.resample(time='1w').mean()
        aggregated_by_str = 'Week'
    elif bin_by == 'month':
        plotting_data = cleaned_dataset.resample(time='1m').mean()
        aggregated_by_str = 'Month'
    elif bin_by == 'weekofyear':
        plotting_data = cleaned_dataset.groupby('time.week').mean(dim=('time'))
        aggregated_by_str = 'Week of Year'
    elif bin_by == 'monthofyear':
        plotting_data = cleaned_dataset.groupby('time.month').mean(dim=('time'))
        aggregated_by_str = 'Month of Year'
    
params = dict(dataset=plotting_data, 
              plot_descs={spectral_index:{'none':[{'box':{'boxprops':{'facecolor':'forestgreen'}}}]}})
if curve_fit_type is not None:
    if curve_fit_type == 'gaussian':
        curve_fit_desc = [{'gaussian': {}}]
    elif curve_fit_type == 'gaussian_filter':
        curve_fit_desc = [{'gaussian_filter': {}}]
    elif curve_fit_type == 'poly':
        curve_fit_desc = [{'poly': {'degree': 3}}]
    else: # 'cubic_spline'
        curve_fit_desc = [{'cubic_spline': {}}]
    params['plot_descs'][spectral_index][curve_fit_target] = curve_fit_desc
    
xarray_time_series_plot(**params, fig_params=dict(figsize=(8,4), dpi=150))
plt.title('Box-and-Whisker Plot by Full Time Period')
agg_str = "(Aggregated by {})".format(aggregated_by_str) if aggregated_by_str is not None else ""
print("{} {}".format(spectral_index, agg_str))


EVI2 (Aggregated by Month of Year)

Create a Max NDVI Composite [▴](#top)


In [15]:
from utils.data_cube_utilities.dc_mosaic import create_max_ndvi_mosaic
from utils.data_cube_utilities.dc_rgb import rgb

max_ndvi_mosaic = create_max_ndvi_mosaic(cleaned_dataset, clean_mask)
rgb(max_ndvi_mosaic, use_data_min=True, use_data_max=True, min_inten=0.05)
plt.show()


Export the Mosaic to a PNG and a GeoTIFF [▴](#top)

Export to PNG


In [16]:
from utils.data_cube_utilities.dc_utilities import write_png_from_xr
import os
import pathlib
png_dir = 'output/pngs'
pathlib.Path(png_dir).mkdir(parents=True, exist_ok=True)
write_png_from_xr('{}/NDVI_Phenology_Max_NDVI_Mosaic.png'.format(png_dir), max_ndvi_mosaic, ['red','green','blue'], scale=(0,4000))

Export to GeoTIFF


In [17]:
from utils.data_cube_utilities.import_export import export_xarray_to_multiple_geotiffs
geotiff_dir = 'output/geotiffs/NDVI_Phenology'
pathlib.Path(geotiff_dir).mkdir(parents=True, exist_ok=True)
export_xarray_to_multiple_geotiffs(cleaned_dataset, "{}/NDVI_Phenology".format(geotiff_dir))