Deriving a vegetation index from 4-band satellite data

A vegetation index is generated by combining two or more spectral bands from a satellite image. There are many different vegetation indices; in this exercise we'll learn about the most commonly-used index.

How to use this exercise

Adjacent to this notebook, you'll find a notebook titled generate-ndvi-exercise-key.ipynb: that notebook contains the "answers", while the code in this notebook is incomplete. As you work through this notebook, you'll see some codeblocks that contain None values: replace these None instances with your own code until the codeblocks here run successfully. If you get stuck try to use outside resources like documentation, previous work, or even other students here in this class before taking a peek at the Key.

NDVI

Researchers often use a vegetation index called NDVI to measure the "greenness" or density of vegetation across a landscape. In addition to monitoring vegetation health, NDVI (Normalized Difference Vegetation Index) can be used to track climate change, agricultural production, desertification, and land cover change. Developed by NASA scientist Compton Tucker in 1977, NDVI is derived from satellite imagery and compares reflected near-infrared light to reflected visible red light. It can be expressed as following equation:

In general, healthy and/or dense vegetation reflects a lot of near-infrared light and not as much red visible light. Conversely, when vegetation is sparse or not-so-healthy, its near-infrared reflectance decreases and its red light reflectance increases. You can read more about how NDVI is used to study cyclical, seasonal, and long-term changes to the Earth's physical characteristics from NASA and USGS researchers.

To create this vegetation index, we're going to use PlanetScope's SR (Surface Reflectance) data product. You can learn more about Surface Reflectance (SR) and Planet data here, but for the purposes of this exercise, all you need to know is: SR data is satellite data that has been algorithmically corrected to remove atmospheric interference. If you'd like to learn much more about reflectance conversions and atmospheric interference, see this in-class exercise.

In this exercise, you'll learn how to perform an NDVI calculation on PlanetScope Surface Reflectance data in Python, and generate a colorized NDVI image for visual analysis. Here are the steps to follow:

Download a PlanetScope SR product
Extract data from the red and near-infrared bands
Perform the NDVI calculation
Save the NDVI image
Apply a color scheme to the NDVI image
Generate a histogram to view NDVI values

Requirements

Python 2.7 or 3+
Planet's Python Client
rasterio
numpy
matplotlib
Planet API Key, stored as environment variable $PL_API_KEY.
Planet 4-Band Imagery with the following specifications: item-type: PSScene4Band; asset-type: analytic_sr

Step 1. Download a PlanetScope SR Product

For this exercise you'll need a 4-band PlanetScope Surface Reflectance product. You can search for & download your own data, or use the demo data provided in-class. If you choose to use the demo data, skip to Step 2.

To search for your own data, you'll first need to define an Area of Interest (AOI). http://geojson.io is a free browser-based tool that makes generating a GeoJSON-formatted AOI easy.

Once that's done, use one of the following methods to search for & download data:

using Planet's Python CLI to interact with Planet's API from the command line
using Planet's API directly to search and download
using the Planet Explorer site to visually search for & download data

With all of the above, you'll want to filter for 4-Band PlanetScope data (item type: PSScene4Band) and download the associated SR product (asset type: analytic_sr)

Option 1: Searching & Downloading via CLI

If you choose to use Planet's CLI, you might fight these search and download quickstart guides to be useful.



In [ ]:

    
# To use Planet's CLI from this Notebook, begin your line as follows:
!planet data

# < add your own code here >

Option 2: Searching & Downloading via API

If you prefer to use Planet's API directly via Python, this search & download quickstart Notebook may be useful.



In [ ]:

    
# To use Planet's API, you'll probably begin by importing your favorite HTTP toolkit, e.g.:
import requests
from requests.auth import HTTPBasicAuth

# Your Planet API key is available in this Notebook as an env variable, e.g.:
import os
PLANET_API_KEY = os.getenv('PL_API_KEY')

# < add your own code here >

Option 3: Searching & Downloading via Planet Explorer

If you prefer to browse for images visually, log in to your Planet account and use Planet Explorer to search for PlanetScope imagery. You'll want to make sure to set the Source filter to show only 4-band PlanetScope Scene results.

You can click here for an example search showing 4-band PlanetScope data in California's Central Valley.

Success! Data Obtained

Regardless of the path you chose to obtain data for this exercise, once you have successfully aquired a 4-band PlanetScope analytic_SR-type GeoTIFF, place the file in the data/ directory adjacent to this Notebook.

Step 2. Extract the data from the red and near-infrared bands

For this step, use Rasterio to open the raster image you downloaded (the .tif file). If you need a refresher, you might find the 'Inspecting Satellite Imagery with rasterio' Notebook useful.

Once you've read in the raster image, use Rasterio read the data from the red and near-infrared bands: this will load the band data into arrays that you can manipulate using Python's NumPy libary.

Note: in PlanetScope 4-band images, the band order is BGRN: (1) Blue, (2) Green, (3) Red, (4) Near-infrared.



In [1]:

    
import rasterio

filename = None
# < update above with your own code >


# Load red and NIR bands - note all PlanetScope 4-band images have band order BGRN

# < add your own code here >

Step 3. Perform the NDVI calculation

Next, you're going to calculate NDVI through subtraction and division of the values stored in the NumPy arrays. This calculation will give you NDVI values that range from -1 to 1. Values closer to 1 indicate a greater density of vegetation or higher level of "greenness."

As a reminder, the NDVI formula is:

\begin{equation*} ndvi = \frac{nir-red}{(nir+red)} \end{equation*}

Where nir is the Near-infrared band, and red is the Red band.



In [ ]:

    
# allow division by zero without throwing a warning
import numpy
numpy.seterr(divide='ignore', invalid='ignore')


# Calculate NDVI - remember, bands read via rasterio are just numpy arrays
ndvi = None

# < update above with your own code >

As a quick check of our calculations, let's print the minimum and maximum values in our calculated ndvi. Because we're using the NDVI formula to normalize the input bands, we know that our expected values should fall within -1.0 to +1.0.

(HINT: this is still a numpy array, so use numpy functions here).



In [ ]:

    
# check range NDVI values, excluding NaN

# < add your own code here >

Assuming your min & max values are in-range -- congratulations! You have performed what is known as raster band math. Well done. This skill has many applications beyond the NDVI you're calculating in this exercise: the relationship of values between different spectral bands is the basis for many kinds of remote sensing analysis.

Step 5. Save the NDVI image

Now that you've calculated NDVI values, you're going to save the results to a new single-band image, making sure the new image file uses the geospatial metadata from the GeoTIFF you originally acquired, and the dtype of the new numpy array you generated above.



In [ ]:

    
# get the metadata of original GeoTIFF:

# < add your own code here >


# get the dtype of our NDVI array:

# < add your own code here >


# set the source metadata as kwargs we'll use to write the new data:

# < add your own code here >


# update the 'dtype' value to match our NDVI array's dtype:

# < add your own code here >


# update the 'count' value since our output will no longer be a 4-band image:

# < add your own code here >


# Finally, use rasterio to write new raster file 'data/ndvi.tif':

# < add your own code here >

Step 6. Apply a color scheme to visualize the NDVI values on the image

Now that you've created ndvi.tif, you may be tempted to open it immediately & take a look at what you've accomplished. If you do, don't be disappointed when ndvi.tif opened in your favorite image viewer doesn't look like much at all. That's normal! Remember that this is not just any .tif but a GeoTIFF - one in which every pixel has a value of 1.0 or less.

At this point, you could open ndvi.tif in a Desktop GIS GUI like QGIS, and define color values for each pixel in order to get meaningful visual information out of the data. But this is a Python exercise, so let's use Matplotlib to do the same thing.

As we verified earlier, we know the values in our NDVI will range from -1 to 1. To best visualize this, we want to use a diverging color scheme, and we want to center the colorbar at a defined midpoint. Interestingly, the best midpoint for NDVI analysis is 0.1 - not 0.0 as you might expect. You can read more about how NDVIs are interpreted here.

To normalize a colorbar against our custom midpoint, we're going to take advantage of the following handy class originally created by Joe Kington:



In [ ]:

    
from matplotlib import colors

# Credit: Joe Kington
class MidpointNormalize(colors.Normalize):
    """
    Normalize the colorbar so that diverging bars work there way either side from a prescribed midpoint value
    
    """
    def __init__(self, vmin=None, vmax=None, midpoint=None, clip=False):
        self.midpoint = midpoint
        colors.Normalize.__init__(self, vmin, vmax, clip)

    def __call__(self, value, clip=None):
        # I'm ignoring masked values and all kinds of edge cases to make a
        # simple example...
        x, y = [self.vmin, self.midpoint, self.vmax], [0, 0.5, 1]
        return numpy.ma.masked_array(numpy.interp(value, x, y), numpy.isnan(value))



In [ ]:

    
# Begin by pulling in pyplot
import matplotlib.pyplot as plt

# Set min/max values from NDVI range for image
# HINT: refer back to earlier, when we verified our min & max values were within expected range
min = None
max = None
# < update above with your own code >

# Set our custom midpoint for most effective NDVI analysis
mid = None
# < update above with your own code >

# Set your favorite diverging color scheme 
# You can use https://matplotlib.org/users/colormaps.html as a reference
colormap = None
# < update above with your own code >

# Call MidPointNormalize with our min, max, and custom midpoint
norm = None
# < update above with your own code >

# Create a pyplot figure, in which we'll display our colorized NDVI
fig = None
# < update above with your own code >

# Add a subplot to our figure, which will contain the colorbar
ax = None
# < update above with your own code >

# Use 'imshow' to specify the input data, colormap, min, max, and norm for the colorbar
cbar_plot = None
# < update above with your own code >

# Turn off the display of axis labels 
ax.axis('off')

# Set a title 
ax.set_title('Normalized Difference Vegetation Index', fontsize=18, fontweight='bold')

# Configure the colorbar with 'cbar_plot'
cbar = None
# < update above with your own code >

# Call 'savefig' to save this plot to an image file
# < add your own code here >

# Finally - let's take a look!
plt.show()

7. Generate a histogram of NDVI values

Congratulations! You've used band math to apply a well-known vegetation index formula to satellite data, and visualized it for analysis using a diverging color ramp. You're well on your way to getting meaningful information out of satellite imagery using Python.

As one last step, you use pyplot to generate a histogram of values in your NDVI calculation. This can be useful for quick analysis, giving visual insight into the distribution of "healthy" vs "unhealthy" vegetation values in your study area.



In [ ]:

    
# Define a new figure
fig2 = None
# < update above with your own code >

# Give this new figure a subplot, which will contain the histogram itself
ax = None
# < update above with your own code >

# Add a title & (x,y) labels to the plot
# < add your own code here >

# For the x-axis, we want to count every pixel that is not an empty value
x = None
# < update above with your own code >

# Define the number of bins to divide the data into
bins = None
# < update above with your own code >

# Define a color for the histogram
# You can use https://matplotlib.org/2.0.0/examples/color/named_colors.html as a reference
color = None
# < update above with your own code >

# call 'hist` with our x-axis, bins, and color details
# < add your own code here >

# Save the generated figure to an external image file
# < add your own code here >

# Finally - let's take a look!
plt.show()