A vegetation index is generated by combining two or more spectral bands from a satellite image. There are many different vegetation indices; in this exercise we'll learn about the most commonly-used index.
Researchers often use a vegetation index called NDVI to measure the "greenness" or density of vegetation across a landscape. In addition to monitoring vegetation health, NDVI (Normalized Difference Vegetation Index) can be used to track climate change, agricultural production, desertification, and land cover change. Developed by NASA scientist Compton Tucker in 1977, NDVI is derived from satellite imagery and compares reflected near-infrared light to reflected visible red light. It can be expressed as following equation:
In general, healthy and/or dense vegetation reflects a lot of near-infrared light and not as much red visible light. Conversely, when vegetation is sparse or not-so-healthy, its near-infrared reflectance decreases and its red light reflectance increases. You can read more about how NDVI is used to study cyclical, seasonal, and long-term changes to the Earth's physical characteristics from NASA and USGS researchers.
To create this vegetation index, we're going to use PlanetScope's SR (Surface Reflectance) data product. You can learn more about Surface Reflectance (SR) and Planet data here, but for the purposes of this exercise, all you need to know is: SR data is satellite data that has been algorithmically corrected to remove atmospheric interference.
In this exercise, you'll learn how to perform an NDVI calculation on PlanetScope Surface Reflectance data in Python, and generate a colorized NDVI image for visual analysis. Here are the steps to follow:
$PL_API_KEY
.item-type
: PSScene4Band
; asset-type
: analytic_sr
For this exercise you'll need a 4-band PlanetScope Surface Reflectance product. You can search for & download your own data, or use the demo data provided in-class. If you choose to use the demo data, skip to Step 2.
To search for your own data, you'll first need to define an Area of Interest (AOI). http://geojson.io is a free browser-based tool that makes generating a GeoJSON-formatted AOI easy.
Once that's done, use one of the following methods to search for & download data:
With all of the above, you'll want to filter for 4-Band PlanetScope data (item type: PSScene4Band
) and download the associated SR product (asset type: analytic_sr
)
In [ ]:
# To use Planet's CLI from this Notebook, begin your line as follows:
!planet data
# Here is an example of using Planet's CLI to search for a known item id:
# !planet data download --item-type PSScene4Band --asset-type analytic_sr --dest data --string-in id 20160831_180302_0e26
If you prefer to use Planet's API directly via Python, this search & download quickstart Notebook may be useful.
In [ ]:
# To use Planet's API, you'll probably begin by importing your favorite HTTP toolkit, e.g.:
import requests
from requests.auth import HTTPBasicAuth
# Your Planet API key is available in this Notebook as an env variable, e.g.:
import os
PLANET_API_KEY = os.getenv('PL_API_KEY')
If you prefer to browse for images visually, log in to your Planet account and use Planet Explorer to search for PlanetScope imagery. You'll want to make sure to set the Source filter to show only 4-band PlanetScope Scene
results.
You can click here for an example search showing 4-band PlanetScope data in California's Central Valley.
Regardless of the path you chose to obtain data for this exercise, once you have successfully aquired a 4-band PlanetScope analytic_SR
-type GeoTIFF, place the file in the data/ directory adjacent to this Notebook.
For this step, use Rasterio to open the raster image you downloaded (the .tif file). After that, use Rasterio read the data from the red and near-infrared bands: this will load the band data into arrays that you can manipulate using Python's NumPy libary.
Note: in PlanetScope 4-band images, the band order is BGRN: (1) Blue, (2) Green, (3) Red, (4) Near-infrared.
In [ ]:
import rasterio
filename = "data/20160831_180302_0e26_3B_AnalyticMS_SR.tif"
# Load red and NIR bands - note all PlanetScope 4-band images have band order BGRN
with rasterio.open(filename) as src:
band_red = src.read(3)
with rasterio.open(filename) as src:
band_nir = src.read(4)
Next, you're going to calculate NDVI through subtraction and division of the values stored in the NumPy arrays. This calculation will give you NDVI values that range from -1 to 1. Values closer to 1 indicate a greater density of vegetation or higher level of "greenness."
As a reminder, the NDVI formula is:
\begin{equation*} ndvi = \frac{nir-red}{(nir+red)} \end{equation*}Where nir
is the Near-infrared band, and red
is the Red band.
In [ ]:
# allow division by zero without throwing a warning
import numpy
numpy.seterr(divide='ignore', invalid='ignore')
# Calculate NDVI - remember, bands read via rasterio are just numpy arrays
ndvi = (band_nir.astype(float) - band_red.astype(float)) / (band_nir + band_red)
As a quick check of our calculations, let's print the minimum and maximum values in our calculated ndvi
. Because we're using the NDVI formula to normalize the input bands, we know that our expected values should fall within -1.0 to +1.0.
(HINT: this is still a numpy array, so use numpy functions here).
In [ ]:
# check range NDVI values, excluding NaN
print(numpy.nanmin(ndvi))
print(numpy.nanmax(ndvi))
Assuming your min & max values are in-range -- congratulations! You have performed what is known as raster band math. Well done. This skill has many applications beyond the NDVI you're calculating in this exercise: the relationship of values between different spectral bands is the basis for many kinds of remote sensing analysis.
Now that you've calculated NDVI values, you're going to save the results to a new single-band image, making sure the new image file uses the geospatial metadata from the GeoTIFF you originally acquired, and the dtype
of the new numpy array you generated above.
In [ ]:
# get the metadata of original GeoTIFF:
meta = src.meta
print(meta)
# get the dtype of our NDVI array:
ndvi_dtype = ndvi.dtype
print(ndvi_dtype)
# set the source metadata as kwargs we'll use to write the new data:
kwargs = meta
# update the 'dtype' value to match our NDVI array's dtype:
kwargs.update(dtype=ndvi_dtype)
# update the 'count' value since our output will no longer be a 4-band image:
kwargs.update(count=1)
# Finally, use rasterio to write new raster file 'data/ndvi.tif':
with rasterio.open('data/ndvi.tif', 'w', **kwargs) as dst:
dst.write(ndvi, 1)
Now that you've created ndvi.tif, you may be tempted to open it immediately & take a look at what you've accomplished. If you do, don't be disappointed when ndvi.tif
opened in your favorite image viewer doesn't look like much at all. That's normal! Remember that this is not just any .tif but a GeoTIFF - one in which every pixel has a value of 1.0 or less.
At this point, you could open ndvi.tif
in a Desktop GIS GUI like QGIS, and define color values for each pixel in order to get meaningful visual information out of the data. But this is a Python exercise, so let's use Matplotlib to do the same thing.
As we verified earlier, we know the values in our NDVI will range from -1 to 1. To best visualize this, we want to use a diverging color scheme, and we want to center the colorbar at a defined midpoint. Interestingly, the best midpoint for NDVI analysis is 0.1 - not 0.0 as you might expect. You can read more about how NDVIs are interpreted here.
To normalize a colorbar against our custom midpoint, we're going to take advantage of the following handy class originally created by Joe Kington:
In [ ]:
from matplotlib import colors
# Credit: Joe Kington
class MidpointNormalize(colors.Normalize):
"""
Normalize the colorbar so that diverging bars work there way either side from a prescribed midpoint value
"""
def __init__(self, vmin=None, vmax=None, midpoint=None, clip=False):
self.midpoint = midpoint
colors.Normalize.__init__(self, vmin, vmax, clip)
def __call__(self, value, clip=None):
# I'm ignoring masked values and all kinds of edge cases to make a
# simple example...
x, y = [self.vmin, self.midpoint, self.vmax], [0, 0.5, 1]
return numpy.ma.masked_array(numpy.interp(value, x, y), numpy.isnan(value))
In [ ]:
# Begin by pulling in pyplot
import matplotlib.pyplot as plt
# Set min/max values from NDVI range for image
# HINT: refer back to earlier, when we verified our min & max values were within expected range
min=numpy.nanmin(ndvi)
max=numpy.nanmax(ndvi)
# Set our custom midpoint for most effective NDVI analysis
mid=0.1
# Set your favorite diverging color scheme
# You can use https://matplotlib.org/users/colormaps.html as a reference
colormap = plt.cm.RdYlGn
# Call MidPointNormalize with our min, max, and custom midpoint
norm = MidpointNormalize(vmin=min, vmax=max, midpoint=mid)
# Create a pyplot figure, in which we'll display our colorized NDVI
fig = plt.figure(figsize=(20,10))
# Add a subplot to our figure, which will contain the colorbar
ax = fig.add_subplot(111)
# Use 'imshow' to specify the input data, colormap, min, max, and norm for the colorbar
cbar_plot = ax.imshow(ndvi, cmap=colormap, vmin=min, vmax=max, norm=norm)
# Turn off the display of axis labels
ax.axis('off')
# Set a title
ax.set_title('Normalized Difference Vegetation Index', fontsize=18, fontweight='bold')
# Configure the colorbar
cbar = fig.colorbar(cbar_plot, orientation='horizontal', shrink=0.65)
# Call 'savefig' to save this plot to an image file
fig.savefig("data/ndvi-fig.png", dpi=200, bbox_inches='tight', pad_inches=0.7)
# Finally - let's take a look!
plt.show()
Congratulations! You've used band math to apply a well-known vegetation index formula to satellite data, and visualized it for analysis using a diverging color ramp. You're well on your way to getting meaningful information out of satellite imagery using Python.
As one last step, you use pyplot
to generate a histogram of values in your NDVI calculation. This can be useful for quick analysis, giving visual insight into the distribution of "healthy" vs "unhealthy" vegetation values in your study area.
In [ ]:
# Define a new figure
fig2 = plt.figure(figsize=(20,10))
# Give this new figure a subplot, which will contain the histogram itself
ax = fig2.add_subplot(111)
# Add a title & (x,y) labels to the plot
plt.title("NDVI Histogram", fontsize=18, fontweight='bold')
plt.xlabel("NDVI values", fontsize=14)
plt.ylabel("Number of pixels", fontsize=14)
# For the x-axis, we want to count every pixel that is not an empty value
x = ndvi[~numpy.isnan(ndvi)]
# Define the number of bins to divide the data into
bins = 20
# Define a color for the histogram
# You can use https://matplotlib.org/2.0.0/examples/color/named_colors.html as a reference
color = 'lightgreen'
# call 'hist` with our x-axis, bins, and color details
ax.hist(x,bins,color=color)
# Save the generated figure to an external image file
fig2.savefig("data/ndvi-histogram.png", dpi=200, bbox_inches='tight', pad_inches=0.7)
# Finally - let's take a look!
plt.show()