In [1]:
# Enable importing of utilities
import sys
sys.path.append('..')
%matplotlib inline
from pylab import rcParams
rcParams['figure.figsize'] = 10, 10
The previous tutorial introduced Landsat 7 imagery. The Lake Chad dataset was split into pre and post rainy season data-sets. The datasets were then cleaned up to produce a cloud-free and SLC-gap-free composite.
This tutorial will focus on analyzing bodies of water using the results of a water classification algorithm called WOFS
The algorithmic process is fairly simple. It is a chain of operations on our composite imagery. The goal here is to use water classifiers on our composite imagery to create comparabe water-products. Then to use the difference between the water products as a change classifier.
In our previous notebook two composites were created to represent cloud and SLC-gap imagery of pre-rainy season and post rainy season Landsat7 imagery. They were saved NETCDF files to use in this tutorial.
Xarrays were designed with NETCDF as it's primary storage format so loading them should be a synch. Start with the import:
In [2]:
import xarray as xr
In [3]:
pre_rain = xr.open_dataset('../demo/pre_rain.nc')
Lets print its contents as a high level check that data is loaded.
In [4]:
pre_rain
Out[4]:
The pre_rain
xarray should represents an area that looks somewhat like this:
Note: figure above is cached result
In [5]:
post_rain = xr.open_dataset('../demo/post_rain.nc')
Lets print this one as well
In [6]:
post_rain
Out[6]:
The post xarray represents an area that looks somewhat like this:
Note: figure above is cached result
The goal of water classification is to classify each pixel as water or not water. The applications of water classification can range from identifying flood-plains or coastal boundaries, to observing trends like coastal erosion or the seasonal fluctuations of water. The purpose of this section is to classify bodies of water on pre and post rainy season composites so that we can start analyzing change in lake-chad's surface area.
WOFS( Water Observations From Space) is a water classifier developed by the Australian government following extreme flooding in 2011. It uses a regression tree machine learning model trained on several geographically and geologically varied sections of the Australian continent on over 25 years of Landsat imagery.
While details of its implementation are outside of the scope of this tutorial, you can:
Running the wofs classifier is as simple as running a function call. It is typically good practice to create simple functions that accept an Xarray Dataset and return a processed XARRAY Dataset with new data-arrays within it.
In [7]:
from utils.data_cube_utilities.dc_water_classifier import wofs_classify
import numpy as np
clean_mask = np.ones((pre_rain.sizes['latitude'],pre_rain.sizes['longitude'])).astype(np.bool)
pre_water = wofs_classify(pre_rain, clean_mask = clean_mask, mosaic = True)
print(pre_water)
post_water = wofs_classify(post_rain, clean_mask = clean_mask, mosaic = True)
An interesting feature of Xarrays is their built-in support for plotting. Any data-arrays can plot its values using a plot function. Let's see what data-arrays come with wofs classifiers:
In [8]:
pre_water
Out[8]:
The printout shows that wofs produced a dataset with a single data-array called wofs
. Lets see what sort of values are in wofs
by running an np.unique command on it.
In [9]:
np.unique(pre_water.wofs)
Out[9]:
In [10]:
pre_water.wofs.plot(cmap = "Blues")
Out[10]:
In [11]:
post_water.wofs.plot(cmap = "Blues")
Out[11]:
The two images rendered above aren't too revealing when it comes to observing significant trends in water change. Perhaps we should take advantage of Xarrays arithmetic capabilities to detect or highlight change in our water classes.
Arithmetic operations like addition and subtraction can be applied to xarray datasets that share the same shape. For example, the following differencing operation ....
In [12]:
water_change = post_water - pre_water
... applies the difference operator to all values within the wofs data-array with extreme efficiency. If we were, to check unique values again...
In [13]:
np.unique(water_change.wofs)
Out[13]:
... then we should encounter three values. 1, 0, -1. These values can be interpreted as values indicating change in water. The table below should serve as an incredibly clear reference:
Understanding the intuition and logic behind this differencing, I think we're ready to take a look at a plot of water change over the area...
In [14]:
water_change.wofs.plot()
Out[14]:
In [15]:
## Create a boolean xarray
water_growth = (water_change.wofs == 1)
water_loss = (water_change.wofs == -1)
## Casting a 'boolean' to an 'int' makes 'True' values '1' and 'False' Values '0'. Summing should give us our totals
total_growth = water_growth.astype(np.int8).sum()
total_loss = water_loss.astype(np.int8).sum()
The results...
In [16]:
print("Growth:", int(total_growth.values))
print("Loss:", int(total_loss.values))
print("Net Change:", int(total_growth - total_loss))
Several guesses can be made here as to why water was lost after the rainy season. Since that is out scope for this lecture(and beyond the breadth of this developer's knowledge) I'll leave definitive answers to the right researchers in this field.
What can be provided, however, is an addititional figure regarding trends precipitation.
Lets bring back the GPM data one more time and increase the time range by one year in both directions.
Instead of spanning the year of 2015 to 2016, let's do 2014 to 2017.
Load GPM
Using the same code from our first gpm tutorial, let's load in three years of rainfall data:
In [17]:
import datacube
dc = datacube.Datacube(app = "chad_rainfall", config = '/home/localuser/.datacube.conf')
## Define Geographic boundaries using a (min,max) tuple.
latitude = (12.75, 13.0)
longitude = (14.25, 14.5)
## Specify a date range using a (min,max) tuple
from datetime import datetime
time = (datetime(2014,1,1), datetime(2017,1,2))
## define the name you gave your data while it was being "ingested", as well as the platform it was captured on.
product = 'gpm_imerg_gis_daily_global'
platform = 'GPM'
measurements = ['total_precipitation']
gpm_data = dc.load(latitude = latitude, longitude = longitude,
product = product, platform = platform,
measurements=measurements)
Display Data
We'll aggregate spatial axis so that we're left with a mean value of the region for each point in time. Let's plot those points in a time series.
In [18]:
times = gpm_data.time.values
values = gpm_data.mean(['latitude', 'longitude']).total_precipitation.values
import matplotlib.pyplot as plt
plt.plot(times, values)
Out[18]:
This concludes our series on observing the rainy season's contributions to Lake Chad's surface area. Hopefully you've taken an understanding of, or even interest to datacube and xarrays.
I encourage you to check out more of our notebooks on our github with applications ranging from landslide detection to fractional coverage or even the Wofs water detection algorithm