Our aim in this notebook is to automatically identify crops in a Planet image and then create georeferenced geojson features that represent those crops.
In this notebook, we train a KNN classifier to predict crop/non-crop pixels on one Planet image based on 2-year-old ground truth data, and we use that classifier to predict crop/non-crop pixels in another Planet image. We then segment the classified image and create georeferenced geojson features that outline the predicted crops.
K-nearest neighbors (KNN) is a supervised learning technique for classification. It is a straightforward machine-learning algorithm. See wikipedia for more details. In this notebook, we use KNN Classification to classify pixels as crop / non-crop based on their red, green, blue, and NIR band top-of-atmosphere (TOA) reflectance values.
TOA reflectance is not affected by the satellite that takes the image, the time of day, or the time of year. It is affected by atmosphere, which isn't ideal. However, correcting for the atmosphere is complicated, requires detailed input data, and is error-prone. Therefore, TOA reflectance is the dataset we are going to use. The Planet product we use has the bands scaled to TOA radiance. We use the radiance coefficient, stored in the product metadata, to scale bands to TOA reflectance.
The data used in this notebook is selected and described in the Identify Datasets notebook and prepared in the Prepare Datasets notebook. Survey data from 2015 is used as the 'ground truth' for training the KNN Classifier and assessing accuracy. The Planet imagery used in this notebook was collected in 2017, so we expect there to be some inaccuracies in the ground truth data due to the 2-year gap. However, we don't expect a large difference in the crop/non-crop status of regions, so expect reasonable accuracy.
In [1]:
# Notebook dependencies
from collections import namedtuple
import copy
from functools import partial
import json
import os
import pickle
from xml.dom import minidom
import cv2
import ipyleaflet as ipyl
import matplotlib
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import numpy as np
import pyproj
import rasterio
from shapely.geometry import mapping, shape, Polygon
from shapely.ops import transform
from sklearn import neighbors
from sklearn.metrics import classification_report
%matplotlib inline
We define the datasets to use for training the KNN classifier and testing the classifier. These datasets were prepared in the datasets-identify
notebook.
The training dataset will actually come from the 'test' data, the test dataset will come from the 'train' data. The reason for this swap is because the 'train' data is mostly composed of 'crop' regions while the 'test' data is composed of an approximately equal representation of 'crop' and 'non-crop' regions. Swapping these datasets results in an improvement in the classifier f-score on the test dataset of almost 10% (from 0.68 to 0.77 in the trial run).
In [2]:
# all data is saved in the pre-data directory
predata_dir = 'pre-data'
In [3]:
# train data
train_scene_filename = os.path.join(predata_dir, 'test_scene_cropped.tif')
train_metadata_filename = os.path.join(predata_dir, 'test_scene_cropped_metadata.xml')
train_ground_truth_filename = os.path.join(predata_dir, 'ground-truth-test.geojson')
In [4]:
# test data
test_scene_filename = os.path.join(predata_dir, 'train_scene_cropped.tif')
test_metadata_filename = os.path.join(predata_dir, 'train_scene_cropped_metadata.xml')
test_ground_truth_filename = os.path.join(predata_dir, 'ground-truth-train.geojson')
Steps:
To train the KNN classifier, we need to identify the class of each pixel as either 'crop' or 'non-crop'. The class is determined from whether the pixel is associated with a crop feature in the ground truth data. To make this determination, we must first convert the ground truth data into OpenCV contours, then we use these contours to separate 'crop' from 'non-crop' pixels. The pixel values and their classification are then used to train the KNN classifier.
In [5]:
# Utility functions: features to contours
def project_feature(feature, proj_fcn):
"""Creates a projected copy of the feature.
:param feature: geojson description of feature to be projected
:param proj_fcn: partial function defining projection transform"""
g1 = shape(feature['geometry'])
g2 = transform(proj_fcn, g1)
proj_feat = copy.deepcopy(feature)
proj_feat['geometry'] = mapping(g2)
return proj_feat
def project_features_to_srs(features, img_srs, src_srs='epsg:4326'):
"""Project features to img_srs.
If src_srs is not specified, WGS84 (only geojson-supported crs) is assumed.
:param features: list of geojson features to be projected
:param str img_srs: destination spatial reference system
:param str src_srs: source spatial reference system
"""
# define projection
# from shapely [docs](http://toblerity.org/shapely/manual.html#shapely.ops.transform)
proj_fcn = partial(
pyproj.transform,
pyproj.Proj(init=src_srs),
pyproj.Proj(init=img_srs))
return [project_feature(f, proj_fcn) for f in features]
def polygon_to_contour(feature_geometry, image_transform):
"""Convert the exterior ring of a geojson Polygon feature to an
OpenCV contour.
image_transform is typically obtained from `img.transform` where
img is obtained from `rasterio.open()
:param feature_geometry: the 'geometry' entry in a geojson feature
:param rasterio.Affine image_transform: image transformation"""
points_xy = shape(feature_geometry).exterior.coords
points_x, points_y = zip(*points_xy) # inverse of zip
rows, cols = rasterio.transform.rowcol(image_transform, points_x, points_y)
return np.array([pnt for pnt in zip(cols, rows)], dtype=np.int32)
In [6]:
def load_geojson(filename):
with open(filename, 'r') as f:
return json.load(f)
In [7]:
def load_contours(ground_truth_filename, pl_filename):
with rasterio.open(pl_filename) as img:
img_transform = img.transform
img_srs = img.crs['init']
ground_truth_data = load_geojson(ground_truth_filename)
# project to image srs
projected_features = project_features_to_srs(ground_truth_data, img_srs)
# convert projected features to contours
contours = [polygon_to_contour(f['geometry'], img_transform)
for f in projected_features]
return contours
print(len(load_contours(train_ground_truth_filename, train_scene_filename)))
In [8]:
# Utility functions: loading an image
NamedBands = namedtuple('NamedBands', 'b, g, r, nir')
def load_bands(filename):
"""Loads a 4-band BGRNir Planet Image file as a list of masked bands.
The masked bands share the same mask, so editing one band mask will
edit them all."""
with rasterio.open(filename) as src:
b, g, r, nir = src.read()
mask = src.read_masks(1) == 0 # 0 value means the pixel is masked
bands = NamedBands(b=b, g=g, r=r, nir=nir)
return NamedBands(*[np.ma.array(b, mask=mask)
for b in bands])
def get_rgb(named_bands):
return [named_bands.r, named_bands.g, named_bands.b]
def check_mask(band):
return '{}/{} masked'.format(band.mask.sum(), band.mask.size)
print(check_mask(load_bands(train_scene_filename).r))
In [9]:
# Utility functions: converting an image to reflectance
NamedCoefficients = namedtuple('NamedCoefficients', 'b, g, r, nir')
# https://github.com/planetlabs/notebooks/blob/master/jupyter-notebooks/ndvi/ndvi_planetscope.ipynb
def read_refl_coefficients(metadata_filename):
xmldoc = minidom.parse(metadata_filename)
nodes = xmldoc.getElementsByTagName("ps:bandSpecificMetadata")
# XML parser refers to bands by numbers 1-4
coeffs = {}
for node in nodes:
bn = node.getElementsByTagName("ps:bandNumber")[0].firstChild.data
if bn in ['1', '2', '3', '4']:
i = int(bn)
value = node.getElementsByTagName("ps:reflectanceCoefficient")[0].firstChild.data
coeffs[i] = float(value)
return NamedCoefficients(b=coeffs[1],
g=coeffs[2],
r=coeffs[3],
nir=coeffs[4])
def to_reflectance(bands, coeffs):
return NamedBands(*[b.astype(float)*c for b,c in zip(bands, coeffs)])
def load_refl_bands(filename, metadata_filename):
bands = load_bands(filename)
coeffs = read_refl_coefficients(metadata_filename)
return to_reflectance(bands, coeffs)
print(read_refl_coefficients(train_metadata_filename))
In [10]:
# Utility functions: displaying an image
def _linear_scale(ndarray, old_min, old_max, new_min, new_max):
"""Linear scale from old_min to new_min, old_max to new_max.
Values below min/max are allowed in input and output.
Min/Max values are two data points that are used in the linear scaling.
"""
#https://en.wikipedia.org/wiki/Normalization_(image_processing)
return (ndarray - old_min)*(new_max - new_min)/(old_max - old_min) + new_min
# print(linear_scale(np.array([1,2,10,100,256,2560, 2660]), 2, 2560, 0, 256))
def _mask_to_alpha(bands):
# band = np.atleast_3d(bands)[...,0]
band = np.atleast_3d(bands[0])
alpha = np.zeros_like(band)
alpha[~band.mask] = 1
return alpha
def _add_alpha_mask(bands):
return np.dstack([bands, _mask_to_alpha(bands)])
def bands_to_display(bands, alpha=False):
"""Converts a list of 3 bands to a 3-band rgb, normalized array for display."""
assert len(bands) in [1,3]
all_bands = np.dstack(bands)
old_min = np.percentile(all_bands, 2)
old_max = np.percentile(all_bands, 98)
new_min = 0
new_max = 1
scaled = [np.clip(_linear_scale(b.astype(np.double),
old_min, old_max, new_min, new_max),
new_min, new_max)
for b in bands]
filled = [b.filled(fill_value=new_min) for b in scaled]
if alpha:
filled.append(_mask_to_alpha(scaled))
return np.dstack(filled)
In [11]:
fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, sharex=True, sharey=True, figsize=(15,15))
for ax in (ax1, ax2):
ax.set_aspect('equal', adjustable='box-forced')
ax1.imshow(bands_to_display(get_rgb(load_bands(train_scene_filename)), alpha=False))
ax1.set_title('Display Bands, No Alpha')
ax2.imshow(bands_to_display(get_rgb(load_bands(train_scene_filename)), alpha=True))
ax2.set_title('Display Bands, Alpha from Mask')
plt.tight_layout()
fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, sharex=True, sharey=True, figsize=(15,15))
for ax in (ax1, ax2):
ax.set_aspect('equal', adjustable='box-forced')
ax1.imshow(bands_to_display(get_rgb(load_refl_bands(train_scene_filename,
train_metadata_filename)),
alpha=False))
ax1.set_title('Reflectance Bands, No Alpha')
ax2.imshow(bands_to_display(get_rgb(load_refl_bands(train_scene_filename,
train_metadata_filename)),
alpha=True))
ax2.set_title('Reflectance Bands, Alpha from Mask')
plt.tight_layout()
In [12]:
# Utility functions: displaying contours
def draw_contours(img, contours, color=(0, 1, 0), thickness=2):
"""Draw contours over a copy of the image"""
n_img = img.copy()
# http://docs.opencv.org/2.4.2/modules/core/doc/drawing_functions.html#drawcontours
cv2.drawContours(n_img,contours,-1,color,thickness=thickness)
return n_img
In [13]:
plt.figure(1, figsize=(10,10))
plt.imshow(draw_contours(bands_to_display(get_rgb(load_bands(train_scene_filename)), alpha=False),
load_contours(train_ground_truth_filename, train_scene_filename)))
_ = plt.title('Contours Drawn over Image')
In [14]:
# Utility functions: masking pixels using contours
def _create_contour_mask(contours, shape):
"""Masks out all pixels that are not within a contour"""
mask = np.zeros(shape, dtype=np.uint8)
cv2.drawContours(mask, contours, -1, (1), thickness=-1)
return mask < 1
def combine_masks(masks):
"""Masks any pixel that is masked in any mask.
masks is a list of masks, all the same size"""
return np.any(np.dstack(masks), 2)
def _add_mask(bands, mask):
# since band masks are linked, could use any band here
bands[0].mask = combine_masks([bands[0].mask, mask])
def mask_contours(bands, contours, in_contours=False):
contour_mask = _create_contour_mask(contours, bands[0].mask.shape)
if in_contours:
contour_mask = ~contour_mask
_add_mask(bands, contour_mask)
return bands
In [15]:
fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, sharex=True, sharey=True, figsize=(15,15))
for ax in (ax1, ax2):
ax.set_aspect('equal', adjustable='box-forced')
ax1.imshow(bands_to_display(get_rgb(mask_contours(load_bands(train_scene_filename),
load_contours(train_ground_truth_filename,
train_scene_filename))),
alpha=True))
ax1.set_title('Crop Pixels')
ax2.imshow(bands_to_display(get_rgb(mask_contours(load_bands(train_scene_filename),
load_contours(train_ground_truth_filename,
train_scene_filename),
in_contours=True)),
alpha=True))
ax2.set_title('Non-Crop Pixels')
plt.tight_layout()
We use the scikit-learn KNeighborsClassifier to perform KNN classification.
KNeighborsClassifier fit input is two datasets: X, a 2d array that provides the features to be classified on, and y, a 1d array that provides the classes. X and y are ordered along the first dimension (example).
The number of neighbors (the K in KNN) is a parameter of the estimator that can greatly affect it's performance. Instead of guessing the parameter, we use cross-validation to tune the parameter. Cross-validation is performed in KNN Cross-Validation. In this notebook, we save the X and y datasets so they can be imported by that notebook and we use the number of neighbors that was identified in that notebook to provide the greatest accuracy.
We first take the non-crop and crop datasets above and create a single classified band, where the values of the band indicate the pixel class.
In [16]:
def classified_band_from_masks(band_mask, class_masks):
class_band = np.zeros(band_mask.shape, dtype=np.uint8)
for i, mask in enumerate(class_masks):
class_band[~mask] = i
# turn into masked array, using band_mask as mask
return np.ma.array(class_band, mask=band_mask)
def create_contour_classified_band(pl_filename, ground_truth_filename):
band_mask = load_bands(pl_filename)[0].mask
contour_mask = _create_contour_mask(load_contours(ground_truth_filename, pl_filename),
band_mask.shape)
return classified_band_from_masks(band_mask, [contour_mask, ~contour_mask])
plt.figure(1, figsize=(10,10))
plt.imshow(create_contour_classified_band(train_scene_filename, train_ground_truth_filename))
_ = plt.title('Ground Truth Classes')
Next we convert this band and the underlying image pixel band values to the X and y inputs to the classifier fit function.
In [17]:
def to_X(bands):
"""Convert to a list of pixel values, maintaining order and filtering out masked pixels."""
return np.stack([b.compressed() for b in bands], axis=1)
def to_y(classified_band):
return classified_band.compressed()
We then save X and y for import into KNN Parameter Tuning and then specify the number of neighbors that was found in that notebook to provide the best accuracy.
In [18]:
def save_cross_val_data(pl_filename, ground_truth_filename, metadata_filename):
train_class_band = create_contour_classified_band(pl_filename, ground_truth_filename)
X = to_X(load_refl_bands(pl_filename, metadata_filename))
y = to_y(train_class_band)
cross_val_dir = os.path.join('data', 'knn_cross_val')
if not os.path.exists(cross_val_dir):
os.makedirs(cross_val_dir)
xy_file = os.path.join(cross_val_dir, 'xy_file.npz')
np.savez(xy_file, X=X, y=y)
return xy_file
print(save_cross_val_data(train_scene_filename,
train_ground_truth_filename,
train_metadata_filename))
In [19]:
# best n_neighbors found in KNN Parameter Tuning
n_neighbors = 3
Finally, we fit the classifier.
We do not use cache for this model because the saved model is so bit (350M) and it does not take long to fit the model.
In [20]:
def fit_classifier(pl_filename, ground_truth_filename, metadata_filename, n_neighbors):
n_neighbors = 15
weights = 'uniform'
clf = neighbors.KNeighborsClassifier(n_neighbors, weights=weights)
train_class_band = create_contour_classified_band(pl_filename, ground_truth_filename)
X = to_X(load_refl_bands(pl_filename, metadata_filename))
y = to_y(train_class_band)
clf.fit(X, y)
return clf
In [25]:
clf = fit_classifier(train_scene_filename,
train_ground_truth_filename,
train_metadata_filename,
n_neighbors)
To see how well the classifier works on its own input data, we use the classifier to predict the class of pixels using just the pixel band values. How well the classifier performs is based mostly on (1) the accuracy of the train data and (2) clear separation of the pixel classes based on the input data (pixel band values). The classifier parameters (number of neighbors, distance weighting function) can be tweaked to overcome weaknesses in (2).
In [23]:
def classified_band_from_y(band_mask, y):
class_band = np.ma.array(np.zeros(band_mask.shape),
mask=band_mask.copy())
class_band[~class_band.mask] = y
return class_band
def predict(pl_filename, metadata_filename, clf):
bands = load_refl_bands(pl_filename, metadata_filename)
X = to_X(bands)
y = clf.predict(X)
return classified_band_from_y(bands[0].mask, y)
# it takes a while to run the prediction so cache the results
train_predicted_class_band = predict(train_scene_filename, train_metadata_filename, clf)
In [24]:
def imshow_class_band(ax, class_band):
"""Show classified band with legend. Alters ax in place."""
im = ax.imshow(class_band)
# https://stackoverflow.com/questions/25482876/how-to-add-legend-to-imshow-in-matplotlib
colors = [ im.cmap(im.norm(value)) for value in (0,1)]
labels = ('crop', 'non-crop')
# https://matplotlib.org/users/legend_guide.html
# tag: #creating-artists-specifically-for-adding-to-the-legend-aka-proxy-artists
patches = [mpatches.Patch(color=c, label=l) for c,l in zip(colors, labels)]
ax.legend(handles=patches, bbox_to_anchor=(1, 1), loc='upper right', borderaxespad=0.)
def plot_predicted_vs_truth(predicted_class_band, truth_class_band, figsize=(15,15)):
fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2,
sharex=True, sharey=True,
figsize=figsize)
for ax in (ax1, ax2):
ax.set_aspect('equal', adjustable='box-forced')
imshow_class_band(ax1, predicted_class_band)
ax1.set_title('Classifier Predicted Classes')
imshow_class_band(ax2, truth_class_band)
ax2.set_title('Ground Truth Classes')
plt.tight_layout()
plot_predicted_vs_truth(train_predicted_class_band,
create_contour_classified_band(train_scene_filename,
train_ground_truth_filename))
The predicted classes are close to the ground truth classes, but appear a little noisier and there are a few regions that are different.
There is one instance where they disagree, the crop region in rows 1650-2150, columns 1500-2000.
In [25]:
# Utility functions: Comparing ground truth and actual in a region
def calculate_ndvi(bands):
return (bands.nir.astype(np.float) - bands.r) / (bands.nir + bands.r)
def plot_region_compare(region_slice, predicted_class_band, class_band, img_bands):
# x[1:10:5,::-1] is equivalent to
# obj = (slice(1,10,5), slice(None,None,-1)); x[obj]
# https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows=2, ncols=2, sharex=True, sharey=True,
figsize=(10,10))
for ax in (ax1, ax2, ax3, ax4):
ax.set_aspect('equal', adjustable='box-forced')
imshow_class_band(ax1, predicted_class_band[region_slice])
ax1.set_title('Predicted Classes')
imshow_class_band(ax2, class_band[region_slice])
ax2.set_title('Ground Truth Classes')
region_bands = NamedBands(*[b[region_slice] for b in img_bands])
ax3.imshow(bands_to_display(get_rgb(region_bands)))
ax3.set_title('Planet Scene RGB')
ax4.imshow(bands_to_display(3*[calculate_ndvi(region_bands)], alpha=False))
ax4.set_title('Planet Scene NDVI')
plt.tight_layout()
In [26]:
plot_region_compare((slice(1650,2150,None), slice(600,1100,None)),
train_predicted_class_band,
create_contour_classified_band(train_scene_filename,
train_ground_truth_filename),
load_refl_bands(train_scene_filename, train_metadata_filename))
The classifier filters out the regions identified in the ground truth as 'crop' on the right side of this region. The region is brown in the RGB image and doesn't appear to be a crop. However, in the center of the region, the classifier agrees with the ground truth that this region, which appears light brown in RGB and dark in NDVI, is a 'crop'. It is unclear from visual inspection why this region is classified as a crop, and without ground truth data from the year the image was taken, it is hard to verify whether this classification is correct or not. However, it appears to be a misclassification due to errors in the ground truth data (ie the ground truth data indicates this is a 'crop' region while it is not) due to the time gap between when the ground truth data was collected and when the imagery was collected.
First, let's check out a 'noisier' region. We will zoom into rows 600-1000, columns 200-600 to take a look.
In [27]:
plt.figure(1, figsize=(8,8))
plt.imshow(bands_to_display(get_rgb(load_refl_bands(test_scene_filename, test_metadata_filename)),
alpha=True))
_ = plt.title('Test Image RGB')
In [28]:
test_predicted_class_band = predict(test_scene_filename, test_metadata_filename, clf)
In [29]:
plot_predicted_vs_truth(test_predicted_class_band,
create_contour_classified_band(test_scene_filename,
test_ground_truth_filename))
The predicted classes are close to the ground truth classes, but appear a little noisier and there are a few regions that are different.
First, let's look a a noisier region.
In [30]:
plot_region_compare((slice(600,1000,None), slice(200,600,None)),
test_predicted_class_band,
create_contour_classified_band(test_scene_filename,
test_ground_truth_filename),
load_refl_bands(test_scene_filename, test_metadata_filename))
Ah! In this region, some of the 'noisier' appearance is actually due to the ground truth data being generalized a bit and missing some of the detail that the classified image is picking up. What is really cool is that the trees along the river are classified as 'non-crop' by the classifier, while the ground truth data classified the trees as 'crop' (possibly because the trees are shading the crop there).
However, there is some noise which is due to a misclassification of some of the crop pixels as 'non-crop' (lower-left, center-right regions). Hopefully segmentation will clean up these noisy regions.
Now let's check out a 'changed' region. This is a region that was marked as a non-crop region in the ground truth data, but is not predicted to be a crop by the KNN classifier. Lets zoom into rows 200-600, columns 400-800 to take a look.
In [31]:
plot_region_compare((slice(200,600,None), slice(400,800,None)),
test_predicted_class_band,
create_contour_classified_band(test_scene_filename,
test_ground_truth_filename),
load_refl_bands(test_scene_filename, test_metadata_filename))
Wow! In the Planet scene RGB and NDVI, we can see that the crop shape has changed between when the ground truth data was collected and when the Planet scene was collected (2-year gap). The class predicted by the KNN classifier is more accurate than the ground truth class.
Although the most informative way to determine the accuracy of the classification results is to compare the images of the predicted classes to the ground truth classes, this is subjective and can't be generalized. We will fall back on classic classification metrics and see if they match with our quantitative assessment.
scikit-learn provides a convenient function for calculating and reporting classification metrics, classification_report
. Using this, we will compare the classification metrics obtained with the train and the test datasets (in order).
In [32]:
# train dataset
print(classification_report(to_y(create_contour_classified_band(train_scene_filename,
train_ground_truth_filename)),
to_y(train_predicted_class_band),
target_names=['crop', 'non-crop']))
In [33]:
# test dataset
print(classification_report(to_y(create_contour_classified_band(test_scene_filename,
test_ground_truth_filename)),
to_y(test_predicted_class_band),
target_names=['crop', 'non-crop']))
The ultimate goal of the classification of pixels is accurate segmentation of crop regions. This likely will require that each region have a majority of correctly classified pixels. Therefore, both precision and recall are important and the f1-score is a good metric for predicting accuracy.
The f1-score does drop between the train and test datasets, but considering that the classifier was trained on the 'train' dataset, the drop is to be expected. An f1-score of 0.77 is promising and is adequate to try segmentation schemes on the classified data.
In [34]:
class_band = test_predicted_class_band.astype(np.uint8)
plt.figure(1, figsize=(8,8))
plt.imshow(class_band)
_ = plt.title('Test Predicted Classes')
In [35]:
def denoise(img, plot=False):
def plot_img(img, title=None):
if plot:
figsize = 8; plt.figure(figsize=(figsize,figsize))
plt.imshow(img)
if title:
plt.title(title)
denoised = class_band.copy()
plot_img(denoised, 'Original')
denoised = cv2.medianBlur(denoised, 5)
plot_img(denoised, 'Median Blur 5x5')
return denoised
denoised = denoise(class_band, plot=True)
To convert the classified raster to vector features, we use the OpenCV findContours function. This function requires that the regions of interest have pixel values set to 1 with all other pixels have values set to 0.
In [36]:
def to_crop_image(class_band):
denoised = denoise(class_band)
# crop pixels are 1, non-crop pixels are zero
crop_image = np.zeros(denoised.shape, dtype=np.uint8)
crop_image[(~class_band.mask) & (denoised == 0)] = 1
return crop_image
crop_image = to_crop_image(class_band)
plt.figure(figsize=(8,8)); plt.imshow(crop_image)
_ = plt.title('Crop regions set to 1 (yellow)')
In [37]:
def get_crop_contours(img):
"""Identify contours that represent crop segments from a binary image.
The crop pixel values must be 1, non-crop pixels zero.
"""
# http://docs.opencv.org/trunk/d3/dc0/group__imgproc__shape.html#ga17ed9f5d79ae97bd4c7cf18403e1689a
_, contours, _ = cv2.findContours(img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
def is_region(contour):
num_points = contour.shape[0]
return num_points >= 3
return [c for c in contours
if is_region(c)]
contours = get_crop_contours(crop_image)
print(len(contours))
In [38]:
figsize = 8; plt.figure(figsize=(figsize,figsize))
plt.imshow(draw_contours(bands_to_display(3*[np.ma.array(denoised, mask=class_band.mask)],
alpha=False),
contours,
thickness=4))
_ = plt.title('Segmented Crop Outlines')
figsize = 8; plt.figure(figsize=(figsize,figsize))
plt.imshow(draw_contours(bands_to_display(get_rgb(load_refl_bands(test_scene_filename,
test_metadata_filename)),
alpha=False),
contours,
thickness=-1))
_ = plt.title('Segmented Crop Regions')
In [39]:
# Utility functions: contours to features
def contour_to_polygon(contour, image_transform):
"""Convert an OpenCV contour to a rasterio Polygon.
image_transform is typically obtained from `img.transform` where
img is obtained from `rasterio.open()
:param contour: OpenCV contour
:param rasterio.Affine image_transform: image transformation"""
# get list of x and y coordinates from contour
# contour: np.array(<list of points>, dtype=np.int32)
# contour shape: (<number of points>, 1, 2)
# point: (col, row)
points_shape = contour.shape
# get rid of the 1-element 2nd axis then convert 2d shape to a list
cols_rows = contour.reshape((points_shape[0],points_shape[2])).tolist()
cols, rows = zip(*cols_rows) # inverse of zip
# convert rows/cols to x/y
offset = 'ul' # OpenCV: contour point [0,0] is the upper-left corner of pixel 0,0
points_x, points_y = rasterio.transform.xy(image_transform, rows, cols, offset=offset)
# create geometry from series of points
points_xy = zip(points_x, points_y)
return Polygon(points_xy)
def project_features_from_srs(features, img_srs, dst_srs='epsg:4326'):
"""Project features from img_srs.
If dst_srs is not specified, WGS84 (only geojson-supported crs) is assumed.
:param features: list of geojson features to be projected
:param str img_srs: source spatial reference system
:param str dst_srs: destination spatial reference system
"""
# define projection
# from shapely [docs](http://toblerity.org/shapely/manual.html#shapely.ops.transform)
proj_fcn = partial(
pyproj.transform,
pyproj.Proj(init=img_srs),
pyproj.Proj(init=dst_srs))
return [project_feature(f, proj_fcn) for f in features]
def polygon_to_feature(polygon):
return {'type': 'Feature',
'properties': {},
'geometry': mapping(polygon)}
def contours_to_features(contours, pl_filename):
"""Convert contours to geojson features.
Features are simplified with a tolerance of 3 pixels then
filtered to those with 100x100m2 area.
"""
with rasterio.open(pl_filename) as img:
img_srs = img.crs['init']
img_transform = img.transform
polygons = [contour_to_polygon(c, img_transform)
for c in contours]
tolerance = 3
simplified_polygons = [p.simplify(tolerance, preserve_topology=True)
for p in polygons]
def passes_filters(p):
min_area = 50 * 50
return p.area > min_area
features = [polygon_to_feature(p)
for p in simplified_polygons
if passes_filters(p)]
return project_features_from_srs(features, img_srs)
features = contours_to_features(contours, test_scene_filename)
print(len(features))
In [40]:
# Create crop feature layer
feature_collection = {
"type": "FeatureCollection",
"features": features,
"properties": {"style":{
'weight': 0,
'fillColor': "blue",
'fillOpacity': 1}}
}
feature_layer = ipyl.GeoJSON(data=feature_collection)
In [41]:
# Initialize map using parameters from above map
# and deleting map instance if it exists
try:
del crop_map
except NameError:
pass
zoom = 13
center = [38.30839, -121.55187] # lat/lon
In [42]:
# Create map, adding box drawing controls
# Reuse parameters if map already exists
try:
center = crop_map.center
zoom = crop_map.zoom
except NameError:
pass
# Change tile layer to one that makes it easier to see crop features
# Layer selected using https://leaflet-extras.github.io/leaflet-providers/preview/
map_tiles = ipyl.TileLayer(url='http://{s}.basemaps.cartocdn.com/light_all/{z}/{x}/{y}.png')
crop_map = ipyl.Map(
center=center,
zoom=zoom,
default_tiles = map_tiles
)
crop_map.add_layer(feature_layer)
# Display map
crop_map
Alright! Those features look like they are georeferenced correctly and look reasonably 'crop-like'.
In this notebook, we have trained a KNN classifier to predict crop/non-crop pixels on one Planet image based on 2-year-old ground truth data, and we used that classifier to predict crop/non-crop pixels in another Planet image. We then segmented the classified image and created georeferenced geojson features that outline the predicted crops. It's been quite a journey and the results are quite promising!