This notebook is part of the Crop Segmentation and Classification notebook project. In this notebook, we identify the ground truth data, define the area of interest, and identify and download imagery data for use in crop segmentation and classification. We do this for two sets of data: a training set, which we use to develop our segmentation and classification algorithms, and a testing set, which we use to test the accuracy of the resulting segmentation and classification.
The general sections are:
1. Obtain and filter the ground truth data
2. Identify the areas of interest
3. Identify the Planet imagery to download
Ground truth crop type and boundary data is not easy to come by. Therefore, the area and time of interest for this problem is primarily defined by the availability of ground truth data. The 2015 Sacramento County Land Use DWR Survey Dataset is a free dataset covering Sacramento county in 2015. It provides hand-adjusted field boundaries along with crop types.
The primary satellite imagery we will use in this study is SSO Planetscope 2 imagery. We will use the Analytic Radiance Ortho Product, which is 4-band (Blue, Green, Red, Near-IR) and radiometrically corrected to at-sensor radiance. Correction to at-sensor radiance removes any variation in imagery between different satellites and allows for calculating vegetative indices. Further, the coefficients for correcting to at-sensor reflectance are provided in the scene metadata, which further improves the consistency between images taken at different times.
SSO Planetscope 2 satellites were launched Feb 14, 2017 (news release), therefore they did not image Sacramento county in 2015. Although at this time we are focusing on PlanetScope imagery, in the future we may use Landsat 8 imagery as a bridge between 2015 and 2017.
This notebook was developed in a Docker container. This Dockerfile was used to build the image.
The user-specific Planet API key must be stored in the environment variable $PL_API_KEY. To pass this key into the Docker container, add -e PL_API_KEY=your_api_key when you call docker run (replacing 'your_api_key' with your API key).
Python dependencies are tracked in the conda environment files root.yml (for Python 3 dependencies) and python2.yml (for Python 2 dependencies and the conda environment that is used in this notebook).
In [1]:
# Notebook dependencies
import copy
import datetime
from functools import partial
import json
import os
import zipfile
import ipyleaflet as ipyl
import ipywidgets as ipyw
from IPython.display import display, Image
import fiona
import pandas as pd
from planet import api
from planet.api import filters
import pyproj
import shapely
from shapely.geometry import shape, mapping
from shapely.ops import transform
In this section we will download the ground truth data, filter it to crop features, and save it as a geojson dataset (geojson is our preferred format for processing in these notebooks).
We first start with the shapefile zip file downloaded from http://www.water.ca.gov/landwateruse/docs/landusedata/shapes/15sa.zip.

To download this file yourself, run the following in the pre-data directory:
$> wget https://water.ca.gov/-/media/DWR-Website/Web-Pages/Programs/Water-Use-And-Efficiency/Land-And-Water-Use/Land-Use-Surveys/Files/2015/15sa.zip
In [36]:
data_dir = 'data'
predata_dir = 'pre-data'
shapefile_zip = os.path.join(predata_dir, '15sa.zip')
shapefile_dir = os.path.join(data_dir, 'dwr_survey')
In [37]:
with zipfile.ZipFile(shapefile_zip, 'r') as zip_ref:
    zip_ref.extractall(shapefile_dir)
In [38]:
# Specify the shapefile location and ensure it indeed exists
survey_shapefile = os.path.join(shapefile_dir, 'SA15.shp')
assert os.path.isfile(survey_shapefile)
The data is provided as a shapefile. It is easier to process the data as geojson. Therefore, we will convert the data to geojson. Additionally, the data contains polygons that aren't crops. Since we are only interested in crops, we will filter the data to only the crop polygons.
We will use fiona to load the shapefile and shapely to manage the geometries.
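For reference, the target format looks like this: a minimal, illustrative GeoJSON feature (the coordinates and the 'CLASS1' value here are made up for the example):

```python
import json

# A minimal GeoJSON feature of the kind we will produce.
# Coordinates and the CLASS1 value are illustrative only.
feature = {
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[-121.50, 38.30], [-121.50, 38.35],
                         [-121.40, 38.35], [-121.40, 38.30],
                         [-121.50, 38.30]]]
    },
    "properties": {"CLASS1": "V"}
}

# GeoJSON is just JSON, so serialization is a one-liner
print(json.dumps(feature)[:30])
```

Because GeoJSON is plain JSON, features round-trip losslessly through `json.dumps`/`json.loads`, which makes it convenient for the processing in these notebooks.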
In [39]:
with fiona.open(survey_shapefile, 'r') as dataset:
    src_proj = dataset.crs['init']
print(src_proj)
It turns out the source projection is EPSG:26910, while GeoJSON only supports EPSG:4326. We will need to reproject the shapes.
In [40]:
# define projection
# from shapely [docs](http://toblerity.org/shapely/manual.html#shapely.ops.transform)
def define_to_wkt_projection(dataset):
    """dataset is obtained from fiona.open(file)"""
    src_proj = dataset.crs['init']
    dst_proj = 'epsg:4326'

    project_to_wkt = partial(
        pyproj.transform,
        pyproj.Proj(init=src_proj),
        pyproj.Proj(init=dst_proj))
    return project_to_wkt

def project_feature(feat, projection_fcn):
    g1 = shape(feat['geometry'])
    g2 = transform(projection_fcn, g1)
    feat['geometry'] = mapping(g2)
The survey data has attributes that provide the crop type. These attributes are described in a pdf distributed with the shapefile. It was unzipped along with the shapefile files and is located at data/dwr_survey/09legend.pdf.
We are interested in the agricultural classes. Class is specified by the 'CLASS1' attribute of the feature.

The agricultural class labels and descriptions are:
- G: Grain and hay crops
- R: Rice
- F: Field crops
- P: Pasture
- T: Truck, nursery, and berry crops
- D: Deciduous fruits and nuts
- C: Citrus and subtropical
- V: Vineyards
In [41]:
# Class ids from dwr_survey/09legend.pdf
agg_classes = ['G', 'R', 'F', 'P', 'T', 'D', 'C', 'V']

def is_agricultural(feat):
    return feat['properties']['CLASS1'] in agg_classes
In [42]:
def load_ground_truth(filename):
    features = []
    with fiona.open(filename) as survey_data:
        to_wkt_projection = define_to_wkt_projection(survey_data)
        for feat in survey_data:
            if is_agricultural(feat):
                project_feature(feat, to_wkt_projection)
                features.append(feat)
    return features

features = load_ground_truth(survey_shapefile)
print(len(features))
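As a quick sanity check on what was loaded, we can tally features per class. This sketch uses a stand-in list (`sample_features`, with the same 'properties' → 'CLASS1' structure as the loaded data) so as not to clobber the real `features` variable:

```python
from collections import Counter

# Stand-in for the features loaded above; real features carry the
# same 'properties' -> 'CLASS1' structure
sample_features = [{'properties': {'CLASS1': c}} for c in ['V', 'V', 'G', 'D']]

# Tally how many features fall in each crop class
counts = Counter(f['properties']['CLASS1'] for f in sample_features)
print(counts.most_common())  # → [('V', 2), ('G', 1), ('D', 1)]
```

Running the same tally over the real `features` gives a feel for how balanced the classes are before we pick AOIs.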
In this section we will identify an area of interest (AOI) for each dataset.
Selection of the areas of interest for our study is based primarily on the availability of ground truth data and of Planet imagery.

We visualize the ground truth data in an interactive window that allows definition of the AOI. Then we query the Planet API to determine the availability of imagery. After using this interactive visualization, we identified two AOIs, which are defined in the notebook as aoi_test and aoi_train. Next, we identify the Planet scenes we want to download for our study.
In [43]:
# Assign colors to classes
# colors determined using [colorbrewer2.org](http://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3)
colors = ['#ffffd9','#edf8b1','#c7e9b4','#7fcdbb','#41b6c4','#1d91c0','#225ea8','#0c2c84']
class_colors = dict((a,c) for a,c in zip(agg_classes, colors))
def get_color(cls):
    return class_colors[cls]
In [44]:
# Create crop feature layer
feature_collection = {
    "type": "FeatureCollection",
    "features": features
}

for f in feature_collection['features']:
    feature_color = get_color(f['properties']['CLASS1'])
    f['properties']['style'] = {
        'weight': 0,
        'fillColor': feature_color,
        'fillOpacity': 1}

feature_layer = ipyl.GeoJSON(data=feature_collection)
In [45]:
# Initialize map using parameters from above map
# and deleting map instance if it exists
try:
    del aoi_map
except NameError:
    pass

zoom = 11
center = [38.3586252, -121.3853994]  # lat/lon
In [46]:
# Create map, adding box drawing controls
# Reuse parameters if map already exists
try:
    center = aoi_map.center
    zoom = aoi_map.zoom
except NameError:
    pass

# Change tile layer to one that makes it easier to see crop features
# Layer selected using https://leaflet-extras.github.io/leaflet-providers/preview/
map_tiles = ipyl.TileLayer(url='http://{s}.basemaps.cartocdn.com/light_all/{z}/{x}/{y}.png')
aoi_map = ipyl.Map(
    center=center,
    zoom=zoom,
    default_tiles=map_tiles
)

aoi_map.add_layer(feature_layer)

# Add box drawing control
# refs:
# https://github.com/kscottz/PythonFromSpace/blob/master/TheBasics.ipynb
# https://github.com/ellisonbg/ipyleaflet/blob/master/examples/DrawControl.ipynb
dc = ipyl.DrawControl(
    polygon={}, polyline={},  # disable polygons and polylines
    rectangle={'shapeOptions': {'color': 'blue'}}
)

# When a box is drawn, update the label with the number of unique classes
# and save the box geometry as AOI
total_unique_classes = len(set([f['properties']['CLASS1'] for f in features]))
label = ipyw.Label(layout=ipyw.Layout(width='100%'))
aois = []

def handle_draw(self, action, geo_json):
    if action == 'created':
        box_shape = shape(geo_json['geometry'])
        contained_features = [f for f in features
                              if shape(f['geometry']).within(box_shape)]
        unique_classes = set([f['properties']['CLASS1'] for f in contained_features])
        label.value = '{} unique classes out of {} total'.format(
            len(unique_classes), total_unique_classes)
        aois.append(geo_json)
    elif action == 'deleted':
        aois.remove(geo_json)

dc.on_draw(handle_draw)
aoi_map.add_control(dc)

# Display map and label
ipyw.VBox([aoi_map, label])
From this map, we have identified two potential AOIs: one for a training dataset and one for a testing dataset.
In [47]:
# Run this to use cached aois
aoi_train = {u'geometry': {u'type': u'Polygon', u'coordinates': [[[-121.58460974693298, 38.29170496647727], [-121.58460974693298, 38.32726528409606], [-121.5248715877533, 38.32726528409606], [-121.5248715877533, 38.29170496647727], [-121.58460974693298, 38.29170496647727]]]}, u'type': u'Feature', u'properties': {u'style': {u'opacity': 0.5, u'noClip': False, u'weight': 4, u'fillColor': None, u'color': u'blue', u'lineCap': None, u'stroke': True, u'smoothFactor': 1, u'dashArray': None, u'fillOpacity': 0.2, u'clickable': True, u'lineJoin': None, u'fill': True}}}
aoi_test = {u'geometry': {u'type': u'Polygon', u'coordinates': [[[-121.3113248348236, 38.28911976564886], [-121.3113248348236, 38.34622533958], [-121.2344205379486, 38.34622533958], [-121.2344205379486, 38.28911976564886], [-121.3113248348236, 38.28911976564886]]]}, u'type': u'Feature', u'properties': {u'style': {u'opacity': 0.5, u'fillOpacity': 0.2, u'noClip': False, u'weight': 4, u'color': u'blue', u'lineCap': None, u'dashArray': None, u'smoothFactor': 1, u'stroke': True, u'fillColor': None, u'clickable': True, u'lineJoin': None, u'fill': True}}}
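For orientation, the bounding box of a cached AOI can be pulled straight out of its coordinate ring. This is a hypothetical helper, not part of the original workflow; it is demonstrated on a unit-square feature with illustrative coordinates:

```python
def bounding_box(feature):
    """Return (min_lon, min_lat, max_lon, max_lat) of a GeoJSON Polygon feature."""
    ring = feature['geometry']['coordinates'][0]
    lons = [pt[0] for pt in ring]
    lats = [pt[1] for pt in ring]
    return min(lons), min(lats), max(lons), max(lats)

# e.g. with a unit-square feature (illustrative values)
square = {'geometry': {'type': 'Polygon',
                       'coordinates': [[[0, 0], [0, 1], [1, 1], [1, 0], [0, 0]]]}}
print(bounding_box(square))  # → (0, 0, 1, 1)
```

Applied to `aoi_train` or `aoi_test`, this gives the lon/lat extent of the box drawn on the map.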
How many Planet images cover the AOIs defined above?

We will answer this question by querying the Planet API. The client is how we interact with the Planet API. It is created with the user-specific API key, which is pulled from the $PL_API_KEY environment variable.

Much of this code is pulled from PythonFromSpace/TheBasics.ipynb.
In [48]:
planet_api_key = os.environ['PL_API_KEY']
# quick check that key is defined
assert planet_api_key, "PL_API_KEY not defined."
client = api.ClientV1(api_key=planet_api_key)
Query Planet API
Filter to scenes that contain AOI. If the number is zero, go back and redefine the AOI to be smaller.
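The `planet.api.filters` helpers used below are conveniences that serialize to the Data API's JSON filter format. The same query can be sketched by hand (filter type names per the v1 Data API; the geometry here is a placeholder, and the cloud_cover bound mirrors the value used in the cell below):

```python
# Hand-built equivalent of the filter the helpers construct below.
# The geometry is a placeholder triangle; dates match the query below.
query = {
    "type": "AndFilter",
    "config": [
        {"type": "GeometryFilter", "field_name": "geometry",
         "config": {"type": "Polygon",
                    "coordinates": [[[0, 0], [0, 1], [1, 1], [0, 0]]]}},
        {"type": "RangeFilter", "field_name": "cloud_cover",
         "config": {"lt": 75}},
        {"type": "DateRangeFilter", "field_name": "acquired",
         "config": {"gt": "2016-06-01T00:00:00Z", "lt": "2016-10-01T00:00:00Z"}},
    ]
}
print(query["type"])
```

Seeing the raw JSON makes it easier to debug a search request, since this dict is exactly what gets POSTed to the quick-search endpoint.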
In [49]:
# build a query using the AOI and
# a cloud_cover filter that excludes cloudy scenes
def query_by_aoi(aoi):
    old = datetime.datetime(year=2016, month=6, day=1)
    new = datetime.datetime(year=2016, month=10, day=1)

    search_aoi = aoi['geometry']
    query = filters.and_filter(
        filters.geom_filter(search_aoi),
        filters.range_filter('cloud_cover', lt=75),
        filters.date_range('acquired', gt=old),
        filters.date_range('acquired', lt=new)
    )

    # build a request for only PlanetScope imagery
    request = filters.build_search_request(
        query, item_types=['PSScene4Band']
    )

    # run search
    # if you don't have an API key configured, this will raise an exception
    result = client.quick_search(request)
    scenes = []
    planet_map = {}
    for item in result.items_iter(limit=500):
        planet_map[item['id']] = item
        props = item['properties']
        props["id"] = item['id']
        props["geometry"] = item["geometry"]
        props["thumbnail"] = item["_links"]["thumbnail"]
        scenes.append(props)
    scenes = pd.DataFrame(data=scenes)

    aoi_shape = shape(search_aoi)
    footprints = []
    overlaps = []
    # go through the geometry from our api call, convert to a shape and calculate overlap area.
    # also save the shape for safe keeping
    for footprint in scenes["geometry"].tolist():
        s = shape(footprint)
        footprints.append(s)
        overlap = 100.0 * (aoi_shape.intersection(s).area / aoi_shape.area)
        overlaps.append(overlap)

    # take our lists and add them back to our dataframe
    scenes['overlap'] = pd.Series(overlaps, index=scenes.index)
    scenes['footprint'] = pd.Series(footprints, index=scenes.index)

    # keep only scenes whose footprint covers more than 99% of the AOI
    full_coverage = scenes["overlap"] > 99
    good_scenes = scenes[full_coverage]
    return good_scenes

scenes_test = query_by_aoi(aoi_test)
print(len(scenes_test))

scenes_train = query_by_aoi(aoi_train)
print(len(scenes_train))
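The coverage test inside `query_by_aoi` keeps scenes whose footprint overlaps more than 99% of the AOI. The same percentage can be illustrated without shapely for axis-aligned boxes (a simplified sketch; real footprints are arbitrary polygons):

```python
def rect_overlap_pct(aoi, scene):
    """Percent of the AOI box covered by the scene box.
    Boxes are (min_x, min_y, max_x, max_y)."""
    # width and height of the intersection rectangle (0 if disjoint)
    ix = max(0.0, min(aoi[2], scene[2]) - max(aoi[0], scene[0]))
    iy = max(0.0, min(aoi[3], scene[3]) - max(aoi[1], scene[1]))
    aoi_area = (aoi[2] - aoi[0]) * (aoi[3] - aoi[1])
    return 100.0 * (ix * iy) / aoi_area

# scene fully contains the AOI -> 100% overlap
print(rect_overlap_pct((0, 0, 1, 1), (-1, -1, 2, 2)))   # → 100.0
# scene covers only the left half of the AOI -> 50%
print(rect_overlap_pct((0, 0, 1, 1), (-1, -1, 0.5, 2)))  # → 50.0
```

Normalizing by the AOI area (not the scene area) is what makes this a coverage measure: a huge scene that fully contains the AOI scores 100%, regardless of how much of the scene lies outside it.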
Visualize a subset of the scene thumbnails, then select one.

If all you see are broken image icons, click one of the URLs and log into the site.
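Alternatively, thumbnails can be fetched programmatically: the Planet API accepts HTTP basic auth with the API key as the username and an empty password. Building the Authorization header looks like this (no network call here; `api_key` is a placeholder — use os.environ['PL_API_KEY'] in practice):

```python
import base64

api_key = "your_api_key"  # placeholder

# HTTP basic auth encodes base64("<username>:<password>");
# for the Planet API the key is the username and the password is empty
token = base64.b64encode('{}:'.format(api_key).encode('ascii')).decode('ascii')
headers = {"Authorization": "Basic " + token}
print(headers["Authorization"][:6])  # → Basic
```

These headers can then be passed to any HTTP client when requesting a thumbnail URL.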
In [50]:
def display_thumbnails(scenes, limit=10):
    for thumb_url in scenes['thumbnail'].tolist()[:limit]:
        img_image = Image(url=thumb_url)
        display(img_image)
        print(thumb_url)

# Uncomment one of the lines below to display the thumbnails
# display_thumbnails(scenes_train, limit=5)
# display_thumbnails(scenes_test, limit=10)
For the train scenes, https://api.planet.com/data/v1/item-types/PSScene4Band/items/20160831_180231_0e0e/thumb looks pretty good. It is from August 2016 and looks clear of clouds.
For the test scenes, https://api.planet.com/data/v1/item-types/PSScene4Band/items/20160831_180257_0e26/thumb looks great. It is from the same day as the train scene.
In [51]:
train_thumbnail = 'https://api.planet.com/data/v1/item-types/PSScene4Band/items/20160831_180231_0e0e/thumb'
display(Image(url=train_thumbnail))
test_thumbnail = 'https://api.planet.com/data/v1/item-types/PSScene4Band/items/20160831_180257_0e26/thumb'
display(Image(url=test_thumbnail))
In [52]:
def save_geojson(features, filename):
    with open(filename, "w") as f:
        f.write(json.dumps(features))
In [53]:
# save train AOI
save_geojson(aoi_train, os.path.join(predata_dir, 'aoi-train.geojson'))
# save test
save_geojson(aoi_test, os.path.join(predata_dir, 'aoi-test.geojson'))
# save ground truth data
save_geojson(features, os.path.join(predata_dir, 'ground-truth.geojson'))
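A quick round-trip check confirms that saved files parse back to the same structures. This sketch uses a temp directory and a stand-in feature rather than the real outputs:

```python
import json
import os
import tempfile

def save_geojson_checked(features, filename):
    # same write as save_geojson above, plus a read-back verification
    with open(filename, "w") as f:
        f.write(json.dumps(features))
    with open(filename) as f:
        assert json.load(f) == features

# stand-in feature; the real call sites pass aoi_train, aoi_test, and features
stand_in = {"type": "Feature", "geometry": None, "properties": {}}
path = os.path.join(tempfile.mkdtemp(), "check.geojson")
save_geojson_checked(stand_in, path)
print('ok')  # → ok
```

Since later notebooks in this project load these geojson files, a cheap verification at save time catches serialization problems early.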