Identify Datasets

This notebook is part of the Crop Segmentation and Classification notebook project. In this notebook, we identify the ground truth data, define the area of interest, and identify and download imagery data for use in crop segmentation and classification. We do this for two sets of data: a training set, which we will use to develop our segmentation and classification algorithms, and a testing set, which we will use to test the accuracy of the resulting segmentation and classification.

The general sections are:

  1. Explore Ground Truth Data
  2. Identify Area of Interest
  3. Download Planet Scene

Ground truth crop type and boundary data is not easy to come by. Therefore, the area and time of interest for this problem are primarily defined by the availability of ground truth data. The 2015 Sacramento County Land Use DWR Survey Dataset is a free dataset covering Sacramento County in 2015. It provides hand-adjusted field boundaries and crop types.

The primary satellite imagery we will use in this study is SSO Planetscope 2 imagery. We will use the Analytic Radiance Ortho Product, which is 4-band (Blue, Green, Red, Near-IR) and radiometrically corrected to at-sensor radiance. Correction to at-sensor radiance reduces variation in imagery between different satellites and allows for calculating vegetative indices. Further, the coefficients for correcting to at-sensor reflectance are provided in the scene metadata, which further improves the consistency between images taken at different times.
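As a sketch of that correction, each band's pixel values are scaled by the band's reflectance coefficient from the scene metadata. The coefficient values below are hypothetical placeholders for illustration, not real Planet metadata:

```python
# Sketch: scaling at-sensor radiance to top-of-atmosphere reflectance.
# The per-band coefficients below are HYPOTHETICAL placeholders; real values
# are read from each scene's metadata (one coefficient per band).
reflectance_coefficients = {
    'blue': 2.0e-5, 'green': 2.1e-5, 'red': 2.2e-5, 'nir': 2.3e-5}

def radiance_to_reflectance(radiance_dn, band):
    """Scale a radiance pixel value by its band's reflectance coefficient."""
    return radiance_dn * reflectance_coefficients[band]

print(radiance_to_reflectance(5000, 'red'))
```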

SSO Planetscope 2 satellites were launched Feb 14, 2017 (news release), so they did not image Sacramento County in 2015. Although at this time we are focusing on PlanetScope imagery, in the future we may use Landsat 8 imagery as a bridge between 2015 and 2017.

Usage Notes

This notebook was developed in a Docker container. This Dockerfile was used to build the image.

The user-specific Planet API key must be stored in the environment variable $PL_API_KEY. To pass this key into the Docker container, add -e PL_API_KEY=your_api_key when you call docker run (replace 'your_api_key' with your API key).

Python dependencies are tracked in the conda environment files root.yml (for Python 3 dependencies) and python2.yml (for Python 2 dependencies and the conda environment used in this notebook).


In [1]:
# Notebook dependencies
import copy
import datetime
from functools import partial
import json
import os
import zipfile

import ipyleaflet as ipyl
import ipywidgets as ipyw
from IPython.display import display, Image
import fiona
import pandas as pd
from planet import api
from planet.api import filters
import pyproj
import shapely
from shapely.geometry import shape, mapping
from shapely.ops import transform

Explore Ground Truth Data

In this section we will download the ground truth data, filter it to crop features, and save it as a geojson dataset (geojson is our preferred format for processing in these notebooks).
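For reference, a minimal GeoJSON Feature of the kind we will produce looks like the following. The coordinates and class code here are illustrative, not real survey data:

```python
import json

# Illustrative GeoJSON Feature: a small polygon with a CLASS1 crop attribute
feature = {
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[-121.58, 38.29], [-121.58, 38.33],
                         [-121.52, 38.33], [-121.52, 38.29],
                         [-121.58, 38.29]]]},
    "properties": {"CLASS1": "G"}
}

# GeoJSON is just JSON, so it serializes directly
print(json.dumps(feature)[:40])
```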

Unzip data

We start with the shapefile zip file downloaded from http://www.water.ca.gov/landwateruse/docs/landusedata/shapes/15sa.zip.

To download this file yourself, in the pre-data directory, run:

$> wget https://water.ca.gov/-/media/DWR-Website/Web-Pages/Programs/Water-Use-And-Efficiency/Land-And-Water-Use/Land-Use-Surveys/Files/2015/15sa.zip

In [36]:
data_dir = 'data'
predata_dir = 'pre-data'

shapefile_zip = os.path.join(predata_dir, '15sa.zip')
shapefile_dir = os.path.join(data_dir, 'dwr_survey')

In [37]:
with zipfile.ZipFile(shapefile_zip, 'r') as zip_ref:
    zip_ref.extractall(shapefile_dir)

In [38]:
# Specify the shapefile location and ensure it indeed exists
survey_shapefile = os.path.join(shapefile_dir, 'SA15.shp')
assert os.path.isfile(survey_shapefile)

Prepare data

The data is provided as a shapefile. It is easier to process the data as geojson. Therefore, we will convert the data to geojson. Additionally, the data contains polygons that aren't crops. Since we are only interested in crops, we will filter the data to only the crop polygons.

We will use fiona to load the shapefile, shapely to manage the geometries, and pyproj to define the reprojection.

Reproject to WGS84

What is the coordinate reference system for this dataset?


In [39]:
with fiona.open(survey_shapefile, 'r') as survey_data:
    src_proj = survey_data.crs['init']
print(src_proj)


epsg:26910

It turns out to be EPSG:26910 (NAD83 / UTM zone 10N). GeoJSON only supports EPSG:4326 (WGS84), so we will need to reproject the shapes.


In [40]:
# define projection
# from shapely [docs](http://toblerity.org/shapely/manual.html#shapely.ops.transform)
def define_to_wkt_projection(dataset):
    """dataset is obtained from fiona.open(file)"""
    src_proj = dataset.crs['init']
    dst_proj = 'epsg:4326'

    project_to_wkt = partial(
        pyproj.transform,
        pyproj.Proj(init=src_proj),
        pyproj.Proj(init=dst_proj))
    return project_to_wkt

def project_feature(feat, projection_fcn):
    g1 = shape(feat['geometry'])
    g2 = transform(projection_fcn, g1)
    feat['geometry'] = mapping(g2)

Filter to agricultural classes

The survey data has attributes that provide the crop type. These attributes are described in a pdf distributed with the shapefile. It was unzipped along with the shapefile files and is located at data/dwr_survey/09legend.pdf.

We are interested in the agricultural classes. Class is specified by the 'CLASS1' attribute of the feature.

The agricultural class label and descriptions are:

  • G: Grain and Hay Crops
  • R: Rice
  • F: Field Crops
  • P: Pasture
  • T: Truck, Nursery, and Berry Crops
  • D: Deciduous Fruits and Nuts
  • C: Citrus and Tropical
  • V: Vineyards

In [41]:
# Class ids from dwr_survey/09legend.pdf
agg_classes = ['G', 'R', 'F', 'P', 'T', 'D', 'C', 'V']

def is_agricultural(feat):
    return feat['properties']['CLASS1'] in agg_classes

Finally: Load data

Load the ground truth data into a list of geojson features, filtering to only agricultural classes and reprojecting to WGS84. Because this process takes a while, we save the loaded features for later use.
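One way to sketch that caching is a load-or-compute helper. The cache path and the stand-in compute function below are illustrative; the notebook itself saves the features with save_geojson at the end:

```python
import json
import os
import tempfile

def load_or_compute(cache_path, compute_fn):
    """Return cached GeoJSON features if present; otherwise compute and cache."""
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f)
    features = compute_fn()
    with open(cache_path, 'w') as f:
        json.dump(features, f)
    return features

# Usage sketch: the lambda stands in for load_ground_truth(survey_shapefile)
cache = os.path.join(tempfile.mkdtemp(), 'ground-truth.geojson')
feats = load_or_compute(cache, lambda: [{'type': 'Feature'}])
print(len(feats))
```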


In [42]:
def load_ground_truth(filename):
    features = []
    with fiona.open(filename) as survey_data:
        to_wkt_projection = define_to_wkt_projection(survey_data)
        for feat in survey_data:
            if is_agricultural(feat):
                project_feature(feat, to_wkt_projection)
                features.append(feat)
    return features

features = load_ground_truth(survey_shapefile)
print(len(features))


7429

Identify Area of Interest

In this section we will identify an area of interest (aoi) for each of the training and testing datasets.

Selection of the area of interest for our study is based on the following:

  1. compact representation of many crop classes
  2. availability of imagery
  3. as large as possible but smaller than a planet image (to allow for in-scene analysis)

We visualize the ground truth data in an interactive window that allows definition of the aoi. Then we query the Planet API to determine the availability of imagery. After using this interactive visualization, we identified two AOIs, which are defined in the notebook as aoi_test and aoi_train. Next, we identify the Planet scene we want to download for our study.

Criteria 1: compact representation of many crop classes

Let's start by identifying a region of compact representation of many crop classes. We will do so by drawing a box (the aoi) over the map of the crops and then displaying the number of unique classes represented in the box.
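The counting logic can be sketched with point centroids and a plain bounding box. This is a simplified stand-in: the handler in the notebook checks full polygons with shapely's within() instead, and the sample features below are made up:

```python
# Sketch: count unique crop classes whose (made-up) centroid falls in a box.
def classes_in_box(features, box):
    """box is (min_lon, min_lat, max_lon, max_lat); each feature in this
    sketch carries a 'centroid' (lon, lat) and a 'CLASS1' code."""
    found = set()
    for f in features:
        lon, lat = f['centroid']
        if box[0] <= lon <= box[2] and box[1] <= lat <= box[3]:
            found.add(f['CLASS1'])
    return found

sample = [{'centroid': (-121.5, 38.3), 'CLASS1': 'G'},
          {'centroid': (-121.5, 38.31), 'CLASS1': 'R'},
          {'centroid': (-120.0, 38.3), 'CLASS1': 'V'}]
print(sorted(classes_in_box(sample, (-121.6, 38.2, -121.4, 38.4))))  # → ['G', 'R']
```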


In [43]:
# Assign colors to classes
# colors determined using [colorbrewer2.org](http://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3)
colors = ['#ffffd9','#edf8b1','#c7e9b4','#7fcdbb','#41b6c4','#1d91c0','#225ea8','#0c2c84']
class_colors = dict((a,c) for a,c in zip(agg_classes, colors))

def get_color(cls):
    return class_colors[cls]

In [44]:
# Create crop feature layer
feature_collection = {
    "type": "FeatureCollection",
    "features": features
}

for f in feature_collection['features']:
    feature_color = get_color(f['properties']['CLASS1'])
    f['properties']['style'] = {
        'weight': 0,
        'fillColor': feature_color,
        'fillOpacity': 1}

feature_layer = ipyl.GeoJSON(data=feature_collection)

In [45]:
# Initialize map using parameters from above map
# and deleting map instance if it exists
try:
    del aoi_map
except NameError:
    pass


zoom = 11
center = [38.3586252, -121.3853994] # lat/lon

In [46]:
# Create map, adding box drawing controls
# Reuse parameters if map already exists
try:
    center = aoi_map.center
    zoom = aoi_map.zoom
except NameError:
    pass

# Change tile layer to one that makes it easier to see crop features
# Layer selected using https://leaflet-extras.github.io/leaflet-providers/preview/
map_tiles = ipyl.TileLayer(url='http://{s}.basemaps.cartocdn.com/light_all/{z}/{x}/{y}.png')
aoi_map = ipyl.Map(
        center=center, 
        zoom=zoom,
        default_tiles = map_tiles
    )

aoi_map.add_layer(feature_layer)  

# Add box drawing control
# refs:
# https://github.com/kscottz/PythonFromSpace/blob/master/TheBasics.ipynb
# https://github.com/ellisonbg/ipyleaflet/blob/master/examples/DrawControl.ipynb
rectangle = {'shapeOptions': {'color': 'blue'}}
dc = ipyl.DrawControl(
    polygon={}, polyline={}, # disable polygons and polylines
    rectangle=rectangle
)

# When a box is drawn, update the label with the number of unique classes
# and save the box geometry as AOI
total_unique_classes = len(set([f['properties']['CLASS1'] for f in features]))
label = ipyw.Label(layout=ipyw.Layout(width='100%'))

aois = []
def handle_draw(self, action, geo_json):
    if action == 'created':
        box_shape = shape(geo_json['geometry'])
        contained_features = [f for f in features
                              if shape(f['geometry']).within(box_shape)]
        unique_classes = set([f['properties']['CLASS1'] for f in contained_features])
        label.value = '{} unique classes out of {} total'.format(
            len(unique_classes), total_unique_classes)
        aois.append(geo_json)
    elif action == 'deleted':
        aois.remove(geo_json)
    
dc.on_draw(handle_draw)
aoi_map.add_control(dc) 

# Display map and label
ipyw.VBox([aoi_map, label])


From this map, we have identified two potential AOIs: one for a training dataset and one for a testing dataset.


In [47]:
# Run this to use cached aois
aoi_train = {u'geometry': {u'type': u'Polygon', u'coordinates': [[[-121.58460974693298, 38.29170496647727], [-121.58460974693298, 38.32726528409606], [-121.5248715877533, 38.32726528409606], [-121.5248715877533, 38.29170496647727], [-121.58460974693298, 38.29170496647727]]]}, u'type': u'Feature', u'properties': {u'style': {u'opacity': 0.5, u'noClip': False, u'weight': 4, u'fillColor': None, u'color': u'blue', u'lineCap': None, u'stroke': True, u'smoothFactor': 1, u'dashArray': None, u'fillOpacity': 0.2, u'clickable': True, u'lineJoin': None, u'fill': True}}}
aoi_test = {u'geometry': {u'type': u'Polygon', u'coordinates': [[[-121.3113248348236, 38.28911976564886], [-121.3113248348236, 38.34622533958], [-121.2344205379486, 38.34622533958], [-121.2344205379486, 38.28911976564886], [-121.3113248348236, 38.28911976564886]]]}, u'type': u'Feature', u'properties': {u'style': {u'opacity': 0.5, u'fillOpacity': 0.2, u'noClip': False, u'weight': 4, u'color': u'blue', u'lineCap': None, u'dashArray': None, u'smoothFactor': 1, u'stroke': True, u'fillColor': None, u'clickable': True, u'lineJoin': None, u'fill': True}}}

Criteria 2: availability of imagery

How many PL images cover the AOI defined above?

We will answer this question by querying the Planet API. The client is how we interact with the Planet API. It is created with the user-specific API key, which is pulled from $PL_API_KEY.

planet client documentation

Much of this code is pulled from PythonFromSpace/TheBasics.ipynb


In [48]:
planet_api_key = os.environ['PL_API_KEY']

# quick check that key is defined
assert planet_api_key, "PL_API_KEY not defined."

client = api.ClientV1(api_key=planet_api_key)

Query Planet API

Filter to scenes that contain the AOI. If the number is zero, go back and redefine the AOI to be smaller.


In [49]:
# build a query using the AOI and
# a cloud_cover filter that excludes very cloudy scenes
def query_by_aoi(aoi):
    old = datetime.datetime(year=2016,month=6,day=1)
    new = datetime.datetime(year=2016,month=10,day=1)

    search_aoi = aoi['geometry']
    query = filters.and_filter(
        filters.geom_filter(search_aoi),
        filters.range_filter('cloud_cover', lt=75),
        filters.date_range('acquired', gt=old),
        filters.date_range('acquired', lt=new)
    )

    # build a request for only PlanetScope imagery
    request = filters.build_search_request(
        query, item_types=['PSScene4Band']
    )

    # run search
    # if you don't have an API key configured, this will raise an exception
    result = client.quick_search(request)
    scenes = []
    planet_map = {}
    for item in result.items_iter(limit=500):
        planet_map[item['id']]=item
        props = item['properties']
        props["id"] = item['id']
        props["geometry"] = item["geometry"]
        props["thumbnail"] = item["_links"]["thumbnail"]
        scenes.append(props)
    scenes = pd.DataFrame(data=scenes)

    aoi_shape = shape(search_aoi)
    footprints = []
    overlaps = []
    # go through the geometry from our api call, convert to a shape and calculate overlap area.
    # also save the shape for safe keeping
    for footprint in scenes["geometry"].tolist():
        s = shape(footprint)
        footprints.append(s)
        overlap = 100.0*(aoi_shape.intersection(s).area / aoi_shape.area)
        overlaps.append(overlap)
    # take our lists and add them back to our dataframe
    scenes['overlap'] = pd.Series(overlaps, index=scenes.index)
    scenes['footprint'] = pd.Series(footprints, index=scenes.index)

    full_coverage = scenes["overlap"] > 99
    good_scenes = scenes[(full_coverage)]
    return good_scenes

scenes_test = query_by_aoi(aoi_test)
print(len(scenes_test))

scenes_train = query_by_aoi(aoi_train)
print(len(scenes_train))


1
4
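The overlap filter above keeps only scenes whose footprint covers more than 99% of the AOI. For axis-aligned boxes, the percent-overlap calculation can be sketched without shapely as:

```python
# Sketch: percent of an AOI box covered by a footprint box.
# The notebook computes this with shapely intersection areas on real polygons.
def overlap_percent(aoi, footprint):
    """Boxes are (min_x, min_y, max_x, max_y) tuples."""
    ix = max(0.0, min(aoi[2], footprint[2]) - max(aoi[0], footprint[0]))
    iy = max(0.0, min(aoi[3], footprint[3]) - max(aoi[1], footprint[1]))
    aoi_area = (aoi[2] - aoi[0]) * (aoi[3] - aoi[1])
    return 100.0 * (ix * iy) / aoi_area

# A footprint that fully contains the AOI yields 100% overlap
print(overlap_percent((0, 0, 2, 2), (-1, -1, 3, 3)))  # → 100.0
```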

Visualize subset of scene thumbnails and then select one

If all you see are broken image icons, click one of the URLs and log into the site.


In [50]:
def display_thumbnails(scenes, limit=10):
    for thumb_url in scenes['thumbnail'].tolist()[:limit]:
        img_image = Image(url=thumb_url)
        display(img_image)
        print(thumb_url)

# Uncomment one of the lines below to display the thumbnails
# display_thumbnails(scenes_train, limit=5)
# display_thumbnails(scenes_test, limit=10)

For the train scenes, https://api.planet.com/data/v1/item-types/PSScene4Band/items/20160831_180231_0e0e/thumb looks pretty good. It is from August 2016 and looks clear of clouds.

For the test scenes, https://api.planet.com/data/v1/item-types/PSScene4Band/items/20160831_180257_0e26/thumb looks great. It is from the same day as the train scene.


In [51]:
train_thumbnail = 'https://api.planet.com/data/v1/item-types/PSScene4Band/items/20160831_180231_0e0e/thumb'
display(Image(url=train_thumbnail))

test_thumbnail = 'https://api.planet.com/data/v1/item-types/PSScene4Band/items/20160831_180257_0e26/thumb'
display(Image(url=test_thumbnail))


Save datasets

We will save the train and test AOI and the ground truth data.


In [52]:
def save_geojson(features, filename):
    with open(filename, "w") as f:
        f.write(json.dumps(features))

In [53]:
# save train AOI
save_geojson(aoi_train, os.path.join(predata_dir, 'aoi-train.geojson'))

# save test
save_geojson(aoi_test, os.path.join(predata_dir, 'aoi-test.geojson'))

# save ground truth data
save_geojson(features, os.path.join(predata_dir, 'ground-truth.geojson'))