Crop Temporal Analysis

Throughout the growth cycle of a crop, the plants sprout, grow, leaf out, and mature. Accordingly, the reflectance measured over the crop, its spectral signature, changes as well. In this tutorial, we investigate this temporal change in the spectral signature of each crop through temporal analysis of satellite imagery. This type of analysis is made possible by the daily coverage of PlanetScope imagery.

We perform analysis in steps of increasing complexity in this tutorial. First, we calculate statistics over a single field in a single Surface Reflectance image. We take advantage of the Cloud-Optimized GeoTIFFs (COGs) Planet provides to download only the pixels within the field, then calculate summary statistics on each band. Next, we calculate statistics over all images covering the field, keeping track of collection date, thus introducing temporal analysis over one field. Finally, we perform temporal analysis over multiple fields, keeping track of their field type classification, to compare temporal signatures across multiple field types. We analyze the results and look into how they can help differentiate the crop types.

In this tutorial, we focus on imagery collected within June-September 2017 (the growth season). We use field boundaries and crop type definitions collected in 2015. This dataset is described in the Identify Datasets notebook and prepared in the Prepare Datasets notebook. Because the survey data was collected in 2015 and the imagery used in this tutorial was collected in 2017, we expect some inaccuracies in the field type classification due to crop rotation during the two-year gap.

In the interest of calculation time, we limit the calculation of temporal statistics to sample subsets of fields and imagery. Definitions for 946 fields are provided in this dataset. Within the tutorial, we find that approximately 120 images are collected of each field within those dates. We also determine that it takes approximately 3 minutes to activate, download, and calculate statistics from 10 images, and 10 minutes for 50 images (due to the fixed activation time, this does not scale down linearly to 12-18 seconds for a single image).
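
For a rough sense of that scaling, the two quoted timings fit a linear model with a fixed activation overhead plus a per-image cost. The sketch below is a back-of-the-envelope estimate only; actual times vary with network and API load.

def estimate_minutes(n_images):
    # line through the two quoted data points: (10 images, 3 min) and (50 images, 10 min)
    per_image = (10 - 3) / (50 - 10)   # ~0.175 min marginal cost per image
    overhead = 3 - per_image * 10      # ~1.25 min fixed activation overhead
    return overhead + per_image * n_images

For example, estimate_minutes(150) gives roughly 27.5 minutes, consistent with the approximately 30 minutes we observe below for 15 fields at 10 scenes each.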

We first perform an initial calculation from 10 scenes per field for 15 fields. It takes approximately 30 minutes to download and process the 150 scenes. In working with this dataset, we create an algorithm for filtering out outliers (likely due to cloudy scenes). The resulting data reveals some differentiation in the statistics across fields, but the data is sparse. Next, we perform another calculation of temporal statistics, this time over 50 images per field and 15 fields, a total of 750 data points and approximately 2.5 hours of calculation time. To speed up running this notebook, we cache the statistics data from both calculations and pull from the cached data unless run_process is set to True.

In analyzing the results, we find that temporal analysis can be helpful in identifying the field types. The mean reflectance across the three field types studied in this notebook (corn, safflower, and sunflower) shows variation between the visual and NIR bands, variation in behavior over time, and variation in steady-state reflectance.


In [1]:
import datetime
import json
import os
import shutil
import subprocess

import geojson
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from planet import api
from planet.api import filters, downloader
import rasterio
from shapely.geometry import shape

Prepare Crop AOI Features

We start with ground-truth.geojson, which was prepared in the datasets-identify tutorial. This dataset came from a 2015 survey of a region in Sacramento.

We then filter to features that correspond to Field Crops and sample those features to create our study features. We do so by randomly choosing a fixed number of features from each subclass within the class.

Load the Prepared Ground Truth Features


In [2]:
# this file was prepared in the datasets-identify tutorial
ground_truth_file = os.path.join('src', 'ground-truth.geojson')

In [3]:
def load_geojson(filename):
    with open(filename, 'r') as f:
        return json.load(f)

ground_truth = load_geojson(ground_truth_file)

Filter to Field crops

The survey data has attributes that provide the crop type. These attributes are described in a PDF distributed with the shapefile; it was unzipped along with the shapefile and is located at data/dwr_survey/09legend.pdf.

We are interested in the Field Crop class. The class is specified by the CLASS1 property, and the Field Crop class is identified by a CLASS1 value of F.


In [4]:
crop_ground_truth = [f for f in ground_truth
                     if f['properties']['CLASS1'] == 'F']
print('{} out of {} features are field crops.'
      .format(len(crop_ground_truth), len(ground_truth)))


991 out of 7429 features are field crops.

Filter to Valid Subclasses

The field subclasses are:

  • 1: Cotton
  • 2: Safflower
  • 3: Flax
  • 4: Hops
  • 5: Sugar Beets
  • 6: Corn (field & sweet)
  • 7: Grain sorghum
  • 8: Sudan
  • 9: Castor beans
  • 10: Beans (dry)
  • 11: Miscellaneous field
  • 12: Sunflowers
  • 13: Hybrid sorghum/sudan
  • 14: Millet
  • 15: Sugar cane

In this section, we filter out features that are uncategorized (subclass is **).


In [5]:
field_type_names = {
    1: 'cotton',
    2: 'safflower',
    3: 'flax',
    4: 'hops',
    5: 'sugar beets',
    6: 'corn',
    7: 'grain sorghum',
    8: 'sudan',
    9: 'castor beans',
    10: 'beans',
    11: 'misc field',
    12: 'sunflowers',
    13: 'hybrid sorghum/sudan',
    14: 'millet',
    15: 'sugar cane'
}

In [6]:
cat_crop_ground_truth = [f for f in crop_ground_truth
                         if f['properties']['SUBCLASS1'] != '**']
print('{} out of {} crop field features are categorized.'
      .format(len(cat_crop_ground_truth), len(crop_ground_truth)))


946 out of 991 crop field features are categorized.

Calculate Stats for a Single Field

We will run through the process of calculating stats for a single field.

This involves finding the overlapping scenes, activating them, and then calculating stats on the pixels within the field directly from the COG at the download url.


In [7]:
# define and create test directory
# delete it if it already exists to ensure we start from a clean slate
test_dir = os.path.join('data', 'test')
if os.path.isdir(test_dir):
    shutil.rmtree(test_dir)
os.mkdir(test_dir)

In [8]:
field_geojson = cat_crop_ground_truth[0]

In [9]:
shape(field_geojson['geometry'])


Out[9]:

In [10]:
def get_id(field_geojson):
    return field_geojson['id']

print(get_id(field_geojson))


87

Save geojson file


In [11]:
def create_save_dir(aoi_geojson, root_dir='data'):
    save_dir = os.path.join(root_dir, get_id(aoi_geojson))

    if not os.path.isdir(save_dir):
        os.makedirs(save_dir)
    return save_dir

save_dir = create_save_dir(field_geojson)
print(save_dir)


data/87

In [12]:
def save_geojson_file(aoi_geojson, save_dir):
    filename = os.path.join(save_dir, 'aoi.geojson')
    with open(filename, "w") as f:
        f.write(json.dumps(aoi_geojson))
    return filename

geojson_filename = save_geojson_file(field_geojson, save_dir)
print('wrote to {}'.format(geojson_filename))


wrote to data/87/aoi.geojson

Search for Overlapping Scenes

In this section, we search for 4-band PlanetScope scenes with surface reflectance assets, collected from June through September 2017, that overlap the field aoi.


In [13]:
# command-line search
# !planet data search --item-type PSScene4Band --geom $geojson_filename

In [14]:
planet_api_key = os.environ['PL_API_KEY']

# quick check that key is defined
assert planet_api_key, "PL_API_KEY not defined."

client = api.ClientV1(api_key=planet_api_key)

In [15]:
item_type = 'PSScene4Band'
asset_type = 'analytic_sr'

In [16]:
# utility functions for searching for scenes that cover the field aoi
# from June through September 2017
def build_request(aoi, item_type):
    old = datetime.datetime(year=2017,month=6,day=1)
    new = datetime.datetime(year=2017,month=10,day=1)

    search_aoi = aoi['geometry']
    query = filters.and_filter(
        filters.geom_filter(search_aoi),
        filters.range_filter('cloud_cover', lt=75),  # note: the API reports cloud_cover as a fraction (0-1), so this threshold is permissive
        filters.date_range('acquired', gte=old),
        filters.date_range('acquired', lt=new)
    )

    # build a request for only PlanetScope imagery
    request = filters.build_search_request(
        query, item_types=[item_type]
    )
    
    return request


def get_monthly_stats(client, request):
    stats_request = request.copy()
    stats_request['interval'] = 'month'
    return client.stats(stats_request).get()
    
request = build_request(field_geojson, item_type)
monthly_stats = get_monthly_stats(client, request)
total = sum([b['count'] for b in monthly_stats['buckets']])
print('monthly stats for intersecting scenes, {} total'.format(total))
print(monthly_stats)


monthly stats for intersecting scenes, 122 total
{'utc_offset': '+0h', 'interval': 'month', 'buckets': [{'count': 24, 'start_time': '2017-06-01T00:00:00.000000Z'}, {'count': 37, 'start_time': '2017-07-01T00:00:00.000000Z'}, {'count': 33, 'start_time': '2017-08-01T00:00:00.000000Z'}, {'count': 28, 'start_time': '2017-09-01T00:00:00.000000Z'}]}

In [17]:
# utilities for retrieving scene information and filtering to only scenes
# that totally overlap field
def get_items(client, request, limit=500):
    # run search
    # if you don't have an API key configured, this will raise an exception
    result = client.quick_search(request)
    return result.items_iter(limit=limit)

def filter_by_overlaps(items, aoi):
    aoi_shape = shape(aoi['geometry'])
    
    def get_overlap(item):
        item_shape = shape(item['geometry'])
        overlap = 100.0*(aoi_shape.intersection(item_shape).area / aoi_shape.area)
        return overlap
    
    return (i for i in items if get_overlap(i) > 99)

def search_items(aoi, item_type, client, limit=500):
    request = build_request(aoi, item_type)
    items = get_items(client, request, limit=limit)
    return filter_by_overlaps(items, aoi)   

item = list(search_items(field_geojson, item_type, client, limit=5))[0]
print(item['id'])
print(item['properties'])


20170818_190629_1050
{'acquired': '2017-08-18T19:06:29.542084Z', 'anomalous_pixels': 0.01, 'cloud_cover': 0.01, 'columns': 8580, 'epsg_code': 32610, 'ground_control': True, 'gsd': 3.7, 'instrument': 'PS2', 'item_type': 'PSScene4Band', 'origin_x': 611976, 'origin_y': 4235748, 'pixel_resolution': 3, 'provider': 'planetscope', 'published': '2017-10-29T01:45:40Z', 'quality_category': 'test', 'rows': 4394, 'satellite_id': '1050', 'strip_id': '696218', 'sun_azimuth': 146.6, 'sun_elevation': 61, 'updated': '2017-12-07T05:57:27Z', 'usable_data': 0, 'view_angle': 0.2}

Activate Scene

We use the planet api client downloader to handle activation of the scenes. The downloader handles activation, polling of activation status, and (if desired) downloading. Because we are working with remote COGs, we do not want to download the full scenes. However, the downloader can still help us: it allows us to provide a function that it calls each time a scene is activated.

In this section, we provide a function that records the scene id and download url. The function is a method of a class (Tracker) that maintains a dictionary of scene ids and download urls; the method updates that dictionary each time the downloader calls it.

Using the Tracker class to keep track of download urls, and using the bulk downloader to activate a single scene, is more machinery than this one-scene case requires. However, it serves as a simple example of the workflow: in the next section, we use a very similar process to activate multiple scenes and calculate stats from only the pixels within each field aoi.
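
As an aside, a single scene can also be activated without the bulk downloader. The following is a minimal sketch of that approach using the v1 client's get_assets and activate calls with a simple polling loop; the function name and polling interval here are our own.

import time

def activate_and_get_url(client, item, asset_type, poll_seconds=10):
    # request activation of the desired asset
    client.activate(client.get_assets(item).get()[asset_type])
    # poll until the asset is active, then return its download url
    while True:
        asset = client.get_assets(item).get()[asset_type]
        if asset['status'] == 'active':
            return asset['location']
        time.sleep(poll_seconds)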


In [18]:
# create a downloader that will handle scene activation
dl = downloader.create(client)

In [19]:
# This class keeps track of activated scene download urls
# It also creates the `on_complete` closure, which is called
# by the downloader to record each activated scene's download url
class Tracker(object):
    def __init__(self):
        self.urls = dict()
        
    def get_on_complete(self):
        def on_complete(item, asset):
            self.urls[item['id']] = asset['location']
            print('{}:{}'.format(item['id'], asset['location']))
        return on_complete

# create the function that keeps track of the download urls
tracker = Tracker()
dl.on_complete = tracker.get_on_complete()

# use the downloader to activate the scene and get the download url
dl.shutdown()
%time dl.activate((i for i in [item]), [asset_type])
item_id, download_url = list(tracker.urls.items())[0]
print(item_id)


20170818_190629_1050:https://api.planet.com/data/v1/download?token=eyJhbGciOiJIUzUxMiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJKTEgzM1VsU2h4aGJXMUw5VWxwZUlHOGFWNXoyOEtmZVdTUE04Zk4wTStaWmsrRTNITFlRb3BJaTk2SklUT05EMTY1ME51ajl4RkhKM3FzbXZSbUZ1Zz09IiwiaXRlbV90eXBlX2lkIjoiUFNTY2VuZTRCYW5kIiwidG9rZW5fdHlwZSI6InR5cGVkLWl0ZW0iLCJleHAiOjE1MzAwNTQyMzMsIml0ZW1faWQiOiIyMDE3MDgxOF8xOTA2MjlfMTA1MCIsImFzc2V0X3R5cGUiOiJhbmFseXRpY19zciJ9.FkxnWYQgyXfnmkQlgacssXuVvVuUe9RQuo1xkxYNcXkE64TGy4P7Z6OqWGeARp1mXHK8ENyEODGwEJtXFLSaHw
CPU times: user 10 ms, sys: 0 ns, total: 10 ms
Wall time: 769 ms
20170818_190629_1050

Use GDAL to Download AOI Image

Now that we know the download url and have the crop aoi, we can take advantage of the COG format of Planet scenes to download only the aoi pixels. This saves a great deal of download time and local storage!


In [20]:
# we need to use the vsicurl gdal driver to work with COGs.
vsicurl_url = '/vsicurl/' + download_url

In [21]:
# uncomment to check that gdal can access COG
# we need to use the vsicurl gdal driver to work with COGs

# vsicurl_url = '/vsicurl/' + download_url
# !gdalinfo $vsicurl_url

In [22]:
def create_output_filename(item_id, save_dir):
    filename = os.path.join(save_dir, item_id + '.tif')
    return filename

output_file = create_output_filename(item_id, save_dir)

In [23]:
# we use gdalwarp and the crop_to_cutline argument to only download the aoi portion of the COG
def _gdalwarp(input_filename, output_filename, options, verbose=False):
    commands = ['gdalwarp'] + options + \
               ['-overwrite',
                input_filename,
                output_filename]
    if verbose: print(' '.join(commands))
    subprocess.check_call(commands)
    
def download_scene_aoi(download_url, output_filename, geojson_filename, verbose=False):
    vsicurl_url = '/vsicurl/' + download_url
    options = [
        '-cutline', geojson_filename,
        '-crop_to_cutline',
    ]
    _gdalwarp(vsicurl_url, output_filename, options, verbose=verbose)

%time download_scene_aoi(download_url, output_file, geojson_filename, verbose=True)


gdalwarp -cutline data/87/aoi.geojson -crop_to_cutline -overwrite /vsicurl/https://api.planet.com/data/v1/download?token=eyJhbGciOiJIUzUxMiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJKTEgzM1VsU2h4aGJXMUw5VWxwZUlHOGFWNXoyOEtmZVdTUE04Zk4wTStaWmsrRTNITFlRb3BJaTk2SklUT05EMTY1ME51ajl4RkhKM3FzbXZSbUZ1Zz09IiwiaXRlbV90eXBlX2lkIjoiUFNTY2VuZTRCYW5kIiwidG9rZW5fdHlwZSI6InR5cGVkLWl0ZW0iLCJleHAiOjE1MzAwNTQyMzMsIml0ZW1faWQiOiIyMDE3MDgxOF8xOTA2MjlfMTA1MCIsImFzc2V0X3R5cGUiOiJhbmFseXRpY19zciJ9.FkxnWYQgyXfnmkQlgacssXuVvVuUe9RQuo1xkxYNcXkE64TGy4P7Z6OqWGeARp1mXHK8ENyEODGwEJtXFLSaHw data/87/20170818_190629_1050.tif
CPU times: user 0 ns, sys: 10 ms, total: 10 ms
Wall time: 5.93 s
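
As an aside, the same AOI-only read can be done in-process with rasterio, which also understands /vsicurl/ urls. The sketch below is one way to do it, assuming the aoi geometry is in WGS84 lon/lat (standard for GeoJSON); we stick with gdalwarp in this notebook so the cropped scenes are cached on disk for reuse.

from rasterio.mask import mask as rio_mask
from rasterio.warp import transform_geom

def read_aoi_pixels(download_url, aoi_geojson):
    with rasterio.open('/vsicurl/' + download_url) as src:
        # reproject the lon/lat field geometry into the scene's UTM CRS
        geom = transform_geom('EPSG:4326', src.crs, aoi_geojson['geometry'])
        # crop=True fetches only the COG blocks the geometry touches
        pixels, _ = rio_mask(src, [geom], crop=True)
    return pixels  # (bands, rows, cols) array; 0 (NoData) outside the field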

Visualize Image


In [24]:
# load local visual module
# autoreload because visual is in development

%load_ext autoreload
%autoreload 2

import visual

In [25]:
def load_sr(filename):
    with rasterio.open(filename, 'r') as src:
        # analytic_sr band order: blue, green, red, NIR
        b, g, r, n = src.read() 

        # NoData value is 0
        mask = b == 0

    return [np.ma.array(band, mask=mask) for band in [b, g, r, n]]

def visualize_sr(filename, title='Cropped Scene'):
    bgrn_bands = load_sr(filename)

    rgb_bands = [bgrn_bands[i] for i in [2, 1, 0]]
    visual.plot_image(rgb_bands, title=title, figsize=(5, 5))

print(output_file)
visualize_sr(output_file)


data/87/20170818_190629_1050.tif

Calculate Band Statistics


In [26]:
def get_band_stats(band):
    """Calculate simple statistics for a band"""
    # Consider adding stats from here:
    # https://docs.scipy.org/doc/scipy/reference/stats.mstats.html
    stats = {
        'mean': band.mean(),
        'std': band.std(),
        'max': band.max(),
        'min': band.min(),
        'count': band.count()
    }
    return stats

def get_stats(filename):
    bands = load_sr(filename)
    stats = [get_band_stats(band)
             for band in bands]
    return stats

print(get_stats(output_file))


[{'mean': 1496.3472610447598, 'std': 161.00181677186868, 'max': 1770, 'min': 950, 'count': 27547}, {'mean': 1700.2954586706358, 'std': 177.0793771974933, 'max': 1955, 'min': 1114, 'count': 27547}, {'mean': 1957.8580970704613, 'std': 193.53208828129823, 'max': 2231, 'min': 1262, 'count': 27547}, {'mean': 2656.661124623371, 'std': 179.23117001844483, 'max': 3381, 'min': 1959, 'count': 27547}]

Calculate Stats Directly For Multiple Scenes

In this section, we download the aoi image and calculate the band statistics together in a single function that is called directly by the downloader.

In the interest of time, we limit the number of images to at most 10 (the search may return 10 scenes, but some of them may not completely overlap the field, so fewer may be processed).


In [27]:
def get_subclass(crop_geojson):
    return crop_geojson['properties']['SUBCLASS1']

def get_date(scene_id):
    date_str = scene_id[:8]
    return datetime.datetime.strptime(date_str, "%Y%m%d").date()
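
As a side note, the collection date could also be read from the item's acquired property returned by the search api, rather than parsing the scene id prefix; a quick sketch assuming the timestamp format shown in the search output above:

def get_date_from_item(item):
    # e.g. '2017-08-18T19:06:29.542084Z' -> datetime.date(2017, 8, 18)
    return datetime.datetime.strptime(
        item['properties']['acquired'][:10], "%Y-%m-%d").date()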

In [28]:
class StatsCalculator(object):
    def __init__(self, aoi_geojson, root_dir='data'):
        self.save_dir = create_save_dir(aoi_geojson, root_dir=root_dir)
        self.geojson_file = save_geojson_file(aoi_geojson, self.save_dir)
        self.aoi_geojson = aoi_geojson
        self.info = []
        
    def get_on_complete(self, overwrite=False, verbose=False):
        def on_complete(item, asset):
            download_url = asset['location']
            scene_id = item['id']
            if verbose: print('{}'.format(scene_id))
            
            output_file = create_output_filename(scene_id, self.save_dir)
            if overwrite or not os.path.isfile(output_file):
                if verbose: print('downloading...')
                download_scene_aoi(download_url, output_file, self.geojson_file, verbose=False)
            
            band_names = ['blue', 'green', 'red', 'nir']
            band_stats = get_stats(output_file)
            for stats, name in zip(band_stats, band_names):
                info = {
                    'field_id': get_id(self.aoi_geojson),
                    'field_type': get_subclass(self.aoi_geojson),
                    'scene_id': scene_id,
                    'scene_filename': output_file,
                    'band': name,
                    'date': get_date(scene_id)
                }
                info.update(stats)
                self.info.append(info)

        return on_complete

def get_field_stats_info(field_geojson, max_scenes, client, item_type, overwrite=False, verbose=False):
    if max_scenes > 0:
        items = search_items(field_geojson, item_type, client, limit=max_scenes)
        stats_calculator = StatsCalculator(field_geojson)
        dl.on_complete = stats_calculator.get_on_complete(overwrite=overwrite, verbose=verbose)
        
        dl.shutdown()
        dl.activate(items, [asset_type])
        info = stats_calculator.info
    else:
        # skip activation if max_scenes is not > 0
        info = []
    return pd.DataFrame(data=info)

# it takes about 3-4 minutes to download and calculate stats on 10 scenes
max_scenes = 10

# we set overwrite to False (the default value of that parameter) so we do not re-download
# an image if it is already cached
field_info = get_field_stats_info(field_geojson, max_scenes, client, item_type, overwrite=False)

In [29]:
print('calculated statistics across {} scenes'.format(int(len(field_info) / 4)))


calculated statistics across 7 scenes

In [30]:
# show first few scenes of field

# we filter to just red data points because there are 4 data points for each scene
# (red, green, blue, and NIR)
collects = field_info[field_info.band == 'red'][:3]

for collect in collects.itertuples():
    title='{}\n'.format(collect.scene_id)
    title += 'mean: {}'.format(collect.mean)
    visualize_sr(collect.scene_filename, title=title)


Plot Band Means


In [80]:
def plot_band_statistic(df, stat, title):
    for field_type, group in df.groupby('field_type'):
        field_name = field_type_names[int(field_type)]

        x = group['date'].tolist()
        y = group[stat].tolist()
        plt.scatter(x, y, marker='^', label=field_name)
    
    all_x = df['date'].tolist()
    one_day = datetime.timedelta(days=1)
    plt.xlim(min(all_x) - one_day, max(all_x) + one_day)
    plt.margins(x=0)
    plt.xticks(rotation=90)
    plt.title(title)  

def plot_statistic(df, statistic, title=None):
    fig = plt.figure(figsize=(20,5))

    bands = ['blue', 'green', 'red', 'nir']
    
    title = title if title is not None else statistic
    fig.suptitle(title)

    for i, band in enumerate(bands):
        ax = fig.add_subplot(1,4,i+1)
        band_group = df[df['band'] == band]
        plot_band_statistic(band_group, statistic, band)

    ax.legend(bbox_to_anchor=(1.05, 0), loc='lower left', borderaxespad=0.)
    plt.show()

def plot_all_statistics(df):
    plot_statistic(df, 'mean')
    plot_statistic(df, 'std')

plot_all_statistics(field_info)


Calculate Stats for Multiple Fields

Up to this point, we have focused on stats of a single field. Now we get into the meat of things: calculating stats across multiple fields and grouping them by crop type (specified by the subclass property).

Determine List of Sample Features

There are too many categorized crop fields to calculate statistics on them all in a reasonable time using just one CPU. Therefore, we create a list of sample features that equally represent the crop types.


In [81]:
# determine the subclasses in this set and counts
subclasses_list = [field_geojson['properties']['SUBCLASS1']
                   for field_geojson in cat_crop_ground_truth]
subclasses = {x: subclasses_list.count(x)
              for x in set(subclasses_list)}
print('subclasses and counts')
print(json.dumps(subclasses, indent=4))


subclasses and counts
{
    "12": 14,
    "6": 776,
    "10": 26,
    "4": 8,
    "13": 4,
    "8": 35,
    "2": 65,
    "11": 1,
    "7": 17
}

In [82]:
# number of samples for each subclass
num_samples = 5

In [83]:
# filter the subclasses to those with an adequate number of features
filt_subclasses = [subclass
                   for (subclass, count) in subclasses.items()
                   if count > num_samples]
print('filtered subclasses: {}'.format(filt_subclasses))


filtered subclasses: ['12', '6', '10', '4', '8', '2', '7']

In [84]:
# let's focus on only 3 subclasses for now; comment out the next line to use all subclasses
filt_subclasses = filt_subclasses[:3]
print('filtered subclasses: {}'.format(filt_subclasses))


filtered subclasses: ['12', '6', '10']

In [85]:
# create a list of sample features
# first filter to features within a subclass, then randomly pick a sample of those features

np.random.seed(0) # make random sampling repeatable

sample_features = []
for subclass in filt_subclasses:
    subclass_features = [f for f in crop_ground_truth if get_subclass(f) == subclass]
    sample_features.extend(np.random.choice(subclass_features, num_samples, replace=False))
print('{} sample field features'.format(len(sample_features)))


15 sample field features

Calculate Stats for Sample Features

In this section, we calculate statistics for all of the sample features. We loop through each of the fields, activating and calculating stats for each scene that overlaps the field.

The calculation of stats for one field from 50 scenes takes 10 minutes. Therefore, we expect the calculation of stats for 15 fields from 50 scenes each to take 150 minutes, or 2.5 hours. That's quite a while! For now, we will limit the number of scenes to 10, which should take 30 minutes to process. Still a while, but not unreasonable.


In [86]:
max_scenes = 10

In [87]:
def get_all_field_stats_info(sample_features, client, item_type, max_scenes=10):
    all_info = []

    for field_geojson in sample_features:
        # get_field_stats_info returns a DataFrame of stats for one field
        info = get_field_stats_info(field_geojson, max_scenes, client, item_type)
        all_info.append(info)

    # combine the per-field DataFrames into a single DataFrame
    return pd.concat(all_info, ignore_index=True)

In [88]:
run_process = False
scenes_info_filename = 'src/scene_info_df.pkl'

if run_process:
    info_df = get_all_field_stats_info(sample_features, client, item_type,
                                       max_scenes=max_scenes)

    # uncomment to overwrite cached data
    # info_df.to_pickle(scenes_info_filename)
else:
    info_df = pd.read_pickle(scenes_info_filename)

Explore Stats

We explore the statistics results by plotting them and then filtering outliers. We look for scenes whose mean statistic is an outlier in all 4 bands.


In [89]:
plot_all_statistics(info_df)


It is difficult to see the results due to outliers across all of the bands. These outliers could potentially be cloudy scenes. To clean up the results, we filter out the scenes whose mean values are outliers across all bands.


In [90]:
def filter_bad_scenes(df):
    bad_scene_sets = []
    for band, group in df.groupby('band'):
        m = group['mean']
        mmax = m.mean() + 2 * m.std()
        outliers = group[m > mmax]
        bad_scene_sets.append(set(outliers['scene_id']))
    
    # get list of scenes that have outliers in all 4 bands
    bad_scenes = set.intersection(*bad_scene_sets)
    print('{} bad scenes:\n{}'.format(len(bad_scenes), bad_scenes))
    bad_scene_entry = df.scene_id.isin(bad_scenes)

    print('{} entries from the bad scenes filtered out'.format(bad_scene_entry.sum()))

    return df[~bad_scene_entry]

filtered_info_df = filter_bad_scenes(info_df)


3 bad scenes:
{'20170903_180631_1030', '20170903_180632_1030', '20170914_180835_0f38'}
24 entries from the bad scenes filtered out

In [91]:
plot_all_statistics(filtered_info_df)


Awesome! The filter removed the outliers. From these results, it looks like the mean reflectance of corn is lower in the blue, green, and red bands than that of sunflowers and safflower. Also, the mean reflectance of corn in the NIR band decreases over time. Sunflowers and safflower are more difficult to distinguish from each other. Ultimately, more data points would help here.

The section below runs the calculations on 50 scenes per field. It can take 2.5 hours, so it is not for the faint of heart! The data is also cached, so you can load and explore it without going through all of that processing.


In [92]:
def analyze_more_scenes(sample_features, client, item_type):
    dl.shutdown()
    info_df = get_all_field_stats_info(sample_features, client, item_type,
                                       max_scenes=50)
    filtered_info_df = filter_bad_scenes(info_df)

    return filtered_info_df

run_process = False
more_scenes_filename = 'src/more_scenes_df.pkl'

if run_process:
    more_scenes_df = analyze_more_scenes(sample_features, client, item_type)
    
    # uncomment to overwrite cached data
    # more_scenes_df.to_pickle(more_scenes_filename)
else:
    more_scenes_df = pd.read_pickle(more_scenes_filename)

In [93]:
plot_all_statistics(more_scenes_df)


These plots give a little more insight into the field temporal statistics. The sunflower and corn NIR reflectance means decrease over time, while the safflower NIR reflectance mean stays fairly steady. The corn red, green, and blue reflectance means stay fairly steady, with a slight increase over time. The sunflower red, green, and blue reflectance means are all over the board, which is confusing. The safflower red, green, and blue reflectance means are, similar to corn, fairly steady with a slow increase over time.

In general, there isn't much of a pattern in the standard deviation of the reflectance bands. Corn does stand out as having the lowest reflectance standard deviation across the blue, green, and red bands. Additionally, the standard deviation in the NIR band is higher than in the visual bands. The NIR reflectance itself is much higher than the red, green, and blue reflectance (which makes sense, since we are looking at actively growing fields), and the higher standard deviation could be due either to increased sensitivity to crop health relative to the visual bands, or to higher noise associated with the stronger NIR signal.
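
To put rough numbers behind these observations, the monthly mean of the per-scene NIR means can be tabulated for each field type; a quick sketch against the dataframe columns used in this notebook:

nir = more_scenes_df[more_scenes_df['band'] == 'nir'].copy()
nir['month'] = pd.to_datetime(nir['date']).dt.to_period('M')
monthly = (nir.groupby(['field_type', 'month'])['mean']
              .mean()
              .unstack('month'))
print(monthly.rename(index=lambda s: field_type_names[int(s)]))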

