Creating Labeled Data from a Planet GeoTIFF with Label Maker

In this notebook, we create labeled data for training a machine learning algorithm. As inputs, we use OpenStreetMap as the ground truth source and a Planet GeoTIFF as the image source. Development Seed's Label Maker tool is used to download and prepare the ground truth data, chip the Planet imagery, and package the two to feed into the training process.

The primary interface for Label Maker is through the command-line interface (cli). It is configured through the creation of a configuration file. More information about that configuration file and command line usage can be found in the Label Maker repo README.

The goal of this tutorial is to demonstrate labeling data from a local GeoTIFF as well as a Cloud-Optimized GeoTIFF (COG) available via the Planet API. This is inspired by the fact that label-maker now supports both local GeoTIFFs and remote COGs (blog post). When only a portion of a scene is needed, accessing it as a remote COG can save time, bandwidth, and local storage.

NOTE: Currently, label-maker supports only 8-bit RGB imagery. Therefore, the visual asset is best for use with label-maker.

RUNNING NOTE

This notebook is meant to be run in a docker image specific to this folder. The docker image must be built from the custom Dockerfile according to the directions below.

In label-data directory:

docker build -t planet-notebooks:label .

Then start up the docker container as you usually would, specifying planet-notebooks:label as the image.

There is currently an incompatibility between the URL Planet uses for COGs (which does not use a geotiff name along with the tif extension) and the released version of label-maker. The released version looks for the tif extension in the url before treating it as a COG. See the issue for more information. There is a fixed version at jreiberkyle/label-maker.

This image installs the fixed version of label-maker along with its dependencies.

Install Dependencies

In addition to the python packages imported below, the label-maker and planet python packages are also a dependency. However, in this notebook, both packages are accessed through their command-line interface.



In [1]:

    
import json
import os

import geojson
import ipyleaflet as ipyl
import ipywidgets as ipyw
from IPython.display import Image
import numpy as np
import rasterio
import requests
from requests.auth import HTTPBasicAuth
from shapely.geometry import shape

Setup

In this section, we create the local data directory and set the scene specifications.

As noted in the intro, label-maker only supports 8-bit imagery, so we want to make sure to use the visual asset.



In [2]:

    
# create data directory
data_dir = os.path.join('data', 'label-maker-geotiff')
if not os.path.isdir(data_dir):
    os.makedirs(data_dir)



In [3]:

    
# scene specifications
item_id = '760818_4848718_2017-09-17_0e2f'
item_type = 'PSOrthoTile'
asset_type = 'visual'

Label Local GeoTIFF

In this portion of the tutorial, we create labeled data using a local GeoTIFF downloaded from Planet.

Download Scene



In [4]:

    
!planet data download --item-type $item_type --asset-type $asset_type \
    --string-in id $item_id --dest $data_dir --quiet









    



{"item": "760818_4848718_2017-09-17_0e2f", "asset": "visual", "location": "/home/jovyan/work/jupyter-notebooks/label-data/data/label-maker-geotiff/760818_4848718_2017-09-17_0e2f_RGB_Visual.tif"}

Create Label Maker Config File

Label maker is behavior is specified through a configuration file. The configuration file we use in this tutorial was pulled from the label-maker tutorial on mapping buildings in Vietnam and then customized to utilize a local GeoTIFF. The imagery url is set to the GeoTIFF filename. We also changed the bounds to an area of interest fully contained within the GeoTIFF. This is because I am not sure how label maker handles masked pixels.

See the label-maker README for a description of the config entries.



In [5]:

    
# define AOI
bounds_geom = {'type': 'Polygon',
 'coordinates': [[[105.81775409169494, 20.84015810005586],
   [105.9111433289945, 20.84015810005586],
   [105.9111433289945, 20.925748489914824],
   [105.81775409169494, 20.925748489914824],
   [105.81775409169494, 20.84015810005586]]]}

bounding_box = shape(bounds_geom).bounds
bounding_box









    Out[5]:





(105.81775409169494, 20.84015810005586, 105.9111433289945, 20.925748489914824)



In [6]:

    
# define location relative to data_dir
geotiff_filename = '760818_4848718_2017-09-17_0e2f_RGB_Visual.tif'

# create config file
local_config = {
    "country": "vietnam",
    "bounding_box": bounding_box,
    "zoom": 17,
    "classes": [
    { "name": "Buildings", "filter": ["has", "building"] }
    ],
    "imagery": geotiff_filename,
    "background_ratio": 1,
    "ml_type": "classification"
}

# define project files and folders
local_config_name = 'config_local.json'
local_config_filename = os.path.join(data_dir, local_config_name)

# write config file
with open(local_config_filename, 'w') as cfile:
    cfile.write(json.dumps(local_config))

print('wrote config to {}'.format(local_config_filename))









    



wrote config to data/label-maker-geotiff/config_local.json

Run label-maker

In this section, we use label-maker to download and prepare the OSM label data and tile the GeoTIFF.

For more details on running label-maker, see the README.



In [7]:

    
!cd $data_dir && label-maker download --config $local_config_name









    



Saving QA tiles to data/vietnam.mbtiles
   100%     52.5 MiB       4.0 MiB/s            0:00:00 ETA



In [8]:

    
!cd $data_dir && label-maker labels --config $local_config_name









    



Determining labels for each tile
---
Buildings: 18 tiles
Total tiles: 1225
Writing out labels to data/labels.npz



In [9]:

    
# skip preview because it fails due to imagery-offset arg
# https://github.com/developmentseed/label-maker/issues/79
# !cd $data_dir && label-maker preview -n 3 --config $local_config_name



In [34]:

    
tiles_dir = os.path.join(data_dir, 'data', 'tiles')
print(tiles_dir)









    



data/label-maker-geotiff/data/tiles



In [35]:

    
!ls $tiles_dir



In [37]:

    
# clean out tiles directory if it exists
!rm -rf $tiles_dir



In [38]:

    
%time !cd $data_dir && label-maker images --config $local_config_name









    



Downloading 36 tiles to data/tiles
CPU times: user 40 ms, sys: 30 ms, total: 70 ms
Wall time: 3.5 s



In [39]:

    
# look at three tiles that were generated
num_tiles = 3
for img in os.listdir(tiles_dir)[:num_tiles]:
    img_filename = os.path.join(tiles_dir, img)
    print(img_filename)
    display(Image(filename=img_filename))









    



data/label-maker-geotiff/data/tiles/104064-57754-17.jpg






    












    



data/label-maker-geotiff/data/tiles/104087-57749-17.jpg






    












    



data/label-maker-geotiff/data/tiles/104080-57759-17.jpg



In [40]:

    
# will not be able to open image tiles that weren't generated because the label tiles contained no classes
!cd $data_dir && label-maker package --config $local_config_name









    



Saving packaged file to data/data.npz

Check labeled data package



In [41]:

    
data_file = os.path.join(data_dir, 'data', 'data.npz')
data = np.load(data_file)



In [42]:

    
for k in data.keys():
    print('data[\'{}\'] shape: {}'.format(k, data[k].shape))









    



data['x_train'] shape: (28, 256, 256, 3)
data['y_train'] shape: (28, 2)
data['x_test'] shape: (8, 256, 256, 3)
data['y_test'] shape: (8, 2)

28 x (image) and y (label) datasets were created in the train set, and 8 x and y datasets were created in the test set, adding up to 36 sets total. Not enough to train a classifier, but this is only one image in a daily image stream, so looking at an image stack would allow us to build up an excellent labeled training dataset quickly!

Remote Cloud-Optimized GeoTIFF

In this portion of this tutorial, we are accessing a portion of the GeoTIFF directly from the download endpoint. This way we only download the pixels that we need.

Activate Scene

Before we can access the scene, it must be activated. Here, we activate the scene using the planet cli since it waits until the asset is activated before moving on.

Because activations do not last very long, be sure to activate right before you access the scene.

For another tutorial covering accessing Planet COGs, see the Download a Subarea tutorial.



In [45]:

    
!planet data download --activate-only --item-type $item_type --asset-type $asset_type \
    --string-in id $item_id --quiet









    



{"item": "760818_4848718_2017-09-17_0e2f", "asset": "visual", "location": "https://api.planet.com/data/v1/download?token=eyJhbGciOiJIUzUxMiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiI4czVvZXROZzFQQjUrWlRnYzNPNmYrc1N6aUNYQ1RjMGI0ejVydVBvWWZQOTJqRXVISHJJdkF0K0ZVSDVRZC9yWm5SM2ZYWUw5dlQvckpabm1GN2liZz09IiwiaXRlbV90eXBlX2lkIjoiUFNPcnRob1RpbGUiLCJ0b2tlbl90eXBlIjoidHlwZWQtaXRlbSIsImV4cCI6MTUzODAwOTk5MCwiaXRlbV9pZCI6Ijc2MDgxOF80ODQ4NzE4XzIwMTctMDktMTdfMGUyZiIsImFzc2V0X3R5cGUiOiJ2aXN1YWwifQ.ArQRvPxUUArGB10brTD7-A3euqGc6yzFTcQazA4qVi0cq-Yxly6N5os86uUR0P64bg-JviqCvSOSm9pmTZbsww"}

Get Download URL



In [46]:

    
item_url = 'https://api.planet.com/data/v1/item-types/{}/items/{}/assets'.format(item_type, item_id)

# Request a new download URL
result = requests.get(item_url, auth=HTTPBasicAuth(os.environ['PL_API_KEY'], ''))
download_url = result.json()[asset_type]['location']

Check downloading COG within bounding box

GDAL uses the vsicurl driver to access the COG. To envoke the vsicurl driver, we prepend the url with /vsicurl/.



In [47]:

    
vsicurl_url = '/vsicurl/' + download_url



In [48]:

    
# check if COG url is valid
!gdalinfo $vsicurl_url









    



Driver: GTiff/GeoTIFF
Files: /vsicurl/https://api.planet.com/data/v1/download?token=eyJhbGciOiJIUzUxMiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJHUHQwZGFFSTY1K1RVUjd0bnY5RWVnVWZsNE5YT21KTDZDekJRb2U5YjlGbHhuekk4VWpIUWFuWHhPSWxRay92TWovVzhYQzVkNE9LN1NNUU1XWmY2QT09IiwiaXRlbV90eXBlX2lkIjoiUFNPcnRob1RpbGUiLCJ0b2tlbl90eXBlIjoidHlwZWQtaXRlbSIsImV4cCI6MTUzODAwOTk5MiwiaXRlbV9pZCI6Ijc2MDgxOF80ODQ4NzE4XzIwMTctMDktMTdfMGUyZiIsImFzc2V0X3R5cGUiOiJ2aXN1YWwifQ.SVYsQOSnsOgmltUbKrrP4OGL0jj6BA05tPFmnUKau94hEIHtmlCBYwMCYY7G7RCpVHgQtsGdw2RxkZtcaNYv6w
Size is 8000, 8000
Coordinate System is:
PROJCS["WGS 84 / UTM zone 48N",
    GEOGCS["WGS 84",
        DATUM["WGS_1984",
            SPHEROID["WGS 84",6378137,298.257223563,
                AUTHORITY["EPSG","7030"]],
            AUTHORITY["EPSG","6326"]],
        PRIMEM["Greenwich",0,
            AUTHORITY["EPSG","8901"]],
        UNIT["degree",0.0174532925199433,
            AUTHORITY["EPSG","9122"]],
        AUTHORITY["EPSG","4326"]],
    PROJECTION["Transverse_Mercator"],
    PARAMETER["latitude_of_origin",0],
    PARAMETER["central_meridian",105],
    PARAMETER["scale_factor",0.9996],
    PARAMETER["false_easting",500000],
    PARAMETER["false_northing",0],
    UNIT["metre",1,
        AUTHORITY["EPSG","9001"]],
    AXIS["Easting",EAST],
    AXIS["Northing",NORTH],
    AUTHORITY["EPSG","32648"]]
Origin = (571500.000000000000000,2328500.000000000000000)
Pixel Size = (3.125000000000000,-3.125000000000000)
Metadata:
  AREA_OR_POINT=Area
  TIFFTAG_DATETIME=2017:09:17 02:52:52
Image Structure Metadata:
  COMPRESSION=LZW
  INTERLEAVE=PIXEL
Corner Coordinates:
Upper Left  (  571500.000, 2328500.000) (105d41'17.58"E, 21d 3'21.61"N)
Lower Left  (  571500.000, 2303500.000) (105d41'13.86"E, 20d49'48.46"N)
Upper Right (  596500.000, 2328500.000) (105d55'43.79"E, 21d 3'17.49"N)
Lower Right (  596500.000, 2303500.000) (105d55'38.77"E, 20d49'44.38"N)
Center      (  584000.000, 2316000.000) (105d48'28.50"E, 20d56'33.14"N)
Band 1 Block=512x512 Type=Byte, ColorInterp=Red
  Overviews: 2667x2667, 889x889, 297x297
  Mask Flags: PER_DATASET ALPHA 
  Overviews of mask band: 2667x2667, 889x889, 297x297
Band 2 Block=512x512 Type=Byte, ColorInterp=Green
  Overviews: 2667x2667, 889x889, 297x297
  Mask Flags: PER_DATASET ALPHA 
  Overviews of mask band: 2667x2667, 889x889, 297x297
Band 3 Block=512x512 Type=Byte, ColorInterp=Blue
  Overviews: 2667x2667, 889x889, 297x297
  Mask Flags: PER_DATASET ALPHA 
  Overviews of mask band: 2667x2667, 889x889, 297x297
Band 4 Block=512x512 Type=Byte, ColorInterp=Alpha
  Overviews: 2667x2667, 889x889, 297x297



In [49]:

    
# write geojson file
geojson_str = geojson.dumps(geojson.Feature(geometry=bounds_geom))
geojson_file = os.path.join(data_dir, 'bounds.geojson')
with open(geojson_file, 'w') as cfile:
    cfile.write(geojson_str)



In [50]:

    
output_file = os.path.join(data_dir, item_id + '_bounds.tif')



In [51]:

    
%time !gdalwarp -cutline $geojson_file -crop_to_cutline -overwrite $vsicurl_url $output_file









    



Using band 4 of source image as alpha.
Creating output file that is 3125P x 3048L.
Processing /vsicurl/https://api.planet.com/data/v1/download?token=eyJhbGciOiJIUzUxMiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJHUHQwZGFFSTY1K1RVUjd0bnY5RWVnVWZsNE5YT21KTDZDekJRb2U5YjlGbHhuekk4VWpIUWFuWHhPSWxRay92TWovVzhYQzVkNE9LN1NNUU1XWmY2QT09IiwiaXRlbV90eXBlX2lkIjoiUFNPcnRob1RpbGUiLCJ0b2tlbl90eXBlIjoidHlwZWQtaXRlbSIsImV4cCI6MTUzODAwOTk5MiwiaXRlbV9pZCI6Ijc2MDgxOF80ODQ4NzE4XzIwMTctMDktMTdfMGUyZiIsImFzc2V0X3R5cGUiOiJ2aXN1YWwifQ.SVYsQOSnsOgmltUbKrrP4OGL0jj6BA05tPFmnUKau94hEIHtmlCBYwMCYY7G7RCpVHgQtsGdw2RxkZtcaNYv6w [1/1] : 0...10...20...30...40...50...60...70...80...90...100 - done.
CPU times: user 540 ms, sys: 480 ms, total: 1.02 s
Wall time: 28.6 s



In [52]:

    
# load local visual module
# autoreload because visual is in development

%load_ext autoreload
%autoreload 2

import visual









    



The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload



In [53]:

    
def load_rgb(filename):
    with rasterio.open(filename, 'r') as src:
        # visual band ordering: red, green, blue, alpha
        r, g, b, a = src.read() 
        # mask wherever the alpha band is zero
        mask = a == 0  
    bands = [np.ma.array(band, mask=mask) for band in [r,g,b]]
    return bands

rgb_bands = load_rgb(output_file)
visual.plot_image(rgb_bands, title='Cropped Scene')

Create Config File



In [54]:

    
# create config file
config = local_config.copy()
config['imagery'] = download_url

# define project files and folders
config_filename = os.path.join(data_dir, 'config.json')

# write config file
with open(config_filename, 'w') as cfile:
    cfile.write(json.dumps(config))

print('wrote config to {}'.format(config_filename))









    



wrote config to data/label-maker-geotiff/config.json

Run Label Maker

The only label maker commands that interact with the imagery are preview and images. We have already run download and label above, so we don't need to run them again. We do, however, need to clear out the tiles directory (created by images) of the tiles created from the local GeoTIFF.

NOTE: This section requires the container be from the planet-notebooks:label image. See introduction for building instructions.



In [55]:

    
# skip preview because it fails due to imagery-offset arg
# https://github.com/developmentseed/label-maker/issues/79
# !cd $data_dir && label-maker preview -n 3



In [56]:

    
# clear tiles directory
!cd $tiles_dir && rm -R *



In [59]:

    
# download image tiles
# Note: if this doesn't work, there are two possibilities:
# 1. the activation code has timed out. re-activate if it has been over an hour
# 2. this notebook is not being run in planet-notebooks:label image, which implements a fix to label-maker.
!cd $data_dir && label-maker images









    



Downloading 36 tiles to data/tiles



In [60]:

    
# look at three tiles that were generated
num_tiles = 3
for img in os.listdir(tiles_dir)[:num_tiles]:
    img_filename = os.path.join(tiles_dir, img)
    print(img_filename)
    display(Image(filename=img_filename))









    



data/label-maker-geotiff/data/tiles/104095-57761-17.jpg






    












    



data/label-maker-geotiff/data/tiles/104070-57772-17.jpg






    












    



data/label-maker-geotiff/data/tiles/104080-57759-17.jpg



In [ ]: