In this notebook, we create labeled data for training a machine learning algorithm. As inputs, we use OpenStreetMap as the ground truth source and a Planet GeoTIFF as the image source. Development Seed's Label Maker tool is used to download and prepare the ground truth data, chip the Planet imagery, and package the two to feed into the training process.
Label Maker is used primarily through its command-line interface (CLI) and is configured with a configuration file. More information about that configuration file and command-line usage can be found in the Label Maker repo README.
The goal of this tutorial is to demonstrate labeling data from a local GeoTIFF as well as a Cloud-Optimized GeoTIFF (COG) available via the Planet API. This is inspired by the fact that label-maker now supports both local GeoTIFFs and remote COGs (blog post). When only a portion of a scene is needed, accessing it as a remote COG can save time, bandwidth, and local storage.
NOTE: Currently, label-maker supports only 8-bit RGB imagery. Therefore, the visual asset is best for use with label-maker.
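A quick pre-check can confirm that an asset fits this constraint before running label-maker. The sketch below assumes the per-band data types and the band count have already been read from the file (for example, from the `dtypes` and `count` attributes of a rasterio dataset). The helper name `is_8bit_rgb` is hypothetical and is not part of label-maker.

```python
def is_8bit_rgb(dtypes, count):
    """Return True if at least three bands exist and every band is uint8.

    dtypes: sequence of per-band dtype strings, e.g. ('uint8', 'uint8', ...)
    count: number of bands in the raster
    """
    return count >= 3 and all(str(d) == 'uint8' for d in dtypes)


# A 4-band 8-bit raster (red, green, blue, alpha) passes the check.
print(is_8bit_rgb(('uint8', 'uint8', 'uint8', 'uint8'), 4))
# A 4-band 16-bit raster does not.
print(is_8bit_rgb(('uint16', 'uint16', 'uint16', 'uint16'), 4))
```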
RUNNING NOTE
This notebook is meant to be run in a docker image specific to this folder. The docker image must be built from the custom Dockerfile according to the directions below.
In label-data directory:
docker build -t planet-notebooks:label .
Then start up the docker container as you usually would, specifying planet-notebooks:label as the image.
There is currently an incompatibility between the URL Planet uses for COGs (which does not end in a GeoTIFF filename with a tif extension) and the released version of label-maker. The released version looks for the tif extension in the URL before treating it as a COG. See the issue for more information. There is a fixed version at jreiberkyle/label-maker.
This image installs the fixed version of label-maker along with its dependencies.
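To make the incompatibility concrete, the sketch below shows the kind of extension check described above. It is an illustration only, not label-maker's actual code: the function name `has_tif_extension` and both URLs are hypothetical.

```python
import os
from urllib.parse import urlparse


def has_tif_extension(url):
    """Return True if the path portion of the URL ends in .tif or .tiff."""
    path = urlparse(url).path
    ext = os.path.splitext(path)[1].lower()
    return ext in ('.tif', '.tiff')


# Hypothetical URLs for illustration only.
named_url = 'https://example.com/imagery/scene_RGB_Visual.tif'
token_url = 'https://example.com/download?token=abc123'

print(has_tif_extension(named_url))
print(has_tif_extension(token_url))
```

A URL of the second kind carries no filename in its path, so a check like this one would not recognize it as a GeoTIFF even when the file behind it is a valid COG.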
In addition to the Python packages imported below, the label-maker and planet Python packages are also dependencies. However, in this notebook, both packages are accessed through their command-line interfaces.
In [1]:
import json
import os
import geojson
import ipyleaflet as ipyl
import ipywidgets as ipyw
from IPython.display import Image
import numpy as np
import rasterio
import requests
from requests.auth import HTTPBasicAuth
from shapely.geometry import shape
In [2]:
# create data directory
data_dir = os.path.join('data', 'label-maker-geotiff')
if not os.path.isdir(data_dir):
    os.makedirs(data_dir)
In [3]:
# scene specifications
item_id = '760818_4848718_2017-09-17_0e2f'
item_type = 'PSOrthoTile'
asset_type = 'visual'
In [4]:
!planet data download --item-type $item_type --asset-type $asset_type \
--string-in id $item_id --dest $data_dir --quiet
Label-maker's behavior is specified through a configuration file. The configuration file we use in this tutorial was pulled from the label-maker tutorial on mapping buildings in Vietnam and then customized to use a local GeoTIFF: the imagery URL is set to the GeoTIFF filename. We also changed the bounds to an area of interest fully contained within the GeoTIFF, because we are not sure how label-maker handles masked pixels.
See the label-maker README for a description of the config entries.
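One way to sanity-check that an area of interest sits inside a raster is to compare bounding boxes. The sketch below is a minimal version of that idea. The `raster_bounds` values are made up for illustration and are not read from the scene, and the helper `bbox_contains` is hypothetical. Both boxes must be in the same coordinate system, so if the GeoTIFF uses a projected coordinate system its bounds would need to be reprojected to longitude/latitude first. Passing this check is necessary but not sufficient, since masked pixels can also occur inside a raster's bounding box.

```python
def bbox_contains(outer, inner):
    """Return True if inner lies fully inside outer.

    Both boxes are (min_x, min_y, max_x, max_y) in the same coordinate system.
    """
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


# Illustrative values only: raster_bounds is made up, not read from the scene.
raster_bounds = (105.80, 20.83, 105.95, 20.95)
aoi_bounds = (105.8178, 20.8402, 105.9111, 20.9257)

print(bbox_contains(raster_bounds, aoi_bounds))
```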
In [5]:
# define AOI
bounds_geom = {'type': 'Polygon',
'coordinates': [[[105.81775409169494, 20.84015810005586],
[105.9111433289945, 20.84015810005586],
[105.9111433289945, 20.925748489914824],
[105.81775409169494, 20.925748489914824],
[105.81775409169494, 20.84015810005586]]]}
bounding_box = shape(bounds_geom).bounds
bounding_box
Out[5]:
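The bounds of a shapely geometry are returned as (minx, miny, maxx, maxy), which for longitude/latitude coordinates corresponds to (west, south, east, north). As a sketch of what that computation amounts to for a simple polygon, the same tuple can be derived without shapely. The helper name `polygon_bounds` is hypothetical.

```python
def polygon_bounds(geom):
    """Return (min_x, min_y, max_x, max_y) for a GeoJSON-style Polygon dict."""
    exterior = geom['coordinates'][0]
    xs = [pt[0] for pt in exterior]
    ys = [pt[1] for pt in exterior]
    return (min(xs), min(ys), max(xs), max(ys))


aoi = {'type': 'Polygon',
       'coordinates': [[[105.81775409169494, 20.84015810005586],
                        [105.9111433289945, 20.84015810005586],
                        [105.9111433289945, 20.925748489914824],
                        [105.81775409169494, 20.925748489914824],
                        [105.81775409169494, 20.84015810005586]]]}

print(polygon_bounds(aoi))
```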
In [6]:
# define location relative to data_dir
geotiff_filename = '760818_4848718_2017-09-17_0e2f_RGB_Visual.tif'
# create config file
local_config = {
    "country": "vietnam",
    "bounding_box": bounding_box,
    "zoom": 17,
    "classes": [
        { "name": "Buildings", "filter": ["has", "building"] }
    ],
    "imagery": geotiff_filename,
    "background_ratio": 1,
    "ml_type": "classification"
}
# define project files and folders
local_config_name = 'config_local.json'
local_config_filename = os.path.join(data_dir, local_config_name)
# write config file
with open(local_config_filename, 'w') as cfile:
    cfile.write(json.dumps(local_config))
print('wrote config to {}'.format(local_config_filename))
In this section, we use label-maker to download and prepare the OSM label data and tile the GeoTIFF.
For more details on running label-maker, see the README.
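To get a feel for how many tiles a bounding box spans at a given zoom, the standard slippy-map tile formula can be applied to the box corners. The sketch below uses the bounding box and zoom level from this notebook's configuration. It counts candidate tiles only: label-maker's own tile selection, class filtering, and background ratio decide how many chips are actually produced, so the result is an upper bound and not a prediction. The function names are hypothetical.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) indices of the slippy-map tile containing a point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def count_tiles(bbox, zoom):
    """Count tiles that intersect a (west, south, east, north) box."""
    west, south, east, north = bbox
    # Tile y indices increase southward, so the south edge gives the max y.
    x_min, y_max = lonlat_to_tile(west, south, zoom)
    x_max, y_min = lonlat_to_tile(east, north, zoom)
    return (x_max - x_min + 1) * (y_max - y_min + 1)


bbox = (105.81775409169494, 20.84015810005586,
        105.9111433289945, 20.925748489914824)
print(count_tiles(bbox, 17))
```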
In [7]:
!cd $data_dir && label-maker download --config $local_config_name
In [8]:
!cd $data_dir && label-maker labels --config $local_config_name
In [9]:
# skip preview because it fails due to imagery-offset arg
# https://github.com/developmentseed/label-maker/issues/79
# !cd $data_dir && label-maker preview -n 3 --config $local_config_name
In [34]:
tiles_dir = os.path.join(data_dir, 'data', 'tiles')
print(tiles_dir)
In [35]:
!ls $tiles_dir
In [37]:
# clean out tiles directory if it exists
!rm -rf $tiles_dir
In [38]:
%time !cd $data_dir && label-maker images --config $local_config_name
In [39]:
# look at three tiles that were generated
num_tiles = 3
for img in os.listdir(tiles_dir)[:num_tiles]:
    img_filename = os.path.join(tiles_dir, img)
    print(img_filename)
    display(Image(filename=img_filename))
In [40]:
# will not be able to open image tiles that weren't generated because the label tiles contained no classes
!cd $data_dir && label-maker package --config $local_config_name
In [41]:
data_file = os.path.join(data_dir, 'data', 'data.npz')
data = np.load(data_file)
In [42]:
for k in data.keys():
    print('data[\'{}\'] shape: {}'.format(k, data[k].shape))
In total, 36 image (x) and label (y) pairs were created: 28 in the train set and 8 in the test set. This is not enough to train a classifier, but this scene is only one image in a daily image stream, so processing a stack of images would let us build up a much larger labeled training dataset quickly.
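These counts are consistent with an 80/20 train/test split in which the train size is truncated to a whole number. The 0.8 fraction is an assumption made here to explain the numbers and is not confirmed from the label-maker configuration. The helper name `split_counts` is hypothetical.

```python
def split_counts(total, train_fraction=0.8):
    """Return (train, test) sizes, truncating the train size to an integer."""
    train = int(total * train_fraction)
    return train, total - train


print(split_counts(36))
# prints (28, 8)
```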
In this part of the tutorial, we access a portion of the GeoTIFF directly from the download endpoint, so we download only the pixels that we need.
Before we can access the scene, it must be activated. Here, we activate the scene using the planet cli since it waits until the asset is activated before moving on.
Because activations do not last very long, be sure to activate right before you access the scene.
For another tutorial covering accessing Planet COGs, see the Download a Subarea tutorial.
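The waiting behavior described above can also be sketched in Python as a polling loop. This is a minimal illustration, not the planet CLI's implementation. It assumes the asset is returned as a dict with a `status` field that reads `active` once activation is complete. The function name `wait_until_active` and the stand-in responses are hypothetical. In the notebook, `fetch_asset` could wrap a request to the item's assets endpoint.

```python
import time


def wait_until_active(fetch_asset, max_polls=30, poll_seconds=10, sleep=time.sleep):
    """Poll fetch_asset() until the returned dict reports status 'active'.

    fetch_asset: zero-argument callable returning the asset's JSON as a dict.
    Returns the active asset dict, or raises TimeoutError after max_polls tries.
    """
    for _ in range(max_polls):
        asset = fetch_asset()
        if asset.get('status') == 'active':
            return asset
        sleep(poll_seconds)
    raise TimeoutError('asset did not become active in time')


# Stand-in for an API call: reports 'activating' twice, then 'active'.
responses = iter([{'status': 'activating'},
                  {'status': 'activating'},
                  {'status': 'active', 'location': 'https://example.com/download'}])
asset = wait_until_active(lambda: next(responses), sleep=lambda s: None)
print(asset['status'])
```

Passing the sleep function as a parameter keeps the loop easy to exercise without real delays.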
In [45]:
!planet data download --activate-only --item-type $item_type --asset-type $asset_type \
--string-in id $item_id --quiet
In [46]:
item_url = 'https://api.planet.com/data/v1/item-types/{}/items/{}/assets'.format(item_type, item_id)
# Request a new download URL
result = requests.get(item_url, auth=HTTPBasicAuth(os.environ['PL_API_KEY'], ''))
download_url = result.json()[asset_type]['location']
In [47]:
vsicurl_url = '/vsicurl/' + download_url
In [48]:
# check if COG url is valid
!gdalinfo $vsicurl_url
In [49]:
# write geojson file
geojson_str = geojson.dumps(geojson.Feature(geometry=bounds_geom))
geojson_file = os.path.join(data_dir, 'bounds.geojson')
with open(geojson_file, 'w') as cfile:
    cfile.write(geojson_str)
In [50]:
output_file = os.path.join(data_dir, item_id + '_bounds.tif')
In [51]:
%time !gdalwarp -cutline $geojson_file -crop_to_cutline -overwrite $vsicurl_url $output_file
In [52]:
# load local visual module
# autoreload because visual is in development
%load_ext autoreload
%autoreload 2
import visual
In [53]:
def load_rgb(filename):
    with rasterio.open(filename, 'r') as src:
        # visual band ordering: red, green, blue, alpha
        r, g, b, a = src.read()
        # mask wherever the alpha band is zero
        mask = a == 0
        bands = [np.ma.array(band, mask=mask) for band in [r, g, b]]
    return bands

rgb_bands = load_rgb(output_file)
visual.plot_image(rgb_bands, title='Cropped Scene')
In [54]:
# create config file
config = local_config.copy()
config['imagery'] = download_url
# define project files and folders
config_filename = os.path.join(data_dir, 'config.json')
# write config file
with open(config_filename, 'w') as cfile:
    cfile.write(json.dumps(config))
print('wrote config to {}'.format(config_filename))
The only label-maker commands that interact with the imagery are preview and images. We have already run download and labels above, so we don't need to run them again. We do, however, need to clear out the tiles directory (created by images) of the tiles created from the local GeoTIFF.
NOTE: This section requires that the container be run from the planet-notebooks:label image. See the introduction for building instructions.
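Clearing a directory can also be done from Python, which avoids a shell error when the directory is missing or already empty. The sketch below is one possible approach, demonstrated on a throwaway temporary directory instead of the real tiles folder. The helper name `reset_dir` is hypothetical.

```python
import os
import shutil
import tempfile


def reset_dir(path):
    """Delete path and everything under it, then recreate it empty."""
    shutil.rmtree(path, ignore_errors=True)
    os.makedirs(path)


# Demonstrate on a throwaway directory rather than the real tiles folder.
demo_dir = os.path.join(tempfile.mkdtemp(), 'tiles')
os.makedirs(demo_dir)
open(os.path.join(demo_dir, 'old_tile.jpg'), 'w').close()

reset_dir(demo_dir)
print(os.listdir(demo_dir))
# prints []
```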
In [55]:
# skip preview because it fails due to imagery-offset arg
# https://github.com/developmentseed/label-maker/issues/79
# !cd $data_dir && label-maker preview -n 3
In [56]:
# clear tiles directory
!cd $tiles_dir && rm -R *
In [59]:
# download image tiles
# Note: if this doesn't work, there are two possibilities:
# 1. the activation code has timed out. re-activate if it has been over an hour
# 2. this notebook is not being run in planet-notebooks:label image, which implements a fix to label-maker.
!cd $data_dir && label-maker images
In [60]:
# look at three tiles that were generated
num_tiles = 3
for img in os.listdir(tiles_dir)[:num_tiles]:
    img_filename = os.path.join(tiles_dir, img)
    print(img_filename)
    display(Image(filename=img_filename))