In this notebook, we create labeled data for training a machine learning algorithm. As inputs, we use OpenStreetMap as the ground truth source and a Planet mosaic as the image source. Development Seed's Label Maker tool is used to download and prepare the ground truth data, chip the Planet imagery, and package the two to feed into the training process.
Label Maker is used primarily through its command-line interface (CLI) and is configured with a JSON configuration file. More information about the configuration file and command-line usage can be found in the Label Maker repo README.
RUNNING NOTE
This notebook is meant to be run in a docker image specific to this folder. The docker image must be built from the custom Dockerfile according to the directions below.
In the label-data directory:
docker build -t planet-notebooks:label .
Then start up the docker container as you usually would, specifying planet-notebooks:label as the image.
In addition to the Python packages imported below, the label-maker Python package is also a dependency. However, because it is primarily used through its command-line interface (CLI), we run label-maker via Jupyter's ! shell magic instead of importing the Python package.
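If you would rather drive label-maker from plain Python than from ! shell magic, the same CLI calls can be issued with subprocess. The sketch below is a hypothetical helper (run_label_maker is not part of label-maker); it only runs the subcommands used in this notebook, with no flags, from inside the data directory, just as the !cd $data_dir && label-maker ... cells do.

```python
import subprocess
from pathlib import Path

# The label-maker steps this notebook runs, in order. "preview" is omitted
# because it is commented out below (label-maker issue #79).
LABEL_MAKER_STEPS = ("download", "labels", "images", "package")


def run_label_maker(data_dir, steps=LABEL_MAKER_STEPS, dry_run=False):
    """Hypothetical helper: run `label-maker <step>` for each step inside data_dir.

    Mirrors `!cd $data_dir && label-maker <step>`: no flags are passed, so
    label-maker falls back on its defaults, which (as the notebook's flag-free
    calls imply) means reading config.json from data_dir.
    Returns the commands; with dry_run=True nothing is executed.
    """
    commands = [["label-maker", step] for step in steps]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError and stops at the first failing step
            subprocess.run(cmd, cwd=Path(data_dir), check=True)
    return commands


# dry run: print the commands without calling label-maker
for cmd in run_label_maker("data/label-maker-mosaic", dry_run=True):
    print(" ".join(cmd))
```

Passing dry_run=False runs the steps in order and stops at the first one that fails.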
In [1]:
import json
import os
import ipyleaflet as ipyl
import ipywidgets as ipyw
from IPython.display import Image, display
import numpy as np
In this tutorial, we use the Planet mosaic tile service. There are many mosaics to choose from; for a list of available mosaics, visit https://api.planet.com/basemaps/v1/mosaics.
We first build the URL for the XYZ basemap tile service, then add authorization in the form of the Planet API key.
In [2]:
# Planet tile server base URL (Planet Explorer Mosaics Tiles)
mosaic = 'global_monthly_2018_02_mosaic'
mosaicsTilesURL_base = 'https://tiles.planet.com/basemaps/v1/planet-tiles/{}/gmap/{{z}}/{{x}}/{{y}}.png'.format(mosaic)
mosaicsTilesURL_base
Out[2]:
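The {z}/{x}/{y} placeholders follow the standard XYZ (slippy map) tile scheme in Web Mercator. As a sketch of how a single tile is addressed, the function below converts a longitude/latitude to tile indices with the standard formula and fills in the template. lonlat_to_tile is our own helper, and the coordinates used are the center of the bounding box defined later in the config. Actually requesting the tile would also need the ?api_key= parameter added in the next cell.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the Web Mercator XYZ tile that contains lon/lat.

    Valid for lon in [-180, 180) and latitudes inside the Web Mercator range.
    """
    n = 2 ** zoom  # number of tiles along each axis at this zoom
    x = int((lon + 180.0) / 360.0 * n)
    # asinh(tan(lat)) is the Mercator projection of the latitude
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y


# Same template as mosaicsTilesURL_base, with {z}/{x}/{y} left to fill in
template = ('https://tiles.planet.com/basemaps/v1/planet-tiles/'
            'global_monthly_2018_02_mosaic/gmap/{z}/{x}/{y}.png')

# center of the bounding box used later in the config
x, y = lonlat_to_tile(1.221535, 6.182175, 15)
print(template.format(z=15, x=x, y=y))
```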
In [3]:
# Planet tile server url with auth
planet_api_key = os.environ['PL_API_KEY']
planet_mosaic = mosaicsTilesURL_base + '?api_key=' + planet_api_key
# url is not printed because it will show private api key
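The authorized URL cannot be printed as-is, but a redacted copy can. A small sketch using only the standard library's urllib.parse; redact_api_key is a hypothetical helper, and the key in the example is a placeholder.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def redact_api_key(url, param='api_key', mask='REDACTED'):
    """Hypothetical helper: return url with the value of the `param` query parameter masked."""
    parts = urlsplit(url)
    # rebuild the query string, swapping only the secret parameter's value
    query = [(key, mask if key == param else value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)]
    return urlunsplit(parts._replace(query=urlencode(query)))


# Example with a placeholder key
example = ('https://tiles.planet.com/basemaps/v1/planet-tiles/'
           'global_monthly_2018_02_mosaic/gmap/{z}/{x}/{y}.png?api_key=not-a-real-key')
print(redact_api_key(example))
```

In the notebook, print(redact_api_key(planet_mosaic)) shows the full URL without exposing the key.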
This config file is adapted from the example in the label-maker repo README.md and customized to use the Planet mosaic: the imagery URL is set to the Planet mosaic URL, and the zoom is changed to 15, the maximum zoom supported by the Planet tile services.
See the label-maker README.md file for a description of the config entries.
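To put zoom 15 in ground terms, the standard Web Mercator ground-resolution formula gives meters per pixel as cos(latitude) × 2πR / (tile size × 2^zoom), with R = 6378137 m. The sketch assumes 256-pixel tiles (the usual XYZ tile size; we have not checked this against Planet's documentation). Near this bounding box (latitude about 6.2°) it works out to roughly 4.75 m per pixel, so each tile spans about 1.2 km.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis used by Web Mercator
TILE_SIZE_PX = 256          # assumed tile size in pixels


def ground_resolution(lat, zoom, tile_size=TILE_SIZE_PX):
    """Approximate meters per pixel of a Web Mercator tile at the given latitude and zoom."""
    return math.cos(math.radians(lat)) * 2 * math.pi * EARTH_RADIUS_M / (tile_size * 2 ** zoom)


res = ground_resolution(6.18, 15)
print('zoom 15 near lat 6.18: {:.2f} m/px, about {:.0f} m per {} px tile'.format(
    res, res * TILE_SIZE_PX, TILE_SIZE_PX))
```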
In [4]:
# create data directory
data_dir = os.path.join('data', 'label-maker-mosaic')
os.makedirs(data_dir, exist_ok=True)
In [5]:
# label-maker doesn't clean up after itself, so start with a clean slate
# (-f avoids an error when the directory is already empty, e.g. on the first run)
!cd $data_dir && rm -rf *
In [6]:
# create config file
bounding_box = [1.09725, 6.05520, 1.34582, 6.30915]  # [min lon, min lat, max lon, max lat]
config = {
    "country": "togo",
    "bounding_box": bounding_box,
    "zoom": 15,
    "classes": [
        {"name": "Roads", "filter": ["has", "highway"]},
        {"name": "Buildings", "filter": ["has", "building"]}
    ],
    "imagery": planet_mosaic,
    "background_ratio": 1,
    "ml_type": "classification"
}
# path to the config file
config_filename = os.path.join(data_dir, 'config.json')
# write config file
with open(config_filename, 'w') as cfile:
    cfile.write(json.dumps(config))
print('wrote config to {}'.format(config_filename))
In [7]:
# calculate center of map
bounds_lat = [bounding_box[1], bounding_box[3]]
bounds_lon = [bounding_box[0], bounding_box[2]]
def calc_center(bounds):
    return bounds[0] + (bounds[1] - bounds[0])/2
map_center = [calc_center(bounds_lat), calc_center(bounds_lon)] # lat/lon
print(bounding_box)
print(map_center)
In [8]:
# create and visualize mosaic at approximately the same bounds as defined in the config file
map_zoom = 12
layout=ipyw.Layout(width='800px', height='800px') # set map layout
mosaic_map = ipyl.Map(center=map_center, zoom=map_zoom, layout=layout)
mosaic_map.add_layer(ipyl.TileLayer(url=planet_mosaic))
mosaic_map
In [9]:
mosaic_map.bounds
Out[9]:
In this step, label-maker downloads the OSM vector tiles for the country specified in the config file.
According to the Label Maker documentation, these can be visualized with mbview. So far I have not been able to get mbview to work; I will keep trying, and I would love to hear how you got it working!
In [10]:
!cd $data_dir && label-maker download
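As an alternative to mbview for a quick sanity check, the downloaded file can be inspected directly: MBTiles is a SQLite database with a metadata table and a tiles table. The sketch below uses only the sqlite3 module; summarize_mbtiles is a hypothetical helper, and the file name togo.mbtiles under data/ is an assumption about where label-maker saves the download, so check with ls first.

```python
import os
import sqlite3
from contextlib import closing
from pathlib import Path


def summarize_mbtiles(path):
    """Return (metadata dict, {zoom_level: tile count}) for an MBTiles file.

    MBTiles is a SQLite database with a `metadata` (name, value) table and a
    `tiles` (zoom_level, tile_column, tile_row, tile_data) table.
    """
    # open read-only so a wrong path raises instead of creating an empty database
    uri = Path(path).resolve().as_uri() + '?mode=ro'
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        metadata = dict(conn.execute('SELECT name, value FROM metadata'))
        counts = dict(conn.execute(
            'SELECT zoom_level, COUNT(*) FROM tiles GROUP BY zoom_level'))
    return metadata, counts


# assumed location of the downloaded OSM tiles; adjust to what `ls` shows
mbtiles_path = os.path.join('data', 'label-maker-mosaic', 'data', 'togo.mbtiles')
if os.path.exists(mbtiles_path):
    metadata, counts = summarize_mbtiles(mbtiles_path)
    print(metadata.get('name'), counts)
```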
In [11]:
!cd $data_dir && label-maker labels
Visualizing classification.geojson in QGIS gives:
[Figure: classification.geojson label tiles visualized in QGIS]
Although Label Maker doesn't tell us which classes line up with the labels (see the legend in the visualization for labels), it looks like the following relationships hold:
Most of the large region with no roads or buildings at the bottom portion of the image is the water off the coast.
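One way to make the label-to-class mapping explicit is a small decoder. It rests on an assumption we have not confirmed in label-maker's code: each classification label is a vector whose first entry flags background and whose remaining entries follow the order of classes in the config file (Roads, then Buildings). The QGIS legend is a good place to check that assumption; decode_label is a hypothetical helper.

```python
# Order of "classes" in the config file written above
CLASS_NAMES = ['Roads', 'Buildings']


def decode_label(label, class_names=CLASS_NAMES):
    """Map a classification label vector to class names.

    Assumption: index 0 is the background flag and index i >= 1 is
    class_names[i - 1]; verify this against the QGIS visualization.
    """
    if len(label) != len(class_names) + 1:
        raise ValueError('expected {} entries, got {}'.format(len(class_names) + 1, len(label)))
    names = [name for name, flag in zip(class_names, label[1:]) if flag]
    return names or ['background']


for label in ([1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 1]):
    print(label, decode_label(label))
```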
Create a subset of the image chips for preview before creating them all. Preview chips are placed in subdirectories named after each class specified in the config file.
NOTE: This section is commented out because preview fails due to the imagery-offset argument. See more: https://github.com/developmentseed/label-maker/issues/79
In [23]:
# !cd $data_dir && label-maker preview -n 3
In [24]:
# !ls $data_dir/data/examples
In [25]:
# for fclass in ('Roads', 'Buildings'):
#     example_dir = os.path.join(data_dir, 'data', 'examples', fclass)
#     print(example_dir)
#     for img in os.listdir(example_dir):
#         print(img)
#         display(Image(os.path.join(example_dir, img)))
Other than the fact that 4 tiles were created instead of the specified 3, the results look pretty good! All Road examples have roads, and all Building examples have buildings.
In this step, we invoke label-maker images, which downloads and chips the mosaic into tiles that match the label tiles.
Interestingly, only 372 image tiles are downloaded, while 576 label tiles were generated. The label tile generation output (370 Road tiles, 270 Building tiles), together with the classification.geojson visualization (only two tiles are Building but not Road), shows that only 370 + 2 = 372 label tiles contain at least one of the Road/Building classes. This is why only 372 image tiles were generated.
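The number of tiles that contain at least one class can also be checked programmatically. A minimal sketch, assuming labels.npz lives under data/ and maps each tile id to a label vector whose first entry is background; count_class_tiles is a hypothetical helper.

```python
import os

import numpy as np


def count_class_tiles(labels):
    """Count tiles whose label vector flags at least one non-background class.

    `labels` maps a tile id to its label vector; index 0 is assumed to be
    the background flag.
    """
    return sum(1 for label in labels.values() if any(label[1:]))


# assumed location of label-maker's labels file
labels_path = os.path.join('data', 'label-maker-mosaic', 'data', 'labels.npz')
if os.path.exists(labels_path):
    with np.load(labels_path) as npz:
        labels = {key: npz[key] for key in npz.files}
    print(len(labels), 'label tiles,', count_class_tiles(labels), 'with at least one class')
```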
In [26]:
!cd $data_dir && label-maker images
In [27]:
# look at three tiles that were generated
tiles_dir = os.path.join(data_dir, 'data', 'tiles')
print(tiles_dir)
for img in os.listdir(tiles_dir)[:3]:
    print(img)
    display(Image(os.path.join(tiles_dir, img)))
In [28]:
# package images and labels into data.npz; expect label-maker to report that it cannot open
# image tiles that were never generated because their label tiles contained no classes
!cd $data_dir && label-maker package
In [29]:
data_file = os.path.join(data_dir, 'data', 'data.npz')
data = np.load(data_file)
In [30]:
for k in data.keys():
    print('data[\'{}\'] shape: {}'.format(k, data[k].shape))
297 x (image) and y (label) pairs were created in the train set and 75 in the test set, adding up to 372 pairs in total, equal to the number of image tiles downloaded.
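A quick consistency check on the packaged data: every split should have as many labels as images, and the train and test sizes should add up to the tile count. 297 of 372 is about 80%, consistent with an 80/20 split where the train count is rounded down. The sketch assumes data.npz holds the keys x_train, y_train, x_test and y_test; summarize_package is a hypothetical helper.

```python
import os

import numpy as np


def summarize_package(data):
    """Return {'train': n, 'test': n} for a data.npz-style mapping.

    Assumes the keys x_train, y_train, x_test, y_test, and checks that each
    x_* array has as many samples (first axis) as its y_* partner.
    """
    sizes = {}
    for split in ('train', 'test'):
        n_x = data['x_' + split].shape[0]
        n_y = data['y_' + split].shape[0]
        if n_x != n_y:
            raise ValueError('{}: {} images but {} labels'.format(split, n_x, n_y))
        sizes[split] = n_x
    return sizes


# an 80/20 split of 372 tiles with the train count rounded down
n_tiles = 372
n_train = int(0.8 * n_tiles)
print(n_train, n_tiles - n_train)

data_path = os.path.join('data', 'label-maker-mosaic', 'data', 'data.npz')
if os.path.exists(data_path):
    with np.load(data_path) as npz:
        print(summarize_package(npz))
```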
The next step after creating labeled data is to train the machine learning algorithm.
This Development Seed walkthrough demonstrates how to train a neural network classifier.