In [0]:
#@title Copyright 2020 Google LLC. { display-mode: "form" }
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

Introduction

This is an Earth Engine <> TensorFlow demonstration notebook. It demonstrates a per-pixel neural network implemented in a way that allows the trained model to be hosted on Google AI Platform and used in Earth Engine for interactive prediction from an ee.Model.fromAiPlatformPredictor. See this example notebook for background on the dense model.

Running this demo may incur charges to your Google Cloud Account!

Setup software libraries

Import software libraries and/or authenticate as necessary.

Authenticate to Colab and Cloud

To read from and write to a Google Cloud Storage bucket to which you have access, you need to authenticate (as yourself). This should be the same account you use to log in to Earth Engine. When you run the code below, it will display a link in the output to an authentication page in your browser. Follow the link to a page that lets you grant the Cloud SDK permission to access your resources. Copy the code from the permissions page back into this notebook and press return to complete the process.

(You may need to run this again if you get a credentials error later.)


In [0]:
from google.colab import auth
auth.authenticate_user()

Authenticate to Earth Engine

Authenticate to Earth Engine the same way you did for the Colab notebook. Specifically, run the code to display a link to a permissions page. This gives you access to your Earth Engine account. This should be the same account you used to log in to Cloud previously. Copy the code from the Earth Engine permissions page back into the notebook and press return to complete the process.


In [0]:
import ee
ee.Authenticate()
ee.Initialize()

Test the TensorFlow installation

Import TensorFlow and check the version.


In [0]:
import tensorflow as tf
print(tf.__version__)

Test the Folium installation

We will use the Folium library for visualization. Import the library and check the version.


In [0]:
import folium
print(folium.__version__)

Define variables

The training data are land cover labels with a single vector of Landsat 8 pixel values (BANDS) as predictors. See this example notebook for details on how to generate these training data.


In [0]:
# REPLACE WITH YOUR CLOUD PROJECT!
PROJECT = 'your-project'

# Cloud Storage bucket with training and testing datasets.
DATA_BUCKET = 'ee-docs-demos'
# Output bucket for trained models.  You must be able to write into this bucket.
OUTPUT_BUCKET = 'your-bucket'

# Training and testing dataset file names in the Cloud Storage bucket.
TRAIN_FILE_PREFIX = 'Training_demo'
TEST_FILE_PREFIX = 'Testing_demo'
file_extension = '.tfrecord.gz'
TRAIN_FILE_PATH = 'gs://' + DATA_BUCKET + '/' + TRAIN_FILE_PREFIX + file_extension
TEST_FILE_PATH = 'gs://' + DATA_BUCKET + '/' + TEST_FILE_PREFIX + file_extension

# The labels, consecutive integer indices starting from zero, are stored in
# this property, set on each point.
LABEL = 'landcover'
# Number of label values, i.e. number of classes in the classification.
N_CLASSES = 3

# Use Landsat 8 surface reflectance data for predictors.
L8SR = ee.ImageCollection('LANDSAT/LC08/C01/T1_SR')
# Use these bands for prediction.
BANDS = ['B2', 'B3', 'B4', 'B5', 'B6', 'B7']

# These names are used to specify properties in the export of 
# training/testing data and to define the mapping between names and data
# when reading into TensorFlow datasets.
FEATURE_NAMES = list(BANDS)
FEATURE_NAMES.append(LABEL)

# List of fixed-length features, all of which are float32.
columns = [
  tf.io.FixedLenFeature(shape=[1], dtype=tf.float32) for _ in FEATURE_NAMES
]

# Dictionary with feature names as keys, fixed-length features as values.
FEATURES_DICT = dict(zip(FEATURE_NAMES, columns))
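
The comment above requires LABEL values to be consecutive integers starting from zero because to_tuple() later one-hot encodes them with depth N_CLASSES, and tf.one_hot turns any index outside [0, N_CLASSES) into an all-zero vector. If your own labels use other codes, one option is to remap them before export (for example with ee.FeatureCollection.remap()). The pure-Python helper below, remap_labels(), is a hypothetical illustration of that mapping with made-up class codes; it is not part of this notebook's pipeline.

```python
def remap_labels(labels):
    """Map arbitrary class codes to consecutive integers starting from zero.

    Returns the remapped labels and the code -> index lookup table.
    Codes are assigned indices in ascending order of the code value.
    """
    codes = sorted(set(labels))
    lookup = {code: index for index, code in enumerate(codes)}
    return [lookup[code] for code in labels], lookup


# Hypothetical raw class codes, e.g. from some land cover product.
remapped, lookup = remap_labels([10, 30, 10, 80, 30])
print(remapped)  # [0, 1, 0, 2, 1]
print(lookup)    # {10: 0, 30: 1, 80: 2}
```

After remapping, N_CLASSES would simply be len(lookup).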

Read data

Check existence of the data files

Check that you have permission to read the training and testing files in the data Cloud Storage bucket (DATA_BUCKET).


In [0]:
print('Found training file.' if tf.io.gfile.exists(TRAIN_FILE_PATH) 
    else 'No training file found.')
print('Found testing file.' if tf.io.gfile.exists(TEST_FILE_PATH) 
    else 'No testing file found.')

Read into a tf.data.Dataset

Here we read the files in Cloud Storage into a tf.data.Dataset (these TensorFlow docs explain more about reading data into a tf.data.Dataset). Check that you can read examples from the files. The purpose here is to ensure that we can read from the files without an error; the actual content is not necessarily human readable. Note that we will use all of the data, both training and testing files, for training.


In [0]:
# Create a dataset from the TFRecord file in Cloud Storage.
train_dataset = tf.data.TFRecordDataset([TRAIN_FILE_PATH, TEST_FILE_PATH],
                                        compression_type='GZIP')

# Print the first record to check.
print(next(iter(train_dataset)))
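
To see why the printed record is not human readable, it helps to know how a TFRecord file is laid out: each record is a length-prefixed byte string (here, a serialized tf.train.Example) framed by masked CRC32C checksums, and these files are additionally GZIP-compressed. The sketch below is a stdlib-only reader written purely for illustration. It is not part of TensorFlow, it skips the checksum verification that TensorFlow's reader performs, and it reads a tiny in-memory file with placeholder (zero) checksums rather than TRAIN_FILE_PATH.

```python
import gzip
import io
import struct


def read_tfrecord_payloads(fileobj):
    """Yield raw (still serialized) record payloads from a TFRecord stream.

    Each record is framed as:
      uint64 length | uint32 masked CRC of length | data | uint32 masked CRC of data
    The CRCs are read but NOT checked, so corruption goes undetected.
    """
    while True:
        header = fileobj.read(8)
        if not header:
            return  # clean end of file
        if len(header) != 8:
            raise ValueError('truncated record header')
        (length,) = struct.unpack('<Q', header)  # little-endian uint64
        fileobj.read(4)                          # masked CRC of length (ignored)
        payload = fileobj.read(length)
        fileobj.read(4)                          # masked CRC of data (ignored)
        yield payload


def _frame(payload):
    # Placeholder zero CRCs: fine for the reader above, but a real
    # TFRecord reader would reject them.
    return struct.pack('<Q', len(payload)) + b'\0' * 4 + payload + b'\0' * 4


# Build a tiny GZIP-compressed "TFRecord" in memory and read it back.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode='wb') as gz:
    gz.write(_frame(b'first') + _frame(b'second'))
buf.seek(0)
with gzip.GzipFile(fileobj=buf, mode='rb') as gz:
    records = list(read_tfrecord_payloads(gz))
print(records)  # [b'first', b'second']
```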

Parse the dataset

Now we need to make a parsing function for the data in the TFRecord files. Each record stores one value per predictor band plus the class label; we want to use the band values as input to the model and the label as the target. The parsing function reads data from a serialized Example proto (i.e. example.proto) into a dictionary in which the keys are the feature names and the values are the tensors storing the values of the features for that example, then pops the label out of that dictionary. (Learn more about parsing Example protocol buffer messages.)


In [0]:
def parse_tfrecord(example_proto):
  """The parsing function.

  Read a serialized example into the structure defined by FEATURES_DICT.

  Args:
    example_proto: a serialized Example.

  Returns:
    A tuple of the predictors dictionary and the LABEL, cast to an `int32`.
  """
  parsed_features = tf.io.parse_single_example(example_proto, FEATURES_DICT)
  labels = parsed_features.pop(LABEL)
  return parsed_features, tf.cast(labels, tf.int32)

# Map the function over the dataset.
parsed_dataset = train_dataset.map(parse_tfrecord, num_parallel_calls=4)

from pprint import pprint

# Print the first parsed record to check.
pprint(next(iter(parsed_dataset)))

Note that each record of the parsed dataset contains a tuple. The first element of the tuple is a dictionary with band names as keys and tensors storing the pixel data as values. The second element of the tuple is a tensor storing the class label.

Adjust dimension and shape

Turn the dictionary of {name: tensor, ...} into a 1x1xP array of values, where P is the number of predictors. Turn the label into a 1x1xN_CLASSES array of indicators (i.e. a one-hot vector) in order to use a categorical cross-entropy loss function. Return a tuple of (predictors, indicators), where each is a three-dimensional array; the first two dimensions are spatial x, y (i.e. a 1x1 kernel).


In [0]:
# Inputs as a tuple.  Make predictors 1x1xP and labels 1x1xN_CLASSES.
def to_tuple(inputs, label):
  return (tf.expand_dims(tf.transpose(list(inputs.values())), 1),
          tf.expand_dims(tf.one_hot(indices=label, depth=N_CLASSES), 1))

input_dataset = parsed_dataset.map(to_tuple)
# Check the first one.
pprint(next(iter(input_dataset)))

input_dataset = input_dataset.shuffle(128).batch(8)
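
The reshaping in to_tuple() can be hard to picture from the TensorFlow calls alone. Below is a pure-Python mimic for a single unbatched record, using nested lists in place of tensors and made-up reflectance values; to_nested_tuple() and shape() are hypothetical helpers for illustration only. After .batch(8), TensorFlow prepends a batch dimension, so each batch of predictors has shape (batch, 1, 1, P) with at most 8 records per batch.

```python
def to_nested_tuple(inputs, label, n_classes):
    """Pure-Python mimic of to_tuple() for one unbatched record.

    inputs: dict of band name -> [value] (each feature has shape [1]).
    label: [class index].
    Returns (1x1xP nested list, 1x1xn_classes one-hot nested list),
    with predictors in dictionary iteration order.
    """
    predictors = [[[values[0] for values in inputs.values()]]]
    one_hot = [[[1.0 if k == label[0] else 0.0 for k in range(n_classes)]]]
    return predictors, one_hot


def shape(nested):
    """Return the shape of a regularly nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# Made-up values for the six bands in BANDS; class index 1.
record = {'B2': [0.05], 'B3': [0.07], 'B4': [0.06],
          'B5': [0.3], 'B6': [0.2], 'B7': [0.1]}
x, y = to_nested_tuple(record, [1], n_classes=3)
print(shape(x), shape(y))  # (1, 1, 6) (1, 1, 3)
print(y)                   # [[[0.0, 1.0, 0.0]]]
```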

Model setup

Make a densely-connected convolutional model, where the convolution occurs in a 1x1 kernel. This is exactly analogous to the model generated in this example notebook, but operates in a convolutional manner in a 1x1 kernel. This allows Earth Engine to apply the model spatially, as demonstrated below.
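
To see why a 1x1 convolution is the same as a dense layer applied independently at each pixel, here is a naive pure-Python 2D convolution (cross-correlation with 'valid' padding and stride 1, the Keras Conv2D defaults) checked against a per-pixel dense layer. The helper names and toy numbers are illustrative assumptions; Keras itself uses optimized kernels. Because a 1x1 kernel never looks at neighboring pixels, the output keeps the input's height and width, which is what lets Earth Engine run the model over image tiles of any size.

```python
def dense(pixel, weights, bias):
    """Dense layer without activation: out[j] = sum_i pixel[i] * W[i][j] + b[j]."""
    return [sum(p * w_row[j] for p, w_row in zip(pixel, weights)) + bias[j]
            for j in range(len(bias))]


def conv2d_valid(image, kernel, bias):
    """Naive 2D cross-correlation, 'valid' padding, stride 1.

    image: H x W x P nested lists; kernel: K x K x P x C; bias: length C.
    Returns an (H-K+1) x (W-K+1) x C nested list.
    """
    h, w = len(image), len(image[0])
    k = len(kernel)
    c_out = len(bias)
    out = []
    for y in range(h - k + 1):
        row = []
        for x in range(w - k + 1):
            acc = list(bias)
            for dy in range(k):
                for dx in range(k):
                    for p, value in enumerate(image[y + dy][x + dx]):
                        for c in range(c_out):
                            acc[c] += value * kernel[dy][dx][p][c]
            row.append(acc)
        out.append(row)
    return out


image = [[[1, 2], [3, 4], [5, 6]],
         [[7, 8], [9, 10], [11, 12]]]  # 2 x 3 pixels, P = 2 bands
weights = [[1, 0, -1],
           [2, 1, 0]]                  # P x C weights, C = 3 outputs
bias = [0, 1, 2]

conv_out = conv2d_valid(image, [[weights]], bias)  # kernel is 1 x 1 x P x C
per_pixel = [[dense(px, weights, bias) for px in row] for row in image]
print(conv_out == per_pixel)  # True
```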

Note that the model used here is purely for demonstration purposes and hasn't gone through any performance tuning.

Create the Keras model

The pre-processing above put the data into the right input shape and into a format that can be used with cross-entropy loss. Specifically, the model expects a 1x1xP array of predictors for each example and a one-hot vector for the class. (See the Keras loss function docs, the TensorFlow categorical identity docs and the tf.one_hot docs for details.)

Here we will use a simple neural network model with a 64-node hidden layer. Now that the dataset has been prepared, define the model, compile it, and fit it to the training data. See the Keras Sequential model guide for more details.


In [0]:
from tensorflow import keras

# Define the layers in the model.  Note the 1x1 kernels.
model = tf.keras.models.Sequential([
  tf.keras.layers.Input((None, None, len(BANDS),)),
  tf.keras.layers.Conv2D(64, (1,1), activation=tf.nn.relu),
  tf.keras.layers.Dropout(0.1),
  tf.keras.layers.Conv2D(N_CLASSES, (1,1), activation=tf.nn.softmax)
])

# Compile the model with the specified loss and optimizer functions.
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Fit the model to the training data.  Lucky number 7.
model.fit(x=input_dataset, epochs=7)
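
As a quick sanity check on the model size: each Conv2D layer has kernel height * kernel width * input channels * filters weights, plus one bias per filter, and Dropout adds no parameters. The arithmetic below (conv_params() is a hypothetical helper, with the layer sizes copied from the model above) should match the totals that model.summary() reports.

```python
def conv_params(kernel_size, in_channels, out_channels, use_bias=True):
    """Trainable parameters of a 2D convolution layer."""
    kh, kw = kernel_size
    weights = kh * kw * in_channels * out_channels
    return weights + (out_channels if use_bias else 0)


n_bands, hidden, n_classes = 6, 64, 3  # len(BANDS), first Conv2D filters, N_CLASSES
layer1 = conv_params((1, 1), n_bands, hidden)    # 6*64 + 64 = 448
layer2 = conv_params((1, 1), hidden, n_classes)  # 64*3 + 3 = 195
print(layer1, layer2, layer1 + layer2)           # 448 195 643
```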

Save the trained model

Export the trained model to TensorFlow SavedModel format in your Cloud Storage bucket. The Cloud Platform storage browser is useful for checking on these saved models.


In [0]:
MODEL_DIR = 'gs://' + OUTPUT_BUCKET + '/demo_pixel_model'
model.save(MODEL_DIR, save_format='tf')

EEification

EEification prepares the model for hosting on Google AI Platform. Learn more about EEification from this doc. First, get (and SET) the input and output names of the nodes. CHANGE THE OUTPUT NAME TO SOMETHING THAT MAKES SENSE FOR YOUR MODEL! Keep the input name of 'array', which is how you'll pass data into the model (as an array image).


In [0]:
from tensorflow.python.tools import saved_model_utils

meta_graph_def = saved_model_utils.get_meta_graph_def(MODEL_DIR, 'serve')
inputs = meta_graph_def.signature_def['serving_default'].inputs
outputs = meta_graph_def.signature_def['serving_default'].outputs

# Just get the first thing(s) from the serving signature def.  i.e. this
# model only has a single input and a single output.
input_name = None
for k,v in inputs.items():
  input_name = v.name
  break

output_name = None
for k,v in outputs.items():
  output_name = v.name
  break

# Make a dictionary that maps Earth Engine outputs and inputs to
# AI Platform inputs and outputs, respectively.
import json
input_dict = "'" + json.dumps({input_name: "array"}) + "'"
output_dict = "'" + json.dumps({output_name: "output"}) + "'"
print(input_dict)
print(output_dict)
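
The single quotes added around input_dict and output_dict make the shell treat each JSON string as one argument when IPython substitutes {input_dict} and {output_dict} into the !earthengine model prepare line. A slightly more robust alternative, sketched below, is Python's shlex.quote(), which gives the same result here and also escapes any single quotes inside the string. The tensor name used is a made-up placeholder, not the one your model will report, and shell_json_arg() is a hypothetical helper.

```python
import json
import shlex


def shell_json_arg(mapping):
    """Serialize a dict to JSON and quote it as a single shell word."""
    return shlex.quote(json.dumps(mapping))


# Hypothetical tensor name, for illustration only.
input_arg = shell_json_arg({'serving_default_input_1:0': 'array'})
print(input_arg)  # '{"serving_default_input_1:0": "array"}'
```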

Run the EEifier

The actual EEification is handled by the earthengine model prepare command. Note that you need to set your Cloud project before running the command.


In [0]:
# Put the EEified model next to the trained model directory.
EEIFIED_DIR = 'gs://' + OUTPUT_BUCKET + '/eeified_pixel_model'

# You need to set the project before using the model prepare command.
!earthengine set_project {PROJECT}
!earthengine model prepare --source_dir {MODEL_DIR} --dest_dir {EEIFIED_DIR} --input {input_dict} --output {output_dict}

Deploy and host the EEified model on AI Platform

Now there is another TensorFlow SavedModel stored in EEIFIED_DIR, ready for hosting by AI Platform. Deploy it with the gcloud command-line tool, which is installed in the Colab runtime by default. Note that the MODEL_NAME must be unique. If you already have a model by that name, either choose a new model name or create a new version of the existing model. The Cloud Console AI Platform models page is useful for monitoring your models.

If you change anything about the trained model, you'll need to re-EEify it and create a new version!


In [0]:
MODEL_NAME = 'pixel_demo_model'
VERSION_NAME = 'v0'

!gcloud ai-platform models create {MODEL_NAME} --project {PROJECT}
!gcloud ai-platform versions create {VERSION_NAME} \
  --project {PROJECT} \
  --model {MODEL_NAME} \
  --origin {EEIFIED_DIR} \
  --framework "TENSORFLOW" \
  --runtime-version=2.1 \
  --python-version=3.7
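
Because each change to the trained model needs a new version, a small helper that picks the next unused version name can save some bookkeeping. next_version_name() below is hypothetical and assumes you follow the v0, v1, ... convention used by VERSION_NAME; you would pass it the existing short version names for MODEL_NAME, for example as listed by `gcloud ai-platform versions list --model {MODEL_NAME} --project {PROJECT}`.

```python
import re


def next_version_name(existing_versions, prefix='v'):
    """Return the next unused '<prefix><N>' name; names not matching the pattern are ignored."""
    pattern = re.compile('^' + re.escape(prefix) + r'(\d+)$')
    numbers = []
    for name in existing_versions:
        match = pattern.match(name)
        if match:
            numbers.append(int(match.group(1)))
    return prefix + str(max(numbers) + 1) if numbers else prefix + '0'


print(next_version_name([]))                    # v0
print(next_version_name(['v0', 'v1']))          # v2
print(next_version_name(['v0', 'v3', 'test']))  # v4
```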

Connect to the hosted model from Earth Engine

  1. Generate the input imagery. This should be done in exactly the same way as the training data were generated. See this example notebook for details.
  2. Connect to the hosted model.
  3. Use the model to make predictions.
  4. Display the results.

Note that it takes the model a couple of minutes to spin up and make predictions.
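
The maskL8sr() function in the next cell keeps a pixel only when bit 3 (cloud shadow) and bit 5 (cloud) of the Landsat 8 pixel_qa band are both zero; ee.Number(2).pow(3) and ee.Number(2).pow(5) are just the bitmasks 8 and 32. The plain-Python is_clear() below is a hypothetical helper that applies the same test to individual QA integers, so you can see which bit patterns survive. The sample values are arbitrary bit patterns chosen for illustration, not an exhaustive list of QA codes.

```python
CLOUD_SHADOW_BIT = 3  # pixel_qa bit 3: cloud shadow
CLOUD_BIT = 5         # pixel_qa bit 5: cloud


def is_clear(qa_value):
    """Mirror of the maskL8sr() test: keep a pixel only if both bits are 0."""
    cloud_shadow_mask = 1 << CLOUD_SHADOW_BIT  # same value as ee.Number(2).pow(3), i.e. 8
    clouds_mask = 1 << CLOUD_BIT               # same value as ee.Number(2).pow(5), i.e. 32
    return (qa_value & cloud_shadow_mask) == 0 and (qa_value & clouds_mask) == 0


# 322 and 324 have neither bit set; 328 sets bit 3; 352 sets bit 5.
print([is_clear(v) for v in (322, 324, 328, 352)])  # [True, True, False, False]
```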


In [0]:
# Cloud masking function.
def maskL8sr(image):
  cloudShadowBitMask = ee.Number(2).pow(3).int()
  cloudsBitMask = ee.Number(2).pow(5).int()
  qa = image.select('pixel_qa')
  mask = qa.bitwiseAnd(cloudShadowBitMask).eq(0).And(
    qa.bitwiseAnd(cloudsBitMask).eq(0))
  return image.updateMask(mask).select(BANDS).divide(10000)

# The image input data is a 2018 cloud-masked median composite.
image = L8SR.filterDate('2018-01-01', '2018-12-31').map(maskL8sr).median()

# Get a map ID for display in folium.
rgb_vis = {'bands': ['B4', 'B3', 'B2'], 'min': 0, 'max': 0.3, 'format': 'png'}
mapid = image.getMapId(rgb_vis)

# Turn into an array image for input to the model.
array_image = image.float().toArray()

# Point to the model hosted on AI Platform.
model = ee.Model.fromAiPlatformPredictor(
    projectName=PROJECT,
    modelName=MODEL_NAME,
    version=VERSION_NAME,
    # Can be anything, but don't make it too big.
    inputTileSize=[8, 8],
    # Keep this the same as your training data.
    proj=ee.Projection('EPSG:4326').atScale(30),
    fixInputProj=True,
    # Note the names here need to match what you specified in the
    # output dictionary you passed to the EEifier.
    outputBands={'output': {
        'type': ee.PixelType.float(),
        'dimensions': 1
      }
    },
)

# model.predictImage outputs a one dimensional array image that
# packs the output nodes of your model into an array.  These
# are class probabilities that you need to unpack into a 
# multiband image with arrayFlatten().  If you want class
# labels, use arrayArgmax() as follows.
predictions = model.predictImage(array_image)
probabilities = predictions.arrayFlatten([['bare', 'veg', 'water']])
label = predictions.arrayArgmax().arrayGet([0]).rename('label')

# Get map IDs for display in folium.
probability_vis = {
    'bands': ['bare', 'veg', 'water'], 'max': 0.5, 'format': 'png'
}
label_vis = {
    'palette': ['red', 'green', 'blue'], 'min': 0, 'max': 2, 'format': 'png'
}
probability_mapid = probabilities.getMapId(probability_vis)
label_mapid = label.getMapId(label_vis)

# Visualize the input imagery and the predictions.
map = folium.Map(location=[37.6413, -122.2582], zoom_start=11)

folium.TileLayer(
  tiles=mapid['tile_fetcher'].url_format,
  attr='Map Data &copy; <a href="https://earthengine.google.com/">Google Earth Engine</a>',
  overlay=True,
  name='median composite',
).add_to(map)
folium.TileLayer(
  tiles=label_mapid['tile_fetcher'].url_format,
  attr='Map Data &copy; <a href="https://earthengine.google.com/">Google Earth Engine</a>',
  overlay=True,
  name='predicted label',
).add_to(map)
folium.TileLayer(
  tiles=probability_mapid['tile_fetcher'].url_format,
  attr='Map Data &copy; <a href="https://earthengine.google.com/">Google Earth Engine</a>',
  overlay=True,
  name='probability',
).add_to(map)
map.add_child(folium.LayerControl())
map
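
For a single pixel, the post-processing above amounts to naming the entries of the probability vector (arrayFlatten) and taking the index of the largest one (arrayArgmax returns one index per array dimension, so arrayGet([0]) extracts it from the one-dimensional output). The plain-Python flatten_and_argmax() below is a hypothetical per-pixel analogue with made-up probabilities, useful for reasoning about what the label and probability layers show; it is not how Earth Engine computes them.

```python
CLASS_NAMES = ['bare', 'veg', 'water']  # same order as the arrayFlatten() call above
PALETTE = ['red', 'green', 'blue']      # label_vis palette: index 0, 1, 2


def flatten_and_argmax(probabilities):
    """Per-pixel analogue of arrayFlatten() plus arrayArgmax().arrayGet([0]).

    In this sketch, ties go to the lowest index.
    """
    bands = dict(zip(CLASS_NAMES, probabilities))
    label = max(range(len(probabilities)), key=probabilities.__getitem__)
    return bands, label


bands, label = flatten_and_argmax([0.1, 0.7, 0.2])  # made-up softmax output
print(bands, label, PALETTE[label])  # {'bare': 0.1, 'veg': 0.7, 'water': 0.2} 1 green
```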