In [0]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

Object pose alignment

Precisely estimating the pose of objects is fundamental to many industries. For instance, in augmented and virtual reality, it allows users to modify the state of some variable by interacting with a physical object (e.g. controlling the volume with a mug on the user's desk).

This notebook illustrates how to use TensorFlow Graphics to estimate the rotation and translation of known 3D objects.

This capability is illustrated by two different demos:

  1. A machine learning demo illustrating how to train a simple neural network capable of precisely estimating the rotation and translation of a given object with respect to a reference pose.
  2. A mathematical optimization demo that takes a different approach to the problem and does not rely on machine learning.

Note: The easiest way to use this tutorial is as a Colab notebook, which allows you to dive in with no setup.

Setup & Imports

If TensorFlow Graphics is not installed on your system, the following cell installs the TensorFlow Graphics package for you.


In [0]:
!pip install tensorflow_graphics

Now that TensorFlow Graphics is installed, let's import everything needed to run the demos contained in this notebook.


In [0]:
import time

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

from tensorflow_graphics.geometry.transformation import quaternion
from tensorflow_graphics.math import vector
from tensorflow_graphics.notebooks import threejs_visualization
from tensorflow_graphics.notebooks.resources import tfg_simplified_logo

tf.compat.v1.enable_v2_behavior()

# Loads the TensorFlow Graphics simplified logo.
vertices = tfg_simplified_logo.mesh['vertices'].astype(np.float32)
faces = tfg_simplified_logo.mesh['faces']
num_vertices = vertices.shape[0]

1. Machine Learning

Model definition

Given the 3D positions of all the vertices of a known mesh, we would like a network capable of predicting the rotation, parametrized as a quaternion (4-dimensional vector), and the translation (3-dimensional vector) of this mesh with respect to a reference pose. Let's now create a very simple 3-layer fully connected network and a loss for the task. Note that this model is deliberately simple and certainly not optimal; improving it is out of scope for this notebook.


In [0]:
# Constructs the model.
model = keras.Sequential()
model.add(layers.Flatten(input_shape=(num_vertices, 3)))
model.add(layers.Dense(64, activation=tf.nn.tanh))
model.add(layers.Dense(64, activation=tf.nn.relu))
model.add(layers.Dense(7))


def pose_estimation_loss(y_true, y_pred):
  """Pose estimation loss used for training.

  This loss measures the average squared distance between the vertices of the
  mesh in 'rest pose' and the transformed mesh to which the predicted
  inverse pose is applied. Comparing this loss with a regular L2 loss on the
  quaternion and translation values is left as an exercise to the interested
  reader (a sketch of such a baseline follows this cell).

  Args:
    y_true: The ground-truth value.
    y_pred: The prediction we want to evaluate the loss for.

  Returns:
    A scalar value containing the loss described in the description above.
  """
  # y_true.shape : (batch, 7)
  y_true_q, y_true_t = tf.split(y_true, (4, 3), axis=-1)
  # y_pred.shape : (batch, 7)
  y_pred_q, y_pred_t = tf.split(y_pred, (4, 3), axis=-1)

  # vertices.shape: (num_vertices, 3)
  # corners.shape: (num_vertices, 1, 3)
  corners = tf.expand_dims(vertices, axis=1)

  # transformed_corners.shape: (num_vertices, batch, 3)
  # q and t shapes get pre-padded with 1's following standard broadcasting rules.
  transformed_corners = quaternion.rotate(corners, y_pred_q) + y_pred_t

  # recovered_corners.shape: (num_vertices, batch, 3)
  recovered_corners = quaternion.rotate(transformed_corners - y_true_t,
                                        quaternion.inverse(y_true_q))

  # vertex_error.shape: (num_vertices, batch)
  vertex_error = tf.reduce_sum((recovered_corners - corners)**2, axis=-1)

  return tf.reduce_mean(vertex_error)


optimizer = keras.optimizers.Adam()
model.compile(loss=pose_estimation_loss, optimizer=optimizer)
model.summary()
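
As the docstring of pose_estimation_loss suggests, it is instructive to compare this loss with a regular L2 loss on the raw quaternion and translation values. The following sketch (an illustrative baseline, not part of the original demo) shows what such a loss could look like; note that it weighs quaternion and translation errors on the same scale and ignores how parameter errors translate into vertex displacement.


In [0]:
def naive_l2_loss(y_true, y_pred):
  # Plain squared error on the 7 predicted parameters (4 quaternion and 3
  # translation components), provided only as a baseline for comparison,
  # e.g. model.compile(loss=naive_l2_loss, optimizer=optimizer).
  return tf.reduce_mean(tf.reduce_sum((y_true - y_pred)**2, axis=-1))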

Data generation

Now that we have a model defined, we need data to train it. For each sample in the training set, a random 3D rotation and 3D translation are sampled and applied to the vertices of our object. Each training sample consists of the transformed vertices together with the rotation and translation that were applied to them; applying the inverse of this transformation brings the sample back to the reference pose (a quick sanity check of this relationship is sketched below).
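
This sketch (not part of the original notebook) verifies that relationship on a single random transformation before the full training set is generated:


In [0]:
# Sample one random rotation (as a quaternion) and one random translation,
# transform the vertices, then apply the inverse transformation and check that
# the rest pose is recovered up to numerical precision.
sample_angles = np.random.uniform(-np.pi, np.pi, (3,)).astype(np.float32)
sample_q = quaternion.from_euler(sample_angles)
sample_t = np.random.uniform(-2.0, 2.0, (3,)).astype(np.float32)
moved = quaternion.rotate(vertices, sample_q) + sample_t
recovered = quaternion.rotate(moved - sample_t, quaternion.inverse(sample_q))
print(np.abs(recovered.numpy() - vertices).max())  # Should be close to 0.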


In [0]:
def generate_training_data(num_samples):
  # random_angles.shape: (num_samples, 3)
  random_angles = np.random.uniform(-np.pi, np.pi,
                                    (num_samples, 3)).astype(np.float32)

  # random_quaternion.shape: (num_samples, 4)
  random_quaternion = quaternion.from_euler(random_angles)

  # random_translation.shape: (num_samples, 3)
  random_translation = np.random.uniform(-2.0, 2.0,
                                         (num_samples, 3)).astype(np.float32)

  # data.shape : (num_samples, num_vertices, 3)
  data = quaternion.rotate(vertices[tf.newaxis, :, :],
                           random_quaternion[:, tf.newaxis, :]
                          ) + random_translation[:, tf.newaxis, :]

  # target.shape : (num_samples, 4+3)
  target = tf.concat((random_quaternion, random_translation), axis=-1)

  return np.array(data), np.array(target)

In [0]:
num_samples = 10000

data, target = generate_training_data(num_samples)

print(data.shape)   # (num_samples, num_vertices, 3): the vertices
print(target.shape)  # (num_samples, 4+3): the quaternion and translation

Training

At this point, everything is in place to start training the neural network!


In [0]:
# Callback that reports the progress of the training task.
class ProgressTracker(keras.callbacks.Callback):

  def __init__(self, num_epochs, step=5):
    self.num_epochs = num_epochs
    self.current_epoch = 0.
    self.step = step
    self.last_percentage_report = 0

  def on_epoch_end(self, epoch, logs=None):
    logs = logs or {}
    self.current_epoch += 1.
    training_percentage = int(self.current_epoch * 100.0 / self.num_epochs)
    if training_percentage - self.last_percentage_report >= self.step:
      print('Training ' + str(
          training_percentage) + '% complete. Training loss: ' + str(
              logs.get('loss')) + ' | Validation loss: ' + str(
                  logs.get('val_loss')))
      self.last_percentage_report = training_percentage

In [0]:
reduce_lr_callback = keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.5,
    patience=10,
    verbose=0,
    mode='auto',
    min_delta=0.0001,
    cooldown=0,
    min_lr=0)

In [0]:
# Everything is now in place to train.
EPOCHS = 100
pt = ProgressTracker(EPOCHS)
history = model.fit(
    data,
    target,
    epochs=EPOCHS,
    validation_split=0.2,
    verbose=0,
    batch_size=32,
    callbacks=[reduce_lr_callback, pt])

plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.ylim([0, 1])
plt.legend(['loss', 'val loss'], loc='upper left')
plt.xlabel('Train epoch')
_ = plt.ylabel('Error [mean square distance]')

Testing

The network is now trained and ready to use! The displayed results consist of two images. The first image contains the object in 'rest pose' (pastel lemon color) and the rotated and translated object (pastel honeydew color). This makes it easy to observe how different the two configurations are. The second image also shows the object in rest pose, but this time the transformation predicted by our trained neural network is applied to the rotated and translated version. Hopefully, the two objects are now in a very similar pose.

Note: press play multiple times to sample different test cases. You will notice that the scale of the object is sometimes off. This comes from the fact that a quaternion can also encode scale; a quaternion of unit norm leaves the scale of the result unchanged. We let the interested reader experiment with adding this constraint, either in the network architecture or in the loss function (one possible variant is sketched below).
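
One such experiment is sketched below (a variant that is not part of the original demo, assuming it is acceptable to normalize inside the loss): the predicted quaternion is projected onto the unit sphere before being applied, so its magnitude can no longer encode scale. The same normalization could instead be appended as a final layer of the network; retraining with either option should remove the scale drift.


In [0]:
def pose_estimation_loss_unit_quaternion(y_true, y_pred):
  # Same structure as pose_estimation_loss, but the predicted quaternion is
  # re-normalized to unit norm before rotating the vertices.
  y_true_q, y_true_t = tf.split(y_true, (4, 3), axis=-1)
  y_pred_q, y_pred_t = tf.split(y_pred, (4, 3), axis=-1)
  y_pred_q = tf.linalg.l2_normalize(y_pred_q, axis=-1)
  corners = tf.expand_dims(vertices, axis=1)
  transformed_corners = quaternion.rotate(corners, y_pred_q) + y_pred_t
  recovered_corners = quaternion.rotate(transformed_corners - y_true_t,
                                        quaternion.inverse(y_true_q))
  return tf.reduce_mean(
      tf.reduce_sum((recovered_corners - corners)**2, axis=-1))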

Start with a helper function to apply a quaternion and a translation:


In [0]:
def transform_points(target_points, quaternion_variable, translation_variable):
  return quaternion.rotate(target_points,
                           quaternion_variable) + translation_variable

Define a threejs viewer for the transformed shape:


In [0]:
class Viewer(object):

  def __init__(self, my_vertices):
    my_vertices = np.asarray(my_vertices)
    context = threejs_visualization.build_context()
    light1 = context.THREE.PointLight.new_object(0x808080)
    light1.position.set(10., 10., 10.)
    light2 = context.THREE.AmbientLight.new_object(0x808080)
    lights = (light1, light2)

    material = context.THREE.MeshLambertMaterial.new_object({
        'color': 0xfffacd,
    })

    material_deformed = context.THREE.MeshLambertMaterial.new_object({
        'color': 0xf0fff0,
    })

    camera = threejs_visualization.build_perspective_camera(
        field_of_view=30, position=(10.0, 10.0, 10.0))

    mesh = {'vertices': vertices, 'faces': faces, 'material': material}
    transformed_mesh = {
        'vertices': my_vertices,
        'faces': faces,
        'material': material_deformed
    }
    geometries = threejs_visualization.triangular_mesh_renderer(
        [mesh, transformed_mesh],
        lights=lights,
        camera=camera,
        width=400,
        height=400)

    self.geometries = geometries

  def update(self, transformed_points):
    self.geometries[1].getAttribute('position').copyArray(
        transformed_points.numpy().ravel().tolist())
    self.geometries[1].getAttribute('position').needsUpdate = True

Define a random rotation and translation:


In [0]:
def get_random_transform():
  # Forms a random translation
  with tf.name_scope('translation_variable'):
    random_translation = tf.Variable(
        np.random.uniform(-2.0, 2.0, (3,)), dtype=tf.float32)

  # Forms a random quaternion
  hi = np.pi
  lo = -hi
  random_angles = np.random.uniform(lo, hi, (3,)).astype(np.float32)
  with tf.name_scope('rotation_variable'):
    random_quaternion = tf.Variable(quaternion.from_euler(random_angles))

  return random_quaternion, random_translation

Run the model to predict the transformation parameters, and visualize the result:


In [0]:
random_quaternion, random_translation = get_random_transform()

initial_orientation = transform_points(vertices, random_quaternion,
                                       random_translation).numpy()
viewer = Viewer(initial_orientation)

predicted_transformation = model.predict(initial_orientation[tf.newaxis, :, :])

predicted_inverse_q = quaternion.inverse(predicted_transformation[0, 0:4])
predicted_inverse_t = -predicted_transformation[0, 4:]

predicted_aligned = quaternion.rotate(initial_orientation + predicted_inverse_t,
                                      predicted_inverse_q)

viewer = Viewer(predicted_aligned)

2. Mathematical optimization

Here the problem is tackled using mathematical optimization, which is another traditional way to approach object pose estimation. Given correspondences between the object in 'rest pose' (pastel lemon color) and its rotated and translated counterpart (pastel honeydew color), the problem can be formulated as a minimization problem. The loss function can, for instance, be defined as the sum of squared Euclidean distances between corresponding points, using the current estimate of the rotation and translation of the transformed object. One can then compute the derivative of this loss with respect to the rotation and translation parameters, and follow the gradient direction until convergence. The following cell closely follows that procedure, and uses gradient descent to align the two objects. It is worth noting that although the results are good, there are more efficient ways to solve this specific problem. The interested reader is referred to the Kabsch algorithm for further details (a minimal sketch of it is given below).
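
For reference, here is a minimal NumPy sketch of the closed-form Kabsch approach (an illustrative aside that assumes exact point correspondences and no scaling; it is not used by the demo below):


In [0]:
def kabsch_align(source_points, target_points):
  # Returns a rotation matrix and a translation vector such that
  # rotation @ source_points[i] + translation ~= target_points[i].
  source_center = source_points.mean(axis=0)
  target_center = target_points.mean(axis=0)
  centered_source = source_points - source_center
  centered_target = target_points - target_center
  # Cross-covariance of the centered point sets.
  covariance = centered_source.T @ centered_target
  u, _, vt = np.linalg.svd(covariance)
  # Flip the last axis if needed so the result is a proper rotation (det = +1).
  d = np.sign(np.linalg.det(vt.T @ u.T))
  rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
  translation = target_center - rotation @ source_center
  return rotation, translation

# Hypothetical usage, once `transformed_points` is defined further below:
# rotation, translation = kabsch_align(vertices, np.asarray(transformed_points))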

Note: press play multiple times to sample different test cases.

Define the loss and gradient functions:


In [0]:
def loss(target_points, quaternion_variable, translation_variable):
  transformed_points = transform_points(target_points, quaternion_variable,
                                        translation_variable)
  error = (vertices - transformed_points) / num_vertices
  return vector.dot(error, error)


def gradient_loss(target_points, quaternion_variable, translation_variable):
  with tf.GradientTape() as tape:
    loss_value = loss(target_points, quaternion_variable,
                      translation_variable)
  return tape.gradient(loss_value,
                       [quaternion_variable, translation_variable])

Create the optimizer.


In [0]:
learning_rate = 0.05
with tf.name_scope('optimization'):
  optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate)

Initialize the random transformation, run the optimization and animate the result.


In [0]:
random_quaternion, random_translation = get_random_transform()

transformed_points = transform_points(vertices, random_quaternion,
                                      random_translation)

viewer = Viewer(transformed_points)

nb_iterations = 100
for it in range(nb_iterations):
  gradients_loss = gradient_loss(vertices, random_quaternion,
                                 random_translation)
  optimizer.apply_gradients(
      zip(gradients_loss, (random_quaternion, random_translation)))
  transformed_points = transform_points(vertices, random_quaternion,
                                        random_translation)

  viewer.update(transformed_points)
  time.sleep(0.1)