In [0]:
#@title Licensed under the Apache License, Version 2.0 (the "License"); { display-mode: "form" }
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

Forward and Backward mode gradients in TFF

In [0]:
#@title Upgrade to TensorFlow nightly
!pip install --upgrade tf-nightly

In [0]:
#@title Install TF Quant Finance
!pip install tff-nightly

This notebook demonstrates the difference between forward and backward mode gradient computation.


In [0]:
#@title Imports { display-mode: "form" }

import tensorflow as tf
import functools
import tf_quant_finance as tff

Consider a simple vector function of two variables $x$ and $y$:

$ f = [f_1, f_2, f_3], \quad \text{where} $

$ \begin{align} f_1 &= x^2 \\ f_2 &= y^2 \\ f_3 &= x y \\ \end{align} $


In [0]:
def func(x):
    # f_1 = x^2, f_2 = y^2, f_3 = x * y, stacked into a single vector.
    return tf.stack([x[0]**2, x[1]**2, x[0] * x[1]])

# Evaluation point [x, y] = [1, 2].
start = tf.constant([1, 2], dtype=tf.float64)

Backward mode

For a vector $u = [u_1, u_2, u_3]$, backward gradient computes partial derivatives of the dot product $u \cdot f(x, y)$

$ \begin{align} \frac {\partial (u \cdot f)}{\partial x} &= u_1 \frac{\partial f_1}{\partial x} + u_2 \frac{\partial f_2}{\partial x} + u_3 \frac{\partial f_3}{\partial x} \\ \\ &= 2 u_1 x + u_3 y\\ \\ \frac {\partial (u \cdot f)}{\partial y} &= u_1 \frac{\partial f_1}{\partial y} + u_2 \frac{\partial f_2}{\partial y} + u_3 \frac{\partial f_3}{\partial y} \\ \\ &= 2 u_2 y + u_3 x \end{align} $

In TensorFlow, [$u_1$, $u_2$, $u_3$] is set to [1, 1, 1] by default.

Setting [$x$, $y$] to [1, 2], backward mode returns the gradient of the summed outputs $f_1 + f_2 + f_3$, i.e. the per-component gradients added together:


In [4]:
# Note that the output is [d(u.f(x, y))/dx, d(u.f(x, y))/dy]
tff.math.gradients(func, start)


Out[4]:
<tf.Tensor: shape=(2,), dtype=float64, numpy=array([4., 5.])>

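As a sanity check, the same vector-Jacobian product can be reproduced with core TensorFlow's tf.GradientTape. This is only a minimal sketch of the equivalence and not necessarily how tff.math.gradients is implemented internally.


In [0]:
# Reuses func and start from the cells above.
with tf.GradientTape() as tape:
    tape.watch(start)
    f = func(start)
# With no output_gradients supplied, u defaults to [1, 1, 1].
tape.gradient(f, start)  # Expected: [4., 5.]
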
The user can also supply [$u_1$, $u_2$, $u_3$] explicitly. Setting the values to [0, 0, 1] yields the gradient $[\frac{\partial f_3}{\partial x}, \frac{\partial f_3}{\partial y}]$:


In [0]:
tff.math.gradients(func, start, 
                   output_gradients=tf.constant([0, 0, 1], dtype=tf.float64))


Out[0]:
<tf.Tensor: shape=(2,), dtype=float64, numpy=array([2., 1.])>

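The same weighting is available in core TensorFlow via the output_gradients argument of tf.GradientTape.gradient; again, this is only a cross-check sketch.


In [0]:
with tf.GradientTape() as tape:
    tape.watch(start)
    f = func(start)
u = tf.constant([0.0, 0.0, 1.0], dtype=tf.float64)
tape.gradient(f, start, output_gradients=u)  # Expected: [2., 1.]
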
Forward mode

TFF can also compute forward mode gradients. For a vector $w = [w_1, w_2]$, the forward gradient computes the differentials of $[f_1, f_2, f_3]$:

$ \begin{align} {\partial f_1} &= w_1 \frac{\partial f_1}{\partial x} + w_2 \frac{\partial f_1}{\partial y} \\ \\ &= 2 w_1 x \\ \\ {\partial f_2} &= w_1 \frac{\partial f_2}{\partial x} + w_2 \frac{\partial f_2}{\partial y} \\ \\ &= 2 w_2 y \\ \\ {\partial f_3} &= w_1 \frac{\partial f_3}{\partial x} + w_2 \frac{\partial f_3}{\partial y} \\ \\ &= w_1 y + w_2 x \\ \\ \end{align} $

In TFF, [$w_1$, $w_2$] is set to [1, 1] by default. Setting [$x$, $y$] to [1, 2], forward mode returns the differentials of the components $[f_1, f_2, f_3]$:


In [5]:
tff.math.fwd_gradient(func, start)


Out[5]:
<tf.Tensor: shape=(3,), dtype=float64, numpy=array([2., 4., 3.])>

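For reference, the same Jacobian-vector product can be obtained with core TensorFlow's tf.autodiff.ForwardAccumulator. This is only an illustrative sketch; tff.math.fwd_gradient does not require it.


In [0]:
w = tf.constant([1.0, 1.0], dtype=tf.float64)
with tf.autodiff.ForwardAccumulator(primals=start, tangents=w) as acc:
    f = func(start)
acc.jvp(f)  # Expected: [2., 4., 3.]
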
Recall that TensorFlow is primarily a machine learning tool. In machine learning, the aim is to minimize a scalar loss function, typically the loss summed over all training examples, and its gradient with respect to the model parameters is exactly what backward mode computes in a single pass.

However, let's take the use case where we are valuing a set of options, say ten, against a single spot price $S_0$. We now have ten price functions and we need their gradients against spot $S_0$ (ten deltas).

Using forward mode gradients with respect to $S_0$ gives us all ten deltas in a single pass.

Using backward mode gradients would instead produce the sum of the ten deltas, which may not be that useful; a sketch of this comparison follows below.

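Here is a minimal sketch of that use case with a toy "pricing" function (a hypothetical stand-in for a real option pricer) mapping a single spot to ten prices. One forward pass yields all ten deltas, while backward mode yields only their sum.


In [0]:
strikes = tf.constant([0.5 + 0.1 * i for i in range(10)], dtype=tf.float64)

def toy_prices(spot):
    # Hypothetical placeholder for ten option prices driven by one spot value.
    return (spot - strikes)**2

spot = tf.constant([1.0], dtype=tf.float64)

ten_deltas = tff.math.fwd_gradient(toy_prices, spot)   # Shape (10,): one delta per option.
sum_of_deltas = tff.math.gradients(toy_prices, spot)   # Shape (1,): the deltas summed.
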
It is useful to note that varying the weights also gives you individual components of the gradients (in other words, using [1, 0] and [0, 1] as values of [$w_1$, $w_2$] instead of the default [1, 1]; similarly for backward mode). This, of course, comes at the expense of more compute.


In [6]:
tff.math.fwd_gradient(func, start,
                      input_gradients=tf.constant([1.0, 0.0], dtype=tf.float64))


Out[6]:
<tf.Tensor: shape=(3,), dtype=float64, numpy=array([2., 0., 2.])>

In [7]:
tff.math.fwd_gradient(func, start,
                      input_gradients=tf.constant([0.0, 0.1], dtype=tf.float64))


Out[7]:
<tf.Tensor: shape=(3,), dtype=float64, numpy=array([0. , 0.4, 0.1])>

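For completeness, the full Jacobian of func can be assembled by sweeping forward mode over the basis vectors [1, 0] and [0, 1], at the cost of one pass per input dimension. This is a simple sketch, not a TFF built-in.


In [0]:
basis = tf.eye(2, dtype=tf.float64)
jacobian = tf.stack(
    [tff.math.fwd_gradient(func, start, input_gradients=basis[i])
     for i in range(2)],
    axis=1)
jacobian  # Expected rows: [[2., 0.], [0., 4.], [2., 1.]]
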
In [0]: