In [0]:
#@title Licensed under the Apache License, Version 2.0 (the "License"); { display-mode: "form" }
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
TF Quant Finance has a built-in library for unconstrained numerical optimization. This notebook shows how to use it.
In [0]:
#@title Upgrade to TensorFlow 2.1+
!pip install --upgrade tensorflow
In [0]:
#@title Install TF Quant Finance
!pip install tf-quant-finance
In [0]:
#@title Imports
import tensorflow as tf
import tensorflow_probability as tfp
import numpy as np
import functools
import matplotlib.colors as colors
import tf_quant_finance.math.optimizer as tff_optimizer
from matplotlib import pyplot as plt
from IPython.core.pylabtools import figsize
tf.compat.v1.enable_eager_execution()
figsize(12, 8)
In [0]:
# Decorator for automatic gradient evaluation.
def make_val_and_grad_fn(value_fn):
  @functools.wraps(value_fn)
  def val_and_grad(x):
    return tfp.math.value_and_gradient(value_fn, x)
  return val_and_grad
Below is an example of optimizing a function with TF Quant Finance.
In [3]:
# Define a function.
@make_val_and_grad_fn
def func(x):
  return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
# Start point for optimization.
start = tf.constant([0.0, 0.0], dtype=tf.float64)
# Optimize
result = tff_optimizer.conjugate_gradient_minimize(func, start)
print("Argmin: %s" % result.position.numpy())
print("Objective value at argmin: %s" % result.objective_value.numpy())
In this example we used the Conjugate Gradient algorithm to minimize a function $f$ of two variables, defined as
$$f(x, y) = (1-x)^2 + 100(y-x^2)^2.$$
This is the Rosenbrock function; it has a single local minimum $f(1, 1) = 0$, which is also the global minimum.
We had to supply two things to the algorithm:
A callable which, given a point $\mathbf{x}$, calculates the objective value $f(\mathbf{x})$ and the gradient $\nabla f(\mathbf{x})$ at that point.
An initial position. The Conjugate Gradient method is iterative, i.e. it builds a sequence of points approaching the minimum, so it needs to know where to start; in our case that is the origin $(0, 0)$. The algorithm also uses the initial position to infer the dimensionality of the function's domain and the dtype used for all calculations (see the sketch below).
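As a quick sketch of that last point (the starting coordinates [2.0, 3.0] below are an arbitrary choice for illustration): restarting the minimization of the same func from a different point only requires changing the initial position, and both the dimensionality and dtype of the computation are read off that tensor.
In [0]:
# Sketch: same objective, different starting point. The dimensionality (2)
# and the dtype (float64) are inferred from `other_start`.
other_start = tf.constant([2.0, 3.0], dtype=tf.float64)
result = tff_optimizer.conjugate_gradient_minimize(
    func, other_start, max_iterations=200)
print('Argmin: %s' % result.position.numpy())  # Should again be close to [1, 1].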
In the example above we didn't explicitly specify the gradient of the objective function. That's because TensorFlow supports automatic differentiation: if you define a differentiable function using only TensorFlow ops that support gradients, its derivative can be computed at any point automatically.
We used the decorator make_val_and_grad_fn to turn a function which evaluates $f(\mathbf{x})$ into a function which evaluates the pair $f(\mathbf{x}), \nabla f(\mathbf{x})$. This decorator was defined at the top of this notebook.
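As a minimal illustration of what the decorator produces (the evaluation point (2, 3) is arbitrary), we can call tfp.math.value_and_gradient directly, which is exactly what make_val_and_grad_fn does under the hood:
In [0]:
# Evaluate the Rosenbrock function and its gradient at the point (2, 3).
point = tf.constant([2.0, 3.0], dtype=tf.float64)
value, gradient = tfp.math.value_and_gradient(
    lambda p: (1 - p[0])**2 + 100 * (p[1] - p[0]**2)**2, point)
print('f(2, 3) = %s' % value.numpy())          # 101.0
print('grad f(2, 3) = %s' % gradient.numpy())  # [802., -200.]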
In principle, we could specify the gradient explicitly:
In [4]:
def func_and_grad(position):
  x, y = position[0], position[1]
  func = (1 - x)**2 + 100 * (y - x**2)**2
  grad = tf.stack([
      2 * (x - 1) - 400 * (y - x**2) * x,
      200 * (y - x**2)
  ])
  return func, grad
result = tff_optimizer.conjugate_gradient_minimize(func_and_grad, start)
print(result.position.numpy())
However, we recommend always using automatic differentiation.
The Conjugate Gradient algorithm stops when any of the following conditions is satisfied:
The maximum number of iterations (max_iterations) is reached.
The norm of the gradient falls below tolerance. Note that the norm is the inf-norm, i.e. the maximal absolute value over all coordinates of the gradient.
The relative change of the objective value between iterations falls below f_relative_tolerance.
The change of position between iterations falls below x_tolerance.
max_iterations, tolerance, f_relative_tolerance and x_tolerance are parameters of the optimization algorithm. By default x_tolerance and f_relative_tolerance are set to 0, so only the first two criteria are active.
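A minimal sketch of enabling the other two criteria on the same func as above (assuming the keyword arguments are accepted under exactly the names listed here; the chosen tolerance values are arbitrary) could look like this:
In [0]:
# Sketch: stop when the gradient norm, the position change, or the relative
# change of the objective value becomes small, whichever happens first.
result = tff_optimizer.conjugate_gradient_minimize(
    func,
    tf.constant([0.0, 0.0], dtype=tf.float64),
    tolerance=1e-8,
    x_tolerance=1e-8,
    f_relative_tolerance=1e-10,
    max_iterations=200)
print('Converged: %s' % result.converged.numpy())
print('Iterations: %d' % result.num_iterations.numpy())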
The gradient tolerance is an important parameter. If it is too small, the algorithm may perform many iterations in the neighborhood of the local minimum but never stop; if it is too big, the algorithm may terminate too far from the minimum. The default value is $10^{-8}$.
The example below shows how the number of iterations and the number of objective evaluations needed for the optimization to converge depend on the gradient tolerance.
In [7]:
@make_val_and_grad_fn
def rosenbrock(x):
  return 100 * tf.reduce_sum(tf.square(x[1:] - tf.square(x[:-1]))) + \
      tf.reduce_sum(tf.square(1 - x[:-1]))
start = tf.zeros(4, dtype=tf.float64)
tolerance_range = []
num_iterations = []
num_func_calls = []
for x in range(15):
  tolerance = 10.0**(-x)
  result = tff_optimizer.conjugate_gradient_minimize(
      rosenbrock,
      start,
      max_iterations=1000,
      tolerance=tolerance)
  tolerance_range.append(tolerance)
  num_iterations.append(result.num_iterations.numpy())
  num_func_calls.append(result.num_objective_evaluations.numpy())
plt.plot(tolerance_range, num_iterations, label='Iterations')
plt.plot(tolerance_range, num_func_calls, label='Objective evaluations')
plt.semilogx()
plt.xlim(1, 1e-14)
plt.xlabel('Tolerance')
plt.legend()
plt.show()
The Conjugate Gradient algorithm has several adjustable parameters. These parameters can be passed using the params
argument. For details on the parameters refer
to the source code and this article.
The example below shows how to adjust custom algorithm parameters. We will vary the sufficient decrease parameter $\delta$ and see how that affects the number of iterations and the number of objective evaluations needed for convergence.
In [8]:
@make_val_and_grad_fn
def func(x):
  return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
start = tf.zeros(2, dtype=tf.float64)
delta_range = np.linspace(0.05, 0.5, 9)
num_iterations = []
num_func_calls = []
for delta in delta_range:
  result = tff_optimizer.conjugate_gradient_minimize(
      func,
      start,
      max_iterations=1000,
      params=tff_optimizer.ConjugateGradientParams(
          sufficient_decrease_param=delta))
  num_iterations.append(result.num_iterations.numpy())
  num_func_calls.append(result.num_objective_evaluations.numpy())
plt.plot(delta_range, num_iterations, label='Iterations')
plt.plot(delta_range, num_func_calls, label='Objective evaluations')
plt.xlabel(r'$\delta$')
plt.legend()
plt.show()
Batching is a powerful feature of TensorFlow: the ability to perform the same operation on different components of tensors in parallel. The optimization algorithms in TF Quant Finance are designed to support batching (although batches can only be 1-dimensional).
There are two use cases where batching is useful:
You have one function, but want to optimize it starting from several different initial positions.
You have a parametrized family of functions, and you want to optimize all of them in parallel.
Suppose you have a function with several local minima and you want to find all of them. One way to do so is to perform the optimization several times, starting from different initial positions. With batching you can invoke the optimization only once, with a batch of different initial positions. For this you have to define the objective function in such a way that it supports batching.
For example, let's consider Himmelblau's function, which has 4 local minima. Let's start the optimization from 30 different random points and plot the optimization path from each of them.
For demonstration purposes, to get the intermediate positions on each iteration, we invoke the algorithm repeatedly with different values of max_iterations. In practice you only need to invoke it once, with max_iterations large enough for all paths to converge.
In [9]:
def himmelblau_function(x, y):
  return (x * x + y - 11) ** 2 + (x + y * y - 7) ** 2
@make_val_and_grad_fn
def objective(coord):
  return himmelblau_function(coord[..., 0], coord[..., 1])
np.random.seed(0)
starts = tf.constant(np.random.uniform(-5, 5, (30, 2)), dtype='float64')
paths = []
for iter_count in range(100):
  results = tff_optimizer.conjugate_gradient_minimize(
      objective,
      initial_position=starts,
      tolerance=1e-8,
      max_iterations=iter_count)
  paths.append(results.position.numpy())
  if np.all(results.converged.numpy()):
    print('All converged after %d iterations' % iter_count)
    break
# Show the function.
x, y = np.meshgrid(np.arange(-5, 5, 0.05), np.arange(-5, 5, 0.05))
z = himmelblau_function(x, y)
plt.pcolormesh(x, y, z, cmap='RdYlGn',
               norm=colors.SymLogNorm(2),
               vmin=np.min(z), vmax=np.max(z))
plt.colorbar()
# Show the paths.
paths = np.array(paths)
paths_x = paths[:,:,0].T
paths_y = paths[:,:,1].T
for i in range(len(starts)):
  plt.plot(paths_x[i], paths_y[i], marker='.')
plt.xlim(-5, 5)
plt.ylim(-5, 5)
plt.show()
We can see that every path converges to one of the four minima.
Sometimes you might want to find just one local minimum while starting from multiple points. In this case it's not necessary to wait until all paths converge.
The Conjugate Gradient algorithm supports a stopping_condition parameter which specifies when to stop the optimization. It is a function which takes two boolean tensors, one indicating which optimization paths have converged and the other indicating which have failed. By default the optimization terminates when all paths have either converged or failed.
However, if you pass tff_optimizer.converged_any, the algorithm terminates as soon as at least one path has converged (or all have failed).
In [10]:
results = tff_optimizer.conjugate_gradient_minimize(
    objective,
    initial_position=starts,
    tolerance=1e-8,
    stopping_condition=tff_optimizer.converged_any)
assert np.any(results.converged.numpy())
print('Converged after %d iterations.' % results.num_iterations)
found_minima = tf.boolean_mask(results.position, results.converged).numpy()
print('Found minima: %s' % found_minima)
You can see that it took 22 iterations to converge from all points, but only 7 iterations to converge from at least one point, and we found 2 of the 4 minima.
In the same way you can specify stopping_condition for the BFGS and L-BFGS algorithms, as the sketch below shows.
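Here is a sketch of the same early-stopping behaviour with L-BFGS, reusing the batched Himmelblau objective and random starting points from above (this assumes lbfgs_minimize accepts the stopping_condition argument in the same way, as stated above):
In [0]:
# Sketch: stop L-BFGS as soon as any of the batched optimization paths converges.
lbfgs_results = tff_optimizer.lbfgs_minimize(
    objective,
    initial_position=starts,
    tolerance=1e-8,
    stopping_condition=tff_optimizer.converged_any)
print('L-BFGS stopped after %d iterations.' %
      lbfgs_results.num_iterations.numpy())
The second batching use case is a parametrized family of functions. In the cell below we minimize 500 quadratic functions of 100 variables, each with a different randomly chosen minimum, in a single batched call.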
In [0]:
np.random.seed(12345)
dim = 100 # Dimensionality of domain.
n = 500 # Number of functions.
a = np.random.randn(n, dim)
@make_val_and_grad_fn
def quadratic(x):
  return tf.reduce_sum((x - a) ** 2, axis=-1)
results = tff_optimizer.conjugate_gradient_minimize(
    quadratic,
    initial_position=tf.zeros((n, dim), dtype=tf.float64),
    max_iterations=200)
assert np.allclose(results.position.numpy(), a, atol=1e-8)
The code above demonstrates how to use the Conjugate Gradient algorithm. TF Quant Finance also provides implementations of other optimization algorithms:
Conjugate Gradient method. This is a first-order method.
BFGS. This is a quasi-Newton first-order method. It needs $O(N^2)$ memory ($N$ being the domain dimensionality) to approximate the Hessian matrix, so it may not work for large $N$.
L-BFGS. This is a version of BFGS optimized to use less memory.
Nelder-Mead. This is a gradient-free method, i.e. it doesn't need the gradient. It can be used to optimize a non-differentiable function (or a function whose gradient is hard to compute), but it typically performs many more evaluations of the objective function than first-order methods.
The example below optimizes the same function with all available algorithms.
In [12]:
def rosenbrock(x):
  return 100 * tf.reduce_sum(tf.square(x[1:] - tf.square(x[:-1]))) + \
      tf.reduce_sum(tf.square(1 - x[:-1]))
rosenbrock_with_gradient = make_val_and_grad_fn(rosenbrock)
start = tf.zeros(7, dtype=tf.float64)
# Conjugate Gradient.
result = tff_optimizer.conjugate_gradient_minimize(
    rosenbrock_with_gradient,
    start,
    tolerance=1e-4,
    max_iterations=1000)
assert np.allclose(result.position.numpy(), np.ones(7), 1e-4)
print("Objective evaluations with Conjugate Gradient: %d" %
      result.num_objective_evaluations.numpy())
# BFGS.
result = tff_optimizer.bfgs_minimize(
    rosenbrock_with_gradient,
    start,
    tolerance=1e-4,
    max_iterations=1000)
assert np.allclose(result.position.numpy(), np.ones(7), 1e-4)
print("Objective evaluations with BFGS: %d" %
      result.num_objective_evaluations.numpy())
# L-BFGS.
result = tff_optimizer.lbfgs_minimize(
    rosenbrock_with_gradient,
    start,
    tolerance=1e-4,
    max_iterations=1000)
assert np.allclose(result.position.numpy(), np.ones(7), 1e-4)
print("Objective evaluations with L-BFGS: %d" %
      result.num_objective_evaluations.numpy())
# Nelder-Mead.
result = tff_optimizer.nelder_mead_minimize(
    rosenbrock,
    initial_vertex=start,
    position_tolerance=1e-4,
    max_iterations=1000)
assert np.allclose(result.position.numpy(), np.ones(7), 1e-4)
print("Objective evaluations with Nelder-Mead: %d" %
      result.num_objective_evaluations.numpy())
For more examples of the BFGS, L-BFGS and Nelder-Mead methods, refer to this example from TensorFlow Probability.