In [ ]:
%%HTML
<style>
.container { width:100% }
</style>
In this notebook we show how to find the minimum of the function
$$ x \mapsto \exp(x) - 2 \cdot x^2 + 1 $$
using the TensorFlow library.
We plot this function using numpy, matplotlib, and seaborn.
In [ ]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
First, we define the function $x \mapsto \exp(x) - 2 \cdot x^2 + 1$ as a Python function that can take a numpy array as its argument.
In [ ]:
def fm(x):
    return np.exp(x) - 2 * x**2 + 1
Next, we plot this function for all $x$ such that $-1 \leq x \leq 3$.
In [ ]:
Xs = np.arange(-1.0, 3, 0.01)
Ys = fm(Xs)
plt.figure(figsize=(12,12))
sns.set(style='whitegrid')
sns.lineplot(x=Xs, y=Ys, color='b')
plt.axvline(x=0.0, c='k')
plt.axhline(y=0.0, c='k')
plt.ylim(-0.5, 3.0)
plt.xlim(-1.0, 3.0)
plt.xlabel('x')
plt.ylabel('y')
plt.title('x |-> exp(x) - 2 * x**2 + 1')
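The grid that was just plotted can also be searched directly for its smallest value on $x \geq 0$, which gives a rough numerical estimate before any optimizer is involved. This is only a sketch: the names mask, i_best, and x_best are introduced here for illustration, and the result is only as accurate as the grid spacing of 0.01.

```python
import numpy as np

def fm(x):
    return np.exp(x) - 2 * x**2 + 1

Xs = np.arange(-1.0, 3, 0.01)    # same grid as in the plot
Ys = fm(Xs)

mask   = Xs >= 0                 # restrict the search to x >= 0
i_best = np.argmin(Ys[mask])     # index of the smallest sampled value
x_best = Xs[mask][i_best]        # grid point where that value is attained
print('grid minimum near x = %.2f, f(x) = %.4f' % (x_best, Ys[mask][i_best]))
```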
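For $x \geq 0$, the plotted function seems to have a minimum somewhere between $2.0$ and $2.5$. We want to compute this minimum numerically using gradient descent via TensorFlow, so that we do not have to compute the gradient ourselves. The code in this notebook uses the TensorFlow 1.x API (tf.Session, tf.train); under TensorFlow 2.x these names are only available in the module tf.compat.v1, with eager execution disabled. To install tensorflow, the following command can be used:
conda install -c conda-forge tensorflow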
In [ ]:
import tensorflow as tf
We start by defining a variable $x$. Later, we will define the function $$ f(x) := \exp(x) - 2 \cdot x^2 + 1 $$ and compute the value $x_0$ such that $f(x_0) \leq f(x)$ for all $x \geq 0$. The variable $x$ is a single precision variable, hence we use tf.float32 as its data type. The variable is initialized to the value $1$. We also assign a name to it. This name is optional: it only shows up when the variable is printed, so it is mainly useful for debugging.
In [ ]:
x = tf.Variable(1, dtype=tf.float32, name='var_x')
x
Since this variable contains only a single number and not an array or a matrix, its shape is (). The string 'var_x:0' is an internal name used by TensorFlow to manage this variable.
Note that TensorFlow has appended the string ':0' to the name var_x. This suffix is an output index: it states that the value is the first (and here only) output of the operation named var_x.
Next, let us define a cost function $f$ using tensorflow. Mathematically, this cost function is the function $f$ from above:
$$ f(x) = \exp(x) - 2 \cdot x^2 + 1 $$
Note that the definition below uses the variable x defined above on its right-hand side.
In [ ]:
f = tf.exp(x) - 2 * x**2 + 1
f
Conceptually, f as defined above is a term made up of constants and variables. Technically, f is an object of the class Tensor. A tensor can be thought of as an abstract syntax tree representing an expression. Since the last operation that is executed when computing $f(x)$ is the addition of $1$, this tensor has the internal name add:0.
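To make the idea of a term as a syntax tree concrete, here is a toy sketch in plain Python. It is not how TensorFlow is implemented; the class Node and the helpers wrap, exp, and evaluate are hypothetical names introduced only for this illustration. The sketch builds the tree for $\exp(x) - 2 \cdot x^2 + 1$ by operator overloading and then walks the tree to compute the value of the term together with its derivative.

```python
import math

class Node:
    """Toy expression node: an operator name plus its arguments (hypothetical)."""
    def __init__(self, op, *args):
        self.op, self.args = op, args
    def __add__(self, other):  return Node('add', self, wrap(other))
    def __radd__(self, other): return Node('add', wrap(other), self)
    def __sub__(self, other):  return Node('sub', self, wrap(other))
    def __mul__(self, other):  return Node('mul', self, wrap(other))
    def __rmul__(self, other): return Node('mul', wrap(other), self)

def wrap(v):
    # turn plain numbers into constant nodes
    return v if isinstance(v, Node) else Node('const', v)

def exp(n):
    return Node('exp', n)

def evaluate(n, x):
    """Return (value, derivative with respect to x) of the term n at the point x."""
    if n.op == 'var':
        return x, 1.0
    if n.op == 'const':
        return n.args[0], 0.0
    if n.op == 'exp':
        v, d = evaluate(n.args[0], x)
        return math.exp(v), math.exp(v) * d          # chain rule
    (a, da), (b, db) = evaluate(n.args[0], x), evaluate(n.args[1], x)
    if n.op == 'add':
        return a + b, da + db
    if n.op == 'sub':
        return a - b, da - db
    if n.op == 'mul':
        return a * b, da * b + a * db                # product rule
    raise ValueError('unknown operator: ' + n.op)

x = Node('var')
f = exp(x) - 2 * x * x + 1      # x * x stands in for x**2 in this toy version
value, slope = evaluate(f, 1.0)
print(f.op, value, slope)
```

The outermost node of this toy tree is the addition of 1, which mirrors the name add:0 discussed above.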
Having defined the function $f$, we can now try to minimize it via gradient descent. The module tf.train contains various algorithms for minimization. tf.train.GradientDescentOptimizer is the optimizer that implements gradient descent; the operation returned by its method minimize performs one step of it. We will use a learning rate $\alpha$ of $0.2$. A smaller learning rate would slow down gradient descent, while a significantly larger learning rate would cause gradient descent to oscillate, so that it would not converge and would therefore not find the minimum.
In [ ]:
α = 0.2
train = tf.train.GradientDescentOptimizer(α).minimize(f)
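For comparison, one step of gradient descent performs the update $x_{k+1} = x_k - \alpha \cdot f'(x_k)$. The sketch below carries out this update in plain Python with the derivative $f'(x) = \exp(x) - 4 \cdot x$ worked out by hand, which is the part that the notebook delegates to TensorFlow. The helper names df and descend and the second learning rate of 0.5 are assumptions chosen for illustration.

```python
import math

def df(x):
    # derivative of exp(x) - 2*x**2 + 1, computed by hand
    return math.exp(x) - 4 * x

def descend(alpha, steps, x=1.0):
    """Run `steps` gradient descent updates starting from x and return all iterates."""
    xs = [x]
    for _ in range(steps):
        x = x - alpha * df(x)    # one gradient descent step
        xs.append(x)
    return xs

small = descend(0.2, 12)   # learning rate used in this notebook
large = descend(0.5, 12)   # illustrative larger learning rate
for k, (a, b) in enumerate(zip(small, large)):
    print('%2d: %f  %f' % (k, a, b))
```

With the learning rate 0.2 the iterates should settle close to the minimum, whereas with 0.5 they should keep jumping from one side of the minimum to the other without settling.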
TensorFlow issues a lot of deprecation warnings. As they are quite annoying and we can't do anything about them, let us suppress further warnings. This is done by setting the environment variable TF_CPP_MIN_LOG_LEVEL to the value '2', which hides the informational and warning messages of TensorFlow's C++ backend. To this end we have to use the module os.
In [ ]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
Up to now, train is just an object; no step of gradient descent has been performed yet.
In [ ]:
train
In order to start running the gradient descent optimizer, we first have to create an initializer object that can later be used to initialize all of our variables. Of course, in this simple example there is just one variable x, but in general, there could be many different variables.
In [ ]:
init = tf.global_variables_initializer()
init
Next, we start a TensorFlow session that performs the real work. Session is a TensorFlow class that has a method called run. This method can be used to evaluate a variable or to perform one step of an iterative algorithm like gradient descent.
In [ ]:
with tf.Session() as session:
    session.run(init)            # initialize x to 1
    for k in range(12):          # we do 12 steps of gradient descent
        session.run(train)       # run one step of gradient descent
        v = session.run(x)       # evaluate x so we can print it
        print('%2d: %f' % (k, v))
This computation shows that, for $x \geq 0$, the function $f$ takes its minimal value at $x \approx 2.153292$.
Note that although we have used gradient descent, we never had to calculate the derivative of the function $f$. This derivative has been calculated by TensorFlow instead.
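The reported value can be cross-checked without TensorFlow. The sketch below approximates the derivative of $f$ at $x = 2.153292$ by a central finite difference; at a minimum this slope should be close to zero, and neighboring points should not give smaller function values. The step size h and the offsets of 0.01 are arbitrary choices made for this illustration.

```python
import numpy as np

def fm(x):
    return np.exp(x) - 2 * x**2 + 1

x0 = 2.153292                                  # value found by gradient descent
h  = 1e-5                                      # finite difference step (arbitrary choice)
slope = (fm(x0 + h) - fm(x0 - h)) / (2 * h)    # central difference approximation of f'(x0)
lower_nearby = min(fm(x0 - 0.01), fm(x0 + 0.01))
print('f(x0) = %f' % fm(x0))
print('slope = %e' % slope)
print('no smaller value at x0 - 0.01 or x0 + 0.01:', fm(x0) <= lower_nearby)
```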