TensorFlow introduction: the art of the sesh

This introduction seeks to broach a few basic topics in TensorFlow: what it is, how operations and data are defined for its computational graphs, and how its operations can be visualized. Along the way, some basic examples are shown, involving linear regression and basic optimizer usage.

What the shit is TensorFlow?

TensorFlow is an open source software library for numerical computation using data flow graphs. In a data flow graph, nodes represent mathematical operations and edges represent the multidimensional data arrays (tensors) communicated between them.

TensorFlow is usually used from Python, though behind the scenes it runs highly optimized compiled code that parallelizes calculations heavily and is well-suited to GPU hardware. The Python convention for importing TensorFlow is as follows:


In [1]:
import tensorflow as tf

In [2]:
print('TensorFlow version:', tf.__version__)


TensorFlow version: 1.15.0

It can be helpful to hide some of TensorFlow's logging messages. This is controlled by an environment variable (ideally set before importing TensorFlow):


In [3]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # 0: log everything, 1: hide INFO, 2: also hide WARNING, 3: also hide ERROR

tensors, ranks, shapes and types

The central unit of data in TensorFlow is the tensor: an array of arbitrary dimensionality. A tensor of rank 0 is a scalar, a tensor of rank 1 is a vector, a tensor of rank 2 is a matrix, a tensor of rank 3 is a 3-tensor, and so on.

rank   mathematical object   shape       example
0      scalar                []          3
1      vector                [3]         [1., 2., 3.]
2      matrix                [2, 3]      [[1., 2., 3.], [4., 5., 6.]]
3      3-tensor              [2, 1, 3]   [[[1., 2., 3.]], [[7., 8., 9.]]]
n      n-tensor              ...         ...
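
The rank, shape and data type of a tensor can be inspected directly from its static attributes. A small sketch (no session is needed for this, and the name m is just illustrative):

m = tf.constant([[1., 2., 3.], [4., 5., 6.]])
print(m.shape)       # (2, 3)
print(len(m.shape))  # 2, i.e. a rank-2 tensor (a matrix)
print(m.dtype)       # <dtype: 'float32'>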

The various number types that TensorFlow can handle are as follows:

data type   Python type   description
DT_FLOAT    tf.float32    32-bit floating point
DT_DOUBLE   tf.float64    64-bit floating point
DT_INT8     tf.int8       8-bit signed integer
DT_INT16    tf.int16      16-bit signed integer
DT_INT32    tf.int32      32-bit signed integer
DT_INT64    tf.int64      64-bit signed integer
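
A data type can be requested explicitly when a tensor is created; a small sketch:

i = tf.constant([1, 2, 3], dtype=tf.int64)
f = tf.constant([1., 2., 3.], dtype=tf.float64)
print(i.dtype, f.dtype)  # <dtype: 'int64'> <dtype: 'float64'>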

TensorFlow mechanics: computational graphs and nodes

TensorFlow programs are defined as computational graphs. For TensorFlow, a computational graph is a series of TensorFlow operations arranged in a graph of nodes. A node takes zero or more tensors as inputs and produces a tensor as an output. Generally, a TensorFlow program consists of sections like these:

  1. Build a graph using TensorFlow operations.
  2. Feed data to TensorFlow and run the graph.
  3. Update variables in the graph and return values.
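
As a deliberately tiny sketch of these stages (sessions and data feeding are explained in the following sections):

graph_node = tf.constant(1.0) + tf.constant(2.0)  # 1. build a graph of operations
with tf.Session() as session:                     # 2. run the graph in a session
    print(session.run(graph_node))                # 3. get values back (prints 3.0)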

A simple TensorFlow node is a constant. It takes no inputs and simply outputs a value that it stores internally. Here are some constants:


In [4]:
node_1 = tf.constant(3.0, tf.float32)
node_2 = tf.constant(4.0) # (also tf.float32 by default)
print("node_1: {node}".format(node=node_1))
print("node_2: {node}".format(node=node_2))


node_1: Tensor("Const:0", shape=(), dtype=float32)
node_2: Tensor("Const_1:0", shape=(), dtype=float32)

Printing the nodes does not evaluate them and output the values they would produce; it simply describes the tensors they would evaluate to (name, shape and data type). To evaluate nodes, the computational graph must be run inside a TensorFlow "session", an encapsulation of the control and state of the TensorFlow runtime.

session

A Session is a class for running TensorFlow operations. A session object encapsulates the environment in which operations are executed and tensors are evaluated. For example, sesh.run(c) evaluates the tensor c.

A session is run using its run method:

tf.Session.run(
    fetches,
    feed_dict    = None,
    options      = None,
    run_metadata = None
)

This method runs operations and evaluates tensors in fetches. It performs one step of TensorFlow computation, by running the necessary graph fragment to execute every operation and evaluate every tensor in fetches, substituting the values in feed_dict for the corresponding input values. The fetches argument can be a single graph element, or an arbitrarily nested list, tuple, namedtuple, dict or OrderedDict containing graph elements at its leaves. The value returned by run has the same shape as the fetches argument, with the leaves replaced by the corresponding values returned by TensorFlow.

So, those constant nodes could be evaluated like this:


In [5]:
sesh = tf.Session()
sesh.run([node_1, node_2])


Out[5]:
[3.0, 4.0]
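
As noted above, fetches need not be a list; passing, say, a dict returns a dict with the same keys. A small sketch reusing the session above:

print(sesh.run({'first': node_1, 'second': node_2}))  # {'first': 3.0, 'second': 4.0}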

Nodes more complicated than constants perform operations. For example, two constant nodes can be added:


In [6]:
node_3 = tf.add(node_1, node_2)
node_3


Out[6]:
<tf.Tensor 'Add:0' shape=() dtype=float32>

In [7]:
sesh.run(node_3)


Out[7]:
7.0

This is, of course, a trivial mathematical operation, but it has been performed using very computationally efficient infrastructure. Far more complicated operations can be encoded in a computational graph and run using TensorFlow.
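
For instance, further operations can be chained onto the existing nodes; a small sketch using the nodes defined above:

node_4 = tf.multiply(node_1, node_2)  # 3.0 * 4.0
node_5 = tf.add(node_3, node_4)       # (3.0 + 4.0) + (3.0 * 4.0)
print(sesh.run(node_5))               # 19.0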

placeholders

A computational graph can be parameterized to accept external inputs. These entry points for data are called placeholders.

So, let's create some placeholders that can hold 32-bit floating point numbers, and let's also make a node for the addition operation applied to these placeholders:


In [8]:
a = tf.placeholder(tf.float32)
b = tf.placeholder(tf.float32)
adder_node = a+b  # + provides a shortcut for tf.add(a, b)
adder_node


Out[8]:
<tf.Tensor 'add_1:0' shape=<unknown> dtype=float32>

The feed_dict parameter of the session's run method is used to feed data into these placeholders:


In [9]:
print(sesh.run(adder_node, {a: 3, b: 4}))


7.0

The same can be done with multiple values:


In [10]:
print(sesh.run(adder_node, {a: [3, 4], b: [5, 6]}))


[ 8. 10.]

You can start to see now how parallelism is core to TensorFlow.

Further nodes can be added to the computational graph easily:


In [11]:
add_and_triple = adder_node * 3.

sesh.run(add_and_triple, {a: 3, b: 4})


Out[11]:
21.0

variables

Variables are nodes whose values can change. They hold the adjustable parameters of a model, which is what makes a model trainable. A variable is defined with a type and an initial value.

We can make a linear model featuring changeable variables like this:


In [12]:
W = tf.Variable([ .3], dtype=tf.float32)
b = tf.Variable([-.3], dtype=tf.float32)
x = tf.placeholder(tf.float32)
linear_model = W*x+b

Constants are initialized as soon as tf.constant is called and their values never change, but variables are not initialized until a special operation is run in the TensorFlow program:


In [13]:
sesh = tf.Session()
init = tf.global_variables_initializer()
sesh.run(init)

Since x is a placeholder, this linear model can be evaluated for several x values in parallel:


In [14]:
print(sesh.run(linear_model, {x: [1, 2, 3, 4]}))


[0.         0.3        0.6        0.90000004]

A large number of values can be stored in a variable easily, like this:


In [15]:
weights = tf.Variable(
    tf.random_normal(
        [784, 200],
        stddev=0.35
    ),
    name = "weights"
)
with tf.Session() as sesh:
    sesh.run(tf.global_variables_initializer())
    print(sesh.run(weights))


[[-0.0695153   0.24844736 -0.29384595 ... -0.4262702   0.5362359
   0.35842693]
 [ 0.5328874  -0.00643804 -0.14546531 ... -0.13259502 -0.26890212
   0.3340728 ]
 [ 0.13115722  0.42233288 -0.4024447  ...  0.4704668   0.08130927
  -0.2908541 ]
 ...
 [-0.11956223  0.25437382  0.11524835 ... -0.32708403 -0.3637847
  -0.22663982]
 [ 0.14936803  0.22951277  0.11283576 ... -0.35600176 -0.22614416
  -0.14243811]
 [-0.06893592 -0.06819851  0.14464253 ... -0.3501179   0.32624316
  -0.05766584]]

The value of a variable can be changed using operations like tf.assign:


In [16]:
a = tf.Variable(10, dtype=tf.float32)
with tf.Session() as sesh:
    sesh.run(tf.global_variables_initializer())
    print("initial variable value: {value}".format(value = sesh.run(a)))
    sesh.run(tf.assign(a, 20))
    print("reassigned variable value: {value}".format(value = sesh.run(a)))


initial variable value: 10.0
reassigned variable value: 20.0

loss function

A loss function measures how far a model is from the provided data. For a linear regression model, a standard loss function is the sum of the squares of the deltas between the current model's predictions and the provided data, i.e. loss = sum over i of (W*x_i + b - y_i)^2.

So, here we create a linear model:


In [17]:
W = tf.Variable([ .3], dtype=tf.float32)
b = tf.Variable([-.3], dtype=tf.float32)
x = tf.placeholder(tf.float32)
linear_model = W*x+b

We create a placeholder for our target values:


In [18]:
y = tf.placeholder(tf.float32)

We create the loss function for the model:


In [19]:
loss = tf.reduce_sum(tf.square(linear_model-y))

We launch a TensorFlow session, initialize the graph variables (W, b), specify the input values (x) and the target values (y), and run:


In [20]:
with tf.Session() as sesh:
    sesh.run(tf.global_variables_initializer())
    results = sesh.run(
        loss,
        {
            x: [1,  2,  3,  4],
            y: [0, -1, -2, -3]
        }
    )
print(results)


23.66
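
As a sanity check, the same value can be computed by hand in plain Python, without TensorFlow; a quick sketch:

predictions = [0.3 * xi - 0.3 for xi in [1, 2, 3, 4]]            # [0.0, 0.3, 0.6, 0.9]
deltas = [p - yi for p, yi in zip(predictions, [0, -1, -2, -3])]  # [0.0, 1.3, 2.6, 3.9]
print(sum(d ** 2 for d in deltas))                                # 23.66, up to floating-point rounding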

In machine learning, a simple linear model like this would be modified automatically by changing the variables W and b to try to find good model parameters. In this example, ideal values would be W = -1 and b = 1, which would result in the loss function being 0.

training optimizers

Optimizers change the variables in a model in order to minimize its loss function. Optimizers are an active area of research and there are many types. A simple optimizer is gradient descent: it modifies each variable according to the magnitude of the derivative of the loss with respect to that variable.
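
As a rough sketch of what a single gradient descent update looks like for this model, in plain Python rather than TensorFlow (the names W_value and b_value are illustrative, kept separate from the graph variables W and b; the learning rate matches the optimizer used below):

# one hand-rolled gradient descent step for loss = sum((W*x + b - y)**2)
W_value, b_value, learning_rate = 0.3, -0.3, 0.01
x_data, y_data = [1, 2, 3, 4], [0, -1, -2, -3]

# partial derivatives of the loss with respect to W and b
dW = sum(2 * (W_value * xi + b_value - yi) * xi for xi, yi in zip(x_data, y_data))
db = sum(2 * (W_value * xi + b_value - yi) for xi, yi in zip(x_data, y_data))

# move each parameter a small step against its gradient
W_value -= learning_rate * dW
b_value -= learning_rate * db
print(W_value, b_value)  # roughly -0.22 and -0.456; repeated steps drive these towards -1 and 1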

Let's watch gradient descent optimize our linear regression model. We can do this by defining the optimizer (including its learning rate), defining what it is trying to minimize, and then running that minimization in TensorFlow (instead of just running the loss function):


In [21]:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
train = optimizer.minimize(loss)

In [22]:
with tf.Session() as sesh:
    sesh.run(tf.global_variables_initializer())
    for i in range(1000):
        sesh.run(
            train,
            {
                x: [1,  2,  3,  4],
                y: [0, -1, -2, -3]
            }
        )
    print(sesh.run([W, b]))


[array([-0.9999969], dtype=float32), array([0.9999908], dtype=float32)]

Here we can see the linear model parameters that resulted from this gradient descent minimization, and they are pretty close to -1 and 1, which is cool. All together now, the full code is as follows:


In [23]:
import tensorflow as tf

tf.reset_default_graph()
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
W = tf.Variable([ .3], dtype = tf.float32)
b = tf.Variable([-.3], dtype = tf.float32)
linear_model = W*x+b
loss = tf.reduce_sum(tf.square(linear_model-y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
train = optimizer.minimize(loss)

x_train = [1,  2,  3,  4]
y_train = [0, -1, -2, -3]

with tf.Session() as sesh:
    sesh.run(tf.global_variables_initializer())
    for i in range(1000):
        sesh.run(
            train,
            {
                x: x_train,
                y: y_train
            }
        )
    current_W, current_b, current_loss = sesh.run(
        [W, b, loss],
        {
            x: x_train,
            y: y_train
        }
    )
    print('W: {W}, b: {b}, loss: {loss}'.format(W=current_W, b=current_b, loss=current_loss))


W: [-0.9999969], b: [0.9999908], loss: 5.699973826267524e-11
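
With the fitted parameters in hand, the same graph could also be used for predictions by feeding new x values while the session is still open; a small sketch of a line that could be added inside the with block above, after the training loop:

print(sesh.run(linear_model, {x: [5, 6]}))  # approximately [-4., -5.], since W is close to -1 and b is close to 1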