This introduction seeks to broach a few basic topics in TensorFlow: what it is, how operations and data are defined for its computational graphs and how its operations are visualized. In doing this, some basic examples are shown, involving linear regression and basic optimizer usage.
TensorFlow is an open source software library for numerical computation using data flow graphs. In a data flow graph, nodes represent mathematical operations and edges represent the multidimensional data arrays (tensors) communicated between them.
TensorFlow is usually used from Python, though in the background it runs efficient compiled code that parallelizes its calculations, and it is well-suited to GPU hardware. The Python convention for importing TensorFlow is as follows:
In [1]:
import tensorflow as tf
In [2]:
print('TensorFlow version:', tf.__version__)
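It can be helpful to hide some TensorFlow logging messages. The logging level is read from the `TF_CPP_MIN_LOG_LEVEL` environment variable, which takes effect only if it is set before TensorFlow is first imported:
In [3]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' # '3' hides INFO, WARNING and ERROR messages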
The central unit of data in TensorFlow is the tensor, in the sense of it being an array of some arbitrary dimensionality. A tensor of rank 0 is a scalar, a tensor of rank 1 is a vector, a tensor of rank 2 is a matrix, a tensor of rank 3 is a 3-tensor, and so on.
rank | mathematical object | shape | example |
---|---|---|---|
0 | scalar | [] | 3 |
1 | vector | [3] | [1., 2., 3.] |
2 | matrix | [2, 3] | [[1., 2., 3.], [4., 5., 6.]] |
3 | 3-tensor | [2, 1, 3] | [[[1., 2., 3.]], [[7., 8., 9.]]] |
n | n-tensor | ... | ... |
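To make the relationship between rank and shape in the table concrete, here is a minimal sketch in plain Python. It does not use TensorFlow; the helper names `shape_of` and `rank_of` are hypothetical, and the sketch assumes regularly nested lists (every sublist at a given depth has the same length).

```python
def shape_of(value):
    """Return the shape of a regularly nested list as a list of dimension sizes."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        if not value:  # an empty list has no first element to descend into
            break
        value = value[0]
    return shape


def rank_of(value):
    """The rank is the number of dimensions, i.e. the length of the shape."""
    return len(shape_of(value))


# The examples from the table above: scalar, vector, matrix, 3-tensor.
examples = [
    3,
    [1., 2., 3.],
    [[1., 2., 3.], [4., 5., 6.]],
    [[[1., 2., 3.]], [[7., 8., 9.]]],
]
for example in examples:
    print(rank_of(example), shape_of(example))
```

Each printed line pairs a rank with a shape, matching the corresponding row of the table.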
The various number types that TensorFlow can handle are as follows:
data type | Python type | description |
---|---|---|
DT_FLOAT | tf.float32 | 32-bit floating point |
DT_DOUBLE | tf.float64 | 64-bit floating point |
DT_INT8 | tf.int8 | 8-bit signed integer |
DT_INT16 | tf.int16 | 16-bit signed integer |
DT_INT32 | tf.int32 | 32-bit signed integer |
DT_INT64 | tf.int64 | 64-bit signed integer |
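As a rough illustration of what the bit widths in the table mean in practice, the following sketch uses only the Python standard library, not TensorFlow. The helper names `signed_integer_range` and `round_to_float32` are hypothetical. It relies on Python's own `float` being a 64-bit value and uses the `struct` module to round a number to 32 bits.

```python
import struct


def signed_integer_range(bits):
    """Smallest and largest values of a two's-complement signed integer."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def round_to_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float and back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


for bits in (8, 16, 32, 64):
    print(bits, signed_integer_range(bits))

# 0.5 is exactly representable at both precisions; 0.1 is not, so the
# 32-bit rounding of 0.1 differs from the 64-bit value.
print(round_to_float32(0.5) == 0.5)
print(round_to_float32(0.1) == 0.1)
```

The last comparison is the practical reason to choose a data type deliberately: 32-bit values use half the memory but carry fewer significant digits.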
TensorFlow programs are defined as computational graphs. For TensorFlow, a computational graph is a series of TensorFlow operations arranged in a graph of nodes. A node takes zero or more tensors as inputs and produces a tensor as an output. Generally, a TensorFlow program consists of two stages: building the computational graph and then running it.
A simple TensorFlow node is a constant. It takes no inputs and simply outputs a value that it stores internally. Here are some constants:
In [4]:
node_1 = tf.constant(3.0, tf.float32)
node_2 = tf.constant(4.0) # (also tf.float32 by default)
print("node_1: {node}".format(node=node_1))
print("node_2: {node}".format(node=node_2))
Printing a node does not evaluate it: the printout describes the tensor the node would produce (its name, shape and data type), not its value. To evaluate nodes, the computational graph is run within a TensorFlow "session", which encapsulates the control and state of the TensorFlow runtime.
A `Session` is a class for running TensorFlow operations. A session object encapsulates the environment in which operations are executed and tensors are evaluated. For example, `sesh.run(c)` evaluates the tensor `c`.
A session is run using its `run` method:
tf.Session.run(
    fetches,
    feed_dict = None,
    options = None,
    run_metadata = None
)
This method runs operations and evaluates tensors in `fetches`. It runs one step of TensorFlow computation, by running the necessary graph fragment to execute every operation and evaluate every tensor in `fetches`, substituting the values in `feed_dict` for the corresponding input values. The `fetches` argument can be a single graph element, or an arbitrarily nested list, tuple, namedtuple, dict or OrderedDict containing graph elements at its leaves. The value returned by `run` has the same shape as the `fetches` argument, where the leaves are replaced by the corresponding values returned by TensorFlow.
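The way the returned value mirrors the structure of `fetches` can be sketched in plain Python. This is not how TensorFlow implements `run`. The function `run_fetches` and the `fake_values` lookup table are hypothetical stand-ins, and the sketch handles only plain lists, tuples and dicts (not namedtuples).

```python
def run_fetches(fetches, evaluate):
    """Replace every leaf of a nested structure with evaluate(leaf),
    keeping the structure itself unchanged."""
    if isinstance(fetches, dict):
        return {key: run_fetches(item, evaluate) for key, item in fetches.items()}
    if isinstance(fetches, (list, tuple)):
        return type(fetches)(run_fetches(item, evaluate) for item in fetches)
    return evaluate(fetches)  # a leaf: a single graph element


# Stand-in for the runtime: maps a node name to the value it evaluates to.
fake_values = {"node_1": 3.0, "node_2": 4.0}

result = run_fetches(["node_1", {"both": ("node_1", "node_2")}], fake_values.get)
print(result)
```

The result is a list containing a value and a dict of a tuple, the same nesting as the argument, with each node name replaced by its value.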
So, those constant nodes could be evaluated like this:
In [5]:
sesh = tf.Session()
sesh.run([node_1, node_2])
More complicated nodes than constants are operations. For example, two constant nodes could be added:
In [6]:
node_3 = tf.add(node_1, node_2)
node_3
In [7]:
sesh.run(node_3)
This is, of course, a trivial mathematical operation, but it has been performed using very computationally efficient infrastructure. Far more complicated operations can be encoded in a computational graph and run using TensorFlow.
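The separation between describing a computation and running it can be imitated with a toy graph in plain Python. This is only a sketch of the idea, not TensorFlow's implementation. The classes `Constant` and `Add` and their `evaluate` method are hypothetical names invented for the illustration.

```python
class Constant:
    """A node with no inputs that outputs a stored value."""

    def __init__(self, value):
        self.value = value

    def evaluate(self):
        return self.value

    def __repr__(self):
        return "Constant({!r})".format(self.value)


class Add:
    """A node that outputs the sum of its two input nodes."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def evaluate(self):
        # Nothing is computed until this method is called.
        return self.left.evaluate() + self.right.evaluate()

    def __repr__(self):
        return "Add({!r}, {!r})".format(self.left, self.right)


toy_node_1 = Constant(3.0)
toy_node_2 = Constant(4.0)
toy_node_3 = Add(toy_node_1, toy_node_2)

print(toy_node_3)             # a description of the graph, not a number
print(toy_node_3.evaluate())  # the graph is actually run here
```

Printing the node shows its structure, in the same spirit as printing a TensorFlow node; only the explicit `evaluate` call, which plays the role of a session run, produces the number.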
A computational graph can be parameterized to accept external inputs. These entry points for data are called placeholders.
So, let's create some placeholders that can hold 32-bit floating point numbers, and let's also make a node for the addition operation applied to these placeholders:
In [8]:
a = tf.placeholder(tf.float32)
b = tf.placeholder(tf.float32)
adder_node = a+b # + provides a shortcut for tf.add(a, b)
adder_node
The `feed_dict` parameter of a session's `run` method is used to input data to these placeholders:
In [9]:
print(sesh.run(adder_node, {a: 3, b: 4}))
The same can be done with multiple values:
In [10]:
print(sesh.run(adder_node, {a: [3, 4], b: [5, 6]}))
Here the addition is applied elementwise to every pair of values in a single run, which is a first hint of how central parallelism is to TensorFlow.
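How a feed dictionary supplies placeholder values at run time can be sketched with a toy graph in plain Python. This is not TensorFlow's implementation. The names `Placeholder`, `Sum`, `run`, `toy_a`, `toy_b` and `toy_adder` are hypothetical, and the sketch assumes both fed values are either single numbers or lists of equal length.

```python
class Placeholder:
    """A named slot whose value is supplied only when the graph is run."""

    def __init__(self, name):
        self.name = name


class Sum:
    """A node that adds the outputs of two other nodes."""

    def __init__(self, left, right):
        self.left = left
        self.right = right


def run(node, feed):
    """Evaluate a node, looking up placeholder values in the feed dictionary."""
    if isinstance(node, Placeholder):
        return feed[node]
    left = run(node.left, feed)
    right = run(node.right, feed)
    if isinstance(left, list):
        # Lists are added element by element.
        return [l + r for l, r in zip(left, right)]
    return left + right


toy_a = Placeholder("a")
toy_b = Placeholder("b")
toy_adder = Sum(toy_a, toy_b)

print(run(toy_adder, {toy_a: 3, toy_b: 4}))
print(run(toy_adder, {toy_a: [3, 4], toy_b: [5, 6]}))
```

The same graph is reused for both calls; only the fed values change, which is the point of parameterizing a graph with placeholders.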
Further nodes can be added to the computational graph easily:
In [11]:
add_and_triple = adder_node * 3.
sesh.run(add_and_triple, {a: 3, b: 4})
Variables are nodes whose values can change. They hold the adjustable parameters of a model, which is what makes a model trainable. A variable is defined with a type and an initial value.
We can make a linear model featuring changeable variables like this:
In [12]:
W = tf.Variable([ .3], dtype=tf.float32)
b = tf.Variable([- .3], dtype=tf.float32)
x = tf.placeholder(tf.float32)
linear_model = W*x+b
Constants are initialized when `tf.constant` is called and their values never change. Variables, by contrast, are not initialized when `tf.Variable` is called; they are initialized in a TensorFlow program using a special operation:
In [13]:
sesh = tf.Session()
init = tf.global_variables_initializer()
sesh.run(init)
Since `x` is a placeholder, this linear model can be evaluated for several `x` values in parallel:
In [14]:
print(sesh.run(linear_model, {x: [1, 2, 3, 4]}))
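For comparison, the values the linear model should produce can be worked out in plain Python without TensorFlow. This is a sketch using the same initial values as above (W = 0.3, b = -0.3); the names `initial_W`, `initial_b` and `evaluate_linear_model` are hypothetical.

```python
initial_W = 0.3
initial_b = -0.3


def evaluate_linear_model(x_values):
    """Apply W*x + b to each input value in turn."""
    return [initial_W * x + initial_b for x in x_values]


predictions = evaluate_linear_model([1, 2, 3, 4])
print(predictions)
```

The printed values should be approximately 0, 0.3, 0.6 and 0.9, up to floating point rounding.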
A large number of values can be stored in a variable easily, like this:
In [15]:
weights = tf.Variable(
    tf.random_normal(
        [784, 200],
        stddev=0.35
    ),
    name = "weights"
)
with tf.Session() as sesh:
    sesh.run(tf.global_variables_initializer())
    print(sesh.run(weights))
The value of a variable can be changed using an operation like `tf.assign`:
In [16]:
a = tf.Variable(10, dtype=tf.float32)
with tf.Session() as sesh:
    sesh.run(tf.global_variables_initializer())
    print("initial variable value: {value}".format(value = sesh.run(a)))
    sesh.run(tf.assign(a, 20))
    print("reassigned variable value: {value}".format(value = sesh.run(a)))
A loss function measures how far a model is from the provided data. For a linear regression model, a standard loss function is the sum of the squares of the differences between the model's outputs and the provided data.
So, here we create a linear model:
In [17]:
W = tf.Variable([ .3], dtype=tf.float32)
b = tf.Variable([-.3], dtype=tf.float32)
x = tf.placeholder(tf.float32)
linear_model = W*x+b
We create a placeholder for our target values:
In [18]:
y = tf.placeholder(tf.float32)
We create the loss function for the model:
In [19]:
loss = tf.reduce_sum(tf.square(linear_model-y))
We launch a TensorFlow session, initialize the graph variables (`W`, `b`), specify the input values (`x`) and the target values (`y`), and run:
In [20]:
with tf.Session() as sesh:
    sesh.run(tf.global_variables_initializer())
    results = sesh.run(
        loss,
        {
            x: [1, 2, 3, 4],
            y: [0, -1, -2, -3]
        }
    )
print(results)
In machine learning, a simple linear model like this would be modified automatically by changing the variables `W` and `b` to try to find good model parameters. In this example, ideal values would be `W = -1` and `b = 1`, which would result in the loss function being 0.
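The sum-of-squares loss can be checked by hand in plain Python, both for the initial parameters and for the ideal ones. This is a sketch that does not use TensorFlow; the function name `sum_of_squares_loss` is hypothetical.

```python
def sum_of_squares_loss(W, b, x_values, y_values):
    """Sum of squared differences between W*x + b and the target values."""
    return sum((W * x + b - y) ** 2 for x, y in zip(x_values, y_values))


x_data = [1, 2, 3, 4]
y_data = [0, -1, -2, -3]

initial_loss = sum_of_squares_loss(0.3, -0.3, x_data, y_data)
ideal_loss = sum_of_squares_loss(-1.0, 1.0, x_data, y_data)

print(initial_loss)  # roughly 23.66
print(ideal_loss)    # 0.0, because W = -1, b = 1 fits the data exactly
```

With the initial parameters the differences are 0, 1.3, 2.6 and 3.9, whose squares sum to about 23.66. With the ideal parameters every difference is zero.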
Optimizers change variables in models in order to minimize loss functions. Optimizers are an active area of research and there are many types. A simple optimizer is gradient descent. It modifies each variable in proportion to the derivative of the loss with respect to that variable, stepping in the direction that reduces the loss.
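The gradient descent update can be sketched in plain Python for the sum-of-squares loss of a linear model. This is an illustration under stated assumptions, not TensorFlow's implementation. The derivatives are written out by hand here, whereas TensorFlow computes them automatically, and the function name `gradient_descent` and its default arguments are hypothetical choices. The defaults are a starting point of W = 0.3 and b = -0.3, a learning rate of 0.01 and 1000 steps.

```python
def gradient_descent(x_values, y_values, W=0.3, b=-0.3,
                     learning_rate=0.01, steps=1000):
    """Minimize sum((W*x + b - y)**2) by repeatedly stepping against the gradient."""
    for _ in range(steps):
        errors = [W * x + b - y for x, y in zip(x_values, y_values)]
        # Derivatives of the loss with respect to W and b.
        grad_W = sum(2 * error * x for error, x in zip(errors, x_values))
        grad_b = sum(2 * error for error in errors)
        # Step each variable in the direction that reduces the loss.
        W -= learning_rate * grad_W
        b -= learning_rate * grad_b
    return W, b


fitted_W, fitted_b = gradient_descent([1, 2, 3, 4], [0, -1, -2, -3])
print(fitted_W, fitted_b)
```

For this data the fitted values should end up close to W = -1 and b = 1. A larger learning rate takes bigger steps and can overshoot; a smaller one needs more steps to get equally close.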
Let's see gradient descent optimize our linear regression model. We can do this by defining the optimizer (including its learning rate), defining what it is trying to minimize and then running that minimization in TensorFlow (instead of just running the loss function):
In [21]:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
train = optimizer.minimize(loss)
In [22]:
with tf.Session() as sesh:
    sesh.run(tf.global_variables_initializer())
    for i in range(1000):
        sesh.run(
            train,
            {
                x: [1, 2, 3, 4],
                y: [0, -1, -2, -3]
            }
        )
    print(sesh.run([W, b]))
Here we can see the linear model parameters that resulted from this gradient descent minimization, and they are pretty close to the ideal values of `W = -1` and `b = 1`. All together now, the code is as follows:
In [23]:
import tensorflow as tf
tf.reset_default_graph()
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
W = tf.Variable([ .3], dtype = tf.float32)
b = tf.Variable([-.3], dtype = tf.float32)
linear_model = W*x+b
loss = tf.reduce_sum(tf.square(linear_model-y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
train = optimizer.minimize(loss)
x_train = [1, 2, 3, 4]
y_train = [0, -1, -2, -3]
with tf.Session() as sesh:
    sesh.run(tf.global_variables_initializer())
    for i in range(1000):
        sesh.run(
            train,
            {
                x: x_train,
                y: y_train
            }
        )
    current_W, current_b, current_loss = sesh.run(
        [W, b, loss],
        {
            x: x_train,
            y: y_train
        }
    )
    print('W: {W}, b: {b}, loss: {loss}'.format(W=current_W, b=current_b, loss=current_loss))