TensorFlow is a general-purpose system for graph-based computation. A typical use is machine learning. In this notebook, we'll introduce the basic concepts of TensorFlow using some simple examples.
TensorFlow gets its name from tensors, which are arrays of arbitrary dimensionality. A vector is a 1-d array and is known as a 1st-order tensor. A matrix is a 2-d array and a 2nd-order tensor. The "flow" part of the name refers to computation flowing through a graph. Training and inference in a neural network, for example, involves the propagation of matrix computations through many nodes in a computational graph.
When you think of doing things in TensorFlow, you might want to think of creating tensors (like matrices), adding operations (that output other tensors), and then executing the computation (running the computational graph). In particular, it's important to realize that when you add an operation on tensors, it doesn't execute immediately. Rather, TensorFlow waits for you to define all the operations you want to perform. Then, TensorFlow optimizes the computation graph, deciding how to execute the computation, before generating the data. Because of this, a tensor in TensorFlow isn't so much holding the data as a placeholder for holding the data, waiting for the data to arrive when a computation is executed.
In [2]:
import tensorflow as tf
with tf.Session():
input1 = tf.constant([1.0, 1.0, 1.0, 1.0])
input2 = tf.constant([2.0, 2.0, 2.0, 2.0])
output = tf.add(input1, input2)
result = output.eval()
print result
What we're doing is creating two vectors, [1.0, 1.0, 1.0, 1.0] and [2.0, 2.0, 2.0, 2.0], and then adding them. Here's equivalent code in raw Python and using numpy:
In [3]:
print [x + y for x, y in zip([1.0] * 4, [2.0] * 4)]
In [4]:
import numpy as np
x, y = np.full(4, 1.0), np.full(4, 2.0)
print "{} + {} = {}".format(x, y, x + y)
The example above of adding two vectors involves a lot more than it seems, so let's look at it in more depth.
import tensorflow as tf
This import brings TensorFlow's public API into our IPython runtime environment.
with tf.Session():
When you run an operation in TensorFlow, you need to do it in the context of a Session. A session holds the computation graph, which contains the tensors and the operations. When you create tensors and operations, they are not executed immediately, but wait for other operations and tensors to be added to the graph, only executing when finally requested to produce the results of the session. Deferring the execution like this provides additional opportunities for parallelism and optimization, as TensorFlow can decide how to combine operations and where to run them after TensorFlow knows about all the operations.
input1 = tf.constant([1.0, 1.0, 1.0, 1.0])
input2 = tf.constant([2.0, 2.0, 2.0, 2.0])
The next two lines create tensors using a convenience function called constant, which is similar to numpy's array and numpy's full. If you look at the code for constant, you can see the details of what it is doing to create the tensor. In summary, it creates a tensor of the necessary shape and applies the constant operator to it to fill it with the provided values. The values to constant can be Python or numpy arrays. constant can take an optional shape parameter, which works similarly to numpy's fill if provided, and an optional name parameter, which can be used to put a more human-readable label on the operation in the TensorFlow operation graph.
output = tf.add(input1, input2)
You might think add just adds the two vectors now, but it doesn't quite do that. What it does is put the add operation into the computational graph. The results of the addition aren't available yet. They've been put in the computation graph, but the computation graph hasn't been executed yet.
result = output.eval()
print result
eval() is also slightly more complicated than it looks. Yes, it does get the value of the vector (tensor) that results from the addition. It returns this as a numpy array, which can then be printed. But, it's important to realize it also runs the computation graph at this point, because we demanded the output from the operation node of the graph; to produce that, it had to run the computation graph. So, this is the point where the addition is actually performed, not when add was called, as add just put the addition operation into the TensorFlow computation graph.
In [5]:
import tensorflow as tf
with tf.Session():
input1 = tf.constant(1.0, shape=[4])
input2 = tf.constant(2.0, shape=[4])
input3 = tf.constant(3.0, shape=[4])
output = tf.add(tf.add(input1, input2), input3)
result = output.eval()
print result
This version uses constant in a way similar to numpy's fill, specifying the optional shape and having the values copied out across it.
The add operator supports operator overloading, so you could try writing it inline as input1 + input2 instead as well as experimenting with other operators.
In [6]:
with tf.Session():
input1 = tf.constant(1.0, shape=[4])
input2 = tf.constant(2.0, shape=[4])
output = input1 + input2
print output.eval()
Next, let's do something very similar, adding two matrices:
$\begin{bmatrix}
In [7]:
import tensorflow as tf
import numpy as np
with tf.Session():
input1 = tf.constant(1.0, shape=[2, 3])
input2 = tf.constant(np.reshape(np.arange(1.0, 7.0, dtype=np.float32), (2, 3)))
output = tf.add(input1, input2)
print output.eval()
Recall that you can pass numpy or Python arrays into constant.
In this example, the matrix with values from 1 to 6 is created in numpy and passed into constant, but TensorFlow also has range, reshape, and tofloat operators. Doing this entirely within TensorFlow could be more efficient if this was a very large matrix.
Try experimenting with this code a bit -- maybe modifying some of the values, using the numpy version, doing this using, adding another operation, or doing this using TensorFlow's range function.
Let's move on to matrix multiplication. This time, let's use a bit vector and some random values, which is a good step toward some of what we'll need to do for regression and neural networks.
In [8]:
#@test {"output": "ignore"}
import tensorflow as tf
import numpy as np
with tf.Session():
input_features = tf.constant(np.reshape([1, 0, 0, 1], (1, 4)).astype(np.float32))
weights = tf.constant(np.random.randn(4, 2).astype(np.float32))
output = tf.matmul(input_features, weights)
print "Input:"
print input_features.eval()
print "Weights:"
print weights.eval()
print "Output:"
print output.eval()
Above, we're taking a 1 x 4 vector [1 0 0 1] and multiplying it by a 4 by 2 matrix full of random values from a normal distribution (mean 0, stdev 1). The output is a 1 x 2 matrix.
You might try modifying this example. Running the cell multiple times will generate new random weights and a new output. Or, change the input, e.g., to [0 0 0 1]), and run the cell again. Or, try initializing the weights using the TensorFlow op, e.g., random_normal, instead of using numpy to generate the random weights.
What we have here is the basics of a simple neural network already. If we are reading in the input features, along with some expected output, and change the weights based on the error with the output each time, that's a neural network.
Let's look at adding two small matrices in a loop, not by creating new tensors every time, but by updating the existing values and then re-running the computation graph on the new data. This happens a lot with machine learning models, where we change some parameters each time such as gradient descent on some weights and then perform the same computations over and over again.
In [9]:
#@test {"output": "ignore"}
import tensorflow as tf
import numpy as np
with tf.Session() as sess:
# Set up two variables, total and weights, that we'll change repeatedly.
total = tf.Variable(tf.zeros([1, 2]))
weights = tf.Variable(tf.random_uniform([1,2]))
# Initialize the variables we defined above.
tf.initialize_all_variables().run()
# This only adds the operators to the graph right now. The assignment
# and addition operations are not performed yet.
update_weights = tf.assign(weights, tf.random_uniform([1, 2], -1.0, 1.0))
update_total = tf.assign(total, tf.add(total, weights))
for _ in range(5):
# Actually run the operation graph, so randomly generate weights and then
# add them into the total. Order does matter here. We need to update
# the weights before updating the total.
sess.run(update_weights)
sess.run(update_total)
print weights.eval(), total.eval()
This is more complicated. At a high level, we create two variables and add operations over them, then, in a loop, repeatedly execute those operations. Let's walk through it step by step.
Starting off, the code creates two variables, total and weights. total is initialized to [0, 0] and weights is initialized to random values between -1 and 1.
Next, two assignment operators are added to the graph, one that updates weights with random values from [-1, 1], the other that updates the total with the new weights. Again, the operators are not executed here. In fact, this isn't even inside the loop. We won't execute these operations until the eval call inside the loop.
Finally, in the for loop, we run each of the operators. In each iteration of the loop, this executes the operators we added earlier, first putting random values into the weights, then updating the totals with the new weights. This call uses eval on the session; the code also could have called eval on the operators (e.g. update_weights.eval).
It can be a little hard to wrap your head around exactly what computation is done when. The important thing to remember is that computation is only performed on demand.
Variables can be useful in cases where you have a large amount of computation and data that you want to use over and over again with just a minor change to the input each time. That happens quite a bit with neural networks, for example, where you just want to update the weights each time you go through the batches of input data, then run the same operations over again.
This has been a gentle introduction to TensorFlow, focused on what TensorFlow is and the very basics of doing anything in TensorFlow. If you'd like more, the next tutorial in the series is Getting Started with TensorFlow, also available in the notebooks directory.
In [19]:
x = tf.placeholder(tf.string)
a = tf.identity(x)
with tf.Session() as sess:
output = sess.run(a, feed_dict={x: 'Hello World'})
print(output)
In [20]:
x = tf.placeholder(tf.string)
y = tf.placeholder(tf.int32)
z = tf.placeholder(tf.float32)
with tf.Session() as sess:
output = sess.run(x, feed_dict={x: 'Test String', y: 123, z: 45.67})
print(output)
In [24]:
#tf.subtract(tf.constant(2.0),tf.constant(1))
output = tf.sub(tf.cast(tf.constant(2.0), tf.int32), tf.constant(1)) # 1
print(output)
In [30]:
import tensorflow as tf
# TODO: Convert the following to TensorFlow:
x = tf.constant(10)
y = tf.constant(2)
y = tf.placeholder(tf.float32, 10)
x = tf.placeholder(tf.float32, 2)
z = tf.sub(tf.div(x,y),tf.cast(tf.constant(1), tf.float64))
# TODO: Print z from a session
with tf.Session() as sess:
output = sess.run(z)
print(output)
In [32]:
init = tf.initialize_all_variables()
with tf.Session() as sess:
sess.run(init)
In [37]:
n_labels = 5
init = tf.initialize_all_variables()
bias = tf.Variable(tf.zeros(n_labels))
with tf.Session() as sess:
output = sess.run(bias)
print(output)
In [34]:
n_features = 120
n_labels = 5
weights = tf.Variable(tf.truncated_normal((n_features, n_labels)))
In [38]:
# Solution is available in the other "quiz_solution.py" tab
import tensorflow as tf
def get_weights(n_features, n_labels):
"""
Return TensorFlow weights
:param n_features: Number of features
:param n_labels: Number of labels
:return: TensorFlow weights
"""
# TODO: Return weights
return tf.Variable(tf.truncated_normal((n_features, n_labels)))
def get_biases(n_labels):
"""
Return TensorFlow bias
:param n_labels: Number of labels
:return: TensorFlow bias
"""
# TODO: Return biases
return tf.Variable(tf.zeros(n_labels))
def linear(input, w, b):
"""
Return linear function in TensorFlow
:param input: TensorFlow input
:param w: TensorFlow weights
:param b: TensorFlow biases
:return: TensorFlow linear function
"""
# TODO: Linear Function (xW + b)
return tf. add(tf.matmul(input,w),b)
In [41]:
# Solution is available in the other "sandbox_solution.py" tab
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
#from quiz import get_weights, get_biases, linear
def mnist_features_labels(n_labels):
"""
Gets the first <n> labels from the MNIST dataset
:param n_labels: Number of labels to use
:return: Tuple of feature list and label list
"""
mnist_features = []
mnist_labels = []
mnist = input_data.read_data_sets('/datasets/ud730/mnist', one_hot=True)
# In order to make quizzes run faster, we're only looking at 10000 images
for mnist_feature, mnist_label in zip(*mnist.train.next_batch(10000)):
# Add features and labels if it's for the first <n>th labels
if mnist_label[:n_labels].any():
mnist_features.append(mnist_feature)
mnist_labels.append(mnist_label[:n_labels])
return mnist_features, mnist_labels
# Number of features (28*28 image is 784 features)
n_features = 784
# Number of labels
n_labels = 3
# Features and Labels
features = tf.placeholder(tf.float32)
labels = tf.placeholder(tf.float32)
# Weights and Biases
w = get_weights(n_features, n_labels)
b = get_biases(n_labels)
# Linear Function xW + b
logits = linear(features, w, b)
# Training data
train_features, train_labels = mnist_features_labels(n_labels)
with tf.Session() as session:
# TODO: Initialize session variables
session.run(tf.initialize_all_variables())
# Softmax
prediction = tf.nn.softmax(logits)
# Cross entropy
# This quantifies how far off the predictions were.
# You'll learn more about this in future lessons.
cross_entropy = -tf.reduce_sum(labels * tf.log(prediction), reduction_indices=1)
# Training loss
# You'll learn more about this in future lessons.
loss = tf.reduce_mean(cross_entropy)
# Rate at which the weights are changed
# You'll learn more about this in future lessons.
learning_rate = 0.08
# Gradient Descent
# This is the method used to train the model
# You'll learn more about this in future lessons.
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
# Run optimizer and get loss
_, l = session.run(
[optimizer, loss],
feed_dict={features: train_features, labels: train_labels})
# Print loss
print('Loss: {}'.format(l))
In [44]:
# Solution is available in the other "solution.py" tab
import tensorflow as tf
def run():
output = None
logit_data = [2.0, 1.0, 0.1]
logits = tf.placeholder(tf.float32)
# TODO: Calculate the softmax of the logits
softmax = tf.nn.softmax(logit_data)
with tf.Session() as sess:
# TODO: Feed in the logit data
output = sess.run(softmax, feed_dict={logits: logit_data} )
return output
In [49]:
# Solution is available in the other "solution.py" tab
import tensorflow as tf
softmax_data = [0.27, 0.11, 0.33, 0.10, 0.19]
one_hot_data = [0, 0, 0, 1, 0]
softmax = tf.placeholder(tf.float32)
one_hot = tf.placeholder(tf.float32)
# TODO: Print cross entropy from session
cross_entropy = -tf.reduce_sum(tf.mul(one_hot, tf.log(softmax)))
with tf.Session() as sess:
print(sess.run(cross_entropy, feed_dict={softmax:softmax_data, one_hot:one_hot_data}))
In [50]:
import pandas as pd
# TODO: Set weight1, weight2, and bias
weight1 = 1.0
weight2 = 1.0
bias = -2.0
# DON'T CHANGE ANYTHING BELOW
# Inputs and outputs
test_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
correct_outputs = [False, False, False, True]
outputs = []
# Generate and check output
for test_input, correct_output in zip(test_inputs, correct_outputs):
linear_combination = weight1 * test_input[0] + weight2 * test_input[1] + bias
output = int(linear_combination >= 0)
is_correct_string = 'Yes' if output == correct_output else 'No'
outputs.append([test_input[0], test_input[1], linear_combination, output, is_correct_string])
# Print output
num_wrong = len([output[4] for output in outputs if output[4] == 'No'])
output_frame = pd.DataFrame(outputs, columns=['Input 1', ' Input 2', ' Linear Combination', ' Activation Output', ' Is Correct'])
if not num_wrong:
print('Nice! You got it all correct.\n')
else:
print('You got {} wrong. Keep trying!\n'.format(num_wrong))
print(output_frame.to_string(index=False))
In [ ]:
import pandas as pd
# TODO: Set weight1, weight2, and bias
weight1 = 1.0
weight2 = 1.0
bias = -2.0
# DON'T CHANGE ANYTHING BELOW
# Inputs and outputs
test_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
correct_outputs = [False, False, False, True]
outputs = []
# Generate and check output
for test_input, correct_output in zip(test_inputs, correct_outputs):
linear_combination = weight1 * test_input[0] + weight2 * test_input[1] + bias
output = int(linear_combination >= 0)
is_correct_string = 'Yes' if output == correct_output else 'No'
outputs.append([test_input[0], test_input[1], linear_combination, output, is_correct_string])
# Print output
num_wrong = len([output[4] for output in outputs if output[4] == 'No'])
output_frame = pd.DataFrame(outputs, columns=['Input 1', ' Input 2', ' Linear Combination', ' Activation Output', ' Is Correct'])
if not num_wrong:
print('Nice! You got it all correct.\n')
else:
print('You got {} wrong. Keep trying!\n'.format(num_wrong))
print(output_frame.to_string(index=False))
In [ ]:
import numpy as np
def sigmoid(x):
# TODO: Implement sigmoid function
return 1/(1 + np.exp(-x))
inputs = np.array([0.7, -0.3])
weights = np.array([0.1, 0.8])
bias = -0.1
# TODO: Calculate the output
output = sigmoid(np.dot(weights, inputs) + bias)
print('Output:')
print(output)
In [ ]:
import numpy as np
def sigmoid(x):
"""
Calculate sigmoid
"""
return 1/(1+np.exp(-x))
def sigmoid_prime(x):
"""
# Derivative of the sigmoid function
"""
return sigmoid(x) * (1 - sigmoid(x))
learnrate = 0.5
x = np.array([1, 2, 3, 4])
y = np.array(0.5)
# Initial weights
w = np.array([0.5, -0.5, 0.3, 0.1])
### Calculate one gradient descent step for each weight
### Note: Some steps have been consilated, so there are
### fewer variable names than in the above sample code
# TODO: Calculate the node's linear combination of inputs and weights
h = np.dot(x, w)
# TODO: Calculate output of neural network
nn_output = sigmoid(h)
# TODO: Calculate error of neural network
error = y - nn_output
# TODO: Calculate the error term
# Remember, this requires the output gradient, which we haven't
# specifically added a variable for.
error_term = error * sigmoid_prime(h)
# Note: The sigmoid_prime function calculates sigmoid(h) twice,
# but you've already calculated it once. You can make this
# code more efficient by calculating the derivative directly
# rather than calling sigmoid_prime, like this:
# error_term = error * nn_output * (1 - nn_output)
# TODO: Calculate change in weights
del_w = learnrate * error_term * x
print('Neural Network output:')
print(nn_output)
print('Amount of Error:')
print(error)
print('Change in Weights:')
print(del_w)
In [51]:
import numpy as np
def sigmoid(x):
# TODO: Implement sigmoid function
return 1/(1 + np.exp(-x))
inputs = np.array([0.7, -0.3])
weights = np.array([0.1, 0.8])
bias = -0.1
# TODO: Calculate the output
output = sigmoid(np.dot(weights, inputs) + bias)
print('Output:')
print(output)
In [ ]:
import numpy as np
def sigmoid(x):
"""
Calculate sigmoid
"""
return 1/(1+np.exp(-x))
def sigmoid_prime(x):
"""
# Derivative of the sigmoid function
"""
return sigmoid(x) * (1 - sigmoid(x))
learnrate = 0.5
x = np.array([1, 2, 3, 4])
y = np.array(0.5)
# Initial weights
w = np.array([0.5, -0.5, 0.3, 0.1])
### Calculate one gradient descent step for each weight
### Note: Some steps have been consilated, so there are
### fewer variable names than in the above sample code
# TODO: Calculate the node's linear combination of inputs and weights
h = np.dot(x, w)
# TODO: Calculate output of neural network
nn_output = sigmoid(h)
# TODO: Calculate error of neural network
error = y - nn_output
# TODO: Calculate the error term
# Remember, this requires the output gradient, which we haven't
# specifically added a variable for.
error_term = error * sigmoid_prime(h)
# Note: The sigmoid_prime function calculates sigmoid(h) twice,
# but you've already calculated it once. You can make this
# code more efficient by calculating the derivative directly
# rather than calling sigmoid_prime, like this:
# error_term = error * nn_output * (1 - nn_output)
# TODO: Calculate change in weights
del_w = learnrate * error_term * x
print('Neural Network output:')
print(nn_output)
print('Amount of Error:')
print(error)
print('Change in Weights:')
print(del_w)
In [53]:
import numpy as np
import pandas as pd
admissions = pd.read_csv('binary.csv')
# Make dummy variables for rank
data = pd.concat([admissions, pd.get_dummies(admissions['rank'], prefix='rank')], axis=1)
data = data.drop('rank', axis=1)
# Standarize features
for field in ['gre', 'gpa']:
mean, std = data[field].mean(), data[field].std()
data.loc[:,field] = (data[field]-mean)/std
# Split off random 10% of the data for testing
np.random.seed(42)
sample = np.random.choice(data.index, size=int(len(data)*0.9), replace=False)
data, test_data = data.ix[sample], data.drop(sample)
# Split into features and targets
features, targets = data.drop('admit', axis=1), data['admit']
features_test, targets_test = test_data.drop('admit', axis=1), test_data['admit']
In [54]:
import numpy as np
from data_prep import features, targets, features_test, targets_test
def sigmoid(x):
"""
Calculate sigmoid
"""
return 1 / (1 + np.exp(-x))
# TODO: We haven't provided the sigmoid_prime function like we did in
# the previous lesson to encourage you to come up with a more
# efficient solution. If you need a hint, check out the comments
# in solution.py from the previous lecture.
# Use to same seed to make debugging easier
np.random.seed(42)
n_records, n_features = features.shape
last_loss = None
# Initialize weights
weights = np.random.normal(scale=1 / n_features**.5, size=n_features)
# Neural Network hyperparameters
epochs = 1000
learnrate = 0.5
for e in range(epochs):
del_w = np.zeros(weights.shape)
for x, y in zip(features.values, targets):
# Loop through all records, x is the input, y is the target
# Activation of the output unit
# Notice we multiply the inputs and the weights here
# rather than storing h as a separate variable
output = sigmoid(np.dot(x, weights))
# The error, the target minus the network output
error = y - output
# The error term
# Notice we calulate f'(h) here instead of defining a separate
# sigmoid_prime function. This just makes it faster because we
# can re-use the result of the sigmoid function stored in
# the output variable
error_term = error * output * (1 - output)
# The gradient descent step, the error times the gradient times the inputs
del_w += error_term * x
# Update the weights here. The learning rate times the
# change in weights, divided by the number of records to average
weights += learnrate * del_w / n_records
# Printing out the mean square error on the training set
if e % (epochs / 10) == 0:
out = sigmoid(np.dot(features, weights))
loss = np.mean((out - targets) ** 2)
if last_loss and last_loss < loss:
print("Train loss: ", loss, " WARNING - Loss Increasing")
else:
print("Train loss: ", loss)
last_loss = loss
# Calculate accuracy on test data
tes_out = sigmoid(np.dot(features_test, weights))
predictions = tes_out > 0.5
accuracy = np.mean(predictions == targets_test)
print("Prediction accuracy: {:.3f}".format(accuracy))
In [55]:
import numpy as np
def sigmoid(x):
"""
Calculate sigmoid
"""
return 1/(1+np.exp(-x))
# Network size
N_input = 4
N_hidden = 3
N_output = 2
np.random.seed(42)
# Make some fake data
X = np.random.randn(4)
weights_input_to_hidden = np.random.normal(0, scale=0.1, size=(N_input, N_hidden))
weights_hidden_to_output = np.random.normal(0, scale=0.1, size=(N_hidden, N_output))
# TODO: Make a forward pass through the network
hidden_layer_in = np.dot(X, weights_input_to_hidden)
hidden_layer_out = sigmoid(hidden_layer_in)
print('Hidden-layer Output:')
print(hidden_layer_out)
output_layer_in = np.dot(hidden_layer_out, weights_hidden_to_output)
output_layer_out = sigmoid(output_layer_in)
print('Output-layer Output:')
print(output_layer_out)
In [56]:
import numpy as np
def sigmoid(x):
"""
Calculate sigmoid
"""
return 1 / (1 + np.exp(-x))
x = np.array([0.5, 0.1, -0.2])
target = 0.6
learnrate = 0.5
weights_input_hidden = np.array([[0.5, -0.6],
[0.1, -0.2],
[0.1, 0.7]])
weights_hidden_output = np.array([0.1, -0.3])
## Forward pass
hidden_layer_input = np.dot(x, weights_input_hidden)
hidden_layer_output = sigmoid(hidden_layer_input)
output_layer_in = np.dot(hidden_layer_output, weights_hidden_output)
output = sigmoid(output_layer_in)
## Backwards pass
## TODO: Calculate output error
error = target - output
# TODO: Calculate error term for output layer
output_error_term = error * output * (1 - output)
# TODO: Calculate error term for hidden layer
hidden_error_term = np.dot(output_error_term, weights_hidden_output) * \
hidden_layer_output * (1 - hidden_layer_output)
# TODO: Calculate change in weights for hidden layer to output layer
delta_w_h_o = learnrate * output_error_term * hidden_layer_output
# TODO: Calculate change in weights for input layer to hidden layer
delta_w_i_h = learnrate * hidden_error_term * x[:, None]
print('Change in weights for hidden layer to output layer:')
print(delta_w_h_o)
print('Change in weights for input layer to hidden layer:')
print(delta_w_i_h)
In [ ]: