Tutorial based on 6.1 Example: Learning XOR (page 170 in Deep Learning book)
The XOR function (“exclusive or”) is an operation on two binary values, x1 and x2. When exactly one of these values equals 1, XOR returns 1; otherwise it returns 0. For now we are not concerned with statistical generalization: we just want our network to perform correctly on the four points X = {[0,0], [0,1], [1,0], [1,1]}.
We can treat this problem as a regression problem and use a mean squared error loss function to simplify the math for this example as much as possible (there are better approaches for modeling binary data).
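Before building any network, the target function itself can be written in a few lines of plain Python as a sanity check (not part of the book's code, just a reference implementation):

```python
# XOR truth table: the four input points and their targets
def xor(x1, x2):
    # returns 1 when exactly one input is 1
    return int(x1 != x2)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), "->", xor(x1, x2))
```

These four input/target pairs are exactly the XOR_X and XOR_Y lists used throughout the cells below.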
In [1]:
#load tensorflow
import tensorflow as tf
Network 1 - Single Layer
We can minimize the MSE in closed form with respect to w and b using the normal equations. Solving them yields w = 0 and b = 1/2, so the linear model simply outputs 0.5 everywhere: a linear model cannot represent XOR.
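The closed-form claim is easy to verify with NumPy before running the TensorFlow version (a verification sketch, not part of the book's code): append a bias column to X and solve the least-squares problem.

```python
import numpy as np

# design matrix with a bias column appended
X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]], dtype=float)
y = np.array([0., 1., 1., 0.])

# least-squares solution of the normal equations
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
w, b = theta[:2], theta[2]
print("w =", w, "b =", b)          # w ≈ [0, 0], b ≈ 0.5
print("predictions:", X @ theta)   # 0.5 everywhere
```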
In [2]:
##Network 1##
sess1 = tf.Session()
#input data
XOR_X = [[0,0],[0,1],[1,0],[1,1]] #input
#placeholders
x_ = tf.placeholder(tf.float32, shape=[4,2], name="x-input")
#use weights/biases from book solution (page 171)
w = tf.Variable(tf.zeros([2,1]), dtype=tf.float32)
b = tf.Variable([1/2.], dtype=tf.float32)
init = tf.global_variables_initializer()
sess1.run(init)
#operation node
linear_model = tf.matmul(x_,w) + b
#see what the predictions are
print(sess1.run(linear_model, {x_: XOR_X}))
Network 2 - Two Layers
Now solve the same problem with a model that learns a different feature space, one in which a linear model can represent the solution. We introduce a very simple feedforward network with one hidden layer containing two hidden units; the hidden layer changes the representation that is given to the output layer.
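The book's hand-picked parameters (W = all ones, c = [0, -1], w = [1, -2], b = 0) can be checked in a few lines of NumPy before building the TensorFlow graph (a verification sketch only):

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
W = np.ones((2, 2))            # first-layer weights
c = np.array([0., -1.])        # first-layer biases
w = np.array([[1.], [-2.]])    # second-layer weights
b = 0.                         # second-layer bias

h = np.maximum(0, X @ W + c)   # hidden layer with ReLU
out = h @ w + b
print(out.ravel())             # [0. 1. 1. 0.] -- exactly XOR
```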
In [3]:
##Network 2##
sess2 = tf.Session()
#input data
XOR_X = [[0,0],[0,1],[1,0],[1,1]] #input
#placeholders
x_ = tf.placeholder(tf.float32, shape=[4,2], name="x-input")
#use weights/biases from book example (page 173)
w1 = tf.Variable(tf.ones([2,2]), dtype=tf.float32) #W
w2 = tf.Variable([[1.],[-2.]], dtype=tf.float32) #w
b1 = tf.Variable([[0.,-1.]], dtype=tf.float32) #c
b2 = tf.Variable(tf.zeros(1), dtype=tf.float32) #b
init2 = tf.global_variables_initializer()
sess2.run(init2)
#operation nodes
transformedH = tf.nn.relu(tf.matmul(x_,w1) + b1) #hidden layer with rectified linear activation
linear_model = tf.matmul(transformedH, w2) + b2
#see what the predictions are
print(sess2.run(linear_model, {x_: XOR_X}))
Network 3 - Two Layers + Optimization w/random initial param weights
In a real situation there are many model parameters and training examples, so we cannot guess the solution as we did above. Instead, a gradient-based optimization algorithm can find parameters that produce very little error.
Now we solve the same problem using gradient-based optimization to find the parameters. To do so we need to measure the error/loss, which requires the target values as well as the predictions.
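To make the gradient-based step concrete, here is a minimal NumPy version of the same network trained with hand-coded gradients (an illustrative sketch; the TensorFlow cell below does the same thing with automatic differentiation). The initial values are an illustrative choice near the book's solution so the run converges reliably.

```python
import numpy as np

rng = np.random.RandomState(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0.], [1.], [1.], [0.]])

# initialize near the book's solution (illustrative choice)
W1 = np.ones((2, 2)) + 0.1 * rng.randn(2, 2)
c1 = np.array([[0., -1.]]) + 0.1 * rng.randn(1, 2)
w2 = np.array([[1.], [-2.]]) + 0.1 * rng.randn(2, 1)
b2 = np.zeros((1, 1))

lr = 0.01
for step in range(5000):
    # forward pass
    z = X @ W1 + c1
    h = np.maximum(0, z)              # ReLU
    pred = h @ w2 + b2
    loss = np.sum((pred - y) ** 2)    # sum of squared errors
    # backward pass
    dpred = 2 * (pred - y)
    dw2 = h.T @ dpred
    db2 = dpred.sum(axis=0, keepdims=True)
    dh = dpred @ w2.T
    dz = dh * (z > 0)                 # ReLU gradient
    dW1 = X.T @ dz
    dc1 = dz.sum(axis=0, keepdims=True)
    # gradient descent update
    W1 -= lr * dW1; c1 -= lr * dc1
    w2 -= lr * dw2; b2 -= lr * db2

print("final loss:", loss)
```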
In [4]:
##Network 3##
sess3 = tf.Session()
#input data
XOR_X = [[0,0],[0,1],[1,0],[1,1]] #input
XOR_Y = [[0],[1],[1],[0]] #target outputs
#placeholders, now we need one for the targets too
x_ = tf.placeholder(tf.float32, shape=[4,2], name="x-input")
y_ = tf.placeholder(tf.float32, shape=[4,1], name="y-input")
#now we will define with some random values as starting points
w1 = tf.Variable(tf.random_uniform([2,2], -2, 2), dtype=tf.float32) #W
w2 = tf.Variable(tf.random_uniform([2,1], -2, 2), dtype=tf.float32) #w
b1 = tf.Variable(tf.zeros([2]), dtype=tf.float32) #c
b2 = tf.Variable(tf.zeros([1]), dtype=tf.float32) #b
#operation nodes
transformedH = tf.nn.relu(tf.matmul(x_,w1) + b1) #hidden layer with rect. linear act. func.
linear_model = tf.matmul(transformedH, w2) + b2
#MSE
loss = tf.reduce_sum(tf.square(linear_model - y_)) #sum of squared errors over the four points
#gradient descent
optimizer = tf.train.GradientDescentOptimizer(0.01) #0.01 is learning rate
train = optimizer.minimize(loss) #feed optimizer loss function
init3 = tf.global_variables_initializer()
sess3.run(init3)
#train it
for i in range(10000):
    sess3.run(train, {x_: XOR_X, y_: XOR_Y})
#take a look at the results
predictions = sess3.run(linear_model, {x_: XOR_X})
curr_w1, curr_w2, curr_b1, curr_b2, curr_loss = sess3.run([w1, w2, b1, b2, loss], {x_: XOR_X, y_: XOR_Y})
hidlay = sess3.run(transformedH, {x_: XOR_X, y_: XOR_Y})
print("predictions:\n %s\n hlayout:\n %s\n"%(predictions,hidlay))
print("w1:\n %s \nw2:\n %s \nb1: %s \nb2: %s \nloss: %s"%(curr_w1, curr_w2, curr_b1, curr_b2, curr_loss))
Network 4 - Two Layers + Optimization w/non-random initial param weights
Using the approach above we will often find a different solution, because the minimum found depends on the random initial weights (if we reran sess3 enough times, it would occasionally find a solution similar to the earlier examples). If we set the initial weights close to the values given in the book's example, we consistently get the same result.
In [5]:
##Network 4##
sess4 = tf.Session()
#input data
XOR_X = [[0,0],[0,1],[1,0],[1,1]] #input
XOR_Y = [[0],[1],[1],[0]] #target outputs
#placeholders, now we need one for the targets too
x_ = tf.placeholder(tf.float32, shape=[4,2], name="x-input")
y_ = tf.placeholder(tf.float32, shape=[4,1], name="y-input")
#constrain rand. values
w1 = tf.Variable(tf.random_uniform([2,2], .7, 1.3), dtype=tf.float32) #W
w2 = tf.Variable(tf.random_uniform([2,1], -2, 1), dtype=tf.float32) #w
b1 = tf.Variable(tf.zeros([2]), dtype=tf.float32) #c
b2 = tf.Variable(tf.zeros([1]), dtype=tf.float32) #b
#operation nodes
transformedH = tf.nn.relu(tf.matmul(x_,w1) + b1) #hidden layer with rect. linear act. func.
linear_model = tf.matmul(transformedH, w2) + b2
#MSE
loss = tf.reduce_sum(tf.square(linear_model - y_)) #sum of squared errors over the four points
#gradient descent
optimizer = tf.train.GradientDescentOptimizer(0.01) #0.01 is learning rate
train = optimizer.minimize(loss) #feed optimizer loss function
init4 = tf.global_variables_initializer()
sess4.run(init4)
#train it
for i in range(10000):
    sess4.run(train, {x_: XOR_X, y_: XOR_Y})
#take a look at the results
predictions = sess4.run(linear_model, {x_: XOR_X})
curr_w1, curr_w2, curr_b1, curr_b2, curr_loss = sess4.run([w1, w2, b1, b2, loss], {x_: XOR_X, y_: XOR_Y})
hidlay = sess4.run(transformedH, {x_: XOR_X, y_: XOR_Y})
print("predictions:\n %s\n hlayout:\n %s\n"%(predictions,hidlay))
print("w1:\n %s \nw2:\n %s \nb1: %s \nb2: %s \nloss: %s"%(curr_w1, curr_w2, curr_b1, curr_b2, curr_loss))
Is there a better way to save out diagnostic info?
The network below includes several summary ops and saves the graph so we can take a closer look in TensorBoard. Once the network has run and event files have been generated in your log directory, enter the command below in a terminal to launch TensorBoard:
tensorboard --logdir=/path/to/logdir
You should see a message like the following (open the link in your web browser):
Starting TensorBoard 41 on port 6006 (You can navigate to http://168.150.16.155:6006)
In [6]:
#Clear the default graph stack and reset the global default graph.
tf.reset_default_graph()
import time
log_dir = "./tensorBoardFiles/NSC211_BKlecture_code" #set this to your own log directory
start_time = time.time() #record the start time
#input data
XOR_X = [[0,0],[0,1],[1,0],[1,1]] #input
XOR_Y = [[0],[1],[1],[0]] #target outputs
#placeholders, now we need one for the targets too
x_ = tf.placeholder(tf.float32, shape=[4,2], name="x-input")
y_ = tf.placeholder(tf.float32, shape=[4,1], name="y-input")
#constrain rand. values
w1 = tf.Variable(tf.random_uniform([2,2], .7, 1.3), dtype=tf.float32, name="L1_weights") #W
w2 = tf.Variable(tf.random_uniform([2,1], -2, 1), dtype=tf.float32, name="L2_weights") #w
b1 = tf.Variable(tf.zeros([2]), dtype=tf.float32, name="L1_biases") #c
b2 = tf.Variable(tf.zeros([1]), dtype=tf.float32, name="L2_biases") #b
#add summary histograms
tf.summary.histogram('layer1_weights',w1)
tf.summary.histogram('layer2_weights',w2)
tf.summary.histogram('layer1_biases',b1)
tf.summary.histogram('layer2_biases',b2)
#operation nodes
transformedH = tf.nn.relu(tf.matmul(x_,w1) + b1) #hidden layer with rect. linear act. func.
tf.summary.histogram('transformed_output',transformedH)
linear_model = tf.matmul(transformedH, w2) + b2
tf.summary.histogram("predicted", linear_model)
#MSE
loss = tf.reduce_sum(tf.square(linear_model - y_)) #sum of squared errors over the four points
tf.summary.scalar("curr_loss", loss)
#gradient descent
optimizer = tf.train.GradientDescentOptimizer(0.01) #0.01 is learning rate
train = optimizer.minimize(loss) #feed optimizer loss function
# Build the summary Tensor based on the TF collection of summaries.
summary = tf.summary.merge_all()
init = tf.global_variables_initializer()
sess = tf.Session()
# Instantiate a SummaryWriter to output summaries and the Graph.
summary_writer = tf.summary.FileWriter(log_dir, sess.graph)
sess.run(init)
#train it
for i in range(10000):
    fdict = {x_: XOR_X, y_: XOR_Y}
    #run the session and get loss and summary info
    _, curr_loss, suminfo = sess.run([train, loss, summary], feed_dict=fdict)
    duration = time.time() - start_time
    # Write the summaries and print an overview every 100 trials.
    if i % 100 == 0:
        # Print status to stdout.
        print("Step %d: loss = %.2f (%.3f sec)" % (i, curr_loss, duration))
        # Update the events file.
        summary_writer.add_summary(suminfo, i)
        summary_writer.flush()
#take a look at the results
predictions = sess.run(linear_model, {x_: XOR_X})
curr_w1, curr_w2, curr_b1, curr_b2, curr_loss = sess.run([w1, w2, b1, b2, loss], {x_: XOR_X, y_: XOR_Y})
hidlay = sess.run(transformedH, {x_: XOR_X, y_: XOR_Y})
print("predictions:\n %s\n hlayout:\n %s\n"%(predictions,hidlay))
print("w1:\n %s \nw2:\n %s \nb1: %s \nb2: %s \nloss: %s"%(curr_w1, curr_w2, curr_b1, curr_b2, curr_loss))