Tutorial based on 6.1 Example: Learning XOR (page 170 in Deep Learning book)

The XOR function (“exclusive or”) is an operation on two binary values, x1 and x2. When exactly one of these values equals 1, the XOR function returns 1; otherwise it returns 0. For now we are not concerned with statistical generalization: we only want the network to perform correctly on the four points X = {[0,0], [0,1], [1,0], [1,1]}.

To keep the math for this example as simple as possible, we treat it as a regression problem and use a mean squared error (MSE) loss function (there are better approaches for modeling binary data).
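Before building any models, it may help to see the loss itself on the four points. Here is a minimal NumPy sketch (not part of the book's code) of the MSE for an arbitrary vector of predictions y_hat:

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32) #the four XOR inputs
y = np.array([[0], [1], [1], [0]], dtype=np.float32)             #XOR targets

def mse(y_hat):
    #mean squared error between the predictions and the XOR targets
    return np.mean((y_hat - y) ** 2)

print(mse(y))                    # 0.0 for perfect predictions
print(mse(np.full_like(y, 0.5))) # 0.25 for a model that always outputs 0.5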


In [1]:
#load tensorflow
import tensorflow as tf

Network 1 - Single Layer

For the linear model, we can minimize the MSE in closed form with respect to w and b using the normal equations. Solving them yields w = 0 and b = 1/2, so the linear model simply outputs 0.5 everywhere.
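If you want to verify the closed-form result yourself, a minimal NumPy sketch (not from the book's code) appends a column of ones for the bias and lets np.linalg.lstsq solve the normal equations numerically:

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
y = np.array([[0], [1], [1], [0]], dtype=np.float32)

#design matrix with a bias column of ones
X_aug = np.hstack([X, np.ones((4, 1), dtype=np.float32)])

#least-squares fit of X_aug @ [w1, w2, b]^T = y
theta = np.linalg.lstsq(X_aug, y, rcond=None)[0]
print(theta.ravel()) #approximately [0, 0, 0.5]: w = 0, b = 1/2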


In [2]:
##Network 1##
sess1 = tf.Session()
#input data
XOR_X = [[0,0],[0,1],[1,0],[1,1]] #input

#placeholders
x_ = tf.placeholder(tf.float32, shape=[4,2], name="x-input")
#use weights/biases from book solution (page 171)
w = tf.Variable(tf.zeros([2,1]), dtype=tf.float32)
b = tf.Variable([1/2.], dtype=tf.float32)

init = tf.global_variables_initializer()
sess1.run(init)

#operation node
linear_model = tf.matmul(x_,w) + b 

#see what the predictions are
print(sess1.run(linear_model, {x_: XOR_X}))


[[ 0.5]
 [ 0.5]
 [ 0.5]
 [ 0.5]]

Network 2 - Two Layers

Solve the same problem using a model that learns a different feature space in which a linear model can represent the solution: a very simple feedforward network with one hidden layer containing two hidden units, which changes what is given to the output layer.
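To see why the new feature space helps, you can compute the hidden representation by hand with the book's first-layer parameters (a small NumPy check, not from the book's code). In the transformed space the points [0,1] and [1,0] map to the same point, so the output layer only has to fit a line:

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
W = np.ones((2, 2), dtype=np.float32)     #first-layer weights from the book (page 173)
c = np.array([0., -1.], dtype=np.float32) #first-layer biases from the book

h = np.maximum(X.dot(W) + c, 0) #ReLU hidden representation
print(h)
# [[0. 0.]
#  [1. 0.]
#  [1. 0.]
#  [2. 1.]]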


In [3]:
##Network 2##
sess2 = tf.Session()
#input data
XOR_X = [[0,0],[0,1],[1,0],[1,1]] #input

#placeholders
x_ = tf.placeholder(tf.float32, shape=[4,2], name="x-input")
#use weights/biases from book example (page 173)
w1 = tf.Variable(tf.ones([2,2]), dtype=tf.float32) #W
w2 = tf.Variable([[1.],[-2.]], dtype=tf.float32) #w
b1 = tf.Variable([[0.,-1.]], dtype=tf.float32) #c
b2 = tf.Variable(tf.zeros(1), dtype=tf.float32) #b

init2 = tf.global_variables_initializer()
sess2.run(init2)

#operation nodes
transformedH = tf.nn.relu(tf.matmul(x_,w1) + b1, name=None) #hidden layer with rect. linear act. func.
linear_model = tf.matmul(transformedH, w2) + b2

#see what the predictions are
print(sess2.run(linear_model, {x_: XOR_X}))


[[ 0.]
 [ 1.]
 [ 1.]
 [ 0.]]

Network 3 - Two Layers + Optimization w/random initial param weights

In a real situation there are many model parameters and many training examples, so we cannot simply guess the solution as we did above. Instead, a gradient-based optimization algorithm can find parameters that produce very little error.

Now solve the same problem, but use gradient-based optimization to find the parameters. To do that we need a measure of the error (a loss), which in turn requires both the network's predictions and the target values.
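The cells below hand the loss to optimizer.minimize. In TF 1.x, minimize is just compute_gradients followed by apply_gradients; the toy sketch below (a one-variable loss, not part of the XOR example) spells out those two steps so the update rule is visible:

import tensorflow as tf

theta = tf.Variable(0.0)          #one trainable parameter
toy_loss = tf.square(theta - 3.0) #minimized at theta = 3

opt = tf.train.GradientDescentOptimizer(0.1)
grads_and_vars = opt.compute_gradients(toy_loss) #[(dloss/dtheta, theta)]
step = opt.apply_gradients(grads_and_vars)       #theta <- theta - 0.1 * gradient

with tf.Session() as s:
    s.run(tf.global_variables_initializer())
    for _ in range(100):
        s.run(step)
    print(s.run(theta)) #approaches 3.0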


In [4]:
##Network 3##
sess3 = tf.Session()
#input data
XOR_X = [[0,0],[0,1],[1,0],[1,1]] #input
XOR_Y = [[0],[1],[1],[0]] #target outputs

#placeholders, now we need one for the target values too
x_ = tf.placeholder(tf.float32, shape=[4,2], name="x-input")
y_ = tf.placeholder(tf.float32, shape=[4,1], name="y-input") 
#now we initialize the weights with random starting values
w1 = tf.Variable(tf.random_uniform([2,2], -2, 2), dtype=tf.float32) #W
w2 = tf.Variable(tf.random_uniform([2,1], -2, 2), dtype=tf.float32) #w
b1 = tf.Variable(tf.zeros([2]), dtype=tf.float32) #c
b2 = tf.Variable(tf.zeros([1]), dtype=tf.float32) #b

#operation nodes
transformedH = tf.nn.relu(tf.matmul(x_,w1) + b1) #hidden layer with rect. linear act. func.
linear_model = tf.matmul(transformedH, w2) + b2

#squared-error loss
loss = tf.reduce_sum(tf.square(linear_model - y_)) #sum of the squared errors over the four points (same minimizer as the MSE)

#gradient descent 
optimizer = tf.train.GradientDescentOptimizer(0.01) #0.01 is learning rate
train = optimizer.minimize(loss) #feed optimizer loss function 

init3 = tf.global_variables_initializer()
sess3.run(init3)

#train it
for i in range(10000):
        sess3.run(train, {x_: XOR_X, y_: XOR_Y})

#take a look at the results
predictions = sess3.run(linear_model, {x_: XOR_X}) 
curr_w1, curr_w2, curr_b1, curr_b2, curr_loss  = sess3.run([w1, w2, b1, b2, loss], {x_: XOR_X, y_: XOR_Y})
hidlay  = sess3.run(transformedH, {x_: XOR_X, y_: XOR_Y})
print("predictions:\n %s\n hlayout:\n %s\n"%(predictions,hidlay))
print("w1:\n %s \nw2:\n %s \nb1: %s \nb2: %s \nloss: %s"%(curr_w1, curr_w2, curr_b1, curr_b2, curr_loss))


predictions:
 [[ 0.33333355]
 [ 1.        ]
 [ 0.33333355]
 [ 0.33333355]]
 hlayout:
 [[ 0.         0.       ]
 [ 0.         1.0122894]
 [ 0.         0.       ]
 [ 0.         0.       ]]

w1:
 [[ 0.09127363 -1.01660681]
 [-0.91181755  1.01512766]] 
w2:
 [[-1.87455237]
 [ 0.65857297]] 
b1: [-0.13342872 -0.00283831] 
b2: [ 0.33333355] 
loss: 0.666667

Network 4 - Two Layers + Optimization w/constrained initial param weights

Using the approach above we will often find a different solution each run, because the minimum found depends on the random initial weights (if you re-run sess3 enough times, it will occasionally land on a solution similar to the previous two examples). If we initialize the weights closer to the values provided in the book example, we consistently get the same results.
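If you just want run-to-run reproducibility rather than a particular basin, another option (an aside, not used in the book) is to fix TensorFlow's graph-level random seed before the variables are created, so the "random" starting weights are identical every run:

#optional: make the random initial weights reproducible across runs
tf.set_random_seed(42) #any fixed integer; call this before defining the variables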


In [5]:
##Network 4##
sess4 = tf.Session()
#input data
XOR_X = [[0,0],[0,1],[1,0],[1,1]] #input
XOR_Y = [[0],[1],[1],[0]] #target outputs

#placeholders, now we need one for the target values too
x_ = tf.placeholder(tf.float32, shape=[4,2], name="x-input")
y_ = tf.placeholder(tf.float32, shape=[4,1], name="y-input") 
#constrain the range of the random initial values
w1 = tf.Variable(tf.random_uniform([2,2], .7, 1.3), dtype=tf.float32) #W
w2 = tf.Variable(tf.random_uniform([2,1], -2, 1), dtype=tf.float32) #w
b1 = tf.Variable(tf.zeros([2]), dtype=tf.float32) #c
b2 = tf.Variable(tf.zeros([1]), dtype=tf.float32) #b

#operation nodes
transformedH = tf.nn.relu(tf.matmul(x_,w1) + b1) #hidden layer with rect. linear act. func.
linear_model = tf.matmul(transformedH, w2) + b2

#squared-error loss
loss = tf.reduce_sum(tf.square(linear_model - y_)) #sum of the squared errors over the four points (same minimizer as the MSE)

#gradient descent 
optimizer = tf.train.GradientDescentOptimizer(0.01) #0.01 is learning rate
train = optimizer.minimize(loss) #feed optimizer loss function 

init4 = tf.global_variables_initializer()
sess4.run(init4)

#train it
for i in range(10000):
        sess4.run(train, {x_: XOR_X, y_: XOR_Y})

#take a look at the results
predictions = sess4.run(linear_model, {x_: XOR_X}) 
curr_w1, curr_w2, curr_b1, curr_b2, curr_loss  = sess4.run([w1, w2, b1, b2, loss], {x_: XOR_X, y_: XOR_Y})
hidlay  = sess4.run(transformedH, {x_: XOR_X, y_: XOR_Y})
print("predictions:\n %s\n hlayout:\n %s\n"%(predictions,hidlay))
print("w1:\n %s \nw2:\n %s \nb1: %s \nb2: %s \nloss: %s"%(curr_w1, curr_w2, curr_b1, curr_b2, curr_loss))


predictions:
 [[  2.80746485e-06]
 [  9.99997854e-01]
 [  9.99997854e-01]
 [  1.48406957e-06]]
 hlayout:
 [[  1.32329925e-08   0.00000000e+00]
 [  1.09424460e+00   0.00000000e+00]
 [  1.09424460e+00   0.00000000e+00]
 [  2.18848920e+00   1.14814603e+00]]

w1:
 [[ 1.0942446   1.14814603]
 [ 1.0942446   1.14814603]] 
w2:
 [[ 0.91386795]
 [-1.7419312 ]] 
b1: [  1.32329925e-08  -1.14814603e+00] 
b2: [  2.79537176e-06] 
loss: 1.9293e-11

Is there a better way to save out diagnostic info?

The network below includes several summary ops and saves the graph so we can take a closer look in TensorBoard. Once the network has run and event files have been generated in your log directory, enter the command below in a terminal to launch TensorBoard:

tensorboard --logdir=/path/to/logdir

You should get a message that looks like this (paste the link into your web browser):

Starting TensorBoard 41 on port 6006 (You can navigate to http://168.150.16.155:6006)


In [6]:
#Clear the default graph stack and reset the global default graph.
tf.reset_default_graph()

import time

log_dir = "/Users/bmk/Google Drive/desktopBackup/PSC211_S17/tensorBoardFiles/NSC211_BKlecture_code" #path on the author's machine; point this at a directory you can write to
start_time = time.time() #record the start time so we can report elapsed time later

#input data
XOR_X = [[0,0],[0,1],[1,0],[1,1]] #input
XOR_Y = [[0],[1],[1],[0]] #target outputs

#placeholders, now we need one for the target values too
x_ = tf.placeholder(tf.float32, shape=[4,2], name="x-input")
y_ = tf.placeholder(tf.float32, shape=[4,1], name="y-input") 
#constrain the range of the random initial values
w1 = tf.Variable(tf.random_uniform([2,2], .7, 1.3), dtype=tf.float32, name="L1_weights") #W
w2 = tf.Variable(tf.random_uniform([2,1], -2, 1), dtype=tf.float32, name="L2_weights") #w
b1 = tf.Variable(tf.zeros([2]), dtype=tf.float32, name="L1_biases") #c
b2 = tf.Variable(tf.zeros([1]), dtype=tf.float32, name="L2_biases") #b

#add summary histograms
tf.summary.histogram('layer1_weights',w1)
tf.summary.histogram('layer2_weights',w2)
tf.summary.histogram('layer1_biases',b1)
tf.summary.histogram('layer2_biases',b2)

#operation nodes
transformedH = tf.nn.relu(tf.matmul(x_,w1) + b1) #hidden layer with rect. linear act. func.
tf.summary.histogram('transformed_output',transformedH)

linear_model = tf.matmul(transformedH, w2) + b2
tf.summary.histogram("predicted", linear_model)

#squared-error loss
loss = tf.reduce_sum(tf.square(linear_model - y_)) #sum of the squared errors over the four points (same minimizer as the MSE)
tf.summary.scalar("curr_loss", loss)

#gradient descent 
optimizer = tf.train.GradientDescentOptimizer(0.01) #0.01 is learning rate
train = optimizer.minimize(loss) #feed optimizer loss function 

# Build the summary Tensor based on the TF collection of summaries.
summary = tf.summary.merge_all()

init = tf.global_variables_initializer()
sess = tf.Session()

# Instantiate a SummaryWriter to output summaries and the Graph.
summary_writer = tf.summary.FileWriter(log_dir, sess.graph)

sess.run(init)

#train it
for i in range(10000):
        fdict = {x_: XOR_X, y_: XOR_Y}
        #run the session and get loss and summary info
        _, curr_loss, suminfo = sess.run([train, loss, summary], feed_dict=fdict)
        duration = time.time() - start_time
        # Write the summaries and print an overview every 100 trials.
        if i % 100 == 0:
            # Print status to stdout.
            print("Step %d: loss = %.2f (%.3f sec)" % (i, curr_loss, duration))
            # Update the events file.
            summary_writer.add_summary(suminfo, i)
            summary_writer.flush()

#take a look at the results
predictions = sess.run(linear_model, {x_: XOR_X}) 
curr_w1, curr_w2, curr_b1, curr_b2, curr_loss  = sess.run([w1, w2, b1, b2, loss], {x_: XOR_X, y_: XOR_Y})
hidlay  = sess.run(transformedH, {x_: XOR_X, y_: XOR_Y})
print("predictions:\n %s\n hlayout:\n %s\n"%(predictions,hidlay))
print("w1:\n %s \nw2:\n %s \nb1: %s \nb2: %s \nloss: %s"%(curr_w1, curr_w2, curr_b1, curr_b2, curr_loss))


---------------------------------------------------------------------------
PermissionDeniedError                     Traceback (most recent call last)
<ipython-input-6-ed3c7e23db3b> in <module>()
     48 
     49 # Instantiate a SummaryWriter to output summaries and the Graph.
---> 50 summary_writer = tf.summary.FileWriter(log_dir, sess.graph)
     51 
     52 sess.run(init)

/Users/jdstokes/anaconda/envs/datasci/lib/python2.7/site-packages/tensorflow/python/summary/writer/writer.pyc in __init__(self, logdir, graph, max_queue, flush_secs, graph_def)
    306       graph_def: DEPRECATED: Use the `graph` argument instead.
    307     """
--> 308     event_writer = EventFileWriter(logdir, max_queue, flush_secs)
    309     super(FileWriter, self).__init__(event_writer, graph, graph_def)
    310 

/Users/jdstokes/anaconda/envs/datasci/lib/python2.7/site-packages/tensorflow/python/summary/writer/event_file_writer.pyc in __init__(self, logdir, max_queue, flush_secs)
     61     self._logdir = logdir
     62     if not gfile.IsDirectory(self._logdir):
---> 63       gfile.MakeDirs(self._logdir)
     64     self._event_queue = six.moves.queue.Queue(max_queue)
     65     self._ev_writer = pywrap_tensorflow.EventsWriter(

/Users/jdstokes/anaconda/envs/datasci/lib/python2.7/site-packages/tensorflow/python/lib/io/file_io.pyc in recursive_create_dir(dirname)
    312   """
    313   with errors.raise_exception_on_not_ok_status() as status:
--> 314     pywrap_tensorflow.RecursivelyCreateDir(compat.as_bytes(dirname), status)
    315 
    316 

/Users/jdstokes/anaconda/envs/datasci/lib/python2.7/contextlib.pyc in __exit__(self, type, value, traceback)
     22         if type is None:
     23             try:
---> 24                 self.gen.next()
     25             except StopIteration:
     26                 return

/Users/jdstokes/anaconda/envs/datasci/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.pyc in raise_exception_on_not_ok_status()
    464           None, None,
    465           compat.as_text(pywrap_tensorflow.TF_Message(status)),
--> 466           pywrap_tensorflow.TF_GetCode(status))
    467   finally:
    468     pywrap_tensorflow.TF_DeleteStatus(status)

PermissionDeniedError: /Users/bmk
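The traceback above is a filesystem problem rather than a TensorFlow problem: the hard-coded log_dir lives under /Users/bmk, which the account running this notebook cannot write to, so the FileWriter fails when it tries to create the directory. A minimal fix (a hedged sketch; any directory you own will do, the path below is just an example) is to point log_dir at a writable location before constructing the FileWriter:

import os

#write the event files somewhere the current user can create directories
log_dir = os.path.join(os.getcwd(), "tensorBoardFiles", "xor_demo") #hypothetical local path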

In [ ]: