TensorFlow Fold Quick Start

TensorFlow Fold is a library for turning complicated Python data structures into TensorFlow Tensors.


In [1]:
# boilerplate
import random
import tensorflow as tf
sess = tf.InteractiveSession()
import tensorflow_fold as td

The basic elements of Fold are blocks. We'll start with some blocks that work on simple data types.


In [2]:
scalar_block = td.Scalar()
vector3_block = td.Vector(3)

Blocks are functions with associated input and output types.


In [3]:
def block_info(block):
    print("%s: %s -> %s" % (block, block.input_type, block.output_type))
    
block_info(scalar_block)
block_info(vector3_block)


<td.Scalar dtype='float32'>: PyObjectType() -> TensorType((), 'float32')
<td.Vector dtype='float32' size=3>: PyObjectType() -> TensorType((3,), 'float32')

We can use eval() to see what a block does with its input:


In [4]:
scalar_block.eval(42)


Out[4]:
array(42.0, dtype=float32)

In [5]:
vector3_block.eval([1,2,3])


Out[5]:
array([ 1.,  2.,  3.], dtype=float32)

Not very exciting. We can compose simple blocks together with Record, like so:


In [6]:
record_block = td.Record({'foo': scalar_block, 'bar': vector3_block})
block_info(record_block)


<td.Record ordered=False>: PyObjectType() -> TupleType(TensorType((3,), 'float32'), TensorType((), 'float32'))

We can see that Fold's type system is a bit richer than vanilla TF; we have tuple types! Running a record block does what you'd expect:


In [7]:
record_block.eval({'foo': 1, 'bar': [5, 7, 9]})


Out[7]:
(array([ 5.,  7.,  9.], dtype=float32), array(1.0, dtype=float32))

One useful thing you can do with blocks is wire them up into pipelines using the >> operator, which performs function composition. For example, we can take the pair of tensors produced by our record block and concatenate them into a single vector with Concat, like so:


In [8]:
record2vec_block = record_block >> td.Concat()
record2vec_block.eval({'foo': 1, 'bar': [5, 7, 9]})


Out[8]:
array([ 5.,  7.,  9.,  1.], dtype=float32)

Note that because Python dicts are unordered, Fold always sorts the outputs of a record block by dictionary key. If you want to preserve order you can construct a Record block from an OrderedDict.
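
For instance, a Record built from an OrderedDict keeps its keys in insertion order. This is a minimal sketch, not one of the original cells; the expected result in the comment follows from the ordering rule just described:

import collections
ordered_record_block = td.Record(collections.OrderedDict(
    [('foo', td.Scalar()), ('bar', td.Vector(3))]))
(ordered_record_block >> td.Concat()).eval({'foo': 1, 'bar': [5, 7, 9]})
# expected: array([ 1.,  5.,  7.,  9.], dtype=float32); 'foo' now comes before 'bar'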

The whole point of Fold is to get your data into TensorFlow; the Function block lets you convert a TITO (Tensors In, Tensors Out) function to a block:


In [9]:
negative_block = record2vec_block >> td.Function(tf.negative)
negative_block.eval({'foo': 1, 'bar': [5, 7, 9]})


Out[9]:
array([-5., -7., -9., -1.], dtype=float32)

This is all very cute, but where's the beef? Things start to get interesting when our inputs contain sequences of indeterminate length. The Map block comes in handy here:


In [10]:
map_scalars_block = td.Map(td.Scalar())

There's no TF type for sequences of indeterminate length, but Fold has one:


In [11]:
block_info(map_scalars_block)


<td.Map element_block=<td.Scalar dtype='float32'>>: None -> SequenceType(TensorType((), 'float32'))
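
Evaluating a Map block on a list applies the element block to every item and returns a Python list of results. A quick check, not one of the original cells; the output comment is what the Scalar examples above suggest:

map_scalars_block.eval([1, 2, 3])
# expected: [array(1.0, dtype=float32), array(2.0, dtype=float32), array(3.0, dtype=float32)]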

Right, but you've done the TF RNN Tutorial and even poked at seq-to-seq. You're a wizard with dynamic RNNs. What does Fold offer?

Well, how about jagged arrays?


In [12]:
jagged_block = td.Map(td.Map(td.Scalar()))
block_info(jagged_block)


<td.Map element_block=<td.Map element_block=<td.Scalar dtype='float32'>>>: None -> SequenceType(SequenceType(TensorType((), 'float32')))

The Fold type system is fully compositional; any block you can create can be composed with Map to create a sequence, or Record to create a tuple, or both to create sequences of tuples or tuples of sequences:


In [13]:
seq_of_tuples_block = td.Map(td.Record({'foo': td.Scalar(), 'bar': td.Scalar()}))
seq_of_tuples_block.eval([{'foo': 1, 'bar': 2}, {'foo': 3, 'bar': 4}])


Out[13]:
[(array(2.0, dtype=float32), array(1.0, dtype=float32)),
 (array(4.0, dtype=float32), array(3.0, dtype=float32))]

In [14]:
tuple_of_seqs_block = td.Record({'foo': td.Map(td.Scalar()), 'bar': td.Map(td.Scalar())})
tuple_of_seqs_block.eval({'foo': range(3), 'bar': range(7)})


Out[14]:
([array(0.0, dtype=float32),
  array(1.0, dtype=float32),
  array(2.0, dtype=float32),
  array(3.0, dtype=float32),
  array(4.0, dtype=float32),
  array(5.0, dtype=float32),
  array(6.0, dtype=float32)],
 [array(0.0, dtype=float32),
  array(1.0, dtype=float32),
  array(2.0, dtype=float32)])

Most of the time, you'll eventually want to get one or more tensors out of your sequence, for wiring up to your particular learning task. Fold has a bunch of built-in reduction functions for this that do more or less what you'd expect:


In [15]:
((td.Map(td.Scalar()) >> td.Sum()).eval(range(10)),
 (td.Map(td.Scalar()) >> td.Min()).eval(range(10)),
 (td.Map(td.Scalar()) >> td.Max()).eval(range(10)))


Out[15]:
(array(45.0, dtype=float32),
 array(0.0, dtype=float32),
 array(9.0, dtype=float32))

The general form of such functions is Reduce:


In [16]:
(td.Map(td.Scalar()) >> td.Reduce(td.Function(tf.multiply))).eval(range(1,10))


Out[16]:
array(362880.0, dtype=float32)

If the order of operations is important, you should use Fold instead of Reduce (but if you can use Reduce you should, because it will be faster):


In [17]:
((td.Map(td.Scalar()) >> td.Fold(td.Function(tf.divide), tf.ones([]))).eval(range(1,5)),
 (td.Map(td.Scalar()) >> td.Reduce(td.Function(tf.divide), tf.ones([]))).eval(range(1,5)))  # bad, not associative!


Out[17]:
(array(0.0416666679084301, dtype=float32),
 array(0.6666666865348816, dtype=float32))

Now, let's do some learning! This is the part where "magic" happens; if you want a deeper understanding of what's happening here, you might want to jump right to our more formal blocks tutorial, or learn more about running blocks in TensorFlow.


In [18]:
def reduce_net_block():
    net_block = td.Concat() >> td.FC(20) >> td.FC(1, activation=None) >> td.Function(lambda xs: tf.squeeze(xs, axis=1))
    return td.Map(td.Scalar()) >> td.Reduce(net_block)

The reduce_net_block function creates a block (net_block) that contains a two-layer fully connected (FC) network that takes a pair of scalar tensors as input and produces a scalar tensor as output. This network gets applied in a binary tree to reduce a sequence of scalar tensors to a single scalar tensor.

One thing to notice here is that we are calling tf.squeeze with axis=1, even though the Fold output type of td.FC(1, activation=None) (and hence the input type of the enclosing Function block) is a TensorType with shape (1,). This is because all Fold blocks actually run on TF tensors with an implicit leading batch dimension, which enables execution via dynamic batching. It is important to bear this in mind when creating Function blocks that wrap functions that are not applied elementwise.
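
To make that implicit batch dimension concrete, here is a plain-TF sketch (not one of the original cells) of the shapes the wrapped function actually sees:

# The wrapped op receives a batch of rows of Fold type TensorType((1,)), i.e. a
# tensor of shape (batch_size, 1); squeezing axis=1 leaves the batch dimension
# intact and yields shape (batch_size,).
xs = tf.placeholder(tf.float32, shape=[None, 1])
print(tf.squeeze(xs, axis=1).get_shape())  # prints (?,)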


In [19]:
def random_example(fn):
    length = random.randrange(1, 10)
    data = [random.uniform(0,1) for _ in range(length)]
    result = fn(data)
    return data, result

The random_example function generates training data consisting of (example, fn(example)) pairs, where example is a random list of numbers, e.g.:


In [20]:
random_example(sum)


Out[20]:
([0.787305870095568,
  0.22965378372211998,
  0.37373230100201726,
  0.5763790512875622,
  0.8213490322728823,
  0.8670031890415114],
 3.655423227421661)

In [21]:
random_example(min)


Out[21]:
([0.6092255329819952, 0.3401567642529808, 0.20512903038956665],
 0.20512903038956665)

In [22]:
def train(fn, batch_size=100):
    net_block = reduce_net_block()
    # Compile the (input, label) block pair into a single static TF graph.
    compiler = td.Compiler.create((net_block, td.Scalar()))
    y, y_ = compiler.output_tensors
    loss = tf.nn.l2_loss(y - y_)
    train_op = tf.train.AdamOptimizer().minimize(loss)
    sess.run(tf.global_variables_initializer())
    # A fixed held-out set of examples for monitoring the loss during training.
    validation_fd = compiler.build_feed_dict(random_example(fn) for _ in range(1000))
    for i in range(2000):
        sess.run(train_op, compiler.build_feed_dict(random_example(fn) for _ in range(batch_size)))
        if i % 100 == 0:
            print(i, sess.run(loss, validation_fd))
    return net_block

Now we're going to train a neural network to approximate a reduction function of our choosing. Calling eval() repeatedly is super-slow and cannot exploit batch-wise parallelism, so we create a Compiler. See our page on running blocks in TensorFlow for more on Compilers and how to use them effectively.


In [23]:
sum_block = train(sum)


/usr/local/google/home/madscience/nuke/v3/local/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py:91: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
(0, 3709.2959)
(100, 117.03122)
(200, 75.517761)
(300, 39.155235)
(400, 10.953562)
(500, 4.590332)
(600, 2.8660746)
(700, 2.0546255)
(800, 1.573489)
(900, 1.2537044)
(1000, 1.0065227)
(1100, 0.82658422)
(1200, 0.67432761)
(1300, 0.55223799)
(1400, 0.46296757)
(1500, 0.38158983)
(1600, 0.316338)
(1700, 0.26881805)
(1800, 0.22481206)
(1900, 0.20074199)

In [24]:
sum_block.eval([1, 1])


Out[24]:
array(2.006655216217041, dtype=float32)

Breaking news: deep neural network learns to calculate 1 + 1!!!!

Of course we've done something a little sneaky here by constructing a model that can only represent associative functions and then training it to compute an associative function. The technical term for being sneaky in machine learning is inductive bias.


In [25]:
min_block = train(min)


(0, 499.09598)
(100, 46.026665)
(200, 25.741219)
(300, 18.191158)
(400, 14.682983)
(500, 12.306305)
(600, 10.402517)
(700, 8.670351)
(800, 6.9115524)
(900, 5.1144924)
(1000, 3.6718786)
(1100, 2.6184769)
(1200, 2.0114093)
(1300, 1.6398822)
(1400, 1.3298371)
(1500, 1.0525734)
(1600, 0.77793711)
(1700, 0.55954146)
(1800, 0.40301239)
(1900, 0.2982769)

In [26]:
min_block.eval([2, -1, 4])


Out[26]:
array(-0.6417261958122253, dtype=float32)

Oh noes! What went wrong? Note that we trained our network to compute min on positive numbers; negative numbers are outside of its input distribution.


In [27]:
min_block.eval([0.3, 0.2, 0.9])


Out[27]:
array(0.1865474432706833, dtype=float32)

Well, that's better. What happens if you train the network on negative numbers as well as on positives? What if you only train on short lists and then evaluate the net on long ones? What if you used a Fold block instead of a Reduce? The first experiment only needs a different example generator, as sketched below.
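
A minimal sketch (random_signed_example is a hypothetical helper, not part of the original notebook; train() above would also need to accept the generator as a parameter instead of hard-coding random_example):

def random_signed_example(fn):
    # Like random_example, but samples from [-1, 1) so that negative
    # inputs fall inside the training distribution.
    length = random.randrange(1, 10)
    data = [random.uniform(-1, 1) for _ in range(length)]
    return data, fn(data)

... Happy Folding!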