In [1]:
import tf_einsum_opt
import tensorflow as tf
import numpy as np

In [2]:
sess = tf.Session()

Small-scale example


In [3]:
def func(a, b, c):
    res = tf.einsum('ijk,ja,kb->iab', a, b, c) + 1
    res = tf.einsum('iab,kb->iak', res, c)
    return res
a = tf.random_normal((10, 11, 12))
b = tf.random_normal((11, 13))
c = tf.random_normal((12, 14))
# res = func(a, b, c)
orders, optimized_func = tf_einsum_opt.optimizer(func, sess, a, b, c)


Found 2 einsums.
The running time of the whole function is 0.000356 s
Einsums constitue 153.0 % of the running time (0.000544 s).
Optimizing einsum in <ipython-input-3-8750a51928fb>:2, it constitues 84.3% of the overall running time (0.000300 s).
By changing the order of einsum in "<ipython-input-3-8750a51928fb>:2" to [0 2 1] you program will run 14.7 % faster.
Optimizing einsum in <ipython-input-3-8750a51928fb>:3, it constitues 68.7% of the overall running time (0.000244 s).
Einsum improvements haven't found, good work!
The overall predicted savings from all the recommendations are 14.685877%
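To see what the suggested permutation [0 2 1] means, here is a NumPy sketch (illustrative only; the optimizer applies the reordering for you). Permuting the operands of the first einsum from (a, b, c) to (a, c, b), with the subscript string permuted to match, yields a mathematically identical result; only the contraction order, and hence the speed, changes.

```python
import numpy as np

# Same shapes as the TensorFlow example above.
rng = np.random.RandomState(0)
a = rng.randn(10, 11, 12)
b = rng.randn(11, 13)
c = rng.randn(12, 14)

# Original operand order.
orig = np.einsum('ijk,ja,kb->iab', a, b, c)
# Recommended order [0 2 1]: operands (a, c, b), subscripts permuted to match.
perm = np.einsum('ijk,kb,ja->iab', a, c, b)

# The two contractions are mathematically identical.
assert np.allclose(orig, perm)
```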

In [8]:
res1 = func(a, b, c)
%timeit sess.run(res1)


The slowest run took 60.63 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 369 µs per loop

In [9]:
res2 = optimized_func(a, b, c)
%timeit sess.run(res2)


The slowest run took 53.60 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 332 µs per loop

In [12]:
# Check that the results of optimized and the original function are the same.
np.testing.assert_allclose(*sess.run([res1, res2]), rtol=1e-5, atol=1e-5)

Example with more savings, but slower to optimize


In [13]:
def func(a, b, c, d):
    res = tf.einsum('si,sj,sk,ij->s', a, b, d, c)
    res += tf.einsum('s,si->s', res, a)
    return res
a = tf.random_normal((100, 101))
b = tf.random_normal((100, 102))
c = tf.random_normal((101, 102))
d = tf.random_normal((100, 30))
orders, optimized_func = tf_einsum_opt.optimizer(func, sess, a, b, c, d)


Found 2 einsums.
The running time of the whole function is 1.398991 s
Einsums constitue 109.4 % of the running time (1.530153 s).
Optimizing einsum in <ipython-input-13-1748bfc6b08e>:2, it constitues 109.3% of the overall running time (1.529651 s).
By changing the order of einsum in "<ipython-input-13-1748bfc6b08e>:2" to [0 3 1 2] you program will run 109.3 % faster.
The rest of einsums are using < 10% of the overall running time each, we will not gain much by optimizing them.
The overall predicted savings from all the recommendations are 109.290959%

In [5]:
res1 = func(a, b, c, d)
%timeit sess.run(res1)


1 loop, best of 3: 1.34 s per loop

In [6]:
res2 = optimized_func(a, b, c, d)
%timeit sess.run(res2)


The slowest run took 28.74 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 767 µs per loop

Look at the recommendations:


In [14]:
orders


Out[14]:
{'<ipython-input-13-1748bfc6b08e>:2': array([0, 3, 1, 2])}

This means: in file <ipython-input-13-1748bfc6b08e>, line 2, change the order of the einsum arguments using the permutation [0, 3, 1, 2], i.e. from tf.einsum('si,sj,sk,ij->s', a, b, d, c) to tf.einsum('si,ij,sj,sk->s', a, c, b, d).
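Applying the recommendation by hand can be sketched in NumPy (illustrative only; variable names mirror the example above). The permutation [0, 3, 1, 2] reorders the operands (a, b, d, c) to (a, c, b, d), with the subscript string permuted to match; the result is unchanged, only the contraction order differs.

```python
import numpy as np

# Same shapes as the TensorFlow example above.
rng = np.random.RandomState(0)
a = rng.randn(100, 101)
b = rng.randn(100, 102)
c = rng.randn(101, 102)
d = rng.randn(100, 30)

# Original operand order, as written in func.
original = np.einsum('si,sj,sk,ij->s', a, b, d, c)
# Recommended order [0, 3, 1, 2]: operands (a, c, b, d).
reordered = np.einsum('si,ij,sj,sk->s', a, c, b, d)

# Identical up to floating-point round-off.
assert np.allclose(original, reordered)
```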