Einsum Notebook - Einstein Summation

Einstein Notation

In mathematics, especially in applications of linear algebra to physics, the Einstein notation or Einstein summation convention is a notational convention that implies summation over a set of indexed terms in a formula, thus achieving notational brevity. It was introduced to physics by Albert Einstein in 1916.

The following $$ y = \sum^3_{i=1} c_i x_i = c_1 x_1 + c_2 x_2 + c_3 x_3 $$ is reduced to $$ y = c_i x_i $$

Inner Product

$$ \mathbf{u} \cdot \mathbf{v} = u_i v_i $$

Matrix Product

$$ \mathbf{C} = \mathbf{A} \mathbf{B} = A_{ij} B_{jk} = C_{ik} $$

Matrix Trace

The trace of a matrix, $tr(A) = \sum_i A_{ii}$, is calculated by $$ tr(A) = A_{ii} $$

NumPy Einsum

numpy.einsum(subscripts, *operands, out=None, dtype=None, order='K', casting='safe')
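Before diving in, it helps to see the two forms the `subscripts` string can take. A minimal sketch (using small `arange` arrays for illustration): in the implicit form the output axes are the unsummed indices in alphabetical order, while the explicit form spells them out after `->`.

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)
B = np.arange(12.0).reshape(3, 4)

# Implicit form: 'j' repeats, so it is summed; the output axes are the
# remaining indices 'i' and 'k' in alphabetical order.
C1 = np.einsum('ij,jk', A, B)

# Explicit form: '->' names the output axes directly (same result here).
C2 = np.einsum('ij,jk->ik', A, B)

assert np.allclose(C1, C2)
assert np.allclose(C1, np.dot(A, B))
```

The explicit form is also what lets you reorder or drop output axes, as the transpose and reduction examples below show.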

In [1]:
import numpy as np

Ok, let's try some stuff out. The example below shows how a dot product is represented using einsum:


In [2]:
u = np.array([1.0, 2.0, 4.0, 3.0])
v = np.array([3.0, 1.0, 1.0, 5.0])


print('Dot using np.dot is', np.dot(u, v))
print('Dot using np.einsum is', np.einsum('i,i', u, v))
print(np.einsum('i', u))  # a single operand with no summed index just returns (a view of) u


Dot using np.dot is 24.0
Dot using np.einsum is 24.0
[ 1.  2.  4.  3.]

In [3]:
A = np.array([[1.0, 2.0, 4.0, 3.0],
              [1.0, 0.0, 2.0, 0.0],
              [1.0, 1.0, 1.1, 0.0],
              [2.0, 2.0, 1.0, 1.0]])

B = np.array([[1.0, 2.0, 4.0, 3.0],
              [0.0, 1.0, 0.0, 0.0],
              [1.0, 1.0, 1.0, 1.0],
              [7.0, 8.0, 9.0, 1.0]])


Cdot = np.dot(A, B)
Cein = np.einsum('jw,wk', A, B)  # the index letters are arbitrary; 'ij,jk' works just as well

assert np.allclose(Cdot, Cein)

print('Trace using np.trace is', np.trace(A))
print('\nTrace using np.einsum is', np.einsum('ii', A))
#print(np.einsum('ij', A))
#print(np.einsum('ji', A))


Trace using np.trace is 3.1

Trace using np.einsum is 3.1
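The commented-out `'ij'` and `'ji'` subscripts above are worth trying. A small sketch of what they do: with a single operand and no repeated index, einsum only relabels axes, so `'ji'` is a transpose; repeating an index sums over it, and an empty output sums everything.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# 'ij' returns A unchanged; 'ji' relabels the axes, i.e. a transpose.
assert np.allclose(np.einsum('ij', A), A)
assert np.allclose(np.einsum('ji', A), A.T)

# A repeated index is summed: 'ii' is the trace, 'ij->' sums every element.
assert np.einsum('ii', A) == np.trace(A)
assert np.einsum('ij->', A) == A.sum()
```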

In [4]:
print('Matrix-vector product using np.dot is\n', np.dot(A, u))
print('\nMatrix-vector product using np.einsum is\n', np.einsum('ij,j', A, u))


Matrix-vector product using np.dot is
 [ 30.    9.    7.4  13. ]

Matrix-vector product using np.einsum is
 [ 30.    9.    7.4  13. ]

In [5]:
print(np.dot(A.T, u))
print(np.einsum('qj,q', A, u))  # summing over the first axis of A gives the A.T product


[ 13.   12.   15.4   6. ]
[ 13.   12.   15.4   6. ]

Broadcasting

Broadcasting uses the `...` (ellipsis) syntax, which stands for any number of unnamed leading axes.
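To see that `...` really matches any number of leading axes, a small sketch with a 3-D stack of matrices (the arrays here are illustrative random data):

```python
import numpy as np

S = np.random.rand(5, 2, 3)   # a stack of five 2x3 matrices
w = np.random.rand(3)

# '...' absorbs the leading (5, 2) axes, so each row of each matrix is
# scaled by w pointwise -- exactly like S * w with ordinary broadcasting.
scaled = np.einsum('...j,j->...j', S, w)
assert np.allclose(scaled, S * w)
```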


In [6]:
# Multiply each row of A by u pointwise
print('A is\n', A)
print('\nU is\n', u)
print('\nA*u is\n', A*u)
print('\nnp.einsum is\n', np.einsum('...j,j->...j', A, u))


A is
 [[ 1.   2.   4.   3. ]
 [ 1.   0.   2.   0. ]
 [ 1.   1.   1.1  0. ]
 [ 2.   2.   1.   1. ]]

U is
 [ 1.  2.  4.  3.]

A*u is
 [[  1.    4.   16.    9. ]
 [  1.    0.    8.    0. ]
 [  1.    2.    4.4   0. ]
 [  2.    4.    4.    3. ]]

np.einsum is
 [[  1.    4.   16.    9. ]
 [  1.    0.    8.    0. ]
 [  1.    2.    4.4   0. ]
 [  2.    4.    4.    3. ]]

In [7]:
print('\nnp.einsum is\n', np.einsum('...j->...', A))


np.einsum is
 [ 10.    3.    3.1   6. ]
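The `'...j->...'` subscripts above summed each row of `A`. Naming the axes explicitly lets you pick either reduction; a quick sketch using the same matrix:

```python
import numpy as np

A = np.array([[1.0, 2.0, 4.0, 3.0],
              [1.0, 0.0, 2.0, 0.0],
              [1.0, 1.0, 1.1, 0.0],
              [2.0, 2.0, 1.0, 1.0]])

assert np.allclose(np.einsum('ij->i', A), A.sum(axis=1))  # row sums
assert np.allclose(np.einsum('ij->j', A), A.sum(axis=0))  # column sums
assert np.isclose(np.einsum('ij->', A), A.sum())          # grand total
```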

Now let's check performance!


In [ ]:
A = np.random.rand(1000,1000)
B = np.random.rand(1000,1000)
b = np.random.rand(1000)

print('Broadcasting')

print('\nA*b')
%timeit A*b

print("\nnp.einsum('...j,j->...j', A, b)")
%timeit np.einsum('...j,j->...j', A, b)

print('\nMatrix Vector Product')

print('\nnp.dot(A, b)')
%timeit np.dot(A, b)

print("\nnp.einsum('...j,j', A, b)")
%timeit np.einsum('...j,j', A, b)

print('\nMatrix Product')

print('\nnp.dot(A, B)')
%timeit np.dot(A, B)

print("\nnp.einsum('ij,jk', A, B)")
%timeit np.einsum('ij,jk', A, B)


Broadcasting

A*b
100 loops, best of 3: 2.95 ms per loop

np.einsum('...j,j->...j', A, b)
100 loops, best of 3: 3.44 ms per loop

Matrix Vector Product

np.dot(A, b)
The slowest run took 12.83 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 183 µs per loop

np.einsum('...j,j', A, b)
1000 loops, best of 3: 433 µs per loop

Matrix Product

np.dot(A, B)
10 loops, best of 3: 35.6 ms per loop

np.einsum('ij,jk', A, B)
1 loop, best of 3: 442 ms per loop

In [ ]:
A_array = np.random.rand(10000, 3, 3) # [A1, A2, A3, ...]
b_array = np.random.rand(10000, 3) # [b1, b2, b3, ...]
x_array = np.zeros((10000, 3))

# np.dot(A_array, b_array) does NOT do this batched product: dot contracts the
# last axis of A_array with the second-to-last axis of b_array, which mismatch here.


def use_dot(x_array):
    for i in range(10000):
        x_array[i] = np.dot(A_array[i], b_array[i])

print('Using dot and python loop')
%timeit use_dot(x_array)

print("\nUsing np.einsum('...jk,...k->...j', A_array, b_array)")
%timeit np.einsum('...jk,...k->...j', A_array, b_array)

use_dot(x_array)
x_einsum_array1 = np.einsum('...jk,...k->...j', A_array, b_array)
assert np.allclose(x_array, x_einsum_array1)


Using dot and python loop
10 loops, best of 3: 19.4 ms per loop

Using np.einsum('...jk,...k->...j', A_array, b_array)

In [ ]:
np.einsum('...ij,...jk,...kl,...l->...i', A_array, A_array, A_array, b_array)

NOTE: If performance and not code style is your primary concern, you might still be better off with dot and the loop (depending on your specific data and system environment). In contrast to einsum, dot can take advantage of BLAS and will often multithread automatically.
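One way to get both the batched convenience and the BLAS-backed speed, at least in NumPy 1.10 and later, is `np.matmul`, which broadcasts over leading "stack" axes. A sketch of the batched matrix-vector product from above (shapes shrunk for brevity):

```python
import numpy as np

A_array = np.random.rand(100, 3, 3)
b_array = np.random.rand(100, 3)

# matmul broadcasts over the leading axis; the trailing np.newaxis turns
# each b_i into a column vector, and [..., 0] drops that axis again.
x_matmul = np.matmul(A_array, b_array[..., np.newaxis])[..., 0]
x_einsum = np.einsum('...jk,...k->...j', A_array, b_array)

assert np.allclose(x_matmul, x_einsum)
```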

Methods in the oven: numpy.core.umath_tests


In [ ]:
#dir(np.core.umath_tests)
#print(np.version.version)
from numpy.core.umath_tests import inner1d
dir(np.core.umath_tests)

In [ ]:
a_array = np.random.rand(1000000, 3) # [a1, a2, a3, ...]
b_array = np.random.rand(1000000, 3) # [b1, b2, b3, ...]

print("np.einsum('...k,...k', a_array, b_array)")
%timeit np.einsum('...k,...k', a_array, b_array)

# inner1d is a generalized ufunc -- more obscure, but faster here
print('\nnp.core.umath_tests.inner1d(a_array, b_array)')
%timeit np.core.umath_tests.inner1d(a_array, b_array)

assert np.allclose(np.einsum('...k,...k', a_array, b_array), np.core.umath_tests.inner1d(a_array, b_array))

It uses the Generalized Universal Function API, but this will be for another book meeting.

There is a general need for looping over not only functions on scalars but also over functions on vectors (or arrays). This concept is realized in Numpy by generalizing the universal functions (ufuncs).
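A pure-Python way to see the gufunc idea, assuming NumPy 1.12+ where `np.vectorize` accepts a `signature` argument: the core signature `'(i),(i)->()'` says "consume two 1-D core vectors, produce a scalar", and every remaining axis is broadcast. This is far slower than a real gufunc like `inner1d`, but illustrative:

```python
import numpy as np

# Build an inner1d-like function from np.dot and a gufunc core signature.
inner = np.vectorize(np.dot, signature='(i),(i)->()')

a = np.random.rand(4, 3)
b = np.random.rand(4, 3)

# The leading axis is looped over automatically, as with a real gufunc.
assert np.allclose(inner(a, b), np.einsum('...k,...k', a, b))
```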

There are new, not-yet-exposed methods in this pull request; they all follow the gufunc(..., **kw_args) calling convention.

In numpy/linalg/_gufuncs_linalg.py,

=======================
 gufuncs_linalg module
=======================

Linear Algebra functions implemented as gufuncs, so they broadcast.

.. warning:: This module is only for testing; the functionality will be integrated into numpy.linalg proper.

matrix_multiply wins for vectorized matrix-matrix products.


In [ ]:
A_array = np.random.rand(10000, 4, 3) # [A1, A2, A3, ...]
B_array = np.random.rand(10000, 3, 4) # [B1, B2, B3, ...]

print("np.einsum('...ij,...jk->...ik', A_array, B_array)")
%timeit np.einsum('...ij,...jk->...ik', A_array, B_array)

# matrix_multiply is also a generalized ufunc -- and faster here
print('\nnp.core.umath_tests.matrix_multiply(A_array, B_array)')
%timeit np.core.umath_tests.matrix_multiply(A_array, B_array)

assert np.allclose(np.einsum('...ij,...jk->...ik', A_array, B_array), np.core.umath_tests.matrix_multiply(A_array, B_array))

einsum wins for vectorized matrix-vector products.


In [ ]:
A_array = np.random.rand(10000, 4, 4) # [A1, A2, A3, ...]
b_array = np.random.rand(10000, 4) # [b1, b2, b3, ...]

print("\nUsing np.einsum('...jk,...k->...j', A_array, b_array)")
%timeit np.einsum('...jk,...k->...j', A_array, b_array)

print("\nnp.core.umath_tests.matrix_multiply(A_array, b_array[:, :, np.newaxis])[:, :, 0]")
%timeit np.core.umath_tests.matrix_multiply(A_array, b_array[:, :, np.newaxis])[:, :, 0]

x_einsum = np.einsum('...jk,...k->...j', A_array, b_array)
x_matrix_multiply = np.core.umath_tests.matrix_multiply(A_array, b_array[:, :, np.newaxis])[:,:,0]
assert np.allclose(x_einsum, x_matrix_multiply)

When there is only one operand, no axes are summed, and no output parameter is provided, a view into the operand is returned instead of a new array. Thus, taking the diagonal as np.einsum('ii->i', a) produces a view.


In [ ]:
A = np.array([[1.0, 2.0, 4.0, 3.0],
              [1.0, 0.0, 2.0, 0.0],
              [1.0, 1.0, 1.1, 0.0],
              [2.0, 2.0, 1.0, 1.0]])
d = np.einsum('ii->i', A)

#d[0] = 1000  # careful: d is a view, so writing to it would change A's diagonal too
print(d)
print(A)

In [ ]:
# Products with more than 2 arrays...

A_array = np.random.rand(2, 3, 3) # [A1, A2, A3, ...]
b_array = np.random.rand(2, 3) # [b1, b2, b3, ...]

# "A*A*b" = [dot(dot(A1,A1), b1), dot(dot(A2,A2), b2), ...]
val2 = np.einsum('...ij,...jk,...k->...i', A_array, A_array, b_array)
print(val2)
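As a sanity check, the stacked triple product can be compared against explicit `np.dot` calls, one stack element at a time (fresh random arrays here for a self-contained sketch):

```python
import numpy as np

A_array = np.random.rand(2, 3, 3)
b_array = np.random.rand(2, 3)

# dot(dot(Ai, Ai), bi) for every element of the stack, in one call.
val = np.einsum('...ij,...jk,...k->...i', A_array, A_array, b_array)

# The same thing with an explicit Python loop over the stack.
expected = np.array([np.dot(np.dot(A, A), b)
                     for A, b in zip(A_array, b_array)])
assert np.allclose(val, expected)
```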