More stuff from Jake Vanderplas https://jakevdp.github.io/blog/2012/08/16/memoryview-benchmarks-2/
In [1]:
%load_ext Cython
In [2]:
%%cython
import numpy as np
cimport numpy as np
cimport cython
@cython.boundscheck(False)
@cython.wraparound(False)
cdef inline double inner_func(double[:, ::1] X):
    return X[0, 0]

def loop_1(int N, switch=True):
    cdef double[:, ::1] X = np.zeros((100, 100))
    cdef int i
    for i in range(N):
        # this should be inlined by the compiler
        inner_func(X)
In [3]:
%timeit loop_1(1E6)
Now we'll repeat, but add a dummy second function and select between the two at runtime through a function pointer, so that we can guarantee that inner_func will not be inlined:
In [4]:
%%cython
import numpy as np
cimport numpy as np
cimport cython
ctypedef double (*inner_func_ptr)(double[:, ::1])

@cython.boundscheck(False)
@cython.wraparound(False)
cdef double inner_func_1(double[:, ::1] X):
    return X[0, 0]

@cython.boundscheck(False)
@cython.wraparound(False)
cdef double inner_func_2(double[:, ::1] X):
    return X[0, 0]

def loop_2(int N, switch=True):
    # use a switch to ensure that inlining can't happen: compilers
    # are usually smart enough to figure it out otherwise.
    cdef inner_func_ptr func
    if switch:
        func = inner_func_1
    else:
        func = inner_func_2
    cdef double[:, ::1] X = np.zeros((100, 100))
    cdef int i
    for i in range(N):
        func(X)
In [5]:
%timeit loop_2(1E6)
In this case, inlining improves the computation speed by a factor of 2000!
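To reproduce a speedup figure like this outside of IPython, the ratio can be computed with the standard-library timeit module. The following is only a minimal sketch: the helper names best_time and speedup are made up for illustration and are not part of the original benchmark.

```python
import timeit

def best_time(func, *args, repeat=3, number=1):
    """Return the best per-call wall-clock time of func(*args), in seconds."""
    timer = timeit.Timer(lambda: func(*args))
    # The minimum over several repeats is the value least affected by
    # background noise on the machine.
    return min(timer.repeat(repeat=repeat, number=number)) / number

def speedup(slow, fast, *args, **kwargs):
    """Ratio of best times: how many times faster `fast` is than `slow`."""
    return best_time(slow, *args, **kwargs) / best_time(fast, *args, **kwargs)
```

Assuming loop_1 and loop_2 have been compiled in the current session, speedup(loop_2, loop_1, 10**6) would give the ratio between the non-inlined and the inlined version on your machine. The exact number depends on the compiler and its optimization flags.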
Here we'll replicate the fast method above, but with a typed numpy array rather than a typed memoryview:
In [6]:
%%cython
import numpy as np
cimport numpy as np
cimport cython
@cython.boundscheck(False)
@cython.wraparound(False)
cdef inline double inner_func(np.ndarray[double, ndim=2, mode='c'] X):
    return X[0, 0]

def loop_3(int N, switch=True):
    cdef np.ndarray[double, ndim=2, mode='c'] X = np.zeros((100, 100))
    cdef int i
    for i in range(N):
        inner_func(X)
Compiling this cell emits warnings, and they indicate a problem: buffer unpacking cannot be optimized away for the typed numpy array. Let's see how this compares to the other implementations:
In [7]:
print("inlined memview:")
%timeit loop_1(1E6)
print("non-inlined memview:")
%timeit loop_2(1E6)
print("inlined ndarray:")
%timeit loop_3(1E6)
The result for the ndarray is many times slower than either the inlined or the non-inlined example above!
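For reference, the double[:, ::1] declaration used in loop_1 and loop_2 asks for a two-dimensional, C-contiguous buffer of doubles, that is, one whose last axis has unit stride. The sketch below illustrates that layout with Python's built-in memoryview. This is a different object from a Cython typed memoryview, and it is used here only because it exposes the same buffer-protocol shape and stride information without needing a compiler.

```python
import struct

rows, cols = 100, 100
itemsize = struct.calcsize("d")          # size of a C double on this platform
raw = bytearray(rows * cols * itemsize)  # zero-filled backing storage

# Reinterpret the flat byte buffer as a 2-D array of doubles.
X = memoryview(raw).cast("d", shape=[rows, cols])

print(X.ndim)          # 2
print(X.shape)         # (100, 100)
print(X.strides)       # (cols * itemsize, itemsize): last axis is unit-stride
print(X.c_contiguous)  # True
print(X[0, 0])         # 0.0
```

A buffer that fails the contiguity check (for example a transposed or strided view) would not be accepted by a function declared with double[:, ::1].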