Let's define $sum_p(x)$ in pure Python:
In [1]:
def sum_p(X):
    y = 0
    for x_i in range(int(X)):
        y += x_i
    return y
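As a quick sanity check, the loop result can be compared against the closed form $\sum_{i=0}^{n-1} i = n(n-1)/2$. The snippet below is a minimal sketch; `sum_closed_form` is a hypothetical helper name introduced only for this check and is not part of the original notebook.

```python
def sum_p(X):
    y = 0
    for x_i in range(int(X)):
        y += x_i
    return y

def sum_closed_form(n):
    # Hypothetical helper: closed form of 0 + 1 + ... + (n - 1)
    n = int(n)
    return n * (n - 1) // 2

# The loop and the closed form should agree for every non-negative n.
for n in (0, 1, 10, 1000):
    assert sum_p(n) == sum_closed_form(n)
```

The closed form also gives a reference value for large inputs without having to run the slow loop.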
Then we define $sum_j(x)$, which is identical except for the @jit decorator on the definition.
In [2]:
from numba import jit

@jit
def sum_j(X):
    y = 0
    for x_i in range(int(X)):
        y += x_i
    return y
In [3]:
import time
import pandas as pd
import matplotlib
%matplotlib inline

# time.clock was removed in Python 3.8; time.perf_counter is a
# high-resolution timer suited to measuring durations on every platform.
now = time.perf_counter

def run_benchmarks(functions, call_parameters, num_times,
                   logy=False, logx=False):

    # Executes one function several times and measures the duration of each call:
    def _apply_function(function, num_times):
        for j in range(num_times):
            t_0 = now()
            function(*call_parameters)
            duration = now() - t_0
            yield float(duration)

    # Builds the plot label for a function:
    def _name(function):
        return '${' + function.__name__ + '(x)}$'

    # Executes all functions the requested number of times and collects the durations:
    def _apply_functions(functions, num_times):
        for function in functions:
            yield pd.Series(_apply_function(function, num_times),
                            name=_name(function))

    # Collects and plots the results:
    df = pd.concat(_apply_functions(functions, num_times),
                   axis=1)
    ax = df.plot(figsize=(10,5),
                 logy=logy,
                 logx=logx,
                 title='$T[f(x)]$ in seconds',
                 style='o-')
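The measurement loop inside `run_benchmarks` can also be tried without pandas or matplotlib. The following is a minimal sketch under that assumption: `time_calls` and `summarize` are hypothetical names introduced here, and `time.perf_counter` is used as the timer.

```python
import statistics
import time

def time_calls(function, call_parameters, num_times):
    """Call function(*call_parameters) num_times; return each duration in seconds."""
    durations = []
    for _ in range(num_times):
        t_0 = time.perf_counter()
        function(*call_parameters)
        durations.append(time.perf_counter() - t_0)
    return durations

def summarize(durations):
    """Return the fastest and the median duration of a run."""
    return {"min": min(durations), "median": statistics.median(durations)}

def sum_p(X):
    y = 0
    for x_i in range(int(X)):
        y += x_i
    return y

durations = time_calls(sum_p, (100000,), num_times=5)
print(summarize(durations))
```

Reporting the minimum alongside the median is a design choice of this sketch: the minimum is less sensitive to interference from other processes, while the median describes a typical call.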
In [4]:
run_benchmarks(functions=[sum_p, sum_j],
               call_parameters=(10000000,),
               num_times=5,
               logy=True) # Logarithmic scale
In [5]:
run_benchmarks(functions=[sum_j],
               call_parameters=(1000000000000000.,),
               num_times=5,
               logy=True) # Logarithmic scale
Numba JIT functionality works in the following way:
In [6]:
from numba import jit
@jit
def sum_j(x):
    y = 0.
    x_i = 0.
    while x_i < x:
        y += x_i
        x_i += 1.
    return y
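With a bare `@jit` decorator and no explicit signature, Numba compiles the function lazily, the first time it is called with a given combination of argument types, so the first measured call includes compilation time. The sketch below makes that visible by timing consecutive calls. The `ImportError` fallback to a no-op decorator is an assumption added here so the snippet still runs where Numba is not installed; in that case no compilation effect appears.

```python
import time

try:
    from numba import jit
except ImportError:
    # Assumption for illustration only: without Numba, run the plain function.
    def jit(function):
        return function

@jit
def sum_j(x):
    y = 0.
    x_i = 0.
    while x_i < x:
        y += x_i
        x_i += 1.
    return y

durations = []
for _ in range(3):
    t_0 = time.perf_counter()
    result = sum_j(100000.)
    durations.append(time.perf_counter() - t_0)

# With Numba installed, durations[0] includes the compilation step.
print(result, durations)
```

When benchmarking a jitted function, it can therefore help to make one warm-up call before collecting timings.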
In [7]:
%load_ext Cython
In [8]:
%%cython
def sum_c(double x):
    cdef double y = 0.
    cdef double x_i = 0.
    while x_i < x:
        y += x_i
        x_i += 1.
    return y
About Cython: the function argument and the local variables are declared with the static C type double.
In [11]:
run_benchmarks(functions=[sum_j, sum_c],
               call_parameters=(1000000000.,),
               num_times=10)
The Numba-jitted function performs comparably to the Cythonized one. Let's look at the C code that Cython generated, to get an idea of how efficient it is.
In [12]:
%%cython --annotate
def sum_c(double x):
    cdef double y = 0.
    cdef double x_i = 0.
    while x_i < x:
        y += x_i
        x_i += 1.
    return y
Out[12]:
The function looks fine: the only Python overhead is at the call and at the return, where values are converted from and to Python objects.
Write C/C++ code and use it from Python.
It is surely the most powerful approach, but by far the most expensive.
Anyone who develops in C/C++ knows what that means:
In general, C++ is expensive and reserved for projects with a very large budget.
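On the Python side, one way to call compiled C code is the standard-library `ctypes` module. The sketch below calls `abs` from the C standard library rather than custom code, and assumes a POSIX system where that library can be located. For one's own C/C++ code, a shared library would have to be compiled first and loaded by its path instead.

```python
import ctypes
import ctypes.util

# Locate the C standard library; if it cannot be found by name, fall back
# to the symbols already loaded into the current process (POSIX only).
libc_path = ctypes.util.find_library("c")
libc = ctypes.CDLL(libc_path) if libc_path else ctypes.CDLL(None)

# Declare the C signature so ctypes converts values correctly: int abs(int)
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

result = libc.abs(-42)
print(result)
```

Declaring `argtypes` and `restype` matters more for functions that take or return `double`, since `ctypes` assumes `int` by default.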