This small Jupyter notebook presents a simple benchmark comparing several Python implementations and one Julia implementation of a specific numerical algorithm: the Romberg integration method.
We will use the scipy.integrate.quad function as a reference to check the results of our manual implementations.
In [1]:
from scipy.integrate import quad
Let's try it with this function $f(x)$ on $[a,b]=[1993,2017]$:
$$ f(x) := \frac{12x+1}{1+\cos(x)^2} $$
In [2]:
import math
f = lambda x: (12*x+1)/(1+math.cos(x)**2)
a, b = 1993, 2017
In [3]:
quad(f, a, b)
The first value is the numerical value of the integral $\int_{a}^{b} f(x) \mathrm{d}x$ and the second value is an estimate of the numerical error.
$0.4\%$ of relative error is not much, alright.
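For the record, a one-liner to compute that relative error from quad's output (value and error are just local names):

value, error = quad(f, a, b)
print(100.0 * error / value)  # relative error, in percent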
See https://mec-cs101-integrals.readthedocs.io/en/latest/_modules/integrals.html#romberg_rec for the code and https://mec-cs101-integrals.readthedocs.io/en/latest/integrals.html#integrals.romberg_rec for the doc
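For reference, here is the recurrence that the recursive code below implements, with $h_n = (b - a) / 2^n$:
$$ R_{0,0} = \frac{b - a}{2} \bigl(f(a) + f(b)\bigr), \qquad R_{n,0} = \frac{1}{2} R_{n-1,0} + h_n \sum_{k=1}^{2^{n-1}} f\bigl(a + (2k - 1) h_n\bigr), $$
$$ R_{n,m} = \frac{4^m \, R_{n,m-1} - R_{n-1,m-1}}{4^m - 1}. $$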
In [4]:
def romberg_rec(f, xmin, xmax, n=8, m=None):
    if m is None:  # testing `not m` instead would wrongly treat m=0 as None
        m = n
    assert n >= m
    if n == 0 and m == 0:
        return ((xmax - xmin) / 2.0) * (f(xmin) + f(xmax))
    elif m == 0:
        h = (xmax - xmin) / float(2**n)
        N = (2**(n - 1)) + 1
        term = math.fsum(f(xmin + ((2 * k) - 1) * h) for k in range(1, N))
        return (term * h) + (0.5) * romberg_rec(f, xmin, xmax, n - 1, 0)
    else:
        return (1.0 / ((4**m) - 1)) * ((4**m) * romberg_rec(f, xmin, xmax, n, m - 1) - romberg_rec(f, xmin, xmax, n - 1, m - 1))
In [5]:
romberg_rec(f, a, b, n=0) # really not accurate!
romberg_rec(f, a, b, n=1) # already pretty good!
romberg_rec(f, a, b, n=2)
romberg_rec(f, a, b, n=3)
romberg_rec(f, a, b, n=8) # Almost the exact value.
romberg_rec(f, a, b, n=10) # Almost the exact value.
romberg_rec(f, a, b, n=12) # Almost the exact value.
It converges quite quickly to the "true" value as given by scipy.integrate.quad.
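To make the convergence claim concrete, here is a quick check of the gap to quad's reference value, reusing the objects defined above:

true_value, _ = quad(f, a, b)
for n in (0, 1, 2, 3, 8):
    print(n, abs(romberg_rec(f, a, b, n=n) - true_value))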
See https://mec-cs101-integrals.readthedocs.io/en/latest/_modules/integrals.html#romberg for the code and https://mec-cs101-integrals.readthedocs.io/en/latest/integrals.html#integrals.romberg for the doc.
It is not hard to make this function non-recursive, by storing the intermediate results.
In [6]:
def romberg(f, xmin, xmax, n=8, m=None):
    assert xmin <= xmax
    if m is None:
        m = n
    assert n >= m >= 0
    # First value:
    r = {(0, 0): 0.5 * (xmax - xmin) * (f(xmax) + f(xmin))}
    # One side of the triangle:
    for i in range(1, n + 1):
        h_i = (xmax - xmin) / float(2**i)
        xsamples = [xmin + ((2 * k - 1) * h_i) for k in range(1, 1 + 2**(i - 1))]
        r[(i, 0)] = (0.5 * r[(i - 1, 0)]) + h_i * math.fsum(f(x) for x in xsamples)
    # All the other values:
    for j in range(1, m + 1):
        for i in range(j, n + 1):
            try:
                r[(i, j)] = (((4**j) * r[(i, j - 1)]) - r[(i - 1, j - 1)]) / float((4**j) - 1)
            except KeyError:
                raise ValueError("romberg() with n = {}, m = {} and i = {}, j = {} failed.".format(n, m, i, j))
    return r[(n, m)]
In [7]:
romberg(f, a, b, n=0) # really not accurate!
romberg(f, a, b, n=1) # already pretty good!
romberg(f, a, b, n=2)
romberg(f, a, b, n=3)
romberg(f, a, b, n=8) # Almost the exact value.
romberg(f, a, b, n=10) # Almost the exact value.
romberg(f, a, b, n=12) # Almost the exact value.
It converges quite quickly to the "true" value as given by scipy.integrate.quad.
Instead of using a dictionary, which gets filled up dynamically (and is therefore slow), let us use a NumPy array, as we already know the size of the array we need: $(n+1) \times (m+1)$.
Note that only half of the array is used, so we could try sparse (triangular) matrices. From what I know, it is not worth it, both in terms of memory and speed: with a density of about $1/2$, you gain nothing from LIL or other sparse matrix representations. We could use numpy.tri, but it creates a dense array, so that would not help either.
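Out of curiosity, here is a minimal sketch of what packed triangular storage would look like (tri_index and r_packed are names made up for illustration, not from the original notebook); it saves about half the memory, which is precisely the modest gain dismissed above:

# Pack the lower triangle r[i][j] (0 <= j <= i <= n) into a flat list
# of length (n+1)(n+2)/2 instead of a full (n+1) x (n+1) array.
def tri_index(i, j):
    return (i * (i + 1)) // 2 + j  # row i starts after 1 + 2 + ... + i entries

n = 8
r_packed = [0.0] * ((n + 1) * (n + 2) // 2)
r_packed[tri_index(3, 2)] = 42.0  # plays the role of r[3, 2]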
In [8]:
import numpy as np

def romberg_better(f, xmin, xmax, n=8, m=None):
    assert xmin <= xmax
    if m is None:
        m = n
    assert n >= m >= 0
    # First value:
    r = np.zeros((n+1, m+1))
    r[0, 0] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.
    # One side of the triangle:
    for i in range(1, n + 1):
        h_i = (xmax - xmin) / 2.**i
        r[i, 0] = (0.5 * r[i - 1, 0]) + h_i * math.fsum(
            f(xmin + ((2 * k - 1) * h_i))
            for k in range(1, 1 + 2**(i - 1))
        )
    # All the other values:
    for j in range(1, m + 1):
        for i in range(j, n + 1):
            r[i, j] = (((4.**j) * r[i, j - 1]) - r[i - 1, j - 1]) / ((4.**j) - 1.)
    return r[n, m]
In [9]:
romberg_better(f, a, b, n=0) # really not accurate!
romberg_better(f, a, b, n=1) # already pretty good!
romberg_better(f, a, b, n=2)
romberg_better(f, a, b, n=3)
romberg_better(f, a, b, n=8) # Almost the exact value.
romberg_better(f, a, b, n=10) # Almost the exact value.
romberg_better(f, a, b, n=12) # Almost the exact value.
It converges quite quickly to the "true" value as given by scipy.integrate.quad.
In [10]:
%timeit quad(f, a, b)
In [11]:
%timeit romberg_rec(f, a, b, n=10)
In [12]:
%timeit romberg(f, a, b, n=10)
In [13]:
%timeit romberg_better(f, a, b, n=10)
We already see that the recursive version is much slower than the dynamic-programming one! But there is not much difference between the version using a dictionary (romberg()) and the one using a NumPy array of known size (romberg_better()).
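As a quick sanity check (not in the original notebook, just reusing the three functions defined above), all implementations should agree with quad up to a small tolerance; the tolerance below is deliberately loose:

ref, _ = quad(f, a, b)
for implementation in (romberg_rec, romberg, romberg_better):
    # Romberg with n=10 is far more accurate than this tolerance
    assert math.isclose(implementation(f, a, b, n=10), ref, rel_tol=1e-6)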
In [14]:
%%time
import numpy as np
import math
import random

f = lambda x: (12*x+1)/(1+math.cos(x)**2)

# Same code
def romberg(f, xmin, xmax, n=8, m=None):
    assert xmin <= xmax
    if m is None:
        m = n
    assert n >= m >= 0
    # First value:
    r = np.zeros((n+1, m+1))
    r[0, 0] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.
    # One side of the triangle:
    for i in range(1, n + 1):
        h_i = (xmax - xmin) / 2.**i
        r[i, 0] = (0.5 * r[i - 1, 0]) + h_i * math.fsum(
            f(xmin + ((2 * k - 1) * h_i))
            for k in range(1, 1 + 2**(i - 1))
        )
    # All the other values:
    for j in range(1, m + 1):
        for i in range(j, n + 1):
            r[i, j] = (((4.**j) * r[i, j - 1]) - r[i - 1, j - 1]) / ((4.**j) - 1.)
    return r[n, m]

for _ in range(100000):
    a = random.randint(-2000, 2000)
    b = a + random.randint(0, 100)
    romberg(f, a, b)
And now the same code executed by an external PyPy interpreter (Python 2.7.13 and PyPy 5.8.0 with GCC 5.4.0):
In [15]:
%%time
%%pypy
import math
import random

f = lambda x: (12*x+1)/(1+math.cos(x)**2)

# Same code
def romberg_pypy(f, xmin, xmax, n=8, m=None):
    assert xmin <= xmax
    if m is None:
        m = n
    assert n >= m >= 0
    # First value: (n+1) rows and (m+1) columns, since r[i][j] is used with i <= n, j <= m
    r = [[0.0 for _ in range(m + 1)] for _ in range(n + 1)]
    r[0][0] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.
    # One side of the triangle:
    for i in range(1, n + 1):
        h_i = (xmax - xmin) / 2.**i
        r[i][0] = (0.5 * r[i - 1][0]) + h_i * math.fsum(
            f(xmin + ((2 * k - 1) * h_i))
            for k in range(1, 1 + 2**(i - 1))
        )
    # All the other values:
    for j in range(1, m + 1):
        for i in range(j, n + 1):
            r[i][j] = (((4.**j) * r[i][j - 1]) - r[i - 1][j - 1]) / ((4.**j) - 1.)
    return r[n][m]

for _ in range(100000):
    a = random.randint(-2000, 2000)
    b = a + random.randint(0, 100)
    romberg_pypy(f, a, b)
This version uses the improved memoization trick (no dictionary), but with nested lists instead of NumPy arrays: I didn't bother to install numpy for my PyPy installation (even though it should be possible).
In [16]:
from numba import jit
In [17]:
@jit
def romberg_numba(f, xmin, xmax, n=8):
    assert xmin <= xmax
    m = n
    # First value:
    r = {(0, 0): 0.5 * (xmax - xmin) * (f(xmax) + f(xmin))}
    # One side of the triangle:
    for i in range(1, n + 1):
        h_i = (xmax - xmin) / float(2**i)
        xsamples = [xmin + ((2 * k - 1) * h_i) for k in range(1, 1 + 2**(i - 1))]
        r[(i, 0)] = (0.5 * r[(i - 1, 0)]) + h_i * math.fsum(f(x) for x in xsamples)
    # All the other values:
    for j in range(1, m + 1):
        for i in range(j, n + 1):
            try:
                r[(i, j)] = (((4**j) * r[(i, j - 1)]) - r[(i - 1, j - 1)]) / float((4**j) - 1)
            except KeyError:
                raise ValueError("romberg() with n = {}, m = {} and i = {}, j = {} failed.".format(n, m, i, j))
    return r[(n, m)]
In [18]:
romberg_numba(f, a, b, n=8) # Almost the exact value.
It fails! As almost always when I try Numba, it fails cryptically; too bad. (The dictionary indexed by tuples is probably part of the problem.) I don't want to spend time debugging this.
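Rather than debugging, a rewrite that avoids the dictionary would probably suit Numba better, since it is usually happier with typed NumPy arrays than with dictionaries keyed by tuples. A minimal, untested sketch (romberg_numba_array and f_jit are hypothetical names, not from the original notebook):

import math
import numpy as np
from numba import njit

@njit
def f_jit(x):
    # same integrand as above, jitted so it can be called from compiled code
    return (12 * x + 1) / (1 + math.cos(x) ** 2)

@njit
def romberg_numba_array(xmin, xmax, n=8):
    m = n
    r = np.zeros((n + 1, m + 1))
    r[0, 0] = 0.5 * (xmax - xmin) * (f_jit(xmax) + f_jit(xmin))
    for i in range(1, n + 1):
        h_i = (xmax - xmin) / 2.0 ** i
        acc = 0.0  # plain accumulation loop (math.fsum may not be supported by Numba)
        for k in range(1, 1 + 2 ** (i - 1)):
            acc += f_jit(xmin + (2 * k - 1) * h_i)
        r[i, 0] = 0.5 * r[i - 1, 0] + h_i * acc
    for j in range(1, m + 1):
        for i in range(j, n + 1):
            r[i, j] = ((4.0 ** j) * r[i, j - 1] - r[i - 1, j - 1]) / (4.0 ** j - 1.0)
    return r[n, m]

If this compiles, it would be called as romberg_numba_array(a, b, n=8).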
Thanks to this page for a nice and short introduction to Julia.
In [19]:
%%script julia

function f(x)
    (12*x + 1) / (1 + cos(x)^2)
end

a = 1993
b = 2017

function romberg_julia(f, xmin, xmax; n=8)
    m = n
    # First value:
    r = Dict()
    r[(0, 0)] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.
    # One side of the triangle:
    for i in 1 : n
        h_i = (xmax - xmin) / (2^i)
        sum_f_x = 0
        for k in 1 : 2^(i - 1)
            sum_f_x += f(xmin + ((2 * k - 1) * h_i))
        end
        r[(i, 0)] = (r[(i - 1, 0)] / 2.) + (h_i * sum_f_x)
    end
    # All the other values:
    for j in 1 : m
        for i in j : n
            r[(i, j)] = (((4^j) * r[(i, j - 1)]) - r[(i - 1, j - 1)]) / (4^j - 1.)
        end
    end
    r[(n, m)]
end

println(romberg_julia(f, a, b, n=0))   # really not accurate!
println(romberg_julia(f, a, b, n=1))   # already pretty good!
println(romberg_julia(f, a, b, n=2))
println(romberg_julia(f, a, b, n=3))
println(romberg_julia(f, a, b, n=8))   # Almost the exact value.
println(romberg_julia(f, a, b, n=10))  # Almost the exact value.
println(romberg_julia(f, a, b, n=12))  # Almost the exact value.
It seems to work well, like the Python implementation. We get the same numerical result:
In [20]:
f = lambda x: (12*x+1)/(1+math.cos(x)**2)
a, b = 1993, 2017
quad(f, a, b)
romberg(f, a, b, n=12)
Let's try a less naive version, using a fixed-size array instead of a dictionary (as we did before for the Python version).
In [21]:
%%script julia

function f(x)
    (12*x + 1) / (1 + cos(x)^2)
end

a = 1993
b = 2017

function romberg_julia_better(f, xmin, xmax; n=8)
    m = n
    # First value:
    r = zeros((n+1, m+1))  # https://docs.julialang.org/en/latest/stdlib/arrays/#Base.zeros
    r[1, 1] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.
    # One side of the triangle:
    for i in 1 : n
        h_i = (xmax - xmin) / (2^i)
        sum_f_x = 0
        for k in 1 : 2^(i - 1)
            sum_f_x += f(xmin + ((2 * k - 1) * h_i))
        end
        r[i + 1, 1] = (r[i, 1] / 2.) + (h_i * sum_f_x)
    end
    # All the other values:
    for j in 1 : m
        for i in j : n
            r[i + 1, j + 1] = (((4.0^j) * r[i + 1, j]) - r[i, j]) / (4.0^j - 1.)
        end
    end
    r[n + 1, m + 1]
end

println(romberg_julia_better(f, a, b, n=0))   # really not accurate!
println(romberg_julia_better(f, a, b, n=1))   # already pretty good!
println(romberg_julia_better(f, a, b, n=2))
println(romberg_julia_better(f, a, b, n=3))
println(romberg_julia_better(f, a, b, n=8))   # Almost the exact value.
println(romberg_julia_better(f, a, b, n=10))  # Almost the exact value.
println(romberg_julia_better(f, a, b, n=12))  # Almost the exact value.
Now let us compare the three on thousands of random inputs. First with Python:
In [22]:
%%time
import numpy as np
import math
import random

f = lambda x: (12*x+1)/(1+math.cos(x)**2)

# Same code
def romberg(f, xmin, xmax, n=8, m=None):
    assert xmin <= xmax
    if m is None:
        m = n
    assert n >= m >= 0
    # First value:
    r = np.zeros((n+1, m+1))
    r[0, 0] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.
    # One side of the triangle:
    for i in range(1, n + 1):
        h_i = (xmax - xmin) / 2.**i
        r[i, 0] = (0.5 * r[i - 1, 0]) + h_i * math.fsum(
            f(xmin + ((2 * k - 1) * h_i))
            for k in range(1, 1 + 2**(i - 1))
        )
    # All the other values:
    for j in range(1, m + 1):
        for i in range(j, n + 1):
            r[i, j] = (((4.**j) * r[i, j - 1]) - r[i - 1, j - 1]) / ((4.**j) - 1.)
    return r[n, m]

for _ in range(100000):
    a = random.randint(-2000, 2000)
    b = a + random.randint(0, 100)
    romberg(f, a, b)
And now the same code executed by an external PyPy interpreter (Python 2.7.13 and PyPy 5.8.0 with GCC 5.4.0):
In [23]:
%%time
%%pypy
import math
import random

f = lambda x: (12*x+1)/(1+math.cos(x)**2)

# Same code
def romberg_pypy(f, xmin, xmax, n=8, m=None):
    assert xmin <= xmax
    if m is None:
        m = n
    assert n >= m >= 0
    # First value: (n+1) rows and (m+1) columns, since r[i][j] is used with i <= n, j <= m
    r = [[0.0 for _ in range(m + 1)] for _ in range(n + 1)]
    r[0][0] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.
    # One side of the triangle:
    for i in range(1, n + 1):
        h_i = (xmax - xmin) / 2.**i
        r[i][0] = (0.5 * r[i - 1][0]) + h_i * math.fsum(
            f(xmin + ((2 * k - 1) * h_i))
            for k in range(1, 1 + 2**(i - 1))
        )
    # All the other values:
    for j in range(1, m + 1):
        for i in range(j, n + 1):
            r[i][j] = (((4.**j) * r[i][j - 1]) - r[i - 1][j - 1]) / ((4.**j) - 1.)
    return r[n][m]

for _ in range(100000):
    a = random.randint(-2000, 2000)
    b = a + random.randint(0, 100)
    romberg_pypy(f, a, b)
And finally with Julia:
In [24]:
%%time
%%script julia

function f(x)
    (12*x + 1) / (1 + cos(x)^2)
end

function romberg_julia(f, xmin, xmax; n=8)
    m = n
    # First value:
    r = Dict()
    r[(0, 0)] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.
    # One side of the triangle:
    for i in 1 : n
        h_i = (xmax - xmin) / (2^i)
        sum_f_x = 0
        for k in 1 : 2^(i - 1)
            sum_f_x += f(xmin + ((2 * k - 1) * h_i))
        end
        r[(i, 0)] = (r[(i - 1, 0)] / 2.) + (h_i * sum_f_x)
    end
    # All the other values:
    for j in 1 : m
        for i in j : n
            r[(i, j)] = (((4^j) * r[(i, j - 1)]) - r[(i - 1, j - 1)]) / (4^j - 1.)
        end
    end
    r[(n, m)]
end

for _ in 1:100000
    a = rand(-2000:2000)
    b = a + rand(0:100)
    romberg_julia(f, a, b)
end
On this first test, it doesn't look faster than PyPy... But what if we use the improved version, with an array instead of a dictionary?
In [25]:
%%time
%%script julia

function f(x)
    (12*x + 1) / (1 + cos(x)^2)
end

function romberg_julia_better(f, xmin, xmax; n=8)
    m = n
    # First value:
    r = zeros((n+1, m+1))  # https://docs.julialang.org/en/latest/stdlib/arrays/#Base.zeros
    r[1, 1] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.
    # One side of the triangle:
    for i in 1 : n
        h_i = (xmax - xmin) / (2^i)
        sum_f_x = 0
        for k in 1 : 2^(i - 1)
            sum_f_x += f(xmin + ((2 * k - 1) * h_i))
        end
        r[i + 1, 1] = (r[i, 1] / 2.) + (h_i * sum_f_x)
    end
    # All the other values:
    for j in 1 : m
        for i in j : n
            r[i + 1, j + 1] = (((4.0^j) * r[i + 1, j]) - r[i, j]) / (4.0^j - 1.)
        end
    end
    r[n + 1, m + 1]
end

for _ in 1:100000
    a = rand(-2000:2000)
    b = a + rand(0:100)
    romberg_julia_better(f, a, b)
end
Oh, this time it finally seems faster. Really faster? Yes, about 3 to 4 times faster than PyPy.
Note also that these last cells were timed through the %%pypy and %%script julia magics, so both measurements include a warm-up time (opening the pipe, starting the sub-process, initializing the JIT compiler, etc.). But this overhead affects both sides, so it is fair to compare PyPy to Julia this way.
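To get a feel for that warm-up cost, one can time an empty program in each interpreter; a rough sketch, assuming pypy and julia are both on the PATH:

import subprocess
import time

for cmd in (["pypy", "-c", "pass"], ["julia", "-e", "0"]):
    t0 = time.time()
    subprocess.run(cmd, check=True)  # run an (almost) empty program
    print(cmd[0], "startup took about", round(time.time() - t0, 3), "s")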
Let's try the same numerical algorithm with a different integrand function (note the $2/\sqrt{\pi}$ normalization in the code, which makes this $\operatorname{erf}(1)$):
$$ \int_{0}^{1} \frac{2}{\sqrt{\pi}} \exp(-x^2) \mathrm{d}x = \operatorname{erf}(1) \approx 0.842700792949715. $$
First with Python:
In [26]:
%%time
import numpy as np
import math
import random

f = lambda x: (2.0 / math.sqrt(math.pi)) * math.exp(-x**2)

# Same code
def romberg(f, xmin, xmax, n=8, m=None):
    assert xmin <= xmax
    if m is None:
        m = n
    assert n >= m >= 0
    # First value:
    r = np.zeros((n+1, m+1))
    r[0, 0] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.
    # One side of the triangle:
    for i in range(1, n + 1):
        h_i = (xmax - xmin) / 2.**i
        r[i, 0] = (0.5 * r[i - 1, 0]) + h_i * math.fsum(
            f(xmin + ((2 * k - 1) * h_i))
            for k in range(1, 1 + 2**(i - 1))
        )
    # All the other values:
    for j in range(1, m + 1):
        for i in range(j, n + 1):
            r[i, j] = (((4.**j) * r[i, j - 1]) - r[i - 1, j - 1]) / ((4.**j) - 1.)
    return r[n, m]

for _ in range(100000):
    a = 0
    b = 1
    romberg(f, a, b)

print(romberg(f, a, b))
And now the same code executed by an external PyPy interpreter (Python 2.7.13 and PyPy 5.8.0 with GCC 5.4.0):
In [27]:
%%time
%%pypy
import math
import random

f = lambda x: (2.0 / math.sqrt(math.pi)) * math.exp(-x**2)

# Same code
def romberg_pypy(f, xmin, xmax, n=8, m=None):
    assert xmin <= xmax
    if m is None:
        m = n
    assert n >= m >= 0
    # First value: (n+1) rows and (m+1) columns, since r[i][j] is used with i <= n, j <= m
    r = [[0.0 for _ in range(m + 1)] for _ in range(n + 1)]
    r[0][0] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.
    # One side of the triangle:
    for i in range(1, n + 1):
        h_i = (xmax - xmin) / 2.**i
        r[i][0] = (0.5 * r[i - 1][0]) + h_i * math.fsum(
            f(xmin + ((2 * k - 1) * h_i))
            for k in range(1, 1 + 2**(i - 1))
        )
    # All the other values:
    for j in range(1, m + 1):
        for i in range(j, n + 1):
            r[i][j] = (((4.**j) * r[i][j - 1]) - r[i - 1][j - 1]) / ((4.**j) - 1.)
    return r[n][m]

for _ in range(100000):
    a = 0
    b = 1
    romberg_pypy(f, a, b)

print(romberg_pypy(f, a, b))
And finally with Julia:
In [28]:
%%time
%%script julia

function f(x)
    (2.0 / sqrt(pi)) * exp(-x^2)
end

function romberg_julia(f, xmin, xmax; n=8)
    m = n
    # First value:
    r = Dict()
    r[(0, 0)] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.
    # One side of the triangle:
    for i in 1 : n
        h_i = (xmax - xmin) / (2^i)
        sum_f_x = 0
        for k in 1 : 2^(i - 1)
            sum_f_x += f(xmin + ((2 * k - 1) * h_i))
        end
        r[(i, 0)] = (r[(i - 1, 0)] / 2.) + (h_i * sum_f_x)
    end
    # All the other values:
    for j in 1 : m
        for i in j : n
            r[(i, j)] = (((4^j) * r[(i, j - 1)]) - r[(i - 1, j - 1)]) / (4^j - 1.)
        end
    end
    r[(n, m)]
end

for _ in 1:100000
    a = 0
    b = 1
    romberg_julia(f, a, b)
end

println(romberg_julia(f, 0, 1))
Still not faster than PyPy... So what is the point of Julia here?
In [29]:
%%time
%%script julia

function f(x)
    (2.0 / sqrt(pi)) * exp(-x^2)
end

function romberg_julia_better(f, xmin, xmax; n=8)
    m = n
    # First value:
    r = zeros((n+1, m+1))  # https://docs.julialang.org/en/latest/stdlib/arrays/#Base.zeros
    r[1, 1] = (xmax - xmin) * (f(xmax) + f(xmin)) / 2.
    # One side of the triangle:
    for i in 1 : n
        h_i = (xmax - xmin) / (2^i)
        sum_f_x = 0
        for k in 1 : 2^(i - 1)
            sum_f_x += f(xmin + ((2 * k - 1) * h_i))
        end
        r[i + 1, 1] = (r[i, 1] / 2.) + (h_i * sum_f_x)
    end
    # All the other values:
    for j in 1 : m
        for i in j : n
            r[i + 1, j + 1] = (((4.0^j) * r[i + 1, j]) - r[i, j]) / (4.0^j - 1.)
        end
    end
    r[n + 1, m + 1]
end

for _ in 1:100000
    a = 0
    b = 1
    romberg_julia_better(f, a, b)
end

println(romberg_julia_better(f, 0, 1))
This one is also faster than PyPy, but not by much...
$\implies$ On this (baby) example of a real-world numerical algorithm, tested on thousands of random inputs or thousands of runs on the same input, the speed-up is in favor of Julia, but it doesn't seem impressive enough to make me want to use it (at least for now).
If I have to use 1-based indexing and a slightly different language, just to gain a speed-up of 2 to 3 compared to PyPy, or a 10x speed-up compared to naive Python, why bother?
Of course, this was a baby benchmark, on a small algorithm, probably implemented sub-optimally in both Python and Julia.
But still, I am surprised to see that the naive Julia version was slower than the naive Python version executed with PyPy (part of the gap probably comes from the untyped Dict(), which defeats Julia's type inference). For the less naive version (without the dictionary), the Julia version was about 2 to 3 times faster than the Python version with PyPy.