Numba is a just-in-time compiler for Python that works best on code that uses NumPy arrays, NumPy functions and loops. ArviZ includes Numba as an optional dependency, and utils.py provides helpers (such as conditional_jit) that take advantage of Numba on systems where it is installed. On those systems, Numba can also be disabled and re-enabled at runtime.
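To make the idea of an optional JIT with a global on/off switch concrete, here is a minimal sketch. JitToggle and maybe_jit are hypothetical names used only for illustration. This is not ArviZ's actual conditional_jit or Numba implementation. It only assumes that numba.njit exists when Numba is installed.

```python
import functools
import importlib.util


class JitToggle:
    """Hypothetical global switch, loosely modelled on arviz.utils.Numba."""

    # Start enabled only if numba can be imported on this system.
    enabled = importlib.util.find_spec("numba") is not None

    @classmethod
    def disable(cls):
        cls.enabled = False

    @classmethod
    def enable(cls):
        if importlib.util.find_spec("numba") is None:
            raise ValueError("Numba is not installed")
        cls.enabled = True


def maybe_jit(func):
    """Hypothetical decorator: use numba.njit when enabled, plain Python otherwise."""
    compiled = None  # cache for the compiled version, built on first use

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        nonlocal compiled
        if not JitToggle.enabled:
            return func(*args, **kwargs)  # pure-Python fallback
        if compiled is None:
            import numba  # imported lazily so this module works without numba

            compiled = numba.njit(func)
        return compiled(*args, **kwargs)

    return wrapper


@maybe_jit
def total(values):
    acc = 0.0
    for v in values:
        acc += v
    return acc
```

The switch is checked on every call rather than at decoration time. Toggling it therefore takes effect immediately for functions that were already decorated.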
In [1]:
import arviz as az
from arviz.utils import conditional_jit, Numba
from arviz.stats import geweke
from arviz.stats.diagnostics import ks_summary
import numpy as np
import timeit
In [2]:
data = np.random.randn(1000000)
In [3]:
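def variance(data, ddof=0):  # Variance computed in pure Python, without Numba
    a_a, b_b = 0, 0  # running sums of x and x**2
    for i in data:
        a_a = a_a + i
        b_b = b_b + i * i
    # Population variance: E[x^2] - E[x]^2
    var = b_b / (len(data)) - ((a_a / (len(data))) ** 2)
    # Rescale by n / (n - ddof); ddof=1 gives the unbiased sample variance
    var = var * (len(data) / (len(data) - ddof))
    return var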
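Written out, the loop accumulates $S_1 = \sum_i x_i$ (a_a) and $S_2 = \sum_i x_i^2$ (b_b) over the $n$ samples, then evaluates the identity below. With ddof=1 this is the usual unbiased sample variance, the same quantity np.var(data, ddof=1) returns. This single-pass form can lose precision through cancellation when the mean is large relative to the standard deviation. For standard-normal data like this it is adequate.

```latex
\operatorname{Var}(x)
  = \frac{n}{n - \mathrm{ddof}}\left[\frac{S_2}{n} - \left(\frac{S_1}{n}\right)^2\right]
  = \frac{S_2 - S_1^2 / n}{n - \mathrm{ddof}}
```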
In [4]:
%timeit variance(data, ddof=1)
In [5]:
@conditional_jit
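def variance_jit(data, ddof=0):  # Same computation, JIT-compiled with Numba when available
    a_a, b_b = 0, 0  # running sums of x and x**2
    for i in data:
        a_a = a_a + i
        b_b = b_b + i * i
    # Population variance: E[x^2] - E[x]^2
    var = b_b / (len(data)) - ((a_a / (len(data))) ** 2)
    # Rescale by n / (n - ddof); ddof=1 gives the unbiased sample variance
    var = var * (len(data) / (len(data) - ddof))
    return var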
In [6]:
%timeit variance_jit(data, ddof=1)
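%timeit is an IPython magic, so it only works inside a notebook or an IPython shell. The standard-library timeit module imported in the first cell supports the same kind of measurement in a plain script. In the sketch below, best_time_per_call is a hypothetical helper, not part of ArviZ. It makes one untimed warm-up call first, because the first call of a Numba-jitted function includes compilation. It then keeps the fastest per-call time over several repeats. Recent IPython versions report a mean and standard deviation instead. The minimum is a common alternative that is less sensitive to background noise.

```python
import timeit

import numpy as np


def best_time_per_call(func, *args, number=10, repeat=5, **kwargs):
    """Return the fastest observed time per call of func(*args, **kwargs), in seconds."""
    # Warm-up call: excludes one-off costs such as Numba compilation.
    func(*args, **kwargs)
    timings = timeit.repeat(lambda: func(*args, **kwargs), number=number, repeat=repeat)
    return min(timings) / number


def py_sum(values):
    # Plain Python loop, used as the slow baseline.
    total = 0.0
    for v in values:
        total += v
    return total


data = np.random.randn(10_000)  # small array so the example runs quickly
t_py = best_time_per_call(py_sum, data)
t_np = best_time_per_call(np.sum, data)
print(f"Python loop: {t_py * 1e6:.1f} us, np.sum: {t_np * 1e6:.1f} us")
print(f"speedup: {t_py / t_np:.1f}x")
```

A speedup factor such as "almost 300 times faster" is simply the ratio of two such per-call times.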
That is almost 300 times faster! Let's compare this with NumPy's np.var.
In [7]:
%timeit np.var(data, ddof=1)
In certain scenarios, Numba even outperforms NumPy! Let's see Numba's effect on a few ArviZ functions.
In [8]:
Numba.disable_numba()  # This disables Numba
Numba.numba_flag
Out[8]:
False
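Numba.disable_numba() switches Numba off globally, so it is easy to forget to switch it back on after a benchmark. The sketch below is a small context manager that restores the previous state even if the timed code raises. numba_disabled is a hypothetical helper, not part of ArviZ. It only assumes an object with numba_flag, disable_numba() and enable_numba(), as used in this notebook.

```python
from contextlib import contextmanager


@contextmanager
def numba_disabled(toggle):
    """Hypothetical helper: temporarily disable Numba, then restore the prior state.

    `toggle` is anything exposing numba_flag, disable_numba() and
    enable_numba(), such as arviz.utils.Numba.
    """
    was_enabled = toggle.numba_flag
    toggle.disable_numba()
    try:
        yield
    finally:
        # Re-enable only if Numba was on before; never switch it on unasked.
        if was_enabled:
            toggle.enable_numba()


# Example (requires ArviZ with Numba installed):
# from arviz.utils import Numba
# with numba_disabled(Numba):
#     geweke(data)  # runs with Numba off
# # Numba is on again here if it was on before
```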
In [9]:
data = np.random.randn(1000000)
smaller_data = np.random.randn(1000)
In [10]:
%timeit geweke(data)
In [11]:
%timeit geweke(smaller_data)
In [12]:
Numba.enable_numba()  # This re-enables Numba
Numba.numba_flag  # This indicates the status of Numba
Out[12]:
True
In [13]:
%timeit geweke(data)
In [14]:
%timeit geweke(smaller_data)
In [15]:
Numba.enable_numba()
Numba.numba_flag
Out[15]:
True
Numba speeds up geweke by roughly a factor of two. Let's check another function, ks_summary.
In [16]:
summary_data = np.random.randn(1000,100,10)
school = az.load_arviz_data("centered_eight").posterior["mu"].values
In [17]:
Numba.disable_numba()
Numba.numba_flag
Out[17]:
False
In [18]:
%timeit ks_summary(summary_data)
In [19]:
%timeit ks_summary(school)
In [20]:
Numba.enable_numba()
Numba.numba_flag
Out[20]:
True
In [21]:
%timeit ks_summary(summary_data)
In [22]:
%timeit ks_summary(school)
Numba has provided a substantial speedup once again.