Sum of squared deviations
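
For reference, the quantity computed below is the sum of squared deviations from the mean,

    SS = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i

Both implementations compute exactly this; the question is which form Numba compiles into faster code.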

TL;DR comparison of approaches: the explicit-loop version is modestly faster at every population size tested, and roughly twice as fast at one million elements (about 2.6 ms vs 5.7 ms best time).


In [2]:
from numba import jit
import numpy as np
import pandas as pd
from numpy import sum, power, mean  # NumPy's versions (note: this shadows the builtin sum), used inside the jitted functions
from matplotlib import pyplot as plt
%matplotlib inline
import matplotlib
matplotlib.rcParams['figure.figsize'] = (16, 8)

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [3]:
@jit(nopython=True)
def sum_sq_dev(x):
    # Vectorised version: NumPy ufuncs inside a nopython-compiled function
    return sum(power(x - mean(x), 2))

In [4]:
@jit(nopython=True)
def sum_sq_dev_experimental(x):
    # Explicit-loop version: accumulate the squared deviations one element at a time
    population_mean = mean(x)
    total = 0.0

    for value in x:
        total += power((value - population_mean), 2)

    return total

In [5]:
x = np.random.randn(1000)

In [6]:
np.testing.assert_almost_equal(sum_sq_dev(x), sum_sq_dev_experimental(x))  # Basic sanity test; also triggers the initial JIT compilation

In [7]:
results = []

for exponent in range(7):
    population_size = 10**exponent
    x = np.random.randn(population_size)
    # %timeit -o returns the TimeitResult object, so the best timing can be kept
    timings_v1 = %timeit -o sum_sq_dev(x)
    timings_v2 = %timeit -o sum_sq_dev_experimental(x)
    np.testing.assert_almost_equal(sum_sq_dev(x), sum_sq_dev_experimental(x))
    results.append((population_size, timings_v1.best, timings_v2.best))


The slowest run took 12.79 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 463 ns per loop
The slowest run took 10.28 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 346 ns per loop
The slowest run took 5.93 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 466 ns per loop
The slowest run took 11.84 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 334 ns per loop
The slowest run took 4.91 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 724 ns per loop
The slowest run took 7.88 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 551 ns per loop
The slowest run took 8.12 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 3.16 µs per loop
100000 loops, best of 3: 2.93 µs per loop
10000 loops, best of 3: 27.6 µs per loop
10000 loops, best of 3: 26 µs per loop
1000 loops, best of 3: 291 µs per loop
1000 loops, best of 3: 257 µs per loop
100 loops, best of 3: 5.69 ms per loop
100 loops, best of 3: 2.62 ms per loop

In [8]:
df = pd.DataFrame(np.array(results), columns=['population_size', 'sum_sq_dev', 'sum_sq_dev_experimental'])
df.population_size = df.population_size.astype(int)
df = df.set_index('population_size')
df = df * 1000  # convert best times from seconds to milliseconds, to match the axis label below
df

df.plot(logx=True);
plt.ylabel('best time (ms)');


Out[8]:
                 sum_sq_dev  sum_sq_dev_experimental
population_size
1                  0.000463                 0.000346
10                 0.000466                 0.000334
100                0.000724                 0.000551
1000               0.003164                 0.002926
10000              0.027611                 0.025958
100000             0.291489                 0.257195
1000000            5.694846                 2.621306

(best times in milliseconds)
[Figure: best time (ms) vs. population_size on a logarithmic x-axis for sum_sq_dev and sum_sq_dev_experimental]
