In [1]:
%matplotlib inline

import xyzpy as xyz
import numpy as np

Timing

Simple timing with Timer

Timer is a very simple context manager for roughly timing a block of code that runs once:


In [2]:
with xyz.Timer() as timer:
    
    A = np.random.randn(512, 512)
    el, ev = np.linalg.eig(A)
    
timer.interval


Out[2]:
0.4984438419342041

If you run this a few times, you might notice some big fluctuations.
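For example, a quick sketch (not part of the original notebook) that re-times the same statement a few times makes the spread visible:

# re-run the same block several times and collect the wall-clock intervals
intervals = []
for _ in range(5):
    with xyz.Timer() as timer:
        A = np.random.randn(512, 512)
        el, ev = np.linalg.eig(A)
    intervals.append(timer.interval)

intervals  # the individual timings can differ noticeably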

Advanced timing with benchmark

benchmark is a more advanced and accurate function that wraps timeit under the hood. It offers, however, a convenient interface that accepts callables directly and sensibly manages how many repeats to perform, etc.:


In [3]:
def setup(n=512):
    return np.random.randn(n, n)

def foo(A):
    return np.linalg.eig(A)

xyz.benchmark(foo, setup=setup)


Out[3]:
0.2674092039997049

Or we can specify the size n to benchmark with as well:


In [4]:
xyz.benchmark(foo, setup=setup, n=1024)


Out[4]:
1.481813181000689
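Judging by the benchmark_opts passed to Benchmarker below, benchmark also appears to accept keyword options such as min_t, the minimum total time to spend repeating the call; a hedged sketch:

# min_t is assumed here, based on the benchmark_opts={'min_t': ...} usage below
xyz.benchmark(foo, setup=setup, n=256, min_t=0.5)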

Comparing multiple kernels with Benchmarker

To compare several implementations of the same function against each other, across a range of problem sizes, we can use the Benchmarker class. Here we define three versions of a 'square sum' kernel, plus a setup function that generates their input:


In [5]:
import numba as nb

def setup(n):
    return np.random.randn(n)

def python_square_sum(xs):
    y = 0.0
    for x in xs:
        y += x**2
    return y**0.5

def numpy_square_sum(xs):
    return (xs**2).sum()**0.5

@nb.njit
def numba_square_sum(xs):
    y = 0.0
    for x in xs:
        y += x**2
    return y**0.5

The setup function will be supplied to each kernel; first we can check that they all give the same answer:


In [6]:
xs = setup(100)

In [7]:
python_square_sum(xs)


Out[7]:
10.259648966061379

In [8]:
numpy_square_sum(xs)


Out[8]:
10.259648966061379

In [9]:
numba_square_sum(xs)


Out[9]:
10.259648966061379
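Equivalently, a one-line check that all three implementations agree numerically:

# all three kernels should give (numerically) the same result
np.allclose(
    [python_square_sum(xs), numba_square_sum(xs)],
    numpy_square_sum(xs),
)

Now we can gather the kernels into a Benchmarker, supplying the shared setup function and any options for the underlying benchmark calls: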

In [10]:
kernels = [
    python_square_sum, 
    numpy_square_sum, 
    numba_square_sum,
]

benchmarker = xyz.Benchmarker(
    kernels, 
    setup=setup, 
    benchmark_opts={'min_t': 0.01}
)

Next we run a set of problem sizes:


In [11]:
sizes = [2**i for i in range(1, 11)]

benchmarker.run(sizes, verbosity=2)


{'n': 1024, 'kernel': 'numba_square_sum'}: 100%|##########| 30/30 [00:01<00:00, 16.94it/s]

Which we can then automatically plot:


In [12]:
benchmarker.ilineplot()


[Interactive Bokeh line plot of run time against problem size n for each kernel.]
Out[12]:
<xyzpy.plot.plotter_bokeh.ILinePlot at 0x7f3f184aa9e8>

Estimating statistics with RunningStatistics

RunningStatistics accumulates statistics (count, mean, variance, and so on) from a stream of values on the fly, without needing to store them. For example, we can sum random numbers until they reach a total, without knowing in advance how many we will need:


In [13]:
import random

stats = xyz.RunningStatistics()
total = 0.0

# don't know how many `x` we'll generate, and won't keep them
while total < 100:
    x = random.random()
    total += x
    
    stats.update(x)

We can now check a variety of information about the values generated:


In [14]:
print("             Count: {}".format(stats.count))
print("              Mean: {}".format(stats.mean))
print("          Variance: {}".format(stats.var))
print("Standard Deviation: {}".format(stats.std))
print(" Error on the mean: {}".format(stats.err))
print("    Relative Error: {}".format(stats.rel_err))


             Count: 199
              Mean: 0.5063529774965086
          Variance: 0.08943114739945406
Standard Deviation: 0.2990504094621073
 Error on the mean: 0.02119912146177349
    Relative Error: 0.041866291705413464
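These quantities are related in the usual way, which we can verify directly (a quick sketch, assuming the standard definitions of the standard error and relative error):

# err is the standard error of the mean; rel_err is that error relative to the mean
print(np.isclose(stats.err, stats.std / stats.count**0.5))
print(np.isclose(stats.rel_err, stats.err / stats.mean))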

We can also feed in a whole iterable of values at once with update_from_it:


In [15]:
xs = [random.random() for _ in range(10000)]

In [16]:
stats.update_from_it(xs)

In [17]:
stats.count


Out[17]:
10199

The relative error should now be much smaller:


In [18]:
stats.rel_err


Out[18]:
0.0057787049156377245
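As a sanity check (a sketch, not part of the original notebook), a fresh RunningStatistics built from the stored list alone should agree with numpy computed directly on the same data:

# the streaming mean over xs should match numpy's mean over the same values
check = xyz.RunningStatistics()
check.update_from_it(xs)
np.isclose(check.mean, np.mean(xs))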

Finally, estimate_from_repeats repeatedly calls a function, accumulating the results in a RunningStatistics object, until the relative error on the mean reaches a target rtol:


In [19]:
def rand_n_sum(n):
    return np.random.rand(n).sum()

stats = xyz.estimate_from_repeats(rand_n_sum, n=1000, rtol=0.0001, verbosity=1)


30971it [00:00, 61861.29it/s]
<RunningStatistics(mean=499.93003, err=0.05009, count=33469)>

We can then query the returned RunningStatistics object:


In [20]:
stats.mean


Out[20]:
499.9300279872804

In [21]:
stats.rel_err


Out[21]:
0.0001001975800490424

Which is in line with the requested rtol.
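Indeed, the relative error is just the error on the mean divided by the mean itself, which we can confirm:

# rel_err ~= err / mean, sitting right around the requested rtol of 0.0001
stats.err / stats.mean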