Understanding the outputs of timeit in Python

The objective of this notebook is to set up some examples to understand the output of timeit, and to work towards useful functions for quickly studying the performance of Python functions, particularly functions that take sequences as inputs, as a function of sequence size.

A lot of this notebook is based on trying to understand material from



In [4]:

    
from __future__ import absolute_import, print_function, division



In [1]:

    
import timeit
import time



In [2]:

    
import numpy as np
import pandas



In [3]:

    
import functools

Understanding the outputs

As explained in the Python docs (https://docs.python.org/2/library/timeit.html), the timeit.Timer object works on two strings:

a statement string: the code whose execution is to be timed. In our example we use a function statement() that returns a string of code which prints a message, so that we know when it is being executed, and then sleeps using time.sleep() with a default sleepTime of 1 sec.
a setup string: any code needed to set up the computation. In this example we use a function stp() that returns a string of code which prints a message indicating that setup is being run, and then sleeps for a default sleepTime of 3 sec.



In [5]:

    
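def statement(sleepTime=1, stmt=None):
    """
    Return a string of code that prints 'running stmt' and sleeps for sleepTime.

    Parameters
    ----------
    sleepTime : number of seconds to sleep, defaults to 1
    stmt : string of code to use instead of the default, defaults to None

    Returns
    -------
    string of code to pass to timeit as stmt
    """
    if stmt is None:
        stmt = 'print("running stmt"); time.sleep({})'.format(sleepTime)
    return stmt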



In [6]:

    
statement()









    Out[6]:





'print("running stmt"); time.sleep(1)'



In [7]:

    
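def stp(sleepTime=3, setup=None):
    """Return a string of code that prints 'running setup' and sleeps for sleepTime."""
    if setup is None:
        setup = 'print("running setup"); time.sleep({})'.format(sleepTime)
    # return outside the if block, so a user-supplied setup string is returned too
    return setup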



In [8]:

    
stp()









    Out[8]:





'print("running setup"); time.sleep(3)'

Very Basic Usage



In [11]:

    
tstart = time.time()
tt = timeit.Timer(stmt=statement(), setup=stp()) 
l = tt.timeit(number=2)
tend = time.time()
print('wall time elapsed from time.time()', tend - tstart)
print('timeit output: ', l)









    



running setup
running stmt
running stmt
wall time elapsed from time.time() 5.01104879379
timeit output:  2.0092151165

So, this is what happened:

The line instantiating the Timer object read in the statement and setup strings.
We ran it with number=2. This ran the setup code once (3 sec sleep) and the statement twice (2 x 1 sec sleep), for a total wall time of 5 seconds, as indicated by the wall time print statement.
The timeit output is a single number, about 2 seconds. It clearly does not include the time to run setup: it is the total time taken to run stmt number times, not an average, so dividing it by number gives the time per execution. This is a way of timing only the statement, and running it multiple times smooths out fluctuations between individual executions.
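
The bookkeeping above can be checked with much shorter sleeps. The following is a minimal sketch rather than a cell from the original notebook; the sleep lengths and the name per_call are arbitrary choices for illustration.

```python
import time
import timeit

number = 5
# setup sleeps 0.05 sec once; stmt sleeps 0.01 sec on each of the `number` runs
timer = timeit.Timer(stmt='time.sleep(0.01)',
                     setup='import time; time.sleep(0.05)')

wall_start = time.time()
total = timer.timeit(number=number)
wall = time.time() - wall_start

# timeit returns the total for all `number` runs, so divide to get one run
per_call = total / number

print('wall time     :', wall)      # includes the single setup run
print('timeit output :', total)     # excludes the setup run
print('time per call :', per_call)
```

The wall time should exceed the timeit output by roughly the length of the setup sleep.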



In [20]:

    
tstart = time.time()
l = timeit.timeit(stmt=statement(), setup=stp(), timer=timeit.default_timer, number=2)
tend = time.time()
print('wall time elapsed from time.time()', tend - tstart)
print('timeit output: ', l)









    



running setup
running stmt
running stmt
wall time elapsed from time.time() 5.0052011013
timeit output:  2.00321412086

Warning: Python gives the number argument a default of 1000000, so with a slow statement we should set it explicitly to a small value, or the run could take a very long time.
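
When a sensible value of number is not known in advance, newer Python versions can choose one. This is a minimal sketch assuming Python 3.6 or later, where Timer.autorange() is available (it does not exist in the Python 2 version of timeit linked above); the statement being timed is an arbitrary example.

```python
import timeit

timer = timeit.Timer(stmt='sum(range(100))')

# autorange() keeps increasing the loop count until the total time is
# at least 0.2 sec, then returns (loop count, total time for that count)
number, total = timer.autorange()

print('number chosen:', number)
print('time per call:', total / number)
```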

Using repeats



In [13]:

    
tstart = time.time()
x = np.array(tt.repeat(repeat=3, number=2) )
tend = time.time()
print(tend - tstart)
print(x)









    



running setup
running stmt
running stmt
running setup
running stmt
running stmt
running setup
running stmt
running stmt
15.0403218269
[ 2.00846004  2.01045108  2.00494385]

So, unlike number, each repeat is a complete rerun that executes both the setup and the statement. The results are returned as a list, with one total time per repeat.
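
The same behaviour can be confirmed without sleeping by counting calls. This sketch relies on the fact that Timer also accepts callables for stmt and setup; the counts dictionary and the two function names are hypothetical, chosen only for this illustration.

```python
import timeit

counts = {'setup': 0, 'stmt': 0}

def count_setup():
    counts['setup'] += 1

def count_stmt():
    counts['stmt'] += 1

times = timeit.Timer(stmt=count_stmt, setup=count_setup).repeat(repeat=3, number=2)

print(counts)      # setup ran once per repeat (3), stmt ran repeat * number times (6)
print(len(times))  # one total per repeat: 3
```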

For timing a deterministic piece of computation performed by the statement code, one would expect the minimum of these times to be closest to the true value, with all of the extra time coming from background processes running at the same time. So, ideally, I would like to be able to



In [24]:

    
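def timemyfunc(func, args=None, setup='pass', number=3, repeat=3):
    """Return the time per call of func(*args) for each of `repeat` runs."""
    if args is None:
        args = ()
    # Timer accepts a callable as stmt, so func and args do not have to be
    # visible by name inside timeit's own namespace.
    timer = timeit.Timer(stmt=functools.partial(func, *args), setup=setup)
    # Time through the Timer instance; the module-level timeit.repeat()
    # would time its default stmt='pass' instead.
    res = np.asarray(timer.repeat(repeat=repeat, number=number))
    return res / number



In [53]:

    
def square(num, val):
    x = np.arange(num)
    print(val)
    return x * x



In [51]:

    
timemyfunc(square, args=[1000, 'timing square'])

An earlier version of timemyfunc created the Timer but then called the module-level timeit.repeat() without a stmt, which times the default 'pass' statement. Called with args=[5e10000000000] and number=10000, that version returned array([  2.41041183e-08,   2.21014023e-08,   2.20060349e-08]): about 2e-8 sec per call, which is the cost of an empty statement. square was never actually called, which is also why an argument that overflows to float infinity did not raise an error.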
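
Following the earlier argument that the minimum is the most representative value, the per-repeat times can be reduced to a single figure. This is a minimal sketch; best_time_per_call is a hypothetical helper, and the call to sorted is only a stand-in workload. Wrapping the call in a lambda adds a small constant overhead to every measurement.

```python
import timeit

def best_time_per_call(func, args=(), number=3, repeat=3):
    """Hypothetical helper: smallest time per call of func(*args) over `repeat` runs."""
    timer = timeit.Timer(stmt=lambda: func(*args))
    totals = timer.repeat(repeat=repeat, number=number)
    # each entry of totals covers `number` calls, so divide the best one by number
    return min(totals) / number

best = best_time_per_call(sorted, args=([3, 1, 2],), number=1000)
print('best time per call:', best)
```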

Using functools to time variation with a single variable



In [52]:

    
import functools



In [56]:

    
functools.partial(square, val='time')(2)









    



time






    Out[56]:





array([0, 1])



In [57]:

    
functools.partial(square, 5)('statement')









    



statement






    Out[57]:





array([ 0,  1,  4,  9, 16])



In [58]:

    
functools.partial(square, num=5)









    Out[58]:





<functools.partial at 0x10fc7aa48>
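
Because a partial with all arguments frozen is a zero-argument callable, it can be handed directly to timeit.Timer, which is one way to time a function as a function of input size. The following is a sketch under stated assumptions: time_vs_size is a hypothetical helper, and squares is a pure-Python stand-in for square() above that neither prints nor needs numpy.

```python
import functools
import timeit

def squares(num, val=None):
    """Stand-in for square() above, without numpy or printing."""
    return [i * i for i in range(num)]

def time_vs_size(func, sizes, number=10, repeat=3):
    """Hypothetical helper: best time per call of func(size) for each size."""
    results = {}
    for size in sizes:
        # partial freezes the size, leaving a zero-argument callable for Timer
        timer = timeit.Timer(stmt=functools.partial(func, size))
        results[size] = min(timer.repeat(repeat=repeat, number=number)) / number
    return results

timings = time_vs_size(squares, sizes=[10, 100, 1000])
for size in sorted(timings):
    print(size, timings[size])
```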

Garbage Collection
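
One point that belongs under this heading comes from the timeit documentation rather than from this notebook: by default, timeit turns garbage collection off while it is timing, and it can be switched back on from the setup code. A minimal sketch; record_gc_state and gc_states are hypothetical names used only here.

```python
import gc
import timeit

gc_states = []

def record_gc_state():
    # note whether the garbage collector is enabled while stmt is running
    gc_states.append(gc.isenabled())

# default behaviour: timeit disables garbage collection during the timing
timeit.Timer(stmt=record_gc_state).timeit(number=1)

# re-enable garbage collection from the setup code, so it is on during the timing
timeit.Timer(stmt=record_gc_state, setup='import gc; gc.enable()').timeit(number=1)

print(gc_states)  # [False, True]
```

Whether to re-enable it depends on whether garbage collection is considered part of the cost of the code being measured.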



In [ ]: