Python 3.6 [conda env: PY36]

Performance Testing in iPython/Jupyter NBs

The timeit() command appears to have strict limitations in how you can use it within a Jupyter Notebook. For it to work most effectively:

  • organize the code to test in a function that returns a value
  • ensure it is not printing to screen or the code will print 1000 times (or however many times timeit() is configured to iterate)
  • make sure that timeit() is the only line in the test cell as shown in these examples
  • for more advanced uses of timeit() and its related functions, check the documentation. The library was created in Python 2 and is compatible with (and may have been updated in) Python 3.

To get around this limitation, examples are also provided using the %timeit, %%timeit, and %time magics.

To understand the abbreviations in the timeit, %timeit, and %time performance metrics, see this Wikipedia post. For additional research on performance testing and code time metrics: timing and profiling
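For comparison with the notebook magics, here is a minimal sketch of calling the standard library's timeit module directly. It assumes a toy function my_fun (a hypothetical stand-in for the code under test) and an arbitrary loop count of 1000; neither comes from the cells in this notebook.

```python
import timeit


def my_fun(x):
    # Hypothetical stand-in for the code under test: returns a value, prints nothing.
    return (x ** x) ** x


LOOPS = 1000  # arbitrary loop count chosen for this sketch

# timeit.timeit accepts a zero-argument callable and returns the TOTAL
# elapsed seconds for `number` calls, so divide to get a per-call figure.
total_seconds = timeit.timeit(lambda: my_fun(12), number=LOOPS)
per_call_us = total_seconds / LOOPS * 1e6

print("{:.2f} microseconds per call over {} loops".format(per_call_us, LOOPS))
```

Wrapping the call in a lambda defers it, so the function runs inside the timing loop instead of once before timing starts.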

Simple Example: timeit(), %time, %timeit, %%timeit.

The function here is something stupid and simple just to show how to use these capabilities ...


In [3]:
def myFun(x):
    return (x**x)**x

myFun(9)


Out[3]:
196627050475552913618075908526912116283103450944214766927315415537966391196809

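For this example, timeit() needs to be the only statement in the cell, with the code under test passed to it as a valid function call, as in this demo. (The "loops, best of 3" output format suggests IPython is treating the line as the %timeit magic through its automagic feature rather than calling the standard library's timeit.timeit.)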


In [12]:
timeit(myFun(12))


100000 loops, best of 3: 2.96 µs per loop

If this fails or throws errors, restart the kernel and re-run all prerequisite cells; the syntax should then work.


In [14]:
%timeit 10*1000000
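# this syntax allows comments ... note that %timeit is a line magic: it times only the expression on its own line (here 10*1000000)
# myFun(12) below is not timed; it runs once as ordinary code and produces the Out value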
myFun(12)


The slowest run took 38.09 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 38.5 ns per loop
Out[14]:
252405858452706802146088003199234910139421423537379794530169220964425944728647963794263559358200737721321118953128592183095980912780081856373786437365006336

If you get the 'slowest run took ...' message, try re-running the code cell; in the repeat run below the message no longer appears.
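The spread between runs can also be inspected directly. The sketch below assumes the standard library's timeit.repeat and a hypothetical stand-in function my_fun; it prints each repeat's total and the slowest-to-fastest ratio, which is the kind of comparison the message above reports. The repeat and loop counts are arbitrary.

```python
import timeit


def my_fun(x):
    # Hypothetical stand-in for the code under test.
    return (x ** x) ** x


# Five independent repeats of 1000 calls each; each entry is a total in seconds.
totals = timeit.repeat(lambda: my_fun(12), number=1000, repeat=5)

fastest, slowest = min(totals), max(totals)
print("per-repeat totals (s):", ["{:.6f}".format(t) for t in totals])
print("slowest / fastest ratio: {:.2f}".format(slowest / fastest))
```

A large ratio usually points to interference on one of the repeats (caching, other processes), which is why the fastest repeat is the one normally reported.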


In [15]:
%timeit 10*1000000
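# this syntax allows comments ... note that %timeit is a line magic: it times only the expression on its own line (here 10*1000000)
# myFun(12) below is not timed; it runs once as ordinary code and produces the Out value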
myFun(12)


10000000 loops, best of 3: 39 ns per loop
Out[15]:
252405858452706802146088003199234910139421423537379794530169220964425944728647963794263559358200737721321118953128592183095980912780081856373786437365006336

In [9]:
%%timeit
# this syntax allows comments ... the loop count defaults automatically, and the whole cell body is timed
myFun(12)


100000 loops, best of 3: 2.96 µs per loop

In [16]:
%time
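# %time reports "wall time" (elapsed clock time) for the statement on its own line
# with nothing after %time on that line, nothing is timed (hence 0 ns); myFun(12) below runs untimed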
myFun(12)


Wall time: 0 ns
Out[16]:
252405858452706802146088003199234910139421423537379794530169220964425944728647963794263559358200737721321118953128592183095980912780081856373786437365006336
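To make the wall-time versus CPU-time distinction concrete, here is a minimal sketch in plain Python. It assumes the standard library timers time.perf_counter (wall clock) and time.process_time (CPU time of the current process) and a hypothetical stand-in function my_fun; the sleep is added only to make the two figures diverge.

```python
import time


def my_fun(x):
    # Hypothetical stand-in for the code under test.
    return (x ** x) ** x


wall_start = time.perf_counter()   # wall-clock timer
cpu_start = time.process_time()    # CPU time used by this process

for _ in range(10000):
    my_fun(12)
time.sleep(0.1)  # sleeping adds wall time but almost no CPU time

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start

print("wall time: {:.4f} s".format(wall_elapsed))
print("CPU time:  {:.4f} s".format(cpu_elapsed))
```

The wall figure includes the 0.1 s spent sleeping, while the CPU figure mostly reflects the loop itself.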

In [24]:
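# attempting to get more detail using %time on a loop; the surrounding braces are not valid Python and cause the SyntaxError below
# (without the braces the line would read: %time for i in range(10*1000000): x=1)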
%time {for i in range(10*1000000): x=1}


  File "<unknown>", line 1
    {for i in range(10*1000000): x=1}
       ^
SyntaxError: invalid syntax

In [28]:
%timeit -n 1 10*1000000
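# -n 1 runs a single loop per repeat, which may be inaccurate due to random events
# note that only the expression on the %timeit line (10*1000000) is timed, not myFun(12)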
myFun(12)


1 loop, best of 3: 0 ns per loop
Out[28]:
252405858452706802146088003199234910139421423537379794530169220964425944728647963794263559358200737721321118953128592183095980912780081856373786437365006336

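Unlike timeit(), the other options shown here (IPython line and cell magics) can time any snippet of code within a Python cell: %time and %timeit time the statement that follows them on the same line, while %%timeit times the whole cell body.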


Symmetric Difference Example

This code from hackerrank shows progressively shorter snippets of code that find the symmetric difference between two sets. The symmetric difference of sets A and B is the set of values that appear in exactly one of the two sets (i.e., values in A not found in B plus the values in B not found in A). This code was written to accept 4 lines of input as per a www.hackerrank.com specification. The problem itself is also from www.hackerrank.com.

Performance tests are attempted, but the results are hard to interpret: each function blocks on input(), so variation in the time taken to enter the values can account for speed differences just as easily as any coding efficiency.


In [1]:
def find_symmetricDiff_inputSetsAB_v1():
    len_setA = int(input())
    set_A = set([int(i) for i in input().split()])
    len_setB = int(input())
    set_B = set([int(i) for i in input().split()])
    [print(val) for val in sorted(list(set_A.difference(set_B).union(set_B.difference(set_A))))]

def find_symmetricDiff_inputSetsAB_v2():
    setsLst = [0,0]
    for i in range(2):
        int(input())  # discard the set-size line ... the value is not needed
        setsLst[i] = set([int(tok) for tok in input().split()])  # tok avoids reusing the loop variable i
    [print(val) for val in sorted(list(setsLst[0].difference(setsLst[1]).union(setsLst[1].difference(setsLst[0]))))]

''' understanding next two versions:
    * key=int, applies int() to each value to be sorted so the values are sorted as 1,2,3 ... not: '1', '2', '3'
    * a^b is the same as a.symmetric_difference(b)
    
    these two come from discussion boards on hackerrank
'''
def find_symmetricDiff_inputSetsAB_v3():
    a,b = [set(input().split()) for _ in range(4)][1::2]
    return '\n'.join(sorted(a.symmetric_difference(b), key=int))

def find_symmetricDiff_inputSetsAB_v4():
    a,b = [set(input().split()) for _ in range(4)][1::2]
    return '\n'.join(sorted(a^b, key=int))
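The two notes in the docstring above can be checked on a small example. This sketch uses made-up sets of numeric strings (not the challenge inputs) to show that the long form used in v1 and v2, the symmetric_difference method, and the ^ operator agree, and what key=int changes when sorting.

```python
# Made-up sample sets of numeric strings, as produced by input().split()
a = {"1", "2", "3", "10"}
b = {"2", "3", "4"}

long_form = a.difference(b).union(b.difference(a))   # as in v1 and v2
method_form = a.symmetric_difference(b)               # as in v3
operator_form = a ^ b                                 # as in v4

print(long_form == method_form == operator_form)     # True

# Without key=int the strings sort character by character
print(sorted(operator_form))                          # ['1', '10', '4']

# With key=int they sort by numeric value but stay strings
print(sorted(operator_form, key=int))                 # ['1', '4', '10']
```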

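These tests use the following inputs. As per the requirements in the challenge problem, the four lines are, in order: the number of elements in set A, the space-separated elements of set A, the number of elements in set B, and the space-separated elements of set B: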

10
999 10001 574 39 12345678900100111 787878 999999 1000000000000000000008889934567 8989 1111111111111111111111110000009999999
5
999 10001 574 39 73277773377737373000000000000007777888
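One way to keep typing time out of the measurement is to pass the four lines in as data instead of reading them with input(). The sketch below is an assumption-laden illustration, not part of the hackerrank solution: symmetric_diff_from_lines is a hypothetical helper that reuses the v4 logic, and the loop count of 10000 is arbitrary.

```python
import timeit

# The four input lines from the text above, written as plain space-separated
# tokens (no commas), since int() cannot parse a token like "787878,".
LINES = [
    "10",
    "999 10001 574 39 12345678900100111 787878 999999 "
    "1000000000000000000008889934567 8989 1111111111111111111111110000009999999",
    "5",
    "999 10001 574 39 73277773377737373000000000000007777888",
]


def symmetric_diff_from_lines(lines):
    # Hypothetical helper: same logic as v4, but reads from a list, not input().
    a, b = [set(line.split()) for line in lines][1::2]
    return sorted(a ^ b, key=int)


result = symmetric_diff_from_lines(LINES)
print("\n".join(result))

# Only parsing and set logic are timed here; no keyboard input is involved.
total_seconds = timeit.timeit(lambda: symmetric_diff_from_lines(LINES), number=10000)
print("{:.2f} microseconds per call".format(total_seconds / 10000 * 1e6))
```

With the input fixed in advance, any difference between versions reflects the code itself and not how quickly the values were typed.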

In [2]:
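# sanity check: Python ints have arbitrary precision, so the large test values are represented exactly
i1 = int(1000000000000000000008889934567)
i2 = int(73277773377737373000000000000007777888)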
print(i1)
print(i2)


1000000000000000000008889934567
73277773377737373000000000000007777888

In [30]:
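%timeit -n 1 10*1000000
# note: only the expression on the %timeit line is timed; the call below runs once, untimed, and waits on input()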
find_symmetricDiff_inputSetsAB_v1()


1 loop, best of 3: 0 ns per loop
10
1 2 3 4 6 7 8 9 10001
5
79 65 8 9 1
2
3
4
6
7
65
79
10001

In [ ]:
# timeit(find_symmetricDiff_inputSetsAB_v1(), 1)