Overhead of Functions and Coroutines

Plain function calls


In [1]:
def test(a):
    return a

%timeit test(1)


10000000 loops, best of 3: 75.9 ns per loop

So we get roughly ten million function calls per second, i.e. a few hundred CPU clock cycles per call. This is about 100 times slower than what low-level languages achieve. CPython does use the C stack for function calls and starts a new interpreter loop for each call, but the actual Python state (locals, value stack, instruction pointer) lives in a separate heap-allocated frame object.

Plain list

This serves as a baseline for the generator timings below: summing numbers from a precomputed list should be faster.


In [2]:
numbers = list(range(1000))

%timeit sum(numbers)


100000 loops, best of 3: 8.11 µs per loop

Generator


In [3]:
def generator_func():
    i = 0
    while i < 1000:
        i += 1
        yield i

%timeit sum(generator_func())


10000 loops, best of 3: 108 µs per loop

So while this is more than ten times slower than summing the list, we still manage about ten million iterations per second, similar to the plain function-call rate above. Under the hood, CPython keeps the generator's frame object and reuses it on every resumption, but a new C stack frame is pushed each time.
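The frame reuse is easy to observe directly through the generator's `gi_frame` attribute. A small sketch, reusing `generator_func` from above:

```python
def generator_func():
    i = 0
    while i < 1000:
        i += 1
        yield i

gen = generator_func()
next(gen)
frame_before = gen.gi_frame
next(gen)
# The same frame object is resumed on every next() call
assert gen.gi_frame is frame_before
```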

What about nesting generators?


In [4]:
def generator_wrapper(inner_iterator):
    yield from inner_iterator

%timeit sum(generator_wrapper(generator_wrapper(generator_func())))


10000 loops, best of 3: 177 µs per loop

So while nesting added some overhead, the effect is not dramatic (we still manage about five million iterations per second).
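For comparison, delegation can also be spelled as a plain for loop; `yield from` is the idiomatic form and additionally forwards `send()` and `throw()` to the inner generator. A sketch, reusing `generator_func` from above:

```python
def generator_func():
    i = 0
    while i < 1000:
        i += 1
        yield i

def yield_from_wrapper(inner):
    yield from inner

def manual_wrapper(inner):
    # Equivalent for plain iteration, but it does not forward
    # send()/throw(), and it re-enters this frame on every item.
    for x in inner:
        yield x

assert sum(yield_from_wrapper(generator_func())) == sum(manual_wrapper(generator_func()))
```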

Plain function calls in a generator


In [5]:
def test_func(i):
    return i + 1;

def generator_calling_func():
    i = 0
    while i < 1000:
        i = test_func(i)
        yield i
        
%timeit sum(generator_calling_func())


10000 loops, best of 3: 184 µs per loop

asyncio overhead

The cost of switching to another thread is hard to estimate, but a rough figure is $30 \mu s$. If entering a generator takes $100ns$, then one thread switch costs as much as about 300 generator resumptions.
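That $30 \mu s$ figure can be sanity-checked with a rough ping-pong benchmark between two threads, where each round trip involves two context switches. A sketch only; absolute numbers vary widely by OS and hardware:

```python
import threading
import time

# Two threads alternately signal each other via Events;
# each round trip costs two thread switches.
N = 1000
a, b = threading.Event(), threading.Event()

def worker():
    for _ in range(N):
        a.wait(); a.clear()
        b.set()

t = threading.Thread(target=worker)
t.start()
start = time.perf_counter()
for _ in range(N):
    a.set()
    b.wait(); b.clear()
elapsed = time.perf_counter() - start
t.join()
print(f"{elapsed / (2 * N) * 1e6:.2f} µs per switch")
```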


In [6]:
import asyncio
loop = asyncio.get_event_loop()

In [7]:
async def counter():
    sum = 0
    i = 0
    while i < 1000:
        i = await get_next(i)
        sum += i
    return sum

async def get_next(i):
    return i + 1
    
%timeit loop.run_until_complete(counter())


1000 loops, best of 3: 468 µs per loop

So each iteration takes about $500ns$, meaning the event loop can perform about two million iterations per second.
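Part of that overhead is event-loop bookkeeping rather than the coroutine protocol itself. A coroutine that never awaits anything can be driven by hand with `send(None)`: it completes in one step and raises `StopIteration` carrying the return value. A minimal sketch:

```python
async def get_next(i):
    return i + 1

def drive(coro):
    # A coroutine with no awaits finishes on the first send(None),
    # raising StopIteration whose .value is the return value.
    try:
        coro.send(None)
    except StopIteration as e:
        return e.value

print(drive(get_next(1)))  # 2
```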