In [1]:
def test(a):
    return a
%timeit test(1)
So we get about 10M function calls per second, i.e. a few hundred CPU clock cycles per call. This is roughly 100 times slower than what can be achieved in low-level languages. CPython does use the C stack for function calls and spins up a new interpreter loop for each call. However, it keeps the actual Python state (locals, value stack, instruction pointer) in a separate heap-allocated frame object.
Let's use this as a baseline for the comparison with generators (reading the numbers from a precomputed list should be faster than generating them on the fly...).
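We can make the frame object visible with `sys._getframe()`, a CPython-specific introspection API: each call gets its own frame object, even though they all share the same code object. A minimal sketch:

```python
import sys

def make_frame():
    # sys._getframe() returns the frame object for the current call
    # (CPython-specific API).
    return sys._getframe()

# Each call allocates a distinct frame object holding the locals,
# value stack, and instruction pointer for that activation.
f1 = make_frame()
f2 = make_frame()
print(f1 is f2)          # False: one frame per call
print(f1.f_code is f2.f_code)  # True: both frames run the same code object
```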
In [2]:
numbers = list(range(1000))
%timeit sum(numbers)
In [3]:
def generator_func():
    i = 0
    while i < 1000:
        i += 1
        yield i
%timeit sum(generator_func())
So while this is ten times slower than summing the list, we still manage about 10M iterations per second (similar to the number of plain function calls). Under the hood, CPython keeps the generator's frame object and reuses it every time the generator is resumed, but a new C stack frame is still pushed on each resumption.
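The frame reuse is observable through the generator's `gi_frame` attribute: unlike plain function calls, successive resumptions run in the very same frame object. A small illustration:

```python
def gen():
    for i in range(3):
        yield i

g = gen()
next(g)
frame_first = g.gi_frame   # frame object of the suspended generator
next(g)
frame_second = g.gi_frame  # same object, resumed rather than recreated
print(frame_first is frame_second)  # True
```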
What about nesting generators?
In [4]:
def generator_wrapper(inner_iterator):
    yield from inner_iterator
%timeit sum(generator_wrapper(generator_wrapper(generator_func())))
So while nesting adds some overhead, the effect is not dramatic: we still manage about five million iterations per second.
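For plain iteration, `yield from inner` behaves like a loop that re-yields each item (the real `yield from` additionally forwards `send()`/`throw()` and the inner return value, which the loop does not). A sketch of the equivalent loop-based wrapper:

```python
def generator_wrapper_loop(inner_iterator):
    # Loop-based stand-in for `yield from inner_iterator`,
    # sufficient for one-way iteration like sum().
    for item in inner_iterator:
        yield item

nested = generator_wrapper_loop(generator_wrapper_loop(iter(range(1, 1001))))
print(sum(nested))  # 500500, same as the yield-from version over 1..1000
```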
In [5]:
def test_func(i):
    return i + 1
def generator_calling_func():
    i = 0
    while i < 1000:
        i = test_func(i)
        yield i
%timeit sum(generator_calling_func())
The cost of switching to another OS thread is hard to estimate, but a rough figure is $30\,\mu s$ per switch. If entering a generator takes $100\,ns$, then the time spent on one thread switch is enough for about 300 generator resumptions.
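One way to get a rough number for your own machine is a ping-pong between two threads; this is a sketch (not a rigorous benchmark) where each round trip involves two thread switches signalled via `threading.Event`:

```python
import threading
import time

N = 1000
ev_main = threading.Event()    # signals: main -> partner
ev_partner = threading.Event() # signals: partner -> main

def partner():
    for _ in range(N):
        ev_main.wait()
        ev_main.clear()
        ev_partner.set()

t = threading.Thread(target=partner)
t.start()
start = time.perf_counter()
for _ in range(N):
    ev_main.set()       # hand control to the partner thread
    ev_partner.wait()   # wait until it hands control back
    ev_partner.clear()
t.join()
elapsed = time.perf_counter() - start
# Each round trip contains two thread switches plus Event overhead.
print(f"~{elapsed / N * 1e6:.1f} us per round trip")
```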
In [6]:
import asyncio
loop = asyncio.get_event_loop()
In [7]:
async def counter():
    total = 0
    i = 0
    while i < 1000:
        i = await get_next(i)
        total += i
    return total
async def get_next(i):
    return i + 1
%timeit loop.run_until_complete(counter())
So each iteration takes about $500\,ns$, and the event loop can perform about 2 million iterations per second.