This is code for Python versions >= 3.3.
Iterator objects in Python provide a __next__ method. When the iteration has reached the end, this is signaled by raising a StopIteration exception.
In [1]:
class TestIterator:
    def __init__(self, max_value):
        self._current_value = 0
        self._max_value = max_value

    def __next__(self):
        self._current_value += 1
        if self._current_value > self._max_value:
            raise StopIteration()
        return self._current_value
When you perform the iteration manually, you should use the builtin next function to call the magic __next__ method.
In [2]:
iterator = TestIterator(3)
try:
    while True:
        print(next(iterator))
except StopIteration:
    pass
Of course you can also use a standard for-loop. However, the for-loop actually expects to be given a so-called iterable object, not an iterator:
In [3]:
for i in TestIterator(3):  # raises TypeError: 'TestIterator' object is not iterable
    print(i)
The same is the case for the list constructor.
In [4]:
list(TestIterator(3))  # raises TypeError as well
Iterables are defined by having an __iter__ method that returns an iterator.
In [5]:
class TestIterable:
    def __init__(self, max_value):
        self._max_value = max_value

    def __iter__(self):
        return TestIterator(self._max_value)
Now we can finally use the standard for-loop:
In [6]:
for i in TestIterable(3):
    print(i)
This is convenient, because all the standard container classes are iterable. So you can directly put them into a for-loop or list constructor, without having to manually create an iterator first.
In [7]:
for i in [1, 2, 3]:
    print(i)
Usually one therefore does not have to use the __iter__ method manually. But if you do, use the builtin iter function instead.
In [8]:
test_iterable = TestIterable(3)
test_iterator = iter(test_iterable)
print(test_iterator)  # a TestIterator instance
It would be annoying (and quite surprising) to not be able to use iterators with for-loops. Therefore iterators in Python must include an __iter__ method as well, returning the iterator itself.
In [9]:
class RealTestIterator(TestIterator):
    def __iter__(self):
        return self
We can now use this iterator as expected. When the for-loop applies the iter function, this works and has no effect on the iterator.
In [10]:
for i in RealTestIterator(3):
    print(i)
But there is an important semantic difference between the __iter__ of iterables and iterators: iterables provide a fresh iterator object on each call and can therefore be iterated over multiple times. Iterators, on the other hand, are spent after the first iteration.
In [11]:
iterator = RealTestIterator(3)
for i in iterator:
    print(i)
for i in iterator:
    # iterator directly raises StopIteration, so this is never reached
    print(i)
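For contrast, the TestIterable from above can be iterated over multiple times, because each loop gets a fresh iterator:

iterable = TestIterable(3)
for i in iterable:
    print(i)
for i in iterable:
    # __iter__ returns a fresh TestIterator, so this loop prints 1, 2, 3 again
    print(i)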
This can cause subtle bugs and is actually a nice example of the pitfalls of duck typing. One possible way to safeguard against this is by testing the semantics of __iter__:
In [12]:
def is_iterator(it):
    return iter(it) is it

print(is_iterator(RealTestIterator(3)))
print(is_iterator(TestIterable(3)))
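The same check also works for the builtin container types; a quick sketch, reusing the is_iterator function from above:

print(is_iterator([1, 2, 3]))        # False: a list is an iterable, not an iterator
print(is_iterator(iter([1, 2, 3])))  # True: iter() returns a list_iterator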
Every function that contains a yield keyword is a generator function. A generator function returns a generator object, which is a special case of an iterator (i.e., an object with a __next__ method and an __iter__ method that returns self).
In [13]:
def test():
    yield 1
    yield 2

print(test)
print(test())
The iteration can be performed using the standard iterator API.
In [14]:
t = test()
try:
    while True:
        print(next(t))
except StopIteration:
    print('done')
A generator object can be used anywhere an iterator is supported, e.g., in for-loops.
In [15]:
for i in test():
    print(i)
Python 2.5 added the ability not only to get data out of a generator, but also to send data into it: yield turned from a statement into an expression. Functions that use this feature are called coroutines.
In [16]:
def test():
    x = yield 1
    yield x**2

t = test()
print(next(t))  # go to the first yield
print(t.send(3))
Note that next(t) is equivalent to t.send(None).
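A minimal sketch, reusing the coroutine from above, demonstrates the equivalence:

t = test()
print(t.send(None))  # behaves exactly like next(t): runs to the first yield
print(t.send(3))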
Forwarding an iterator is easy:
In [17]:
def test():
    yield 1
    yield 2

def wrapper():
    for i in test():
        yield i

for i in wrapper():
    print(i)
Doing the same with a coroutine, on the other hand, is quite hard (see PEP 380), so Python 3.3 introduced yield from. Wrapping/forwarding coroutines with yield from is easy. This is important, for example, if you want to refactor a coroutine by extracting a sub-coroutine.
In [18]:
def test():
    x = yield 1
    yield x**2

def wrapper():
    yield from test()

w = wrapper()
print(next(w))
print(w.send(3))
The same PEP also introduced return statements with a value in generators; the return value is transported via the StopIteration exception.
In [19]:
def test():
    for i in range(3):
        yield i
    return 'done'

for i in test():
    print(i)
In [20]:
t = test()
try:
    while True:
        print(next(t))
except StopIteration as e:
    print(e.value)
The return value also becomes the value of the yield from expression:
In [21]:
def wrapper():
    value = yield from test()
    print('wrapper got:', value)
    return 'wrapper done'

for i in wrapper():
    print(i)
So yield from transparently pipes through the iterations and provides the end result value.
In older versions of Python the loop variables of list comprehensions would leak out into the enclosing scope. In Python 3 this is no longer the case:
In [22]:
[xy for xy in range(3)]
xy  # raises NameError: the comprehension variable does not leak out
List comprehensions now have their own execution context, just like functions and generator expressions.
In [23]:
(xy for xy in range(3))
xy  # raises NameError here as well
A side effect of this is that a yield expression in some parts of a list comprehension causes it to evaluate to a generator object.
In [24]:
[i for i in range(3) if (yield i)]
This can be surprising at first.
In [25]:
set([i**2 for i in range(3) if (yield i)])
In [26]:
set([(yield i**2) for i in range(3)])
Only the iterable of the outermost for clause is not affected by this: it is evaluated in the enclosing scope, so a yield in this part of the list comprehension works as normally expected (i.e., it refers to the surrounding generator function).
In [27]:
def g():
    return [i for i in (yield range(3))]

next(g())
Generator expressions have always behaved as described above (since they are evaluated lazily, they always had to have their own execution context).
In [28]:
set(i**2 for i in range(3) if (yield i))
Set and dict comprehensions of course act just like list comprehensions.
In [29]:
{i**2 for i in range(3) if (yield i)}
In [30]:
{i: i**2 for i in range(3) if (yield i)}
With yield from we get the same behavior as with yield.
In [31]:
[i for i in range(3) if (yield from i)]
In [32]:
set([i for i in range(3) if (yield from i)])
A generator can be exited explicitly by raising StopIteration. Unfortunately it doesn't matter from where this exception is raised: it might come from another iteration inside a nested function call and not be caught properly.
In [33]:
import unittest.mock as mock

m = mock.Mock(side_effect=[1, 2])

def test():
    yield m()
    yield m()
    yield m()  # the third call to m() raises StopIteration

for i in test():
    print(i)
So a simple error in setting up your mocks can silently cause an unexpected early abort of your asynchronous test code! (Since Python 3.7, PEP 479 mitigates this: a StopIteration leaking out of a generator body is replaced by a RuntimeError.)
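From Python 3.5 on, the stricter PEP 479 behavior can be requested explicitly with a future import (it became the default in Python 3.7); a minimal sketch:

from __future__ import generator_stop  # the default from Python 3.7 on

import unittest.mock as mock

m = mock.Mock(side_effect=[1, 2])

def test():
    yield m()
    yield m()
    yield m()  # the leaked StopIteration is turned into a RuntimeError

try:
    for i in test():
        print(i)
except RuntimeError:
    print('caught the leaked StopIteration')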
As a counterpart to StopIteration, you can signal a generator from the outside that it should finish. This is done by calling close() on the generator, which will raise a GeneratorExit exception inside it.
In [34]:
def test():
    try:
        i = 1
        while True:
            yield i
            i += 1
    except GeneratorExit:
        print('done')
    print('bye')

t = test()
print(next(t))
print(next(t))
t.close()
try:
    print(next(t))
except StopIteration:
    print('no more values')
Catching the GeneratorExit is not really necessary here. But if the generator holds any resources that need cleanup, one can use a try ... finally or a context manager to perform it (see the sketch after the next example).
In [35]:
def test():
    i = 1
    while True:
        yield i
        i += 1

t = test()
print(next(t))
print(next(t))
t.close()
try:
    print(next(t))
except StopIteration:
    print('no more values')
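Here is a minimal sketch of the try ... finally variant mentioned above; release_resource is a hypothetical placeholder for whatever cleanup the generator needs:

def release_resource():
    print('resource released')  # stands in for real cleanup work

def test():
    try:
        i = 1
        while True:
            yield i
            i += 1
    finally:
        # runs both on normal exhaustion and when close() raises GeneratorExit
        release_resource()

t = test()
print(next(t))
t.close()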
Yielding values after the exception was raised is not supported.
In [36]:
def test():
    try:
        i = 1
        while True:
            yield i
            i += 1
    except GeneratorExit:
        print('done')
        yield 'just one more value'

t = test()
print(next(t))
print(next(t))
t.close()  # raises RuntimeError: generator ignored GeneratorExit
Note that throwing the GeneratorExit exception manually does not have the same effect as calling close(): close() raises a RuntimeError if the generator yields another value, while throw() simply returns the next yielded value and leaves the generator running.
In [37]:
def test():
    try:
        i = 1
        while True:
            yield i
            i += 1
    except GeneratorExit:
        print('done')
        yield 'one more value'
        yield 'and another one'

t = test()
print(next(t))
print(next(t))
print(t.throw(GeneratorExit()))
print(next(t))