This is code for Python versions >= 3.3.

Iterables and Iterators in Python

Iterators

Iterator objects in Python provide a __next__ method that returns the next element. The end of the iteration is signaled by raising a StopIteration exception.


In [1]:
class TestIterator:
    
    def __init__(self, max_value):
        self._current_value = 0
        self._max_value = max_value
    
    def __next__(self):
        self._current_value += 1
        if self._current_value > self._max_value:
            raise StopIteration()
        return self._current_value

When you perform the iteration manually, use the builtin next function instead of calling the magic __next__ method directly.


In [2]:
iterator = TestIterator(3)
try:
    while True:
        print(next(iterator))
except StopIteration:
    pass


1
2
3

Of course you can also use a standard for-loop. However, the for-loop actually expects to be given a so-called iterable object, not an iterator.


In [3]:
for i in TestIterator(3):
    print(i)


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-f0bcb3795080> in <module>()
----> 1 for i in TestIterator(3):
      2     print(i)

TypeError: 'TestIterator' object is not iterable

The same is true for the list constructor.


In [4]:
list(TestIterator(3))


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-83ca34441b54> in <module>()
----> 1 list(TestIterator(3))

TypeError: 'TestIterator' object is not iterable

Iterables

Iterables are defined by having an __iter__ method that returns an iterator.


In [5]:
class TestIterable:
    
    def __init__(self, max_value):
        self._max_value = max_value
        
    def __iter__(self):
        return TestIterator(self._max_value)

Now we can finally use the standard for-loop:


In [6]:
for i in TestIterable(3):
    print(i)


1
2
3

This is convenient, because all the standard container classes are iterable. So you can put them directly into a for-loop or the list constructor without having to manually create an iterator first.


In [7]:
for i in [1, 2, 3]:
    print(i)


1
2
3

Usually one therefore does not have to call the __iter__ method manually. But if you do, use the builtin iter function instead.


In [8]:
test_iterable = TestIterable(3)
test_iterator = iter(test_iterable)
print(test_iterator)


<__main__.TestIterator object at 0x10f44dc50>

It would be annoying (and quite surprising) to not be able to use iterators with for-loops. Therefore iterators in Python must provide an __iter__ method as well, returning the iterator itself.


In [9]:
class RealTestIterator(TestIterator):
    
    def __iter__(self):
        return self

We can now use this iterator as expected: when the for-loop applies the iter function, the iterator simply returns itself.


In [10]:
for i in RealTestIterator(3):
    print(i)


1
2
3

But there is an important semantic difference between the __iter__ of iterables and iterators: iterables provide a fresh iterator object on each call and can therefore be iterated over multiple times. Iterators on the other hand are spent after the first iteration.


In [11]:
iterator = RealTestIterator(3)
for i in iterator:
    print(i)
for i in iterator:
    # iterator directly raises StopIteration, so this is never reached
    print(i)


1
2
3
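
For comparison, the iterable from above can be traversed any number of times, because each call to iter produces a fresh iterator. A self-contained sketch (repeating the classes, with the iterator including __iter__ as in RealTestIterator):

```python
# Self-contained repetition of the classes from above.
class TestIterator:
    def __init__(self, max_value):
        self._current_value = 0
        self._max_value = max_value

    def __iter__(self):
        return self

    def __next__(self):
        self._current_value += 1
        if self._current_value > self._max_value:
            raise StopIteration()
        return self._current_value


class TestIterable:
    def __init__(self, max_value):
        self._max_value = max_value

    def __iter__(self):
        # a fresh iterator on every call
        return TestIterator(self._max_value)


iterable = TestIterable(3)
print(list(iterable))  # [1, 2, 3]
print(list(iterable))  # [1, 2, 3] again -- the iterable is not spent
```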

This can cause subtle bugs and is actually a nice example of the pitfalls of duck typing. One possible way to safeguard against this is to test the semantics of __iter__:


In [12]:
def is_iterator(it):
    return iter(it) is it

print(is_iterator(RealTestIterator(3)))
print(is_iterator(TestIterable(3)))


True
False

Generators

Generator Basics in Python

Every function that contains a yield keyword is a generator function. A generator function returns a generator object, which is a special case of an iterator (i.e., an object with a __next__ method and an __iter__ method that returns self).


In [13]:
def test():
    yield 1
    yield 2
    
print(test)
print(test())


<function test at 0x10f496950>
<generator object test at 0x10f48cdc8>

The iteration can be performed using the standard iterator API.


In [14]:
t = test()
try:
    while True:
        print(next(t))
except StopIteration:
    print('done')


1
2
done

A generator object can be used anywhere an iterator is supported, e.g., in for-loops.


In [15]:
for i in test():
    print(i)


1
2

Generators as Coroutines

Python 2.5 added the ability to not only get data from a generator, but also to send data to it. yield turned from a statement into an expression. Functions that use this feature are called coroutines.


In [16]:
def test():
    x = yield 1
    yield x**2
    
t = test()
print(next(t))  # go to the first yield
print(t.send(3))


1
9

Note that next(t) is equivalent to t.send(None).
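
This also means a coroutine must first be primed with next (or send(None)) before a real value can be sent in. A small sketch (using a coroutine shaped like the one above):

```python
def coro():
    x = yield 1
    yield x ** 2

t = coro()
print(t.send(None))  # equivalent to next(t): advance to the first yield -> 1
print(t.send(3))     # resume; the yield expression evaluates to 3 -> 9

# Sending a non-None value into a just-started coroutine is an error.
t2 = coro()
try:
    t2.send(3)
except TypeError as e:
    print(e)
```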

Forwarding an iterator is easy:


In [17]:
def test():
    yield 1
    yield 2
    
def wrapper():
    for i in test():
        yield i
        
for i in wrapper():
    print(i)


1
2

Doing the same with a coroutine on the other hand is quite hard (see PEP 380), so Python 3.3 introduced yield from.

yield from

Wrapping/forwarding coroutines with yield from is easy. This is, for example, important if you want to refactor a coroutine by extracting a sub-coroutine.


In [18]:
def test():
    x = yield 1
    yield x**2
    
def wrapper():
    yield from test()
    
w = wrapper()
print(next(w))
print(w.send(3))


1
9

The same PEP also introduced return statements with values in generators; the return value is transported via the StopIteration exception.


In [19]:
def test():
    for i in range(3):
        yield i
    return 'done'

for i in test():
    print(i)


0
1
2

In [20]:
t = test()
try:
    while True:
        print(next(t))
except StopIteration as e:
    print(e.value)


0
1
2
done

The return value also becomes the value of yield from:


In [21]:
def wrapper():
    value = yield from test()
    print('wrapper got:', value)
    return 'wrapper done'

for i in wrapper():
    print(i)


0
1
2
wrapper got: done

So yield from transparently pipes through the iterations and provides the end result value.
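
Besides values passed in via send, yield from also forwards exceptions thrown into the wrapper down to the inner generator. A minimal sketch:

```python
def inner():
    try:
        yield 1
        yield 2
    except ValueError:
        yield 'caught in inner'

def wrapper():
    yield from inner()

w = wrapper()
print(next(w))              # 1
# The exception is delivered where inner is suspended; inner handles it
# and its next yielded value becomes the result of throw.
print(w.throw(ValueError))  # 'caught in inner'
```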

More random info about Generators

Yield and List Comprehensions (or Generator Expressions)

In older versions of Python the variables in list comprehensions would leak out. In Python 3 this is no longer the case:


In [22]:
[xy for xy in range(3)]
xy


---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-22-cbb0562a8cb4> in <module>()
      1 [xy for xy in range(3)]
----> 2 xy

NameError: name 'xy' is not defined

List comprehensions now have their own execution context, just like functions and generator expressions.


In [23]:
(xy for xy in range(3))
xy


---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-23-b71fcbdacfb2> in <module>()
      1 (xy for xy in range(3))
----> 2 xy

NameError: name 'xy' is not defined

A side effect of this is that a yield statement in some parts of a list comprehension causes it to evaluate to a generator object.


In [24]:
[i for i in range(3) if (yield i)]


Out[24]:
<generator object <listcomp> at 0x10f498b88>

This can be surprising at first.


In [25]:
set([i**2 for i in range(3) if (yield i)])


Out[25]:
{0, 1, 2}

In [26]:
set([(yield i**2) for i in range(3)])


Out[26]:
{0, 1, 4}

Only the expression list part is not affected by this. A yield statement in this part of the list comprehension works as normally expected (i.e., it refers to the surrounding generator function).


In [27]:
def g():
    return [i for i in (yield range(3))]

next(g())


Out[27]:
range(0, 3)

Generator expressions have always behaved as described above (since they are executed lazily, they always had to store their own context).


In [28]:
set(i**2 for i in range(3) if (yield i))


Out[28]:
{0, 1, 2}

Set and dict comprehensions of course act just like list comprehensions.


In [29]:
{i**2 for i in range(3) if (yield i)}


Out[29]:
<generator object <setcomp> at 0x10f48c990>

In [30]:
{i: i**2 for i in range(3) if (yield i)}


Out[30]:
<generator object <dictcomp> at 0x10f48cc60>

With yield from we get the same behavior as with yield.


In [31]:
[i for i in range(3) if (yield from i)]


Out[31]:
<generator object <listcomp> at 0x10f48cfc0>

In [32]:
set([i for i in range(3) if (yield from i)])


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-32-483e50c20598> in <module>()
----> 1 set([i for i in range(3) if (yield from i)])

<ipython-input-32-483e50c20598> in <listcomp>(.0)
----> 1 set([i for i in range(3) if (yield from i)])

TypeError: 'int' object is not iterable

Beware of StopIteration

A generator can be exited explicitly by raising StopIteration. Unfortunately it does not matter from where this is raised: it might come from another iteration inside a nested function that is not caught properly.


In [33]:
import unittest.mock as mock
m = mock.Mock(side_effect=[1, 2])

def test():
    yield m()
    yield m()
    yield m()

for i in test():
    print(i)


1
2

So a simple error in setting up your mocks can silently cause an unexpected early termination in your asynchronous test code!
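
Note that this silent abort is specific to older Pythons: since Python 3.7 (PEP 479, opt-in from 3.5 via from __future__ import generator_stop), a StopIteration reaching a generator's body is converted into a RuntimeError, making this class of bug visible. A sketch of the behavior on a recent Python:

```python
def gen():
    yield 1
    # On Python >= 3.7 a StopIteration raised inside a generator body
    # is converted to a RuntimeError (PEP 479) instead of silently
    # ending the iteration.
    raise StopIteration()
    yield 2

try:
    print(list(gen()))
except RuntimeError as e:
    print('RuntimeError:', e)
```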

GeneratorExit, close and throw

As a counterpart to StopIteration, you can signal a generator from the outside that it should finish. This is done by calling close() on the generator, which raises a GeneratorExit exception inside the generator.


In [34]:
def test():
    try:
        i = 1
        while True:
            yield i
            i += 1
    except GeneratorExit:
        print('done')
    print('bye')
    
t = test()
print(next(t))
print(next(t))
t.close()
try:
    print(next(t))
except StopIteration:
    print('no more values')


1
2
done
bye
no more values

Catching the GeneratorExit is not really necessary here. But if the generator holds any resources that need cleanup, one can use a try ... finally block or a context manager to perform it.


In [35]:
def test():
    i = 1
    while True:
        yield i
        i += 1
    
t = test()
print(next(t))
print(next(t))
t.close()
try:
    print(next(t))
except StopIteration:
    print('no more values')


1
2
no more values
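
For instance, a generator can release a resource in a finally clause; the clause runs on normal exhaustion, on close, and when the generator is garbage-collected. A sketch with a list standing in for a real resource:

```python
def counting(cleanup_log):
    cleanup_log.append('acquired')  # stand-in for acquiring a real resource
    try:
        i = 1
        while True:
            yield i
            i += 1
    finally:
        # reached via GeneratorExit when close() is called
        cleanup_log.append('released')

log = []
t = counting(log)
print(next(t))  # 1
print(next(t))  # 2
t.close()       # GeneratorExit propagates through the finally clause
print(log)      # ['acquired', 'released']
```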

Yielding further values after the GeneratorExit exception was raised is not supported.


In [36]:
def test():
    try:
        i = 1
        while True:
            yield i
            i += 1
    except GeneratorExit:
        print('done')
    yield 'just one more value'
        
    
t = test()
print(next(t))
print(next(t))
t.close()


1
2
done
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-36-059928dd7e89> in <module>()
     13 print(next(t))
     14 print(next(t))
---> 15 t.close()

RuntimeError: generator ignored GeneratorExit

Note that throwing the GeneratorExit exception manually does not have the same effect as calling close: throw simply delivers the exception and returns the next yielded value, whereas close raises a RuntimeError when the generator keeps yielding.


In [37]:
def test():
    try:
        i = 1
        while True:
            yield i
            i += 1
    except GeneratorExit:
        print('done')
    yield 'one more value'
    yield 'and another one'

t = test()
print(next(t))
print(next(t))
print(t.throw(GeneratorExit()))
print(next(t))


1
2
done
one more value
and another one