In [1]:
for x in [1,2,3]:
    print(x, end=' ')
Iteration is something that is used all the time in programming. Python makes it really easy to use.
There are a couple of parts to iteration, consider this a simple glossary for terms used in this post:
Iterable: an object capable of being looped over, a list for example. For an object to be iterable, it requires __iter__ to be defined, which in turn returns an iterator.
Iterator: the object that actually produces values during iteration. It requires __next__ to be defined, which in turn returns a value to the consumer.
Consumer: consumers can be anything: the for loop, the builtin iter, which creates an iterator out of a container, and the builtin next, which handles manual advancement over an iterator. Whatever they are, they use iteration to get values from an object. This is the simplest part of the question of iteration.
This part is more complex because there are multiple ways to approach implementing iteration. The iter protocol says that two methods must be implemented:

__iter__: On the container, this must return an object that implements __next__; on the iterator, it must return its own instance.
__next__: This only has to be implemented on the iterator; it is used to return values to the consumer. When no values are left to return, this method must raise StopIteration and continue to do so on each subsequent call.

A common way of handling iteration is to split the protocol over two objects. list does this.
In [2]:
print('[] and iter([]) are different:', type([]) != type(iter([])))
print('iter([]) and iter(iter([])) are the same:', type(iter([])) == type(iter(iter([]))))
print('typeof iter([]):', type(iter([])))
print('[] hasattr __iter__:', hasattr([], '__iter__'))
print('[] hasattr __next__:', hasattr([], '__next__'))
print('iter([]) hasattr __iter__:', hasattr(iter([]), '__iter__'))
print('iter([]) hasattr __next__:', hasattr(iter([]), '__next__'))
It's very common for an iterable and an iterator to be different objects; however, it's not uncommon for them to be the same object (as defined, every iterator is both an iterable and its own iterator).
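For instance (an illustration of my own, not from the original post), generators are their own iterators: calling iter on one hands back the very same object, which is also why they can only be consumed once.

```python
# A generator is both an iterable and its own iterator
gen = (n * n for n in range(3))
print(iter(gen) is gen)   # True: iter() returns the same object

# Which also means it can only be walked through once
print(list(gen))          # [0, 1, 4]
print(list(gen))          # [] -- already exhausted
```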
Python makes it super easy to define our own iterables and iterators. Just implement a couple of methods and you're done!
In [3]:
class MyIterable:
    '''Simple implementation of the iter protocol.

    __iter__ returns an instance of MyIterator.
    '''
    def __init__(self, upper=0):
        self.upper = upper

    def __iter__(self):
        return MyIterator(upper=self.upper)

class MyIterator:
    '''Simple implementation of the iter protocol.

    When iterated, incremental numbers are returned
    until the upper bound is reached.
    '''
    def __init__(self, upper=0):
        self.upper = upper
        self.__current = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.__current >= self.upper:
            raise StopIteration("Upper bound reached.")
        else:
            self.__current += 1
            return self.__current - 1

for x in MyIterable(4):
    print(x)
The advantage of splitting the iterable and the iterator into two classes is that you can maintain multiple iterators, all at different states. If an object handled its own iteration, you could maintain only one state.
In [4]:
test = MyIterator(4)
for x in test:
    print(x, end=' ')

for x in test:
    print(x, end=' ')

test = MyIterable(4)
it = iter(test)
print('\n', next(it), sep='')

for x in test:
    print(x, end=' ')

print('\n', next(it), sep='')
print(next(it))
print(next(it))
However, something that's a little sneaky is returning something else entirely from __iter__, so long as it is an iterator.
In [5]:
class SneakyIter(MyIterable):

    def __iter__(self):
        return iter(range(self.upper))

for x in SneakyIter(4):
    print(x, end=' ')
Returning just range wouldn't work because range is iterable but it's not an iterator.
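You can check that quickly (a sketch of my own, not from the original post): a range object has __iter__ but not __next__, so next refuses it until iter has been applied.

```python
r = range(4)
print(hasattr(r, '__iter__'), hasattr(r, '__next__'))   # True False

# Calling next directly on a range fails
try:
    next(r)
except TypeError as e:
    print('range is not an iterator:', e)

# But the iterator iter() hands back advances just fine
it = iter(r)
print(next(it), next(it))   # 0 1
```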
Well, a lot more than just counting. To borrow an example from David Beazley's fantastic Python Cookbook, 3rd Edition (I seriously cannot recommend this book enough): traversing nodes.
In [6]:
class Node:
    def __init__(self, value):
        self._value = value
        self._children = []

    def __repr__(self):
        return 'Node(%r)' % self._value

    def add_child(self, other_node):
        self._children.append(other_node)

    def __iter__(self):
        return iter(self._children)

    def depth_first(self):
        return DepthFirstIterator(self)

class DepthFirstIterator(object):
    '''Depth-first traversal'''

    def __init__(self, start_node):
        self._node = start_node
        self._children_iter = None
        self._child_iter = None

    def __iter__(self):
        return self

    def __next__(self):
        # Return myself if just started; create an iterator for children
        if self._children_iter is None:
            self._children_iter = iter(self._node)
            return self._node
        # If processing a child, return its next item
        elif self._child_iter:
            try:
                nextchild = next(self._child_iter)
                return nextchild
            except StopIteration:
                self._child_iter = None
                return next(self)
        # Advance to the next child and start its iteration
        else:
            self._child_iter = next(self._children_iter).depth_first()
            return next(self)

root = Node(0)
child1 = Node(1)
child2 = Node(2)
root.add_child(child1)
root.add_child(child2)
child1.add_child(Node(3))
child1.add_child(Node(4))
child2.add_child(Node(5))

for ch in root.depth_first():
    print(ch, end=' ')

print('\n')

for ch in root:
    print(ch, end=' ')
In [7]:
from string import ascii_lowercase as lowercase
# list comp.
test = [ord(c) for c in lowercase]
print(test)
# dict comp.
test = {c:ord(c) for c in lowercase}
print(test)
# you can use comps. in place of iterables in function arguments
print(sum([ord(c) for c in lowercase]))
# something else you can do is drop the brackets if the comp is the only argument
print(sum(ord(c) for c in lowercase))
But there are more ways of manipulating iterables, such as the builtins filter and map, and the itertools module (these all return special lazy iterator objects -- but that's another post; just know they aren't lists). The use of filter and map is lessened somewhat by list comps, which I often find easier to write. Some would argue that comprehensions should be used for creating new objects, whereas map, filter, et al. should be used for manipulating existing objects. However, the important thing is consistency: if you use filter and map all over the place, don't suddenly throw in a comprehension that transforms an existing structure.
In [8]:
filt = filter(lambda c: not ord(c) % 3, lowercase)
filt = list(filt) # transform filt into a list
comp = [c for c in lowercase if not ord(c)%3]
print(filt, comp, sep='\n')
mapd = map(ord, lowercase)
mapd = list(mapd)
comp = [ord(c) for c in lowercase]
print(mapd, comp, sep='\n')
mapfilt = map(ord, filter(lambda c: not ord(c) % 3, lowercase))
mapfilt = list(mapfilt)
comphre = [ord(c) for c in lowercase if not ord(c)%3]
print(mapfilt, comphre, sep='\n')
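The itertools module mentioned above composes in the same style as filter and map. A small sketch of my own (chain, islice, and takewhile are real itertools functions; the examples are mine, not from the original post):

```python
from itertools import chain, islice, takewhile
from string import ascii_lowercase as lowercase

# chain stitches iterables together lazily
print(list(chain('abc', 'xyz')))   # ['a', 'b', 'c', 'x', 'y', 'z']

# islice slices an iterable without building a list first
print(list(islice(lowercase, 5)))  # ['a', 'b', 'c', 'd', 'e']

# takewhile stops at the first element that fails the test
print(list(takewhile(lambda c: c < 'e', lowercase)))  # ['a', 'b', 'c', 'd']
```

Like filter and map, all three hand back lazy iterators, so nothing is computed until something consumes them.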
There's also unpacking, which is a great tool for pulling items out of an iterable without using a regular form of iteration. The way it works is that Python transparently creates a tuple -- if you didn't know, the tuple operator is the comma, not the parens; parens are only needed if the tuple is not the only argument in a function call -- and then unpacks the tuple into the current namespace. Seems complicated, but it really isn't.
In [9]:
a, b, c = lowercase[:3]
print(a, b, c)
# tuple unpacking is also useful for variable switching
a, b, c = c, b, a
print(a, b, c)
# similar to how we can splat a list into a function,
# we can splat iterables into unpacking
# _ becomes a list
a, *_, z = lowercase
print(a, z, _)
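Unpacking also shows up inside iteration itself. As a small extra sketch of my own (not from the original post), a for loop can unpack each item as it goes, and nested structures unpack too:

```python
# Unpacking inside a for loop's target list
pairs = [('a', 1), ('b', 2), ('c', 3)]
for letter, number in pairs:
    print(letter, number, end='; ')

# Nested structures unpack as well
(a, b), c = ('xy', 'z')
print()
print(a, b, c)   # x y z
```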
Well, there are other types of iterators called generators, which iterate in a peculiar way. And without a basic understanding of them, understanding the real benefits of iters and itertools is difficult.
But other than that, go and iterate all the things. Just remember to iterate responsibly.