Iterators

In Python, most container objects can be looped over with a for statement.

For example, we can use a for statement to loop over a list.


In [1]:
for i in [1,2,3]:
    print(i)


1
2
3

If we use it with a string, it loops over its characters.


In [2]:
for ch in 'test':
    print(ch)


t
e
s
t

If we use it with a dictionary, it loops over its keys.


In [3]:
for k in {1:'test1',2:'test'}:
    print(k)


1
2

So there are many types of objects that can be used with a for loop; these are called iterable objects.

There are also many functions which consume these iterables, for example str.join():


In [5]:
",".join(["a","b","c"])


Out[5]:
'a,b,c'

In [7]:
",".join(('this','is','a','test'))


Out[7]:
'this,is,a,test'

In [9]:
",".join({'key1':'value','key2':'value2'})


Out[9]:
'key1,key2'
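join() is not special here; many other built-ins accept any iterable, not just lists. A few examples (all standard built-in functions):

```python
print(sum([1, 2, 3]))            # add up the elements of a list
print(max('test'))               # largest character of a string
print(sorted({2: 'b', 1: 'a'}))  # iterates over the dictionary's keys
print(list('abc'))               # materialize any iterable into a list
```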

Iteration Protocol

Behind the scenes, the for statement calls iter() on the container object. That call returns an iterator object with a __next__() method, which the built-in next() invokes to fetch elements one at a time; when there are no more elements, it raises StopIteration.


In [15]:
x = iter([1,2,3])
print(x)
print(next(x))
print(next(x))
print(next(x))
print(next(x))  # <-- will create an error


<list_iterator object at 0x107733978>
1
2
3
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-15-cb89fe47a2a1> in <module>()
      4 print(next(x))
      5 print(next(x))
----> 6 print(next(x))  # <-- will create an error

StopIteration: 
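A for loop is, in effect, shorthand for this protocol: it calls iter() once and then next() repeatedly until StopIteration is raised. A rough desugaring:

```python
items = [1, 2, 3]
it = iter(items)           # what 'for i in items:' does first
while True:
    try:
        i = next(it)       # fetch the next element
    except StopIteration:  # the loop ends here, silently
        break
    print(i)
```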

Having seen the mechanics behind the iterator protocol, it is easy to add iterator behavior to your classes. Define an __iter__() method which returns an object with a __next__() method. If the class defines __next__(), then __iter__() can just return self:


In [16]:
class Reverse:
    """Iterator for looping over a sequence backwards."""
    def __init__(self, data):
        self.data = data
        self.index = len(data)

    def __iter__(self):
        return self

    def __next__(self):
        if self.index == 0:
            raise StopIteration
        self.index = self.index - 1
        return self.data[self.index]

In [17]:
rev = Reverse('spam')
for i in rev:
    print(i)


m
a
p
s

Generators

Generators are a simple and powerful tool for creating iterators.

They are written like regular functions but use the yield statement whenever they want to return data. Each time next() is called on it, the generator resumes where it left off (it remembers all the data values and which statement was last executed). An example shows that generators can be trivially easy to create:


In [18]:
def reverse(data):
    for index in range(len(data)-1, -1, -1):
        yield data[index]

In [20]:
for ch in reverse('shallow'):
    print(ch)


w
o
l
l
a
h
s

Anything that can be done with generators can also be done with class-based iterators, as described in the previous section. What makes generators so compact is that the __iter__() and __next__() methods are created automatically.

Another key feature is that the local variables and execution state are automatically saved between calls. This makes the function easier to write and much clearer than an approach using instance variables like self.index and self.data.

In addition to automatic method creation and saving program state, when generators terminate, they automatically raise StopIteration. In combination, these features make it easy to create iterators with no more effort than writing a regular function.

The following example shows how generators work.


In [28]:
def samplegen():
    print("begin")
    for i in range(3):
        print("before yield", i)
        yield i
        print("after yield", i)
    print("end")
    
f = samplegen()
print(next(f))
print(next(f))
print(next(f))
print(next(f))


begin
before yield 0
0
after yield 0
before yield 1
1
after yield 1
before yield 2
2
after yield 2
end
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-28-20ef883c7482> in <module>()
     11 print(next(f))
     12 print(next(f))
---> 13 print(next(f))

StopIteration: 

Generator Expressions

Generator expressions are the generator version of list comprehensions. They look like list comprehensions, but return a generator instead of a list.


In [30]:
a = (x * x for x in range(10))
sum(a)


Out[30]:
285

Generator expressions are more compact but less versatile than full generator definitions and tend to be more memory friendly than equivalent list comprehensions.


In [35]:
xvec = [5,16,7]
yvec = [4,12,18]

sum(x * y for x,y in zip(xvec,yvec))


Out[35]:
338

In [36]:
data = 'golf'
list(data[i] for i in range(len(data)-1, -1, -1))


Out[36]:
['f', 'l', 'o', 'g']

In [39]:
# unique_words = set(word  for line in page  for word in line.split())

In [40]:
# valedictorian = max((student.gpa, student.name) for student in graduates)

Note that generators also provide a way to work with infinite sequences, for example:


In [41]:
from time import gmtime, strftime
def myGen():
    while True:
        yield strftime("%a, %d %b %Y %H:%M:%S +0000", gmtime())    

myGeneratorInstance = myGen()
next(myGeneratorInstance)


Out[41]:
'Wed, 25 Oct 2017 07:45:00 +0000'

In [42]:
next(myGeneratorInstance)


Out[42]:
'Wed, 25 Oct 2017 07:45:12 +0000'

Use of Generators

1. Easy to Implement

Generators can be implemented in a clearer and more concise way than their class-based iterator counterparts.


In [ ]:
# Iterator Class
class PowTwo:
    def __init__(self, max=0):
        self.max = max

    def __iter__(self):
        self.n = 0
        return self

    def __next__(self):
        if self.n > self.max:
            raise StopIteration

        result = 2 ** self.n
        self.n += 1
        return result

This was lengthy. Now let's do the same using a generator function.


In [ ]:
def PowTwoGen(max=0):
    n = 0
    while n <= max:  # inclusive upper bound, matching the class version
        yield 2 ** n
        n += 1

Since generators keep track of these details automatically, the implementation is concise and much cleaner.
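For instance (the generator is redefined here, with an inclusive upper bound to match the class above, so the snippet is self-contained):

```python
def pow_two_gen(max=0):
    n = 0
    while n <= max:  # inclusive, like the class-based version
        yield 2 ** n
        n += 1

print(list(pow_two_gen(4)))  # [1, 2, 4, 8, 16]
```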

2. Memory Efficient

A normal function that returns a sequence creates the entire sequence in memory before returning it. This is overkill if the number of items in the sequence is very large.
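A generator avoids that cost because it produces one item at a time. One rough way to see the difference is sys.getsizeof, which reports an object's own memory footprint (the exact numbers vary by Python version):

```python
import sys

squares_list = [x * x for x in range(10000)]  # all 10000 items held in memory
squares_gen = (x * x for x in range(10000))   # only the generator's state

print(sys.getsizeof(squares_list))  # tens of kilobytes
print(sys.getsizeof(squares_gen))   # a small constant, regardless of range
```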

3. Represent Infinite Stream

Generators are an excellent medium for representing an infinite stream of data. Infinite streams cannot be stored in memory, and since generators produce only one item at a time, they can represent such streams.


In [ ]:
def all_even():
    # Yield the even numbers 0, 2, 4, ... forever
    n = 0
    while True:
        yield n
        n += 2
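Consuming such an infinite generator with list() would never terminate; itertools.islice can take a finite prefix instead. The generator is redefined here so the snippet is self-contained:

```python
from itertools import islice

def all_even():
    # Infinite stream of even numbers
    n = 0
    while True:
        yield n
        n += 2

print(list(islice(all_even(), 5)))  # [0, 2, 4, 6, 8]
```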

4. Pipelining Generators

Generators can be used to pipeline a series of operations.

Suppose we are analysing a log file whose fourth column records a value for every hour, with 'N/A' marking missing entries, and we want to sum that column.


In [2]:
with open('sells.log') as file:
    ip_col = (line.split()[3] for line in file)  # fourth whitespace-separated column
    per_hr = (int(x) for x in ip_col if x != 'N/A')
    print("IPs =", sum(per_hr))


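Since sells.log is not available here, this is the same pipeline run on an in-memory list of lines (the column values are made up for illustration):

```python
log_lines = [
    "2017-10-25 00:00 host 4",
    "2017-10-25 01:00 host N/A",
    "2017-10-25 02:00 host 6",
]

ip_col = (line.split()[3] for line in log_lines)  # fourth column of each line
per_hr = (int(x) for x in ip_col if x != 'N/A')   # skip missing entries
print("IPs =", sum(per_hr))  # IPs = 10
```

Each stage is a generator, so no intermediate list is ever built; sum() pulls values through the whole pipeline one at a time.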

Using Itertools

The standard library's itertools module provides a collection of fast, memory-efficient building blocks for working with iterators. For example, permutations() yields every ordering of an iterable.


In [44]:
import itertools
horses = [1,2,3,4]
races = itertools.permutations(horses)
print(list(races))


[(1, 2, 3, 4), (1, 2, 4, 3), (1, 3, 2, 4), (1, 3, 4, 2), (1, 4, 2, 3), (1, 4, 3, 2), (2, 1, 3, 4), (2, 1, 4, 3), (2, 3, 1, 4), (2, 3, 4, 1), (2, 4, 1, 3), (2, 4, 3, 1), (3, 1, 2, 4), (3, 1, 4, 2), (3, 2, 1, 4), (3, 2, 4, 1), (3, 4, 1, 2), (3, 4, 2, 1), (4, 1, 2, 3), (4, 1, 3, 2), (4, 2, 1, 3), (4, 2, 3, 1), (4, 3, 1, 2), (4, 3, 2, 1)]
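permutations() is only one of many iterator building blocks in itertools; a few more (count, islice, chain, and combinations are all standard itertools functions):

```python
import itertools

# count() is infinite, so take a finite slice of it
print(list(itertools.islice(itertools.count(10, 2), 5)))  # [10, 12, 14, 16, 18]

# chain() concatenates iterables lazily
print(list(itertools.chain([1, 2], 'ab')))                # [1, 2, 'a', 'b']

# combinations() yields all subsets of a given length
print(list(itertools.combinations([1, 2, 3], 2)))         # [(1, 2), (1, 3), (2, 3)]
```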