In [1]:
%load_ext watermark

In [2]:
%watermark -d -v -u


Last updated: 20/07/2014 

CPython 2.7.8
IPython 2.1.0

[More information](http://nbviewer.ipython.org/github/rasbt/python_reference/blob/master/ipython_magic/watermark.ipynb) about the `watermark` magic command extension.


I would be happy to hear your comments and suggestions.
Please feel free to drop me a note via twitter, email, or google+.


Day 19 - One Python Benchmark per Day

Python 2.x's izip and xrange


A common use-case for the range function is the classic for-loop. In Python 2.x, range will create a list in memory and iterate over it. As you can imagine, its is not always very efficient to create a list (especially for a large number of iterations) if we just want to use it once.
Thus, the xrange function was added later-on, which evaluates "lazily" similar to generators.
(In Python 3.x, the range function was implemented similar to Python's xrange).

But note that it using xrange instead of range does not always make sense. E.g., if we have a use-case where we want to iterate over the sequence (here: integers) multiple times, it would be more efficient to use range, since xrange is always constructing the values from scratch.
And also slicing is not supported by xrange.


In [27]:
print 'range', type(range(3))
print 'xrange', type(xrange(3))


range <type 'list'>
xrange <type 'xrange'>

In the following benchmark, we will take a look at the speed advantages of range over xrange using a simple for-loop construct:

for i in range(some_int):
    pass



Similar to the xrange function, the izip function (in the itertools built-in module) evaluates the values in a list "lazily" without constructing a list of tuples like it's zip counterpart.


In [4]:
from itertools import izip

print 'zip', type(zip([1,2], [3,4]))
print 'izip', type(izip([1,2], [3,4]))


zip <type 'list'>
izip <type 'itertools.izip'>

Also here, we take a look at the speed benefits of using izip over zip for a simple for-loop construct:

for i in range(n, n): # where i is a tuple of 2 values
    pass


Bechmarking via timeit


In [16]:
import timeit

xrange_list, range_list, zip_list, izip_list = [], [], [], []

orders = [10**i for i in xrange(1, 9)]

for n in orders:

    print('n=%s (%s of %s)' %(n, orders.index(n)+1, len(orders)))

    range_list.append(min(timeit.Timer('for i in range(n): pass' , 
            'from __main__ import n').repeat(repeat=5, number=1)))

    xrange_list.append(min(timeit.Timer('for i in xrange(n): pass' , 
            'from __main__ import n').repeat(repeat=5, number=1)))
    zip_list.append(min(timeit.Timer('for i in zip(range(n), range(n)): pass' , 
            'from __main__ import n').repeat(repeat=5, number=1)))
    izip_list.append(min(timeit.Timer('for i in izip(range(n), range(n)): pass' , 
            'from __main__ import n, izip').repeat(repeat=5, number=1)))
print('finished')


n=10 (1 of 8)
n=100 (2 of 8)
n=1000 (3 of 8)
n=10000 (4 of 8)
n=100000 (5 of 8)
n=1000000 (6 of 8)
n=10000000 (7 of 8)
n=100000000 (8 of 8)
finished

In [17]:
%matplotlib inline

In [24]:
from matplotlib import pyplot as plt

def plot():
    
    def settings():
        plt.xlim([min(orders) / 10, max(orders)* 10])
        plt.grid()
        plt.xticks(fontsize=16)
        plt.yticks(fontsize=16)
        plt.xscale('log')
        plt.yscale('log')
        plt.legend(loc='upper left', fontsize=14)
        plt.xlabel('size of n', fontsize=16)
        plt.ylabel('time in seconds (log-scale)', fontsize=16)
    
    fig = plt.figure(figsize=(14,6))
    
    plt.subplot(1,2,1)
    plt.plot(orders, range_list, label='range(n)')
    plt.plot(orders, xrange_list, label='xrange(n)')
    plt.title('range and xrange in a for-loop', fontsize=22)
    settings()
    
    plt.subplot(1,2,2)
    plt.plot(orders, zip_list, label='zip(n, n)')
    plt.plot(orders, izip_list, label='izip(n, n)')
    plt.title('zip and izip in a for-loop', fontsize=22)
    settings()
    
    plt.tight_layout()
    plt.show()



Results


In [25]:
plot()



In [28]:
%watermark -d -v -m


20/07/2014 

CPython 2.7.8
IPython 2.1.0

compiler   : GCC 4.2.1 (Apple Inc. build 5577)
system     : Darwin
release    : 13.3.0
machine    : x86_64
processor  : i386
CPU cores  : 2
interpreter: 64bit



If we take the logarithmic scale, the "lazy evaluation" really pays off in simple for-loop constructs where we are only iterating over a given sequence once. Especially with increasing sizes of n, we can obeserve that the construction of the list objects via range and zip becomes very inefficient for a one-time iteration use-case.


In [ ]: