Chapter 3, example 1

We illustrate the notion of array computation with a simple example consisting of finding, among a large list of locations, the closest one from a position of interest.

Pure Python version

The pure Python version loops through all positions, computes the distance from the position of interest, and returns the index of the closest location.


In [1]:
def closest(position, positions):
    x0, y0 = position
    dbest, ibest = None, None
    for i, (x, y) in enumerate(positions):
        d = (x - x0) ** 2 + (y - y0) ** 2
        if dbest is None or d < dbest:
            dbest, ibest = d, i
    return ibest
import random

We generate a list of random positions.


In [2]:
positions = [(random.random(), random.random()) for _ in xrange(10000000)]

Now we evaluate the time required to compute the closest position from (0.5, 0.5).


In [3]:
%timeit closest((.5, .5), positions)


1 loops, best of 3: 12.8 s per loop

Vectorized version

This version uses NumPy to compute all distances in a vectorized way, without using Python loops. This version is far more efficient than the pure Python version.

First, we need to activate the pylab mode so that NumPy is loaded and the NumPy objects are available in the current namespace.


In [4]:
%pylab


Welcome to pylab, a matplotlib-based Python environment [backend: module://IPython.zmq.pylab.backend_inline].
For more information, type 'help(pylab)'.

Generating random positions with NumPy's rand function is much faster than using a list.


In [5]:
positions = rand(10000000, 2)

positions is a NumPy ndarray object.


In [6]:
type(positions)


Out[6]:
numpy.ndarray

It has two dimensions and a shape of (10000000, 2).


In [7]:
positions.ndim, positions.shape


Out[7]:
(2, (10000000, 2))

We can easily get the columns of this matrix, which contain here the x and y coordinates of all positions.


In [8]:
x, y = positions[:,0], positions[:,1]

Now we compute the distances in a vectorized fashion, which is much more efficient than with a Python loop.


In [9]:
distances = (x - .5) ** 2 + (y - .5) ** 2

In [10]:
%timeit exec In[9]


1 loops, best of 3: 341 ms per loop

In [11]:
%timeit ibest = distances.argmin()


10 loops, best of 3: 24.2 ms per loop

Using NumPy is about 35 times faster than using pure Python code here.