In [1]:

    
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

from __future__ import print_function

Efficient Numerical Computing with Numpy

Jake VanderPlas

In this session, we'll discuss how to make your programs as efficient as possible, mainly by taking advantage of vectorization in NumPy.

If you use Python for any amount of time, you'll quickly find that there are some things it is not so good at. In particular, performing repeated operations via loops is one of its weaknesses.

For example:



In [2]:

    
# A silly function implemented in Python

def func_python(N):
    d = 0.0
    for i in range(N):
        d += (i % 3 - 1) * i
    return d



In [3]:

    
# Use IPython timeit magic to time the execution
%timeit func_python(10000)









    



100 loops, best of 3: 1.97 ms per loop

To compare to a compiled language, let's write the same function in fortran and use the f2py tool (included in NumPy) to compile it



In [4]:

    
%%file func_fortran.f

      subroutine func_fort(n, d)
           integer, intent(in) :: n
           double precision, intent(out) :: d
           integer :: i
           d = 0
           do i = 0, n - 1
                d = d + (mod(i, 3) - 1) * i
           end do
      end subroutine func_fort









    



Overwriting func_fortran.f



In [5]:

    
!f2py -c func_fortran.f -m func_fortran > /dev/null



In [6]:

    
from func_fortran import func_fort
%timeit func_fort(10000)









    



100000 loops, best of 3: 16.1 µs per loop

Fortran is about 100 times faster for this task!

Why is Python so slow?

Languages tend to compromise between convenience and performance.

High level languages (Python, R, IDL, Matlab, etc.) tend to use High-level objects, dynamic typing, and interpreted execution for convenience in writing code
- But: have generally large execution overhead for type-checking, etc.
- JIT (just in time) compilation technologies, like LLVM (e.g. Julia, Numba) may be changing this, though
Low-level languages (C, Fortran, etc.) tend to use static types and compiled execution for performance
- But: no interactive prompt, lots of development time spent on details like typing, etc.

We like high-level languages because our development time is generally more valuable than execution time. But sometimes speed can be an issue.

Strategies for making Python fast

Use Numpy ufuncs to your advantage
Use Numpy aggregates to your advantage
Use Numpy broadcasting to your advantage
Use Numpy slicing, masking, and fancy indexing to your advantage
Use a tool like SWIG, cython or f2py to interface to compiled code.

Here we'll cover the first four strategies.

Strategy 1: Using UFuncs in Numpy

A ufunc in numpy is a Universal Function. This is a function which operates element-wise on an array. We've already seen examples of these in the various arithmetic operations:



In [7]:

    
x = np.random.random(4)
print(x)
print(x + 1)  # add 1 to each element of x









    



[ 0.1231047   0.34071966  0.97596715  0.38643665]
[ 1.1231047   1.34071966  1.97596715  1.38643665]



In [8]:

    
print(x * 2)  # multiply each element of x by 2









    



[ 0.24620939  0.68143932  1.95193429  0.77287329]



In [9]:

    
print(x * x)  # multiply each element of x by itself









    



[ 0.01515477  0.11608989  0.95251187  0.14933328]



In [10]:

    
print(x[1:] - x[:-1])









    



[ 0.21761497  0.63524749 -0.5895305 ]

These are binary ufuncs: they take two arguments.

There are also many unary ufuncs:



In [11]:

    
-x









    Out[11]:





array([-0.1231047 , -0.34071966, -0.97596715, -0.38643665])



In [12]:

    
np.sin(x)









    Out[12]:





array([ 0.122794  ,  0.33416547,  0.82824423,  0.37689023])

The Speed of Ufuncs



In [13]:

    
x = np.random.random(100000)



In [14]:

    
%%timeit
# compute element-wise x + 1 via a ufunc 
y = x + 1









    



10000 loops, best of 3: 57.1 µs per loop



In [15]:

    
%%timeit
# compute element-wise x + 1 via a loop
y = np.empty_like(x)
for i in range(len(x)):
    y[i] = x[i] + 1









    



10 loops, best of 3: 118 ms per loop

Why is NumPy so much faster?

Numpy UFuncs are faster than Python functions involving loops, because the looping happens in compiled code. This is only possible when types are known beforehand, which is why numpy arrays must be typed.

Other Available Ufuncs

Trigonometric functions (np.sin, np.cos, etc.)
Scipy special functions (scipy.special.j0, scipy.special.gammaln, etc.)
Element-wise minimum/maximum (np.minimum, np.maximum)
User-defined ufuncs (read more here)

`np.min` vs `np.minimum`



In [16]:

    
x = np.random.random(5)
print(x)
print(np.minimum(x, 0.5))
print(np.maximum(x, 0.5))









    



[ 0.64960338  0.44854455  0.93193523  0.60049152  0.0451027 ]
[ 0.5         0.44854455  0.5         0.5         0.0451027 ]
[ 0.64960338  0.5         0.93193523  0.60049152  0.5       ]



In [17]:

    
# contrast this behavior with that of min() and max()
print(np.min(x))
print(np.max(x))









    



0.0451027007923
0.931935234484

Sines, Gamma functions, etc.



In [18]:

    
x = np.linspace(0, 10, 1000)
plt.plot(x, np.sin(x));



In [19]:

    
from scipy.special import gammaln
plt.plot(x, gammaln(x));

Some interesting properties of UFuncs

UFuncs have some methods built-in, which allow for some very interesting, flexible, and fast operations:



In [20]:

    
x = np.arange(5)
y = np.arange(1, 6)
x + y









    Out[20]:





array([1, 3, 5, 7, 9])

All operators (like +) actuall call an underlying numpy function: in this case np.add:



In [21]:

    
np.add(x, y)









    Out[21]:





array([1, 3, 5, 7, 9])

These ufuncs have some interesting and useful properties:



In [22]:

    
np.add.accumulate(x)









    Out[22]:





array([ 0,  1,  3,  6, 10])



In [23]:

    
np.multiply.accumulate(x)









    Out[23]:





array([0, 0, 0, 0, 0])



In [24]:

    
np.multiply.accumulate(y)









    Out[24]:





array([  1,   2,   6,  24, 120])



In [25]:

    
np.add.identity









    Out[25]:





0



In [26]:

    
np.multiply.identity









    Out[26]:





1



In [27]:

    
np.add.outer(x, y)









    Out[27]:





array([[1, 2, 3, 4, 5],
       [2, 3, 4, 5, 6],
       [3, 4, 5, 6, 7],
       [4, 5, 6, 7, 8],
       [5, 6, 7, 8, 9]])



In [28]:

    
# make a times-table
x = np.arange(1, 13)
np.multiply.outer(x, x)









    Out[28]:





array([[  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12],
       [  2,   4,   6,   8,  10,  12,  14,  16,  18,  20,  22,  24],
       [  3,   6,   9,  12,  15,  18,  21,  24,  27,  30,  33,  36],
       [  4,   8,  12,  16,  20,  24,  28,  32,  36,  40,  44,  48],
       [  5,  10,  15,  20,  25,  30,  35,  40,  45,  50,  55,  60],
       [  6,  12,  18,  24,  30,  36,  42,  48,  54,  60,  66,  72],
       [  7,  14,  21,  28,  35,  42,  49,  56,  63,  70,  77,  84],
       [  8,  16,  24,  32,  40,  48,  56,  64,  72,  80,  88,  96],
       [  9,  18,  27,  36,  45,  54,  63,  72,  81,  90,  99, 108],
       [ 10,  20,  30,  40,  50,  60,  70,  80,  90, 100, 110, 120],
       [ 11,  22,  33,  44,  55,  66,  77,  88,  99, 110, 121, 132],
       [ 12,  24,  36,  48,  60,  72,  84,  96, 108, 120, 132, 144]])

Ufunc mini-exercises

Each of the following functions take an array as input, and return an array as output. They are implemented using loops, which is not very efficient.

For each function, implement a fast version which uses ufuncs to calculate the result more efficiently. Double-check that you get the same result for several different arrays.
use the %timeit magic to time the execution of the two implementations for a large array (say, 1000 elements).



In [29]:

    
# 1. computing the element-wise sine + cosine

from math import sin, cos
def slow_sincos(x):
    """x is a 1-dimensional array"""
    y = np.zeros_like(x)
    for i in range(len(x)):
        y[i] = sin(x[i]) + cos(x[i])
    return y

x = np.random.random(5)
slow_sincos(x)









    Out[29]:





array([ 1.10954934,  1.41280477,  1.11025905,  1.1235107 ,  1.19782107])



In [30]:

    
# write a fast_sincos function



In [31]:

    
# 2. computing the difference between adjacent squares

def slow_sqdiff(x):
    """x is a 1-dimensional array"""
    y = np.zeros(len(x) - 1)
    for i in range(len(y)):
        y[i] = x[i + 1] ** 2 - x[i] ** 2
    return y

x = np.random.random(5)
slow_sqdiff(x)









    Out[31]:





array([ 0.02843657,  0.22826022, -0.33808932,  0.45751827])



In [32]:

    
# write a fast_sqdiff function



In [33]:

    
# 3. computing the outer-product of each consecutive pair

def slow_pairprod(x):
    """x is a 1-dimensional array"""
    if len(x) % 2 != 0:
        raise ValueError("length of x must be even")
    N = len(x) / 2
    y = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            y[i, j] = x[2 * i] * x[2 * j + 1]
    return y

x = np.arange(1, 9)
slow_pairprod(x)









    Out[33]:





array([[  2.,   4.,   6.,   8.],
       [  6.,  12.,  18.,  24.],
       [ 10.,  20.,  30.,  40.],
       [ 14.,  28.,  42.,  56.]])



In [34]:

    
# write a fast_pairprod function

Strategy 2. Using Numpy Aggregates

Aggregates are functions over arrays which return smaller arrays.

Numpy has several built-in



In [35]:

    
# 10 x 10 array drawn from a standard normal
x = np.random.randn(10, 10)



In [36]:

    
x.mean()









    Out[36]:





0.005926372912100737



In [37]:

    
np.mean(x)









    Out[37]:





0.005926372912100737



In [38]:

    
x.std()









    Out[38]:





1.0681813993312947



In [39]:

    
x.var()









    Out[39]:





1.1410115018773628



In [40]:

    
x.sum()









    Out[40]:





0.59263729121007369



In [41]:

    
x.prod()









    Out[41]:





2.3263845450932399e-26



In [42]:

    
np.median(x)









    Out[42]:





-0.080455675624693163



In [43]:

    
np.percentile(x, 50)









    Out[43]:





-0.080455675624693163



In [44]:

    
np.percentile(x, (25, 75))









    Out[44]:





[-0.63705411377136534, 0.66386400880543395]

Aggregates along certain dimensions



In [45]:

    
x = np.random.rand(3, 5)
x









    Out[45]:





array([[ 0.37521212,  0.16805226,  0.61283898,  0.21395026,  0.36733656],
       [ 0.91795723,  0.02566346,  0.16991705,  0.62027897,  0.3057437 ],
       [ 0.751662  ,  0.1830548 ,  0.31503027,  0.85763625,  0.90561931]])



In [46]:

    
x.sum(0)  # sum along rows









    Out[46]:





array([ 2.04483135,  0.37677051,  1.0977863 ,  1.69186548,  1.57869957])



In [47]:

    
x.sum(1)  # sum along columns









    Out[47]:





array([ 1.73739018,  2.0395604 ,  3.01300262])



In [48]:

    
np.median(x, 1)









    Out[48]:





array([ 0.36733656,  0.3057437 ,  0.751662  ])



In [49]:

    
np.mean(x, 1)









    Out[49]:





array([ 0.34747804,  0.40791208,  0.60260052])

Binary ufuncs as aggregates

Any binary ufunc (a ufunc taking two arguments) can be turned into an aggregate using the reduce() method:



In [50]:

    
np.sum(x, 1)









    Out[50]:





array([ 1.73739018,  2.0395604 ,  3.01300262])



In [51]:

    
np.add.reduce(x, 1)









    Out[51]:





array([ 1.73739018,  2.0395604 ,  3.01300262])



In [52]:

    
np.prod(x, 1)









    Out[52]:





array([ 0.003037  ,  0.00075914,  0.03366703])



In [53]:

    
np.multiply.reduce(x, 1)









    Out[53]:





array([ 0.003037  ,  0.00075914,  0.03366703])

Some of these reduce methods are a bit silly:



In [54]:

    
np.divide.reduce(x, 1)









    Out[54]:





array([   46.35634202,  1110.00709617,    16.78187137])

A caution: for reduce methods, the default axis is 0, while for sum the default axis is None:



In [55]:

    
np.add.reduce(x)









    Out[55]:





array([ 2.04483135,  0.37677051,  1.0977863 ,  1.69186548,  1.57869957])



In [56]:

    
np.sum(x)









    Out[56]:





6.7899532114887027

A quick efficiency note: Beware the built-in Python aggregates!

Python has a min, max, and sum aggregate built-in. These are much more general than the versions in NumPy:



In [57]:

    
x = np.random.random(10000)

%timeit np.sum(x)
%timeit sum(x)









    



100000 loops, best of 3: 15.8 µs per loop
100 loops, best of 3: 3.66 ms per loop

This is slow for the same reason that loops are slow: built-in min, sum, and max functions basically have to type-check every element.

Make sure to use Numpy's np.sum, np.min, and np.max.

Aggregate Mini-exercises

Take the following functions, and convert them into an efficient form using aggregates. Each function expects a 1-dimensional array as input. Double-check that your function returns the same result as the original



In [58]:

    
def slow_cubesum(x):
    """x is a 1D array"""
    result = 0
    for i in range(len(x)):
        result += x[i] ** 3
    return result

x = np.random.random(100)
slow_cubesum(x)









    Out[58]:





22.986847893700467



In [59]:

    
# implement fast_cubesum



In [60]:

    
def slow_rms(x):
    """x is a 1D array"""
    m = np.mean(x)
    rms = 0
    for i in range(len(x)):
        rms += (x[i] - m) ** 2
    rms /= len(x)
    return np.sqrt(rms)

x = np.random.random(100)
slow_rms(x)









    Out[60]:





0.28895814643282985



In [61]:

    
# implement fast_rms

Now we return to our silly function from the beginning of this section. Can you implement a fast version using ufuncs and aggregates?



In [62]:

    
def slow_sillyfunc(N):
    """N is an integer"""
    d = 0.0
    for i in range(N):
        d += (i % 3 - 1) * i
    return d

slow_sillyfunc(100)









    Out[62]:





-33.0



In [63]:

    
# Implement fast_sillyfunc using ufuncs & aggragates

Strategy 3: Using Numpy Broadcasting

Broadcasting is a way to extend the usefulness of ufuncs. For example, it allows you to subtract a vector from a matrix without writing a loop:



In [64]:

    
X = np.random.rand(3, 4)
mean = X.mean(0)
print(X)
print(mean)









    



[[ 0.13033367  0.39084497  0.48740275  0.46706801]
 [ 0.54222919  0.71130202  0.53520395  0.27169094]
 [ 0.91013207  0.05874528  0.35727916  0.30515685]]
[ 0.52756498  0.38696409  0.45996195  0.34797193]



In [65]:

    
print(X - mean)









    



[[-0.3972313   0.00388088  0.02744079  0.11909608]
 [ 0.01466421  0.32433793  0.075242   -0.076281  ]
 [ 0.38256709 -0.32821881 -0.10268279 -0.04281509]]

We've subtracted the mean vector from each row of the matrix with a single operation. This is an example of broadcasting.

Rules of Broadcasting

(image source)

Broadcasting rules:

If the two arrays differ in their number of dimensions, the shape of the array with fewer dimensions is padded with ones on its leading (left) side.
If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.
If in any dimension the sizes disagree and neither is equal to 1, an error is raised.

Some Broadcasting examples...



In [66]:

    
x = np.arange(10)
x ** 2









    Out[66]:





array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])



In [67]:

    
Y = x * x[:, np.newaxis]
Y









    Out[67]:





array([[ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18],
       [ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27],
       [ 0,  4,  8, 12, 16, 20, 24, 28, 32, 36],
       [ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45],
       [ 0,  6, 12, 18, 24, 30, 36, 42, 48, 54],
       [ 0,  7, 14, 21, 28, 35, 42, 49, 56, 63],
       [ 0,  8, 16, 24, 32, 40, 48, 56, 64, 72],
       [ 0,  9, 18, 27, 36, 45, 54, 63, 72, 81]])



In [68]:

    
Y + 10 * x









    Out[68]:





array([[  0,  10,  20,  30,  40,  50,  60,  70,  80,  90],
       [  0,  11,  22,  33,  44,  55,  66,  77,  88,  99],
       [  0,  12,  24,  36,  48,  60,  72,  84,  96, 108],
       [  0,  13,  26,  39,  52,  65,  78,  91, 104, 117],
       [  0,  14,  28,  42,  56,  70,  84,  98, 112, 126],
       [  0,  15,  30,  45,  60,  75,  90, 105, 120, 135],
       [  0,  16,  32,  48,  64,  80,  96, 112, 128, 144],
       [  0,  17,  34,  51,  68,  85, 102, 119, 136, 153],
       [  0,  18,  36,  54,  72,  90, 108, 126, 144, 162],
       [  0,  19,  38,  57,  76,  95, 114, 133, 152, 171]])



In [69]:

    
Y + 10 * x[:, np.newaxis]









    Out[69]:





array([[  0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
       [ 10,  11,  12,  13,  14,  15,  16,  17,  18,  19],
       [ 20,  22,  24,  26,  28,  30,  32,  34,  36,  38],
       [ 30,  33,  36,  39,  42,  45,  48,  51,  54,  57],
       [ 40,  44,  48,  52,  56,  60,  64,  68,  72,  76],
       [ 50,  55,  60,  65,  70,  75,  80,  85,  90,  95],
       [ 60,  66,  72,  78,  84,  90,  96, 102, 108, 114],
       [ 70,  77,  84,  91,  98, 105, 112, 119, 126, 133],
       [ 80,  88,  96, 104, 112, 120, 128, 136, 144, 152],
       [ 90,  99, 108, 117, 126, 135, 144, 153, 162, 171]])



In [70]:

    
Y = np.random.random((2, 3, 4))
x = 10 * np.arange(3)

Y + x[:, np.newaxis]









    Out[70]:





array([[[  0.83250853,   0.68313399,   0.37255257,   0.90963716],
        [ 10.86953565,  10.49053538,  10.40165644,  10.06197872],
        [ 20.95098578,  20.31752505,  20.18094159,  20.78663043]],

       [[  0.13724524,   0.31820215,   0.8721424 ,   0.37733029],
        [ 10.22608683,  10.18324573,  10.89380149,  10.85604921],
        [ 20.36238462,  20.93689935,  20.53204387,  20.63015677]]])

Quick Broadcasting/Ufunc Exercises

Now, assume you have $N$ points in $D$ dimensions, represented by an array of shape [N, D], try the following:

Compute the mean of the distribution of points efficiently using the built-in np.mean aggregate (that is, find the D-dimensional point which is the mean of the rest of the points)
Compute the mean of the distribution of points efficiently using the np.add ufunc.
Compute the standard error of the mean $\sigma_{mean} = \sigma N^{-1/2}$, where $\sigma$ is the standard-deviation, using the np.std aggregate.
Compute this again using the np.add ufunc.
Construct the matrix M, the centered and normalized version of the X array: $$ M_{ij} = (X_{ij} - \mu_j) / \sigma_j $$ This is one version of whitening the array.



In [71]:

    
X = np.random.random((1000, 5))  # 1000 points in 5 dimensions



In [72]:

    
# 1. Compute the mean of the 1000 points in X



In [73]:

    
# 2. Compute the mean using np.add



In [74]:

    
# 3. Compute the standard deviation across the 1000 points



In [75]:

    
# 4. Compute the standard deviation using np.add only



In [76]:

    
# 5. Compute the whitened version of the array

Strategy 4: Fancy Indexing and Masking

The last strategy we will cover is fancy indexing and masking.

For example, imagine you have an array of data where negative values indicate some kind of error.



In [77]:

    
x = np.array([1, 2, 3, -999, 2, 4, -999])

How might you clean this array, setting all negative values to, say, zero?



In [78]:

    
for i in range(len(x)):
    if x[i] < 0:
        x[i] = 0
x









    Out[78]:





array([1, 2, 3, 0, 2, 4, 0])

A faster way is to construct a boolean mask:



In [79]:

    
x = np.array([1, 2, 3, -999, 2, 4, -999])

mask = (x < 0)
mask









    Out[79]:





array([False, False, False,  True, False, False,  True], dtype=bool)

And the mask can be used directly to set the value you desire:



In [80]:

    
x[mask] = 0
x









    Out[80]:





array([1, 2, 3, 0, 2, 4, 0])

Often you'll see this done in a single step:



In [81]:

    
x = np.array([1, 2, 3, -999, 2, 4, -999])
x[x < 0] = 0
x









    Out[81]:





array([1, 2, 3, 0, 2, 4, 0])

Useful masking functions



In [82]:

    
x = np.random.random(5)
x









    Out[82]:





array([ 0.38048517,  0.21218316,  0.72218801,  0.12978412,  0.73458475])



In [83]:

    
x[x > 0.5] = np.nan
x









    Out[83]:





array([ 0.38048517,  0.21218316,         nan,  0.12978412,         nan])



In [84]:

    
x[np.isnan(x)] = np.inf
x









    Out[84]:





array([ 0.38048517,  0.21218316,         inf,  0.12978412,         inf])



In [85]:

    
# warning: don't replace np.isnan(x) with (x == np.nan):
np.nan == np.nan









    Out[85]:





False



In [86]:

    
x[np.isinf(x)] = 0
x









    Out[86]:





array([ 0.38048517,  0.21218316,  0.        ,  0.12978412,  0.        ])



In [87]:

    
x = np.array([1, 0, -1, -np.inf, np.inf, np.nan])
print("input   ", x)
print("x < 0   ", x[x < 0])
print("x > 0   ", x[x > 0])
print("isinf   ", x[np.isinf(x)])
print("isnan   ", x[np.isnan(x)])
print("isposinf", x[np.isposinf(x)])
print("isneginf", x[np.isneginf(x)])









    



input    [  1.   0.  -1. -inf  inf  nan]
x < 0    [ -1. -inf]
x > 0    [  1.  inf]
isinf    [-inf  inf]
isnan    [ nan]
isposinf [ inf]
isneginf [-inf]






    



-c:3: RuntimeWarning: invalid value encountered in less
-c:4: RuntimeWarning: invalid value encountered in greater

Boolean Operations on Masks

Boolean operators can be used on masks: order of operations can be a gotcha – be sure to use parentheses!



In [88]:

    
x = np.arange(16).reshape((4, 4))
x









    Out[88]:





array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])



In [89]:

    
x[x < 5]









    Out[89]:





array([0, 1, 2, 3, 4])



In [90]:

    
x[~(x < 5)]









    Out[90]:





array([ 5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])



In [91]:

    
x[(x < 10) & (x % 2 == 0)]









    Out[91]:





array([0, 2, 4, 6, 8])



In [92]:

    
x[(x > 3) & (x < 8)]









    Out[92]:





array([4, 5, 6, 7])

Counting elements with a mask

Sum over a mask to find the number of True elements:



In [93]:

    
x = np.random.random(100)
print("array is length", len(x), "and has")
print((x > 0.5).sum(), "elements are greater than 0.5")









    



array is length 100 and has
60 elements are greater than 0.5



In [94]:

    
# clip is a useful function:
x = np.clip(x, 0.3, 0.6)

np.sum(x < 0.3), np.sum(x > 0.6)









    Out[94]:





(0, 0)



In [95]:

    
# works for 2D arrays as well
X = np.random.random((10, 10))
print((X < 0.1).sum())

`where` function: Turning a mask into indices



In [96]:

    
x = np.random.random((3, 3))
x









    Out[96]:





array([[ 0.21305713,  0.58388143,  0.43704044],
       [ 0.1925056 ,  0.82110282,  0.77627151],
       [ 0.79434441,  0.44246991,  0.48757473]])



In [97]:

    
np.where(x < 0.5)









    Out[97]:





(array([0, 0, 1, 2, 2]), array([0, 2, 0, 1, 2]))



In [98]:

    
x[x < 0.5]









    Out[98]:





array([ 0.21305713,  0.43704044,  0.1925056 ,  0.44246991,  0.48757473])



In [99]:

    
x[np.where(x < 0.5)]









    Out[99]:





array([ 0.21305713,  0.43704044,  0.1925056 ,  0.44246991,  0.48757473])

When you index with the result of a where function, you are using what is called fancy indexing: indexing with tuples

Fancy Indexing (indexing with sequences)



In [100]:

    
X = np.arange(16).reshape((4, 4))
X









    Out[100]:





array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])



In [101]:

    
X[(0, 1), (1, 0)]









    Out[101]:





array([1, 4])



In [102]:

    
X[range(4), range(4)]









    Out[102]:





array([ 0,  5, 10, 15])



In [103]:

    
X.diagonal()









    Out[103]:





array([ 0,  5, 10, 15])



In [104]:

    
# Note: diagonal() is read-only (though this will change in numpy 1.9)
X.diagonal() = 100









    



  File "<ipython-input-104-d4b0e796ca47>", line 2
    X.diagonal() = 100
SyntaxError: can't assign to function call



In [105]:

    
# Instead, try this:
X[range(4), range(4)] = 100
X









    Out[105]:





array([[100,   1,   2,   3],
       [  4, 100,   6,   7],
       [  8,   9, 100,  11],
       [ 12,  13,  14, 100]])

Randomizing the rows



In [106]:

    
X = np.arange(24).reshape((6, 4))
X









    Out[106]:





array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])



In [107]:

    
i = np.arange(6)
np.random.shuffle(i)
i









    Out[107]:





array([0, 4, 2, 3, 5, 1])



In [108]:

    
X[i]  # X[i, :] is identical









    Out[108]:





array([[ 0,  1,  2,  3],
       [16, 17, 18, 19],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [20, 21, 22, 23],
       [ 4,  5,  6,  7]])

Fancy indexing also works for multi-dimensional index arrays



In [109]:

    
i2 = i.reshape(3, 2)
X[i2]









    Out[109]:





array([[[ 0,  1,  2,  3],
        [16, 17, 18, 19]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]],

       [[20, 21, 22, 23],
        [ 4,  5,  6,  7]]])



In [110]:

    
X[i2].shape









    Out[110]:





(3, 2, 4)

Summary: Speeding up NumPy

It's all about moving loops into compiled code:

Use Numpy ufuncs to your advantage (eliminate loops!)
Use Numpy aggregates to your advantage (eliminate loops!)
Use Numpy broadcasting to your advantage (eliminate loops!)
Use Numpy slicing and masking to your advantage (eliminate loops!)
Use a tool like SWIG, cython or f2py to interface to compiled code.