Numpy

NumPy is the fundamental package for scientific computing with Python. You can find more tutorials at http://wiki.scipy.org/Tentative_NumPy_Tutorial . Also check http://www.numpy.org for additional informations.


In [69]:
# uncomment that line if you are using python 2
# from __future__ import print_function, division 

import numpy as np

Array Creation

numpy arrays are generally defined from lists

Unlike python lists, numpy arrays contain objects of ONLY one type.


In [2]:
# 1D array
np.array([1, 2, 3, 4])


Out[2]:
array([1, 2, 3, 4])

In [3]:
# 2D array
np.array([[1, 2], [3, 4]])


Out[3]:
array([[1, 2],
       [3, 4]])

In [4]:
a = np.array([[1, 2], [3, 4]])
print(a, type(a))


[[1 2]
 [3 4]] <class 'numpy.ndarray'>

A numpy array is an object with many attributes


In [68]:
print('* length (a.size) : %i' % a.size )
print('* number of dimensions (a.ndim) : %i' % a.ndim )
print('* shape (a.shape) : %s' % str(a.shape) )
print('* data type (a.dtype) : %s' % a.dtype )


* length (a.size) : 4
* number of dimensions (a.ndim) : 1
* shape (a.shape) : (4,)
* data type (a.dtype) : int64

NOTE: Unlike python lists, numpy arrays contain objects of ONLY one data type. By default, the data type (dtype) is usually set to float which corresponds to 64-byte float or np.float64

empty, zeros and ones

To quickly initialize a numpy array, there are a few convenience methods that only require a shape as input (and optionally a dtype)..


In [6]:
np.empty((3, 3))


Out[6]:
array([[ -0.00000000e+000,  -0.00000000e+000,   2.14317173e-314],
       [  2.14320111e-314,  -0.00000000e+000,  -0.00000000e+000],
       [  2.17173008e-314,   2.17169402e-314,   2.17155066e-314]])

In [7]:
np.zeros((2, 2), dtype=float)


Out[7]:
array([[ 0.,  0.],
       [ 0.,  0.]])

In [8]:
np.ones((3, 5), dtype=int)


Out[8]:
array([[1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]])

.. or another numpy array.


In [9]:
np.empty_like(a)


Out[9]:
array([[0, 0],
       [0, 0]])

In [10]:
np.zeros_like(a, dtype=complex)


Out[10]:
array([[ 0.+0.j,  0.+0.j],
       [ 0.+0.j,  0.+0.j]])

In [11]:
np.ones_like(a, dtype=str)


Out[11]:
array([['1', '1'],
       ['1', '1']], 
      dtype='<U1')

arange

A range can be quickly created with the arange method ( as indgen in IDL)


In [12]:
np.arange(10)


Out[12]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Additional arguments enable to set the lower and upper bounds as well as the range step.


In [13]:
np.arange(2, 3, 0.25)


Out[13]:
array([ 2.  ,  2.25,  2.5 ,  2.75])

linspace and logspace

Similarly to arange, linspace and logspace are convenience methods to create ranges, but instead of specifying the elements spacing, the constraint is on the total number of elements.


In [14]:
np.linspace(1, 10, 5)


Out[14]:
array([  1.  ,   3.25,   5.5 ,   7.75,  10.  ])

For logspace, the boundaries should be specified in log.


In [15]:
np.logspace(np.log10(1), np.log10(10), 5)


Out[15]:
array([  1.        ,   1.77827941,   3.16227766,   5.62341325,  10.        ])

NOTE: arange never includes the upper bound while linespace and logspace do.

ogrid, mgrid and indices

For multidimensional ranges, ogrid creates a column and a row


In [16]:
np.ogrid[0:3, 0:3]


Out[16]:
[array([[0],
        [1],
        [2]]), array([[0, 1, 2]])]

while mgrid and indices create the full coordinate arrays


In [17]:
np.mgrid[0:3, 0:3]


Out[17]:
array([[[0, 0, 0],
        [1, 1, 1],
        [2, 2, 2]],

       [[0, 1, 2],
        [0, 1, 2],
        [0, 1, 2]]])

In [18]:
np.indices((3, 3))


Out[18]:
array([[[0, 0, 0],
        [1, 1, 1],
        [2, 2, 2]],

       [[0, 1, 2],
        [0, 1, 2],
        [0, 1, 2]]])

Accessing Array Elements

Indexing in 1D

numpy arrays are accessed with the same slicing as for lists.

Reminder: [start:end:step]


In [19]:
b = np.arange(10)
b[0:9:2]


Out[19]:
array([0, 2, 4, 6, 8])

Indexing in n-dimensions

The first index represents the row, the second represents the column. Dimensions need to be separated with commas ','.


In [20]:
# Square array from arange with reshape method
c = np.arange(5 * 5).reshape(5, 5)
print(c)
print(c[2, 3])   # Third row, fourth column
print(c[-1, 1])  # Last row, second column


[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]
13
21

To specify "all the elements along that particular dimension", use the colon :


In [21]:
d = np.arange(27).reshape(3, 3, 3)
print(d)


[[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]]

 [[ 9 10 11]
  [12 13 14]
  [15 16 17]]

 [[18 19 20]
  [21 22 23]
  [24 25 26]]]

In [22]:
d[0, :, 0]  # First row, entire column of the first slice


Out[22]:
array([0, 3, 6])

If you have many dimensions, you can replace several colons by an ellipsis ...


In [23]:
d[..., 1] == d[:, :, 1]


Out[23]:
array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]], dtype=bool)

Slicing


In [24]:
print(c)
print("c[0, :] = %s"% str( c[0, :]) )# First row equivalent to c[0]
print("c[1, :] = %s"% str( c[1, :]) )# Second row equivalent to c[1]
print("c[:, 0] = %s"% str( c[:, 0]) )# First column
print("c[:, 1] = %s"% str( c[:, 1]) )# Second column


[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]
c[0, :] = [0 1 2 3 4]
c[1, :] = [5 6 7 8 9]
c[:, 0] = [ 0  5 10 15 20]
c[:, 1] = [ 1  6 11 16 21]

More funky..


In [25]:
print(c[0:3, 0])       # from 0 to 3 (excluded) on the first axis
print(c[0:4:2, 1:5:2]) # from 0 to 4 (excluded) every 2 elements on the 1th to 5th axis every 2 elements


[ 0  5 10]
[[ 1  3]
 [11 13]]

Fancy indexing

numpy provides a way to give an array as index for a second array, an operation referred to as fancy indexing .

This is should very familiar to IDL users.


In [26]:
size = 6
x = np.random.randint(10, size=size)
idx = np.arange(0, size, 2)            # every second integer
print(x)
print(idx)
print(x[idx])


[9 0 9 8 8 2]
[0 2 4]
[9 9 8]

This can be useful to extract some information from an array without modifying it.

where ?

This is the magical IDL command, which is rendered almost useless in Python


In [27]:
# np.random is a submodule of numpy, dedicated to random generators
a = np.random.uniform(size=6)
print(a)


[ 0.57224757  0.44210908  0.00159502  0.05174668  0.22759638  0.89416188]

A simple test operation with a numpy array becomes an array of boolean values (element-wise test), which can turn extremely convenient to create masks.


In [28]:
mask = a > 0.5
print(mask)


[ True False False False False  True]

These masks are then used as indices (fancy indexing) to only retrieve the elements that match the test criterion.


In [29]:
a[mask]


Out[29]:
array([ 0.57224757,  0.89416188])

or that don't match the criterion


In [30]:
a[~mask]


Out[30]:
array([ 0.44210908,  0.00159502,  0.05174668,  0.22759638])

And the tests can be combined all at once, either is plain


In [31]:
a[(a > 0.4) & (a < 0.7)]


Out[31]:
array([ 0.57224757,  0.44210908])

or with numpy dedicated methods


In [32]:
a[np.logical_or(a < 0.3, a > 0.66)]


Out[32]:
array([ 0.00159502,  0.05174668,  0.22759638,  0.89416188])

However to get the indices of the array satisfying certain conditions, then np.where should be used


In [33]:
np.where(a > 0.5)                 # return a tuple with the indexes where the condition is True as first element
print(np.where(a > 0.5)[0])       # ... to access directly the indexes


[0 5]

but the best use case for np.where is to return an array progamatically : "if condition is met, return x, otherwise return y"


In [34]:
np.where(a > 0.5, a, np.nan)


Out[34]:
array([ 0.57224757,         nan,         nan,         nan,         nan,
        0.89416188])

Array operations

Basic operations


In [35]:
c + 1


Out[35]:
array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25]])

In [36]:
c * 5 - 3 / 2


Out[36]:
array([[  -1.5,    3.5,    8.5,   13.5,   18.5],
       [  23.5,   28.5,   33.5,   38.5,   43.5],
       [  48.5,   53.5,   58.5,   63.5,   68.5],
       [  73.5,   78.5,   83.5,   88.5,   93.5],
       [  98.5,  103.5,  108.5,  113.5,  118.5]])

Also work inplace


In [37]:
c += 3
c


Out[37]:
array([[ 3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12],
       [13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22],
       [23, 24, 25, 26, 27]])

ufuncs

All common arithmetic operations are optimally implemented in numpy and take advantage of the C machinery underneath. It means that squaring an array or computing its sum is much more efficient in numpy than just computing it from lists.

These small convenience methods are implemented as universal functions (ufunc - http://docs.scipy.org/doc/numpy/reference/ufuncs.html), like min, max, mean, std, etc.. can provide information on the whole array


In [38]:
c


Out[38]:
array([[ 3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12],
       [13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22],
       [23, 24, 25, 26, 27]])

In [39]:
c.mean()


Out[39]:
15.0

In [40]:
c.std()


Out[40]:
7.2111025509279782

or in multidimensional data, can provide information along a given axis ( 0 = rows | 1 = columns )


In [41]:
c.sum(axis=1)


Out[41]:
array([ 25,  50,  75, 100, 125])

In [42]:
c.max(axis=0)


Out[42]:
array([23, 24, 25, 26, 27])

This is a simpler way of applying the np.min() or np.std() functions to the array.

Matrices and vectors

Operations on vectors and matrices are handled easily, but upcasting (int => float) is always happening.


In [43]:
a = np.array([[1, 1],[0, 0]], dtype=float)
b = np.array([[1, 1],[1, 1]])
c = np.array([[1, 2],[3, 4]])
(a + 2) + (b * 4) * c


Out[43]:
array([[  7.,  11.],
       [ 14.,  18.]])

Between matrices, there is the element-wise product *


In [44]:
a * c


Out[44]:
array([[ 1.,  2.],
       [ 0.,  0.]])

and the matrix product np.dot.


In [45]:
np.dot(a, c)


Out[45]:
array([[ 4.,  6.],
       [ 0.,  0.]])

For more complex operations on vectors and matrices, there is a submodule in numpy dedicated to linear algebra called np.linalg for eigen vector decomposition, inverse operations, etc.

Masked Array

You can associate a mask to an array to create a masked array


In [46]:
a = np.array([1,2,3,4])
mask = np.array(a == 2)
print(a)
print(mask) # True when masked


[1 2 3 4]
[False  True False False]

In [47]:
masked_a = np.ma.array(a, mask=mask)
masked_a


Out[47]:
masked_array(data = [1 -- 3 4],
             mask = [False  True False False],
       fill_value = 999999)

In [48]:
print(a.sum())
print(masked_a.sum()) # The mask are taken into account


10
8

In [49]:
masked_b = np.ma.array(a, mask=(a ==1) )
masked_b


Out[49]:
masked_array(data = [-- 2 3 4],
             mask = [ True False False False],
       fill_value = 999999)

In [50]:
masked_a+masked_b # Will create a common mask


Out[50]:
masked_array(data = [-- -- 6 8],
             mask = [ True  True False False],
       fill_value = 999999)

Broadcasting

One of the major feature of numpy is the use of array broadcasting. Broadcasting allows operations (such as addition, multiplication etc.) which are normally element-wise to be carried on arrays of different shapes. It is a virtual replication of the arrays along the missing dimensions. It can be seen as a generalization of operations involving an array and a scalar.


In [51]:
matrix = np.zeros((4, 5))
matrix


Out[51]:
array([[ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.]])

The addition of a scalar on an matrix can be seen as the addition of a matrix with identical elements (and same dimensions)


In [52]:
matrix + 6


Out[52]:
array([[ 6.,  6.,  6.,  6.,  6.],
       [ 6.,  6.,  6.,  6.,  6.],
       [ 6.,  6.,  6.,  6.,  6.],
       [ 6.,  6.,  6.,  6.,  6.]])

The addition of a row on a matrix will be seen as the addition of a matrix with replicated rows (the number of columns must match).


In [53]:
row = np.arange(5)
row


Out[53]:
array([0, 1, 2, 3, 4])

In [54]:
matrix + row


Out[54]:
array([[ 0.,  1.,  2.,  3.,  4.],
       [ 0.,  1.,  2.,  3.,  4.],
       [ 0.,  1.,  2.,  3.,  4.],
       [ 0.,  1.,  2.,  3.,  4.]])

The addition of a column on a matrix will be seen as the addition of a matrix with replicated columns (the number of rows must match).


In [55]:
column = np.ones(4)
column


Out[55]:
array([ 1.,  1.,  1.,  1.])

In [56]:
# This would fail

# matrix + column

This one failed since the righmost dimensions are different. So for columns, an additional dimension must be specified and added on the right, indexing the array with an additional np.newaxis or simply None.


In [57]:
column = np.arange(4)[:, None]   # or np.ones(4)[:, np.newaxis]
column


Out[57]:
array([[0],
       [1],
       [2],
       [3]])

In [58]:
matrix + column


Out[58]:
array([[ 0.,  0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 2.,  2.,  2.,  2.,  2.],
       [ 3.,  3.,  3.,  3.,  3.]])

NOTE: In the row case above, the shapes also did not match (4,5) for the matrix and (5,) for the row. The actual rule of broadcasting is that for arrays of different rank, dimensions of length 1 are prepended (added on the left of the array shape) until the two arrays have the same rank.

For this reason, arrays with the following shapes can be broadcasted together:

  • (1, 1, 1, 8) and (9, 1)
  • (4, 1, 9) and (3, 1)

Reading text files

Numpy as basic capabilities to read text files


In [59]:
%%file numpy_data.txt
1 2 3 
4 5 6


Overwriting numpy_data.txt

In [60]:
data = np.loadtxt('numpy_data.txt')
data


Out[60]:
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])

The astropy.io.ascii module has a much more complete function of reading ascii files in different type of format. Furthermore, scipy.io.readsav has the capability of reading IDL sav files into python objects. The python pickle and cPickle module can also be used to dump and read data to disk.

You are looking for a particular method in numpy ?

Simply use the lookfor method..


In [61]:
np.lookfor('fourier transform')


Search results for 'fourier transform'
--------------------------------------
numpy.fft.fft
    Compute the one-dimensional discrete Fourier Transform.
numpy.fft.fft2
    Compute the 2-dimensional discrete Fourier Transform
numpy.fft.fftn
    Compute the N-dimensional discrete Fourier Transform.
numpy.fft.ifft
    Compute the one-dimensional inverse discrete Fourier Transform.
numpy.fft.rfft
    Compute the one-dimensional discrete Fourier Transform for real input.
numpy.fft.ifft2
    Compute the 2-dimensional inverse discrete Fourier Transform.
numpy.fft.ifftn
    Compute the N-dimensional inverse discrete Fourier Transform.
numpy.fft.rfftn
    Compute the N-dimensional discrete Fourier Transform for real input.
numpy.fft.fftfreq
    Return the Discrete Fourier Transform sample frequencies.
numpy.fft.rfftfreq
    Return the Discrete Fourier Transform sample frequencies
numpy.bartlett
    Return the Bartlett window.
numpy.convolve
    Returns the discrete, linear convolution of two one-dimensional sequences.
numpy.fft.irfft
    Compute the inverse of the n-point DFT for real input.
numpy.fft.rfft2
    Compute the 2-dimensional FFT of a real array.
numpy.fft.irfftn
    Compute the inverse of the N-dimensional FFT of real input.

Exercice time !

These exercices are extracted from Pierre Chanial numpy lecture.

Waving for loops goodbye

Compute $\pi$, as given by the Madhava of Sangamagrama approximation formula:

$\pi = \sqrt{12}\sum^\infty_{k=0} \frac{(-\frac{1}{3})^{k}}{2k+1}$.

The $k$ indices ranging from 0 to (let’s say) 29 will be returned by the NumPy function arange and $\pi$ will be computed by calling another NumPy function (sum), instead of using a for loop.


In [1]:
# Write your solution here (2 lines expected)

N = 30
k = 0     # to be replaced with an array of indices using N
pi = 3.14 # to be replaced with the mathematical expression above

print('Relative error:', np.abs(pi - np.pi) / np.pi)


Relative error: 0.000506957382897

Computation of $\pi$ with Monte Carlo

Given the random variables X and Y following the uniform distribution between -1 and 1, the probability for the point (X, Y) to be inside the unity disk is the ratio of the surface of the unity disk and that of the unity square, i.e. $\pi/4$. It is then possible possible to compute $\pi$ by drawing realizations of X and Y and counting the fraction of points (X, Y) inside the unity disk.

Vectorize the following pure Python code, by using NumPy arrays and functions.


In [63]:
import math
import random

NTOTAL = 1000000

random.seed(0)
ninside = 0
for i in range(NTOTAL):
    x = random.uniform(-1, 1)
    y = random.uniform(-1, 1)
    ninside += math.sqrt(x**2 + y**2) < 1
pi = 4 * ninside / NTOTAL

print('Relative error:', np.abs(pi - np.pi) / np.pi)


Relative error: 0.000146630591737

Write the NumPy version below


In [64]:
NTOTAL = 1000000

random.seed(0)                        # use NumPy random module instead
x = random.uniform(-1, 1)             # use NumPy random module instead
y = random.uniform(-1, 1)             # use NumPy random module instead
ninside = math.sqrt(x**2 + y**2) < 1  # to be replaced with NumPy version
pi = 4 * ninside / NTOTAL

print('Relative error:', np.abs(pi - np.pi) / np.pi)


Relative error: 0.000146630591737

Broadcasting theory

Try to predict the output shape of the two examples given in the note on broadcasting, a paragraph above.


Velocity histogram

Complete the missing parts of the code below to do this exercise. Given a large number of particules of velocities $v_x$, $v_y$, $v_z$ distributed according to the standard normal distribution, plot the histogram of the speed in 1, 2 and 3 dimensions:

$v_1 = |v_x| = \sqrt{v_x^2}$

$v_2 = \sqrt{v_x^2+v_y^2}$

$v_3 = \sqrt{v_x^2+v_y^2+v_z^2}$

and compare it to the theoretical Maxwell distributions:

$f_n(v) = \left(\frac{\pi}{2}\right)^{-\frac{|n-2|}{2}} v^{n-1} e^{-\frac{1}{2}v^2}$

where $n = 1, 2, 3$ is the number of dimensions.


In [65]:
# Replace the ... with your solution

import numpy as np

def velocity2speed(velocity, ndims):
    """ Return the ndims-dimensional speed of the particles. """
    speed = np.sqrt(1)  # replace 1 with an expression that sums the velocity depending on ndims
    return speed


def speed_distribution(speed, ndims):
    """
    Return the probability distribution function of the ndims-dimensional
    speed of the particles.
    """
    distrib = 0    # replace with mathematical expression of Maxwell distribution
    # normalize the distribution
    return distrib


NPARTICULES = 1000000

velocity = np.random.standard_normal((NPARTICULES, 3))

# Test your functions here before plotting the result

In [ ]:
import matplotlib.pyplot as plt
%matplotlib inline

for ndims in (1, 2, 3):
    velocity = np.random.standard_normal((NPARTICULES, ndims))
    speed = velocity2speed(velocity, ndims)
    ax = plt.subplot(1, 3, ndims)
    n, bins, patches = ax.hist(speed, bins=100, normed=True, alpha=0.7)
    ax.set_title('{}-d speed distribution'.format(ndims))
    ax.set_xlim(0, 5)
    ax.set_ylim(0, 1.0)
    ax.set_xlabel('speed')
    ax.plot(speed, speed_distribution(speed, ndims), 'ro')

plt.show()

In [67]:
from IPython.core.display import HTML
HTML(open('../styles/notebook.css', 'r').read())


Out[67]: