NumPy is the fundamental package for scientific computing with Python. You can find more tutorials at http://wiki.scipy.org/Tentative_NumPy_Tutorial . Also check http://www.numpy.org for additional informations.
In [69]:
# uncomment that line if you are using python 2
# from __future__ import print_function, division
import numpy as np
numpy
arrays are generally defined from lists
Unlike python lists, numpy arrays contain objects of ONLY one type.
In [2]:
# 1D array
np.array([1, 2, 3, 4])
Out[2]:
In [3]:
# 2D array
np.array([[1, 2], [3, 4]])
Out[3]:
In [4]:
a = np.array([[1, 2], [3, 4]])
print(a, type(a))
A numpy
array is an object with many attributes
In [68]:
print('* length (a.size) : %i' % a.size )
print('* number of dimensions (a.ndim) : %i' % a.ndim )
print('* shape (a.shape) : %s' % str(a.shape) )
print('* data type (a.dtype) : %s' % a.dtype )
NOTE: Unlike python lists, numpy arrays contain objects of ONLY one data type. By default, the data type (dtype
) is usually set to float
which corresponds to 64-byte float or np.float64
In [6]:
np.empty((3, 3))
Out[6]:
In [7]:
np.zeros((2, 2), dtype=float)
Out[7]:
In [8]:
np.ones((3, 5), dtype=int)
Out[8]:
.. or another numpy array.
In [9]:
np.empty_like(a)
Out[9]:
In [10]:
np.zeros_like(a, dtype=complex)
Out[10]:
In [11]:
np.ones_like(a, dtype=str)
Out[11]:
In [12]:
np.arange(10)
Out[12]:
Additional arguments enable to set the lower and upper bounds as well as the range step.
In [13]:
np.arange(2, 3, 0.25)
Out[13]:
In [14]:
np.linspace(1, 10, 5)
Out[14]:
For logspace
, the boundaries should be specified in log
.
In [15]:
np.logspace(np.log10(1), np.log10(10), 5)
Out[15]:
NOTE: arange
never includes the upper bound while linespace
and logspace
do.
In [16]:
np.ogrid[0:3, 0:3]
Out[16]:
while mgrid
and indices
create the full coordinate arrays
In [17]:
np.mgrid[0:3, 0:3]
Out[17]:
In [18]:
np.indices((3, 3))
Out[18]:
In [19]:
b = np.arange(10)
b[0:9:2]
Out[19]:
In [20]:
# Square array from arange with reshape method
c = np.arange(5 * 5).reshape(5, 5)
print(c)
print(c[2, 3]) # Third row, fourth column
print(c[-1, 1]) # Last row, second column
To specify "all the elements along that particular dimension", use the colon :
In [21]:
d = np.arange(27).reshape(3, 3, 3)
print(d)
In [22]:
d[0, :, 0] # First row, entire column of the first slice
Out[22]:
If you have many dimensions, you can replace several colons by an ellipsis ...
In [23]:
d[..., 1] == d[:, :, 1]
Out[23]:
In [24]:
print(c)
print("c[0, :] = %s"% str( c[0, :]) )# First row equivalent to c[0]
print("c[1, :] = %s"% str( c[1, :]) )# Second row equivalent to c[1]
print("c[:, 0] = %s"% str( c[:, 0]) )# First column
print("c[:, 1] = %s"% str( c[:, 1]) )# Second column
More funky..
In [25]:
print(c[0:3, 0]) # from 0 to 3 (excluded) on the first axis
print(c[0:4:2, 1:5:2]) # from 0 to 4 (excluded) every 2 elements on the 1th to 5th axis every 2 elements
In [26]:
size = 6
x = np.random.randint(10, size=size)
idx = np.arange(0, size, 2) # every second integer
print(x)
print(idx)
print(x[idx])
This can be useful to extract some information from an array without modifying it.
This is the magical IDL command, which is rendered almost useless in Python
In [27]:
# np.random is a submodule of numpy, dedicated to random generators
a = np.random.uniform(size=6)
print(a)
A simple test operation with a numpy
array becomes an array of boolean values (element-wise test), which can turn extremely convenient to create masks.
In [28]:
mask = a > 0.5
print(mask)
These masks are then used as indices (fancy indexing) to only retrieve the elements that match the test criterion.
In [29]:
a[mask]
Out[29]:
or that don't match the criterion
In [30]:
a[~mask]
Out[30]:
And the tests can be combined all at once, either is plain
In [31]:
a[(a > 0.4) & (a < 0.7)]
Out[31]:
or with numpy
dedicated methods
In [32]:
a[np.logical_or(a < 0.3, a > 0.66)]
Out[32]:
However to get the indices of the array satisfying certain conditions, then np.where
should be used
In [33]:
np.where(a > 0.5) # return a tuple with the indexes where the condition is True as first element
print(np.where(a > 0.5)[0]) # ... to access directly the indexes
but the best use case for np.where
is to return an array progamatically : "if condition is met, return x, otherwise return y"
In [34]:
np.where(a > 0.5, a, np.nan)
Out[34]:
In [35]:
c + 1
Out[35]:
In [36]:
c * 5 - 3 / 2
Out[36]:
Also work inplace
In [37]:
c += 3
c
Out[37]:
All common arithmetic operations are optimally implemented in numpy
and take advantage of the C machinery underneath. It means that squaring an array or computing its sum is much more efficient in numpy than just computing it from lists.
These small convenience methods are implemented as universal functions (ufunc - http://docs.scipy.org/doc/numpy/reference/ufuncs.html), like min
, max
, mean
, std
, etc.. can provide information on the whole array
In [38]:
c
Out[38]:
In [39]:
c.mean()
Out[39]:
In [40]:
c.std()
Out[40]:
or in multidimensional data, can provide information along a given axis ( 0 = rows | 1 = columns )
In [41]:
c.sum(axis=1)
Out[41]:
In [42]:
c.max(axis=0)
Out[42]:
This is a simpler way of applying the np.min()
or np.std()
functions to the array.
Operations on vectors and matrices are handled easily, but upcasting (int => float
) is always happening.
In [43]:
a = np.array([[1, 1],[0, 0]], dtype=float)
b = np.array([[1, 1],[1, 1]])
c = np.array([[1, 2],[3, 4]])
(a + 2) + (b * 4) * c
Out[43]:
Between matrices, there is the element-wise product *
In [44]:
a * c
Out[44]:
and the matrix product np.dot
.
In [45]:
np.dot(a, c)
Out[45]:
For more complex operations on vectors and matrices, there is a submodule in numpy
dedicated to linear algebra called np.linalg
for eigen vector decomposition, inverse operations, etc.
You can associate a mask to an array to create a masked array
In [46]:
a = np.array([1,2,3,4])
mask = np.array(a == 2)
print(a)
print(mask) # True when masked
In [47]:
masked_a = np.ma.array(a, mask=mask)
masked_a
Out[47]:
In [48]:
print(a.sum())
print(masked_a.sum()) # The mask are taken into account
In [49]:
masked_b = np.ma.array(a, mask=(a ==1) )
masked_b
Out[49]:
In [50]:
masked_a+masked_b # Will create a common mask
Out[50]:
One of the major feature of numpy
is the use of array broadcasting. Broadcasting allows operations (such as addition, multiplication etc.) which are normally element-wise to be carried on arrays of different shapes. It is a virtual replication of the arrays along the missing dimensions. It can be seen as a generalization of operations involving an array and a scalar.
In [51]:
matrix = np.zeros((4, 5))
matrix
Out[51]:
The addition of a scalar on an matrix can be seen as the addition of a matrix with identical elements (and same dimensions)
In [52]:
matrix + 6
Out[52]:
The addition of a row on a matrix will be seen as the addition of a matrix with replicated rows (the number of columns must match).
In [53]:
row = np.arange(5)
row
Out[53]:
In [54]:
matrix + row
Out[54]:
The addition of a column on a matrix will be seen as the addition of a matrix with replicated columns (the number of rows must match).
In [55]:
column = np.ones(4)
column
Out[55]:
In [56]:
# This would fail
# matrix + column
This one failed since the righmost dimensions are different. So for columns, an additional dimension must be specified and added on the right, indexing the array with an additional np.newaxis
or simply None
.
In [57]:
column = np.arange(4)[:, None] # or np.ones(4)[:, np.newaxis]
column
Out[57]:
In [58]:
matrix + column
Out[58]:
NOTE: In the row case above, the shapes also did not match (4,5) for the matrix and (5,) for the row. The actual rule of broadcasting is that for arrays of different rank, dimensions of length 1 are prepended (added on the left of the array shape) until the two arrays have the same rank.
For this reason, arrays with the following shapes can be broadcasted together:
(1, 1, 1, 8)
and (9, 1)
(4, 1, 9)
and (3, 1)
Numpy as basic capabilities to read text files
In [59]:
%%file numpy_data.txt
1 2 3
4 5 6
In [60]:
data = np.loadtxt('numpy_data.txt')
data
Out[60]:
The astropy.io.ascii
module has a much more complete function of reading ascii files in different type of format. Furthermore, scipy.io.readsav
has the capability of reading IDL sav files into python objects. The python pickle
and cPickle
module can also be used to dump and read data to disk.
Simply use the lookfor
method..
In [61]:
np.lookfor('fourier transform')
These exercices are extracted from Pierre Chanial numpy
lecture.
Compute $\pi$, as given by the Madhava of Sangamagrama approximation formula:
$\pi = \sqrt{12}\sum^\infty_{k=0} \frac{(-\frac{1}{3})^{k}}{2k+1}$.
The $k$ indices ranging from 0 to (let’s say) 29 will be returned by the NumPy function arange
and $\pi$ will be computed by calling another NumPy function (sum
), instead of using a for
loop.
In [1]:
# Write your solution here (2 lines expected)
N = 30
k = 0 # to be replaced with an array of indices using N
pi = 3.14 # to be replaced with the mathematical expression above
print('Relative error:', np.abs(pi - np.pi) / np.pi)
Given the random variables X and Y following the uniform distribution between -1 and 1, the probability for the point (X, Y) to be inside the unity disk is the ratio of the surface of the unity disk and that of the unity square, i.e. $\pi/4$. It is then possible possible to compute $\pi$ by drawing realizations of X and Y and counting the fraction of points (X, Y) inside the unity disk.
Vectorize the following pure Python code, by using NumPy arrays and functions.
In [63]:
import math
import random
NTOTAL = 1000000
random.seed(0)
ninside = 0
for i in range(NTOTAL):
x = random.uniform(-1, 1)
y = random.uniform(-1, 1)
ninside += math.sqrt(x**2 + y**2) < 1
pi = 4 * ninside / NTOTAL
print('Relative error:', np.abs(pi - np.pi) / np.pi)
Write the NumPy version below
In [64]:
NTOTAL = 1000000
random.seed(0) # use NumPy random module instead
x = random.uniform(-1, 1) # use NumPy random module instead
y = random.uniform(-1, 1) # use NumPy random module instead
ninside = math.sqrt(x**2 + y**2) < 1 # to be replaced with NumPy version
pi = 4 * ninside / NTOTAL
print('Relative error:', np.abs(pi - np.pi) / np.pi)
Try to predict the output shape of the two examples given in the note on broadcasting, a paragraph above.
Complete the missing parts of the code below to do this exercise. Given a large number of particules of velocities $v_x$, $v_y$, $v_z$ distributed according to the standard normal distribution, plot the histogram of the speed in 1, 2 and 3 dimensions:
$v_1 = |v_x| = \sqrt{v_x^2}$
$v_2 = \sqrt{v_x^2+v_y^2}$
$v_3 = \sqrt{v_x^2+v_y^2+v_z^2}$
and compare it to the theoretical Maxwell distributions:
$f_n(v) = \left(\frac{\pi}{2}\right)^{-\frac{|n-2|}{2}} v^{n-1} e^{-\frac{1}{2}v^2}$
where $n = 1, 2, 3$ is the number of dimensions.
In [65]:
# Replace the ... with your solution
import numpy as np
def velocity2speed(velocity, ndims):
""" Return the ndims-dimensional speed of the particles. """
speed = np.sqrt(1) # replace 1 with an expression that sums the velocity depending on ndims
return speed
def speed_distribution(speed, ndims):
"""
Return the probability distribution function of the ndims-dimensional
speed of the particles.
"""
distrib = 0 # replace with mathematical expression of Maxwell distribution
# normalize the distribution
return distrib
NPARTICULES = 1000000
velocity = np.random.standard_normal((NPARTICULES, 3))
# Test your functions here before plotting the result
In [ ]:
import matplotlib.pyplot as plt
%matplotlib inline
for ndims in (1, 2, 3):
velocity = np.random.standard_normal((NPARTICULES, ndims))
speed = velocity2speed(velocity, ndims)
ax = plt.subplot(1, 3, ndims)
n, bins, patches = ax.hist(speed, bins=100, normed=True, alpha=0.7)
ax.set_title('{}-d speed distribution'.format(ndims))
ax.set_xlim(0, 5)
ax.set_ylim(0, 1.0)
ax.set_xlabel('speed')
ax.plot(speed, speed_distribution(speed, ndims), 'ro')
plt.show()
In [67]:
from IPython.core.display import HTML
HTML(open('../styles/notebook.css', 'r').read())
Out[67]: