(See Ch. 3, "Python Scientific lecture notes", Release 2012.3 (EuroScipy 2012)
by the EuroScipy tutorial team
Editors: Valentin Haenel, Emmanuelle Gouillart, Gaël Varoquaux
Python has built-in:
Numpy is:
In [5]:
import numpy as np
a = np.array([0, 1, 2, 3, 4])
a
a = np.arange(5)
a
Out[5]:
For example...
...an array containing:
Why is it useful?
efficient numerical operations
In [6]:
l = range(1000)
l**2
%timeit [i**2 for i in l]
In [8]:
a = np.arange(10)
%timeit a**2
a**2
Out[8]:
Reference documentation:
In [9]:
help(np.array)
In [10]:
np.array?
In [11]:
np.lookfor('create array')
In [12]:
np.con*?
In [13]:
a = np.array([0, 1, 2, 3])
a
Out[13]:
In [15]:
a.ndim
Out[15]:
In [16]:
a.shape
Out[16]:
In [ ]:
len(a)
In [23]:
b = np.array([(0, 1, 2), [3, 4, 5]])
b
Out[23]:
In [25]:
b.ndim
Out[25]:
In [19]:
len(b) # returns the size of the first dimension
Out[19]:
In [20]:
b.shape
Out[20]:
In [26]:
c = np.array([[[1, 2], [3, 4], [5, 6]], [[7, 8], [9, 10], [11, 12]]])
c
Out[26]:
In [27]:
c.shape
Out[27]:
In practice, we rarely enter items one by one...
In [28]:
a = np.arange(10) # 0 .. n-1 (!)
a
Out[28]:
In [31]:
b = np.arange(1, 9, 1) # start, end (exlusive), step
b
Out[31]:
In [33]:
c = np.linspace(0, 1, 6, endpoint=False) # start, end, num-points
c
Out[33]:
In [ ]:
d = np.linspace(0, 1, 5, endpoint=False)
d
In [37]:
a = np.ones((3, 3)) # reminder: (3, 3) is a tuple
a
Out[37]:
In [38]:
b = np.zeros((2, 2))
b
Out[38]:
In [39]:
c = np.eye(3)
c
Out[39]:
In [40]:
d = np.diag(np.array([1, 2, 3, 4]))
d
Out[40]:
In [41]:
a = np.random.rand(4) # uniform in [0, 1]
a
Out[41]:
In [55]:
b = np.random.randn(4) # Gaussian
b
Out[55]:
In [54]:
np.random.seed(1234) # Setting the random seed
You may have noticed that, in some instances, array elements are displayed with a trailing dot (e.g. 2. vs 2).
This is due to a difference in the data-type used:
In [62]:
a = np.array([1, 2, 3]).astype(float)
a[0] = 2121.2
a
Out[62]:
In [57]:
b = np.array([1., 2., 3.])
b.dtype
Out[57]:
Different data-types allow us to store data more compactly in memory, but most of the time we simply work with floating point numbers.
Note that, in the example above, NumPy auto-detects the data-type from the input.
In [ ]:
c = np.array([1, 2, 3], dtype=int)
c.dtype
In [ ]:
a = np.ones((3, 3))
a.dtype
There are also other types:
In [ ]:
d = np.array([1+2j, 3+4j, 5+6*1j])
d.dtype
In [ ]:
e = np.array([True, False, False, True])
e.dtype
In [ ]:
f = np.array(['a','happy','world'])
f.dtype # <--- strings containing max. 5 letters
In [ ]:
f[:] = 'abcdefg'
f
The items of an array can be accessed and assigned to the same way as other Python sequences (e.g. lists)
In [63]:
a = np.arange(10)
a
Out[63]:
In [66]:
a[-4]
Out[66]:
Warning: Indices begin at 0, like other Python sequences (and C/C++).
In contrast, in Fortran or Matlab, indices begin at 1.
In [67]:
a = np.diag(np.arange(3))
a
Out[67]:
In [68]:
a[1, 1]
Out[68]:
In [69]:
a[2, 1] = 10 # third line, second column
a
Out[69]:
In [70]:
a[1]
Out[70]:
Note that:
Slicing
In [71]:
a = np.arange(10)
a
Out[71]:
In [72]:
a[2:9:3] # [start:end:step]
Out[72]:
Note that:
In [73]:
a[:4]
Out[73]:
In [ ]:
a[1:3]
In [74]:
a[::2]
Out[74]:
In [ ]:
a[3:]
A slicing operation creates a view on the original array.
This is just a way of accessing array data. Thus the original array is not copied in memory.
When modifying the view, the original array is modified as well:
In [75]:
a = np.arange(10)
a
Out[75]:
In [76]:
b = a[::2]
b
Out[76]:
In [77]:
b[0] = 12
b
Out[77]:
In [83]:
x = [1,2,3,4]
b = x[0:2]
b[0]=100000
x
Out[83]:
In [79]:
a = np.arange(10)
b = a[::2].copy() # force a copy
b[0] = 12
a
Out[79]:
This behavior can be surprising at first sight...
but it allows to save both memory and time.
Warning: The transpose is a view
As a result, a matrix cannot be made symmetric in-place:
In [86]:
a = np.ones((100, 100))
a += a.T
a
Out[86]:
In [87]:
a = np.ones((100,100))
a += a.copy().T
a
Out[87]:
Compute prime numbers in 0–99, with a "sieve"
Construct a shape (100,) boolean array is_prime, filled with True in the beginning
Cross out 0 and 1 which are not primes
For each integer j starting from 2, cross out its higher multiples
In [ ]:
In [ ]:
In [ ]:
In [ ]:
help(np.nonzero)
In [ ]:
In [ ]:
Indexing with the np.newaxis object allows us to add an axis to an array:
In [88]:
z = np.array([1, 2, 3])
z
Out[88]:
In [89]:
z[:, np.newaxis]
Out[89]:
In [90]:
z[np.newaxis, :]
Out[90]:
Numpy arrays can be indexed with slices
but also with boolean or integer arrays (masks)
This method is called fancy indexing.
It creates copies not views.
Using boolean masks
In [96]:
np.random.seed(3)
a = np.random.random_integers(0, 20, 15)
a
Out[96]:
In [92]:
(a % 3 == 0)
Out[92]:
In [97]:
mask = (a % 3 == 0)
extract_from_a = a[mask] # or, a[a%3==0]
extract_from_a # extract a sub-array with the mask
extract_from_a[0] = 9999999
a
Out[97]:
In [98]:
a[a % 3 == 0] = -1
a
Out[98]:
Indexing with an array of integers
In [102]:
a = np.arange(2,12)
a
Out[102]:
In [103]:
a[[2, 3, 2, 4, 2]] # note: [2, 3, 2, 4, 2] is a Python list
Out[103]:
In [104]:
a[[9, 7]] = -10
a
Out[104]:
In [ ]:
a = np.arange(10)
idx = np.array([[3, 4], [9, 7]])
a[idx]
Consider the following examples of fancy indexing:
In [ ]:
a = np.arange(36).reshape(6,6)
a
In [ ]:
a[(0,1,2,3,4),(1,2,3,4,5)]
In [ ]:
a[:3,[0,2,5]]
In [ ]:
mask = np.array([1,0,1,0,0,1],dtype=bool)
a[mask,2]
In [ ]:
a = np.arange(12).reshape(3,4)
a
In [ ]:
i = np.array([[0, 1], [1, 2]])
a[i, 2] # same as a[i, 2*np.ones((2, 2), dtype=int)]
Most of numpy and scipy's mathematical operations are vectorized. Applying these operations to arrays means they are applied elementwise. However, similar to MATLAB, internally the elementwise operations are implemented in C and C++. Thus vectorized computations are much faster than if you would simply loop through each element yourself and apply your operation manually.
So alway try think vectorized instead of manually looping through arrays.
Elementwise operations
In [105]:
a = np.array([1, 2, 3, 4])
a + 1
Out[105]:
In [106]:
2**a
Out[106]:
In [107]:
"""Speed up of vectorization"""
def twice_each_element(a):
b=np.zeros(a.shape)
for idx, elem in enumerate(a):
b[idx] = 2*elem
return b
a = np.arange(100)
%timeit twice_each_element(a)
%timeit 2*a
In [109]:
a = np.array([1, 2, 3, 4])
b = np.ones(4) + 1
print a
print b
a - b
Out[109]:
In [112]:
a*b
Out[112]:
In [113]:
j = np.arange(5)
2**(j + 1) - j
Out[113]:
Warning:
In [114]:
c = np.ones((3, 3))
c * c # NOT matrix multiplication!
Out[114]:
In [115]:
c.dot(c)
Out[115]:
In [116]:
a = np.array([1, 2, 3, 4])
b = np.array([4, 2, 2, 4])
a == b
Out[116]:
In [117]:
a > b
Out[117]:
In [118]:
a = np.array([1, 1, 0, 0], dtype=bool)
b = np.array([1, 0, 1, 0], dtype=bool)
np.logical_or(a, b)
Out[118]:
In [ ]:
np.logical_and(a, b)
In [119]:
a = np.arange(4)
a
Out[119]:
In [120]:
a + np.array([1, 2])
In [121]:
a = np.triu(np.ones((3, 3)), 1) # see help(np.triu)
a
Out[121]:
In [122]:
a.T
Out[122]:
Note: Linear algebra
The sub-module np.linalg implements basic linear algebra
However,
Basic reductions
In [123]:
x = np.array([1, 2, 3, 4])
np.sum(x)
Out[123]:
In [ ]:
x.sum()
In [124]:
x = np.array([[1, 1], [2, 2]])
x
Out[124]:
In [125]:
x.sum(axis=0) # columns (first dimension)
Out[125]:
In [126]:
x[:, 0].sum(), x[:, 1].sum()
Out[126]:
In [ ]:
x.sum(axis=1) # rows (second dimension)
In [ ]:
x[0, :].sum(), x[1, :].sum()
In [ ]:
x = np.random.rand(2, 2, 2)
x.sum(axis=2)[0, 1]
In [ ]:
x[0, 1, :].sum()
Other reductions— works the same way (and take axis=)
In [ ]:
x = np.array([1, 2, 3, 1])
y = np.array([[1, 2, 3], [5, 6, 1]])
x.mean()
In [ ]:
np.median(x)
In [ ]:
np.median(y, axis=-1) # last axis
In [ ]:
x.std() # full population standard dev.
In [ ]:
x = np.array([1, 3, 2])
x.min()
In [ ]:
x.max()
In [ ]:
x.argmin() # index of minimum
In [ ]:
x.argmax() # index of maximum
In [ ]:
np.all([True, True, False])
In [ ]:
np.any([True, True, False])
Note: Can be used for array comparisons:
In [ ]:
a = np.zeros((100, 100))
np.any(a != 0)
In [ ]:
np.all(a == a)
In [ ]:
a = np.array([1, 2, 3, 2])
b = np.array([2, 2, 3, 2])
c = np.array([6, 4, 4, 5])
((a <= b) & (b <= c)).all()
Nevertheless
In [134]:
a = np.tile(np.arange(0, 40, 10), (3, 1)).T
a
Out[134]:
In [131]:
b = np.array([0, 1, 2])
b
Out[131]:
In [135]:
a+b
Out[135]:
In [136]:
a = np.arange(0, 40, 10)
a.shape
Out[136]:
In [137]:
a = a[:, np.newaxis] # adds a new axis -> 2D array
a.shape
Out[137]:
In [140]:
b
Out[140]:
In [141]:
a
Out[141]:
In [142]:
a + b
Out[142]:
In [ ]:
a = np.ones((4, 5))
a[0] = 2 # we assign an array of dimension 0 to an array of dimension 1
a
In [ ]:
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6]])
a.ravel()
In [ ]:
a.T.ravel()
In [ ]:
a.shape
In [ ]:
b = a.ravel()
b.reshape((2, 3))
In [ ]:
a = np.arange(36)
b = a.reshape((6, 6))
b
In [ ]:
b = a.reshape((6, -1)) # unspecified (-1) value is inferred
b
In [ ]:
b[0, 0] = 99
a
In [ ]:
a = np.zeros((3, 2))
b = a.T.reshape(3*2)
b[0] = 9
a
In [ ]:
a = np.arange(4*3*2).reshape(4, 3, 2)
a.shape
In [ ]:
a[0, 2, 1]
In [ ]:
b = a.transpose(1, 2, 0)
b.shape
In [ ]:
b[2, 1, 0]
In [ ]:
b[2, 1, 0] = -1
a[0, 2, 1]
In [ ]:
a = np.arange(4)
a.resize((8,))
a
However, it must not be referred to somewhere else:
In [ ]:
b = a
a.resize((4,))
In [ ]:
a = np.array([[4, 3, 5], [1, 2, 1]])
b = np.sort(a, axis=1)
b
Note: Sorts each row separately!
In-place sort:
In [ ]:
a.sort(axis=1)
a
In [ ]:
a = np.array([4, 3, 1, 2])
j = np.argsort(a)
j
In [ ]:
a[j]
In [ ]:
a = np.array([4, 3, 1, 2])
j_max = np.argmax(a)
j_min = np.argmin(a)
j_max, j_min
(See Ch. 3, "Python Scientific lecture notes", Release 2012.3 (EuroScipy 2012)
by the EuroScipy tutorial team
Editors: Valentin Haenel, Emmanuelle Gouillart, Gaël Varoquaux
You may also need to visualize your data arrays.
Matplotlib is a Python plotting library. Its capabilities and interface are quite similar to the ones offered by Matlab. It can be used to display the results of your calculations, create publication quality gures and can be embedded in graphical user interfaces.
Matplotlib is a 2D plotting package.
We can import its functions as below:
In [ ]:
import matplotlib.pyplot as plt # the tidy way
import numpy as np
In [ ]:
x = np.linspace(0, 3, 20)
y = np.linspace(0, 9, 20)
plt.figure()
plt.plot(x, y) # line plot
plt.plot(x, y,'o') # dot plot
plt.show() # <-- shows the plot (not needed with Ipython)
In [ ]:
# Add the following line to make plots *appear* within the iypthon notebook
%matplotlib inline
In [ ]:
x = np.linspace(0, 3, 20)
y = np.linspace(0, 9, 20)
plt.figure()
plt.plot(x, y) # line plot
plt.plot(x, y,'o') # dot plot
plt.xlabel('input')
plt.ylabel('output')
plt.title('Input-Output-Relationship')
(such as images)
In [ ]:
image = np.random.rand(30, 30)
plt.imshow(image, cmap=plt.cm.gray)
plt.colorbar()
In [ ]:
x = np.linspace(0, 3, 20)
y = np.linspace(0, 9, 20)
image = np.random.rand(30, 30)
plt.figure()
plt.subplot(1,2,1)
plt.plot(x, y) # line plot
plt.plot(x, y,'o') # dot plot
plt.subplot(1,2,2)
plt.imshow(image, cmap=plt.cm.gray)
plt.colorbar()
In [ ]:
x, y = np.arange(5), np.arange(5)
distance = np.sqrt(x ** 2 + y[:, np.newaxis] ** 2)
distance
In [ ]:
plt.pcolor(distance)
plt.colorbar()
plt.axis('equal')
In [ ]:
Mileposts = np.random.random_integers(1,50,10)
Mileposts[0] = 0
Mileposts
In [ ]:
DistAlongRoad = np.cumsum(Mileposts)
DistAlongRoad
In [ ]:
PtPtDist = np.abs(DistAlongRoad - DistAlongRoad[:, np.newaxis])
PtPtDist
In [ ]:
plt.pcolor(PtPtDist)
plt.colorbar()
plt.axis('equal')
In [ ]:
In [ ]: