NumPy (Numerical Python) is a library for scientific computing. The library contains many mathematical functions for performing operations of linear algebra, Fourier transforms, pseudo-random numbers generators, etc. NumPy can use BLAS and LAPACK (if you have them installed on your system). BLAS and LAPACK are libraries written in Fortran that provides linear algebra routines. NumPy was born with the idea of bringing the functionality of environments like Matlab and Mathematica to Python.
Overall, the code written in NumPy can be shorter and (arguably) clearer than pure Python. The use of loops is reduced because many operations are applied directly on arrays. In addition, there are many mathematical/scientific functions already defined, so it is posible to write less code and more important, those functions are implemted using very efficient and reliable algorithms.
In [1]:
import numpy as np# the standard way to import numpy
NumPy uses a data structure called array. NumPy arrays are similar to Python lists but are more efficient for numerical tasks. This efficiency come from the following facts:
Furthermore arrays are similar to mathematical vectors and matrices.
There are several routines to create NumPy arrays:
To create arrays from Python list (or tuples) we ca use the array function.
In [2]:
v = np.array([1, 2, 3, 4 , 5, 6])
v
Out[2]:
In [3]:
M = np.array([[1, 2], [3, 4], [5, 6]])
M
Out[3]:
Objects v and M are both of the same type; ndarray as you can easily check.
In [4]:
type(v), type(M)
Out[4]:
The difference between array v and M us that they have different shapes. We can get the shape of an array using the method shape.
In [6]:
v.shape # equivalent to np.shape(v)
Out[6]:
In [ ]:
np.shape
In [6]:
M.shape # equivalent to np.shape(M)
Out[6]:
size tell us the number of elements of an array.
In [7]:
v.size, M.size # or np.size(M)
Out[7]:
In the same way the number of dimensions of and array are obtained using ndim
In [7]:
v.ndim, M.ndim
Out[7]:
Using the method dtype, we can ask for the type of the elements stored in an array.
In [8]:
M.dtype
Out[8]:
Using Python list we can mix different types of data in a same list. Instead, with arrays we get an error.
In [9]:
v[0] = "hello world"
Or we could get something we didn't expect
In [12]:
v[0] = 3.14
v
Out[12]:
It is possible to explicitelly define the type of the data when creating an array.
In [13]:
np.array([[1, 2], [3, 4]], dtype=complex)
Out[13]:
Some common types are: int, float, complex, bool
It is also possible to define the number of bits for each type: int64, int16, float64, complex64.
In [16]:
np.arange(0, 10, 1) # from, to, step (from and step are optional)
Out[16]:
Unlike the python range function arange support floats
In [15]:
np.arange(-1, 1, 0.25)
Out[15]:
linspace returns evenly spaced numbers, calculated over the interval [from, to]. Works similar to arange but instead of specifying the step we specify the total number of elements we want.
In [16]:
np.linspace(1, 10, 25) # from, to, elements (elements is optional)
Out[16]:
logspace return numbers spaced evenly on a logaritmic scale.
In [17]:
np.logspace(0, 1.5, 5, base=np.e)
Out[17]:
Random numbers are widely used in scientific applications. In practice computers generate pseudo random numbers, i.e. numbers that, for most intents and purposes, looks like random numbers.
All the routines to generate random number live inside the random module. Python uses the Mersenne Twister pseudo-random generator.
The most basic function is rand. This function creates an array from a uniform distribution over the range [0, 1).
In [19]:
np.random.rand(2, 5) # array with shape (2, 5)
Out[19]:
randn returns samples from the standard normal distribution.
In [23]:
np.random.randn(3)
Out[23]:
While in a lot of applications we need random numbers, randomness could be problematic, for example during debugging. We can solve this by seeding the pseudo-random generator. In this way each time we ask for a random number we will get the same result.
In [20]:
np.random.seed(31415)
np.random.randn(3)
Out[20]:
In [24]:
np.zeros((2,5))
Out[24]:
In [25]:
np.ones((2,5))
Out[25]:
In [26]:
np.empty((2,5))
Out[26]:
In [29]:
np.full((2,5), 42)
Out[29]:
In [30]:
!head sample.dat # Shows the first lines of a file
In [31]:
data = np.genfromtxt('sample.dat', skip_header=True)
data
Out[31]:
In [32]:
data.shape
Out[32]:
Using numpy.savetxt we can save a NumPy array to a csv file.
In [33]:
M = np.random.rand(3,3)
M
Out[33]:
In [36]:
np.savetxt("random_matrix.csv", M, fmt='%.2f')
In [37]:
!head random_matrix.csv
Alternatively, is also possible to save a NumPy array using the npy format. It is useful for storing arrays of data that will then be read with NumPy.
In [38]:
np.save("random_matrix.npy", M)
!ls *.npy
To load the saved file we use the load function, as you may have guessed.
In [39]:
np.load("random_matrix.npy")
Out[39]:
In [40]:
v = np.array([1, 2, 3, 4, 5, 6])
M = np.array([[1, 2], [3, 4], [5, 6]])
print v
print M
v is a 1D array (like a vector) and hence each element is indexed using just one number.
In [41]:
v[0]
Out[41]:
M is a 2D array (like a matrix) and hence each element is indexed using two numbers.
In [43]:
M[1,1] # this is equivalent to M[1][1]
Out[43]:
If we omit an index when indexing a multidimensional array, NumPy delivers the entire row (or, in general, the array of dimension N-1)
In [44]:
M[1]
Out[44]:
The same results is obtained using :
In [45]:
M[1,:] # row 1
Out[45]:
In [46]:
M[:,1] # column 1
Out[46]:
We can assign element to an array specifing the desired index
In [47]:
M[0,0] = -1
M
Out[47]:
The same idea works for entire rows and columns
In [48]:
M[1,:] = 0
M
Out[48]:
In [49]:
M[:,1] = -2
M
Out[49]:
In [50]:
print v
v[1:3]
Out[50]:
Slices can also be used in assigning new values.
In [51]:
v[1:3] = -2,-3
v
Out[51]:
Is possible to slice an array using steps
In [52]:
v[::2] # from and to takes defaults values (the entire array)
Out[52]:
or take the first N elements
In [53]:
v[:3]
Out[53]:
or start from the $i^{th}$ element
In [54]:
v[3:]
Out[54]:
or just take the last element
In [55]:
v[-1]
Out[55]:
or the last N elements
In [56]:
v[-3:] # the las three elements
Out[56]:
As we have already seen arrays are not restricted to 1 or 2 dimensions. Arrays can have many dimensions. As size increases the complexity of the slices also increases.
In [57]:
c = np.array([[[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11]],
[[12, 13, 14, 15], [16, 17, 18, 19], [20, 21, 22, 23]]])
c
Out[57]:
In [58]:
np.shape(c)
Out[58]:
In [59]:
c[0]
Out[59]:
In [60]:
c[0, 2]
Out[60]:
Let extract the zeroth elements from the second dimension (axis=1)
In [53]:
c[:,0] # equivalent to c[:,0,:]
Out[53]:
Let extract the first elements from the third dimension (axis=2)
In [54]:
c[:,:,1]
Out[54]:
In [55]:
c[0,:,1]
Out[55]:
Any subset of element can be obtained from a NumPy array using, for example.
In [56]:
c[0,1,::2]
Out[56]:
Reverse the order
In [57]:
c[::-1]
Out[57]:
Another example of a complex slicing.
In [58]:
c[0,::-1, -1]
Out[58]:
In [61]:
c = np.arange(24).reshape(2,3,4)
c
Out[61]:
May be we need to delete part of our array
In [60]:
np.delete(c, (0,2), axis=2)
Out[60]:
If we need to reduce our n-dimensional array to a 1D array we could use flatten or ravel.
In [61]:
c.flatten() # flatten is method not a function
Out[61]:
In [62]:
c.ravel() # equivalent to np.ravel(c)
Out[62]:
In [62]:
d = np.arange(16).reshape(4, 4)
e = d * 2
print d
print e
In [63]:
np.concatenate((d, e), axis=0) # equivalent to np.vstack((d, e))
Out[63]:
In [65]:
np.concatenate((d, e), axis=1) # equivalent to np.hstack((d, e))
Out[65]:
In [64]:
np.split(d, 2, axis=0)
Out[64]:
In [65]:
np.split(d, 2, axis=1)
Out[65]:
NumPy provides several mathematical functions. This could seems redundand since the Python's Standard Library already provides this type of functions. But the interesting thing is that NumPy's matemathical functions (as well as other functions) work on the whole array in an element-wise fashion.
For example if we would like to compute the square root of all element in a Python list we should build a for-loop and compute the square root for each element, one at a time. With NumPy we can just doing in one line.
First lets define a new array
In [66]:
f = np.arange(4).reshape(2, 2)
f
Out[66]:
and now compute the square root
In [67]:
np.sqrt(f)
Out[67]:
Functions like sqrt that operates on arrays in an element-by-element fashion are call universal functions (or ufunc for short).
One of the advantages of using ufuncs is that allows for shorter code. Another advantage is that computations are faster than when using a Python loop. Under the hood NumPy is still performing a loop, but the loop is done in a compiled language (C or Fortran) and even further the compiled code has been generally optimized to take advantage of vectorization routines.
Another example of a ufunc is sum
In [68]:
np.sum(f)
Out[68]:
Sometimes we don't want to sum over the flattened array but over one of it's axis.
In [69]:
np.sum(f, axis=0)
Out[69]:
In [70]:
np.sum(f, axis=1)
Out[70]:
The speed-gain of using ufuncs depends on several factors, typical gains could be around 1 to 3 order of magnitudes (unless you apply ufuncs to scalar instead of arrays!).
In [71]:
from math import sqrt
g_arr = np.arange(0, 1000)
g_list = range(0, 1000)
In [72]:
%%timeit
for i in g_list:
sqrt(i)
In [73]:
%%timeit
np.sqrt(g_arr)
In [74]:
%%timeit
sqrt(1)
In [75]:
%%timeit
np.sqrt(1)
In [76]:
h = np.arange(10)**2
h
Out[76]:
In [77]:
idx = np.array([0,2,3,3])
h[idx]
Out[77]:
In [78]:
I = np.arange(9).reshape(3, 3)
I
Out[78]:
In [79]:
idx_row = np.array([0,2])
idx_col = np.array([1])
I[idx_row, idx_col]
Out[79]:
We can also use and array of boolean for indexing, for example if we want to get all the values that are even, we can do:
In [80]:
I[I % 2 == 0]
Out[80]:
And as you can check, the expression inside de brackets is a array of booleans
In [83]:
I % 2 == 0
Out[83]:
Some constants like $\pi$ and $e$ can be accessed from NumPy. If you need other constants you could probably find them at scipy.constants
In [84]:
np.pi
Out[84]:
In [85]:
np.e
Out[85]:
NumPy is a great library for numerical computing and we have just barely scratch the surface of what is possible to do with it. For scientific computing sometimes you will need access to numerical routines for interpolation, integration, optimization, statistics, audio and image processing, etc. Chances are big that this routines are already implemented in SciPy, a scientific computing library built on top of NumPy. As with NumPy in SciPy fast and reliable routines are implemented and is always a good idea to search if the routine you need is available in SciPy (do not waste time reinventing the wheel, unless you need a very special wheel!). SciPy is also the name of a group of related conferences for users and developers of scientific computing in python. In the following chapters we will use a very few of the whole stack of SciPy functions.
If you problem is less of the numerical-crunching-physics-style of computing and more related to statistics and data analysis probably you will be better using Pandas:. Pandas is another library build on top of NumPy that provides easy-to-use data structures and data analysis tools. If the main part of your work involves manipulating and reshaping data, dealing with missing values, and then performing statistical analysis and data visualization Pandas is the library for you.