Scientific Python: Part 1

Numpy and Scipy

Software Carpentry Bootcamp
eResearch NZ 2013

Prepared by: Ariel Rokem

Thanks to: Matt Davis, Justin Kitzes, Katy Huff, Matthew Terry

Introducing Numpy

NumPy is a Python package implementing efficient collections of specific types of data (generally numerical), similar to the standard array module (but with many more features). NumPy arrays differ from lists and tuples in that the data is contiguous in memory. A Python list, [0, 1, 2], in contrast, is actually an array of pointers to Python objects representing each number. This allows NumPy arrays to be considerably faster for numerical operations than Python lists/tuples.



In [1]:

    
# Many useful functions are in external packages
# Let's meet numpy
import numpy as np



In [2]:

    
# To see what's in a package, type the name, a period, then hit tab
#np?
#np.



In [3]:

    
# Some examples of numpy functions and "things":
print(np.sqrt(4))
print(np.pi)  # Not a function, just a variable
print(np.sin(np.pi)) # A function on a variable :)









    



2.0
3.14159265359
1.22464679915e-16

Numpy arrays (ndarrays)

Creating a NumPy array is as simple as passing a sequence to numpy.array:

Numpy arrays are collections of things, all of which must be the same type, that work similarly to lists (as we've described them so far). The most important are:

You can easily perform elementwise operations (and matrix algebra) on arrays
Arrays can be n-dimensional
Arrays must be pre-allocated (ie, there is no equivalent to append)

Arrays can be created from existing collections such as lists, or instantiated "from scratch" in a few useful ways.



In [4]:

    
arr1 = np.array([1, 2.3, 4])   
print(type(arr1))
print(arr1.dtype)









    



<type 'numpy.ndarray'>
float64

You can also explicitly specify the data-type if the automatically-chosen one would be unsuitable.



In [5]:

    
arr2 = np.array([1, 2.3, 4], dtype=int)   
print(type(arr2))
print(arr2.dtype)









    



<type 'numpy.ndarray'>
int64

As you might expect, creating a NumPy array this way can be slow, since it must manually convert each element of a list into its equivalent C type (int objects become C ints, etc). There are many other ways to create NumPy arrays, such as np.identity, np.zeros, np.zeros_like, or by manually specifying the dimensions and type of the array with the low-level creation function:



In [6]:

    
arr3 = np.ndarray((2, 3, 4), dtype=complex)  # Notice : `ndarray`, not `array`!
print(type(arr3))









    



<type 'numpy.ndarray'>

Arrays have a .shape attribute, which stores the dimensions of the array as a tuple:



In [7]:

    
print(arr3.shape)









    



(2, 3, 4)

For many of the examples below, we will be using np.arange which, similar to the Python built-in function range, returns a NumPy array of integers from 0 to N-1, inclusive. Like range, you can also specify a starting value and a step:



In [8]:

    
arr4 = np.arange(2, 5)
print(arr4)
arr5 = np.arange(1, 5, 2)
print(arr5)
arr6 = np.arange(1, 10, 2)
print arr6









    



[2 3 4]
[1 3]
[1 3 5 7 9]

Exercise : Create an Array

Create an array with values ranging from 0 to 10, in increments of 0.5.

Reminder: get help by typing np.arange?, np.ndarray?, np.array?, etc.

Arithmetic with arrays

Since numpy exists to perform efficient numerical operations in Python, arrays have all the usual arithmetic operations available to them. These operations are performed element-wise (i.e. the same operation is performed independently on each element of the array).



In [9]:

    
A = np.arange(5)
B = np.arange(5, 10)

print (A+B)

print(B-A)

print(A*B)









    



[ 5  7  9 11 13]
[5 5 5 5 5]
[ 0  6 14 24 36]

What would happen if A and B did not have the same `shape`?

Arithmetic with scalars:

In addition, if one of the arguments is a scalar, that value will be applied to all the elements of the array.



In [10]:

    
A = np.arange(5)
print(A+10)
print(2*A)
print(A**2)









    



[10 11 12 13 14]
[0 2 4 6 8]
[ 0  1  4  9 16]

Linear algebra with arrays

You can use arrays as vectors and matrices in linear algebra operations

Specifically, you can perform matrix/vector multiplication between arrays, by using the .dot method, or the np.dot function:



In [11]:

    
print A.dot(B)
print np.dot(A, B)

Note: This is like the '``' operator in Matlab*

Exercise :

Given two vectors in 3-space:

a = [1,2]

b = [4,5]

Find the angle between these vectors.
The rotation matrix of $\theta$ angle is a matrix with elements:

[[cos($\theta$), -sin($\theta$],

[sin($\theta$, cos($\theta$)]]

Create a function that takes two 3-vectors and creates the rotation matrix between them.
For the vector c=[7,8], find a vector d that has the same rotation as b has relative to a

Boolean operators work on arrays too, and they return boolean arrays

Much like the basic arithmetic operations we discussed above, comparison operations are perfomed element-wise. That is, rather than returning a single boolean, comparison operators compare each element in both arrays pairwise, and return an array of booleans (if the sizes of the input arrays are incompatible, the comparison will simply return False). For example:



In [12]:

    
arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([1, 1, 3, 3, 5])
print(arr1 == arr2)
c = (arr1 == arr2)
print c.dtype









    



[ True False  True False  True]
bool

Note: You can use the methods .any() and .all() or the functions np.any and np.all to return a single boolean indicating whether any or all values in the array are True, respectively.



In [13]:

    
print(np.all(c))
print(c.all())
print(c.any())









    



False
False
True

Indexing arrays

In addition to the usual methods of indexing lists with an integer (or with a series of colon-separated integers for a slice), numpy allows you to index arrays in a wide variety of different ways for more advanced operations.

First, the simple way:



In [14]:

    
a = np.array([1,2,3])
print a[0:2]

What happens if the array has more than one dimension?



In [15]:

    
c = np.random.rand(3,3)
print(c)
print(c[1:3,0:2])
c[0,:] = a
print(c)









    



[[ 0.67732547  0.92040098  0.78127807]
 [ 0.17040251  0.45105409  0.53734387]
 [ 0.01613388  0.86307387  0.01468406]]
[[ 0.17040251  0.45105409]
 [ 0.01613388  0.86307387]]
[[ 1.          2.          3.        ]
 [ 0.17040251  0.45105409  0.53734387]
 [ 0.01613388  0.86307387  0.01468406]]

Exercise

We can manipulate the shape of an array as follows: A = np.arange(16).reshape(4, 4) Or even: A = np.reshape(numpy.arange(16), (4, 4))

Using what we've learned about slicing and indexing, create a function that takes an integer as input, creates a square n-by-n array with integers from 0 to n**2-1 (like A) and prints just the upper-left quarter of the array

For example, for A, the desired output would be:

 array([[2, 3],
        [6, 7]])

Arrays can also be indexed with other arrays

Arrays can be indexed with other arrays, using either an array of indices, or an array of booleans of the same length. In the former case, numpy returns a view of the data in the specified indices as a new array. In the latter, numpy returns a view of the array with only the elements where the index array is True. (We'll discuss the difference between views and copies in a moment.) This makes normally-tedious operations like clamping extremely simple.

Indexing with an array of indices:



In [16]:

    
A = np.arange(5, 10)
print(A)
print(A[[0, 2, 3]])
A[[0, 2, 3]] = 0
print(A)









    



[5 6 7 8 9]
[5 7 8]
[0 6 0 0 9]

Indexing with a boolean array:



In [17]:

    
random = np.random
A = np.array([random.randint(0, 10) for i in range(10)]) # Check out the list comprehension!
print(A)
A[A>5] = 5
print(A)









    



[0 2 9 1 9 5 6 1 3 1]
[0 2 5 1 5 5 5 1 3 1]

A few more examples:



In [18]:

    
b = np.array([4,5,6])
print (a)
print (b)
print (a > 2)
print (a[a > 2])
print (b[a > 2])
b[a == 3] = 77
print(b)









    



[1 2 3]
[4 5 6]
[False False  True]
[3]
[6]
[ 4  5 77]



In [19]:

    
# There are handy ways to make arrays full of ones and zeros
print(np.zeros(5))
print np.ones(5)
print np.identity(5), '\n'









    



[ 0.  0.  0.  0.  0.]
[ 1.  1.  1.  1.  1.]
[[ 1.  0.  0.  0.  0.]
 [ 0.  1.  0.  0.  0.]
 [ 0.  0.  1.  0.  0.]
 [ 0.  0.  0.  1.  0.]
 [ 0.  0.  0.  0.  1.]]

Numpy 'gotchas'

Multiplication and Addition

As you may have noticed above, since NumPy arrays are modeled more closely after vectors and matrices, multiplying by a scalar will multiply each element of the array, whereas multiplying a list by a scalar will repeat that list N times.



In [20]:

    
A = np.arange(5)*2
print(A) 
B = range(5)*2
print(B)









    



[0 2 4 6 8]
[0, 1, 2, 3, 4, 0, 1, 2, 3, 4]

Similarly, when adding two numpy arrays together, we get the vector sum back, whereas when adding two lists together, we get the concatenation back.



In [21]:

    
A = np.arange(5) + np.arange(5)
print(A)
B =range(5) + range(5)
print(B)









    



[0 2 4 6 8]
[0, 1, 2, 3, 4, 0, 1, 2, 3, 4]

Views vs. Copies

In order to be as efficient as possible, numpy uses "views" instead of copies wherever possible. That is, numpy arrays derived from another base array generally refer to the ''exact same data'' as the base array. The consequence of this is that modification of these derived arrays will also modify the base array. The result of an array indexed by an array of indices is a ''copy'', but an array indexed by an array of booleans is a ''view''.

Specifically, slices of arrays are always views, unlike slices of lists or tuples, which are always copies.



In [22]:

    
A = np.arange(5)
B = A[0:1]
B[0] = 42
print(A)

A = range(5)
B = A[0:1]
B[0] = 42
print(A)









    



[42  1  2  3  4]
[0, 1, 2, 3, 4]

Exercise : Copy a numpy array

Figure out how to create a copy of a numpy array. Remember: since numpy slices are views, you can't use the trick you'd use for Python lists, i.e. copy = list[:].

Mathematics with numpy

Being designed for scientific computing, numpy also contains a host of common mathematical functions, including linear algebra functions, fast Fourier transforms, and probability/statistics functions. While there isn't space to go over ''all'' of these in detail, we will provide an overview of the most common/essential of these.

For >2-dimensional arrays, there are some other common matrix operations that can be conducted:



In [23]:

    
A = np.arange(16).reshape(4, 4)
print(A)
print(A.T) # transpose
print(A.trace())









    



[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]
[[ 0  4  8 12]
 [ 1  5  9 13]
 [ 2  6 10 14]
 [ 3  7 11 15]]
30

There are many more methods like these available with NumPy arrays. Be sure to consult the numpy documentation before writing your own versions!

The `matrix` class

So far, we've used two-dimensional arrays to represent matrix-like objects. However, numpy provides a specialized class for this. The matrix class is almost identical to a two-dimensional numpy array, but has a few changes to the interface to simplify common linear algebraic tasks. These are: * The * operator is performs matrix multiplication * The ** operator performs matrix exponentiation * The property .I (or the method .getI()) returns the matrix inverse * The property .H (or the method .getH()) returns the conjugate transpose

Example: Solving a System of Linear Equations



In [24]:

    
la = np.linalg
A = np.matrix([[3, 2, -1], [2, -2, 4], [-1, .5, -1]])
B = np.array([1, -2, 0])
print(la.solve(A, B))









    



[ 1. -2. -2.]

Universal functions

Universal functions (also called ufuncs) are high-speed, element-wise operations on NumPy arrays. They are, in essence, what allows you to operate on NumPy arrays efficiently. There are a large number of universal functions available covering most of the basic operations that get performed on data, like addition, subtraction, logarithms, and so on. Calling a ufunc is a simple matter:



In [25]:

    
A = np.arange(1,10)
print(np.log10(A))









    



[ 0.          0.30103     0.47712125  0.60205999  0.69897     0.77815125
  0.84509804  0.90308999  0.95424251]



In [25]:

In addition to basic operation like above, ufuncs that take two input arrays and return an output array can be used in more advanced ways.

Exercise : Elementwise Operations

Using ufuncs, calculate the reciprocals of each element in the following array:

[8.1, 1.6, 0.9, 4.3, 7.0, 7.3, 4.7, 8.2, 7.2, 3.0, 1.4, 9.8, 5.7, 0.7, 8.7, 4.6, 8.8, 0.9, 4.4, 4.4]

Scipy

Scipy is a huge library of scientific software. It includes many useful things, such as functions for statistics, for signal processing, image processing, etc.

Because it's very large, importing it doesn't actually bring in all of its modules and those need to be imported individually.



In [26]:

    
import scipy
# scipy?

For example, if you want to look at physical constants, you need to specifically import the constants module.



In [27]:

    
import scipy.constants as const



In [28]:

    
print(const.c)









    



299792458.0



In [29]:

    
print(const.find("alpha"))









    



['alpha particle mass', 'alpha particle mass energy equivalent', 'alpha particle mass energy equivalent in MeV', 'alpha particle mass in u', 'alpha particle molar mass', 'alpha particle-electron mass ratio', 'alpha particle-proton mass ratio', 'electron to alpha particle mass ratio']



In [30]:

    
print(const.physical_constants['alpha particle mass'])









    



(6.64465675e-27, 'kg', 2.9e-34)

There are a lot of gems hidden in the special and misc modules



In [31]:

    
import scipy.special as sps

# We'll talk about matplotlib later today:
import matplotlib.pylab as pylab
x = np.arange(0.0, 10.1, 0.1)

for n in range(4): 
    j = sps.jn(n, x) 
    pylab.plot(x, j, label='Bessel order=%s'%n)

pylab.legend()
pylab.show()



In [32]:

    
import scipy.misc as misc



In [33]:

    
import matplotlib.pylab as pylab
pylab.matshow(misc.lena())
pylab.show()

As a short-cut, you can import many useful things using the following command:



In [34]:

    
%pylab inline









    



Welcome to pylab, a matplotlib-based Python environment [backend: module://IPython.kernel.zmq.pylab.backend_inline].
For more information, type 'help(pylab)'.

This pulls into your namespace numpy (as np), as well as everything that is in numpy:



In [35]:

    
array([1,2,3]) # Notice - I am calling this 'bare' without the preceding 'np'!









    Out[35]:





array([1, 2, 3])

Also - plots will from now on appear 'inline', as part of the cell output



In [36]:

    
plot(array([1,2,3]))









    Out[36]:





[<matplotlib.lines.Line2D at 0x10cbf0b90>]



In [ ]: