Introduction to NumPy

NumPy is the fundamental Python library for numerical computing: it defines a number of essential data structures and routines (among other things). Many of the semantics for manipulating its most basic data structure, the ndarray, are identical to those for manipulating lists, with a few key exceptions. We will cover those exceptions and some of the other important points to keep in mind when working with NumPy.

Topics:

  • The ndarray
  • Mathematical functions
  • Array manipulations
  • Common array functions

ndarray

The ndarray is the most basic data structure in NumPy. As the name suggests, the ndarray is an array that can have as many dimensions as you specify. For MATLAB users this should be familiar, although note that the ndarray does not behave exactly as you might expect the same object to in MATLAB. Here are some example usages:


In [2]:
import numpy

Define a 2x2 array. Note that unlike MATLAB we need commas everywhere:


In [3]:
my_array = numpy.array([[1, 2], [3, 4]])
print(my_array)


[[1 2]
 [3 4]]

Get the (0, 1) component of the array:


In [4]:
my_array[0, 1]


Out[4]:
2

Fetch the first column of the matrix:


In [5]:
my_array[:,0]


Out[5]:
array([1, 3])
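
The indexing above looks just like list indexing, so it may help to see two of the "key exceptions" mentioned in the introduction side by side. This is only a small sketch: the names a_list, an_array and the slice variables are made up for illustration, and these two differences are not the complete list.

```python
import numpy

a_list = [1, 2, 3]
an_array = numpy.array([1, 2, 3])

# "+" concatenates lists but adds arrays element by element.
list_sum = a_list + a_list        # [1, 2, 3, 1, 2, 3]
array_sum = an_array + an_array   # array([2, 4, 6])

# Slicing a list makes a copy; slicing an ndarray returns a view that
# shares memory with the original array.
list_slice = a_list[:2]
list_slice[0] = 99                # a_list is unchanged
array_slice = an_array[:2]
array_slice[0] = 99               # an_array[0] is now 99 as well

print(a_list)
print(an_array)
```

If an independent piece of an array is needed, calling .copy() on the slice avoids the shared memory.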

Define a column vector:


In [6]:
my_vec = numpy.array([[1], [2]])

Multiply my_array by the vector my_vec in the usual linear algebra sense (equivalent to MATLAB's *):


In [7]:
numpy.dot(my_array, my_vec)


Out[7]:
array([[ 5],
       [11]])

Multiply my_array and my_vec element by element, "broadcasting" my_vec across the matching dimensions (equivalent to MATLAB's .* form):


In [8]:
my_array * my_vec


Out[8]:
array([[1, 2],
       [6, 8]])
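
To see what broadcasting did in the last cell, the sketch below stretches my_vec to the shape of my_array by hand and compares the two kinds of product. It assumes Python 3.5 or newer for the @ operator; the names stretched, elementwise and matrix_product are chosen here for illustration only.

```python
import numpy

my_array = numpy.array([[1, 2], [3, 4]])   # shape (2, 2)
my_vec = numpy.array([[1], [2]])           # shape (2, 1)

# Broadcasting conceptually repeats the single column of my_vec across
# both columns so that its shape matches my_array.
stretched = numpy.broadcast_to(my_vec, my_array.shape)

elementwise = my_array * my_vec            # same as my_array * stretched
matrix_product = my_array @ my_vec         # same as numpy.dot(my_array, my_vec)

print(stretched)
print(elementwise)
print(matrix_product)
```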

Common Array Constructors

Along with the most common constructor for ndarrays above (array), there are a number of other ways to create arrays with particular values inserted in them. Here are a few that can be useful.

The linspace command (similar to MATLAB's linspace command) takes three arguments: the first two define a range of values and the third sets how many evenly spaced points to generate across that range, endpoints included. This is great if you want to evaluate a function at evenly spaced points between two numbers.


In [9]:
numpy.linspace(-1, 1, 10)


Out[9]:
array([-1.        , -0.77777778, -0.55555556, -0.33333333, -0.11111111,
        0.11111111,  0.33333333,  0.55555556,  0.77777778,  1.        ])
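
Because both endpoints are included, the spacing in the output above is 2/9 rather than 2/10. The sketch below checks this using the optional retstep and endpoint keyword arguments of linspace; the variable names are illustrative.

```python
import numpy

# retstep=True also returns the spacing between neighboring points.
points, step = numpy.linspace(-1, 1, 10, retstep=True)

# Both endpoints are included, so 10 points leave 9 intervals.
expected_step = (1.0 - (-1.0)) / (10 - 1)

# endpoint=False leaves out the right endpoint, giving 10 intervals.
half_open = numpy.linspace(-1, 1, 10, endpoint=False)

print(step)
print(half_open)
```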

Another useful pair of functions is zeros and ones, which create arrays filled with zeros and ones respectively (again equivalent to the functions in MATLAB).


In [10]:
numpy.zeros([3, 3])


Out[10]:
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

In [11]:
numpy.ones([1, 3, 2])


Out[11]:
array([[[ 1.,  1.],
        [ 1.,  1.],
        [ 1.,  1.]]])
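
As a quick check on what these constructors return, the following sketch inspects the shape and element type of the arrays above. The dtype keyword is part of zeros and ones; the variable names are made up for this example.

```python
import numpy

zero_matrix = numpy.zeros([3, 3])
ones_block = numpy.ones([1, 3, 2])

# The list passed in becomes the shape; entries are floats by default.
print(zero_matrix.shape, zero_matrix.dtype)
print(ones_block.shape, ones_block.dtype)

# dtype requests another element type, here integers.
counts = numpy.zeros(4, dtype=int)

# A constant other than 0 or 1 can be obtained by scaling ones.
fives = 5.0 * numpy.ones([2, 2])
print(fives)
```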

Note that NumPy arrays can be reshaped and expanded after they are created, but this can be computationally expensive and the consequences can be difficult to fully understand (reshape in particular can be difficult). One way to avoid these issues is to create an empty array of the right size and store the calculated values as you find them. The array constructor to do this is called empty:


In [12]:
numpy.empty([2,3])


Out[12]:
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

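Note that here the IPython notebook is displaying zeros (or something close to them). empty does not initialize the array, so it holds whatever values happened to be in that memory; these are usually not exactly zero and should not be relied upon until you have stored your own values. How many digits are shown when an array is displayed can be adjusted through NumPy's print settings.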
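
The pattern described above, allocating once with empty and then storing values as they are computed, might look like the following minimal sketch. The computation (squaring a few points) and the names num_points and squares are placeholders chosen for illustration.

```python
import numpy

num_points = 5
x = numpy.linspace(0.0, 1.0, num_points)

# Allocate the result once; its initial contents are arbitrary.
squares = numpy.empty(num_points)

# Store each value as it is computed, overwriting every entry.
for i in range(num_points):
    squares[i] = x[i] ** 2

print(squares)
```

The important point is that every entry is written before it is read, since empty makes no promise about the initial values.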

Array Manipulations

Sometimes, despite our best efforts, we will need to manipulate the size or shape of our already created arrays.

  • Note that these functions can be complex to use and can be computationally expensive, so use them sparingly!
  • That being said, they can still be a great way to avoid using too much memory and may still be faster than creating multiple arrays.
  • Check out the NumPy docs for more functions beyond these basic ones.

One of the important aspects of an array is its shape.


In [1]:
A = numpy.array([[1, 2, 3], [4, 5, 6]])
print("A Shape = ", A.shape)
print(A)


A Shape =  (2, 3)
[[1 2 3]
 [4 5 6]]

We can reshape an array.


In [26]:
B = A.reshape((3, 2))
print("A Shape = ", A.shape)
print("B Shape = ", B.shape)
print(B)


A Shape =  (2, 3)
B Shape =  (3, 2)
[[1 2]
 [3 4]
 [5 6]]
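
One of the consequences of reshape that can be surprising is that the result usually shares its data with the original array. The sketch below illustrates this for the array A from above; the names flat, shares and C are introduced here for illustration.

```python
import numpy

A = numpy.array([[1, 2, 3], [4, 5, 6]])

# One dimension may be given as -1 and NumPy infers it from the size.
B = A.reshape((3, -1))          # shape (3, 2)
flat = A.reshape(-1)            # shape (6,)

# For a contiguous array like A, reshape returns a view, so B and A
# share the same underlying data.
shares = numpy.shares_memory(A, B)
B[0, 0] = 100                   # also changes A[0, 0]

# An explicit copy is independent of A.
C = A.reshape((3, 2)).copy()
C[0, 0] = -1                    # A is unaffected

print(shares)
print(A)
```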

Take the matrix A and make a larger matrix by tiling the old one the number of times specified.


In [32]:
numpy.tile(A, (2,2))


Out[32]:
array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6],
       [1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])

Transpose!


In [34]:
A.transpose()


Out[34]:
array([[1, 4],
       [2, 5],
       [3, 6]])
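
Note that the transpose of A and the reshaped array B from earlier have the same shape but different contents. A short sketch comparing the two, using the .T attribute as shorthand for transpose() (variable names are illustrative):

```python
import numpy

A = numpy.array([[1, 2, 3], [4, 5, 6]])

# A.T is shorthand for A.transpose().
transposed = A.T

# reshape keeps the elements in their row-by-row order, while
# transpose swaps the row and column indices.
reshaped = A.reshape((3, 2))

print(transposed)
print(reshaped)
print(numpy.array_equal(transposed, reshaped))   # False
```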

Mathematical Functions

Similar to the built-in Python module math, NumPy provides a number of common math functions such as sqrt, sin, cos, and tan, along with a number of useful constants, the most important of which is $\pi$. The benefit of using NumPy's versions is that they can be applied to entire arrays at once.


In [11]:
x = numpy.linspace(-2.0 * numpy.pi, 2.0 * numpy.pi, 62)
y = numpy.sin(x)
print(x)


[-6.28318531 -6.07717923 -5.87117316 -5.66516708 -5.459161   -5.25315493
 -5.04714885 -4.84114278 -4.6351367  -4.42913063 -4.22312455 -4.01711848
 -3.8111124  -3.60510632 -3.39910025 -3.19309417 -2.9870881  -2.78108202
 -2.57507595 -2.36906987 -2.16306379 -1.95705772 -1.75105164 -1.54504557
 -1.33903949 -1.13303342 -0.92702734 -0.72102126 -0.51501519 -0.30900911
 -0.10300304  0.10300304  0.30900911  0.51501519  0.72102126  0.92702734
  1.13303342  1.33903949  1.54504557  1.75105164  1.95705772  2.16306379
  2.36906987  2.57507595  2.78108202  2.9870881   3.19309417  3.39910025
  3.60510632  3.8111124   4.01711848  4.22312455  4.42913063  4.6351367
  4.84114278  5.04714885  5.25315493  5.459161    5.66516708  5.87117316
  6.07717923  6.28318531]

This is often useful for plotting functions easily or setting up a problem (we will cover plotting later).
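
To illustrate the difference from the math module, the sketch below computes the same sine values twice: once with an explicit loop over math.sin and once with a single call to numpy.sin. The names y_loop and y_vectorized are chosen for this example.

```python
import math
import numpy

x = numpy.linspace(-2.0 * numpy.pi, 2.0 * numpy.pi, 62)

# math.sin works on one number at a time, so we need an explicit loop.
y_loop = numpy.array([math.sin(value) for value in x])

# numpy.sin is applied to every element of the array in one call.
y_vectorized = numpy.sin(x)

print(numpy.allclose(y_loop, y_vectorized))
```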

One thing to watch out for (and this is also true of the math module) is that, contrary to what you might expect, the square root of a negative number is not returned as a complex number:


In [26]:
x = numpy.linspace(-1, 1, 20)
numpy.sqrt(x)


RuntimeWarning: invalid value encountered in sqrt
Out[26]:
array([        nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
        0.22941573,  0.39735971,  0.51298918,  0.60697698,  0.6882472 ,
        0.76088591,  0.82717019,  0.88852332,  0.9459053 ,  1.        ])

The problem is that if you take the sqrt of a negative number, NumPy does not automatically switch to the complex data type to represent the output. Unlike lists, NumPy arrays require the data stored within them to be uniform (of the same type or record structure). By default NumPy assumes we want floats, which obey the IEEE floating point rules for arithmetic (more on this later), and so it generates nans instead (nan stands for "not-a-number").
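
Both points, the uniform element type and the behavior of nan, can be checked directly. The following is a small sketch with made-up variable names; numpy.errstate is used only to silence the warning shown above.

```python
import numpy

# Mixing an integer and a float yields an array where every element
# is a float: the integer 1 is stored as 1.0.
mixed = numpy.array([1, 2.5])
print(mixed.dtype)

# nan is a floating point value that is not equal to anything,
# including itself, so use numpy.isnan to test for it.
with numpy.errstate(invalid="ignore"):     # silence the RuntimeWarning
    roots = numpy.sqrt(numpy.array([-1.0, 4.0]))

is_nan = numpy.isnan(roots)
print(roots)
print(is_nan)
```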

If we want to deal with complex numbers, we can tell NumPy that we want the complex data type instead by doing the following:


In [27]:
x = numpy.linspace(-1, 1, 20, dtype=complex)
numpy.sqrt(x)


Out[27]:
array([ 0.00000000+1.j        ,  0.00000000+0.9459053j ,
        0.00000000+0.88852332j,  0.00000000+0.82717019j,
        0.00000000+0.76088591j,  0.00000000+0.6882472j ,
        0.00000000+0.60697698j,  0.00000000+0.51298918j,
        0.00000000+0.39735971j,  0.00000000+0.22941573j,
        0.22941573+0.j        ,  0.39735971+0.j        ,
        0.51298918+0.j        ,  0.60697698+0.j        ,
        0.68824720+0.j        ,  0.76088591+0.j        ,
        0.82717019+0.j        ,  0.88852332+0.j        ,
        0.94590530+0.j        ,  1.00000000+0.j        ])
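
If the array already exists as floats, there are at least two other ways to get complex square roots. The sketch below shows astype, which converts an existing array, and numpy.emath.sqrt, a variant of sqrt that returns complex values when it encounters negative input. Which one is preferable depends on the problem; the variable names are illustrative.

```python
import numpy

x = numpy.linspace(-1, 1, 5)               # float array

# Convert an existing float array to complex before taking the root.
roots_cast = numpy.sqrt(x.astype(complex))

# numpy.emath.sqrt switches to complex output when the input
# contains negative values.
roots_emath = numpy.emath.sqrt(x)

print(roots_cast)
print(roots_emath)
```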

There are a number of other data types that NumPy understands, the most important one being int for integers.
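
One consequence of the uniform type rule worth knowing for int arrays is sketched below: a float stored into an integer array loses its fractional part, while arithmetic that needs floats produces a new float array. The names int_array and halves are chosen for this example.

```python
import numpy

int_array = numpy.array([1, 2, 3], dtype=int)

# Storing a float into an integer array drops the fractional part.
int_array[0] = 2.7
print(int_array)                 # first entry is stored as 2

# Arithmetic that needs floats produces a new float array instead.
halves = int_array / 2.0
print(halves.dtype)
```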