Overview

The goal of this tutorial is to provide an overview of the use of the NumPy library. It tries to hit all of the important parts , but it is by no means comprehensive. For more information, try looking at the:

NumPy is the fundamental package for scientific computing with Python. It contains among other things:

  • a powerful N-dimensional array object
  • sophisticated (broadcasting) functions
  • tools for integrating C/C++ and Fortran code
  • useful linear algebra, Fourier transform, and random number capabilities

The NumPy array object is the common interface for working with typed arrays of data across a wide-variety of scientific Python packages. NumPy also features a C-API, which enables interfacing existing Fortran/C/C++ libraries with Python and NumPy.

Basic Usage

The NumPy array represents a contiguous block of memory, holding entries of a given type (and hence fixed size). The entries are laid out in memory according to the shape, or list of dimension sizes.


In [ ]:
# Convention for import to get shortened namespace
import numpy as np

In [ ]:
# Create a simple array from a list of integers
a = np.array([1, 2, 3])
a

In [ ]:
# Print out the shape attribute
a.shape

In [ ]:
# And now the data type
a.dtype

In [ ]:
# This time use a list of floats
a = np.array([1., 2., 3., 4., 5.])
a

In [ ]:
a.shape

In [ ]:
a.dtype

NumPy also provides helper functions for generating arrays of data. arange for instance creates a range of values.


In [ ]:
a = np.arange(5)
print a

In [ ]:
a = np.arange(1, 10, 2)
print a

linspace is similar, but is used to create a linearly spaced array of values.


In [ ]:
b = np.linspace(5, 15, 5)
print b

NumPy also provides simple ways of performing mathematical operations. For instance, in core Python, you can do:


In [ ]:
print [x + y for x,y in zip(a,b)]

Using NumPy this becomes:


In [ ]:
a + b

In [ ]:
a * b

In [ ]:
t = np.arange(0, 2 * np.pi, np.pi / 4)
t

NumPy also provides mathematical functions that operate on arrays:


In [ ]:
np.sin(t)

Basic Plotting

Matplotlib is a python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms.


In [ ]:
# Convention for import of the pyplot interface
import matplotlib.pyplot as plt

In [ ]:
from IPython.display import display

# Set-up the IPython notebook to put plots inline
%matplotlib inline

In [ ]:
# Create some example data
x = np.linspace(0, 2, 100)

In [ ]:
# Go ahead and explicitly create a figure and an axes
fig = plt.figure(figsize=(8,5))
ax = fig.add_subplot(1, 1, 1)

# Plot our x variable on the x-axis and x^2 on the y-axis
ax.plot(x, x**2)

In [ ]:
# Add some labels to the plot
ax.set_xlabel('x')
ax.set_ylabel('f(x)')

# Needed to reuse and see the updated plot while using inline
display(fig)

In [ ]:
# Let's add a title with a bit of latex syntax
ax.set_title('$y = x^2$', fontdict={'size':22})

display(fig)

In [ ]:
fig = plt.figure(figsize=(8,5))
ax = fig.add_subplot(1, 1, 1)

# Plot a set of different polynomials. The label argument is used when generating a legend.
ax.plot(x, x, label='$x$')
ax.plot(x, x * x, label='$x^2$')
ax.plot(x, x**3, label='$x^3$')

# Add labels and title
ax.set_xlabel('x')
ax.set_ylabel('f(x)')
ax.set_title('Polynomials')

# Add gridlines
ax.grid(True)

# Add a legend to the upper left corner of the plot
ax.legend(loc='upper left')

Exercise

Make a plot containing:

  • 3 different plots with different styles. Suggestions:
    • sin, cos, tan
    • exp, log
    • sqrt
    • Any others you want to try
  • Use labels and a legend
  • Add labels and title

Indexing and Slicing

Indexing is how we pull individual data items out of an array. Slicing extends this process to pulling out a regular set of the items.


In [ ]:
# Create an array for testing
a = np.arange(12).reshape(3, 4)

In [ ]:
a

Indexing in Python is 0-based, so the command below looks for the 2nd item along the first dimension and the 3rd along the second dimension.


In [ ]:
a[1, 2]

Can also just index on one dimension


In [ ]:
a[2]

Negative indices are also allowed, which permit indexing relative to the end of the array.


In [ ]:
a[0, -1]

Slicing syntax is written as start:stop[:step], where all numbers are optional. Start defaults to 0, end to len(dim), and step to 1. The second colon is also optional if no step is used. It should be noted that end represents one past the last item; one can also think of it as a half open interval: [start, end)


In [ ]:
a[1:3]

In [ ]:
a[::2, ::2]

In [ ]:
a[:, 2]

In [ ]:
# ... can be used to replace one or more full slices
a[..., 2]

Exercises

Using the data below, write code to solve the commented problems.


In [ ]:
a = np.arange(60).reshape(3, 4, 5)

In [ ]:
# Decimate the data by a factor of 3 along all dimensions

In [ ]:
# Pull out the 2nd 4x5 slice

In [ ]:
# Grab the last column from all dimensions (result should be 3x4)

Broadcasting

"The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes. Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python. It does this without making needless copies of data and usually leads to efficient algorithm implementations. There are, however, cases where broadcasting is a bad idea because it leads to inefficient use of memory that slows computation." (NumPy User Guide)


In [ ]:
# Create some test data
a = np.linspace(0, np.pi, 4)
a

NumPy can perform operations between arrays and constants:


In [ ]:
a * 2

This extends to operations between arrays of different sizes:


In [ ]:
a * np.array([2])

In [ ]:
b = np.linspace(0, 50, 5)

In [ ]:
print a.shape
print b.shape

In [ ]:
# This will not work, however, because the array sizes cannot be rectified
a * b

Broadcasting works by:

  1. Make the arrays have the same number of dimensions. If they are different, size-1 dimensions are prepended implicitly
  2. Check that each dimension is compatible--that they are the same or one of them is 1

In [ ]:
# If we add a size-1 dimension to a they can be rectified. The process of broadcasting will implicitly add a 
a = a.reshape((-1, 1))
print a.shape

In [ ]:
result = a * b
result

In [ ]:
result.shape

Exercise

Create a 1D array of 100 x values between -3 and 3, and a 1D array of 150 y values between -5 and 5. Use these to calculate an array of radius values.

Radius can be calculated as: $r = \sqrt{x^2 + y^2}$


In [ ]:

Logical Indexing

Logical indexing allows selecting elements from an array using a second array of True and False values.


In [ ]:
# Create some synthetic data representing temperature and wind speed data
temp = 20 * np.cos(np.linspace(0, 2 * np.pi, 100)) + 40 + 2 * np.random.randn(100)
spd = np.abs(10 * np.sin(np.linspace(0, 2 * np.pi, 100)) + 10 + 5 * np.random.randn(100))

In [ ]:
plt.plot(temp)
plt.plot(spd)

In [ ]:
# By doing a comparision between a NumPy array and a value, we get an
# array of values representing the results of the comparison between
# each element and the value
temp > 50

In [ ]:
# We can take the resulting array and use this to index into the NumPy
# array and retrieve the values where the result was true
temp[temp > 50]

In [ ]:
# So long as the size of the boolean array matches the data, the
# boolean array can come from anywhere
temp[spd > 10]

In [ ]:
# Make a copy so we don't modify the original data
temp2 = temp.copy()

# Replace all places where spd is <10 with 0.
temp2[spd < 10] = 0
plt.plot(temp2)

In [ ]:
# Can also combine multiple boolean arrays using the syntax for bitwise operations
# *MUST HAVE PARENTHESES* due to operator precedence
temp[(temp < 45) & (spd > 10)]

Masked Arrays

Masked arrays are a specialization of NumPy arrays to handle flagging individual elements as masked. This allows elminating values from plots and from computations.


In [ ]:
# Create a masked array of temperature, masking values where speed is < 10
temp_masked = np.ma.array(temp, mask=spd < 10)

In [ ]:
temp_masked

In [ ]:
plt.plot(temp_masked)

In [ ]:
# Masked values can be set after array creation by setting the corresponding value to the special value of ma.masked
temp_masked[temp_masked > 45] = np.ma.masked

In [ ]:
temp_masked

In [ ]:
plt.plot(np.arange(temp_masked.size), temp_masked)

# Set plot limits to the same as before
plt.xlim(0, 100)
plt.ylim(0, 65)

Exercise

Currently, the integral in the code below results in a bad value. Using masked arrays, fix this so that the integral returns the proper value of 2.393.

Hint: look at np.isnan()


In [ ]:
t = np.linspace(0, 2*np.pi, 200)
x = np.sqrt(np.sin(t))
print np.trapz(x, t)