The goal of this tutorial is to provide an overview of the use of the NumPy library. It tries to hit all of the important parts , but it is by no means comprehensive. For more information, try looking at the:
NumPy is the fundamental package for scientific computing with Python. It contains among other things:
The NumPy array object is the common interface for working with typed arrays of data across a wide-variety of scientific Python packages. NumPy also features a C-API, which enables interfacing existing Fortran/C/C++ libraries with Python and NumPy.
The NumPy array represents a contiguous block of memory, holding entries of a given type (and hence fixed size). The entries are laid out in memory according to the shape, or list of dimension sizes.
In [ ]:
# Convention for import to get shortened namespace
import numpy as np
In [ ]:
# Create a simple array from a list of integers
a = np.array([1, 2, 3])
a
In [ ]:
# Print out the shape attribute
a.shape
In [ ]:
# And now the data type
a.dtype
In [ ]:
# This time use a list of floats
a = np.array([1., 2., 3., 4., 5.])
a
In [ ]:
a.shape
In [ ]:
a.dtype
NumPy also provides helper functions for generating arrays of data. arange
for instance creates a range of values.
In [ ]:
a = np.arange(5)
print a
In [ ]:
a = np.arange(1, 10, 2)
print a
linspace
is similar, but is used to create a linearly spaced array of values.
In [ ]:
b = np.linspace(5, 15, 5)
print b
NumPy also provides simple ways of performing mathematical operations. For instance, in core Python, you can do:
In [ ]:
print [x + y for x,y in zip(a,b)]
Using NumPy this becomes:
In [ ]:
a + b
In [ ]:
a * b
In [ ]:
t = np.arange(0, 2 * np.pi, np.pi / 4)
t
NumPy also provides mathematical functions that operate on arrays:
In [ ]:
np.sin(t)
Matplotlib is a python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms.
In [ ]:
# Convention for import of the pyplot interface
import matplotlib.pyplot as plt
In [ ]:
from IPython.display import display
# Set-up the IPython notebook to put plots inline
%matplotlib inline
In [ ]:
# Create some example data
x = np.linspace(0, 2, 100)
In [ ]:
# Go ahead and explicitly create a figure and an axes
fig = plt.figure(figsize=(8,5))
ax = fig.add_subplot(1, 1, 1)
# Plot our x variable on the x-axis and x^2 on the y-axis
ax.plot(x, x**2)
In [ ]:
# Add some labels to the plot
ax.set_xlabel('x')
ax.set_ylabel('f(x)')
# Needed to reuse and see the updated plot while using inline
display(fig)
In [ ]:
# Let's add a title with a bit of latex syntax
ax.set_title('$y = x^2$', fontdict={'size':22})
display(fig)
In [ ]:
fig = plt.figure(figsize=(8,5))
ax = fig.add_subplot(1, 1, 1)
# Plot a set of different polynomials. The label argument is used when generating a legend.
ax.plot(x, x, label='$x$')
ax.plot(x, x * x, label='$x^2$')
ax.plot(x, x**3, label='$x^3$')
# Add labels and title
ax.set_xlabel('x')
ax.set_ylabel('f(x)')
ax.set_title('Polynomials')
# Add gridlines
ax.grid(True)
# Add a legend to the upper left corner of the plot
ax.legend(loc='upper left')
Make a plot containing:
Indexing is how we pull individual data items out of an array. Slicing extends this process to pulling out a regular set of the items.
In [ ]:
# Create an array for testing
a = np.arange(12).reshape(3, 4)
In [ ]:
a
Indexing in Python is 0-based, so the command below looks for the 2nd item along the first dimension and the 3rd along the second dimension.
In [ ]:
a[1, 2]
Can also just index on one dimension
In [ ]:
a[2]
Negative indices are also allowed, which permit indexing relative to the end of the array.
In [ ]:
a[0, -1]
Slicing syntax is written as start:stop[:step]
, where all numbers are optional. Start defaults to 0, end to len(dim), and step to 1. The second colon is also optional if no step is used. It should be noted that end represents one past the last item; one can also think of it as a half open interval: [start, end)
In [ ]:
a[1:3]
In [ ]:
a[::2, ::2]
In [ ]:
a[:, 2]
In [ ]:
# ... can be used to replace one or more full slices
a[..., 2]
Using the data below, write code to solve the commented problems.
In [ ]:
a = np.arange(60).reshape(3, 4, 5)
In [ ]:
# Decimate the data by a factor of 3 along all dimensions
In [ ]:
# Pull out the 2nd 4x5 slice
In [ ]:
# Grab the last column from all dimensions (result should be 3x4)
"The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes. Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python. It does this without making needless copies of data and usually leads to efficient algorithm implementations. There are, however, cases where broadcasting is a bad idea because it leads to inefficient use of memory that slows computation." (NumPy User Guide)
In [ ]:
# Create some test data
a = np.linspace(0, np.pi, 4)
a
NumPy can perform operations between arrays and constants:
In [ ]:
a * 2
This extends to operations between arrays of different sizes:
In [ ]:
a * np.array([2])
In [ ]:
b = np.linspace(0, 50, 5)
In [ ]:
print a.shape
print b.shape
In [ ]:
# This will not work, however, because the array sizes cannot be rectified
a * b
Broadcasting works by:
In [ ]:
# If we add a size-1 dimension to a they can be rectified. The process of broadcasting will implicitly add a
a = a.reshape((-1, 1))
print a.shape
In [ ]:
result = a * b
result
In [ ]:
result.shape
Create a 1D array of 100 x values between -3 and 3, and a 1D array of 150 y values between -5 and 5. Use these to calculate an array of radius values.
Radius can be calculated as: $r = \sqrt{x^2 + y^2}$
In [ ]:
Logical indexing allows selecting elements from an array using a second array of True
and False
values.
In [ ]:
# Create some synthetic data representing temperature and wind speed data
temp = 20 * np.cos(np.linspace(0, 2 * np.pi, 100)) + 40 + 2 * np.random.randn(100)
spd = np.abs(10 * np.sin(np.linspace(0, 2 * np.pi, 100)) + 10 + 5 * np.random.randn(100))
In [ ]:
plt.plot(temp)
plt.plot(spd)
In [ ]:
# By doing a comparision between a NumPy array and a value, we get an
# array of values representing the results of the comparison between
# each element and the value
temp > 50
In [ ]:
# We can take the resulting array and use this to index into the NumPy
# array and retrieve the values where the result was true
temp[temp > 50]
In [ ]:
# So long as the size of the boolean array matches the data, the
# boolean array can come from anywhere
temp[spd > 10]
In [ ]:
# Make a copy so we don't modify the original data
temp2 = temp.copy()
# Replace all places where spd is <10 with 0.
temp2[spd < 10] = 0
plt.plot(temp2)
In [ ]:
# Can also combine multiple boolean arrays using the syntax for bitwise operations
# *MUST HAVE PARENTHESES* due to operator precedence
temp[(temp < 45) & (spd > 10)]
Masked arrays are a specialization of NumPy arrays to handle flagging individual elements as masked. This allows elminating values from plots and from computations.
In [ ]:
# Create a masked array of temperature, masking values where speed is < 10
temp_masked = np.ma.array(temp, mask=spd < 10)
In [ ]:
temp_masked
In [ ]:
plt.plot(temp_masked)
In [ ]:
# Masked values can be set after array creation by setting the corresponding value to the special value of ma.masked
temp_masked[temp_masked > 45] = np.ma.masked
In [ ]:
temp_masked
In [ ]:
plt.plot(np.arange(temp_masked.size), temp_masked)
# Set plot limits to the same as before
plt.xlim(0, 100)
plt.ylim(0, 65)
Currently, the integral in the code below results in a bad value. Using masked arrays, fix this so that the integral returns the proper value of 2.393
.
Hint: look at np.isnan()
In [ ]:
t = np.linspace(0, 2*np.pi, 200)
x = np.sqrt(np.sin(t))
print np.trapz(x, t)