Numpy is a Python module that provides fast primitives for multidimensional arrays. It's well-suited to implementing numerical linear algebra algorithms, and for such workloads it can be much faster than Python's native list and dictionary types when you only need to store and operate on numerical data.
Some of the material from this lesson is lifted from the following more comprehensive tutorial: link
Quick demo. The recommended importing convention is:
In [ ]:
import numpy as np
Numpy provides some natural types and operations on arrays. For instance:
In [ ]:
a_1d = np.array ([0, 1, 2, 3]) # a vector
print (a_1d)
b_1d = np.array ([4, 5, 6, 7]) # another vector
print (b_1d)
print (a_1d + b_1d)
In [ ]:
print (5*a_1d)
In [ ]:
print (a_1d**2)
Getting help. By the way, if you need to look up documentation from within this notebook, here are some handy shortcuts.
In [ ]:
# Append '?' to get help on a specific routine
np.array?
In [ ]:
# Search for key text
np.lookfor ("creating array")
In [ ]:
# Wildcard search
np.con*?
Why bother with Numpy? Here's a quick performance comparison: computing the elementwise square of a large sequence of numbers using a native Python list versus a Numpy array.
In [ ]:
n = 1000000
In [ ]:
%timeit L = range (n)
L = range (n)
%timeit [i**2 for i in L]
In [ ]:
np.arange (10) # Moral equivalent to `range`
In [ ]:
%timeit A = np.arange (n)
A = np.arange (n)
%timeit A**2
Exercise. Recall the definition of the 2-norm of a vector (or Euclidean length of a vector) from Da Kuang's notes on linear algebra. Compare the speed of computing it using native Python lists versus Numpy arrays.
Hint: For Numpy, look for a routine that computes the norm.
In [ ]:
from random import gauss # Generates random numbers from a Gaussian
from math import sqrt # Computes the square root of a number
n = 1000000
X_py = [gauss (0, 1) for i in range (n)]
X_np = np.array (X_py)
print ("==> Native Python lists:")
%timeit ...
print ("\n==> Numpy:")
%timeit ...
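For reference, here is one possible way to fill in the blanks above (a sketch, not the only solution); per the hint, it uses np.linalg.norm for the Numpy version.
In [ ]:
# One possible approach (sketch): list comprehension + math.sqrt vs. np.linalg.norm
print ("==> Native Python lists:")
%timeit sqrt (sum ([x_i*x_i for x_i in X_py]))

print ("\n==> Numpy:")
%timeit np.linalg.norm (X_np)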
In [ ]:
# Create a two-dimensional array of size 3 rows x 4 columns:
B = np.array([[0, 1, 2, 3],
[4, 5, 6, 7],
[8, 9, 10, 11]])
print (B.ndim) # What does this do?
print (B.shape) # What does this do?
print (len (B)) # What does this do?
In [ ]:
C1 = [[0, 1, 2, 3],
[4, 5, 6, 7],
[8, 9, 10, 11]]
C2 = [[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]
C = np.array ([C1, C2])
print (C.ndim)
print (C.shape)
print (len (C))
Besides arange(), you can also define an interval and a number of points. What does the following code do?
In [ ]:
print (np.linspace (0, 1, 10))
In [ ]:
print (np.linspace (0, 1, 10, endpoint=False))
In [ ]:
print (np.ones ((3, 4)))
In [ ]:
print (np.zeros ((3, 4)))
In [ ]:
print (np.eye (3))
In [ ]:
print (np.diag ([1, 2, 3]))
Exercise. The following code creates an identity matrix in two different ways, which are found to be equal according to the assertion. But in fact there is a subtle difference; can you spot it?
In [ ]:
n = 3
I = np.eye (n)
print ("I = eye(n):")
print (I)
u = [1] * n
I_u = np.diag (u)
print ("u:")
print (u)
print ("\ndiag (u):")
print (I_u)
assert np.all (I_u == I)
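If the difference isn't obvious from the printed values, one way to reveal it is to inspect each array's dtype attribute (a hint, shown here as a sketch):
In [ ]:
# Hint (sketch): the values match, but do the element types?
print (I.dtype) # np.eye() produces floating-point entries by default
print (I_u.dtype) # np.diag() of a list of Python ints produces integer entries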
You can also create empty (uninitialized) arrays. What does the following produce?
In [ ]:
A = np.empty ((3, 4)) # An empty 3 x 4 matrix
print (A)
In [ ]:
# Recall: C
print (C)
Exercise. What part of C will the following slice extract? Run the code to check your answer.
In [ ]:
print (C[0, 2, :])
Exercise. What will the following slice return? Run the code to check your answer.
In [ ]:
print (C[1, 0, ::-1])
Exercise. Consider the following $6 \times 6$ matrix, which has 4 different subsets highlighted.
Write some code to generate this matrix, named Z. Then, for each subset illustrated above, write an indexing or slicing expression that extracts the subset.
In [ ]:
Z = ... # INSERT YOUR CODE HERE
print ("\n==> Orange slice:")
print (Z[...]) # Edit to print the orange-ish subset
print ("\n==> Red slice:")
print (Z[...]) # Edit to print the red-ish subset
print ("\n==> Blue slice:")
print (Z[...]) # Edit to print the blue-ish subset
print ("\n==> Green slice:")
print (Z[...]) # Edit to print the green-ish subset
Incidentally, there is a very cute way to create the above matrix. I would have never guessed this method, so I've put some print statements so you can see what it is doing.
In [ ]:
# Rich will demo this in class
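Here is a sketch of one such construction, assuming the matrix in the figure has entries $Z(i, j) = 10i + j$ (a common choice in examples like this; adjust if your figure differs). It builds the matrix by broadcasting a column of row offsets against the row of column indices:
In [ ]:
# Sketch, assuming Z(i, j) = 10*i + j -- adjust if your figure differs
row_offsets = np.arange (6).reshape ((6, 1)) * 10
col_indices = np.arange (6)
print (row_offsets) # a column vector of row offsets
print (col_indices) # the column indices
print (row_offsets + col_indices) # broadcasting produces the full 6 x 6 matrix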
In [ ]:
print ("==> Recall C: %s" % str (C.shape))
print (C)
In [ ]:
C_view = C[1, 0:3:2, 1:4:2] # Question: What does this produce?
print ("==> C_view: %s" % str (C_view.shape))
In [ ]:
C_view[:, :] = -C_view[::-1, ::-1] # Question: What does this do?
In [ ]:
print (C)
You can force a copy using the .copy() method:
In [ ]:
C_copy = C[1, 0:3:2, 1:4:2].copy ()
C_copy[:, :] = -C_copy[::-1, ::-1]
print ("==> C_view:")
print (C_view)
print ("\n==> C_copy:")
print (C_copy)
And to check whether two Numpy array variables point to the same object, you can use the numpy.may_share_memory() function:
In [ ]:
print ("C and C_view share memory: %s" % np.may_share_memory (C, C_view))
print ("C and C_copy share memory: %s" % np.may_share_memory (C, C_copy))
Exercise. Complete the prime number sieve algorithm, which is illustrated below.
That is, given a positive integer $n$, the algorithm iterates over $i \in \{2, 3, 4, \ldots, \left\lfloor\sqrt{n}\right\rfloor\}$, repeatedly "crossing out" values that are strict multiples of $i$. "Crossing out" means maintaining an array of, say, booleans, and setting values that are multiples of $i$ to False.
In [ ]:
from math import sqrt
n = 20
is_prime = np.ones (n+1, dtype=bool) # the "sieve"
# Initial values
is_prime[0:2] = False
is_prime[2:] = True
# Sieving loop
for i in range (2, int (sqrt (n)) + 1):
# Fill in your code here
is_prime[2*i::i] = False
# Prints your primes
print ("==> Primes through %d:" % n)
print (np.nonzero (is_prime)[0])
In [ ]:
np.random.seed(3)
x = np.random.random_integers(0, 20, 15) # 15 random ints in [0, 20] (endpoints inclusive)
print (x)
In [ ]:
# Find all positive multiples of 3
mask_mult_3 = (x % 3 == 0) & (x > 0)
print (mask_mult_3)
print (x[mask_mult_3])
In [ ]:
# Pull out an arbitrary subset of elements
inds = np.array ([3, 7, 8, 12])
print (x[inds])
For linear algebra, we will be especially interested in 2-D arrays, which we will use to store matrices. There is a final subtle performance issue related to how matrices are stored in memory.
By way of background, physical storage---whether it be memory or disk---is basically one big array. And because of how physical storage is implemented, it turns out that it is much faster to access consecutive elements in memory than, say, to jump around randomly.
A matrix is a two-dimensional object. Thus, when it is stored in memory, it must be mapped in some way to the one-dimensional physical array. There are many possible mappings, but the two most common conventions are known as the row-major and column-major layouts:
In Numpy, you can ask for one layout or the other. The most common convention in numerical linear algebra libraries is column-major layout, which is also the Fortran language convention; by contrast, when you create a two-dimensional array in languages like C and C++, the convention is row-major layout.
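To see the two layouts concretely, here is a quick illustration (an aside, not part of the exercise below): flattening the same small matrix under each convention shows the order in which its elements are laid out in memory.
In [ ]:
# A 2 x 3 matrix, flattened under each storage convention
M = np.array ([[1, 2, 3],
               [4, 5, 6]])
print (M.ravel (order='C')) # row-major (C) order: [1 2 3 4 5 6]
print (M.ravel (order='F')) # column-major (Fortran) order: [1 4 2 5 3 6]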
The default in Numpy is C order (row-major), even though the linear algebra library convention is Fortran order (column-major). You can request either order using the order parameter. For linear algebra, we recommend using the column-major convention.
In [ ]:
n = 5000
A_rowmaj = np.ones ((n, n), order='C') # row-major (C/C++ convention)
A_colmaj = np.ones ((n, n), order='F') # column-major (Fortran convention)
Exercise. Write a function that iterates over the columns of a matrix and scales each column by its index. That is, column $A(:, j)$ should be replaced by $j*A(:, j)$.
Then use that routine to measure the difference in time when the input is row-major versus column-major.
Note: In Numpy, this operation can be done easily, without an explicit loop, just by multiplying the matrix by an array of column indices (using broadcasting).
In [ ]:
def scale_colwise (A):
"""Given a matrix `A`, visits each column `A[:, j]` and scales it by `j`."""
# Fill in this code
pass
# Measure time to scale a row-major input column-wise
%timeit scale_colwise (A_rowmaj)
# Measure time to scale a column-major input column-wise
%timeit scale_colwise (A_colmaj)
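In case you want to sanity-check your implementation, here is one possible loop-based version (a sketch; the name scale_colwise_ref is just for illustration, and the broadcasting one-liner noted above appears as a comment):
In [ ]:
# One possible (loop-based) implementation -- a sketch, not necessarily the intended solution
def scale_colwise_ref (A):
    """Scales each column `A[:, j]` of `A` by `j`, in place."""
    n_cols = A.shape[1]
    for j in range (n_cols):
        A[:, j] *= j
    return A

# Equivalent broadcasting one-liner: A * np.arange (A.shape[1])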
Exercise. Recall the definition of matrix-vector multiplication from Da's LA notes. Benchmark a matrix-vector multiply in native Python, and compare that to doing the same operation in Numpy.
In [ ]:
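# A possible setup (sketch): benchmark a dense matrix-vector product,
# native Python lists vs. Numpy. The problem size and the helper name
# `matvec_py` are arbitrary choices for illustration.
n = 500
A_py = [[float (i + j) for j in range (n)] for i in range (n)] # native nested lists
x_py = [1.0] * n
A_np = np.array (A_py)
x_np = np.array (x_py)

def matvec_py (A, x):
    """Naive matrix-vector product using native Python lists."""
    return [sum (a_ij*x_j for (a_ij, x_j) in zip (row, x)) for row in A]

print ("==> Native Python:")
%timeit matvec_py (A_py, x_py)

print ("\n==> Numpy:")
%timeit np.dot (A_np, x_np)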