Adapted from Scientific Python: Part 1 (lessons/thw-numpy/numpy.ipynb)

Introducing NumPy

NumPy is a Python package implementing efficient collections of specific types of data (generally numerical), similar to the standard array module (but with many more features). NumPy arrays differ from lists and tuples in that the data is contiguous in memory. A Python list such as [0, 1, 2], in contrast, is actually an array of pointers to Python objects representing each number. The contiguous, fixed-type layout allows NumPy arrays to be considerably faster for numerical operations than Python lists/tuples.
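As a rough illustration of the contiguous layout, the sketch below inspects how an array stores its data. The variable name values is ours, and the int64 dtype is chosen explicitly so that the sizes are predictable.

```python
import numpy as np

values = np.array([0, 1, 2], dtype=np.int64)

# Each element occupies a fixed number of bytes in one block of memory.
print(values.itemsize)                # bytes per element
print(values.nbytes)                  # total bytes in the data buffer
print(values.flags['C_CONTIGUOUS'])   # True when the data is laid out contiguously
```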


In [ ]:
# by convention, we typically import numpy as the alias np
import numpy as np

Let's see what numpy can do.


In [ ]:
#np?
#np.

We can try out some of those constants and functions:


In [ ]:
print(np.sqrt(4))
print(np.pi)         # a constant
print(np.sin(np.pi))

"That's great," you're thinking. "math already has all of those functions and constants." But that's not the real beauty of NumPy.

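One difference is already visible with these functions: the math versions work on one number at a time, while the NumPy versions accept a whole sequence. A minimal sketch (the list numbers is just made-up sample data):

```python
import math
import numpy as np

numbers = [1, 4, 9, 16]

# math.sqrt accepts a single number only, so passing a list raises TypeError.
try:
    math.sqrt(numbers)
except TypeError as err:
    print('math.sqrt failed:', err)

# np.sqrt is applied to every element and returns an array.
roots = np.sqrt(numbers)
print(roots)
```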
TRY IT

Find the square root of pi using numpy functions and constants


In [ ]:

Numpy arrays (ndarrays)

Creating a NumPy array is as simple as passing a sequence to numpy.array.

NumPy arrays are collections of things, all of which must be the same type, that work similarly to lists (as we've described them so far). The most important differences are:

  1. You can easily perform elementwise operations (and matrix algebra) on arrays
  2. Arrays can be n-dimensional
  3. Arrays have a fixed size once created (i.e., there is no in-place equivalent to append; np.append returns a new array)

Arrays can be created from existing collections such as lists, or instantiated "from scratch" in a few useful ways.


In [ ]:
arr1 = np.array([1, 2.3, 4])
# Type of a numpy array
print(type(arr1))
# Type of the data inside a numpy array (dtype = data type)
print(arr1.dtype)

TRY IT

Create an array from the list [0,1,2] and print out its dtype


In [ ]:

Datatype options

Choose your datatype based on how large the largest values could be and how much memory you expect to use.

  • bool_ - Boolean (True or False) stored as a byte
  • int_ - Default integer type (same as C long; normally either int64 or int32)
  • int8 - Byte (-128 to 127)
  • int16 - Integer (-32768 to 32767)
  • int32 - Integer (-2147483648 to 2147483647)
  • int64 - Integer (-9223372036854775808 to 9223372036854775807)
  • uint8 - Unsigned integer (0 to 255)
  • uint16 - Unsigned integer (0 to 65535)
  • uint32 - Unsigned integer (0 to 4294967295)
  • uint64 - Unsigned integer (0 to 18446744073709551615)
  • float_ - Shorthand for float64 (this alias was removed in NumPy 2.0; use float64 there).
  • float16 - Half precision float: sign bit, 5 bits exponent, 10 bits mantissa
  • float32 - Single precision float: sign bit, 8 bits exponent, 23 bits mantissa
  • float64 - Double precision float: sign bit, 11 bits exponent, 52 bits mantissa
  • complex_ - Shorthand for complex128 (this alias was removed in NumPy 2.0; use complex128 there).
  • complex64 - Complex number, represented by two 32-bit floats (real and imaginary components)
  • complex128 - Complex number, represented by two 64-bit floats (real and imaginary components)
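To see how the choice of dtype affects memory use and value range, here is a small sketch. The array names are ours; np.iinfo, np.finfo and astype are standard NumPy tools for inspecting limits and converting between types.

```python
import numpy as np

small = np.zeros(1000, dtype=np.int8)
large = np.zeros(1000, dtype=np.float64)
print(small.nbytes, large.nbytes)   # 1 byte vs. 8 bytes per element

# np.iinfo and np.finfo report the limits of integer and float types.
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)
print(np.finfo(np.float32).eps)

# astype returns a converted copy of an array with a new dtype.
as_float = np.array([1, 2, 3]).astype(np.float32)
print(as_float.dtype)
```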

Creating Arrays

There are many other ways to create NumPy arrays, such as np.identity, np.zeros, np.zeros_like, np.ones, and np.ones_like.


In [ ]:
print('2 rows, 3 columns of zeros:\n', np.zeros((2,3)))
print('4x4 identity matrix:\n', np.identity(4))
squared = []
for x in range(5):
    squared.append(x**2)
print(squared)
a = np.array(squared)
b = np.zeros_like(a)   # zeros with the same shape and dtype as a

print('a:\n', a)
print('b:\n', b)
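The np.ones and np.ones_like functions mentioned above behave like their zeros counterparts. The sketch below also shows one way to build the same squares without an explicit loop; the variable names are ours.

```python
import numpy as np

# Same values as a loop-built list of squares, computed elementwise.
squares = np.arange(5) ** 2
print(squares)

# np.ones and np.ones_like mirror np.zeros and np.zeros_like.
ones_2x3 = np.ones((2, 3))
ones_like_squares = np.ones_like(squares)   # same shape and dtype as squares
print(ones_2x3)
print(ones_like_squares)
```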

These arrays have attributes, like .ndim and .shape, that tell us about the number and length of the dimensions.

The dimension of an array is the number of indices needed to select an element. Put another way, if the array is viewed as a function of its possible index combinations, the dimension is that of the space containing those index combinations. Thus a one-dimensional array is a list of data, a two-dimensional array a rectangle of data, a three-dimensional array a block of data, etc.

The shape is a tuple giving the number of elements along each dimension.


In [ ]:
c = np.ones((15, 30))
print('number of dimensions of c:', c.ndim)
print('length of c in each dimension:', c.shape)

x = np.array([[[1,2,3],[4,5,6],[7,8,9]] , [[0,0,0],[0,0,0],[0,0,0]]])
print('number of dimensions of x:', x.ndim)
print('length of x in each dimension:', x.shape)

NumPy has its own version of range(), np.arange() (short for "array range"), which is more efficient for building larger arrays. It works in much the same way as range(), but returns an array.

NumPy also has linspace() and logspace(), which generate evenly spaced samples between a start-point and an end-point (logspace spaces them evenly on a log scale). Find out more with np.linspace?.


In [ ]:
print("Arange")
print(np.arange(5))

# Args: start, stop, number of elements
print("Linspace")
print(np.linspace(5, 10, 5))

# logspace can also take a base argument, by default it is 10
print("Logspace")
print(np.logspace(0, 1, 5))
print(np.logspace(0, 1, 5, base=2))
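A few further variations may help when comparing the three functions. This is only a sketch; the name quarter_steps is ours, and the keyword endpoint is a documented linspace argument.

```python
import numpy as np

# arange takes start, stop (excluded) and step, and allows non-integer steps.
print(np.arange(2, 10, 2))        # 2, 4, 6, 8
quarter_steps = np.arange(0, 1, 0.25)
print(quarter_steps)

# linspace includes the end-point by default; endpoint=False leaves it out.
print(np.linspace(5, 10, 5))
print(np.linspace(5, 10, 5, endpoint=False))

# logspace(a, b, n) is the same as base ** linspace(a, b, n), with base 10 by default.
print(np.allclose(np.logspace(0, 1, 5), 10 ** np.linspace(0, 1, 5)))
```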

TRY IT

Create a numpy array with 8 rows and 50 columns of 0's


In [ ]:

Creating numpy arrays from text files

You can use loadtxt to load data from a text file (csv or tab-delimited data).


In [ ]:
np.loadtxt?

The simplest way to use it is to just give it a file name. By default, your data will be loaded as floats, with whitespace as the delimiter.

my_arr = np.loadtxt('myfile.txt')

More likely you will need to use some of the keyword arguments, like dtype, delimiter, skiprows, or usecols. Docs available here: http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html

my_array = np.loadtxt('myfile.csv', usecols=[1,2,3,4,5,6,7,8,9,10,11,12], delimiter=',')


In [ ]:
np.loadtxt('simple.csv', delimiter=',')
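If you do not have a data file at hand, the keyword arguments can be tried out on an in-memory stand-in. The sketch below assumes made-up CSV contents and uses io.StringIO in place of a real file, since loadtxt also accepts file-like objects.

```python
import io
import numpy as np

# Hypothetical file contents; io.StringIO lets loadtxt read them like a file.
text = io.StringIO(
    "id,height,weight\n"
    "1,1.5,60\n"
    "2,1.8,75\n"
)

# Skip the header row and keep only the second and third columns.
table = np.loadtxt(text, delimiter=',', skiprows=1, usecols=(1, 2))
print(table)
print(table.shape)
```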

TRY IT

Load the file 'example.tsv', a tab-delimited file. Once you have that working, load only the odd-numbered columns (1,3,5).


In [ ]:

Arithmetic with ndarrays

Standard arithmetic operators perform element-wise operations on arrays of the same size.


In [ ]:
A = np.arange(5)
B = np.arange(5, 10)

print('A', A)
print('B', B)

print('A+B', A+B)
print('B-A', B-A)
print('A*B', A*B)
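The "same size" requirement matters: two arrays of different lengths cannot be combined this way. A minimal sketch (the names five, four and mismatch_error are ours):

```python
import numpy as np

five = np.arange(5)
four = np.arange(4)

mismatch_error = None
try:
    five + four            # lengths 5 and 4 do not match
except ValueError as err:
    mismatch_error = err
    print('cannot add:', err)
```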

In addition, if one of the arguments is a scalar, that value will be applied to all the elements of the array.

scalar - a quantity possessing only magnitude. (In this case we mean a single number, either an int or a float.)


In [ ]:
A = np.arange(5)
print('A', A)
print('A+10', A+10)
print('2 * A', 2*A)
print('A ** 2', A**2)

Linear algebra with arrays

You can use arrays as vectors and matrices in linear algebra operations.

Specifically, you can perform matrix/vector multiplication between arrays by using the .dot method or the np.dot function.

dot product - for two vectors of equal length, the dot product is the sum of the products of corresponding elements. Geometrically, it is the projection of one vector onto the other, scaled by the length of the second.


In [ ]:
print(A.dot(B))
print(np.dot(A, B))

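If you are planning on doing serious linear algebra, stick with regular arrays: the np.matrix class still exists, but the NumPy documentation no longer recommends it. The @ operator performs matrix multiplication on arrays, and the np.linalg module provides routines such as solving linear systems and computing inverses.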
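To make the dot product concrete, the sketch below recomputes it by hand and then applies the same idea to a small matrix. A and B are redefined here so the cell runs on its own; M and v are made-up examples, and @ is Python's matrix-multiplication operator (Python 3.5+).

```python
import numpy as np

A = np.arange(5)        # [0 1 2 3 4]
B = np.arange(5, 10)    # [5 6 7 8 9]

# For 1-D arrays the dot product is the sum of the elementwise products.
print(np.sum(A * B))
print(A @ B)            # the @ operator gives the same result as A.dot(B)

# For 2-D arrays, @ performs matrix multiplication.
M = np.array([[1, 2],
              [3, 4]])
v = np.array([1, 1])
print(M @ v)                 # matrix-vector product
print(M @ np.identity(2))    # multiplying by the identity leaves M unchanged
```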

Numpy 'gotchas'

Multiplication and Addition

As you may have noticed above, since NumPy arrays are modeled more closely after vectors and matrices, multiplying by a scalar will multiply each element of the array, whereas multiplying a list by a scalar will repeat that list N times.


In [ ]:
# Numpy arrays
A = np.arange(5)*2
print(A)
# Lists
B = list(range(5))*2
print(B)

Similarly, when adding two numpy arrays together, we get the vector sum back, whereas when adding two lists together, we get the concatenation back.


In [ ]:
# Numpy arrays
A = np.arange(5) + np.arange(5)
print(A)
# Lists
B = list(range(5)) + list(range(5))
print(B)

Comparison operators work on arrays too, and they return boolean arrays

Much like the basic arithmetic operations we discussed above, comparison operations are performed element-wise. That is, rather than returning a single boolean, comparison operators compare each element in both arrays pairwise, and return an array of booleans. If the shapes of the input arrays are incompatible, older NumPy versions simply returned False from ==, while recent versions raise an error. For example:


In [ ]:
arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([1, 1, 3, 3, 5])
print(arr1 == arr2)
c = (arr1 == arr2)
print(type(c))
print(c.dtype)

You can get a portion of an array by using a boolean array as the index. The result is an array containing only the elements at positions where the boolean array is True.


In [ ]:
print(arr1)
print(c)
print(arr1[c])
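A boolean array does not have to come from the array it indexes; it only needs the same length. The sketch below uses made-up scores and names, and also combines two conditions with the elementwise & operator.

```python
import numpy as np

# Hypothetical data: one score per name.
scores = np.array([55, 82, 91, 47, 68])
names = np.array(['ann', 'bob', 'cat', 'dan', 'eve'])

passed = scores > 60          # elementwise comparison gives a boolean array
print(passed)
print(names[passed])          # a mask built from scores selects from names

# Combine masks with & (and) or | (or); the parentheses are required.
middle = (scores > 50) & (scores < 90)
print(scores[middle])
```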

Note: You can use the methods .any() and .all() or the functions np.any and np.all to return a single boolean indicating whether any or all values in the array are True, respectively.


In [ ]:
print(np.all(c))
print(c.all())
print(c.any())

TRY IT

Create a boolean array for arr1 for where values are >= 3


In [ ]:

Views vs. Copies

In order to be as efficient as possible, numpy uses "views" instead of copies wherever possible. That is, numpy arrays derived from another base array by slicing generally refer to the exact same data as the base array. The consequence of this is that modifying such a derived array will also modify the base array. In contrast, the result of indexing an array with an array of indices or with an array of booleans is a copy, not a view. (Assigning through such an index, as in A[mask] = 0, still modifies A itself.)

Specifically, slices of arrays are always views, unlike slices of lists or tuples, which are always copies.


In [ ]:
A = np.arange(5)
B = A[0:1]
B[0] = 42
print(A)

A = list(range(5))
B = A[0:1]
B[0] = 42
print(A)
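When it is unclear whether two arrays share data, np.shares_memory can check, and the .copy() method gives an independent array when a view is not wanted. A small sketch with names of our own choosing:

```python
import numpy as np

base = np.arange(5)

view = base[1:4]                  # basic slice: shares memory with base
independent = base[1:4].copy()    # explicit copy: separate memory
picked = base[[1, 2, 3]]          # indexing with a list of indices: also a copy

print(np.shares_memory(base, view))
print(np.shares_memory(base, independent))
print(np.shares_memory(base, picked))

view[0] = 99                      # changes base[1] as well
independent[1] = -1               # leaves base untouched
print(base)
```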

Indexing arrays

In addition to the usual methods of indexing lists with an integer (or with a series of colon-separated integers for a slice), numpy allows you to index arrays in a wide variety of different ways for more advanced operations.

First, the simple way:


In [ ]:
a = np.array([1,2,3])
print(a[0:2])

How can we index if the array has more than one dimension?


In [ ]:
c = np.random.rand(3,3)
print(c)
print(c[1:3,0:2])
print(a)
c[0,:] = a
print(c)
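Because random values make the output hard to follow, here is the same kind of indexing on a small array with predictable contents. The name grid is ours, and reshape is used only to arrange the numbers 0 to 11 into 3 rows and 4 columns.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)   # 3 rows, 4 columns holding 0..11
print(grid)

print(grid[0, :])      # first row
print(grid[:, -1])     # last column
print(grid[2, 0])      # single element: third row, first column
print(grid[1:, :2])    # last two rows, first two columns

grid[:, 0] = -1        # assign a scalar to a whole column
print(grid)
```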

TRY IT

Create a random 4x4 array, print out the second row, second column.


In [ ]: