NumPy Basics

Numerical Python, or "NumPy" for short, is a foundational package on which many of the most common data science packages are built. NumPy provides high-performance multi-dimensional arrays, which we can use as vectors or matrices.

The key features of NumPy are:

  • ndarrays: n-dimensional arrays whose elements all share one data type, which makes them fast and space-efficient. Built-in methods (e.g., computing the mean) let you process whole arrays without writing loops.
  • Broadcasting: a set of rules that lets operations combine arrays of different shapes by treating dimensions of size 1 (or missing dimensions) as if they were repeated to match the other array.
  • Vectorization: numeric operations apply element-wise to entire ndarrays, again without explicit Python loops.
  • Input/Output: simplifies reading and writing of data from/to file.
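
A minimal sketch of vectorization, broadcasting and input/output, assuming only that NumPy is installed. The variable names (prices, col, row, grid) are illustrative, and the in-memory io.BytesIO buffer is an assumption standing in for a real file on disk.

```python
import io

import numpy as np

# Vectorization: one expression operates on every element, no Python loop.
prices = np.array([10.0, 20.0, 30.0])
with_tax = prices * 1.25  # element-wise multiply -> [12.5, 25.0, 37.5]

# Broadcasting: a (3, 1) column and a (1, 4) row combine into a (3, 4) grid.
col = np.arange(3).reshape(3, 1)  # [[0], [1], [2]]
row = np.arange(4).reshape(1, 4)  # [[0, 1, 2, 3]]
grid = col * 10 + row             # grid[i, j] == 10 * i + j
print(grid.shape)                 # (3, 4)

# Input/Output: save an array in NumPy's .npy format and read it back.
# The in-memory buffer stands in for a file on disk here.
buffer = io.BytesIO()
np.save(buffer, grid)
buffer.seek(0)
restored = np.load(buffer)
print(np.array_equal(restored, grid))  # True
```

With real files you would pass a path such as "grid.npy" to np.save and np.load instead of the buffer.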

In this brief tutorial, I will demonstrate some of the common NumPy operations you will see during the rest of the week.


In [ ]:
from __future__ import print_function  # must come first in the cell; only needed on Python 2
import numpy as np

A common convention is to import NumPy under the alias np, since otherwise you will find yourself typing numpy a lot. Two letters are much easier on your fingers.

Rank 1


In [ ]:
np.arange(-1.0, 1.0, 0.1) # start at -1.0, step by 0.1, stop before reaching 1.0

In [ ]:
print(np.random.randint(0, 5, size=10)) # 10 random integers from 0 to 4 (the upper bound 5 is excluded)
print(np.ones(10)) # ten 1.0 values
print(np.zeros(10)) # ten 0.0 values
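
A small sketch of a few details behind these constructors, assuming a reasonably recent NumPy (1.17 or later, for np.random.default_rng). The seed value 42 and the names samples and reproducible are arbitrary choices for illustration.

```python
import numpy as np

# randint's upper bound is exclusive: values are drawn from {0, 1, 2, 3, 4}.
samples = np.random.randint(0, 5, size=10)
print(samples.min() >= 0 and samples.max() <= 4)  # True

# ones and zeros default to 64-bit floats; pass dtype to change that.
print(np.ones(10).dtype)              # float64
print(np.zeros(10, dtype=int).dtype)  # platform integer, e.g. int64

# For reproducible results, newer code often uses a seeded Generator.
rng = np.random.default_rng(seed=42)
reproducible = rng.integers(0, 5, size=10)  # also excludes the upper bound
```

Re-creating the Generator with the same seed yields the same sequence, which is handy when you want a notebook to produce identical output on every run.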

In [ ]:
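rank1_array = np.array([3, 33, 333])
print(type(rank1_array)) # <class 'numpy.ndarray'>
print(rank1_array.shape) # (3,): one axis with 3 elements
print(rank1_array.size) # total number of elements: 3
print(rank1_array.dtype) # integer type chosen by NumPy (platform dependent)
print(rank1_array[0], rank1_array[1], rank1_array[2]) # individual elements
print(rank1_array[:], rank1_array[1:], rank1_array[:2]) # all, from index 1 on, up to (not including) index 2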
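
In this notebook, "rank" means the number of dimensions (NumPy's ndim), not the linear-algebra rank of a matrix. The sketch below also shows that basic slices are views that share memory with the original array; the names tail and independent are illustrative.

```python
import numpy as np

rank1_array = np.array([3, 33, 333])

print(rank1_array.ndim)   # 1: number of dimensions
print(rank1_array[-1])    # 333: negative indices count from the end

# Basic slicing returns a view that shares memory with the original array,
# so writing through the slice also changes rank1_array.
tail = rank1_array[1:]
tail[0] = -1
print(rank1_array)        # now holds 3, -1, 333

# Use .copy() when you need an independent array.
independent = rank1_array[1:].copy()
independent[0] = 99
print(rank1_array[1])     # still -1
```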

Rank 2


In [ ]:
np.ones((10,2)) # 10 rows, 2 columns

In [ ]:
np.zeros((2,10)) # 2 rows, 10 columns

In [ ]:
np.eye(10,10)*3 # 10x10 identity matrix (1s on the diagonal) multiplied by 3
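
A quick check of the shape convention and of what the scaled identity contains. Using np.diag with np.full is one alternative way to build the same matrix, chosen here only for comparison.

```python
import numpy as np

# Shape tuples are always (rows, columns) for 2-D arrays.
print(np.ones((10, 2)).shape)   # (10, 2)
print(np.zeros((2, 10)).shape)  # (2, 10): 2 rows, 10 columns

# A scaled identity matrix; np.diag builds the same thing from its diagonal.
scaled = np.eye(10, 10) * 3
same = np.diag(np.full(10, 3.0))
print(np.array_equal(scaled, same))  # True
```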

In [ ]:
rank2_array = np.array([[11,12,13],[21,22,23],[31,32,33]])
print(type(rank2_array))
print(rank2_array.shape)
print(rank2_array.size)
print(rank2_array.dtype)
print(rank2_array[0], rank2_array[1], rank2_array[2]) # a single index selects a whole row

In [ ]:
print(rank2_array[:]) # print everything in array

In [ ]:
print(rank2_array[1:]) # slice from 2nd row and on

In [ ]:
print(rank2_array[:,0]) # all rows, only the 1st column

In [ ]:
print(rank2_array[:,1]) # all rows, only the 2nd column

In [ ]:
print(rank2_array[:,2]) # all rows, only the 3rd column

In [ ]:
print(rank2_array[0,1]) # i=0, j=1 of the 3x3 matrix we just made
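
A short sketch of two indexing details that often trip people up: arr[i, j] versus arr[i][j], and how an integer index drops an axis while a slice keeps it.

```python
import numpy as np

rank2_array = np.array([[11, 12, 13], [21, 22, 23], [31, 32, 33]])

# arr[i, j] and arr[i][j] give the same element; the first form is one
# indexing operation, the second selects a row and then an element of it.
print(rank2_array[0, 1], rank2_array[0][1])  # 12 12

# An integer index drops that axis; a slice keeps it.
print(rank2_array[:, 0].shape)    # (3,): a 1-D array
print(rank2_array[:, 0:1].shape)  # (3, 1): still a 2-D column
```

The single-bracket form is generally preferred because it avoids creating the intermediate row.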

Rank 3 and beyond!


In [ ]:
np.random.randint(0, 5, (2,5,5)) # shape 2 x 5 x 5: a 3-D array (two 5 x 5 layers)

In [ ]:
np.random.randint(0, 5, (2,5,5)).shape # (2, 5, 5); note that this call draws a new random array
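
Because every call to randint draws new numbers, this sketch stores the 3-D array once (under the illustrative name cube) and then indexes into it, peeling off one axis per index.

```python
import numpy as np

cube = np.random.randint(0, 5, (2, 5, 5))

print(cube.ndim, cube.size)  # 3 50
print(cube[0].shape)         # (5, 5): the first 5 x 5 layer
print(cube[1, 2].shape)      # (5,): row 2 of the second layer
print(cube[1, 2, 3])         # a single element
```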

Reshaping and Slicing Arrays

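Oftentimes, we would like to change the dimensions of an array. One natural way to do this in NumPy is to reshape it: the elements keep their order, read row by row, and are poured into the new shape, whose dimensions must multiply to the same total number of elements. Let's start with a 1-dimensional array of 72 elements to see how things get rearranged.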


In [ ]:
np.arange(72).reshape(3,24) # 3 rows of 24; the first row holds 0 through 23

In [ ]:
np.arange(72).reshape(24,3).T # transpose; same (3, 24) shape as above, but not the same array! beware

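Note that the transpose is simply the .T attribute of an array. But remember, things are not always what they seem. The two examples above have exactly the same shape, (3, 24), yet their elements are arranged differently. reshape fills the new shape row by row, so reshape(3, 24) puts 0 through 23 in the first row, while reshape(24, 3).T puts 0, 3, 6, ..., 69 in the first row. Be careful!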
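
A sketch that makes the difference concrete. It also shows, as an aside, that the transposed version happens to equal a reshape in column-major ("Fortran") order, selected with reshape's order="F" argument; the names a, b and c are illustrative.

```python
import numpy as np

a = np.arange(72).reshape(3, 24)      # fills rows first (C order)
b = np.arange(72).reshape(24, 3).T    # same shape, different layout

print(a.shape == b.shape)    # True: both are (3, 24)
print(a[0, :4])              # [0 1 2 3]
print(b[0, :4])              # [0 3 6 9]
print(np.array_equal(a, b))  # False

# The transposed version matches filling columns first (Fortran order).
c = np.arange(72).reshape(3, 24, order="F")
print(np.array_equal(b, c))  # True
```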


In [ ]:
np.arange(72).reshape(3, 2, -1) # -1 lets NumPy infer the remaining size: 72 / (3 * 2) = 12

In [ ]:
np.arange(72).reshape(3, -1, 12) # here -1 is inferred as 72 / (3 * 12) = 2
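
A small sketch of the -1 rule: only one dimension may be -1, and the other sizes must divide the total number of elements evenly, otherwise reshape raises a ValueError.

```python
import numpy as np

arr = np.arange(72)

print(arr.reshape(3, 2, -1).shape)   # (3, 2, 12): 72 / (3 * 2) = 12
print(arr.reshape(3, -1, 12).shape)  # (3, 2, 12): 72 / (3 * 12) = 2
print(arr.reshape(-1, 8).shape)      # (9, 8)

# The known sizes must divide the total number of elements.
try:
    arr.reshape(5, -1)  # 72 is not divisible by 5
except ValueError as err:
    print("ValueError:", err)
```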

In [ ]:
np.arange(36).reshape(6, 6)

We can even slice along several axes at once, giving a Python-style slice for each axis!


In [ ]:
np.arange(36).reshape(6,6)[2:4,:3] # rows 2 and 3, columns 0 through 2
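
The result of that multi-axis slice, plus a stepped slice on each axis. The name grid is illustrative.

```python
import numpy as np

grid = np.arange(36).reshape(6, 6)

# Rows 2 and 3 (the stop index 4 is excluded), columns 0, 1 and 2.
print(grid[2:4, :3])
# [[12 13 14]
#  [18 19 20]]

# Each axis also accepts a step: every other row and every other column.
print(grid[::2, ::2].shape)  # (3, 3)
```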

Filtering


In [ ]:
unfiltered_arr = np.arange(72).reshape(3, -1, 12)
unfiltered_arr

In [ ]:
condition = unfiltered_arr % 3 == 0 # divisible by 3
condition # a boolean mask with the same shape as unfiltered_arr!

In [ ]:
unfiltered_arr[condition] # boolean indexing returns a new 1-D copy of the matching elements, not a view
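
One way to check whether an indexing result is a view or a copy is np.shares_memory. This sketch rebuilds the same array and mask so it runs on its own; the name selected is illustrative.

```python
import numpy as np

unfiltered_arr = np.arange(72).reshape(3, -1, 12)
condition = unfiltered_arr % 3 == 0

selected = unfiltered_arr[condition]
print(selected.shape)                              # (24,): a flat 1-D result
print(np.shares_memory(selected, unfiltered_arr))  # False: it is a copy

# Changing the copy leaves the original untouched.
selected[:] = -1
print(unfiltered_arr[0, 0, 0])  # still 0
```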

In [ ]:
unfiltered_arr[condition] = 0 # in place: only the elements matching the condition are set to 0
unfiltered_arr
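
If you would rather keep the original array intact, np.where builds a new array instead of assigning in place. The sketch also shows how masks combine with the element-wise operators &, | and ~; the names zeroed and both are illustrative.

```python
import numpy as np

arr = np.arange(72).reshape(3, -1, 12)
condition = arr % 3 == 0

# np.where builds a new array: 0 where the condition holds, the original
# value elsewhere. arr itself is not modified.
zeroed = np.where(condition, 0, arr)
print(arr[0, 0, 3], zeroed[0, 0, 3])  # 3 0

# Masks combine element-wise with & (and), | (or) and ~ (not). The
# parentheses are required because & binds tighter than ==.
both = (arr % 2 == 0) & (arr % 3 == 0)  # divisible by 6
print(np.count_nonzero(both))  # 12
```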

In [ ]:
unfiltered_arr.reshape(-1) # flatten it back to a 1-D array of 72 elements!
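
NumPy offers several ways to flatten an array, and they differ in whether they copy. In this sketch, reshape(-1) and ravel() return a view when the memory layout allows it (as it does for this freshly built array), while flatten() always returns an independent copy.

```python
import numpy as np

arr = np.arange(72).reshape(3, -1, 12)

flat_view = arr.reshape(-1)   # a view when the memory layout allows it
raveled = arr.ravel()         # same behaviour as reshape(-1)
flat_copy = arr.flatten()     # always a new, independent copy

print(flat_view.shape)                   # (72,)
print(np.shares_memory(flat_view, arr))  # True for this contiguous array
print(np.shares_memory(flat_copy, arr))  # False
```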