NumPy provides fast, low-level features for manipulating arrays of data (the core is implemented in C).
While it has some relatively advanced features such as linear algebra routines, in many cases Pandas
provides a more convenient high-level interface for the same tasks (and more).
If you just want a quick overview, the following cheatsheet provides one:
In [2]:
# This is a regular Python list (in Python 3, range() is lazy, so wrap it in list())
list(range(1, 4))
Out[2]:
In [3]:
# If you multiply or add to it, it extends the list
In [4]:
a = list(range(1, 10))  # a range object cannot be multiplied in Python 3
a * 2
Out[4]:
In [5]:
a = list(range(1, 11))  # a range object cannot be concatenated with a list in Python 3
a + [ 11 ]
Out[5]:
In [6]:
# Compare this to np.array:
import numpy as np
np.array(range(1,10))
Out[6]:
In [7]:
# Multiplication is defined as multiplying each element in the array
a = np.array(range(1, 10))
a * 2
Out[7]:
In [8]:
a + 5 # Addition works as well: this adds 5 to each element (for a regular Python list, a + 5 raises a TypeError)
Out[8]:
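The contrast between list and array arithmetic can be checked directly. The following is a minimal sketch; the variable names are illustrative and not part of the original notebook.

```python
import numpy as np

py_list = list(range(1, 4))      # [1, 2, 3]
np_arr = np.array(py_list)

repeated = py_list * 2           # list repetition: [1, 2, 3, 1, 2, 3]
doubled = np_arr * 2             # element-wise multiplication
shifted = np_arr + 5             # element-wise addition

try:
    py_list + 5                  # a list can only be concatenated with another list
    list_add_failed = False
except TypeError:
    list_add_failed = True

print(repeated, doubled, shifted, list_add_failed)
```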
`ndarray` is actually a multi-dimensional array
In [9]:
np.array([[1,2],[3,4],[5,6]])
Out[9]:
In [10]:
a = np.array([[1,2],[3,4],[5,6]])
a.shape, a.dtype, a.size, a.ndim # shape -> dimension sizes, dtype -> datatype, size -> total number of elems, ndim -> number of dimensions
Out[10]:
In [11]:
# You can use comma-separated indexing like so:
a[1,1] # same as a[1][1]
Out[11]:
In [12]:
# Note that 1,1 is really a tuple (the parentheses are just omitted), so this works too:
indices = (1,1)
a[indices]
Out[12]:
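As a small sketch of the indexing forms above (variable names are illustrative): the comma form and the tuple form are the same operation, while chained indexing reaches the same element in two steps. A list used as an index behaves differently from a tuple, which is an easy mistake to make.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

by_comma = a[1, 1]       # row 1, column 1 in a single indexing operation
by_tuple = a[(1, 1)]     # identical: the comma form builds exactly this tuple
chained = a[1][1]        # two steps: a[1] yields row 1, then [1] indexes that row

rows_by_list = a[[1, 1]]  # a list is not a tuple: this selects row 1 twice

print(by_comma, by_tuple, chained)
print(rows_by_list)
```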
In [13]:
# Note that regular python doesn't support this
mylist = [[1,2],[3,4]]
# mylist[1,1] # error!
In [14]:
# As always, use ? to get details
a?
In [15]:
np.zeros(5)
Out[15]:
In [16]:
np.ones(10)
Out[16]:
In [17]:
np.empty(7) # Empty returns uninitialized garbage values (not zeroes!)
Out[17]:
In [18]:
np.identity(5) # identity matrix (ones on the diagonal, zeros elsewhere)
Out[18]:
In [19]:
np.arange(11) # same as np.array(range(11))
Out[19]:
In [20]:
np.array(range(11))
Out[20]:
In [21]:
np.array([1,2,3], dtype='float64')
Out[21]:
In [22]:
# Show all available types (np.sctypes exists in NumPy 1.x; it was removed in NumPy 2.0)
np.sctypes
Out[22]:
In [23]:
# Consider strings
a = np.array(['12', '999', '432536'])
a.dtype
Out[23]:
The datatype `<U6` (`S6` for byte strings, as under Python 2) stands for a fixed-width string of length 6, because the longest string in the array has length 6. Compare this to:
In [24]:
np.array(['123', '21345312312'])
Out[24]:
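One practical consequence of fixed-width string dtypes is that assigning a longer string into an existing array truncates it silently. The sketch below illustrates this under Python 3, where string arrays get a Unicode (`U`) dtype; the array contents are made up for illustration.

```python
import numpy as np

short = np.array(['12', '999'])   # longest element has 3 characters
original_dtype = short.dtype      # fixed-width Unicode, 3 characters

short[0] = 'abcdef'               # does not fit into 3 characters
truncated = str(short[0])         # silently cut down to the first 3 characters

wide = short.astype('U10')        # a copy with room for 10 characters
wide[0] = 'abcdef'                # now the full string is stored

print(original_dtype, truncated, wide[0])
```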
In [25]:
# You can also cast between types
a.astype(np.int32) # This copies the data into a new array, it does not change the array itself!
Out[25]:
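That `astype` returns a copy can be verified with a short check. This is a sketch with illustrative variable names:

```python
import numpy as np

strings = np.array(['12', '999', '432536'])
as_ints = strings.astype(np.int32)   # parses each string into a new int32 array

as_ints[0] = -1                      # modify the converted copy only

print(strings)   # the original string array is unchanged
print(as_ints)
```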
In [26]:
a = np.array(range(10, 20))
a[3:]
Out[26]:
In [27]:
a[4:6]
Out[27]:
Slices in NumPy are views on the original array, which means that if you modify a slice, the original array changes as well.
In [28]:
a[3:6] = 33
a
Out[28]:
Compare this to regular python:
In [29]:
b = list(range(1, 10))
# b[2:7] = 10 # this raises a TypeError: a list slice can only be assigned an iterable
In [30]:
# Copies need to be explicit in numpy
b = a[3:6].copy()
b[:] = 22 # change all values to 22
b, a # print b and a, see that a is not modified
Out[30]:
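The difference between a view and an explicit copy can be made visible with `np.shares_memory`. The following is a minimal sketch; the variable names are illustrative.

```python
import numpy as np

a = np.arange(10, 20)

view = a[3:6]             # basic slicing returns a view onto a's data
copy = a[3:6].copy()      # an explicit copy owns its own data

view[:] = 33              # writes through to a
copy[:] = 22              # leaves a untouched

view_shares = np.shares_memory(a, view)
copy_shares = np.shares_memory(a, copy)

print(a)
print(view_shares, copy_shares)
```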
In [31]:
# You can also slice multi-dimensionally
c = np.array([[1,2,3], [4,5,6], [7,8,9]])
c[1:,:1] # Keep only the last 2 rows and, from each of them, only the first element
Out[31]:
In [32]:
# Note how this differs from c[1:][:1], which really performs 2 operations:
# first, c[1:] keeps the last 2 rows, giving array([[4, 5, 6],[7, 8, 9]]) (a view, not a copy).
# Then [:1] slices that intermediate result along the first axis again,
# keeping only its first row.
c[1:][:1]
Out[32]:
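Comparing the shapes of the two results makes the difference concrete. A minimal sketch, with illustrative variable names:

```python
import numpy as np

c = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

both_axes = c[1:, :1]   # rows 1 and 2, column 0 only
chained = c[1:][:1]     # rows 1 and 2, then the first of those rows

print(both_axes, both_axes.shape)
print(chained, chained.shape)
```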
This picture explains NumPy's array slicing pretty well.
In [33]:
# A boolean mask is just a boolean array
mask = np.array([ True, False, True ])
mask
Out[33]:
In [34]:
# To apply the mask against a target, just pass it like an index.
# The result is an array with the elements from 'target' that had True on their corresponding index in 'mask'.
target = np.array([7,8,9])
target[mask]
Out[34]:
In [35]:
# This works for multi-dimensional arrays too, but the result is always a one-dimensional array
# Also, make sure that the shapes of your target and mask arrays match
target2 = np.array([['a','b','c'], ['d','e','f'],['g','h','i']])
mask2 = np.array([[False,True,False], [True, True, False], [True, False, True]])
target2[mask2]
Out[35]:
The easiest way to create a boolean mask is to just create an array with booleans in it. However, you can also create boolean masks by applying a boolean expression to an existing array.
In [36]:
numbers = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
numbers > 5
Out[36]:
In [37]:
numbers % 2 == 0 # Even numbers
Out[37]:
Strings work too!
In [38]:
names = np.array(["John", "Mary", "Joe", "Jane", "Marc", "Jorge", "Adele" ])
In [39]:
names == "Joe"
Out[39]:
You can combine filters using the boolean arithmetic operators `|` and `&`. Note that you have to put the individual boolean expressions between parentheses, because `|` and `&` bind more tightly than comparison operators such as `==`.
In [40]:
(names == "Joe") | (names == "Mary")
Out[40]:
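The need for parentheses can be demonstrated with a numeric array. The sketch below uses illustrative names; the unparenthesized expression fails because Python parses it as a chained comparison, which forces NumPy to evaluate the truth value of a whole array.

```python
import numpy as np

numbers = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

big_and_even = (numbers > 5) & (numbers % 2 == 0)   # element-wise AND
small_or_big = (numbers < 3) | (numbers > 7)        # element-wise OR

try:
    # Parsed as: numbers > (5 & (numbers % 2)) == 0, a chained comparison
    numbers > 5 & numbers % 2 == 0
    unparenthesized_ok = True
except ValueError:
    unparenthesized_ok = False

print(numbers[big_and_even], numbers[small_or_big], unparenthesized_ok)
```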
Once you have a boolean mask, you can apply it to any array of the same length as the mask. This is often useful if you want to select certain values in an array like so:
In [41]:
names[names == "Joe"], numbers[numbers > 5]
Out[41]:
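Because a mask only needs to match in length, a mask built from one array can select from another. In the sketch below, `ages` is hypothetical data invented for illustration (one age per name); it is not part of the original notebook.

```python
import numpy as np

names = np.array(["John", "Mary", "Joe", "Jane", "Marc", "Jorge", "Adele"])
ages = np.array([31, 45, 27, 38, 52, 19, 64])   # hypothetical, one entry per name

over_40 = ages > 40              # mask derived from ages
names_over_40 = names[over_40]   # applied to names

try:
    names[np.array([True, False])]   # mask length 2 does not match length 7
    mismatch_allowed = True
except IndexError:
    mismatch_allowed = False

print(names_over_40, mismatch_allowed)
```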
In [42]:
numbers = np.array([-1, -9, 18.2, 3, 4.3, 0, 5.3, -12.2])
numbers
Out[42]:
In [43]:
np.sum(numbers), np.mean(numbers)
Out[43]:
In [44]:
np.square(numbers)
Out[44]:
In [45]:
np.abs(numbers)
Out[45]:
In [46]:
np.sqrt(np.abs(numbers)) # Can't take sqrt of negative number, so let's get the abs values first
Out[46]:
In [47]:
np.max(numbers), np.min(numbers)
Out[47]:
In [48]:
np.ceil(numbers), np.floor(numbers)
Out[48]:
The boolean expressions that create boolean masks (see the previous section) can also be written explicitly as function calls:
In [49]:
np.greater(numbers, 3)
Out[49]:
In [50]:
# combining with boolean arithmetic
np.logical_or(np.less_equal(numbers, 4), np.greater(numbers, 0))
Out[50]:
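The function forms and the operator forms produce the same masks, which the following sketch checks with `np.array_equal`. The `between` mask is an extra illustration using `np.logical_and`.

```python
import numpy as np

numbers = np.array([-1, -9, 18.2, 3, 4.3, 0, 5.3, -12.2])

same_greater = np.array_equal(np.greater(numbers, 3), numbers > 3)
same_or = np.array_equal(
    np.logical_or(np.less_equal(numbers, 4), np.greater(numbers, 0)),
    (numbers <= 4) | (numbers > 0),
)

# Values strictly greater than 0 and at most 4
between = np.logical_and(np.greater(numbers, 0), np.less_equal(numbers, 4))

print(same_greater, same_or, numbers[between])
```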
In [51]:
np.sort(numbers)
Out[51]:
In [52]:
np.unique(np.array([1, 2, 4, 2, 5, 1]))
Out[52]:
Some of these operations are also available directly as methods on the array:
In [53]:
numbers.sum(), numbers.mean(), numbers.min(), numbers.max()
Out[53]:
In [54]:
np.save("/tmp/myarray", np.arange(10))
In [55]:
# The .npy extension is automatically added
!cat /tmp/myarray.npy
In [56]:
np.load("/tmp/myarray.npy") # You DO need to specify the .npy extension when loading
Out[56]:
You can also save several arrays into a single zip archive using `savez` and read them back with `load`.
In [57]:
np.savez("/tmp/myarray2", a=np.arange(2000))
In [58]:
np.load("/tmp/myarray2.npz")['a'] # Loading from an npz file is lazy, you need to specify which array to load
Out[58]:
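An `.npz` archive can hold several named arrays. The sketch below writes to a temporary directory instead of `/tmp` so it runs on any platform; the keyword names `small` and `big` are arbitrary choices for illustration.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "arrays")      # savez appends ".npz"
    np.savez(path, small=np.arange(5), big=np.arange(2000))

    # np.load returns an archive object; arrays are read when accessed by name
    with np.load(path + ".npz") as archive:
        stored_names = sorted(archive.files)
        small = archive["small"]
        big_len = len(archive["big"])

print(stored_names, small, big_len)
```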
You can also load delimited text files using `loadtxt`.
In [59]:
!echo "1,2,3,4" > /tmp/numpytxtsample.txt
!cat /tmp/numpytxtsample.txt
In [60]:
np.loadtxt("/tmp/numpytxtsample.txt", delimiter=",")
Out[60]:
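`loadtxt` also accepts file-like objects, which makes it easy to try without touching the file system. The sketch below parses a small in-memory sample (made up for illustration) and shows the `dtype` parameter; by default the values are read as floats.

```python
import io

import numpy as np

text = io.StringIO("1,2,3,4\n5,6,7,8\n")

table = np.loadtxt(text, delimiter=",")                # floats by default
text.seek(0)                                           # rewind to read again
as_ints = np.loadtxt(text, delimiter=",", dtype=int)   # request integers

print(table.shape, table.dtype)
print(as_ints)
```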
In [61]:
x = np.array([[1,2,3],[4,5,6], [7,8,9]])
y = np.array([[9,8,7],[6,5,4],[3,2,1]])
x,y
Out[61]:
In [62]:
# Matrix multiplication
np.dot(x,y) # same as: x.dot(y)
Out[62]:
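It is easy to confuse matrix multiplication with element-wise multiplication. The sketch below contrasts `np.dot`, the equivalent `@` operator (available since Python 3.5), and `*` on the same two matrices.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
y = np.array([[9, 8, 7], [6, 5, 4], [3, 2, 1]])

matmul = x @ y        # matrix product, same as np.dot(x, y) for 2D arrays
elementwise = x * y   # multiplies matching elements, not a matrix product

print(matmul)
print(elementwise)
```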
In [63]:
# The numpy.linalg package has a bunch of extra linear algebra functions
# For example, the determinant (https://en.wikipedia.org/wiki/Determinant)
from numpy.linalg import det
det(x)
Out[63]:
Other commonly used linear algebra functions (`diag`, `dot`, and `trace` live in the top-level `numpy` namespace; the rest are in `numpy.linalg`):

Function | Description |
---|---|
`diag` | Return diagonal of matrix as 1D array |
`dot` | Matrix multiplication |
`trace` | Sum of diagonal elements |
`det` | Determinant |
`eig` | Eigenvalues and eigenvectors |
`inv` | Inverse of square matrix |
`qr` | QR Decomposition |
`svd` | Singular Value Decomposition (SVD) |
`solve` | Solve linear system Ax=b for x, where A is a square matrix |
`lstsq` | Compute the least-squares solution to Ax=b |
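
As a short sketch of a few of these functions working together, the example below solves a small linear system and checks the result. The matrix `A` and vector `b` are hypothetical values chosen for illustration. It also shows that the matrix `x` used earlier is singular, so its determinant is zero up to floating-point error and it cannot be inverted reliably.

```python
import numpy as np
from numpy.linalg import det, inv, solve

A = np.array([[2.0, 1.0], [1.0, 3.0]])   # hypothetical non-singular matrix
b = np.array([3.0, 5.0])

solution = solve(A, b)                   # solves A @ solution = b
via_inverse = inv(A) @ b                 # same result, but solve is usually preferred

# The 3x3 matrix from the earlier cells has linearly dependent rows
singular = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
singular_det = det(singular)             # zero up to rounding error

print(solution, via_inverse, singular_det)
```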