Adapted from Scientific Python: Part 1 (lessons/thw-numpy/numpy.ipynb)
NumPy is a Python package implementing efficient collections of specific types of data (generally numerical), similar to the standard array module (but with many more features). NumPy arrays differ from lists and tuples in that the data is contiguous in memory. A Python list, [0, 1, 2], in contrast, is actually an array of pointers to Python objects representing each number. The contiguous layout allows NumPy arrays to be considerably faster for numerical operations than Python lists/tuples.
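As a small illustration of the contiguous layout described above, the sketch below inspects how an array stores its data. The names py_list and np_arr are made up for this example, and the dtype is fixed to int64 so the byte counts do not depend on the platform.

```python
import numpy as np

# A Python list stores references to separate int objects.
py_list = [0, 1, 2]

# A NumPy array stores the raw values side by side in one buffer.
# dtype is set explicitly (an assumption of this sketch) so the sizes are predictable.
np_arr = np.array(py_list, dtype=np.int64)

print(np_arr.flags['C_CONTIGUOUS'])  # True: the data sits in one contiguous block
print(np_arr.itemsize)               # 8 bytes per int64 element
print(np_arr.nbytes)                 # 24 bytes of data in total (3 elements * 8 bytes)
```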
In [ ]:
# by convention, we typically import numpy as the alias np
import numpy as np
Let's see what numpy can do.
In [ ]:
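# In IPython or Jupyter, uncomment the next line to display help for the package
#np?
# Type "np." and press Tab to list the names the package provides
#np.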
We can try out some of those constants and functions:
In [ ]:
print(np.sqrt(4))
print(np.pi)  # a constant
print(np.sin(np.pi))  # very close to 0, but not exactly 0 because of floating-point rounding
"That's great," you're thinking. "math already has all of those functions and constants." But that's not the real beauty of NumPy.
NumPy arrays are collections of things, all of which must be the same type, that work similarly to lists (as we've described them so far).
Arrays can be created from existing collections such as lists, or instantiated "from scratch" in a few useful ways. Creating a NumPy array from an existing collection is as simple as passing a sequence to np.array:
In [ ]:
# The ints are upcast to float so that every element has the same type
arr1 = np.array([1, 2.3, 4])
# Type of the array object itself
print(type(arr1))
# Type of the data stored inside the array (dtype = data type)
print(arr1.dtype)
Choose your datatype based on how large the largest values could be, and how much memory you expect to use.
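The trade-off between value range and memory can be checked directly. The following is a minimal sketch; the array names small, large and as_float are made up for this example, and the sizes are arbitrary.

```python
import numpy as np

# Range of values that two integer types can hold
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)   # -128 127
print(np.iinfo(np.int64).max)                          # 9223372036854775807

# The same number of elements costs 8x more memory as int64 than as int8
small = np.zeros(1000, dtype=np.int8)
large = np.zeros(1000, dtype=np.int64)
print(small.nbytes)  # 1000
print(large.nbytes)  # 8000

# astype returns a converted copy; the original array keeps its dtype
as_float = small.astype(np.float32)
print(as_float.dtype)  # float32
print(small.dtype)     # int8
```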
In [ ]:
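print('2 rows, 3 columns of zeros:\n', np.zeros((2, 3)))
print('4x4 identity matrix:\n', np.identity(4))
# Build a list of squares with a plain Python loop
squared = []
for x in range(5):
    squared.append(x**2)
print(squared)
# Convert the list to an array, then make a zero-filled array with the same shape and dtype
a = np.array(squared)
b = np.zeros_like(a)
print('a:\n', a)
print('b:\n', b)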
These arrays have attributes, like .ndim and .shape, that tell us about the number and length of the dimensions.
The dimension of an array is the number of indices needed to select an element. Put another way, if the array is seen as a function on the set of possible index combinations, its dimension is the dimension of the space of which that set is a discrete subset. Thus a one-dimensional array is a list of data, a two-dimensional array a rectangle of data, a three-dimensional array a block of data, etc.
The shape is the number of elements in each dimension of data.
In [ ]:
c = np.ones((15, 30))
print('number of dimensions of c:', c.ndim)
print('length of c in each dimension:', c.shape)
x = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[0, 0, 0], [0, 0, 0], [0, 0, 0]]])
print('number of dimensions of x:', x.ndim)
print('length of x in each dimension:', x.shape)
NumPy has its own range() function, np.arange() (stands for array-range), that is more efficient for building larger arrays. It functions in much the same way as range().
NumPy also has linspace() and logspace(), which generate samples between a start-point and an end-point that are equally spaced on a linear or a logarithmic scale, respectively. Find out more with np.linspace?.
In [ ]:
print("Arange")
print(np.arange(5))
print("Linspace")
# Args: start, stop, number of elements
print(np.linspace(5, 10, 5))
print("Logspace")
# For logspace, start and stop are exponents of the base, which is 10 by default
print(np.logspace(0, 1, 5))
print(np.logspace(0, 1, 5, base=2))
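To make the comparison with range() concrete, here is a short sketch of arange with an explicit start, stop and step, next to linspace. The variable names are made up for this example.

```python
import numpy as np

# arange takes start, stop (exclusive) and step, just like range()
evens = np.arange(2, 10, 2)
print(evens)                  # [2 4 6 8]
print(list(range(2, 10, 2)))  # [2, 4, 6, 8]

# Unlike range(), arange also accepts a non-integer step
halves = np.arange(0, 2, 0.5)
print(halves)                 # four values: 0, 0.5, 1, 1.5

# linspace includes the end-point by default, whereas arange never includes stop
fifths = np.linspace(0, 1, 5)
print(fifths)                 # five values from 0 to 1 inclusive
```

A rule of thumb that follows from this: reach for arange when the step size matters and for linspace when the number of samples matters.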
In [ ]:
np.loadtxt?
The simplest way to use it is to just give it a file name. By default, your data will be loaded as floats, with whitespace as the delimiter:
my_arr = np.loadtxt('myfile.txt')
More likely you will need to use some of the keyword arguments, like dtype, delimiter, skiprows, or usecols.
Docs available here: http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html
my_array = np.loadtxt('myfile.csv', usecols=[1,2,3,4,5,6,7,8,9,10,11,12], delimiter=',')
In [ ]:
np.loadtxt('simple.csv', delimiter=',')
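The keyword arguments can be tried without any file on disk, since np.loadtxt also accepts file-like objects. The sketch below uses an in-memory string; the CSV contents and column names are invented purely for illustration.

```python
import io
import numpy as np

# Hypothetical CSV contents, held in memory instead of a file on disk
csv_text = """id,height,weight
1,1.5,10
2,2.5,20
3,3.5,30
"""

data = np.loadtxt(
    io.StringIO(csv_text),
    delimiter=',',   # fields are separated by commas
    skiprows=1,      # skip the header line
    usecols=[1, 2],  # keep only the height and weight columns
)
print(data.shape)  # (3, 2)
print(data.dtype)  # float64
print(data)
```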
In [ ]:
A = np.arange(5)
B = np.arange(5, 10)
print('A', A)
print('B', B)
print('A+B', A+B)
print('B-A', B-A)
print('A*B', A*B)
In addition, if one of the arguments is a scalar, that value will be applied to all the elements of the array.
scalar - a quantity possessing only magnitude. (In this case we mean a single number, either an int or a float.)
In [ ]:
A = np.arange(5)
print('A', A)
print('A+10', A+10)
print('2 * A', 2*A)
print('A ** 2', A**2)
You can use arrays as vectors and matrices in linear algebra operations.
Specifically, you can perform matrix/vector multiplication between arrays by using the .dot method or the np.dot function.
dot product - the dot product of two vectors is the sum of the products of their corresponding elements; geometrically, it is based on the projection of one vector onto another.
In [ ]:
print(A.dot(B))
print(np.dot(A, B))
Older tutorials suggest the np.matrix object for serious linear algebra, but the NumPy documentation no longer recommends it. Regular np.array objects, combined with np.dot or the @ operator, are the better choice.
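As a quick check of what the dot product computes, the sketch below compares it with the sum of element-wise products and then applies the @ operator to ordinary two-dimensional arrays. M and v are made-up example values.

```python
import numpy as np

A = np.arange(5)      # [0 1 2 3 4]
B = np.arange(5, 10)  # [5 6 7 8 9]

# The dot product equals the sum of the element-wise products
print(np.sum(A * B))  # 80
print(A @ B)          # 80, the @ operator gives the same result as np.dot

# Regular 2-D arrays work as matrices with @
M = np.array([[1, 2],
              [3, 4]])
v = np.array([1, 1])
print(M @ v)          # [3 7]
print(M @ M)          # [[ 7 10]
                      #  [15 22]]
```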
In [ ]:
# Numpy arrays: multiplying by 2 doubles every element
A = np.arange(5)*2
print(A)
# Lists: multiplying by 2 repeats the list twice
B = list(range(5))*2
print(B)
Similarly, when adding two numpy arrays together, we get the vector sum back, whereas when adding two lists together, we get the concatenation back.
In [ ]:
# Numpy arrays
A = np.arange(5) + np.arange(5)
print(A)
# Lists
B = list(range(5)) + list(range(5))
print(B)
Much like the basic arithmetic operations we discussed above, comparison operations are performed element-wise. That is, rather than returning a single boolean, comparison operators compare each element in both arrays pairwise, and return an array of booleans (if the sizes of the input arrays are incompatible, older NumPy versions simply return False, while recent versions raise an error instead). For example:
In [ ]:
arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([1, 1, 3, 3, 5])
print(arr1 == arr2)
c = (arr1 == arr2)
print(type(c))
print(c.dtype)
You can get a portion of an array by using a boolean array as the index. The result contains only the elements at the positions where the boolean array is True.
In [ ]:
print(arr1)
print(c)
print(arr1[c])
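In practice the boolean array is often built on the spot from a condition. The sketch below filters and then modifies an array that way; the name mask is made up for this example.

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4, 5])

# Comparing against a scalar gives one boolean per element
mask = arr1 > 2
print(mask)                  # [False False  True  True  True]
print(arr1[mask])            # [3 4 5]

# The condition can also be written directly inside the brackets
print(arr1[arr1 % 2 == 0])   # [2 4]

# Assigning through a boolean index changes the selected elements in place
arr1[arr1 > 3] = 0
print(arr1)                  # [1 2 3 0 0]
```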
Note: You can use the methods .any() and .all() or the functions np.any and np.all to return a single boolean indicating whether any or all values in the array are True, respectively.
In [ ]:
print(np.all(c))
print(c.all())
print(c.any())
In order to be as efficient as possible, numpy uses "views" instead of copies wherever possible. That is, numpy arrays derived from another base array generally refer to the exact same data as the base array. The consequence of this is that modification of these derived arrays will also modify the base array. Not every indexing operation gives a view, however: indexing an array with an array of indices or with an array of booleans returns a copy.
Specifically, slices of arrays are always views, unlike slices of lists or tuples, which are always copies.
In [ ]:
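# NumPy: the slice B is a view, so writing to it also changes A
A = np.arange(5)
B = A[0:1]
B[0] = 42
print(A)
# List: the slice B is a copy, so A is left unchanged
A = list(range(5))
B = A[0:1]
B[0] = 42
print(A)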
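Whether two arrays refer to the same data can be tested with np.shares_memory. The following sketch compares a basic slice, an explicit copy, and the two array-based indexing styles; all variable names are made up for this example.

```python
import numpy as np

base = np.arange(5)

# A basic slice is a view onto the same data
view = base[1:3]
print(np.shares_memory(base, view))    # True

# .copy() gives independent data, so the base array is not affected
independent = base[1:3].copy()
independent[0] = 99
print(base)                            # [0 1 2 3 4]

# Indexing with an array of indices or an array of booleans produces a copy
fancy = base[[1, 2]]
masked = base[base > 2]
print(np.shares_memory(base, fancy))   # False
print(np.shares_memory(base, masked))  # False
```

Calling .copy() on a slice is the usual way to keep a derived array from changing its base.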
In [ ]:
a = np.array([1,2,3])
print(a[0:2])
How can we index if the array has more than one dimension?
In [ ]:
c = np.random.rand(3, 3)
print(c)
# Rows 1 and 2, columns 0 and 1
print(c[1:3, 0:2])
print(a)
# Overwrite the first row of c with the values of a
c[0, :] = a
print(c)
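Because the random values above differ on every run, a deterministic array makes the indexing rules easier to follow. The sketch below uses np.arange with reshape to build a small grid; the name grid and its 3x4 size are made up for this example.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)
print(grid)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(grid[1, 2])      # 6: a single element (row 1, column 2)
print(grid[0])         # [0 1 2 3]: the whole first row
print(grid[:, 1])      # [1 5 9]: the whole second column
print(grid[1:3, 0:2])  # rows 1-2 and columns 0-1:
                       # [[4 5]
                       #  [8 9]]
```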