In [1]:
%matplotlib inline
%pylab inline
NumPy in a nutshell is a multidimensional array library.
Why does one need numpy?
In [2]:
l = list(range(10000))
a = np.arange(10000)
In [3]:
l[:10], a[:10]
Out[3]:
In [4]:
%timeit [x + 1 for x in l]
In [5]:
%timeit a + 1
numpy arraysThere are a number of ways to initialize new numpy arrays, for example from
arange, linspace, etc.For example, to create new vector and matrix arrays from Python lists we can use the numpy.array function.
In [6]:
# one dimensional array, the argument to the array function is a Python list
# will cast to 'best' type as required: int -> double -> complex double
v = np.array([1,2,3,4])
v
Out[6]:
In [7]:
# a 2d image: the argument to the array function is a nested Python list
img = np.array([[1, 2], [3, 4]])
img
Out[7]:
An array has two basic properties:
In [8]:
v.dtype
Out[8]:
In [9]:
v.shape
Out[9]:
In [10]:
img.shape
Out[10]:
Many functions take arguments named dtype or shape (sometimes size) that define how the output looks like:
In [11]:
img = np.array([[1, 2], [3, 4]], dtype=np.complex64)
img
Out[11]:
In [12]:
np.empty((3,3))
Out[12]:
In [13]:
np.zeros((3,3))
Out[13]:
In [14]:
np.ones((3,3), dtype=np.int16)
Out[14]:
In [15]:
# get same type of array
np.empty_like(v)
np.ones_like(v)
np.zeros_like(v)
np.full_like(v, 3) #numpy 1.8
Out[15]:
In [16]:
# create a range
x = np.arange(0, 10, 1) # arguments: start, stop, step, [dtype]
x
Out[16]:
In [17]:
x = np.arange(-1, 1, 0.1)
x
Out[17]:
In [18]:
# using linspace, end points are included by default and you define the number of points instead of the step
np.linspace(0, 10, num=30, endpoint=True)
Out[18]:
In [19]:
np.logspace(0, 10, num=10, base=np.e)
Out[19]:
In [20]:
from numpy import random
In [21]:
# standard normal distributed random numbers
random.normal(loc=-5., scale=5, size=(5,5))
Out[21]:
In [22]:
!head stockholm_td_adj.dat
In [23]:
data = np.genfromtxt('stockholm_td_adj.dat')
data.shape, data.dtype
Out[23]:
In [24]:
!wc -l stockholm_td_adj.dat
In [25]:
data = np.genfromtxt('stockholm_td_adj.dat', names=True)
# structured array
data.shape, data.dtype
Out[25]:
In [26]:
data['year'][:10]
Out[26]:
In [27]:
fig, ax = subplots(figsize=(14,4))
x = data['year'] + data['month'] / 12.0 + data['day'] / 365.
y = data['temperature']
ax.plot(x, y)
ax.axis('tight')
ax.set_title('temperatures in Stockholm')
ax.set_xlabel('year')
ax.set_ylabel('tempature (C)');
In [28]:
vec = np.arange(9)
# create a view of vec with a different shape
imgview = vec.reshape(3, 3)
imgview
Out[28]:
In [29]:
# the ordering of the elements can be defined (Fortran or C)
vec.reshape(3, 3, order='F')
Out[29]:
In [30]:
# the value of a negative entry will be infered from the remaining free entries
vec.reshape(3, -1).shape
Out[30]:
Reshaping will create a view/reference of the original data array. Changing the reshaped array will also change the original array.
In [31]:
imgview[0] = 5
vec
Out[31]:
In [32]:
# regular python indexing [start:stop:step]
vec[3], vec[3::-1]
Out[32]:
The dimensions are separated by commas in the brackets:
In [33]:
imgview[1,0]
Out[33]:
In [34]:
#row
imgview[0], imgview[0,:]
Out[34]:
In [35]:
#column
imgview[:, 0]
# Ellipsis means all free axis, this selects the zero element of the last dimension, regardless of the dimension of imgview
imgview[..., 0]
Out[35]:
In [36]:
# assignment on a indexed array
imgview[-1, :] = 1
imgview
Out[36]:
Compared to Python lists, a slice of an array is no copy, it is a view on the original data copies must be explicit, e.g. with
imgview[0, :].copy()
In [37]:
a = np.ones((5,))
b = np.zeros_like(a)
a, b
Out[37]:
In [38]:
b = a + a
b
Out[38]:
In [39]:
# in place
a /= b
a
Out[39]:
In [40]:
a + 1
Out[40]:
In [41]:
a + np.ones((4,))
Broadcasting applies if the shapes of a set of arrays do not match but the shapes can be made to match by repeating certain dimensions.
The basic rules when broadcasting can be applied are:
The simplest case is adding a zero dimensional array (scalar) to a higher dimensional array:
In [43]:
from IPython.display import Image
Image(filename='images/broadcast_11.png')
Out[43]:
In [44]:
# subtract vector from an image (e.g. remove overscan offset):
img = np.arange(4*3).reshape(4,3)
o = np.array([1, 2, 3])
#dim(4, 3) - dim(3)
img - o
Out[44]:
In [45]:
img = img.reshape(3,4)
o = np.array([1, 2, 3])
#dim(3, 4) - dim(3), trailing dimension does not match
img - o
In [46]:
# solution: add a new empty dimension (size 1)
# dim(3, 4) - dim(3, 1)
img - o[:,np.newaxis]
Out[46]:
In [47]:
v = np.arange(5, 10)
row_indices = [0, 2, 4]
print(v)
print(row_indices, '->', v[row_indices])
In [48]:
img = np.arange(25).reshape(5, 5)
col_indices = [1, 2, -1] # index -1 means the last element
img[row_indices, col_indices]
Out[48]:
In [49]:
img[:, col_indices]
Out[49]:
Functions related with positions of data like sorting, partitioning, min/max often have a arg variant which instead of returning the result, return an index array that when applied to the original array return the result:
In [50]:
v = random.randint(0, 5, size=10)
sortindices = v.argsort()
sortindices, v[sortindices]
Out[50]:
We can also index with boolean masks: If the index array is an Numpy array of with data type bool, then an element is selected (True) or not (False) depending on the value of the index mask at the position each element:
In [51]:
x = arange(0, 10, 0.5)
x
Out[51]:
In [52]:
# & is a bitwise and, which, for booleans, is equivalent to np.logical_and(a, b)
mask = (x >= 5) & (x < 7.5)
mask
Out[52]:
In [53]:
x[mask]
Out[53]:
NumPy also contains a special array class which carries a mask and ignores the masked values for all arithmetic operations:
In [54]:
mx = np.ma.MaskedArray(x, mask=~((5 < x) & (x < 7.5)))
mx
Out[54]:
In [55]:
x.sum(), mx.sum()
Out[55]:
In [56]:
indices = np.where((x > 5) & (x < 7.5))
indices
Out[56]:
It can also be used to conditionally select from two options
In [57]:
a = np.ones_like(x)
np.where((x > 5) & (x < 7.5), a, 5) # note the broadcasting of the second option
Out[57]:
In [57]: