In [1]:
%matplotlib inline
%pylab inline
NumPy in a nutshell is a multidimensional array library.
Why does one need numpy?
In [2]:
l = list(range(10000))
a = np.arange(10000)
In [3]:
l[:10], a[:10]
Out[3]:
In [4]:
%timeit [x + 1 for x in l]
In [5]:
%timeit a + 1
numpy
arraysThere are a number of ways to initialize new numpy arrays, for example from
arange
, linspace
, etc.For example, to create new vector and matrix arrays from Python lists we can use the numpy.array
function.
In [6]:
# one dimensional array, the argument to the array function is a Python list
# will cast to 'best' type as required: int -> double -> complex double
v = np.array([1,2,3,4])
v
Out[6]:
In [7]:
# a 2d image: the argument to the array function is a nested Python list
img = np.array([[1, 2], [3, 4]])
img
Out[7]:
An array has two basic properties:
In [8]:
v.dtype
Out[8]:
In [9]:
v.shape
Out[9]:
In [10]:
img.shape
Out[10]:
Many functions take arguments named dtype
or shape
(sometimes size
) that define how the output looks like:
In [11]:
img = np.array([[1, 2], [3, 4]], dtype=np.complex64)
img
Out[11]:
In [12]:
np.empty((3,3))
Out[12]:
In [13]:
np.zeros((3,3))
Out[13]:
In [14]:
np.ones((3,3), dtype=np.int16)
Out[14]:
In [15]:
# get same type of array
np.empty_like(v)
np.ones_like(v)
np.zeros_like(v)
np.full_like(v, 3) #numpy 1.8
Out[15]:
In [16]:
# create a range
x = np.arange(0, 10, 1) # arguments: start, stop, step, [dtype]
x
Out[16]:
In [17]:
x = np.arange(-1, 1, 0.1)
x
Out[17]:
In [18]:
# using linspace, end points are included by default and you define the number of points instead of the step
np.linspace(0, 10, num=30, endpoint=True)
Out[18]:
In [19]:
np.logspace(0, 10, num=10, base=np.e)
Out[19]:
In [20]:
from numpy import random
In [21]:
# standard normal distributed random numbers
random.normal(loc=-5., scale=5, size=(5,5))
Out[21]:
In [22]:
!head stockholm_td_adj.dat
In [23]:
data = np.genfromtxt('stockholm_td_adj.dat')
data.shape, data.dtype
Out[23]:
In [24]:
!wc -l stockholm_td_adj.dat
In [25]:
data = np.genfromtxt('stockholm_td_adj.dat', names=True)
# structured array
data.shape, data.dtype
Out[25]:
In [26]:
data['year'][:10]
Out[26]:
In [27]:
fig, ax = subplots(figsize=(14,4))
x = data['year'] + data['month'] / 12.0 + data['day'] / 365.
y = data['temperature']
ax.plot(x, y)
ax.axis('tight')
ax.set_title('temperatures in Stockholm')
ax.set_xlabel('year')
ax.set_ylabel('tempature (C)');
In [28]:
vec = np.arange(9)
# create a view of vec with a different shape
imgview = vec.reshape(3, 3)
imgview
Out[28]:
In [29]:
# the ordering of the elements can be defined (Fortran or C)
vec.reshape(3, 3, order='F')
Out[29]:
In [30]:
# the value of a negative entry will be infered from the remaining free entries
vec.reshape(3, -1).shape
Out[30]:
Reshaping will create a view/reference of the original data array. Changing the reshaped array will also change the original array.
In [31]:
imgview[0] = 5
vec
Out[31]:
In [32]:
# regular python indexing [start:stop:step]
vec[3], vec[3::-1]
Out[32]:
The dimensions are separated by commas in the brackets:
In [33]:
imgview[1,0]
Out[33]:
In [34]:
#row
imgview[0], imgview[0,:]
Out[34]:
In [35]:
#column
imgview[:, 0]
# Ellipsis means all free axis, this selects the zero element of the last dimension, regardless of the dimension of imgview
imgview[..., 0]
Out[35]:
In [36]:
# assignment on a indexed array
imgview[-1, :] = 1
imgview
Out[36]:
Compared to Python lists, a slice of an array is no copy, it is a view on the original data copies must be explicit, e.g. with
imgview[0, :].copy()
In [37]:
a = np.ones((5,))
b = np.zeros_like(a)
a, b
Out[37]:
In [38]:
b = a + a
b
Out[38]:
In [39]:
# in place
a /= b
a
Out[39]:
In [40]:
a + 1
Out[40]:
In [41]:
a + np.ones((4,))
Broadcasting applies if the shapes of a set of arrays do not match but the shapes can be made to match by repeating certain dimensions.
The basic rules when broadcasting can be applied are:
The simplest case is adding a zero dimensional array (scalar) to a higher dimensional array:
In [43]:
from IPython.display import Image
Image(filename='images/broadcast_11.png')
Out[43]:
In [44]:
# subtract vector from an image (e.g. remove overscan offset):
img = np.arange(4*3).reshape(4,3)
o = np.array([1, 2, 3])
#dim(4, 3) - dim(3)
img - o
Out[44]:
In [45]:
img = img.reshape(3,4)
o = np.array([1, 2, 3])
#dim(3, 4) - dim(3), trailing dimension does not match
img - o
In [46]:
# solution: add a new empty dimension (size 1)
# dim(3, 4) - dim(3, 1)
img - o[:,np.newaxis]
Out[46]:
In [47]:
v = np.arange(5, 10)
row_indices = [0, 2, 4]
print(v)
print(row_indices, '->', v[row_indices])
In [48]:
img = np.arange(25).reshape(5, 5)
col_indices = [1, 2, -1] # index -1 means the last element
img[row_indices, col_indices]
Out[48]:
In [49]:
img[:, col_indices]
Out[49]:
Functions related with positions of data like sorting, partitioning, min/max often have a arg variant which instead of returning the result, return an index array that when applied to the original array return the result:
In [50]:
v = random.randint(0, 5, size=10)
sortindices = v.argsort()
sortindices, v[sortindices]
Out[50]:
We can also index with boolean masks: If the index array is an Numpy array of with data type bool
, then an element is selected (True) or not (False) depending on the value of the index mask at the position each element:
In [51]:
x = arange(0, 10, 0.5)
x
Out[51]:
In [52]:
# & is a bitwise and, which, for booleans, is equivalent to np.logical_and(a, b)
mask = (x >= 5) & (x < 7.5)
mask
Out[52]:
In [53]:
x[mask]
Out[53]:
NumPy also contains a special array class which carries a mask and ignores the masked values for all arithmetic operations:
In [54]:
mx = np.ma.MaskedArray(x, mask=~((5 < x) & (x < 7.5)))
mx
Out[54]:
In [55]:
x.sum(), mx.sum()
Out[55]:
In [56]:
indices = np.where((x > 5) & (x < 7.5))
indices
Out[56]:
It can also be used to conditionally select from two options
In [57]:
a = np.ones_like(x)
np.where((x > 5) & (x < 7.5), a, 5) # note the broadcasting of the second option
Out[57]:
In [57]: