In [ ]:
%matplotlib inline
%pylab inline
In [ ]:
d = np.random.randint(0, 100, size=(10,10))
In [ ]:
np.sort(d)
In [ ]:
np.argsort(d)
In [ ]:
d[np.arange(d.shape[0])[:,np.newaxis], np.argsort(d)]
Lexographical sorting
In [89]:
x = np.ones((5,2))
x[1:3,0] = 0
x[:,1] = np.arange(5)
x
Out[89]:
In [90]:
# create view as a structured array (requires x to be continous)
v = x.ravel().view(dtype=[('a', x.dtype), ('b', x.dtype)])
# inplace sort will change x
v.sort(order=('a', 'b'))
x
Out[90]:
In [91]:
# alternative, lexsort
x = np.ones((5,2)); x[1:3,0] = 0; x[:,1] = np.arange(5)
x[np.lexsort((x[:,1], x[:,0]))]
Out[91]:
With NumPy 1.8 order statistics can be retrieved with partition which is faster than applying a full sort
In [92]:
np.random.shuffle(d.ravel())
np.partition(d, (2, 8))
In [ ]:
data = np.genfromtxt('stockholm_td_adj.dat', names=True)
temp = data['temperature']
year = data['year']
np.mean(temp) # or data['temperature'].mean()
In [ ]:
data['temperature'].mean()
In [ ]:
np.std(temp), np.var(temp)
In [ ]:
temp.min(), temp.max()
In [ ]:
temp.argmin(), temp.argmax()
In [93]:
year[temp.argmin()], year[temp.argmax()]
Out[93]:
In [94]:
np.median(temp)
Out[94]:
In [95]:
np.percentile(temp, [5, 95])
Out[95]:
Most functions have an axis keyword to only do the reduction on a certain axis
In [96]:
d = np.arange(12).reshape((4,3))
print(d)
print('naxis1', d.sum(axis=1))
print('naxis0', d.sum(axis=0))
In [97]:
# numpy >= 1.7, reductions also work over multiply axes:
d = np.arange(5*4*3).reshape((5,4,3))
d.sum(axis=(-2, -1))
Out[97]:
In [98]:
#numpy < 1.7 way for trailing axes:
nshape = d.shape[:-2] + (-1,)
view = d.reshape(nshape)
print('new shape', view.shape)
view.sum(axis=1)
Out[98]:
A universal function (ufunc) is a function operating on arrays element-by-element. They allow implementing a large set of operations and getting features like broadcasting, type casting and reductions "for free".
E.g. all basic operations like addition, multiplication and also basic math like np.sin and np.cos are ufuncs and thus all support the same set of features.
In [99]:
a = np.array([1, 2, 3])
b = np.array([1, 2, 3])
# binary ufunc
np.add(a, b)
# unary ufunc
np.sin(a)
Out[99]:
In [100]:
r = np.empty_like(b)
np.add(a, b, out=r), r
Out[100]:
In [101]:
np.add(a, b, dtype=np.float64)
Out[101]:
In [102]:
np.logaddexp(a, b)
Out[102]:
In [103]:
np.add.reduce(a)
Out[103]:
Reductions support the additional keepdims argument that retains the same dimensionality as the input by adding empty dimensions.
This allows applying the result to operands via broadcasting.
In [106]:
a = np.arange(3*4).reshape(3,4)
r = np.add.reduce(a, axis=1, keepdims=True)
r
Out[106]:
In [107]:
a - r
Out[107]:
In [108]:
np.add.reduceat(np.arange(8),[0,4,6])
Out[108]:
In [109]:
!head -n 3 stockholm_td_adj.dat
The dataformat is: year, month, day, daily average temperature, low, high, location.
If we are interested in the average temperature only in a particular month, say February, then we can create a index mask and use the select out only the data for that month using:
In [110]:
np.unique(data['month']) # the month column takes values from 1 to 12
Out[110]:
In [111]:
mask_feb = data['month'] == 2.
In [112]:
np.mean(data['temperature'][mask_feb])
Out[112]:
With these tools we have very powerful data processing capabilities at our disposal. For example, to extract the average monthly average temperatures for each month of the year only takes a few lines of code:
In [113]:
months = arange(1,13)
monthly_mean = [np.mean(data['temperature'][data['month'] == month]) for month in months]
fig, ax = subplots()
ax.bar(months, monthly_mean)
ax.set_xlabel("Month")
ax.set_ylabel("Monthly avg. temp.");
Vectorizing code is the key to writing efficient numerical calculation with Python/Numpy. That means that as much as possible of a program should be formulated in terms of matrix and vector operations.
We can use the usual arithmetic operators to multiply, add, subtract, and divide arrays element-wise
In [114]:
v1 = np.arange(0, 5)
img = np.arange(25).reshape((5, 5))
In [115]:
v1 * 2
Out[115]:
In [116]:
img + img
Out[116]:
In [117]:
np.dot(img, img)
Out[117]:
In [118]:
# matrix * vector
np.dot(img, v1)
Out[118]:
In [119]:
# vector * vector, scalar product
np.dot(v1, v1)
Out[119]:
A dot product applies along the last axis of the first operand and the second to last of the second operand:
dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])
The same broadcasting rules apply to these conditions of the input operands:
In [120]:
m = np.arange(2*3).reshape(2,3)
v = np.arange(3 * 2).reshape(3, 2)
m
Out[120]:
In [121]:
v
Out[121]:
In [122]:
m.shape, v.shape
Out[122]:
In [123]:
np.dot(m, v)
Out[123]:
NumPy includes the standard linear algebra functions to work on matrices, e.g. np.det (determinant), numpy.pinv etc. Some of them are placed in numpy.linalg (like eigenvalues, linear solvers, ...)
In [124]:
a = np.arange(9.).reshape(3,3)
# identity
np.einsum('ij', a) == a
Out[124]:
In [125]:
#indices alphabetic -> transpose:
np.einsum('ji', a) == a.T
Out[125]:
In [126]:
#trace
a = np.arange(49.).reshape(7,7)
np.einsum('ii', a), a.trace()
Out[126]:
In [127]:
np.einsum('ii->i', a), np.diag(a)
Out[127]:
In [128]:
a = np.arange(12.).reshape(3,4)
b = np.arange(12.).reshape(4,3)
np.einsum('ij,jk->ik', a, b), a.dot(b)
Out[128]:
In [129]:
#tensort dot product
a = np.arange(60.).reshape(3,4,5)
b = np.arange(24.).reshape(4,3,2)
np.einsum('ijk,jil->kl', a, b)
Out[129]: