The solution to the last exercise in the Numpy Basics notebook introduces an important concept when working with NumPy: the axis. The axis indicates the dimension along which a function operates. It applies to functions that reduce multiple values to a single value, such as a sum or a mean.
Let's look at a concrete example with sum:
In [ ]:
# Convention for import to get shortened namespace
import numpy as np
In [ ]:
# Create an array for testing
a = np.arange(12).reshape(3, 4)
a
In [ ]:
# This calculates the total of all values in the array
np.sum(a)
In [ ]:
# Keep this in mind:
a.shape
In [ ]:
# Instead, sum along axis 0 (down the rows), giving one total per column:
np.sum(a, axis=0)
In [ ]:
# Or sum along axis 1 (across the columns), giving one total per row:
np.sum(a, axis=1)
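As a quick check on which dimension disappears, here is a minimal sketch using the same 3x4 array. The names col_totals and row_totals are introduced only for illustration. The axis you pass is the one that gets collapsed, so the result's shape is the original shape with that entry removed.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# axis=0 collapses the first dimension (length 3): one total per column
col_totals = a.sum(axis=0)
# axis=1 collapses the second dimension (length 4): one total per row
row_totals = a.sum(axis=1)

print(col_totals.shape, col_totals)
print(row_totals.shape, row_totals)

# keepdims=True keeps the collapsed axis with length 1,
# which is handy when the result needs to broadcast against `a`
kept = a.sum(axis=1, keepdims=True)
print(kept.shape)
```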
In [ ]:
# Synthetic data
temp = np.random.randn(100, 50)
u = np.random.randn(100, 50)
v = np.random.randn(100, 50)
# Calculate the gradient components
gradx, grady = np.gradient(temp)
# Turn into an array of vectors:
# axis 0 is x position
# axis 1 is y position
# axis 2 is the vector components
grad_vec = np.dstack([gradx, grady])
print(grad_vec.shape)
# Turn wind components into vector
wind_vec = np.dstack([u, v])
# Calculate advection, the dot product of wind and the negative of gradient
# Don't use np.dot here: it does not take the dot product vector by vector.
# Multiply the components and add them instead.
In [ ]:
# %load solutions/advection.py
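To see the "multiply and add" idea in isolation, here is a small sketch on toy arrays rather than the exercise data. The arrays p and q are hypothetical 2x2 grids of 2-component vectors. Multiplying elementwise and summing over the last axis gives one dot product per grid point, whereas np.dot treats N-D inputs as stacks of matrices and produces a different shape.

```python
import numpy as np

# Two hypothetical 2x2 grids of 2-component vectors, shape (2, 2, 2)
p = np.arange(8.0).reshape(2, 2, 2)
q = np.ones((2, 2, 2))

# Multiply matching components, then sum over the component axis
dots = (p * q).sum(axis=-1)
print(dots.shape)  # one scalar per grid point
print(dots)

# np.dot contracts the last axis of p with the second-to-last axis of q,
# so it does not give one value per grid point
print(np.dot(p, q).shape)
```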
In [ ]:
# Create some synthetic temperature and wind speed data
np.random.seed(19990503)  # Make sure we all have the same data
temp = (20 * np.cos(np.linspace(0, 2 * np.pi, 100)) +
        50 + 2 * np.random.randn(100))
spd = (np.abs(10 * np.sin(np.linspace(0, 2 * np.pi, 100)) +
              10 + 5 * np.random.randn(100)))
In [ ]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot(temp, 'tab:red')
plt.plot(spd, 'tab:blue');
Comparing a NumPy array with a value gives an array of booleans, one per element, holding the result of comparing that element with the value:
In [ ]:
temp > 45
We can use the resulting array to index into the NumPy array and retrieve the values where the result was true:
In [ ]:
print(temp[temp > 45])
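The same two steps on a tiny array make the mechanics easier to follow. This is only a sketch; vals, mask, and selected are illustrative names, not part of the data above.

```python
import numpy as np

vals = np.array([40.0, 50.0, 44.0, 61.0])

# The comparison yields a boolean array with the same shape as vals
mask = vals > 45
print(mask.dtype, mask)

# Indexing with the mask keeps only the elements where it is True
selected = vals[mask]
print(selected)
```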
As long as the shape of the boolean array matches the data, the boolean array can come from anywhere:
In [ ]:
print(temp[spd > 10])
In [ ]:
# Make a copy so we don't modify the original data
temp2 = temp.copy()
# Replace all places where spd is <10 with NaN (not a number) so matplotlib skips it
temp2[spd < 10] = np.nan
plt.plot(temp2, 'tab:red')
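A boolean mask can also appear on the left-hand side of an assignment, as in the cell above. The sketch below repeats that pattern on small illustrative arrays (vals, other, and masked are hypothetical names) and shows that the copy protects the original. It assumes a floating-point array, since NaN cannot be stored in an integer array.

```python
import numpy as np

vals = np.array([40.0, 50.0, 44.0, 61.0])
other = np.array([5.0, 12.0, 8.0, 15.0])

# Work on a copy so the original values survive
masked = vals.copy()
# Assign NaN wherever the companion array is below the threshold
masked[other < 10] = np.nan

print(masked)  # NaN in positions 0 and 2
print(vals)    # unchanged
```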
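We can also combine multiple boolean arrays using the bitwise operators (& for and, | for or, ~ for not). Each comparison must be wrapped in parentheses, because the bitwise operators bind more tightly than the comparison operators.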
In [ ]:
print(temp[(temp < 45) & (spd > 10)])
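Here is a short sketch of the three bitwise operators on masks, using small illustrative arrays t and s in place of temp and spd. Without the parentheses, Python would evaluate the & before the comparisons, which is not what is intended.

```python
import numpy as np

t = np.array([40.0, 50.0, 44.0, 61.0])
s = np.array([5.0, 12.0, 11.0, 15.0])

both = (t < 45) & (s > 10)    # True where both conditions hold
either = (t < 45) | (s > 10)  # True where at least one holds
neither = ~either             # invert a mask with ~

print(t[both])
print(either)
print(neither)
```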
In [ ]:
# Here's the "data"
np.random.seed(19990503) # Make sure we all have the same data
temp = (20 * np.cos(np.linspace(0, 2 * np.pi, 100)) +
        80 + 2 * np.random.randn(100))
rh = (np.abs(20 * np.cos(np.linspace(0, 4 * np.pi, 100)) +
             50 + 5 * np.random.randn(100)))
# Create a mask for the two conditions described above
# good_heat_index =
# Use this mask to grab the temperature and relative humidity values that together
# will give good heat index values
# temp[] ?
# BONUS POINTS: Plot only the data where heat index is defined by
# inverting the mask (using `~mask`) and setting invalid values to np.nan
In [ ]:
# %load solutions/heat_index.py
In [ ]:
print(temp[0])
We can also extract the first, fifth, and tenth elements:
In [ ]:
print(temp[[0, 4, 9]])
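Indexing with a list of integers follows the list exactly, including its order and any repeats, and returns a new array rather than a view. A minimal sketch with an illustrative array vals:

```python
import numpy as np

vals = np.array([10, 20, 30, 40, 50])

# Pick elements by position
picked = vals[[0, 2, 4]]
print(picked)

# The result follows the order of the index list, and indices may repeat
reordered = vals[[4, 4, 0]]
print(reordered)

# The result is a copy: changing it leaves the original untouched
picked[0] = -1
print(vals[0])
```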
One of the ways this comes into play is when sorting NumPy arrays using argsort. This function returns the indices that put the array's items in sorted order. So for our temp "data":
In [ ]:
inds = np.argsort(temp)
print(inds)
We can pass this array of indices into temp to get its values in sorted order:
In [ ]:
print(temp[inds])
Or we can slice inds to only give the 10 highest temperatures:
In [ ]:
ten_highest = inds[-10:]
print(temp[ten_highest])
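One reason to sort by index rather than sorting the values directly is that the same indices can reorder a companion array. The sketch below uses small illustrative arrays t and s (not the data above) to pull the values in s that go with the two largest values in t.

```python
import numpy as np

t = np.array([52.0, 47.5, 60.1, 49.9])
s = np.array([12.0, 3.0, 8.0, 20.0])

# Indices that would sort t from smallest to largest
order = np.argsort(t)
print(order)

# The last two indices point at the two largest values of t
top2 = order[-2:]
print(t[top2])

# The same indices select the matching entries of s
print(s[top2])
```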
There are other NumPy arg functions that return indices rather than values:
In [ ]:
np.*arg*?
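As a brief illustration of a few of these, the sketch below calls np.argmax, np.argmin, and np.argwhere on a small illustrative array. Each returns positions in the array, which can then be used for indexing.

```python
import numpy as np

vals = np.array([52.0, 47.5, 60.1, 49.9])

# Position of the largest and smallest values
i_max = np.argmax(vals)
i_min = np.argmin(vals)
print(i_max, vals[i_max])
print(i_min, vals[i_min])

# Positions where a condition holds, one row per match
hits = np.argwhere(vals > 50)
print(hits.shape)
print(hits.ravel())
```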