NumPy is a Python module for scientific computing (linear algebra, Fourier transforms, random number generation) and for efficient handling of N-dimensional arrays. Have a closer look at https://numpy.org/.
First, we have to import the numpy module into our environment and, to keep the code short, we use the abbreviation np instead of numpy.
In [2]:
import numpy as np
Besides performing calculations, NumPy's great strength is creating and modifying arrays.
In the previous chapter you learned how to create and use lists. A list can serve as a starting point for creating a NumPy array with np.array.
In [3]:
temp_list = [10, 12, 14, 29, 18, 21]
temp_array = np.array(temp_list)
print('--> temp_list: ',temp_list,type(temp_list))
print('--> temp_array: ',temp_array,type(temp_array))
In [4]:
ndim = temp_array.ndim
shape = temp_array.shape
size = temp_array.size
dtype = temp_array.dtype
print('--> number of array dimensions: ', ndim)
print('--> shape of the array:         ', shape)
print('--> total number of elements:   ', size)
print('--> data type of the elements:  ', dtype)
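The difference between ndim, shape, and size is easier to see on a two-dimensional array. A minimal sketch, using a hypothetical nested list with the same values as temp_list arranged in two rows:

```python
import numpy as np

# Hypothetical 2 x 3 table of temperatures, built from a nested list
temp_2d = np.array([[10, 12, 14], [29, 18, 21]])

print('ndim: ', temp_2d.ndim)    # number of axes
print('shape:', temp_2d.shape)   # length of each axis
print('size: ', temp_2d.size)    # total number of elements (2 * 3)
print('dtype:', temp_2d.dtype)   # data type of the elements
```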
You can convert the array from its current data type to another one with the astype method. Here, we want to convert it from integer to floating-point values.
In [5]:
print('--> temp_array.dtype: ', temp_array.dtype)
print('--> temp_array.astype(float): ', temp_array.astype(float))
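Note that astype does not change the array in place. A short sketch showing that the original keeps its integer type, and that converting floats back to integers truncates instead of rounding (the values array is a hypothetical example):

```python
import numpy as np

temp_array = np.array([10, 12, 14, 29, 18, 21])

# astype returns a NEW array; temp_array itself keeps its integer dtype
temp_float = temp_array.astype(float)
print(temp_array.dtype, temp_float.dtype)

# Converting floats back to integers truncates toward zero (no rounding)
values = np.array([2.7, -2.7, 3.5])
print(values.astype(int))        # -> [ 2 -2  3]
```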
In [6]:
print('--> temp_array[:]: ', temp_array[:])
print('--> temp_array[1]: ', temp_array[1])
print('--> temp_array[0:3]: ', temp_array[0:3])
print('--> temp_array[:3]: ', temp_array[:3])
print('--> temp_array[3:]: ', temp_array[3:])
print('--> temp_array[-1]: ', temp_array[-1])
print('--> temp_array[-3:-1]:', temp_array[-3:-1])
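Slices also accept a third value, the step, just like Python lists. A brief sketch of the full start:stop:step form on the same temp_array:

```python
import numpy as np

temp_array = np.array([10, 12, 14, 29, 18, 21])

# The full slice syntax is start:stop:step; stop is excluded
every_second = temp_array[::2]      # elements 0, 2, 4
reversed_arr = temp_array[::-1]     # a negative step walks backwards
middle_step = temp_array[1:5:2]     # elements 1 and 3
print(every_second, reversed_arr, middle_step)
```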
To generate an evenly spaced array, let's say (0, 1, 2, 3, 4), you can use the np.arange function. np.arange is similar to Python's built-in range function, but it creates a NumPy array.
To create an array that starts at 0 and has 5 elements, pass a single value as the parameter:
In [7]:
a_array = np.arange(5)
print(a_array)
To create an array from a start value, an end value, and an increment (the end value itself is not included):
start value = 2
end value = 10
increment = 2
In [8]:
a_array = np.arange(2, 10, 2)
print(a_array)
To create an array with an increment of type float:
In [9]:
a_array = np.arange(2, 10, 1.5)
print(a_array)
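Because the end value is excluded, the number of elements np.arange produces is ceil((stop - start) / step). A quick sketch checking this for the float example above; with non-integer steps, floating-point rounding can make this count surprising, which is why the NumPy documentation suggests np.linspace for such cases:

```python
import math
import numpy as np

start, stop, step = 2, 10, 1.5
a_array = np.arange(start, stop, step)

# The stop value is never included; the length is ceil((stop - start) / step)
expected_len = math.ceil((stop - start) / step)
print(a_array, len(a_array), expected_len)
```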
If you want to create an array of n elements without knowing the exact increment, the np.linspace function is what you are looking for. You have to give the start value, the end value, and the number of elements to be generated; by default, the end value is included. For example:
start value = 1
end value = 2
number of elements = 11
In [10]:
b_array = np.linspace(1, 2, num=11)
print(b_array)
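np.linspace can also report the increment it computed, and it can exclude the end value. A small sketch using the retstep and endpoint parameters:

```python
import numpy as np

# retstep=True also returns the spacing that linspace computed
b_array, step = np.linspace(1, 2, num=11, retstep=True)
print(step)

# endpoint=False excludes the end value, similar to arange
c_array = np.linspace(1, 2, num=10, endpoint=False)
print(c_array)
```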
In [11]:
a = np.arange(0, 12, 1)
print('a:\n ', a)
a_2d = a.reshape(4, 3)
print('\na_2d:\n ', a_2d)
a_3d = a.reshape(2, 3, 2)
print('\na_3d:\n ', a_3d)
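The cell above reshapes a 12-element array into 4 x 3 and 2 x 3 x 2. A minimal sketch of two related details: one axis length can be given as -1 so NumPy infers it, and the new shape must hold exactly the same number of elements:

```python
import numpy as np

a = np.arange(12)

# -1 lets NumPy infer that axis length from the total size (12 / 4 = 3)
a_2d = a.reshape(4, -1)
print(a_2d.shape)

# The new shape must hold exactly a.size elements, otherwise reshape fails
try:
    a.reshape(5, 3)
except ValueError as err:
    print('ValueError:', err)
```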
Of course, you can convert an n-dimensional array to a one-dimensional array using the flatten method.
In [12]:
a_1d = a_3d.flatten()
print('a_1d: ', a_1d)
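NumPy also provides a function that does this, called np.ravel. Unlike flatten, which always returns a copy, ravel returns a view of the original data whenever possible.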
In [13]:
a_1d = np.ravel(a_3d)
print('a_1d: ', a_1d)
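The copy-versus-view difference between flatten and ravel matters when you modify the result. A short sketch using np.shares_memory to check whether two arrays use the same data:

```python
import numpy as np

a_3d = np.arange(12).reshape(2, 3, 2)

flat_copy = a_3d.flatten()   # always a new array
flat_view = np.ravel(a_3d)   # a view here, because a_3d is contiguous in memory

print(np.shares_memory(a_3d, flat_copy))  # False
print(np.shares_memory(a_3d, flat_view))  # True

# Changing the view therefore changes a_3d as well
flat_view[0] = -1
print(a_3d[0, 0, 0])  # -1
```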
Adding or multiplying arrays of the same size is simple: the operation is applied element by element. We define two arrays with 5 elements each and compute their sum and product.
In [14]:
m = np.array([2.1, 3.0, 4.7, 5.3, 6.2])
n = np.arange(5)
print('m = ', m)
print('n = ', n)
mn_add = m + n
mn_mul = m * n
print('m + n = ', mn_add)
print('m * n = ', mn_mul)
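A scalar can be combined with an array of any size, but two arrays of different lengths cannot be combined element by element. A brief sketch with the same m and n:

```python
import numpy as np

m = np.array([2.1, 3.0, 4.7, 5.3, 6.2])
n = np.arange(5)

# A scalar is applied to every element
print(m * 2)

# Arrays of different lengths cannot be combined element-wise
try:
    m + np.arange(3)
except ValueError as err:
    print('ValueError:', err)
```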
In [15]:
m_min = m.min()
m_max = m.max()
m_sum = m.sum()
m_mean = m.mean()
m_std = m.std()
m_round = m.round()
print('m_min: ',m_min)
print('m_max: ',m_max)
print('m_sum: ',m_sum)
print('m_mean: ',m_mean)
print('m_std: ',m_std)
print('m_round: ',m_round)
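Two details of these methods are easy to miss: std divides by N by default (population standard deviation), and round rounds exact halves to the nearest even number. A small sketch with hypothetical values that make both visible:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

# By default std divides by N (population standard deviation, ddof=0)
pop_std = x.std()
# ddof=1 divides by N - 1 (sample standard deviation)
sample_std = x.std(ddof=1)
print(pop_std, sample_std)

# round() rounds exact halves to the nearest even value
halves = np.array([0.5, 1.5, 2.5]).round()
print(halves)   # -> [0. 2. 2.]
```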
NumPy also provides many mathematical routines; see https://docs.scipy.org/doc/numpy/reference/routines.math.html.
In [16]:
m_sqrt = np.sqrt(m)
m_exp = np.exp(m)
mn_add = np.add(m,n)
print('m_sqrt: ', m_sqrt)
print('m_exp: ', m_exp)
print('mn_add: ', mn_add)
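The arithmetic operators are shorthand for NumPy's element-wise functions, so np.add(m, n) gives the same result as m + n. A quick sketch confirming this and checking that squaring undoes np.sqrt:

```python
import numpy as np

m = np.array([2.1, 3.0, 4.7, 5.3, 6.2])
n = np.arange(5)

# The operators are shorthand for NumPy's element-wise functions
print(np.array_equal(np.add(m, n), m + n))
print(np.array_equal(np.multiply(m, n), m * n))

# sqrt is applied element by element; squaring undoes it (up to rounding)
print(np.allclose(np.sqrt(m) ** 2, m))
```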
In [17]:
data_radians = np.linspace(0., 6., 5)
print('data_radians: ', data_radians)
data_sin = np.sin(data_radians)
print('data_sin: ', data_sin)
data_cos = np.cos(data_radians)
print('data_cos: ', data_cos)
data_degrees = np.degrees(data_radians)
print('data_degrees: ', data_degrees)
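NumPy's trigonometric functions expect radians. A short sketch, with hypothetical angles given in degrees, converting with np.radians before calling np.sin and converting back with np.degrees:

```python
import numpy as np

angles_deg = np.array([0.0, 30.0, 90.0, 180.0])

# Trigonometric functions expect radians, so convert degrees first
angles_rad = np.radians(angles_deg)
print(np.sin(angles_rad))

# np.degrees is the inverse conversion
print(np.degrees(angles_rad))
```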
In [18]:
zeros = np.zeros((3,4))
print('zeros: ',zeros)
zeros = np.zeros((3,4),dtype=int)
print('zeros type integer: ',zeros)
ones = np.ones((3,4))
print('ones: ',ones)
ones = np.ones((3,4),dtype=int)
print('ones type integer: ',ones)
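If you already have an array and want zeros or ones of the same shape, np.zeros_like and np.ones_like take the shape and, unless overridden, the dtype from it. A minimal sketch with a hypothetical template array:

```python
import numpy as np

template = np.arange(12).reshape(3, 4)

# *_like functions copy shape and dtype from an existing array
z = np.zeros_like(template)
o = np.ones_like(template, dtype=float)   # dtype can be overridden
print(z.shape, z.dtype, o.dtype)
```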
In [19]:
v = np.vstack((zeros,ones))
h = np.hstack((zeros,ones))
print('np.vstack((zeros,ones)): \n', v)
print('np.hstack((zeros,ones)): \n', h)
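np.vstack stacks along the rows and np.hstack along the columns. For 2-D arrays they match np.concatenate with axis=0 and axis=1, as this sketch checks:

```python
import numpy as np

zeros = np.zeros((3, 4), dtype=int)
ones = np.ones((3, 4), dtype=int)

v = np.vstack((zeros, ones))   # stacks rows: shape (6, 4)
h = np.hstack((zeros, ones))   # stacks columns: shape (3, 8)

# For 2-D arrays this is the same as concatenating along axis 0 or axis 1
print(np.array_equal(v, np.concatenate((zeros, ones), axis=0)))
print(np.array_equal(h, np.concatenate((zeros, ones), axis=1)))
```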
In [20]:
a_origin = np.arange(12).reshape(3,4)
print('a_origin: \n', a_origin)
b_copy_of_a = a_origin   # NOT a copy: both names refer to the same array
b_copy_of_a[1,3] = 999
print('b_copy_of_a: ', b_copy_of_a)
print('a_origin: ', a_origin)
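Plain assignment does not copy the data, so changing b_copy_of_a also changes a_origin. To create a physical copy with its own data (often called a deep copy), use numpy's np.copy function or the array's copy method.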
In [21]:
a_origin = np.arange(12).reshape(3,4)
c_deep_copy = a_origin.copy()
c_deep_copy[1,3] = 222
print('a_origin: \n', a_origin)
print('c_deep_copy: \n', c_deep_copy)
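You can check whether two names share data instead of comparing printed values. A short sketch using the is operator and np.shares_memory; it also shows that basic slicing, like assignment, does not copy:

```python
import numpy as np

a_origin = np.arange(12).reshape(3, 4)
b_alias = a_origin          # a second name for the same array
c_copy = np.copy(a_origin)  # equivalent to a_origin.copy()
d_slice = a_origin[1]       # basic slicing returns a view

print(b_alias is a_origin)                  # True
print(np.shares_memory(a_origin, c_copy))   # False
print(np.shares_memory(a_origin, d_slice))  # True
```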
Working with 1- or multi-dimensional arrays can make programs slow when you have to find values in given ranges or replace invalid data with missing values. Usually, you would think of a for or while loop to go through all elements of an array, but NumPy has efficient functions that do it in just one line.
Useful functions
np.where
np.argwhere
np.all
np.any
The np.where function evaluates a logical expression element by element. Where the expression is True, the value is taken from the second argument (here, the value is left untouched); where it is False, it is taken from the third argument, e.g. a kind of missing value. Note that this is NOT the same as a netCDF missing value (see masked arrays below).
In [22]:
x = np.array([-1, 2, 0, 5, -3, -2])
x_ge0 = np.where(x >= 0, x, -9999)
print('x: ', x)
print('x_ge0: ', x_ge0)
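np.where has two more uses worth knowing: called with only a condition it returns the indices where the condition is True, and the replacement can itself be an array. A brief sketch with the same x:

```python
import numpy as np

x = np.array([-1, 2, 0, 5, -3, -2])

# With only a condition, np.where returns the indices where it is True
neg_idx = np.where(x < 0)
print(neg_idx)          # a tuple with one index array per dimension

# The replacement can also be an array, e.g. the absolute values
x_abs = np.where(x < 0, -x, x)
print(x_abs)
```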
In the example above, the values of the array were located and directly replaced where the condition is False. But sometimes you want to retrieve the indices of the values instead of the values themselves, because you need the same indices again later. Then use the np.argwhere function with a logical condition.
In [23]:
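x_ind = np.argwhere(x < 0)
print('--> indices where x < 0: \n', x_ind)
y = x.copy()              # copy, so that x itself stays unchanged
y[x_ind[:, 0]] = -9999    # argwhere returns an (N, 1) array for 1-D input
print('y with -9999 where x < 0: \n', y)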
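For an N-dimensional array, np.argwhere returns one row of N indices per match. A small sketch on a hypothetical 2 x 3 grid, using the two index columns to pick the matching values:

```python
import numpy as np

grid = np.array([[ 1, -2,  3],
                 [-4,  5, -6]])

# For an N-d array, argwhere returns one row of N indices per match
neg = np.argwhere(grid < 0)
print(neg)

# The columns can be used as row and column indices
print(grid[neg[:, 0], neg[:, 1]])
```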
To check whether the values of an array are less than 0, for instance, you might try it like below. Unfortunately, that would be too easy:
if(x < 0):
    print('some elements are less than 0')
else:
    print('no values are less than 0')
if(x > 0):
    print('all elements are greater than 0')
else:
    print('not all values are greater than 0')
The result would be the following error:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-23-6f0311fb54a3> in <module>
----> 1 if(x < 0):
      2     print('some elements are less than 0')
      3 else:
      4     print('no values are less than 0')
      5 
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
The last line of the error message gives you the right hint: use the array methods any or all.
In [24]:
# compare first, then reduce: x.any() < 0 would compare a single bool with 0
if (x < 0).any():
    print('some elements are less than 0')
else:
    print('no values are less than 0')
if (x > 0).all():
    print('all elements are greater than 0')
else:
    print('not all values are greater than 0')
Or you can use the numpy function np.any or np.all.
In [25]:
if np.any(x < 0):
    print('some elements are less than 0')
else:
    print('no values are less than 0')
if np.all(x > 0):
    print('all elements are greater than 0')
else:
    print('not all values are greater than 0')
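Element-wise conditions can be combined before calling np.any or np.all. A minimal sketch using & (and); the parentheses around each comparison are required because & binds more tightly than < and >:

```python
import numpy as np

x = np.array([-1, 2, 0, 5, -3, -2])

# Combine element-wise conditions with & (and), | (or), ~ (not);
# the parentheses are required because & binds tighter than < and >
in_range = (x > -2) & (x < 3)
print(in_range)
print(np.any(in_range), np.all(in_range))
print(np.count_nonzero(in_range))
```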
NumPy also provides functions to read data from files. Our next example shows how to read data from a text file. The input file pw.dat contains whitespace-separated columns and looks like this:
ID LAT LON PW
BLAC 36.75 -97.25 48.00
BREC 36.41 -97.69 46.30
BURB 36.63 -96.81 49.80
...
We want to read all values, and because of the mixed data types in the file, we read all of them as strings. We don't need the header line, so we skip it.
In [26]:
lines = np.loadtxt('/Users/k204045/data/Station_data/pw.dat', dtype='str', skiprows=1)
ID = lines[:,0]
lat = lines[:,1]
lon = lines[:,2]
pw = lines[:,3]
print('ID: \n', ID)
print('lat: \n', lat)
print('lon: \n', lon)
print('pw: \n', pw)
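The lat, lon, and pw columns read this way are still strings, so they have to be converted before you can calculate with them. A self-contained sketch that stands in for pw.dat with an in-memory copy of the first rows shown above (np.loadtxt also accepts file-like objects such as io.StringIO):

```python
import io
import numpy as np

# Stand-in for pw.dat: the first rows shown above, held in memory
pw_text = """ID LAT LON PW
BLAC 36.75 -97.25 48.00
BREC 36.41 -97.69 46.30
BURB 36.63 -96.81 49.80
"""

lines = np.loadtxt(io.StringIO(pw_text), dtype='str', skiprows=1)

# The numeric columns are still strings; convert them for calculations
lat = lines[:, 1].astype(float)
pw = lines[:, 3].astype(float)
print(lat.mean(), pw.max())
```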
If you don't need the IDs, you can read the lat, lon, and pw values directly using the usecols parameter. There is no need for the dtype parameter anymore because the default data type is float, and that's what we want.
In [27]:
data = np.loadtxt('/Users/k204045/data/Station_data/pw.dat', usecols=(1,2,3), skiprows=1)
print()
print('--> data: \n', data)
print()
print('--> lat: \n', data[:,0])
print('--> lon: \n', data[:,1])
print('--> pw: \n', data[:,2])
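Instead of slicing the columns afterwards, np.loadtxt can hand you each column as its own array with unpack=True. A sketch using the same in-memory stand-in for pw.dat:

```python
import io
import numpy as np

# Stand-in for pw.dat: the first rows shown above, held in memory
pw_text = """ID LAT LON PW
BLAC 36.75 -97.25 48.00
BREC 36.41 -97.69 46.30
BURB 36.63 -96.81 49.80
"""

# unpack=True transposes the result, so each column gets its own variable
lat, lon, pw = np.loadtxt(io.StringIO(pw_text), usecols=(1, 2, 3),
                          skiprows=1, unpack=True)
print(lat, lon, pw)
```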
However, we want to have the IDs, too. It's never too late... But again, we have to tell NumPy to use the string data type.
In [28]:
IDs = np.loadtxt('/Users/k204045/data/Station_data/pw.dat', dtype='str', usecols=0, skiprows=1)
print('IDs: \n', IDs)
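Reading the file twice is not strictly necessary. A structured dtype gives every column its own name and type, so the string IDs and the float values can be read in one call. A sketch with the in-memory stand-in for pw.dat; the field names and the 'U4' string length are assumptions chosen for this sample (longer IDs would be truncated):

```python
import io
import numpy as np

# Stand-in for pw.dat: the first rows shown above, held in memory
pw_text = """ID LAT LON PW
BLAC 36.75 -97.25 48.00
BREC 36.41 -97.69 46.30
BURB 36.63 -96.81 49.80
"""

# Assumed field names and types; 'U4' holds strings of up to 4 characters
station_dtype = [('ID', 'U4'), ('LAT', float), ('LON', float), ('PW', float)]
stations = np.loadtxt(io.StringIO(pw_text), dtype=station_dtype, skiprows=1)

print(stations['ID'])
print(stations['PW'])
```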
In our daily work we are confronted with data that we want to mask, so that we see only the values in the sections we need. Masking is sometimes tricky, and you have to be careful.
In the following example we try to demonstrate how to mask a 2-dimensional array by a given mask array containing zeros and ones, where 0 means 'don't mask', and 1 means 'mask'.
In [29]:
field = np.arange(1,9,1).reshape((4,2))
mask = np.array([[0,0],[1,0],[1,1],[1,0]])
mask_field = np.ma.MaskedArray(field,mask)
print('field: \n', field)
print('mask: \n', mask)
print('mask_field: \n', mask_field)
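Once the array is masked, reductions skip the masked elements, and you can get the valid values or a filled copy back. A sketch with the same field and mask; the fill value -999 is an arbitrary example:

```python
import numpy as np

field = np.arange(1, 9, 1).reshape((4, 2))
mask = np.array([[0, 0], [1, 0], [1, 1], [1, 0]])
mask_field = np.ma.MaskedArray(field, mask)

# Reductions skip the masked elements: (1 + 2 + 4 + 8) / 4
print(mask_field.mean())
# compressed() returns only the unmasked values as a 1-D array
print(mask_field.compressed())
# filled() replaces the masked values with a given fill value
print(mask_field.filled(-999))
```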
For the next example we want to get data from one array depending on data of a second array.
To create two arrays with random data of type integer we use numpy's random generator.
In [30]:
A = np.random.randint(-3, high=5, size=10)
B = np.random.randint(-4, high=4, size=10)
print('--> A: \n', A)
print('--> B: \n', B)
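Because A and B are random, the output changes with every run. If you want reproducible results, one option is NumPy's Generator interface with a fixed seed. A sketch; the seed 42 is an arbitrary choice:

```python
import numpy as np

# A Generator with a fixed seed returns the same numbers on every run
rng = np.random.default_rng(seed=42)
A = rng.integers(-3, 5, size=10)    # like randint, the high value 5 is excluded
B = rng.integers(-4, 4, size=10)

rng_again = np.random.default_rng(seed=42)
print(np.array_equal(A, rng_again.integers(-3, 5, size=10)))
```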
Now, we want only the values of array A which are greater than or equal to, or less than, the corresponding values of array B.
First, we have to find where this is the case. NumPy has routines that do that for us, provided that both arrays have the same shape.
In [31]:
ind_ge = np.greater_equal(A,B)
ind_lt = np.less(A,B)
print('--> ind_ge: \n', ind_ge)
print('--> ind_lt: \n', ind_lt)
This is the same as
In [32]:
ind_ge = A>=B
ind_lt = A<B
print('--> ind_ge: \n', ind_ge)
print('--> ind_lt: \n', ind_lt)
Use these boolean arrays to select the data we want.
In [33]:
A_ge = A[ind_ge]
A_lt = A[ind_lt]
print('--> A_ge: \n', A_ge)
print('--> A_lt: \n', A_lt)
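In this case we get only the values of array A that fulfil the condition, as a shorter array, and not an array of the original shape in which the other values are masked. The latter has to be done with numpy.ma.MaskedArray. Note that the mask marks the values to hide, so to keep the values where A >= B we mask where A < B, and vice versa.
In [34]:
A_ge2 = np.ma.MaskedArray(A, mask=ind_lt)   # hide values where A < B
A_lt2 = np.ma.MaskedArray(A, mask=ind_ge)   # hide values where A >= B
print('--> A_ge2: \n', A_ge2)
print('--> A_lt2: \n', A_lt2)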
In [35]:
print(type(A_ge))
print(type(A_ge2))
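np.ma.masked_where builds the same kind of masked array directly from a condition. A sketch with small hypothetical arrays instead of random ones, so the result is predictable:

```python
import numpy as np

A = np.array([ 2, -1,  4,  0,  3])
B = np.array([ 1,  1,  4, -2,  5])

# masked_where hides the elements where the condition is True,
# so to keep A >= B we mask the opposite condition A < B
A_ge = np.ma.masked_where(A < B, A)
print(A_ge)
print(A_ge.compressed())  # same values as A[A >= B]
```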
Here is a brief example of how to locate all the values of an array that are not equal to zero. Of course, it also shows how to locate the values equal to zero.
Count and locate the non-zero values of an array, select them, and mask the array to preserve its shape.
In [36]:
C = np.random.randint(-2, high=1, size=10)
C_count_nonzero = np.count_nonzero(C)
C_nonzero_ind = np.nonzero(C)
C2 = C[C_nonzero_ind]
C3 = np.ma.MaskedArray(C, C==0)
print('--> C: ', C)
print('--> C_count_nonzero: ', C_count_nonzero)
print('--> C_nonzero_ind: ', C_nonzero_ind)
print('--> C2: ', C2)
print('--> C3: ', C3)
Now, we look at the zeros.
Count and locate the zero values of an array, select them, and mask the array to preserve its shape.
In [37]:
Z_count_zero = np.count_nonzero(C==0)
Z_zero_ind = np.argwhere(C==0)
Z = C[Z_zero_ind]
Z2 = np.ma.MaskedArray(C, C!=0)
print('--> Z_count_zero: ', Z_count_zero)
print('--> Z_zero_ind: ', Z_zero_ind.flatten())
print('--> Z: ', Z.flatten())
print('--> Z2: ', Z2)
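For the common case of masking a specific value, np.ma.masked_equal and np.ma.masked_not_equal are shortcuts for the two masks built above. A sketch with a fixed, hypothetical C so the counts are predictable:

```python
import numpy as np

C = np.array([-2, 0, -1, 0, -2, -1, 0, 0, -1, -2])

# masked_equal(C, 0) hides the zeros, like np.ma.MaskedArray(C, C == 0)
C_nonzero = np.ma.masked_equal(C, 0)
# masked_not_equal(C, 0) hides everything except the zeros
C_zero = np.ma.masked_not_equal(C, 0)

print(C_nonzero)
print(C_zero)
print(C_nonzero.count(), C_zero.count())  # number of unmasked elements
```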
In some cases it is more efficient to start with an empty array in which every element holds a missing value, here 1.0e20.
In [38]:
empty_array = np.full(100,1.0e20)
print(empty_array)
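An array prefilled with a missing value is typically filled in step by step and then masked. A minimal sketch, with hypothetical positions and measurements, using np.ma.masked_values to hide every element that still holds the missing value; 1.0e20 also happens to be the default fill value NumPy's masked arrays use for floats:

```python
import numpy as np

missing = 1.0e20
data = np.full(10, missing)        # start with every element "missing"
data[[1, 4, 7]] = [3.2, 5.0, 1.1]  # hypothetical valid measurements

# masked_values masks elements equal (within a tolerance) to the given value
data_ma = np.ma.masked_values(data, missing)
print(data_ma)
print(data_ma.mean())   # mean of the three valid values only
```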