Introduction to NumPy

Topics:

  • First contact
  • Dimensions
  • Linear algebra
  • Data files

Instructional material, unless otherwise indicated, by Markus van Dijk (SURFsara). Made available under CC0 https://creativecommons.org/publicdomain/zero/1.0/

First contact

Part of SciPy, see http://scipy.org

NumPy is the fundamental package for scientific computing with Python. It contains among other things:

  • a powerful N-dimensional array object
  • useful linear algebra, Fourier transform, and random number capabilities

In [ ]:
import numpy as np

In [ ]:
a = np.arange(5)
a

In [ ]:
a.size

In [ ]:
a.shape

Indexing


In [ ]:
a[0]

In [ ]:
a[-1]

In [ ]:
a[2:4]

In [ ]:
a[2:]

In [ ]:
a[:4]

In [ ]:
a[:]

In [ ]:
a[1::2]

Play

with indexing and slicing...

Try a[12]...

Array functions


In [ ]:
a.max()

In [ ]:
a.min()

In [ ]:
a.sum()

In [ ]:
a + 3

In [ ]:
a * 5

In [ ]:
a + 3 * 5

In [ ]:
a + 15

In [ ]:
(a + 3) * 5

In [ ]:
a * 5 + 3

In [ ]:
a * (5 + 3)

In [ ]:
%matplotlib inline
import matplotlib.pyplot as plt

In [ ]:
a = np.linspace(-np.pi, np.pi, 100)
plt.plot(a)

In [ ]:
plt.plot(a, np.cos(a))
plt.plot(a, np.arctan(a))

Dimensions

Arrays in Python are not truely n-dimensional: they can be a 1d-array of 1d-arrays (of ...)

NumPy arrays are truely n-dimensional.


In [ ]:
a = np.random.random(15).reshape(3,5)
a

In [ ]:
a.size

In [ ]:
a.shape

Indexing


In [ ]:
a[0]

In [ ]:
a[0,1]

In [ ]:
a[0][1]

In [ ]:
a[:,1]

In [ ]:
a[1,1:3]

In [ ]:
a[:,::2]

Play

with indexing and slicing...

Predict this:


In [ ]:
y = np.arange(35).reshape(5,7)

... have a look at y and predict what the next expression will output ...


In [ ]:
y[1:5:2,::3]

Array functions


In [ ]:
a

In [ ]:
a.sum()

In [ ]:
a.sum(axis=1)

In [ ]:
a.sum(axis=0)

In [ ]:
a = np.random.random(24).reshape(2,3,4)
a

In [ ]:
a.shape

In [ ]:
a.sum(axis=2)

In [ ]:
a.sum()

In [ ]:
a *= 7
a, a.sum()

In [ ]:
a += 3
a, a.sum()

In [ ]:
a.sum(axis=(1,2))

In [ ]:
a.sum(axis=(0,1))

In [ ]:
a.sum(axis=(0,1)).sum()

Play

Discover reductions over axes with functions similar to sum(): min(), max(), avg(), mean().

Roll your own

Define a function that accepts a 1-d array as argument, predict the output.


In [ ]:
# from numpy.org
def my_func(arr):
    """Average first and last element of a 1-D array"""
    return (arr[0] + arr[-1]) * 0.5

b = np.array([[1,2,3], [4,5,6], [7,8,9]])

In [ ]:
np.apply_along_axis(my_func, 0, b)

In [ ]:
np.apply_along_axis(my_func, 1, b)

Linear algebra


In [ ]:
a = np.arange(15).reshape(3,5)
a

In [ ]:
a*a

In [ ]:
a.transpose()

In [ ]:
a.dot(a.transpose())

In [ ]:
a.transpose().dot(a)

Solve linear equation


In [ ]:
# from numpy.org
a = np.array([[1, 2, 3], [3, 4, 6.7], [5, 9.0, 5]])
b =  np.array([3, 2, 1])
np.linalg.solve(a, b)  # solve the equation ax = b

CSV data file example


In [ ]:
data = np.genfromtxt('SCI_SICR_lv2_20031119_09000_0000_v1.0.dat')
data.shape

In [ ]:
cols = 'lon0        lat0        lon1        lat1        lon2        lat2        lon3        lat3      lon_cent    lat_cent    dsr_time  int_time   sza    lza     H2O         HDO         N2O         CO          CH4         H2O_err     HDO_err     N2O_err     CO_err      CH4_err  red.chisq   dof eflag albedo   cl_fr mean_elev  st  px bs'.split()
for i in range(len(cols)):
    print(i, cols[i])

In [ ]:
col_ix = cols.index('CH4')
col_ix

In [ ]:
col_data = data.transpose()[col_ix]
print(col_data.shape)
for d in col_data[:10]:
    print(type(d), d)

In [ ]:
plt.plot(col_data)

In [ ]:
plt.plot(col_data[:100])

In [ ]:
plt.plot([d for d in col_data if d < 4e19])

HDF5 data file example


In [ ]:
import h5py
f = h5py.File('SCI_RPRO_L2__CO____20031119T191154_20031119T194955_09000_01_070200_20030115T002650.nc', 'r')

In [ ]:
for group_name, group in f.items():
    print(group_name, group)
    if len(group) < 20:
        for col in group:
            print('+', col, group[col].shape)

In [ ]:
print(f['/main_product/CO_column'].shape)
for d in f['/main_product/CO_column'][:10]:
    print(d)

In [ ]:
%matplotlib inline
import matplotlib.pyplot as plt

In [ ]:
plt.plot(f['/main_product/CO_column'], color='green')

In [ ]:
print(f['/instrument/time'].shape)
for d in f['/instrument/time'][0:9]:
    print(d)

In [ ]:
from datetime import datetime
times = [datetime(*d) for d in f['/instrument/time']]
for t in times[0:9]:
    print(t)

In [ ]:
plt.plot(times, f['/main_product/CO_column'], color='green')
plt.show()

In [ ]:
ax1 = plt.axes()
ax1.plot(times, f['/main_product/CO_column'], color='green')
ax1.set_ylabel('CO')
ax2 = ax1.twinx()
ax2.plot(times, f['/side_product/H2O_column'], color='blue')
ax2.set_ylabel('H2O')