Time to add SciPy to our toolbox

In this session you will learn some of the functions available in SciPy package.

What is SciPy?

General purpose scientific library (that consist of bunch of sublibraries) and builds on NumPy arrays. It consists of several submodules:

  • Special functions (scipy.special)
  • Integration (scipy.integrate)
  • Optimization (scipy.optimize)
  • Interpolation (scipy.interpolate)
  • Fourier Transforms (scipy.fftpack)
  • Signal Processing (scipy.signal)
  • Linear Algebra (scipy.linalg)
  • Sparse Eigenvalue Problems (scipy.sparse)
  • Statistics (scipy.stats)
  • Multi-dimensional image processing (scipy.ndimage)
  • File IO (scipy.io)

See full documentation.

Exercise 1

In the first exercise you are blending numpy, scipy and matplotlib together.

Task

  • Read synoptic observations from a binary file (numpy)
  • Interpolate them on a regular grid over Europe (scipy)
  • Plot the result (matplotlib)

Load all the necessary modules


In [1]:
# import os
# import numpy as np
# import matplotlib.pyplot as plt
# from scipy import interpolate  # import submodule for interpolation and regridding
## make figures appear within the notebook
# %matplotlib inline

Read data from a binary file

Define a file name:


In [2]:
## code is ready here, just uncomment
#
# file_name = os.path.join(os.path.pardir, 'data',
#                          'surface_synoptic_20161109_0000_lon_lat_temperature_17097.bin')

Now use numpy.fromfile() function to read data.

Side note: If it was an unformatted sequential binary file from Fortran code, you could have used scipy.io.FortranFile.

Important:

  • the binary file contains only the data itself
  • the data array contains values of longitude, latitude and air temperature
  • these data are from 17097 stations
  • data are saved in single precision float format

In [3]:
# nrec = 17097

In [4]:
# bin_data = np.fromfile( ..., dtype=...)
Aside

Just as in Fortran, you can use direct access way of reading unformatted binary files with a given length of records. For example:

nx = 123 # points in longitude
ny = 456 # points in latitude
with open(my_file_name, 'rb') as f:
    f.seek(4 * ny * nx)  # Move nx*ny float32 records forward (skip the first slice)
    arr = np.fromfile(f, dtype=np.float32, count=ny*nx) # Read the next slice of data
  • Check the type and length of bin_data. Is this what you expected?
  • How will you slice this array into coordinates and data values?

In [5]:
# lons = bin_data[:nrec]
# lats = 
# temp =

You can now quickly plot the data, for example using matplotlib.pyplot.scatter function. Otherwise, skip the next cell.


In [6]:
# plt.scatter(lons, lats, c=temp)

Subset the data

The next step is to grid the observations onto a rectangular grid. Due to data sparsity over some regions, you can create a smaller sample first, e.g. just over Europe. You can loosely define the Europe region within 10W and 60E longitude and within 35N and 70N latitude. Then you can create a "condition" to subset the original data.


In [7]:
# sub_idx = (lons > ...) & (lons < 60) & (lats > ...) & (lats < ...)  # indices over "Europe"

Check that subset works.


In [8]:
# print('Original shape: {}'.format(lons.shape))
# print('Subset shape: {}'.format(lons[sub_idx].shape))

Define a rectangular grid with constant step


In [9]:
# lon_step = 
# lat_step = 
# lon1d = np.arange(-10, 60.1, lon_step)
# lat1d = np.arange(35, 70.1, lat_step)
# lon2d, lat2d = np.meshgrid(lon1d, lat1d)
  • For the interpolation you can use scipy.interpolate.griddata function.
  • Read its docstring and complete the next code cell with the correct arguments.
  • Which interpolation method would you use?
  • Don't forget to use sub_idx! (Hint: it should go to xpoints, ypoints and values)
  • Feel free to explore the scipy.interpolate submodule for other interpolation options.

In [10]:
# temp_grd_lin = interpolate.griddata((xpoints, ypoints), values, xi, method='linear')[source]
  • How about other methods?
  • Try nearest or cubic

In [11]:
# temp_grd_near = interpolate.griddata( ... )
# temp_grd_cub = interpolate.griddata( ... )

Plot results

Enough boring interpolation, time to plot what you got so far.

  • Create a grid of 4 subplots: one for the observations and 3 for the gridded data
  • Plot observations using scatter() function
  • Use contourf() or pcolormesh() to plot gridded data
  • For easier comparison, use the same colour limits in all 4 subplots
  • Attach a colorbar to each subplot, if you like
  • Set titles
  • Tweak/optimize the code as you want
  • Compare different interpolation methods

In [12]:
# fig, axs = plt.subplots(figsize=(15, 10), nrows=2, ncols=2)  # 
# print('2x2 subplot grid: {}'.format(axs.shape))
# 
# h = axs[0, 0].scatter(lons[sub_idx], lats[sub_idx], c=temp[sub_idx],
#                       edgecolors='none', vmin=..., vmax=...)
# fig.colorbar(h, ax=axs[0, 0])
# axs[0, 0].set_title('Observations')
# 
#
#
# h = axs[0, 1].pcolormesh(lon2d, lat2d, temp_grd_lin,
#                          vmin=..., vmax=...)
# fig.colorbar(h, ax=axs[0, 1])
# axs[0, 0].set_title('Linear')
# 
#
#
# h = axs[1, 0] ... ( ... temp_grd_near)
#
# 
# axs[1, 0].set_title('Nearest')
# 
#
#
# axs[1, 1].set_title('Cubic')