Optimisation

Overview

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. ~Donald Knuth

Once we're sure we need to optimise, we need to know where to focus our efforts. As in many languages, our first guess should be loops, and in Python these can be even more expensive than in compiled languages.

We can be a bit more scientific about this though:

  • First, use a profiler to find where the problem is
  • Then use the timeit module to time candidate improvements
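Here is a minimal sketch of the profiling step using the standard library's cProfile and pstats modules. The functions slow_sum_of_squares and program are hypothetical stand-ins for real analysis code. In a notebook, the %prun magic wraps the same profiler.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    # Deliberately loop-heavy (hypothetical example) so it dominates the profile
    total = 0
    for i in range(n):
        total += i * i
    return total


def program():
    # Hypothetical "whole analysis" that calls the expensive function several times
    return sum(slow_sum_of_squares(200_000) for _ in range(5))


profiler = cProfile.Profile()
profiler.enable()
result = program()
profiler.disable()

# Sort by cumulative time so the calls that the runtime is spent under come first
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)  # only show the top five entries
report = stream.getvalue()
print(report)
```

The report should point at slow_sum_of_squares. That function is the one to rewrite and then re-time.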

Once we've located the bottleneck, there are a few strategies we can use to speed it up:

  • Move loops and heavy processing into specialised libraries (numpy, scipy, etc.)
  • Multiprocessing / threading
  • More involved options: mpi4py, numba, cython, …
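As an illustration of the first strategy, this sketch compares a pure-Python loop with the equivalent numpy reduction on a synthetic array. The names python_loop_mean and numpy_mean are illustrative only, and exact timings will depend on the machine.

```python
import timeit

import numpy as np

# Synthetic data: one million random values in [0, 1)
values = np.random.default_rng(seed=0).random(1_000_000)


def python_loop_mean(arr):
    # Pure-Python loop: each element is handled one at a time by the interpreter
    total = 0.0
    for x in arr:
        total += x
    return total / len(arr)


def numpy_mean(arr):
    # The same reduction, performed inside numpy's compiled code
    return arr.mean()


loop_time = timeit.timeit(lambda: python_loop_mean(values), number=1)
numpy_time = timeit.timeit(lambda: numpy_mean(values), number=1)
print(f"loop: {loop_time:.3f} s, numpy: {numpy_time:.5f} s")
```

Both functions compute the same mean (up to rounding). The numpy version is typically orders of magnitude faster because the loop runs in C rather than in the interpreter.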

We'll cover the profiler later, but the timeit module is easy to use, and in a notebook the %%timeit cell magic makes it even easier:
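Outside a notebook, we can make the same measurement with the timeit module directly. This sketch times two hypothetical implementations of the same operation. Here number and repeat play the roles of the -n and -r flags passed to %%timeit in the cells below, and taking the minimum mirrors the "best of N" figures reported there.

```python
import timeit

# Shared setup: executed once per repeat and excluded from the timing
setup = "data = list(range(10_000))"

# Two candidate implementations of the same operation (illustrative only)
candidates = {
    "for loop": "out = []\nfor x in data:\n    out.append(x * 2)",
    "list comprehension": "out = [x * 2 for x in data]",
}

number = 100  # executions per run (like %%timeit -n)
repeat = 3    # independent runs (like %%timeit -r)

results = {}
for name, stmt in candidates.items():
    times = timeit.repeat(stmt, setup=setup, repeat=repeat, number=number)
    # Best run divided by the loop count gives the time per execution
    results[name] = min(times) / number
    print(f"{name}: {results[name] * 1e3:.3f} ms per loop")
```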


In [81]:
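import multiprocessing
import cis
aeronet_aot_500 = cis.read_data("../resources/WorkshopData2016/Aeronet/920801_150530_Brussels.lev20", "AOT_500")

def collocate_satellite_swath_with_aeronet(swath):
    return swath.collocated_onto(aeronet_aot_500, h_sep=100)

pool = multiprocessing.Pool()  # create after the function so the forked workers can find it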

In [79]:
files = ["../resources/WorkshopData2016/AerosolCCI/20080411002335-ESACCI-L2P_AEROSOL-AER_PRODUCTS-AATSR-ENVISAT-ORAC_31962-fv03.04.nc",
         "../resources/WorkshopData2016/AerosolCCI/20080411020411-ESACCI-L2P_AEROSOL-AER_PRODUCTS-AATSR-ENVISAT-ORAC_31963-fv03.04.nc",
         "../resources/WorkshopData2016/AerosolCCI/20080411034447-ESACCI-L2P_AEROSOL-AER_PRODUCTS-AATSR-ENVISAT-ORAC_31964-fv03.04.nc",
         "../resources/WorkshopData2016/AerosolCCI/20080411052523-ESACCI-L2P_AEROSOL-AER_PRODUCTS-AATSR-ENVISAT-ORAC_31965-fv03.04.nc",
         "../resources/WorkshopData2016/AerosolCCI/20080411070559-ESACCI-L2P_AEROSOL-AER_PRODUCTS-AATSR-ENVISAT-ORAC_31966-fv03.04.nc"]

aerosol_cci_swaths = [cis.read_data(f, 'AOD550') for f in files]

In [86]:
%%timeit -n1 -r1

for s in aerosol_cci_swaths:
    c = collocate_satellite_swath_with_aeronet(s)


1 loop, best of 1: 39.4 s per loop

In [87]:
%%timeit -n1 -r1

cols = pool.map(collocate_satellite_swath_with_aeronet, aerosol_cci_swaths)


1 loop, best of 1: 21.1 s per loop

In [77]:
import cis
import numpy as np

In [69]:
def slow_function():
    d = cis.read_data("../resources/WorkshopData2016/od550aer.nc", "od550aer")
    global_means = []
    # Explicit Python loop over time: one unweighted mean per slice
    for t in d.slices_over('time'):
        global_means.append(t.data.mean())
    return global_means

In [70]:
%%timeit
slow_function()


1 loop, best of 3: 7.02 s per loop

In [93]:
import multiprocessing
print(multiprocessing.cpu_count())


4

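In [71]:
def calc_mean(cube):
    return cube.data.mean()

In [94]:
# Create the pool after calc_mean is defined: with the 'fork' start method (the long-standing
# default on Linux), the workers are copies of this session taken when the pool starts
pool = multiprocessing.Pool()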

In [72]:
def parallel_function():
    d = cis.read_data("../resources/WorkshopData2016/od550aer.nc", "od550aer")
    global_means = pool.map(calc_mean, d.slices_over('time'))
    return global_means

In [73]:
%%timeit
parallel_function()


1 loop, best of 3: 5.68 s per loop
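The process pool helps much less than the four CPUs reported above might suggest. As a quick check, this sketch computes the speedup and parallel efficiency from the timings in this notebook. It assumes Pool() started one worker per CPU, which is its default when processes is not given.

```python
# Wall-clock timings (seconds) copied from the %%timeit outputs in this notebook
timings = {
    "collocation": {"serial": 39.4, "parallel": 21.1},
    "time-slice means": {"serial": 7.02, "parallel": 5.68},
}
# multiprocessing.cpu_count() printed 4; Pool() uses os.cpu_count() workers by default
n_workers = 4

speedups = {}
for name, t in timings.items():
    speedups[name] = t["serial"] / t["parallel"]
    efficiency = speedups[name] / n_workers  # 100% would mean perfect scaling
    print(f"{name}: speedup {speedups[name]:.2f}x, efficiency {efficiency:.0%}")
```

This prints speedups of 1.87x (47%) and 1.24x (31%). Part of the shortfall is likely serial work the pool cannot help with, such as reading the file inside parallel_function. The cost of pickling each slice to send it to a worker probably contributes too.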

In [53]:
def faster_function():
    d = cis.read_data("../resources/WorkshopData2016/od550aer.nc", "od550aer")
    # Collapse both horizontal dimensions in a single call instead of looping over time
    return d.collapsed(['x','y'], how='mean')

In [60]:
%%timeit
faster_function()


WARNING:root:Creating guessed bounds as none exist in file
WARNING:root:Creating guessed bounds as none exist in file
WARNING:root:Creating guessed bounds as none exist in file
/Users/watson-parris/anaconda/envs/cis_env/lib/python3.5/site-packages/iris/analysis/cartography.py:327: UserWarning: Using DEFAULT_SPHERICAL_EARTH_RADIUS.
  warnings.warn("Using DEFAULT_SPHERICAL_EARTH_RADIUS.")
1 loop, best of 3: 613 ms per loop
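faster_function is roughly 11 times quicker than slow_function (613 ms against 7.02 s) because it replaces the loop over time slices with a single reduction in compiled code. The sketch below shows the same idea with plain numpy. It uses a synthetic (time, lat, lon) array on a made-up 1-degree grid as a stand-in for the od550aer field.

The warnings above come from Iris's cartography module, which suggests CIS is area-weighting the collapsed mean. If so, its results will differ from the unweighted means returned by slow_function. For that reason, the sketch also shows an approximate cos(latitude) weighting.

```python
import numpy as np

# Synthetic stand-in for the od550aer field: (time, lat, lon) on a regular 1-degree grid
rng = np.random.default_rng(seed=1)
lat = np.arange(-89.5, 90, 1.0)   # 180 cell-centre latitudes
lon = np.arange(0.5, 360, 1.0)    # 360 cell-centre longitudes
data = rng.random((12, lat.size, lon.size))

# Loop version, mirroring slow_function: one unweighted mean per time slice
loop_means = [data[t].mean() for t in range(data.shape[0])]

# Vectorised version: reduce both horizontal axes in one call
vector_means = data.mean(axis=(1, 2))

# Approximate area weighting: on a regular grid, cell area scales roughly with cos(latitude)
weights = np.broadcast_to(np.cos(np.deg2rad(lat))[:, np.newaxis], (lat.size, lon.size))
weighted_means = (data * weights).sum(axis=(1, 2)) / weights.sum()
```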

mpi4py

numba

Cython


A concrete example

Consider the Aerosol CCI satellite data we saw yesterday. Subsetting this to lat/lon boxes was straightforward using CIS, but what if we wanted a more complex region?


In [4]:
import shapely
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
%matplotlib inline

In [6]:
from shapely.geometry import box
from shapely.ops import unary_union

In [24]:
northern_africa = box(-20, 0, 50, 40)
southern_africa = box(10, -40, 50, 0)
combined_africa = unary_union([northern_africa, southern_africa])
all_africa = box(-20, -40, 50, 40)
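Both pieces of combined_africa are axis-aligned boxes, so we can test points against the union with plain numpy comparisons, which stay vectorised even for large swaths. This is a sketch with hypothetical helper names (in_box, in_combined_africa) and made-up pixel locations. For arbitrary polygons we would let shapely do the test instead, for example with shapely.contains_xy in Shapely 2.0 and later.

```python
import numpy as np


def in_box(lon, lat, min_lon, min_lat, max_lon, max_lat):
    # Inclusive bounds, matching shapely's covers() (contains() excludes the boundary)
    return (lon >= min_lon) & (lon <= max_lon) & (lat >= min_lat) & (lat <= max_lat)


def in_combined_africa(lon, lat):
    # Union of the two boxes used above: northern (-20..50 E, 0..40 N), southern (10..50 E, 40 S..0)
    northern = in_box(lon, lat, -20, 0, 50, 40)
    southern = in_box(lon, lat, 10, -40, 50, 0)
    return northern | southern


# A few hypothetical satellite pixel locations (degrees)
lon = np.array([0.0, 30.0, -10.0, 60.0])
lat = np.array([20.0, -20.0, -20.0, 10.0])
mask = in_combined_africa(lon, lat)
print(mask)  # → [ True  True False False]
```

The third point (10 W, 20 S) lies inside all_africa but outside combined_africa, which is exactly the difference between the two regions plotted below.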

In [23]:
ax = plt.axes(projection=ccrs.PlateCarree())
ax.coastlines()
ax.set_global()
ax.add_geometries([all_africa], ccrs.PlateCarree(), facecolor='orange', edgecolor='black', alpha=0.5)


In [25]:
ax = plt.axes(projection=ccrs.PlateCarree())
ax.coastlines()
ax.set_global()
ax.add_geometries([combined_africa], ccrs.PlateCarree(), facecolor='orange', edgecolor='black', alpha=0.5)

