We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. ~Donald Knuth
Once we're sure we need to optimize, we need to know where to focus our efforts. As in many languages, your first guess should be loops; in Python these can be even more expensive than in other languages.
We can be a bit more scientific about this though:
timeit module to test improved implementations
Once we've located the bottleneck, there are a few strategies available to us to speed it up:
We'll cover the profiler later, but the timeit module is easy, and it's even easier to use in a notebook...
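Outside a notebook, the same kind of measurement can be made by calling the timeit module directly. The following is a minimal sketch rather than part of the workshop material: `loop_sum`, `builtin_sum` and `values` are illustrative names, and the comparison simply times an explicit Python loop against the built-in `sum`.

```python
import timeit


def loop_sum(values):
    """Sum the values with an explicit Python loop."""
    total = 0
    for v in values:
        total += v
    return total


def builtin_sum(values):
    """Sum the values with the built-in sum()."""
    return sum(values)


# Illustrative input data (an assumption, not from the workshop files)
values = list(range(100_000))

# number = calls per timing run, repeat = how many timing runs to make
n_calls = 10
loop_times = timeit.repeat(lambda: loop_sum(values), number=n_calls, repeat=3)
builtin_times = timeit.repeat(lambda: builtin_sum(values), number=n_calls, repeat=3)

# Report the fastest run, which is least affected by other activity on the machine
print(f"loop:    {min(loop_times) / n_calls:.6f} s per call")
print(f"builtin: {min(builtin_times) / n_calls:.6f} s per call")
```

The `%%timeit` cell magic used in the cells that follow wraps the same module; its `-n` and `-r` options correspond to `number` and `repeat` here.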
In [81]:
aeronet_aot_500 = cis.read_data("../resources/WorkshopData2016/Aeronet/920801_150530_Brussels.lev20", "AOT_500")
def collocate_satellite_swath_with_aeronet(swath):
    return swath.collocated_onto(aeronet_aot_500, h_sep=100)
In [79]:
files = ["../resources/WorkshopData2016/AerosolCCI/20080411002335-ESACCI-L2P_AEROSOL-AER_PRODUCTS-AATSR-ENVISAT-ORAC_31962-fv03.04.nc",
"../resources/WorkshopData2016/AerosolCCI/20080411020411-ESACCI-L2P_AEROSOL-AER_PRODUCTS-AATSR-ENVISAT-ORAC_31963-fv03.04.nc",
"../resources/WorkshopData2016/AerosolCCI/20080411034447-ESACCI-L2P_AEROSOL-AER_PRODUCTS-AATSR-ENVISAT-ORAC_31964-fv03.04.nc",
"../resources/WorkshopData2016/AerosolCCI/20080411052523-ESACCI-L2P_AEROSOL-AER_PRODUCTS-AATSR-ENVISAT-ORAC_31965-fv03.04.nc",
"../resources/WorkshopData2016/AerosolCCI/20080411070559-ESACCI-L2P_AEROSOL-AER_PRODUCTS-AATSR-ENVISAT-ORAC_31966-fv03.04.nc"]
aerosol_cci_swaths = [cis.read_data(f, 'AOD550') for f in files]
In [86]:
%%timeit -n1 -r1
for s in aerosol_cci_swaths:
    c = collocate_satellite_swath_with_aeronet(s)
In [87]:
%%timeit -n1 -r1
# pool is the multiprocessing.Pool created in the multiprocessing cell further down
cols = pool.map(collocate_satellite_swath_with_aeronet, aerosol_cci_swaths)
In [77]:
import cis
import numpy as np
In [69]:
def slow_function():
    d = cis.read_data("../resources/WorkshopData2016/od550aer.nc", "od550aer")
    global_means = []
    # Loop over each time slice and take its mean
    for t in d.slices_over('time'):
        global_means.append(t.data.mean())
    return global_means
In [70]:
%%timeit
slow_function()
In [93]:
import multiprocessing
print(multiprocessing.cpu_count())
In [94]:
pool = multiprocessing.Pool()
In [71]:
def calc_mean(cube):
    return cube.data.mean()
In [72]:
def parallel_function():
    d = cis.read_data("../resources/WorkshopData2016/od550aer.nc", "od550aer")
    # Distribute the time slices across the worker processes in the pool
    global_means = pool.map(calc_mean, d.slices_over('time'))
    return global_means
In [73]:
%%timeit
parallel_function()
In [53]:
def faster_function():
    d = cis.read_data("../resources/WorkshopData2016/od550aer.nc", "od550aer")
    # Let the library collapse the spatial dimensions in one call
    return d.collapsed(['x','y'], how='mean')
In [60]:
%%timeit
faster_function()
A concrete example
Consider the Aerosol CCI satellite data we saw yesterday. Subsetting this to lat/lon boxes was straightforward using CIS, but what if we wanted a more complex region?
In [4]:
import shapely
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
%matplotlib inline
In [6]:
from shapely.geometry import box
from shapely.ops import unary_union
In [24]:
northern_africa = box(-20, 0, 50, 40)
southern_africa = box(10, -40, 50, 0)
combined_africa = unary_union([northern_africa, southern_africa])
all_africa = box(-20, -40, 50, 40)
In [23]:
ax = plt.axes(projection=ccrs.PlateCarree())
ax.coastlines()
ax.set_global()
ax.add_geometries([all_africa], ccrs.PlateCarree(), facecolor='orange', edgecolor='black', alpha=0.5)
Out[23]:
In [25]:
ax = plt.axes(projection=ccrs.PlateCarree())
ax.coastlines()
ax.set_global()
ax.add_geometries([combined_africa], ccrs.PlateCarree(), facecolor='orange', edgecolor='black', alpha=0.5)
Out[25]:
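The difference between the two regions can also be checked numerically rather than by eye. The following is a minimal sketch using shapely's `contains` and `area`; `atlantic_point` is a hypothetical test location (longitude -10, latitude -20) chosen for illustration, not something from the workshop data.

```python
from shapely.geometry import Point, box
from shapely.ops import unary_union

# Same regions as defined above, as (min lon, min lat, max lon, max lat)
northern_africa = box(-20, 0, 50, 40)
southern_africa = box(10, -40, 50, 0)
combined_africa = unary_union([northern_africa, southern_africa])
all_africa = box(-20, -40, 50, 40)

# Hypothetical test point in the South Atlantic, west of southern Africa
atlantic_point = Point(-10, -20)

# The single bounding box includes the ocean point; the combined region does not
print(all_africa.contains(atlantic_point))       # True
print(combined_africa.contains(atlantic_point))  # False

# Areas in square degrees: the combined region is smaller than the bounding box
print(all_africa.area, combined_africa.area)     # 5600.0 4400.0
```

The areas here are in square degrees on a flat lat/lon plane, so they are only useful for comparing the two shapes, not as physical areas.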