Using dask with iris

Below I'm attempting to calculate the annual-mean timeseries for a dataset large enough that, without dask, I get a memory error.

Prepare cube


In [1]:
import warnings
warnings.filterwarnings('ignore')

import glob
import iris
from iris.experimental.equalise_cubes import equalise_attributes  # in iris >= 3.0 this moved to iris.util.equalise_attributes
import iris.coord_categorisation

In [2]:
infiles = glob.glob('/g/data/ua6/DRSv3/CMIP5/CCSM4/historical/mon/ocean/r1i1p1/thetao/latest/thetao_Omon_CCSM4_historical_r1i1p1_??????-??????.nc')
infiles.sort()

In [3]:
cube_list = iris.load(infiles)

In [4]:
cube_list


Out[4]:
[<iris 'Cube' of sea_water_potential_temperature / (K) (time: 120; depth: 60; cell index along second dimension: 384; cell index along first dimension: 320)>,
<iris 'Cube' of sea_water_potential_temperature / (K) (time: 120; depth: 60; cell index along second dimension: 384; cell index along first dimension: 320)>,
<iris 'Cube' of sea_water_potential_temperature / (K) (time: 120; depth: 60; cell index along second dimension: 384; cell index along first dimension: 320)>,
<iris 'Cube' of sea_water_potential_temperature / (K) (time: 120; depth: 60; cell index along second dimension: 384; cell index along first dimension: 320)>,
<iris 'Cube' of sea_water_potential_temperature / (K) (time: 120; depth: 60; cell index along second dimension: 384; cell index along first dimension: 320)>,
<iris 'Cube' of sea_water_potential_temperature / (K) (time: 120; depth: 60; cell index along second dimension: 384; cell index along first dimension: 320)>,
<iris 'Cube' of sea_water_potential_temperature / (K) (time: 120; depth: 60; cell index along second dimension: 384; cell index along first dimension: 320)>,
<iris 'Cube' of sea_water_potential_temperature / (K) (time: 120; depth: 60; cell index along second dimension: 384; cell index along first dimension: 320)>,
<iris 'Cube' of sea_water_potential_temperature / (K) (time: 120; depth: 60; cell index along second dimension: 384; cell index along first dimension: 320)>,
<iris 'Cube' of sea_water_potential_temperature / (K) (time: 120; depth: 60; cell index along second dimension: 384; cell index along first dimension: 320)>,
<iris 'Cube' of sea_water_potential_temperature / (K) (time: 120; depth: 60; cell index along second dimension: 384; cell index along first dimension: 320)>,
<iris 'Cube' of sea_water_potential_temperature / (K) (time: 120; depth: 60; cell index along second dimension: 384; cell index along first dimension: 320)>,
<iris 'Cube' of sea_water_potential_temperature / (K) (time: 120; depth: 60; cell index along second dimension: 384; cell index along first dimension: 320)>,
<iris 'Cube' of sea_water_potential_temperature / (K) (time: 120; depth: 60; cell index along second dimension: 384; cell index along first dimension: 320)>,
<iris 'Cube' of sea_water_potential_temperature / (K) (time: 120; depth: 60; cell index along second dimension: 384; cell index along first dimension: 320)>,
<iris 'Cube' of sea_water_potential_temperature / (K) (time: 72; depth: 60; cell index along second dimension: 384; cell index along first dimension: 320)>]

In [5]:
equalise_attributes(cube_list)
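`equalise_attributes` strips any attributes whose values differ between the cubes, since mismatched attributes (e.g. per-file history strings) would otherwise block concatenation. Conceptually it does something like this sketch, using made-up attribute dictionaries rather than real cube metadata:

```python
# Toy attribute dicts standing in for cube.attributes (illustration only).
dicts = [{'model': 'CCSM4', 'history': 'run1'},
         {'model': 'CCSM4', 'history': 'run2'}]

# Keep only the keys whose values agree across every dict.
common = {k: v for k, v in dicts[0].items()
          if all(d.get(k) == v for d in dicts)}

for d in dicts:
    for k in list(d):
        if k not in common:
            del d[k]  # drop the disagreeing attribute ('history' here)

print(dicts)  # [{'model': 'CCSM4'}, {'model': 'CCSM4'}]
```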

In [6]:
cube = cube_list.concatenate_cube()
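`concatenate_cube` joins the 16 cubes along their shared time dimension (15 × 120 + 72 = 1872 months in total). The equivalent array operation, sketched with numpy on toy-sized chunks rather than the real data:

```python
import numpy as np

# Two toy "file" chunks that agree on every dimension except time.
chunk_a = np.zeros((120, 60, 4, 3))   # 120 months
chunk_b = np.zeros((72, 60, 4, 3))    # 72 months

# Join along axis 0 (time), as concatenate_cube does for the cube list.
combined = np.concatenate([chunk_a, chunk_b], axis=0)
print(combined.shape)  # (192, 60, 4, 3)
```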

In [7]:
iris.coord_categorisation.add_year(cube, 'time')
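`add_year` attaches an auxiliary `year` coordinate derived from `time`, which is what lets `aggregated_by(['year'], iris.analysis.MEAN)` below group time slices by year and average each group. A minimal pure-Python sketch of that grouping, with made-up monthly values rather than the real cube data:

```python
from collections import defaultdict

# Hypothetical (year, value) pairs standing in for monthly fields.
monthly = [(1850, 10.0), (1850, 12.0), (1851, 14.0), (1851, 18.0)]

# Group values by their year tag...
groups = defaultdict(list)
for year, value in monthly:
    groups[year].append(value)

# ...then average within each group.
annual_mean = {year: sum(vals) / len(vals) for year, vals in groups.items()}
print(annual_mean)  # {1850: 11.0, 1851: 16.0}
```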

Using dask for the memory-intensive calculation


In [8]:
from dask.distributed import LocalCluster
from dask.distributed import Client

In [9]:
cluster = LocalCluster(n_workers=4)
cluster



In [10]:
client = Client(cluster)
client


Out[10]:

Client

  • Workers: 4
  • Cores: 8
  • Memory: 33.67 GB

In [ ]:
test = cube.aggregated_by(['year'], iris.analysis.MEAN)

The kernel always dies while executing the test = ... cell above, which I assume means the calculation exceeds the available RAM.
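One workaround I could try while debugging is to avoid ever holding the full array: keep a running sum and count per year and fold in one time slice at a time. A sketch with synthetic data, where the hypothetical `monthly_slices` generator stands in for reading one monthly field from disk:

```python
import numpy as np

def monthly_slices(n_years=2, shape=(4, 3)):
    """Stand-in for reading one monthly field at a time from disk."""
    rng = np.random.default_rng(0)
    for year in range(n_years):
        for _ in range(12):
            yield year, rng.random(shape)

sums, counts = {}, {}
for year, field in monthly_slices():
    if year not in sums:
        sums[year] = np.zeros_like(field)
        counts[year] = 0
    sums[year] += field    # running sum: only one field in memory at a time
    counts[year] += 1

annual_means = {year: sums[year] / counts[year] for year in sums}
```

This trades dask's parallelism for a fixed, small memory footprint, at the cost of streaming through the files sequentially.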

