Using dask with iris

Below I'm attempting to calculate the annual-mean timeseries for a dataset large enough that, without dask, I get a memory error.

Prepare cube


In [1]:
import warnings
warnings.filterwarnings('ignore')

import glob
import iris
from iris.experimental.equalise_cubes import equalise_attributes  # in iris >= 3.0 this moved to iris.util.equalise_attributes
import iris.coord_categorisation

In [2]:
infiles = glob.glob('/g/data/ua6/DRSv3/CMIP5/CCSM4/historical/mon/ocean/r1i1p1/thetao/latest/thetao_Omon_CCSM4_historical_r1i1p1_??????-??????.nc')
infiles.sort()

In [3]:
cube_list = iris.load(infiles)

In [4]:
cube_list


Out[4]:
[<iris 'Cube' of sea_water_potential_temperature / (K) (time: 120; depth: 60; cell index along second dimension: 384; cell index along first dimension: 320)>,
<iris 'Cube' of sea_water_potential_temperature / (K) (time: 120; depth: 60; cell index along second dimension: 384; cell index along first dimension: 320)>,
<iris 'Cube' of sea_water_potential_temperature / (K) (time: 120; depth: 60; cell index along second dimension: 384; cell index along first dimension: 320)>,
<iris 'Cube' of sea_water_potential_temperature / (K) (time: 120; depth: 60; cell index along second dimension: 384; cell index along first dimension: 320)>,
<iris 'Cube' of sea_water_potential_temperature / (K) (time: 120; depth: 60; cell index along second dimension: 384; cell index along first dimension: 320)>,
<iris 'Cube' of sea_water_potential_temperature / (K) (time: 120; depth: 60; cell index along second dimension: 384; cell index along first dimension: 320)>,
<iris 'Cube' of sea_water_potential_temperature / (K) (time: 120; depth: 60; cell index along second dimension: 384; cell index along first dimension: 320)>,
<iris 'Cube' of sea_water_potential_temperature / (K) (time: 120; depth: 60; cell index along second dimension: 384; cell index along first dimension: 320)>,
<iris 'Cube' of sea_water_potential_temperature / (K) (time: 120; depth: 60; cell index along second dimension: 384; cell index along first dimension: 320)>,
<iris 'Cube' of sea_water_potential_temperature / (K) (time: 120; depth: 60; cell index along second dimension: 384; cell index along first dimension: 320)>,
<iris 'Cube' of sea_water_potential_temperature / (K) (time: 120; depth: 60; cell index along second dimension: 384; cell index along first dimension: 320)>,
<iris 'Cube' of sea_water_potential_temperature / (K) (time: 120; depth: 60; cell index along second dimension: 384; cell index along first dimension: 320)>,
<iris 'Cube' of sea_water_potential_temperature / (K) (time: 120; depth: 60; cell index along second dimension: 384; cell index along first dimension: 320)>,
<iris 'Cube' of sea_water_potential_temperature / (K) (time: 120; depth: 60; cell index along second dimension: 384; cell index along first dimension: 320)>,
<iris 'Cube' of sea_water_potential_temperature / (K) (time: 120; depth: 60; cell index along second dimension: 384; cell index along first dimension: 320)>,
<iris 'Cube' of sea_water_potential_temperature / (K) (time: 72; depth: 60; cell index along second dimension: 384; cell index along first dimension: 320)>]

In [5]:
equalise_attributes(cube_list)
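`equalise_attributes` strips any attributes whose values differ between the cubes, since mismatched attributes (e.g. per-file history strings) would otherwise block concatenation. Conceptually it does something like this sketch, using made-up attribute dictionaries rather than real cube metadata:

```python
# Toy attribute dicts standing in for cube.attributes (illustration only).
dicts = [{'model': 'CCSM4', 'history': 'run1'},
         {'model': 'CCSM4', 'history': 'run2'}]

# Keep only the keys whose values agree across every dict.
common = {k: v for k, v in dicts[0].items()
          if all(d.get(k) == v for d in dicts)}

for d in dicts:
    for k in list(d):
        if k not in common:
            del d[k]  # drop the disagreeing attribute ('history' here)

print(dicts)  # [{'model': 'CCSM4'}, {'model': 'CCSM4'}]
```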

In [6]:
cube = cube_list.concatenate_cube()
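`concatenate_cube` joins the 16 cubes along their shared time dimension (15 × 120 + 72 = 1872 months in total). The equivalent array operation, sketched with numpy on toy-sized chunks rather than the real data:

```python
import numpy as np

# Two toy "file" chunks that agree on every dimension except time.
chunk_a = np.zeros((120, 60, 4, 3))   # 120 months
chunk_b = np.zeros((72, 60, 4, 3))    # 72 months

# Join along axis 0 (time), as concatenate_cube does for the cube list.
combined = np.concatenate([chunk_a, chunk_b], axis=0)
print(combined.shape)  # (192, 60, 4, 3)
```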

In [7]:
iris.coord_categorisation.add_year(cube, 'time')
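`add_year` attaches an auxiliary `year` coordinate derived from `time`, which is what lets `aggregated_by(['year'], iris.analysis.MEAN)` below group time slices by year and average each group. A minimal pure-Python sketch of that grouping, with made-up monthly values rather than the real cube data:

```python
from collections import defaultdict

# Hypothetical (year, value) pairs standing in for monthly fields.
monthly = [(1850, 10.0), (1850, 12.0), (1851, 14.0), (1851, 18.0)]

# Group values by their year tag...
groups = defaultdict(list)
for year, value in monthly:
    groups[year].append(value)

# ...then average within each group.
annual_mean = {year: sum(vals) / len(vals) for year, vals in groups.items()}
print(annual_mean)  # {1850: 11.0, 1851: 16.0}
```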

Using dask for the memory-intensive calculation


In [8]:
from dask.distributed import LocalCluster
from dask.distributed import Client

In [9]:
cluster = LocalCluster(n_workers=4)
cluster



In [10]:
client = Client(cluster)
client


Out[10]:

Client

  • Workers: 4
  • Cores: 8
  • Memory: 33.67 GB

In [ ]:
test = cube.aggregated_by(['year'], iris.analysis.MEAN)

The kernel always dies while executing the test = ... cell above, which I assume means the calculation exceeds the available RAM.
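One workaround I could try while debugging is to avoid ever holding the full array: keep a running sum and count per year and fold in one time slice at a time. A sketch with synthetic data, where the hypothetical `monthly_slices` generator stands in for reading one monthly field from disk:

```python
import numpy as np

def monthly_slices(n_years=2, shape=(4, 3)):
    """Stand-in for reading one monthly field at a time from disk."""
    rng = np.random.default_rng(0)
    for year in range(n_years):
        for _ in range(12):
            yield year, rng.random(shape)

sums, counts = {}, {}
for year, field in monthly_slices():
    if year not in sums:
        sums[year] = np.zeros_like(field)
        counts[year] = 0
    sums[year] += field    # running sum: only one field in memory at a time
    counts[year] += 1

annual_means = {year: sums[year] / counts[year] for year in sums}
```

This trades dask's parallelism for a fixed, small memory footprint, at the cost of streaming through the files sequentially.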

