In [ ]:
%matplotlib inline

An introduction to Iris


In [ ]:
import iris
import numpy as np

In [ ]:
print iris.__version__
print np.__version__

Discuss

What is Iris?

Iris and the cube

The top level object in Iris is called a cube. A cube contains data and metadata about a single phenomenon and is an implementation of the data model interpreted from the Climate and Forecast (CF) Metadata Conventions.

Each cube has:

  • A data array (typically a NumPy array).
  • A "name", preferably a CF "standard name" to describe the phenomenon that the cube represents.
  • A collection of coordinates to describe each of the dimensions of the data array. These coordinates are split into two types:
    • Dimensioned coordinates are numeric, monotonic and represent a single dimension of the data array. There may be only one dimensioned coordinate per data dimension.
    • Auxiliary coordinates can be of any type, including discrete values such as strings, and may represent more than one data dimension.

A fuller explanation is available in the Iris user guide.

Let's take a simple example to demonstrate the cube concept.

Suppose we have a (3, 2, 4) NumPy array, where dimensions 0, 1 and 2 have lengths 3, 2 and 4 respectively.

The Iris cube to represent this data may consist of:

  • a standard name of "air_temperature" and units of "kelvin"

  • a data array of shape (3, 2, 4)

  • a coordinate, mapping to dimension 0, consisting of:

    • a standard name of "height" and units of "meters"
    • an array of length 3 representing the 3 height points
  • a coordinate, mapping to dimension 1, consisting of:

    • a standard name of "latitude" and units of "degrees"
    • an array of length 2 representing the 2 latitude points
    • a coordinate system such that the latitude points could be fully located on the globe
  • a coordinate, mapping to dimension 2, consisting of:

    • a standard name of "longitude" and units of "degrees"
    • an array of length 4 representing the 4 longitude points
    • a coordinate system such that the longitude points could be fully located on the globe

Pictorially, the cube has taken on more information than a simple array.
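
Although in practice we would nearly always load such a cube from a file (see the next section), a minimal sketch of building this example cube by hand might look something like the following; the coordinate and data values below are invented purely for illustration:


In [ ]:
import numpy as np
import iris
import iris.cube
import iris.coords
import iris.coord_systems

# Illustrative data and coordinate values for the (3, 2, 4) example above.
data = np.arange(24, dtype=np.float32).reshape(3, 2, 4)
cube = iris.cube.Cube(data, standard_name='air_temperature', units='kelvin')

globe = iris.coord_systems.GeogCS(6371229.0)
height = iris.coords.DimCoord([100., 200., 300.], standard_name='height', units='m')
latitude = iris.coords.DimCoord([-30., 30.], standard_name='latitude',
                                units='degrees', coord_system=globe)
longitude = iris.coords.DimCoord([0., 90., 180., 270.], standard_name='longitude',
                                 units='degrees', coord_system=globe)

# Attach each coordinate to the data dimension it describes.
cube.add_dim_coord(height, 0)
cube.add_dim_coord(latitude, 1)
cube.add_dim_coord(longitude, 2)

print cube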

Working with a cube

Whilst it is possible to construct a cube by hand, a far more common approach to getting hold of a cube is to use the Iris load function to access data that already exists in a file.


In [ ]:
fname = iris.sample_data_path('uk_hires.pp')
cubes = iris.load(fname)
print cubes

We can see that we've loaded two cubes, one representing the "surface_altitude" and the other representing "air_potential_temperature". We can infer even more detail from this printout; for example, what are the dimensions and shape of the "air_potential_temperature" cube?

Above we've printed the iris.cube.CubeList instance representing all of the cubes found in the given filename. However, we can see more detail by printing individual cubes:


In [ ]:
air_pot_temp = cubes[0]
print air_pot_temp

We can dig even deeper and print individual coordinates:


In [ ]:
print air_pot_temp.coord('model_level_number')

Cube attributes


In [ ]:
cube = iris.load_cube(iris.sample_data_path('A1B_north_america.nc'))
print cube

A cube's data array can be accessed via its data property. This is either a NumPy array or, in some cases, a NumPy masked array. It is very important to note that, for most of the file types supported by Iris, the cube's data isn't actually loaded until you request it via this property (either directly or indirectly). Once you've accessed the data, it is stored on the cube and won't be loaded from disk again.

To find the shape of a cube's data it is possible to call cube.data.shape or cube.data.ndim, but doing so will trigger any unloaded data to be loaded. The shape and ndim properties are therefore available directly on the cube, and they do not load the data unnecessarily.


In [ ]:
print cube.shape
print cube.ndim
print type(cube.data)

The standard_name, long_name and, to an extent, var_name attributes all describe the phenomenon that the cube represents. The name() method is a convenience that looks at the name attributes in the order they are listed above, returning the first non-empty string. To rename a cube, it is possible to set the attributes manually, but it is generally easier to use the rename() method.


In [ ]:
print cube.standard_name
print cube.long_name
print cube.var_name
print cube.name()

In [ ]:
cube.rename("A name that isn't a valid CF standard name")

In [ ]:
print cube.standard_name
print cube.long_name
print cube.var_name
print cube.name()

The units attribute on a cube tells us the units of the numbers held in the data array. We can manually change the units, or better, we can convert the cube to another unit using the convert_units method, which will automatically update the data array.


In [ ]:
print cube.units
print cube.data.max()
cube.convert_units('Celsius')
print cube.units
print cube.data.max()

A cube has a dictionary for extra general purpose attributes, which can be accessed with the cube.attributes attribute:


In [ ]:
print cube.attributes
print cube.attributes['STASH']

A less frequently used attribute on a cube is its cell_methods. The cell methods are a way to store information about the processing that has taken place on the cube.


In [ ]:
for cell_method in cube.cell_methods:
    print cell_method

In this case we can see that the cube has been produced by taking a mean of forecasts sampled at 6-hourly intervals (we need to look at the time coordinate to identify any more information).

Coordinates

As we've seen, cubes need coordinate information to help us describe the underlying phenomenon. Typically a cube's coordinates are accessed with the coords or coord methods. The latter must return exactly one coordinate for the given parameter filters, whereas the former returns a list of matching coordinates, possibly of length 0.
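
To see the difference, here is a small sketch; 'not_a_coordinate' is just a made-up name with no match on this cube:


In [ ]:
print len(cube.coords('time'))          # exactly one matching coordinate
print cube.coords('not_a_coordinate')   # no match: an empty list rather than an error
# By contrast, cube.coord('not_a_coordinate') would raise a CoordinateNotFoundError.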

For example, to access the time coordinate, and print the first 4 times:


In [ ]:
time = cube.coord('time')
print time[:4]

Along with the cell method from the previous section, we can now see that this cube represents the mean annual air temperature, sampled every 6 hours, starting in 1860.

The coordinate interface is very similar to that of a cube. The attributes that exist on both cubes and coordinates are: standard_name, long_name, var_name, units, attributes and shape. Similarly, the name(), rename() and convert_units() methods also exist on a coordinate.

A coordinate does not have data; instead it has points and bounds (bounds may be None). In Iris, time coordinates are currently represented as "a number since an epoch":


In [ ]:
print repr(time.units)
print time.points[:4]
print time.bounds[:4]

These numbers can be converted to datetime objects with the unit's num2date method. Dates can be converted back again with the date2num method:


In [ ]:
import datetime

print time.units.num2date(time.points[:4])
print time.units.date2num(datetime.datetime(1970, 2, 1))

Another important attribute on a coordinate is its coordinate system. Coordinate systems may be None for trivial coordinates, but particularly for spatial coordinates, they may be complex definitions of things such as the projection, ellipse and/or datum.


In [ ]:
lat = cube.coord('latitude')
print lat.coord_system

In this case, the latitude's coordinate system is a simple geographic latitude on a spherical globe of radius 6371229 (meters).

Sometimes it is desirable to add bounds to a coordinate that doesn't have any. This is often the case for creating "block" type plots where a coordinate should be able to represent an interval of values, rather than a single point. The guess_bounds method on a coordinate is useful in this regard. For example, the latitude coordinate previously obtained does not have bounds, but we can either set some manually, or use the guess_bounds method:


In [ ]:
print lat.points[:4]
print lat.bounds
if lat.bounds is None:
    lat.guess_bounds()
print lat.bounds[:4]

Exercise 1

1. Using the file in iris.sample_data_path('atlantic_profiles.nc') load the data and print the cube list. Store these cubes in a variable called cubes.


In [ ]:

2. Loop through each of the cubes (e.g. for cube in cubes) and print the standard name of each.


In [ ]:

3. Extract the "sea_water_potential_temperature" cube. Print the minimum, maximum, mean and standard deviation of the cube's data.


In [ ]:

4. Print the attributes of the cube.


In [ ]:

5. Print the names of all coordinates on the cube. (Hint: Remember the cube.coords method without any keywords will give us all of the cube's coordinates)


In [ ]:

6. Get hold of the "latitude" coordinate on the cube. Identify whether this coordinate has bounds. Print the minimum and maximum latitude points in the cube.


In [ ]:

Loading data into Iris

We've already seen the basic load function, but we can also control which cubes are actually loaded with constraints. The simplest constraint is just a string, which filters cubes based on their name:


In [ ]:
fname = iris.sample_data_path('uk_hires.pp')
print iris.load(fname, 'air_potential_temperature')

Note on sample_data_path:

Throughout this course we will make use of the sample data that Iris provides. The function iris.sample_data_path returns the appropriate path to the file in the Iris sample data collection. A common mistake for Iris users is to use the sample_data_path function to access data that is not part of Iris's sample data collection - this is bad practice and is unlikely to work in the future.

Exercise 2

Print the result of iris.sample_data_path('uk_hires.pp') to verify that it returns a string pointing to a file on your system. Use this string directly in the call to iris.load and confirm the result is the same as in the previous example e.g.:

print iris.load('/path/to/iris/sampledata/uk_hires.pp', 'air_potential_temperature')

In [ ]:

The three load functions: load, load_cube and load_cubes

There are three main load functions in Iris: load, load_cube and load_cubes.

  1. load is a general purpose loading function. Typically this is where all data analysis will start, before loading is refined with the more controlled behaviour of the other two functions.
  2. load_cube returns a single cube from the given source(s) and constraint. There will be exactly one cube, or an exception will be raised.
  3. load_cubes returns a list of cubes from the given source(s) and constraint(s). There will be exactly one cube per constraint, or an exception will be raised.

Note: load_cube is a special case of load_cubes, which can be seen with:


In [ ]:
c1, = iris.load(fname, 'surface_altitude')
c2 = iris.load_cube(fname, 'surface_altitude')
c3, = iris.load_cubes(fname, 'surface_altitude')
c1 == c2 == c3

In general, it is a good idea to make use of the load_cube/load_cubes functions rather than the generic load function in non-exploratory code. Doing so makes your code more resilient to changes in the data source, often results in more readable and maintainable code and, in combination with well defined constraints, often leads to improved load performance.

The load functions all accept a list of filenames to load, and any of the filenames can be "glob" patterns (http://docs.python.org/2/library/glob.html).
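
For example, here is a quick sketch using two of the GloSea4 sample files (which also appear later in this course); both forms load the same data:


In [ ]:
# Loading from an explicit list of filenames ...
fnames = [iris.sample_data_path('GloSea4', 'ensemble_003.pp'),
          iris.sample_data_path('GloSea4', 'ensemble_004.pp')]
print iris.load(fnames, 'surface_temperature')

# ... or from a glob pattern matching just those two files.
print iris.load(iris.sample_data_path('GloSea4', 'ensemble_00[34].pp'),
                'surface_temperature')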

Exercise 2 (continued)

Read in the files found at iris.sample_data_path('GloSea4', 'ensemble_010.pp') and iris.sample_data_path('GloSea4', 'ensemble_011.pp') using a single load call. Do this by:

1. providing a list of the two filenames.


In [ ]:

2. providing a suitable glob pattern. (Notice that iris.load(iris.sample_data_path('GloSea4', 'ensemble_01*.pp')) picks up too many files.)


In [ ]:

Constraints

Iris's constraints mechanism provides a powerful way to filter a subset of data from a larger collection. We've already seen that constraints can be used at load time to return data of interest from a file, but we can also apply constraints to a single cube, or a list of cubes, using their respective extract methods:


In [ ]:
fname = iris.sample_data_path('uk_hires.pp')
cubes = iris.load(fname)
print cubes.extract('air_potential_temperature')

The simplest constraint, namely a string that matches a cube's name, is conveniently converted into an actual iris.Constraint instance wherever needed. However, we could construct this constraint manually and compare with the previous result:


In [ ]:
pot_temperature_constraint = iris.Constraint('air_potential_temperature')
print cubes.extract(pot_temperature_constraint)

The Constraint constructor also takes arbitrary keywords to constrain coordinate values. For example, to extract model level number 10 from the air potential temperature cube:


In [ ]:
pot_temperature_constraint = iris.Constraint('air_potential_temperature',
                                             model_level_number=10)
print cubes.extract(pot_temperature_constraint)

We can pass a list of possible values, and even combine two constraints with &:


In [ ]:
print cubes.extract('air_potential_temperature' & 
                    iris.Constraint(model_level_number=[4, 10]))

We can define arbitrary functions that operate on each cell of a coordinate. This is a common thing to do for floating point coordinates, where exact equality is non-trivial.


In [ ]:
def less_than_10(cell):
    """Return True for values that are less than 10."""
    return cell < 10

print cubes.extract(iris.Constraint('air_potential_temperature',
                                    model_level_number=less_than_10))

Because Iris cells represent both point and bound, cell comparison can sometimes be counter-intuitive:


In [ ]:
def cell_comparison(cell, value):
    print 'cell > {0!r} is {1}'.format(value, cell > value)
    print 'cell >= {0!r} is {1}'.format(value, cell >= value)
    print 'cell == {0!r} is {1}'.format(value, cell == value)
    print 'cell <= {0!r} is {1}'.format(value, cell <= value)
    print 'cell < {0!r} is {1}'.format(value, cell < value)

cell = iris.coords.Cell(point=10, bound=[8, 12])
cell_comparison(cell, 12)

If you want full control of how cell comparison is taking place, you can always compare with another cell, or just access the cell's individual point or bound:


In [ ]:
cell_1 = iris.coords.Cell(point=10, bound=[8, 12])
cell_2 = iris.coords.Cell(point=11, bound=None)

cell_comparison(cell_1, 11)
print
cell_comparison(cell_1, cell_2)

Time Constraints

It is common to want to build a constraint for time. With Iris < v1.6 it was harder to build time constraints than we would have liked, because of the way that time coordinates had been implemented.

However, since v1.6 this has been made simpler through the ability to compare against cells containing datetimes. The functionality can be enabled globally within the session (and will be enabled by default in a future release of Iris) with:


In [ ]:
iris.FUTURE.cell_datetime_objects = True

We can now make time constraints as follows:


In [ ]:
time_constraint = iris.Constraint(
    time=lambda c: c >= datetime.datetime(2009, 11, 19, 11, 0))
print air_pot_temp.extract(time_constraint).summary(True)

There are currently still some limitations. For example, it is not yet possible to do cell-based datetime comparisons when the datetime is from anything other than a Gregorian calendar (such as the 360-day calendar often used in climate models). When this is the case, however, we can always access individual components of the datetime and do comparisons on those:


In [ ]:
time_constraint = iris.Constraint(time=lambda c: c.point.hour == 11)
print air_pot_temp.extract(time_constraint).summary(True)

Further functionality has been added to isolate individual components of a datetime via the PartialDateTime class. In this case we can extract the timestep at the 11th hour with:


In [ ]:
from iris.time import PartialDateTime
eleventh_hour = iris.Constraint(time=PartialDateTime(hour=11))
print air_pot_temp.extract(eleventh_hour).summary(True)
print air_pot_temp.extract(eleventh_hour).coord('time')

Exercise 3

The following function tells us whether or not a cube has cell methods:

def has_cell_methods(cube):
    return len(cube.cell_methods) > 0

1. With the cubes loaded from [iris.sample_data_path('A1B_north_america.nc'), iris.sample_data_path('uk_hires.pp')] use the CubeList's extract method to filter only the cubes that have cell methods. (Hint: Look at the iris.Constraint documentation for the cube_func keyword). You should find that the 3 cubes are whittled down to just 1.


In [ ]:

2. Using the file found at iris.sample_data_path('A1B_north_america.nc') filter the cube, using constraints, such that only data between 1860 and 1980 remains (hint: This data has a 360-day calendar with yearly data from 1860 to 2100, so we will need to access the individual components of the cell point's datetime, to return a time dimension of length 120).


In [ ]:

Saving cubes

The iris.save function provides a convenient interface to save Cube and CubeList instances.

To save some cubes to a NetCDF file:


In [ ]:
fname = iris.sample_data_path('uk_hires.pp')
cubes = iris.load(fname)
iris.save(cubes, 'saved_cubes.nc')

In [ ]:
!ncdump -h saved_cubes.nc | head -n 20
!rm saved_cubes.nc

Extra keywords can be passed to specific file format savers.
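
For instance, a sketch (assuming the NetCDF saver's netcdf_format keyword, which is documented in the iris.save reference documentation; other savers accept their own keywords):


In [ ]:
# Extra keywords given to iris.save are forwarded to the relevant file format saver.
# Here we ask the NetCDF saver for the NetCDF "classic" format rather than NetCDF4.
iris.save(cubes, 'saved_cubes_classic.nc', netcdf_format='NETCDF3_CLASSIC')

!ncdump -k saved_cubes_classic.nc
!rm saved_cubes_classic.nc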

Task: Go to the Iris reference documentation for iris.save. What file formats can Iris currently save to? What keywords are accepted by iris.save when saving a PP file?

Indexing

Cubes can be indexed in a manner similar to NumPy arrays:


In [ ]:
fname = iris.sample_data_path('uk_hires.pp')
cube = iris.load_cube(fname, 'air_potential_temperature')
print cube.summary(shorten=True)

In [ ]:
subcube = cube[..., ::2, 15:35, :10]
subcube.summary(shorten=True)

Note: the result of indexing a cube is always a copy and never a view on the original data.
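
A quick sketch to demonstrate this; the -9999 value below is just an arbitrary marker:


In [ ]:
# Modify the data of the indexed copy; the original cube is unaffected.
subcopy = cube[0, 0]
subcopy.data[0, 0] = -9999.0
print subcopy.data[0, 0]          # -9999.0
print cube.data[0, 0, 0, 0]       # the original value, unchanged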

Merging

When Iris loads data it tries to reduce the number of cubes returned by collecting together multiple fields with shared metadata into a single multidimensional cube. In Iris, this is known as merging.

In order to merge two cubes, they must be identical in everything but a scalar coordinate, which goes on to become a new data dimension.

The iris.load_raw function can be used as a diagnostic tool to identify the individual "fields" that Iris identifies in a given set of filenames before any merge takes place:


In [ ]:
fname = iris.sample_data_path('GloSea4', 'ensemble_008.pp')
raw_cubes = iris.load_raw(fname)

print len(raw_cubes)

When we look in detail at these cubes, we find that they are identical in every coordinate except for the scalar forecast_period and time coordinates:


In [ ]:
print raw_cubes[0]
print
print raw_cubes[1]

Any CubeList can be merged with the merge method, and the resulting CubeList from load_raw is no different. The merge method always returns another CubeList:


In [ ]:
merged_cube, = raw_cubes.merge()
print merged_cube

When we look in more detail, we can see that the time coordinate has become a new dimension, and that the cube has also gained a forecast_period auxiliary coordinate:


In [ ]:
print merged_cube.coord('time')
print merged_cube.coord('forecast_period')

Identifying merge problems

To avoid the Iris merge functionality making potentially inappropriate assumptions about incoming data, merge is strict with regard to the uniformity of the incoming cubes.

For example, if we load the fields from two ensemble members from the GloSea4 model sample data, we see we have 12 fields before any merge takes place:


In [ ]:
fname = iris.sample_data_path('GloSea4', 'ensemble_00[34].pp')
cubes = iris.load_raw(fname, 'surface_temperature')
print len(cubes)

If we try to merge these 12 cubes we get 2 cubes rather than one:


In [ ]:
incomplete_cubes = cubes.merge(unique=False)
print incomplete_cubes

When we look in more detail at these two cubes, what is different between the two? (Hint: One value changes, another is completely missing)


In [ ]:
print incomplete_cubes[0]
print '--' * 50
print incomplete_cubes[1]

By adding the missing coordinate, we can trigger a merge of the 12 cubes into a single cube, as expected:


In [ ]:
for cube in cubes:
    if not cube.coords('realization'):
        cube.add_aux_coord(iris.coords.DimCoord(np.int32(3),
                                                'realization'))

merged_cubes = cubes.merge()
print merged_cubes

Iris v1.7 and beyond includes functionality to simplify the identification process for causes of failed merges. The merge_cube method of a CubeList expects the list of cubes to contain only cubes that can be merged to produce a single cube. If they do not merge to a single cube, a descriptive exception will be raised. For instance:

   >>> cubes.merge_cube()
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     ...
   iris.exceptions.MergeError: failed to merge into a single cube.
     Coordinates in cube.aux_coords (scalar) differ: realization.

Exercise 4

The following exercise is designed to give you experience of solving issues that prevent a merge from taking place. The output from merge_cube is included to help with identification, and once a fix has been identified, raw_cubes.merge() should result in a CubeList containing a single cube:

The first exercise is completed below:

1. Identify and resolve the issue preventing resources/merge_exercise.1.*.nc from merging.

>>> raw_cubes = iris.load_raw('../resources/merge_exercise.1.*.nc')
>>> raw_cubes.merge_cube()
Traceback (most recent call last):
...
iris.exceptions.MergeError: failed to merge into a single cube.
  cube.attributes keys differ: 'History'

In [ ]:
raw_cubes = iris.load_raw('../resources/merge_exercise.1.*.nc')

# Print the attributes, clearly one is different.
for cube in raw_cubes:
    print cube.attributes

# Remove the history attribute from the first cube.
del raw_cubes[0].attributes['History']

# Check that this has meant that a merge now results in a single cube.
print raw_cubes.merge()

2. Identify and resolve the issue preventing resources/merge_exercise.2.*.nc from merging.

>>> raw_cubes = iris.load_raw('../resources/merge_exercise.2.*.nc')
>>> raw_cubes.merge_cube()
Traceback (most recent call last):
  iris.exceptions.MergeError: failed to merge into a single cube.
  cube.long_name differs: u'The first timestep' != u'The second timestep'

In [ ]:

3. (extension) Identify and resolve the issue preventing resources/merge_exercise.4.*.nc from merging.

>>> raw_cubes = iris.load_raw('../resources/merge_exercise.4.*.nc')
>>> raw_cubes.merge_cube()
Traceback (most recent call last):
  iris.exceptions.MergeError: failed to merge into a single cube.
  cube data dtype differs: float32 != float64

In [ ]:

4. (extension) Identify and resolve the issue preventing resources/merge_exercise.5.*.nc from merging (hint: Cubes can be indexed like NumPy arrays).

>>> raw_cubes = iris.load_raw('../resources/merge_exercise.5.*.nc')
>>> raw_cubes.merge_cube()
Traceback (most recent call last):
  iris.exceptions.MergeError: failed to merge into a single cube.
  cube.shape differs: (100, 100) != (1, 100, 100)

Concatenate

We have seen that merge combines a list of cubes with a common scalar coordinate to produce a single cube with a new dimension created from these scalar values.

But what happens if you try to combine cubes along a common dimension?


In [ ]:
fname = iris.sample_data_path('A1B_north_america.nc')
cube = iris.load_cube(fname)

cube_1 = cube[:10]
cube_2 = cube[10:20]
cubes = iris.cube.CubeList([cube_1, cube_2])
print cubes

Surely these cubes can be merged back together; after all, they have both come from the same original cube!


In [ ]:
print cubes.merge()

Merge cannot be used to combine common non-scalar coordinates. Instead we must use concatenate, which joins together ("concatenates") common non-scalar coordinates to produce a single cube with the common dimension extended:


In [ ]:
print cubes.concatenate()

As with merge, Iris contains functionality to simplify the identification process for causes of failed concatenations. The concatenate_cube method of a CubeList expects the list of cubes to contain only cubes that can be concatenated to produce a single cube. If they do not concatenate to a single cube, a descriptive error will be raised. For instance:

    >>> print cubes.concatenate_cube()
    Traceback (most recent call last):
      ...
    iris.exceptions.ConcatenateError: failed to concatenate into a single cube.
      Scalar coordinates differ: forecast_reference_time, height != forecast_reference_time

Load callbacks

Sometimes important data exists in a filename rather than in the file itself, and it is desirable for it to become part of the cube's metadata. For example, some early GloSea4 model runs recorded the "ensemble member number" (or "realization" in CF terms) in the filename, but not in the PP metadata itself. As a result, loading the data yielded 2 cubes rather than a single, fully merged cube.


In [ ]:
fname = iris.sample_data_path('GloSea4', 'ensemble_00[34].pp')
for cube in iris.load(fname, 'surface_temperature'):
    print cube, '\n-----'

To resolve this we can define a function that gets called during the load process. This must take as arguments:

  • a cube,
  • a 2D field - either a PP field, a NetCDF variable or a GRIB message depending on the file format being loaded, and
  • a filename.

In this case, the function makes the necessary adjustments to include a "realization" coordinate. We pass this function to load, and the result is a successfully merged cube:


In [ ]:
import os
def realization_callback(cube, field, fname):
    basename = os.path.basename(fname)
    if not cube.coords('realization') and basename.startswith('ensemble_'):
        cube.add_aux_coord(iris.coords.DimCoord(np.int32(basename[-6:-3]),
                                                'realization'))

print iris.load_cube(fname, callback=realization_callback)

Cube aggregation/collapsing

Many standard univariate aggregations exist in Iris (and it is relatively easy to create your own, if the one you want doesn't exist).


In [ ]:
fname = iris.sample_data_path('uk_hires.pp')
cube = iris.load_cube(fname, 'air_potential_temperature')
print cube.summary(True)

To take the vertical mean of this cube:


In [ ]:
print cube.collapsed('model_level_number', iris.analysis.MEAN)

Some of the aggregators accept weights as a keyword. The supplied weights must have the same shape as the cube (there is currently no broadcasting being done) but there is an Iris utility function to make this easier:


In [ ]:
weights = np.array([1, 1.1, 1, 1.4, 0.7, 1, 1])
weights = iris.util.broadcast_to_shape(weights, cube.shape, (1,))
print cube.collapsed('model_level_number',
                     iris.analysis.MEAN, weights=weights).summary(True)

For an area weighted mean there is a convenience function called area_weights in iris.analysis.cartography. One of the requirements of this function is that the spatial coordinates have bounds (otherwise there would be no area to calculate):


In [ ]:
import iris.analysis.cartography
if cube.coord('grid_latitude').bounds is None:
    cube.coord('grid_latitude').guess_bounds()
    cube.coord('grid_longitude').guess_bounds()

grid_areas = iris.analysis.cartography.area_weights(cube)

This can be passed to the collapsed method along with the two coordinates that we want to take the mean over:


In [ ]:
area_avg = cube.collapsed(['grid_longitude', 'grid_latitude'],
                          iris.analysis.MEAN, weights=grid_areas)
print area_avg

Exercise 5

1. What other aggregators are available?


In [ ]:

2. Calculate how the vertical variance of potential temperature changes with time for the area_avg cube from above. Hint: We want to collapse the area averaged cube's vertical dimension, ending up with a cube of length 3. Print the data values of the resulting cube.


In [ ]:

Iteration

We can loop through all desired subcubes in a larger cube using the slices and slices_over methods.


In [ ]:
fname = iris.sample_data_path('uk_hires.pp')
cube = iris.load_cube(fname,
                      iris.Constraint('air_potential_temperature',
                                      model_level_number=1))
print cube.summary(True)

The slices method returns all the slices of a cube over the dimensions specified by the coordinates passed to it.

So in this example, each grid_latitude / grid_longitude slice of the cube is returned:


In [ ]:
for subcube in cube.slices(['grid_latitude', 'grid_longitude']):
    print subcube.summary(shorten=True)

The iris.iterate.izip function extends this concept and allows us to loop through multiple cubes at the same time:


In [ ]:
from iris.iterate import izip

iris.FUTURE.cell_datetime_objects = True

e1 = iris.load_cube(iris.sample_data_path('E1_north_america.nc'))
a1b = iris.load_cube(iris.sample_data_path('A1B_north_america.nc'))

for e1_slice, a1b_slice in izip(e1, a1b, coords=['latitude', 'longitude']):
    print e1_slice.summary(True), e1_slice.coord('time').cell(0).point
    print a1b_slice.summary(True), a1b_slice.coord('time').cell(0).point
    break

In this example, one real use for this functionality would be to plot the e1 cube next to the a1b cube for each timestep.
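
As a rough sketch of that idea (using the quickplot module, which is introduced properly in the Plotting section below):


In [ ]:
import iris.quickplot as qplt
import matplotlib.pyplot as plt

# Plot the E1 and A1B slices side by side, one figure per time step.
for e1_slice, a1b_slice in izip(e1, a1b, coords=['latitude', 'longitude']):
    plt.subplot(1, 2, 1)
    qplt.contourf(e1_slice)
    plt.subplot(1, 2, 2)
    qplt.contourf(a1b_slice)
    plt.show()
    break  # just the first time step, to keep the output short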

We can use slices_over to return one subcube for each coordinate value in a specified coordinate. This helps us when trying to retrieve all the slices along a given cube dimension.

For example, let's consider retrieving all the slices over the time dimension (i.e. each time step in its own cube with a scalar time coordinate) using slices. As per the above example, to achieve this using slices we would have to specify all the cube's dimensions except the time dimension.

Let's take a look at slices_over providing this functionality:


In [ ]:
fname = iris.sample_data_path('uk_hires.pp')
cube = iris.load_cube(fname, 'air_potential_temperature')
for subcube in cube.slices_over('model_level_number'):
    print subcube.summary(shorten=True)
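
For the time example described above, a minimal sketch of the two equivalent approaches would be:


In [ ]:
# With slices we must name every dimension coordinate except time ...
for subcube in cube.slices(['model_level_number', 'grid_latitude', 'grid_longitude']):
    print subcube.summary(shorten=True)

# ... whereas slices_over only needs the coordinate we want to slice over.
for subcube in cube.slices_over('time'):
    print subcube.summary(shorten=True)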

Plotting

Iris comes with two plotting modules called iris.plot and iris.quickplot that wrap some of the common matplotlib plotting functions such that cubes can be passed as input rather than the usual NumPy arrays. The two modules are very similar, with the primary difference being that quickplot will add extra information to the axes, such as:

  • a colorbar,
  • labels for the x and y axes, and
  • a title where possible.

In [ ]:
import iris.plot as iplt
import iris.quickplot as qplt
import matplotlib.pyplot as plt

In [ ]:
cube = iris.load_cube(iris.sample_data_path('A1B_north_america.nc'))
ts = cube.collapsed(['latitude', 'longitude'], iris.analysis.MEAN)
print ts

In [ ]:
iplt.plot(ts)
plt.show()

For comparison, let's plot the result of iplt.plot next to qplt.plot:


In [ ]:
plt.subplot(1, 2, 1)
iplt.plot(ts)

plt.subplot(1, 2, 2)
qplt.plot(ts)

plt.subplots_adjust(wspace=0.5)
plt.show()

Notice how the result of qplt has axis labels and a title; everything else about the axes is identical.

The plotting functions in Iris have strict rules on the dimensionality of the input cubes. For example, a 2D cube is needed in order to create a contour plot:


In [ ]:
qplt.contourf(cube[:, 0, :])
plt.show()

Additionally, we can control the x and y axis coordinates with the coords keyword:


In [ ]:
zonal_variance = cube.collapsed('longitude', iris.analysis.VARIANCE)
qplt.contourf(zonal_variance,
              coords=['forecast_period', 'latitude'])
plt.title('Zonal variance in air temperature')
plt.show()

Maps with cartopy

When the result of a plot operation is a map, Iris will automatically create an appropriate cartopy axes if one doesn't already exist.

We can use matplotlib's gca() to get hold of the automatically created cartopy axes:


In [ ]:
import cartopy.crs as ccrs

plt.figure(figsize=(12, 8))

plt.subplot(1, 2, 1)
qplt.contourf(cube[0, ...], 25)
ax = plt.gca()
ax.coastlines()

ax = plt.subplot(1, 2, 2, projection=ccrs.RotatedPole(100, 37))
qplt.contourf(cube[0, ...], 25)
ax.coastlines()

plt.show()

Exercise 6

Use the above cube, with appropriate indexing, to produce the following:

1. a contour plot of time vs longitude


In [ ]:

2. a contourf map on a LambertConformal projection (with coastlines)


In [ ]:

3. a block plot (pcolormesh) map in its native projection (with coastlines)


In [ ]:

4. a line plot showing air_temperature vs forecast_period (hint: plot accepts two arguments for the x and y axes)


In [ ]:

5. a scatter plot showing air_temperature vs longitude (hint: the inputs to scatter can be a combination of coordinates or 1D cubes)


In [ ]:

Cube maths

Basic mathematical operators exist on the cube to allow one to add, subtract, divide, multiply and perform other mathematical operations on cubes of a similar shape to one another:


In [ ]:
a1b = iris.load_cube(iris.sample_data_path('A1B_north_america.nc'))
e1 = iris.load_cube(iris.sample_data_path('E1_north_america.nc'))

print e1.summary(True)
print a1b

In [ ]:
scenario_difference = a1b - e1
print scenario_difference

Notice that the resultant cube's name is now unknown and that its attributes and cell methods have disappeared; this is because these all differed between the two input cubes.

It is also possible to operate on cubes with numeric scalars, NumPy arrays and even cube coordinates:


In [ ]:
e1 * e1.coord('latitude')

Cube broadcasting is also taking place, meaning that the two cubes don't need to have the same shape:


In [ ]:
print e1 - e1.collapsed('time', iris.analysis.MEAN)

Sometimes the aggregation we need doesn't exist in Iris, so it is important that we can still update the cube's data directly. Whenever we do this, though, we should be mindful of updating the necessary metadata on the cube:


In [ ]:
e1_hot = e1.copy()

e1_hot.data = np.ma.masked_less_equal(e1_hot.data, 280)
e1_hot.rename('air temperatures greater than 280K')

Creating extra annotation coordinates for statistical convenience

Sometimes we want to be able to categorise data before performing statistical operations on it. For example, we might want to categorise our data by "daylight maximum" or "seasonal mean" etc. Both of these categorisations would be based on the time coordinate.

The iris.coord_categorisation module provides convenience functions to add some common categorical coordinates, and provides a generalised function to allow the creation of custom categorisations.


In [ ]:
import iris.coord_categorisation as coord_cat

filename = iris.sample_data_path('ostia_monthly.nc')
cube = iris.load_cube(filename, 'surface_temperature')

The cube loaded represents the monthly surface_temperature from April 2006 through to October 2010. Let's add a categorisation coordinate to this cube to identify the climatological season (i.e. "djf", "mam", "jja" or "son") of each time point:


In [ ]:
coord_cat.add_season(cube, 'time', name='clim_season')

We can now use the cube's aggregated_by method to "group by and aggregate" on the season, to produce the seasonal mean:


In [ ]:
seasonal_mean = cube.aggregated_by('clim_season', iris.analysis.MEAN)

We can take this further by extracting with our newly created coordinate, producing a plot of the winter zonal mean:


In [ ]:
winter = seasonal_mean.extract(iris.Constraint(clim_season='djf'))

qplt.plot(winter.collapsed('latitude', iris.analysis.MEAN))
plt.title('Winter zonal mean surface temperature at $\pm5^{\circ}$ latitude')
plt.show()

Custom categorisation

Custom categorisation can be achieved with an arbitrary function. For example, the existing add_year categorisation takes the 'time' coordinate and creates a 'year' coordinate. This could be achieved without using the available add_year by:


In [ ]:
def year_from_time(coord, point):
    return coord.units.num2date(point).year

coord_cat.add_categorised_coord(cube, 'year', cube.coord('time'),
                                year_from_time)

print cube.coord('year')

Performance tricks

This section details a few common tricks to improve the performance of your Iris code:

  • Data loading.
  • Load once, extract many times.

Make use of deferred loading of data

Sometimes it makes sense to load data up front before doing operations; other times it makes sense to do data reduction before loading:


In [ ]:
def zonal_sum(cube):
    """
    A really silly function to calculate the sum of the grid_longitude
    dimension.
    Don't use this in real life, instead consider doing:
    
        cube.collapsed('grid_longitude', iris.analysis.SUM)
    
    """
    total = 0
    for i, _ in enumerate(cube.coord('grid_longitude')):
        total += cube[..., i].data
    return total

In [ ]:
%%timeit
fname = iris.sample_data_path('uk_hires.pp')
pt = iris.load_cube(fname, 'air_potential_temperature')
result = zonal_sum(pt)

The exact same code, only with the data loaded upfront:


In [ ]:
%%timeit
fname = iris.sample_data_path('uk_hires.pp')
pt = iris.load_cube(fname, 'air_potential_temperature')
pt.data
result = zonal_sum(pt)

Load once, extract many times

Iris loading can be slow, particularly if the format stores 2D fields of a conceptually higher dimensional dataset, as is the case with GRIB and PP. To maximise load speed and avoid unnecessary processing, it is worth constraining the fields of interest at load time. Note, however, that there is no caching, so loading a file twice will be twice as slow.


In [ ]:
fname = iris.sample_data_path('uk_hires.pp')
model_levels = [1, 4, 7, 16]

In [ ]:
%%timeit
for model_level in model_levels:
    pt = iris.load_cube(fname,
                        iris.Constraint('air_potential_temperature',
                                        model_level_number=model_level))

In [ ]:
%%timeit
cubes = iris.load(fname)
for model_level in model_levels:
    pt = cubes.extract(iris.Constraint('air_potential_temperature',
                                       model_level_number=model_level),
                       strict=True)

For files with lots of different phenomena this can be improved further by loading only the phenomenon of interest (and in this case just the model levels of interest):


In [ ]:
%%timeit
cube = iris.load(fname,
                 iris.Constraint('air_potential_temperature',
                                 model_level_number=model_levels))
for model_level in model_levels:
    pt = cube.extract(iris.Constraint(model_level_number=model_level))

Exercise 7

Produce a set of plots that provide a comparison of decadal-mean air temperatures over North America:

Part 1

Load 'A1B_north_america.nc' from the Iris sample data.


In [ ]:

Part 2

Extract just data from the year 1980 and beyond from the loaded data.


In [ ]:

Part 3

Define a function that takes a coordinate and a single time point as arguments, and returns the decade. For example, your function should return 2010 for the following:

   time = iris.coords.DimCoord([10], 'time',
                               units='days since 2018-01-01')
   print your_decade_function(time, time.points[0])

In [ ]:

Part 4

Add a "decade" coordinate to the loaded cube using your function and the coord categorisation module.


In [ ]:

Part 5

Calculate the decadal means cube for this scenario.


In [ ]:

Part 6

Create a figure with 3 rows and 4 columns displaying the decadal means, with the decade displayed prominently in each axes' title.


In [ ]: