Don't run the code in the next cell if you are following along in the notebook! Its purpose is to avoid long warning messages corrupting the slide display during the presentation.
In [1]:
import numpy
numpy.seterr(all='ignore')  # silence NumPy floating-point warnings
import iris
iris.FUTURE.netcdf_promote = True       # opt in to the future NetCDF loading behaviour
iris.FUTURE.netcdf_no_unlimited = True  # opt in to the future NetCDF saving behaviour
import warnings
warnings.simplefilter('ignore')  # suppress all Python warnings
Each cube has a data array together with metadata describing it: a name and units for the phenomenon, a collection of coordinates, and a dictionary of general attributes. A more complete explanation is available in the Iris user guide.
Let's take a simple example to demonstrate the cube concept.
Suppose we have a NumPy array of shape (3, 2, 4), where dimensions 0, 1 and 2 have lengths 3, 2 and 4 respectively.
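For example, such an array could be built like this (a minimal sketch; the values themselves are arbitrary):
import numpy as np
arr = np.arange(24, dtype=np.float64).reshape(3, 2, 4)
print(arr.shape)  # (3, 2, 4)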
The Iris cube to represent this data may consist of:
a standard name of "air_temperature" and units of "kelvin"
a data array of shape (3, 2, 4)
a coordinate, mapping to dimension 0, consisting of: a standard name of "height" and units of "meters", and an array of length 3 representing the 3 height points
a coordinate, mapping to dimension 1, consisting of: a standard name of "latitude" and units of "degrees", an array of length 2 representing the 2 latitude points, and a coordinate system such that the latitude points can be fully located on the globe
a coordinate, mapping to dimension 2, consisting of: a standard name of "longitude" and units of "degrees", an array of length 4 representing the 4 longitude points, and a coordinate system such that the longitude points can be fully located on the globe
Pictorially the cube has taken on more information than a simple array:
In [2]:
import iris
import numpy as np
In [3]:
print(iris.__version__)
print(np.__version__)
Whilst it is possible to construct a cube by hand (see the sketch below), a far more common approach to getting hold of a cube is to use the Iris load function to access data that already exists in a file.
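For illustration, a minimal hand-constructed cube matching the example above might look like this (a sketch only; the point values are made up):
import numpy as np
import iris
from iris.coords import DimCoord

data = np.arange(24, dtype=np.float64).reshape(3, 2, 4)
height = DimCoord([100., 200., 300.], standard_name='height', units='m')
latitude = DimCoord([40., 50.], standard_name='latitude', units='degrees')
longitude = DimCoord([-10., 0., 10., 20.], standard_name='longitude', units='degrees')
cube = iris.cube.Cube(data, standard_name='air_temperature', units='kelvin',
                      dim_coords_and_dims=[(height, 0), (latitude, 1), (longitude, 2)])
print(cube)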
In [4]:
fname = iris.sample_data_path('uk_hires.pp')
cubes = iris.load(fname)
print(cubes)
We can see that we've loaded two cubes, one representing the "surface_altitude" and the other representing "air_potential_temperature". We can infer even more detail from this printout; for example, what are the dimensions and shape of the "air_potential_temperature" cube?
Above we've printed the iris.cube.CubeList instance representing all of the cubes found in the given file. However, we can see more detail by printing individual cubes:
In [5]:
air_pot_temp = cubes[0]
print(air_pot_temp)
In [6]:
cube = iris.load_cube(iris.sample_data_path('A1B_north_america.nc'))
print(cube)
A cube's data array can be accessed via the data property. This is either a NumPy array or, in some cases, a NumPy masked array.
It is important to note that, for most of the file types supported by Iris, the cube's data isn't actually loaded until you request it via this property (either directly or indirectly). After you've accessed the data once, it is stored on the cube and thus won't be loaded from disk again.
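We can observe this deferred loading with the cube's has_lazy_data method (a quick sketch; the variable name is our own):
fresh = iris.load_cube(iris.sample_data_path('A1B_north_america.nc'))
print(fresh.has_lazy_data())  # True - nothing has been read from disk yet
fresh.data                    # touching the data property realises the array...
print(fresh.has_lazy_data())  # ...so this now prints False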
To find the shape of a cube's data it is possible to call cube.data.shape or cube.data.ndim, but this will trigger any unloaded data to be loaded. Therefore shape and ndim are properties available directly on the cube that do not unnecessarily load data.
In [7]:
print(cube.shape)
print(cube.ndim)
print(type(cube.data))
The standard_name, long_name and, to an extent, var_name are all attributes that describe the phenomenon the cube represents. The name() method is a convenience that returns the first non-empty attribute, in the order they are listed above.
In [8]:
print(cube.standard_name)
print(cube.long_name)
print(cube.var_name)
print(cube.name())
To rename a cube, it is possible to set the attributes manually, but it is generally easier to use the rename() method.
In [9]:
cube.rename("A name that isn't a valid CF standard name")
In [10]:
print(cube.standard_name)
print(cube.long_name)
print(cube.var_name)
print(cube.name())
The units attribute on a cube tells us the units of the numbers held in the data array. We can manually change the units or, better, we can convert the cube to another unit using the convert_units method, which will automatically update the data array.
In [11]:
print(cube.data.max(), cube.units)
cube.convert_units('Celsius')
print(cube.data.max(), cube.units)
A cube has a dictionary for extra general-purpose attributes, which can be accessed with the cube.attributes attribute:
In [12]:
print(cube.attributes)
print(cube.attributes['STASH'])
As we've seen, cubes need coordinate information to help us describe the underlying phenomenon. Typically a cube's coordinates are accessed with the coords or coord methods. The latter must return exactly one coordinate for the given parameter filters, whereas the former returns a list of matching coordinates, possibly of length 0.
For example, to access the time coordinate, and print the first 3 times:
In [13]:
time = cube.coord('time')
print(time[:3])
The coordinate interface is very similar to that of a cube. The attributes that exist on both cubes and coordinates are: standard_name, long_name, var_name, units, attributes and shape. Similarly, the name(), rename() and convert_units() methods also exist on a coordinate.
A coordinate does not have data; instead it has points and bounds (bounds may be None). In Iris, time coordinates are currently represented as "a number since an epoch":
In [14]:
print(repr(time.units))
print(time.points[:3])
print(time.bounds[:3])
These numbers can be converted to datetime objects with the unit's num2date method. Dates can be converted back again with the date2num method:
In [15]:
import datetime
print(time.units.num2date(time.points[:3]))
print(time.units.date2num(datetime.datetime(1970, 2, 1)))
Another important attribute on a coordinate is its coordinate system. Coordinate systems may be None for trivial coordinates but, particularly for spatial coordinates, they may be complex definitions of things such as the projection, ellipse and/or datum.
In [16]:
lat = cube.coord('latitude')
print(lat.coord_system)
In this case, the latitude's coordinate system is a simple geographic latitude on a spherical globe of radius 6371229 m.
Sometimes it is desirable to add bounds to a coordinate that doesn't have any. The guess_bounds method on a coordinate is useful in this regard. For example, the latitude coordinate previously obtained does not have bounds, but we can either set some manually, or use the guess_bounds method:
In [17]:
print(lat.points[:3])
print(lat.bounds)
if lat.bounds is None:
    lat.guess_bounds()
print(lat.bounds[:3])
1. Load the cubes from the file iris.sample_data_path('atlantic_profiles.nc').
2. Loop through each of the cubes (e.g. for cube in cubes) and print the standard name of each.
3. Extract the "sea_water_potential_temperature" cube. Print the minimum, maximum, mean and standard deviation of the cube's data.
4. Print the attributes of the cube.
5. Print the names of all coordinates on the cube. (Hint: Remember the cube.coords method without any keywords will give us all of the cube's coordinates)
6. Get hold of the "latitude" coordinate on the cube. Identify whether this coordinate has bounds. Print the minimum and maximum latitude points in the cube.
We've already seen the basic load function, but we can also control which cubes are actually loaded with constraints. The simplest constraint is just a string, which filters cubes based on their name:
In [18]:
fname = iris.sample_data_path('uk_hires.pp')
print(iris.load(fname, 'air_potential_temperature'))
In [19]:
filename1 = iris.sample_data_path('GloSea4', 'ensemble_010.pp')
filename2 = iris.sample_data_path('GloSea4', 'ensemble_011.pp')
cubes = iris.load([filename1, filename2])
print(cubes)
Throughout this course we will make use of the sample data that Iris provides. The iris.sample_data_path function returns the path to a file in the Iris sample data collection. A common mistake for Iris users is to use the sample_data_path function to access data that is not part of Iris's sample data collection - this is bad practice and is unlikely to work in the future.
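For your own data there is no need for a helper function - just pass the path straight to the load functions (the path below is hypothetical):
cubes = iris.load('/data/my_project/model_output.nc')  # hypothetical path to your own file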
There are three main load functions in Iris: load, load_cube and load_cubes.
In general, it is a good idea to make use of the load_cube/load_cubes functions rather than the generic load function in non-exploratory code. Doing so makes your code more resilient to changes in the data source, often results in more readable/maintainable code and, in combination with well-defined constraints, often leads to improved load performance.
The load functions all accept either a single filename or a list of filenames to load, and any of the filenames can be "glob" patterns (http://docs.python.org/2/library/glob.html).
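For example, the two GloSea4 ensemble files loaded earlier could be matched with a single glob pattern instead of listing them explicitly:
fname = iris.sample_data_path('GloSea4', 'ensemble_01[01].pp')
print(iris.load(fname))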
In [20]:
fname = iris.sample_data_path('uk_hires.pp')
cubes = iris.load(fname)
iris.save(cubes, 'saved_cubes.nc')
Extra keywords can be passed to specific file format savers.
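For example, the NetCDF saver accepts a netcdf_format keyword; a small sketch (this simply overwrites the file saved above):
iris.save(cubes, 'saved_cubes.nc', netcdf_format='NETCDF3_CLASSIC')  # save as NetCDF3 classic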
In [21]:
!ncdump -h saved_cubes.nc | head -n 20
!rm saved_cubes.nc
In [22]:
fname = iris.sample_data_path('uk_hires.pp')
cube = iris.load_cube(fname, 'air_potential_temperature')
print(cube.summary(shorten=True))
In [23]:
subcube = cube[..., 15:35, :10]
print(subcube.summary(shorten=True))
Note: the result of indexing a cube is always a copy and never a view on the original data.
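We can check this copy behaviour directly (a quick sketch; the -999 value is arbitrary):
first_time = subcube[0]       # indexing produces a new, independent cube
first_time.data[:] = -999.0   # modify the copy's data in place...
print(subcube.data.min())     # ...the original subcube is unaffected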
Iris's constraints mechanism provides a powerful way to filter a subset of data from a larger collection. We've already seen that constraints can be used at load time to return data of interest from a file, but we can also apply constraints to a single cube, or a list of cubes, using their respective extract methods:
In [24]:
fname = iris.sample_data_path('uk_hires.pp')
cubes = iris.load(fname)
print(cubes.extract('air_potential_temperature'))
The simplest constraint, namely a string that matches a cube's name, is conveniently converted into an actual iris.Constraint instance wherever needed. However, we could construct this constraint manually and compare with the previous result:
In [25]:
pot_temp_constraint = iris.Constraint('air_potential_temperature')
print(cubes.extract(pot_temp_constraint))
The Constraint constructor also takes arbitrary keywords to constrain coordinate values. For example, to extract model level number 10 from the air potential temperature cube:
In [26]:
fname = iris.sample_data_path('uk_hires.pp')
cube = iris.load_cube(fname, 'air_potential_temperature')
In [27]:
level_constraint = iris.Constraint(model_level_number=10)
cube_l10 = cube.extract(level_constraint)
print(cube_l10.summary(shorten=True))
print(cube_l10.coord('model_level_number'))
We could also use a list of possible values instead of a single value in our constraint:
In [28]:
level_constraint = iris.Constraint(model_level_number=[4, 10])
print(cube.extract(level_constraint))
Constraints can also be defined by arbitrary functions that operate on each cell of a coordinate. The function should return True if we want to keep the input cell, and False otherwise:
In [29]:
def less_than_10(cell):
    """Return True for values that are less than 10."""
    return cell < 10

level_constraint = iris.Constraint(model_level_number=less_than_10)
print(cube.extract(level_constraint))
In [30]:
cube = iris.load_cube(iris.sample_data_path('A1B_north_america.nc'))
print(cube)
We'll start by taking the mean over the time dimension of our cube. We use the collapsed() method, giving it the name of the coordinate we want to collapse and an aggregator that tells it how to collapse the coordinate:
In [31]:
import iris.analysis
print(cube.collapsed('time', iris.analysis.MEAN))
We can use a variety of built-in aggregators to compute different quantities; for example, we could also compute the standard deviation over the time dimension:
In [32]:
print(cube.collapsed('time', iris.analysis.STD_DEV))
It is possible to collapse a cube over multiple dimensions; in this case we compute the global mean air temperature by collapsing over the latitude and longitude dimensions:
In [33]:
global_mean = cube.collapsed(['latitude', 'longitude'],
                             iris.analysis.MEAN)
print(global_mean)
print(global_mean.data[0])
Notice the warning in the previous example. We averaged over space but did not take into account the variation in grid cell area over the globe. To do this averaging correctly we need weights. Iris has built-in functionality for computing area weights (though it requires the grid coordinates to have bounds):
In [34]:
from iris.analysis.cartography import area_weights
cube.coord('latitude').guess_bounds()
cube.coord('longitude').guess_bounds()
cell_weights = area_weights(cube)
global_mean = cube.collapsed(['latitude', 'longitude'],
                             iris.analysis.MEAN,
                             weights=cell_weights)
print(global_mean)
print(global_mean.data[0])
Sometimes we want to be able to categorise data before performing statistical operations on it. For example, we might want to categorise our data by "daylight maximum" or "seasonal mean" etc. Both of these categorisations would be based on the time coordinate.
The iris.coord_categorisation module provides convenience functions to add some common categorical coordinates, and provides a generalised function to allow easy creation of custom categorisations - see the sketch below.
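As a sketch of the generalised route, here is an invented categorisation ("half_of_year" is a made-up name) added with add_categorised_coord:
import iris.coord_categorisation as coord_cat

def half_of_year(coord, value):
    """Invented category: 'H1' for Jan-Jun, 'H2' for Jul-Dec."""
    date = coord.units.num2date(value)
    return 'H1' if date.month <= 6 else 'H2'

example_cube = iris.load_cube(iris.sample_data_path('A1B_north_america.nc'))
coord_cat.add_categorised_coord(example_cube, 'half_of_year', 'time', half_of_year)
print(example_cube.coord('half_of_year'))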
In [35]:
import iris.coord_categorisation as coord_cat
filename = iris.sample_data_path('ostia_monthly.nc')
cube = iris.load_cube(filename, 'surface_temperature')
The cube loaded represents the monthly surface_temperature from April 2006 through to October 2010. Let's add a categorisation coordinate to this cube to identify the climatological season (i.e. "djf", "mam", "jja" or "son") of each time point:
In [36]:
coord_cat.add_season(cube, 'time', name='clim_season')
print(cube.coord('clim_season'))
We can now use the cube's aggregated_by method to "group by and aggregate" on the season, to produce the seasonal mean:
In [37]:
seasonal_mean = cube.aggregated_by('clim_season', iris.analysis.MEAN)
print(seasonal_mean)
print(seasonal_mean.coord('clim_season'))
Basic mathematical operators exist on the cube to allow one to add, subtract, divide, multiply and perform other mathematical operations on cubes of a similar shape to one another:
In [38]:
a1b = iris.load_cube(iris.sample_data_path('A1B_north_america.nc'))
e1 = iris.load_cube(iris.sample_data_path('E1_north_america.nc'))
print(e1.summary(True))
print(a1b.summary(True))
In [39]:
scenario_difference = a1b - e1
print(scenario_difference)
Notice that the resultant cube's name is now unknown.
Cube broadcasting is also supported, meaning that the two cubes don't need to have the same shape. This is roughly analogous to the broadcasting of NumPy arrays. For example, to compute anomalies with respect to the time mean:
In [40]:
print(e1 - e1.collapsed('time', iris.analysis.MEAN))
It is also possible to operate on cubes with numeric scalars, NumPy arrays and even cube coordinates:
In [41]:
e1 * e1.coord('latitude')
Out[41]:
Sometimes you'd like to apply an operation that hasn't been pre-defined in Iris, so it is important that we still have the power to update the cube's data directly. Whenever we do this though, we should be mindful of updating the necessary metadata on the cube:
In [42]:
e1_hot = e1.copy()
e1_hot.data = np.ma.masked_less_equal(e1_hot.data, 280)
e1_hot.rename('air temperatures greater than 280K')
In [43]:
cube = iris.load_cube(iris.sample_data_path('air_temp.pp'))
We'll use the iris.plot module to plot this cube:
In [44]:
import iris.plot as iplt
iplt.pcolormesh(cube, cmap='viridis')
iplt.show()
As you can see, this plot is a bit basic! Iris has a second plotting module, iris.quickplot, which adds labels, colorbars etc. to plots automatically:
In [45]:
import iris.quickplot as qplt
qplt.pcolormesh(cube, cmap='viridis')
qplt.show()
You can manually create an axes using your desired settings, and pass this to an iris plotting function to tailor your plot:
In [46]:
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
ax = plt.axes(projection=ccrs.Robinson())
ax.set_global()
ax.coastlines()
qplt.contourf(cube, axes=ax, cmap='viridis')
qplt.show()
Several kinds of plots are available, including line plots, scatter plots and 2-d points plots:
In [47]:
temp = iris.load_cube(iris.sample_data_path('atlantic_profiles.nc'),
                      'sea_water_potential_temperature')
qplt.plot(temp[:, 0, 0], temp.coord('depth'))
qplt.show()
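As a sketch of a scatter plot (assuming, as in the Iris example gallery, that the same file also contains a "sea_water_practical_salinity" cube on matching points):
salinity = iris.load_cube(iris.sample_data_path('atlantic_profiles.nc'),
                          'sea_water_practical_salinity')
qplt.scatter(temp[:, 0, 0], salinity[:, 0, 0])
qplt.show()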
First we'll load some data on a global latitude-longitude grid:
In [48]:
global_air_temp = iris.load_cube(iris.sample_data_path('air_temp.pp'))
In [49]:
qplt.pcolormesh(global_air_temp, cmap='viridis')
plt.gca().coastlines()
qplt.show()
Next we'll load some data on a rotated pole grid covering the North Atlantic and Europe (Met Office NAE model domain):
In [50]:
rotated_psl = iris.load_cube(iris.sample_data_path('rotated_pole.nc'))
rotated_psl.convert_units('hPa')
In [51]:
qplt.pcolormesh(rotated_psl, cmap='viridis')
plt.gca().coastlines()
qplt.show()
If we wanted to compare data on the rotated grid to the global grid, we might regrid the global data onto the smaller rotated pole grid:
In [52]:
rotated_air_temp = global_air_temp.regrid(rotated_psl,
                                          iris.analysis.Linear())
In [53]:
qplt.pcolormesh(rotated_air_temp, cmap='viridis')
plt.gca().coastlines()
qplt.show()
Alternatively we might want to put the rotated pole data onto the global latitude-longitude grid. In this direction we must be careful not to extrapolate outside the original domain:
In [54]:
scheme = iris.analysis.Linear(extrapolation_mode='mask')
global_psl = rotated_psl.regrid(global_air_temp, scheme)
In [55]:
qplt.pcolormesh(global_psl, cmap='viridis')
plt.gca().coastlines()
qplt.show()
This has only been a short overview of Iris's regridding capabilities. There are more advanced techniques available, such as area-weighted regridding, and a selection of more generic interpolation routines suited to all kinds of data - see the sketch below.
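For flavour, a hedged sketch of both facilities (source_cube and target_cube are hypothetical latitude-longitude cubes with contiguous bounds on the same coordinate system):
import iris.analysis

# Area-weighted regridding - mdtol sets the tolerated fraction of missing data per cell:
# regridded = source_cube.regrid(target_cube, iris.analysis.AreaWeighted(mdtol=0.5))

# Generic interpolation - sample a cube at arbitrary coordinate values:
# sample = source_cube.interpolate([('latitude', 51.5), ('longitude', -0.1)],
#                                  iris.analysis.Linear())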
1. Load the single cube from the file iris.sample_data_path('SOI_Darwin.nc'). This contains monthly values of the Southern Oscillation Index.
2. Add two new coordinates based upon the time coordinate, one categorising the meteorological season, and one the year the season was in (hint: add_season and add_season_year are pre-defined). Examine the resulting coordinates.
3. Compute the seasonal means from this cube (i.e. average together the times within each individual season). You should end up with a time series of length 593 (hint: you can specify two coordinates to aggregate over).
4. Now compute the seasonal climatology. You should end up with a cube of size 4, one point per season (hint: you can use aggregated_by again).
5. Extract the DJF season from both the climatology and seasonal means. Use these to compute a time series of DJF anomalies with respect to the DJF mean (hint: remember you can subtract cubes of different dimensionality).
6. Finally, give the DJF anomalies cube a sensible name and plot the time-series with labelled axes.
1. Load the cube from the file iris.sample_data_path('A1B_north_america.nc').
2. Extract just the data from the year 1980 and beyond from the loaded cube.
3. Define a function which takes a coordinate and a single time point as arguments, and returns the decade. For example, your function should return 2010 for the following:
time = iris.coords.DimCoord([10], 'time', units='days since 2018-01-01')
print(your_decade_function(time, time.points[0]))
4. Add a "decade" coordinate to the loaded cube using your function and the coord categorisation module
5. Calculate the decadal means cube for this scenario
6. Create a figure with 3 rows and 4 columns displaying the decadal means, with the decade displayed prominently in each axes' title (hint: the slices or slices_over method of the cube will be helpful, especially combined with the built-in enumerate function in a for-loop).