Using Python for analysis of climate data

This morning's session hopefully covered getting started with coding in Python. As well as writing modelling code in Python and using it to retrieve and pre-process data, you can also use it to analyse your data, and that's what we'll look at here.

I spend a lot of time working with climate model output or observations, which are provided as gridded files, typically in NetCDF format. Python, like Matlab and IDL, has tools that can work with NetCDF files. One tool in particular stands out, because it takes advantage of a set of rules we use to describe data (the CF conventions) to make working with and plotting climate data easier.

There are other Python modules that are also helpful (e.g. pandas, which will feel familiar if you know R), but today I'll just be focusing on Iris.

Iris

What is it?

  • The Met Office have developed a project to build, in an open, community-driven way, tools designed to make working with and displaying scientific data easier within Python. That project is called SciTools, and of the tools they've developed, the one we spend most time using is called Iris.

Where is it?

  • Iris is installed on our School of GeoSciences Linux machines, e.g. burn, and is available on Eddie. It is also freely available, so you can install it yourself on a Linux machine at home, or (like this notebook) on a cloud computing service.

How do I learn more?

  • The Iris user guide and reference documentation are online at http://scitools.org.uk/iris/docs/latest/ (the plotting section of the user guide is linked again later in this notebook).

Loading data: iris.load and iris.load_cube


In [1]:
# to 'get' iris, all you have to do (on geos machines) is import it..
import iris

Let's start with loading some data, but let's assume we don't know anything about the file's contents. We use the general 'load' command..


In [2]:
cubes = iris.load('~/data/HadCRUT*.nc')
print cubes


0: near_surface_temperature_anomaly / (K) (time: 1964; latitude: 36; longitude: 72)
1: field_status / (no_unit)            (time: 1964; -- : 1)

We can now see the file contains a couple of variables, so we use the specific loader iris.load_cube to load just the near-surface air temperature (SAT) field..


In [3]:
sat = iris.load_cube('~/data/HadCRUT*.nc', 'near_surface_temperature_anomaly')

And to explore the metadata, we can just print the object..


In [4]:
print sat


near_surface_temperature_anomaly / (K) (time: 1964; latitude: 36; longitude: 72)
     Dimension coordinates:
          time                              x               -              -
          latitude                          -               x              -
          longitude                         -               -              x
     Attributes:
          Conventions: CF-1.0
          comment: 
          ensemble_member_index: 0
          ensemble_members: 100
          history: Updated at 30/09/2013 09:38:38
          institution: Met Office Hadley Centre / Climatic Research Unit, University of East ...
          reference: Morice, C. P., J. J. Kennedy, N. A. Rayner, and P. D. Jones (2012), Quantifying...
          reference_period: [1961 1990]
          source: CRUTEM.4.2.0.0, HadSST.3.1.0.0
          title: HadCRUT4 near-surface temperature ensemble data - ensemble median
          version: HadCRUT.4.2.0.0

If we wanted to know more detail about the coordinates, we could, e.g.


In [5]:
# print the list of the coordinates..
for i in sat.coords(): print i.name()
print ' '
#now give me more info about an individual coordinate, say time..
print sat.coord('time')
print ' '
# ooh, time has units, let's look at those..
print sat.coord('time').units


time
latitude
longitude
 
DimCoord([1850-01-16 12:00:00, 1850-02-15 00:00:00, 1850-03-16 12:00:00, ...,
       2013-06-16 00:00:00, 2013-07-16 12:00:00, 2013-08-16 12:00:00], standard_name=u'time', calendar=u'gregorian', long_name=u'time', var_name='time', attributes={'start_month': 1, 'end_month': 8, 'start_year': 1850, 'end_year': 2013})
 
days since 1850-1-1 00:00:00

The 'name' can come in several versions, in line with the CF conventions: e.g. "latitude" as its standard_name, "lat" as the variable name in the file, and perhaps a free-form long_name like "latitude_in_degrees_from_turkey". We can inspect them the same way...


In [6]:
print sat.name() # <-- method that returns the best available name for the cube
print sat.standard_name # <-- the CF standard name (None if one hasn't been set)
print sat.long_name # <-- a free-form name given to the variable in the file


near_surface_temperature_anomaly
None
near_surface_temperature_anomaly

>> ASIDE: Python's lambda function

Python includes a keyword called lambda which lets you quickly define small, unnamed functions - for example, ones that return True/False.

e.g. create a function, f(x), that is True if x is between 0 and 10..


In [7]:
f = lambda x: 0 < x < 10
print f(-1),f(5),f(11)


False True False

The reason for explaining this is that lambda functions let us quickly select parts of the grids we're loading.

Loading restricted regions, or subsetting an existing region: constraints and Iris' extract method

If we're only interested in a small chunk of the dataset, we have two options..

  • specify a sub-set when we're loading, e.g. only model-level 1,
  • or use extract to get a subset of an existing cube.

Let's suppose we wanted to load just the mid-latitudes of the Eastern hemisphere. I'll show both approaches..


In [8]:
# define my search rules, and use the lambda function to help..
func = lambda a: 35<a<60
my_lat_rule = iris.Constraint(latitude=func)
#
# we could also write this as one line, e.g.
my_lon_rule = iris.Constraint(longitude=lambda j: 0 < j)
#
# then combine the name and the other constraints...
mycube = iris.load_cube('~/data/HadCRUT*.nc', 'near_surface_temperature_anomaly' & my_lat_rule & my_lon_rule)
#
# and let's check that we've got a smaller cube than we got before...
print "old:",sat.summary(True)
print "new:",mycube.summary(True)


old: near_surface_temperature_anomaly / (K) (time: 1964; latitude: 36; longitude: 72)
new: near_surface_temperature_anomaly / (K) (time: 1964; latitude: 5; longitude: 36)

Note that at this stage only the metadata has been loaded; we haven't yet touched the actual data, which makes this fast.
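
If you want to see this deferred ("lazy") loading for yourself, something like the sketch below should work. It uses the mycube loaded above; has_lazy_data() is present in recent Iris releases, but even if your version lacks it the point still stands - nothing is read from disk until you touch .data.

# a sketch of Iris' deferred loading, using the mycube from above.
# NOTE: has_lazy_data() may not exist in very old Iris versions.
print mycube.has_lazy_data()   # should print True - only metadata has been read
values = mycube.data           # touching .data triggers the actual read
print mycube.has_lazy_data()   # should now print False - data is held in memory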

The second method I mentioned - extract - can be used on an existing cube, e.g. suppose we're interested in getting some new cubes which contain just Europe and Asia..


In [9]:
# arbitrarily split Europe and Asia at 50E..
eu = mycube.extract(iris.Constraint(longitude=lambda c:c<50))
asia = mycube.extract(iris.Constraint(longitude=lambda c:c>50))
print eu


near_surface_temperature_anomaly / (K) (time: 1964; latitude: 5; longitude: 10)
     Dimension coordinates:
          time                              x               -             -
          latitude                          -               x             -
          longitude                         -               -             x
     Attributes:
          Conventions: CF-1.0
          comment: 
          ensemble_member_index: 0
          ensemble_members: 100
          history: Updated at 30/09/2013 09:38:38
          institution: Met Office Hadley Centre / Climatic Research Unit, University of East ...
          reference: Morice, C. P., J. J. Kennedy, N. A. Rayner, and P. D. Jones (2012), Quantifying...
          reference_period: [1961 1990]
          source: CRUTEM.4.2.0.0, HadSST.3.1.0.0
          title: HadCRUT4 near-surface temperature ensemble data - ensemble median
          version: HadCRUT.4.2.0.0

Saving data: iris.save

Saving is easy. Cubes can be written to NetCDF, Met Office PP, and GRIB2; the filename suffix controls which format is used, e.g.


In [10]:
iris.save(mycube, '/user_home/w_OliverBrowne/data/tester.nc')
!ls -lh ~/data/tester.nc


-rw-rw-r--+ 1 w_OliverBrowne w_OliverBrowne 1.6M Apr 29 17:56 /user_home/w_OliverBrowne/data/tester.nc
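
Since the suffix picks the format, writing the same cube out as, say, Met Office PP should just be a matter of changing the extension - a quick sketch (the output path here is made up, use somewhere you can write to):

# same cube, different suffix -> different format (PP in this case)..
iris.save(mycube, '~/data/tester.pp')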

Accessing the actual data: cube.data

We saw earlier how to load a cube, examine its metadata, and get a sub-set of it. Sometimes we want to look at the raw data held within the cube. That data is held in a numpy array (here a masked array, because some cells have no observations), as we can see when we ask for it..


In [11]:
print mycube.data
print '--'*50
print mycube.coord('longitude').points


[[[-3.1212072372436523 -3.6611440181732178 -4.130472183227539 ..., -- -- --]
  [-2.206348419189453 -3.1869609355926514 -3.240568161010742 ..., -- -- --]
  [-3.91144061088562 -4.21144962310791 -3.6211435794830322 ..., -- -- --]
  [-5.274986267089844 -5.482906341552734 -6.00362491607666 ..., -- -- --]
  [-- -- -5.108006477355957 ..., -- -- --]]

 [[-- -- -- ..., -- -- --]
  [-0.6189641356468201 1.3698525428771973 -0.28829100728034973 ..., -- --
   --]
  [2.9192702770233154 1.899109125137329 1.193287968635559 ..., -- -- --]
  [2.117946147918701 3.128809928894043 3.387681484222412 ..., -- -- --]
  [-- -- 0.7174021005630493 ..., -- -- --]]

 [[-0.12547747790813446 -- -- ..., -- -- --]
  [-1.458777666091919 -1.1208312511444092 -2.4547319412231445 ..., -- -- --]
  [-2.151918411254883 -2.4202094078063965 -2.8737549781799316 ..., -- -- --]
  [-2.5314083099365234 -3.074037551879883 -2.26218318939209 ..., -- -- --]
  [-- -- -3.07560396194458 ..., -- -- --]]

 ..., 
 [[-0.37730494141578674 -0.25883811712265015 -0.232458233833313 ...,
   0.6258227825164795 0.8061743974685669 0.4860484302043915]
  [-0.28086841106414795 -0.04777210205793381 -0.123653843998909 ...,
   0.5509220957756042 0.8243297338485718 0.3498220443725586]
  [-0.05002003163099289 0.9228976964950562 0.9245114326477051 ...,
   1.0247204303741455 0.5663504004478455 0.19766049087047577]
  [-0.6084867715835571 -0.01662907376885414 0.6543121337890625 ...,
   0.8391696214675903 0.5906949043273926 0.4248795211315155]
  [0.42929786443710327 -0.13868431746959686 0.42582690715789795 ...,
   0.6977628469467163 -0.1327328383922577 0.9631644487380981]]

 [[0.7392256855964661 0.8838582038879395 1.1195366382598877 ...,
   1.808112621307373 2.269096851348877 1.4205644130706787]
  [2.013071060180664 1.7809333801269531 1.1759599447250366 ...,
   0.12383653223514557 0.5160945653915405 0.893868088722229]
  [3.4215664863586426 3.0118229389190674 3.030275821685791 ...,
   0.44366514682769775 0.5010309219360352 0.12540656328201294]
  [0.6404116153717041 2.165029525756836 2.5720558166503906 ...,
   1.2371703386306763 0.6665645837783813 0.7471789121627808]
  [0.8989510536193848 0.9796202182769775 1.1438648700714111 ...,
   2.468184471130371 3.25018310546875 1.572434663772583]]

 [[0.8381973505020142 0.43397873640060425 0.9918569326400757 ...,
   1.5441250801086426 2.2518253326416016 1.4807782173156738]
  [1.4829208850860596 1.6063967943191528 0.9720247983932495 ...,
   1.3446682691574097 0.8563520312309265 1.0676681995391846]
  [1.3653123378753662 1.6469001770019531 2.0974111557006836 ...,
   1.684618353843689 1.4754317998886108 1.3069658279418945]
  [0.8832257390022278 1.5529133081436157 1.6317083835601807 ...,
   2.539888381958008 2.3792741298675537 1.3463228940963745]
  [1.6101698875427246 1.618232250213623 1.0042062997817993 ...,
   1.8869011402130127 -0.261691689491272 1.8282470703125]]]
----------------------------------------------------------------------------------------------------
[   2.5    7.5   12.5   17.5   22.5   27.5   32.5   37.5   42.5   47.5
   52.5   57.5   62.5   67.5   72.5   77.5   82.5   87.5   92.5   97.5
  102.5  107.5  112.5  117.5  122.5  127.5  132.5  137.5  142.5  147.5
  152.5  157.5  162.5  167.5  172.5  177.5]

Numpy arrays have a bunch of useful methods defined on them. For instance, we can ask what the min/max/mean etc. are..


In [12]:
# and whilst we're here, how about the absolute min/max..
print eu.data.min(),eu.data.mean(),eu.data.max()


-14.6088 -0.0527571372049 10.7449

In [13]:
import numpy as np
# numpy functions also work on the cube's data, e.g. means over slices of the time dimension..
print np.mean(eu.data[0:9]),np.mean(eu.data[-10:-1])


-0.800152587891 0.956177340487

If we want to, we can still drill into individual data values by hand, e.g.


In [14]:
# what shape is our data..
print eu.data.shape
# right, let's choose the first 12 months, near a corner..
print eu.data[0:12,0,0]
# Where a '--' is printed, that means missing data.


(1964, 5, 10)
[-3.1212072372436523 -- -0.12547747790813446 0.14354702830314636
 0.05412481725215912 -- 1.0202434062957764 0.7915058135986328 -- -- --
 1.3976106643676758]
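
Because HadCRUT4 has gaps, cube.data comes back as a numpy masked array - that's why missing cells print as '--'. A small sketch of poking at the mask (using the eu cube from above; the exact counts will depend on your file):

import numpy as np
# the data is a masked array, so the mask tells us which cells are missing..
print type(eu.data)        # should be a numpy masked array
print eu.data.mask.sum()   # number of missing (masked) cells
print np.ma.count(eu.data) # number of valid (unmasked) cells
# note that methods like .mean() used above automatically ignore masked cells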

ASIDE: for information on an object / function / module use help()


In [15]:
a=2
help(a)


Help on int object:

class int(object)
 |  int(x=0) -> int or long
 |  int(x, base=10) -> int or long
 |  
 |  Convert a number or string to an integer, or return 0 if no arguments
 |  are given.  If x is floating point, the conversion truncates towards zero.
 |  If x is outside the integer range, the function returns a long instead.
 |  
 |  If x is not a number or if base is given, then x must be a string or
 |  Unicode object representing an integer literal in the given base.  The
 |  literal can be preceded by '+' or '-' and be surrounded by whitespace.
 |  The base defaults to 10.  Valid bases are 0 and 2-36.  Base 0 means to
 |  interpret the base from the string as an integer literal.
 |  >>> int('0b100', base=0)
 |  4
 |  
 |  Methods defined here:
 |  
 |  __abs__(...)
 |      x.__abs__() <==> abs(x)
 |  
 |  __add__(...)
 |      x.__add__(y) <==> x+y
 |  
 |  __and__(...)
 |      x.__and__(y) <==> x&y
 |  
 |  __cmp__(...)
 |      x.__cmp__(y) <==> cmp(x,y)
 |  
 |  __coerce__(...)
 |      x.__coerce__(y) <==> coerce(x, y)
 |  
 |  __div__(...)
 |      x.__div__(y) <==> x/y
 |  
 |  __divmod__(...)
 |      x.__divmod__(y) <==> divmod(x, y)
 |  
 |  __float__(...)
 |      x.__float__() <==> float(x)
 |  
 |  __floordiv__(...)
 |      x.__floordiv__(y) <==> x//y
 |  
 |  __format__(...)
 |  
 |  __getattribute__(...)
 |      x.__getattribute__('name') <==> x.name
 |  
 |  __getnewargs__(...)
 |  
 |  __hash__(...)
 |      x.__hash__() <==> hash(x)
 |  
 |  __hex__(...)
 |      x.__hex__() <==> hex(x)
 |  
 |  __index__(...)
 |      x[y:z] <==> x[y.__index__():z.__index__()]
 |  
 |  __int__(...)
 |      x.__int__() <==> int(x)
 |  
 |  __invert__(...)
 |      x.__invert__() <==> ~x
 |  
 |  __long__(...)
 |      x.__long__() <==> long(x)
 |  
 |  __lshift__(...)
 |      x.__lshift__(y) <==> x<<y
 |  
 |  __mod__(...)
 |      x.__mod__(y) <==> x%y
 |  
 |  __mul__(...)
 |      x.__mul__(y) <==> x*y
 |  
 |  __neg__(...)
 |      x.__neg__() <==> -x
 |  
 |  __nonzero__(...)
 |      x.__nonzero__() <==> x != 0
 |  
 |  __oct__(...)
 |      x.__oct__() <==> oct(x)
 |  
 |  __or__(...)
 |      x.__or__(y) <==> x|y
 |  
 |  __pos__(...)
 |      x.__pos__() <==> +x
 |  
 |  __pow__(...)
 |      x.__pow__(y[, z]) <==> pow(x, y[, z])
 |  
 |  __radd__(...)
 |      x.__radd__(y) <==> y+x
 |  
 |  __rand__(...)
 |      x.__rand__(y) <==> y&x
 |  
 |  __rdiv__(...)
 |      x.__rdiv__(y) <==> y/x
 |  
 |  __rdivmod__(...)
 |      x.__rdivmod__(y) <==> divmod(y, x)
 |  
 |  __repr__(...)
 |      x.__repr__() <==> repr(x)
 |  
 |  __rfloordiv__(...)
 |      x.__rfloordiv__(y) <==> y//x
 |  
 |  __rlshift__(...)
 |      x.__rlshift__(y) <==> y<<x
 |  
 |  __rmod__(...)
 |      x.__rmod__(y) <==> y%x
 |  
 |  __rmul__(...)
 |      x.__rmul__(y) <==> y*x
 |  
 |  __ror__(...)
 |      x.__ror__(y) <==> y|x
 |  
 |  __rpow__(...)
 |      y.__rpow__(x[, z]) <==> pow(x, y[, z])
 |  
 |  __rrshift__(...)
 |      x.__rrshift__(y) <==> y>>x
 |  
 |  __rshift__(...)
 |      x.__rshift__(y) <==> x>>y
 |  
 |  __rsub__(...)
 |      x.__rsub__(y) <==> y-x
 |  
 |  __rtruediv__(...)
 |      x.__rtruediv__(y) <==> y/x
 |  
 |  __rxor__(...)
 |      x.__rxor__(y) <==> y^x
 |  
 |  __str__(...)
 |      x.__str__() <==> str(x)
 |  
 |  __sub__(...)
 |      x.__sub__(y) <==> x-y
 |  
 |  __truediv__(...)
 |      x.__truediv__(y) <==> x/y
 |  
 |  __trunc__(...)
 |      Truncating an Integral returns itself.
 |  
 |  __xor__(...)
 |      x.__xor__(y) <==> x^y
 |  
 |  bit_length(...)
 |      int.bit_length() -> int
 |      
 |      Number of bits necessary to represent self in binary.
 |      >>> bin(37)
 |      '0b100101'
 |      >>> (37).bit_length()
 |      6
 |  
 |  conjugate(...)
 |      Returns self, the complex conjugate of any int.
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  denominator
 |      the denominator of a rational number in lowest terms
 |  
 |  imag
 |      the imaginary part of a complex number
 |  
 |  numerator
 |      the numerator of a rational number in lowest terms
 |  
 |  real
 |      the real part of a complex number
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |  
 |  __new__ = <built-in method __new__ of type object>
 |      T.__new__(S, ...) -> a new object with type S, a subtype of T


In [16]:
# That wasn't amazingly helpful!  Here's something a bit more useful..
help(iris.analysis.cartography.area_weights)


Help on function area_weights in module iris.analysis.cartography:

area_weights(cube, normalize=False)
    Returns an array of area weights, with the same dimensions as the cube.
    
    This is a 2D lat/lon area weights array, repeated over the non lat/lon
    dimensions.
    
    Args:
    
    * cube (:class:`iris.cube.Cube`):
        The cube to calculate area weights for.
    
    Kwargs:
    
    * normalize (False/True):
        If False, weights are grid cell areas. If True, weights are grid
        cell areas divided by the total grid area.
    
    The cube must have coordinates 'latitude' and 'longitude' with bounds.
    
    Area weights are calculated for each lat/lon cell as:
    
        .. math::
    
            r^2 cos(lat_0) (lon_1 - lon_0) - r^2 cos(lat_1) (lon_1 - lon_0)
    
    Currently, only supports a spherical datum.
    Uses earth radius from the cube, if present and spherical.
    Defaults to iris.analysis.cartography.DEFAULT_SPHERICAL_EARTH_RADIUS.

Reducing dimensions: the collapsed method

Collapsing can be used to average over a dimension (e.g. over space, to give a global mean) or to pick out particular values along a dimension (e.g. the maximum at each point, or a count of how many times a threshold was exceeded).
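
The other reductions mentioned above follow exactly the same pattern as the mean. Here's a minimal sketch (it assumes the sat cube loaded earlier; the 1 K threshold is arbitrary, purely for illustration):

# the maximum anomaly recorded at each grid point over the whole record..
sat_max = sat.collapsed('time', iris.analysis.MAX)

# ..and, at each grid point, a count of the months exceeding a 1 K anomaly..
sat_count = sat.collapsed('time', iris.analysis.COUNT,
                          function=lambda values: values > 1.0)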

Since we've got temperature, we'll start with the global average: we'll collapse the spatial dimensions using an area-weighted mean.

This dataset happens not to have cell bound information, which we need for area-weighting, so we'll add it.

(A small warning appears because Iris doesn't quite have all the information about the map projection so has to assume the world is perfectly spherical).


In [17]:
# guess some bounds (only do this once!)..
sat.coord('latitude').guess_bounds()
sat.coord('longitude').guess_bounds()

# calculate the weighting for each cell..
grid_areas = iris.analysis.cartography.area_weights(sat)

# do the area average by..
#  * collapse the latitude & longitude dimensions,
#  * apply the MEAN function
#  * use our calculated weights
sat_ts = sat.collapsed( ['latitude','longitude'],iris.analysis.MEAN,weights=grid_areas)

# and what does the new cube look like?
print sat_ts


near_surface_temperature_anomaly / (K) (time: 1964)
     Dimension coordinates:
          time                              x
     Scalar coordinates:
          latitude: 0.0 degrees, bound=(-90.0, 90.0) degrees
          longitude: 0.0 degrees, bound=(-180.0, 180.0) degrees
     Attributes:
          Conventions: CF-1.0
          comment: 
          ensemble_member_index: 0
          ensemble_members: 100
          history: Updated at 30/09/2013 09:38:38
          institution: Met Office Hadley Centre / Climatic Research Unit, University of East ...
          reference: Morice, C. P., J. J. Kennedy, N. A. Rayner, and P. D. Jones (2012), Quantifying...
          reference_period: [1961 1990]
          source: CRUTEM.4.2.0.0, HadSST.3.1.0.0
          title: HadCRUT4 near-surface temperature ensemble data - ensemble median
          version: HadCRUT.4.2.0.0
     Cell methods:
          mean: latitude, longitude
/opt/anaconda/envs/SciTools/lib/python2.7/site-packages/Iris-1.7.0_dev-py2.7-linux-x86_64.egg/iris/analysis/cartography.py:316: UserWarning: Using DEFAULT_SPHERICAL_EARTH_RADIUS.
  warnings.warn("Using DEFAULT_SPHERICAL_EARTH_RADIUS.")

Now perhaps we're interested in the average of the first and last decades. We could operate on the raw data, like so..


In [18]:
# average over (roughly) the first decade of monthly values..
print sat_ts.data[0:121].mean()

# ..and over (roughly) the last decade of the record..
print sat_ts.data[-122:-1].mean()


-0.313192669839
0.484871089494

Or we could slice the cube directly (which is more useful, as it preserves all the metadata for us)..


In [19]:
# get a new cube that's a subsection,
first = sat_ts[0:121]

# and collapse in the time-dimension, using the mean..
first_avg = first.collapsed('time',iris.analysis.MEAN)

# print the results..
print first_avg
print '--'*50
print first_avg.data
print '--'*50

# fyi, if all you cared about was the result, you could have got it in one line..
print sat_ts[0:121].collapsed('time',iris.analysis.MEAN).data


near_surface_temperature_anomaly / (K) (-- : 1)
     Scalar coordinates:
          latitude: 0.0 degrees, bound=(-90.0, 90.0) degrees
          longitude: 0.0 degrees, bound=(-180.0, 180.0) degrees
          time: 1855-01-16 12:00:00, bound=(1850-01-16 12:00:00, 1860-01-16 12:00:00)
     Attributes:
          Conventions: CF-1.0
          comment: 
          ensemble_member_index: 0
          ensemble_members: 100
          history: Updated at 30/09/2013 09:38:38
          institution: Met Office Hadley Centre / Climatic Research Unit, University of East ...
          reference: Morice, C. P., J. J. Kennedy, N. A. Rayner, and P. D. Jones (2012), Quantifying...
          reference_period: [1961 1990]
          source: CRUTEM.4.2.0.0, HadSST.3.1.0.0
          title: HadCRUT4 near-surface temperature ensemble data - ensemble median
          version: HadCRUT.4.2.0.0
     Cell methods:
          mean: latitude, longitude
          mean: time
----------------------------------------------------------------------------------------------------
[-0.31319267]
----------------------------------------------------------------------------------------------------
[-0.31319267]
/opt/anaconda/envs/SciTools/lib/python2.7/site-packages/Iris-1.7.0_dev-py2.7-linux-x86_64.egg/iris/coords.py:954: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for u'time'.
  warnings.warn(msg.format(self.name()))

Plotting: Timeseries

And now to plot. There are three ways to do this.

  • use the iris.quickplot routines when you want to take a quick look - they use the metadata to label the axes etc. so that plotting is a one-line task
  • using the iris.plot routines to get more fine-grained control or for more unusual data
  • extract the 'data' out of the cube and use whatever plotting routine you like, e.g. matplotlib.pyplot

We'll use the first, and a little bit of the last..
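
For completeness, the third approach is just ordinary matplotlib applied to the numpy array inside the cube - a quick sketch (it uses the sat_ts cube from earlier; note you lose the automatic time axis and labelling that quickplot gives you):

import matplotlib.pyplot
# plot the raw global-mean values; the x-axis is simply the month index
matplotlib.pyplot.plot(sat_ts.data)
matplotlib.pyplot.ylabel('temperature anomaly (K)')
matplotlib.pyplot.show()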


In [20]:
import iris.quickplot
import matplotlib.pyplot

Making a plot with the quickplot approach is pretty simple..


In [21]:
# make the plot..
iris.quickplot.plot( sat_ts )

# display it on the screen (as opposed to, say, saving it to a file)..
matplotlib.pyplot.show()


ASIDE: Module Name Shortcuts

There are three main ways to import a module's functions into Python, e.g.

  • import iris.quickplot
  • from iris import quickplot
  • import iris.quickplot as qplt

The difference is the way we then call them..

  • iris.quickplot.plot( mycube )
  • quickplot.plot( mycube )
  • qplt.plot( mycube )

The most readable - in case you plan to share your code - is the first approach; the speediest to type is the last. Which you choose is up to you.
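
In code, the three styles look like this (all three give you exactly the same function, just under a different name):

# all three of these make the same plotting function available..
import iris.quickplot          # then: iris.quickplot.plot(sat_ts)
from iris import quickplot     # then: quickplot.plot(sat_ts)
import iris.quickplot as qplt  # then: qplt.plot(sat_ts)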

Plotting: more Timeseries

That was easy (which was the point!), but there are times we want a bit more refinement. The example below demonstrates how to combine a matplotlib tool (subplot2grid) with a user-defined function (smooth) and some line-control properties. It includes..

  • 3 different panels (using 'subplot2grid')
  • Defining & applying a (really-basic) smoothing function
  • Controlling line properties. (If you're used to Matlab/IDL syntax the line control keywords will be pretty familiar.)
  • adding a legend
  • chopping up the original data in new ways for plotting

In [22]:
# define a smoothing function..
def smooth(cube,window):
    return cube.rolling_window('time',iris.analysis.MEAN,window)

# first plot..(merge across the top two cells)
fig = matplotlib.pyplot.figure(figsize=(16,16))
ax = matplotlib.pyplot.subplot2grid((2,2),(0,0),colspan=2)
pid = iris.quickplot.plot(sat_ts,'grey')
matplotlib.pyplot.axhline(0,linestyle='--',color='grey')
iris.quickplot.plot(smooth(sat_ts, 12), 'r', linewidth=2.0, label='12-month rolling mean')
leg = matplotlib.pyplot.legend(loc='lower right')

# second plot...
ax = matplotlib.pyplot.subplot2grid((2,2),(1,0))
pid = iris.quickplot.plot(smooth(sat_ts,60), 'grey', linewidth=2.0, label='global-mean')
matplotlib.pyplot.axhline(0,linestyle='--',color='grey')
for lat in range(-60,0,20):
    mylats = iris.Constraint(latitude=lambda c:lat<c<lat+20)
    data = sat.extract(mylats).collapsed(['latitude','longitude'], iris.analysis.MEAN)
    iris.quickplot.plot(smooth(data,60), label=str(-1*lat)+' to '+str(-1*(lat+20))+'S')
matplotlib.pyplot.legend(loc='upper left')

# third plot...
ax = matplotlib.pyplot.subplot2grid((2,2),(1,1))
pid = iris.quickplot.plot(smooth(sat_ts,60),'grey',linewidth=2.0,label='global-mean')
matplotlib.pyplot.axhline(0, linestyle='--', color='grey')
for lat in range(0,80,20):
    mylats = iris.Constraint(latitude=lambda c: lat < c < lat+20)
    data = sat.extract(mylats).collapsed(['latitude','longitude'], iris.analysis.MEAN)
    mylabel = str(lat)+' to '+str(lat+20)+'N'
    iris.quickplot.plot(smooth(data, 60), label=mylabel)
matplotlib.pyplot.legend(loc='upper left')

matplotlib.pyplot.show()


/opt/anaconda/envs/SciTools/lib/python2.7/site-packages/Iris-1.7.0_dev-py2.7-linux-x86_64.egg/iris/cube.py:2715: UserWarning: Collapsing spatial coordinate u'latitude' without weighting
  warnings.warn(msg.format(coord.name()))

Plotting: Maps

Let's suppose we want a Hovmöller plot of the last 15 or so years. First, for convenience, we add an extra coordinate; we'll look at more uses for this later...


In [23]:
# make a copy of the cube..
tmp = sat.copy()

# add a new coordinate to the cube..
import iris.coord_categorisation
iris.coord_categorisation.add_year( tmp , 'time' )

# let's look at that coordinate..
print tmp


near_surface_temperature_anomaly / (K) (time: 1964; latitude: 36; longitude: 72)
     Dimension coordinates:
          time                              x               -              -
          latitude                          -               x              -
          longitude                         -               -              x
     Auxiliary coordinates:
          year                              x               -              -
     Attributes:
          Conventions: CF-1.0
          comment: 
          ensemble_member_index: 0
          ensemble_members: 100
          history: Updated at 30/09/2013 09:38:38
          institution: Met Office Hadley Centre / Climatic Research Unit, University of East ...
          reference: Morice, C. P., J. J. Kennedy, N. A. Rayner, and P. D. Jones (2012), Quantifying...
          reference_period: [1961 1990]
          source: CRUTEM.4.2.0.0, HadSST.3.1.0.0
          title: HadCRUT4 near-surface temperature ensemble data - ensemble median
          version: HadCRUT.4.2.0.0

In [24]:
# and specifically let's look at how the 'time' and the 'year' coordinates compare..
print tmp.coord('time')
print '\n'+'--'*50+'\n'
print tmp.coord('year')


DimCoord([1850-01-16 12:00:00, 1850-02-15 00:00:00, 1850-03-16 12:00:00, ...,
       2013-06-16 00:00:00, 2013-07-16 12:00:00, 2013-08-16 12:00:00], standard_name=u'time', calendar=u'gregorian', long_name=u'time', var_name='time', attributes={'start_month': 1, 'end_month': 8, 'start_year': 1850, 'end_year': 2013})

----------------------------------------------------------------------------------------------------

AuxCoord(array([1850, 1850, 1850, ..., 2013, 2013, 2013]), standard_name=None, units=Unit('1'), long_name=u'year', attributes={'start_month': 1, 'end_month': 8, 'start_year': 1850, 'end_year': 2013})

We'll come back to the use of extra coordinates later. For now, we want to extract a recent chunk of years (1996-2009 with the constraint below) and take the zonal (longitude) mean...


In [25]:
tmp = tmp.extract(iris.Constraint(year=lambda yr: 1995<yr<2010) )
sat_hv = tmp.collapsed('longitude',iris.analysis.MEAN)
print sat_hv


near_surface_temperature_anomaly / (K) (time: 168; latitude: 36)
     Dimension coordinates:
          time                              x              -
          latitude                          -              x
     Auxiliary coordinates:
          year                              x              -
     Scalar coordinates:
          longitude: 0.0 degrees, bound=(-180.0, 180.0) degrees
     Attributes:
          Conventions: CF-1.0
          comment: 
          ensemble_member_index: 0
          ensemble_members: 100
          history: Updated at 30/09/2013 09:38:38
          institution: Met Office Hadley Centre / Climatic Research Unit, University of East ...
          reference: Morice, C. P., J. J. Kennedy, N. A. Rayner, and P. D. Jones (2012), Quantifying...
          reference_period: [1961 1990]
          source: CRUTEM.4.2.0.0, HadSST.3.1.0.0
          title: HadCRUT4 near-surface temperature ensemble data - ensemble median
          version: HadCRUT.4.2.0.0
     Cell methods:
          mean: longitude

In [26]:
#sat_hv.coord('time').guess_bounds()
matplotlib.pyplot.figure(figsize=(8,8))
iris.quickplot.contourf( sat_hv , 20 , vmin=-4 , vmax=4 )
matplotlib.pyplot.show()


Which is alright, but the axes are a bit poor; we can do better..


In [29]:
# extra modules we'll need..
import iris.palette # <-- this is where the 'Brewer' colormaps are stored.
import matplotlib.cm # <-- load a custom-colormap
import matplotlib.dates # <-- useful tools for generating lists of dates

# set the figure size to be 12" * 8"
matplotlib.pyplot.figure(figsize=(12,8))

# load a Red-Blue Brewer color map
color_map = matplotlib.cm.get_cmap('brewer_RdBu_11')

# plot
iris.quickplot.contourf(sat_hv, 20, cmap=color_map, vmin=-4, vmax=4)

# a better label for the x-axis
matplotlib.pyplot.xlabel('Time (years)')

# Stop matplotlib providing clever axes range padding
matplotlib.pyplot.axis('tight')

# set nicely spaced y-axis ticks
matplotlib.pyplot.gca().yaxis.set_ticks([-60,-30,0,30,60])

# As we are plotting annual variability, put years as the x ticks.
# --> get a handle to the x-axis
xaxis = matplotlib.pyplot.gca().xaxis
# --> use matplotlib function to generate a list of every 2nd year
xaxis.set_major_locator(matplotlib.dates.YearLocator(2))
# --> format the ticks to just show the year
xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%Y'))

# Display the plot
matplotlib.pyplot.show()


You can use this site..

http://colorbrewer2.org/

to choose a colormap suitable for your task. And you can read more about using Colourmaps in Iris here..

http://scitools.org.uk/iris/docs/latest/userguide/plotting_a_cube.html#brewer-colour-palettes
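
Importing iris.palette registers the Brewer palettes with matplotlib, so you can list the names that get_cmap will accept - a small sketch (the exact list depends on your Iris version):

import iris.palette      # <-- importing this registers the Brewer colormaps
import matplotlib.pyplot
# pick out the registered colormap names that came from ColorBrewer..
print [name for name in matplotlib.pyplot.colormaps() if 'brewer' in name]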

To save the plot to a file..

We use matplotlib's savefig command. (In a notebook it's best to call this in the same cell as the commands that draw the figure, and before show(), so that you save the figure you intend.)


In [30]:
matplotlib.pyplot.savefig('/user_home/w_OliverBrowne/figures/my_figure.png')


<matplotlib.figure.Figure at 0x51366d0>
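
savefig understands the usual matplotlib options too, e.g. resolution and tighter margins - a sketch (the filename is hypothetical, and remember to call it in the cell that draws the figure):

# save a higher-resolution copy with the whitespace trimmed (hypothetical path)..
matplotlib.pyplot.savefig('my_figure_hires.png', dpi=300, bbox_inches='tight')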

Adding / using extra coordinates

Earlier we added an extra coordinate ('year') to a cube. Now let's see how that can be used. Let's look at how much the winter-average temperature over N. America varies from year to year. So we need to..

  • extract the N.America region
  • average over each winter season, which requires adding 'season' (and 'season year') coordinates first
  • calculate the variance
  • plot a map, and show the value of setting a good projection

In [31]:
# modules we'll need..
import cartopy.crs
import iris
import iris.coord_categorisation
import iris.quickplot
import matplotlib.pyplot

# >> Assume you have already loaded the sat <<

# extract North America..
sat_us = sat.extract(iris.Constraint(longitude=lambda x:-150<x<-50, latitude=lambda x:20<x<80))

# average over seasons..
iris.coord_categorisation.add_season(sat_us, 'time')
iris.coord_categorisation.add_season_year(sat_us, 'time')
sat_us_djf = sat_us.aggregated_by(['season','season_year'], iris.analysis.MEAN).extract(iris.Constraint(season='djf'))

# check that we have a reduced cube - should be a few lats and lons, and maybe ~150yrs worth..
print "Our cube of winter temperature over north america has the following dimensions.."
print sat_us_djf.summary(True)
print "--"*50

# calculate the variance of each cell..
sat_us_djf_var = sat_us_djf.collapsed('time' , iris.analysis.VARIANCE)

# plot..
matplotlib.pyplot.subplots(1,2,figsize=(15,6))

# 1. the default quick-plot..
matplotlib.pyplot.subplot(1,2,1)
iris.quickplot.pcolormesh(sat_us_djf_var)
matplotlib.pyplot.title('Variance of the winter-average SAT')
matplotlib.pyplot.gca().coastlines()

# 2. still a quick-plot but now use a better projection..
matplotlib.pyplot.subplot(1, 2, 2, projection=cartopy.crs.RotatedPole(100, 37))
iris.quickplot.pcolormesh(sat_us_djf_var)
matplotlib.pyplot.gca().coastlines()
matplotlib.pyplot.title('Variance of the winter-average SAT')

# display the plot
matplotlib.pyplot.show()


Our cube of winter temperature over north america has the following dimensions..
near_surface_temperature_anomaly / (K) (time: 164; latitude: 10; longitude: 18)
----------------------------------------------------------------------------------------------------
/opt/anaconda/envs/SciTools/lib/python2.7/site-packages/Iris-1.7.0_dev-py2.7-linux-x86_64.egg/iris/coords.py:954: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for u'season_year'.
  warnings.warn(msg.format(self.name()))