This morning's session (hopefully!) covered getting started with coding in Python. As well as writing modelling code in Python, and using it to retrieve and pre-process data, you can also use it to analyse your data, and that's what we'll look at here.
I spend a lot of time working with climate model output or observations that are provided as gridded files, typically in NetCDF format. Python, like Matlab and IDL, has tools for working with NetCDF files. One tool in particular stands out, because it takes advantage of a set of conventions we use to describe data (the CF conventions) to make working with and plotting climate data easier.
There are other Python modules that are also helpful (e.g. pandas, which offers R-style data frames), but today I'll just be focusing on Iris.
What is it?
Where is it?
How do I learn more?
In [1]:
# to 'get' iris, all you have to do (on geos machines) is import it..
import iris
Let's start with loading some data, but let's assume we don't know anything about the file's contents. We use the general 'load' command..
In [2]:
cubes = iris.load('~/data/HadCRUT*.nc')
print cubes
We can now see the file contains a couple of variables. So let's use a more specific loader to load just the surface air temperature (SAT) anomaly field..
In [3]:
sat = iris.load_cube('~/data/HadCRUT*.nc', 'near_surface_temperature_anomaly')
And to explore the metadata, we can just print the object..
In [4]:
print sat
If we wanted to know more detail about the coordinates, we could, e.g.
In [5]:
# print the list of the coordinates..
for i in sat.coords(): print i.name()
print ' '
#now give me more info about an individual coordinate, say time..
print sat.coord('time')
print ' '
# ooh, time has units, let's look at those..
print sat.coord('time').units
The 'name' can have a couple of versions, in line with the CF rules: e.g. "latitude" as its standard_name, "lat" as the variable's name in the file, and a free-text long_name such as "latitude_in_degrees_from_turkey". We can inspect them the same way...
In [6]:
print sat.name() # <-- function that returns the name of the variable in the cube
print sat.standard_name # <-- the CF compliant equivalent name
print sat.long_name # <-- a name that has been given the variable in the file
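For completeness, the raw name used for the variable in the file is also available (a small sketch, assuming the installed Iris version exposes Cube.var_name; it may be None if the file didn't record one)..
print sat.var_name # <-- the variable's name as stored in the file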
Python includes the lambda keyword, which lets you quickly define small one-line functions; here we'll use it for functions that return True/False.
e.g. create a function, f(x), that is true if x is between 0 and 10..
In [7]:
f = lambda x: 0 < x < 10
print f(-1),f(5),f(11)
The reason for explaining this is that we can use lambda functions to help us quickly select parts of the grids we're loading.
If we're only interested in a small chunk of the dataset, we have two options: constrain the load itself, or extract from a cube we've already loaded.
Let's suppose we wanted to load just the mid-latitudes of the Eastern hemisphere. I'll show both, starting with a constrained load..
In [8]:
# define my search rules, and use the lambda function to help..
func = lambda a: 35<a<60
my_lat_rule = iris.Constraint(latitude=func)
#
# we could also write this as one line, e.g.
my_lon_rule = iris.Constraint(longitude=lambda j: 0 < j)
#
# then combine the name and the other constraints...
mycube = iris.load_cube('~/data/HadCRUT*.nc', 'near_surface_temperature_anomaly' & my_lat_rule & my_lon_rule)
#
# and let's check that we've got a smaller cube than we got before...
print "old:",sat.summary(True)
print "new:",mycube.summary(True)
Note that at this stage only the metadata has been loaded; we haven't yet touched the actual data, which is what makes this fast.
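As a quick check of this deferred loading (a small sketch, assuming the installed Iris version provides Cube.has_lazy_data())..
# the cube knows whether its data has been read from disk yet..
print mycube.has_lazy_data() # --> True: only metadata so far
_ = mycube.data # touching .data forces the actual read
print mycube.has_lazy_data() # --> now False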
The second method I mentioned - extract - can be used on an existing cube. For example, suppose we're interested in getting some new cubes that contain just Europe and Asia..
In [9]:
# arbitrarily split Europe and Asia at 50E..
eu = mycube.extract(iris.Constraint(longitude=lambda c:c<50))
asia = mycube.extract(iris.Constraint(longitude=lambda c:c>50))
print eu
Saving is easy. Cubes can be written out as NetCDF, Met Office PP, or GRIB2, and the file suffix controls which format is used, e.g.
In [10]:
iris.save(mycube, '/user_home/w_OliverBrowne/data/tester.nc')
!ls -lh ~/data/tester.nc
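To illustrate the suffix controlling the format, the same call with a .pp suffix writes Met Office PP instead (a sketch only; the path just mirrors the one above, and the cube's metadata has to be representable in the target format)..
# same call, different suffix: Met Office PP format..
iris.save(mycube, '/user_home/w_OliverBrowne/data/tester.pp')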
We saw earlier how to load a cube, examine its metadata, and get a sub-set of it. Sometimes we want to look at the raw data held within the cube. That data is held in a numpy array, as we can see when we ask for it..
In [11]:
print mycube.data
print '--'*50
print mycube.coord('longitude').points
Numpy data arrays have a bunch of useful functions defined on them. For instance we can ask what the min/max/mean etc are..
In [12]:
# and whilst we're here, how about the absolute min/max..
print eu.data.min(),eu.data.mean(),eu.data.max()
In [13]:
import numpy as np
# compare the mean of the first ten and the last ten time-slices..
print np.mean(eu.data[0:10]), np.mean(eu.data[-10:])
If we want to, we can still drill into the individual data by hand, e.g.
In [14]:
# what shape is our data..
print eu.data.shape
# right, let's choose the first 12 months, near a corner..
print eu.data[0:12,0,0]
# Where a '--' is printed, that means missing data.
In [15]:
# python has a built-in help() function; let's try it on a plain integer..
a = 2
help(a)
In [16]:
# That wasn't amazingly helpful! Here's something a bit more useful..
import iris.analysis.cartography
help(iris.analysis.cartography.area_weights)
Collapsing a cube over a dimension could be used for averaging (e.g. over space to give a global mean) or for looking for a particular value along a dimension (e.g. the maximum value at each point, or counting the number of times a threshold was exceeded).
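As a minimal sketch of those last two cases (the 0.5-degree threshold is an arbitrary choice for illustration), collapsing over time with different aggregators looks like this..
# hottest anomaly recorded at each grid point..
sat_max = sat.collapsed('time', iris.analysis.MAX)
# number of months each grid point exceeded a 0.5 degree anomaly..
sat_hot = sat.collapsed('time', iris.analysis.COUNT, function=lambda v: v > 0.5)
print sat_max.summary(True)
print sat_hot.summary(True)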
Since we've got temperature we'll start with the global average. We'll collapse the spatial dimensions, using a weighted-mean.
This dataset happens not to have cell bound information, which we need for area-weighting, so we'll add it.
(A small warning appears because Iris doesn't quite have all the information about the map projection so has to assume the world is perfectly spherical).
In [17]:
# guess some bounds (only do this once!)..
sat.coord('latitude').guess_bounds()
sat.coord('longitude').guess_bounds()
# calculate the weighting for each cell..
grid_areas = iris.analysis.cartography.area_weights(sat)
# do the area average by..
# * collapse the latitude & longitude dimensions,
# * apply the MEAN function
# * use our calculated weights
sat_ts = sat.collapsed( ['latitude','longitude'],iris.analysis.MEAN,weights=grid_areas)
# and what does the new cube look like?
print sat_ts
And now perhaps we're interested in the average of the first and last decades? We could operate on the raw data, like so..
In [18]:
# get the first 120 time-slices (the first decade), then average over them..
print sat_ts.data[0:120].mean()
# and the same for the last 120..
print sat_ts.data[-120:].mean()
Or we could slice directly on our timeseries (which is more useful as it will preserve all the metadata for us)..
In [19]:
# get a new cube that's a subsection,
first = sat_ts[0:120]
# and collapse in the time-dimension, using the mean..
first_avg = first.collapsed('time',iris.analysis.MEAN)
# print the results..
print first_avg
print '--'*50
print first_avg.data
print '--'*50
# fyi, if all you cared about was the result, you could have got it in one line..
print sat_ts[0:120].collapsed('time',iris.analysis.MEAN).data
And now to plot. There are three ways to do this: iris.quickplot (which fills in titles and axis labels from the cube's metadata), iris.plot (which leaves the decoration to you), and plain matplotlib.
We'll use the first, and a little bit of the last..
In [20]:
import iris.quickplot
import matplotlib.pyplot
Making a plot with the quickplot approach is pretty simple..
In [21]:
# make the plot..
iris.quickplot.plot( sat_ts )
# display it on the screen (as opposed to, say, saving it to a file)..
matplotlib.pyplot.show()
There are three main ways to load a module into Python using import; the difference is the way we then call its functions, e.g. the sketch below..
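(A quick sketch, using matplotlib.pyplot, which we imported above; the plt alias is just a common convention..)
# 1. import the full module name; call as matplotlib.pyplot.show()
import matplotlib.pyplot
# 2. import it under a short alias; call as plt.show()
import matplotlib.pyplot as plt
# 3. pull everything into the current namespace; call as just show()
from matplotlib.pyplot import *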
The most readable - in case you plan to share your code - is the first approach. The speediest to type is the last. Which you choose is up to you.
That was easy (that's the point of quickplot!), but there are times we want a bit more refinement. The example below demonstrates how to combine a matplotlib tool (subplot2grid) with a user-defined smoothing function and some line control properties. It includes a rolling-window mean (via the cube's rolling_window method), horizontal reference lines, and legends..
In [22]:
# define a smoothing function..
def smooth(cube,window):
    return cube.rolling_window('time',iris.analysis.MEAN,window)
# first plot..(merge across the top two cells)
fig = matplotlib.pyplot.figure(figsize=(16,16))
ax = matplotlib.pyplot.subplot2grid((2,2),(0,0),colspan=2)
pid = iris.quickplot.plot(sat_ts,'grey')
matplotlib.pyplot.axhline(0,linestyle='--',color='grey')
iris.quickplot.plot(smooth(sat_ts, 12), 'r', linewidth=2.0, label='12-month rolling mean')
leg = matplotlib.pyplot.legend(loc='lower right')
# second plot...
ax = matplotlib.pyplot.subplot2grid((2,2),(1,0))
pid = iris.quickplot.plot(smooth(sat_ts,60), 'grey', linewidth=2.0, label='global-mean')
matplotlib.pyplot.axhline(0,linestyle='--',color='grey')
for lat in range(-60,0,20):
    mylats = iris.Constraint(latitude=lambda c: lat < c < lat+20)
    data = sat.extract(mylats).collapsed(['latitude','longitude'], iris.analysis.MEAN)
    iris.quickplot.plot(smooth(data,60), label=str(-1*lat)+' to '+str(-1*(lat+20))+'S')
matplotlib.pyplot.legend(loc='upper left')
# third plot...
# (matplotlib.pyplot.subplot(2,2,4) would also work here)
ax = matplotlib.pyplot.subplot2grid((2,2),(1,1))
pid = iris.quickplot.plot(smooth(sat_ts,60),'grey',linewidth=2.0,label='global-mean')
matplotlib.pyplot.axhline(0, linestyle='--', color='grey')
for lat in range(0,80,20):
    mylats = iris.Constraint(latitude=lambda c: lat < c < lat+20)
    data = sat.extract(mylats).collapsed(['latitude','longitude'], iris.analysis.MEAN)
    mylabel = str(lat)+' to '+str(lat+20)+'N'
    iris.quickplot.plot(smooth(data, 60), label=mylabel)
matplotlib.pyplot.legend(loc='upper left')
matplotlib.pyplot.show()
Let's suppose we want a Hovmöller plot of the last 15 years. First, for convenience, we add an extra coordinate; we'll look at more uses for this later...
In [23]:
# make a copy of the cube..
tmp = sat.copy()
# add a new coordinate to the cube..
import iris.coord_categorisation
iris.coord_categorisation.add_year( tmp , 'time' )
# let's look at the cube; note the new 'year' coordinate..
print tmp
In [24]:
# and specifically let's look at how the 'time' and the 'year' coordinates compare..
print tmp.coord('time')
print '\n'+'--'*50+'\n'
print tmp.coord('year')
We'll come back to the use of extra coordinates later. Now, we want to extract just say the last 15 years, and average over them...
In [25]:
tmp = tmp.extract(iris.Constraint(year=lambda yr: 1995 <= yr <= 2009))  # 15 years: 1995-2009 inclusive
sat_hv = tmp.collapsed('longitude',iris.analysis.MEAN)
print sat_hv
In [26]:
#sat_hv.coord('time').guess_bounds()
matplotlib.pyplot.figure(figsize=(8,8))
iris.quickplot.contourf( sat_hv , 20 , vmin=-4 , vmax=4 )
matplotlib.pyplot.show()
Which is alright, but the axes are a bit poor; we can do better..
In [29]:
# extra modules we'll need..
import iris.palette # <-- this is where the 'Brewer' colormaps are stored.
import matplotlib.cm # <-- gives access to the registered colormaps
import matplotlib.dates # <-- useful tools for generating lists of dates
# set the figure size to be 12" * 8"
matplotlib.pyplot.figure(figsize=(12,8))
# load a Red-Blue Brewer color map
color_map = matplotlib.cm.get_cmap('brewer_RdBu_11')
# plot
iris.quickplot.contourf(sat_hv, 20, cmap=color_map, vmin=-4, vmax=4)
# a better label for the x-axis
matplotlib.pyplot.xlabel('Time (years)')
# Stop matplotlib providing clever axes range padding
matplotlib.pyplot.axis('tight')
# set nicely spaced y-axis ticks
matplotlib.pyplot.gca().yaxis.set_ticks([-60,-30,0,30,60])
# As we are plotting annual variability, put years as the x ticks.
# --> get a handle to the x-axis
xaxis = matplotlib.pyplot.gca().xaxis
# --> use matplotlib function to generate a list of every 2nd year
xaxis.set_major_locator(matplotlib.dates.YearLocator(2))
# --> format the ticks to just show the year
xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%Y'))
# Display the plot
matplotlib.pyplot.show()
You can use the ColorBrewer website (http://colorbrewer2.org) to choose a colormap suitable for your task. And you can read more about using colour palettes in Iris here:
http://scitools.org.uk/iris/docs/latest/userguide/plotting_a_cube.html#brewer-colour-palettes
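If you want to check which Brewer palettes are registered on your machine, a small sketch: this simply filters matplotlib's list of colormap names, which includes the Brewer maps once iris.palette has been imported..
print sorted(name for name in matplotlib.pyplot.colormaps() if name.startswith('brewer'))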
To save the figure to a file, we use matplotlib's savefig command..
In [30]:
matplotlib.pyplot.savefig('/user_home/w_OliverBrowne/figures/my_figure.png')
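savefig accepts the usual matplotlib options; for instance (the filenames and dpi below are just illustrative), the format follows the suffix and the resolution can be set explicitly..
# a higher-resolution PNG, and a vector PDF of the same figure..
matplotlib.pyplot.savefig('/user_home/w_OliverBrowne/figures/my_figure_hires.png', dpi=300)
matplotlib.pyplot.savefig('/user_home/w_OliverBrowne/figures/my_figure.pdf')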
Earlier we added an extra coordinate ('year') to a cube. Now let's see how that can be used. Let's look at how much winter-average temperature over North America varies from year to year. So we need to: extract North America, average each winter (DJF) season, calculate the year-to-year variance at each grid cell, and plot the result..
In [31]:
# modules we'll need..
import cartopy.crs
import iris
import iris.coord_categorisation
import iris.quickplot
import matplotlib.pyplot
# >> Assume you have already loaded the sat <<
# extract North America..
sat_us = sat.extract(iris.Constraint(longitude=lambda x:-150<x<-50, latitude=lambda x:20<x<80))
# average over seasons..
iris.coord_categorisation.add_season(sat_us, 'time')
iris.coord_categorisation.add_season_year(sat_us, 'time')
sat_us_djf = sat_us.aggregated_by(['season','season_year'], iris.analysis.MEAN).extract(iris.Constraint(season='djf'))
# check that we have a reduced cube - should be a few lats and lons, and maybe ~150yrs worth..
print "Our cube of winter temperature over north america has the following dimensions.."
print sat_us_djf.summary(True)
print "--"*50
# calculate the variance of each cell..
sat_us_djf_var = sat_us_djf.collapsed('time' , iris.analysis.VARIANCE)
# plot..
matplotlib.pyplot.subplots(1,2,figsize=(15,6))
# 1. the default quick-plot..
matplotlib.pyplot.subplot(1,2,1)
iris.quickplot.pcolormesh(sat_us_djf_var)
matplotlib.pyplot.title('Variance of the winter-average SAT')
matplotlib.pyplot.gca().coastlines()
# 2. still a quick-plot but now use a better projection..
matplotlib.pyplot.subplot(1, 2, 2, projection=cartopy.crs.RotatedPole(pole_longitude=100, pole_latitude=37))
iris.quickplot.pcolormesh(sat_us_djf_var)
matplotlib.pyplot.gca().coastlines()
matplotlib.pyplot.title('Variance of the winter-average SAT')
# display the plot
matplotlib.pyplot.show()