The netCDF4
library provides access to data stored in NetCDF format, which is a very common modern format for storing scientific data. NetCDF files hold multidimensional arrays of data, plus attributes that describe the data. Different communities use different sets of attributes, but the most common set of attributes are defined by the Climate and Forecast conventions, otherwise known as the "CF Conventions".
In this practical we won't have time to look into the details of NetCDF or the CF conventions, but we'll just look at some simple code to extract and plot data.
(Many of the techniques in this lesson can apply to other file formats that store multidimensional arrays too, such as HDF.)
First, we'll import the NetCDF library:
In [1]:
import netCDF4
We've put some data files in the data
directory. Let's open one of them. This is some ocean potential temperature data from NCEP's Global Ocean Data Assimilation System.
In [2]:
nc = netCDF4.Dataset('../data/pottmp.2014.1time.nc')
We can see what variables we have in the dataset with nc.variables
. This returns a dictionary of Variable objects, the keys of which give the identifiers of the variables:
In [3]:
nc.variables.keys()
Out[3]:
(The leading 'u' means that these are Unicode strings. Don't worry about this, it just means that the strings could contain international characters like
ü or é if we wanted.)
We want to plot the potential temperature variable. Let's get a handle to this variable:
In [4]:
pottmp = nc.variables['pottmp']
We can find its units ("units" is an attribute of the vorticity variable):
In [5]:
pottmp.units
Out[5]:
(Which means "Kelvin" of course.) We can find its long name, which is a human-readable string that describes what the variable represents:
In [6]:
pottmp.long_name
Out[6]:
Some datasets provide standard names, which are more precise description of the variable. This (unfortunately) isn't provided in this dataset, but if it were, you could have found it using sst.standard_name
.
You can look up the definition of the standard name in the CF conventions' table of standard names if you want. (Note that in this particular case the standard name may not be quite correct.)
(Note that not all NetCDF files provide long names or standard names for their variables, but it's good practice to do so.)
Now let's extract some data from the variable. When we read data from a NetCDF variable, we get a Numpy array that we can manipulate in exactly the same way as any other Numpy array.
It's usually a good idea to check how big the data array will be, since they can be very large and can contain any number of dimensions. The shape
attribute let's us find this, without actually reading any data at this point:
In [7]:
pottmp.shape
Out[7]:
So the temperature variable is a four-dimensional array, but what are the dimensions? We can find this out too:
In [8]:
pottmp.dimensions
Out[8]:
So there is 1 value of time, 40 vertical levels, 418 latitude values and 360 of longitude. We're just going to plot a simple map, so we're going to slice the data at the first time value and level:
In [9]:
data = pottmp[0,0]
data.shape
Out[9]:
Now data
is a 2D array of all the data at the first timestep and vertical level. We can plot the data using matplotlib's contourf
function, which creates a filled contour plot. In this case we're using 20 levels of colour.
In [10]:
import matplotlib.pyplot as plt
# These two lines are needed to make matplotlib plot figures inline and at a decent size.
# They are not needed in scripts.
%matplotlib inline
plt.rcParams['figure.figsize'] = (12.0, 8.0)
# These lines do the actual plotting
plt.contourf(data, 20)
plt.show()
This plot isn't very satisfactory. Apart from the fact that there are no labels or colour bar on the plot to indicate what we are looking at, the axis values are wrong. We can correct this by finding the variables that hold these values.
We know from above that there is a dimension called 'lat' and a dimension called 'lon'. The variables that hold the values of latitude and longitude have the same names, so we can get handles to the variable objects like this:
In [11]:
lon = nc.variables['lon']
lat = nc.variables['lat']
Now we can read the actual values of longitude and latitude. Note that "[:]
" means "read all the values from the variable".
In [12]:
lonvals = lon[:]
latvals = lat[:]
Now we can make a better plot. Note how we use the long_name
and units
of the variables for the title and axis labels. Also, we are using a more appropriate colour scale than the default rainbow scale (question: why is this scale more appropriate?).
In [13]:
plt.title (pottmp.long_name + ' (' + pottmp.units + ')')
plt.xlabel(lon.long_name + ' (' + lon.units + ')')
plt.ylabel(lat.long_name + ' (' + lat.units + ')')
plt.contourf(lonvals, latvals, data, 20, cmap=plt.get_cmap('YlGnBu_r'))
plt.colorbar()
plt.show()
(By the way, you can resize the plot by dragging its bottom right-hand corner. Try it!)
Note that the axis labels and title are automatically read from the data file, which is neat.
We can create visualizations of many different views through the data. Recall that pottmp
is a four-dimensional variable with dimensions (time, level, lat, lon)
. Let's say we want to take a vertical profile at a certain point in the data (i.e. we want to find temperature as a function of depth). How can we do this?
First, of course, we have to extract the data from the variable. We're going to pick a point roughly in the middle of the above plot. Recall that we have 360 values of longitude and 418 of latitude. So let's pick the longitude at index 180 and the latitude at index 214 (i.e. roughly in the middle). We're going to pick the first time value (there is only one anyway). We can extract all the values of temperature with the following command:
In [14]:
profile = pottmp[0,:,214,180]
You should read this as "give me temperature values for all values of depth at the 214th latitude index and the 180th longitude index". (Recall that Python array indices start at zero, not one.)
If we're going to plot a graph, we also need the values of depth:
In [15]:
depthvar = nc.variables['level']
depthvals = depthvar[:]
Now we can create a plot of temperature versus depth:
In [16]:
plt.plot(depthvals, profile)
plt.xlabel(depthvar.long_name + ' (' + depthvar.units + ')')
plt.ylabel(pottmp.long_name + ' (' + pottmp.units + ')')
plt.title('Vertical profile: first attempt')
plt.show()
Well, this is OK, but to be more intuitive, we would probably like the depth axis to be the vertical axis, so we should plot depth versus temperature, not the other way around. However, this would result in depth increasing vertically, which is not what we want. So we have to insert some code to reverse the depth axis. Here's how:
In [17]:
plt.plot(profile, depthvals)
plt.xlabel(pottmp.long_name + ' (' + pottmp.units + ')')
plt.ylabel(depthvar.long_name + ' (' + depthvar.units + ')')
plt.gca().invert_yaxis() # Gets the axes of the plot and reverses the y axis
plt.title('Vertical profile: this is better')
plt.show()
This is what we expect to see in a typical deep-ocean location: a shallow, warm, well-mixed layer at the surface followed by a steep gradient (the thermocline) to cold temperatures at depth.
A script to generate the map plot above can be found in temperature.py
. You can use this as the basis for your experiments in these exercises.
data
folder contains output from Phil Browne's barotropic vorticity model in a file called barotropic.nc
. Create some plots (e.g. maps and transects) using this data file. The file has no vertical dimension so you can't create vertical profiles or sections.NetCDF tutorial from Unidata: http://www.unidata.ucar.edu/software/netcdf/docs/netcdf-tutorial.html
The NetCDF operators (http://nco.sourceforge.net/) are an extremely useful set of command-line tools for manipulating data in NetCDF files. (Some of the data used in this practical were prepared using these tools. See the file ReadMe.txt
in the data
directory.)
In [17]: