This notebook contains a tutorial on the CESM model, and how to work with NetCDF
data in Python using the netCDF4
package. There are simple homework questions throughout the tutorial. Their purpose is just to get you up to speed on working with the model input and output files.
[your last name].ipynb
, e.g. my notebook should be called Rose.ipynb
. This makes it easier for me when I collect all your answersWe are using a version of the Community Earth System Model (CESM) which is developed and maintained at the National Center for Atmospheric Research in Boulder, CO. See the CESM website here: http://www2.cesm.ucar.edu
Our experiments will use CESM in the so-called Slab Ocean Model mode, in which the ocean is represented by a static layer of water with some fixed heat capacity but no motion.
This greatly simplifies the necessary calculations, particularly the time required for the model to reach equilibrium. The net effect heat transport by ocean currents is prescribed through a so-called q-flux, which really just means we prescribe sources and sinks of heat at different locations.
For (lots of) details, see http://www2.cesm.ucar.edu/working-groups/pwg/documentation/cesm1-paleo-toolkit/ocean/som
The governing equations (fluid dynamics, radiation, etc.) are continuous in space and time.
Like every GCM, the model solves approximations to these equations that are discretized to a finite grid.
The spatial resolution of each component is:
The model runs on a local compute cluster here at UAlbany. We can simulate about 5 years per day by running the model on 32 cores. Equilibration time for the slab ocean model is roughly 20 years. Thus it takes a few days to run any particularly scenario out to equilibrium. The atmospheric GCM uses about half of the cpu time, the sea ice uses about one quarter, and the rest goes to the land model, the coupler, and various other bits and pieces.
We will begin with a control run, i.e. we will set up the model with (approximately) realistic conditions and run it out to equilibrium. We can then measure how well our simulated climate agrees with observations.
First, let's take a look at some of the ingredients that go into the control run. All of the necessary data will be served up by a special data server sitting in the department, so you should be able to run this code to interact with the data on any computer that is connected to the internet.
You can browse the available data through a web interface here:
http://ramadda.atmos.albany.edu:8080/repository/entry/show/Top/Users/Brian+Rose/CESM+runs
Within this folder called CESM runs
, you will find another folder called som_input
which contains all the input files.
The data are all stored in NetCDF
files. Python has some nice interfaces for working with NetCDF
data files, including accessing files remotely over the internet. To begin, we need to import the Python package netCDF4
to read the data files.
We also set the notebook to inline
graphics mode to display figures right here in the notebook.
In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import netCDF4 as nc
You are encouraged to experiment and tinker with all the code below.
Here we are going to load the input topography file and take a look at what's inside.
We use the Dataset
object from the netCDF4
module as our basic container for any netCDF
data. Dataset()
requires at least one argument telling it what file to open. This can either be a file on your local disk or a URL.
In this case we are passing it a URL to our online dataserver. We'll put the URL in a string variable called datapath
to simplify things later on.
In [2]:
datapath = "http://ramadda.atmos.albany.edu:8080/repository/opendap/latest/Top/Users/Brian+Rose/CESM+runs/"
endstr = "/entry.das"
# Notice that in Python we can easily concatenate strings together just by `adding` them
fullURL = datapath + 'som_input/USGS-gtopo30_1.9x2.5_remap_c050602.nc' + endstr
print fullURL
In [3]:
# Now we actually open the file
topo = nc.Dataset( fullURL )
print topo
The Dataset
object has several important attributes. Much of this should look familiar if you have worked with netCDF
data before. This Python interface to netCDF
stores most of the data in dictionaries. For example, the actual variables contained in this dataset are:
In [4]:
for v in topo.variables: print(v)
The output of our little for loop is a list of variable names contained in the file we just opened -- actually the keys to the dictionary called variables
. Each variables contains some data along with a bunch of descriptive information about the data. Here's an example:
In [5]:
print topo.variables['PHIS']
This shows the syntax for reading a particular variable, which in this code is the surface geopotential (from which we can get surface elevation). It will be handy to store the grid information (latitude and longitude) in some local arrays. The example below shows that by adding [:]
at the end of the expression, we reference just the data itself (which is stored as a numpy
array) and not the accompanying decription. In the Python world this is called slicing. You can get subsets of the data by taking a smaller slice, using array indexing.
In [6]:
lat = topo.variables['lat'][:]
lon = topo.variables['lon'][:]
print lat.shape, lon.shape
Now we have made local copies of these two arrays that contain coordinate information for the actual data in the file. The .shape
command used above (which works for any numpy
array) gives us information about the dimensions of the arrays, in this case, 96 latitude points and 144 longitude points. If you like can also try just typing
print lat
or
print lon
and see the coordinates themselves.
We will now read the geopotential and make a plot of the topography of the Earth's surface as represented on the 2º grid. The code below makes a colorful plot of the topography. We also use the land-sea mask in order to plot nothing at grid points that are entirely ocean-covered.
Execute this code exactly as written first, and then play around with it to see how you might customize the graph.
Note that the function pcolormesh
does most of the work here. It's a function that makes color 2D plots of an array.
In [7]:
height = topo.variables['PHIS'][:] / 9.8 / 1000 # in kilometers
# This makes use of the `masked array`, part of the numpy module
# essentially a numpy array with a mask of the same size
height_masked = np.ma.MaskedArray(height, topo.variables['LANDFRAC'][:] == 0. )
fig1 = plt.figure()
ax1 = fig1.add_subplot(1, 1, 1)
cax1 = ax1.pcolormesh( lon, lat, height_masked )
ax1.set_title('Topography (km) and land-sea mask in CESM')
ax1.axis([0, 360, -90, 90])
ax1.set_xlabel('Longitude'); ax1.set_ylabel('Latitude');
cbar1 = plt.colorbar(cax1);
Note that at 2º resolution we can see many smaller features (e.g. Pacific islands). The model is given a fractional land cover for each grid point.
Here let's plot the land-sea mask itself so we can see where there is at least "some" water:
In [8]:
ocean_mask = np.ma.MaskedArray( topo.variables['LANDFRAC'][:], topo.variables['LANDFRAC'][:] == 1. )
fig2 = plt.figure()
ax2 = fig2.add_subplot(1, 1, 1)
cax2 = ax2.pcolormesh( lon, lat, ocean_mask )
ax2.set_title('Ocean mask in CESM')
ax2.axis([0, 360, -90, 90])
ax2.set_xlabel('Longitude'); ax2.set_ylabel('Latitude')
cbar2 = plt.colorbar(cax2);
In [9]:
som_input = nc.Dataset( datapath + 'som_input/pop_frc.1x1d.090130.nc' + endstr )
for v in som_input.variables: print v
The ocean / sea ice models exist on different grids than the atmosphere (1º instead of 2º resolution). Let's store the new coordinate information just like we did for the atmosphere grid.
In [10]:
lon_som = som_input.variables['xc'][:]
lat_som = som_input.variables['yc'][:]
print lon_som.shape, lat_som.shape
Now we are going to look at the annual mean heat flux out of the ocean, which is the prescribed 'q-flux' that we give to the slab ocean model.
It is stored in the field qdp
in the input file.
The sign convention in CESM is that qdp > 0
where heat is going IN to the ocean. We will change the sign to plot heat going OUT of the ocean INTO the atmosphere (a more atmosphere-centric viewpoint).
First, let's just look at the size of the data array:
In [11]:
np.shape(som_input.variables['qdp'][:])
Out[11]:
This means that there are 12 x 180 x 360 data points. One 180 x 360 grid for each calendar month!
Now we are going to take the average over the year at each point. We will use a very convenient numpy
array function np.mean()
, which just computes the point-by-point average. This leaves us with a single grid on 180 latitude points by 360 longitude points:
In [12]:
qdp_an = np.mean( som_input.variables['qdp'][:], axis=0 )
# Note that we have to tell numpy which axis to average over
# By default it will average over all axes simultaneously
print qdp_an.shape
Now make a nice plot of the annual mean q-flux. This code also overlays a contour of the land-sea mask for reference.
In [13]:
# We can always set a non-standard size for our figure window
fig3 = plt.figure(figsize=(10, 6))
ax3 = fig3.add_subplot(111)
cax3 = ax3.pcolormesh( lon_som, lat_som, -qdp_an, cmap=plt.cm.seismic, vmin=-700., vmax=700. )
cbar3 = plt.colorbar(cax3)
ax3.set_title( 'CESM: Prescribed heat flux out of ocean (W m$^{-2}$), annual mean', fontsize=14 )
ax3.axis([0, 360, -90, 90], fontsize=12)
ax3.set_xlabel('Longitude', fontsize=14)
ax3.set_ylabel('Latitude', fontsize=14)
ax3.contour( lon, lat, topo.variables['LANDFRAC'][:], [0.5,0.5], colors='k');
ax3.text(65, 50, 'Annual', fontsize=16 );
Notice all the spatial structure here:
All this structure is determined by ocean circulation, which we are not modeling here. Instead, we are prescribing these heat flux patterns as an input to the atmosphere.
This pattern changes throughout the year. Recall that we just averaged over all months to make this plot. We might want to look at just one month:
In [14]:
qdp_jan = som_input.variables['qdp'][0,:,:]
qdp_jan.shape
Out[14]:
Here we got just the first month (January) by specifying [0,:,:]
after the variable name. This is called slicing or indexing an array. We are saying "give me everything for month number 0". Now make the plot:
In [15]:
fig4 = plt.figure();
ax4 = fig4.add_subplot(111)
cax4 = ax4.pcolormesh( lon_som, lat_som, -qdp_jan, cmap=plt.cm.seismic, vmin=-700., vmax=700. )
cbar4 = plt.colorbar(cax3)
ax4.set_title( 'CESM: Prescribed heat flux out of ocean (W m$^{-2}$)' )
ax4.axis([0, 360, -90, 90])
ax4.set_xlabel('Longitude')
ax4.set_ylabel('Latitude')
ax4.contour( lon, lat, topo.variables['LANDFRAC'][:], [0.5,0.5], colors='k');
ax4.text(65, 50, 'January', fontsize=12 );
Now try to plot a different month by indexing the array differently.
In [16]:
# The variable we want to plot
qdp = som_input.variables['qdp'][:]
# A function that makes a plot for a given month (which we pass as an argument)
# note... imshow is an alternate way to generate an image from a data array
# Faster than pcolormesh (which is why I use it here)
# but not as flexible
def sh(month):
plt.imshow(np.flipud(-qdp[month,:,:]), cmap=plt.cm.seismic, vmin=-700., vmax=700. )
When you execute the next cell, you should get a figure with a slider above it. Go ahead and play with it.
In [17]:
# This will only work with IPython 2.0 and above
from IPython.html.widgets import interact
interact(sh, month=(0,11,1));
Output from the control run is available on the same data server as above. Look in the folder called som_control
.
There are climatology files for each active model component:
I created these files by averaging over the last 10 years of the simulation. Let's take a look at the atmosphere file. The file is called
som_control.cam.h0.clim.nc
(the file extension .nc
is used to indicate NetCDF format).
In [18]:
atm_control = nc.Dataset( datapath + 'som_control/som_control.cam.h0.clim.nc' + endstr )
Now we would like to see what's in this file, same as before:
In [19]:
for v in atm_control.variables: print v
Lots of different stuff! These are all the different quantities that are calculated as part of the model simulation. Every quantity represents a long-term average for a particular month.
Want to get more information about a particular variable?
In [20]:
print atm_control.variables['SOLIN']
Apparently this is the incoming solar radiation or insolation, with shape (12,96,144) meaning it's got 12 months, 96 latitude points and 144 longitude points.
In [21]:
from IPython.display import Image
energy_budget_fig = 'http://www.atmos.albany.edu/facstaff/brose/classes/ATM623_Spring2015/resources/Handouts/GlobalEnergyBudget.pdf'
Image(url=energy_budget_fig)
Out[21]:
Write a Python function to take a properly area-weighted global average.
Your function should:
numpy
array on a regular latitude-longitude grid (like we've been working with above)One way (not the only way) to do this:
Test your function on the surface temperature in the control run.
Surface temperature is called 'TS'
in the control run data file.
Use your function to calculate annual, global average 'TS'
Verify that your function produces something close to 289.57
If it doesn't, try to find and fix the errors.
Now that you have a working function to take global averages, we can compare some energy budget values against observations.
The model output contains lots of diagnostics about the radiative fluxes. Here some CESM naming conventions to help you find the appropriate output fields:
'F'
are an energy flux of some kind. 'FLNT'
'FL'
means longwave flux (i.e. terrestrial)'FS'
means shortwave flux (i.e. solar)'U'
= up'D'
= down'N'
= net'T'
= top of atmosphere'S'
= surface'FLNT'
means 'net longwave flux at the top of atmosphere', i.e. the outgoing longwave radiation.You wil see that these are all 12 x 96 x 144 -- i.e. a two-dimensional grid for every calendar month.
In [22]:
np.shape( atm_control.variables['T'] )
Out[22]:
And here is some code to plot the average sounding (temperature as function of pressure) at a particular point in the month of January.
In [23]:
plt.plot( atm_control.variables['T'][0,:,70,115], atm_control.variables['lev'] )
plt.gca().invert_yaxis()
plt.ylabel('Pressure (hPa)')
plt.xlabel('Temperature (K)')
Out[23]:
What was the location we just used for that plot? Let's check by indexing the latitude and longitude arrays we previously stored:
In [24]:
lat[70], lon[115]
Out[24]:
These are actually the coordinates of the Albany area (read longitude in degrees east).
When we are done it's a good idea to close the data files:
In [25]:
atm_control.close()
som_input.close()
topo.close()
Thanks for playing!