cdms2 : selecting data

In [ ]:
import cdms2
DATAPATH = './data/Obs/mo/sst/HadISST/'
# open the nc file throgh cdms2 module
f = cdms2.open(DATAPATH + 'sst_HadISST_Climatology_1961-1990.nc')
# You can query the file
print f.listvariables()

In [ ]:
x = f('sst') # retrieves the whole dataset - a “slab”
x.shape

In [ ]:
# Selecting a specific area
x = f('sst', latitude=(0., 35.), longitude= (20., 100.))
x.shape

In [ ]:
# Selecting specific times
y = f('sst', time=('0-6-1', '0-6-31'))
print "y shape : ", y.shape  # note : this is monthly climatology file.

# You can also select the specific time from a “slab”
z = x(time=('0-6-1', '0-6-31'))
print "z shape : ", y.shape

In [ ]:
# You can “squeeze” out “singleton” dimensions while retreiving data from file itself
y = f('sst', time=('0-6-1', '0-6-31'), squeeze=1)
print "squeezed", y.shape

In [ ]:
print z.shape
# You can “squeeze” out “singleton” dimensions to loaded array also
z = z(squeeze=1)
print z.shape

In [ ]:
# You can also change the order of returned dimensions
z = x(time=('0-6-1', '0-6-31'), order='xyt')
z.info()
Querying the slab

In [ ]:
# You can query the order of returned dimensions!
print 'Order of dimensions for x is:', x.getOrder()

In [ ]:
# You can change the order of returned dimensions!
x2 = f('sst', latitude=(0., 35.), longitude=(20., 100.), order='xty')
print x2.info()

In [ ]:
# If I just want to ensure that time as my first axis
x3 = f('sst', order='t...')
x3.info()

In [ ]:
# You can check the axis and its properties
lat_axis = x2.getLatitude()
print lat_axis.info()
lat_bounds = lat_axis.getBounds()
#print lat_bounds

In [ ]:
# Axes can be retrieved by index position
my_ax = x.getAxis(0)
print my_ax.info()
Bounds!

In [1]:
#hideme
from IPython.core.display import Image 
Image(filename='img/bounds.png')


Out[1]:
Specifying precise regions to extract
  • If we need a specific point - say lat, lon = 32.1, 100.3

In [ ]:
x3 = f('sst', latitude=(32.1, 32.1, 'cob'), longitude=(100.3, 100.3, 'cob'))
x3.shape
In the args of latitute / longitude, we can pass thrid option as 'cob', 'oob', 'ccb', etc.,
  * The first 2 positions are either 'c'losed or 'o'pen 
  * The 3rd position is 'b'ound or 'e'xtend or .....
cdutil ‐ overview
  • Set of climate data specific utilities
  • The cdutil Package contains a collection of sub‐packages useful to deal with Climate Data
  • Sub‐components are:
    • region : specify our own region as variable instead of lat, lon
    • times : tools to deal with the time dimension
    • horizontal: lat, lon regrid
    • pressure : level/pressure regrid
    • averager : function will help to do average/sum over some particular dimension[s]
    • & more ...
cdutil - region
  • The cdutil.region module allows the user to extract a region "exactly".

    i.e. resetting the latitude and longitude bounds to match the area "exactly",
    therefore computing an "exact" average when passed to the averager function.
  • Predefined regions are:
    • AntarcticZone, AAZ (South of latitude 66.6S)
    • ArcticZone, AZ (North of latitude 66.6N)
    • NorthernHemisphere, NH # useful for dataset with latitude crossing the equator
    • SouthernHemisphere, SH
    • Tropics (latitudes band: 23.4S, 23.4N)
  • cdms2 selector to extract “exact” region

    i.e. reset bounds correctly so averaging account for only "actual" area averaged, not the full cell.

    data = f("var", cdutil.region.NorthernHemisphere)

In [ ]:
import cdutil
data = f("sst", cdutil.region.NorthernHemisphere) 
data.shape
  • Creating your own regions

In [ ]:
# specify NINO domain latitute & longitude
NINO34 = cdms2.selectors.Selector(cdutil.region.domain(latitude=(-5., 5.), longitude=(190., 240.)))

In [ ]:
# specify Indian domain latitute & longitude
India = cdms2.selectors.Selector(cdutil.region.domain(latitude=(0., 40.), longitude=(60., 100.)))

In [ ]:
Idata = f("sst", India) 
Idata.shape
Interpolation (re‐gridding)

Horizontal Regrid

  • Lets do interpolation over latitude & longitude from one grid into another grid/resolution

In [ ]:
# Suppose we have a slab
ds1 = f('sst', latitude=(0., 35.), longitude=(20., 100.), time=('0-6-1', '0-6-31'), order='tyx')
f.close()
print 'ds1.shape = ', ds1.shape, 'Axis order=', ds1.getOrder()

In [ ]:
# Let us now extract another dataset.
f2 = cdms2.open('data/Climatology/NCEP_NCAR_Climatology_ltm/Surface/slp.mon.ltm.nc')
ds2 = f2('slp', latitude=(0., 35.), longitude=(20., 100.), time=('1-6-1', '1-6-30'))
print 'ds2.shape = ', ds2.shape, 'Axis order =', ds2.getOrder()
  • I want to transform ds2 into the grid in ds1

In [ ]:
# Lets transform ds2 into the grid in ds1
ds3 = ds2.regrid(ds1.getGrid())
print 'ds3.shape = ', ds3.shape, 'Axis order =', ds3.getOrder()

In [ ]:
# getGrid() function returns the grid type and resolution.
ds1.getGrid()
  • Alternate regridder (generate it as function)

In [ ]:
# import Horizontal function from regrid2 module
from regrid2 import Horizontal
# get the source grid
ingrid = ds2.getGrid()
# get the destination grid
outgrid = ds1.getGrid()
# generate the function by passing in & out grid as args
# Horizontal will be used to regrid over lat, lon region
horizontalRegridFunc = Horizontal(ingrid, outgrid)
# pass the data into the function which we generated in just above!
new_ds2 = horizontalRegridFunc(ds2)   
print "original data shape", ds2.shape
print "after regrid", new_ds2.shape
  • We can retain the function 'horizontalRegridFunc(data)' in the scope of program and use it whenever we need to convert same ingrid data into that outgrid resolution.

Vertical Regrid

  • Lets do interpolation over level axis from one level/pressure grid into another level/pressure grid/resolution
  • You can regrid pressure‐level coordinates in the vertical axis using the pressureRegrid() method.
  • You need to define, or use an existing, vertical axis.
  • Then use the pressureRegrid method on the variable you wish to regrid, passing it the new level as the argument
  • If var is the variable to regrid and the newlevs is the vertical axis to regrid to:
     >>> var_on_new_levels = var.pressureRegrid(levout)  

In [ ]:
# open & get uwnd data with pressure level
f3 = cdms2.open('data/Climatology/NCEP_NCAR_Climatology_ltm/Levels/uwnd.jan.ltm.nc')
u = f3('uwnd')
f3.close()
print "U shape", u.shape
# get the level axis
u.getLevel()

In [ ]:
# this will return the actual value or index of the level axis
u.getLevel()[:]

Lets create axis which contains only 850 hPa and 200 hPa pressure levels


In [ ]:
# Lets create our own level axis
levAxis = cdms2.createAxis([850., 200.], id='level')
levAxis.designateLevel()
levAxis

In [ ]:
# do pressure regrid
u_850_200 = u.pressureRegrid(levAxis)
print "Original U shape", u.shape
print "After pressure regrided u shape", u_850_200.shape
u_850_200.getLevel()
  • Alternate regridder (generate it as function)

In [ ]:
# import PressureRegridder function from regrid2 module
from regrid2 import PressureRegridder
# generate the function by passing in & out level axis as args
# PressureRegridder will be used to regrid over level axis
verticalRegridFunc = PressureRegridder(u.getLevel(), levAxis)
# pass the data into the function which we generated in just above!
u_new = verticalRegridFunc(u)
print "Original U shape", u.shape
print "After pressure regrided u shape", u_850_200.shape
  • We can do regrid even different level axis like as follows

In [ ]:
import numpy
lev_50_intervals = numpy.arange(1000, 10, -50)
print lev_50_intervals
# Lets create our own level axis
lev50Axis = cdms2.createAxis(lev_50_intervals, id='level')
lev50Axis.designateLevel()
print lev50Axis

In [ ]:
# do pressure regrid
u_50_intervals = u.pressureRegrid(lev50Axis)
print "Original U shape", u.shape
print "After pressure regrided u shape", u_50_intervals.shape
u_50_intervals.getLevel()
  • We can retain the function 'verticalRegridFunc(data)' in the scope of program and use it, whenever we need to convert same ingrid data into that outgrid resolution.
  • Also uv-cdat supports other regridders like
    • BicubicRegridder,
    • BilinearRegridder,
    • ConservativeRegridder,
    • CrossSectionRegridder,
    • DistwgtRegridder,
    • ESMFRegrid,
    • ESMP,
    • & more

In [ ]:
# return the available module functions
import regrid2
dir(regrid2)
cdms2 - Using mask
  • Modify the masked array of the data by
      - get masked array from one dataset or create it by yourself
      - set that masked array to another dataset 

In [ ]:
# get the mask array from ds1 (sst) data
sst_mask = cdms2.MV2.getmask(ds1(order='tyx'))  # or 'ds1.mask' will return the same
print 'sst_mask.shape after reorder = ', sst_mask.shape
print sst_mask.__class__
# So we resize the mask with respect to ds3
sst_mask = sst_mask.resize(ds3.shape)
# ds3.info()
# Create new variable with new mask 
ds4 = cdms2.createVariable(ds3[:], mask=sst_mask, id='masked_slp', fill_value=1.e+20)
ds4.comments = 'set mask from observed HadISST dataset'
ds4.info()
cdscan
  • A utility that helps you manage files better.
  • When you have many .nc (or .ctl) files you can use this utility to generate a single “xml” file that makes life simple.
  • Try the following in your terminal:

    $ cdscan -x "some_filename.xml" DATA_PATH/*.nc
  • You can also change the time axis while you are at it!

    $ cdscan -x "some_filename.xml" -i 1 -r "months since 1-1-1" DATA_PATH2/*.nc
    
  • To know about more command line options

    $ cdscan --help
    
  • Lets do it!
  • list out the directory 'data/Climatology/NCEP_NCAR_Climatology_ltm/Levels'

In [ ]:
!ls data/Climatology/NCEP_NCAR_Climatology_ltm/Levels

We have monthly climatology in "12 uwnd" individual monthly and "5 vwnd" mixed monthly nc files. Lets do cdscan !


In [ ]:
# Lets find system uv-cdat cdscan path
import sys
cdscan = sys.path[1].split('lib')[0] + 'bin/cdscan'
print "cdscan command path is '%s'" % cdscan
print "* Use the above cdscan command system path to execute the follwoing code *.\n **Kindly change the cdscan path line startswith ! mark **"

In [ ]:
# make sure cdscan cmd path is same as above, otherwise change the below cmd
import os
current_dir = os.getcwd()
datapath = 'data/Climatology/NCEP_NCAR_Climatology_ltm/Levels'
if not current_dir.endswith(datapath): os.chdir(datapath)
# cdscan -x out.xml -r 'months since 1-1-1' *.nc
!/usr/local/uvcdat/1.5.1/bin/cdscan -x wnd.xml -r 'months since 1-1-1' *.nc 
# back to previous directory
if not current_dir.endswith(datapath): os.chdir(current_dir)
print 'current', os.getcwd()

We did cdscan and created single xml file (wnd.xml) which has all individual nc files links and meta data in it


In [ ]:
!ls -lrth data/Climatology/NCEP_NCAR_Climatology_ltm/Levels

Look at the file size of the xml file which we created by cdscan command. It is in size of KB only know.


In [ ]:
# Lets access the xml file through cdms2
f4 = cdms2.open('data/Climatology/NCEP_NCAR_Climatology_ltm/Levels/wnd.xml')
print "Available Vars in wnd.xml : ",f4.listvariable()
print "\n U wind info\n\n", f4['uwnd'].info()
print "\n U Wind Time : ", f4['uwnd'].getTime().asComponentTime()
print "\n\n\nV wind info\n\n", f4['vwnd'].info()
print "\n V Wind Time : ", f4['vwnd'].getTime().asComponentTime()
f4.close()
  • We can do cdsan without worrying what variables each nc files contains.
  • Just do cdscan over set of nc/ctl/pp/hdf/ binary files where time axis comes in.
  • Even you can do cdscan over pressure level (where each binary files contain each pressure level axis)
  • You can overwrite the time axis / level axis while you creating xml file by cdscan command
  • Read the documentation of cdscan to play with it!
cdutil - times
cdutil.times – for time axes, geared toward climate data
  • Climatology, Departures, Anomalies Tools works on BOUNDS, NOT on time values, designed for monthly seasons, 
    but one can create an engine for other kind of data (daily, yearly, etc...). 
    ac = cdutil.times.ANNUALCYCLE.climatology(data) # data may be monthlyDatasetOfManyYears - cdms variable
  • In order to set bounds you can use:
    cdutil.setTimeBoundsMonthly(Obj)
    cdutil.setTimeBoundsYearly(Obj)
    cdutil.setTimeBoundsDaily(Obj, frequency=1)
    (Obj can be data slab or time axis)
  • Create your own seasons
    MONSOON = cdutil.times.Seasons('JJAS')

In [ ]:
# Create your own seasons:
import cdutil, cdms2
MONSOON = cdutil.times.Seasons('JJAS')

f5 = cdms2.open('data/Climatology/NCEP_NCAR_Climatology_ltm/Levels/wnd.xml', 'r')
uwnd_fullyear_data = f5('uwnd', level=850)#, squeeze=1)
print 'uwnd_fullyear_data.shape =',uwnd_fullyear_data.shape

# set (monthly) bounds to time axis of data
cdutil.setTimeBoundsMonthly(uwnd_fullyear_data)

# get average over MONSOON season
monsoon_averaged_data = MONSOON(uwnd_fullyear_data)
print 'monsoon_averaged_data.shape =',monsoon_averaged_data.shape, 'averaged over JJAS only'
  • cdutil imports everything in the times module so you can just call e.g.:
    cdutil.setTimeBoundsMonthly(slab/axis)
The importance of bounds

In [2]:
#hideme
from IPython.core.display import Image 
Image(filename='img/tbounds.png')


Out[2]:
  • UV-CDAT used to set bounds automatically. E.g.:
     
    longitude = [0, 90, 180, 270]
    ∴ bounds = [[-45, 45], [45, 135],
              [135, 225], [225, 315]] 
  • Seems reasonable, but imagine a monthly mean time series where the times are recorded on 1st day of each month:
     timeax = [“1999-1-1”, “1999-2-1”, ..., “2100-12-1”] 
  • UV-CDAT assumes that each month represents the period of 15th last month to 15th this month.
  • Since cdutil tools use bounds they will be misinterpreting the data.
Need to set the bounds sensibly:
>>> cdutil.setTimeBoundsMonthly(timeax) 
cdutil : Pre‐defined time‐related means
  • DJF, MAM, JJA, SON (seasons)
    >>> mam_mean = cdutil.MAM(my_var)

In [ ]:
mam_mean = cdutil.MAM(uwnd_fullyear_data)
print "uwnd full year shape", uwnd_fullyear_data.shape
print "MAM mean shape", mam_mean.shape
  • DJF works like average of previous year 'Dec' & current year 'Jan' & 'Feb' months. So if we pass single year, it returns avg of 'JF' & 'D' seperately.

In [ ]:
djf_mean = cdutil.DJF(uwnd_fullyear_data)
print "uwnd full year shape", uwnd_fullyear_data.shape
print "DJF mean shape", djf_mean.shape
  • SEASONALCYCLE (means for the 4 predefined seasons [DJF, MAM, JJA, SON ]) – array of above.
    >>> seas_mns = cdutil.SEASONALCYCLE(my_var)

In [ ]:
seas_mean = cdutil.SEASONALCYCLE(uwnd_fullyear_data)
print "uwnd full year shape", uwnd_fullyear_data.shape
print "Seasonal mean shape", seas_mean.shape
print "Since it has only one year data, DJF gives two different seasons like 'JF' & 'D'"
seas_mean.getTime().asComponentTime()
  • YEAR (annual means)
    >>> annual_mean = cdutil.YEAR(my_var)

In [ ]:
annual_mean = cdutil.YEAR(uwnd_fullyear_data)
print "uwnd full year shape", uwnd_fullyear_data.shape
print "yearly mean shape", annual_mean.shape
  • ANNUALCYCLE (monthly means for each month of the year)
  • Works well for daily dataset
    >>> monthly_mean = cdutil.ANNUALCYCLE(my_var)

In [ ]:
monthly_mean = cdutil.ANNUALCYCLE(uwnd_fullyear_data)
print "uwnd full year shape", uwnd_fullyear_data.shape
print "monthly mean shape", monthly_mean.shape
print "**Since uwnd already monthly average dataset, we cant see any difference in it**"
  • EXERCISE: Try calculating the climatological annual cycle for the NCEP Reanalysis data you have read in.
cdutil : Climatologies and Departures

Season extractors have 2 functions available:

  • climatology: which computes the average of all seasons passed. ANNUALCYCLE.climatology(), will return the 12 month annual cycle for the slab:
    >>> monthly_climatology = cdutil.ANNUALCYCLE.climatology(v)
  • If we pass multiple years, then also we will endup with time length 12.

In [ ]:
monthly_climatology = cdutil.ANNUALCYCLE.climatology(uwnd_fullyear_data)
print "uwnd full year shape", uwnd_fullyear_data.shape
print "monthly climatology shape", monthly_climatology.shape
  • departures: which given an optional climatology will compute seasonal departures from it.
    >>> anomaly = cdutil.ANNUALCYCLE.departures(v, cli60_99)

Note that the second argument is optional but can be a pre‐computed climatology such as here cli60_99 is a 1960‐1999 climatology but the variable v is defined from 1900‐2000. If not given then the overall climatology for v is used.</pre>


In [ ]:
anomaly = cdutil.ANNUALCYCLE.departures(uwnd_fullyear_data)
print "uwnd full year shape", uwnd_fullyear_data.shape
print "monthly anomaly shape", monthly_climatology.shape
Simple user‐defined averaging
  • You can create your own simple averages using arrays, slabs or variables in the usual way:
    – Averaging over 4 time steps:
    >>> t.shape
    (4, 181, 360)
    >>> avg = (t[0]+t[1]+t[2]+t[3])/4
  • Drawbacks:
    • Doesn’t retain your metadata.
    • Cannot average simply across axes within a variable.
MV2 Averaging
  • The MV2 module has an averaging function:
    >>>MV2.average(x, axis=0, weights=None, returned=0)
      – computes the average value of the non‐masked elements of x along the selected axis.
        If weights is given, it must match the size and shape of x, and the value returned is:      
$$ \frac{\sum(weights_i * x_i)}{\sum(weights_i)}$$
    – elements corresponding to those masked in x or weights are ignored. 
      If returned, a 2‐tuple consisting of the average and the sum of the weights is returned.
MV2 Averaging: example
  • To calculate a set of zonal means:

In [ ]:
import cdms2, MV2
f = cdms2.open('data/Climatology/NCEP_NCAR_Climatology_ltm/Surface/slp.mon.ltm.nc')
data = f('slp')
print data.shape
zm = MV2.average(data, axis=2)
print zm.shape
print zm.info()
The cdutil "averager" function
The "cdutil.averager()" function is the key to spatial and temporal averaging in UV-CDAT. 
  • Masks are dealt with implicitly.
  • Powerful area averaging function.
  • Provides control over the order of operations (i.e. which dimensions are averaged over first).
  • Allows weightings for the different axes:
    • pass your own array of weights for each dimension, use the default (grid) weights or specify equal weighting.

Usage of cdutil.averager


 result = averager( V, axis=axisoptions, 
                        weights=weightoptions,
                        action=actionoptions,
                        returned=returnedoptions,
                        combinewts=combinewtsoptions) 

axisoptions has to be a string. You can pass axis='tyx', or '123', or 'x (plev)'.

weightoptions is one of 'generate' | 'weighted' | 'equal' | 'unweighted' | array | Masked Variable

actionoptions is 'average' | 'sum' [Default = 'average']. 

You can either return the weighted average or the weighted sum of the data.
Example: Region Averaging, and Climatology

In [ ]:
import cdutil
# define the your custom regions.
NINO3 = cdms2.selectors.Selector(cdutil.region.domain(latitude=(-5., 5.,'ccb'), longitude=(210., 270., 'ccb')))
fslp = cdms2.open('data/Obs/mo/sst/HadISST/sst_HadISST_Climatology_1961-1990.nc')
#'sst_HadISST_1870-1_2011-1.nc' !!!!!! use this for actual nino3 sst data
nino3_data = fslp('sst', NINO3)
cdutil.setTimeBoundsMonthly(nino3_data)
print "data shape : ", nino3_data.shape, "& its order : ", nino3_data.getOrder()
# Compute the Spatial average
nino3_average = cdutil.averager(nino3_data, axis='xy')
print "after averged over lat, lon shape : ", nino3_average.shape
# Anomaly from climatology computed over 1961-1990
nino3_slice = nino3_average(time=('0-1-16 12:0:0.0', '0-12-16 12:0:0.0'))
#nino3_slice = nino3_average(time=('1961-1-1', '1990-12-31'))  # !!!!!! use this for actual nino3 sst data
nino3_clim = cdutil.ANNUALCYCLE.climatology(nino3_slice)
print "climatology shape : ", nino3_clim.shape
# Now departures
nino3_anomaly = cdutil.ANNUALCYCLE.departures(nino3_average, nino3_clim)
print "anomaly shape : ", nino3_anomaly.shape
EXERCISE
  • Extract the SST data and compute global anomalies from the 1961‐1990 climatology for the whole length of dataset.
  • Average the anomaly data over x and y axes using “equal” weights for both axes and compare against area “weighted” average.
genutil : general utilities
  • genutil.statistics: set of basic statistical functions
  • correlation,
  • covariance,
  • geometricmean,
  • laggedcorrelation,
  • laggedcovariance,
  • linearregression,
  • meanabsdiff,
  • median,
  • array_indexing,
  • percentiles,
  • rank,
  • rms,
  • std,
  • variance,
  • autocorrelation,
  • autocovariance,
  • and more
Statistics Example
>>> c1 = genutil.statistics.correlation(a, b, axis='t')
>>> c1.shape
>>> c2 = genutil.statistics.correlation(a, b, axis='xy')
>>> c2.shape
Support for other grid types

In [3]:
#hideme
from IPython.core.display import Image 
Image(filename='img/grids.png')


Out[3]:
  • RectGrid ‐ Associated latitude and longitude are 1‐D axes, with strictly monotonic values.
  • CurveGrid ‐ Latitude and longitude are 2‐D coordinate axes (Axis2D).
  • GenericGrid ‐ Latitude and longitude are 1‐D auxiliary coordinate axes (AuxAxis1D)
Acknowledgments
  • Dean Williams, Charles Doutriaux (PCMDI, LLNL).
  • Dr. Johnny Lin (Physics Department, North Park University, Chicago, Illinois).
  • Dr.Krishna AchtuaRao (Centre for Atmospheic Sciences, Indian Institute of Technology Delhi, India).
License
  • IPython Notebook Created by
      Arulalan.T 
      Date : 18.06.2014
      Project Associate,
      Centre for Atmospheic Sciences,
      Indian Institute of Technology Delhi, India.
      Blog : http://tuxcoder.wordpress.com 
      Repo : https://github.com/arulalant/UV-CDAT-IPython-Notebooks
      
  • This work (IPython Notebook & Html Slide) is licensed under a Creative Commons Attribution‐NonCommercial‐ShareAlike 3.0
  • Includes all python scripts
  • Freeware license (only for research purpose, not for commercial) for 'data' directory which contains '.nc' files (any one can download it from TIGGE http://tigge-portal.ecmwf.int/ website)