Climate model output is usually provided with geometric information like the position of the cell center and boundaries. Gridded observational datasets often lack these informations and provide only the position of either the gridcell center or cell boundary.
This makes the calculation of common vector calculus operators like the gradient difficult, and results depend on the particular method used.
xgcm
can infer the cell boundary or cell center location depending on the
geometry of the gridded source dataset. This enables consistent and easily
reproducible calculations across model and observational datasets.
The autogenerate module enables the user to apply the same methods to both model output and observational products, which enables better comparison and a unified workflow using different sources of data.
In this case xgcm can infer the missing coordinates to enable the creation of a grid object. Below we will illustrate how to infer coordinates for several example datasets (nonperiodic and periodic) and show how the resulting dataset can be used to perform common calculations like gradients and distance/area weighted averages on observational data.
First let's import xarray and xgcm
In [ ]:
import numpy as np
import xarray as xr
from xgcm import Grid
from xgcm.autogenerate import generate_grid_ds
import matplotlib.pyplot as plt
%matplotlib inline
Below we will create a synthetic temperature profile with decreasing temperature at depth, with unevenly spaced depth intervals (commonly found in hydrographic data).
In [ ]:
# create a synthetic ocean temperature profile with uneven spacing in depth
z = np.hstack([np.arange(1,10, 1), np.arange(10,200, 10), np.arange(200,700, 20)])
# Create synthetic temperature profile with maximum temperature gradient at mid depth (e.g. the thermocline)
temp = ((np.cos(np.pi*z/700) + 1) + np.exp(-z/350) / 2) * 10
# Convert to xarray.Dataset
ds = xr.DataArray(temp, dims=['depth'], coords={'depth':z}).to_dataset(name='temperature')
ds.temperature.plot(y='depth', yincrease=False)
generate_grid_ds can infer the missing cell positions based on the given position (defaults to cell center) and the axis, which is defined by passing a dictionary with the physical axis as key and the dataset dimensions belonging to that axis as values.
In [ ]:
# Generate 'full' dataset, which includes additional coordinate `depth_left` and appropriate attributes.
ds_full = generate_grid_ds(ds, {'Z':'depth'})
print(ds_full)
print(ds.depth.data)
print(ds_full.depth_left.data)
We see now that a new dimension depth_left
was created, with cell boundaries shifted towards the surface
The default behaviour of generate_grid_ds is to extrapolate the grid position to the 'left' (e.g. towards the surface for a depth profile), assuming that the spacing in the two cells closest to the boundary (here: the first and second cell) is equal. Particular geometries might require adjustments of the boundary treatment, by specifying e.g. pad=0
to ensure the topmost cell boundary is located at the sea surface.
In [ ]:
# Create grid object
grid = Grid(ds_full, periodic=False)
print(grid)
Now we have all the tools we need to calculate the vertical gradient just like with numerical model output
In [ ]:
# Calculate vertical distances located on the cellboundary
ds_full.coords['dzc'] = grid.diff(ds_full.depth, 'Z', boundary='extrapolate')
# Calculate vertical distances located on the cellcenter
ds_full.coords['dzt'] = grid.diff(ds_full.depth_left, 'Z', boundary='extrapolate')
In [ ]:
# note that the temperature gradient is located on the `depth_left` dimension,
# but no temperature information is available, to infer the gradient in the topmost grid cell.
# The following will pad with nan towards the surface. Alternatively the values could be padded with
# with a particular value or linearly extrapolated.
dtemp_dz = grid.diff(ds.temperature, 'Z', boundary='fill', fill_value=np.nan) / ds_full.dzc
print(dtemp_dz)
fig, ax1 = plt.subplots()
ax1.invert_yaxis()
ax2 = ax1.twiny()
ds.temperature.plot(ax=ax1, y='depth', color='C0')
ax1.set_xlabel('temperature [deg C]', color='C0')
dtemp_dz.plot(ax=ax2, y='depth_left', color='C1')
ax2.set_xlabel('vertical temperature gradient [deg C / m]', color='C1');
In [ ]:
mean_temp = ds_full.temperature.mean('depth')
mean_temp_weighted = (ds_full.temperature * ds_full.dzt).sum('depth') / ds_full.dzt.sum('depth')
print(mean_temp.data)
print(mean_temp_weighted.data)
In [ ]:
# hack to make file name work with nbsphinx and binder
import os
fname = '../datasets/uvwnd.10m.gauss.2018.nc'
if not os.path.exists(fname):
fname = '../' + fname
In [ ]:
ds = xr.open_dataset(fname)
ds_full = generate_grid_ds(ds, {'X':'lon', 'Y':'lat'})
ds_full
As in the depth profile the longitude and latitude values are extended to the left
when the defaults are used.
However, since the latitude is not periodic we can specify the position to extend to as outer
(more details here). This will extend the latitudinal positions both to the left and right, avoiding missing values later on.
In [ ]:
grid = Grid(ds_full, periodic=['X'])
grid
Now we compute the difference (in degrees) along the longitude and latitude for both the cell center and the cell face. Since we are not taking the difference of a data variable across the periodic boundary, we need to specify the boundary_discontinutity
in order to avoid the introduction of artefacts at the boundary.
In [ ]:
dlong = grid.diff(ds_full.lon, 'X', boundary_discontinuity=360)
dlonc = grid.diff(ds_full.lon_left, 'X', boundary_discontinuity=360)
dlonc_wo_discontinuity = grid.diff(ds_full.lon_left, 'X')
dlatg = grid.diff(ds_full.lat, 'Y', boundary='fill', fill_value=np.nan)
dlatc = grid.diff(ds_full.lat_left, 'Y', boundary='fill', fill_value=np.nan)
dlonc.plot()
dlonc_wo_discontinuity.plot(linestyle='--')
Without adding the boundary_discontinuity
input, the last cell distance is calculated incorectly.
The values we just calculated are actually not cell distances, but instead differences in longitude and latitude (in degrees). In order to calculate operators like the gradient dlon
and dlat
have to be converted into approximate cartesian distances on a globe.
In [ ]:
def dll_dist(dlon, dlat, lon, lat):
"""Converts lat/lon differentials into distances in meters
PARAMETERS
----------
dlon : xarray.DataArray longitude differentials
dlat : xarray.DataArray latitude differentials
lon : xarray.DataArray longitude values
lat : xarray.DataArray latitude values
RETURNS
-------
dx : xarray.DataArray distance inferred from dlon
dy : xarray.DataArray distance inferred from dlat
"""
distance_1deg_equator = 111000.0
dx = dlon * xr.ufuncs.cos(xr.ufuncs.deg2rad(lat)) * distance_1deg_equator
dy = ((lon * 0) + 1) * dlat * distance_1deg_equator
return dx, dy
ds_full.coords['dxg'], ds_full.coords['dyg'] = dll_dist(dlong, dlatg, ds_full.lon, ds_full.lat)
ds_full.coords['dxc'], ds_full.coords['dyc'] = dll_dist(dlonc, dlatc, ds_full.lon, ds_full.lat)
Based on the distances we can estimate the area of each grid cell and compute the area-weighted meridional average temperature
In [ ]:
ds_full.coords['area_c'] = ds_full.dxc * ds_full.dyc
In [ ]:
du_dx = grid.diff(ds_full.uwnd, 'X') / ds_full.dxg
du_dx
The values of the gradient are correctly located on the cell boundary on the x-axis and on the cell center in the y-axis
In [ ]:
import cartopy.crs as ccrs
fig, axarr = plt.subplots(ncols=2, nrows=3, figsize=[16,20],
subplot_kw=dict(projection=ccrs.Orthographic(0, -30)))
for ti,tt in enumerate(np.arange(0,30, 10)):
ax1 = axarr[ti,0]
ax2 = axarr[ti,1]
time = ds_full.time.isel(time=tt).data
ds_full.uwnd.isel(time=tt).plot(ax=ax1, transform=ccrs.PlateCarree(),robust=True)
du_dx.isel(time=tt).plot(ax=ax2, transform=ccrs.PlateCarree(), robust=True)
ax1.set_title('Zonal Surface Wind %s' %time)
ax2.set_title('Zonal Gradient of Zonal Surface Wind %s' %time)
for ax in [ax1, ax2]:
ax.set_global(); ax.coastlines();
The resulting gradient is correctly computed across the periodic (x-axis) boundary.
In [ ]:
u = ds_full.uwnd.mean('time')
u_mean = u.mean('lat')
u_mean_weighted = (u * u.area_c).sum('lat') / (u.area_c).sum('lat')
u_mean.plot(label='unweighted mean')
u_mean_weighted.plot(label='area weighted mean')
plt.legend()
plt.ylabel('zonal wind [m/s]')
plt.gca().autoscale('tight')
Thes two lines show the importance of applying a weighted mean, when the grid spacing is irregular (e.g. datasets gridded on a regular lat-lon grid).