xarray and PyNIO are efficient, easy-to-use packages for reading and writing scientific data, because their internal data structures are netCDF-like. xarray can read netCDF and GRIB files and handles metadata that follow the netCDF CF conventions. The same is true for PyNIO, which can additionally read HDF and WRF files.
See also http://xarray.pydata.org and https://github.com/NCAR/pynio.
The examples below show how to use xarray and PyNIO to read data from a file and to work with coordinates and metadata.
We start directly by opening and reading the netCDF file tsurf.nc from the subdirectory data.
Import the commonly used modules and define the variable fname (file name).
In [ ]:
import numpy as np
fname = './data/tsurf.nc'
1. xarray
First we have to load the xarray module, and because we are lazy, we use the abbreviation xr for it.
The xarray function **xr.open_dataset** is used to read the content of the file.
The variable name ds is commonly used; it is short for dataset.
In [ ]:
import xarray as xr
ds = xr.open_dataset(fname)
print(ds)
Printing the dataset content gives you an overview of the dimension and variable names, their sizes, and the global file attributes.
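If you do not have tsurf.nc at hand, the following minimal sketch builds a small dataset in memory so you can see the same kind of overview. The grid size, values, attribute texts, and the name demo are made up for illustration and are not taken from the real file.

```python
import numpy as np
import xarray as xr

# Hypothetical coordinates and values; the real tsurf.nc will differ.
time = np.array(['2001-01-01T00:00', '2001-01-01T06:00'], dtype='datetime64[ns]')
lat = np.linspace(-90.0, 90.0, 4)
lon = np.linspace(0.0, 270.0, 4)
data = 273.15 + np.zeros((time.size, lat.size, lon.size))

# One data variable with dimensions (time, lat, lon) plus variable and global attributes.
demo = xr.Dataset(
    data_vars={
        'tsurf': (('time', 'lat', 'lon'), data,
                  {'long_name': 'surface temperature', 'units': 'K'}),
    },
    coords={'time': time, 'lat': lat, 'lon': lon},
    attrs={'title': 'synthetic example'},
)

# Same kind of overview as print(ds): dimensions, coordinates, variables, attributes.
print(demo)
```

The printout lists the dimensions first, then the coordinates, the data variables, and finally the global attributes.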
2. PyNIO
Like above, we have to import the module first, but this time it's Nio (that's short enough).
PyNIO's function to read the file is **Nio.open_file**.
The name f is often used for the file object in NCL scripts, which is why we use it here as well, but you can call it whatever you want.
In [ ]:
import Nio
f = Nio.open_file(fname,"r")
print(f)
This is very similar to the ncdump output, and corresponds to the output from xarray.
It is always good to have a closer look at your data, and this can be done very easily with xarray and PyNIO.
OK, show me the variables stored in that file (oops, just one :D) and the coordinate variables, too.
1. xarray
In [ ]:
coords = ds.coords
variables = ds.variables
print('--> coords: \n\n', coords)
print('--> variables: \n\n', variables)
Ah, that's better. Here the time is displayed in a readable form, because xarray decodes it to NumPy datetime64 values under the hood. The variable and coordinate attributes are shown as well.
2. PyNIO
Let us see how PyNIO will do that.
In [ ]:
coords_nio = f.dimensions.keys()
variables_nio = f.variables.keys()
print(coords_nio)
print(variables_nio)
# print(f.variables['varName'])
In [ ]:
coord_nio = f.dimensions.keys()
varNames = f.variables.keys()
# Print the metadata and the data of every variable in the file
for i in varNames:
    print(f.variables[i])
    print(f.variables[i][:])
1. xarray
In [ ]:
tsurf = ds.tsurf
lat = tsurf.lat
lon = tsurf.lon
print('Variable tsurf: \n', tsurf.data)
print('\nCoordinate variable lat: \n', lat.data)
print('\nCoordinate variable lon: \n', lon.data)
2. PyNIO
If you use PyNIO to open a file, the handling differs a little. With xarray you can retrieve the coordinate data directly from the variable (tsurf.lat, tsurf.lon), whereas with PyNIO you read them from the file object f.
In [ ]:
tsurf_nio = f.variables['tsurf'][:,:,:]
lat_nio = f.variables['lat'][:]
lon_nio = f.variables['lon'][:]
print('Variable tsurf_nio: \n', tsurf_nio)
print('\nCoordinate variable lat_nio: \n', lat_nio)
print('\nCoordinate variable lon_nio: \n', lon_nio)
The variables have different data types:
In [ ]:
print(type(tsurf))
print(type(tsurf_nio))
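The difference matters in practice: the xarray variable is a DataArray that carries its coordinates and attributes along, while the PyNIO slice is a NumPy array (possibly a masked array, depending on PyNIO's settings) without that metadata. The sketch below illustrates this with a small made-up array; the names values, labelled, and plain are only for illustration.

```python
import numpy as np
import xarray as xr

# Hypothetical 2 x 3 field standing in for real data.
values = np.arange(6.0).reshape(2, 3)

# A DataArray bundles the numbers with dimension names, coordinates and attributes.
labelled = xr.DataArray(
    values,
    dims=('lat', 'lon'),
    coords={'lat': [10.0, 20.0], 'lon': [0.0, 5.0, 10.0]},
    attrs={'units': 'K'},
)

# .values returns the underlying NumPy array; the metadata stays behind.
plain = labelled.values

print(type(labelled))
print(type(plain))
```

If you need the bare numbers from xarray, for example to pass them to a library that expects NumPy input, .values (or .data, as used above) gives you the same kind of object that PyNIO returns.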
1. xarray
In [ ]:
dimensions = ds.dims
shape = tsurf.shape
size = tsurf.size
rank = len(shape)
print('dimensions: ', dimensions)
print('shape: ', shape)
print('size: ', size)
print('rank: ', rank)
2. PyNIO
In [ ]:
dimensions_nio = f.dimensions
shape_nio = tsurf_nio.shape
size_nio = tsurf_nio.size
rank_nio = len(shape_nio) # or rank_nio = f.variables["tsurf"].rank
print('dimensions: ', dimensions_nio)
print('shape: ', shape_nio)
print('size: ', size_nio)
print('rank_nio: ', rank_nio)
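Shape, size, and rank are related in a simple way, which can be checked on any NumPy array. The following sketch uses a made-up (time, lat, lon) shape of (1, 96, 192); the shape of the real tsurf variable may differ.

```python
import numpy as np

# Hypothetical array with dimensions (time, lat, lon).
arr = np.zeros((1, 96, 192))

demo_shape = arr.shape   # length of each dimension
demo_size = arr.size     # total number of elements = product of the shape
demo_rank = arr.ndim     # number of dimensions = len(shape)

print('shape: ', demo_shape)
print('size:  ', demo_size)
print('rank:  ', demo_rank)
```

The ndim attribute is a shortcut for len(shape); it is available on NumPy arrays and on xarray DataArray objects alike.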
1. xarray
In [ ]:
attributes = list(tsurf.attrs)
print('attributes: ', attributes)
2. PyNIO
To get the attributes we have to use the file variable object f.variables['tsurf'] and not the numpy array tsurf_nio.
In [ ]:
attributes_nio = list(f.variables['tsurf'].attributes.keys())
print('attributes_nio: ', attributes_nio)
Let's see how we can get the content of an attribute.
1. xarray
In [ ]:
long_name = tsurf.long_name
units = tsurf.units
print('long_name: ', long_name)
print('units: ', units)
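Attribute-style access such as tsurf.units raises an AttributeError if the attribute does not exist. Since attrs behaves like a dictionary, you can also look up an attribute by key and supply a default. The sketch below uses a small made-up variable; the name var and the attribute values are assumptions for illustration.

```python
import numpy as np
import xarray as xr

# Hypothetical variable with two attributes.
var = xr.DataArray(
    np.zeros((2, 2)),
    dims=('lat', 'lon'),
    attrs={'long_name': 'surface temperature', 'units': 'K'},
)

# Dictionary-style access to an existing attribute.
demo_units = var.attrs['units']

# .get() returns a default instead of failing when the attribute is missing.
demo_standard_name = var.attrs.get('standard_name', 'not set')

print('units:         ', demo_units)
print('standard_name: ', demo_standard_name)
```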
2. PyNIO
And here we have to use the file variable object f.variables['tsurf'] again.
In [ ]:
long_name_nio = f.variables["tsurf"].attributes['long_name']
units_nio = f.variables["tsurf"].attributes['units']
print('long_name_nio: ', long_name_nio)
print('units_nio: ', units_nio)
1. xarray
In [ ]:
time = ds.time.data
print('timestep 0: ', time[0])
2. PyNIO
In [ ]:
time_nio = f.variables['time'][:]
print('timestep 0: ', time_nio[0])
The returned time value is the raw number stored in the netCDF file, and it has to be converted to a date. To get a date like xarray's above, the units and calendar attributes of the time variable have to be known. In this example, we use the netCDF4 module to convert the time values.
In [ ]:
import netCDF4
time_nio_units = f.variables["time"].attributes['units']
time_nio_calendar = f.variables["time"].attributes['calendar']
date_nio = netCDF4.num2date(time_nio[0], units=time_nio_units, calendar=time_nio_calendar)
print('timestep 0: ', date_nio)
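To make clear what such a conversion does, here is a minimal sketch using only the Python standard library. It is not how netCDF4 implements num2date; the helper decode_time is hypothetical, it assumes a units string of the form 'UNIT since YYYY-MM-DD HH:MM:SS', and it only covers the standard (Gregorian) calendar, not calendars such as 360_day or noleap.

```python
from datetime import datetime, timedelta

def decode_time(value, units):
    """Convert a numeric time value to a datetime.

    Hypothetical helper: expects units like 'hours since 2001-01-01 00:00:00'
    and assumes the standard calendar.
    """
    # Split 'hours since 2001-01-01 00:00:00' into the unit and the reference date.
    unit, _, reference = units.partition(' since ')
    ref = datetime.strptime(reference.strip(), '%Y-%m-%d %H:%M:%S')
    seconds_per_unit = {'seconds': 1, 'minutes': 60, 'hours': 3600, 'days': 86400}
    # The time value counts units elapsed since the reference date.
    return ref + timedelta(seconds=float(value) * seconds_per_unit[unit.strip()])

print(decode_time(6.0, 'hours since 2001-01-01 00:00:00'))
```

For real data, prefer netCDF4.num2date as shown above, because it handles the different CF calendars and units formats.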
1. xarray
In [ ]:
import cfgrib
ds2 = xr.open_dataset('./data/MET9_IR108_cosmode_0909210000.grb2', engine='cfgrib')
variables2 = ds2.variables
print('--> variables2: \n\n', variables2)
2. PyNIO
In [ ]:
f2 = Nio.open_file('./data/MET9_IR108_cosmode_0909210000.grb2',"r")
variables_nio2 = f2.variables.keys()
# Print the metadata and the data of every variable in the GRIB file
for i in variables_nio2:
    print(f2.variables[i])
    print(f2.variables[i][:])