In [1]:
name = '2016-10-07-creating-netcdf-datasets'
title = 'Creating NetCDF datasets'
tags = 'netcdf, io'
author = 'Denis Sergeev'
In [2]:
from nb_tools import connect_notebook_to_post
from IPython.core.display import HTML, Image
html = connect_notebook_to_post(name, title, tags, author)
NetCDF (network Common Data Form) is a set of interfaces for array-oriented data access and a freely distributed collection of data access libraries for C, Fortran, C++, Java, and other languages.
NetCDF data are:
In [3]:
import os
path_to_file = os.path.join(os.pardir, 'data', 'new.nc')
mode='r' is the default.
mode='a' opens an existing file and allows for appending (does not clobber existing data).
format can be one of NETCDF3_CLASSIC, NETCDF3_64BIT, NETCDF4_CLASSIC, or NETCDF4 (default).
NETCDF4_CLASSIC uses HDF5 for the underlying storage layer (as does NETCDF4) but enforces the classic netCDF 3 data model so data can be read with older clients.
In [4]:
from __future__ import division, print_function # py2to3 compatibility
import netCDF4 as nc
import numpy as np
print('NetCDF package version: {}'.format(nc.__version__))
Just to be safe, make sure the dataset is not already open:
In [5]:
try:
    ncfile.close()
except Exception:
    # ncfile is not defined yet or has already been closed
    pass
# another way of checking this:
# if ncfile.isopen():
#     ncfile.close()
ncfile = nc.Dataset(path_to_file, mode='w',
                    format='NETCDF4_CLASSIC')
print(ncfile)
The ncfile object we created is a container for dimensions, variables, and attributes. First, let's create some dimensions using the createDimension method.
Every dimension has a name and a length. The name is a string that is used to refer to the dimension when creating variables and as the key to the dimension object in the ncfile.dimensions dictionary.
Setting the dimension length to 0 or None makes it unlimited, so it can grow.
For NETCDF4 files, any variable's dimension can be unlimited.
For NETCDF4_CLASSIC and NETCDF3* files, only one dimension per variable can be unlimited, and it must be the leftmost (slowest varying) dimension.
In [6]:
nlat = 73
nlon = 144
In [7]:
lat_dim = ncfile.createDimension('lat', nlat) # latitude axis
lon_dim = ncfile.createDimension('lon', nlon) # longitude axis
time_dim = ncfile.createDimension('time', None) # unlimited axis
for dim in ncfile.dimensions.items():
    print(dim)
netCDF attributes can be created just like attributes of any Python object.
In [8]:
ncfile.author = 'UEA Python Group'
ncfile.title='My model data'
print(ncfile)
You can also easily delete a netCDF attribute of a Dataset using the delncattr method:
In [9]:
ncfile.some_unnecessary_attribute = '123456'
ncfile.delncattr('some_unnecessary_attribute')
Now let's add some variables and store some data in them.
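The createVariable method is usually called with three arguments:
the variable name (a string), which is also the key to the variable object in the variables dictionary;
the datatype (most NumPy datatypes are supported);
a tuple containing the dimension names (the dimensions must be created first).
Unless this is a NETCDF4 file, any unlimited dimension must be the leftmost one.
There are also many optional arguments (most of them only relevant when format='NETCDF4') to control compression, chunking, fill_value, etc.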
In [10]:
# Define two variables with the same names as dimensions,
# a conventional way to define "coordinate variables".
lat = ncfile.createVariable('lat', np.float32, ('lat',))
lat.units = 'degrees_north'
lat.long_name = 'latitude'
#
lon = ncfile.createVariable('lon', np.float32, ('lon',))
lon.units = 'degrees_east'
lon.long_name = 'longitude'
#
time = ncfile.createVariable('time', np.float64, ('time',))
time.units = 'hours since 1800-01-01'
time.long_name = 'time'
In [11]:
temp = ncfile.createVariable('temp', np.float64,
                             ('time', 'lat', 'lon'))  # note: unlimited dimension is leftmost
temp.units = 'K'  # kelvin
temp.standard_name = 'air_temperature' # this is a CF standard name
print(temp)
In [12]:
print("Some pre-defined attributes for variable temp:\n")
print("temp.dimensions:", temp.dimensions)
print("temp.shape:", temp.shape)
print("temp.dtype:", temp.dtype)
print("temp.ndim:", temp.ndim)
In [13]:
# Write latitudes, longitudes.
# Note: the ":" is necessary in these "write" statements
lat[:] = -90. + (180 / (nlat - 1)) * np.arange(nlat) # south pole to north pole, 2.5 degree step
lon[:] = (360 / nlon) * np.arange(nlon) # Greenwich meridian eastward, 2.5 degree step
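With 73 latitudes and 144 longitudes, a regular global grid has a spacing of 2.5 degrees in both directions. The following NumPy-only sketch builds such axes and checks the end points; the names lat_values and lon_values are hypothetical and are not used elsewhere in this notebook.

```python
import numpy as np

nlat, nlon = 73, 144

# Latitudes include both poles, so there are nlat - 1 intervals over 180 degrees.
lat_values = np.linspace(-90.0, 90.0, nlat)

# Longitudes are periodic: 360 degrees is the same as 0 and is left out.
lon_values = np.arange(nlon) * (360.0 / nlon)

print(lat_values[0], lat_values[-1], lon_values[0], lon_values[-1])
print(np.diff(lat_values)[0], np.diff(lon_values)[0])
```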
In [14]:
ntimes = 5 # 5 Time slices to begin with
In [15]:
# create a 3D array of random numbers
data_arr = np.random.uniform(low=280, high=330, size=(ntimes, nlat, nlon))
# Write the data. This writes the whole 3D netCDF variable all at once.
temp[:] = data_arr # Appends data along unlimited dimension
Let's add another time slice....
In [16]:
# create a 2D array of random numbers
data_slice = np.random.uniform(low=270, high=290, size=(nlat, nlon))
temp[5, :, :] = data_slice # Appends the 6th time slice
print(" Wrote more data, temp.shape is now ", temp.shape)
Note that we have not yet written any data to the time variable. It automatically grew as we appended data along the time dimension to the variable temp, but the data are missing.
In [17]:
print(time)
times_arr = time[:]
print(type(times_arr), times_arr)
Dashes indicate masked values (where data have not yet been written).
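Masked values can be counted or replaced with the usual NumPy masked-array tools. The sketch below does not touch the netCDF file: it uses np.ma.masked_all as a stand-in for the array returned by reading an unwritten variable, and the name times_demo is hypothetical.

```python
import numpy as np

# Stand-in for a variable of length 6 in which nothing has been written yet.
times_demo = np.ma.masked_all(6, dtype=np.float64)
n_missing_before = np.ma.count_masked(times_demo)

# Assigning values unmasks the corresponding elements.
times_demo[:3] = [0.0, 24.0, 48.0]
n_missing_after = np.ma.count_masked(times_demo)

# Replace the remaining masked values with NaN to get a plain ndarray.
filled = times_demo.filled(np.nan)

print(n_missing_before, n_missing_after, filled)
```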
Now, to work with time objects we will need some extra imports:
In [18]:
import datetime as dt
from netCDF4 import date2num, num2date
In [19]:
# First 6 days of October 2016.
dates = [dt.datetime(2016, 10, 1, 0),
         dt.datetime(2016, 10, 2, 0),
         dt.datetime(2016, 10, 3, 0),
         dt.datetime(2016, 10, 4, 0),
         dt.datetime(2016, 10, 5, 0),
         dt.datetime(2016, 10, 6, 0)]
print('\n'.join([str(i) for i in dates]))
In [20]:
times = date2num(dates, time.units)
print(times, time.units) # numeric values
time[:] = times
# read time data back, convert to datetime instances, check values.
print(num2date(time[:], time.units))
In [21]:
# first print the Dataset object to see what we've got
print(ncfile)
# close the Dataset.
ncfile.close()
Check the result using the ncdump utility:
In [22]:
!ncdump -h ../data/new.nc
In [23]:
ncfile = nc.Dataset(path_to_file, 'a')
In [24]:
temp_ave = ncfile.createVariable('zonal_mean_temp',
np.float64, ('time', 'lat'))
temp_ave.units = 'K'
temp_ave.standard_name = 'zonally_averaged_air_temperature'
In [25]:
print(temp_ave)
Create an averaged array using the existing "air_temperature" field:
In [26]:
temp = ncfile.variables['temp'][:]
print(temp.shape)
ave_arr = np.mean(temp[:], axis=2)
print(ave_arr.shape)
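Averaging over axis=2 removes the longitude axis of a (time, lat, lon) array, which is what makes the result a zonal mean. A minimal NumPy-only sketch on a small array with known values (the names field and zonal_mean are hypothetical):

```python
import numpy as np

ntimes, nlat, nlon = 2, 3, 4

# Values 0..23 arranged as (time, lat, lon); the first row along lon is [0, 1, 2, 3].
field = np.arange(ntimes * nlat * nlon, dtype=np.float64).reshape(ntimes, nlat, nlon)

# Average along the last (longitude) axis.
zonal_mean = field.mean(axis=2)

print(field.shape, zonal_mean.shape)
print(zonal_mean[0, 0])  # mean of 0, 1, 2, 3
```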
Write the data
In [27]:
temp_ave[:] = ave_arr # again, note the square brackets!
In [28]:
ncfile.close()
In [29]:
import matplotlib.pyplot as plt
%matplotlib inline
Open the file for reading
In [30]:
ncfile = nc.Dataset(path_to_file, 'r')
First, try this handy method of extracting variables: get_variables_by_attributes. Note: it's available in netCDF4>1.2.0.
In [31]:
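try:
    # variables whose units attribute is 'K'
    print(ncfile.get_variables_by_attributes(units='K'))
    # one-dimensional variables
    print(ncfile.get_variables_by_attributes(ndim=1))
except AttributeError:
    # the method is missing in older netCDF4 versions
    pass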
In [32]:
t = ncfile.variables['zonal_mean_temp']
lats = ncfile.variables['lat']
times = ncfile.variables['time']
plot_dates = num2date(times[:], times.units)  # new name, so the datetime module imported as dt is not shadowed
In [33]:
fig, ax = plt.subplots(figsize=(10, 6))
p = ax.contourf(lats[:], plot_dates, t[:], cmap='inferno')
cb = fig.colorbar(p, ax=ax)
ax.tick_params(labelsize=20)
ax.set_xlabel(lats.long_name, fontsize=22)
ax.set_ylabel(times.long_name, fontsize=22)
ax.set_title('{} ({})'.format(t.standard_name.replace('_', ' '), t.units), fontsize=20)
print('Here is the plot')
This notebook is built upon the great materials of the Unidata Python Workshop.
In [34]:
HTML(html)