The simplest way I have found to open and explore NetCDF files in Python relies on a package called xray. For details on how multidimensional data are represented in xray, or for installation instructions, visit: http://xray.readthedocs.org/en/latest/ Here is a small graphical representation of how to think about this kind of data.
In [1]:
from IPython.display import Image
Image(url='http://xray.readthedocs.org/en/latest/_images/dataset-diagram.png', embed=True, width=950, height=300)
Out[1]:
The first step when you receive a NetCDF file is to open it up and see what it contains.
In [14]:
import os
import posixpath # POSIX flavor of os.path: always joins with forward slashes, whatever the operating system
import numpy as np
import pandas as pd
import xray
In [19]:
NETCDF_DIR = os.getcwd().replace('\\','/')+'/raw_netCDF_output/'
datafile = 'soil'
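The backslash replacement above exists so that every path in the notebook uses forward slashes, which is also what `posixpath.join` produces. As a rough sketch of the difference between the POSIX and Windows flavors of `os.path`, using a made-up working directory and file name (both are assumptions for illustration only):

```python
import ntpath      # Windows flavor of os.path
import posixpath   # POSIX flavor of os.path

# Hypothetical working directory, written the way Windows would report it
cwd = 'C:\\Users\\me\\project'

# Same normalization as in the notebook: forward slashes everywhere
netcdf_dir = cwd.replace('\\', '/') + '/raw_netCDF_output/'

# posixpath.join does not add a second slash after the trailing one
posix_style = posixpath.join(netcdf_dir, 'soil', 'example.nc')

# ntpath.join uses backslashes as separators instead
windows_style = ntpath.join(cwd, 'raw_netCDF_output', 'soil', 'example.nc')

print(posix_style)
print(windows_style)
```

Both modules are importable on any platform, so this comparison runs the same way everywhere.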
In [23]:
nc_file = sorted(os.listdir(NETCDF_DIR+datafile))[-1] # os.listdir returns names in arbitrary order, so sort before taking the last one
nc_path = posixpath.join(NETCDF_DIR, datafile, nc_file)
ds = xray.open_dataset(nc_path)
ds
To inspect the coordinates at a specific site (for example, 'Open_W') we just write:
In [80]:
ds.sel(site='Open_W').coords
Out[80]:
Now if we are only interested in soil moisture at the upper depth at a specific time (don't forget that the time is in UTC unless the timezone is explicit), we can pull out just that one data point:
In [61]:
print(ds.VW_05cm_Avg.sel(site='Open_W', time='2015-06-02T06:10:00'))
And if we are only interested in the actual value rather than all the attributes:
In [60]:
print(ds.VW_05cm_Avg.sel(site='Open_W', time='2015-06-02T06:10:00').values)
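To try label-based selection without the original files, here is a minimal sketch on a small synthetic dataset. It assumes the `xarray` package (the name under which xray was later released) is installed; the second site name and all of the values are invented for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr  # assumption: xarray, the later name of the xray package

# Synthetic stand-in for the soil dataset: 2 sites x 3 ten-minute time steps
times = pd.date_range('2015-06-02T06:00:00', periods=3, freq='10min')
sites = ['Open_W', 'Open_E']  # 'Open_E' is a made-up second site
values = np.array([[0.10, 0.11, 0.12],
                   [0.20, 0.21, 0.22]])  # invented soil moisture values

ds_demo = xr.Dataset(
    {'VW_05cm_Avg': (('site', 'time'), values)},
    coords={'site': sites, 'time': times},
)

# Label-based selection returns a DataArray that still carries its coordinates
point = ds_demo.VW_05cm_Avg.sel(site='Open_W', time='2015-06-02T06:10:00')
print(point)

# .values drops the labels and returns the underlying NumPy data
print(point.values)
```

Selecting by label rather than by integer position means the code keeps working if the order of sites or time steps changes.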
In [86]:
ds_dict = {}
nc_files = os.listdir(NETCDF_DIR+datafile)
for nc_file in nc_files:
    nc_path = posixpath.join(NETCDF_DIR, datafile, nc_file)
    ds = xray.open_dataset(nc_path)
    # the date is the part of the file name between 'Tower_' and the first '.'
    date = nc_file.split('Tower_')[1].split('.')[0]
    ds_dict[date] = ds
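The date key comes from two chained `split` calls on the file name. A small sketch of what each step yields, using hypothetical file names (the real naming scheme may differ):

```python
# Hypothetical file names; only the 'Tower_<date>.<ext>' pattern matters here
demo_files = ['soil_Tower_2015_06_01.nc', 'soil_Tower_2015_06_02.nc']

dates = []
for demo_file in demo_files:
    after_marker = demo_file.split('Tower_')[1]  # everything after 'Tower_'
    date = after_marker.split('.')[0]            # everything before the first '.'
    dates.append(date)

print(dates)
```

Note that a file name without `Tower_` in it would raise an `IndexError` at the first split, so this only suits directories where every file follows the pattern.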
In [88]:
ds_dict.keys()
Out[88]:
In [96]:
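# dicts do not guarantee order; sorting the keys assumes the date strings sort chronologically
xray.concat([ds_dict[date] for date in sorted(ds_dict)], dim='time')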
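The order in which datasets are passed to `concat` determines the order along the new time axis. Here is a minimal sketch with two synthetic one-day datasets, assuming the `xarray` package (the later name of xray) and date keys that sort chronologically as strings; `make_day` is a hypothetical helper written only for this illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr  # assumption: xarray, the later name of the xray package


def make_day(day):
    """Build a tiny synthetic one-site dataset with two time steps for one day."""
    times = pd.to_datetime([day + 'T00:00:00', day + 'T12:00:00'])
    return xr.Dataset(
        {'VW_05cm_Avg': (('site', 'time'), np.zeros((1, 2)))},
        coords={'site': ['Open_W'], 'time': times},
    )


# Keys deliberately inserted out of chronological order
demo_dict = {'2015_06_02': make_day('2015-06-02'),
             '2015_06_01': make_day('2015-06-01')}

# Sorting the keys first keeps the concatenated time axis increasing
ordered = [demo_dict[key] for key in sorted(demo_dict)]
combined = xr.concat(ordered, dim='time')

print(combined.time.values)
```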
Some analyses are no doubt easier to carry out in pandas, and luckily xray makes it very easy to move back and forth between the two packages. To convert an xray Dataset object to a pandas MultiIndex DataFrame object, just run:
In [65]:
df = ds.to_dataframe()
In [79]:
# print the unique labels held in each level of the MultiIndex
for i, level in enumerate(df.index.levels):
    print('df.index.levels[{i}]\n{index}\n'.format(i=i, index=level))
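Each level of the MultiIndex corresponds to one dimension of the original Dataset. As a sketch of how to work with such a frame, here is a small synthetic DataFrame built directly in pandas as a stand-in for the output of `ds.to_dataframe()`; the second site name and the values are invented.

```python
import pandas as pd

# Synthetic stand-in for ds.to_dataframe(): index levels are (site, time)
index = pd.MultiIndex.from_product(
    [['Open_E', 'Open_W'],  # 'Open_E' is a made-up second site
     pd.to_datetime(['2015-06-02T06:00:00', '2015-06-02T06:10:00'])],
    names=['site', 'time'],
)
df_demo = pd.DataFrame({'VW_05cm_Avg': [0.20, 0.21, 0.10, 0.11]}, index=index)

# Each level holds the unique labels of one former dimension
for name, level in zip(df_demo.index.names, df_demo.index.levels):
    print('{name}\n{level}\n'.format(name=name, level=level))

# Cross-section: all rows for one site, leaving time as the only index
open_w = df_demo.xs('Open_W', level='site')
print(open_w)
```

`xs` is one of several ways to slice a MultiIndex; `df_demo.loc['Open_W']` would give the same rows here because `site` is the outermost level.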