Examinging NetCDF files using xray

The simplest way that I have found for opening and exploring NetCDF files in python, depends on the python package called xray. Here is a little graphical representation of the way to think about this data. For clarification on how multidimensional data are represented in xray, or to figure out how to download the package, visit: http://xray.readthedocs.org/en/latest/


In [1]:
from IPython.display import Image
Image(url='http://xray.readthedocs.org/en/latest/_images/dataset-diagram.png', embed=True, width=950, height=300)


Out[1]:

Loading a NetCDF file into a dataset

The first step when you receive a NetCDF file is to open it up and see what it contains.


In [14]:
import os
import posixpath      # similar to os, but less dependant on operating system
import numpy as np
import pandas as pd
import xray

In [19]:
NETCDF_DIR = os.getcwd().replace('\\','/')+'/raw_netCDF_output/'
datafile = 'soil'

In [23]:
nc_file = os.listdir(NETCDF_DIR+datafile)[-1]
nc_path = posixpath.join(NETCDF_DIR, datafile, nc_file)
ds = xray.open_dataset(nc_path)
ds

Inspecting and selecting from dataset

To inspect the coordinates at a specific site (for example, 'Open_W') we just write:


In [80]:
ds.sel(site='Open_W').coords


Out[80]:
Coordinates:
    lon        float32 36.8628
    site       |S12 'Open_W'
    lat        float32 0.48497
  * time       (time) datetime64[ns] 2015-06-02 2015-06-02T00:10:00 2015-06-02T00:20:00 ...
    elevation  int32 1650

Now if we are only interested in soil moisture at the upper depth at a specific time (don't forget that the time is in UTC unless the timezone is explicit), we can pull out just that one data point:


In [61]:
print ds.VW_05cm_Avg.sel(site='Open_W', time='2015-06-02T06:10:00')


<xray.DataArray 'VW_05cm_Avg' ()>
array(0.18320590257644653, dtype=float32)
Coordinates:
    lat        float32 0.48497
    elevation  int32 1650
    lon        float32 36.8628
    site       |S12 'Open_W'
    time       datetime64[ns] 2015-06-02T06:10:00
Attributes:
    units: percent
    comment: Avg
    content_coverage_type: physicalMeasurement

And if we are only interested in the actual value rather than all the attributes:


In [60]:
print ds.VW_05cm_Avg.sel(site='Open_W', time='2015-06-02T06:10:00').values


0.183205902576

Concatenate files


In [86]:
ds_dict = {}
nc_files = os.listdir(NETCDF_DIR+datafile)
for nc_file in nc_files:
    nc_path = posixpath.join(NETCDF_DIR, datafile, nc_file)
    ds = xray.open_dataset(nc_path)
    date = nc_file.split('Tower_')[1].split('.')[0]
    ds_dict.update({date: ds})

In [88]:
ds_dict.keys()


Out[88]:
['2015_150',
 '2015_144',
 '2015_145',
 '2015_146',
 '2015_147',
 '2015_153',
 '2015_141',
 '2015_151',
 '2015_143',
 '2015_148',
 '2015_149',
 '2015_152']

In [96]:
xray.concat(ds_dict.values(), dim='time')


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-96-3b302a94bcf7> in <module>()
----> 1 xray.concat(ds_dict.values(), dim='time')

C:\Anaconda\lib\site-packages\xray-0.4.1_106_g52e1fdd-py2.7.egg\xray\core\alignment.pyc in concat(objs, dim, indexers, mode, concat_over, compat)
    276         raise ValueError('must supply at least one object to concatenate')
    277     cls = type(first_obj)
--> 278     return cls._concat(objs, dim, indexers, mode, concat_over, compat)
    279 
    280 

C:\Anaconda\lib\site-packages\xray-0.4.1_106_g52e1fdd-py2.7.egg\xray\core\dataset.pyc in _concat(cls, datasets, dim, indexers, mode, concat_over, compat)
   1698                     verb = 'equal' if compat == 'equals' else compat
   1699                     raise ValueError(
-> 1700                         'variable %r not %s across datasets' % (k, verb))
   1701 
   1702         # we've already verified everything is consistent; now, calculate

ValueError: variable u'site' not equal across datasets

Convert to pandas dataframe

Some analyses are no doubt easier to carry out in pandas, but luckily xray makes it very easy to move back and forth between the two packages. To converat an xray Dataset object to a pandas MultiIndex DataFrame object, just run:


In [65]:
df = ds.to_dataframe()

In [79]:
for i in range(len(df.index.levels)):
    print 'df.index.levels[{i}]\n{index}\n'.format(i=i, index=df.index.levels[i])


df.index.levels[0]
Index([u'Euphorbia_SW', u'Glade_SE', u'Open_W', u'Riverine_N'], dtype='object')

df.index.levels[1]
<class 'pandas.tseries.index.DatetimeIndex'>
[2015-06-02 00:00:00, ..., 2015-06-02 07:10:00]
Length: 44, Freq: None, Timezone: None