<img src="http://nci.org.au/wp-content/themes/nci/img/img-logo-large.png", width=400>


Programmatically accessing data through THREDDS and the VDI

In this notebook:

The following material uses CSIRO IMOS TERN-AusCover MODIS Data Collection. For more information on the collection and licensing, please click here.



Import python packages


In [1]:
from netCDF4 import Dataset
import matplotlib.pyplot as plt 
from siphon import catalog, ncss
import datetime
%matplotlib inline

Start by defining the parent catalog URL from NCI's THREDDS Data Server

Note: Switch the '.html' ending on the URL to '.xml'


In [2]:
url = 'http://dapds00.nci.org.au/thredds/catalog/u39/public/data/modis/fractionalcover-clw/v2.2/netcdf/catalog.xml'

Using Siphon

Siphon is a collection of Python utilities for downloading data from Unidata data technologies. More information on installing and using Unidata's Siphon can be found: https://github.com/Unidata/siphon

Once selecting a parent dataset directory, Siphon can be used to search and use the data access methods and services provided by THREDDS. For example, Siphon will return a list of data endpoints for the OPeNDAP data URL, NetCDF Subset Service (NCSS), Web Map Service (WMS), Web Coverage Service (WCS), and the HTTP link for direct download.

In this Notebook, we'll be demonstrating the Netcdf Subset Service (NCSS).


In [3]:
tds = catalog.TDSCatalog(url)
datasets = list(tds.datasets)
endpts = tds.datasets.values()
The possible data services end points through NCI's THREDDS includes: OPeNDAP, Netcdf Subset Service (NCSS), HTTP download, Web Map Service (WMS), Web Coverage Service (WCS), NetCDF Markup Language (NcML), and a few metadata services (ISO, UDDC).

In [6]:
for key, value in endpts[0].access_urls.items():
    print('{}, {}'.format(key, value))


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-abee30952ffc> in <module>()
----> 1 for key, value in endpts[0].access_urls.items():
      2     print('{}, {}'.format(key, value))

TypeError: 'odict_values' object does not support indexing

We can create a small function that uses Siphon's Netcdf Subset Service (NCSS) to extract a spatial request (defined by a lat/lon box)


In [5]:
def get_data(dataset, bbox):    
    nc = ncss.NCSS(dataset.access_urls['NetcdfSubset'])
    query = nc.query()
    query.lonlat_box(north=bbox[3],south=bbox[2],east=bbox[1],west=bbox[0])
    query.variables('bs')
    
    data = nc.get_data(query)
    
    lon = data['longitude'][:]
    lat = data['latitude'][:]
    bs = data['bs'][0,:,:]
    t = data['time'][:]
    
    time_base = datetime.date(year=1800, month=1, day=1)
    time = time_base + datetime.timedelta(t[0])
    
    return lon, lat, bs, time

Query a single file and view result


In [6]:
bbox = (135, 140, -31, -27)
lon, lat, bs, t = get_data(endpts[0], bbox)

In [7]:
plt.figure(figsize=(10,10))
plt.imshow(bs, extent=bbox, cmap='gist_earth', origin='upper')

plt.xlabel('longitude (degrees)', fontsize=14)
plt.ylabel('latitude (degrees)', fontsize=14)
print "Date: ", t


Date:  2012-04-30

Loop and query over the collection


In [8]:
bbox = (135, 140, -31, -27)
plt.figure(figsize=(10,10))

for endpt in endpts[:15]:
    try:
        lon, lat, bs, t = get_data(endpt, bbox)

        plt.imshow(bs, extent=bbox, cmap='gist_earth', origin='upper')
        plt.clim(vmin=-2, vmax=100)

        plt.tick_params(labelsize=14)
        plt.xlabel('longitude (degrees)', fontsize=14)
        plt.ylabel('latitude (degrees)', fontsize=14)

        plt.title("Date: "+str(t), fontsize=16, weight='bold')
        plt.savefig("./images/"+endpt.name+".png")
        plt.cla()
    except:
        pass

plt.close()

Can make an animation of the temporal evolution (this example is by converting the series of *.png files above into a GIF)

Can also use Siphon to extract a single point


In [9]:
def get_point(dataset, lat, lon):
    nc = ncss.NCSS(dataset.access_urls['NetcdfSubset'])
    query = nc.query()
    query.lonlat_point(lon, lat)
    query.variables('bs')
    
    data = nc.get_data(query)
    bs = data['bs'][0]
    date = data['date'][0]
    
    return bs, date

In [10]:
bs, date = get_point(endpts[4], -27.75, 137)
print bs, date


54.0 2004-01-01 00:00:00+00:00

Time series example


In [11]:
data = []
for endpt in endpts[::20]:
    bs, date = get_point(endpt, -27.75, 137)
    data.append([date, bs])

In [ ]:
import numpy as np

BS = np.array(data)[:,1]
Date = np.array(data)[:,0]

plt.figure(figsize=(12,6))
plt.plot(Date, BS, '-o', linewidth=2, markersize=8)

plt.tick_params(labelsize=14)
plt.xlabel('date', fontsize=14)
plt.ylabel('fractional cover of bare soil (%)', fontsize=14)
plt.title('Lat, Lon: -27.75, 137', fontsize=16)