<img src="http://nci.org.au/wp-content/themes/nci/img/img-logo-large.png", width=400>
Begin by going to NCI's Geonetwork page: http://geonetwork.nci.org.au/
This page contains the metadata records for NCI Data Collections as well as information on where to find the data.
In this example, we will search for Landsat data:
If we click on the first result, we see a brief overview of the metadata record. Note: For the full record, navigate to the upper-right corner of your browser to change to the "Full view" (eyeball icon).
One of the options under Download and links is the NCI THREDDS Data Server Catalog page:
By navigating to this link, the available (public) data subcollections and datasets will be visible:
In this example, let's navigate to the LANDSAT data: scenes and tiles/ tiles/ EPSG3577/ LS8_OLI_TIRS_NBAR/ dataset:
In [1]:
from netCDF4 import Dataset
import matplotlib.pyplot as plt
from siphon import catalog, ncss
%matplotlib inline
Siphon is a collection of Python utilities for downloading data from Unidata data technologies. More information on installing and using Unidata's Siphon can be found: https://github.com/Unidata/siphon
Once selecting a parent dataset directory, Siphon can be used to search and use the data access methods and services provided by THREDDS. For example, Siphon will return a list of data endpoints for the OPeNDAP data URL, NetCDF Subset Service (NCSS), Web Map Service (WMS), Web Coverage Service (WCS), and the HTTP link for direct download.
In [2]:
catalog_url = 'http://dapds00.nci.org.au/thredds/catalog/rs0/tiles/EPSG3577/LS8_OLI_TIRS_NBAR/catalog.xml'
In [3]:
tds = catalog.TDSCatalog(catalog_url)
datasets = list(tds.datasets)
endpts = tds.datasets.values()
In [4]:
datasets[:10]
Out[4]:
In [5]:
endpts[0].access_urls
Out[5]:
In [6]:
from shapely.geometry import Polygon
from shapely.wkt import loads
query = (136, 138, -29.3, -27.8)
query = Polygon([[136, -29.3], [136, -27.8], [138, -27.8], [138, -29.3]])
# What this query looks like in WKT
print query.wkt
In [7]:
%%time
matches = []
for dataset in endpts:
dap = dataset.access_urls['OPENDAP']
with Dataset(dap) as f:
bounds = loads(f.geospatial_bounds.encode())
if bounds.intersects(query) == True:
print dap
matches.append(dap)
In [8]:
%%time
plt.figure(figsize=(12,12))
for match in matches:
with Dataset(match) as f:
x = f.variables['x'][::50]
y = f.variables['y'][::50]
t = f.variables['time'][::5]
for i in range(0, len(t)):
b2 = f.variables['band_2'][i,::50,::50]
plt.pcolormesh(x, y, b2)
In [9]:
import os
This example with use os.listdir
as a simple means to search the files under one parent directory. For searching a directory tree, os.walk
is useful. For information on all the avaiable os
library functions, please see: https://docs.python.org/2/library/os.html.
In [10]:
top_directory = '/g/data2/rs0/tiles/EPSG3577/LS8_OLI_TIRS_NBAR/'
files = os.listdir(top_directory)
print files[:10]
In [11]:
%%time
matches = []
for file in files:
if file.endswith('.nc'):
file_path = os.path.join(top_directory, file)
with Dataset(file_path) as f:
bounds = loads(f.geospatial_bounds.encode())
if bounds.intersects(query) == True:
print file_path
matches.append(file_path)
In [12]:
%%time
plt.figure(figsize=(12,12))
for match in matches:
with Dataset(match) as f:
x = f.variables['x'][::10]
y = f.variables['y'][::10]
t = f.variables['time'][::10]
b2 = f.variables['band_2'][14,::10,::10]
plt.pcolormesh(x, y, b2)