In [1]:
"""
The original notebook is NGDC_CSW_QueryForIOOSRAs.ipynb

Created by Emilio Mayorga, 2/10/2014
"""

title = 'Catalog based search for the IOOS Regional Associations acronyms'
name = '2015-11-23-NGDC_CSW_QueryForIOOSRAs'

In [2]:
%matplotlib inline
import seaborn
seaborn.set(style='ticks')

import os
from datetime import datetime
from IPython.display import HTML

import warnings
warnings.simplefilter("ignore")

# Metadata and markdown generation.
hour = datetime.utcnow().strftime('%H:%M')
comments = "true"

date = '-'.join(name.split('-')[:3])
slug = '-'.join(name.split('-')[3:])

metadata = dict(title=title,
                date=date,
                hour=hour,
                comments=comments,
                slug=slug,
                name=name)

markdown = """Title: {title}
date:  {date} {hour}
comments: {comments}
slug: {slug}

{{% notebook {name}.ipynb cells[2:] %}}
""".format(**metadata)

content = os.path.abspath(os.path.join(os.getcwd(), os.pardir,
                                       os.pardir, '{}.md'.format(name)))

with open(content, 'w') as f:
    f.write(markdown)


html = """
<small>
<p> This post was written as an IPython notebook.  It is available for
<a href="http://ioos.github.com/system-test/downloads/
notebooks/%s.ipynb">download</a>.  You can also try an interactive version on
<a href="http://mybinder.org/repo/ioos/system-test/">binder</a>.</p>
<p></p>
</small>
""" % (name)



The goal of this post is to investigate whether it is possible to query the NGDC CSW catalog for records matching an IOOS RA acronym, such as SECOORA.

In the cell below we do the usual: instantiate a Catalogue Service for the Web (CSW) client using the NGDC catalog endpoint.


In [3]:
from owslib.csw import CatalogueServiceWeb

endpoint = 'http://www.ngdc.noaa.gov/geoportal/csw'
csw = CatalogueServiceWeb(endpoint, timeout=30)

We need a list of all the Regional Associations we know.


In [4]:
ioos_ras = ['AOOS',      # Alaska
            'CaRA',      # Caribbean
            'CeNCOOS',   # Central and Northern California
            'GCOOS',     # Gulf of Mexico
            'GLOS',      # Great Lakes
            'MARACOOS',  # Mid-Atlantic
            'NANOOS',    # Pacific Northwest 
            'NERACOOS',  # Northeast Atlantic 
            'PacIOOS',   # Pacific Islands 
            'SCCOOS',    # Southern California
            'SECOORA']   # Southeast Atlantic

To streamline the query we can create a function that instantiates the fes filter and returns the records.


In [5]:
from owslib.fes import PropertyIsEqualTo

def query_ra(csw, ra='SECOORA'):
    """Exact-match keyword query for `ra`; fills `csw.records`."""
    q = PropertyIsEqualTo(propertyname='apiso:Keywords', literal=ra)
    csw.getrecords2(constraints=[q], maxrecords=100, esn='full')
    return csw

Here is what we got:

In [6]:
for ra in ioos_ras:
    csw = query_ra(csw, ra)
    ret = csw.results['returned']
    word = 'record' if ret == 1 else 'records'
    print("{0:>8} has {1:>3} {2}".format(ra, ret, word))
    csw.records.clear()


    AOOS has   1 record
    CaRA has   0 records
 CeNCOOS has   7 records
   GCOOS has   5 records
    GLOS has  15 records
MARACOOS has 100 records
  NANOOS has   1 record
NERACOOS has 100 records
 PacIOOS has   0 records
  SCCOOS has  23 records
 SECOORA has  71 records

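I would not trust those numbers completely. Surely some of the RAs listed above have more than 0 or 1 records. Note also that MARACOOS and NERACOOS return exactly 100 records, which is the maxrecords cap set in query_ra, so their true totals are likely higher.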
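One way to get past the 100-record cap is to page through the results. The sketch below is not part of the original notebook. `fetch_all_records` is a hypothetical helper, and it assumes the `csw` object behaves like owslib's `CatalogueServiceWeb`, where `getrecords2` accepts a `startposition` argument and fills `csw.results` with the `returned` and `nextrecord` values. The `limit` is soft: the loop stops once at least that many records are collected.

```python
def fetch_all_records(csw, constraints, pagesize=100, limit=1000):
    """Collect records page by page until the catalog runs out.

    `csw` is assumed to behave like owslib's CatalogueServiceWeb:
    getrecords2() fills `csw.records` and `csw.results`.
    """
    records = {}
    startposition = 0
    while len(records) < limit:
        csw.getrecords2(constraints=constraints, startposition=startposition,
                        maxrecords=pagesize, esn='full')
        records.update(csw.records)
        nextrecord = csw.results.get('nextrecord')
        # A missing or zero nextrecord, or an empty page, means we are done.
        if not nextrecord or csw.results['returned'] == 0:
            break
        startposition = nextrecord
    return records
```

Calling it with the same `PropertyIsEqualTo` filter used in `query_ra` and comparing `len(...)` of the result against `csw.results['matches']` should show whether the catalog holds more than 100 records for a given RA.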

Note that we have more information in the csw.records. Let's inspect one of SECOORA's stations for example.


In [7]:
csw = query_ra(csw, 'SECOORA')
key = list(csw.records.keys())[0]  # list() keeps this working on Python 3 too.

print(key)


id_usf.tas.ngwlms

We can verify the station type, title, and last date of modification.


In [8]:
station = csw.records[key]

station.type, station.title, station.modified


Out[8]:
('downloadableData', 'usf.tas.ngwlms', '2015-11-25T01:32:42-07:00')

The subjects field contains the variables and some useful keywords.


In [9]:
station.subjects


Out[9]:
['air_pressure',
 'air_temperature',
 'water_surface_height_above_reference_datum',
 'wind_from_direction',
 'wind_speed_of_gust',
 'wind_speed',
 'SECOORA',
 'air_pressure',
 'air_temperature',
 'water_surface_height_above_reference_datum',
 'wind_from_direction',
 'wind_speed_of_gust',
 'wind_speed',
 'latitude',
 'longitude',
 'time',
 'climatologyMeteorologyAtmosphere']
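The list above repeats every variable name. A small sketch, not in the original notebook, of how the duplicates could be dropped while keeping the original order. `unique_subjects` is a hypothetical helper name, and the list is copied from the output above.

```python
def unique_subjects(subjects):
    """Drop repeated entries while preserving first-seen order."""
    seen = set()
    unique = []
    for subject in subjects:
        if subject not in seen:
            seen.add(subject)
            unique.append(subject)
    return unique


# Copied from the `station.subjects` output above.
subjects = ['air_pressure', 'air_temperature',
            'water_surface_height_above_reference_datum',
            'wind_from_direction', 'wind_speed_of_gust', 'wind_speed',
            'SECOORA',
            'air_pressure', 'air_temperature',
            'water_surface_height_above_reference_datum',
            'wind_from_direction', 'wind_speed_of_gust', 'wind_speed',
            'latitude', 'longitude', 'time',
            'climatologyMeteorologyAtmosphere']

print(len(subjects), len(unique_subjects(subjects)))  # 17 entries, 11 unique
```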

And we can access the full XML description for the station.


In [10]:
print(station.xml)


<csw:Record xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcmiBox="http://dublincore.org/documents/2000/07/11/dcmi-box/" xmlns:dct="http://purl.org/dc/terms/" xmlns:gml="http://www.opengis.net/gml" xmlns:ows="http://www.opengis.net/ows" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<dc:identifier scheme="urn:x-esri:specification:ServiceType:ArcIMS:Metadata:FileID">id_usf.tas.ngwlms</dc:identifier>
<dc:identifier scheme="urn:x-esri:specification:ServiceType:ArcIMS:Metadata:DocID">{9DDE8E32-EB36-4E72-B2CC-A47D51151271}</dc:identifier>
<dc:title>usf.tas.ngwlms</dc:title>
<dc:type scheme="urn:x-esri:specification:ServiceType:ArcIMS:Metadata:ContentType">downloadableData</dc:type>
<dc:type scheme="urn:x-esri:specification:ServiceType:ArcIMS:Metadata:ContentType">liveData</dc:type>
<dc:subject>air_pressure</dc:subject>
<dc:subject>air_temperature</dc:subject>
<dc:subject>water_surface_height_above_reference_datum</dc:subject>
<dc:subject>wind_from_direction</dc:subject>
<dc:subject>wind_speed_of_gust</dc:subject>
<dc:subject>wind_speed</dc:subject>
<dc:subject>SECOORA</dc:subject>
<dc:subject>air_pressure</dc:subject>
<dc:subject>air_temperature</dc:subject>
<dc:subject>water_surface_height_above_reference_datum</dc:subject>
<dc:subject>wind_from_direction</dc:subject>
<dc:subject>wind_speed_of_gust</dc:subject>
<dc:subject>wind_speed</dc:subject>
<dc:subject>latitude</dc:subject>
<dc:subject>longitude</dc:subject>
<dc:subject>time</dc:subject>
<dc:subject>climatologyMeteorologyAtmosphere</dc:subject>
<dct:modified>2015-11-25T01:32:42-07:00</dct:modified>
<dct:references scheme="urn:x-esri:specification:ServiceType:distribution:url">http://tds.secoora.org/thredds/dodsC/usf.tas.ngwlms.nc.html</dct:references>
<dct:references scheme="urn:x-esri:specification:ServiceType:distribution:url">http://www.ncdc.noaa.gov/oa/wct/wct-jnlp-beta.php?singlefile=http://tds.secoora.org/thredds/dodsC/usf.tas.ngwlms.nc</dct:references>
<dct:references scheme="urn:x-esri:specification:ServiceType:sos:url">http://tds.secoora.org/thredds/sos/usf.tas.ngwlms.nc?service=SOS&amp;version=1.0.0&amp;request=GetCapabilities</dct:references>
<dct:references scheme="urn:x-esri:specification:ServiceType:odp:url">http://tds.secoora.org/thredds/dodsC/usf.tas.ngwlms.nc</dct:references>
<dct:references scheme="urn:x-esri:specification:ServiceType:download:url">http://tds.secoora.org/thredds/dodsC/usf.tas.ngwlms.nc.html</dct:references>
<ows:WGS84BoundingBox>
<ows:LowerCorner>-82.75800323486328 28.1560001373291</ows:LowerCorner>
<ows:UpperCorner>-82.75800323486328 28.1560001373291</ows:UpperCorner>
</ows:WGS84BoundingBox>
<ows:BoundingBox>
<ows:LowerCorner>-82.75800323486328 28.1560001373291</ows:LowerCorner>
<ows:UpperCorner>-82.75800323486328 28.1560001373291</ows:UpperCorner>
</ows:BoundingBox>
<dc:source>{B3EA8869-B726-4E39-898A-299E53ABBC98}</dc:source>
</csw:Record>
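The dct:references elements are probably the most useful part of the record, since they carry the service endpoints. As a rough sketch (not from the original notebook), the standard library can pull them out of the XML. `service_urls` is a hypothetical helper, and `record_xml` is a trimmed copy of the record above that keeps only three of the references.

```python
import xml.etree.ElementTree as ET

DCT = '{http://purl.org/dc/terms/}'

# Trimmed copy of the record above, keeping only three of the references.
record_xml = """<csw:Record xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:dct="http://purl.org/dc/terms/">
<dct:references scheme="urn:x-esri:specification:ServiceType:sos:url">http://tds.secoora.org/thredds/sos/usf.tas.ngwlms.nc?service=SOS&amp;version=1.0.0&amp;request=GetCapabilities</dct:references>
<dct:references scheme="urn:x-esri:specification:ServiceType:odp:url">http://tds.secoora.org/thredds/dodsC/usf.tas.ngwlms.nc</dct:references>
<dct:references scheme="urn:x-esri:specification:ServiceType:download:url">http://tds.secoora.org/thredds/dodsC/usf.tas.ngwlms.nc.html</dct:references>
</csw:Record>"""


def service_urls(xml_text):
    """Return (service, url) pairs, e.g. ('sos', 'http://...')."""
    root = ET.fromstring(xml_text)
    pairs = []
    for ref in root.iter(DCT + 'references'):
        # The scheme looks like 'urn:x-esri:specification:ServiceType:sos:url',
        # so the second-to-last field is the service name.
        service = ref.get('scheme').split(':')[-2]
        pairs.append((service, ref.text))
    return pairs


for service, url in service_urls(record_xml):
    print('{0:>8}: {1}'.format(service, url))
```

With a live query, passing `station.xml` instead of `record_xml` should work the same way, assuming the record uses the same scheme layout.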

This query is very simple, but also very powerful. We can quickly assess the data available for a given Regional Association with just a few lines of code.

You can see the original notebook here.


In [11]:
HTML(html)


Out[11]:

This post was written as an IPython notebook. It is available for download. You can also try an interactive version on binder.