In [1]:
"""
The original notebook is NGDC_CSW_QueryForIOOSRAs_UUID.ipynb

Created by Emilio Mayorga, 2/10/2014
"""

title = 'Catalog based search for the IOOS Regional Associations using UUID'
name = '2015-12-07-NGDC_CSW_QueryForIOOSRAs_UUID'

In [2]:
%matplotlib inline
import seaborn
seaborn.set(style='ticks')

import os
from datetime import datetime
from IPython.core.display import HTML

import warnings
warnings.simplefilter("ignore")

# Metadata and markdown generation.
hour = datetime.utcnow().strftime('%H:%M')
comments = "true"

date = '-'.join(name.split('-')[:3])
slug = '-'.join(name.split('-')[3:])

metadata = dict(title=title,
                date=date,
                hour=hour,
                comments=comments,
                slug=slug,
                name=name)

markdown = """Title: {title}
date:  {date} {hour}
comments: {comments}
slug: {slug}

{{% notebook {name}.ipynb cells[2:] %}}
""".format(**metadata)

content = os.path.abspath(os.path.join(os.getcwd(), os.pardir,
                                       os.pardir, '{}.md'.format(name)))

with open(content, 'w') as f:
    f.write(markdown)


html = """
<small>
<p> This post was written as an IPython notebook.  It is available for
<a href="http://ioos.github.com/system-test/downloads/
notebooks/%s.ipynb">download</a>.  You can also try an interactive version on
<a href="http://mybinder.org/repo/ioos/system-test/">binder</a>.</p>
<p></p>
</small>
""" % (name)



In the previous example we investigated whether it was possible to query the NGDC CSW Catalog to extract records matching an IOOS RA acronym. However, we could not trust the results: some RAs returned just a few records, like AOOS, or no records at all, like PacIOOS.

We can make a more robust search using the UUID rather than the acronym. The advantage is that all records are associated with a UUID, which makes the search more reliable. The disadvantage is that we need to keep track of a long and unintelligible identifier.

As usual, let's start by instantiating the CSW catalog object.


In [3]:
from owslib.csw import CatalogueServiceWeb

endpoint = 'http://www.ngdc.noaa.gov/geoportal/csw'
csw = CatalogueServiceWeb(endpoint, timeout=30)

We will use the same list of Regional Associations as before, but now we will match each one with its corresponding UUID from the IOOS registry.


In [4]:
import pandas as pd

ioos_ras = ['AOOS',      # Alaska
            'CaRA',      # Caribbean
            'CeNCOOS',   # Central and Northern California
            'GCOOS',     # Gulf of Mexico
            'GLOS',      # Great Lakes
            'MARACOOS',  # Mid-Atlantic
            'NANOOS',    # Pacific Northwest 
            'NERACOOS',  # Northeast Atlantic 
            'PacIOOS',   # Pacific Islands 
            'SCCOOS',    # Southern California
            'SECOORA']   # Southeast Atlantic

url = 'https://raw.githubusercontent.com/ioos/registry/master/uuid.csv'

df = pd.read_csv(url, index_col=0, header=0, names=['UUID'])
df['UUID'] = df['UUID'].str.strip()
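
The cell above strips stray whitespace from the UUID column, and the query loop later strips the surrounding braces. As a minimal sketch of that clean-up using only the standard library, the snippet below parses a small made-up CSV and validates each value with the `uuid` module. The sample rows, the names `RA_ONE` and `RA_TWO`, and the helper functions are hypothetical; the real table is the registry CSV loaded above, and its exact layout is assumed here to be one name column followed by one UUID column.

```python
import csv
import io
import uuid

# Hypothetical sample data; the real values come from the IOOS registry CSV.
SAMPLE = """name,uuid
RA_ONE, {B3EA8869-B726-4E39-898A-299E53ABBC98}
RA_TWO,{01234567-89ab-cdef-0123-456789abcdef}
"""


def normalize_uuid(raw):
    """Strip whitespace and braces, then check the text parses as a UUID."""
    text = raw.strip().strip('{}')
    uuid.UUID(text)  # raises ValueError if the text is not a valid UUID
    return text


def load_uuid_table(text):
    """Return {name: cleaned UUID} from CSV text that has a header row."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return {row[0].strip(): normalize_uuid(row[1]) for row in reader if row}


table = load_uuid_table(SAMPLE)
for ra_name, ra_uuid in sorted(table.items()):
    print(ra_name, ra_uuid)
```

Validating early means a malformed entry fails loudly when the table is loaded, rather than silently producing an empty query result later.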

The function below is similar to the one we used before. Note that it uses the same comparison (PropertyIsEqualTo) but a different property name (sys.siteuuid rather than apiso:Keywords).

That is the key difference for the robustness of the search: keywords are not always defined and might return bogus matches, while a UUID always identifies exactly one RA.
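
To make the difference concrete, here is a rough sketch of the two filter fragments written by hand with the standard library. It follows the general shape of an OGC Filter `PropertyIsEqualTo` element; it is not a dump of what owslib sends, and owslib's serialization may differ in attributes or wrapping elements. The helper `equal_to` is hypothetical, and the keyword literal `AOOS` is assumed for illustration.

```python
import xml.etree.ElementTree as ET

# OGC Filter namespace, registered so the output uses the "ogc" prefix.
OGC = 'http://www.opengis.net/ogc'
ET.register_namespace('ogc', OGC)


def equal_to(propertyname, literal):
    """Build a hand-written PropertyIsEqualTo fragment (illustrative only)."""
    node = ET.Element('{%s}PropertyIsEqualTo' % OGC)
    ET.SubElement(node, '{%s}PropertyName' % OGC).text = propertyname
    ET.SubElement(node, '{%s}Literal' % OGC).text = literal
    return ET.tostring(node, encoding='unicode')


# Keyword-based filter, as in the previous example (literal assumed).
by_keyword = equal_to('apiso:Keywords', 'AOOS')

# UUID-based filter; note that the literal is wrapped in braces.
by_uuid = equal_to('sys.siteuuid', '{B3EA8869-B726-4E39-898A-299E53ABBC98}')

print(by_keyword)
print(by_uuid)
```

Only the property name and the literal change between the two filters; the comparison operator stays the same.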


In [5]:
from owslib.fes import PropertyIsEqualTo

def query_ra(csw, uuid='B3EA8869-B726-4E39-898A-299E53ABBC98'):
    q = PropertyIsEqualTo(propertyname='sys.siteuuid', literal='{%s}' % uuid)
    csw.getrecords2(constraints=[q], maxrecords=2000, esn='full')
    return csw

Here is what we got:

In [6]:
for ra in ioos_ras:
    try:
        uuid = df.loc[ra, 'UUID'].strip('{}')
        csw = query_ra(csw, uuid)
        ret = csw.results['returned']
        word = 'record' if ret == 1 else 'records'
        print("{0:>8} has {1:>4} {2}".format(ra, ret, word))
        csw.records.clear()
    except KeyError:
        pass


    AOOS has   74 records
   GCOOS has    8 records
    GLOS has   20 records
MARACOOS has  468 records
  NANOOS has    8 records
NERACOOS has 1109 records
 PacIOOS has  192 records
  SCCOOS has   23 records
 SECOORA has  100 records

Compare the results above with cell [6] from the previous example. Note that we now get 192 records for PacIOOS and 74 for AOOS!
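
CaRA and CeNCOOS do not appear in the output at all. Because the loop swallows KeyError, an RA whose name is not in the UUID table is skipped without any message. A small sketch of how the skipped names could be reported instead is shown below; the function `split_by_registry` and the toy `registry` dictionary are hypothetical stand-ins for the lookup against the UUID table.

```python
def split_by_registry(names, registry):
    """Split names into those with a registry entry and those without."""
    found = [n for n in names if n in registry]
    missing = [n for n in names if n not in registry]
    return found, missing


# Hypothetical registry with placeholder values instead of real UUIDs.
registry = {'AOOS': 'uuid-a', 'GLOS': 'uuid-b'}

found, missing = split_by_registry(['AOOS', 'CaRA', 'GLOS'], registry)
print('will be queried:', found)
print('no UUID in table:', missing)
```

With the DataFrame used in this notebook, the test `ra in df.index` would play the role of `n in registry`.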

You can see the original notebook here.


In [7]:
HTML(html)


Out[7]:

This post was written as an IPython notebook. It is available for download. You can also try an interactive version on binder.