In [1]:
"""
The original notebook is NGDC_CSW_QueryForIOOSRAs_UUID.ipynb
Created by Emilio Mayorga, 2/10/2014
"""
title = 'Catalog based search for the IOOS Regional Associations using UUID'
name = '2015-12-07-NGDC_CSW_QueryForIOOSRAs_UUID'
In [2]:
%matplotlib inline
import seaborn
seaborn.set(style='ticks')
import os
from datetime import datetime
from IPython.core.display import HTML
import warnings
warnings.simplefilter("ignore")
# Metadata and markdown generation.
hour = datetime.utcnow().strftime('%H:%M')
comments = "true"
date = '-'.join(name.split('-')[:3])
slug = '-'.join(name.split('-')[3:])
metadata = dict(title=title,
                date=date,
                hour=hour,
                comments=comments,
                slug=slug,
                name=name)
markdown = """Title: {title}
date: {date} {hour}
comments: {comments}
slug: {slug}
{{% notebook {name}.ipynb cells[2:] %}}
""".format(**metadata)
content = os.path.abspath(os.path.join(os.getcwd(), os.pardir,
                                       os.pardir, '{}.md'.format(name)))
with open(content, 'w') as f:
    f.write(markdown)
html = """
<small>
<p> This post was written as an IPython notebook. It is available for
<a href="http://ioos.github.com/system-test/downloads/notebooks/%s.ipynb">download</a>. You can also try an interactive version on
<a href="http://mybinder.org/repo/ioos/system-test/">binder</a>.</p>
<p></p>
</small>
""" % (name)
In the previous example we investigated whether it was possible to query the NGDC CSW catalog for records matching an IOOS RA acronym. However, the results could not be trusted: some RAs returned only a few records or none at all, like AOOS and PacIOOS, respectively.
We can make the search more robust by using the UUID rather than the acronym. The advantage is that every record is associated with a UUID, so the match no longer depends on how the record was tagged. The disadvantage is that we need to keep track of a long, unintelligible identifier.
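Because these identifiers are long and easy to mangle (stray whitespace, braces, mixed case), it can help to validate and normalize them before querying. The following is a minimal sketch using the standard-library uuid module; normalize_uuid is a hypothetical helper, and the upper-case, brace-wrapped output format is an assumption based on the literal that query_ra builds later in this notebook.

```python
import uuid


def normalize_uuid(raw):
    """Return raw as an upper-case, brace-wrapped UUID string (hypothetical helper).

    Raises ValueError if raw is not a valid UUID.
    """
    # uuid.UUID() tolerates braces and hyphens but not surrounding whitespace.
    parsed = uuid.UUID(raw.strip())
    # Assumption: the catalog stores sys.siteuuid upper-cased inside braces,
    # like the default value used in query_ra.
    return '{%s}' % str(parsed).upper()


print(normalize_uuid(' {b3ea8869-b726-4e39-898a-299e53abbc98} '))
# prints {B3EA8869-B726-4E39-898A-299E53ABBC98}
```

A malformed value such as an acronym passed by mistake raises ValueError instead of silently producing a query that matches nothing.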
As usual, let's start by instantiating the CSW catalog object.
In [3]:
from owslib.csw import CatalogueServiceWeb
endpoint = 'http://www.ngdc.noaa.gov/geoportal/csw'
csw = CatalogueServiceWeb(endpoint, timeout=30)
We will use the same list of all the Regional Associations as before, but now we will match them with the corresponding UUID from the IOOS registry.
In [4]:
import pandas as pd
ioos_ras = ['AOOS', # Alaska
'CaRA', # Caribbean
'CeNCOOS', # Central and Northern California
'GCOOS', # Gulf of Mexico
'GLOS', # Great Lakes
'MARACOOS', # Mid-Atlantic
'NANOOS', # Pacific Northwest
'NERACOOS', # Northeast Atlantic
'PacIOOS', # Pacific Islands
'SCCOOS', # Southern California
'SECOORA'] # Southeast Atlantic
url = 'https://raw.githubusercontent.com/ioos/registry/master/uuid.csv'
df = pd.read_csv(url, index_col=0, header=0, names=['UUID'])
df['UUID'] = df['UUID'].str.strip()
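The query loop below silently skips any RA that is missing from the registry. As an alternative, the lookup can report missing acronyms explicitly. This is a minimal offline sketch: the CSV text is a hypothetical stand-in with placeholder UUIDs (not the real registry values), laid out as the read_csv call above implies (acronym in the first column, UUID in the second), and lookup_uuids is a hypothetical helper.

```python
import io

import pandas as pd

# Hypothetical stand-in for the registry CSV; the placeholder UUIDs are NOT real.
# Leading spaces and braces mimic what the notebook strips off.
csv_text = """ra,uuid
AOOS, {00000000-0000-0000-0000-000000000001}
PacIOOS, {00000000-0000-0000-0000-000000000002}
"""

table = pd.read_csv(io.StringIO(csv_text), index_col='ra')
# Remove surrounding whitespace first, then the curly braces.
table['uuid'] = table['uuid'].str.strip().str.strip('{}')


def lookup_uuids(ras, table):
    """Split ras into ({acronym: uuid}, [acronyms missing from table])."""
    found, missing = {}, []
    for ra in ras:
        if ra in table.index:
            found[ra] = table.loc[ra, 'uuid']
        else:
            missing.append(ra)
    return found, missing


found, missing = lookup_uuids(['AOOS', 'CaRA', 'PacIOOS'], table)
print(missing)  # prints ['CaRA']
```

Returning the missing acronyms makes a gap in the registry visible instead of leaving a silent hole in the printed summary.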
The function below is similar to the one we used before.
Note the same matching (PropertyIsEqualTo), but a different property name (sys.siteuuid rather than apiso:Keywords).
That is the key to the robustness of the search: keywords are not always defined and may produce spurious matches, whereas a UUID always identifies exactly one RA.
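To see what such a constraint amounts to on the wire, the sketch below builds an equivalent OGC Filter Encoding 1.1 fragment by hand with the standard library. It approximates, and is not copied from, what owslib serializes for a PropertyIsEqualTo constraint; equal_to_filter is a hypothetical helper, and owslib additionally wraps the filter in the CSW request envelope.

```python
import xml.etree.ElementTree as ET

OGC_NS = 'http://www.opengis.net/ogc'  # OGC Filter Encoding 1.1 namespace
ET.register_namespace('ogc', OGC_NS)


def equal_to_filter(property_name, literal):
    """Build an ogc:Filter holding a single PropertyIsEqualTo comparison."""
    flt = ET.Element('{%s}Filter' % OGC_NS)
    comparison = ET.SubElement(flt, '{%s}PropertyIsEqualTo' % OGC_NS)
    ET.SubElement(comparison, '{%s}PropertyName' % OGC_NS).text = property_name
    ET.SubElement(comparison, '{%s}Literal' % OGC_NS).text = literal
    return ET.tostring(flt, encoding='unicode')


print(equal_to_filter('sys.siteuuid', '{B3EA8869-B726-4E39-898A-299E53ABBC98}'))
```

Swapping 'sys.siteuuid' for 'apiso:Keywords' yields the keyword-based filter from the previous example; only the property name changes, which is exactly the difference described above.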
In [5]:
from owslib.fes import PropertyIsEqualTo
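def query_ra(csw, uuid='B3EA8869-B726-4E39-898A-299E53ABBC98'):
    """Query the catalog for records whose sys.siteuuid equals `uuid`."""
    # The literal is wrapped in curly braces to match the UUID format in the catalog.
    q = PropertyIsEqualTo(propertyname='sys.siteuuid', literal='{%s}' % uuid)
    csw.getrecords2(constraints=[q], maxrecords=2000, esn='full')
    return csw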
In [6]:
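for ra in ioos_ras:
    try:
        # .ix was removed from pandas; .loc does the label-based lookup.
        uuid = df.loc[ra, 'UUID'].strip('{}')
        csw = query_ra(csw, uuid)
        ret = csw.results['returned']
        # Singular only for exactly one record ("0 records", "1 record").
        word = 'record' if ret == 1 else 'records'
        print("{0:>8} has {1:>4} {2}".format(ra, ret, word))
        csw.records.clear()
    except KeyError:
        # Skip RAs that have no entry in the registry CSV.
        pass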
Compare the results above with cell [6] from before. Note that we now get 192 records for PacIOOS and 74 for AOOS!
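For a side-by-side comparison with the acronym-based run, the counts can be collected into a pandas Series instead of only being printed. A minimal sketch that uses only the two counts quoted above; format_count is a hypothetical helper that reproduces the loop's line format with the corrected pluralization.

```python
import pandas as pd


def format_count(ra, n):
    """Format one summary line like the loop above (hypothetical helper)."""
    n = int(n)  # accept numpy integers coming out of a Series
    word = 'record' if n == 1 else 'records'
    return '{0:>8} has {1:>4} {2}'.format(ra, n, word)


# Only the two counts quoted in the text; other RAs would be added the same way.
counts = pd.Series({'PacIOOS': 192, 'AOOS': 74}, name='records')
for ra, n in counts.sort_values(ascending=False).items():
    print(format_count(ra, n))
```

Keeping the numbers in a Series makes it easy to join them later against the counts from the keyword search, for example with pd.concat on the RA index.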
You can see the original notebook here.
In [7]:
HTML(html)
Out[7]: