Demonstrate some CSW query capabilities

We will use the OWSLib library to construct queries and parse responses from a CSW (Catalogue Service for the Web) endpoint.



In [1]:

    
from owslib.csw import CatalogueServiceWeb
from owslib import fes
import numpy as np

Specify a CSW endpoint. You can test whether it is working with a GetCapabilities request:

<endpoint>?request=GetCapabilities&service=CSW

for example:

http://catalog.data.gov/csw-all?service=CSW&version=2.0.2&request=GetCapabilities
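A minimal sketch of how such a test URL could be assembled from an endpoint, using only the standard library. The helper name `capabilities_url` is made up for this illustration and is not part of owslib:

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

def capabilities_url(endpoint, version='2.0.2'):
    """Build a GetCapabilities URL for a CSW endpoint (hypothetical helper)."""
    # A list of pairs keeps the query parameters in a predictable order.
    params = [('service', 'CSW'), ('version', version), ('request', 'GetCapabilities')]
    return endpoint + '?' + urlencode(params)

print(capabilities_url('http://catalog.data.gov/csw-all'))
```

Pasting the printed URL into a browser should return the capabilities XML if the service is up.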



In [2]:

    
#endpoint = 'http://catalog.data.gov/csw-all'  #granule level production catalog
#endpoint = 'https://data.ioos.us/csw'
endpoint = 'https://dev-catalog.ioos.us/csw'
#endpoint = 'http://geoport.whoi.edu/csw'
#endpoint = 'http://www.ngdc.noaa.gov/geoportal/csw'
csw = CatalogueServiceWeb(endpoint,timeout=60)
print csw.version



In [3]:

    
val = 'sea_water_salinity'
#val = 'NODC'
filter1 = fes.PropertyIsLike(propertyname='apiso:AnyText',literal=('*%s*' % val),
                        escapeChar='\\',wildCard='*',singleChar='?')
filter_list = [ filter1 ]
csw.getrecords2(constraints=filter_list,maxrecords=100,esn='full')
print len(csw.records.keys())
for rec in list(csw.records.keys()):
    print csw.records[rec].title









    



10
PacIOOS Water Quality Buoy 03: Kiholo Bay, Big Island, Hawaii
UCSC294-20150430T2218
ud_134-20150122T1955
PacIOOS Nearshore Sensor 09: Cetti Bay, Guam
PacIOOS Nearshore Sensor 15: Pago Bay, Guam
PacIOOS Nearshore Sensor 10: Maunalua Bay, Oahu, Hawaii
HRECOS Aggregated Station HRPVSC Data
None
None
Alaska Regional Data Portal For US Integrated Ocean Observing System

Hmmm... The query above returned only 10 records, even though we specified maxrecords=100.

What's up with that?

It turns out the CSW service specifies a MaxRecordDefault that cannot be exceeded. For example, checking https://dev-catalog.ioos.us/csw?request=GetCapabilities&service=CSW we find:

<ows:Constraint name="MaxRecordDefault">
    <ows:Value>10</ows:Value>
</ows:Constraint>
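If you would rather not read the capabilities XML by eye, here is a rough sketch of pulling that value out with the standard library. The `sample` string is a trimmed stand-in for a real response, the wrapping `ows:OperationsMetadata` element is an assumption about where the constraint sits, and `max_record_default` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

# Namespace bound to the ows: prefix in the records returned by this service.
OWS = 'http://www.opengis.net/ows'

# Trimmed stand-in for a capabilities response; a real one is much larger.
sample = (
    '<ows:OperationsMetadata xmlns:ows="%s">'
    '<ows:Constraint name="MaxRecordDefault">'
    '<ows:Value>10</ows:Value>'
    '</ows:Constraint>'
    '</ows:OperationsMetadata>' % OWS
)

def max_record_default(xml_text):
    """Return MaxRecordDefault as an int, or None if it is not advertised."""
    root = ET.fromstring(xml_text)
    for constraint in root.iter('{%s}Constraint' % OWS):
        if constraint.get('name') == 'MaxRecordDefault':
            value = constraint.find('{%s}Value' % OWS)
            if value is not None and value.text:
                return int(value.text)
    return None

print(max_record_default(sample))
```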

So we need to repeat the getrecords2 request in a loop, incrementing startposition by the page size each time:



In [4]:

    
from owslib.fes import SortBy, SortProperty
pagesize = 10
sort_property = 'dc:title'  # a supported queryable of the CSW
sort_order = 'ASC'  # should be 'ASC' or 'DESC' (ascending or descending)
maxrecords = 50
sortby = SortBy([SortProperty(sort_property, sort_order)])



In [5]:

    
startposition = 0
while True:
    print 'getting records %d to %d' % (startposition, startposition+pagesize)
    csw.getrecords2(constraints=filter_list,
                    startposition=startposition, maxrecords=pagesize, sortby=sortby)
    for rec,item in csw.records.iteritems():
        print(item.title)
    print
    if csw.results['nextrecord'] == 0:
        break
    startposition += pagesize
    if startposition >= maxrecords:
        break









    



getting records 0 to 10
Alaska Regional Data Portal For US Integrated Ocean Observing System
AOOS/Models/High-resolution Ice/Ocean Modeling and Assimilation System (HIOMAS)
Arctic Seas Regional Climatology : sea_water_temperature January 0.25 degree
bass-20150706T151619Z
bass-20150706T151619Z
bass-20150827T1909
bass-20150827T1909
Bering Sea
blue-20150627T1242
blue-20150627T1242

getting records 10 to 20
blue-20150627T1242
blue-20160518T1525
blue-20160518T1525
blue-20160818T1448
CariCOOS Realtime Buoy Observations
CariCOOS Realtime Buoy Observations
CeNCOOS/Models/ROMS/California ROMS/California Coastal Regional Ocean Modeling System (ROMS) Forecast
CeNCOOS/Models/ROMS/California ROMS/California Coastal Regional Ocean Modeling System (ROMS) Nowcast
CeNCOOS/Models/ROMS/Monterey Bay ROMS (Oct 2010 to Jan 2013)/Monterey Bay (MB) Regional Ocean Modeling System (ROMS) Forecast
Central California Regional Data Portal For US Integrated Ocean Observing System

getting records 20 to 30
clark-20130821T2130
clark-20130821T2130
clark-20150709T1803
clark-20150709T1803
clark-20160624T1800
EASTCOAST-3D-NAM/Forecast Model Run Collection (2D time coordinates)
Fort_Point/sea_water_practical_salinity.nc
GCOOS Data Portal
Gichigami-20110629T1821
Gichigami-20110629T1821

getting records 30 to 40
gichigami-20110630T2049
gichigami-20110630T2049
gichigami-20110701T0123
gichigami-20110701T0123
gichigami-20110701T1556
gichigami-20110701T1556
gichigami-20110702T1048
gichigami-20110702T1048
gichigami-20110703T0859
gichigami-20110703T0859

getting records 40 to 50
gichigami-20110704T0352
gichigami-20110704T0352
gichigami-20110705T1936
gichigami-20110705T1936
gichigami-20110706T1754
gichigami-20110706T1754
gichigami-20110707T0056
gichigami-20110707T0056
Gichigami-20111129T2109
Gichigami-20111129T2109
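The same paging loop is repeated for every query in this notebook, so it could be wrapped in a small function. This is only a sketch: `fetch_titles` is a hypothetical helper, not part of owslib, and it relies on the `getrecords2`, `records` and `results['nextrecord']` behaviour used in the loop above. It collects titles into a list instead of printing them:

```python
def fetch_titles(csw, constraints, pagesize=10, maxrecords=50, sortby=None):
    """Collect record titles page by page (hypothetical helper)."""
    titles = []
    startposition = 0
    # Stop once we have asked for maxrecords records in total.
    while startposition < maxrecords:
        csw.getrecords2(constraints=constraints, startposition=startposition,
                        maxrecords=pagesize, sortby=sortby)
        titles.extend(rec.title for rec in csw.records.values())
        # As in the loop above, nextrecord == 0 is taken to mean "no more pages".
        if csw.results['nextrecord'] == 0:
            break
        startposition += pagesize
    return titles

# Usage with the objects defined in this notebook (needs a live CSW connection):
# titles = fetch_titles(csw, filter_list, pagesize=pagesize,
#                       maxrecords=maxrecords, sortby=sortby)
```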

Okay, now let's define a second query filter and combine it with the first one using a logical AND:



In [6]:

    
val = 'CariCOOS'
#val = '0115145'
filter2 = fes.PropertyIsLike(propertyname='apiso:AnyText',literal=('*%s*' % val),
                        escapeChar='\\',wildCard='*',singleChar='?')
filter_list = [fes.And([filter1, filter2])]



In [7]:

    
startposition = 0
maxrecords = 50
while True:
    print 'getting records %d to %d' % (startposition, startposition+pagesize)
    csw.getrecords2(constraints=filter_list,
                    startposition=startposition, maxrecords=pagesize, sortby=sortby)
    for rec,item in csw.records.iteritems():
        print(item.title)
    print
    if csw.results['nextrecord'] == 0:
        break
    startposition += pagesize
    if startposition >= maxrecords:
        break









    



getting records 0 to 10
CariCOOS Realtime Buoy Observations
CariCOOS Realtime Buoy Observations
None



In [8]:

    
choice=np.random.choice(list(csw.records.keys()))
print(csw.records[choice].title)
csw.records[choice].references









    



CariCOOS Realtime Buoy Observations






    Out[8]:





[{'scheme': 'WWW:LINK',
  'url': 'http://dm2.caricoos.org/thredds/dodsC/UMO/DSG/SOS/VIA/WQM/HistoricRealtime/Agg.ncml.html'},
 {'scheme': 'WWW:LINK',
  'url': 'http://www.ncdc.noaa.gov/oa/wct/wct-jnlp-beta.php?singlefile=http://dm2.caricoos.org/thredds/dodsC/UMO/DSG/SOS/VIA/WQM/HistoricRealtime/Agg.ncml'},
 {'scheme': 'OPeNDAP:OPeNDAP',
  'url': 'http://dm2.caricoos.org/thredds/dodsC/UMO/DSG/SOS/VIA/WQM/HistoricRealtime/Agg.ncml'},
 {'scheme': 'OGC:SOS',
  'url': 'http://dm2.caricoos.org/thredds/sos/UMO/DSG/SOS/VIA/WQM/HistoricRealtime/Agg.ncml?service=SOS&version=1.0.0&request=GetCapabilities'}]
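Since `references` is just a list of dictionaries, picking out the endpoint for one service is a simple list comprehension. A small sketch using three of the entries shown above; `urls_for_scheme` is a hypothetical helper name:

```python
# Three of the reference entries printed above.
references = [
    {'scheme': 'WWW:LINK',
     'url': 'http://dm2.caricoos.org/thredds/dodsC/UMO/DSG/SOS/VIA/WQM/HistoricRealtime/Agg.ncml.html'},
    {'scheme': 'OPeNDAP:OPeNDAP',
     'url': 'http://dm2.caricoos.org/thredds/dodsC/UMO/DSG/SOS/VIA/WQM/HistoricRealtime/Agg.ncml'},
    {'scheme': 'OGC:SOS',
     'url': 'http://dm2.caricoos.org/thredds/sos/UMO/DSG/SOS/VIA/WQM/HistoricRealtime/Agg.ncml?service=SOS&version=1.0.0&request=GetCapabilities'},
]

def urls_for_scheme(references, scheme):
    """Return the URLs whose 'scheme' matches exactly (hypothetical helper)."""
    return [ref['url'] for ref in references if ref['scheme'] == scheme]

print(urls_for_scheme(references, 'OPeNDAP:OPeNDAP'))
```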

Let's see what the full XML record looks like:



In [9]:

    
csw.records[choice].xml









    Out[9]:





'<csw:SummaryRecord xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:inspire_common="http://inspire.ec.europa.eu/schemas/common/1.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:dct="http://purl.org/dc/terms/" xmlns:ows="http://www.opengis.net/ows" xmlns:apiso="http://www.opengis.net/cat/csw/apiso/1.0" xmlns:gml="http://www.opengis.net/gml" xmlns:dif="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:gco="http://www.isotc211.org/2005/gco" xmlns:gmd="http://www.isotc211.org/2005/gmd" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:srv="http://www.isotc211.org/2005/srv" xmlns:ogc="http://www.opengis.net/ogc" xmlns:fgdc="http://www.opengis.net/cat/csw/csdgm" xmlns:inspire_ds="http://inspire.ec.europa.eu/schemas/inspire_ds/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:os="http://a9.com/-/spec/opensearch/1.1/" xmlns:soapenv="http://www.w3.org/2003/05/soap-envelope" xmlns:sitemap="http://www.sitemaps.org/schemas/sitemap/0.9"><dc:identifier>edu.maine:VIA</dc:identifier><dc:title>CariCOOS Realtime Buoy Observations</dc:title><dc:type>dataset</dc:type><dc:subject>Oceans &gt; Ocean Temperature &gt; Water Temperature</dc:subject><dc:subject>Oceans &gt; Ocean Pressure &gt; Sea Level Pressure</dc:subject><dc:subject>Oceans &gt; Ocean Chemistry &gt; Oxygen</dc:subject><dc:subject>Oceans &gt; Ocean Chemistry &gt; Chlorophyll</dc:subject><dc:subject>Oceans &gt; Ocean Optics &gt; Turbidity</dc:subject><dc:subject>Oceans &gt; Salinity/Density &gt; Conductivity</dc:subject><dc:subject>Oceans &gt; Salinity/Density &gt; Salinity</dc:subject><dc:subject>Oceans &gt; Salinity/Density &gt; Density</dc:subject><dc:subject>Oceans &gt; Ocean Waves &gt; Significant Wave Height</dc:subject><dc:subject>Oceans &gt; Ocean Waves &gt; Wave Period</dc:subject><dc:subject>Oceans &gt; Ocean Winds &gt; Surface 
Winds</dc:subject><dc:subject>Oceans &gt; Ocean Circulation &gt; Ocean Currents</dc:subject><dc:subject>The Project Name</dc:subject><dc:subject>CariCOOS</dc:subject><dc:subject>station_name</dc:subject><dc:subject>sea_water_electrical_conductivity</dc:subject><dc:subject>conductivity data_quality</dc:subject><dc:subject>sea_water_temperature data_quality</dc:subject><dc:subject>sea_water_salinity data_quality</dc:subject><dc:subject>sea_water_density</dc:subject><dc:subject>sea_water_density data_quality</dc:subject><dc:subject>dissolved_oxygen data_quality</dc:subject><dc:subject>chlorophyll_concentration_in_sea_water</dc:subject><dc:subject>chlorophyll data_quality</dc:subject><dc:subject>turbidity_of_sea_water</dc:subject><dc:subject>turbidity data_quality</dc:subject><dc:subject>offset_time</dc:subject><dc:subject>time_created</dc:subject><dc:subject>time_modified</dc:subject><dc:subject>longitude</dc:subject><dc:subject>latitude</dc:subject><dc:subject>depth</dc:subject><dc:subject>time</dc:subject><dct:references scheme="WWW:LINK">http://dm2.caricoos.org/thredds/dodsC/UMO/DSG/SOS/VIA/WQM/HistoricRealtime/Agg.ncml.html</dct:references><dct:references scheme="WWW:LINK">http://www.ncdc.noaa.gov/oa/wct/wct-jnlp-beta.php?singlefile=http://dm2.caricoos.org/thredds/dodsC/UMO/DSG/SOS/VIA/WQM/HistoricRealtime/Agg.ncml</dct:references><dct:references scheme="OPeNDAP:OPeNDAP">http://dm2.caricoos.org/thredds/dodsC/UMO/DSG/SOS/VIA/WQM/HistoricRealtime/Agg.ncml</dct:references><dct:references scheme="OGC:SOS">http://dm2.caricoos.org/thredds/sos/UMO/DSG/SOS/VIA/WQM/HistoricRealtime/Agg.ncml?service=SOS&amp;version=1.0.0&amp;request=GetCapabilities</dct:references><dc:relation/><dct:modified>2015-05-12</dct:modified><dct:abstract>Ocean observation data from (CariCOOS).</dct:abstract><ows:BoundingBox crs="urn:x-ogc:def:crs:EPSG:6.11:4326" dimensions="2"><ows:LowerCorner>18.26 -65.0</ows:LowerCorner><ows:UpperCorner>18.26 
-65.0</ows:UpperCorner></ows:BoundingBox></csw:SummaryRecord>'

Yuk! That's why we use OWSLib! :-)
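For comparison, here is roughly what digging the title and references out of that XML by hand would involve. This is a sketch on a heavily trimmed copy of the record (only three elements kept), using the standard library rather than anything from owslib:

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace map, copied from the xmlns declarations in the record.
NS = {
    'csw': 'http://www.opengis.net/cat/csw/2.0.2',
    'dc': 'http://purl.org/dc/elements/1.1/',
    'dct': 'http://purl.org/dc/terms/',
}

# Heavily trimmed copy of the record above.
record_xml = (
    '<csw:SummaryRecord xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/" '
    'xmlns:dct="http://purl.org/dc/terms/">'
    '<dc:identifier>edu.maine:VIA</dc:identifier>'
    '<dc:title>CariCOOS Realtime Buoy Observations</dc:title>'
    '<dct:references scheme="OPeNDAP:OPeNDAP">'
    'http://dm2.caricoos.org/thredds/dodsC/UMO/DSG/SOS/VIA/WQM/HistoricRealtime/Agg.ncml'
    '</dct:references>'
    '</csw:SummaryRecord>'
)

root = ET.fromstring(record_xml)
title = root.find('dc:title', NS).text
parsed_refs = [{'scheme': el.get('scheme'), 'url': el.text}
               for el in root.findall('dct:references', NS)]

print(title)
print(parsed_refs)
```

The record object already exposes these as `.title` and `.references`, which is the convenience being celebrated above.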

Now add a constraint to return only records that have either the OPeNDAP or SOS service.

Let's first see what services are advertised:



In [10]:

    
try:
    csw.get_operation_by_name('GetDomain')
    csw.getdomain('apiso:ServiceType', 'property')
    print(csw.results['values'])
except Exception:
    print('GetDomain not supported')









    



[None, 'ERDDAP OPeNDAP', 'ERDDAP tabledap,OPeNDAP,ERDDAP Subset', 'OPeNDAP:OPeNDAP', 'OPeNDAP:OPeNDAP,OGC:SOS', 'OPeNDAP:OPeNDAP,OGC:WMS,file', 'Open Geospatial Consortium Web Coverage Service (WCS),Open Geospatial Consortium Web Map Service (WMS),Open Geospatial Consortium Web Map Service - Cached (WMS-C)', 'Open Geospatial Consortium Web Feature Service (WFS),Open Geospatial Consortium Web Map Service (WMS)', 'Open Geospatial Consortium Web Feature Service (WFS),Open Geospatial Consortium Web Map Service (WMS),Open Geospatial Consortium Web Map Service - Cached (WMS-C)', 'Open Geospatial Consortium Web Map Service (WMS)', 'THREDDS OPeNDAP', 'THREDDS OPeNDAP,Open Geospatial Consortium Sensor Observation Service (SOS)', 'THREDDS OPeNDAP,Open Geospatial Consortium Sensor Observation Service (SOS),THREDDS HTTP Service', 'THREDDS OPeNDAP,Open Geospatial Consortium Web Coverage Service (WCS),Open Geospatial Consortium Sensor Observation Service (SOS),THREDDS NetCDF Subset Service', 'THREDDS OPeNDAP,Open Geospatial Consortium Web Coverage Service (WCS),Open Geospatial Consortium Web Map Service (WMS),THREDDS NetCDF Subset Service', 'THREDDS OPeNDAP,Open Geospatial Consortium Web Coverage Service (WCS),THREDDS NetCDF Subset Service', 'THREDDS OPeNDAP,Open Geospatial Consortium Web Map Service (WMS)', 'THREDDS OPeNDAP,Open Geospatial Consortium Web Map Service (WMS),THREDDS HTTP Service', 'THREDDS OPeNDAP,Open Geospatial Consortium Web Map Service (WMS),THREDDS NetCDF Subset Service', 'THREDDS OPeNDAP,THREDDS HTTP Service']



In [11]:

    
# Match a single service type...
#val = 'OPeNDAP'
val = 'SOS'
filter3 = fes.PropertyIsLike(propertyname='apiso:ServiceType',literal=('*%s*' % val),
                        escapeChar='\\',wildCard='*',singleChar='?')

# ...or match any of several service types with a logical OR
services = ['OPeNDAP','SOS'] 
service_filt = fes.Or([fes.PropertyIsLike(propertyname='apiso:ServiceType',literal=('*%s*' % svc),
                    escapeChar='\\',wildCard='*',singleChar='?') for svc in services])
    
# This query uses the single-service filter; the OR filter is used in later cells
filter_list = [fes.And([filter1, filter2, filter3])]
#filter_list = [fes.And([filter1,  filter3])]
#filter_list = [fes.And([filter1, filter2, service_filt])]
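To get a feel for which of the advertised ServiceType strings the single-service and OR filters would pick up, the wildcard matching can be imitated locally. This is only an illustration: `fnmatchcase` from the standard library stands in for the server-side PropertyIsLike evaluation (which may differ, for example in case handling), and `matches_any` is a hypothetical helper. The strings are copied from the GetDomain output above:

```python
from fnmatch import fnmatchcase

# A few of the ServiceType values advertised by the catalog.
advertised = [
    'ERDDAP OPeNDAP',
    'OPeNDAP:OPeNDAP,OGC:SOS',
    'Open Geospatial Consortium Web Map Service (WMS)',
    'THREDDS OPeNDAP,Open Geospatial Consortium Sensor Observation Service (SOS)',
]
wanted = ['OPeNDAP', 'SOS']

def matches_any(value, terms):
    """True if value matches '*term*' for at least one term (logical OR)."""
    return any(fnmatchcase(value, '*%s*' % term) for term in terms)

sos_only = [s for s in advertised if fnmatchcase(s, '*SOS*')]
either = [s for s in advertised if matches_any(s, wanted)]

print(len(sos_only))
print(len(either))
```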



In [12]:

    
startposition = 0
while True:
    print 'getting records %d to %d' % (startposition, startposition+pagesize)
    csw.getrecords2(constraints=filter_list,
                    startposition=startposition, maxrecords=pagesize, sortby=sortby)
    for rec,item in csw.records.iteritems():
        print(item.title)
    print
    if csw.results['nextrecord'] == 0:
        break
    startposition += pagesize
    if startposition >= maxrecords:
        break









    



getting records 0 to 10
CariCOOS Realtime Buoy Observations
CariCOOS Realtime Buoy Observations

Let's try adding a search for a nonexistent service, which should return no records:



In [13]:

    
val = 'not_a_real_service'
filter3 = fes.PropertyIsLike(propertyname='apiso:ServiceType',literal=('*%s*' % val),
                        escapeChar='\\',wildCard='*',singleChar='?')
filter_list = [fes.And([filter1, filter2, filter3])]

csw.getrecords2(constraints=filter_list,maxrecords=100,esn='full')
print len(csw.records.keys())
for rec in list(csw.records.keys()):
    print csw.records[rec].title

Good!

Now add a bounding box constraint. To specify lon,lat order for the bbox (which we want so that the same bbox works with either Geoportal Server or pycsw requests), we need to specify the CRS84 coordinate reference system in the bounding box request. The CRS84 option is available in pycsw 1.1.10+. The ability to specify the crs in the bounding box request is available in owslib 0.8.12+. For more info on the bounding box problem and how it was solved, see this pycsw issue, this geoportal server issue, and this owslib issue.
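The axis-order problem comes down to which coordinate is listed first. A tiny sketch of the two orderings, using the Oahu-area numbers from the next cell; `swap_axis_order` is a hypothetical helper, shown only to make the difference concrete (note that the EPSG:4326 bounding box in the XML record above lists latitude first, `18.26 -65.0`):

```python
def swap_axis_order(bbox):
    """Turn [lon_min, lat_min, lon_max, lat_max] into [lat_min, lon_min, lat_max, lon_max]."""
    lon_min, lat_min, lon_max, lat_max = bbox
    return [lat_min, lon_min, lat_max, lon_max]

bbox_lon_lat = [-158.4, 21.24, -157.5, 21.77]   # lon,lat order, as used with CRS84
print(swap_axis_order(bbox_lon_lat))            # lat,lon order
```

Requesting CRS84 explicitly avoids having to do this kind of swapping per server.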



In [14]:

    
bbox = [-158.4, 21.24, -157.5, 21.77]    # [lon_min, lat_min, lon_max, lat_max]
bbox_filter = fes.BBox(bbox,crs='urn:ogc:def:crs:OGC:1.3:CRS84')
filter_list = [fes.And([filter1, filter2, service_filt, bbox_filter])]

startposition = 0
while True:
    print 'getting records %d to %d' % (startposition, startposition+pagesize)
    csw.getrecords2(constraints=filter_list,
                    startposition=startposition, maxrecords=pagesize, sortby=sortby)
    for rec,item in csw.records.iteritems():
        print(item.title)
    print
    if csw.results['nextrecord'] == 0:
        break
    startposition += pagesize
    if startposition >= maxrecords:
        break









    



getting records 0 to 10

Now add time constraints. Here we first define a function that builds a pair of filters, so that records are returned if their time extent overlaps (or, optionally, falls within) the specified time period:



In [15]:

    
def dateRange(start_date='1900-01-01',stop_date='2100-01-01',constraint='overlaps'):
    # Build the pair of temporal-extent filters for the requested constraint
    if constraint == 'overlaps':
        # record begins before the period ends AND ends after the period begins
        start = fes.PropertyIsLessThanOrEqualTo(propertyname='apiso:TempExtent_begin', literal=stop_date)
        stop = fes.PropertyIsGreaterThanOrEqualTo(propertyname='apiso:TempExtent_end', literal=start_date)
    elif constraint == 'within':
        # record begins AND ends inside the period
        start = fes.PropertyIsGreaterThanOrEqualTo(propertyname='apiso:TempExtent_begin', literal=start_date)
        stop = fes.PropertyIsLessThanOrEqualTo(propertyname='apiso:TempExtent_end', literal=stop_date)
    else:
        raise ValueError("constraint must be 'overlaps' or 'within'")
    return start,stop
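The two constraint types are easy to mix up, so here is the same comparison logic applied locally to plain dates. This is just an illustration of the inequalities the filters encode; the record extent is made up and the function names are hypothetical:

```python
from datetime import date

def overlaps(rec_begin, rec_end, query_start, query_stop):
    """True if the record's extent shares any time with the query period."""
    return rec_begin <= query_stop and rec_end >= query_start

def within(rec_begin, rec_end, query_start, query_stop):
    """True if the record's extent lies entirely inside the query period."""
    return rec_begin >= query_start and rec_end <= query_stop

query_start, query_stop = date(2013, 4, 20), date(2013, 4, 24)
rec_begin, rec_end = date(2013, 4, 1), date(2013, 4, 22)  # made-up record extent

print(overlaps(rec_begin, rec_end, query_start, query_stop))  # True
print(within(rec_begin, rec_end, query_start, query_stop))    # False
```

A record that starts before the query period but ends inside it overlaps the period without being within it.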



In [16]:

    
import datetime as dt
# a fixed period in early 1988
jd_start = dt.datetime(1988,1,1)
jd_stop = dt.datetime(1988,3,1)

# or a few days in April 2013
#jd_start = dt.datetime(2013,4,20)
#jd_stop = dt.datetime(2013,4,24)

# ... or relative to now (this overrides the fixed period set above)
jd_now = dt.datetime.utcnow()
jd_start = jd_now - dt.timedelta(days=3)
jd_stop = jd_now + dt.timedelta(days=3)

start_date = jd_start.strftime('%Y-%m-%d %H:00')
stop_date  = jd_stop.strftime('%Y-%m-%d %H:00')

jd_start = dt.datetime.strptime(start_date,'%Y-%m-%d %H:%M')
jd_stop = dt.datetime.strptime(stop_date,'%Y-%m-%d %H:%M')

print(start_date,'to',stop_date)
start,stop = dateRange(start_date,stop_date)









    



('2016-08-30 16:00', 'to', '2016-09-05 16:00')
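Because the window is computed relative to the current time, the output above changes on every run. A small sketch that takes the reference time as an argument makes the calculation repeatable; `time_window` is a hypothetical helper, and the reference time below is an assumption chosen to be consistent with the output shown:

```python
import datetime as dt

def time_window(now, days=3, fmt='%Y-%m-%d %H:00'):
    """Return (start, stop) strings `days` either side of `now` (hypothetical helper)."""
    delta = dt.timedelta(days=days)
    return (now - delta).strftime(fmt), (now + delta).strftime(fmt)

# Assumed reference time; the minutes are dropped by the format string.
print(time_window(dt.datetime(2016, 9, 2, 16, 30)))
```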



In [17]:

    
filter_list = [fes.And([filter1, filter2, service_filt, bbox_filter, start, stop])]

startposition = 0
while True:
    print 'getting records %d to %d' % (startposition, startposition+pagesize)
    csw.getrecords2(constraints=filter_list,
                    startposition=startposition, maxrecords=pagesize, sortby=sortby)
    for rec,item in csw.records.iteritems():
        print(item.title)
    print
    if csw.results['nextrecord'] == 0:
        break
    startposition += pagesize
    if startposition >= maxrecords:
        break









    



getting records 0 to 10

Now add a NOT filter to exclude records that mention Waikiki:



In [18]:

    
kw = dict(wildCard='*', escapeChar='\\',
          singleChar='?', propertyname='apiso:AnyText')

not_filt = fes.Not([fes.PropertyIsLike(literal='*Waikiki*', **kw)])



In [19]:

    
filter_list = [fes.And([filter1, filter2, service_filt, bbox_filter, start, stop, not_filt])]

startposition = 0
while True:
    print 'getting records %d to %d' % (startposition, startposition+pagesize)
    csw.getrecords2(constraints=filter_list,
                    startposition=startposition, maxrecords=pagesize, sortby=sortby)
    for rec,item in csw.records.iteritems():
        print(item.title)
    print
    if csw.results['nextrecord'] == 0:
        break
    startposition += pagesize
    if startposition >= maxrecords:
        break









    



getting records 0 to 10

Hopefully this notebook demonstrated some of the power (and complexity) of CSW! ;-)


