Testing data.noaa.gov using the CKAN API

We want to find all Data.gov datasets that match a specific type of data (e.g. sea_water_temperature), fall within a specified geospatial extent and time window, and have a specific type of data endpoint (e.g. OPeNDAP). Data.gov uses CKAN, so while we wait for a CSW interface, we try the CKAN API here through the ckanclient package.


In [1]:
import ckanclient
import pandas as pd
from pprint import pprint

In [2]:
#ckan = ckanclient.CkanClient('http://catalog.data.gov/api/3')
ckan = ckanclient.CkanClient('https://data.noaa.gov/api/3')

Try a keyword search:


In [3]:
search_params = { 'q': 'tags:"sea_water_temperature" '}
d = ckan.action('package_search', **search_params) 
print d['count']


62
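
The `package_search` call goes through CKAN's action API, so the same search should also be expressible as a plain GET URL that can be pasted into a browser. A minimal sketch, written for Python 3 (unlike the Python 2 cells in this notebook) and assuming the usual `/action/package_search` path under the API root; `package_search_url` is a hypothetical helper, not part of ckanclient:

```python
from urllib.parse import urlencode

def package_search_url(base, **params):
    """Return the GET URL for a CKAN package_search call (no request is made)."""
    # Sort the parameters so the resulting URL is deterministic.
    query = urlencode(sorted(params.items()))
    return '{}/action/package_search?{}'.format(base.rstrip('/'), query)

# rows=0 asks only for the count, not for the records themselves.
url = package_search_url('https://data.noaa.gov/api/3',
                         q='tags:"sea_water_temperature"', rows=0)
print(url)
```

The JSON returned from such a URL wraps the search output in a `result` key, whereas `ckan.action` hands back that inner dictionary directly.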

Let's try a more complicated search that combines a tag, a modification-date range, a resource format filter and a bounding box:


In [4]:
search_params = {                                           
    'q': 'tags:"sea_water_temperature" AND metadata_modified:[2012-06-01T00:00:00.000Z TO NOW]',  
    'fq': 'res_format:HTML',                                
    'extras': {"ext_bbox":"-71.5,41.,-63,46.0"},                   
    'rows': 3                                                     
}
d = ckan.action('package_search', **search_params) 
print d['count']


7
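
The `q` string is handed to CKAN's Solr-backed search, so `[start TO NOW]` is a range query on `metadata_modified`. A small sketch (Python 3) of building that string from a `datetime` instead of typing the timestamp by hand; the helper name `tag_modified_since` is hypothetical, and the input is assumed to already be in UTC:

```python
from datetime import datetime

def tag_modified_since(tag, since):
    """Build a query for records with `tag` modified at or after `since` (assumed UTC)."""
    # Same timestamp layout as the hand-written query above.
    stamp = since.strftime('%Y-%m-%dT%H:%M:%S') + '.000Z'
    return 'tags:"{}" AND metadata_modified:[{} TO NOW]'.format(tag, stamp)

q = tag_modified_since('sea_water_temperature', datetime(2012, 6, 1))
print(q)
```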

Try to find Bodega data through a restricted bounding box:


In [5]:
search_params = {  
    'q': 'tags:"sea_water_temperature"', 
    'extras': {"ext_bbox":"-125,38,-122,39"},
    'rows': 10  
}
      
d = ckan.action('package_search', **search_params) 
print d['count']


9
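
The `ext_bbox` value appears to follow the `minx,miny,maxx,maxy` order (west, south, east, north, in decimal degrees) of the CKAN spatial extension, assuming that is what this catalog runs. A sketch (Python 3) of a hypothetical helper, `bbox_extras`, that builds the `extras` dictionary and rejects obviously swapped coordinates:

```python
def bbox_extras(west, south, east, north):
    """Return an `extras` dict holding an ext_bbox string in west,south,east,north order."""
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError('longitudes must lie in [-180, 180]')
    if not (-90 <= south <= north <= 90):
        raise ValueError('latitudes must satisfy -90 <= south <= north <= 90')
    # {:g} drops trailing zeros, so 41.0 is written as 41.
    return {'ext_bbox': '{:g},{:g},{:g},{:g}'.format(west, south, east, north)}

print(bbox_extras(-125, 38, -122, 39))
```

The helper does not require `west <= east`, since a box crossing the antimeridian could legitimately have them reversed; whether the server supports such boxes is not checked here.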

Did we find it?


In [6]:
for rec in d['results']:
    print rec['title']


Meteorological and oceanographic data collected from the National Data Buoy Center Coastal-Marine Automated Network (C-MAN) and moored (weather) buoys during June 2011 (NODC Accession 0074384)
Meteorological and oceanographic data collected from the National Data Buoy Center Coastal-Marine Automated Network (C-MAN) and moored (weather) buoys during May 2011 (NODC Accession 0073426)
Meteorological and oceanographic data collected from the National Data Buoy Center Coastal-Marine Automated Network (C-MAN) and moored (weather) buoys during March 2011 (NODC Accession 0072077)
Meteorological and oceanographic data collected from the National Data Buoy Center Coastal-Marine Automated Network (C-MAN) and moored (weather) buoys during January 2011 (NODC Accession 0070959)
Meteorological and oceanographic data collected from the National Data Buoy Center Coastal-Marine Automated Network (C-MAN) and moored (weather) buoys during February 2011 (NODC Accession 0071368)
Meteorological and oceanographic data collected from the National Data Buoy Center Coastal-Marine Automated Network (C-MAN) and moored (weather) buoys during April 2011 (NODC Accession 0072886)
Physical oceanographic data collected from moorings deployed at Bodega Head by Gulf of the Farallones National Marine Sanctuary (GFNMS) and Bodega Marine Laboratory (BML) in the North Pacific Ocean from 2005-06-27 to 2011-10-27 (NODC Accession 0104152)
Meteorological and oceanographic data collected from the National Data Buoy Center Coastal-Marine Automated Network (C-MAN) and moored (weather) buoys during July 2011 (NODC Accession 0074922)
Physical oceanographic data collected from moorings deployed at Cordell Bank by Cordell Bank National Marine Sanctuary (CBNMS) and Bodega Marine Laboratory (BML) in the North Pacific Ocean from 2007-05-08 to 2011-12-14 (NODC Accession 0069874)
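
With longer result lists, scanning titles by eye gets tedious. A sketch (Python 3) of a hypothetical helper, `titles_matching`, that keeps only the titles containing a given word; the sample records are abbreviated stand-ins for `d['results']`:

```python
def titles_matching(results, word):
    """Return titles of result records whose title contains `word` (case-insensitive)."""
    word = word.lower()
    return [rec['title'] for rec in results
            if word in rec.get('title', '').lower()]

# Abbreviated stand-ins for two of the records listed above.
sample = [
    {'title': 'Physical oceanographic data collected from moorings deployed at Bodega Head'},
    {'title': 'Meteorological and oceanographic data collected from the National Data Buoy Center'},
]
print(titles_matching(sample, 'bodega'))
```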

Find all NetCDF data:


In [7]:
search_params = {           
    'fq': 'res_format:NetCDF',
    'rows': 10 
}
      
d = ckan.action('package_search', **search_params) 
print d['count']


3

So what does one of these results look like? Let's take a look at the keys:


In [8]:
print d['results'][0].keys()


[u'license_title', u'maintainer', u'relationships_as_object', u'private', u'maintainer_email', u'num_tags', u'id', u'metadata_created', u'metadata_modified', u'author', u'author_email', u'state', u'version', u'license_id', u'type', u'resources', u'num_resources', u'tags', u'tracking_summary', u'groups', u'organization', u'relationships_as_subject', u'revision_timestamp', u'name', u'isopen', u'url', u'notes', u'owner_org', u'extras', u'title', u'revision_id']

Now let's see what the URLs look like for all the resources:


In [9]:
pprint(d['results'][0]['resources'])


[{u'cache_last_updated': None,
  u'cache_url': None,
  u'created': u'2013-11-30T18:53:06.560173',
  u'description': u'',
  u'format': u'HTML',
  u'hash': u'',
  u'id': u'0f02f524-1b82-40ca-9962-c3bd0e92f48f',
  u'last_modified': None,
  u'mimetype': None,
  u'mimetype_inner': None,
  u'name': u'Web Page',
  u'position': 0,
  u'resource_group_id': u'21c6ac76-d483-44f0-bfb3-4db426ca0aee',
  u'resource_locator_function': u'http://www.ngdc.noaa.gov/metadata/published/xsd/schema/resources/Codelist/gmxCodelists.xml#CI_OnLineFunctionCode',
  u'resource_locator_protocol': u'',
  u'resource_type': None,
  u'revision_id': u'dbb02a1b-3dd1-4791-941d-45bed0eba3df',
  u'revision_timestamp': u'2013-11-30T23:53:06.530627',
  u'size': None,
  u'state': u'active',
  u'tracking_summary': {u'recent': 0, u'total': 0},
  u'url': u'http://www.nodc.noaa.gov/cgi-bin/OAS/prd/accession/details/113335',
  u'webstore_last_updated': None,
  u'webstore_url': None},
 {u'cache_last_updated': None,
  u'cache_url': None,
  u'created': u'2013-11-30T18:53:06.560189',
  u'description': u'',
  u'format': u'HTML',
  u'hash': u'',
  u'id': u'05a94259-1480-4ee6-aa10-bac013f883f4',
  u'last_modified': None,
  u'mimetype': None,
  u'mimetype_inner': None,
  u'name': u'Web Page',
  u'position': 1,
  u'resource_group_id': u'21c6ac76-d483-44f0-bfb3-4db426ca0aee',
  u'resource_locator_function': u'http://www.ngdc.noaa.gov/metadata/published/xsd/schema/resources/Codelist/gmxCodelists.xml#CI_OnLineFunctionCode',
  u'resource_locator_protocol': u'',
  u'resource_type': None,
  u'revision_id': u'dbb02a1b-3dd1-4791-941d-45bed0eba3df',
  u'revision_timestamp': u'2013-11-30T23:53:06.530627',
  u'size': None,
  u'state': u'active',
  u'tracking_summary': {u'recent': 0, u'total': 0},
  u'url': u'http://www.nodc.noaa.gov/archive/arc0040/0113335',
  u'webstore_last_updated': None,
  u'webstore_url': None},
 {u'cache_last_updated': None,
  u'cache_url': None,
  u'created': u'2013-11-30T18:53:06.560198',
  u'description': u'',
  u'format': u'HTML',
  u'hash': u'',
  u'id': u'828bb510-27e5-4371-9526-0bbdd0516b36',
  u'last_modified': None,
  u'mimetype': None,
  u'mimetype_inner': None,
  u'name': u'Web Page',
  u'position': 2,
  u'resource_group_id': u'21c6ac76-d483-44f0-bfb3-4db426ca0aee',
  u'resource_locator_function': u'http://www.ngdc.noaa.gov/metadata/published/xsd/schema/resources/Codelist/gmxCodelists.xml#CI_OnLineFunctionCode',
  u'resource_locator_protocol': u'',
  u'resource_type': None,
  u'revision_id': u'dbb02a1b-3dd1-4791-941d-45bed0eba3df',
  u'revision_timestamp': u'2013-11-30T23:53:06.530627',
  u'size': None,
  u'state': u'active',
  u'tracking_summary': {u'recent': 0, u'total': 0},
  u'url': u'http://www.nodc.noaa.gov/cgi-bin/OAS/prd/accession/download/113335',
  u'webstore_last_updated': None,
  u'webstore_url': None},
 {u'cache_last_updated': None,
  u'cache_url': None,
  u'created': u'2013-11-30T18:53:06.560207',
  u'description': u'',
  u'format': u'NetCDF',
  u'hash': u'',
  u'id': u'8c24091e-5f93-4539-a557-1dafed9f9910',
  u'last_modified': None,
  u'mimetype': None,
  u'mimetype_inner': None,
  u'name': u'NetCDF File',
  u'position': 3,
  u'resource_group_id': u'21c6ac76-d483-44f0-bfb3-4db426ca0aee',
  u'resource_locator_function': u'http://www.ngdc.noaa.gov/metadata/published/xsd/schema/resources/Codelist/gmxCodelists.xml#CI_OnLineFunctionCode',
  u'resource_locator_protocol': u'',
  u'resource_type': None,
  u'revision_id': u'dbb02a1b-3dd1-4791-941d-45bed0eba3df',
  u'revision_timestamp': u'2013-11-30T23:53:06.530627',
  u'size': None,
  u'state': u'active',
  u'tracking_summary': {u'recent': 0, u'total': 0},
  u'url': u'http://ecowatch.ncddc.noaa.gov/thredds/catalog/ocean_exploration_research/catalog.html?dataset=ocean_exploration_research/EX1305_SCS.nc',
  u'webstore_last_updated': None,
  u'webstore_url': None},
 {u'cache_last_updated': None,
  u'cache_url': None,
  u'created': u'2013-11-30T18:53:06.560215',
  u'description': u'',
  u'format': u'HTML',
  u'hash': u'',
  u'id': u'325e8439-2456-4a1e-b6c0-f725f2495ba1',
  u'last_modified': None,
  u'mimetype': None,
  u'mimetype_inner': None,
  u'name': u'NOAA National Oceanographic Data Center (NODC)',
  u'position': 4,
  u'resource_group_id': u'21c6ac76-d483-44f0-bfb3-4db426ca0aee',
  u'resource_locator_function': u'information',
  u'resource_locator_protocol': u'http',
  u'resource_type': None,
  u'revision_id': u'dbb02a1b-3dd1-4791-941d-45bed0eba3df',
  u'revision_timestamp': u'2013-11-30T23:53:06.530627',
  u'size': None,
  u'state': u'active',
  u'tracking_summary': {u'recent': 0, u'total': 0},
  u'url': u'http://www.nodc.noaa.gov/',
  u'webstore_last_updated': None,
  u'webstore_url': None},
 {u'cache_last_updated': None,
  u'cache_url': None,
  u'created': u'2013-11-30T18:53:06.560223',
  u'description': u'EX1305_COLLECTION_RESOLVED.xml',
  u'format': u'XML',
  u'hash': u'',
  u'id': u'ab08d131-7d91-4962-8723-e4431caa0405',
  u'last_modified': None,
  u'mimetype': None,
  u'mimetype_inner': None,
  u'name': u'XML File',
  u'position': 5,
  u'resource_group_id': u'21c6ac76-d483-44f0-bfb3-4db426ca0aee',
  u'resource_locator_function': u'',
  u'resource_locator_protocol': u'',
  u'resource_type': None,
  u'revision_id': u'dbb02a1b-3dd1-4791-941d-45bed0eba3df',
  u'revision_timestamp': u'2013-11-30T23:53:06.530627',
  u'size': None,
  u'state': u'active',
  u'tracking_summary': {u'recent': 0, u'total': 0},
  u'url': u'http://www.ncddc.noaa.gov/oer-waf/ISO/Resolved/2012/EX1305_COLLECTION_RESOLVED.xml',
  u'webstore_last_updated': None,
  u'webstore_url': None}]

So there are multiple resources for each record. Let's check some specific resource parameters for all datasets to see how the service endpoints might be defined:


In [10]:
urls=[]
for item in d['results']:
    for member in item['resources']:
        print 'url:',member['url']
        print 'resource_locator_protocol:',member['resource_locator_protocol']
        print 'resource_type:',member['resource_type']
        print 'format:',member['format'],'\n'
        if member['format'] == 'NetCDF' or member['resource_locator_protocol'] == 'THREDDS':
            urls.append(member['url'])


url: http://www.nodc.noaa.gov/cgi-bin/OAS/prd/accession/details/113335
resource_locator_protocol: 
resource_type: None
format: HTML 

url: http://www.nodc.noaa.gov/archive/arc0040/0113335
resource_locator_protocol: 
resource_type: None
format: HTML 

url: http://www.nodc.noaa.gov/cgi-bin/OAS/prd/accession/download/113335
resource_locator_protocol: 
resource_type: None
format: HTML 

url: http://ecowatch.ncddc.noaa.gov/thredds/catalog/ocean_exploration_research/catalog.html?dataset=ocean_exploration_research/EX1305_SCS.nc
resource_locator_protocol: 
resource_type: None
format: NetCDF 

url: http://www.nodc.noaa.gov/
resource_locator_protocol: http
resource_type: None
format: HTML 

url: http://www.ncddc.noaa.gov/oer-waf/ISO/Resolved/2012/EX1305_COLLECTION_RESOLVED.xml
resource_locator_protocol: 
resource_type: None
format: XML 

url: http://www.nodc.noaa.gov/cgi-bin/OAS/prd/accession/details/107211
resource_locator_protocol: 
resource_type: None
format: HTML 

url: http://www.nodc.noaa.gov/archive/arc0040/0107211/
resource_locator_protocol: 
resource_type: None
format: HTML 

url: http://www.nodc.noaa.gov/cgi-bin/OAS/prd/accession/download/107211
resource_locator_protocol: 
resource_type: None
format: HTML 

url: http://data.nodc.noaa.gov/thredds/catalog/testdata/ex-netcdf3/2012/SCS/catalog.html?dataset=testdata/ex-netcdf3/2012/SCS/EX1301_SCS.nc
resource_locator_protocol: 
resource_type: None
format: NetCDF 

url: http://www.nodc.noaa.gov/
resource_locator_protocol: http
resource_type: None
format: HTML 

url: http://www.ncddc.noaa.gov/oer-waf/ISO/Resolved/2012/EX1301_COLLECTION_RESOLVED.xml
resource_locator_protocol: 
resource_type: None
format: XML 

url: http://www.nodc.noaa.gov/cgi-bin/OAS/prd/accession/details/112723
resource_locator_protocol: 
resource_type: None
format: HTML 

url: http://www.nodc.noaa.gov/archive/arc0040/0112723
resource_locator_protocol: 
resource_type: None
format: HTML 

url: http://www.nodc.noaa.gov/cgi-bin/OAS/prd/accession/download/112723
resource_locator_protocol: 
resource_type: None
format: HTML 

url: http://ecowatch.ncddc.noaa.gov/thredds/catalog/ocean_exploration_research/catalog.html?dataset=ocean_exploration_research/EX1304L2_SCS.nc
resource_locator_protocol: 
resource_type: None
format: NetCDF 

url: http://www.nodc.noaa.gov/
resource_locator_protocol: http
resource_type: None
format: HTML 

url: http://www.ncddc.noaa.gov/oer-waf/ISO/Resolved/2012/EX1304_COLLECTION_RESOLVED.xml
resource_locator_protocol: 
resource_type: None
format: XML 

A lot of metadata is missing: resource_type is None for every resource, and resource_locator_protocol is empty for most of them.
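
To put a number on the gaps rather than reading them off the printout, the empty fields can be tallied. A sketch (Python 3) of a hypothetical helper, `count_missing`, that treats a missing key, `None` and an empty string alike; the sample mirrors two of the resources printed above:

```python
def count_missing(results,
                  fields=('resource_locator_protocol', 'resource_type', 'format')):
    """Return (total resources, {field: number of resources with no usable value})."""
    missing = dict.fromkeys(fields, 0)
    total = 0
    for item in results:
        for member in item.get('resources', []):
            total += 1
            for field in fields:
                # .get() avoids a KeyError when a resource lacks the field entirely.
                if not member.get(field):
                    missing[field] += 1
    return total, missing

sample = [{'resources': [
    {'resource_locator_protocol': '', 'resource_type': None, 'format': 'HTML'},
    {'resource_locator_protocol': 'http', 'resource_type': None, 'format': 'HTML'},
]}]
total, missing = count_missing(sample)
print(total, missing)
```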


In [11]:
print(urls)


[u'http://ecowatch.ncddc.noaa.gov/thredds/catalog/ocean_exploration_research/catalog.html?dataset=ocean_exploration_research/EX1305_SCS.nc', u'http://data.nodc.noaa.gov/thredds/catalog/testdata/ex-netcdf3/2012/SCS/catalog.html?dataset=testdata/ex-netcdf3/2012/SCS/EX1301_SCS.nc', u'http://ecowatch.ncddc.noaa.gov/thredds/catalog/ocean_exploration_research/catalog.html?dataset=ocean_exploration_research/EX1304L2_SCS.nc']

Hmmm... None of the above URLs work. The THREDDS catalog exists, but none of these datasets are in it: http://ecowatch.ncddc.noaa.gov/thredds/catalog/ocean_exploration_research/catalog.html
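
Each of these URLs names a catalog page plus a `dataset` query parameter, so the two parts can be separated to check the catalog by hand. A sketch (Python 3) using only the standard library; `split_catalog_url` is a hypothetical helper:

```python
from urllib.parse import urlparse, parse_qs

def split_catalog_url(url):
    """Split a catalog.html?dataset=... URL into (catalog page URL, dataset id or None)."""
    parts = urlparse(url)
    values = parse_qs(parts.query).get('dataset')
    dataset_id = values[0] if values else None
    # Drop the query string to get the bare catalog page.
    catalog = parts._replace(query='').geturl()
    return catalog, dataset_id

catalog, dataset_id = split_catalog_url(
    'http://ecowatch.ncddc.noaa.gov/thredds/catalog/ocean_exploration_research/'
    'catalog.html?dataset=ocean_exploration_research/EX1305_SCS.nc')
print(catalog)
print(dataset_id)
```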


Let's back off and see what a broader search yields:


In [12]:
search_params = { 'q': 'tags:"sea_water_temperature"',
     'extras': {"ext_bbox":"-60,60,-50,70"}
} 
d = ckan.action('package_search', **search_params) 
print d['count']


12

In [13]:
urls=[]
for item in d['results']:
    for member in item['resources']:
        print 'url:',member['url']
        print 'resource_locator_protocol:',member['resource_locator_protocol']
        print 'resource_type:',member['resource_type']
        print 'format:',member['format'],'\n'
        if member['format'] == 'NetCDF' or member['resource_locator_protocol'] == 'THREDDS':
            urls.append(member['url'])


url: ftp://ftp.nodc.noaa.gov/nodc/archive/arc0060/0111843/
resource_locator_protocol: FTP
resource_type: None
format:  

url: http://data.nodc.noaa.gov/thredds/catalog/glider/seaglider/uw/014/20040924/
resource_locator_protocol: THREDDS
resource_type: None
format: HTML 

url: http://data.nodc.noaa.gov/opendap/glider/seaglider/uw/014/20040924/
resource_locator_protocol: DAP
resource_type: None
format: HTML 

url: http://accession.nodc.noaa.gov/download/111843
resource_locator_protocol: HTTP
resource_type: None
format: HTML 

url: ftp://ftp.nodc.noaa.gov/nodc/archive/arc0034/0074384/
resource_locator_protocol: FTP
resource_type: None
format:  

url: http://accession.nodc.noaa.gov/download/74384
resource_locator_protocol: HTTP
resource_type: None
format: HTML 

url: http://www.ndbc.noaa.gov/
resource_locator_protocol: 
resource_type: None
format: HTML 

url: ftp://ftp.nodc.noaa.gov/nodc/archive/arc0033/0073426/
resource_locator_protocol: FTP
resource_type: None
format:  

url: http://accession.nodc.noaa.gov/download/73426
resource_locator_protocol: HTTP
resource_type: None
format: HTML 

url: http://www.ndbc.noaa.gov/
resource_locator_protocol: 
resource_type: None
format: HTML 

url: ftp://ftp.nodc.noaa.gov/nodc/archive/arc0060/0111841/
resource_locator_protocol: FTP
resource_type: None
format:  

url: http://data.nodc.noaa.gov/thredds/catalog/glider/seaglider/uw/008/20031002/
resource_locator_protocol: THREDDS
resource_type: None
format: HTML 

url: http://data.nodc.noaa.gov/opendap/glider/seaglider/uw/008/20031002/
resource_locator_protocol: DAP
resource_type: None
format: HTML 

url: http://accession.nodc.noaa.gov/download/111841
resource_locator_protocol: HTTP
resource_type: None
format: HTML 

url: ftp://ftp.nodc.noaa.gov/nodc/archive/arc0060/0111844/
resource_locator_protocol: FTP
resource_type: None
format:  

url: http://data.nodc.noaa.gov/thredds/catalog/glider/seaglider/uw/015/20040924/
resource_locator_protocol: THREDDS
resource_type: None
format: HTML 

url: http://data.nodc.noaa.gov/opendap/glider/seaglider/uw/015/20040924/
resource_locator_protocol: DAP
resource_type: None
format: HTML 

url: http://accession.nodc.noaa.gov/download/111844
resource_locator_protocol: HTTP
resource_type: None
format: HTML 

url: ftp://ftp.nodc.noaa.gov/nodc/archive/arc0032/0072077/
resource_locator_protocol: FTP
resource_type: None
format:  

url: http://accession.nodc.noaa.gov/download/72077
resource_locator_protocol: HTTP
resource_type: None
format: HTML 

url: http://www.ndbc.noaa.gov/
resource_locator_protocol: 
resource_type: None
format: HTML 

url: ftp://ftp.nodc.noaa.gov/nodc/archive/arc0032/0070959/
resource_locator_protocol: FTP
resource_type: None
format:  

url: http://accession.nodc.noaa.gov/download/70959
resource_locator_protocol: HTTP
resource_type: None
format: HTML 

url: http://www.ndbc.noaa.gov/
resource_locator_protocol: 
resource_type: None
format: HTML 

url: ftp://ftp.nodc.noaa.gov/nodc/archive/arc0061/0111845/
resource_locator_protocol: FTP
resource_type: None
format:  

url: http://data.nodc.noaa.gov/thredds/catalog/glider/seaglider/uw/016/20050406/
resource_locator_protocol: THREDDS
resource_type: None
format: HTML 

url: http://data.nodc.noaa.gov/opendap/glider/seaglider/uw/016/20050406/
resource_locator_protocol: DAP
resource_type: None
format: HTML 

url: http://accession.nodc.noaa.gov/download/111845
resource_locator_protocol: HTTP
resource_type: None
format: HTML 

url: ftp://ftp.nodc.noaa.gov/nodc/archive/arc0061/0112863/
resource_locator_protocol: FTP
resource_type: None
format:  

url: http://data.nodc.noaa.gov/thredds/catalog/glider/seaglider/uw/004/20031002/
resource_locator_protocol: THREDDS
resource_type: None
format: HTML 

url: http://data.nodc.noaa.gov/opendap/glider/seaglider/uw/004/20031002/
resource_locator_protocol: DAP
resource_type: None
format: HTML 

url: http://accession.nodc.noaa.gov/download/112863
resource_locator_protocol: HTTP
resource_type: None
format: HTML 

url: ftp://ftp.nodc.noaa.gov/nodc/archive/arc0032/0071368/
resource_locator_protocol: FTP
resource_type: None
format:  

url: http://accession.nodc.noaa.gov/download/71368
resource_locator_protocol: HTTP
resource_type: None
format: HTML 

url: http://www.ndbc.noaa.gov/
resource_locator_protocol: 
resource_type: None
format: HTML 


In [14]:
print(urls)


[u'http://data.nodc.noaa.gov/thredds/catalog/glider/seaglider/uw/014/20040924/', u'http://data.nodc.noaa.gov/thredds/catalog/glider/seaglider/uw/008/20031002/', u'http://data.nodc.noaa.gov/thredds/catalog/glider/seaglider/uw/015/20040924/', u'http://data.nodc.noaa.gov/thredds/catalog/glider/seaglider/uw/016/20050406/', u'http://data.nodc.noaa.gov/thredds/catalog/glider/seaglider/uw/004/20031002/']

These are not DAP URLs, but collections of datasets on a THREDDS server. Let's try opening a DAP URL (two clicks away):


In [15]:
url='http://data.nodc.noaa.gov/thredds/dodsC/glider/seaglider/uw/014/20040924/p0140001_20040924.nc'
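
This DAP URL differs from the first catalog URL in the list above only in the `dodsC` path segment and the appended file name. A sketch (Python 3) of that rewrite, assuming the server keeps this layout, which is not guaranteed for every THREDDS configuration; `dodsc_url` is a hypothetical helper, and the file name still has to be found by browsing the catalog:

```python
def dodsc_url(catalog_url, filename):
    """Rewrite a THREDDS catalog directory URL into a dodsC (OPeNDAP) URL for one file."""
    if '/thredds/catalog/' not in catalog_url:
        raise ValueError('not a THREDDS catalog URL: ' + catalog_url)
    # Swap the catalog service for the OPeNDAP service, then append the file name.
    base = catalog_url.replace('/thredds/catalog/', '/thredds/dodsC/', 1)
    return base.rstrip('/') + '/' + filename

dap = dodsc_url(
    'http://data.nodc.noaa.gov/thredds/catalog/glider/seaglider/uw/014/20040924/',
    'p0140001_20040924.nc')
print(dap)
```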

In [16]:
import netCDF4

In [17]:
nc = netCDF4.Dataset(url)
ncvars = nc.variables
#pprint(ncvars.keys())

In [18]:
s = ncvars['salinity'][:]
time_var = nc.variables['time']
dtime = netCDF4.num2date(time_var[:],time_var.units)
# Create Pandas time series object
ts = pd.Series(s,index=dtime)
# Use Pandas plot() method
ts.plot(figsize=(16,4))


Out[18]:
<matplotlib.axes.AxesSubplot at 0x404e810>

In [23]:
#print the last few values
print ts[-5:]


2004-09-24 18:46:50    32.837085
2004-09-24 18:46:55    32.837255
2004-09-24 18:47:00    32.832002
2004-09-24 18:47:05          NaN
2004-09-24 18:47:10          NaN
