Testing data.gov using the CKAN API

We want to find all Data.gov datasets that match a specific criteria. Here we try using the CKAN API with the ckanclient package.


In [4]:
import ckanclient

In [5]:
ckan = ckanclient.CkanClient('http://catalog.data.gov/api/3')
#ckan = ckanclient.CkanClient('https://data.noaa.gov/api/3')

In [6]:
search_params = { 
    'fq': 'res_format:WMS',
    'rows': 10 
}
      
d = ckan.action('package_search', **search_params) 
print d['count']


5019

In [7]:
for rec in d['results']:
    print rec['title']


National Park Boundaries
Railroad Mileposts BNSF
USGS National Elevation Dataset (NED)
USGS US Topo Map Collection
Syria_IDPSites_2015Apr16_HIU_USDoS
New Mexico Mountain Ranges
Airport Runways
GPS Roads
Bathymetry of Lake Superior
2013 FEMA Firm Panels

Find all WMS data matching additional query criteria


In [8]:
search_params = { 
    'q': 'mvco', 
    'fq': 'res_format:WMS',
    'rows': 10 
}
      
d = ckan.action('package_search', **search_params) 
print d['count']


1

So what does one of these results look like? Let's take a look at the keys


In [9]:
print d['results'][0].keys()


[u'license_title', u'maintainer', u'relationships_as_object', u'private', u'maintainer_email', u'num_tags', u'id', u'metadata_created', u'metadata_modified', u'author', u'author_email', u'state', u'version', u'license_id', u'type', u'resources', u'num_resources', u'tags', u'tracking_summary', u'groups', u'organization', u'relationships_as_subject', u'revision_timestamp', u'name', u'isopen', u'url', u'notes', u'owner_org', u'extras', u'title', u'revision_id']

Now let's see what the urls looks like for all the resources


In [10]:
pprint(d['results'][0]['resources'])


Pretty printing has been turned OFF

So there are multiple resources for each record. Let's check out a some specific resource parameters for all datasets to see how the service endpoints might be defined:


In [11]:
urls=[]
for item in d['results']:
    for member in item['resources']:
        print 'url:',member['url']
        print 'resource_locator_protocol:',member['resource_locator_protocol']
        print 'resource_type:',member['resource_type']
        print 'format:',member['format'],'\n'
        if member['format'] == 'NetCDF' or member['resource_locator_protocol'] == 'THREDDS':
            urls.append(member['url'])


url: http://pubs.usgs.gov/of/2008/1288/GIS_catalog/Bathy/bathy_2m.zip
resource_locator_protocol: 
resource_type: None
format: ZIP 

url: http://cmgds.marine.usgs.gov/geoserver/bathy/wms?service=WMS&version=1.1.0&request=GetMap&layers=bathy:2008-1288_bathy&styles=&bbox=-70.60143471462636,41.300607371198,-70.51082222580634,41.3499367012399&srs=EPSG:4326&WIDTH=256&HEIGHT=256&FORMAT=image/png
resource_locator_protocol: 
resource_type: None
format: WMS 

url: http://pubs.usgs.gov/of/2008/1288/GIS_catalog/Bathy/bathy_2m.zip
resource_locator_protocol: 
resource_type: None
format: ZIP 

url: http://pubs.usgs.gov/of/2008/1288/html/gis.html
resource_locator_protocol: 
resource_type: None
format: HTML 

url: http://pubs.usgs.gov/of/2008/1288/
resource_locator_protocol: 
resource_type: None
format:  

Lots of missing metadata information.


In [12]:
print(urls)


[]

Hmmm... None of above URLs work. The THREDDS catalog exists, but none of the datasets here are in that catalog http://ecowatch.ncddc.noaa.gov/thredds/catalog/ocean_exploration_research/catalog.html


In [12]:

Let's back off and see what the broader search yields:


In [14]:
search_params = { 'q': 'tags:"temperature"',
     'extras': {"ext_bbox":"-60,60,-50,70"}
} 
d = ckan.action('package_search', **search_params) 
print d['count']


14

In [15]:
urls=[]
for item in d['results']:
    for member in item['resources']:
        print 'url:',member['url']
        print 'resource_locator_protocol:',member['resource_locator_protocol']
        print 'resource_type:',member['resource_type']
        print 'format:',member['format'],'\n'
        if member['format'] == 'NetCDF' or member['resource_locator_protocol'] == 'THREDDS':
            urls.append(member['url'])


url: http://hurricane.ncdc.noaa.gov/pls/paleox/f?p=518:1:3874104498420267:::APP:PROXYTOSEARCH:16:
resource_locator_protocol: HTTP
resource_type: None
format: ascii 

url: http://gis.ncdc.noaa.gov/map/viewer/#app=cdo&cfg=paleo&theme=paleo&node=gis
resource_locator_protocol: HTTP
resource_type: None
format: KML 

url: http://www.ncdc.noaa.gov/paleo/pollen.html
resource_locator_protocol: HTTP
resource_type: None
format: HTML 

url: http://hurricane.ncdc.noaa.gov/pls/paleox/f?p=519:1:::::P1_STUDY_ID:13988
resource_locator_protocol: HTTP
resource_type: None
format:  

url: http://gcmd.nasa.gov/Resources/valids/archives/keyword_list.html
resource_locator_protocol: HTTP
resource_type: None
format: HTML 

url: http://hurricane.ncdc.noaa.gov/pls/paleox/f?p=518:1:3874104498420267:::APP:PROXYTOSEARCH:18:
resource_locator_protocol: HTTP
resource_type: None
format: ascii 

url: http://gis.ncdc.noaa.gov/map/viewer/#app=cdo&cfg=paleo&theme=paleo&node=gis
resource_locator_protocol: HTTP
resource_type: None
format: KML 

url: http://www.ncdc.noaa.gov/paleo/treering.html
resource_locator_protocol: HTTP
resource_type: None
format: HTML 

url: http://hurricane.ncdc.noaa.gov/pls/paleox/f?p=519:1:::::P1_STUDY_ID:13831
resource_locator_protocol: HTTP
resource_type: None
format:  

url: http://gcmd.nasa.gov/Resources/valids/archives/keyword_list.html
resource_locator_protocol: HTTP
resource_type: None
format: HTML 

url: http://spidr.ngdc.noaa.gov/spidr/querydmsp.do
resource_locator_protocol: http
resource_type: None
format:  

url: http://www.ngdc.noaa.gov/stp/spaceweather.html
resource_locator_protocol: http
resource_type: None
format: HTML 

url: http://spidr.ngdc.noaa.gov/spidr/querydmsp.do
resource_locator_protocol: 
resource_type: None
format:  

url: http://www.ngdc.noaa.gov/stp/spaceweather.html
resource_locator_protocol: 
resource_type: None
format: HTML 

url: http://www.ngdc.noaa.gov/stp/spaceweather.html
resource_locator_protocol: http
resource_type: None
format: HTML 

url: http://spidr.ngdc.noaa.gov/spidr/querydmsp.do
resource_locator_protocol: http
resource_type: None
format:  

url: http://tidesandcurrents.noaa.gov/stations.html?type=Water+Levels
resource_locator_protocol: 
resource_type: None
format: ascii 

url: http://opendap.co-ops.nos.noaa.gov/ioos-dif-sos/
resource_locator_protocol: 
resource_type: None
format: ascii 

url: http://opendap.co-ops.nos.noaa.gov/axis/
resource_locator_protocol: 
resource_type: None
format: ascii 

url: http://tidesandcurrents.noaa.gov/stations.html?type=Water+Levels
resource_locator_protocol: 
resource_type: None
format: ascii 

url: http://hurricane.ncdc.noaa.gov/pls/paleox/f?p=518:1::::APP:PROXYTOSEARCH:7:
resource_locator_protocol: HTTP
resource_type: None
format: ascii 

url: http://gis.ncdc.noaa.gov/map/viewer/#app=cdo&cfg=paleo&theme=paleo&node=gis
resource_locator_protocol: HTTP
resource_type: None
format: KML 

url: http://www.ncdc.noaa.gov/paleo/icecore.html
resource_locator_protocol: HTTP
resource_type: None
format: HTML 

url: http://hurricane.ncdc.noaa.gov/pls/paleox/f?p=519:1:::::P1_STUDY_ID:13943
resource_locator_protocol: HTTP
resource_type: None
format:  

url: http://gcmd.nasa.gov/Resources/valids/archives/keyword_list.html
resource_locator_protocol: HTTP
resource_type: None
format: HTML 

url: http://www.pifsc.noaa.gov
resource_locator_protocol: WWW:LINK-1.0-http--link
resource_type: None
format:  

url: http://www.pifsc.noaa.gov
resource_locator_protocol: WWW:LINK-1.0-http--link
resource_type: None
format:  

url: http://gcmd.nasa.gov/learn/keyword_list.html
resource_locator_protocol: WWW:LINK-1.0-http--link
resource_type: None
format: HTML 

url: ftp://ftp.cpc.ncep.noaa.gov/wd53rl/ssu/
resource_locator_protocol: 
resource_type: None
format:  

url: ftp://ftp.cpc.ncep.noaa.gov/wd53rl/ssu/
resource_locator_protocol: 
resource_type: None
format:  

url: http://www.ngdc.noaa.gov/ecosys/cdroms/Pathfinder98/pathfind.htm
resource_locator_protocol: 
resource_type: None
format: HTML 

url: http://www.ngdc.noaa.gov/ecosys/cdroms/Pathfinder98/pathfind.htm
resource_locator_protocol: http
resource_type: None
format: HTML 

url: http://hurricane.ncdc.noaa.gov/pls/paleox/f?p=514
resource_locator_protocol: HTTP
resource_type: None
format: ascii 

url: http://gis.ncdc.noaa.gov/map/viewer/#app=cdo&cfg=paleo&theme=paleo&node=gis
resource_locator_protocol: HTTP
resource_type: None
format: KML 

url: http://www.ncdc.noaa.gov/paleo/recons.html
resource_locator_protocol: HTTP
resource_type: None
format: HTML 

url: http://hurricane.ncdc.noaa.gov/pls/paleox/f?p=519:1:::::P1_STUDY_ID:13925
resource_locator_protocol: HTTP
resource_type: None
format:  

url: http://gcmd.nasa.gov/Resources/valids/archives/keyword_list.html
resource_locator_protocol: HTTP
resource_type: None
format: HTML 

url: http://spidr.ngdc.noaa.gov
resource_locator_protocol: 
resource_type: None
format:  

url: http://spidr.ngdc.noaa.gov
resource_locator_protocol: 
resource_type: None
format:  

url: http://www.ngdc.noaa.gov/dmsp/sensors/ssmi.html
resource_locator_protocol: http
resource_type: None
format: HTML