IOOS System Test: Extreme Events Theme: Inundation

Compare modeled water levels with observations for a specified bounding box and time period using IOOS recommended service standards for catalog search (CSW) and data retrieval (OPeNDAP & SOS).

Query CSW to find datasets that match criteria
Extract OPeNDAP data endpoints from model datasets and SOS endpoints from observational datasets
OPeNDAP model datasets will be granules
SOS endpoints may be datasets (from ncSOS) or collections of datasets (from NDBC, CO-OPS SOS servers)
Filter SOS services to obtain datasets
Extract data from SOS datasets
Extract data from model datasets at locations of observations
Compare time series data on same vertical datum



In [ ]:

    
from pylab import *
from owslib.csw import CatalogueServiceWeb
from owslib import fes
import random
import netCDF4
import pandas as pd
import datetime as dt
from pyoos.collectors.coops.coops_sos import CoopsSos
import cStringIO
import iris
import urllib2
import parser
from lxml import etree



In [ ]:

Specify a bounding box and time range of interest:



In [ ]:

    
#box=[-74.4751, 40.3890, -73.7432, 40.9397]
box=[-76.4751, 38.3890, -71.7432, 42.9397]

#box=[-180, -90, 180, 90]

# specific specific times (UTC) ...

# hurricane sandy
jd_start = dt.datetime(2012,10,26)
jd_stop = dt.datetime(2012,11,2)

# 2014 feb 10-15 storm
jd_start = dt.datetime(2014,2,10)
jd_stop = dt.datetime(2014,2,15)

# 2014 recent
jd_start = dt.datetime(2014,3,8)
jd_stop = dt.datetime(2014,3,11)

# 2011 
#jd_start = dt.datetime(2013,4,20)
#jd_stop = dt.datetime(2013,4,24)

# ... or relative to now
jd_start = dt.datetime.utcnow()- dt.timedelta(days=3)
jd_stop = dt.datetime.utcnow() + dt.timedelta(days=3)

start_date = jd_start.strftime('%Y-%m-%d %H:00')
stop_date  = jd_stop.strftime('%Y-%m-%d %H:00')

jd_start = dt.datetime.strptime(start_date,'%Y-%m-%d %H:%M')
jd_stop = dt.datetime.strptime(stop_date,'%Y-%m-%d %H:%M')

print start_date,'to',stop_date

sos_name = 'water_surface_height_above_reference_datum'

Now we need to specify all the names we know for water level, names that will get used in the CSW search, and also to find data in the datasets that are returned. This is ugly and fragile. There hopefully will be a better way in the future...



In [ ]:

    
std_name_list=['water_surface_height_above_reference_datum',
    'sea_surface_height_above_geoid','sea_surface_elevation',
    'sea_surface_height_above_reference_ellipsoid','sea_surface_height_above_sea_level',
    'sea_surface_height','water level']

1. Get observations from SOS



In [ ]:

    
collector = CoopsSos()



In [ ]:

    
collector.server.identification.title



In [ ]:

    
collector.start_time = jd_start
collector.end_time = jd_stop
collector.variables = [sos_name]



In [ ]:

    
ofrs = collector.server.offerings



In [ ]:

    
print len(ofrs)
for p in ofrs[700:710]: print p

Find the SOS stations within our bounding box and time extent

We would like to just use a filter on a collection to get a new collection, but PYOOS doesn't do that yet. So we do a GetObservation request for a collection, including a bounding box, and asking for one value at the start of the time period of interest. We use that to do a bounding box filter on the SOS server, which returns 1 point for each station found. So for 3 stations, we get back 3 records, in CSV format. We can strip the station ids from the CSV, and then we have a list of stations we can use with pyoos. The template for the GetObservation query for the bounding box filtered collection was generated using the GUI at http://opendap.co-ops.nos.noaa.gov/ioos-dif-sos/



In [ ]:

    
iso_start = jd_start.strftime('%Y-%m-%dT%H:%M:%SZ')
print iso_start
box_str=','.join(str(e) for e in box)
print box_str



In [ ]:

    
url=('http://opendap.co-ops.nos.noaa.gov/ioos-dif-sos/SOS?'
     'service=SOS&request=GetObservation&version=1.0.0&'
     'observedProperty=%s&offering=urn:ioos:network:NOAA.NOS.CO-OPS:WaterLevelActive&'
     'featureOfInterest=BBOX:%s&responseFormat=text/csv&eventTime=%s&'
     'result=VerticalDatum==urn:ogc:def:datum:epsg::5103&'
     'dataType=PreliminarySixMinute') % (sos_name,box_str,iso_start)
print url
obs_loc_df = pd.read_csv(url)



In [ ]:

    
obs_loc_df.head()



In [ ]:

    
stations = [sta.split(':')[-1] for sta in obs_loc_df['station_id']]
print stations
obs_lon = [sta for sta in obs_loc_df['longitude (degree)']]
obs_lat = [sta for sta in obs_loc_df['latitude (degree)']]
print obs_lon
print obs_lat

Get longName from SOS DescribeSensor request



In [ ]:

    
def get_Coops_longName(sta):
    """
    get longName for specific station from COOPS SOS using DescribeSensor request
    """
    url=('http://opendap.co-ops.nos.noaa.gov/ioos-dif-sos/SOS?service=SOS&'
        'request=DescribeSensor&version=1.0.0&outputFormat=text/xml;subtype="sensorML/1.0.1"&'
        'procedure=urn:ioos:sensor:NOAA.NOS.CO-OPS:%s:B1') % sta
    tree = etree.parse(urllib2.urlopen(url))
    root = tree.getroot()
    longName=root.xpath("//sml:identifier[@name='longName']/sml:Term/sml:value/text()", namespaces={'sml':"http://www.opengis.net/sensorML/1.0.1"})
    return longName



In [ ]:

    
long_names=[get_Coops_longName(sta) for sta in stations]
print long_names



In [ ]:

    
long_names=[]
for sta in stations:
    a = get_Coops_longName(sta)
    if len(a)==0:
        long_names.append(sta)
    else:
        long_names.append(a[0])
print long_names

Request CSV response from SOS and convert to Pandas DataFrames



In [ ]:

    
def coops2df(collector,coops_id,sos_name):
    collector.features = [coops_id]
    collector.variables = [sos_name]
    response = collector.raw(responseFormat="text/csv")
    data_df = pd.read_csv(cStringIO.StringIO(str(response)), parse_dates=True, index_col='date_time')
    data_df['lev']=data_df['water_surface_height_above_reference_datum (m)']-data_df['vertical_position (m)']
    return data_df

Generate a uniform 6-min time base for model/data comparison:



In [ ]:

    
ts_rng = pd.date_range(start=jd_start, end=jd_stop, freq='6Min')
ts = pd.DataFrame(index=ts_rng)
print jd_start,jd_stop
print len(ts)

Create a dictionary of obs dataframes, one for each station:



In [ ]:

    
obs_df={}
for sta in stations:
    b=coops2df(collector,sta,sos_name)
    obs_df[sta] = pd.concat([b, ts],axis=1).interpolate()

Search CSW for datasets of interest



In [ ]:

    
#from IPython.core.display import HTML
#HTML('<iframe src=http://www.ngdc.noaa.gov/geoportal/ width=950 height=400></iframe>')



In [ ]:

    
# connect to CSW, explore it's properties

endpoint = 'http://www.ngdc.noaa.gov/geoportal/csw' # NGDC Geoportal
#endpoint = 'http://geoport.whoi.edu/geoportal/csw'  # USGS WHSC Geoportal

#endpoint = 'http://www.nodc.noaa.gov/geoportal/csw'   # NODC Geoportal: granule level
#endpoint = 'http://data.nodc.noaa.gov/geoportal/csw'  # NODC Geoportal: collection level   
#endpoint = 'http://geodiscover.cgdi.ca/wes/serviceManagerCSW/csw'  # NRCAN CUSTOM
#endpoint = 'http://geoport.whoi.edu/gi-cat/services/cswiso' # USGS Woods Hole GI_CAT
#endpoint = 'http://cida.usgs.gov/gdp/geonetwork/srv/en/csw' # USGS CIDA Geonetwork
#endpoint = 'http://cmgds.marine.usgs.gov/geonetwork/srv/en/csw' # USGS Coastal and Marine Program
#endpoint = 'http://geoport.whoi.edu/geoportal/csw' # USGS Woods Hole Geoportal 
#endpoint = 'http://geo.gov.ckan.org/csw'  # CKAN testing site for new Data.gov
#endpoint = 'https://edg.epa.gov/metadata/csw'  # EPA
#endpoint = 'http://cwic.csiss.gmu.edu/cwicv1/discovery'  # CWIC

csw = CatalogueServiceWeb(endpoint,timeout=60)
csw.version



In [ ]:

    
for oper in csw.operations:
    if oper.name == 'GetRecords':
        print '\nISO Queryables:\n',oper.constraints['SupportedISOQueryables']['values']



In [ ]:

    
# hopefully something like this will be implemented in fes soon
def dateRange(start_date='1900-01-01',stop_date='2100-01-01',constraint='overlaps'):
    if constraint == 'overlaps':
        start = fes.PropertyIsLessThanOrEqualTo(propertyname='apiso:TempExtent_begin', literal=stop_date)
        stop = fes.PropertyIsGreaterThanOrEqualTo(propertyname='apiso:TempExtent_end', literal=start_date)
    elif constraint == 'within':
        start = fes.PropertyIsGreaterThanOrEqualTo(propertyname='apiso:TempExtent_begin', literal=start_date)
        stop = fes.PropertyIsLessThanOrEqualTo(propertyname='apiso:TempExtent_end', literal=stop_date)
    return start,stop



In [ ]:

    
stop_date



In [ ]:

    
# convert User Input into FES filters
start,stop = dateRange(start_date,stop_date)
bbox = fes.BBox(box)



In [ ]:

    
or_filt = fes.Or([fes.PropertyIsLike(propertyname='apiso:AnyText',literal=('*%s*' % val),
                    escapeChar='\\',wildCard='*',singleChar='?') for val in std_name_list])

ROMS model output often has Averages and History files. The Averages files are usually averaged over a tidal cycle or more, while the History files are snapshots at that time instant. We are not interested in averaged data for this test, so in the cell below we remove any Averages files here by removing any datasets that have the term "Averages" in the metadata text. A better approach would be to look at the cell_methods attributes propagated through to some term in the ISO metadata, but this is not implemented yet, as far as I know



In [ ]:

    
val = 'Averages'
not_filt = fes.Not([fes.PropertyIsLike(propertyname='apiso:AnyText',literal=('*%s*' % val),
                        escapeChar='\\',wildCard='*',singleChar='?')])



In [ ]:

    
filter_list = [fes.And([ bbox, start, stop, or_filt, not_filt]) ]



In [ ]:

    
# try request using multiple filters "and" syntax: [[filter1,filter2]]
csw.getrecords2(constraints=filter_list,maxrecords=1000,esn='full')
print len(csw.records.keys())

Now print out some titles



In [ ]:

    
for rec,item in csw.records.iteritems():
    print item.title

Define a function that will return the endpoint for a specified service type



In [ ]:

    
# get specific ServiceType URL from records
def service_urls(records,service_string='urn:x-esri:specification:ServiceType:odp:url'):
    urls=[]
    for key,rec in records.iteritems():
        #create a generator object, and iterate through it until the match is found
        #if not found, gets the default value (here "none")
        url = next((d['url'] for d in rec.references if d['scheme'] == service_string), None)
        if url is not None:
            urls.append(url)
    return urls

Print out all the OPeNDAP Data URL endpoints



In [ ]:

    
dap_urls = service_urls(csw.records,service_string='urn:x-esri:specification:ServiceType:odp:url')
print "\n".join(dap_urls)

Print out all the SOS Data URL endpoints



In [ ]:

    
sos_urls = service_urls(csw.records,service_string='urn:x-esri:specification:ServiceType:sos:url')
print "\n".join(sos_urls)



In [ ]:

    
def nearxy(x,y,xi,yi):
    ind=ones(len(xi),dtype=int)
    dd=ones(len(xi),dtype='float')
    for i in arange(len(xi)):
        dist=sqrt((x-xi[i])**2+(y-yi[i])**2)
        ind[i]=dist.argmin()
        dd[i]=dist[ind[i]]
    return ind,dd



In [ ]:

    
def find_ij(x,y,d,xi,yi):
    """
    find non-NaN cell d[j,i] that are closest to xi,yi.
    """
    index = where(~isnan(d.flatten()))[0]
    ind,dd = nearxy(x.flatten()[index],y.flatten()[index],xi,yi)
    j,i=ind2ij(x,index[ind])
    return i,j,dd



In [ ]:

    
def find_timevar(cube):
    try:
        cube.coord(axis='T').rename('time')
    except:
        pass
    timevar = cube.coord('time')
    return timevar



In [ ]:

    
def ind2ij(a,index):
    """
    returns a[j,i] for a.ravel()[index]
    """
    n,m = shape(lon)
    j = ceil(index/m).astype(int)
    i = remainder(index,m)
    return i,j



In [ ]:

    
print std_name_list
def name_in_list(cube):
    return cube.standard_name in std_name_list
constraint = iris.Constraint(cube_func=name_in_list)



In [ ]:

    
#dap_urls=['http://geoport-dev.whoi.edu/thredds/dodsC/estofs/atlantic']
mod_df={}

for url in dap_urls:
    try:
        a = iris.load_cube(url,constraint)
        # take first 20 chars for name
        name = a.attributes['title'][0:20]
        r = shape(a)
        timevar = find_timevar(a)
        lat = a.coord(axis='Y').points
        lon = a.coord(axis='X').points
        jd = timevar.units.num2date(timevar.points)
        istart = timevar.nearest_neighbour_index(timevar.units.date2num(jd_start))
        istop = timevar.nearest_neighbour_index(timevar.units.date2num(jd_stop))
        mod_df[name]=[]
        if istart != istop:
            print url
            nsta = len(obs_lon)
            if len(r)==3:
                j,i,dd = find_ij(lon,lat,d,obs_lon,obs_lat)
                d = a[0,:,:].data
                for n in range(nsta):
                    arr = a[istart:istop,j[n],i[n]].data                    
                    if dd[n]<0.1:
                        b = pd.DataFrame(arr,index=jd[istart:istop],columns=[stations[n]])
                        c = pd.concat([b, ts],axis=1).interpolate()
                        mod_df[name].append(c)
            elif len(r)==2:
                index,dd = nearxy(lon.ravel(),lat.ravel(),obs_lon,obs_lat)
                for n in range(nsta):
                    if dd[n]<0.1:
                        arr = a[istart:istop,index[n]].data
                        b = pd.DataFrame(arr,index=jd[istart:istop],columns=[stations[n]])
                        c = pd.concat([b, ts],axis=1).interpolate()
                        mod_df[name].append(c)
                   
    except:
        pass



In [ ]:

    
for key in mod_df:
    pd.concat(mod_df[key],axis=1).plot(title=key)



In [ ]:

    
mod_df={}
url='http://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NYOFS/fmrc/Aggregated_7_day_NYOFS_Fields_Forecast_best.ncd'

a = iris.load_cube(url,constraint)
# take first 20 chars for name
name = a.attributes['title'][0:20]
r = shape(a)
timevar = find_timevar(a)
lat = a.coord(axis='Y').points
lon = a.coord(axis='X').points
jd = timevar.units.num2date(timevar.points)
istart = timevar.nearest_neighbour_index(timevar.units.date2num(jd_start))
istop = timevar.nearest_neighbour_index(timevar.units.date2num(jd_stop))
mod_df[name]=[]
if istart != istop:
    print url
    nsta = len(obs_lon)
    if len(r)==3:
        j,i,dd = find_ij(lon,lat,d,obs_lon,obs_lat)
        d = a[0,:,:].data
        for n in range(nsta):
            arr = a[istart:istop,j[n],i[n]].data                    
            if dd[n]<0.1:
                b = pd.DataFrame(arr,index=jd[istart:istop],columns=[stations[n]])
                c = pd.concat([b, ts],axis=1).interpolate()
                mod_df[name].append(c)
    elif len(r)==2:
        index,dd = nearxy(lon.ravel(),lat.ravel(),obs_lon,obs_lat)
        for n in range(nsta):
            if dd[n]<0.1:
                arr = a[istart:istop,index[n]].data
                b = pd.DataFrame(arr,index=jd[istart:istop],columns=[stations[n]])
                c = pd.concat([b, ts],axis=1).interpolate()
                mod_df[name].append(c)



In [ ]:

    
for sta in stations:
    df1=pd.DataFrame(obs_df[sta]['lev'])
    df2=mod_df[sta]
    foo = pd.concat([df1, df2],axis=1)
    foo.plot(figsize=(16,6))



In [ ]:

    
foo=pd.concat([,mod_df[stations[0]],axis=1).interpolate()



In [ ]:

    
url='http://www.smast.umassd.edu:8080/thredds/dodsC/FVCOM/NECOFS/Forecasts/NECOFS_GOM3_FORECAST.nc'
a=iris.load_raw(url,'sea_surface_height_above_geoid')[0]



In [ ]:

    
print a[0]



In [ ]:



In [ ]:

    
b =pd.concat([mod_df[1], ts],axis=1).interpolate()



In [ ]:

    
b.head()



In [ ]:

    
url='http://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NYOFS/fmrc/Aggregated_7_day_NYOFS_Fields_Forecast_best.ncd'



In [ ]:

    
a = iris.load_cube(url, constraint)
#    timevar = a.coord(axis='T')
timevar = find_timevar(a)
lat = a.coord(axis='Y').points
lon = a.coord(axis='X').points
jd = timevar.units.num2date(timevar.points)
istart = timevar.nearest_neighbour_index(timevar.units.date2num(jd_start))
istop = timevar.nearest_neighbour_index(timevar.units.date2num(jd_stop))
d = a[0,:,:].data
index = where(~isnan(d.flatten()))[0]
ind,dd = nearxy(lon.flatten()[index],lat.flatten()[index],obs_lon,obs_lat)
j,i=ind2ij(lon,index[ind])
j,i,dd = find_ij(lon,lat,d,obs_lon,obs_lat)



In [ ]:

    
pcolormesh(lon,lat,d,vmin=-1.0,vmax=1.0)
plot(obs_lon[0],obs_lat[0],'wo')
plot(lon.flatten()[index],lat.flatten()[index],'w.')
plot(lon[j[0],i[0]],lat[j[0],i[0]],'ro')



In [ ]:

    
ind=where(~isnan(d))[0]



In [ ]:

    
index=nearxy(lon.ravel()[ind],lat.ravel()[ind],obs_lon,obs_lat)



In [ ]:

    
ind[index]



In [ ]:

    
index



In [ ]:

    
jj,ii=ind2ij(lon,index)



In [ ]:

    
lon[jj,ii]



In [ ]:

    
obs_lon



In [ ]:

    
lon[ii,jj]



In [ ]:

    
plot(lon.ravel(),lat.ravel(),'g.')
plot(obs_lon,obs_lat,'k.')
plot(lon[ii,jj],lat[ii,jj],'c.')



In [ ]:

    
plot(lon.ravel()[ind],lat.ravel()[ind],'k.')



In [ ]:



In [ ]:

    
d.min()



In [ ]:

    
d.max()



In [ ]:

    
print a[1,50,30].data



In [ ]:

    
for sta in stations:
    obs_df[sta]['lev'].plot(figsize=(16,6))



In [ ]: