NDBC Station 41047 Spectral Wave Density Data

This notebook explores the differences between spectral wave density data products for NDBC Station 41047. It investigates three unique OPeNDAP sources for the data: Planet OS, NDBC realtime, and NDBC 2014 historical products.

Planet OS currently acquires spectral wave density data from each individual station's realtime product, which is denoted by the w9999 nomenclature immediately before the file extension. We would expect the Planet OS product and the NDBC realtime product to be identical.

Let's begin with the required imports...



In [1]:

    
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr
from package_api import download_data_station
import datetime

Next we'll setup our data sources and acquire the data via OPeNDAP using xarray.



In [2]:

    
API_key = open('APIKEY').readlines()[0].strip() #'<YOUR API KEY HERE>'
dataset_key = 'noaa_ndbc_swden_stations'
variables = 'spectral_wave_density,mean_wave_dir,principal_wave_dir,wave_spectrum_r1,wave_spectrum_r2'



In [29]:

    
# OpenDAP URLs for each product
now = datetime.datetime.now()
ndbc_rt_url='http://dods.ndbc.noaa.gov/thredds/dodsC/data/swden/41047/41047w9999.nc'
ndbc_url = 'http://dods.ndbc.noaa.gov/thredds/dodsC/data/swden/41047/41047w' + str(now.year) + '.nc'
planetos_filename = download_data_station(dataset_key,API_key,'41047','2018-01-01T00:00:00',datetime.datetime.strftime(now,'%Y-%m-%dT%H:%M:%S')
,variables,'41047')
#planetos_tds_url = 'https://api.planetos.com/v1/datasets/noaa_ndbc_swden_stations/stations/41047?origin=dataset-details&station=46237&apikey=8428878e4b944abeb84790e832c633fc&_ga=2.215365009.721611707.1530692788-133742091.1504032768&netcdf=true'#'http://thredds.planetos.com/thredds/dodsC/dpipe//rel_0_8x03_dataset/transform/ns=/noaa_ndbc_swden_stations/scheme=/http/authority=/dods.ndbc.noaa.gov/path=/thredds/dodsC/data/swden/41047/41047w9999.nc/chunk=/1/1/data'

# acquire OpenDAP datasets
ds_ndbc_rt = xr.open_dataset(ndbc_rt_url)
ds_ndbc = xr.open_dataset(ndbc_url)
ds_planetos = xr.open_dataset(planetos_filename)









    



/Users/etoodu/Desktop/planetOS/git/notebooks/api-examples
https://api.planetos.com/v1/packages?dataset=noaa_ndbc_swden_stations&apikey=8428878e4b944abeb84790e832c633fc&station=41047&grouping=location&reftime_recent=true&time_start=2018-01-01T00:00:00&time_end=2019-04-03T16:52:35&package=noaa_ndbc_swden_stations_20180101T000000to20190403T165235_41047&var=spectral_wave_density,mean_wave_dir,principal_wave_dir,wave_spectrum_r1,wave_spectrum_r2
https://api.planetos.com/v1/packages/noaa_ndbc_swden_stations_20180101T000000to20190403T165235_41047?apikey=8428878e4b944abeb84790e832c633fc
https://api.planetos.com/v1/packages/noaa_ndbc_swden_stations_20180101T000000to20190403T165235_41047/data?apikey=8428878e4b944abeb84790e832c633fc
Please wait while package is downloaded
Package download was unsuccessful, please wait a few seconds...






    



---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-29-3af60fa45278> in <module>()
     10 ds_ndbc_rt = xr.open_dataset(ndbc_rt_url)
     11 ds_ndbc = xr.open_dataset(ndbc_url)
---> 12 ds_planetos = xr.open_dataset(planetos_filename)

/Users/etoodu/anaconda/envs/py36/lib/python3.6/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables, backend_kwargs)
    318                                                    group=group,
    319                                                    autoclose=autoclose,
--> 320                                                    **backend_kwargs)
    321         elif engine == 'scipy':
    322             store = backends.ScipyDataStore(filename_or_obj,

/Users/etoodu/anaconda/envs/py36/lib/python3.6/site-packages/xarray/backends/netCDF4_.py in open(cls, filename, mode, format, group, writer, clobber, diskless, persist, autoclose, lock)
    330                                    diskless=diskless, persist=persist,
    331                                    format=format)
--> 332         ds = opener()
    333         return cls(ds, mode=mode, writer=writer, opener=opener,
    334                    autoclose=autoclose, lock=lock)

/Users/etoodu/anaconda/envs/py36/lib/python3.6/site-packages/xarray/backends/netCDF4_.py in _open_netcdf4_group(filename, mode, group, **kwargs)
    229     import netCDF4 as nc4
    230 
--> 231     ds = nc4.Dataset(filename, mode=mode, **kwargs)
    232 
    233     with close_on_error(ds):

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Dataset.__init__()

netCDF4/_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()

FileNotFoundError: [Errno 2] No such file or directory: b'/Users/etoodu/Desktop/planetOS/git/notebooks/api-examples/noaa_ndbc_swden_stations_20180101T000000to20190403T165235_41047'



In [ ]:

    
# Let's focus on a specific hour of interest...
time = str((datetime.datetime.now() - datetime.timedelta(days=60)).strftime('%Y-%m-%d')) + ' 00:00:00' #'2014-08-09 00:00:00'

# Select the specific hour for each dataset
ds_ndbc_rt_hour = ds_ndbc_rt.sel(time=time).isel(latitude=0, longitude=0)
ds_ndbc_hour = ds_ndbc.sel(time=time).isel(latitude=0, longitude=0)
ds_planetos_hour = ds_planetos.sel(time=time).isel(latitude=0, longitude=0)

Product Inspection: Planet OS / NDBC Realtime / NDBC 2014 Historical

For each of our three data products, we'll create an associated Dataframe for analysis.



In [ ]:

    
# First, the Planet OS data which is acquired from the NDBC realtime station file.
df_planetos = ds_planetos_hour.to_dataframe().drop(['context_time_latitude_longitude_frequency','mx_dataset','mx_creator_institution'], axis=1)
df_planetos.head(8)



In [ ]:

    
# Second, the NDBC realtime station data.
df_ndbc_rt = ds_ndbc_rt_hour.to_dataframe()
df_ndbc_rt.head(8)



In [ ]:

    
# Finally, the 2014 archival data.
df_ndbc = ds_ndbc_hour.to_dataframe()
df_ndbc.head(8)

Based on the sample outputs above, it appears that the Planet OS data matches the NDBC realtime file that it is acquired from. We will further verify this below by performing an equality test against the two Dataframes.

We can also see that the historical data is indeed different, with frequency bins that are neatly rounded and values for wave direction and wave spectrum even when spectral wave density is 0.

Using the describe() method we can explore the statistical characteristics of each in more detail below. Note that the NaN values present in the Planet OS and NDBC realtime datasets will raise warnings for percentile calculations.



In [ ]:

    
df_planetos.describe()



In [ ]:

    
df_ndbc_rt.describe()



In [ ]:

    
df_ndbc.describe()

Confirm Planet OS Equality to NDBC Realtime

To confirm that the Planet OS and NDBC realtime Dataframes are indeed equal, we'll perform a diff. Note that NaN != NaN evaluates as True, so NaN values will be raised as inconsistent across the dataframes. This could be resolved using fillna() and an arbitrary fill value such as -9999.99.



In [ ]:

    
# function below requires identical index structure
def df_diff(df1, df2):
    ne_stacked = (df1 != df2).stack()
    changed = ne_stacked[ne_stacked]
    difference_locations = np.where(df1 != df2)
    changed_from = df1.values[difference_locations]
    changed_to = df2.values[difference_locations]
    return pd.DataFrame({'df1': changed_from, 'df2': changed_to}, index=changed.index)

# Compare the NDBC realtime to Planet OS data
# Note that NaN != NaN evaluates as True, so NaN values will be raised as inconsistent across the dataframes
# We could use fillna() to fix this issue, however this is not implemented here.
df_diff(df_ndbc_rt, df_planetos)

The df_dff results are as expected, only NaN values are different between the two datasets.

Spectral Wave Density Plot

Let's plot the spectral wave density for all three datasets across the frequency coverage to see how they differ.



In [ ]:

    
plt.figure(figsize=(20,10))
ds_ndbc_rt_hour.spectral_wave_density.plot(label='NDBC Realtime')
ds_ndbc_hour.spectral_wave_density.plot(label='NDBC ' + str(now.year))
ds_planetos_hour.spectral_wave_density.plot(label='Planet OS')
plt.legend()
plt.show()

There is a very slight discrepancy between the 2014 NDBC product and the Planet OS product, but no difference between the realtime NDBC product and Planet OS product.

Wave Spectrum Plots



In [ ]:

    
vars = ['wave_spectrum_r1','wave_spectrum_r2']
df_planetos.loc[:,vars].plot(label="Planet OS", figsize=(18,6))
df_ndbc_rt.loc[:,vars].plot(label="NDBC Realtime", figsize=(18,6))
df_ndbc.loc[:,vars].plot(label="NDBC " + str(now.year), figsize=(18,6))
plt.show()

Wave Direction Plots



In [ ]:

    
vars = ['principal_wave_dir','mean_wave_dir']
df_planetos.loc[:,vars].plot(label="Planet OS", figsize=(18,6))
df_ndbc_rt.loc[:,vars].plot(label="NDBC Realtime", figsize=(18,6))
df_ndbc.loc[:,vars].plot(label="NDBC " + str(now.year), figsize=(18,6))
plt.show()

Conclusion

The Planet OS NDBC spectral wave density data product matches the original NDBC realtime source.

It appears that NDBC is performing addition QA/QC processing on the archival data, which differ slightly from the realtime data, however attempts to locate documentation on the historical data product processing have not been successful.

Planet OS does not currently overwrite historical data with the NDBC archival products, however we may consider doing so if product quality is superior for end users.



In [ ]: