This notebook explores the differences between spectral wave density data products for NDBC Station 41047. It investigates three unique OPeNDAP sources for the data: Planet OS, NDBC realtime, and NDBC 2014 historical products.
Planet OS currently acquires spectral wave density data from each individual station's realtime product, which is denoted by the w9999 nomenclature immediately before the file extension. We would expect the Planet OS product and the NDBC realtime product to be identical.
Let's begin with the required imports...
In [1]:
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr
from package_api import download_data_station
import datetime
Next we'll setup our data sources and acquire the data via OPeNDAP using xarray.
In [2]:
API_key = open('APIKEY').readlines()[0].strip() #'<YOUR API KEY HERE>'
dataset_key = 'noaa_ndbc_swden_stations'
variables = 'spectral_wave_density,mean_wave_dir,principal_wave_dir,wave_spectrum_r1,wave_spectrum_r2'
In [29]:
# OpenDAP URLs for each product
now = datetime.datetime.now()
ndbc_rt_url='http://dods.ndbc.noaa.gov/thredds/dodsC/data/swden/41047/41047w9999.nc'
ndbc_url = 'http://dods.ndbc.noaa.gov/thredds/dodsC/data/swden/41047/41047w' + str(now.year) + '.nc'
planetos_filename = download_data_station(dataset_key,API_key,'41047','2018-01-01T00:00:00',datetime.datetime.strftime(now,'%Y-%m-%dT%H:%M:%S')
,variables,'41047')
#planetos_tds_url = 'https://api.planetos.com/v1/datasets/noaa_ndbc_swden_stations/stations/41047?origin=dataset-details&station=46237&apikey=8428878e4b944abeb84790e832c633fc&_ga=2.215365009.721611707.1530692788-133742091.1504032768&netcdf=true'#'http://thredds.planetos.com/thredds/dodsC/dpipe//rel_0_8x03_dataset/transform/ns=/noaa_ndbc_swden_stations/scheme=/http/authority=/dods.ndbc.noaa.gov/path=/thredds/dodsC/data/swden/41047/41047w9999.nc/chunk=/1/1/data'
# acquire OpenDAP datasets
ds_ndbc_rt = xr.open_dataset(ndbc_rt_url)
ds_ndbc = xr.open_dataset(ndbc_url)
ds_planetos = xr.open_dataset(planetos_filename)
In [ ]:
# Let's focus on a specific hour of interest...
time = str((datetime.datetime.now() - datetime.timedelta(days=60)).strftime('%Y-%m-%d')) + ' 00:00:00' #'2014-08-09 00:00:00'
# Select the specific hour for each dataset
ds_ndbc_rt_hour = ds_ndbc_rt.sel(time=time).isel(latitude=0, longitude=0)
ds_ndbc_hour = ds_ndbc.sel(time=time).isel(latitude=0, longitude=0)
ds_planetos_hour = ds_planetos.sel(time=time).isel(latitude=0, longitude=0)
In [ ]:
# First, the Planet OS data which is acquired from the NDBC realtime station file.
df_planetos = ds_planetos_hour.to_dataframe().drop(['context_time_latitude_longitude_frequency','mx_dataset','mx_creator_institution'], axis=1)
df_planetos.head(8)
In [ ]:
# Second, the NDBC realtime station data.
df_ndbc_rt = ds_ndbc_rt_hour.to_dataframe()
df_ndbc_rt.head(8)
In [ ]:
# Finally, the 2014 archival data.
df_ndbc = ds_ndbc_hour.to_dataframe()
df_ndbc.head(8)
Based on the sample outputs above, it appears that the Planet OS data matches the NDBC realtime file that it is acquired from. We will further verify this below by performing an equality test against the two Dataframes.
We can also see that the historical data is indeed different, with frequency bins that are neatly rounded and values for wave direction and wave spectrum even when spectral wave density is 0.
Using the describe() method we can explore the statistical characteristics of each in more detail below. Note that the NaN values present in the Planet OS and NDBC realtime datasets will raise warnings for percentile calculations.
In [ ]:
df_planetos.describe()
In [ ]:
df_ndbc_rt.describe()
In [ ]:
df_ndbc.describe()
To confirm that the Planet OS and NDBC realtime Dataframes are indeed equal, we'll perform a diff. Note that NaN != NaN evaluates as True, so NaN values will be raised as inconsistent across the dataframes. This could be resolved using fillna() and an arbitrary fill value such as -9999.99.
In [ ]:
# function below requires identical index structure
def df_diff(df1, df2):
ne_stacked = (df1 != df2).stack()
changed = ne_stacked[ne_stacked]
difference_locations = np.where(df1 != df2)
changed_from = df1.values[difference_locations]
changed_to = df2.values[difference_locations]
return pd.DataFrame({'df1': changed_from, 'df2': changed_to}, index=changed.index)
# Compare the NDBC realtime to Planet OS data
# Note that NaN != NaN evaluates as True, so NaN values will be raised as inconsistent across the dataframes
# We could use fillna() to fix this issue, however this is not implemented here.
df_diff(df_ndbc_rt, df_planetos)
In [ ]:
plt.figure(figsize=(20,10))
ds_ndbc_rt_hour.spectral_wave_density.plot(label='NDBC Realtime')
ds_ndbc_hour.spectral_wave_density.plot(label='NDBC ' + str(now.year))
ds_planetos_hour.spectral_wave_density.plot(label='Planet OS')
plt.legend()
plt.show()
In [ ]:
vars = ['wave_spectrum_r1','wave_spectrum_r2']
df_planetos.loc[:,vars].plot(label="Planet OS", figsize=(18,6))
df_ndbc_rt.loc[:,vars].plot(label="NDBC Realtime", figsize=(18,6))
df_ndbc.loc[:,vars].plot(label="NDBC " + str(now.year), figsize=(18,6))
plt.show()
In [ ]:
vars = ['principal_wave_dir','mean_wave_dir']
df_planetos.loc[:,vars].plot(label="Planet OS", figsize=(18,6))
df_ndbc_rt.loc[:,vars].plot(label="NDBC Realtime", figsize=(18,6))
df_ndbc.loc[:,vars].plot(label="NDBC " + str(now.year), figsize=(18,6))
plt.show()
The Planet OS NDBC spectral wave density data product matches the original NDBC realtime source.
It appears that NDBC is performing addition QA/QC processing on the archival data, which differ slightly from the realtime data, however attempts to locate documentation on the historical data product processing have not been successful.
Planet OS does not currently overwrite historical data with the NDBC archival products, however we may consider doing so if product quality is superior for end users.
In [ ]: