IOOS recommends that data providers make their netCDF files follow the CF-1.6 standard. In this notebook we will create a CF-1.6 compliant file that follows the Discrete Sampling Geometries (DSG) of a timeSeries from a pandas DataFrame.
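The pocean module can handle all the DSGs described in the CF-1.6 document: point, timeSeries, trajectory, profile, timeSeriesProfile, and trajectoryProfile. Depending on the feature type, these DSGs may be represented in the netCDF file as orthogonal multidimensional, incomplete multidimensional, contiguous ragged, or indexed ragged arrays.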
Here we will use the orthogonal multidimensional array to represent time-series data from a hypothetical current meter. We'll use fake data in this example for convenience.
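As a rough illustration of what "orthogonal multidimensional" means, the sketch below contrasts it with a contiguous ragged array, another layout defined by CF-1.6. The station names and values are made up, and `unpack_ragged` is a hypothetical helper written for this example, not part of pocean.

```python
# Hypothetical readings from two stations that share the same time axis.
times = [0, 1, 2]
readings = {
    "station_a": [1.0, 1.1, 1.2],
    "station_b": [2.0, 2.1, 2.2],
}

# Orthogonal multidimensional array: one row per station, one column per
# time step. Every station reports at every time in `times`.
orthogonal = [readings[name] for name in readings]

# Contiguous ragged array: all values are concatenated into one flat list
# and `row_size` records how many values belong to each station.
ragged_values = [value for name in readings for value in readings[name]]
row_size = [len(readings[name]) for name in readings]


def unpack_ragged(values, sizes):
    """Split a flat list back into one list per feature using the counts."""
    rows, start = [], 0
    for size in sizes:
        rows.append(values[start:start + size])
        start += size
    return rows
```

With a single station and a regular time axis, as in this notebook, the orthogonal layout is the simpler choice because no count or index variable is needed.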
Our fake data represents a current meter located at 10 meters depth, with one sample per day starting one week ago.
In [1]:
from datetime import datetime, timedelta
import numpy as np
import pandas as pd
x = np.arange(100, 110, 0.1)
start = datetime.now() - timedelta(days=7)
df = pd.DataFrame(
    {
        "time": [start + timedelta(days=n) for n in range(len(x))],
        "longitude": -48.6256,
        "latitude": -27.5717,
        "depth": 10,
        "u": np.sin(x),
        "v": np.cos(x),
        "station": "fake buoy",
    }
)
df.tail()
Out[1]:
Let's take a look at our fake data.
In [2]:
%matplotlib inline
import matplotlib.pyplot as plt
from oceans.plotting import stick_plot
q = stick_plot([t.to_pydatetime() for t in df["time"]], df["u"], df["v"])
ref = 1
qk = plt.quiverkey(
    # Braces are doubled so the f-string emits a literal {-1} for mathtext.
    q, 0.1, 0.85, ref, f"{ref} m s$^{{-1}}$", labelpos="N", coordinates="axes"
)
plt.xticks(rotation=70)
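Each stick in the plot is the (u, v) vector at one time step, so its length is the current speed and its angle the direction. A minimal sketch of that conversion using only the standard library follows; `speed_and_direction` is a hypothetical helper, not part of oceans or pocean, and the convention used (degrees clockwise from north, pointing where the current flows toward) is an assumption.

```python
import math


def speed_and_direction(u, v):
    """Convert eastward (u) and northward (v) components to speed and heading.

    The heading is in degrees clockwise from north, in the range [0, 360).
    """
    speed = math.hypot(u, v)
    # atan2(u, v) measures the angle from the north axis toward east.
    direction = math.degrees(math.atan2(u, v)) % 360.0
    return speed, direction
```

For example, `speed_and_direction(1.0, 0.0)` describes a purely eastward flow of 1 m/s.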
pocean.dsg is relatively simple to use. The user must provide a DataFrame, like the one above, and a dictionary of attributes that maps to the data and adheres to the desired DSG conventions.
Because we want the file to work seamlessly with ERDDAP, we also add some ERDDAP-specific attributes, such as cdm_timeseries_variables and subsetVariables.
In [3]:
attributes = {
    "global": {
        "title": "Fake mooring",
        "summary": "Vector current meter ADCP @ 10 m",
        "institution": "Restaurant at the end of the universe",
        "cdm_timeseries_variables": "station",
        "subsetVariables": "depth",
    },
    "longitude": {"units": "degrees_east", "standard_name": "longitude"},
    "latitude": {"units": "degrees_north", "standard_name": "latitude"},
    "z": {"units": "m", "standard_name": "depth", "positive": "down"},
    "u": {"units": "m/s", "standard_name": "eastward_sea_water_velocity"},
    "v": {"units": "m/s", "standard_name": "northward_sea_water_velocity"},
    "station": {"cf_role": "timeseries_id"},
}
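Since the attribute dictionary is keyed by name, a typo in a key can easily go unnoticed. The sketch below is a hypothetical pre-flight check, not part of pocean, that lists keys matching neither a DataFrame column nor `"global"`. It also accepts the short axis keys `t`, `x`, `y`, and `z`, on the assumption that they are valid because the dictionary above uses `"z"`.

```python
def unmatched_attribute_keys(attributes, columns, axis_keys=("t", "x", "y", "z")):
    """Return attribute keys that match no column, axis key, or 'global'."""
    known = {"global"} | set(columns) | set(axis_keys)
    return sorted(key for key in attributes if key not in known)


# Column names taken from the fake DataFrame built earlier in the notebook.
columns = ["time", "longitude", "latitude", "depth", "u", "v", "station"]

# A trimmed-down attribute dictionary with one deliberately wrong key.
example_attributes = {
    "global": {"title": "Fake mooring"},
    "z": {"units": "m"},
    "u": {"units": "m/s"},
    "salinity": {"units": "1"},  # no such column in the DataFrame
}

suspicious = unmatched_attribute_keys(example_attributes, columns)
```

Here `suspicious` holds only the `"salinity"` key, which is the one worth double-checking before writing the file.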
We also need to map our data axes to pocean's defaults. This step is not needed if the data axes are already named like the default ones.
In [4]:
axes = {"t": "time", "x": "longitude", "y": "latitude", "z": "depth"}
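An alternative to passing the mapping is to rename the columns so that they already match the default axis names. The sketch below inverts the mapping and applies it with `DataFrame.rename`; it assumes the defaults are exactly the keys used above (`t`, `x`, `y`, `z`), and the small DataFrame is a stand-in for the one built earlier.

```python
import pandas as pd

axes = {"t": "time", "x": "longitude", "y": "latitude", "z": "depth"}

# Invert the mapping: column name -> default axis name.
to_default = {column: axis for axis, column in axes.items()}

# Small stand-in for the notebook's DataFrame.
example = pd.DataFrame(
    {
        "time": pd.to_datetime(["2020-01-01", "2020-01-02"]),
        "longitude": -48.6256,
        "latitude": -27.5717,
        "depth": 10,
        "u": [0.1, 0.2],
    }
)

# Columns not named in the mapping (here "u") are left untouched.
renamed = example.rename(columns=to_default)
```

Keeping descriptive column names and passing `axes`, as this notebook does, tends to produce a more readable DataFrame, so the rename is mostly useful when the data already arrives in that form.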
In [5]:
from pocean.dsg.timeseries.om import OrthogonalMultidimensionalTimeseries
from pocean.utils import downcast_dataframe
df = downcast_dataframe(df)  # safely cast depth np.int64 to np.int32
dsg = OrthogonalMultidimensionalTimeseries.from_dataframe(
    df, output="fake_buoy.nc", attributes=attributes, axes=axes,
)
The OrthogonalMultidimensionalTimeseries saves the DataFrame into a CF-1.6 TimeSeries DSG.
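The `downcast_dataframe` call above is commented as a safe cast from `np.int64` to `np.int32`. One common motivation for such a cast is that the classic netCDF data model has no 64-bit integer type. The sketch below shows what "safe" could mean in this context; `downcast_int64` is a hypothetical helper written for illustration and is not pocean's implementation.

```python
import numpy as np


def downcast_int64(values):
    """Cast an int64 array to int32 only when every value fits."""
    values = np.asarray(values)
    if values.dtype != np.int64:
        return values  # nothing to do for other dtypes
    if values.size == 0:
        return values.astype(np.int32)
    limits = np.iinfo(np.int32)
    if values.min() >= limits.min and values.max() <= limits.max:
        return values.astype(np.int32)
    return values  # leave untouched rather than overflow


depth = np.array([10, 10, 10], dtype=np.int64)
too_big = np.array([2**40], dtype=np.int64)

small = downcast_int64(depth)
unchanged = downcast_int64(too_big)
```

The constant depth of 10 fits comfortably in 32 bits, so it is cast, while the oversized value keeps its original dtype.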
In [6]:
!ncdump -h fake_buoy.nc
It also returns the dsg object for inspection. Let us check a few things to see if our object was created as expected. (Note that some of the metadata came for "free" thanks to the built-in defaults in pocean.)
In [7]:
dsg.getncattr("featureType")
Out[7]:
In [8]:
type(dsg)
Out[8]:
In addition to the standard netCDF4-python .variables attribute, pocean's DSGs provide a "categorized" view of the variables through the data_vars, ancillary_vars, and axes methods.
In [9]:
[v.standard_name for v in dsg.data_vars()]
Out[9]:
In [10]:
dsg.axes("T")
Out[10]:
In [11]:
dsg.axes("Z")
Out[11]:
In [12]:
dsg.vatts("station")
Out[12]:
In [13]:
dsg["station"][:]
Out[13]:
In [14]:
dsg.vatts("u")
Out[14]:
We can easily round-trip back to the pandas DataFrame object.
In [15]:
dsg.to_dataframe().head()
Out[15]:
For more information on pocean, please check the API docs.