Creating a CF-1.6 timeSeries using pocean

IOOS recommends to data providers that their netCDF files follow the CF-1.6 standard. In this notebook we will create a CF-1.6 compliant file that follows file that follows the Discrete Sampling Geometries (DSG) of a timeSeries from a pandas DataFrame.

The pocean module can handle all the DSGs described in the CF-1.6 document: point, timeSeries, trajectory, profile, timeSeriesProfile, and trajectoryProfile. These DSGs array may be represented in the netCDF file as:

orthogonal multidimensional: when the coordinates along the element axis of the features are identical;
incomplete multidimensional: when the features within a collection do not all have the same number but space is not an issue and using longest feature to all features is convenient;
contiguous ragged: can be used if the size of each feature is known;
indexed ragged: stores the features interleaved along the sample dimension in the data variable.

Here we will use the orthogonal multidimensional array to represent time-series data from am hypothetical current meter. We'll use fake data for this example for convenience.

Our fake data represents a current meter located at 10 meters depth collected last week.



In [1]:

    
from datetime import datetime, timedelta

import numpy as np
import pandas as pd

x = np.arange(100, 110, 0.1)
start = datetime.now() - timedelta(days=7)

df = pd.DataFrame(
    {
        "time": [start + timedelta(days=n) for n in range(len(x))],
        "longitude": -48.6256,
        "latitude": -27.5717,
        "depth": 10,
        "u": np.sin(x),
        "v": np.cos(x),
        "station": "fake buoy",
    }
)


df.tail()









    Out[1]:







  
    
      
      time
      longitude
      latitude
      depth
      u
      v
      station
    
  
  
    
      95
      2019-05-24 17:27:54.693087
      -48.6256
      -27.5717
      10
      0.440129
      -0.897934
      fake buoy
    
    
      96
      2019-05-25 17:27:54.693087
      -48.6256
      -27.5717
      10
      0.348287
      -0.937388
      fake buoy
    
    
      97
      2019-05-26 17:27:54.693087
      -48.6256
      -27.5717
      10
      0.252964
      -0.967476
      fake buoy
    
    
      98
      2019-05-27 17:27:54.693087
      -48.6256
      -27.5717
      10
      0.155114
      -0.987897
      fake buoy
    
    
      99
      2019-05-28 17:27:54.693087
      -48.6256
      -27.5717
      10
      0.055714
      -0.998447
      fake buoy

Let's take a look at our fake data.



In [2]:

    
%matplotlib inline


import matplotlib.pyplot as plt
from oceans.plotting import stick_plot

q = stick_plot([t.to_pydatetime() for t in df["time"]], df["u"], df["v"])

ref = 1
qk = plt.quiverkey(
    q, 0.1, 0.85, ref, f"{ref} m s$^{-1}$", labelpos="N", coordinates="axes"
)

plt.xticks(rotation=70)

pocean.dsg is relatively simple to use. The user must provide a DataFrame, like the one above, and a dictionary of attributes that maps to the data and adhere to the DSG conventions desired.

Because we want the file to work seamlessly with ERDDAP we also added some ERDDAP specific attributes like cdm_timeseries_variables, and subsetVariables.



In [3]:

    
attributes = {
    "global": {
        "title": "Fake mooring",
        "summary": "Vector current meter ADCP @ 10 m",
        "institution": "Restaurant at the end of the universe",
        "cdm_timeseries_variables": "station",
        "subsetVariables": "depth",
    },
    "longitude": {"units": "degrees_east", "standard_name": "longitude",},
    "latitude": {"units": "degrees_north", "standard_name": "latitude",},
    "z": {"units": "m", "standard_name": "depth", "positive": "down",},
    "u": {"units": "m/s", "standard_name": "eastward_sea_water_velocity",},
    "v": {"units": "m/s", "standard_name": "northward_sea_water_velocity",},
    "station": {"cf_role": "timeseries_id"},
}

We also need to map the our data axes to pocean's defaults. This step is not needed if the data axes are already named like the default ones.



In [4]:

    
axes = {"t": "time", "x": "longitude", "y": "latitude", "z": "depth"}



In [5]:

    
from pocean.dsg.timeseries.om import OrthogonalMultidimensionalTimeseries
from pocean.utils import downcast_dataframe

df = downcast_dataframe(df)  # safely cast depth np.int64 to np.int32
dsg = OrthogonalMultidimensionalTimeseries.from_dataframe(
    df, output="fake_buoy.nc", attributes=attributes, axes=axes,
)

The OrthogonalMultidimensionalTimeseries saves the DataFrame into a CF-1.6 TimeSeries DSG.



In [6]:

    
!ncdump -h fake_buoy.nc









    



netcdf fake_buoy {
dimensions:
	station = 1 ;
	time = 100 ;
variables:
	int crs ;
	double time(time) ;
		time:units = "seconds since 1990-01-01 00:00:00Z" ;
		time:standard_name = "time" ;
		time:axis = "T" ;
	string station(station) ;
		station:cf_role = "timeseries_id" ;
		station:long_name = "station identifier" ;
	double latitude(station) ;
		latitude:axis = "Y" ;
		latitude:units = "degrees_north" ;
		latitude:standard_name = "latitude" ;
	double longitude(station) ;
		longitude:axis = "X" ;
		longitude:units = "degrees_east" ;
		longitude:standard_name = "longitude" ;
	int depth(station) ;
		depth:_FillValue = -9999 ;
		depth:axis = "Z" ;
	double u(station, time) ;
		u:_FillValue = -9999.9 ;
		u:units = "m/s" ;
		u:standard_name = "eastward_sea_water_velocity" ;
		u:coordinates = "time depth longitude latitude" ;
	double v(station, time) ;
		v:_FillValue = -9999.9 ;
		v:units = "m/s" ;
		v:standard_name = "northward_sea_water_velocity" ;
		v:coordinates = "time depth longitude latitude" ;

// global attributes:
		:Conventions = "CF-1.6" ;
		:date_created = "2019-02-25T20:27:00Z" ;
		:featureType = "timeseries" ;
		:cdm_data_type = "Timeseries" ;
		:title = "Fake mooring" ;
		:summary = "Vector current meter ADCP @ 10 m" ;
		:institution = "Restaurant at the end of the universe" ;
		:cdm_timeseries_variables = "station" ;
		:subsetVariables = "depth" ;
}

It also outputs the dsg object for inspection. Let us check a few things to see if our objects was created as expected. (Note that some of the metadata was "free" due t the built-in defaults in pocean.



In [7]:

    
dsg.getncattr("featureType")









    Out[7]:





'timeseries'



In [8]:

    
type(dsg)









    Out[8]:





pocean.dsg.timeseries.om.OrthogonalMultidimensionalTimeseries

In addition to standard netCDF4-python object .variables method pocean's DSGs provides an "categorized" version of the variables in the data_vars, ancillary_vars, and the DSG axes methods.



In [9]:

    
[(v.standard_name) for v in dsg.data_vars()]









    Out[9]:





['eastward_sea_water_velocity', 'northward_sea_water_velocity']



In [10]:

    
dsg.axes("T")









    Out[10]:





[<class 'netCDF4._netCDF4.Variable'>
 float64 time(time)
     units: seconds since 1990-01-01 00:00:00Z
     standard_name: time
     axis: T
 unlimited dimensions: 
 current shape = (100,)
 filling on, default _FillValue of 9.969209968386869e+36 used]



In [11]:

    
dsg.axes("Z")









    Out[11]:





[<class 'netCDF4._netCDF4.Variable'>
 int32 depth(station)
     _FillValue: -9999
     axis: Z
 unlimited dimensions: 
 current shape = (1,)
 filling on]



In [12]:

    
dsg.vatts("station")









    Out[12]:





{'cf_role': 'timeseries_id', 'long_name': 'station identifier'}



In [13]:

    
dsg["station"][:]









    Out[13]:





array(['fake buoy'], dtype=object)



In [14]:

    
dsg.vatts("u")









    Out[14]:





{'_FillValue': -9999.9,
 'units': 'm/s',
 'standard_name': 'eastward_sea_water_velocity',
 'coordinates': 'time depth longitude latitude'}

We can easily round-trip back to the pandas DataFrame object.



In [15]:

    
dsg.to_dataframe().head()









    Out[15]:







  
    
      
      t
      x
      y
      z
      station
      u
      v
    
  
  
    
      0
      2019-02-18 17:27:54.693087
      -48.6256
      -27.5717
      10
      fake buoy
      -0.506366
      0.862319
    
    
      1
      2019-02-19 17:27:54.693087
      -48.6256
      -27.5717
      10
      fake buoy
      -0.417748
      0.908563
    
    
      2
      2019-02-20 17:27:54.693087
      -48.6256
      -27.5717
      10
      fake buoy
      -0.324956
      0.945729
    
    
      3
      2019-02-21 17:27:54.693087
      -48.6256
      -27.5717
      10
      fake buoy
      -0.228917
      0.973446
    
    
      4
      2019-02-22 17:27:54.693087
      -48.6256
      -27.5717
      10
      fake buoy
      -0.130591
      0.991436

For more information on pocean please check the API docs.

	time	longitude	latitude	depth	u	v	station
95	2019-05-24 17:27:54.693087	-48.6256	-27.5717	10	0.440129	-0.897934	fake buoy
96	2019-05-25 17:27:54.693087	-48.6256	-27.5717	10	0.348287	-0.937388	fake buoy
97	2019-05-26 17:27:54.693087	-48.6256	-27.5717	10	0.252964	-0.967476	fake buoy
98	2019-05-27 17:27:54.693087	-48.6256	-27.5717	10	0.155114	-0.987897	fake buoy
99	2019-05-28 17:27:54.693087	-48.6256	-27.5717	10	0.055714	-0.998447	fake buoy

	t	x	y	z	station	u	v
0	2019-02-18 17:27:54.693087	-48.6256	-27.5717	10	fake buoy	-0.506366	0.862319
1	2019-02-19 17:27:54.693087	-48.6256	-27.5717	10	fake buoy	-0.417748	0.908563
2	2019-02-20 17:27:54.693087	-48.6256	-27.5717	10	fake buoy	-0.324956	0.945729
3	2019-02-21 17:27:54.693087	-48.6256	-27.5717	10	fake buoy	-0.228917	0.973446
4	2019-02-22 17:27:54.693087	-48.6256	-27.5717	10	fake buoy	-0.130591	0.991436