Accessing ERDDAP from Python

ERDDAP rich responses and RESTful API is makes it THE most convenient way to serve data.

One can build URLs manually or programmatically like:

`https://erddap-uncabled.oceanobservatories.org/uncabled/erddap/tabledap/CP05MOAS-GL336-02-FLORTM000-flort_m_glider_instrument-telemetered-deployment0005-tabledap.csv?ctdgv_m_glider_instrument_sci_water_temp,time&time>=2017-02-10T00:00:00Z`

server: https://erddap-uncabled.oceanobservatories.org/uncabled/erddap/
protocol: tabledap
dataset_id: CP05MOAS-GL336-02-FLORTM000-flort_m_glider_instrument-telemetered-deployment0005-tabledap
variables: ctdgv_m_glider_instrument_sci_water_temp,latitude,longitude,temperature,time
constraints:
- time>=2017-10-11T00:00:00Z
- time<=2017-10-18T00:00:00Z
- latitude>=38.0
- latitude<=41.0
- longitude>=-72.0
- longitude<=-69.0



In [1]:

    
from erddapy import ERDDAP

server = 'https://erddap-uncabled.oceanobservatories.org/uncabled/erddap'

dataset_id = 'CP05MOAS-GL336-02-FLORTM000-flort_m_glider_instrument-telemetered-deployment0005-tabledap'

constraints = {
    'time>=': '2017-10-11T00:00:00Z',
    'time<=': '2017-10-18T08:16:57Z',
    'latitude>=': 38.0,
    'latitude<=': 41.0,
    'longitude>=': -72.0,
    'longitude<=': -69.0,
}

depth = 'ctdgv_m_glider_instrument_sci_water_pressure_dbar'
salinity = 'ctdgv_m_glider_instrument_practical_salinity'
temperature = 'ctdgv_m_glider_instrument_sci_water_temp'

variables = [
  depth,
 'latitude',
 'longitude',
  salinity,
  temperature,
 'time',
]



In [2]:

    
e = ERDDAP(
    server=server,
    dataset_id=dataset_id,
    constraints=constraints,
    variables=variables,
    protocol='tabledap',
    response='mat',
)

print(e.get_download_url())









    



https://erddap-uncabled.oceanobservatories.org/uncabled/erddap/tabledap/CP05MOAS-GL336-02-FLORTM000-flort_m_glider_instrument-telemetered-deployment0005-tabledap.mat?ctdgv_m_glider_instrument_sci_water_pressure_dbar,latitude,longitude,ctdgv_m_glider_instrument_practical_salinity,ctdgv_m_glider_instrument_sci_water_temp,time&time>=1507680000.0&time<=1508314617.0&latitude>=38.0&latitude<=41.0&longitude>=-72.0&longitude<=-69.0

Obtaining the data

There are a few methods to obtain the data with to_pandas() and to_xarray():



In [3]:

    
df = e.to_pandas(
    index_col='time',
    parse_dates=True,
    skiprows=(1,)  # units information can be dropped.
).dropna()



In [4]:

    
df.head()









    Out[4]:







  
    
      
      ctdgv_m_glider_instrument_sci_water_pressure_dbar
      latitude
      longitude
      ctdgv_m_glider_instrument_practical_salinity
      ctdgv_m_glider_instrument_sci_water_temp
    
    
      time
      
      
      
      
      
    
  
  
    
      2017-10-11 00:14:04
      4.6
      39.841324
      -70.500526
      34.913344
      21.2316
    
    
      2017-10-11 00:14:22
      4.6
      39.841324
      -70.500526
      34.913344
      21.2316
    
    
      2017-10-11 00:15:57
      4.6
      39.841324
      -70.500526
      34.913344
      21.2316
    
    
      2017-10-11 00:18:17
      4.6
      39.842746
      -70.503738
      34.913344
      21.2316
    
    
      2017-10-11 00:19:18
      4.6
      39.842766
      -70.503890
      34.913344
      21.2316

Let's plot the data

Exploring an ERDDAP server



In [5]:

    
from erddapy import ERDDAP


e = ERDDAP(server='https://erddap-uncabled.oceanobservatories.org/uncabled/erddap')



In [6]:

    
import pandas as pd


df = pd.read_csv(e.get_search_url(response='csv', search_for='all'))



In [7]:

    
'We have {} tabledap, {} griddap, and {} wms endpoints.'.format(
    len(set(df['tabledap'].dropna())),
    len(set(df['griddap'].dropna())),
    len(set(df['wms'].dropna()))
)









    Out[7]:





'We have 1000 tabledap, 0 griddap, and 0 wms endpoints.'

ERDDAP Advanced Search

Let's narrow the search area, time span, and look for sea_water_temperature only.



In [8]:

    
bbox = [-72.0, -69.0, 38.0, 41.0]

min_time = '2018-02-01T00:00:00Z'
max_time = '2018-02-08T00:00:00Z'

kw = {
    'standard_name': 'sea_water_temperature',
    'search_for': 'glider',
    'min_lon': bbox[0],
    'max_lon': bbox[1],
    'min_lat': bbox[2],
    'max_lat': bbox[3],
    'min_time': min_time,
    'max_time': max_time,
    'cdm_data_type': 'trajectory'
}



In [9]:

    
search_url = e.get_search_url(response='csv', **kw)
search = pd.read_csv(search_url)
gliders = search['Dataset ID'].values

msg = 'Found {} Glider Datasets:\n\n{}'.format
print(msg(len(gliders), '\n'.join(gliders)))









    



Found 3 Glider Datasets:

CP05MOAS-GL336-03-CTDGVM000-ctdgv_m_glider_instrument-telemetered-deployment0006-tabledap
CP05MOAS-GL339-03-CTDGVM000-ctdgv_m_glider_instrument-telemetered-deployment0006-tabledap
CP05MOAS-GL380-03-CTDGVM000-ctdgv_m_glider_instrument-telemetered-deployment0006-tabledap

With the Dataset IDs we can explore the metadata with the get_info_url



In [10]:

    
print(gliders[0])

info_url = e.get_info_url(dataset_id=gliders[0], response='csv')
info = pd.read_csv(info_url)

info.head()









    



CP05MOAS-GL336-03-CTDGVM000-ctdgv_m_glider_instrument-telemetered-deployment0006-tabledap






    Out[10]:







  
    
      
      Row Type
      Variable Name
      Attribute Name
      Data Type
      Value
    
  
  
    
      0
      attribute
      NC_GLOBAL
      cdm_data_type
      String
      Trajectory
    
    
      1
      attribute
      NC_GLOBAL
      cdm_trajectory_variables
      String
      trajectory
    
    
      2
      attribute
      NC_GLOBAL
      collection_method
      String
      telemetered
    
    
      3
      attribute
      NC_GLOBAL
      Conventions
      String
      CF-1.6, COARDS, ACDD-1.3
    
    
      4
      attribute
      NC_GLOBAL
      creator_name
      String
      Ocean Observatories Initiative



In [11]:

    
cdm_profile_variables = info.loc[
    info['Attribute Name'] == 'cdm_profile_variables', 'Value'
]

print(''.join(cdm_profile_variables))

Selecting variables by attributes



In [12]:

    
e.get_var_by_attr(
    dataset_id='CP02PMCI-WFP01-03-CTDPFK000-ctdpf_ckl_wfp_instrument-telemetered-deployment0008-tabledap',
    standard_name='sea_water_temperature'
)









    Out[12]:





['ctdpf_ckl_seawater_temperature']

Easy to use CF conventions standards



In [13]:

    
t_vars = [
    e.get_var_by_attr(
        dataset_id=glider, standard_name='sea_water_temperature'
    )[0] for glider in gliders
]
t_vars









    Out[13]:





['sci_water_temp', 'sci_water_temp', 'sci_water_temp']



In [14]:

    
s_vars = [
    e.get_var_by_attr(
        dataset_id=glider, standard_name='sea_water_practical_salinity'
    )[0] for glider in gliders
]
s_vars









    Out[14]:





['practical_salinity', 'practical_salinity', 'practical_salinity']



In [15]:

    
d_vars = [
    e.get_var_by_attr(
        dataset_id=glider, standard_name='sea_water_pressure'
    )[0] for glider in gliders
]
d_vars









    Out[15]:





['sci_water_pressure_dbar',
 'sci_water_pressure_dbar',
 'sci_water_pressure_dbar']



In [16]:

    
# FIX: should not really assume that variables are the same for each dataset
depth = d_vars[0]
salinity = s_vars[0]
temperature = t_vars[0]

Putting everything together



In [17]:

    
from requests.exceptions import HTTPError

constraints = {
    'time>=': min_time,
    'time<=': max_time,
    'longitude>=': bbox[0],
    'longitude<=': bbox[1],
    'latitude>=': bbox[2],
    'latitude<=': bbox[3]
}

def download_csv(url):
    return pd.read_csv(
        url, index_col='time', parse_dates=True, skiprows=[1]
    )

dfs = {}
for glider in gliders:
    try:
        download_url = e.get_download_url(
            dataset_id=glider,
            protocol='tabledap',
            variables=['time', 'latitude', 'longitude', depth, salinity, temperature],
            response='csv',
            constraints=constraints
        )
    except HTTPError:
        print('Failed to download {}'.format(glider))
        continue
    dfs.update({glider: download_csv(download_url)})



In [18]:

    
import numpy as np

for glider in dfs.keys():
    dfs[glider].loc[dfs[glider][salinity] <= .1, salinity] = np.NaN
    dfs[glider].loc[dfs[glider][temperature] <= .1, temperature] = np.NaN



In [19]:

    
import folium

zoom_start = 7
lon = (bbox[0] + bbox[1]) / 2
lat = (bbox[2] + bbox[3]) / 2
m = folium.Map(width='100%', height='100%',
               location=[lat, lon], zoom_start=zoom_start)

url = 'https://gis.ngdc.noaa.gov/arcgis/services/gebco08_hillshade/MapServer/WMSServer'
w = folium.WmsTileLayer(
    url,
    name='GEBCO Bathymetry',
    fmt='image/png',
    layers='GEBCO_08 Hillshade',
    attr='GEBCO',
    overlay=True,
    transparent=True)

w.add_to(m)

colors = ['orange','pink','yellow']

k=0
for glider, df in dfs.items():

    line = folium.PolyLine(locations=list(zip(df['latitude'],df['longitude'])),
                           color=colors[k],
                           weight=8,
                           opacity=0.6,
                           popup=glider[:22]).add_to(m)
    k = k+1

m









    Out[19]:



In [20]:

    
import matplotlib.pyplot as plt
%matplotlib inline

def glider_scatter(df, ax, glider):
    ax.scatter(df[temperature], df[salinity],
               s=10, alpha=0.5, label=glider)
fig, ax = plt.subplots(figsize=(12, 7))
ax.set_ylabel('salinity')
ax.set_xlabel('temperature')
ax.grid(True)

for glider, df in dfs.items():
    glider_scatter(df, ax, glider)
leg = ax.legend()

Plot one of the glider transects



In [21]:

    
df = next(iter(dfs.values()))



In [22]:

    
import matplotlib.dates as mdates

fig, ax = plt.subplots(figsize=(17, 2))
cs = ax.scatter(df.index, df[depth], s=15, c=df[temperature], marker='o', edgecolor='none')

ax.invert_yaxis()
ax.set_xlim(df.index[0], df.index[-1])
xfmt = mdates.DateFormatter('%H:%Mh\n%d-%b')
ax.xaxis.set_major_formatter(xfmt)

cbar = fig.colorbar(cs, orientation='vertical', extend='both')
cbar.ax.set_ylabel('Temperature ($^\circ$C)')
ax.set_ylabel('Depth (m)');



In [ ]:

	ctdgv_m_glider_instrument_sci_water_pressure_dbar	latitude	longitude	ctdgv_m_glider_instrument_practical_salinity	ctdgv_m_glider_instrument_sci_water_temp
time
2017-10-11 00:14:04	4.6	39.841324	-70.500526	34.913344	21.2316
2017-10-11 00:14:22	4.6	39.841324	-70.500526	34.913344	21.2316
2017-10-11 00:15:57	4.6	39.841324	-70.500526	34.913344	21.2316
2017-10-11 00:18:17	4.6	39.842746	-70.503738	34.913344	21.2316
2017-10-11 00:19:18	4.6	39.842766	-70.503890	34.913344	21.2316

	Row Type	Variable Name	Attribute Name	Data Type	Value
0	attribute	NC_GLOBAL	cdm_data_type	String	Trajectory
1	attribute	NC_GLOBAL	cdm_trajectory_variables	String	trajectory
2	attribute	NC_GLOBAL	collection_method	String	telemetered
3	attribute	NC_GLOBAL	Conventions	String	CF-1.6, COARDS, ACDD-1.3
4	attribute	NC_GLOBAL	creator_name	String	Ocean Observatories Initiative