erddapy can be installed with conda:
conda install --channel conda-forge erddapy
or with pip:
pip install erddapy
First we need to instantiate the ERDDAP server object.
In [1]:
from erddapy import ERDDAP
e = ERDDAP(
server="https://gliders.ioos.us/erddap",
protocol="tabledap",
response="csv",
)
Now we can populate the object with constraints, the variables of interest, and the dataset id.
In [2]:
e.dataset_id = "whoi_406-20160902T1700"
e.constraints = {
"time>=": "2016-07-10T00:00:00Z",
"time<=": "2017-02-10T00:00:00Z",
"latitude>=": 38.0,
"latitude<=": 41.0,
"longitude>=": -72.0,
"longitude<=": -69.0,
}
e.variables = [
"depth",
"latitude",
"longitude",
"salinity",
"temperature",
"time",
]
url = e.get_download_url()
print(url)
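The returned URL is a plain csv endpoint, so it can also be read directly. Note that the second row of an ERDDAP csv response holds the units; a minimal sketch of reading the URL by hand (the to_pandas convenience used next folds the units into the column names instead):
In [ ]:
import pandas as pd

# Row 1 (zero-based) of the "csv" response carries the units, so skip it.
df_raw = pd.read_csv(url, skiprows=[1])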
In [3]:
import pandas as pd
df = e.to_pandas(
index_col="time (UTC)",
parse_dates=True,
).dropna()
df.head()
Out[3]:
In [4]:
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
fig, ax = plt.subplots(figsize=(17, 2))
cs = ax.scatter(
df.index,
df["depth (m)"],
s=15,
c=df["temperature (Celsius)"],
marker="o",
edgecolor="none"
)
ax.invert_yaxis()
ax.set_xlim(df.index[0], df.index[-1])
xfmt = mdates.DateFormatter("%H:%Mh\n%d-%b")
ax.xaxis.set_major_formatter(xfmt)
cbar = fig.colorbar(cs, orientation="vertical", extend="both")
cbar.ax.set_ylabel(r"Temperature ($^\circ$C)")
ax.set_ylabel("Depth (m)");
First we need to instantiate the ERDDAP URL constructor for a server.
In this example we will use https://gliders.ioos.us/erddap.
In [5]:
from erddapy import ERDDAP
e = ERDDAP(server="https://gliders.ioos.us/erddap")
What are the methods/attributes available?
In [6]:
[method for method in dir(e) if not method.startswith("_")]
Out[6]:
All the get_<method> calls return a valid ERDDAP URL for the requested response and options.
erddapy will raise an error if the URL HEAD request cannot be validated.
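For example, requesting the metadata of a dataset that does not exist should fail that validation; a minimal sketch (not_a_real_dataset is a made-up id):
In [ ]:
from requests.exceptions import HTTPError

try:
    # Hypothetical dataset id; the URL validation should raise here.
    e.get_info_url(dataset_id="not_a_real_dataset", response="csv")
except HTTPError as err:
    print(f"Invalid request: {err}")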
In [7]:
print(e.get_search_url(search_for="all"))
In [8]:
import pandas as pd
df = pd.read_csv(e.get_search_url(response="csv", search_for="all"))
In [9]:
print(
f'We have {len(set(df["tabledap"].dropna()))} '
f'tabledap, {len(set(df["griddap"].dropna()))} '
f'griddap, and {len(set(df["wms"].dropna()))} wms endpoints.'
)
We can refine our search by providing some constraints.
In [10]:
def show_iframe(src):
"""Helper function to show HTML returns."""
from IPython.display import HTML
iframe = f'<iframe src="{src}" width="100%" height="950"></iframe>'
return HTML(iframe)
Let's narrow the search area, time span, and look for sea_water_temperature only.
In [11]:
kw = {
"standard_name": "sea_water_temperature",
"min_lon": -72.0,
"max_lon": -69.0,
"min_lat": 38.0,
"max_lat": 41.0,
"min_time": "2016-07-10T00:00:00Z",
"max_time": "2017-02-10T00:00:00Z",
"cdm_data_type": "trajectoryprofile"
}
search_url = e.get_search_url(response="html", **kw)
show_iframe(search_url)
Out[11]:
We can see that the search form above was correctly populated with the constraints we provided.
Let us change the response from .html to .csv,
so we can load it as a pandas.DataFrame,
and inspect which Dataset IDs are available for download.
In [12]:
search_url = e.get_search_url(response="csv", **kw)
search = pd.read_csv(search_url)
gliders = search["Dataset ID"].values
gliders_list = "\n".join(gliders)
print(f"Found {len(gliders)} Glider Datasets:\n{gliders_list}")
Now that we know the Dataset IDs we can explore their metadata with the get_info_url method.
In [13]:
info_url = e.get_info_url(dataset_id=gliders[0], response="html")
show_iframe(src=info_url)
Out[13]:
Again, with the csv response, we can manipulate the metadata, for example to find the variables listed in the cdm_profile_variables attribute.
In [14]:
info_url = e.get_info_url(dataset_id=gliders[0], response="csv")
info = pd.read_csv(info_url)
info.head()
Out[14]:
In [15]:
"".join(info.loc[info["Attribute Name"] == "cdm_profile_variables", "Value"])
Out[15]:
Selecting variables by their attributes is such a common operation that erddapy brings its own method to simplify this task.
The get_var_by_attr method is inspired by netCDF4-python's get_variables_by_attributes; however, because erddapy operates on remote servers, it returns the variable names instead of the actual variables.
Here we check which variable(s) are associated with the standard_name used in the search.
Note that get_var_by_attr caches the last response in case the user needs to make multiple requests,
but it will lose its state when a new request is made.
(See the execution times below.)
In [16]:
%%time
# First one, slow.
e.get_var_by_attr(
dataset_id="whoi_406-20160902T1700",
standard_name="sea_water_temperature"
)
Out[16]:
In [17]:
%%time
# Second one on the same glider, a little bit faster.
e.get_var_by_attr(
dataset_id="whoi_406-20160902T1700",
standard_name="sea_water_practical_salinity"
)
Out[17]:
In [18]:
%%time
# New one, slow again.
e.get_var_by_attr(
dataset_id="cp_336-20170116T1254",
standard_name="sea_water_practical_salinity"
)
Out[18]:
Another way to browse datasets is via the categorize URL. In the example below we can get all the standard_names available in the dataset with a single request.
In [19]:
url = e.get_categorize_url(
categorize_by="standard_name",
response="csv"
)
pd.read_csv(url)["Category"]
Out[19]:
We can also pass a value to filter the categorize results.
In [20]:
url = e.get_categorize_url(
categorize_by="institution",
value="woods_hole_oceanographic_institution",
response="csv"
)
df = pd.read_csv(url)
In [21]:
whoi_gliders = df.loc[~df["tabledap"].isnull(), "Dataset ID"].tolist()
whoi_gliders
Out[21]:
Now it is easy to filter out non-WHOI gliders from our original glider search.
In [22]:
gliders = [glider for glider in gliders if glider in whoi_gliders]
gliders
Out[22]:
With Python it is easy to loop over all the dataset_ids and get, for each one, the variables that have a standard_name attribute.
In [23]:
variables = [
e.get_var_by_attr(
dataset_id=glider,
standard_name=lambda v: v is not None  # any variable that has a standard_name
)
for glider in gliders
]
We can construct a set with the common variables in those dataset_ids.
In [24]:
common_variables = set(variables[0]).intersection(*variables[1:])
common_variables
Out[24]:
Last, but not least, the download endpoint!
It is important to note that the download constraints are based on the variable names, and not on the standard_names used in the get_search_url method.
In [25]:
constraints = {
"longitude>=": kw["min_lon"],
"longitude<=": kw["max_lon"],
"latitude>=": kw["min_lat"],
"latitude<=": kw["max_lat"],
"time>=": kw["min_time"],
"time<=": kw["max_time"],
}
download_url = e.get_download_url(
dataset_id=gliders[0],
protocol="tabledap",
variables=common_variables,
constraints=constraints
)
print(download_url)
Putting everything in DataFrame objects.
In [26]:
from requests.exceptions import HTTPError
def download_csv(url):
return pd.read_csv(
url,
index_col="time",
parse_dates=True,
skiprows=[1],  # row 1 of an ERDDAP csv response holds the units
)
dfs = {}
for glider in gliders:
try:
download_url = e.get_download_url(
dataset_id=glider,
protocol="tabledap",
variables=common_variables,
response="csv",
constraints=constraints
)
except HTTPError:
print(f"Failed to download {glider}.")
continue
dfs.update({glider: download_csv(download_url)})
The glider data should come already masked according to its QC flags, but we found that is not always the case, so we apply the mask ourselves in the sketch below.
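A minimal sketch, assuming each variable has a companion <variable>_qc column where flag 1 marks good data (a hypothetical convention here; check the QC variables of the actual datasets before relying on this):
In [ ]:
def apply_qc(df, variables=("temperature", "salinity"), good_flag=1):
    # "<variable>_qc" is an assumed naming convention; adjust per dataset.
    for var in variables:
        qc_col = f"{var}_qc"
        if qc_col in df.columns:
            df.loc[df[qc_col] != good_flag, var] = None  # mask non-good values
    return df

dfs = {glider: apply_qc(df) for glider, df in dfs.items()}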
Finally, let's see some figures!
In [27]:
%matplotlib inline
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
from cartopy.mpl.ticker import LongitudeFormatter, LatitudeFormatter
def make_map(extent):  # basic PlateCarree map with labeled lat/lon ticks
fig, ax = plt.subplots(
figsize=(9, 9),
subplot_kw=dict(projection=ccrs.PlateCarree())
)
ax.coastlines(resolution="10m")
ax.set_extent(extent)
ax.set_xticks([extent[0], extent[1]], crs=ccrs.PlateCarree())
ax.set_yticks([extent[2], extent[3]], crs=ccrs.PlateCarree())
lon_formatter = LongitudeFormatter(zero_direction_label=True)
lat_formatter = LatitudeFormatter()
ax.xaxis.set_major_formatter(lon_formatter)
ax.yaxis.set_major_formatter(lat_formatter)
return fig, ax
dx = dy = 0.5
extent = kw["min_lon"]-dx, kw["max_lon"]+dx, kw["min_lat"]-dy, kw["max_lat"]+dy
fig, ax = make_map(extent)
for glider, df in dfs.items():
ax.plot(df["longitude"], df["latitude"], label=glider)
leg = ax.legend()
In [28]:
def glider_scatter(df, ax, glider):
ax.scatter(df["temperature"], df["salinity"],
s=10, alpha=0.5, label=glider)
fig, ax = plt.subplots(figsize=(9, 9))
ax.set_ylabel("salinity")
ax.set_xlabel("temperature")
ax.grid(True)
for glider, df in dfs.items():
glider_scatter(df, ax, glider)
leg = ax.legend()
In [29]:
e.constraints = None
e.protocol = "tabledap"
opendap_url = e.get_download_url(
dataset_id="whoi_406-20160902T1700",
response="opendap",
)
print(opendap_url)
In [30]:
from netCDF4 import Dataset
with Dataset(opendap_url) as nc:
print(nc.summary)
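The same OPeNDAP handle exposes the dataset variables for lazy access, transferring data only when sliced; a quick sketch using the temperature variable we searched for earlier:
In [ ]:
with Dataset(opendap_url) as nc:
    temp = nc["temperature"]  # lazy: only the slices we request are downloaded
    print(temp.dimensions, temp.shape)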
In [31]:
e.dataset_id = "cp_336-20170116T1254"
e.response = "nc"
e.variables = common_variables
e.constraints = constraints
download_url = e.get_download_url()
In [32]:
import requests
def humansize(nbytes):  # format a byte count as a human-readable string
suffixes = ["B", "KB", "MB", "GB", "TB", "PB"]
k = 0
while nbytes >= 1024 and k < len(suffixes)-1:
nbytes /= 1024.
k += 1
f = ("%.2f" % nbytes).rstrip("0").rstrip(".")
return "%s %s" % (f, suffixes[k])
r = requests.head(download_url)
nbytes = float(r.headers["Content-Length"])
humansize(nbytes)
Out[32]:
That is the uncompressed size; we are OK because the actual download will be smaller than that, since ERDDAP streams gzip'ed data.
In [33]:
r.headers["Content-Encoding"]
Out[33]:
In [34]:
ds = e.to_xarray(decode_times=False)
ds
Out[34]:
In [35]:
ds["temperature"]
Out[35]:
In [36]:
import numpy as np
data = ds["temperature"].values
depth = ds["depth"].values
# Build a boolean mask of the valid (non-NaN) temperature samples.
mask = ~np.ma.masked_invalid(data).mask
In [37]:
data = data[mask]
depth = depth[mask]
lon = ds["longitude"].values[mask]
lat = ds["latitude"].values[mask]
In [38]:
import warnings
with warnings.catch_warnings():
warnings.simplefilter("ignore")
mask = depth <= 5
data = data[mask]
depth = depth[mask]
lon = lon[mask]
lat = lat[mask]
In [39]:
%matplotlib inline
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
dx = dy = 1.5
extent = (
ds.geospatial_lon_min-dx, ds.geospatial_lon_max+dx,
ds.geospatial_lat_min-dy, ds.geospatial_lat_max+dy
)
fig, ax = make_map(extent)
cs = ax.scatter(lon, lat, c=data, s=50, alpha=0.5, edgecolor="none")
cbar = fig.colorbar(cs, orientation="vertical",
fraction=0.1, shrink=0.9, extend="both")
ax.coastlines("10m");
Or load the data with iris if it is easier to navigate via the CF conventions data model.
In [40]:
import warnings
# Iris warnings are quite verbose!
with warnings.catch_warnings():
warnings.simplefilter("ignore")
cubes = e.to_iris()
print(cubes)
In [41]:
cubes.extract_strict("sea_water_pressure")
Out[41]:
This example is written as a Jupyter Notebook. Click here to download the notebook so you can run it locally, or click here to run a live instance of this notebook.