Searching datasets

erddapy wraps ERDDAP's form-like search through the search_for keyword of get_search_url.


In [1]:
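def show_iframe(src):
    """Embed the page at `src` in the notebook as an iframe."""
    from IPython.display import HTML
    iframe = '<iframe src="{src}" width="100%" height="950"></iframe>'.format
    return HTML(iframe(src=src))


def to_df(url):
    """Download the CSV at `url` into a pandas DataFrame."""
    import pandas as pd
    return pd.read_csv(url)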

In [2]:
from erddapy import ERDDAP


e = ERDDAP(
    server="https://upwell.pfeg.noaa.gov/erddap",
    protocol="tabledap"
)

Single word search.


In [3]:
search_for = "fukushima"

url = e.get_search_url(search_for=search_for, response="csv")

to_df(url)["Dataset ID"]


Out[3]:
0    northerngulfinstitute_edac_dap3_0a94_4f88_8950
1    northerngulfinstitute_edac_dap3_0bc3_0230_8add
2    northerngulfinstitute_edac_dap3_2689_8c24_7dcb
3                               whoi_7a97_cb6f_a9db
4                               whoi_4a75_e5e1_6640
5              northerngulfinstitute_1412_d11d_1e9b
6              northerngulfinstitute_a8f3_c2d4_2227
Name: Dataset ID, dtype: object
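
To see what such a request looks like on the wire, the sketch below builds a search URL by hand against ERDDAP's plain full-text search endpoint (search/index). The helper build_search_url is hypothetical and not part of erddapy; the URL that get_search_url actually returns may use a different endpoint and carry additional parameters, so treat this only as an illustration of how the search string is encoded.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(server, search_for, response="csv", items_per_page=1000):
    """Hypothetical helper: build an ERDDAP full-text search URL by hand."""
    query = urlencode(
        {"page": 1, "itemsPerPage": items_per_page, "searchFor": search_for}
    )
    return f"{server}/search/index.{response}?{query}"


url = build_search_url("https://upwell.pfeg.noaa.gov/erddap", "fukushima -velocity")
print(url)

# Decoding the query string recovers the original search text.
decoded = parse_qs(urlsplit(url).query)["searchFor"][0]
print(decoded)
```

The space between the words is encoded as +, which is why the search string can be passed to erddapy exactly as you would type it in the ERDDAP search form.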

Filtering the search with extra words: only datasets that match every word are returned.


In [4]:
search_for = "fukushima velocity"

url = e.get_search_url(search_for=search_for, response="csv")

to_df(url)["Dataset ID"]


Out[4]:
0    northerngulfinstitute_edac_dap3_0a94_4f88_8950
1                               whoi_7a97_cb6f_a9db
2              northerngulfinstitute_a8f3_c2d4_2227
Name: Dataset ID, dtype: object

Excluding words from the search: prefix a word with - to drop the datasets that match it.


In [5]:
search_for = "fukushima -velocity"

url = e.get_search_url(search_for=search_for, response="csv")

to_df(url)["Dataset ID"]


Out[5]:
0    northerngulfinstitute_edac_dap3_0bc3_0230_8add
1    northerngulfinstitute_edac_dap3_2689_8c24_7dcb
2                               whoi_4a75_e5e1_6640
3              northerngulfinstitute_1412_d11d_1e9b
Name: Dataset ID, dtype: object
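
As a quick sanity check, the three outputs above can be compared as sets. The dataset IDs below are copied from Out[3], Out[4], and Out[5]; the variable names are only illustrative. If the extra word and the excluded word behave as described, the two filtered results should split the single-word result into two non-overlapping parts.

```python
# Dataset IDs copied from the outputs above.
fukushima = {
    "northerngulfinstitute_edac_dap3_0a94_4f88_8950",
    "northerngulfinstitute_edac_dap3_0bc3_0230_8add",
    "northerngulfinstitute_edac_dap3_2689_8c24_7dcb",
    "whoi_7a97_cb6f_a9db",
    "whoi_4a75_e5e1_6640",
    "northerngulfinstitute_1412_d11d_1e9b",
    "northerngulfinstitute_a8f3_c2d4_2227",
}
with_velocity = {
    "northerngulfinstitute_edac_dap3_0a94_4f88_8950",
    "whoi_7a97_cb6f_a9db",
    "northerngulfinstitute_a8f3_c2d4_2227",
}
without_velocity = {
    "northerngulfinstitute_edac_dap3_0bc3_0230_8add",
    "northerngulfinstitute_edac_dap3_2689_8c24_7dcb",
    "whoi_4a75_e5e1_6640",
    "northerngulfinstitute_1412_d11d_1e9b",
}

# The two filtered searches share no dataset ...
no_overlap = with_velocity.isdisjoint(without_velocity)
# ... and together they cover the single-word search exactly.
covers_all = (with_velocity | without_velocity) == fukushima
print(no_overlap, covers_all)
```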

Quoted search, or "phrase search". First, let us try the unquoted search.


In [6]:
search_for = "wind speed"

url = e.get_search_url(search_for=search_for, response="csv")

len(to_df(url)["Dataset ID"])


Out[6]:
600

This returns too many datasets because the words wind and speed are matched separately, not only as the phrase wind speed. Now let's use a quoted search to restrict the results to the phrase wind speed.


In [7]:
search_for = '"wind speed"'

url = e.get_search_url(search_for=search_for, response="csv")

len(to_df(url)["Dataset ID"])


Out[7]:
569
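
The three kinds of search term used in this notebook (plain words, -excluded words, and quoted phrases) can be mimicked offline with a toy matcher. This is a minimal sketch consistent with the results above, not ERDDAP's actual search algorithm: parse_query and matches are hypothetical helpers, the titles are invented sample data, and matching is a simple case-insensitive substring test.

```python
import shlex


def parse_query(search_for):
    """Split a search string into required and excluded terms.

    Quoted phrases stay together; a leading "-" marks an exclusion.
    """
    required, excluded = [], []
    for token in shlex.split(search_for):
        if token.startswith("-") and len(token) > 1:
            excluded.append(token[1:].lower())
        else:
            required.append(token.lower())
    return required, excluded


def matches(text, search_for):
    """Return True if text has every required term and no excluded term."""
    required, excluded = parse_query(search_for)
    text = text.lower()
    has_required = all(term in text for term in required)
    has_excluded = any(term in text for term in excluded)
    return has_required and not has_excluded


# Invented titles, for illustration only.
titles = [
    "Fukushima surface velocity",
    "Fukushima radionuclides",
    "Wind speed at buoy",
    "Wind direction and current speed",
]

queries = ["fukushima velocity", "fukushima -velocity", "wind speed", '"wind speed"']
for query in queries:
    print(query, "->", [t for t in titles if matches(t, query)])
```

In this toy data the unquoted wind speed matches both wind titles, while the quoted form keeps only the title containing the exact phrase, which mirrors the drop from 600 to 569 datasets seen above.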

This example is written in a Jupyter Notebook.