After you successfully request a dataset from the USGS, Hydrofunctions will process the data into a huge table and make it available to you in several formats.
If you are working with Python and time series data, you are probably already familiar with pandas and NumPy, the numerical libraries that Hydrofunctions is built upon. These two packages underpin much of scientific data analysis in Python and are the starting point for hundreds of projects.
Use the following dataset for the examples below:
In [1]:
import hydrofunctions as hf
%matplotlib inline
data = hf.NWIS(['01650800', '01589330'], 'iv', start_date='2019-05-01', end_date='2019-06-01', file='view-example.parquet')
This dataset has the following properties:
In [2]:
data
Out[2]:
The dataset includes two sites: seven different types of data are collected at one site, and two at the other.
Let's start by viewing all of the columns in the first five rows of our table. To view all of our data as a dataframe, we use the .df() method of NWIS. The .head() method limits display of our table to just the first five rows:
In [3]:
data.df().head()
Out[3]:
This is equivalent to data.df('all').head().
We now have nine different columns containing data. Each column has a twin 'qualifiers' column, which contains metadata flags.
You can list the columns separately by viewing the columns attribute of the dataframe:
In [4]:
data.df().columns
Out[4]:
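As a rough illustration of the pairing described above, the sketch below splits a list of column names into data columns and their qualifier twins. The column names are hypothetical, and the sketch assumes that a qualifier column carries a `_qualifiers` suffix appended to the name of its data column; compare against the output of `data.df().columns` for your own dataset.

```python
# Hypothetical column names; the '_qualifiers' suffix is an assumption
# based on the "twin 'qualifiers' column" described above.
columns = [
    "USGS:01589330:00060:00000",
    "USGS:01589330:00060:00000_qualifiers",
    "USGS:01589330:00065:00000",
    "USGS:01589330:00065:00000_qualifiers",
]

SUFFIX = "_qualifiers"


def pair_columns(names):
    """Map each data column name to its qualifier twin, or None if absent."""
    present = set(names)
    pairs = {}
    for name in names:
        if name.endswith(SUFFIX):
            continue  # qualifier columns are looked up, not used as keys
        twin = name + SUFFIX
        pairs[name] = twin if twin in present else None
    return pairs


pairs = pair_columns(columns)
for data_col, flag_col in pairs.items():
    print(data_col, "->", flag_col)
```

A data column that maps to `None` would indicate a missing twin, which should not happen if every column really is paired.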
To view only the data columns, without the qualifiers columns, request 'data':
In [5]:
data.df('data').head()
Out[5]:
To limit your view to the data collected at a single site, enter the site number:
In [6]:
data.df('01589330').head()
Out[6]:
It is possible to limit your view to only one parameter by entering the five-digit parameter number, such as '00065' for stage. Some common parameters have an alias, such as 'q' and 'discharge' for '00060'. Since discharge is collected at both sites, this request will return two columns:
In [7]:
data.df('q').head()
Out[7]:
The previous example selected discharge data at both sites in the dataset, but you can combine your requests in any order to get just the columns you want. For example, to get the stage data at a single site, use data.df('01589330', 'stage').
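The way requests combine can be sketched in plain Python. This is not the library's implementation, only a minimal model of the behavior described here: each request narrows the selection by site, by parameter, or by column type, and the order of the requests does not matter. The column names and their 'agency:site:parameter:statistic' layout are assumptions made for the sketch, as is the `select` helper itself.

```python
# Hypothetical column names in an 'agency:site:parameter:statistic' layout;
# the layout and the '_qualifiers' suffix are assumptions for this sketch.
COLUMNS = [
    "USGS:01589330:00060:00000",
    "USGS:01589330:00060:00000_qualifiers",
    "USGS:01589330:00065:00000",
    "USGS:01589330:00065:00000_qualifiers",
    "USGS:01650800:00060:00000",
    "USGS:01650800:00060:00000_qualifiers",
]

# Aliases mentioned in the text, mapped to five-digit parameter codes.
ALIASES = {"q": "00060", "discharge": "00060", "stage": "00065"}


def select(columns, *requests):
    """Return the column names that match every request, in any order."""
    if not requests:
        return list(columns)  # no arguments: the full table
    site = param = None
    kind = "data"  # flags are left out unless asked for
    for item in requests:
        if item in ("all", "data", "flags"):
            kind = item
        elif item in ALIASES:
            param = ALIASES[item]
        elif item.isdigit() and len(item) == 5:
            param = item  # five digits: a parameter code
        elif item.isdigit():
            site = item  # any other all-digit string: a site number
        else:
            raise ValueError(f"Unrecognized request: {item!r}")
    chosen = []
    for name in columns:
        is_flag = name.endswith("_qualifiers")
        _agency, col_site, col_param = name.split(":")[:3]
        if site is not None and col_site != site:
            continue
        if param is not None and col_param != param:
            continue
        if kind == "data" and is_flag:
            continue
        if kind == "flags" and not is_flag:
            continue
        chosen.append(name)
    return chosen


print(select(COLUMNS, "01589330", "stage"))
print(select(COLUMNS, "stage", "01589330"))  # same request, other order
```

Because each request only sets one of three independent filters, swapping the order of the arguments leaves the result unchanged.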
Every data column is also matched with a 'qualifiers' column that contains a set of metadata flags for each observation. These flags are usually not provided unless you request the full table or specifically request 'flags'.
This request will provide the flags for the two discharge columns we viewed above:
In [8]:
data.df('q', 'flags').head()
Out[8]:
In this case, the 'P' flags indicate "Provisional" data: the USGS has not yet reviewed the observations and released them as official "Approved" ('A') data. This typically means the data are less than a year old.
Other flags exist as well. A more complete listing of qualifier flags can be found here: https://waterdata.usgs.gov/usa/nwis/uv?codes_help#dv_cd1
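One common use of the flags is to check how much of a record is still provisional, or to keep only approved observations. The sketch below does both with plain Python lists; the flag values and discharge readings are made up for illustration. On a real dataframe, the counting step could be done with the pandas `value_counts()` method on a qualifiers column.

```python
from collections import Counter

# Hypothetical flag values for six consecutive observations at one site.
flags = ["A", "A", "A", "P", "P", "P"]
values = [10.2, 10.4, 10.1, 9.8, 9.9, 10.0]  # made-up discharge readings

# Count how often each flag occurs.
counts = Counter(flags)
provisional_share = counts["P"] / len(flags)
print(f"{provisional_share:.0%} of observations are provisional")

# Keep only the observations that the USGS has approved.
approved = [v for v, f in zip(values, flags) if f == "A"]
print(approved)
```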
Plot the discharge data
In [9]:
data.df('discharge').plot()
Out[9]:
View descriptive statistics
In [10]:
data.df('discharge').describe()
Out[10]:
In [11]:
data.df('00000000')