This guide will step you through the basics of using hydrofunctions. Read more in our User's Guide, or visit us on GitHub!
The first step before using hydrofunctions is to get it installed on your system. For scientific computing, we highly recommend using the free, open-source Anaconda distribution to load and manage all of your Python tools and packages. Once you have downloaded and installed Anaconda, or if you already have Python set up on your computer, your next step is to use the pip tool from your operating system's command line to download and install hydrofunctions.
In Linux:
$ pip install hydrofunctions
In Windows:
C:\MyPythonWorkspace\> pip install hydrofunctions
If you have any difficulties, visit our Installation page in the User's Guide.
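After installing, you can confirm that Python can actually see the package. This is a minimal sketch using only the standard library (importlib.util.find_spec and importlib.metadata.version, Python 3.8+); the helper name installed_version is our own and is not part of hydrofunctions.

```python
import importlib.util
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of package_name, or None if it is missing."""
    # find_spec returns None when the package cannot be imported at all.
    if importlib.util.find_spec(package_name) is None:
        return None
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        # Importable (for example a standard-library module or a source
        # checkout) but not registered as a pip-installed distribution.
        return "unknown"


# Prints a version string such as '0.2.x' if hydrofunctions is installed, else None.
print(installed_version("hydrofunctions"))
```

If this prints None, pip most likely installed hydrofunctions into a different Python environment than the one your notebook is using.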
In [1]:
# The first step is to import hydrofunctions so that we can use it here.
import hydrofunctions as hf
# This second line allows us to automatically plot diagrams in this notebook.
%matplotlib inline
The USGS runs an amazing web service called the National Water Information System (NWIS). Our first task is to download daily mean discharge data for a stream called Herring Run. Set the start date and the end date for our download, and use the site number for Herring Run ('01585200') to specify which stream gage we want to collect data from. Once we request the data, it will be saved to a file. If the file is already present, we'll just use that instead of requesting the data from the NWIS again.
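The "use the saved file if it exists, otherwise download and save" behavior is a common caching pattern. Here is a simplified sketch of that idea, not hydrofunctions' own code: it stores JSON instead of parquet so that it runs with the standard library alone, and fake_fetch is a hypothetical stand-in for a real NWIS download that returns made-up numbers.

```python
import json
import os
import tempfile


def load_or_fetch(path, fetch):
    """Return cached data from path if it exists; otherwise call fetch() and cache it."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    data = fetch()  # in real use, this would be the network request
    with open(path, "w") as f:
        json.dump(data, f)
    return data


calls = []


def fake_fetch():
    # Hypothetical stand-in for a download; records each call it receives.
    calls.append(1)
    return {"site": "01585200", "values": [1.2, 1.5, 1.1]}


cache_file = os.path.join(tempfile.mkdtemp(), "herring_demo.json")
first = load_or_fetch(cache_file, fake_fetch)   # "downloads" and saves the file
second = load_or_fetch(cache_file, fake_fetch)  # reads the saved file instead
print(len(calls))  # 1
```

Because the second call finds the file, fake_fetch runs only once. To force a fresh download in your own work, delete the cached file first.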
You can visit the NWIS website or use hydrocloud.org to find the site number for a stream gage near you.
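Notice that the site number is passed as a string ('01585200'), not a number: converting it to an integer silently drops the leading zero. The sketch below is a hypothetical helper, not part of hydrofunctions, and it assumes that USGS site numbers are 8 to 15 digits long.

```python
def normalize_site(site):
    """Validate a USGS site number and return it unchanged as a string.

    Assumption: site numbers are 8 to 15 digits long, and leading zeros
    matter, so they must be handled as strings rather than integers.
    """
    if not isinstance(site, str):
        raise TypeError("Pass site numbers as strings, e.g. '01585200', not 1585200")
    if not (site.isdigit() and 8 <= len(site) <= 15):
        raise ValueError(f"{site!r} does not look like a USGS site number")
    return site


print(int("01585200"))             # 1585200 (the leading zero is lost)
print(normalize_site("01585200"))  # 01585200
```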
In [2]:
start = '2017-06-01'
end = '2017-07-14'
herring = hf.NWIS('01585200', 'dv', start, end, file='herring_july.parquet')
herring # This last command will print out a description of what we have.
Out[2]:
There are several ways to view our data. Try herring.json(), or better still, use a pandas DataFrame:
In [3]:
herring.df()
Out[3]:
pandas DataFrames give you access to hundreds of useful methods, such as .describe() and .plot():
In [4]:
herring.df().describe()
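To see what .describe() reports, here is a standard-library sketch of the same summary for a single list of numbers. We believe it matches pandas' defaults (sample standard deviation, linearly interpolated quartiles), but unlike pandas it does not skip missing values, and it needs at least two values. The function name describe is our own.

```python
import statistics


def describe(values):
    """Summarize a list of numbers in the style of pandas' describe()."""
    # method="inclusive" interpolates linearly between data points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(values),
    }


print(describe([1.0, 2.0, 3.0, 4.0]))
```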
Out[4]:
In [5]:
herring.df().plot()
Out[5]:
Requests can use lists of sites:
In [6]:
sites = ['380616075380701','394008077005601']
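hydrofunctions builds and sends the web request for you, so you never have to write it yourself. To show what a list of sites turns into, this sketch only builds (and does not send) a query string for the NWIS instantaneous-values service. The query parameter names (format, sites, parameterCd, startDT, endDT) follow the public NWIS web service; the helper build_nwis_url is our own, and we do not claim it reproduces hydrofunctions' exact request.

```python
from urllib.parse import urlencode

# Base URL of the USGS NWIS instantaneous-values web service.
NWIS_IV_URL = "https://waterservices.usgs.gov/nwis/iv/"


def build_nwis_url(sites, parameter_cd, start_dt, end_dt, base=NWIS_IV_URL):
    """Build an NWIS request URL for several sites at once."""
    query = {
        "format": "json",
        "sites": ",".join(sites),  # several sites go in one comma-separated value
        "parameterCd": parameter_cd,
        "startDT": start_dt,
        "endDT": end_dt,
    }
    # safe="," keeps the commas readable instead of encoding them as %2C.
    return base + "?" + urlencode(query, safe=",")


url = build_nwis_url(["380616075380701", "394008077005601"], "72019", "2018-01-01", "2018-01-31")
print(url)
```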
The NWIS can deliver data as daily mean values ('dv') or as instantaneous values ('iv') that can get collected as often as every five minutes!
In [7]:
service = 'iv'
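The difference between 'iv' and 'dv' data is easiest to see by averaging instantaneous readings into one value per day. This is a simplified illustration only: it assumes evenly spaced readings and ignores the USGS's own procedures for computing and approving official daily values. The readings are made-up numbers.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_means(readings):
    """Average (timestamp, value) readings into one mean per calendar day."""
    by_day = defaultdict(list)
    for timestamp, value in readings:
        by_day[timestamp.date()].append(value)
    return {day: mean(values) for day, values in sorted(by_day.items())}


# Hypothetical 15-minute readings spanning midnight.
readings = [
    (datetime(2018, 1, 1, 23, 30), 4.0),
    (datetime(2018, 1, 1, 23, 45), 6.0),
    (datetime(2018, 1, 2, 0, 0), 10.0),
    (datetime(2018, 1, 2, 0, 15), 12.0),
]
print(daily_means(readings))  # Jan 1 -> 5.0, Jan 2 -> 11.0
```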
Depending on the site, the USGS collects groundwater levels ('72019'), stage ('00065'), precipitation, and more!
In [8]:
pcode = '72019'
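Parameter codes are five-character strings, and, like site numbers, their leading zeros matter. The lookup table below is a small, hand-picked subset of common NWIS codes typed in for illustration; the USGS parameter code dictionary has the complete list, and describe_pcode is a hypothetical helper, not a hydrofunctions function.

```python
# A few common NWIS parameter codes (hand-picked subset).
PARAMETER_CODES = {
    "00010": "Water temperature, degrees Celsius",
    "00045": "Precipitation, total, inches",
    "00060": "Discharge, cubic feet per second",
    "00065": "Gage height (stage), feet",
    "72019": "Depth to water level, feet below land surface",
}


def describe_pcode(pcode):
    """Return a human-readable name for a parameter code, or a fallback message."""
    return PARAMETER_CODES.get(pcode, f"Unknown parameter code {pcode!r}")


print(describe_pcode("72019"))  # Depth to water level, feet below land surface
```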
Now we'll create a new dataset called 'groundwater' using the values we set up above.
One of these sites collects data every 30 minutes and the other every 15 minutes, so hydrofunctions will interpolate values at 15-minute intervals for both sites. These interpolated values will be marked with a special hf.interpolate flag in the qualifiers column.
In [9]:
groundwater = hf.NWIS(sites, service, '2018-01-01', '2018-01-31', parameterCd=pcode, file='groundwater.parquet')
groundwater
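To picture the interpolation described above, this sketch fills a 30-minute series onto a 15-minute grid and flags each new point with 'hf.interpolate'. It assumes straight-line (linear) interpolation between neighboring readings, which may differ from what hydrofunctions does internally; the function upsample_linear and the sample values are our own.

```python
from datetime import datetime, timedelta


def upsample_linear(series, step):
    """Fill a sorted, non-empty list of (time, value) points onto a regular grid.

    Points added between two originals are linearly interpolated and flagged
    'hf.interpolate'; original points keep an empty flag.
    """
    result = []
    for (t0, v0), (t1, v1) in zip(series, series[1:]):
        t = t0
        while t < t1:
            if t == t0:
                result.append((t, v0, ""))
            else:
                fraction = (t - t0) / (t1 - t0)  # timedelta / timedelta -> float
                result.append((t, v0 + fraction * (v1 - v0), "hf.interpolate"))
            t += step
    last_t, last_v = series[-1]
    result.append((last_t, last_v, ""))
    return result


half_hourly = [
    (datetime(2018, 1, 1, 0, 0), 10.0),
    (datetime(2018, 1, 1, 0, 30), 20.0),
]
for row in upsample_linear(half_hourly, timedelta(minutes=15)):
    print(row)
```

The 00:15 row comes out as 15.0 with the 'hf.interpolate' flag, halfway between the two real readings.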
Out[9]:
Calculate the mean for every data column:
In [10]:
groundwater.df().mean()
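By default, pandas' .mean() skips missing values (skipna=True), so a gap in one gage's record does not turn that whole column's mean into NaN. This standard-library sketch mimics that behavior on a dictionary of columns; the column names and values are hypothetical.

```python
import math


def column_means(columns):
    """Mean of each column, skipping NaN values as pandas does by default."""
    means = {}
    for name, values in columns.items():
        present = [v for v in values if not math.isnan(v)]
        # An all-NaN column has no mean; pandas reports NaN in that case too.
        means[name] = sum(present) / len(present) if present else math.nan
    return means


data = {
    "site_a": [10.0, math.nan, 14.0],
    "site_b": [math.nan, math.nan, math.nan],
}
print(column_means(data))  # site_a -> 12.0, site_b -> nan
```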
Out[10]:
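View the data in a specially styled graph! Here mfc, ms, and mec are matplotlib shorthand for marker face color, marker size, and marker edge color.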
In [11]:
groundwater.df().plot(marker='o', mfc='white', ms=4, mec='black', color='black')
Out[11]:
hydrofunctions comes with built-in help that you can access from the Python prompt, in addition to our online User's Guide.
Jupyter Notebooks provide additional helpful shortcuts, such as code completion. Typing herring. and then pressing <TAB> will list all of the available methods for an object; this is similar to using dir(herring) to list all of the methods available to you.
Calling help() or dir() on an object gives you additional information about it.
In Jupyter, help(hf.NWIS) is equivalent to just using a question mark like this: ?hf.NWIS
In [12]:
help(hf.NWIS)
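The same tools work on any Python object. This sketch filters dir() down to public methods and uses inspect.getdoc to fetch a docstring, which is the text help() also displays. The toy class Gage is hypothetical and only stands in for an object such as herring.

```python
import inspect


def public_methods(obj):
    """List the callable attributes of obj whose names don't start with '_'."""
    return [
        name
        for name in dir(obj)
        if not name.startswith("_") and callable(getattr(obj, name))
    ]


class Gage:
    """A toy class standing in for an object such as herring."""

    def df(self):
        """Return the data as a table."""
        return []

    def plot(self):
        """Draw a graph of the data."""
        return None


print(public_methods(Gage()))   # ['df', 'plot']
print(inspect.getdoc(Gage.df))  # Return the data as a table.
```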
In [13]:
sites = ['07227500', '07228000', '07235000', '07295500', '07297910', '07298500', '07299540',
'07299670', '07299890', '07300000', '07301300', '07301410', '07308200', '07308500', '07311600',
'07311630', '07311700', '07311782', '07311783', '07311800', '07311900', '07312100', '07312200',
'07312500', '07312700', '07314500', '07314900', '07315200', '07315500', '07342465', '07342480',
'07342500', '07343000', '07343200', '07343500', '07344210', '07344500', '07346000']
mult = hf.NWIS(sites, "dv", "2018-01-01", "2018-01-31", file='mult.parquet')
print('No. sites: {}'.format(len(sites)))
This will calculate the mean value for each site.
In [14]:
mult.df().mean()
Out[14]:
Plot just the discharge data for one site in the list:
In [15]:
mult.df('07228000', 'discharge').plot()
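Conceptually, mult.df('07228000', 'discharge') translates the friendly name 'discharge' into parameter code '00060' and keeps only the matching columns. This sketch assumes column labels shaped like 'USGS:<site>:<pcode>:<stat>', with a '_qualifiers' suffix on the flag columns, which is what we have seen from recent hydrofunctions versions; check mult.df().columns on your own data. select_columns is our own helper, not a hydrofunctions function.

```python
def select_columns(columns, site, pcode):
    """Pick the data columns (not the qualifier columns) for one site and parameter."""
    return [
        name
        for name in columns
        if name.split(":")[1:3] == [site, pcode] and not name.endswith("_qualifiers")
    ]


labels = [
    "USGS:07227500:00060:00003",
    "USGS:07227500:00060:00003_qualifiers",
    "USGS:07228000:00060:00003",
    "USGS:07228000:00060:00003_qualifiers",
]
print(select_columns(labels, "07228000", "00060"))  # ['USGS:07228000:00060:00003']
```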
Out[15]:
In [16]:
mult
Out[16]:
In [17]:
mult.df('discharge').head()
Out[17]:
In [18]:
# Use this carefully! You can easily request more data than you will know what to do with.
start = "2017-01-01"
end = "2017-12-31"
param = '00060'
virginia = hf.NWIS(None, "dv", start, end, parameterCd=param, stateCd='va', file='virginia.parquet')
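Before requesting every site in a state, a back-of-the-envelope estimate of the request size helps: sites × days × readings per day. In this sketch the 96 readings per day for 'iv' assumes a 15-minute interval (some gages report more or less often), and the number of sites is something you supply; we don't know how many Virginia gages report discharge. estimate_values is a hypothetical helper.

```python
from datetime import date

# Assumed readings per day: one daily mean for 'dv'; one every 15 minutes for 'iv'.
READINGS_PER_DAY = {"dv": 1, "iv": 96}


def estimate_values(n_sites, service, start, end):
    """Rough estimate of how many values a single-parameter request returns."""
    n_days = (date.fromisoformat(end) - date.fromisoformat(start)).days + 1
    return n_sites * n_days * READINGS_PER_DAY[service]


print(estimate_values(1, "dv", "2017-01-01", "2017-12-31"))  # 365
print(estimate_values(1, "iv", "2017-01-01", "2017-12-31"))  # 35040
```

Multiply by the number of sites in your request and it is easy to see how a statewide 'iv' download can grow into millions of values.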
In [19]:
# Calculate the mean for each site.
virginia.df('discharge').mean()
Out[19]:
In [20]:
# There are so many sites that a legend would be unreadable, so we turn it off.
virginia.df('q').plot(legend=False)
Out[20]:
In [21]:
start = "2017-01-01"
end = "2017-12-31"
county = hf.NWIS(None, "dv", start, end, parameterCd='00060', countyCd=['51059', '51061'], file='PG.parquet')
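The countyCd values are five-digit FIPS codes: the first two digits identify the state ('51' is Virginia) and the last three identify the county within that state. This small sketch splits a code into those parts; split_county_code is a hypothetical helper, not part of hydrofunctions.

```python
def split_county_code(county_cd):
    """Split a 5-digit county FIPS code into its (state, county) parts."""
    if len(county_cd) != 5 or not county_cd.isdigit():
        raise ValueError(f"{county_cd!r} is not a 5-digit FIPS code")
    return county_cd[:2], county_cd[2:]


for code in ["51059", "51061"]:
    print(split_county_code(code))  # ('51', '059') then ('51', '061')
```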
In [22]:
county.df('data').head()
Out[22]:
In [23]:
county.df('data').plot()
Out[23]:
We would love to hear your comments and suggestions!