tsgettoolbox and tstoolbox - Command Line Interface

'tsgettoolbox nwis ...': Download data from the National Water Information System (NWIS)

This notebook is to illustrate the command line usage for 'tsgettoolbox' and 'tstoolbox' to download and work with data from the National Water Information System (NWIS). There is a different notebook to do the same thing from within a Python program called tsgettoolbox-nwis-api.

First off, always nice to remind myself about the options. Each sub-command has their own options kept consistent with the options available from the source service. The way that NWIS works is you have one major filter and one or more minor filters to define what sites you want.


In [ ]:
tsgettoolbox nwis_dv --help

Let's say that I want flow (parameterCd=00060) for site '02325000'. I first make sure that I am getting what I want by allowing the output to be printed to the screen. Note the pipe ('|') that directs output to the 'head' command to display the top 10 lines of the time-series.


In [14]:
tsgettoolbox nwis_dv --sites 02325000 --startDT '2000-01-01' --parameterCd 00060 | head


Datetime,USGS-02325000-00060
2000-01-01,82
2000-01-02,81
2000-01-03,80
2000-01-04,79
2000-01-05,75
2000-01-06,75
2000-01-07,74
2000-01-08,73
2000-01-09,75

Then I redirect to a file with "> filename.csv" so that I don't have to wait for the USGS NWIS services for the remaining work or analysis.


In [15]:
tsgettoolbox nwis_dv --sites 02325000 --startDT '2000-01-01' --parameterCd 00060 > 02325000_flow.csv



'tstoolbox ...': Process data using 'tstoolbox'

Now lets use "tstoolbox" to plot the time-series. Note the redirection again, this time for input as "< filename.csv". Default plot filename is "plot.png".


In [16]:
tstoolbox plot < 02325000_flow.csv



'tstoolbox plot' has many options that can be used to modify the plot.


In [17]:
tstoolbox plot --help


usage: tstoolbox plot [-h] [--ofilename <str>] [--type <str>] [--xtitle <str>]
  [--ytitle <str>] [--title <str>] [--figsize <str>] [--legend LEGEND]
  [--legend_names <str>] [--subplots] [--sharex] [--sharey] [--style <str>]
  [--logx] [--logy] [--xaxis <str>] [--yaxis <str>] [--xlim XLIM] [--ylim
  YLIM] [--secondary_y] [--mark_right] [--scatter_matrix_diagonal <str>]
  [--bootstrap_size BOOTSTRAP_SIZE] [--bootstrap_samples BOOTSTRAP_SAMPLES]
  [--norm_xaxis] [--norm_yaxis] [--lognorm_xaxis] [--lognorm_yaxis]
  [--xy_match_line <str>] [--grid GRID] [-i <str>] [-s <str>] [-e <str>]
  [--label_rotation <int>] [--label_skip <int>] [--force_freq FORCE_FREQ]
  [--drawstyle <str>] [--por] [--columns COLUMNS] [--invert_xaxis]
  [--invert_yaxis] [--plotting_position <str>]

Plot data.

optional arguments:
  -h | --help
      show this help message and exit
  --ofilename <str>
      Output filename for the plot. Extension defines the type, ('.png').
      Defaults to 'plot.png'.
  --type <str>
      The plot type. Defaults to 'time'.
      Can be one of the following:
      time
        standard time series plot

      xy
        (x,y) plot, also know as a scatter plot

      double_mass
        (x,y) plot of the cumulative sum of x and y

      boxplot
        box extends from lower to upper quartile, with line at the median.
        Depending on the statistics, the wiskers represent the range of
        the data or 1.5 times the inter-quartile range (Q3 - Q1)

      scatter_matrix
        plots all columns against each other

      lag_plot
        indicates structure in the data

      autocorrelation
        plot autocorrelation

      bootstrap
        visually asses aspects of a data set by plotting random selections of
        values

      probability_density
        sometime called kernel density estimation (KDE)

      bar
        sometimes called a column plot

      barh
        a horizontal bar plot

      bar_stacked
        sometimes called a stacked column

      barh_stacked
        a horizontal stacked bar plot

      histogram
        calculate and create a histogram plot

      norm_xaxis
        sort, calculate probabilities, and plot data against an x axis normal
        distribution

      norm_yaxis
        sort, calculate probabilities, and plot data against an y axis normal
        distribution

      lognorm_xaxis
        sort, calculate probabilities, and plot data against an x axis lognormal
        distribution

      lognorm_yaxis
        sort, calculate probabilities, and plot data against an y axis lognormal
        distribution

      weibull_xaxis
        sort, calculate and plot data against an x axis weibull distribution

      weibull_yaxis
        sort, calculate and plot data against an y axis weibull distribution

  --xtitle <str>
      Title of x-axis, default depend on type.
  --ytitle <str>
      Title of y-axis, default depend on type.
  --title <str>
      Title of chart, defaults to ''.
  --figsize <str>
      The 'width,height' of plot as inches. Defaults to '10,6.5'.
  --legend LEGEND
      Whether to display the legend. Defaults to True.
  --legend_names <str>
      Legend would normally use the time-series names associated with the input
      data. The 'legend_names' option allows you to override the names in
      the data set. You must supply a comma separated list of strings for
      each time-series in the data set. Defaults to None.
  --subplots
      boolean, default False. Make separate subplots for each time series
  --sharex
      boolean, default True In case subplots=True, share x axis
  --sharey
      boolean, default False In case subplots=True, share y axis
  --style <str>
      Comma separated matplotlib style strings matplotlib line style per
      time-series. Just combine codes in 'ColorLineMarker' order, for
      example r--* is a red dashed line with star marker.
      ┌──────┬─────────┐
      │ Code │ Color   │
      ├──────┼─────────┤
      │ b    │ blue    │
      ├──────┼─────────┤
      │ g    │ green   │
      ├──────┼─────────┤
      │ r    │ red     │
      ├──────┼─────────┤
      │ c    │ cyan    │
      ├──────┼─────────┤
      │ m    │ magenta │
      ├──────┼─────────┤
      │ y    │ yellow  │
      ├──────┼─────────┤
      │ k    │ black   │
      ├──────┼─────────┤
      │ w    │ white   │
      ╘══════╧═════════╛

      ┌─────────┬───────────┐
      │ Number  │ Color     │
      ├─────────┼───────────┤
      │ 0.75    │ 0.75 gray │
      ├─────────┼───────────┤
      │ ...etc. │           │
      ╘═════════╧═══════════╛

      ┌──────────────────┐
      │ HTML Color Names │
      ├──────────────────┤
      │ red              │
      ├──────────────────┤
      │ burlywood        │
      ├──────────────────┤
      │ chartreuse       │
      ├──────────────────┤
      │ ...etc.          │
      ╘══════════════════╛

      Color reference: <http://matplotlib.org/api/colors_api.html>
      ┌──────┬──────────────┐
      │ Code │ Lines        │
      ├──────┼──────────────┤
      │ •    │ solid        │
      ├──────┼──────────────┤
      │ --   │ dashed       │
      ├──────┼──────────────┤
      │ -.   │ dash_dot     │
      ├──────┼──────────────┤
      │ :    │ dotted       │
      ├──────┼──────────────┤
      │ None │ draw nothing │
      ├──────┼──────────────┤
      │ ' '  │ draw nothing │
      ├──────┼──────────────┤
      │ ''   │ draw nothing │
      ╘══════╧══════════════╛

      Line reference: <http://matplotlib.org/api/artist_api.html>
      ┌──────┬────────────────┐
      │ Code │ Markers        │
      ├──────┼────────────────┤
      │ .    │ point          │
      ├──────┼────────────────┤
      │ o    │ circle         │
      ├──────┼────────────────┤
      │ v    │ triangle down  │
      ├──────┼────────────────┤
      │ ^    │ triangle up    │
      ├──────┼────────────────┤
      │ <    │ triangle left  │
      ├──────┼────────────────┤
      │ >    │ triangle right │
      ├──────┼────────────────┤
      │ 1    │ tri_down       │
      ├──────┼────────────────┤
      │ 2    │ tri_up         │
      ├──────┼────────────────┤
      │ 3    │ tri_left       │
      ├──────┼────────────────┤
      │ 4    │ tri_right      │
      ├──────┼────────────────┤
      │ 8    │ octagon        │
      ├──────┼────────────────┤
      │ s    │ square         │
      ├──────┼────────────────┤
      │ p    │ pentagon       │
      ├──────┼────────────────┤
      │ •    │ star           │
      ├──────┼────────────────┤
      │ h    │ hexagon1       │
      ├──────┼────────────────┤
      │ H    │ hexagon2       │
      ├──────┼────────────────┤
      │ •    │ plus           │
      ├──────┼────────────────┤
      │ x    │ x              │
      ├──────┼────────────────┤
      │ D    │ diamond        │
      ├──────┼────────────────┤
      │ d    │ thin diamond   │
      ├──────┼────────────────┤
      │ _    │ hline          │
      ├──────┼────────────────┤
      │ None │ nothing        │
      ├──────┼────────────────┤
      │ ' '  │ nothing        │
      ├──────┼────────────────┤
      │ ''   │ nothing        │
      ╘══════╧════════════════╛

      Marker reference: <http://matplotlib.org/api/markers_api.html>
  --logx
      DEPRECATED: use '--xaxis="log"' instead.
  --logy
      DEPRECATED: use '--yaxis="log"' instead.
  --xaxis <str>
      Defines the type of the xaxis. One of 'arithmetic', 'log'. Default is
      'arithmetic'.
  --yaxis <str>
      Defines the type of the yaxis. One of 'arithmetic', 'log'. Default is
      'arithmetic'.
  --xlim XLIM
      Comma separated lower and upper limits (--xlim 1,1000) Limits for the
      x-axis. Default is based on range of x values.
  --ylim YLIM
      Comma separated lower and upper limits (--ylim 1,1000) Limits for the
      y-axis. Default is based on range of y values.
  --secondary_y
      Boolean or sequence, default False Whether to plot on the secondary y-axis
      If a list/tuple, which time-series to plot on secondary y-axis
  --mark_right
      Boolean, default True : When using a secondary_y axis, should the legend
      label the axis of the various time-series automatically
  --scatter_matrix_diagonal <str>
      If plot type is 'scatter_matrix', this specifies the plot along the
      diagonal. Defaults to 'probability_density'.
  --bootstrap_size BOOTSTRAP_SIZE
      The size of the random subset for 'bootstrap' plot. Defaults to 50.
  --bootstrap_samples BOOTSTRAP_SAMPLES
      The number of random subsets of 'bootstrap_size'. Defaults to 500.
  --norm_xaxis
      DEPRECATED: use '--type="norm_xaxis"' instead.
  --norm_yaxis
      DEPRECATED: use '--type="norm_yaxis"' instead.
  --lognorm_xaxis
      DEPRECATED: use '--type="lognorm_xaxis"' instead.
  --lognorm_yaxis
      DEPRECATED: use '--type="lognorm_yaxis"' instead.
  --xy_match_line <str>
      Will add a match line where x == y. Default is ''. Set to a line style
      code.
  --grid GRID
      Boolean, default True Whether to plot grid lines on the major ticks.
  -i <str> | --input_ts <str>
      Filename with data in 'ISOdate,value' format or '-' for stdin.
  -s <str> | --start_date <str>
      The start_date of the series in ISOdatetime format, or 'None' for
      beginning.
  -e <str> | --end_date <str>
      The end_date of the series in ISOdatetime format, or 'None' for end.
  --label_rotation <int>
      Rotation for major labels for bar plots.
  --label_skip <int>
      Skip for major labels for bar plots.
  --force_freq FORCE_FREQ
      Force this frequency for the plot. WARNING: you may lose data if not
      careful with this option. In general, letting the algorithm
      determine the frequency should always work, but this option will
      override. Use PANDAS offset codes,
  --drawstyle <str>
      'default' connects the points with lines. The steps variants produce
      step-plots. 'steps' is equivalent to 'steps-pre' and is maintained
      for backward-compatibility. ACCEPTS:
      ['default' | 'steps' | 'steps-pre' | 'steps-mid' | 'steps-post']

  --por
      Plot from first good value to last good value. Strip NANs from beginning
      and end.
  --columns COLUMNS
      Columns to pick out of input. Can use column names or column numbers. If
      using numbers, column number 1 is the first data column. To pick
      multiple columns; separate by commas with no spaces. As used in
      'pick' command.
  --invert_xaxis
      Invert the x-axis.
  --invert_yaxis
      Invert the y-axis.
  --plotting_position <str>
      'weibull', 'benard', 'tukey', 'gumbel', 'hazen', 'cunnane', or
      'california'. The default is 'weibull'.
      ┌────────────┬─────────────────┬───────────────────────┐
      │ weibull    │ i/(n+1)         │ mean of sampling      │
      │            │                 │ distribution          │
      ├────────────┼─────────────────┼───────────────────────┤
      │ benard     │ (i-0.3)/(n+0.4) │ approx. median of     │
      │            │                 │ sampling distribution │
      ├────────────┼─────────────────┼───────────────────────┤
      │ tukey      │ (i-1/3)/(n+1/3) │ approx. median of     │
      │            │                 │ sampling distribution │
      ├────────────┼─────────────────┼───────────────────────┤
      │ gumbel     │ (i-1)/(n-1)     │ mode of sampling      │
      │            │                 │ distribution          │
      ├────────────┼─────────────────┼───────────────────────┤
      │ hazen      │ (i-1/2)/n       │ midpoints of n equal  │
      │            │                 │ intervals             │
      ├────────────┼─────────────────┼───────────────────────┤
      │ cunnane    │ (i-2/5)/(n+1/5) │ subjective            │
      ├────────────┼─────────────────┼───────────────────────┤
      │ california │ i/n             │                       │
      ╘════════════╧═════════════════╧═══════════════════════╛

      Where 'i' is the sorted rank of the y value, and 'n' is the total number
      of values to be plotted.
      Only used for norm_xaxis, norm_yaxis, lognorm_xaxis, lognorm_yaxis,
      weibull_xaxis, and weibull_yaxis.

In [18]:
tstoolbox plot --ofilename flow.png --ytitle 'Flow (cfs)' --title '02325000: FENHOLLOWAY RIVER NEAR PERRY, FLA' --legend False < 02325000_flow.csv



Monthly Average Flow

You can also use tstoolbox to make calculations on the time-series, for example to aggregate to monthly average flow:


In [21]:
tstoolbox aggregate --groupby M --statistic mean < 02325000_flow.csv | head


Datetime,USGS-02325000-00060_mean
2000-01-31,80
2000-02-29,89.7931
2000-03-31,80.0323
2000-04-30,81.7667
2000-05-31,90.8387
2000-06-30,94.4
2000-07-31,85.9032
2000-08-31,83.0323
2000-09-30,128.067

In [20]:
tstoolbox aggregate --groupby M --statistic mean < 02325000_flow.csv | tstoolbox plot --ofilename plot_monthly.png --drawstyle steps-pre