The water network has two parts: a streamflow network, made up of connections between river gauges, reservoirs, and junctions, and a canal network. The canal network is bipartite, with links between nodes in the streamflow network and counties. The initwaternet.jl file loads both networks into the global environment. To speed up loading, it uses cached Julia Data (.jld) files, creating them if they are not present in the data directory.
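The caching behaviour described above follows a common load-or-build pattern. The following is a minimal sketch of that pattern, assuming Julia's standard Serialization module; the helper name cached and the example path are hypothetical, and initwaternet.jl may be organised differently.

```julia
using Serialization

# Hypothetical helper: return the object cached at `path` if the file exists;
# otherwise call `build()`, write its result to `path`, and return it.
function cached(build::Function, path::AbstractString)
    if isfile(path)
        return open(deserialize, path, "r")
    end
    value = build()
    open(io -> serialize(io, value), path, "w")
    return value
end

# Stand-in builder for illustration; a real builder would load the network data.
cachepath = joinpath(mktempdir(), "example.jld")
first_load = cached(() -> Dict("fips" => ["36025"]), cachepath)  # builds and writes the cache
second_load = cached(() -> error("not called"), cachepath)       # served from the cache
```

The second call never runs its builder, because the cache file already exists.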

Scientific documentation

The river network is documented here:

The Canal Network

The canal network is first produced in R and then loaded into Julia. In Julia, it is stored as a plain DataFrame, with the names of the river network nodes added.

The columns in the waterdraws.jld data are as follows:

  • The fips column is the FIPS code of the receiving county.
  • The source column is the row number of the gauge, as it appears in the network variable of the waternet.RData file.
  • The justif column provides a justification for why the gauge should be available for feeding the county. It contains short categorical entries, described in the R script that generates the countydraws.RData file (network3/demand/allcounties.R).
  • The downhill column compares the elevation of the county with the elevation of the gauge, if it is known. It is 1 if the county's average elevation is below the gauge's, so that the water can be tapped for free.
  • The exdist column is greater than 0 if the county had to be connected to a gauge arbitrarily to ensure that it had any source. In this case, the county is connected to the closest gauge, and this column is the geodesic distance in km.
  • The gaugeid is the only column added by Julia, and is in the same format as the keys in the wateridverts dictionary, which allows easy access to the nodes in the river network by name.

In [8]:
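using DataFrames, Serialization  # Serialization provides deserialize on Julia 0.7 and later
# Load the cached canal network; open(f, ...) closes the file once deserialize returns
draws = open(deserialize, "../data/waterdraws.jld", "r")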
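To illustrate how the columns described above can be queried, here is a small sketch that uses plain named tuples as a stand-in for the DataFrame rows. The rows are made up, the fips values are assumed to be strings, and the justif and gaugeid columns are left out because their exact formats are not given here.

```julia
# Made-up rows using the documented column names; not real data.
rows = [
    (fips = "01001", source = 3, downhill = 1, exdist = 0.0),
    (fips = "01001", source = 7, downhill = 0, exdist = 0.0),
    (fips = "01003", source = 7, downhill = 0, exdist = 12.5),
]

# Row numbers of the gauges that can feed a given county.
sources_for(rows, fips) = [r.source for r in rows if r.fips == fips]

# Counties that were connected arbitrarily to their closest gauge (exdist > 0).
arbitrary = unique(r.fips for r in rows if r.exdist > 0)

# Draws where the county lies below the gauge, so the water can be tapped for free.
freedraws = [(r.fips, r.source) for r in rows if r.downhill == 1]
```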


Development and contributing

You can replace the data/countydraws.RData file with another R data file that contains a variable named draws. draws must be a data.frame with at least the columns fips and source.

Until the script for generating the countydraws.RData file is migrated into the repository, please do the following to extend the dataset:

  1. Copy the countydraws.RData file into a new sources/waternet directory as countydraws.v1.RData.
  2. Add your script for modifying the data to the same directory, and have it output a new countydraws.v2.RData file. Copy this into the data directory.
  3. If a countydraws.v(N).RData file (for $N \ge 2$) already exists, use the latest one as your input, and output a file named countydraws.v(N+1).RData.
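The versioning rule in the steps above can be sketched as follows. This is an illustration only: the function name latest_version is hypothetical, and the file listing is a made-up stand-in for the contents of the sources/waternet directory (in practice it could come from readdir).

```julia
# Return the highest N among names of the form countydraws.v(N).RData, or 0 if none match.
function latest_version(filenames)
    versions = Int[]
    for name in filenames
        m = match(r"^countydraws\.v(\d+)\.RData$", name)
        m === nothing || push!(versions, parse(Int, m.captures[1]))
    end
    return isempty(versions) ? 0 : maximum(versions)
end

# Made-up directory listing; extend_v2.R is a hypothetical script name.
listing = ["countydraws.v1.RData", "countydraws.v2.RData", "extend_v2.R"]
n = latest_version(listing)
input_file = "countydraws.v$(n).RData"       # latest existing version, used as input
output_file = "countydraws.v$(n + 1).RData"  # file the new script should write
```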

Future work:

  • Have each canal specify a flow limit.
  • Include a column for an optional price for using that canal.

Missing canals

By Laureline

Several utilities and facilities operate across multiple counties. For instance, the New York City water supply system sources its water from the Catskill Mountains and from the Delaware River in Delaware County, and distributes it to all of the boroughs.

Utilities of this type are not rare and occur at many locations across CONTUS. To let the water network link the point of source to the point of use, additional connections have been added to the countydraws file.

Cross-county utilities

The first step is to find the utilities and facilities that operate across multiple counties. This is done by looking up all of the water utilities operating within a given county on the Drinking Water Mapping Application to Protect Source Waters website. The website then redirects to the Safe Drinking Water Information System (SDWIS) Federal Reporting Services, which provides the list of counties served by a given water system. The result is a dataset, canals.txt, whose first column is the source county (identified by FIPS code) and whose remaining columns list the counties served by the water facilities present in the source county.
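As an illustration of the canals.txt layout described above, the sketch below parses such a table into a dictionary from source county to served counties. It assumes whitespace-separated fields and one source county per line, which the description does not confirm; the function name parse_canals is hypothetical, and the sample row is illustrative rather than copied from the real file.

```julia
# Parse text where each line holds a source county FIPS code followed by
# the FIPS codes of the counties served from it.
function parse_canals(text::AbstractString)
    served = Dict{String, Vector{String}}()
    for line in split(text, '\n')
        fields = split(strip(line))
        isempty(fields) && continue          # skip blank lines
        source = String(fields[1])
        append!(get!(served, source, String[]), String.(fields[2:end]))
    end
    return served
end

# Illustrative rows only (assumed layout): Delaware County, NY serving three boroughs.
sample = """
36025 36005 36047 36061
36111 36061
"""
served = parse_canals(sample)
```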

Add missing canals to the water network

The second step is to complete the countydraws dataset. This is done with the R script script_incorporation_missing_data.R, which simply adds a connection between each gauge within a source county and the point-of-use county.
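The logic of this step can be sketched in Julia as follows. This is not the R script itself: the function name missing_canals and the gauges_in mapping from county to gauge row numbers are hypothetical inputs, and skipping links that already exist is an assumption about the desired behaviour.

```julia
# gauges_in: source county FIPS => row numbers of the gauges located in it (hypothetical input)
# served:    source county FIPS => FIPS codes of the counties it serves
# existing:  set of (fips, source) pairs already present in the draws table
function missing_canals(gauges_in, served, existing)
    added = Tuple{String, Int}[]
    for (src, users) in served, user in users, gauge in get(gauges_in, src, Int[])
        link = (user, gauge)
        # Assumption: do not duplicate links that are already known.
        if !(link in existing) && !(link in added)
            push!(added, link)
        end
    end
    return sort(added)
end

# Made-up example: two gauges in the source county, two served counties.
gauges_in = Dict("36025" => [101, 102])
served_by = Dict("36025" => ["36005", "36061"])
existing = Set([("36061", 101)])
newlinks = missing_canals(gauges_in, served_by, existing)
```

Each returned pair holds the fips of the receiving county and the source row number of the gauge, matching the two required columns of draws.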

Current status

Because the number of water utilities is large, this task has not yet been completed for all counties. As a starting point, we focused on problematic areas: counties with a suspicious ratio of public supply withdrawals to population, and counties with large populations. The following plot (fig 1.a) shows the USGS 2010 public supply fresh water withdrawals as a function of population. The green dots are the counties that have been added to the missing canals set. Using this new set of connections, a public supply demand has been estimated by assuming that the withdrawals performed within a given county are distributed to all of the linked counties (including itself) in proportion to their populations. The new estimated demand is shown in fig 1.b as a function of population, and a comparison between demands and withdrawals for the treated counties can be found in fig 1.c.
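The proportional allocation can be written as $D_j = \sum_{i : j \in S_i} W_i P_j / \sum_{k \in S_i} P_k$, where $W_i$ is the withdrawal in county $i$, $P_j$ is the population of county $j$, and $S_i$ is the set made of county $i$ and the counties linked to it. This notation and the sketch below are an interpretation of the description above, not the code used to produce the figures; the function name estimate_demand, the county labels, and the numbers are made up.

```julia
# withdrawals: county => public supply withdrawal performed in that county
# population:  county => population
# linked:      source county => other counties linked to it
function estimate_demand(withdrawals, population, linked)
    demand = Dict(county => 0.0 for county in keys(population))
    for (src, w) in withdrawals
        # The source county shares its withdrawal with its linked counties and itself.
        group = unique(vcat([src], get(linked, src, String[])))
        total = sum(population[c] for c in group)
        for c in group
            demand[c] += w * population[c] / total
        end
    end
    return demand
end

# Made-up example: county A withdraws 100 units and is linked to county B.
withdrawals = Dict("A" => 100.0, "B" => 20.0)
population = Dict("A" => 1000, "B" => 3000)
linked = Dict("A" => ["B"])
demand = estimate_demand(withdrawals, population, linked)
```

In this example A keeps a quarter of its withdrawal and B receives the rest on top of its own, so the total demand equals the total withdrawal.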

The following figure shows the difference between withdrawals and the estimated demand for all of CONTUS.