This notebook is supplementary material to the following paper submitted to Groundwater:
R.A. Collenteur, M. Bakker, R. Calje, S. Klop, F. Schaars, (2019) Pastas: open source software for the analysis of groundwater time series, Manuscript under review.
In this notebook the Pastas "cookbook" recipe is demonstrated. The example investigates how well the heads measured in a borehole near Kingstown, Rhode Island, US, can be simulated using rainfall and potential evaporation. A transfer function noise (TFN) model using impulse response functions is created to simulate the observed heads.
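Conceptually, a TFN model computes the head contribution of a stress as the convolution of that stress with an impulse response function. A minimal numpy sketch with synthetic numbers (the recharge series, response shape, and parameter values below are illustrative only, not the Pastas parameterization):

```python
import numpy as np

rng = np.random.default_rng(1)
recharge = rng.uniform(0, 5e-3, size=365)    # synthetic daily recharge (m/d)

t = np.arange(1.0, 100.0)
theta = t ** 0.5 * np.exp(-t / 15.0)         # gamma-shaped impulse response
theta *= 800.0 / theta.sum()                 # scale so the gain (sum) is 800 d

# Head contribution = convolution of the stress with the impulse response,
# truncated to the length of the stress series
contribution = np.convolve(recharge, theta)[:recharge.size]
```

With a gain of 800 days and daily recharge below 5 mm, the simulated contribution stays below 4 m, as expected from the product of gain and maximum stress.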
The observed heads are obtained from the Ground-Water Climate Response Network (CRN) of the USGS (https://groundwaterwatch.usgs.gov/). The corresponding USGS site id is 412918071321001. The rainfall data is taken from the Global Summary of the Day dataset (GSOD), available at the National Climatic Data Center (NCDC). The rainfall series is obtained from the weather station in Kingston (station number: NCDC WBAN 54796), located at 41.491$^\circ$, -71.541$^\circ$. The evaporation is calculated from the mean temperature obtained from the same weather station using Thornthwaite's method (Pereira and Pruitt, 2004).
Pereira AR, Pruitt WO (2004), Adaptation of the Thornthwaite scheme for estimating daily reference evapotranspiration, Agricultural Water Management 66(3), 251-257
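Thornthwaite's method estimates potential evapotranspiration from mean air temperature alone. A simplified sketch of the monthly scheme (the daylight-length correction is omitted for brevity; the function and the example temperatures are illustrative, not the values used to produce the evaporation series in this notebook):

```python
import numpy as np

def thornthwaite_pet(t_monthly):
    """Unadjusted monthly potential evapotranspiration (mm) after Thornthwaite.

    t_monthly: 12 mean monthly air temperatures (deg C). Months with
    temperatures at or below zero contribute nothing. Assumes at least
    one month is above zero (otherwise the heat index is zero).
    """
    t = np.maximum(np.asarray(t_monthly, dtype=float), 0.0)
    # Annual heat index I, summed from the monthly indices (T/5)^1.514
    i_heat = np.sum((t / 5.0) ** 1.514)
    # Empirical exponent a as a cubic function of the heat index
    a = (6.75e-7 * i_heat ** 3 - 7.71e-5 * i_heat ** 2
         + 1.792e-2 * i_heat + 0.49239)
    return 16.0 * (10.0 * t / i_heat) ** a

# Hypothetical temperate-climate monthly mean temperatures (deg C)
pet = thornthwaite_pet([0, 1, 5, 10, 15, 20, 23, 22, 18, 12, 7, 2])
```

As expected, the estimated evapotranspiration is zero in the coldest month and peaks in the warmest one.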
The first step in creating the TFN model is to import the Python packages. Two packages are used for the analysis: the Pastas package and the Pandas package, the latter to import the time series data. Both are imported under short aliases for convenience (ps for Pastas and pd for Pandas). The other imported packages are not needed for the analysis, but are used to make the publication figures.
In [1]:
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
import pastas as ps
ps.set_log_level("ERROR")
%matplotlib inline
# This notebook has been developed using Pastas version 0.9.9 and Python 3.7
print("Pastas version: {}".format(ps.__version__))
print("Pandas version: {}".format(pd.__version__))
print("Numpy version: {}".format(np.__version__))
print("Python version: {}".format(os.sys.version))
The next step is to import the time series data. Three series are used in this example: the observed groundwater head, the rainfall, and the evaporation. The data can be read using different methods; here the Pandas read_csv method is used to read the csv files. Each file consists of two columns: a date column called 'Date' and a column containing the values of the time series. The first column is used as the index and parsed as dates. The head series is stored in the variable obs, the rainfall in rain, and the evaporation in evap. All series are converted to SI units (meters).
In [2]:
obs = pd.read_csv('obs.csv', index_col='Date', parse_dates=True) * 0.3048
rain = pd.read_csv('rain.csv', index_col='Date', parse_dates=True) * 0.3048
rain = rain.asfreq("D", fill_value=0.0) # There are some nan-values present
evap = pd.read_csv('evap.csv', index_col='Date', parse_dates=True) * 0.3048
A Pastas Model instance is then created, taking the observed heads as input argument; every 14th head observation is used to calibrate the model.
In [3]:
ml = ps.Model(obs.loc[::14], name='Kingstown')
A RechargeModel instance is created and stored in the variable rm, taking the rainfall and potential evaporation time series as input arguments, as well as a name and a response function. In this example the Gamma response function is used (available as ps.Gamma). After creation, the recharge stress model instance is added to the model.
In [4]:
rm = ps.RechargeModel(rain, evap, name='recharge', rfunc=ps.Gamma)
ml.add_stressmodel(rm)
The model parameters are estimated by calling the solve method of the Model instance. In this case the default settings are used for all but the tmax argument. Several options can be specified in the solve method, for example a tmin and tmax, or the type of solver (this defaults to a least-squares solver, ps.LeastSquares). The solve method prints a fit report with basic information about the model setup and the results of the model fit.
In [5]:
ml.solve(tmax="2014");
# Print some information on the model fit for the validation period
print("\nThe R2 and the RMSE in the validation period are {} and {}, respectively.".format(
    ml.stats.rsq(tmin="2015", tmax="2019").round(2),
    ml.stats.rmse(tmin="2015", tmax="2019").round(2)))
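For reference, the two goodness-of-fit metrics reported above are straightforward to compute from an observed and a simulated series. A minimal numpy sketch with made-up arrays (not the model output):

```python
import numpy as np

obs_v = np.array([1.0, 1.4, 1.1, 0.9, 1.3])   # hypothetical observed heads (m)
sim_v = np.array([1.1, 1.3, 1.0, 1.0, 1.2])   # hypothetical simulated heads (m)

residuals = obs_v - sim_v
# Root mean squared error of the residuals
rmse = np.sqrt(np.mean(residuals ** 2))
# R2 = 1 - SSE / total sum of squares around the observed mean
r2 = 1 - np.sum(residuals ** 2) / np.sum((obs_v - obs_v.mean()) ** 2)
```

Pastas computes these metrics for the requested period through ml.stats, so the sketch above is only meant to show what the reported numbers measure.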
The final step of the "cookbook" recipe is to visualize the results of the TFN model. The Pastas package has several built-in plotting methods, available through the ml.plots instance. Here the results plotting method is used. This method plots an overview of the model results, including the simulated and observed groundwater heads, the optimized model parameters, the residuals and the noise, the contribution of each stress model, and the step response function for each stress model.
In [6]:
ml.plots.results(tmax="2018");
In [7]:
ml.plots.diagnostics()
In the next code blocks the figures used in the Pastas paper are created. The first code block sets the matplotlib parameters to obtain publication-quality figures.
In [8]:
# Set matplotlib params to create publication figures
params = {
    'axes.labelsize': 18,
    'axes.grid': True,
    'font.size': 16,
    'font.family': 'serif',
    'legend.fontsize': 16,
    'xtick.labelsize': 16,
    'ytick.labelsize': 16,
    'text.usetex': False,
    'figure.figsize': [8.2, 5],
    'lines.linewidth': 2,
}
plt.rcParams.update(params)
# Save figures or not
savefig = True
figpath = "figures"
if not os.path.exists(figpath):
    os.mkdir(figpath)
In [9]:
rfunc = ps.Gamma(cutoff=0.999)
p = [100, 1.5, 15]
b = np.append(0, rfunc.block(p))
s = rfunc.step(p)
rfunc2 = ps.Hantush(cutoff=0.999)
p2 = [-100, 4, 15]
b2 = np.append(0, rfunc2.block(p2))
s2 = rfunc2.step(p2)
# Make a figure of the step and block response
fig, [ax1, ax2] = plt.subplots(1, 2, sharex=True, figsize=(8, 4))
ax1.plot(b)
ax1.plot(b2)
ax1.set_ylabel("block response")
ax1.set_xlabel("days")
ax1.legend(["Gamma", "Hantush"], handlelength=1.3)
ax1.axhline(0.0, linestyle="--", c="k")
ax2.plot(s)
ax2.plot(s2)
ax2.set_xlim(0,100)
ax2.set_ylim(-105, 105)
ax2.set_ylabel("step response")
ax2.set_xlabel("days")
ax2.axhline(0.0, linestyle="--", c="k")
ax2.annotate('', xy=(95, 100), xytext=(95, 0),
arrowprops={'arrowstyle': '<->'})
ax2.annotate('A', xy=(95, 100), xytext=(85, 50))
ax2.annotate('', xy=(95, -100), xytext=(95, 0),
arrowprops={'arrowstyle': '<->'})
ax2.annotate('A', xy=(95, 100), xytext=(85, -50))
plt.tight_layout()
if savefig:
    path = os.path.join(figpath, "impuls_step_response.eps")
    plt.savefig(path, dpi=300, bbox_inches="tight")
In [10]:
fig, [ax1, ax2, ax3] = plt.subplots(3,1, sharex=True, figsize=(8, 7))
ax1.plot(obs, 'k.',label='obs', markersize=2)
ax1.set_ylabel('head (m)', labelpad=0)
ax1.set_yticks([-4, -3, -2])
plot_rain = ax2.plot(rain * 1000, color='k', label='prec', linewidth=1)
ax2.set_ylabel('rain (mm/d)', labelpad=-5)
ax2.set_xlabel('Date');
ax2.set_ylim([0,150])
ax2.set_yticks(np.arange(0, 151, 50))
plot_evap = ax3.plot(evap * 1000,'k', label='evap', linewidth=1)
ax3.set_ylabel('evap (mm/d)')
ax3.tick_params('y')
ax3.set_ylim([0,8])
plt.xlim(['2003','2019'])
plt.xticks([str(x) for x in np.arange(2004, 2019, 2)], rotation=0, horizontalalignment='center')
ax2.set_xlabel("")
ax3.set_xlabel("year")
if savefig:
    path = os.path.join(figpath, "data_example_1.eps")
    plt.savefig(path, bbox_inches='tight', dpi=300)
In [11]:
# Create the main plot
fig, ax = plt.subplots(figsize=(16,5))
ax.plot(obs, marker=".", c="grey", linestyle=" ")
ax.plot(obs.loc[:"2013":14], marker="x", markersize=7, c="C3", linestyle=" ", mew=2)
ax.plot(ml.simulate(tmax="2019"), c="k")
plt.ylabel('head (m)')
plt.xlabel('year')
plt.title("")
plt.xticks([str(x) for x in np.arange(2004, 2019, 2)], rotation=0, horizontalalignment='center')
plt.xlim('2003', '2019')
plt.ylim(-4.7, -1.6)
plt.yticks(np.arange(-4, -1, 1))
# Create the arrows indicating the calibration and validation period
ax.annotate("calibration period", xy=("2003-01-01", -4.6), xycoords='data',
xytext=(300, 0), textcoords='offset points',
arrowprops=dict(arrowstyle="->"), va="center", ha="center")
ax.annotate("", xy=("2014-01-01", -4.6), xycoords='data',
xytext=(-230, 0), textcoords='offset points',
arrowprops=dict(arrowstyle="->"), va="center", ha="center")
ax.annotate("validation", xy=("2014-01-01", -4.6), xycoords='data',
xytext=(150, 0), textcoords='offset points',
arrowprops=dict(arrowstyle="->"), va="center", ha="center")
ax.annotate("", xy=("2019-01-01", -4.6), xycoords='data',
xytext=(-85, 0), textcoords='offset points',
arrowprops=dict(arrowstyle="->"), va="center", ha="center")
plt.legend(["observed head", "used for calibration","simulated head"], loc=2, numpoints=3)
# Create the inset plot with the step response
ax2 = plt.axes([0.66, 0.65, 0.22, 0.2])
s = ml.get_step_response("recharge")
ax2.plot(s, c="k")
ax2.set_ylabel("response")
ax2.set_xlabel("days", labelpad=-15)
ax2.set_xlim(0, s.index.size)
ax2.set_xticks([0, 300])
if savefig:
    path = os.path.join(figpath, "results.eps")
    plt.savefig(path, bbox_inches='tight', dpi=300)
In [77]:
from matplotlib.font_manager import FontProperties
font = FontProperties()
#font.set_size(10)
font.set_weight('normal')
font.set_family('monospace')
font.set_name("courier new")
plt.text(-1, -1, str(ml.fit_report()), fontproperties=font)
plt.axis('off')
plt.tight_layout()
if savefig:
    path = os.path.join(figpath, "fit_report.eps")
    plt.savefig(path, bbox_inches='tight', dpi=600)
In [13]:
fig, ax1 = plt.subplots(1,1, figsize=(8, 3))
ml.residuals(tmax="2019").plot(ax=ax1, c="k")
ml.noise(tmax="2019").plot(ax=ax1, c="C0")
plt.xticks([str(x) for x in np.arange(2004, 2019, 2)], rotation=0, horizontalalignment='center')
ax1.set_ylabel('(m)')
ax1.set_xlabel('year')
ax1.legend(["residuals", "noise"], ncol=2)
if savefig:
    path = os.path.join(figpath, "residuals.eps")
    plt.savefig(path, bbox_inches='tight', dpi=300)
In [61]:
fig, ax2 = plt.subplots(1,1, figsize=(9, 2))
n = ml.noise()
conf = 1.96 / np.sqrt(n.index.size)
acf = ps.stats.acf(n)
ax2.axhline(conf, linestyle='--', color="dimgray")
ax2.axhline(-conf, linestyle='--', color="dimgray")
ax2.stem(acf.index, acf.values)
ax2.set_ylabel('ACF (-)')
ax2.set_xlabel('lag (days)')
plt.xlim(0, 370)
plt.ylim(-0.25, 0.25)
plt.legend(["95% confidence interval"])
if savefig:
    path = os.path.join(figpath, "acf.eps")
    plt.savefig(path, bbox_inches='tight', dpi=300)
In [15]:
h, test = ps.stats.ljung_box(ml.noise())
print("The hypothesis that there is significant autocorrelation is:", h)
test
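The Ljung-Box test aggregates the sample autocorrelations of the noise into a single Q statistic, which is approximately chi-squared distributed (with as many degrees of freedom as lags tested) under the null hypothesis of no autocorrelation. A minimal numpy sketch of the statistic for equidistant data (illustrative only, not the Pastas implementation):

```python
import numpy as np

def ljung_box_q(x, nlags=10):
    """Ljung-Box Q statistic over lags 1..nlags for an equidistant series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    denom = np.sum(x ** 2)
    # Sample autocorrelations at lags 1..nlags
    rho = np.array([np.sum(x[k:] * x[:-k]) / denom
                    for k in range(1, nlags + 1)])
    # Q = n (n + 2) * sum_k rho_k^2 / (n - k)
    return n * (n + 2) * np.sum(rho ** 2 / (n - np.arange(1, nlags + 1)))

# White noise should yield a small Q relative to the chi-squared quantile
rng = np.random.default_rng(0)
q = ljung_box_q(rng.standard_normal(500), nlags=10)
```

A large Q (beyond the chosen chi-squared critical value) indicates significant autocorrelation in the noise, i.e., a violation of the white-noise assumption underlying the parameter uncertainty estimates.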