In [1]:
# First perform the necessary imports
import pandas as pd
import matplotlib.pyplot as plt
import pastas as ps
%matplotlib inline
In this codeblock a time series of groundwater levels is imported using the read_csv
function of pandas
. As pastas
expects a pandas
Series
object, the data is squeezed. To check if you have the correct data type (a pandas Series
object), you can use type(oseries)
as shown below.
The following characteristics are important when importing and preparing the observed time series:
pandas Series
object.
In [2]:
# Import groundwater time seriesm and squeeze to Series object
gwdata = pd.read_csv('../data/head_nb1.csv', parse_dates=['date'],
index_col='date', squeeze=True)
print('The data type of the oseries is: %s' % type(gwdata))
# Plot the observed groundwater levels
gwdata.plot(style='.', figsize=(10, 4))
plt.ylabel('Head [m]');
plt.xlabel('Time [years]');
Two explanatory series are used: the precipitation and the potential evaporation. These need to be pandas Series
objects, as for the observed heads.
Important characteristics of these time series are:
pandas Series
objects.
In [3]:
# Import observed precipitation series
precip = pd.read_csv('../data/rain_nb1.csv', parse_dates=['date'],
index_col='date', squeeze=True)
print('The data type of the precip series is: %s' % type(precip))
# Import observed evaporation series
evap = pd.read_csv('../data/evap_nb1.csv', parse_dates=['date'],
index_col='date', squeeze=True)
print('The data type of the evap series is: %s' % type(evap))
# Calculate the recharge to the groundwater
recharge = precip - evap
print('The data type of the recharge series is: %s' % type(recharge))
# Plot the time series of the precipitation and evaporation
plt.figure()
recharge.plot(label='Recharge', figsize=(10, 4))
plt.xlabel('Time [years]')
plt.ylabel('Recharge (m/year)');
In this code block the actual time series model is created. First, an instance of the Model
class is created (named ml
here). Second, the different components of the time series model are created and added to the model. The imported time series are automatically checked for missing values and other inconsistencies. The keyword argument fillnan can be used to determine how missing values are handled. If any nan-values are found this will be reported by pastas
.
In [4]:
# Create a model object by passing it the observed series
ml = ps.Model(gwdata, name="GWL")
# Add the recharge data as explanatory variable
ts1 = ps.StressModel(recharge, ps.Gamma, name='recharge', settings="evap")
ml.add_stressmodel(ts1)
The next step is to compute the optimal model parameters. The default solver uses a non-linear least squares method for the optimization. The python package scipy
is used (info on scipy's
least_squares solver can be found here). Some standard optimization statistics are reported along with the optimized parameter values and correlations.
In [5]:
ml.solve()
In [6]:
ml.plot()
Out[6]:
There are many ways to further explore the time series model. pastas
has some built-in functionalities that will provide the user with a quick overview of the model. The plots
subpackage contains all the options. One of these is the method plots.results
which provides a plot with more information.
In [7]:
ml.plots.results(figsize=(10, 6))
Out[7]:
In [8]:
ml.stats.summary()
Out[8]:
In the previous model, the recharge was estimated as precipitation minus potential evaporation. A better model is to estimate the actual evaporation as a factor (called the evaporation factor here) times the potential evaporation. First, new model is created (called ml2
here so that the original model ml
does not get overwritten). Second, the StressModel2
object is created, which combines the precipitation and evaporation series and adds a parameter for the evaporation factor f
. The StressModel2
object is added to the model, the model is solved, and the results and statistics are plotted to the screen. Note that the new model gives a better fit (lower root mean squared error and higher explained variance), and that the Akiake information criterion indicates that the addition of the additional parameter improved the model signficantly (the Akaike criterion for model ml2
is higher than for model ml
).
In [9]:
# Create a model object by passing it the observed series
ml2 = ps.Model(gwdata)
# Add the recharge data as explanatory variable
ts1 = ps.StressModel2([precip, evap], ps.Gamma, name='rainevap', settings=("prec", "evap"))
ml2.add_stressmodel(ts1)
# Solve the model
ml2.solve()
# Plot the results
ml2.plot()
# Statistics
ml2.stats.summary()
Out[9]: