In this Jupyter Notebook, the concept of the Pastas TimeSeries class is explained in full detail.
"To create one class that deals with all user-provided time series and the manipulations of the series while maintaining the original series."
The definition of the class can be found on GitHub (https://github.com/pastas/pastas/blob/master/pastas/timeseries.py). Documentation on the Pandas Series can be found here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.html
The central idea behind the TimeSeries class is to handle all data manipulations in a single class while maintaining the original time series. Because the original data are never overwritten while you work with your Pastas model, only the settings and the original series need to be stored.
In [1]:
# Import some packages
import pastas as ps
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
Let's first import some time series so we have some data to play around with. We use the Pandas read_csv method and select a single column to obtain a Pandas Series object, a Pandas data structure that deals efficiently with 1D time series data. By default, Pandas adds a wealth of functionality to a Series object, such as descriptive statistics (e.g. series.describe()) and plotting functionality.
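As a small illustration of what a plain Pandas Series offers before Pastas is involved, the sketch below builds a synthetic daily series (the values are made up and not taken from the data file) and calls describe() and a date-based selection on it.

```python
import pandas as pd

# Synthetic daily series; the values are invented for illustration only
index = pd.date_range("2020-01-01", periods=5, freq="D")
series = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=index, name="h")

# Descriptive statistics: count, mean, std, min, quartiles and max
stats = series.describe()
print(stats)

# Label-based selection on the DatetimeIndex (both end points are included)
first_three = series.loc["2020-01-01":"2020-01-03"]
print(first_three)
```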
In [2]:
gwdata = pd.read_csv('../data/B58C0698001_0.csv', skiprows=11,
                     parse_dates=['PEIL DATUM TIJD'],
                     index_col='PEIL DATUM TIJD',
                     skipinitialspace=True)
gwdata.rename(columns={'STAND (MV)': 'h'}, inplace=True)
gwdata.index.names = ['date']
gwdata = gwdata.h * 0.01
gwdata.plot(figsize=(15,4))
Out[2]:
The user provides time series data when creating a model instance or one of the stress models found in stressmodels.py. Pastas expects Pandas Series as the standard format in which time series are provided, but internally transforms these to Pastas TimeSeries objects to add the necessary functionality. It is therefore also possible to provide a TimeSeries object directly instead of a Pandas Series object.
We will now create a TimeSeries object for the groundwater level (gwdata). When creating a TimeSeries object the time series that are provided are validated, such that Pastas can use the provided time series for simulation without errors. The time series are checked for:
If all of the above is OK, a TimeSeries object is returned. When valid time series are provided, all of the above checks pass and no settings are required. However, all too often this is not the case and at least "fill_nan" and "freq" are required. The fill_nan argument tells the TimeSeries object how to handle nan-values, and the freq argument provides the frequency of the original time series (by default, freq="D" and fill_nan="interpolate").
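To make the difference between interpolating and dropping nan-values concrete, here is a minimal sketch in plain Pandas. It assumes that the "interpolate" and "drop" options behave roughly like the Pandas interpolate and dropna methods; that mapping is an assumption for illustration, not a description of the Pastas internals. The data are synthetic.

```python
import numpy as np
import pandas as pd

# Synthetic daily series with two missing values
index = pd.date_range("2020-01-01", periods=5, freq="D")
raw = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0], index=index)

# "interpolate"-like behaviour: fill gaps linearly in time
interpolated = raw.interpolate(method="time")

# "drop"-like behaviour: remove the missing values, which leaves
# a series with an irregular time step
dropped = raw.dropna()

print(interpolated)
print(dropped)
```

The raw series is left untouched by both operations, which mirrors the idea of keeping the original data intact.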
In [3]:
oseries = ps.TimeSeries(gwdata, name="Groundwater Level")
# Plot the new time series and the original
plt.figure(figsize=(10,4))
oseries.plot(label="pastas timeseries")
gwdata.plot(label="original")
plt.legend()
Out[3]:
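So let's see how we can configure a TimeSeries object. In the case of the observed groundwater levels (oseries) as in the example above, interpolating between observations might not be the preferred method to deal with gaps in your data. In fact, the time steps between observations do not have to be constant for simulation, which is one of the benefits of the method of impulse response functions. The nan-values can simply be dropped. To configure a TimeSeries object the user has three options:
1. Pass the name of a predefined settings configuration as a string through the settings argument.
2. Pass a dictionary with the individual settings through the settings argument.
3. Pass the individual settings as keyword arguments.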
For example, when creating a TimeSeries object for the groundwater levels consider the three following examples for setting the fill_nan option:
In [4]:
# Option 1
oseries = ps.TimeSeries(gwdata, name="Groundwater Level", settings="oseries")
print(oseries.settings)
In [5]:
# Option 2
oseries = ps.TimeSeries(gwdata, name="Groundwater Level", settings=dict(fill_nan="drop"))
print(oseries.settings)
In [6]:
# Option 3
oseries = ps.TimeSeries(gwdata, name="Groundwater Level", fill_nan="drop")
print(oseries.settings)
All of the above methods yield the same result. It is up to the user which one is preferred.
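One way to picture why the three calls end up with the same settings is a small resolution function that merges defaults, a predefined or user-supplied dictionary, and keyword arguments. The sketch below is hypothetical: the names DEFAULTS, PREDEFINED and resolve_settings are invented here, and the contents of the "oseries" entry are an assumption based only on the statement that it yields the same result as fill_nan="drop".

```python
# Hypothetical defaults, taken from the defaults mentioned in the text
DEFAULTS = {"freq": "D", "fill_nan": "interpolate"}

# Hypothetical predefined configurations (assumed content, illustration only)
PREDEFINED = {"oseries": {"fill_nan": "drop"}}


def resolve_settings(settings=None, **kwargs):
    """Merge defaults, a settings string or dict, and keyword arguments."""
    resolved = dict(DEFAULTS)
    if isinstance(settings, str):
        # Option 1: look up a predefined configuration by name
        resolved.update(PREDEFINED[settings])
    elif isinstance(settings, dict):
        # Option 2: take the settings from a dictionary
        resolved.update(settings)
    # Option 3: individual keyword arguments are applied last
    resolved.update(kwargs)
    return resolved


option_1 = resolve_settings(settings="oseries")
option_2 = resolve_settings(settings={"fill_nan": "drop"})
option_3 = resolve_settings(fill_nan="drop")
print(option_1, option_2, option_3)
```

In this sketch keyword arguments are applied last, so they would override a predefined value; whether Pastas uses the same precedence is not confirmed here.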
A question that may arise with option 1 is what the possible strings for settings are and what configuration is then used. The TimeSeries class contains a dictionary with predefined settings that are used often. You can ask the TimeSeries class this question:
In [7]:
pd.DataFrame(ps.TimeSeries._predefined_settings).T
Out[7]:
As said, Pastas TimeSeries are capable of handling time series in a way that is convenient for Pastas.
We will now import a precipitation series measured at a daily frequency and show how the above methods work.
In [8]:
# Import observed precipitation series
precip = pd.read_csv('../data/Heibloem_rain_data.dat', skiprows=4,
                     sep=r'\s+', parse_dates=['date'],
                     index_col='date')
# Select the year 2012 from the precipitation column
precip = precip.precip.loc["2012"]
precip = precip / 1000.0 # Meters
prec = ps.TimeSeries(precip, name="Precipitation", settings="prec")
In [9]:
import matplotlib.dates as mdates

fig, ax = plt.subplots(2, 1, figsize=(10,8))
# Daily values in the upper panel
prec.update_series(freq="D")
prec.series.plot.bar(ax=ax[0])
# Values aggregated over 7 days in the lower panel
prec.update_series(freq="7D")
prec.series.plot.bar(ax=ax[1])
ax[1].fmt_xdata = mdates.DateFormatter('%m')
fig.autofmt_xdate()
We just changed the frequency of the TimeSeries. When reducing the frequency, the values were summed into the new bins. Conveniently, all pandas methods are still available and functional, such as the great plotting functionalities of Pandas.
All this happened in place, meaning the same object just took another shape based on the new settings. Moreover, it applied those new settings (freq="7D", i.e. weekly values) to the original series. This means that going back and forth between frequencies does not lead to any information loss.
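The reason no information is lost can be sketched in plain Pandas: as long as every frequency change is computed from the untouched original series, a coarser view never overwrites the finer data. The helper apply_freq below is hypothetical and uses Pandas resample with a sum, which is an assumption about how such an aggregation could be done rather than a copy of the Pastas implementation. The data are synthetic.

```python
import pandas as pd

# Synthetic original series: 14 days with a value of 1.0 per day
index = pd.date_range("2012-01-01", periods=14, freq="D")
original = pd.Series(1.0, index=index)


def apply_freq(original, freq):
    """Return a new series at `freq`, always computed from the original."""
    return original.resample(freq).sum()


# Aggregate to 7-day bins: the daily values are summed per bin
weekly = apply_freq(original, "7D")

# Going back to daily values starts from the original again,
# so the daily detail is fully recovered
daily_again = apply_freq(original, "D")

print(weekly)
print(daily_again.equals(original))
```

Had the daily values been derived from the weekly series instead, the distribution within each week would have been lost.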
Why is this so important? Because when solving or simulating a model, the Model will ask every member of the TimeSeries family to prepare itself with the necessary settings (e.g. new freq) and perform that operation only once. When asked for a time series, the TimeSeries object will "be" in that new shape.
Let's say, we want to simulate the groundwater series for a period where no data is available for the time series, but we need some kind of value for the warmup period to prevent things from getting messy. The TimeSeries object can easily extend itself, as the following example shows.
In [10]:
prec.update_series(tmin="2011")
prec.plot()
prec.settings
Out[10]:
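Extending a series to an earlier start date can be pictured with a Pandas reindex. The sketch below is a rough analogy under stated assumptions: the data are synthetic, and filling the added days with the mean of the series is a choice made for this example only, not necessarily what the TimeSeries object does.

```python
import pandas as pd

# Synthetic daily series of three values
index = pd.date_range("2012-01-01", periods=3, freq="D")
series = pd.Series([0.001, 0.0, 0.002], index=index)

# Build a longer daily index that starts three days earlier
new_index = pd.date_range("2011-12-29", index[-1], freq="D")

# Reindex onto the longer index; the days without data receive the mean
# (assumed fill value, chosen for this sketch)
extended = series.reindex(new_index, fill_value=series.mean())

print(extended)
```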
When done, we might want to store the TimeSeries object for later use. A dump method is built in to export the original time series to a json format, along with its current settings and name. This way the original data are maintained and the TimeSeries object can easily be recreated from a json file.
In [11]:
data = prec.dump()
print(data.keys())
In [12]:
# Tadaa, we have our extended time series in weekly frequency back!
ts = ps.TimeSeries(**data)
ts.plot()
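The store-and-restore idea can be sketched with the standard json module alone. The record below is hypothetical: the keys "name", "settings" and "series" and all values are assumptions chosen to match the three parts mentioned in the text, not the actual output of dump().

```python
import json

# Hypothetical record: original data plus name and current settings
record = {
    "name": "Precipitation",
    "settings": {"freq": "7D", "tmin": "2011"},
    "series": {"2012-01-01": 0.001, "2012-01-02": 0.0},
}

# Serialize to a json string (this could equally be written to a file)
text = json.dumps(record)

# Restore the record; the original values and the settings both survive
restored = json.loads(text)
print(restored == record)
```

Because the settings travel with the original data, the manipulated series does not need to be stored separately.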