In this notebook, we'll explore some functionality of the models of this package. We'll work with the coupled CemaneigeGR4j model that is implemented in `rrmpg.models`

module. The data we'll use, comes from the CAMELS [1] data set. For some basins, the data is provided within this Python library and can be easily imported using the `CAMELSLoader`

class implemented in the `rrmpg.data`

module.

In summary we'll look at:

- How you can create a model instance.
- How we can use the CAMELSLoader.
- How you can fit the model parameters to observed discharge by:
- Using one of SciPy's global optimizer
- Monte-Carlo-Simulation

- How you can use a fitted model to calculate the simulated discharge.

[1] Addor, N., A.J. Newman, N. Mizukami, and M.P. Clark, 2017: The CAMELS data set: catchment attributes and meteorology for large-sample studies. version 2.0. Boulder, CO: UCAR/NCAR. doi:10.5065/D6G73C3Q

```
In [1]:
```# Imports and Notebook setup
from timeit import timeit
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from rrmpg.models import CemaneigeGR4J
from rrmpg.data import CAMELSLoader
from rrmpg.tools.monte_carlo import monte_carlo
from rrmpg.utils.metrics import calc_nse

As a first step let us have a look how we can create one of the models implemented in `rrmpg.models`

. Basically, for all models we have two different options:

- Initialize a model
**without**specific model parameters. - Initialize a model
**with**specific model parameters.

The documentation provides a list of all model parameters. Alternatively we can look at `help()`

for the model (e.g. `help(CemaneigeGR4J)`

).

If no specific model parameters are provided upon intialization, random parameters will be generated that are in between the default parameter bounds. We can look at these bounds by calling `.get_param_bounds()`

method on the model object and check the current parameter values by calling `.get_params()`

method.

For now we don't know any specific parameter values, so we'll create one with random parameters.

```
In [2]:
```model = CemaneigeGR4J()
model.get_params()

```
Out[2]:
```

Here we can see the six model parameters of CemaneigeGR4J model and their current values.

To have data to start with, we can use the `CAMELSLoader`

class to load data of provided basins from the CAMELS dataset. To get a list of all available basins that are provided within this library, we can use the `.get_basin_numbers()`

method. For now we will use the provided basin number `01031500`

.

```
In [3]:
```df = CAMELSLoader().load_basin('01031500')
df.head()

```
Out[3]:
```

Next we will split the data into a calibration period, which we will use to find a set of good model parameters, and a validation period, we will use the see how good our model works on unseen data. As in the CAMELS data set publication, we will use the first 15 hydrological years for calibration. The rest of the data will be used for validation.

Because the index of the dataframe is in pandas Datetime format, we can easily split the dataframe into two parts

```
In [4]:
```# calcute the end date of the calibration period
end_cal = pd.to_datetime(f"{df.index[0].year + 15}/09/30", yearfirst=True)
# validation period starts one day later
start_val = end_cal + pd.DateOffset(days=1)
# split the data into two parts
cal = df[:end_cal].copy()
val = df[start_val:].copy()

As already said above, we'll look at two different methods implemented in this library:

- Using one of SciPy's global optimizer
- Monte-Carlo-Simulation

Each model has a `.fit()`

method. This function uses the global optimizer differential evolution from the scipy package to find the set of model parameters that produce the best simulation, regarding the provided observed discharge array.
The inputs for this function can be found in the documentation or the `help()`

.

```
In [5]:
```help(model.fit)

```
```

`CAMELSLoader`

class via the `.get_station_height()`

method.

```
In [6]:
```# calculate mean temp for calibration and validation period
cal['tmean'] = (cal['tmin(C)'] + cal['tmax(C)']) / 2
val['tmean'] = (val['tmin(C)'] + val['tmax(C)']) / 2
# load the gauge station height
height = CAMELSLoader().get_station_height('01031500')

```
In [7]:
```# We don't have an initial value for the snow storage, so we omit this input
result = model.fit(cal['QObs(mm/d)'], cal['prcp(mm/day)'], cal['tmean'],
cal['tmin(C)'], cal['tmax(C)'], cal['PET'], height)

`result`

is an object defined by the scipy library and contains the optimized model parameters, as well as some more information on the optimization process. Let us have a look at this object:

```
In [8]:
``````
result
```

```
Out[8]:
```

The relevant information here is:

`fun`

is the final value of our optimization criterion (the mean-squared-error in this case)`message`

describes the cause of the optimization termination`nfev`

is the number of model simulations`sucess`

is a flag wether or not the optimization was successful`x`

are the optimized model parameters

Next, let us set the model parameters to the optimized ones found by the search. Therefore we need to create a dictonary containing one key for each model parameter and as the corresponding value the optimized parameter. As mentioned before, the list of model parameter names can be retrieved by the `model.get_parameter_names()`

function. We can then create the needed dictonary by the following lines of code:

```
In [9]:
```params = {}
param_names = model.get_parameter_names()
for i, param in enumerate(param_names):
params[param] = result.x[i]
# This line set the model parameters to the ones specified in the dict
model.set_params(params)
# To be sure, let's look at the current model parameters
model.get_params()

```
Out[9]:
```

`result.x`

. In `result.x`

they are ordered according to the ordering of the `_param_list`

specified in each model class, where ass the dictonary output here is alphabetically sorted.

```
In [10]:
```help(monte_carlo)

```
```

`model.simulate()`

function. Let us create a new model instance and see how this works for the CemaneigeGR4J model.

```
In [11]:
```model2 = CemaneigeGR4J()
# Let use run MC for 1000 runs, which is in the same range as the above optimizer
result_mc = monte_carlo(model2, num=10000, qobs=cal['QObs(mm/d)'],
prec=cal['prcp(mm/day)'], mean_temp=cal['tmean'],
min_temp=cal['tmin(C)'], max_temp=cal['tmax(C)'],
etp=cal['PET'], met_station_height=height)
# Get the index of the best fit (smallest mean squared error)
idx = np.argmin(result_mc['mse'][~np.isnan(result_mc['mse'])])
# Get the optimal parameters and set them as model parameters
optim_params = result_mc['params'][idx]
params = {}
for i, param in enumerate(param_names):
params[param] = optim_params[i]
# This line set the model parameters to the ones specified in the dict
model2.set_params(params)

```
In [12]:
```# simulated discharge of the model optimized by the .fit() function
val['qsim_fit'] = model.simulate(val['prcp(mm/day)'], val['tmean'],
val['tmin(C)'], val['tmax(C)'],
val['PET'], height)
# simulated discharge of the model optimized by monte-carlo-sim
val['qsim_mc'] = model2.simulate(val['prcp(mm/day)'], val['tmean'],
val['tmin(C)'], val['tmax(C)'],
val['PET'], height)
# Calculate and print the Nash-Sutcliff-Efficiency for both simulations
nse_fit = calc_nse(val['QObs(mm/d)'], val['qsim_fit'])
nse_mc = calc_nse(val['QObs(mm/d)'], val['qsim_mc'])
print("NSE of the .fit() optimization: {:.4f}".format(nse_fit))
print("NSE of the Monte-Carlo-Simulation: {:.4f}".format(nse_mc))

```
```

```
In [13]:
```# Plot last full hydrological year of the simulation
%matplotlib notebook
start_date = pd.to_datetime("2013/10/01", yearfirst=True)
end_date = pd.to_datetime("2014/09/30", yearfirst=True)
plt.plot(val.loc[start_date:end_date, 'QObs(mm/d)'], label='Qobs')
plt.plot(val.loc[start_date:end_date, 'qsim_fit'], label='Qsim .fit()')
plt.plot(val.loc[start_date:end_date, 'qsim_mc'], label='Qsim mc')
plt.legend()

```
Out[13]:
```

```
In [14]:
```%%timeit
model.simulate(val['prcp(mm/day)'], val['tmean'],
val['tmin(C)'], val['tmax(C)'],
val['PET'], height)

```
```