PEST is a highly capable system for model calibration, and for sensitivity and uncertainty analysis. PEST is independent of any particular modelling software and modellers have connected Source to PEST in a number of ways.
This tutorial looks at the specific support in veneer-py for calibrating Source models with PEST. Notably, you can describe the PEST job in the notebook, and veneer-py will generate the required PEST configuration files.
At the time of writing, this functionality supports basic calibration of model parameters to timeseries observations.
PEST can be downloaded from http://www.pesthomepage.org/. You want the standard PC PEST for Windows - not BEOPEST. PEST is delivered as a zip file - unzip it to a directory on your system, ideally a directory without spaces or special characters in the path (something like C:\PEST is good).
This tutorial will gloss over a lot of PEST specifics in order to focus on the connection to Source and Veneer. The PEST manual is very comprehensive; it is available from the PEST homepage and is also included with the software.
Note: This session uses ExampleProject/CalibrationExample.rsproj. You are welcome to work with your own model instead; however, you will need to change the notebook text at certain points to reflect the names of nodes, links and functions in your model file.
PEST requires a series of configuration files that describe the parameter estimation problem, including:
- a control file (PCF), describing the parameters, observations and PEST settings
- one or more template files (PTF), telling PEST how to write the model's inputs
- one or more instruction files (PIF), telling PEST how to read the model's outputs
- a run management file (PRF), for parallel runs
veneer-py has functionality for describing the calibration problem in Python, and then generating the files expected by PEST. This includes generating a Python script, which becomes the program that PEST invokes. This Python program calls out to a running copy of Veneer to execute the Source model and retrieve results.
When running PEST in parallel, PEST controls a number of 'slave' processes, each of which is responsible for running one copy of the simulation at a time.
In this way, you can run a PEST calibration for a Source model that is running in the main Source Windows application. You can start multiple copies of Source in order to support parallel calibrations. Alternatively, you can use the Veneer Command Line tool.
In [1]:
from veneer.manage import start, create_command_line, kill_all_now
import veneer
In [2]:
# Location of the Veneer binaries matching your Source version
veneer_install = 'D:\\src\\projects\\Veneer\\Compiled\\Source 4.1.1.4484 (public version)'
source_version = '4.1.1'
# Destination directory for the Veneer command line copy
cmd_directory = 'E:\\temp\\veneer_cmd'
path = create_command_line(veneer_install,source_version,dest=cmd_directory)
path
Out[2]:
In [3]:
catchment_project='ExampleProject/CalibrationExample.rsproj'
num_copies=20 # Important - set this to be a number ~ the number of CPU cores in your system!
first_port=9950
In [4]:
processes,ports = start(catchment_project,
                        n_instances=num_copies,
                        ports=first_port,
                        debug=True,
                        veneer_exe=path,
                        remote=False)
As before, we need a copy of the Veneer client for each copy of the server:
In [5]:
vs = [veneer.Veneer(port=p) for p in ports]
In [6]:
%matplotlib inline
v = vs[0]
v.network().as_dataframe().plot()
Out[6]:
When configuring PEST to work with a particular model, you 'teach' PEST how to communicate with the model by describing the format of one or more text input files expected by the model and one or more text output files produced by the model. PEST will then generate updated input files for each simulation and read the updated output files that result from the simulation.
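For example, at its most generic, a PEST template file is just a copy of a model input file with each value to be calibrated replaced by a marked parameter name. A purely illustrative fragment (not the file veneer-py generates - we'll see that shortly):

ptf $
x1 = $x1        $

The first line declares $ as the marker character, and PEST overwrites the marked space with a trial value before each model run.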
veneer-py has some basic functionality for setting up PEST runs that avoids the need to directly edit the PEST configuration files.
With veneer-py, you can describe a PEST job in Python, including describing the Source model parameters that you want to calibrate and the outputs that you want to calibrate against.
veneer-py will then write out the required PEST configuration files - the PCF, PTF, PIF and (for parallel runs) PRF - and invoke PEST.
To establish all these files and run a PEST job, you can use functionality in veneer-py to describe a PEST 'Case'. The 'Case' will ultimately know everything about the calibration.
In [13]:
from veneer.pest import Case
At the very least, we need to give a Case a name, which is the basis for all the filenames that will be written out. You can also specify:
- the optimiser to use (the default is pest, but the PEST software also comes with several others)
- model_servers - the ports of the running Veneer servers that PEST's workers will use
In [15]:
calibration = Case('CalibrationCase',optimiser='cmaes_p',model_servers=ports)
PEST has many options, most of which we leave at their defaults. One option that we currently need is to put PEST into single precision mode. This is because PEST, in double precision mode, uses a syntax for floating point literals that is not valid Python:
In [16]:
calibration.options['PRECIS']='single'
PEST needs to be told about the calibration parameters. This is a two-step process:
1. Tell PEST how to set the parameter in the model. This takes the form of a statement like
model.catchment.runoff.set_param_values('baseflowCoefficient',0.5,fus=list(fu_types))
but with the actual value (0.5) replaced by the PEST parameter name, surrounded by markers (eg @v_bfCoeff@). This information forms part of a dynamically generated Python script that PEST will modify and run for each simulation.
2. Describe the parameter to PEST: its name, starting value, minimum and maximum.
We're performing a rainfall runoff calibration, with the four parameter GR4J rainfall runoff model used in the project.
We're going to perform a lumped calibration - ie one parameter set everywhere - but we could, alternatively, calibrate distinct parameters by functional unit type, or in different parts of the catchment.
In [17]:
v.model.find_model_type('GR4J')
Out[17]:
In [18]:
params = v.model.find_parameters('TIME.Models.RainfallRunoff.GR4J.GR4J')
params
Out[18]:
We only want to calibrate x1-x4. (C and k are specific to the eWater version of GR4J - they provide a baseflow filter.)
In [19]:
params = params[2:]
params
Out[19]:
We need to assign ranges to each of these parameters. The model implementation in Source has metadata about suitable ranges, but at this stage there isn't an easy way to interrogate that information from veneer-py. You can check the ranges in the Source user interface (Edit|Rainfall Runoff Models).
Having done that, we'll construct a Python dictionary of parameter ranges.
Note: These are quite narrow ranges for GR4J. This is done to keep the optimisation short in the context of the tutorial (and because the 'observed' data is actually synthetic data generated from Source)
In [20]:
ranges = {
    'x1':[100.0,500.0],
    'x2':[1.0,5.0],
    'x3':[1.0,200.0],
    'x4':[0.5,3.0]
}
ranges
Out[20]:
Now, we can loop over each parameter and 'teach' PEST about it - ie tell PEST how to modify the parameter and tell PEST what range we want to calibrate over:
In [21]:
for param,param_range in ranges.items():
    print('Configuring %s'%param)
    pest_pname = '$'+param+'$'
    # 1. Tell PEST how to set the parameter
    calibration.parameters.model.catchment.runoff.set_param_values(param,pest_pname)
    # 2. Details of the PEST parameter: name, starting value, min, max
    # Decide what to use for the initial value... half way between min and max!
    initial = 0.5*(param_range[0]+param_range[1])
    calibration.parameters.describe(pest_pname,initial,param_range[0],param_range[1])
Note: When we tell PEST how to set the parameter in the model, we use a Python statement that looks similar to statements from earlier sessions:
calibration.parameters.model.catchment.runoff.set_param_values(param,pest_pname)
# is similar to
v.model.catchment.runoff.set_param_values(param,value)
The instructions to PEST will in fact translate to instructions to a Veneer client (v). So everything after .parameters. should be something you can call on a Veneer client. The other main difference is that, instead of passing in an actual value at this point, you pass pest_pname, which will be something like $x1$ and will get translated to an actual value at runtime.
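This flexibility is also what would support the non-lumped calibration mentioned earlier. A hypothetical sketch, calibrating x1 separately per functional unit type - the FU names and range are placeholders, and it assumes the fus= keyword passes through the parameters proxy just as it does on a Veneer client:

# Hypothetical: distinct PEST parameters for x1 by functional unit type
for fu in ['Grazing','Forest']: # placeholder FU names - use your model's
    pest_pname = '$x1_%s$'%fu[:3]
    calibration.parameters.model.catchment.runoff.set_param_values('x1',pest_pname,fus=[fu])
    calibration.parameters.describe(pest_pname,300.0,100.0,500.0)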
The code so far has gone some way towards configuring the PEST job. We can get a preview of what has been achieved by asking to see the configuration as it stands.
First, the state of the PTF (PEST Template File). You can see that it looks very much like a Python script, except for references to PEST parameters such as $x1$.
In [22]:
print(calibration.ptf_text())
There are still gaps in the PTF - eg the # Compute Stats section - that will be filled in as we describe the outputs and observations.
The PCF (PEST Control File) is also partly complete:
In [23]:
print(calibration.pcf_text())
Note: There are a lot of options in the PCF - and we are using the defaults. They are very well described in the PEST Manual and can be specified using calibration.options.
In [24]:
calibration.options
Out[24]:
PEST needs to know what it's calibrating against. In our case, that means a time series of observed data and a corresponding modelled time series. We also need to specify the objective function and the time period for the calibration.
PEST has a tool for processing time series data (TSPROC), but we don't use it here. Rather, we compute the objective values in Python and just pass single numbers back to PEST.
We'll start by loading the observed flow time series data into this notebook and exploring the data a bit...
Note: Loading the data at this point serves two purposes:
1. We get to explore the data and identify the period available for calibration.
2. We establish the exact pandas command needed to load the data. We need to give this command to PEST later on to call as part of the simulations...
In [26]:
import pandas as pd
flows = pd.read_csv('SyntheticObservedFlow.csv',parse_dates=True,dayfirst=True,index_col=0)
flows[0::50] # Show every fiftieth day
Out[26]:
In [27]:
flows.plot()
Out[27]:
Note: If your observed data had gaps during your simulation period, this would be a good point to establish the overlapping period in order to inform the simulation/calibration dates.
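A minimal sketch of that, assuming any gaps appear as NaNs in the loaded DataFrame:

# Only needed when the observed record has gaps (hypothetical here)
valid = flows.dropna()
start,end = valid.index[[0,-1]] # first and last dates with valid observations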
In our case, the data aligns with the simulation, so the simulation dates we want are just the start and end of the time series:
In [28]:
start,end = flows.index[[0,-1]]
start,end
Out[28]:
This (synthetic) observed flow sequence relates to the (synthetic) gauge towards the bottom of the system. What was it called?
In [29]:
network = v.network()
nodes = network['features'].find_by_feature_type('node')
nodes._all_values('name')
Out[29]:
Aaah, we want 'G123456A'
In [30]:
calibration_node = 'G123456A'
Now we can tell PEST about the observations and the comparison we want.
We need to tell PEST how to load the observed data - and the pandas command we used to do a test load will help:
pd.read_csv('SyntheticObservedFlow.csv',parse_dates=True,dayfirst=True,index_col=0)
# will become
calibration.observations.data.read_csv('SyntheticObservedFlow.csv',parse_dates=True,dayfirst=True,index_col=0)
In [31]:
calibration.observations.data.read_csv('SyntheticObservedFlow.csv',parse_dates=True,dayfirst=True,index_col=0)
And we can set up the comparison:
In [32]:
comparison={'NetworkElement':calibration_node,'RecordingVariable':'Downstream Flow Volume'}
veneer-py configures the observation based on the column name in the observed flow file (so that you can have multiple comparisons from different columns and files).
In [33]:
flows.columns
Out[33]:
We also need to reference a stats function. You can write your own (but you'll need to store it in a .py file), or you can use one from veneer.stats.
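If you do write your own, a minimal sketch might look like the following, saved in a file such as my_stats.py (a hypothetical name), assuming the same calling convention as the functions in veneer.stats - observed and modelled series in, a single number out:

# my_stats.py (hypothetical) - a custom objective function
# Assumes the veneer.stats convention: two series in, one number out
import numpy as np

def rmse(obs,sim):
    '''Root mean squared error between observed and modelled series'''
    return float(np.sqrt(np.mean((obs-sim)**2)))

You could then pass stat=my_stats.rmse when setting up the comparison. Here, we'll use the provided NSE statistic: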
In [34]:
from veneer import stats
In [35]:
dir(stats)
Out[35]:
In [36]:
help(stats.nse)
In [37]:
calibration.observations.compare('Flow',comparison,stat=stats.nse,aggregation='daily')
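If you had additional observed columns and gauges, you could add more comparisons in the same way. A sketch, commented out because the column and gauge names are hypothetical:

#calibration.observations.compare('UpstreamFlow',
#    {'NetworkElement':'G111111A','RecordingVariable':'Downstream Flow Volume'},
#    stat=stats.nse,aggregation='daily')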
We need to do one more thing: make sure that each of our servers is configured to record the output we require.
We'll loop over each of our Veneer clients and configure recording. We have comparison, which describes what we want to record, and we can disable all other outputs:
In [38]:
for v in vs:
    veneer.log('Configuring recording for server on port %d'%v.port)
    v.configure_recording(enable=[comparison],disable=[{}])
If we look at the content of the PEST config files now, we'll see more details filled in:
In [39]:
print(calibration.ptf_text())
In [40]:
print(calibration.pif_text())
In [41]:
print(calibration.pcf_text())
In [42]:
print(calibration.prf_text())
We can now invoke the PEST run.
When you call calibration.run(), PEST will start - in this case, in parallel mode with one worker per Veneer server. However, all PEST output goes to files (including the run record file) and to the command prompt window from which you started the Jupyter notebook. You will need to look in those places for the progress of the calibration.
When PEST finishes, you'll get the calibrated parameters back, as well as details of the PEST run. The PEST manual covers the outputs of a PEST run in much detail.
Note: PEST needs to be on your Windows PATH. If it's not, the run() command won't work. You can temporarily add PEST to your path from within the notebook:
In [43]:
pest_path='C:\\PEST'
import os
os.environ['PATH'] = os.environ['PATH']+';'+pest_path
In [44]:
results = calibration.run()
In [45]:
#results = calibration.get_results() # retrieve the results of an earlier run without re-running PEST
results['parameters']
Out[45]:
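From here, you might push the calibrated values back into a model for verification. A sketch, commented out because it assumes results['parameters'] can be treated as a mapping from parameter name (eg 'x1') to calibrated value:

#for param,value in results['parameters'].items():
#    v.model.catchment.runoff.set_param_values(param,value)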
In [46]:
kill_all_now(processes) # shut down all the Veneer servers