Running Source models from Python becomes more compelling when you start running the model multiple times, modifying something (parameters, inputs, structure) about the model between runs.
At the same time, the number of possible actions expands, and in many cases there are multiple possible ways to achieve a particular change.
This session explores iterative runs and some of the ways to make changes to the Source model.
In this session, the changes will be 'non-structural' - which is a loose term, but here it means that we won't be adding or removing nodes, links or catchments and we won't be changing model types within the network.
The types of non-structural changes we will cover include running batches of Scenario Input Sets, modifying Functions and function Variables (including time series and piecewise linear variables), and editing the commands within input sets themselves.
The functionality covered in this session is enough to create some quite sophisticated modelling workflows.
It is worth considering how you organise such workflows.
Notebooks can be useful and important documentation regarding your modelling. The notebooks are a way to make your actions reproducible, both by others in the future and on other Source models.
When moving to more sophisticated notebooks, it is worth considering the following issues:
In order to be standalone, these tutorial notebooks tend to combine all the data and logic into the notebook itself, but this isn't desirable. In some cases, it is worth extracting the data into files so that the notebook can be run in other circumstances with different data. In other cases, there is enough custom logic in the workflow that this logic should be in Python modules in order to facilitate reuse.
We'll mostly take the easy route in these tutorials - and keep it all in the notebook.
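Still, such a separation might look like the following sketch, where the file name (scenarios.csv), the module (project_logic) and its run_scenario function are all hypothetical:

In [ ]:
import pandas as pd
import project_logic                        # hypothetical module holding the reusable workflow logic

scenarios = pd.read_csv('scenarios.csv')    # hypothetical file of scenario parameters
for _, row in scenarios.iterrows():
    project_logic.run_scenario(row)         # the notebook becomes a thin driver over the module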
Perhaps the simplest approach to iterating with Python and Source is to run the model for a series of Scenario Input Sets, where you've separately defined those input sets - e.g. manually.
There are two main ways to use input sets in batch runs:

1. Specify the input set when calling v.run_model. This instructs Source to use that Input Set, including all data sources configured for that Input Set.
2. Call v.apply_input_set before v.run_model. v.apply_input_set will only execute the parameter 'instructions' in the input set.

Option 2 can be useful where you have numerous input sets that need to be combined in various ways for different scenarios.
Here, we will demonstrate option 1 (specify in v.run_model()
) and then illustrate option 2 with a hypothetical example (which has a basis in reality!).
In [1]:
import veneer
v = veneer.Veneer(port=9876)
In [2]:
input_sets = v.input_sets()
input_sets
Out[2]:
In [3]:
input_sets.as_dataframe()
Out[3]:
Note: When looking at the details of the input set from Python, you only see the instructions (under Configuration
). This is because the instructions are the only thing that you can easily modify from Python at this stage. Everything else is there in Source and will be used when you run the model.
We can iterate over the input_sets
list using a Python for
loop, but first, we should establish a way to track the results we are interested in.
The input sets in the sample model simply change the maximum extraction rate at the supply point. The effect of this should be visible at a few points in the model - in terms of storage releases, extractions, shortfalls on the demand and flow out the bottom of the system. Let's ensure we are recording some of those.
In [4]:
things_to_record = [
    {'NetworkElement':'Lower Gauge','RecordingVariable':'Downstream Flow Volume'},
    {'NetworkElement':'Crop Fields'},
    {'NetworkElement':'Recreational Lake','RecordingElement':'StorageVolume'}
]
v.configure_recording(enable=things_to_record)
We now want to iterate over the input sets, running the model each time, and retrieving some results for each run.
We can either:

1. run the model for every input set first and retrieve all the results afterwards, or
2. retrieve the results for each run immediately after it finishes.

Option 2 allows us to drop each run as we go (which can be useful to save memory on big models).
Either way, we need to track information as we go - with option 1 we need to know which run relates to each input set.
We will use a Python dictionary to let us track input set -> results time series.
In [5]:
all_results = {}
A Python for loop takes the form

for element in loop_range:
    do_something_typically_using(element)
where loop_range
is the thing you're looping over and is typically a list or similar. (It needs to be 'iterable')
For example:
In [6]:
for i in [0,1,2,3,4]:
    print(i)
There are many helpers for writing loops, such as the range(n)
function, which returns an iterable object that goes from 0 to n-1:
In [7]:
for i in range(10):
    print(i)
We will loop over the input_sets
list, where each item in the list is a Python dictionary. Those dictionaries contain the 'Name'
key, which we need to pass to Source
We need some help from the pandas core library for this one (to combine DataFrames from two calls to v.retrieve_multiple_time_series
), so we'll import it here
In [8]:
import pandas as pd
In [9]:
for input_set in input_sets:
    set_name = input_set['Name']
    # Log what's happening
    veneer.log('Running ' + set_name)
    # Run the model with the current input set
    v.run_model(SelectedInputSet=set_name)
    # Retrieve the run index so we can pass it to v.retrieve_multiple_time_series
    run_index = v.retrieve_run()
    # Now, retrieve the results we want
    end_of_system_flow = v.retrieve_multiple_time_series(run_data=run_index,criteria={
        'NetworkElement':'Lower Gauge','RecordingVariable':'Downstream Flow Volume'
    })
    crop_time_series = v.retrieve_multiple_time_series(run_data=run_index,criteria={
        'NetworkElement':'Crop Fields','RecordingVariable':'.*@Demand Model@.*'
    })
    all_results[set_name] = pd.merge(end_of_system_flow,crop_time_series,left_index=True,right_index=True)
Now that that's run, let's look at the results.
The default view won't be that friendly...
In [10]:
all_results
Out[10]:
but we can pick the results for a given input set.
In [11]:
all_results['Default Input Set'][0:10]
Out[11]:
Note how the results from the two calls to v.retrieve_multiple_time_series
have been combined into one DataFrame.
Let's enable charting now so we can plot them.
(This time, we explicitly import matplotlib.pyplot
in order to get the legend
function)
In [12]:
%pylab inline
import matplotlib.pyplot as plt
In [13]:
for input_set in all_results:
    all_results[input_set]['Lower Gauge:Downstream Flow Volume'].plot(label=input_set)
plt.legend()
Out[13]:
There's obviously no visible difference there...
Let's compute the difference between the two runs and see if we can find deltas.
In [14]:
delta = all_results['Default Input Set']-all_results['Unrestricted Take']
delta
Out[14]:
In [15]:
delta['Lower Gauge:Downstream Flow Volume'].plot()
Out[15]:
So there is a difference, albeit small compared to the overall system flow.
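To quantify 'small', one quick sketch is to express the delta as a percentage of the baseline flow (note that zero-flow days will produce inf/NaN values):

In [ ]:
baseline = all_results['Default Input Set']['Lower Gauge:Downstream Flow Volume']
# Percentage change relative to the baseline run
pct_change = 100 * delta['Lower Gauge:Downstream Flow Volume'] / baseline
pct_change.plot()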
In a recent project, there was a requirement to run 48 simulations, made up of 12 development scenarios x 4 climate scenarios.
The development scenarios were described as combinations of input sets, applied with v.apply_input_set
in a particular order. The scenario runs were organised as follows:
scenarios = {
    'Baseline':['Reset Input Set'],
    'Development Case 1':['Reset Input Set','Increase Crop Areas'],
    'Development Case 2':['Reset Input Set','Increase Crop Areas','Increase Storage Volumes'],
}
While the climate input sets were identified by the input set name:
climate_sets=['Historic','Low Rainfall','Moderate Rainfall','High Rainfall']
The model runs were then handled with a nested loop:
for scenario_name, input_sets in scenarios.items():
    # Apply the input sets required by this scenario
    for scenario_input_set in input_sets:
        v.apply_input_set(scenario_input_set)
    # Run the model in this configuration, once for each climate input set
    for climate_set in climate_sets:
        v.run_model(SelectedInputSet=climate_set)
Note: Source will remember the last input set used as the SelectedInputSet.
If you want to use a different input set for the next run, you will need to explicitly specify it on the next call to v.run_model().
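For example, using the climate input set names from above:

In [ ]:
v.run_model(SelectedInputSet='Low Rainfall')
# A bare v.run_model() here would silently re-use 'Low Rainfall',
# so name the input set explicitly on every call:
v.run_model(SelectedInputSet='Historic')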
In the above examples, and in most (but not all!) of what follows, when you specify a change for a particular run, that change will persist - i.e. it will still be there in future runs unless you change things back.
In some cases, the changes will get saved in the model file.
There is no generic 'undo' functionality in Veneer or veneer-py, so you need to consider how you reset any changes that you don't want to keep. For example, you might maintain a 'reset' input set and call v.apply_input_set
when a reset is required.
The flipside is that not all changes will persist, even if you want them to. Most notably, changes to the time series data in a data source won't be saved back to disk. Changed time series will survive as long as the project is open in the current session of Source.
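Another way to manage resets is a snapshot-and-restore pattern. Here is a minimal sketch using the Functions API covered later in this session (the 0.5 scaling is just an example change):

In [ ]:
# Snapshot the current function definitions...
original_functions = v.functions()
try:
    # ...make a temporary change and run the model
    v.update_function('$InflowScaling', 0.5)
    v.run_model()
finally:
    # ...then restore the original expressions, even if the run failed
    for fn in original_functions:
        v.update_function(fn['Name'], fn['Expression'])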
The Functions feature in Source was the initial way in which model parameters could be exposed to external programs, through RiverSystem.CommandLine.
You can query and modify functions in Source. While you can do this for any Function, it's typical to define scalar Functions and modify these, with the scalar functions being referenced by other functions.
This can be seen in the sample model.
In [16]:
functions = v.functions()
f_df = functions.as_dataframe()
f_df
Out[16]:
Note: Jupyter thinks those expressions are in LaTeX format (because of the $ signs). We'll temporarily bypass the HTML output by printing the data frame as plain text.
In [17]:
print(f_df)
We can see that $InflowScaling
is used by three functions - $CrabScaled
, $FishScaled
and $ShellScaled
. These functions scale the respective inflow time series by a single scaling factor.
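Based on that description, each of the scaled functions presumably holds an expression along the lines of $InflowScaling * $CrabTS (illustrative only - the Function Manager in Source shows the actual expressions).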
We can use $InflowScaling
to scale all model inflows, for example to test the reliability of the system under reduced inflow.
In [18]:
v.update_function('$InflowScaling',0.5)
Out[18]:
In [19]:
print(v.functions().as_dataframe())
Before proceeding, let's reset $InflowScaling
to its original value.
Because it's just one change, it's easy enough to hard-code it here...
In [20]:
v.update_function('$InflowScaling',1.0)
Out[20]:
Alternatively, we could reset all the functions to the values we retrieved earlier.
functions
is a list of Python dictionaries, with each dictionary in the list having a 'Name'
key and an 'Expression' key.
In [21]:
for fn in functions:
    v.update_function(fn['Name'],fn['Expression'])
We can now run a number of runs, modifying the $InflowScaling
function each time and hence modifying the system inflow time series.
We'll perform a very simple (and short!) Monte Carlo simulation, sampling from the exponential distribution. We'll then see what effect this has on spill volumes from the storage.
The random generators for different distributions are in the numpy
package, which, by convention, is imported as np
In [22]:
import numpy as np
In [23]:
NUMBER_OF_SIMULATIONS=50
sampled_scaling_factors = np.random.exponential(size=NUMBER_OF_SIMULATIONS)
sampled_scaling_factors
Out[23]:
In [24]:
plt.hist(sampled_scaling_factors)
Out[24]:
Now we can construct our loop and gather the results.
I only want numbers on the average spill for each run, and I want to be able to plot it as a distribution in much the same way that we've plotted the distribution of $InflowScaling.
In [25]:
spill_results = []
# Store our time series criteria in a variable to use it in configuring recording and retrieving results
ts_match_criteria = {'NetworkElement':'Recreational Lake','RecordingVariable':'Spill Volume'}
v.configure_recording(enable=[ts_match_criteria])
for scaling_factor in sampled_scaling_factors:
    veneer.log('Running for $InflowScaling=%f'%scaling_factor)
    # We are running the model many times in this case - so let's drop any results we already have...
    v.drop_all_runs()
    # Set $InflowScaling to the current scaling factor
    v.update_function('$InflowScaling',scaling_factor)
    v.run_model()
    # Retrieve the spill time series, as an annual sum, with the column named for the variable ('Spill Volume')
    run_results = v.retrieve_multiple_time_series(criteria=ts_match_criteria,timestep='annual',name_fn=veneer.name_for_variable)
    # Store the mean spill volume and the scaling factor we used
    spill_results.append({'ScalingFactor':scaling_factor,'SpillVolume':run_results['Spill Volume'].mean()})
# Convert the results to a DataFrame
spill_results_df = pd.DataFrame(spill_results)
spill_results_df
Out[25]:
Notes:

- v.drop_all_runs() ensures that we don't have 50 full results sets sitting in memory. By dropping the runs at the start of the loop, before running the model, we ensure that we still have one set of results at the end, in case we need to investigate anything in more detail.
In [26]:
spill_results_df['SpillVolumeGL'] = spill_results_df['SpillVolume'] * 1e-6 # Convert to GL
In [27]:
spill_results_df['SpillVolumeGL'].hist()
Out[27]:
Functions in Source can make use of numerous types of variables, which can derive values from a number of sources, including time series, piecewise linear functions and patterns.
You can query Source for the list of variables and, at this stage, you can directly modify time series variables and piecewise linear variables.
In [28]:
variables = v.variables()
variables_df = variables.as_dataframe()
variables_df
Out[28]:
This summary doesn't tell you anything about the details of the values in the variable - although in some cases there is a URL pointing to the details (in the PiecewiseFunction
column or the TimeSeries
column).
As a quick summary of where more details are available, filter for variables where VeneerSupported=True
In [29]:
variables_df[variables_df.VeneerSupported]
Out[29]:
We can query for either the piecewise function or the time series
In [30]:
v.variable_piecewise('$PatternPW')
Out[30]:
In [31]:
v.variable_time_series('$CrabTS')[::500]
Out[31]:
We can update a piecewise linear variable by passing an appropriate DataFrame (one with two columns, each containing numbers) to v.update_variable_piecewise.
For example, let's double the minimum flow requirement ($PatternPW
):
In [32]:
pattern = v.variable_piecewise('$PatternPW')
pattern
Out[32]:
In [33]:
pattern['Result'] *= 2.0 # Multiply each value of Result column by 2
pattern
Out[33]:
In [34]:
v.update_variable_piecewise('$PatternPW',pattern)
Out[34]:
You can check that the change has taken effect by looking in the Functions Manager in Source, or retrieving the piecewise function again
In [35]:
v.variable_piecewise('$PatternPW')
Out[35]:
Updating time series variables works in much the same way - you need to pass a DataFrame with an appropriate structure. In this case, you need a datetime index and a single column of values.
One approach is to retrieve the existing time series data and then modify the Value column (keeping the date index in place).
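As a minimal sketch of that retrieve-and-modify pattern (the 0.9 factor is arbitrary):

In [ ]:
ts = v.variable_time_series('$CrabTS')
ts['Value'] = ts['Value'] * 0.9                      # arbitrary illustrative change
v.update_variable_time_series('$CrabTS', ts[['Value']])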
In [36]:
crab_ts = v.variable_time_series('$CrabTS')
crab_ts.plot()
Out[36]:
Let's generate a synthetic time series based on the existing sequence.
We'll apply a distinct monthly scaling factor. We could do this in Source through a pattern variable, but using Python is an opportunity to demonstrate some of the time series capabilities in pandas.
First, let's initialise the scaling factor data.
In [37]:
monthly_scaling=[0.6,0.75,1.0,1.0,1.1,1.20,1.20,1.1,1.0,0.8,0.6,0.5]
len(monthly_scaling)
Out[37]:
In [38]:
scaling_df = pd.DataFrame(data={'Month':range(1,13),'Scale':monthly_scaling}).set_index('Month')
scaling_df
Out[38]:
So, now we need to apply these monthly factors to every day in the time series.
In many languages, you would write a loop (e.g. a for
loop) at this point.
In Python, and particularly with numpy and pandas, there are more convenient (and performant) options.
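For contrast, the explicit loop would look something like this sketch, building the same scaling sequence that we construct below with vectorised indexing:

In [ ]:
# Loop equivalent: look up the scaling factor for each timestep's month
scaling_loop = []
for timestamp in crab_ts.index:
    scaling_loop.append(monthly_scaling[timestamp.month - 1])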
Because we have a datetime index on the time series, we can easily get a list of the month for each timestep, using crab_ts.index.month
:
In [39]:
plt.plot(crab_ts.index.month)
Out[39]:
So, now we need to take that sequence of values and find the corresponding scaling factors.
We'll use the indexing capabilities of data frames to help. To understand the next step, see what happens when we index scaling_df
by a month number (starting at 1)
In [40]:
scaling_df.Scale[12]
Out[40]:
Now, see what happens when we provide a series of months (including duplicates)
In [41]:
scaling_df.Scale[[2,5,7,12,12,1]]
Out[41]:
Extending this, and using the list of months for each timestep, we can find the scaling factor for each timestep
In [42]:
scaling_for_timesteps = scaling_df.Scale[crab_ts.index.month].values
In [43]:
plt.plot(scaling_for_timesteps)
Out[43]:
Now, we can multiply the values in the time series by the scaling factors.
In [44]:
crab_ts['ValueScaled'] = crab_ts.Value * scaling_for_timesteps
In [45]:
# Lets plot the first year to see the effect
crab_ts[0:365].plot()
Out[45]:
In [46]:
# That's hard to see, so lets look at the difference:
delta = crab_ts.Value-crab_ts.ValueScaled
delta[0:365].plot()
Out[46]:
We now have an extra column in our DataFrame, but Source expects a single column for our time series.
In [47]:
crab_ts.columns
Out[47]:
In [48]:
new_ts = crab_ts[['ValueScaled']]
new_ts[0::500]
Out[48]:
In [49]:
v.update_variable_time_series('$CrabTS',new_ts)
Out[49]:
At the beginning of this session, we ran a series of scenarios by iterating through each scenario input set and running once for each one.
It is also possible to modify the commands in the input sets themselves.
Modifying input sets gives you control over a great many aspects of the model, although scripting the modification can get a bit fiddly, for two main reasons:

1. each command is a text string that you need to construct, combining a parameter path (e.g. 'Nodes.Supply Point 13.Maximum Extraction Rate'), a value (e.g. '100000') and units that can be understood by Source (e.g. 'ML/d'), and
2. the exact syntax Source expects for a given parameter isn't always obvious (though, as shown below, the Source user interface can help you discover it).
), andMuch of what can be accomplished with input sets can be handled directly using the functions under the v.model.
namespace. However input sets can be a convenient option for certain parameters that aren't yet well supported under v.model
. For example, there is currently no way to directly modify the monthly pattern on a minimum flow node, using the functions in veneer-py. However, you can modify the pattern via input sets - and the user interface in Source will give you hints as to how.
For example, in the input set editor in Source, start typing the command:

Nodes.Lake Release.Monthly Pattern=

When you type the =
, Source should show you the current value of the pattern, expressed in the scenario input set syntax. If you select this text and press enter, it should be added to the text editor:

Nodes.Lake Release.Monthly Pattern=[0 0 0 0 0 0 0 0 0 0 0 0]{ML/d}
You now have a working example of how to set the monthly pattern using an input set - and you can use this, along with Python's string manipulation routines, to set the monthly pattern from Python.
In the following blocks, we will use Python's string handling functionality. A Python tutorial will cover this in more depth.
Let's start by setting up a template for the command to change the monthly pattern. We'll add a string substitution placeholder to the template:
In [50]:
template='Nodes.Lake Release.Monthly Pattern=[%s]{ML/d}'
As you can see, the 0s (separated by spaces) have been replaced with a single %s
. When used with the string substitution functionality, this tells Python to expect another string that should be inserted at this point.
We'll make that other string contain 12 numbers, separated by spaces.
Let's define the values we want.
In [51]:
monthly_values=[5,3,4,6,6,7,7,6,5,4,4,5]
Now, we want to convert our list of numbers (monthly_values
) to a string of values separated by spaces. We can use the join
method, available on all strings, to do what we want.
The only trick is that join
wants a list of strings, not a list of numbers. We'll use a list comprehension to convert each number to a string
In [52]:
list_as_string = ' '.join([str(val) for val in monthly_values])  # 'val' rather than 'v', to avoid confusion with the Veneer client
list_as_string
Out[52]:
We can now substitute this string into the template to create an input set command.
In [53]:
command = template%(list_as_string)
command
Out[53]:
We can now update an input set of choice by adding this command.
First, retrieve the input sets:
In [54]:
input_sets = v.input_sets()
input_sets.as_dataframe()
Out[54]:
We'll modify the first one in the list
In [55]:
the_input_set = input_sets[0]
the_input_set['Configuration']
Out[55]:
We can use .append
to add a command to the 'Configuration' list:
In [56]:
the_input_set['Configuration'].append(command)
the_input_set['Configuration']
Out[56]:
Now, we can update the input set within Source
In [57]:
v.update_input_set(the_input_set['Name'],the_input_set)
Out[57]:
You can check in Source to see that the modified input set has been uploaded.
Now, changing an input set won't have any immediate effect on the model. For the options in the input set to take effect and be visible at (in this case) the minimum flow node, you would need to either:

- apply the input set (using v.apply_input_set or the corresponding option in the Source application), or
- run the model with this input set selected (v.run_model(SelectedInputSet=...)).
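For example, to apply it immediately:

In [ ]:
# Apply the modified input set so the new monthly pattern takes effect now
v.apply_input_set(the_input_set['Name'])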
This session has looked at various ways to modify the configuration of the Source model in order to run the model multiple times.
Veneer and veneer-py also include functionality for directly modifying model structure and parameters by executing .NET IronPython scripts within Source itself. This functionality is particularly useful for bulk transformations.
This is the subject of the next session.