api
At its core, Shyft provides functionality through an API (Application Programming Interface). All the functionality of Shyft is available through this API.
We begin the tutorials by introducing the API as it provides the building blocks for the framework. Once you have a good understanding, you can move toward configured runs that make use of orchestation
. To make use of configured runs, you need to understand how we 'serialize' configurations and input data through repositories
.
In a separate of the simulation tutorials, we cover conducting a very simple simulation of an example catchment using configuration files. This is a typical use case, but assumes that you have a model well configured and ready for simulation. In practice, one is interested in working with the model, testing different configurations, and evaluating different data sources.
This is in fact a key idea of Shyft -- to make it simple to evaluate the impact of the selection of model routine on the performance of the simulation. In this notebook we walk through this lower level paradigm of working with the toolbox and using the Shyft api
directly to conduct the simulations.
This notebook is guiding through the simulation process of a catchment. The following steps are described:
For the notebook tutorials we require several imports. In addition, be sure your shyft environment is correctly configured. This is required before importing shyft itself. Lastly, import the shyft
classes and modules.
In [1]:
# Pure python modules and jupyter notebook functionality
# first you should import the third-party python modules which you'll use later on
# the first line enables that figures are shown inline, directly in the notebook
%matplotlib inline
import os
import datetime as dt
import numpy as np
from os import path
import sys
from matplotlib import pyplot as plt
from netCDF4 import Dataset
In [2]:
# try to auto-configure the path. This will work in the case
# that you have checked out the doc and data repositories
# at same level. Make sure this is done **before** importing shyft
shyft_data_path = path.abspath("../../../shyft-data")
if path.exists(shyft_data_path) and 'SHYFT_DATA' not in os.environ:
os.environ['SHYFT_DATA']=shyft_data_path
# shyft should be available either by it's install in python
# or by PYTHONPATH set by user prior to starting notebook.
# If you have cloned the repositories according to guidelines:
shyft_path=path.abspath('../../../shyft')
sys.path.insert(0,shyft_path)
In [3]:
from shyft import api
import shyft
import shyft.api.pt_gs_k
import shyft.repository
import shyft.repository.netcdf
print(shyft.__path__)
for env in os.environ:
if 'SHYFT' in env:
print('{0}:\n{1}'.format(env, os.environ[env]))
from shyft import shyftdata_dir
print(shyftdata_dir)
The first point of simulation is to define the model that you will create. In this example, we will use Shyft's pure api
approach to create a model from scratch.
What is required to set up a simulation? At the most basic level of Shyft, we need to define the simulation domain / geometry. Shyft does not care about the specific shape of the cells. Shyft just needs a 'geocentroid location' and an area. We will create a container of this information as a first step to provide to one of Shyft's model types later.
We are going to be working with the data from the Nea-Nidelva catchment, example dataset. This is available in the shyft-data repository. Above, you should have set your SHYFT_DATA
environment variable to point to this directory so that we can easily read the data.
The first thing to do is to take a look at the geography of our cells.
In [4]:
# load the data from the example datasets
cell_data = Dataset( os.path.join(shyftdata_dir, 'netcdf/orchestration-testdata/cell_data.nc'))
# plot the coordinates of the cell data provided
# fetch the x- and y-location of the cells
x = cell_data.variables['x'][:]
y = cell_data.variables['y'][:]
z = cell_data.variables['z'][:]
cid = cell_data.variables['catchment_id'][:]
# and make a quick catchment map...
# using a scatter plot of the cells
fig, ax = plt.subplots(figsize=(15,5))
cm = plt.cm.get_cmap('rainbow')
elv_col = ax.scatter(x, y, c=z, marker='.', s=40, lw=0, cmap=cm)
# cm = plt.cm.get_cmap('gist_gray')
# cid_col = ax.scatter(x, y, c=cid, marker='.', s=40, lw=0, alpha=0.4, cmap=cm)
plt.colorbar(elv_col).set_label('catchment elevation [m]')
# plt.colorbar(cid_col).set_label('catchment indices [id]')
plt.title('Nea Nidelva Catchment')
# print(set(cid))
Out[4]:
In Shyft we work with 'cells', which is the basic simulation unit. In the example netcdf file, we provide the attributes for the cells we are going to plot. But you made need to extract this information from your own GIS, or other data. The essential variables that are minimally required include::
If you look at the netcdf file, you'll see these are included:
In [5]:
print(cell_data.variables.keys())
So the first step is to extract these from the netcdf file, and get them into the model.
In [6]:
# Let's first create a 'container' that will hold all of our model domains cells:
cell_data_vector = api.GeoCellDataVector()
# help(cell_data_vector)
#help(api.GeoPoint)
In [7]:
num_cells = cell_data.dimensions['cell'].size
for i in range(num_cells):
gp = api.GeoPoint(x[i], y[i], z[i]) # recall, we extracted x,y,z above
cid = cell_data.variables['catchment_id'][i]
cell_area = cell_data.variables['area'][i]
# land fractions:
glac = cell_data.variables['glacier-fraction'][i]
lake = cell_data.variables['lake-fraction'][i]
rsvr = cell_data.variables['reservoir-fraction'][i]
frst = cell_data.variables['forest-fraction'][i]
unsp = 1 - (glac + lake + rsvr + frst)
land_cover_frac = api.LandTypeFractions(glac, lake, rsvr, frst, unsp)
rad_fx = 0.9
# note, for now we need to make sure we cast some types to pure python, not numpy
geo_cell_data = api.GeoCellData(gp, float(cell_area), int(cid), rad_fx, land_cover_frac)
cell_data_vector.append(geo_cell_data)
In [11]:
nci.close()
In [ ]:
# now get the forcing data ready.
# first create a region_environment object, the 'container' that will hold all
# the forcing data sources
re = api.ARegionEnvironment()
# map the variable names in the netcdf file to the source types
source_map = {'precipitation' : (api.PrecipitationSource, re.precipitation),
'radiation' : (api.RadiationSource, re.radiation),
'temperature' : (api.TemperatureSource, re.temperature),
'wind_speed' : (api.WindSpeedSource, re.wind_speed),
'relative_humidity' : (api.RelHumSource, re.rel_hum) }
# load the data from the example datasets
# station_met = Dataset( os.path.join(shyftdata_dir, 'netcdf/orchestration-testdata/stations_met.nc'))
# for station in station_met.groups.keys():
# stn = station_met.groups[station]
# print(stn)
# time = api.UtcTimeVector([int(t) for t in stn.variables['time'][:]])
# dt = time[1] - time[0] if len(time) > 1 else api.deltahours(1)
# x = stn.x
# y = stn.y
# z = stn.z
# gp = api.GeoPoint(x, y, z)
# for var, (source, source_vec) in source_map.items():
# if var in stn.variables.keys():
# data = stn.variables[var][:]
# time_axis = api.TimeAxis(int(time[0]), api.deltahours(dt), len(time))
# cts = api.TsFactory().create_time_point_ts(time_axis.total_period(), time, data, api.POINT_AVERAGE_VALUE)
# # add it to the precipitation source
# source_vec.append(source(gp, cts))
### ANOTHER APPROACH
re2 = api.ARegionEnvironment()
# precip = Dataset( os.path.join(shyftdata_dir, 'netcdf/orchestration-testdata/precipitation.nc'))
# rad = Dataset( os.path.join(shyftdata_dir, 'netcdf/orchestration-testdata/radiation.nc'))
# windsp = Dataset( os.path.join(shyftdata_dir, 'netcdf/orchestration-testdata/wind_speed.nc'))
# relhum = Dataset( os.path.join(shyftdata_dir, 'netcdf/orchestration-testdata/relative_humidity.nc'))
# temp = Dataset( os.path.join(shyftdata_dir, 'netcdf/orchestration-testdata/temperature.nc'))
# map the variable names in the netcdf file to the source types
source_map = {'precipitation' : ('precipitation.nc', api.PrecipitationSource, re2.precipitation),
'global_radiation' : ('radiation.nc', api.RadiationSource, re2.radiation),
'temperature' : ('temperature.nc', api.TemperatureSource, re2.temperature),
'wind_speed' : ('wind_speed.nc', api.WindSpeedSource, re2.wind_speed),
'relative_humidity' : ('relative_humidity.nc', api.RelHumSource, re2.rel_hum) }
for var, (file_name, source, source_vec) in source_map.items():
nci = Dataset( os.path.join(shyftdata_dir, 'netcdf/orchestration-testdata/' + file_name))
time = api.UtcTimeVector([int(t) for t in nci.variables['time'][:]])
dt = time[1] - time[0] if len(time) > 1 else api.deltahours(1)
for i in range(nci.dimensions['station'].size):
x = nci.variables['x'][i]
y = nci.variables['y'][i]
z = nci.variables['z'][i]
gp = api.GeoPoint(x, y, z)
data = nci.variables[var][:, i]
print(data)
time_axis = api.TimeAxis(int(time[0]), api.deltahours(dt), len(time))
cts = api.TsFactory().create_time_point_ts(time_axis.total_period(), time, data, api.POINT_AVERAGE_VALUE)
# add it to the precipitation source
source_vec.append(source(gp, cts))
In [21]:
re2.temperature.values_at_time(int(stn.variables['time'][100]))
Out[21]:
In [ ]:
def create_dummy_region_environment(self, time_axis, mid_point):
re = api.ARegionEnvironment()
re.precipitation.append(self._create_constant_geo_ts(api.PrecipitationSource, mid_point, time_axis.total_period(), 5.0))
re.temperature.append(self._create_constant_geo_ts(api.TemperatureSource, mid_point, time_axis.total_period(), 10.0))
re.wind_speed.append(self._create_constant_geo_ts(api.WindSpeedSource, mid_point, time_axis.total_period(), 2.0))
re.rel_hum.append(self._create_constant_geo_ts(api.RelHumSource, mid_point, time_axis.total_period(), 0.7))
re.radiation = api.RadiationSourceVector() # just for testing BW compat
re.radiation.append(self._create_constant_geo_ts(api.RadiationSource, mid_point, time_axis.total_period(), 300.0))
return re
def _create_constant_geo_ts(self, geo_ts_type, geo_point, utc_period, value):
"""Create a time point ts, with one value at the start
of the supplied utc_period."""
tv = api.UtcTimeVector()
tv.push_back(utc_period.start)
vv = api.DoubleVector()
vv.push_back(value)
cts = api.TsFactory().create_time_point_ts(utc_period, tv, vv, api.POINT_AVERAGE_VALUE)
return geo_ts_type(geo_point, cts)
In [31]:
# next, create the simulation dictionary
RegionDict = {'region_model_id': 'demo', #a unique name identifier of the simulation
'domain': {'EPSG': 32633,
'nx': 400,
'ny': 80,
'step_x': 1000,
'step_y': 1000,
'lower_left_x': 100000,
'lower_left_y': 6960000},
'repository': {'class': shyft.repository.netcdf.cf_region_model_repository.CFRegionModelRepository,
'params': {'data_file': 'netcdf/orchestration-testdata/cell_data.nc'}},
}
The first keys, are probably quite clear:
start_datetime
: a string in the format: "2013-09-01T00:00:00"run_time_step
: an integer representing the time step of the simulation (in seconds), so for a daily step: 86400number_of_steps
: an integer for how long the simulatoin should run: 365 (for a year long simulation)region_model_id
: a string to name the simulation: 'neanidelva-ptgsk'We also need to know where the simulation is taking place. This information is contained in the domain
:
EPSG
: an EPSG string to identify the coordinate systemnx
: number of 'cells' in the x directionny
: number of 'cells' in the y directionstep_x
: size of cell in x direction (m)step_y
: size of cell in y direction (m)lower_left_x
: where (x) in the EPSG system the cells beginlower_left_y
: where (y) in the EPSG system the cells beginrepository
: a repository that can read the file containing data for the cells (in this case it will read a netcdf file)
In [32]:
from shyft.api.pt_gs_k import PTGSKModel
ModelDict = {'model_t': PTGSKModel, # model to construct
'model_parameters': {
'actual_evapotranspiration':{
'ae_scale_factor': 1.5},
'gamma_snow':{
'calculate_iso_pot_energy': False,
'fast_albedo_decay_rate': 6.752787747748934,
'glacier_albedo': 0.4,
'initial_bare_ground_fraction': 0.04,
'max_albedo': 0.9,
'max_water': 0.1,
'min_albedo': 0.6,
'slow_albedo_decay_rate': 37.17325702015658,
'snow_cv': 0.4,
'tx': -0.5752881492890207,
'snowfall_reset_depth': 5.0,
'surface_magnitude': 30.0,
'wind_const': 1.0,
'wind_scale': 1.8959672005350063,
'winter_end_day_of_year': 100},
'kirchner':{
'c1': -3.336197322290274,
'c2': 0.33433661533385695,
'c3': -0.12503959620315988},
'precipitation_correction': {
'scale_factor': 1.0},
'priestley_taylor':{'albedo': 0.2,
'alpha': 1.26},
}
}
In this dictionary we define two variables:
model_t
: the import path to a shyft 'model stack' classmodel_parameters
: a dictionary containing specific parameter values for a particular model classSpecifics of the model_parameters
dictionary will vary based on which class is used.
Okay, so far we have two dictionaries. One which provides information regarding our simulation domain, and a second which provides information on the model that we wish to run over the domain (e.g. in each of the cells). The next step, then, is to map these together and create a region_repo
class.
This is achieved by using a repository, in this case, the CFRegionModelRepository
we imported above.
In [33]:
region_repo = CFRegionModelRepository(RegionDict, ModelDict)
region_model
The first step in conducting a hydrologic simulation is to define the domain of the simulation and the model type which we would like to simulate. To do this we create a region_model
object. Above we created dictionaries that contain this information, and we instantiated a class called teh region_repo
. In this next step, we put it together so that we have a single object which we can work with "at our fingertips". You'll note above that we have pointed to a 'data_file' earlier when we defined the RegionDict
. This data file contains all the required elements to fill the cells of our domain. The informaiton is contained in a single netcdf file
Before we go further, let's look briefly at the contents of this file:
In [34]:
cell_data_file = os.path.join(shyftdata_dir, 'netcdf/orchestration-testdata/cell_data.nc')
print(cell_data_file)
cell_data = Dataset(cell_data_file)
print(cell_data)
You might be surprised to see the dimensions are 'cells', but recall that in Shyft everything is vectorized. Each 'cell' is an element within a domain, and each cell has associated variables:
We'll bring this data into our workspace via the region_model
. Note that we have instantiated a region_repo
class using one of the existing Shyft repositories, in this case one that was built for reading in the data as it is contained in the example shyft-data netcdf files: CFRegionModelRepository
.
Next, we'll use the region_repo.get_region_model
method to get the region_model
. Note the name 'demo', in this case is arbitrary. However, depending on how you create your repository, you can specify what region model to return using this string.
In [35]:
region_model = region_repo.get_region_model('demo')
region_model
So we now have created a region_model
, but what is it actually? This is a very fundamental class in Shyft. It is actually one of the "model stacks", such as 'PTGSK', or 'PTHSK'. Essentially, the region_model
contains all the information regarding the simulation type and domain. There are many methods associated with the region_model
and it will take time to understand all of them. For now, let's just explore a few key methods:
bounding_region
: provides information regarding the domain of interest for the simulationcatchment_id_map
: indices of the various catchments within the domaincells
: an instance of PTGSKCellAllVector
that holds the individual cells for the simulation (note that this is type-specific to the model type)ncore
: an integer that sets the numbers of cores to use during simulation (Shyft is very greedy if you let it!)time_axis
: a shyft.api.TimeAxisFixedDeltaT
class (basically contains information regarding the timing of the simulation)Keep in mind that many of these methods are more 'C++'-like than 'Pythonic'. This means, that in some cases, you'll have to 'call' the method. For example: region_model.bounding_region.epsg()
returns a string. You can use tab-completion to explore the region_model
further:
In [36]:
region_model.bounding_region.epsg()
Out[36]:
You'll likely note that there are a number of intriguing fucntions, e.g. initialize_cell_environment
or interpolate
. But before we can go further, we need some more information. Perhaps you are wondering about forcing data. So far, we haven't said anything about model input or the time of the simulation, we've only set up a container that holds all the domain and model type information about our simulation.
Still, we have made some progress. Let's look for instance at the cells:
In [37]:
cell_0 = region_model.cells[0]
print(cell_0.geo)
So you can see that so far, each of the cells in the region_model contain information regarding their LandTypeFractions, geolocation, catchment_id, and area.
A particulary important attribute is region_model.region_env
. This is a container for each cell that holds the "environmental timeseries", or forcing data, for the simulation. By "tabbing" from cell.
you can see that each cell also has and env_ts
attribute. These are containers customized to provide timeseries as required by the model type we selected, but there is no data yet. In this case we used the PTGSKModel
(see the ModelDict
). So for every cell in your simulation, there is a container prepared to accept the forcing data as the next cell shows.
In [38]:
#just so we don't see 'private' attributes
print([d for d in dir(cell_0.env_ts) if '_' not in d[0]])
region_model.size()
Out[38]:
region_model
Clearly the next step is to add forcing data to our region_model
object. Let's start by thinking about what kind of data we need. From above, where we looked at the env_ts
attribute, it's clear that this particular model stack, PTGSKModel
, requires:
We have stored this information each in seperate netcdf files which each contain the observational series for a number of different stations.
Our goal now is to populate the region_env
.
We use the term sources to define a location data may be coming from. You may also come across destinations. In both cases, it just means a file, database, service of some kind, etc. that is capable of providing data. Repositories are written to connect to sources. Following our earlier approach, we'll create another dictionary to define our data sources, but first we need to import another repository:
In [39]:
from shyft.repository.netcdf.cf_geo_ts_repository import CFDataRepository
In [40]:
from shyft.repository.netcdf.cf_geo_ts_repository import CFDataRepository
ForcingData = {'sources': [
{'repository': shyft.repository.netcdf.cf_geo_ts_repository.CFDataRepository,
'params': {'epsg': 32633,
'selection_criteria': None,
'stations_met': 'netcdf/orchestration-testdata/precipitation.nc'},
'types': ['precipitation']},
{'repository': shyft.repository.netcdf.cf_geo_ts_repository.CFDataRepository,
'params': {'epsg': 32633,
'selection_criteria': None,
'stations_met': 'netcdf/orchestration-testdata/temperature.nc'},
'types': ['temperature']},
{'params': {'epsg': 32633,
'selection_criteria': None,
'stations_met': 'netcdf/orchestration-testdata/wind_speed.nc'},
'repository': shyft.repository.netcdf.cf_geo_ts_repository.CFDataRepository,
'types': ['wind_speed']},
{'repository': shyft.repository.netcdf.cf_geo_ts_repository.CFDataRepository,
'params': {'epsg': 32633,
'selection_criteria': None,
'stations_met': 'netcdf/orchestration-testdata/relative_humidity.nc'},
'types': ['relative_humidity']},
{'repository': shyft.repository.netcdf.cf_geo_ts_repository.CFDataRepository,
'params': {'epsg': 32633,
'selection_criteria': None,
'stations_met': 'netcdf/orchestration-testdata/radiation.nc'},
'types': ['radiation']}]
}
In another notebook, further information will be provided regarding the repositories. For the time being, let's look at this configuration dictionary that was created. It essentially just contains a list, keyed by the name "sources"
. This key is known in some of the tools that are built in the Shyft orchestration, so it is recommended to use it.
Each item in the list is a dictionary for each of the source types, the keys in the dictionaries are: repository
, params
, and types
. The general idea and concept is that in orchestration, the object keyed by repository
is a class that is instantiated by passing the objects contained in params
.
Let's repeat that. From our Datasets
dictionary, we get a list of "sources"
. Each of these sources contains a class (a repository) that is capable of getting the source data into Shyft. Whatever parameters that are required for the class to work, will be included in the "sources"
dictionary. In our case, the params
are quite simple, just a path to a netcdf file. But suppose our repository required credentials or other information for a database? This information could also be included in the params
stanza of the dictionary.
You should explore the above referenced netcdf files that are available at the shyft-data git repository. These files contain the forcing data that will be used in the example simulation. Each one contains observational data from some stations in our catchment. Depending on how you write your repository, this data may be provided to Shyft in many different formats.
Let's explore this concept further by getting the 'temperature' data:
In [41]:
# get the temperature sources:
tmp_sources = [source for source in ForcingData['sources'] if 'temperature' in source['types']]
# in this example there is only one
t0 = tmp_sources[0]
# We will now instantiate the repository with the parameters that are provided
# in the dictionary.
# Note the 'call' structure expects params to contain keyword arguments, and these
# can be anything you want depending on how you create your repository
tmp_repo = t0['repository'](**t0['params'])
tmp_repo
is now an instance of the Shyft CFDataRepository
, and this will provide Shyft with the data when it sets up a simulation by reading the data directly out of the file referenced in the 'source'. But that is just one repository, and we defined many in fact. Furthermore, you may have a heterogenous collection of data sources -- if for example you want to get your temperature from station data, but radiation from model output. You could define different repositories in the ForcingData
dictionary.
Ultimately, we bundle all these repositories up into a new class called a GeoTsRepositoryCollection
that we can use to populate the region_model.region_env
with data.
In [42]:
# we'll actually create a collection of repositories, as we have different input types.
from shyft.repository.geo_ts_repository_collection import GeoTsRepositoryCollection
def construct_geots_repo(datasets_config, epsg=None):
""" iterates over the different sources that are provided
and prepares the repository to read the data for each type"""
geo_ts_repos = []
src_types_to_extract = []
for source in datasets_config['sources']:
if epsg is not None:
source['params'].update({'epsg': epsg})
# note that here we are instantiating the different source repositories
# to place in the geo_ts list
geo_ts_repos.append(source['repository'](**source['params']))
src_types_to_extract.append(source['types'])
return GeoTsRepositoryCollection(geo_ts_repos, src_types_per_repo=src_types_to_extract)
# instantiate the repository
geots_repo = construct_geots_repo(ForcingData)
geots_repo
is now a "geographic timeseries repository", meaning that the timeseries it holds are spatially aware of their x,y,z coordinates (see CFDataRepository
for details). It also has several methods. One in particular we are interested in is the get_timeseries
method. However, before we can proceed, we need to define the period for the simulation.
In [43]:
# next, create the simulation dictionary
TimeDict = {'start_datetime': "2013-09-01T00:00:00",
'run_time_step': 86400, # seconds, daily
'number_of_steps': 365 # one year
}
def time_axis_from_dict(t_dict)->api.TimeAxis:
utc = api.Calendar()
sim_start = dt.datetime.strptime(t_dict['start_datetime'], "%Y-%m-%dT%H:%M:%S")
utc_start = utc.time(sim_start.year, sim_start.month, sim_start.day,\
sim_start.hour, sim_start.minute, sim_start.second)
tstep = t_dict['run_time_step']
nstep = t_dict['number_of_steps']
time_axis = api.TimeAxis(utc_start, tstep, nstep)
return time_axis
ta_1 = time_axis_from_dict(TimeDict)
print(f'1. {ta_1} \n {ta_1.total_period()}')
# or shyft-wise, ready tested, precise and less effort, two lines
utc = api.Calendar() # 'Europe/Oslo' can be passed to calendar for time-zone
ta_2 = api.TimeAxis(utc.time(2013, 9, 1), api.deltahours(24), 365)
print(f'2. {ta_2} \n {ta_2.total_period()}')
We now have an object that defines the time dimension for the simulation, and we will use this to initialize the region_model
with the "environmental timeseries" or env_ts
data. These containers will be given data from the appropriate repositories using the get_timeseries
function. Following the templates in the shyft.repository.interfaces
module, you'll see that the repositories should provide the capability to "screen" data based on time criteria and *optinally** geo_location criteria.
In [44]:
# we can extract our "bounding box" based on the `region_model` we set up
bbox = region_model.bounding_region.bounding_box(region_model.bounding_region.epsg())
period = ta_1.total_period() #just defined above
# required forcing data sets we want to retrieve
geo_ts_names = ("temperature", "wind_speed", "precipitation",
"relative_humidity", "radiation")
sources = geots_repo.get_timeseries( geo_ts_names, period, geo_location_criteria=bbox )
Now we have a new dictionary, called 'sources' that contains specialized Shyft api types specific to each forcing data type. You can look at one for example:
In [45]:
prec = sources['precipitation']
print(len(prec))
We can explore further and see each element is in itself an api.PrecipitationSource
, which has a timeseries (ts). Recall from the first tutorial that we can easily convert the timeseries.time_axis
into datetime values for plotting.
Let's plot the precip of each of the sources:
In [46]:
fig, ax = plt.subplots(figsize=(15,10))
for pr in prec:
t,p = [dt.datetime.utcfromtimestamp(t_.start) for t_ in pr.ts.time_axis], pr.ts.values
ax.plot(t,p, label=pr.mid_point().x) #uid is empty now, but we reserve for later use
fig.autofmt_xdate()
ax.legend(title="Precipitation Input Sources")
ax.set_ylabel("precip[mm/hr]")
Out[46]:
Finally, the next step will take the data from the sources and connect it to our region_model.region_env
class:
In [47]:
def get_region_environment(sources):
region_env = api.ARegionEnvironment()
region_env.temperature = sources["temperature"]
region_env.precipitation = sources["precipitation"]
region_env.radiation = sources["radiation"]
region_env.wind_speed = sources["wind_speed"]
region_env.rel_hum = sources["relative_humidity"]
return region_env
region_model.region_env = get_region_environment(sources)
And now our forcing data is connected to the region_model
. We are almost ready to run a simulation. There is just one more step. We've connected the sources to the model, but remember that Shyft is a distributed modeling framework, and we've connected point data sources (in this case). So we need to get the data from the observed points to each cell. This is done through interpolation.
In Shyft there are predefined routines for interpolation. In the interp_config
class below one quickly recognizes the same input source type keywords that are used as keys to the params
dictionary. params
is simply a dictionary of dictionaries which contains the parameters used by the interpolation model that is specific for each source type.
In [48]:
from shyft.repository.interpolation_parameter_repository import InterpolationParameterRepository
class interp_config(object):
""" a simple class to provide the interpolation parameters """
def __init__(self):
self.interp_params = {'precipitation': {'method': 'idw',
'params': {'distance_measure_factor': 1.0,
'max_distance': 600000.0,
'max_members': 10,
'scale_factor': 1.02}},
'radiation': {'method': 'idw',
'params': {'distance_measure_factor': 1.0,
'max_distance': 600000.0,
'max_members': 10}},
'relative_humidity': {'method': 'idw',
'params': {'distance_measure_factor': 1.0,
'max_distance': 600000.0,
'max_members': 10}},
'temperature': {'method': 'btk',
'params': {'nug': 0.5,
'range': 200000.0,
'sill': 25.0,
'temperature_gradient': -0.6,
'temperature_gradient_sd': 0.25,
'zscale': 20.0}},
'wind_speed': {'method': 'idw',
'params': {'distance_measure_factor': 1.0,
'max_distance': 600000.0,
'max_members': 10}}}
def interpolation_parameters(self):
return self.interp_params
ip_conf = interp_config()
ip_repo = InterpolationParameterRepository(ip_conf)
region_model.interpolation_parameter = ip_repo.get_parameters(0) #just a '0' for now
The next step is to set the intial states of the model using our last repository. This one, the GeneratedStateRepository
will set empty default values.
Now we are nearly ready to conduct a simulation. We just need to run a few methods to prepare the model and cells for the simulation. The region_model has a method called initalize_cell_environment
that takes a time_axis
type as input. We defined the time_axis
above, so now we'll use it to initialize the model. At the same time, we'll set the initial_state. Then we can actually run a simulation!
In [49]:
from shyft.repository.generated_state_repository import GeneratedStateRepository
init_values = {'gs': {'acc_melt': 0.0,
'albedo': 0.65,
'alpha': 6.25,
'iso_pot_energy': 0.0,
'lwc': 0.1,
'sdc_melt_mean': 0.0,
'surface_heat': 30000.0,
'temp_swe': 0.0},
'kirchner': {'q': 0.01}}
state_generator = GeneratedStateRepository(region_model)#, init_values=init_values)
# we need the state_repository to have the same size as the model
#state_repo.n = region_model.size()
# there is only 1 state (indexed '0')
s0 = state_generator.get_state(0)
not_applied_list=region_model.state.apply_state( # apply state set the current state according to arguments
cell_id_state_vector=s0, # ok, easy to get
cids=[] # empty means apply all, if we wanted to only apply state for specific catchment-ids, this is where to put them
)
assert len(not_applied_list)==0, 'Ensure all states was matched and applied to the model'
region_model.initial_state=region_model.current_state # now we stash the current state to the initial state
In [50]:
state_generator.find_state?
We now have a region_model
that is ready for simulation. As we discussed before, we still need to get the data from our point observations interpolated to the cells, and we need to get the env_ts
of each cell populated. But all the machinery is now in place to make this happen.
To summarize, we've created:
region_repo
, a region repository that contains information related to region of simulation and the model to be used in the simulation. From this we get a region_model
geots_repo
, a geo-timeseries repository that provides a mechanism to pull the data we require from our 'sources'.time_axis
, created from the TimeAxisFixedDeltaT class of shyft
to provide the period of simulation.ip_repo
, an interpolation repository which provides all the required parameters for interpolating our data to the distributed cells -- following variable specific protocols/models.state_repo
, a GeneratedStateRepository
used to provide our simulation an initial state.The next step is simply to initialize the cell environment and run the interpolation. As a practive, before simulation we reset to the initial state (we're there already, but it is something you have to do before a new simulation), and then run the cells. First we'll initialize the cell environment:
In [51]:
region_model.initialize_cell_environment(ta_1)
As a habit, we have a quick "sanity check" function to see if the model is runnable. Itis recommended to have this function when you create 'run scripts'.
In [52]:
def runnable(reg_mod):
""" returns True if model is properly configured
**note** this is specific depending on your model's input data requirements """
return all((reg_mod.initial_state.size() > 0, reg_mod.time_axis.size() > 0,
all([len(getattr(reg_mod.region_env, attr)) > 0 for attr in
("temperature", "wind_speed", "precipitation", "rel_hum", "radiation")])))
In [53]:
# run the model, e.g. as you may configure it in a script:
if runnable(region_model):
region_model.interpolate(region_model.interpolation_parameter, region_model.region_env)
region_model.revert_to_initial_state()
region_model.run_cells()
else:
print('Something wrong with model configuration.')
Okay, so the simulation was run. Now we may be interested in looking at some of the output. We'll take a brief summary glance in the next section, and save a deeper dive into the simulation results for another notebook.
In [54]:
# Here we are going to extact data from the simulation.
# We start by creating a list to hold discharge for each of the subcatchments.
# Then we'll get the data from the region_model object
import pandas as pd
# mapping of internal catch ID to catchment
catchment_id_map = region_model.catchment_id_map
# First get the time-axis which we'll use as the index for the data frame
ta = region_model.time_axis
# and convert it to datetimes
index = [dt.datetime.utcfromtimestamp(p.start) for p in ta]
# Now we'll add all the discharge series for each catchment
data = {}
for cid in catchment_id_map:
# get the discharge time series for the subcatchment
q_ts = region_model.statistics.discharge([int(cid)])
data[cid] = q_ts.values.to_numpy()
df = pd.DataFrame(data, index=index)
# we can simply use:
ax = df.plot(figsize=(20,15))
ax.legend(title="Catch. ID")
ax.set_ylabel("discharge [m3 s-1]")
Out[54]:
In [ ]:
Okay, that was simple. Let's look at the timeseries in some individual cells. The following is a bit of a contrived example, but it shows some aspects of the api. We'll plot the temperature series of all the cells in one sub-catchment, and color them by elevation. This doesn't necessarily show anything about the simulation, per se, but rather results from the interpolation step.
In [55]:
from matplotlib.cm import jet as jet
from matplotlib.colors import Normalize
# get all the cells for one sub-catchment with 'id' == 1228
c1228 = [c for c in region_model.cells if c.geo.catchment_id() == 1228]
# for plotting, create an mpl normalizer based on min,max elevation
elv = [c.geo.mid_point().z for c in c1228]
norm = Normalize(min(elv), max(elv))
#plot with line color a function of elevation
fig, ax = plt.subplots(figsize=(15,10))
# here we are cycling through each of the cells in c1228
for dat,elv in zip([c.env_ts.temperature.values for c in c1228], [c.mid_point().z for c in c1228]):
ax.plot(dat, color=jet(norm(elv)), label=int(elv))
# the following is just to plot the legend entries and not related to Shyft
handles, labels = ax.get_legend_handles_labels()
# sort by labels
import operator
hl = sorted(zip(handles, labels),
key=operator.itemgetter(1))
handles2, labels2 = zip(*hl)
# show legend, but only every fifth entry
ax.legend(handles2[::5], labels2[::5], title='Elevation [m]')
Out[55]:
As we would expect from the temperature kriging method, we should find higher elevations have colder temperatures. As an exercise you could explore this relationship using a scatter plot.
Now we're going to create a function that will read initial states from the initial_state_repo
. In practice, this is already done by the ConfgiSimulator
, but to demonstrate lower level functions, we'll reset the states of our region_model
:
In [56]:
# create a function to read the states from the state repository
def get_init_state_from_repo(initial_state_repo_, region_model_id_=None, timestamp=None):
state_id = 0
if hasattr(initial_state_repo_, 'n'): # No stored state, generated on-the-fly
initial_state_repo_.n = region_model.size()
else:
states = initial_state_repo_.find_state(
region_model_id_criteria=region_model_id_,
utc_period_criteria=timestamp)
if len(states) > 0:
state_id = states[0].state_id # most_recent_state i.e. <= start time
else:
raise Exception('No initial state matching criteria.')
return initial_state_repo_.get_state(state_id)
init_state = get_init_state_from_repo(state_generator, timestamp=region_model.time_axis.start)
Don't worry too much about the function for now, but do take note of the init_state
object that we created. This is another container, this time it is a class that contains PTGSKStateWithId
objects, which are specific to the model stack implemented in the simulation (in this case PTGSK
). If we explore an individual state object, we'll see init_state
contains, for each cell in our simulation, the state variables for each 'method' of the method stack.
Let's look more closely:
In [57]:
def print_pub_attr(obj):
#only public attributes
print(f'{obj.__class__.__name__}:\t',[attr for attr in dir(obj) if attr[0] is not '_'])
print(len(init_state))
init_state_cell0 = init_state[0]
# the identifier
print_pub_attr(init_state_cell0.id)
# gam snow states
print_pub_attr(init_state_cell0.state.gs)
#init_state_cell0.kirchner states
print_pub_attr(init_state_cell0.state.kirchner)
We have now explored the region_model
and looked at how to instantiate a region_model
by using a api.ARegionEnvironment
, containing a collection of timeseries sources, and passing an api.InterpolationParameter
class containing the parameters to use for the data interpolation algorithms. The interpolation step "populated" our cells with data from the point sources.
The cells each contain all the information related to the simulation (their own timeseries, env_ts
; their own model parameters, parameter
; and other attributes and methods). In future tutorials we'll work with the cells indivdual "resource collector" (.rc
) and "state collector" (.sc
) attributes.