Freyberg Model PEST setup example

Herein, we will show users how to use pyEMU to setup a groundwater model for use in pest. We will cover the following topics:

  • setup pilot points as parameters, including 1st-order tikhonov regularization
  • setup other model inputs as parameters
  • setup simulated water levels as observations
  • setup simulated water budget components as observations (or forecasts)
  • create a pest control file and adjust observation weights to balance the objective function

Note that, in addition to pyemu, this notebook relies on flopy. flopy can be obtained (along with installation instructions) at https://github.com/modflowpy/flopy.


In [ ]:
%matplotlib inline
import os
import shutil
import platform
import numpy as np
import pandas as pd
from matplotlib.patches import Rectangle as rect
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore", 
    message="ModflowDis.sr is deprecated. use Modflow.sr")
from mpl_toolkits.axes_grid1 import make_axes_locatable
import matplotlib as mpl
newparams = {'legend.fontsize':10, 'axes.labelsize':10,
             'xtick.labelsize':10, 'ytick.labelsize':10,
             'font.family':'Univers 57 Condensed', 
             'pdf.fonttype':42}
plt.rcParams.update(newparams)
import pyemu

Model background

This example is based on the synthetic classroom model of Freyberg(1988). The model is a 2-dimensional MODFLOW model with 1 layer, 40 rows, and 20 columns. The model has 2 stress periods: an initial steady-state stress period used for calibration, and a 5-year transient stress period. The calibration period uses the recharge and well flux of Freyberg(1988); the last stress period use 25% less recharge and 25% more pumping to represent future conditions for a forecast period.

Freyberg, David L. "AN EXERCISE IN GROUND‐WATER MODEL CALIBRATION AND PREDICTION." Groundwater 26.3 (1988): 350-360.


In [ ]:
#load the existing model and save it in a new dir and make sure it runs
import flopy
model_ws = os.path.join("Freyberg_transient")
ml = flopy.modflow.Modflow.load("freyberg.nam",model_ws=model_ws,verbose=False)
ml.exe_name = "mfnwt"
ml.model_ws = "temp"
EXE_DIR = os.path.join("..","bin")
if "window" in platform.platform().lower():
    EXE_DIR = os.path.join(EXE_DIR,"win")
elif "darwin" in platform.platform().lower():
    EXE_DIR = os.path.join(EXE_DIR,"mac")
else:
    EXE_DIR = os.path.join(EXE_DIR,"linux")

[shutil.copy2(os.path.join(EXE_DIR,f),os.path.join("temp",f)) for f in os.listdir(EXE_DIR)]

ml.write_input()
ml.run_model()

Observations

HOB observations

Here we are going to setup an hob package to handle getting the observations from modflow. Normally, you would already have this file made, but here we are just making one for fun


In [ ]:
rc_df = pd.read_csv(os.path.join("Freyberg","misc","obs_rowcol.dat"),delim_whitespace=True)
hds = flopy.utils.HeadFile(os.path.join(ml.model_ws,"freyberg.hds"))
data = hds.get_alldata()
obs = []
roff = 0.0#ml.dis.delc.array[0] / 2.0
coff = 0.0#ml.dis.delr.array[0] / 2.0
for n,r,c in zip(rc_df.name,rc_df.row,rc_df.col):
    name = "i{1:02d}j{2:02d}".format(n,r-1,c-1)
    d = np.zeros((data.shape[0]-1,2))
    d[:,0] = hds.times[1:]
    d[:,1] = data[1:,0,r-1,c-1] + np.random.randn(d.shape[0]) #add some random noise to the observations
    obs.append(flopy.modflow.HeadObservation(ml,obsname=name,layer=0,row=r-1,
                                  column=c-1,roff=roff,coff=coff,
                                  time_series_data=d))
flopy.modflow.ModflowHob(ml,obs_data=obs,iuhobsv=600)
ext_path = os.path.join(ml.model_ws,"ref")
if os.path.exists(ext_path):
    shutil.rmtree(ext_path)
print(ext_path)
os.mkdir(ext_path)
ml.external_path = os.path.split(ext_path)[-1]
ml.upw.hk.fmtin = "(FREE)"
ml.upw.sy.fmtin = "(FREE)"
ml.rch.rech.fmtin = "(FREE)"
ml.write_input()
ml.run_model()

pyemu has a helper function to setup this instruction file for you and also load observations into dataframe


In [ ]:
hob_df = pyemu.gw_utils.modflow_hob_to_instruction_file(os.path.join(ml.model_ws,ml.name+".hob.out"))

The dataframe returned has a lot of useful info that we will use later...


In [ ]:
hob_df.head()

list file budget components as observations (or forecasts)

Here we will use flopy and pyemu to load each of the flux and volume budget components from the modflow list file to use as observations (or forecasts). These are valuable pieces of information and since observations are free, why not include them? This helper function writes two instruction files: <flx_filename>.ins and <vol_filename>.ins


In [ ]:
# the flux budget output filename that will be written during each forward run
flx_filename=os.path.join(ml.model_ws,"flx.out")

# the volumne budget output filename that will be written during each forward run
vol_filename = os.path.join(ml.model_ws,"vol.out")
df_wb = pyemu.gw_utils.setup_mflist_budget_obs(os.path.join(ml.model_ws,ml.name+".list"))

In [ ]:
df_wb.head()

Parameters

pilot points

Here we will setup pilot points for several array-based modflow inputs using pyemu

setup pilot point locations

first specify what pilot point names we want to use for each model layer (counting from 0). Here we will setup pilot points for hk, sy and rech. The rech pilot points will be used as a single multiplier array for all stress periods to account for potential spatial bias in recharge.


In [ ]:
prefix_dict= {0:["hk1","sy1","rech1"]}

This helper function is doing a lot of things: writing templates, pilot point files, and creating a shapefile of pilot points. The every_n_cell arg is key: it decides how many cells to skip between pilot point locations - since we passed the model, only active model cells get pilot points (using bas6.ibound). Like many things with flopy, the SpatialReference is used to define pilot point x and y coordinates


In [ ]:
pp_cells = 3
pp_df = pyemu.pp_utils.setup_pilotpoints_grid(ml,prefix_dict=prefix_dict,every_n_cell=pp_cells,pp_dir=ml.model_ws,
                                           tpl_dir=ml.model_ws,shapename=os.path.join(ml.model_ws,"pp.shp"))

The dataframe return has the same info as the shapefile that was written - useful info, right?


In [ ]:
pp_df.index = pp_df.parnme
pp_df

geostats and kriging

now that we have pilot points setup, we need to solve the kriging equations for each model cell using pilot point locations. Since we only have a single set of pilot points that we are reusing for several array-based modflow inputs, we only need to get the kriging factors once


In [ ]:
hk_pp = pyemu.pp_utils.pp_file_to_dataframe(os.path.join(ml.model_ws,"hk1pp.dat"))

In [ ]:
hk_pp.head()

Let's setup a geostatistical structure. The contribution doesn't matter for pilot point interpolation, but it does matter when we want to form a prior parameter covariance matrix - we will get to that later. A good rule of thumb is to use an a value that is three times the pilot point spacing. Also, since the all of these pilot points will be log transformed, we need to use a log-based geostatistical structure


In [ ]:
a = pp_cells * ml.dis.delr.array[0] * 3.0
v = pyemu.geostats.ExpVario(contribution=1.0,a=a)
gs = pyemu.geostats.GeoStruct(variograms=v,transform="log")
gs.plot()

This is where things get fun. First we create an OrdinaryKrige object


In [ ]:
ok = pyemu.geostats.OrdinaryKrige(geostruct=gs,point_data=hk_pp)

Now we use a helper function to solve the kriging factors for each active model cell: OrdinaryKrige.calc_factors_grid() includes all the standard kriging arguments, such as search radius, min and max interpolation points,zone_array, as well as the option to save the kriging variance array


In [ ]:
ok.calc_factors_grid(ml.sr, zone_array=ml.bas6.ibound[0].array,var_filename=os.path.join(ml.model_ws,"layer1_var.dat"))

Ok, we know that this function is slow for bigly models, but it is super convienent and allows a lot of flexibility. So, once we have calculated the kriging factors for each active model cell, we need to write this to a factors file


In [ ]:
ok.to_grid_factors_file(os.path.join(ml.model_ws,"pp.fac"))

Let's check out that kriging variance array....


In [ ]:
var_arr = np.ma.masked_invalid(np.loadtxt(os.path.join(ml.model_ws,"layer1_var.dat")))
fig = plt.figure(figsize=(20,20))
ax = plt.subplot(111,aspect="equal")
ax.pcolormesh(ml.sr.xcentergrid,ml.sr.ycentergrid,var_arr,alpha=0.5)
ax.scatter(hk_pp.x, hk_pp.y,marker='.',s=10)

In [ ]:
ml.sr.xcentergrid[0,0],ml.sr.ycentergrid[0,0]

In [ ]:
hk_pp.iloc[0,:].values

other inputs as parameters

Since we rarely know any model inputs perfectly, it is advisable to subject them to adjustment...not to get a good fit, but so we can account for there contribution to uncertainty...How about the conductance between the surface water and groundwater systems. In this model, we are using drain type boundaries. So, let's setup a multiplier parameter for each drain cell's conductance.

Since we told flopy to write external files, all of the list-type modflow inputs are also external, which makes this so much easier! The first thing to do is copy the orginal drain list files (and all other files in the external directory) to a safe place:


In [ ]:
ext_path = os.path.join(ml.model_ws,"ref")
ext_files = [f for f in os.listdir(ext_path)]
drain_files = [f for f in ext_files if "drn" in f.lower()]
#print(drain_files)
assert len(drain_files) == ml.nper,"{0},{1}".format(len(drain_files),ml.nper)
bak_path = os.path.join(ml.model_ws,"bak")
if os.path.exists(bak_path):
    shutil.rmtree(bak_path)
os.mkdir(bak_path)
for f in ext_files:
    shutil.copy2(os.path.join(ext_path,f),os.path.join(bak_path,f))
#assert len(os.listdir(bak_path)) == ml.nper

Now all we need to do is write a template file. We will also write a generic cooresponding input file that will make testing easier later


In [ ]:
drn_df = pd.read_csv(os.path.join(bak_path,drain_files[0]),
                     header=None,names=["l","r","c","stage","cond"],
                    delim_whitespace=True)
f_tpl = open(os.path.join(ml.model_ws,"drain_mlt.dat.tpl"),'w')
f_in = open(os.path.join(ml.model_ws,"drain_mlt.dat"),'w')
f_tpl.write("ptf ~\n")
#build parameter names from model cell info
drn_df.loc[:,"parnme"] = drn_df.apply(lambda x: "drn_i{1:02.0f}j{2:02.0f}".format(x.l-1,x.r-1,x.c-1),axis=1)
for parnme in drn_df.parnme:
    f_tpl.write("{0}  ~   {0}   ~\n".format(parnme))
    f_in.write("{0}     1.0\n".format(parnme))
f_tpl.close()
f_in.close()

Building the pest control file...Finally!

Here we will use the template and instruction files to construct a control file. Then we will use some pandas magic to set the appropriate parameter and observation info


In [ ]:
tpl_files = [os.path.join(ml.model_ws,f) for f in os.listdir(ml.model_ws) if f.endswith(".tpl")]
input_files = [f.replace(".tpl",'') for f in tpl_files]
tpl_files

See why it is important to use a consistent naming structure for the templates-input file pairs? Its the same for the instruction files


In [ ]:
ins_files = [os.path.join(ml.model_ws,f) for f in os.listdir(ml.model_ws) if f.endswith(".ins")]
output_files = [f.replace(".ins",'') for f in ins_files]
ins_files

Now use these files to get a pyemu.Pst instance. This object has lots of cool functionality...


In [ ]:
pst = pyemu.Pst.from_io_files(tpl_files,input_files,ins_files,output_files)

Let's look at some of the important parts of the Pst class. First, all attributes coorespond to the names in list in the pest manual. For instance, the * parameter data section of the control file is a pandas.DataFrame attribute named parameter_data:


In [ ]:
pst.parameter_data.head()

We see that the columns of the DataFrame follow the pest naming conventions. Its the same for * observation data:


In [ ]:
pst.observation_data.head()

What pyemu has set as the obsval is the simulated equivalent, if it is available - in the pst_from_io_files() helper, pyemu tries to call inschek, and, if successful, loads the output files from inschek. This can be very handy for error checking in the forward-run process. However, we still need to get the actual observed data into obsval...remember that dataframe from hob processing?


In [ ]:
hob_df.head()

Notice the obsval column? Let's just set the index of this dataframe to obsnme, then pandas does the hard work for us:


In [ ]:
hob_df.index = hob_df.obsnme
hob_df.head()

In [ ]:
pst.observation_data.loc[hob_df.index,"obsval"] = hob_df.obsval
pst.observation_data.loc[hob_df.index,:].head()

BOOM! that was easy...trying doing that without pandas....not fun!

We still have a few more items to set to specific values. The biggest one is initial values for parameters - they are given default values of 1.0:


In [ ]:
pst.parameter_data.head()

Luckily, pandas makes this very easy. For example, let's set the DRN conductance parameters to have initial values of mean of the values in the model currently:


In [ ]:
avg = ml.drn.stress_period_data[0]["cond"].mean()
par = pst.parameter_data #just a pointer to the full, long-named attribute
drn_pars = par.loc[par.parnme.apply(lambda x: x.startswith("drn")),"parnme"].values
par.loc[drn_pars,"parval1"] = avg
#set the par group to mean something
par.loc[drn_pars,"pargp"] = "drn_cond"
par.loc[drn_pars,"parubnd"] = avg * 10.0
par.loc[drn_pars,"parlbnd"] = avg * 0.1

Let's set the pargp for the remaining parameters using that cool pilot point dataframe from eariler...


In [ ]:
par.loc[pp_df.parnme,"pargp"] = pp_df.pargp

We need to reset the model run command:


In [ ]:
pst.model_command

that is just a generic command. I prefer to use python scripts for this:


In [ ]:
pst.model_command = ["python forward_run.py"]

Let's save this version of the control file


In [ ]:
pst.write(os.path.join(ml.model_ws,"pest.pst"))

But this means we need to write forward_run.py and it needs to perform several actions:

  • apply kriging factors (using pyemu.gw_utils.fac2real())
  • apply the drain multipliers
  • call MODFLOW
  • process the MODFLOW list file

Lucky for you, I already made this file....


In [ ]:
shutil.copy2(os.path.join("Freyberg_transient","forward_run.py"),os.path.join(ml.model_ws,"forward_run.py"))

adding prior information

pyemu supports both zero-order (preferred value) and first-order (preferred difference) Tikhonov regularization. Let's set preferred value for the conductance parameters:


In [ ]:
pyemu.utils.helpers.zero_order_tikhonov(pst,par_groups=["drn_cond"])
pst.prior_information.head()

Now, let's set preferred difference equations for pilot point groups. We will use the Pearson coef as the weight...


In [ ]:
pp_groups = pp_df.groupby("pargp").groups
for pargp,par_names in pp_groups.items():
    this_pp_df = pp_df.loc[par_names,:]
    cov = gs.covariance_matrix(this_pp_df.x,this_pp_df.y,this_pp_df.parnme)
    pyemu.helpers.first_order_pearson_tikhonov(pst,cov,reset=False,abs_drop_tol=0.2)

In [ ]:
pst.prior_information

In [ ]:
pst.control_data.pestmode = "regularization"

setting PEST++ options

Some things I like to add:


In [ ]:
pst.pestpp_options["svd_pack"] = "redsvd"
#pst.pestpp_options["forecasts"] =

saving the new control file


In [ ]:
pst.write("freyberg_reg.pst")

In [ ]:


In [ ]:


In [ ]:


In [ ]: