Herein, we will show users how to use pyEMU to setup a groundwater model for use in pest. We will cover the following topics:
Note that, in addition to pyemu
, this notebook relies on flopy
. flopy
can be obtained (along with installation instructions) at https://github.com/modflowpy/flopy.
In [ ]:
%matplotlib inline
import os
import shutil
import platform
import numpy as np
import pandas as pd
from matplotlib.patches import Rectangle as rect
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore",
message="ModflowDis.sr is deprecated. use Modflow.sr")
from mpl_toolkits.axes_grid1 import make_axes_locatable
import matplotlib as mpl
newparams = {'legend.fontsize':10, 'axes.labelsize':10,
'xtick.labelsize':10, 'ytick.labelsize':10,
'font.family':'Univers 57 Condensed',
'pdf.fonttype':42}
plt.rcParams.update(newparams)
import pyemu
This example is based on the synthetic classroom model of Freyberg(1988). The model is a 2-dimensional MODFLOW model with 1 layer, 40 rows, and 20 columns. The model has 2 stress periods: an initial steady-state stress period used for calibration, and a 5-year transient stress period. The calibration period uses the recharge and well flux of Freyberg(1988); the last stress period use 25% less recharge and 25% more pumping to represent future conditions for a forecast period.
Freyberg, David L. "AN EXERCISE IN GROUND‐WATER MODEL CALIBRATION AND PREDICTION." Groundwater 26.3 (1988): 350-360.
In [ ]:
#load the existing model and save it in a new dir and make sure it runs
import flopy
model_ws = os.path.join("Freyberg_transient")
ml = flopy.modflow.Modflow.load("freyberg.nam",model_ws=model_ws,verbose=False)
ml.exe_name = "mfnwt"
ml.model_ws = "temp"
EXE_DIR = os.path.join("..","bin")
if "window" in platform.platform().lower():
EXE_DIR = os.path.join(EXE_DIR,"win")
elif "darwin" in platform.platform().lower():
EXE_DIR = os.path.join(EXE_DIR,"mac")
else:
EXE_DIR = os.path.join(EXE_DIR,"linux")
[shutil.copy2(os.path.join(EXE_DIR,f),os.path.join("temp",f)) for f in os.listdir(EXE_DIR)]
ml.write_input()
ml.run_model()
In [ ]:
rc_df = pd.read_csv(os.path.join("Freyberg","misc","obs_rowcol.dat"),delim_whitespace=True)
hds = flopy.utils.HeadFile(os.path.join(ml.model_ws,"freyberg.hds"))
data = hds.get_alldata()
obs = []
roff = 0.0#ml.dis.delc.array[0] / 2.0
coff = 0.0#ml.dis.delr.array[0] / 2.0
for n,r,c in zip(rc_df.name,rc_df.row,rc_df.col):
name = "i{1:02d}j{2:02d}".format(n,r-1,c-1)
d = np.zeros((data.shape[0]-1,2))
d[:,0] = hds.times[1:]
d[:,1] = data[1:,0,r-1,c-1] + np.random.randn(d.shape[0]) #add some random noise to the observations
obs.append(flopy.modflow.HeadObservation(ml,obsname=name,layer=0,row=r-1,
column=c-1,roff=roff,coff=coff,
time_series_data=d))
flopy.modflow.ModflowHob(ml,obs_data=obs,iuhobsv=600)
ext_path = os.path.join(ml.model_ws,"ref")
if os.path.exists(ext_path):
shutil.rmtree(ext_path)
print(ext_path)
os.mkdir(ext_path)
ml.external_path = os.path.split(ext_path)[-1]
ml.upw.hk.fmtin = "(FREE)"
ml.upw.sy.fmtin = "(FREE)"
ml.rch.rech.fmtin = "(FREE)"
ml.write_input()
ml.run_model()
pyemu
has a helper function to setup this instruction file for you and also load observations into dataframe
In [ ]:
hob_df = pyemu.gw_utils.modflow_hob_to_instruction_file(os.path.join(ml.model_ws,ml.name+".hob.out"))
The dataframe returned has a lot of useful info that we will use later...
In [ ]:
hob_df.head()
Here we will use flopy
and pyemu
to load each of the flux and volume budget components from the modflow
list file to use as observations (or forecasts). These are valuable pieces of information and since observations are free, why not include them? This helper function writes two instruction files: <flx_filename>.ins
and <vol_filename>.ins
In [ ]:
# the flux budget output filename that will be written during each forward run
flx_filename=os.path.join(ml.model_ws,"flx.out")
# the volumne budget output filename that will be written during each forward run
vol_filename = os.path.join(ml.model_ws,"vol.out")
df_wb = pyemu.gw_utils.setup_mflist_budget_obs(os.path.join(ml.model_ws,ml.name+".list"))
In [ ]:
df_wb.head()
first specify what pilot point names we want to use for each model layer (counting from 0). Here we will setup pilot points for hk
, sy
and rech
. The rech
pilot points will be used as a single multiplier array for all stress periods to account for potential spatial bias in recharge.
In [ ]:
prefix_dict= {0:["hk1","sy1","rech1"]}
This helper function is doing a lot of things: writing templates, pilot point files, and creating a shapefile of pilot points. The every_n_cell
arg is key: it decides how many cells to skip between pilot point locations - since we passed the model
, only active model cells get pilot points (using bas6.ibound
). Like many things with flopy
, the SpatialReference
is used to define pilot point x
and y
coordinates
In [ ]:
pp_cells = 3
pp_df = pyemu.pp_utils.setup_pilotpoints_grid(ml,prefix_dict=prefix_dict,every_n_cell=pp_cells,pp_dir=ml.model_ws,
tpl_dir=ml.model_ws,shapename=os.path.join(ml.model_ws,"pp.shp"))
The dataframe
return has the same info as the shapefile that was written - useful info, right?
In [ ]:
pp_df.index = pp_df.parnme
pp_df
In [ ]:
hk_pp = pyemu.pp_utils.pp_file_to_dataframe(os.path.join(ml.model_ws,"hk1pp.dat"))
In [ ]:
hk_pp.head()
Let's setup a geostatistical structure. The contribution doesn't matter for pilot point interpolation, but it does matter when we want to form a prior parameter covariance matrix - we will get to that later. A good rule of thumb is to use an a
value that is three times the pilot point spacing. Also, since the all of these pilot points will be log transformed, we need to use a log-based geostatistical structure
In [ ]:
a = pp_cells * ml.dis.delr.array[0] * 3.0
v = pyemu.geostats.ExpVario(contribution=1.0,a=a)
gs = pyemu.geostats.GeoStruct(variograms=v,transform="log")
gs.plot()
This is where things get fun. First we create an OrdinaryKrige
object
In [ ]:
ok = pyemu.geostats.OrdinaryKrige(geostruct=gs,point_data=hk_pp)
Now we use a helper function to solve the kriging factors for each active model cell: OrdinaryKrige.calc_factors_grid()
includes all the standard kriging arguments, such as search radius, min and max interpolation points,zone_array, as well as the option to save the kriging variance array
In [ ]:
ok.calc_factors_grid(ml.sr, zone_array=ml.bas6.ibound[0].array,var_filename=os.path.join(ml.model_ws,"layer1_var.dat"))
Ok, we know that this function is slow for bigly models, but it is super convienent and allows a lot of flexibility. So, once we have calculated the kriging factors for each active model cell, we need to write this to a factors file
In [ ]:
ok.to_grid_factors_file(os.path.join(ml.model_ws,"pp.fac"))
Let's check out that kriging variance array....
In [ ]:
var_arr = np.ma.masked_invalid(np.loadtxt(os.path.join(ml.model_ws,"layer1_var.dat")))
fig = plt.figure(figsize=(20,20))
ax = plt.subplot(111,aspect="equal")
ax.pcolormesh(ml.sr.xcentergrid,ml.sr.ycentergrid,var_arr,alpha=0.5)
ax.scatter(hk_pp.x, hk_pp.y,marker='.',s=10)
In [ ]:
ml.sr.xcentergrid[0,0],ml.sr.ycentergrid[0,0]
In [ ]:
hk_pp.iloc[0,:].values
Since we rarely know any model inputs perfectly, it is advisable to subject them to adjustment...not to get a good fit
, but so we can account for there contribution to uncertainty...How about the conductance between the surface water and groundwater systems. In this model, we are using drain
type boundaries. So, let's setup a multiplier parameter for each drain
cell's conductance.
Since we told flopy
to write external files, all of the list-type modflow
inputs are also external, which makes this so much easier! The first thing to do is copy the orginal drain list files (and all other files in the external directory) to a safe place:
In [ ]:
ext_path = os.path.join(ml.model_ws,"ref")
ext_files = [f for f in os.listdir(ext_path)]
drain_files = [f for f in ext_files if "drn" in f.lower()]
#print(drain_files)
assert len(drain_files) == ml.nper,"{0},{1}".format(len(drain_files),ml.nper)
bak_path = os.path.join(ml.model_ws,"bak")
if os.path.exists(bak_path):
shutil.rmtree(bak_path)
os.mkdir(bak_path)
for f in ext_files:
shutil.copy2(os.path.join(ext_path,f),os.path.join(bak_path,f))
#assert len(os.listdir(bak_path)) == ml.nper
Now all we need to do is write a template file. We will also write a generic cooresponding input file that will make testing easier later
In [ ]:
drn_df = pd.read_csv(os.path.join(bak_path,drain_files[0]),
header=None,names=["l","r","c","stage","cond"],
delim_whitespace=True)
f_tpl = open(os.path.join(ml.model_ws,"drain_mlt.dat.tpl"),'w')
f_in = open(os.path.join(ml.model_ws,"drain_mlt.dat"),'w')
f_tpl.write("ptf ~\n")
#build parameter names from model cell info
drn_df.loc[:,"parnme"] = drn_df.apply(lambda x: "drn_i{1:02.0f}j{2:02.0f}".format(x.l-1,x.r-1,x.c-1),axis=1)
for parnme in drn_df.parnme:
f_tpl.write("{0} ~ {0} ~\n".format(parnme))
f_in.write("{0} 1.0\n".format(parnme))
f_tpl.close()
f_in.close()
In [ ]:
tpl_files = [os.path.join(ml.model_ws,f) for f in os.listdir(ml.model_ws) if f.endswith(".tpl")]
input_files = [f.replace(".tpl",'') for f in tpl_files]
tpl_files
See why it is important to use a consistent naming structure for the templates-input file pairs? Its the same for the instruction files
In [ ]:
ins_files = [os.path.join(ml.model_ws,f) for f in os.listdir(ml.model_ws) if f.endswith(".ins")]
output_files = [f.replace(".ins",'') for f in ins_files]
ins_files
Now use these files to get a pyemu.Pst
instance. This object has lots of cool functionality...
In [ ]:
pst = pyemu.Pst.from_io_files(tpl_files,input_files,ins_files,output_files)
Let's look at some of the important parts of the Pst
class. First, all attributes coorespond to the names in list in the pest manual. For instance, the * parameter data
section of the control file is a pandas.DataFrame
attribute named parameter_data
:
In [ ]:
pst.parameter_data.head()
We see that the columns of the DataFrame
follow the pest naming conventions. Its the same for * observation data
:
In [ ]:
pst.observation_data.head()
What pyemu
has set as the obsval
is the simulated equivalent, if it is available - in the pst_from_io_files()
helper, pyemu
tries to call inschek
, and, if successful, loads the output files from inschek
. This can be very handy for error checking in the forward-run process. However, we still need to get the actual observed data into obsval
...remember that dataframe from hob processing?
In [ ]:
hob_df.head()
Notice the obsval
column? Let's just set the index of this dataframe to obsnme
, then pandas does the hard work for us:
In [ ]:
hob_df.index = hob_df.obsnme
hob_df.head()
In [ ]:
pst.observation_data.loc[hob_df.index,"obsval"] = hob_df.obsval
pst.observation_data.loc[hob_df.index,:].head()
BOOM! that was easy...trying doing that without pandas....not fun!
We still have a few more items to set to specific values. The biggest one is initial values for parameters - they are given default values of 1.0:
In [ ]:
pst.parameter_data.head()
Luckily, pandas
makes this very easy. For example, let's set the DRN
conductance parameters to have initial values of mean of the values in the model currently:
In [ ]:
avg = ml.drn.stress_period_data[0]["cond"].mean()
par = pst.parameter_data #just a pointer to the full, long-named attribute
drn_pars = par.loc[par.parnme.apply(lambda x: x.startswith("drn")),"parnme"].values
par.loc[drn_pars,"parval1"] = avg
#set the par group to mean something
par.loc[drn_pars,"pargp"] = "drn_cond"
par.loc[drn_pars,"parubnd"] = avg * 10.0
par.loc[drn_pars,"parlbnd"] = avg * 0.1
Let's set the pargp
for the remaining parameters using that cool pilot point dataframe from eariler...
In [ ]:
par.loc[pp_df.parnme,"pargp"] = pp_df.pargp
We need to reset the model run command:
In [ ]:
pst.model_command
that is just a generic command. I prefer to use python scripts for this:
In [ ]:
pst.model_command = ["python forward_run.py"]
Let's save this version of the control file
In [ ]:
pst.write(os.path.join(ml.model_ws,"pest.pst"))
But this means we need to write forward_run.py
and it needs to perform several actions:
pyemu.gw_utils.fac2real()
)MODFLOW
MODFLOW
list fileLucky for you, I already made this file....
In [ ]:
shutil.copy2(os.path.join("Freyberg_transient","forward_run.py"),os.path.join(ml.model_ws,"forward_run.py"))
In [ ]:
pyemu.utils.helpers.zero_order_tikhonov(pst,par_groups=["drn_cond"])
pst.prior_information.head()
Now, let's set preferred difference equations for pilot point groups. We will use the Pearson coef as the weight...
In [ ]:
pp_groups = pp_df.groupby("pargp").groups
for pargp,par_names in pp_groups.items():
this_pp_df = pp_df.loc[par_names,:]
cov = gs.covariance_matrix(this_pp_df.x,this_pp_df.y,this_pp_df.parnme)
pyemu.helpers.first_order_pearson_tikhonov(pst,cov,reset=False,abs_drop_tol=0.2)
In [ ]:
pst.prior_information
In [ ]:
pst.control_data.pestmode = "regularization"
Some things I like to add:
In [ ]:
pst.pestpp_options["svd_pack"] = "redsvd"
#pst.pestpp_options["forecasts"] =
In [ ]:
pst.write("freyberg_reg.pst")
In [ ]:
In [ ]:
In [ ]:
In [ ]: