In [ ]:
%matplotlib inline
import os
import numpy as np
import pandas
import matplotlib.pyplot as plt
import plot_domain
fig = plot_domain.henry_domain()

Model background

Here is an example based on the Henry saltwater intrusion problem. The synthetic model is a 2-dimensional SEAWAT model (X-Z domain) with 1 row, 120 columns and 20 layers. The left boundary is a specified flux of freshwater, the right boundary is a specified head and concentration saltwater boundary. The model has two stress periods: an initial steady state (calibration) period, then a transient period with less flux (forecast).

The inverse problem has 603 parameters: 600 hydraulic conductivity pilot points, 1 global hydraulic conductivity, 1 specified flux multiplier for history matching and 1 specified flux multiplier for forecast conditions. The inverse problem has 36 obseravtions (21 heads and 15 concentrations) measured at the end of the steady-state calibration period. The forecasts of interest of the distance from the left model edge to the 10% seawater concentration in the basal model layer and the concentration at location 10. Both of there forecasts are "measured" at the end of the forecast stress period. The forecasts are both in the Jacobian matrix as zero-weight observations named pd_ten and C_obs10_2.I previously calculated the jacobian matrix, which is in the henry/ folder, along with the PEST control file.

Using pyemu


In [ ]:
import pyemu

First create a linear_analysis object. We will use schur derived type, which replicates the behavior of the PREDUNC suite of PEST. We pass it the name of the jacobian matrix file. Since we don't pass an explicit argument for parcov or obscov, pyemu attempts to build them from the parameter bounds and observation weights in a pest control file (.pst) with the same base case name as the jacobian. Since we are interested in forecast uncertainty as well as parameter uncertainty, we also pass the names of the forecast sensitivity vectors we are interested in, which are stored in the jacobian as well. Note that the forecasts argument can be a mixed list of observation names, other jacobian files or PEST-compatible ASCII matrix files.


In [ ]:
la = pyemu.Schur(jco=os.path.join("henry", "pest.jcb"),verbose=False)

The screen output can be redirected to a log file by passing a file name to the verbose keyword argument. Or screen output can be stopped by passing False to the verbose argument

We can inspect the parcov and obscov attributes by saving them to files. We can save them PEST-compatible ASCII or binary matrices (.to_ascii() or .to_binary()), PEST-compatible uncertainty files (.to_uncfile()), or simply as numpy ASCII arrays (numpy.savetxt()). In fact, all matrix and covariance objects (including the forecasts) have these methods.


In [ ]:
la.parcov.to_uncfile(os.path.join("henry", "parcov.unc"), covmat_file=os.path.join("henry","parcov.mat"))

When saving an uncertainty file, if the covariance object is diagonal (self.isdiagonal == True), then you can force the uncertainty file to use standard deviation blocks instead of covariance matrix blocks by explicitly passing covmat_file as None:


In [ ]:
la.obscov.to_uncfile(os.path.join("henry", "obscov.unc"), covmat_file=None)

Posterior parameter uncertainty analysis

Let's calculate and save the posterior parameter covariance matrix:


In [ ]:
la.posterior_parameter.to_ascii(os.path.join("henry", "posterior.mat"))

You can open this file in a text editor to examine. The diagonal of this matrix is the posterior variance of each parameter. Since we already calculated the posterior parameter covariance matrix, additional calls to the posterior_parameter decorated method only require access:


In [ ]:
la.posterior_parameter.to_dataframe().sort_index().sort_index(axis=1).iloc[0:3,0:3] #look so nice in the notebook

We can see the posterior variance of each parameter along the diagonal of this matrix. Now, let's make a simple plot of prior vs posterior uncertainty for the 600 pilot point parameters


In [ ]:
par_sum = la.get_parameter_summary().sort_index()
par_sum.loc[par_sum.index[:20],"percent_reduction"].plot(kind="bar",figsize=(10,4),edgecolor="none")
par_sum.iloc[0:10,:]

We can see that the at most, the uncertainty of any one of the 600 hydraulic conductivity parameters is only reduced by 5% and the uncertainty of many parameters has not been reduced at all, meaning these parameters are not informed by the observations.

Prior forecast uncertainty

Now let's examine the prior and posterior variance of the forecasts:


In [ ]:
la.get_forecast_summary()

It is interesting that the uncertainty of the forecasts is reduced substantially even though the uncertainty for any one parameter is only slightly reduced. This is because the right combinations of forecast-sensitive parameters are being informed by the observations.

Data worth

Now, let's try to identify which observations are most important to reducing the posterior uncertainty (e.g.the forecast worth of every observation). We simply recalculate Schur's complement without some observations and see how the posterior forecast uncertainty increases

Let's see which observations are most important, which is measured by the increase in forecast uncertainty when that observation is left out


In [ ]:
df = la.get_removed_obs_importance()
df = 100.0 * (df - df.loc["base",:])/df 
ax = df.plot(kind="bar",figsize=(15,8))
ax.set_ylabel("percent uncertainty increase")

base row are the results of Schur's complement calculation using all observations. The increase in posterior forecast uncertainty for the head and conc cases show how much forecast uncertainty increases when the head and concentrations observations are not used in history matching

parameter contribution to forecast uncertainty

Lets look at which parameters are contributing most to forecast uncertainty, which we estimate as the decrese in forecast uncertainty from "perfect" knowledge of one or more parameters. for demostration purposes, lets group the hydraulic conductivity parameters by row.


In [ ]:
par_groups = {}
for pname in la.pst.par_names:
    if pname.startswith('k'):
        row = "k_row_"+pname[2:4]
        if row not in par_groups.keys():
            par_groups[row] = []
        par_groups[row].append(pname)

par_groups["global_k"] = "global_k"
par_groups["histmatch_mult"] = "mult1"
par_groups["forecast_mult"] = "mult2"

df = la.get_par_contribution(par_groups)
df.sort_index(inplace=True)
df

In [ ]:
df

In [ ]:
df.plot(kind="bar",figsize=(10,5))
plt.show()

We see that the largest contributions to forecast uncertainty depends on the forecast. Forecast pd_ten is most sensitive to hydraulic conductivity parameters in row 10. However, Forecast c_obs10_2 is most sensitive to the forecast_mult parameter.


In [ ]: