In this section, we demonstrate the Evidential Learning paradigm for predicting future field production rates, as discussed in Chapter 7.2.3. We consider a field where 5 producers and 3 injectors have already been drilled. The field has been in production for 4000 days, and production data is available for all 5 producers. A decision needs to be made regarding the future economic feasibility of the field: when to abandon it is contingent on the field production from day 4000 to day 9000, assuming current operating conditions.
The first stage of Evidential Learning is the construction of the prior models. A set of 500 prior models was constructed as discussed in Section 7.2.2. These prior models were forward modelled using a streamline simulator (3DSL) over all 9000 days, to encompass both the 4000 days of production data, denoted by $\mathbf{d}$, and the 5000 days of prediction $\mathbf{h}$ required to make the decision regarding abandonment. We begin by loading the simulation results, which have been marshalled into an appropriate structure.
In [1]:
addpath('../src/evidential/');              % evidential learning routines
addpath('../src/thirdparty/fda_matlab/');   % functional data analysis (FDA) toolbox
prior_path = '../data/evidential/Prior.mat';
load(prior_path);                           % loads the PriorData and PriorPrediction structs
We use a struct for storing both the data and prediction variables. The struct has the following fields:
Name | Data Type | Dimensions | Description |
---|---|---|---|
time | double | $1 \times N_{t}$ | Vector containing the time (days, years) at each time step |
data | double | $N_{real}\times N_{t}\times N_{responses}$ | Array of responses: each row is one realization, each column is a time step, and the third dimension indexes the well |
name | string | 1 | Name of the response, e.g. 'Oil Rate' |
spline | double | $1 \times 2$ | First element specifies the order of the spline, second the number of knots that will be used for FDA |
ObjNames | CellArray | $N_{responses} \times 1$ | Name of each well, e.g. {'P1','P2'} |
types | string | 1 | Either 'Data' or 'Prediction' |
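For illustration, a data-variable struct with these fields might be laid out as follows. The values below are hypothetical and chosen only to match the dimensions described in this example (500 realizations, 80 time steps, 5 producers); the actual structs, PriorData and PriorPrediction, are loaded from Prior.mat.

% Hypothetical example of the struct layout described above
Example.time     = 0:50:3950;                    % 1 x Nt vector of days (80 steps)
Example.data     = rand(500, 80, 5);             % Nreal x Nt x Nresponses
Example.name     = 'Oil Rate';                   % name of the response
Example.spline   = [3 40];                       % [spline order, number of knots]
Example.ObjNames = {'P1','P2','P3','P4','P5'};   % one name per response
Example.types    = 'Data';                       % 'Data' or 'Prediction'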
For benchmarking, we set one realization aside and use it as the truth. In this particular case, realization 464 was chosen as the reference. The production data for each of the 5 producers are shown below: the reference is shown in red ($\mathbf{d}_{obs}$), while the historical responses of the prior models are shown in gray ($\mathbf{d}$).
In [2]:
%plot inline -s 1600,1000
FontSize = 12;
% Set aside a realization to use as the "truth"
TruthRealization = 464;
NumPriorRealizations = size(PriorData.data,1);   % realizations are stored as rows
AvailableRealizations = setdiff(1:NumPriorRealizations,TruthRealization);
PlotPriorResponses(PriorData,TruthRealization,FontSize);
PlotPriorResponses(PriorPrediction,TruthRealization,FontSize);
Each response time series is, in principle, infinite-dimensional, but has been discretized to 80 time steps for purposes of flow simulation. A dimension reduction is required before we can build a statistical relationship between data and prediction. We start by performing Functional Data Analysis (FDA) on each of the five historical rates that comprise the data variable (Chapter 2.6). Further compression is achieved by applying Functional Principal Component Analysis (FPCA). The effectiveness of the compression can be verified by inspecting the scree plot. In this case, we examine the decomposition of Producer #1's historical oil production rate.
In [3]:
%plot inline -s 1200,800
MinEigenValues = 3; EigenTolerance = 0.95;
% We first perform FPCA on both d and h
PriorData.spline=[3 40]; % 3rd order spline with 40 knots
PriorDataFPCA = ComputeHarmonicScores(PriorData,4);
PriorPrediction.spline=[3 20]; % 3rd order spline with 20 knots
PriorPredictionFPCA = ComputeHarmonicScores(PriorPrediction,0);
The scree plot shows that only the first 5 eigencomponents are required to capture over 95% of the variation in the responses of all the prior models for Producer 1. The choice of spline is iteratively adjusted to minimize the average RMS error between the original and reconstructed curves. It is also important to ensure that oscillations caused by Runge's phenomenon do not occur in any of the models when high-order splines are used. We verify the effectiveness of the compression by reconstructing the original curves from the basis functions and coefficients. The same procedure is applied to $\mathbf{h}$, resulting in 4 eigencomponents.
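A minimal sketch of this reconstruction check follows, assuming ComputeHarmonicScores returns pca_fd-style structs from the fda_matlab toolbox (fields meanfd, harmfd, harmscr); the exact field names in this codebase may differ.

% Sketch: reconstruct Producer #1's rates from the FPCA basis and scores
fpca1  = PriorDataFPCA{1};                        % Producer #1 decomposition
t      = PriorData.time(:);
meanft = eval_fd(t, fpca1.meanfd);                % mean function, Nt x 1
harmft = eval_fd(t, fpca1.harmfd);                % harmonics, Nt x nHarm
recon  = repmat(meanft', size(fpca1.harmscr,1), 1) + fpca1.harmscr*harmft';
rmsErr = sqrt(mean((squeeze(PriorData.data(:,:,1)) - recon).^2, 2));
fprintf('Average RMS reconstruction error: %g\n', mean(rmsErr));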
In [4]:
%plot inline -s 1000,400
% Temporarily remove fda_matlab from the path while running MixedPCA
% (presumably to avoid a function name clash with the toolbox)
rmpath('../src/thirdparty/fda_matlab/');
[d_f, dobs_fpca] = MixedPCA(PriorDataFPCA,TruthRealization,EigenTolerance);
addpath('../src/thirdparty/fda_matlab/');
% Get number of FPCA components to keep for h
nHarmPred = GetNumHarmonics(PriorPredictionFPCA{1}, MinEigenValues, ...
EigenTolerance);
h_f = PriorPredictionFPCA{1}.harmscr(AvailableRealizations,1:nHarmPred);
% Plot prior models in functional space
PlotLowDimModels(d_f,h_f,dobs_fpca,'f',8);
We have now reduced the dimension from five 80-element time series to 11 dimensions (7 from the producers, $\mathbf{d}^{f}$, and 4 from the prediction, $\mathbf{h}^{f}$). This makes it possible to perform regression between data and prediction. However, this is contingent on a sufficiently strong linear correlation between the variables, or else the resulting regression estimates will be inconsistent. In this example, the correlation between the first functional components of data and forecast is rather poor. This is usually attributed to the presence of cross-correlations among the functional data variables. To capitalize on the full multivariate correlation between all components of $\mathbf{h}$ and all components of $\mathbf{d}$, a Canonical Correlation Analysis (Chapter 2.5) is performed to transform the models into a canonical space.
In [5]:
%plot inline -s 1000,400
% Compute canonical transformation on the functional components
[A, B, ~, d_c,h_c] = canoncorr(d_f,h_f);
dobs_c=(dobs_fpca-mean(d_f))*A;
% Plot prior models in canonical space
PlotLowDimModels(d_c,h_c,dobs_c,'c',8);
We see that the correlation between the primary canonical components of the data and prediction variables is much higher than in the functional space. To apply linear Gaussian regression between $\mathbf{h}^c$ and $\mathbf{d}^c$, a normal score transform of $\mathbf{h}^c$ is required.
In [6]:
% Apply a normal score transform to the h_c
h_c_gauss = NormalScoreTransform(h_c,0);
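For intuition, a normal score transform maps each value to the standard normal quantile of its empirical rank. The sketch below illustrates the general technique for a single canonical component; it is not necessarily the exact NormalScoreTransform implementation.

x = h_c(:,1);                          % one canonical component
[~, idx] = sort(x);
rnk(idx,1) = (1:numel(x))';            % empirical rank of each value
p = (rnk - 0.5) / numel(x);            % plotting positions in (0,1)
x_gauss = norminv(p);                  % map ranks to standard normal scores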
The linear relationship between data and forecast implies we can express: $$\mathbf{d}^c = G\mathbf{h}^c$$ Likewise, a Gaussian likelihood model can be formulated as follows: $$L(\mathbf{h}^c) = \exp\left(-\frac{1}{2}(G\mathbf{h}^c-\mathbf{d}^c_{obs})^T C^{-1}_{\mathbf{d}^c} (G\mathbf{h}^c-\mathbf{d}^c_{obs})\right)$$ where $C_{\mathbf{d}^c}$ is the data covariance matrix of the canonical components. Since prior and likelihood are multivariate Gaussian, the posterior is also Gaussian, and the posterior mean and covariance can be readily estimated using classical methods.
In [7]:
% Find the best linear fit between d_c and h_c_gauss; mrdivide solves
% the least-squares problem min ||G*h_c_gauss' - d_c'||
G = d_c'/h_c_gauss';
% Compute the misfit covariance from the regression residuals
DDiff = d_c' - G*h_c_gauss';
C_T = DDiff*DDiff'/size(d_c,1);
With a multivariate Gaussian likelihood and prior and a linear forward model, the posterior distribution $f(\mathbf{h}^c|\mathbf{d}_{obs})$ is also multivariate Gaussian, with mean $\tilde{\mathbf{h}}^c$ and covariance $\tilde{C}_{\mathbf{h}^c\mathbf{h}^c}$ given by Equations 7.15 and 7.16.
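Explicitly, these take the familiar linear-Gaussian update form (the expressions below mirror the code that follows; here $C_{\mathbf{h}^c}$ is the prior covariance of the Gaussian-transformed canonical scores, $\bar{\mathbf{h}}^c$ their prior mean, and $C_T$ the misfit covariance estimated above):

$$\tilde{\mathbf{h}}^c = \bar{\mathbf{h}}^c + C_{\mathbf{h}^c}G^T\left(GC_{\mathbf{h}^c}G^T + C_T\right)^{-1}\left(\mathbf{d}^c_{obs} - G\bar{\mathbf{h}}^c\right)$$

$$\tilde{C}_{\mathbf{h}^c\mathbf{h}^c} = \left(G^TC_T^{-1}G + C_{\mathbf{h}^c}^{-1}\right)^{-1}$$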
In [8]:
% Compute posterior using Linear Gaussian Regression Equations
C_H = cov(h_c_gauss);
mu_prior = mean(h_c_gauss)';
mu_posterior = mu_prior + C_H*G'*pinv(G*C_H*G' + C_T)*(dobs_c'-G*mu_prior);
C_posterior = inv(G'*pinv(C_T)*G + inv(C_H));
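Sampling this posterior reduces to drawing from a multivariate Gaussian. A minimal sketch of what SampleCanonicalPosterior presumably does internally, before mapping samples back through the normal score transform, using the Statistics Toolbox function mvnrnd:

% Sketch: draw Gaussian posterior samples in canonical space
nSamples = 100;
Csym = (C_posterior + C_posterior')/2;   % symmetrize to guard against round-off
h_c_gauss_samples = mvnrnd(mu_posterior', Csym, nSamples);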
In [9]:
%plot inline -s 600,400
warning('off','all');
addpath('../src/thirdparty/fda_matlab/');
NumPosteriorSamples = 100;
h_c_post = SampleCanonicalPosterior(mu_posterior,C_posterior,...
NumPosteriorSamples,h_c);
% Undo the CCA and FPCA transformations
h_reconstructed = UndoCanonicalFunctional(h_c_post, B, h_f,...
PriorPrediction.time, PriorPredictionFPCA);
However, we are not interested in posterior predictions in the canonical space; rather, we need to transform them back into the time domain by undoing each of the transformations in turn. We start by undoing the normal score transform, after which the reduction in variance of the posterior models (blue) versus the prior (gray) is already apparent, indicating a reduction in prediction uncertainty. We then undo the canonical transform, followed by the FPCA and FDA reconstructions. Finally, we can plot the posterior predictions in the time domain, as well as compute the updated P10, P50, P90 quantiles. The posterior quantiles exhibit a significant reduction in uncertainty in comparison to the prior.
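The quantile computation itself is simple: P10/P50/P90 are taken across realizations at each time step. A minimal sketch of what ComputeQuantiles presumably does, assuming a single prediction response:

% Sketch: quantiles across realizations at each time step
PriorQ = quantile(squeeze(PriorPrediction.data(:,:,1)), [0.1 0.5 0.9], 1);
PostQ  = quantile(h_reconstructed', [0.1 0.5 0.9], 1);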
In [10]:
%plot inline -s 1800,800
[PriorQuantiles, PosteriorQuantiles] = ComputeQuantiles(...
PriorPrediction.data, h_reconstructed');
PlotPosteriorSamplesAndQuantiles(PriorPrediction,TruthRealization, ...
h_reconstructed',PriorQuantiles,PosteriorQuantiles);