Introduction to the Gaussian process regression

Solusion to the Practice ploblem

Course

  • SURF 2016

Lecturer :

  • Georgios Karagiannis, Department of Mathematics, Purdue

July 8, 2016

Objective :

  • We wish to build a Gaussian Process regression --a probabilistic surrogate model-- in order to be able to emulate the output of the Piston Simulator, with respect to specified inputs.

Related material:

Readings :

- Rasmussen, Carl Edward. "Gaussian processes in machine learning." In Advanced lectures on machine learning, pp. 63-71. Springer Berlin Heidelberg, 2004.  
    - see http://www.GaussianProcess.org/gpml
    - Chapters: 2, 4, 5.1, & 5.4.2

- Slides provided

Software :

- R-cran (https://cran.r-project.org/)
- R packages 
    - DiceKrigin (https://cran.r-project.org/web/packages/DiceKriging/index.html)
    - lhs (https://cran.r-project.org/web/packages/lhs/index.html)
- Roustant, Olivier, David Ginsbourger, and Yves Deville. "DiceKriging, DiceOptim: Two R packages for the analysis of computer experiments by kriging-based metamodeling and optimization." (2012).
    - Plus dependences ...

The Piston Simulation function model

The Piston Simulation function models the circular motion of a piston within a cylinder. It involves a chain of nonlinear functions.

The response C is cycle time (the time it takes to complete one cycle), in seconds.

$$ C(x) = 2\pi \sqrt{\frac{M}{k+S^2}\frac{P_0 V_0}{T_0}\frac{T_0}{V^2}} $$$$ V = \frac{S}{2k} ( \sqrt{A^2 +4k\frac{P_0 V_0}{T_0}T_a} -A) $$$$ A = P_0 S 19.62M-\frac{kV_0}{S} $$

The input variables and their usual input ranges are:


--------------------------------------------------
M  ∈ [30, 60]        | piston weight (kg)
S  ∈ [0.005, 0.020]  | piston surface area (m2)
V0 ∈ [0.002, 0.010]  | initial gas volume (m3)
k  ∈ [1000, 5000]    | spring coefficient (N/m)
P0 ∈ [90000, 110000] | atmospheric pressure (N/m2)
Ta ∈ [290, 296]      | ambient temperature (K)
T0 ∈ [340, 360]      | filling gas temperature (K)
--------------------------------------------------

Here, we consider the output in the log scale.

=> As a result we wish to emulate the function f(M, S, V0 k, P0, Ta, T0) := log(C(M, S, V0 k, P0, Ta, T0)).

Reference: http://www.sfu.ca/~ssurjano/piston.html

Kenett, R., & Zacks, S. (1998). Modern industrial statistics: design and control of quality and reliability. Pacific Grove, CA: Duxbury press.

Software preparation


In [1]:
# DOWNLOAD THE R PACKAGES REQUIRED
install.packages('DiceKriging', repos = "http://cran.us.r-project.org")
install.packages('lhs', repos = "http://cran.us.r-project.org")
# install.packages('tcltk', repos = "http://cran.us.r-project.org")
# install.packages('aplpack', repos = "http://cran.us.r-project.org")


Installing package into ‘/export/users/gkaragia/R/x86_64-redhat-linux-gnu-library/3.3’
(as ‘lib’ is unspecified)
Warning message:
In install.packages("DiceKriging", repos = "http://cran.us.r-project.org"): installation of package ‘DiceKriging’ had non-zero exit statusInstalling package into ‘/export/users/gkaragia/R/x86_64-redhat-linux-gnu-library/3.3’
(as ‘lib’ is unspecified)
Installing package into ‘/export/users/gkaragia/R/x86_64-redhat-linux-gnu-library/3.3’
(as ‘lib’ is unspecified)
Warning message:
In install.packages("lhs", repos = "http://cran.us.r-project.org"): installation of package ‘lhs’ had non-zero exit status

In [2]:
# LOAD THE R PACKAGES REQUIRED
library('lhs')
library('DiceKriging')
# library('tcltk')
# library('aplpack')

In [3]:
# THIS IS THE SIMULATOR AND THE MIN AND MAX OF THE INPUTS

piston <- function(xx) {
  ##########################################################################
  #
  # PISTON FUNCTION
  #
  # Authors: Sonja Surjanovic, Simon Fraser University
  #          Derek Bingham, Simon Fraser University
  # Questions/Comments: Please email Derek Bingham at dbingham@stat.sfu.ca.
  #
  # Copyright 2013. Derek Bingham, Simon Fraser University.
  #
  # THERE IS NO WARRANTY, EXPRESS OR IMPLIED. WE DO NOT ASSUME ANY LIABILITY
  # FOR THE USE OF THIS SOFTWARE.  If software is modified to produce
  # derivative works, such modified software should be clearly marked.
  # Additionally, this program is free software; you can redistribute it 
  # and/or modify it under the terms of the GNU General Public License as 
  # published by the Free Software Foundation; version 2.0 of the License. 
  # Accordingly, this program is distributed in the hope that it will be 
  # useful, but WITHOUT ANY WARRANTY; without even the implied warranty 
  # of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU 
  # General Public License for more details.
  #
  # For function details and reference information, see:
  # http://www.sfu.ca/~ssurjano/
  #
  ##########################################################################
  #
  # OUTPUT AND INPUT:
  #
  # C = cycle time
  # xx = c(M, S, V0, k, P0, Ta, T0)
  #
  ##########################################################################
  
  M  <- xx[1]
  S  <- xx[2]
  V0 <- xx[3]
  k  <- xx[4]
  P0 <- xx[5]
  Ta <- xx[6]
  T0 <- xx[7]
  
  Aterm1 <- P0 * S
  Aterm2 <- 19.62 * M
  Aterm3 <- -k*V0 / S
  A <- Aterm1 + Aterm2 + Aterm3
  
  Vfact1 <- S / (2*k)
  Vfact2 <- sqrt(A^2 + 4*k*(P0*V0/T0)*Ta)
  V <- Vfact1 * (Vfact2 - A)
  
  fact1 <- M
  fact2 <- k + (S^2)*(P0*V0/T0)*(Ta/(V^2))
  
  C <- 2 * pi * sqrt(fact1/fact2)
  return(C)
}
input_min <- c(30, 0.005, 0.002, 1000, 90000, 290, 340 )
input_max <- c(60, 0.020, 0.020, 5000, 110000, 296, 360 )
input_d <- length(input_min)

myfun <- function(xx) {return(log(piston(xx))) }

# PLOT THE REAL FUNCTION TO SEE HOW IT LOOKS LIKE

par(mfrow = c(3,3))
for (i in 1:input_d) {
    for ( j in 1:input_d ) 
        if(i>j) {
        n.grid <- 100 ;
        x1.grid <-seq(input_min[i],input_max[i],length.out=n.grid) ;
        x2.grid <-seq(input_min[j],input_max[j],length.out=n.grid) ;
        X.grid <- expand.grid( x1=x1.grid,  x2=x2.grid )
        myfun2d<-function(xx){ 
                            zz<-0.5*(input_min+input_max) ; 
                            zz[i]<-xx[1]; zz[j]<-xx[2]; 
                            return(myfun(zz)) 
                    }
        y.grid <- apply(X.grid,1,myfun2d)
    contour(x1.grid, x2.grid, matrix(y.grid, n.grid, n.grid), 10, 
            main = "Real function", 
            xlab = paste("x", as.character(i)), 
            ylab = paste("x", as.character(j)),
            xlim = c(input_min[i],input_max[i]), 
            ylim = c(input_min[j],input_max[j]))
    }
}


Generate a training data-set

Generate a training data-set $D={(x_i,y_i);i=1,...,n}$ of size $n=20$ via a LHS.


In [4]:
# GENERATE THE TRAINING DATA SET
n_data <- 20 ;
n_dim <- length(input_min)
X_data <- t(input_min + (input_max-input_min)*t(optimumLHS(n=n_data, k=n_dim))) ;
Y_data <- apply(X_data, 1, myfun) ;

# PLOT THE INPUT POINT TO SEE THE SPREAD
par(mfrow = c(3,3))
for (i in 1:n_dim) 
    for ( j in 1:n_dim ) 
        if(i>j) {
            plot(X_data[,i], X_data[,j], 
                main = "Design points", 
                xlab = paste("x", as.character(i)), 
                ylab = paste("x", as.character(j)) ,
                xlim = c(input_min[i],input_max[i]), 
                ylim = c(input_min[j],input_max[j])
                    )
            }


Have a look to the functions again


In [10]:
# We use the function:
#  km {DiceKriging}
?km


Out[10]:
km {DiceKriging}R Documentation

Fit and/or create kriging models

Description

km is used to fit kriging models when parameters are unknown, or to create km objects otherwise. In both cases, the result is a km object. If parameters are unknown, they are estimated by Maximum Likelihood. As a beta version, Penalized Maximum Likelihood Estimation is also possible if some penalty is given, or Leave-One-Out for noise-free observations.

Usage

km(formula=~1, design, response, covtype="matern5_2",
   coef.trend = NULL, coef.cov = NULL, coef.var = NULL,
   nugget = NULL, nugget.estim=FALSE, noise.var=NULL, estim.method="MLE",
   penalty = NULL, optim.method = "BFGS", lower = NULL, upper = NULL, 
   parinit = NULL, multistart = 1, control = NULL, gr = TRUE, 
   iso=FALSE, scaling=FALSE, knots=NULL, kernel=NULL)

Arguments

formula

an optional object of class "formula" specifying the linear trend of the kriging model (see lm). This formula should concern only the input variables, and not the output (response). If there is any, it is automatically dropped. In particular, no response transformation is available yet. The default is ~1, which defines a constant trend.

design

a data frame representing the design of experiments. The ith row contains the values of the d input variables corresponding to the ith evaluation

response

a vector (or 1-column matrix or data frame) containing the values of the 1-dimensional output given by the objective function at the design points.

covtype

an optional character string specifying the covariance structure to be used, to be chosen between "gauss", "matern5_2", "matern3_2", "exp" or "powexp". See a full description of available covariance kernels in covTensorProduct-class. Default is "matern5_2". See also the argument kernel that allows the user to build its own covariance structure.

coef.trend,

(see below)

coef.cov,

(see below)

coef.var

optional vectors containing the values for the trend, covariance and variance parameters. For estimation, 4 cases are implemented: 1. (All unknown) If all are missing, all are estimated. 2. (All known) If all are provided, no estimation is performed; 3. (Known trend) If coef.trend is provided but at least one of coef.cov or coef.var is missing, then BOTH coef.cov and coef.var are estimated; 4. (Unknown trend) If coef.cov and coef.var are provided but coef.trend is missing, then coef.trend is estimated (GLS formula).

nugget

an optional variance value standing for the homogeneous nugget effect.

nugget.estim

an optional boolean indicating whether the nugget effect should be estimated. Note that this option does not concern the case of heterogeneous noisy observations (see noise.var below). If nugget is given, it is used as an initial value. Default is FALSE.

noise.var

for noisy observations : an optional vector containing the noise variance at each observation. This is useful for stochastic simulators. Default is NULL.

estim.method

a character string specifying the method by which unknown parameters are estimated. Default is "MLE" (Maximum Likelihood). At this stage, a beta version of leave-One-Out estimation (estim.method="LOO") is also implemented for noise-free observations.

penalty

(beta version) an optional list suitable for Penalized Maximum Likelihood Estimation. The list must contain the item fun indicating the penalty function, and the item value equal to the value of the penalty parameter. At this stage the only available fun is "SCAD", and covtype must be "gauss". Default is NULL, corresponding to (un-penalized) Maximum Likelihood Estimation.

optim.method

an optional character string indicating which optimization method is chosen for the likelihood maximization. "BFGS" is the optim quasi-Newton procedure of package stats, with the method "L-BFGS-B". "gen" is the genoud genetic algorithm (using derivatives) from package rgenoud (>= 5.3.3).

lower,

(see below)

upper

optional vectors containing the bounds of the correlation parameters for optimization. The default values are given by covParametersBounds.

parinit

an optional vector containing the initial values for the variables to be optimized over. If no vector is given, an initial point is generated as follows. For method "gen", the initial point is generated uniformly inside the hyper-rectangle domain defined by lower and upper. For method "BFGS", some points (see control below) are generated uniformly in the domain. Then the best point with respect to the likelihood (or penalized likelihood, see penalty) criterion is chosen.

multistart

an optional integer indicating the number of initial points from which running the BFGS optimizer. These points will be selected as the best multistart one(s) among those evaluated (see above parinit). The multiple optimizations will be performed in parallel provided that a parallel backend is registered (see package foreach).

control

an optional list of control parameters for optimization. See details below.

gr

an optional boolean indicating whether the analytical gradient should be used. Default is TRUE.

iso

an optional boolean that can be used to force a tensor-product covariance structure (see covTensorProduct-class) to have a range parameter common to all dimensions. Default is FALSE. Not used (at this stage) for the power-exponential type.

scaling

an optional boolean indicating whether a scaling on the covariance structure should be used.

knots

an optional list of knots for scaling. The j-th element is a vector containing the knots for dimension j. If scaling=TRUE and knots are not specified, than knots are fixed to 0 and 1 in each dimension (which corresponds to affine scaling for the domain [0,1]^d).

kernel

an optional function containing a new covariance structure. At this stage, the parameters must be provided as well, and are not estimated. See an example below.

Details

The optimisers are tunable by the user by the argument control. Most of the control parameters proposed by BFGS and genoud can be passed to control except the ones that must be forced [for the purpose of optimization setting], as indicated in the table below. See optim and genoud to get more details about them.

BFGS trace, parscale, ndeps, maxit, abstol, reltol, REPORT, lnm, factr, pgtol
genoud all parameters EXCEPT: fn, nvars, max, starting.values, Domains, gr, gradient.check, boundary.enforcement, hessian and optim.method.

Notice that the right places to specify the optional starting values and boundaries are in parinit and lower, upper, as explained above. Some additional possibilities and initial values are indicated in the table below:

trace Turn it to FALSE to avoid printing during optimization progress.
pop.size For method "BFGS", it is the number of candidate initial points generated before optimization starts (see parinit above). Default is 20. For method "gen", "pop.size" is the population size, set by default at min(20, 4+3*log(nb of variables)
max.generations Default is 5
wait.generations Default is 2
BFGSburnin Default is 0

Value

An object of class km (see km-class).

Author(s)

O. Roustant, D. Ginsbourger, Ecole des Mines de St-Etienne.

References

N.A.C. Cressie (1993), Statistics for spatial data, Wiley series in probability and mathematical statistics.

D. Ginsbourger (2009), Multiples metamodeles pour l'approximation et l'optimisation de fonctions numeriques multivariables, Ph.D. thesis, Ecole Nationale Superieure des Mines de Saint-Etienne, 2009.

D. Ginsbourger, D. Dupuy, A. Badea, O. Roustant, and L. Carraro (2009), A note on the choice and the estimation of kriging models for the analysis of deterministic computer experiments, Applied Stochastic Models for Business and Industry, 25 no. 2, 115-131.

A.G. Journel and M.E. Rossi (1989), When do we need a trend model in kriging ?, Mathematical Geology, 21 no. 7, 715-739.

D.G. Krige (1951), A statistical approach to some basic mine valuation problems on the witwatersrand, J. of the Chem., Metal. and Mining Soc. of South Africa, 52 no. 6, 119-139.

R. Li and A. Sudjianto (2005), Analysis of Computer Experiments Using Penalized Likelihood in Gaussian Kriging Models, Technometrics, 47 no. 2, 111-120.

K.V. Mardia and R.J. Marshall (1984), Maximum likelihood estimation of models for residual covariance in spatial regression, Biometrika, 71, 135-146.

J.D. Martin and T.W. Simpson (2005), Use of kriging models to approximate deterministic computer models, AIAA Journal, 43 no. 4, 853-863.

G. Matheron (1969), Le krigeage universel, Les Cahiers du Centre de Morphologie Mathematique de Fontainebleau, 1.

W.R. Jr. Mebane and J.S. Sekhon, in press (2009), Genetic optimization using derivatives: The rgenoud package for R, Journal of Statistical Software.

J.-S. Park and J. Baek (2001), Efficient computation of maximum likelihood estimators in a spatial linear model with power exponential covariogram, Computer Geosciences, 27 no. 1, 1-7.

C.E. Rasmussen and C.K.I. Williams (2006), Gaussian Processes for Machine Learning, the MIT Press, http://www.GaussianProcess.org/gpml

See Also

kmData for another interface with the data, show,km-method, predict,km-method, plot,km-method. Some programming details and initialization choices can be found in kmEstimate, kmNoNugget.init, km1Nugget.init and kmNuggets.init

Examples


# ----------------------------------
# A 2D example - Branin-Hoo function
# ----------------------------------

# a 16-points factorial design, and the corresponding response
d <- 2; n <- 16
design.fact <- expand.grid(x1=seq(0,1,length=4), x2=seq(0,1,length=4))
y <- apply(design.fact, 1, branin) 

# kriging model 1 : matern5_2 covariance structure, no trend, no nugget effect
m1 <- km(design=design.fact, response=y)

# kriging model 2 : matern5_2 covariance structure, 
#                   linear trend + interactions, no nugget effect
m2 <- km(~.^2, design=design.fact, response=y)

# graphics 
n.grid <- 50
x.grid <- y.grid <- seq(0,1,length=n.grid)
design.grid <- expand.grid(x1=x.grid, x2=y.grid)
response.grid <- apply(design.grid, 1, branin)
predicted.values.model1 <- predict(m1, design.grid, "UK")$mean
predicted.values.model2 <- predict(m2, design.grid, "UK")$mean
par(mfrow=c(3,1))
contour(x.grid, y.grid, matrix(response.grid, n.grid, n.grid), 50, main="Branin")
points(design.fact[,1], design.fact[,2], pch=17, cex=1.5, col="blue")
contour(x.grid, y.grid, matrix(predicted.values.model1, n.grid, n.grid), 50, 
        main="Ordinary Kriging")
points(design.fact[,1], design.fact[,2], pch=17, cex=1.5, col="blue")
contour(x.grid, y.grid, matrix(predicted.values.model2, n.grid, n.grid), 50, 
        main="Universal Kriging")
points(design.fact[,1], design.fact[,2], pch=17, cex=1.5, col="blue")
par(mfrow=c(1,1))


# (same example) how to use the multistart argument
# -------------------------------------------------
require(foreach)

# below an example for a computer with 2 cores, but also work with 1 core

nCores <- 2
require(doParallel)
cl <-  makeCluster(nCores) 
registerDoParallel(cl)

# kriging model 1, with 4 starting points
m1_4 <- km(design=design.fact, response=y, multistart=4)

stopCluster(cl)

# -------------------------------
# A 1D example with penalized MLE
# -------------------------------

# from Fang K.-T., Li R. and Sudjianto A. (2006), "Design and Modeling for 
# Computer Experiments", Chapman & Hall, pages 145-152

n <- 6; d <- 1
x <- seq(from=0, to=10, length=n)
y <- sin(x)
t <- seq(0,10, length=100)

# one should add a small nugget effect, to avoid numerical problems
epsilon <- 1e-3
model <- km(formula<- ~1, design=data.frame(x=x), response=data.frame(y=y), 
            covtype="gauss", penalty=list(fun="SCAD", value=3), nugget=epsilon)

p <- predict(model, data.frame(x=t), "UK")

plot(t, p$mean, type="l", xlab="x", ylab="y", 
                     main="Prediction via Penalized Kriging")
points(x, y, col="red", pch=19)
lines(t, sin(t), lty=2, col="blue")
legend(0, -0.5, legend=c("Sine Curve", "Sample", "Fitted Curve"), 
       pch=c(-1,19,-1), lty=c(2,-1,1), col=c("blue","red","black"))


# ------------------------------------------------------------------------
# A 1D example with known trend and known or unknown covariance parameters
# ------------------------------------------------------------------------

x <- c(0, 0.4, 0.6, 0.8, 1);
y <- c(-0.3, 0, -0.8, 0.5, 0.9)

theta <- 0.01; sigma <- 3; trend <- c(-1,2)

model <- km(~x, design=data.frame(x=x), response=data.frame(y=y), 
            covtype="matern5_2", coef.trend=trend, coef.cov=theta, 
            coef.var=sigma^2)

# below: if you want to specify trend only, and estimate both theta and sigma:
# model <- km(~x, design=data.frame(x=x), response=data.frame(y=y), 
#             covtype="matern5_2", coef.trend=trend, lower=0.2)
# Remark: a lower bound or penalty function is useful here,
#         due to the very small number of design points...

# kriging with gaussian covariance C(x,y)=sigma^2 * exp(-[(x-y)/theta]^2), 
#         and linear trend t(x) = -1 + 2x

t <- seq(from=0, to=1, by=0.005)
p <- predict(model, newdata=data.frame(x=t), type="SK")
# beware that type = "SK" for known parameters (default is "UK")

plot(t, p$mean, type="l", ylim=c(-7,7), xlab="x", ylab="y")
lines(t, p$lower95, col="black", lty=2)
lines(t, p$upper95, col="black", lty=2)
points(x, y, col="red", pch=19)
abline(h=0)


# --------------------------------------------------------------
# Kriging with noisy observations (heterogeneous noise variance)
# --------------------------------------------------------------

fundet <- function(x){
return((sin(10*x)/(1+x)+2*cos(5*x)*x^3+0.841)/1.6)
}

level <- 0.5; epsilon <- 0.1
theta <- 1/sqrt(30); p <- 2; n <- 10
x <- seq(0,1, length=n)

# Heteregeneous noise variances: number of Monte Carlo evaluation among 
#                                a total budget of 1000 stochastic simulations
MC_numbers <- c(10,50,50,290,25,75,300,10,40,150)
noise.var <- 3/MC_numbers

# Making noisy observations from 'fundet' function (defined above)
y <- fundet(x) + noise.var*rnorm(length(x))

# kriging model definition (no estimation here)
model <- km(y~1, design=data.frame(x=x), response=data.frame(y=y), 
            covtype="gauss", coef.trend=0, coef.cov=theta, coef.var=1, 
            noise.var=noise.var)

# prediction
t <- seq(0, 1, by=0.01)
p <- predict.km(model, newdata=data.frame(x=t), type="SK")
lower <- p$lower95; upper <- p$upper95

# graphics
par(mfrow=c(1,1))
plot(t, p$mean, type="l", ylim=c(1.1*min(c(lower,y)) , 1.1*max(c(upper,y))), 
                xlab="x", ylab="y",col="blue", lwd=1.5)
polygon(c(t,rev(t)), c(lower, rev(upper)), col=gray(0.9), border = gray(0.9))
lines(t, p$mean, type="l", ylim=c(min(lower) ,max(upper)), xlab="x", ylab="y",
                 col="blue", lwd=1)
lines(t, lower, col="blue", lty=4, lwd=1.7)
lines(t, upper, col="blue", lty=4, lwd=1.7)
lines(t, fundet(t), col="black", lwd=2)
points(x, y, pch=8,col="blue")
text(x, y, labels=MC_numbers, pos=3)


# -----------------------------
# Checking parameter estimation 
# -----------------------------

d <- 3       	# problem dimension
n <- 40			# size of the experimental design
design <- matrix(runif(n*d), n, d)

covtype <- "matern5_2"		
theta <- c(0.3, 0.5, 1)		# the parameters to be found by estimation
sigma <- 2
nugget <- NULL  # choose a numeric value if you want to estimate nugget 
nugget.estim <- FALSE # choose TRUE if you want to estimate it

n.simu <- 30		# number of simulations
sigma2.estimate <- nugget.estimate <- mu.estimate <- matrix(0, n.simu, 1)
coef.estimate <- matrix(0, n.simu, length(theta))

model <- km(~1, design=data.frame(design), response=rep(0,n), covtype=covtype, 
            coef.trend=0, coef.cov=theta, coef.var=sigma^2, nugget=nugget)
y <- simulate(model, nsim=n.simu)

for (i in 1:n.simu) {
	# parameter estimation: tune the optimizer by changing optim.method, control
	model.estimate <- km(~1, design=data.frame(design), response=data.frame(y=y[i,]), 
	covtype=covtype, optim.method="BFGS", control=list(pop.size=50, trace=FALSE), 
        nugget.estim=nugget.estim) 
	
	# store results
	coef.estimate[i,] <- covparam2vect(model.estimate@covariance)
	sigma2.estimate[i] <- model.estimate@covariance@sd2
	mu.estimate[i] <- model.estimate@trend.coef
	if (nugget.estim) nugget.estimate[i] <- model.estimate@covariance@nugget
}

# comparison true values / estimation
cat("\nResults with ", n, "design points, 
    obtained with ", n.simu, "simulations\n\n",
    "Median of covar. coef. estimates: ", apply(coef.estimate, 2, median), "\n",
    "Median of trend  coef. estimates: ", median(mu.estimate), "\n", 
    "Mean of the var. coef. estimates: ", mean(sigma2.estimate))
if (nugget.estim) cat("\nMean of the nugget effect estimates: ", 
                      mean(nugget.estimate))

# one figure for this specific example - to be adapted
split.screen(c(2,1))        # split display into two screens
split.screen(c(1,2), screen = 2) # now split the bottom half into 3

screen(1)
boxplot(coef.estimate[,1], coef.estimate[,2], coef.estimate[,3], 
        names=c("theta1", "theta2", "theta3"))
abline(h=theta, col="red")
fig.title <- paste("Empirical law of the parameter estimates 
                    (n=", n , ", n.simu=", n.simu, ")", sep="")
title(fig.title)

screen(3)
boxplot(mu.estimate, xlab="mu")
abline(h=0, col="red")

screen(4)
boxplot(sigma2.estimate, xlab="sigma2")
abline(h=sigma^2, col="red")

close.screen(all = TRUE)  

# ----------------------------------------------------------
# Kriging with non-linear scaling on Xiong et al.'s function
# ----------------------------------------------------------

f11_xiong <- function(x){ 
return( sin(30*(x - 0.9)^4)*cos(2*(x - 0.9)) + (x - 0.9)/2)
}

t <- seq(0,1,,300)
f <- f11_xiong(t)

plot(t,f,type="l", ylim=c(-1,0.6), lwd=2)

doe <- data.frame(x=seq(0,1,,20))
resp <- f11_xiong(doe)

knots <- list(  c(0,0.5,1) ) 
eta <- list(c(15, 2, 0.5))
m <- km(design=doe, response=resp, scaling=TRUE, gr=TRUE, 
knots=knots, covtype="matern5_2",  coef.var=1, coef.trend=0)

p <- predict(m, data.frame(x=t), "UK")

plot(t, f, type="l", ylim=c(-1,0.6), lwd=2)

lines(t, p$mean, col="blue", lty=2, lwd=2)
lines(t, p$mean + 2*p$sd, col="blue")
lines(t, p$mean - 2*p$sd,col="blue")

abline(v=knots[[1]], lty=2, col="green")


# -----------------------------------------------------
# Kriging with a symmetric kernel: example with covUser
# -----------------------------------------------------

x <- c(0, 0.15, 0.3, 0.4, 0.5)
y <- c(0.3, -0.2, 0, 0.5, 0.2)

k <- function(x,y) {
  theta <- 0.15
  0.5*exp(-((x-y)/theta)^2) + 0.5*exp(-((1-x-y)/theta)^2)    
}

muser <- km(design=data.frame(x=x), response=data.frame(y=y), 
            coef.trend=0, kernel=k)

u <- seq(from=0, to=1, by=0.01)
puser <- predict(muser, newdata=data.frame(x=u), type="SK")

set.seed(0)
nsim <- 5
zuser <- simulate(muser, nsim=nsim, newdata=data.frame(x=u), cond=TRUE, nugget.sim=1e-8)
par(mfrow=c(1,1))
matplot(u, t(zuser), type="l", lty=rep("solid", nsim), col=1:5, lwd=1)
polygon(c(u, rev(u)), c(puser$upper, rev(puser$lower)), col="lightgrey", border=NA)
lines(u, puser$mean, lwd=5, col="blue", lty="dotted")
matlines(u, t(zuser), type="l", lty=rep("solid", nsim), col=1:5, lwd=1)
points(x, y, pch=19, cex=1.5)


[Package DiceKriging version 1.5.5 ]

Compare different GP regression models

Use the diagnostics we leant, and compare Gaussian process regression models deffering on the prior linear trend structure or the covariance function


In [116]:
# Diagnostics (Feel free to use others that you know)

R2 <-function (Y, Ypred) {
    Ypred <- as.numeric(Ypred)
    Y <- as.numeric(Y)
    return(1 - mean((Y - Ypred)^2)/mean((Y - mean(Y))^2))
}

RMSE <- function (Y, Ypred)  {
    Ypred <- as.numeric(Ypred)
    Y <- as.numeric(Y)
    return(sqrt(mean((Y - Ypred)^2)))
}

MAE <- function (Y, Ypred)  {
    Ypred <- as.numeric(Ypred) 
    Y <- as.numeric(Y)
    return(mean(abs(Y - Ypred)))
}

In [117]:
# Train different GP regressions differing on the covariance function

myfun_km_matern5_2 <- km( formula = ~ 1 , # linear trend formula; E.G.: "formula = ~ 1 +x1 +exp(x2) +I(x1*x2) +I(x1^2)"
                design = data.frame(x1=X_data[,1], 
                                    x2=X_data[,2], 
                                    x3=X_data[,3], 
                                    x4=X_data[,4], 
                                    x5=X_data[,5], 
                                    x6=X_data[,6], 
                                    x7=X_data[,7]), # a data frame representing the design of experiments.
                response=Y_data,                    # the values of the 1-dimensional outpu 
                covtype="matern5_2",                # covariance structure 
                coef.trend = NULL,                  # values for the trend parameters
                coef.cov = NULL,                    # values for the covariance parameters
                coef.var = NULL,                    # values for the variance parameters
                nugget= 1e-7,                       # the homogeneous nugget effect
                noise.var = NULL,                   # containing the noise variance at each observation
                optim.method = "BFGS",              # optimization method is chosen for the likelihood maximization.
                lower = NULL,                       # lower bounds of the correlation parameters for optimization
                upper = NULL,                       # upper bounds of the correlation parameters for optimizati
                control =list(trace=FALSE),         # =list(trace=FALSE) to supress optimization trace information
                kernel=NULL)                        # a user's new covariance structure

myfun_km_matern3_2  <- km( formula = ~ 1 , # linear trend formula; E.G.: "formula = ~ 1 +x1 +exp(x2) +I(x1*x2) +I(x1^2)"
                design = data.frame(x1=X_data[,1], 
                                    x2=X_data[,2], 
                                    x3=X_data[,3], 
                                    x4=X_data[,4], 
                                    x5=X_data[,5], 
                                    x6=X_data[,6], 
                                    x7=X_data[,7]), # a data frame representing the design of experiments.
                response=Y_data,                    # the values of the 1-dimensional outpu 
                covtype="matern3_2",                # covariance structure 
                coef.trend = NULL,                  # values for the trend parameters
                coef.cov = NULL,                    # values for the covariance parameters
                coef.var = NULL,                    # values for the variance parameters
                nugget= 1e-7,                       # the homogeneous nugget effect
                noise.var = NULL,                   # containing the noise variance at each observation
                optim.method = "BFGS",              # optimization method is chosen for the likelihood maximization.
                lower = NULL,                       # lower bounds of the correlation parameters for optimization
                upper = NULL,                       # upper bounds of the correlation parameters for optimizati
                control =list(trace=FALSE),         # =list(trace=FALSE) to supress optimization trace information
                kernel=NULL)  

myfun_km_gauss <- km( formula = ~ 1 ,        # linear trend formula; E.G.: "formula = ~ 1 +x1 +exp(x2) +I(x1*x2) +I(x1^2)"
                design = data.frame(x1=X_data[,1], 
                                    x2=X_data[,2], 
                                    x3=X_data[,3], 
                                    x4=X_data[,4], 
                                    x5=X_data[,5], 
                                    x6=X_data[,6], 
                                    x7=X_data[,7]), # a data frame representing the design of experiments.
                response=Y_data,                    # the values of the 1-dimensional outpu 
                covtype="gauss",                    # covariance structure 
                coef.trend = NULL,                  # values for the trend parameters
                coef.cov = NULL,                    # values for the covariance parameters
                coef.var = NULL,                    # values for the variance parameters
                nugget= 1e-7,                       # the homogeneous nugget effect
                noise.var = NULL,                   # containing the noise variance at each observation
                optim.method = "BFGS",              # optimization method is chosen for the likelihood maximization.
                lower = NULL,                       # lower bounds of the correlation parameters for optimization
                upper = NULL,                       # upper bounds of the correlation parameters for optimizati
                control =list(trace=FALSE),         # =list(trace=FALSE) to supress optimization trace information
                kernel=NULL)                        # a user's new covariance structure

myfun_km_exp <- km( formula = ~ 1 , # linear trend formula; E.G.: "formula = ~ 1 +x1 +exp(x2) +I(x1*x2) +I(x1^2)"
                design = data.frame(x1=X_data[,1], 
                                    x2=X_data[,2], 
                                    x3=X_data[,3], 
                                    x4=X_data[,4], 
                                    x5=X_data[,5], 
                                    x6=X_data[,6], 
                                    x7=X_data[,7]), # a data frame representing the design of experiments.
                response=Y_data,                    # the values of the 1-dimensional outpu 
                covtype="exp",                      # covariance structure 
                coef.trend = NULL,                  # values for the trend parameters
                coef.cov = NULL,                    # values for the covariance parameters
                coef.var = NULL,                    # values for the variance parameters
                nugget= 1e-7,                       # the homogeneous nugget effect
                noise.var = NULL,                   # containing the noise variance at each observation
                optim.method = "BFGS",              # optimization method is chosen for the likelihood maximization.
                lower = NULL,                       # lower bounds of the correlation parameters for optimization
                upper = NULL,                       # upper bounds of the correlation parameters for optimizati
                control =list(trace=FALSE),         # =list(trace=FALSE) to supress optimization trace information
                kernel=NULL)                        # a user's new covariance structure

# CROSS VALIDATION CRITERIA
Y_pred <- leaveOneOut.km(myfun_km_matern5_2, "UK")$mean
CV_matern5_2 <-data.frame( R2 = R2(Y_data, Y_pred),
                            RMSE = RMSE(Y_data, Y_pred),
                            MAE = MAE(Y_data, Y_pred), row.names="matern5_2")


Y_pred <- leaveOneOut.km(myfun_km_matern3_2, "UK")$mean
CV_matern3_2 <-data.frame( R2 = R2(Y_data, Y_pred),
                            RMSE = RMSE(Y_data, Y_pred),
                            MAE = MAE(Y_data, Y_pred), row.names="matern3_2")

Y_pred <- leaveOneOut.km(myfun_km_gauss, "UK")$mean
CV_gauss <-data.frame( R2 = R2(Y_data, Y_pred),
                            RMSE = RMSE(Y_data, Y_pred),
                            MAE = MAE(Y_data, Y_pred), row.names="gauss~~~~")

Y_pred <- leaveOneOut.km(myfun_km_exp, "UK")$mean
CV_exp <-data.frame( R2 = R2(Y_data, Y_pred),
                            RMSE = RMSE(Y_data, Y_pred),
                            MAE = MAE(Y_data, Y_pred), row.names="exp~~~~~~")


# PRINT THE RESULTS TO COMPARE

CV_matern5_2
CV_matern3_2
CV_gauss
CV_exp


Out[117]:
R2RMSEMAE
matern5_20.859997130.115535140.08513416
Out[117]:
R2RMSEMAE
matern3_20.880993060.106520150.07850884
Out[117]:
R2RMSEMAE
gauss~~~~0.894575600.100257340.07906424
Out[117]:
R2RMSEMAE
exp~~~~~~0.59985590.19532320.1518888

Choose the 'Best' model


In [118]:
# SELECT THE BEST GAUSSIAN PROCESS

myfun_km <- myfun_km_gauss

Check the validity of the chossen 'Best' model


In [119]:
# CROSS VALIDATION CRITERIA
Y_pred <- leaveOneOut.km(myfun_km, "UK")$mean
data.frame(
    R2 = R2(Y_data, Y_pred),
    RMSE = RMSE(Y_data, Y_pred),
    MAE = MAE(Y_data, Y_pred))

# PLOT DIAGNOSTICS
plot(myfun_km)


Out[119]:
R2RMSEMAE
10.894575600.100257340.07906424

In [120]:
# PRINT THE PARAMETERS

show(myfun_km)


Call:
km(formula = ~1, design = data.frame(x1 = X_data[, 1], x2 = X_data[, 
    2], x3 = X_data[, 3], x4 = X_data[, 4], x5 = X_data[, 5], 
    x6 = X_data[, 6], x7 = X_data[, 7]), response = Y_data, covtype = "gauss", 
    coef.trend = NULL, coef.cov = NULL, coef.var = NULL, nugget = 1e-07, 
    noise.var = NULL, optim.method = "BFGS", lower = NULL, upper = NULL, 
    control = list(trace = FALSE), kernel = NULL)

Trend  coeff.:
               Estimate
 (Intercept)    -0.7066

Covar. type  : gauss 
Covar. coeff.:
               Estimate 
   theta(x1)    37.9296 
   theta(x2)     0.0134 
   theta(x3)     0.0109 
   theta(x4)  7517.9779 
   theta(x5)  32294.7850
   theta(x6)    10.8707 
   theta(x7)    39.9245 

Variance estimate: 0.1318864

Nugget effect : 1e-07

Plot the posterior GPR mean and variance, as well as the real function (in contour plots)


In [121]:
# PLOT THE GPR MEAN AND VARIANCE , AS WELL AS THE REAL FUNCTION

d_input <- length(input_min)

# Contour plot
par(mfrow = c(3,3))
for (i in 1:d_input) 
    for ( j in 1:d_input ) 
        if(i>j) {
            
            # GET THE GRID FOR INPUTS

            n.grid <- 100 ;
            x1.grid <-seq(input_min[i],input_max[i],length.out=n.grid) ;
            x2.grid <-seq(input_min[j],input_max[j],length.out=n.grid) ;
            X.grid <- expand.grid( x1=x1.grid,  x2=x2.grid )
            
            # COMPUTE THE REAL FUNCTION
            
            myfun_2d<-function(xx){ 
                        zz<-0.5*(input_min+input_max) ; 
                        zz[i]<-xx[1]; zz[j]<-xx[2]; 
                        return(myfun(zz)) 
            }
            y.grid <- apply(X.grid, 1, myfun_2d)
            
            contour(x1.grid, x2.grid, matrix(y.grid, n.grid, n.grid), 11, main = "Real function")
            points(X_data[ , i], X_data[ , j], pch = 19, cex = 1.5, col = "red")  
            
            # COMPUTE THE GPR MEAN
            
            myfun_km_pred_2d_mean<-function(xx){ 
                                zz<-0.5*(input_min+input_max) ; 
                                zz[i]<-xx[1]; zz[j]<-xx[2]; 
                                return( predict(myfun_km, t(matrix(zz)), "UK", checkNames=FALSE)$mean ) 
            }
            y.pred.grid_mean <- apply(X.grid, 1, myfun_km_pred_2d_mean)  
            
            contour(x1.grid, x2.grid, matrix(y.pred.grid_mean, n.grid, n.grid), 11, main = "Posterior GP regression mean")
            points(X_data[ , i], X_data[ , j], pch = 19, cex = 1.5, col = "red") 
            
            # COMPUTE THE GPR VARIANCE
            
            myfun_km_pred_2d_sd<-function(xx){ 
                                zz<-0.5*(input_min+input_max) ; 
                                zz[i]<-xx[1]; zz[j]<-xx[2]; 
                                return( predict(myfun_km, t(matrix(zz)), "UK", checkNames=FALSE)$sd ) 
            }
            y.pred.grid_sd <- apply(X.grid, 1, myfun_km_pred_2d_sd)          

            contour(x1.grid, x2.grid, matrix(y.pred.grid_sd^2, n.grid, n.grid), 15, main = "Posterior GP regression variance")
            points(X_data[ , i], X_data[ , j], pch = 19, cex = 1.5, col = "red")

            }