Module 1: Setting up the problem

Introduction

Geophysical surveys consist of a similar basic framework. An energy source is delivered into the earth, which can be natural (for example, the Earth's magnetic field) or human-made (current in the ground, acoustic wave energy, etc.), and this stimulates a response according to the variation in physical properties in the subsurface. At the surface, receivers pick up the signal and record this as data.


The goal of inversion is to find a model of the physical property distribution in the earth that produced the data. This is a difficult process because (1) information about a physical property for each datum is encoded in a complex way, and (2) we have a finite amount of data and cannot represent the physical property distribution everywhere.

Inversion is a multistep process, often represented as a workflow.


The goal of this module is to cover the first section of the workflow, which discretizes the data and places the values of the functions onto a mesh. This will be done using the following steps:

(1) Start with an expression that relates a kernel function with the continuous distribution of a physical property.
(2) Discretize this expression, and introduce a simple example problem to illustrate the mathematics in detail.
(3) Define a mesh that organizes our information.
(4) Build up the matrix equation $d = Gm$.
(5) Generalize the form of the problem from the example.
(6) Implement the example problem in Python as a forward problem.

But first, here are some fundamental definitions:

The general mathematical description of the inverse problem can be written as follows:

\begin{equation} F_i[m]= d_i +n_i \quad \text{for} \quad i=1,..,N \; \text{where}\end{equation}


  • $F[m]$ is a forward modeling operator. $F[m]$ incorporates the survey design and the physics of the problem. In the linear case we represent this as the matrix $G$.
  • $\bf{m}$ is a generic symbol for a physical property distribution, i.e. the model.
  • $\bf{d}$ represents the observed data, (sometimes also represented as $\bf{d^{obs}}$).
  • $\bf{n}$ is a term that represents additive noise.

This is, of course, the most general formulation of the problem. In this module we will consider the simplest case, which is (a) one dimensional (this can be likened to a survey that varies as a function of depth only) and (b) linear, in which our forward modeling operator becomes a matrix G in the matrix equation:

\begin{equation}d = Gm\end{equation}

Step 1: Physical property distribution and the kernel function

Each datum ($d_i$) collected is a volumetric response, that is to say, every datum measures the response of the whole volume (within the range of the system); it records the superposition of effects caused by all the material in the ground, and is therefore naturally represented as an integral. A "kernel function" or "sensitivity function", $g(x,y,z)$, shows how a datum is affected by all the subsurface. It describes the physics of the problem. The model, $m(x,y,z)$, represents the distribution of a physical property in the volume. Since each datum measures the response of the kernel function with the physical property distribution in the volume, for a continuous medium we can express this relationship as the inner product of the kernel function and the model. In the one dimensional case the expression for the ith datum is written as:

\begin{equation}d_i = \int_a^b g_i(x) m(x) dx \end{equation}


where again: \begin{equation} d :=\text{measured data} \\ g :=\text{kernel function} \\ m :=\text{physical property}\\ a,b := \text{the domain of consideration}\\ \end{equation}

Step 2: Discretize the expression

While the integral expression describes the inner product in a continuum, it is not possible to maintain this representation on a computer, so the above equation must be discretized. Let us consider the case where we have $N$ data, and as we build up the matrix equation, let $i$ be the rows and $j$ be the columns of our matrix:

\begin{equation}d_i = \int_a^b g_i(x) m(x) dx \; \Rightarrow \; \sum_{j=1}^N g_i (x_j) m_j \Delta x\end{equation}


A "Toy Problem"

Consider a simple case where we have a one dimensional, linear problem that will generate two data points, $d_1$ and $d_2$, from five physical property values
($m_1, m_2, m_3, m_4, m_5$), and these two data points are generated using two kernel functions $g_1$ and $g_2$. Further, let us assume that our domain of interest lies on the interval [0,1]. The equation above is then expressed as the following two equations, one for each datum:

\begin{equation}d_1 = \int_0^1 g_1(x) m(x) dx \; \Rightarrow \; \sum_{j=1}^N g_1 (x_j) m_j \Delta x\end{equation}


\begin{equation}d_2 = \int_0^1 g_2(x) m(x) dx \; \Rightarrow \; \sum_{j=1}^N g_2 (x_j) m_j \Delta x\end{equation}

Given that our problem is small by design, it is instructive to visualize the summation notation as matrix-vector products. Doing so yields the following expressions:

(a) Data. We can collect our two data points into a column vector. Our data in vector notation is given by:

\begin{equation} d = \left[ \begin{array}{c} d_1\\ d_2 \end{array} \right] \end{equation}


(b) The x-spacing. The x-spacings, $\Delta x$, are represented by a diagonal matrix. Here we are in a one dimensional case, but in general our information will reside in a volume, so let this matrix be represented as $V$. In the most general situation, the x-spacings need not be equal, and there can be significant variation in the distances within a grid, depending on the amount of resolution one desires at a particular location. For the moment, let's ignore this complexity and assume equal spacing in our grid. Then let $V=diag(\Delta x)$, and given the dimensions of our problem, $V$ appears as follows:

\begin{equation} V=\begin{bmatrix} \Delta x &0 &0 &0 &0 \\ 0 &\Delta x &0 &0 &0 \\ 0 &0 &\Delta x &0 &0 \\ 0 &0 &0 &\Delta x &0 \\ 0 &0 &0 &0 &\Delta x\\ \end{bmatrix} \end{equation}


(c) The model. The model, $m$, is a column vector with five rows: \begin{equation} m= \left[ \begin{array}{c} m_1\\ m_2\\ m_3\\ m_4\\ m_5 \end{array} \right] \end{equation}

(d) The kernel functions. Recall that a kernel function shows how a datum is affected by all the subsurface. The full expression for the kernel function will be developed further in the next sections, but for the moment, let us put each kernel function on a row of a matrix, such that $g_1$ will be on row 1, and $g_2$ on row 2, and define this matrix as $\widetilde{G}$. This will yield the matrix: \begin{equation} \widetilde{G}= \begin{bmatrix} g_{11} &g_{12} &g_{13} &g_{14} &g_{15}\\ g_{21} &g_{22} &g_{23} &g_{24} &g_{25} \end{bmatrix} \end{equation}

Using the above, the assembled matrix vector forms for these equations becomes:

\begin{equation} \left[ \begin{array}{c} d_1\\ d_2\\ \end{array} \right] = \begin{bmatrix} g_{11} &g_{12} &g_{13} &g_{14} &g_{15}\\ g_{21} &g_{22} &g_{23} &g_{24} &g_{25} \end{bmatrix} \begin{bmatrix} \Delta x &0 &0 &0 &0 \\ 0 &\Delta x &0 &0 &0 \\ 0 &0 &\Delta x &0 &0 \\ 0 &0 &0 &\Delta x &0 \\ 0 &0 &0 &0 &\Delta x\\ \end{bmatrix} \left[ \begin{array}{c} m_1\\ m_2\\ m_3\\ m_4\\ m_5 \end{array} \right] \end{equation}

Or to put it succinctly, $d = \widetilde{G} \, diag(\Delta x)\, m$.

Step 3: Set up the mesh

Now that the data has been discretized, let's look at where we put our information. In the three dimensional cases, our information is stored in a cube, where we can put values in the cell centers (the red circle below), on the cube faces (red triangles), the edges (blue triangles), and on the nodes (the apices). This representation is useful because it allows us to represent physical quantities that are conserved across boundaries from on cell to the other. For example, fluxes can be put across the faces of the cube.

Taken from http://docs.simpeg.xyz/en/latest/api_Mesh.html
In a one dimensional case, however, we are only dealing with two elements, cell centers and cell nodes. First, subdivide the one dimensional domain into cell centers and nodes, with spacings of $\Delta x$:

In this instance the model values reside in the cell centers, while the values for the kernel functions reside on the nodes. Schematically, we can think of our 1D mesh as the following picture, where $g$ represents a kernel function and $m$ represents the model parameters.


Note that the kernel function in our example (represented above) has six values and the model values are five; moreover, the x-coordinates where $g(x) $ and $m(x)$ are evaluated are not coincident, and this leads to a complication when we want to perform the inner product of m and g. What is needed is a way to evaluate the kernel functions at the cell centers. To do this, we employ the trapezoidal rule for approximating integration. Simply put, to obtain the values of the kernel function on the cell centers, take the average of the kernel function values on the adjacent nodes. Let us then define two kernel function matrices, one with values evaluated on the cell centers, $G_c$ (represented by the black circles below), and another with the values evaluated on the nodes, $G_n$ (in white).

The relationship between $G_c $ and $G_n$ is an "averaging matrix", $A_v$, such that $G_c = A_v G_n$. Putting each kernel function in the columns for both $G_c$ and $G_n$, and using again the dimensions of our toy problem, this relation appears as follows:

\begin{equation} \begin{bmatrix} g_{c1}(x_1) & g_{c2}(x_1) \\ g_{c1}(x_2) & g_{c2}(x_2) \\ g_{c1}(x_3) & g_{c2}(x_3) \\ g_{c1}(x_4) & g_{c2}(x_4) \\ g_{c1}(x_5) & g_{c2}(x_5) \\ \end{bmatrix} = \frac{1}{2} \begin{bmatrix} 1 & 1 & 0 & 0 & 0 & 0\\ 0 & 1 & 1 & 0 & 0 & 0\\ 0 & 0 & 1 & 1 & 0 & 0\\ 0 & 0 & 0 & 1 & 1 & 0\\ 0 & 0 & 0 & 0 & 1 & 1\\ \end{bmatrix} \begin{bmatrix} g_{n1}(x_1) & g_{n2}(x_1) \\ g_{n1}(x_2) & g_{n2}(x_2) \\ g_{n1}(x_3) & g_{n2}(x_3) \\ g_{n1}(x_4) & g_{n2}(x_4) \\ g_{n1}(x_5) & g_{n2}(x_5) \\ g_{n1}(x_6) & g_{n2}(x_6) \\ \end{bmatrix} \end{equation}


Meanwhile, the relationship between $G_c$ and $\widetilde{G}$ in step 2 is such that $G_c = \widetilde{G}^T$.

Step 4: Build up the matrix equation $d=Gm$

We have all the required building blocks to assemble the matrix equation. At the end of step 2 we arrived at $d = \widetilde{G} \, diag(\Delta x)\, m$, and from the previous step we have $G_c = \widetilde{G}^T = A_v G_n$. Put more orderly, $\widetilde{G} = (A_v G_n)^T$, so substituting this into our expression gives:

\begin{equation} d = (A_v G_n)^T \, diag(\Delta x)\, m \end{equation}


If we group matrices together and let $G = (A_v G_n)^T diag(\Delta x)$ then we arrive at $d=Gm$, our desired form. Referring again to our example, this would appear as follows:

\begin{equation} \left[ \begin{array}{c} d_1\\ d_2\\ \end{array} \right] = \frac{1}{2} \left( \begin{bmatrix} 1 & 1 & 0 & 0 & 0 & 0\\ 0 & 1 & 1 & 0 & 0 & 0\\ 0 & 0 & 1 & 1 & 0 & 0\\ 0 & 0 & 0 & 1 & 1 & 0\\ 0 & 0 & 0 & 0 & 1 & 1\\ \end{bmatrix} \begin{bmatrix} g_{n1}(x_1) & g_{n2}(x_1) \\ g_{n1}(x_2) & g_{n2}(x_2) \\ g_{n1}(x_3) & g_{n2}(x_3) \\ g_{n1}(x_4) & g_{n2}(x_4) \\ g_{n1}(x_5) & g_{n2}(x_5) \\ g_{n1}(x_6) & g_{n2}(x_6) \\ \end{bmatrix} \right)^T \begin{bmatrix} \Delta x &0 &0 &0 &0 \\ 0 &\Delta x &0 &0 &0 \\ 0 &0 &\Delta x &0 &0 \\ 0 &0 &0 &\Delta x &0 \\ 0 &0 &0 &0 &\Delta x\\ \end{bmatrix} \left[ \begin{array}{c} m_1\\ m_2\\ m_3\\ m_4\\ m_5 \end{array} \right] \end{equation}


Step 5: Generalize the form of the problem

In the above case we had two data points and five model values. We can generalize this to larger data sets easily. In the case where we have $N$ measured data and $M$ model values, we obtain a matrix equation of the following dimensions:

\begin{equation} (N \times 1) = [(M \times (M+1)) \; ((M+1) \times N)]^T (M \times M) (M \times 1)\\ d \qquad = \qquad \qquad \qquad \quad [A_v G_n]^T \qquad \qquad diag(\Delta x) \quad m \end{equation}

Step 6: Implement the example problem in Python.

Next, we will build up our matrices one by one in Python. First we will do the small problem exactly as above, and then we will use a larger, but still one-dimensional, problem. To begin, import the SimPEG and numpy packages:


In [2]:
# Import Packages
from SimPEG import *
from IPython.html.widgets import interactive
import numpy as np
%pylab inline


Efficiency Warning: Interpolation will be slow, use setup.py!

            python setup.py build_ext --inplace
    
Populating the interactive namespace from numpy and matplotlib
WARNING: pylab import has clobbered these variables: ['interactive']
`%matplotlib` prevents importing * from pylab and numpy

Let start by formulating the forward problem, that is, let us assume that we have our model values $m$ and seek to generate synthetic data $d$. Recall that we are building up the matrix equation: $d = (A_v G_n)^T \, diag(\Delta x)\, m$, so let's start with the right hand side and go over each matrix, one by one.

(1) The $A_v$ matrix. We can build the "averaging matrix" as follows:


In [3]:
M=5              # RecalL is the number of values we have in m 
n=M+1            # Define n as the N+1 dimension of the matrix   
w=n-1            # Define w and the n-1  dimentsion of the matrix   
s = (M,n)        # Store matrix values 
Av = np.zeros(s) # Create a matrix of zeros of the correct dimensions 
                 # and fill in with elements usin the loop below (note the 1/2 is included in here). 
for i in range(M):
    j=i
    k=i+1
    Av[i,j] = 0.5  
    Av[i,k] = 0.5
print Av


[[ 0.5  0.5  0.   0.   0.   0. ]
 [ 0.   0.5  0.5  0.   0.   0. ]
 [ 0.   0.   0.5  0.5  0.   0. ]
 [ 0.   0.   0.   0.5  0.5  0. ]
 [ 0.   0.   0.   0.   0.5  0.5]]

(2) The kernel matrix on the nodes, $G_n$. To build this matrix, we must first define a sensitivity function that describes some physical phenomenon. For the sake of our example, let us define our kernel function for the toy problem using the following equations. From a physical perspective, one can think of these equations as representing the decay of electromagnetic waves with depth. Note also that $j$ in this case is simply index number (i.e. $j=1,2,3,...$) and is not the imaginary number. The $j$ for the kernel functions affects the frequency and decay for each kernel function. Our kernel functions are as follows:

\begin{equation} g_j(x) = e^{jpx} \cos(2 \pi j q x) \end{equation}



In [4]:
g = lambda x, k, p, q: np.exp(-p*k*x)*np.cos(2*np.pi*q*k*x) # create an anonymous function as above
x=np.linspace(0,1,6)                                          # define the nodes of our x-array
#x = np.array([0., 0.2, 0.4, 0.6, 0.8, 1.])
p = 0.01
q = 0.1
j = np.array([1, 2])
Gn = np.zeros((len(x), len(j)))

for i in range(len(j)):
    f = g(x,i,p,q)
    Gn[:,i] = f
    
print Gn


[[ 1.          1.        ]
 [ 1.          0.99013245]
 [ 1.          0.96471657]
 [ 1.          0.92421453]
 [ 1.          0.86932419]
 [ 1.          0.80096714]]

(3) The volume matrix $V$. This simply consists of making an $M \times M$ array of x-spacings, $\Delta x.$


In [4]:
# Make the delta x array

Deltax = 0.2*np.ones(M) # set x-spacings
V = np.diag(Deltax)     # create diagonal matrix     
print V


[[ 0.2  0.   0.   0.   0. ]
 [ 0.   0.2  0.   0.   0. ]
 [ 0.   0.   0.2  0.   0. ]
 [ 0.   0.   0.   0.2  0. ]
 [ 0.   0.   0.   0.   0.2]]

(4) Input the model values, $m$. Given that we are making a forward problem, we assume these values as given, so we input fictitious values:


In [5]:
m = np.array([0.02, 0.05, 0.09, 0.07, 0.04])
print m


[ 0.02  0.05  0.09  0.07  0.04]

(5) Generate our two data values. The remaining step is simply to perform the matrix-vector multiplication:


In [6]:
Gc=np.transpose(np.dot(Av, Gn)) # cell center matrix Gc
G = np.dot(Gc,V)                # make G     
d = np.dot(G,m)

#Note: in one line our data d, would be:  d = np.dot(np.dot(np.transpose(np.dot(Av, Gn)), V),m) 
print d


[ 0.054       0.04999083]

A Bigger Toy Problem

We have successfully created two synthetic data points. Hopefully this has provided some intuition about how this is put together. But two data points are not interesting to view. Let's rebuild the problem using a larger system and using more model values and several kernel functions. Let's clear everything and begin again:


In [20]:
clear


































This time, let's use a similar setup as the one provided in the the Inversion for Applied Geophysics tutorial, found here: http://www.eos.ubc.ca/ubcgif/iag/tutorials/tutorial-v9.pdf. For this exercise, let's assume that we have 1000 model parameter values and 20 kernel functions, using the same form for the kernel functions as above. Also, let us create a model that is the combination of a Gaussian and a boxcar on a 1D domain between zero and one, similar to the model in the tutorial. Again, we will be performing the forward problem, which means that we will create synthetic data. The general setup of the domain and a plot of the model the model function could go something like this:


In [14]:
# Begin by creating a ficticious set of model data

M=1000                      # Set number of model parameters 
x=np.linspace(0,1,M+1)      # Define 1D domain on nodes
xc = 0.5*(x[1:] + x[0:-1])  # Define 1D domain on cell centers
m = np.zeros(M)             # preallocate m array 

# Define Gaussian function:
gauss = lambda x, a, mean, SD: a*numpy.exp(-(x-mean)**2./(2.*SD**2.)) # create a Gaussian function

# Choose parameters for Gaussian, evaluate, and store in an array, f.
SD=6                           
mean=0
a=1/(SD*sqrt(2*pi))
x2=np.linspace(-20,20,0.2*M)  
f = gauss(x2,a, mean,SD)

# Define a boxcar function:
box = 0.4*np.ones(0.2*M)

# Insert the Gaussian and Boxcar into m:
m[0.2*M:0.4*M]=box
m[0.6*M:0.8*M]=10*f
   
# Plot      
pylab.plot(xc,m)
#pylab.plot(f)
print size(f)
pylab.xlabel('x')
pylab.ylabel('m(x)')
pylab.title('Model, $m(x)$')


print shape(xc)


200
(1000L,)

Next, let's build the kernel functions. In this case, let's define the kernel functions as before, with p, q, and the number of output data, $N$, as follows. We can then create a set of kernel functions on the nodes. The plot for these functions is below:


In [15]:
# Make the set of kernel functions

g = lambda x, i, p, q: np.exp(-p*i*x)*np.cos(2*np.pi*q*i*x) # create an anonymous function as immediately above
p = 0.01      # Set values for p, q
q = 0.15
N = 20        # specify number of output data
Gn = np.zeros((M+1,N))

for i in range(N):
    f = g(x,i,p,q)
    Gn[:,i] = f
    
# Plot  
pylab.plot(x,Gn)
pylab.xlabel('x')
pylab.ylabel('g(x)')
pylab.title('Kernel functions, $g(x)$')


Out[15]:
<matplotlib.text.Text at 0x9093be0>

The averaging matrix can be made much the same as before, although it will be larger in this case. I will print a subsample of the matrix and show it as a plot below:


In [16]:
# Make Averaging Matrix
n=M+1            # Define n as the N+1 dimension of the matrix   
w=n-1            # Define w and the n-1  dimentsion of the matrix   
s = (M,n)        # Store matrix values 
Av = np.zeros(s) # Create a matrix of zeros of the correct dimensions 
                 # and fill in with elements usin the loop below (note the 1/2 is included in here). 
for i in range(M):
    j=i
    k=i+1
    Av[i,j] = 0.5  
    Av[i,k] = 0.5
print shape(Av)

# Plot  
pylab.title('Subset of averaging matrix')
pyplot.imshow(Av[:5,:6], cmap='Greys',  interpolation='nearest')
print(Av[:5,:6])


(1000L, 1001L)
[[ 0.5  0.5  0.   0.   0.   0. ]
 [ 0.   0.5  0.5  0.   0.   0. ]
 [ 0.   0.   0.5  0.5  0.   0. ]
 [ 0.   0.   0.   0.5  0.5  0. ]
 [ 0.   0.   0.   0.   0.5  0.5]]

The next step is to make our volume matrix (note its shape, $M \times M$):


In [17]:
# make the Volume, "delta x" array

Deltax = 0.2*np.ones(M) # set x-spacings
V = np.diag(Deltax)     # create diagonal matrix     
print shape(V)


(1000L, 1000L)

The last step is to create and plot our synthetic data:


In [18]:
d = np.dot(np.dot(np.transpose(np.dot(Av, Gn)), V),m)   # Create synthetic data
xd=np.linspace(0,1,len(d))                              # make x array for data 

# Plot

pylab.plot(xd,d)
pylab.xlabel('')
pylab.ylabel('d')
pylab.title('Synthetic Data $d$')


Out[18]:
<matplotlib.text.Text at 0xa5ae9b0>

Harder, Bettter, Stronger, Faster

Now we have created our first 1D set of synthetic data using numpy exclusively. In real world situations, however, storing matrices explicitly like this is inefficient if not infeasible. For example, for a large data set, the volume matrix consists mainly of zeros, save the diagonal elements (this is a similar case for the averaging matrix). It is not practical to hold on to superfluous information when the operation can be specified without storing so many values. In Python there is a package available capable of performing these operations much more effectively: the Simulation and Parameter Estimation in Geophysics, or "SimPEG" for short (see http://www.simpeg.xyz). As a last exercise, let's see how the same process can be done using SimPEG.


In [19]:
g = lambda x, k, p, q: np.exp(-p*k*x)*np.cos(2*np.pi*q*k*x)  # I redfine our kernel functions, as above.

M = Mesh.TensorMesh([1000])                                  # create a tensor mesh
Glist = [g(M.vectorNx,i+1,p,q) for i in range(20)]           # make kernel functions on nodes with the following lines
Glist = []
for i in range(20):
    gi = g(M.vectorNx,i+1,p,q)
    Glist.append(gi)
Gnodes = np.vstack(Glist).T                            
Gcells = M.aveF2CC * Gnodes                                  # evaluate kernel functions on cell centers directly
V = Utils.sdiag(M.vol)                                       # build volume array  
d = np.dot(Gcells.T, V * m)                                  # create data  
xd=np.linspace(0,1,len(d)) 

# Plot
pylab.plot(xd,d)
pylab.xlabel('')
pylab.ylabel('d')
pylab.title('Synthetic Data $d$')


Out[19]:
<matplotlib.text.Text at 0xa8feb00>

Thus far we have created a set of synthetic data. Essentially what we have done is make a forward problem. In the next module, we are going to look at a method for performing and inversion, which will involve casting our inverse problem in terms of an optimization.