Implementation of a Devito skew self adjoint variable density visco- acoustic isotropic modeling operator
-- Nonlinear Ops --

This operator is contributed by Chevron Energy Technology Company (2020)

This operator is based on simplfications of the systems presented in:
Self-adjoint, energy-conserving second-order pseudoacoustic systems for VTI and TTI media for reverse time migration and full-waveform inversion (2016)
Kenneth Bube, John Washbourne, Raymond Ergas, and Tamas Nemeth
SEG Technical Program Expanded Abstracts
https://library.seg.org/doi/10.1190/segam2016-13878451.1

Introduction

The goal of this tutorial set is to generate and prove correctness of modeling and inversion capability in Devito for variable density visco- acoustics using an energy conserving form of the wave equation. We describe how the linearization of the energy conserving skew self adjoint system with respect to modeling parameters allows using the same modeling system for all nonlinear and linearized forward and adjoint finite difference evolutions. There are three notebooks in this series:

1. Implementation of a Devito skew self adjoint variable density visco- acoustic isotropic modeling operator -- Nonlinear Ops
2. Implementation of a Devito skew self adjoint variable density visco- acoustic isotropic modeling operator -- Linearized Ops
3. Implementation of a Devito skew self adjoint variable density visco- acoustic isotropic modeling operator -- Correctness Testing

There are similar series of notebooks implementing and testing operators for VTI and TTI anisotropy (README.md).

Below we introduce the skew self adjoint form of the scalar isotropic variable density visco- acoustic wave equation with a simple form of dissipation only Q attenuation. This dissipation only (no dispersion) attenuation term $\left (\frac{\displaystyle \omega}{Q}\ \partial_t\ u \right)$ is an approximation of a Maxwell Body -- that is to say viscoelasticity approximated with a spring and dashpot in series. In practice this approach for attenuating outgoing waves is very similar to the Cerjan style damping in absorbing boundaries used elsewhere in Devito (References).

The derivation of the attenuation model is not in scope for this tutorial, but one important point is that the physics in the absorbing boundary region and the interior of the model are unified, allowing the same modeling equations to be used everywhere, with physical Q values in the interior tapering to non-physical small Q at the boundaries to attenuate outgoing waves.

Outline

  1. Define symbols [link]
  2. Introduce the SSA wave equation [link]
  3. Show generation of skew symmetric derivatives and prove correctness with unit test [link]
  4. Derive the time update equation used to implement the nonlinear forward modeling operator [link]
  5. Create the Devito grid and model fields [link]
  6. Define a function to implement the attenuation profile ($\omega\ /\ Q$) [link]
  7. Create the Devito operator [link]
  8. Run the Devito operator [link]
  9. Plot the resulting wavefields [link]
  10. References [link]

Table of symbols

Symbol                 Description Dimensionality
$\omega_c = 2 \pi f_c$ center angular frequency constant
$m(x,y,z)$ P wave velocity function of space
$b(x,y,z)$ buoyancy $(1 / \rho)$ function of space
$Q(x,y,z)$ Attenuation at frequency $\omega_c$ function of space
$u(t,x,y,z)$ Pressure wavefield function of time and space
$q(t,x,y,z)$ Source term function of time, localized in space
$\overleftarrow{\partial_t}$ shifted first derivative wrt $t$ shifted 1/2 sample backward in time
$\partial_{tt}$ centered second derivative wrt $t$ centered in time
$\overrightarrow{\partial_x},\ \overrightarrow{\partial_y},\ \overrightarrow{\partial_z}$ + shifted first derivative wrt $x,y,z$ shifted 1/2 sample forward in space
$\overleftarrow{\partial_x},\ \overleftarrow{\partial_y},\ \overleftarrow{\partial_z}$ - shifted first derivative wrt $x,y,z$ shifted 1/2 sample backward in space
$\Delta_t, \Delta_x, \Delta_y, \Delta_z$ sampling rates for $t, x, y , z$ $t, x, y , z$

A word about notation

We use the arrow symbols over derivatives $\overrightarrow{\partial_x}$ as a shorthand notation to indicate that the derivative is taken at a shifted location. For example:

  • $\overrightarrow{\partial_x}\ u(t,x,y,z)$ indicates that the $x$ derivative of $u(t,x,y,z)$ is taken at $u(t,x+\frac{\Delta x}{2},y,z)$.

  • $\overleftarrow{\partial_z}\ u(t,x,y,z)$ indicates that the $z$ derivative of $u(t,x,y,z)$ is taken at $u(t,x,y,z-\frac{\Delta z}{2})$.

  • $\overleftarrow{\partial_t}\ u(t,x,y,z)$ indicates that the $t$ derivative of $u(t,x,y,z)$ is taken at $u(t-\frac{\Delta_t}{2},x,y,z)$.

We usually drop the $(t,x,y,z)$ notation from wavefield variables unless required for clarity of exposition, so that $u(t,x,y,z)$ becomes $u$.

SSA variable density visco- acoustic wave equation

Our skew self adjoint wave equation is written:

$$ \frac{b}{m^2} \left( \frac{\omega_c}{Q} \overleftarrow{\partial_t}\ u + \partial_{tt}\ u \right) = \overleftarrow{\partial_x}\left(b\ \overrightarrow{\partial_x}\ u \right) + \overleftarrow{\partial_y}\left(b\ \overrightarrow{\partial_y}\ u \right) + \overleftarrow{\partial_z}\left(b\ \overrightarrow{\partial_z}\ u \right) + q $$

An advantage of this form is that the same system can be used to provide stable modes of propagation for all operations needed in quasi- Newton optimization:

  • the nonlinear forward
  • the linearized forward (Jacobian forward)
  • the linearized adjoint (Jacobian adjoint)

This advantage is more important for anisotropic operators, where widely utilized non energy conserving formulations can provide unstable adjoints and thus unstable gradients for anisotropy parameters.

The skew self adjoint formulation is evident in the shifted spatial derivatives, with the derivative on the right side $\overrightarrow{\partial}$ shifting forward in space one-half cell, and the derivative on the left side $\overleftarrow{\partial}$ shifting backward in space one-half cell.

$\overrightarrow{\partial}$ and $\overleftarrow{\partial}$ are anti-symmetric (also known as skew symmetric), meaning that for two random vectors $x_1$ and $x_2$, correctly implemented numerical derivatives will have the following property:

$$ x_2 \cdot \left( \overrightarrow{\partial_x}\ x_1 \right) \approx -\ x_1 \cdot \left( \overleftarrow{\partial_x}\ x_2 \right) $$

Below we will demonstrate this skew symmetry with a simple unit test on Devito generated derivatives.

In the following notebooks in this series, material parameters sandwiched between the derivatives -- including anisotropy parameters -- become much more interesting, but here buoyancy $b$ is the only material parameter between derivatives in our skew self adjoint (SSA) wave equation.

Imports

We have grouped all imports used in this notebook here for consistency.


In [1]:
import numpy as np
from examples.seismic import RickerSource, Receiver, TimeAxis
from devito import (Grid, Function, TimeFunction, SpaceDimension, Constant, 
                    Eq, Operator, solve, configuration)
from devito.finite_differences import Derivative
from devito.builtins import gaussian_smooth
from examples.seismic.skew_self_adjoint import compute_critical_dt, setup_w_over_q
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib import cm
from timeit import default_timer as timer
# These lines force images to be displayed in the notebook, and scale up fonts 
%matplotlib inline
mpl.rc('font', size=14)

# Make white background for plots, not transparent
plt.rcParams['figure.facecolor'] = 'white'

# Set the default language to openmp
configuration['language'] = 'openmp'

# Set logging to debug, captures statistics on the performance of operators
configuration['log-level'] = 'DEBUG'

Unit test demonstrating skew symmetry for shifted derivatives

As noted above, we prove with a small 1D unit test and 8th order spatial operator that the Devito shifted first derivatives are skew symmetric. This anti-symmetry can be demonstrated for the forward and backward half cell shift first derivative operators $\overrightarrow{\partial}$ and $\overleftarrow{\partial}$ with two random vectors $x_1$ and $x_2$ by verifying the dot product test as written above.

We will use Devito to implement the following two equations with an Operator:

$$ \begin{aligned} f_2 = \overrightarrow{\partial_x}\ f_1 \\[5pt] g_2 = \overleftarrow{\partial_x}\ g_1 \end{aligned} $$

And then verify the dot products are equivalent:

$$ f_1 \cdot g_2 \approx g_1 \cdot f_2 $$

We use the following test for relative error (note the flipped signs in numerator and denominator due to anti- symmetry):

$$ \frac{\displaystyle f_1 \cdot g_2 + g_1 \cdot f_2} {\displaystyle f_1 \cdot g_2 - g_1 \cdot f_2}\ <\ \epsilon $$

In [2]:
#NBVAL_INGNORE_OUTPUT

# Make 1D grid to test derivatives 
n = 101
d = 1.0
shape = (n, )
spacing = (1 / (n-1), )   
origin = (0., )
extent = (d * (n-1), )
dtype = np.float64

# Initialize Devito grid and Functions for input(f1,g1) and output(f2,g2)
# Note that space_order=8 allows us to use an 8th order finite difference 
#   operator by properly setting up grid accesses with halo cells 
grid1d = Grid(shape=shape, extent=extent, origin=origin, dtype=dtype)
x = grid1d.dimensions[0]
f1 = Function(name='f1', grid=grid1d, space_order=8)
f2 = Function(name='f2', grid=grid1d, space_order=8)
g1 = Function(name='g1', grid=grid1d, space_order=8)
g2 = Function(name='g2', grid=grid1d, space_order=8)

# Fill f1 and g1 with random values in [-1,+1]
f1.data[:] = -1 + 2 * np.random.rand(n,)
g1.data[:] = -1 + 2 * np.random.rand(n,)

# Equation defining: [f2 = forward 1/2 cell shift derivative applied to f1]
equation_f2 = Eq(f2, f1.dx(x0=x+0.5*x.spacing))

# Equation defining: [g2 = backward 1/2 cell shift derivative applied to g1]
equation_g2 = Eq(g2, g1.dx(x0=x-0.5*x.spacing))

# Define an Operator to implement these equations and execute
op = Operator([equation_f2, equation_g2])
op()

# Compute the dot products and the relative error
f1g2 = np.dot(f1.data, g2.data)
g1f2 = np.dot(g1.data, f2.data)
diff = (f1g2+g1f2)/(f1g2-g1f2)

tol = 100 * np.finfo(np.float32).eps
print("f1g2, g1f2, diff, tol; %+.6e %+.6e %+.6e %+.6e" % (f1g2, g1f2, diff, tol))

# At last the unit test
# Assert these dot products are float epsilon close in relative error
assert diff < 100 * np.finfo(np.float32).eps


Allocating memory for f1(117,)
Allocating memory for g1(117,)
Operator `Kernel` generated in 0.15 s
  * lowering.Expressions: 0.09 s (61.4 %)
  * lowering.IET: 0.03 s (20.5 %)
     * specializing.IET: 0.03 s (20.5 %)
  * lowering.Clusters: 0.03 s (20.5 %)
     * specializing.Clusters: 0.03 s (20.5 %)
Flops reduction after symbolic optimization: [62 --> 25]
Allocating memory for f2(117,)
Allocating memory for g2(117,)
Operator `Kernel` fetched `/tmp/devito-jitcache-uid5138/8385b5f20926da52f79c78af9b291a77965d79b4.c` in 0.09 s from jit-cache
Operator `Kernel` run in 0.01 s
* section0<101> with OI=0.76 computed in 0.01 s [0.01 GFlops/s]
Performance[mode=advanced] arguments: {'nthreads': 16}
f1g2, g1f2, diff, tol; -2.765449e+00 +2.765449e+00 +8.029242e-17 +1.192093e-05

Show the finite difference operators and generated code

You can inspect the finite difference coefficients and locations for evaluation with the code shown below.

For your reference, the finite difference coefficients seen in the first two stanzas below are exactly the coefficients generated in Table 2 of Fornberg's paper Generation of Finite Difference Formulas on Arbitrarily Spaced Grids linked below (References).

Note that you don't need to inspect the generated code, but this does provide the option to use this highly optimized code in applications that do not need or require python. If you inspect the code you will notice hallmarks of highly optimized c code, including pragmas for vectorization, and decorations for pointer restriction and alignment.


In [3]:
#NBVAL_INGNORE_OUTPUT

# Show the FD coefficients generated by Devito 
# for the forward 1/2 cell shifted first derivative operator
print("\n\nForward +1/2 cell shift;")
print("..................................")
print(f1.dx(x0=x+0.5*x.spacing).evaluate)

# Show the FD coefficients generated by Devito 
# for the backward 1/2 cell shifted first derivative operator
print("\n\nBackward -1/2 cell shift;")
print("..................................")
print(f1.dx(x0=x-0.5*x.spacing).evaluate)

# Show code generated by Devito for applying the derivatives
print("\n\nGenerated c code;")
print("..................................")
print(op.ccode)



Forward +1/2 cell shift;
..................................
-1.19628906*f1(x)/h_x + 0.000697544643*f1(x - 3.0*h_x)/h_x - 0.0095703125*f1(x - 2.0*h_x)/h_x + 0.0797526042*f1(x - 1.0*h_x)/h_x + 1.19628906*f1(x + 1.0*h_x)/h_x - 0.0797526042*f1(x + 2.0*h_x)/h_x + 0.0095703125*f1(x + 3.0*h_x)/h_x - 0.000697544643*f1(x + 4.0*h_x)/h_x


Backward -1/2 cell shift;
..................................
1.19628906*f1(x)/h_x + 0.000697544643*f1(x - 4.0*h_x)/h_x - 0.0095703125*f1(x - 3.0*h_x)/h_x + 0.0797526042*f1(x - 2.0*h_x)/h_x - 1.19628906*f1(x - 1.0*h_x)/h_x - 0.0797526042*f1(x + 1.0*h_x)/h_x + 0.0095703125*f1(x + 2.0*h_x)/h_x - 0.000697544643*f1(x + 3.0*h_x)/h_x


Generated c code;
..................................
#define _POSIX_C_SOURCE 200809L
#include "stdlib.h"
#include "math.h"
#include "sys/time.h"
#include "xmmintrin.h"
#include "pmmintrin.h"
#include "omp.h"

struct dataobj
{
  void *restrict data;
  int * size;
  int * npsize;
  int * dsize;
  int * hsize;
  int * hofs;
  int * oofs;
} ;

struct profiler
{
  double section0;
} ;


int Kernel(struct dataobj *restrict f1_vec, struct dataobj *restrict f2_vec, struct dataobj *restrict g1_vec, struct dataobj *restrict g2_vec, const double h_x, struct profiler * timers, const int x_M, const int x_m, const int nthreads)
{
  double (*restrict f1) __attribute__ ((aligned (64))) = (double (*)) f1_vec->data;
  double (*restrict f2) __attribute__ ((aligned (64))) = (double (*)) f2_vec->data;
  double (*restrict g1) __attribute__ ((aligned (64))) = (double (*)) g1_vec->data;
  double (*restrict g2) __attribute__ ((aligned (64))) = (double (*)) g2_vec->data;
  /* Flush denormal numbers to zero in hardware */
  _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
  _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
  struct timeval start_section0, end_section0;
  gettimeofday(&start_section0, NULL);
  /* Begin section0 */
  #pragma omp parallel num_threads(nthreads)
  {
    #pragma omp for collapse(1) schedule(dynamic,1)
    for (int x = x_m; x <= x_M; x += 1)
    {
      double r2 = 1.0/h_x;
      f2[x + 8] = r2*(3.57142857e-3*(f1[x + 4] - f1[x + 12]) + 3.80952381e-2*(-f1[x + 5] + f1[x + 11]) + 2.0e-1*(f1[x + 6] - f1[x + 10]) + 8.0e-1*(-f1[x + 7] + f1[x + 9]));
      g2[x + 8] = r2*(3.57142857e-3*(g1[x + 4] - g1[x + 12]) + 3.80952381e-2*(-g1[x + 5] + g1[x + 11]) + 2.0e-1*(g1[x + 6] - g1[x + 10]) + 8.0e-1*(-g1[x + 7] + g1[x + 9]));
    }
  }
  /* End section0 */
  gettimeofday(&end_section0, NULL);
  timers->section0 += (double)(end_section0.tv_sec-start_section0.tv_sec)+(double)(end_section0.tv_usec-start_section0.tv_usec)/1000000;
  return 0;
}

Doing some algebra to solve for the time update

The next step in implementing our Devito modeling operator is to define the equation used to update the pressure wavefield as a function of time. What follows is a bit of algebra using the wave equation and finite difference approximations to time derivatives to express the pressure wavefield forward in time $u(t+\Delta_t)$ as a function of the current $u(t)$ and previous $u(t-\Delta_t)$ pressure wavefields.

1. Numerical approximation for $\partial_{tt}\ u$, solved for for $u(t+\Delta_t)$

The second order accurate centered approximation to the second time derivative involves three wavefields: $u(t-\Delta_t)$, $u(t)$, and $u(t+\Delta_t)$. In order to advance our finite difference solution in time, we solve for $u(t+\Delta_t)$.

$$ \begin{aligned} \partial_{tt}\ u &= \frac{\displaystyle u(t+\Delta_t) - 2\ u(t) + u(t-\Delta_t)}{\displaystyle \Delta_t^2} \\[5pt] u(t+\Delta_t)\ &= \Delta_t^2\ \partial_{tt}\ u + 2\ u(t) - u(t-\Delta_t) \end{aligned} $$

2. Numerical approximation for $\overleftarrow{\partial_{t}}\ u$

The argument for using a backward approximation is a bit hand wavy, but goes like this: a centered or forward approximation for $\partial_{t}\ u$ would involve the term $u(t+\Delta_t)$, and hence $u(t+\Delta_t)$ would appear at two places in our time update equation below, essentially making the form implicit (although it would be easy to solve for $u(t+\Delta_t)$).

We are interested in explicit time stepping and the correct behavior of the attenuation term, and so prefer the backward approximation for $\overleftarrow{\partial_{t}}\ u$. Our experience is that the use of the backward difference is more stable than forward or centered.

The first order accurate backward approximation to the first time derivative involves two wavefields: $u(t-\Delta_t)$, and $u(t)$. We can use this expression as is.

$$ \overleftarrow{\partial_{t}}\ u = \frac{\displaystyle u(t) - u(t-\Delta_t)}{\displaystyle \Delta_t} $$

3. Solve the wave equation for $\partial_{tt}$

$$ \begin{aligned} \frac{b}{m^2} \left( \frac{\omega_c}{Q} \overleftarrow{\partial_{t}}\ u + \partial_{tt}\ u \right) &= \overleftarrow{\partial_x}\left(b\ \overrightarrow{\partial_x}\ u \right) + \overleftarrow{\partial_y}\left(b\ \overrightarrow{\partial_y}\ u \right) + \overleftarrow{\partial_z}\left(b\ \overrightarrow{\partial_z}\ u \right) + q\\[10pt] \partial_{tt}\ u &= \frac{m^2}{b} \left[ \overleftarrow{\partial_x}\left(b\ \overrightarrow{\partial_x}\ u \right) + \overleftarrow{\partial_y}\left(b\ \overrightarrow{\partial_y}\ u \right) + \overleftarrow{\partial_z}\left(b\ \overrightarrow{\partial_z}\ u \right) + q \right] - \frac{\omega_c}{Q} \overleftarrow{\partial_{t}}\ u \end{aligned} $$

4. Plug in $\overleftarrow{\partial_t} u$ and $\partial_{tt} u$ into the time update equation

Next we plug in the right hand sides for $\partial_{tt}\ u$ and $\overleftarrow{\partial_{t}}\ u$ into the the time update expression for $u(t+\Delta_t)$ from step 2.

$$ \begin{aligned} u(t+\Delta_t) &= \Delta_t^2 \frac{m^2}{b} \left[ \overleftarrow{\partial_x}\left(b\ \overrightarrow{\partial_x}\ u \right) + \overleftarrow{\partial_y}\left(b\ \overrightarrow{\partial_y}\ u \right) + \overleftarrow{\partial_z}\left(b\ \overrightarrow{\partial_z}\ u \right) + q \right] \\[10pt] & \quad -\ \Delta_t^2 \frac{\omega_c}{Q} \left( \frac{\displaystyle u(t) - u(t-\Delta_t)} {\displaystyle \Delta_t} \right) + 2\ u(t) - u(t-\Delta_t) \end{aligned} $$

5. Simplify ...

$$ \begin{aligned} u(t+\Delta_t) &= \Delta_t^2 \frac{m^2}{b} \left[ \overleftarrow{\partial_x}\left(b\ \overrightarrow{\partial_x}\ u \right) + \overleftarrow{\partial_y}\left(b\ \overrightarrow{\partial_y}\ u \right) + \overleftarrow{\partial_z}\left(b\ \overrightarrow{\partial_z}\ u \right) + q \right] \\[10pt] & \quad \left(2 -\ \Delta_t\ \frac{\omega_c}{Q} \right) u(t) + \left(\Delta_t\ \frac{\omega_c}{Q} - 1 \right) u(t-\Delta_t) \end{aligned} $$

6. et voila ...

The last equation is how we update the pressure wavefield at each time step, and depends on $u(t)$ and $u(t-\Delta_t)$.

The main work of the finite difference explicit time stepping is evaluating the nested spatial derivative operators on the RHS of this equation. The particular advantage of Devito symbolic optimization is that Devito is able to solve for the complicated expressions that result from substituting the discrete forms of high order numerical finite difference approximations for these nested spatial derivatives.

We have now completed the maths required to implement the modeling operator. The remainder of this notebook deals with setting up and using the required Devito objects.

Instantiate the Devito grid for a two dimensional problem

Define the dimensions and coordinates for the model. The computational domain of the model is surrounded by an absorbing boundary region where we implement boundary conditions to eliminate outgoing waves. We define the sizes for the interior of the model nx and nz, the width of the absorbing boundary region npad, and the sizes for the entire model padded with absorbing boundaries become nxpad = nx + 2*npad and nzpad = nz + 2*npad.


In [4]:
# Define dimensions for the interior of the model
nx,nz = 751,751
dx,dz = 10.0,10.0  # Grid spacing in m
shape = (nx, nz)   # Number of grid points
spacing = (dx, dz) # Domain size is now 5 km by 5 km
origin = (0., 0.)  # Origin of coordinate system, specified in m.
extent = tuple([s*(n-1) for s, n in zip(spacing, shape)])

# Define dimensions for the model padded with absorbing boundaries
npad = 50          # number of points in absorbing boundary region (all sides)
nxpad,nzpad = nx+2*npad, nz+2*npad
shape_pad   = np.array(shape) + 2 * npad
origin_pad  = tuple([o - s*npad for o, s in zip(origin, spacing)])
extent_pad  = tuple([s*(n-1) for s, n in zip(spacing, shape_pad)])

# Define the dimensions 
# Note if you do not specify dimensions, you get in order x,y,z
x = SpaceDimension(name='x', spacing=Constant(name='h_x', 
                   value=extent_pad[0]/(shape_pad[0]-1)))
z = SpaceDimension(name='z', spacing=Constant(name='h_z', 
                   value=extent_pad[1]/(shape_pad[1]-1)))

# Initialize the Devito grid 
grid = Grid(extent=extent_pad, shape=shape_pad, origin=origin_pad, 
            dimensions=(x, z), dtype=dtype)

print("shape;           ", shape)
print("origin;          ", origin)
print("spacing;         ", spacing)
print("extent;          ", extent)
print("")
print("shape_pad;       ", shape_pad)
print("origin_pad;      ", origin_pad)
print("extent_pad;      ", extent_pad)

print("")
print("grid.shape;      ", grid.shape)
print("grid.extent;     ", grid.extent)
print("grid.spacing_map;", grid.spacing_map)


shape;            (751, 751)
origin;           (0.0, 0.0)
spacing;          (10.0, 10.0)
extent;           (7500.0, 7500.0)

shape_pad;        [851 851]
origin_pad;       (-500.0, -500.0)
extent_pad;       (8500.0, 8500.0)

grid.shape;       (851, 851)
grid.extent;      (8500.0, 8500.0)
grid.spacing_map; {h_x: 10.0, h_z: 10.0}

Define velocity and buoyancy model parameters

We have the following constants and fields from our SSA wave equation that we define as time invariant using Functions:

  Symbol   Description
$$m(x,z)$$ Acoustic velocity
$$b(x,z)=\frac{1}{\rho(x,z)}$$ Buoyancy (reciprocal density)

In [5]:
# Create the velocity and buoyancy fields. 
#   - We use a wholespace velocity of 1500 m/s
#   - We use a wholespace density of 1 g/cm^3
#   - These are scalar fields so we use Function to define them
#   - We specify space_order to establish the appropriate size halo on the edges 
space_order = 8

# Wholespace velocity
m = Function(name='m', grid=grid, space_order=space_order)
m.data[:] = 1.5

# Constant density
b = Function(name='b', grid=grid, space_order=space_order)
b.data[:,:] = 1.0 / 1.0


Allocating memory for m(867, 867)
Allocating memory for b(867, 867)

Define the simulation time range

In this notebook we run 2 seconds of simulation using the sample rate related to the CFL condition as implemented in examples/seismic/skew_self_adjoint/utils.py.

Important note smaller Q values in highly viscous media may require smaller temporal sampling rates than a non-viscous medium to achieve dispersion free propagation. This is a cost of the visco- acoustic modification we use here.

We also use the convenience TimeRange as defined in examples/seismic/source.py.


In [6]:
t0 = dtype(0.)     # Simulation time start
tn = dtype(2000.)  # Simulation time end (1 second = 1000 msec)
dt = compute_critical_dt(m)
time_range = TimeAxis(start=t0, stop=tn, step=dt)
print("Time min, max, dt, num; %10.6f %10.6f %10.6f %d" % (t0, tn, dt, int(tn//dt) + 1))
print("time_range; ", time_range)


Time min, max, dt, num;   0.000000 2000.000000   2.100000 953
time_range;  TimeAxis: start=0, stop=2001.3, step=2.1, num=954

Define the acquisition geometry: locations of sources and receivers

source:

  • X coordinate: center of the model: dx*(nx//2)
  • Z coordinate: center of the model: dz*(nz//2)
  • We use a 10 Hz center frequency RickerSource wavelet as defined in examples/seismic/source.py

receivers:

  • X coordinate: center of the model: dx*(nx//2)
  • Z coordinate: vertical line from top to bottom of model
  • We use a vertical line of Receivers as defined with a PointSource in examples/seismic/source.py

In [7]:
# Source in the center of the model at 10 Hz center frequency
fpeak = 0.010
src = RickerSource(name='src', grid=grid, f0=fpeak, npoint=1, time_range=time_range)
src.coordinates.data[0,0] = dx * (nx//2)
src.coordinates.data[0,1] = dz * (nz//2)

# line of receivers along the right edge of the model
rec = Receiver(name='rec', grid=grid, npoint=nz, time_range=time_range)
rec.coordinates.data[:,0] = dx * (nx//2)
rec.coordinates.data[:,1] = np.linspace(0.0, dz*(nz-1), nz)

print("src_coordinate  X;         %+12.4f" % (src.coordinates.data[0,0]))
print("src_coordinate  Z;         %+12.4f" % (src.coordinates.data[0,1]))
print("rec_coordinates X min/max; %+12.4f %+12.4f" % \
      (np.min(rec.coordinates.data[:,0]), np.max(rec.coordinates.data[:,0])))
print("rec_coordinates Z min/max; %+12.4f %+12.4f" % \
      (np.min(rec.coordinates.data[:,1]), np.max(rec.coordinates.data[:,1])))

# We can plot the time signature to see the wavelet
src.show()


Allocating memory for src(954, 1)
Allocating memory for src_coords(1, 2)
Allocating memory for rec_coords(751, 2)
src_coordinate  X;           +3750.0000
src_coordinate  Z;           +3750.0000
rec_coordinates X min/max;   +3750.0000   +3750.0000
rec_coordinates Z min/max;      +0.0000   +7500.0000

Plot velocity and density models

Next we plot the velocity and density models for illustration.

  • The demarcation between interior and absorbing boundary is shown with a dotted white line
  • The source is shown as a large red asterisk
  • The extent of the receiver array is shown with a thick black line

In [8]:
# note: flip sense of second dimension to make the plot positive downwards
plt_extent = [origin_pad[0], origin_pad[0] + extent_pad[0],
              origin_pad[1] + extent_pad[1], origin_pad[1]]

vmin, vmax = 1.4, 1.7
dmin, dmax = 0.9, 1.1

plt.figure(figsize=(12,8))

plt.subplot(1, 2, 1)
plt.imshow(np.transpose(m.data), cmap=cm.jet, 
           vmin=vmin, vmax=vmax, extent=plt_extent)
plt.colorbar(orientation='horizontal', label='Velocity (m/msec)')
plt.plot([origin[0], origin[0], extent[0], extent[0], origin[0]],
         [origin[1], extent[1], extent[1], origin[1], origin[1]],
         'white', linewidth=4, linestyle=':', label="Absorbing Boundary")
plt.plot(rec.coordinates.data[:, 0], rec.coordinates.data[:, 1], \
         'black', linestyle='-', label="Receiver")
plt.plot(src.coordinates.data[:, 0], src.coordinates.data[:, 1], \
         'red', linestyle='None', marker='*', markersize=15, label="Source")
plt.xlabel("X Coordinate (m)")
plt.ylabel("Z Coordinate (m)")
plt.title("Velocity w/ absorbing boundary")

plt.subplot(1, 2, 2)
plt.imshow(np.transpose(1 / b.data), cmap=cm.jet,
           vmin=dmin, vmax=dmax, extent=plt_extent)
plt.colorbar(orientation='horizontal', label='Density (m^3/kg)')
plt.plot([origin[0], origin[0], extent[0], extent[0], origin[0]],
         [origin[1], extent[1], extent[1], origin[1], origin[1]],
         'white', linewidth=4, linestyle=':', label="Absorbing Boundary")
plt.plot(rec.coordinates.data[:, 0], rec.coordinates.data[:, 1], \
         'black', linestyle='-', label="Receiver")
plt.plot(src.coordinates.data[:, 0], src.coordinates.data[:, 1], \
         'red', linestyle='None', marker='*', markersize=15, label="Source")
plt.xlabel("X Coordinate (m)")
plt.ylabel("Z Coordinate (m)")
plt.title("Density w/ absorbing boundary")

plt.tight_layout()
None


Create and plot the $\frac{\omega_c}{Q}$ model used for dissipation only attenuation

We have two remaining constants and fields from our SSA wave equation that we need to define:

  Symbol   Description
$$\omega_c = 2 \pi f_c$$ Center angular frequency
$$\frac{1}{Q(x,z)}$$ Inverse Q model used in the modeling system

The absorbing boundary condition strategy we use is designed to eliminate any corners or edges in the attenuation profile. We do this by making Q a function of distance from the nearest boundary.

We have implemented the function setup_w_over_q for 2D and 3D fields in the file utils.py, and will use it below. In Devito these fields are type Function, a concrete implementation of AbstractFunction.

Feel free to inspect the source at utils.py, which uses Devito's symbolic math to write a nonlinear equation describing the absorbing boundary for dispatch to automatic code generation.

Note that we will generate two Q models, one with strong attenuation (a Q value of 25) and one with moderate attenuation (a Q value of 100) -- in order to demonstrate the impact of attenuation in the plots near the end of this notebook.


In [9]:
#NBVAL_INGNORE_OUTPUT

# Initialize the attenuation profile for Q=25 and Q=100 models
w = 2.0 * np.pi * fpeak
print("w,fpeak; ", w, fpeak)
qmin = 0.1

wOverQ_025 = Function(name='wOverQ_025', grid=grid, space_order=space_order)
wOverQ_100 = Function(name='wOverQ_100', grid=grid, space_order=space_order)

setup_w_over_q(wOverQ_025, w, qmin, 25.0, npad)
setup_w_over_q(wOverQ_100, w, qmin, 100.0, npad)

# Plot the log of the generated Q profile
q025 = np.log10(w / wOverQ_025.data)
q100 = np.log10(w / wOverQ_100.data)
lmin, lmax = np.log10(qmin), np.log10(100)

plt.figure(figsize=(12,8))

plt.subplot(1, 2, 1)
plt.imshow(np.transpose(q025.data), cmap=cm.jet, 
           vmin=lmin, vmax=lmax, extent=plt_extent)
plt.colorbar(orientation='horizontal', label='log10(Q)')
plt.plot([origin[0], origin[0], extent[0], extent[0], origin[0]],
         [origin[1], extent[1], extent[1], origin[1], origin[1]],
         'white', linewidth=4, linestyle=':', label="Absorbing Boundary")
plt.plot([origin[0], origin[0], extent[0], extent[0], origin[0]],
         [origin[1], extent[1], extent[1], origin[1], origin[1]],
         'white', linewidth=4, linestyle=':', label="Absorbing Boundary")
plt.plot(rec.coordinates.data[:, 0], rec.coordinates.data[:, 1], \
         'black', linestyle='-', label="Receiver")
plt.plot(src.coordinates.data[:, 0], src.coordinates.data[:, 1], \
         'red', linestyle='None', marker='*', markersize=15, label="Source")
plt.xlabel("X Coordinate (m)")
plt.ylabel("Z Coordinate (m)")
plt.title("log10 of $Q=25$ model")

plt.subplot(1, 2, 2)
plt.imshow(np.transpose(q100.data), cmap=cm.jet,
           vmin=lmin, vmax=lmax, extent=plt_extent)
plt.colorbar(orientation='horizontal', label='log10(Q)')
plt.plot([origin[0], origin[0], extent[0], extent[0], origin[0]],
         [origin[1], extent[1], extent[1], origin[1], origin[1]],
         'white', linewidth=4, linestyle=':', label="Absorbing Boundary")
plt.plot([origin[0], origin[0], extent[0], extent[0], origin[0]],
         [origin[1], extent[1], extent[1], origin[1], origin[1]],
         'white', linewidth=4, linestyle=':', label="Absorbing Boundary")
plt.plot(rec.coordinates.data[:, 0], rec.coordinates.data[:, 1], \
         'black', linestyle='-', label="Receiver")
plt.plot(src.coordinates.data[:, 0], src.coordinates.data[:, 1], \
         'red', linestyle='None', marker='*', markersize=15, label="Source")
plt.xlabel("X Coordinate (m)")
plt.ylabel("Z Coordinate (m)")
plt.title("log10 of $Q=100$ model")

plt.tight_layout()
None


Operator `WOverQ_Operator` generated in 0.07 s
  * lowering.Expressions: 0.03 s (43.9 %)
  * lowering.IET: 0.03 s (43.9 %)
     * specializing.IET: 0.03 s (43.9 %)
Flops reduction after symbolic optimization: [10 --> 10]
Allocating memory for wOverQ_025(867, 867)
Operator `WOverQ_Operator` fetched `/tmp/devito-jitcache-uid5138/1188a66fe04225bb4e1dfdbbb93c8e4256f7711c.c` in 0.04 s from jit-cache
Operator `WOverQ_Operator` run in 0.01 s
* section0<851,851> with OI=1.25 computed in 0.01 s [1.59 GFlops/s]
Performance[mode=advanced] arguments: {'nthreads': 16}
w,fpeak;  0.06283185307179587 0.01
Operator `WOverQ_Operator` generated in 0.09 s
  * lowering.Expressions: 0.04 s (49.7 %)
  * lowering.IET: 0.04 s (49.7 %)
     * specializing.IET: 0.03 s (37.3 %)
  * lowering.Clusters: 0.02 s (24.9 %)
Flops reduction after symbolic optimization: [10 --> 10]
Allocating memory for wOverQ_100(867, 867)
Operator `WOverQ_Operator` fetched `/tmp/devito-jitcache-uid5138/d0a56ff0c7d7d21cdcc1223644e961c639d58ff2.c` in 0.03 s from jit-cache
Operator `WOverQ_Operator` run in 0.01 s
* section0<851,851> with OI=1.25 computed in 0.01 s [2.20 GFlops/s]
Performance[mode=advanced] arguments: {'nthreads': 16}

Define the pressure wavefield as a TimeFunction

We specify the time_order as 2, which allocates 3 time steps in the pressure wavefield. As described elsewhere, Devito will use "cyclic indexing" to index into this multi-dimensional array, mneaning that via the modulo operator, the time indices $[0, 1, 2, 3, 4, 5, ...]$ are mapped into the modulo indices $[0, 1, 2, 0, 1, 2, ...]$

This FAQ entry explains in more detail.


In [10]:
# Define the TimeFunction
u = TimeFunction(name="u", grid=grid, time_order=2, space_order=space_order)

# Get the symbols for dimensions for t, x, z 
# We need these below in order to write the source injection and the
t,x,z = u.dimensions

Define the source injection and receiver extraction

If you examine the equation for the time update we derived above you will see that the source $q$ is scaled by the term $(\Delta_t^2 m^2\ /\ b)$. You will see that scaling term in the source injection below. For $\Delta_t^2$ we use the time dimension spacing symbol t.spacing**2.

Note that source injection and receiver extraction are accomplished via linear interpolation, as implemented in SparseTimeFunction in sparse.py.


In [11]:
# Source injection, with appropriate scaling
src_term = src.inject(field=u.forward, expr=src * t.spacing**2 * m**2 / b)

# Receiver extraction 
rec_term = rec.interpolate(expr=u.forward)

Finally, the Devito operator

We next transcribe the time update expression we derived above into a Devito Eq. Then we add that expression with the source injection and receiver extraction and build an Operator that will generate the c code for performing the modeling.

We copy the time update expression from above for clarity. Note we omit $q$ because we will be explicitly injecting the source using src_term defined immediately above. However, for the linearized Born forward modeling operation the $q$ term is an appropriately scaled field, as shown in the next notebook in this series.

$$ \begin{aligned} u(t+\Delta_t) &= \Delta_t^2 \frac{m^2}{b} \left[ \overleftarrow{\partial_x}\left(b\ \overrightarrow{\partial_x}\ u \right) + \overleftarrow{\partial_y}\left(b\ \overrightarrow{\partial_y}\ u \right) + \overleftarrow{\partial_z}\left(b\ \overrightarrow{\partial_z}\ u \right) + q \right] \\[10pt] & \quad \left(2 -\ \Delta_t\ \frac{\omega_c}{Q} \right) u(t) + \left(\Delta_t\ \frac{\omega_c}{Q} - 1 \right) u(t-\Delta_t) \end{aligned} $$

In [12]:
#NBVAL_INGNORE_OUTPUT

# Generate the time update equation and operator for Q=25 model
eq_time_update = (t.spacing**2 * m**2 / b) * \
    ((b * u.dx(x0=x+x.spacing/2)).dx(x0=x-x.spacing/2) + \
     (b * u.dz(x0=z+z.spacing/2)).dz(x0=z-z.spacing/2)) + \
    (2 - t.spacing * wOverQ_025) * u + \
    (t.spacing * wOverQ_025 - 1) * u.backward

stencil = Eq(u.forward, eq_time_update)

# Update the dimension spacing_map to include the time dimension
# These symbols will be replaced with the relevant scalars by the Operator
spacing_map = grid.spacing_map
spacing_map.update({t.spacing : dt})
print("spacing_map; ", spacing_map)

# op = Operator([stencil] + src_term + rec_term)
op = Operator([stencil] + src_term + rec_term, subs=spacing_map)


spacing_map;  {h_x: 10.0, h_z: 10.0, dt: 2.1}
Operator `Kernel` generated in 1.75 s
  * lowering.Expressions: 0.90 s (51.5 %)
  * lowering.IET: 0.45 s (25.8 %)
     * specializing.IET: 0.36 s (20.6 %)
  * lowering.Clusters: 0.36 s (20.6 %)
Flops reduction after symbolic optimization: [298 --> 68]

Impact of hardwiring the grid spacing on operation count

The argument subs=spacing_map passed to the operator substitutes values for the temporal and spatial dimensions into the expressions before code generation. This reduces the number of floating point operations executed by the kernel by pre-evaluating certain coefficients, and possibly absorbing the spacing scalars from the denominators of the numerical finite difference approximations into the finite difference coefficients.

If you run the two cases of passing/not passing the subs=spacing_map argument by commenting/un-commenting the last two lines of the cell immediately above, you can inspect the difference in computed flop count for the operator. This is reported by setting Devito logging configuration['log-level'] = 'DEBUG' and is reported during Devito symbolic optimization with the output line Flops reduction after symbolic optimization. Note also if you inspect the generated code for the two cases, you will see extra calling parameters are required for the case without the substitution. We have compiled the flop count for 2D and 3D operators into the table below.

Dimensionality Passing subs Flops reduction Delta
2D False 588 --> 81
2D True 300 --> 68 13.7%
3D False 875 --> 116
3D True 442 --> 95 18.1%

Note the gain in performance is around 14% for this example in 2D, and around 18% in 3D.

We use op.arguments() to print the arguments to the operator. As noted above depending on the use of subs=spacing_map you will see different arguments here. In the case of no subs=spacing_map argument to the operator, you will see arguments for the dimensional spacing constants as parameters to the operator, including h_x, h_z, and dt.


In [13]:
# NBVAL_IGNORE_OUTPUT

op.arguments()


Allocating memory for rec(954, 751)
Allocating memory for u(3, 867, 867)
Out[13]:
{'b': <cparam 'P' (0x2b3e13aa9d70)>,
 'x_m': 0,
 'x_size': 851,
 'x_M': 850,
 'z_m': 0,
 'z_size': 851,
 'z_M': 850,
 'm': <cparam 'P' (0x2b3e16034bb0)>,
 'o_x': -500.0,
 'o_z': -500.0,
 'rec': <cparam 'P' (0x2b3e16035270)>,
 'time_m': 1,
 'time_size': 954,
 'time_M': 952,
 'p_rec_m': 0,
 'p_rec_size': 751,
 'p_rec_M': 750,
 'rec_coords': <cparam 'P' (0x2b3e16034270)>,
 'd_m': 0,
 'd_size': 2,
 'd_M': 1,
 'src': <cparam 'P' (0x2b3e1600d9f0)>,
 'p_src_m': 0,
 'p_src_size': 1,
 'p_src_M': 0,
 'src_coords': <cparam 'P' (0x2b3e1600d030)>,
 'u': <cparam 'P' (0x2b3e1600a630)>,
 't_size': 3,
 'wOverQ_025': <cparam 'P' (0x2b3e16009c70)>,
 'nthreads': 16,
 'nthreads_nonaffine': 16,
 'timers': <cparam 'P' (0x2b3e1612b2d0)>}

We use print(op) to output the generated c code for review.


In [14]:
# NBVAL_IGNORE_OUTPUT

print(op)


#define _POSIX_C_SOURCE 200809L
#include "stdlib.h"
#include "math.h"
#include "sys/time.h"
#include "xmmintrin.h"
#include "pmmintrin.h"
#include "omp.h"

struct dataobj
{
  void *restrict data;
  int * size;
  int * npsize;
  int * dsize;
  int * hsize;
  int * hofs;
  int * oofs;
} ;

struct profiler
{
  double section0;
  double section1;
  double section2;
} ;


int Kernel(struct dataobj *restrict b_vec, struct dataobj *restrict m_vec, const double o_x, const double o_z, struct dataobj *restrict rec_vec, struct dataobj *restrict rec_coords_vec, struct dataobj *restrict src_vec, struct dataobj *restrict src_coords_vec, struct dataobj *restrict u_vec, struct dataobj *restrict wOverQ_025_vec, const int x_M, const int x_m, const int x_size, const int z_M, const int z_m, const int z_size, const int p_rec_M, const int p_rec_m, const int p_src_M, const int p_src_m, const int time_M, const int time_m, struct profiler * timers, const int nthreads, const int nthreads_nonaffine)
{
  double (*restrict b)[b_vec->size[1]] __attribute__ ((aligned (64))) = (double (*)[b_vec->size[1]]) b_vec->data;
  double (*restrict m)[m_vec->size[1]] __attribute__ ((aligned (64))) = (double (*)[m_vec->size[1]]) m_vec->data;
  double (*restrict rec)[rec_vec->size[1]] __attribute__ ((aligned (64))) = (double (*)[rec_vec->size[1]]) rec_vec->data;
  double (*restrict rec_coords)[rec_coords_vec->size[1]] __attribute__ ((aligned (64))) = (double (*)[rec_coords_vec->size[1]]) rec_coords_vec->data;
  double (*restrict src)[src_vec->size[1]] __attribute__ ((aligned (64))) = (double (*)[src_vec->size[1]]) src_vec->data;
  double (*restrict src_coords)[src_coords_vec->size[1]] __attribute__ ((aligned (64))) = (double (*)[src_coords_vec->size[1]]) src_coords_vec->data;
  double (*restrict u)[u_vec->size[1]][u_vec->size[2]] __attribute__ ((aligned (64))) = (double (*)[u_vec->size[1]][u_vec->size[2]]) u_vec->data;
  double (*restrict wOverQ_025)[wOverQ_025_vec->size[1]] __attribute__ ((aligned (64))) = (double (*)[wOverQ_025_vec->size[1]]) wOverQ_025_vec->data;
  double (*r24)[z_size + 3 + 4];
  posix_memalign((void**)&r24, 64, sizeof(double[x_size + 3 + 4][z_size + 3 + 4]));
  double (*r25)[z_size + 3 + 4];
  posix_memalign((void**)&r25, 64, sizeof(double[x_size + 3 + 4][z_size + 3 + 4]));
  /* Flush denormal numbers to zero in hardware */
  _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
  _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
  for (int time = time_m, t0 = (time)%(3), t1 = (time + 1)%(3), t2 = (time + 2)%(3); time <= time_M; time += 1, t0 = (time)%(3), t1 = (time + 1)%(3), t2 = (time + 2)%(3))
  {
    struct timeval start_section0, end_section0;
    gettimeofday(&start_section0, NULL);
    /* Begin section0 */
    #pragma omp parallel num_threads(nthreads)
    {
      #pragma omp for collapse(1) schedule(dynamic,1)
      for (int x = x_m - 4; x <= x_M + 3; x += 1)
      {
        #pragma omp simd aligned(u:32)
        for (int z = z_m - 4; z <= z_M + 3; z += 1)
        {
          r24[x + 4][z + 4] = 6.97544642889625e-5*(u[t0][x + 8][z + 5] - u[t0][x + 8][z + 12]) + 9.5703125007276e-4*(-u[t0][x + 8][z + 6] + u[t0][x + 8][z + 11]) + 7.97526041715173e-3*(u[t0][x + 8][z + 7] - u[t0][x + 8][z + 10]) + 1.1962890625e-1*(-u[t0][x + 8][z + 8] + u[t0][x + 8][z + 9]);
          r25[x + 4][z + 4] = 6.97544642889625e-5*(u[t0][x + 5][z + 8] - u[t0][x + 12][z + 8]) + 9.5703125007276e-4*(-u[t0][x + 6][z + 8] + u[t0][x + 11][z + 8]) + 7.97526041715173e-3*(u[t0][x + 7][z + 8] - u[t0][x + 10][z + 8]) + 1.1962890625e-1*(-u[t0][x + 8][z + 8] + u[t0][x + 9][z + 8]);
        }
      }
    }
    #pragma omp parallel num_threads(nthreads)
    {
      #pragma omp for collapse(1) schedule(dynamic,1)
      for (int x = x_m; x <= x_M; x += 1)
      {
        #pragma omp simd aligned(b,m,u,wOverQ_025:32)
        for (int z = z_m; z <= z_M; z += 1)
        {
          double r6 = 2.1*wOverQ_025[x + 8][z + 8] - 1;
          double r7 = 2 - 2.1*wOverQ_025[x + 8][z + 8];
          u[t1][x + 8][z + 8] = r6*u[t2][x + 8][z + 8] + r7*u[t0][x + 8][z + 8] + (4.41*(m[x + 8][z + 8]*m[x + 8][z + 8])*(6.97544642889625e-5*(b[x + 4][z + 8]*r25[x][z + 4] + b[x + 8][z + 4]*r24[x + 4][z] - b[x + 8][z + 11]*r24[x + 4][z + 7] - b[x + 11][z + 8]*r25[x + 7][z + 4]) + 9.5703125007276e-4*(-b[x + 5][z + 8]*r25[x + 1][z + 4] - b[x + 8][z + 5]*r24[x + 4][z + 1] + b[x + 8][z + 10]*r24[x + 4][z + 6] + b[x + 10][z + 8]*r25[x + 6][z + 4]) + 7.97526041715173e-3*(b[x + 6][z + 8]*r25[x + 2][z + 4] + b[x + 8][z + 6]*r24[x + 4][z + 2] - b[x + 8][z + 9]*r24[x + 4][z + 5] - b[x + 9][z + 8]*r25[x + 5][z + 4]) + 1.1962890625e-1*(-b[x + 7][z + 8]*r25[x + 3][z + 4] - b[x + 8][z + 7]*r24[x + 4][z + 3] + b[x + 8][z + 8]*r24[x + 4][z + 4] + b[x + 8][z + 8]*r25[x + 4][z + 4])))/b[x + 8][z + 8];
        }
      }
    }
    /* End section0 */
    gettimeofday(&end_section0, NULL);
    timers->section0 += (double)(end_section0.tv_sec-start_section0.tv_sec)+(double)(end_section0.tv_usec-start_section0.tv_usec)/1000000;
    struct timeval start_section1, end_section1;
    gettimeofday(&start_section1, NULL);
    /* Begin section1 */
    #pragma omp parallel num_threads(nthreads_nonaffine)
    {
      int chunk_size = (int)(fmax(1, (1.0F/3.0F)*(p_src_M - p_src_m + 1)/nthreads_nonaffine));
      #pragma omp for collapse(1) schedule(dynamic,chunk_size)
      for (int p_src = p_src_m; p_src <= p_src_M; p_src += 1)
      {
        int ii_src_0 = (int)(floor(-1.0e-1*o_x + 1.0e-1*src_coords[p_src][0]));
        int ii_src_1 = (int)(floor(-1.0e-1*o_z + 1.0e-1*src_coords[p_src][1]));
        int ii_src_2 = (int)(floor(-1.0e-1*o_z + 1.0e-1*src_coords[p_src][1])) + 1;
        int ii_src_3 = (int)(floor(-1.0e-1*o_x + 1.0e-1*src_coords[p_src][0])) + 1;
        double px = (double)(-o_x - 1.0e+1*(int)(floor(-1.0e-1*o_x + 1.0e-1*src_coords[p_src][0])) + src_coords[p_src][0]);
        double pz = (double)(-o_z - 1.0e+1*(int)(floor(-1.0e-1*o_z + 1.0e-1*src_coords[p_src][1])) + src_coords[p_src][1]);
        if (ii_src_0 >= x_m - 1 && ii_src_1 >= z_m - 1 && ii_src_0 <= x_M + 1 && ii_src_1 <= z_M + 1)
        {
          double r0 = 4.41*(m[ii_src_0 + 8][ii_src_1 + 8]*m[ii_src_0 + 8][ii_src_1 + 8])*(1.0e-2*px*pz - 1.0e-1*px - 1.0e-1*pz + 1)*src[time][p_src]/b[ii_src_0 + 8][ii_src_1 + 8];
          #pragma omp atomic update
          u[t1][ii_src_0 + 8][ii_src_1 + 8] += r0;
        }
        if (ii_src_0 >= x_m - 1 && ii_src_2 >= z_m - 1 && ii_src_0 <= x_M + 1 && ii_src_2 <= z_M + 1)
        {
          double r1 = 4.41*(m[ii_src_0 + 8][ii_src_2 + 8]*m[ii_src_0 + 8][ii_src_2 + 8])*(-1.0e-2*px*pz + 1.0e-1*pz)*src[time][p_src]/b[ii_src_0 + 8][ii_src_2 + 8];
          #pragma omp atomic update
          u[t1][ii_src_0 + 8][ii_src_2 + 8] += r1;
        }
        if (ii_src_1 >= z_m - 1 && ii_src_3 >= x_m - 1 && ii_src_1 <= z_M + 1 && ii_src_3 <= x_M + 1)
        {
          double r2 = 4.41*(m[ii_src_3 + 8][ii_src_1 + 8]*m[ii_src_3 + 8][ii_src_1 + 8])*(-1.0e-2*px*pz + 1.0e-1*px)*src[time][p_src]/b[ii_src_3 + 8][ii_src_1 + 8];
          #pragma omp atomic update
          u[t1][ii_src_3 + 8][ii_src_1 + 8] += r2;
        }
        if (ii_src_2 >= z_m - 1 && ii_src_3 >= x_m - 1 && ii_src_2 <= z_M + 1 && ii_src_3 <= x_M + 1)
        {
          double r3 = 4.41e-2*px*pz*(m[ii_src_3 + 8][ii_src_2 + 8]*m[ii_src_3 + 8][ii_src_2 + 8])*src[time][p_src]/b[ii_src_3 + 8][ii_src_2 + 8];
          #pragma omp atomic update
          u[t1][ii_src_3 + 8][ii_src_2 + 8] += r3;
        }
      }
    }
    /* End section1 */
    gettimeofday(&end_section1, NULL);
    timers->section1 += (double)(end_section1.tv_sec-start_section1.tv_sec)+(double)(end_section1.tv_usec-start_section1.tv_usec)/1000000;
    struct timeval start_section2, end_section2;
    gettimeofday(&start_section2, NULL);
    /* Begin section2 */
    #pragma omp parallel num_threads(nthreads_nonaffine)
    {
      int chunk_size = (int)(fmax(1, (1.0F/3.0F)*(p_rec_M - p_rec_m + 1)/nthreads_nonaffine));
      #pragma omp for collapse(1) schedule(dynamic,chunk_size)
      for (int p_rec = p_rec_m; p_rec <= p_rec_M; p_rec += 1)
      {
        int ii_rec_0 = (int)(floor(-1.0e-1*o_x + 1.0e-1*rec_coords[p_rec][0]));
        int ii_rec_1 = (int)(floor(-1.0e-1*o_z + 1.0e-1*rec_coords[p_rec][1]));
        int ii_rec_2 = (int)(floor(-1.0e-1*o_z + 1.0e-1*rec_coords[p_rec][1])) + 1;
        int ii_rec_3 = (int)(floor(-1.0e-1*o_x + 1.0e-1*rec_coords[p_rec][0])) + 1;
        double px = (double)(-o_x - 1.0e+1*(int)(floor(-1.0e-1*o_x + 1.0e-1*rec_coords[p_rec][0])) + rec_coords[p_rec][0]);
        double pz = (double)(-o_z - 1.0e+1*(int)(floor(-1.0e-1*o_z + 1.0e-1*rec_coords[p_rec][1])) + rec_coords[p_rec][1]);
        double sum = 0.0;
        if (ii_rec_0 >= x_m - 1 && ii_rec_1 >= z_m - 1 && ii_rec_0 <= x_M + 1 && ii_rec_1 <= z_M + 1)
        {
          sum += (1.0e-2*px*pz - 1.0e-1*px - 1.0e-1*pz + 1)*u[t1][ii_rec_0 + 8][ii_rec_1 + 8];
        }
        if (ii_rec_0 >= x_m - 1 && ii_rec_2 >= z_m - 1 && ii_rec_0 <= x_M + 1 && ii_rec_2 <= z_M + 1)
        {
          sum += (-1.0e-2*px*pz + 1.0e-1*pz)*u[t1][ii_rec_0 + 8][ii_rec_2 + 8];
        }
        if (ii_rec_1 >= z_m - 1 && ii_rec_3 >= x_m - 1 && ii_rec_1 <= z_M + 1 && ii_rec_3 <= x_M + 1)
        {
          sum += (-1.0e-2*px*pz + 1.0e-1*px)*u[t1][ii_rec_3 + 8][ii_rec_1 + 8];
        }
        if (ii_rec_2 >= z_m - 1 && ii_rec_3 >= x_m - 1 && ii_rec_2 <= z_M + 1 && ii_rec_3 <= x_M + 1)
        {
          sum += 1.0e-2*px*pz*u[t1][ii_rec_3 + 8][ii_rec_2 + 8];
        }
        rec[time][p_rec] = sum;
      }
    }
    /* End section2 */
    gettimeofday(&end_section2, NULL);
    timers->section2 += (double)(end_section2.tv_sec-start_section2.tv_sec)+(double)(end_section2.tv_usec-start_section2.tv_usec)/1000000;
  }
  free(r24);
  free(r25);
  return 0;
}

Run the operator for the Q=25 and Q=100 models

By setting Devito logging configuration['log-level'] = 'DEBUG' we have enabled output of statistics related to the performance of the operator, which you will see below when the operator runs.

We will run the Operator once with the Q model as defined wOverQ_025, and then run a second time passing the wOverQ_100 Q model. For the second run with the different Q model, we take advantage of the placeholder design patten in the Devito Operator.

For more information on this see the FAQ entry.


In [15]:
#NBVAL_INGNORE_OUTPUT

# Run the operator for the Q=25 model
print("m          min/max; %+12.6e %+12.6e" % (np.min(m.data), np.max(m.data)))
print("b          min/max; %+12.6e %+12.6e" % (np.min(b.data), np.max(b.data)))
print("wOverQ_025 min/max; %+12.6e %+12.6e" % (np.min(wOverQ_025.data), np.max(wOverQ_025.data)))
print("wOverQ_100 min/max; %+12.6e %+12.6e" % (np.min(wOverQ_100.data), np.max(wOverQ_100.data)))
print(time_range)
u.data[:] = 0
op(time=time_range.num-1)
# summary = op(time=time_range.num-1, h_x=dx, h_z=dz, dt=dt)

# Save the Q=25 results and run the Q=100 case
import copy
pQ25 = copy.copy(u)
recQ25 = copy.copy(rec)

u.data[:] = 0
op(time=time_range.num-1, wOverQ_025=wOverQ_100)

print("Q= 25 receiver data min/max; %+12.6e %+12.6e" %\
      (np.min(recQ25.data[:]), np.max(recQ25.data[:])))
print("Q=100 receiver data min/max; %+12.6e %+12.6e" %\
      (np.min(rec.data[:]), np.max(rec.data[:])))


Operator `Kernel` fetched `/tmp/devito-jitcache-uid5138/d021f63181ed48b2c1ce476d6cd653fd20ee2cf3.c` in 0.02 s from jit-cache
m          min/max; +1.500000e+00 +1.500000e+00
b          min/max; +1.000000e+00 +1.000000e+00
wOverQ_025 min/max; +2.513274e-03 +6.283185e-01
wOverQ_100 min/max; +6.283185e-04 +6.283185e-01
TimeAxis: start=0, stop=2001.3, step=2.1, num=954
Operator `Kernel` run in 32.93 s
Global performance indicators
  * Achieved 0.02 FD-GPts/s
Local performance indicators
  * section0<<953,858,858>,<953,851,851>> with OI=0.94 computed in 18.67 s [2.53 GFlops/s, 0.04 GPts/s]
  * section1<<953,1>,<953,1>,<953,1>,<953,1>,<953,1>> with OI=1.93 computed in 8.61 s [0.01 GFlops/s, 0.01 GPts/s]
  * section2<<953,751>,<953,751>,<953,751>,<953,751>,<953,751>,<953,751>> with OI=2.42 computed in 5.65 s [0.01 GFlops/s]
Performance[mode=advanced] arguments: {'nthreads': 16, 'nthreads_nonaffine': 16}
Allocating memory for u(3, 867, 867)
Allocating memory for rec(954, 751)
Allocating memory for rec_coords(751, 2)
Operator `Kernel` run in 32.72 s
Global performance indicators
  * Achieved 0.02 FD-GPts/s
Local performance indicators
  * section0<<953,858,858>,<953,851,851>> with OI=0.94 computed in 18.41 s [2.57 GFlops/s, 0.04 GPts/s]
  * section1<<953,1>,<953,1>,<953,1>,<953,1>,<953,1>> with OI=1.93 computed in 8.71 s [0.01 GFlops/s, 0.01 GPts/s]
  * section2<<953,751>,<953,751>,<953,751>,<953,751>,<953,751>,<953,751>> with OI=2.42 computed in 5.60 s [0.01 GFlops/s]
Performance[mode=advanced] arguments: {'nthreads': 16, 'nthreads_nonaffine': 16}
Q= 25 receiver data min/max; -2.184589e+01 +4.205808e+01
Q=100 receiver data min/max; -2.200673e+01 +4.218462e+01

Plot the computed Q=25 and Q=100 wavefields


In [16]:
#NBVAL_INGNORE_OUTPUT

# Plot the two wavefields, normalized to Q=100 (the larger amplitude)
amax_Q25  = 1.0 * np.max(np.abs(pQ25.data[1,:,:]))
amax_Q100 = 1.0 * np.max(np.abs(u.data[1,:,:]))
print("amax Q= 25; %12.6f" % (amax_Q25))
print("amax Q=100; %12.6f" % (amax_Q100))

plt.figure(figsize=(12,8))

plt.subplot(1, 2, 1)
plt.imshow(np.transpose(pQ25.data[1,:,:] / amax_Q100), cmap="seismic", 
           vmin=-1, vmax=+1, extent=plt_extent)
plt.colorbar(orientation='horizontal', label='Amplitude')
plt.plot([origin[0], origin[0], extent[0], extent[0], origin[0]],
         [origin[1], extent[1], extent[1], origin[1], origin[1]],
         'black', linewidth=4, linestyle=':', label="Absorbing Boundary")
plt.plot(src.coordinates.data[:, 0], src.coordinates.data[:, 1], \
         'red', linestyle='None', marker='*', markersize=15, label="Source")
plt.xlabel("X Coordinate (m)")
plt.ylabel("Z Coordinate (m)")
plt.title("Data for $Q=25$ model")

plt.subplot(1, 2, 2)
plt.imshow(np.transpose(u.data[1,:,:] / amax_Q100), cmap="seismic",
           vmin=-1, vmax=+1, extent=plt_extent)
plt.colorbar(orientation='horizontal', label='Amplitude')
plt.plot([origin[0], origin[0], extent[0], extent[0], origin[0]],
         [origin[1], extent[1], extent[1], origin[1], origin[1]],
         'black', linewidth=4, linestyle=':', label="Absorbing Boundary")
plt.plot(src.coordinates.data[:, 0], src.coordinates.data[:, 1], \
         'red', linestyle='None', marker='*', markersize=15, label="Source")
plt.xlabel("X Coordinate (m)")
plt.ylabel("Z Coordinate (m)")
plt.title("Data for $Q=100$ model")

plt.tight_layout()
None


amax Q= 25;     0.156892
amax Q=100;     0.940393

Plot the computed Q=25 and Q=100 receiver gathers


In [17]:
#NBVAL_INGNORE_OUTPUT

# Plot the two receiver gathers, normalized to Q=100 (the larger amplitude)
amax_Q25  = 0.1 * np.max(np.abs(recQ25.data[:]))
amax_Q100 = 0.1 * np.max(np.abs(rec.data[:]))
print("amax Q= 25; %12.6f" % (amax_Q25))
print("amax Q=100; %12.6f" % (amax_Q100))

plt.figure(figsize=(12,8))

plt.subplot(1, 2, 1)
plt.imshow(recQ25.data[:,:] / amax_Q100, cmap="seismic", 
           vmin=-1, vmax=+1, extent=plt_extent, aspect="auto")
plt.colorbar(orientation='horizontal', label='Amplitude')
plt.xlabel("X Coordinate (m)")
plt.ylabel("Z Coordinate (m)")
plt.title("Receiver gather for $Q=25$ model")

plt.subplot(1, 2, 2)
plt.imshow(rec.data[:,:] / amax_Q100, cmap="seismic",
           vmin=-1, vmax=+1, extent=plt_extent, aspect="auto")
plt.colorbar(orientation='horizontal', label='Amplitude')
plt.xlabel("X Coordinate (m)")
plt.ylabel("Z Coordinate (m)")
plt.title("Receiver gather for $Q=100$ model")

plt.tight_layout()
None


amax Q= 25;     4.205808
amax Q=100;     4.218462

Show the output from Devito solving for the stencil

Note this takes a long time ... about 50 seconds, but obviates the need to solve for the time update expression as we did above.

If you would like to see the time update equation as generated by Devito symbolic optimization, uncomment the lines for the solve below.


In [18]:
#NBVAL_INGNORE_OUTPUT

# Define the partial_differential equation
# Note the backward shifted time derivative is obtained via u.dt(x0=t-0.5*t.spacing) 
pde = (b / m**2) * (wOverQ_100 * u.dt(x0=t-0.5*t.spacing) + u.dt2) -\
        (b * u.dx(x0=x+0.5*x.spacing)).dx(x0=x-0.5*x.spacing) -\
        (b * u.dz(x0=z+0.5*z.spacing)).dz(x0=z-0.5*z.spacing)

# Uncomment the next 5 lines to see the equation as generated by Devito
# t1 = timer()
# stencil = Eq(u.forward, solve(pde, u.forward))
# t2 = timer()
# print("solve ran in %.4f seconds." % (t2-t1)) 
# stencil

Discussion

This concludes the implementation of the nonlinear forward operator. This series continues in the next notebook that describes the implementation of the Jacobian linearized forward and adjoint operators.

ssa_02_iso_implementation2.ipynb

References