Uncertainty analysis of a 2-D slice model

Possible paper titles:

Posterior analysis of geological models/ geophysical inversions? with information theory

...with measures from information theory

...with information theoretic measures

...with information measures

...with information

Ensemble analysis...

Posterior and ensemble analysis...

(Note: reference to posterior analysis, e.g. in Tarantola paper!)

Include:

  • Analysis of pynoddy models:
    • simple example of graben (because the uncertainty reduction is so counter-intuitive),
    • and maybe more complex example of folding structure?
  • Analysis of object modelling results? (i.e. the typical "Stanford channels"? but: only two outcomes, so not so meaningful... better example?)
  • Analysis of posterior ensemble from geophysical inversion (Greenstone model? Other examples from Mark & Mark?)

Journal? Math. Geo? Tectonophysics (note: relevance to structural geological models!)

Include theory: error bounds on information measures!


In [ ]:
%matplotlib inline

In [2]:
# here the usual imports. If any of the imports fails, 
# make sure that pynoddy is installed
# properly, ideally with 'python setup.py develop' 
# or 'python setup.py install'
import sys, os
import matplotlib.pyplot as plt
import numpy as np
# adjust some settings for matplotlib
from matplotlib import rcParams
# print rcParams
rcParams['font.size'] = 15
# determine path of repository to set paths correctly below
repo_path = os.path.realpath('../..')
sys.path.append('../..')
import pynoddy
import importlib
importlib.reload(pynoddy)
import pynoddy.history
import pynoddy.experiment
importlib.reload(pynoddy.experiment)
rcParams.update({'font.size': 15})

In [3]:
pynoddy.history.NoddyHistory(history="typeb.his")


Out[3]:
************************************************************
			Model Information
************************************************************

This model consists of 3 events:
	(1) - STRATIGRAPHY
	(2) - FOLD
	(3) - FOLD
The model extent is:
	x - 10000.0 m
	y - 7000.0 m
	z - 5000.0 m
Number of cells in each direction:
	nx = 50
	ny = 35
	nz = 25
The model origin is located at: 
	(0.0, 0.0, 5000.0)
The cubesize for model export is: 
	200 m


************************************************************
			Meta Data
************************************************************

The filename of the model is:
	 typeB.his
It was last saved (if origin was a history file!) at:
	 3/4/1997 17:30:4

Model set-up

In the following, we will use a model from the "Atlas of Structural Geophysics" as an example model.


In [4]:
# from pynoddy.experiment import monte_carlo
model_url = 'http://tectonique.net/asg/ch3/ch3_7/his/typeb.his'
ue = pynoddy.experiment.Experiment(history="typeb.his")
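
If the history file is not yet present locally, it could first be fetched from the URL above; a minimal sketch (an assumed convenience step, not part of the original workflow):

In [ ]:
# fetch the history file if it is not yet available locally
# (assumption: model_url defined above is reachable)
import urllib.request
if not os.path.exists("typeb.his"):
    urllib.request.urlretrieve(model_url, "typeb.his")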

In [5]:
import sys
if sys.version_info[0] < 3:
    raise "Must be using Python 3"
print(sys.version_info[1])


6

In [6]:
ue.change_cube_size(100)
sec = ue.get_section('y')

In [7]:
with open("typeb.his") as f_his:
    tmp = f_his.readlines()

In [ ]:

BUG!!!!

Note: there is either a bug in pynoddy or in Noddy itself, but the slice plotting method fails: instead of a slice, the entire model is computed (and the extent is also not correct!). Check with Mark in the course of paper prep!


In [ ]:
sec.block.shape

In [8]:
ue.plot_section('y')


../../pynoddy/output.py:473: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  section_slice = data[:,cell_pos,:].transpose()

In [9]:
plt.imshow(sec.block[:,50,:].transpose(), origin = 'lower', interpolation = 'none')


Out[9]:
<matplotlib.image.AxesImage at 0x11c0757f0>

In [ ]:
tmp = sec.block[:,50,:]
tmp.shape

In [10]:
ue.set_random_seed(12345)

In [11]:
ue.info(events_only = True)


This model consists of 3 events:
	(1) - STRATIGRAPHY
	(2) - FOLD
	(3) - FOLD

We now define the parameter uncertainties:


In [12]:
param_stats = [{'event' : 2, 
              'parameter': 'Amplitude',
              'stdev': 100.0,
              'type': 'normal'},
              {'event' : 2, 
              'parameter': 'Wavelength',
              'stdev': 500.0,
              'type': 'normal'},
              {'event' : 2, 
              'parameter': 'X',
              'stdev': 500.0,
              'type': 'normal'}]

ue.set_parameter_statistics(param_stats)

And, as a next step, perform the model sampling:


In [17]:
ue.set_random_seed(112358)
# perform random sampling
resolution = 100
sec = ue.get_section('y')
n_draws = 100
tmp = sec.block[:,50,:]
#
# Note: setting the dtype to 'int8' significantly reduces file size!
#
model_sections = np.empty((n_draws, tmp.shape[0], tmp.shape[1]), dtype='int8')

In [18]:
for i in range(n_draws):
    ue.random_draw()
    tmp_sec = ue.get_section('y', resolution = resolution, 
                             remove_tmp_files = True)
    model_sections[i,:,:] = tmp_sec.block[:,50,:]

Save the model data for later re-use (e.g. to extend the data set):


In [ ]:
import pickle

In [ ]:
model_sections.dtype

In [ ]:


In [140]:
import pickle
with open("model_sections_tmp_int.pkl", 'wb') as f_out:
    pickle.dump(model_sections, f_out)

In [ ]:
with open("model_sections_tmp_int.pkl", 'rb') as f_in:
    model_sections = pickle.load(f_in)
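
A minimal sketch of the intended extension step, reusing the sampling loop from above (assumption: ue, resolution, and tmp are still defined as in the preceding cells):

In [ ]:
# extend the previously saved ensemble with additional draws
n_extra = 50
extra = np.empty((n_extra, tmp.shape[0], tmp.shape[1]), dtype='int8')
for i in range(n_extra):
    ue.random_draw()
    tmp_sec = ue.get_section('y', resolution = resolution,
                             remove_tmp_files = True)
    extra[i,:,:] = tmp_sec.block[:,50,:]
model_sections = np.concatenate([model_sections, extra], axis=0)
# and save the extended ensemble again:
with open("model_sections_tmp_int.pkl", 'wb') as f_out:
    pickle.dump(model_sections, f_out)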

Testing parallel execution

As a next step, use parallel execution of the Noddy calculation. This should be relatively simple; note, however, that tmp files may potentially be overwritten! This may actually require an adaptation of compute_model to run in a tmp folder...

Anyway, as a first step, here is an example of parallel execution, adapted from

http://sebastianraschka.com/Articles/2014_multiprocessing.html#the-process-class


In [ ]:
#
# Store current directory to get back from temporary files
#
ori_dir = os.getcwd()

In [46]:
import multiprocessing as mp
import random
import string

random.seed(123)

# Define an output queue
output = mp.Queue()

# define an example function
def rand_string(length, output):
    """ Generates a random string of numbers, lower- and uppercase chars. """
    rand_str = ''.join(random.choice(
                        string.ascii_lowercase
                        + string.ascii_uppercase
                        + string.digits)
                   for i in range(length))
    output.put(rand_str)

# Setup a list of processes that we want to run
processes = [mp.Process(target=rand_string, args=(5, output)) for x in range(4)]

# Run processes
for p in processes:
    p.start()

# Exit the completed processes
for p in processes:
    p.join()

# Get process results from the output queue
results = [output.get() for p in processes]

print(results)


['giEvN', 'MvYbL', 'WjpE4', 'Y2dJy']

Adapt model generation to use temp directory:


In [ ]:
import tempfile
import shutil

In [104]:
# Define function to perform one iteration
# Execute iterations in temporary directories to avoid overlap
#
# Note: needs to take output as argument to add results
def compute_iter(ue, output):
    ue.random_draw()
    # create a temporary directory for this iteration (note: mkdtemp does
    # not delete the directory afterwards, and the cwd is not restored):
    dirpath = tempfile.mkdtemp()
    os.chdir(dirpath)
    tmp_sec = ue.get_section('y', resolution = 100, 
                             remove_tmp_files = True)
    output.put(tmp_sec.block[:,50,:])

    
# Note: for the 'pool' method, an output queue is not needed, as results
# are returned directly. Use the `with` context manager to ensure that
# the temporary directory is deleted afterwards:
def compute_iter_pool(ue):
    from tempfile import TemporaryDirectory
    with TemporaryDirectory() as temp_dir:
        os.chdir(temp_dir)
        # re-seed from OS entropy so that each worker process
        # generates different random draws:
        np.random.seed()
        ue.random_draw()
        tmp_sec = ue.get_section('y', resolution = 100, 
                                 remove_tmp_files = True)
    # note: the cwd now points to the deleted temp dir;
    # the calling code has to chdir back (see cells below)
    return tmp_sec.block[:,50,:]

In [103]:
np.random.seed()
np.random.randint(10)


Out[103]:
6

Try the compute_iter_pool function in the normal (non-parallel) framework:


In [39]:
ori_dir = "/Users/Shared/git/pynoddy/docs/notebooks/"
os.chdir("/Users/Shared/git/pynoddy/docs/notebooks/") 
ue.set_random_seed(112358)
# perform random sampling
resolution = 100
sec = ue.get_section('y')

tmp = sec.block[:,50,:]
n_draws = 100
#
# Note: setting the dtype to 'int8' significantly reduces file size!
#
model_sections = np.empty((n_draws, tmp.shape[0], tmp.shape[1]), dtype='int8')

#
#
for i in range(n_draws):
    model_sections[i,:,:] = compute_iter_pool(ue)

os.chdir("/Users/Shared/git/pynoddy/docs/notebooks/")

In [40]:
model_sections.shape


Out[40]:
(100, 100, 40)

In [ ]:


In [ ]:
compute_iter_pool(ue)
os.chdir(ori_dir)

In [ ]:
!pwd

Trying with pool


In [ ]:
def cube(x):
    return x**3

pool = mp.Pool(processes=4)
results = [pool.apply(cube, args=(x,)) for x in range(1,7)]
print(results)

In [ ]:
pool = mp.Pool(processes=4)
results = [pool.apply_async(cube, args=(x,)) for x in range(1,7)]
output = [p.get() for p in results]
print(output)

Executing Noddy MC in parallel

After a bit of fiddling around, it seems to work now:


In [81]:
import copy

In [ ]:
ue.set_random_seed(112358)
# perform random sampling
resolution = 100
sec = ue.get_section('y')

tmp = sec.block[:,50,:]
n_draws = 100000
#
# Note: setting the dtype to 'int8' significantly reduces file size!
#
model_sections = np.empty((n_draws, tmp.shape[0], tmp.shape[1]), dtype='int8')

result_list = []
def log_result(result):
    # This is called whenever compute_iter_pool returns a result;
    # result_list is modified only by the main process, not the pool workers.
    result_list.append(result)

pool = mp.Pool(processes=20)
for i in range(n_draws):
    # pool.apply_async(foo_pool, args = (i, ), callback = log_result)
    # try to use copy, but this did not fix the problem, unfortunately...
    # ue_copy = copy.deepcopy(ue)
    pool.apply_async(compute_iter_pool, args=(ue,), callback = log_result)
pool.close()
pool.join()

# model_sections = np.array([pool.apply(compute_iter_pool, args=(ue,)) for x in range(4)])

ori_dir = "/Users/Shared/git/pynoddy/docs/notebooks/"
os.chdir(ori_dir)

In [143]:
model_sections = np.array(result_list)
model_sections.shape


Out[143]:
(10000, 100, 40)

In [144]:
import pickle
with open("model_sections_tmp_int.pkl", 'wb') as f_out:
    pickle.dump(model_sections, f_out)

In [129]:
model_sections[0,:,:]


Out[129]:
array([[ 5.,  5.,  4., ...,  1.,  1.,  1.],
       [ 5.,  4.,  4., ...,  1.,  1.,  1.],
       [ 5.,  4.,  4., ...,  1.,  1.,  1.],
       ..., 
       [ 3.,  2.,  2., ...,  1.,  1.,  1.],
       [ 3.,  2.,  2., ...,  1.,  1.,  1.],
       [ 3.,  3.,  2., ...,  1.,  1.,  1.]])

In [130]:
plt.imshow(model_sections[1,:,:].transpose()-model_sections[0,:,:].transpose(), origin = 'lower', interpolation = 'none')


Out[130]:
<matplotlib.image.AxesImage at 0x11d9f3208>

In [131]:
ori_dir = os.getcwd()
print(ori_dir)


/Users/Shared/git/pynoddy/docs/notebooks

Use hspace to calculate joint entropies

Adjusted notebook: use the hspace package to calculate information-theoretic measures


In [132]:
sys.path.append('/Users/Shared/git/hspace')

In [133]:
import hspace.measures
importlib.reload(hspace.measures)


Out[133]:
<module 'hspace.measures' from '/Users/Shared/git/hspace/hspace/measures.py'>

In [134]:
model_sections.shape


Out[134]:
(1000, 100, 40)

Calculation of cell information entropy

(Include note on: theory of entropy calculation)

(Include in this paper: estimates on error bounds?)
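
(One candidate, noted here as an assumption rather than a settled choice: the plug-in entropy estimate from N samples with K observed outcomes is biased downward; the first-order Miller-Madow correction adds the estimated bias back, H_MM = H_plugin + (K - 1) / (2 N ln 2) bits.)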

Here is now the general function to calculate entropy from a data array. What we will need to do later is to pass all results at a single position as a "data array", so that we can then estimate the information entropy at this position.

This function already expects a sorted array as an input and then uses the (ultra-fast) switchpoint method to calculate entropy:

The algorithm works on the simple idea that we do not explicitly require the individual outcomes at each location, but only their relative probabilities. This may not matter much for single (univariate) entropy estimates, but it matters a lot for multivariate cases, because we do not need to check all possible outcomes. Note that all outcomes with zero probability simply never appear in the sorted array (and they play no role in the calculation of the entropy anyway), and that's exactly what we want to have!
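
As an illustration of this idea, here a minimal sketch of such a switchpoint-based estimate (just the principle, not the actual hspace implementation):

In [ ]:
def switchpoint_entropy(sorted_vals):
    """Entropy (in bits) of an already sorted 1-D array of discrete outcomes.

    Outcome counts are derived from the indices where the sorted array
    changes value (the "switchpoints"); outcomes with zero probability
    never appear and are thus ignored automatically."""
    n = len(sorted_vals)
    # start indices of each run of identical values:
    switch = np.flatnonzero(np.diff(sorted_vals)) + 1
    # run lengths = differences between consecutive switchpoints:
    counts = np.diff(np.concatenate(([0], switch, [n])))
    p = counts / n
    return -np.sum(p * np.log2(p))

# e.g. entropy of the ensemble at a single cell:
# switchpoint_entropy(np.sort(model_sections[:, 50, 30]))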

In this new version, we use the implementation in the hspace package:


In [145]:
hspace.measures.joint_entropy(model_sections[:,50,:], [0,1])


Out[145]:
13.1012216033389

In [146]:
h = np.empty_like(model_sections[0,:,:], dtype='float64')
for i in range(100):
    for j in range(40):
        h[i,j] = hspace.measures.joint_entropy(model_sections[:,i,j])
h[50,30]


Out[146]:
1.5427119961596838

We now visualise the cell information entropy, shown in Fig. (). We can clearly identify uncertain regions within this model section. It is interesting to note that we can mostly still identify the distinct layer boundaries within the fuzzy areas of uncertainty around their borders (note: highlight in Figure!). However, additional aspects of uncertainty are now introduced: (a) the uncertainty about the x-position of the folds (see parameters: event 2, parameter X) is now clearly visible, and (b) uncertainties now seem to concentrate on the fold hinges. However, this is not so clear in the left part of the model, where the fold hinge seems to be the least uncertain part. (Check why: is this where the fold is actually fixed, even though still uncertain? My current interpretation: the fold location is fixed somewhere near this point, and so the wavelength uncertainty does not play a significant role. Furthermore, the fold is quite "open" at this position (i.e. low angle between hinges), and therefore lateral shifts do not play a significant role.)


In [147]:
plt.imshow(h.transpose(), origin = 'lower',
          cmap = 'gray', interpolation = 'none')
plt.colorbar(orientation = 'horizontal')


Out[147]:
<matplotlib.colorbar.Colorbar at 0x1254a3b38>

Here, again, an example of a single model (adjust visualisation, and probably include something like a "table plot" of multiple images for the paper!):


In [148]:
plt.imshow(model_sections[1,:,:].transpose(), origin = 'lower', interpolation = 'none')


Out[148]:
<matplotlib.image.AxesImage at 0x1203f3d68>

And here the "mean" lithologies (note: not really a distinct meaning, simply showing the average litho ids - could be somehow interpreted as characteristic functions, though...).


In [149]:
plt.imshow(np.mean(model_sections, axis = 0).transpose(), origin = 'lower', interpolation = 'none')


Out[149]:
<matplotlib.image.AxesImage at 0x1207102b0>

And here, a bit more meaningful: the analysis of single-layer probabilities:


In [154]:
# step 1: estimate probabilities. Taking the mean of the boolean
# indicator array along the realisation axis directly gives the
# relative frequency of litho_id at each cell (this avoids the
# previous workaround with the ones multiplication):
litho_id = 2
prob = np.mean(model_sections == litho_id, axis = 0)

In [155]:
plt.imshow(prob.transpose(), 
           origin = 'lower', 
           interpolation = 'none', 
           cmap = 'gray_r')
plt.colorbar(orientation = 'horizontal')


Out[155]:
<matplotlib.colorbar.Colorbar at 0x1256a6438>
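
As a cross-check (a sketch, assuming the lithology ids are exactly the values occurring in the ensemble): the cell entropy h computed above should be recoverable from the per-lithology probability maps as -sum_k p_k * log2(p_k):

In [ ]:
# probability map for every lithology id in the ensemble:
litho_ids = np.unique(model_sections)
probs = np.stack([np.mean(model_sections == k, axis=0) for k in litho_ids])
# plug-in entropy from the probability maps; 0 * log2(0) is treated as 0:
with np.errstate(divide='ignore', invalid='ignore'):
    h_check = -np.nansum(probs * np.log2(probs), axis=0)
# h_check should match h up to floating-point error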

Idea: also include a "new" consideration: where to collect information to reduce the uncertainty of a single layer? This could be identified by reducing layer fuzziness, for example!

Or: what are the most likely positions/locations of a specific unit, given the collected information?

More ideas:
  • General methods to create simple representations in 1-D, 2-D?
  • Automatic probability plots (note: simple in 1-D, but also for slices in 2-D?) => Compare visualisations in previous notebooks!

In [ ]:

Analysis of multivariate conditional entropy

Later also:

  • "opposite" question, i.e.: if we would like to resolve uncertainty in a specific region: where to look best?
  • "complete" uncertainty (i.e.: joint entropy!)
  • greedy search for best spots for uncertainty reduction, simple (cell-wise), complex (related to potential drilling positions)
  • further ideas for "greedy search" to reduce uncertainties in a specific "search region" (i.e.: expected location of a deposit, etc.):
    • start with cell with highest (multivariate) mutual information
    • rank cells with highest I due to their own mutual information with other cells, which are not part of a defined "search region"
  • for simple, cell-wise: describe similarity to mapping! Maybe even a field example with data from Belgium? (But this is still one or two MSc theses away...)

For the joint entropy analysis, we now use the new lexicographic (correct term?) sorting algorithm, implemented in the module hspace:
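
In essence (a sketch of the principle, not of the actual hspace code), the joint outcomes at the selected cells are sorted lexicographically, so that identical outcome tuples become adjacent and can be counted in a single pass; np.unique over rows does exactly this:

In [ ]:
def joint_entropy_sketch(ensemble, positions):
    """Joint entropy (in bits) over the cells given as index pairs.

    ensemble: array of shape (n_models, nx, nz); positions: iterable
    of (i, j) index pairs. np.unique sorts the per-model outcome
    tuples lexicographically and counts each distinct tuple."""
    samples = np.stack([ensemble[:, i, j] for i, j in positions], axis=1)
    _, counts = np.unique(samples, axis=0, return_counts=True)
    p = counts / samples.shape[0]
    return -np.sum(p * np.log2(p))

# should agree with hspace.measures.joint_entropy(model_sections, pos)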


In [161]:
importlib.reload(hspace.measures)


Out[161]:
<module 'hspace.measures' from '/Users/Shared/git/hspace/hspace/measures.py'>

In [162]:
dx = 15
xvals = np.ones(dx, dtype=int) * 20
yvals = np.arange(11,11+dx, dtype=int)
pos = np.vstack([xvals, yvals])


hspace.measures.joint_entropy(model_sections, pos.T)


Out[162]:
3.44142984894791

In [163]:
# now: define position of "drill":
nx = 10
xvals = np.ones(nx, dtype=int) * 60
yvals = np.arange(39,39-nx, -1, dtype=int)
pos = np.vstack([xvals, yvals]).T

In [164]:
# determine joint entropy of drill_locs:
h_joint_drill = hspace.measures.joint_entropy(model_sections, pos)

In [165]:
# generate conditional entropies for entire section:
h_cond_drill = np.zeros_like(h)
for i in range(100):
    for j in range(40):
        # add new position to positions vector:
        pos_all = np.vstack([pos, np.array([i,j])])
        # determine joint entropy
        h_joint_loc = hspace.measures.joint_entropy(model_sections, pos_all)
        # subtract joint entropy of drill locs to obtain conditional entropy
        h_cond_drill[i,j] = h_joint_loc - h_joint_drill
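
This loop relies on the chain rule of entropy: for the drill cells D and a candidate cell X, H(X | D) = H(X, D) - H(D), so the conditional-entropy map is the joint entropy of each cell together with the drill cells, minus the (constant) joint entropy of the drill cells alone.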

In [166]:
plt.imshow(h_cond_drill.transpose(), origin = 'lower',
          cmap = 'gray', interpolation = 'none')
plt.colorbar(orientation = 'horizontal')
# plot drilling positions above it:
plt.plot(pos[:,0], pos[:,1], 'ws')
plt.xlim([0,100])
plt.ylim([0,40])


Out[166]:
(0, 40)

Try own "entropy colormap":


In [169]:
n_max = 5
h_max = np.max(h)
plt.imshow(h_cond_drill.transpose(), origin = 'lower',
          cmap = 'viridis', interpolation = 'none', vmax=h_max)
plt.colorbar(orientation = 'horizontal')
# half-step contour lines
contour_levels = np.log2(np.arange(1., n_max + 0.001, .5))
plt.contour(h_cond_drill.transpose(), contour_levels, colors = 'gray')
# superpose 1-step contour lines
contour_levels = np.log2(np.arange(1., n_max + 0.001, 1.))
plt.contour(h_cond_drill.transpose(), contour_levels, colors = 'white')

plt.plot(pos[:,0], pos[:,1], 'ws')
plt.xlim([0,99])
plt.ylim([0,39])


Out[169]:
(0, 39)

For comparison again: the entropy of the initial model:


In [170]:
plt.imshow(h.transpose(), origin = 'lower',
          cmap = 'viridis', interpolation = 'none', vmax=h_max)
plt.colorbar(orientation = 'horizontal')
# half-step contour lines
contour_levels = np.log2(np.arange(1., n_max + 0.001, .5))
plt.contour(h.transpose(), contour_levels, colors = 'gray')
# superpose 1-step contour lines
contour_levels = np.log2(np.arange(1., n_max + 0.001, 1.))
plt.contour(h.transpose(), contour_levels, colors = 'white')


Out[170]:
<matplotlib.contour.QuadContourSet at 0x125897668>

And the difference, for clarity:


In [171]:
plt.imshow((h - h_cond_drill).transpose(), origin = 'lower',
          cmap = 'viridis', interpolation = 'none')
plt.colorbar(orientation = 'horizontal')
# plot drilling positions above it:


plt.plot(pos[:,0], pos[:,1], 'ws')
plt.xlim([0,99])
plt.ylim([0,39])


Out[171]:
(0, 39)

Clearly, the highest reduction is in the area around the borehole, but interestingly, the uncertainty in other areas is also reduced! Note specifically the reduction of uncertainties in the two neighbouring fold hinges.

Let's check some other positions (and drilling "depths"):


In [172]:
# define position of "drill":
nx = 10
xvals = np.ones(nx, dtype=int) * 20
yvals = np.arange(39,39-nx, -1, dtype=int)
pos = np.vstack([xvals, yvals]).T

# determine joint entropy of drill_locs:
h_joint_drill = hspace.measures.joint_entropy(model_sections, pos)

We also just include one timing step to estimate the approximate simulation time:


In [178]:
import time
start = time.time()
pos_all = np.vstack([pos, np.array([50,20])])
h_joint_loc = hspace.measures.joint_entropy(model_sections, pos_all)
end = time.time()
# print(end-start)
# estimated total time:
ttime = 100 * 40 * (end-start)
print("Estimated total time: %.3f seconds or %.3f minutes" % (ttime, ttime/60.))


Estimated total time: 30.452 seconds or 0.508 minutes

In [175]:
# generate conditional entropies for entire section:
h_cond_drill = np.zeros_like(h)
for i in range(100):
    for j in range(40):
        # add position to locations
        pos_all = np.vstack([pos, np.array([i,j])])
        h_joint_loc = hspace.measures.joint_entropy(model_sections, pos_all)
        # subtract joint entropy of drill locs to obtain conditional entropy
        h_cond_drill[i,j] = h_joint_loc - h_joint_drill

In [179]:
plt.imshow((h - h_cond_drill).transpose(), origin = 'lower',
          cmap = 'RdBu', interpolation = 'none')
plt.colorbar(orientation = 'horizontal')
# plot drilling positions above it:
plt.plot(pos[:,0], pos[:,1], 'ws')
plt.xlim([0,99])
plt.ylim([0,39])


Out[179]:
(0, 39)

Interesting! Only a local reduction around the drilling position; however, it extends to the deeper layers as well! Why?

Drill deeper:


In [180]:
# define position of "drill":
nx = 30
xvals = np.ones(nx, dtype=int) * 20
yvals = np.arange(39,39-nx, -1, dtype=int)
pos = np.vstack([xvals, yvals]).T

# determine joint entropy of drill_locs:
h_joint_drill = hspace.measures.joint_entropy(model_sections, pos)

In [181]:
# generate conditional entropies for entire section:
h_cond_drill = np.zeros_like(h)
for i in range(100):
    for j in range(40):
        # add position to locations
        pos_all = np.vstack([pos, np.array([i,j])])
        h_joint_loc = hspace.measures.joint_entropy(model_sections, pos_all)
        # subtract joint entropy of drill locs to obtain conditional entropy
        h_cond_drill[i,j] = h_joint_loc - h_joint_drill

In [182]:
plt.imshow((h - h_cond_drill).transpose(), origin = 'lower',
          cmap = 'RdBu', interpolation = 'none')
plt.colorbar(orientation = 'horizontal')
# plot drilling positions above it:
plt.plot(pos[:,0], pos[:,1], 'ws')
plt.xlim([0,99])
plt.ylim([0,39])


Out[182]:
(0, 39)

In [183]:
plt.imshow(h.transpose(), origin = 'lower',
          cmap = 'gray', interpolation = 'none')
plt.colorbar(orientation = 'horizontal')


Out[183]:
<matplotlib.colorbar.Colorbar at 0x1255e05c0>

In [184]:
plt.imshow((h - h_cond_drill).transpose(), origin = 'lower',
          cmap = 'RdBu', interpolation = 'none')
plt.colorbar(orientation = 'horizontal')
# plot drilling positions above it:
plt.plot(pos[:,0], pos[:,1], 'ws')
plt.xlim([0,99])
plt.ylim([0,39])


Out[184]:
(0, 39)

Check deep "drilling" at pos 60


In [185]:
# define position of "drill":
nx = 30
xvals = np.ones(nx, dtype=int) * 60
yvals = np.arange(39,39-nx, -1, dtype=int)
pos = np.vstack([xvals, yvals]).T

# determine joint entropy of drill_locs:
h_joint_drill = hspace.measures.joint_entropy(model_sections, pos)

In [186]:
# generate conditional entropies for entire section:
h_cond_drill = np.zeros_like(h)
for i in range(100):
    for j in range(40):
        # add position to locations
        pos_all = np.vstack([pos, np.array([i,j])])
        h_joint_loc = hspace.measures.joint_entropy(model_sections, pos_all)
        # subtract joint entropy of drill locs to obtain conditional entropy
        h_cond_drill[i,j] = h_joint_loc - h_joint_drill

In [187]:
plt.imshow(h_cond_drill.transpose(), origin = 'lower',
          cmap = 'gray', interpolation = 'none')
plt.colorbar(orientation = 'horizontal')
# plot drilling positions above it:
plt.plot(pos[:,0], pos[:,1], 'ws')
plt.xlim([0,99])
plt.ylim([0,39])


Out[187]:
(0, 39)

In [188]:
plt.imshow((h - h_cond_drill).transpose(), origin = 'lower',
          cmap = 'RdBu', interpolation = 'none')
plt.colorbar(orientation = 'horizontal')
# plot drilling positions above it:
plt.plot(pos[:,0], pos[:,1], 'ws')
plt.xlim([0,99])
plt.ylim([0,39])


Out[188]:
(0, 39)

Interesting! And now both combined:


In [189]:
# define position of "drill":
nx = 30
xvals = np.hstack([np.ones(nx, dtype=int) * 60, np.ones(nx, dtype=int) * 30])
yvals = np.hstack([np.arange(39,39-nx, -1, dtype=int), np.arange(39,39-nx, -1, dtype=int)])
pos = np.vstack([xvals, yvals]).T

# determine joint entropy of drill_locs:
h_joint_drill = hspace.measures.joint_entropy(model_sections, pos)

In [190]:
%%timeit
h_joint_loc = hspace.measures.joint_entropy(model_sections, pos)


10 loops, best of 3: 48.4 ms per loop

In [191]:
# estimated total time, assuming ~0.2 ms per evaluation:
ttime = 100 * 40 * 0.0002
print("Estimated total time: %.3f seconds or %.3f minutes" % (ttime, ttime/60.))


Estimated total time: 0.800 seconds or 0.013 minutes

In [192]:
# generate conditional entropies for entire section:
h_cond_drill = np.zeros_like(h)
for i in range(100):
    for j in range(40):
        # add position to locations
        pos_all = np.vstack([pos, np.array([i,j])])
        h_joint_loc = hspace.measures.joint_entropy(model_sections, pos_all)
        # subtract joint entropy of drill locs to obtain conditional entropy
        h_cond_drill[i,j] = h_joint_loc - h_joint_drill

In [193]:
plt.imshow(h_cond_drill.transpose(), origin = 'lower',
          cmap = 'gray', interpolation = 'none')
plt.colorbar(orientation = 'horizontal')
# plot drilling positions above it:
plt.plot(pos[:,0], pos[:,1], 'ws')
plt.xlim([0,99])
plt.ylim([0,39])


Out[193]:
(0, 39)

In [194]:
plt.imshow((h - h_cond_drill).transpose(), origin = 'lower',
          cmap = 'RdBu', interpolation = 'none')
plt.colorbar(orientation = 'horizontal')
# plot drilling positions above it:
plt.plot(pos[:,0], pos[:,1], 'ws')
plt.xlim([0,99])
plt.ylim([0,39])


Out[194]:
(0, 39)

We can see that now only a part on the left remains with significant uncertainties. So, let's "drill" into this, as well:


In [195]:
# define position of "drill":
nx = 30
xvals = np.hstack([np.ones(nx, dtype=int) * 60, 
                   np.ones(nx, dtype=int) * 30,
                   np.ones(nx, dtype=int) * 5])
yvals = np.hstack([np.arange(39,39-nx, -1, dtype=int), 
                   np.arange(39,39-nx, -1, dtype=int), 
                   np.arange(39,39-nx, -1, dtype=int)])
pos = np.vstack([xvals, yvals]).T

# determine joint entropy of drill_locs:
h_joint_drill = hspace.measures.joint_entropy(model_sections, pos)

In [196]:
# generate conditional entropies for entire section:
h_cond_drill = np.zeros_like(h)
for i in range(100):
    for j in range(40):
        # add position to locations
        pos_all = np.vstack([pos, np.array([i,j])])
        h_joint_loc = hspace.measures.joint_entropy(model_sections, pos_all)
        # subtract joint entropy of drill locs to obtain conditional entropy
        h_cond_drill[i,j] = h_joint_loc - h_joint_drill

In [197]:
plt.imshow(h_cond_drill.transpose(), origin = 'lower',
          cmap = 'gray', interpolation = 'none')
plt.colorbar(orientation = 'horizontal')
# plot drilling positions above it:
plt.plot(pos[:,0], pos[:,1], 'ws')
plt.xlim([0,99])
plt.ylim([0,39])


Out[197]:
(0, 39)

In [198]:
plt.imshow((h - h_cond_drill).transpose(), origin = 'lower',
          cmap = 'RdBu', interpolation = 'none')
plt.colorbar(orientation = 'horizontal')
# plot drilling positions above it:
plt.plot(pos[:,0], pos[:,1], 'ws')
plt.xlim([0,99])
plt.ylim([0,39])


Out[198]:
(0, 39)

Additional idea to speed up computation (especially for higher multivariate examples): do not estimate the value at locations where the conditional entropy given a subset (i.e. one that has often been calculated previously) is already zero! (Check: theoretical reason/ justification!)
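
(A possible justification: conditioning never increases entropy, i.e. H(X | D) <= H(X | S) for any subset S of D. Hence, if the conditional entropy of a cell given an already-evaluated subset of the observation points is zero, it is also zero given the full set, and the cell can be skipped without computing the full joint entropy.)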


In [ ]: