Possible paper titles:
Posterior analysis of geological models/ geophysical inversions? with information theory
...with measures from information theory
...with information theoretic measures
...with information measures
...with information
Ensemble analysis...
Posterior and ensemble analysis...
(Note: reference to posterior analysis, e.g. in Tarantola paper!)
Include:
Journal? Math. Geo? Tectonophysics (note: relevance to strucural geological models!)
Include theory: error bounds on information measures!
In [230]:
from IPython.core.display import HTML
css_file = 'pynoddy.css'
HTML(open(css_file, "r").read())
Out[230]:
In [231]:
%matplotlib inline
In [242]:
# here the usual imports. If any of the imports fails,
# make sure that pynoddy is installed
# properly, ideally with 'python setup.py develop'
# or 'python setup.py install'
import sys, os
import matplotlib.pyplot as plt
import numpy as np
# adjust some settings for matplotlib
from matplotlib import rcParams
# print rcParams
rcParams['font.size'] = 15
# determine path of repository to set paths corretly below
repo_path = os.path.realpath('../..')
import pynoddy
reload(pynoddy)
import pynoddy.history
import pynoddy.experiment
reload(pynoddy.experiment)
rcParams.update({'font.size': 15})
In [243]:
from pynoddy.experiment import monte_carlo
model_url = 'http://tectonique.net/asg/ch3/ch3_7/his/typeb.his'
ue = pynoddy.experiment.Experiment(url = model_url)
In [244]:
ue.change_cube_size(100)
sec = ue.get_section('y')
In [245]:
sec.block.shape
Out[245]:
In [246]:
ue.plot_section('y')
In [247]:
plt.imshow(sec.block[:,50,:].transpose(), origin = 'lower left', interpolation = 'none')
Out[247]:
In [248]:
tmp = sec.block[:,50,:]
tmp.shape
Out[248]:
In [249]:
ue.set_random_seed(12345)
In [250]:
ue.info(events_only = True)
We now define the parameter uncertainties:
In [251]:
param_stats = [{'event' : 2,
'parameter': 'Amplitude',
'stdev': 100.0,
'type': 'normal'},
{'event' : 2,
'parameter': 'Wavelength',
'stdev': 500.0,
'type': 'normal'},
{'event' : 2,
'parameter': 'X',
'stdev': 500.0,
'type': 'normal'}]
ue.set_parameter_statistics(param_stats)
And, in a next step, perform the model sampling:
In [258]:
ue.set_random_seed(112358)
# perfrom random sampling
resolution = 100
sec = ue.get_section('y')
tmp = sec.block[:,50,:]
n_draws = 5000
model_sections = np.empty((n_draws, tmp.shape[0], tmp.shape[1]))
for i in range(n_draws):
ue.random_draw()
tmp_sec = ue.get_section('y', resolution = resolution,
remove_tmp_files = True)
model_sections[i,:,:] = tmp_sec.block[:,50,:]
Save the model data for later re-use (e.g. to extend the data set):
In [100]:
import pickle
In [260]:
f_out = open("model_sections_5k.pkl", 'w')
In [261]:
pickle.dump(model_sections, f_out)
(Include note on: theory of entropy calculation)
(Include in this paper: estimates on error bounds?)
Here now the function to calculate entropy from a data array in general. What we will need to do later is to pass all results at a single position as a "data array" and we can then estimate the information entropy at this position.
This function already expects a sorted array as an input and then uses the (ultra-fast) switchpoint method to calculate entropy:
In [201]:
def entropy(diff_array, n_samples):
"""Determine entropy from diffarray using switchpoints"""
switchpts = np.where(diff_array > 0 )[0] + 1
switchpts = np.append(0, switchpts)
switchpts = np.append(switchpts, n_samples)
pdiff = np.diff(switchpts)
j_prob = pdiff / float(n_samples)
# calculate entropy
h = 0.
for jp in j_prob:
h -= jp * np.log2(jp)
return h
The algorithm works on the simple idea that we do not explicitly require the single outputs at each location, but only the relative probability values. This may not matter too much for single entropy estimates (uni-variate), but it will matter a lot for multivariate cases, because we do not need to check all possible outcomes! Note that all outcomes with zero probability are simply not considered in the sorting algorithm (and they do not play any role in the calculation of the entropy, anyway), and that's exactly what we want to have!
In [328]:
# sort data (axis = 0: sort along model results!)
mssort = np.sort(model_sections, axis = 0)
# create difference matrix
mssort_diff = mssort[1:,:,:] - mssort[:-1,:,:]
n_samples = model_sections.shape[0]
# and now: for all!
h = np.empty_like(mssub[0,:,:])
for i in range(100):
for j in range(40):
h[i,j] = entropy(mssort_diff[:,i,j], n_samples)
h[50,30]
Out[328]:
We now visualise the cell information entropy, shown in Fig. (). We can here clearly identify uncertain regions within this model section. It is interesting to note that we can mostly still identify the distinct layer boundaries in the fuzzy areas of uncertainty around their borders (note: highlight in Figure!). However, additional aspects of uncertainty are now introduced: (a) the uncertainty about the x-position of the folds (see parameters: event 2, parameter x) is now clearly visible, and (b) uncertianties now seem to concentrate on the fold hinges. However, this is not so clear in the left part of the model, where the fold hing seems to be the least uncertain part. (check why: is this where the fold is actually fixed (even though still uncertain). My current interpretation: the fold location is fixed somewhere near this point, and so the wavelength uncertainty does not play a significant role. Furthermore, the fold is quite "open" at this position (i.e.: low angle between hinges) and therefore lateral shifts do not play a significant role.
In [329]:
plt.imshow(h.transpose(), origin = 'lower left',
cmap = 'gray', interpolation = 'none')
plt.colorbar(orientation = 'horizontal')
Out[329]:
Here again an example of single models (adjust to visualise and probably include something like a "table plot" of multiple images for a paper!):
In [278]:
plt.imshow(mssub[70,:,:].transpose(), origin = 'lower left', interpolation = 'none')
Out[278]:
And here the "mean" lithologies (note: not really a distinct meaning, simply showing the average litho ids - could be somehow interpreted as characteristic functions, though...).
In [279]:
plt.imshow(np.mean(mssub, axis = 0).transpose(), origin = 'lower left', interpolation = 'none')
Out[279]:
And here a bit more meaningful: the analysis of single layer probabilities:
In [324]:
# step 1: estimate probabilities (note: unfortunate workaround with ones multiplication,
# there may be a better way, but this is somehow a recurring problem of implicit
# array flattening in numpy)
litho_id = 4
prob = np.sum(np.ones_like(mssub) * (mssub == litho_id), axis = 0) / float(n_samples)
In [325]:
plt.imshow(prob.transpose(),
origin = 'lower left',
interpolation = 'none',
cmap = 'gray_r')
plt.colorbar(orientation = 'horizontal')
Out[325]:
Idea: also include a "new" consideration: where to collect information to reduce uncertainty of a single layer? Could be identified by reducing layer fuzziness, for example!
Or: what are most likely positions/ locations of a specific unit, given collected information?
Later also:
For the joint entropy analysis, we now use the new lexicographic (correct term?) sorting algorithm, implemented in the module hspace:
In [330]:
sys.path.append("/Users/flow/git/mutual_information/")
In [352]:
import hspace
reload(hspace)
Out[352]:
Problem now: in hspace module, a single location information is required. Try to adapt?
In [353]:
locs = np.array([1,2,3])
locs.shape
Out[353]:
In [354]:
locs = np.array([[1,1],[1,2],[1,3]])
locs.shape
Out[354]:
In [339]:
models_sub = model_sections[:10,:,:]
In [351]:
joint_units = []
for entry in models_sub:
joint_val = ""
for i, loc in enumerate(locs):
joint_val += "%d" % entry[loc[0], loc[1]]
joint_units.append(joint_val)
print joint_units
In [358]:
hspace.joint_entropy(model_sections, locs, n_samples)
Out[358]:
In [443]:
# now: define position of "drill":
n = 10
drill_i = [60] * n
drill_j = range(39,39-n,-1)
drill_locs = zip(drill_i, drill_j)
In [444]:
# determine joint entropy of drill_locs:
h_joint_drill = hspace.joint_entropy(model_sections, drill_locs, n_samples)
In [445]:
# check with arbitrary additional position:
locs = drill_locs + [[50, 20]]
print locs
h_joint_loc = hspace.joint_entropy(model_sections, locs, n_samples)
print h_joint_drill
print h_joint_loc
print h_joint_loc - h_joint_drill
print h[50,30]
In [483]:
# Determine max value of initital entropies for colorbar scaling
h_max = np.max(h)
print h_max
n_max = int(np.ceil(2 ** h_max))
print n_max
print np.log2(n_max)
In [456]:
pwd
Out[456]:
In [465]:
# Import Viridis colorscale
import colormaps as cmaps
plt.register_cmap(name='viridis', cmap=cmaps.viridis)
plt.register_cmap(name='magma', cmap=cmaps.magma)
plt.set_cmap(cmaps.viridis)
Try own "entropy colormap":
In [446]:
# generate conditional entropies for entire section:
h_cond_drill = np.zeros_like(h)
for i in range(100):
for j in range(40):
# add position to locations
locs = drill_locs + [[i,j]]
# determine joint entropy
h_joint_loc = hspace.joint_entropy(model_sections, locs, n_samples)
# subtract joint entropy of drill locs to obtain conditional entropy
h_cond_drill[i,j] = h_joint_loc - h_joint_drill
In [502]:
import simple_2D_example
from simple_2D_example import entropy_colormap
reload(simple_2D_example)
Out[502]:
In [503]:
ecmap = simple_2D_example.entropy_colormap(h_max);
In [506]:
plt.imshow(h_cond_drill.transpose(), origin = 'lower left',
cmap = ecmap, interpolation = 'none', vmax=np.log2(n_max+0.02))
plt.colorbar(orientation = 'horizontal')
plt.plot(dp[0], dp[1], 'ws')
plt.xlim([0,99])
plt.ylim([0,39])
Out[506]:
For comparison again: the entropy of the initial model:
In [490]:
plt.imshow(h.transpose(), origin = 'lower left',
cmap = ecmap, interpolation = 'none')
plt.colorbar(orientation = 'horizontal')
Out[490]:
And the difference, for clarity:
In [479]:
plt.imshow((h - h_cond_drill).transpose(), origin = 'lower left',
cmap = 'viridis', interpolation = 'none')
plt.colorbar(orientation = 'horizontal')
# plot drilling positions above it:
dp = np.array(drill_locs).transpose()
plt.plot(dp[0], dp[1], 'ws')
plt.xlim([0,99])
plt.ylim([0,39])
Out[479]:
Clearly, the highset reduction is in the area around the borehole, but interestingly, the uncertianty in other areas is also reduced! Note specifically the reduction of uncertainties in the two neighbouring fold hinges.
Let's check some other positions (and drilling "depths"):
In [381]:
# define position of "drill":
n = 10
drill_i = [20] * n
drill_j = range(39,39-n,-1)
drill_locs = zip(drill_i, drill_j)
# determine joint entropy of drill_locs:
h_joint_drill = hspace.joint_entropy(model_sections, drill_locs, n_samples)
We also just include one timing step to estimate the approximate simualtion time:
In [382]:
%%timeit
locs = drill_locs + [[50,20]]
h_joint_loc = hspace.joint_entropy(model_sections, locs, n_samples)
In [385]:
# esimated total time:
ttime = 100 * 40 * 0.0476
print("Estimated total time: %.3f seconds or %.3f minutes" % (ttime, ttime/60.))
In [386]:
# generate conditional entropies for entire section:
h_cond_drill = np.zeros_like(h)
for i in range(100):
for j in range(40):
# add position to locations
locs = drill_locs + [[i,j]]
# determine joint entropy
h_joint_loc = hspace.joint_entropy(model_sections, locs, n_samples)
# subtract joint entropy of drill locs to obtain conditional entropy
h_cond_drill[i,j] = h_joint_loc - h_joint_drill
In [438]:
# store results
f_out = open("h_cond_drill_i20_10.pkl", 'w')
pickle.dump(h_cond_drill, f_out)
f_out.close()
In [407]:
plt.imshow(h_cond_drill.transpose(), origin = 'lower left',
cmap = 'gray', interpolation = 'none')
plt.colorbar(orientation = 'horizontal')
# plot drilling positions above it:
dp = np.array(drill_locs).transpose()
plt.plot(dp[0], dp[1], 'ws')
plt.xlim([0,100])
plt.ylim([0,40])
Out[407]:
In [402]:
plt.imshow((h - h_cond_drill).transpose(), origin = 'lower left',
cmap = 'RdBu', interpolation = 'none')
plt.colorbar(orientation = 'horizontal')
# plot drilling positions above it:
dp = np.array(drill_locs).transpose()
plt.plot(dp[0], dp[1], 'ws')
plt.xlim([0,100])
plt.ylim([0,40])
Out[402]:
Intersting! Only a local reduction around the drilling position, however: extending to the deeper layers, as well! Why?
Drill deeper:
In [408]:
# define position of "drill":
n = 30
drill_i = [20] * n
drill_j = range(39,39-n,-1)
drill_locs = zip(drill_i, drill_j)
# determine joint entropy of drill_locs:
h_joint_drill = hspace.joint_entropy(model_sections, drill_locs, n_samples)
In [409]:
%%timeit
locs = drill_locs + [[50,20]]
h_joint_loc = hspace.joint_entropy(model_sections, locs, n_samples)
In [410]:
# esimated total time:
ttime = 100 * 40 * 0.130
print("Estimated total time: %.3f seconds or %.3f minutes" % (ttime, ttime/60.))
In [411]:
# generate conditional entropies for entire section:
h_cond_drill = np.zeros_like(h)
for i in range(100):
for j in range(40):
# add position to locations
locs = drill_locs + [[i,j]]
# determine joint entropy
h_joint_loc = hspace.joint_entropy(model_sections, locs, n_samples)
# subtract joint entropy of drill locs to obtain conditional entropy
h_cond_drill[i,j] = h_joint_loc - h_joint_drill
In [439]:
plt.imshow(h_cond_drill.transpose(), origin = 'lower left',
cmap = 'gray', interpolation = 'none')
plt.colorbar(orientation = 'horizontal')
# plot drilling positions above it:
dp = np.array(drill_locs).transpose()
plt.plot(dp[0], dp[1], 'ws')
plt.xlim([0,100])
plt.ylim([0,40])
Out[439]:
In [413]:
plt.imshow(h.transpose(), origin = 'lower left',
cmap = 'gray', interpolation = 'none')
plt.colorbar(orientation = 'horizontal')
Out[413]:
In [414]:
plt.imshow((h - h_cond_drill).transpose(), origin = 'lower left',
cmap = 'RdBu', interpolation = 'none')
plt.colorbar(orientation = 'horizontal')
# plot drilling positions above it:
dp = np.array(drill_locs).transpose()
plt.plot(dp[0], dp[1], 'ws')
plt.xlim([0,100])
plt.ylim([0,40])
Out[414]:
In [415]:
# define position of "drill":
n = 30
drill_i = [60] * n
drill_j = range(39,39-n,-1)
drill_locs = zip(drill_i, drill_j)
# determine joint entropy of drill_locs:
h_joint_drill = hspace.joint_entropy(model_sections, drill_locs, n_samples)
In [417]:
%%timeit
locs = drill_locs + [[50,20]]
h_joint_loc = hspace.joint_entropy(model_sections, locs, n_samples)
In [418]:
# generate conditional entropies for entire section:
h_cond_drill = np.zeros_like(h)
for i in range(100):
for j in range(40):
# add position to locations
locs = drill_locs + [[i,j]]
# determine joint entropy
h_joint_loc = hspace.joint_entropy(model_sections, locs, n_samples)
# subtract joint entropy of drill locs to obtain conditional entropy
h_cond_drill[i,j] = h_joint_loc - h_joint_drill
In [419]:
plt.imshow(h_cond_drill.transpose(), origin = 'lower left',
cmap = 'gray', interpolation = 'none')
plt.colorbar(orientation = 'horizontal')
# plot drilling positions above it:
dp = np.array(drill_locs).transpose()
plt.plot(dp[0], dp[1], 'ws')
plt.xlim([0,100])
plt.ylim([0,40])
Out[419]:
In [420]:
plt.imshow((h - h_cond_drill).transpose(), origin = 'lower left',
cmap = 'RdBu', interpolation = 'none')
plt.colorbar(orientation = 'horizontal')
# plot drilling positions above it:
dp = np.array(drill_locs).transpose()
plt.plot(dp[0], dp[1], 'ws')
plt.xlim([0,100])
plt.ylim([0,40])
Out[420]:
Interesting! And now both combined:
In [421]:
# define position of "drill":
n = 30
drill_i = [60] * n + [20] * n
drill_j = range(39,39-n,-1) + range(39,39-n,-1)
drill_locs = zip(drill_i, drill_j)
# determine joint entropy of drill_locs:
h_joint_drill = hspace.joint_entropy(model_sections, drill_locs, n_samples)
In [422]:
%%timeit
locs = drill_locs + [[50,20]]
h_joint_loc = hspace.joint_entropy(model_sections, locs, n_samples)
In [423]:
# generate conditional entropies for entire section:
h_cond_drill = np.zeros_like(h)
for i in range(100):
for j in range(40):
# add position to locations
locs = drill_locs + [[i,j]]
# determine joint entropy
h_joint_loc = hspace.joint_entropy(model_sections, locs, n_samples)
# subtract joint entropy of drill locs to obtain conditional entropy
h_cond_drill[i,j] = h_joint_loc - h_joint_drill
In [424]:
plt.imshow(h_cond_drill.transpose(), origin = 'lower left',
cmap = 'gray', interpolation = 'none')
plt.colorbar(orientation = 'horizontal')
# plot drilling positions above it:
dp = np.array(drill_locs).transpose()
plt.plot(dp[0], dp[1], 'ws')
plt.xlim([0,100])
plt.ylim([0,40])
Out[424]:
In [425]:
plt.imshow((h - h_cond_drill).transpose(), origin = 'lower left',
cmap = 'RdBu', interpolation = 'none')
plt.colorbar(orientation = 'horizontal')
# plot drilling positions above it:
dp = np.array(drill_locs).transpose()
plt.plot(dp[0], dp[1], 'ws')
plt.xlim([0,100])
plt.ylim([0,40])
Out[425]:
We can see that now only a part on the left remains with significant uncertainties. So, let's "drill" into this, as well:
In [426]:
# define position of "drill":
n = 30
drill_i = [60] * n + [20] * n + [5] * n
drill_j = range(39,39-n,-1) + range(39,39-n,-1) + range(39,39-n,-1)
drill_locs = zip(drill_i, drill_j)
# determine joint entropy of drill_locs:
h_joint_drill = hspace.joint_entropy(model_sections, drill_locs, n_samples)
In [427]:
%%timeit
locs = drill_locs + [[50,20]]
h_joint_loc = hspace.joint_entropy(model_sections, locs, n_samples)
In [428]:
# generate conditional entropies for entire section:
h_cond_drill = np.zeros_like(h)
for i in range(100):
for j in range(40):
# add position to locations
locs = drill_locs + [[i,j]]
# determine joint entropy
h_joint_loc = hspace.joint_entropy(model_sections, locs, n_samples)
# subtract joint entropy of drill locs to obtain conditional entropy
h_cond_drill[i,j] = h_joint_loc - h_joint_drill
In [431]:
# store results
f_out = open("h_cond_drill_i62_20_5_30.pkl", 'w')
pickle.dump(h_cond_drill, f_out)
f_out.close()
Additional idea to speed up computation (especially for higher multivariate examples): do not estimate value at locations where conditional entropy of subset (i.e.: often previously calculated)! (Check: theoretical reason/ justification!)
In [429]:
plt.imshow(h_cond_drill.transpose(), origin = 'lower left',
cmap = 'gray', interpolation = 'none')
plt.colorbar(orientation = 'horizontal')
# plot drilling positions above it:
dp = np.array(drill_locs).transpose()
plt.plot(dp[0], dp[1], 'ws')
plt.xlim([0,100])
plt.ylim([0,40])
Out[429]:
In [432]:
plt.imshow((h - h_cond_drill).transpose(), origin = 'lower left',
cmap = 'RdBu', interpolation = 'none')
plt.colorbar(orientation = 'horizontal')
# plot drilling positions above it:
dp = np.array(drill_locs).transpose()
plt.plot(dp[0], dp[1], 'ws')
plt.xlim([0,100])
plt.ylim([0,40])
Out[432]:
In [ ]: