striplog
Initially based on a post by Alfredo Molina on Medium, then incorporating the quasi-independence model from Powers & Easterling, https://doi.org/10.1306/212F808F-2B24-11D7-8648000102C1865D.
You need striplog version 0.8.2 for this notebook to work.
In [1]:
import striplog
striplog.__version__
Out[1]:
All of the Markov chain code I could find wanted a transition matrix as input. But I wanted to be able to provide a sequence, then get the transitions from it. So we need to be able to parse a sequence of states — tokens representing the items in the sequence we are modeling.
For example, we need the unique elements and 'sequence of sequences' from:
[10, 20, 10, 20, 20, 10]
'ABBDDDDDCCCC'
[[1,2,2,3], [2,4,2]] (NB, not same length)
['aaabb', 'aabbccc'] (NB, not same length)
['sst', 'mud', 'sst'] (requires optional argument)
[['SS', 'M', 'SS'], ['M', 'M', 'LS']]
The uniques should look like:
[10, 20]
['A', 'B', 'C', 'D']
[1, 2, 3, 4]
['a', 'b', 'c']
['mud', 'sst']
['LS', 'M', 'SS']
And the sequences of sequences:
[10, 20, 10, 20, 20, 10]
['A', 'B', 'B', 'D', 'D', 'D', 'D', 'D', 'C', 'C', 'C', 'C']
[[1,2,2,3], [2,4,2]]
[['a', 'a', 'a', 'b', 'b'], ['a', 'a', 'b', 'b', 'c', 'c', 'c']]
[['sst', 'mud', 'sst']]
[['SS', 'M', 'SS'], ['M', 'M', 'LS']]
For some reason, this turned out to be a bit fiddly. But for now, things work.
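Here's a sketch of the kind of parsing involved (this is not striplog's actual implementation), showing one way to get the uniques and a regularized list-of-lists from the inputs above:
import operator

def parse_sequence(seq, strings_are_states=False):
    # Sketch only: return sorted unique states and a list of lists of
    # states from a sequence, or a sequence of sequences, of states.
    if isinstance(seq, str):
        # A single string is one sequence of single-character states.
        seqs = [list(seq)]
    elif strings_are_states or not isinstance(seq[0], (str, list, tuple)):
        # A flat sequence of states, e.g. [10, 20, 10] or ['sst', 'mud', 'sst'].
        seqs = [list(seq)]
    else:
        # A sequence of sequences; strings split into single characters.
        seqs = [list(s) for s in seq]
    uniques = sorted({state for s in seqs for state in s})
    return uniques, seqs

parse_sequence('ABBDDDDDCCCC')                                  # uniques: ['A', 'B', 'C', 'D']
parse_sequence(['sst', 'mud', 'sst'], strings_are_states=True)  # uniques: ['mud', 'sst']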
In [2]:
from striplog.markov import Markov_chain
In [3]:
Markov_chain??
Let's use one of the examples in Powers & Easterling. They use this transition matrix from Gingerich, P.D. (1969), Markov analysis of cyclic alluvial sediments, Journal of Sedimentary Petrology, 39, 330–332, https://doi.org/10.1306/74D71C4E-2B21-11D7-8648000102C1865D.
In [3]:
data = [[ 0, 37,  3,  2],
        [21,  0, 41, 14],
        [20, 25,  0,  0],
        [ 1, 14,  1,  0]]
m = Markov_chain(data, states=['A', 'B', 'C', 'D'])
m
Out[3]:
In [4]:
import scipy.stats
chi, p, dof, expected = scipy.stats.chi2_contingency(m.observed_counts)
However, there are problems with this: scipy's test assumes ordinary independence, whereas our matrix has structural zeros on the diagonal (a bed cannot pass into itself in an embedded chain), so the expected counts it computes are not appropriate for this kind of data.
Fisher's exact test is notoriously difficult to compute for tables larger than 2 x 2, but in theory we can use the FisherExact module by Marc-Rolland Noutahi (see his blog post) to check Fisher's exact statistic.
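For what it's worth, scipy's own fisher_exact illustrates the limitation: at least in the scipy of this notebook's vintage, it only accepts 2 x 2 tables.
import scipy.stats

# Fine for a 2 x 2 contingency table...
odds_ratio, p = scipy.stats.fisher_exact([[8, 2], [1, 5]])

# ...but a larger table, like our 4 x 4 transition matrix, raises a
# ValueError. Hence the FisherExact module mentioned above.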
In the meantime, we have Powers & Easterling and the assumption of quasi-independence.
First, let's look at the expected transition frequencies:
In [5]:
import numpy as np
np.set_printoptions(precision=8)
In [6]:
m.expected_counts
Out[6]:
The $\chi^2$ statistic shows the value for the observed ordering, along with the critical value at (by default) the 95% confidence level. If the first number is higher than the second number (ideally much higher), then we can reject the hypothesis that the ordering is quasi-independent. That is, we have shown that the ordering is non-random.
In [7]:
m.chi_squared()
Out[7]:
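That critical value is just a quantile of the chi-squared distribution. A quick sketch, assuming the quasi-independence model's degrees of freedom for an n x n table with no self-transitions are (n - 1)^2 - n, following Powers & Easterling:
from scipy.stats import chi2

# Sketch: reproduce the critical value at 95% confidence, assuming
# df = (n - 1)**2 - n for an n x n table with no self-transitions.
n = len(m.states)
df = (n - 1)**2 - n
chi2.ppf(0.95, df)  # about 11.07 for n = 4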
The normalized difference shows which transitions are 'interesting'. These numbers can be interpreted as standard deviations away from the model of quasi-independence. That is, transitions with large positive numbers represent passages that occur more often than might be expected. Any numbers greater than 2 are likely to be important.
In [8]:
m.normalized_difference
Out[8]:
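Assuming the normalized difference is the usual Pearson residual, (observed - expected) / sqrt(expected), we can reconstruct it from the two count matrices:
import numpy as np

# Sketch, assuming normalized difference = (obs - exp) / sqrt(exp).
# The errstate guard covers the structural zeros on the diagonal.
obs = np.asarray(m.observed_counts, dtype=float)
exp = np.asarray(m.expected_counts, dtype=float)
with np.errstate(divide='ignore', invalid='ignore'):
    z = (obs - exp) / np.sqrt(exp)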
We can visualize this as an image:
In [9]:
m.plot_norm_diff()
We can also interpret this matrix as a graph. The transitions from A to C are particularly strong in this one. Transitions from C to A happen less often than we'd expect; the same goes for B to D and D to B, though less markedly.
In [10]:
%matplotlib inline
m.plot_graph()
We can look at an undirected version of the graph too. It downplays non-reciprocal relationships. I'm not sure this is useful...
In [11]:
%matplotlib inline
m.plot_graph(directed=False)
We can generate a random succession of beds with the same transition statistics:
In [12]:
''.join(m.generate_states(n=20))
Out[12]:
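Under the hood, this presumably amounts to a random walk over the conditional transition probabilities. A minimal sketch (not striplog's actual code):
import numpy as np

# Repeatedly draw the next state from the observed transition counts,
# normalized over the current state's row.
counts = np.asarray(m.observed_counts, dtype=float)
state, out = 0, []
for _ in range(20):
    probs = counts[state] / counts[state].sum()  # conditional probabilities
    state = np.random.choice(len(counts), p=probs)
    out.append(str(m.states[state]))
''.join(out)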
These are the transitions from some measured sections in my PhD thesis. They start at the bottom, so in Log 7, we start with lithofacies 1 (offshore mudstone) and pass upwards into lithofacies 3, then back into 1, then 3, and so on.
We can instantiate a Markov_chain object from a sequence using its from_sequence() method. This expects either a sequence of 'states' (numbers or letters or strings representing rock types) or a sequence of sequences of states.
In [13]:
data = {
    'log7':  [1, 3, 1, 3, 5, 1, 2, 1, 3, 1, 5, 6, 1, 2, 1, 2, 1, 2, 1, 3, 5, 6, 5, 1],
    'log9':  [1, 3, 1, 5, 1, 5, 3, 1, 2, 1, 2, 1, 3, 5, 1, 5, 6, 5, 6, 1, 2, 1, 5, 6, 1],
    'log11': [1, 3, 1, 2, 1, 5, 3, 1, 2, 1, 2, 1, 3, 5, 3, 5, 1, 9, 5, 5, 5, 5, 6, 1],
    'log12': [1, 5, 3, 1, 2, 1, 2, 1, 2, 1, 4, 5, 6, 1, 2, 1, 4, 5, 1, 5, 5, 5, 1, 2, 1, 8, 9, 10, 9, 5, 1],
    'log13': [1, 6, 1, 3, 1, 3, 5, 3, 6, 1, 6, 5, 3, 1, 5, 1, 2, 1, 4, 3, 5, 3, 4, 3, 5, 1, 5, 9, 11, 9, 1],
    'log14': [1, 3, 1, 5, 8, 5, 6, 1, 3, 4, 5, 3, 1, 3, 5, 1, 7, 7, 7, 1, 7, 1, 3, 8, 5, 5, 1, 5, 9, 9, 11, 9, 1],
    'log15': [1, 8, 1, 3, 5, 1, 2, 3, 6, 3, 6, 5, 2, 1, 2, 1, 8, 5, 1, 5, 9, 9, 11, 1],
    'log16': [1, 8, 1, 5, 1, 5, 5, 6, 1, 3, 5, 3, 5, 5, 5, 8, 5, 1, 9, 9, 3, 1],
    'log17': [1, 3, 8, 1, 8, 5, 1, 8, 9, 5, 10, 5, 8, 9, 10, 8, 5, 1, 8, 9, 1],
    'log18': [1, 8, 2, 1, 2, 1, 10, 8, 9, 5, 5, 1, 2, 1, 2, 9, 5, 9, 5, 8, 5, 9, 1],
}
logs = list(data.values())
m = Markov_chain.from_sequence(logs, states=range(1,12))
m.observed_counts
Out[13]:
Let's check out the expected counts, and then the normalized difference matrix:
In [14]:
m.expected_counts
Out[14]:
In [15]:
np.set_printoptions(suppress=True, precision=1, linewidth=120)
m.normalized_difference
Out[15]:
In [16]:
m.plot_norm_diff()
And the graph version. Note you can re-run this cell to rearrange the graph.
In [17]:
m.plot_graph(figsize=(15,15), max_size=2400, edge_labels=True)
In [18]:
m = Markov_chain.from_sequence(logs, states=range(1,12), step=2)
Now we have a 3D 'matrix' of transition probabilities.
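Assuming step=2 means tallying transitions over triples of successive states instead of pairs, the counting looks something like this:
import numpy as np

# Sketch of second-order transition counting: one tally per triple of
# successive states. Facies numbers 1 to 11 map to indices 0 to 10.
counts = np.zeros((11, 11, 11))
for log in logs:
    for a, b, c in zip(log, log[1:], log[2:]):
        counts[a - 1, b - 1, c - 1] += 1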
In [19]:
m.normalized_difference.shape
Out[19]:
This is hard to inspect! Let's just get the indices of the highest values.
If we add one to the indices, we'll have a handy list of facies number transitions, since these are just 1 to 11. So we can interpret these as transitions with anomalously high probability.
In [20]:
idx = np.where(m.normalized_difference > 1.28)  # 1.28 is the 80% confidence level
locs = np.array(list(zip(*idx)))
for score, loc in zip(m.normalized_difference[idx], locs):
    print(f"{' > '.join(map(str, loc + 1)):>12} {score:.3f}")
These represent, respectively (it's stochastic so some of these might be missing):
1 > 2 > 1
1 > 8 > 9
5 > 9 > 11
8 > 9 > 10
9 > 9 > 11
9 > 10 > 9
9 > 11 > 9
In [21]:
m.chi_squared()
Out[21]:
Unfortunately, it's a bit harder to draw this as a graph. Technically, it's a hypergraph.
In [22]:
# This should error for now.
m.plot_graph(figsize=(15,15), max_size=2400, edge_labels=True)
In [23]:
data = "sssmmmlllmlmlsslsllsmmllllmssssllllssmmlllllssssssmmmmsmllllssslmslmsmmmslsllll"
In [24]:
m = Markov_chain.from_sequence(data, include_self=True)
m
Out[24]:
In [25]:
m.observed_counts
Out[25]:
In [26]:
m._state_probs
Out[26]:
In [27]:
m.observed_freqs
Out[27]:
In [28]:
m.expected_freqs
Out[28]:
In [29]:
m.states
Out[29]:
Conditional probabilities given a 'current' state:
In [30]:
m._conditional_probs('l')
Out[30]:
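Assuming this is just the row of observed counts for 'l', normalized to sum to one, we could compute the same thing directly:
import numpy as np

# Sketch: conditional probabilities as the normalized row of observed
# transition counts for the current state.
i = list(m.states).index('l')
row = np.asarray(m.observed_counts, dtype=float)[i]
row / row.sum()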
A random sequence generated from the Markov chain model:
In [31]:
m.generate_states(12)
Out[31]:
Based on this conversation.
Let's make a shoreface-type succession:
We'll do it very naively, with multiple complete sequences.
In [32]:
data = "msfcmsfcmsfcmsfcmsfcmsfcmsfcmsfcmsfcmsfcmsfcmsfcmsfcmsfcmsfcmsfcmsfcmsfc"
In [33]:
m = Markov_chain.from_sequence(data)
m
Out[33]:
In [34]:
m.normalized_difference
Out[34]:
In [35]:
m.expected_counts
Out[35]:
In [36]:
m.expected_freqs
Out[36]:
In [39]:
m.chi_squared()
Out[39]:
In [40]:
m.plot_norm_diff()
In [41]:
m.plot_graph()
In [42]:
succession = 'msfc'
data = ''
for _ in range(100):
    stop = np.random.randint(len(succession))
    data += succession[:stop + 1]
data
Out[42]:
In [43]:
m = Markov_chain.from_sequence(data, include_self=False)
m
Out[43]:
In [44]:
m.normalized_difference
Out[44]:
In [45]:
m.plot_norm_diff()
In [46]:
m.plot_graph()
So the 'correct' graph emerges.
In [47]:
np.set_printoptions(precision=3)
In [48]:
data = ''
l = len(succession)
for _ in range(100):
    start = np.random.randint(l)
    length = min(l - start, np.random.randint(l))
    data += succession[start:start + length + 1]  # a chunk starting part-way through
data
Out[48]:
In [49]:
m = Markov_chain.from_sequence(data, include_self=False)
m
Out[49]:
In [50]:
m.normalized_difference
Out[50]:
In [51]:
m.chi_squared()
Out[51]:
In [52]:
# For the record, since the data is random, one run of the cell above gave:
print("Chi2(chi2=48.36996, crit=11.07050, perc=0.99999)")
In [53]:
m.plot_norm_diff()
In [54]:
m.plot_graph()
Based on this conversation.
In [55]:
succession = ['sub', 'inter', 'supra']
In [56]:
data = 20 * succession
In [57]:
m = Markov_chain.from_sequence(data, strings_are_states=True, include_self=False)
m
Out[57]:
In [58]:
m.normalized_difference
Out[58]:
In [59]:
m.plot_norm_diff()
In [60]:
m.plot_graph()
In [61]:
data = []
l = len(succession)
for _ in range(40):
start = np.random.randint(l)
length = min(l-start, np.random.randint(l))
data += succession[start:length+1]
In [62]:
data
Out[62]:
In [63]:
m = Markov_chain.from_sequence(data, strings_are_states=True, include_self=False)
m
Out[63]:
In [64]:
m.normalized_difference
Out[64]:
In [65]:
m.plot_norm_diff()
In [66]:
m.plot_graph()
In [67]:
from welly import Well
from striplog import Striplog, Component
w = Well.from_las("P-129_out.LAS")
gr = w.data['GR']
comps = [Component({'lithology': 'sandstone'}),
         Component({'lithology': 'greywacke'}),
         Component({'lithology': 'shale'}),
         ]
s = Striplog.from_log(gr, cutoff=[30, 90], components=comps, basis=gr.basis)
s
Out[67]:
In [68]:
s.plot()
In [69]:
log, z, comps = s.to_log(return_meta=True)
log
Out[69]:
This is not quite what we want, because it's regularly sampled.
We want the beds only.
In [70]:
s[0]
Out[70]:
In [71]:
seq = [i.primary.lithology for i in s]
In [72]:
m = Markov_chain.from_sequence(seq, strings_are_states=True, include_self=False)
In [73]:
m.normalized_difference
Out[73]:
In [74]:
m.plot_norm_diff()
In [75]:
m.plot_graph()
© Agile Scientific 2019, licensed CC-BY / Apache 2.0, please share this work