Reading an SPWLA file with string methods

There are three ways to extract the data:

Loop over the file and apply string methods. >>> This notebook.
Use regex to extract everything at once. >>> I have this 'sort of' working, see below.
Use a parser. >>> I don't think I'm clever enough for this.

Looping over the SPWLA file

Let's start by just trying to read the file.



In [1]:

    
!head -24 ../data/core_analysis_example.spwla

10     2                                                                                                                       
    9999/9-9                                Norway                                            9Sep99
    Weatherford-Labs
15    10   10
          1507      1602  2031   0Weatherford-Labs    Nitrogen Permeability, Hor.
          1512      1602  2031   0Weatherford-Labs    Klinkenberg corrected gas perm, Hor.
          1510      1602  2031   0Weatherford-Labs    Nitrogen Permeability, Vert.
          1515      1602  2031   0Weatherford-Labs    Klinkenberg corrected gas perm, Vert.
          1402      1211  3084   0Weatherford-Labs    Porosity, Horizontal PLUG
          1403      1211  3084   0Weatherford-Labs    Porosity, Vertical PLUG
          1401      1212  3084   0Weatherford-Labs    Porosity, Summation
          1302      1103  3085   0Weatherford-Labs    CORE Oil Saturation
          1301      1103  3085   0Weatherford-Labs    CORE Water Saturation
          2451      1201  1086   0Weatherford-Labs    Grain Density, Hor.
20     1
        0.00     0.00  1918.00  1983.72  0.0  1
30     1
     1918.95     0.00   1.11
40     1   10
     -1002.00000 -1002.00000 -1002.00000 -1002.00000 -1002.00000    18.44722 -1002.00000    14.78718 -1002.00000 -1002.00000
30     1
     1919.95     0.00   2.11
40     1   10
     -1002.00000 -1002.00000 -1002.00000 -1002.00000 -1002.00000    17.06246 -1002.00000    18.06427 -1002.00000 -1002.00000

Observations

Some lines are 128 characters wide
Some of the fields are unidentifiable: I can't tell what several of the numbers mean
This is probably a job for striplog
The numbers after the record type (10, 15, 20, 30, etc.) seem to be the number of lines in that record and, perhaps, the number of fields per line. That is redundant information: we can just read until the next record type flag.
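
A minimal sketch of that last observation, assuming a record flag is any line whose first two characters are digits. That holds for the sample above, but I haven't checked it against a spec. The name `group_records` is my own invention. It ignores the counts when grouping and just reads until the next flag, but keeps them so we can compare them with what we actually found:

```python
def group_records(lines):
    """Group lines into [code, counts, body] records.

    Assumption: a record starts on any line whose first two characters
    are digits, and runs until the next such line.
    """
    records = []
    for line in lines:
        if not line.strip():
            continue  # Skip blank lines.
        if line[:2].isdigit():
            # Flag line, e.g. '40     1   10' -> code 40, counts [1, 10].
            code, *counts = (int(n) for n in line.split())
            records.append([code, counts, []])
        elif records:
            # Body line: attach it to the most recent record.
            records[-1][2].append(line)
    return records

# A few lines copied from the sample (data line shortened).
sample = """\
20     1
        0.00     0.00  1918.00  1983.72  0.0  1
30     1
     1918.95     0.00   1.11
40     1   10
     -1002.00000 -1002.00000    18.44722
""".splitlines()

for code, counts, body in group_records(sample):
    print(code, counts, len(body))
```

If the first count really is the number of lines, it should match `len(body)` for every record, which makes a cheap sanity check on a full file.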

Naive approach

In theory, this should be 'the easy way'... but in practice, with horrible formats like this one, it often seems to end up being quite brittle.

Let's poke it and see...



In [5]:

    
record_fields = {
    'header': [['well', 'country', 'date'], ['company']],  # Occurs on 2 lines
    'features': ['a', 'b', 'c', 'd', 'company', 'feature'],
    'range': ['w', 'x', 'start', 'stop', 'y', 'z'],
    'depth': ['depth', 'alpha', 'beta'],
    'descr': ['description'],
    'data': ['data'],  # Capture as array
}



In [6]:

    
record_type = {
    10: 'header',
    15: 'features',
    20: 'range',
    30: 'depth',
    36: 'descr',
    40: 'data',
}
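
For the record types whose values are cleanly separated by whitespace, such as the depth and range lines, pairing values with names could be as simple as `zip`. This is only a sketch: `label_line` is a hypothetical helper, it assumes every value on the line is numeric, and it uses a cut-down copy of the field names above (`fields_subset`) so that it runs on its own.

```python
# Cut-down copy of record_fields, so the sketch is self-contained.
fields_subset = {
    'range': ['w', 'x', 'start', 'stop', 'y', 'z'],
    'depth': ['depth', 'alpha', 'beta'],
}

def label_line(line, fields):
    """Pair each whitespace-separated value on a line with a field name."""
    values = [float(v) for v in line.split()]
    if len(values) != len(fields):
        # Fail loudly rather than silently dropping or mislabelling values.
        raise ValueError(f"Expected {len(fields)} values, got {len(values)}")
    return dict(zip(fields, values))

print(label_line("     1918.95     0.00   1.11", fields_subset['depth']))
print(label_line("        0.00     0.00  1918.00  1983.72  0.0  1",
                 fields_subset['range']))
```

This will not work for the header or feature lines, which contain free text.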



In [172]:

    
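fname = "../data/core_analysis_example.spwla"

with open(fname, 'r') as f:
    data = f.read()

def get_blocks(lines):
    """Yield (code, line) pairs, tagging each line with its record type code."""
    code = None
    for line in lines:
        if line[:2].isdigit():
            # A record flag line, e.g. '40     1   10': remember the code.
            code = int(line[:2])
            continue
        if code is None or not line.strip():
            # Skip blank lines, and anything before the first flag.
            continue
        yield code, line

features = []  # Collected across the whole file, so define it outside the loop.
for code, line in get_blocks(data.splitlines()):
    rec_type = record_type[code]
    fields = record_fields[rec_type]

    if rec_type == 'features':
        features.append(None)  # Placeholder: the line is not parsed yet.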

OK, this is going to be horrendous. It's doable, but won't be pretty.

It doesn't help that I have no idea what kind of variance to expect in this format.
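
To give a flavour of the brittleness: the feature lines can't just be split on whitespace, because the fourth number runs straight into the company name (`0Weatherford-Labs`), so they are presumably fixed-width. Here is a sketch that slices by column instead. The widths in `FEATURE_WIDTHS` are guesses read off the ten feature lines in the sample, not taken from any specification, and `parse_feature` is a hypothetical helper.

```python
# ASSUMPTION: column widths guessed from the sample, not from a spec.
FEATURE_WIDTHS = [('a', 14), ('b', 10), ('c', 6), ('d', 4), ('company', 20)]

def parse_feature(line):
    """Slice one feature line into named fields by column position."""
    record, start = {}, 0
    for name, width in FEATURE_WIDTHS:
        record[name] = line[start:start + width].strip()
        start += width
    # Whatever is left after the fixed columns is the description.
    record['feature'] = line[start:].strip()
    return record

# The first feature line from the sample, split up to show the guessed layout.
line = (
    "          1507"        # a: 14 columns
    "      1602"            # b: 10 columns
    "  2031"                # c: 6 columns
    "   0"                  # d: 4 columns
    "Weatherford-Labs    "  # company: 20 columns
    "Nitrogen Permeability, Hor."
)

print(parse_feature(line))
```

If another lab's files use different widths, this breaks silently, which is exactly the kind of variance I'm worried about.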


