In this second attempt, I've updated some of the feature engineering before re-training an extra trees classifier on the data.
In [1]:
# Initial imports for reading data and first observations
import pandas as pd
import bokeh.plotting as bk
import numpy as np
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from tpot import TPOTClassifier
import sys
sys.path.append(r'C:\Users\george.crowther\Documents\Python\Projects\2016-ml-contest-master')
from classification_utilities import display_cm, display_adj_cm
bk.output_notebook()
In [2]:
# Input file paths
train_path = r'..\training_data.csv'
test_path = r'.\validation_data_nofacies.csv'
# Read training data to dataframe
train = pd.read_csv(train_path)
# TPOT library requires that the target class is renamed to 'class'
train.rename(columns={'Facies': 'class'}, inplace=True)
In [6]:
train.head()
Out[6]:
In [7]:
train.describe()
Out[7]:
Again, as with the previous result, the method here is somewhat brute force: for each sample it takes the difference from its formation mean and median, from the bottom sample of the formation above, and from the top sample of the formation below. There could definitely be more metrics, and undoubtedly better-informed ones, to pull in this manner; these are arguably somewhat naive.
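To make the idea concrete, here is a minimal sketch on a toy frame (made-up values, illustration only; the full implementation in the next cell also handles the above/below formation deltas):

import pandas as pd
toy = pd.DataFrame({'Formation': ['A1 SH', 'A1 SH', 'B1 SH'],
                    'GR': [60.0, 80.0, 100.0]})
# Difference of each sample from its formation mean
toy['formation_delta_GR'] = toy['GR'] - toy.groupby('Formation')['GR'].transform('mean')
# GR deltas become -10.0 and 10.0 (around the A1 SH mean of 70), and 0.0 for B1 SH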
In [24]:
def feature_extraction(train):
    #------------------------------------
    # Split each formation name into its two parts and one-hot encode each part
    for i, value in enumerate(train.Formation.unique()):
        name_a = value.split(' ')[0]
        name_b = value.split(' ')[1]
        if name_a not in train.columns:
            train[name_a] = 0
        if name_b not in train.columns:
            train[name_b] = 0
        train.loc[train.Formation == value, name_a] = 1
        train.loc[train.Formation == value, name_b] = 1
    #------------------------------------
    # Replace formation names with integer codes
    for i, value in enumerate(train['Formation'].unique()):
        train.loc[train['Formation'] == value, 'Formation'] = i
    #------------------------------------
    # Take the difference of each sample from the formation mean and median,
    # for each well and each measured parameter.
    # First add a zero-valued column for each new feature.
    columns = ['Formation', 'Depth', 'GR', 'ILD_log10', 'DeltaPHI', 'PHIND', 'PE', 'NM_M', 'RELPOS']
    above_columns = ['above_delta_' + col for col in columns]
    below_columns = ['below_delta_' + col for col in columns]
    formation_columns = ['formation_delta_' + col for col in columns]
    formation_med_columns = ['formation_delta_med_' + col for col in columns]

    def add_empty_columns(df, column_list):
        for column in column_list:
            df[column] = 0

    for column_list in [above_columns, below_columns, formation_columns, formation_med_columns]:
        add_empty_columns(train, column_list)
    #-------------------------------------------
    # Group data by well, sort by depth, then group by formation.
    # Take the mean, median, top and bottom (by depth) values for each sub-group.
    # Add features which are the difference of each sample from the mean of its
    # formation and from the adjacent samples of the formations above and below.
    # TBD - un-log 'ILD_log10' prior to taking the mean, then re-log.
    for i, group in train.groupby('Well Name'):
        iteration = 0
        sorted_group = group.sort_values('Depth')
        for j, sub_group in sorted_group.groupby('Formation'):
            means = sub_group[columns].mean()
            medians = sub_group[columns].median()
            top = sub_group.iloc[0][columns]
            if iteration == 0:
                above_group = sub_group
            else:
                above_bottom = above_group.iloc[-1][columns]
                train.loc[sub_group.index, above_columns] = (train.loc[sub_group.index, columns] - above_bottom).values
                train.loc[above_group.index, below_columns] = (train.loc[above_group.index, columns] - top).values
            train.loc[sub_group.index, formation_columns] = (train.loc[sub_group.index, columns] - means).values
            train.loc[sub_group.index, formation_med_columns] = (train.loc[sub_group.index, columns] - medians).values
            above_group = sub_group
            iteration += 1
    return train
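The TBD above could be handled along these lines, averaging resistivity in linear space before converting back to log10 (a sketch only, not applied in this notebook):

import numpy as np
def mean_ild_log10(sub_group):
    # Un-log the log10 resistivity, take the mean, then re-log
    linear_mean = np.power(10.0, sub_group['ILD_log10']).mean()
    return np.log10(linear_mean)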
In [15]:
facies_labels = ['SS', 'CSiS', 'FSiS', 'SiSh', 'MS',
                 'WS', 'D', 'PS', 'BS']
model_columns = train.columns[11:]
TPOT uses a genetic algorithm to evolve a model pipeline and tune its parameters for the most effective fit. This can take quite a while to run if you want to re-execute this part!
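If you only want a quick look, the search budget can be capped; a sketch with assumed settings (smaller population and a hard time limit, not the configuration used below):

quick_tpot = TPOTClassifier(generations=2, population_size=20,
                            max_time_mins=10, random_state=68, verbosity=2)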
In [18]:
# Input file paths
train_path = r'..\training_data.csv'
# Read training data to dataframe
train = pd.read_csv(train_path)
# TPOT library requires that the target class is renamed to 'class'
train.rename(columns={'Facies': 'class'}, inplace=True)
train = feature_extraction(train)
In [8]:
alt_model_columns = ['GR', 'ILD_log10',
'DeltaPHI', 'PHIND', 'PE', 'NM_M', 'RELPOS', 'A1', 'SH', 'LM', 'B1',
'B2', 'B3', 'B4', 'B5', 'C', 'above_delta_Formation',
'above_delta_Depth', 'above_delta_GR', 'above_delta_ILD_log10',
'above_delta_DeltaPHI', 'above_delta_PHIND', 'above_delta_PE',
'above_delta_NM_M', 'above_delta_RELPOS', 'below_delta_Formation',
'below_delta_Depth', 'below_delta_GR', 'below_delta_ILD_log10',
'below_delta_DeltaPHI', 'below_delta_PHIND', 'below_delta_PE',
'below_delta_NM_M', 'below_delta_RELPOS', 'formation_delta_Formation',
'formation_delta_Depth', 'formation_delta_GR',
'formation_delta_ILD_log10', 'formation_delta_DeltaPHI',
'formation_delta_PHIND', 'formation_delta_PE', 'formation_delta_NM_M',
'formation_delta_RELPOS', 'formation_delta_med_Formation',
'formation_delta_med_Depth', 'formation_delta_med_GR',
'formation_delta_med_ILD_log10', 'formation_delta_med_DeltaPHI',
'formation_delta_med_PHIND', 'formation_delta_med_PE',
'formation_delta_med_NM_M', 'formation_delta_med_RELPOS']
In [9]:
#-------------------------------
# Z-score normalisation of features.
# The boolean (one-hot) features should probably be excluded from
# normalisation, though it should make only a nominal difference.
std_scaler = preprocessing.StandardScaler().fit(train[alt_model_columns])
norm = std_scaler.transform(train[alt_model_columns])
norm_frame = train
for i, column in enumerate(alt_model_columns):
    norm_frame.loc[:, column] = norm[:, i]
train = norm_frame
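As the comment above notes, the one-hot formation flags could be left out of the scaling. A sketch of that alternative (not applied here; bool_columns are the one-hot names created in feature_extraction):

bool_columns = ['A1', 'SH', 'LM', 'B1', 'B2', 'B3', 'B4', 'B5', 'C']
continuous_columns = [c for c in alt_model_columns if c not in bool_columns]
scaler = preprocessing.StandardScaler().fit(train[continuous_columns])
train.loc[:, continuous_columns] = scaler.transform(train[continuous_columns])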
In [155]:
train[alt_model_columns].describe()
Out[155]:
In [10]:
#------------------------------------
# Train test split
alt_train_f, alt_test_f = train_test_split(train, test_size = 0.1,
                                           random_state = 68)
In [12]:
# Setup TPOT classifier and train
alt_tpot = TPOTClassifier(verbosity = 2, generations = 5, max_eval_time_mins = 60)
alt_tpot.fit(alt_train_f[alt_model_columns], alt_train_f['class'])
In [22]:
print(alt_tpot.score(alt_test_f[alt_model_columns], alt_test_f['class']))
alt_tpot.export('02 contest_export.py')
In [49]:
result = alt_tpot.predict(train[alt_model_columns])
conf = confusion_matrix(train['class'], result)
display_cm(conf, facies_labels, hide_zeros=True, display_metrics = True)
def accuracy(conf):
    total_correct = 0.
    nb_classes = conf.shape[0]
    for i in np.arange(0, nb_classes):
        total_correct += conf[i][i]
    acc = total_correct / sum(sum(conf))
    return acc
print(accuracy(conf))
adjacent_facies = np.array([[1], [0,2], [1], [4], [3,5], [4,6,7], [5,7], [5,6,8], [6,7]])
def accuracy_adjacent(conf, adjacent_facies):
    nb_classes = conf.shape[0]
    total_correct = 0.
    for i in np.arange(0, nb_classes):
        total_correct += conf[i][i]
        for j in adjacent_facies[i]:
            total_correct += conf[i][j]
    return total_correct / sum(sum(conf))
print(accuracy_adjacent(conf, adjacent_facies))
In [40]:
test_path = r'..\validation_data_nofacies.csv'
# Read validation data to dataframe
test = pd.read_csv(test_path)
# Rename 'Facies'
test.rename(columns={'Facies': 'class'}, inplace=True)
frame = feature_extraction(test)
In [41]:
frame.describe()
Out[41]:
In [42]:
alt_model_columns = ['GR', 'ILD_log10',
'DeltaPHI', 'PHIND', 'PE', 'NM_M', 'RELPOS', 'A1', 'SH', 'LM', 'B1',
'B2', 'B3', 'B4', 'B5', 'C', 'above_delta_Formation',
'above_delta_Depth', 'above_delta_GR', 'above_delta_ILD_log10',
'above_delta_DeltaPHI', 'above_delta_PHIND', 'above_delta_PE',
'above_delta_NM_M', 'above_delta_RELPOS', 'below_delta_Formation',
'below_delta_Depth', 'below_delta_GR', 'below_delta_ILD_log10',
'below_delta_DeltaPHI', 'below_delta_PHIND', 'below_delta_PE',
'below_delta_NM_M', 'below_delta_RELPOS', 'formation_delta_Formation',
'formation_delta_Depth', 'formation_delta_GR',
'formation_delta_ILD_log10', 'formation_delta_DeltaPHI',
'formation_delta_PHIND', 'formation_delta_PE', 'formation_delta_NM_M',
'formation_delta_RELPOS', 'formation_delta_med_Formation',
'formation_delta_med_Depth', 'formation_delta_med_GR',
'formation_delta_med_ILD_log10', 'formation_delta_med_DeltaPHI',
'formation_delta_med_PHIND', 'formation_delta_med_PE',
'formation_delta_med_NM_M', 'formation_delta_med_RELPOS']
std_scaler = preprocessing.StandardScaler().fit(frame[alt_model_columns])
norm = std_scaler.transform(frame[alt_model_columns])
norm_frame = frame
for i, column in enumerate(alt_model_columns):
    norm_frame.loc[:, column] = norm[:, i]
frame = norm_frame
frame.describe()
Out[42]:
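Note that the cell above re-fits the scaler on the validation wells, so the train and test features end up on slightly different scales. An alternative, sketched here with a hypothetical train_scaler (a StandardScaler fitted on the unscaled training features before they were overwritten), would reuse the training statistics for both sets:

# Hypothetical: train_scaler was fitted on the unscaled training features
frame.loc[:, alt_model_columns] = train_scaler.transform(frame[alt_model_columns])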
In [43]:
#--------------------------------------
# TPOT Exported Model
from sklearn.ensemble import ExtraTreesClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import FunctionTransformer
exported_pipeline = make_pipeline(
    ExtraTreesClassifier(criterion="entropy", max_features=0.48, n_estimators=500)
)
exported_pipeline.fit(train[alt_model_columns], train['class'])
Out[43]:
In [44]:
frame['Facies'] = exported_pipeline.predict(frame[alt_model_columns])
In [52]:
frame['Facies']
Out[52]:
In [46]:
frame.to_csv('02 - Well Facies Prediction - Test Data Set.csv')