In this notebook, we mainly use extreme gradient boosting (XGBoost) to improve the prediction model originally proposed in the TLE 2016 November machine learning tutorial. XGBoost can be viewed as an enhanced version of gradient boosting that uses a more regularized model formulation to control over-fitting, and it usually performs better in practice. Applications of XGBoost can be found in many Kaggle competitions, and several good tutorials are available online.
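To make the regularization claim concrete, XGBoost minimizes a training loss plus an explicit complexity penalty for each tree (Chen and Guestrin, 2016):

$$\mathcal{L} = \sum_{i} l(\hat{y}_i, y_i) + \sum_{k} \Omega(f_k), \qquad \Omega(f) = \gamma T + \frac{1}{2}\lambda \lVert w \rVert^2$$

where $T$ is the number of leaves of tree $f$ and $w$ is its vector of leaf weights. The $\gamma$ and $\lambda$ terms (plus an optional L1 term $\alpha \lVert w \rVert_1$) correspond to the gamma, reg_lambda, and reg_alpha parameters tuned later in this notebook.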
Our work is organized in the following order:
• Background
• Exploratory Data Analysis
• Data Preparation and Model Selection
• Final Results
The dataset we will use comes from a class exercise from The University of Kansas on Neural Networks and Fuzzy Systems. This exercise is based on a consortium project to use machine learning techniques to create a reservoir model of the largest gas fields in North America, the Hugoton and Panoma Fields. For more information on the origin of the data, see Bohling and Dubois (2003) and Dubois et al. (2007).
The dataset we will use is log data from nine wells that have been labeled with a facies type based on observation of core. We will use this log data to train a classifier to predict facies types.
This data is from the Council Grove gas reservoir in Southwest Kansas. The Panoma Council Grove Field is predominantly a carbonate gas reservoir encompassing 2700 square miles in southwestern Kansas. The dataset covers nine wells (4149 examples), each example consisting of a feature vector of seven predictor variables and a rock facies (class) label, plus validation (test) data (830 examples from two wells) with the same seven predictor variables. Facies are based on examination of cores from the nine wells, taken vertically at half-foot intervals. The predictor variables include five wireline log measurements and two geologic constraining variables derived from geologic knowledge; these are essentially continuous variables sampled at a half-foot rate.
The seven predictor variables are:
• Five wireline log curves: gamma ray (GR), resistivity (ILD_log10), photoelectric effect (PE), neutron-density porosity difference (DeltaPHI), and average neutron-density porosity (PHIND). Note that some wells do not have PE.
• Two geologic constraining variables: a nonmarine-marine indicator (NM_M) and relative position (RELPOS).
The nine discrete facies (classes of rocks) are:
1. Nonmarine sandstone
2. Nonmarine coarse siltstone
3. Nonmarine fine siltstone
4. Marine siltstone and shale
5. Mudstone (limestone)
6. Wackestone (limestone)
7. Dolomite
8. Packstone-grainstone (limestone)
9. Phylloid-algal bafflestone (limestone)
These facies aren't entirely discrete; they gradually blend into one another, and some have neighboring facies that are quite similar. Mislabeling within these neighboring facies can be expected to occur. The following table lists the facies, their abbreviated labels, and their approximate neighbors.
Facies  Label  Adjacent Facies
1       SS     2
2       CSiS   1,3
3       FSiS   2
4       SiSh   5
5       MS     4,6
6       WS     5,7
7       D      6,8
8       PS     6,7,9
9       BS     7,8
The first thing we notice about this table is that the adjacency relation is not symmetric: for example, facies 9 lists 7 as adjacent, yet facies 7 does not list 9. We have already contacted the authors regarding this.
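To verify this asymmetry programmatically, a small check over the adjacency table (written 0-indexed, matching the adjacent_facies array defined later in this notebook) flags every one-directional pair:
In [ ]:
# 0-indexed adjacency list: entry i holds the neighbors of facies i+1
# (same data as the adjacent_facies array used by accuracy_adjacent below)
adjacent = [[1], [0,2], [1], [4], [3,5], [4,6,7], [5,7], [5,6,8], [6,7]]
for i, neighbors in enumerate(adjacent):
    for j in neighbors:
        if i not in adjacent[j]:
            print("Facies %d lists %d as adjacent, but %d does not list %d" % (i+1, j+1, j+1, i+1))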
After this background introduction, we import the pandas library for some basic data analysis and manipulation. matplotlib and seaborn are imported for data visualization.
In [1]:
%matplotlib inline
import pandas as pd
from pandas.plotting import scatter_matrix  # pandas.tools.plotting was removed in newer pandas
import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns
import matplotlib.colors as colors
In [4]:
filename = '../facies_vectors.csv'
training_data = pd.read_csv(filename)
training_data
Out[4]:
In [11]:
training_data['Well Name'] = training_data['Well Name'].astype('category')
training_data['Formation'] = training_data['Formation'].astype('category')
training_data.info()
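Since some wells lack PE (as noted above), it may be worth checking where PE is missing before training; a minimal sketch, assuming missing readings appear as NaN in the CSV:
In [ ]:
# Count missing PE readings per well; XGBoost handles NaNs natively,
# but it is useful to know which wells are affected
pe_missing = training_data['PE'].isnull().groupby(training_data['Well Name']).sum()
print(pe_missing[pe_missing > 0])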
In [5]:
facies_colors = ['#F4D03F', '#F5B041','#DC7633','#6E2C00','#1B4F72',
'#2E86C1', '#AED6F1', '#A569BD', '#196F3D']
facies_labels = ['SS', 'CSiS', 'FSiS', 'SiSh', 'MS','WS', 'D','PS', 'BS']
facies_counts = training_data['Facies'].value_counts().sort_index()
facies_counts.index = facies_labels
facies_counts.plot(kind='bar',color=facies_colors,title='Distribution of Training Data by Facies')
Out[5]:
In [6]:
sns.heatmap(training_data.corr(), vmax=1.0, square=True)
Out[6]:
In [7]:
training_data.describe()
Out[7]:
Now we are ready to test the XGBoost approach. Along the way, confusion_matrix and f1_score are imported as classification metrics, together with GridSearchCV, an excellent tool for parameter optimization.
In [17]:
import xgboost as xgb
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score
from classification_utilities import display_cm, display_adj_cm
from sklearn.model_selection import GridSearchCV
In [12]:
X_train = training_data.drop(['Facies', 'Well Name','Formation','Depth'], axis = 1 )
Y_train = training_data['Facies' ] - 1
dtrain = xgb.DMatrix(X_train, Y_train)
The accuracy and accuracy_adjacent functions are defined in the following to quantify the prediction correctness.
In [13]:
def accuracy(conf):
    # Fraction of exactly correct predictions (diagonal of the confusion matrix)
    total_correct = 0.
    nb_classes = conf.shape[0]
    for i in np.arange(0, nb_classes):
        total_correct += conf[i][i]
    acc = total_correct / sum(sum(conf))
    return acc

# 0-indexed adjacency: entry i lists the facies adjacent to facies i+1
adjacent_facies = np.array([[1], [0,2], [1], [4], [3,5], [4,6,7], [5,7], [5,6,8], [6,7]], dtype=object)

def accuracy_adjacent(conf, adjacent_facies):
    # Counts a prediction as correct if it matches the true facies or an adjacent one
    nb_classes = conf.shape[0]
    total_correct = 0.
    for i in np.arange(0, nb_classes):
        total_correct += conf[i][i]
        for j in adjacent_facies[i]:
            total_correct += conf[i][j]
    return total_correct / sum(sum(conf))
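As a quick sanity check (not part of the original tutorial), both metrics can be evaluated on a small hypothetical confusion matrix for the first three facies, where every error lands on an adjacent facies and adjacent accuracy should therefore reach 1.0:
In [ ]:
# Toy 3-class confusion matrix (hypothetical counts for SS, CSiS, FSiS)
toy_conf = np.array([[8, 2, 0],
                     [1, 9, 1],
                     [0, 3, 7]])
print("Accuracy: %.3f" % accuracy(toy_conf))                                          # 24/31 ~ 0.774
print("Adjacent accuracy: %.3f" % accuracy_adjacent(toy_conf, adjacent_facies[:3]))   # 1.0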
Initial model
In [10]:
# Proposed Initial Model
xgb1 = xgb.XGBClassifier(learning_rate=0.1, n_estimators=200, max_depth=5,
                         min_child_weight=1, gamma=0, subsample=0.6,
                         colsample_bytree=0.6, reg_alpha=0, reg_lambda=1,
                         objective='multi:softmax', nthread=4, scale_pos_weight=1, seed=100)
#Fit the algorithm on the data
xgb1.fit(X_train, Y_train,eval_metric='merror')
#Predict training set:
predictions = xgb1.predict(X_train)
#Print model report
# Confusion Matrix
conf = confusion_matrix(Y_train, predictions)
# Print Results
print ("\nModel Report")
print ("-Accuracy: %.6f" % ( accuracy(conf) ))
print ("-Adjacent Accuracy: %.6f" % ( accuracy_adjacent(conf, adjacent_facies) ))
print ("\nConfusion Matrix")
display_cm(conf, facies_labels, display_metrics=True, hide_zeros=True)
# Print Feature Importance
feat_imp = pd.Series(xgb1.get_booster().get_fscore()).sort_values(ascending=False)  # booster() was renamed get_booster() in newer xgboost
feat_imp.plot(kind='bar', title='Feature Importances')
plt.ylabel('Feature Importance Score')
Out[10]:
In [11]:
# Cross Validation parameters
cv_folds = 10
rounds = 100
xgb_param_1 = xgb1.get_xgb_params()
xgb_param_1['num_class'] = 9
# Perform cross-validation
cvresult1 = xgb.cv(xgb_param_1, dtrain, num_boost_round=xgb_param_1['n_estimators'],
stratified = True, nfold=cv_folds, metrics='merror', early_stopping_rounds=rounds)
print ("\nCross Validation Training Report Summary")
print (cvresult1.head())
print (cvresult1.tail())
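Because early_stopping_rounds truncates the returned table at the best-performing round, the length of cvresult1 can suggest an updated number of boosted trees; a small sketch (not in the original tutorial):
In [ ]:
# When early stopping triggers, the CV table ends at the best round,
# so its row count is a data-driven choice for n_estimators
best_n_estimators = cvresult1.shape[0]
print("Suggested n_estimators from CV: %d" % best_n_estimators)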
The typical range for the learning rate is around 0.01 to 0.2, so we vary the learning rate a bit and, at the same time, scan over the number of boosted trees to fit. This will take a little while to finish.
In [12]:
print("Parameter optimization")
grid_search1 = GridSearchCV(xgb1,{'learning_rate':[0.05,0.01,0.1,0.2] , 'n_estimators':[200,400,600,800]},
scoring='accuracy' , n_jobs = 4)
grid_search1.fit(X_train,Y_train)
print("Best Set of Parameters")
# grid_scores_ was replaced by cv_results_ in sklearn.model_selection
grid_search1.cv_results_['mean_test_score'], grid_search1.best_params_, grid_search1.best_score_
Out[12]:
It seems that we need to make the learning rate smaller, which should help reduce overfitting. The number of boosted trees to fit also needs to be updated accordingly.
In [13]:
# Proposed Model with optimized learning rate and number of boosted trees to fit
xgb2 = xgb.XGBClassifier(learning_rate=0.01, n_estimators=400, max_depth=5,
                         min_child_weight=1, gamma=0, subsample=0.6,
                         colsample_bytree=0.6, reg_alpha=0, reg_lambda=1,
                         objective='multi:softmax', nthread=4, scale_pos_weight=1, seed=100)
#Fit the algorithm on the data
xgb2.fit(X_train, Y_train,eval_metric='merror')
#Predict training set:
predictions = xgb2.predict(X_train)
#Print model report
# Confusion Matrix
conf = confusion_matrix(Y_train, predictions )
# Print Results
print ("\nModel Report")
print ("-Accuracy: %.6f" % ( accuracy(conf) ))
print ("-Adjacent Accuracy: %.6f" % ( accuracy_adjacent(conf, adjacent_facies) ))
# Confusion Matrix
print ("\nConfusion Matrix")
display_cm(conf, facies_labels, display_metrics=True, hide_zeros=True)
# Print Feature Importance
feat_imp = pd.Series(xgb2.get_booster().get_fscore()).sort_values(ascending=False)
feat_imp.plot(kind='bar', title='Feature Importances')
plt.ylabel('Feature Importance Score')
Out[13]:
In [14]:
# Cross Validation parameters
cv_folds = 10
rounds = 100
xgb_param_2 = xgb2.get_xgb_params()
xgb_param_2['num_class'] = 9
# Perform cross-validation
cvresult2 = xgb.cv(xgb_param_2, dtrain, num_boost_round=xgb_param_2['n_estimators'],
stratified = True, nfold=cv_folds, metrics='merror', early_stopping_rounds=rounds)
print ("\nCross Validation Training Report Summary")
print (cvresult2.head())
print (cvresult2.tail())
In [15]:
print("Parameter optimization")
grid_search2 = GridSearchCV(xgb2,{'reg_alpha':[0, 0.05, 0.1, 0.2, 0.5, 1, 2, 5, 10], 'reg_lambda':[0, 0.05, 0.1, 0.2, 0.5, 1, 2, 5, 10] },
scoring='accuracy' , n_jobs = 4)
grid_search2.fit(X_train,Y_train)
print("Best Set of Parameters")
grid_search2.cv_results_['mean_test_score'], grid_search2.best_params_, grid_search2.best_score_
Out[15]:
In [16]:
# Proposed Model with optimized regularization
xgb3 = xgb.XGBClassifier(learning_rate=0.01, n_estimators=400, max_depth=5,
                         min_child_weight=1, gamma=0, subsample=0.6,
                         colsample_bytree=0.6, reg_alpha=0.1, reg_lambda=0.5,
                         objective='multi:softmax', nthread=4, scale_pos_weight=1, seed=100)
#Fit the algorithm on the data
xgb3.fit(X_train, Y_train,eval_metric='merror')
#Predict training set:
predictions = xgb3.predict(X_train)
#Print model report
# Confusion Matrix
conf = confusion_matrix(Y_train, predictions )
# Print Results
print ("\nModel Report")
print ("-Accuracy: %.6f" % ( accuracy(conf) ))
print ("-Adjacent Accuracy: %.6f" % ( accuracy_adjacent(conf, adjacent_facies) ))
# Confusion Matrix
print ("\nConfusion Matrix")
display_cm(conf, facies_labels, display_metrics=True, hide_zeros=True)
# Print Feature Importance
feat_imp = pd.Series(xgb3.get_booster().get_fscore()).sort_values(ascending=False)
feat_imp.plot(kind='bar', title='Feature Importances')
plt.ylabel('Feature Importance Score')
Out[16]:
In [17]:
print("Parameter optimization")
grid_search3 = GridSearchCV(xgb3,{'max_depth':[2, 5, 8], 'gamma':[0, 1], 'subsample':[0.4, 0.6, 0.8],'colsample_bytree':[0.4, 0.6, 0.8] },
scoring='accuracy' , n_jobs = 4)
grid_search3.fit(X_train,Y_train)
print("Best Set of Parameters")
grid_search3.cv_results_['mean_test_score'], grid_search3.best_params_, grid_search3.best_score_
Out[17]:
In [18]:
# Load data
filename = '../facies_vectors.csv'
data = pd.read_csv(filename)
# Change to category data type
data['Well Name'] = data['Well Name'].astype('category')
data['Formation'] = data['Formation'].astype('category')
# Leave one well out for cross validation
well_names = data['Well Name'].unique()
f1=[]
for i in range(len(well_names)):
    # Split data for training and testing
    X_train = data.drop(['Facies', 'Formation', 'Depth'], axis=1)
    Y_train = data['Facies'] - 1
    train_X = X_train[X_train['Well Name'] != well_names[i]]
    train_Y = Y_train[X_train['Well Name'] != well_names[i]]
    test_X = X_train[X_train['Well Name'] == well_names[i]]
    test_Y = Y_train[X_train['Well Name'] == well_names[i]]
    train_X = train_X.drop(['Well Name'], axis=1)
    test_X = test_X.drop(['Well Name'], axis=1)
    # Final recommended model based on the extensive parameter search
    model_final = xgb.XGBClassifier(learning_rate=0.01, n_estimators=400, max_depth=5,
                                    min_child_weight=1, gamma=0, subsample=0.6,
                                    reg_alpha=0.1, reg_lambda=0.5, colsample_bytree=0.6,
                                    objective='multi:softmax', nthread=4,
                                    scale_pos_weight=1, seed=100)
    # Train the model on the training wells
    model_final.fit(train_X, train_Y, eval_metric='merror')
    # Predict on the held-out well
    predictions = model_final.predict(test_X)
    # Print report
    print ("\n------------------------------------------------------")
    print ("Validation on the left-out well " + well_names[i])
    conf = confusion_matrix(test_Y, predictions, labels=np.arange(9))
    print ("\nModel Report")
    print ("-Accuracy: %.6f" % (accuracy(conf)))
    print ("-Adjacent Accuracy: %.6f" % (accuracy_adjacent(conf, adjacent_facies)))
    print ("-F1 Score: %.6f" % (f1_score(test_Y, predictions, labels=np.arange(9), average='weighted')))
    f1.append(f1_score(test_Y, predictions, labels=np.arange(9), average='weighted'))
    facies_labels = ['SS', 'CSiS', 'FSiS', 'SiSh', 'MS',
                     'WS', 'D', 'PS', 'BS']
    print ("\nConfusion Matrix Results")
    display_cm(conf, facies_labels, display_metrics=True, hide_zeros=True)
print ("\n------------------------------------------------------")
print ("Final Results")
print ("-Average F1 Score: %.6f" % (sum(f1) / (1.0 * len(f1))))
In [19]:
# Load test data
test_data = pd.read_csv('../validation_data_nofacies.csv')
test_data['Well Name'] = test_data['Well Name'].astype('category')
X_test = test_data.drop(['Formation', 'Well Name', 'Depth'], axis=1)
# Predict facies of unclassified data
Y_predicted = model_final.predict(X_test)
test_data['Facies'] = Y_predicted + 1
# Store the prediction
test_data.to_csv('Prediction1.csv')
In [20]:
test_data
Out[20]:
Future work: design a more customized objective function. We could also use RandomizedSearchCV instead of GridSearchCV to avoid getting trapped in a local minimum and further improve the test results; a minimal sketch follows.
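As an illustration of the RandomizedSearchCV idea, here is a small sketch; the parameter distributions and n_iter are hypothetical choices, not tuned values:
In [ ]:
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint, uniform

# Hypothetical search distributions; uniform(loc, scale) samples from [loc, loc + scale]
param_dist = {'learning_rate': uniform(0.01, 0.19),
              'n_estimators': randint(200, 801),
              'max_depth': randint(2, 9),
              'subsample': uniform(0.4, 0.4),
              'colsample_bytree': uniform(0.4, 0.4)}
X_all = data.drop(['Facies', 'Well Name', 'Formation', 'Depth'], axis=1)
Y_all = data['Facies'] - 1
random_search = RandomizedSearchCV(xgb.XGBClassifier(objective='multi:softmax', nthread=4, seed=100),
                                   param_distributions=param_dist, n_iter=25,
                                   scoring='accuracy', n_jobs=4, random_state=100)
random_search.fit(X_all, Y_all)
print(random_search.best_params_, random_search.best_score_)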