Random forest

Note from Matt: I converted this from the Rmd file with the ipyrmd utility, which worked almost perfectly.


In [1]:
# Uncomment this line if you don't already have this library.
# install.packages("randomForest", repos="http://cran.rstudio.com/")

In [2]:
library(randomForest)


randomForest 4.6-12
Type rfNews() to see new features/changes/bug fixes.

Reading the training and blind test files


In [3]:
train_prod = read.csv('../facies_vectors.csv')
test_prod = read.csv('../nofacies_data.csv')

Checking the structure of both data frames


In [4]:
str(train_prod)
str(test_prod)


'data.frame':	4149 obs. of  11 variables:
 $ Facies   : int  3 3 3 3 3 3 3 3 3 3 ...
 $ Formation: Factor w/ 14 levels "A1 LM","A1 SH",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ Well.Name: Factor w/ 10 levels "ALEXANDER D",..: 10 10 10 10 10 10 10 10 10 10 ...
 $ Depth    : num  2793 2794 2794 2794 2795 ...
 $ GR       : num  77.5 78.3 79 86.1 74.6 ...
 $ ILD_log10: num  0.664 0.661 0.658 0.655 0.647 0.636 0.63 0.625 0.624 0.615 ...
 $ DeltaPHI : num  9.9 14.2 14.8 13.9 13.5 14 15.6 16.5 16.2 16.9 ...
 $ PHIND    : num  11.9 12.6 13.1 13.1 13.3 ...
 $ PE       : num  4.6 4.1 3.6 3.5 3.4 3.6 3.7 3.5 3.4 3.5 ...
 $ NM_M     : int  1 1 1 1 1 1 1 1 1 1 ...
 $ RELPOS   : num  1 0.979 0.957 0.936 0.915 0.894 0.872 0.83 0.809 0.787 ...
'data.frame':	830 obs. of  10 variables:
 $ Formation: Factor w/ 14 levels "A1 LM","A1 SH",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ Well.Name: Factor w/ 2 levels "CRAWFORD","STUART": 2 2 2 2 2 2 2 2 2 2 ...
 $ Depth    : num  2808 2808 2809 2810 2810 ...
 $ GR       : num  66.3 77.3 82.9 80.7 76 ...
 $ ILD_log10: num  0.63 0.585 0.566 0.593 0.638 0.667 0.674 0.667 0.653 0.642 ...
 $ DeltaPHI : num  3.3 6.5 9.4 9.5 8.7 6.9 6.5 6.3 6.7 7.3 ...
 $ PHIND    : num  10.7 11.9 13.6 13.2 12.3 ...
 $ PE       : num  3.59 3.34 3.06 2.98 3.02 ...
 $ NM_M     : int  1 1 1 1 1 1 1 1 1 1 ...
 $ RELPOS   : num  1 0.978 0.956 0.933 0.911 0.889 0.867 0.844 0.822 0.8 ...

Removing the rows where PE is NA


In [5]:
train_prod = train_prod[!is.na(train_prod$PE),]
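Before dropping rows, it can help to see which columns actually contain missing values. The sketch below uses a small made-up data frame (`toy` is hypothetical, not the facies data) to count NAs per column and then filter on a single column in the same way as the cell above.

```r
# Hypothetical toy data standing in for the well-log table.
toy = data.frame(GR = c(77.5, 78.3, NA, 86.1),
                 PE = c(4.6, NA, 3.6, NA))

# Number of missing values in each column.
na_counts = colSums(is.na(toy))
print(na_counts)

# Keep only rows where PE is present; NAs in other columns are left alone.
toy_pe = toy[!is.na(toy$PE), ]
print(nrow(toy_pe))
```

Filtering on one column keeps rows that are incomplete elsewhere, so the per-column counts are worth checking first.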

Converting the Facies column to factor


In [6]:
train_prod$Facies = as.factor(as.character(train_prod$Facies))

Splitting into training and local validation sets


In [7]:
# Hold out 30% of the rows for local validation (sampling without replacement).
train_row = sample(nrow(train_prod), floor(0.7*nrow(train_prod)), replace=FALSE)
train_local = train_prod[train_row,]
test_local  = train_prod[-train_row,]
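The split depends on R's random number generator, so it changes from run to run unless the generator is seeded first. A minimal sketch of a repeatable 70/30 split, assuming a hypothetical ten-row data frame `toy` and an arbitrary seed value of 42:

```r
# Hypothetical stand-in for train_prod: 10 rows with an id column.
toy = data.frame(id = 1:10)

set.seed(42)  # arbitrary seed, chosen only for illustration
rows_a = sample(nrow(toy), floor(0.7 * nrow(toy)), replace = FALSE)

set.seed(42)  # resetting the seed reproduces the same draw
rows_b = sample(nrow(toy), floor(0.7 * nrow(toy)), replace = FALSE)

train_toy = toy[rows_a, , drop = FALSE]
valid_toy = toy[-rows_a, , drop = FALSE]

# Same seed, same rows; the two parts never overlap.
print(identical(rows_a, rows_b))
print(length(intersect(train_toy$id, valid_toy$id)))
```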

Creating the model


In [8]:
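# randomForest() has no `seed` argument; seed R's RNG before fitting instead.
set.seed(2)
# Fit on the well-log features only, dropping the identifier and depth columns.
drop_cols = c('Formation', 'Well.Name', 'Depth')
RF.local.model = randomForest(Facies ~ ., data = train_local[!colnames(train_local) %in% drop_cols])
RF.local.pred = predict(RF.local.model, newdata = test_local)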
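`randomForest()` draws its bootstrap samples and candidate split variables from R's global random number generator, so a fit is repeatable only when `set.seed()` is called beforehand. The sketch below checks this on the built-in `iris` data, which is only a stand-in for the facies table; `ntree = 50` is an arbitrary choice to keep it quick.

```r
library(randomForest)

# iris is a stand-in here; the facies data is not needed for this check.
set.seed(2)
fit_a = randomForest(Species ~ ., data = iris, ntree = 50)

set.seed(2)
fit_b = randomForest(Species ~ ., data = iris, ntree = 50)

# With the RNG reset to the same state, the out-of-bag predictions match.
print(identical(fit_a$predicted, fit_b$predicted))
```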

Local validation set accuracy


In [9]:
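# Confusion matrix: rows are predicted facies, columns are actual facies.
acc_table_RF = table(RF.local.pred, test_local$Facies)
acc_table_RF
# Accuracy is the share of validation rows on the diagonal (correct predictions).
acc_RF = sum(diag(acc_table_RF))/nrow(test_local)
acc_RF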


Out[9]:
             
RF.local.pred   1   2   3   4   5   6   7   8   9
            1  57   7   1   1   1   0   0   0   0
            2  18 171  45   1   1   0   1   2   0
            3   2  29 146   0   1   0   0   7   0
            4   0   0   2  42   2   4   0   3   0
            5   0   0   0   3  31   0   0   3   0
            6   0   1   2  11  16  98   1  27   2
            7   0   0   0   0   1   1  18   2   1
            8   0   1   1   4   8  31   8 101   8
            9   0   0   0   0   0   0   0   2  44
Out[9]:
0.729896907216495
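Overall accuracy hides how unevenly the facies are classified. As a rough sketch, the confusion matrix printed above can be re-entered by hand to get per-class recall and precision; the names `cm`, `recall` and `precision` are introduced here for illustration only.

```r
# Confusion matrix copied from the output above: rows = predicted, columns = actual.
cm = matrix(c(57,   7,   1,  1,  1,  0,  0,   0,  0,
              18, 171,  45,  1,  1,  0,  1,   2,  0,
               2,  29, 146,  0,  1,  0,  0,   7,  0,
               0,   0,   2, 42,  2,  4,  0,   3,  0,
               0,   0,   0,  3, 31,  0,  0,   3,  0,
               0,   1,   2, 11, 16, 98,  1,  27,  2,
               0,   0,   0,  0,  1,  1, 18,   2,  1,
               0,   1,   1,  4,  8, 31,  8, 101,  8,
               0,   0,   0,  0,  0,  0,  0,   2, 44),
            nrow = 9, byrow = TRUE,
            dimnames = list(predicted = as.character(1:9),
                            actual = as.character(1:9)))

# Overall accuracy: correct predictions divided by all validation rows.
accuracy = sum(diag(cm)) / sum(cm)

# Recall: of the rows that truly belong to a facies, the share predicted as it.
recall = diag(cm) / colSums(cm)

# Precision: of the rows predicted as a facies, the share that truly belong to it.
precision = diag(cm) / rowSums(cm)

print(round(accuracy, 4))
print(round(rbind(recall, precision), 2))
```

Most of the off-diagonal counts sit next to the diagonal (for example facies 2 against 3, and 6 against 8), which the per-class figures make easier to see than the single accuracy number.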

Predicting on the blind dataset


In [10]:
RF.prod.pred = predict(RF.local.model, newdata = test_prod)

Forming the submission file


In [11]:
sub = cbind(test_prod, Facies = RF.prod.pred)

Writing the predicted output file


In [12]:
write.csv(sub, file = 'RF_predicted_facies_1_MATT.csv', row.names = FALSE)