Random forest

Note from Matt: I converted this from the Rmd file with the ipyrmd utility, which worked almost perfectly.



In [1]:

    
# Uncomment this line if you don't already have this library.
# install.packages("randomForest", repos="http://cran.rstudio.com/")



In [2]:

    
library(randomForest)

randomForest 4.6-12
Type rfNews() to see new features/changes/bug fixes.

Reading the files



In [3]:

    
train_prod = read.csv('../facies_vectors.csv')
test_prod = read.csv('../nofacies_data.csv')
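The structure listing in this notebook shows Formation and Well.Name as factors, which is what read.csv returned by default before R 4.0.0. From R 4.0.0 onward the default is stringsAsFactors = FALSE, so the same call gives character columns instead. A minimal sketch of the difference, using a tiny inline table (the GR values are made up) in place of the real CSV files:

```r
# Tiny stand-in for a CSV file (hypothetical rows, for illustration only).
csv_text <- "Formation,GR\nA1 SH,77.5\nA1 LM,78.3\nA1 SH,79.0"

# R >= 4.0.0 default: strings stay character.
as_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Explicitly request factors, matching the pre-4.0.0 default.
as_fct <- read.csv(text = csv_text, stringsAsFactors = TRUE)

class(as_chr$Formation)   # "character"
class(as_fct$Formation)   # "factor"
levels(as_fct$Formation)  # "A1 LM" "A1 SH"
```

The model in this notebook drops Formation and Well.Name before fitting, so the difference only matters if those columns are used later.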

Checking the structure of both files



In [4]:

    
str(train_prod)
str(test_prod)

'data.frame':	4149 obs. of  11 variables:
 $ Facies   : int  3 3 3 3 3 3 3 3 3 3 ...
 $ Formation: Factor w/ 14 levels "A1 LM","A1 SH",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ Well.Name: Factor w/ 10 levels "ALEXANDER D",..: 10 10 10 10 10 10 10 10 10 10 ...
 $ Depth    : num  2793 2794 2794 2794 2795 ...
 $ GR       : num  77.5 78.3 79 86.1 74.6 ...
 $ ILD_log10: num  0.664 0.661 0.658 0.655 0.647 0.636 0.63 0.625 0.624 0.615 ...
 $ DeltaPHI : num  9.9 14.2 14.8 13.9 13.5 14 15.6 16.5 16.2 16.9 ...
 $ PHIND    : num  11.9 12.6 13.1 13.1 13.3 ...
 $ PE       : num  4.6 4.1 3.6 3.5 3.4 3.6 3.7 3.5 3.4 3.5 ...
 $ NM_M     : int  1 1 1 1 1 1 1 1 1 1 ...
 $ RELPOS   : num  1 0.979 0.957 0.936 0.915 0.894 0.872 0.83 0.809 0.787 ...
'data.frame':	830 obs. of  10 variables:
 $ Formation: Factor w/ 14 levels "A1 LM","A1 SH",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ Well.Name: Factor w/ 2 levels "CRAWFORD","STUART": 2 2 2 2 2 2 2 2 2 2 ...
 $ Depth    : num  2808 2808 2809 2810 2810 ...
 $ GR       : num  66.3 77.3 82.9 80.7 76 ...
 $ ILD_log10: num  0.63 0.585 0.566 0.593 0.638 0.667 0.674 0.667 0.653 0.642 ...
 $ DeltaPHI : num  3.3 6.5 9.4 9.5 8.7 6.9 6.5 6.3 6.7 7.3 ...
 $ PHIND    : num  10.7 11.9 13.6 13.2 12.3 ...
 $ PE       : num  3.59 3.34 3.06 2.98 3.02 ...
 $ NM_M     : int  1 1 1 1 1 1 1 1 1 1 ...
 $ RELPOS   : num  1 0.978 0.956 0.933 0.911 0.889 0.867 0.844 0.822 0.8 ...

Removing the rows where PE is NA



In [5]:

    
train_prod = train_prod[!is.na(train_prod$PE),]
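The filter above drops a row only when PE is missing. A hedged sketch of how one might count missing values per column first, and compare the PE-only filter with dropping every incomplete row. The toy data frame and its values are made up for illustration:

```r
# Toy data frame with made-up values; NA marks a missing measurement.
toy <- data.frame(
  GR = c(77.5, 78.3, NA, 86.1),
  PE = c(4.6, NA, 3.6, 3.5)
)

# Number of missing values in each column.
na_counts <- colSums(is.na(toy))

# Same filter as the notebook: keep rows where PE is present.
pe_only <- toy[!is.na(toy$PE), ]

# Stricter alternative: keep rows with no missing value in any column.
all_cols <- toy[complete.cases(toy), ]

na_counts        # GR: 1, PE: 1
nrow(pe_only)    # 3
nrow(all_cols)   # 2
```

The formula interface of randomForest defaults to na.action = na.fail, so an NA left in any predictor stops the fit with an error rather than being skipped silently.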

Converting the Facies column to factor



In [6]:

    
train_prod$Facies = as.factor(as.character(train_prod$Facies))
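The conversion matters because randomForest picks classification or regression from the type of the response: a factor gives classification, a numeric vector gives regression. A small sketch with made-up labels shows what the conversion produces:

```r
# Made-up integer class labels standing in for the Facies column.
facies_int <- c(3L, 3L, 1L, 9L, 2L)

# Convert to a factor so the labels are treated as categories, not numbers.
facies_fct <- as.factor(as.character(facies_int))

is.numeric(facies_int)  # TRUE
is.factor(facies_fct)   # TRUE
levels(facies_fct)      # "1" "2" "3" "9"
table(facies_fct)       # counts per class
```

Going through as.character sorts the levels as strings, which matches numeric order for single-digit labels like these. With labels of 10 or more, "10" would sort before "2".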

Splitting into training and local validation sets



In [7]:

    
train_row = sample(nrow(train_prod), 0.7*nrow(train_prod), replace=F)
train_local = train_prod[train_row,]
test_local  = train_prod[-train_row,]
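sample() draws from R's random number generator, so the split changes on every run unless the generator is seeded first. A minimal sketch of a reproducible 70/30 split on a toy data frame. The seed value and the helper name split_rows are arbitrary choices for illustration, not part of the original notebook:

```r
# Hypothetical helper: return row indices for the training part of a split.
split_rows <- function(n, frac = 0.7, seed = 2) {
  set.seed(seed)                      # make the draw repeatable
  sample(n, size = floor(frac * n))   # sample without replacement
}

toy <- data.frame(id = 1:10, x = (1:10) / 10)  # made-up data

rows_a <- split_rows(nrow(toy))
rows_b <- split_rows(nrow(toy))

train_toy <- toy[rows_a, ]
test_toy  <- toy[-rows_a, ]

identical(rows_a, rows_b)                      # TRUE: same seed, same split
nrow(train_toy)                                # 7
nrow(test_toy)                                 # 3
length(intersect(train_toy$id, test_toy$id))   # 0: no row in both sets
```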

Creating the model



In [8]:

    
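# `seed` is not a documented randomForest() argument, so the original `seed=2`
# most likely had no effect. Seed R's random number generator instead.
set.seed(2)
RF.local.model = randomForest(Facies~., data = train_local[!colnames(train_local) %in% c('Formation',
                                                                                 'Well.Name',
                                                                                 'Depth'
                                                                                 )])
RF.local.pred = predict(RF.local.model, newdata = test_local)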
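Each tree in a random forest is grown on a bootstrap sample, with a random subset of predictors tried at each split, so two fits on the same data generally differ unless R's random number generator is seeded before each call. A sketch of that behaviour on the built-in iris data (used here only because it ships with R; ntree = 50 is an arbitrary choice to keep the example quick):

```r
library(randomForest)

# Fit twice, setting the same seed immediately before each call.
set.seed(2)
fit_a <- randomForest(Species ~ ., data = iris, ntree = 50)
set.seed(2)
fit_b <- randomForest(Species ~ ., data = iris, ntree = 50)

# Out-of-bag predictions agree when the random draws are identical.
identical(fit_a$predicted, fit_b$predicted)

# Out-of-bag confusion matrix stored on the fitted model.
fit_a$confusion
```

The out-of-bag confusion matrix gives an error estimate without a separate validation set, which can be compared against the local validation accuracy.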

Local validation set accuracy



In [9]:

    
acc_table_RF = table(RF.local.pred, test_local$Facies)
acc_table_RF
acc_RF = sum(diag(acc_table_RF))/nrow(test_local)
acc_RF

    Out[9]:

             
RF.local.pred   1   2   3   4   5   6   7   8   9
            1  57   7   1   1   1   0   0   0   0
            2  18 171  45   1   1   0   1   2   0
            3   2  29 146   0   1   0   0   7   0
            4   0   0   2  42   2   4   0   3   0
            5   0   0   0   3  31   0   0   3   0
            6   0   1   2  11  16  98   1  27   2
            7   0   0   0   0   1   1  18   2   1
            8   0   1   1   4   8  31   8 101   8
            9   0   0   0   0   0   0   0   2  44






    Out[9]:

0.729896907216495
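
Overall accuracy hides how unevenly the classes are predicted. In the table above, for example, classes 6 and 8 are often confused with each other, as are classes 2 and 3. Per-class precision and recall can be read off the same kind of table. A sketch on a small made-up 3-class table laid out the same way (rows are predictions, columns are true labels):

```r
# Made-up confusion table: rows = predicted class, columns = true class.
conf <- matrix(c(8, 1, 0,
                 2, 7, 1,
                 0, 2, 9),
               nrow = 3, byrow = TRUE,
               dimnames = list(pred = c("1", "2", "3"),
                               true = c("1", "2", "3")))

accuracy  <- sum(diag(conf)) / sum(conf)  # correct / all
precision <- diag(conf) / rowSums(conf)   # correct / predicted as that class
recall    <- diag(conf) / colSums(conf)   # correct / truly in that class

accuracy    # 0.8
precision
recall
```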



Predicting on the blind dataset



In [10]:

    
RF.prod.pred = predict(RF.local.model, newdata = test_prod)
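predict() looks up the predictors by name in newdata, so it can help to confirm that the blind data contains every column the model was trained on. A sketch of such a check with toy data frames (made-up values). The helper missing_predictors is hypothetical:

```r
# Hypothetical helper: predictors used in training but absent from new data.
missing_predictors <- function(train, new, response) {
  setdiff(setdiff(names(train), response), names(new))
}

train_toy <- data.frame(Facies = factor(c(1, 2)),
                        GR = c(77.5, 66.3),
                        PE = c(4.6, 3.6))
ok_toy    <- data.frame(GR = 80.1, PE = 3.4)
bad_toy   <- data.frame(GR = 80.1)

missing_predictors(train_toy, ok_toy, "Facies")   # character(0)
missing_predictors(train_toy, bad_toy, "Facies")  # "PE"

# Rows with an NA in any column can also be counted before predicting.
sum(!complete.cases(ok_toy))  # 0
```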

Forming the submission file



In [11]:

    
sub = cbind(test_prod, Facies = RF.prod.pred)

Writing the predicted output file



In [12]:

    
write.csv(sub, file = 'RF_predicted_facies_1_MATT.csv', row.names = FALSE)
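
A quick way to confirm the file was written as intended is to read it back and compare its shape and column names with the table in memory. A sketch using a temporary file and a toy data frame (made-up values) rather than the real submission:

```r
# Toy stand-in for the submission table.
sub_toy <- data.frame(Depth = c(2808.0, 2808.5),
                      Facies = factor(c("2", "3")))

out_file <- tempfile(fileext = ".csv")   # temporary path, removed below
write.csv(sub_toy, file = out_file, row.names = FALSE)

back <- read.csv(out_file)

identical(names(back), names(sub_toy))   # TRUE
nrow(back) == nrow(sub_toy)              # TRUE

unlink(out_file)
```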
