Thanish's entry

Note from Matt: I converted this from the Rmd file with the ipyrmd utility, which worked almost perfectly.

This is my first time using a notebook and GitHub, so I just want to run an initial SVM to see how it works and what accuracy I get.


In [ ]:
# Uncomment this line if you don't already have this library.
# install.packages("e1071", repos="http://cran.rstudio.com/")

In [4]:
library(e1071)

Reading the training and blind test files


In [5]:
train_prod = read.csv('../facies_vectors.csv')
test_prod = read.csv('../nofacies_data.csv')

Checking the structure of both files


In [6]:
str(train_prod)
str(test_prod)


'data.frame':	4149 obs. of  11 variables:
 $ Facies   : int  3 3 3 3 3 3 3 3 3 3 ...
 $ Formation: Factor w/ 14 levels "A1 LM","A1 SH",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ Well.Name: Factor w/ 10 levels "ALEXANDER D",..: 10 10 10 10 10 10 10 10 10 10 ...
 $ Depth    : num  2793 2794 2794 2794 2795 ...
 $ GR       : num  77.5 78.3 79 86.1 74.6 ...
 $ ILD_log10: num  0.664 0.661 0.658 0.655 0.647 0.636 0.63 0.625 0.624 0.615 ...
 $ DeltaPHI : num  9.9 14.2 14.8 13.9 13.5 14 15.6 16.5 16.2 16.9 ...
 $ PHIND    : num  11.9 12.6 13.1 13.1 13.3 ...
 $ PE       : num  4.6 4.1 3.6 3.5 3.4 3.6 3.7 3.5 3.4 3.5 ...
 $ NM_M     : int  1 1 1 1 1 1 1 1 1 1 ...
 $ RELPOS   : num  1 0.979 0.957 0.936 0.915 0.894 0.872 0.83 0.809 0.787 ...
'data.frame':	830 obs. of  10 variables:
 $ Formation: Factor w/ 14 levels "A1 LM","A1 SH",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ Well.Name: Factor w/ 2 levels "CRAWFORD","STUART": 2 2 2 2 2 2 2 2 2 2 ...
 $ Depth    : num  2808 2808 2809 2810 2810 ...
 $ GR       : num  66.3 77.3 82.9 80.7 76 ...
 $ ILD_log10: num  0.63 0.585 0.566 0.593 0.638 0.667 0.674 0.667 0.653 0.642 ...
 $ DeltaPHI : num  3.3 6.5 9.4 9.5 8.7 6.9 6.5 6.3 6.7 7.3 ...
 $ PHIND    : num  10.7 11.9 13.6 13.2 12.3 ...
 $ PE       : num  3.59 3.34 3.06 2.98 3.02 ...
 $ NM_M     : int  1 1 1 1 1 1 1 1 1 1 ...
 $ RELPOS   : num  1 0.978 0.956 0.933 0.911 0.889 0.867 0.844 0.822 0.8 ...

Removing the rows where PE is NA


In [7]:
train_prod = train_prod[!is.na(train_prod$PE),]
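The cell above filters on the PE column only. A minimal sketch of how one might check for missing values in every column, using a small made-up data frame (`toy` and its values are hypothetical stand-ins, not the real data):

```r
# Toy data frame standing in for train_prod (values are made up for illustration).
toy <- data.frame(
  Facies = c(3, 3, 2, 1),
  GR     = c(77.5, 78.3, NA, 86.1),
  PE     = c(4.6, NA, 3.6, NA)
)

# Count the missing values in each column.
na_counts <- colSums(is.na(toy))
print(na_counts)

# Keep only rows with no NA in any column (stricter than filtering on PE alone).
toy_complete <- toy[complete.cases(toy), ]
print(nrow(toy_complete))
```

Running `colSums(is.na(train_prod))` on the real data would show whether filtering on PE alone is enough.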

Converting the Facies column to factor


In [8]:
train_prod$Facies = as.factor(as.character(train_prod$Facies))

Splitting into a training set (70%) and a local validation set (30%)


In [9]:
train_row = sample(nrow(train_prod), 0.7*nrow(train_prod), replace=F)
train_local = train_prod[train_row,]
test_local  = train_prod[-train_row,]
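The split above is random and no seed is set, so the rows drawn (and the accuracy reported further down) will differ from run to run. A small sketch of a reproducible version of the same 70/30 split on a toy row count; the seed value 42 and the names `n`, `train_idx`, and `test_idx` are arbitrary choices for illustration:

```r
# Fix the random number generator so the same rows are drawn every run.
# The seed value itself is arbitrary.
set.seed(42)

n <- 10                                   # stand-in for nrow(train_prod)
train_idx <- sample(n, floor(0.7 * n))    # 70% of the row indices, without replacement
test_idx  <- setdiff(seq_len(n), train_idx)

# Re-seeding and sampling again gives exactly the same indices.
set.seed(42)
train_idx_again <- sample(n, floor(0.7 * n))

print(length(train_idx))
print(length(test_idx))
print(identical(train_idx, train_idx_again))
```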

Creating SVM model


In [10]:
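# Drop the non-log columns so the model is fitted on the log measurements only.
drop_cols = c('Formation', 'Well.Name', 'Depth')
SVM.local.model = svm(Facies~., data = train_local[!colnames(train_local) %in% drop_cols])
# predict() picks out the columns named in the model formula, so the extra
# columns still present in test_local are ignored.
SVM.local.pred = predict(SVM.local.model, newdata = test_local)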
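The model cell selects its feature columns with the `!colnames(...) %in% c(...)` idiom. A short sketch of how that expression evaluates, on a made-up two-row data frame (`toy` and its values are hypothetical). Note that `%in%` binds more tightly than `!`, so the negation applies to the whole membership test:

```r
# Toy data frame with the same kinds of columns as train_local (values made up).
toy <- data.frame(
  Facies    = factor(c(3, 2)),
  Formation = c("A1 SH", "A1 LM"),
  Well.Name = c("STUART", "CRAWFORD"),
  Depth     = c(2793.0, 2793.5),
  GR        = c(77.5, 78.3),
  PE        = c(4.6, 4.1)
)

drop_cols <- c("Formation", "Well.Name", "Depth")

# TRUE for columns to keep, FALSE for columns to drop.
keep <- !colnames(toy) %in% drop_cols
print(keep)

# Indexing the data frame with the logical vector keeps only the TRUE columns.
features <- toy[keep]
print(colnames(features))
```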

Local validation set accuracy


In [11]:
acc_table_SVM = table(SVM.local.pred, test_local$Facies)
acc_table_SVM
acc_SVM = sum(diag(acc_table_SVM))/nrow(test_local)
acc_SVM


Out[11]:
              
SVM.local.pred   1   2   3   4   5   6   7   8   9
             1  34   6   1   0   0   0   0   0   0
             2  31 169  62   0   2   0   0   0   0
             3   1  54 128   0   3   0   1   5   0
             4   0   0   1  29   2  17   1   3   0
             5   0   0   2   1   1   0   1   0   0
             6   0   0   1  18  50  98   3  33   4
             7   0   1   0   0   2   3  12   1   1
             8   0   0   1   2  13  16   6 104  14
             9   0   0   0   0   0   0   1   2  29
Out[11]:
0.622680412371134
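Overall accuracy hides how unevenly the facies are predicted. As a rough sketch, the confusion table printed above can be retyped as a matrix and used to compute per-facies precision and recall; the names `cm`, `precision`, and `recall` are introduced here for illustration, and the counts are copied from the output above (rows are predictions, columns are true facies):

```r
# Confusion table copied from the output above: rows = predicted, columns = actual.
cm <- matrix(c(
  34,   6,   1,  0,  0,  0,  0,   0,  0,
  31, 169,  62,  0,  2,  0,  0,   0,  0,
   1,  54, 128,  0,  3,  0,  1,   5,  0,
   0,   0,   1, 29,  2, 17,  1,   3,  0,
   0,   0,   2,  1,  1,  0,  1,   0,  0,
   0,   0,   1, 18, 50, 98,  3,  33,  4,
   0,   1,   0,  0,  2,  3, 12,   1,  1,
   0,   0,   1,  2, 13, 16,  6, 104, 14,
   0,   0,   0,  0,  0,  0,  1,   2, 29
), nrow = 9, byrow = TRUE,
dimnames = list(predicted = as.character(1:9), actual = as.character(1:9)))

# Overall accuracy: correct predictions (the diagonal) over all predictions.
accuracy <- sum(diag(cm)) / sum(cm)

# Precision: of the rows predicted as facies k, the fraction that really are k.
precision <- diag(cm) / rowSums(cm)

# Recall: of the rows that really are facies k, the fraction predicted as k.
recall <- diag(cm) / colSums(cm)

print(round(accuracy, 4))
print(round(precision, 3))
print(round(recall, 3))
```

In this table, facies 5 stands out: only 1 of its 73 validation rows is recovered, and 50 of them are predicted as facies 6.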

Predicting on the blind dataset


In [12]:
SVM.prod.pred = predict(SVM.local.model, newdata = test_prod)

Forming the submission file


In [13]:
sub = cbind(test_prod, Facies = SVM.prod.pred)

Writing the predicted output file


In [14]:
write.csv(sub, file = 'SVM_predicted_facies_MATT.csv', row.names = FALSE)