In [1]:
%load_ext rmagic


During startup - Warning messages:
1: Setting LC_TIME failed, using "C" 
2: Setting LC_MONETARY failed, using "C" 
3: Setting LC_PAPER failed, using "C" 
4: Setting LC_MEASUREMENT failed, using "C" 

In [4]:
%%R

# directory that holds the tab-separated trace files
trace_dir = "05-05-2014"
mycsv = paste(trace_dir, "robust_traces.csv", sep="/")
robust_program_events = read.csv(mycsv, sep="\t", header = FALSE)


mycsv = paste(trace_dir, "buggy_traces.csv", sep="/")
buggy_program_events = read.csv(mycsv, sep="\t", header = FALSE)

print(nrow(robust_program_events))
print(nrow(buggy_program_events))


[1] 267
[1] 260

In [5]:
%%R

print(robust_program_events[1,])


               V1
1 /usr/bin/sqlite

1 signal:ret_val=NPtr32  signal:1=GPtr32  signal:0=Num32B8  getuid:ret_val=Num32B16  getuid:0=Top32  strlen:ret_val=Num32B8  strlen:0=HPtr32  malloc:0=Num32B8  malloc:ret_val=HPtr32  strcpy:1=HPtr32  strcpy:0=HPtr32  strcpy:ret_val=HPtr32  malloc:0=Num32B8  malloc:ret_val=HPtr32  sprintf:1=GPtr32  sprintf:0=HPtr32  sprintf:ret_val=Num32B8  free:0=HPtr32  free:ret_val=Top32  fopen:ret_val=NPtr32  fopen:1=GPtr32  fopen:0=HPtr32  fputs:0=HPtr32  fputs:1=LPtr32  fputs:ret_val=Num32B8  fputs:0=SPtr32  fputs:1=LPtr32  fputs:ret_val=Num32B8  fputs:0=HPtr32  fputs:1=LPtr32  fputs:ret_val=Num32B8  fputs:0=SPtr32  fputs:1=LPtr32  fputs:ret_val=Num32B8  fputs:0=HPtr32  fputs:1=LPtr32  fputs:ret_val=Num32B8  fputs:0=SPtr32  fputs:1=LPtr32  fputs:ret_val=Num32B8  fputs:0=SPtr32  fputs:1=LPtr32  fputs:ret_val=Num32B8  fputc:0=Num32B8  fputc:1=LPtr32  fputc:ret_val=Num32B8  fputs:0=HPtr32  fputs:1=LPtr32  fputs:ret_val=Num32B8  fputs:0=SPtr32  fputs:1=LPtr32  fputs:ret_val=Num32B8  fputs:0=HPtr32  fputs:1=LPtr32  fputs:ret_val=Num32B8  fputs:0=SPtr32  fputs:1=LPtr32  fputs:ret_val=Num32B8  fputs:0=HPtr32  fputs:1=LPtr32  fputs:ret_val=Num32B8  fputs:0=SPtr32  fputs:1=LPtr32  fputs:ret_val=Num32B8  fputs:0=SPtr32  fputs:1=LPtr32  fputs:ret_val=Num32B8  fputc:0=Num32B8  fputc:1=LPtr32  fputc:ret_val=Num32B8  

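Each program trace is treated as a document. The first column holds the program path and the second holds the sequence of events recorded for that program. Every event, such as strlen:0=HPtr32, plays the role of a word: it appears to combine a function name, an argument position (or ret_val for the return value) and a type tag for the observed value. Counting how often each event occurs in a trace gives the features used in the rest of this notebook.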
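
As a rough illustration of treating a trace as a document, the sketch below counts the events of a shortened, made-up trace in base R. The trace string is an assumption written in the same style as the row printed above, not real data, and tm applies its own tokenization and normalization, so the terms it produces may differ from these.

```r
# A shortened, made-up trace in the same "function:slot=type" style as above.
trace <- paste("strlen:0=HPtr32", "strlen:ret_val=Num32B8",
               "malloc:0=Num32B8", "malloc:ret_val=HPtr32",
               "strlen:0=HPtr32")

# Split on whitespace so that each event becomes one "word".
events <- strsplit(trace, "[[:space:]]+")[[1]]

# Count how often each event occurs: this is one row of a document-term matrix.
event_counts <- table(events)
print(event_counts)
```

Each distinct event becomes one column, and each trace becomes one row of counts.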

Now we load the tm package and create the corpora from these "documents":


In [6]:
%%R

library(tm)

robust_corpus = Corpus(VectorSource(robust_program_events[,2]))
buggy_corpus  = Corpus(VectorSource(buggy_program_events[,2]))

print(robust_corpus)
print(buggy_corpus)


A corpus with 267 text documents
A corpus with 260 text documents

Now it is time to create the document-term matrices and convert them to data frames, adding a class label to each.


In [7]:
%%R

robust_dm = DocumentTermMatrix(robust_corpus)
buggy_dm  = DocumentTermMatrix(buggy_corpus)

#print(robust_dm)
#print(buggy_dm)

# as.matrix() returns the dense count matrix directly, so there is no need
# to silence the printing of inspect() with sink()
robust_dm_df = as.data.frame(as.matrix(robust_dm))
rownames(robust_dm_df) = 1:nrow(robust_dm)
robust_dm_df["class"] = "robust"

buggy_dm_df = as.data.frame(as.matrix(buggy_dm))
rownames(buggy_dm_df) = 1:nrow(buggy_dm)
buggy_dm_df["class"] = "buggy"

#print(colnames(robust_dm_df))
#print(colnames(buggy_dm_df))

But we need to make sure that both data frames use the same variables (columns), since each corpus produced its own set of terms.


In [8]:
%%R

dm_df = merge(robust_dm_df, buggy_dm_df,all=TRUE, sort=FALSE) 

#print(dm_df[1,])
#print(nrow(dm_df))
dm_df[is.na(dm_df)] = 0

robust_cases = dm_df[dm_df$class == "robust",]
buggy_cases  = dm_df[dm_df$class == "buggy",]

print(nrow(robust_cases))
print(nrow(buggy_cases))


[1] 267
[1] 260
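
The effect of merge with all=TRUE followed by the NA replacement can be seen on toy data. This is a minimal sketch: the data frames robust_toy and buggy_toy and their counts are made up for illustration.

```r
# Toy per-class count tables with partly different event columns (made-up values).
robust_toy <- data.frame(fopen = c(1, 2), malloc = c(0, 1), class = "robust",
                         stringsAsFactors = FALSE)
buggy_toy  <- data.frame(malloc = 3, strcpy = 1, class = "buggy",
                         stringsAsFactors = FALSE)

# Outer join on the shared columns (malloc and class). Rows never match because
# the class differs, so every row is kept and columns missing on one side are NA.
toy <- merge(robust_toy, buggy_toy, all = TRUE, sort = FALSE)

# An event that never occurs in a class gets a count of zero there.
toy[is.na(toy)] <- 0

print(toy)
```

The result has one row per original row and one column per event seen in either class, which is the shape the classifiers below expect.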

Now we are ready to select the training and test sets.


In [11]:
%%R

train_size = 200

# buggy train and test

n = nrow(buggy_cases)
#rsample = sample(n)

# the first train_size (200) cases are selected to keep the train dataset balanced
train_sample = 1:train_size #rsample[1:as.integer(n*0.45)] 
test_sample = (train_size+1):n #rsample[as.integer(n*0.45+1):n]

buggy_train = buggy_cases[train_sample,]
buggy_test  = buggy_cases[test_sample,]

print(nrow(buggy_train))
print(nrow(buggy_test))

# robust train and test

n = nrow(robust_cases)
#rsample = sample(n)

# the first train_size (200) cases are selected to keep the train dataset balanced
train_sample = 1:train_size #rsample[1:as.integer(n*0.75)]
test_sample = (train_size+1):n #rsample[as.integer(n*0.75+1):n]

robust_train = robust_cases[train_sample,]
robust_test  = robust_cases[test_sample,]

print(nrow(robust_train))
print(nrow(robust_test))

train = rbind(buggy_train, robust_train)
test  = rbind(buggy_test, robust_test)


[1] 200
[1] 60
[1] 200
[1] 67
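
The split above takes the first 200 rows of each class in file order. If the rows are ordered in some systematic way, a shuffled split may be preferable. The sketch below shows one way to do that in base R; the helper split_cases, the fixed seed and the stand-in data frame are assumptions for illustration and are not part of the analysis above.

```r
# Hypothetical helper: shuffle the rows of one class and cut at train_size.
split_cases <- function(cases, train_size, seed = 1) {
  set.seed(seed)                    # fixed seed, assumed here for repeatability
  shuffled <- sample(nrow(cases))   # random permutation of the row indices
  list(train = cases[shuffled[seq_len(train_size)], , drop = FALSE],
       test  = cases[shuffled[-seq_len(train_size)], , drop = FALSE])
}

# Stand-in data with the same number of rows as buggy_cases (260).
toy_cases <- data.frame(id = 1:260, class = "buggy")
parts <- split_cases(toy_cases, train_size = 200)

print(nrow(parts$train))
print(nrow(parts$test))
```

Calling the helper once per class with the same train_size keeps the training set balanced, as in the cell above.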

Finally, we are ready to train and test a k-nearest-neighbours (knn) model, trying every k from 1 to 10:


In [13]:
%%R

#print(round(importance(rf),2))

library("class")

x_train = train[,names(train) != "class"]
x_test  = test[,names(test) != "class"]
y_train = train[,"class"]
y_test  = test[,"class"]

#print(y)
for (k in 1:10) {
  print(k)
  z = knn(x_train,x_test, y_train, k, use.all = FALSE)
  print(z)
  #print(test[,"class"])
  print(table(z, y_test))
}


[1] 1
  [1] buggy  robust buggy  buggy  buggy  robust buggy  buggy  robust robust
 [11] robust robust robust robust robust buggy  buggy  buggy  buggy  buggy 
 [21] buggy  buggy  buggy  buggy  buggy  buggy  buggy  buggy  buggy  buggy 
 [31] buggy  buggy  buggy  buggy  buggy  buggy  robust robust buggy  robust
 [41] robust robust robust robust buggy  robust robust robust robust buggy 
 [51] buggy  robust robust robust buggy  robust robust robust robust robust
 [61] buggy  buggy  robust robust buggy  robust robust robust robust robust
 [71] robust buggy  robust robust robust robust robust robust robust robust
 [81] robust buggy  robust robust robust robust buggy  robust robust robust
 [91] robust robust robust robust robust robust robust robust robust robust
[101] robust robust robust robust robust buggy  robust buggy  buggy  buggy 
[111] buggy  robust buggy  robust robust robust robust robust robust robust
[121] robust robust robust robust robust robust robust
Levels: buggy robust
        y_test
z        buggy robust
  buggy     32     12
  robust    28     55
[1] 2
  [1] robust robust buggy  buggy  buggy  robust buggy  buggy  robust robust
 [11] robust robust robust robust robust buggy  buggy  buggy  buggy  buggy 
 [21] buggy  buggy  buggy  buggy  buggy  buggy  buggy  buggy  buggy  buggy 
 [31] buggy  buggy  buggy  buggy  buggy  buggy  robust robust robust robust
 [41] robust robust robust robust buggy  robust robust robust robust buggy 
 [51] buggy  robust robust robust buggy  robust robust robust robust robust
 [61] buggy  buggy  robust robust buggy  robust robust robust robust robust
 [71] robust buggy  robust robust robust robust robust robust robust robust
 [81] robust buggy  robust robust robust robust buggy  robust robust robust
 [91] robust robust robust robust buggy  robust robust robust robust robust
[101] robust robust robust robust buggy  buggy  robust buggy  buggy  buggy 
[111] buggy  robust buggy  robust robust robust buggy  robust robust robust
[121] robust robust robust robust robust robust robust
Levels: buggy robust
        y_test
z        buggy robust
  buggy     30     15
  robust    30     52
[1] 3
  [1] buggy  robust buggy  buggy  buggy  robust buggy  buggy  robust robust
 [11] robust robust robust robust robust buggy  buggy  buggy  buggy  buggy 
 [21] buggy  buggy  buggy  buggy  buggy  buggy  buggy  buggy  buggy  buggy 
 [31] buggy  buggy  buggy  buggy  buggy  buggy  robust robust buggy  robust
 [41] robust robust robust robust buggy  robust robust robust robust buggy 
 [51] buggy  robust robust robust buggy  robust robust robust robust robust
 [61] robust buggy  robust robust buggy  robust robust robust robust robust
 [71] robust buggy  robust robust robust robust robust robust robust robust
 [81] robust buggy  robust robust robust robust buggy  robust buggy  robust
 [91] robust robust robust robust robust robust robust robust robust robust
[101] robust robust robust robust robust robust robust buggy  buggy  buggy 
[111] buggy  robust robust robust robust robust robust buggy  robust robust
[121] robust robust robust robust robust robust robust
Levels: buggy robust
        y_test
z        buggy robust
  buggy     32     11
  robust    28     56
[1] 4
  [1] buggy  robust robust robust robust robust buggy  buggy  robust robust
 [11] robust buggy  robust robust buggy  buggy  buggy  buggy  buggy  buggy 
 [21] buggy  buggy  buggy  buggy  buggy  buggy  robust buggy  buggy  buggy 
 [31] buggy  buggy  buggy  buggy  buggy  buggy  robust buggy  buggy  robust
 [41] robust robust robust robust buggy  robust robust robust robust buggy 
 [51] buggy  robust robust robust buggy  robust robust robust robust robust
 [61] robust buggy  robust robust buggy  robust robust robust robust robust
 [71] robust buggy  robust robust robust robust robust robust robust robust
 [81] robust buggy  robust robust robust robust buggy  robust buggy  robust
 [91] robust robust robust robust robust robust robust robust robust robust
[101] robust robust robust robust robust robust robust buggy  buggy  buggy 
[111] buggy  robust robust robust robust robust robust buggy  robust robust
[121] robust robust robust robust robust robust robust
Levels: buggy robust
        y_test
z        buggy robust
  buggy     31     11
  robust    29     56
[1] 5
  [1] robust robust robust robust robust robust robust buggy  robust robust
 [11] robust buggy  robust robust robust buggy  buggy  buggy  buggy  buggy 
 [21] buggy  buggy  buggy  buggy  buggy  buggy  buggy  buggy  buggy  buggy 
 [31] buggy  buggy  buggy  buggy  buggy  buggy  robust robust buggy  robust
 [41] robust robust robust robust buggy  robust robust robust robust buggy 
 [51] buggy  robust robust robust buggy  robust robust robust robust robust
 [61] robust robust robust robust buggy  robust robust robust robust robust
 [71] robust buggy  robust robust robust robust robust robust robust robust
 [81] robust buggy  robust robust robust robust buggy  robust buggy  robust
 [91] robust robust robust robust robust robust robust robust robust robust
[101] robust robust robust robust robust robust robust buggy  buggy  buggy 
[111] buggy  robust robust robust robust robust robust buggy  robust robust
[121] robust robust robust robust robust robust robust
Levels: buggy robust
        y_test
z        buggy robust
  buggy     28     10
  robust    32     57
[1] 6
  [1] robust robust robust robust robust robust robust buggy  robust buggy 
 [11] robust buggy  robust robust buggy  buggy  buggy  buggy  buggy  buggy 
 [21] buggy  buggy  buggy  buggy  buggy  buggy  buggy  buggy  buggy  buggy 
 [31] buggy  buggy  buggy  buggy  buggy  buggy  robust robust buggy  robust
 [41] robust robust robust robust buggy  robust robust robust robust buggy 
 [51] buggy  robust robust robust buggy  robust robust robust robust robust
 [61] robust robust robust robust buggy  robust robust robust robust robust
 [71] robust buggy  robust robust robust robust robust robust robust robust
 [81] robust buggy  robust robust buggy  robust buggy  buggy  buggy  robust
 [91] robust robust robust robust robust robust robust robust robust robust
[101] robust robust robust robust robust robust robust buggy  buggy  buggy 
[111] buggy  robust robust robust robust robust robust buggy  robust robust
[121] robust robust robust robust robust robust robust
Levels: buggy robust
        y_test
z        buggy robust
  buggy     30     12
  robust    30     55
[1] 7
  [1] robust robust robust robust robust robust robust buggy  robust robust
 [11] robust buggy  robust robust robust buggy  buggy  buggy  buggy  buggy 
 [21] buggy  buggy  buggy  buggy  buggy  buggy  buggy  buggy  buggy  buggy 
 [31] buggy  buggy  buggy  buggy  buggy  buggy  robust robust robust robust
 [41] robust robust robust robust buggy  robust robust robust robust buggy 
 [51] buggy  robust robust robust buggy  robust robust robust robust robust
 [61] robust robust robust robust buggy  robust robust robust robust robust
 [71] robust buggy  robust robust robust robust robust robust robust robust
 [81] robust buggy  robust robust robust robust buggy  robust buggy  robust
 [91] robust robust robust robust robust robust robust robust robust robust
[101] robust robust robust robust robust robust robust buggy  buggy  buggy 
[111] buggy  robust robust robust robust robust robust robust robust robust
[121] robust robust robust robust robust robust robust
Levels: buggy robust
        y_test
z        buggy robust
  buggy     27      9
  robust    33     58
[1] 8
  [1] robust robust robust robust robust robust robust buggy  robust robust
 [11] robust robust robust robust robust buggy  buggy  buggy  buggy  buggy 
 [21] buggy  buggy  buggy  buggy  buggy  buggy  buggy  buggy  buggy  buggy 
 [31] buggy  robust buggy  buggy  buggy  buggy  robust robust robust robust
 [41] robust robust robust robust buggy  robust robust robust robust buggy 
 [51] buggy  robust robust robust buggy  robust robust robust robust robust
 [61] robust robust robust robust buggy  robust buggy  robust robust robust
 [71] robust buggy  robust robust robust robust robust robust robust robust
 [81] robust buggy  robust robust buggy  robust buggy  buggy  buggy  robust
 [91] robust robust robust robust robust robust robust robust robust buggy 
[101] robust robust robust robust robust robust robust buggy  buggy  buggy 
[111] buggy  robust robust robust robust robust robust robust robust robust
[121] robust robust robust robust robust robust robust
Levels: buggy robust
        y_test
z        buggy robust
  buggy     25     13
  robust    35     54
[1] 9
  [1] robust buggy  robust robust robust robust robust buggy  robust buggy 
 [11] robust robust robust robust robust robust buggy  buggy  robust buggy 
 [21] buggy  buggy  buggy  buggy  buggy  robust buggy  buggy  buggy  buggy 
 [31] buggy  robust buggy  buggy  buggy  buggy  robust robust robust robust
 [41] robust robust robust robust buggy  robust robust robust robust buggy 
 [51] buggy  robust robust robust buggy  robust robust robust robust robust
 [61] robust robust robust robust buggy  buggy  buggy  robust robust robust
 [71] robust buggy  robust robust robust robust robust robust robust robust
 [81] robust buggy  robust robust robust robust buggy  robust buggy  robust
 [91] robust robust robust robust robust robust robust robust robust buggy 
[101] robust robust robust robust robust robust robust buggy  buggy  buggy 
[111] buggy  robust robust robust robust robust robust robust robust robust
[121] robust robust robust robust robust robust robust
Levels: buggy robust
        y_test
z        buggy robust
  buggy     24     12
  robust    36     55
[1] 10
  [1] robust buggy  robust buggy  robust robust robust robust robust buggy 
 [11] robust robust robust robust robust buggy  buggy  buggy  buggy  buggy 
 [21] buggy  buggy  buggy  buggy  buggy  buggy  buggy  buggy  buggy  buggy 
 [31] buggy  robust buggy  buggy  buggy  buggy  robust robust robust robust
 [41] robust robust robust robust buggy  robust robust robust robust buggy 
 [51] buggy  robust robust robust robust robust robust robust robust robust
 [61] buggy  robust robust robust buggy  buggy  buggy  robust robust robust
 [71] robust buggy  robust robust robust robust robust robust robust robust
 [81] robust buggy  robust robust robust robust buggy  robust buggy  robust
 [91] robust robust robust robust robust robust robust robust robust buggy 
[101] robust robust robust robust robust buggy  robust buggy  buggy  buggy 
[111] buggy  robust robust robust robust robust robust robust robust robust
[121] robust robust robust robust robust robust robust
Levels: buggy robust
        y_test
z        buggy robust
  buggy     26     14
  robust    34     53
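
To compare the ten runs, the confusion tables can be reduced to one accuracy value each. The sketch below copies the diagonal counts from the tables printed above and divides by the 127 test cases; the vector names are chosen for illustration.

```r
# Correct predictions for k = 1..10, read off the diagonals of the tables above.
correct_buggy  <- c(32, 30, 32, 31, 28, 30, 27, 25, 24, 26)
correct_robust <- c(55, 52, 56, 56, 57, 55, 58, 54, 55, 53)
n_test <- 60 + 67  # 127 test cases in total

# Accuracy = correctly classified cases / all test cases.
accuracy <- (correct_buggy + correct_robust) / n_test
print(round(accuracy, 3))

# k with the highest accuracy on this particular split.
best_k <- which.max(accuracy)
print(best_k)
```

On this split the accuracy stays below 0.7 for every k, with the best value at k = 3 (88 of 127 cases correct). Most of the errors are buggy cases predicted as robust.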

Alternatively, we can train a support vector machine (SVM) with the svm function from the e1071 package:


In [18]:
%%R
library("e1071")

xy_train = train#[,names(train) != "class"]
xy_train[,"class"] = factor(train[,"class"])
x_test = test[,names(test) != "class"]
#y_train = train[,"class"]
y_test  = test[,"class"]

m = svm(class ~., data=xy_train, gamma=0.001, cost=10)
#m = tune.svm(class~., data = xy_train,  gamma = 10^(-6:-1), cost = 10^(1:2))
print(summary(m))

z = predict(m,x_test)
print(z)
print(table(z, y_test))


Call:
svm(formula = class ~ ., data = xy_train, gamma = 0.001, cost = 10)


Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  radial 
       cost:  10 
      gamma:  0.001 

Number of Support Vectors:  246

 ( 110 136 )


Number of Classes:  2 

Levels: 
 buggy robust



   468    469    470    471    472    473    474    475    476    477    478 
 buggy robust  buggy  buggy  buggy  buggy  buggy  buggy  buggy robust  buggy 
   479    480    481    482    483    484    485    486    487    488    489 
robust robust robust robust  buggy  buggy  buggy  buggy  buggy  buggy  buggy 
   490    491    492    493    494    495    496    497    498    499    500 
 buggy  buggy  buggy  buggy  buggy  buggy  buggy  buggy  buggy  buggy  buggy 
   501    502    503    504    505    506    507    508    509    510    511 
 buggy  buggy  buggy robust robust robust robust robust robust robust robust 
   512    513    514    515    516    517    518    519    520    521    522 
robust robust robust robust robust robust robust robust robust robust robust 
   523    524    525    526    527    201    202    203    204    205    206 
robust robust robust  buggy robust robust  buggy robust robust robust robust 
   207    208    209    210    211    212    213    214    215    216    217 
robust robust robust robust robust robust robust robust robust robust robust 
   218    219    220    221    222    223    224    225    226    227    228 
robust robust robust robust robust robust robust robust robust robust robust 
   229    230    231    232    233    234    235    236    237    238    239 
robust robust robust robust robust robust robust robust robust robust robust 
   240    241    242    243    244    245    246    247    248    249    250 
robust robust robust robust robust robust robust robust robust robust robust 
   251    252    253    254    255    256    257    258    259    260    261 
robust robust robust robust robust robust robust robust robust robust robust 
   262    263    264    265    266    267 
robust robust robust robust robust robust 
Levels: buggy robust
        y_test
z        buggy robust
  buggy     31      1
  robust    29     66
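
The SVM table can be summarised in the same way. The sketch below rebuilds the confusion matrix printed above and derives accuracy plus precision and recall for the buggy class; the variable names are chosen for illustration.

```r
# Confusion matrix printed above: rows are predictions (z), columns are true classes.
svm_table <- matrix(c(31, 29, 1, 66), nrow = 2,
                    dimnames = list(z = c("buggy", "robust"),
                                    y_test = c("buggy", "robust")))

# Share of all test cases that were classified correctly.
accuracy <- sum(diag(svm_table)) / sum(svm_table)

# Of the cases predicted buggy, how many really are buggy.
precision_buggy <- svm_table["buggy", "buggy"] / sum(svm_table["buggy", ])

# Of the truly buggy cases, how many were found.
recall_buggy <- svm_table["buggy", "buggy"] / sum(svm_table[, "buggy"])

print(round(c(accuracy = accuracy,
              precision_buggy = precision_buggy,
              recall_buggy = recall_buggy), 3))
```

The SVM classifies 97 of the 127 test cases correctly. It almost never flags a robust trace as buggy (31 of 32 buggy predictions are right), but it still misses 29 of the 60 buggy traces.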