Unlike a feature extractor, in this notebook we focus on whether relying on only a few previous lags is enough to produce a good-quality classification.

Basically, we take the last time steps captured along the path and turn those final readings into the attributes of a single model instance. This simple approach is widely used in regression to forecast the next steps or values.
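
As a purely illustrative example of this transformation (the toy series, its values, and the variable names below are invented and not part of the dataset used here), the sketch shows how the last k rows of a small multivariate series can be flattened into a single feature row:

# Illustrative sketch with made-up data: keep the last k time steps of a
# multivariate series and flatten them into one feature row per sequence.
toySeries <- data.frame(anchor1 = c(0.1, 0.3, 0.2, 0.5),
                        anchor2 = c(-0.4, 0.0, 0.1, 0.2))
k <- 2                              # how many final time steps (lags) to keep
lastRows <- tail(toySeries, k)      # last k rows of the series
featureRow <- t(unlist(lastRows))   # 1 x (k * ncol) vector: anchor11, anchor12, anchor21, anchor22
featureRow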

With this approach we keep the pipeline simple and leave it to the classifier to extract the relevant structure or linearly separate the two classes, as in our binary problem.
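
To make "linearly separating the two classes" concrete, here is a minimal, self-contained sketch on synthetic data (the lag1/lag2 features, labels, and values are invented for illustration) using a linear-kernel SVM from kernlab, the backend that caret's svmLinear method relies on:

# Minimal sketch on synthetic two-class data: a linear-kernel SVM separating
# classes built from two lag-style features. All values here are invented.
library(kernlab)
set.seed(1)
toyFeatures <- data.frame(
    lag1  = c(rnorm(20, mean = -1), rnorm(20, mean = 1)),
    lag2  = c(rnorm(20, mean = -1), rnorm(20, mean = 1)),
    class = factor(rep(c("No", "Yes"), each = 20))
)
toyFit <- ksvm(class ~ ., data = toyFeatures, kernel = "vanilladot", C = 1)
table(predicted = predict(toyFit, toyFeatures), actual = toyFeatures$class)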

Below we use only the last 2 lags and let the classifier, a linear SVM, separate the instances.


In [12]:
library(caret)
library(kernlab)
library(pROC)

groups <- read.csv(file="./MovementAAL/groups/MovementAAL_DatasetGroup.csv",head=TRUE,sep=",")
targetAll <- read.csv(file="./MovementAAL/dataset/MovementAAL_target.csv",head=TRUE,sep=",")

In [2]:
#Group 1: load every RSS sequence assigned to dataset group 1
allDataGroup1<-list()
allDataGroup1Target<-list()
groups1 = groups[groups$dataset_ID==1, ]

index<-1
for (id in groups1$X.sequence_ID){
    caminho <-paste("./MovementAAL/dataset/MovementAAL_RSS_",id,".csv",sep="")
    allDataGroup1[[index]]<-read.csv(file=caminho,head=TRUE,sep=",")
    allDataGroup1Target[index]<-targetAll[[2]][id]
    index<-index+1
}
#Keep only the last (minStepsBack + 1) time steps of each sequence and
#flatten them into a single feature row per sequence
wtData <- NULL
minStepsBack = 1
for (i in 1:length(allDataGroup1)){
    aa<- t(unlist(allDataGroup1[[i]][(nrow(allDataGroup1[[i]])-minStepsBack):nrow(allDataGroup1[[i]]),]))
    wtData <- rbind(wtData, aa)
}
wtData <- as.data.frame(wtData)
#Map the -1/1 targets to the factor labels "No"/"Yes"
data = unlist(allDataGroup1Target)
target = factor(data,labels=c("No","Yes"))
frameDataFinal <- data.frame(cbind(target, wtData))
head(frameDataFinal)
##use only lagged data


Out[2]:
  target X.RSS_anchor11 X.RSS_anchor12 RSS_anchor21 RSS_anchor22 RSS_anchor31 RSS_anchor32 RSS_anchor41 RSS_anchor42
1    Yes              0       -0.14286         0.04         -0.6    -0.047619     -0.28571        -0.05         -0.1
2    Yes       -0.33333       -0.14286         0.04         0.04     0.095238      0.14286         -0.1         0.05
3    Yes       -0.28571       -0.14286        -0.04        -0.08    -0.095238      0.14286        -0.05         0.35
4    Yes       -0.42857       -0.57143         -0.2        -0.52            0     0.047619          0.4        -0.15
5    Yes       -0.57143       -0.52381         -0.6        -0.44      0.57143      0.28571          0.9          0.8
6    Yes             -1       -0.47619        -0.32         -0.2      0.71429      0.52381         0.65         0.95

Mean and standard deviation, respectively.

Group 1, evaluated over 10 random 70/30 train/test splits

In [3]:
#Ten random 70/30 train/test splits, stratified by class
inTraining <- createDataPartition(frameDataFinal$target, p = .7, list = TRUE,times=10)
allAccuracyGroup1 <- c()

for( i in 1:length(inTraining)){

    training <- frameDataFinal[ inTraining[[i]],]
    testing  <- frameDataFinal[-inTraining[[i]],]
    fitControl <- trainControl(method = "none", classProbs = TRUE)

    #Linear SVM on the lagged features, no hyperparameter tuning
    svmLinearFit <- train(target ~ ., data = training,
                     method = "svmLinear",
                     trControl = fitControl,
                     family=binomial)
    preds<- predict(svmLinearFit, newdata = testing)
    matrix <- confusionMatrix(preds,frameDataFinal$target[-inTraining[[i]]])
    #the confusion matrix's overall[1] element holds the accuracy of this split
    allAccuracyGroup1 <- c(allAccuracyGroup1,matrix[3]$overall[[1]])
}

mean(allAccuracyGroup1)
sd(allAccuracyGroup1)


Out[3]:
0.673333333333333
Out[3]:
0.0644061188719531

In [4]:
#Group 2
allDataGroup2<-list()
allDataGroup2Target<-list()
groups2 = groups[groups$dataset_ID==2, ]

index<-1
for (id in groups2$X.sequence_ID){
    caminho <-paste("./MovementAAL/dataset/MovementAAL_RSS_",id,".csv",sep="")
    allDataGroup2[[index]]<-read.csv(file=caminho,head=TRUE,sep=",")
    allDataGroup2Target[index]<-targetAll[[2]][id]
    index<-index+1
}
wtData <- NULL
minStepsBack = 1
for (i in 1:length(allDataGroup2)){
     aa<- t(unlist(allDataGroup2[[i]][(nrow(allDataGroup2[[i]])-minStepsBack):nrow(allDataGroup2[[i]]),]))
    wtData <- rbind(wtData, aa)
}
wtData <- as.data.frame(wtData)
data = unlist(allDataGroup2Target)
target = factor(data,labels=c("No","Yes"))
frameDataFinal <- data.frame(cbind(target, wtData))
head(frameDataFinal)
##use only lagged data


Out[4]:
targetX.RSS_anchor11X.RSS_anchor12RSS_anchor21RSS_anchor22RSS_anchor31RSS_anchor32RSS_anchor41RSS_anchor42
1Yes0.0666670.066667-0.48718-0.17949-0.45455-0.36364-0.047619-0.14286
2Yes0.8666710.384620.84615-0.59091-0.81818-0.7619-0.7619
3Yes0.733330.733330.538460.53846-0.81818-0.90909-0.61905-0.57143
4Yes0.20.20.538460.23077-0.86364-0.72727-1-0.57143
5Yes0.688890.688890.692310.38462-0.86364-0.90909-0.80952-0.61905
6Yes0.644440.644440.641030.69231-0.59091-0.81818-0.61905-0.57143

Mean and standard deviation, respectively.

Group 2, evaluated over 10 random 70/30 train/test splits

In [5]:
inTraining <- createDataPartition(frameDataFinal$target, p = .7, list = TRUE,times=10)
allAccuracyGroup2 <- c()

for( i in 1:length(inTraining)){

    training <- frameDataFinal[ inTraining[[i]],]
    testing  <- frameDataFinal[-inTraining[[i]],]
    fitControl <- trainControl(method = "none", classProbs = TRUE)

    svmLinearFit <- train(target ~ ., data = training,
                     method = "svmLinear",
                     trControl = fitControl,
                     family=binomial)
    preds<- predict(svmLinearFit, newdata = testing)
    matrix <- confusionMatrix(preds,frameDataFinal$target[-inTraining[[i]]])
    allAccuracyGroup2 <- c(allAccuracyGroup2,matrix[3]$overall[[1]])
}

mean(allAccuracyGroup2)
sd(allAccuracyGroup2)


Out[5]:
0.683870967741935
Out[5]:
0.0757282445662753

In [6]:
#Group 3
allDataGroup3<-list()
allDataGroup3Target<-list()
groups3 = groups[groups$dataset_ID==3, ]

index<-1
for (id in groups3$X.sequence_ID){
    caminho <-paste("./MovementAAL/dataset/MovementAAL_RSS_",id,".csv",sep="")
    allDataGroup3[[index]]<-read.csv(file=caminho,head=TRUE,sep=",")
    allDataGroup3Target[index]<-targetAll[[2]][id]
    index<-index+1
}
wtData <- NULL
minStepsBack = 1
for (i in 1:length(allDataGroup3)){
     aa<- t(unlist(allDataGroup3[[i]][(nrow(allDataGroup3[[i]])-minStepsBack):nrow(allDataGroup3[[i]]),]))
    wtData <- rbind(wtData, aa)
}
wtData <- as.data.frame(wtData)
data = unlist(allDataGroup3Target)
target = factor(data,labels=c("No","Yes"))
frameDataFinal <- data.frame(cbind(target, wtData))
head(frameDataFinal)
##use only lagged data


Out[6]:
targetX.RSS_anchor11X.RSS_anchor12RSS_anchor21RSS_anchor22RSS_anchor31RSS_anchor32RSS_anchor41RSS_anchor42
1Yes0.0909090.0909090.0666670.066667-0.33333-0.33333-0.31915-0.3617
2Yes0.772730.77273-0.155560.6-0.82222-1-0.61702-0.61702
3Yes0.50.50.20.33333-0.86667-0.68889-0.57447-0.57447
4Yes0.863640.863640.24444-0.68889-0.95556-0.95556-0.53191-0.53191
5Yes0.772730.86364-0.066667-0.11111-0.68889-0.64444-0.3617-0.3617
6Yes0.318180.727270.15556-0.11111-0.6-0.51111-0.48936-0.48936

Mean and standard deviation, respectively.

Group 3, evaluated over 10 random 70/30 train/test splits

In [7]:
inTraining <- createDataPartition(frameDataFinal$target, p = .7, list = TRUE,times=10)
allAccuracyGroup3 <- c()

for( i in 1:length(inTraining)){

    training <- frameDataFinal[ inTraining[[i]],]
    testing  <- frameDataFinal[-inTraining[[i]],]
    fitControl <- trainControl(method = "none", classProbs = TRUE)

    svmLinearFit <- train(target ~ ., data = training,
                     method = "svmLinear",
                     trControl = fitControl,
                     family=binomial)
    preds<- predict(svmLinearFit, newdata = testing)
    matrix <- confusionMatrix(preds,frameDataFinal$target[-inTraining[[i]]])
    allAccuracyGroup3 <- c(allAccuracyGroup3,matrix[3]$overall[[1]])
}

mean(allAccuracyGroup3)
sd(allAccuracyGroup3)


Out[7]:
0.516129032258065
Out[7]:
0.056897877657303

In [8]:
#All Groups
allData<-list()
allDataTarget<-list()
targetAll <- read.csv(file="./MovementAAL/dataset/MovementAAL_target.csv",head=TRUE,sep=",")

index<-1
for (id in targetAll$X.sequence_ID){
    caminho <-paste("./MovementAAL/dataset/MovementAAL_RSS_",id,".csv",sep="")
    allData[[index]]<-read.csv(file=caminho,head=TRUE,sep=",")
    allDataTarget[index]<-targetAll[[2]][id]
    index<-index+1
}
wtData <- NULL
minStepsBack = 1
for (i in 1:length(allData)){
     aa<- t(unlist(allData[[i]][(nrow(allData[[i]])-minStepsBack):nrow(allData[[i]]),]))
    wtData <- rbind(wtData, aa)
}
wtData <- as.data.frame(wtData)
data = unlist(allDataTarget)
target = factor(data,labels=c("No","Yes"))
frameDataFinal <- data.frame(cbind(target, wtData))
head(frameDataFinal)


Out[8]:
targetX.RSS_anchor11X.RSS_anchor12RSS_anchor21RSS_anchor22RSS_anchor31RSS_anchor32RSS_anchor41RSS_anchor42
1Yes0-0.142860.04-0.6-0.047619-0.28571-0.05-0.1
2Yes-0.33333-0.142860.040.040.0952380.14286-0.10.05
3Yes-0.28571-0.14286-0.04-0.08-0.0952380.14286-0.050.35
4Yes-0.42857-0.57143-0.2-0.5200.0476190.4-0.15
5Yes-0.57143-0.52381-0.6-0.440.571430.285710.90.8
6Yes-1-0.47619-0.32-0.20.714290.523810.650.95

Mean and standard deviation, respectively.

All groups combined into a single dataset, evaluated over 10 random 70/30 train/test splits

In [9]:
inTraining <- createDataPartition(frameDataFinal$target, p = .7, list = TRUE,times=10)
allAccuracy <- c()

for( i in 1:length(inTraining)){

    training <- frameDataFinal[ inTraining[[i]],]
    testing  <- frameDataFinal[-inTraining[[i]],]
    fitControl <- trainControl(method = "none", classProbs = TRUE)

    svmLinearFit <- train(target ~ ., data = training,
                     method = "svmLinear",
                     trControl = fitControl,
                     family=binomial)
    preds<- predict(svmLinearFit, newdata = testing)
    matrix <- confusionMatrix(preds,frameDataFinal$target[-inTraining[[i]]])
    allAccuracy <- c(allAccuracy,matrix[3]$overall[[1]])
}

mean(allAccuracy)
sd(allAccuracy)


Out[9]:
0.610752688172043
Out[9]:
0.0350448933665385

Confusion matrix

All groups combined into a single dataset


In [10]:
#All groups datasets Confusion Matrix 
inTraining <- createDataPartition(frameDataFinal$target, p = .7, list = TRUE,times=1)
training <- frameDataFinal[ inTraining[[1]],]
testing  <- frameDataFinal[-inTraining[[1]],]
fitControl <- trainControl(method = "none", classProbs = TRUE)

svmLinearFit <- train(target ~ ., data = training,
                     method = "svmLinear",
                     trControl = fitControl,
                     family=binomial)
preds<- predict(svmLinearFit, newdata = testing)
matrix <- confusionMatrix(preds,frameDataFinal$target[-inTraining[[1]]])
matrix


Out[10]:
Confusion Matrix and Statistics

          Reference
Prediction No Yes
       No  31  16
       Yes 15  31
                                          
               Accuracy : 0.6667          
                 95% CI : (0.5613, 0.7611)
    No Information Rate : 0.5054          
    P-Value [Acc > NIR] : 0.001211        
                                          
                  Kappa : 0.3334          
 Mcnemar's Test P-Value : 1.000000        
                                          
            Sensitivity : 0.6739          
            Specificity : 0.6596          
         Pos Pred Value : 0.6596          
         Neg Pred Value : 0.6739          
             Prevalence : 0.4946          
         Detection Rate : 0.3333          
   Detection Prevalence : 0.5054          
      Balanced Accuracy : 0.6667          
                                          
       'Positive' Class : No              
                                          

ROC curve and AUC

All groups combined into a single dataset


In [11]:
#ROC CURVE AND AUC
predsProb<- predict(svmLinearFit, newdata = testing,type="prob")
outcome<- predsProb[,2]
classes <- frameDataFinal$target[-inTraining[[1]]]
rocobj <- roc(classes, outcome,levels=c("No","Yes"))
plot(rocobj)


Out[11]:
Call:
roc.default(response = classes, predictor = outcome, levels = c("No",     "Yes"))

Data: outcome in 46 controls (classes No) < 47 cases (classes Yes).
Area under the curve: 0.7068