install.packages("caret", dependencies = c("Depends", "Suggests"))
The caret Package
The caret package, short for Classification And REgression Training, contains numerous tools for developing predictive models using the rich set of models available in R. The package focuses on simplifying model training and tuning across a wide variety of modeling techniques.
The package is available on the Comprehensive R Archive Network (CRAN). caret depends on over 25 other packages, although many of these are listed as "suggested" packages and are not automatically loaded when caret is started. Instead, each package is loaded individually as a model is trained or used for prediction.
In [1]:
%load_ext rmagic
In [2]:
%%R
require("caret")
require("mlbench")
data(Sonar)
set.seed(107)
inTrain <- createDataPartition(y = Sonar$Class, p = 3/4, list = FALSE)
## The output is a set of integers for the rows of Sonar that belong in the training set.
trainDescr <- Sonar[inTrain,1:60]
testDescr <- Sonar[-inTrain,1:60]
trainClass <- Sonar$Class[inTrain]
print(length(trainClass))
testClass <- Sonar$Class[-inTrain]
print(length(testClass))
By default, createDataPartition does stratified random splits.
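The idea of a stratified split can be sketched in plain Python (the function `create_data_partition` below is a hypothetical stand-in, not caret's actual implementation): group the row indices by class and sample a fraction `p` within each group, so the training set preserves the class proportions of the full data.

```python
import random

def create_data_partition(y, p=0.75, seed=107):
    """Stratified split sketch: sample a fraction p of the row
    indices within each class and return the training indices."""
    random.seed(seed)
    train_idx = []
    for cls in sorted(set(y)):
        idx = [i for i, label in enumerate(y) if label == cls]
        k = round(len(idx) * p)
        train_idx.extend(random.sample(idx, k))
    return sorted(train_idx)

# Toy labels: 8 of class "M", 4 of class "R"
y = ["M"] * 8 + ["R"] * 4
train = create_data_partition(y, p=0.75)
# 6 of the 8 "M" rows and 3 of the 4 "R" rows are selected,
# matching the 2:1 class ratio of the full data.
print(len(train))  # 9
```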
In [3]:
%%R
print(ncol(trainDescr))
trainingCorr <- cor(trainDescr)
highCorr <- findCorrelation(trainingCorr, 0.90)
# returns an index of column numbers for removal
trainDescr <- trainDescr[, -highCorr]
testDescr <- testDescr[, -highCorr]
print(ncol(trainDescr))
In [4]:
%%R
xTrans <- preProcess(trainDescr, method = c("center", "scale"))
trainDescr <- predict(xTrans, trainDescr)
testDescr <- predict(xTrans, testDescr)
To apply PCA to predictors in the training, test or other data, you can use:
In [5]:
%%R
xTrans <- preProcess(trainDescr, method = "pca")
In [6]:
%%R
trControl <- trainControl(method="cv", number=25)
logFit <- train(x = trainDescr, y = trainClass,
method='glm', family=binomial(link="logit"),
trControl = trControl)
logFit
In [7]:
%%R -w 960 -h 480 -u px
resampleHist(logFit)
Resampling (e.g. the bootstrap or cross-validation) can also be used to choose the values of any model tuning parameters. We come up with a set of candidate values for these parameters and fit a series of models for each tuning parameter combination. For each combination, $B$ models are fit to the $B$ resamples of the training data. There are also $B$ sets of samples that were held out of the resamples; these are predicted by each model, giving $B$ sets of performance values for each candidate parameter combination. Performance is then estimated by averaging the $B$ values.
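The resampling loop can be sketched in Python using a deliberately trivial "model" (always predict the majority class of the in-bag samples); the function name `resampled_accuracy` and the bootstrap details are illustrative, not caret's implementation. Each of $B$ bootstrap resamples is fit, its out-of-bag samples are scored, and the $B$ accuracies are averaged.

```python
import random
from collections import Counter

def resampled_accuracy(y, B=10, seed=1):
    """Fit a trivial majority-class model on each of B bootstrap
    resamples and average its accuracy on the out-of-bag samples."""
    random.seed(seed)
    n, scores = len(y), []
    for _ in range(B):
        in_bag = [random.randrange(n) for _ in range(n)]
        out_of_bag = set(range(n)) - set(in_bag)
        if not out_of_bag:  # rare: the resample covered every row
            continue
        majority = Counter(y[i] for i in in_bag).most_common(1)[0][0]
        scores.append(sum(y[i] == majority for i in out_of_bag) / len(out_of_bag))
    return sum(scores) / len(scores)

y = ["M"] * 70 + ["R"] * 30
print(resampled_accuracy(y))  # close to the majority-class rate of 0.7
```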
In [8]:
%%R
knnFit <- train(x = trainDescr, y = trainClass, trControl = trControl,
method = "knn", tuneLength = 5)
In [9]:
%R print(knnFit)
In [10]:
%%R
knnFit$finalModel
In [11]:
%%R
knnFit <- train(x = trainDescr, y = trainClass, method = "knn", trControl = trControl,
tuneGrid = expand.grid(k=seq(1,21,2)))
knnFit
In [12]:
%%R
plot(knnFit)
In [13]:
%%R
head(predict(knnFit$finalModel, newdata = testDescr))
However, predict can have nuanced syntax depending on the model in question. Instead, we can use the caret functions extractPrediction and extractProb, which handle the inconsistent syntax for us. They can also handle multiple models at once.
In [14]:
%%R
predValues <- extractPrediction(list(knnFit, logFit),
                                testX = testDescr,
                                testY = testClass)
testValues <- subset(predValues, dataType == "Test")
str(testValues)
In [15]:
%%R
probValues <- extractProb(list(knnFit, logFit),
testX = testDescr,
testY = testClass)
testProbs <- subset(probValues,
dataType == "Test")
str(testProbs)
For classification models, there are functions to compute the confusion matrix and associated statistics. There are also functions for two-class problems: sensitivity, specificity and so on.
The function confusionMatrix calculates statistics for a data set. The no-information rate (NIR) is estimated as the largest class proportion in the data set. A one-sided statistical test is done to see if the observed accuracy is greater than the NIR.
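The two quantities being compared can be sketched in Python (the function `accuracy_and_nir` is illustrative, not part of caret): the NIR is simply the largest class proportion, i.e. the accuracy of always guessing the majority class, which a useful model must beat.

```python
from collections import Counter

def accuracy_and_nir(pred, obs):
    """Accuracy of the predictions, plus the no-information rate:
    the proportion of the largest class in the observed labels."""
    acc = sum(p == o for p, o in zip(pred, obs)) / len(obs)
    nir = max(Counter(obs).values()) / len(obs)
    return acc, nir

obs  = ["M", "M", "M", "R", "R"]
pred = ["M", "M", "R", "R", "R"]
acc, nir = accuracy_and_nir(pred, obs)
print(acc, nir)  # 0.8 0.6
```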
In [16]:
%%R
knnPred <- subset(testValues, model == "knn")
confusionMatrix(knnPred$pred, knnPred$obs)
In [17]:
%%R
logPred <- subset(testValues, model == "glm")
confusionMatrix(logPred$pred, logPred$obs)
(Figure: portraits of Walter Pitts (1923-1969), Warren McCulloch (1898-1969), David Rumelhart (1942-2011), and James McClelland (1948- ).)
"Neurons," in this case, can be thought of as logistic regressors.
The output of a neuron is a function of the weighted sum of the inputs plus a bias
$$ Output = f(i_1w_1 + i_2w_2 + \ldots + i_nw_n + bias) $$
Example:
Given input: $[1, 0, 1]$
and initial weights: $[0.5, 0.2, 0.8]$
Assuming Output Threshold = 1.2
$1 \times 0.5 + 0 \times 0.2 + 1 \times 0.8 = 1.3 > 1.2$
Assume the output was supposed to be 0, so the weights must be updated using the rule $w_{new} = w + \alpha \times (target - output) \times input$.
Assume $\alpha = 1$:
$$ W_{1_{new}} = 0.5 + 1\times(0-1)\times1 = -0.5 $$
$$ W_{2_{new}} = 0.2 + 1\times(0-1)\times0 = 0.2 $$
$$ W_{3_{new}} = 0.8 + 1\times(0-1)\times1 = -0.2 $$
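The worked example can be checked with a short Python sketch of a single perceptron step using the generic update $w \leftarrow w + \alpha\,(target - output) \times input$ (the function name `perceptron_step` and the zero bias are assumptions for illustration):

```python
def perceptron_step(inputs, weights, bias, threshold, target, alpha=1.0):
    """One perceptron update: fire if the weighted sum exceeds the
    threshold, then nudge each weight by alpha * (target - output) * input."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    output = 1 if total > threshold else 0
    new_weights = [w + alpha * (target - output) * i
                   for i, w in zip(inputs, weights)]
    return output, new_weights

# The example from the text: inputs [1, 0, 1], weights [0.5, 0.2, 0.8],
# zero bias, threshold 1.2, desired output 0.
out, w = perceptron_step([1, 0, 1], [0.5, 0.2, 0.8], 0, 1.2, target=0)
print(out, [round(x, 2) for x in w])  # 1 [-0.5, 0.2, -0.2]
```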
In [18]:
%%R
library(nnet)
library(devtools)
source_url('https://gist.github.com/fawda123/7471137/raw/c720af2cea5f312717f020a09946800d55b8f45b/nnet_plot_update.r')
In [24]:
%%R
eight <- data.frame(X1=c(1, rep(0, 7)), X2=c(0,1,rep(0,6)), X3=c(0,0,1,rep(0,5)), X4=c(0,0,0,1,rep(0,4)),
X5=c(rep(0,4),1,0,0,0), X6=c(rep(0,5),1,0,0), X7=c(rep(0,6),1,0), X8=c(rep(0,7),1))
eight
In [37]:
%%R
library(nnet)
eight.net <- nnet(x = eight, y = eight, size = 3)
In [20]:
%%R
plot(eight.net)
In [41]:
%%R
hidden_sums <- function(i) {
  # Extract the weights once, then sum the bias plus the i-th input
  # weight for each of the three hidden nodes.
  wts <- plot.nnet(eight.net, wts.only = TRUE)
  c(sum(wts[['hidden 1 1']][c(1, i + 1)]),
    sum(wts[['hidden 1 2']][c(1, i + 1)]),
    sum(wts[['hidden 1 3']][c(1, i + 1)]))
}
t(sapply(1:8, hidden_sums))
In [42]:
%%R
t(sapply(c(1:8), hidden_sums) > 1) * 1
In [138]:
%%R
nnet(trainClass ~ ., data=trainDescr, size = 3, decay = 5e-4)
In [43]:
%%R
library(nnet)
nnet(trainClass ~ ., data=trainDescr, size = 3, decay = 5e-4, trace=FALSE)
The number of hidden nodes and the decay can both greatly affect the success of a neural net.
caret to the rescue:
In [82]:
%%R
eights <- rbind(eight, eight, eight, eight, eight, eight, eight, eight)
eightTrain <- createDataPartition(y = seq(1:nrow(eights)), p = 7/8, list = FALSE)
trainEight <- eights[eightTrain,]
testEight <- eights[-eightTrain,]
is(apply(testEight, 2, factor)[,2])
In [86]:
%%R
nnetFit <- train(x = trainEight,
                 # train() needs a single factor outcome, so recover the
                 # class (which input is "on") from the one-hot columns
                 y = factor(apply(trainEight, 1, which.max)),
                 method = "nnet", #trace=FALSE,
                 tuneLength = 3)
plot(nnetFit)
In [143]:
%%R
nnetFit <- train(x = trainDescr, y = trainClass,
method = "nnet", trace=FALSE,
tuneLength = 5)
plot(nnetFit)
In [147]:
%%R
plot(nnetFit, plotType="level")
In [123]:
%%R
library(caret)
data(iris)
irisTrain <- createDataPartition(y = iris$Species, p = 3/4, list = FALSE)
trainX <- iris[irisTrain,1:4]
testX <- iris[-irisTrain,1:4]
trainY <- iris$Species[irisTrain]
print(length(trainY))
testY <- iris$Species[-irisTrain]
print(length(testY))
In [135]:
%%R
irisKN <- train(x = trainX, y = trainY, method='knn')
irisNB <- train(x = trainX, y = trainY, method='nb')
irisNN <- train(x = trainX, y = trainY, method='nnet', trace=FALSE)
In [136]:
%%R
irisProbValues <- extractProb(list(irisKN, irisNB, irisNN),
testX = testX,
testY = testY)
irisTestProbs <- subset(irisProbValues, dataType == "Test")
str(irisTestProbs)
In [140]:
%%R
irisKNNPred <- subset(irisTestProbs, model == "knn")
confusionMatrix(irisKNNPred$pred, irisKNNPred$obs)
In [141]:
%%R
irisNaiveBayesPred <- subset(irisTestProbs, model == "nb")
confusionMatrix(irisNaiveBayesPred$pred, irisNaiveBayesPred$obs)
In [138]:
%%R
irisNNetPred <- subset(irisTestProbs, model == "nnet")
confusionMatrix(irisNNetPred$pred, irisNNetPred$obs)