Model Evaluation

  • Accuracy

    $\text{Accuracy} = \frac{\text{correctly classified instances}}{\text{all classified instances}}$

  • Confusion Matrix


                       Predicted Class
                         +       -
    Actual Class    +    TP      FN
                    -    FP      TN

  • True Positive Rate

  • False Positive Rate

  • Precision

  • Recall
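
The measures listed above can all be computed from the four cells of the confusion matrix. The following is a minimal sketch using the standard textbook definitions; the function name `classification_metrics` and the example counts are assumptions made for illustration, and the code assumes every denominator is non-zero.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute common evaluation measures from confusion-matrix counts.

    Assumes each denominator is non-zero (illustrative sketch only).
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,   # correctly classified / all instances
        "tpr": tp / (tp + fn),           # true positive rate (same as recall)
        "fpr": fp / (fp + tn),           # false positive rate
        "precision": tp / (tp + fp),     # of predicted +, how many are actually +
        "recall": tp / (tp + fn),        # of actual +, how many were found
    }


# Hypothetical counts, chosen only to illustrate the formulas.
metrics = classification_metrics(tp=40, fn=10, fp=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that the true positive rate and recall are the same quantity under two names.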

Splitting the data into train/test

If we divide the data 50/50, how do we decide which half to use for training and which for testing?

K-fold cross-validation

At the end, we have developed k models. Which one should we use to predict the label of an unlabeled instance?

None of the k models. K-fold cross-validation is only used to estimate the performance of our model. For real prediction, we should use the entire dataset for training, and that trained model should be used for prediction.
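
The procedure can be sketched as follows. This is only an illustration under stated assumptions: the "model" is a toy majority-class predictor, the data set is hypothetical, and the helper names (`k_fold_indices`, `train_majority`, `cross_validate`) are invented for this example. Any real classifier could be substituted for `train_majority`.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_majority(labels):
    """Toy 'model' (an assumption): always predict the most frequent label."""
    return max(set(labels), key=labels.count)


def cross_validate(labels, k=5):
    """Return the mean test accuracy over the k folds."""
    folds = k_fold_indices(len(labels), k)
    scores = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except fold i, then test on fold i.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = train_majority([labels[j] for j in train_idx])
        correct = sum(labels[j] == model for j in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / k


labels = ["+"] * 70 + ["-"] * 30         # hypothetical labeled data set
estimate = cross_validate(labels, k=5)   # used only to estimate performance
final_model = train_majority(labels)     # retrained on ALL data for real prediction
print(estimate, final_model)
```

The k models built inside `cross_validate` are discarded; only `final_model`, trained on the full data set, is kept for prediction.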

Types of classification errors

  • Training error
  • Test error

Model overfitting

Underfitting: the model is too simple; both training and test errors are high.

Overfitting: the model is too complex; training error is small, but test error is large.
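
One way to see both effects is to vary the complexity of a single classifier and compare training and test error. The sketch below uses a k-nearest-neighbor classifier on synthetic one-dimensional data with noisy labels; the choice of classifier, the data generator, and all names are assumptions made for illustration, not part of these notes. With k = 1 the model memorizes every training point, noise included, so its training error is zero while its test error is typically much larger. With k equal to the training set size it always predicts the overall majority class, which is too simple to capture the pattern.

```python
import random


def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (labels are 0/1)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def error_rate(train, data, k):
    """Fraction of points in `data` misclassified by k-NN fitted on `train`."""
    wrong = sum(knn_predict(train, x, k) != y for x, y in data)
    return wrong / len(data)


def make_data(n, rng, noise=0.2):
    """Synthetic data: label is 1 when x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y  # a noise point
        data.append((x, y))
    return data


rng = random.Random(0)
train, test = make_data(100, rng), make_data(100, rng)
for k in (1, 15, 100):
    print(f"k={k:3d}  train error={error_rate(train, train, k):.2f}  "
          f"test error={error_rate(train, test, k):.2f}")
```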

Source of overfitting

  • Fitting the noise points
  • Not enough training data

Model selection

  • Training error is

Criteria for choosing best classifier

  • Performance (accuracy) vs. descriptiveness

    • Artificial neural networks and support vector machines tend to favor performance (accuracy).
    • Decision trees and rule-based classifiers provide more descriptive models.
  • Efficiency in training vs. testing