• What is a model?
    a summarized representation of the training data

Base classifiers

  • Bayes
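Bayes also makes the idea of a "model" concrete. Below is a minimal sketch that assumes continuous features and a Gaussian likelihood for each feature within each class (Gaussian naive Bayes). Training reduces the data to class priors, feature means, and feature variances, so the result is a summarized representation of the training data. The function names are hypothetical, and the `1e-9` variance floor is an arbitrary choice to avoid division by zero.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Summarize training data as per-class priors, feature means, and variances."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    n = len(y)
    for label, rows in by_class.items():
        cols = list(zip(*rows))  # transpose: one tuple of values per feature
        means = [sum(c) / len(c) for c in cols]
        # Hypothetical variance floor so a constant feature cannot give zero variance.
        variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                     for c, m in zip(cols, means)]
        model[label] = (len(rows) / n, means, variances)
    return model

def predict_gaussian_nb(model, x):
    """Return the class with the highest log posterior (up to a shared constant)."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, s2 in zip(x, means, variances):
            # Log of the Gaussian density N(v; m, s2), assuming features are independent.
            score += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]
y = ["a", "a", "a", "b", "b", "b"]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.1, 2.0]))  # → a
```

After `fit_gaussian_nb` returns, the training rows are no longer needed. Prediction uses only the stored summary.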

Ensemble classifiers

  • The nearest-neighbor classifier does not generate a model: it stores the training data and defers all computation until prediction time.
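The contrast with a model-building classifier shows up in a small sketch of k-nearest neighbors. In this sketch `fit` estimates nothing and only keeps the training set, and all of the work happens in `predict`. The class name, the default `k=3`, and the use of Euclidean distance with a majority vote are assumptions made for illustration.

```python
import math
from collections import Counter

class KNearestNeighbors:
    """Lazy learner: 'training' only stores the data; no summary model is built."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # No parameters are estimated; the training set itself is kept.
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, x):
        # All work happens at prediction time: rank stored points by Euclidean distance.
        ranked = sorted((math.dist(x, xi), yi) for xi, yi in zip(self.X, self.y))
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]

X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]
y = ["a", "a", "a", "b", "b", "b"]
knn = KNearestNeighbors(k=3).fit(X, y)
print(knn.predict([1.1, 2.0]))  # → a
```

Because nothing is summarized, each prediction scans the whole training set.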

Considerations

  • Qualitative attributes
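Some classifiers, such as distance-based ones, expect numeric inputs, so qualitative attributes often need to be encoded first. One common option is one-hot encoding, which replaces a categorical column with one 0/1 indicator per category. The sketch below shows that option. The helper name is hypothetical, and the notes do not say which encoding they have in mind.

```python
def one_hot_encode(rows, column):
    """Replace a qualitative column with one 0/1 indicator per category."""
    categories = sorted({row[column] for row in rows})  # fixed, sorted category order
    encoded = []
    for row in rows:
        indicators = [1 if row[column] == c else 0 for c in categories]
        encoded.append(row[:column] + indicators + row[column + 1:])
    return encoded, categories

rows = [[25, "red"], [40, "blue"], [33, "green"], [51, "red"]]
encoded, categories = one_hot_encode(rows, column=1)
print(categories)  # → ['blue', 'green', 'red']
print(encoded[0])  # → [25, 0, 0, 1]
```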

Practical issues

  • Imbalanced class distributions
    • e.g., $99\%$ of all instances are from the negative class
    • Stratification
      • undersample the larger class
      • oversample the smaller class
    • Other approaches
      • Assign a higher penalty to misclassifying instances from the smaller class
      • Generate more data by perturbing instances of the smaller class
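Undersampling the larger class and oversampling the smaller class can be sketched as random resampling to a common class size. This sketch assumes purely random selection: sampling without replacement when shrinking a class and duplicating rows with replacement when growing one. The function name and its `strategy`/`seed` parameters are hypothetical.

```python
import random
from collections import Counter

def rebalance(X, y, strategy="undersample", seed=0):
    """Randomly under- or oversample so every class ends up with the same count."""
    if strategy not in ("undersample", "oversample"):
        raise ValueError("strategy must be 'undersample' or 'oversample'")
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    sizes = [len(rows) for rows in by_class.values()]
    target = min(sizes) if strategy == "undersample" else max(sizes)
    X_out, y_out = [], []
    for label, rows in by_class.items():
        if len(rows) >= target:
            chosen = rng.sample(rows, target)  # keep a random subset, no repeats
        else:
            # Keep every row, then add random duplicates until the class reaches target.
            chosen = rows + rng.choices(rows, k=target - len(rows))
        X_out.extend(chosen)
        y_out.extend([label] * target)
    return X_out, y_out

X = [[i] for i in range(100)]
y = ["neg"] * 99 + ["pos"]          # 99% negative, as in the example above
Xu, yu = rebalance(X, y, "undersample")
Xo, yo = rebalance(X, y, "oversample")
print(Counter(yu))  # → Counter({'neg': 1, 'pos': 1})
print(Counter(yo))  # → Counter({'neg': 99, 'pos': 99})
```

There is a trade-off between the two directions. Undersampling discards most of the larger class, and oversampling by duplication repeats the same few rows, which can encourage overfitting to them.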

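The two other approaches can also be sketched. A higher penalty for errors on the smaller class can be expressed as per-class weights in the misclassification cost. This sketch assumes inverse-frequency weights `n / (k * count)`, which is one common choice but not the only one. New minority data is generated here by adding small Gaussian noise to existing rows. This is a simple stand-in; more elaborate methods such as SMOTE instead interpolate between neighboring minority instances. All function names and the `scale` parameter are hypothetical.

```python
import random

def class_weights(y):
    """Inverse-frequency weights: errors on the rarer class cost more."""
    counts = {}
    for label in y:
        counts[label] = counts.get(label, 0) + 1
    n, k = len(y), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}

def weighted_error(y_true, y_pred, weights):
    """Misclassification cost: each mistake is penalized by its true class's weight."""
    return sum(weights[t] for t, p in zip(y_true, y_pred) if t != p)

def perturb(rows, n_new, scale=0.05, seed=0):
    """Create synthetic rows by adding small Gaussian noise to randomly chosen rows."""
    rng = random.Random(seed)
    return [[v + rng.gauss(0.0, scale) for v in rng.choice(rows)] for _ in range(n_new)]

y = ["neg"] * 99 + ["pos"]
w = class_weights(y)                                 # neg ≈ 0.505, pos = 50.0
cost_all_neg = weighted_error(y, ["neg"] * 100, w)   # misses the single positive
cost_all_pos = weighted_error(y, ["pos"] * 100, w)   # misses all 99 negatives
synthetic = perturb([[1.0, 2.0], [1.2, 1.8]], n_new=5)
```

With these weights, always predicting the larger class no longer looks cheap. Missing the single positive costs 50.0, about the same as missing all 99 negatives.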
Approaches for big data

  • Randomly sample a subset of the data
  • Repeat the sampling $B$ times
  • Train a classifier on each sample and combine the classifiers (e.g., by majority vote)
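The three steps (sample a subset, repeat $B$ times, combine) can be sketched as follows. Several details here are assumptions. The base learner is a hypothetical nearest-centroid classifier chosen only for brevity. Each subset is drawn without replacement and defaults to half the data. The classifiers are combined by majority vote. Drawing bootstrap samples (with replacement) instead would give the usual bagging procedure.

```python
import math
import random
from collections import Counter

def train_centroids(X, y):
    """Hypothetical base learner: store one mean vector (centroid) per class."""
    sums, counts = {}, Counter(y)
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict_centroids(model, x):
    """Predict the class whose centroid is closest to x."""
    return min(model, key=lambda label: math.dist(x, model[label]))

def train_ensemble(X, y, B=10, subset_size=None, seed=0):
    """Train B classifiers, each on a random subset of the data."""
    rng = random.Random(seed)
    m = subset_size or len(X) // 2
    models = []
    for _ in range(B):
        idx = rng.sample(range(len(X)), m)  # random subset, without replacement
        models.append(train_centroids([X[i] for i in idx], [y[i] for i in idx]))
    return models

def predict_ensemble(models, x):
    """Combine the B classifiers by majority vote."""
    votes = Counter(predict_centroids(model, x) for model in models)
    return votes.most_common(1)[0][0]

X = [[i * 0.1, i * 0.1] for i in range(20)] + [[5 + i * 0.1, 5 + i * 0.1] for i in range(20)]
y = ["a"] * 20 + ["b"] * 20
models = train_ensemble(X, y, B=15, seed=1)
print(predict_ensemble(models, [0.5, 0.5]))  # → a
```

Each classifier sees only part of the data, so the subsets can be processed independently. That independence is what makes the scheme attractive when the full data set is too large to train on at once.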
