- What is a model?
a summarized representation of the training data
Base classifiers
Ensemble classifiers
- A nearest-neighbor classifier does not generate a model; it keeps the training data and compares new instances against it at prediction time.
Practical issues
- Imbalanced class distributions
- Example: $99\%$ of all instances are from the negative class
- Stratification
- undersample the larger class
- oversample the smaller class
- Other approaches
- Assign a higher penalty to mispredicting instances from the smaller class
- Generate more data by perturbing instances of the smaller class
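The resampling and penalty ideas above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names (`group_by_class`, `undersample`, `oversample`, `class_weights`) and the toy data are assumptions made for this example, and the inverse-frequency weight formula is one common heuristic, not the only way to set penalties.

```python
import random
from collections import Counter


def group_by_class(X, y):
    """Map each class label to the list of its instances."""
    groups = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append(xi)
    return groups


def undersample(X, y, seed=0):
    """Keep a random subset of every class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = group_by_class(X, y)
    n_min = min(len(items) for items in groups.values())
    pairs = [(xi, label)
             for label, items in groups.items()
             for xi in rng.sample(items, n_min)]
    rng.shuffle(pairs)
    return [p[0] for p in pairs], [p[1] for p in pairs]


def oversample(X, y, seed=0):
    """Pad every class with random duplicates up to the largest class size."""
    rng = random.Random(seed)
    groups = group_by_class(X, y)
    n_max = max(len(items) for items in groups.values())
    pairs = []
    for label, items in groups.items():
        extra = [rng.choice(items) for _ in range(n_max - len(items))]
        pairs.extend((xi, label) for xi in items + extra)
    rng.shuffle(pairs)
    return [p[0] for p in pairs], [p[1] for p in pairs]


def class_weights(y):
    """Inverse-frequency penalties: n / (number_of_classes * class_count)."""
    counts = Counter(y)
    return {label: len(y) / (len(counts) * c) for label, c in counts.items()}


# Toy data (assumed): 990 negative (0) and 10 positive (1) instances.
X = list(range(1000))
y = [0] * 990 + [1] * 10

X_under, y_under = undersample(X, y)   # 10 instances per class
X_over, y_over = oversample(X, y)      # 990 instances per class
weights = class_weights(y)             # the rare class gets the larger weight
```

Undersampling discards most of the majority class, while oversampling by duplication adds no new information. Class weights leave the data untouched and shift the cost of errors instead.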
Approaches for big data
- Randomly sample a subset of the data and train a classifier on it
- Repeat the sampling $B$ times, giving $B$ classifiers
- Combine the classifiers, e.g., by majority vote
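A minimal sketch of this sample, repeat, and combine recipe follows. Several things here are assumptions for illustration: the base classifier is a simple nearest-centroid rule (the notes do not prescribe one), the classifiers are combined by majority vote, and the function names, the subset size of 200, and the synthetic two-cluster data are all made up for the example.

```python
import random
from collections import Counter


def train_centroid(X, y):
    """Hypothetical base classifier: store the mean feature vector per class."""
    counts = Counter(y)
    sums = {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, value in enumerate(xi):
            acc[j] += value
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}


def predict_centroid(model, x):
    """Return the label whose centroid is closest to x (squared distance)."""
    def sqdist(center):
        return sum((a - b) ** 2 for a, b in zip(x, center))
    return min(model, key=lambda label: sqdist(model[label]))


def train_on_subsamples(X, y, B=5, sample_size=200, seed=0):
    """Draw B random subsets (without replacement) and train one model on each."""
    rng = random.Random(seed)
    models = []
    for _ in range(B):
        chosen = rng.sample(range(len(X)), sample_size)
        models.append(train_centroid([X[i] for i in chosen],
                                     [y[i] for i in chosen]))
    return models


def predict_vote(models, x):
    """Combine the classifiers by majority vote."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Synthetic data (assumed): two well-separated 2-D clusters.
data_rng = random.Random(1)
X = ([(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(1000)]
     + [(data_rng.gauss(5, 1), data_rng.gauss(5, 1)) for _ in range(1000)])
y = [0] * 1000 + [1] * 1000

models = train_on_subsamples(X, y, B=5, sample_size=200)
label = predict_vote(models, (4.8, 5.1))
```

Each model sees only `sample_size` instances, so training cost depends on the subset size rather than on the full data set. For a two-class problem, an odd $B$ avoids tied votes.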