Note that this excerpt contains only the raw code - the book is rich with additional explanations and illustrations. If you find this content useful, please consider supporting the work by buying the book!
In this chapter, we will dive deeper into model evaluation and hyperparameter tuning. Assume that we have two different models that might apply to our task. How can we know which one is better? Answering this question often involves repeatedly fitting different versions of our model to different subsets of the data, such as in cross-validation and bootstrapping. In combination with different scoring functions, we can obtain reliable estimates of the generalization performance of our models.
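As a first taste of that workflow, the idea of comparing two candidate models via cross-validation can be sketched as follows. This is a minimal illustration, assuming scikit-learn is available; the particular dataset (Iris) and the two estimators (logistic regression and a decision tree) are stand-ins, not choices made by the chapter itself.

```python
# Minimal sketch: estimating generalization performance of two
# candidate models with 10-fold cross-validation (scikit-learn assumed).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Each call fits the model 10 times, each time holding out a different fold
# and scoring accuracy on it.
scores_lr = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                            cv=10, scoring='accuracy')
scores_dt = cross_val_score(DecisionTreeClassifier(random_state=1), X, y,
                            cv=10, scoring='accuracy')

# Mean +/- standard deviation across folds gives a more reliable estimate
# than a single train/test split.
print(f"Logistic regression: {scores_lr.mean():.3f} +/- {scores_lr.std():.3f}")
print(f"Decision tree:       {scores_dt.mean():.3f} +/- {scores_dt.std():.3f}")
```

Swapping `scoring='accuracy'` for another scoring function (for example `'f1_macro'`) changes which notion of performance the folds estimate, which is exactly the combination of resampling and scoring the chapter builds on.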
But what if two different models give similar results? Can we be sure that the two models are equivalent, or is it possible that one of them just got lucky? How can we know whether one of them is significantly better than the other? Answering these questions will lead us to discuss some useful statistical tests, such as Student's t-test and McNemar's test.
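To make the "did one model just get lucky?" question concrete, one simple (and deliberately simplified) approach is a paired t-test on the per-fold cross-validation scores of the two models. This sketch assumes scikit-learn and SciPy; the dataset and estimators are placeholders, and the chapter's own discussion covers the caveats of applying such tests to resampled scores.

```python
# Minimal sketch: a paired t-test on per-fold CV accuracies of two models
# (scikit-learn and SciPy assumed; dataset and models are illustrative).
from scipy.stats import ttest_rel
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Score both models on the same 10 folds so the comparison is paired.
cv_a = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=10)
cv_b = cross_val_score(DecisionTreeClassifier(random_state=1), X, y, cv=10)

# Null hypothesis: both models have the same mean fold accuracy.
t_stat, p_value = ttest_rel(cv_a, cv_b)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")

# A small p-value (e.g. below 0.05) suggests the observed difference is
# unlikely to be due to chance alone; a large p-value means we cannot
# rule out that the two models perform equivalently.
```

Note that folds in cross-validation are not fully independent, so the naive t-test tends to be optimistic; tests such as McNemar's test, which compares the two models' predictions on a single test set, avoid that particular issue.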
As we become familiar with these techniques, we will also want to answer the following questions: