This is the Python implementation of Bayesian tests for comparing the performance of classifiers (or, more generally, of algorithms) assessed via cross-validation.
The package `bayesiantests` includes the following tests:

- `correlated_ttest` performs the correlated t-test on the performance of two classifiers that have been assessed by $m$ runs of $k$-fold cross-validation on the same dataset. It returns the probabilities that, based on the measured performance, one model is better than the other, vice versa, or that the two are within the region of practical equivalence.
- `signtest` computes the probabilities that, based on the measured performance, one classifier is better than the other, vice versa, or that the two are within the region of practical equivalence.
- `signrank` computes the Bayesian equivalent of the Wilcoxon signed-rank test. It returns the probabilities that, based on the measured performance, one model is better than the other, vice versa, or that the two are within the region of practical equivalence.
- `hierarchical` compares the performance of two classifiers that have been assessed by $m$ runs of $k$-fold cross-validation on $q$ datasets, using a Bayesian hierarchical model.
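The sketch below illustrates, at a high level, how these tests might be invoked on NumPy arrays of cross-validation scores. It is only an illustration: the parameter names (`rope`, `runs`, `names`), the placeholder data, and the exact order and interpretation of the returned posterior probabilities are assumptions and may differ from the actual API, which is documented in the notebooks.

```python
# Rough usage sketch (not taken from the package documentation): parameter
# names, placeholder data, and the order of the returned probabilities are
# assumptions and may differ from the installed version of the package.
import numpy as np
import bayesiantests as bt

names = ("Classifier A", "Classifier B")  # labels used only for reporting

# correlated_ttest: x holds the per-fold differences in accuracy between the
# two classifiers over m runs of k-fold cross-validation on one dataset.
rng = np.random.default_rng(0)
x = rng.normal(loc=0.01, scale=0.02, size=100)            # placeholder differences
left, within, right = bt.correlated_ttest(x, rope=0.01, runs=10, names=names)
print(left, within, right)  # posterior probabilities: one better / equivalent / other better

# signtest and signrank: scores is an (n_datasets x 2) array with the mean
# accuracy of each classifier on every dataset; rope is on the difference scale.
scores = rng.uniform(0.7, 0.95, size=(50, 2))             # placeholder scores
left, within, right = bt.signtest(scores, rope=0.01, names=names)
left, within, right = bt.signrank(scores, rope=0.01, names=names)
```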
We have written three notebooks that explain how to use these tests. Moreover, the notebook The importance of the Rope discusses, with an example, the importance of setting a region of practical equivalence.