Information Theory

Using information theory, you can measure the information in a dataset before and after a split.
  • The change in information before and after the split is known as the information gain.
  • Once you know how to calculate the information gain, you can try splitting your data on each feature in turn to see which split gives you the highest information gain.
  • The split with the highest information gain is your best option.
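The procedure in the bullets above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes Shannon entropy (in bits) as the measure of information, categorical features stored as tuples, and a split that partitions the rows by each distinct value of one feature. The function names (shannon_entropy, information_gain, best_feature) and the toy data are made up for this example.

```python
from collections import Counter
from math import log2


def shannon_entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(rows, labels, feature):
    """Entropy before the split minus the weighted entropy after it.

    Assumption: the split groups rows by each distinct value of `feature`
    (an index into each row).
    """
    before = shannon_entropy(labels)

    # Collect the labels that end up in each branch of the split.
    branches = {}
    for row, label in zip(rows, labels):
        branches.setdefault(row[feature], []).append(label)

    # Weight each branch's entropy by the fraction of rows it receives.
    after = sum(
        (len(branch) / len(labels)) * shannon_entropy(branch)
        for branch in branches.values()
    )
    return before - after


def best_feature(rows, labels):
    """Index of the feature whose split gives the highest information gain."""
    return max(
        range(len(rows[0])),
        key=lambda f: information_gain(rows, labels, f),
    )


# Hypothetical toy dataset: two binary features per row, one label per row.
rows = [(1, 1), (1, 1), (1, 0), (0, 1), (0, 1)]
labels = ["yes", "yes", "no", "no", "no"]

for f in range(len(rows[0])):
    print(f, information_gain(rows, labels, f))
print("best feature:", best_feature(rows, labels))
```

On this toy data the first feature separates the labels more cleanly than the second, so it is the one selected. Ties are broken in favor of the lowest feature index, which is simply how max behaves here rather than a deliberate design choice.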
Shannon entropy is used to measure the information of a set (explainer article here along with cool background:

