The differential entropy of the posterior is defined as:
$$ H(y|\mathbf{x}) = E[I(y|\mathbf{x})] = -\int_Y{P(y|\mathbf{x})\cdot \log(P(y|\mathbf{x}))\, dy} $$

To determine which point to examine next, we want to minimize the expected entropy of the posterior distribution after examining that point. Scoring every candidate and simply picking the highest-scoring point is computationally expensive and will give us poor results, so instead we sample candidates stochastically.
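As a rough sketch of what "minimize the expected posterior entropy" can look like in code (not the model used elsewhere in this notebook), the example below scores a random sample of candidate points by the entropy of a Bayesian linear-regression weight posterior. The basis function, prior variance, and noise variance are invented for illustration; for a Gaussian model the posterior covariance depends only on where we measure, so the "expected" entropy is exact here.

```python
import numpy as np

rng = np.random.default_rng(0)
noise_var, prior_var, n_features = 0.1, 1.0, 2

def features(x):
    # toy basis: [1, x] (placeholder choice for this sketch)
    return np.array([1.0, x])

def posterior_entropy(X_design):
    # posterior covariance of the weights: (I/prior_var + X^T X/noise_var)^-1
    A = np.eye(n_features) / prior_var + X_design.T @ X_design / noise_var
    cov = np.linalg.inv(A)
    # differential entropy of a multivariate Gaussian: 0.5 * ln det(2*pi*e * cov)
    return 0.5 * np.log(np.linalg.det(2 * np.pi * np.e * cov))

X_so_far = np.array([features(0.1)])      # one measurement already taken
candidates = rng.uniform(-3, 3, size=20)  # stochastic sample of candidate points

scores = [posterior_entropy(np.vstack([X_so_far, features(x)]))
          for x in candidates]
best = candidates[int(np.argmin(scores))]
print(f"next point to examine: {best:.2f}")
```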
Differential entropy can be negative, however (as the code below shows), which means we need a better metric:
- Relative entropy (KL divergence): the KL divergence between the old and new posterior (Box, p. 62) is perhaps the most useful metric here.
- Change in entropy: the difference in information content between the old posterior and the new posterior.
Relative entropy accounts for shifts in position; a change in entropy does not.
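To make the distinction concrete, here is a toy example (the two Gaussians are made up, not the notebook's actual posteriors): shifting the posterior mean without changing its spread leaves the entropy unchanged, but the KL divergence registers the shift.

```python
import numpy as np
from scipy.stats import norm

old = norm(loc=0.0, scale=1.0)   # posterior before the new observation
new = norm(loc=2.0, scale=1.0)   # posterior after: same spread, shifted mean

# change in entropy: H(new) - H(old) = 0, because entropy ignores location
print("change in entropy:", new.entropy() - old.entropy())

# closed-form KL(new || old) for two univariate Gaussians
mu0, s0 = old.mean(), old.std()
mu1, s1 = new.mean(), new.std()
kl = np.log(s0 / s1) + (s1**2 + (mu1 - mu0)**2) / (2 * s0**2) - 0.5
print("KL divergence:", kl)      # 2.0 here: the shift is detected
```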
In [25]:
```python
import scipy.stats

# differential entropy of a Gaussian grows with its scale...
print(scipy.stats.norm.entropy(0, 1))       # N(0, 1): ~1.42 nats
print(scipy.stats.norm.entropy(0, 2))       # N(0, 2): ~2.11 nats

# ...and a uniform distribution narrower than 1 has *negative* entropy
print(scipy.stats.uniform.entropy(0, 0.5))  # U(0, 0.5): ln(0.5) ~ -0.69
print(scipy.stats.uniform.entropy(0, 1))    # U(0, 1): 0
print(scipy.stats.uniform.entropy(0, 2))    # U(0, 2): ln(2) ~ 0.69
```
Box instead optimizes for information gain over a class of candidate models. This is computationally simpler, but less appropriate for open-ended problems.
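A hedged sketch of that idea: given a small, made-up class of candidate models, score each candidate measurement location by the mutual information between the observation and the model index, i.e. the expected information gain about which model is true. The models, noise level, and integration grid below are all placeholders.

```python
import numpy as np
from scipy.stats import norm

models = [lambda x: 1.0 * x, lambda x: 2.0 * x, lambda x: 0.5 * x**2]
prior = np.full(len(models), 1.0 / len(models))  # uniform prior over models
noise_sd = 0.5
y_grid = np.linspace(-10, 10, 2001)
dy = y_grid[1] - y_grid[0]

def expected_information_gain(x):
    # p(y | model m, x) on a grid, for each candidate model
    lik = np.array([norm.pdf(y_grid, loc=m(x), scale=noise_sd) for m in models])
    marginal = prior @ lik                        # p(y | x)
    # I(y; m) = H[p(y|x)] - sum_m p(m) H[p(y|m,x)], approximated on the grid
    h_marginal = -np.sum(marginal * np.log(marginal + 1e-300)) * dy
    h_cond = -np.sum(prior * np.sum(lik * np.log(lik + 1e-300), axis=1) * dy)
    return h_marginal - h_cond

candidates = np.linspace(0.0, 3.0, 13)
gains = [expected_information_gain(x) for x in candidates]
print("best x:", candidates[int(np.argmax(gains))])  # where the models disagree most
```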
Finite horizon vs. infinite horizon: do we plan only for the next experiment (or a fixed number of future experiments), or for an unbounded sequence of them?