Kullback-Leibler divergence

$$ D_{KL}\left( P \| Q \right) = \mathbb{E}_P \left[ \log \left( \frac{ P}{Q} \right) \right] = \mathbb{E}_P \left[ \log P - \log Q \right] $$

i.e., it's the expectation over $P$ of the difference in the log probabilities

  • if the log probabilities are similar wherever $P$ is far-ish from zero, then $D_{KL}(P \| Q)$ will be small-ish
  • where $P$ is small or zero, $Q$ is not very constrained
  • where $P$ is high, $Q$ needs to be close to $P$
  • where $Q$ is small, $P$ needs to be small too
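
as a quick numeric sanity check of the definition, here is a minimal sketch computing both directions of the KL for two small discrete distributions (the specific probability values are made up purely for illustration):

```python
import numpy as np

p = np.array([0.7, 0.2, 0.1])   # P: puts most of its mass on the first outcome
q = np.array([0.5, 0.3, 0.2])   # Q: a different distribution on the same support

# expectation under P of the difference in log probabilities
kl_pq = np.sum(p * (np.log(p) - np.log(q)))
kl_qp = np.sum(q * (np.log(q) - np.log(p)))

print(kl_pq)  # D_KL(P || Q)
print(kl_qp)  # D_KL(Q || P) -- generally different: KL is not symmetric
```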

consequences: suppose we are trying to fit some uni-modal distribution $s$ to a distribution $r$ that has two modes (a numerical sketch of both cases follows this list):

  • if we minimize $D_{KL}(r \| s)$, ie minimize $\mathbb{E}_r[ \log(r/s)]$, then we constrain $s$ to be close to $r$ anywhere $r$ is large, but $s$ is relatively unconstrained where $r$ is small, so the result will be that $s$'s single mode will stretch to span both of $r$'s modes
  • if we minimize $D_{KL}(s \| r)$, ie minimize $\mathbb{E}_s[ \log(s/r) ]$, then we constrain $s$ to be close to $r$ anywhere $s$ is large. A single mode that spans both of $r$'s modes would place high $s$ mass in the trough between $r$'s modes, where $r$ is small, so $\log(s/r)$ would be large there and the KL would be large
    • by contrast, if $s$ puts its single mode over just one of $r$'s modes, then $r$ and $s$ have similar probabilities wherever $s$ is high, and the KL in this case can be small
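
here is a small numerical sketch of the two fits: $r$ is a toy mixture of two Gaussians and $s$ is a single Gaussian whose mean and (log) std we optimize on a grid. all the specific numbers (mode locations, grid, starting points) are illustrative choices, not anything canonical:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]

# r: a bimodal mixture with well-separated modes at -3 and +3
r = 0.5 * norm.pdf(x, -3, 0.7) + 0.5 * norm.pdf(x, 3, 0.7)
r /= r.sum() * dx  # renormalize on the grid

def s_pdf(params):
    # single Gaussian candidate s, parameterized by mean and log std
    mu, log_sigma = params
    s = norm.pdf(x, mu, np.exp(log_sigma))
    return s / (s.sum() * dx)

def kl(p, q):
    # grid approximation of D_KL(p || q); small epsilon keeps the logs finite
    eps = 1e-12
    return np.sum(p * (np.log(p + eps) - np.log(q + eps))) * dx

# forward KL, D_KL(r || s): expectation under r ("mode-covering")
fwd = minimize(lambda th: kl(r, s_pdf(th)), x0=[0.5, 0.5], method="Nelder-Mead")

# reverse KL, D_KL(s || r): expectation under s ("mode-seeking")
rev = minimize(lambda th: kl(s_pdf(th), r), x0=[1.0, 0.5], method="Nelder-Mead")

print("forward KL fit: mu=%.2f sigma=%.2f" % (fwd.x[0], np.exp(fwd.x[1])))
# -> mean near 0 with a large sigma: one broad mode spanning both of r's modes
print("reverse KL fit: mu=%.2f sigma=%.2f" % (rev.x[0], np.exp(rev.x[1])))
# -> a narrow mode sitting on one of r's modes (which one depends on the start)
```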