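We want to solve the following integral in closed form, at least approximately:

$$ I_3 = \int_{-\infty}^\infty \log(\sigma(x))\, \mathcal{N}(x \mid \mu, s^2)\,dx = \frac{1}{s} \int_{-\infty}^\infty \log(\sigma(x)) \, \phi \left( \frac{x - \mu }{s} \right) \, dx $$

where $\sigma$ is the logistic sigmoid and $\phi$ is the standard normal density.

Substitute: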
$$ x = \mu + st $$
Therefore:
$$ dx = s\,dt $$
and:
$$ t = \frac{x - \mu}{s} $$
For the limits, we have:
$$ x_1 = -\infty, \quad x_2 = \infty $$
Then, in terms of $t$, and bearing in mind $s > 0$, we have:
$$ t_1 = -\infty, \quad t_2 = \infty $$
Therefore:
$$ I_3 = \frac{1}{s} \int_{-\infty}^\infty \log(\sigma(\mu + st))\, \phi(t)\, s \, dt = \int_{-\infty}^\infty \log(\sigma(\mu + st))\, \phi(t) \, dt $$
Let's try differentiating with respect to $\mu$, as per 'fast dropout training':
Looking at the integrand:
$$ E_2 = \log(\sigma(\mu + st)) $$
Define $g(\mu) = \mu + st$. Then:
$$ E_2 = \log(\sigma(g(\mu))) $$
By the chain rule, using $\sigma'(g) = \sigma(g)(1 - \sigma(g))$ and $\partial g / \partial \mu = 1$:
$$ \frac{\partial E_2}{\partial \mu} = \frac{\partial E_2}{\partial g} \frac{\partial g}{\partial \mu} = \frac{\sigma(g)(1 - \sigma(g))}{\sigma(g)} $$
Therefore, differentiating under the integral sign:
$$ \partial_\mu I_3 = \int_{-\infty}^\infty \frac{\sigma(\mu + st)(1 - \sigma(\mu+st))} {\sigma(\mu + st)} \phi(t) \, dt $$
Note that by symmetry, $1 - \sigma(x) = \sigma(-x)$. So we can write:
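$$ \partial_\mu I_3 = \int_{-\infty}^\infty \sigma( -\mu - st)\, \phi(t)\,dt $$
Per 'fast dropout training', we want to get the integral in the form:
$$ I_1 = \int_{-\infty}^\infty \sigma(x) \mathcal{N}(x \mid \mu, s^2)\,dx = \int_{-\infty}^\infty \sigma(x) \frac{1}{s} \phi\left(\frac{x - \mu}{s} \right)\,dx $$
Let's try substituting $x = \mu + st$ back again:
$$ \partial_\mu I_3 = \frac{1}{s} \int_{-\infty}^\infty \sigma(-x)\, \phi\left(\frac{x - \mu}{s}\right)\,dx $$
Let's use the $\sigma(-x) = 1 - \sigma(x)$ identity again. Therefore:
$$ \partial_\mu I_3 = \frac{1}{s}\int_{-\infty}^{\infty} \phi\left(\frac{x - \mu}{s}\right)\,dx - \frac{1}{s} \int_{-\infty}^\infty \sigma(x) \phi\left( \frac{x - \mu}{s} \right)\,dx $$
The first integral is $1$, because $\frac{1}{s}\phi\left(\frac{x - \mu}{s}\right)$ is the density of $\mathcal{N}(\mu, s^2)$, and the second is exactly $I_1$, so $\partial_\mu I_3 = 1 - I_1$. Using the 'fast dropout training' approximation $I_1 \approx \sigma\left(\mu / \sqrt{1 + \pi s^2/8}\right)$:
$$ \partial_\mu I_3 \approx 1 - \sigma\left(\frac{\mu}{\sqrt{1 + \pi s^2/8}}\right) = \sigma\left(\frac{-\mu}{\sqrt{1 + \pi s^2/8}}\right) $$
But this is only the partial derivative of $I_3$ with respect to $\mu$, so we need to integrate it back up.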
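Both the exact identity $\partial_\mu I_3 = 1 - I_1$ and the approximation $\partial_\mu I_3 \approx \sigma(-\mu/\sqrt{1 + \pi s^2/8})$ (the quantity that gets integrated next) can be checked numerically. This is a minimal sketch using only the Python standard library, assuming that the trapezoid rule in $t$ on $[-12, 12]$ is accurate enough for these Gaussian expectations. The helper names (`gauss_expect`, `dmu_I3_numeric`, etc.) are hypothetical, written just for this check.

```python
import math


def sigmoid(z):
    # Logistic sigmoid, written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_sigmoid(z):
    # Numerically stable log(sigma(z)) = -log(1 + exp(-z)).
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def phi(t):
    # Standard normal density.
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def gauss_expect(f, mu, s, half_width=12.0, n=4000):
    # E[f(x)] for x ~ N(mu, s^2), via x = mu + s*t and the trapezoid rule in t.
    # Assumption: truncating t to [-half_width, half_width] loses nothing noticeable.
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n + 1):
        t = -half_width + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * f(mu + s * t) * phi(t)
    return total * h


def dmu_I3_numeric(mu, s, eps=1e-4):
    # Central finite difference of I_3(mu) = E[log(sigma(x))].
    plus = gauss_expect(log_sigmoid, mu + eps, s)
    minus = gauss_expect(log_sigmoid, mu - eps, s)
    return (plus - minus) / (2.0 * eps)


def dmu_I3_exact(mu, s):
    # 1 - I_1, with I_1 = E[sigma(x)].
    return 1.0 - gauss_expect(sigmoid, mu, s)


def dmu_I3_approx(mu, s):
    # sigma(-mu / sqrt(1 + pi s^2 / 8)), the approximate derivative.
    return sigmoid(-mu / math.sqrt(1.0 + math.pi * s * s / 8.0))


for mu, s in [(0.0, 1.0), (1.0, 0.5), (2.0, 2.0), (-1.0, 1.5)]:
    print(f"mu={mu:5.2f} s={s:4.2f}  finite diff={dmu_I3_numeric(mu, s):.6f}  "
          f"1 - I_1={dmu_I3_exact(mu, s):.6f}  approx={dmu_I3_approx(mu, s):.6f}")
```

The first two columns should agree to finite-difference precision, since that identity is exact; the third column is only an approximation, so small differences there are expected.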
According to http://mathworld.wolfram.com/SigmoidFunction.html, the indefinite integral of $\sigma(x)$ is:
$$ \int \sigma(x)\,dx = \ln(1 + \exp(x)) + C $$
Integrating $\partial_\mu I_3$ back up with respect to $\mu$, we want:
$$ I_3 \approx \int \sigma \left( \frac{-\mu} {\sqrt{1 + \pi s^2/8}} \right) \, d\mu + C $$
Let's substitute:
$$ z = -\frac{\mu}{\sqrt{1 + \pi s^2/8}} $$
Therefore:
$$ dz = - \frac{d\mu}{\sqrt{1 + \pi s^2/8}} $$
and:
$$ d\mu = - \sqrt{1 + \pi s^2/8}\,dz $$
Therefore:
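$$ I_3 \approx - \int \sigma(z) \sqrt{1 + \pi s^2 / 8 }\, dz + C $$
And we have $z = - \frac{\mu}{\sqrt{1 + \pi s^2/8}}$. So, using the antiderivative of $\sigma$ above:
$$ I_3 \approx -\sqrt{1 + \pi s^2/8}\, \ln\left(1 + \exp\left(-\frac{\mu}{\sqrt{1 + \pi s^2/8}}\right)\right) + C $$
... which is looking pretty close to equation (8) in the 'fast dropout training' paper :)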
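Carrying out the last integral with the MathWorld antiderivative gives $I_3 \approx -\sqrt{1 + \pi s^2/8}\,\ln\left(1 + \exp\left(-\mu/\sqrt{1 + \pi s^2/8}\right)\right) + C$. A quick way to confirm the substitution and the sign bookkeeping is to differentiate this expression numerically with respect to $\mu$ and compare it with the integrand $\sigma(-\mu/\sqrt{1 + \pi s^2/8})$. Here is a minimal Python sketch of that check; the helper names are hypothetical, and $C$ is dropped because it does not affect the derivative.

```python
import math


def sigmoid(z):
    # Logistic sigmoid, written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(z):
    # Numerically stable ln(1 + exp(z)), the antiderivative of sigma(z).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def scale(s):
    # sqrt(1 + pi s^2 / 8), the factor that appears throughout the derivation.
    return math.sqrt(1.0 + math.pi * s * s / 8.0)


def I3_antiderivative(mu, s):
    # -sqrt(1 + pi s^2/8) * ln(1 + exp(z)), with z = -mu / sqrt(1 + pi s^2/8); C omitted.
    k = scale(s)
    return -k * softplus(-mu / k)


def integrand(mu, s):
    # The approximate dI_3/dmu that was integrated: sigma(-mu / sqrt(1 + pi s^2/8)).
    return sigmoid(-mu / scale(s))


def central_diff(f, mu, s, eps=1e-5):
    # Central finite difference of f(., s) with respect to mu.
    return (f(mu + eps, s) - f(mu - eps, s)) / (2.0 * eps)


for mu, s in [(-3.0, 0.5), (0.0, 1.0), (2.0, 2.0), (5.0, 3.0)]:
    print(f"mu={mu:5.2f} s={s:4.2f}  "
          f"d/dmu antiderivative={central_diff(I3_antiderivative, mu, s):.8f}  "
          f"integrand={integrand(mu, s):.8f}")
```

The stable `softplus` form is used instead of evaluating `math.log(1 + math.exp(z))` directly, which would overflow for large positive `z`.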
We have:
$$ \log(AB) = \log(A) + \log(B) $$
Substitute $C = -B$, $A = 1$. Then we have:
$$ \log(-C) = \log(1) + \log(-C) = \log(-C) $$
Hmmm :P
Let's try:
$$ \log(A/B) = \log(A) - \log(B) $$
Substitute $A = 1$:
$$ \log(1/B) = -\log(B) $$
Therefore:
$$ I_3 \approx \sqrt{1 + \pi s^2/8}\, \log\left( \frac{1}{1 + \exp\left(-\mu / \sqrt{1 + \pi s^2/8}\right)} \right) + C $$
Since $\frac{1}{1 + \exp(-y)} = \sigma(y)$, this is $\sqrt{1 + \pi s^2/8}\, \log\left(\sigma\left(\mu / \sqrt{1 + \pi s^2/8}\right)\right) + C$.

Let's try solving for $C$ and see what happens. We need at least one known value of $I_3$ to 'fix' or 'ground' $C$.
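The step from the $\ln(1 + \exp(\cdot))$ form to the $\log(1/(1 + \exp(\cdot)))$ form is pure algebra ($\log(1/B) = -\log(B)$), and $1/(1 + \exp(-y)) = \sigma(y)$ gives the $\log\sigma$ form. As a quick sanity check before grounding $C$, this minimal Python sketch evaluates all three forms (without $C$) and prints them side by side; `forms` is a hypothetical helper name.

```python
import math


def softplus(z):
    # Stable ln(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def log_sigmoid(z):
    # Stable log(sigma(z)).
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def forms(mu, s):
    # The same C-free expression for I_3 written three ways.
    k = math.sqrt(1.0 + math.pi * s * s / 8.0)
    y = mu / k
    form_ln = -k * softplus(-y)                            # -k ln(1 + exp(-y))
    form_frac = k * math.log(1.0 / (1.0 + math.exp(-y)))  # k log(1 / (1 + exp(-y)))
    form_sigma = k * log_sigmoid(y)                        # k log(sigma(y))
    return form_ln, form_frac, form_sigma


for mu, s in [(-2.0, 0.5), (0.0, 1.0), (1.5, 2.0), (4.0, 1.0)]:
    print(mu, s, forms(mu, s))
```

The direct `form_frac` is written literally for comparison; it raises `OverflowError` once `-y` exceeds roughly 709, which is why the other two forms are the ones to use in practice.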
Let's look at the original integral again, before we differentiated it. It was:
$$ I_3 = \frac{1}{s} \int_{-\infty}^\infty \log(\sigma(x)) \, \phi \left( \frac{x - \mu }{s} \right) \, dx $$
... and we want to find some values for $\mu$ and $s$ that will give us a known value.
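We don't actually need to fix $s$: the argument below works for any $s > 0$, which matters because $C$ may depend on $s$. Using the substituted form $I_3 = \int_{-\infty}^\infty \log(\sigma(\mu + st))\,\phi(t)\,dt$, as $\mu \rightarrow \infty$ we have $\log(\sigma(\mu + st)) \rightarrow 0$ for every $t$. For $\mu \geq 0$ the integrand is also bounded in magnitude by $|\log(\sigma(st))|\,\phi(t) \leq (\log 2 + s|t|)\,\phi(t)$, which is integrable, so by dominated convergence the integral tends to $0$. The other direction is no use: as $\mu \rightarrow -\infty$, $\log(\sigma(x)) \approx x$ wherever the Gaussian has its mass, so $I_3 \approx \mu \rightarrow -\infty$ (and we have constrained $\mu > 0$ anyway). Therefore:
$$ \lim_{\mu \rightarrow \infty} I_3 = 0 $$
Meanwhile, looking at our final expression for $I_3$, we have: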
$$ I_3 \approx \sqrt{1 + \pi s^2/8}\, \log \left( \sigma \left( \frac{\mu} {\sqrt{1 + \pi s^2/8}} \right) \right) + C $$
As $\mu \rightarrow \infty$, $\sigma\left(\mu / \sqrt{1 + \pi s^2/8}\right) \rightarrow 1$, and thus its logarithm tends to $0$.
Therefore, according to this expression, $\lim_{\mu \rightarrow \infty} I_3 = C$.
Comparing this with the limit of the original integral, we get $C = 0$.
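The grounding argument can also be illustrated numerically: as $\mu$ grows, both the original integral (evaluated by quadrature) and the closed form with $C = 0$ head to $0$, so the gap between them does too. This is a minimal Python sketch, assuming the trapezoid rule in $t$ on $[-12, 12]$ is an adequate reference; `I3_quadrature` and `I3_closed_form` are hypothetical helper names. It illustrates the limit rather than proving it.

```python
import math


def log_sigmoid(z):
    # Numerically stable log(sigma(z)).
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def phi(t):
    # Standard normal density.
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def I3_quadrature(mu, s, half_width=12.0, n=4000):
    # Original integral I_3 = int log(sigma(mu + s t)) phi(t) dt, trapezoid rule in t.
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n + 1):
        t = -half_width + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * log_sigmoid(mu + s * t) * phi(t)
    return total * h


def I3_closed_form(mu, s):
    # sqrt(1 + pi s^2/8) * log(sigma(mu / sqrt(1 + pi s^2/8))), i.e. the expression with C = 0.
    k = math.sqrt(1.0 + math.pi * s * s / 8.0)
    return k * log_sigmoid(mu / k)


for s in (1.0, 2.0):
    for mu in (0.0, 2.0, 5.0, 10.0, 20.0):
        q = I3_quadrature(mu, s)
        c = I3_closed_form(mu, s)
        print(f"s={s:3.1f} mu={mu:5.1f}  quadrature={q:.3e}  "
              f"closed form={c:.3e}  gap={q - c:.3e}")
```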
Therefore we have:
$$ I_3 \approx \sqrt{1 + \pi s^2/8}\, \log \left( \sigma \left( \frac{\mu}{\sqrt{1 + \pi s^2/8}} \right) \right) $$
which matches the 'fast dropout training' paper :)
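Since the final result rests on the approximation for $I_1$, it is worth seeing how close it is to a direct numerical evaluation of $I_3$. This is a minimal Python sketch, assuming trapezoid-rule quadrature in $t$ is accurate enough to serve as the reference; `I3_quadrature` and `I3_approx` are hypothetical helper names. It prints the gap on a small grid of $(\mu, s)$, including larger values of $s$, because the expression is an approximation rather than an exact closed form.

```python
import math


def log_sigmoid(z):
    # Numerically stable log(sigma(z)).
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def phi(t):
    # Standard normal density.
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def I3_quadrature(mu, s, half_width=12.0, n=4000):
    # Reference value of I_3 = E[log(sigma(x))], x ~ N(mu, s^2), trapezoid rule in t.
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n + 1):
        t = -half_width + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * log_sigmoid(mu + s * t) * phi(t)
    return total * h


def I3_approx(mu, s):
    # Final approximation: sqrt(1 + pi s^2/8) * log(sigma(mu / sqrt(1 + pi s^2/8))).
    k = math.sqrt(1.0 + math.pi * s * s / 8.0)
    return k * log_sigmoid(mu / k)


print(f"{'s':>4} {'mu':>5} {'quadrature':>11} {'approx':>9} {'gap':>9}")
for s in (0.5, 1.0, 2.0, 4.0):
    for mu in (-2.0, 0.0, 1.0, 3.0):
        q = I3_quadrature(mu, s)
        a = I3_approx(mu, s)
        print(f"{s:4.1f} {mu:5.1f} {q:11.5f} {a:9.5f} {q - a:9.5f}")
```

As $s \rightarrow 0$ both sides reduce to $\log(\sigma(\mu))$, so the interesting rows are the ones with larger $s$.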