Boundary conditions / edge cases

  • timesteps $t \in \{0, \ldots, \text{seq\_len} - 1\}$
  • define $T = \text{seq\_len} - 1$, so $t \in \{0, \ldots, T\}$

For each $t \in \{0, \ldots, T\}$:

  • we have an input[t]
  • we compute an out[t]

define: out[-1] = 0 (boundary condition, so the formula below is well-defined at t = 0)

Formula for each timestep

  • $\text{out}[t] = (\text{out}[t-1] + \text{input}[t]) \cdot W$
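The recurrence above can be sketched directly; a minimal forward pass, assuming scalar inputs, a scalar W, and a made-up example sequence:

```python
# Forward pass for out[t] = (out[t-1] + input[t]) * W, with out[-1] = 0.
def forward(inputs, W):
    outs = []
    prev = 0.0  # out[-1] = 0
    for x in inputs:
        prev = (prev + x) * W
        outs.append(prev)
    return outs

outs = forward([1.0, 2.0, 3.0], W=0.5)
# outs == [0.5, 1.25, 2.125]
```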

Chain rule:

$$ \frac{\partial a}{\partial b} = \frac{\partial a}{\partial c} \frac{\partial c}{\partial b} $$
$$ \frac{\partial a}{\partial b} = [\text{something we know}] \cdot [\text{something we can calculate}] $$

Backpropagation

We want:

$$ \frac{\partial \text{out}[T]}{\partial W[t]} $$
$$ \frac{\partial \text{out}[T]}{\partial W[t]} = \frac{\partial \text{out}[T]}{\partial \text{out}[t]} \frac{\partial \text{out}[t]}{\partial W[t]} $$
$$ \frac{\partial \text{out}[t]}{\partial W[t]} = \text{out}[t-1] + \text{input}[t] $$
$$ \frac{\partial \text{out}[T]}{\partial \text{out}[t]} = \frac{\partial \text{out}[T]}{\partial \text{out}[t+1]} \frac{\partial \text{out}[t+1]}{\partial \text{out}[t]}, \qquad \text{with base case } \frac{\partial \text{out}[T]}{\partial \text{out}[T]} = 1 $$

From the timestep formula, $\text{out}[t+1] = (\text{out}[t] + \text{input}[t+1]) \cdot W$, so differentiating with respect to $\text{out}[t]$:

$$ \frac{\partial \text{out}[t+1]}{\partial \text{out}[t]} = W $$
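Since each factor in the recursion is just $W$, the gradients $\partial \text{out}[T] / \partial \text{out}[t]$ can be accumulated in one backward sweep. A small sketch (scalar $W$, illustrative values):

```python
# gradOutput[t] = d out[T] / d out[t], via the recursion
# gradOutput[T] = 1 and gradOutput[t] = gradOutput[t+1] * W.
def grad_output(T, W):
    g = [0.0] * (T + 1)
    g[T] = 1.0  # base case
    for t in range(T - 1, -1, -1):
        g[t] = g[t + 1] * W
    return g

g = grad_output(2, 0.5)
# g == [0.25, 0.5, 1.0], i.e. gradOutput[t] = W ** (T - t)
```

This also makes the vanishing/exploding behavior visible: the gradient scales like $W^{T-t}$.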

Definitions

  • $\text{gradOutput}[t] = \dfrac{\partial \text{out}[T]}{\partial \text{out}[t]}$
  • $\text{gradW}[t] = \dfrac{\partial \text{out}[T]}{\partial W[t]}$
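Putting the pieces together: gradW[t] = gradOutput[t] * (out[t-1] + input[t]), and since the same $W$ is applied at every timestep, the total gradient of out[T] with respect to $W$ is the sum over $t$ of gradW[t]. A self-contained sketch (scalar case, hypothetical example values), checked against a central finite difference:

```python
# Forward pass for out[t] = (out[t-1] + input[t]) * W, with out[-1] = 0.
def forward(inputs, W):
    outs, prev = [], 0.0
    for x in inputs:
        prev = (prev + x) * W
        outs.append(prev)
    return outs

# Backward pass: accumulate gradW[t] = gradOutput[t] * (out[t-1] + input[t])
# while updating gradOutput[t-1] = gradOutput[t] * W in the same sweep.
def backward(inputs, W):
    outs = forward(inputs, W)
    T = len(inputs) - 1
    grad_out = 1.0  # gradOutput[T] = 1
    grad_W = 0.0
    for t in range(T, -1, -1):
        prev = outs[t - 1] if t > 0 else 0.0  # out[-1] = 0
        grad_W += grad_out * (prev + inputs[t])  # gradW[t]
        grad_out *= W                            # gradOutput[t-1]
    return grad_W

inputs, W, eps = [1.0, 2.0, 3.0], 0.5, 1e-6
analytic = backward(inputs, W)  # 5.75 for this example
numeric = (forward(inputs, W + eps)[-1] - forward(inputs, W - eps)[-1]) / (2 * eps)
```

For these values out[T] is the polynomial $W^3 + 2W^2 + 3W$, whose derivative at $W = 0.5$ is $3(0.25) + 4(0.5) + 3 = 5.75$, matching both the analytic and numeric results.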