Boundary conditions / edge cases
For each timestep $t \in \{0, \ldots, T\}$ there is:
input[t]
out[t]
Define the boundary condition: out[-1] = 0
Formula for each timestep:
out[t] = (out[t - 1] + input[t]) * W
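A minimal Python sketch of this forward recurrence, assuming a scalar weight `W` and a list of scalar inputs. The function name `forward` is hypothetical. It starts from the boundary value out[-1] = 0 and returns out[0], ..., out[T].

```python
def forward(inputs, W):
    """Run out[t] = (out[t - 1] + input[t]) * W for t = 0..T.

    Hypothetical helper: `inputs` is a list of scalars, `W` a scalar weight.
    Returns the list [out[0], ..., out[T]].
    """
    outs = []
    prev = 0.0  # boundary condition: out[-1] = 0
    for x in inputs:
        prev = (prev + x) * W  # out[t] = (out[t - 1] + input[t]) * W
        outs.append(prev)
    return outs


print(forward([1.0, 2.0, 3.0], 0.5))  # [0.5, 1.25, 2.125]
```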
Chain rule:
$$ \frac{\partial a}{\partial b} = \frac{\partial a}{\partial c} \frac{\partial c}{\partial b} $$
$$ \frac{\partial a}{\partial b} = [\text{something we know}] \, [\text{something we can calculate}] $$
Back propagation
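We want, for every timestep t:
$$ \frac{\partial \text{out}[T]}{\partial W[t]} $$
where W[t] is the weight W as used at timestep t (the same W is shared by all timesteps).
The recurrence shifted forward by one step, which links out[t] to out[t + 1]:
out[t + 1] = (out[t] + input[t + 1]) * W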
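Differentiating the shifted recurrence gives the two local derivatives that back propagation needs. A worked sketch, treating W[t + 1] as the copy of W used at step t + 1, and then applying the chain rule with c = out[t + 1] to get a backward recursion that starts from the trivial derivative at step T:

$$
\frac{\partial \text{out}[t+1]}{\partial \text{out}[t]} = W,
\qquad
\frac{\partial \text{out}[t+1]}{\partial W[t+1]} = \text{out}[t] + \text{input}[t+1]
$$
$$
\frac{\partial \text{out}[T]}{\partial \text{out}[t]}
= \frac{\partial \text{out}[T]}{\partial \text{out}[t+1]} \, \frac{\partial \text{out}[t+1]}{\partial \text{out}[t]}
= \frac{\partial \text{out}[T]}{\partial \text{out}[t+1]} \, W,
\qquad
\frac{\partial \text{out}[T]}{\partial \text{out}[T]} = 1
$$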
Definitions
gradOutput[t] =
$$
\frac{\partial \text{out}[T]}{\partial \text{out}[t]}
$$
gradW[t] =
$$
\frac{\partial \text{out}[T] }{\partial W[t]}
$$
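Reading gradOutput[t] as $\partial \text{out}[T] / \partial \text{out}[t]$, the chain rule splits gradW[t] into the part we know (gradOutput[t], carried backward from step T) and the part we can calculate locally: gradW[t] = gradOutput[t] * (out[t - 1] + input[t]), with gradOutput[T] = 1 and gradOutput[t - 1] = gradOutput[t] * W. The Python below is a minimal sketch of that backward pass under the same scalar assumptions as the forward recurrence; `backward` and `final_output` are hypothetical names. Because every step shares one W, it also sums gradW[t] over t to get $\partial \text{out}[T] / \partial W$, and checks that sum against a central finite difference.

```python
def backward(inputs, W):
    """Back propagation through time for out[t] = (out[t - 1] + input[t]) * W.

    Hypothetical helper. Returns (grad_w, total): grad_w[t] = d out[T] / d W[t]
    for the copy of W used at step t, and total = sum of grad_w, which is
    d out[T] / d W because all steps share the same W.
    """
    T = len(inputs) - 1
    # Forward pass, storing out[-1] = 0 at index 0 so that outs[t] is out[t - 1].
    outs = [0.0]
    for x in inputs:
        outs.append((outs[-1] + x) * W)

    grad_w = [0.0] * (T + 1)
    grad_output = 1.0  # gradOutput[T] = d out[T] / d out[T] = 1
    for t in range(T, -1, -1):
        # gradW[t] = gradOutput[t] * (out[t - 1] + input[t])
        grad_w[t] = grad_output * (outs[t] + inputs[t])
        # gradOutput[t - 1] = gradOutput[t] * d out[t] / d out[t - 1] = gradOutput[t] * W
        grad_output *= W
    return grad_w, sum(grad_w)


def final_output(inputs, W):
    """Hypothetical helper: out[T] from the forward recurrence."""
    prev = 0.0  # out[-1] = 0
    for x in inputs:
        prev = (prev + x) * W
    return prev


inputs, W, eps = [1.0, 2.0, 3.0], 0.5, 1e-6
grad_w, total = backward(inputs, W)
# Central finite difference on the shared W, as an independent check.
numeric = (final_output(inputs, W + eps) - final_output(inputs, W - eps)) / (2 * eps)
print(grad_w, total)  # [0.25, 1.25, 4.25] 5.75
```

For these inputs out[T] = W^3 + 2W^2 + 3W, whose derivative 3W^2 + 4W + 3 is 5.75 at W = 0.5, matching the summed per-step gradients.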