Value Function Approximation
Types of Value Function Approximation
$$ \begin{align} s & \mapsto \hat{v}(s, \mathbf{w}) \\ s, a & \mapsto \hat{q}(s, a, \mathbf{w}) \\ s & \mapsto \hat{q}(s, a_1, \mathbf{w}) \dots \hat{q}(s, a_m, \mathbf{w}) \end{align} $$

Which function approximator?
There are many function approximators, e.g. linear combinations of features, neural networks, decision trees, nearest neighbours, Fourier / wavelet bases.
Which function approximator? (continued)
We consider differentiable function approximators, i.e. those for which $\nabla_\mathbf{w} \hat{v}(s, \mathbf{w})$ can be computed, such as linear combinations of features and neural networks.
Furthermore, we require a training method that is suitable for non-stationary, non-iid data.
Gradient Descent
... where $\alpha$ is a step-size parameter
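Expanding the elided update (a standard reconstruction, not verbatim from these notes): define a mean-squared error objective between the true value $v_\pi$ and the approximation, then descend its gradient:

$$ \begin{align} J(\mathbf{w}) & = \mathbb{E}_\pi\!\left[ (v_\pi(S) - \hat{v}(S, \mathbf{w}))^2 \right] \\ \Delta \mathbf{w} & = -\tfrac{1}{2} \alpha \, \nabla_\mathbf{w} J(\mathbf{w}) = \alpha \, \mathbb{E}_\pi\!\left[ (v_\pi(S) - \hat{v}(S, \mathbf{w})) \, \nabla_\mathbf{w} \hat{v}(S, \mathbf{w}) \right] \end{align} $$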
Value function approximation by stochastic gradient descent
21:15 to here
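A minimal sketch of the stochastic-gradient update, assuming a linear approximator and a supplied target (the function names `v_hat` and `sgd_update` and the feature vector are illustrative, not from the lecture):

```python
import numpy as np

def v_hat(w, x):
    # Linear approximator: v_hat(s, w) = x(s) . w
    return float(np.dot(w, x))

def sgd_update(w, x, target, alpha=0.1):
    # Stochastic gradient descent samples the expectation:
    # dw = alpha * (target - v_hat(S, w)) * grad_w v_hat(S, w);
    # for a linear approximator the gradient is just the feature vector x(S).
    return w + alpha * (target - v_hat(w, x)) * x

# Illustrative usage: repeatedly regress toward a fixed target of 1.0
w = np.zeros(3)
x = np.array([1.0, 0.0, 1.0])
for _ in range(100):
    w = sgd_update(w, x, target=1.0)
```

Each step shrinks the prediction error by a constant factor, so `v_hat(w, x)` converges to the target.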
Feature vectors
Linear value function approximation
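In the linear case (standard result, reconstructed here) the value is a weighted sum of features, the gradient is just the feature vector, and the SGD update becomes step-size × prediction error × feature value:

$$ \begin{align} \hat{v}(S, \mathbf{w}) & = \mathbf{x}(S)^\top \mathbf{w} = \sum_{j=1}^{n} x_j(S) \, w_j \\ \nabla_\mathbf{w} \hat{v}(S, \mathbf{w}) & = \mathbf{x}(S) \\ \Delta \mathbf{w} & = \alpha \, (v_\pi(S) - \hat{v}(S, \mathbf{w})) \, \mathbf{x}(S) \end{align} $$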
Table lookup features
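Table lookup is the special case of linear value function approximation with one-hot indicator features, so each state gets its own weight (standard construction):

$$ \mathbf{x}^{\text{table}}(S) = \big( \mathbf{1}(S = s_1), \dots, \mathbf{1}(S = s_n) \big)^\top, \qquad \hat{v}(S, \mathbf{w}) = \mathbf{x}^{\text{table}}(S)^\top \mathbf{w} = w_S $$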
30:53 to here
Incremental prediction algorithms
In practice, we substitute a target for $v_\pi(s)$:

For MC, the target is the return $G_t$:
$$ \Delta \mathbf{w} = \alpha(G_t - \hat{v}(S_t, \mathbf{w}))\, \nabla_\mathbf{w} \hat{v}(S_t, \mathbf{w}) $$
For TD(0), the target is the TD target $R_{t+1} + \gamma \hat{v}(S_{t+1}, \mathbf{w})$:
$$ \Delta \mathbf{w} = \alpha(R_{t+1} + \gamma \hat{v}(S_{t+1}, \mathbf{w}) - \hat{v}(S_t, \mathbf{w}))\, \nabla_\mathbf{w} \hat{v}(S_t, \mathbf{w}) $$
For TD($\lambda$), the target is the $\lambda$-return $G_t^\lambda$:
$$ \Delta \mathbf{w} = \alpha(G_t^\lambda - \hat{v}(S_t, \mathbf{w}))\, \nabla_\mathbf{w} \hat{v}(S_t, \mathbf{w}) $$
Monte-Carlo with value function approximation
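MC prediction treats each visited state and its return as a supervised pair $\langle S_t, G_t \rangle$. A minimal sketch with table-lookup features on a hypothetical two-state episodic chain (the chain and the helper names `one_hot`, `mc_update` are invented for illustration):

```python
import numpy as np

def one_hot(s, n):
    # Table-lookup features: x(s) is an indicator vector
    x = np.zeros(n)
    x[s] = 1.0
    return x

def mc_update(w, episode, alpha=0.1, gamma=1.0):
    # episode: list of (state, reward) pairs, where reward follows the state.
    # Compute returns G_t backwards, then apply
    # dw = alpha * (G_t - v_hat(S_t, w)) * x(S_t)
    g = 0.0
    for s, r in reversed(episode):
        g = r + gamma * g
        x = one_hot(s, len(w))
        w = w + alpha * (g - np.dot(w, x)) * x
    return w

# Hypothetical chain: state 0 -> state 1 -> terminal, reward 1 on exit
w = np.zeros(2)
for _ in range(200):
    w = mc_update(w, [(0, 0.0), (1, 1.0)])
```

With $\gamma = 1$ both states have return 1, so both weights converge to 1.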
TD Learning with Value Function Approximation
Using the TD target, the update (for linear $\hat{v}$) is $\Delta \mathbf{w} = \alpha \, \delta \, \mathbf{x}(S_t)$, where:
$$ \delta = R_{t+1} + \gamma \hat{v}(S_{t+1}, \mathbf{w}) - \hat{v}(S_t, \mathbf{w}) $$

TD($\lambda$) with Value Function Approximation
For the backward view (with linear $\hat{v}$), the update is $\Delta \mathbf{w} = \alpha \, \delta_t \, E_t$, where:
$$ \begin{align} \delta_t & = R_{t+1} + \gamma \hat{v}(S_{t+1}, \mathbf{w}) - \hat{v}(S_t, \mathbf{w}) \\ E_t & = \gamma \lambda E_{t-1} + \mathbf{x}(S_t) \end{align} $$

Forward view and backward view TD($\lambda$) are equivalent.
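A sketch of backward-view linear TD($\lambda$) with an accumulating eligibility trace, on the same hypothetical two-state chain used above (the chain, features, and function name are illustrative):

```python
import numpy as np

def td_lambda_episode(w, transitions, alpha=0.1, gamma=1.0, lam=0.8):
    # Backward-view TD(lambda): keep an eligibility trace E over the weights.
    #   delta_t = R_{t+1} + gamma * v_hat(S_{t+1}, w) - v_hat(S_t, w)
    #   E_t     = gamma * lam * E_{t-1} + x(S_t)
    #   dw      = alpha * delta_t * E_t
    e = np.zeros_like(w)
    for x, r, x_next in transitions:        # x_next is None at terminal states
        v_next = 0.0 if x_next is None else np.dot(w, x_next)
        delta = r + gamma * v_next - np.dot(w, x)
        e = gamma * lam * e + x
        w = w + alpha * delta * e
    return w

# Hypothetical chain with one-hot features; reward 1 on termination
x0, x1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
w = np.zeros(2)
for _ in range(500):
    w = td_lambda_episode(w, [(x0, 0.0, x1), (x1, 1.0, None)])
```

With $\gamma = 1$ the true values are $v(s_0) = v(s_1) = 1$, and the weights converge there; setting `lam=0` recovers the TD(0) update.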
49:00 to here