Eligibility traces unify and generalize TD and Monte Carlo methods. When TD
methods are augmented with eligibility traces, they produce a family of methods
spanning a spectrum that has Monte Carlo methods at one end (λ = 1) and
one-step TD methods at the other (λ = 0).
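
One way to see the two ends of this spectrum is through the λ-return, the target that trace-based methods approximate. The sketch below is only an illustration under stated assumptions: the function name `lambda_returns`, the toy episode, and the value estimates are all hypothetical and not from the source. It computes the forward-view target for a single finished episode rather than maintaining traces online.

```python
def lambda_returns(rewards, values, gamma, lam):
    """Compute the lambda-return for every step of one finished episode.

    rewards[t] is the reward received after leaving state s_t, values[t] is
    the current estimate V(s_t), and the terminal state's value is taken as 0.
    """
    T = len(rewards)
    returns = [0.0] * T
    next_return = 0.0  # lambda-return at step t+1 (0 past the end of the episode)
    next_value = 0.0   # V(s_{t+1}) (0 for the terminal state)
    for t in reversed(range(T)):
        # Recursive form:
        # G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1})
        returns[t] = rewards[t] + gamma * ((1 - lam) * next_value + lam * next_return)
        next_return = returns[t]
        next_value = values[t]
    return returns


# Toy episode (made-up numbers, chosen only for illustration).
rewards = [1.0, 0.0, 2.0]
values = [0.5, 0.25, 1.0]
gamma = 0.5

mc = lambda_returns(rewards, values, gamma, lam=1.0)  # ignores the value estimates
td = lambda_returns(rewards, values, gamma, lam=0.0)  # bootstraps after one step

print(mc)  # [1.5, 1.0, 2.0]
print(td)  # [1.125, 0.5, 2.0]
```

With λ = 1 the targets are the full discounted returns of the episode (the Monte Carlo target), and with λ = 0 each target is the immediate reward plus the discounted estimate of the next state (the one-step TD target). Values of λ in between blend the two.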
Understanding Baseline
We need to approximate V instead of Q*.