David Silver (DeepMind)

  • Eligibility Traces
Eligibility traces unify and generalize TD and Monte Carlo methods. When TD
methods are augmented with eligibility traces, they produce a family of methods
spanning a spectrum that has Monte Carlo methods at one end (λ = 1) and
one-step TD methods at the other (λ = 0).
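
The spectrum described above can be sketched with a tabular, backward-view TD(λ) update using accumulating traces. This is only a minimal illustration under stated assumptions, not code from the lectures: the function name `td_lambda_episode`, the dictionary-based value table, and the two-state toy episode are all hypothetical.

```python
def td_lambda_episode(values, transitions, alpha=0.1, gamma=1.0, lam=0.9):
    """Run backward-view TD(lambda) with accumulating traces over one episode.

    values:      dict mapping state -> current value estimate (updated in place)
    transitions: list of (state, reward, next_state, done) tuples
    """
    # One eligibility trace per state, reset at the start of the episode.
    traces = {s: 0.0 for s in values}
    for state, reward, next_state, done in transitions:
        # Terminal states have value 0 by convention.
        v_next = 0.0 if done else values[next_state]
        td_error = reward + gamma * v_next - values[state]
        # Mark the current state as eligible for credit.
        traces[state] += 1.0
        for s in values:
            # Every state is updated in proportion to its trace,
            # then its trace decays by gamma * lambda.
            values[s] += alpha * td_error * traces[s]
            traces[s] *= gamma * lam
    return values


# Hypothetical toy episode: A -> B -> terminal, with reward 1 on the last step.
episode = [("A", 0.0, "B", False), ("B", 1.0, "T", True)]

one_step = td_lambda_episode({"A": 0.0, "B": 0.0}, episode, lam=0.0)
full_return = td_lambda_episode({"A": 0.0, "B": 0.0}, episode, lam=1.0)
print(one_step)
print(full_return)
```

With λ = 0 only B, the state visited just before the reward, is updated, as in one-step TD. With λ = 1 (and γ = 1) state A also receives credit for the final reward, which in this small episode matches a Monte Carlo update toward the full return. In general the online form shown here agrees with Monte Carlo only approximately.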

Policy Gradient

  • Understanding Baseline

  • We need to approximate V instead of Q *
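
As a hedged illustration of the two bullets above, assume the common choice of the state-value function as the baseline, b(s) = V^π(s). The notation (π_θ, γ, r, s') is assumed here and is not taken from the notes. Subtracting a state-dependent baseline leaves the expected gradient unchanged, and the resulting advantage can be estimated from V alone:

```latex
% Policy gradient with a state-dependent baseline b(s)
\nabla_\theta J(\theta)
  = \mathbb{E}_{\pi_\theta}\!\left[ \nabla_\theta \log \pi_\theta(a \mid s)\,
      \bigl( Q^{\pi}(s,a) - b(s) \bigr) \right]

% The baseline term vanishes in expectation
\mathbb{E}_{a \sim \pi_\theta}\!\left[ \nabla_\theta \log \pi_\theta(a \mid s)\, b(s) \right]
  = b(s)\, \nabla_\theta \sum_a \pi_\theta(a \mid s)
  = b(s)\, \nabla_\theta 1 = 0

% With b(s) = V^{\pi}(s), the weight is the advantage, and since
% Q^{\pi}(s,a) = \mathbb{E}[\, r + \gamma V^{\pi}(s') \,],
% a one-sample estimate needs only V:
A^{\pi}(s,a) = Q^{\pi}(s,a) - V^{\pi}(s)
  \approx r + \gamma V^{\pi}(s') - V^{\pi}(s)
```

Under this assumption a single learned approximator for V is enough, rather than a separate approximator for Q.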

UC Berkeley CS 294: Deep Reinforcement Learning, Spring 2017

Modulabs (모두의 연구소)