Reinforcement Learning with Unsupervised Auxiliary Tasks

  • DeepMind, 2016

Main Strategy

  • Learn a state-transition model and a reward-regression (reward-anticipation) model from the experience history.
  • Reuse the convolutional front end of the typical actor/critic network as the front end for these auxiliary models.
  • The auxiliary tasks use no information other than state, next state, and reward, so they need no extra labels. => Unsupervised Auxiliary
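
The strategy above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the names `Transition`, `ExperienceHistory`, `mse`, and `auxiliary_loss` are hypothetical, the two predictors are passed in as plain callables (in a real agent they would be heads on the shared convolutional front end), and the mean-squared-error losses and the `reward_weight` parameter are assumptions made for the example.

```python
import random
from collections import deque
from typing import Callable, List, NamedTuple, Sequence, Tuple


class Transition(NamedTuple):
    """One step of experience: the only data the auxiliary tasks see."""
    state: Tuple[float, ...]
    next_state: Tuple[float, ...]
    reward: float


class ExperienceHistory:
    """Fixed-size history of transitions (hypothetical helper)."""

    def __init__(self, capacity: int = 1000, seed: int = 0) -> None:
        # deque(maxlen=...) silently drops the oldest transition when full.
        self._buffer = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, state: Sequence[float], next_state: Sequence[float],
            reward: float) -> None:
        self._buffer.append(
            Transition(tuple(state), tuple(next_state), float(reward)))

    def sample(self, batch_size: int) -> List[Transition]:
        # Sample without replacement, capped at the number of stored items.
        k = min(batch_size, len(self._buffer))
        return self._rng.sample(list(self._buffer), k)

    def __len__(self) -> int:
        return len(self._buffer)


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def auxiliary_loss(
    batch: Sequence[Transition],
    predict_next_state: Callable[[Tuple[float, ...]], Sequence[float]],
    predict_reward: Callable[[Tuple[float, ...]], float],
    reward_weight: float = 1.0,  # assumed weighting, not from the source
) -> float:
    """Average transition-model loss plus weighted reward-regression loss."""
    if not batch:
        raise ValueError("batch must contain at least one transition")
    total = 0.0
    for tr in batch:
        # Transition model: predict next_state from state.
        total += mse(predict_next_state(tr.state), tr.next_state)
        # Reward regression: predict reward from state.
        total += reward_weight * (predict_reward(tr.state) - tr.reward) ** 2
    return total / len(batch)


# Usage: fill the history with a toy 1-D chain, then score two predictors.
history = ExperienceHistory(capacity=100)
for t in range(5):
    history.add([float(t)], [float(t + 1)], 1.0 if t == 4 else 0.0)

batch = history.sample(3)
loss = auxiliary_loss(batch, lambda s: [s[0] + 1.0], lambda s: 0.0)
```

Both targets (`next_state` and `reward`) come straight from the stored transitions, which is why no labels beyond the agent's own experience are needed. In training, this loss would typically be added to the actor/critic loss so that gradients from both flow into the shared front end.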
