Reinforcement Learning with Unsupervised Auxiliary Tasks

Learn a state transition model and a reward regression (reward prediction) model from the experience history.
Reuse the convolutional front-end network of the typical actor/critic as the shared front end for these models.
The auxiliary tasks use no information other than state, next state, and reward, so they need no extra labels. => Unsupervised Auxiliary
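
The idea above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the notebook's actual implementation: a small linear-plus-ReLU encoder stands in for the convolutional front end, and all names (`STATE_DIM`, `FEATURE_DIM`, `encode`, `auxiliary_loss`, the weight matrices) as well as the toy dynamics and toy reward are assumptions made for the example. It shows how both auxiliary losses can be computed from stored (state, next state, reward) tuples alone.

```python
import random
from collections import deque

random.seed(0)

STATE_DIM = 4    # hypothetical flattened state size
FEATURE_DIM = 3  # hypothetical size of the shared feature vector


def make_matrix(rows, cols):
    """Return a small random weight matrix as a list of rows."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def relu(vector):
    return [max(0.0, x) for x in vector]


# Shared front end (a stand-in for the convolutional network) and two auxiliary heads.
W_shared = make_matrix(FEATURE_DIM, STATE_DIM)
W_transition = make_matrix(STATE_DIM, FEATURE_DIM)  # features -> predicted next state
W_reward = make_matrix(1, FEATURE_DIM)              # features -> predicted reward


def encode(state):
    """Shared features that the actor/critic heads would also consume."""
    return relu(matvec(W_shared, state))


def auxiliary_loss(batch):
    """Mean squared error of the next-state and reward predictions over a batch."""
    transition_loss = 0.0
    reward_loss = 0.0
    for state, next_state, reward in batch:
        features = encode(state)
        predicted_next = matvec(W_transition, features)
        predicted_reward = matvec(W_reward, features)[0]
        transition_loss += sum(
            (p - t) ** 2 for p, t in zip(predicted_next, next_state)
        ) / STATE_DIM
        reward_loss += (predicted_reward - reward) ** 2
    n = len(batch)
    return transition_loss / n, reward_loss / n


# Experience history: only (state, next_state, reward) tuples are stored.
history = deque(maxlen=1000)
for _ in range(50):
    s = [random.random() for _ in range(STATE_DIM)]
    s_next = [x + 0.01 for x in s]  # toy dynamics, assumed for the example
    r = sum(s) / STATE_DIM          # toy reward, assumed for the example
    history.append((s, s_next, r))

batch = random.sample(list(history), 8)
transition_loss, reward_loss = auxiliary_loss(batch)
total_aux_loss = transition_loss + reward_loss
```

In a full agent, `total_aux_loss` would presumably be added (with some weighting) to the actor/critic loss, so that gradients from both auxiliary heads also shape the shared front end.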


