Reinforcement Learning with Unsupervised Auxiliary Tasks
Main Strategy
- Learn a state transition model and a reward regression (anticipation) model from the experience history.
- Share the convolutional front-end network with the typical actor/critic front end.
- Use no information other than state, next state, and reward. => The auxiliary tasks are unsupervised.
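
The strategy above can be illustrated with a minimal sketch, which is not the paper's implementation. It assumes a plain replay buffer and mean squared error for both auxiliary tasks. All names (`ExperienceHistory`, `auxiliary_loss`, `predict_next_state`, `predict_reward`, the two weights) are hypothetical. The two predictor callables stand in for heads that would sit on top of the shared convolutional front end.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Sequence, Tuple


@dataclass(frozen=True)
class Transition:
    """One stored experience: only state, next state, and reward."""
    state: Tuple[float, ...]
    next_state: Tuple[float, ...]
    reward: float


class ExperienceHistory:
    """Fixed-size buffer; the oldest transitions are dropped first."""

    def __init__(self, capacity: int) -> None:
        self._buffer: Deque[Transition] = deque(maxlen=capacity)

    def add(self, state: Sequence[float], next_state: Sequence[float],
            reward: float) -> None:
        self._buffer.append(
            Transition(tuple(state), tuple(next_state), float(reward)))

    def __len__(self) -> int:
        return len(self._buffer)

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        # Sample without replacement, capped at the current buffer size.
        k = min(batch_size, len(self._buffer))
        return rng.sample(list(self._buffer), k)


def mean_squared_error(predictions: Sequence[float],
                       targets: Sequence[float]) -> float:
    errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(errors) / len(errors)


def auxiliary_loss(batch: Sequence[Transition],
                   predict_next_state: Callable[[Tuple[float, ...]], Sequence[float]],
                   predict_reward: Callable[[Tuple[float, ...]], float],
                   transition_weight: float = 1.0,
                   reward_weight: float = 1.0) -> float:
    """Weighted sum of the two auxiliary losses (weights are assumptions)."""
    if not batch:
        raise ValueError("batch must contain at least one transition")
    # Transition task: predict the next state from the current state.
    transition_loss = sum(
        mean_squared_error(predict_next_state(t.state), t.next_state)
        for t in batch
    ) / len(batch)
    # Reward task: regress the observed reward from the current state.
    reward_loss = mean_squared_error(
        [predict_reward(t.state) for t in batch],
        [t.reward for t in batch],
    )
    return transition_weight * transition_loss + reward_weight * reward_loss
```

In a full agent this auxiliary term would be added to the usual actor/critic loss, so that gradients from both auxiliary tasks also update the shared front end. The targets come straight from the stored tuples, so no external labels are needed.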