Deep Reinforcement Learning for Partially Observable Parameterized Environments
presented by Microsoft Research
It depends on how much information is available at each time-step.
Addition of the LSTM provides clear benefits:
Important note: DRQN doesn't always beat DQN's scores; the Beam Rider environment is a case in point.
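To make the role of the LSTM concrete, here is a minimal sketch of a recurrent Q-value function in plain Python. It is an illustration under stated assumptions, not the implementation behind the results above: the class name TinyRecurrentQ, the layer sizes, and the random untrained weights are all hypothetical, and the convolutional layers and training loop are omitted. It shows only the key idea that a recurrent state lets the Q-values depend on past observations, not just the current one.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply matrix W (list of rows) by vector v.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyRecurrentQ:
    """Hypothetical toy model: one LSTM cell followed by a linear Q-value head."""

    GATES = ("input", "forget", "output", "candidate")

    def __init__(self, obs_dim, hidden_dim, n_actions, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        in_dim = obs_dim + hidden_dim  # each gate sees [observation, previous h]
        self.W = {g: rand_matrix(hidden_dim, in_dim, rng) for g in self.GATES}
        self.b = {g: [0.0] * hidden_dim for g in self.GATES}
        self.W_q = rand_matrix(n_actions, hidden_dim, rng)

    def initial_state(self):
        # (hidden state h, cell state c), both zero at the start of an episode.
        return [0.0] * self.hidden_dim, [0.0] * self.hidden_dim

    def step(self, obs, state):
        h, c = state
        x = list(obs) + h
        pre = {
            g: [a + b for a, b in zip(matvec(self.W[g], x), self.b[g])]
            for g in self.GATES
        }
        i = [sigmoid(v) for v in pre["input"]]
        f = [sigmoid(v) for v in pre["forget"]]
        o = [sigmoid(v) for v in pre["output"]]
        cand = [math.tanh(v) for v in pre["candidate"]]
        # Standard LSTM update: keep part of the old cell state, add new content.
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        q_values = matvec(self.W_q, h_new)  # one Q-value per action
        return q_values, (h_new, c_new)


net = TinyRecurrentQ(obs_dim=2, hidden_dim=4, n_actions=3)

# History A: an informative observation, then an uninformative (blank) one.
state_a = net.initial_state()
_, state_a = net.step([1.0, 0.0], state_a)
q_a, _ = net.step([0.0, 0.0], state_a)

# History B: a different first observation, then the same blank one.
state_b = net.initial_state()
_, state_b = net.step([0.0, 1.0], state_b)
q_b, _ = net.step([0.0, 0.0], state_b)

print(q_a)
print(q_b)
```

Both histories end on the same blank observation, yet the two Q-value vectors differ because the LSTM state carries information from the earlier step. A purely feed-forward Q-network given only the current observation would have to return the same values in both cases.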
DRQN has been extended: