Models and Planning

  • We use "PLANNING" to refer to any computational process that takes a model as input and produces or improves a policy for interacting with the modeled environment
  • State-space planning
    • a search through the state space for an optimal policy or path to a goal.
  • Plan-space planning
    • Operators transform one plan into another, and value functions, if any, are defined over the space of plans.
    • Plan-space methods are difficult to apply efficiently to the stochastic optimal control problems that are the focus in reinforcement learning, and we do not consider them further (see, e.g., Russell and Norvig, 2010).
  • In this chapter we argue that various other state-space planning methods also fit this structure, with individual methods differing only in the kinds of backups they do, the order in which they do them, and in how long the backed-up information is retained (see the sketch below for one instance).
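
As one concrete instance of this shared structure, here is a minimal sketch of random-sample one-step tabular Q-planning in Python. The `sample_model` callable, the integer state/action encoding, and the hyperparameters are illustrative assumptions: the planner repeatedly picks a state–action pair at random, asks the model for a simulated reward and next state, and applies a one-step Q-learning backup to that simulated experience.

```python
import numpy as np

def q_planning(sample_model, num_states, num_actions,
               num_updates=10_000, alpha=0.1, gamma=0.95, rng=None):
    """Random-sample one-step tabular Q-planning (sketch).

    sample_model(s, a) -> (reward, next_state) is assumed to be a sample
    model of the environment; all other names are illustrative.
    """
    rng = rng or np.random.default_rng()
    Q = np.zeros((num_states, num_actions))
    for _ in range(num_updates):
        # 1. Select a state and an action at random.
        s = rng.integers(num_states)
        a = rng.integers(num_actions)
        # 2. Ask the sample model for a simulated reward and next state.
        r, s_next = sample_model(s, a)
        # 3. One-step tabular Q-learning backup applied to simulated experience.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    return Q
```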

Dyna: Integrating Planning, Acting, and Learning

  • trial-and-error learning
    • importance of cognition
  • reactive decision-making combined with deliberative planning
  • random-sample one-step tabular Q-planning (planning from simulated experience)
  • one-step tabular Q-learning (direct learning from real experience)

If n == 0

  • just direct RL: one-step Q-learning, no planning
  • during the first episode, except for its last step, the Q-table entries are not updated and remain at their initial (random) values

If n == 50

  • After the last real step of the first episode (which updates Q for the action, e.g. "Up", leading into the goal), the planning step (f) uses the learned Model to simulate transitions and apply up to 50 further Q updates to other state–action pairs (see the Dyna-Q sketch below).
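
To make the n == 0 and n == 50 cases concrete, below is a minimal sketch of Tabular Dyna-Q for a deterministic environment. The `env.reset()`/`env.step()` interface, the ε-greedy helper, and all hyperparameters are assumptions for illustration, not the book's exact code. The direct-RL update corresponds to step (d) and the planning loop to step (f); setting n = 0 reduces the agent to plain one-step Q-learning.

```python
import numpy as np

def dyna_q(env, num_states, num_actions, num_episodes=50,
           n=50, alpha=0.1, gamma=0.95, epsilon=0.1, rng=None):
    """Tabular Dyna-Q sketch (deterministic model); names and defaults are illustrative."""
    rng = rng or np.random.default_rng()
    Q = np.zeros((num_states, num_actions))
    model = {}  # (s, a) -> (r, s'), learned from real experience

    def eps_greedy(s):
        if rng.random() < epsilon:
            return int(rng.integers(num_actions))
        return int(np.argmax(Q[s]))

    for _ in range(num_episodes):
        s = env.reset()
        done = False
        while not done:
            a = eps_greedy(s)
            s_next, r, done = env.step(a)     # act in the real environment
            # (d) direct RL: one-step Q-learning on the real transition
            Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
            # (e) model learning: remember the observed (deterministic) transition
            model[(s, a)] = (r, s_next)
            # (f) planning: n one-step backups on transitions replayed from the model
            pairs = list(model.keys())
            for _ in range(n):
                ps, pa = pairs[rng.integers(len(pairs))]
                pr, ps_next = model[(ps, pa)]
                Q[ps, pa] += alpha * (pr + gamma * Q[ps_next].max() - Q[ps, pa])
            s = s_next
    return Q
```

With n = 50, each real step is followed by 50 simulated backups drawn from previously observed transitions, which is what spreads value back from the goal so much faster than for the n = 0 agent.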

The general problem here is another version of the conflict between exploration and exploitation. In a planning context, exploration means trying actions that improve the model, whereas exploitation means behaving in the optimal way given the current model.
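
One heuristic answer from the same chapter is the Dyna-Q+ exploration bonus: during planning, simulated rewards are increased by κ√τ, where τ is the number of time steps since that state–action pair was last tried in real experience, so the agent keeps being nudged to re-test parts of the model that may have become stale. A minimal sketch of a single planning backup with this bonus (argument names and default values are illustrative assumptions):

```python
import math
import numpy as np

def planning_backup_with_bonus(Q, model, tau, s, a,
                               alpha=0.1, gamma=0.95, kappa=1e-3):
    """One Dyna-Q+-style planning backup (sketch; names/defaults are illustrative).

    tau[(s, a)] is assumed to hold the number of real time steps since the
    pair (s, a) last occurred in real experience.
    """
    r, s_next = model[(s, a)]
    # Inflate the simulated reward for long-untried pairs, encouraging the
    # greedy policy to revisit them and thereby improve the model.
    r_plus = r + kappa * math.sqrt(tau[(s, a)])
    Q[s, a] += alpha * (r_plus + gamma * Q[s_next].max() - Q[s, a])
    return Q
```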

