In [1]:
using POMCP
using POMDPs
using POMDPModels
using POMDPToolbox
using GenerativeModels
problem = BabyPOMDP();
rng = MersenneTwister(1);
In [2]:
solver = POMCPSolver(rng=rng, tree_queries=5)
policy = solve(solver, problem)
up = updater(policy)
Out[2]:
In this implementation of POMCP, the "belief state" is actually the tree itself. When a new observation is received from the environment, the RootUpdater simply chooses an existing node, based on the action taken and the observation received, to act as the root of the tree for the next decision.
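Conceptually, the root update just follows one action edge and one observation edge down the tree. The sketch below illustrates the idea; it is illustrative only, not POMCP.jl's actual code, and it assumes each node stores its children keyed by action and observation (the same structure accessed via node.children[a].children[o] later in this notebook).
# Illustrative sketch only -- not the package's actual implementation.
# Assumes each node stores its children keyed by action and observation.
function sketch_root_update(root, a, o)
    action_node = root.children[a]   # child reached by taking action a
    return action_node.children[o]   # grandchild consistent with observation o
end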
In [3]:
# setup
init_dist = initial_state_distribution(problem)
s = rand(rng, init_dist)
first_root_node = initialize_belief(up, init_dist)
@show typeof(first_root_node)
# plan and execute first action
a = action(policy, first_root_node)
(sp, o, r) = generate_sor(problem, s, a, rng)
# the updater simply chooses the next root node
second_root_node = update(up, first_root_node, a, o)
@show typeof(second_root_node)
# this new node contains particles representing the belief
@show second_root_node.B
# at the next step, POMCP uses the new root
action(policy, second_root_node);
Unfortunately, this simple unweighted particle filtering scheme often runs into the problem of particle depletion, in which no particles correspond to the observation received from the environment.
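To see why depletion occurs, consider a naive rejection-style unweighted particle filter that keeps only the particles whose simulated observation matches the real one. The sketch below is illustrative, not POMCP.jl's internal filter; naive_pf_update is a made-up name, and it reuses generate_sor from the generative interface shown above.
# Illustrative sketch only -- not POMCP.jl's internal filter.
function naive_pf_update(particles, a, o_real, problem, rng)
    new_particles = Bool[] # BabyPOMDP states are Bools
    for s in particles
        (sp, o_sim, r) = generate_sor(problem, s, a, rng)
        if o_sim == o_real
            push!(new_particles, sp) # keep only particles matching the real observation
        end
    end
    return new_particles # may come back empty: particle depletion
end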
In [4]:
# artificially simulate particle depletion
delete!(first_root_node.children[a].children, o)
update(up, first_root_node, a, o) # errors: the tree has no node matching observation o
Three solutions, listed in the error message above, are available to alleviate the problem. If you still want to use the standard POMCP unweighted particle filter, you must implement a ParticleReinvigorator (particle reinvigoration was used in the original POMCP paper).
A ParticleReinvigorator must have two methods implemented for it: reinvigorate! and handle_unseen_observation. A sample is below. (This is a very bad reinvigorator because it adds particles uniformly; a real reinvigorator should use domain knowledge to add particles similar to the ones already present and consistent with the action and observation.)
In [5]:
# a trivial reinvigorator that naively adds one particle for each of the two states
type UniformBabyReinvigorator <: ParticleReinvigorator end

# adds new particles to the collection after a normal update
function POMCP.reinvigorate!(pc::ParticleCollection,
                             r::UniformBabyReinvigorator,
                             old_node::BeliefNode, a::Bool, o::Bool)
    push!(pc, true)
    push!(pc, false)
    return pc
end

# constructs a fresh particle collection when no simulated particles
# are consistent with the observation from the environment
function POMCP.handle_unseen_observation(r::UniformBabyReinvigorator,
                                         old_node::BeliefNode, a::Bool, o::Bool)
    return ParticleCollection{Bool}([true, false])
end
This allows an update to happen even when POMCP did not simulate any particles consistent with the observation received from the environment.
In [6]:
up_with_reinvig = RootUpdater(UniformBabyReinvigorator())
# artificially simulate particle depletion
delete!(first_root_node.children[a].children, o)
update(up_with_reinvig, first_root_node, a, o)
Out[6]:
In [7]:
init_dist = initial_state_distribution(problem)
@show typeof(init_dist)
action(policy, init_dist)
Out[7]:
As shown above, action can also be called directly with a belief distribution such as init_dist. In this case, however, the policy unnecessarily keeps track of the particles at each node in the tree.
In [8]:
get(policy._tree_ref).children[true].children[false].B
Out[8]:
To prevent this, we can use a VoidUpdater from POMDPToolbox.
In [9]:
solver = POMCPSolver(rng=rng, tree_queries=5,
                     node_belief_updater = VoidUpdater())
policy = solve(solver, problem)
a = action(policy, init_dist)
@show get(policy._tree_ref).children[true].children[false].B
Note that even if the node_belief_updater is a VoidUpdater, we can still use any belief updater to handle real observations from the environment:
In [10]:
exact_updater = BabyBeliefUpdater(problem)
(sp, o, r) = generate_sor(problem, s, a, rng)
@show belief2 = update(exact_updater, init_dist, a, o)
a2 = action(policy, belief2)
Out[10]:
We can even use an arbitrary updater to update the beliefs at the nodes (for example, if the rollout policy needs a specific belief representation).
In [11]:
solver = POMCPSolver(rng=rng, tree_queries=5,
                     node_belief_updater = exact_updater)
policy = solve(solver, problem)
a = action(policy, init_dist)
get(policy._tree_ref).children[true].children[false].B
Out[11]:
It is also possible for a custom belief updater to use the states from the POMCP planner's decision-making simulations. To do this, simply define POMCP.uses_states_from_planner(::YourBeliefType) = true, and Base.push!(::YourBeliefType, ::State) will be called every time a node is visited in POMCP.
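For instance, a belief type that actually tallies the states pushed to it by the planner might be sketched as follows (StateCountBelief is hypothetical and not part of POMCP.jl):
# Hypothetical sketch -- StateCountBelief is not part of POMCP.jl.
type StateCountBelief
    counts::Dict{Bool,Int} # tally of states seen during planner simulations
end
StateCountBelief() = StateCountBelief(Dict{Bool,Int}())
POMCP.uses_states_from_planner(::StateCountBelief) = true
function Base.push!(b::StateCountBelief, s)
    b.counts[s] = get(b.counts, s, 0) + 1 # count each state the planner visits
    return b
end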
In the following example, the belief type does not actually use the simulated states to predict the next belief; it just prints a message whenever a state is pushed to it by the planner.
In [12]:
POMCP.uses_states_from_planner(::BoolDistribution) = true
Base.push!(::BoolDistribution, s) = println("Received state $s from planner.")
solver = POMCPSolver(rng=rng, tree_queries=5,
                     node_belief_updater = exact_updater)
policy = solve(solver, problem)
a = action(policy, init_dist)
Out[12]: