In [1]:
using POMCP
using POMDPs
using POMDPModels
using POMDPToolbox
using GenerativeModels

problem = BabyPOMDP();
rng = MersenneTwister(1);

Default: POMCP Unweighted Particle Filter

The default belief updater for POMCP uses the same particles for decision-making and belief updates, as described in the original POMCP paper. This behavior is implemented by the RootUpdater type.


In [2]:
solver = POMCPSolver(rng=rng, tree_queries=5)
policy = solve(solver, problem)
up = updater(policy)


Out[2]:
POMCP.RootUpdater{POMCP.DeadReinvigorator{Bool}}(POMCP.DeadReinvigorator{Bool}())

In this implementation of POMCP, the "belief state" is actually the tree itself, so when a new observation is received from the environment, the RootUpdater simply chooses the tree node corresponding to the action taken and the observation received to act as the root of the tree for the next decision.


In [3]:
# setup
init_dist = initial_state_distribution(problem)
s = rand(rng, init_dist)
first_root_node = initialize_belief(up, init_dist)
@show typeof(first_root_node)

# plan and execute first action
a = action(policy, first_root_node)
(sp, o, r) = generate_sor(problem, s, a, rng)

# the updater simply chooses the next root node
second_root_node = update(up, first_root_node, a, o)
@show typeof(second_root_node)

# this new node contains particles representing the belief
@show second_root_node.B

# at the next step, POMCP uses the new root
action(policy, second_root_node);


typeof(first_root_node) = POMCP.RootNode{POMDPModels.BoolDistribution}
typeof(second_root_node) = POMCP.ObsNode{POMCP.ParticleCollection{Bool},Bool,Bool}
second_root_node.B = POMCP.ParticleCollection{Bool}(Bool[false,false,false])

Unfortunately, this simple unweighted particle filter scheme will often run into the problem of particle depletion, in which no particles correspond to the observation received from the environment.


In [4]:
# artificially simulate particle depletion
delete!(first_root_node.children[a].children, o)
update(up, first_root_node, a, o)


LoadError: POMCP.jl: Particle Depletion! To fix this, you have three options:
      1) use more tree_queries (will only work for very small problems)
      2) implement a ParticleReinvigorator with reinvigorate!() and handle_unseen_observation()
      3) implement a more advanced updater for the agent (POMCP can use any
         belief/state distribution that supports rand())

while loading In[4], in expression starting on line 3

 in update(::POMCP.RootUpdater{POMCP.DeadReinvigorator{Bool}}, ::POMCP.RootNode{POMDPModels.BoolDistribution}, ::Bool, ::Bool, ::Void) at /home/zach/.julia/v0.5/POMCP/src/updater.jl:15
 in update(::POMCP.RootUpdater{POMCP.DeadReinvigorator{Bool}}, ::POMCP.RootNode{POMDPModels.BoolDistribution}, ::Bool, ::Bool) at /home/zach/.julia/v0.5/POMCP/src/updater.jl:14

Any of the three solutions listed in the error message above can alleviate the problem. If you still want to use the standard POMCP unweighted particle filter, you must implement a ParticleReinvigorator (particle reinvigoration was used in the original POMCP paper).

A ParticleReinvigorator should have two associated methods implemented for it: reinvigorate! and handle_unseen_observation. Below is a sample. (This is a very bad reinvigorator because it adds particles uniformly. A real reinvigorator should use domain knowledge to add particles similar to the ones already present and consistent with the action and observation.)


In [5]:
# a trivial reinvigorator that ignores the existing particles
type UniformBabyReinvigorator <: ParticleReinvigorator end

# called after a successful update to add fresh particles to the collection
function POMCP.reinvigorate!(pc::ParticleCollection,
        r::UniformBabyReinvigorator,
        old_node::BeliefNode, a::Bool, o::Bool)
    push!(pc, true)
    push!(pc, false)
    return pc
end

# called when no particles in the tree match the observation
function POMCP.handle_unseen_observation(r::UniformBabyReinvigorator,
        old_node::BeliefNode, a::Bool, o::Bool)
    return ParticleCollection{Bool}([true, false])
end

This allows an update to happen even when none of POMCP's simulated particles resulted in the observation received from the environment.


In [6]:
up_with_reinvig = RootUpdater(UniformBabyReinvigorator())
# artificially simulate particle depletion
delete!(first_root_node.children[a].children, o)
update(up_with_reinvig, first_root_node, a, o)


Out[6]:
POMCP.ObsNode{POMCP.ParticleCollection{Bool},Bool,Bool}(false,0,POMCP.ParticleCollection{Bool}(Bool[true,false,true,false]),Dict{Bool,POMCP.ActNode{Bool,Bool,POMCP.ObsNode{POMCP.ParticleCollection{Bool},Bool,Bool}}}())
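
Putting this together, the reinvigorating updater can be dropped into an ordinary simulation loop. Below is a minimal sketch, assuming the policy and up_with_reinvig defined above (the 10-step horizon is arbitrary):

# sketch: run the agent for a few steps with the reinvigorating updater
dist = initial_state_distribution(problem)
s = rand(rng, dist)
node = initialize_belief(up_with_reinvig, dist)
for t in 1:10
    a = action(policy, node)
    (s, o, r) = generate_sor(problem, s, a, rng)
    node = update(up_with_reinvig, node, a, o) # reinvigoration handles depletion
end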

Custom Belief Updater

POMCP will work out of the box with any belief that supports rand(), for example:


In [7]:
init_dist = initial_state_distribution(problem)
@show typeof(init_dist)
action(policy, init_dist)


typeof(init_dist) = POMDPModels.BoolDistribution
Out[7]:
false

However, in this case, the policy unnecessarily keeps track of the particles at each node in the tree.


In [8]:
get(policy._tree_ref).children[true].children[false].B


Out[8]:
POMCP.ParticleCollection{Bool}(Bool[false])

To prevent this, we can use a VoidUpdater from POMDPToolbox.


In [9]:
solver = POMCPSolver(rng=rng, tree_queries=5,
                     node_belief_updater = VoidUpdater())
policy = solve(solver, problem)
a = action(policy, init_dist)
@show get(policy._tree_ref).children[true].children[false].B


get(policy._tree_ref).children[true].children[false].B = nothing

Note that even if the node_belief_updater is a VoidUpdater, we can still use any belief updater to handle real observations from the environment:


In [10]:
exact_updater = BabyBeliefUpdater(problem)
(sp, o, r) = generate_sor(problem, s, a, rng)
@show belief2 = update(exact_updater, init_dist, a, o)
a2 = action(policy, belief2)


belief2 = update(exact_updater,init_dist,a,o) = POMDPModels.BoolDistribution(0.02409638554216867)
Out[10]:
true
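
This hybrid pattern, a VoidUpdater inside the tree and an exact updater for the agent's own belief, extends naturally to a loop. Below is a minimal sketch using the policy and exact_updater defined above (the 10-step horizon is arbitrary):

# sketch: plan from the exact belief; nothing is stored at the tree nodes
b = initial_state_distribution(problem)
s = rand(rng, b)
for t in 1:10
    a = action(policy, b)              # POMCP plans directly from the belief
    (s, o, r) = generate_sor(problem, s, a, rng)
    b = update(exact_updater, b, a, o) # exact Bayesian update for the agent
end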

We can even use an arbitrary updater to update the beliefs at the tree nodes (for example, if the rollout policy needs a specific belief representation).


In [11]:
solver = POMCPSolver(rng=rng, tree_queries=5,
                     node_belief_updater = exact_updater)
policy = solve(solver, problem)
a = action(policy, init_dist)
get(policy._tree_ref).children[true].children[false].B


Out[11]:
POMDPModels.BoolDistribution(0.0)

Re-using Simulations in a Custom Updater

It is also possible for a custom belief updater to re-use the states simulated during the POMCP planner's decision-making. To do this, simply define POMCP.uses_states_from_planner(::YourBeliefType) = true; Base.push!(::YourBeliefType, ::State) will then be called every time a node is visited in POMCP.

In the following example, the belief does not actually use the pushed states to predict the next belief, but it prints out a message whenever a state is pushed to it by the planner.


In [12]:
POMCP.uses_states_from_planner(::BoolDistribution) = true
Base.push!(::BoolDistribution, s) = println("Received state $s from planner.")

solver = POMCPSolver(rng=rng, tree_queries=5,
                     node_belief_updater = exact_updater)
policy = solve(solver, problem)
a = action(policy, init_dist)


Received state false from planner.
Received state false from planner.
Received state false from planner.
Received state false from planner.
Received state false from planner.
Out[12]:
false
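
A belief type that actually makes use of the pushed states might simply collect them and sample from them when POMCP calls rand(). Below is a sketch; StateCollector is a hypothetical type, not part of POMCP.jl:

# hypothetical belief that stores states pushed by the planner
type StateCollector
    states::Vector{Bool}
end

# opt in to receiving simulated states from the planner
POMCP.uses_states_from_planner(::StateCollector) = true

# store each state the planner visits
Base.push!(b::StateCollector, s::Bool) = push!(b.states, s)

# sample uniformly from the collected states so POMCP can use the belief
Base.rand(rng::AbstractRNG, b::StateCollector) = b.states[rand(rng, 1:length(b.states))]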
