In [1]:
using POMCP
using POMDPs
using POMDPModels
using POMDPToolbox
using GenerativeModels
problem = BabyPOMDP();
rng = MersenneTwister(1);
In [2]:
solver = POMCPSolver(rng=rng, tree_queries=5)
policy = solve(solver, problem)
up = updater(policy)
Out[2]:
In this implementation of POMCP, the "belief state" is actually the tree itself. When a new observation is received from the environment, the RootUpdater simply chooses an existing node, based on the action taken and the observation received, to act as the root of the tree for the next decision.
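Conceptually, the root update just follows one action edge and one observation edge down the tree. The sketch below illustrates the idea; it is illustrative only, not POMCP.jl's actual code, and it assumes each node stores its children keyed by action and observation (the same structure accessed via node.children[a].children[o] later in this notebook).
# Illustrative sketch only -- not the package's actual implementation.
# Assumes each node stores its children keyed by action and observation.
function sketch_root_update(root, a, o)
    action_node = root.children[a]   # child reached by taking action a
    return action_node.children[o]   # grandchild consistent with observation o
end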
In [3]:
# setup
init_dist = initial_state_distribution(problem)
s = rand(rng, init_dist)
first_root_node = initialize_belief(up, init_dist)
@show typeof(first_root_node)
# plan and execute first action
a = action(policy, first_root_node)
(sp, o, r) = generate_sor(problem, s, a, rng)
# the updater simply chooses the next root node
second_root_node = update(up, first_root_node, a, o)
@show typeof(second_root_node)
# this new node contains particles representing the belief
@show second_root_node.B
# at the next step, POMCP uses the new root
action(policy, second_root_node);
Unfortunately, this simple unweighted particle filtering scheme often runs into the problem of particle depletion, in which no particles correspond to the observation received from the environment.
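To see why depletion occurs, consider a naive rejection-style unweighted particle filter that keeps only the particles whose simulated observation matches the real one. The sketch below is illustrative, not POMCP.jl's internal filter; naive_pf_update is a made-up name, and it reuses generate_sor from the generative interface shown above.
# Illustrative sketch only -- not POMCP.jl's internal filter.
function naive_pf_update(particles, a, o_real, problem, rng)
    new_particles = Bool[] # BabyPOMDP states are Bools
    for s in particles
        (sp, o_sim, r) = generate_sor(problem, s, a, rng)
        if o_sim == o_real
            push!(new_particles, sp) # keep only particles matching the real observation
        end
    end
    return new_particles # may come back empty: particle depletion
end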
In [4]:
# artificially simulate particle depletion
delete!(first_root_node.children[a].children, o)
update(up, first_root_node, a, o) # errors: the tree has no node matching observation o
Three solutions, listed in the error message above, are available to alleviate the problem. If you still want to use the standard POMCP unweighted particle filter, you must implement a ParticleReinvigorator (particle reinvigoration was used in the original POMCP paper).
A ParticleReinvigorator must have two methods implemented for it: reinvigorate! and handle_unseen_observation. A sample is below. (This is a very bad reinvigorator because it adds particles uniformly; a real reinvigorator should use domain knowledge to add particles similar to the ones already present and consistent with the action and observation.)
In [5]:
# a trivial reinvigorator that naively adds one particle for each of the two states
type UniformBabyReinvigorator <: ParticleReinvigorator end

# adds new particles to the collection after a normal update
function POMCP.reinvigorate!(pc::ParticleCollection,
                             r::UniformBabyReinvigorator,
                             old_node::BeliefNode, a::Bool, o::Bool)
    push!(pc, true)
    push!(pc, false)
    return pc
end

# constructs a fresh particle collection when no simulated particles
# are consistent with the observation from the environment
function POMCP.handle_unseen_observation(r::UniformBabyReinvigorator,
                                         old_node::BeliefNode, a::Bool, o::Bool)
    return ParticleCollection{Bool}([true, false])
end
This allows an update to happen even when POMCP did not simulate any particles consistent with the observation received from the environment.
In [6]:
up_with_reinvig = RootUpdater(UniformBabyReinvigorator())
# artificially simulate particle depletion
delete!(first_root_node.children[a].children, o)
update(up_with_reinvig, first_root_node, a, o)
Out[6]:
In [7]:
init_dist = initial_state_distribution(problem)
@show typeof(init_dist)
action(policy, init_dist)
Out[7]:
As shown above, action can also be called directly with a belief distribution such as init_dist. In this case, however, the policy unnecessarily keeps track of the particles at each node in the tree.
In [8]:
get(policy._tree_ref).children[true].children[false].B
Out[8]:
To prevent this, we can use a VoidUpdater from POMDPToolbox.
In [9]:
solver = POMCPSolver(rng=rng, tree_queries=5,
                     node_belief_updater = VoidUpdater())
policy = solve(solver, problem)
a = action(policy, init_dist)
@show get(policy._tree_ref).children[true].children[false].B
Note that even if the node_belief_updater is a VoidUpdater, we can still use any belief updater to handle real observations from the environment:
In [10]:
exact_updater = BabyBeliefUpdater(problem)
(sp, o, r) = generate_sor(problem, s, a, rng)
@show belief2 = update(exact_updater, init_dist, a, o)
a2 = action(policy, belief2)
Out[10]:
We can even use an arbitrary updater to update the beliefs at the nodes (for example, if the rollout policy needs a specific belief representation).
In [11]:
solver = POMCPSolver(rng=rng, tree_queries=5,
                     node_belief_updater = exact_updater)
policy = solve(solver, problem)
a = action(policy, init_dist)
get(policy._tree_ref).children[true].children[false].B
Out[11]:
It is also possible for a custom belief updater to use the states from the POMCP planner's decision-making simulations. To do this, simply define POMCP.uses_states_from_planner(::YourBeliefType) = true, and Base.push!(::YourBeliefType, ::State) will be called every time a node is visited in POMCP.
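For instance, a belief type that actually tallies the states pushed to it by the planner might be sketched as follows (StateCountBelief is hypothetical and not part of POMCP.jl):
# Hypothetical sketch -- StateCountBelief is not part of POMCP.jl.
type StateCountBelief
    counts::Dict{Bool,Int} # tally of states seen during planner simulations
end
StateCountBelief() = StateCountBelief(Dict{Bool,Int}())
POMCP.uses_states_from_planner(::StateCountBelief) = true
function Base.push!(b::StateCountBelief, s)
    b.counts[s] = get(b.counts, s, 0) + 1 # count each state the planner visits
    return b
end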
In the following example, the belief type does not actually use the simulated states to predict the next belief; it just prints a message whenever a state is pushed to it by the planner.
In [12]:
POMCP.uses_states_from_planner(::BoolDistribution) = true
Base.push!(::BoolDistribution, s) = println("Received state $s from planner.")
solver = POMCPSolver(rng=rng, tree_queries=5,
                     node_belief_updater = exact_updater)
policy = solve(solver, problem)
a = action(policy, init_dist)
Out[12]: