Skip to content

Online POMDP tree search solver based on Prioritized Action POMCPOW (PA-POMCPOW)

License

Notifications You must be signed in to change notification settings

sisl/PA-POMCPOW.jl

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 

Repository files navigation

PA-POMCPOW.jl

PA-POMCPOW is an online POMDP tree search solver that introduces action prioritization to the progressive widening of the POMCPOW algorithm. For more information, see the paper at https://arxiv.org/pdf/2010.03599.pdf. For information on the original POMCPOW, see https://arxiv.org/abs/1709.06196.

PA-POMCPOW solves problems defined using the POMDPs.jl interface. In addtion to the elements required by POMCPOW, PA-POMCPOW also requires definition of the expected information gain term as specified in the paper.

Installation

import Pkg
Pkg.add(https://github.com/sisl/PA-POMCPOW.jl)

Usage

using POMDPs
using POMDPModelTools
using POMDPSimulators
using D3Trees

m = ExamplePOMDP()
ds0 = POMDPs.initialstate_distribution(m)
s0 = rand(ds0)
up = ExampleBeliefUpdater()
b0 = initialize_belief(up, s0)

C = [0:0.1:2.0;]

action_selector = ActionSelector(example_pomdp_action_value_function, C)

solver = PAPOMCPOWSolver(tree_queries=1000,next_action=action_selector, belief_updater=up, max_depth=25)
planner = solve(solver, m)

hr = HistoryRecorder(max_steps=100)
hist = simulate(hr, pomdp, planner)
for (s, b, a, r, sp, o) in hist
    @show s, a, r, sp
end

The algorithm behavior is determined by the keyword argument values passed to the solver constructor. To enact the prioritized action selection, an ActionSelector object must be passed in the next_action argument. Without this, the algorithm defaults to the behavior of POMCPOW. All other keyword arguments control behavior as described in the original POMCPOW paper and in POMCPOW.jl.

Action Value Function

To solve a POMDP using PA-POMCPOW requires an action_values function to be defined for the problem. The function should have the arguments and returns shown below.

function action_values(p::POMDP, b::Belief)
    # YOUR CODE HERE
    return (mu=mu, sig=sig, act=act)
end

The mu and sig fields of the returned NamedTuple are the mean and standard deviation of the expected immediate reward of taking each action in the vector returned in the act field.

About

Online POMDP tree search solver based on Prioritized Action POMCPOW (PA-POMCPOW)

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages