From: Tyler Robison (trobison_at_cs.washington.edu)
Date: Mon Nov 24 2003 - 00:01:16 PST
Acting Optimally in Partially Observable Stochastic Domains
Anthony R. Cassandra, Leslie Pack Kaelbling and Michael L. Littman
Summary:
This paper describes the Witness Algorithm for solving POMDPs, which the
authors claim is significantly more efficient than other techniques.
Important Ideas:
The main idea in this paper is the Witness Algorithm, which finds a (near)
optimal strategy (here a policy graph) when given a continuous-space MDP.
Because the algorithm works only on MDPs, the paper's second important
point is how to convert a POMDP into a completely observable,
continuous-space 'belief' MDP. This is achieved by using the beliefs of
the POMDP (probability distributions over its states) as the states of the
belief MDP, and defining transition and reward functions over those
beliefs.
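To make the belief-MDP idea concrete, here is a minimal sketch of the
standard Bayesian belief update that such a transition function is built
on. It is not code from the paper: the function name, the nested-list
layout of T and O, and the two-state toy model are all assumptions made
for illustration.

```python
def belief_update(belief, action, observation, T, O):
    """Return the posterior belief after taking `action` and seeing `observation`.

    Assumed (hypothetical) layout:
      belief[s]     = Pr(state is s)
      T[a][s][s2]   = Pr(next state s2 | state s, action a)
      O[a][s2][o]   = Pr(observation o | next state s2, action a)
    """
    n = len(belief)
    # Unnormalized posterior: O(s2, a, o) * sum_s T(s, a, s2) * b(s)
    unnormalized = [
        O[action][s2][observation]
        * sum(T[action][s][s2] * belief[s] for s in range(n))
        for s2 in range(n)
    ]
    # The normalizer is Pr(o | a, b), the probability of the observation.
    prob_obs = sum(unnormalized)
    if prob_obs == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [p / prob_obs for p in unnormalized]


# Hypothetical two-state toy model with a single 'sense' action (index 0):
# the state never changes, and the observation matches the state 85% of
# the time.
T = [[[1.0, 0.0], [0.0, 1.0]]]
O = [[[0.85, 0.15], [0.15, 0.85]]]

new_belief = belief_update([0.5, 0.5], 0, 0, T, O)
print(new_belief)  # belief shifts toward state 0
```

Starting from a uniform belief, one observation pointing at state 0 moves
the belief to roughly 0.85 / 0.15. Each such belief vector is a single
"state" of the belief MDP, which is why that MDP is continuous even when
the underlying POMDP has only a handful of states.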
In practice, many problems are not fully observable, so a method that
converts POMDPs to MDPs and then solves them efficiently sounds
promising.
Flaws:
The Results section of the paper was fairly vague and offered no strong
evidence that the algorithm works well. Very few figures are given, and
no useful comparisons against other algorithms are presented. We are told
that the algorithm works, and works very well, but the paper needs to
show this rather than assert it.
Research Ideas:
The authors state that the results presented are preliminary, so it
would be helpful to see more testing and, more importantly, comparisons
with other techniques. Without concrete results, it is very difficult to
evaluate the algorithm.