From: Keith Noah Snavely (snavely_at_cs.washington.edu)
Date: Mon Nov 24 2003 - 01:36:54 PST
"Acting Optimally in Paritally Observable Stochastic Domains", by
Cassandra et al. describes a new algorithm to find approximate
solutions to partially-observable MDPs.
One major contribution of this paper is the Witness algorithm. Like
other algorithms, Witness approximates the value function of a POMDP
as a piecewise-linear convex function, which can be represented by a
set of vectors, and creates an incrementally better approximation by
adding vectors to the set. The improvement of Witness over these
other algorithms seems to be in determining the vector (if any) that
should be added to the set. This problem is formulated as a linear
program, which produces approximate solutions quickly. The final set
of vectors partitions the belief space and a policy graph is created
using these partitions as nodes (I wasn't sure if this is an original
contribution, but it is a neat idea anyway, and seems similar in a way
to the use of ADDs in SPUDD). The authors include some examples that
explain the intuition behind why the structure of problems often
result in the belief state space forming a relatively small number of
partitions, showing the use of this approach.
The major weakness of this paper is just the sketchiness of some of
the details of this work and the work that it builds upon. The paper
claims that in theory and practice their algorithm is faster than its
predecessors, but the authors do not show why. For instance, it would
have been nice to see a comparison with the linear support algorithm
of Cheng's that is mentioned. The paper also mentions a heuristic for
creating the policy graph from the final set of vectors, but does not
expand on it. Another complaint that is more or less invariant over
the MDP papers we have read is with the treatment of experiments; the
problems that the algorithm was run on were not characterized in any
sense, and the results of the experiments were spotty. This probably
points to a lack of a good set of standard benchmarks for these
problems, for which the authors cannot really be blamed. From the
results given, however, it is evident that even with the improvements
made in this work it is not yet practical to solve large POMDPs.
I think that the most interesting future work in this area is in
finding ways of either a) improving the representation of the value
function or otherwise refine the algorithm to make it faster, or b)
approximating solutions to POMDPs intelligently (without too much
decrease in quality). Another really good thing would be to build a
good test suite of representative POMDPs for testing and comparing
different algorithms.
This archive was generated by hypermail 2.1.6 : Mon Nov 24 2003 - 01:36:55 PST